system bogged down when copying deeply nested directories

I am taking data files that are buried, for example:

Code:
/usr/local/data/static/i/7/9/e/d/1/9/7/9/3/1/5/f/8/7/b/9/d/a/6/e/c/e/8/6/d/4/d/a/3/7/f/e/7/7/0/3/9/0/0/7/79ed1979315f87b9da6ece86d4da37fe77039007

And I am "flattening" them out to (e.g.):

Code:
/mnt/mirror0/data/static/79ed1979315f87b9da6ece86d4da37fe77039007

There are 27,000 such copies I'd like to perform.

The only problem is that no matter what method I try to copy them over with, the machine gets bogged down. At first I thought it was because /mnt/mirror0 was ZFS, but it happens regardless (I even tried gmirror and striped volumes).

The only thing the copies have had in common so far is that they've all been done in a shell environment - either using utilities (find, etc.) or a big shell script with 27,000 explicit cp commands (DB-generated, mainly for troubleshooting).
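
The find-based variant looks roughly like this (just a sketch; it assumes the flattened destination name is simply the basename of each source file, as in the example paths above):

Code:
#!/bin/sh
# Sketch of the find-based flattening copy. Assumes every leaf file's
# flattened name is just its basename (as in the paths shown above).
SRC=/usr/local/data/static
DST=/mnt/mirror0/data/static

find "$SRC" -type f | while read -r f; do
    cp -p "$f" "$DST/$(basename "$f")"
done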

The system is FreeBSD 9.1-RELEASE #0 (GENERIC plus "options KVA_PAGES=512"), built with ZFS in mind originally.

At this point, I don't really know what system resources to check or monitor to see what is causing the slowdown. I'm ready for some suggestions.

Thank you,
Brett
 
Perhaps it would help not to actually copy the files in the first place, but to make hard links at a flattened new location, something along the following lines:

Code:
# mkdir /usr/local/data/flat_static
# cd /usr/local/data/flat_static

Code:
#!/bin/sh

# converted from the 27,000 explicit cp commands to 27,000 ln commands

ln /usr/local/data/static/i/7/9/e/d/1/9/7/9/3/1/5/f/8/7/b/9/d/a/6/e/c/e/8/6/d/4/d/a/3/7/f/e/7/7/0/3/9/0/0/7/79ed1979315f87b9da6ece86d4da37fe77039007
ln /usr/local/data/static/j/8/0/e/d/1/9/7/9/3/1/5/f/8/7/b/9/d/a/6/e/c/e/8/6/d/4/d/a/3/7/f/e/7/7/0/3/9/0/0/7/80ed1979315f87b9da6ece86d4da37fe77039007
...

Finally, you would copy the flattened directory over to the other volume:

Code:
# cp -pR /usr/local/data/flat_static /mnt/mirror0/data/static

And eventually delete (or not) one or both of the original directories.
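
If you would rather not keep 27,000 explicit ln lines around, a find loop does the same thing. A rough sketch (it assumes /usr/local/data is a single file system, since hard links cannot cross file systems):

Code:
#!/bin/sh
# Rough sketch: hard-link every buried data file into the flat directory.
# flat_static must be on the same file system as the originals, because
# hard links cannot span file systems.
cd /usr/local/data/flat_static || exit 1
find /usr/local/data/static -type f -exec ln {} . \;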
 
So, does that imply you are trying to build a single directory with 27,000 files in it? My guess is that will be a mess and very slow to search (and probably slow to append a new file to as well).
This thread may have useful information for you.
 
Thank you, both. I'll investigate both avenues.

I thought that a more hierarchical approach would work better, which is why I switched. But I think I went from one extreme to another - from all files in a single directory to a maximally high tree of directories containing one file each.
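
Something in between, say bucketing on the first two characters of the hash, might be the sweet spot. A quick sketch of what I have in mind (paths as in my first post, layout purely hypothetical):

Code:
#!/bin/sh
# Hypothetical middle ground: bucket on the first two hash characters,
# giving at most 256 directories of roughly 100 files each instead of
# one directory holding all 27,000.
DST=/mnt/mirror0/data/static
find /usr/local/data/static -type f | while read -r f; do
    name=$(basename "$f")
    bucket=$(echo "$name" | cut -c1-2)
    mkdir -p "$DST/$bucket"
    cp -p "$f" "$DST/$bucket/$name"
done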

Brett
 
Are you using ZFS together with some other FS?
When I last copied a larger set of data from a USB disk, I could see ZFS (ARC) and UFS (inactive memory) fighting for cache memory, with ZFS losing. That can seriously harm performance. In that case, setting a minimum ARC size might help.
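
For example, something like this in /boot/loader.conf (values purely illustrative, tune them to your RAM):

Code:
# /boot/loader.conf -- illustrative values only
vfs.zfs.arc_min="512M"    # do not let the ARC shrink below this
vfs.zfs.arc_max="2048M"   # optionally cap the ARC as well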
 
Crivens said:
Are you using ZFS together with some other FS?
When I last copied a larger set of data from a USB disk, I could see ZFS (ARC) and UFS (inactive memory) fighting for cache memory, with ZFS losing. That can seriously harm performance. In that case, setting a minimum ARC size might help.

I was thinking it was that interaction before I dumped ZFS. I tried gmirror, then striped, then back to gmirror. Both approaches cause the slowdown, though I have not monitored system resource statistics. I was using zfs-status when ZFS was employed; I suppose I can use the same to mine out the relevant system stats for that as well.
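
For the non-ZFS runs I suppose the standard tools are enough to watch while the copy runs, e.g.:

Code:
# Watch these in separate terminals while the copy is running:
gstat              # per-provider disk busy % and queue length
iostat -x 5        # extended per-device I/O statistics every 5 seconds
top -S             # CPU and memory use, including system threads
systat -vmstat 5   # overall memory, paging and disk activity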

The only time it really worked was when I had the disks added as regular UFS. The "flattening" was done using a shell script that iterated over the files returned by find.

Thank you,
Brett
 
One more thing comes to mind. You are also limited by coupling the metadata accesses of both file systems: one FS finds a file and reads it, then the other FS has to create a new one and write it. Repeat until done.

What happens when you apply some parallelism to that, i.e. run two or more instances of your operation on disjoint input sets so their read and write cycles can interleave?
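
A rough sketch of that idea, using xargs -P to keep a few cp workers going at once (it assumes, as before, that the flattened name is just the basename):

Code:
#!/bin/sh
# Rough sketch: run several cp workers in parallel so reads from one
# batch can overlap with writes of another.
DST=/mnt/mirror0/data/static
find /usr/local/data/static -type f -print0 |
    xargs -0 -n 100 -P 4 sh -c 'cp -p "$@" "$0"' "$DST"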
 