ZFS Deleting a directory with an unknown number of files

I haven't read all the helpful suggestions provided, but here is a bit of general information.
There is no magic command that "just" deletes the directory, at least not without going through back-doors.
Any way you try to remove a directory would still iterate over each directory entry and remove it first, whether it's a sub-directory or a file.
Doing it in any other way would result in leaked inodes, disk space, etc.
 
If a simple "ls -f the-directory" command doesn't even complete (in a reasonable time), then the first thing to do is to run procstat -kk on that ls process.
You need to get at least some information about what ls is doing and where.
ktrace-ing that command could also be a good idea.
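Roughly, the kind of inspection I mean looks like this (a sketch only; the PID is a placeholder for whatever your stuck ls actually is):

Code:
procstat -kk 1234            # kernel stacks of the stuck ls: where is it sleeping?
ktrace -p 1234               # attach tracing to the already-running process
sleep 10; ktrace -c -p 1234  # trace for a few seconds, then stop
kdump | tail -50             # the last syscalls it made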
 
It's impossible to tell, just from reading forum posts, how long you waited before deciding a process would "never end."
In this case, I have waited hours (12 at the longest) for any kind of return from any of the commands I've tried to delete the directory contents. Were the different commands working? Probably... but as you point out, I wasn't waiting long enough. Good point!

In all my experience with FreeBSD, in most cases you either get an error message or the process will end. It may take a long time, maybe even days, but eventually it ends. Then it's either done, or you get a (late) error message. A process simply getting stuck forever without any sign of life whatsoever is a very rare exception under FreeBSD. To be clear: I'm not talking about the software you add with ports/packages. I'm talking about FreeBSD itself.
This ain't Windows 😉
But I also know there can be circumstances where this is not the case, and errors are not detected and handled correctly in time, depending on many factors. And I don't know your directory, how it came to be, or what else might be going on...

I thought about attaching an extra SSD to my machine and doing some timing experiments on dirs with 5k, 10k, and 15k files, just to get some values one can extrapolate from, for at least a rough idea of what times to expect for certain actions on >31M files. By gut feeling alone I would say you have to wait several hours until something like ls can even finish on such a large number of files - you are way beyond the "normal default" 10k ralphbsz mentioned. But that doesn't mean the OS can't handle it at all.
Anyway, one cannot expect the OS to react as fast with 31M files as it does with directories containing <=10k files, simply because there is a lot more to handle. Even the fastest hardware working at light speed needs time.
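If I ever get to it, the experiment would look roughly like this (a sketch only; the scratch mountpoint is made up, adjust the counts as you like):

Code:
#!/bin/sh
# rough timing sketch: create N empty files, then time ls and rm on them
for n in 5000 10000 15000; do
    d=/mnt/scratch/bench-$n          # assumed scratch area on the extra SSD
    mkdir -p "$d"
    i=0
    while [ "$i" -lt "$n" ]; do
        : > "$d/file-$i"
        i=$((i + 1))
    done
    echo "=== $n files ==="
    /usr/bin/time -h ls -f "$d" > /dev/null
    /usr/bin/time -h rm -r "$d"
done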

However,
I'm also thinking practically, which means:
if there is no valuable data that needs to be rescued, what is the easiest, quickest solution?
Of course, you already got to it on your own: copy the valuable data, wipe the crap clean, and start all over on a clean drive. I would do it exactly the same way.

The point is there are lessons to be learned from this (I say this because this is an open forum anybody on the internet can read, so don't take it personally):

Maybe you kind of "inherited" this directory. But if you "produced" it yourself, there must have been some error during testing. If, for example, someone wants to log data to files, after a while one looks into the directory just to check the shit works as intended. In this case it should have attracted attention: "Almost 400 files within 3 minutes! Crap. Something must have gone wrong." So one had to check whether the number of files produced per minute was what was intended, or think of a routine that limits the number of files and automatically deletes the old ones.
On the other hand, there are cases where such an amount of data really needs to be saved, e.g. measurements from a technical device, a physics experiment, a data-collecting buoy in the ocean... whatever.
Then one has to think about how to organize this data, since 31M files is nothing any human will ever analyse by hand; it will be processed by computers.
And even if for whatever reason there is no other way than to place it all into individual files, then those files for sure shouldn't get random names, but be named by some kind of comprehensible scheme, and preferably be distributed into directories that are also named sensibly... because random file names on 31M files is garbage, no matter what they contain.
🧐🤓🥸😎:beer:😂
Yes, many good take-aways from this experience. This problem was created by exporting and serializing sets of data from a database with a tee-like process: data was going one way for a separate process, and also being serialized at the same time. Whoops. I'm just glad I didn't do this on my primary drive :D.

In any case, I appreciate your thoughtful help! :beer::beer:Cheers and have a great day wherever you are!
 
Exactly which dataset is that directory in?

What's the full filepath?

Who owns that directory? ( ls -ld /full/file/path/ocd-data)

My thinking goes, this ocd-data/ directory probably lives in a dataset where the ZFS settings add up in a way that prevents manual deletion, even by root. I mean, root can't exactly delete stuff like /dev/drm/0, because of the way settings add up to prevent deletion of stuff like block devices, even though in UNIX absolutely everything is represented as a file, even pipes and block devices.
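A few quick checks along those lines; the dataset name and path are guesses, so adjust to whatever zfs list actually shows:

Code:
zfs list -o name,mountpoint | grep astral       # which dataset does the path live on?
zfs get readonly,canmount,mountpoint astral/ocd-data
ls -ldo /full/file/path/ocd-data                # -o also prints file flags (schg, uappnd, ...)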
 
find .... | xargs rm

This and similar combos are going to fail if any file name contains newline characters. Using -print0 / -0 is better.

find ... | while read ... suffers from that, and also backslashes are interpreted. read -r mitigates the latter.
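For reference, the safer patterns look roughly like this (the path is just a placeholder):

Code:
# NUL-separated names survive spaces and newlines
find /some/dir -type f -print0 | xargs -0 rm --
# plain read loop: -r stops backslash interpretation, IFS= keeps leading blanks,
# but embedded newlines will still break it
find /some/dir -type f | while IFS= read -r f; do rm -- "$f"; done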
 
This and similar combos are going to fail if any file name contains newline characters. Using -print0 / -0 is better.

find ... | while read ... suffers from that, and also backslashes are interpreted. read -r mitigates the latter.
Why would specifically a filename contain a newline character? I'd say it would be a pretty safe assumption that they don't, unless there's a good reason to include it.
 
You mostly need the -print0 construct for filenames with spaces, as otherwise word splitting will strike (by default, you can change IFS).

And yes, filenames can contain newlines.
 
In general they might :cool: I am not entering philosophical territory to question whether creating such file names is or isn't a dumb idea. Slash and \0 are forbidden; all other codes – a newline included – are permitted.

Code:
matlib@freebsd14:/tmp$ perl -e 'open $f,">","test\ntest"'
matlib@freebsd14:/tmp$ ls -l
total 0
-rw-r--r--  1 matlib wheel 0 Sep 21 18:49 test?test
matlib@freebsd14:/tmp$ find . -type f | xargs rm -v
rm: ./test: No such file or directory
rm: test: No such file or directory
matlib@freebsd14:/tmp$ find . -type f -print0 | xargs -0 rm -v
./test
test
matlib@freebsd14:/tmp$ ls -l
total 0

Depending on the shell, a redirection like > 'test
test'
(with a literal newline inside the quotes) or > test$'\n'test may also work instead of perl.
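And if you ever need to get rid of such a file without retyping the newline, a glob or an inode match does it; a rough sketch (the inode number is of course a placeholder):

Code:
rm -i ./test*test                        # let the glob match it, confirm interactively
ls -i                                    # note the inode number, then:
find . -maxdepth 1 -inum 123456 -delete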
 
That doesn't really answer my questions... Even if now I know that a newline is a valid character in a filename, why would anyone want to include it in a filename? Esp. when there's plenty of reasons not to?

I'm looking for a technical explanation, not a philosophical debate.
 
That doesn't really answer my questions... Even if now I know that a newline is a valid character in a filename, why would anyone want to include it in a filename? Esp. when there's plenty of reasons not to?

I'm looking for a technical explanation, not a philosophical debate.

Why would anyone put a space in a filename? The FreeBSD tree has had one for a few weeks now.
 
Why would anyone put a space in a filename? The FreeBSD tree has had one for a few weeks now.
NTFS (ooh, ooh, Microsoft, bad! Let's get ready to gag and disinfect our keyboards!) beat FreeBSD to the punch by over 20 years in this case.

And, I'd say that spaces are easier to deal with programmatically than accidental newlines. sed and perl can definitely do it.

This does have me thinking that there's no technical merit to [allowing as valid] a newline character in a filename in any filesystem, be it ZFS or UFS. Please prove me wrong and give me links to good technical papers that show otherwise.
 
...and by the way, if you want to be known as a joker at your workplace, create a directory with a newline as its last character and some more directories under it, for example:

Code:
perl -'emkdir"test\n$_"foreach("","/bin","/etc","/usr","/sbin","/lib")'

Now find gives the following output:

Code:
$ find /tmp/matlib -type d
/tmp/matlib
/tmp/matlib/test

/tmp/matlib/test
/bin
/tmp/matlib/test
/etc
/tmp/matlib/test
/usr
/tmp/matlib/test
/sbin
/tmp/matlib/test
/lib

And just wait for the global cron job that cleans up unused users' directories.
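If you suspect someone has planted names like that, ls -B or NUL-separated output makes them visible; a rough sketch:

Code:
ls -B /tmp/matlib                        # FreeBSD ls: print non-printables as \xxx
find /tmp/matlib -mindepth 1 -print0 | xargs -0 -n1 printf '%s\n---\n'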
 
What I'd try in such a situation would be: if the filesystem is ZFS and has plenty of snapshots, destroy the now-unneeded large snapshots to let ZFS work faster, then run either of the commands below.
  • find -X the-directory-to-be-deleted -type f -name \* | xargs rm -i
  • find -X the-directory-to-be-deleted -type f -name \* -exec rm {} +
The former should avoid a command line that is too long for rm, unless the paths and/or filenames themselves are already too long.
The latter lets find(1) itself call rm instead of xargs(1). See each manpage for details.

Maybe I'm too paranoid, but -type f is there to avoid mistakenly deleting special files.
Subdirectories should be deleted later, after confirming nothing dangerous to delete remains.
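A hedged sketch of that whole sequence, with made-up dataset and snapshot names (and without rm -i, which would prompt for every single file):

Code:
zfs list -t snapshot -r pool/dataset            # see which snapshots still pin the data
zfs destroy pool/dataset@old-snapshot           # free the unneeded ones first
find -X the-directory-to-be-deleted -type f | xargs rm
find -dX the-directory-to-be-deleted -type d -exec rmdir {} +   # depth-first, dirs last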
 
  • find -X the-directory-to-be-deleted -type f -name \* | xargs rm -i
  • find -X the-directory-to-be-deleted -type f -name \* -exec rm {} +
I always liked the 2nd solution (with -exec rm) better; in my experience it works faster than | xargs rm.

But I never had a case like OP CanOfBees has, so I stayed quiet and let the professionals come up with a solution.
 
rm -vxw ocd-data/ would sit and sit and sit and never print a thing
This would have produced an immediate error message and exited, since the -w option doesn't exist for rm. So you likely made a mistake in transcribing here what you ran.

You should have typed "rm -rxv ocd-data" -- that -r says to apply this recursively, -v to display each file/dir as it is deleted, -x to not cross mount points. "man rm" to understand what it does.

If you want to try debug this and you still have all this around, please *ignore* all other helpful(!) messages and try this:

1. rm -rxv ocd-data # run this as super-user
2. *If* this doesn't produce any output and doesn't terminate, hit ^T to see what is going on. Cut-and-paste here what ^T outputs.
3a. If it is not making any progress, ^C the command.
3b. if it is making progress, try to time it for a minute or so and see how many files/dirs are being deleted & report here. Ignore the steps below.
4. ktrace -di rm -rxv ocd-data # run it under ktrace
5. Wait for 10 seconds and then hit ^C.
6. kdump | egrep 'NAMI|CALL' > kd.out
7. Cut-n-paste here the output of "tail -50 kd.out"


This will tell me/us what the last 50 or so syscalls made by rm were. Maybe that will give us a hint as to what to look at next.

If we know what is actually going wrong, we can take the guesswork out of trying to help you and suggest something specific. So for any non-trivial bug, first try to find the root cause while minimizing other changes.
 
I always liked the 2nd solution (with -exec rm) better; in my experience it works faster than | xargs rm.

Hm, that should be unlikely. With -exec rm ... \; you fork and execute the rm binary for every single file, shared library dynamic linking and all. With xargs (or -exec rm {} +) you only fork and exec once per argument group.

Edge cases exist. If there is only one file to delete, then -exec saves you the fork'n'exec of xargs.
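A rough illustration of the difference (the directory is hypothetical):

Code:
find dir -type f -exec rm {} \;    # one rm process per file: expensive with millions of entries
find dir -type f -exec rm {} +     # one rm per argument-list-sized batch, like xargs
find dir -type f -print0 | xargs -0 rm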
 
Hm, that should be unlikely. With -exec rm ... \; you fork and execute the rm binary for every single file, shared library dynamic linking and all. With xargs (or -exec rm {} +) you only fork and exec once per argument group.

Edge cases exist. If there is only one file to delete, then -exec saves you the fork'n'exec of xargs.
Thanks for the info, this totally makes sense 👍 I do love using xargs for all kinds of things, but what I said was based on impression; I never benchmarked one against the other.

As I said:
But I never had a case like OP CanOfBees has, so I stayed quiet and let the professionals come up with a solution.
 
This would have produced an immediate error message and exited, since the -w option doesn't exist for rm. So you likely made a mistake in transcribing here what you ran.

You should have typed "rm -rxv ocd-data" -- that -r says to apply this recursively, -v to display each file/dir as it is deleted, -x to not cross mount points. "man rm" to understand what it does.

If you want to try debug this and you still have all this around, please *ignore* all other helpful(!) messages and try this:

1. rm -rxv ocd-data # run this as super-user
2. *If* this doesn't produce any output and doesn't terminate, hit ^T to see what is going on. Cut-and-paste here what ^T outputs.
3a. If it is not making any progress, ^C the command.
3b. if it is making progress, try to time it for a minute or so and see how many files/dirs are being deleted & report here. Ignore the steps below.
4. ktrace -di rm -rxv ocd-data # run it under ktrace
5. Wait for 10 seconds and then hit ^C.
6. kdump | egrep 'NAMI|CALL' > kd.out
7. Cut-n-paste here the output of "tail -50 kd.out"


This will tell me/us what the last 50 or so syscalls made by rm were. Maybe that will give us a hint as to what to look at next.

If we know what is actually going wrong, we can take the guesswork out of trying to help you and suggest something specific. So for any non-trivial bug, first try to find the root cause while minimizing other changes.
Hey bakul - I've wanted to follow up with you on this, but haven't had a chance until the last few days.

Firstly, yes - I think the reply you quoted was typed while I was on mobile; definitely *not* the command I've used.

Secondly, I think I mentioned this somewhere in the thread, but it certainly looks like rm builds up some kind of list before deleting anything. Here are the results of your request:

Close-to-original data mess:

Code:
bridger@dustbin|~
) ls -ldT /astral/ocd-data
drwxr-xr-x  2 bridger bridger 25876011 Dec  6 09:12:39 2025 /astral/ocd-data/

Steps:

1. rm -rxv ocd-data
1a. no output, so here's the result of ^T
Code:
root@dustbin:/astral # rm -rxv ocd-data/
load: 0.42  cmd: rm 3957 [zio->io_cv] 14.17r 0.00u 0.03s 0% 1748k
load: 0.39  cmd: rm 3957 [zio->io_cv] 21.32r 0.00u 0.03s 0% 1752k
load: 0.36  cmd: rm 3957 [zio->io_cv] 56.47r 0.00u 0.03s 0% 1764k
load: 0.30  cmd: rm 3957 [zio->io_cv] 81.54r 0.00u 0.04s 0% 1860k
load: 0.25  cmd: rm 3957 [zio->io_cv] 88.95r 0.00u 0.05s 0% 1920k
load: 0.20  cmd: rm 3957 [zio->io_cv] 107.16r 0.00u 0.06s 0% 2136k
load: 0.18  cmd: rm 3957 [zio->io_cv] 109.87r 0.00u 0.06s 0% 2168k
load: 0.17  cmd: rm 3957 [zio->io_cv] 112.54r 0.00u 0.07s 0% 2196k
load: 0.23  cmd: rm 3957 [zio->io_cv] 141.32r 0.00u 0.10s 0% 2552k
load: 0.20  cmd: rm 3957 [zio->io_cv] 148.67r 0.00u 0.10s 0% 2640k
load: 0.18  cmd: rm 3957 [zfsvfs->z_hold_mtx[i]] 151.49r 0.00u 0.11s 0% 2684k
load: 0.12  cmd: rm 3957 [zio->io_cv] 176.63r 0.00u 0.16s 0% 3028k
load: 0.27  cmd: rm 3957 [zio->io_cv] 180.68r 0.00u 0.17s 0% 3080k
load: 0.27  cmd: rm 3957 [zio->io_cv] 181.51r 0.00u 0.17s 0% 3096k
No progress, so I went on to ktrace.

Code:
root@dustbin:/astral # ktrace -di rm -rxv ocd-data 
^C
root@dustbin:/astral # kdump | egrep 'NAMI|CALL' > kd.out
root@dustbin:/astral # tail -n 50 kd.out 
  4002 rm       NAMI  "dfefcca9-8480985b11e.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c44f8,0x40360a0c4418,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "d26af937-21738624b070.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c46b8,0x40360a0c45d8,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "3d777aee-20489071939a.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c4878,0x40360a0c4798,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "be94826a-1130420885db.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c4a38,0x40360a0c4958,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "011e2d64-16525387a13f.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c4bf8,0x40360a0c4b18,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "bd29dccd-23649233a0e6.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c4db8,0x40360a0c4cd8,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "9611da13-24355582995a.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c4f78,0x40360a0c4e98,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "1d70b28d-474069a94.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c5138,0x40360a0c5058,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "7201eb13-12023543b224.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c52f8,0x40360a0c5218,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "ab7b25bf-2344439b7a0.xml"
  4002 rm       CALL  getdirentries(0x4,0x403609847000,0x1000,0x403609832088)
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c54b8,0x40360a0c53d8,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "4e4b9066-11296873a87c.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c5678,0x40360a0c5598,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "fba152cc-45994909d66.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c5838,0x40360a0c5758,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "354da4db-5159158af49.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c59f8,0x40360a0c5918,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "9feb0eaf-3295538285.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c5bb8,0x40360a0c5ad8,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "09d421d3-15461647aa44.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c5d78,0x40360a0c5c98,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "4ea6f510-16603283a484.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c5f38,0x40360a0c5e58,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "a2281067-14767824bdf3.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c60f8,0x40360a0c6018,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "081a8fed-177586999479.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c62b8,0x40360a0c61d8,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "7ad736e0-77642499526.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c6478,0x40360a0c6398,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "469ea48a-118405348214.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c6638,0x40360a0c6558,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "17cbd6c8-13161274ae36.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c67f8,0x40360a0c6718,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "51b007df-205523779ec5.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c69b8,0x40360a0c68d8,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "b969746e-84125918ec1.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c6b78,0x40360a0c6a98,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "93464487-4547494a1df.xml"
  4002 rm       CALL  fstatat(AT_FDCWD,0x40360a0c6d38,0x40360a0c6c58,0x200<AT_SYMLINK_NOFOLLOW>)
  4002 rm       NAMI  "2956f52b-23392679a98e.xml"

As of right now, no files have been deleted (2956f52b-23392679a98e.xml still exists).

I think revisiting the ideas outlined earlier in the thread is pretty important. For example, leverage ZFS and tee into a dataset that can be dropped wholesale, or just pay better attention to the tee itself.
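A minimal sketch of that idea, assuming made-up dataset names and placeholder commands for the exporter and the downstream consumer:

Code:
zfs create astral/export-scratch                                     # throwaway dataset
exporter | tee /astral/export-scratch/dump.xml | downstream-process
zfs destroy -r astral/export-scratch            # one destroy instead of 31M unlinks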

Anyhow, maybe you see some other option here - I'm curious to hear your thoughts - but I don't know that there are any solutions other than 1) waiting a *really* long time, or 2) reformatting the drive (which worked in my case).

Have a good one!
Best wishes
 
Unzip the attached zip file and compile it: unzip ftw.zip; cc -o ftw ftw.c. Give it the directory name, such as "ocd-data", and it will spit out a rm command for each file and directory as they are visited (a dir is listed *after* all of its entries). Check a few and ^C it. If this is what you want, pipe it to sh. Example: ./ftw ocd-data | sh -x -- hopefully this will catch them all. But make *very sure* that this is what you want. You should read ftw.c and understand it as well.
 

Unzip the attached zip file and compile it: unzip ftw.zip; cc -o ftw ftw.c. Give it the directory name, such as "ocd-data", and it will spit out a rm command for each file and directory as they are visited (a dir is listed *after* all of its entries). Check a few and ^C it. If this is what you want, pipe it to sh. Example: ./ftw ocd-data | sh -x -- hopefully this will catch them all. But make *very sure* that this is what you want. You should read ftw.c and understand it as well.
That's an interesting piece of code - thank you for writing and sharing!

After waiting a bit, this is what I had out of it:

Code:
) ./ftw /astral/ocd-data
load: 0.19  cmd: ftw 7846 [zio->io_cv] 28.38r 0.00u 0.04s 0% 1872k
load: 0.07  cmd: ftw 7846 [zio->io_cv] 438.89r 0.02u 0.74s 0% 11120k
^C

So, again, maybe if I had 250K files in a directory, or 500K, that would be a more reasonable number for command-line tools to work with. Since I made an extraordinarily silly mistake, I'm in the (unenviable) position of needing to take extraordinary steps to address the outcome; i.e. back to reformatting the drive, etc.
Again, in any case, man - wow! thanks for thinking of that and sharing it! I'm very grateful!
 
Looks like even nftw(3) does far too much! It seems to want to read the whole damn directory before invoking any callback functions. 🤬🤬🤬

Ok, so we go down one more level! Try the following (now called walk.c). It should start producing output *right away*! Only lightly tested, but I think it is right. If it doesn't produce any output even after a few seconds, run it under ktrace -di as before and copy the last few lines here.
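For anyone following along, the usage pattern is the same as for ftw.c above; a quick sketch:

Code:
cc -o walk walk.c
./walk ocd-data | head          # should start printing rm commands immediately
./walk ocd-data | sh -x         # only once you're sure the output is what you want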
 
