ZFS Deleting a directory with an unknown number of files

I haven't read all the helpful suggestions provided, but here is a bit of general information.
There is no magic command that "just" deletes the directory, at least not without going through back-doors.
Any way you try to remove a directory would still iterate over each directory entry and remove it first, whether it's a sub-directory or a file.
Doing it in any other way would result in leaked inodes, disk space, etc.
 
If a simple "ls -f the-directory" command doesn't even complete (in reasonable time), then the first thing to do is to run procstat -kk on that ls process.
You need to get at least some information about what ls is doing and where.
ktrace-ing that command could also be a good idea.
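A rough sketch of that kind of inspection, using the thread's the-directory placeholder:

Code:
ls -f the-directory > /dev/null &   # start the listing in the background
pid=$!
procstat -kk "$pid"                 # kernel stack: where is it spending its time?
ktrace -p "$pid"                    # attach ktrace to the running process
sleep 10
ktrace -C                           # stop tracing
kdump | tail -50                    # the last syscalls it made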
 
It's impossible to tell, just from reading forum posts, how long you waited before deciding a process would "never end."
In this case, I have waited hours (12 at the longest) for any kind of return from any of the commands I've tried to delete the directory contents. Were the different commands working? Probably... but as you point out, I wasn't waiting long enough. Good point!

In all my experience with FreeBSD, in most cases you either get an error message or the process ends. It may take a long time, maybe even days, but eventually it ends. Then it's either done, or you get a (late) error message. But a process simply getting stuck indefinitely, without any sign of life whatsoever, is a very rare exception under FreeBSD. To be clear: I'm not talking about the software you add with ports/packages. I'm talking about FreeBSD itself.
This ain't Windows 😉
But I also know there can be circumstances where this is not the case, and errors are not detected and handled correctly in time, depending on many factors. And I don't know your directory, how it came to be, and what else might be going on...

I thought about attaching an extra SSD to my machine and doing some time-measurement experiments on directories with 5k, 10k, and 15k files, just to get some values one could extrapolate from, to get at least a rough idea of what times to expect for certain actions on >31M files. By gut feeling alone I would say you have to wait several hours before something like ls can even finish on such a large number of files - you are way beyond the "normal default" 10k ralphbsz mentioned. But that doesn't mean the OS can't handle it at all.
Anyway, one cannot expect the OS to react as fast with 31M files as it does with directories containing <=10k files, simply because there is a lot more to handle. Even the fastest hardware working at light speed needs time.
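If someone wants to run that experiment, a minimal sketch could look like this (the scratch path and file count are made up):

Code:
mkdir -p /mnt/scratch/test-10k
( cd /mnt/scratch/test-10k && jot -w file%d 10000 | xargs touch )   # create 10,000 empty files
time ls -f /mnt/scratch/test-10k > /dev/null      # how long does a plain listing take?
time rm -rf /mnt/scratch/test-10k                 # and a full removal?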

However,
I'm also thinking practically, which means:
If there is no valuable data that needs to be rescued, what is the easiest, quickest solution?
Of course, you already got there on your own: copy the valuable data, wipe the crap clean, and start over on a clean drive. I would do it exactly the same way.

The point is that there are lessons to be learned from this (I say this because this is an open forum anybody on the internet can read, so don't take it personally):

Maybe you kind of "inherited" this directory. But if you "produced" it yourself, there was some error during testing. If, for example, someone wants to log data to files, after a while one looks into the directory just to check that the stuff works as intended. In this case it should have attracted attention: "Almost 400 files within 3 minutes! Crap. Something must have gone wrong." So one would either have checked whether the number of files produced per minute was what was intended, or thought of a routine that limits the number of files and automatically deletes the old ones.
On the other hand, there are cases where such an amount of data really needs to be saved, e.g. measurements from a technical device, a physical experiment, a data-collecting buoy in the ocean... whatever.
Then one has to think about how to organize that data, since 31M files is nothing any human will ever analyse by hand; it will be processed by computers.
And even if, for whatever reason, there is no other way than to put it all into individual files, then those files for sure shouldn't get random names, but should be named by some kind of comprehensible system, and better be distributed into directories that are also named sensibly... because random file names on 31M files is garbage, no matter what they contain.
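Just as an illustration, one common way to keep such exports manageable is to shard by date (or a name/hash prefix) instead of dumping everything into one flat directory; the paths and names here are invented:

Code:
# e.g. data/2024/09/21/record-000123.json instead of 31M files in one directory
d="data/$(date +%Y/%m/%d)"
mkdir -p "$d"
mv record-000123.json "$d"/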
🧐🤓🥸😎:beer:😂
Yes, many good take-aways from this experience. This problem was created by exporting and serializing sets of data from a database with a tee-like process: data was going one way for a separate process, and also being serialized at the same time. Whoops. I'm just glad I didn't do this on my primary drive :D.

In any case, I appreciate your thoughtful help! :beer::beer:Cheers and have a great day wherever you are!
 
Exactly which dataset is that directory in?

What's the full filepath?

Who owns that directory? ( ls -ld /full/file/path/ocd-data)

My thinking goes: this ocd-data/ directory probably lives in a dataset where the ZFS settings add up in a way that prevents manual deletion, even by root. I mean, root can't exactly delete stuff like /dev/drm/0, because the settings add up to prevent deletion of things like block devices, even though in UNIX absolutely everything is represented as a file, even pipes and block devices.
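A sketch of how to answer those questions (the path is the placeholder from above; substitute the dataset name df reports):

Code:
ls -ld /full/file/path/ocd-data                          # owner and permissions
df /full/file/path/ocd-data                              # which dataset/mount backs the path?
zfs get readonly,mounted,quota,mountpoint pool/dataset   # use the dataset name df printed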
 
This and similar combos are going to fail if any file name contains newline characters. Using -print0 / -0 is better.

find ... | while read ... suffers from that, and backslashes are also interpreted; read -r mitigates the latter.
Why specifically would a filename contain a newline character? I'd say it's a pretty safe assumption that they don't, unless there's a good reason to include one.
 
You mostly need the -print0 construct for filenames with spaces, as otherwise word splitting will strike (by default, you can change IFS).

And yes, filenames can contain newlines.
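A minimal sketch of both patterns, using the thread's ocd-data directory as a stand-in:

Code:
# newline- and space-safe: NUL-terminated names straight into xargs
find ocd-data -type f -print0 | xargs -0 rm
# if a shell loop is needed, at least use an empty IFS and read -r
# (this still breaks on embedded newlines, as discussed above)
find ocd-data -type f | while IFS= read -r f; do
        rm -- "$f"
done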
 
In general they might :cool: I'm not entering philosophical territory to question whether creating such file names is a dumb idea or not. Slash and \0 are forbidden; all other codes – a newline included – are permitted.

Code:
matlib@freebsd14:/tmp$ perl -e 'open $f,">","test\ntest"'
matlib@freebsd14:/tmp$ ls -l
total 0
-rw-r--r--  1 matlib wheel 0 Sep 21 18:49 test?test
matlib@freebsd14:/tmp$ find . -type f | xargs rm -v
rm: ./test: No such file or directory
rm: test: No such file or directory
matlib@freebsd14:/tmp$ find . -type f -print0 | xargs -0 rm -v
./test
test
matlib@freebsd14:/tmp$ ls -l
total 0

Depending on the shell,
> 'test
test'
or > test$'\n'test may also work instead of perl.
 
That doesn't really answer my questions... Even if now I know that a newline is a valid character in a filename, why would anyone want to include it in a filename? Esp. when there's plenty of reasons not to?

I'm looking for a technical explanation, not a philosophical debate.
 
That doesn't really answer my questions... Even if now I know that a newline is a valid character in a filename, why would anyone want to include it in a filename? Esp. when there's plenty of reasons not to?

I'm looking for a technical explanation, not a philosophical debate.

Why would anyone put a space in a filename? The FreeBSD tree has had one for a few weeks now.
 
Why would anyone put a space in a filename? The FreeBSD tree has had one for a few weeks now.
NTFS (ooh, ooh, Microsoft, bad! Let's get ready to gag and disinfect our keyboards!) beat FreeBSD to the punch by over 20 years on that one.

And, I'd say that spaces are easier to deal with programmatically than accidental newlines. sed and perl can definitely do it.

This does have me thinking that there's no technical merit to [allowing as valid] a newline character in a filename in any filesystem, be it ZFS or UFS. Please prove me wrong and give me links to good technical papers that show otherwise.
 
...and by the way, if you want to be known as a joker at your workplace, create a directory with a newline as its last character and some more directories under it, for example:

Code:
perl -e 'mkdir "test\n$_" foreach ("", "/bin", "/etc", "/usr", "/sbin", "/lib")'

Now find gives the following output:

Code:
$ find /tmp/matlib -type d
/tmp/matlib
/tmp/matlib/test

/tmp/matlib/test
/bin
/tmp/matlib/test
/etc
/tmp/matlib/test
/usr
/tmp/matlib/test
/sbin
/tmp/matlib/test
/lib

And just wait for the global cron job that cleans up unused users' directories.
 
What I'd try in such a situation: if the filesystem is ZFS and has plenty of snapshots, destroy the now-unneeded large snapshots to let ZFS work faster, then run either of the commands below.
  • find -X the-directory-to-be-deleted -type f -name \* | xargs -o rm -i
  • find -X the-directory-to-be-deleted -type f -name \* -exec rm {} +
The former should avoid an overlong command line for rm, unless the paths and/or filenames themselves are already too long.
The latter lets find(1) itself call rm instead of xargs(1). See the respective manpages for details.

Maybe I'm too paranoid, but this is to avoid mistakenly deleting special files.
Subdirectories should be deleted later, after confirming that nothing dangerous-to-delete remains; see the sketch below.
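As a sketch, the snapshot check and the later subdirectory cleanup could look like this (pool/dataset and the directory name are placeholders):

Code:
zfs list -t snapshot -r pool/dataset            # which snapshots still pin the old data?
#zfs destroy pool/dataset@no-longer-needed      # destroy only the ones you are sure about
# later, once the files are gone, drop the now-empty subdirectories depth-first:
find the-directory-to-be-deleted -type d -empty -delete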
 
  • find -X the-directory-to-be-deleted -type f -name \* | xargs -o rm -i
  • find -X the-directory-to-be-deleted -type f -name \* -exec rm {} +
I always liked the 2nd solution (with -exec rm) better; in my experience it works faster than | xargs rm.

But I never had a case like the OP CanOfBees has, so I stayed quiet and let the professionals come up with a solution.
 
rm -vxw ocd-data/ would sit and sit and sit and never print a thing
This would have produced an immediate error message and the program would have exited, since the -w option doesn't exist for rm. So you likely made a mistake in transcribing here what you ran.

You should have typed "rm -rxv ocd-data" -- that -r says to apply this recursively, -v to display each file/dir as it is deleted, -x to not cross mount points. "man rm" to understand what it does.

If you want to try debug this and you still have all this around, please *ignore* all other helpful(!) messages and try this:

1. rm -rxv ocd-data # run this as super-user
2. *If* this doesn't produce any output and doesn't terminate, hit ^T to see what is going on. Cut-and-paste here what ^T outputs.
3a. If it is not making any progress, ^C the command.
3b. if it is making progress, try to time it for a minute or so and see how many files/dirs are being deleted & report here. Ignore the steps below.
4. ktrace -di rm -rxv ocd-data # run it under ktrace
5. Wait for 10 seconds and then hit ^C.
6. kdump | egrep 'NAMI|CALL' > kd.out
7. Cut-n-paste here the output of "tail -50 kd.out"


This will tell me/us what the last 50 or so syscalls made by rm were. Maybe that will give us a hint as to what to look at next.

If we know what is actually going wrong, we can take the guesswork out of trying to help you and suggest something specific. So for any non-trivial bug, first try to find the root cause while minimizing other changes.
 
I always liked 2nd solution (with -exec rm) better, in my experience it works faster than | xargs rm.

Hm, that should be unlikely. With -exec rm {} \; you fork and execute the rm binary for every single file, shared library dynamic linking and all. With xargs (or -exec rm {} +) you only fork and exec once per argument group.

Edge cases exist. If there is only one file to delete, then -exec saves you the fork'n'exec of xargs.
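If anyone wants to measure it, a rough benchmark sketch (the test directory and file count are made up; recreate the files between runs):

Code:
mkdir testdir
( cd testdir && jot -w f%d 10000 | xargs touch )
time find testdir -type f -exec rm {} \;                  # one rm per file
# recreate the files, then:
time find testdir -type f -exec rm {} +                   # one rm per argument batch
# recreate the files again, then:
time sh -c 'find testdir -type f -print0 | xargs -0 rm'   # batched via xargs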
 
Hm, that should be unlikely. With -exec rm {} \; you fork and execute the rm binary for every single file, shared library dynamic linking and all. With xargs (or -exec rm {} +) you only fork and exec once per argument group.

Edge cases exist. If there is only one file to delete, then -exec saves you the fork'n'exec of xargs.
Thanks for the info, this totally makes sense 👍 I do love using xargs for all kinds of things, but what I said was based on impression; I never benchmarked one against the other.

As I said:
But I never had a case like the OP CanOfBees has, so I stayed quiet and let the professionals come up with a solution.
 