ZFS Forcibly remove files/directories

After a power outage I have some corrupt directories on ZFS.
rm -vfR
returns
Directory not empty.

zpool status -x
all pools are healthy
 
Code:
zdb --checksum tank

Does that come up with anything related?
I did this. It ended with,
Code:
leaked space: vdev 2, offset 0x2d3fe18000, size 4096
space_map_iterate(sm, space_map_length(sm), iterate_through_spacemap_logs_cb, &uic) == 0 (0x61 == 0)
ASSERT at cmd/zdb/zdb.c:460:iterate_through_spacemap_logs()
zsh: abort (core dumped)  zdb --checksum ZT
Big question: what do I do now?
 
Code:
chflags -Rv noschg *
rm -vfR *           
zsh: sure you want to delete all 2 files in /test [yn]? y
rm: test1/Tools/scripts: Directory not empty
rm: test1/Tools: Directory not empty
rm: test1/audio/mous: Directory not empty
rm: test1/audio: Directory not empty
rm: test1: Directory not empty
rm: test3/4: Directory not empty
rm: test3/c/8: Directory not empty
rm: test3/c: Directory not empty
rm: test3: Directory not empty
 
Code:
/bin/ls -al
ls: Makefile: No such file or directory
total 9
drwxr-xr-x  2 root  wheel  3 10 dec. 22:14 .
drwxr-xr-x  3 root  wheel  3 10 dec. 22:14 ..
 
Code:
chflags -Rv noschg *
rm -vfR *          
zsh: sure you want to delete all 2 files in /test [yn]? y
rm: test1/Tools/scripts: Directory not empty
rm: test1/Tools: Directory not empty
rm: test1/audio/mous: Directory not empty
rm: test1/audio: Directory not empty
rm: test1: Directory not empty
rm: test3/4: Directory not empty
rm: test3/c/8: Directory not empty
rm: test3/c: Directory not empty
rm: test3: Directory not empty
I presume when you issued those two commands, you were in the directory /test, which is the parent directory of things like /test/test1/ and so on, right?

I have no idea why zsh asks us whether we want to delete two files. To begin with, you are running the rm program here. What does zsh have to do with it? Second, you are running rm with the -f flag, and with that flag on, rm will not ask us for confirmation. If you had used the -i flag, I would have expected it to say "remove test1?".

Could it be that you are not really running the rm program? Are there interestingly complex aliases involved? Suggestion: Repeat this operation, but use /bin/rm as the program, and run it with "/bin/rm -iR .*", which will remove all files in the directory (including ones whose names begin with .), and it will ask for confirmation for each file, which will also give us a convenient list of files that exist. The reason to use /bin/rm is to make sure we are really using the system's rm program, not some alias or script.
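Before repeating the test, it may help to ask the shell what it would actually run for a given name. A minimal sketch for a POSIX-style shell; the helper name what_runs is made up for illustration and is not part of any tool mentioned in this thread:

```shell
# Hypothetical helper: report how the current shell resolves a command name.
what_runs() {
    # 'command -v' prints a full path for an external program. For an alias,
    # function, or builtin it prints something other than a path like /bin/rm.
    command -v "$1" || printf '%s: not found\n' "$1"
}

what_runs rm
what_runs ls
```

If the answer for rm or ls is anything other than a path such as /bin/rm or /bin/ls, an alias or function is in play.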

Another question: What will ls show in this directory? We'll get to more details of that question below.

Code:
/bin/ls -al
ls: Makefile: No such file or directory
total 9
drwxr-xr-x  2 root  wheel  3 10 dec. 22:14 .
drwxr-xr-x  3 root  wheel  3 10 dec. 22:14 ..
Again, I presume you are in the directory with things that can't get deleted. And this output is actually interesting. To begin with, you used the -l switch on ls, which is good and bad. It's good, because it will tell us the attributes of the things in the directory. It is bad, because it forces ls to look up the attributes of things in this directory, which opens interesting new failure modes.

What does ls tell us? First, that the total size of all the things in this directory is 9 blocks (each block is a KiB). That is weird, because I only see two things, namely . and .., and those are both very small (a few bytes each, which is the common size for a directory in ZFS). Even if we round those up to a whole block, where do the other 7 KiB come from?

Where it gets REALLY interesting is the error message. ls is telling us that it tried to do some operation on a thing called Makefile, but that thing does not exist. Why would ls look at a thing that does not exist? Two options. First: ls is in reality an alias or a script (not the real ls program). We can get around that by running /bin/ls, just like above. Second option: The current directory has a directory entry called Makefile, but that directory entry is wrong and does not really exist. The easiest way to verify that would be to run "ls" without any arguments; then ls will just run the readdir() system call, and report what that returned, without trying to use stat() system calls on each entry in the directory. We can then compare the output of that with what ls -l will say.
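The comparison between "names the directory lists" and "names that can actually be looked up" can also be scripted. A rough sketch, assuming a POSIX sh or bash (not zsh, which treats unmatched globs differently); the function name list_dangling is hypothetical. Glob expansion reads the directory, while the [ -e ] and [ -L ] tests perform a lookup on each name:

```shell
# Hypothetical helper: print names that a directory lists but that cannot be
# looked up afterwards.
list_dangling() {
    dir=${1:-.}
    # The three patterns cover normal names, dot files, and names like "..x".
    for path in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        # -e follows symlinks, -L catches broken symlinks, which are fine.
        if [ ! -e "$path" ] && [ ! -L "$path" ]; then
            case $path in
                # An unmatched pattern stays literal in sh/bash; skip it.
                "$dir/*"|"$dir/.[!.]*"|"$dir/..?*") continue ;;
            esac
            printf 'listed but cannot be looked up: %s\n' "$path"
        fi
    done
}

list_dangling .
```

On a healthy directory this prints nothing. Any line of output means the shell got a name from the directory that the file system then could not find.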
 
Code:
/bin/rm -vfR *
rm: test1/Tools/scripts: Directory not empty
rm: test1/Tools: Directory not empty
rm: test1/audio/mous: Directory not empty
rm: test1/audio: Directory not empty
rm: test1: Directory not empty
rm: test3/4: Directory not empty
rm: test3/c/8: Directory not empty
rm: test3/c: Directory not empty
rm: test3: Directory not empty
HOST:root: /jails/a/test #

cd test1/audio/mous
/bin/ls
Code:
Makefile

/bin/ls -al
Code:
/bin/ls -al
ls: Makefile: No such file or directory
total 9
drwxr-xr-x  2 root  wheel  3 10 dec. 22:14 .
drwxr-xr-x  3 root  wheel  3 10 dec. 22:14 ..

find .
Code:
.
./Makefile

/bin/rm ./Makefile
Code:
rm: ./Makefile: No such file or directory


zpool status -v
zpool status -x
shows no problems
 
Assuming all these were done in the same directory, this tells us a few things, all bad. First, when rm iterates over the directory, it gets entries for test1 and test3, which are subdirectories that are not empty, but have things in them that can't be deleted. When ls iterates over the same place, it does NOT find test1 and test3. It instead finds Makefile. The fact that different ways of iterating give different results means that something is screwed up.

Second, the directory entry Makefile does not exist. That is pretty much a guarantee of file system corruption: something is listed in a directory (as per the readdir() result), but the file it names cannot be found.

It could still be possible to cause these results without corruption, for example using bizarre Unicode conversion, or linking, or file attributes, but that would require a highly unusual configuration.
 
Do you have any cross mounts in that directory (NFS, nullfs, ...)?
As for rm failing, also test with find this way: find . -type f -exec ls -lai {} \;. Once you know the inode of Makefile, you can run find . -xdev -inum $inum -exec rm {} \; where $inum is Makefile's inode number. I purposely used . for the directory you want to remove that file from.
This is just a test to rule out non-ASCII characters in that file name for some strange reason.
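The inode lookup can be wrapped in two small helpers. This is only a sketch: the names inode_of and find_by_inode are invented here, and the demonstration runs on a scratch directory rather than on the damaged one:

```shell
# Hypothetical helpers for locating a directory entry by inode number.
inode_of() {
    # 'ls -di' prints the inode number followed by the name.
    ls -di "$1" | awk '{ print $1 }'
}

find_by_inode() {
    # $1 = directory to search, $2 = inode number.
    # -xdev keeps find on one file system, because inode numbers are only
    # unique within a single file system.
    find "$1" -xdev -inum "$2" -print
}

# Demonstration on a scratch directory.
tmp=$(mktemp -d)
touch "$tmp/Makefile"
find_by_inode "$tmp" "$(inode_of "$tmp/Makefile")"
rm -r "$tmp"
```

On a damaged entry, ls -di may fail with the same "No such file or directory" error, in which case no inode number is available this way.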
 
I tried to be clever: create a ZFS dataset, move my corrupted directory into this dataset, then destroy the dataset.
But the move fails with the following error.
Code:
mv test test2
mv: test/test1/Tools/scripts/getpatch.sh: No such file or directory
mv: test/test1/audio/mous/Makefile: No such file or directory
mv: test/test3/4/stats: No such file or directory
mv: /bin/cp test test2/test: terminated with 1 (non-zero) status

It looks indestructible...

/jails/a/test #find . -type f -exec ls -lai {} \;
Code:
find: ./test1/Tools/scripts/getpatch.sh: No such file or directory
find: ./test1/audio/mous/Makefile: No such file or directory
find: ./test3/4/stats: No such file or directory
 
Currently trying the following: make a snapshot of the dataset, make a zfs send backup, destroy the dataset, and do a zfs receive of the backup...
 
The result: the corrupt directory became part of the backup stream, and zfs receive restored the corrupt directory.
I'm clearly in OpenZFS bug land. Giving up.
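For reference, the sequence attempted above looks roughly like the following. This sketch only prints the commands and does not run them. The dataset name, snapshot name, and stream file are placeholders, not the ones actually used in this thread, and the helper name is made up:

```shell
# Hypothetical helper: print (do not run) a snapshot / send / destroy /
# receive sequence for one dataset. All three arguments are placeholders.
print_send_receive_plan() {
    ds=$1       # dataset, e.g. pool/some/dataset (assumed name)
    snap=$2     # snapshot name
    stream=$3   # file that holds the send stream
    printf 'zfs snapshot %s@%s\n' "$ds" "$snap"
    printf 'zfs send %s@%s > %s\n' "$ds" "$snap" "$stream"
    printf 'zfs destroy -r %s\n' "$ds"
    printf 'zfs receive %s < %s\n' "$ds" "$stream"
}

print_send_receive_plan pool/some/dataset rescue /backup/dataset.zfs
```

As reported above, this round trip restored the damaged directory along with everything else, so it does not remove the damage.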
 
An mv of the "corrupt directory" into another ZFS dataset fails.
But as the dataset has children, a "cp" is not easy...
 
Interesting,
find . -ls
Code:
689291        1 drwxr-xr-x    5 root                             wheel                                   5 12 dec. 13:09 .
 22941        1 drwxr-xr-x    3 root                             wheel                                   3 12 dec. 11:31 ./test1
463617        1 drwxr-xr-x    3 root                             wheel                                   3 10 dec. 22:14 ./test1/Tools
463620        1 drwxr-xr-x    2 root                             wheel                                   3 10 dec. 22:14 ./test1/Tools/scripts
find: ./test1/Tools/scripts/getpatch.sh: No such file or directory
462796        1 drwxr-xr-x    3 root                             wheel                                   3 12 dec. 03:59 ./uclcmd-0.1_3.pkg
462798        1 drwxr-xr-x    2 root                             wheel                                   3 12 dec. 13:08 ./uclcmd-0.1_3.pkg/1670813940
find: ./uclcmd-0.1_3.pkg/1670813940/pkg%arch: No such file or directory
833582        1 drwxr-xr-x    4 root                             wheel                                   4 10 dec. 22:11 ./test3
364953        1 drwxr-xr-x    2 root                             wheel                                   3 10 dec. 22:11 ./test3/4
find: ./test3/4/stats: No such file or directory
364854        1 drwxr-xr-x    3 root                             wheel                                   3 15 sep. 18:12 ./test3/c
367858      337 drwxr-xr-x    2 root                             wheel                                   3 15 sep. 18:12 ./test3/c/8
I am currently trying cp followed by zfs destroy to get rid of the bad directories...
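The copy step could look something like this. It is a sketch under assumptions: the function name salvage_copy and both paths are hypothetical, and it relies on cp -R carrying on past entries it cannot read while returning a non-zero exit status, which matches the mv output shown earlier:

```shell
# Hypothetical helper: copy whatever can still be read from a damaged tree.
salvage_copy() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # cp reports each unreadable entry on stderr and keeps going; its exit
    # status is non-zero if anything was skipped.
    if cp -Rp "$src"/. "$dst"/; then
        echo "copied without errors"
    else
        echo "copied with errors, see messages above"
    fi
}
```

It would be called as salvage_copy /path/to/damaged /path/to/new, and the copy should be checked by hand before the old dataset is destroyed.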
 