Cannot Delete a Directory

I have a directory that will not delete. When I run rm -rf camera, it just sits there and does nothing. I let it sit for over 8 hours and still nothing. When I try to list the contents with ls camera, again nothing: it just sits there with a blinking cursor. If I try du, I get the same outcome.

Anyone ever see this kind of an issue before and know how to fix it?

Thanks.
 
That sounds like a filesystem corruption. Is this on UFS or ZFS?
 
It's UFS, and in fact it is NAS4Free. It's been a while since I have used fsck, so I will have to review my procedures, but that is something I hadn't thought of. I use Fedora on my primary systems, so I'm not too familiar with FreeBSD. I will try fsck, thanks.
 
Make sure you have some backups beforehand. If this really is filesystem corruption, "fixing" it may inadvertently remove a bunch of stuff. Preferably run fsck(8) from single user mode. Alternatively you can unmount the filesystem and run fsck(8) (but unmounting would be impossible if the filesystem happens to be /).

Edit: Right, ZFS. In that case try zpool scrub <poolname>. But this may not fix things and it's possible it's so corrupt even ZFS's self-healing can't deal with it any more. In which case there's nothing else to do but restore from backups.
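
As a rough sketch of that suggestion: the scrub runs in the background, so its result has to be read back with zpool status afterwards. The function name scrub_and_report and the pool name "tank" in the usage line are made up for illustration; zpool scrub and zpool status -v are the real commands.

```sh
# Hypothetical helper (the name is an assumption, not part of ZFS).
# Starts a scrub on the named pool, then prints the pool status.
scrub_and_report() {
    pool=$1
    # zpool scrub returns right away; the scrub itself keeps running.
    zpool scrub "$pool" || return 1
    # -v also lists data errors found since the last completed scrub.
    zpool status -v "$pool"
}

# Example (the pool name "tank" is a placeholder):
#   scrub_and_report tank
```

Running zpool status -v again later shows whether the scrub has finished and whether it repaired or reported anything.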
 
I found the issue while running rsync on the drive. I have everything backed up and am in the process of making a second backup just in case. Once that is done I will begin to play around a little.
 
OK, understood. I was actually trying to delete the directory, so since I can't delete it, my only other option may be to rebuild the pool somehow. I am in way over my head and will have to do a lot of research here. I have never actually fiddled with ZFS, only RAIDs on large servers. I used ZFS on my home NAS because my reading indicated it was a very fault tolerant system, and so far it has been. This NAS has been running for about 5 years now without a hiccup, and I have been so pleased that I am going to start converting some of my customers to the same system instead of Windows file servers.
 
I have never actually fiddled with ZFS,
ZFS is great, easy to use, etc. But if you have filesystem corruptions all bets are off. Then ZFS is anything but easy.

because my reading indicated it was a very fault tolerant system, and so far it has been.
It absolutely is. But there's only so much the self-healing can fix. It's definitely not bulletproof.
 
Pretty much like every other OS I have used. Great while they are running, but when things go wrong, it can be difficult. But that's why we need computer engineers and technicians. If everything worked perfectly all the time I would be out of a job and life would be very boring!!:)
 
OK, time to diagnose some more. Just knowing that you have some commands that don't work, but zpool scrub found no problems, will not help us give any advice for how to proceed.

You say you have a directory called "camera", which is stored somewhere in ZFS. First question: make sure that directory is actually what you think it is. Go to the parent of the directory and do an "ls -lF", which will show you whether that directory is in reality a link. It also shows the link count (second field), which for a directory is the number of subdirectories in "camera" plus two. Second step: do stat -x camera, which will show you a lot of interesting statistics about it. Anything unusual? Is it on the same device as its parent directory? How about permissions, ACLs, and such?
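
The link-count arithmetic can be sketched in a few lines of shell. This assumes the filesystem follows the traditional Unix convention that a directory has two links of its own (its entry in the parent and its own ".") plus one for the ".." of every subdirectory. The helper names link_count and subdirs_from_links are made up for this example.

```sh
# Hypothetical helper: print the link count of a directory,
# i.e. the second field of `ls -ld`. This only looks at the
# directory itself and does not read its contents.
link_count() {
    ls -ld "$1" | awk '{print $2}'
}

# Hypothetical helper: turn a link count into an estimated number
# of subdirectories (assumes the "plus two" convention above).
subdirs_from_links() {
    echo $(( $1 - 2 ))
}

# Example, run from the parent directory:
#   subdirs_from_links "$(link_count camera)"
```

A count far larger than expected would be a hint that the directory is much bigger than assumed.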

Next thing: You say that rm -rf, ls and du all hang. Do you know anything about the content of that directory? If you could get a listing of all things in that directory (not via using ls, which will hang, but for example from prior knowledge), you might be able to see whether the problem affects every entry in that directory, or only one.

When these commands hang, do you know what state they are in? Do they use CPU time? Can you interrupt them with Ctrl-C? Are there any console messages about hardware problems or ZFS internals at that time? Ideally, you could drill down further and see exactly what they are doing: either run ls under a debugger and see how far it gets, or use dtrace to see what system calls it makes, and which system call it hangs on.
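
One way to answer the state and CPU time questions without tying up the terminal is a small probe like the one below. It is only a sketch: the function name probe_hang is made up, and it assumes a ps that accepts -p and the -o keywords pid, state, wchan, time and command.

```sh
# Hypothetical helper: run a command in the background, give it a few
# seconds, and if it is still running show its state and stop it.
# Usage: probe_hang SECONDS COMMAND [ARGS...]
# Returns 0 if the command finished in time, 1 if it had to be killed.
probe_hang() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        # kill -0 sends no signal; it only tests whether the process exists.
        kill -0 "$pid" 2>/dev/null || { wait "$pid" 2>/dev/null || true; return 0; }
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        # STATE and WCHAN show whether it is running, sleeping, or waiting on I/O;
        # TIME shows how much CPU time it has used so far.
        ps -o pid,state,wchan,time,command -p "$pid" 2>/dev/null || true
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
        return 1
    fi
    wait "$pid" 2>/dev/null || true
    return 0
}

# Example: give ls ten seconds on the problem directory.
#   probe_hang 10 ls camera
```

This only shows what the process is waiting on, not why. For the system call view, running the command under truss(1) on FreeBSD (for example truss ls camera) is a lighter-weight alternative to a debugger or dtrace.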
 
Have you been using autofs to mount a camera on that directory?
Or could it be a directory auto-created via /etc/autofs/special_hosts, and so related to a hostname on your network?
It would show the same behaviour if the device or filesystem to be mounted is not available.
In that case all autofs-related services have to be stopped or disabled first.
 
OK, time to diagnose some more. Just knowing that you have some commands that don't work, but zpool scrub found no problems, will not help us give any advice for how to proceed.

You say you have a directory called "camera", which is stored somewhere in ZFS. First question: make sure that directory is actually what you think it is. Go to the parent of the directory and do an "ls -lF", which will show you whether that directory is in reality a link. It also shows the link count (second field), which for a directory is the number of subdirectories in "camera" plus two. Second step: do stat -x camera, which will show you a lot of interesting statistics about it. Anything unusual? Is it on the same device as its parent directory? How about permissions, ACLs, and such?

Actually the full directory structure looks like this: /mnt/Files/John/mnt/Media/Camera/LHouse. When I cd to Camera and do an ls -lF on LHouse, it hangs and doesn't provide any information. I am able to ^C out of that.

Here is the output of the stat command:

File: "LHouse"
Size: 20022870 FileType: Directory
Mode: (0777/drwxrwxrwx) Uid: ( 1000/ jwright) Gid: ( 1000/ admin)
Device: 103,3126263904 Inode: 73742 Links: 100
Access: Sat Jul 9 22:47:48 2016
Modify: Sun Apr 9 13:13:13 2017
Change: Sun Apr 9 13:13:13 2017

Next thing: You say that rm -rf, ls and du all hang. Do you know anything about the content of that directory? If you could get a listing of all things in that directory (not via using ls, which will hang, but for example from prior knowledge), you might be able to see whether the problem affects every entry in that directory, or only one.

The content is video from a security camera I used to have in my shop. I shut it down a while ago, but never removed the video clips from the server.

When these commands hang, do you know what state they are in? Do they use CPU time? Can you interrupt them with Ctrl-C? Are there any console messages about hardware problems or ZFS internals at that time? Ideally, you could drill down further and see exactly what they are doing: either run ls under a debugger and see how far it gets, or use dtrace to see what system calls it makes, and which system call it hangs on.

I inserted some answers after the appropriate para above:

When I run ls -al on LHouse and then top, here is the output, with ls on the last line:

Code:
  PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 5387 root            1  89    0 32256K  2760K RUN     1  12:04  58.98% rsync
 2747 root           13  20    0   229M 14944K nanslp  1 648:40  51.95% fuppesd
 5385 root            1  52    0 32256K  3028K select  0  10:46  51.37% rsync
 5425 jwright         1  22    0 16872K  3380K zio->i  1   0:02   3.37% ls

So it appears ls is running, but I'm not sure what the state zio->i means. I can kill the process with ^C, and there are no console messages about anything. I will research how to use a debugger and dtrace and try those.

Many thanks for your advice and assistance.
 
It seems that the directory is on the correct file system, and the stat output doesn't show anything extraordinary. The directory LHouse has 98 subdirectories (a link count of 100), which is not particularly high (if it had a million, that would have raised eyebrows, but on a modern system 100 is uninteresting).

The thing which is really odd: the ls command is doing something (it is using some CPU time, 3.37% in the example above), and it is waiting for I/O (that's what zio->i means). Either that subdirectory is gigantic and the ls just takes a horrendously long time (seems very unlikely; an ls cannot take hours in practice), or the I/Os are ludicrously slow, so the little bit of work the ls has to do takes very long (same argument: I/Os can't take longer than ~30s apiece without raising error messages on the console), or something else is wrong.

My only guess is a bug in ZFS; I bet that something is stuck in a loop. This would be a question for ZFS internals developers.
 