Other Problem with removing folders on NFS

Hi,

I'm not sure whether a problem with NFS should go here or to Networking?

Configuration:

NFS Server: Synology Disk Station

NFS Client: FreeBSD 12.2 or 13.0 (tested on both)

Last tested configuration in /etc/fstab:

Code:
ds_address:/volume1/test  /mnt/ds_test  nfs  rw,rsize=131072,wsize=131072,noatime,intr,tcp,actimeo=1800,vers=3  0  0

Problem Description:

When I try to delete a folder with a lot of files from Midnight Commander or from Moodle (using Apache with PHP), I get the error "Could not delete directory, it is not empty".

I tested different mount_nfs options and found a way to reproduce it: I copy the admin folder of a Moodle installation to the mounted NFS volume and then try to delete it from Midnight Commander. So far it always fails :(

I mounted the same share on Ubuntu and ran the same test - the folder was deleted successfully.

I have no idea what I can tune on the client side.

On the server side I tested synchronous and asynchronous mode. The former is much slower when writing files, but the deletion problem is the same.

Regards,
Marcin
 
Tested: rm -rf works properly, but that is not a solution; Moodle does not work that way. I mentioned mc because it was a way to test outside of Moodle.

With mc the procedure is as follows: I try to delete the directory, deletion starts, and at some point (with the admin folder mentioned earlier it is always the same point) a red window appears with Retry and Skip options. Retry does not work, so I have to choose Skip. Then I press F8 to delete the same folder again and the rest of the files are deleted.

Moodle behaves similarly: if I try to delete a directory with files from the web interface, it is eventually deleted after a few attempts.
 
You can probably hack the PHP code to add some delay before each rmdir(), or make it use system() instead; it kind of sucks, but it seems the easier path for now.
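
For the system() route it would be something like this (untested sketch; $dir stands for whatever directory path the surrounding PHP code already has):

PHP:
    // Untested sketch: shell out to rm instead of looping over unlink()/rmdir().
    // $dir is assumed to be the directory path the surrounding code already knows.
    system('rm -rf ' . escapeshellarg($dir), $status);
    if ($status !== 0) {
        // rm failed; fall back to the normal PHP deletion or report an error.
    }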
 
I suspect this is a known problem, and that the real solution probably lies in modifying mc and Moodle to handle the situation correctly.

The problem is this: if an application performs readdir() while the directory is being modified, it is not guaranteed to find all directory entries. So think about what happens when you want to delete a directory: the program has to read the directory and delete entries (which modifies the directory). Different file systems handle this more or less well, and most try pretty hard to find all directory entries for common usage patterns; it is theoretically impossible for a file system to handle all possible cases, though. The underlying problem is that the Unix/POSIX opendir/readdir API does not allow atomically deleting all entries from a directory and the directory itself, so file systems have to make compromises to make it work "close to correct" for common cases. My educated guess is that FreeBSD's NFS client (which is a file system in the sense of presenting a VFS interface and thereby a file system API) handles this particular case badly, but not incorrectly. Note that NFS clients have a particularly hard time implementing this well, as the real delete operations are performed on their server.

The best cure is for applications that want to use the opendir/readdir API to be aware of its limitations and work around them. One example is to not loop over directory entries, and instead repeatedly call opendir(), call readdir() only once, then closedir(), deleting one entry at a time. Another example is to first read the whole directory into memory without modifying it, and only then begin the deletes. Yet another possibility is to simply restart the opendir()/readdir()/unlink() loop if entries are left. I would contact the maintainers of mc and Moodle and ask them to look at how other programs (such as rm) handle it.
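
To make the last variant concrete, here is a rough sketch of restarting the loop until a full pass finds nothing left to delete; remove_dir_restart() is a hypothetical helper, not actual code from rm, mc, or Moodle:

PHP:
    // Rough sketch, not code from mc or Moodle: keep re-reading the directory
    // and deleting until a full pass finds no entries, then remove the directory.
    function remove_dir_restart(string $dir, int $max_passes = 10): bool {
        for ($pass = 0; $pass < $max_passes; $pass++) {
            if (!($handle = opendir($dir))) {
                return false;
            }
            $entries_left = false;
            while (false !== ($item = readdir($handle))) {
                if ($item === '.' || $item === '..') {
                    continue;
                }
                $entries_left = true;
                $path = $dir . '/' . $item;
                if (is_dir($path)) {
                    remove_dir_restart($path);
                } else {
                    @unlink($path);   // errors here are simply retried on the next pass
                }
            }
            closedir($handle);
            if (!$entries_left) {
                break;   // a full pass saw nothing but "." and "..", so stop restarting
            }
        }
        return rmdir($dir);
    }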
 
Thank you for your comprehensive answer.

If I understood correctly, this solution won't help:
(...) to add some delay before each rmdir (...)
?

I'll try to contact the Moodle maintainers and maybe they will be able to help.

But...

I read the PR linked by facedebouc and it looks like the problem has been known for 18 years and could be fixed by patching the NFS client in FreeBSD. Patching the NFS client would solve the problems with Moodle, with mc, and probably with some other apps, so in my opinion that is the best solution. Maybe someone will do it...
 
Actually, adding a delay MAY work. Here's why: NFS clients do heavy caching, and updates to/from the NFS server don't happen instantaneously (in particular in NFSv2 and NFSv3). For example, the NFS protocol can't handle the deletion of open files cleanly, which is why you sometimes find files named ".nfs12345....." (with a string of digits) in a directory: those are files that have been deleted (have no directory entry) but are still open, so NFS has to create a fake directory entry for them. So on NFS, if you delete all files from a directory (or what you thought were all the files), you may very well find that the directory isn't actually empty, and the rmdir on the directory won't work. Give NFS a few milliseconds or seconds to clean up, and the rmdir might work. This is a workaround for NFS being sluggish. A similar workaround is to try the rmdir() system call a few times after a failure (which is sort of the same as a delay).
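
In PHP terms that workaround could look roughly like this; rmdir_retry() is a made-up helper, not an existing Moodle or PHP function, and the attempt count and delay are arbitrary:

PHP:
    // Rough sketch of the retry/delay workaround; rmdir_retry() is a made-up name.
    function rmdir_retry(string $dir, int $attempts = 5, int $delay_us = 200000): bool {
        for ($i = 0; $i < $attempts; $i++) {
            if (@rmdir($dir)) {
                return true;
            }
            // Give the NFS client/server a moment to flush pending deletes
            // (e.g. the ".nfsXXXX" silly-renamed files) before trying again.
            usleep($delay_us);
        }
        return false;
    }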

The correct answer would be to use a better remote file protocol than NFSv2/3, but that is often not feasible.
 
Protocol-wise, NFSv4 is pretty much the only realistic option for free software in Unix land. Personally, I would actually prefer to run SMB/CIFS (even between two Unix machines), since NFS is such a disaster of a protocol (NFSv4 at least works correctly), but that requires setting up Samba (painful) and dealing with the Windows permission and ownership models (painful for all but the simplest cases). Also, lots of Unix zealots refuse to get near anything Windows-related because they claim it causes a skin rash (even though SMB/CIFS is much better designed, having learned from the lessons of NFS and having evolved).

If you have to run NFSv2/3, at least get a really good server; having the protocol implementation be carefully tuned on the server side makes the client's life easier. Sadly, the perfect answer is NetApp, which is too expensive for amateurs.

Technically, Ceph and Lustre are available as free software, but I think for amateurs and small clusters they are unrealistically complex. I don't know the state of Gluster these days.

Andrew (AFS) is daunting in both support and complexity. Coda never got past the research-prototype stage.

If you're willing to spend 5- and 6-digit amounts, there are many better options, including Isilon, GPFS, CXFS, and a lot of software that's internal to big companies and hyperscalers.
 
The problem is this: if an application performs readdir() while the directory is being modified, it is not guaranteed to find all directory entries. So think about what happens when you want to delete a directory: the program has to read the directory and delete entries (which modifies the directory).

The above describes the problem well, and the suggestion below is what I implemented in moodlelib.php:

Another example is to first read the whole directory into memory without modifying it, and only then begin the deletes.

And it seems to work properly :)

Actually, adding a delay MAY work.

I investigated some more after your previous answers, and a delay is not the solution. As you wrote before: the directory changes under the handle, and not all elements were listed in the delete loop.

I've changed:

PHP:
    if (!$handle = opendir($dir)) {
        return false;
    }
    $result = true;
    while (false!==($item = readdir($handle))) {
        if ($item != '.' && $item != '..') {
            if (is_dir($dir.'/'.$item)) {
                $result = remove_dir($dir.'/'.$item) && $result;
            } else {
                $result = unlink($dir.'/'.$item) && $result;
            }
        }
    }
    closedir($handle);

to:

PHP:
    if (!$handle = opendir($dir)) {
        return false;
    }
    $result = true;
    // First list all files in the directory ...
    $dirlist = [];
    while (false!==($item = readdir($handle))) {
        $dirlist[] = $item;
    }
    closedir($handle);
    // ... then delete all elements
    foreach ($dirlist as &$item) {
        if ($item != '.' && $item != '..') {
            if (is_dir($dir.'/'.$item)) {
                $result = remove_dir($dir.'/'.$item) && $result;
            } else {
                $result = unlink($dir.'/'.$item) && $result;
            }
        }
    }
    // Clear variables from memory...
    unset($item);
    unset($dirlist);
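
For what it's worth, the same idea can be exercised outside Moodle with a throwaway script along these lines (the path and file count are just examples based on the mount point from my fstab; scandir() reads the whole listing in one call, like the loop above):

PHP:
    // Throwaway test script, not Moodle code: create many files on the NFS mount
    // and delete them using a listing taken before any deletion starts.
    $dir = '/mnt/ds_test/readdir_test';   // example path under the mount from the fstab above

    mkdir($dir);
    for ($i = 0; $i < 5000; $i++) {
        file_put_contents("$dir/file_$i.txt", "x");
    }

    // scandir() returns the complete directory listing up front,
    // so nothing is modified while the directory is being read.
    foreach (array_diff(scandir($dir), ['.', '..']) as $item) {
        unlink("$dir/$item");
    }
    var_dump(rmdir($dir));   // expect bool(true) once the directory is really empty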
 