UFS Data loss due to vfs buffering (?)

I'm experiencing data loss on a UFS filesystem on an iSCSI disk when the iSCSI connection is terminated abruptly. I know the issue isn't that the data didn't have time to flush to disk; unmounting the filesystem after the copy always returns immediately.

Details:
FreeBSD 10.1-RELEASE
iscsictl(8) (i.e. the new iSCSI initiator)
single GPT partition on disk
UFS with soft-update journaling
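For reference, the setup looks roughly like this (the portal address, target name, and the da1 device are placeholders for my actual values):

  # attach the iSCSI target with the new initiator (names are placeholders)
  iscsictl -A -p 192.168.1.10 -t iqn.2014-01.com.example:target0

  # da1 stands in for whatever device the initiator attaches
  gpart create -s gpt da1
  gpart add -t freebsd-ufs da1

  # -j enables soft updates + journaling (SU+J)
  newfs -j /dev/da1p1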

I mount the fs, copy a 1G file (I've tried the source file on tmpfs and on a local SATA disk), wait ~10 seconds, then pull the Ethernet cable on the NIC connected to the iSCSI disk. I then reboot gracefully with shutdown -r now. After the system comes back up, an fsck is necessary; I answer y to all the questions. After mounting, I find either no evidence the file ever existed, a zero-length file, or a truncated file. Even calling sync before terminating the connection does not prevent the data loss.
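In other words, the test boils down to something like this (paths and device names are placeholders; the cable pull is done by hand):

  mount /dev/da1p1 /mnt

  # source file has been on tmpfs and on a local SATA disk; same result
  cp /tmp/testfile-1g /mnt/

  sleep 10
  # optionally: sync   (makes no difference in my tests)

  # --- physically pull the Ethernet cable to the iSCSI target here ---

  shutdown -r now

  # after the reboot:
  fsck -y /dev/da1p1
  mount /dev/da1p1 /mnt
  ls -l /mnt/testfile-1g   # missing, zero-length, or truncated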

The first indication that the problem had something to do with buffering was that, during the reboot, the buffer sync (i.e. Syncing disks, buffers remaining...) always showed something in the range of 20-50 buffers that needed syncing, all of which were eventually given up on.

I've found two workarounds:
  1. Set the sysctl variable vfs.lodirtybuffers to 1 (see the sketch after this list). With this setting, it takes 2-3 seconds for the sysctl variable vfs.numdirtybuffers to return to the level it was at before I started the copy. At that point I can pull the Ethernet cable, reboot (there are still a few buffers that don't get synced), fsck, etc. and the file is intact. I haven't seen any side-effects from this, but since the default value is 13110, I expect it's not exactly best practice.
  2. Use UFS without soft-updates or journaling and sync before terminating the iSCSI connection. The sync actually does what's expected in this case, and although fsck is still required, AFAICT it only needs to mark the fs clean.
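The first workaround, sketched out (paths are placeholders; the baseline value is whatever numdirtybuffers reads on your system before the copy):

  # make the syncer flush dirty buffers aggressively
  # (default here is 13110, so 1 is almost certainly not best practice)
  sysctl vfs.lodirtybuffers=1

  # note the baseline before the copy...
  sysctl vfs.numdirtybuffers

  cp /tmp/testfile-1g /mnt/

  # ...then poll until it drops back to roughly that baseline
  # (2-3 seconds here); Ctrl-C once it levels off
  while :; do sysctl vfs.numdirtybuffers; sleep 1; done

  # at this point pulling the cable and rebooting leaves the file intact,
  # apart from a few buffers that still never sync during shutdown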

Some initial testing with UFS and gjournal seems to point to it as a solution, but I don't know how well that would scale with the goal of the project I'm working on. It also seems a bit flaky. All I really need to do is find a good way to prevent those buffers from indefinitely holding just enough data to screw things up...
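For the record, the gjournal variant I've been testing looks roughly like this (again, da1p1 is a placeholder, and the journal is kept on the same provider):

  gjournal load                   # or geom_journal_load="YES" in loader.conf
  gjournal label da1p1
  newfs -J /dev/da1p1.journal     # -J marks the fs as gjournal-backed
  mount -o async /dev/da1p1.journal /mnt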
 
That's what I was thinking, but then I've never had cause to think about buffers before.

I'm also starting to wonder if there's a larger issue with buffer flushing. The goal I'm working towards is a highly-available CIFS NAS. With my lodirtybuffers workaround in place, it seemed like the socket buffers used by Samba were holding on to that last bit of data as well.
Edit: Nope, just impatience.
 