Solved FreeBSD 10 i386 data corruption problems in virtual machines (WAS: Do not use soft update ...)

If you installed a fresh 10.0-BETA1 you might want to turn off soft update journaling on UFS filesystems, including the root filesystem. I experienced two rather nasty crashes on my test system running under VirtualBox, and I didn't get the system stable until I turned off soft update journaling on the root filesystem and ran a full fsck(8) on it. This is how to do it:

Boot into single user mode by selecting 2) at the loader menu. Press enter when prompted for the single user shell. Then use these commands and nothing else once you are in the single user shell:

tunefs -j disable /dev/ada0p2
reboot

Replace ada0p2 with the partition that holds the root filesystem on your system if it is different from the one used here.

If you have to run fsck(8) on the root filesystem do it in single user mode when the filesystem is mounted read-only.
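
For anyone who wants the steps above in one place, here is a rough sketch rather than a tested procedure. It assumes you are already in the single user shell with the filesystem mounted read-only. The names run, disable_suj and DRY_RUN are made up for this example; setting DRY_RUN only prints the commands so you can check the device name before touching anything.

```shell
#!/bin/sh
# Sketch only: disable soft update journaling on one UFS partition,
# then force a full check. run, disable_suj and DRY_RUN are hypothetical
# names used for this example, not part of FreeBSD.

run() {
    # Print the command instead of executing it when DRY_RUN is set.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_suj() {
    dev="$1"
    run tunefs -p "$dev" || return 1           # show the current flags first
    run tunefs -j disable "$dev" || return 1   # turn off soft update journaling
    run fsck -f -y "$dev"                      # full check now that the journal is off
}

# Example (prints the three commands without running them):
#   DRY_RUN=1; disable_suj /dev/ada0p2
```

The order matters in this sketch: with the journal flag cleared first, fsck(8) has no journal to replay and has to walk the whole filesystem.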
 
Have you reported this to the current mailing list? If not and nobody knows about it then there is a good chance whatever bug this is makes its way into the release which would be bad. Developers are far more likely to read the mailing list than this forum. In fact you should also raise a PR for it.

I use SU journalling on 9.2 and will be upgrading to 10 after it's released and so this concerns me if it is a general problem that goes unfixed.
 
I have not reported it yet. The initial crash occurred during an svn update operation while the host OS was running other CPU intensive tasks, so I cannot really rule out a VirtualBox problem completely at this stage. I'd need to be able to provide a repeatable test case before I file a PR. There have been similar reports on FreeBSD 9 about SU+J instability, and there's the known problem of UFS snapshots hanging on filesystems with SU+J set that probably still exists on 10.0-BETA1.
 
A while back I presumed frequent core dumps (v9 newly upgraded) to be the result of the debugging stuff still present in one's GENERIC kernel, and (though not virtualized) they largely went away with a custom one (which included removing the sound drivers as well as the debugging parts).
 
I think the majority of SU+J problems were fixed in 9.2. I usually run -STABLE and read the SVN commit logs for certain key things that make me sit up and take notice if I see commits to them. SU+J being one of these. I remember there were a lot of related bug fixes made a few months ago. So it's likely that 9.1 was buggy whereas 9.2 isn't. I'm not sure what the status is with snapshots though as I don't use those. I was just interested in keeping up with the bug fixes to hopefully keep my data intact. And to be fair I've not seen a single crash.

I guess I'll just have to try 10 with SU+J still enabled. If it crashes at all I'll switch it off and report it myself.
 
Well, the test system on VirtualBox is still quite unstable even with softupdate journaling turned off but my firewall running the exact same version of 10.0-BETA1 that is on real hardware is rock solid. I think I'll have to put the blame here on VirtualBox...
 
I had FreeBSD 10 BETA 2 installed on my MacBook and I also ran into problems: VFS errors at first, and soon there were no usable filesystems at all. First it restarted my computer at random times, and I thought it was Xorg messing with me. Then I saw in dmesg that there were lots of seek errors (and it was not my hard disk, these were VFS errors). The configuration was the same: UFS and soft updates journaling.
 
Did you do any debugging? Did you try fsck with the journal still enabled to see what happened, then disable SU+J, fsck it again and see what happened? Both you and the OP mention SU+J but don't offer any evidence that it is actually the cause, and in the case of the OP it wasn't.
 
Shame. I'm currently running 9.2-STABLE and am itching to upgrade it to 10. I would have probably done so during the ALPHA or BETA releases if it wasn't for me reading posts in this thread. At least with the original post being proven to be something else I was about to go for it this weekend, but now you've put me off again. I'll probably end up waiting for the official release unless it can be proven one way or the other. I suppose the answer is to just try it out myself, but it would be a huge inconvenience to have much downtime (or data corruption) on my box so I'm hopeful to not introduce something on it which could cause problems.
 
For me the beta of FreeBSD 10 has been rock solid on real hardware. If you are forced to use a virtual machine for testing, wait until this issue looks solved.
 
Folks,

I am hearing on IRC that some other people did not experience crashes on UFS+SU+J. So it may have happened to me because I adjusted the fragment, block and inode sizes. I had used 4K for fragments, 8K for inodes and 16K for blocks. So whenever people experience crashes on UFS+SU+J on 10-BETA, be sure to say whether you have adjusted these sizes, so that developers do not waste their time trying to reproduce this with default settings.
 
FYI. I took the plunge and upgraded my server to 10-BETA3. It's rock solid and hasn't caused any issues related to the filesystem or SU+J so this looks OK to me. I'm using whatever the standards are for the block, inode sizes etc. I just used GPT partitioning, aligned all my partitions to 4K boundaries, and newfs'd it with default settings except for turning on SU+J.

Upgrading to 10 has been a bit mental though. Replaced BIND with dns/unbound and dns/nsd. Had to uninstall converters/libiconv because it's now in the base system. Had to convert to pkgng. However I've had all sorts of hassles trying to recompile all ports, dependencies seem messed up and so I had to delete all ports in the end and reinstall them all from scratch. Finally got it back to how it was before except I can't compile www/squid33 with either clang or gcc46 which is annoying.
 
xtaz said:
FYI. I took the plunge and upgraded my server to 10-BETA3. It's rock solid and hasn't caused any issues related to the filesystem or SU+J so this looks OK to me. I'm using whatever the standards are for the block, inode sizes etc. I just used GPT partitioning, aligned all my partitions to 4K boundaries, and newfs'd it with default settings except for turning on SU+J.

That supports my point. You are one of those not experiencing problems, and you did use the default settings, so maybe we are on the right path here.
 
I'll chime in since I experienced the same issue with 9.2-p1 in a Hyper-V VM so the issue is not limited to VirtualBox or 10.x.
My problem was fixed using the advice provided in @kpa's bug report.

Thanks again! :)
 
It turned out to be a problem with so-called unmapped I/O. Setting this loader.conf(5) variable should work around the issue for now when running FreeBSD 9.2 or 10.0-BETA on a virtual machine:

Code:
vfs.unmapped_buf_allowed=0

Reference:

http://lists.freebsd.org/pipermail/freebsd-current/2013-November/046154.html

I'm not sure if this is needed when the OS is running on real hardware. So far I haven't seen any instability on any of my systems running on real hardware.
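
If you would rather script the change than edit the file by hand, something along these lines might do. It is a sketch: set_loader_tunable is a hypothetical helper name, not a FreeBSD tool. It adds the line if it is missing and rewrites it if it is already there, so running it twice should not leave duplicates.

```shell
#!/bin/sh
# Sketch only: add or update one NAME=VALUE line in a loader.conf-style file.
# set_loader_tunable is a hypothetical helper name used for this example.

set_loader_tunable() {
    file="$1"; name="$2"; value="$3"
    tmp="${file}.tmp.$$"
    touch "$file" || return 1
    # Rewrite an existing "name=" line, or append one if none was found.
    # index() does a literal match, so the dots in the name are not
    # treated as regular expression wildcards.
    awk -v n="$name" -v v="$value" '
        index($0, n "=") == 1 { print n "=" v; found = 1; next }
        { print }
        END { if (!found) print n "=" v }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back instead of mv so the original file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# On a real system (as root), then reboot so the loader picks it up:
#   set_loader_tunable /boot/loader.conf vfs.unmapped_buf_allowed 0
# After the reboot the value can be checked with:
#   sysctl vfs.unmapped_buf_allowed
```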
 
Yes, but I do not know why you are all talking about VMs. For me it occurred on real hardware on 10-BETA2, while 9.2-RELEASE works solidly. So there have to be more differences than that. I can set that variable though, maybe it will make me feel more confident.
 
@gustopn, I'm talking about VMs because that is the only environment where I have witnessed the problem so far. Are you on i386 or AMD64? So far the problem on virtual machines has been limited to i386 guests, based on what I have seen. If you're on AMD64 and on real hardware, your problem may be something entirely different.
 
FWIW, in addition I have a 10.0-BETA3 Hyper-V VM and a 9.2-STABLE VirtualBox VM. Neither seems to have the issue; only the 9.2-p1 VM on Hyper-V does.
 
Re: Do not use soft update journaling on UFS filesystems on

It made it into 10-RELEASE. It has the same problem. GPT. No ZFS. No VM. Repeated issues.

When I get X back up I will upload:
  • loader.conf
  • rc.conf
  • make.conf
  • fstab
  • etc.

AMD FX 8350
16 GB 1866 DDR3
Nvidia [modern but let me get back to you]
Samsung 1 TB SATA III
10-RELEASE

RaspberryPi B longing for FreeBSD :)
 
tunefs -j disable /dev/ada0p2

It's actually tunefs -J disable /dev/devicename.

I didn't have softupdates enabled so I'm guessing it's the unmapped I/O problem.

Does anyone know if vfs.unmapped_buf_allowed has a fix in 8.x? I am seeing this same issue and am reluctant to go to 10. There is really no need unless this remains a problem. I can't change VT-x/AMD-V acceleration as this is a 64 bit guest.
 
If you installed a fresh 10.0-BETA 1 you might want to turn off the soft update journaling on UFS filesystem including the root filesystem. I experienced two rather nasty crashes on my test system running under VirtualBox
I stopped reading your post and got really mad at this spot. Why in the world are you giving recommendations to people not to use some feature when you don't even run FreeBSD on physical hardware? How do you know that it is not a VirtualBox bug?
 
I stopped reading your post and got really mad at this spot. Why in the world are you giving recommendations to people not to use some feature when you don't even run FreeBSD on physical hardware? How do you know that it is not a VirtualBox bug?

Please read the whole thread through before making comments.
 