Stability issues with heavy I/O load [-CURRENT, VirtualBox, poudriere bulk]

zirias@

Developer
I'm placing this in Off-Topic for two reasons:
  1. OS version is FreeBSD 11.0-CURRENT
  2. Environment is VirtualBox on a Linux host, so maybe the problem is not FreeBSD after all
I'm experimenting with a virtual machine as a poudriere build host. To be able to test my ports on -CURRENT, too, the whole machine is installed from the svn -CURRENT branch. In the long run, I want to do this on a new server running FreeBSD, but for now, just to get a grip, a VirtualBox VM will have to do. I don't use ZFS at the moment, so poudriere does a lot of copying. The backing storage for the virtual disk is a .vdi disk image located on an LVM volume with dm-crypt below it on a normal HDD, so disk operations are rather slow.

After 30 minutes to a few hours of poudriere bulk building (no matter whether I use just one job or the default of 4 for my CPU), I get kernel panics starting with ahcich0: Timeout on slot X. Right now, I'm testing with Message Signaled Interrupts disabled, but that doesn't solve the problem. What could I look into?
[Attachment: panic.png]
 
And this explains why it's posted here in Off-Topic, see above. This kind of crash isn't unique to -CURRENT, though (there seem to be many possible causes), and there's a remark on poudriere's TODO list about the risk of panics, so maybe someone knows a tip for what to try to get this stable. After all, poudriere building on -CURRENT shouldn't be that uncommon if you want to test your ports. But maybe the attempt to do this in VirtualBox is flawed ....
 
All communication about -CURRENT should take place on the freebsd-current mailing list (join), not on the forums. There are very few developers on the forums, and the amount of 'regular users' routinely running -CURRENT who are willing and able to lend support is likely in the single digits. If you want support on these forums, run either a supported version of the -RELEASE branch (for proven, stable, solid installations) or of the -STABLE branch (a slightly more experimental, but still very stable version that incorporates some of the newer developments of the -CURRENT branch).
 
As SirDice stated/quoted, there is zero support for FreeBSD CURRENT in any capacity anywhere here on the forums.

Having said that, I've run into this also on 10-STABLE in the past when trying to build ports with Poudriere on VirtualBox. I know this doesn't help, but I gave up trying to troubleshoot it and only compile from source in any decent capacity on a bare metal host nowadays. Much less of a headache.
 
As SirDice stated/quoted, there is zero support for FreeBSD CURRENT in any capacity anywhere here on the forums.
I'm even subscribed to freebsd-current but have no intention of creating noise there for a problem that's probably not related to -CURRENT at all, because:

Having said that, I've run into this also on 10-STABLE in the past when trying to build ports with Poudriere on VirtualBox. I know this doesn't help, but I gave up trying to troubleshoot it and only compile from source in any decent capacity on a bare metal host nowadays. Much less of a headache.

This is more or less what I suspected. To shed some more light on this, I'm trying to reproduce it right now with a 10.3 VM configured the same way; we will see. The plan is of course to use 10.3 for my new FreeBSD server, but what about testing my own ports? I can't test with -CURRENT on a -RELEASE host, AFAIK...

edit: and now the same thing happened with 10.3-RELEASE as well. I really suspect it's somehow the congested I/O causing that panic. Does anyone here use a VirtualBox VM for poudriere building successfully? Or maybe bhyve with a zvol?
 
I guess I found the "magic knob" ... my poudriere in VirtualBox has now been building for 8 hours 30 minutes without a crash (tested again with -CURRENT, but I'm pretty sure that's irrelevant). I changed a simple setting in VirtualBox: "use host I/O cache" is now disabled. So far it looks like THIS did the trick.
 
What controller was selected in VirtualBox? If I recall correctly, by default it uses an emulated IDE controller. I tend to remove it and use the AHCI/SATA controller. Perhaps this makes a difference too?
 
Yep, I used AHCI, but tried with IDE too and the result was the same. Of course I tried virtually everything else before unchecking this #+*! checkbox ;). Now it's definitive: poudriere has been running for nearly 23 hours without the crash. So if anyone else experiences a related problem, try disabling the host I/O cache :)
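
For anyone who prefers the command line to the checkbox: the same setting can, as far as I know, be changed on the host with VBoxManage storagectl and its --hostiocache option. Below is a minimal Python sketch that only builds (and optionally runs) that command. The VM name "poudriere-vm" and the controller name "SATA" are made-up placeholders, not the values from my setup, and I'm assuming the VM should be powered off before changing it.

```python
import subprocess


def hostiocache_command(vm_name, controller_name, enabled=False):
    """Build the VBoxManage argv that switches the host I/O cache
    on or off for one storage controller of a VM."""
    return [
        "VBoxManage", "storagectl", vm_name,
        "--name", controller_name,
        "--hostiocache", "on" if enabled else "off",
    ]


def apply(argv):
    """Run the command on the VirtualBox host; raises if VBoxManage fails."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical names; replace with your own VM and controller name.
    cmd = hostiocache_command("poudriere-vm", "SATA")
    print(" ".join(cmd))
    # apply(cmd)  # uncomment to actually change the setting
```

The controller name has to match what the VM really uses; VBoxManage showvminfo should list it.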
 