FreeBSD as KVM guest drops network connection vtnet once a day

I have worked with FreeBSD since 5.x, mostly on bare metal, but also with virtualized FreeBSD guests (on my own hardware). I have managed to fix almost every problem so far, but this one is driving me mad.

I now have a FreeBSD 12.1-p3 guest with two jails. The host is with an external VPS company and is UNIX/KVM-based, using VirtIO.

The jails run Apache 2.4/PHP-FPM/MySQL. Since December 2019, once a day but at no fixed time of day, the network loses its connection to one of the jails (but not both). I monitor the jails via the hosting company on the ssh, 80 and 443 ports every 60 seconds. The network drop never exceeds 120 seconds.
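
Since the external monitor only polls every 60 seconds, a finer-grained probe run from a second machine might narrow down how long a drop really lasts. A minimal sketch, assuming nc(1) is available; the address 192.0.2.10, the port and the 5-second interval are placeholders, and the function names are made up for illustration:

```shell
#!/bin/sh
# Sketch: probe one TCP port and log only state changes, with timestamps.
# Host, port and interval are hypothetical placeholders.

# probe HOST PORT -> prints "up" if a TCP connect succeeds within 2 s, else "down"
probe() {
    if nc -z -w 2 "$1" "$2" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# transition PREV CUR STAMP -> prints a log line only when the state changed
transition() {
    if [ "$1" != "$2" ]; then
        echo "$3 $1->$2"
    fi
}

# watch_port HOST PORT INTERVAL -> loops forever, logging transitions
watch_port() {
    prev=unknown
    while :; do
        cur=$(probe "$1" "$2")
        transition "$prev" "$cur" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        prev=$cur
        sleep "$3"
    done
}

# Example (not run automatically): watch_port 192.0.2.10 80 5
```

Logging only the transitions keeps the output short, so the start and end of each drop can be compared directly with the provider's own logs.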

I also monitor the FreeBSD host itself on its ssh port; that one never goes away.

I have absolutely nothing in the logs (messages, php, etc.) that indicates a problem. The hosting company also can't find anything.

Code:
rc.conf:

ifconfig_vtnet0="inet 37.97.a.x netmask 255.255.255.0 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso"
ifconfig_vtnet0_alias0="inet 37.97.a.y netmask 255.255.255.255"
ifconfig_vtnet0_alias1="inet 37.97.b.z netmask 255.255.255.255"
defaultrouter="37.97.a.1"

(I also tried it without the -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso flags)

sysctl:

net.inet.tcp.tso=0

Over the last six weeks I think I have tried all the usual suspects, but now I have absolutely no idea where to investigate further, or whether it is a problem in the routing or something in the provider's virtualization platform.

I know jails inside a virtualized system may be a little exotic, but in that case I would like to understand better why exactly that is a problem, because I always understood that a jail is not a completely separate virtualization layer.

Any hints would be much appreciated...
 
To be honest, I have switched from using virtio to other network drivers like e1000. I have had so many obscure problems with virtio-based VMs that for me the performance gain is not worth the time spent debugging network problems. With FreeBSD, OpenBSD and Linux as guests on KVM, bhyve, Xen and various proprietary hypervisors I have had many troubles, so my advice: stay away from virtio and virtio-based network drivers.
 
Yep. The very first time, I noticed it personally in real time. I couldn't connect to the webserver. I tried to SSH to the jail: no connection. I connected via SSH to the jailhost, and that worked. After a minute, I could also connect to the jail. That's the moment I created the monitors and noticed it happened 'regularly'.

Last week, I was working on the jailhost via SSH. Suddenly, I got a warning from the monitor that one of the jails was unreachable. My session on the jailhost simply kept on functioning. First, I connected with my browser to the jail on port 80: no connection. Then I did a jexec into the jail and got in; meanwhile, the monitor signalled that everything was OK again.

So, every time it happens, I can personally see (if I'm fast enough) that the jail is indeed unreachable, I can always connect to the jailhost, and afterwards there is nothing in the logs on the jails or the jailhost.

It's like the route to the jails disappears for 60 to 120 seconds (if that makes sense).
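
One way to test that idea would be to dump the ARP and routing tables on the jailhost while a jail is unreachable and compare them with a dump taken when everything works. A rough sketch, assuming arp(8) and netstat(1) are on the PATH; the function name, the directory /var/tmp and the label are placeholders:

```shell
#!/bin/sh
# Sketch: write ARP and routing state to a timestamped file for later comparison.
# Output directory and label are hypothetical placeholders.

# snapshot DIR LABEL -> writes DIR/LABEL-<epoch>.txt and prints its path
snapshot() {
    out="$1/$2-$(date +%s).txt"
    {
        echo "== arp -an =="
        arp -an 2>&1 || echo "(arp failed)"
        echo "== netstat -rn =="
        netstat -rn 2>&1 || echo "(netstat failed)"
    } > "$out"
    echo "$out"
}

# Example (not run automatically): snapshot /var/tmp jail-down
```

Running diff(1) on a "down" and an "up" snapshot would show whether anything on the guest side changes at all; if nothing does, that points further towards the provider's network.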

I know it may sound like an ambiguous problem, but the server is hosting documents for some healthcare organizations regarding COVID. If I can't find the solution, I want to move the documents to another server, but in the meantime I would like the documents to be available 24/7 ...
 
To be honest, I have switched from using virtio to other network drivers like e1000. I have had so many obscure problems with virtio-based VMs that for me the performance gain is not worth the time spent debugging network problems. With FreeBSD, OpenBSD and Linux as guests on KVM, bhyve, Xen and various proprietary hypervisors I have had many troubles, so my advice: stay away from virtio and virtio-based network drivers.

Yup, I have heard a lot about virtio problems over the years; however, until now I haven't really had any (big) problems (nothing beats BSD on bare metal, though).
However, this was my first jails-inside-virtualization 'experiment', and I think I pushed my luck a little too hard.
 
Update: on the 27th of March I asked my provider whether they could change the network driver from virtio to e1000. Normally they don't do that, but they made an exception. I have been running with the e1000 driver for three weeks now. The first week, I didn't have a single dropout. Then it returned, and the rate of the '1-minute dropouts' went up again. I stopped the VM before the weekend, but that didn't make a difference. All in all, it's just very strange.
 
Are you sure that they use KVM? This sounds more like the problems I have encountered with VMware ESX hypervisors...
 
Are you sure that they use KVM? This sounds more like the problems I have encountered with VMware ESX hypervisors...

They have 'their own VPS-platform' using KVM for the virtualization, but I think the whole infrastructure is (based on) OpenStack.
 