bhyve VM hangs/Cannot find the problem

Hello,

I currently have a Thinkpad T430 that I use as a tiny server in my office. On that machine I run FreeBSD 15 and have a bhyve VM that also runs FreeBSD 15. This VM is responsible for establishing a connection via tailscale VPN to one of my friend's networks (and then forwarding some traffic).

The VM:
I use the following configuration for the virtual machine:
Code:
cpu=1
memory=768M
network0_type="virtio-net"
network0_switch="public"
disk0_type="virtio-blk"
disk0_name="disk0.img"
uuid="..."
network0_mac="..."

That is basically just the template from vm-bhyve; I only upped the memory a bit. The only things running on the VM are tailscale and an SSH server that I use for port forwarding.

The problem:
Every so often the virtual machine freezes up completely and refuses to come back. It appears that this happens more frequently in the morning hours, but I cannot really tell. The main problem is that I don't have any information about what goes wrong. I checked /var/log/messages on the host computer and on the VM (after killing and rebooting it) but found nothing. Furthermore, the vm-bhyve log (vm-bhyve.log) does not show anything. In my latest attempt I kept printing logs to the VM console via tail -F /var/log/messages, but the VM still froze and did not show me when or why. During all of that the host computer keeps running normally.

Does anyone know a way I could get any debugging information or maybe encountered this problem before and knows how to solve it?
It is super strange behavior because there is barely anything running on the host and the VM.
 
What filesystem is the host using? And what filesystem does the guest use?

Every so often the virtual machine starts to completely freeze up and refuses to come back.
How can you tell? Does it stop responding to the network, does it still ping? What about the console ( vm console ...)?
 
I use UFS inside the virtual machine and ZFS on the host.

When the VM freezes, all existing connections time out, I cannot create new ones or ping it. And regarding vm console I get no response whatsoever. The console output is just as frozen. I've tried existing "connections" via vm console and creating new ones, both are without any reaction from the VM.

The VM also does not react to ACPI shutdown signals from bhyve (vm stop) and I have to use the forceful 'poweroff' command.
 
When the VM freezes, all existing connections time out, I cannot create new ones or ping it.
That's pretty definitively dead. If there are I/O issues, for example, you'll notice that pings still work but you're unable to log in with ssh(8). Parts that don't require disk access, like the TCP/IP stack itself, would continue to work.

Anything visible with top(1) on the host when that happens? What's the state the bhyve(8) process is in?
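
For reference, a small sketch of how the state of each bhyve process could be pulled out of ps aux output on the host. It assumes the column layout that FreeBSD's ps aux prints and a process title of the form "bhyve: <name> (bhyve)"; the function name bhyve_states is made up for illustration.

```shell
# Summarise bhyve guests from `ps aux`-style input on stdin.
# Assumes FreeBSD's column order (USER PID %CPU %MEM VSZ RSS TT STAT STARTED
# TIME COMMAND) and a process title of the form "bhyve: <name> (bhyve)".
# bhyve_states is a hypothetical helper name, not part of any tool.
bhyve_states() {
    awk '$11 == "bhyve:" { printf "%s pid=%s stat=%s cpu=%s\n", $12, $2, $8, $3 }'
}
```

On the host this would be run as ps aux | bhyve_states. For a closer look at a single guest, procstat -kk <pid> prints the kernel stack of every thread in the process, and top -H -p <pid> shows per-thread CPU use.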
 
The VM froze again, which is really nice because that makes it somewhat reproducible, albeit with a ~7h delay every time.
ps aux | head gave me the following process information:
Code:
[root@hostpc ~]# ps aux | head
USER      PID  %CPU %MEM    VSZ    RSS TT  STAT STARTED        TIME COMMAND
root       11 276.3  0.0      0     64  -  RNL  Sun15   10042:22.36 [idle]
root    52581  99.1  5.6 833176 686148  3  SC   15:46     567:45.95 bhyve: enricogateway (bhyve)

I don't really understand that. Is the VM process just going to sleep every few seconds? The state also did not change within the last 2h (while the VM was frozen).
 
I don't really understand that. Is the VM process just going to sleep every few seconds?
Looks normal. That's to say, it has that state on my VM host too, and none of my VMs are hanging (they're all just happily chugging along).

Code:
root   62436   13.2  3.9  8481452 3972216  -  SC   Fri20      1041:21.70 bhyve: sdgame01 (bhyve)
root    5994    3.4  2.3  4245276 2275844  -  SC   18Dec25    2834:11.86 bhyve: riviera (bhyve)
root    3598    1.6  0.9 16837660  908024 v0- SC   18Dec25    3293:33.91 bhyve: lady3jane (bhyve)
root    4434    1.3  0.9  4245276  860772  -  SC   18Dec25    1782:27.00 bhyve: tessierashpool (bhyve)
root    5007    0.1  0.9  4245276  895784  -  SC   18Dec25     333:37.98 bhyve: errol (bhyve)
root   70486    0.1  2.0  4254748 2029964  -  SC   Sat19        78:25.32 bhyve: wintermute (bhyve)
root    6041    0.0  1.6  4253724 1580176  -  SC   18Dec25     664:28.20 bhyve: gl-runner-1 (bhyve)
root   70603    0.0  3.2  4253724 3216940  -  SC   Sat19        63:48.60 bhyve: case (bhyve)
root    4480    0.0  3.1  4253724 3088080  -  SC   18Dec25     855:24.54 bhyve: jenkins (bhyve)
root   82397    0.0  0.8  4245404  817764  -  SC   27Dec25     210:53.43 bhyve: fbsd-test (bhyve)
root    2900    0.0  0.5  2181020  548396 v0- SC   18Dec25     189:53.30 bhyve: kdc (bhyve)
 
I see, but then the VM should also be running. One thing that seems somewhat weird is that the CPU usage of the frozen VM is still at 99%. htop shows the same. During normal operation it is usually quite low, since the only job of the VM is to open a tailscale connection and then perform SSH forwarding. Could it be that the guest kernel is stuck in some infinite loop? And is there any way to debug this? I have done kernel debugging for Linux (on Linux) using qemu and gdb, but I have never attempted such a thing on FreeBSD/bhyve.

Thank you so much for your time ^^
 
Without logs, it's difficult to impossible to help you.

Since it could be almost any problem (I'm thinking of a bug in FreeBSD 15.0 that manifests only in the VM and the way you use it), all that remains is trial and error.

Examples: try adding the options -H and -P via bhyve_options. Change the emulated disk type, to nvme for instance (the guest may not boot, so check fstab and the like beforehand). Change the network interface type to e1000 (the guest will boot, but you will likely lose network connectivity unless you at least update rc.conf). And anything else you can think of.
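
A rough sketch of what those experiments might look like. The config lines in the comments use the same vm-bhyve keys as the config posted above; the guest-side device names (vtbdN becoming ndaN, vtnetN becoming emN) are assumptions that should be checked against what the guest actually reports, and the two helper function names are invented for illustration.

```shell
# Candidate changes for the guest's vm-bhyve config, to be tried one at a time:
#   bhyve_options="-H -P"     # extra flags passed to bhyve(8)
#   disk0_type="nvme"         # instead of virtio-blk
#   network0_type="e1000"     # instead of virtio-net
#
# Inside the guest the device names change with the emulation. The names used
# below are assumptions (vtbdN -> ndaN for NVMe, vtnetN -> emN for e1000).

# Rewrite virtio disk names in fstab-style text read from stdin.
fstab_virtio_to_nvme() {
    sed -E 's#/dev/vtbd([0-9]+)#/dev/nda\1#g'
}

# Rewrite virtio NIC names in rc.conf-style text read from stdin.
rcconf_virtio_to_e1000() {
    sed -E 's/^ifconfig_vtnet([0-9]+)/ifconfig_em\1/'
}
```

Inside the guest, something like fstab_virtio_to_nvme < /etc/fstab > /etc/fstab.nvme produces a candidate file that can be reviewed before it replaces the original. If fstab refers to GPT or UFS labels rather than device names, no rewrite should be needed.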

Save your disk file before any test.
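
One simple way to do that, sketched as a shell helper. The function name backup_image is hypothetical and the image path depends on where your VM directory lives; the guest should be stopped first so the copy is consistent.

```shell
# Copy a disk image to a timestamped backup next to it and print the new path.
# Stop the guest first so the image is not being written to during the copy.
# backup_image is a hypothetical helper name; pass the path to your own image.
backup_image() {
    src=$1
    dst="${src}.bak-$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dst" && printf '%s\n' "$dst"
}
```

For example: vm stop enricogateway, then backup_image /path/to/enricogateway/disk0.img (the path is a placeholder). Since the host is on ZFS, a snapshot of the dataset holding the image would be an alternative.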
 
For what it's worth, my VM host runs 15-STABLE (stable/15-n281412-26365bf2516f), all VMs (except one) are running 15.0-RELEASE. The one exception is Debian 12.
 
Without logs, it's difficult to impossible to help you.
Yes, I know. And this is part of the reason why I made this post. Usually, figuring stuff out myself is not that hard and can be done with some reading and tinkering. But the issue here is that there are literally no logs. Nothing is written anywhere regarding this problem.
 
You seem to be using vm-bhyve? Have you tried adding
Code:
debug="yes"
to your config file? An additional log file will then be created that might contain more info.
 
If all else fails you can attach a debugger.

Just follow the guide for kernel debugging a guest kernel in bhyve. If you use the -G port (as opposed to a serial port) you actually debug the virtual machine. You should be able to Control-C inside the hang and get a backtrace.

(not actually tried that)
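
An equally untested sketch of how that could be scripted. It assumes the guest was started with the debug stub enabled (with vm-bhyve, for instance via bhyve_options="-G 1234"; the port number is arbitrary), that the stub listens on localhost, and that a gdb with remote support is installed on the host. The function name gdb_hang_script is made up.

```shell
# Print a gdb command script that attaches to bhyve's debug stub, dumps
# backtraces for all vCPUs and detaches again.
# Assumptions: the stub was enabled with -G <port> when the guest started and
# is reachable on localhost; 1234 is only a default chosen for this sketch.
gdb_hang_script() {
    port=${1:-1234}
    printf 'target remote localhost:%s\n' "$port"
    printf 'info threads\n'
    printf 'thread apply all bt\n'
    printf 'detach\n'
}
```

Once the guest hangs, gdb_hang_script 1234 > /tmp/hang.gdb followed by gdb -batch -x /tmp/hang.gdb <guest kernel with debug symbols> should print where each vCPU is stuck. The kernel file has to match the kernel the guest is running, otherwise the backtraces will not resolve to meaningful symbols.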
 