bhyve: Booted Debian and FreeBSD 15 using QEMU accelerated with bhyve/vmm for the first time. This is an epic milestone for the community!

Hello everyone.

After three full days of work, we have booted Debian and FreeBSD with QEMU accelerated by bhyve/vmm for the first time. This is an epic milestone.


Istantanea_2026-05-17_21-47-27.jpg



Istantanea_2026-05-17_21-48-50.jpg



Further development is needed, a lot of it, but this is still a historic moment: we can now use another hypervisor, this time in cooperation with the long-established and mature QEMU. FreeBSD is second to none.

This success was possible thanks to the expertise of @Abhinav Chavali, who started this project for GSoC 2025. Thanks, bro.

If anyone wants to help with development, let me know. We have shared the code on GitHub so that the work can continue.
 
Is this the same as https://github.com/dumrich/qemu or built on top of it or something else?

It's built on top of dumrich's work. Specifically:

- QEMU side: We started from https://github.com/dumrich/qemu branch accel-vmm (his GSoC 2025 code)
and applied 8 patches on top — restructured the meson build, rewrote bhyve-all.c with proper VMX
segment descriptor conversion, added MMIO userspace fallback, fixed i8259/IOAPIC interrupt
delivery, etc.

- Kernel side: We started from dumrich's FreeBSD 16.0-CURRENT fork (with the vmm.ko QEMU support)
and applied 4 kernel patches — IOAPIC MMIO routed to userspace (so QEMU's own IOAPIC model handles
it), HLT returns to userspace, and debug printfs in vmx_inject_interrupts/vlapic_pending_intr
that accidentally fixed a timer race condition.

The original dumrich code could enter VMX and run SeaBIOS but would hang or crash before booting a
real OS. Our patches fix the critical bugs (NULL deref, ENAMETOOLONG, segment descriptor sync,
interrupt delivery) that blocked a full guest boot. With all patches applied, Debian 13 boots to a
login shell in ~3 seconds with -accel bhyve.
 
What we have achieved between yesterday and today:

1. SMP up to 8 CPUs — previously only 1 CPU worked, now the Debian VM runs with 2, 4 or 8 processors
2. Fast boot — previously it took 30+ seconds per systemd service line, now the full boot takes ~4 seconds
3. Working interactive login — previously the VM reached the login prompt but you couldn't type anything.
Now you can log in, use the shell, run apt update, etc.
4. Keyboard input fix — discovered that glib (the library QEMU uses to read input) stops working with the
bhyve accelerator. Created a workaround that reads directly from the keyboard every 5ms
5. stdin fix with sudo: discovered that piping a password into sudo (echo password | sudo ...) leaves QEMU with a dead stdin. Fixed with exec 0</dev/tty in the start script
6. Proper multi-CPU handling — implemented the INIT-SIPI-SIPI protocol that the BIOS uses to bring up
additional processors
7. Working networking — the VM can access the internet, run apt update
8. Improved start script — automatic cleanup of zombie VMs, optimized parameters
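
Items 5 and 8 above can be sketched as a start-script preamble. This is a minimal sketch, not the project's actual script: the function names, the VMM_DIR variable and the "qemu-" name prefix used in the usage note are hypothetical, and it assumes that VMs created through vmm.ko show up as nodes under /dev/vmm and can be removed with bhyvectl --destroy, as with native bhyve.

```shell
#!/bin/sh
# Start-script preamble sketch (hypothetical names, see the lead-in).

# Assumption: VMs created via vmm.ko appear as /dev/vmm/<name>.
VMM_DIR=${VMM_DIR:-/dev/vmm}

# Print the names of leftover VMs whose name starts with the given prefix.
list_stale_vms() {
    for dev in "$VMM_DIR"/"$1"*; do
        [ -e "$dev" ] || continue      # the glob matched nothing
        printf '%s\n' "${dev##*/}"
    done
}

# Destroy every leftover VM with the given prefix (needs root).
cleanup_stale_vms() {
    list_stale_vms "$1" | while read -r name; do
        bhyvectl --destroy --vm="$name"
    done
}

# Item 5: after "echo password | sudo ..." stdin is a drained pipe, so
# reattach it to the controlling terminal before QEMU starts.
reattach_stdin() {
    # Only switch if stdin is not a terminal and /dev/tty can be opened.
    if [ ! -t 0 ] && ( : </dev/tty ) 2>/dev/null; then
        exec 0</dev/tty
    fi
    return 0
}
```

In a real script, cleanup_stale_vms qemu- and reattach_stdin would run just before the qemu-system-x86_64 line that passes -accel bhyve.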
 
What's the performance loss if you do this with both FreeBSD host and VM guest? And what's left of the graphics inside the VM?
 


Istantanea_2026-05-19_23-56-17.jpg


Summary: Pure CPU-bound code runs with ~15% overhead — this is hardware-accelerated via VT-x/EPT, the guest runs natively on the CPU. The real cost is in syscalls (3-4x slower due to VM exits) and memory bandwidth (EPT double-translation). Disk I/O is reasonable (~22% overhead on writes). Fork is ~30% slower. These numbers are comparable to what you'd expect from KVM on Linux.

Graphics in the VM:

The guest sees a QEMU stdvga (Bochs VBE, PCI vendor:device ID 0x1234:0x1111), a basic framebuffer with 16 MB of VRAM. FreeBSD attaches the vgapci driver and runs in VT(vga) 80x25 text mode. There is no GPU acceleration: no DRI, no /dev/dri, no DRM driver.

Available GPU options in QEMU (all emulated, none accelerated):

- stdvga/bochs-display (current) — basic framebuffer, works for console and X11 with scfb or vesa driver
- virtio-gpu — paravirtualized, needs virtio_gpu driver (available in FreeBSD 14+), better performance for 2D

- qxl — Spice protocol, good for remote desktop scenarios
- vmware-svga — VMware SVGA II, FreeBSD has vmwgfx driver

None of these provide 3D hardware acceleration inside the VM. For that you'd need either GPU passthrough (VFIO/PCI passthrough of a real GPU) or virgl (OpenGL forwarding, not yet available with bhyve). The VM is suitable for server workloads, console use, and basic X11/Wayland desktop — but not for GPU-intensive tasks like gaming or 3D rendering.
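
The display options listed above map onto QEMU command-line switches roughly as follows. This is a small sketch: the display_args helper is hypothetical, while the switches themselves are standard QEMU options. Whether each model actually works under -accel bhyve has not been verified here.

```shell
# Print the QEMU switches that select a given emulated display model.
display_args() {
    case "$1" in
        stdvga)        echo "-vga std" ;;
        bochs-display) echo "-vga none -device bochs-display" ;;
        virtio-gpu)    echo "-vga none -device virtio-gpu-pci" ;;
        qxl)           echo "-vga qxl" ;;
        vmware-svga)   echo "-vga vmware" ;;
        *)             echo "unknown display model: $1" >&2; return 1 ;;
    esac
}

display_args stdvga    # the model used in the current setup
```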
 
Any chance of getting Chavali's (& later) vmm.ko merged in to support accelerated qemu? I noticed that his (dumrich) fork of freebsd-src is over 5700 commits behind freebsd's repo.
 
Performance Test Methodology

Setup

- Host: i9-9900K, FreeBSD 16.0-CURRENT, ~64GB RAM
- Guest: FreeBSD 15.0-RELEASE, QEMU + bhyve accelerator (-accel bhyve), 1 vCPU, 2GB RAM
- Guest disk: 6GB qcow2 on host UFS
- Access: SSH into guest via port forwarding (host:2222 → guest:22)
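
A QEMU invocation matching this setup might look like the sketch below. Only -accel bhyve, the vCPU count, the RAM size, the qcow2 disk format and the 2222 to 22 port forward come from the post. The guest_cmd helper, the image file name, the use of user-mode networking with an e1000 NIC, and -nographic are assumptions.

```shell
# Print a candidate command line for the benchmark guest.
# $1: path to the qcow2 image (hypothetical default name).
guest_cmd() {
    disk=${1:-guest.qcow2}
    printf '%s ' \
        qemu-system-x86_64 \
        -accel bhyve \
        -smp 1 -m 2G \
        -drive "file=$disk,format=qcow2" \
        -netdev user,id=net0,hostfwd=tcp::2222-:22 \
        -device e1000,netdev=net0 \
        -nographic
    echo
}

guest_cmd
```

With the guest started from that command line, ssh -p 2222 on the host's localhost would reach the guest's port 22.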

Benchmarks used:

No sysbench — all custom. Two rounds:

Round 1 — shell-based (dd):

dd if=/dev/zero of=/dev/null bs=1m count=256 # memory bandwidth
dd if=/dev/zero of=/tmp/bench.dat bs=1m count=64 conv=sync # disk write
time sh -c 'i=0; while [ $i -lt 100 ]; do /bin/true; i=$((i+1)); done' # fork

Round 2 — compiled C program (cc -O2), run on both host and guest:

- CPU integer: 100M volatile additions with clock_gettime(CLOCK_MONOTONIC)
- fork+wait: 200x fork()/waitpid() cycle
- Syscall latency: 1M getpid() calls
- Disk I/O: 128MB sequential write (1MB chunks) + fsync(), then sequential read

Raw numbers (Round 2 — the precise one)

Istantanea_2026-05-19_23-56-17.jpg


Notes:

- Memory bandwidth number is misleading — it's EPT double-translation penalty on a kernel zero-copy loop, not real application memory bandwidth
- Disk read showed guest faster than host (13.1 vs 9.5 GB/s) because the file was hot in host page cache
- CPU runs natively on VT-x/EPT, 15% overhead is from EPT page walks + occasional VM exits
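
Guest-vs-host percentages like the ones quoted in this thread can be derived from paired measurements as sketched here. Both helper names and the sample timings are hypothetical. Only the disk-read throughput pair (guest 13.1 GB/s, host 9.5 GB/s) comes from the notes above.

```shell
# overhead_pct HOST_TIME GUEST_TIME
# Percent by which the guest takes longer than the host (elapsed times).
overhead_pct() {
    awk -v h="$1" -v g="$2" 'BEGIN { printf "%.1f\n", (g - h) / h * 100 }'
}

# throughput_overhead_pct HOST_RATE GUEST_RATE
# Percent of host throughput lost in the guest; negative means the guest
# looked faster, as in the cache-affected disk-read result.
throughput_overhead_pct() {
    awk -v h="$1" -v g="$2" 'BEGIN { printf "%.1f\n", (h - g) / h * 100 }'
}

overhead_pct 2.00 2.30            # hypothetical elapsed seconds
throughput_overhead_pct 9.5 13.1  # disk read, GB/s, from the notes
```

A negative result from the second call is a hint that caching rather than the hypervisor dominates that measurement.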

Comparison with native bhyve

We did not benchmark native bhyve. All comparisons are guest-vs-host.

However, the overhead profile should be very similar to native bhyve because:

1. The QEMU bhyve accelerator uses the same vmm.ko kernel module and the same VT-x/EPT hardware path as native bhyve
2. The vm_run() ioctl is the same — guest vCPUs execute natively on hardware in both cases
3. The main difference is device emulation: QEMU emulates devices in its own userspace (e1000, IDE, etc.) while native bhyve uses its own device models. This affects I/O-heavy workloads but not CPU/syscall benchmarks

The areas where QEMU+bhyve might differ from native bhyve:

- Disk I/O: QEMU's block layer (qcow2 + AIO) vs bhyve's direct block backend — could go either way
- Network I/O: QEMU's e1000/virtio-net emulation vs bhyve's — similar
- VM exit handling: QEMU's accelerator loop has slightly more userspace overhead per exit than bhyve's tighter loop — this explains most of the syscall/fork overhead

A proper head-to-head comparison would require running the same guest image under both native bhyve and qemu -accel bhyve with equivalent device configurations.
 
Can you test the code on your machine, make patches, open issues, propose code, etc.?

Of course. I have a 22-core machine with 128 GB of RAM running 15.0-RELEASE, and I am no stranger to FreeBSD kernel development, nor to Linux kernel development or Debian.

Are you on the FreeBSD Discord, maybe?
 
Any chance of getting Chavali's (& later) vmm.ko merged in to support accelerated qemu? I noticed that his (dumrich) fork of freebsd-src is over 5700 commits behind freebsd's repo.

Are Chavali and dumrich the same person, or is dumrich a friend of Chavali's? It seems there are two source code locations with different patches applied.
 
bhyve is nice. QEMU accelerated with bhyve is nice too, and useful. QEMU is older, better tested and more structured than bhyve, so why not use all the features it offers? In the future I want to implement passthrough of PCI devices, including GPUs.
 
Is this even possible? It would already be great to support games up to, say, 2006 that still depend on DirectX 9 and an old Nvidia driver. I have more than 200 XP CD games. I believe we actually have to abandon hardware 3D acceleration totally. A grid of CISC cores with the right intercommunication interface can defeat this indefinitely. It's a fake market based on single-unit polygon processing speed.
 