Booted Debian and FreeBSD 15 using QEMU accelerated with bhyve/vmm for the first time. This is an epic milestone for the community!

Memory bandwidth seems odd. My host runs FreeBSD 15.0-RELEASE and gets 18.3 GB/s. In my bhyve guests, FreeBSD 15.0-RELEASE reaches 17.1 GB/s and Ubuntu 24.04 12 GB/s.
 
I will repeat the tests once I have SMP working on FreeBSD. At the moment this seems hard to achieve: whenever I tell QEMU/bhyve to use more than one CPU, it crashes.
 
I'm also nostalgic. I've been enjoying qemu/kvm since the 1990s; I cut my teeth on it. Virtualization has always been my favorite thing to do, first on Linux, then on FreeBSD. The Nvidia GPU passthru feature also owes something to me: I worked on it behind the scenes with Corvin.
QEMU was released in 2003 and KVM in 2006, according to Wikipedia.
 
I don't see the goal.
Did you build this because you can get HW acceleration only through QEMU? Is that it?

QEMU has many features that Bhyve does not have.

For example, with QEMU you can change the ISO on the fly:
- https://www.linux-kvm.org/page/Change_cdrom

With QEMU you can even connect USB devices from other computers over an IP network:
- https://zachary.com/freebsd-usbip/
- https://freshports.org/net/usbredir/

With QEMU you can also pass through a single USB device; on Bhyve you can only pass through an entire USB controller (and it must support MSI/MSI-X interrupts to work).

So I welcome having Bhyve acceleration for QEMU as a VERY WANTED feature :)

Hope that helps.

Regards,
vermaden
 
---> You can also passthru single USB device with QEMU

Thanks to this feature, I'm working on a parallel project that will fix, once and for all, the problems FreeBSD has with most (if not all) BT chipsets. I had an idea and I'm developing it, because QEMU lets me do it through libusb.
 
With QEMU on top, bhyve will gain:
  • advanced BIOS/UEFI
  • modern virtual devices
  • sophisticated PCI topology
  • more mature virtio
  • serious USB emulation
  • advanced NVMe
  • advanced snapshots
  • migration
  • advanced block layer
  • complete qcow2
  • mature SPICE/VNC
  • vfio-like orchestration
  • complete libvirt
Compatibility with:
  • cloud appliances
  • OpenStack images
  • Proxmox images
  • libvirt images
  • CI/CD tooling
  • Kubernetes virtualization
  • Terraform
  • Packer
QEMU already has:
  • virtio-gpu
  • virglrenderer
  • rutabaga
  • gfxstream
  • vhost-user-gpu

How many of these features does bhyve also offer? It would be useful to know for further development. The idea might be not to invest time and money in bhyve for features QEMU already offers, and to spend it only on stabilizing bhyve itself.
 
Love it 💪. Keep at it, ZioMario! All of your posts about bhyve passthru over the past 2-5 years got me excited to game again...

Much appreciated. I'll keep lurking around your efforts and help out when I can.
 
There may be security reasons bhyve is not implementing some things that qemu might.

More than security, I think development is slow by design, and few developers are able to work on it: generally speaking, FreeBSD cannot boast a large number of developers, because it does not have the same social reach as Linux. Anyway, security measures can be implemented, right?
 
There may be security reasons bhyve is not implementing some things that qemu might.
Also: QEMU tries to support all kinds of hardware and tons of features. bhyve is slick and does its job for 95% of standard use cases. bhyve's codebase is much smaller, and thus bears far fewer risks and less technical debt.
 
bhyve is slick and does its job for 95% of standard use cases.

Standard, yes. But the 95% figure should be lower. Above I listed some features the slick bhyve does not offer; they bring that 95% down to 70% or even less. Average users are only part of the equation.
 
ZioMario, your signature says "System admins could be defined, somehow, as OS psychologists. In fact, in the future a specific OS will be integrated into human brains, so there will be no difference between fixing an OS and fixing a human brain anymore. As system admins / psychologists we are just ahead of the times that will come."

Is that your personal hell pipe dream? Because that's not happening with smart people who know how fallible OSes are and have always been. So unless you want your brain hacked and violated in some nefarious ways, you should not be a proponent of any such fantasy.

Moreover, 95% human and 5% artificial OS is not really human anymore. It's someone's strong desire to make that a sliding scale and single-handedly redefine what it means to be human, but it is not human to not be human to some extent. There are some fundamental reasons why that's so...it's reductive to give up some percent of humanity, to say the least. Even 0.05% non-human has unfathomable effects of profound complexity.

And then you'll have these reductive non-humans trying to run things and make decisions about what's good for humans, having nothing to do with being human anymore. I bet some people will want to do it secretly so that there's no debate about it...and do it slowly, because incremental change is harder to identify and easier to shove down people's throats. But the end result will be immediately "the machines have taken over."

The machines may have already taken over, if you analyze what you see around you. How would one know for sure?
 
Enabled nVidia passthru to a Linux VM:

 
UPDATE: I have enabled SMP support for the FreeBSD guest OS, virtualized with QEMU+bhyve, and run some benchmarks.

System:

CPU : Intel Core i9-9900K (8C/16T, base 3.6 GHz)
Host OS : FreeBSD 16.0-CURRENT (amd64)
Guest OS: FreeBSD 15.0-RELEASE (amd64)
VM cfg : QEMU + bhyve accelerator, 8 vCPU, 4 GB RAM, virtio-blk disk
Date : 2026-06-09


[Attached screenshot: benchmark results (Screenshot_2026-06-09_21-45-31.jpg)]


==============================================================================
ANALYSIS AND NOTES
==============================================================================

1. CPU INTEGER (~8% overhead)
The overhead is minimal because integer instructions execute directly on
hardware without hypervisor interception. VT-x/EPT does not introduce
overhead for normal user/kernel instructions. The slight slowdown comes
from additional context switching by the host scheduler, which multiplexes
guest vCPUs onto physical hardware threads.

2. SYSCALL getpid() (~4x)
On VT-x the SYSCALL instruction does not cause a VM exit: it enters the
guest kernel directly, so a bare getpid() is not itself a round-trip into
the hypervisor. FreeBSD also has no vDSO shortcut for getpid(), on the host
or in the guest, so both sides execute a real syscall. The fixed extra cost
most likely comes from elsewhere on the guest path, for example different
speculative-execution mitigations in the guest kernel or exits taken during
the measurement loop; host-side VM-exit counters would be needed to pin it
down. For scale, one VM exit/entry round-trip (VMLAUNCH/VMRESUME + context
save/restore) costs roughly 100-200 ns on modern hardware.
Result: 46 ns -> 182 ns (+136 ns fixed overhead per call).
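The per-call overhead and slowdown factor follow directly from the two timings. A minimal Python sketch that does the arithmetic and also times os.getpid() on whatever machine runs it; the helper names are hypothetical, and because the loop runs in the interpreter the absolute numbers include Python overhead, so only a host-vs-guest ratio from the same script is meaningful:

```python
import os
import time

def overhead(native_ns: float, vm_ns: float) -> tuple[float, float]:
    """Return (extra cost in ns, slowdown factor) for one operation."""
    return vm_ns - native_ns, vm_ns / native_ns

def mean_call_ns(fn, iterations: int = 1_000_000) -> float:
    """Average wall-clock cost of fn() in ns (includes interpreter overhead)."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations

if __name__ == "__main__":
    extra, factor = overhead(46, 182)   # host vs guest figures from the benchmark
    print(f"getpid: +{extra:.0f} ns per call, {factor:.2f}x")
    print(f"this machine: {mean_call_ns(os.getpid, 200_000):.1f} ns per os.getpid()")
```

Running the same script on the host and inside the guest gives a like-for-like comparison, even if both numbers sit above a C loop's.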

3. FORK()+WAIT (~66x)
The most striking result. Fork inside a VM is expensive because:
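a) fork() must copy the process page tables and mark all writable pages
as Copy-on-Write, which forces TLB invalidations. The guest flushes with its
own INVLPG/CR3 reloads, which normally run without a VM exit when EPT is
enabled (INVEPT/INVVPID are issued by the hypervisor, not by the guest); the
extra exits more likely come from cross-vCPU IPIs (TLB shootdowns, waking the
parent on another vCPU), which pass through the virtual local APIC.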
b) The newly created child process must be scheduled on a guest vCPU by the
guest scheduler, which must then itself be scheduled by the host
scheduler. This two-level scheduling introduces significant latency.
c) Process creation inside the guest involves manipulation of kernel
structures (pmap, vmspace) that trigger numerous EPT page faults.
87 us (native) -> 5728 us (VM) reflects the real cost of virtualization
for process-intensive workloads.
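A fork()+wait round-trip can be timed the same way on host and guest. This is a rough Python sketch, not the tool behind the figures above: a Python parent has a larger address space than a tiny C program, so absolute times will differ, and the iteration count is an arbitrary choice:

```python
import os
import time

def fork_wait_us(iterations: int = 200) -> float:
    """Average cost of fork() + waitpid() for a child that exits at once, in µs."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)          # child: leave immediately, skipping Python cleanup
        os.waitpid(pid, 0)       # parent: reap the child before the next round
    return (time.perf_counter_ns() - start) / iterations / 1000

if __name__ == "__main__":
    print(f"fork+wait: {fork_wait_us():.1f} us")
    print(f"reported slowdown: {5728 / 87:.1f}x")   # native vs VM figures above
```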

4. MEMORY: dd /dev/zero (same)
Host and guest bandwidth are nearly equal because dd /dev/zero -> /dev/null
measures how fast the kernel copies zeros into dd's buffer (memset-like
speed); writes to /dev/null are simply discarded. The 256 MB total is far
larger than the L3 cache (16 MB on i9-9900K), but dd reuses a single buffer
of bs bytes, so the test only measures DRAM bandwidth when bs itself exceeds
the cache; with a smaller bs it largely measures cache bandwidth. Either way
the guest reaches its memory (actual DRAM mapped by EPT) through the same
hardware path as the host. EPT overhead for sequential access is negligible
because it costs at most one TLB miss per 4 KB page (one per 2 MB if guest
memory is backed by superpages), so each nested walk is amortized over the
whole page.
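The amortization argument and the buffer-size caveat can be put into numbers. A small Python sketch, assuming the 256 MB volume is binary (256 MiB, as dd's `M` suffix is) and using two hypothetical bs values; the L3 check is a simplified model that ignores where the kernel sources its zeros:

```python
KiB = 1 << 10
MiB = 1 << 20

def tlb_misses(total_bytes: int, page_size: int) -> int:
    """Worst case for one sequential pass: at most one TLB miss per page touched."""
    return -(-total_bytes // page_size)   # ceiling division

def buffer_exceeds_l3(block_size: int, l3_bytes: int = 16 * MiB) -> bool:
    """dd reuses one buffer of block_size bytes, so a pass only has to go to
    DRAM when that buffer is larger than the L3 cache (simplified model)."""
    return block_size > l3_bytes

if __name__ == "__main__":
    total = 256 * MiB
    for page in (4 * KiB, 2 * MiB):
        print(f"{page // KiB} KiB pages: {tlb_misses(total, page)} misses, "
              f"one per {page} bytes copied")
    for bs in (1 * MiB, 64 * MiB):        # hypothetical dd bs= values
        print(f"bs={bs // MiB}M exceeds L3: {buffer_exceeds_l3(bs)}")
```

Even with 4 KiB pages, one nested walk is spread over 4096 bytes of copying, which is why sequential bandwidth barely moves.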

5. MEMORY: sysbench (-36% write, -48% read, -41/-59% at 8 threads)
sysbench memory uses malloc()+memmove() in a tight loop with many
random-ish accesses. This stresses the TLB at two levels simultaneously:
- Guest page table (guest virtual -> guest physical)
- EPT (guest physical -> host physical)
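A TLB miss in the guest requires a two-dimensional page walk. Each of the 4
guest page-table entries sits at a guest-physical address that needs its own
4-level EPT walk (4 x 4 = 16 accesses), the 4 guest entries must then be read
(4 more), and the final guest-physical address needs one more EPT walk (4):
up to 16 + 4 + 4 = 24 memory accesses instead of the 4 of a native walk.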
Read overhead is higher than write because memmove reads before writing,
amplifying the miss penalty.
With 8 threads the percentage overhead is larger (-59% read) because
contention on EPT structures increases with more vCPUs.
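The worst-case walk length generalizes to any number of guest and EPT levels. A Python sketch of that count, assuming no hits in the paging-structure caches (real hardware caches most of these steps, so 24 is an upper bound, not a typical cost):

```python
def nested_walk_accesses(guest_levels: int = 4, ept_levels: int = 4) -> int:
    """Worst-case memory reads for one guest TLB miss under EPT.

    Each guest page-table entry lives at a guest-physical address that first
    needs a full EPT walk (ept_levels reads), then the entry itself is read
    (+1); the final guest-physical address of the data needs one more EPT walk.
    """
    return guest_levels * (ept_levels + 1) + ept_levels

def native_walk_accesses(levels: int = 4) -> int:
    """A native walk reads one entry per page-table level."""
    return levels

if __name__ == "__main__":
    print(native_walk_accesses(), nested_walk_accesses())   # 4 24
    print(nested_walk_accesses(5, 5))                       # 35 with 5-level paging on both sides
```

The formula is equivalent to (n + 1)(m + 1) - 1 for n guest levels and m EPT levels.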

6. DISK: write+fsync (~19x)
Large but expected overhead for virtual I/O:
- Each guest write() generates a virtio-blk request
- QEMU's virtio-blk emulation, running in host user space on top of the
  bhyve/vmm accelerator, processes the request and writes to the image file
- Guest fsync() translates to fdatasync() on the host image file
- Each operation requires multiple VM exit/entry round-trips
The guest disk is a raw image file on a host UFS filesystem. On the native
host, fsync() goes directly to the NVMe controller. Inside the guest the path
is: guest fsync -> virtio-blk -> vmm (VM exit) -> QEMU -> host UFS -> NVMe.
413 MB/s (native) vs 22 MB/s (VM) shows the full cost of I/O layering.
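A synchronous-write test in the same spirit can be sketched in Python: write fixed-size blocks and fsync() after each one. The block size, block count and temporary-file location are assumptions, not the parameters of the original run; to compare host and guest, point it at a file on the disk under test:

```python
import os
import tempfile
import time

def fsync_write_mbps(path: str, block_size: int = 64 * 1024, blocks: int = 256) -> float:
    """Write `blocks` blocks of zeros, fsync() after each, and return MB/s."""
    buf = bytes(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(blocks):
            view = memoryview(buf)
            while view:                  # os.write() may write fewer bytes than asked
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)                 # force this block to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return block_size * blocks / elapsed / 1e6

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()        # hypothetical target file
    os.close(fd)
    try:
        print(f"write+fsync: {fsync_write_mbps(path):.1f} MB/s")
    finally:
        os.unlink(path)
    print(f"reported slowdown: {413 / 22:.1f}x")   # native vs VM figures above
```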

7. DISK: cached read (guest faster)
The guest reads /tmp/tf from its own page cache (guest RAM, which is already
host RAM). The host reads the same file through the host UFS page cache.
Both are fully cached, but the guest read path is shorter: guest VFS ->
guest page cache -> done, without traversing the virtio layer because the
data is already in the buffer cache. This explains the slightly higher
throughput on the guest side.
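The cached-read case only measures the page cache if the file is read once beforehand. A Python sketch of that warm-then-time pattern; the 64 MiB temporary file is a hypothetical stand-in for /tmp/tf, whose size the post does not give:

```python
import os
import tempfile
import time

def cached_read_mbps(path: str, chunk: int = 1 << 20) -> float:
    """Read `path` once to warm the page cache, then time a second full read (MB/s)."""
    def read_all() -> int:
        total = 0
        with open(path, "rb", buffering=0) as f:
            while data := f.read(chunk):
                total += len(data)
        return total

    read_all()                          # first pass: pull the file into the cache
    start = time.perf_counter()
    size = read_all()                   # second pass: should be served from the cache
    elapsed = time.perf_counter() - start
    return size / elapsed / 1e6

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(bytes(64 << 20))        # 64 MiB of zeros (size is an assumption)
    try:
        print(f"cached read: {cached_read_mbps(path):.0f} MB/s")
    finally:
        os.unlink(path)
```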

==============================================================================
CONCLUSIONS
==============================================================================

Workloads with negligible virtualization overhead:
- Pure integer compute: ~8% overhead, essentially transparent
- Large sequential memory: same as native

Workloads with moderate overhead:
- Syscall-heavy code: ~4x (fixed ~136 ns per call)
- Memory bandwidth (alloc): -35% to -48% (EPT TLB miss penalty)
- Memory bandwidth (MT): -40% to -59%

Workloads with severe overhead (avoid in VM for performance-critical use):
- fork() / process creation: ~66x (TLB invalidation + double scheduling)
- Synchronous disk I/O (fsync): ~19x (virtio layering + VM exit per I/O)

Summary: QEMU+bhyve is transparent for pure compute but introduces significant
overhead for frequent syscalls, process creation, synchronous disk I/O, and
high-frequency memory allocation patterns, i.e. mostly paths that cross the
guest/hypervisor boundary or stress nested paging. The ideal workload for this
configuration is compute-bound with sequential memory access and asynchronous
or in-RAM I/O.
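The three overhead groups can be reproduced from the raw figures by turning each result into a slowdown factor. A Python sketch; the 1.2x and 10x thresholds are hypothetical, chosen only so the buckets match the grouping above, and for throughput results the factor is native / VM:

```python
# Slowdown factors (VM cost / native cost) derived from the figures above.
# For throughput results the factor is native / VM instead.
MEASURED = {
    "integer compute": 1.08,                   # ~8% overhead
    "sysbench memory read": 1 / (1 - 0.48),    # -48% bandwidth
    "getpid()": 182 / 46,
    "write+fsync": 413 / 22,
    "fork()+wait": 5728 / 87,
}

def classify(slowdown: float, moderate: float = 1.2, severe: float = 10.0) -> str:
    """Bucket a slowdown factor using hypothetical thresholds."""
    if slowdown < moderate:
        return "negligible"
    if slowdown < severe:
        return "moderate"
    return "severe"

if __name__ == "__main__":
    for name, factor in MEASURED.items():
        print(f"{name:22s} {factor:6.2f}x  {classify(factor)}")
```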

==============================================================================
 