Current state of bhyve Nvidia passthrough?

Could someone with an NVIDIA GPU that causes issues on restarting the VM please test this patch and see if that fixes it?
I have VM restart issues with a Nobara Linux guest on a 14.2-RELEASE-p2 host using an Nvidia RTX 3060. I'd be happy to help out with testing, but I'd need a bit more hand-holding. Should I apply this patch on top of the earlier patch posted in this thread?
 
I have a new GTX card coming in on a new HP Z8 G4 desktop workstation. I'll test when I get that set up.
 
I tried applying this on top of the other patches and building world and kernel per the Handbook quickstart instructions. My complete diff against the current head of releng/14.2 (ac2cbb46b5f1efa7f7b5d4eb15631337329ec5b2) is attached as nvidiadiff.txt.

Sadly, it does not seem to work. I ran `vm start nobara` after the reboot and it worked fine. Then I shut the VM down from within the VM and ran `vm start nobara` again from the host, but I got no signal on my display.

The VM is alive: I have an ollama server running there, and I can connect to it and hear the GPU fans spin up as I run prompts. But there is no signal to the display.

Let me know if I can help with any further testing, or if I misunderstood the assignment.

Here are my VM config and the passthrough devices from pciconf:

Code:
loader="uefi"
cpu=12
cpu_sockets=1
cpu_cores=6
cpu_threads=2
memory=16400M

#graphics="yes"
#debug="yes"

ahci_device_limit="8"
network0_type="virtio-net"
network0_switch="public"
disk0_name="disk0"
disk0_dev="sparse-zvol"
disk0_type="virtio-blk"
passthru0="14/0/0=6:0"
passthru1="14/0/1=6:1"
passthru2="5/0/0=8:0"

bhyve_options="-A -H -P"

uuid="dfdcb19f-d45f-11ef-95d9-244bfe8deecc"
network0_mac="58:9c:fc:0c:d5:bc"


Code:
ppt1@pci0:14:0:0:    class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x2504 subvendor=0x1043 subdevice=0x881d
    vendor     = 'NVIDIA Corporation'
    device     = 'GA106 [GeForce RTX 3060 Lite Hash Rate]'
    class      = display
    subclass   = VGA
ppt2@pci0:14:0:1:    class=0x040300 rev=0xa1 hdr=0x00 vendor=0x10de device=0x228e subvendor=0x1043 subdevice=0x881d
    vendor     = 'NVIDIA Corporation'
    device     = 'GA106 High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA

ppt0@pci0:5:0:0:    class=0x0c0330 rev=0x03 hdr=0x00 vendor=0x1912 device=0x0014 subvendor=0x1912 subdevice=0x0014
    vendor     = 'Renesas Electronics Corp.'
    device     = 'uPD720201 USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
 

There are some reports that the patch linked from the page introduced in Comment #2 here works, but others report it doesn't, and still others report it worked only after changing static const char bhyve_id[12] in /usr/src/sys/amd64/vmm/x86.c to some different value.

This confuses me, but it seemingly depends on what the guest OS is (including version, distro, and variant).

I've come to think that, if that value alone is what decides whether the patch works, making it a non-constant "variable" that can be overridden via config file and/or command-line switch, and documenting which value to set for which guest OS (e.g. CentOS 7, Rocky Linux 9, Windows 10, Windows 11, ...) in the bhyve(8) manpage, would help.

But I'm not sure the guest side allows it to vary, and if it unfortunately doesn't, how to keep it constant toward the guest while variable on the FreeBSD side would need investigating (I'm also not sure whether it has to live in an immutable page; I'm not using bhyve).
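For anyone who wants to experiment with that idea before any config knob exists, the edit itself is tiny. A minimal sketch, assuming the declaration in /usr/src/sys/amd64/vmm/x86.c still reads `static const char bhyve_id[12] = "bhyve bhyve ";` (run here against a scratch copy so nothing real is modified; on a real system you would edit the file in place and rebuild vmm(4)):

```shell
#!/bin/sh
# Scratch copy standing in for /usr/src/sys/amd64/vmm/x86.c;
# the exact declaration below is an assumption about the current source.
tmp=$(mktemp)
printf 'static const char bhyve_id[12] = "bhyve bhyve ";\n' > "$tmp"

# Swap in the signature KVM advertises. The array is 12 bytes, so the
# three bytes after "KVMKVMKVM" are implicitly NUL-filled ("KVMKVMKVM\0\0\0").
sed -i.bak 's/"bhyve bhyve "/"KVMKVMKVM"/' "$tmp"

cat "$tmp"
# -> static const char bhyve_id[12] = "KVMKVMKVM";
rm -f "$tmp" "$tmp.bak"
```

After a real edit the vmm module would need rebuilding (and reloading, or a reboot) before the change reaches any guest.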
 
Just popping in with my experience.

I'm using FreeBSD 14.2 to run an Ubuntu virtual machine and have had success with the following:

Used the patch
Code:
# cd /usr/
# rm -rf /usr/src
# git clone https://github.com/beckhoff/freebsd-src /usr/src
# cd /usr/src
# git checkout -f origin/phab/corvink/14.2/nvidia-wip
# cd /usr/src/usr.sbin/bhyve
# make && make install

I installed the following. I don't know if they're all needed, but I didn't get any conflicts:

Code:
# pkg install bhyve-firmware  edk2-bhyve grub2-bhyve vm-bhyve-devel

The devices I wanted to pass through were the GPU and Mellanox card with the following pciconf info:
Code:
ppt0@pci0:7:0:0:        class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1bb1 subvendor=0x10de subdevice=0x11a3
    vendor     = 'NVIDIA Corporation'
    device     = 'GP104GL [Quadro P4000]'
    class      = display
    subclass   = VGA
ppt1@pci0:7:0:1:        class=0x040300 rev=0xa1 hdr=0x00 vendor=0x10de device=0x10f0 subvendor=0x10de subdevice=0x11a3
    vendor     = 'NVIDIA Corporation'


ppt2@pci0:145:0:0:      class=0x020700 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1011 subvendor=0x15b3 subdevice=0x0179
    vendor     = 'Mellanox Technologies'
    device     = 'MT27600 [Connect-IB]'
    class      = network
    subclass   = InfiniBand


I set them to pptdevs for passthru using /boot/loader.conf :
Code:
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
vmm_load="YES"
hw.vmm.enable_vtd=1

pptdevs="145/0/0 7/0/0 7/0/1"
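For anyone assembling such a list: the bus/slot/function triplets in pptdevs come straight from the pciconf(8) selectors, e.g. ppt2@pci0:145:0:0 becomes 145/0/0. A small sketch of the conversion, fed a captured sample here instead of live `pciconf -l` output:

```shell
#!/bin/sh
# Build a pptdevs line from pciconf(8) selectors: ppt0@pci0:7:0:0 -> 7/0/0.
# The sample below stands in for live `pciconf -l` output.
sample='ppt0@pci0:7:0:0:  class=0x030000
ppt1@pci0:7:0:1:  class=0x040300
ppt2@pci0:145:0:0:  class=0x020700'

# Split each ppt line on '@' and ':' and join bus/slot/function with '/'.
printf 'pptdevs="%s"\n' "$(echo "$sample" \
    | awk -F'[@:]' '/^ppt/ { printf "%s%s/%s/%s", sep, $3, $4, $5; sep = " " }')"
# -> pptdevs="7/0/0 7/0/1 145/0/0"
```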

and for the VM I have the following config file:
Code:
loader="grub"
grub_run_partition="gpt2"
grub_run_dir="/grub"

cpu=8
custom_args="-p 4 -p 6 -p 8 -p 10 -p 12 -p 14 -p 16 -p 18"

memory=8192M
wired_memory=yes

network0_type="virtio-net"
network0_switch="public"

disk0_dev="custom"
disk0_type="ahci-hd"
disk0_name="/dev/zvol/zroot/ubuntu_vm_disk"

passthru0="7/0/0"
passthru1="7/0/1"
passthru2="145/0/0"

pptdevs="msi=on"

uuid="38c6aa07-12c7-11f0-8e5c-0894ef4d85e6"
network0_mac="58:9c:fc:0d:bb:8a"


Inside the Ubuntu virtual machine I installed the nvidia-535 drivers, rebooted, and got the following:

Code:
jholloway@ubuntuvm:~$ nvidia-smi
Mon Apr  7 16:16:51 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P4000                   Off | 00000000:00:06.0 Off |                  N/A |
| 46%   35C    P8               5W / 105W |      4MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

So far it is working well. Plex running on the VM detects the card and is performing hardware transcoding as expected. It will even show the transcoding in nvidia-smi if I'm watching something on Plex at the time!

I have encountered one bug, it seems, where the VM didn't shut down properly and the nvidia driver was then unable to find the card. Even turning the VM off and on from the host didn't solve the problem. Fortunately, when I rebooted the host and started the VM up again, the problem had solved itself. I'm not sure what caused it, as I have been unable to reproduce it, but I suspect it has something to do with the Ubuntu VM being reset from inside the VM, or being shut off by the command vm restart ubuntu_vm.

But aside from that hiccup, both the GPU passthrough and the Mellanox passthrough are working well.
 
Nice! I haven't had this problem (I've been running an OEM Dell 3090 for over a year now), which is a bit annoying since I can't figure out what's going wrong. I saw corvin mention the issue in a presentation somewhere.

I tried now applying this on top of the other patches and building a world and kernel as per handbook quickstart instructions. My complete diff to the current head of releng/14.2 (ac2cbb46b5f1efa7f7b5d4eb15631337329ec5b2) is attached as nvidiadiff.txt

Sadly it does not seem to work. I ran `vm start nobara` after the reboot and it worked well. Then I shut down the VM from within the VM and ran `vm start nobara` again from host but I get no signal to my display.

The vm is alive. I have ollama server running there and I can connect to it and hear my GPU fans spin up as I perform prompts. But there is no signal to the display.

Let me know if I could help with any further testing or if I understood the assignment incorrectly.

Did you enable the quirk with
Code:
echo 'debug.acpi.quirks="24"' >> ${DESTDIR}/boot/loader.conf

I'm going to see if I can find another NVIDIA card with the issue so I can track this down.
 
I am running 14.2-RELEASE. I followed the steps in https://dflund.se/~getz/Notes/2024/freebsd-gpu/ with the branch 14.2/nvidia-wip, and replaced the signature with 'KVMKVMKVM\0\0\0'. I pinned two host CPUs to guest vCPUs and passed the GPU through. nvidia-smi works under the 535, 550, and 570 drivers:

Code:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650        Off |   00000000:00:01.0 Off |                  N/A |
| 50%   33C    P8              7W /   75W |       1MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                       
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
But checking with cuda_check.cu failed.
Code:
Found 1 device(s).
Device: 0
  Name: NVIDIA GeForce GTX 1650
  Compute Capability: 7.5
  Multiprocessors: 14
  CUDA Cores: 896
  Concurrent threads: 14336
  GPU clock: 1665 MHz
  Memory clock: 4001 MHz
  cuMemGetInfo failed with error code 201: invalid device context

EDIT:
1. I also checked with PyTorch; the symptoms are similar to KVM vfio-pci passthrough (see "No process using GPU, but `CUDA error: all CUDA-capable devices are busy or unavailable`").
2. In TrueNAS SCALE (probably using KVM), GPU passthrough failed until "CPU Mode" was changed to "Host Mode", which suggests the issue is related to the CPU information bhyve provides to the guest.
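On why the signature matters at all: the 12-byte string (bhyve's default, historically "bhyve bhyve ", or 'KVMKVMKVM\0\0\0' after the swap above) is the hypervisor vendor ID that CPUID leaf 0x40000000 hands to the guest, four bytes each in EBX, ECX, and EDX; the NVIDIA driver apparently keys off it. A quick illustration of how the 12 bytes split across the registers (NUL padding shown as '.'):

```shell
#!/bin/sh
# Show how a 12-byte hypervisor ID splits into the three 4-byte
# CPUID 0x40000000 registers (EBX, ECX, EDX). NULs rendered as '.'.
for sig in 'bhyve bhyve ' 'KVMKVMKVM'; do
    printf '%s' "$sig" | awk -v label="$sig" '{ s = $0 } END {
        while (length(s) < 12) s = s "."     # pad short strings to 12 bytes
        printf "%-12s -> EBX=\"%s\" ECX=\"%s\" EDX=\"%s\"\n",
            label, substr(s, 1, 4), substr(s, 5, 4), substr(s, 9, 4)
    }'
done
# e.g. KVMKVMKVM    -> EBX="KVMK" ECX="VMKV" EDX="M..."
```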
 