Current state of bhyve Nvidia passthrough?

Could someone with an nvidia gpu that causes issues on restarting the VM please test this patch and see if that fixes it?
I have VM restart issues with a Nobara Linux guest on a 14.2-RELEASE-p2 host using an Nvidia RTX 3060. I'd be happy to help out with testing, but I would need a bit more hand-holding in the instructions. Should I apply this patch on top of the earlier patch posted in this thread?
 
I have VM restart issues with a Nobara Linux guest on a 14.2-RELEASE-p2 host using an Nvidia RTX 3060. I'd be happy to help out with testing, but I would need a bit more hand-holding in the instructions. Should I apply this patch on top of the earlier patch posted in this thread?
I have a new GTX card coming in on a new HP Z8 G4 desktop workstation. I'll test when I get that set up.
 
So far, setting up bhyve is pretty gnarly. I haven't gotten to tuning any GPU passthrough yet on this Dell 5510 workstation.
 
Could someone with an nvidia gpu that causes issues on restarting the VM please test this patch and see if that fixes it?
I now tried applying this on top of the other patches and building world and kernel as per the handbook quickstart instructions. My complete diff against the current head of releng/14.2 (ac2cbb46b5f1efa7f7b5d4eb15631337329ec5b2) is attached as nvidiadiff.txt.

Sadly, it does not seem to work. I ran `vm start nobara` after the reboot and it worked well. Then I shut down the VM from within the VM and ran `vm start nobara` again from the host, but I get no signal on my display.

The VM is alive: I have an ollama server running there, and I can connect to it and hear my GPU fans spin up as I run prompts. But there is no signal on the display.

Let me know if I can help with any further testing, or if I understood the assignment incorrectly.

Here are my VM config and the devices from pciconf:

Code:
loader="uefi"
cpu=12
cpu_sockets=1
cpu_cores=6
cpu_threads=2
memory=16400M

#graphics="yes"
#debug="yes"

ahci_device_limit="8"
network0_type="virtio-net"
network0_switch="public"
disk0_name="disk0"
disk0_dev="sparse-zvol"
disk0_type="virtio-blk"
passthru0="14/0/0=6:0"
passthru1="14/0/1=6:1"
passthru2="5/0/0=8:0"

bhyve_options="-A -H -P"

uuid="dfdcb19f-d45f-11ef-95d9-244bfe8deecc"
network0_mac="58:9c:fc:0c:d5:bc"


Code:
ppt1@pci0:14:0:0:    class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x2504 subvendor=0x1043 subdevice=0x881d
    vendor     = 'NVIDIA Corporation'
    device     = 'GA106 [GeForce RTX 3060 Lite Hash Rate]'
    class      = display
    subclass   = VGA
ppt2@pci0:14:0:1:    class=0x040300 rev=0xa1 hdr=0x00 vendor=0x10de device=0x228e subvendor=0x1043 subdevice=0x881d
    vendor     = 'NVIDIA Corporation'
    device     = 'GA106 High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA

ppt0@pci0:5:0:0:    class=0x0c0330 rev=0x03 hdr=0x00 vendor=0x1912 device=0x0014 subvendor=0x1912 subdevice=0x0014
    vendor     = 'Renesas Electronics Corp.'
    device     = 'uPD720201 USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
 

There are some reports that the patch linked from the page introduced at Comment #2 here works, but others report that it doesn't, and still others report that it worked only after changing static const char bhyve_id[12] in /usr/src/sys/amd64/vmm/x86.c to some different value.

This is confusing, but the outcome seemingly depends on what the guest OS is (including version, distro, and variant).

I have come to think that if this value alone is what makes the patch work or not, it would help to make it a non-constant variable that can be overridden via a config file and/or a command-line switch, with guidance in the bhyve(8) manpage on which value to set for each specific guest OS (i.e. CentOS 7, Rocky Linux 9, Windows 10, Windows 11, ...).

I'm not sure, though, whether the guest side allows this value to vary; if it unfortunately doesn't, how to keep it constant toward the guest while still configurable on the FreeBSD side should be investigated (I'm not sure whether it is required to live in an immutable page or not; I'm not using bhyve myself).
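
For what it's worth, the signature in question is just the CPUID hypervisor-vendor leaf, so a guest can inspect what it currently sees. A quick check from a Linux guest (assuming the cpuid utility is installed there):

Code:
# dump CPUID leaf 0x40000000; the 12-byte hypervisor vendor string
# is packed into the ebx/ecx/edx registers
cpuid -1 -l 0x40000000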
 
Just popping in with my experience.

I'm using FreeBSD 14.2 to run an Ubuntu virtual machine and have had success with the following:

Used the patch:
Code:
# cd /usr/
# rm -rf /usr/src
# git clone https://github.com/beckhoff/freebsd-src /usr/src
# cd /usr/src
# git checkout -f origin/phab/corvink/14.2/nvidia-wip
# cd /usr/src/usr.sbin/bhyve
# make && make install
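
Note that those steps rebuild only the bhyve userland binary. If the checked-out branch also touches vmm(4), the kernel module presumably needs rebuilding too; a sketch (my assumption, not part of the original steps):

Code:
# rebuild and reinstall just the vmm kernel module from the same tree
# (stop all VMs before reloading it)
# cd /usr/src/sys/modules/vmm
# make clean && make && make install
# kldunload vmm && kldload vmm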

I installed the following. I don't know if they're all needed, but I didn't get any conflicts:

Code:
# pkg install bhyve-firmware  edk2-bhyve grub2-bhyve vm-bhyve-devel
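
A quick sanity check that the tooling landed (vm-bhyve provides a version subcommand, and pkg can list the installed firmware files):

Code:
# confirm vm-bhyve responds and the UEFI firmware image is present
vm version
pkg info -l edk2-bhyve | grep '\.fd'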

The devices I wanted to pass through were the GPU and Mellanox card with the following pciconf info:
Code:
ppt0@pci0:7:0:0:        class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x1bb1 subvendor=0x10de subdevice=0x11a3
    vendor     = 'NVIDIA Corporation'
    device     = 'GP104GL [Quadro P4000]'
    class      = display
    subclass   = VGA
ppt1@pci0:7:0:1:        class=0x040300 rev=0xa1 hdr=0x00 vendor=0x10de device=0x10f0 subvendor=0x10de subdevice=0x11a3
    vendor     = 'NVIDIA Corporation'

ppt2@pci0:145:0:0:      class=0x020700 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1011 subvendor=0x15b3 subdevice=0x0179
    vendor     = 'Mellanox Technologies'
    device     = 'MT27600 [Connect-IB]'
    class      = network
    subclass   = InfiniBand


I set them as pptdevs for passthru using /boot/loader.conf:
Code:
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
vmm_load="YES"
hw.vmm.enable_vtd=1

pptdevs="145/0/0 7/0/0 7/0/1"

and for the VM I have the following config file:
Code:
loader="grub"
grub_run_partition="gpt2"
grub_run_dir="/grub"

cpu=8
custom_args="-p 4 -p 6 -p 8 -p 10 -p 12 -p 14 -p 16 -p 18"

memory=8192M
wired_memory=yes

network0_type="virtio-net"
network0_switch="public"

disk0_dev="custom"
disk0_type="ahci-hd"
disk0_name="/dev/zvol/zroot/ubuntu_vm_disk"

passthru0="7/0/0"
passthru1="7/0/1"
passthru2="145/0/0"

pptdevs="msi=on"

uuid="38c6aa07-12c7-11f0-8e5c-0894ef4d85e6"
network0_mac="58:9c:fc:0d:bb:8a"


Inside of the Ubuntu virtual machine I installed the nvidia-535 drivers, rebooted, and got the following:

Code:
jholloway@ubuntuvm:~$ nvidia-smi
Mon Apr  7 16:16:51 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P4000                   Off | 00000000:00:06.0 Off |                  N/A |
| 46%   35C    P8               5W / 105W |      4MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

So far it is working well. Plex running on the VM detects the card and is performing hardware transcoding as expected. It will even show the transcoding in nvidia-smi if I'm watching something on Plex at the time!

I have encountered one apparent bug where the VM didn't shut down properly and the nvidia driver was then unable to find the card. Even turning the VM off and on via the host didn't solve the problem. Fortunately, when I rebooted the host and started the VM up again, the problem had solved itself. I'm not sure what caused this bug, as I have been unable to reproduce it, but I suspect it has something to do with the Ubuntu VM being reset from inside the VM, or being shut off via the command vm restart ubuntu_vm.

But aside from that hiccup, both the GPU passthrough and the Mellanox passthrough are working well.
 
Just popping in with my experience.

I'm using FreeBSD 14.2 to run an Ubuntu virtual machine and have had success with the following:

I have encountered one apparent bug where the VM didn't shut down properly and the nvidia driver was then unable to find the card. Even turning the VM off and on via the host didn't solve the problem. Fortunately, when I rebooted the host and started the VM up again, the problem had solved itself. I'm not sure what caused this bug, as I have been unable to reproduce it, but I suspect it has something to do with the Ubuntu VM being reset from inside the VM, or being shut off via the command vm restart ubuntu_vm.

But aside from that hiccup, both the GPU passthrough and the Mellanox passthrough are working well.
Nice! I haven't had this problem (I've been running with an OEM Dell 3090 for over a year now), which is a bit annoying, as I can't figure out what's going wrong. I saw Corvin mention the issue in a presentation somewhere.

I now tried applying this on top of the other patches and building world and kernel as per the handbook quickstart instructions. My complete diff against the current head of releng/14.2 (ac2cbb46b5f1efa7f7b5d4eb15631337329ec5b2) is attached as nvidiadiff.txt.

Sadly, it does not seem to work. I ran `vm start nobara` after the reboot and it worked well. Then I shut down the VM from within the VM and ran `vm start nobara` again from the host, but I get no signal on my display.

The VM is alive: I have an ollama server running there, and I can connect to it and hear my GPU fans spin up as I run prompts. But there is no signal on the display.

Let me know if I can help with any further testing, or if I understood the assignment incorrectly.

Here are my VM config and the devices from pciconf:

Code:
loader="uefi"
cpu=12
cpu_sockets=1
cpu_cores=6
cpu_threads=2
memory=16400M

#graphics="yes"
#debug="yes"

ahci_device_limit="8"
network0_type="virtio-net"
network0_switch="public"
disk0_name="disk0"
disk0_dev="sparse-zvol"
disk0_type="virtio-blk"
passthru0="14/0/0=6:0"
passthru1="14/0/1=6:1"
passthru2="5/0/0=8:0"

bhyve_options="-A -H -P"

uuid="dfdcb19f-d45f-11ef-95d9-244bfe8deecc"
network0_mac="58:9c:fc:0c:d5:bc"


Code:
ppt1@pci0:14:0:0:    class=0x030000 rev=0xa1 hdr=0x00 vendor=0x10de device=0x2504 subvendor=0x1043 subdevice=0x881d
    vendor     = 'NVIDIA Corporation'
    device     = 'GA106 [GeForce RTX 3060 Lite Hash Rate]'
    class      = display
    subclass   = VGA
ppt2@pci0:14:0:1:    class=0x040300 rev=0xa1 hdr=0x00 vendor=0x10de device=0x228e subvendor=0x1043 subdevice=0x881d
    vendor     = 'NVIDIA Corporation'
    device     = 'GA106 High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA

ppt0@pci0:5:0:0:    class=0x0c0330 rev=0x03 hdr=0x00 vendor=0x1912 device=0x0014 subvendor=0x1912 subdevice=0x0014
    vendor     = 'Renesas Electronics Corp.'
    device     = 'uPD720201 USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
Did you enable the quirk with
echo 'debug.acpi.quirks="24"' >> ${DESTDIR}/boot/loader.conf
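
After a reboot, kenv(1) can confirm the tunable made it into the kernel environment:

Code:
# verify the loader tunable is set
kenv debug.acpi.quirks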

I'm gonna see if I can find another nvidia card with the issue so I can track this down.
 
I am running 14.2-RELEASE. I followed the steps in https://dflund.se/~getz/Notes/2024/freebsd-gpu/ with the branch 14.2/nvidia-wip, and replaced the signature with 'KVMKVMKVM\0\0\0'. I pinned two host cpus to guest vcpus and passed the GPU through. nvidia-smi works under the 535, 550, and 570 drivers:

Code:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650        Off |   00000000:00:01.0 Off |                  N/A |
| 50%   33C    P8              7W /   75W |       1MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                       
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
But checking with cuda_check.cu failed.
Code:
Found 1 device(s).
Device: 0
  Name: NVIDIA GeForce GTX 1650
  Compute Capability: 7.5
  Multiprocessors: 14
  CUDA Cores: 896
  Concurrent threads: 14336
  GPU clock: 1665 MHz
  Memory clock: 4001 MHz
  cMemGetInfo failed with error code 201: invalid device context

EDIT:
1. I also checked with pytorch; the symptoms are similar to KVM vfio PCI passthrough (see "No process using GPU, but `CUDA error: all CUDA-capable devices are busy or unavailable`").
2. In TrueNAS Scale (which probably uses KVM), GPU passthrough failed until "CPU Mode" was changed to "Host Mode", indicating this is related to the CPU information bhyve provides to the guest.
 
Hello Mario, thanks for the information.

Let me note that at first I had deleted my /usr/src folder, so I had to start off by running:

> git clone https://github.com/beckhoff/freebsd-src /usr/src

After that, your script worked fine; before, it said there was no git repository at that location.

I can confirm that it worked and helped me get to the graphical desktop in my MX Linux (a Debian 12/bookworm-based distro without systemd).


One thing I find puzzling is that, once I updated to 14.3, I could still boot my Windows 10 VM with NVIDIA passthrough to the desktop, although it seemed to crash after a few minutes. When I tried to boot another VM with MX Linux, it showed the IRQ errors.

Right now (with the patched bhyve from 14.2 installed), it shows error 43 as seen from the VNC connection; before, it ran fine with video output on the attached monitor and via Sunshine.

Is it possible that I still have to include the fix with the KVMKVMKVM string in my case, since I use both Windows and Linux VMs?

Finally, thanks for documenting your work over the past years; I've found a lot of your posts very useful.

Daniel

Edit: Mario, I'm not sure using the 14.2 branch with 14.3 is advisable; I ended up with a 14.2 kernel and a 14.3 userland, which I'm fairly sure is not supported. I did a freebsd-update fetch + install cycle, which landed me at 14.2-RELEASE-p3.

Any clue when a 14.3 branch is to land? Or how bad an idea it would be to track 15.0-CURRENT for this?

Are these patches planned for incorporation into the mainline by the time of 15.0-RELEASE?
 
One thing I find puzzling is that, once I updated to 14.3, I could still boot my Windows 10 VM with NVIDIA passthrough to the desktop, although it seemed to crash after a few minutes. When I tried to boot another VM with MX Linux, it showed the IRQ errors.

Right now (with the patched bhyve from 14.2 installed), it shows error 43 as seen from the VNC connection; before, it ran fine with video output on the attached monitor and via Sunshine.

You should change the string inside the file x86.c; otherwise Windows does not accept the nvidia GPU, reporting error 43.
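
For reference, the edit looks roughly like this (a sketch; the exact stock string may differ between branches), followed by rebuilding and reinstalling vmm.ko:

Code:
# in /usr/src/sys/amd64/vmm/x86.c, the guest-visible hypervisor signature:
#   static const char bhyve_id[12] = "bhyve bhyve ";
# is reportedly changed to mimic KVM:
#   static const char bhyve_id[12] = "KVMKVMKVM\0\0\0";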

You can back up some important bhyve files (vmm.ko; bhyve*) that you compiled for bhyve 14.2, so that when you upgrade 14.2 to 14.3 you can swap the newly produced bhyve files for the older ones. I don't think Corvin will rebase his patches for 14.3.
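
For example (an illustrative sketch; these are the stock install locations on 14.x):

Code:
# stash the patched 14.2 artifacts before upgrading
mkdir -p /root/bhyve-14.2-backup
cp /boot/kernel/vmm.ko /root/bhyve-14.2-backup/
cp /usr/sbin/bhyve /usr/sbin/bhyvectl /usr/sbin/bhyveload /root/bhyve-14.2-backup/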
 
I have encountered one apparent bug where the VM didn't shut down properly and the nvidia driver was then unable to find the card. Even turning the VM off and on via the host didn't solve the problem. Fortunately, when I rebooted the host and started the VM up again, the problem had solved itself. I'm not sure what caused this bug, as I have been unable to reproduce it, but I suspect it has something to do with the Ubuntu VM being reset from inside the VM, or being shut off via the command vm restart ubuntu_vm.

But aside from that hiccup, both the GPU passthrough and the Mellanox passthrough are working well.
I encountered exactly the same problem with an Ubuntu guest; after changing to Debian, I can restart the VM without any issues.
 
Put all the files in one directory, change the extension from txt to sh, and then run:

Code:
./build_branch.sh origin/phab/corvink/14.2/nvidia-wip --without-bhf --verbose
Hi ZioMario, are there any other steps needed after running your scripts? I have a VM working with the Nvidia passthrough. I even installed Plex, and I am able to select the graphics card for hardware transcoding. But the VM seems to lock up across all CPUs; I am getting "watchdog: BUG: soft lockup - CPU#1 stuck for 839s". When I start transcoding a video, I run nvidia-smi and it shows that the graphics card isn't in use. It is also very slow at transcoding and then locks up. I have a Tesla P4 graphics card. Please let me know which logs I should look into. Thanks
 
Hi ZioMario, are there any other steps needed after running your scripts? I have a VM working with the Nvidia passthrough. I even installed Plex, and I am able to select the graphics card for hardware transcoding. But the VM seems to lock up across all CPUs; I am getting "watchdog: BUG: soft lockup - CPU#1 stuck for 839s". When I start transcoding a video, I run nvidia-smi and it shows that the graphics card isn't in use. It is also very slow at transcoding and then locks up. I have a Tesla P4 graphics card. Please let me know which logs I should look into. Thanks

I have no idea. I suggest you open a bug report, since it seems there is a bug.
 
I encountered exactly the same problem with an Ubuntu guest; after changing to Debian, I can restart the VM without any issues.
That's interesting news!

@all: Do you use a GPU ROM file, or is this not needed for your card?
I use a GT1030; I checked now, and it works without a ROM file with my ubuntu-vm, NVidia driver 570. nvidia-smi is ok and cuda_check.cu is ok.

I did a 'reboot' from inside the vm, then had to start it via the host... and the NVidia device is not usable. Tried then with a ROM, same result. So the ROM has nothing to do with the vm-restart bug. Maybe... (?)
So I have to stick with the procedure I got used to: "power down host and restart..."
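
For anyone who still wants to test with a ROM: bhyve(8) on recent releases documents a rom= option on the passthru device. A fragment of a raw bhyve invocation (slot number and path are illustrative):

Code:
# pass a ROM file along with the device
#   -s 6:0,passthru,1/0/0,rom=/path/to/gpu.rom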
 
Tried then with a ROM, same result. So the ROM has nothing to do with the vm-restart bug. Maybe... (?)
The GT1030 is, IIUC, a Pascal-generation GPU, which doesn't have a GSP (GPU System Processor) in it. And the firmware image files are only for the GSP. So it shouldn't matter in your case (there is nowhere for it to be transferred into the GPU itself; it is just loaded into OS-side memory as a dummy kernel module).

GSPs are incorporated in Turing-generation GPUs and later only (note that the "Tesla" branding on cards like the Tesla P4 is a product line, not that generation; the P4 is Pascal).
 
The GT1030 is, IIUC, a Pascal-generation GPU, which doesn't have a GSP (GPU System Processor) in it. And the firmware image files are only for the GSP. So it shouldn't matter in your case (there is nowhere for it to be transferred into the GPU itself; it is just loaded into OS-side memory as a dummy kernel module).

GSPs are incorporated in Turing-generation GPUs and later only (note that the "Tesla" branding on cards like the Tesla P4 is a product line, not that generation; the P4 is Pascal).

Corvin said multiple times that ROM should not be used.
 
If loading the GSP firmware breaks GPU passthrough, but not loading it breaks X11, that would be unfortunate for users who have one GSP-equipped GPU for the local display and one or more GSP-equipped GPUs to pass through to guests.
AFAIK, there's no option (at load time) to control loading the GSP firmware on a per-GPU basis. If the GPU used locally doesn't have a GSP (like my Quadro P1000), it's safe not to load the GSP firmware.
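
On the guest side, the NVIDIA driver does expose a global (not per-GPU) knob for GSP offload, at least on Linux with the closed kernel modules (the open modules require the GSP). A sketch for a Debian/Ubuntu guest:

Code:
# disable GSP firmware offload globally (closed NVIDIA driver only)
echo 'options nvidia NVreg_EnableGpuFirmware=0' > /etc/modprobe.d/nvidia-gsp.conf
update-initramfs -u   # then reboot the guest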
 
I have no idea. I suggest you to open a bug report,since it seems there is a bug.
I mean, there is no step in your script that installs the kernel or builds world, so I have a feeling I am doing something wrong. I have never built a custom kernel before, so I am new to this. Thanks.
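
For reference, the abridged rebuild cycle from the Handbook (run as root from /usr/src; see the Handbook's "Updating FreeBSD from Source" section for the full procedure):

Code:
cd /usr/src
make -j"$(sysctl -n hw.ncpu)" buildworld buildkernel
make installkernel
shutdown -r now
# after the reboot:
cd /usr/src
etcupdate -p
make installworld
etcupdate
shutdown -r now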
 
The GT1030 is, IIUC, a Pascal-generation GPU, which doesn't have a GSP (GPU System Processor) in it. And the firmware image files are only for the GSP. So it shouldn't matter in your case (there is nowhere for it to be transferred into the GPU itself; it is just loaded into OS-side memory as a dummy kernel module).

GSPs are incorporated in Turing-generation GPUs and later only (note that the "Tesla" branding on cards like the Tesla P4 is a product line, not that generation; the P4 is Pascal).
T-Aoki, I have a Tesla P4 graphics card; do you know if I need these firmware image files with this card?
 