PCI Passthrough of VFs Crash Host Card if Jails + Bhyve in Use

I've been dealing with an issue for about 6 months now. Curious if anyone has ideas how to expand troubleshooting.

Summary of issue:
  • Use any SR-IOV capable network card on a Supermicro motherboard
  • Enable SR-IOV
  • Create only vnet jails using SR-IOV VFs: works great
  • Create only bhyve VMs using SR-IOV VFs: works great
  • Create BOTH vnet jails AND bhyve VMs using SR-IOV VFs: works for about 3 minutes after starting the 2nd thing, then the host card network functionality stops. (By "2nd thing" I mean if you start the jails and also have a VF passed through to bhyve that's not started, it will work; but once you start the bhyve VM and the VF gets the driver assigned, the clock is ticking until failure.)
  • Has existed in 13.1-RELEASE through 14-STABLE (possibly earlier, that's just when I noticed it).
  • I have submitted a bug (273372) [1]
  • I have written up exactly what I do to trigger the bug [2]
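For concreteness, a minimal reproduction along these lines might look like the following sketch (the interface name ixl0, the PCI address, VF numbering, and the VM name are all assumptions; the write-up in [2] has the exact steps):

```shell
# Create VFs on the PF via iovctl(8); device name ixl0 is an assumption
cat > /etc/iov/ixl0.conf <<'EOF'
PF {
	device : "ixl0";
	num_vfs : 4;
}
DEFAULT {
	passthrough : false;
}
VF-1 {
	passthrough : true;    # reserved for bhyve passthrough
}
EOF
iovctl -C -f /etc/iov/ixl0.conf

# 1) A vnet jail using a host-side VF: works fine on its own
jail -c name=j0 vnet vnet.interface=ixl0vf0 persist

# 2) A bhyve VM with the passthrough VF (PCI address 2/0/17 is an
#    assumption; it must also be listed in pptdevs in loader.conf).
#    Once BOTH are running, the host NIC dies after ~3 minutes.
bhyve -S -s 0,hostbridge -s 1,lpc -s 2,passthru,2/0/17 \
      -l com1,stdio -m 2G -H guestvm
```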
Things I've tried:
  • Different motherboards and processors (all Intel + Supermicro)
  • Different NICs (Intel, Chelsio, and Mellanox)
  • Different PCIe slots
  • Various combinations of motherboard settings that have anything to do with IOMMU
My guess is it's something to do with the IOMMU and how the device is passed through. The drivers work fine. SR-IOV works fine. The failure seems like something "filling up" or "spilling over," given the fairly consistent amount of time until failure. I'm wondering if anyone has advice on how to troubleshoot this further.
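A few host-side data points might narrow down whether the IOMMU is actually involved; something along these lines (driver/device names are assumptions):

```shell
# Watch the host log for DMAR/IOMMU fault messages while both the
# jails and the bhyve VM are up
dmesg | grep -i dmar

# Confirm whether bhyve's IOMMU support is enabled and initialized
sysctl hw.vmm.iommu

# Interrupt allocation on the PF and VFs; a resource leak here could
# match the fairly consistent time-to-failure
vmstat -ia | grep ixl
```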

References:
1: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273372
2: https://markmcb.com/freebsd/vs_linux/sriov_is_first_class/
 
Hey Mark,

I don't have an answer to the problem you are encountering, but I did set up something very similar to yours that is working.
I happened to be looking at how to accomplish transparent VLANs when I stumbled upon your post. Surely unrelated, but I wanted to follow up regardless.

I've got the Intel X710 hooked up to an ASRock X470D4U: https://www.asrockrack.com/general/productdetail.asp?Model=X470D4U

I'm currently only using one of the two ports on the X710. I create 64 VFs, assigning half of them for use w/ bhyve passthru and the other half for use in jails.
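For reference, a VF layout along those lines could be declared in an iovctl(8) config file roughly like this (the device name ixl0 and the exact 32/32 split are assumptions):

```
PF {
	device : "ixl0";
	num_vfs : 64;
}

DEFAULT {
	passthrough : false;    # VFs stay on the host for vnet jails
}

# Mark the upper half for bhyve passthrough, one stanza per VF, e.g.:
VF-32 {
	passthrough : true;
}
```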

From bhyve:
# uptime ; ping -c 1 192.168.35.1
5:56AM up 16 mins, 1 user, load averages: 0.28, 0.37, 0.25
PING 192.168.35.1 (192.168.35.1): 56 data bytes
64 bytes from 192.168.35.1: icmp_seq=0 ttl=64 time=0.227 ms

From a jail:
# uptime; ping -c 1 192.168.35.1
5:56AM up 7:37, 0 users, load averages: 0.88, 0.97, 0.97
PING 192.168.35.1 (192.168.35.1): 56 data bytes
64 bytes from 192.168.35.1: icmp_seq=0 ttl=64 time=0.084 ms

Now, I know this means very little to your problem, but I wanted to at least provide the board I'm using w/ success.
 
Thanks! This is actually quite insightful. All cases I'm aware of where no issues exist have been in conjunction with AMD processors and relevant motherboards. I don't have enough data to be conclusive, but it would seem a likely root cause is something Intel-specific.
 
Curious whether this boils down to the same/similar underlying issue:
 