10.3-BETA 2 zroot doesn't boot with VT-d enabled and vmm in loader.conf

dehrmann

Member

Reaction score: 2
Messages: 43

I installed FreeBSD 10.3-BETA2 on a system with a Z170 chipset and a Skylake CPU, with ZFS on the root device. When VT-d is enabled in the BIOS and vmm (for bhyve) is loaded from loader.conf, sometime during boot there's an unrecoverable error on my SATA hard drive; the disk is reattached, but the boot was interrupted and the loader can't find the ZFS root. If either VT-d is disabled or vmm isn't loaded from loader.conf, the system boots fine.

Here's a screenshot. The interesting part is probably
Code:
CAM status: CCB request was invalid
Error 22. Unretryable error.
<disk> detached
Cannot find the pool label for 'zroot'
Mounting from zfs:zroot/ROOT/default failed with error 5.


 

tobik@

Daemon
Developer

Reaction score: 1,419
Messages: 1,909

Maybe loading vmm from /etc/rc.conf (i.e. later) helps:
Code:
kld_list="vmm"
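For reference, kld_list is processed by the kld rc.d script after local filesystems are mounted, so vmm attaches well after the SATA controller is up. It takes a space-separated list, so other bhyve-related modules can go on the same line (nmdm here is just an example, for null-modem guest consoles):

```shell
# /etc/rc.conf -- modules loaded post-mount instead of from the loader
kld_list="vmm nmdm"
```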
 
OP
D

dehrmann

Member

Reaction score: 2
Messages: 43

It might (actually, it almost has to). I'll give it a try. It's probably The Right Thing® to do, anyway:
Code:
kld_list    (str) A list of kernel modules to load right after the
            local disks are mounted.  Loading modules at this point
            in the boot process is much faster than doing it via
            /boot/loader.conf for those modules not necessary for
            mounting local disk.

rc.conf(5).
 
OP
D

dehrmann

Member

Reaction score: 2
Messages: 43

It booted. Now I need to see if bhyve still starts.
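A quick smoke test for that would be booting a minimal guest (a sketch only; guest.img and the vm name "testvm" are placeholders, and the slot layout is just one common arrangement):

```shell
# Load a FreeBSD guest kernel, then boot it with 1 vCPU / 512M RAM
bhyveload -m 512M -d guest.img testvm
bhyve -c 1 -m 512M -A -H -P \
    -s 0,hostbridge -s 1,lpc -s 2,virtio-blk,guest.img \
    -l com1,stdio testvm
```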

But still, specifying it in loader.conf shouldn't cause this problem.
 
OP
D

dehrmann

Member

Reaction score: 2
Messages: 43

It works if I don't use PCI passthrough. Once I try PCI passthrough, bhyve exits with code 1.

The PCI passthrough docs say
Set up vmm.ko to be preloaded at boot-time.
  • edit /boot/loader.conf and put in a line
Code:
vmm_load="YES"

So maybe it really does need to be in /boot/loader.conf.
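For comparison, a passthrough setup driven entirely from /boot/loader.conf would look something like this (the 2/0/0 bus/slot/function selector is a placeholder; the real value comes from pciconf -lv on your system):

```shell
# /boot/loader.conf -- hypothetical example; replace 2/0/0 with the
# selector of the device to hide from the host for passthrough
vmm_load="YES"
pptdevs="2/0/0"
```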
 
OP
D

dehrmann

Member

Reaction score: 2
Messages: 43

It turns out that removing pptdevs="..." also prevents the problem, but I'm not sure whether that particular PCI device is the culprit or whether having any pptdevs entry at all triggers some code path. I'll try a different device.

It looks like my problem is an interrupt storm that triggers a PCI controller reset (and that, in turn, causes the failures on the SSD).

Code:
Root mount waiting for: usbus1 usbus0
uhub1: 4 ports with 4 removable, self powered
uhub0: 26 ports with 24 removable, self powered
Root mount waiting for: usbus0
pci_interrupt: host controller halted
pci_interrupt: host controller halted
...

I have the IRQ of the device, so maybe I'll be able to see which one is causing this.
 
OP
D

dehrmann

Member

Reaction score: 2
Messages: 43

The issue still exists on RC1, and the root fs type doesn't matter; it also happens with a UFS root.
 
Top