QEMU (KVM) and shared IRQ for dual network interfaces

Edit: link to workaround



I have a (to me) strange issue with either the drivers in FreeBSD or my assumptions about PCIe and the PCI hierarchy.
Using Qemu, I have the following structure:


qemu-system-x86_64 \
-cpu host \
-enable-kvm \
-machine q35,accel=kvm \
-device intel-iommu \
-m 4096 \
-display none \
-qmp unix:/tmp/pfsense.qmp,server,nowait \
-monitor unix:/tmp/pfsense.monitor,server,nowait \
-drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd \
-drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_VARS.fd \
-spice port=5930,disable-ticketing \
-device pxb-pcie,id=pcie.1,bus_nr=1 \
\
-device pcie-root-port,bus=pcie.1,id=root_port1,slot=0,chassis=0x1,addr=0x5 \
-device e1000,mac=00:00:00:41:7f:01,id=network0.0,netdev=network0,bus=root_port1,bootindex=5 \
-netdev tap,ifname=tap1,id=network0,script=no,downscript=no \
\
-device pcie-root-port,bus=pcie.1,id=root_port2,slot=1,chassis=0x2,addr=0x10 \
-device e1000,mac=00:00:00:41:7f:02,id=network1.0,netdev=network1,bus=root_port2,bootindex=6 \
-netdev tap,ifname=tap2,id=network1,script=no,downscript=no \
\
-device ide-hd,drive=hdd0,bus=ide.0,id=scsi0,bootindex=1 \
-drive file=./pfsense.qcow2,if=none,format=qcow2,discard=unmap,aio=native,cache=none,id=hdd0 \
-device ide-cd,drive=cdrom0,bus=ide.1,id=scsi1,bootindex=2 \
-drive file=./pfSense-CE-2.5.0-DEVELOPMENT-amd64-20201221-0250.iso,media=cdrom,if=none,format=raw,cache=none,id=cdrom0


Tl;dr would be:

Code:
pcie.1
   |-- root_port1
   |          |-- tap1
   |-- root_port2
   |          |-- tap2
   |-- AHCI
           |-- hdd0
           |-- cdrom0

The issue presents itself with:

Code:
interrupt storm detected on "irq10:"; throttling interrupt source
interrupt storm detected on "irq10:"; throttling interrupt source
interrupt storm detected on "irq10:"; throttling interrupt source
interrupt storm detected on "irq10:"; throttling interrupt source

Running vmstat -i gives me:

Code:
interrupt                          total       rate
irq1: atkbd0                          72          0
irq10: em0: irq0+++              2256461       1826
irq16: ahci                        13151         11
cpu0: timer                        59774         48
Total                            2329458       1885

Looking at dmesg, it shows:
Code:
pcib2: <PCI-PCI bridge> mem 0xc1641000-0xc1641fff irq 10 at device 5.0 on pci1
em0: <Intel(R) PRO/1000 Network Connection> port 0x8000-0x803f mem 0xc1400000-0xc141ffff irq 10 at device 0.0 on pci2
pcib3: <PCI-PCI bridge> mem 0xc1640000-0xc1640fff irq 10 at device 16.0 on pci1
em1: <Intel(R) PRO/1000 Network Connection> port 0x7000-0x703f mem 0xc1200000-0xc121ffff irq 10 at device 0.0 on pci3

And pciconf -lv shows:
Code:
em0@pci0:2:0:0: class=0x020000 card=0x11001af4 chip=0x100e8086 rev=0x03 hdr=0x00
em1@pci0:3:0:0: class=0x020000 card=0x11001af4 chip=0x100e8086 rev=0x03 hdr=0x00
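
If it helps, here is what I'd pull next from the guest to see how the interrupts are actually wired up; just a sketch, assuming a shell on the pfSense box (the capability listing should show whether the emulated NICs offer MSI/MSI-X at all):

Code:
# Capability list for each NIC (look for MSI / MSI-X entries)
pciconf -lc pci0:2:0:0
pciconf -lc pci0:3:0:0

# Is FreeBSD allowed to use MSI/MSI-X at all?
sysctl hw.pci.enable_msi hw.pci.enable_msix

# Device tree with the resources (IRQs, memory windows) assigned to each node
devinfo -rv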

I'm in very deep water here, and no one I know can explain why or how the NICs end up on the same IRQ.
Does anyone know where to begin debugging this, or what's going on? I would prefer to be able to have multiple PCIe network cards on the same PCIe bus without having to reconfigure the machine manually each time, so I'm guessing there are some QEMU specifics I could pass so that FreeBSD detects and maps the devices correctly?

I've also tried x3130-upstream devices to create a switch, which caused the same issue. Having the cards on separate PCIe buses works, for instance if card 1 is on pcie.0 and card 2 is on pcie.1, but since I can only add one additional root bus, that's not ideal as a solution.
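
For reference, the switch attempt looked roughly like this; only a sketch, with made-up ids and chassis/slot numbers (each xio3130-downstream port needs its own unique chassis/slot pair):

Code:
    -device pcie-root-port,bus=pcie.1,id=root_port1,slot=0,chassis=0x1 \
        -device x3130-upstream,bus=root_port1,id=upstream1 \
            -device xio3130-downstream,bus=upstream1,id=downstream1,chassis=0x10,slot=0 \
                -device e1000,mac=00:00:00:41:7f:01,netdev=network0,bus=downstream1 \
                    -netdev tap,ifname=tap1,id=network0,script=no,downscript=no \
            -device xio3130-downstream,bus=upstream1,id=downstream2,chassis=0x11,slot=1 \
                -device e1000,mac=00:00:00:41:7f:02,netdev=network1,bus=downstream2 \
                    -netdev tap,ifname=tap2,id=network1,script=no,downscript=no \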

Here's an overview of an earlier version with just one NIC, to get a visual sense of how everything is tied together (or the goal, at least):
kvm_diagram.png


Thanks in advance!
 
I tried using virtio-net-pci, but the installer could not detect it and aborted with "pfsense needs at least one network card".

But you're saying that should work?
 
Apologies, I was a bit on the run when I wrote the last message.
What I meant (between the lines) was that FreeBSD does not detect any network cards if I use virtio-net-pci, and IIRC they were only 100Mbit/s cards as well. That last statement may be inaccurate, but I can't verify it because wiki.qemu.org appears to be down at the moment. Whereas e1000 is a 1Gbit/s card, which is what I would like :)

I'll double-check this, though.
 
FreeBSD supports virtio(4)/vtnet(4) devices, no need for the em(4) emulated devices. They've been part of the GENERIC kernel for quite some time now.
Yes, so just to verify.
Using this setup:
Code:
qemu-system-x86_64 \
    -cpu Nehalem \
    -enable-kvm \
    -machine q35,accel=kvm \
    -device intel-iommu \
    -display none \
    -qmp unix:/tmp/iFirewall.qmp,server,nowait \
    -monitor unix:/tmp/iFirewall.monitor,server,nowait  \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd  \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_VARS.fd \
    -m 4096 \
    -spice port=5930,disable-ticketing \
    -device pxb-pcie,id=pcie.1,bus_nr=1 \
    \
        -device pcie-root-port,bus=pcie.1,id=root_port1,slot=0,chassis=0x1,addr=0x5 \
            -device virtio-net-pci,mac=a8:73:ea:41:7f:01,id=network0.0,netdev=network0,bus=root_port1,bootindex=5 \
                -netdev tap,ifname=tap1,id=network0,script=no,downscript=no \
        \
        -device pcie-root-port,bus=pcie.1,id=root_port2,slot=1,chassis=0x2,addr=0x10 \
            -device virtio-net-pci,mac=a8:73:ea:41:7f:02,id=network1.0,netdev=network1,bus=root_port2,bootindex=6 \
                -netdev tap,ifname=tap2,id=network1,script=no,downscript=no \
    \
    -device ide-hd,drive=hdd0,bus=ide.0,id=scsi0,bootindex=1 \
        -drive file=./storage/machines/testmachine/machine.qcow2,if=none,format=qcow2,discard=unmap,aio=native,cache=none,id=hdd0 \
    -device ide-cd,drive=cdrom0,bus=ide.1,id=scsi1,bootindex=2 \
        -drive file=./storage/isos/pfSense-CE-2.5.0-DEVELOPMENT-amd64-20210104-0250.iso,media=cdrom,if=none,format=raw,cache=none,id=cdrom0

I'm getting:
Code:
pfSense cannot continue without at least one Network Interface Card.

And dmesg says: pci3: <network, ethernet> at device 0.0 (no driver attached)
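
In case it's useful, a couple of things I'd check from the installer shell to see whether the vtnet driver is present and whether the device is visible at all; only a sketch, assuming shell access on the pfSense installer:

Code:
# Is the vtnet driver compiled in / loaded?
kldstat -v | grep -i vtnet

# Does the virtio NIC show up on the PCI bus?
pciconf -lv | grep -B 3 -i virtio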

2021-01-07-184111_2145x1704_scrot.png


I can't find many other VirtIO network devices; is there another one besides virtio-net-pci that you're referring to?


Edit: Found that virtio-net-pci supports 1Gb/s, so that's nice, if I could get it working.
 
I have no idea what's possible on KVM, but the VPS I have also runs on KVM, so it's certainly possible:
Code:
Hypervisor: Origin = "KVMKVMKVM"
{...}
virtio_pci0: <VirtIO PCI Network adapter> port 0xc160-0xc17f mem 0xfebd1000-0xfebd1fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
{...}
vtblk0: <VirtIO Block Adapter> on virtio_pci1
vtblk0: 153600MB (314572800 512 byte sectors)
virtio_pci2: <VirtIO PCI Balloon adapter> port 0xc180-0xc19f irq 10 at device 6.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci2
vtnet(4) is typically a 10Gbit interface:
Code:
vtnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6c04bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether <mac-address>
        inet <ip> netmask 0xffffff00 broadcast <broadcast>
        inet6 <link-local>%vtnet0 prefixlen 64 scopeid 0x1
        inet6 <globalip6> prefixlen 48
        media: Ethernet 10Gbase-T <full-duplex>
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

But this is on a FreeBSD machine; I have no idea what pfSense supports or not. pfSense is a heavily modified FreeBSD derivative.
 
Thanks for the feedback, any help is greatly appreciated!
This might boil down to a difference between pfSense and FreeBSD, so I'll have to try a vanilla FreeBSD.

But the really peculiar thing is that if I swap one of the network cards (e1000) to the bus pcie.0, meaning they're on separate buses, the IRQ collisions seem to disappear. There are some other slight issues with the networking though, such as the DHCP requests not going out into the physical world correctly, but at least the IRQ issue mostly disappears. Albeit at the cost of prolonging the inevitable: if I add another card to either of these buses, the issue comes back.

Here's a "working" one:
Code:
qemu-system-x86_64 \
    -enable-kvm \
    -machine q35,accel=kvm \
    -device intel-iommu \
    -cpu Nehalem \
    -m 4096 \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_VARS.fd  \
    \
    -device pxb-pcie,id=pcie.1,bus_nr=1 \
        -device pcie-root-port,bus=pcie.1,id=root_port1,slot=0 \
            -device e1000,mac=00:00:00:00:00:02,id=network0.0,netdev=network0,bus=root_port1 \
                -netdev tap,ifname=tap1,id=network0,script=no,downscript=no \
    -device e1000,mac=00:00:00:00:00:01,id=network1.0,netdev=network1,bus=pcie.0 \
        -netdev tap,ifname=tap2,id=network1,script=no,downscript=no \
    \
    -device ide-hd,drive=hdd0,bus=ide.0,id=scsi0,bootindex=1 \
        -drive file=./storage/machines/testmachine/machine.qcow2,if=none,format=qcow2,discard=unmap,aio=native,cache=none,id=hdd0 \
    -device ide-cd,drive=cdrom0,bus=ide.1,id=scsi1,bootindex=2 \
        -drive file=./storage/isos/pfSense-CE-2.5.0-DEVELOPMENT-amd64-20210104-0250.iso,media=cdrom,if=none,format=raw,cache=none,id=cdrom0

(Just note that the MAC/tap# pairings change due to the order in which they get assigned as em0 or em1 in this case.)
 
A final note regarding pfSense before I try the vanilla one: if I set up both e1000 interfaces on the pcie.0 bus, it works.

So this is an issue with having multiple PCIe buses and/or root ports with devices assigned under them.
This, for instance, works:

Code:
qemu-system-x86_64 \
    -enable-kvm \
    -machine q35,accel=kvm \
    -device intel-iommu \
    -cpu Nehalem \
    -m 4096 \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_VARS.fd  \
    \
    -device e1000,mac=00:00:00:00:00:01,id=network0.0,netdev=network0,bus=pcie.0 \
        -netdev tap,ifname=tap1,id=network0,script=no,downscript=no \
    -device e1000,mac=00:00:00:00:00:02,id=network1.0,netdev=network1,bus=pcie.0 \
        -netdev tap,ifname=tap2,id=network1,script=no,downscript=no \
    \
    -device ide-hd,drive=hdd0,bus=ide.0,id=scsi0,bootindex=1 \
        -drive file=./storage/machines/testmachine/machine.qcow2,if=none,format=qcow2,discard=unmap,aio=native,cache=none,id=hdd0 \
    -device ide-cd,drive=cdrom0,bus=ide.1,id=scsi1,bootindex=2 \
        -drive file=./storage/isos/pfSense-CE-2.5.0-DEVELOPMENT-amd64-20210104-0250.iso,media=cdrom,if=none,format=raw,cache=none,id=cdrom0

Which is odd; I thought/think that having separate PCIe buses, each with their own PCIe root ports, should work according to the PCIe standard (unless I got that backwards), and that it would help the kernel by separating things at the hardware level.

And swapping to virtio-net-pci works too, as long as both are on pcie.0!
So there's clearly something going on with the PCIe address assignment. Having a secondary PCIe bus causes everything to go haywire.

Identical issues on FreeBSD, both regarding multiple PCIe root bus setups and the e1000 driver. And the workaround works for FreeBSD too. So now I know how to "fix" it, but I don't understand why this happens.
 
It also seems to be allowed within the FreeBSD space to assign PCIe root ports under the pcie.0 bus, meaning this works just fine:

Code:
qemu-system-x86_64 \
    -enable-kvm \
    -machine q35,accel=kvm \
    -device intel-iommu \
    -cpu Nehalem \
    -m 4096 \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_CODE.fd \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/OVMF_VARS.fd  \
    \
    -device pcie-root-port,bus=pcie.0,id=root_port1,slot=0 \
        -device virtio-net-pci,mac=00:00:00:00:00:01,id=network0.0,status=on,netdev=network0,bus=root_port1 \
            -netdev tap,ifname=tap1,id=network0,script=no,downscript=no \
    -device pcie-root-port,bus=pcie.0,id=root_port2,slot=1 \
        -device virtio-net-pci,mac=00:00:00:00:00:02,id=network1.0,status=on,netdev=network1,bus=root_port2 \
            -netdev tap,ifname=tap2,id=network1,script=no,downscript=no \
    \
    -device virtio-scsi-pci,bus=pcie.0,id=scsi0 \
        -device scsi-hd,drive=hdd0,bus=scsi0.0,id=scsi0.0,bootindex=1 \
            -drive file=./storage/machines/testmachine/machine.qcow2,if=none,format=qcow2,discard=unmap,aio=native,cache=none,id=hdd0 \
    -device virtio-scsi-pci,bus=pcie.0,id=scsi1 \
        -device scsi-cd,drive=cdrom0,bus=scsi1.0,id=scsi1.0,bootindex=2 \
            -drive file=./storage/isos/pfSense-CE-2.5.0-DEVELOPMENT-amd64-20210104-0250.iso,media=cdrom,if=none,format=raw,cache=none,id=cdrom0

As long as the PCI network card is not virtio-net-pci, because that does not support this setup, for reasons unknown. Whereas if you skip the pcie-root-port and flatten the structure, it starts working fine again.


Edit: I must have been a bit tired yesterday from banging my head on this issue. The e1000 kinda almost works, but not really. So both virtio-net-pci and e1000 need to be directly on the pcie.0 bus.
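
To summarize my reading of the results so far (based on the command lines above):

Code:
Broken:  pcie.0 -- pxb-pcie (pcie.1) -- pcie-root-port -- NIC   (e1000: IRQ storm, virtio-net-pci: no driver attached)
Broken:  pcie.0 -- pcie-root-port -- NIC                        (neither e1000 nor virtio-net-pci really works)
Works:   pcie.0 -- NIC                                          (both e1000 and virtio-net-pci work)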

I would like to hear from someone more experienced than me with FreeBSD and PCIe hierarchies whether this is "as expected" or a weird edge-case bug.
If it's a bug, I'd like to report it and have it fixed, as other OSes behave wildly differently given these setups.
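
If it does turn out to be a bug worth reporting, I'm guessing output like the following from both a working and a non-working layout would be useful to attach; just standard FreeBSD diagnostics, nothing pfSense-specific assumed:

Code:
# Full PCI listing with BARs and capabilities
pciconf -lvbc

# Interrupt counters, to show which sources end up sharing an IRQ
vmstat -i

# Relevant boot messages
dmesg | grep -E 'pcib|em[01]|vtnet'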
 