VMware Broadcom P225P SR-IOV inconsistent behavior/no longer loading

Hi all,

This is largely a copy/paste from a couple of other product-specific forums, but I've come to the horse's mouth, as it were.

I've been using OPNSense and TrueNAS in a home lab environment for some time now, but I'm in the midst of building a new box and the setup is killing me here.

I'll try to be brief here.

In short, the new HW is an Epyc 7282 with a Supermicro H12SSL-i board running ESXi 7.

I have a Broadcom P225P/BCM57414 NIC with SR-IOV enabled.

The initial deployment of a new OPNSense VM (which is on FreeBSD 13) ran into some issues with the drivers for the card. It showed:

Code:
none0@pci0:11:0:0:   class=0x020000 rev=0x00 hdr=0x00 vendor=0x14e4 device=0x16dc subvendor=0x14e4 subdevice=0x16d7
    vendor     = 'Broadcom Inc. and subsidiaries'
    device     = 'NetXtreme-E Ethernet Virtual Function'
    class      = network
    subclass   = ethernet
none1@pci0:19:0:0:   class=0x020000 rev=0x00 hdr=0x00 vendor=0x14e4 device=0x16dc subvendor=0x14e4 subdevice=0x16d7
    vendor     = 'Broadcom Inc. and subsidiaries'
    device     = 'NetXtreme-E Ethernet Virtual Function'
    class      = network
    subclass   = ethernet
none2@pci0:27:0:0:   class=0x020000 rev=0x00 hdr=0x00 vendor=0x14e4 device=0x16dc subvendor=0x14e4 subdevice=0x16d7
    vendor     = 'Broadcom Inc. and subsidiaries'
    device     = 'NetXtreme-E Ethernet Virtual Function'
    class      = network
    subclass   = ethernet

...similar to another thread I'd seen on their forums. I was able to manually load the drivers via "kldload if_bnxt" with success, and added it to loader.conf.local, also with success upon a reboot.
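
For anyone comparing notes: a device name of the form noneN@ in that listing means no driver has attached to the function. Below is a rough sketch, not anything from the OPNSense or FreeBSD tooling, of how that listing could be scanned for unattached functions. The function name unattached_devices and the regular expression are my own assumptions about the output format shown above.

```python
import re

# Header line of one entry in the listing above, e.g.
#   none0@pci0:11:0:0:   class=0x020000 ... vendor=0x14e4 device=0x16dc ...
# (assumed format, based only on the output pasted in this thread)
HEADER = re.compile(
    r"^(?P<name>[A-Za-z_]+\d+)@(?P<selector>pci\d+(?::\d+)+):\s+(?P<fields>.*)$"
)


def unattached_devices(pciconf_text):
    """Return (name, selector, vendor, device) for entries named noneN."""
    found = []
    for line in pciconf_text.splitlines():
        m = HEADER.match(line)
        if m is None or not re.fullmatch(r"none\d+", m.group("name")):
            continue
        # Turn "key=value key=value ..." into a dict.
        fields = dict(
            item.split("=", 1) for item in m.group("fields").split() if "=" in item
        )
        found.append(
            (m.group("name"), m.group("selector"),
             fields.get("vendor"), fields.get("device"))
        )
    return found
```

Fed the text above, this should report the three Broadcom VFs (vendor 0x14e4, device 0x16dc) and skip anything that already has a driver bound.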

However, upon rebooting the host, it all went to pot.

After previously starting to configure the new interfaces within the UI, and rebooting, all the newly created interfaces disappeared. I was met with the same "none0@" output above. However, upon checking to see if something didn't run as expected:
Code:
kldload if_bnxt
kldload: can't load if_bnxt: module already loaded or in kernel

I rebooted the host again, only to be met with the VM not starting up at all, as the NIC in question seemed to drop out of having SR-IOV enabled. I rebooted once more, and one port was enabled, one disabled. Both times ESXi was indicating that it WAS enabled, but required a reboot.

After this seemed to resolve itself, I was still experiencing these interfaces dropping off. I deleted the loader.conf.local file to prevent the drivers from being loaded, rebooted, and manually tried to load the drivers again, but kldload indicated they were already loaded. I don't quite know how that would have been the case.

The only thing I could see from dmesg was

Code:
bnxt0: <Broadcom NetXtreme-E Ethernet Virtual Function> mem 0xffa04000-0xffa07fff,0xff900000-0xff9fffff,0xffa00000-0xffa03fff at device 0.0 on pci5
bnxt0: Timeout sending HWRM_VER_GET: (timeout: 1000) seq: 0
bnxt0: attach: hwrm ver get failed
bnxt0: IFDI_ATTACH_PRE failed 60
device_attach: bnxt0 attach returned 60


In other attempts to boot, I saw:

Code:
bnxt0: Timeout sending HWRM_RING_ALLOC: (timeout: 2000) seq: 225
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 226
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 227
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 228
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 229
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 230
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 231
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 232
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 233
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 234
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 235
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 236
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 237
bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 238
bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 239
bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 240
bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 241
bnxt1: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 15
bnxt1: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 16
bnxt1: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 17
bnxt2: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 15
bnxt2: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 16
bnxt2: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 17
bnxt0: Timeout sending HWRM_PORT_PHY_QCFG: (timeout: 2000) seq: 242
...

Which seemed to repeat endlessly.
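
Since the log is mostly repeats, here is a small sketch that tallies the timeouts per device and HWRM command, which makes it easier to compare one boot against another. It only assumes the line format shown in the dmesg excerpts above; the name summarize_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   bnxt0: Timeout sending HWRM_FUNC_RESET: (timeout: 2000) seq: 226
TIMEOUT = re.compile(
    r"^(?P<dev>bnxt\d+): Timeout sending (?P<cmd>HWRM_[A-Z0-9_]+):"
)


def summarize_timeouts(dmesg_text):
    """Count timeout lines per (device, HWRM command) pair."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = TIMEOUT.match(line.strip())
        if m:
            counts[(m.group("dev"), m.group("cmd"))] += 1
    return dict(counts)
```

Run against the excerpt above, the bulk of the entries would land on bnxt0 with HWRM_FUNC_RESET, followed by HWRM_PORT_PHY_QCFG on all three VFs.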

I now have no idea why or how that occurred and am at a loss here. I'm guessing this is a guest issue. I see similar behavior with TrueNAS, which is on FreeBSD 12.

I've done a fresh install of the firewall software above to try to mimic the original steps, but it's still in the same boat. I've also installed a fresh vanilla instance of FreeBSD 13 to see whether it was something specific to the builds of the other products, and it's the exact same thing.

I'm sure I'm missing something here, but this side of the house is not my forte (I'm a network guy). Happy to provide any additional info required.

TIA
 
Today the first beta of 13.1-RELEASE will get built. It should be available for download soon. When it's available do you think you can test it? If you still get the issue on 13.1-BETA1 you should probably report it as a bug here: https://bugs.freebsd.org/bugzilla/
 
Of course. I'll keep an eye out for the build and report here and submit said bug report if required.

Planning ahead a bit, if it is good to go, what would be the best method to backport the fix, if that's possible?
 
What OS is ESXi running on?
It's running on ....ESXi 7?
Those network interfaces are a bad choice with that deluxe hardware.
Getcha Intel or Chelsio 10G card and save you some headache.

Perhaps a fair point...however, this suggests:

+o Broadcom BCM57414 NetXtreme-E Ethernet Virtual Function
+o Broadcom BCM57414 NetXtreme-E Partition

...which is what we've got here, and that page is specific to 12.0, which is the underlying OS in the TrueNAS case.

Now, while there's a small discrepancy in the docs...

DESCRIPTION
The bnxt driver provides support for various NICs based on the Broadcom
BCM57301/2/4, and BCM57402/4/6 Ethernet controller chips.

Although that doesn't explicitly list the 57414, I assume it is actually included.
 
As a point of process…

I wiped everything and reinstalled from scratch, ESXi included.

Updated the drivers in ESXi as before. Created all my new virtual switches and port groups and created the VM again. Specifically OPNSense.

As expected, the interfaces did not load correctly.

I simply added if_bnxt_load="YES" to loader.conf and rebooted. And it worked just as it did the first time.

However, it doesn’t survive a reboot of the VM, just as it seems to have done the first time. I’ve run out of time for the day and will pick up testing in my AM. But I’m not quite sure where to go to determine why it does actually work…once…and that’s all.
 
It's running on ....ESXi 7?

Is that your final answer? What is host OS for your ESXi?
That was the question.
ESXi is the OS/is an OS (for lack of a better term)/is the hypervisor. Unless you're referring to the guest, in which case it is OPNSense, which I'd mentioned is on FreeBSD 13.

You need to put that setting in /boot/loader.conf.local for it to remain persistent in OPNSense.
I had, as referenced in my original post. However, for the second bit of testing, I did not take that into account and quickly forgot that changes to loader.conf don't persist. That's my mistake.
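
For what it's worth, a quick sanity check on the contents of loader.conf.local can rule out the simple mistakes. The sketch below is only an illustration: check_loader_line is a made-up helper, it is deliberately strict (it accepts only the exact form if_bnxt_load="YES"), and it flags curly quotes, which can sneak in when a line is pasted from a phone or word processor and would likely not be read as intended.

```python
# Curly quote characters that sometimes replace plain ASCII quotes on paste.
SMART_QUOTES = "\u201c\u201d\u2018\u2019"


def check_loader_line(text, module="if_bnxt"):
    """Return 'ok', 'smart-quotes', or 'missing' for <module>_load in text."""
    key = module + "_load="
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line.startswith(key):
            continue
        value = line[len(key):]
        if any(q in value for q in SMART_QUOTES):
            return "smart-quotes"
        # Strict by assumption: only the exact quoted form counts as ok.
        if value == '"YES"':
            return "ok"
    return "missing"
```

The text to check would be whatever is read from /boot/loader.conf.local on the guest; anything other than "ok" is worth a second look before blaming the driver.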

In the previous case, I'd removed the loader.conf.local config full stop, but it had indicated that the module was still loaded/loading upon boot, but the interfaces were not in fact working. This does work after a guest reboot (as expected).

In this particular instance, after a proper shutdown of the VM, and a reboot of the host, the config survived.

I wish I had an answer to the previous issue. In the previous case, I believe I'd rebooted the host without properly shutting down the VM. Whether that has anything to do with it I can't say. I've done that process again, rebooting the host without taking into account the VM's state and now, it seems to be working.

It's a bit nonsensical. So I either massively screwed something up and didn't do something I thought I did do (quite possible) or something was just in an odd state to begin with. I did move some HW around...changing slots for a couple things. But I sincerely doubt that had anything to do with anything.

Apologies for the unnecessary thread here.
 
Sorry. I did not know ESXi was its own host OS.
I knew ESX used a custom Linux kernel but I was unaware they dropped that.
A truly freestanding VM OS then. I learned something new here.
 