Dell AMD R7525 install fails

Hello,

We have a Dell R7525 server with 2 Honeybadger NVMe cards (LQD4500) and a Dell 355e SAS HBA connected to a JBOD of disks. There is also a PERC H755 RAID controller for the front-end array (a pair of mirrored SSDs for boot/root) and a dual-port Broadcom NIC.

We have been unable to install FreeBSD on the server when all of the above is connected.

I've seen others reporting looping messages of "Out of chain frames, consider increasing hw.mpr.max_chains" and tried the fixes that worked for some of them, but none of them work for us on FreeBSD 13.2 or 13.1. Both versions loop endlessly (I left it running overnight once), enumerating the SAS devices, stalling, resetting, and starting over.

From my understanding hw.mpr.max_chains isn't part of the kernel source anymore, but I could be wrong. I get an OID-not-found error when attempting to query the value it's set to via sysctl.

To get a successful install we need to disable either the Honeybadger cards or the SAS HBA; both cannot be enabled at the same time. After installing and re-enabling whichever one was disabled, we get into that loop. (The same loop happens if we attempt to install with both enabled.)

BIOS and firmware are up to date as of today on all components.

We tried both legacy and UEFI, trusted boot is off.

Neither Liqid nor Dell lists FreeBSD on its supported hardware matrix for the pieces involved. We need to move forward with setting this service up and were hoping to use FreeBSD. I figured I should ask here as a last-ditch effort.

Does anyone have any ideas on how to resolve this, or a workaround?


Below is the output of the lspci command from Linux (which installs and operates as expected, with all option cards enabled). I removed most of the lines.

Code:
[root@backup01 ~]# sed -e '/AMD/d' -e '/^$/d' 7525.txt
root@storage2[/mnt/stg2pool1/storage2]# lspci
01:00.0 RAID bus controller: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx
25:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
26:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
26:04.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
26:1c.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
27:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
28:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
28:04.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
28:08.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
28:0c.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
28:10.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
28:14.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
28:18.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
28:1c.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
29:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
2a:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
2b:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
2c:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
2d:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
2e:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
2f:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
30:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
31:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
33:00.0 Mass storage controller: Broadcom / LSI Device c010 (rev b0)
41:00.0 Serial Attached SCSI controller: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
61:00.0 PCI bridge: PLDA PCI Express Bridge (rev 02)
62:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
63:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
63:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
e1:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e2:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e2:04.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e2:1c.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e3:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e4:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e4:04.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e4:08.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e4:0c.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e4:10.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e4:14.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e4:18.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e4:1c.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
e5:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
e6:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
e7:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
e8:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
e9:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
ea:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
eb:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
ec:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
ed:00.0 PCI bridge: Broadcom / LSI Device c010 (rev b0)
ef:00.0 Mass storage controller: Broadcom / LSI Device c010 (rev b0)
root@storage2[/mnt/stg2pool1/storage2]#

Here are a few screenshots of when it gets stuck in the loop.

hw.mpr.max_chains.png


If I 'disable boot driver' in the BIOS on the SAS card slot:
Screenshot 2023-09-18 at 9.12.54 PM.png

And that goes on forever too (Root mount waiting for CAM). I let it sit for hours.

Screenshot 2023-09-19 at 5.04.55 PM.png


Sometimes it panics. The above is from an attempt where I unplugged the SAS JBOD to see if I could break the loop. (I noticed it mentions a Linux driver in the output and was wondering what's up with that.)


Screenshot 2023-09-19 at 5.16.27 PM.png


The above is after unplugging the SAS JBOD, but without the panic. :) It seems that the first time we make a change (in this case unplugging the JBOD) the system panics on its first boot, then subsequent boots do something different (but I can't confirm this is consistent behaviour).


Thanks in advance for any thoughts or ideas. Happy to provide further info or the output of any commands.

-greg
 
Have you tried turning off MSI-X?

Code:
     To disable MSI-X interrupts for all mpr driver instances, set this
     tunable value in loader.conf(5):

           hw.mpr.disable_msix=1

     To disable MSI-X interrupts for a specific mpr driver instance, set this
     tunable value in loader.conf(5):

           dev.mpr.X.disable_msix=1
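
Since the installer boots before there is a loader.conf to edit, the tunable would probably have to be set by hand for the install attempt, by escaping to the loader prompt from the boot menu. A minimal sketch, assuming the stock FreeBSD loader (the "OK" at the start of each line is the loader prompt, not something to type); this is untested on the hardware in this thread:

```
OK set hw.mpr.disable_msix=1
OK boot
```

As far as I can tell, the same `set` approach should also work for trying the hw.mpr.max_chains tunable that mpr(4) documents, with a value of your choosing. If the install then succeeds, the line would need to go into /boot/loader.conf on the installed system to persist across reboots.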
 
hi!

>> "What about the latest beta of 14.0?"

I have not tried it. I'm not keen on using a beta release in production, so I wouldn't be comfortable there. Once this machine is in production, getting maintenance windows can take months.

>> "Is it possible that the firmware is v24 and the driver is v23?"

Totally possible, and the output indicates that as well, but I'd like to think newer firmware is somewhat backwards compatible with older drivers. I don't think I can downgrade the firmware, but I could be wrong.

>> "Have you tried turning off MSI-X?"

I have not, but if I get the chance to attempt a FreeBSD install again I'll give that a go. I had to go with a Linux install and am 400TB into a 600TB migration. Maybe next week if it finishes up. Thanks for your suggestion, SirDice.

appreciate everyone's time and input!

thank you,
-greg
 
@dafibble not on the FreeBSD front. I tried for a few days, asked a few people, and posted here. Had to give up, unfortunately. Ended up installing Linux, which went on without issue. :(

I may revisit it but it's in production for now.

take care,
-greg
 