Dell PERC H730P Mini RAID card failure caused ZFS boot failure

FreeBSD 11.1, Dell PowerEdge R730, PERC H730P Mini RAID card (8 disks, RAID 6)
Two virtual volumes: one for the zroot boot pool, one for user home data (both are ZFS file systems)

One day the PowerEdge R730 server failed and auto-rebooted, then hung on this screen (BIOS version 2.7.1):

[attachment: 08.JPG]


I sent the Dell Lifecycle hardware log to Dell support; they told me to try updating the RAID card firmware.

In the Dell Lifecycle Controller, I found the RAID firmware was 25.5.3.0005, A11, and tried to update it to 25.5.5.0005, A13. The update appeared to complete, but after the Lifecycle Controller rebooted, the firmware version had not changed. A second attempt gave the same result. It seems the firmware can't be updated.

Dell sent a new card (firmware 25.5.5) and we replaced the old one. The new card booted FreeBSD normally, but after about 2 hours of uptime it displayed this screen:
[attachment: 07.JPG]


Ctrl+Alt+Del rebooted the system, and the server hung on the first screen again:
[attachment: 08.JPG]



I called Dell again, and they sent a second RAID card. After replacing it and booting the server, FreeBSD now displays this screen:
--------------
>>FreeBSD EFI boot block
Loader path:/boot/loader.efi

Initializing modules:ZFS UFS
Probing 7 block devices.......*. done
ZFS found the following pools: zroot
UFS found no partitions
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot
Failed to load '/boot/loader.efi'
panic: No bootable partitions found!


All 8 disk lights blink normally.
So what should I do first? Are there any reference documents I should read?
If I reinstall FreeBSD on the first volume, can I restore the second volume's files (user home data)?
Both volumes are ZFS.
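
If the data pool itself survived, a fresh install can normally re-import it with zpool import. A minimal sketch, assuming the second pool is named data (the actual pool name isn't given above):

Code:
# List pools that are visible on the disks but not yet imported
zpool import

# Import the data pool (the name "data" is an assumption) under /mnt
zpool import -f -R /mnt data

# Verify the pool and its datasets
zpool status data
zfs list -r data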
 
I think what you are experiencing here is that FreeBSD uses a slightly older driver version than the newer card's firmware.
The FreeBSD driver "updates" the firmware on the card to version 24 and you end up stuck, while the boot firmware is still version 25.
What you need to do is flash the Dell MiniSAS module with the same firmware version that the FreeBSD driver uses.
I think FreeBSD uses LSI driver ver. 24, and ver. 25 has been out for a while. In fact, I see version 25 in your notes.
That controller is pretty new, so maybe the firmware is ahead of the FreeBSD driver.
That is my best guess.
Over on https://forums.servethehome.com I saw a very similar problem on the same Dell R730 machine.

If I were you, I would snoop around and see what driver version FreeBSD 12 Beta 1 ships with. It might fix you right up.
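
For what it's worth, the driver and firmware versions in play are easy to read off a running system. A quick sketch (whichever of mfi(4) or mrsas(4) claimed the controller will show up):

Code:
grep -iE 'mfi|mrsas' /var/run/dmesg.boot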
 
I will defer to you, as you have much greater knowledge. I cannot find the post I had read; perhaps it was old.
I still would consider a driver/firmware mismatch possible.

The other possibility is that this has nothing to do with FreeBSD and is simply a product of different firmware on the two cards.
From my reading, Dell's R730 will only allow RAID; no HBA mode. So you have to set up 8 separate RAID0 arrays.
So maybe the RAID0 arrays need rebuilding because of the firmware differences between the two H730 cards used.
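
Before rebuilding anything, the controller's view of the arrays can be checked from FreeBSD with mfiutil(8), for instance:

Code:
mfiutil show volumes   # virtual disks as the controller presents them
mfiutil show drives    # physical disks and their states
mfiutil show config    # the full controller configuration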
He is using the older mfi driver too, as you note. That would be the first thing to correct.
EDIT:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230557

I do see a note that Dell might now support IT (HBA) mode on the H730P.
 
I still would consider a driver/firmware mismatch possible.

The other possibility is that this has nothing to do with FreeBSD and is simply a product of different firmware on the two cards.
I haven't run into issues like these myself, but it's certainly a possibility. Some firmware versions definitely have bugs.

I'm not sure how 'different' the Dell cards are in relation to the original LSI cards. I've mainly used original LSI cards in SuperMicro machines.
 
You definitely nailed the problem for the OP.

I just finished flashing an LSI SAS2008 that is embedded on the motherboard.
Running sas2flash.efi -o -e 7 was scary: it wipes out the whole shebang.
Booting from a UEFI thumb drive was a blast. I cheated and renamed rom.nsh to startup.nsh.
I guess that's UEFI's version of autoexec.bat.
It ran my flasher for me automagically...
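
For the curious, that renamed file is just a UEFI shell script that runs automatically at shell startup. A rough sketch of what such a startup.nsh can look like (the .bin and .rom file names are illustrative, not from this thread):

Code:
# startup.nsh - executed automatically by the UEFI shell
fs0:                                             # switch to the thumb drive
sas2flash.efi -o -e 7                            # erase the flash (the scary part)
sas2flash.efi -o -f 2118it.bin -b mptsas2.rom    # write IT firmware and boot ROM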
 
I got my versions wrong too. The FreeBSD driver is at ver. 21, and the latest SAS2008 firmware is version 20, from 4 years ago.
I need to check my SAS3008s and compare versions. You've got me curious now.

Code:
mps0: <Avago Technologies (LSI) SAS2008> port 0xb000-0xb0ff at device 0.0 on pci2
mps0: Firmware: 20.00.04.00, Driver: 21.02.00.00-fbsd
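
If you'd rather not dig through dmesg, I believe the mps(4) driver also exposes the firmware version as a sysctl. A quick check, assuming unit 0:

Code:
sysctl dev.mps.0.firmware_version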
 
Reviving this after a failure involving a PERC H730P.

This is what I see in /var/run/dmesg.boot:
Code:
AVAGO MegaRAID SAS FreeBSD mrsas driver version: 06.712.04.00-fbsd
mfi0: <Invader> port 0x2000-0x20ff mem 0x91d00000-0x91d0ffff,0x91c00000-0x91cfffff at device 0.0 numa-domain 0 on pci3
mfi0: Using MSI
mfi0: Megaraid SAS driver Ver 4.23
mfi0: FW MaxCmds = 928, limiting to 128
mfi0: MaxCmd = 928, Drv MaxCmd = 128, MaxSgl = 70, state = 0xb73c03a0
mfi0: 1782 (601573339s/0x0020/info) - Shutdown command received from host
mfi0: 1783 (boot + 10s/0x0020/info) - Firmware initialization started (PCI ID 005d/1000/1f47/1028)
mfi0: 1784 (boot + 10s/0x0020/info) - Firmware version 4.270.00-8178
mfi0: 1785 (boot + 11s/0x0008/info) - Battery Present
mfi0: 1786 (boot + 11s/0x0020/info) - Package version 25.5.3.0005
mfi0: 1787 (boot + 11s/0x0020/info) - Board Revision A09
mfi0: 1788 (boot + 15s/0x0008/info) - Battery temperature is normal
mfi0: 1789 (boot + 15s/0x0008/info) - Current capacity of the battery is above threshold
mfi0: 1790 (boot + 16s/0x0004/info) - Enclosure PD 20(c None/p1) communication restored
mfi0: 1791 (boot + 16s/0x0002/info) - Inserted: Encl PD 20

ADDED in case this is useful information:
Code:
% sudo mfiutil show adapter
mfi0 Adapter:
    Product Name: PERC H730P Mini
   Serial Number: 7C802O2
        Firmware: 25.5.3.0005
     RAID Levels:
  Battery Backup: present
           NVRAM: 32K
  Onboard Memory: 2048M
  Minimum Stripe: 64K
  Maximum Stripe: 1M

I am looking into switching to mrsas(4). After adding hw.mfi.mrsas_enable="1" to /boot/loader.conf, are any other changes necessary, or will the pool still function after a reboot?
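
For reference, that tunable comes from mrsas(4), and the loader.conf change is just the one line:

Code:
# /boot/loader.conf
hw.mfi.mrsas_enable="1"    # let mrsas(4) claim the controller instead of mfi(4)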

It looks like there is a recommended firmware update.
 
I am looking into switching to mrsas(4). After adding hw.mfi.mrsas_enable="1" to /boot/loader.conf, are any other changes necessary, or will the pool still function after a reboot?
You may have to tell the bootloader to boot from da0 instead of mfid0. ZFS isn't going to care about the name change; for UFS you will want to adjust /etc/fstab prior to rebooting, or else nothing will be able to mount. Unless you used labels for UFS; then it won't matter what the drives are called either.
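
A minimal sketch of the fstab adjustment meant here, assuming a UFS filesystem that was mounted by its mfi(4) volume name (the partition and mount point are illustrative):

Code:
# /etc/fstab entry before, under mfi(4)
/dev/mfid0p2   /home   ufs   rw   2   2

# the same entry after switching to mrsas(4), which exposes disks via CAM as da(4)
/dev/da0p2     /home   ufs   rw   2   2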
 
Switching from mfi(4) to mrsas(4) was indeed as simple as adding the one line to /boot/loader.conf and rebooting. Hopefully this change, plus updating to the latest RAID controller firmware, solves this.
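
It's easy to confirm the switch took effect after the reboot (unit numbers will be whatever the system assigns):

Code:
grep -i mrsas /var/run/dmesg.boot   # mrsas0 should now attach to the controller
camcontrol devlist                  # the RAID volumes now appear as da(4) devices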
 