LSI SAS 9207-8i prints endless IOC Fault

I am using the following setup:
  • OS: FreeBSD-12.0-Release-p9
  • 2-port PCIe HBA with 2 SSDs attached
    • ZFS mirror. The '/' filesystem lives here and FreeBSD is installed on it.
  • 6 mainboard SATA ports
    • ZFS RAIDZ2 with 6 HDDs for private data. The '/zdata' pool is here.
I bought an LSI 9207-8i and installed it in a PCIe slot. My goal is to move the 6 disks from the mainboard ports to the 9207 HBA.
I added mps_load="YES" to loader.conf to use the card and rebooted.

mpsutil prints the following:
Code:
[root@freebsd01~] mpsutil show all
Adapter:
mps0 Adapter:
       Board Name: SAS9207-8i
   Board Assembly: H5-25412-00C
        Chip Name: LSISAS2308
    Chip Revision: ALL
    BIOS Revision: 7.39.02.00
Firmware Revision: 20.00.07.00
  Integrated RAID: no

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         6.0     1.5    6.0    SAS Initiator
1       0002        000a       N         6.0     1.5    6.0    SAS Initiator
2       0003        000b       N         6.0     1.5    6.0    SAS Initiator
3       0004        000c       N         6.0     1.5    6.0    SAS Initiator
4       0005        000d       N         6.0     1.5    6.0    SAS Initiator
5       0006        000e       N         6.0     1.5    6.0    SAS Initiator
6                              N                 1.5    6.0    SAS Initiator
7                              N                 1.5    6.0    SAS Initiator

Devices:
B____T    SAS Address      Handle  Parent    Device        Speed Enc  Slot  Wdt
00   03   4433221100000000 0009    0001      SATA Target   6.0   0001 03    1
00   02   4433221101000000 000a    0002      SATA Target   6.0   0001 02    1
00   01   4433221102000000 000b    0003      SATA Target   6.0   0001 01    1
00   00   4433221103000000 000c    0004      SATA Target   6.0   0001 00    1
00   05   4433221104000000 000d    0005      SATA Target   6.0   0001 07    1
00   04   4433221105000000 000e    0006      SATA Target   6.0   0001 06    1

Enclosures:
Slots      Logical ID     SEPHandle  EncHandle    Type
  08    500605b005e9c9d0               0001     Direct Attached SGPIO

Expanders:
NumPhys   SAS Address     DevHandle   Parent  EncHandle  SAS Level

It looks fine, but the mps driver prints the following messages endlessly, and the filesystem hangs during each reinitialization.
Code:
mps0: IOC Fault 0x4000265d, Resetting
mps0: Reinitializing controller
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
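
For anyone who wants to pull the fault value out of that first line, here is a minimal sketch. It assumes the printed value is the controller's doorbell register, with the IOC state in the top four bits and a fault code in the low 16 bits, as the MPI 2.0 headers shipped with the mps(4) driver define it. The mask values and the helper name decode_fault are my own additions, so check them against mpi2.h in your source tree. The sketch does not say what the fault code means. That is firmware-specific, and the vendor would have to interpret it.

```python
import re

# Assumed mask values, matching my reading of the MPI 2.0 header (mpi2.h)
# used by the mps(4) driver. Verify against your own source tree.
IOC_STATE_MASK = 0xF0000000
IOC_STATE_FAULT = 0x40000000
FAULT_CODE_MASK = 0x0000FFFF

# Matches e.g. "mps0: IOC Fault 0x4000265d, Resetting"
FAULT_RE = re.compile(r"(mps\d+): IOC Fault (0x[0-9a-fA-F]+)")


def decode_fault(line):
    """Return (device, state_is_fault, fault_code) for an IOC Fault line, else None."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    value = int(m.group(2), 16)
    state_is_fault = (value & IOC_STATE_MASK) == IOC_STATE_FAULT
    return m.group(1), state_is_fault, value & FAULT_CODE_MASK


if __name__ == "__main__":
    dev, is_fault, code = decode_fault("mps0: IOC Fault 0x4000265d, Resetting")
    print(dev, is_fault, hex(code))  # prints: mps0 True 0x265d
```

The low 16 bits (0x265d here) are the part worth quoting when you talk to vendor support.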

All disks are detected by the kernel and zpool status looks fine.
Code:
  pool: zdata
state: ONLINE
  scan: resilvered 0 in 0 days 00:00:11 with 0 errors on Sat Sep  7 13:53:11 2019
config:

        NAME        STATE     READ WRITE CKSUM
        zdata       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da4     ONLINE       0     0     0

errors: No known data errors

I can access the file system in this zdata pool between each IOC fault error.
What do I need to do now?

Thanks.
 
Normally, the LSI 9207 card is excellent; I have used hundreds of them (under Linux and other OSes, but not under FreeBSD). According to what everyone says here, it has very good FreeBSD support. I think that IOC fault means that there is a communication problem on the SAS bus. So the question is: What exactly is plugged in on the SAS side? Do you perhaps have some very bizarre hardware there? Are you using normal production-quality SSDs and disks? I've seen some bizarre errors on LSI SAS cards, but those typically happen with pre-release (alpha test) firmware on disks, which you probably don't have access to. Probably the most important question is: is there a disk backplane or SAS expander between the LSI card and the disks? I don't think so (because the "expander" line in the mpsutil output is empty), but maybe that line was cut off at the end.

This could also be caused by defective SAS cables, serious power supply problems, or cooling problems. Under full workload, LSI cards can get VERY hot. I have an anecdote of an LSI card being overheated (by a configuration error involving the fans), and the last thing we ever heard from the card was that its temperature was 105 degrees C. When we removed the card, it was brown, and it never worked again. That's unlikely to be your problem, but check the cooling too.
 
I attached 6 WD 4GB hard disks to the LSI card with SAS cables.

Interestingly, the same error is reported even after I removed all disks and cables from the 9207.
But in that case, the IOC Fault error occurs much less frequently.
 
Have you checked that the firmware on the HBA is up-to-date?
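
If it helps, here is a minimal sketch for reading the firmware revision out of the mpsutil output and comparing two revisions. The sample text is copied from the output posted earlier in this thread. The helper names parse_adapter and version_tuple are made up for illustration. The sketch only compares revision strings: it cannot tell you what the latest release is, so check the vendor's download page for that.

```python
import re

# Sample text copied from the `mpsutil show all` output earlier in this thread.
SAMPLE = """\
mps0 Adapter:
       Board Name: SAS9207-8i
   Board Assembly: H5-25412-00C
        Chip Name: LSISAS2308
    Chip Revision: ALL
    BIOS Revision: 7.39.02.00
Firmware Revision: 20.00.07.00
  Integrated RAID: no
"""

# "Key: value" lines only; headers such as "mps0 Adapter:" have no value and are skipped.
FIELD_RE = re.compile(r"\s*([A-Za-z ]+?):\s*(\S.*)$")


def parse_adapter(text):
    """Collect the 'Key: value' fields of the adapter section into a dict."""
    info = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            info[m.group(1).strip()] = m.group(2).strip()
    return info


def version_tuple(revision):
    """Turn '20.00.07.00' into (20, 0, 7, 0) so revisions compare numerically."""
    return tuple(int(part) for part in revision.split("."))


if __name__ == "__main__":
    firmware = parse_adapter(SAMPLE)["Firmware Revision"]
    other = "20.00.02.00"  # the revision another poster in this thread reports
    print("firmware:", firmware)
    print("newer than", other, "->", version_tuple(firmware) > version_tuple(other))
```

To use it on a live system, paste the real output of mpsutil show adapter in place of SAMPLE.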

There is the theoretical possibility that your HBA itself is defective. Testing that would require buying a second one, which for home users sounds (financially) painful.
 
It's the same card I have:
Code:
root@molly:~ # mpsutil show adapter
mps0 Adapter:
       Board Name: SAS9207-8i
   Board Assembly: H3-25412-00K
        Chip Name: LSISAS2308
    Chip Revision: ALL
    BIOS Revision: 7.39.00.00
Firmware Revision: 20.00.02.00
  Integrated RAID: no

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         6.0     1.5    6.0    SAS Initiator
1       0002        000a       N         6.0     1.5    6.0    SAS Initiator
2       0004        000c       N         6.0     1.5    6.0    SAS Initiator
3       0003        000b       N         6.0     1.5    6.0    SAS Initiator
4       0005        000d       N         6.0     1.5    6.0    SAS Initiator
5       0006        000e       N         6.0     1.5    6.0    SAS Initiator
6       0007        000f       N         6.0     1.5    6.0    SAS Initiator
7                              N                 1.5    6.0    SAS Initiator
I have had this card for a couple of years now and have never seen any of those errors.
 
One month has passed.

I contacted Broadcom customer service and asked about the cause.
They guessed that my card is defective, so I contacted Amazon to replace it.

I received the new one today! Happy day.

I added mps_load="YES" again.
I connected all 8 of my disks (2 mirrored /root SSDs, 6 raidz disks on the 9207-8i).

No errors. It works now. I have been monitoring /var/log/messages for 1 hour now, haha.
But nobody knows what tomorrow brings. If the error comes back, I will post again.
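
Rather than watching the log by eye, a small script can do the counting. This is a rough sketch that assumes the driver message looks exactly like the "IOC Fault" line quoted earlier in the thread and that the log is at /var/log/messages. The function name count_faults is made up for illustration.

```python
import re
from collections import Counter

# Matches the driver message quoted earlier in the thread, e.g.
# "mps0: IOC Fault 0x4000265d, Resetting". Any syslog prefix is ignored.
FAULT_RE = re.compile(r"\b(mps\d+): IOC Fault (0x[0-9a-fA-F]+)")


def count_faults(lines):
    """Count IOC Fault messages per (controller, fault value)."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2).lower())] += 1
    return counts


if __name__ == "__main__":
    log_path = "/var/log/messages"  # the log file mentioned in this post
    try:
        with open(log_path, errors="replace") as fh:
            counts = count_faults(fh)
    except OSError as exc:
        print(f"cannot read {log_path}: {exc}")
    else:
        if not counts:
            print("no IOC Fault messages found")
        for (dev, value), n in sorted(counts.items()):
            print(f"{dev}: fault {value} seen {n} time(s)")
```

Run it from cron or by hand after a day or two. An empty result is the outcome you want.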

Thanks, everyone.
 