Other mfi1: COMMAND TIMEOUT issue

Hi,

I have some problems related to mfi:
Code:
Mar 30 06:10:10 research kernel: mfi1: 115794 (544169410s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:10:23 research kernel: mfi1: 115795 (544169423s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:10:48 research kernel: mfi1: 115796 (544169448s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:11:38 research kernel: mfi1: 115797 (544169498s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:11:51 research kernel: mfi1: 115798 (544169511s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:12:04 research kernel: mfi1: 115799 (544169524s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:12:05 research kernel: mfi1: COMMAND 0xffffff8000ac1540 TIMEOUT AFTER 38 SECONDS
Mar 30 06:12:30 research kernel: mfi1: 115800 (544169550s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:12:43 research kernel: mfi1: 115801 (544169563s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:13:50 research kernel: mfi1: 115802 (544169630s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:14:03 research kernel: mfi1: 115803 (544169643s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:14:17 research kernel: mfi1: 115804 (544169657s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
Mar 30 06:14:30 research kernel: mfi1: 115805 (544169670s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
It is a FreeBSD 9.3-RELEASE-p43 amd64 box.

Any idea why it is happening and is it possible to fix it?

thanks a lot,

Ganbold
 
It is a FreeBSD 9.3-RELEASE-p43 amd64 box.

Any idea why it is happening and is it possible to fix it?
I'll strongly second the suggestion to upgrade to a supported FreeBSD release.

All of these seem to be happening on the same physical device (apparently physical disk 35 in enclosure slot 53, though that info is often bogus when SAS expanders are in use). You might be able to get more information with # mfiutil -u 1 show drives | grep S53, which might show you the drive model and serial number of the affected drive. You might want to check it with sysutils/smartmontools or just proactively replace it.
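
To confirm that the warnings really do all point at one device, the log lines can be tallied by PD number, slot and path. Below is a minimal sketch in Python, assuming the event format matches the lines quoted in this thread. The regular expression and the function name `tally_resets` are illustrative and not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this thread (an assumption);
# other controllers or firmware versions may format events differently.
EVENT_RE = re.compile(
    r"(?P<ctrl>mfi\d+): (?P<seq>\d+) "
    r"\((?P<ts>\d+)s/0x[0-9a-fA-F]+/(?P<sev>\w+)\) - "
    r"PD (?P<pd>\d+)\(e(?P<encl>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\) "
    r"Path (?P<path>[0-9a-fA-F]+)\s+reset"
)


def tally_resets(lines):
    """Count reset events per (controller, PD, slot, path)."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m:  # lines such as "COMMAND ... TIMEOUT" are skipped
            key = (m.group("ctrl"), int(m.group("pd")),
                   int(m.group("slot")), m.group("path"))
            counts[key] += 1
    return counts


sample = [
    "Mar 30 06:10:10 research kernel: mfi1: 115794 (544169410s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)",
    "Mar 30 06:12:05 research kernel: mfi1: COMMAND 0xffffff8000ac1540 TIMEOUT AFTER 38 SECONDS",
    "Mar 30 06:12:30 research kernel: mfi1: 115800 (544169550s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)",
]

for key, n in tally_resets(sample).items():
    print(key, n)
```

If the tally for the full log contains a single key, every reset refers to the same PD, slot and path.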

As part of your FreeBSD upgrade process, you should visit the manufacturer's support page for your specific controller model (which may be an OEM like Dell, HP, etc.) or the generic LSI Logic support page and download the latest firmware for your controller. I suspect there are also one or more SAS expanders in the picture, and there may or may not be firmware updates for those as well. That firmware is normally available from the chassis manufacturer if this is a "white box" system. Sometimes (as with Supermicro) you have to email support and ask for it.
 
Updated the box to 11.0-RELEASE-p9 and the error still appears.
Code:
# mfiutil -u 1 show adapter
mfi1 Adapter:
    Product Name: PERC H800 Adapter
   Serial Number: 0AE00ER
        Firmware: 12.10.2-0004
     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
  Battery Backup: present
           NVRAM: 32K
  Onboard Memory: 512M
  Minimum Stripe: 8K
  Maximum Stripe: 1M
An Overland NEO-XL 80 tape library is connected to it. I updated its firmware to v2.20.

camcontrol shows:
Code:
<IBM ULT3580-HH6 E6R3>             at scbus1 target 53 lun 0 (pass49,sa0)
<BDT MULTISTAK 2.20>               at scbus1 target 53 lun 1 (pass50,ch0)
So IBM ULT3580-HH6 E6R3 is the drive and BDT MULTISTAK 2.20 is the changer.
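
One way to cross-check which CAM device the `s53` in the mfi events refers to is to match it against the target number in the `camcontrol devlist` output. The sketch below assumes that the slot number in the event and the CAM target number coincide, which the output above suggests but does not prove. The regular expressions and the helper name `devices_at_slot` are hypothetical and based only on the lines shown in this thread.

```python
import re

# Both patterns are inferred from the output quoted in this thread (assumption).
DEVLIST_RE = re.compile(
    r"<(?P<desc>[^>]+)>\s+at scbus(?P<bus>\d+) target (?P<target>\d+) "
    r"lun (?P<lun>\d+) \((?P<devs>[^)]+)\)"
)
SLOT_RE = re.compile(r"PD \d+\(e0x[0-9a-fA-F]+/s(?P<slot>\d+)\)")


def devices_at_slot(event_line, devlist_lines):
    """Return devlist entries whose target number equals the event's slot."""
    m = SLOT_RE.search(event_line)
    if m is None:
        return []
    slot = int(m.group("slot"))
    hits = []
    for line in devlist_lines:
        d = DEVLIST_RE.search(line)
        if d and int(d.group("target")) == slot:
            hits.append((d.group("desc"), int(d.group("lun")),
                         d.group("devs").split(",")))
    return hits


event = "mfi1: 120144 (546318031s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)"
devlist = [
    "<IBM ULT3580-HH6 E6R3>             at scbus1 target 53 lun 0 (pass49,sa0)",
    "<BDT MULTISTAK 2.20>               at scbus1 target 53 lun 1 (pass50,ch0)",
]

for desc, lun, devs in devices_at_slot(event, devlist):
    print(desc, lun, devs)
```

With the two devlist lines above, both the tape drive (lun 0) and the changer (lun 1) are returned, since they share target 53.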

dmesg:
Code:
sa0 at mfi1 bus 0 scbus1 target 53 lun 0
sa0: <IBM ULT3580-HH6 E6R3> Removable Sequential Access SPC-4 SCSI device
sa0: Serial Number 11C21210B5
sa0: 150.000MB/s transfers
sa0: Command Queueing enabled

ch0 at mfi1 bus 0 scbus1 target 53 lun 1
ch0: <BDT MULTISTAK 2.00> Removable Changer SPC-3 SCSI device
ch0: Serial Number DE68101363_LL01
ch0: 150.000MB/s transfers
ch0: Command Queueing enabled
ch0: 60 slots, 1 drive, 1 picker, 0 portals

mfi1: 120144 (546318031s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120145 (546318045s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120146 (546318571s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: COMMAND 0xfffffe00010086e8 TIMEOUT AFTER 38 SECONDS
(sa0:mfi1:0:53:0): LOAD UNLOAD. CDB: 1b 00 00 00 00 00
(sa0:mfi1:0:53:0): CAM status: SCSI Status Error
(sa0:mfi1:0:53:0): SCSI status: Check Condition
(sa0:mfi1:0:53:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
(sa0:mfi1:0:53:0): Field Replaceable Unit: 48
(sa0:mfi1:0:53:0): Command byte 0 is invalid
(sa0:mfi1:0:53:0): Error 22, Unretryable error
mfi1: 120147 (546318667s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120148 (546318680s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: COMMAND 0xfffffe000100ae38 TIMEOUT AFTER 34 SECONDS
mfi1: 120149 (546318693s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120150 (546318706s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120151 (546318719s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120152 (546318732s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120153 (546318747s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120154 (546318760s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120155 (546318773s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120156 (546318786s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120157 (546318799s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: COMMAND 0xfffffe000100af48 TIMEOUT AFTER 34 SECONDS
mfi1: 120158 (546318812s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120159 (546318825s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120160 (546318838s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120161 (546318851s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120162 (546318864s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120163 (546318877s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120164 (546318890s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120165 (546318906s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120166 (546318919s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: COMMAND 0xfffffe00010087f8 TIMEOUT AFTER 34 SECONDS
mfi1: 120167 (546318932s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120168 (546318945s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120169 (546318958s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120170 (546318971s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120171 (546318984s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120172 (546318997s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120173 (546319010s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120174 (546319023s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120175 (546319036s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120176 (546319049s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120177 (546319062s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120178 (546319075s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120179 (546319092s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120180 (546319105s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
mfi1: 120181 (546319118s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)
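
The value in parentheses in each event appears to be a timestamp in seconds. The first log in this thread bears this out: syslog times 13 seconds apart carry values 13 apart. Under that assumption, the spacing between events can be read straight from the log. The following is a rough sketch with an illustrative regular expression and a hypothetical helper name, `event_gaps`.

```python
import re

# Matches the sequence number and the seconds value of an mfi event line,
# based on the format seen in this thread (an assumption).
TS_RE = re.compile(r"mfi\d+: \d+ \((\d+)s/")


def event_gaps(lines):
    """Return the differences in seconds between consecutive mfi events."""
    stamps = []
    for line in lines:
        m = TS_RE.search(line)
        if m:  # TIMEOUT lines carry no event timestamp and are skipped
            stamps.append(int(m.group(1)))
    return [b - a for a, b in zip(stamps, stamps[1:])]


sample = [
    "mfi1: 120147 (546318667s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)",
    "mfi1: 120148 (546318680s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)",
    "mfi1: COMMAND 0xfffffe000100ae38 TIMEOUT AFTER 34 SECONDS",
    "mfi1: 120149 (546318693s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)",
    "mfi1: 120150 (546318706s/0x0002/WARN) - PD 35(e0xff/s53) Path 5000e111c21210b6  reset (Type 03)",
]

print(event_gaps(sample))  # [13, 13, 13]
```

A steady gap like this would be consistent with the same path being reset over and over rather than with isolated, unrelated faults.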
I'm not sure what I should try next. Maybe update the firmware of the drive itself?
Please let me know if you have better ideas.

thanks,
 
Product Name: PERC H800 Adapter
Firmware: 12.10.2-0004
That is really old firmware. The latest is 12.10.7-0001 (link).
<IBM ULT3580-HH6 E6R3> at scbus1 target 53 lun 0 (pass49,sa0)

Likewise here. Current is G9P1 (link).

If it still doesn't work properly after upgrading the controller and drive firmware, download a copy of the IBM diagnostic from here. There isn't a FreeBSD version, so you'll need either Linux, Windows, or macOS. I generally use a Linux LiveCD. If it still fails the IBM diagnostic, try a different controller / cable, preferably one as simple as possible (non-RAID).
 