I'm also using the IBM M1015 card (flashed to IT mode) with the latest
mpslsi driver and firmware from LSI on FreeBSD 9.1.
Code:
dev.mpslsi.0.%desc: LSI SAS2008
dev.mpslsi.0.%driver: mpslsi
dev.mpslsi.0.%location: slot=0 function=0
dev.mpslsi.0.%pnpinfo: vendor=0x1000 device=0x0072 subvendor=0x1000 subdevice=0x3020 class=0x010700
dev.mpslsi.0.%parent: pci1
dev.mpslsi.0.debug_level: 4
dev.mpslsi.0.disable_msix: 0
dev.mpslsi.0.disable_msi: 0
dev.mpslsi.0.firmware_version: 15.00.00.00
dev.mpslsi.0.driver_version: 15.00.00.00
dev.mpslsi.0.io_cmds_active: 10
dev.mpslsi.0.io_cmds_highwater: 291
dev.mpslsi.0.chain_free: 2048
dev.mpslsi.0.chain_free_lowwater: 2014
dev.mpslsi.0.max_chains: 2048
dev.mpslsi.0.chain_alloc_fail: 0
One of the drives in my striped mirror zpool is bad, and this freezes the whole ZFS pool.
Code:
mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff8000862000 cm 0xffffff80008c1f48
mpslsi0: mpssas_alloc_tm freezing simq
mpslsi0: timedout cm 0xffffff80008c1f48 allocated tm 0xffffff8000879908
mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff8000862000 cm 0xffffff800089d708
mpslsi0: queued timedout cm 0xffffff800089d708 for processing by tm 0xffffff8000879908
mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff8000862000 cm 0xffffff80008a0670
mpslsi0: mpssas_free_tm releasing simq
mpslsi0: mpssas_alloc_tm freezing simq
mpslsi0: timedout cm 0xffffff80008a0670 allocated tm 0xffffff8000879a50
mpslsi0: mpssas_free_tm releasing simq
Code:
048 scsi 0 state c xfer(noperiph:mpslsi0:0:3:0): SMID 15 abort TaskMID 89 status 0x0 code 0x0 count 1
(da3:mpslsi0:0:3:0): WRITE(10). CDB: 2a 0 2 d4 40 80 0 1 0 0
(da3:mpslsi0:0:3:0): CAM status: Command timeout
(da3:mpslsi0:0:3:0): Retrying command
(da3:mpslsi0:0:3:0): WRITE(10). CDB: 2a 0 2 d4 41 80 0 1 0 0 length 131072 SMID 325 command timeout cm 0xffffff800088f068 ccb 0xfffffe0008f74000
(da3:mpslsi0:0:3:0): WRITE(10). CDB: 2a 0 2 d4 41 80 0 1 0 0 length 131072 SMID 325 completed timedout cm 0xffffff800088f068 ccb 0xfffffe0008f74000 during recovery ioc 8048 scsi 0 state c xfe(noperiph:mpslsi0:0:3:0): SMID 16 abort TaskMID 325 status 0x0 code 0x0 count 1
(da3:mpslsi0:0:3:0): WRITE(10). CDB: 2a 0 2 d4 41 80 0 1 0 0
(da3:mpslsi0:0:3:0): CAM status: Command timeout
(da3:mpslsi0:0:3:0): Retrying command
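To see which device accounts for the timeouts in a log like the one above, the "CAM status: Command timeout" lines can be counted per device. This is only a minimal sketch in Python, assuming the console messages were saved as plain text; the function name `count_timeouts` and the regular expression are my own and not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches lines such as "(da3:mpslsi0:0:3:0): CAM status: Command timeout"
# and captures the peripheral name ("da3").
TIMEOUT_RE = re.compile(r"^\((\w+):[^)]*\): CAM status: Command timeout")

def count_timeouts(log_text):
    """Count CAM command timeouts per device in saved console output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

# Excerpt copied from the log above.
sample = """\
(da3:mpslsi0:0:3:0): WRITE(10). CDB: 2a 0 2 d4 40 80 0 1 0 0
(da3:mpslsi0:0:3:0): CAM status: Command timeout
(da3:mpslsi0:0:3:0): Retrying command
(da3:mpslsi0:0:3:0): WRITE(10). CDB: 2a 0 2 d4 41 80 0 1 0 0
(da3:mpslsi0:0:3:0): CAM status: Command timeout
(da3:mpslsi0:0:3:0): Retrying command
"""

for device, n in count_timeouts(sample).items():
    print(device, n)
```

In the excerpt, every timeout belongs to da3, which matches the single bad drive.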
I don't want my ZFS pool to "freeze" when one of the disks in a striped mirror configuration is misbehaving. I hoped that the ZFS code and
the mpslsi driver would drop the disk, but this does not seem to be the case.
This was the status of the pool a few seconds before the LSI driver "crashed" and
got stuck.
Code:
root@freebsd-san:/root # zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 0h10m with 0 errors on Mon Mar 18 09:51:52 2013
config:
        NAME             STATE     READ WRITE CKSUM
        tank             ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            label/disk1  ONLINE       0     0     0
            label/disk2  ONLINE       0     0     3
          mirror-1       ONLINE       0     0     0
            label/disk3  ONLINE       0     0     0
            label/disk4  ONLINE       0     0     0
        cache
          label/disk5    ONLINE       0     0     0
errors: No known data errors
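The only warning sign before the hang was the checksum count on label/disk2. A script could flag such devices from saved `zpool status` output. Below is a rough Python sketch, not an official parser: the function name `flag_devices` is hypothetical, it assumes the five-column config table shown above, and it simply skips rows whose counters are not plain integers (zpool can abbreviate large counts).

```python
def flag_devices(status_text):
    """Return (name, state, read, write, cksum) for every config row that is
    not ONLINE or has a nonzero error counter."""
    flagged = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "config:":
            in_config = True
            continue
        if fields[0] == "errors:":
            in_config = False
            continue
        # Skip everything outside the table, the header row, and
        # section labels such as "cache" that carry no counters.
        if not in_config or fields[0] == "NAME" or len(fields) < 5:
            continue
        name, state = fields[0], fields[1]
        try:
            read, write, cksum = (int(x) for x in fields[2:5])
        except ValueError:
            continue  # abbreviated counters are not handled in this sketch
        if state != "ONLINE" or read or write or cksum:
            flagged.append((name, state, read, write, cksum))
    return flagged

# Config table copied from the zpool status output above.
sample = """\
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
label/disk1 ONLINE 0 0 0
label/disk2 ONLINE 0 0 3
mirror-1 ONLINE 0 0 0
label/disk3 ONLINE 0 0 0
label/disk4 ONLINE 0 0 0
cache
label/disk5 ONLINE 0 0 0
errors: No known data errors
"""

print(flag_devices(sample))
```

Because the rows are split on whitespace, the sketch does not depend on the indentation of the table.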
Code:
root@freebsd-san:/root # camcontrol stop da3
Error received from stop unit command
Any zfs command now seems to be stuck in a blocked state, probably because of the "crashed"
LSI driver.
Any ideas? I can try the "normal" mps driver, but that one is based on phase 14, if
I'm correct. My firmware is phase 15; I hope this is not an issue...
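The worry here is a phase mismatch between driver and firmware. As a small illustration, the two version strings reported by sysctl can be compared. This Python sketch assumes that the first dotted field of the version string is the phase (so 15.00.00.00 is phase 15); the helper names are hypothetical, and the sketch says nothing about whether a mismatch actually causes problems.

```python
def phase(version):
    # Assumption: the first dotted field is the phase, e.g. "15.00.00.00" -> 15.
    return int(version.split(".")[0])

def same_phase(firmware_version, driver_version):
    """True if firmware and driver report the same phase number."""
    return phase(firmware_version) == phase(driver_version)

# Values taken from the sysctl output for the mpslsi driver.
print(same_phase("15.00.00.00", "15.00.00.00"))

# Hypothetical version string for a phase 14 driver, for illustration only.
print(same_phase("15.00.00.00", "14.00.00.00"))
```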
Edit:
After a long time, the zfs command succeeded.
Code:
(da3:mpslsi0:0:3:0): WRITE(10). CDB: 2a 0 2 d4 4f c5 0 1 0 0
(da3:mpslsi0:0:3:0): CAM status: Command timeout
(da3:mpslsi0:0:3:0): Error 5, Periph was invalidated
(da3:mpslsi0:0:3:0): oustanding 0
(da3:mpslsi0:0:3:0): removing device entry
Code:
root@freebsd-san:/root # zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub in progress since Mon Mar 18 10:52:43 2013
        2.23G scanned out of 35.3G at 18.3M/s, 0h30m to go
        0 repaired, 6.33% done
config:
        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            label/disk1           ONLINE       0     0     0
            11585224050511345959  REMOVED      0     0     0  was /dev/label/disk2
          mirror-1                ONLINE       0     0     0
            label/disk3           ONLINE       0     0     0
            label/disk4           ONLINE       0     0     0
        cache
          label/disk5             ONLINE       0     0     0
errors: No known data errors
This took longer than 5 minutes. Could that be the 60-second timeout (kern.cam.da.default_timeout: 60) combined with 4 retries?
Or does the mpslsi driver handle timeouts in a different way?
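As a rough check of the numbers, the worst case for a single command can be worked out. This is a simplified model in Python, not a description of how CAM or mpslsi actually schedule retries: it assumes one initial attempt plus four retries, that each attempt runs into the full 60-second timeout, and that the attempts happen one after another. The function name is hypothetical.

```python
def worst_case_seconds(timeout_s, retries):
    # One initial attempt plus `retries` further attempts,
    # each assumed to run until the full timeout expires.
    return timeout_s * (1 + retries)

total = worst_case_seconds(60, 4)
print(total, "seconds =", total / 60, "minutes")
```

Under these assumptions one command can take 300 seconds, or 5 minutes, before it is given up. Any time the driver spends aborting and recovering commands, or several queued commands timing out in sequence, would come on top of that, so "longer than 5 minutes" is at least plausible.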