Hi All,
I have been experiencing a flaky mps driver on 9.0-RELEASE-amd64. The problem occurred during a 1.4TB rsync transfer of media over NFSv4 to the new FreeBSD server, which has a 10 x 2TB raidz2 pool on HDDs attached to 3 x IBM M1015s (LSI SAS2008 chips, IT firmware v14).
The three affected drives (of four SATA3 HDDs) were all on the same controller, mps0. da2 was removed from the zpool, while da0 and da3 experienced errors but didn't drop out.
A reboot of the system brought everything back and da2 resilvered back into the zpool.
---------------
My questions relate to mentions of a new LSI-supported mps driver which, as I understand it, is NOT in 9.0-RELEASE-amd64:
Q1) Does adding mps_load="YES" to /boot/loader.conf in 9.0-RELEASE have any benefit at all with respect to using a different driver for the LSI SAS2008 cards?
Q2) Would upgrading to 9.0-STABLE be of great benefit to me, since I also hear that this new mps driver is used in STABLE, or should I wait for 9.1-RELEASE instead?
Q3) Alternatively, should I download mpslsi.ko and load it instead?
Anyway, that's a lot of questions, and I haven't found definitive answers either way. Feedback would be much appreciated. Specs and dmesg to follow:
Code:
[cpu] Intel Xeon E3-1220-V2
[mobo] Supermicro X9SCM-F (bios 2.0a)
[ram] (4x) Crucial CT51272BA1339 [4GB DDR3 Unbuffered ECC]
[ssd] Crucial M4 64GB (fw 000F)
[sas card] (3x) IBM M1015 (IT mode v14)
.....
[hdd] (4x) 2TB Seagate ST2000DL003
      (3x) 2TB WD WD20EARS
      (1x) 2TB WD WD20EARX
      (1x) 2TB Hitachi HDS5C3020ALA632
      (1x) 2TB Samsung/Seagate ST2000DL004
[os] FreeBSD 9.0-RELEASE amd64
[NFS] v4
[ZFS] v28, dedupe, compression OFF
Code:
Dec 4 21:25:11 e1220 kernel: mps0: <LSI SAS2008> port 0xe000-0xe0ff mem 0xf7a00000-0xf7a03fff,0xf7980000-0xf79bffff irq 16 at device 0.0 on pci1
Dec 4 21:25:11 e1220 kernel: mps0: Firmware: 14.00.00.00
Dec 4 21:25:11 e1220 kernel: mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Dec 4 21:25:11 e1220 kernel: pcib2: <ACPI PCI-PCI bridge> irq 16 at device 1.1 on pci0
Dec 4 21:25:11 e1220 kernel: pci2: <ACPI PCI bus> on pcib2
Dec 4 21:25:11 e1220 kernel: mps1: <LSI SAS2008> port 0xd000-0xd0ff mem 0xf7400000-0xf7403fff,0xf7380000-0xf73bffff irq 17 at device 0.0 on pci2
Dec 4 21:25:11 e1220 kernel: mps1: Firmware: 14.00.00.00
Dec 4 21:25:11 e1220 kernel: mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Dec 4 21:25:11 e1220 kernel: pcib3: <ACPI PCI-PCI bridge> irq 19 at device 6.0 on pci0
Dec 4 21:25:11 e1220 kernel: pci3: <ACPI PCI bus> on pcib3
Dec 4 21:25:11 e1220 kernel: mps2: <LSI SAS2008> port 0xc000-0xc0ff mem 0xf6e00000-0xf6e03fff,0xf6d80000-0xf6dbffff irq 19 at device 0.0 on pci3
Dec 4 21:25:11 e1220 kernel: mps2: Firmware: 14.00.00.00
Dec 4 21:25:11 e1220 kernel: mps2: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
.....
.....
Dec 6 05:39:53 e1220 kernel: mps0: (0:2:0) terminated ioc 804b scsi 0 state c xfer 0
Dec 6 05:39:53 e1220 last message repeated 2 times
Dec 6 05:39:53 e1220 kernel: mps0: (0:2:0) terminated ioc 804b scsi 0 state c xfer 65536
Dec 6 05:39:53 e1220 kernel: mps0: (0:2:0) terminated ioc 804b scsi 0 state c xfer 0
Dec 6 05:39:53 e1220 last message repeated 15 times
Dec 6 05:47:31 e1220 kernel: mps0: (0:2:0) terminated ioc 804b scsi 0 state c xfer 0
Dec 6 05:47:31 e1220 last message repeated 18 times
Dec 6 07:23:54 e1220 su: leeandang to root on /dev/pts/0
Dec 6 18:02:46 e1220 kernel: mps0: (0:2:0) terminated ioc 804b scsi 0 state c xfer 0
Dec 6 18:02:46 e1220 kernel: mps0: (0:2:0) terminated ioc 804b scsi 0 state c xfer 0
Dec 6 18:02:46 e1220 kernel: mps0: (0:2:0) terminated ioc 804b scsi 0 state c xfer 16384
Dec 6 18:02:46 e1220 kernel: mps0: (0:2:0) terminated ioc 804b scsi 0 state c xfer 0
Dec 6 18:02:46 e1220 last message repeated 3 times
Dec 6 18:03:42 e1220 kernel: mps0: mpssas_remove_complete on target 0x0002, IOCStatus= 0x0
Dec 6 18:03:42 e1220 kernel: (da2:mps0:0:2:0): lost device - 0 outstanding
Dec 6 18:03:44 e1220 kernel: (da2:mps0:0:2:0): removing device entry
Dec 6 18:03:47 e1220 kernel: da2 at mps0 bus 0 scbus0 target 2 lun 0
Dec 6 18:03:47 e1220 kernel: da2: <ATA ST2000DL003-9VT1 CC32> Fixed Direct Access SCSI-6 device
Dec 6 18:03:47 e1220 kernel: da2: 600.000MB/s transfers
Dec 6 18:03:47 e1220 kernel: da2: Command Queueing enabled
Dec 6 18:03:47 e1220 kernel: da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
Dec 6 18:03:54 e1220 kernel: mps0: Failure 0x4b reseting device 0x000a
Dec 6 18:06:31 e1220 kernel: (da3:mps0:0:3:0): READ(10). CDB: 28 0 11 16 5e a8 0 0 20 0
Dec 6 18:06:31 e1220 kernel: (da3:mps0:0:3:0): CAM status: SCSI Status Error
Dec 6 18:06:31 e1220 kernel: (da3:mps0:0:3:0): SCSI status: Check Condition
Dec 6 18:06:31 e1220 kernel: (da3:mps0:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Dec 6 18:20:52 e1220 kernel: (da2:mps0:0:2:0): SCSI command timeout on device handle 0x000a SMID 899
Dec 6 18:20:52 e1220 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 899 complete
Dec 6 18:21:29 e1220 login: ROOT LOGIN (root) ON ttyv0
Dec 6 18:21:52 e1220 kernel: (da2:mps0:0:2:0): SCSI command timeout on device handle 0x000a SMID 368
Dec 6 18:21:52 e1220 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 368 complete
Dec 6 18:21:54 e1220 login: ROOT LOGIN (root) ON ttyv1
Dec 6 18:22:52 e1220 kernel: (da2:mps0:0:2:0): SCSI command timeout on device handle 0x000a SMID 784
Dec 6 18:22:52 e1220 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 784 complete
Dec 6 18:23:52 e1220 kernel: (da2:mps0:0:2:0): SCSI command timeout on device handle 0x000a SMID 537
Dec 6 18:23:52 e1220 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 537 complete
...
Dec 6 18:42:35 e1220 kernel: mps0: mpssas_abort_complete: abort request on handle 0x0a SMID 427 complete
Dec 6 20:56:07 e1220 kernel: (da0:mps0:0:0:0): READ(10). CDB: 28 0 16 7a 1f 20 0 0 20 0
Dec 6 20:56:07 e1220 kernel: (da0:mps0:0:0:0): CAM status: SCSI Status Error
Dec 6 20:56:07 e1220 kernel: (da0:mps0:0:0:0): SCSI status: Check Condition
Dec 6 20:56:07 e1220 kernel: (da0:mps0:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
....
Dec 6 21:21:40 e1220 kernel: (da3:mps0:0:3:0): READ(10). CDB: 28 0 17 47 e7 68 0 0 20 0
Dec 6 21:21:40 e1220 kernel: (da3:mps0:0:3:0): CAM status: SCSI Status Error
Dec 6 21:21:40 e1220 kernel: (da3:mps0:0:3:0): SCSI status: Check Condition
Dec 6 21:21:40 e1220 kernel: (da3:mps0:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
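In case it helps anyone compare notes, here is a rough Python sketch for summarizing a log like the one above: it counts "terminated ioc" lines per controller and target, and command timeouts and UNIT ATTENTION events per disk. The function name `tally` and the category names are my own and purely illustrative. Reading the `(0:2:0)` triple as bus:target:lun is an assumption, inferred from the "da2 at mps0 bus 0 scbus0 target 2 lun 0" line. It also folds syslog's "last message repeated N times" lines into the preceding counted line.

```python
import re
from collections import Counter

# "(da2:mps0:0:2:0):" style tag that CAM puts in front of per-disk messages.
CAM_TAG = re.compile(r"\((da\d+):mps\d+:\d+:\d+:\d+\):")
# "mps0: (0:2:0) terminated ioc ..." lines; the triple is assumed to be
# bus:target:lun, so the middle number is taken as the target.
TERMINATED = re.compile(r"(mps\d+): \(\d+:(\d+):\d+\) terminated ioc")
# syslog's compression of identical consecutive lines.
REPEATED = re.compile(r"last message repeated (\d+) time")


def tally(lines):
    """Count error events in an iterable of syslog lines (hypothetical helper)."""
    counts = {
        "terminated": Counter(),      # keyed by (controller, target)
        "timeout": Counter(),         # keyed by disk name, e.g. "da2"
        "unit_attention": Counter(),  # keyed by disk name
    }
    last = None  # (category, key) of the previous line if it was counted
    for line in lines:
        rep = REPEATED.search(line)
        if rep:
            # Credit the repeats to whatever the previous line was, if counted.
            if last is not None:
                category, key = last
                counts[category][key] += int(rep.group(1))
            continue

        last = None
        term = TERMINATED.search(line)
        cam = CAM_TAG.search(line)
        if term:
            last = ("terminated", (term.group(1), int(term.group(2))))
        elif cam and "SCSI command timeout" in line:
            last = ("timeout", cam.group(1))
        elif cam and "UNIT ATTENTION" in line:
            last = ("unit_attention", cam.group(1))

        if last is not None:
            category, key = last
            counts[category][key] += 1
    return counts
```

To use it, pass any iterable of lines, for example an open file object for wherever your syslog messages are stored, and print the three counters. A disk or target that dominates all three is the first thing I would look at, though this says nothing about whether the cause is the drive, the cabling, or the driver.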