/dev/hpt27xx CAM Status: SCSI Status Error

TitanIT · Dec 18, 2013

Hello all, We have replaced the disk controller on a FreeBSD server which has been running for some time. The old controller, an Areca 1680i-16, kept getting time-out errors, and disks were unexpectedly dropping from the controller. The 16 disks are configured in a ZFS pool; they are Western Digital 2TB Green drives (WD20EARX). We decided to get a controller with fewer features and run it in Legacy mode (no RAID or JBOD, just pass-through), hoping to eliminate any Green drive/controller problems.

Code:

[root@storage1 ~]# uname -a
FreeBSD storage1 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255898: Thu Sep 26 22:50:31 UTC 2013  root bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

Code:

Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD20EARX-00PASB0
Serial Number:    WD-WMAZA6398371
LU WWN Device Id: 5 0014ee 25c17f0ec
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Dec 18 14:53:59 2013 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

We decided to install the HighPoint RocketRAID 2760. We were able to simply disconnect the Areca Controller and connect the HPT card into the server. Everything just worked.

We also made sure that the controller had the latest firmware installed.

Part of the upgrade process included upgrading from FreeBSD 9.1 to 9.2 and recreating the zpool.

We migrated the active virtual machines back to the pool, and when the process was complete we decided to run a zpool scrub. A few interesting messages popped up in syslog, but the server continued to function normally.

Code:

Dec 11 08:43:13 storage1 kernel: hpt27xx: Device error information 0x8000000080000000
Dec 11 08:45:05 storage1 kernel: (da11:hpt27xx0:0:11:0): WRITE(10). CDB: 2a 00 0e 54 82 58 00 00 10 00
Dec 11 08:45:05 storage1 kernel: (da11:hpt27xx0:0:11:0): CAM status: SCSI Status Error
Dec 11 08:45:05 storage1 kernel: (da11:hpt27xx0:0:11:0): SCSI status: OK
Dec 11 08:47:42 storage1 kernel: (da10:hpt27xx0:0:10:0): WRITE(10). CDB: 2a 00 0e 56 52 4c 00 00 10 00
Dec 11 08:47:42 storage1 kernel: (da10:hpt27xx0:0:10:0): CAM status: SCSI Status Error
Dec 11 08:47:42 storage1 kernel: (da10:hpt27xx0:0:10:0): SCSI status: OK
Dec 11 08:49:18 storage1 kernel: (da13:hpt27xx0:0:13:0): WRITE(10). CDB: 2a 00 0e 4c 44 74 00 00 08 00
Dec 11 08:49:18 storage1 kernel: (da13:hpt27xx0:0:13:0): CAM status: SCSI Status Error
Dec 11 08:49:18 storage1 kernel: (da13:hpt27xx0:0:13:0): SCSI status: OK
Dec 11 08:49:43 storage1 kernel: (da10:hpt27xx0:0:10:0): WRITE(10). CDB: 2a 00 0e 58 e6 d9 00 00 10 00
Dec 11 08:49:43 storage1 kernel: (da10:hpt27xx0:0:10:0): CAM status: SCSI Status Error
Dec 11 08:49:43 storage1 kernel: (da10:hpt27xx0:0:10:0): SCSI status: OK
Dec 11 08:51:47 storage1 kernel: (da11:hpt27xx0:0:11:0): READ(10). CDB: 28 00 0a b2 63 4a 00 00 80 00
Dec 11 08:51:47 storage1 kernel: (da11:hpt27xx0:0:11:0): CAM status: SCSI Status Error
Dec 11 08:51:47 storage1 kernel: (da11:hpt27xx0:0:11:0): SCSI status: OK
Dec 11 08:52:55 storage1 kernel: (da2:hpt27xx0:0:2:0): WRITE(10). CDB: 2a 00 0f d3 a5 e4 00 00 18 00
Dec 11 08:52:55 storage1 kernel: (da2:hpt27xx0:0:2:0): CAM status: SCSI Status Error
Dec 11 08:52:55 storage1 kernel: (da2:hpt27xx0:0:2:0): SCSI status: OK
Dec 11 08:53:47 storage1 kernel: (da10:hpt27xx0:0:10:0): WRITE(10). CDB: 2a 00 0e 5d da f5 00 00 08 00
Dec 11 08:53:47 storage1 kernel: (da10:hpt27xx0:0:10:0): CAM status: SCSI Status Error
Dec 11 08:53:47 storage1 kernel: (da10:hpt27xx0:0:10:0): SCSI status: OK
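
For anyone trying to read the CDB bytes in these lines: in a 10-byte READ(10)/WRITE(10) command, byte 0 is the opcode, bytes 2-5 are the big-endian starting LBA, and bytes 7-8 are the transfer length in blocks. Below is a minimal Python sketch of that decoding. The names `decode_cdb10` and `Cdb10` are made up for illustration, and the byte count in the example assumes the 512-byte logical sectors reported by smartctl above.

```python
from typing import NamedTuple

# Only the two opcodes that appear in the log above are handled here.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


class Cdb10(NamedTuple):
    op: str      # command name
    lba: int     # starting logical block address
    blocks: int  # transfer length in logical blocks


def decode_cdb10(hex_bytes: str) -> Cdb10:
    """Decode a 10-byte CDB printed as space-separated hex, as in the kernel log."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 10:
        raise ValueError(f"expected 10 bytes, got {len(raw)}")
    if raw[0] not in OPCODES:
        raise ValueError(f"unhandled opcode 0x{raw[0]:02x}")
    lba = int.from_bytes(raw[2:6], "big")     # bytes 2..5
    blocks = int.from_bytes(raw[7:9], "big")  # bytes 7..8
    return Cdb10(OPCODES[raw[0]], lba, blocks)


if __name__ == "__main__":
    cdb = decode_cdb10("2a 00 0e 54 82 58 00 00 10 00")  # first da11 line above
    # Assumption: 512-byte logical sectors, per the smartctl output.
    print(cdb, "bytes:", cdb.blocks * 512)
```

Decoded this way, the failing requests in the log are small transfers (8 to 128 blocks) at unrelated addresses on several disks.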

During the scrub a single checksum error appeared on da7:

Code:

[root@storage1 ~]# zpool status
  pool: export
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub in progress since Wed Dec 11 07:19:22 2013
        265G scanned out of 447G at 51.8M/s, 0h59m to go
        128K repaired, 59.34% done
config:

        NAME              STATE     READ WRITE CKSUM
        export            ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            label/disk1   ONLINE       0     0     0
            label/disk2   ONLINE       0     0     0
            label/disk3   ONLINE       0     0     0
          mirror-1        ONLINE       0     0     0
            label/disk4   ONLINE       0     0     0
            label/disk5   ONLINE       0     0     0
            label/disk6   ONLINE       0     0     1  (repairing)
          mirror-2        ONLINE       0     0     0
            label/disk7   ONLINE       0     0     0
            label/disk8   ONLINE       0     0     0
            label/disk9   ONLINE       0     0     0
          mirror-3        ONLINE       0     0     0
            label/disk10  ONLINE       0     0     0
            label/disk11  ONLINE       0     0     0
            label/disk12  ONLINE       0     0     0
          mirror-4        ONLINE       0     0     0
            label/disk13  ONLINE       0     0     0
            label/disk14  ONLINE       0     0     0
            label/disk15  ONLINE       0     0     0
        spares
          label/disk16    AVAIL

errors: No known data errors
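
With this many disks, picking the one nonzero counter out of the status output by eye is easy to get wrong. The following is a rough Python sketch that lists the config rows with any nonzero READ/WRITE/CKSUM value. The function name `devices_with_errors` is hypothetical, and the sketch assumes the counters are plain integers as shown above; abbreviated counts are not handled.

```python
def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with any nonzero counter."""
    flagged = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [note]
        if len(fields) < 5 or not all(f.isdigit() for f in fields[2:5]):
            continue
        read, write, cksum = (int(f) for f in fields[2:5])
        if read or write or cksum:
            flagged[fields[0]] = (read, write, cksum)
    return flagged


# A few rows copied from the output above, used as sample input.
SAMPLE = """\
  scan: scrub in progress since Wed Dec 11 07:19:22 2013
        265G scanned out of 447G at 51.8M/s, 0h59m to go
          mirror-1        ONLINE       0     0     0
            label/disk5   ONLINE       0     0     0
            label/disk6   ONLINE       0     0     1  (repairing)
        spares
          label/disk16    AVAIL
"""

if __name__ == "__main__":
    print(devices_with_errors(SAMPLE))
```

The text to parse could come from running `zpool status` and capturing its output; only the sample rows above are used here.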

We let the server run with its usual workload, and a few similar errors appeared in the syslog over time.

I also received an alert e-mail from the HPT server utility; however, neither ZFS nor syslog reported any problem corresponding to it.

Code:

 Wed, 11 Dec 2013 16:52:00 GMT:    
	An error occured on the disk at 'WDC WD20EARX-00PASB0-WD-WMAZA8904653' at Controller1-Channel12.

Code:

Dec 12 15:31:14 storage1 kernel: (da0:hpt27xx0:0:0:0): WRITE(10). CDB: 2a 00 11 c9 fb 27 00 00 08 00
Dec 12 15:31:14 storage1 kernel: (da0:hpt27xx0:0:0:0): CAM status: SCSI Status Error
Dec 12 15:31:14 storage1 kernel: (da0:hpt27xx0:0:0:0): SCSI status: OK
Dec 16 21:15:53 storage1 kernel: (da7:hpt27xx0:0:7:0): WRITE(10). CDB: 2a 00 18 84 f4 19 00 00 10 00
Dec 16 21:15:53 storage1 kernel: (da7:hpt27xx0:0:7:0): CAM status: SCSI Status Error
Dec 16 21:15:53 storage1 kernel: (da7:hpt27xx0:0:7:0): SCSI status: OK
Dec 16 21:15:53 storage1 kernel: (da8:hpt27xx0:0:8:0): WRITE(10). CDB: 2a 00 18 84 f4 19 00 00 10 00
Dec 16 21:15:53 storage1 kernel: (da8:hpt27xx0:0:8:0): CAM status: SCSI Status Error
Dec 16 21:15:53 storage1 kernel: (da8:hpt27xx0:0:8:0): SCSI status: OK
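
To see how the errors are spread across disks, the "CAM status" lines can be tallied per device. This is a minimal Python sketch, not a polished tool. The function name `tally_cam_errors` is made up, and the regular expression assumes the exact message format shown in the logs above.

```python
import re
from collections import Counter

# Matches e.g. "(da11:hpt27xx0:0:11:0): CAM status: SCSI Status Error"
PATTERN = re.compile(
    r"\((da\d+):hpt27xx\d+:[\d:]+\): CAM status: SCSI Status Error"
)


def tally_cam_errors(lines):
    """Count 'CAM status: SCSI Status Error' lines per daN device."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: lines copied from the log excerpt above.
SAMPLE_LOG = [
    "Dec 12 15:31:14 storage1 kernel: (da0:hpt27xx0:0:0:0): WRITE(10). CDB: 2a 00 11 c9 fb 27 00 00 08 00",
    "Dec 12 15:31:14 storage1 kernel: (da0:hpt27xx0:0:0:0): CAM status: SCSI Status Error",
    "Dec 12 15:31:14 storage1 kernel: (da0:hpt27xx0:0:0:0): SCSI status: OK",
    "Dec 16 21:15:53 storage1 kernel: (da7:hpt27xx0:0:7:0): CAM status: SCSI Status Error",
    "Dec 16 21:15:53 storage1 kernel: (da8:hpt27xx0:0:8:0): CAM status: SCSI Status Error",
]

if __name__ == "__main__":
    for device, count in sorted(tally_cam_errors(SAMPLE_LOG).items()):
        print(device, count)
```

For a real run, the list could be replaced with the lines of /var/log/messages, for example by iterating over an open file object.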

We also performed full SMART self-tests on each disk and everything turned up clean.

I am trying to determine whether we need to worry about these problems and take action, or whether the WD20EARX Green drives, which lack the TLER feature, are simply causing the alerts. This behavior is better than that of the Areca controller, which simply dropped the disk on a time-out. If it's the TLER of a single disk, why would more than one disk complain?

If anyone out there has some ideas or experience with this let me know.

Cheers!

waywardnl · Nov 18, 2014

I have the same problem with HPT 2920 SGL and the HPT 2940.

Code:

root@BSD05:/home/roland # zpool status
  pool: zraid
 state: ONLINE
  scan: scrub in progress since Mon Nov 17 22:30:08 2014
        2.94T scanned out of 3.87T at 713M/s, 0h22m to go
        20K repaired, 75.91% done
config:

        NAME          STATE     READ WRITE CKSUM
        zraid         ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            da0.eli   ONLINE       0     0     0
            da1.eli   ONLINE       0     0     0
            da2.eli   ONLINE       0     0     0
            da3.eli   ONLINE       0     0     0
            da4.eli   ONLINE       0     0     0
            da5.eli   ONLINE       0     0     0
            da6.eli   ONLINE       0     0     0
            da7.eli   ONLINE       0     0     0
            ada1.eli  ONLINE       0     0     0
            ada3.eli  ONLINE       0     0     0

errors: No known data errors

This is /var/log/messages:

Code:

Nov 18 21:15:03 BSD05 kernel: (da4:hpt27xx0:0:4:0): WRITE(10). CDB: 2a 00 2e ac d2 00 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da4:hpt27xx0:0:4:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da4:hpt27xx0:0:4:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac e9 80 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac e7 c8 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac e7 f0 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac ea 28 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac e9 60 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac e9 e8 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac e9 c0 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac e8 98 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac e8 50 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): WRITE(10). CDB: 2a 00 2e ac e9 40 00 00 20 00
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): CAM status: SCSI Status Error
Nov 18 21:15:03 BSD05 kernel: (da6:hpt27xx0:0:6:0): SCSI status: OK

HighPoint has written a new driver, but it prevents the VirtualBox driver from starting, and VirtualBox is an absolute must for me. I also used the HighPoint web interface, but I think this web interface blocks the system when the CAM status errors appear.
