CAM status: CCB request aborted by the host

Today, after several hours of write operations, 3 disks were removed at around the same time:


Code:
Aug 13 12:05:05 ****** mps0: (da3:mps0:0:8:0): CAM status: CCB request completed with an error
Aug 13 12:05:05 ****** Controller reported scsi ioc terminated tgt 7 SMID 1498 loginfo 31110d00
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): Retrying command, 3 more tries remain
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 cb d0 00 00 00 08 00 00
Aug 13 12:05:05 ****** mps0: Controller reported scsi ioc terminated tgt 7 SMID 1799 loginfo 31110d00
Aug 13 12:05:05 ****** mps0: Controller reported scsi ioc terminated tgt 7 SMID 1181 loginfo 31110d00
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): CAM status: CCB request completed with an error
Aug 13 12:05:05 ****** mps0: Controller reported scsi ioc terminated tgt 7 SMID 2040 loginfo 31110d00
Aug 13 12:05:05 ****** mps0: Controller reported scsi ioc terminated tgt 7 SMID 1825 loginfo 31110d00
/---/
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 b0 58 00 00 00 08 00 00
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 ca d8 00 00 00 08 00 00
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): CAM status: CCB request completed with an error
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): CAM status: CCB request completed with an error
Aug 13 12:05:05 ****** (da2:mps0:00 00 03 89 47 d6 90 00 00 00 08 00 00
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): CAM status: CCB request aborted by the host
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): Retrying command, 2 more tries remain
/---/
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 ca 70 00 00 00 08 00 00
Aug 13 12:05:05 ****** mps0: (da3:mps0:0:8:0): CAM status: CCB request aborted by the host
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** mpssas_prepare_remove: Sending reset for target ID 7
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 ca 68 00 00 00 08 00 00
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): CAM status: CCB request aborted by the host
Aug 13 12:05:05 ****** mps0: (da3:mps0:0:8:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** Unfreezing devq for target ID 8
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 ca 60 00 00 00 08 00 00
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): CAM status: CCB request aborted by the host
/---/
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): CAM status: CCB request aborted by the host
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** da3 at mps0 bus 0 scbus0 target 8 lun 0
Aug 13 12:05:05 ****** da3: <ATA Samsung SSD 870 2B6Q>  s/n ***************      detached
Aug 13 12:05:05 ****** GEOM_MIRROR: Request failed (error=6). da3[WRITE(offset=7776301056000, length=4096)]
Aug 13 12:05:05 ****** GEOM_MIRROR: Device k5: provider da3 disconnected.
/---/
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 b0 f0 00 00 00 08 00 00
Aug 13 12:05:05 ****** mps0: (da2:mps0:0:7:0): CAM status: CCB request aborted by the host
Aug 13 12:05:05 ****** No pending commands: starting remove_device
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 b0 e8 00 00 00 08 00 00
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): CAM status: CCB request aborted by the host
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 af a8 00 00 00 08 00 00
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): CAM status: CCB request aborted by the host
Aug 13 12:05:05 ****** mps0: (da2:mps0:0:7:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** Unfreezing devq for target ID 7
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 af a0 00 00 00 08 00 00
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): CAM status: CCB request aborted by the host
/---/
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): CAM status: CCB request aborted by the host
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 03 89 47 9f c0 00 00 00 08 00 00
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): CAM status: CCB request aborted by the host
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): Retrying command, 2 more tries remain
Aug 13 12:05:05 ****** da2 at mps0 bus 0 scbus0 target 7 lun 0
Aug 13 12:05:05 ****** da2: <ATA Samsung SSD 870 2B6Q>  s/n ***************      detached
Aug 13 12:05:05 ****** GEOM_MIRROR: Request failed (error=6). da2[WRITE(offset=7776297885696, length=4096)]
Aug 13 12:05:05 ****** GEOM_MIRROR: Device k4: provider da2 disconnected.
Aug 13 12:05:05 ****** (da3:mps0:0:8:0): Periph destroyed
Aug 13 12:05:05 ****** (da2:mps0:0:7:0): Periph destroyed
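
For readers who want to relate the CDB bytes in the log to on-disk positions, here is a minimal Python sketch that splits a WRITE(16) CDB into its logical block address and transfer length. The function name `decode_write16` and the 512-byte logical sector size are assumptions made for illustration; the 512-byte figure is at least consistent with the log, where an 8-block transfer pairs with GEOM_MIRROR requests of length=4096.

```python
def decode_write16(cdb_hex: str, sector_size: int = 512) -> dict:
    """Split a WRITE(16) CDB (as printed by CAM) into LBA and length.

    sector_size is an assumption; pass the drive's real logical
    sector size if it differs.
    """
    # bytes.fromhex() ignores the spaces between the hex byte pairs.
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")

    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: 64-bit LBA
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: block count

    return {
        "lba": lba,
        "blocks": blocks,
        "byte_offset": lba * sector_size,
        "byte_length": blocks * sector_size,
    }


if __name__ == "__main__":
    # First WRITE(16) line from the log above.
    info = decode_write16("8a 00 00 00 00 03 89 47 cb d0 00 00 00 08 00 00")
    print(info)
```

The failing writes on both disks decode to LBAs that sit close together, which is what one would expect from mirror members receiving the same sequential workload rather than from a bad region on one drive.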


Am I correct in assuming that, because I lost 3 disks in quick succession, this cannot be a disk, cable, or hot-swap bay problem, but must be a SATA controller hardware problem instead?
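
One way to get an overview before blaming a single component is to tally the error lines per device and target. The following is a rough Python sketch, not a tool that ships with FreeBSD; the regular expressions are written against the line shapes shown in the log above, and the names `tally`, `CAM_RE`, and `LOGINFO_RE` are made up for this example.

```python
import re
from collections import Counter

# "(da3:mps0:0:8:0): CAM status: <text>" -> device, target, status text
CAM_RE = re.compile(r"\((da\d+):mps\d+:\d+:(\d+):\d+\): CAM status: (.+)$")
# "terminated tgt 7 SMID 1498 loginfo 31110d00" -> target, loginfo code
LOGINFO_RE = re.compile(r"terminated tgt (\d+) SMID \d+ loginfo ([0-9a-f]{8})")


def tally(lines):
    """Count CAM status messages and controller loginfo codes per target."""
    cam = Counter()
    loginfo = Counter()
    for line in lines:
        m = CAM_RE.search(line)
        if m:
            cam[(m.group(1), int(m.group(2)), m.group(3).strip())] += 1
        m = LOGINFO_RE.search(line)
        if m:
            loginfo[(int(m.group(1)), m.group(2))] += 1
    return cam, loginfo


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < logfile
    cam, loginfo = tally(sys.stdin)
    for key, count in sorted(cam.items()):
        print(count, *key)
    for key, count in sorted(loginfo.items()):
        print(count, "tgt", key[0], "loginfo", key[1])
```

If the counts are spread over several targets that share only one piece of hardware, that shared piece becomes the first suspect.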

From startup log:

Code:
Aug 12 15:08:02 ****** mps0: <Avago Technologies (LSI) SAS2008> port 0xe000-0xe0ff mem 0xf76c0000-0xf76c3fff,0xf7680000-0xf76bffff irq 16 at device 0.0 on pci1
Aug 12 15:08:02 ****** mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 12 15:08:02 ****** mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>

It is an LSI 9200-8i controller. The disks are 8TB Samsung 870 QVO.
I currently have no spare controllers to test with. I have a 9400-16i on order, but it will be a month before it arrives.
 
Good point.
I am not using an expander board, but instead this:
an Oimaster 6-disk hot-swap drive enclosure. I guess it contains some electronics too, which can fail.
I have 3 of them, and all 3 disks that failed were in the same enclosure...

As a temporary solution I connected the failed disks to an ASMedia 1064 PCI-E 1X controller card, to see how they behave in the future. If they fail again, it must be a drive enclosure problem.
 
And all 3 disks that failed were in the same enclosure...
That seems too much of a coincidence. It may indeed be that enclosure. Is it a 1-to-1 enclosure? I mean, does it have a separate SATA connector for each individual drive? It should have very little electronics, but perhaps it's the power distribution of that enclosure?

By an expander board I generally mean one that takes one SAS/SATA connector from the controller and expands it to 3 or 4 drives. You see these in servers. I once had a brand new server with 24 disks and an 8-port controller. Disks just kept randomly dropping off. The controller was fine, the disks were fine; it was the expander board that was dodgy.
 
Yes, this enclosure has an individual SATA data connector for each drive, 6 in total, and 2 Molex power plugs that are shared by everything.
I guess the electronics are mostly there to flash the LEDs, blue and red.
 
Are the disks usable now? It could have been a one-time glitch.

Could be an expander. To find out whether there is one in the data path, you may need to use LSI-specific utilities to see the SAS topology in detail.

Could be cabling or data connectors. Errors on one cable are capable of causing IOs on disks that are on different cables to be aborted, if you're using the "wrong" LSI firmware version.

The most likely explanation, in my mind, would be shared power distribution or connectors.
 
Yes, the disks seem to be working fine now on the ASMedia controller. The gmirror resync finished during the night, so each drive had 8TB of data written to it. The drives are in the same bays in the same enclosure; I only changed the controller (and the SFF-8087 to SATA cable).

I will leave everything as it is until the 9400-16i controller arrives. It also comes with a new, different type of cable (SFF-8643).

Thank you for the advice.
 