Solved CAM status: SCSI Status Error

dvl@

I noticed these errors this morning. Any ideas? Two such drives had similar messages; the other drive was (da19:mps2:0:12:0), i.e. also on mps2:0.

More info at https://gist.github.com/dlangille/88eac25349577aaca22a401ac08e9d1b

Code:
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 32 84 c2 f8 00 00 c0 00 length 98304 SMID 130 terminated ioc 804b scsi 0 state c xfer 81920
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 32 84 c3 b8 00 01 00 00 length 131072 SMID 852 terminated ioc 804b scsi 0 state c xfer 0
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 32 84 c2 f8 00 00 c0 00
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): CAM status: SCSI Status Error
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): SCSI status: Check Condition
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): Retrying command (per sense data)
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 32 86 d0 c8 00 00 18 00
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): CAM status: SCSI Status Error
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): SCSI status: Check Condition
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): Retrying command (per sense data)
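
For later readers: the LBA and transfer length in those kernel lines come straight from the CDB bytes. A quick sketch in plain Python (my own decoding, not any FreeBSD tool) showing how the first READ(10) CDB above maps to the reported length of 98304 bytes:

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB into (LBA, transfer length in blocks).

    READ(10) layout: byte 0 = opcode 0x28, byte 1 = flags,
    bytes 2-5 = big-endian LBA, byte 6 = group number,
    bytes 7-8 = big-endian transfer length, byte 9 = control.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks

# First CDB from the log above; this drive has 512-byte logical sectors.
lba, blocks = decode_read10("28 00 32 84 c2 f8 00 00 c0 00")
print(lba, blocks, blocks * 512)  # 192 blocks * 512 = 98304, matching the log
```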

The drive in question:

Code:
[dan@knew:~] $ sudo smartctl -a /dev/da18
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-RELEASE-p20 amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba 3.5" MD04ACA... Enterprise HDD
Device Model: TOSHIBA MD04ACA500
Serial Number: 653IK1IBFS9A
LU WWN Device Id: 5 000039 65bf80144
Firmware Version: FP2A
User Capacity: 5,000,981,078,016 bytes [5.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Aug 25 12:23:51 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 120) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 542) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       529
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       53
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   063   063   000    Old_age   Always       -       15169
 10 Spin_Retry_Count        0x0033   101   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       53
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       44
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       694
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       43 (Min/Max 18/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   063   063   000    Old_age   Always       -       15000
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       204
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     15169         -
# 2  Extended offline    Completed without error       00%         9         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[dan@knew:~] $
 
I'd say the disk is close to dying. I'd keep a close eye on it, it's likely these errors will get worse and may start stalling the whole pool (due to the bus resets).
 
I'd say the disk is close to dying. I'd keep a close eye on it, it's likely these errors will get worse and may start stalling the whole pool (due to the bus resets).
I don't see anything alarming in the SMART stats other than the 5 g-force alerts (which didn't cause any sector re-allocations). It passed a short SMART offline test within the last few hours.

The reported error was a Unit Attention. I'd look at power supply / cabling / expander issues - as this is the 12th drive on the 3rd mps(4) controller in the system, presumably there are a lot of drives in there drawing power. Since there was another error on the neighboring drive, look for things in common (same power cable, backplane, expander port, etc.)

There were some recent commits (within the last month or so) with fixes to mps(4). They probably don't have any impact (for better or worse) on this issue.
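
If it helps to see the pattern, the UNIT ATTENTION lines can be tallied per CAM tuple to check whether the errors cluster on one controller or on neighboring targets. A throwaway sketch (the regex and grouping are mine; the sample lines are taken from this thread):

```python
import re
from collections import Counter

# Matches the CAM peripheral tuple, e.g. "(da18:mps2:0:11:0)".
TUPLE = re.compile(r"\((da\d+):(mps\d+):(\d+):(\d+):(\d+)\)")

def tally_unit_attentions(lines):
    """Count UNIT ATTENTION sense lines per (device, controller, target)."""
    counts = Counter()
    for line in lines:
        if "UNIT ATTENTION" not in line:
            continue
        m = TUPLE.search(line)
        if m:
            dev, ctrl, _bus, target, _lun = m.groups()
            counts[(dev, ctrl, target)] += 1
    return counts

log = [
    "Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)",
    "Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)",
    "Sep  2 11:28:17 knew kernel: (da19:mps2:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)",
]
print(tally_unit_attentions(log))
# da18 and da19 are adjacent targets (11 and 12) on the same mps2 controller
```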
 
I don't see anything alarming in the SMART stats other than the 5 g-force alerts (which didn't cause any sector re-allocations). It passed a short SMART offline test within the last few hours.

I had not noticed that, thank you.

The reported error was a Unit Attention. I'd look at power supply / cabling / expander issues - as this is the 12th drive on the 3rd mps(4) controller in the system, presumably there are a lot of drives in there drawing power. Since there was another error on the neighboring drive, look for things in common (same power cable, backplane, expander port, etc.)

This made me think.

I am in the process of replacing the existing 3TB drives with 5TB drives.

Power may be an issue.
 
I found the drive specs, but they do not differentiate between 5 V and 12 V draw.

I may be OK with 24 A at 5 V and 80 A at 12 V, given the total draw is 20 × 11.3 W = 226 W.
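
A back-of-the-envelope check of that arithmetic (rail capacities from the PSU rating above; the per-drive spin-up current is an assumed typical 3.5" figure, not taken from the Toshiba spec):

```python
# Steady-state budget: rail capacity vs. quoted max draw of 20 drives.
rail_5v_w = 24 * 5        # 24 A at 5 V  -> 120 W available
rail_12v_w = 80 * 12      # 80 A at 12 V -> 960 W available

drives = 20
per_drive_w = 11.3        # Toshiba's quoted max operating power
total_w = drives * per_drive_w
print(rail_5v_w, rail_12v_w, total_w)  # steady state looks comfortable

# Worst case at power-on: 3.5" drives commonly pull around 2 A on the
# 12 V rail while spinning up (assumed typical value, not from the
# MD04ACA500 datasheet). All 20 spinning up at once:
spinup_12v_w = drives * 2 * 12
print(spinup_12v_w)  # still under the 960 W rail rating, but much closer
```

Even if the totals fit, a sagging rail or a marginal connector feeding several drives could still explain per-slot resets, which is why the shared-cable question matters.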
 
Agreed.

All drives are behind LSI HBAs (SAS2008).

I would expect the HBA to handle that, not the BIOS.
 
This post seems to indicate that SSU (staggered spin-up) depends upon your backplane; it is not available on the SATA power connector from your PSU.
There are a number of different methods. SSU is one. PUIS (Power Up In Standby) is another, but I don't believe LSI controllers support it - it was more of a 3Ware thing.

I think things have veered off on a tangent, though... The problem happened on a system that has been on and running for some time, not after boot, right? As I said above:
I said:
I'd look at power supply / cabling / expander issues
What sort of cabling are you using? Are all cables properly seated, routed appropriately, etc.? If you have lots of individual drive cables running next to each other for long distances, are you using spread spectrum clocking? Are there any SAS expanders involved, or does each drive have its own dedicated controller port (you have at least 3 mps(4) controllers, so I hope you don't have any expanders)? You reported several drives showing this error - what is in common between them - same backplane, same power supply connector, same data connector on the LSI controller? Do you have another power supply you can test with? This last may be difficult if this is not a server chassis with hot-swap connectors, particularly if you have a rat's nest of mixed power / drive / etc. cables. That's why I use server chassis and then make every single cable to the exact length needed, like this.
 
This did not happen upon power up, no.

I'm using SFF-8087 cables between the SAS2008 cards and the backplane. There are 20 drives, which means 5 cables.

I have no idea about spread spectrum clocking.

There are no SAS expanders. Everything is SATA.

There are two drives which had this problem: da18 (mps2:0:11:0) on one day at 06:10, and da19 (mps2:0:12:0) at about 04:54 and 05:26 the next day. Both are TOSHIBA MD04ACA500. They will both be on the same power supply connector and the same SFF-8087 data cable. There are two other drives on the same cable, both also MD04ACA500. See https://gist.github.com/dlangille/88eac25349577aaca22a401ac08e9d1b

All drives are in hot-swap drive trays.

There is only one power supply in this chassis.

Thank you.
 
I'm using SFF-8087 cables between the SAS2008 cards and the backplane. There are 20 drives, which means 5 cables.
It is unlikely that you are having a noise / interference problem with SFF-8087 cables and a good backplane. This wouldn't happen to be a Norco brand case, would it?
I have no idea about spread spectrum clocking.
This. But likely not relevant with the drives / cabling you have.
There are two drives which had this problem: da18 (mps2:0:11:0) on one day at 06:10, and da19 (mps2:0:12:0) at about 04:54 and 05:26 the next day. Both are TOSHIBA MD04ACA500. They will both be on the same power supply connector and the same SFF-8087 data cable. There are two other drives on the same cable, both also MD04ACA500. See https://gist.github.com/dlangille/88eac25349577aaca22a401ac08e9d1b
If you can do this without risk to your storage (for example, if you are using drive labels or GUIDs in a ZFS pool instead of hardware device names), you could try swapping these two drives with 2 other drives that are on a different controller / data cable / power supply cable and see if the problem moves with the drives or stays with the physical slots.
 
Eight days later. This is the second of the drives mentioned in the original post.

Background but not necessarily relevant: since my previous post, the system now contains 20x 5TB HDD at 11.3 W max each, and was power cycled on Thursday (2 days ago). The following events occurred about 1 day 14 hours after power up.

At the time, a zfs scrub was underway for the zpool in question.

My review of the log below indicates it is very similar to the original (shown in the gist in the original post of this thread).

Code:
Sep  2 11:28:17 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 f3 65 a5 00 00 01 00 00 length 131072 SMID 207 terminated ioc 804b scsi 0 state c xfer 16384
Sep  2 11:28:17 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 f3 65 a6 00 00 01 00 00 length 131072 SMID 628 terminated ioc 804b scsi 0 state c xfer 0
Sep  2 11:28:17 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 f3 65 a5 00 00 01 00 00 
Sep  2 11:28:17 knew kernel: (da19:mps2:0:12:0): CAM status: SCSI Status Error
Sep  2 11:28:17 knew kernel: (da19:mps2:0:12:0): SCSI status: Check Condition
Sep  2 11:28:17 knew kernel: (da19:mps2:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep  2 11:28:17 knew kernel: (da19:mps2:0:12:0): Retrying command (per sense data)
Sep  2 11:28:18 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 f3 66 7e c8 00 00 20 00 
Sep  2 11:28:18 knew kernel: (da19:mps2:0:12:0): CAM status: SCSI Status Error
Sep  2 11:28:18 knew kernel: (da19:mps2:0:12:0): SCSI status: Check Condition
Sep  2 11:28:18 knew kernel: (da19:mps2:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep  2 11:28:18 knew kernel: (da19:mps2:0:12:0): Retrying command (per sense data)

A diff of today's smartctl output against the one from the original post. Nothing spectacular here:

Code:
[dan@knew:~] $ diff da19.1 da19.2
1d0
< [dan@knew:~] $ sudo smartctl -a /dev/da19
18c17
< Local Time is:    Fri Aug 25 12:20:51 2017 UTC
---
> Local Time is:    Sat Sep  2 18:30:44 2017 UTC
62,63c61,62
<   3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       514
<   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       54
---
>   3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       546
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       56
67c66
<   9 Power_On_Hours          0x0032   063   063   000    Old_age   Always       -       15169
---
>   9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       15367
69,73c68,72
<  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       54
< 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       262
< 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       45
< 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       661
< 194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       44 (Min/Max 18/51)
---
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       56
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       269
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       47
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       663
> 194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       40 (Min/Max 18/51)
79c78
< 222 Loaded_Hours            0x0032   063   063   000    Old_age   Always       -       15001
---
> 222 Loaded_Hours            0x0032   063   063   000    Old_age   Always       -       15199
90,91c89,91
< # 1  Short offline       Completed without error       00%     15169         -
< # 2  Extended offline    Completed without error       00%         9         -
---
> # 1  Extended offline    Completed without error       00%     15181         -
> # 2  Short offline       Completed without error       00%     15169         -
> # 3  Extended offline    Completed without error       00%         9         -
104d103
<
 
Darn little information. We know that the drive spun up twice since yesterday, and that both times it was power cycled. That could very well be the cause of the "unit attention" that's reported by the low-level SCSI code and shows up in dmesg. Most likely, the power supply situation is still not good enough.

Might be a waste of time, but here is an idea: Perhaps the real problem is not the power supply, but the AC supply into it? Maybe you have rare short power outages, or power drops, not long enough to completely kill everything and cause a shutdown (or make the lights go out), but long enough for the biggest power users (20 disks) to notice? Perhaps it's something like the AC circuit is being shared with a really big consumer (air conditioning in the summer, or a big water pump or fan) whose startup surge takes the power down too far? One way to diagnose this would be to put in a UPS; those tend to be pretty good at monitoring power.

What disturbs me is the g-sense error rate. Seven times in those 24 hours, the drive detected that it was vibrating or shaking too much. That's bad; vibration is the #1 cause of one of the most dangerous disk errors (off-track writes, which are dangerous because the error is not detected while reading, and the drive returns obsolete content). If the drive is vibrating enough to notice, you should look into what's causing the whole chassis or that drive to vibrate. In the old days, when we still had full-size CD-ROM drives, the #1 cause of disk vibration was being mounted right next to the CD-ROM; today it's usually sympathetic vibration from a neighboring drive.
 
Spun up twice since yesterday? You refer to Start_Stop_Count, 54 vs. 56? That is consistent with the change in Power_Cycle_Count, is it not, also from 54 to 56?

NOTE also, it was not 24 hours between smartctl readings. It was between Aug 25 and Sep 2.

There is a UPS in use, an APC 2200.

Re vibrations: I suspect the major sources are opening/closing the rack doors and inserting other drive trays.

Thank you.
 
Yes, the start/stop count delta is exactly the same as the power cycle count. That tells us that each time it spins up, it is caused by what the drive perceives as a power cycle event (whether it's real or not is the big question). Big oops on my part: I didn't pay attention that this was over a multi-day period, not just 24 hours.

So we're saying: There is already a UPS in use (and it is a big one, the 2200 is not one of those office-supply cheap ones), there were no real power cycles, yet this disk saw two power cycles? That would mean there is still a minor power supply problem. But two power events in over a week may not be a big deal.

The other big question is whether the increase in the vibration (the g sense error rate) is significant or not, and whether it is commensurate with your explanation of human operations (doors, drive trays). The problem with SMART is that the values are only calibrated on a vendor- or device-specific level. So we don't know whether the increase from 262 to 269 is a big deal or a statistical fluctuation. If it bugs you, you'll have to contact the vendor support to find out.
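
For comparing two `smartctl -a` captures, the attribute table can also be parsed and diffed mechanically instead of by eye. A rough sketch (the line format assumed here matches the output quoted earlier in this thread):

```python
import re

# Matches smartctl attribute rows, e.g.
# "191 G-Sense_Error_Rate  0x0032  100  100  000  Old_age  Always  -  269"
ATTR = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-f]{4}\s+\d+\s+\d+\s+\d+\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)

def raw_values(text):
    """Map attribute name -> raw value from smartctl -a output."""
    vals = {}
    for line in text.splitlines():
        m = ATTR.match(line)
        if m:
            vals[m.group(2)] = int(m.group(3))
    return vals

before = "191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       262"
after = "191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       269"

delta = {k: raw_values(after).get(k, 0) - v for k, v in raw_values(before).items()}
print(delta)
```

The raw-value semantics are vendor-specific, as noted above, so the deltas only tell you that something changed, not how much it matters.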
 
So we're saying: There is already a UPS in use (and it is a big one, the 2200 is not one of those office-supply cheap ones), there were no real power cycles, yet this disk saw two power cycles? That would mean there is still a minor power supply problem. But two power events in over a week may not be a big deal.

There were indeed two power cycles. I powered it off at least once to remove an internal HDD.

When powering up, there was a boot drive issue, so I had to reboot. It might interpret that as a power cycle.
 
FYI, the system booted just fine with 20x 5TB drives. (I thought I posted this several days ago).
 
For the record:

Code:
Sep  4 05:22:20 knew kernel: (pass19:mps2:0:2:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 843 command timeout cm 0xfffffe0000dbb270 ccb 0xfffff80a5c84b000
Sep  4 05:22:20 knew kernel: (noperiph:mps2:0:4294967295:0): SMID 1 Aborting command 0xfffffe0000dbb270
Sep  4 05:22:20 knew kernel: mps2: Sending reset from mpssas_send_abort for target ID 2
Sep  4 05:22:24 knew kernel: mps2: mpssas_action_scsiio: Freezing devq for target ID 2
Sep  4 05:22:24 knew kernel: mps2: Unfreezing devq for target ID 2

Sep  4 05:22:24 knew smartd[1139]: Device: /dev/da16 [SAT], failed to read SMART Attribute Data

Sep  4 06:06:07 knew kernel: (da16:mps2:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 01 81 63 d5 78 00 00 00 10 00 00
Sep  4 06:06:07 knew kernel: (da16:mps2:0:2:0): CAM status: SCSI Status Error
Sep  4 06:06:07 knew kernel: (da16:mps2:0:2:0): SCSI status: Check Condition
Sep  4 06:06:07 knew kernel: (da16:mps2:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep  4 06:06:07 knew kernel: (da16:mps2:0:2:0): Retrying command (per sense data)

Sep  4 07:06:19 knew kernel: (da19:mps2:0:12:0): READ(16). CDB: 88 00 00 00 00 01 d3 38 53 b0 00 00 00 08 00 00
Sep  4 07:06:19 knew kernel: (da19:mps2:0:12:0): CAM status: SCSI Status Error
Sep  4 07:06:19 knew kernel: (da19:mps2:0:12:0): SCSI status: Check Condition
Sep  4 07:06:19 knew kernel: (da19:mps2:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep  4 07:06:19 knew kernel: (da19:mps2:0:12:0): Retrying command (per sense data)

Sep  4 08:00:17 knew kernel: (da19:mps2:0:12:0): READ(16). CDB: 88 00 00 00 00 02 20 97 56 d8 00 00 00 58 00 00
Sep  4 08:00:17 knew kernel: (da19:mps2:0:12:0): CAM status: SCSI Status Error
Sep  4 08:00:17 knew kernel: (da19:mps2:0:12:0): SCSI status: Check Condition
Sep  4 08:00:17 knew kernel: (da19:mps2:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep  4 08:00:17 knew kernel: (da19:mps2:0:12:0): Retrying command (per sense data)
 
Don't mind me. I'm just keeping track here.

Code:
Sep  9 07:59:57 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 a2 a8 4e e0 00 00 10 00 length 8192 SMID 698 terminated ioc 804b scsi 0 state c xfer 0
Sep  9 07:59:57 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 a2 7f c4 58 00 00 30 00 length 24576 SMID 473 terminated ioc 804b scsi 0 state c xfer 0
Sep  9 07:59:57 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 a2 a8 4e e0 00 00 10 00 
Sep  9 07:59:57 knew kernel: (da19:mps2:0:12:0): CAM status: SCSI Status Error
Sep  9 07:59:57 knew kernel: (da19:mps2:0:12:0): SCSI status: Check Condition
Sep  9 07:59:57 knew kernel: (da19:mps2:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep  9 07:59:57 knew kernel: (da19:mps2:0:12:0): Retrying command (per sense data)
Sep  9 07:59:58 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 a2 a8 ae 28 00 00 70 00 
Sep  9 07:59:58 knew kernel: (da19:mps2:0:12:0): CAM status: SCSI Status Error
Sep  9 07:59:58 knew kernel: (da19:mps2:0:12:0): SCSI status: Check Condition
Sep  9 07:59:58 knew kernel: (da19:mps2:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep  9 07:59:58 knew kernel: (da19:mps2:0:12:0): Retrying command (per sense data)
 
Another one:

Code:
Sep  9 12:55:33 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 3f 8d 76 28 00 00 f0 00 length 122880 SMID 491 terminated ioc 804b scsi 0 state c xfer 0
Sep  9 12:55:33 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 3f e2 43 c8 00 01 00 00 length 131072 SMID 912 terminated ioc 804b scsi 0 state c xfer 114692
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 3f 8d 76 28 00 00 f0 00
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): CAM status: SCSI Status Error
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): SCSI status: Check Condition
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): Retrying command (per sense data)
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 3f 8e 60 18 00 00 98 00
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): CAM status: SCSI Status Error
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): SCSI status: Check Condition
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep  9 12:55:34 knew kernel: (da18:mps2:0:11:0): Retrying command (per sense data)
 