Other Uncorrectable parity/CRC error FreeBSD with Seagate drives "ST2000LM015"

Hello,

Specs:

Supermicro Server
Board: http://www.supermicro.com/products/motherboard/atom/x10/a1sai-2750f.cfm
32GB ECC

4x Seagate HDD ST2000LM015-2E8174
ada0, ada1, ada2, ada4

FS: ZFS with geli encryption
OS: FreeBSD 11.2-RELEASE-p4 FreeBSD 11.2-RELEASE-p4 #0: Thu Sep 27 08:16:24 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

Errors:

Code:
Nov 26 16:48:27 zbsd kernel: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Nov 26 16:49:27 zbsd kernel: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Nov 26 16:49:27 zbsd kernel: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Nov 26 16:49:27 zbsd kernel: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Nov 26 16:50:27 zbsd kernel: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Nov 26 16:50:27 zbsd kernel: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error

Code:
(ada1:ahcich2:0:0:0): Retrying command
(ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 68 f2 27 40 da 00 00 00 00 00
(ada1:ahcich2:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich2:0:0:0): Retrying command
(ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 b0 68 b5 1f 40 c2 00 00 00 00 00
(ada1:ahcich2:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich2:0:0:0): Retrying command
(ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 b0 30 d8 a8 40 c2 00 00 00 00 00
(ada1:ahcich2:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich2:0:0:0): Retrying command

I get a lot of errors all the time.

I already tried:

- replaced all SATA cables -> I still get CRC errors
- connected a Samsung SSD with the ada3 SATA cable -> no errors on the SSD
- connected an old Toshiba 500GB HDD as ada4, created a zpool with a ZFS dataset and transferred a lot of data -> no CRC errors
- replaced one of the Seagate HDDs affected by CRC errors with a new one from Seagate (a newer warranty replacement, ST2000LM015-2E8174) -> I still get errors on the new drive (!!)
- then I connected this new drive to my MacBook Pro with a SATA-to-USB-C cable and ran a SMART test:
Code:
### SYSTEM INFORMATION ###
Report Timestamp                     : 27. November 2018 11:38:27 MEZ
Report Timestamp (ISO 8601 format)   : 2018-11-27T11:38:27

Application Name                     : DriveDx
Application Version                  : 1.8.1.605
Application SubBuild                 : 0
Application Edition                  : Standalone
Application Website                  : https://binaryfruit.com/drivedx
DriveDx Knowledge Base Revision      : 9/9

Computer Name                        : xxxxx
Host Name                            : xxxxx
Computer Model                       : MacBookPro15,2

OS Boot Time                         : 2018-11-27T11:07:57
Time Since Boot                      : 00h 30m 30s
OS Name                              : macOS
OS Version                           : 10.14.0
OS Build                             : 18A389
OS Kernel Version                    : Darwin 18.0.0

SAT SMART Driver Version             : 0.8.1s
ATA Command Support Tolerance        : verypermissive
N of drives in report                : 1



### DRIVE 1 OF 1 ###
Last Checked                         : 27. November 2018 11:35:18 MEZ
Last Checked (ISO 8601 format)       : 2018-11-27T11:35:18

Advanced SMART Status                : OK
Overall Health Rating                : GOOD 100%
Overall Performance Rating           : GOOD 100%
Issues found                         : 2

Serial Number                        : xxxxxx
WWN Id                               : 5 000c50 0b96df2f3
Volumes                              : Ohne Titel
Device Path                          : /dev/disk2
Total Capacity                       : 2.0 TB (2.000.398.934.016 Bytes)
Model Family                         : Seagate Barracuda 2.5 5400
Model                                : ST2000LM015-2E8174
Form Factor                          : 2.5 inches
Firmware Version                     : SDM1
Drive Type                           : HDD 5400 rpm

Power On Time                        : 5 hours (5 hours)
Power Cycles Count                   : 6
Current Power Cycle Time             : 0.5 hours



=== DEVICE CAPABILITIES ===
S.M.A.R.T. support enabled           : yes
DriveDx Active Diagnostic Config     : Seagate HDDs config [hdd.seagate]
Sector Logical Size                  : 512
Sector Physical Size                 : 4096
Physical Interconnect                : USB
Logical Protocol                     : USB
Removable                            : yes
Ejectable                            : no
ATA Version                          : ACS-3 T13/2161-D revision 3b
SATA Version                         : SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
I/O Path                             : IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP09@1D/IOPP/UPSB@0/IOPP/DSB2@2/IOPP/XHC3@0/XHC3@01000000/SSP2@01200000/JMS567@01200000/MSC Bulk-Only Transport@0/IOUSBMassStorageInterfaceNub/IOUSBMassStorageDriverNub/IOUSBMassStorageUASDriver/IOSCSITargetDevice/IOSCSIHierarchicalLogicalUnit@0000000000000000/org_dungeon_driver_IOSATDriver/IOSATServices

Enclosure Vendor Id / Product Id     : 0x152d / 0x567
SAT Pass Through Mode                :

Attributes Data Structure Revision   : 10
SMART Command Transport (SCT) flags  : 0x3035
SCT Status supported                 : yes
SCT Feature Control supported        : yes
SCT Data Table supported             : yes
Error logging capabilities           : 0x1
Self-tests supported                 : yes
Offline Data Collection capabilities : 0x71
Offline Data Collection status       : 0x0
Auto Offline Data Collection flags   : 0x0
[Known device                       ]: yes
[Drive State Flags                  ]: 0xc0000000
[Last State Change Timestamp        ]: 2018-11-27T11:29:07
[Last State Change Flags            ]: 0x40000000
[Last State Change Diff Flags       ]: 0x1
Last Email Report Timestamp          : 2018-11-27T11:29:07
Last Email Report Reason Flags       : 0x40000000
Last Email Report State Change Flags : 0x1


=== CURRENT POWER CYCLE STATISTICS ===
Data Read                           : 2.4 GB
Data Written                        : 3.2 GB
Data Read/Write Ratio               : 0.74
Average Throughput (Read)           : 15.4 MB/s
Average Throughput (Write)          : 14.6 MB/s

Operations (Read)                   : 8.156
Operations (Write)                  : 7.071
Operations Read/Write Ratio         : 1
Throughput per operation (Read)     : 306.5 KB/Op
Throughput per operation (Write)    : 480.5 KB/Op

Latency Time (Read)                 : 0 ns
Latency Time (Write)                : 0 ns
Retries (Read)                      : 0
Retries (Write)                     : 0
Errors (Read)                       : 0
Errors (Write)                      : 0


=== PROBLEMS SUMMARY ===
Failed Indicators (life-span / pre-fail)  : 0 (0 / 0)
Failing Indicators (life-span / pre-fail) : 0 (0 / 0)
Warnings (life-span / pre-fail)           : 2 (2 / 0)
Recently failed Self-tests (Short / Full) : 0 (0 / 0)
I/O Error Count                          : 0 (0 / 0)
Time in Under temperature                 : 0 minutes
Time in Over temperature                  : 0 minutes


=== IMPORTANT HEALTH INDICATORS ===
ID  NAME                                         RAW VALUE                  STATUS
  5 Reallocated Sector Count                     0                          100% OK
187 Reported Uncorrectable Errors                0                          100% OK
197 Current Pending Sector Count                 0                          100% OK
198 Offline Uncorrectable Sector Count           0                          100% OK
199 UDMA CRC Error Count                         2.998                      100% Warning
241 Total LBAs Written                           494.120.089 (235.6 GB)     100% OK


=== TEMPERATURE INFORMATION (CELSIUS) ===
Current Temperature                  : 28
Power Cycle Min Temperature          : 22
Power Cycle Max Temperature          : 29
Lifetime Min Temperature             : 20
Lifetime Max Temperature             : 49
Recommended Min Temperature          : 5
Recommended Max Temperature          : 55
Temperature Min Limit                : 5
Temperature Max Limit                : 60


=== DRIVE HEALTH INDICATORS ===
ID   | NAME                                        | TYPE      | UPDATE | RAW VALUE                  | VALUE | THRESHOLD | WORST | LAST MODIFIED        | STATUS      
   1   Raw Read Error Rate                           Pre-fail    online            0xDF0E9C1              84           6     66                       -    100%  OK      
   3   Spin Up Time                                  Pre-fail    online                0                  99           0     99                       -   99.0%  OK      
   4   Start Stop Count                              Life-span   online                13                100          20    100                       -    100%  OK      
   5   Reallocated Sector Count                      Pre-fail    online                0                 100          36    100                       -    100%  OK      
   7   Seek Error Rate                               Pre-fail    online             0x5CF8F              100          45    253                       -    100%  OK      
   9   Power On Hours                                Life-span   online                5                 100           0    100                       -    100%  OK      
  10   Spin Retry Count                              Pre-fail    online                0                 100          97    100                       -    100%  OK      
  12   Power Cycle Count                             Life-span   online                6                 100          20    100                       -    100%  OK      
184   End-to-End Error                              Life-span   online                0                 100          99    100                       -    100%  OK      
187   Reported Uncorrectable Errors                 Life-span   online                0                 100           0    100                       -    100%  OK      
188   Command Timeout                               Life-span   online              4.370               100           0      1                       -    100%  Warning  
189   High Fly Writes                               Life-span   online                0                 100           0    100                       -    100%  OK      
190   Airflow Temperature Celsius                   Life-span   online                28                 72          40     60                       -   53.3%  OK      
191   G-Sense Error Rate                            Life-span   online                1                 100           0    100                       -    100%  OK      
192   Power-Off Retract Count                       Life-span   online                2                 100           0    100                       -    100%  OK      
193   Load Cycle Count                              Life-span   online                85                100           0    100                       -    100%  OK      
194   Temperature (Celsius)                         Life-span   online                28                 72          40     60                       -   53.3%  OK      
197   Current Pending Sector Count                  Life-span   online                0                 100           0    100                       -    100%  OK      
198   Offline Uncorrectable Sector Count            Life-span   offline               0                 100           0    100                       -    100%  OK      
199   UDMA CRC Error Count                          Life-span   online              2.998               200           0    162                       -    100%  Warning  
240   Head Flying Hours                             Life-span   offline               2                 100           0    253                       -    100%  OK      
241   Total LBAs Written                            Life-span   offline     494.120.089 (235.6 GB)      100           0    253                       -    100%  OK      
242   Total LBAs Read                               Life-span   offline     472.702.288 (225.4 GB)      100           0    253                       -    100%  OK      
254   Free Fall Sensor                              Life-span   online                0                 100           0    100                       -    100%  OK      



=== DRIVE ERROR LOG ===
error log is empty


=== DRIVE SELF-TEST LOG ===
#   | LIFETIME (H)   | TEST TYPE         | PROGRESS | STATUS                          | LBA of 1st error
1           5          Short offline         100%     Completed without error              -


So is FreeBSD incompatible with these Seagate drives? I didn't get any new UDMA CRC errors when I connected the drive to my MacBook, and I don't get CRC errors with the Toshiba drive or the SSD.
Maybe it's a Seagate firmware issue? Or a driver issue?

I tried this:

#9
and this
#2

I still get CRC errors.

Maybe someone can help me,

thanks!


##Edit:


One of the other Seagate drives:
Code:
smartctl -A /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   073   064   006    Pre-fail  Always       -       21346192
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       108
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   045    Pre-fail  Always       -       13720949696
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8300 (83 73 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       108
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       80
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   058   046   040    Old_age   Always       -       42 (Min/Max 26/42)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       2
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   094   094   000    Old_age   Always       -       13829
194 Temperature_Celsius     0x0022   042   054   000    Old_age   Always       -       42 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       80
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       8135 (197 19 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       43070359732
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       18232219495
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SATA controller:
Code:
ahci0: <Intel Avoton AHCI SATA controller> port 0xe150-0xe157,0xe140-0xe143,0xe130-0xe137,0xe120-0xe123,0xe040-0xe05f mem 0xdf2f2000-0xdf2f27ff irq 19 at device 23.0 on pci0
ahci0: AHCI v1.30 with 4 3Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahci1: <Intel Avoton AHCI SATA controller> port 0xe110-0xe117,0xe100-0xe103,0xe0f0-0xe0f7,0xe0e0-0xe0e3,0xe020-0xe03f mem 0xdf2f1000-0xdf2f17ff irq 19 at device 24.0 on pci0
ahci1: AHCI v1.30 with 2 6Gbps ports, Port Multiplier not supported
ahcich4: <AHCI channel> at channel 0 on ahci1
ahcich5: <AHCI channel> at channel 1 on ahci1
 
I think the errors you are seeing are communication errors on the SATA bus, between the host (the Supermicro motherboard) and the target (the Seagate disk). UDMA CRC errors are not caused by the spinning platter or the disk, as far as I know. The source of the errors could be a bad SATA port on the motherboard, a bad SATA cable, a bad SATA port on the disk, or power supply issues to the disk. It is very unlikely to be a firmware issue; these types of drives are used all over. As a matter of fact, I used to have the predecessor model (the 1TB Barracuda) at home for several years.

You have already replaced the cable, and you have connected something else to the motherboard port. Sadly, I conclude that the disk is faulty, though not very faulty: it mostly works, and it worked perfectly when connected to your MacBook. I would still try switching the power supply, moving to different motherboard ports, and replacing the cable again. Sometimes it helps to just plug the cable in the other way round: SATA cables are electrically symmetric, but maybe some of the connectors involved are crooked, or have a little dirt on them.

If nothing helps, throw the disk away. Or keep using it, but understand the risk that it might get worse.

By the way, how did you connect a SATA disk to a MacBook? They don't have a SATA port. You must have used some sort of adapter.
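
To see whether the errors follow a particular disk or a particular motherboard port, it can help to tally the CRC messages per device and AHCI channel. Here is a rough Python sketch that does that for kernel log lines in the format quoted in this thread. The function name `count_crc_errors` and the `SAMPLE` list are just made up for illustration, and the regular expression assumes the message text looks exactly like the lines pasted above.

```python
import re
from collections import Counter

# Matches the CAM prefix "(ada0:ahcich0:0:0:0):" followed by the CRC status
# text, as it appears in the kernel messages quoted in this thread.
CRC_LINE = re.compile(
    r"\((?P<dev>\w+):(?P<chan>ahcich\d+):[\d:]+\): "
    r"CAM status: Uncorrectable parity/CRC error"
)


def count_crc_errors(lines):
    """Return a Counter keyed by (device, channel), one count per CRC line."""
    counts = Counter()
    for line in lines:
        match = CRC_LINE.search(line)
        if match:
            counts[(match.group("dev"), match.group("chan"))] += 1
    return counts


# A few lines copied from the first post, used here as sample input.
SAMPLE = [
    "Nov 26 16:48:27 zbsd kernel: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error",
    "Nov 26 16:49:27 zbsd kernel: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error",
    "(ada1:ahcich2:0:0:0): Retrying command",
    "(ada1:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 68 f2 27 40 da 00 00 00 00 00",
    "(ada1:ahcich2:0:0:0): CAM status: Uncorrectable parity/CRC error",
]

if __name__ == "__main__":
    for (dev, chan), n in count_crc_errors(SAMPLE).most_common():
        print(f"{dev} on {chan}: {n} CRC error(s)")
```

Instead of `SAMPLE` you could pass the lines of your system log (on a default FreeBSD install that is usually /var/log/messages, but that is an assumption about your setup). If the count stays with one `ahcich` channel after you move disks between ports, that points at the port or its cabling rather than at the disk.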
 
Your disk is bad. Replace it.
Raw_Read_Error_Rate 0x000f 073 064 006 Pre-fail Always - 21346192
Seek_Error_Rate 0x000f 084 060 045 Pre-fail Always - 13720949696
 
Not necessarily. Raw read error rate can be very high on a good drive. That's because of how drives work internally today: they seek to the location where the track should be, and immediately open the read gate and start assembling data. In reality they are not quite at the correct place yet, and are actually still servoing. During this time they will get a lot of read errors, and that's OK: if by happenstance they get a sector during this phase, that's good; if they don't get it yet, they count it as an error, and keep reading and servoing until things stabilize.

Typically, on SATA disks, good predictors of impending failure are items 5, 197 and 198. Items 1 and 7 have little importance, and can increase while the disk is running normally. For details: I think Backblaze has a paper that explains it, and there are a few academic publications that do the same thing. I know someone somewhere did a "machine learning" paper about what SMART characteristics predict drive failure, and found the same answer everyone else finds.

Note: I'm not saying that a high count of raw read errors is good. I'm just saying that it is not necessarily bad, and is usually irrelevant.
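
If you want to check this on your own disks, the following Python sketch pulls just those attributes out of `smartctl -A` output. It watches 5, 197 and 198 as mentioned above, plus 199 (UDMA CRC errors), since that is what this thread is about. The names `WATCHED`, `parse_smart_attributes` and `nonzero_watched` are made up for this example, and the parser assumes the ten-column table layout shown in the smartctl output earlier in the thread.

```python
# Attribute IDs named above as failure predictors (5, 197, 198), plus 199,
# which counts CRC errors on the SATA link rather than problems on the media.
WATCHED = {5: "media", 197: "media", 198: "media", 199: "link"}


def parse_smart_attributes(text):
    """Map attribute ID -> (name, raw value string) from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        # Ten columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw. The raw column may contain spaces, so split 9 times.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], fields[9].strip())
    return attrs


def nonzero_watched(attrs):
    """Return (id, name, raw count, kind) for watched attributes above zero."""
    found = []
    for attr_id, kind in WATCHED.items():
        if attr_id in attrs:
            name, raw = attrs[attr_id]
            count = int(raw.split()[0])  # ignore trailing "(...)" annotations
            if count > 0:
                found.append((attr_id, name, count, kind))
    return found


# A few rows copied from the ada1 output posted earlier in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   073   064   006    Pre-fail  Always       -       21346192
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       80
"""

if __name__ == "__main__":
    for attr_id, name, count, kind in nonzero_watched(parse_smart_attributes(SAMPLE)):
        print(f"{attr_id} {name}: {count} ({kind})")
```

For the ada1 rows above, only attribute 199 is reported, which fits the picture of a link problem rather than a failing medium. Keep in mind that 199 is a lifetime counter, so what matters is whether it keeps growing between two runs.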
 
Thank you all for your help!

I will try SeaTools on a Windows machine.
I wrote a report to Seagate; maybe they can help me or offer to replace the disks.

By the way, how did you connect a SATA disk to a MacBook? They don't have a SATA port. You must have used some sort of adapter.

I have an "Argus" USB-C to SATA adapter.
 
OK, if I don't use one specific SATA port on the mainboard, it works without problems... weird.
 