Uncorrectable Error on SSD Drive

cyrille

Member

Reaction score: 11
Messages: 95

Hello,
I'm having problems with an SSD connected over USB.
smartctl reports uncorrectable errors on it:

187 Uncorrectable_Error_Cnt 0x0032 100 100 000 Old_age Always - 549


Code:
smartctl -a /dev/da0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 12.0-RELEASE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Intel 540 Series SSDs
Device Model:     INTEL SSDSC2KW240H6
Serial Number:    CVLT736000HE240CGN
LU WWN Device Id: 5 5cd2e4 14e8f3ab3
Firmware Version: LSF036C
User Capacity:    240 056 327 680 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Apr  1 14:09:06 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x53) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  15) minutes.
SCT capabilities:            (0x0039)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       6
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       1142h+00m+00.000s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1261
170 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   010    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   010    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       121
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       549
190 Airflow_Temperature_Cel 0x0032   031   059   000    Old_age   Always       -       31 (Min/Max 23/59)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       121
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       61069
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       0
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       0
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       0
232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   098   098   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       61069
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       57385
249 NAND_Writes_1GiB        0x0032   100   100   000    Old_age   Always       -       1174
252 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       9

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1142         -
# 2  Short offline       Completed without error       00%      1142         -

SMART Selective self-test log data structure revision number 1
 SPAN         MIN_LBA         MAX_LBA  CURRENT_TEST_STATUS
    1  70403103932424  70403103932424  Not_testing
    2  70403103932424  70403103932424  Not_testing
    3  70403103932424  70403103932424  Not_testing
    4  70403103932424  70403103932424  Not_testing
    5  70403103932424  70403103932424  Not_testing
Selective self-test flags (0x4008):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
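
Picking the interesting rows out of an attribute table like the one above can be scripted. The following is only a minimal sketch: the set of "watched" attribute IDs is an assumption (vendors differ in which IDs they report and what they mean), and the helper names `parse_attributes` and `nonzero_watched` are made up for this example. The sample rows are copied from the output above.

```python
import re

# Attribute IDs often treated as media-error indicators. This choice is an
# assumption for illustration; it is not defined by smartctl itself.
WATCHED_IDS = {5, 187, 196, 197, 198}

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)

def parse_attributes(text):
    """Return {attribute_id: (name, raw_value_string)} for every table row."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), m.group(9).strip())
    return attrs

def nonzero_watched(attrs):
    """List (id, name, raw) for watched attributes with a nonzero raw count."""
    hits = []
    for attr_id in sorted(attrs):
        name, raw = attrs[attr_id]
        lead = re.match(r"\d+", raw)  # raw values may carry trailing text
        if attr_id in WATCHED_IDS and lead and int(lead.group()) != 0:
            hits.append((attr_id, name, raw))
    return hits

SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       6
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       1142h+00m+00.000s
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       549
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
"""

for attr_id, name, raw in nonzero_watched(parse_attributes(SAMPLE)):
    print(attr_id, name, raw)
```

For this drive the sketch would print attributes 5 and 187, the two counters discussed in this thread.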


I could not find the badblocks program in the FreeBSD repositories. Is it available elsewhere?

I'd just like to know whether there is a chance of repairing the drive or whether it belongs in the trash...

Would a complete reformat fix the problem? At the moment the system freezes repeatedly...

Thanks!
 
cyrille

Member

Reaction score: 11
Messages: 95

OK, thanks for this information.
It's strange: badblocks does not report any errors.

Code:
badblocks -v /dev/da0
Checking blocks 0 to 234430006
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)

I've just reformatted it and will try using it again. We'll see...
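
One way to "see" is to compare the raw error counters from two smartctl readings taken some time apart. The sketch below assumes the raw values have already been extracted into plain dictionaries; the function name `counter_deltas` is made up for this example, and the "after" reading is a hypothetical placeholder, not data from this drive.

```python
def counter_deltas(before, after):
    """Return {attribute_id: after - before} for IDs present in both readings."""
    return {attr_id: after[attr_id] - before[attr_id]
            for attr_id in before if attr_id in after}

# Raw values of attributes 5 and 187 from the smartctl output in the first post.
before = {5: 6, 187: 549}

# HYPOTHETICAL later reading, invented only to show the comparison.
after = {5: 6, 187: 560}

growing = {attr_id: d for attr_id, d in counter_deltas(before, after).items() if d > 0}
print(growing if growing else "no counter has grown")
```

A counter that keeps climbing after the reformat would suggest the drive is still degrading; one that stays flat would at least not contradict the clean badblocks pass.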
 

swegen

Member

Reaction score: 57
Messages: 85

I would use security erase to reinitialize all the memory cells before trying to use the drive again.

Still, those remapped sectors after only about a thousand power-on hours would indicate there is something seriously wrong with your SSD.
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 12,768
Messages: 39,377

swegen said:
Still, those remapped sectors with only about a thousand power on hours would indicate there is something seriously wrong with your SSD.

Yeah, it doesn't exactly fill me with confidence either.

I always like to show off:
Code:
=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     KINGSTON SV300S37A60G
Serial Number:    XXXXXXXXXX
LU WWN Device Id: YYYYYYY
Firmware Version: 505ABBF1
User Capacity:    60,022,480,896 bytes [60.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr  3 11:34:46 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

{snip}
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail  Always       -       0/0
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   047   047   000    Old_age   Always       -       46931h+25m+59.490s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       43
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       23
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       9
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 Airflow_Temperature_Cel 0x0000   028   070   000    Old_age   Offline      -       28 (Min/Max 18/70)
194 Temperature_Celsius     0x0022   028   070   000    Old_age   Always       -       28 (Min/Max 18/70)
195 ECC_Uncorr_Error_Count  0x001c   100   100   000    Old_age   Offline      -       0/0
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   100   100   000    Old_age   Offline      -       0/0
204 Soft_ECC_Correct_Rate   0x001c   100   100   000    Old_age   Offline      -       0/0
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   093   093   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       25954
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       12641
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       12641
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       53

Note the power-on hours: that's around 5.4 years of nearly non-stop operation (only 43 power cycles). This thing is long past its expected lifetime, yet all error counts (correctable or not) are still at 0. I do expect it to fail at some point, but looking at the numbers I think I can squeeze a few more years out of it. In this same time span a friend of mine managed to burn through 3 or 4 SSDs. I don't know what I'm doing differently; maybe it's because my stuff is on 24/7 and rarely switched off, while he's constantly rebooting, reinstalling and swapping disks in and out.
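
For anyone who wants to redo the arithmetic, here is a small sketch. It assumes an average year of 365.25 days; the function names are made up for this example, and the inputs are the power-on hours and power-cycle counts from the smartctl outputs in this thread.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours, averaging in leap years

def power_on_years(hours):
    """Convert SMART power-on hours to years of continuous operation."""
    return hours / HOURS_PER_YEAR

def hours_per_cycle(hours, cycles):
    """Average uptime between power cycles."""
    return hours / cycles

# Kingston SV300: 46931 power-on hours, 43 power cycles.
print(round(power_on_years(46931), 2))        # years powered on
print(round(hours_per_cycle(46931, 43)))      # hours per power cycle

# Intel 540 from the first post: 1142 hours, 1261 power cycles.
print(round(hours_per_cycle(1142, 1261), 2))  # hours per power cycle
```

The Kingston works out to roughly 5.35 years and over a thousand hours per power cycle, while the Intel drive from the first post averages under one hour per cycle.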
 

ralphbsz

Son of Beastie

Reaction score: 2,421
Messages: 3,292

I think the answer is simple: You have a quality device: Manufacturer is Kingston, with a SandForce controller. That means a good FTL (flash translation layer). I happen to have Intel SSDs that are at 43K hours, 133 power cycles, and also zero errors, similar high-quality devices. I have no idea what I would replace them with if I had to buy something new today; fortunately, I have one spare Intel sitting around.
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 12,768
Messages: 39,377

ralphbsz said:
I think the answer is simple: You have a quality device: Manufacturer is Kingston, with a SandForce controller.

Never expected that. It was fairly cheap compared to other comparable (commercial) devices when I bought it. I think I even bought it with a discount, so I spent peanuts on it. As cheap usually doesn't mean "good quality" I more or less expected I would have to replace it after one or two years. Yet here we are, more than 5 years later, and it's still humming along nicely.
 

Sevendogsbsd

Daemon

Reaction score: 695
Messages: 1,142

Interesting data. What smartctl flag(s) did you use, SirDice? I use 2 Samsung 850s, 1 Pro and 1 EVO, but don't get any Retired_Block_Count entries. I have had mine about 3-4 years, done countless Linux installs and I power cycle daily. I'll post my data later after work. The Samsungs aren't cheap, so I am interested in other brands. I believe size is a factor as well in terms of lifespan, correct? I have read larger ones last longer, but I have no way to prove that.

I am running strictly FreeBSD, UFS with TRIM enabled on both drives, and I use tmpfs.
 

recluce

Active Member

Reaction score: 40
Messages: 170

I have both Samsung and Intel SSDs approaching 50,000 hours and with around 90% SSD life left according to SMART data. I even have two rare early Seagate SSDs beyond 40,000 hours - they are doing fine as well.

@Sevendogsbsd (wuff): The SMART attributes a drive reports vary with the brand and model of SSD. This is also true of HDDs to a degree, but the variety among SSDs is much more pronounced. So I doubt there is any option that would get Retired_Block_Count from your Samsung SSDs.
 

Sevendogsbsd

Daemon

Reaction score: 695
Messages: 1,142

Thanks for the reply (arf!) Since I have about 5k hours (I think, will double check later) on these drives, they've got some life left then!
 

Sevendogsbsd

Daemon

Reaction score: 695
Messages: 1,142

As promised (not the complete output, was too long):

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 EVO 250GB
Serial Number:    S2R5NXAH310327M
LU WWN Device Id: 5 002538 d40bb60cb
Firmware Version: EMT02B6Q
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr  3 19:23:41 2019 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       5082
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       1282
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       23
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   076   062   000    Old_age   Always       -       24
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       239
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       13998388140

and for the 850 PRO:
Code:
=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 PRO 512GB
Serial Number:    S2BENWAG308300T
LU WWN Device Id: 5 002538 8700d12a6
Firmware Version: EXM04B6Q
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr  3 19:29:20 2019 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       6244
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       1804
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       20
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   075   062   000    Old_age   Always       -       25
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       404
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       17547816617
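
The Total_LBAs_Written raw values are easier to judge once converted to bytes. This sketch assumes attribute 241 counts logical sectors of 512 bytes (the sector size smartctl reports for both drives); that interpretation is common for these Samsung models but is not stated in the output itself. The function name is made up for this example.

```python
SECTOR_BYTES = 512  # logical sector size reported by smartctl for both drives

def lbas_to_terabytes(lbas, sector_bytes=SECTOR_BYTES):
    """Convert a count of written LBAs to decimal terabytes (10**12 bytes)."""
    return lbas * sector_bytes / 1e12

# Raw values of attribute 241 from the two outputs above.
drives = [("850 EVO", 13998388140), ("850 PRO", 17547816617)]

for label, lbas in drives:
    print(f"{label}: {lbas_to_terabytes(lbas):.2f} TB written")
```

Under that assumption the EVO has taken about 7.17 TB of host writes and the PRO about 8.98 TB.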
 