Other Do I need to worry about these SMART values on an SSD?

The SSD is a Chinese 'maxsun' 512GB, as seen at this link: https://www.maxsun.com/products/terminator-solid-state-drive-x5

I get the following output from 'smartctl -a'

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       843
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       248
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       82
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       35
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       66
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       2
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       27
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       83886080
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       240214
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       35696142
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       36
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       28
194 Temperature_Celsius     0x0032   100   100   050    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       100
241 Total_LBAs_Written      0x0032   100   100   050    Old_age   Always       -       120435
242 Total_LBAs_Read         0x0032   100   100   050    Old_age   Always       -       48854

SMART Error Log Version: 0
No Errors Logged

Running the short self-test with smartctl returns zero errors.

On the face of it there are no bad blocks; field 5 is zero and fields 181 and 182 are both zero.
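For anyone who wants to repeat this check without reading the table by eye, here is a minimal Python sketch. It assumes the attribute rows have the ten whitespace-separated columns shown above, and the watch list of "error-like" IDs is my own choice for illustration, not something smartmontools or the vendor defines.

```python
# A few rows copied from the smartctl -a output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 050 Old_age Always - 0
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 83886080
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 240214
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 36
181 Program_Fail_Cnt_Total 0x0032 100 100 050 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 050 Old_age Always - 0
198 Offline_Uncorrectable 0x0032 100 100 050 Old_age Always - 0
"""

# Hypothetical watch list: the IDs this thread treats as error-like.
WATCH_IDS = {5, 175, 176, 178, 181, 182, 198}


def parse_attributes(text):
    """Return {id: (name, raw_value)} for rows shaped like the table above."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID;
        # the header row ("ID# ...") is skipped by the isdigit() test.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if raw.isdigit():
            attrs[int(fields[0])] = (fields[1], int(raw))
    return attrs


def nonzero_watched(attrs, watch_ids=WATCH_IDS):
    """List watched attributes whose raw value is not zero, sorted by ID."""
    return sorted(
        (attr_id, name, raw)
        for attr_id, (name, raw) in attrs.items()
        if attr_id in watch_ids and raw != 0
    )


for attr_id, name, raw in nonzero_watched(parse_attributes(SAMPLE)):
    print(f"{attr_id:>3} {name:<24} raw={raw}")
```

On the rows above this lists 175, 176 and 178. A nonzero raw value only says the field is worth a second look; it does not say what the firmware stores there.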

HOWEVER - I am confused about the meaning of fields 175 and 176, which both show quite large raw value counts, and both appear to relate to bad nand blocks. I can't find any data on those fields on the manufacturer's website.

Wikipedia has an entry at https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology#cite_note-46
although it gives a completely different definition for field 175.

So I was expecting to see zero values in fields 175 and 176, for a healthy drive, not the large raw values reported.

Can anyone shed any light on this? Is there anything to worry about with this drive?
 
I don't have experience with those drives and am not familiar with how they report that value. Too often SMART values don't make sense or don't meet expectations. For some attributes, the only way I could spot errors was to monitor the rate of change while repeating roughly the same drive activity, because the value itself rolled through a rather large range even under normal drive conditions.

Though 175 and 176 might mean something is going on, I'd be more inclined to watch for 178 increasing further. Drives usually keep a pool of reserved blocks to replace spots on the disk that go bad.

More values might get defined if the drive is added to an updated drive database. Update /usr/local/share/smartmontools/drivedb.h with a copy of https://github.com/smartmontools/smartmontools/raw/master/smartmontools/drivedb.h manually or by running /usr/local/sbin/update-smart-drivedb. I'd also try -x in place of -a, which gives a different readout and sometimes different details. sysutils/gsmartcontrol is a GUI that flags certain attributes and log details, and the drive itself, in red and pink to draw attention to something that is or may be a problem.

You can always reach out to the manufacturer to see if they can explain what the attributes mean and what is good or bad to watch for in them. If they give you accurate information, you will have gathered notes that can go into a drivedb update for everyone.
 
Not very helpful, but CrystalDiskInfo (Windows) uses a percentage to determine an SSD's state, unlike mechanical drives, where some counters are relevant to the drive's health (C3, C5, C6 if I remember).

In my business, the only SSD failure I have had happened suddenly. There were no warning signs such as slower operation or data corruption.
I suppose SSD failure is very binary... Safe or dead, nothing in between 😅
 
It's slightly worrying. I found this relating to attribute id 176:-
https://care.acronis.com/s/article/...se-Fail-Count-chip?language=en_US&ckattempt=1

"Erase Fail Count (chip) S.M.A.R.T. parameter indicates a number of flash erase command failures. Recommendations: This parameter is considered informational by the most hardware vendors. Although degradation of this parameter can be an indicator of drive aging and/or potential electromechanical problems, it does not directly indicate imminent drive failure. Regular backup is recommended. Pay closer attention to other parameters and overall drive health."

So although field 182 Erase_Fail_Count_Total is zero, field 176 is a large number. I tried writing a 6 GB file to the drive and observed that field 176 increased by 1 while field 175 kept the same value. So I think the jury is out as to the significance. I trawled through the maker's website but couldn't find any detailed information; I may try sending them an email.

I couldn't find many other people's experience with 'maxsun', but 'netac' is another common make out of shenzhen, and people on this thread appear to be saying "ymmv". Like perceval says, "safe or dead".

For the time being I've moved all important data off that drive, better safe than sorry, and I'll keep a close eye on it. The drive came pre-installed in a low-cost N100 mini-pc from aliexpress.
 
Yes, field 178 is a bit worrying too. Generally I like to see the 'raw value' column showing zero for fields that would appear to represent error conditions. That said, all the normalised values here are at 100, which supposedly means zero degradation. It all depends on how the disk controller's firmware maintains those fields!
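To make the normalised-value comparison concrete, here is a small Python sketch of the rule as I understand it from the smartmontools documentation: an attribute counts as failed when its normalised VALUE is at or below THRESH, and WORST records the lowest VALUE seen so far. The function name is my own, and the strings it returns simply mirror the WHEN_FAILED column.

```python
def when_failed(value, worst, thresh):
    """Mimic the WHEN_FAILED column from the normalised numbers.

    Assumes the usual rule: failed when VALUE <= THRESH, and WORST is the
    lowest VALUE the drive has ever reported for this attribute.
    """
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


# (id, name, value, worst, thresh) copied from the output in the first post.
ROWS = [
    (175, "Program_Fail_Count_Chip", 100, 100, 50),
    (176, "Erase_Fail_Count_Chip", 100, 100, 50),
    (178, "Used_Rsvd_Blk_Cnt_Chip", 100, 100, 50),
]

for attr_id, name, value, worst, thresh in ROWS:
    margin = value - thresh  # how far VALUE can drop before it trips
    print(f"{attr_id:>3} {name:<24} {when_failed(value, worst, thresh):<12} margin={margin}")
```

Every row here comes out as "-" with a margin of 50, which matches the WHEN_FAILED column in the output. It says nothing about whether this firmware ever lowers VALUE at all.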

Attached is the output of 'smartctl -x'. It shows the same values for those attributes, and it does list a small number of errors at power on. I found a support email address on the maker's website, along with a marketing statement that the ssd is covered by a 3-year warranty, so I'll try sending them the smartctl output to see if they can shed any light. The disk is almost brand new, so I was a bit concerned to see those values. Maybe it means nothing. It will be interesting to see if their support email in guangzhou sends a response.
 

Attachments

  • smartct-x-maxsun-trace.txt
    15.9 KB
You need a vendor-specific tool to interpret many fields in the SMART output. Usually only available for Windows, if at all.

Very large numbers are a sign of bitfield encoding, not of an absolute large number.
Ahh interesting.... yes, bitfield encoding would explain it; perhaps there is no real significance; and they show 100% in all the normalised values. Hopefully I'm being paranoid and it's not significant! I will send an email to the maker anyway, and watch to see how/if field 176 increases over time.
 
I'd also watch SATA Phy Event Counters at the bottom of the log for changes. Sometimes reseating power and data connectors (on both ends if available) several times can clean off light dirt/corrosion. You can also check that the data cable has a smooth route without any sharp bends or pinches in it and maybe replace the cable (I've certainly seen individual defective cables and entire cable batches).

There are 4 errors logged that may be meaningful, as they are the drive's own records of bad events. Two of them fall within the same hour, and the first two happened many hours before the last two. If using ZFS, a scrub should either repair the data if it has other valid copies or fail if there are no good copies of a sector. If the drive was just acting as a confused computer, the problem may have disappeared on the next device restart.

If a good drive gets inadequate input power then it can show as a drive problem too. Hope the manufacturer has useful feedback coming.
 
You need a vendor-specific tool to interpret many fields in the SMART output. Usually only available for Windows, if at all.
Or, if you are a large enough customer, have your R&D group engage with the vendor's technical support and engineering teams.

Very large numbers are a sign of bitfield encoding, not of an absolute large number.

As a matter of fact:
175 Program_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 83886080 = 0x5000000
176 Erase_Fail_Count_Chip 0x0032 100 100 050 Old_age Always - 240214 = 0x3AA56
177 Wear_Leveling_Count 0x0032 100 100 050 Old_age Always - 35696142 = 0x220AE0E
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 050 Old_age Always - 36 = 0x24
I added the hex conversion at the end of each line. The first number in particular is clearly not a counter. And wear leveling is not a bad thing.
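The conversions above are easy to reproduce, and splitting the raw value into smaller pieces sometimes makes a packed layout more obvious. The Python sketch below assumes only that an ATA SMART raw value is 6 bytes (48 bits) wide; how this particular firmware packs those bytes is unknown, so the byte and 16-bit word views are just alternative ways of looking at the same number.

```python
def raw_views(raw):
    """Show one 48-bit SMART raw value as hex, bytes and 16-bit words."""
    data = raw.to_bytes(6, "little")  # least significant byte first
    words = [int.from_bytes(data[i:i + 2], "little") for i in (0, 2, 4)]
    return {
        "hex": f"0x{raw:012X}",
        "bytes": list(data),   # byte 0 is the lowest 8 bits
        "words16": words,      # word 0 is the lowest 16 bits
    }


# Raw values copied from the lines above.
RAW = {
    175: 83886080,
    176: 240214,
    177: 35696142,
    178: 36,
}

for attr_id, raw in RAW.items():
    v = raw_views(raw)
    print(f"{attr_id}: {v['hex']} bytes={v['bytes']} words16={v['words16']}")
```

For 175 the only nonzero byte is byte 3, with the value 5, which looks more like a small number stored in a higher byte than a count of 83 million. If I remember the option correctly, smartctl can print similar views itself with -v, for example -v 175,hex48 or -v 175,raw16.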
 
I have checked the cable; it's a flexi in this particular box, and I've re-seated it in the socket a couple of times. Can't see anything physically wrong. It's a new machine.

Yes, field 175 is clearly not a count despite the field name having the word 'count' in it, or perhaps the count is only in the bottom N bits, who knows. I did observe the value of field 176 increment by one (or get bit zero set) after copying a 6 GB file, so perhaps that one is a count. I will watch the values closely to see if and how they change over time. Hopefully the values are not significant; at least the firmware has not changed the normalised values in the VALUE column, which are supposed to be what gets compared to the threshold. I emailed the manufacturer; no response so far.
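One way to do that watching is to save the raw values before and after some disk activity and compare them. The Python sketch below reports both the numeric change and which bits flipped, since "went up by one" and "bit zero got set" look identical in a single observation. The "before" numbers are from the smartctl output earlier in the thread; the "after" snapshot is hypothetical, made up to match the single increment I saw on field 176.

```python
def diff_snapshots(before, after):
    """Compare two {attribute_id: raw_value} snapshots.

    Returns, for each attribute present in both whose raw value changed,
    the numeric delta and the positions of the bits that differ.
    """
    changes = {}
    for attr_id in sorted(before.keys() & after.keys()):
        old, new = before[attr_id], after[attr_id]
        if old == new:
            continue
        changed = old ^ new  # 1 bits mark positions that differ
        changes[attr_id] = {
            "delta": new - old,
            "flipped_bits": [b for b in range(48) if (changed >> b) & 1],
        }
    return changes


before = {175: 83886080, 176: 240214, 178: 36}
# Hypothetical later reading: only 176 moved, by one.
after = {175: 83886080, 176: 240215, 178: 36}

for attr_id, change in diff_snapshots(before, after).items():
    print(attr_id, change)
```

A real counter should keep producing small positive deltas that carry into higher bits over time, while a flag field would show the same few bits toggling. Several snapshots taken after similar workloads would be needed to tell the two apart.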
 
Here is a view of the mini-pc with the back panel off. The 2.5" ssd is mounted on the inside of the back panel and is connected to the motherboard with a flexi cable. The flexi cable IS rather flimsy; they are using laptop-type components in these small boxes. The connector seems well terminated at the disk end, and I've checked it's fully pushed into the mobo connector; I have reseated it in the mobo socket just to make sure. I think the overall build quality is actually pretty good at this price point; you can't compare it to something like a lenovo M-series 'tiny' box, which costs about 8 times as much. That flexi cable did make me feel a bit nervous when I first saw it, but it's not under any stress with the back panel screwed in place.

Completely unrelated but I wasn't very impressed with the lack of metallisation (for RFI shielding) on the inside of the ABS box. I've got another one from a different manufacturer and the inside of the plastic case is fully metallised, but not this one. The machine itself seems completely stable over the last couple of months of use every day, very nice little box, everything works, no random freezes etc. The N100 processor is impressive, performance is on-par with my 10-year old lenovo M91p with a 2.8GHz i7 and the whole machine only uses around 10 W at idle according to my cheapie mains power meter (which probably isn't very accurate at such low power!). In fact the performance for some tasks like video transcoding and encryption is considerably better than the old i7. I really like the N100 chip. This is just being used as a little desktop, of course; I'm driving a 2K monitor directly from it.

Since I bought it I've put a 2TB NVMe drive in the M.2 slot, and SMART shows zero errors on that; it's only the 2.5" ssd connected via the flexi that is showing those unusual SMART attributes. Perhaps there's nothing to worry about after all; it was field 176 in particular that concerned me after reading the interpretation here: https://care.acronis.com/s/article/...se-Fail-Count-chip?language=en_US&ckattempt=1

Oh well, let's see if I get a reply to the email from their tech support. Probably the only people who really know what those fields mean are the guys who wrote the ssd controller firmware in guangzhou.

soyo-m2-plus-back-off.jpeg
 