SSD Power Cycle Count

What is the significance of the SMART attribute Power Cycle Count?
I can understand it with a physical disk platter, but what about an SSD?
Does it degrade the drive any differently than POH (power-on hours)?
I was surprised to see the high count on a new drive I had.

Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   000   000   000    Old_age   Offline      -       0
  2 Throughput_Performance  0x0000   000   000   000    Old_age   Offline      -       0
  3 Spin_Up_Time            0x0000   000   000   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       0
  7 Seek_Error_Rate         0x0000   000   000   000    Old_age   Offline      -       0
  8 Seek_Time_Performance   0x0000   000   000   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       69
 10 Spin_Retry_Count        0x0000   000   000   000    Old_age   Offline      -       0
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       256
 
What is the significance of the SMART attribute Power Cycle Count?
I can understand it with a physical disk platter, but what about an SSD?
Does it degrade the drive any differently than POH (power-on hours)?

I wonder too.

I was surprised to see the high count on a new drive I had.
Code:
 9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       69

12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       256

Hmm, yeah. That's a power cycle about every 16 minutes of power-on time.

Might something be sleeping the drive when unused for perhaps 15 minutes?

If so, it shouldn't hurt, but it is unlikely to help either.

For contrast, my Samsung MZ7TD128HAFV has 14571 hours for 2248 power cycles, or nearly 6.5 hours per cycle, and is about 3% used (value 97/100).
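The ratios quoted above are easy to reproduce. This is a minimal sketch, assuming the raw value of attribute 9 really is in hours (vendors do not always agree on units); the function name is made up for illustration.

```python
# Average power-on time per power cycle, from SMART attributes 9 and 12.
def hours_per_cycle(power_on_hours: int, power_cycles: int) -> float:
    """Return the mean power-on hours per power cycle."""
    if power_cycles <= 0:
        raise ValueError("power_cycles must be positive")
    return power_on_hours / power_cycles

new_drive = hours_per_cycle(69, 256)      # raw values from the first post
samsung = hours_per_cycle(14571, 2248)    # raw values quoted for the Samsung drive

print(f"new drive: {new_drive * 60:.1f} minutes per cycle")
print(f"Samsung:   {samsung:.2f} hours per cycle")
```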
 
Code:
SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       2
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       62072
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       318
192 Unsafe_Shutdown_Count   0x0002   100   100   000    Old_age   Always       -       213
232 Available_Reservd_Space 0x0003   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0002   099   099   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0000   200   200   000    Old_age   Offline      -       1514208
62072 hours means 7.1 years. The amount of write traffic indicates that every memory cell has been overwritten 739 times, ignoring write amplification (so in practice probably over 1000 times). I definitely got my money's worth out of this SSD (in particular since I paid $0 for it; they were in the trash pile at the office).
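For anyone who wants to check that arithmetic, here is a rough sketch. The drive capacity is not stated in the post, so the last figure is only back-calculated from the quoted 739 overwrites; treat it as an inference, not a spec.

```python
HOURS_PER_YEAR = 24 * 365.25
MIB_PER_TIB = 1024 * 1024

power_on_hours = 62072
host_writes_32mib = 1514208   # raw value of attribute 225, in units of 32 MiB
overwrites = 739              # figure quoted in the post

years = power_on_hours / HOURS_PER_YEAR
written_mib = host_writes_32mib * 32
written_tib = written_mib / MIB_PER_TIB
# Capacity implied by "written / overwrites"; an inference, not a stated value.
implied_capacity_gib = written_mib / overwrites / 1024

print(f"{years:.1f} years powered on")
print(f"{written_tib:.1f} TiB written by the host")
print(f"implied capacity: about {implied_capacity_gib:.0f} GiB")
```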

I wonder how much longer it will work. I have a hot spare SSD ready to go, frequently updated, and it is "much younger", only 44635 power-on hours and 332 power cycles. There is a second cold spare sitting on the work bench.

Now to the serious question: Why does power cycle count matter? Because SSDs are internally extremely complicated beasts. Writing the flash storage in an SSD is not easy: you can't just write a sector here and there, at random places. Instead, whole blocks (typically 512K or larger) need to be overwritten at once. So SSDs internally contain a complex log-structured file system, which maps sectors to blocks, and then cycles those blocks across the hardware, with wear leveling (making sure all flash chips get worn out at the same speed), and with wear management (if one block is worse than others, put more redundant data in it). That internal file system has interestingly complex metadata, some of which is typically kept in RAM during operation, but the bulk of it is also on flash storage. Any write modifies the data in RAM, and SSDs have some of the same problems as normal (OS-internal) file systems with keeping RAM caches up-to-date and safe.
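To make the sector-to-block mapping idea concrete, here is a toy sketch. It is not how any real SSD firmware works: the class name, the block size, and the reclaim policy are all invented for illustration, and it only erases blocks that hold no live data (a real controller would also relocate live pages).

```python
PAGES_PER_BLOCK = 4   # toy size; real erase blocks hold far more pages

class ToyFTL:
    """Toy flash translation layer: out-of-place writes plus a sector map."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks
        self.mapping = {}                      # logical sector -> (block, page)
        self.free_blocks = list(range(num_blocks))
        self.active = None
        self.active = self._pick_least_worn()
        self.next_page = 0

    def _reclaim(self) -> None:
        # Erase every full block that no longer holds any live sector.
        live = {blk for blk, _ in self.mapping.values()}
        for b in range(len(self.blocks)):
            if b != self.active and b not in live and b not in self.free_blocks:
                self.blocks[b] = [None] * PAGES_PER_BLOCK
                self.erase_counts[b] += 1
                self.free_blocks.append(b)

    def _pick_least_worn(self) -> int:
        if not self.free_blocks:
            self._reclaim()
        if not self.free_blocks:
            raise RuntimeError("no free blocks; a real FTL would relocate live pages")
        # Wear leveling: prefer the free block with the fewest erases.
        block = min(self.free_blocks, key=lambda b: self.erase_counts[b])
        self.free_blocks.remove(block)
        return block

    def write(self, sector: int, data: bytes) -> None:
        if self.next_page == PAGES_PER_BLOCK:
            self.active = self._pick_least_worn()
            self.next_page = 0
        self.blocks[self.active][self.next_page] = (sector, data)
        self.mapping[sector] = (self.active, self.next_page)  # old copy is now stale
        self.next_page += 1

    def read(self, sector: int) -> bytes:
        block, page = self.mapping[sector]
        return self.blocks[block][page][1]

ftl = ToyFTL(num_blocks=3)
for i in range(20):
    ftl.write(0, bytes([i]))   # rewrite the same logical sector 20 times
print(ftl.mapping[0], ftl.erase_counts)
```

Note that rewriting one logical sector keeps moving it to a new physical page, and the erases get spread over the blocks instead of hammering one spot.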

So now imagine what happens when the power fails. There may be things that only exist in RAM: The SSD has capacitors, and can finish doing some emergency writes from RAM to flash, probably in a data structure that resembles a transaction log. Then, when power comes back, the SSD has to do something like fsck: reading that transaction log, checking the on-flash data structures for consistency, and applying updates. This is all a lot of work, stresses the flash (more writes), and relies on aging capacitors. So power cycling is scary and stressful.
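The "replay the log on power-up" step can be sketched roughly like this. It is purely illustrative: the checkpoint-plus-journal layout, the sequence numbers, and the function name are assumptions for the sake of the example, not a description of any vendor's firmware.

```python
def recover_mapping(checkpoint: dict, journal: list, first_seq: int = 1) -> dict:
    """Rebuild the in-RAM sector map from a checkpoint plus journal entries.

    Each journal entry is (sequence_number, sector, location). Entries are
    applied in order, and replay stops at the first gap, on the assumption
    that anything after a gap was not completely written before power died.
    """
    mapping = dict(checkpoint)
    expected = first_seq
    for seq, sector, location in sorted(journal):
        if seq != expected:
            break                      # torn log: ignore everything after the gap
        mapping[sector] = location
        expected += 1
    return mapping

checkpoint = {0: (0, 0), 1: (0, 1)}    # last map safely stored on flash
journal = [
    (1, 0, (2, 0)),                    # sector 0 moved to block 2, page 0
    (2, 5, (2, 1)),                    # sector 5 written for the first time
    (4, 1, (2, 3)),                    # entry 3 is missing, so this one is dropped
]
recovered = recover_mapping(checkpoint, journal)
print(recovered)
```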
 
Might something be sleeping the drive when unused for perhaps 15 minutes?
No, it was all me: everything from getting a split installation going (U-Boot on microSD and root on mSATA) to setting up device tree overlays, which require reboots. The HummingBoard has its quirks that I must work around.

I was struck by the high cycle count, though, and it made me wonder about the consequences.
 
With SMART output, you need to be aware of units. Some units are standardized or obvious (the power-on count is obviously a monotonic count); other units are not obvious and may be open to interpretation by the manufacturer.

Something in hours looks like a huge number, but then you do the math and realize "oh, that's 20 years".
 
Yes I see some drives use hex code for SMART output and smartctl converts them.

I too have seen outlandish numbers on some devices.
 
I was having a problem with device tree overlays blanking the disk drive. Poof.
It seems HDMI grabs i2c2 for DDC, and that was crashing my setup badly until I disabled HDMI with a "disable-hdmi" overlay, so that I could use an overlay for i2c2 and pcf-8574.
 
Yes I see some drives use hex code for SMART output and smartctl converts them.
You have to know which ones to convert. This is what I use for most Seagate drives:
Code:
sudo smartctl -v 1,hex48 -v 7,hex48 -v195,hex48 -a /dev/da0 | egrep "ID#|0x[0-9a-f]*$"
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       0x00000c7f4720
  7 Seek_Error_Rate         0x000f   091   061   045    Pre-fail  Always       -       0x00004b7edbde
195 Hardware_ECC_Recovered  0x001a   001   001   000    Old_age   Always       -       0x00000c7f4720
The hex numbers have to be split into 4+8 digits. The first 4 digits are the error count (expect 0). The last 8 digits are the operations count (expect large).
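If you want to do the 4+8 split in a script, something like this works. It is a small sketch following the split described above; the function name is made up, and whether a given attribute on a given drive actually uses this layout is something you have to know beforehand.

```python
def split_hex48(raw: str) -> tuple[int, int]:
    """Split a 48-bit raw value into (error count, operation count).

    The top 16 bits (first 4 hex digits) are taken as the error count and
    the low 32 bits (last 8 hex digits) as the operation count.
    """
    value = int(raw, 16)               # accepts an optional "0x" prefix
    if value >> 48:
        raise ValueError("raw value does not fit in 48 bits")
    errors = value >> 32
    operations = value & 0xFFFFFFFF
    return errors, operations

# Raw values copied from the smartctl output above.
for name, raw in [
    ("Raw_Read_Error_Rate", "0x00000c7f4720"),
    ("Seek_Error_Rate", "0x00004b7edbde"),
]:
    errors, operations = split_hex48(raw)
    print(f"{name}: {errors} errors in {operations} operations")
```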

For interpretation, see the TrueNAS Troubleshooting Guide, and the Wikipedia SMART article.
 