Other SSD may need occasional factory reset

Hi, something interesting from today:

When I started my machine, I could see a bunch of these errors:
Code:
kernel: (ada0:ahcich4:0:0:0): READ_FPDMA_QUEUED  ACB: 60 08 48 e4 b3 40 06 00 00 00 00 00
kernel: (ada0:ahcich4:0:0:0): CAM status: Uncorrectable parity/CRC error
kernel: (ada0:ahcich4:0:0:0): Retrying command, 3 more tries remain

Nothing bad happened; the machine appeared to work correctly. But looking closer, I found the read throughput from the affected SSD was only ~100 MB/s (instead of ~450). So something was clearly wrong.

Unplugging and replugging the SATA cable solved that issue (and it is a good cable, with a clip). So now we know, almost certainly, that these cables have a temperature problem. It is only a few weeks since I put the server mainboard into the desktop case and plugged in the cables - but that board does get very hot: I hit coretemp.throttle when using more than 15 cores. (Normal desktop airflow is certainly not ideal for a 10-core/20-thread CPU at this time of the year.)

Then I tried again to read the SSD as a raw device with dd. And while the maximum throughput was okay now, I observed something else (using gstat -po): repeatedly the throughput would drop to only 2 MB/s and the ms/r value to almost exactly 100 ms. This was happening on read (and we do not expect an SSD to be slow on read). Something was going on internally, and the device appeared to have difficulties retrieving its data (no clue to be found in the SMART data). Overall, this reduced the throughput to some 330 MB/s.
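(For reference, this is the kind of test I mean - a minimal sketch, with the device name from my box:)
Code:
# raw sequential read of the whole device - no filesystem involved
dd if=/dev/ada0 of=/dev/null bs=1m &
# meanwhile, watch per-disk throughput and latency
# (-p: physical providers only, -o: also count "other" ops, i.e. BIO_FLUSH)
gstat -po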

Finally I managed to do a factory reset (aka "secure erase") - not so easy, because the server board does not support S3 suspend, but unconditionally puts all SATA drives into security freeze - so hot-plugging the drive's power is required (I wonder how that would be managed remotely; but then, it's only a cheap server board).
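(For the record, the erase itself was roughly the usual ATA security dance - a sketch from memory, so read camcontrol(8) before trying it; the password "Erase" is arbitrary, and this irrevocably destroys all data on the drive:)
Code:
# check the security state; it must no longer be frozen after the power hot-plug
camcontrol security ada0
# set a temporary user password, then use it to trigger the secure erase
camcontrol security ada0 -U user -s Erase
camcontrol security ada0 -U user -e Erase -y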
Result, after restoring all data: the spookiness is gone, and there is sustained read throughput of 500 MB/s again (minus powerd, minus cx_lowest C3, minus ibrs_disable, minus CPU allocation, that gives 430).

Some hard data on the effect of this: I have my basic OS filesystems on plain UFS, and everything else in mirrored ZFS. The OS filesystems get copied to the second disk during shutdown. (So that there is always something that can be booted into a working OS, in the simplest way.)
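(The copy itself is nothing sophisticated, roughly along these lines - a sketch; the target mountpoint /mnt/rescue is just a placeholder for the UFS partitions on the second disk:)
Code:
# dump a live snapshot of the root filesystem and unpack it onto the standby disk
dump -0Lauf - /dev/ada0p3 | (cd /mnt/rescue && restore -rf -)
# same again for /usr
dump -0Lauf - /dev/ada0p4 | (cd /mnt/rescue/usr && restore -rf -)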
Here are recent timings of this copy:
Code:
Jul  7    DUMP: Dumping snapshot of /dev/ada0p3 (/) to standard output
Jul  7    DUMP: finished in 18 seconds, throughput 26678 KBytes/sec
Jul  7    DUMP: Dumping snapshot of /dev/ada0p4 (/usr) to standard output
Jul  7    DUMP: finished in 29 seconds, throughput 22468 KBytes/sec
Jul 13    DUMP: Dumping snapshot of /dev/ada0p3 (/) to standard output
Jul 13    DUMP: finished in 14 seconds, throughput 34390 KBytes/sec
Jul 13    DUMP: Dumping snapshot of /dev/ada0p4 (/usr) to standard output
Jul 13    DUMP: finished in 28 seconds, throughput 23270 KBytes/sec
Jul 23    DUMP: Dumping snapshot of /dev/ada0p3 (/) to standard output
Jul 23    DUMP: finished in 18 seconds, throughput 26748 KBytes/sec
Jul 23    DUMP: Dumping snapshot of /dev/ada0p4 (/usr) to standard output
Jul 23    DUMP: finished in 28 seconds, throughput 23270 KBytes/sec

And this is how it looks now, after the factory reset of the reading(!) device:
Code:
Jul 24    DUMP: Dumping snapshot of /dev/ada0p3 (/) to standard output
Jul 24    DUMP: finished in 14 seconds, throughput 34390 KBytes/sec
Jul 24    DUMP: Dumping snapshot of /dev/ada0p4 (/usr) to standard output
Jul 24    DUMP: finished in 18 seconds, throughput 36198 KBytes/sec
Jul 24    DUMP: Dumping snapshot of /dev/ada0p3 (/) to standard output
Jul 24    DUMP: finished in 14 seconds, throughput 34390 KBytes/sec
Jul 24    DUMP: Dumping snapshot of /dev/ada0p4 (/usr) to standard output
Jul 24    DUMP: finished in 22 seconds, throughput 29617 KBytes/sec

Even with all TRIM features active (ZFS and UFS), there is a significant improvement here.
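(For those who want to verify their own setup, a sketch of how to check - the pool name zroot is a placeholder, and the exact ZFS knob depends on the OpenZFS version in use:)
Code:
# UFS: look for the trim flag in the superblock settings
tunefs -p /dev/ada0p3
# ZFS (OpenZFS 2.x): automatic TRIM is a pool property
zpool get autotrim zroot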

The current SMART data, for those interested:
Code:
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37240G
Firmware Version: S1Z40102
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   100   100   000    -    100
  9 Power_On_Hours          -O--CK   100   100   000    -    13509
 12 Power_Cycle_Count       -O--CK   100   100   000    -    694
148 Unknown_Attribute       ------   100   100   000    -    0
149 Unknown_Attribute       ------   100   100   000    -    0
167 Write_Protect_Mode      ------   100   100   000    -    0
168 SATA_Phy_Error_Count    -O--C-   100   100   000    -    1
169 Bad_Block_Rate          ------   100   100   000    -    0
170 Bad_Blk_Ct_Erl/Lat      ------   100   100   010    -    0/0
172 Erase_Fail_Count        -O--CK   100   100   000    -    0
173 MaxAvgErase_Ct          ------   100   100   000    -    0
181 Program_Fail_Count      -O--CK   100   100   000    -    0
182 Erase_Fail_Count        ------   100   100   000    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
192 Unsafe_Shutdown_Count   -O--C-   100   100   000    -    298
194 Temperature_Celsius     -O---K   040   064   000    -    40 (Min/Max 23/64)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
199 SATA_CRC_Error_Count    -O--CK   100   100   000    -    0
218 CRC_Error_Count         -O--CK   100   100   000    -    1
231 SSD_Life_Left           ------   075   075   000    -    75
233 Flash_Writes_GiB        -O--CK   100   100   000    -    34191
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    33290
242 Lifetime_Reads_GiB      -O--CK   100   100   000    -    23140
244 Average_Erase_Count     ------   100   100   000    -    259
245 Max_Erase_Count         ------   100   100   000    -    306
246 Total_Erase_Count       ------   100   100   000    -    110915
 
Off-topic,

gstat -po

I normally have an xterm window open running gstat -p.

I see:

-o Enable display of statistics for other operations (BIO_FLUSH).

– and I see the additional columns when I run the command, but I have no idea what's meant by this part of the manual page.

Please, can someone ELI5? Thanks.
 
I have very similar problems on one of my servers:
Code:
(ada0:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 58 40 05 d3 40 00 00 00 00 00 00
(ada0:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada0:ahcich1:0:0:0): Retrying command, 3 more tries remain
I isolated it to the on-board SATA controller. It just doesn't like having two SSDs connected to it. With just one connected there are no errors, no matter which port or cable I use. With two, there are always errors on exactly one of them.
 
-o Enable display of statistics for other operations (BIO_FLUSH).

– and I see the additional columns when I run the command, but I have no idea what's meant by this part of the manual page.

Please, can someone ELI5? Thanks.
You're welcome. So, as the keyword "BIO_FLUSH" doesn't give you a hint, let's explain (simplified version):

Originally, ATA drives were equipped with a write cache, because without it the performance was horrible.
The problem arose when a crash or power failure occurred: the cache contents would be lost, while the OS and the application believed the data was already safely on disk. So, specifically for databases, the advice was to switch off that write cache - and we were back to the horrible performance.

This is why BIO_FLUSH was introduced. It tells the drive to push its write cache onto disk and to return only after this is properly done. So, if you open(2) a file with the O_SYNC flag (or set sync=always on your ZFS dataset), this should result in BIO_FLUSH commands being sent to the disk. When you do not switch this on, it is at the discretion of the application to issue flush commands at appropriate critical points in its processing (checkpoints, commits).
These flushes may result in a lot of actual physical operations in the drive, and can therefore take quite a while to process; they are not shown in the read/write columns, but in the "other" column. So when the question is why the drive is so busy/slow, one should have a look at that column.
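If you want to see those flushes in the numbers yourself, a simple experiment (a sketch - the dataset name is made up): force synchronous writes on a scratch dataset, write to it, and watch the "other" columns in gstat.
Code:
# make every write on this dataset be followed by a cache flush
zfs set sync=always tank/scratch
# produce some write traffic
dd if=/dev/zero of=/tank/scratch/junk bs=128k count=20000
# in another terminal: the "other" operations (o/s, ms/o) now show up
gstat -po
# put the dataset back to its default behaviour afterwards
zfs inherit sync tank/scratch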
 
I have very similar problems on one of my servers:
Code:
(ada0:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 58 40 05 d3 40 00 00 00 00 00 00
(ada0:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada0:ahcich1:0:0:0): Retrying command, 3 more tries remain
I isolated it to the on-board SATA controller. It just doesn't like having two SSDs connected to it. With just one connected there are no errors, no matter which port or cable I use. With two, there are always errors on exactly one of them.
What about the keyword "SATA port multiplier"?
(I never touched such a thing, and I don't know if they were ever implemented on mainboards, but I remember a warning that when buying a 4-port SATA adapter, one should make sure it actually is a 4-port controller and not a 2-port one with a multiplier behind it.)
 
… BIO_FLUSH was introduced. It tells the drive to push its write cache onto disk and to return only after this is properly done. …

Thanks. It all makes sense now. In retrospect, I could/should have looked in cgit: <https://cgit.freebsd.org/src/commit/?id=c3618c657a120cc152f3ff15995e3f690b17ba57> … vermaden, was that you? (home.pl)

I wondered about the BIO abbreviation; from the old <https://cgit.freebsd.org/src/tree/sys/sys/bio.h?id=df8bae1de4b67ccf57f4afebd4e2bf258c38910d#n48> I assume that it stands for buffer I/O.
 
You said a factory reset fixed it, but all a factory reset does is erase the disk (I presume). I'm not sure. In this case, was the problem then something with the file system itself, like fragmentation or anything?
Didn't you also unplug/replug and move the cable in the same 'session' as the factory reset?

Maybe EMI from inside or outside interfering? This is not supposed to happen, all the more so with SATA's differential signaling, but it is not to be discarded in the case of particularly noisy setups - like with some really bad PSUs. In such cases even how you route the SATA cables inside can make a difference, especially for longer cables, or when running with an open case in a harsh environment with regard to EMI. There are shielded cables on the market even for internal connections, despite them not usually being necessary.

Maybe some dust on the connectors, or heat causing a minimal mechanical displacement of connectors that were already working near a threshold, adding up to insertion losses.

Or a mix of these things :)

You said the board gets very hot. That's a point of interest. I wonder whether whatever is dissipating that much energy isn't also producing significant EMI.
 
… was the problem then something with the file system itself

I wondered the same.

Re: the attachment at <https://forums.FreeBSD.org/threads/80655/post-523465>, it's possible for a repaired file system to be clean and then dirty again; however, I don't know the impact of the dirtiness in this situation.

192 Unsafe_Shutdown_Count -O--C- 100 100 000 - 298

I don't know how to interpret that.

From Intel® Solid-State Drive Toolbox: User Guide (PDF, 2009-10-19) 3.4.2.6:

C0 – Unsafe Shutdown Count

This attribute reports the cumulative number of unsafe (unclean) shutdown events over the life of the device. An unsafe shutdown occurs whenever the device is powered off without STANDBY IMMEDIATE being the last command. Use the Raw value for this attribute.

A 2011 discussion of the attribute: <https://community.intel.com/t5/Soli...-Unsafe-Shutdown-Count-quot/m-p/383848#M11960>

… like fragmentation or anything? …

I imagine that file system fragmentation should have minimal impact with SSD as a medium, although I have minimal experience with UFS in this area.
 
I don't know how to interpret that.

From Intel® Solid-State Drive Toolbox: User Guide (PDF, 2009-10-19) 3.4.2.6:
Me neither, strictly speaking.

I ran a SMART check on my laptop's NVMe SSD and got: "Unexpected Power Loss Count 74". This is a high count, as I almost never do a hard poweroff. This is an almost 3-year-old SSD.

Anyway:
"Attribute 174 is usually informational only. However, a large number of such events may indicate that a user needs to be trained on proper operating systems shutdowns, or that there could be a problem with power supplies or connections."

We don't know how to interpret it, but we probably agree that it is a good variable to keep watching in the OP's situation (IMO).

Here my SMART ID 199 'SATA CRC Errors' is zero.

I imagine that file system fragmentation should have minimal impact with SSD as a medium, although I have minimal experience with UFS in this area.

Yeah, I agree. Well, we are in the same boat here :) I can't tell whether the errors/conditions the OP reported can be caused by things in the FS.
 
You said a factory reset fixed it, but all a factory reset does is erase the disk
No, with an SSD it doesn't. Rather, it does something similar to a TRIM on the whole device: it tells the SSD controller that the cells' contents are no longer needed and the cells can be re-used arbitrarily.
So, while the controller should still know how often each cell was written (in order to do the wear levelling), it no longer needs to bother about the existing data in the cells.
(This also means that a "secure erase" is NOT fully safe, and some experienced people could still retrieve your data. With an SSD, probably nothing is fully safe, except going to Mount Doom and burning it.)
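(Side note, and only an assumption on my part since I went the secure-erase route: if all one wants is that "TRIM the whole device" effect, newer FreeBSD releases also ship a trim(8) utility that can mark the entire device as unused - I have not checked whether the result is equivalent, and it likewise destroys all data:)
Code:
# declare the whole LBA range unused; -f is required to actually perform it
trim -f /dev/ada0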

(I presume). I'm not sure. In this case, was the problem then something with the file system itself, like fragmentation or anything?
There is no "filesystem or anything" in between when doing dd if=/dev/ada0 bs=1m of=/dev/null, only the controller.
It is likely a fragmentation problem, but it concerns fragmentation inside the SSD's own logic. The important information here is that this can happen, and that it can be fixed.

You said board gets very hot. That's a point of interest. I wonder if what's wasting this amount of energy is not producing levels of EMI also.
What produces the heat is simply the CPU, which is supposed to convert 100+ watts of energy into mere heat - and it does that with delight. (It is only temporarily in the desktop, for mutual acquaintance.)

[unsafe shutdown]
I don't know how to interpret that.
These are either kernel crashes or reset-button presses, as both do happen here. I do not think it is a problem - if this endangered data consistency, ZFS would report occasional checksum errors.
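(That is easy enough to keep an eye on, e.g.:)
Code:
# terse health check; prints "all pools are healthy" when nothing is wrong
zpool status -x
# detailed view: watch the READ/WRITE/CKSUM counters per device
zpool status -v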
 