4 drive zpool, two NVMe, two SSD

I enjoy blogging. I really do. It is quite satisfying. I refer to my blogs for my own use. That is what they started off as, way back when I was trying to get my host running with DHCP on ADSL. Notes to help myself, mostly when seeking help and explaining to others what I did.
It took me years and re-learning many, many things I'd forgotten to learn this lesson. My notes are mostly on a private wiki, though. Maybe I should publish them.
 
It took me years and re-learning many, many things I'd forgotten to learn this lesson. My notes are mostly on a private wiki, though. Maybe I should publish them.
Yes. Yes, you should publish it. What's the downside?
 
There is a SuperMicro solution I found interesting.
U.2 connectors, but it does not require bifurcation. It has a switch onboard.
SLG3-2E4
This worked faster than bifurcation with the same drives/cables. Not by a huge amount, but benchmarked and repeated.
I never tested beyond SuperMicro boards. Does work beyond SuperMicro supported list.
Mine tested good on X9/X10 boards.
 
I agree with this. I think a lot of engineers tend to think on paper or whiteboards. I've had jobs where I fill 2 whiteboards and then people get upset when they get erased.
I wish I could be so lucky. Given the number of engineering decisions I have to make on the fly, it is not comforting, especially when you are talking about $250K repair procedures approved verbally. Nothing in writing.
Maybe my hand-drawn sketches, which may or may not be converted to an electronic version for a condition report. Many clients are so desperate to get the boat back in the water that they forgo the official docs we used to do for them.
Don't ask me how that goes....
 
Why does this count matter?
It's not so important, but it can indicate a possible power issue. In normal operation it will rise only if you reset the computer without a proper shutdown. The write cache on those disks is not battery-backed, so its contents will be lost on a sudden power loss; if the cache is used for write operations, any data not yet flushed to the NAND is gone. That's why it is important not to enable the volatile write cache if the system is not protected by a UPS, and why you should never pull such a disk from the hot-swap bay before shutting it down first.
 
Code:
[20:52 r730-01 dvl ~] % sudo smartctl -a /dev/da13

Device Model:     Samsung SSD 860 EVO 1TB

Sector Size:      512 bytes logical/physical 

241 Total_LBAs_Written      0x0032   097   097   000    Old_age   Always       -       7136126656590

Let's do the math please. How much data has this SSD written: 7136126656590 * 512 bytes = 3,653,696,848,174,080 bytes or about 3,653.7 TB

According to https://www.samsung.com/us/computin.../ssd-860-evo-2-5--sata-iii-1tb-mz-76e1t0b-am/ the endurance rating is 600 TBW (someone else please verify).

Have I done this math incorrectly?
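For what it's worth, the math looks right to me. A quick sketch of the calculation, using the values copied from the smartctl output above:

```python
# Sanity check: total bytes written is the raw value of SMART
# attribute 241 (Total_LBAs_Written) multiplied by the logical
# sector size that smartctl reports (512 bytes here).
total_lbas_written = 7136126656590  # raw value of attribute 241, from above
sector_size = 512                   # bytes, from "Sector Size" above

total_bytes = total_lbas_written * sector_size
print(f"{total_bytes:,} bytes")         # 3,653,696,848,174,080 bytes
print(f"~{total_bytes / 1e12:,.1f} TB") # ~3,653.7 TB
```

So if the raw value really is a decimal LBA count, roughly 3,653.7 TB has been written.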
 
3,653.7 TB
That sounds like a very large number. Was this drive hosting FreshPorts?

My way of 'checking the math' here would be to slip it into a drive bay of a Windows box and use Samsung Magician.
If the drive has that much wear, it should show up there. Vendor tools.
Sometimes smartctl reports raw values in hex, so you should be wary.
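A small illustration of that caveat: the same raw digit string comes out wildly different depending on the base it's interpreted in. (This is purely illustrative; as far as I know, Samsung reports attribute 241 in decimal.)

```python
# The hex caveat: a digit string read in the wrong base gives a
# very different number, so the base matters before doing TBW math.
raw = "7136126656590"  # raw value string from the smartctl output above

as_decimal = int(raw, 10)
as_hex = int(raw, 16)  # only relevant if the vendor reports raw values in hex

print(f"decimal: {as_decimal:,}")  # 7,136,126,656,590
print(f"hex:     {as_hex:,}")      # a much larger, implausible figure
```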
 
That sounds like a very large number. Was this drive hosting FreshPorts?


Yes, but not exclusively and not directly.

Based on https://dan.langille.org/2022/12/31/knew-8/ it was used for the tank_fast01 zpool.

The following is taken from the above URL and modified to keep it shorter.

That tank_fast01/dbclone filesystem was used for daily testing. It loaded up every database I had. Every day. tank_fast01/dbclone.backups.rsyncer shows it was about 231GB of backups. It was loaded into tank_fast01/dbclone.postgres

The FreshPorts dev and prod databases are included in that. So was my Bacula database which kept track of all the backups, including FreshPorts.

Good news: none of that data on that filesystem was primary. It all originated somewhere else. From here, it was backed up daily to the other server.

Why do this all on SSD and not HDD? Just because it's faster. However, that dbclone process is not time-sensitive. It can run anytime. The current process runs on HDD.

Based on the power-on-hours, I bought these drives about 4.5 years ago. They've led a good life. If I'm going to keep using them, it would need to be for non-valuable data/purposes, IF INDEED they are past their TBW levels.

Code:
[knew dan ~] % zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tank_fast01   928G   251G   677G        -         -     5%    27%  1.00x    ONLINE  -

[knew dan ~] % zfs list
NAME                                               USED  AVAIL     REFER  MOUNTPOINT
tank_fast01                                        251G   648G       23K  none
tank_fast01/dbclone                               4.94G   648G     4.94G  /usr/jails/dbclone
tank_fast01/dbclone.backups.rsyncer                231G   648G      231G  /jails/dbclone/usr/home/rsyncer/backups
tank_fast01/dbclone.postgres                       254M   648G      254M  /jails/dbclone/var/db/postgres
tank_fast01/empty                                 1.71G   648G       24K  none
tank_fast01/empty/ports                           1.71G   648G     1.71G  /jails/empty/usr/ports
tank_fast01/vm                                    13.3G   648G      346M  /usr/local/vm
tank_fast01/vm/mkjail                             12.9G   648G     12.9G  /usr/local/vm/mkjail
 
Yes, you need to unmount it first, and if it's part of a RAID volume, eject it from there first.

TBW = 1024 GiB * 2074 / 0.624 = 3,403,487 GiB
Is that 2074 is from:


Code:
177 Wear_Leveling_Count     0x0013   001   001   000    Pre-fail  Always       -       2074

Where is 0.624 from?

You're saying this SSD has had 3.4 PB written to it?
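Taking the quoted formula at face value (I don't know where 0.624 comes from either; presumably some write-amplification or efficiency factor), the arithmetic itself checks out:

```python
# Re-running the quoted estimate as stated. The 0.624 divisor is taken
# from the earlier post; its origin is exactly what's being asked here.
capacity_gib = 1024          # 1 TB drive, in GiB
wear_leveling_count = 2074   # raw value of SMART attribute 177, from above

written_gib = capacity_gib * wear_leveling_count / 0.624
print(f"{written_gib:,.0f} GiB")                      # 3,403,487 GiB
print(f"~{written_gib * 1024**3 / 1e12:,.1f} TB")     # ~3,654.5 TB
```

Interestingly, converted to decimal TB that estimate lands very close to the ~3,653.7 TB computed from Total_LBAs_Written earlier in the thread.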
 
If the smartctl format is correct on this, then yes, this SSD has written ~3.4 PB, while it is only rated for 600 TB. Some vendors use different formats/meanings for the SMART values, so it's possible this is not true.
 