Tell me about your drive failures

In my whole life I have had only one drive failure.
A 128 GB SSD died after six months, during poudriere builds. No data recovery was possible, and I had no backups because it had never happened to me before.
It was a well-known brand, a Samsung EVO. Currently I have a 512 GB one working just fine.
What are your experiences?
 
For various reasons my ZFS pool is on two drives, one NVMe and one SSD, and I think it stripes across them. So if one fails my whole pool is gone, but I have a backup script (a quick way to confirm the layout is sketched after it):

Code:
# Destination for today's backup (the date is part of the path)
export MYPATH=/mnt/OLDDISK3_backup/backup/2025_11_27/
mkdir "$MYPATH"
# Mirror the important config and data trees (clone is assumed to be sysutils/clone from ports)
clone -s /etc           "$MYPATH"/etc
clone -s /usr/local/etc "$MYPATH"/usr_local_etc
clone -s /root          "$MYPATH"/root
clone -s /home/x        "$MYPATH"/home_x
clone -s /var/db        "$MYPATH"/var_db
clone -s /var/backups   "$MYPATH"/var_backups
# Single files worth keeping: PostgreSQL config, build scripts, jail nginx config, kernel config
cp /var/db/postgres/data17/postgresql.conf "$MYPATH"
cp /usr/src/doit1 "$MYPATH"
cp /usr/src/doit2 "$MYPATH"
cp /MYFREEBSD/jails/a/usr/local/etc/nginx/nginx.conf "$MYPATH"
cp /usr/src/sys/amd64/conf/MYKERNCONF "$MYPATH"
cp /etc/src.conf                      "$MYPATH"
# Record the list of explicitly installed packages
pkg prime-list                     >  "$MYPATH"/primelist.txt
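
To confirm the layout, something like this is enough (the pool name zroot is only an assumption; substitute the real one):
Code:
# Show the vdev tree: two top-level disks with no "mirror" or "raidz" line above them
# means the pool stripes across both, and losing either device loses the whole pool.
zpool status zroot
# Per-vdev capacity view of the same layout
zpool list -v zroot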
 
I like Samsung, but enterprise drives are the way to go. I use UFS, so all that striping is unnecessary for me. I run twin AIC PM1733 drives with gmirror on my bhyve server.
On another server I have Intel AIC drives. I noticed that compiling on the Intel enterprise drives seems quicker; they seem to eat up small files.

On some of my touchscreens I just use quality consumer drives. I like Toshiba/Kioxia. The Intel 670p 2 TB drives seem snappy; I got a load of those before Intel sold off that business.

On my current poudriere server I have a 40-core 2011 v4 machine with 64 GB RAM and a gmirror of Samsung 1.92 TB drives in M.2 22110 form factor.
Code:
<SAMSUNG MZ1LB1T9HALS-000MV EDA78M5Q>  at scbus12 target 0 lun 1 (pass2,nda0)
<SAMSUNG MZ1LB1T9HALS-00007 EDA7502Q>  at scbus13 target 0 lun 1 (pass3,nda1)

I avoid swap on SSDs:
Code:
# gpart show
=>        40  3750748768  mirror/gm0  GPT  (1.7T)
          40      532480           1  efi  (260M)
      532520  3750216288           2  freebsd-ufs  (1.7T)
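
For anyone curious, here is a rough sketch of how such a whole-disk gmirror with GPT and UFS on top is typically assembled; the device names nda0/nda1 and the gm0 label are just taken from the output above, and this is not necessarily exactly how I built mine:
Code:
# Load the mirror class and label the two whole disks as one mirror device
gmirror load
gmirror label -v gm0 nda0 nda1
# Make sure geom_mirror loads at boot: geom_mirror_load="YES" in /boot/loader.conf
# Partition the resulting mirror/gm0: an EFI system partition plus one big UFS partition
gpart create -s gpt mirror/gm0
gpart add -t efi -s 260m mirror/gm0
gpart add -t freebsd-ufs mirror/gm0
# UFS with soft updates on the second partition (the ESP still needs the loader copied in)
newfs -U /dev/mirror/gm0p2
# Later, check mirror health with:
gmirror status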
 
Tell me about it. I finally figured out how to use a lot of my cores.
I have to turn off TMPFS for these two (the relevant config knobs are sketched after the output below).

Code:
ID  TOTAL                  ORIGIN   PKGNAME         PHASE TIME     TMPFS    CPU% MEM%
[01] 01:10:12            lang/rust | rust-1.92.0     build 01:09:12       2672.1% 9.3%
[02] 01:10:13 devel/llvm19@default | llvm19-19.1.7_1 stage 00:01:10        393.2% 2.3%
[01:10:58] Logs: /poudriere/data/logs/bulk/mbm-default/2025-12-30_20h45m46s
load: 20.01  cmd: sh 41229 [select] 4259.54r 2.03u 3.66s 0% 5152k
[01:10:59] [mbm-default] [2025-12-30_20h45m46s] [parallel_build] Time: 01:10:57
           Queued: 20 Inspected: 0 Ignored: 0 Built: 0 Failed: 0 Skipped: 0 Fetched: 0 Remaining: 20
Reaching the stage phase on LLVM after 1:10:13 is very good for me.
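
The knobs I mean live in poudriere.conf and the poudriere.d make.conf; a minimal sketch, assuming the option names from poudriere.conf.sample (the exact glob patterns and the tmpdir path are just examples):
Code:
# /usr/local/etc/poudriere.conf
# Keep tmpfs for everything else, but build these heavy ports on disk instead
TMPFS="all"
TMPFS_BLACKLIST="rust* llvm*"
# Where blacklisted builds put their work directories instead of tmpfs (example path)
TMPFS_BLACKLIST_TMPDIR=${BASEFS}/data/tmpfs-blacklist

# /usr/local/etc/poudriere.d/make.conf
# Allow ports to run parallel make jobs so big builds can use all the cores
ALLOW_MAKE_JOBS=yes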
 
Drives die.
Drives are cheap.
Multiple drives with multiple system images and multiple data backups are called for, unless you don’t mind bare-metal rebuilds and catastrophic data loss.

I put Samsung Pro drives in both my clients’ machines and my own. I’ve lost one early 840 Pro that bricked.
 
A couple of years ago my big HDD array got too warm. Good times. Now it has a floor fan pointed at it.

Early SATA SSDs: I killed all of those I had for testing when driving them with GELI and ZFS. That set back my SSD adoption timeline by a decade.
 
One customer had 20,000 embedded Linux installations.
After five years it was common to replace two to four SSDs/HDDs a week.
Drives usually die due to excessive sector-reallocation events, or are made unusable by skyrocketing UDMA communication errors, as shown by smartmontools.
It is really seldom that the logic board on an SSD/HDD fails completely, rendering the unit utterly dead and impossible to communicate with; that occurred maybe once every second month.
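
For reference, a sketch of how those counters are typically watched with smartmontools on FreeBSD; the device name /dev/ada0 and the attribute IDs are the usual ones for SATA drives, not guaranteed for every model:
Code:
# Overall health verdict plus the full SMART attribute table
smartctl -H -A /dev/ada0
# The two attributes that usually announce trouble first:
#   ID   5  Reallocated_Sector_Ct  - remapped sectors climbing => failing media
#   ID 199  UDMA_CRC_Error_Count   - climbing => cable/interface problems
# Kick off a long self-test and read the result later
smartctl -t long /dev/ada0
smartctl -l selftest /dev/ada0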
 
My drive, not the controller, died completely, and for no reason other than heavy poudriere work.
The drive should be visible in the UEFI/BIOS setup of the machine (usually the BIOS -> Advanced -> SATA Configuration menu for SATA drives). If it disappears completely and is not detected by the UEFI/BIOS, there are four possible faults: #1 a dead SATA port on the motherboard, #2 a bad SATA data cable, #3 a faulty SATA power cable, #4 a dead logic board on the SSD/HDD.
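
If the box can still boot from another disk, the same visibility check can also be done from a running FreeBSD system; a small sketch (the output will of course differ per machine):
Code:
# SATA/SAS devices the kernel currently sees; a drive missing here on a known-good
# port and cable points at the drive's own logic board
camcontrol devlist
# The NVMe equivalent
nvmecontrol devlist
# Kernel messages about disks attaching or dropping off the bus
dmesg | grep -iE 'ada[0-9]|nda[0-9]|nvme'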
 
None of these. Port OK, cable OK; the drive simply said goodbye. Nothing more to see here, as if it were no longer there, and no backup was possible. The only reason I can imagine is that I was running poudriere: so many bits on that drive had to change from 1 to 0 or vice versa that maybe I hit a wear limit. Strangely, my four-times-larger drive of the same brand has no problem.
 