Tell me about your drive failures

In my whole life I have had only one drive failure.
A 128 GB SSD died after six months, during poudriere builds. No data recovery was possible, and I had no backups because it had never happened to me before.
It was a well-known brand, a Samsung EVO. Currently I have a 512 GB one working just fine.
What are your experiences?
 
For various reasons my ZFS pool is on two drives, one NVMe and one SSD, and I think it stripes across them. So if one fails my whole pool is gone, but I have a backup script (a quick way to confirm the layout is sketched after it):

Code:
# Destination for today's backup (the date is part of the path)
export MYPATH=/mnt/OLDDISK3_backup/backup/2025_11_27/
mkdir "$MYPATH"
# Mirror the important config and data trees (clone is assumed to be sysutils/clone from ports)
clone -s /etc           "$MYPATH"/etc
clone -s /usr/local/etc "$MYPATH"/usr_local_etc
clone -s /root          "$MYPATH"/root
clone -s /home/x        "$MYPATH"/home_x
clone -s /var/db        "$MYPATH"/var_db
clone -s /var/backups   "$MYPATH"/var_backups
# Single files worth keeping: PostgreSQL config, build scripts, jail nginx config, kernel config
cp /var/db/postgres/data17/postgresql.conf "$MYPATH"
cp /usr/src/doit1 "$MYPATH"
cp /usr/src/doit2 "$MYPATH"
cp /MYFREEBSD/jails/a/usr/local/etc/nginx/nginx.conf "$MYPATH"
cp /usr/src/sys/amd64/conf/MYKERNCONF "$MYPATH"
cp /etc/src.conf                      "$MYPATH"
# Record the list of explicitly installed packages
pkg prime-list                     >  "$MYPATH"/primelist.txt
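
To confirm the layout, something like this is enough (the pool name zroot is only an assumption; substitute the real one):
Code:
# Show the vdev tree: two top-level disks with no "mirror" or "raidz" line above them
# means the pool stripes across both, and losing either device loses the whole pool.
zpool status zroot
# Per-vdev capacity view of the same layout
zpool list -v zroot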
 
I like Samsung, but enterprise drives are the way to go. I use UFS, so all that striping is unnecessary for me. I run twin AIC PM1733 drives with gmirror on my bhyve server.
On another server I have Intel AIC drives. I noticed that compiling on the Intel enterprise drives seems quicker; they seem to eat up small files.

On some of my touchscreens I just use quality consumer drives. I like Toshiba/Kioxia. The Intel 670p 2 TB drives seem snappy; I got a load of those before Intel sold off that business.

On my current poudriere server I have a 40-core 2011 v4 machine with 64 GB RAM and a gmirror of Samsung 1.92 TB drives in M.2 22110 form factor.
Code:
<SAMSUNG MZ1LB1T9HALS-000MV EDA78M5Q>  at scbus12 target 0 lun 1 (pass2,nda0)
<SAMSUNG MZ1LB1T9HALS-00007 EDA7502Q>  at scbus13 target 0 lun 1 (pass3,nda1)

I avoid swap on SSDs:
Code:
# gpart show
=>        40  3750748768  mirror/gm0  GPT  (1.7T)
          40      532480           1  efi  (260M)
      532520  3750216288           2  freebsd-ufs  (1.7T)
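
For anyone curious, here is a rough sketch of how such a whole-disk gmirror with GPT and UFS on top is typically assembled; the device names nda0/nda1 and the gm0 label are just taken from the output above, and this is not necessarily exactly how I built mine:
Code:
# Load the mirror class and label the two whole disks as one mirror device
gmirror load
gmirror label -v gm0 nda0 nda1
# Make sure geom_mirror loads at boot: geom_mirror_load="YES" in /boot/loader.conf
# Partition the resulting mirror/gm0: an EFI system partition plus one big UFS partition
gpart create -s gpt mirror/gm0
gpart add -t efi -s 260m mirror/gm0
gpart add -t freebsd-ufs mirror/gm0
# UFS with soft updates on the second partition (the ESP still needs the loader copied in)
newfs -U /dev/mirror/gm0p2
# Later, check mirror health with:
gmirror status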
 
Tell me about it. I finally figured out how to use a lot of my cores.
I have to turn off TMPFS for these two (the relevant config knobs are sketched after the output below).

Code:
ID  TOTAL                  ORIGIN   PKGNAME         PHASE TIME     TMPFS    CPU% MEM%
[01] 01:10:12            lang/rust | rust-1.92.0     build 01:09:12       2672.1% 9.3%
[02] 01:10:13 devel/llvm19@default | llvm19-19.1.7_1 stage 00:01:10        393.2% 2.3%
[01:10:58] Logs: /poudriere/data/logs/bulk/mbm-default/2025-12-30_20h45m46s
load: 20.01  cmd: sh 41229 [select] 4259.54r 2.03u 3.66s 0% 5152k
[01:10:59] [mbm-default] [2025-12-30_20h45m46s] [parallel_build] Time: 01:10:57
           Queued: 20 Inspected: 0 Ignored: 0 Built: 0 Failed: 0 Skipped: 0 Fetched: 0 Remaining: 20
Reaching the stage phase on LLVM after 1:10:13 is very good for me.
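
The knobs I mean live in poudriere.conf and the poudriere.d make.conf; a minimal sketch, assuming the option names from poudriere.conf.sample (the exact glob patterns and the tmpdir path are just examples):
Code:
# /usr/local/etc/poudriere.conf
# Keep tmpfs for everything else, but build these heavy ports on disk instead
TMPFS="all"
TMPFS_BLACKLIST="rust* llvm*"
# Where blacklisted builds put their work directories instead of tmpfs (example path)
TMPFS_BLACKLIST_TMPDIR=${BASEFS}/data/tmpfs-blacklist

# /usr/local/etc/poudriere.d/make.conf
# Allow ports to run parallel make jobs so big builds can use all the cores
ALLOW_MAKE_JOBS=yes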
 
Drives die.
Drives are cheap.
Multiple drives with multiple system images and multiple data backups are called for, unless you don’t mind bare-metal rebuilds and catastrophic data loss.

I put Samsung Pro drives in both my clients’ machines and my own. I’ve lost one early 840 Pro that bricked.
 
A couple of years ago my big HDD array got too warm. Good times. Now it has a floor fan pointed at it.

Early SATA SSDs: I killed all of those I had for testing when driving them with GELI and ZFS. That set back my SSD adoption timeline by a decade.
 
One customer had 20,000 embedded Linux installations.
After five years it was common to replace two to four SSDs/HDDs a week.
Drives usually die due to excessive sector-reallocation events, or are made unusable by skyrocketing UDMA communication errors, as shown by smartmontools.
It is really seldom that the logic board on an SSD/HDD fails completely, rendering the unit utterly dead and impossible to communicate with; that occurred maybe once every second month.
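
For reference, a sketch of how those counters are typically watched with smartmontools on FreeBSD; the device name /dev/ada0 and the attribute IDs are the usual ones for SATA drives, not guaranteed for every model:
Code:
# Overall health verdict plus the full SMART attribute table
smartctl -H -A /dev/ada0
# The two attributes that usually announce trouble first:
#   ID   5  Reallocated_Sector_Ct  - remapped sectors climbing => failing media
#   ID 199  UDMA_CRC_Error_Count   - climbing => cable/interface problems
# Kick off a long self-test and read the result later
smartctl -t long /dev/ada0
smartctl -l selftest /dev/ada0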
 
My drive, not the controller, died completely, and for no reason other than heavy poudriere work.
The drive should be visible in the UEFI/BIOS setup of the machine (usually the BIOS -> Advanced -> SATA Configuration menu for SATA drives). If it disappears completely and is not detected by the UEFI/BIOS, there are four possible faults: #1 a dead SATA port on the motherboard, #2 a bad SATA data cable, #3 a faulty SATA power cable, #4 a dead logic board on the SSD/HDD.
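
If the box can still boot from another disk, the same visibility check can also be done from a running FreeBSD system; a small sketch (the output will of course differ per machine):
Code:
# SATA/SAS devices the kernel currently sees; a drive missing here on a known-good
# port and cable points at the drive's own logic board
camcontrol devlist
# The NVMe equivalent
nvmecontrol devlist
# Kernel messages about disks attaching or dropping off the bus
dmesg | grep -iE 'ada[0-9]|nda[0-9]|nvme'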
 
None of these. Port OK, cable OK; the drive simply said goodbye. Nothing more to see here, as if it were no longer there, and no backup was possible. The only reason I can imagine is that I was running poudriere: so many bits on that drive had to change from 1 to 0 or vice versa that maybe I hit a wear limit. Strangely, my four-times-larger drive of the same brand has no problem.
 