I'm proud of my FreeBSD server(s)

_martin · Oct 11, 2023

Don't we all are, eh ?

I've one server that is "storage backend" for few things. It used to be OpenSolaris (ZFS) but I migrated it to FreeBSD in 2012. All my servers have a script that runs daily and keeps output of few things in a log file. But on this server I forgot to schedule housekeeping and those logs just kept adding up. First log is from Sep 30, 2014 (probably fresh install from 9 to 10). I'm able to backtrack each release's life:

Code:

# grep -H ^kernel\ ver * | uniq -c -f 2
  10 checkconfig.20140930:kernel version:        10.0-RELEASE-p7
  33 checkconfig.20141010:kernel version:        10.0-RELEASE-p9
   8 checkconfig.20141112:kernel version:        10.0-RELEASE-p12
  56 checkconfig.20141121:kernel version:        10.1-RELEASE
  53 checkconfig.20150116:kernel version:        10.1-RELEASE-p4
 112 checkconfig.20150310:kernel version:        10.1-RELEASE-p6
 107 checkconfig.20150630:kernel version:        10.1-RELEASE-p13
  93 checkconfig.20151015:kernel version:        10.2-RELEASE-p5
  42 checkconfig.20160116:kernel version:        10.2-RELEASE-p10
  41 checkconfig.20160227:kernel version:        10.2-RELEASE-p12
  41 checkconfig.20160408:kernel version:        10.3-RELEASE
  67 checkconfig.20160519:kernel version:        10.3-RELEASE-p3
  87 checkconfig.20160725:kernel version:        10.3-RELEASE-p5
 107 checkconfig.20180510:kernel version:        11.1-RELEASE-p10
 453 checkconfig.20180825:kernel version:        11.2-RELEASE-p2
 345 checkconfig.20191121:kernel version:        12.1-RELEASE-p1
 104 checkconfig.20201031:kernel version:        12.2-RELEASE
 271 checkconfig.20210213:kernel version:        12.2-RELEASE-p3
  40 checkconfig.20211111:kernel version:        12.2-RELEASE-p11
 346 checkconfig.20211221:kernel version:        12.3-RELEASE
 145 checkconfig.20221202:kernel version:        12.3-RELEASE-p10
  78 checkconfig.20230426:kernel version:        12.4-RELEASE-p1
  21 checkconfig.20230713:kernel version:        12.4-RELEASE-p3
  70 checkconfig.20230803:kernel version:        12.4-RELEASE-p4
#

With a remarkable life of my disks in pools:

Code:

# for i in `seq 7`; do smartctl -a /dev/da$i; done | grep Power_On_Hours
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       93828
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       93827
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       15431
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       93826
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       93829
  9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age   Always       -       987466h+46m+12.920s
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       93828

Now that I call power to serve.

jbo@ · Oct 11, 2023

Yep, I can only +1 that. FreeBSD is ridiculously stable and requires minimal maintenance. I have never looked back.

_martin · Oct 11, 2023

It is, indeed. But funny enough that housekeeping script was not scheduled most likely because of an annoying PF bug introduced in 10 where I just gave up and reinstalled OS from scratch to see if that helps (yeah, that desperate move). I may have even posted here about it back in the day. Since then I have my own banner in pf.conf:

Code:

###
# --------------------------------------------------------------
#  rdr pass is not working properly (session disconects) due to regression bug in
#  FreeBSD 10.x PF .. we need to pass traffic comming from rdr here
# --------------------------------------------------------------

doing rdr and pass in separate rules instead of rdr pass. Old habits die hard, I didn't try if it works now (wasn't in 11 I think) .

gpw928 · Oct 11, 2023

That's some very good numbers. Those disks have been spinning for nearly 11 years. What make/model are they?

_martin · Oct 12, 2023

This set is:

Code:

# for i in `seq 7`; do smartctl -i /dev/da$i; done  | grep Device\ M
Device Model:     WDC WD20EFRX-68AX9N0
Device Model:     WDC WD20EFRX-68AX9N0
Device Model:     WDC WD2503ABYX-01WERA1
Device Model:     WDC WD20EFRX-68AX9N0
Device Model:     WDC WD20EFRX-68AX9N0
Device Model:     INTEL SSDSC2CW120A3
Device Model:     WDC WD20EFRX-68AX9N0

True, I can't be mad if these disks just stop now.

gpw928 · Oct 12, 2023

I started seven years ago with five 3TB WD Reds (WD30EFRX-68AX9N0).

There are only two left still working.

So either your environment is better than mine (mine was challenging those disks for some time), or the 2TB Reds were more durable.

I have improved the environment considerably, and now replacing all failures with 4TB Seagate enterprise class drives -- ST4000NM0035.

Jose · Oct 12, 2023

I have some apparently immortal Hitachi 500GB drives that are due to fail any day now. I'm ready for my punishment.

_martin · Oct 12, 2023

gpw928 said:
So either your environment is better than mine (mine was challenging those disks for some time), or the 2TB Reds were more durable.

Server is in DC with stable temperatures. They even survived DC relocation from one place to the other.

They for sure exceeded my expectations. I was planning to upgrade the pool by 3 TB WD Reds few years ago but as these WDs kept going I didn't. At this point the whole server would be replaced and disks probably reused for home use as a "cache".

I used to have Seagates in rpool. Actually as I do have those logs I can fetch that info:

Code:

ST3250318AS
WDC WD2503ABYX-01WERA1

And already at that time (Sep 30, 2014) one of the ST disk was replaced. They used to be 80GB mirror pool of ST disks.

kpedersen · Oct 12, 2023

Jose said:
I have some apparently immortal Hitachi 500GB drives that are due to fail any day now. I'm ready for my punishment.

The exhilaration of risk is what keeps us alive!

mer · Oct 12, 2023

I've a couple of WD Reds that are still chugging along with Power On hours of 20809 and 46777. One gives CAM retries when doing a scrub on about half dozen blocks but no other reported errors.
Everything I've seen the WD Reds and Hitachis are about the "gold standard" for spinning drives.

recluce · Oct 13, 2023

My log files typically do not get that old, but here is the head of the bsdinstall_log on my home storage server:

Code:

root@storezilla2:/var/log # less bsdinstall_log
DEBUG: Running installation step: auto
DEBUG: dialog.subr: DEBUG_SELF_INITIALIZE=[]
DEBUG: UNAME_S=[FreeBSD] UNAME_P=[amd64] UNAME_R=[10.0-RC5]
DEBUG: common.subr: Successfully loaded.
DEBUG: Began Installation at Fri Jan 17 01:39:18 UTC 2014

Since then, the server has been cloned (zfs send and receive route) over to new hardware and received its second and third set of hard drives (first set now retired). Just recently upgraded from 12.4 to 13.2. System was and is 100% stable.

_martin · Nov 10, 2023

I was rooting they will live till their 11 years .. but this night patching was enough for one of them:

Code:

(da0:mps0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:mps0:0:0:0): Info: 0xe8e08640
(da0:mps0:0:0:0): Error 5, Unretryable error
(da0:mps0:0:0:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00
(da0:mps0:0:0:0): CAM status: SCSI Status Error

Passed away on its 94542 hour.

Jose · Nov 11, 2023

_martin said:
I was rooting they will live till their 11 years .. but this night patching was enough for one of them:

Code:

(da0:mps0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error) (da0:mps0:0:0:0): Info: 0xe8e08640 (da0:mps0:0:0:0): Error 5, Unretryable error (da0:mps0:0:0:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00 (da0:mps0:0:0:0): CAM status: SCSI Status Error

Passed away on its 94542 hour.

I observed a moment of silence for its passing.