Solved: Do we really need to run zpool scrub every month or week on FreeBSD 14.2?

fff2024g · Friday at 9:03 AM

Dear all,
ZFS is very good, but I have seen some articles saying we should run a task (zpool scrub) every month or every week on FreeBSD 14.2. Is that really right?
When do we need to do this job?
What is the benefit for us?
Does only a FreeBSD 14.2 server need it, or does a FreeBSD 14.2 desktop need it too?
Thanks.

SirDice · Friday at 9:12 AM

fff2024g said:
When do we need to do this job?

When you value your data.

fff2024g said:
What is the benefit for us?

It'll fix errors, assuming you have a pool with redundant data (or else it'll only be able to detect errors, not correct them).

fff2024g · Friday at 10:47 AM

SirDice said:
When you value your data.

It'll fix errors, assuming you have a pool with redundant data (or else it'll only be able to detect errors, not correct them).

Dear SirDice,
Thanks. I will study it.

ralphbsz · Friday at 6:41 PM

As SirDice already said: It fixes or at least finds errors. If you have redundancy in your pool, that's particularly fabulous.

An important part of that is to exercise the disks. It's quite possible that a disk has latent errors: areas that are already damaged, but if you never read them, you won't find out. If you scrub the file system, those errors can be found. If you are lucky, the data is still readable, the disk knows that it has a problem, and it remaps that data elsewhere. If you are less lucky, the data is already gone, but at least you know about it and can take corrective action, such as buying a replacement disk and trying to migrate the data that is still OK. It's always better to have an early warning before things get worse.

Do you really "need to" run scrub every month? That brings up a really deep underlying question: how often should you scrub? And that's a question for which there is no scientific answer (I know rather well). On one hand, scrubbing helps (see above: it finds and fixes problems, and finding problems early helps prevent small problems from turning into disasters). On the other hand, scrubbing also costs energy and prevents the CPU and disk from doing useful work. Even more important, it may wear out the disk: today, hard disks in particular have a limited amount of data they can read over their lifetime; SSDs are typically limited mostly by write endurance. That cost/benefit tradeoff is really difficult to establish, but most people in the industry settle for scrubbing roughly every week or every month.

cracauer@ · Friday at 9:35 PM

Disks can be mostly dead and not show up with problems until there is a major burst of activity.

So the situation could end up like this:
- one disk dies
- you replace it and resilver
- resilver is heavy load
- under that heavy load some additional already half-dead disks quit
- now the pool is gone

You want to find half-dead disks as soon as possible - and one at a time.

mer · Friday at 9:43 PM

I don't have much to add, but consumer vs enterprise devices can factor into "how often".
Me personally, on my home systems, using consumer grade devices with redundancy (mirrors) I manually run scrub roughly every quarter (3 to 4 months). Enterprise and higher levels of redundancy may lead one to longer intervals between scrubs.

So, scrub adds load. Run scrub when load is lightest and when you are able to correct any problems it finds.

homeadm · 2025-01-07T04:38:05+0000

In my opinion, weekly scrubbing is a waste of HDD endurance. I think the optimal scrubbing interval depends on the number of errors you encounter. I have currently extended the intervals to 8 months. So far I have not found any errors with more frequent scrubs, but I have new HDDs (1-2 years old).

I've just started scrubbing a pool that I last scrubbed in May. I'll report back tomorrow.

homeadm · 2025-01-07T15:44:43+0000

Code:

root@NAS1:~# zpool status pool1
  pool: pool1
 state: ONLINE
  scan: scrub in progress since Tue Jan  7 03:43:31 2025
    11.1T scanned out of 11.1T at 255M/s, (scan is slow, no estimated time)
    0 repaired, 100.00% done
config:

        NAME                       STATE     READ WRITE CKSUM
        pool1                      ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c0t5000039B78D91AA8d0  ONLINE       0     0     0
            c0t5000039B78D91E0Cd0  ONLINE       0     0     0
            c0t5000039B48DBE974d0  ONLINE       0     0     0
            c0t5000039B78D9B9A3d0  ONLINE       0     0     0
            c0t5000039B78D8FB0Ed0  ONLINE       0     0     0
            c0t5000039B78D99998d0  ONLINE       0     0     0
            c0t5000039B78D8FA8Ed0  ONLINE       0     0     0
            c0t5000039B78D91813d0  ONLINE       0     0     0

errors: No known data errors

I have a twin of this pool with ReFS. Also nothing in scrub.

It seems that lightly-used pools, made of high-quality components, do not require frequent scrubbing.
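
For a rough sense of how long a scrub like the one above takes, the scanned size and rate from the status output can be turned into a duration. This is only a sketch: it assumes the T and M suffixes are binary units (TiB and MiB/s) and that the scan rate stays constant, and `scrub_hours` is a hypothetical helper, not part of any ZFS tool.

```python
def scrub_hours(scanned_tib: float, rate_mib_per_s: float) -> float:
    """Estimate scrub duration in hours from data size and average scan rate.

    Assumes binary units (1 TiB = 1024 * 1024 MiB) and a constant rate.
    """
    scanned_mib = scanned_tib * 1024 * 1024
    seconds = scanned_mib / rate_mib_per_s
    return seconds / 3600


# Figures taken from the zpool status output above: 11.1T at 255M/s.
print(round(scrub_hours(11.1, 255), 1))  # → 12.7
```

So a full pass over this pool keeps all eight disks busy for roughly half a day, which is the load and wear being weighed against the benefit of early detection.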

ralphbsz · 2025-01-08T01:11:00+0000

tl;dr: I don't have an answer.

With a single disk, you will find no errors for about 100 years on average. With two disks it is about 50 years, with three disks 33 years, and so on. That's just saying that a disk drive has roughly a million hours of MTBF, which immediately implies (Little's law) that the expected time between failures is the number above. And the "million" is a rough guess; consumer disks are more like 300K-500K hours, and enterprise disks more like 1.5M hours.
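
The arithmetic behind those figures can be sketched as follows. It assumes a constant failure rate and independent disks, which is a simplification, and the function name is made up for illustration. A million hours works out closer to 114 years than 100, which is within the rough rounding used above.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours


def years_between_failures(mtbf_hours: float, n_disks: int = 1) -> float:
    """Expected years between failures across n_disks identical disks.

    Assumes independent disks with a constant failure rate of 1 / MTBF each,
    so the combined failure rate is n_disks / MTBF.
    """
    return mtbf_hours / HOURS_PER_YEAR / n_disks


# The rough "one million hours" figure, for one, two and three disks.
for n in (1, 2, 3):
    print(n, round(years_between_failures(1_000_000, n), 1))

# The consumer and enterprise guesses mentioned above, for a single disk.
for mtbf in (300_000, 500_000, 1_500_000):
    print(mtbf, round(years_between_failures(mtbf), 1))
```

For one, two and three disks this gives roughly 114, 57 and 38 years.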

But one doesn't scrub to detect that the disk has failed, because disks are not a fail-stop system, meaning one that works perfectly for a while (every bit is read at the performance from the spec sheet) and then goes completely off-line and doesn't talk at all. In reality, most disk failures are a gradual affair: first a few read errors that the disk can correct easily and rewrite data in place, then read errors where the data is still readable but needs to be written elsewhere (that's called revectoring), then the number of read errors overwhelms the spare space the manufacturer set aside for revectoring and some writes will start to fail due to an internal out-of-space condition, and then things go seriously pear-shaped. SSDs are somewhat similar, with more complexity due to over-provisioning and whole-block erase. So scrubbing is to some extent a way to force the disk to do a lot of work, which helps detect when it starts failing, at which point the amount of unreadable data is still very small, so normal redundancy techniques (be it RAID, be it backup, be it retyping that piece of code you wrote a month ago) can cope with it efficiently. Another issue with disks is that sometimes they fail silently, where a sector becomes unreadable, but if nobody ever reads it, nobody will know (if a tree falls in the forest ...). So reading it once in a while is vital.

Today, many disk drives are spec'ed to have about 500 TB/year of IO capacity. From that viewpoint, scrubbing once a year is probably a bit low; once a month or once a week is more in the ballpark. Once an hour or once a century is clearly insane. In the middle, there is a very large range of plausible scrub frequencies.
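
To put the 500 TB/year figure in perspective, here is a minimal sketch of how much of that budget scrubbing alone would consume. The 4 TB of allocated data per disk is a made-up example value, not a number from this thread, and `scrub_budget_fraction` is a hypothetical helper. It also assumes each scrub reads every allocated byte on the disk exactly once.

```python
def scrub_budget_fraction(data_per_disk_tb: float,
                          scrubs_per_year: int,
                          rated_tb_per_year: float = 500.0) -> float:
    """Fraction of a disk's yearly rated IO that scrub reads would consume."""
    return data_per_disk_tb * scrubs_per_year / rated_tb_per_year


# Hypothetical disk holding 4 TB of allocated data.
for label, per_year in (("weekly", 52), ("monthly", 12), ("yearly", 1)):
    print(f"{label}: {scrub_budget_fraction(4, per_year):.1%}")
```

With these assumed numbers, weekly scrubs use about 42% of the rated yearly IO, monthly about 10%, and yearly under 1%, which matches the point that the plausible range is wide.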
