How often do you scrub?

dvl@

Developer
At present, my ZFS array scrubs every 7 days.

How often do you scrub?

In the past two years, I've seen no errors. *knock* *knock* Or rather, if I have seen errors, I've completely forgotten about them.

Given the rarity of such errors, I'm tempted to increase the scrub period to every 21 days.

Comments?
 
I scrub our storage servers once a month, as it can take over a week to do a complete scrub on one of the boxes. These all have multiple raidz2 vdevs and 20-odd TB of storage. I've found a few dying drives this way (zpool status shows increasing errors, SMART shows errors, drives fall out of pool, etc).
 
phoenix said:
I scrub our storage servers once a month, as it can take over a week to do a complete scrub on one of the boxes.
Are you using dedup or something else that slows scrubs way down? Scrubs here run at about 2TB/hour:
Code:
[0] rz1:~> zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 6h46m with 0 errors on Thu Nov  8 22:28:05 2012
config:

        NAME             STATE     READ WRITE CKSUM
        data             ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            label/twd0   ONLINE       0     0     0
            label/twd1   ONLINE       0     0     0
            label/twd2   ONLINE       0     0     0
            label/twd3   ONLINE       0     0     0
            label/twd4   ONLINE       0     0     0
          raidz1-1       ONLINE       0     0     0
            label/twd5   ONLINE       0     0     0
            label/twd6   ONLINE       0     0     0
            label/twd7   ONLINE       0     0     0
            label/twd8   ONLINE       0     0     0
            label/twd9   ONLINE       0     0     0
          raidz1-2       ONLINE       0     0     0
            label/twd10  ONLINE       0     0     0
            label/twd11  ONLINE       0     0     0
            label/twd12  ONLINE       0     0     0
            label/twd13  ONLINE       0     0     0
            label/twd14  ONLINE       0     0     0
        logs
          da0            ONLINE       0     0     0
        spares
          label/twd15    AVAIL   

errors: No known data errors
[0] rz1:~> df -h /data
Filesystem    Size    Used   Avail Capacity  Mounted on
data           21T     12T    9.1T    57%    /data
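
As a rough sanity check on the "about 2TB/hour" figure, the average rate can be estimated from the elapsed scrub time and the Used column of df. This is only a sketch: `scrub_rate_tb_per_hour` is a hypothetical helper name, and it assumes the scrub covered roughly the amount shown as Used.

```shell
#!/bin/sh
# Hypothetical helper: average scrub rate in TB/hour.
# Args: used space in TB, elapsed hours, elapsed minutes.
scrub_rate_tb_per_hour() {
    awk -v used="$1" -v h="$2" -v m="$3" \
        'BEGIN { printf "%.2f\n", used / (h + m / 60) }'
}

# 12T used, scrubbed in 6h46m (figures from the output above)
scrub_rate_tb_per_hour 12 6 46    # prints 1.77
```

So the long-run average works out slightly under 2TB/hour, which is consistent with "about".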
 
Maybe thousands of snapshots dating years back?

I scrub once a week. It's a much smaller pool (6TB) and the scrub finishes in about 10 hours.
 
40-odd filesystems, each with 300-odd snapshots, all deduped and compressed. The slowest box is also 90% full and takes about 200 hours to resilver a disk or scrub the pool. The fastest pool takes a little over 30 hours.
 
phoenix said:
40-odd filesystems, each with 300-odd snapshots, all deduped and compressed. The slowest box is also 90% full and takes about 200 hours to resilver a disk or scrub the pool. The fastest pool takes a little over 30 hours.

Yeah, of course it takes a very long time when the system is almost full. It's strange, though: you can get scrub and resilver performance like what Terry described here, but as soon as you go near dedup, it drops to a couple of MB/s, regardless of how full the pool is. Well, that's my experience of it.

Scrubbing on our systems is run from cron every Sunday. I think the Admin guide describes this as best practice on enterprise systems, while once a month is sufficient for home systems. Of course, it also depends on how fast your systems can actually scrub (phoenix's systems, for example), but with his savings from dedup, I wouldn't be complaining :)

Phoenix, how much savings was it you had, counting both compression and dedup? x6, x7?

/Sebulon
 
Terry_Kennedy said:
Are you using dedup or something else that slows scrubs way down? Scrubs here run at about 2TB/hour:

Hmmm, mine takes about 12 hours:

Code:
 scan: scrub repaired 0 in 11h54m with 0 errors on Sat Nov 10 15:10:45 2012

That's 8x2TB drives in a raidz2, so let's say 13TB.

Code:
$ zpool list
NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
storage  12.7T  7.51T  5.18T    59%  1.00x  ONLINE  -

Mine is more like 500G/hour
 
Code:
echo 'daily_scrub_zfs_enable="YES"' >> /etc/periodic.conf

Look at /etc/periodic/daily/800.scrub-zfs to see what it does and what you can configure.
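
For anyone wanting a longer interval (such as the 21 days mentioned at the top of the thread), here is a sketch of what /etc/periodic.conf could look like. The variable names are the ones I recall from FreeBSD's /etc/defaults/periodic.conf, so check that file on your release before relying on them. The values and the pool name "storage" are only illustrative assumptions.

```shell
# /etc/periodic.conf (sketch; values are illustrative assumptions)
daily_scrub_zfs_enable="YES"             # let the daily periodic run start scrubs
daily_scrub_zfs_default_threshold="21"   # minimum days between scrubs, all pools
daily_scrub_zfs_storage_threshold="7"    # per-pool override for a pool named "storage"
```

The script runs daily but only starts a scrub once the threshold has passed since the pool's last one, so the threshold is the setting that controls how often scrubs happen.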
 
Until recently, almost never, because the pool got so full and fragmented (due to some youthful ZFS inexperience) that a scrub took forever to complete.
Now I've moved the pool to ashift=12 by zfs sending the entire thing around, rearranging things and adding drives in the process. The scrub now takes 5 hours tops, and I run it overnight every weekend.
 
Sebulon said:
Phoenix, how much savings was it you had, counting both compression and dedup? x6, x7?

Backups server for schools:
Code:
storage                          23.7T  2.06T   256K  none
dedup = 1.79, compress = 1.59, copies = 1.05, dedup * compress / copies = 2.71
So, about 50 TB of data in 24 TB of disk. This box has 20 GB of RAM (needs more).

Backups for admin sites:
Code:
storage                                 47.1T  4.87T   288K  none
dedup = 3.86, compress = 1.61, copies = 1.06, dedup * compress / copies = 5.88
So, about 275 TB of data in 47 TB of disk. This box has 64 GB of RAM (works well).

Off-site replica of the above two servers:
Code:
storage                                 39.7T  10.6T   288K  none
dedup = 2.41, compress = 1.55, copies = 1.06, dedup * compress / copies = 3.52
So, about 140 TB of data in 40 TB of disk. This box has 32 GB of RAM (needs more).
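
The "dedup * compress / copies" estimate used above can be reproduced with a couple of small helpers. This is a sketch only: the function names are hypothetical, and it treats the first figure in each listing as the space in use, which is an assumption on my part.

```shell
#!/bin/sh
# Hypothetical helpers for the "dedup * compress / copies" estimate.

# Effective space multiplier from the three ratios.
effective_ratio() {
    awk -v d="$1" -v c="$2" -v k="$3" 'BEGIN { printf "%.2f\n", d * c / k }'
}

# Approximate logical data (TB) held in a given amount of used disk (TB).
logical_tb() {
    awk -v used="$1" -v d="$2" -v c="$3" -v k="$4" \
        'BEGIN { printf "%.1f\n", used * d * c / k }'
}

# Off-site replica figures from the post above
effective_ratio 2.41 1.55 1.06        # prints 3.52
logical_tb 39.7 2.41 1.55 1.06        # prints 139.9
```

Because the inputs are already rounded to two decimals, results for other boxes may differ in the last digit from figures computed on the unrounded ratios.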
 
dvl@ said:
Hmmm, mine takes about 12 hours:

Code:
 scan: scrub repaired 0 in 11h54m with 0 errors on Sat Nov 10 15:10:45 2012

That's 8x2TB drives in a raidz2, so let's say 13TB.

Code:
$ zpool list
NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
storage  12.7T  7.51T  5.18T    59%  1.00x  ONLINE  -

Mine is more like 500G/hour

Scrub is underway now:

Code:
$ zpool status storage
  pool: storage
 state: ONLINE
 scan: scrub in progress since Sun Nov 18 03:16:11 2012
    6.20T scanned out of 7.77T at 141M/s, 3h14m to go
    0 repaired, 79.81% done

141M/s = 8460M/minute = 507600M/hour ≈ 495G/hour

OK, my estimate was more or less accurate.
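
The same conversion, plus a check of the "3h14m to go" estimate, can be sketched in a few lines of sh. The helper names are hypothetical, and I'm assuming zpool's binary prefixes (1G = 1024M, 1T = 1024G).

```shell
#!/bin/sh
# Hypothetical helpers; units assume binary prefixes (1G = 1024M, 1T = 1024G).

# Convert a scan rate in M/s to G/hour.
rate_gb_per_hour() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", r * 3600 / 1024 }'
}

# Estimate time left from scanned T, total T, and scan rate in M/s.
scrub_eta() {
    awk -v scanned="$1" -v total="$2" -v r="$3" 'BEGIN {
        secs = int((total - scanned) * 1024 * 1024 / r)
        printf "%dh%dm\n", int(secs / 3600), int((secs % 3600) / 60)
    }'
}

# Figures from the zpool status output above
rate_gb_per_hour 141          # prints 495.7
scrub_eta 6.20 7.77 141       # prints 3h14m
```

The 495G/hour in the post is the same number truncated rather than rounded, and the remaining-time estimate matches what zpool status reported.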

FYI, most of this cluster is compressed, and all of it runs off these HDDs:

- Hitachi GST Deskstar HD32000 IDK/7K (0S00164) 2TB 7200 RPM 32MB Cache SATA 3.0Gb/s

using this controller:

- SYBA SY-PEX40008 PCI Express SATA II (3.0Gb/s)

Some more information here on this box.
 