ZFS resilvering very slow

Hi,
I am using FreeBSD 11.1-RELEASE-p13 (I know, I should upgrade as soon as possible) on a physical server with 4 x 3 TB SATA hard disks.
I wasn't able to disable the hardware RAID controller, so I created 4 "logical units" so that ZFS sees 4 separate disks:

Code:
# camcontrol devlist
<AMCC 9650SE-4LP DISK 3.08>        at scbus6 target 0 lun 0 (pass0,da0)
<AMCC 9650SE-4LP DISK 3.08>        at scbus6 target 1 lun 0 (pass1,da1)
<AMCC 9650SE-4LP DISK 3.08>        at scbus6 target 2 lun 0 (pass2,da2)
<AMCC 9650SE-4LP DISK 3.08>        at scbus6 target 3 lun 0 (pass3,da3)

Today I replaced da0, and ZFS automatically started to resilver the pool. I noticed that it seems a bit slow:

Code:
# zpool status
  pool: zroot
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed May  8 17:26:49 2019
        61.5G scanned out of 3.05T at 8.84M/s, 98h25m to go
        15.4G resilvered, 1.97% done
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            gpt/zfs0  ONLINE       0     0     7  (resilvering)
            da0p3     ONLINE       0     0     0
            da1p3     ONLINE       0     0     0
            da2p3     ONLINE       0     0     0

errors: No known data errors

Is this normal, in your opinion?
Thank you very much
 
Let it run for a while and see if the speed picks up. "Normal" is hard to judge; there are too many factors to guess at (controller hardware, firmware, hard drives, SATA speed, and so on).
 
If the filesystems are actively being used, the resilvering will take much longer. Resilvering a 3 TB disk took almost a day on my server last time. My pool is at 90-95% capacity, so there's a lot to sync.
 
It also depends on the amount of fragmentation in the pool.

Scrub and resilver work through the pool in transaction ID order (from the oldest transaction to the newest), not in the order of the bytes on disk. So if there is lots of fragmentation, lots of old snapshots, or lots of differences between the snapshots, the scrub/resilver will be slow. The same goes for a pool that is nearly full or in heavy use.
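
A quick way to see how full and how fragmented a pool is, is zpool list with the capacity and fragmentation properties (the pool name zroot is taken from the output above):

Code:
# zpool list -o name,size,allocated,free,capacity,fragmentation,health zroot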

There are a handful of sysctls you can tweak to improve things, depending on the amount of RAM and how much I/O is happening at the same time.

Code:
# ZFS resilver tuning
# Taken from: http://broken.net/uncategorized/zfs-performance-tuning-for-scrubs-and-resilvers/
# These can't be set in /boot/loader.conf
vfs.zfs.scrub_delay=0              # Prioritise scrub over normal writes (default 4)
vfs.zfs.top_maxinflight=8192       # Up the number of in-flight I/O (default 32)
vfs.zfs.resilver_min_time_ms=5000  # Up the time a resilver spends in each TXG (default 3000)
vfs.zfs.vdev.max_pending=64        # Queue depth (number of outstanding I/O) per vdev.
                                   # Set it to a high number (32+), then monitor the L(q)
                                   # column in gstat under load, and set it to just under
                                   # the highest number you see.

The top_maxinflight and resilver_min_time_ms are the two that make the most difference. The former uses RAM to hold that many I/O requests in memory so they can be reordered before being written to disk. The latter sets how long the resilver process gets access to the pool before something else can access it.
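
For what it's worth, these can be set on a running system with sysctl(8), no reboot needed (note the old values first so you can restore them afterwards); the values here are just the ones from the list above, not a recommendation:

Code:
# sysctl vfs.zfs.scrub_delay=0
# sysctl vfs.zfs.top_maxinflight=8192
# sysctl vfs.zfs.resilver_min_time_ms=5000
# sysctl vfs.zfs.vdev.max_pending=64
# gstat -p

The gstat -p at the end limits the output to physical providers, which makes the per-disk L(q) column easier to watch.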
 
SirDice said:
If the filesystems are actively being used, the resilvering will take much longer. Resilvering a 3 TB disk took almost a day on my server last time. My pool is at 90-95% capacity, so there's a lot to sync.

Hi SirDice, yes, in my case too the pool is being used (for backups, so lots of write I/O).
It is almost finished anyway:

Code:
# zpool status
  pool: zroot
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed May  8 17:26:49 2019
        2.82T scanned out of 3.05T at 8.81M/s, 7h31m to go
        721G resilvered, 92.53% done
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            gpt/zfs0  ONLINE       0     0    31  (resilvering)
            da0p3     ONLINE       0     0     0
            da1p3     ONLINE       0     0     0
            da2p3     ONLINE       0     0     0
 
I find it helpful in these cases to observe some zpool iostat -v 3 (to see what is actually moved from/to the disks) and systat -v (to see how much busy time the disks have, as perceived by the OS, and how big the requests tend to be).
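
Concretely, something like this in two terminals while the resilver runs (the pool name and the refresh intervals are just examples):

Code:
# zpool iostat -v zroot 3
# systat -vmstat 1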
 