ZFS Very poor zpool performance during resilvering

We have a fairly large zpool on our production IMAP server, and performance is miserable while a replacement spindle is resilvered into one of the RAIDZ2 vdevs.

Code:
$ sudo zpool status
Password:
  pool: zpool2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep 19 16:05:14 2017
        31.2T scanned out of 31.3T at 43.2M/s, 0h27m to go
        1.69T resilvered, 99.78% done
config:

        NAME        STATE     READ WRITE CKSUM
        zpool2      ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da45    ONLINE       0     0     0  (resilvering)
            da9     ONLINE       0     0     0
            da44    ONLINE       0     0     0
            da29    ONLINE       0     0     0
            da30    ONLINE       0     0     0
            da31    ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            da32    ONLINE       0     0     0
            da33    ONLINE       0     0     0
            da34    ONLINE       0     0     0
            da35    ONLINE       0     0     0
            da36    ONLINE       0     0     0
            da37    ONLINE       0     0     0
          raidz2-2  ONLINE       0     0     0
            da38    ONLINE       0     0     0
            da39    ONLINE       0     0     0
            da40    ONLINE       0     0     0
            da41    ONLINE       0     0     0
            da42    ONLINE       0     0     0
            da43    ONLINE       0     0     0
        logs
          ada0      ONLINE       0     0     0
        cache
          ada1      ONLINE       0     0     0
        spares
          da47      AVAIL   

errors: No known data errors

The resilver has been reporting about 28 minutes remaining for the last three days, and the percentage complete has not changed.
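In case it helps, here is how we have been watching per-disk activity to see whether the replacement (da45 above) is actually doing I/O, or whether a single drive is dragging the vdev down. This is just a sketch; the pool name and device prefix are taken from the zpool status output above:

Code:
# Per-vdev and per-disk resilver I/O, refreshed every 5 seconds
$ sudo zpool iostat -v zpool2 5

# Per-device latency and %busy; a disk pinned near 100% busy with very
# high ms/r or ms/w is a likely bottleneck
$ gstat -f '^da'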

The machine has 256 GB of RAM, and most measurements suggest it is otherwise healthy:

Code:
$ sudo zfs-stats -a

------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Sep 28 10:35:12 2017
------------------------------------------------------------------------

System Information:

        Kernel Version:                         1002000 (osreldate)
        Hardware Platform:                      amd64
        Processor Architecture:                 amd64

        ZFS Storage pool Version:               5000
        ZFS Filesystem Version:                 5

FreeBSD 10.2-RELEASE-p7 #0: Mon Nov 2 14:19:39 UTC 2015 root
10:35AM  up 20 days, 23:36, 11 users, load averages: 1.66, 1.55, 2.09

------------------------------------------------------------------------

System Memory:

        2.99%   7.45    GiB Active,     56.46%  140.81  GiB Inact
        37.94%  94.60   GiB Wired,      0.05%   115.59  MiB Cache
        2.57%   6.40    GiB Free,       0.00%   1.34    MiB Gap

        Real Installed:                         256.00  GiB
        Real Available:                 99.98%  255.94  GiB
        Real Managed:                   97.43%  249.37  GiB

        Logical Total:                          256.00  GiB
        Logical Used:                   42.45%  108.68  GiB
        Logical Free:                   57.55%  147.32  GiB

Kernel Memory:                                  9.01    GiB
        Data:                           99.70%  8.98    GiB
        Text:                           0.30%   27.56   MiB

Kernel Memory Map:                              249.37  GiB
        Size:                           14.55%  36.30   GiB
        Free:                           85.45%  213.08  GiB

------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                204.23m
        Recycle Misses:                         57.43m
        Mutex Misses:                           54.73k
        Evict Skips:                            882.39m

ARC Size:                               24.58%  61.05   GiB
        Target Size: (Adaptive)         24.59%  61.08   GiB
        Min Size (Hard Limit):          12.50%  31.05   GiB
        Max Size (High Water):          8:1     248.37  GiB

ARC Size Breakdown:
        Recently Used Cache Size:       93.75%  57.27   GiB
        Frequently Used Cache Size:     6.25%   3.82    GiB

ARC Hash Breakdown:
        Elements Max:                           23.67m
        Elements Current:               83.51%  19.77m
        Collisions:                             130.85m
        Chain Max:                              11
        Chains:                                 4.05m

------------------------------------------------------------------------

ARC Efficiency:                                 8.40b
        Cache Hit Ratio:                96.01%  8.06b
        Cache Miss Ratio:               3.99%   335.10m
        Actual Hit Ratio:               90.98%  7.64b

        Data Demand Efficiency:         96.75%  3.23b
        Data Prefetch Efficiency:       4.59%   35.17m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             4.52%   364.56m
          Most Recently Used:           7.39%   595.82m
          Most Frequently Used:         87.37%  7.05b
          Most Recently Used Ghost:     0.20%   15.86m
          Most Frequently Used Ghost:   0.52%   42.29m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  38.72%  3.12b
          Prefetch Data:                0.02%   1.62m
          Demand Metadata:              54.47%  4.39b
          Prefetch Metadata:            6.79%   547.67m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  31.26%  104.77m
          Prefetch Data:                10.01%  33.56m
          Demand Metadata:              33.64%  112.71m
          Prefetch Metadata:            25.09%  84.06m

------------------------------------------------------------------------

L2 ARC Summary: (HEALTHY)
        Passed Headroom:                        112.36m
        Tried Lock Failures:                    908.97k
        IO In Progress:                         353.00k
        Low Memory Aborts:                      2.51k
        Free on Write:                          127.08k
        Writes While Full:                      79.91k
        R/W Clashes:                            7.30k
        Bad Checksums:                          0
        IO Errors:                              0
        SPA Mismatch:                           123.53m

L2 ARC Size: (Adaptive)                         934.70  GiB
        Header Size:                    0.36%   3.40    GiB

L2 ARC Breakdown:                               335.10m
        Hit Ratio:                      28.05%  94.01m
        Miss Ratio:                     71.95%  241.09m
        Feeds:                                  1.81m

L2 ARC Buffer:
        Bytes Scanned:                          133.48  TiB
        Buffer Iterations:                      1.81m
        List Iterations:                        115.01m
        NULL List Iterations:                   688.18k

L2 ARC Writes:
        Writes Sent:                    100.00% 338.79k

------------------------------------------------------------------------

File-Level Prefetch: (HEALTHY)

DMU Efficiency:                                 101.15b
        Hit Ratio:                      88.78%  89.80b
        Miss Ratio:                     11.22%  11.35b

        Colinear:                               11.35b
          Hit Ratio:                    0.01%   591.54k
          Miss Ratio:                   99.99%  11.35b

        Stride:                                 89.04b
          Hit Ratio:                    99.98%  89.03b
          Miss Ratio:                   0.02%   17.81m

DMU Misc:
        Reclaim:                                11.35b
          Successes:                    0.26%   29.58m
          Failures:                     99.74%  11.32b

        Streams:                                745.51m
          +Resets:                      0.01%   52.30k
          -Resets:                      99.99%  745.46m
          Bogus:                                0

------------------------------------------------------------------------

VDEV Cache Summary:                             229.62m
        Hit Ratio:                      28.64%  65.76m
        Miss Ratio:                     59.79%  137.30m
        Delegations:                    11.57%  26.56m

------------------------------------------------------------------------

ZFS Tunables (sysctl):
        kern.maxusers                           16716
        vm.kmem_size                            267761856512
        vm.kmem_size_scale                      1
        vm.kmem_size_min                        0
        vm.kmem_size_max                        1319413950874
        vfs.zfs.trim.max_interval               1
        vfs.zfs.trim.timeout                    30
        vfs.zfs.trim.txg_delay                  32
        vfs.zfs.trim.enabled                    1
        vfs.zfs.vol.unmap_enabled               1
        vfs.zfs.vol.mode                        1
        vfs.zfs.version.zpl                     5
        vfs.zfs.version.spa                     5000
        vfs.zfs.version.acl                     1
        vfs.zfs.version.ioctl                   4
        vfs.zfs.debug                           0
        vfs.zfs.super_owner                     0
        vfs.zfs.sync_pass_rewrite               2
        vfs.zfs.sync_pass_dont_compress         5
        vfs.zfs.sync_pass_deferred_free         2
        vfs.zfs.zio.exclude_metadata            0
        vfs.zfs.zio.use_uma                     1
        vfs.zfs.cache_flush_disable             0
        vfs.zfs.zil_replay_disable              0
        vfs.zfs.min_auto_ashift                 12
        vfs.zfs.max_auto_ashift                 13
        vfs.zfs.vdev.trim_max_pending           10000
        vfs.zfs.vdev.bio_delete_disable         0
        vfs.zfs.vdev.bio_flush_disable          0
        vfs.zfs.vdev.write_gap_limit            4096
        vfs.zfs.vdev.read_gap_limit             32768
        vfs.zfs.vdev.aggregation_limit          131072
        vfs.zfs.vdev.trim_max_active            64
        vfs.zfs.vdev.trim_min_active            1
        vfs.zfs.vdev.scrub_max_active           2
        vfs.zfs.vdev.scrub_min_active           1
        vfs.zfs.vdev.async_write_max_active     10
        vfs.zfs.vdev.async_write_min_active     1
        vfs.zfs.vdev.async_read_max_active      3
        vfs.zfs.vdev.async_read_min_active      1
        vfs.zfs.vdev.sync_write_max_active      10
        vfs.zfs.vdev.sync_write_min_active      10
        vfs.zfs.vdev.sync_read_max_active       10
        vfs.zfs.vdev.sync_read_min_active       10
        vfs.zfs.vdev.max_active                 1000
        vfs.zfs.vdev.async_write_active_max_dirty_percent 60
        vfs.zfs.vdev.async_write_active_min_dirty_percent 30
        vfs.zfs.vdev.mirror.non_rotating_seek_inc 1
        vfs.zfs.vdev.mirror.non_rotating_inc    0
        vfs.zfs.vdev.mirror.rotating_seek_offset 1048576
        vfs.zfs.vdev.mirror.rotating_seek_inc   5
        vfs.zfs.vdev.mirror.rotating_inc        0
        vfs.zfs.vdev.trim_on_init               1
        vfs.zfs.vdev.cache.bshift               16
        vfs.zfs.vdev.cache.size                 67108864
        vfs.zfs.vdev.cache.max                  65536
        vfs.zfs.vdev.metaslabs_per_vdev         200
        vfs.zfs.txg.timeout                     5
        vfs.zfs.space_map_blksz                 4096
        vfs.zfs.spa_slop_shift                  5
        vfs.zfs.spa_asize_inflation             24
        vfs.zfs.deadman_enabled                 1
        vfs.zfs.deadman_checktime_ms            5000
        vfs.zfs.deadman_synctime_ms             1000000
        vfs.zfs.recover                         0
        vfs.zfs.spa_load_verify_data            1
        vfs.zfs.spa_load_verify_metadata        1
        vfs.zfs.spa_load_verify_maxinflight     10000
        vfs.zfs.check_hostid                    1
        vfs.zfs.mg_fragmentation_threshold      85
        vfs.zfs.mg_noalloc_threshold            0
        vfs.zfs.condense_pct                    200
        vfs.zfs.metaslab.bias_enabled           1
        vfs.zfs.metaslab.lba_weighting_enabled  1
        vfs.zfs.metaslab.fragmentation_factor_enabled 1
        vfs.zfs.metaslab.preload_enabled        1
        vfs.zfs.metaslab.preload_limit          3
        vfs.zfs.metaslab.unload_delay           8
        vfs.zfs.metaslab.load_pct               50
        vfs.zfs.metaslab.min_alloc_size         33554432
        vfs.zfs.metaslab.df_free_pct            4
        vfs.zfs.metaslab.df_alloc_threshold     131072
        vfs.zfs.metaslab.debug_unload           0
        vfs.zfs.metaslab.debug_load             0
        vfs.zfs.metaslab.fragmentation_threshold 70
        vfs.zfs.metaslab.gang_bang              16777217
        vfs.zfs.free_max_blocks                 -1
        vfs.zfs.no_scrub_prefetch               0
        vfs.zfs.no_scrub_io                     0
        vfs.zfs.resilver_min_time_ms            3000
        vfs.zfs.free_min_time_ms                1000
        vfs.zfs.scan_min_time_ms                1000
        vfs.zfs.scan_idle                       50
        vfs.zfs.scrub_delay                     4
        vfs.zfs.resilver_delay                  2
        vfs.zfs.top_maxinflight                 32
        vfs.zfs.zfetch.array_rd_sz              1048576
        vfs.zfs.zfetch.block_cap                256
        vfs.zfs.zfetch.min_sec_reap             2
        vfs.zfs.zfetch.max_streams              8
        vfs.zfs.prefetch_disable                0
        vfs.zfs.delay_scale                     500000
        vfs.zfs.delay_min_dirty_percent         60
        vfs.zfs.dirty_data_sync                 67108864
        vfs.zfs.dirty_data_max_percent          10
        vfs.zfs.dirty_data_max_max              4294967296
        vfs.zfs.dirty_data_max                  4294967296
        vfs.zfs.max_recordsize                  1048576
        vfs.zfs.mdcomp_disable                  0
        vfs.zfs.nopwrite_enabled                1
        vfs.zfs.dedup.prefetch                  1
        vfs.zfs.l2c_only_size                   977364976128
        vfs.zfs.mfu_ghost_data_lsize            1179619840
        vfs.zfs.mfu_ghost_metadata_lsize        4112842752
        vfs.zfs.mfu_ghost_size                  5292462592
        vfs.zfs.mfu_data_lsize                  1801214976
        vfs.zfs.mfu_metadata_lsize              14357702656
        vfs.zfs.mfu_size                        26676616192
        vfs.zfs.mru_ghost_data_lsize            8783111680
        vfs.zfs.mru_ghost_metadata_lsize        49993475072
        vfs.zfs.mru_ghost_size                  58776586752
        vfs.zfs.mru_data_lsize                  2081698816
        vfs.zfs.mru_metadata_lsize              2410155520
        vfs.zfs.mru_size                        6321306112
        vfs.zfs.anon_data_lsize                 0
        vfs.zfs.anon_metadata_lsize             0
        vfs.zfs.anon_size                       1586417152
        vfs.zfs.l2arc_norw                      1
        vfs.zfs.l2arc_feed_again                1
        vfs.zfs.l2arc_noprefetch                1
        vfs.zfs.l2arc_feed_min_ms               200
        vfs.zfs.l2arc_feed_secs                 1
        vfs.zfs.l2arc_headroom                  2
        vfs.zfs.l2arc_write_boost               8388608
        vfs.zfs.l2arc_write_max                 8388608
        vfs.zfs.arc_meta_limit                  66672028672
        vfs.zfs.arc_free_target                 453222
        vfs.zfs.arc_shrink_shift                5
        vfs.zfs.arc_average_blocksize           8192
        vfs.zfs.arc_min                         33336014336
        vfs.zfs.arc_max                         266688114688

------------------------------------------------------------------------
 
Is the ZFS pool more than 80% filled with data? If it is, resilvering can take more than a month. I know it is ridiculous, but this is one of those things that make you want to use hardware RAID and HAMMER.
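You can check that directly at the pool level: the CAP and FRAG columns of zpool list show how full the pool is and how fragmented the free space is, both of which affect resilver speed. A minimal check, using the pool name from your zpool status output:

Code:
# CAP = allocated percentage, FRAG = fragmentation of the free space
$ zpool list zpool2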
 
That's actually an interesting question. Because this is an IMAP server, we keep an extensive history of snapshots in case someone accidentally deletes something. But even with that, we're not approaching 80%:
Code:
$ zfs list -o space
NAME                  AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
zpool2                42.0T  20.9T     16.0K    192K              0      20.9T
zpool2/CyrusDB-Sieve  42.0T   311G      310G    509M              0          0
zpool2/SAVES          42.0T   255G     1.37G    254G              0          0
zpool2/a              42.0T  8.70T     7.78T    943G              0          0
zpool2/b              42.0T   448G      164G    283G              0          0
zpool2/c              42.0T   890G      205G    685G              0          0
zpool2/d              42.0T  1002G      237G    765G              0          0
zpool2/e              42.0T   453G      119G    334G              0          0
zpool2/f              42.0T   122G     23.8G   97.9G              0          0
zpool2/g              42.0T   305G     74.4G    230G              0          0
zpool2/h              42.0T   411G      117G    293G              0          0
zpool2/i              42.0T  60.7G     19.5G   41.2G              0          0
zpool2/j              42.0T  1.79T      502G   1.30T              0          0
zpool2/k              42.0T   802G      158G    644G              0          0
zpool2/l              42.0T   506G      151G    355G              0          0
zpool2/m              42.0T  1.40T      351G   1.05T              0          0
zpool2/n              42.0T   266G      107G    160G              0          0
zpool2/o              42.0T  36.3G     8.65G   27.7G              0          0
zpool2/p              42.0T   350G     55.2G    295G              0          0
zpool2/q              42.0T   911M      328M    583M              0          0
zpool2/r              42.0T   715G      166G    550G              0          0
zpool2/s              42.0T  1.07T      243G    855G              0          0
zpool2/t              42.0T   529G      142G    387G              0          0
zpool2/u              42.0T  79.4G     66.1G   13.2G              0          0
zpool2/v              42.0T   152G     44.9G    108G              0          0
zpool2/w              42.0T   214G     71.8G    143G              0          0
zpool2/x              42.0T  15.0G     2.62G   12.4G              0          0
zpool2/y              42.0T  84.3G     18.0G   66.4G              0          0
zpool2/z              42.0T  58.4G     14.5G   43.9G              0          0
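Since the pool is nowhere near 80% full, one thing that might help in the meantime is to bias the ZFS I/O scheduler toward the resilver using the tunables already visible in the zfs-stats output above, at the cost of extra latency for the IMAP workload while it runs. This is only a sketch; the values below are illustrative, not tested recommendations:

Code:
# Temporarily favor resilver I/O over regular pool I/O.
# Defaults from zfs-stats above: resilver_delay=2, scan_idle=50,
# resilver_min_time_ms=3000, top_maxinflight=32 -- restore them with
# the same sysctl commands once the resilver completes.
$ sudo sysctl vfs.zfs.resilver_delay=0
$ sudo sysctl vfs.zfs.scan_idle=5
$ sudo sysctl vfs.zfs.resilver_min_time_ms=5000
$ sudo sysctl vfs.zfs.top_maxinflight=128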
 