ZFS Performance Issue

Hi

I have a FreeBSD 10.2-RELEASE-p9 system with a fairly large zpool.
Code:
root@freebsd03:~ # zpool list
NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
s12d33  54.5T  49.5T  5.02T         -    49%    90%  1.00x  ONLINE  -

root@freebsd03:~ # zpool status
  pool: s12d33
 state: ONLINE
  scan: none requested
config:

        NAME                           STATE     READ WRITE CKSUM
        s12d33                         ONLINE       0     0     0
          raidz2-0                     ONLINE       0     0     0
            multipath/J12F12-1EJDBAEJ  ONLINE       0     0     0
            multipath/J12F13-1EJDGHWJ  ONLINE       0     0     0
            multipath/J12F14-1EJAWSMJ  ONLINE       0     0     0
            multipath/J12F15-1EJDGL9J  ONLINE       0     0     0
            multipath/J12F16-1EJAUE5J  ONLINE       0     0     0
          raidz2-1                     ONLINE       0     0     0
            multipath/J12F17-1EJD9K1J  ONLINE       0     0     0
            multipath/J12F18-1EJAUZ4J  ONLINE       0     0     0
            multipath/J12F19-1EJ9PP2J  ONLINE       0     0     0
            multipath/J12F20-1EJ7X50J  ONLINE       0     0     0
            multipath/J12F21-1EJAUNKJ  ONLINE       0     0     0

errors: No known data errors
It's a backup server, so it does a lot of reads and writes; it has been working fine and performance has been OK.

The server also runs zrep (a zfs send/recv script) and replicates to another server. It syncs every 10 minutes, and normally that's enough of a window to complete a send/recv.
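Roughly speaking, each sync boils down to an incremental send of the latest snapshot to the other box (the dataset, snapshot and host names below are just placeholders to illustrate, not the real ones):
Code:
# illustrative only - dataset/snapshot/host names are placeholders
zfs snapshot s12d33/backups@zrep_new
zfs send -i s12d33/backups@zrep_prev s12d33/backups@zrep_new | \
    ssh backuphost zfs receive -F s12d33/backups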

Lately the backup application has been struggling and becomes very sluggish. The zfs send/recv is taking longer and longer, and once it goes down this path the server I/O seems to get slower and slower to the point where I have to reboot the server. The console is fine and responsive, but running any zfs command takes a while to come back, e.g. zfs list might take 30-40 seconds when normally it's instant.

After a reboot it will come up and work fine for 4-5 days and then go again, although the pattern does seem to indicate it's more likely to struggle when it's under heavier load.

The server has plenty of memory, and I also limit the ARC to 4G via vfs.zfs.arc_max="4G" in /boot/loader.conf.
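For reference, this is the tunable as I have it set (and I assume reading vfs.zfs.arc_max back via sysctl after boot is the right way to confirm it took effect):
Code:
# /boot/loader.conf
vfs.zfs.arc_max="4G"

# after boot, the limit should be visible (in bytes) via:
sysctl vfs.zfs.arc_max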

After researching, I believe my issue is down to using too much space; the zpool is around 90% full, and I now understand performance can start to degrade anywhere after 80%?

I was wondering what my options are; would either of the following help?

1. Adding another vdev to the pool to increase capacity to bring it back under 80% utilization?
2. Deleting some data?

From researching, it would appear the best course of action is to build another (larger) pool and transfer the data out. With the amount of data I have and the frequency it updates, that's not going to be an easy option, so I was hoping it can be improved without transferring the whole lot?
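(For completeness, my understanding is that a full migration would be a recursive send of a whole-pool snapshot into the new pool, something like the sketch below; "newpool" and the snapshot name are placeholders.)
Code:
# illustrative sketch of a whole-pool migration - newpool and @migrate are placeholders
zfs snapshot -r s12d33@migrate
zfs send -R s12d33@migrate | zfs receive -F newpool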

Any assistance appreciated.

Kind Regards

Paul
 
After researching, I believe my issue is down to using too much space; the zpool is around 90% full, and I now understand performance can start to degrade anywhere after 80%?
That's most likely the reason. Remember that ZFS is a copy-on-write filesystem: every write has to find fresh free space, and finding it gets progressively more expensive as the pool fills up and the remaining free space becomes fragmented.

I was wondering what my options are; would either of the following help?

1. Adding another vdev to the pool to increase capacity to bring it back under 80% utilization?
2. Deleting some data?
Either one will do the trick.
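For option 1, adding another vdev is a one-liner, but it should match the existing geometry (another 5-disk raidz2), and it's worth a dry run first because zpool add is permanent. The device names below are obviously placeholders:
Code:
# dry run first (-n only shows the resulting layout; device names are placeholders)
zpool add -n s12d33 raidz2 multipath/NEW1 multipath/NEW2 multipath/NEW3 \
    multipath/NEW4 multipath/NEW5
# if the layout looks right, run it again without -n
zpool add s12d33 raidz2 multipath/NEW1 multipath/NEW2 multipath/NEW3 \
    multipath/NEW4 multipath/NEW5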
 
The vdevs consist of multipath devices that, considering the total size of the pool, might be multiple drives or "virtual" block devices exposed by another filesystem: what additional layer lurks underneath ZFS?

The main reasons are probably the high allocation and the very small ARC, but if you still have issues after fixing those, have a look at (and remove or reduce) any additional layers. ZFS performs best when talking directly to hardware and acts in the weirdest ways when running on top of other layers/filesystems that might choke on the load ZFS places on them, especially if the actual disks underneath are shared between block devices. (ZFS on top of block devices exposed by LVM is a great way to bring the performance of both systems involved to its knees...)
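A quick way to see what is actually underneath those multipath providers is to look at them with the standard FreeBSD tools (this is just the sort of check I mean, output omitted):
Code:
# show the state and member disks of each multipath provider
gmultipath status
gmultipath list
# list the physical disks/paths the HBA actually sees
camcontrol devlist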
 
Hi

Thanks for all your advice, I am currently reducing the amount of data to see if that helps.

I tried increasing the ARC by a couple more GB, to 6GB. The reason I had it set low is that the server has 24GB RAM, the backup application is Java based with a 10GB limit set, and I wanted to leave enough RAM just for the OS.

Also, because the data is backup data it doesn't generally get read, i.e. it's not a file server where people are accessing it frequently. However, I did install the zfs-stats program and the cache hits seem high, so I think I've misunderstood what the ARC is used for. The stats below are from about 7 hours after a reboot of the server:
Code:
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                472.91k
        Recycle Misses:                         203.08k
        Mutex Misses:                           22
        Evict Skips:                            681.55k

ARC Size:                               100.00% 6.00    GiB
        Target Size: (Adaptive)         100.00% 6.00    GiB
        Min Size (Hard Limit):          33.33%  2.00    GiB
        Max Size (High Water):          3:1     6.00    GiB

ARC Size Breakdown:
        Recently Used Cache Size:       93.72%  5.62    GiB
        Frequently Used Cache Size:     6.28%   385.83  MiB

ARC Hash Breakdown:
        Elements Max:                           152.17k
        Elements Current:               91.95%  139.92k
        Collisions:                             36.76k
        Chain Max:                              3
        Chains:                                 3.44k

------------------------------------------------------------------------

ARC Efficiency:                                 17.34m
        Cache Hit Ratio:                95.77%  16.61m
        Cache Miss Ratio:               4.23%   733.14k
        Actual Hit Ratio:               71.41%  12.38m

        Data Demand Efficiency:         99.89%  10.27m
        Data Prefetch Efficiency:       94.71%  4.35m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             24.54%  4.08m
          Most Recently Used:           5.49%   911.55k
          Most Frequently Used:         69.07%  11.47m
          Most Recently Used Ghost:     0.38%   63.86k
          Most Frequently Used Ghost:   0.51%   84.86k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  61.76%  10.26m
          Prefetch Data:                24.81%  4.12m
          Demand Metadata:              12.79%  2.12m
          Prefetch Metadata:            0.65%   108.39k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  1.49%   10.96k
          Prefetch Data:                31.37%  230.00k
          Demand Metadata:              57.70%  423.03k
          Prefetch Metadata:            9.43%   69.16k
Strangely, after increasing the max ARC size by 2GB (it's a virtual machine, so I also allocated it another 2GB of physical RAM), it slowed down sooner than usual, within 24 hours, whereas normally it doesn't see the issue for at least 3-4 days. As mentioned, I am moving data off it to another server, so it was doing a large transfer, but that would be reading, not writing; my understanding is the over-80% problem is more to do with writing data?
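(I assume the easiest way to confirm the read/write mix while it's slow is to watch the pool with zpool iostat, e.g.:)
Code:
# per-vdev read/write ops and bandwidth, refreshed every 5 seconds
zpool iostat -v s12d33 5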

The setup is that I created the pool with raw access to the whole disks. It's a VM, but I use PCI passthrough to give the VM access to an LSI PCI card (non-RAID), and that gets direct access to the disks.

I have been reading about metaslabs and why write speed degrades past 80%. I saw references to a metaslab debugging mode which keeps the space maps in memory and is supposed to resolve the write issues, but I can only find it documented for Solaris. Does it work on FreeBSD as well?
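The sort of thing the Solaris articles set is metaslab_debug (or metaslab_debug_load/metaslab_debug_unload on newer releases). I'm guessing FreeBSD exposes equivalents as sysctls, but the exact names below are just my assumption, which is really what I'm asking about:
Code:
# first check what this kernel actually exposes
sysctl -a | grep metaslab
# my guess at the FreeBSD equivalents (unverified - please correct me):
sysctl vfs.zfs.metaslab.debug_load=1     # load all space maps at pool import
sysctl vfs.zfs.metaslab.debug_unload=1   # keep space maps in memory once loaded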

Also, could that be why I'm getting hits on the ARC cache? Does the metaslab info get stored in the ARC?

Thanks

Paul
 