ZFS Performance Troubleshooting

So, I've looked around for a few days (actually months, but I never got around to really sitting down and troubleshooting) and haven't been able to find anything that applies to my setup/configuration, so I thought I'd ask for any advice folks might have regarding ZFS read performance. My contiguous read performance is pretty bad, hovering at around 50 MB/s for any non-cached read. Cached reads, as you might guess, are extremely fast, but my main use of this storage system is copying large files back and forth. I've tried troubleshooting on my own but haven't found the solution. Writes are faster than reads, hovering in the 100+ MB/s range (and since I'm gigabit Ethernet limited, that's more than good enough for me).

My pool is a v28 raidz2 of 5 drives, each 2 TB. In case it might make a difference, four of the drives are SAMSUNG HD204UI 1AQ10001 and the fifth is a SAMSUNG HD203WI 1AN10002.

I've used iozone as my benchmarking utility and ran three different types of tests, varying the number of threads and the size of the file. The test I see on a lot of forums (32 threads, 40960 KB file size) performs well, but since it doesn't match my usage of the drives, I also ran 5 threads with 1 GB files and 1 thread with a 10 GB file.

Here are the three sets of tests:
http://pastebin.com/wpfDrLN9
http://pastebin.com/0jFSxB4N
http://pastebin.com/QJhW6kz5
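
For reference, the three runs were invoked roughly like this (flags reconstructed from memory, so they may differ slightly from the pastebins; the 128k record size is just an assumption to match the default ZFS recordsize):
Code:
iozone -i 0 -i 1 -r 128k -s 40960k -t 32    # the "32 threads, 40960" run seen on the forums
iozone -i 0 -i 1 -r 128k -s 1g -t 5         # 5 threads, 1 GB files
iozone -i 0 -i 1 -r 128k -s 10g             # 1 thread, 10 GB file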

One thing I'll note: on the third test (1 thread, 10 GB file), gstat never showed the %busy of any of the drives in the zpool going past 30% during read operations (write operations pushed the %busy to 100%, in bursts, as expected).
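
(For what it's worth, the gstat invocation was just something along these lines, filtered to the pool members:)
Code:
gstat -f 'ada[3-7]'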

Part of my dmesg:
Code:
FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
CPU: Intel(R) Pentium(R) Dual  CPU  E2160  @ 1.80GHz (1804.13-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6fd  Family = 6  Model = f  Stepping = 13
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,
  DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 9395240960 (8960 MB)
avail memory = 8235347968 (7853 MB)
Relevant line in /etc/sysctl.conf:
Code:
kern.maxvnodes=250000

Relevant lines in /boot/loader.conf:
Code:
vfs.zfs.prefetch_disable="1"
vfs.zfs.txg.timeout="5"
vm.kmem_size=9G

Any help would be greatly appreciated.
 
Code:
vm.kmem_size=9G

Please don't touch this setting unless you really understand what it does. I bet your poor performance is partly caused by this setting, which should be left at its default.
 
First, when running the tests, you should run the following command and post the result.

$ zpool iostat -v 1 10

Second, are you sure you want to disable prefetch?
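
If you do want it back on, setting the tunable to 0 in /boot/loader.conf (or removing the line entirely) and rebooting should do it:
Code:
vfs.zfs.prefetch_disable="0"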
 
Are you interested in the operations per second or the bandwidth? I'm not entirely sure what the first 10 seconds of that will reveal ... since the iozone tests take a while. But I'll grab a few anyway (from test 3 -- 1 thread, 10 GB file) to give some indication of what I've been seeing.

Writing Test:
Code:
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.63T  1.43T      0  1.63K      0   203M
  raidz2    7.63T  1.43T      0  1.63K      0   203M
    ada3        -      -      0    743      0  78.5M
    ada4        -      -      0    753      0  79.6M
    ada5        -      -      0    741      0  77.6M
    ada6        -      -      0    717      0  75.0M
    ada7        -      -      0    655      0  68.6M
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.63T  1.43T      0  2.42K      0   306M
  raidz2    7.63T  1.43T      0  2.42K      0   306M
    ada3        -      -      0  1.16K      0   125M
    ada4        -      -      0  1.17K      0   126M
    ada5        -      -      0  1.11K      0   120M
    ada6        -      -      0  1.10K      0   118M
    ada7        -      -      0    980      0   103M
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.63T  1.43T      0    734      0  91.9M
  raidz2    7.63T  1.43T      0    734      0  91.9M
    ada3        -      -      0    400      0  42.2M
    ada4        -      -      0    399      0  42.2M
    ada5        -      -      0    379      0  40.0M
    ada6        -      -      0    365      0  38.5M
    ada7        -      -      0    294      0  31.0M
----------  -----  -----  -----  -----  -----  -----


Reading Test:
Code:
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.63T  1.43T    408      0  51.1M      0
  raidz2    7.63T  1.43T    408      0  51.1M      0
    ada3        -      -    245      0  10.2M      0
    ada4        -      -    243      0  10.2M      0
    ada5        -      -    242      0  10.1M      0
    ada6        -      -    246      0  10.3M      0
    ada7        -      -    248      0  10.4M      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.63T  1.43T    385      0  48.2M      0
  raidz2    7.63T  1.43T    385      0  48.2M      0
    ada3        -      -    231      0  9.65M      0
    ada4        -      -    229      0  9.56M      0
    ada5        -      -    229      0  9.58M      0
    ada6        -      -    230      0  9.62M      0
    ada7        -      -    234      0  9.78M      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.63T  1.43T    392      0  49.1M      0
  raidz2    7.63T  1.43T    392      0  49.1M      0
    ada3        -      -    236      0  9.86M      0
    ada4        -      -    231      0  9.65M      0
    ada5        -      -    233      0  9.74M      0
    ada6        -      -    236      0  9.87M      0
    ada7        -      -    238      0  9.95M      0
----------  -----  -----  -----  -----  -----  -----



Reverse-reader test:

Code:
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.63T  1.43T    116      0  14.6M      0
  raidz2    7.63T  1.43T    116      0  14.6M      0
    ada3        -      -     68      0  2.86M      0
    ada4        -      -     64      0  2.71M      0
    ada5        -      -     69      0  2.92M      0
    ada6        -      -     73      0  3.09M      0
    ada7        -      -     72      0  3.03M      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.63T  1.43T    107      0  13.5M      0
  raidz2    7.63T  1.43T    107      0  13.5M      0
    ada3        -      -     63      0  2.66M      0
    ada4        -      -     63      0  2.67M      0
    ada5        -      -     64      0  2.70M      0
    ada6        -      -     65      0  2.75M      0
    ada7        -      -     64      0  2.70M      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.63T  1.43T    109      0  13.7M      0
  raidz2    7.63T  1.43T    109      0  13.7M      0
    ada3        -      -     71      0  2.99M      0
    ada4        -      -     70      0  2.95M      0
    ada5        -      -     59      0  2.49M      0
    ada6        -      -     62      0  2.61M      0
    ada7        -      -     61      0  2.61M      0
----------  -----  -----  -----  -----  -----  -----


As I mentioned, I was running gstat during the tests and when looking at the performance, reads were never even close to stressing the drives.

I don't think prefetch matters very much in my situation, since it's a home NAS and I'm the only user. And operations on this storage device are generally relatively large and contiguous, so the benefit of prefetch is largely diminished, if not eliminated. Also, I'm pretty sure I disabled it in pursuit of troubleshooting this issue ...
 
From what I understand, prefetch should be beneficial in your case. It is only when most of the reads are random, like in a database, that prefetch will not help.

Now I see that your pool is over 80% full, and from the description of your usage, your pool may be highly fragmented. But this is just my guess. Maybe you should consult the fs mailing list instead.
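
You can see how full it is from the CAP column of zpool list:
Code:
$ zpool list storage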
 
I'll re-enable prefetch when I get a chance and re-test. I'll also run some dtrace scripts to watch the prefetching, hopefully should give me some insight.

A good portion of the data is from another pool that I migrated from. I 'upgraded' by just creating a new pool of disks and copying everything over, which should have defragmented any of those files. And the majority of data that's copied to this storage is not deleted, so it's doubtful that fragmentation is an issue. And from what I understand, defragmentation hasn't been implemented in ZFS, so I'm not sure I can fix anything there. I'll see if I can move/delete some data so the pool is down to around 70% used and see if the problem persists.

It very well could be an fs problem. I'm going to try and exhaust any avenues related to the OS level stuff before I delve into the fs though.
 
Pool full problems are solved by adding more vdevs or recreating the pool from scratch using bigger disks.
 
The pool's not full and I cleared up space anyway, but the problem persists.

The suggestion is appreciated, but I'm well aware as to how to increase the size of a pool/vdev -- and adding more drives is not a solution in my case. Btw, you forgot about replacing individual drives in the vdev with larger drives (scrubbing after each one), exporting then re-importing, which is how I upgraded the size of one of my pools 4 years ago (this pool is a different one).

I should probably add/mention that I did a scrub recently and the scrub was running at ~250 MB/s, which, I'm assuming, is what the maximum theoretical read speed should look like. So what does a scrub do differently when reading data? It should, by definition, read every block of data and compute checksums, so I'd think it'd be somewhat 'slow'.
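
(For what it's worth, that figure is just the rate zpool status reports on the "scanned ... at .../s" line while the scrub is running:)
Code:
$ zpool status storage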
 
Check this ZFS best practice guide and also this mailing list conversation. Due to the I/O size, there was a discussion about how many disks should be in a vdev of a given type.

You also didn't mention how those disks are connected to the system (motherboard type, HBA type, etc.) or when the performance issue started to occur (from the beginning, during the last update, etc.).
 
Could you test the speed of your system with the following setting?

# zfs set primarycache=none yourpool

Also, is your system a pure ZFS system or a UFS+ZFS hybrid, and how much swap is in use normally?
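
(swapinfo(8) will show the current usage:)
Code:
$ swapinfo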
 
So this is one of those moments where I need to kick myself repeatedly for not going back to the basics. I commented out all of the related lines in /boot/loader.conf and rebooted to start from scratch. Lo and behold, read speed jumped up to ~300 MB/s in iozone (the third test).
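
For reference, the relevant part of /boot/loader.conf is now just:
Code:
#vfs.zfs.prefetch_disable="1"
#vfs.zfs.txg.timeout="5"
#vm.kmem_size=9G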

@matoalantis is right, I probably should have given more background on the issue. Since this is 'solved', I won't delve too deep, but as a little background, this machine has been upgraded over the years. I believe I started using ZFS on this system starting with 7.X (I think 7.2, but it's been quite a while), and the speed problem appeared sometime during the upgrades to 8.X (I initially noticed it when using 8.2; I'm now on 9.0). I don't often check the speed of the system, since it's a low-priority machine, so it could have occurred at any time between those events. If I had to guess, I added those lines way back when and they 'fixed' whatever problem I had at the time. But I'm guessing the code has improved considerably, so they had the opposite effect after all of those upgrades ...

Thanks to everyone who's helped!
 
kpa said:
Pool full problems are solved by adding more vdevs or recreating the pool from scratch using bigger disks.

You don't need to rebuild it from scratch.

First, enable the "autoexpand" property on the pool.

Then, either "zpool replace" (raidz) or "zpool attach" (mirror) each disk in a vdev, waiting for the resilver to complete between each invocation. Once the last disk in the vdev has been replaced, the pool will automatically expand to use the extra free space.
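
For a raidz vdev that boils down to something like the following (pool and device names here are placeholders):
Code:
# zpool set autoexpand=on mypool
# zpool replace mypool ada0 ada8    # repeat for each disk in the vdev
# zpool status mypool               # wait for the resilver to finish before the next replace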
 