I'm putting together a small ZFS file server using 9.0-RC1, and I'm noticing strange sequential read performance behavior. With 2 drives in a pool, each as its own vdev, I actually get better read performance than with 4 drives arranged as two mirror vdevs! This goes against the common wisdom that striped mirrors give really good performance, and I would expect the performance to be at least equal, so something must be horribly wrong here.
Here is my system configuration:
Motherboard: Supermicro X8SIL-F w/ Intel 3420 chipset + 6 AHCI SATA ports
CPU: Intel Core i5-650 (dual core)
Memory: 4GB DDR3-1333 ECC (more is on the way)
Hard drives: 2x Seagate ST2000DL003-9VT166 (Barracuda Green), 2x Samsung HD204UI (F4 EcoGreen). All 2TB, All Advanced format (4K sectors).
zpool version 28, ashift=12 via the gnop trick (sketched below)
filesystem version 5
Checksum: fletcher4, compression: off
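For reference, the gnop trick went roughly like this (a sketch, using this system's device names): a 4K-sector nop provider is layered on each disk so that ZFS picks ashift=12 at pool creation, and since ashift is per-vdev, each top-level vdev needs one. Afterwards the nop devices are removed and the pool reimported from the raw disks:
Code:
# gnop create -S 4096 ada1
# gnop create -S 4096 ada0
# zpool create zroot ada1.nop ada0.nop
# zpool export zroot
# gnop destroy ada1.nop ada0.nop
# zpool import zroot
# zdb zroot | grep ashift
zdb should report ashift: 12 for every vdev.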
Drive detection:
Code:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SAMSUNG HD204UI 1AQ10001> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST2000DL003-9VT166 CC3C> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <SAMSUNG HD204UI 1AQ10001> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad10
We start out with a simple non-redundant pool with two drives in separate vdevs:
$ zpool status
Code:
pool: zroot
state: ONLINE
scan: resilvered 11.9G in 0h1m with 0 errors on Sun Oct 23 23:21:19 2011
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
ada1 ONLINE 0 0 0
ada0 ONLINE 0 0 0
errors: No known data errors
We create a 10 GB test file, which generously exceeds any possible ARC size on a 4 GB machine:
$ dd if=/dev/random of=random.bin bs=1m count=10000
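(With only 4 GB of RAM, the ARC cannot come close to caching 10 GB. If you want to double-check, both the configured cap and the current ARC size are exposed as sysctls, in bytes:)
Code:
$ sysctl vfs.zfs.arc_max
$ sysctl kstat.zfs.misc.arcstats.size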
Now we wait out the 30-second transaction flush period so the thing is completely quiet, and then we attempt to read it:
$ dd if=random.bin of=/dev/null bs=1m
Code:
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 41.524555 secs (252519503 bytes/sec)
Repeating the command yields the same results.
Seeing that both drives do at least 130 MB/sec on their own (the Seagates being a bit faster at 145 MB/sec), I think this is a very nice figure.
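(The per-drive figures come from reading the raw devices directly, outside of ZFS; something like this, reading 4 GB from the fast outer tracks:)
Code:
# dd if=/dev/ada1 of=/dev/null bs=1m count=4096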
$ zpool iostat -v 1
indeed shows that both vdevs handle half of the traffic. Great stuff. CPU utilization hovers around 11-12% (system).
Now we make it a redundant pool, with the Seagates in one mirror vdev and the Samsungs in the other:
# zpool attach zroot ada0 ada2
# zpool attach zroot ada1 ada3
After resilvering, the pool looks like this:
$ zpool status
Code:
pool: zroot
state: ONLINE
scan: resilvered 11.9G in 0h1m with 0 errors on Mon Oct 24 00:11:57 2011
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ada1 ONLINE 0 0 0
ada3 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
ada0 ONLINE 0 0 0
ada2 ONLINE 0 0 0
errors: No known data errors
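To be sure the resilver and any pending transaction groups have fully drained, we can watch the pool until the write columns stay at zero:
Code:
$ zpool iostat zroot 1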
Once the pool is completely quiet again, we read the same file back:
$ dd if=random.bin of=/dev/null bs=1m
Code:
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 74.110371 secs (141488429 bytes/sec)
So more than 100 MB/sec has just vanished into the big void of emptiness. This time,
$ zpool iostat -v 1
reports that all four drives do a 35 MB/sec share each. CPU utilization is less stable in this test, fluctuating between 7% and 17%, which is also a bit strange.
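It might be revealing to watch per-disk service times and utilization during the slow read; high ms/r at low MB/s would suggest the drives are seeking between interleaved requests rather than streaming. gstat(8) shows exactly that (the regex matches just the four whole disks, not their partitions):
Code:
$ gstat -f 'ada[0-3]$'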
As for write performance, both pool configurations manage about 220 MB/sec when writing a zero-filled file like this:
$ dd if=/dev/zero of=zero.bin bs=1m count=10000
Reading this file instead of the randomly filled one, or mixing the drive brands within a vdev, makes no difference.
Lowering the vfs.zfs.vdev.min_pending and vfs.zfs.vdev.max_pending tunables (NCQ is already active) does not make any difference either.
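For completeness, this is how I lowered them at runtime (example values; if these turn out to be boot-time-only tunables on 9.0-RC1, the same settings belong in /boot/loader.conf instead):
Code:
# sysctl vfs.zfs.vdev.min_pending=1
# sysctl vfs.zfs.vdev.max_pending=4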
So I am hitting a roadblock here. Does anyone have experience with such a performance difference, and can anyone advise how to solve it or which tunables I would need to set? Keeping the pool non-redundant is obviously not an option, and I want to pop in another mirror set later if I need more storage space (added as shown below).
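(For the record, growing the pool later would just be another mirror vdev added to the stripe; hypothetical device names, and the same gnop trick as above would apply to keep ashift=12:)
Code:
# zpool add zroot mirror ada4 ada5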
I cannot see the CPU being the bottleneck here, and if it were the RAM, I would expect at least the same throughput with mirrors...
Thanks for any help you can give me.