ZFS performance

Hi,

I'm running FreeBSD 9.0-BETA2 without debugging enabled in the kernel. The rest is pretty much "standard". The system is an Atom 330 with 4 GB of RAM. The hard disk is a Samsung Spinpoint F3R (1TB, 7200rpm, 32MB cache):

Code:
ada0 at ahcich0 bus 0 scbus1 target 0 lun 0
ada0: <SAMSUNG HE103SJ 1AJ10001> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
bonnie++ on a UFS partition:
Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nudel.salatschue 6G   104  99 121694  56 49653  43   249  98 124928  35 208.7   9
Latency             88762us     282ms     957ms     262ms   63558us    1664ms
Version  1.96       ------Sequential Create------ --------Random Create--------
nudel.salatschuesse -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1143  13 +++++ +++  2050  15  1723  18 +++++ +++  3048  22
Latency             18147us      61us    2022us   18177us      58us    1770us

bonnie++ on a ZFS partition:
Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nudel.salatschue 6G    20  99 89584  94 29715  36    47  99 60783  26 173.5   9
Latency               582ms     522ms    1530ms     265ms   63363us     744ms
Version  1.96       ------Sequential Create------ --------Random Create--------
nudel.salatschuesse -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5575  99 20731  99  4801  98  5575  97 21402  99  4859  96
Latency             34093us     257us   14732us   33685us     101us   26856us

ZFS creation:

Code:
nudel# gpart modify -i 3 -t freebsd-zfs ada1
ada1p3 modified
nudel# zpool create -m legacy test /dev/gpt/disk1
nudel# zfs set checksum=fletcher4 test
nudel# mount -t zfs test /mnt/tmp
[mkdir, chown, tmp directory]
%df -h .
Filesystem    Size    Used   Avail Capacity  Mounted on
test          905G     32k    905G     0%    /mnt/tmp

Is this the normal performance penalty for using ZFS?
 
Here are my results for UFS vs. ZFS, but with blogbench.

ZFS:
Code:
# zpool create test da0
# zfs set dedup=on test
# zfs set checksum=fletcher4 test
# zfs set recordsize=4k test
# blogbench -d /test -i 10

UFS:
Code:
# newfs -t -U /dev/da0
# mkdir -p /test
# mount /dev/da0 /test
# blogbench -d /test -i 10

ZFS:
Code:
Frequency = 10 secs
Scratch dir = [/test]
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 10 iterations.
The test will run during 1 minutes.

  Nb blogs   R articles    W articles    R pictures    W pictures    R comments    W comments
        14        51390           783         29022           738         22821          1047
        28       158451           564        100700           612         64668          3054
        42       114382           683         83162           865         59446          1990
        50        94539           395         69261           367         51648          1466
        58        52211           416         38725           339         30519           645
        68        43977           472         30867           425         24280           549
        73        23212           331         16849           281         13418           352
        77        61050           143         44083           156         31396           357
        80        30975           221         21320           127         13915           255
        83        23552            96         16465           208         10889           163

Final score for writes:            83
Final score for reads :         14594

UFS:
Code:
Frequency = 10 secs
Scratch dir = [/test]
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 10 iterations.
The test will run during 1 minutes.

  Nb blogs   R articles    W articles    R pictures    W pictures    R comments    W comments
        22       213709          1237        118854          1132        116822          5571
        24       179760           259         99368            41        114570          2761
        24        29454             3         18759             2         19789           549
        24        12516            33          4828            12          7985           257
        24         9478            27          6124            11          5361           398
        24         2973             5          1077            12          1338            47
        24         1125             2           542             1          1325             0
        24         2189             0           691             0          1560           158
        25        12500             4          7167            15         11064           235
        33       444161           547        228482           433        312611          6522

Final score for writes:            33
Final score for reads :         18251
 
I was about to post a similar thread, but I'll just bump yours since I have similar problems and I'm really wondering why ZFS is so slow. Don't get me wrong, I still think it's great because of all the other features, but still... it's much slower than UFS.

I run FreeBSD 9.0B2 on a single quad-core Xeon X3470 (2.93 GHz) with 4 GB DDR3 RAM @ 1333 and a 16-port 3ware 9650SE controller. I have 16 WD RE3 drives in a RAID 10 setup with the write cache on. I configured one giant 8TB array divided into two LUNs: one for FreeBSD (50 GB) and everything else for ZFS. Here are the results of dd and bonnie++ on /data (ZFS) and on /usr (UFS).

Code:
WRITING:

dd bs=1024 if=/dev/zero of=/usr/t1 count=1M
1048576+0 records in
1048576+0 records out
1073741824 bytes transferred in 5.886218 secs (182416249 bytes/sec)

182MB/sec on UFS with dd


dd bs=1024 if=/dev/zero of=/data/t1 count=1M
1048576+0 records in
1048576+0 records out
1073741824 bytes transferred in 15.704153 secs (68373114 bytes/sec)

68MB/sec with ZFS

READING:


dd bs=1024 if=/usr/t1 of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
1073741824 bytes transferred in 2.716340 secs (395289946 bytes/sec)

395MB/sec with UFS 

dd bs=1024 if=/data/t1 of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
1073741824 bytes transferred in 5.460304 secs (196645062 bytes/sec)

196MB/sec with ZFS


BONNIE RANDOM FILE CREATION ON ZFS:

bonnie++ -d /data -s 0 -n 50 -u 0
Using uid:0, gid:0.
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96       ------Sequential Create------ --------Random Create--------
store2.emailarray.c -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 50 15909  99 46189  97 15955  99 16409  99 51449  99 15774  99
Latency             39098us   11168us   15125us   22816us      41us   13735us
1.96,1.96,store2.emailarray.com,1,1316139689,,,,,,,,,,,,,,50,,,,,15909,99,46189,97,15955,99,16409,
99,51449,99,15774,99,,,,,,,39098us,11168us,15125us,22816us,41us,13735us

BONNIE RANDOM FILE CREATION ON UFS:

bonnie++ -d /usr -s 0 -n 50 -u 0
Using uid:0, gid:0.
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96       ------Sequential Create------ --------Random Create--------
store2.emailarray.c -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 50 26008  84 60909  99 44363  99 25809  83 65760  99 48224  99
Latency             79715us      43us    1442us   41689us     455us    1688us
1.96,1.96,store2.emailarray.com,1,1316139630,,,,,,,,,,,,,,50,,,,,26008,84,60909,99,44363,99,25809,
83,65760,99,48224,99,,,,,,,79715us,43us,1442us,41689us,455us,1688us

BONNIE SEQUENTIAL ON ZFS

bonnie++ -d /data -s 8088 -n 0 -u 0
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
store2.emaila 8088M    73  99 226228  65 71531  21   204  98 133720  13 352.3   8
Latency               123ms    2768ms    1245ms   72150us   55068us     258ms

1.96,1.96,store2.emailarray.com,1,1316140118,8088M,,73,99,226228,65,71531,21,204,98,133720,
13,352.3,8,,,,,,,,,,,,,,,,,,123ms,2768ms,1245ms,72150us,55068us,258ms,,,,,,


BONNIE SEQUENTIAL ON UFS:

bonnie++ -d /usr -s 8088 -n 0 -u 0
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
store2.emaila 8088M   277  99 264842  51 43845  57   567  99 561755  72 580.9  17
Latency             30843us     281ms    1381ms   42712us   98747us     203ms


UFS appears to be at least twice as fast as ZFS. I thought they would be at least comparable in speed. What gives? Am I missing something?
 
I am seeing similar results on 9.0B2 with a 16-drive RAID 10 array with the write cache on. UFS is at least twice as fast as ZFS. With ZFS I'm barely getting 160MB/sec in write tests, while I get 300MB/sec with UFS.

What's going on?
 
For example, testing with bonnie on UFS:


Code:
bonnie++ -d /usr -c 10 -s 0 -n 50 -u 0
Using uid:0, gid:0.
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96       ------Sequential Create------ --------Random Create--------
store2.emailarray.c -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 50 20582  73 59312  99 43256  96 27443  90 65872  99 46846  99
Latency               135ms     462us    1145us   37874us      65us    2334us
1.96,1.96,store2.emailarray.com,10,1316138022,,,,,,,,,,,,,,50,,,,,20582,73,59312,99,43256,96,27443,
90,65872,99,46846,99,,,,,,,135ms,462us,1145us,37874us,65us,2334us


and on ZFS:


Code:
bonnie++ -d /data -c 10 -s 0 -n 50 -u 0
Using uid:0, gid:0.
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96       ------Sequential Create------ --------Random Create--------
store2.emailarray.c -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 50 15389  98 46536 100 15456  98 15687  99 50641  99 15288  98
Latency             29134us     142us    6973us   40102us      55us   12531us
1.96,1.96,store2.emailarray.com,10,1316138032,,,,,,,,,,,,,,50,,,,,15389,98,46536,100,15456,98,15687,
99,50641,99,15288,98,,,,,,,29134us,142us,6973us,40102us,55us,12531us


Why is ZFS so much slower?
 
ZFS is not just another filesystem, and there are faster filesystems out there.
But if you need the features of ZFS, it is the best filesystem you will ever have worked with.

http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis

8x 2.2 GHz Opteron, 32GB RAM, 2x 80GB SSD cache, a mirror of 2x 32GB SLC SSDs for the ZIL, and 2x RAIDz2 of 5x 1TB discs each (on a 3ware 9690SA configured as single drives) is running perfectly over here.
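
For reference, a pool laid out along those lines would be created roughly like this (device names are placeholders):
Code:
zpool create tank \
    raidz2 da0 da1 da2 da3 da4 \
    raidz2 da5 da6 da7 da8 da9 \
    log mirror ada1 ada2 \
    cache ada3 ada4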

older benchmarks:
http://forums.freebsd.org/showpost.php?p=119932&postcount=13

---

PS: ZFS on a single disk or a single hardware array is suboptimal.
 
Like user23 said, you must let ZFS handle the disks, so configure your card for JBOD and do not use the card's RAID.
Things can go wrong that way.

http://opensolaris.org/jive/thread.jspa?threadID=141071

Try using mirrored disks:
Code:
zpool create mailstore mirror /dev/disk1 /dev/disk2
zpool add mailstore mirror /dev/disk3 /dev/disk4
zpool add mailstore mirror /dev/disk5 /dev/disk6

and so on.

Or, if you want to use raidz:

Code:
zpool create mailstore raidz /dev/disk1 /dev/disk2 /dev/disk3 /dev/disk4 /dev/disk5 spare /dev/disk16
zpool add mailstore raidz /dev/disk6 /dev/disk7 /dev/disk8 /dev/disk9 /dev/disk10 spare /dev/disk17
zpool add mailstore raidz /dev/disk11 /dev/disk12 /dev/disk13 /dev/disk14 /dev/disk15 spare /dev/disk18

Maybe you can find some info on the following site; it has some benchmarks:
http://zfsguru.com

If you experiment with the RAID levels and the number of disks in one vdev, you may get better results.

More than 8 disks in one vdev is not advisable because of resilver times, if I remember correctly.


If I run the same dd you did, it takes about 25 to 26 seconds.
That is a speed of about 40 MB/sec.

If I put files onto the ZFS mirror of two disks through Samba, it runs at 60 to 70 MB/sec.
So if I were to believe dd, that could not be.

regards,
Johan Hendriks
 
So, sirs, what are the conclusions? Is ZFS 2 times slower than UFS?

Here is my benchmarking. With a test.bin file on UFS, on hardware RAID 10 under Linux, and on a ZFS 1+0 pool, I get strange results: fio shows better IOPS for ZFS, but when I simply try dd it is 2 times slower than UFS or ext4:

I have set up a test server using FreeBSD 8.2 amd64.
It has 12x WD Black 7200rpm disks. The disks are attached through a 3ware 9690 controller. There are 12 GB of RAM, and the CPU is an Intel(R) Xeon(R) E5520 @ 2.27GHz.
The FreeBSD system is on 2 disks in hardware RAID 0 (hmm, do not ask why :)) and uses UFS2.
The ZFS pool consists of the remaining 10 disks, configured as RAID 1+0 (5 mirrors of 2 disks each).

Should it perform well?

Here are some fio tests (see below for the strange thing).

fio test:
Options: -bs=512 -runtime=20 -iodepth 24 -filename test.bin -direct=1 -ioengine=sync
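
The full invocations would have looked roughly like this (the job names and the rw= mapping to the four test types below are my reconstruction; test.bin is assumed to already exist):
Code:
fio --name=randrw   --rw=randrw --bs=512 --runtime=20 --iodepth=24 --filename=test.bin --direct=1 --ioengine=sync
fio --name=seqrw    --rw=rw     --bs=512 --runtime=20 --iodepth=24 --filename=test.bin --direct=1 --ioengine=sync
fio --name=seqread  --rw=read   --bs=512 --runtime=20 --iodepth=24 --filename=test.bin --direct=1 --ioengine=sync
fio --name=seqwrite --rw=write  --bs=512 --runtime=20 --iodepth=24 --filename=test.bin --direct=1 --ioengine=sync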

A: Linux over hw raid10 + ext4
B: FreeBSD 8.2 amd64 over hw raid0 + UFS2
C: FreeBSD 8.2 amd64 over ZFS (pool v.15) + vfs.zfs.zil_disable="1"

Code:
RAND RW:
 A: 290 iops
 B: 334 iops
 C: 408 iops

SEQ RW:
 A: 208 iops
 B: 2729 iops
 C: 44339 iops

SEQ READ:
 A: 12406 iops
 B: 7604 iops
 C: 106530 iops

SEQ WRITE:
 A: 11183 iops
 B: 4610 iops
 C: 75815 iops

__________________________
BUT!

Code:
dd if=/dev/zero of=/data/test.bin count=1000000
      2-3 seconds with FreeBSD + UFS2 + HW raid0 or Linux + ext4 + raid10. 
      7 seconds with ZFS-based raid1+0 pool.
I wonder what the problem might be?
 
ZFS does IO in larger blocks, and by default dd uses 512 bytes IIRC. Try that with 256kb as blocksize and let's see how that turns out.
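
For example, keeping the total size roughly the same (a sketch; adjust the output path to match your pool):
Code:
# 2000 blocks of 256k is about the same ~500 MB as 1M blocks of 512 bytes
dd if=/dev/zero of=/data/test.bin bs=256k count=2000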
 
Crivens said:
ZFS does IO in larger blocks, and by default dd uses 512 bytes IIRC. Try that with 256kb as blocksize and let's see how that turns out.

Yep, dd uses 512, but there is no difference in the tests:
dd with 2000 blocks of 256k gives ~2.6-3 seconds on ext4 and UFS with the hardware described above, while ZFS still takes ~5.6 seconds or more.
 
What if dd simply does not work well with ZFS?
I have read many times that dd is not a good measurement tool for ZFS.

Did you try setting up an FTP server and copying a large file to 1) a shared directory on the UFS filesystem and 2) a shared directory on the ZFS filesystem?

Or go through Samba.

Tar a large maildir, untar it onto the UFS filesystem,
and then untar it onto the ZFS filesystem.

Here you can find an example of how to measure the time taken:
http://forums.freebsd.org/showthread.php?t=11693
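
Something along these lines, for example (the paths are just placeholders for a UFS mount and a ZFS mount):
Code:
# time the same extraction onto both filesystems
time tar -xf maildir.tar -C /usr/test
time tar -xf maildir.tar -C /tank/test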

Try running an IMAP stress test against a maildir on ZFS and on UFS.

So use more real-world loads for testing.

regards,
Johan Hendriks
 
After extensive testing of ZFS on 9.0B2, I opted for UFS.

In every benchmark (bonnie++ sequential/random, iozone, dd) UFS came out on top by a lot.

I tried ZFS not only with the ZIL disabled but with the cache flush disabled as well. It wasn't even close to UFS in performance.

Maybe the problem was on my end, since I was testing with a hardware RAID card, didn't let ZFS handle the disks directly, and lacked an SSD ZIL. That could be, but I needed to use all of my 16 bays for storage/IOPS, whereas with ZFS I would have had only 14 disks for data plus 2 for the ZIL.

In any case, I'm quite happy with UFS and might consider ZFS in the future if I see some consistently faster benchmarks or don't need a filesystem for maximum IOPS.
 
It would be helpful to devs and other users if everyone who is reporting slower-than-UFS ZFS performance could explain their hardware and zpool setup.

And is everyone building their pools with 4k sectors taken into consideration? I doubt there are many large, consumer SATA disks on the market now that still use real 512 byte sectors...
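
A quick way to check what a drive actually reports (device name is a placeholder):
Code:
diskinfo -v /dev/ada0
# look at the "sectorsize" and "stripesize" lines; 4K drives typically
# report a 512 byte sectorsize together with a 4096 byte stripesize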
 
Also, remember that UFS is just writing data out to the disk as it comes in, while ZFS is also:
  • computing checksum for each block being written
  • writing checksum data to disk next to data
  • computing parity information for each block being written
  • writing parity blocks alongside the data blocks
  • writing data out to disk in transaction groups (not one block at a time)
  • possibly compressing the data as well
  • if this is "overwriting" data, then it's actually writing new data blocks and updating all the metadata to point to new blocks (it's Copy-on-Write, remember)
  • and probably a whole host of other things behind the scenes
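
Some of that overhead is visible and tunable per dataset. For example (pool/dataset names are placeholders, and this is only for comparison testing, not a recommendation):
Code:
# see what the pool is actually doing on every write
zfs get checksum,compression,copies,recordsize tank
# a scratch dataset with less work per write, purely for benchmarking
zfs create -o checksum=off tank/scratch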

You cannot directly compare the performance of UFS to ZFS. That's like trying to compare the performance of a generic sedan to the performance of a Mack truck. Sure, the sedan has a higher top-speed and way better acceleration, but the Mack truck can carry a hell of a lot more, for longer distances, has more torque, etc.

You need to figure out what's more important to you: absolute raw throughput, or data integrity? And what trade-offs are you willing to make between the two extremes?

Figure out your needs. Then pick the filesystem that best matches those needs. There's no such thing as "the one perfect filesystem for every need and every person". (Although ZFS is getting close.) :)
 
leosat said:
BUT!

Code:
dd if=/dev/zero of=/data/test.bin count=1000000
      2-3 seconds with FreeBSD + UFS2 + HW raid0 or Linux + ext4 + raid10. 
      7 seconds with ZFS-based raid1+0 pool.
I wonder what the problem might be?

This works out to about 500MB in 2-3 seconds, right?

As you wrote, the RAID is on a HW card. ZFS writes more data due to its copy-on-write nature, but usually this goes out over different channels. What I suspect here is bus saturation: the RAID card can drive all disks separately at their own peak rate, but ZFS is then limited by the bus connection. What is the peak memory-to-card performance?
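
One rough way to put a number on the raw controller/disk path, bypassing any filesystem (device name is a placeholder; this is a read-only test):
Code:
# sequential read straight from the raw device, ~4 GB worth
dd if=/dev/da0 of=/dev/null bs=1m count=4096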
 
OK, sirs, thanks for the answers!

Please give me a day.

I'm running another series of tests and it looks better now.
I've just added one more mirror to the striped pool, so it now contains
6 mirrors of 2 WD Caviar Black 7200rpm disks each. The disks are plugged into a 3ware 9690 controller with read and write caches enabled for each disk at the controller level.

Our present Linux-based storage has the same HW config, but uses the 3ware 9690-handled RAID 1+0. The Linux storage runs ext4 on top of this HW RAID 1+0.

I'm testing performance because, if ZFS in the described config were slower for some tests, there would be no point for us to switch to it, since what we need first of all is more IOPS/speed. (Sure, we could just get 15k SAS drives or add another disk box to get more IOPS, but we first wanted to try ZFS.)
 
HW config (the same for both servers):
CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
RAM: 12 GB
HDD: 12 x WD Caviar Black 7200rpm, WD1002FAEX.
Raid Controller: 3ware 9690.

Soft configs:
A) FreeBSD 8.2-RELEASE, installed on an additional USB-attached HDD (no, that didn't impact ZFS performance), with a ZFS implementation of RAID 1+0 on 12 HDDs (the RAID controller is only used to pass the HDDs through and provides some R/W caching for each HDD), using a striped pool of 6 mirrors -- the fastest config possible in terms of IOPS. The ZFS recordsize was set to 4k.
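
For reference, the pool layout in (A) would have been built with something along these lines (device names are placeholders):
Code:
zpool create tank \
    mirror da0 da1 mirror da2 da3 mirror da4 da5 \
    mirror da6 da7 mirror da8 da9 mirror da10 da11
zfs set recordsize=4k tank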

B) Linux OpenSuSE 11.3, hardware raid 1+0, ext4 (the fastest config possible in terms of IOPS).

-----------------------------------------------------
- I/O blocksize of 4k was used in benchmarking.
- Benchmarks were run from a third server attached to A and B over a 1Gb link.
- Benchmarking was done through an NFS share to emulate the target environment.
- Benchmarking was meant to determine which config, A or B, is faster (only relative results mattered, so I do not post any numbers here).

Results summary:

Good for ZFS: SEQ RW, RANDOM WRITES, SEQ READS and SEQ WRITES were almost the same for A and B,
with A slightly faster than B.

Bad for ZFS: RANDOM READS were consistently 2.2 times slower for A than for B. Adding a USB-based cache device, or a ramdisk-based cache device (mdconfig -a -t swap -s 5g), didn't help. Turning off the ARC or ZFS prefetch didn't help. It was a stable result: A was about 2 times slower than B for random reads.
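
For reference, attaching the ramdisk-backed cache device mentioned above would look roughly like this (pool name is a placeholder):
Code:
mdconfig -a -t swap -s 5g      # creates e.g. /dev/md0
zpool add tank cache /dev/md0  # use the ramdisk as an L2ARC cache device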
 
Isn't that funny.

In all the dd tests, ZFS shows bad write performance compared to UFS.
Compared to Linux the writes are at the same level, but the reads are lagging behind.

Like phoenix said, try ZFS, and if the performance is acceptable for your config, stay with it; if not, use UFS or something else.

regards,
Johan
 
leosat, why are you using a hardware RAID controller? You really should try rebuilding your pool with directly attached disks (aka JBOD) and gnop(8).

[thread=21644]ZFS using 'advanced format drives' with FreeBSD[/thread]
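
The usual gnop recipe for 4K alignment looks roughly like this (device and pool names are placeholders):
Code:
gnop create -S 4096 /dev/da0     # present the disk with 4096 byte sectors
zpool create tank /dev/da0.nop   # pool gets created with ashift=12
zpool export tank
gnop destroy /dev/da0.nop
zpool import tank                # the pool keeps its 4K alignment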
 
leosat said:
HW config (the same for both servers):
CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
RAM: 12 GB
HDD: 12 x WD Caviar Black 7200rpm, WD1002FAEX.
Raid Controller: 3ware 9690.

Soft configs:
A) FreeBSD 8.2-RELEASE, installed on an additional USB-attached HDD (no, that didn't impact ZFS performance), with a ZFS implementation of RAID 1+0 on 12 HDDs (the RAID controller is only used to pass the HDDs through and provides some R/W caching for each HDD), using a striped pool of 6 mirrors -- the fastest config possible in terms of IOPS. The ZFS recordsize was set to 4k.

B) Linux OpenSuSE 11.3, hardware raid 1+0, ext4 (the fastest config possible in terms of IOPS).

-----------------------------------------------------
- I/O blocksize of 4k was used in benchmarking.
- Benchmarks were run from a third server attached to A and B over a 1Gb link.
- Benchmarking was done through an NFS share to emulate the target environment.
- Benchmarking was meant to determine which config, A or B, is faster (only relative results mattered, so I do not post any numbers here).

Results summary:

Good for ZFS: SEQ RW, RANDOM WRITES, SEQ READS and SEQ WRITES were almost the same for A and B,
with A slightly faster than B.

Bad for ZFS: RANDOM READS were consistently 2.2 times slower for A than for B. Adding a USB-based cache device, or a ramdisk-based cache device (mdconfig -a -t swap -s 5g), didn't help. Turning off the ARC or ZFS prefetch didn't help. It was a stable result: A was about 2 times slower than B for random reads.

Very interesting indeed. Awesome job man, thank you!

A couple of questions though... You state here that the ZFS recordsize was set down to 4k. Why?
Also, did you make sure to use a 4k blocksize in ext4, or is that the default or something?

LOL, I was too curious so I went ahead and looked that up myself :) OK, so 4k is the default in ext4, but is there a particular reason you'd really want that? I know the default block size in ZFS is 128K, basically because that's what hard drives like to write the most (or as big as possible, really), so unless you're going to be running a database that happens to have a default write size of exactly 4k, why make it harder on your hard drives than it has to be?

/Sebulon
 
aragon said:
leosat, why are you using a hardware RAID controller? You really should try rebuilding your pool with directly attached disks (aka JBOD) and gnop(8).

[thread=21644]ZFS using 'advanced format drives' with FreeBSD[/thread]

aragon, I agree with you about JBOD, but why should he gnop them? WD1002FAEX is not "Advanced Format":
http://pcper.com/reviews/Storage/Western-Digital-SATA-6Gbsec-1TB-Caviar-Black-WD1002FAEX-Review?aid=870
For those curious, this new drive does not incorporate Advanced Format.
Or am I missing something?

/Sebulon
 
Regarding JBOD:

3Ware user guide: "JBOD configuration is no longer supported in the 3ware 9000 series. AMCC recommends that you use Single Disk as a replacement for JBOD, to take advantage of advanced features such as caching, OCE, and RLM."

And "Single disk" is what I used, can not imagine how could it impact random reads performance even when all the other metrics were OK.

If you have a storage box with 12 disks, you have to use some controller to plug them into. This is the only one I currently have, and I see no point in not using it. There is no HW-managed RAID below ZFS -- it works with 12 "single disks".

Regarding gnop:

gnop is cool. But why should I use it in this case?... I didn't want to test ZFS's reaction to a disappeared disk this time, for example. When I want to, sure, I'll use it.

Regarding the recordsize of 128k:
Yes, I've tried different recordsizes, from 4k to 128k --- the result was the same in all cases, and random reads are far behind.

When did random reads become equal to the Linux storage described above?
Only when I used a blocksize of 256k or more in the fio benchmark utility was the throughput of storage (A) equal to that of storage (B). With lower I/O blocksizes for the testing application, random reads were lower for (A) than for (B).
 
Here I attach
iozone-based 3D graphs and data covering

random R/W performance with and without O_DIRECT for

- Linux ext4 over HW RAID 0 and HW RAID 1+0,
- a FreeBSD ZFS-based 1+0 pool, and
- FreeBSD UFS over HW RAID 1+0

for the currently accessible HW configurations.

Sorry for the xls format; I hope you can view it.

---
I used RAM-based disks for the L2ARC and ZIL due to the lack of SSD drives; anyway,
for random I/O, ZFS 1+0 was slower than ext4 over HW RAID 1+0.
UFS over HW RAID 1+0 happened to be slower than both ext4 and ZFS.
All on the same hardware, described in the posts above.
All the details are in the xls file.
 

Attachments

  • iozone_comparative_tests.v1.xls.zip (25 KB)