NFS write performance with mirrored ZIL

Hi all!

Current Highscore
Code:
[B]Local writes 4k[/B]
5.4k rpm     160GB = 32MB/s
OCZ Vertex 2 60GB  = 33MB/s
Intel 320    120GB = 52MB/s
Zeus IOPS    16GB  = 55MB/s
OCZ Vertex 2 120GB = 56MB/s
Intel S3700  200GB = 58MB/s
HP 10k SAS   146GB = 59MB/s
SAVVIO 15k.2 146GB = 60MB/s
OCZ Deneva 2 200GB = 60MB/s
OCZ Vertex 3 240GB = 61MB/s
Intel X25-E  32GB  = 72MB/s


[B]Local writes 128k[/B]
OCZ Vertex 2 60GB  = 51MB/s
OCZ Vertex 2 120GB = 61MB/s
HP 10k SAS   146GB = 101MB/s
Intel 320    120GB = 128MB/s
Zeus IOPS    16GB  = 133MB/s
SAVVIO 15k.2 146GB = 165MB/s
Intel X25-E  32GB  = 197MB/s
OCZ Vertex 3 240GB = 271MB/s
OCZ Deneva 2 200GB = 284MB/s
Intel S3700  200GB = 295MB/s

Code:
[B][U]NFS Mirrored ZIL[/U][/B]

[B]Ordinary HW[/B]
Intel 320    40GB  = 30MB/s
OCZ Vertex 2 60GB  = 32MB/s
OCZ Vertex 2 120GB = 36MB/s
Intel 320    120GB = 52MB/s
Zeus IOPS    16GB  = 55MB/s
Intel X25-E  32GB  = 60MB/s
Intel S3700  200GB = 65MB/s
OCZ Deneva 2 200GB = 67MB/s
OCZ Vertex 3 240GB = 70MB/s


[B]HP DL380 G5[/B] (default controller settings)
Controller write cache = on
Drive write cache = off

OCZ Vertex 2 120GB = 49MB/s
Intel 320    120GB = 52MB/s
SAVVIO 15k.2 146GB = 56MB/s
HP 10k SAS   146GB = 58MB/s
Intel X25-E  32GB  = 67MB/s

I've been experimenting with a high-performance NAS based on FreeBSD and ZFS. I have some questions about NFS write performance together with SSD ZIL accelerators. First, the specs:

Hardware
Supermicro X7SBE
Intel Core 2 Duo 2.13GHz
8GB 667MHz RAM
3x Lycom SATA II PCI-X controllers
2x Supermicro CSE-M35T-1

Code:
# camcontrol devlist
<WDC WD30EZRS-00J99B0 80.00A80>    at scbus0 target 0 lun 0 (ada0,pass0)
<SAMSUNG HD103SJ 1AJ10001>         at scbus1 target 0 lun 0 (ada1,pass1)
<SAMSUNG HD103SJ 1AJ10001>         at scbus2 target 0 lun 0 (ada2,pass2)
<SAMSUNG HD103SJ 1AJ10001>         at scbus5 target 0 lun 0 (ada3,pass3)
<SAMSUNG HD103SJ 1AJ10001>         at scbus6 target 0 lun 0 (ada4,pass4)
<SAMSUNG HD103SJ 1AJ10001>         at scbus7 target 0 lun 0 (ada5,pass5)
<SAMSUNG HD103SJ 1AJ10001>         at scbus8 target 0 lun 0 (ada6,pass6)
<SAMSUNG HD103SJ 1AJ10001>         at scbus9 target 0 lun 0 (ada7,pass7)
<SAMSUNG HD103SJ 1AJ10001>         at scbus10 target 0 lun 0 (ada8,pass8)
<OCZ-VERTEX2 1.29>                 at scbus12 target 0 lun 0 (ada9,pass9)
<OCZ-VERTEX2 1.29>                 at scbus13 target 0 lun 0 (ada10,pass10)

Code:
# zpool status
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

	NAME                STATE     READ WRITE CKSUM
	pool1               ONLINE       0     0     0
	  raidz2            ONLINE       0     0     0
	    label/rack-1:2  ONLINE       0     0     0
	    label/rack-1:3  ONLINE       0     0     0
	    label/rack-1:4  ONLINE       0     0     0
	    label/rack-1:5  ONLINE       0     0     0
	    label/rack-2:1  ONLINE       0     0     0
	    label/rack-2:2  ONLINE       0     0     0
	    label/rack-2:3  ONLINE       0     0     0
	    label/rack-2:4  ONLINE       0     0     0
	logs
	  mirror            ONLINE       0     0     0
	    gpt/ssd-1:1     ONLINE       0     0     0
	    gpt/ssd-2:1     ONLINE       0     0     0

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: none requested
config:

	NAME              STATE     READ WRITE CKSUM
	pool2             ONLINE       0     0     0
	  label/rack-1:1  ONLINE       0     0     0

errors: No known data errors

I've partitioned the SSDs to be used as both ZIL and L2ARC devices with mirroring, but I'm only using the ZIL partitions for the moment. I noticed a rather big performance hit while using them as both at the same time, about a 30% drop in fact.
Code:
# gpart show ada9
=>       34  117231341  ada9  GPT  (55G)
         34         30        - free -  (15k)
         64   33554432     1  freebsd-zfs  (16G)
   33554496   83676879     2  freebsd-zfs  (39G)

# gpart show ada10
=>       34  117231341  ada10  GPT  (55G)
         34         30         - free -  (15k)
         64   33554432      1  freebsd-zfs  (16G)
   33554496   83676879      2  freebsd-zfs  (39G)
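For reference, the second partitions go in as L2ARC with something like this (just a sketch using the labels above; cache devices are not mirrored by ZFS, they are simply listed):
Code:
# zpool add pool1 cache gpt/ssd-1:2 gpt/ssd-2:2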

Now for local performance:
Code:
# dd if=/dev/random of=/tmp/test16GB.bin bs=1m count=16384
# dd if=/tmp/test16GB.bin of=/dev/zero bs=4096 seek=$RANDOM
17179869184 bytes transferred in 50.711092 secs (338779318 bytes/sec)
# dd if=/tmp/test16GB.bin of=/dev/gpt/ssd-1\:2 bs=4k seek=$RANDOM
17179869184 bytes transferred in 381.576081 secs (45023444 bytes/sec)
# dd if=/dev/zero of=/dev/gpt/ssd-1\:2 bs=4k seek=$RANDOM
42738441728 bytes transferred in 623.003697 secs (68600623 bytes/sec)

In comparison to:
# newfs -b 32768 /dev/gpt/ssd-1\:2
# mount /dev/gpt/ssd-1\:2 /mnt/ssd/
# dd if=/tmp/test16GB.bin of=/mnt/ssd/test16GB.bin bs=1m
17179869184 bytes transferred in 348.755907 secs (49260439 bytes/sec)

And also:
# dd if=/dev/zero of=/dev/gpt/ssd-1\:2 bs=1m
42842562048 bytes transferred in 187.442289 secs (228564015 bytes/sec)

So just writing zeros is fine at 228MB/s, but writing random data, like in everyday use, tops out at about 45-50MB/s no matter how, when or where. Weird.

Accordingly, that is roughly the performance I get through NFS: about 30MB/s write speed.

Am I missing something? The data sheet for the SSDs boasts 50,000 4k random write IOPS, and I had thought that I would at least get 100MB/s NFS write.

/Sebulon
 
Sebulon said:
Code:
# newfs -b 32768 /dev/gpt/ssd-1\:2
# mount /dev/gpt/ssd-1\:2 /mnt/ssd/
# dd if=/tmp/test16GB.bin of=/mnt/ssd/test16GB.bin bs=1m
17179869184 bytes transferred in 348.755907 secs (49260439 bytes/sec)

And also:
# dd if=/dev/zero of=/dev/gpt/ssd-1\:2 bs=1m
42842562048 bytes transferred in 187.442289 secs (228564015 bytes/sec)

Hi,

In the first case above you are copying from /tmp, which is going to be limited by the read speed of whatever disk your /tmp file system is on. Doesn't that explain why you are limited to 50MB/sec? Try dd'ing from /tmp to /dev/null and see if you hit the same limit...

cheers Andy.
 
Hi,

thank you so much for your input on this. I keep reading and reading but I can't seem to make any sense of it. After going through tons of performance tests/charts/whitepapers on these SSDs, comparing them to, for example, the ZeusIOPS SSDs that Sun/Oracle themselves use in their Unified Storage Systems, it looks like OCZ's Vertex 2 actually has better specs. But we have Oracle storage systems at work that can easily shuffle 100MB/s of NFS writes over 1Gbps, while I only get around 30MB/s out of my system.

@Andy
Quoting myself:
Code:
# dd if=/tmp/test16GB.bin of=/dev/zero bs=4096 seek=$RANDOM
17179869184 bytes transferred in 50.711092 secs (338779318 bytes/sec)
/tmp is on my primary pool and scored 338MB/s to /dev/zero, so no, that shouldn't be a bottleneck.

@SirDice
Before I started with this, I used mdmfs to slice off 3GB of RAM and added that as a single ZIL device, and got around 80-90MB/s NFS write instead of the 30MB/s I have now. Then I had to destroy my primary pool and restore from the secondary to get rid of it again, but I can definitely say that having a good SLOG device (or devices) matters.
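Something along these lines (a sketch of the idea rather than my exact commands; a RAM-backed log is of course volatile, so it is only good for benchmarking):
Code:
# mdconfig -a -t swap -s 3g -u 0   # creates /dev/md0, a 3GB memory-backed disk
# zpool add pool1 log md0          # added as a single, unmirrored log device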

----------------------------

Looking at performance tests of this drive from AnandTech, the drives are supposed to be able to write 51.6MB/s random 4k unaligned, and a whopping 164MB/s random 4k aligned. Following that tidbit, it would seem as if my writes aren't really 4k aligned, but I think they are. This is how I added the partitions to the pool:
Code:
# gpart create -s gpt ada9
# gpart add -l ssd-1:1 -t freebsd-zfs -b 64 -s 16G ada9
# gnop create -S 4096 /dev/gpt/ssd-1:1
and then added the gnop device to the pool:
Code:
# zpool add pool1 log mirror gpt/ssd-{1:1.nop,2:1.nop}
I then exported the pool, destroyed the nop devices and reimported the pool again; it worked like a charm. I confirmed this by running zdb and saw ashift=12 on the mirrored log vdev.
By doing all this, I believe that 1) the partitions are aligned to 4k and 2) they are recognized and used as 4k devices by ZFS.
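The check was something along the lines of:
Code:
# zdb | grep ashift   # dumps the cached pool configs; one ashift line per vdev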

What am I missing here, guys and girls?

I'm really hoping someone is going to spot some obvious flaw that I've been too close to see myself.
If I (we?) can figure out how to squeeze out the 30,000 random 4k IOPS that the specs claim, you'll later find me skipping across little white clouds up in the blue, and I will be more than happy to share my experience=)

/Sebulon
 
You aren't by chance running with power saving options enabled? Maybe post the output of:

$ sysctl dev.cpu
 
@aragon:
Code:
# sysctl dev.cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU0
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.freq: 2133
dev.cpu.0.freq_levels: 2133/35000 1866/30625 1600/16000 1400/14000 1200/12000 1000/10000 800/8000 600/6000 400/4000 200/2000
dev.cpu.0.cx_supported: C1/0 C2/85
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% 0.00% last 274us
dev.cpu.1.%desc: ACPI CPU
dev.cpu.1.%driver: cpu
dev.cpu.1.%location: handle=\_PR_.CPU1
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%parent: acpi0
dev.cpu.1.cx_supported: C1/0 C2/85
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_usage: 100.00% 0.00% last 246us
Good? Bad?

@AndyUKG:
Pegged at 100%
Which leads me to believe that they really can't handle more than that. Plus, I've found the REALLY fine print on OCZ's webpage here, which I think shows the drives' small sequential write performance. It doesn't say anywhere how big the block size is, but the numbers are the same as what I'm getting when using them as ZIL devices, so...

-------------------------------

Now for the conclusion:

For NFS write on my 8-drive 7.2k rpm SATA II raidz2 pool:

~30MB/s - mirrored ZIL
~40MB/s - no ZIL at all
~50MB/s - striped ZIL ashift=9
~60MB/s - striped ZIL ashift=12
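For anyone wanting to repeat the striped variant: the log partitions are simply listed without the mirror keyword, which makes them two independent log vdevs. A sketch, reusing the labels from earlier in the thread and assuming the old mirrored log is gone:
Code:
# zpool add pool1 log gpt/ssd-1:1 gpt/ssd-2:1   # two separate log vdevs, no redundancy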

-------------------------------

More interesting is reading on in the fine print from OCZ:
OCZSSD2-2VTXE60G 35MB/s (that's mine)
OCZSSD2-2VTX100G 75MB/s
OCZSSD2-2VTXE120G 80MB/s
Double the size, double the performance; seems logical. So if I could get my hands on two 100GB or 120GB drives, I would be able to attain about the same performance as I have now, but with a mirrored ZIL instead, which is a must right now, at least in FreeBSD. Very interesting indeed.

I'm rather satisfied with this, because I now know exactly what to expect from SSDs as ZIL accelerators in 2011. I mean, if we look back just 2 or 3 years, SSDs were terrible. If we give it 2 or 3 more years, perhaps they will be all the way there, with 100MB/s small sequential writes. But my fear is that the manufacturers just don't care about that. They have produced drives that push the boundaries even of SATA 6Gb/s now, at least at larger block sizes. So as far as the masses are concerned, they are flawless.

Oracle uses ZeusIOPS SSD drives; NetApp sacrifices one DIMM slot, giving the system 5GB of RAM, keeps the last 1GB battery-backed and uses that as a ZIL. There's also stuff like the DDRdrive X1, and OCZ has their Z-Drive series, plus the Velo- and RevoDrive series. All of these are just too expensive for a normal person, especially if you need not just one, but two=)

Is it really that hard to get 100MB/s NFS write with standard consumer-grade products? Impossible, even? I find that odd. If anyone out there has achieved this, please speak up.

/Sebulon
 
@Andy
Pfft! Rubbish
"4k file writes up to 60,000 IOPS"
The key words there being "up to". But they'll make you pay twice the price for it=)

There's also the Vertex 3 MAX IOPS, with "up to" 75,000 IOPS. But those are random writes. What you want from a ZIL drive is sequential 4k writes.

My two SSDs may not make it all the way up to 100MB/s over NFS, but still, I think it's really cool that two SSDs beat eight regular drives in raidz2.

The results would have been different with 4 mirrored vdevs. Regardless, I scored 60MB/s with two 60GB striped logs. With two striped 120GB drives, I could potentially score twice that = 120MB/s, wire speed. That also means that as long as you have two SSDs, or four to have them mirrored, you can build the pool any way you'd like; mirrored vdevs, raidz1/2/3, it doesn't matter, you'd be able to get 100MB/s NFS writes.

I'm gonna have to buy two of those 120GB drives just to see this with my own eyes.

PS. I just realized there's got to be a breakpoint. I scored 40MB/s with 8 drives, which means that with 3x the drives (24) you could potentially score 120MB/s without any SLOG at all. But if you're going to build a system with fewer drives than that, you can definitely benefit from having two or more SSDs to even out the load.

/Sebulon
 
@AndyUKG
Yes "up to" 500MB/s
But that's with 128k block size. We are after 4k block size.

Reading even more on this subject, I've found out that Intel's 320 120GB is supposed to handle 130MB/s sequential 4k writes!!!
If that's true, it means that you can have two of those mirrored and still push 100MB/s write over NFS!!!
That means that they are better and also cheaper than the 120GB OCZ drives. God damn it, I just ordered two OCZs=(

But I will take this opportunity to first test the 120GB OCZ drives, then send them back and order two Intel 120GB drives instead=)

I will keep you all posted.

/Sebulon
 
Sebulon said:
@AndyUKG
Yes "up to" 500MB/s
But that's with 128k block size. We are after 4k block size.

Even with 4k writes, at 60k IO/sec that gives you 240MB/sec (60,000 x 4KiB is roughly 240MB/sec). Also, with regard to random vs sequential writes, I think even on an SSD random is slower than sequential, so where you read a random figure you should be able to assume equal or greater sequential performance.

Andy.
 
@Andy:
No, they don't handle that much. You don't get 60k IOPS at 4k sequential write.
If you manage to find a drive that actually handles 240MB/s 4k sequential write, let me know, I'll be the first to buy=)
The ones that claim to be best so far at 4k sequential writes are:
OCZ Vertex 2 120GB - 80MB/s
Intel 320 120GB - 130MB/s

So far I have tested, and scored, NFS write with a mirrored log:
OCZ Vertex 2 60GB - 30MB/s
Intel 320 40GB - 30MB/s

So far, the manufacturers' numbers for sequential 4k writes seem to match what I've been able to score on NFS write, so maybe in this case "size does matter" and I will hopefully score higher NFS writes the bigger the SSD drives I use.

But remember, that higher throughput remains to be proven! I haven't seen that multiplied performance in real life yet, but I promise, you will be the first to know=)

Exciting times

/Sebulon
 
@Andy:
You are correct. That adds up! They were testing the 240GB model from your link.

Sources:
From anandtech on Vertex 3
And same on Vertex 2

Vertex 3 240GB scored 49152 IOPS 6Gb/s with random data
Vertex 3 240GB scored 39168 IOPS 3Gb/s with random data

Vertex 3 120GB scored 41472 IOPS 6Gb/s with random data
Vertex 3 120GB scored 38912 IOPS 3Gb/s with random data

Vertex 2 100GB scored 41984 IOPS 3Gb/s with random data

Then, if 4k random writes are the same as (or even better than) 4k sequential writes, my drive should be able to push 84MB/s. From your link, the 120GB model scored 43411 IOPS, or about 169MB/s. The one I used was 60GB. Half the size, half the throughput: 169/2 = 84MB/s.

But it didn't! Curiously enough, I'm seeing about half of that:
Code:
# dd if=/tmp/test16GB.bin of=/dev/gpt/ssd-1\:2 bs=4k seek=$RANDOM
17179869184 bytes transferred in 381.576081 secs (45023444 bytes/sec)

Explain that! Because I certainly can't=)
The closest I can come is that sequential writes are twice as hard as random writes for this drive, and that "seek=$RANDOM" only offsets the starting position in the output, so the writes are still sequential.

This just gets harder the more you try to understand it=)
Anyhow, I'll keep updating when I've gotten a chance to test the Vertex 2 120GB, to see if twice the size really equals twice the throughput.

/Sebulon
 
Sebulon said:
@aragon:
Code:
# sysctl dev.cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU0
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.freq: 2133
dev.cpu.0.freq_levels: 2133/35000 1866/30625 1600/16000 1400/14000 1200/12000 1000/10000 800/8000 600/6000 400/4000 200/2000
dev.cpu.0.cx_supported: C1/0 C2/85
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% 0.00% last 274us
dev.cpu.1.%desc: ACPI CPU
dev.cpu.1.%driver: cpu
dev.cpu.1.%location: handle=\_PR_.CPU1
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%parent: acpi0
dev.cpu.1.cx_supported: C1/0 C2/85
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_usage: 100.00% 0.00% last 246us
Good? Bad?

Looks good. The only other suggestion that comes to mind is to change your partition alignment. It looks like you're using a 32k alignment, but SSDs don't necessarily have a 32k erase boundary. 1 MiB would be a safer option.
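Something like this would give you a 1 MiB aligned start (2048 sectors x 512 bytes); adjust the label and size to taste:
Code:
# gpart add -l ssd-1:1 -t freebsd-zfs -b 2048 -s 16G ada9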
 
Sebulon said:
Then, if 4k random writes are the same as (or even better than) 4k sequential writes, my drive should be able to push 84MB/s. From your link, the 120GB model scored 43411 IOPS, or about 169MB/s. The one I used was 60GB. Half the size, half the throughput: 169/2 = 84MB/s.

But it didn't! Curiously enough, I'm seeing about half of that:
Code:
# dd if=/tmp/test16GB.bin of=/dev/gpt/ssd-1\:2 bs=4k seek=$RANDOM
17179869184 bytes transferred in 381.576081 secs (45023444 bytes/sec)

Explain that! Because I certainly can't=)
The closest I can come is that sequential writes are twice as hard as random writes for this drive, and that "seek=$RANDOM" only offsets the starting position in the output, so the writes are still sequential.

So I'm also wondering where you got the 35MB/sec info from. I didn't see that figure myself.

OK, I just had a look at the spec sheets for the Vertex 2 and 3 drives. This seems to explain everything. The Vertex 2 has two different metrics quoted, "max write" and "max sustained write". Your drive has no figure stated for sustained write, but on the other drives it is half the max write. So we might assume your drive has a max sustained write of about 32.5MB/sec. The Vertex 3 doesn't quote sustained write info, so I'd assume it doesn't suffer from lower sustained write speeds. That sounds about the same as your 35MB/sec figure.

I was reading from this page:
http://www.ocztechnology.com/res/manuals/OCZ_Vertex_Product_sheet_1.pdf

cheers Andy.
 
@AndyUKG:
Sorry, no, I forgot to include the source for that one. But it's in the product sheet for the Vertex 2.
At the bottom of page two it says "For AS-SSD Performance metrics, go here".

@aragon:
Very good suggestion. I put it to the test:
Code:
# mdmfs -s 2G md0 /mnt/ram/
# cp /tmp/test2GB.bin /mnt/ram/
# gpart create -s gpt ada9
# gpart add -l log1 -t freebsd-ufs -b 2048 -s 16G ada9
# newfs /dev/gpt/log1
# mount /dev/gpt/log1 /mnt/log1
# dd if=/mnt/ram/test2GB.bin of=/mnt/log1/test2GB.bin bs=4k
2073034752 bytes transferred in 40.751293 secs (50870404 bytes/sec)
Compared to:
Code:
# gpart create -s gpt ada10
# gpart add -l log2 -t freebsd-zfs -b 64 -s 16G ada10
# newfs /dev/gpt/log2
# mount /dev/gpt/log2 /mnt/log2
# dd if=/mnt/ram/test2GB.bin of=/mnt/log2/test2GB.bin bs=4k
2073034752 bytes transferred in 40.612978 secs (51043653 bytes/sec)

So no real difference there. But I remember you using the word "safer" with 1 MiB, so I'm going to go with that in the future as well.

/Sebulon
 
Sebulon said:
@AndyUKG:
Sorry, no, I forgot to include the source for that one. But it's in the product sheet for the Vertex 2.
At the bottom of page two it says "For AS-SSD Performance metrics, go here".

Kind of bizarre; their documents seem to contradict each other. In the main product sheet:

http://www.ocztechnology.com/res/manuals/OCZ_Vertex2_Product_sheet_6.pdf

The "max write" speeds have this note:

"Maximum Sequential Speeds are determined using ATT"

So it states that it is also sequential, which gives a speed of up to 250MB/sec, while the other doc states that sequential write is only 35MB/sec. There doesn't seem to be an additional performance metrics document for the Vertex 3, so you're just left wondering...

Andy.
 
Update:

This thread really should have been called "SSDs: a crapshoot"=)

I've gotten two OCZSSD2-2VTXE120G drives, based on the thought that twice the size gives twice the performance. I was half right.

4k sustained write on random data, locally:
OCZSSD2-2VTXE60G = 33.5MB/s
OCZSSD2-2VTXE120G = 83MB/s

1m sustained write of random data over NFS with mirrored log:
OCZSSD2-2VTXE60G = 32.6MB/s
OCZSSD2-2VTXE120G = 36MB/s

The oddest thing is that gstat shows the logs are only around 50% busy. Why isn't ZFS using the other 50%?
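I was watching the log devices with something like this while the NFS transfer ran:
Code:
# gstat -f 'gpt/ssd'   # %busy for the log partitions hovered around 50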

I've tried with both ashift=9 and 12 on the log vdev, without any difference whatsoever. So I'm going with ashift=9 as the default now; less hassle that way.

I'm destroying my primary pool now for the 100th time to try again, but with striped logs instead. I have also ordered two Intel 320 120GB (SSDSA2CW120G310), since I found in the fine fine-print that it's supposed to handle 130MB/s sustained 4k writes of random data, but we'll see how that translates into NFS write performance in the end, since it sort of didn't with the OCZ drives.

/Sebulon
 
OK, some serious performance testing with the OCZ Vertex 2 120GB

Locally UFS, 4k sustained write with random data
One transfer - 61.3 MB/s - one drive
One transfer - 120.6 MB/s - two gstriped

One SSD, UFS - write over NFS
One transfer - 45.7 MB/s
Two simultaneous - 23.4 MB/s

The SSD gstripe over NFS
One transfer - 61.8 MB/s
Two simultaneous - 32.8 MB/s

ZFS - one log
One transfer - 38.2 MB/s ashift=12
(ashift=9 37.8 MB/s)
Two simultaneous - 23.7 MB/s ashift=12
(ashift=9 23.4 MB/s)

ZFS - two log mirror
One transfer - 35.9 MB/s ashift=12
(ashift=9 35.9 MB/s)
Two simultaneous - 23.4 MB/s ashift=12
(ashift=9 23.4 MB/s)

ZFS - two log stripe
One transfer - 53.9 MB/s ashift=12
(ashift=9 53.6 MB/s)
Two simultaneous - 37.4 MB/s ashift=12
(ashift=9 38.5 MB/s)

What's odd here is that one drive formatted with UFS and written to over NFS scores 45.7MB/s, but two striped drives only scored 61.8MB/s, which is almost exactly half of the performance I got from writing locally to the stripe. Am I imagining things? Is that just unrelated numerology?
I imagined that two striped UFS drives over NFS would give 45.7x2 = 91.4MB/s, but no, that only scored 61.8MB/s...
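For completeness, the UFS stripe in these tests was put together roughly like this (a sketch; the partition names are reused from earlier in the thread and the 128k stripe size is just what I would normally pick):
Code:
# kldload geom_stripe                                                # if not already loaded
# gstripe label -v -s 131072 st0 /dev/gpt/ssd-1:2 /dev/gpt/ssd-2:2   # creates /dev/stripe/st0
# newfs /dev/stripe/st0
# mount /dev/stripe/st0 /mnt/stripe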

That same client has previously pushed 100MB/s over NFS with the 3GB md drive as ZIL, so the client isn't the bottleneck.

I have at least proven that it doesn't make any real difference whether the log vdev has ashift=9 or 12. Partitioning with either -b 64 or -b 2048 has been the only performance optimization so far, and it was crucial by the way: more than a 100% performance improvement compared to letting sysinstall partition and format. Except for that, this journey has been a complete mystery. The only thing left to test is the Intel drives, then I'm done.

/Sebulon
 
Sebulon said:
@Andy:
If you manage to find a drive that actually handles 240MB/s 4k sequential write, let me know, I'll be the first to buy=)

Without asking for the price? :)
There ARE drives out there that do way more than 240MB/s of 4k sequential, or even random, writes. But I believe they will be way out of your budget.

I am no expert on OCZ drives, but reading recent specs/reviews it seems these use compression to achieve most of their 'performance'. While this might be OK for storing DOS, err, Windows data, it does not help the SLOG (separate ZIL).
Therefore, don't test these SSDs with /dev/zero. It is best to create a (large) RAM disk, copy data from /dev/random there and use that 'random' file for your tests.
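For example, something like this (sizes and device names are just illustrative, borrowed from earlier in this thread):
Code:
# mkdir -p /mnt/ram
# mdmfs -s 4g md /mnt/ram                                    # RAM-backed scratch file system
# dd if=/dev/random of=/mnt/ram/random.bin bs=1m count=2048  # 2GB of incompressible data
# dd if=/mnt/ram/random.bin of=/dev/gpt/ssd-1:2 bs=4k        # compression cannot 'help' here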

Anyway, I am missing the FreeBSD and ZFS versions you are playing with. There have been significant performance improvements in recent versions. Also, if you play with the ZIL, using the experimental ZFS v28 may provide much better performance.

The ZIL will help with IOPS, not with throughput. After all, however fast your SSDs are, the data still has to be written to the 'slower' rotating disks.

The idea of a separate ZIL is also to have a device that is undisturbed by other tasks and can just write sequentially. As such, having the ZIL on a pair of normal disks may give you better results. I believe your magnetic disks are capable of at least 100MB/s sequential write.

The SSDs you have are good for L2ARC, because they are not that fast at writing anyway.

Try using a magnetic disk or two for the ZIL and SSDs for cache and see if this helps.
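Roughly like this, for example (device names are purely illustrative):
Code:
# zpool add pool1 log mirror da1 da2              # mirrored log on rotating disks
# zpool add pool1 cache gpt/ssd-1:1 gpt/ssd-2:1   # the SSD partitions become L2ARC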

FreeBSD with ZFS can certainly push 100MB/s over NFS :)
 
@Danbi:
Well, thank god, I was beginning to worry there=)
Yes, I know the difference between RAM SSDs and NAND SSDs. I have a big list of those manufacturers, but even if I wanted to buy, for example, a ZeusIOPS drive, there aren't any resellers around.

Yes, the testing has been performed exactly as you described, with randomly generated data stored on a RAM disk, and I can write to my pool at around 200MB/s, so that shouldn't be a bottleneck.
I'm sticking with FreeBSD 8 because of the stability.

But man, am I banging my head right now. Of course an ordinary drive is better at sequential writes! It's so obvious when you think about it. But why does it say everywhere that you should have an SSD for a ZIL?

The Evil Tuning Guide:
"...using an SSD as a separate ZIL log is a good thing"

Solaris Internals ZFS Best Practices Guide:
"Better performance might be possible by using dedicated nonvolatile log devices such as NVRAM, SSD drives"

Neelakanth Nadgir's blog - The ZFS Intent Log:
"...using nvram/solid state disks for the log would make it scream!"

These were the first three hits when searching for "zfs zil" on Google=)
It kind of sends you over the bridge to fetch water, you know what I mean? There's a very big difference between buying two regular 2.5" drives for about $130 and buying two SSDs of the same size for about $560 and getting the same performance in the end. My wallet is feeling that difference right now, for example=) Luckily for me, I have the right to send them back if I want.
Maybe this should be more clearly explained in the first place you usually consult: the Handbook?

I've gotten my hands on a 2.5" 7200rpm SATA II drive that I will test out tonight. Fingers crossed!

/Sebulon
 