NFS write performance with mirrored ZIL

usdmatt

I'm not sure how you are mounting your NFS share but I've been looking for ways to get decent NFS sync write performance for a while. I have a Linux box serving some VMware images that I really, really would like to get replaced with FreeBSD+ZFS. Not only am I much more comfortable with FreeBSD but I would get better snapshots / zfs send / zfs scrub, etc.

From what I understand the slow NFS write performance is due to the fact that every sync NFS request has to be flushed to disk before the client can proceed with the next request. The (relatively) slow access times of mechanical disks mean you can't actually process that many sync NFS requests per second, dragging the throughput down.
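
As a rough illustration of that reasoning, here is a back-of-the-envelope sketch. The 8 ms and 1 ms flush times and the 32 KiB request size are assumptions picked for illustration, not measurements, and `sync_ceiling` is a made-up helper name.

```sh
#!/bin/sh
# Upper bound on sync write throughput when every request must wait
# for one flush to stable storage before the next one is sent.
# Usage: sync_ceiling <flush_ms> <request_bytes>  -> prints bytes/sec
sync_ceiling() {
    flush_ms=$1
    request_bytes=$2
    # requests per second = 1000 ms / time per flush (integer math)
    reqs_per_sec=$((1000 / flush_ms))
    echo $((reqs_per_sec * request_bytes))
}

# Hypothetical mechanical disk: 8 ms per flush, 32 KiB per NFS write
sync_ceiling 8 32768    # 4096000 bytes/sec, roughly 4 MB/s
# Hypothetical log device: 1 ms per flush, same request size
sync_ceiling 1 32768    # 32768000 bytes/sec, roughly 33 MB/s
```

In this simple model the ceiling is set by flush latency, not by how fast the disk can stream data.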

NFS to a small test pool only gave me ~5MB/s, which matches what was seen in this thread:
http://lists.freebsd.org/pipermail/freebsd-fs/2009-September/006884.html
Obviously the local performance was way above this.

I was able to increase this to ~35MB/s by adding a single 60GB Vertex 2 SSD.

In order to replace my Linux box I ideally need to be seeing 80MB/s+, so I'd be really interested if anyone can pull it off without spending a fortune on PCIe SSDs. So far, I'm not aware of anyone who has managed more than about 60MB/s with standard hardware. Hacks that could jeopardize data integrity, like trying to mount async (I don't think it's even possible in VMware) or disabling the ZIL, are not really an option.

Interestingly, iXsystems make some FreeBSD 8.2 NAS boxes with Fusion-io cards that can apparently max out 10Gb Ethernet, although I assume that's not with NFS. I'd be interested to see what their NFS performance is like though (not that I could afford them).
 
Sebulon

Extended 4k sustained write tests

MO:
Code:
# mdmfs -s 4096m md0 /mnt/ram
# dd if=/dev/urandom of=/mnt/ram/test4GB.bin bs=1m count=4096
# dd if=/mnt/ram/test4GB.bin of=/dev/ada0(.nop,s1,s1d,p1,p1.nop) bs=4k
Or, if towards filesystem:
# dd if=/mnt/ram/test4GB.bin of=/mnt/ada0/test4GB.bin bs=4k
Code:
--------------------------------
OCZ Vertex 2 120GB
  Local writes:
  raw            56 MB/s
  fdisk          17 MB/s
  fdisk/label    17 MB/s
  gpart          55 MB/s
  gnop           55 MB/s
  gpart/gnop     54 MB/s

  raw bs=128k    60 MB/s

  sysinstall ufs 56 MB/s
  gpart ufs      60 MB/s
  raw zfs        44 MB/s
  gpart zfs      43 MB/s
  gnop zfs       51 MB/s

  Score as mirrored ZIL:
  raw            49 MB/s
  fdisk          56 MB/s
  gpart          56 MB/s
  gnop           56 MB/s
  gpart/gnop     53 MB/s
--------------------------------
Intel 320 120GB
  Local writes:
  raw            52 MB/s
  fdisk          52 MB/s
  fdisk/label    51 MB/s
  gpart          52 MB/s
  gnop           51 MB/s
  gpart/gnop     50 MB/s

  raw bs=128k    128 MB/s

  sysinstall ufs 132 MB/s
  gpart ufs      131 MB/s
  raw zfs        70 MB/s
  gpart zfs      73 MB/s
  gnop zfs       72 MB/s

  Score as mirrored ZIL:
  raw            52 MB/s
  fdisk          52 MB/s
  gpart          52 MB/s
  gnop           52 MB/s
  gpart/gnop     52 MB/s
--------------------------------

References:
--------------------------------
CompactFlash 16GB
  Local writes:
  raw            18 MB/s
  fdisk          17 MB/s
  gpart          18 MB/s

  raw bs=128k    64 MB/s

  sysinstall ufs 46 MB/s
  gpart ufs      37 MB/s
--------------------------------
5.4k rpm 160GB
  Local writes:
  raw            32 MB/s
  fdisk          34 MB/s
--------------------------------
10k rpm 146GB
  Local writes:
  raw            59 MB/s
  fdisk          55 MB/s

  raw bs=128k    101 MB/s

  sysinstall ufs 96 MB/s

  Score as mirrored ZIL:
  raw            52 MB/s
  fdisk          52 MB/s
--------------------------------
The NAND SSD ZIL pimple is popped!

These tests have been performed on my rig and also on an HP DL380 G5, and produced the same results. You get the exact same performance out of a 10k rpm rotating disk as you get from an SSD of the same size. Intel says 130MB/s and they are only half lying, since the drive does reach that, but only at 128k block size.

At 4k, the Intel dropped down to the same performance as both the OCZ and the 10k rpm drive. So far, the performance you get from doing 4k writes to a raw device or partition is the same performance you get over NFS in the end. Therefore, my eyes have turned towards the Seagate SAVVIO 15k.2.

Looking at the 5.4k rpm and 10k rpm drives, the performance has scaled roughly linearly with rpm, so what I'm hoping is that a 15k rpm drive will have the combined performance of the two, like "5.4k rpm + 10k rpm = 15k rpm", which in performance would be "32MB/s + 59MB/s = 91MB/s" at 4k block size. That would make them perfect as SLOG! I'll update once I've had a chance to test it out.
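
The guess can be written down two ways. This is only arithmetic on the two measurements above; that throughput really scales in a straight line with rpm is an assumption, and `extrapolate` is a throwaway helper name.

```sh
#!/bin/sh
# Straight-line extrapolation through two (rpm, MB/s) measurements.
# Usage: extrapolate <rpm1> <mbs1> <rpm2> <mbs2> <target_rpm>
extrapolate() {
    awk -v r1="$1" -v m1="$2" -v r2="$3" -v m2="$4" -v rt="$5" 'BEGIN {
        slope = (m2 - m1) / (r2 - r1)      # MB/s gained per extra rpm
        printf "%.1f\n", m2 + slope * (rt - r2)
    }'
}

# Simple sum from the post: 32 + 59
echo $((32 + 59))                      # 91
# Line through (5400, 32) and (10000, 59), evaluated at 15000 rpm
extrapolate 5400 32 10000 59 15000     # 88.3
```

Both approaches land around 90MB/s, so the expectation does not depend much on which one is used.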

/Sebulon
 
Sebulon

Seen around the net about Seagate's Pulsar

http://www.foodreview.imix.co.za/node/91093
"Seagate claims the solid-state drives can perform up to 30,000 random read and up to 25,000 random write IOPS, or 240 MBps sequential reads and 210 MBps sequential writes with a 4K block size."

http://www.theregister.co.uk/2011/03/15/seagate_enterprise_drive_refresh/
"The Pulsar XT.2 – with 100, 200 and 400GB capacity points – has the same endurance and reliability, and offers 48,000 sustained random read IOPS (4K blocks), 22,000 write IOPS, 360MB/sec bandwidth for sequential reads and 300MB/sec write bandwidth: good figures."

I couldn't agree more=)
If anyone has a chance to test them out, please do!


Also read here:
http://www.natecarlson.com/2010/05/07/review-supermicros-sc847a-4u-chassis-with-36-drive-bays/
Write-up about a rig that can saturate two trunked 1GigE links using two Intel X25-E drives as mirrored ZIL.

It's so hard to really understand the difference between SSDs, because when you're looking at Intel's datasheets, the 320s I tested actually looked better in comparison. They really should start to post "sequential 4k read/writes" to make this easier to spot. Right now, they only post "sequential read/write" without mentioning block size. Now, I know from experience that the Intel 320 120GB scored 130MB/s only with 128k block size and not even half of that with 4k. Being clearer on that point would save you the trouble of having to test everything yourself before you could be completely sure.

Intel 320 120GB posts 130MB/s sustained write
Intel 320 160GB posts 165MB/s sustained write
Intel X25-E 32GB posts 170MB/s sustained write
Intel 320 300GB posts 205MB/s sustained write

As tested before, the 320 only scored 52MB/s sustained write at 4k
And judging from the earlier linked write-up, the X25-E really could have 170MB/s sustained write at 4k, but what would be the cost of testing all of this out?

So to get about the same performance:
2x Intel 320 160GB costs about 790$
2x Intel X25-E 32GB costs about 1080$
2x Intel 320 300GB costs about 1400$

Yeah, like that's gonna happen=)

2x Seagate SAVVIO 15k.2 146GB costs about 560$ in comparison.

/Sebulon
 

danbi

You will never, ever need 146GB for a ZIL. Sharing that (rotating) disk with another task will make the ZIL perform very poorly.
 
Sebulon

@danbi:
You wouldn't need 32, 64, 120, or even 300 either, but I just couldn't find any smaller than that=) It's all about raw performance. Either way, 15k rpm drives are faster than regular 7.2k rpm drives, or 5.4k rpm ones if you're building your pool with 2.5" drives for example. And for random writes, IOPS is limited by the number of vdevs in your pool. So if you're building a pool with raidz(2,3), you're probably gonna get more random write IOPS from one 15k rpm drive anyway.
And I'll only be using them as ZIL, doing sequential writes. Think transferring large ISOs, GIS data, VMs and so on.

I am aware that you'll only use what it takes to saturate your network IO, in my case 100MB/s, or about 1GB. However, I have measured the difference between a 1GB md-drive and a 3GB md-drive as ZIL, and the 3GB one scored higher in the end, so it's at least good to have a little headroom.
The Oracle systems have as much as 16GB ZIL. So partitioning the first 8GB of the SAVVIOs would ensure both proper size and best performance, using the outermost tracks on the disk.
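
A sketch of what that partitioning could look like. The device names `da1`/`da2`, the labels `log0`/`log1` and the pool name `tank` are placeholders, and the assumption is that a partition at the start of the disk sits on the outer tracks. The script only prints the commands unless DRY_RUN=0 is set, so check everything against your own system first.

```sh
#!/bin/sh
# Print (or run) the commands for carving an 8 GB log partition at the
# start of each disk and attaching the pair to the pool as a mirrored log.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"          # only show what would be executed
    else
        "$@"
    fi
}

make_log_mirror() {
    pool=$1; disk_a=$2; disk_b=$3
    n=0
    for disk in "$disk_a" "$disk_b"; do
        run gpart create -s gpt "$disk"
        # First partition on the disk, 8 GB, labelled log0 / log1
        run gpart add -t freebsd-zfs -s 8G -l "log$n" "$disk"
        n=$((n + 1))
    done
    run zpool add "$pool" log mirror gpt/log0 gpt/log1
}

make_log_mirror tank da1 da2
```

In dry-run mode this prints five commands: two `gpart` calls per disk followed by the `zpool add ... log mirror` line.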

/Sebulon
 
Sebulon

I've finally gotten my hands on two SAVVIO 15k.2 drives and had a chance to test them out. These tests have been performed on an HP DL380 G5. The pool is made up of three 10k rpm HP drives in raidz and the two SAVVIOs as mirrored ZIL.

First checking the network:
Code:
iPerf:
------------------------------------------------------------
Client connecting to 10.20.0.99, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 10.20.0.56 port 46240 connected with 10.20.0.99 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   937 Mbits/sec
Local writes on server:
Code:
# mdmfs -s 2304m md0 /mnt/ram
# dd if=/dev/urandom of=/mnt/ram/test2GB.bin bs=1m count=2048
# dd if=/mnt/ram/test2GB.bin of=/dev/ada0(s1,s1d) bs=4k
NFS writes from a client:
Code:
# mdmfs -s 2304m md0 /mnt/ram
# dd if=/dev/urandom of=/mnt/ram/test2GB.bin bs=1m count=2048
# mount 10.20.0.99:/export/tank /mnt/tank/perftest
# dd if=/mnt/ram/test2GB.bin of=/mnt/tank/perftest/test2GB.bin bs=1m
# dd if=/mnt/ram/test2GB.bin of=/mnt/tank/perftest/test2GB-2.bin bs=1m
# dd if=/mnt/ram/test2GB.bin of=/mnt/tank/perftest/test2GB-3.bin bs=1m
# umount /mnt/tank/perftest
Finally, the results:
Code:
--------------------------------
SAVVIO 15k.2 146GB
  Local writes:
  raw            60 MB/s
  fdisk          56 MB/s

  raw bs=128k    165 MB/s

  Score as mirrored ZIL:
  raw            50 MB/s
  fdisk          50 MB/s
--------------------------------
As you can see, the results are amazingly bad, unfortunately. They give the exact same performance you get from just about any other drive I've tested so far. Therefore, since I've apparently started chasing windmills, I've also ordered the supposed knight in shining armor, the Intel X25-E, which is about the only drive I've been able to find any kind of performance benchmarks for elsewhere.

The best source of info I´ve found on the matter is from here:
http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs

Where the writer states:
"SLC flash is faster for writes and more reliable, therefore it's the best choice for a ZIL"

And then:
"MLC flash will give you more capacity for your money, but is less reliable and less fast than SLC. Therefore, MLC makes a good read accelerator when you're budget-constrained."

But at the same time, these statements don't seem to be based on factual experience:
"I don't have any empirical data, but from the way SLC and MLC work, it is to be expected that SLC drives are faster and more reliable than MLC drives."

Well, I had great expectations when I woke up this morning too, but look how that turned out=)

I'm also gonna give the SAVVIOs a fair chance in some other machines as well. I bought SAS to SATA converters here for testing but the drives didn't even spin up, much less show up in the OS. So I've ordered a pair of these instead, hoping they'll work better.

/Sebulon
 

usdmatt

Hi Sebulon,

Thank you for providing this information. It's great to know how these drives perform locally *and* over NFS. Pretty much all our storage needs are NAS/SAN based so network storage performance is my main concern. I can't afford to buy drives just for testing and my company wouldn't be too happy spending a fortune on new hardware just to find out it doesn't perform any better than our existing systems, even if it does make our data more secure and easier to manage.

Have you tested any of these drives as striped ZIL?
Obviously in a live system you'd want striped mirrors but if the performance does scale, it wouldn't be out of the way to have 4 or 6 drives as ZIL, enough to max out 1Gb ethernet at least.

Looking forward to the X25-E results. Hopefully we'll get some good news for once...
 
Sebulon

@usdmatt:
No problem man, happy to help. But don't think I'm sponsored or that it's through my job or such. I buy those drives out of my own budget, and then I have the right to send them back within 14 days by law. I think of it as paying to test them out=)

Code:
Over NFS single transfer:
1x md-drive     128MB             = 54MB/s
1x md-drive     256MB             = 57MB/s
1x md-drive     512MB             = 60MB/s
1x md-drive     768MB             = 67MB/s
1x md-drive     1GB               = 77-80MB/s
1x md-drive     2.4GB             = 77-80MB/s

Over NFS double transfer total:
1x md-drive     128MB             = 64MB/s
1x md-drive     256MB             = 66MB/s
1x md-drive     512MB             = 70MB/s
1x md-drive     768MB             = 78MB/s
1x md-drive     1GB               = 90MB/s
1x md-drive     2.4GB             = 90MB/s


---------------------------------------------------
Over NFS single transfer:
2x OCZ Vertex 2 120GB stripe log  = 69MB/s
2x Intel 320    120GB stripe log  = 69MB/s
2x HP 10k SAS   146GB stripe log  = 69MB/s
2x SAVVIO 15k.2 146GB stripe log  = 66MB/s

2x HP 10k SAS   146GB mirror log  = 58MB/s
2x SAVVIO 15k.2 146GB mirror log  = 56MB/s
Well, what about that! The HP drives actually outrun the SAVVIOs. God damn it.
I also redid the tests with mirrored ZIL and got better results than what I posted before. The reason was that I had a 2.2GB md-drive configured on the server, and when I deleted it and gave that RAM back to the OS, it performed better.

Also, I tested what difference the size of the ZIL actually makes. It's exactly as explained. The ZIL only has to be as large as your bandwidth. If you have 1GigE, you only need 1GB ZIL. If you have 10GigE, you're gonna need 10GB ZIL. Good to know.
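
One way to reason about that rule of thumb: the log only has to hold what can arrive over the wire before ZFS commits it to the pool. The 10 second window below is an assumption chosen for illustration, not a measured or documented value, and `slog_size_mb` is a made-up helper.

```sh
#!/bin/sh
# Rough log-device size: link speed times how long data may sit in the log.
# Usage: slog_size_mb <link_mbit_per_sec> <window_seconds>  -> prints MB
slog_size_mb() {
    link_mbit=$1
    window_s=$2
    # Mbit/s divided by 8 gives MB/s; multiply by the window length
    echo $((link_mbit * window_s / 8))
}

slog_size_mb 1000 10     # 1GigE:  1250 MB, a bit over 1GB
slog_size_mb 10000 10    # 10GigE: 12500 MB, a bit over 10GB
```

With a different window the absolute numbers change, but the required size still grows in step with the link speed.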

/Sebulon
 

danbi

usdmatt said:
I can't afford to buy drives just for testing and my company wouldn't be too happy spending a fortune on new hardware just to find out it doesn't perform any better than our existing systems, even if it does make our data more secure and easier to manage.

That would be a non-commercial company then :)

wasted time = loss of money
data loss = loss of lots of money
secure data = less data loss
easier to manage data = less time spent

and so on :)

Anyway, I just tried to sort of repeat Sebulon's tests, on similar hardware.

system 1: Xeon X3450, 8GB RAM, LSI 1068e, 2xST9146852SS (Savvio 15k), 2xMBF2600RC (VERY BUSY database server)

This is copying to ZFS filesystem.

Code:
# mdmfs -s 2304m md1 /mnt
# dd if=/dev/urandom of=/mnt/test2GB.bin bs=1m count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 29.238521 secs (73447068 bytes/sec)
# dd if=/mnt/test2GB.bin of=/fast/junk bs=4k
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 24.428726 secs (87908131 bytes/sec)
system 2: 2x Xeon E5620, 48GB RAM, LSI 2008, 2xMBF2600RC (idle)

This is copying to the raw GPT partition.

Code:
# mdmfs -s 2304m md1 /mnt
# dd if=/dev/urandom of=/mnt/test2GB.bin bs=1m count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 32.296183 secs (66493419 bytes/sec)
# dd if=/mnt/test2GB.bin of=/dev/gpt/data0 bs=4k
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 3153.543603 secs (680975 bytes/sec)
Observations: writing to the raw devices for some reason has always been slow. Putting those under management of ZFS made them run much faster. In fact, using UFS made them run faster. Go figure. :)

On system 2, the disk was saturated at 166 IOPS (about 680KB/s), weird! With 1m block size, it saturated at 145 IOPS (about 19 MB/s).
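
A quick sanity check of those figures, as plain arithmetic. The 4k case matches the dd result almost exactly. For the 1m case, the 19 MB/s only works out if each operation gstat counts is 128 KiB rather than 1 MiB; that the larger writes get split this way is a guess, not something verified here. `iops_to_bytes` is a made-up helper name.

```sh
#!/bin/sh
# Throughput implied by an IOPS figure and a per-operation size.
# Usage: iops_to_bytes <iops> <bytes_per_op>  -> prints bytes/sec
iops_to_bytes() {
    echo $(($1 * $2))
}

iops_to_bytes 166 4096       # 679936, vs 680975 bytes/sec reported by dd
iops_to_bytes 145 131072     # 19005440, about 19 MB/s
iops_to_bytes 145 1048576    # 152043520, what full 1 MiB operations would give
```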

A non-raw example, on system 2:

Code:
# newfs /dev/gpt/data0
# mount /dev/gpt/data0 /media
# dd if=/mnt/test2GB.bin of=/media/junk bs=4k
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 14.184637 secs (151395037 bytes/sec)
The drive was going over 1150 IOPs and 150MB/s. So there might be some raw device IO weirdness going on here.

What is your IOPs value (observed with gstat)?
 
Sebulon

To all:
I'm just a dude, you know. And not the most 1337 dude in the world either=) Chances are that I've been doing something wrong all along, but so far no one on this forum seems to dispute my methods. But don't just take my word as absolute, go research all this and find out for yourselves! Because this feels like a complete crapshoot from the start.

@danbi
Yes, way to go man, starting a knowledge revolution over here=) Awesome to have others testing this as well and posting their results!
I'm testing default installs of amd64 FreeBSD 8.2-RELEASE with the same methods, similar networks and clients, and the same SLOGs, but on as many different servers as possible to see if anything changes depending on the hardware/environment you have. So far, the results have been quite the same, and not a single drive I've tested seems to hold its worth as ZIL.

I've also noticed the same behaviour as you: the difference between writing to a filesystem and writing to a device. The only explanation I can think of is:

Code:
dd if=ram of=/dev/something bs=4k (little MB/s)
dd if=ram of=/dev/something bs=128k (lots of MB/s)

dd if=ram of=/mount/file bs=4k (still lots of MB/s)
dd if=ram of=/mount/file bs=128k (same same)
That should mean that writing to a file in a filesystem transforms the write block size into whatever the filesystem likes best, which is about as big as possible, no matter what you specify as bs to the application. ZFS, for example, has 128k as default, simply because that's the block size hard drives like to write best.
When you're writing directly to a device, however, there is no filesystem that's allowed to have a say in the matter, so dd writes exactly what you specify with bs without any conversion, showing the device's real speed at 4k. And that's the performance you get over NFS when using it as a SLOG device.
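
To make the point concrete, here is the number of write requests needed for the same 2GB test file at the two block sizes. It is only a sketch of the arithmetic; `count_writes` is a made-up helper, and that the filesystem really merges small writes into 128k ones is an assumption drawn from the numbers above.

```sh
#!/bin/sh
# Number of writes needed to move a file at a given block size
# (rounded up so a partial last block still counts as one write).
# Usage: count_writes <file_bytes> <block_bytes>
count_writes() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

file_bytes=$((2048 * 1024 * 1024))    # the 2GB test file
count_writes "$file_bytes" 4096       # 524288 writes at bs=4k
count_writes "$file_bytes" 131072     # 16384 writes at bs=128k
```

That is 32 times fewer requests at 128k for the same amount of data.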

/Sebulon
 

danbi

By the way, I just discovered, that my SAS drives had write cache disabled. Enabling it on the Toshiba drives:

Code:
dd if=/dev/zero of=/dev/gpt/data0 bs=4k count=1m
1048576+0 records in
1048576+0 records out
4294967296 bytes transferred in 125.264367 secs (34287223 bytes/sec)
So, much different than before.

Check your SAS drives:

camcontrol modepage da0 -m 0x08

look for WCE, if this is 0 your write cache is disabled; to change, use

camcontrol modepage da0 -m 0x08 -e

set

Code:
WCE:1
You may discover what your Savvio can do for you :)
 
Sebulon

@danbi
Tried to use the camcontrol commands you specified. None of them have worked so far on any system, unfortunately. But I know that write cache is on by default in FreeBSD, and that is true with AHCI as well. If I fired it up with the old ata driver and had:
Code:
hw.ata.wc=0
in loader.conf, then the local write speed went down to about 8MB/s on the X25-E, proving that it had been active before as well.

---------------------------------------------

OK, before you read further I want you all to be sitting comfortably, because the results were shocking... I take no responsibility for sudden fainting, broken arms, bones and so on.

I've-ahh...gotten the X25-E now. Gotten a chance to test it. But also, at work, I took the opportunity to pull a STEC Zeus IOPS SSD 16GB out of a shut-down Oracle system, to test it the same way as the rest.

Code:
Intel X25-E 32GB
  Local writes:
  raw            77 MB/s

  raw bs=128k    197 MB/s

  Score as ZIL:
  raw            60 MB/s
Code:
Zeus IOPS 16GB
  Local writes:
  raw            64 MB/s

  raw bs=128k    133 MB/s

  Score as ZIL:
  raw            55 MB/s
Yeah, that's right. The X25-E outperformed the almighty Zeus. Also, I have compiled my numbers from all the previous tests, to get a better overview. These tests have been made on as many different machines as possible and the results have been about the same.

2011-11-01: Highscore is moved to top of first post. It's a more logical place to have it: faster to find for those reading for the first time and also easier for me to update as time passes.

The results were definitely not what I expected. Not one device is able to shuffle NFS at 100MB/s and testing the Zeus was probably the biggest anticlimax ever. But, no matter how you look at it, we can see that Intel's X25-E is the winner of every test.

So if speed is your top priority, my advice would be to buy X25-Es until you hit your target. ZFS V28 is in STABLE now, so at least you won't have to mirror the logs any more. Just don't count on two logs giving double performance. At least with ZFS V15, I would say I gained about 20% with striped logs vs mirrored.
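
The stripe vs mirror NFS numbers posted earlier in the thread are in line with that 20%. A small sketch of the arithmetic, with `stripe_gain` as a made-up helper name:

```sh
#!/bin/sh
# Percentage gain of a striped log over a mirrored log.
# Usage: stripe_gain <stripe_mbs> <mirror_mbs>
stripe_gain() {
    awk -v s="$1" -v m="$2" 'BEGIN { printf "%.0f\n", (s - m) * 100 / m }'
}

stripe_gain 69 58    # HP 10k SAS:   19
stripe_gain 66 56    # SAVVIO 15k.2: 18
```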

As a last test, I am going to upgrade to V28 and test one more time with the X25-E to report what difference that might give.

/Sebulon
 

danbi

Sebulon said:
@danbi
Tried to use the camcontrols you specified. None of them have worked so far, on any system unfortunately.
There is a difference between SAS and SATA write cache and how each is enabled in FreeBSD. You have enabled the SATA write cache, but that does not affect SAS drives at all.

What does

# camcontrol modepage da0 -m 0x08

produce?
 
Sebulon

@danbi
OK, didn't know that. Good to know, but it does little for me:
Code:
[root@tank ~]# camcontrol modepage da0 -m 0x08
camcontrol: error sending mode sense command
This is probably because of the HP Smart Array controller. It has no JBOD mode. You have to create a RAID0 for each drive.

/Sebulon
 
Sebulon

Hi,

Upgraded to 8-STABLE ZFS V28

Still the same performance with 1x Intel X25-E 32GB. About 60-70MB/s at best. I'm hoping someone will prove me wrong, but it seems that is as good as it gets. Striping several has a proven positive effect, but it doesn't scale linearly with the number of devices, so I can't say for sure how many you'd need to hit 100MB/s.

Perhaps someone here is feeling daring enough to pick up where I leave off?

The truth is out there.

/Sebulon
 

gyrex

Apologies for hijacking this thread but I was trying to figure out whether the Supermicro CSE-M35T-1 is SATA II capable. This was about the only result that came up!

Sebulon, do you know if the CSE-M35T-1 is SATA II capable? Could this be the reason why your performance is degrading?

There's no mention on Supermicro's site (http://www.supermicro.com/products/accessories/mobilerack/CSE-M35T-1.cfm) that indicates that this drive caddy is SATA II capable - it only says that it's a SATA (assuming SATA I) drive caddy/backplane.

I have 2 of these and was just wondering if this backplane will limit my performance.

Cheers,

John
 
Sebulon

Hi gyrex and welcome!

I have two of those. Each port is SATA-150 and I have never connected a disk that could have exceeded that in IO. I mean, I have only regular hard drives connected to the caddy and they only generate about 80-100MB/s tops. I have however tested running:
Code:
# dd if=/dev/ada0-9 of=/dev/zero bs=1m
To test the total backplane and controller bandwidth, as the caddies are evenly connected to three PCI-X SATA-300 controllers. It generated IO around 1GB/s. So cool=)

The SSDs I've tested have been directly connected to the built-in ICH9 SATA-300 controller on the motherboard, so they don't get "disturbed" by IO coming from the other drives. The Intel X25-E could generate as much as 197MB/s but only at bs=1m. At bs=4k it shuffled only 77MB/s.

/Sebulon
 

danbi

Isn't this a passive drive enclosure? As far as I can see, it doesn't even have a backplane or port multiplier.
 
Sebulon

Have an update!

I have gotten my hands on an OCZ Vertex 3 240GB. I took my time testing it in the following rig:
Code:
HW
1x  Supermicro X8SIL-F
2x  Supermicro AOC-USAS2-L8i
2x  Supermicro CSE-M35T-1B
1x  Intel Core i5 650 3.2GHz
4x  2GB 1333MHZ DDR3 ECC UDIMM
10x SAMSUNG HD204UI (in a raidz2 zpool)
1x  OCZ Vertex 3 240GB

SW
# uname -a
FreeBSD server 8.2-STABLE FreeBSD 8.2-STABLE #0: Mon Oct 10 09:12:25 UTC 2011     root@server:/usr/obj/usr/src/sys/GENERIC  amd64
# zpool get version pool1
NAME   PROPERTY  VALUE    SOURCE
pool1  version   28       default
Code:
# iperf -c server
------------------------------------------------------------
Client connecting to server, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local client port 45921 connected with server port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.08 GBytes   927 Mbits/sec
Code:
LOCAL WRITES
# gpart create -s gpt da5
da5 created
# gpart add -t freebsd-zfs -b 2048 -l log1 da5
da5p1 added
# gpart show da5
=>       34  468862061  da5  GPT  (223G)
         34       2014       - free -  (1M)
       2048  468860047    1  freebsd-zfs  (223G)
# dd if=/mnt/ram/rand2GB.bin of=/dev/gpt/log1 bs=4k
524288+0 records in
524288+0 records out
2147483648 bytes transferred in 34.741185 secs (61813771 bytes/sec)
# dd if=/mnt/ram/rand2GB.bin of=/dev/gpt/log1 bs=128k
16384+0 records in
16384+0 records out
2147483648 bytes transferred in 7.921530 secs (271094554 bytes/sec)
Code:
OVER NFS
with ssd log:
async)  2147483648 bytes transferred in 26.854198 secs (79968266 bytes/sec)
sync)   2147483648 bytes transferred in 30.528600 secs (70343339 bytes/sec)
with md log:
async)  2147483648 bytes transferred in 38.788051 secs (55364567 bytes/sec)
sync)   2147483648 bytes transferred in 121.933071 secs (17611987 bytes/sec)
without log:
async)  2147483648 bytes transferred in 38.690945 secs (55503520 bytes/sec)
sync)   2147483648 bytes transferred in 136.648112 secs (15715429 bytes/sec)
Tests over NFS have been made like:
Code:
async)
# mount -o async server:/export/perftest /mnt/tank/perftest
# dd if=/mnt/ram/rand2GB.bin of=/mnt/tank/perftest/rand2GB.bin bs=1m
# dd if=/mnt/ram/rand2GB.bin of=/mnt/tank/perftest/rand2GB-2.bin bs=1m
# dd if=/mnt/ram/rand2GB.bin of=/mnt/tank/perftest/rand2GB-3.bin bs=1m
# umount /mnt/tank/perftest
sync)
# mount server:/export/perftest /mnt/tank/perftest
# dd if=/mnt/ram/rand2GB.bin of=/mnt/tank/perftest/rand2GB.bin bs=1m
# dd if=/mnt/ram/rand2GB.bin of=/mnt/tank/perftest/rand2GB-2.bin bs=1m
# dd if=/mnt/ram/rand2GB.bin of=/mnt/tank/perftest/rand2GB-3.bin bs=1m
# umount /mnt/tank/perftest
The ZIL tests were positive! It is the fastest disk I've tested so far and bested even the X25-E as ZIL. Also, the "Highscore" a couple of posts above has been updated.
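
Putting the NFS numbers above side by side, as plain arithmetic on the dd output (`speedup` is a throwaway helper name, not part of the test setup):

```sh
#!/bin/sh
# Ratio between two dd throughput figures, to one decimal place.
# Usage: speedup <bytes_per_sec_new> <bytes_per_sec_old>
speedup() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f\n", new / old }'
}

# sync NFS writes: SSD log vs no log device
speedup 70343339 15715429    # 4.5
# async NFS writes: SSD log vs no log device
speedup 79968266 55503520    # 1.4
```

So in this run the SSD log made sync writes about four and a half times faster than having no log device at all.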

One odd observation from these tests: I added a 1GB ram-md disk as a "best possible" disk for ZIL and ZFS didn't even use it?! I mean, I added md0 as ZIL in the pool, started the sync'ed transfers from the test client, watched gstat on the server during this time, and the md0 drive was never written to. I then removed it from the pool and re-added the Vertex as ZIL instead, and instantly ZFS started using the ZIL as it normally does. I tried restarting the server, destroying and re-creating with a bigger md device, partitioning it, and destroying the Vertex's partition and label and using that same gpart label on md0p1 (gpt/log1) instead. Nothing worked. The only disk it wrote to as ZIL was the Vertex. It has worked in earlier tests though. Very odd.

/Sebulon
 

olav

Impressive results!
Is the OCZ Vertex 3 240GB safe to use as ZIL?
 
Sebulon

No capacitors on this model. I'm just gonna be using it as L2ARC so I don't mind. I have however found information that the Vertex 3 Pro has "Power loss data protection".
Also found this:
http://www.legitreviews.com/article/1547/2/
"On the bottom left is the large Cap-XX HZ202 super-capacitor that ensures that all writes are completed in the event of power interruption."
So the Vertex 3 Pro is what you want.

/Sebulon
 

danbi

It is faster than the X25-E because that Intel drive is really old. But the X25-E also uses SLC flash, which means:

- less risk of device/data failure at power outage
- much, much larger lifespan

If you are using it in an enterprise environment (the X25-E is clearly an enterprise drive), you must care about both features. If not... you know, you can build an arbitrarily fast system if data integrity is not a concern.
 
peetaur

How does your spinning disk pool do when reading and writing to the same zpool?

eg.

Code:
#clear cache (does this work? works on Linux for ext3/4)
zfs umount -a
zfs mount -a
#or maybe:
zpool export pool
zpool import pool
#or pick a file that nobody read for days

dd if=/tank/openSUSE-11.4-DVD-x86_64.iso of=/tank/testfile bs=128k
35208+0 records in
35208+0 records out
4614782976 bytes transferred in 20.468894 secs (225453460 bytes/sec)

[edit: since FreeBSD's dd has no conv=fdatasync I didn't know what to do above for the best results, but now I do, and here it is]

gdd if=/tank/openSUSE-11.4-DVD-x86_64.iso of=/tank/testfile bs=128k conv=fdatasync
35208+0 records in
35208+0 records out
4614782976 bytes (4.6 GB) copied, 26.8473 s, 172 MB/s

My performance was surprisingly lame (16 disks doing read at 600MB/s, and write+read together at 150 or so when combined [slower than a consumer fake raid 4 disk stripe]) until I set up the /boot/loader.conf to change the zfs tunables. Now as you can see, it will read and write at around 450 (faster than some other untuned 24 disk SAS system we have). I used this page as my template:
http://hardforum.com/archive/index.php/t-1551326.html
 