ZIL performance

Hi all,

My system:
  • Dual six core CPU
  • 196 GB RAM
  • 24x 2 TB 7200 RPM SAS drives (12x mirrored vdevs)
  • 4x 120 GB eMLC Samsung SM843T SSDs (2x mirrored vdevs) for the ZIL
  • 2x 240 GB eMLC SM843T SSDs for L2ARC
  • 2x 10 Gbit NIC

I have set up my ZIL with the 4x eMLC SSDs as 2x mirrored vdevs, under-provisioned to 8 GB each, like this:
Code:
# Create a GPT partition table on each of the four SSDs
gpart create -s GPT da0
gpart create -s GPT da1
gpart create -s GPT da2
gpart create -s GPT da3
# Add a 4k-aligned 8 GB partition, leaving the rest of each SSD unprovisioned
gpart add -t freebsd-zfs -a 4k -s 8192M da0
gpart add -t freebsd-zfs -a 4k -s 8192M da1
gpart add -t freebsd-zfs -a 4k -s 8192M da2
gpart add -t freebsd-zfs -a 4k -s 8192M da3
# Attach the GPT partitions (p1, not s1) as two mirrored log vdevs
zpool add tank log mirror da0p1 da1p1
zpool add tank log mirror da2p1 da3p1
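For the record, gpart show can be used to double-check the layout; it prints each disk's partition table, so the 4k alignment and the 8 GB size are easy to verify:
Code:
# Show the partition table of each SSD (alignment and size check)
gpart show da0 da1 da2 da3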

Logs:
Code:
                capacity     operations    bandwidth
              alloc   free   read  write   read  write
mirror        11.7M  7.93G      0     11      0   167K
  da0p1           -      -      0     11      0   167K
  da1p1           -      -      0     11      0   167K
mirror        11.5M  7.93G      0     12      0   104K
  da2p1           -      -      0     12      0   104K
  da3p1           -      -      0     12      0   104K

My first question is: will I get more performance with two mirrored log vdevs, as in my setup? Are there any tweaks to get more ZIL performance? I only get the ZIL to handle about 100 MB/s over NFS; if I disable the ZIL on the dataset I get about 300 MB/s.
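For reference, one way to toggle this per dataset, as a minimal sketch; tank/nfs is just a placeholder for the dataset exported over NFS:
Code:
# Bypass synchronous writes (and therefore the ZIL) for a test run
zfs set sync=disabled tank/nfs
# ...run the NFS write test...
# Restore the default behaviour afterwards
zfs set sync=standard tank/nfs
Leaving sync=disabled in production risks losing the last few seconds of acknowledged writes after a crash, so it is only meant for isolating the ZIL as the bottleneck.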

/J
 
I'll reply to this seeing as it's still got no responses.

If you look through the forum for the post about NFS ZIL performance, you'll find that getting 100 MB/s is actually very good. I'm not sure exactly where the bottleneck is, but NFS performance on ZFS really isn't that good on FreeBSD. I'm also not sure that striping the ZIL has much of an effect (I think there's another post on here somewhere about that). You may find the performance is very similar with just the one mirrored pair.
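If you want to check that on your pool, log vdevs can be removed and re-added without downtime; a rough sketch, where mirror-3 is only a placeholder for whatever name zpool status shows for your second log mirror:
Code:
# Find the name of the second log mirror (something like mirror-3)
zpool status tank
# Remove it, benchmark with a single mirrored log, then put it back
zpool remove tank mirror-3
zpool add tank log mirror da2p1 da3p1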

There are a few hacks people have posted on the mailing list to increase performance, but I'm not sure whether those hacks are harmless or have side effects. This one, for instance: http://lists.freebsd.org/pipermail/free ... 17519.html

I'm not aware of any tweaks to the disk or dataset properties that will make much difference. Someone else might be able to suggest something.

Nice server though ;)
 
I have experimented with the SM843T as a ZIL device in FreeBSD in the past, and I have not been able to get good performance out of this SSD unless I disable cache flushes. The device is supposed to have power-loss protection on its cache, but it appears that ZFS issues a cache flush request to the device with every ZIL write, and that the Samsung device actually honors it even though it has power-loss protection.

You can test whether disabling cache flushes improves ZIL performance by adding:
Code:
vfs.zfs.cache_flush_disable="1"
to /boot/loader.conf. Note that this can cause data loss, since (as far as I know) it disables cache flushes altogether, but it can be useful for testing purposes.

I did not have time to investigate this further, but maybe someone else can chime in on ZIL and cache flushes.
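If anyone does test the tunable, it is probably worth confirming it actually took effect after the reboot and watching the log devices while a client writes; a quick sketch using the pool name from the earlier posts:
Code:
# Confirm the loader tunable was picked up at boot (1 = flushes disabled)
sysctl vfs.zfs.cache_flush_disable
# Watch per-device log activity while an NFS client is writing
zpool iostat -v tank 1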

As for your question regarding multiple ZIL devices: my understanding is that ZFS will not stripe writes across ZIL devices, so they will not boost performance for a single client. You will, however, get lower latency when multiple NFS clients connect to the server, since the clients do not have to wait for each other's writes to complete. So multiple ZIL devices can give you better overall throughput for the server, but only if you have more than one client.
 
I recently did an NFS deployment with FreeBSD 9.2 and ZFS on 45 SATA drives. The log device is an Intel DC S3700 and the cache device an Intel DC S3500. The pool is configured as four 11-drive RAID-Z3 vdevs plus one cold spare. Initially the NFS performance was not very good and I was unhappy with the results (I don't remember whether it was read or write performance, but it was ~100 MB/s). It turned out to be the new NFS server in FreeBSD. Even though I liked the on-demand threading of the new one, I ended up using the old NFS server, and performance is back in line with expectations. Since this is for HPC usage, IOR is my preferred benchmark. Here is the throughput using 54 NFS clients in parallel:

Code:
 Summary:
        api                = MPIIO (version=2, subversion=0)
        access             = file-per-process
        ordering in a file = random offsets
        ordering inter file= no tasks offsets
        clients            = 54 (1 per node)
        repetitions        = 1
        xfersize           = 10 MiB
        blocksize          = 200 MiB
        aggregate filesize = 10.55 GiB

Operation  Max (MiB)  Min (MiB)  Mean (MiB)   Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)   Std Dev  Mean (s)  
---------  ---------  ---------  ----------   -------  ---------  ---------  ----------   -------  --------
write         466.75     466.75      466.75      0.00      46.68      46.68       46.68      0.00  23.13852   EXCEL
read          341.96     341.96      341.96      0.00      34.20      34.20       34.20      0.00  31.58274   EXCEL

Max Write: 466.75 MiB/sec (489.43 MB/sec)
Max Read:  341.96 MiB/sec (358.57 MB/sec)

So it might be a good idea to try the old NFS implementation and see if your write throughput improves. In my case, as you can see above, that switch solved the problem. Hope this helps.
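For anyone who wants to try the same switch: as far as I remember it is done with the -o flag to nfsd and mountd, so the rc.conf lines look roughly like the sketch below. Please check nfsd(8) and mountd(8) on your release before relying on it; the thread count is just the usual default, not a tuned value.
Code:
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 4 -o"   # -o selects the old NFS server
mountd_enable="YES"
mountd_flags="-r -o"               # mountd must match the server choice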
 