ZFS How to determine if ZIL or L2ARC would be useful

I'm curious how I could go about figuring out whether adding a ZIL and/or L2ARC SSD would be useful for my RAIDZ2 array. Specifically, I'm running an ARK server in a Linux VM under bhyve. I suspect that either disk IO or, more likely, the CPUs (old Xeon X5650s) will become the bottleneck once I start exploring more, have more users, or build more. I've got PLENTY of unused RAM (72GB total, ~25+GB in VMs or other stuff running at peak), so I suspect an L2ARC SSD wouldn't buy me much.

Honestly, I suspect simply upgrading the CPU would be much more important for me, and I plan to do so in the near future with a Ryzen, although I might go down to only 32GB of RAM, so an L2ARC/ZIL NVMe SSD (or 2, 1 for each) might be much more useful then.
 
The ZIL is used for synchronous (write-immediate) requests, and a dedicated log device is used for them when logbias is set to latency. How much a dedicated device could help depends on how good your main storage is, the workload on the server, and the ZFS configuration.

L2ARC is a secondary cache that primarily helps read requests. It's most likely to be useful when the primary cache isn't big enough (you don't have enough RAM). Ideally, the L2ARC device should have a noticeable performance advantage over the normal storage.

I have found that a surprising amount of software issues fsync, or at least depends on the ZIL for performance. Even something like a portsnap update command will benefit from either an SSD ZIL or sync=disabled. It seems so many programs want their write requests flushed right away nowadays. So given the situation you posted, I would be inclined to use the SSD for ZIL.
 
A quick and dirty way to test for the benefit of L2ARC is to temporarily add a spare device (or partition) as cache. I have used both a 32GB USB stick and a Velociraptor (an HDD from 2009!) to determine that it was worthwhile buying an NVMe SSD. Then install the zfs-stats package and look at the hit/miss ratio reported by zfs-stats -L. Another command that may help is something like zpool iostat -v <poolname> 1. The hit ratio will be poor while the cache initially fills, so let everything stabilize for a while before drawing any conclusions.
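If you'd rather measure the L2ARC hit ratio over a specific window than rely on the cumulative figure (which includes the warm-up period), something like the rough Python sketch below might help. It assumes FreeBSD's kstat.zfs.misc.arcstats.l2_hits and l2_misses sysctl counters; the function names and the 10 minute interval are just made up for illustration, not part of zfs-stats.

```python
import subprocess
import time


def read_counter(name):
    """Read one integer ARC counter via sysctl (FreeBSD sysctl naming assumed)."""
    result = subprocess.run(
        ["sysctl", "-n", f"kstat.zfs.misc.arcstats.{name}"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def sample():
    """Take a snapshot of the cumulative L2ARC hit and miss counters."""
    return {"hits": read_counter("l2_hits"), "misses": read_counter("l2_misses")}


def hit_ratio(before, after):
    """Hit ratio for the window between two snapshots, or None if no lookups."""
    hits = after["hits"] - before["hits"]
    misses = after["misses"] - before["misses"]
    total = hits + misses
    return hits / total if total else None


def measure(interval_seconds=600):
    """Sample, wait, sample again, and return the ratio for that window."""
    before = sample()
    time.sleep(interval_seconds)
    return hit_ratio(before, sample())
```

Calling measure() a few times once the cache has had time to fill should show whether the ratio settles at something worthwhile. The counters are cumulative, which is why the sketch subtracts two snapshots instead of reading them once.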

For the ZIL SLOG, you can also use a spare device/partition temporarily, or use zilstat from https://github.com/richardelling/tools/blob/master/zilstat . Note that this script requires ksh to be installed. It's less likely you'll need a ZIL SLOG unless you have applications which gratuitously and repeatedly force a sync. If your workload does do that, bear in mind that SSD endurance could become an issue. My ZIL SLOG was added about 5 weeks ago - on a MySQL server using InnoDB - and it's already at 4% of expected life.
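To put that endurance figure in perspective, here is a back-of-the-envelope projection in Python. It assumes wear continues at the same linear rate, which a real workload may not do, and the function name is purely illustrative.

```python
def projected_life_weeks(percent_used, weeks_elapsed):
    """Extrapolate total device life in weeks, assuming linear wear."""
    if percent_used <= 0:
        raise ValueError("percent_used must be positive")
    return weeks_elapsed * 100.0 / percent_used


# Figures from the post above: 4% of expected life used in about 5 weeks.
total_weeks = projected_life_weeks(4, 5)
remaining_weeks = total_weeks - 5
print(f"projected total life: {total_weeks:.0f} weeks")
print(f"remaining: {remaining_weeks:.0f} weeks (~{remaining_weeks / 52:.1f} years)")
```

On those numbers the device would last roughly 125 weeks in total, so a sync-heavy workload can plausibly wear out a consumer SSD in a couple of years.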
 