ZFS ZIL & L2ARC over SSD pool worth it?

I have a zpool on a Dell R730xd with 64GB of memory, used as a dedicated storage server. The pool looks like this:
Code:
    NAME                        STATE     READ WRITE CKSUM
    ssdpool                     ONLINE       0     0     0
      mirror-0                  ONLINE       0     0     0
        wwn-0x5002538e40be14b3  ONLINE       0     0     0
        wwn-0x5002538e40be1606  ONLINE       0     0     0
      mirror-1                  ONLINE       0     0     0
        wwn-0x500a07511c75d788  ONLINE       0     0     0
        wwn-0x500a07511c75da09  ONLINE       0     0     0
      mirror-2                  ONLINE       0     0     0
        wwn-0x500a075115b56adf  ONLINE       0     0     0
        wwn-0x500a075115b56b2a  ONLINE       0     0     0

Every disk is a SATA 6Gb/s SSD.

The contents of the pool consist solely of Proxmox virtual machine volumes served over iSCSI:
Code:
NAME                                   USED  AVAIL     REFER  MOUNTPOINT
ssdpool                               4.36T  2.75T     26.5K  /ssdpool
ssdpool/vm-100-disk-0                  364M  2.75T      353M  -
ssdpool/vm-100-disk-1                 2.54T  2.75T     1.56T  -
ssdpool/vm-101-disk-0                 11.7G  2.75T     7.69G  -
ssdpool/vm-101-disk-2                 7.05G  2.75T     4.44G  -
ssdpool/vm-102-disk-0                 18.7G  2.75T     13.7G  -
ssdpool/vm-102-disk-1                 8.70G  2.75T     6.79G  -
ssdpool/vm-103-disk-0                 8.06G  2.75T     5.87G  -
ssdpool/vm-103-disk-1                 8.75G  2.75T     5.94G  -
ssdpool/vm-103-state-pre_update       1.41G  2.75T     1.41G  -
ssdpool/vm-104-disk-0                 8.47G  2.75T     4.47G  -
ssdpool/vm-104-disk-1                 77.9G  2.75T     42.7G  -
ssdpool/vm-106-disk-0                 5.09G  2.75T     4.83G  -
ssdpool/vm-106-disk-1                  309G  2.75T      293G  -
ssdpool/vm-107-disk-0                 6.19G  2.75T     6.19G  -
ssdpool/vm-108-disk-0                 13.9G  2.75T     7.89G  -
ssdpool/vm-109-disk-0                 5.40G  2.75T     3.77G  -
ssdpool/vm-109-state-sup              2.88G  2.75T     2.88G  -
ssdpool/vm-110-disk-0                  642G  2.75T      339G  -
ssdpool/vm-110-disk-1                 48.1G  2.75T     48.1G  -
ssdpool/vm-110-state-April_13_2020    15.3G  2.75T     15.3G  -
ssdpool/vm-111-disk-0                 6.94G  2.75T     6.90G  -
ssdpool/vm-112-disk-0                 6.27G  2.75T     6.03G  -
ssdpool/vm-112-disk-1                 1.99G  2.75T     1.85G  -
ssdpool/vm-113-disk-0                  157G  2.75T     91.1G  -
ssdpool/vm-114-disk-0                  167G  2.75T     91.2G  -
ssdpool/vm-115-disk-0                 6.91G  2.75T     6.91G  -
ssdpool/vm-116-disk-0                 93.0G  2.75T     92.8G  -
ssdpool/vm-117-disk-0                 1.70G  2.75T     1.61G  -
ssdpool/vm-118-disk-0                  169G  2.75T      101G  -
ssdpool/vm-120-disk-0                 8.16G  2.75T     6.97G  -
ssdpool/vm-120-state-before_20200407  3.42G  2.75T     3.42G  -
ssdpool/vm-120-state-pre_demo         2.11G  2.75T     2.11G  -
ssdpool/vm-121-disk-0                 9.43G  2.75T     9.43G  -
ssdpool/vm-122-disk-0                 2.05G  2.75T     2.05G  -
ssdpool/vm-130-disk-0                 7.77G  2.75T     7.77G  -
ssdpool/vm-130-disk-1                 16.9G  2.75T     16.9G  -
My question is, given that the pool is composed of SSD drives, is it worth adding a mirrored NVMe SSD as ZIL and L2ARC given the additional speed of NVMe?
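For reference, if I did go this route, I assume the setup would be roughly the following (the NVMe device names are placeholders, and as far as I know cache devices can't be mirrored, only log devices can):
Code:
# Hypothetical device names -- substitute whatever the NVMe drives show up as.
# Mirrored SLOG (separate log device); this is the part that should be mirrored:
zpool add ssdpool log mirror nvme0n1 nvme1n1

# L2ARC (cache device); ZFS does not mirror cache vdevs, and losing one is
# harmless, so it would just be a single device (or two independent ones):
zpool add ssdpool cache nvme2n1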

(Attachment: Screen Shot 2020-04-30 at 8.39.34 PM.png)
 
is it worth adding a mirrored NVMe SSD as ZIL and L2ARC given the additional speed of NVMe?
It depends on the load and the type of file access. If the pool only contains large movie files, for example, that are only played from start to finish, you're not going to benefit from L2ARC at all. If there's mostly reading and rarely any writes, then you're not going to benefit from a separate ZIL device either.
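If you want to see which case you're in before buying anything, the ZFS counters already track both. A rough sketch, assuming ZFS on Linux (the wwn- device names in your pool suggest that; on FreeBSD the same counters live under the kstat.zfs.misc sysctl tree):
Code:
# ARC demand hits vs. misses -- a consistently high hit rate leaves little for an L2ARC to add:
awk '/^demand_data_(hits|misses)/' /proc/spl/kstat/zfs/arcstats

# ZIL counters -- a steadily climbing zil_commit_count means plenty of synchronous
# writes, which is the case where a separate log device (SLOG) actually helps:
cat /proc/spl/kstat/zfs/zil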
 
Thanks. As I said, the zpool consists of Proxmox virtual machine volumes. The virtual machines themselves provide a variety of services, such as databases, continuous build servers, issue trackers, wikis, network monitoring systems (NMS), etc. Basically, a grab bag of servers found in a typical development organization. Therefore, I would expect a fairly typical mix of disk I/O: primarily random access, with a read/write split of something like 80% reads and 20% writes. One last factor to consider: we strictly use synchronous writes.
 
On the main issue, SirDice's opinion seems sensible: if reading from the L2ARC or writing to the ZIL is no faster than doing the same to the pool SSDs, they won't help much. They may help some, because operations that would be completely random on the original SSDs become sequential on the two caches (prefetching, write-behind), but the difference won't be large.
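If you want an actual number for that, one option (my suggestion, not something you already have set up) is to measure queue-depth-1 synchronous write latency on a spare SATA SSD versus the candidate NVMe device with fio; that latency is what ultimately bounds how much a SLOG can buy you. Something like this, with a placeholder device name:
Code:
# QD1 synchronous 4k writes -- a rough stand-in for ZIL commits.
# WARNING: this writes to the raw device, so only point it at a spare/empty
# disk, never at one that is part of the pool.  /dev/nvme0n1 is a placeholder.
fio --name=slog-latency --filename=/dev/nvme0n1 \
    --ioengine=psync --rw=write --bs=4k --iodepth=1 --numjobs=1 \
    --sync=1 --direct=1 --runtime=30 --time_based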

The following details are just for my curiosity, and don't really matter to your original question:

Therefore, I would expect a fairly typical mix of disk I/O: primarily random access, with a read/write split of something like 80% reads and 20% writes. One last factor to consider: we strictly use synchronous writes.
Actually, you should measure this sometime, if you care. I bet that the "random" you mention will have a significant fraction of large reads and writes (megabytes and up), and performance-wise those get reasonably close to sequential IO. Random-vs-sequential isn't black and white, with 512-byte IOs at one extreme and gigabytes of streaming at the other; in reality, there are lots of intermediate-size IOs. And in a multi-user / multi-tenant environment, even large sequential streams get chopped into smaller random pieces as multiple tasks compete for the disks. I also doubt your read/write balance; typical Unix machines are actually pretty write-heavy, because reads are often very effectively cached.
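If you do want to measure it, zpool iostat can show the request-size and latency breakdown directly, assuming your ZFS release is recent enough to have the -r and -l flags:
Code:
# Per-vdev read/write mix and latencies, sampled every 5 seconds:
zpool iostat -v -l ssdpool 5

# Histogram of request sizes actually reaching the disks -- shows how much of
# the "random" traffic is really small I/O versus large, near-sequential I/O:
zpool iostat -r ssdpool 5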

How do you adjust all workloads to use only synchronous writes? I didn't know there was one central switch for that.
 
I would check with iostat to see what your blocking % is. It may or may not help, but it's good to know whether your current bottleneck is I/O or not.
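Something along these lines (the interesting column is %util with Linux sysstat, or %b on FreeBSD):
Code:
# Extended per-device statistics every 2 seconds; devices sitting near 100%
# busy mean the bottleneck really is the SSDs rather than CPU or network:
iostat -x 2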
 
Thanks to everyone's replies so far! I have collected some of the additional requested data.
Code:
ARC Total accesses:                           11.83G
    Cache Hit Ratio:        95.90%    11.34G
    Cache Miss Ratio:        4.10%   484.51M
    Actual Hit Ratio:       91.37%    10.81G

    Data Demand Efficiency:        91.22%    2.90G
    Data Prefetch Efficiency:      73.44%    838.28M

ralphbsz You were right. My 80/20 read/write mix was based on my guess at the virtual machine side, not at the ZFS side, where most of the reads are clearly satisfied by the ARC rather than ever reaching the physical disks.

Here's the result of zpool iostat -v -l 2 6:
Code:
                              capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                        alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
--------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
ssdpool                     4.36T  2.98T     30    589   182K  7.03M  915us  880us  822us  377us   39us  298us  167us  875us  231us   43ms
  mirror                    2.05T  1.58T     13    196  79.9K  2.82M  777us  942us  703us  379us   44us  403us  103us  829us  134us   40ms
    wwn-0x5002538e40be14b3      -      -      6     98    40K  1.41M  779us  932us  706us  378us   44us  395us  103us  818us  141us   40ms
    wwn-0x5002538e40be1606      -      -      6     97  39.9K  1.41M  774us  952us  700us  380us   44us  410us  103us  840us  128us   40ms
  mirror                    1.15T   731G      8    194  50.8K  2.09M    1ms  826us  918us  367us   36us  224us  221us  890us  293us   45ms
    wwn-0x500a07511c75d788      -      -      4     97  25.2K  1.05M    1ms  822us  927us  366us   36us  220us  226us  893us  308us   45ms
    wwn-0x500a07511c75da09      -      -      4     96  25.6K  1.05M    1ms  831us  909us  369us   36us  229us  217us  888us  278us   45ms
  mirror                    1.17T   708G      8    199  50.9K  2.12M    1ms  873us  912us  384us   35us  272us  240us  909us  294us   46ms
    wwn-0x500a075115b56adf      -      -      4     99  25.6K  1.06M    1ms  849us  912us  381us   36us  248us  237us  897us  275us   46ms
    wwn-0x500a075115b56b2a      -      -      4     99  25.3K  1.06M    1ms  896us  911us  387us   34us  295us  242us  921us  311us   46ms
--------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
So clearly, writes vastly outnumber reads.

If I am interpreting the ARC data correctly, a 95.9% hit ratio would imply (to me) that an L2ARC would probably not provide much advantage. Is this correct?

I am not as confident interpreting the iostat data. As I mentioned before, we use synchronous writes (zfs set sync=always ssdpool). Given that, and the bias toward writes in the stats above, does that imply that a separate NVMe ZIL/SLOG would improve performance (based on the roughly 3x write-speed advantage of NVMe over SATA 6Gb/s)?
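One experiment I'm thinking of running to bound the answer myself (just a sketch, using a throwaway dataset so the live zvols are untouched):
Code:
# Throwaway dataset purely for benchmarking (name is made up):
zfs create ssdpool/synctest

# Run the same write benchmark twice against it:
zfs set sync=always ssdpool/synctest     # 1st run: what we have today
zfs set sync=disabled ssdpool/synctest   # 2nd run: upper bound -- the gap is the
                                         # most a perfect SLOG could ever recover

# Clean up afterwards:
zfs destroy ssdpool/synctest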

Thanks again for all feedback.
 