[Solved] ZFS performance: one mixed pool or two separate ones?


I'm currently planning to deploy a set of FreeBSD machines that provide storage over NFS to virtualization hosts (Proxmox). I already have several Proxmox nodes running, but currently all VM disks are on the nodes' local storage. While migration does work, it requires me to shut down a VM temporarily. It also doesn't really provide any HA, since live migration is not an option unless you have a shared file system.

My goal is to set up two FreeBSD machines running HAST, using ZFS and NFS to provide storage to the virtualization nodes. Right now, each virtualization node has two storage pools, storage_ssd and storage_hdd, to provide storage of different capacity and speed. VMs usually run on the SSD pool, while the HDD pool is used as data storage for VMs that provide network drives, backup services and the like.

My question: does it make sense to keep the same architecture with the two separate pools, or would it make sense to have just one storage pool and use the SSDs simply as L2ARC and ZIL/SLOG devices? After all, my only reason for creating two different pools is better performance (on the SSD one). But I feel like there's a chance that ZFS running raidz2 on four SATA/SAS hard disks might provide enough performance to run the VMs off the cache/SLOG if I have enough SSD capacity.
The planned setup currently looks like this for each of the two FreeBSD HAST nodes (per node):
  • Xeon E3-1240 v5
  • 32GB DDR4 ECC memory
  • 4x 6TB SATA/SAS HDDs
  • 4x 1TB SATA SSD
  • 2x 512GB NVMe SSD
  • 2x 10 Gbps network interfaces
What's the recommendation here? Two separate pools, or throwing everything into one pool and hoping that the L2ARC and ZIL/SLOG can deliver something close to pure-SSD-pool performance?
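For reference, the single-pool option described above could be laid out roughly like this. This is only a sketch; the pool name and all device names (da0–da3, ada0–ada3, nvd0/nvd1) are assumptions and would need to match the actual hardware:

```shell
# raidz2 over the four 6 TB SATA/SAS HDDs (pool name is an assumption)
zpool create tank raidz2 da0 da1 da2 da3

# Mirrored SLOG on the two NVMe drives. Mirroring matters here:
# losing an unmirrored SLOG at the wrong moment can lose the last
# few seconds of synchronous writes.
zpool add tank log mirror nvd0 nvd1

# The four SATA SSDs as L2ARC. Cache devices need no redundancy;
# a failed cache device only costs cached reads, not data.
zpool add tank cache ada0 ada1 ada2 ada3
```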
I would strongly discourage the use of HAST with ZFS.

You don't really describe your throughput/capacity needs, but the best layout for this type of situation is SSDs in RAID 10 (striped mirrors). L2ARC could be useful, but then you need more RAM. I would use a SLOG of about half the size of RAM, preferably mirrored. I would also bond the two 10 Gbps NICs.
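The suggestions above (striped mirrors on the SSDs, bonded NICs) could look roughly like this on FreeBSD. The pool name, device names (ada0–ada3), interface names (ix0/ix1) and the address are assumptions, and LACP requires matching switch configuration:

```shell
# RAID 10 / striped mirrors across the four 1 TB SATA SSDs
zpool create vmpool mirror ada0 ada1 mirror ada2 ada3

# Bond the two 10 Gbps NICs via lagg(4) with LACP
ifconfig lagg0 create
ifconfig lagg0 laggproto lacp laggport ix0 laggport ix1 \
    192.168.10.2/24 up
```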
I run a much simpler virtualisation environment using a ZFS server for NFS and KVM clients (on Debian). There's a bonded network connection; mode 0 (balance-rr) is best. However, you need to run it through a switch (i.e. not back-to-back, like I do) if you want to be able to detect link failure.

ZIL does not need to be large -- a few GB is generally enough. However, if you want it to be reliable, it has to survive power loss, so the ZIL hardware needs to have "power loss protection" which means on-board capacitors. The Intel SSD models prefixed with DC have it.
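A back-of-the-envelope calculation supports "a few GB is generally enough": the SLOG only ever holds the synchronous writes of a few in-flight transaction groups. Assuming (my numbers, not from the thread) the bonded 2×10 Gbps links are saturated and ZFS flushes a transaction group every 5 seconds by default:

```shell
# 2 links * 10 Gbps = 20 Gbps; /8 -> MB/s; * 5 s txg interval
echo "$(( 2 * 10 * 1000 / 8 * 5 )) MB"   # prints "12500 MB", ~12.5 GB
```

So even in this worst case the SLOG needs on the order of tens of GB, far below "half of RAM"; with a more realistic sync-write rate, a few GB really does suffice.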

There's really no substitute for benchmarking your options. For instance, you will be truly appalled at the performance cost of running NFS synchronously. But I believe VMware won't do it any other way.
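One way to see that sync-write cost for yourself is to run the same fio job with and without an fsync after every write. This is a sketch; the mount point /mnt/nfs and the job parameters are assumptions:

```shell
# Synchronous 4k random writes (fsync after every write)
fio --name=syncwrite --directory=/mnt/nfs --rw=randwrite \
    --bs=4k --size=1G --ioengine=sync --fsync=1 \
    --runtime=60 --time_based

# Same job without the per-write fsync, for comparison
fio --name=asyncwrite --directory=/mnt/nfs --rw=randwrite \
    --bs=4k --size=1G --ioengine=sync \
    --runtime=60 --time_based
```

Comparing the reported IOPS of the two jobs shows how much of the pool's performance the sync requirement eats, and how much a fast SLOG can win back.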

I have a plan to look into iSCSI storage for the VM clients, but haven't done so to date.
gkontos Could you elaborate why you'd discourage the use of HAST with ZFS?

I did some more investigation and it seems like running Ceph would make sense in my case. Of course the underlying system would still use ZFS, but at least I wouldn't use HAST/NFS in that case. Ceph seems like a good, purpose-built solution.
gkontos Could you elaborate why you'd discourage the use of HAST with ZFS?

HAST is a RAID 1 architecture on a disk-by-disk basis. It also does not expose the physical disks to ZFS. So, if you lose a disk on Server1, ZFS will not be aware of it. Why? Because HAST will start using the mirrored disk from Server2.