ZFS What value of sync is better for a swap on zvol?

sync=disabled or sync=always; logbias=throughput?
I saw
Fixit# zfs create -V 2G -o org.freebsd:swap=on -o checksum=off -o compression=off -o dedup=off -o sync=disabled -o primarycache=none zroot/swap
in the FreeBSD wiki:
https://wiki.freebsd.org/RootOnZFS#ZFS_Swap_Volume
and I also saw
$ zfs create -V 4G -b $(getconf PAGESIZE) \
-o logbias=throughput -o sync=always \
-o primarycache=metadata \
-o com.sun:auto-snapshot=false rpool/swap

in the OpenZFS FAQ, but it's for Linux:
https://openzfs.github.io/openzfs-d....html#using-a-zvol-for-a-swap-device-on-linux
 
no log, no sync, no cache, no checksum - if the system is already under memory pressure you just want to get that data on there as fast as possible with the lowest overhead - disks are already abysmally slow compared to RAM.
Yes, even NVMe/PCIe is already an order of magnitude slower, let alone SAS/SATA SSDs or, God forbid, spinning rust (just don't use that for swap!)

Compression is debatable: on a halfway decent CPU, compression usually gives better throughput because those slow disks have to deal with less data. I'd say if this system isn't some glorified doorstopper nicked from a museum junkyard, always use compression.
 
I suspect that "-o logbias=throughput -o sync=always" is correct - if only because it's not the sort of thing that someone would suggest if their knowledge were shallow.

I presume the point of it is to speed up the release of buffers; sync=disabled tells ZFS that there's no hurry.
 
Avoid swap on zpool/zfs as that may lead to deadlocks. Use a disk partition or a separate disk for swap. Also, if possible, adjust your workload (such as "make -j <N>") to avoid swapping, though this is difficult. Even on a 64GB machine, "make -j6 buildworld" (on 8 cores, 16 hyperthreads) makes my system swap some, as llvm/lld are hogs.
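A minimal sketch of the dedicated-partition approach on FreeBSD, assuming a spare GPT disk. The disk name ada1, the label swap0 and the 4G size are placeholders of mine, not values from this thread. The commands that would touch the disk are only printed, so nothing changes until you run them yourself.

```sh
#!/bin/sh
# Sketch only: ada1, swap0 and 4G are hypothetical placeholders.
DISK="ada1"
LABEL="swap0"
SIZE="4G"

# Print the commands that would create and enable a labeled swap partition.
# Arguments: <disk> <size> <label>
swap_partition_commands() {
    echo "gpart add -t freebsd-swap -s $2 -l $3 $1"
    echo "swapon /dev/gpt/$3"
}

# Print the /etc/fstab line that enables the partition at boot.
# Argument: <label>
fstab_swap_line() {
    printf '/dev/gpt/%s\tnone\tswap\tsw\t0\t0\n' "$1"
}

swap_partition_commands "$DISK" "$SIZE" "$LABEL"
fstab_swap_line "$LABEL"
```

Using a GPT label rather than a raw device name keeps the fstab entry valid if the disk is renumbered.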
 
AI,
sync=disabled
Maximize write speed and bypass the ZIL
Bypasses the ZIL entirely; data is committed in normal transaction groups (TXGs).
Highly stable native integration; very low risk of deadlock.
 
If it's mainly used for hibernation, how should the value of sync be set? Although hibernation is currently unavailable on my computer, it might come in handy someday. Maybe the 15.1
 
My 5-cent take: hibernation is dangerous, so you want the most protection against data corruption, and that is sync=always. Data protection and performance always work against each other.
 
For hibernating, saving the entire state of memory would be the last thing the system has to do before shutting down, so I suppose the sync state doesn't matter. It has to ensure everything is synced. But even here, restoring from a partition would be far easier and likely less error prone.
 
AI,
sync=disabled
Maximize write speed and bypass the ZIL
Bypasses the ZIL entirely; data is committed in normal transaction groups (TXGs).
Highly stable native integration; very low risk of deadlock.
The point of sync=disabled is to lie to code that requires an assurance that data has been committed to non-volatile storage, and to do buffered asynchronous writes instead. Lying to the VM system achieves nothing, and buffering swap writes is counterproductive.

What I think this is about is that -o logbias=throughput -o sync=always causes swap writes to be written directly to a txg, but as synchronous writes they close their txg, allowing an immediate write to the pool. This isn't likely to have much effect under thrashing, but it may help the VM system to page more smoothly.
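Both sync and logbias can be changed on an existing zvol, so the two variants can be compared without recreating the volume. A rough sketch, assuming the zvol is called zroot/swap (a placeholder). The run helper and the RUN variable are my own invention: the script prints the commands by default and executes them only when RUN=1 is set.

```sh
#!/bin/sh
# Sketch only: zroot/swap is an assumed dataset name.
ZVOL="${ZVOL:-zroot/swap}"

# Echo the command; execute it only when RUN=1 is set in the environment.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Variant from the FreeBSD wiki: sync requests are ignored, writes are buffered.
use_wiki_variant() {
    run zfs set sync=disabled "$1"
}

# Variant from the OpenZFS FAQ: every write is synchronous and
# logbias=throughput avoids separate log devices.
use_faq_variant() {
    run zfs set sync=always "$1"
    run zfs set logbias=throughput "$1"
}

use_faq_variant "$ZVOL"
run zfs get sync,logbias "$ZVOL"
```

Whether either setting changes paging behaviour noticeably is something to measure on your own workload; the sketch only makes switching cheap.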
 
I don't see any point to that.
Agreed.

Below is a script I use to add swap to a zpool, should I need it. This is typically used on machines with internal SSDs, where I put swap on an external device or spinning disk to avoid unnecessary writes to SSD.

sh:
#!/bin/sh -
# Usage: <script> <pool> <size>   (for example: zroot 4G)

# Alternative block size: -b 4096
zfs create -b 16384 \
        -o org.freebsd:swap=on \
        -o checksum=off \
        -o primarycache=metadata \
        -o compress=off \
        -o dedup=off \
        -o sync=disabled \
        -o snapshot_limit=0 \
        -V "$2" "$1/SWAPVOL"

There's no sense caching swap to RAM. Though some Linux systems I maintain at $JOB use a compressed RAM disk for swap, similar to what one might use ESTOR on an IBM mainframe for. But that's a separate topic.
 
That's an 8 year old bug against Linux, not FreeBSD.
While that is probably textually correct, given the reported problems, it has been mentioned, by several OpenZFS developers, that a solution would require some fundamental changes/code reworks to alleviate the problem. IIUC, it is nowhere suggested that this is actually a problem limited to Linux[1]; it has been suggested that FreeBSD might very well be vulnerable in the same manner, Prakash Surya on Sep 17, 2018:
I'm not an expert in this area of the code, but I think that swap on ZVOL is inherently unreliable due to writes to the swap ZVOL having to go through the normal TXG sync and ZIO write paths, which can require lots of memory allocations by design (and these memory allocations can stall due to a low memory situation). I believe this to be true for swap on ZVOL for illumos, as well as Linux, and presumably FreeBSD too (although I have no experience using it on FreeBSD, so I could be wrong).
[...]
I'll note his reservations about his familiarity with the particular ZFS code sections affected by this problem, but I see no rebuttals or denials following.

If FreeBSD does not suffer from this "ZVOL as swap" problem in any way, then, zfs-create(8):
Code:
   ZFS for Swap
       Swapping to a ZFS volume is prone to deadlock and not recommended.  See
       OpenZFS FAQ[2].
would be incorrect, or at least not based on any concrete (FreeBSD) data. IF that is indeed incorrect, then, IMHO, zfs-create(8) should reflect that and it should be amended.

With respect to using either sync=disabled or sync=always: in addition to the "ZVOL as swap" problem, the advice of the OpenZFS FAQ is diametrically opposed to the outdated FreeBSD ZFS Wiki - 2.2. ZFS Swap Volume. My hunch is that sync=always on Linux reduces the possibility of a deadlock somewhat; note, however, that where ZFS writes would normally be accumulated and saved together (asynchronous writes), using sync=always is devastating for throughput.

IIUC, especially when using a ZVOL as swap, the deadlock issue can arise under extreme memory pressure, when the ZFS ARC needs to give up part of its occupied RAM as fast as possible. Matching the ZVOL's block size to the OS page size by means of
Code:
-b $(getconf PAGESIZE)
[3] and not using compression both play a part in minimizing latency; under extreme memory pressure, compression (even though the default ZFS compression is not CPU intensive on modern CPUs) requires some additional RAM, which adds to the very memory pressure we're trying to avoid. When the I/O for your zpool is already saturated, placing your swap on the same disk doesn't help much in lowering latency for swap data that needs to be written out to disk as quickly as possible.
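To check whether an existing swap zvol actually matches the page size, something like the following could be used. It is only a sketch: zroot/swap is an assumed dataset name and check_blocksize is a helper made up for illustration. The zfs lookup is skipped when zfs is not installed or the dataset does not exist.

```sh
#!/bin/sh
# Sketch only: zroot/swap is an assumed dataset name.
ZVOL="${ZVOL:-zroot/swap}"
PAGESIZE=$(getconf PAGESIZE)

# Compare a volblocksize (bytes) against the page size (bytes).
# Arguments: <volblocksize> <pagesize>
check_blocksize() {
    if [ "$1" -eq "$2" ]; then
        echo "match"
    elif [ "$1" -gt "$2" ]; then
        echo "larger: $(( $1 / $2 )) pages per block"
    else
        echo "smaller than a page"
    fi
}

echo "page size: $PAGESIZE bytes"
if command -v zfs >/dev/null 2>&1; then
    # -H: no header, -p: exact numbers, -o value: print only the value
    vbs=$(zfs get -Hp -o value volblocksize "$ZVOL" 2>/dev/null)
    case "$vbs" in
        ''|*[!0-9]*) ;;                           # missing or not a number: skip
        *) check_blocksize "$vbs" "$PAGESIZE" ;;
    esac
fi
```

Note that volblocksize is fixed when the volume is created, so a mismatch means recreating the zvol with a different -b value.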

Awaiting a final solution to Swap deadlock in 0.7.9 #7734, I'd look at deploying ZFS DIRECT IO that bypasses the ARC:
To use ZFS Direct IO, you'll need at least OpenZFS - ZFS 2.4.0. For FreeBSD that means using the 15 branch, FreeBSD 15.0-RELEASE Release Notes contains ZFS 2.4.0 - rc4

___
[1] Given the structure of OpenZFS code development, OpenZFS - "Code flow", it is, in my view, more likely than not that all other OS-es (illumos included) are vulnerable to this problem.
[2] Using a zvol for a swap device on Linux
[3] When not matched, this can easily lead to write amplification, see for example Tuning OpenZFS by Allan Jude and Michael Lucas.
 
That's an 8 year old bug against Linux, not FreeBSD.
From zfs-create(8):
ZFS for Swap
Swapping to a ZFS volume is prone to deadlock and not recommended. See
OpenZFS FAQ.

Swapping to a file on a ZFS filesystem is not supported.

Just for kicks I tried this on an 8GB freebsd-15-stable VM:
# dd < /dev/random > swapfile bs=1m count=2048
# swapon swapfile
swapon: swapfile: Block device required
# zfs create -V 2G zroot/swap
# swapon /dev/zvol/zroot/swap
# swapinfo -k
Device 1K-blocks Used Avail Capacity
/dev/zvol/zroot/swap 2097152 0 2097152 0%
# stress-ng --vm 12 --vm-keep --vm-bytes 11G --timeout 1h


Admittedly, 11GB > 8GB virtmem + 2G swap, because I wanted to see what happens while watching top -ocpu in another window. The VM died after a few minutes. It should've killed a random process instead of committing harakiri, but so it goes!

[edit:] No it didn't die. Just the ssh connection died. Connecting to its console, I see a continuous stream of messages like this:
swp_pager_getswapspace(10): failed
swap_pager: out of swap space
swp_pager_getswapspace(30): failed
swp_pager_getswapspace(6): failed
swap_pager: out of swap space
swp_pager_getswapspace(8): failed

I can log in, but the login shell is killed soon. Unknown whether it will eventually recover...

[update2:]
Eventually I was able to log in & kill the stress-ng control process, and the system was responsive again.
Maybe there are other argument values for this test, or other tests, that would cause an actual deadlock.

The lesson, if any, is to not rely on swap except for dealing with temporary overflows and, if you must have it, to make it large enough.
 
The lesson, if any, is to not rely on swap except for dealing with temporary overflows and, if you must have it, to make it large enough.

Overrunning the amount of swapspace available is an entirely different problem than hanging during normal (non-depletion) swap usage.
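To tell the two cases apart while testing, it helps to watch how full swap actually is. A small sketch that sums the Used and 1K-blocks columns of swapinfo -k; the 80% threshold is an arbitrary assumption of mine, and the swapinfo call is skipped on systems that don't have it.

```sh
#!/bin/sh
# Sketch only: the 80% threshold is an arbitrary assumption.
THRESHOLD="${THRESHOLD:-80}"

# Read `swapinfo -k` output on stdin and print total swap usage in percent.
# Skips the header and any "Total" line; prints 0 when no swap is configured.
swap_used_percent() {
    awk 'NR > 1 && $1 != "Total" { total += $2; used += $3 }
         END { if (total > 0) printf "%d\n", used * 100 / total; else print 0 }'
}

if command -v swapinfo >/dev/null 2>&1; then
    pct=$(swapinfo -k | swap_used_percent)
    if [ "$pct" -ge "$THRESHOLD" ]; then
        echo "swap usage at ${pct}%: close to depletion"
    else
        echo "swap usage at ${pct}%"
    fi
fi
```

If the system stalls while this still reports plenty of free swap, depletion is unlikely to be the cause.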
 