ZFS System slowed down with not-so-low free space?

Hi,

I've had some general performance issues with a server, made apparent mainly by web pages loading slowly and inconsistently. Certain MySQL queries were very slow, and even certain directory listings. It seemed to get resolved either by chance, or by my extending the main zpool. The only thing is, I thought I had plenty of free space.

Before and after:
Diff:
 # zpool list
 NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
 zroot   196G   174G  21.8G        -         -    79%    88%  1.00x  ONLINE  -
-zuser   850G   772G  77.6G        -         -    89%    90%  1.00x  ONLINE  -
+zuser   900G   772G   128G        -         -    84%    85%  1.00x  ONLINE  -

I found some old user-posted advice about needing a certain amount of free space for ZFS to perform well, but where can I find something more definitive, if it's true at all?

This is a virtual machine, for the record.
 
The general rule of thumb is not to fill a production pool[*] over ~80% capacity. Above 80% performance degrades drastically, and above 90% recovery gets really painful, because ZFS no longer has enough space for housekeeping and fragmentation gets increasingly worse, as is the case for your pools.
The high fragmentation will also still carry performance penalties *after* adding more space, and depending on the workload it can take considerable time to improve. The very simplified explanation for this: ZFS can only improve fragmentation when writing blocks, so everything that's already heavily fragmented on disk and only ever read will stay in that fragmented state. (Resilvering to new providers could be used to improve fragmentation to some extent.)
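Keeping an eye on capacity and fragmentation together is the easiest way to see this coming; a quick sketch using your pool names:
Code:
# capacity and fragmentation for both pools
zpool list -o name,capacity,fragmentation zroot zuser

# the same values as pool properties
zpool get capacity,fragmentation zuser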



edit:
[*] for 'cold' backup pools or something like one-off data migrations where you want to squeeze those last few GB on some disks, you can get away with filling them to almost 100%, but performance will be really bad.
 
Thanks. I will make sure to keep more free space available.

Bunch of "old" snapshots lingering? That can eat up space without noticing.

Perhaps old upgrade/update snapshots from freebsd-update(8)? Check bectl list
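A rough sketch for spotting the space hogs among them (sorted so the biggest consumers end up at the bottom):
Code:
# list snapshots with the space they hold and their creation time
zfs list -t snapshot -o name,used,creation -s used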

I should clarify: I thought those 77 GB were plenty for the system to function at full capacity, but that was before I learnt that ZFS gets crippled much earlier.

I do have a bunch of snapshots, but none that are unaccounted for. Could probably cut down on them a bit.
 
I usually check with:
# zfs list -H -o name -t snapshot
When I’m really in a pickle with very low space left (usually in VMs for testing only):
# zfs list -H -o name -t snapshot | xargs -n1 zfs destroy -R
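If I'm not in a throwaway VM I do a dry run first to see what would actually be destroyed (same pipeline, just with -n and -v added):
# zfs list -H -o name -t snapshot | xargs -n1 zfs destroy -nvR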
 
77 GB free should be plenty.
I don't disagree, but that was at 89% full. Does the code use the percent-free value in its calculations, or the raw numbers? Kind of like UFS and the reserved percentage for the root user. If you are hitting that as a normal user, do you see different behavior/performance?
 
Following what sko said, I applied this advice to my pools too (I read it somewhere that I can't remember now), just because I sometimes find it difficult to quantify how much space ZFS uses.
Code:
# create a special dataset
doas zfs create mypool/reserved

# allocate 20% of the disk space to that dataset
doas zfs set reservation=xxGB mypool/reserved

# check the result
doas zfs get reservation mypool/reserved
doas zfs list -r mypool

From the zfsprops man page:

reservation=size|none
    The minimum amount of space guaranteed to a dataset and its descendants. When the amount of space used is below this value, the dataset is treated as if it were taking up the amount of space specified by its reservation. Reservations are accounted for in the parent datasets' space used, and count against the parent datasets' quotas and reservations.
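And if a pool ever does fill up completely, the reservation can be released to get immediate breathing room (same hypothetical pool name as above):
Code:
# drop the reservation in an emergency to free the space again
doas zfs set reservation=none mypool/reserved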
 
77 GB free should be plenty.
Given the high fragmentation of the pool, those 77 GB are scattered across hundreds of thousands (or even millions) of small, non-adjacent blocks. Larger blocks that need to be written (new data, or simply during housekeeping) therefore have to be split up. This not only takes more time and IOPS to write the actual data, it also increases the amount of metadata.
So the effect accelerates the more a pool gets filled up, and because of the fragmentation, the absolute value of free space cannot be used as an indicator. Those 80% are a rough rule of thumb, but they depend on the actual fragmentation of the pool. Heavily fragmented pools with lots of large-block data might suffer from degraded performance much earlier than pools with low fragmentation and/or data that tends to produce smaller block sizes.
The main driver of fragmentation is usually snapshots of datasets whose data changes often but with small deltas. Cleaning up snapshots on such datasets can decrease fragmentation significantly.
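A rough sketch (using your zuser pool) to see which datasets carry the most snapshot data, which is usually where cleanup pays off:
Code:
# space held by snapshots per dataset, biggest consumers at the bottom
zfs list -r -o name,usedbysnapshots,usedbydataset -s usedbysnapshots zuser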

The effects may vary between pool/vdev layouts (e.g. raidz, as usual, is much more prone to performance degradation) and especially with the type of providers - fast PCIe-connected SSDs might perform well enough up until very high percentages of fragmentation and "full" levels; spinning rust, with its abysmally low IOPS capabilities, will suffer performance degradation much earlier and *a lot* worse, up to almost catatonic pool behavior. (Been there once, didn't even get a t-shirt...)
 
A few hundred GB on the host. I expanded the VM from 32 to 40 GB before expanding the disk, because it appeared to be a memory issue at the time. Currently sitting at 18 GB free as per top.
It is unlikely in the extreme that fill percentage causes slow directory listings.
Yeah, I don't know what that was about. I only have a very limited sample size, but when trying to list the contents of /var/db/mysql (which didn't have a huge amount of files or anything, some 200) using ls -l, it would list roughly one terminal page worth of files, freeze for a couple of seconds before continuing, and then freeze one more time. It behaved in the same way for the few attempts I made for the duration of the issues. I think specifically it froze before listing the most recent MySQL bin log file, which might not have been a coincidence? The same was not true for a different folder, which had fewer files, but enough to exceed that rough terminal page.
 

Hm.

Can you describe the setup some more? HDs or SSDs? What kind of filesystem on the host? What kind of hypervisor and host OS?

I no longer think that ZFS is responsible here; I bet that directory listing is waiting for delivery from the host.
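Next time it hangs, tracing the listing would at least show which syscall it stalls in (just a diagnostic sketch):
Code:
# -d adds timestamps, so the pauses become visible in the trace
truss -d ls -l /var/db/mysql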
 
How much RAM do you have in the VM and on the host?
True, such operations should still be fast, regardless of the fill and fragmentation state of the pool...

What is the rest of the setup? Are you running ZFS on top of ZFS? If you are using bhyve: what storage provider?
OTOH it sounds more like 'normal' memory exhaustion than a filesystem problem (unless ZFS is hogging a lot of CPU and memory for increased housekeeping efforts due to the almost-full pool).
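A quick sketch to check whether the ARC is part of the memory pressure (kstat names as exposed on FreeBSD):
Code:
# current ARC size and its maximum, in bytes
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max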


edit:
IIRC there were also several unresolved issues with MySQL on ZFS, so maybe MariaDB is worth a try?
 
This is running on an SSD-only vSphere cluster. I'm not entirely sure how the file systems work on that end. I'm pretty sure I've not had this issue with this server before, but I've probably always made sure to extend the storage earlier in the past.

There have been some swap_pager_getswapspace: failed messages lately, which I've been trying to get to the bottom of, but it's been difficult to monitor. It only happens for a few minutes at a time, and I've not seen any clear indication of it being a memory or ARC-size issue, as most solutions have alluded to. The issue was not alleviated by granting the machine another 8 GB of memory, in any case.
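For what it's worth, this is roughly what I've been trying to catch in the act when the messages appear (just a sketch):
Code:
# current swap usage
swapinfo -h

# biggest resident-memory consumers at that moment
top -b -o res | head -n 20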

I have a number of other virtual machines in the cluster (including several very similar FreeBSD setups), and I've not noticed anything similar anywhere, at least not conclusively. I have one system which has been exhibiting similar behaviour (notably the swap_pager_getswapspace: failed messages) while also being on the low end (again, percentage-wise) of free pool space, but I've yet to investigate the matter more thoroughly.
 