[Solved] How to enable autotrim for ZFS?

More questions, by the way:
1. Where is this documented?
2. Is it good to enable autotrim? Why is it disabled by default?
 
I wouldn't suggest autotrim, depending on the scenario, for several reasons, and I may be wrong about some of them (for a start, I don't know how the ZFS caches work with or interact with TRIM).
This leaves me wondering how this could ever relate to caching. After all, TRIM is just about letting the "layer below" (whatever it is) know about blocks that aren't in use any more. I use it with one VM that needs its own ZFS even though it's already hosted on a zvol; that way I can at least make sure the sparse zvol shrinks whenever possible.

Could you maybe elaborate on the other reasons? Maybe there's something I'm not aware of?

Why is it disabled by default?
My guess (no official statement!) about the reason: there are lots of scenarios where TRIM wouldn't do any good. In a nutshell: whenever the backing storage is a) fixed-size and b) not an SSD. So it should be opt-in.
 
Autotrim used to be enabled by default, but then the ZoL rebase happened and somehow, suddenly, it wasn't. The best part: this is not actually documented anywhere.
 
 
Autotrim can interfere with disk performance. All it does is tell the SSD controller about the contents of the free list. This allows the controller to prepare pre-zero'd pages (essential for adequate write performance). If your over-provisioning is sufficient, there won't ever be any shortage of pages to pre-zero, and you won't need to trim.

You can turn on autotrim at zpool-create(8) or zpool-import(8) time, or at any time with zpool-set(8).
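A minimal sketch of the zpool-set(8) route, assuming a pool named zroot as in the examples in this thread. The function name enable_autotrim and the ZPOOL variable are hypothetical; ZPOOL is only a dry-run hook (set it to echo to print the commands instead of running them) and is not part of the ZFS tools.

```shell
#!/bin/sh
# Sketch: enable autotrim on an existing pool and print the resulting value.
# ZPOOL is a hypothetical dry-run hook (e.g. ZPOOL=echo); by default the
# real zpool(8) command is used.
: "${ZPOOL:=zpool}"

enable_autotrim() {
    pool=$1
    # Equivalent to running: zpool set autotrim=on <pool>
    "$ZPOOL" set autotrim=on "$pool" || return 1
    # Print only the value column of the property, e.g. "on"
    "$ZPOOL" get -H -o value autotrim "$pool"
}
```

On a live system, enable_autotrim zroot should print on. At creation or import time the same property can instead be passed as -o autotrim=on to zpool create or zpool import.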

I over-provision my SSDs manually, and normally just leave trim turned off, but you can also trim manually:
Code:
[sherman.129] # zpool status -t zroot   
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:04:39 with 0 errors on Thu Jul 28 03:43:26 2022
config:

    NAME                      STATE     READ WRITE CKSUM
    zroot                     ONLINE       0     0     0
      mirror-0                ONLINE       0     0     0
        gpt/236009L240AGN:p3  ONLINE       0     0     0  (untrimmed)
        gpt/410008H400VGN:p3  ONLINE       0     0     0  (untrimmed)

errors: No known data errors

[sherman.130] # zpool trim zroot

[sherman.131] # zpool status -t zroot   
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:04:39 with 0 errors on Thu Jul 28 03:43:26 2022
config:

    NAME                      STATE     READ WRITE CKSUM
    zroot                     ONLINE       0     0     0
      mirror-0                ONLINE       0     0     0
        gpt/236009L240AGN:p3  ONLINE       0     0     0  (4% trimmed, started at Mon Aug  8 07:37:50 2022)
        gpt/410008H400VGN:p3  ONLINE       0     0     0  (4% trimmed, started at Mon Aug  8 07:37:50 2022)

errors: No known data errors

[sherman.133] # zpool status -t zroot   
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:04:39 with 0 errors on Thu Jul 28 03:43:26 2022
config:

    NAME                      STATE     READ WRITE CKSUM
    zroot                     ONLINE       0     0     0
      mirror-0                ONLINE       0     0     0
        gpt/236009L240AGN:p3  ONLINE       0     0     0  (100% trimmed, completed at Mon Aug  8 07:38:21 2022)
        gpt/410008H400VGN:p3  ONLINE       0     0     0  (100% trimmed, completed at Mon Aug  8 07:38:18 2022)

errors: No known data errors
If I wanted to trim, I would do so once a day (or week) from cron, using zpool-trim(8) at a time when the disks were likely to be idle.
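If the trim runs from cron, a script may want to know whether it has finished. A minimal sketch, assuming the zpool status -t annotation format shown above (parenthesised text at the end of each device line); trim_states and trim_done are hypothetical helpers, and the output format may differ between ZFS versions.

```shell
#!/bin/sh
# Sketch: summarise per-device TRIM state from `zpool status -t` output.
# Reads the status text on stdin and prints "<device> <state>" for every
# line that ends in a parenthesised annotation such as "(untrimmed)".
trim_states() {
    sed -n 's/^[[:space:]]*\([^[:space:]]*\)[[:space:]].*(\(.*\))[[:space:]]*$/\1 \2/p'
}

# Succeeds only if at least one device is annotated and every annotated
# device reports "100% trimmed, ...".
trim_done() {
    trim_states | awk '{ n++ } !/ 100% trimmed,/ { bad++ } END { exit (n == 0 || bad > 0) }'
}
```

Usage: zpool status -t zroot | trim_states lists each device with its state, and zpool status -t zroot | trim_done && echo "all devices trimmed" only prints once every device reports 100%.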
 
If your over-provisioning is sufficient, there won't ever be any shortage of pages to pre-zero, and you won't need to trim.
I had a server with 28% ZFS fragmentation. The server has NVMe disks in RAID-1. I manually ran zpool trim zroot, and after it finished it showed 16% fragmentation. The ZFS pool is at 21% capacity. Does a manual TRIM help in my case?
 
TRIM (manual or otherwise) can always help if your disk usage is active, and especially if your disk is prone to filling up.

TRIM is all about the Unix kernel telling the SSD controller about unused disk blocks in the kernel's free list. The SSD controller needs to gather unused disk blocks, aggregate them into pages, and pre-zero the pages ready to be re-written. This is essential in order to maintain adequate write performance.

At 21% capacity, you might think that there would be plenty of spare pages for the SSD controller to pre-zero.

However, if you have never enabled autotrim, nor run a manual TRIM, then the SSD controller sees when a disk block is used (written), but never sees it returned to the Unix kernel's free list. So, over time, the entire free list appears to the SSD controller to be "in use" by the kernel.

At this point, the pool of spare pages, available to pre-zero, is limited to that provided by over-provisioning. The kernel's free list, no matter how large, gets to the point where it is "in use and unavailable" as far as the SSD controller is concerned.

Enabling autotrim will cause the kernel to advise the SSD controller when a disk block is returned to the free list. This can slow down the SSD controller, as it is forced to frequently manage very small numbers of disk blocks. And that's doubly bad on (potentially) heavily used media like swap space.

A manual TRIM command will advise the SSD controller of the kernel's entire free list. It's better to do this somewhat occasionally, as the more disk blocks TRIM'd at the same time, the more efficiently they can be processed (and aggregated into pages) by the SSD controller. Be aware that there may be a performance penalty while this happens.

The amount of over-provisioning and the duty cycle really matter. But, unless your disk I/O is relentless, or you have extra over-provisioning, trimming once a day is usually enough. I'd suggest doing it from cron at a time when you expect the system to be idle:
Code:
[sherman.135] # crontab -l | grep trim
5 3 * * * /sbin/zpool trim zroot
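A slightly more careful version of that cron job, as a sketch only. It assumes an OpenZFS 2.0 or later zpool(8), where zpool trim -w waits for the TRIM to finish and -r caps the TRIM rate per leaf vdev, which may soften the performance penalty mentioned earlier; check zpool-trim(8) on your own system before relying on either flag. nightly_trim and the ZPOOL dry-run hook are hypothetical names.

```shell
#!/bin/sh
# Sketch of a wrapper a cron job could call instead of a bare "zpool trim".
# Assumes OpenZFS 2.0+: -w waits for completion, -r limits the TRIM rate.
# ZPOOL is a hypothetical dry-run hook (e.g. ZPOOL=echo).
: "${ZPOOL:=/sbin/zpool}"

nightly_trim() {
    pool=$1
    rate=$2   # optional, e.g. "100M"; empty means "as fast as possible"
    if [ -n "$rate" ]; then
        "$ZPOOL" trim -w -r "$rate" "$pool"
    else
        "$ZPOOL" trim -w "$pool"
    fi
}

# Only run when invoked with arguments, e.g.: nightly-trim.sh zroot 100M
if [ $# -gt 0 ]; then
    nightly_trim "$@"
fi
```

From cron this could be called as 5 3 * * * /bin/sh /usr/local/sbin/nightly-trim.sh zroot 100M (the path and the 100M rate are placeholders). Because of -w, the job stays running until the TRIM completes.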
The term "ZFS Fragmentation" refers to "fragmentation of free space". I'll leave others to comment on that, as I'm not competent on the subject (as it might relate to TRIM).

Edit: Over-provisioning has two sources. The first source is the extra capacity the manufacturer provides with the SSD. You can't normally see this. The second source is any part of the SSD that you don't use and make sure is TRIM'd. When I manually over-provision, I allocate 10% of the total capacity of the SSD to an unused partition and run "newfs -E" on that partition.
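The 10% figure is simple arithmetic, but it is convenient to round it down to whole MiB before creating the spare partition. A minimal sketch; op_size_mib is a hypothetical helper, not an existing tool, and the 240 GB disk size below is only an example.

```shell
#!/bin/sh
# Sketch: size of a manual over-provisioning partition.
# $1 = total SSD capacity in bytes, $2 = percentage to leave unused.
# Prints the size in MiB, rounded down to a whole number of MiB
# (shell integer division truncates).
op_size_mib() {
    bytes=$1
    pct=$2
    echo $(( bytes * pct / 100 / 1048576 ))
}
```

For a hypothetical 240 GB (240000000000 byte) SSD, op_size_mib 240000000000 10 prints 22888, i.e. about 22.4 GiB to leave as the unused partition that then gets "newfs -E".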
 
So free capacity doesn't play any role, even if usage is low, because after a lot of writes and deletes I have to use TRIM to make the SSD controller "know" the free page list (pages it can write to immediately, without first moving data and then writing).

If the SSD controller doesn't have many pages in free list, then the writes will be slower and the disk wear will be higher, right? Are these the only disadvantages if I don't TRIM?

I believe the "ZFS Fragmentation" is related to TRIM:

Here is a server that has been running for 70 days and had never been TRIM'd before:
Code:
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zroot  3.47T  1.68T  1.79T        -         -    40%    48%  1.00x    ONLINE  -
After TRIM:
Code:
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zroot  3.47T  1.68T  1.79T        -         -    15%    48%  1.00x    ONLINE  -
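To track this over time (for example before and after each scheduled TRIM), the FRAG value can be pulled out of zpool list output. A minimal sketch; frag_of is a hypothetical helper that locates the FRAG column by its header, so it works on tables like the ones above. In a script, zpool list -H -o fragmentation zroot should return the same value without needing a header.

```shell
#!/bin/sh
# Sketch: print the FRAG value for one pool from `zpool list` output on stdin.
# The column is found by its header name, so the header line must be present.
frag_of() {
    awk -v pool="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FRAG") col = i; next }
        col && $1 == pool { print $col }
    '
}
```

Usage: zpool list zroot | frag_of zroot prints something like 40%.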
 
So free capacity doesn't play any role, even if usage is low, because after a lot of writes and deletes I have to use TRIM to make the SSD controller "know" the free page list (pages it can write to immediately, without first moving data and then writing).
File systems are generally TRIM'd when initialised. Without any further TRIMing, free capacity will be known to the SSD controller for as long as it has never been used by the kernel. However, once blocks have been used by the kernel, they have to be TRIM'd before the SSD controller can ever use them again. So, the situation depends on how active your file system is, and whether it ever fills up.
If the SSD controller doesn't have many pages in free list, then the writes will be slower and the disk wear will be higher, right? Are these the only disadvantages if I don't TRIM?
You will always have the over-provisioning provided by the manufacturer. It's generally meant to be sufficient for the intended market of the product (i.e. should maintain write speed). Generally "better" SSDs have more over-provisioning. The definition of sufficient depends on your duty cycle. Wear leveling is complex, and I don't think I want to speculate on the specifics.
I believe the "ZFS Fragmentation" is related to TRIM:
Your empirical observations are quite convincing. Fragmentation of free space can be viewed from the perspective of the kernel, or the perspective of the SSD controller. The SSD controllers have a mountain of code, implementing algorithms that I don't understand in detail. So, once again, I don't know enough to be sure of my ground.
 