ZFS: Disable file access time modification

As you know, on *nix systems, each time you access a file its access time (atime) is updated, and every one of these updates causes a write to the media. If you work with many files at once, or run on an SSD, this overhead may not be acceptable.
It is recommended to disable it:
Code:
# zfs set atime=off zroot/ROOT/ubuntu-1310-root
What is the effect of disabling file access time changes?
 
What is the effect of disabling file access time changes?
The biggest problem is anything that checks file timestamps won't detect changed files. A few things that immediately come to mind:

Makefiles may not see a dependency change and fail to build correctly.
Mail checkers may fail to recognize "you've got mail".
Security audit tools may fail to see a changed file.

Disabling atime is an optimization that may help performance and may help extend SSD life, but one should be careful about applying it globally.
For directories/datasets that are read-mostly (executables, libraries), one can argue they don't need atime enabled.
For /tmp, again, arguments can be made for disabling atime.
For /var/log and user home directories, I would argue against disabling atime.
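Applied per dataset, that policy could look something like this (dataset names are illustrative, assuming a typical zroot layout; adjust to your own pool):

```shell
# Illustrative dataset names -- check yours with 'zfs list' first
zfs set atime=off zroot/usr/local    # read-mostly: executables, libraries
zfs set atime=off zroot/tmp
zfs set atime=on  zroot/var/log      # keep atime where it is arguably wanted
zfs set atime=on  zroot/usr/home
zfs get -r atime zroot               # verify the resulting settings
```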
 
The biggest problem is anything that checks file timestamps won't detect changed files.
It's an access time stamp, not a modification time stamp. The access time stamp is updated when you read a file. The modification time is set when you write to the file.

Security audit tools may fail to see a changed file.
No, audit tools may fail to see which files have been read.
 
You should know your usecases.

Example: for some purposes, I use www/privoxy with TLS interception, which means the proxy must issue "fake" certificates for arbitrary domains signed by some internal CA my browser trusts. It creates them, but never cleans up, so I wrote a script for cron(8) to do the housekeeping, and my script uses atime to decide whether to delete a certificate, because I want to delete those that are not in active use....
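For the curious, a minimal sketch of what such an atime-based cleanup can look like, using only find(1). The directory and the *.crt naming are illustrative stand-ins, not privoxy's actual layout; the demo creates its own temporary directory and fakes an old access time so the effect is visible:

```shell
#!/bin/sh
# Sketch of an atime-based housekeeping job, roughly how such a cron(8)
# script might work. CERT_DIR and '*.crt' are hypothetical names.
CERT_DIR=$(mktemp -d)                            # stand-in for the cert directory
touch "$CERT_DIR/fresh.crt" "$CERT_DIR/stale.crt"
touch -a -t 202401010000 "$CERT_DIR/stale.crt"   # simulate an old access time
# Delete certificates whose last access (read) was more than 30 days ago
find "$CERT_DIR" -type f -name '*.crt' -atime +30 -delete
```

The key point: `-atime +30` tests when the file was last read, so certificates for domains in active use keep their atime current and survive the sweep.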
 
Atime is useful for some debugging purposes, seeing which files were accessed last, how far a crashing program made it, that sort of thing.
 
I had to do my research here as well. With atime=on plus the relatime option, atime is only updated when the difference is >24 hours or the current atime is older than mtime.

This indeed seems to be a very interesting compromise for many scenarios where you need atime, but only roughly.

Alain De Vos good tip, but some explanation with it would be nice in general ;-)
 
Most users don't need atime. There are rare use cases that zirias and cracauer explained above. Another use case is "HSM" or hierarchical storage management: Take data that is only being read (or going to be read) rarely, and move it to slower media. Few systems support doing this transparently, without changing the file path or hackery such as soft links. This is not typically used on small servers and desktop systems.

Personally, I would turn atime off (and I have on all the file systems on my machines where it is easily doable).
 
Yep, there aren't too many use cases for atime. But that's why I said you must know your use cases ;) If you don't have any, turn it off for good.

In my case described above, atime with the "relatime" option would be enough. Right now, I don't care too much because the pool is on spinning HDDs. But good to know, I'll try to remember!
 
You should know your usecases.

Example: for some purposes, I use www/privoxy with TLS interception, which means the proxy must issue "fake" certificates for arbitrary domains signed by some internal CA my browser trusts. It creates them, but never cleans up, so I wrote a script for cron(8) to do the housekeeping, and my script uses atime to decide whether to delete a certificate, because I want to delete those that are not in active use....
Maybe you can store these on a memory-based filesystem like tmpfs(5), and if that is too temporary, you can use a memory disk md(4) backed by a real file!
 
I use www/privoxy with TLS interception, which means the proxy must issue "fake" certificates for arbitrary domains signed by some internal CA my browser trusts. It creates them, but never cleans up, so I wrote a script for cron(8) to do the housekeeping, and my script uses atime to decide whether to delete a certificate, because I want to delete those that are not in active use....

I doubt it would make any significant difference if you used mtime with a sensible time-limit. It's presumably just a cache which privoxy will recreate as needed and it's not as if the files will disappear mid-read. Unless there's a known problem with deleting these files it doesn't seem worth having atime updates.
 
So, are they the same thing? It seems very confusing if "on" isn't full atime support.
According to the documentation I found, atime must be set to on to update atime at all. relatime is an additional option modifying the behavior as described above for fewer updates.
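If that documentation is right, both properties need to be set together; a sketch, assuming OpenZFS with the relatime property (the dataset name is illustrative):

```shell
# relatime only takes effect when atime is also on
zfs set atime=on    zroot/data
zfs set relatime=on zroot/data
zfs get atime,relatime zroot/data   # verify both are active
```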

I doubt it would make any significant difference if you used mtime with a sensible time-limit. It's presumably just a cache which privoxy will recreate as needed and it's not as if the files will disappear mid-read. Unless there's a known problem with deleting these files it doesn't seem worth having atime updates.
Well, atime doesn't bother me. The underlying pool is an array of spinning HDDs. Basing the deletion on atime avoids periodically deleting the certificates for domains I use regularly, so they don't have to be recreated over and over.
 
Thanks for the discussion on atime - it's been nice to read and learn a bit.

Happened to check my system, and atime was only enabled for /var/mail, while for the remaining datasets it was "off". Is this the normal/default setting?
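One way to see what's set where, and whether a value was set explicitly or merely inherited (pool name illustrative):

```shell
# Inspect atime across the whole pool
zfs get -r atime zroot
# '-s local' filters to values set explicitly on a dataset,
# i.e. everything NOT shown here is just the inherited default
zfs get -r -s local atime zroot
```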
Another use case is "HSM" or hierarchical storage management: Take data that is only being read (or going to be read) rarely, and move it to slower media. Few systems support doing this transparently, without changing the file path or hackery such as soft links. This is not typically used on small servers and desktop systems.
What kind of systems are there and where can I read more about them?
 
What kind of systems are there and where can I read more about them?
On the commercial side, look at IBM Spectrum Scale, or 3Par (now sold by Hewlett-Packard), for examples. I know NetApp also has a good competing product, but I don't remember the name of it. If you are interested in block storage, the big disk arrays (EMC/Hitachi/IBM) also have that capability, where blocks that hadn't been written or read in a long time can move to cheaper storage tiers. In the cloud, look for "auto tiering" or "intelligent tiering" at the big cloud storage providers. Today, the storage tiers are typically SSD - fast disk - slow disk - sometimes tape or powered-off disk drives.

I started using these systems in the early 80s, on mainframes. In those days, a large mainframe with 500 simultaneously logged in users would have 25 or 50 disks connected to it, each about 600 MB or 2.5 GiB. Clearly, not enough capacity, so older "files" (we used to call them datasets back then) were automatically migrated to tape. The second tier was on tapes that were physically kept in the computer room, and could be mounted on a tape drive within a few minutes (there were a handful of operators in the computer room at all times). The third tier was tapes kept in storage rooms and basements, which typically took hours to retrieve. The final tier (with week-long access times) were tapes that were kept off-premises in vaults (often abandoned mines). The particular flavor of HSM I used was an in-house-written system with a sense of humor: it was called "FAST", which was an obvious joke: if you try to edit a file, and it takes minutes / hours / days for it to come online, then it is anything other than fast. The implementor was a very friendly systems programmer named Otto Hell (this was in northern Germany). The printed documentation for FAST at the bottom said: "If you have problems with FAST, go to Hell".
 
Set up a schedule for zpool scrub to run:

It is recommended to configure a zpool scrub schedule so that the system automatically runs the data-integrity check on the ZFS pool. To do this, you can use the sysrc utility to add the appropriate settings to the /etc/rc.conf file.

For example, to set up a monthly zpool scrub on the "tank" pool, run the following commands:

Code:
sysrc zfs_scrub_enable=YES
sysrc zfs_scrub_monthly="1"
sysrc zfs_scrub_args="-s -p 15 tank"

Does it need to be done?
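For what it's worth, I'm not aware of any zfs_scrub_* knobs in stock FreeBSD's rc.conf; the scrub scheduling that ships with the base system goes through periodic(8) instead. A sketch, assuming a reasonably recent FreeBSD:

```shell
# Stock FreeBSD scrub scheduling via periodic(8); settings live in periodic.conf
sysrc -f /etc/periodic.conf daily_scrub_zfs_enable=YES
sysrc -f /etc/periodic.conf daily_scrub_zfs_pools="tank"
# only scrub if the last scrub is older than N days (default 35)
sysrc -f /etc/periodic.conf daily_scrub_zfs_default_threshold=30
```

The daily job then skips pools scrubbed more recently than the threshold, which effectively gives you a periodic scrub without a custom cron entry.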
 
All I could think about that always makes sense for SSDs (as well as for "virtual" disks) is the autotrim property for a zpool(8).
I don't autotrim any of my SSDs. Instead I schedule a regular bulk trim of the free list:
Code:
[sherman.135] $ zpool get all zroot | grep trim
zroot  autotrim                       off                            default
[sherman.136] $ sudo crontab -l | grep trim
5 3 2 * * /sbin/zpool trim zroot
Because frequent TRIM commands interfere with ordinary I/O, and can even be optionally ignored by the SSD firmware (particularly for small extents), I regularly bulk TRIM the entire free list at a quiet time. This advises the smallest set of largest free extents to the SSD firmware. These SSDs are well over-provisioned, so the delay in TRIMing does not compromise SSD function (it always has plenty of free pages ready to write).
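Assuming OpenZFS 2.x, you can also inspect per-vdev TRIM state and wait on a manual bulk trim:

```shell
# Show TRIM status per vdev (when it last completed, or current progress)
zpool status -t zroot
# Kick off a bulk trim of the free list and block until it finishes
zpool trim zroot
zpool wait -t trim zroot
```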
 