Solved Binaries get rewritten at jail start with unionfs

I am working on a solution for mass jail deployment based on ZFS datasets and unionfs.
A jail template is created by extracting the base archive into a dataset. That dataset is then mounted read-only at the root of every jail, and each jail has its own lean dataset where only the deltas are stored.

The goal is that an update only modifies the template, and the jails automatically pick up the changes without the files being duplicated for each jail.

My problem is that when I start a jail with an initially empty top (R/W) unionfs layer, many standard files in /bin, /lib, /sbin etc. get rewritten with the current timestamp, although their sizes are identical. This bloats the jail's dataset and will probably break updates: when I update the template, the older copies in the jail will hide the latest versions in the lower unionfs layer.

Why do the binary files get rewritten? This happens only for a handful of them, for example /bin/sh, /bin/cat, /bin/mkdir etc. In total, it's about 7 MB and ~300 files.
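
To see exactly which files were copied up, the upper layer can be compared against the template. This is a minimal sketch, not something from my actual setup: the function name list_shadow_copies is made up for illustration, and it assumes both layer directories are reachable directly on the host.

```shell
#!/bin/sh
# list_shadow_copies UPPER LOWER
# Print every regular file in UPPER that is byte-identical to the file at
# the same relative path in LOWER, i.e. a copy that only shadows the template.
list_shadow_copies() {
    upper=$1
    lower=$2
    ( cd "$upper" && find . -type f ) | while IFS= read -r rel; do
        rel=${rel#./}
        if [ -f "$lower/$rel" ] && cmp -s "$upper/$rel" "$lower/$rel"; then
            printf '%s\n' "$rel"
        fi
    done
}

# Example with the paths from the fstab in this post:
# list_shadow_copies /jails/overlay/root /jails/template/root
```

Files that differ from the template, or that exist only in the upper layer, are not listed, so the output is the set of copies that could be deleted without losing data.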

Here is a little visual help about the situation:

level 0 ------ jail*/root, R/W -------------- : should contain only the changed files in each jail (but the rewritten files in /bin land here and hide the respective files in level 1).
level 1 ------ template/root, R/O -------------- : contains the system base, completely generic, to be updated regularly


I know about the possibility to mount read-only directories via nullfs, but this option looks quite complicated compared to the simple layering with unionfs, which should work in principle.
 
Here is my configuration:
Bash:
# fstab
/jails/template/root    /jails/overlay/mnt       unionfs         ro                                                                      0       0
/jails/overlay/root     /jails/overlay/mnt       unionfs         rw,noatime,cow,max_files=32768,allow_other,use_ino,suid,nonempty        1       0


#jail.conf
overlay {
        host.hostname = "overlay";
        path = "/jails/overlay/mnt";
        exec.clean;
        
        exec.system_user = "root";
        exec.jail_user = "root";

        vnet;
        vnet.interface = "";

        allow.raw_sockets;
        mount.devfs;
        devfs_ruleset="4";
        
        mount.fstab = "$path/../fstab";
        
        exec.consolelog = "$path/../log/jail_${name}_console.log";
        
        # hooks only create/destroy network interfaces via ifconfig
        exec.prestart  += "$path/../exe/hooks/prestart.sh  $name";
        exec.poststart += "$path/../exe/hooks/poststart.sh $name";
        exec.prestop   += "$path/../exe/hooks/prestop.sh   $name";
        exec.poststop  += "$path/../exe/hooks/poststop.sh  $name";
        
        exec.start += "/bin/sh -x /etc/rc";
        exec.stop  =  "/bin/sh -x /etc/rc.shutdown";
}
 
Another interesting observation: once I set the template's ZFS dataset property "readonly=on", those files suddenly stopped being recreated in the jails.
Why would the readonly property of the template's dataset affect another file system mounted R/W on top of it via unionfs?
 
Code:
BUGS
     THIS FILE SYSTEM TYPE IS NOT YET FULLY SUPPORTED (READ: IT DOESN'T WORK)
     AND USING IT MAY, IN FACT, DESTROY DATA ON YOUR SYSTEM.  USE AT YOUR OWN
     RISK.  BEWARE OF DOG.  SLIPPERY WHEN WET.  BATTERIES NOT INCLUDED.
From mount_unionfs(8).
 
I actually use sysutils/fusefs-unionfs, unionfs(8). Its manual page does not carry such a warning. There is a known issue about disabling copy-on-write, but I enable it, so in theory it should work.
As mentioned above, it does work as expected, but only when I set the "readonly" ZFS property of the lower layer to "on". I am wondering why the file system behaves like this.

Edit:
Wow, I just took another look at my fstab, and you're completely right! It actually ends up using mount_unionfs. That's probably what happens.
I initially used a script for mounting (fusefs-unionfs) and then switched to fstab.
Thanks, I'll check if the problem persists when I use fusefs-unionfs via the command line instead of fstab.
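
For reference, the command-line mount would look roughly like the sketch below. It assumes the unionfs binary installed by sysutils/fusefs-unionfs, the dir=RW:dir=RO branch syntax from its manual page, the same options as in the fstab above, and that the FUSE kernel module is already loaded. The RUN variable is only an illustration device so that the script prints the command by default instead of executing it.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Run with RUN= (empty) as root to actually mount.
RUN=${RUN-echo}

UPPER=/jails/overlay/root      # per-jail R/W layer
LOWER=/jails/template/root     # shared R/O template
MNT=/jails/overlay/mnt         # jail root

mount_union() {
    # The upper (writable) branch is listed first, the template after it.
    $RUN unionfs -o cow,max_files=32768,allow_other,use_ino,suid,nonempty \
        "$UPPER=RW:$LOWER=RO" "$MNT"
}

mount_union
```

Unlike the fstab entries with type unionfs, this invokes the FUSE implementation explicitly, so there is no doubt about which unionfs is doing the mount.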
 
You MUST mount the unionfs layer with the NOATIME option, so that the system does not attempt to update the access time.
That is the reason why it rewrites the binaries.

Setting the underlying layer to read-only does not stop the access-time update process; it only blocks the write itself.
With unionfs, the system looks for a writable layer, finds one, and writes to it.

As SirDice said, unionfs is not totally reliable.

I also tested unionfs in jails in the past and finally dropped it, exactly for the reason I explain here: with unionfs it is impossible to force read-only mode.
For example, if I want to "lock" a file, that no longer works with unionfs, because the file system only takes the writable flag of the upper layer into account.

To lock a given file, you must manually create a new version of the file on the upper layer, which masks the underlying version, and then use "chmod", which applies to the file on the upper unionfs layer. This reduces the appeal of unionfs.
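
The workaround just described can be sketched as follows. This is only an illustration: the function name lock_in_upper and its arguments are hypothetical, and it works on the layer directories directly rather than through the union mount.

```shell
#!/bin/sh
# lock_in_upper UPPER LOWER RELPATH
# Copy RELPATH from the lower layer into the upper layer (keeping mode and
# timestamps), then drop all write bits on the upper copy, which is the
# version that masks the one in the lower layer.
lock_in_upper() {
    upper=$1
    lower=$2
    rel=$3
    mkdir -p "$upper/$(dirname "$rel")" &&
    cp -p "$lower/$rel" "$upper/$rel" &&
    chmod a-w "$upper/$rel"
}

# Example (hypothetical file):
# lock_in_upper /jails/overlay/root /jails/template/root etc/rc.conf
```

The cost is visible right away: the locked file now lives in the jail's own layer and no longer follows updates of the template.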

The unionfs implementation has some limitations "by design".
You are not the first: many people have tested unionfs in jails before you, and many of them finally dropped the idea because unionfs brings more problems than it solves.

Unionfs may be useful in some limited scenarios, such as merging a few targeted user directories, e.g. application lists.
 
Thank you so much for the detailed explanation! This makes a lot of sense, I'll try it out and report back.
 
I can confirm that mounting the lower layer with NOATIME works. This solves my problem, thanks!
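
For anyone finding this later: since the template is a ZFS dataset, one plausible way to get this effect is the dataset's atime property. The sketch below is an assumption about how it could be done, not a record of the exact commands used here; the dataset name zroot/jails/template is hypothetical, and the RUN variable only makes the script print the commands by default.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Run with RUN= (empty) as root to apply them.
RUN=${RUN-echo}

# Hypothetical dataset name; the thread does not state the real one.
TEMPLATE_DS=${TEMPLATE_DS-zroot/jails/template}

disable_atime() {
    # atime=off stops ZFS from recording access times on this dataset.
    $RUN zfs set atime=off "$1"
    # Show the current values of both properties discussed in this thread.
    $RUN zfs get atime,readonly "$1"
}

disable_atime "$TEMPLATE_DS"
```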
 

Alright, I am convinced that unionfs is probably not the perfect solution for me. And I almost buy the argument, but I have a slight problem that prevents me from going all the way.
Let's consider this scenario:
  • I mount the template via nullfs into the jail root.
  • I mount /etc, /var, /usr/local/etc and /home as read-write. These are part of the jail's dataset.
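
To make the scenario concrete, here is a small sketch that prints the fstab lines such a jail would need. The function name nullfs_fstab and the directory layout under the jail's dataset are assumptions modeled on the paths used earlier in this thread.

```shell
#!/bin/sh
# nullfs_fstab TEMPLATE JAILDATA MNT
# Print fstab lines: the template read-only at the jail root first, then
# the jail's own directories read-write on top of it.
nullfs_fstab() {
    template=$1
    jaildata=$2
    mnt=$3
    printf '%s\t%s\tnullfs\tro\t0\t0\n' "$template" "$mnt"
    for dir in etc var usr/local/etc home; do
        printf '%s\t%s\tnullfs\trw\t0\t0\n' "$jaildata/$dir" "$mnt/$dir"
    done
}

# Example with the paths used earlier in this thread:
nullfs_fstab /jails/template/root /jails/overlay/root /jails/overlay/mnt
```

The order matters, because the root mount has to come before the mounts below it, and each mount point has to exist as a directory inside the template.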

Problem: After an update in the template, how do I make sure the updated configuration files are merged into the jails in /etc? The jails will be stuck with their copy of /etc, which will never be updated.
How did you solve this issue back then?
Should I use unionfs then only for /etc? i.e. should I mount $TEMPLATE/root/etc and $JAIL/root/etc on top of each other via unionfs?
 