Seeking Advice: Thin Jail + unionfs running with workarounds

Attempting to create a thin jail on a Raspberry Pi using 13.2-RELEASE, I had trouble with the handbook's skeleton formulation (17.5.2, "Creating a Thin Jail Using NullFS"). While searching for solutions, I stumbled upon Scott Robbins's method for creating thin jails using unionfs(8), which worked, except that ssh into the jail failed with a "no tty" error.

I was able to work around this error by:
  1. Adding add path 'ttyv*' unhide to [devfsrules_unhide_login=3] in /etc/defaults/devfs.rules and rebooting. (I was wrong about this; see here.)
  2. Using ls to tickle the device file system before jail start (in jail.conf: exec.prestart += "ls /usr/local/jails/containers/${name}_/dev"), which tricks the mount system into including the devices (/dev/ttyv*).
ssh worked correctly after that. The idea for (2) came from these forum posts — Thin jail woes and devfs not mounting in nullfs jail — and from this workaround for the still-unfixed bug 186360.
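For context, here is a condensed jail.conf sketch of workaround (2). Treat it as illustrative, not my exact file; the jail name and paths match the mounts shown later in the thread.
Code:
```
www_jail {
    path = "/usr/local/jails/containers/${name}_";
    mount.devfs;
    # Touch devfs before jail start so /dev gets fully populated
    # (workaround for bug 186360):
    exec.prestart += "ls /usr/local/jails/containers/${name}_/dev";
}
```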

Furthermore, this comment under the "Thin jail woes" forum post advises against using unionfs(8) due to warnings about bugs in the man page, along with some scary notes elsewhere in the discussion about the possibility of a corrupted file system. However, the current version of the man page no longer contains a bug warning!

My questions:
  1. Is unionfs now stable and safe to use in general?
  2. Is unionfs now stable and safe to use for thin jails?
  3. Is the device mount bug indicative of continued unionfs problems, or is it just a difficult-to-fix, decade-old bug?
  4. Should the thin jail section of the handbook be updated to reflect the simpler method of thin jail set up (vs. the more complex skeleton links that are created)? Or, wait until Bug 186360 is fixed so no one has to workaround the ssh problem?
  5. Should Raspberry Pi base image include an entry for /dev/ttyv? I know very little about the intricacies of devices or if ttyv* is the common terminal on the Pi.
  6. Given that touching the device system somehow magically works around the mount problem, should unionfs be used for thin jail mounts?
I can provide configuration file entries, if that would be helpful.
Thanks.

sh:
$ # Using a Raspberry Pi 4b (2GB)
$ uname -a
FreeBSD pc-base 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC arm64
 
I would never use unionfs, because I see semantic issues with the idea. Say the bottom fs has a directory /foo/bar/baz full of stuff, and the top fs has the file /foo/bar/something, therefore also a directory /foo/bar: should the stuff in /foo/bar/baz exist in the unionfs or not? There's no right or wrong answer, unfortunately. How to handle deleting files that exist in the bottom fs is also a somewhat fishy question.

All my jails here are "thin", sharing a common base as a read-only nullfs mount, although structured a bit differently than in the handbook. I never experienced any issues with devfs, and have no idea how you could actually trigger this ancient bug at all. 🤷‍♂️

Just as an example, this is what my "file server" jail (now including nfsd) looks like:
Code:
$ cat /etc/jail.conf
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;
mount.devfs;
mount.fstab = "/var/jail/${name}.fstab";
host.hostname = "${name}.<something>";
allow.noset_hostname;
path = "/var/jail/${name}/jail";
[...]
files {
     vnet = new;
     vnet.interface = epair4b;
     allow.mount;
     allow.mount.zfs;
     allow.nfsd;
     devfs_ruleset = 102;
     enforce_statfs=1;
     exec.created="zfs jail files zroot/netshares";
     exec.release="zfs unjail files zroot/netshares";
     exec.start="/usr/bin/nice -n -20 /bin/sh /etc/rc";
}
[...]
$ zfs list | grep jail/files
zroot/jail/files                       997M  6.14T      140K  /var/jail/files
zroot/jail/files/etc                  5.42M  6.14T     2.65M  /var/jail/files/etc
zroot/jail/files/local                 767M  6.14T      421M  /var/jail/files/local
zroot/jail/files/root                 1.06M  6.14T      227K  /var/jail/files/root
zroot/jail/files/tmp                   721K  6.14T      192K  /var/jail/files/tmp
zroot/jail/files/var                   223M  6.14T      109M  /var/jail/files/var
$ cat /var/jail/files.fstab
/var/jail/.release/13.3/.zfs/snapshot/p0 /var/jail/files/jail nullfs ro 0 0
/var/jail/files/etc /var/jail/files/jail/etc nullfs rw 0 0
/var/jail/files/root /var/jail/files/jail/root nullfs rw 0 0
/var/jail/files/local /var/jail/files/jail/usr/local nullfs rw 0 0
/var/jail/files/tmp /var/jail/files/jail/tmp nullfs rw 0 0
/var/jail/files/var /var/jail/files/jail/var nullfs rw 0 0

In a nutshell, the base of the jail is just a read-only null-mounted ZFS snapshot.
 
Thank you. The bug was triggered in the unionfs setup, not in the original skeleton setup from the book. The problem with the latter was related to a file permission read on /etc/passwd or something. Later, I accidentally locked myself out of the Pi, and without a keyboard or monitor on hand, I blew away the micro SD card and re-etched a new OS install, losing all of my work. I think I had solved that problem once before but couldn't recall how, so when I redid the skeleton linkage method I got stuck and couldn't make progress, and tried the unionfs approach instead.

I hear you on the problems with unionfs. I may choose to live with it for the moment, try to redo the skeleton structure, or analyze your write-up above. The problem is that I'm not running ZFS on the Pi. I will need to get a console cord and connect to it the next time I boot from scratch and try ZFS. Learning that file system is on my todo list.

I believe the ttyv change needs to happen with /etc/defaults/devfs.rules regardless as I'm pretty sure that's a bug.
 
Well, regarding ZFS, what you mainly "lose" here without it is the ability to use a snapshot for the read-only "base" mounts ... which is pretty nice: you can update the base installation without affecting running jails, and you could even have jails run on different snapshots. But one can certainly live without that as well, so you can set up pretty much the same structure with UFS.

A FreeBSD userland has a pretty clear structure with few well-known locations that need write access (minus "special" filesystems like devfs):
  • /usr/local (only) for installing ports/packages
  • /etc for base system configuration
  • /root, /home and /tmp for "normal" usage by normal users and root
  • /var for installing software and some areas also for running services
... and that's it. So that's easy enough to provide with appropriate rw null mounts, therefore I would really avoid unionfs here. (edit: forgot one, haha 🙈)
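As a rough sketch (this is my illustration, assuming a UFS host; paths are made up), the writable subtrees then become plain directories instead of ZFS datasets:
Code:
```shell
# Sketch, assuming a UFS host: plain directories replace the ZFS
# datasets from the earlier example; paths are illustrative.
make_jail_dirs() {
    jailroot="$1"                    # e.g. /var/jail/www
    mkdir -p "$jailroot/jail"        # mountpoint for the read-only base
    for d in etc root local tmp var; do
        mkdir -p "$jailroot/$d"      # writable subtrees for rw null mounts
    done
}
```
Each directory is then null-mounted rw into the jail tree, just like in the fstab shown earlier.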
 
Thank you for the advice. Much appreciated.

If I find the ttyv issue is still present on the Pi base image with a regular UFS structure, I intend to file that as a bug as I believe it's independent of the unionfs problem.

Seems like ZFS and snapshots have finessed away the need for unionfs.
 
I believe the ttyv change needs to happen with /etc/defaults/devfs.rules regardless as I'm pretty sure that's a bug.
Regarding this, no, they shouldn't be present in a jail, those are the tty devices for the actual virtual consoles of the (host) machine.

SSH won't need them, it uses a pseudo-terminal (pts(4)). See e.g. here in my "fileserver" jail:
Code:
files# w
 6:17PM  up 3 days,  6:56, 1 user, load averages: 1.78, 1.98, 1.93
USER       TTY      FROM    LOGIN@  IDLE WHAT
root       pts/0    nexus   6:17PM     - w

All that's needed for that is /dev/pts which is available in jails by default.
Code:
files# ls /dev
fd      ptmx    random  stdin   urandom zfs
null    pts     stderr  stdout  zero
(I manually added /dev/zfs here, so the jail can manage its own ZFS datasets, everything else is default)
 
Regarding this, no, they shouldn't be present in a jail, those are the tty devices for the actual virtual consoles of the (host) machine.

SSH won't need them, it uses a pseudo-terminal (pts(4)). See e.g. here in my "fileserver" jail:
Code:
files# w
 6:17PM  up 3 days,  6:56, 1 user, load averages: 1.78, 1.98, 1.93
USER       TTY      FROM    LOGIN@  IDLE WHAT
root       pts/0    nexus   6:17PM     - w

All that's needed for that is /dev/pts which is available in jails by default.
Code:
files# ls /dev
fd      ptmx    random  stdin   urandom zfs
null    pts     stderr  stdout  zero
(I manually added /dev/zfs here, so the jail can manage its own ZFS datasets, everything else is default)
I'm 95% certain I could not log into the jails on the RPi via ssh without them present. If I find it continues with other thin jail methods, I'll come looking for advice or file a bug after a thorough examination and reproduction of results.

Nope, I was wrong.

From current setup with the two aforementioned workarounds:
Code:
andy@www_jail:~ % w
 6:37AM  up 8 mins, 1 user, load averages: 0.05, 0.23, 0.19
USER       TTY      FROM         LOGIN@  IDLE WHAT
andy       pts/1    10.0.0.226   6:37AM     - w

Commenting out the mod in /etc/defaults/devfs.rules:
Code:
add path 'ttyo*' unhide
#add path 'ttyv*' unhide
add path 'ttyL*' unhide

ssh into jail post reboot:
Code:
andy@www_jail:~ % w
 1:42PM  up 2 mins, 1 user, load averages: 0.84, 0.55, 0.24
USER       TTY      FROM         LOGIN@  IDLE WHAT
andy       pts/1    10.0.0.226   1:41PM     - w
 
Hello,

My questions:
  1. Is unionfs now stable and safe to use in general?

In a nutshell: currently, no. Its state has progressed significantly in the past years thanks to the continuous work of Jason Harmening (jah@), who has run stress tests on it and fixed the bugs encountered. However, not all of them have been fixed yet, and the current implementation has more fundamental problems at an architectural level, causing unwanted and hard-to-fix behavior. That's the bad news.

The good news, however, is that I'm going to work on a number of projects, including a revamp of unionfs, under the sponsorship of the FreeBSD Foundation. So you can expect that the current unfortunate situation, which has lasted for years, will finally be lifted. That said, this is not going to happen overnight; we are talking about a multi-month project (if we do everything I have proposed, it is projected to last a little more than a year). And I may not even start with that one.

However, the current version of the man page no longer contains a bug warning!

It still does, at the same place. The passages that were removed were just "humorous" warnings, but the main ones are still in place (see 6659516b1a47).

  3. Is the device mount bug indicative of continued unionfs problems, or is it just a difficult-to-fix, decade-old bug?

I'll have to test that to confirm. But, if my memory serves well, I suspect I saw this problem reported for nullfs alone a while ago. Provided it's actually the case, it obviously wouldn't be unionfs-specific. If you have the time and opportunity to test with nullfs yourself in the meantime, that would be an interesting data point.

I would never use unionfs, because I see semantic issues with the idea, like say the bottom fs has a directory /foo/bar/baz full of stuff, and the top fs has the file /foo/bar/something, therefore also a directory /foo/bar, then should the stuff in /foo/bar/baz exist in the unionfs or not? And there's no right or wrong answer, unfortunately. How to handle deleting files that exist in the bottom fs is also a somewhat fishy question.

I'm sorry, but this is completely untrue. The unionfs semantics, as seen by users, are perfectly well defined, and always have been. In the case you described, if there is a file /foo/bar/something, it simply appears in the union view along with /foo/bar/baz and its content. Only the identical prefix matters in determining whether there's an override ("shadowing" is the proper term), and since that prefix, /foo/bar, designates a directory in both the top and bottom layers (and the corresponding upper layer's directory is not opaque), their contents are simply merged. Non-directory file deletion is handled by creating whiteouts (this is not even necessary if the file doesn't exist in the bottom layer, in which case just removing the upper layer's copy is enough; the default behavior is configurable at mount). Directory deletion happens by creating a whiteout and then, if the old directory name is reused for a new directory, substituting it with an opaque directory. All of this was devised ~30 years ago.
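To make this concrete, here is a session sketch of the merge and whiteout behavior for your exact example (FreeBSD-only, requires root; directory names are illustrative):
Code:
```
# mkdir -p /tmp/lower/foo/bar/baz /tmp/upper/foo/bar
# touch /tmp/lower/foo/bar/baz/stuff /tmp/upper/foo/bar/something
# mount_unionfs /tmp/upper /tmp/lower    # upper layer mounted above /tmp/lower
# ls /tmp/lower/foo/bar                  # the union view merges both directories
baz        something
# rm /tmp/lower/foo/bar/baz/stuff       # removal creates a whiteout in the upper layer
```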

Thanks and regards.
 
OlCe first of all, I didn't mean to "bash" unionfs or something like that.

So even if all these things are perfectly well-defined in the scope of unionfs(5) (and as some of the behavior depends on mount options, I'm not so sure about that, but, close enough), it's not well-defined in the general concept, and both the union option of mount(8) and the file access mapping for /compat/* (e.g. for Linux binaries), which do conceptually very similar things, have relevant differences in behavior.

This doesn't mean it isn't useful, so maybe my posting was worded poorly. All I wanted to say is, for a "thin jail", you know exactly where you must provide some writable sub-trees, so I'd always prefer nullfs mounts for that purpose.
 
OlCe first of all, I didn't mean to "bash" unionfs or something like that.

I didn't take it like that, no worries. But I just wanted to correct what you said since it is conceptually clear what appears in the union view given the content of the two backing layers.

So even if all these things are perfectly well-defined in the scope of unionfs(5) (and as some of the behavior depends on mount options, I'm not so sure about that, but, close enough), it's not well-defined in the general concept, and both the union option of mount(8) and the file access mapping for /compat/* (e.g. for Linux binaries), which do conceptually very similar things, have relevant differences in behavior.

Yes, but importantly, -o union and the /compat/* mechanisms are precisely not unionfs, and, even if related, have different semantics. The former only merges the mounted and mounted-over directories, and not their sub-directories. The latter is loosely similar to a subset of unionfs (e.g., there are no whiteouts) applying only to Linuxulator's processes. If by "general concept", you mean merging some portions of the tree without additional precisions, which all such mechanisms do to different extents, then of course the behavior is not well-defined. By contrast, how unionfs is supposed to build its view from its layers is well defined.

This doesn't mean it isn't useful, so maybe my posting was worded poorly. All I wanted to say is, for a "thin jail", you know exactly where you must provide some writable sub-trees, so I'd always prefer nullfs mounts for that purpose.

You can provide writable trees with unionfs as well, if you need/want to build them from a read-only base and have the difference to it (like a patch; i.e., the upper layer) in a different, writable filesystem. This would be akin to ZFS snapshots, but with a much finer granularity (file-level) and control (it is possible to evolve the base and apply the old patch, or an amended version). If you don't need all that, nullfs is indeed the way to go.
 
If you have the time and opportunity to test with nullfs yourself in the meantime, that would be an interesting data point.
I may be able to bring up the environment to test this, but I'm unsure precisely what you wanted me to test. Here's the configuration for the null/union mounts:
Code:
/usr/local/jails/templates/13.2-RELEASE-base /usr/local/jails/containers/www_jail_        nullfs ro       0       0
/usr/local/jails/containers/www_jail /usr/local/jails/containers/www_jail_        unionfs rw,noatime      0       0
Did you want only the first line (deleting the 2nd), or the second line with unionfs replaced by nullfs? I've moved on to a different model, but if the old one comes up again, I will try to test something. As an aside, the unionfs setup is so elegant and appears to use the least disk space of all the solutions; I'd prefer to use it whenever it's deemed stable.
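If the goal is to isolate nullfs, my guess (and it is only a guess) is that the test would drop the union layer entirely and nullfs-mount a full, writable copy of the jail instead; the template path here is hypothetical:
Code:
```
/usr/local/jails/containers/www_jail_full /usr/local/jails/containers/www_jail_        nullfs rw,noatime      0       0
```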

Also, zirias@, thank you for the suggested alternative. It's been working great. Even built a shell script to automate the generation step for faster experimentation.
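For illustration, a minimal generator along those lines could look like this (paths follow the fstab example earlier in the thread; the base snapshot path and the list of writable subtrees are assumptions):
Code:
```shell
#!/bin/sh
# Sketch: emit a per-jail fstab in the style shown earlier in the thread.
# The base snapshot path and the writable subtree list are assumptions.
gen_fstab() {
    jail="$1"
    base="/var/jail/.release/13.3/.zfs/snapshot/p0"   # read-only base
    root="/var/jail/${jail}"
    printf '%s %s nullfs ro 0 0\n' "$base" "${root}/jail"
    # src[:dst] pairs; dst defaults to src when no colon is given
    for sub in etc root local:usr/local tmp var; do
        src=${sub%%:*}; dst=${sub#*:}
        [ "$dst" = "$sub" ] && dst=$src
        printf '%s %s nullfs rw 0 0\n' "${root}/${src}" "${root}/jail/${dst}"
    done
}

gen_fstab files        # e.g.: gen_fstab files > /var/jail/files.fstab
```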
 