So I've got a FreeBSD 10.3 host with three jails that use nullfs to mount a common read-only base system. I've discovered that on reboot only one or two of the three will start, and I cannot predict which ones. The remaining jail (or jails) fail to start. There are NO logs anywhere on the main system, nor in the jails' individual console log files, so to debug this I had to hack some logging of my own into the /etc/rc.d/jail script.
/etc/jail.conf:
Code:
jail1 {
    host.hostname = "jail1.example.org";
    path = "/usr/local/jail/jail1";
    ip4.addr = 127.0.0.11;
    mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail1/basejail nullfs ro 0 0";
    exec.consolelog = "/var/log/jail_${host.hostname}.log";
    exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
    exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}
jail2 {
    host.hostname = "jail2.example.org";
    path = "/usr/local/jail/jail2";
    ip4.addr = 127.0.0.12;
    mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail nullfs ro 0 0";
    exec.consolelog = "/var/log/jail_${host.hostname}.log";
    exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
    exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}
jail3 {
    host.hostname = "jail3.example.org";
    path = "/usr/local/jail/jail3";
    ip4.addr = 127.0.0.11;
    mount = "/usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail nullfs ro 0 0";
    exec.consolelog = "/var/log/jail_${host.hostname}.log";
    exec.prestart = "/bin/sh -c 'echo PRESTART_${host.hostname} >> /tmp/DEBUG'";
    exec.poststart = "/bin/sh -c 'echo POSTSTART_${host.hostname} >> /tmp/DEBUG'";
}
In the /etc/rc.d/jail script, I added these lines to the jail_start() function, in the _ALL branch of the case statement, to capture the output stored in the $_tmp file on error/failure:
Code:
echo "DEBUG: Contents of '$_tmp' are:" >> /tmp/DEBUG
cat "$_tmp" >> /tmp/DEBUG
echo "DEBUG: END OF '$_tmp' CONTENTS" >> /tmp/DEBUG
So I reboot the system. Only a single jail starts, the first one this time (the first nearly always starts--it's usually the second that fails, but sometimes it's the third, or both second and third). I examine the output of /tmp/DEBUG and see:
Code:
PRESTART_jail1
POSTSTART_jail1
DEBUG: Contents of '/tmp/jail.hyLntGie' are:
mount_nullfs: /usr/local/jail/jail2/basejail: Operation not supported by device
mount_nullfs: /usr/local/jail/jail3/basejail: Operation not supported by device
jail: jail2: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail2/basejail: failed
jail: jail3: /sbin/mount -t nullfs -o ro /usr/local/jail/basejail_2016_04_19 /usr/local/jail/jail3/basejail: failed
jail1: created
DEBUG: jail_start(): END OF '/tmp/jail.hyLntGie' CONTENTS
Workaround discovered: My next step was to add to jail2 a "depend = jail1;" line and to jail3 a "depend = jail2;" line. That fixes the problem.
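For reference, the workaround looks roughly like this in /etc/jail.conf. This is only a sketch: it shows just the added "depend" lines, and the comments stand in for the existing parameters of each block, which stay exactly as posted above.

```
jail2 {
    # ... existing jail2 parameters unchanged ...
    depend = jail1;    # do not start jail2 until jail1 has been created
}
jail3 {
    # ... existing jail3 parameters unchanged ...
    depend = jail2;    # do not start jail3 until jail2 has been created
}
```

With these in place the jails come up one after another (jail1, then jail2, then jail3) instead of all at once, which is why I suspect the parallel start.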
So the question is:
Why is jail failing with mount_nullfs errors when launching jails WITHOUT me manually setting dependencies so that jails launch sequentially?
My conclusion:
This looks like a parallel race with mounting a common nullfs(5) read-only filesystem.
BUG #1: Race condition on nullfs(5) mount of commonly shared read-only filesystems by jails
BUG #2: NO logging of the failure anywhere! (I had to invent my own.)
What is the correct permanent fix? I can use my workaround, but I hate workarounds when something should just work.
Thanks!
Aaron out.