Kernel crash on jail stop

Running 12.0-RELEASE-p5, jails managed through the native jails tools with the attached jail.conf file (so called "thin jails", VNET). I can reliably crash the system by issuing a "service jail restart". The crash happens when jails are being stopped.

Code:
[root@beastie ~]# service jail restart
Stopping jails: dns db ldap web nextcloud imap smtp testpacket_write_wait: Connection to xxx.xxx.xxx.xxx port 22: Broken pipe

As a workaround I can avoid the crash by inserting a exec.poststop = "sleep 2"; statement in jail.conf. 1 second was not enough to avoid the crash.

I've managed to obtain a core dump but my attempts at debugging didn't go far as I'm not experienced with this. Of course happy to do more debugging if this can help identifying the issue. The system is not yet live so I can pretty much try anything on it.

Code:
[root@beastie /var/crash]# kgdb /boot/kernel/kernel /var/crash/vmcore.0
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...(no debugging symbols found)...done.
0xffffffff80bcd0bd in sched_switch ()
(kgdb) bt
#0  0xffffffff80bcd0bd in sched_switch ()
#1  0xffffffff80ba6de1 in mi_switch ()
#2  0xffffffff80bf554c in sleepq_wait ()
#3  0xffffffff80ba6817 in _sleep ()
#4  0xffffffff80bfae71 in taskqueue_thread_loop ()
#5  0xffffffff80b5bf33 in fork_exit ()
#6  <signal handler called>

My jail.conf file below. Note that the crash happens regardless of whether the jails with special permissions (db, builder) are configured. I also tried tweaking the mount/umount order without success.

Code:
$release="12.0-RELEASE"; # Release used to create the jail if not already existing
host.hostname = "${name}";
path = "/jails/${name}";
exec.consolelog = "/var/log/jail.${name}.console.log";
vnet = "new";
vnet.interface = "epair${jailnum}b";
### Create the jail if not already exsisting ###
exec.prestart = "if [ ! -d "/jails/THINJAILS/${name}" ]; then zfs clone zroot/jails/TEMPLATES/skeleton-${release}@skeleton zroot/jails/THINJAILS/${name}; fi";
exec.prestart += "if [ ! -d "/jails/${name}" ]; then mkdir /jails/${name}; fi";
### Mount filesystems for shared base (RO) and individual jail skeleton (RW)
exec.prestart += "mount_nullfs -o ro /jails/TEMPLATES/base-${release} /jails/${name}";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/etc /jails/${name}/etc";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/home /jails/${name}/home";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/root /jails/${name}/root";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/tmp /jails/${name}/tmp";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/var /jails/${name}/var";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/usr/local /jails/${name}/usr/local";
### Mount /usr/ports from hosts
exec.prestart += "mount_nullfs -o ro /usr/ports /jails/${name}/usr/ports";
### Mount data filesystem if exists
exec.prestart += "if [ -d "/jails/DATA/${name}" ]; then mount_nullfs -o rw /jails/DATA/${name} /jails/${name}/data; fi";
### Mount devfs with the default ruleset (11) now that the jail filesystems are mounted
exec.prestart += "mount -t devfs -o ruleset=4 devfs /jails/${name}/dev";
### Create an ethernet pair and add to bridge0 (internal bridge)
exec.prestart += "ifconfig epair${jailnum} create up";
exec.prestart += "ifconfig epair${jailnum}a description '${name} - host interface'";
exec.prestart += "ifconfig epair${jailnum}b description '${name} - jail interface'";
exec.prestart += "ifconfig bridge0 addm epair${jailnum}a";
exec.start = "ifconfig epair${jailnum}b inet 192.168.1.${jailnum}/24";
exec.start += "ifconfig epair${jailnum}b inet6 xxx:xxx:xxx:xxx::${jailnum}:1/64";
exec.start += "route add default 192.168.1.1";
exec.start += "route -6 add default xxx:xxx:xxx:xxx::1:1";
### Proceed with boot through rc
exec.start += "/bin/sh /etc/rc";
### Shutdown the jail
exec.stop = "/bin/sh /etc/rc.shutdown";
### Give the jail a couple of seconds to shutdown (avoid issues unmounting filesystems)
#exec.poststop = "sleep 2";
### Destroy jail  network interface
exec.poststop += "ifconfig epair${jailnum}a destroy";
### Unmount filesystems
exec.poststop += "if [ -d "/jails/DATA/${name}" ]; then umount /jails/${name}/data; fi";
exec.poststop += "umount -f /jails/${name}/dev";
exec.poststop += "umount -f /jails/${name}/usr/ports";
exec.poststop += "umount -f /jails/${name}/etc";
exec.poststop += "umount -f /jails/${name}/home";
exec.poststop += "umount -f /jails/${name}/root";
exec.poststop += "umount -f /jails/${name}/tmp";
exec.poststop += "umount -f /jails/${name}/var";
exec.poststop += "umount -f /jails/${name}/usr/local";
exec.poststop += "umount -f /jails/${name}";

dns { $jailnum=2; };

db {
    $jailnum=3;
    allow.sysvipc;
};

ldap { $jailnum=4; };

web { $jailnum=5; };

nextcloud { $jailnum=6; };

imap { $jailnum=7; };

smtp { $jailnum=8; };

test { $jailnum=98; };

builder {
    $jailnum=99;
    enforce_statfs=0;
    allow.mount;
    allow.mount.nullfs;
    allow.mount.tmpfs;
    allow.mount.devfs;
    allow.chflags;
};
 
pgauret The best thing to do is to submit PR. You can start here. If you never submitted PR before it's recommended to read the guidelines first. It's not long and does serve its purpose.
In the past I had always issues with VNET but from your output it's impossible to tell what the issue was.
 
pgauret The best thing to do is to submit PR. You can start here. If you never submitted PR before it's recommended to read the guidelines first. It's not long and does serve its purpose.
In the past I had always issues with VNET but from your output it's impossible to tell what the issue was.

Thanks for your reply and advice.

I've submitted PR 238326. In addition I've narrowed the issue to VIMAGE/VNET networking (jails can be reliably restarted hundreds of times without crashing if I disable networking).
 
Back
Top