Cannot cleanly stop jails after using graphical applications

Starting and stopping the jail works fine as long as I stick to console apps. I can also successfully launch and use jailed graphical apps on the host X display with, e.g., jailme 1 firefox, but after closing them I cannot cleanly stop the jail.

Here is my /etc/jail.conf for reference:

Code:
# Global settings applied to all jails

interface = "lo1";
host.hostname = "$name.domain.local";
path = "/usr/local/jails/$name";
ip4.addr = 10.0.0.$ip;
mount.fstab = "/usr/local/jails/$name.fstab";

exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;
mount.devfs;
devfs_ruleset = 5; # apply the sound-device ruleset defined in /etc/devfs.rules

# The jail definition for browser
browser {
    $ip = 17;
}

Output of stopping the jail with jail -r -v browser:

Code:
browser: run command in jail: /bin/sh /etc/rc.shutdown
Stopping cron.
Waiting for PIDS: 1058.
.
Terminated
browser: sent SIGTERM to: 1054 1051 992
browser: removed
browser: run command: /sbin/umount /usr/local/jails/browser/dev
browser: run command: /sbin/umount -t nullfs /usr/local/jails/browser/skeleton
browser: run command: /sbin/umount -t nullfs /usr/local/jails/browser/
umount: unmount of /usr/local/jails/browser failed: Device busy
jail: browser: /sbin/umount -t nullfs /usr/local/jails/browser/: failed
browser: run command: /sbin/ifconfig lo1 inet 10.0.0.17 netmask 255.255.255.255 -alias

It seems the first mount in my /usr/local/jails/browser.fstab is the one that fails to unmount during the shutdown sequence:

Code:
/usr/local/jails/templates/base-10.3-RELEASE  /usr/local/jails/browser/ nullfs   ro          0 0
/usr/local/jails/thinjails/browser     /usr/local/jails/browser/skeleton nullfs  rw  0 0

The output of jls shows that the jail did shut down, though:

Code:
JID  IP Address      Hostname                      Path

The main problem: because the /usr/local/jails/browser.fstab mounts are left mounted after jail shutdown, restarting the jail with service jail start browser fails:

Code:
Starting jails: cannot start jail  "browser":
mount_nullfs: /usr/local/jails/browser: Resource deadlock avoided
jail: browser: /sbin/mount -t nullfs -o ro /usr/local/jails/templates/base-10.3-RELEASE /usr/local/jails/browser/: failed
.

Manually unmounting with umount /usr/local/jails/browser from the host results in:

Code:
umount: unmount of /usr/local/jails/browser failed: Device busy

Surprisingly, after a while the manual unmount succeeds and I can restart the jail fine. This suggests that one or more processes temporarily keep the nullfs filesystem busy, but I'm unsure how to trace them. I tried fstat -f /usr/local/jails/browser and lsof +D /usr/local/jails/browser, plus the same commands on the other jail mounts, but they list no open files once the jail is stopped. Admittedly, I'm not familiar with how to use these tools effectively.
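Concretely, the probing looked something like this (a rough sketch assuming the mount layout from browser.fstab above; as I understand it, fstat only reports files held open by a process, so a socket with no owning process would never show up here):

Code:
#!/bin/sh
# Check each nullfs mount of the browser jail for processes keeping it busy.
# fstat -f reports open files on the filesystem containing the given path.
for m in /usr/local/jails/browser /usr/local/jails/browser/skeleton; do
    echo "== $m =="
    fstat -f "$m"
done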

If it is any help, here is the jails section of my /etc/rc.conf:

Code:
# Jails
cloned_interfaces="lo1"
pf_enable="YES"
jail_enable="YES"
jail_list="browser"

And my pf rules too:

/etc/pf.conf:

Code:
ext_if="wlan0"
int_if="lo1"
localnet=$int_if:network

scrub in all fragment reassemble
set skip on lo0
set skip on lo1

#nat for jails
nat on $ext_if inet from ($localnet) to any -> ($ext_if)
 
Although I still can't detect rogue open files in the jail mounts, probing for open sockets to the jail IP with sockstat -c | grep -i 10.0.0.17 gives interesting results:

Code:
?        ?          ?     ?  tcp4   10.0.0.17:31909       199.96.156.124:80
?        ?          ?     ?  tcp4   10.0.0.17:31912       199.96.156.124:80
?        ?          ?     ?  tcp4   10.0.0.17:31911       199.96.156.124:80
?        ?          ?     ?  tcp4   10.0.0.17:31910       199.96.156.124:80

This is after closing chromium. The ? fields show that the owning process has already exited, leaving the connections lingering with no process attached. They gradually go away, and once they do I can stop and restart the jail gracefully, without the umount errors above. Is there a clean way to kill or prevent such lingering sockets when forwarding to the host X display, so that I don't have to wait before restarting the jail?

Edit: What I have come up with so far in /etc/jail.conf:

Code:
exec.prestop = "tcpdrop -l -a | grep 10.0.0.$ip | sh";

This drops every socket still open on the jail IP before the /etc/rc.shutdown sequence runs, and the umount errors no longer occur.
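One refinement I may make: anchoring the match so that 10.0.0.$ip cannot accidentally match a longer address (10.0.0.17 vs 10.0.0.170, say). A sketch as a standalone prestop script; the script path and name are just placeholders:

Code:
#!/bin/sh
# /usr/local/etc/jail-tcpdrop.sh (placeholder path) -- drop every TCP
# connection involving the given jail address before rc.shutdown runs.
# tcpdrop -l -a prints one runnable "tcpdrop laddr lport faddr fport"
# line per connection; grep -w -F keeps 10.0.0.17 from matching 10.0.0.170.
ip="$1"
tcpdrop -l -a | grep -w -F "$ip" | sh

with the corresponding line in /etc/jail.conf:

Code:
exec.prestop = "/usr/local/etc/jail-tcpdrop.sh 10.0.0.$ip";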

I'm waiting for someone to tell me why this is a bad idea. tcpdrop destructively kills connections, but stopping a jail is destructive in and of itself, so I assume this is safe for jails where nothing critical, such as a database transfer, is running? Or do the built-in jail options offer a better way around my problem? I'm also wondering whether my /etc/pf.conf rules slow down the removal of the sockets.

Thanks for reading.
 
I'm wondering why those connections stay open. What does netstat(8) say about their state? It might be that they're simply waiting for an ACK that never comes and time out after a while. In that case it may help to find out whether an ACK was ever sent, or to lower the timeout a bit.
 
Thanks for your reply. They are in the TIME_WAIT state. Here is netstat | grep 10.0.0.17 straight after closing the forwarded chromium:

Code:
tcp4       0      0 10.0.0.17.64351        151.101.65.69.http     TIME_WAIT
tcp4       0      0 10.0.0.17.64350        151.101.65.69.http     TIME_WAIT
tcp4       0      0 10.0.0.17.64349        sea.gmane.org.http     TIME_WAIT
tcp4       0      0 10.0.0.17.64348        sea.gmane.org.http     TIME_WAIT
tcp4       0      0 10.0.0.17.64347        sea.gmane.org.http     TIME_WAIT
tcp4       0      0 10.0.0.17.64346        lo.gmane.org.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64345        lo.gmane.org.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64344        lo.gmane.org.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64343        lo.gmane.org.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64342        lo.gmane.org.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64341        lo.gmane.org.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64340        lo.gmane.org.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64339        cache-eu.flickr..https TIME_WAIT
tcp4       0      0 10.0.0.17.64338        cache-eu.flickr..https TIME_WAIT
tcp4       0      0 10.0.0.17.64336        wk-in-f156.1e100.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64335        wk-in-f156.1e100.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64332        server-54-192-3-.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64331        server-54-192-3-.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64330        server-54-192-3-.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64329        ssd12.stablehost.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64328        ssd12.stablehost.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64327        ssd12.stablehost.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64326        ssd12.stablehost.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64325        server-52-85-69-.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64324        server-52-85-69-.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64323        104.20.19.28.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64322        94.46.159.17.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64321        94.46.159.17.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64320        94.31.29.218.IPY.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64319        94.31.29.218.IPY.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64318        94.46.159.17.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64317        94.46.159.17.http      TIME_WAIT
tcp4       0      0 10.0.0.17.64316        c2.90.564a.ip4.s.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64315        c2.90.564a.ip4.s.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64314        c2.90.564a.ip4.s.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64313        c2.90.564a.ip4.s.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64312        c2.90.564a.ip4.s.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64311        c2.90.564a.ip4.s.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64310        2.counter.a.stat.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64308        ec2-46-51-197-89.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64307        ec2-46-51-197-89.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64306        ec2-46-51-197-89.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64305        wfe0.ysv.freebsd.https TIME_WAIT
tcp4       0      0 10.0.0.17.64304        wfe0.ysv.freebsd.https TIME_WAIT
tcp4       0      0 10.0.0.17.64303        lhr25s09-in-f9.1.https TIME_WAIT
tcp4       0      0 10.0.0.17.64302        lhr25s09-in-f14..http  TIME_WAIT
tcp4       0      0 10.0.0.17.64301        ber01s15-in-f3.1.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64300        ber01s15-in-f35..http  TIME_WAIT
tcp4       0      0 10.0.0.17.64299        ber01s15-in-f35..http  TIME_WAIT
tcp4       0      0 10.0.0.17.64298        lhr25s07-in-f9.1.https TIME_WAIT
tcp4       0      0 10.0.0.17.64297        any-in-2415.1e10.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64296        any-in-2415.1e10.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64295        any-in-2415.1e10.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64294        any-in-2415.1e10.http  TIME_WAIT
tcp4       0      0 10.0.0.17.64293        wfe0.ysv.freebsd.https TIME_WAIT
tcp4       0      0 10.0.0.17.64292        lhr25s10-in-f14..https TIME_WAIT
tcp4       0      0 10.0.0.17.64291        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.64290        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.64289        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.64288        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.64287        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.64286        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.64285        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.64284        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.64283        lhr26s04-in-f10..https TIME_WAIT
tcp4       0      0 10.0.0.17.64276        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.36358        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.63054        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.48743        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.47473        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.64711        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.56088        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.11903        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.62674        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.26989        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.34340        10.0.0.17.x11          TIME_WAIT
tcp4       0      0 10.0.0.17.42324        10.0.0.17.x11          TIME_WAIT
udp4       0      0 10.0.0.17.syslog       *.*

Before I close the browser, those connections are listed by sockstat -c as belonging to the chrome and Xorg processes.

I had considered setting sysctl net.inet.tcp.msl=7500 to bring the TIME_WAIT period (2 × MSL) down from 60 s to 15 s, but if it is safe to zap the sockets straight away, I'd rather do that.
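If I did go the MSL route, making it persistent should just be a line in /etc/sysctl.conf (this assumes the default MSL of 30000 ms):

Code:
# TIME_WAIT lasts 2 * MSL: 7500 ms here gives 15 s instead of the
# default 2 * 30000 ms = 60 s.
net.inet.tcp.msl=7500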
 
Well, they're in TIME_WAIT, so they're not actually doing anything besides waiting for the connection to fully close. It should be safe to "kill" them.
 
Right. How can I check whether ACK packets are ever sent? I assume with something like tcpdump and/or by reading pflog. I've also found sysctl net.inet.tcp.nolocaltimewait=1 and wonder whether that is a better fit, since my jails run on lo1?
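For the tcpdump side, I imagine something like this would make the teardown visible on the jail interface (a sketch; interface and address are from my setup above):

Code:
# Capture the connection teardown for the jail address: FIN and RST
# segments only. Drop the flag filter to also see the answering ACKs.
tcpdump -ni lo1 'host 10.0.0.17 and tcp[tcpflags] & (tcp-fin|tcp-rst) != 0'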
 