[Solved] MySQL socket disappears when restarting a related jail with a nullfs mount point

I'm running FreeBSD 11.3-RELEASE-p3.
The jail manager is qjail.

There's a jail for the MariaDB database server that is meant to serve some other web server jails via a unix socket, not through the network.
Let's call this the `mariadb` jail.
When the jail is started, the mysql-server service starts automatically and creates its socket at /var/run/mysql/mysql.sock.
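
For context, a minimal my.cnf excerpt matching that layout would look something like this (an illustrative sketch, not copied from the actual server; skip-networking matches the socket-only intent above):
Code:
# Hypothetical [mysqld] section inside the mariadb jail; the socket
# path matches the one above, and skip-networking enforces
# socket-only access.
[mysqld]
socket          = /var/run/mysql/mysql.sock
skip-networking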

A related jail is started that mounts the mariadb jail's /var/run/mysql directory onto its own /var/run/mysql directory.
This is the corresponding line in /usr/local/etc/qjail.fstab/webjail:
Code:
/usr/jails/mariadb/var/run/mysql  /usr/jails/webjail/var/run/mysql  nullfs  rw  0  0

So far, so good.

But when the webjail is restarted, the mysql.sock file in the mariadb jail's /var/run/mysql directory disappears, and access to the database server is lost for all jails depending on it.
I don't understand the logic of this; that's why I'm asking here.
 
If MySQL is stopped, the socket file disappears. Why would it leave it dangling? A restart is a stop and a start, so there's a window where the socket file doesn't exist; it gets created if/when the database is ready to take connections.

It's the same with network sockets: if the service is stopped (or not running), the port is closed. Why would it leave the port open? Why do you think a socket file is different from a network socket?
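
You can see that window for yourself with a quick sketch like this on the host (the path assumes the jail layout above):
Code:
# Poll for the socket file once a second while restarting the service;
# it will briefly report "missing" during the stop/start window.
while true; do
    test -S /usr/jails/mariadb/var/run/mysql/mysql.sock \
        && echo present || echo missing
    sleep 1
done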
 
Please read what I've written carefully.
I'm not restarting the mariadb jail; I'm restarting the webjail.
 
Oh, right. The filesystems in /usr/local/etc/qjail.fstab/webjail are mounted/unmounted when the jail starts/stops. I suspect the socket file is written to the wrong directory, so when you shut down that jail you remove the socket file that was written to it.
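
One way to sanity-check that theory is to watch the mount table on the host while starting and stopping the web jail:
Code:
# The mysql nullfs mount should only appear while the web jail is up.
mount -t nullfs | grep mysql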
 
Does the MariaDB jail have some nullfs(4) mounts too? It might be an ordering issue with overlapping filesystems. It's quite easy to get into a situation like that if you have a few nullfs(4) mounts.
 
The mariadb jail has just this one read-only nullfs mount:
Code:
# cat /usr/local/etc/qjail.fstab/mariadb
/usr/jails/sharedfs /usr/jails/mariadb/sharedfs nullfs ro 0 0

The webjail also has a nullfs mount point for the php-fpm socket:
Code:
# cat /usr/local/etc/qjail.fstab/webjail
/usr/jails/sharedfs                         /usr/jails/webjail/sharedfs                 nullfs  ro  0  0
/usr/jails/mariadb/var/run/mysql            /usr/jails/webjail/var/run/mysql            nullfs  rw  0  0
/usr/jails/php-fpm/var/run/php-fpm/sockets  /usr/jails/webjail/var/run/php-fpm/sockets  nullfs  rw  0  0
 
At first glance those appear to be fine. Does it matter if you start the web jail before the MariaDB jail initially, and then restart the web jail? If things are set up correctly, the order in which they are started shouldn't matter. But if the behavior changes (like the disappearance of the socket), you know it must have something to do with the mounts.
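
Spelled out with qjail(8), the experiment would be something like:
Code:
# Start the web jail first this time, then mariadb, then restart the
# web jail and check whether the socket survives each step.
qjail stop webjail
qjail stop mariadb
qjail start webjail
qjail start mariadb
ls -l /usr/jails/mariadb/var/run/mysql/   # socket present?
qjail stop webjail
qjail start webjail
ls -l /usr/jails/mariadb/var/run/mysql/   # still present?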
 
If I start the web jail before the mariadb jail, the nullfs mount of the socket directory works fine, just as it does when mariadb is started before the web jail.

The behavior is the same though: the mariadb socket disappears when restarting the web jail.
More precisely, this happens not when stopping the web jail, but when starting it.

I have the impression that many times these strange things only happen to me...
 
More precisely, this happens not when stopping the web jail, but when starting it.
That's super weird. Is there some rc(8) script (home-made maybe) that might inadvertently remove files in /var/run/?
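
A crude way to hunt for one, run inside the web jail (just a grep over the rc directories; tune the pattern as needed):
Code:
# Look for rc scripts that remove things under /var/run
grep -r 'find /var/run' /etc/rc.d /usr/local/etc/rc.d
grep -rE 'rm .*var/run' /etc/rc.d /usr/local/etc/rc.d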
 
There's no such script, at least none made by me.

I removed the nullfs mount for the external php-fpm service, so php-fpm now runs locally in the web jail.
The mariadb socket still disappears.
Here's the content of the qjail log when starting the web jail:

Code:
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/mysql /usr/local/lib/perl5/5.28/mach/CORE
32-bit compatibility ldconfig path: /usr/lib32
Setting hostname: web.
Creating and/or trimming log files.
Starting syslogd.
Clearing /tmp.
Updating motd:.
Performing sanity check on php-fpm configuration:
[08-Oct-2019 23:29:03] NOTICE: configuration file /usr/local/etc/php-fpm.conf test is successful

Starting php_fpm.
Starting hiawatha.
Performing sanity check on sshd configuration.
Starting sshd.
Starting cron.

Tue Oct  8 23:29:03 WEST 2019
/var/log/qjail.webjail.console.log (END)

Then I removed the mysql nullfs mount line from /usr/local/etc/qjail.fstab/webjail, so that the only line left is:
Code:
# cat /usr/local/etc/qjail.fstab/webjail
/usr/jails/sharedfs /usr/jails/webjail/sharedfs nullfs ro 0 0
Surprise, surprise... the mariadb database server's socket in the mariadb jail is gone as well.

Next, I removed the sharedfs line from the /usr/local/etc/qjail.fstab/webjail file as well.
When restarting the webjail, the socket still disappeared.
 
Using qjail archive, I've copied the two jails from the VPS to a FreeBSD 11.3-RELEASE VM on my computer.
After qjail restore, starting both jails, then stopping the webjail and starting it again, the same behavior occurs:
the mariadb socket disappears when the webjail is started.
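
Roughly, the steps were (qjail archive and restore as documented in qjail(8)):
Code:
# On the VPS
qjail archive mariadb
qjail archive webjail
# ...copy the archives to the VM, then there:
qjail restore mariadb
qjail restore webjail
qjail start mariadb
qjail start webjail
qjail stop webjail
qjail start webjail
ls /usr/jails/mariadb/var/run/mysql/    # mysql.sock is gone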

I'm going to post this to some mailing list. This is some kind of bug.
 
I also have this issue, but with iocage on 12.0-RELEASE. I noticed the mysql.sock is gone when the related jail is started, not when the jail shuts down. Additionally, I have a few instances of php-fpm jails with similar nullfs mounts to an nginx jail, which exhibited the same behavior when the nginx jail is restarted.

Running mount -t nullfs /data/shared /mnt/tmp on the jail host did not remove the mysql.sock, which is weird, given that a jail mount basically calls mount -t $3 -o $4 $1 $2 (in sh equivalent; see the source code).

The socket seems to still be open, as it's listed in netstat -l even after the socket file is gone (with the same address/inode and everything as before it was gone). Running netstat -l inside a jail doesn't show mysql.sock, so possibly something is unlinking sockets with unknown addresses/inodes during jail start... but I haven't been able to debug it further.
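
For what it's worth, sockstat(1) is another way to confirm that the listener outlives the file; on the host, something like:
Code:
# -u limits the output to unix-domain sockets; the mysqld line should
# still show the (now unlinked) socket path.
sockstat -u | grep mysql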
 
What I find strange is that, for such a (I think) common setup of a shared mysql socket, this seems to have happened only to the two of us.

I've sent a mail to the qjail maintainer and got a message back saying:

When you restart the master jail that the other jails access, you also have to restart those other jails so they get the new file content created when you restarted the master jail.

Everything looks like it's working as it should. You just need to manually stop the other jails that depend on the master jail, then restart the master jail, followed by starting the other jails that depend on it. All 3 jails work together as a group and have to be restarted as a group when you want the master jail restarted for whatever reason.

This is not a qjail, jail, mysql, or nullfs problem. It's a problem of how you think jails work and your special use case. This is the key. Create a special script to be used when you want to restart the master jail. First embed qjail commands to stop all 3 jails in the reverse order they were started in, and then start the jails in the same order as when you boot the host. Everything will self-align and work just like you want it to.

Still, I don't understand why one jail interferes with another.
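
For reference, the group restart he describes would boil down to a wrapper like this (a sketch using the jail names from this thread):
Code:
#!/bin/sh
# Restart the master jail and its dependents as a group: stop the
# dependents in reverse start order, then start everything in boot order.
qjail stop webjail
qjail stop php-fpm
qjail stop mariadb
qjail start mariadb
qjail start php-fpm
qjail start webjail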
 
Today I tried changing the socket mount points to use rw only for the jail that creates the socket, and ro for the jails that only access it, e.g. for the MySQL jail I have:

Code:
/data/run/mysql /var/iocage/jails/mysql/root/var/run/mysql nullfs rw 0 0

...and for the nextcloud jail that runs php-fpm:

Code:
/data/run/mysql /var/iocage/jails/nextcloud/root/var/run/mysql nullfs ro 0 0
/data/run/nextcloud /var/iocage/jails/nextcloud/root/var/run/php-fpm nullfs rw 0 0

To my surprise, this setup no longer causes mysql.sock to be deleted when starting a jail. php-fpm can still access MySQL just fine (with the ability to INSERT and all). I think this might be a possible workaround.
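
To double-check from the consuming jail that ro is enough, a quick test like this works (the user name is just a placeholder; mysql's -S flag selects the socket):
Code:
# connect(2) to the socket still succeeds through the read-only mount
mysql -S /var/run/mysql/mysql.sock -u someuser -p -e 'SELECT 1;'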
 
sirn, I've only tried this for the mariadb jail, but your solution has also worked in my case. I don't use iocage, but qjail.
It's solved, thank you.
 
I hit this problem today and used your RO solution.

I also think I know WHY this problem occurs.

From /etc/rc.d/cleanvar, which is run by default on boot, including every time a jail starts:

Code:
$ grep cleanvar /etc/defaults/rc.conf
cleanvar_enable="YES"     # Clean the /var directory


Code:
cleanvar_start()
{
        if [ -d /var/run -a ! -f /var/run/clean_var ]; then
                # Skip over logging sockets
                find /var/run \( -type f -or -type s ! -name log -and ! -name logpriv \) -delete
                >/var/run/clean_var
        fi
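
So when the web jail boots, its own cleanvar runs and that find ... -delete descends into /var/run/mysql, which is the mariadb jail's socket directory nullfs-mounted rw, and removes mysql.sock (sockets match -type s). It also explains why the read-only workaround helps: the unlink simply fails on an ro mount. Assuming you don't need cleanvar inside the jail at all, another way around it would be to disable it there (sysrc's -R flag operates on an alternate root):

Code:
# On the host; the path assumes the qjail layout from this thread
sysrc -R /usr/jails/webjail cleanvar_enable="NO"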
 