Overnight periodic report? Triggering bottlenecked processes on jail mounts

Hi, posting here because this started after the recent 15.0 upgrade, but I'm not convinced it is due to the upgrade.

The problem: at 3am each day, something bottlenecks the processes related to the jail mounts. We have not had this problem before, and we have been running this jail setup for about a year.


Description of our environment: we are currently running Host -> vnet Jail -> 4 child jails on a smaller machine (8 cores, 16 GB of memory). One child jail is a reverse proxy, another is a production web app, a third is a development version of the web app, and the fourth is a work-in-progress something else.

ZFS datasets have been created as needed, and our way of mounting datasets into the jail already follows the recommended approach described for the recent dataset-mounting feature.

When the vnet Jail is started, Host's /usr/local/jails is mounted at vnet Jail's /usr/local/jails. We mount a templated base for each child jail, and then the respective userland, etc.
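
For illustration, each child jail's fstab (the ${name}.fstab files referenced in the config below) boils down to a couple of nullfs mounts along these lines - paths simplified here, not our exact layout:
Code:
# e.g. /usr/local/jails/beta.fstab (illustrative paths)
# read-only templated base shared by the children
/usr/local/jails/root_templates/base   /usr/local/jails/beta     nullfs  ro  0  0
# per-jail writable userland layered on top
/usr/local/jails/children/beta         /usr/local/jails/beta/s   nullfs  rw  0  0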


The symptom: when running htop on the Host, sorted by CPU high to low, the top two processes are the filesystem mounts. These don't normally appear on the list when things are "working". Although the system still responds to some commands, the jails cannot be shut down with the normal commands from the vnet jail; a reboot has been required, which returns the load to normal. The process - whatever it is - is pegged at the equivalent of one processor core. I haven't found a way to trigger the problem, and I'm not sure how to diagnose which process may be causing it.


Attempts to solve: since the high CPU always starts at 3am, I thought maybe there was a conflict between the periodic reports being run, so I disabled the ones in the child jails related to filesystem changes, and for one night that seemed to fix it. Interestingly, I can't find any errors in any logs, and the jail processes are running fine when checked, but the resources are being hogged by the Host or something else. When running htop from a child jail, it is almost all red across all cores.

But yesterday it started showing high CPU again during the day - however, the processes showing the filesystem mounts were not on the list. It seemed to resolve when I experimented by shutting down the development jail, so I figured it might be related to a process in it. I chose that jail because our production jail can call the development jail, but the 4 jails are otherwise not able to talk to each other. However, last night something got triggered again and the CPU was high with the filesystem mounts, and this was with the development jail off.


I'm looking for some ideas of what to try or test, thanks.

Host jail.conf which starts the vnet jail:
Code:
mount.devfs;                               # Mount devfs inside the jail
exec.clean;                                # prevent use of host environment in commands
exec.start = "/bin/sh /etc/rc";            # Start command
exec.stop = "/bin/sh /etc/rc.shutdown";    # Stop command
exec.consolelog = "/var/log/jail_console_${name}.log";
host.hostname = "redacted";

jail_parent {
    path = "/usr/local/jails/containers/${name}";
    vnet;
    vnet.interface="e0b_jail_parent";
    devfs_ruleset=70;
    exec.created+="zfs jail $name zroot/poudriere";
    exec.created+="zfs jail $name zroot/jails/children";
    exec.created+="zfs jail $name zroot/jails/root_templates";
    exec.created+="zfs jail $name zroot/jails/databases";
    exec.created+="/usr/local/jails/scripts/zfs_set_jailed_on";
    exec.start+="service jail onestart nginx";
    exec.start+="service jail onestart beta";
    exec.start+="service jail onestart marketing";
    exec.start+="service jail onestart develop";
    exec.start+="wg-quick up wg_client";
    allow.raw_sockets;
    allow.mlock;
    allow.mount;
    allow.mount.devfs;
    allow.mount.fdescfs;
    allow.mount.fusefs;
    allow.mount.linprocfs;
    allow.mount.linsysfs;
    allow.mount.nullfs;
    allow.mount.procfs;
    allow.mount.tmpfs;
    allow.mount.zfs;
    enforce_statfs = 1;
    children.max=20;
    exec.prestop+="/sbin/ifconfig e0b_jail_parent -vnet $name";
    exec.stop+="service jail onestop develop";
    exec.stop+="service jail onestop marketing";
    exec.stop+="service jail onestop beta";
    exec.stop+="service jail onestop nginx";
    exec.stop+="wg-quick down wg_client";
    exec.poststop+="zfs unjail $name zroot/poudriere";
    exec.poststop+="zfs unjail $name zroot/jails/children";
    exec.poststop+="zfs unjail $name zroot/jails/root_templates";
    exec.poststop+="zfs unjail $name zroot/jails/databases";
    exec.poststop+="/usr/local/jails/scripts/zfs_set_jailed_off";
}
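
(Note on the zfs jail lines above: the datasets also need their jailed property set before the jail can mount and manage them, which is presumably what the zfs_set_jailed_on/_off scripts toggle; a simplified sketch:)
Code:
#!/bin/sh
# simplified sketch - set the jailed property so the datasets
# handed over with "zfs jail" can be managed inside the jail
for ds in zroot/poudriere zroot/jails/children \
          zroot/jails/root_templates zroot/jails/databases; do
    zfs set jailed=on "$ds"
done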

Vnet jail.conf which starts the child jails:
Code:
mount.devfs;                               # Mount devfs inside the jail
devfs_ruleset = "0";
ip6="disable";
exec.clean;
exec.start = "/bin/sh /etc/rc";            # Start command
exec.stop = "/bin/sh /etc/rc.shutdown";    # Stop command
exec.consolelog = "/var/log/jail_console_${name}.log";
path = "/usr/local/jails/$name";

nginx {
    # hostname
    host.hostname = "redacted";

    # network
    ip4.addr = "192.168.1.1";

    # permissions
    enforce_statfs = 1;
    allow.raw_sockets;
    allow.mount;
    allow.mount.fusefs;

    mount.fstab = "/usr/local/jails/${name}.fstab";
    exec.release += "/usr/local/jails/unmount_nginx_fstab";
}

marketing {
    # hostname
    host.hostname = "redacted";

    # network
    ip4.addr = "192.168.2.1";

    # permissions
    enforce_statfs = 1;
    allow.raw_sockets;
    allow.mlock;
    allow.mount;
    allow.mount.devfs;
    allow.mount.fdescfs;
    allow.mount.fusefs;
    allow.mount.linprocfs;
    allow.mount.linsysfs;
    allow.mount.nullfs;
    allow.mount.procfs;
    allow.mount.tmpfs;

    mount.fstab = "/usr/local/jails/${name}.fstab";
    exec.start += "/var/www/sourcefiles/mountsourcefiles";
    exec.start += "/var/www/sourcefiles/start_linux_service";
    exec.stop += "/var/www/sourcefiles/stop_linux_service";
    exec.stop += "/var/www/sourcefiles/unmountsourcefiles";
    exec.release += "/usr/local/jails/unmount_marketing_fstab";
}

beta {
    # hostname
    host.hostname = "redacted";

    # network
    ip4.addr = "192.168.3.1";

    # permissions
    enforce_statfs = 1;
    allow.raw_sockets;
    allow.mlock;
    allow.mount;
    allow.mount.fusefs;
    allow.mount.nullfs;

    mount.fstab = "/usr/local/jails/${name}.fstab";
    exec.prestart += "sudo -u rsync_user /home/rsync_user/mount_rsync_net_beta_remote_storage";
    exec.start += "/var/www/sourcefiles/mountsourcefiles";
    exec.prestop += "umount -f /usr/local/jails/beta/s/usr/local/mnt/rsync_remote_storage";
    exec.stop += "/var/www/sourcefiles/unmountsourcefiles";
    exec.release += "/usr/local/jails/unmount_beta_fstab";
}

develop {
    # hostname
    host.hostname = "redacted";

    # network
    ip4.addr = "192.168.4.1";

    # permissions
    enforce_statfs = 1;
    allow.raw_sockets;
    allow.mlock;
    allow.mount;
    allow.mount.fusefs;
    allow.mount.nullfs;

    mount.fstab = "/usr/local/jails/${name}.fstab";
    exec.stop += "/var/www/sourcefiles/mountsourcefiles";
    exec.stop += "/var/www/sourcefiles/unmountsourcefiles";
    exec.release += "/usr/local/jails/unmount_develop_fstab";
}
 
Spread these out a bit on each system/jail.

/etc/crontab
Code:
# Perform daily/weekly/monthly maintenance.
1       3       *       *       *       root    periodic daily
15      4       *       *       6       root    periodic weekly
30      5       1       *       *       root    periodic monthly
So they don't all fire at exactly the same time.
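
For example (times purely illustrative), leave the host at the defaults above and shift each jail's /etc/crontab a bit:
Code:
# in the vnet parent jail
31      3       *       *       *       root    periodic daily
# in the first child jail
1       4       *       *       *       root    periodic daily
# in the next child jail
31      4       *       *       *       root    periodic daily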
 
OK, I think we are on the right track. I could actually see the processes for the periodic cron jobs, but the system was still pretty loaded up. I think I may have cruft and/or have disabled the jobs incorrectly:

The system was initially version 13 (way back). I have two questions.

1. Regarding older items like these (none are custom) - is it best practice to remove any that do not also appear in /etc/periodic/* ?
Code:
# ls -R /usr/local/etc/periodic
daily    security weekly

/usr/local/etc/periodic/daily:
411.pkg-backup         490.status-pkg-changes

/usr/local/etc/periodic/security:
405.pkg-base-audit 410.pkg-audit      460.pkg-checksum

/usr/local/etc/periodic/weekly:
400.status-pkg
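
One thing I can check for question 1 is whether an installed package still owns each of them, e.g.:
Code:
# shows which package (if any) installed this periodic script
pkg which /usr/local/etc/periodic/daily/411.pkg-backup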

2. periodic.conf(5) says periodic.conf.local can be used in /etc - can it also be used in /usr/local/etc, or does it need to be named periodic.conf there? I had tried to disable 110.neggrpperm with security_status_neggrpperm_enable="NO" in /etc/periodic.conf.local for each of the 4 child jails, but I saw 4 instances running, which makes me think it is not working. Maybe it only gets disabled with security_status_neggrpperm_period="NO"?

Or, maybe the host and the vnet jail (parent jail of the 4 children) were each running neggrpperm and the processes showed a command process + the script process. I don't know.

Thanks, I appreciate the help a lot.
 
When you disabled security_status_neggrpperm_enable in /etc/periodic.conf.local, it sounds like you concluded /etc/periodic.conf.local didn't work as advertised because you still saw some nightly find processes.

I believe the confusion is because you haven't disabled all of the nightly tasks that do a find /.

To be clear, I see 2 periodic daily security tasks and 1 periodic weekly task that each do a find / (at least in 14.3).
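
If you want to see them for yourself, grepping the stock scripts should show the two security ones (the weekly locate rebuild hides its find inside /usr/libexec/locate.updatedb, so it won't match):
Code:
# lists the security scripts that call find directly
grep -l 'find -sx' /etc/periodic/security/*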

You could put the following lines in /etc/periodic.conf on your main system and in each jail (where/if cron is enabled) to disable all periodic find traversals:

Code:
security_status_chksetuid_enable="NO"
security_status_neggrpperm_enable="NO"
weekly_locate_enable="NO"

If you reconfigure this, all periodic filesystem traversals should cease. However, the nightly security jobs will then no longer scan for and report changes in the list of setuid programs or files with negative group permissions, and the database for the locate(1) command won't be updated, so please make sure you understand the trade-offs if you decide to disable these security/weekly jobs.
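
A quick way to confirm the settings are actually being honored (instead of guessing from the 3 AM process list) is to run one of the scripts by hand; with its _enable variable set to NO it should return immediately without output:
Code:
# with security_status_neggrpperm_enable="NO" in effect, this should
# return right away and print nothing; with YES it runs the full find
sh /etc/periodic/security/110.neggrpperm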

IMHO, the recommendation by SirDice to "stagger" the times of the nightly cron jobs for each jail is the easiest/standard way most sysadmins reduce parallel contention on the filesystems.

You might find it an acceptable trade-off if you only disable the above periodic tasks in your jail(s), because the parent system's periodic scans should traverse into jails.
 
You might find it an acceptable trade-off if you only disable the above periodic tasks in your jail(s), because the parent system's periodic scans should traverse into jails.

I agree - I think the conflict between the system host and the vnet jail (parent jail) may be the issue at this point. No jobs for the vnet jail have been disabled, so I will attempt that next.
 
After another night...

In the crontab settings, the host periodic schedule runs first. It appears (surprisingly) that all of the jail reports (vnet parent and child jails) finish before the host periodic is done with 100.chksetuid.

After 3.5 hours I killed the host 100.chksetuid, and as expected the host 110.neggrpperm began. It seems something is causing 100.chksetuid on the host system to be blocked from starting (or finishing?). I got an email report from it and there was no indication that it had been terminated.

I'm going to have to manually run it to troubleshoot some more. I am still seeing no errors in logs.
 
After another night...
[...]
I'm going to have to manually run it to troubleshoot some more. I am still seeing no errors in logs.

Wow, 3.5 hours does suggest your find is probably hanging or going really slowly on some sub-portion of your hierarchy. I don't recall ever seeing a find / of a full local filesystem take that long on any of my systems, even on a huge RAID array.

I recommend you run it in the foreground as root, via:

/etc/periodic/security/100.chksetuid

Then periodically type ^T (CTRL-T) while that job is in the foreground -- find will report the current path it's on ("Scanning: /path") and how many inodes have been scanned so far ("Scanned: NNNN"), so you should be able to tell if/when it stalls.
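
If it does look stuck, you can also check what the find is blocked on from another terminal with a couple of stock tools:
Code:
# substitute the PID of the running find
ps -o pid,state,wchan,command -p <PID>   # process state and wait channel
procstat -kk <PID>                       # kernel stack of each thread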

It makes sense that the host's nightly security jobs take longer, because they always scan the entire system, whereas each jail only scans from the root of the jail downward.

FYI, I just ran 100.chksetuid manually on one of my 14.3 systems: it took approximately 3 minutes to scan the complete hierarchy with about 13 million inodes in use (UFS2 on SSD).
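
If you want a directly comparable number from your own pool, timing the same script on the host is enough:
Code:
# run as root on the host; compare against the overnight 3.5 hours
time sh /etc/periodic/security/100.chksetuid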
 