"service" confused about various services?

I've had a problem for some time where various services would occasionally shut down for no reason I could discern. I never saw one shut down; rather, I would discover that at some point in the past, it had stopped. Log messages hadn't been very helpful; they typically just said something like "SIGTERM received, shutting down".

Today, while investigating a totally different issue, I happened to be watching a tail -f /var/log/daemon.log in one window as I issued a service athens stop in another. To my surprise, I saw that not only did athens shut down, but so did several other services (including but perhaps not limited to forgejo, vault and step_ca). This is reproducible. I am pretty sure this at least partially explains the seemingly random shutdowns I've been seeing: I stop one service, and that causes other services to stop too, which I'm not expecting or looking for and don't notice until possibly hours or days later. But... what, in turn, explains it?

Since discovering this, I've been trying to poke around in the rc stuff, but I'm not really familiar with it, so I'm wondering whether anybody has ideas, or hints about what I should look for. It seems to me like service is confused about what is what. Here's some example wackiness:

Code:
# service athens status; service forgejo status
athens is not running.
forgejo is not running.
#
# service athens start
Starting athens.
#
# service athens status
athens is running as pid 20141.
#
# service forgejo start
#
# service forgejo status
forgejo is running as pid 60023.
#

OK, all cool so far. Until I check athens again:

Code:
# service athens status
athens is running as pid 20141 59541.
#

Huh? After starting forgejo, service now says that athens is running as two separate PIDs? One of which (20141) is what it was running as earlier, but the other (59541) is... what?

Code:
# ps aux -ww | rg 59541
root     59541   0.0  0.0   13856   2200  -  Is   19:44       0:00.00 daemon: /usr/bin/env[60023] (daemon)
root     15378   0.0  0.0   18888   6432  6  R+   19:46       0:00.00 rg 59541
#

So 59541 is daemon, running env, I guess, and the 60023 in brackets looks like a PID? Let's check that:

Code:
# ps aux -ww | rg 60023
root     59541   0.0  0.0   13856   2200  -  Is   19:44       0:00.00 daemon: /usr/bin/env[60023] (daemon)
git      60023   0.0  0.3 1452368 173388  -  I    19:44       0:01.80 /usr/local/sbin/forgejo web
root     30023   0.0  0.0   19068   6560  6  R+   19:50       0:00.00 rg 60023
#

So 60023 is... forgejo? The strange new second PID for athens is the PID of daemon for forgejo?

Wait, let's check the same sort of thing for 20141:

Code:
# ps aux -ww | rg 20141
root     20141   0.0  0.0   13856   2208  -  Is   19:42       0:00.00 daemon: /usr/local/bin/athens[20898] (daemon)
root     36223   0.0  0.0   19068   6592  6  R+   19:51       0:00.00 rg 20141
#

It's daemon for athens, and it seems to have something to do with 20898:

Code:
# ps aux -www | rg 20898
root     20141   0.0  0.0   13856   2208  -  Is   19:42       0:00.00 daemon: /usr/local/bin/athens[20898] (daemon)
root     20898   0.0  0.1 1322380  37972  -  I    19:42       0:00.13 /usr/local/bin/athens -config_file /usr/local/etc/athens/athens.toml
root     43371   0.0  0.0   19068   6580  6  R+   19:53       0:00.00 rg 20898
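
Both supervisors follow the same pattern. As far as I can tell from daemon(8), the supervisor retitles itself as "daemon: <title>[<child PID>]", where the title defaults to the child's command (hence /usr/bin/env for forgejo), so the bracketed number is the PID of the supervised child. A small sketch that pulls that child PID out of a ps line like the ones above; daemon_child_pid is a hypothetical helper, not part of the base system:

```shell
#!/bin/sh
# Hypothetical helper: given one line of `ps aux -ww` output for a daemon(8)
# supervisor, print the supervised child's PID (the last [digits] group).
# Prints nothing if the line is not a "daemon: ..." process.
daemon_child_pid() {
    printf '%s\n' "$1" | sed -n 's/.*daemon: .*\[\([0-9][0-9]*\)\].*/\1/p'
}

line='root 59541 0.0 0.0 13856 2200 - Is 19:44 0:00.00 daemon: /usr/bin/env[60023] (daemon)'
daemon_child_pid "$line"   # prints 60023
```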

And what happens if I shut down athens?

Code:
# service forgejo status
forgejo is running as pid 60023.
#
# service athens status
athens is running as pid 20141 59541.
#
# service athens stop
Stopping athens.
Waiting for PIDS: 20141 59541.
#
# service athens status; service forgejo status
athens is not running.
forgejo is not running.
#
# wtf
 _______
( WTF?! )
 -------
        o   ^__^
         o  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
 ______
< Moo! >
 ------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
#

In summary, I am ignorant and confused, but this seems wacky to me. Any help would be appreciated greatly. Thanks in advance.
 
OK, after looking into it further, I have a theory. However, as I've said, I know very little about this stuff, and so I want to run it past you all here in case I'm just blatantly misunderstanding something, and also to ask what should be done about it if I'm correct:

In /etc/rc.subr, it seems to me that doing a status causes the system to look up the PID from a pidfile, using the function check_pidfile()... unless, of course, no pidfile is defined. In that case, it gets the PID via the function check_process() instead.
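A simplified sketch of that branch, assuming it works roughly as described: the name handed to either check defaults to ${command} when the rc.d script doesn't set ${procname}. This is an illustration, not the actual rc.subr code; pid_lookup_method is a made-up name, and /usr/sbin/daemon is assumed as the athens script's ${command}.

```shell
#!/bin/sh
# Hypothetical, simplified imitation of how rc.subr picks a PID lookup.
# It only prints which check would run; it does not look anything up.
pid_lookup_method() {
    # procname falls back to command when the rc.d script doesn't set it
    _procname=${procname:-${command}}
    if [ -n "${pidfile}" ]; then
        echo "check_pidfile ${pidfile} ${_procname}"
    else
        echo "check_process ${_procname}"
    fi
}

command=/usr/sbin/daemon   # assumed value from the athens rc.d script
pidfile=                   # no pidfile set, as in the athens script
pid_lookup_method          # prints: check_process /usr/sbin/daemon
```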

And /usr/local/etc/rc.d/athens doesn't set ${pidfile} (at least not directly). So when you do a service athens status or a service athens stop, it finds the PID by looking for a matching process.

I don't know enough about the details of the whole rc system, but I'm guessing "the process" is identified in some way based on the rc.d script. In /usr/local/etc/rc.d/athens, the ${command} variable doesn't point to athens; it points to daemon (which is passed arguments that invoke athens).

I haven't checked them all, but at least some of the other services that have been affected by this (like forgejo) also use daemon. So my guess is that service athens status gets a list of the PIDs of everything that uses daemon, and service athens stop stops them all.

Looking closer at check_process, I see that it does in fact return a list of PIDs, not (necessarily) just a single PID, so that seems to fit my theory.
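To make the theory concrete, here is a rough imitation of name-based matching. It assumes the matching compares only the first word of each process's command line (argv[0]) against the full path, the basename, and a few decorated forms such as "basename:", similar to the case pattern in rc.subr's _find_processes. Since every daemon(8) supervisor shows up as "daemon: ...", all of them match "daemon:". The helper find_pids_by_name is hypothetical, and the real code differs in detail.

```shell
#!/bin/sh
# Hypothetical, simplified imitation of check_process-style matching.
# Reads "PID COMMAND..." lines on stdin and prints matching PIDs on one line.
find_pids_by_name() {
    procname=$1                 # e.g. /usr/sbin/daemon
    bn=${procname##*/}          # basename, e.g. "daemon"
    pids=''
    # Only the first word of the command (argv[0]) is compared.
    while read -r pid arg0 rest; do
        case "$arg0" in
        "$procname"|"$bn"|"$bn:"|"($bn)"|"[$bn]")
            pids="$pids${pids:+ }$pid" ;;
        esac
    done
    echo "$pids"
}

# Sample input modeled on the ps output above (PID, then command line).
find_pids_by_name /usr/sbin/daemon <<'EOF'
20141 daemon: /usr/local/bin/athens[20898] (daemon)
20898 /usr/local/bin/athens -config_file /usr/local/etc/athens/athens.toml
59541 daemon: /usr/bin/env[60023] (daemon)
60023 /usr/local/sbin/forgejo web
EOF
# prints: 20141 59541
```

The output matches the "athens is running as pid 20141 59541" line from the status check above.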

So, do I seem to be making sense with this idea? If so, what should be done about it?

It seems like the obvious solution (at least in a "just get it working" sense) is to make /usr/local/etc/rc.d/athens use a pidfile. But (again, keep in mind I'm not really familiar with any of this, so forgive me if this is dumb) it seems like perhaps daemon should somehow be treated specially by the whole rc system. Maybe, for example, service should refuse to handle things that use daemon but don't have pidfiles, or something like that.
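A minimal sketch of the pidfile approach, assuming the athens script launches daemon(8) directly as ${command}. The pidfile location is an assumption, and the script's existing daemon flags aren't shown here, so treat this as an outline rather than a drop-in replacement:

```shell
# Hypothetical excerpt for /usr/local/etc/rc.d/athens; the real script's
# variable values and daemon(8) flags may differ.
pidfile="/var/run/athens.pid"      # assumed location for the pidfile
command="/usr/sbin/daemon"
# procname defaults to ${command}, so check_pidfile will expect the PID
# stored in ${pidfile} to belong to a daemon(8) process.
# -P writes the supervisor's (daemon's) own PID to ${pidfile}; keep any
# other flags the existing script already passes to daemon.
command_args="-P ${pidfile} /usr/local/bin/athens -config_file /usr/local/etc/athens/athens.toml"
```

With a pidfile set, status and stop should act only on the PID recorded in that file rather than on every process titled "daemon:". daemon(8) also has a lowercase -p, which records the child's PID instead; in that case procname would need to name the athens binary for check_pidfile's name check to pass.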
 