Auto restart a process on crash

Hello,

I have a problem with a process that dies so I need some script or something that will check to see if the process is running and if not, start the process.

Can someone help please? Thanks and sorry for my English.
 
To see whether a process is running, you can use several commands, such as ps(1) and top(1). grep(1) is also very useful:

Code:
$ ps
$ ps ax |grep process_name
$ top -S
etc.
The output is similar, but top(1) updates its display automatically every two seconds. Check whether the process you are looking for is listed; if it is not, start it.

Writing scripts is not my strong point, but I have put together something like this. Feel free to improve the example.
Code:
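#!/bin/sh

# Name to look for in the ps(1) output (no spaces around '=' in sh).
process='firefox-bin'

# grep -v grep drops the grep command itself from the matches.
if ps ax | grep -v grep | grep "$process"
then
    echo "$process is alive."
else
    echo "$process is dead, but will be launched."
    /usr/local/bin/firefox3
fi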
In general, the script structure comes down to a few steps:

Code:
if process is running
then
    do nothing
else
    start process
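
The steps above can also be sketched with a pidfile instead of parsing ps(1) output. This is only a minimal sketch under assumptions that are not from this thread: the function name ensure_running and the pidfile path are hypothetical, and the command is assumed to stay in the foreground (it must not daemonize itself), so that $! really is its PID. kill -0 sends no signal; it only tests whether the PID can be signalled.

```sh
#!/bin/sh
# ensure_running PIDFILE command [args...]
# Hypothetical helper: starts the command only if the PID stored in
# PIDFILE does not belong to a live process.
ensure_running() {
    pidfile=$1
    shift

    # kill -0 delivers no signal; it only checks that the PID exists
    # and that we are allowed to signal it.
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "alive"
        return 0
    fi

    # Not running (or no pidfile yet): start it in the background,
    # detach its output, and record the new PID.
    "$@" >/dev/null 2>&1 &
    echo $! > "$pidfile"
    echo "started"
}
```

Something like `ensure_running /tmp/firefox.pid /usr/local/bin/firefox3` could then be run periodically, for example from cron(8). One caveat: a stale pidfile may contain a PID that has since been reused by an unrelated process, so this check is weaker than a real supervisor.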
 
A script that repeatedly checks the output of top, ps, or the like will consume some CPU time. IMHO, monitoring and auto-restarting should be done by some sort of guard process that launches the monitored process as its child and respawns the child when it dies. On Mac OS X 10.4-10.6, launchd is used for this, and Mac OS X Server 10.2-10.3 had a quite sophisticated watchdogd that could do this.

I am new to FreeBSD, and unfortunately I am not aware of a FreeBSD utility that serves this purpose. There surely is one, and I would be interested to learn about it.

Anyway, here is a (working) proof of concept written in C:

Code:
//pguard.c -- process guard
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
   pid_t pid;
   int   statloc;
   
   do
   {
      if ((pid = fork()) == 0)
         goto launch_child;
      
      if (pid < 0)
         return 1;

      if (pid > 0)
         wait(&statloc);
   }
   while (1);

launch_child:
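   /* argv[1] is the command to guard; &argv[1] is its argument vector */
   execv(argv[1], &argv[1]);
   return 1;   /* reached only if execv() failed */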
}

This would be compiled by:
cc pguard.c -o pguard

Usage:
pguard command args

For example, I tested it for maintaining an SSH tunnel for MySQL database replication from one server to another.

pguard /usr/bin/ssh -N -L4306:127.0.0.1:3306 tunnel@example.com > /dev/null &

Of course, authentication has to be done with RSA keys. Note also that ssh is not run in -f (background) mode. If the tunnel dies for some reason, pguard respawns it.

Again, this is only a proof of concept. It is missing error handling and more sophisticated features such as running in the background, guarding more than one process, scheduling capabilities, etc. So, again, I would be very interested to learn about such a beast on FreeBSD.

Best regards

Rolf
 
I'm just wondering about something here. Wouldn't starting this process in a while (true) loop be just as useful? Whenever the process ends, it just gets started again, so no other application has to check whether it is still running.

Something like this:
Code:
#!/bin/sh

while true; do
    /usr/bin/firefox3
done
 
If the process forks into the background, you will Denial-of-Service yourself by continuously starting up new firefox processes.

To test this, just run the process from the command-line. If you get returned to your prompt while the process is running, then your while loop will kill your system.
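
One way to protect a respawn loop against this is to measure how long the command stayed in the foreground and to give up when it returns too quickly several times in a row. The following is a rough sketch, not a finished tool: respawn_guarded and its two thresholds are made-up names for illustration, and date +%s is assumed to be available (it is on FreeBSD and on GNU systems).

```sh
#!/bin/sh
# respawn_guarded MIN_RUNTIME MAX_FAST_EXITS command [args...]
# Hypothetical helper: respawns the command forever, but stops when it
# returns in less than MIN_RUNTIME seconds MAX_FAST_EXITS times in a row,
# which is what a self-daemonizing (or instantly failing) command does.
respawn_guarded() {
    min_runtime=$1
    max_fast=$2
    shift 2
    fast=0

    while [ "$fast" -lt "$max_fast" ]; do
        start=$(date +%s)
        "$@"
        end=$(date +%s)

        if [ $((end - start)) -lt "$min_runtime" ]; then
            fast=$((fast + 1))   # came back suspiciously quickly
        else
            fast=0               # ran long enough, reset the counter
        fi
    done

    echo "giving up: '$*' returned within ${min_runtime}s $fast times in a row" >&2
    return 1
}
```

With `respawn_guarded 5 3 /usr/bin/firefox3`, a program that forks into the background would be started at most three times instead of without limit. The thresholds are arbitrary and would need tuning per program.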
 
Take a look: fscd -- service state monitoring daemon

You may want to take a look at fscd(8). From the FreeBSD Status Report - 4Q/2010:

FreeBSD Services Control (fsc)

Contact: Tom Rhodes <trhodes@FreeBSD.org>

FreeBSD Services Control is a mix of binaries which integrate into the
rc.d system and provide for service (daemon) monitoring. It knows about
signals, pidfiles, and uses very little resources.

The fscd utilities will be set up as a port and, hopefully, dropped
into the ports collection in the coming weeks. This will allow easier
testing by everyone and it should make migration into -CURRENT much
easier.

Here's a link to the proposed port (I assume anyway):

http://people.freebsd.org/~trhodes/fsc/fsc-port.tar
 
phoenix said:
If the process forks into the background, you will Denial-of-Service yourself by continuously starting up new firefox processes.

I guess this is one of the reasons why many daemons have a command-line switch that keeps them from daemonizing, for example:

Code:
httpd -D FOREGROUND
smbd -F
sshd -D
ntpd -n
afpd -d

Others daemonize only when a switch is set:

Code:
ftpd -D
ssh -f

kyentei's script can be used in any of the above cases, and yes, it will auto-restart the daemon when it crashes for some reason.

Best regards

Rolf
 
sysutils/fsc seems to use kqueue(2) to get notified of terminated processes.

But here is my question: can kqueue(2) deliver an EVFILT_PROC event for a crashed process? I have just written a piece of software of my own that does not get a notification for a crashed process. It only receives EVFILT_PROC events for processes that terminate normally. What am I missing here?

EDIT: I will post a piece of code tomorrow that will demonstrate what I mean. However, if it is true that kqueue(2) can get an event for a process that has crashed (not terminated normally), then I could finally start working on a parallel-startup and fifo-activation providing service manager ...
 
Okay, problem solved. It seems that kqueue() was broken in 10.1 for a brief moment. I am seriously sure about this, because when I tested just now, a crashed process did generate an EVFILT_PROC event. Strange, really strange.

Anyway, carry on.
 
I gotta get back to designing the service executive. I need something that works on FreeBSD, even with the rc.d scripts as service utilities, and it needs to support defining watchdogs (e.g. programs that will be executed after a specific service goes down). And the whole thing needs to have an extremely small footprint and depend only on base system facilities and libraries.
 
A cool short little script I sometimes use as a temporary solution goes like this:
Code:
until script; do
  echo 'crashed'
  sleep 1
done;

The script will keep restarting itself until it exits cleanly.
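
Here "exits cleanly" means exit status 0. As a hedged variation on the same loop, the sketch below also looks at the exit status to tell a plain failure from death by signal (shells report signal N as status 128 + N) and stops after a number of failures. The name restart_until_clean and its two parameters are invented for this example.

```sh
#!/bin/sh
# restart_until_clean MAX_FAILURES DELAY command [args...]
# Hypothetical helper: reruns the command until it exits with status 0,
# giving up after MAX_FAILURES failed runs.
restart_until_clean() {
    max_failures=$1
    delay=$2
    shift 2
    failures=0

    until "$@"; do
        status=$?   # exit status of the run that just failed

        if [ "$status" -gt 128 ]; then
            echo "killed by signal $((status - 128))" >&2
        else
            echo "exited with status $status" >&2
        fi

        failures=$((failures + 1))
        if [ "$failures" -ge "$max_failures" ]; then
            echo "giving up after $failures failed runs" >&2
            return "$status"
        fi

        sleep "$delay"
    done

    return 0
}
```

For example, `restart_until_clean 10 1 script` would behave like the loop above but stop after ten failed runs and return the last exit status.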
 