How to run services at high real-time priority?

GrandAdmiralThrawn · Jan 28, 2021

I have just found rtprio(1), and it does exactly what I want: by running something like "# rtprio 0 -n" on specific system services, with n being the numeric PID, I can make them execute and return data to a front end blazingly fast, even on a machine whose CPUs are (intentionally) heavily loaded.
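For reference, "rtprio 0 -n" boils down to a single rtprio(2) system call. Below is a minimal C sketch of the same operation, assuming FreeBSD's <sys/rtprio.h>. The helper name make_realtime and the ENOSYS fallback on other systems are illustrative additions, not part of rtprio(1). Like the command, the call needs superuser privileges.

```c
#include <errno.h>
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/rtprio.h>
#endif

/* Numeric rtprio levels: 0 is the most urgent, 31 (RTP_PRIO_MAX on
 * FreeBSD) the least urgent. */
#define RT_LEVEL_MIN 0
#define RT_LEVEL_MAX 31

/* Hypothetical helper: put an already running process into the realtime
 * class, like "rtprio <level> -<pid>". Returns 0 on success, -1 with errno
 * set on failure. */
int make_realtime(pid_t pid, int level)
{
    if (pid <= 0 || level < RT_LEVEL_MIN || level > RT_LEVEL_MAX) {
        errno = EINVAL;
        return -1;
    }
#ifdef __FreeBSD__
    struct rtprio rtp;
    rtp.type = RTP_PRIO_REALTIME;
    rtp.prio = level;
    return rtprio(RTP_SET, pid, &rtp);
#else
    errno = ENOSYS;   /* rtprio(2) is FreeBSD-specific */
    return -1;
#endif
}
```

For example, make_realtime(1234, 0) corresponds to "# rtprio 0 -1234".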

In my case, the services would be MySQL, nginx and php-fpm. I just set them to real-time priority 0 and wow, the results are amazing: they are so much more responsive while the machine is under massive load. And while a warning bell at the back of my head tells me that what I'm doing might be dangerous, it shouldn't matter: I'm the only user on that machine. If I mess up, I know who's responsible, and I'll be the only one who suffers.

My question is: what is the best practice for configuring specific services to run at a real-time priority of my choice? I'm guessing that modifying rc scripts or service-specific startup scripts is not the right way?

How would I do that properly?

Thank you!

PMc · Jan 29, 2021

Well, it depends on what you want to do. For my part, I don't really see the point in putting a database under rtprio: that is payload, so I would rather plan for the resources it needs (using racct/rctl and cpuset).
What I used rtprio for was making the network on a very old machine fast enough for VoIP, that is, running ppp, natd, etc. under rtprio.

The danger with rtprio is that if these processes go into an endless loop, there is no way to stop them anymore. With SMP, the danger only becomes real once there are as many endless loops as there are CPUs. Anyway, this is what I came up with:
1. Touch a file every minute from cron, at normal priority.
2. Run a watchdog daemon, at a higher rtprio than all the processes it supervises, that checks whether this file has become too old. If it has, the watchdog revokes the rtprio of the supervised processes (and terminates).
3. Run all the other processes at a lower rtprio, and provide a list of their PID files so they can easily be engaged and disengaged.
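Steps 1 and 2 hinge on one comparison: has the normal-priority cron job touched the beacon file since the watchdog last looked? A minimal C sketch of that check, mirroring getmtime() and the main loop of rtwatchd.c; beacon_mtime and beacon_is_stale are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Modification time of the beacon file, or 0 if it cannot be stat()ed,
 * so a missing beacon always counts as stale. */
time_t beacon_mtime(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_mtime : 0;
}

/* The cron job (normal priority) should have touched the beacon after the
 * previous check. If it has not, timesharing processes are starving and the
 * watchdog should revoke the supervised processes' rtprio. */
bool beacon_is_stale(time_t beacon, time_t previous_check)
{
    return beacon < previous_check;
}
```

rtwatchd checks every 135 seconds (LOOPTIME) against a once-a-minute cron job, so a single delayed cron run does not trigger a false alarm; only a beacon that has not moved for a whole check interval does.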

Code:

/* $Id: rtwatchd.c,v 1.5 2019/12/24 20:22:04 admin Exp $
*
* Revokes the rtprio of processes when the system freezes.
*
* Prios:    31 = kernel pri 79 = slowest
*            0 = kernel pri 48 = fastest
* Allocation: 31 - 21    must not slow down packet handling/VoIP
*             21 - 18    packet handling/VoIP
*             18 - 10    take precedence over packet handling/VoIP, but supervised
*             9          the watchdog
*             8 - 0      ignored by the watchdog
*/

# include <sys/types.h>
# include <sys/rtprio.h>
# include <sys/sysctl.h>
# include <sys/user.h>
# include <sys/priority.h>
# include <sys/stat.h>
# include <sys/mman.h>
# include <unistd.h>
# include <stdio.h>
# include <stdlib.h>
# include <malloc_np.h>
# include <string.h>
# include <syslog.h>
# include <errno.h>   /* errno comes from the header; don't declare it extern */
# include <time.h>    /* time() */

# define THE_MIB "kern.proc.all"
# define RESERVE 250000
# define MYPRIO 9
# define LOOPTIME 135

char *progname;

void usage( void )
{
  fprintf( stderr, "Usage: %s file-to-watch\n", progname);
  exit(1);
}

time_t getmtime( char *file )
{
  struct stat attr;
  if(stat(file, &attr) != 0)
    attr.st_mtime = 0;
  return(attr.st_mtime);
}

void slowdown(int active)
{
  size_t size;
  int count;
  struct kinfo_proc *buffer, *p;
  struct rtprocs {
    pid_t pid;
    char *name;
    struct rtprocs *next;
  } *pilist = NULL, **b = &pilist, *c, *d;   
  struct rtprio prio;
 
  if(sysctlbyname(THE_MIB, NULL, &size, NULL, 0) != 0) {
    syslog(LOG_ALERT,
           "failed to access %s: %s\n", THE_MIB, strerror(errno));
    exit(1);
  }
  size += RESERVE;  /* allocate a bit extra, the process table may grow meanwhile */
  if((buffer = malloc(size)) == NULL) {
    syslog(LOG_ALERT, "malloc() failure %d, exiting\n", errno);
    exit(1);
  }
  if(sysctlbyname(THE_MIB, buffer, &size, NULL, 0) != 0) {
    syslog(LOG_ALERT, "failure %d retrieving procs, exiting\n", errno);
    exit(1);
  }

  count = size / sizeof(struct kinfo_proc);
  p = buffer;
  while(count-- != 0) {
    if(p->ki_pri.pri_class == PRI_REALTIME &&
       p->ki_pri.pri_user > PRI_MIN_REALTIME + MYPRIO) {
      struct rtprocs *a = pilist;
      int schonda = 0;

      /* The list contains threads; we want each PID only once */
      while(a != NULL) {
        if(a->pid == p->ki_pid)
          schonda = 1;
        a = a->next;
      }
      if (schonda == 0) {
        if((*b = malloc(sizeof(struct rtprocs))) == NULL) {
          syslog(LOG_ALERT, "malloc() failure %d, exiting\n", errno);
          exit(1);
        }
        /* tail append via pointer-to-pointer; looks odd as a linked list, but works */
        (*b)->pid = p->ki_pid;
        (*b)->name = p->ki_comm;
        (*b)->next = NULL;
        b = &((*b)->next);
      }
    }
    p++;
  } 
  prio.type = RTP_PRIO_NORMAL;
  prio.prio = 0;  /* No idea what should go here */
  c = pilist;
  while(c != NULL) {
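    if(active) {
      if(rtprio(RTP_SET, c->pid, &prio) != 0)
        syslog(LOG_ERR, "failed to revoke rtprio for pid=%d (%s): %s\n",
               c->pid, c->name, strerror(errno));
      else
        syslog(LOG_CRIT, "revoking rtprio for pid=%d (%s)\n", c->pid, c->name);
    }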
    d = c->next;
    free(c);
    c = d;
  }
  free(buffer);
}

int main (int argc, char *argv[])
{
  struct rtprio prio;
  char *filen;
  time_t last;
 
  progname = argv[0];
  if(argc != 2 || access(argv[1], F_OK) != 0)
    usage();
  filen = argv[1];

  /* Switch to realtime priority */
  prio.type = RTP_PRIO_REALTIME;
  prio.prio = MYPRIO;
  if(rtprio(RTP_SET, 0, &prio) != 0) {
    syslog(LOG_ALERT, "failed to set realtime priority: %s\n",
             strerror(errno));
    exit(1);
  }

  /* Dry run: walk the process table once without revoking anything */
  slowdown(0);

  /* Report actions to /var/log/messages */
  openlog("rtwatchd", LOG_CONS, LOG_DAEMON);

  /* Only relevant on swap exhaustion (keeps us from being killed); should never happen */
  if(madvise(0, 0, MADV_PROTECT) != 0)
    syslog(LOG_ALERT, "failure %d to madvise\n", errno);

  /* This is more interesting, but conflicts with ZFS wired memory */
  if(mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
    syslog(LOG_ALERT, "failure %d to mlockall\n", errno);
  }
  time(&last);
  while(1) {
    sleep(LOOPTIME);
    if(getmtime(filen) < last) {
      slowdown(1);
      exit(0);
    }
    time(&last);
  }
}

This one compiles to /ext/sbin/rtwatchd.
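Note that the numbering is inverted: a lower rtprio level means a more urgent process, and the header comment maps level n to kernel priority 48 + n. A small C sketch restating that mapping and the test slowdown() applies (only realtime processes less urgent than the watchdog, i.e. levels 10 to 31, get their rtprio revoked). The constant and function names are hypothetical, and the base of 48 is taken from the comment rather than read from <sys/priority.h>.

```c
#include <stdbool.h>

/* Kernel priority of rtprio level 0, per "0 = kernel pri 48" in the
 * header comment (PRI_MIN_REALTIME on FreeBSD). */
#define REALTIME_BASE 48
/* rtprio level the watchdog itself runs at (MYPRIO in rtwatchd.c). */
#define WATCHDOG_LEVEL 9

/* Kernel priority for an rtprio level: 0 -> 48 (fastest), 31 -> 79 (slowest). */
int kernel_priority(int rtprio_level)
{
    return REALTIME_BASE + rtprio_level;
}

/* slowdown() collects only realtime processes whose kernel priority is
 * numerically greater (i.e. less urgent) than the watchdog's own. */
bool watched_by_watchdog(int rtprio_level)
{
    return kernel_priority(rtprio_level) > kernel_priority(WATCHDOG_LEVEL);
}
```

With rtwatchd_rtprio="20" from rc.conf.local, the network daemons sit at kernel priority 68 and are therefore supervised.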

And here is /ext/etc/rc.d/rtwatchd:

Code:

#!/bin/sh

# PROVIDE: rtwatchd
# REQUIRE: cron jail
# KEYWORD: shutdown

. /etc/rc.subr

name="rtwatchd"
rcvar=rtwatchd_enable

# read configuration and set defaults
load_rc_config $name
: ${rtwatchd_enable:="NO"}
: ${rtwatchd_beaconfile:="/dev/null"}
: ${rtwatchd_pidfiles:=""}
: ${rtwatchd_rtprio:="20"}

pidfile="/var/run/${name}.pid"
procname="/ext/sbin/${name}"
command="/usr/sbin/daemon"
command_args="-S -p ${pidfile} ${procname} ${rtwatchd_beaconfile}"
start_postcmd="${name}_rtprio on"
stop_precmd="${name}_rtprio off"

rtwatchd_rtprio()
{
        if test "$rtwatchd_pidfiles"; then
                if test $1 = on; then
                        echo "Setting network daemons to realtime."
                fi
                if test $1 = off; then
                        echo "Leaving realtime mode."
                fi
                for p in $rtwatchd_pidfiles; do
                        if test -r "$p"; then
                                if test $1 = on; then
                                        rtprio "$rtwatchd_rtprio" -`cat $p`
                                fi
                                if test $1 = off; then
                                        rtprio -t -`cat $p`
                                fi
                        fi
                done
        fi
}

/usr/bin/touch "${rtwatchd_beaconfile}"
run_rc_command "$1"

And this is in /etc/rc.conf.local:

Code:

rtwatchd_enable="YES"
rtwatchd_beaconfile="/var/run/timeshare.alive"
rtwatchd_rtprio="20"
rtwatchd_pidfiles="/j/conn/var/run/tun0.pid /j/conn/var/run/natd.pid \
        /var/run/ntpd.pid"
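The rc script hands each file listed in rtwatchd_pidfiles to rtprio(1) via "cat $p" without checking it, so an empty or corrupted PID file produces a bogus argument. If the same step were done in C, for instance inside the watchdog, a stricter reader could look like this sketch; read_pidfile is a hypothetical helper, and the upper bound assumes FreeBSD's PID_MAX of 99999.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read the PID stored in a daemon's PID file, like "cat $p" in the rc
 * script, but reject empty, non-numeric or out-of-range content.
 * Returns the PID, or -1 with errno set. */
pid_t read_pidfile(const char *path)
{
    char buf[32];
    char *end;
    long pid;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;                        /* errno set by fopen() */
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);                       /* empty or unreadable file */
        errno = EINVAL;
        return -1;
    }
    fclose(fp);

    errno = 0;
    pid = strtol(buf, &end, 10);
    /* Accept digits optionally followed by a newline, within 1..99999
     * (assumed FreeBSD PID_MAX). */
    if (end == buf || (*end != '\0' && *end != '\n') || errno != 0 ||
        pid <= 0 || pid > 99999) {
        errno = EINVAL;
        return -1;
    }
    return (pid_t)pid;
}
```

Parsing cannot detect a stale PID file whose daemon has died and whose PID has been reused by another process.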

GrandAdmiralThrawn · Jan 29, 2021

Thank you very much, especially for providing the code!

Yesterday I counted the processes I had given real-time priority: 14 in total, on a system with 64 logical CPUs.

Actually, I haven't properly tested which component slows everything down so much when the machine is under high load, for instance whether it's the SQL server or PHP. I should test this more thoroughly.

What I want to do (or think I want to do) is to have certain services start up at certain RT priorities at boot time (or service start time) without manual interaction. Other, lower priority processes should only give up CPU resources to those services when needed. But when needed, they should let those services run uninterrupted. Your watchdog can probably make this a lot safer.
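Instead of changing a running PID, rtprio(1) can also start a command at a given level ("rtprio 10 command args"), which is one way to have a service come up at realtime priority without manual interaction. Assuming, as the rtprio(1) manual page describes, that realtime priority is inherited across fork() and exec(), worker processes spawned by such a service (php-fpm or nginx workers, for example) would start at the same level. A minimal C sketch of what such a launcher does; exec_realtime is a hypothetical name, and the call needs superuser privileges.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/rtprio.h>
#endif

/* Put the calling process into the realtime class at `level`
 * (0 = most urgent, 31 = least), then replace it with argv[0].
 * Assumption: the realtime class survives exec(), as rtprio(1) describes,
 * so the started program and its children keep it.
 * Returns only on failure, with -1 and errno set. */
int exec_realtime(int level, char *const argv[])
{
    if (level < 0 || level > 31 || argv == NULL || argv[0] == NULL) {
        errno = EINVAL;
        return -1;
    }
#ifdef __FreeBSD__
    struct rtprio rtp;
    rtp.type = RTP_PRIO_REALTIME;
    rtp.prio = level;
    if (rtprio(RTP_SET, 0, &rtp) != 0)   /* pid 0 means "this process" */
        return -1;
    execvp(argv[0], argv);               /* only returns on error */
    return -1;
#else
    errno = ENOSYS;                      /* rtprio(2) is FreeBSD-specific */
    return -1;
#endif
}
```

A wrapper main() would call exec_realtime(10, argv + 1) and report the error if it returns.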

I tried renicing first, but that simply doesn't do the trick, even with my compute processes at nice level 20 and the service programs at -20.

As to why the machine experiences such high loads: it's more convenient to just fire off all those compute processes at once and get notified via mail when they terminate. Fire & forget. Also, the load fluctuates over time, so if I start fewer compute processes, the CPU will not be fully loaded at all times, wasting precious CPU time. Hence, I overload it a bit, ensuring that it never goes below 100% while running my compute jobs. I guess too much is bad too (context switching?), but yeah. It happens.

What I don't want to do: Assign an entire CPU core to just those services. Because most of the time they'd be almost idle, just collecting bits of data over time. But when I go ahead and request a large part of the data via the web, I want it to load as fast as the hardware can do it with no other process getting in the way.

PS: I'm still rather new to FreeBSD; I still have to read up on rctl and cpuset.

Again, thanks!

Edit: I have re-inspected the three services, nginx, php-fpm and MySQL. MySQL has used far more CPU than the other two, roughly 32 times as much as php-fpm and nginx combined. So I assume that's the process that needs top priority here, since it's probably the one I'm waiting for most in high-load scenarios when refreshing the corresponding web page.
