HOWTO: monitor load average of your servers

I have often run into the problem that when you need to keep a constant eye on the load average of your servers, staring at the console is not very comfortable, especially when you need to know what was going on while you were asleep ;). So I wrote a very basic script for monitoring purposes. When the load average is higher than it should be, an email notification is sent to the specified address. Feel free to modify it ;).
Code:
#!/bin/sh

MAX_LAVG=1                #set the MAX load average value
EMAIL=your@email.here     #set the email to send the notification
INTERVAL=30               #set the time interval in seconds to check the load average value

while sleep $INTERVAL
do

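   # Take the 1-minute value after "load average(s):" instead of a fixed field,
   # because the field position shifts with the length of the uptime string.
   LAVG=$(uptime | sed 's/.*load averages*: *//' | awk '{gsub(",",""); print $1}')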

   MLAVG=${LAVG%%.*}      # integer part of the load average
   if [ "$MLAVG" -ge "$MAX_LAVG" ]; then

      SUBJECT="$(hostname) LOAD AVERAGE ALERT $LAVG (>=$MAX_LAVG)"
      EMAILMESSAGE="WARNING! Load average is $LAVG and is at or above $MAX_LAVG on $(date)"

      echo "$EMAILMESSAGE" | mail -s "$SUBJECT" "$EMAIL"
   fi
done
 
Really bad idea: server load may spike for a few seconds or minutes, and the script will send an email. You may want to modify the script so that it alerts only if the 15-minute average is above 5 or something like that. A better solution is monit or a similar tool.
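
For what it's worth, a 15-minute check could look roughly like the sketch below. It is only an illustration, not a tested replacement for the script: the helper names lavg15 and above and the variable MAX_LAVG15 are made up here, the threshold of 5 is just the example value from this post, and it assumes uptime prints the three averages after "load average:" or "load averages:" with a dot as the decimal separator.

```shell
#!/bin/sh

# Hypothetical helper: print the 15-minute load average found in the
# uptime-style line passed as $1.
lavg15() {
   # Drop everything up to "load average(s):", strip the commas, and take
   # the third remaining value (1-, 5- and 15-minute averages, in that order).
   echo "$1" | sed 's/.*load averages*: *//' | awk '{gsub(",",""); print $3}'
}

# Hypothetical helper: exit status 0 if value $1 is greater than limit $2.
# awk does the comparison because sh arithmetic is integer-only.
above() {
   awk -v v="$1" -v max="$2" 'BEGIN { exit !(v + 0 > max + 0) }'
}

MAX_LAVG15=5    # example threshold, not a recommendation

LAVG15=$(lavg15 "$(uptime)")
if above "$LAVG15" "$MAX_LAVG15"; then
   echo "15-minute load average is $LAVG15 (>$MAX_LAVG15)"
fi
```

In the original loop, the echo would be replaced by the mail command.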
 
The 30-second INTERVAL is just an example; anyone can set their own value. I usually set a 600-second interval, so a spike lasting a few seconds or minutes will usually not trigger an email.
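
Another way to ignore short spikes, independent of the interval, would be to alert only after several high readings in a row. This is just a rough sketch of that idea: NEEDED and next_streak are names invented for the example, and the readings in the loop are made-up numbers standing in for the integer part of the load average that the script takes from uptime.

```shell
#!/bin/sh

MAX_LAVG=1     # same meaning as in the script above
NEEDED=3       # hypothetical: consecutive high readings required before alerting

# Hypothetical helper: given the current streak $1, the integer part of the
# load average $2 and the limit $3, print the new streak length.
next_streak() {
   if [ "$2" -ge "$3" ]; then
      echo $(($1 + 1))
   else
      echo 0
   fi
}

# Example run over five made-up readings; in the real loop each reading
# would come from uptime once per INTERVAL.
STREAK=0
for MLAVG in 2 0 1 3 2; do
   STREAK=$(next_streak "$STREAK" "$MLAVG" "$MAX_LAVG")
   if [ "$STREAK" -ge "$NEEDED" ]; then
      echo "alert: load at or above $MAX_LAVG for $STREAK checks in a row"
   fi
done
```

With these readings the streak resets at the second value, so the alert line is printed only once, at the last reading.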
 
Thanks for the replies

Thanks to all for the replies.
I'm using Nagios as well for some of my servers.

@vermaden
Will try bsdsar(1) when I have some spare time. Thanks.

I probably should have given this post a slightly different title, like:
"Quick script to monitor load average" :p
 