Enterprise-class reporting/monitoring with Graphite

Hey all,

I've been rather jealous of those very fancy reporting graphs[1] included in FreeNAS, which got me reading this[2] article that explains how to set that up manually. That information is priceless in times when your system just feels wrong, but you're not exactly sure why. Those graphs can show that and more.

[1]: http://doc.freenas.org/index.php/Reporting
[2]: http://www.flagword.net/2014/01/installing-and-configuring-graphite-with-collectd-on-freebsd/

The article was centred around NGINX, but as I'm more of an Apache kind of guy, that is what I'll be showing you here. So let's get to it!

Install and configure graphite, then start carbon (graphite backend)
Be mindful of carbon's storage retentions here! Read up on the examples:
http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-schemas-conf
Code:
if [ "$(grep -c carbon /etc/rc.conf 2>&1)" -eq "0" ]; then
  if [ "$(pkg_info | egrep -c '(graphite-web|carbon)' 2>&1)" -eq "0" ]; then
    cd /usr/ports/www/py-graphite-web
    make install distclean
    if [ $? -gt "0" ]; then
      echo '+-- Failed to install py-graphite-web. Aborting.'
      exit 1
    fi
    cat > /usr/local/etc/carbon/storage-schemas.conf << EOF
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[default]
pattern = .*
retentions = 15s:7d,1m:28d,60m:1y
EOF
    cat > /usr/local/etc/carbon/carbon.conf << EOF
[cache]
GRAPHITE_ROOT        = /usr/local/graphite
GRAPHITE_CONF_DIR    = /usr/local/etc/carbon
GRAPHITE_STORAGE_DIR = /usr/local/graphite/storage/
STORAGE_DIR          = /usr/local/graphite/storage/
LOCAL_DATA_DIR       = /usr/local/graphite/storage/whisper/
CONF_DIR             = /usr/local/etc/carbon
LOG_DIR              = /usr/local/graphite/storage/log/
PID_DIR              = /var/run
ENABLE_LOGROTATION = True
USER =
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 500
MAX_CREATES_PER_MINUTE = 50
LINE_RECEIVER_INTERFACE = 127.0.0.1
LINE_RECEIVER_PORT = 2003
ENABLE_UDP_LISTENER = False
UDP_RECEIVER_INTERFACE = 127.0.0.1
UDP_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 127.0.0.1
PICKLE_RECEIVER_PORT = 2004
LOG_LISTENER_CONNECTIONS = True
USE_INSECURE_UNPICKLER = False
CACHE_QUERY_INTERFACE = 127.0.0.1
CACHE_QUERY_PORT = 7002
USE_FLOW_CONTROL = True
LOG_UPDATES = False
LOG_CACHE_HITS = False
LOG_CACHE_QUEUE_SORTS = True
CACHE_WRITE_STRATEGY = sorted
WHISPER_AUTOFLUSH = False
WHISPER_FALLOCATE_CREATE = True
[relay]
LINE_RECEIVER_INTERFACE = 127.0.0.1
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 127.0.0.1
PICKLE_RECEIVER_PORT = 2014
LOG_LISTENER_CONNECTIONS = True
RELAY_METHOD = rules
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2004
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_QUEUE_SIZE = 10000
USE_FLOW_CONTROL = True
[aggregator]
LINE_RECEIVER_INTERFACE = 127.0.0.1
LINE_RECEIVER_PORT = 2023
PICKLE_RECEIVER_INTERFACE = 127.0.0.1
PICKLE_RECEIVER_PORT = 2024
LOG_LISTENER_CONNECTIONS = True
FORWARD_ALL = True
DESTINATIONS = 127.0.0.1:2004
REPLICATION_FACTOR = 1
MAX_QUEUE_SIZE = 10000
USE_FLOW_CONTROL = True
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_AGGREGATION_INTERVALS = 5
EOF
    echo 'carbon_enable="YES"' >> /etc/rc.conf
    service carbon start
    if [ $? -gt "0" ]; then
      echo '+-- Failed to start carbon service. Aborting.'
      exit 1
    fi
    while [ ! -d /usr/local/graphite ]; do
      sleep 1
    done
    mkdir -p /usr/local/graphite/storage/log/webapp /usr/local/graphite/storage/rrd
    for i in img js css html; do
      cp -r /usr/local/share/graphite-web/content/$i /usr/local/graphite/webapp/
    done
    cat > /usr/local/lib/python2.7/site-packages/graphite/local_settings.py << EOF
#TIME_ZONE = 'Europe/Stockholm'
GRAPHITE_ROOT = '/usr/local/graphite'
CONF_DIR = '/usr/local/etc/graphite'
STORAGE_DIR = '/usr/local/graphite/storage'
CONTENT_DIR = '/usr/local/graphite/webapp'
DASHBOARD_CONF = '/usr/local/etc/graphite/dashboard.conf'
GRAPHTEMPLATES_CONF = '/usr/local/etc/graphite/graphTemplates.conf'
WHISPER_DIR = '/usr/local/graphite/storage/whisper'
RRD_DIR = '/usr/local/graphite/storage/rrd'
LOG_DIR = '/usr/local/graphite/storage/log/webapp'
DATABASES = {
    'default': {
        'NAME': '/usr/local/graphite/storage/graphite.db',
        'ENGINE': 'django.db.backends.sqlite3',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': ''
    }
}
EOF
    cd /usr/local/lib/python2.7/site-packages/graphite
    echo 'no' | python manage.py syncdb
    if [ $? -gt "0" ]; then
      echo '+-- Failed to run "manage.py syncdb". Aborting.'
      exit 1
    fi
    cat > /usr/local/etc/graphite/graphite.wsgi << EOF
import os, sys
sys.path.append('/usr/local/graphite/webapp')
os.environ['DJANGO_SETTINGS_MODULE'] = 'graphite.settings'
import django
import django.core.handlers.wsgi
django.setup()
application = django.core.handlers.wsgi.WSGIHandler()
from graphite.logger import log
log.info("graphite.wsgi - pid %d - reloading search index" % os.getpid())
import graphite.metrics.search
EOF
    cat > /usr/local/etc/graphite/dashboard.conf << EOF
[ui]
default_graph_width = 400
default_graph_height = 250
automatic_variants = true
refresh_interval = 60
autocomplete_delay = 375
merge_hover_delay = 750
theme = default
[keyboard-shortcuts]
toggle_toolbar = ctrl-z
toggle_metrics_panel = ctrl-space
erase_all_graphs = alt-x
save_dashboard = alt-s
completer_add_metrics = alt-enter
completer_del_metrics = alt-backspace
give_completer_focus = shift-space
EOF
    cat > /usr/local/etc/graphite/graphTemplates.conf << EOF
[default]
background = black
foreground = white
majorLine = white
minorLine = grey
lineColors = blue,green,red,purple,brown,yellow,aqua,grey,magenta,pink,gold,rose
fontName = Sans
fontSize = 10
fontBold = False
fontItalic = False
[noc]
background = black
foreground = white
majorLine = white
minorLine = grey
lineColors = blue,green,red,yellow,purple,brown,aqua,grey,magenta,pink,gold,rose
fontName = Sans
fontSize = 10
fontBold = False
fontItalic = False
[plain]
background = white
foreground = black
minorLine = grey
majorLine = rose
[summary]
background = black
[alphas]
background = white
foreground = black
majorLine = grey
minorLine = rose
lineColors = 00ff00aa,ff000077,00337799
EOF
    chown -R www:www /usr/local/graphite
  else
    echo 'Package "graphite-web" or "carbon" already installed. Aborting.'
    exit 1
  fi
else
  echo 'Service "carbon" is already in "rc.conf". Aborting.'
  exit 1
fi
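
To get a feel for what a retentions line costs on disk, here is a rough sketch in plain sh. It assumes the whisper layout as I understand it (12 bytes per datapoint, a 16-byte file header and 12 bytes of metadata per archive) and a 365-day year. The function names are made up for this example and are not part of carbon.

```sh
#!/bin/sh
# Convert a whisper-style duration such as 15s, 1m, 60m, 7d or 1y to seconds.
# A bare number is taken as seconds.
to_seconds() {
  value=${1%[smhdwy]}
  unit=${1#"$value"}
  case "$unit" in
    ''|s) echo "$value" ;;
    m) echo $((value * 60)) ;;
    h) echo $((value * 3600)) ;;
    d) echo $((value * 86400)) ;;
    w) echo $((value * 604800)) ;;
    y) echo $((value * 31536000)) ;;
  esac
}

# Print the number of datapoints one archive definition (e.g. 15s:7d) holds.
archive_points() {
  precision=${1%%:*}
  retention=${1##*:}
  case "$retention" in
    *[smhdwy]) echo $(( $(to_seconds "$retention") / $(to_seconds "$precision") )) ;;
    *) echo "$retention" ;;  # a bare number is already a point count
  esac
}

# Estimate the size in bytes of one whisper file for a full retentions line.
# Assumption: 16-byte header + 12 bytes per archive + 12 bytes per datapoint.
whisper_size() {
  total=0
  count=0
  old_ifs=$IFS
  IFS=,
  for archive in $1; do
    total=$((total + $(archive_points "$archive")))
    count=$((count + 1))
  done
  IFS=$old_ifs
  echo $((16 + count * 12 + total * 12))
}

whisper_size "15s:7d,1m:28d,60m:1y"
whisper_size "60:90d"
```

With the [default] rule above that comes to roughly 1 MB per metric, so a host reporting a few hundred metrics will want a few hundred MB under storage/whisper.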

Install, configure and start uwsgi
The "-p 1" uwsgi flag you echo into rc.conf sets the number of worker processes, in this case one. Most servers nowadays have many cores, so if you have four cores, put in "-p 4" for better performance.
Code:
if [ "$(grep -c uwsgi /etc/rc.conf 2>&1)" -eq "0" ]; then
  if [ "$(pkg_info | grep -c uwsgi 2>&1)" -eq "0" ]; then
    cd /usr/ports/www/uwsgi
    make install distclean
    if [ $? -gt "0" ]; then
      echo '+-- Failed to install uwsgi. Aborting.'
      exit 1
    fi
    echo 'uwsgi_enable="YES"' >> /etc/rc.conf
    echo 'uwsgi_flags="-L -M -p 1 --socket /tmp/uwsgi.sock --gid 80 --uid 80 --python-path /usr/local/lib/python2.7/site-packages/ --chdir /usr/local/etc/graphite/ -w graphite"' >> /etc/rc.conf
    service uwsgi start
    if [ $? -gt "0" ]; then
      echo '+-- Failed to start uwsgi service. Aborting.'
      exit 1
    fi
  else
    echo 'Package "uwsgi" already installed. Aborting.'
    exit 1
  fi
else
  echo 'Service "uwsgi" is already in "rc.conf". Aborting.'
  exit 1
fi
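
If you'd rather not hard-code the worker count, something like the following could derive it from the number of CPUs. This is only a sketch: hw.ncpu is the FreeBSD sysctl for the CPU count, the getconf fallback is there for other systems, and cpu_count is a name I made up. It only prints the rc.conf line, it does not write it.

```sh
#!/bin/sh
# Print the number of CPUs, falling back to 1 if it cannot be determined.
cpu_count() {
  n=$(sysctl -n hw.ncpu 2>/dev/null) || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  case "$n" in
    ''|*[!0-9]*) n=1 ;;  # anything that is not a plain number
  esac
  echo "$n"
}

workers=$(cpu_count)

# Same flags as in the script above, with one uwsgi worker process per CPU.
echo "uwsgi_flags=\"-L -M -p ${workers} --socket /tmp/uwsgi.sock --gid 80 --uid 80 --python-path /usr/local/lib/python2.7/site-packages/ --chdir /usr/local/etc/graphite/ -w graphite\""
```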

Install, configure and start apache
Remember to set a HOST_NAME value here! Just like this:
# HOST_NAME="myhostname.foo.bar"
Code:
if [ -z "${HOST_NAME}" ]; then
  echo "You have to assign a HOST_NAME variable!"
  exit 1
fi

if [ "$(grep -c apache22 /etc/rc.conf 2>&1)" -eq "0" ]; then
  if [ "$(pkg_info | grep -c apache22 2>&1)" -eq "0" ]; then
    cd /usr/ports/www/apache22
    make install distclean
    if [ $? -gt "0" ]; then
      echo '+-- Failed to install apache22. Aborting.'
      exit 1
    fi
    cd /usr/ports/www/mod_wsgi3
    make install distclean
    if [ $? -gt "0" ]; then
      echo '+-- Failed to install mod_wsgi3. Aborting.'
      exit 1
    fi
    if [ ! -d /etc/pki/graphite ]; then
      mkdir -p /etc/pki/graphite
    fi
    openssl req -new -newkey rsa:4096 -days 3650 -nodes -x509 -subj "/CN=${HOST_NAME}" -keyout /etc/pki/graphite/${HOST_NAME}.key -out /etc/pki/graphite/${HOST_NAME}.cert
    cat > /usr/local/etc/apache22/Includes/graphite.conf << EOF
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^/?(.*) https://%{SERVER_NAME}/\$1 [R,L]

WSGISocketPrefix /tmp/wsgi

Listen 443
<VirtualHost *:443>

        SSLEngine on
        SSLProtocol -ALL +SSLv3 +TLSv1
        SSLCipherSuite ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP

        SSLCertificateFile        /etc/pki/graphite/${HOST_NAME}.cert
        SSLCertificateKeyFile     /etc/pki/graphite/${HOST_NAME}.key

        ServerName ${HOST_NAME}
        DocumentRoot "/usr/local/graphite/webapp/"
        ErrorLog /usr/local/graphite/storage/log/webapp/error.log
        CustomLog /usr/local/graphite/storage/log/webapp/access.log common
        WSGIDaemonProcess graphite processes=5 threads=5 inactivity-timeout=120 display-name=graphite
        WSGIProcessGroup graphite
        WSGIApplicationGroup %{GLOBAL}
        WSGIImportScript /usr/local/etc/graphite/graphite.wsgi process-group=graphite application-group=%{GLOBAL}
        WSGIScriptAlias / /usr/local/etc/graphite/graphite.wsgi
        Alias /content/ /usr/local/graphite/webapp/
        <Location "/content/">
                SetHandler None
        </Location>
        Alias /media/ /usr/local/lib/python2.7/site-packages/django
        <Location "/media/">
                SetHandler None
        </Location>
        Alias /static/ "/usr/local/lib/python2.7/site-packages/django/contrib/admin/static/"
        <Location "/static/">
                SetHandler None
        </Location>
        <Directory "/usr/local/lib/python2.7/site-packages/django/contrib/admin/static/">
                Order deny,allow
                Allow from all
        </Directory>
        <Directory /usr/local/etc/graphite/>
                Order deny,allow
                Allow from all
        </Directory>
        <Directory /usr/local/graphite/webapp/>
                Order deny,allow
                Allow from all
        </Directory>

        <Location "/">
            AuthType basic
            AuthName "Graphite"
            AuthBasicProvider ldap
            AuthLDAPBindDN "CN=someuser,OU=Users,DC=ad,DC=foo,DC=bar"
            AuthLDAPBindPassword VerySecretPassword
            AuthLDAPURL ldap://ad.foo.bar:3268/DC=ad,DC=foo,DC=bar?sAMAccountName?sub?(objectClass=*)
            AuthLDAPGroupAttributeIsDN off
            Require valid-user
        </Location>

</VirtualHost>
EOF
    echo 'apache22_enable="YES"' >> /etc/rc.conf
    service apache22 start
    if [ $? -gt "0" ]; then
      echo '+-- Failed to start apache service. Aborting.'
      exit 1
    fi
  else
    echo 'Package "apache22" already installed. Aborting.'
    exit 1
  fi
else
  echo 'Service "apache22" is already in "rc.conf". Aborting.'
  exit 1
fi

Lastly install, configure and start collectd
You need the HOST_NAME variable set here as well, in case you logged out/in or are doing this from another session:
# HOST_NAME="myhostname.foo.bar"
In collectd.conf, you have to specify the network interface you are using. I've already added the usual ones, but if your interface is missing, just add another line in there and restart collectd for it to show up. The same goes for hard drives or RAID devices.
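
To see which of your interfaces the list already covers, a quick check like this might help. It is a sketch only: PATTERNS holds a shortened, hypothetical subset of the prefixes from the config, iface_covered is an invented helper, and "ifconfig -l" is the FreeBSD way of listing interface names.

```sh
#!/bin/sh
# Hypothetical subset of the Interface patterns used in collectd.conf.
PATTERNS='^(vtnet|vlan|lagg|em|igb|ixgbe|bge|re)[0-9]+$'

# Succeeds if the interface name matches one of the patterns.
iface_covered() {
  printf '%s\n' "$1" | grep -Eq "$PATTERNS"
}

# Walk over the interfaces present on this machine (prints nothing if
# "ifconfig -l" is not supported here).
for ifname in $(ifconfig -l 2>/dev/null); do
  if iface_covered "$ifname"; then
    echo "covered: $ifname"
  else
    echo "missing: $ifname (add an Interface line if you want it graphed)"
  fi
done
```

Loopback (lo0) will show up as missing, which is expected since it is not in the list.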
Code:
if [ -z "${HOST_NAME}" ]; then
  echo "You have to assign a HOST_NAME variable!"
  exit 1
fi

if [ "$(grep -c collectd /etc/rc.conf 2>&1)" -eq "0" ]; then
  if [ "$(pkg_info | grep -c collectd5 2>&1)" -eq "0" ]; then
    cd /usr/ports/net-mgmt/collectd5
    make install distclean
    if [ $? -gt "0" ]; then
      echo '+-- Failed to install collectd5. Aborting.'
      exit 1
    fi
    ### I had this issue. You may not, but it is best to leave this in: the rrdtool plugin wouldn't work without it ###
    if [ "$(ldd /usr/local/lib/collectd/rrdtool.so | grep libpixman-1 | grep -c 'not found' 2>&1)" -gt "0" ]; then
      ln -s $(find /usr/local/lib/ | grep libpixman-1.so.) /usr/local/lib/$(ldd /usr/local/lib/collectd/rrdtool.so | grep libpixman-1 | grep 'not found' | awk '{print $1}')
    fi
    cat > /usr/local/etc/collectd.conf << EOF
LoadPlugin syslog
LoadPlugin cpu
LoadPlugin df
LoadPlugin disk
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin rrdtool
LoadPlugin write_graphite
LoadPlugin zfs_arc
<Plugin df>
	FSType "zfs"
</Plugin>
<Plugin disk>
        Disk "/^vtbd[0-9]+$/"
        Disk "/^[hs]d[a-f][0-9]?$/"
        Disk "/^d[a-f][0-9]+$/"
        IgnoreSelected false
</Plugin>
<Plugin interface>
        Interface "/^vtnet[0-9]+$/"
        Interface "/^vlan[0-9]+$/"
        Interface "/^lagg[0-9]+$/"
        Interface "/^bxe[0-9]+$/"
        Interface "/^de[0-9]+$/"
        Interface "/^em[0-9]+$/"
        Interface "/^igb[0-9]+$/"
        Interface "/^ixgbe[0-9]+$/"
        Interface "/^le[0-9]+$/"
        Interface "/^ti[0-9]+$/"
        Interface "/^txp[0-9]+$/"
        Interface "/^vx[0-9]+$/"
        Interface "/^miibus[0-9]+$/"
        Interface "/^ae[0-9]+$/"
        Interface "/^age[0-9]+$/"
        Interface "/^alc[0-9]+$/"
        Interface "/^ale[0-9]+$/"
        Interface "/^bce[0-9]+$/"
        Interface "/^bfe[0-9]+$/"
        Interface "/^bge[0-9]+$/"
        Interface "/^dc[0-9]+$/"
        Interface "/^et[0-9]+$/"
        Interface "/^fxp[0-9]+$/"
        Interface "/^jme[0-9]+$/"
        Interface "/^lge[0-9]+$/"
        Interface "/^msk[0-9]+$/"
        Interface "/^nfe[0-9]+$/"
        Interface "/^nge[0-9]+$/"
        Interface "/^nve[0-9]+$/"
        Interface "/^pcn[0-9]+$/"
        Interface "/^re[0-9]+$/"
        Interface "/^rl[0-9]+$/"
        Interface "/^sf[0-9]+$/"
        Interface "/^sge[0-9]+$/"
        Interface "/^sis[0-9]+$/"
        Interface "/^sk[0-9]+$/"
        Interface "/^ste[0-9]+$/"
        Interface "/^stge[0-9]+$/"
        Interface "/^tl[0-9]+$/"
        Interface "/^tx[0-9]+$/"
        Interface "/^vge[0-9]+$/"
        Interface "/^vr[0-9]+$/"
        Interface "/^wb[0-9]+$/"
        Interface "/^xl[0-9]+$/"
        Interface "/^cs[0-9]+$/"
        Interface "/^ed[0-9]+$/"
        Interface "/^ex[0-9]+$/"
        Interface "/^ep[0-9]+$/"
        Interface "/^fe[0-9]+$/"
        Interface "/^sn[0-9]+$/"
        Interface "/^xe[0-9]+$/"
        IgnoreSelected false
</Plugin>
<Plugin rrdtool>
	DataDir "/usr/local/graphite/storage/rrd"
	CreateFilesAsync false
	CacheTimeout 120
	CacheFlush   900
	WritesPerSecond 50
</Plugin>
<Plugin write_graphite>
  <Node "${HOST_NAME}">
    Host "127.0.0.1"
    Port "2003"
    Protocol "tcp"
    LogSendErrors true
    Prefix "collectd."
  </Node>
</Plugin>
EOF
    echo 'collectd_enable="YES"' >> /etc/rc.conf
    service collectd start
    if [ $? -gt "0" ]; then
      echo '+-- Failed to start collectd service. Aborting.'
      exit 1
    fi
  else
    echo 'Package "collectd5" already installed. Aborting.'
    exit 1
  fi
else
  echo 'Service "collectd" is already in "rc.conf". Aborting.'
  exit 1
fi

echo 'Graphite installation complete.'
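
Before waiting for collectd's first samples you can poke carbon by hand. The plaintext listener on port 2003 takes lines of the form "path value timestamp". The sketch below only builds such a line; carbon_line and the test.manual metric name are made up for illustration.

```sh
#!/bin/sh
# Build one line in carbon's plaintext format:
#   <metric path> <value> <unix timestamp>
# The timestamp defaults to "now" when it is not given.
carbon_line() {
  printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

carbon_line test.manual 42

# To actually send it to the listener configured in carbon.conf
# (assumes nc is available on the machine):
# carbon_line test.manual 42 | nc -w 1 127.0.0.1 2003
```

If it worked, a file for the metric should appear under /usr/local/graphite/storage/whisper/test/ shortly afterwards.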

You'll end up with the dashboard accessible from https://myhostname.foo.bar/dashboard, with no non-standard port, which would otherwise always be a source of misunderstanding, but that's just a matter of taste really. For Active Directory login you need a user that is used for LDAP searches, so make it very unprivileged, only able to read what it needs to read, since the password is written here in plain text. For smaller environments you can use "AuthBasicProvider file" instead, creating local Apache accounts with htpasswd; instructions can be found at http://httpd.apache.org/docs/current/howto/auth.html. The plain http URL http://myhostname.foo.bar/dashboard is automatically redirected to its https equivalent. For systems that are "out there" on the web, that is a must, in my opinion.
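
For the "AuthBasicProvider file" variant, the <Location "/"> block could look roughly like what this prints. Treat it as a sketch: the htpasswd path and the user name are placeholders I picked, and print_file_auth is not an existing tool.

```sh
#!/bin/sh
# Print a <Location "/"> block that authenticates against a local htpasswd file.
# $1 is the path to the password file (placeholder path used below).
print_file_auth() {
  cat << EOF
        <Location "/">
            AuthType basic
            AuthName "Graphite"
            AuthBasicProvider file
            AuthUserFile $1
            Require valid-user
        </Location>
EOF
}

print_file_auth /usr/local/etc/apache22/graphite.htpasswd

# Create the password file and a first user (prompts for the password):
# htpasswd -c /usr/local/etc/apache22/graphite.htpasswd someuser
```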

/Sebulon
 
Hi @Sebulon,

Thank you for sharing this post with us. Just a little bit of clarification for me. Are all the sections shell scripts? Do I need to copy the contents and run them as a ./graphite.sh script?

Thank you
Fred
 
fred974 said:
Hi @Sebulon,

Thank you for sharing this post with us.
Just a little bit of clarification for me...
Are all the sections shell scripts?
Do I need to copy the contents and run them as a ./graphite.sh script?

Thank you
Fred

Hey Fred!

Exactly, code doesn't lie :) Copy/paste the code into a script:
Code:
# ee script.sh
(paste and save)
# chmod +x script.sh
# ./script.sh
# > script.sh

/Sebulon
 
Thank you for the reply. How would you do the Apache bit in a jail? I am asking as my web server runs inside a FreeBSD jail.
 
This was one of the more interesting posts on this forum. I use FreeNAS (two servers) and, as of a few days ago, two new vanilla FreeBSD file servers. I am very familiar with collectd and SNMP daemons in general. Some people might find my approach to enterprise-class monitoring a little less elegant, but it is very pragmatic and quick. The SNMP daemon and collectd are already configured on FreeNAS. Setting them up on an OpenBSD or FreeBSD server is a five-minute job. On Linux it takes about ten minutes because lm_sensors is not integrated into SNMP by default. Speaking of which, the OpenBSD version of SNMP, which is fully integrated with OpenBSD's built-in sensor framework, is a pure gem.

I use Observium to collect that data. Observium is in ports, but because of the Observium developers' strong bias towards Ubuntu and Debian (which I generally don't use and try to stay as far away from as I can; when I have to use Linux I use only Red Hat), I deployed the Observium turnkey Linux appliance. The appliance doesn't come with the collectd server installed, but that is a matter of logging into the shell and typing apt-get install collectd. After uncommenting the network plugin, telling collectd which port to listen on, and adding one line
Code:
$config['collectd_dir']        = "/var/lib/collectd/rrd";
into /opt/observium/config.php, you will have a fully functional Observium with the collectd plugin. Adding a server (I don't use auto-discovery) is now just a matter of adding its DNS name to Observium.
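
For reference, the network plugin part might look something like what this prints. It is a sketch, not an exact copy of any real config: 25826 is collectd's default port, the 0.0.0.0 listen address and the server name are placeholders, and the two functions exist only for this example.

```sh
#!/bin/sh
# Server side (the Observium appliance): accept values from other collectd
# instances. $1 = listen address, $2 = port.
network_listen_block() {
  cat << EOF
LoadPlugin network
<Plugin network>
  Listen "$1" "$2"
</Plugin>
EOF
}

# Client side (each monitored host): send values to the collectd server.
# $1 = server name or address, $2 = port.
network_server_block() {
  cat << EOF
LoadPlugin network
<Plugin network>
  Server "$1" "$2"
</Plugin>
EOF
}

network_listen_block 0.0.0.0 25826
network_server_block observium.example.org 25826
```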

It took me about one hour to have a complete metric monitoring solution for about four dozen servers. I have not had a chance to configure rsyslog or syslog-ng on the Observium appliance, but that is on my to-do list.

Observium is in FreeBSD ports. I installed it, but given that it didn't pull in apache24 as a dependency and used /usr/local/www/observium, it seems very time-consuming to configure it on FreeBSD instead of the "correct" Ubuntu way under /opt/observium.
 
Thank you @Oko. I'll take a look at Observium. Will my system setup be a problem? Having the web in a jail, I mean?
 
fred974 said:
Thank you @Oko
I'll take a look at Observium.
Will my system setup be a problem ? Having the web in a jail..
Nope. You can monitor a jail as a separate virtual host, particularly if you have multiple physical interfaces. On a FreeNAS box there is currently no way in the GUI to pick a different physical interface for a jail, so collectd and SNMP get confused measuring network flow because there are two IP addresses attached to the same interface. By the way, I also use Monit for a quick up-and-down view of the entire system. I could not recommend it enough. It is a great product. Unfortunately the M/Monit collector is not free, but you can renew the evaluation version every month as long as you don't care about keeping the metrics. I don't, because I have Observium (SNMP) + collectd for that. It works on FreeNAS, but it has to be installed in a jail. Nonetheless, it gives the info.
 
fred974 said:
Thank you for the reply. How would you do the Apache bit in a jail? I am asking as my web server runs inside a FreeBSD jail.

Aw, too bad mate, then you're screwed... Nah, just kidding :)

Just install collectd in the host, tell the graphite plugin to write towards the jail's IP, install everything else into the jail and you should be set.
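
Roughly, that means two changes compared to the single-host setup. The sketch below prints both; 192.0.2.10 is a placeholder for the jail's address and the function names are invented. Whether carbon really needs the explicit listen address depends on how the jail's networking is set up, so take that part as an assumption.

```sh
#!/bin/sh
# Placeholder address for the jail running carbon and graphite-web.
JAIL_IP=${JAIL_IP:-192.0.2.10}

# collectd.conf on the host: point write_graphite at the jail instead of
# 127.0.0.1. $1 = jail address, $2 = node name.
host_write_graphite() {
  cat << EOF
<Plugin write_graphite>
  <Node "$2">
    Host "$1"
    Port "2003"
    Protocol "tcp"
    LogSendErrors true
    Prefix "collectd."
  </Node>
</Plugin>
EOF
}

# carbon.conf inside the jail (assumption): listen on the jail's address
# rather than on loopback so the host can reach it.
jail_carbon_listener() {
  printf 'LINE_RECEIVER_INTERFACE = %s\nLINE_RECEIVER_PORT = 2003\n' "$1"
}

host_write_graphite "$JAIL_IP" "myhostname.foo.bar"
jail_carbon_listener "$JAIL_IP"
```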

@Oko
Thanks, glad you liked it, I'll have to check out observium some time. Our organization is otherwise big on Microsoft and would like to integrate FreeBSD servers with SCOM, have any pointers there? I've never really done SNMP before.

/Sebulon
 
Sebulon said:
@Oko
Thanks, glad you liked it, I'll have to check out observium some time. Our organization is otherwise big on M$ and would like to integrate FreeBSD servers with SCOM, have any pointers there? I've never really done SNMP before so...

/Sebulon
Nope! We have a total of 5 Windows machines out of the close to 100 I have under my control. Four of those are business laptops. I monitor them for only one thing: that they do not get in touch with my UNIX infrastructure. :h
 
Oko said:
Nope! We have a total of 5 Windows machines out of the close to 100 I have under my control. Four of those are business laptops. I monitor them for only one thing: that they do not get in touch with my UNIX infrastructure. :h

Ah, I see. Then I'll just have to figure that one out on my own :)


@All
I've updated a piece of the code to be compatible with Django-1.7:
graphite-wsgi.patch:
Code:
--- /usr/local/etc/graphite/graphite.wsgi	2014-10-15 12:21:07.000000000 +0200
+++ /usr/local/etc/graphite/graphite.wsgi	2014-10-15 12:19:07.000000000 +0200
@@ -1,7 +1,9 @@
 import os, sys
 sys.path.append('/usr/local/graphite/webapp')
 os.environ['DJANGO_SETTINGS_MODULE'] = 'graphite.settings'
+import django
 import django.core.handlers.wsgi
+django.setup()
 application = django.core.handlers.wsgi.WSGIHandler()
 from graphite.logger import log
 log.info("graphite.wsgi - pid %d - reloading search index" % os.getpid())

If you're having trouble getting Graphite started in the web server with Django 1.7, the missing django.setup() call would be to blame.
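
If you're not sure which case applies to you, a small version check along these lines could tell. It's a sketch: needs_django_setup is a made-up helper, and it relies on django.setup() having been introduced in Django 1.7.

```sh
#!/bin/sh
# Succeeds if the given Django version string is 1.7 or newer,
# i.e. a version that has django.setup().
needs_django_setup() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  minor=${minor%%[!0-9]*}  # strip suffixes such as "rc1"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "${minor:-0}" -ge 7 ]; }
}

# Sample version strings; replace with the version installed on your system.
for version in 1.6.5 1.7 1.10.1; do
  if needs_django_setup "$version"; then
    echo "Django $version: graphite.wsgi needs django.setup()"
  else
    echo "Django $version: leave django.setup() out"
  fi
done
```

The installed version can be printed with: python -c 'import django; print(django.get_version())'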

/Sebulon
 
Oko, do you use the free or the paid version of Observium?

How does it compare with icinga2 and Zabbix?
Which is easier to learn?
 
We switched to LibreNMS, which runs fine on OpenBSD (FreeBSD as well) and is free of the ugly Observium politics. Hopefully one of these days LibreNMS will also be able to use PostgreSQL as a backend, which was rejected multiple times by Adam Armstrong, the guy who de facto owns Observium. Observium has been moving in a really wrong direction since Adam got a few million dollars in donations a year or two ago.

Anyhow, in order to compare Observium with icinga2 and Zabbix you have to understand the subject of monitoring. There are three principal things that people care about:

  1. Functional monitoring (if the things are up and running)
  2. Remote telemetry (collecting time series from various devices, which can reveal long-term trends, bottlenecks and problems)
  3. Log file collection and monitoring.
Functional monitoring and notification (typically e-mail) go hand in hand. Examples of programs designed for functional monitoring are the infamous Nagios (the granddaddy of all functional monitoring), icinga (its fork), and icinga2 (a complete rewrite of icinga). I personally use M/Monit, which scales well for my needs and is very simple. It uses a push model: a tiny client, which needs to be installed on every server, pushes data to the M/Monit server. That is an advantage over the pull model if you are monitoring remote devices behind corporate firewalls that you don't control. icinga2 should be the first tool to look at for any organization with more than a hundred devices (which I don't have).

Observium is an example of remote telemetry monitoring, which uses the generic SNMP network protocol to poll network devices. The protocol is standard, but the clients for it and the MIBs are not. For example, Observium has a hard time displaying the custom PF-related MIBs of OpenBSD. Besides that, it expects the infamously buggy net-snmp, although it works with the OpenBSD snmp and FreeBSD bsnmp daemons. Observium is also one of the best front ends for net-mgmt/collectd5. collectd clients report time series to RRD by default and are vastly superior telemetry tools for clients that support them. Obviously your switches, UPSs, and PDUs do not support collectd, and the only way to poll them is SNMP (switches, UPSs, and PDUs that don't support SNMP should not be used in an enterprise environment). Once you learn a bit more about time series you will discover that RRD is not the best way to collect and store them. The better way is the Graphite backend carbon, although some people claim that the alternative, InfluxDB, is even better. Anyway, collectd can report to the Graphite backend. Observium also includes a front end for NetFlow monitoring, but it is not very well maintained and should not be used in production. NetFlow monitoring is a topic of its own, just like hardware monitoring (HDD monitoring). Observium does display data from the IPMI collectd plugin, but collectd is not able to cope with the OpenBSD sensor framework, for example. Similarly, I rely on e-mail notification from the SMART daemon itself for HDD health rather than on any particular monitoring tool.


Finally, log file monitoring is a huge topic where open source solutions are not really on par with proprietary solutions. Obviously one should have a centralized logging server (syslog-ng is currently my choice, but I am trying to migrate to the vanilla OpenBSD syslog server, which is really getting better). The real problem is data mining through those logs. Solutions like Logstash address the problem in the wrong way. What is the purpose of displaying statistics about log files? The real goal of looking at log files is anomaly detection, and that should be done by machine learning algorithms, which leads us to another topic: intrusion detection. Intrusion detection, forensics and the like are the real reasons people like me want to monitor log files, not to see if Apache is working properly. The only really good solution for log file monitoring is Splunk, and I am not saying that just because my former classmate (from Ph.D. studies) is their chief technology officer.

So long story short this topic deserves a book which I could write if somebody paid me to do so.


Going back to your original question: comparing Observium and icinga2 makes no sense, as one is a remote telemetry collection server and the other is a functional monitoring server (plugin). Zabbix is one of those tools which suffers from a bipolar mental disorder and can't really decide whether it wants to be a functional monitoring tool, a remote telemetry tool, or even a log collector. It does everything reasonably well, but it is not really good at anything. It requires its own client (which obviously can't be installed on a switch, UPS, or PDU). My folks at CMU CS are using it. Good for them. They know my opinion of it, and we are not using it in my lab, where I decide things. Note that Adam Armstrong is using those several millions not to improve Observium's telemetry capabilities (fixing the lack of a proxy would be a huge start, since SNMP uses a poll model and it is nontrivial to poll SNMP clients behind firewalls) but to add functional monitoring and notification to Observium.
 