Solved Looking for a network monitoring daemon

obsigna

"Monitoring" in order not to call it "Watchdog" because FreeBSD uses the latter terminology for the completely opposite behaviour, namely shutting down a NIC without activity. I would like to have the network connection resurrected (e.g. by calling /etc/rc.d/netif restart ifX) instead of giving a dying connection the final fatal blow. Here ifX would be replaced with the actual device identifier of the respective NIC.

I don't want to reinvent the wheel, therefore my question: Does something like this already exist in the ports or even in the base system?
 
I'm not aware of anything. I think most people just write a small script that regularly sends a few pings to the other end of the connection and resets the interface if they fail, executed through cron every 5 or 10 minutes. Should be fairly easy to script something like that.
 
Yes, it would not be difficult for me. However, if something already existed, why not use it? OK, I will start coding now.
 
the best tool is one that you write
I had a similar case with my servers: when the power is cut, the servers wake up and send me an email

nothing complex, with sh or bash, crond and your brain :)
 
That's the purpose of Solaris SMF, AIX SRC, Linux systemd.
 
Yes because unlike writing a few scripts and cron as SirDice suggests he could port these vast systems... :p
I could also sew a button on my cheek and hang a piano on it - guess what? I won't.

Not to mention that those systems are service management facilities, not device management facilities. So I would first need to wrap the device into a service, which could then be monitored. This would be a job for people who use pincers to put on their pants. I will let sh(1) and cron(8) do the job for me.
 
Here is my solution:

The shell script /root/bin/pingorrestart.sh:
Bash:
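#!/bin/sh

# Router IP from the most recent lease entry in the dhclient leases file of re0.
RIP=$(tail -15 /var/db/dhclient.leases.re0 | sed -n '/  option routers /{s///;s/ .*//;s/;//;p;}')

# Send one ping with a 1 second timeout; restart the interface if it fails.
if ! ping -c1 -t1 "$RIP" > /dev/null 2>&1 ; then
   /etc/rc.d/netif restart re0 > /dev/null 2>&1
fi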

The cron job directive:
Code:
#
# ping the router every 5 minutes and restart the network interface in case the ping fails
*/5     *       *       *       *       root    /root/bin/pingorrestart.sh
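
Since ifX is meant to stand for whatever NIC is in use, the same logic could also be sketched with the interface and the router address passed as arguments. This is only a sketch: the function name ping_or_restart and the PING and RESTART override variables are made up for illustration, and the defaults assume FreeBSD's ping(8) and /etc/rc.d/netif as used above.

```shell
#!/bin/sh
# Sketch only: ping_or_restart, PING and RESTART are illustrative names,
# not part of FreeBSD. The defaults mirror the commands used in the script above.
: "${PING:=ping -c1 -t1}"
: "${RESTART:=/etc/rc.d/netif restart}"

# ping_or_restart IFACE ROUTER
# Returns 0 if the router answered, 1 if the interface had to be restarted.
ping_or_restart() {
    if $PING "$2" > /dev/null 2>&1; then
        return 0
    fi
    $RESTART "$1" > /dev/null 2>&1
    return 1
}
```

Called as ping_or_restart re0 "$RIP", it should behave like the script above; the two variables exist only so the logic can be tried out without touching a real interface.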
 
I'm done with FreeBSD. You people do nothing but complain about Linux, then when they actually have technology that solves what you're after, you resort to your typical childish inner beings. None of you are smart enough to get real jobs using commercial operating systems. Keep playing with your toy.
RTFM
 
Bash:
#!/bin/sh

RIP=`tail -15 /var/db/dhclient.leases.re0 | sed -n '/  option routers /{s///;s/ .*//;s/;//;p;}'`

...
How about
Code:
RIP=`netstat -nr | awk '$1 ~ /default/ {print $2}'`
There might be trouble if you have an IPv6 default route, though.
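
One way around the IPv6 concern might be to accept only a default route whose gateway looks like a dotted-quad IPv4 address. A minimal sketch, assuming the usual netstat -nr column layout (destination in the first column, gateway in the second); default_gw4 is a made-up name:

```shell
#!/bin/sh
# Sketch only: default_gw4 is an illustrative name.
# Reads routing-table text (e.g. from `netstat -nr`) on stdin and prints the
# gateway of the first default route whose gateway looks like a dotted-quad
# IPv4 address, so an IPv6 default route is skipped.
default_gw4() {
    awk '$1 == "default" && $2 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $2; exit }'
}
```

It would be used as RIP=$(netstat -nr | default_gw4).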
 
I didn't explain the actual incidents which finally got me to work on an automatic network connection recovery. My FreeBSD home server is connected to a cable modem/router, which is in bridge mode, and the public interface re0 of the FreeBSD server receives its public IPv4 configuration via DHCP. For three nights in a row now, the cable connection was dropped, and dhclient(8) did not recover from this by itself. The next morning, I had to restart the network interface manually with /etc/rc.d/netif restart re0.

This had already happened now and then in the past, but never so frequently that it began to hurt. In the past, re0 in the dropped state had no IP address assigned, and I did not want to rely on netstat to report the router address. I figured that the last entry in the leases file would always give a syntactically correct IPv4 address, and then ping would tell whether it is online or not.

This is IPv4 only.

Finally, I am savvy in sed(1), while of awk(1) I know only the name, which doesn't mean that it isn't good.
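
The lease-file lookup described above can be sketched as a small function that takes the leases file as an argument and returns the router of the last lease in it. This is a sketch under the assumption that the file contains lines of the form "option routers a.b.c.d;" as written by dhclient(8); lease_router is a made-up name, and unlike the script it scans the whole file rather than only the last 15 lines.

```shell
#!/bin/sh
# Sketch only: lease_router is an illustrative name.
# Prints the first router address of the last "option routers" line in a
# dhclient leases file (IPv4 only).
lease_router() {
    sed -n 's/^ *option routers \([0-9.]*\).*/\1/p' "$1" | tail -n 1
}
```

It would be used as RIP=$(lease_router /var/db/dhclient.leases.re0).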
 
For the sake of transparency: pyret got himself a temporary ban. This isn't the first time he has spouted profanity and I'm done with it. I've also removed a few posts that quoted his rants, not because of those posters; it's just to remove pyret's profanity.
 
I wonder if you could accomplish this more easily and reliably with devd(8).
Well, perhaps. However, reading the description of devd(8), I wonder whether devd could be the reason that dhclient cannot recover from a dropped cable connection. devd might have killed it:
..., and kill the dhclient(8) instance when the same adapter is removed. ...
Anyway, I have no idea how to debug this.
 
It looks like the default /etc/devd.conf doesn't do anything for network down events:
Code:
# Note that the attach/detach with the highest value wins, so that one can
# override these general rules.

#
# Configure the interface on attach.  Due to a historical accident, this
# script is called pccard_ether.
#
# NB: DETACH events are ignored; the kernel should handle all cleanup
#     (routes, arp cache).  Beware of races against immediate create
#     of a device with the same name; e.g.
#     ifconfig bridge0 destroy; ifconfig bridge0 create
#
notify 0 {
        match "system"          "IFNET";
        match "subsystem"       "!(usbus|wlan)[0-9]+";
        match "type"            "ATTACH";
        action "/etc/pccard_ether $subsystem start";
};
       
#
# Try to start dhclient on Ethernet-like interfaces when the link comes
# up.  Only devices that are configured to support DHCP will actually
# run it.  No link down rule exists because dhclient automatically exits
# when the link goes down.
#
notify 0 {
        match "system"          "IFNET";
        match "type"            "LINK_UP";
        media-type              "ethernet";
        action "service dhclient quietstart $subsystem";
};
We might be able to figure out what events are triggering devd(8) from your dmesg(8) output. You could also try dropping something like this in your /etc/devd directory:
Code:
notify 10 {
        match "system"          "IFNET";
        media-type              "ethernet";
        action "logger ethernet event $*";
};

Edit: I tried it on my system. I unplugged and plugged in my ethernet cable. Unfortunately the results are the same whether I disable the devd events:
Code:
Aug 13 19:00:51 myhost kernel: igb0: link state changed to DOWN
Aug 13 19:00:51 myhost myusername[1462]: ethernet event !system=IFNET subsystem=igb0 type=LINK_DOWN
Aug 13 19:01:01 myhost kernel: igb0: link state changed to UP
Aug 13 19:01:01 myhost myusername[1463]: ethernet event !system=IFNET subsystem=igb0 type=LINK_UP
Aug 13 19:01:01 myhost dhclient[1467]: New IP Address (igb0): 172.16.1.157
Aug 13 19:01:01 myhost dhclient[1468]: New Subnet Mask (igb0): 255.255.255.0
Aug 13 19:01:01 myhost dhclient[1469]: New Broadcast Address (igb0): 172.16.1.255
Aug 13 19:01:01 myhost dhclient[1470]: New Routers (igb0): 172.16.1.1
Or not:
Code:
Aug 13 19:02:23 myhost kernel: igb0: link state changed to DOWN
Aug 13 19:02:31 myhost kernel: igb0: link state changed to UP
Aug 13 19:02:31 myhost dhclient[1510]: New IP Address (igb0): 172.16.1.157
Aug 13 19:02:31 myhost dhclient[1511]: New Subnet Mask (igb0): 255.255.255.0
Aug 13 19:02:31 myhost dhclient[1512]: New Broadcast Address (igb0): 172.16.1.255
Aug 13 19:02:31 myhost dhclient[1513]: New Routers (igb0): 172.16.1.1
I have no idea why dhclient doesn't restart your interface :(
 
Actually, /var/log/messages gives a clue as to what happened:
Code:
Aug 11 01:40:37 server kernel: re0: link state changed to DOWN
Aug 11 01:40:48 server kernel: re0: link state changed to UP
Aug 11 01:42:03 server dhclient[5915]: New IP Address (re0): mmm.nnn.2.207
Aug 11 01:42:03 server dhclient[5916]: New Subnet Mask (re0): 255.255.248.0
Aug 11 01:42:03 server dhclient[5917]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 01:42:03 server dhclient[5918]: New Routers (re0): mmm.nnn.0.1
Aug 11 01:42:04 server dhclient[5920]: New Routers (re0): mmm.nnn.0.1
Aug 11 01:48:04 server dhclient[6217]: New IP Address (re0): mmm.nnn.2.207
Aug 11 01:48:04 server dhclient[6218]: New Subnet Mask (re0): 255.255.248.0
Aug 11 01:48:04 server dhclient[6219]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 01:48:04 server dhclient[6220]: New Routers (re0): mmm.nnn.0.1
Aug 11 01:48:05 server dhclient[6222]: New Routers (re0): mmm.nnn.0.1
Aug 11 01:54:05 server dhclient[6450]: New IP Address (re0): mmm.nnn.2.207
Aug 11 01:54:05 server dhclient[6451]: New Subnet Mask (re0): 255.255.248.0
Aug 11 01:54:05 server dhclient[6452]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 01:54:05 server dhclient[6453]: New Routers (re0): mmm.nnn.0.1
Aug 11 01:54:06 server dhclient[6455]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:00:06 server dhclient[6713]: New IP Address (re0): mmm.nnn.2.207
Aug 11 02:00:06 server dhclient[6714]: New Subnet Mask (re0): 255.255.248.0
Aug 11 02:00:06 server dhclient[6715]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 02:00:06 server dhclient[6716]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:00:07 server dhclient[6718]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:01:39 server ntpd[941]: error resolving pool pool.ntp.br: Name does not resolve (8)
Aug 11 02:02:45 server syslogd: last message repeated 1 times
Aug 11 02:03:50 server syslogd: last message repeated 1 times
Aug 11 02:06:02 server syslogd: last message repeated 2 times
Aug 11 02:06:07 server dhclient[6962]: New IP Address (re0): mmm.nnn.2.207
Aug 11 02:06:07 server dhclient[6963]: New Subnet Mask (re0): 255.255.248.0
Aug 11 02:06:07 server dhclient[6964]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 02:06:07 server dhclient[6965]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:06:08 server dhclient[6968]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:07:09 server ntpd[941]: error resolving pool pool.ntp.br: Name does not resolve (8)
Aug 11 02:08:14 server syslogd: last message repeated 1 times
Aug 11 02:09:19 server syslogd: last message repeated 1 times
Aug 11 02:11:31 server syslogd: last message repeated 2 times
Aug 11 02:12:08 server dhclient[7217]: New IP Address (re0): mmm.nnn.2.207
Aug 11 02:12:08 server dhclient[7219]: New Subnet Mask (re0): 255.255.248.0
Aug 11 02:12:08 server dhclient[7220]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 02:12:08 server dhclient[7221]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:12:09 server dhclient[7223]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:12:38 server ntpd[941]: error resolving pool pool.ntp.br: Name does not resolve (8)
Aug 11 02:13:44 server syslogd: last message repeated 1 times
Aug 11 02:14:50 server syslogd: last message repeated 1 times
Aug 11 02:18:06 server syslogd: last message repeated 3 times
Aug 11 02:18:09 server dhclient[7463]: New IP Address (re0): mmm.nnn.2.207
Aug 11 02:18:09 server dhclient[7464]: New Subnet Mask (re0): 255.255.248.0
Aug 11 02:18:09 server dhclient[7465]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 02:18:09 server dhclient[7466]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:18:10 server dhclient[7468]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:19:12 server ntpd[941]: error resolving pool pool.ntp.br: Name does not resolve (8)
Aug 11 02:20:19 server syslogd: last message repeated 1 times
Aug 11 02:21:25 server syslogd: last message repeated 1 times
Aug 11 02:21:33 server rpc.statd[838]: Failed to contact host CyStat-BBB: RPC: Port mapper failure - RPC: Timed out
Aug 11 02:22:30 server ntpd[941]: error resolving pool pool.ntp.br: Name does not resolve (8)
Aug 11 02:23:34 server syslogd: last message repeated 1 times
Aug 11 02:24:10 server dhclient[7718]: New IP Address (re0): mmm.nnn.2.207
Aug 11 02:24:10 server dhclient[7719]: New Subnet Mask (re0): 255.255.248.0
Aug 11 02:24:10 server dhclient[7720]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 02:24:10 server dhclient[7721]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:24:11 server dhclient[7723]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:24:41 server ntpd[941]: error resolving pool pool.ntp.br: Name does not resolve (8)
Aug 11 02:25:46 server syslogd: last message repeated 1 times
Aug 11 02:26:50 server syslogd: last message repeated 1 times
Aug 11 02:30:08 server syslogd: last message repeated 3 times
Aug 11 02:30:11 server dhclient[7975]: New IP Address (re0): mmm.nnn.2.207
Aug 11 02:30:11 server dhclient[7976]: New Subnet Mask (re0): 255.255.248.0
Aug 11 02:30:11 server dhclient[7977]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 02:30:11 server dhclient[7978]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:30:12 server dhclient[7980]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:31:15 server ntpd[941]: error resolving pool pool.ntp.br: Name does not resolve (8)
Aug 11 02:32:19 server syslogd: last message repeated 1 times
Aug 11 02:34:30 server syslogd: last message repeated 2 times
Aug 11 02:35:35 server syslogd: last message repeated 1 times
Aug 11 02:36:12 server dhclient[8232]: New IP Address (re0): mmm.nnn.2.207
Aug 11 02:36:12 server dhclient[8233]: New Subnet Mask (re0): 255.255.248.0
Aug 11 02:36:12 server dhclient[8234]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 02:36:12 server dhclient[8235]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:36:14 server dhclient[8237]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:36:40 server ntpd[941]: error resolving pool pool.ntp.br: Name does not resolve (8)
Aug 11 02:37:47 server syslogd: last message repeated 1 times
Aug 11 02:38:55 server syslogd: last message repeated 1 times
Aug 11 02:42:11 server syslogd: last message repeated 3 times
Aug 11 02:42:13 server dhclient[8475]: New IP Address (re0): mmm.nnn.2.207
Aug 11 02:42:13 server dhclient[8476]: New Subnet Mask (re0): 255.255.248.0
Aug 11 02:42:13 server dhclient[8477]: New Broadcast Address (re0): mmm.nnn.7.255
Aug 11 02:42:13 server dhclient[8478]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:42:14 server dhclient[8480]: New Routers (re0): mmm.nnn.0.1
Aug 11 02:43:15 server ntpd[941]: error resolving pool pool.ntp.br: Name does not resolve (8)
Aug 11 02:44:20 server syslogd: last message repeated 1 times
Aug 11 02:46:32 server syslogd: last message repeated 2 times
Aug 11 02:47:32 server dhclient[262]: connection closed
Aug 11 02:47:32 server kernel: pid 344 (dhclient), jid 0, uid 65: exited on signal 11
Aug 11 02:47:32 server dhclient[262]: exiting.

In the above log, I replaced the first two octets of the public IPv4 address with mmm.nnn. Seemingly dhclient tried to recover the connection until it was killed by the kernel because of a segmentation violation (11 = SIGSEGV). After this I see no further attempts by dhclient. This should never happen, which means we are talking about a severe bug in dhclient.
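
To see whether earlier outages ended the same way, the log can be searched for the kernel's message about dhclient exiting on a signal. A minimal sketch, assuming the message format shown in the log above; dhclient_crashes is a made-up name, and the default path is simply the log file this excerpt came from.

```shell
#!/bin/sh
# Sketch only: dhclient_crashes is an illustrative name.
# Prints every log line in which the kernel reports that a dhclient process
# exited on a signal. Takes the log file as an optional argument.
dhclient_crashes() {
    grep -E 'pid [0-9]+ \(dhclient\).*exited on signal [0-9]+' "${1:-/var/log/messages}"
}
```

Run without arguments it reads /var/log/messages; an older, uncompressed log file can be passed as the first argument.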

PS: This happened before I activated the cronjob with the ping-or-restart script.
 
Are you getting your name servers through DHCP too?
Sort of, yes, since I see the two IP addresses of the ISP's name servers in the leases file. Actually, these are not used, since I use local_unbound as a recursive caching resolver for my network:

/etc/resolvconf.conf
Code:
# Generated by local-unbound-setup
resolv_conf="/dev/null" # prevent updating /etc/resolv.conf
unbound_conf="/var/unbound/forward.conf"
unbound_pid="/var/run/local_unbound.pid"
unbound_service="local_unbound"
unbound_restart="service local_unbound reload"

/etc/resolv.conf
Code:
nameserver 127.0.0.1
search obsigna.com
options edns0

So resolvconf(8) actually places the name servers of my ISP into /var/unbound/forward.conf. I do not include this file in /var/unbound/unbound.conf, though; instead I have this there:
Code:
...
root-hints: root-hints.zones
 