Router losing public IP, can't recover without full reboot

arader · Jul 12, 2016

Hey all,

I'm seeing a very intermittent issue (as in once every ~40 days or so) where my FreeBSD 10.3 router loses its internet connectivity. It looks like something is causing the interface to go down, and when it comes back up I get an address in the 192.168.X.Y range rather than my usual public IP.

Code:

Jul 12 00:16:10 imp kernel: igb0: link state changed to DOWN
Jul 12 00:16:24 imp kernel: igb0: link state changed to UP
Jul 12 00:16:24 imp devd: Executing '/etc/rc.d/dhclient quietstart igb0'
Jul 12 00:16:34 imp kernel: igb0: link state changed to DOWN
Jul 12 00:16:37 imp kernel: igb0: link state changed to UP
Jul 12 00:16:37 imp devd: Executing '/etc/rc.d/dhclient quietstart igb0'
Jul 12 00:16:52 imp dhclient: New IP Address (igb0): 192.168.100.20
Jul 12 00:16:52 imp dhclient: New Subnet Mask (igb0): 255.255.255.0
Jul 12 00:16:52 imp dhclient: New Broadcast Address (igb0): 192.168.100.255
Jul 12 00:16:52 imp dhclient: New Routers (igb0):

I'm not sure why that's the case, but even more annoying is that the only way I've found to fix it is to reboot. Here's what I've tried:

# ifconfig igb0 down; ifconfig igb0 up
- From memory, this resulted in the interface not even getting an address
# dhclient igb0
- It's been a few months since I tried this one, but I believe it failed to get an address
# service netif restart
- The nuclear option, but this completely locked me out of the machine. I expected sshd to drop my connection, but even after 30 minutes nothing came back up. I had to walk downstairs and physically log in to the machine.
# reboot
- Of course, this worked. As soon as the box came back up it got a public IP and I was back in business.

So a few questions:
Why would rebooting get me a public IP, when none of the other commands got me back into a good state? What could rebooting do that service netif restart wouldn't?

Why would # service netif restart permanently kill my network services? Here's the output of /var/log/messages at the time I issued the command:

Code:

Jul 12 07:33:34 imp kernel: ifa_del_loopback_route: deletion failed: 48
Jul 12 07:33:34 imp dhclient[722]: connection closed
Jul 12 07:33:34 imp dhclient[722]: exiting.

Any thoughts? I'm really hoping for two things: hints on how to diagnose why I'm losing my public IP, and answers on what the best way is to recover without rebooting the box.

Murph · Jul 12, 2016

It's very hard to give a good diagnosis based on what you have posted. Can you describe the public side of your network in more detail, please? Is it some form of *DSL (PPPoE or PPPoATM?), cable modem, shared Ethernet, wi-fi, etc? I can see that your interface to it is some form of Ethernet, because of the igb(4) driver, but that doesn't tell me what is on the other end of the Ethernet cable. Do you have any other devices connected to the ISP/telco equipment, or just your FreeBSD router?

Now for some random speculation / guesswork. If it is some form of layer 2 WAN network (i.e. bridged Ethernet-ish), you need to take a look at the MAC address of the remote devices that you are talking to when it's working and when it's not working. The ISP network should be engineered to prevent it, but it could be that another customer has a DHCP server set up which is leaking out onto a shared L2 segment. You may be able to spot that from the MAC addresses that your system has seen.

MAC addresses for remote hosts on the same L2 segment are discovered via ARP (Address Resolution Protocol). Use arp -an to show your system's ARP cache, arp -d <IP address> to delete a single entry, and arp -ad to flush it. See arp(8) for more details.
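To compare the two states, it may help to note the IP-to-MAC pairs seen on the WAN interface each time. A minimal sketch, assuming the usual FreeBSD arp -an line layout of "? (IP) at MAC on IFACE ..."; the function name arp_macs is made up for illustration:

```shell
# Print "IP MAC" for every resolved ARP entry on one interface.
# Reads `arp -an` style output on stdin; $1 is the interface name.
arp_macs() {
    awk -v ifn="$1" '
        # Expected layout: ? (IP) at MAC on IFACE ...
        $3 == "at" && $5 == "on" && $6 == ifn && $4 != "(incomplete)" {
            gsub(/[()]/, "", $2)   # strip the parentheses around the IP
            print $2, $4
        }
    '
}
```

Running arp -an | arp_macs igb0 once while things work and again while stuck on the 192.168 address should show whether the device answering on the wire has changed.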

If you can obtain the DHCP server IP address for both good and bad scenarios, and they are different, you can add reject bad.server.ip.address to /etc/dhclient.conf. See dhclient.conf(5). I also note that it does not seem to be giving you a default router in the example of the bad address lease above. So, you could possibly make that a requirement in the config, e.g.

Code:

interface "igb0" {
    request subnet-mask, broadcast-address, routers;
    require subnet-mask, routers;
}

(That's just a trimmed down version of the example in the man page, completely untested, just to express some ideas. Up to you to turn it into something useful that works well. Usual FreeBSD warranty applies.)
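To find the server address for a reject line, the server that issued each lease can be read back from the lease file. A rough sketch, assuming the base dhclient lease format with "fixed-address" and "option dhcp-server-identifier" lines inside each lease block; lease_servers is a hypothetical helper name:

```shell
# Print "leased-address server-address" for each lease block.
# Reads the named lease file(s), or stdin if none are given.
lease_servers() {
    awk '
        $1 == "fixed-address" { addr = $2; sub(/;$/, "", addr) }
        $1 == "option" && $2 == "dhcp-server-identifier" {
            srv = $3; sub(/;$/, "", srv)
        }
        # A closing brace ends the current lease block.
        $1 == "}" {
            if (addr != "") print addr, (srv == "" ? "unknown" : srv)
            addr = ""; srv = ""
        }
    ' "$@"
}
```

For example, lease_servers /var/db/dhclient.leases.igb0 after a good lease and again after a bad one. If the second column differs between the two, that second address is the candidate for reject.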

arader · Jul 12, 2016

Thanks Murph, sorry for the lack of info, I wasn't entirely sure what would be helpful.

My router is connected to an Arris TG862G cable modem, and my ISP is Comcast. The router is directly connected to the cable modem, which is directly connected to the coax line coming from the street.

Thanks for the pointer to dhclient.conf, I didn't know you could add requirements there, I'll have to give this some thought. I'm thinking dhclient is getting some bad configuration from Comcast and is resetting the interface. With time I'll likely get to the bottom of it.

That said, I really want to know why a full reboot is necessary. What would cause restarting netif to kill all my interfaces and services?

Murph · Jul 12, 2016

If you are connected remotely to your FreeBSD router when you do a service netif restart, then what is probably happening is the ssh daemon will be told by the OS that the TCP connection has closed the moment the interfaces are stopped. The ssh daemon will then immediately SIGHUP your login session (shell), in turn killing all processes started by the shell (also with SIGHUP), including the script which is restarting the interface. From a quick look at /usr/sbin/service and /etc/rc*, there's no inbuilt trapping of a HUP in anything. You might get away with it by doing nohup service netif restart, although it's not something that I would recommend in general, as tearing down the network stack could cause widespread breakage of things.
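For anyone who wants to try the restart over ssh anyway, the idea is to detach it from the login session so the HUP never reaches it. A sketch only: run_detached and the log path are illustrative names, and this does nothing about the wider breakage mentioned above.

```shell
# Run a command immune to SIGHUP, in the background, with all output
# going to a log file. $1 is the log path, the rest is the command.
run_detached() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 < /dev/null &
}

# Example (FreeBSD, as root). Restarting routing afterwards is an
# assumption here, on the basis that static routes are dropped with
# the interfaces:
# run_detached /var/tmp/netif-restart.log \
#     sh -c 'service netif restart && service routing restart'
```

The log file is there so that, once you can log back in (or have walked downstairs), there is some record of how far the restart got.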

Thinking over your issue again, and having done a quick Google on the Arris TG862G, I think what may be happening is that it is resetting for some reason (possibly due to the ISP rebooting something on their end or perhaps forcing all the modems to reset periodically, or possibly it is crashing). That would account for the Ethernet bouncing down then up. My guess is that it takes a little time after the modem has restarted before its WAN service goes live. The results in Google seem to suggest that it has a built-in DHCP server which will serve 192.168.100.x addresses (despite being configured in bridge mode) if a client requests an address while the WAN is down, presumably to provide a convenient method for accessing the admin functionality on the modem during WAN problems/outages.

Now, there are some details about DHCP which could be coming into play at this point. DHCP has essentially two request modes: 1) New address request, and 2) Renew existing address. dhclient(8) will always try to renew the existing (previous) address by default, but a full system reboot is evidently enough for your system to shake off the 192.168 address (I'm honestly not sure if it's a case of needing to shake it off on the cable modem side or your FreeBSD side, maybe both) and force it into requesting a new address.

What you may really need to be doing is a DHCP "release", but the FreeBSD dhclient does not appear to support that. You may wish to investigate the net/isc-dhcp43-client. The dhclient in the base system is derived from a much older (and now obsolete) version of ISC DHCP, with significantly less functionality. ISC DHCP 4.3 does have a release option, so /usr/local/sbin/dhclient -r might be somewhere in the solution for you. Use, for example, man -M /usr/local/man dhclient to get the man pages for the port instead of the base version. Fully converting your system to use the current ISC version from ports is left as a fun exercise for the reader.

Something else to try: service dhclient stop igb0 && rm /var/db/dhclient.leases.igb0 && service dhclient start igb0

Of course, if you can use my previously suggested ideas to reject the "bad" DHCP lease either based on the server IP or the options included in the lease, you may not need to worry about being able to do a DHCP release. If I've guessed the nature of the problem correctly, you may be able to provoke the problem by resetting or power cycling your cable modem.

arader · Jul 13, 2016

Thanks Murph, I think you hit the proverbial nail on the head. I didn't even think about the fact that my shell will be brought down.

Your hypothesis about the modem setting up an internal DHCP server in the event of a down WAN link seems very plausible. I'm considering installing a console browser and seeing if I get some sort of HTTP redirect the next time I'm given a 192.168 address. You've given me a lot of ammunition for the next time this occurs.

I think my plan will be to leave my /etc/dhclient.conf file as is and wait for the next time this happens. This time I'll log in directly to the machine and fiddle around to remove the SSH variable from the equation.

Thank you!

SirDice · Jul 13, 2016

Murph said:
The results in Google seem to suggest that it has a built-in DHCP server which will serve 192.168.100.x addresses (despite being configured in bridge mode) if a client requests an address while the WAN is down, presumably to provide a convenient method for accessing the admin functionality on the modem during WAN problems/outages.

This is exactly what happens with my Cisco cable modem. It basically boots in 'normal' mode (it's a combined wireless, VoIP and internet router), providing a 192.168.100.x address to LAN clients. Once it's booted it will connect, switch to bridge mode and, more or less, forward the DHCP responses to the LAN interface. The client then switches from the RFC1918 private address to the public IP address. Whenever my internet connection goes down (doesn't happen often thankfully) the client receives the private address again.
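That switch between address ranges is easy to test for in a script, for instance in a cron job that notices when the WAN interface has fallen back to a private address. A small sketch in plain sh; is_rfc1918 is a made-up name, and the function does no input validation beyond what the test command itself rejects:

```shell
# Succeed (exit status 0) if the dotted-quad address in $1 falls in
# one of the RFC1918 ranges: 10/8, 172.16/12 or 192.168/16.
is_rfc1918() {
    o1=${1%%.*}        # first octet
    rest=${1#*.}
    o2=${rest%%.*}     # second octet
    case $o1 in
        10)  return 0 ;;
        172) [ "$o2" -ge 16 ] && [ "$o2" -le 31 ] ;;
        192) [ "$o2" -eq 168 ] ;;
        *)   return 1 ;;
    esac
}
```

With the address from the log in the first post, is_rfc1918 192.168.100.20 succeeds, so a script could use it as the trigger to log the event or retry the lease.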
