Solved Please help to solve the random network problem

Just recently I have started to encounter random IPv4 network outages. When it happens, the re(4) interface loses its IPv4 address without any clear reason. This seems to be completely random, and there are no messages in dmesg or /var/log/messages.

Running service netif restart does not restore the IP address; it stays at 0.0.0.0. Only a reboot helps.

I suspected the interface hardware, but the strange thing is that IPv6 keeps working on that interface; only IPv4 is affected. I suspected the router and restarted it, but that did not help. There is also another FreeBSD machine on the network, and it keeps working. The other machine has bge(4).

My question is: when this happens again, what should I look for? Are there any additional diagnostics? And why does this happen now, when it never did before in all the time I have been using this machine?
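
For the next time it happens, one option is to capture the state of the machine before rebooting. This is only a minimal sketch: it assumes the interface is re0 and that the listed tools behave as on a stock FreeBSD install, and the function name snapshot and the output path are made up for illustration.

```sh
# Hypothetical helper: dump interface, routing, ARP and process state
# into one file so a "broken" and a "working" snapshot can be compared.
snapshot() {
    snap_if=$1     # interface name, e.g. re0
    snap_out=$2    # file to write the snapshot to
    {
        date
        for snap_cmd in "ifconfig $snap_if" "netstat -rn -f inet" "arp -an" "ps -ax"; do
            echo "== $snap_cmd"
            # Unquoted on purpose so the string splits into command + arguments.
            $snap_cmd 2>&1 || echo "(command failed or unavailable)"
        done
    } > "$snap_out"
}
```

Run it as root while the address is gone, for example snapshot re0 /var/tmp/re0-outage.txt, run it again after recovery into a second file, and diff the two.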
 
There is little information to go on.
The output of
Code:
cat /etc/rc.conf | egrep -i "dhc|ifc"
and
Code:
ifconfig -a
could help.
Code:
# cat /etc/rc.conf | egrep -i "dhc|ifc"
dhclient_program="/usr/local/sbin/dual-dhclient"
ifconfig_re0="DHCP"
ifconfig_re0_ipv6="inet6 DHCP accept_rtadv"

... but this issue appears to be completely random. Sometimes it works for days.
 
To trigger DHCP again on the interface, use service dhclient restart re0.
Now the issue has happened again. When trying to restart the DHCP client:
Code:
# service dhclient restart re0
dhclient not running? (check /var/run/dhclient/dhclient.re0.pid).
Starting dhclient.
/etc/rc.d/dhclient: WARNING: failed to start dhclient

Also, it is strange that the ARP table has these addresses:
Code:
# arp -a
router.lan (192.168.1.1) at 00:22:07:a0:eb:91 on re0 expires in 1197 seconds [ethernet]
239.237.117.34.bc.googleusercontent.com (34.117.237.239) at (incomplete) on re0 expired [ethernet]
ec2-35-82-131-108.us-west-2.compute.amazonaws.com (35.82.131.108) at (incomplete) on re0 expired [ethernet]

After a reboot the system returned to normal, and arp shows:
Code:
# arp -a
router.lan (192.168.1.1) at 00:22:07:a0:eb:91 on re0 expires in 1195 seconds [ethernet]
Rhodium.lan (192.168.1.195) at 38:d5:47:b2:29:e4 on re0 permanent [ethernet]
 
Sometimes it is better to do it in two steps, something like:
service dhclient onestop re0
service dhclient onestart re0
Then you know clearly whether the stop failed, the start failed, or both.
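
If you want the result of each step spelled out, a tiny wrapper can report them separately. This is just a sketch; run_step is a made-up helper name, not part of service(8) or rc.subr.

```sh
# Hypothetical helper: run one command and report its exit status
# under a label, so a failed stop and a failed start are told apart.
run_step() {
    step_label=$1
    shift
    "$@"
    step_rc=$?
    if [ "$step_rc" -eq 0 ]; then
        echo "$step_label: ok"
    else
        echo "$step_label: FAILED (exit $step_rc)"
    fi
    return "$step_rc"
}
```

Usage would be run_step stop service dhclient onestop re0, followed by run_step start service dhclient onestart re0.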
 
My question is: why DHCP? The issue happens some time after boot, not immediately. The interface already has an address and works for a few minutes. The lease in the router is permanent, so this MAC always gets the same IPv4 address (and the same IPv6 address). When disaster strikes, as I wrote before, IPv6 keeps running. Also, I do not see any reason why dhclient is needed at this point, because the address has already been configured.

But what happens is these strange addresses in my ARP, which certainly should not be there:
Code:
# arp -a
router.lan (192.168.1.1) at 00:22:07:a0:eb:91 on re0 expires in 1197 seconds [ethernet]
239.237.117.34.bc.googleusercontent.com (34.117.237.239) at (incomplete) on re0 expired [ethernet]
ec2-35-82-131-108.us-west-2.compute.amazonaws.com (35.82.131.108) at (incomplete) on re0 expired [ethernet]

First of all, I want to understand what exactly happens and why.

When everything is working and I try to shut down dhclient:
Code:
# service dhclient onestop re0
dhclient not running? (check /var/run/dhclient/dhclient.re0.pid).

This is logical - dhclient is not needed any more.
 
I'd probably look at the hardware in this case. If OP's router has a cable connection to the host, can the cable be moved to a different (free) plug on the router? I'd think of eliminating fried circuitry behind the plug as a possible issue, even if it's rather unlikely.
 
In my opinion, those addresses are in the ARP table because something tried to talk to them. A DNS lookup for the googleusercontent name returned the IP address, and then something sent packets to it. You reach it over Ethernet (from your machine to the router and out), and part of the Ethernet header is the destination MAC. The "incomplete" means that ARP failed (expected for external, not locally connected addresses).

Now, as for what looked those names up in DNS, and whether it should have: I don't know.
Amazon AWS shows up a lot from things like Google ads.
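
To illustrate why ARP for those addresses normally stays incomplete: ARP only resolves hosts on the local subnet, and everything else is supposed to go via the router's MAC. Here is a rough sketch in plain sh, assuming the LAN is 192.168.1.0/24 (the prefix length is not stated anywhere in this thread) and using made-up helper names.

```sh
# Hypothetical helpers: decide whether an IPv4 address is on-link,
# i.e. whether it falls inside the same subnet as our own address.

# Convert dotted-quad notation to a single integer.
ip_to_int() {
    ipi_old_ifs=$IFS
    IFS=.
    set -- $1            # split a.b.c.d into $1 $2 $3 $4
    IFS=$ipi_old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# on_link TARGET OWN_ADDRESS PREFIXLEN
# Succeeds (exit 0) if TARGET and OWN_ADDRESS share the same network.
on_link() {
    ol_target=$(ip_to_int "$1")
    ol_own=$(ip_to_int "$2")
    ol_mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( ol_target & ol_mask )) -eq $(( ol_own & ol_mask )) ]
}
```

With the addresses from the arp output, on_link 192.168.1.1 192.168.1.195 24 succeeds, while on_link 34.117.237.239 192.168.1.195 24 fails, so nobody on the LAN should ever answer an ARP request for the second one.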
 
Also, I do not see any reason why dhclient is needed at this point, because the address has already been configured.
DHCP leases expire, and have to be renewed. The dhclient daemon has to be running in order for this renewal to happen. What's the lease time configured in your DHCP server?
But what happens is these strange addresses in my ARP, which certainly should not be there:
You're not the only one:

But how are those getting in your ARP table? Something must be doing ARP broadcasts for them in your local network. You might have to bust out net/Wireshark.
 
When everything is working and I try to shut down dhclient:
Code:
# service dhclient onestop re0
dhclient not running? (check /var/run/dhclient/dhclient.re0.pid).

This is logical - dhclient is not needed any more.
You're manually shutting down dhclient? That explains it. Your DHCP lease is expiring. That's why you're losing your IP address.
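
To see when the current lease is due for renewal, you can look at the client's lease file. A small sketch, assuming the base dhclient's lease file at /var/db/dhclient.leases.re0 (the ISC client from ports may keep its leases somewhere else) and a made-up helper name:

```sh
# Hypothetical helper: print the renew/rebind/expire lines of the
# most recent lease block in a dhclient lease file.
lease_times() {
    awk '
        /^lease/ { buf = "" }                      # new lease block: start over
        /^[ \t]*(renew|rebind|expire) / {
            sub(/^[ \t]+/, "")                     # strip leading indentation
            buf = buf $0 "\n"
        }
        END { printf "%s", buf }
    ' "$1"
}
```

For example, lease_times /var/db/dhclient.leases.re0 should show three timestamps. If the expire time has passed and no dhclient was around to renew, that would fit the lost address.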
 
I'd probably look at the hardware in this case. If OP's router has a cable connection to the host, can the cable be moved to a different (free) plug on the router? I'd think of eliminating fried circuitry behind the plug as a possible issue, even if it's rather unlikely.
Yes, this is cable connected. As I have written here before, IPv6 keeps running. It concerns only the IPv4 address. I can use all v6 connections. That means the physical cable and interface are working.
 
I have fixed leases in the router for both the v4 and v6 addresses. I assume these do not expire (I may be wrong, of course). But the strange thing is that I cannot bring the network connection back up.
I have net/dual-dhclient installed for dual IP versions. In /etc/rc.conf I have
Code:
ipv6_activate_all_interfaces="YES"
dhclient_program="/usr/local/sbin/dual-dhclient"

rtsold_enable="YES"
ip6addrctl_enable="YES"

ifconfig_re0="DHCP"
ifconfig_re0_ipv6="inet6 DHCP accept_rtadv"

As I already said - v6 keeps running and in most cases v4 also.
 
In my opinion, those addresses are in the ARP table because something tried to talk to them. A DNS lookup for the googleusercontent name returned the IP address, and then something sent packets to it. You reach it over Ethernet (from your machine to the router and out), and part of the Ethernet header is the destination MAC. The "incomplete" means that ARP failed (expected for external, not locally connected addresses).

Now, as for what looked those names up in DNS, and whether it should have: I don't know.
Amazon AWS shows up a lot from things like Google ads.
Seems so. These entries appear when there is no IPv4 address configured.

Almost solved the problem, but not quite. I had two dhclient processes running:
Code:
 594  -  Is     0:00.00 /usr/local/sbin/dhclient re0
 631  -  Is     0:00.01 /usr/local/sbin/dhclient -6 -nw -D LL re0

I assume one is for v4 and the other is for v6.
After killing them both, service netif restart worked normally again and I was able to restart the interface.

After that
Code:
# ps -ax|grep dhc|grep -v grep
4701  -  Is     0:00.00 /usr/local/sbin/dhclient re0
4769  -  Is     0:00.00 /usr/local/sbin/dhclient -6 -nw -D LL re0

Still
Code:
# service dhclient status re0
dhclient is not running.

But everything returned to normal operation, with no need for a reboot or any other action.

My question here is: why does service dhclient status re0 not show that it is running?
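
For reference, the kill-and-restart recovery from this post can be scripted roughly like this. It is a sketch only: dhclient_pids is a made-up helper, and it only matches ps lines shaped like the ones shown above (the base dhclient may set a different process title).

```sh
# Hypothetical helper: read `ps -ax` output on stdin and print the PIDs
# of dhclient processes whose last argument is the given interface.
dhclient_pids() {
    awk -v ifn="$1" '$5 ~ /dhclient$/ && $NF == ifn { print $1 }'
}
```

As root, something like kill $(ps -ax | dhclient_pids re0) followed by service netif restart re0 repeats what I did by hand.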
 
Yes, this is cable connected. As I have written here before, IPv6 keeps running. It concerns only the IPv4 address. I can use all v6 connections. That means the physical cable and interface are working.
Yeah, what I had in mind is that moving the physical cable to a free port just might fix the 'no ipv4' issue. Or are you blocking IPv4 on the problematic ethernet plug (router side) by accident? Sometimes, a misconfig has unintended consequences.
 
I think two processes running, one for v4 and one for v6, is correct if IPv6 is enabled. What is in /etc/rc.conf? Is DHCP specified for the IPv6 interface? I think the answer is yes, based on #18.
I'm guessing the empty status issue is a bug or undocumented feature :)
 
Probably the pid file is missing or wrong.
Either way, having two processes is probably somehow unsupported by the pidfile mechanism.
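
A quick way to test the pidfile theory is to compare the file against the live process table. A minimal sketch, using the path from the earlier error message (/var/run/dhclient/dhclient.re0.pid) and a made-up function name; this is not how rc.subr itself does the check, only a rough approximation.

```sh
# Hypothetical helper: report whether a pidfile exists, holds a number,
# and whether that PID is currently alive. Run as root, because
# kill -0 on another user's process fails with a permission error.
pidfile_state() {
    pf_file=$1
    if [ ! -r "$pf_file" ]; then
        echo "missing"
        return 1
    fi
    pf_pid=$(head -n 1 "$pf_file")
    case $pf_pid in
        ''|*[!0-9]*) echo "garbled"; return 1 ;;
    esac
    if kill -0 "$pf_pid" 2>/dev/null; then
        echo "alive $pf_pid"
    else
        echo "stale $pf_pid"
        return 1
    fi
}
```

If pidfile_state /var/run/dhclient/dhclient.re0.pid prints "missing" while ps still shows dhclient, the status command has nothing to go on, which would match what was reported.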
 
Now I am also starting to think that this is some sort of bug in the rc scripts.
And what is weird: in the net/dual-dhclient description we can see
Code:
This port provides a script which spawns both /sbin/dhclient and
/usr/local/sbin/dhclient -6; this simplifies the configuration needed to
run DHCP on both protocols of a dual-stack network.

In other words, the base dhclient should be started for v4 and the ISC client for v6, but the actual script contains:

Code:
# cat /usr/local/sbin/dual-dhclient
#!/bin/sh

# Public domain

/usr/local/sbin/dhclient "$@"
/usr/local/sbin/dhclient -6 -nw -D LL "$@"

The same ISC client is started for both v4 and v6.

... changed that manually and will see...
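
The edited script itself is not shown here, so the following is only a guess at what a version matching the port description would look like: base /sbin/dhclient for v4 and the ISC client for v6. The function and variable names are made up, and the two paths are taken from the description text.

```sh
# Assumed client paths, taken from the port description; override to test.
: "${DHCLIENT4:=/sbin/dhclient}"              # base system client, IPv4
: "${DHCLIENT6:=/usr/local/sbin/dhclient}"    # ISC client from ports, IPv6

# Hypothetical replacement body for /usr/local/sbin/dual-dhclient.
dual_dhclient() {
    "$DHCLIENT4" "$@"
    "$DHCLIENT6" -6 -nw -D LL "$@"
}
```

A real script would end with dual_dhclient "$@" so that the arguments passed by the rc script reach both clients.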

And my strong hypothesis here is now that there is a bug in net/isc-dhcp44-client with IPv4.

... and after that service netif restart also works. So I think I should file a PR.
 
Filed PR 260317.
 