FreeBSD intermittently stops forwarding packets

This may seem impossible, but I have been working for over three weeks on a FreeBSD machine set up as a gateway. Several times during the day, Internet access from the internal network just stops. No DNS, no pings, nothing. After a few minutes, it comes back.

I have tried ALTQ with priq, cbq and finally hfsc. I have tried kernels with and without device polling. Finally, today we got a good HFSC configuration and were in the middle of a test with some heavy Internet traffic when the problem came up again.

netstat -id showed no dropped packets.

netstat -m looked ok.

I could ping yahoo.com from the gateway but not from the internal network. On the gateway, tcpdump showed the packets coming in on the internal interface but nothing going out on the external interface.

When I toggled packet forwarding (sysctl -w net.inet.ip.forwarding=0 / sysctl -w net.inet.ip.forwarding=1), the problem disappeared.

I am running FreeBSD 7.2-p6 with a Marvell NIC (msk0) as the internal interface and a D-Link NIC (rl0) as the external one.

I am curious. Has anyone else seen this problem, or does anyone have an idea where I can start looking for why packet forwarding is hanging?
 
Interesting. Sounds like a bug to me. Is it possible to run the gateway with traffic shaping and/or firewalling disabled for a while to see if it happens again?

I think you should log a PR too.
 
I have tried with traffic shaping off. The problem persists.

Turning off PF is not a possibility because the machine is in production.

I was thinking that it could be a driver/hardware issue. msk0 is on the board but it will not do polling. rl0 is in an old D-Link card in the PCI slot. Today, I am going to try to replace them with a dual-port Intel card in a PCI-Express slot.
 
Disabling PF is not an option (unless I rebuild the machine with something else).

Doing a "pfctl -f /etc/pf.conf" when the problem occurs does not solve it.

So far the only thing that seems to help is toggling IP Forwarding. I put in a cron job that does this every minute and the problem has not come up again.
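A rough sketch of that workaround, for anyone who needs the same stopgap. The sysctl commands are the ones quoted earlier in the thread; the script path, the "run" argument, the SYSCTL variable and the toggle_forwarding function are assumptions made up for this example.

```shell
#!/bin/sh
# Workaround sketch: bounce IP forwarding off and back on.
# SYSCTL can be overridden for a dry run; the name is hypothetical.
SYSCTL="${SYSCTL:-sysctl}"

toggle_forwarding() {
    # Only re-enable if the disable step succeeded.
    "$SYSCTL" -w net.inet.ip.forwarding=0 &&
    "$SYSCTL" -w net.inet.ip.forwarding=1
}

# Toggle only when invoked with "run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    toggle_forwarding
fi

# Example /etc/crontab entry (hypothetical path), once a minute as root:
# * * * * * root /usr/local/sbin/toggle_forwarding.sh run
```

This only hides the symptom. Forwarding is briefly off on every run, so it is a stopgap while the real cause is tracked down.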

Last night, we put in the PCI-X dual port Intel NIC. Now I am trying the configuration with "em" drivers but that is introducing a whole new set of problems. The machine hangs with "polling" enabled on the NICs. So far, it has worked fine all morning without polling.

At lunch time, we are going to stop toggling IP Forwarding and see what happens this afternoon. If it holds, the problem is the drivers. If not, the problem is with IP Forwarding.
 
Just to let others know the final result of all of this, I added an Intel dual-port NIC. That seemed to solve the problem with IP Forwarding, but it led to the discovery of another problem: traffic on the external interface would intermittently stop. The cause was a faulty router. The router was replaced and everything is running fine.

The bottom line is that I saw a problem with IP forwarding between a Realtek (rl0) and a Marvell (msk0) driver, but I cannot be certain whether it was FreeBSD 7.2, the drivers, the hardware in the server or the faulty router. In any case, the problem was in more than one place.
 
Just to throw my experience in here: I've been using a Realtek NIC and its drivers for several years, forwarding packets between that and vr0/nfe0 NICs with no problems at all. I would guess that it's a hardware problem, just a gut feeling.

Glad you got it sorted out.
 