Network interface hangs without explanation

Hi all,

I have a server running FreeBSD 12.2 (I know, I'll update soon...) that has been rock solid for a long time. It's now had the same thing happen twice in 2 months after years of no issues: its primary network interface (igb0) stops passing any traffic whatsoever.

If you try to ifconfig down and then ifconfig up the interface, it'll hang in e1000_delay and I can't Ctrl-C it after that point.

When it's in this state, I can still log in via the IPMI console and everything else appears to be fine. There is nothing in dmesg or /var/log/messages (other than things complaining they can't resolve addresses etc). The only thing that seems to bring it back to life is a reboot.

The last time it happened I did save the sysctl dev.igb.0 output, and noticed that crc_errs was astonishingly high:

Code:
dev.igb.0.mac_stats.crc_errs: 3448858737885

But then I did some maths in my head, and that seemed like junk, and sure enough, it's all junk.

Code:
dev.igb.0.mac_stats.tso_ctx_fail: 3448858737885
dev.igb.0.mac_stats.tso_txd: 3449023621390
dev.igb.0.mac_stats.tx_frames_1024_1522: 3449745172500
dev.igb.0.mac_stats.tx_frames_512_1023: 3448999995106
dev.igb.0.mac_stats.tx_frames_256_511: 3448903613198
dev.igb.0.mac_stats.tx_frames_128_255: 3448913888170
dev.igb.0.mac_stats.tx_frames_65_127: 3448923975946
dev.igb.0.mac_stats.tx_frames_64: 3448858873996
dev.igb.0.mac_stats.mcast_pkts_txd: 3448858737885
dev.igb.0.mac_stats.bcast_pkts_txd: 3448858741715
dev.igb.0.mac_stats.good_pkts_txd: 3450051829491
dev.igb.0.mac_stats.total_pkts_txd: 3450051829491
dev.igb.0.mac_stats.good_octets_txd: 1439782832204
dev.igb.0.mac_stats.good_octets_recvd: 149229054078
dev.igb.0.mac_stats.rx_frames_1024_1522: 3448876895295
dev.igb.0.mac_stats.rx_frames_512_1023: 3448882897678
dev.igb.0.mac_stats.rx_frames_256_511: 3448941882198
dev.igb.0.mac_stats.rx_frames_128_255: 3449129830686
dev.igb.0.mac_stats.rx_frames_65_127: 3449180857946
dev.igb.0.mac_stats.rx_frames_64: 3448858961879
dev.igb.0.mac_stats.mcast_pkts_recvd: 3448858737885
dev.igb.0.mac_stats.bcast_pkts_recvd: 3448859286046
dev.igb.0.mac_stats.good_pkts_recvd: 3449577636257
dev.igb.0.mac_stats.total_pkts_recvd: 3449583179207
dev.igb.0.mac_stats.xoff_txd: 3448858737885
dev.igb.0.mac_stats.xoff_recvd: 3448858737885
dev.igb.0.mac_stats.xon_txd: 3448858737885
dev.igb.0.mac_stats.xon_recvd: 3448858737885
dev.igb.0.mac_stats.coll_ext_errs: 3448858737885
dev.igb.0.mac_stats.alignment_errs: 3448858737885
dev.igb.0.mac_stats.crc_errs: 3448858737885
dev.igb.0.mac_stats.recv_errs: 3448858737885
dev.igb.0.mac_stats.recv_jabber: 3448858737885
dev.igb.0.mac_stats.recv_oversize: 3448858737885
dev.igb.0.mac_stats.recv_fragmented: 3448858737885
dev.igb.0.mac_stats.recv_undersize: 3448858737885
dev.igb.0.mac_stats.recv_no_buff: 3448858737885
dev.igb.0.mac_stats.missed_packets: 3448858737885
dev.igb.0.mac_stats.defer_count: 3448858737885
dev.igb.0.mac_stats.sequence_errors: 3448858737885
dev.igb.0.mac_stats.symbol_errors: 3448858737885
dev.igb.0.mac_stats.collision_count: 3448858737885
dev.igb.0.mac_stats.late_coll: 3448858737885
dev.igb.0.mac_stats.multiple_coll: 3448858737885
dev.igb.0.mac_stats.single_coll: 3448858737885
dev.igb.0.mac_stats.excess_coll: 3448858737885
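
One way to see how much of this is junk is to treat the most common value as a shared offset and subtract it from every counter. This is only a sketch under that assumption (the helper names are made up for illustration, and the sample is a handful of lines copied from the listing above, not the full output):

```python
from collections import Counter

# A few lines copied from the listing above (not the full output).
SAMPLE = """\
dev.igb.0.mac_stats.tx_frames_64: 3448858873996
dev.igb.0.mac_stats.mcast_pkts_txd: 3448858737885
dev.igb.0.mac_stats.bcast_pkts_txd: 3448858741715
dev.igb.0.mac_stats.crc_errs: 3448858737885
dev.igb.0.mac_stats.symbol_errors: 3448858737885
dev.igb.0.mac_stats.rx_frames_64: 3448858961879
"""


def parse(text):
    """Map the last component of each sysctl name to its integer value."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(": ")
        if value:
            counters[name.rsplit(".", 1)[-1]] = int(value)
    return counters


def baseline_and_deltas(counters):
    """Assume the most frequent value is a common offset; return it and each counter's excess."""
    baseline, _ = Counter(counters.values()).most_common(1)[0]
    deltas = {name: value - baseline for name, value in counters.items()}
    return baseline, deltas


counters = parse(SAMPLE)
baseline, deltas = baseline_and_deltas(counters)
print("baseline:", baseline)
for name, delta in deltas.items():
    print(f"{name}: {delta:+d}")
```

With the offset removed, the error counters in the sample drop to 0 and the frame counters come out at values like 136111 and 223994. Those look more like real traffic counts, but that reading depends entirely on the shared-offset assumption.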

Before I just blindly upgrade and hope it's fixed, has anyone else seen this? Is it likely to just be an e1000/FreeBSD kernel issue (i.e., some form of memory corruption), or does it point to the BIOS? Or, I hope not, the hardware itself?
 
Oh, I should say:

Code:
igb0@pci0:2:0:0:    class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet

On a Supermicro X11SCM board.
 
After rebooting, is it okay?
When it's in this state, I can still login via IPMI console and everything appears to be fine.
Isn't that on a separate interface?
it's now had the same thing happen twice in 2 months after years of no issues.
What changed?

Personally I haven't seen this, but that's not a high-quality opinion since I don't have a lot of experience with FreeBSD + network troubleshooting. The first thing I would try is probably replacing the cable and, if necessary, cleaning the machine from top to bottom with compressed air, then rebooting. What about the peer device? It's plugged into a switch, I imagine?
 
Sporadic failures on old hardware on very EOL versions?

I think most people are going to say upgrade to a supported version and see if that improves things.

But twice in recent months, after years of working, smells like a hardware, cable, or “something” issue rather than the non-updated OS.
 
After rebooting, is it okay?

Isn't that on a separate interface?
Correct, it's only the one interface that stops passing traffic, not the entire host.

What changed?

Nothing!

Personally I haven't seen this, but that's not a high-quality opinion since I don't have a lot of experience with FreeBSD + network troubleshooting. The first thing I would try is probably replacing the cable and, if necessary, cleaning the machine from top to bottom with compressed air, then rebooting. What about the peer device? It's plugged into a switch, I imagine?

I don't think it's the cable/switch, because otherwise the sysctl numbers wouldn't be junk, unless it's an interaction with that hardware that's tickling a bug in the driver.
 
The last time it happened I did save the sysctl dev.igb.0 output, and noticed that crc_errs was astonishingly high:

Code:
dev.igb.0.mac_stats.crc_errs: 3448858737885
A lot of these entries contain that exact same number. Could that be some sort of hardcoded limit for that driver? Does the number go higher?

Edit: Never mind, it does go higher for some counters according to your post.
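
On the "hardcoded limit" idea: the repeated value happens to divide evenly by 0xFFFFFFFF. A minimal check of that arithmetic is below. The interpretation in the comments is an assumption drawn from the number alone, not something confirmed from the driver source, and the function name is made up for illustration:

```python
# 0xFFFFFFFF is the largest 32-bit value. Assumption: it is also what a
# 32-bit register read looks like when the device no longer responds.
ALL_ONES_32 = 0xFFFFFFFF

# The value shared by most of the counters in the sysctl output.
REPEATED = 3448858737885


def all_ones_multiple(value, unit=ALL_ONES_32):
    """Return (quotient, remainder) of value divided by the all-ones unit."""
    return divmod(value, unit)


quotient, remainder = all_ones_multiple(REPEATED)
print(quotient, remainder)  # 803 0
```

If the driver adds each register read to a running total, this would be consistent with 803 reads that each returned all ones, which is commonly what reads look like when a device has dropped off the PCIe bus. That's only a guess from the arithmetic, and it doesn't say whether the driver, the BIOS, or the hardware is at fault.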
 