Network errors / collisions on bge0 - FramesTooLong ??

Hi,

Code:
FreeBSD navy.spectrumcs.net 9.2-RELEASE-p3 FreeBSD 9.2-RELEASE-p3 #0: Sat Jan 11 03:25:02 UTC 2014     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

I have three FreeBSD servers in the same data centre, and with the help of Munin I've noticed that one of them has a non-zero number of input errors. The server has been in operation since around Feb 2012, and I don't recall seeing any errors until around mid 2013 (when I upgraded from FreeBSD 9.0 to 9.1). The errors don't appear to be impacting the server's performance (it's a mail server, but it also runs MySQL to feed mail configuration to the other two servers in the data centre).

Yesterday I upgraded all three servers to FreeBSD 9.2 and have been monitoring them closely. Munin is now reporting even higher input errors than before (graph attached), so I thought I'd look into the problem.

Looking at the Munin plugin I can see it's running the following command...

Code:
/usr/bin/netstat -i -b -n -I bge0
and getting the following output
Code:
Name    Mtu Network       Address              Ipkts Ierrs Idrop     Ibytes    Opkts Oerrs     Obytes  Coll
bge0   1400 <Link#1>      e4:1f:13:XX:XX:XX 21480728   799     0 3313781806 16682639     0 3836432749     0
bge0   1400 109.169.26.0/ 109.169.26.118    17933718     -     - 2844454533 17077923     - 3602874137     -
bge0   1400 fe80::e61f:13 fe80::e61f:13ff:f        0     -     -          0        1     -         96     -

Munin only appears to be parsing the first line with the Link#1 in it. Ierrs is currently 799.
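
As a rough illustration of what the plugin is presumably doing, here is a minimal sketch that pulls the Ierrs value out of the <Link#N> row. The function name link_ierrs is made up for this example, and it assumes the row has an Address field (true for bge0 above), since an empty column would shift the fields.

```sh
# Print the Ierrs value from the <Link#N> row of
# `netstat -i -b -n -I <interface>` output read on stdin.
# The column is located by its header name rather than a fixed position.
# Assumption: the Link row has no empty columns (e.g. Address is present).
link_ierrs() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Ierrs") col = i; next }
        $3 ~ /^<Link#/ && col && !seen { print $col; seen = 1 }
    '
}
```

On the server this would be used as `/usr/bin/netstat -i -b -n -I bge0 | link_ierrs`, which for the output above prints 799.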

This post (http://serverfault.com/questions/122355 ... se-freebsd) suggested I should try the following...
Analyse:
Code:
dmesg
sysctl dev.bce
vmstat -z (USED/LIMIT)
netstat -s (Errors/Buffers)


# dmesg | grep bge0
Code:
bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x00a200> mem 0xe8200000-0xe820ffff irq 16 at device 0.0 on pci2
bge0: CHIP ID 0x0000a200; ASIC REV 0x0a; CHIP REV 0xa2; PCI-E
miibus0: <MII bus> on bge0
bge0: Ethernet address: e4:1f:13:XX:XX:XX

# sysctl dev.bge
Code:
dev.bge.0.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x00a200
dev.bge.0.%driver: bge
dev.bge.0.%location: slot=0 function=0 handle=\_SB_.PCI0.EXP5.PXS5
dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x165a subvendor=0x1014 subdevice=0x0378 class=0x020000
dev.bge.0.%parent: pci2
dev.bge.0.forced_collapse: 0
dev.bge.0.msi: 1
dev.bge.0.forced_udpcsum: 0
dev.bge.0.stats.FramesDroppedDueToFilters: 0
dev.bge.0.stats.DmaWriteQueueFull: 0
dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
dev.bge.0.stats.NoMoreRxBDs: 0
dev.bge.0.stats.InputDiscards: 0
dev.bge.0.stats.InputErrors: 800
dev.bge.0.stats.RecvThresholdHit: 0
dev.bge.0.stats.rx.ifHCInOctets: 3445050017
dev.bge.0.stats.rx.Fragments: 0
dev.bge.0.stats.rx.UnicastPkts: 21457049
dev.bge.0.stats.rx.MulticastPkts: 4433
dev.bge.0.stats.rx.BroadcastPkts: 270589
dev.bge.0.stats.rx.FCSErrors: 0
dev.bge.0.stats.rx.AlignmentErrors: 0
dev.bge.0.stats.rx.xonPauseFramesReceived: 0
dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
dev.bge.0.stats.rx.ControlFramesReceived: 0
dev.bge.0.stats.rx.xoffStateEntered: 0
dev.bge.0.stats.rx.FramesTooLong: 799
dev.bge.0.stats.rx.Jabbers: 0
dev.bge.0.stats.rx.UndersizePkts: 0
dev.bge.0.stats.tx.ifHCOutOctets: 3984405548
dev.bge.0.stats.tx.Collisions: 0
dev.bge.0.stats.tx.XonSent: 0
dev.bge.0.stats.tx.XoffSent: 0
dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
dev.bge.0.stats.tx.SingleCollisionFrames: 0
dev.bge.0.stats.tx.MultipleCollisionFrames: 0
dev.bge.0.stats.tx.DeferredTransmissions: 0
dev.bge.0.stats.tx.ExcessiveCollisions: 0
dev.bge.0.stats.tx.LateCollisions: 0
dev.bge.0.stats.tx.UnicastPkts: 17497712
dev.bge.0.stats.tx.MulticastPkts: 0
dev.bge.0.stats.tx.BroadcastPkts: 2

# vmstat -z
[A lot of output, nothing useful as far as I could tell]

# netstat -s | grep buffer
Code:
42 dropped due to full socket buffers
0 messages dropped due to full socket buffers

# netstat -s | grep errors
Code:
0 times sctp_senderrors were caused from a user
0 errors not generated in response to an icmp message
0 errors not generated in response to an icmp6 message
0 errors not generated because of rate limitation

The line I found interesting was dev.bge.0.stats.rx.FramesTooLong: 799, as the value matched the Ierrs value in the output of /usr/bin/netstat -i -b -n -I bge0. So the question is: what does FramesTooLong mean? I've had a google around but haven't been able to find any explanation. Initially I thought it might be related to TTL, but I think TTL applies to packets rather than frames and is therefore unrelated?
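
To make that comparison easier to repeat, here is a small sketch that filters `sysctl dev.bge` output down to InputErrors plus any non-zero receive error counter. The function name rx_error_summary is hypothetical, and the choice of which rx counters count as "errors" is my own assumption, not something taken from the driver documentation.

```sh
# Read `sysctl dev.bge` output on stdin and print, as name=value:
#   - the overall InputErrors counter, and
#   - every rx error-type counter that is non-zero.
# Assumption: the counters named in the pattern below are the error ones.
rx_error_summary() {
    awk -F': ' '
        $1 ~ /\.stats\.InputErrors$/ { print "InputErrors=" $2 }
        $1 ~ /\.stats\.rx\.(Fragments|FCSErrors|AlignmentErrors|FramesTooLong|Jabbers|UndersizePkts)$/ && $2 + 0 > 0 {
            name = $1
            sub(/^.*\./, "", name)   # keep only the last component of the OID
            print name "=" $2
        }
    '
}
```

Run as `sysctl dev.bge | rx_error_summary`; with the values posted above it would show InputErrors=800 and FramesTooLong=799 and nothing else.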

Any help gratefully received!
 

Attachments

  • bge0-errors&collisions-by-year.png
I would assume FramesTooLong means that somebody is sending you frames (packets) that are too long for your configuration. Gigabit Ethernet allows jumbo frames, but your MTU is only 1400. That is very short, considering that the normal non-jumbo Ethernet MTU is usually 1500.
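
To put rough numbers on that, the sketch below works out the largest frame each MTU implies, assuming standard Ethernet II framing with a 14-byte header and a 4-byte FCS. The function name max_frame_len is made up for this example, and the exact limit a given NIC or driver enforces may differ (for instance by allowing for a VLAN tag), so treat it as back-of-envelope arithmetic only.

```sh
# Largest Ethernet frame (header + payload + FCS) implied by an MTU.
# $1 = MTU in bytes; $2 = optional extra bytes (e.g. 4 for an 802.1Q tag).
# Assumes a 14-byte Ethernet header and a 4-byte FCS.
max_frame_len() {
    echo $(( $1 + 14 + 4 + ${2:-0} ))
}

max_frame_len 1400   # receiver configured with MTU 1400 -> 1418
max_frame_len 1500   # sender using the usual MTU 1500  -> 1518
```

So a full-size frame from a host using MTU 1500 can be up to 100 bytes longer than what an interface set to MTU 1400 expects.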
 
Depending on the switch you might also want to try fixing the speed/duplex settings on both sides (on the switch and the machine). I've seen the auto-detect cause packet errors too.
 
Thank you very much for taking the time to respond to my query.

Picking up on the MTU configuration: I have checked the other two servers and they have an MTU of 1500, whereas for some reason the machine with the input errors has an MTU of 1400 (set persistently in rc.conf too).

I'll look to alter that to 1500 out of hours this evening.

For completeness, the link speed on this network adapter is auto-negotiated to 100baseTX <full-duplex>:
# /sbin/ifconfig
Code:
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1400
        options=c019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
        ether e4:1f:13:XX:XX:XX
        inet 109.169.XXX.XXX netmask 0xffffff00 broadcast 109.169.XXX.XXX
        inet6 fe80::e61f:XXXX:XXXX:XXXX%bge0 prefixlen 64 scopeid 0x1
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
 
SirDice said:
Depending on the switch you might also want to try fixing the speed/duplex settings on both sides (on the switch and the machine). I've seen the auto-detect cause packet errors too.

That is OK if you are getting stuck at 10 Mbps half duplex and want to force 100 Mbps full duplex, but it is my understanding that IEEE Std 802.3ab requires auto-negotiation for 1000BASE-T (i.e. gigabit). It does more than set the speed and full/half duplex; it also sets the master and slave modes for the PHY clocking. I know gigabit negotiation generally fails if you don't have all four pairs connected, as was the case in many early 10BASE-T and 10/100 networks.

Here is a pretty good PDF article on this, although I think it is wrong when it says that you can run gigabit half duplex, because as far as I know there are no switches that support it.
 
Thank you all for your contributions.

I altered the MTU on the server to 1500 twelve hours ago, and since then both netstat's Ierrs and sysctl's dev.bge.0.stats.rx.FramesTooLong have remained at zero, so I think we have this problem resolved.

For future reference, I followed an example I found on the Internet and executed the following command...
Code:
ifconfig bge0 109.169.26.118 mtu 1500
at which point I lost remote connectivity. Once I gained KVMoIP access I discovered that the default gateway was unset (presumably as a result of the above command?).

In retrospect, I believe I should have used the following command, which doesn't seem to affect the default gateway (I tested it while I had the KVMoIP session available):
Code:
ifconfig bge0 mtu 1500
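
For anyone repeating this remotely, a quick check that the default route survived the change might look like the sketch below. The function name has_default_route is hypothetical, and it assumes the routing table output lists the default route with the literal destination "default", as FreeBSD's netstat -rn does.

```sh
# Succeed (exit status 0) if the routing table read on stdin
# contains a row whose destination is "default".
# Assumption: input is in `netstat -rn` format.
has_default_route() {
    awk '$1 == "default" { found = 1 } END { exit !found }'
}

# Intended usage on the server (not run here):
#   ifconfig bge0 mtu 1500
#   netstat -rn -f inet | has_default_route || echo "default route missing" >&2
```

This only detects the problem after the fact, so having out-of-band access such as KVMoIP ready is still the safer plan.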

I also altered /etc/rc.conf from
Code:
ifconfig_bge0="inet 109.169.26.118 netmask 255.255.255.0 MTU 1400"
to
Code:
ifconfig_bge0="inet 109.169.26.118 netmask 255.255.255.0"

Thanks again

Steve
 