Strange behavior of the ae0 network interface on FreeBSD 8.0-RELEASE

What I have:
1.
Code:
% uname -sr
FreeBSD 8.0-RELEASE
2.
Code:
% ifconfig
ae0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=2018<VLAN_MTU,VLAN_HWTAGGING,WOL_MAGIC>
	ether 00:25:11:c5:7d:06
	inet 192.168.3.9 netmask 0xffffff00 broadcast 192.168.3.255
	media: Ethernet 100baseTX <full-duplex>
	status: active
vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=2808<VLAN_MTU,WOL_UCAST,WOL_MAGIC>
	ether 00:24:01:05:0b:a6
	inet 172.16.121.9 netmask 0xffffff00 broadcast 172.16.121.255
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active
plip0: flags=8810<POINTOPOINT,SIMPLEX,MULTICAST> metric 0 mtu 1500
pfsync0: flags=0<> metric 0 mtu 1460
	syncpeer: 224.0.0.240 maxupd: 128
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=3<RXCSUM,TXCSUM>
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
	inet6 ::1 prefixlen 128
	inet 127.0.0.1 netmask 0xff000000
pflog0: flags=0<> metric 0 mtu 33200
ng0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> metric 0 mtu 1460
	inet 10.255.255.4 --> 10.255.255.3 netmask 0xffffffff
3.
Code:
% dmesg -a | grep ae0
ae0: <Attansic Technology Corp, L2 FastEthernet> mem 0xfeac0000-0xfeafffff irq 16 at device 0.0 on pci1
IPSEC and PF were added to the kernel:
Code:
% cat /usr/src/sys/i386/conf/MYKERNELL_IPSEC_2010-04-02
include GENERIC
options         IPSEC
options         IPSEC_DEBUG
device          crypto

device          pf
device          pflog
device          pfsync

options         ALTQ
options         ALTQ_CBQ
options         ALTQ_RED
options         ALTQ_RIO
options         ALTQ_HFSC
options         ALTQ_CDNR
options         ALTQ_PRIQ
options         ALTQ_NOPCC
options         ALTQ_DEBUG

At irregular intervals, people at the branch office call to complain that the connection is down. I ssh to 10.255.255.4 and ping any address on the local network, and get this in response:
Code:
ping: sendto: No buffer space available

The logs at that moment show:
Code:
Feb 21 17:15:02 gateway kernel: ae0: watchdog timeout - resetting.
Feb 21 17:15:02 gateway kernel: ae0: link state changed to DOWN
Feb 21 17:15:04 gateway kernel: ae0: link state changed to UP
Feb 21 17:15:06 gateway kernel: ae0: Size mismatch: TxS:66 TxD:17593
Feb 21 17:15:06 gateway kernel: ae0: Received stray Tx interrupt(s).
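Since the failures come at irregular intervals, it can help to count how often the watchdog fires for a given interface. The following is a minimal sketch, assuming the kernel messages land in a syslog file in the same format as the excerpt above; the function name `count_watchdog` is hypothetical.

```shell
#!/bin/sh
# count_watchdog IFNAME
# Reads syslog lines on stdin and prints how many "watchdog timeout"
# messages the kernel logged for the named interface.
# (Hypothetical helper; the message format is taken from the excerpt above.)
count_watchdog() {
    ifname="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so "|| true" keeps the function's status at 0.
    grep -c -- "kernel: ${ifname}: watchdog timeout" || true
}
```

For example, `count_watchdog ae0 < /var/log/messages` would print the number of resets recorded in that file, assuming that is where kernel messages are logged on the machine.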

There is no question of excessive load on the machine: it is just a bridge to the central office and to the cheap unlimited-traffic channel there.

At the time of the failure:
Code:
% netstat -m
180/210/390 mbufs in use (current/cache/total)
128/134/262/25600 mbuf clusters in use (current/cache/total/max)
144/128 mbuf+clusters out of packet secondary zone in use (current/cache)
0/20/20/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
288K/413K/701K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/3/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
After cycling the interface with ifconfig ae0 down; ifconfig ae0 up:
Code:
% netstat -m
130/260/390 mbufs in use (current/cache/total)
128/134/262/25600 mbuf clusters in use (current/cache/total/max)
128/128 mbuf+clusters out of packet secondary zone in use (current/cache)
0/20/20/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
288K/413K/701K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/3/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
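Both snapshots report zero denied requests, so system-wide mbuf exhaustion looks unlikely as the source of "No buffer space available"; a stalled transmit queue on ae0 would seem a more plausible suspect, though that is an inference and not something the output proves. A minimal sketch for checking the denied counters automatically, assuming the `netstat -m` line format shown above (the function name `denied_total` is hypothetical):

```shell
#!/bin/sh
# denied_total
# Reads `netstat -m` output on stdin and prints the sum of every
# "denied" counter (mbufs, clusters, jumbo clusters, sfbufs).
# (Hypothetical helper; assumes the line format shown in this thread.)
denied_total() {
    awk '/ denied/ {
        # The first field is either a single number or N/N/N.
        n = split($1, parts, "/")
        for (i = 1; i <= n; i++) total += parts[i]
    }
    END { print total + 0 }'
}
```

Run as `netstat -m | denied_total`; a result of 0 at the moment of failure would match what the two listings show.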
Where should I dig?
P.S.
Another problem is that I cannot swap the network cards (one integrated plus one D-Link 520): there is only one PCI slot, so this is a very sore point ...

P.P.S.
Sorry, my English is bad; I used translate.google.com :stud
 
Keep in mind this question was asked more than 3 years ago.
 
If you still see the same issue on more recent FreeBSD releases (e.g. 8.4-RELEASE), please open a new PR and let me know the PR number.

A long time ago, I fixed a controller lockup issue in ae(4) in r227452, but that fix does not address the watchdog timeouts. The driver could probably reset the controller when it detects an abnormal condition, which in turn should get rid of the redundant watchdog timeouts.
 
I replied to the original PR but ok, I will open a new one.

PS: We had to insert another NIC in the affected server because it's a production machine.
 