server drops off network

I'm running FreeBSD 8.0-RELEASE on a Supermicro X7SPA-H with dual GbE. The two interfaces are bridged (using if_bridge), with em0 connected to my PC and em1 connected to a 100Mb router.

After several days of uptime, em1 suddenly stops responding to anything. I cannot ping it from the router, and I cannot ping the router from the console. I can ping the server from my PC and vice versa, so em0 is still working. Nothing unusual appears in /var/log/messages.

The symptoms look exactly like this OpenSolaris issue with the same board, which I don't think was ever solved.

What can I do to further diagnose this problem next time it happens? The only other computer on the network is my WinXP desktop.
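One way to have evidence from the next dropout is to record the interface and mbuf state periodically, so the log shows what things looked like just before and after the link died. This is a minimal sketch assuming a root shell on the FreeBSD box. The function name snapshot, the 192.168.1.1 router address, and the log path are placeholders, not values from this thread. DRY_RUN=1 prints the commands instead of running them.

```sh
#!/bin/sh
# Sketch: capture network state for later diagnosis (FreeBSD, run as root).
# Assumptions: IFACE is the failing port; ROUTER is only a placeholder for
# the real router address. Set DRY_RUN=1 to print commands instead.
IFACE=${IFACE:-em1}
ROUTER=${ROUTER:-192.168.1.1}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot() {
    run date
    run ifconfig "$IFACE"      # link status, flags, enabled options
    run netstat -m             # mbuf and cluster usage, denied requests
    run netstat -i             # per-interface packet and error counters
    run ping -c 3 "$ROUTER"    # is the router reachable right now?
}
```

For example, from a root shell: while :; do snapshot >> /var/log/netdiag.log 2>&1; sleep 300; done (the log path is hypothetical).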
 
Try disabling some hardware offload features:

Code:
ifconfig em1 -tso -rxcsum -txcsum

If that works, try enabling them one at a time until you find the problematic one. Let us know if you find the solution... :)
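The one-at-a-time step can be scripted so that each stage always starts from the all-disabled baseline and then re-enables the first N features. This is a hedged sketch: the helper name offload_step and the order rxcsum, txcsum, tso are hypothetical choices. txcsum is placed before tso on the assumption that TSO generally relies on transmit checksum offload. DRY_RUN=1 prints the ifconfig commands instead of running them.

```sh
#!/bin/sh
# Sketch: bisect em(4) offload features on one interface (FreeBSD, root).
# offload_step IFACE STEP: disable tso/rxcsum/txcsum, then re-enable the
# first STEP features in the order rxcsum, txcsum, tso.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

offload_step() {
    iface=$1
    step=$2
    # Baseline: everything off, as suggested above.
    run ifconfig "$iface" -tso -rxcsum -txcsum
    i=0
    # Assumption: txcsum before tso, since TSO usually depends on TX checksums.
    for feature in rxcsum txcsum tso; do
        [ "$i" -lt "$step" ] || break
        run ifconfig "$iface" "$feature"
        i=$((i + 1))
    done
}
```

Usage: run offload_step em1 0 and wait; if em1 stays up for longer than the usual days between dropouts, move on to offload_step em1 1, then 2, then 3.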
 
aragon said:
Try disabling some hardware offload features:

Code:
ifconfig em1 -tso -rxcsum -txcsum

No dice. Did the exact same thing again today. This time I got a new error message. I ran ifconfig em1 down; ifconfig em1 up and got this:

Code:
em1: Could not setup receive structures
 
ScottJ97 said:
No dice. Did the exact same thing again today. This time I got a new error message. I ran ifconfig em1 down; ifconfig em1 up and got this:

Code:
em1: Could not setup receive structures

Looking through the source code, it seems this has to do with mbufs. I will try changing kern.ipc.nmbclusters from 32768 to 65536 and see if that helps.
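On FreeBSD, kern.ipc.nmbclusters can be set as a boot-time tunable in /boot/loader.conf. A minimal fragment for the change described above (65536 is simply the value proposed here, not a recommended size):

```
# /boot/loader.conf
# Raise the mbuf cluster limit at boot (previously 32768).
kern.ipc.nmbclusters="65536"
```

After a reboot, sysctl kern.ipc.nmbclusters should report the new value, and the max column of the "mbuf clusters in use" line in netstat -m should match it.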
 
ScottJ97 said:
Looking through the source code, it seems this has to do with mbufs. I will try changing kern.ipc.nmbclusters from 32768 to 65536 and see if that helps.

Another dropout today. I logged in from the console and ran ifconfig em1 down; ifconfig em1 up, and everything was fine again. That behavior is new since I changed nmbclusters: last time, bouncing the interface failed with "Could not setup receive structures".

After doing that, I ran netstat -m:

Code:
45866/20524/66390 mbufs in use (current/cache/total)
45727/19809/65536/65536 mbuf clusters in use (current/cache/total/max)
43366/591 mbuf+clusters out of packet secondary zone in use (current/cache)
0/260/260/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
102920K/45789K/148709K bytes allocated to network (current/cache/total)
0/3538/1768 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
2 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Then I ran ifconfig:

Code:
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
	ether 00:25:90:02:16:54
	media: Ethernet autoselect
	status: no carrier
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
	ether 00:25:90:02:16:55
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=3<RXCSUM,TXCSUM>
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3 
	inet6 ::1 prefixlen 128 
	inet 127.0.0.1 netmask 0xff000000 
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 00:25:90:02:16:54
	inet 192.168.1.93 netmask 0xffffff00 broadcast 192.168.1.255
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 100 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: em1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 2 priority 128 path cost 2000000
	member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 1 priority 128 path cost 2000000

Is there anything in there that stands out? I don't know enough about the network stack to interpret much of anything in there.
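In the netstat -m output above, two lines stand out: the mbuf cluster total has reached its max (65536 of 65536), and the "requests for mbufs denied" line shows 3538 denied cluster requests and 1768 denied mbuf+cluster requests. A small sketch that flags both conditions from netstat -m output on stdin; the name check_mbufs is hypothetical, and it assumes the slash-separated first column shown in this thread.

```sh
#!/bin/sh
# Sketch: flag mbuf cluster exhaustion and denied requests in netstat -m
# output read from stdin. Assumes the FreeBSD 8.0 layout shown above.
check_mbufs() {
    awk '
    /mbuf clusters in use/ {
        split($1, f, "/")            # current/cache/total/max
        if (f[3] + 0 >= f[4] + 0)
            printf "clusters exhausted: total %s of max %s\n", f[3], f[4]
    }
    /requests for mbufs denied/ {
        split($1, d, "/")            # mbufs/clusters/mbuf+clusters
        if (d[1] + d[2] + d[3] > 0)
            printf "denied requests: mbufs=%s clusters=%s mbuf+clusters=%s\n", d[1], d[2], d[3]
    }'
}
```

Usage: netstat -m | check_mbufs prints nothing while the pool is healthy.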
 
Maybe if you google "nmbclusters AND em1 OR em0", someone has found something. If there are too many results, you can add "supermicro OR gbe" to the query.

On the other hand, bsdstats.org has, in one of its menus, listings of drivers (em, etc.) by number (or listed in other ways). Maybe you would want a cheaper, slower NIC that works better, if the board has enough slots, IRQs, etc.
 
After transferring several gigabytes, netstat -m still shows low mbuf usage, so I think it's safe to say this is now solved.
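To keep confirming that mbuf usage stays low, the cluster line of netstat -m can be reduced to a one-line summary and logged over time; a current count that keeps climbing while traffic is light would suggest the leak is back. A minimal sketch; the name cluster_usage and any log path you append it to are hypothetical.

```sh
#!/bin/sh
# Sketch: summarize mbuf cluster usage from netstat -m on stdin.
# Assumes the "current/cache/total/max" layout shown in this thread.
cluster_usage() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")                  # f[1]=current, f[4]=max
        if (f[4] + 0 > 0)
            printf "mbuf clusters: %d in use of max %d (%d%%)\n",
                f[1], f[4], f[1] * 100 / f[4]
    }'
}
```

For example, run echo "$(date) $(netstat -m | cluster_usage)" >> /var/log/mbuf-usage.log every few minutes from a root shell loop or a small cron script.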
 