Random networking issues

Hi everyone,

I've recently been having some weird networking issues on my server running FreeBSD 9.0-RELEASE. It's a busy web server, and the issues show up when it's getting many hits. I've been tweaking it for some time, but I'm afraid I may have over-tweaked something and don't know where to go from here. If someone could help me recover, I would greatly appreciate it.

The issues: connections get terminated; sometimes I can't even connect (connection reset); and sometimes, when I'm on the box, name resolution fails against both remote and local DNS servers. I think I'm hitting some kind of network limit but don't know how to check. I'm also running ipfilter as my firewall. Could it be hitting some limit too? How would I check?

Thanks in advance.

Here are some of my config files.

/boot/loader.conf
Code:
autoboot_delay="2"
accf_http_load="YES"
accf_data_load="YES"
accf_dns_load="YES"

if_tap_load="YES"
if_bridge_load="YES"

mfi_linux_load="YES"

# network tuning
net.inet.tcp.tcbhashsize=8192
net.inet.tcp.hostcache.hashsize=8192
net.inet.tcp.hostcache.bucketlimit=400
net.inet.tcp.hostcache.cachelimit=524288

net.inet.tcp.syncache.hashsize=8192
net.inet.tcp.syncache.bucketlimit=400
net.inet.tcp.syncache.cachelimit=524288

net.link.ifqmaxlen=1024

/etc/sysctl.conf
Code:
# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0

security.jail.set_hostname_allowed=0
security.jail.allow_raw_sockets=1

# https://calomel.org/network_performance.html
kern.ipc.maxsockbuf=16777216

# network tuning
# http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel
kern.ipc.somaxconn=32768
kern.ipc.nmbclusters=524288
net.inet.ip.portrange.first=30000
kern.ipc.maxsockets=204800
net.inet.tcp.maxtcptw=200000
net.inet.tcp.fast_finwait2_recycle=1

net.inet.tcp.sendbuf_max=16777216 
net.inet.tcp.recvbuf_max=16777216

net.local.stream.recvspace=65535
net.local.stream.sendspace=65535

kern.threads.max_threads_per_proc=4096

net.inet.ip.intr_queue_maxlen=4096

# stops route cache degradation during a high-bandwidth flood
# http://www.freebsd.org/doc/en/books/handbook/securing-freebsd.html
#net.inet.ip.rtexpire=2
net.inet.ip.rtminexpire=2
net.inet.ip.rtmaxcache=4096

# http://klaver.it/bsd/sysctl.conf
net.inet.udp.maxdgram=57344
net.inet.udp.recvspace=256960
net.inet.ip.process_options=0

Here is my vmstat -z output:
http://pastebin.com/pkLuBCHj

netstat -m:
Code:
7660/8990/16650 mbufs in use (current/cache/total)
6121/3869/9990/524288 mbuf clusters in use (current/cache/total/max)
6121/3863 mbuf+clusters out of packet secondary zone in use (current/cache)
0/1352/1352/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
14157K/15393K/29550K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
62 requests for I/O initiated by sendfile
0 calls to protocol drain routines
 
RE: Ipfilter

Several years ago ipfilter could not run reliably on multi-processor/core systems because of locking issues. I am not sure whether that has changed. See http://markmail.org/message/7nealx7qfgbq2jdh#query:+page:1+mid:4fu6uerlukwmwuel+state:results And I also don't know whether you have an SMP machine ;)

RE: Tuning

The problem with most 'tuning' guides is that they are quite old and may no longer be applicable. If I don't know exactly what a sysctl does, I don't touch it. I would recommend sticking with what tuning(7) says.

RE: 'net.inet.tcp.sendbuf_max=16777216' and 'net.inet.tcp.recvbuf_max=16777216'

This could cause your system to run out of memory. From tuning(7):

Code:
The net.inet.tcp.sendspace and net.inet.tcp.recvspace sysctls are of par-
     ticular interest if you are running network intensive applications.  They
     control the amount of send and receive buffer space allowed for any given
     TCP connection.  The default sending buffer is 32K; the default receiving
     buffer is 64K.  You can often improve bandwidth utilization by increasing
     the default at the cost of eating up more kernel memory for each connec-
     tion.  [B]We do not recommend increasing the defaults if you are serving
     hundreds or thousands of simultaneous connections because it is possible
     to quickly run the system out of memory due to stalled connections build-
     ing up.[/B]  But if you need high bandwidth over a fewer number of connec-
     tions, especially if you have gigabit Ethernet, increasing these defaults
     can make a huge difference.  You can adjust the buffer size for incoming
     and outgoing data separately.  For example, if your machine is primarily
     doing web serving you may want to decrease the recvspace in order to be
     able to increase the sendspace without eating too much kernel memory.
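For a box that is primarily doing web serving, that last suggestion might translate into something like the following /etc/sysctl.conf fragment (the values are illustrative only, not a recommendation):

```shell
# /etc/sysctl.conf -- illustrative values only; see tuning(7) first.
# Web serving is mostly outbound, so keep a moderate send buffer...
net.inet.tcp.sendspace=65536
# ...and shrink the receive buffer, since inbound traffic is mostly
# small HTTP requests.
net.inet.tcp.recvspace=16384
```

The point is the asymmetry, not the particular numbers: per-connection kernel memory scales with these values times the number of simultaneous connections.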

RE: Apache

Have you checked /var/log/httpd-error.log for messages like
Code:
[info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers)
?
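If those messages do appear, the directives they mention live in httpd.conf (prefork MPM). Purely as an illustration, not tuned values:

```apache
# httpd.conf (prefork MPM) -- example numbers only; size to your traffic
<IfModule mpm_prefork_module>
    StartServers         10
    MinSpareServers      10
    MaxSpareServers      30
    MaxClients          256
</IfModule>
```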

RE: Network or TCP/IP errors

Does netstat -ib report collisions or a significant percentage of input/output errors? Check netstat -ss for detailed statistics for each network protocol.
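To put a number on "significant percentage," you can divide the Ierrs column by Ipkts. A small sketch using awk over sample netstat -ib-style output (column positions vary between FreeBSD versions, so adjust the field numbers for your system):

```shell
#!/bin/sh
# Compute the input-error percentage per interface from netstat -ib-style
# output.  The here-document stands in for the real command; columns here
# are: Name  Ipkts  Ierrs  Opkts  Oerrs
cat <<'EOF' |
em0 1500000 150 1200000 0
lo0  800000   0  800000 0
EOF
awk 'NF == 5 && $2 > 0 {
    printf "%s: %.4f%% input errors\n", $1, 100 * $3 / $2
}'
```

On a live system you would pipe the output of netstat -ib into the awk script instead of the sample here-document.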
 
J65nko,
First, thanks a lot for your response. I actually figured out what was wrong (more on that below).

The server has plenty of memory, so the send/receive buffers should not be an issue, but I'll decrease them anyway since they weren't the cause.
I have enough Apache instances not to worry about that either; I also cache static content, so the Apache load is minimal.
Luckily, I have zero errors and no collision or hardware network issues like you described. I'd dump my colo provider if I did :-)

Here is how I found and fixed the issue, in case someone else runs into the same thing.
The problem was in my ipf. I started checking usage with ipfstat -s and saw the number of active states climbing to about 4100, around the default value of net.inet.ipf.fr_statemax.
"Bingo!" went off in my head at that moment: I was hitting the limit. I bumped it up and all the issues were gone right away.
I monitored my active states for a while during peak time and saw a maximum of about 4400 at some point, then increased net.inet.ipf.fr_statemax to be just high enough to handle that kind of load.
I can sleep at night now. I only wish I'd discovered ipfstat earlier.
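For reference, the fix boils down to a /etc/sysctl.conf fragment like the one below. The number is an example sized to my observed peak of roughly 4400 states; size yours to your own load.

```shell
# /etc/sysctl.conf -- example value only; size to your own peak,
# as reported over time by: ipfstat -s
# The default fr_statemax is around 4000, which my peak traffic exceeded.
net.inet.ipf.fr_statemax=10000
# The state hash table (fr_statesize) is often raised alongside it, but on
# some versions it only takes effect when ipf is (re)loaded:
#net.inet.ipf.fr_statesize=10009
```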

Thanks again for your response.
 