poor network performance

Hi list.

I recently installed FreeBSD 7.4 on my server. I have this machine:

Code:
CPU: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz (2000.08-MHz 686-class CPU)
real memory  = 4831834112 (4607 MB)
avail memory = 4180480000 (3986 MB)
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  2
 cpu2 (AP): APIC ID:  4
 cpu3 (AP): APIC ID:  6
The server has 4 BGP sessions and 8 network cards: Intel and Broadcom.

Code:
bce0@pci0:14:0:0:       class=0x020000 card=0x7059103c chip=0x163914e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme II Gigabit Ethernet (BCM5709)'
    class      = network
    subclass   = ethernet
bce1@pci0:14:0:1:       class=0x020000 card=0x7059103c chip=0x163914e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme II Gigabit Ethernet (BCM5709)'
    class      = network
    subclass   = ethernet

bge0@pci0:3:4:0:        class=0x020000 card=0x703e103c chip=0x167814e4 rev=0xa3 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'BCM5715C 10/100/100 PCIe Ethernet Controller'
    class      = network
    subclass   = ethernet
bge1@pci0:3:4:1:        class=0x020000 card=0x703e103c chip=0x167814e4 rev=0xa3 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'BCM5715C 10/100/100 PCIe Ethernet Controller'
    class      = network
    subclass   = ethernet

The problem is this: my bce0 card is directly connected to my CMTS. When I ping the CMTS, I get high latency, like this:

Code:
PING 10.20.0.2 (10.20.0.2): 56 data bytes
64 bytes from 10.20.0.2: icmp_seq=0 ttl=255 time=0.439 ms
64 bytes from 10.20.0.2: icmp_seq=1 ttl=255 time=0.285 ms
64 bytes from 10.20.0.2: icmp_seq=2 ttl=255 time=0.280 ms
64 bytes from 10.20.0.2: icmp_seq=3 ttl=255 time=0.492 ms
64 bytes from 10.20.0.2: icmp_seq=4 ttl=255 time=0.257 ms
64 bytes from 10.20.0.2: icmp_seq=5 ttl=255 time=0.302 ms
64 bytes from 10.20.0.2: icmp_seq=6 ttl=255 time=0.342 ms
64 bytes from 10.20.0.2: icmp_seq=7 ttl=255 time=0.266 ms
[snip]
64 bytes from 10.20.0.2: icmp_seq=17 ttl=255 time=79.075 ms
64 bytes from 10.20.0.2: icmp_seq=18 ttl=255 time=12.466 ms
64 bytes from 10.20.0.2: icmp_seq=19 ttl=255 time=45.409 ms
64 bytes from 10.20.0.2: icmp_seq=20 ttl=255 time=45.705 ms
64 bytes from 10.20.0.2: icmp_seq=21 ttl=255 time=7.613 ms
64 bytes from 10.20.0.2: icmp_seq=22 ttl=255 time=7.436 ms
64 bytes from 10.20.0.2: icmp_seq=23 ttl=255 time=7.609 ms
64 bytes from 10.20.0.2: icmp_seq=24 ttl=255 time=7.541 ms
[snip]
64 bytes from 10.20.0.2: icmp_seq=28 ttl=255 time=113.203 ms
[snip]
64 bytes from 10.20.0.2: icmp_seq=36 ttl=255 time=8.471 ms
64 bytes from 10.20.0.2: icmp_seq=37 ttl=255 time=12.514 ms
64 bytes from 10.20.0.2: icmp_seq=38 ttl=255 time=24.049 ms
64 bytes from 10.20.0.2: icmp_seq=39 ttl=255 time=66.910 ms
64 bytes from 10.20.0.2: icmp_seq=40 ttl=255 time=88.233 ms

--- 10.20.0.2 ping statistics ---
41 packets transmitted, 41 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.226/18.730/113.203/27.405 ms
The response times are high and do not come back down. Sometimes I get packet loss too.

My kernel has these options:

Code:
device          pf
device          pflog
device          pfsync
options         ALTQ
options         ALTQ_CBQ        # Class Based Queueing (CBQ)
options         ALTQ_RED        # Random Early Detection (RED)
options         ALTQ_RIO        # RED In/Out
options         ALTQ_HFSC       # Hierarchical Packet Scheduler (HFSC)
options         ALTQ_PRIQ       # Priority Queuing (PRIQ)
options         ALTQ_NOPCC      # Required for SMP build
options         PAE
options         TCPDEBUG
options         IPSTEALTH
options         HZ=1000
options         ZERO_COPY_SOCKETS
Some sysctls that I changed:

Code:
kern.ipc.maxsockbuf=8388608
net.inet.tcp.rfc1323=1
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.interrupt=0
kern.ipc.somaxconn=1024
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.isr.direct=0
kern.ipc.nmbclusters=65535
When I enable net.isr.direct, I get even more packet loss and very poor performance.
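In case it's useful, I toggle it at runtime like this while testing (plain sysctl, nothing persisted):

Code:
# direct dispatch: process inbound packets in the interrupt context
sysctl net.isr.direct=1
# queue packets to the netisr software interrupt thread instead (what I run now)
sysctl net.isr.direct=0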

Some more information:

Code:
gw-ija# netstat -m
6816/3429/10245 mbufs in use (current/cache/total)
6814/2926/9740/65536 mbuf clusters in use (current/cache/total/max)
2431/1281 mbuf+clusters out of packet secondary zone in use (current/cache)
0/0/0/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
15374K/6709K/22083K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/6/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

mbuf_packet:              256,        0,     2390,     1322, 610806721,        0
mbuf:                     256,        0,     4389,     2144, 2353625703,        0
mbuf_cluster:            2048,    65536,     8099,     1641, 1143052524,        0
mbuf_jumbo_pagesize:     4096,    12800,        0,        0,        0,        0
mbuf_jumbo_9k:           9216,     6400,        0,        0,        0,        0
mbuf_jumbo_16k:         16384,     3200,        0,        0,        0,        0

gw# netstat -I bce0 -w 1
            input         (bce0)           output
   packets  errs      bytes    packets  errs      bytes colls
     19221     0    5929169      25578     0   22801897     0
     19063     0    6006409      23729     0   20472136     0
     18764     0    5946431      22351     0   19233524     0
     19289     0    6033689      25174     0   22177539     0
     19314     0    6040090      24675     0   21935126     0
     18844     0    5913801      22897     0   20083401     0

  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   12 root     171 ki31     0K     8K RUN     2 651:31 98.68% idle: cpu2
   13 root     171 ki31     0K     8K RUN     1 618:59 88.96% idle: cpu1
   11 root     171 ki31     0K     8K CPU3    3 704:50 86.96% idle: cpu3
   14 root     171 ki31     0K     8K CPU0    0 678:56 60.79% idle: cpu0
   17 root     -44    -     0K     8K CPU1    3  45:40 51.95% swi1: net
   34 root     -68    -     0K     8K WAIT    1  97:47  5.76% irq260: bce0
   41 root     -68    -     0K     8K WAIT    2  78:30  2.20% irq265: em3
I hope someone can help me with this. I don't know what I can do to solve this problem.

Regards.
 
Very interesting..
What's the state of net.inet.ip.fastforwarding? Is polling enabled? idlepoll?
I think you should take a look at these links: link1 link2
..and just do some experiments =)

It doesn't look like your router is at its limit; maybe it's some kind of hardware problem?
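
To check, something like this (just a sketch; polling needs options DEVICE_POLLING in the kernel and a driver that supports it):

Code:
# current state
sysctl net.inet.ip.fastforwarding
sysctl kern.polling            # lists the polling knobs, if DEVICE_POLLING is compiled in

# fast forwarding can be tried at runtime and turned back off just as easily
sysctl net.inet.ip.fastforwarding=1

# polling is enabled per interface; check the capabilities first
ifconfig -m bce0               # look for POLLING in the capabilities line
ifconfig bce0 polling          # only if the driver actually supports it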
 
The net.inet.ip.fastforwarding sysctl is disabled. But when I enable it, I get this:

Code:
PING 10.20.0.2 (10.20.0.2): 56 data bytes
64 bytes from 10.20.0.2: icmp_seq=0 ttl=255 time=0.261 ms
64 bytes from 10.20.0.2: icmp_seq=1 ttl=255 time=0.360 ms
64 bytes from 10.20.0.2: icmp_seq=2 ttl=255 time=0.257 ms
64 bytes from 10.20.0.2: icmp_seq=3 ttl=255 time=0.347 ms
64 bytes from 10.20.0.2: icmp_seq=4 ttl=255 time=0.426 ms
64 bytes from 10.20.0.2: icmp_seq=5 ttl=255 time=0.652 ms
64 bytes from 10.20.0.2: icmp_seq=6 ttl=255 time=0.345 ms
64 bytes from 10.20.0.2: icmp_seq=7 ttl=255 time=0.374 ms
64 bytes from 10.20.0.2: icmp_seq=8 ttl=255 time=0.423 ms
64 bytes from 10.20.0.2: icmp_seq=9 ttl=255 time=0.434 ms
64 bytes from 10.20.0.2: icmp_seq=10 ttl=255 time=0.521 ms
64 bytes from 10.20.0.2: icmp_seq=11 ttl=255 time=0.223 ms
64 bytes from 10.20.0.2: icmp_seq=13 ttl=255 time=13.811 ms
[snip]
64 bytes from 10.20.0.2: icmp_seq=19 ttl=255 time=11.713 ms
[snip]
64 bytes from 10.20.0.2: icmp_seq=23 ttl=255 time=12.104 ms
64 bytes from 10.20.0.2: icmp_seq=24 ttl=255 time=8.114 ms
64 bytes from 10.20.0.2: icmp_seq=25 ttl=255 time=14.251 ms
64 bytes from 10.20.0.2: icmp_seq=26 ttl=255 time=7.206 ms
64 bytes from 10.20.0.2: icmp_seq=27 ttl=255 time=1.253 ms
64 bytes from 10.20.0.2: icmp_seq=29 ttl=255 time=2.770 ms
64 bytes from 10.20.0.2: icmp_seq=30 ttl=255 time=3.491 ms
64 bytes from 10.20.0.2: icmp_seq=31 ttl=255 time=4.669 ms
64 bytes from 10.20.0.2: icmp_seq=32 ttl=255 time=6.934 ms
^C
--- 10.20.0.2 ping statistics ---
33 packets transmitted, 31 packets received, 6.1% packet loss
round-trip min/avg/max/stddev = 0.223/4.385/14.251/4.296 ms

The latency is still a little high, but now I get more packet loss.

I'm not using polling. Other information:

Code:
gw# sysctl -a |grep bce
hw.bce.msi_enable: 1
hw.bce.tso_enable: 1

gw# sysctl -a | grep no_buffers
dev.bce.0.com_no_buffers: 696074
dev.bce.1.com_no_buffers: 0

net.inet.tcp.inflight.enable: 0
net.inet.tcp.sendbuf_max: 16777216
net.inet.tcp.recvbuf_max: 16777216
kern.ipc.maxsockbuf: 16777216
net.inet.tcp.rfc1323: 1
net.inet.tcp.sack.enable: 1
net.inet.tcp.path_mtu_discovery: 1
kern.ipc.nmbclusters: 65535
The bce0 interface carries ~200 Mbps of traffic on average.
 
What's the value of net.inet.ip.intr_queue_drops? Can you post the output of [cmd=]vmstat -i[/cmd]?
Is there any output in /var/log/messages?
 
Code:
gw# sysctl net.inet.ip.intr_queue_drops
net.inet.ip.intr_queue_drops: 153478

Code:
gw# vmstat -i
interrupt                          total       rate
irq28: ciss0                       73203         51
irq1: atkbd0                          10          0
irq17: atapci0+                      242          0
irq22: uhci0                           2          0
cpu0: timer                      2840987       1999
irq256: em0                      3226711       2270
irq257: em0                        22794         16
irq258: em0                            2          0
irq259: em1                          862          0
irq260: bce0                    15819800      11132
irq261: bce1                       79257         55
irq262: em2                      2640994       1858
irq263: em2                      3344191       2353
irq264: em2                            1          0
irq265: em3                      9081915       6391
irq266: em3                      9187830       6465
irq267: em3                           14          0
irq268: bge0                     2587563       1820
irq269: bge1                     2082450       1465
cpu2: timer                      2832476       1993
cpu3: timer                      2832631       1993
cpu1: timer                      2832449       1993
Total                           59486384      41862

No important information in /var/log/messages. Only named's log.
 
Well, I think you have a massive interrupt load, which overruns net.inet.ip.intr_queue_maxlen. You can try increasing it -- this should decrease packet loss, but you will probably get more latency. IMHO bce is not a very "lucky" driver; you could try swapping it with another (less loaded) interface =)

Also, you have a big idle %, so I would go for polling + idlepoll..
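
As a sketch (2048 is just a starting point, not a magic value):

Code:
# current queue length and how much has already been dropped
sysctl net.inet.ip.intr_queue_maxlen
sysctl net.inet.ip.intr_queue_drops

# bump it at runtime and watch whether the drop counter keeps growing
sysctl net.inet.ip.intr_queue_maxlen=2048

# if it helps, make it permanent
echo 'net.inet.ip.intr_queue_maxlen=2048' >> /etc/sysctl.conf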
 
Alt,

First, I'd like to thank you for your attention to my situation.

I'll swap bce0 with em1. My em3 card carries high traffic too. I activated polling on that interface and configured the following; I don't know if it is the best value in this case:

Code:
kern.polling.idle_poll=1

I think that with polling I'll get more packet loss and higher latency. My 'top' shows this at the moment:

Code:
CPU:  0.7% user,  0.0% nice, 23.3% system, 17.3% interrupt, 58.7% idle
Mem: 301M Active, 1241M Inact, 368M Wired, 2248K Cache, 112M Buf, 2079M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   62 root     171 ki31     0K     8K CPU1    1  56:04 96.58% idlepoll
   11 root     171 ki31     0K     8K RUN     3  89:44 89.70% idle: cpu3
   12 root     171 ki31     0K     8K CPU2    2  83:42 58.79% idle: cpu2
   17 root     -44    -     0K     8K CPU0    0  40:40 55.47% swi1: net
   14 root     171 ki31     0K     8K RUN     0  84:51 52.69% idle: cpu0
   13 root     171 ki31     0K     8K RUN     1  89:49 43.99% idle: cpu1
   34 root     -68    -     0K     8K WAIT    2  30:23  5.66% irq260: bce0
   47 root     -68    -     0K     8K WAIT    3   6:46  3.96% irq17: bge1 atapc
   59 root       8    -     0K     8K pftm    1   3:29  1.46% pfpurge
   46 root     -68    -     0K     8K WAIT    2   3:56  0.78% irq16: bge0
   28 root     -68    -     0K     8K WAIT    2   8:26  0.68% irq256: em0
   36 root     -68    -     0K     8K WAIT    0   7:21  0.49% irq262: em2
   38 root     -68    -     0K     8K WAIT    1   0:46  0.10% irq263: em2
   41 root     -68    -     0K     8K WAIT    3  10:24  0.00% irq265: em3
My kernel HZ is 1000.

I increased the value of kern.polling.burst_max to 1000. What is kern.polling.lost_polls? Is it related to packet loss?

Code:
gw# sysctl -a | grep kern.poll
kern.polling.idlepoll_sleeping: 0
kern.polling.stalled: 361
kern.polling.suspect: 71269
kern.polling.phase: 2
kern.polling.enable: 0
kern.polling.handlers: 1
kern.polling.residual_burst: 0
kern.polling.pending_polls: 1
kern.polling.lost_polls: 313773
kern.polling.short_ticks: 2163
kern.polling.reg_frac: 20
kern.polling.user_frac: 50
kern.polling.idle_poll: 1
kern.polling.each_burst: 100
kern.polling.burst_max: 1000
kern.polling.burst: 407
Code:
gw# vmstat -i
interrupt                          total       rate
irq28: ciss0                      117426         12
irq1: atkbd0                          10          0
irq16: bge0                     14482893       1513
irq17: bge1 atapci+             13149855       1374
irq22: uhci0                           2          0
cpu0: timer                     19138204       1999
irq256: em0                     36586752       3823
irq257: em0                        77269          8
irq258: em0                            2          0
irq259: em1                         5874          0
irq260: bce0                   101454924      10601
irq261: bce1                     1444367        150
irq262: em2                     28758349       3005
irq263: em2                     36686501       3833
irq264: em2                            1          0
irq265: em3                     17662463       1845
irq266: em3                     17614867       1840
irq267: em3                        20866          2
cpu3: timer                     19129851       1998
cpu1: timer                     19129760       1998
cpu2: timer                     19129610       1998
Total                          344589846      36007
I really don't know if this is the best configuration.
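
For reference, this is roughly what the polling setup looks like here, assuming options DEVICE_POLLING is in the kernel (it has to be, otherwise the kern.polling sysctls would not exist); the values are just what I am running now:

Code:
# kernel config, in addition to the options in my first post
options         DEVICE_POLLING
options         HZ=1000

# /etc/sysctl.conf
kern.polling.idle_poll=1
kern.polling.burst_max=1000

# per interface, at runtime
ifconfig em3 polling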
 
What is kern.polling.lost_polls? Is it related to packet loss?
No, it's just about skipped poll ticks.

I don't know why your vmstat shows interrupts for em3.. Maybe polling is not activated? What if you try enabling it on all interfaces?
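
Something like this to try it on everything (just a sketch; drivers without polling support will refuse or simply ignore the flag):

Code:
# try to enable polling on every NIC and see which ones accept it
for i in em0 em1 em2 em3 bce0 bce1 bge0 bge1; do
        ifconfig $i polling
done
# POLLING should show up in the options= line of the interfaces that support it
ifconfig em3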
 