High CPU interrupts on the router (igb driver). How to fix?

Hi

I have a server:
  • Supermicro 6016T-URF
  • CPU: 2 x Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2 x 4 cores, Hyper-Threading disabled)
  • RAM: 6 GB (3 x 2 GB DDR3)
  • HDD: 8 GB flash
  • OS: NanoBSD 8.2 p4 amd64

At ~530 Mbit/s of symmetric traffic and 81 kpps, the server is completely saturated by interrupt processing:
Code:
# top -aSP
last pid:  4616;  load averages:  1.91,  1.19,  0.67  up 0+12:49:46  15:44:28
188 processes: 36 running, 118 sleeping, 34 waiting
CPU 0:  0.0% user,  0.0% nice,  2.7% system, 97.3% interrupt,  0.0% idle
CPU 1:  0.0% user,  0.0% nice,  6.0% system, 93.3% interrupt,  0.7% idle
CPU 2:  0.0% user,  0.0% nice,  8.1% system, 91.9% interrupt,  0.0% idle
CPU 3:  0.0% user,  0.0% nice,  2.0% system, 98.0% interrupt,  0.0% idle
CPU 4:  0.0% user,  0.0% nice, 17.4% system, 77.2% interrupt,  5.4% idle
CPU 5:  2.0% user,  0.0% nice, 26.8% system, 63.8% interrupt,  7.4% idle
CPU 6:  0.0% user,  0.0% nice,  8.1% system, 87.2% interrupt,  4.7% idle
CPU 7:  2.7% user,  0.0% nice, 30.9% system, 55.0% interrupt, 11.4% idle
Mem: 38M Active, 13M Inact, 519M Wired, 1128K Cache, 77M Buf, 5338M Free
Swap:
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   12 root       58 -60    -     0K   928K WAIT    0  21.3H 686.23% [intr]
    0 root       40 -68    0     0K   624K -       7  26:11 36.52% [kernel]
   11 root        8 171 ki31     0K   128K RUN     0  80.1H 36.04% [idle]
...

Configuration:
Code:
# cat /boot/loader.conf
loader_logo="none"        
autoboot_delay="2"        
hw.ata.atapi_dma="0"      
hw.ata.ata_dma="0"        
hw.ata.wc="0"            
net.fibs=16               
kern.cam.boot_delay=10000 
hw.igb.rxd=4096
hw.igb.txd=4096
hw.igb.max_interrupt_rate=1000
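Whether the interrupt cap actually applies can be checked at runtime through the per-queue counters (just a spot check; the sysctl names match the igb(4) output quoted later in this thread):
Code:
# effective per-queue interrupt rates for igb0
sysctl dev.igb.0 | grep interrupt_rate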

Code:
# cat /etc/sysctl.conf
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.icmp.drop_redirect=1
net.inet.icmp.log_redirect=1
net.inet.ip.redirect=0
net.inet.tcp.drop_synfin=1
net.inet.ip.forwarding=1
net.inet.flowtable.enable=0

kern.coredump=0
kern.ipc.nmbclusters=512000

dev.igb.0.rx_processing_limit=4096
dev.igb.0.enable_aim=0
dev.igb.0.flow_control=0
dev.igb.1.rx_processing_limit=4096
dev.igb.1.enable_aim=0
dev.igb.1.flow_control=0
dev.igb.2.rx_processing_limit=4096
dev.igb.2.enable_aim=0
dev.igb.2.flow_control=0
dev.igb.3.rx_processing_limit=4096
dev.igb.3.enable_aim=0
dev.igb.3.flow_control=0
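Whether kern.ipc.nmbclusters really needs to be that high can be judged from live usage (a quick look, not a tuning rule):
Code:
# current mbuf/cluster usage vs. the configured limits
netstat -m | head -n 12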

Code:
# cat /etc/start_if.igb0
ifconfig igb0 -rxcsum -txcsum -lro -tso up
ifconfig igb1 -rxcsum -txcsum -lro -tso up
ifconfig igb2 -rxcsum -txcsum -lro -tso up
ifconfig igb3 -rxcsum -txcsum -lro -tso up
....
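As a sanity check, the enabled options can be compared with what the hardware supports (ifconfig -m prints both):
Code:
# enabled options vs. supported capabilities on igb0
ifconfig -m igb0 | grep -E 'options|capabilities'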

To balance the queue interrupts across the processor cores, I use this script:
Code:
# cat /usr/local/etc/rc.d/cpuset-igb
#!/bin/sh
# PROVIDE: cpuset-igb
# REQUIRE: FILESYSTEMS
# BEFORE:  netif
# KEYWORD: nojail

case "$1" in
*start)
  echo "Binding igb(4) IRQs to CPUs"
  cpus=`sysctl -n kern.smp.cpus`
  vmstat -ai | sed -E '/^irq.*que/!d; s/^irq([0-9]+): igb([0-9]+):que ([0-9]+).*/\1 \2 \3/' |\
  while read irq igb que
  do
    cpuset -l $(( ($igb+$que) % $cpus )) -x $irq
  done
  ;;
esac
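After the script has run, the bindings can be spot-checked per IRQ (IRQ 256 below is only an example number; take real ones from vmstat -ai):
Code:
# show which CPU set an interrupt source is pinned to
cpuset -g -x 256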

PF + ALTQ is used to limit per-user bandwidth.
Quagga is used for routing.


How can I fix the high interrupt rate?
 
Why are you hobbling the NIC so much?

Remove all your sysctl/loader settings for the NICs and then check the performance. Most of your settings remove hardware functions, forcing the CPU to take over.
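For a quick A/B test the offloads can be re-enabled at runtime, no reboot needed (a sketch for one interface; LRO is usually left off on a router because it interferes with forwarding):
Code:
# re-enable hardware checksum offload and TSO for testing
ifconfig igb0 rxcsum txcsum tso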
 
If you find the answer, please post it; I have a very similar setup. There is also a Russian forum for network operators with several threads discussing interrupt rates in FreeBSD. Also, please post the output of the following commands:

# top -aSCHIP
# sysctl dev.igb | grep -v ": 0"
# pfctl -si

Are you doing NAT in PF? And why do you prefer ALTQ over ipfw's dummynet?
 
Sorry for taking so long to reply to the topic.
I wanted to be sure that we had found the right solution.
After abandoning PF and switching to IPFW, the problem persisted.
We now use the standard settings for the network cards.
 
Almost no tuning at all.

Only the following parameters were changed:
Code:
# cat /boot/loader.conf | grep igb
hw.igb.rxd=4096
hw.igb.txd=4096
Code:
# cat /etc/sysctl.conf | grep nmb
kern.ipc.nmbclusters=512000
It may also be worth reducing hw.igb.max_interrupt_rate from the default 8000 to 1000, which lowers the number of interrupts when the machine is used as a terminating server.
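That is a one-line change, but hw.igb.max_interrupt_rate is a loader tunable, so it only takes effect after a reboot:
Code:
# cap the per-queue interrupt rate; applied at next boot
echo 'hw.igb.max_interrupt_rate=1000' >> /boot/loader.conf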
 
Well, that's strange.
I have:
  • Supermicro 5017C-MTRF (Intel SandyBridge/Cougar Point)
  • CPU: 1 x Intel(R) Xeon(R) CPU E3-1270 @ 3.40GHz (1 x 4 cores, Hyper-Threading disabled)
  • RAM: 8 GB (2 x 4 GB DDR3)
  • HDD: 2 x 1 TB (in gmirror)
  • NIC: 6x1GbE (4x Intel 82576, 2x onboard: Intel 82579LM and 82574L)
  • OS: FreeBSD 8.2-RELEASE amd64

It serves NAT (pf) and shaping (ipfw's dummynet) for ~3000 "home" customers; that is ~700 Mbit/s with shaping, or ~1000 Mbit/s with shaping turned off, and it is still far from CPU saturation. I can't show it right now (we're in off-peak hours), but interrupt load was around 20-30% per CPU, plus about 30% system load on the one core where dummynet runs.
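For reference, the per-customer shaping is the usual dummynet pattern of dynamic pipes keyed on the source address (a minimal sketch with made-up rate and prefix, not my production ruleset):
Code:
# one dynamic pipe per source IP, 10 Mbit/s each (example values)
ipfw pipe 1 config bw 10Mbit/s mask src-ip 0xffffffff
ipfw add 100 pipe 1 ip from 10.0.0.0/16 to any out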
My tuning:
/etc/sysctl.conf
Code:
dev.em.0.rx_processing_limit=1000
dev.em.1.rx_processing_limit=1000
dev.igb.0.dma_coalesce=1
dev.igb.0.flow_control=0
dev.igb.0.rx_processing_limit=4096
dev.igb.1.dma_coalesce=1
dev.igb.1.flow_control=0
dev.igb.1.rx_processing_limit=4096
dev.igb.2.dma_coalesce=1
dev.igb.2.flow_control=0
dev.igb.2.rx_processing_limit=4096
dev.igb.3.dma_coalesce=1
dev.igb.3.flow_control=0
dev.igb.3.rx_processing_limit=4096
kern.corefile="/var/tmp/%U/%N.core"
kern.ipc.maxsockbuf=2097152
kern.ipc.nmbclusters=262144
kern.ipc.somaxconn=4096
kern.timecounter.hardware=HPET
net.inet.ip.dummynet.expire=0
net.inet.ip.dummynet.io_fast=1
net.inet.ip.fastforwarding=1
net.inet.ip.intr_queue_maxlen=3000
net.inet.ip.process_options=0
net.inet.ip.redirect=0
net.inet.ip.stealth=1
net.inet.tcp.delayed_ack=0
net.inet.tcp.drop_synfin=1
net.inet.tcp.recvspace=65228
net.inet.tcp.sendspace=65228
net.inet.tcp.syncookies=1
net.inet.udp.maxdgram=57344
net.inet.udp.recvspace=65228
net.raw.recvspace=64000
net.raw.sendspace=64000
Note that I removed dev.igb.1.enable_aim=0, so adaptive interrupt moderation stays at its default (enabled).

/boot/loader.conf
Code:
autoboot_delay="1"
geom_mirror_load="YES"
hw.em.rxd=4096
hw.em.txd=4096
hw.igb.rxd=4096
hw.igb.txd=4096
ichsmb_load="YES"
if_igb_load="YES"
net.isr.maxthreads=4
 
Here is the server at about 50% of peak load:
Code:
# top -aSCHIP
last pid: 80248;  load averages:  0.03,  0.01,  0.00    up 20+13:52:39  15:00:20
203 processes: 11 running, 136 sleeping, 56 waiting
CPU 0:  0.0% user,  0.0% nice,  0.5% system, 19.2% interrupt, 80.3% idle
CPU 1:  0.0% user,  0.0% nice,  1.4% system, 23.6% interrupt, 75.0% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system, 35.1% interrupt, 64.9% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system, 13.5% interrupt, 86.5% idle
CPU 4:  0.0% user,  0.0% nice,  0.5% system, 16.9% interrupt, 82.6% idle
CPU 5:  0.0% user,  0.0% nice,  7.2% system, 25.5% interrupt, 67.3% idle
CPU 6:  0.0% user,  0.0% nice,  0.5% system, 31.3% interrupt, 68.3% idle
CPU 7:  0.0% user,  0.0% nice,  0.5% system, 21.7% interrupt, 77.8% idle
Mem: 49M Active, 18M Inact, 590M Wired, 1072K Cache, 90M Buf, 5251M Free
Swap:

  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    CPU COMMAND
   11 root     171 ki31     0K   128K CPU6    6 408.4H 84.67% {idle: cpu6}
   11 root     171 ki31     0K   128K CPU3    3 407.4H 84.57% {idle: cpu3}
   11 root     171 ki31     0K   128K CPU7    7 411.3H 83.69% {idle: cpu7}
   11 root     171 ki31     0K   128K RUN     0 403.4H 80.47% {idle: cpu0}
   11 root     171 ki31     0K   128K RUN     4 403.4H 80.08% {idle: cpu4}
   11 root     171 ki31     0K   128K CPU5    5 401.8H 79.98% {idle: cpu5}
   11 root     171 ki31     0K   128K RUN     1 404.9H 76.56% {idle: cpu1}
   11 root     171 ki31     0K   128K CPU2    2 408.9H 62.70% {idle: cpu2}
   12 root     -68    -     0K   928K WAIT    2  22.7H 23.00% {irq266: igb1:que}
   12 root     -68    -     0K   928K WAIT    5  21.5H  8.59% {irq261: igb0:que}
   12 root     -68    -     0K   928K WAIT    6 795:18  8.25% {irq278: igb2:que}
   12 root     -68    -     0K   928K WAIT    4  22.3H  7.57% {irq268: igb1:que}
   12 root     -68    -     0K   928K WAIT    2  21.3H  6.69% {irq258: igb0:que}
   12 root     -68    -     0K   928K WAIT    7  20.6H  6.69% {irq263: igb0:que}
   12 root     -68    -     0K   928K WAIT    0  19.7H  5.96% {irq288: igb3:que}
   12 root     -68    -     0K   928K WAIT    2  19.2H  5.96% {irq290: igb3:que}
   12 root     -68    -     0K   928K WAIT    5  19.7H  5.66% {irq285: igb3:que}
   12 root     -68    -     0K   928K WAIT    1  19.5H  5.66% {irq289: igb3:que}
   12 root     -68    -     0K   928K WAIT    3  19.3H  5.66% {irq283: igb3:que}
   12 root     -68    -     0K   928K WAIT    3  22.3H  5.37% {irq259: igb0:que}
   12 root     -68    -     0K   928K WAIT    7  20.8H  5.37% {irq287: igb3:que}
   12 root     -68    -     0K   928K WAIT    5 829:49  5.37% {irq277: igb2:que}
   12 root     -68    -     0K   928K RUN     1  20.4H  5.27% {irq257: igb0:que}
   12 root     -68    -     0K   928K WAIT    7  22.7H  5.08% {irq271: igb1:que}
   12 root     -68    -     0K   928K WAIT    4  20.0H  5.08% {irq260: igb0:que}
   12 root     -68    -     0K   928K WAIT    6  20.1H  4.98% {irq262: igb0:que}
   12 root     -68    -     0K   928K WAIT    2 827:05  4.98% {irq274: igb2:que}
   12 root     -68    -     0K   928K WAIT    3  25.0H  4.30% {irq267: igb1:que}
   12 root     -68    -     0K   928K CPU1    1  23.0H  4.20% {irq265: igb1:que}
   12 root     -68    -     0K   928K WAIT    4  19.6H  4.20% {irq284: igb3:que}
   12 root     -68    -     0K   928K WAIT    0  21.1H  4.05% {irq272: igb1:que}
   12 root     -68    -     0K   928K WAIT    5  25.3H  3.86% {irq269: igb1:que}
   12 root     -68    -     0K   928K WAIT    6  24.5H  3.86% {irq270: igb1:que}
   12 root     -68    -     0K   928K WAIT    0  17.0H  3.76% {irq280: igb2:que}
   12 root     -68    -     0K   928K WAIT    3 871:20  3.76% {irq275: igb2:que}
   12 root     -68    -     0K   928K WAIT    6  20.3H  3.37% {irq286: igb3:que}
   12 root     -68    -     0K   928K WAIT    0  18.6H  3.17% {irq256: igb0:que}
   12 root     -68    -     0K   928K WAIT    7 813:00  2.49% {irq279: igb2:que}
   12 root     -68    -     0K   928K WAIT    4 838:55  2.10% {irq276: igb2:que}
   12 root     -68    -     0K   928K WAIT    1 812:48  1.56% {irq281: igb2:que}
    0 root     -68    0     0K   624K -       6  88:51  0.68% {igb3 que}
    0 root     -68    0     0K   624K -       5  63:59  0.49% {igb2 que}
    0 root     -68    0     0K   624K -       4 107:20  0.29% {igb3 que}
    0 root     -68    0     0K   624K -       4 103:32  0.29% {igb3 que}
    0 root     -68    0     0K   624K -       7 105:23  0.10% {igb3 que}
    0 root     -68    0     0K   624K -       6  94:42  0.10% {igb3 que}
Code:
...
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root        8 171 ki31     0K   128K RUN     0 3250.4 652.59% [idle]
   12 root       58 -60    -     0K   928K WAIT    0 626.1H 154.54% [intr]
    0 root       40 -68    0     0K   624K CPU0    0  51.2H  0.20% [kernel]
...
Code:
# sysctl dev.igb.0 | grep -v ": 0"
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.0.7
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=0 function=0
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10c9 subvendor=0x15d9 subdevice=0x0600 class=0x020000
dev.igb.0.%parent: pci1
dev.igb.0.nvm: -1
dev.igb.0.flow_control: 3
dev.igb.0.enable_aim: 1
dev.igb.0.rx_processing_limit: 100
dev.igb.0.link_irq: 2
dev.igb.0.device_control: 1087373889
dev.igb.0.rx_control: 67141634
dev.igb.0.interrupt_mask: 4
dev.igb.0.extended_int_mask: 2147484031
dev.igb.0.fc_high_water: 58976
dev.igb.0.fc_low_water: 58960
dev.igb.0.queue0.interrupt_rate: 5319
dev.igb.0.queue0.txd_head: 2244
dev.igb.0.queue0.txd_tail: 2244
dev.igb.0.queue0.tx_packets: 4327003233
dev.igb.0.queue0.rxd_head: 737
dev.igb.0.queue0.rxd_tail: 736
dev.igb.0.queue0.rx_packets: 3322581729
dev.igb.0.queue0.rx_bytes: 1742207752340
dev.igb.0.queue1.interrupt_rate: 11494
dev.igb.0.queue1.txd_head: 2076
dev.igb.0.queue1.txd_tail: 2076
dev.igb.0.queue1.tx_packets: 4375544846
dev.igb.0.queue1.rxd_head: 83
dev.igb.0.queue1.rxd_tail: 82
dev.igb.0.queue1.rx_packets: 3684675667
dev.igb.0.queue1.rx_bytes: 2139783241314
dev.igb.0.queue2.interrupt_rate: 5586
dev.igb.0.queue2.txd_head: 2566
dev.igb.0.queue2.txd_tail: 2566
dev.igb.0.queue2.tx_packets: 4432057603
dev.igb.0.queue2.rxd_head: 492
dev.igb.0.queue2.rxd_tail: 491
dev.igb.0.queue2.rx_packets: 3827790316
dev.igb.0.queue2.rx_bytes: 1948270249183
dev.igb.0.queue3.interrupt_rate: 5208
dev.igb.0.queue3.txd_head: 3848
dev.igb.0.queue3.txd_tail: 3848
dev.igb.0.queue3.tx_packets: 4212193156
dev.igb.0.queue3.rxd_head: 2500
dev.igb.0.queue3.rxd_tail: 2499
dev.igb.0.queue3.rx_packets: 3914865092
dev.igb.0.queue3.rx_bytes: 2270280023651
dev.igb.0.queue4.interrupt_rate: 100000
dev.igb.0.queue4.txd_head: 2100
dev.igb.0.queue4.txd_tail: 2100
dev.igb.0.queue4.tx_packets: 4542669846
dev.igb.0.queue4.rxd_head: 129
dev.igb.0.queue4.rxd_tail: 128
dev.igb.0.queue4.rx_packets: 3466580097
dev.igb.0.queue4.rx_bytes: 1852777506148
dev.igb.0.queue5.interrupt_rate: 90909
dev.igb.0.queue5.txd_head: 2552
dev.igb.0.queue5.txd_tail: 2552
dev.igb.0.queue5.tx_packets: 4423941372
dev.igb.0.queue5.rxd_head: 1737
dev.igb.0.queue5.rxd_tail: 1736
dev.igb.0.queue5.rx_packets: 3792197321
dev.igb.0.queue5.rx_bytes: 1911416002424
dev.igb.0.queue6.interrupt_rate: 111111
dev.igb.0.queue6.txd_head: 2374
dev.igb.0.queue6.txd_tail: 2374
dev.igb.0.queue6.tx_packets: 4245675167
dev.igb.0.queue6.rxd_head: 3094
dev.igb.0.queue6.rxd_tail: 3093
dev.igb.0.queue6.rx_packets: 3512654870
dev.igb.0.queue6.rx_bytes: 1869429177084
dev.igb.0.queue7.interrupt_rate: 5208
dev.igb.0.queue7.txd_head: 2100
dev.igb.0.queue7.txd_tail: 2100
dev.igb.0.queue7.tx_packets: 3924184087
dev.igb.0.queue7.rxd_head: 352
dev.igb.0.queue7.rxd_tail: 351
dev.igb.0.queue7.rx_packets: 3617440096
dev.igb.0.queue7.rx_bytes: 1907168851481
dev.igb.0.mac_stats.recv_oversize: 10830
dev.igb.0.mac_stats.total_pkts_recvd: 29181026617
dev.igb.0.mac_stats.good_pkts_recvd: 29138780487
dev.igb.0.mac_stats.bcast_pkts_recvd: 67669057
dev.igb.0.mac_stats.mcast_pkts_recvd: 718291
dev.igb.0.mac_stats.rx_frames_64: 17
dev.igb.0.mac_stats.rx_frames_65_127: 17123765384
dev.igb.0.mac_stats.rx_frames_128_255: 1306847068
dev.igb.0.mac_stats.rx_frames_256_511: 615261191
dev.igb.0.mac_stats.rx_frames_512_1023: 956901512
dev.igb.0.mac_stats.rx_frames_1024_1522: 9136005315
dev.igb.0.mac_stats.good_octets_recvd: 15874440363955
dev.igb.0.mac_stats.good_octets_txd: 33722994792393
dev.igb.0.mac_stats.total_pkts_txd: 34483276811
dev.igb.0.mac_stats.good_pkts_txd: 34483276811
dev.igb.0.mac_stats.bcast_pkts_txd: 3454074
dev.igb.0.mac_stats.mcast_pkts_txd: 11850
dev.igb.0.mac_stats.tx_frames_64: 2261100453
dev.igb.0.mac_stats.tx_frames_65_127: 7586914868
dev.igb.0.mac_stats.tx_frames_128_255: 1580078107
dev.igb.0.mac_stats.tx_frames_256_511: 806007997
dev.igb.0.mac_stats.tx_frames_512_1023: 1097234517
dev.igb.0.mac_stats.tx_frames_1024_1522: 21151940869
dev.igb.0.interrupts.asserts: 41140283749
dev.igb.0.interrupts.rx_pkt_timer: 29138395477
dev.igb.0.interrupts.tx_abs_timer: 29138780479
dev.igb.0.interrupts.tx_queue_empty: 34483069509
dev.igb.0.host.rx_pkt: 385011
dev.igb.0.host.tx_good_pkt: 195453
dev.igb.0.host.rx_good_bytes: 15874440627247
dev.igb.0.host.tx_good_bytes: 33722990898052
dev.igb.0.host.length_errors: 37370
Code:
# netstat -w 1
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    181200     0     0  139645485     180886     0  140481650     0
    174278     0     0  130387413     174772     0  131349972     0
    178319     0     0  134284915     178679     0  135237713     0
 
Hmm...
Increase dev.igb.X.rx_processing_limit (to 4096) and hw.igb.max_interrupt_rate (to 30000).
Remove the FLOWTABLE and POLLING options from your kernel configuration.
Maybe that helps.
Also try profiling where kernel time goes with pmcstat(8).
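A minimal pmcstat(8) session looks like this, assuming hwpmc(4) supports your CPU; MYKERNEL is a placeholder for your kernel config name:
Code:
# confirm the options above are gone from the kernel config (path assumed)
grep -E 'FLOWTABLE|POLLING' /usr/src/sys/amd64/conf/MYKERNEL
# load the sampling driver and watch where time goes, top-style
kldload hwpmc
pmcstat -T -S instructions -w 1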
 