Packet capture using libpcap sometimes results in 100% CPU interrupt load

Dear friends,

We use FreeBSD (10.3-RELEASE-p26) to route about 3.5 Gbit/s of traffic (peak, input + output on the main interface) at about 1 Mpps (peak, in + out). The router hardware is:

- 2 x 2.0GHz CPU with 8 cores each (HT disabled);
- Intel 82599ES NIC.


We often use the iftop and tcpdump utilities for troubleshooting. Two weeks ago I restructured our ipfw rules to get maximum performance from the server; peak CPU usage dropped from 80% to 20%. But after the optimization we found that libpcap-based utilities can hang the server for about 1-2 minutes. I noticed that during this time CPU usage is 100%; here is an example:
Code:
---------------
last pid:  2317;  load averages: 50.31, 25.49, 12.85  up 9+10:50:12    16:35:25
261 processes: 31 running, 139 sleeping, 91 waiting
CPU:  0.0% user,  0.0% nice,  0.0% system, 99.8% interrupt,  0.1% idle
Mem: 13M Active, 7367M Inact, 1453M Wired, 1949M Buf, 7023M Free
Swap: 32G Total, 32G Free

  PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   12 root          -92    -     0K  1552K WAIT    6  37.8H 100.00% intr{irq309: ix0:q6}
   12 root          -92    -     0K  1552K CPU5    5  37.4H 100.00% intr{irq308: ix0:q5}
   12 root          -92    -     0K  1552K WAIT    4  36.9H 100.00% intr{irq307: ix0:q4}
   12 root          -92    -     0K  1552K WAIT    3  36.8H 100.00% intr{irq306: ix0:q3}
   12 root          -92    -     0K  1552K WAIT    1  36.5H 100.00% intr{irq304: ix0:q1}
   12 root          -92    -     0K  1552K WAIT    7  36.2H 100.00% intr{irq310: ix0:q7}
   12 root          -92    -     0K  1552K CPU8    8  36.0H 100.00% intr{irq311: ix0:q8}
   12 root          -92    -     0K  1552K WAIT    2  35.9H 100.00% intr{irq305: ix0:q2}
   12 root          -92    -     0K  1552K CPU9    9  35.8H 100.00% intr{irq312: ix0:q9}
   12 root          -92    -     0K  1552K WAIT   10  35.7H 100.00% intr{irq313: ix0:q10}
   12 root          -92    -     0K  1552K CPU13  13  35.7H 100.00% intr{irq316: ix0:q13}
   12 root          -92    -     0K  1552K WAIT   15  35.5H 100.00% intr{irq318: ix0:q15}
   12 root          -92    -     0K  1552K CPU12  12  35.5H 100.00% intr{irq315: ix0:q12}
   12 root          -92    -     0K  1552K WAIT   11  35.4H 100.00% intr{irq314: ix0:q11}
   12 root          -92    -     0K  1552K WAIT   14  35.3H 100.00% intr{irq317: ix0:q14}
   12 root          -92    -     0K  1552K WAIT    0  36.5H  99.27% intr{irq303: ix0:q0}
    0 root          -92    -     0K  1232K -       3 570:16   2.59% kernel{dummynet}
5034 root           21    0 36092K  6088K CPU6    6   1:08   0.78% zebra
-------------------
The issue is not 100% reproducible; it occurs intermittently, I think only during peak hours. The following commands can trigger the problem:
Code:
iftop -n -i ix0 -f 'host 8.8.8.8'
tcpdump -n -i ix0 -c 10000 -w /tmp/test.pcap # just 10k packets
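For anyone hitting a similar case: FreeBSD exposes BPF buffer tunables and per-consumer statistics, so before and during a capture it is worth checking what the BPF side is doing. A sketch of diagnostic commands, assuming a stock 10.x install (the small snaplen in the last line is my suggestion to cut per-packet copy cost, not something from the original commands):
Code:
```shell
# List active BPF consumers (pid, interface, matched/dropped counts)
netstat -B

# Current default and maximum BPF buffer sizes (bytes)
sysctl net.bpf.bufsize net.bpf.maxbufsize

# Capture headers only: a 64-byte snaplen instead of the default
# full-packet snaplen reduces the amount copied per packet
tcpdump -n -i ix0 -s 64 -c 10000 -w /tmp/test.pcap
```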

At peak hours the CPU usage is:
Code:
--------------
last pid: 96942;  load averages:  2.54,  2.78,  2.86                                                                          up 1+08:41:17  13:37:09
247 processes: 19 running, 133 sleeping, 95 waiting
CPU:  0.0% user,  0.0% nice,  0.5% system, 18.4% interrupt, 81.1% idle
Mem: 8244K Active, 67M Inact, 1079M Wired, 5344K Cache, 1722M Buf, 14G Free
Swap: 32G Total, 32G Free

  PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   12 root          -92    -     0K  1552K WAIT    1 319:33  18.90% intr{irq304: ix0:q1}
   12 root          -92    -     0K  1552K WAIT    2 321:07  18.80% intr{irq305: ix0:q2}
   12 root          -92    -     0K  1552K WAIT   10 307:41  18.80% intr{irq313: ix0:q10}
   12 root          -92    -     0K  1552K WAIT   13 317:14  18.65% intr{irq316: ix0:q13}
   12 root          -92    -     0K  1552K WAIT    8 306:12  18.16% intr{irq311: ix0:q8}
   12 root          -92    -     0K  1552K CPU14  14 308:14  17.29% intr{irq317: ix0:q14}
   12 root          -92    -     0K  1552K WAIT    0 320:28  17.09% intr{irq303: ix0:q0}
   12 root          -92    -     0K  1552K WAIT    3 307:38  17.09% intr{irq306: ix0:q3}
   12 root          -92    -     0K  1552K WAIT    7 301:48  17.09% intr{irq310: ix0:q7}
   12 root          -92    -     0K  1552K WAIT    5 318:14  16.99% intr{irq308: ix0:q5}
   12 root          -92    -     0K  1552K WAIT    6 314:38  16.99% intr{irq309: ix0:q6}
   12 root          -92    -     0K  1552K WAIT   12 299:17  16.99% intr{irq315: ix0:q12}
   12 root          -92    -     0K  1552K WAIT   11 300:40  16.80% intr{irq314: ix0:q11}
   12 root          -92    -     0K  1552K WAIT    9 296:23  16.80% intr{irq312: ix0:q9}
   12 root          -92    -     0K  1552K WAIT   15 307:43  16.46% intr{irq318: ix0:q15}
   12 root          -92    -     0K  1552K CPU4    4 316:50  15.19% intr{irq307: ix0:q4}
----------------------

The system doesn't log any warning messages during the locked-up state. We never had any issues with libpcap before the ipfw optimization.

Below are some system statistics:
Code:
# vmstat -i
interrupt                          total       rate
irq9: acpi0                            2          0
irq16: ehci0                      177026          1
irq23: ehci1                       43018          0
cpu0:timer                     133183611       1126
irq264: isci0                          1          0
irq302: ahci0                     176460          1
irq303: ix0:q0                1863842791      15767
irq304: ix0:q1                1903054655      16098
irq305: ix0:q2                1882301512      15923
irq306: ix0:q3                1865735523      15783
irq307: ix0:q4                1886061578      15955
irq308: ix0:q5                1884457783      15941
irq309: ix0:q6                1861739489      15749
irq310: ix0:q7                1834255721      15516
irq311: ix0:q8                1855165357      15693
irq312: ix0:q9                1847769541      15631
irq313: ix0:q10               1857979208      15717
irq314: ix0:q11               1858923244      15725
irq315: ix0:q12               1850654141      15655
irq316: ix0:q13               1898652905      16061
irq317: ix0:q14               1877714212      15884
irq318: ix0:q15               1895406114      16034
irq319: ix0:link                       3          0
cpu12:timer                    133176550       1126
cpu8:timer                     133176445       1126
cpu3:timer                     133175506       1126
cpu9:timer                     133175554       1126
cpu4:timer                     133171377       1126
cpu11:timer                    133174188       1126
cpu5:timer                     133171832       1126
cpu14:timer                    133173391       1126
cpu1:timer                     133172305       1126
cpu10:timer                    133173320       1126
cpu6:timer                     133171610       1126
cpu15:timer                    133173242       1126
cpu2:timer                     133175370       1126
cpu13:timer                    133176789       1126
cpu7:timer                     133176319       1126
Total                        32054907693     271169


# vmstat -z | egrep 'REQ|mbuf'
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256, 6494460,   65801,   21737,23183069918,   0,   0
mbuf:                   256, 6494460,     374,   22398,20909643684,   0,   0
mbuf_cluster:          2048, 1014758,   87538,     226,   87538,   0,   0
mbuf_jumbo_page:       4096, 507379,       0,     188,     883,   0,   0
mbuf_jumbo_9k:         9216, 150334,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384,  84563,       0,       0,       0,   0,   0
mbuf_ext_refcnt:          4,      0,       0,       0,       0,   0,   0

Any help would be appreciated! Thanks.
 
Note that FreeBSD 10.3 has been End-of-Life since April 2018 and is not supported any more.

That's right. We're forced to stay on 10.3 for some time. We might be facing a well-known architectural issue or limitation unrelated to the operating system version, so I hope to get some comments from experts. Thanks.
 
Intel 10G interfaces have had interrupt storm issues for a while.

The Calomel network tuning guide is one of the best. See the tweaks there for your adapter.
https://calomel.org/freebsd_network_tuning.html
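To give a flavor of what that guide covers for this adapter, here is a sketch of the kind of ix(4) loader tunables it discusses. The values below are illustrative examples only, not a recipe; the right settings depend on the NIC, driver version, and workload:
Code:
```shell
# /boot/loader.conf -- illustrative ix(4) tuning, example values only
hw.ix.max_interrupt_rate="16000"  # cap interrupts per queue per second
hw.ix.rx_process_limit="-1"       # let each interrupt pass drain the RX ring
hw.ix.tx_process_limit="-1"       # same for the TX ring
```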

Note the guide's remarks on adapter choice; this is a common sentiment on most FreeBSD high-speed network tuning sites. Mellanox is the second choice.


Actually, there is an excellent presentation about FreeBSD routing by Olivier Cochard-Labbé (https://people.freebsd.org/~olivier...FreeBSD_for_routing_and_firewalling-Paper.pdf).

The server has already been tuned, and we don't experience interrupt storms during peak hours (CPU usage is fairly low, about 20%). But when we launch libpcap utilities we can trigger one. For example, if I capture 10k packets with tcpdump (which should take only about 10 ms at 1 Mpps), the server can hang for 1-2 minutes.
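The duration estimate is simple arithmetic; a quick sanity check with the numbers from this thread:
Code:
```python
# Time to capture 10,000 packets at a sustained 1 Mpps aggregate rate
packets = 10_000
pps = 1_000_000  # packets per second, in + out, from the figures above

capture_seconds = packets / pps
print(capture_seconds * 1000)  # -> 10.0 (milliseconds)
```
So the capture itself should be over in 10 ms; the 1-2 minute hang must come from somewhere else.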

Thanks.
 