Dear friends,
We use FreeBSD (10.3-RELEASE-p26) to route up to about 3.5 Gbit/s of traffic (input + output combined on the main interface) at up to about 1 Mpps (in + out). The router hardware is:
- 2 x 2.0GHz CPU with 8 cores each (HT disabled);
- Intel 82599ES NIC.
We often use iftop and tcpdump for troubleshooting. Two weeks ago I restructured the ipfw rules to get maximum performance from the server, and peak CPU usage dropped from 80% to 20%. Since that optimization, however, we have found that libpcap-based utilities can hang the server for 1-2 minutes. During a hang, CPU usage is at 100%, as in this example:
Code:
---------------
last pid: 2317; load averages: 50.31, 25.49, 12.85 up 9+10:50:12 16:35:25
261 processes: 31 running, 139 sleeping, 91 waiting
CPU: 0.0% user, 0.0% nice, 0.0% system, 99.8% interrupt, 0.1% idle
Mem: 13M Active, 7367M Inact, 1453M Wired, 1949M Buf, 7023M Free
Swap: 32G Total, 32G Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root -92 - 0K 1552K WAIT 6 37.8H 100.00% intr{irq309: ix0:q6}
12 root -92 - 0K 1552K CPU5 5 37.4H 100.00% intr{irq308: ix0:q5}
12 root -92 - 0K 1552K WAIT 4 36.9H 100.00% intr{irq307: ix0:q4}
12 root -92 - 0K 1552K WAIT 3 36.8H 100.00% intr{irq306: ix0:q3}
12 root -92 - 0K 1552K WAIT 1 36.5H 100.00% intr{irq304: ix0:q1}
12 root -92 - 0K 1552K WAIT 7 36.2H 100.00% intr{irq310: ix0:q7}
12 root -92 - 0K 1552K CPU8 8 36.0H 100.00% intr{irq311: ix0:q8}
12 root -92 - 0K 1552K WAIT 2 35.9H 100.00% intr{irq305: ix0:q2}
12 root -92 - 0K 1552K CPU9 9 35.8H 100.00% intr{irq312: ix0:q9}
12 root -92 - 0K 1552K WAIT 10 35.7H 100.00% intr{irq313: ix0:q10}
12 root -92 - 0K 1552K CPU13 13 35.7H 100.00% intr{irq316: ix0:q13}
12 root -92 - 0K 1552K WAIT 15 35.5H 100.00% intr{irq318: ix0:q15}
12 root -92 - 0K 1552K CPU12 12 35.5H 100.00% intr{irq315: ix0:q12}
12 root -92 - 0K 1552K WAIT 11 35.4H 100.00% intr{irq314: ix0:q11}
12 root -92 - 0K 1552K WAIT 14 35.3H 100.00% intr{irq317: ix0:q14}
12 root -92 - 0K 1552K WAIT 0 36.5H 99.27% intr{irq303: ix0:q0}
0 root -92 - 0K 1232K - 3 570:16 2.59% kernel{dummynet}
5034 root 21 0 36092K 6088K CPU6 6 1:08 0.78% zebra
-------------------
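The issue is not 100% reproducible. It occurs only occasionally, and I think only during peak hours. Commands like the following can trigger it:
Code:
---------------
iftop -n -i ix0 -f 'host 8.8.8.8'
tcpdump -n -i ix0 -c 10000 -w /tmp/test.pcap # just 10k packets
---------------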
For comparison, normal CPU usage at peak hours is:
Code:
--------------
last pid: 96942; load averages: 2.54, 2.78, 2.86 up 1+08:41:17 13:37:09
247 processes: 19 running, 133 sleeping, 95 waiting
CPU: 0.0% user, 0.0% nice, 0.5% system, 18.4% interrupt, 81.1% idle
Mem: 8244K Active, 67M Inact, 1079M Wired, 5344K Cache, 1722M Buf, 14G Free
Swap: 32G Total, 32G Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root -92 - 0K 1552K WAIT 1 319:33 18.90% intr{irq304: ix0:q1}
12 root -92 - 0K 1552K WAIT 2 321:07 18.80% intr{irq305: ix0:q2}
12 root -92 - 0K 1552K WAIT 10 307:41 18.80% intr{irq313: ix0:q10}
12 root -92 - 0K 1552K WAIT 13 317:14 18.65% intr{irq316: ix0:q13}
12 root -92 - 0K 1552K WAIT 8 306:12 18.16% intr{irq311: ix0:q8}
12 root -92 - 0K 1552K CPU14 14 308:14 17.29% intr{irq317: ix0:q14}
12 root -92 - 0K 1552K WAIT 0 320:28 17.09% intr{irq303: ix0:q0}
12 root -92 - 0K 1552K WAIT 3 307:38 17.09% intr{irq306: ix0:q3}
12 root -92 - 0K 1552K WAIT 7 301:48 17.09% intr{irq310: ix0:q7}
12 root -92 - 0K 1552K WAIT 5 318:14 16.99% intr{irq308: ix0:q5}
12 root -92 - 0K 1552K WAIT 6 314:38 16.99% intr{irq309: ix0:q6}
12 root -92 - 0K 1552K WAIT 12 299:17 16.99% intr{irq315: ix0:q12}
12 root -92 - 0K 1552K WAIT 11 300:40 16.80% intr{irq314: ix0:q11}
12 root -92 - 0K 1552K WAIT 9 296:23 16.80% intr{irq312: ix0:q9}
12 root -92 - 0K 1552K WAIT 15 307:43 16.46% intr{irq318: ix0:q15}
12 root -92 - 0K 1552K CPU4 4 316:50 15.19% intr{irq307: ix0:q4}
----------------------
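To put the two top snapshots side by side numerically, here is a minimal Python sketch that parses the `CPU:` summary line. The function name `parse_cpu_line` is hypothetical and used only for illustration; it is not part of any tool on the router.

```python
import re


def parse_cpu_line(line):
    """Return {state: percent} parsed from a top(1) 'CPU:' summary line."""
    if not line.startswith("CPU:"):
        raise ValueError("not a CPU summary line")
    # Each field looks like '99.8% interrupt'
    pairs = re.findall(r"([\d.]+)%\s+(\w+)", line)
    return {name: float(value) for value, name in pairs}


# The two summary lines quoted in this post
hung = parse_cpu_line(
    "CPU: 0.0% user, 0.0% nice, 0.0% system, 99.8% interrupt, 0.1% idle"
)
normal = parse_cpu_line(
    "CPU: 0.0% user, 0.0% nice, 0.5% system, 18.4% interrupt, 81.1% idle"
)

print("interrupt during hang:", hung["interrupt"])
print("interrupt normally:   ", normal["interrupt"])
```

In both snapshots nearly all non-idle time is interrupt time; user and system time stay close to zero.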
The system logs no warning messages while it is hung. We had no issues with libpcap tools before the ipfw optimization.
Below are some system statistics:
Code:
# vmstat -i
interrupt total rate
irq9: acpi0 2 0
irq16: ehci0 177026 1
irq23: ehci1 43018 0
cpu0:timer 133183611 1126
irq264: isci0 1 0
irq302: ahci0 176460 1
irq303: ix0:q0 1863842791 15767
irq304: ix0:q1 1903054655 16098
irq305: ix0:q2 1882301512 15923
irq306: ix0:q3 1865735523 15783
irq307: ix0:q4 1886061578 15955
irq308: ix0:q5 1884457783 15941
irq309: ix0:q6 1861739489 15749
irq310: ix0:q7 1834255721 15516
irq311: ix0:q8 1855165357 15693
irq312: ix0:q9 1847769541 15631
irq313: ix0:q10 1857979208 15717
irq314: ix0:q11 1858923244 15725
irq315: ix0:q12 1850654141 15655
irq316: ix0:q13 1898652905 16061
irq317: ix0:q14 1877714212 15884
irq318: ix0:q15 1895406114 16034
irq319: ix0:link 3 0
cpu12:timer 133176550 1126
cpu8:timer 133176445 1126
cpu3:timer 133175506 1126
cpu9:timer 133175554 1126
cpu4:timer 133171377 1126
cpu11:timer 133174188 1126
cpu5:timer 133171832 1126
cpu14:timer 133173391 1126
cpu1:timer 133172305 1126
cpu10:timer 133173320 1126
cpu6:timer 133171610 1126
cpu15:timer 133173242 1126
cpu2:timer 133175370 1126
cpu13:timer 133176789 1126
cpu7:timer 133176319 1126
Total 32054907693 271169
# vmstat -z | egrep 'REQ|mbuf'
ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
mbuf_packet: 256, 6494460, 65801, 21737,23183069918, 0, 0
mbuf: 256, 6494460, 374, 22398,20909643684, 0, 0
mbuf_cluster: 2048, 1014758, 87538, 226, 87538, 0, 0
mbuf_jumbo_page: 4096, 507379, 0, 188, 883, 0, 0
mbuf_jumbo_9k: 9216, 150334, 0, 0, 0, 0, 0
mbuf_jumbo_16k: 16384, 84563, 0, 0, 0, 0, 0
mbuf_ext_refcnt: 4, 0, 0, 0, 0, 0, 0
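For anyone who wants to total the per-queue interrupt rates from the `vmstat -i` output above, here is a minimal Python sketch. The helper name `sum_queue_rates` and the regular expression are my own assumptions for illustration, and the sample text is copied from the listing in this post.

```python
import re

# Matches lines such as 'irq303: ix0:q0 1863842791 15767'
QUEUE_LINE = re.compile(r"^irq\d+:\s+ix0:q(\d+)\s+(\d+)\s+(\d+)$")

SAMPLE = """\
irq303: ix0:q0 1863842791 15767
irq304: ix0:q1 1903054655 16098
irq305: ix0:q2 1882301512 15923
irq306: ix0:q3 1865735523 15783
irq307: ix0:q4 1886061578 15955
irq308: ix0:q5 1884457783 15941
irq309: ix0:q6 1861739489 15749
irq310: ix0:q7 1834255721 15516
irq311: ix0:q8 1855165357 15693
irq312: ix0:q9 1847769541 15631
irq313: ix0:q10 1857979208 15717
irq314: ix0:q11 1858923244 15725
irq315: ix0:q12 1850654141 15655
irq316: ix0:q13 1898652905 16061
irq317: ix0:q14 1877714212 15884
irq318: ix0:q15 1895406114 16034
irq319: ix0:link 3 0
"""


def sum_queue_rates(vmstat_output):
    """Return (queue_count, total_rate) for the ix0 queue interrupt lines."""
    rates = {}
    for line in vmstat_output.splitlines():
        match = QUEUE_LINE.match(line.strip())
        if match:
            # group(1) is the queue number, group(3) the 'rate' column
            rates[int(match.group(1))] = int(match.group(3))
    return len(rates), sum(rates.values())


count, total = sum_queue_rates(SAMPLE)
print(count, "queues,", total, "interrupts/s in total")
```

The `rate` column is an average over the whole uptime, so this total describes the long-run load, not the peak.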
Any help would be appreciated! Thanks.