Hello,
I have a dual Xeon E5-2650 system, and when it handles small packets, say from 1,500,000 up to 2,000,000+ PPS, CPU usage gets very high. Before this I had a dual E5630 (an older CPU; the new ones have 8 cores and 16 threads each), and performance did not change at all after I replaced the CPUs with the faster ones.
Does anybody have any tips on how this could be solved?
I did some quick profiling of the system and found these bottlenecks in the kernel:
Code:
<spontaneous>
[1] 56.2 0.00 21.62 taskqueue_thread_loop [1]
0.01 19.34 1216937/1216937 taskqueue_run_locked [2]
0.03 2.24 1216937/1216937 msleep_spin [26]
-----------------------------------------------
0.01 19.34 1216937/1216937 taskqueue_thread_loop [1]
[2] 50.3 0.01 19.34 1216937 taskqueue_run_locked [2]
0.90 17.75 1068049/1068049 lem_handle_rxtx [3]
0.00 0.32 1216937/1217196 wakeup [63]
0.32 0.00 1216937/45219536 spinlock_exit <cycle 1> [6]
0.00 0.04 148888/148888 dummynet_task [103]
0.00 0.00 1216937/21067380 spinlock_enter [87]
-----------------------------------------------
0.90 17.75 1068049/1068049 taskqueue_run_locked [2]
[3] 48.4 0.90 17.75 1068049 lem_handle_rxtx [3]
1.38 12.59 8877441/8877441 ether_input [4]
0.25 3.50 8877441/8877441 lem_get_buf [19]
0.02 0.00 1068049/1068049 lem_enable_intr [111]
0.01 0.00 1068049/1068049 lem_txeof [118]
0.00 0.00 4/854 _mtx_lock_sleep [382]
0.00 0.00 2/543 lem_start_locked [416]
-----------------------------------------------
1.38 12.59 8877441/8877441 lem_handle_rxtx [3]
[4] 36.3 1.38 12.59 8877441 ether_input [4]
0.21 11.32 8877441/8877441 ether_demux [7]
0.59 0.00 8877441/8878778 bcmp [52]
0.23 0.18 8877441/8877459 random_harvest [60]
0.06 0.00 8877441/8877441 mac_ifnet_create_mbuf [98]
-----------------------------------------------
[5] 30.6 11.76 0.04 45219536+31167706 <cycle 1 as a whole> [5]
11.56 0.00 21067380 spinlock_exit <cycle 1> [6]
0.19 0.00 41152830 critical_exit <cycle 1> [79]
0.00 0.03 6035741 _thread_lock_flags <cycle 1> [107]
0.00 0.01 2761692 sched_switch <cycle 1> [121]
0.00 0.00 7175 tdq_lock_pair <cycle 1> [319]
0.00 0.00 4498 _mtx_lock_spin <cycle 1> [428]
0.00 0.00 2761692 mi_switch <cycle 1> [800]
0.00 0.00 2596234 thread_lock_block <cycle 1> [802]
I should add that:
1) I turned off interrupt moderation; with it enabled, throughput was much more limited.
2) I followed the tips from here: http://wiki.freebsd.org/NetworkPerformanceTuning
But still, take a look at the load at 1,250,000+ PPS:
Code:
last pid: 18102; load averages: 3.30, 2.75, 1.48 up 0+15:00:32 10:33:17
43 processes: 2 running, 39 sleeping, 1 zombie, 1 waiting
CPU 0: 0.0% user, 0.0% nice, 0.0% system, 76.4% interrupt, 23.6% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 76.3% interrupt, 23.7% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 74.8% interrupt, 25.2% idle
CPU 3: 0.0% user, 0.0% nice, 0.0% system, 76.7% interrupt, 23.3% idle
Here is the vmstat output:
Code:
irq276: ix0:que 0 203917287 3584
irq277: ix0:que 1 198976921 3497
irq278: ix0:que 2 198092556 3482
irq279: ix0:que 3 218340699 3837
netisr stats:
Code:
Configuration:
Setting Value Maximum
Thread count 1 1
Default queue limit 256 10240
Direct dispatch enabled n/a
Forced direct dispatch enabled n/a
Threads bound to CPUs disabled n/a
Protocols:
Name Proto QLimit Policy Flags
ip 1 256 flow ---
igmp 2 256 source ---
rtsock 3 4096 source ---
arp 7 256 source ---
ip6 10 256 flow ---
Workstreams:
WSID CPU Name Len WMark Disp'd HDisp'd QDrops Queued Handled
0 0 ip 0 2 1992536144 0 0 84 1992536228
igmp 0 0 0 0 0 0 0
rtsock 0 1 0 0 0 1340 1340
arp 0 0 3956 0 0 0 3956
ip6 0 0 0 0 0 0 0