Maximize Interface Performance

I'm currently running 8.2, IPFW, and our internal libpcap logging tool on two different hardware flavors for our gateway firewalls (inline bridge). Our lower-end unit is an Intel dual-core 2.7 GHz proc with onboard gigabit Broadcom (bce) NICs. Our higher-end units have an Intel quad-core Xeon 3.0 GHz and Intel Pro Server gigabit (igb) NICs.

We've just started running our libpcap logging tool, which logs interesting packet flows traversing the bridge, and I'm looking to tune these boxes for maximum performance. I had a few questions:

1) Should I be using polling on either flavor, or are both of these CPUs plenty capable of keeping up with interrupts?

2) If our libpcap logging tool is passively sniffing the bridge interface, can it cause any sort of packet loss due to poor performance? From my understanding it's possible the logging tool could saturate the CPU, causing the kernel to drop packets. If so, is there a way I can ensure the logging tool runs on a dedicated core in order to avoid situations like this?

3) Will I see any considerable performance impact from offloading checksums to the Intel NICs?

I'm interested to hear if anyone sees any major red flags or must-have tunables that would seriously impact performance in a situation like this :)

Thanks!
 
To reduce the impact on network performance, I suggest using a switch configured with monitor (mirror) ports to duplicate the traffic to a dedicated monitoring machine. That machine should have its sniffing interface configured in monitor mode.
From ifconfig(8):
Code:
     monitor
             Put the interface in monitor mode.  No packets are transmitted,
             and received packets are discarded after bpf(4) processing.
This way, a high CPU load on the monitoring machine will not have any impact on the network performance.
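For example (em1 below is just a placeholder for whatever interface you dedicate to sniffing):
Code:
# Bring the capture interface up in monitor mode: it never transmits, and
# received packets are discarded once bpf(4) has seen them
ifconfig em1 up monitor
# Verify that the MONITOR flag now shows in the interface flags
ifconfig em1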
 
ecazamir,

There's no doubt that would reduce the impact on network performance, but unfortunately that is not an option for us.
 
Both of the CPUs you mention should be capable of handling gigabit traffic fine; the rest depends on the demands of your logging tool.

The Broadcom driver supports interrupt coalescing and the Intel driver supports adaptive interrupt moderation, which is essentially the same thing; think of it as NIC-managed polling - the CPU is interrupted once for several packets in a row. It doesn't affect latency under low traffic, but it can reduce interrupt storms under heavier load.
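If you want to see how heavy the interrupt load actually is before changing anything, something like this is a reasonable check (the tunable and value below are only an illustration, not a recommendation):
Code:
# Show per-device interrupt rates to see whether moderation is keeping up
vmstat -i | egrep 'igb|bce'
# If the igb rates get out of hand, they can be capped with a loader tunable
# (set in /boot/loader.conf, takes effect at next boot); value is illustrative
hw.igb.max_interrupt_rate="16000"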

In my experience, NIC offloading (defragmentation, checksums) on better cards (Intel and Broadcom included) can really improve throughput.
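As a rough example, the offload knobs are per-interface ifconfig flags; which ones actually exist depends on the driver (check the capabilities line in `ifconfig igb0` and the driver man page):
Code:
# Enable RX/TX checksum offload on an igb interface
ifconfig igb0 rxcsum txcsum
# LRO merges received segments, which is fine for locally terminated TCP but
# should generally stay off on a bridging/forwarding box
ifconfig igb0 -lro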

To reduce copy overhead when sniffing packets, check whether you can use the non-default zero-copy buffer mode of the bpf(4) device.
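That mode is controlled by a sysctl (it comes up again later in this thread); whether your libpcap build actually takes advantage of it is something you would have to verify:
Code:
# Check whether zero-copy bpf(4) buffers are currently enabled
sysctl net.bpf.zerocopy_enable
# Enable them (this can also go in /etc/sysctl.conf)
sysctl net.bpf.zerocopy_enable=1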

Look into SMP/CPU affinity for binding a process to a given CPU core.
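On FreeBSD the tool for that is cpuset(1); for example (the core number, path, and PID below are made up):
Code:
# Start the logging tool pinned to CPU core 3
cpuset -l 3 /usr/local/sbin/pcap-logger
# Or pin an already-running process by PID
cpuset -l 3 -p 12345
# Show the resulting CPU mask
cpuset -g -p 12345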

For some tuning tips, see https://calomel.org/network_performance.html and http://www.psc.edu/networking/projects/tcptune/.

AFAIK you can also just duplicate packets from bpf, or export a NetFlow stream to another machine where you can do the processing/sniffing.
 
I'll echo what others have said: use good hardware. Bells and whistles like rxcsum/txcsum, interrupt coalescing, etc., can have a large impact on performance.

Apart from that, for TCP performance you'll definitely want to increase the window size for high-throughput TCP apps. Also check the various other sysctls that can have a large impact on performance, typically starting with net.inet.tcp.rfc*. There are other things you can play with, like the MTU if your network supports jumbo frames, but YMMV.
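As a rough, hedged starting point (values are illustrative, and mostly matter for TCP sessions terminated on the box itself rather than bridged traffic):
Code:
# /etc/sysctl.conf -- illustrative values only
# Window scaling and timestamps (usually on by default)
net.inet.tcp.rfc1323=1
# Raise the ceiling on the auto-tuned socket buffers (roughly 4 MB here)
net.inet.tcp.sendbuf_max=4194304
net.inet.tcp.recvbuf_max=4194304
# Keep send/receive buffer auto-tuning enabled
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1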
 
Has anyone seen any major performance increase in 8.2 or 9 with net.bpf.zerocopy_enable?
 
I don't understand why you are trying to tune an obviously inappropriate capture mechanism instead of using one especially designed for high-speed capture, as I wrote above.
 
RusDyr said:
I don't understand why you are trying to tune an obviously inappropriate capture mechanism instead of using one especially designed for high-speed capture, as I wrote above.

1) I am unable to recompile the kernel at the moment, though that is an option longer term, and I'd love to learn more about both ringmap and netmap.

2) From my understanding, both ringmap and netmap are still fairly niche in terms of adoption and the testing backing their stability. I don't think the community sees not using them as "wildly inappropriate".

3) A quick glance over the documentation shows that ringmap currently supports the em and ixgbe drivers; I am currently using the igb driver.

Now, having heard of both ringmap and netmap, I'm curious about a comparison between the two. Are they both considered stable? Assuming they both require a kernel recompile, do they improve libpcap application performance without requiring modification of the application itself (just the libpcap library)?

From my understanding, some of this work is on the FreeBSD roadmap, but I know there are some tunables in the current tree that can improve applications in environments like this. If anyone has seen stable and dramatic improvements in 8.2 or 9.0 with ringmap or netmap, I'd love to hear more as well. :)
 
jrt03 said:
1) Should I be using polling on either flavor, or are both of these CPUs plenty capable of keeping up with interrupts?
No. Modern NICs (like the igb ones) offer adaptive interrupts; there is no reason to use polling now.
jrt03 said:
2) If our libpcap logging tool is passively sniffing the bridge interface, can it cause any sort of packet loss due to poor performance? From my understanding it's possible the logging tool could saturate the CPU, causing the kernel to drop packets. If so, is there a way I can ensure the logging tool runs on a dedicated core in order to avoid situations like this?
The locking model in the current kernel-side BPF code is not efficient; enabling a BPF receiver on an interface causes a significant performance hit under heavy workloads. Patches are already in -current; the MFC is planned in about 3 weeks.
jrt03 said:
3) Will I see any considerable performance impact from offloading checksums to the Intel NICs?
L2 bridging should not alter checksums. However, RX/TX checksum offload is quite a stable feature now, so you should simply turn it (back) on and forget about it.

jrt03 said:
I'm interested to hear if anyone sees any major red flags or must-have tunables that would seriously impact performance in a situation like this :)
A generic networking checklist can be found at
http://wiki.freebsd.org/NetworkPerformanceTuning

I am slowly updating it for different kinds of routing workloads.
 
I don't know squat about libpcap, but here are some generic network tuning hints for routers ...
Code:
# cat /boot/loader.conf
net.link.ifqmaxlen="512"
hw.igb.max_interrupt_rate="16000"

# cat /etc/sysctl.conf
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.ethernet=0
net.inet.ip.fastforwarding=1
kern.timecounter.hardware=HPET
dev.igb.0.rx_processing_limit=480
dev.igb.1.rx_processing_limit=480

Oh, and if you have a bridge interface and more than 100 computers, don't forget to up your maxaddr to prevent unicast floods:
# ifconfig bridge0 maxaddr 2000
 