A simple explanation...
If you are capturing traffic with libpcap, the mbuf(9) chain containing an input packet is handed to bpf(4), and the packet data is copied into BPF's own buffers, which are not made of mbufs. BPF was designed to use a pair of ping-pong buffers: the kernel fills one while the application drains the other, so the buffers interact closely with the CPU cache. It is not hard to see that an individual BPF buffer should not be sized close to the cache size if the application needs to keep capturing packets. And if the user-level program cannot drain the BPF buffer faster than the NIC I/O fills it, increasing the buffer size only buys you a short cushion during the start-up phase; once both buffers are full, you will start to lose packets anyway.
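If you are using libpcap, you can at least ask for a larger per-descriptor buffer before the handle is activated. Below is a minimal sketch (the interface name em0 and the 512 KB request are placeholders, not values from this thread); pcap_set_buffer_size() must be called between pcap_create() and pcap_activate(), and the kernel will still cap the request at net.bpf.maxbufsize.
Code:
/* Hedged sketch: request a larger BPF buffer through libpcap before activation.
 * The interface name and buffer size below are illustrative only. */
#include <pcap/pcap.h>
#include <stdio.h>

int
main(void)
{
	char errbuf[PCAP_ERRBUF_SIZE];
	pcap_t *p;

	p = pcap_create("em0", errbuf);        /* "em0" is a placeholder interface */
	if (p == NULL) {
		fprintf(stderr, "pcap_create: %s\n", errbuf);
		return (1);
	}
	pcap_set_snaplen(p, 65535);
	pcap_set_promisc(p, 1);
	pcap_set_timeout(p, 100);              /* milliseconds */
	pcap_set_buffer_size(p, 512 * 1024);   /* request 512 KB; clamped to net.bpf.maxbufsize */
	if (pcap_activate(p) < 0) {
		fprintf(stderr, "pcap_activate: %s\n", pcap_geterr(p));
		pcap_close(p);
		return (1);
	}
	/* ... capture with pcap_loop()/pcap_next_ex() as usual ... */
	pcap_close(p);
	return (0);
}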
You can check the current settings with sysctl(8):
Code:
# sysctl net.bpf
net.bpf.zerocopy_enable: 0
net.bpf.maxinsns: 512
net.bpf.maxbufsize: 524288
net.bpf.bufsize: 4096
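With these defaults, a descriptor gets only 4 kB (net.bpf.bufsize) unless the application asks for more, and it can never get more than 512 kB (net.bpf.maxbufsize). If you prefer to check or raise that cap from a program rather than with sysctl(8), here is a hedged sketch using sysctlbyname(3); the 1 MB target is only an example, and writing the value requires root.
Code:
/* Hedged sketch: read and optionally raise net.bpf.maxbufsize via sysctlbyname(3).
 * The 1 MB target is an arbitrary example; setting the OID needs root. */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	int cur, want = 1024 * 1024;
	size_t len = sizeof(cur);

	if (sysctlbyname("net.bpf.maxbufsize", &cur, &len, NULL, 0) == -1) {
		perror("sysctlbyname(get)");
		return (1);
	}
	printf("net.bpf.maxbufsize is %d bytes\n", cur);

	if (cur < want &&
	    sysctlbyname("net.bpf.maxbufsize", NULL, NULL, &want, sizeof(want)) == -1) {
		perror("sysctlbyname(set)");   /* fails with EPERM unless run as root */
		return (1);
	}
	return (0);
}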
I also recommend taking a look at the documentation for the zero-copy BPF functionality presented at BSDCan 2007; it is not enabled by default (note net.bpf.zerocopy_enable: 0 above). In the normal case of NIC I/O, buffers are copied from the user process into the kernel on the send side and from the kernel into the user process on the receive side. With zero-copy, the process provides a shared memory buffer that the kernel writes into directly, avoiding the copy from kernel space to user space. The send-side zero-copy code should work with almost any network adapter. The receive-side code, however, requires an adapter with an MTU of at least a page size, due to the alignment restrictions for page substitution or page flipping.
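To give a concrete picture of how a capture program opts into this, here is a heavily hedged sketch of the receive-side setup sequence as I read it from bpf(4). The ioctl names (BIOCSETBUFMODE, BIOCGETZMAX, BIOCSETZBUF, BIOCSETIF) come from that man page, but the interface name, buffer size, mmap flags and exact ordering are my assumptions, so double-check against the documentation before relying on it; it also requires net.bpf.zerocopy_enable=1.
Code:
/* Hedged sketch of zero-copy BPF setup on FreeBSD.  "em0" and the one-page
 * buffer size are placeholders; see bpf(4) for the buffer-ownership
 * (generation counter) protocol that a real reader loop must follow. */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <net/if.h>
#include <net/bpf.h>
#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct bpf_zbuf zb;
	struct ifreq ifr;
	u_int mode = BPF_BUFMODE_ZBUF;
	size_t zmax, buflen;
	int fd;

	if ((fd = open("/dev/bpf", O_RDWR)) == -1)
		err(1, "open(/dev/bpf)");
	if (ioctl(fd, BIOCSETBUFMODE, &mode) == -1)
		err(1, "BIOCSETBUFMODE");
	if (ioctl(fd, BIOCGETZMAX, &zmax) == -1)
		err(1, "BIOCGETZMAX");

	/* Two page-aligned ping-pong buffers shared with the kernel. */
	buflen = (size_t)getpagesize();        /* placeholder size, capped below */
	if (buflen > zmax)
		buflen = zmax;
	zb.bz_bufa = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_SHARED, -1, 0);
	zb.bz_bufb = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_SHARED, -1, 0);
	zb.bz_buflen = buflen;
	if (zb.bz_bufa == MAP_FAILED || zb.bz_bufb == MAP_FAILED)
		err(1, "mmap");
	if (ioctl(fd, BIOCSETZBUF, &zb) == -1)
		err(1, "BIOCSETZBUF");

	/* Attach to an interface; "em0" is a placeholder. */
	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, "em0", sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1)
		err(1, "BIOCSETIF");

	/* From here on, captured packets land directly in bz_bufa/bz_bufb;
	 * each buffer begins with a struct bpf_zbuf_header whose generation
	 * counters tell user space when the kernel has handed it over. */
	return (0);
}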
Finally, to benchmark the result, try
benchmarks/netperf or
benchmarks/nttcp from ports to determine maximum throughput, and run further benchmarks if you want more reliable numbers.