FreeBSD High Performance Firewall

Hello,

Does the FreeBSD base distribution have an alternative to this project: hipac?

Can PF or IPFW support large rulesets or high-bandwidth networks?
 
The project's download page as well as its main page show the latest available version to be "2005-10-11" and "November 8th, 2005" respectively.

I don't know, but this netfilter must have been perfection itself if, with all its claims still standing, it hasn't needed any further development since release 0.9.1... which I doubt somehow.
 
I know this project was replaced by ipset, which the Linux kernel supports in its latest release. I mean:
  1. Can FreeBSD or OpenBSD use this at high network bandwidth? (For example, 1 TB/s.)
  2. Can pf or ipfw support over 100,000 rules in their configurations?
 
@mah454:
  1. 1000 GB/s exceeds any combination of internal buses an x86-based server has. A Xeon with an LGA2011 socket (implying quad-channel DDR3) has a memory bandwidth of about 50 GiB/s (four DDR3-1600 channels at 12.8 GB/s each, roughly 51 GB/s in total). I've yet to see anything faster than a quad 10 Gb/s Ethernet card designed for IP traffic handled by the host OS. So stop just throwing around large numbers.
  2. Back in 2005, netfilter required such large rulesets because it lacked ipsets. IPFW and PF support table lookups in their rules; I run IPFW and PF with tables containing ca. 250,000 IPv4 addresses without performance problems (see the sketch below). Using tables and tags, a sane ruleset doesn't require 100,000 rules, and a linear search through 100,000 rules will trash your performance. If you use anchors in PF, or call and skipto in IPFW, you can handle that many rules in a ruleset, but any packet that hits a large subset of those rules will cost a lot of performance.
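
For illustration, here is a minimal pf.conf sketch of the table approach; the interface name, table name, and list file are invented for the example:

Code:
# One rule, arbitrarily many addresses: PF tables are radix trees,
# so the lookup cost barely grows with the address count.
ext_if = "igb0"
table <blocklist> persist file "/etc/pf.blocklist"  # one address or CIDR per line
block in quick on $ext_if from <blocklist> to any

The table contents can then be swapped at runtime without reloading the ruleset, e.g. pfctl -t blocklist -T replace -f /etc/pf.blocklist.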
 

mah454 said:
I know this project was replaced by ipset, which the Linux kernel supports in its latest release. I mean:
  1. Can FreeBSD or OpenBSD use this at high network bandwidth? (For example, 1 TB/s.)
  2. Can pf or ipfw support over 100,000 rules in their configurations?

Very good, you have such a large network, great! Now I ask you: how do you plan to achieve it?
 
In general (this is from my experience with pf, I cannot speak for ipfw):
  • Be as simple as possible, while being only as specific as you need to be.
  • State table lookups are cheap. Ruleset evaluations are not.
  • Use a "first match" (i.e. all "quick") ruleset when possible.
  • Keep the rules that match most often first, as long as reordering doesn't change anything else. Remember, pf allows for profiled optimization (see the sketch after this list)!
  • It's not all software. Using good NICs (Intel; probably Broadcom as well) can make a world of difference.
  • The type of traffic you are handling has a huge impact on performance (i.e. high PPS and low throughput is often harder than low PPS and high throughput).
  • You may need to tweak things, but always start with the basic settings so you know if you are even improving things.
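
As a concrete sketch of the "first match" and rule-ordering points above (the interface name and ports are invented for the example):

Code:
# Every rule is "quick": evaluation stops at the first match,
# so the hottest rules should come first.
ext_if = "em0"
set skip on lo0
pass in quick on $ext_if inet proto tcp to port { 80 443 }  # busiest traffic first
pass in quick on $ext_if inet proto tcp to port 22
block in quick on $ext_if all                               # catch-all last

Profiled optimization then reorders quick rules based on observed match counts when the ruleset is reloaded with pfctl -o profile -f /etc/pf.conf.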
That said, I can easily route at gigabit speeds with pf on an Intel Atom board, between two VLANs (both filtered) which are on a lagg interface. To be fair though, they were large packets (file transfers).

Edit: The PF performance page may be worth a look as well, but take the information there with a grain of salt. FreeBSD's pf has grown apart from OpenBSD's pf over the years, and a lot depends on your environment.
 
Speaking of PF and performance, FreeBSD 10.0 will have an improved PF that can make use of SMP and more fine-grained locking. If I'm not mistaken, the current versions effectively run on a single core and can't take full advantage of a multi-core system. I'm not entirely sure which version of PF it's going to have, but I guess it won't be that different (besides SMP support) from what's in 9.x.

https://wiki.freebsd.org/WhatsNew/FreeBSD10#Networking_improvements
 
It is a bit debatable whether the improvements in locking and multicore support make a real difference to most people. You can easily pass through the maximum of what a gigabit NIC can do with PF, even on a relatively low-end system running FreeBSD 9. This assumes that the NICs are on a fast enough bus, PCIe for example.
 
I think it could be beneficial if you also run a (transparent) proxy on the server. But I agree, if PF is the only thing running you probably won't notice a difference.
 
kpa said:
It is a bit debatable whether the improvements in locking and multicore support make a real difference to most people. You can easily pass through the maximum of what a gigabit NIC can do with PF, even on a relatively low-end system running FreeBSD 9. This assumes that the NICs are on a fast enough bus, PCIe for example.

Going multicore helps if you have lots of (unoptimized) rules or HFSC shaping with too many rules. If your PF firewall hits 100% CPU on a single core, then going SMP will help improve its performance.
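
A quick way to see whether a single core is pegged is the per-CPU view in top(1), the same invocation used further down in this thread:

Code:
# -P gives per-CPU states, -H lists kernel threads, -S includes system processes
top -aSCHIP -b -d 2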
 
mah454 said:
How many requests per second can PF answer? (On a 32-bit and a 64-bit OS.)

Well, it depends on the request. If a packet matches a state, this is very fast. If the packet has no state (for example, dropped traffic), the ruleset must be evaluated.

There was a paper on OpenBSD/PF (a bit outdated):
http://www.benzedrine.cx/pf-paper.html

On real traffic at work, around 95% of packets match a state.
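
To get a feel for that ratio on your own firewall, compare the state-table counters pfctl reports; every packet triggers a state search, and only packets without a matching state fall through to full ruleset evaluation:

Code:
# State searches vs. rule matches, with per-second rates
pfctl -si | egrep 'searches|inserts|match'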

Regards
 
An example from the real world, not at rush hour. ipfw, dummynet, and ng_netflow also run on this server but aren't shown here.

Code:
14:03  up 41 days, 21:24, 1 user, load averages: 3,08 2,98 2,46

            input        (lagg0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls drops
       49k     0     0        53M        38k     0        15M     0     0
       54k     0     0        61M        40k     0        15M     0     0

# pfctl -si
Code:
No ALTQ support in kernel
ALTQ related functions disabled
Status: Enabled for 41 days 21:25:49          Debug: Urgent

State Table                          Total             Rate
  current entries                    61663
  searches                    537583886433       148522.3/s
  inserts                       3766211362         1040.5/s
  removals                      3766149699         1040.5/s
Counters
  match                       273328099830        75514.4/s
  bad-offset                             0            0.0/s
  fragment                          205892            0.1/s
  short                              28409            0.0/s
  normalize                         819382            0.2/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                          14485            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                   3461401            1.0/s
  state-insert                       88486            0.0/s
  state-limit                          128            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

# top -aSCHIP -b -d 2
Code:
...
last pid: 80256;  load averages:  2.69,  2.64,  2.42  up 41+21:28:40    14:08:27
167 processes: 10 running, 98 sleeping, 1 zombie, 58 waiting
CPU 0:  0.0% user,  0.0% nice, 15.7% system, 19.2% interrupt, 65.1% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system, 23.9% interrupt, 76.1% idle
CPU 2:  0.0% user,  0.0% nice,  0.8% system, 24.3% interrupt, 74.9% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system, 25.1% interrupt, 74.9% idle
CPU 4:  0.0% user,  0.0% nice,  0.4% system, 27.5% interrupt, 72.2% idle
CPU 5:  0.0% user,  0.0% nice,  0.4% system, 24.7% interrupt, 74.9% idle
CPU 6:  0.0% user,  0.0% nice,  0.4% system, 25.9% interrupt, 73.7% idle
CPU 7:  0.0% user,  0.0% nice,  0.8% system, 19.2% interrupt, 80.0% idle
Mem: 29M Active, 5769M Inact, 1731M Wired, 171M Cache, 826M Buf, 190M Free
Swap: 1024M Total, 504K Used, 1023M Free

  PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME    CPU COMMAND
   10 root          155 ki31     0K   128K RUN     0 672.8H 89.06% [idle{idle: cpu0}]
   10 root          155 ki31     0K   128K CPU7    7 817.7H 85.69% [idle{idle: cpu7}]
   10 root          155 ki31     0K   128K CPU6    6 806.9H 85.06% [idle{idle: cpu6}]
   10 root          155 ki31     0K   128K CPU2    2 806.7H 84.86% [idle{idle: cpu2}]
   10 root          155 ki31     0K   128K CPU3    3 819.1H 84.77% [idle{idle: cpu3}]
   10 root          155 ki31     0K   128K CPU1    1 812.9H 81.05% [idle{idle: cpu1}]
   10 root          155 ki31     0K   128K RUN     5 818.2H 79.79% [idle{idle: cpu5}]
   10 root          155 ki31     0K   128K RUN     4 805.8H 78.56% [idle{idle: cpu4}]
   11 root          -92    -     0K   944K WAIT    3  45.2H  8.40% [intr{irq259: igb0:que}]
   11 root          -92    -     0K   944K WAIT    4  47.5H  8.15% [intr{irq268: igb1:que}]
   11 root          -92    -     0K   944K WAIT    1  47.3H  7.86% [intr{irq289: igb3:que}]
   11 root          -92    -     0K   944K WAIT    5  45.7H  6.15% [intr{irq261: igb0:que}]
   11 root          -92    -     0K   944K WAIT    4  48.0H  6.05% [intr{irq260: igb0:que}]
   11 root          -92    -     0K   944K WAIT    6  47.5H  6.05% [intr{irq270: igb1:que}]
   11 root          -92    -     0K   944K WAIT    4  47.4H  6.05% [intr{irq284: igb3:que}]
   11 root          -92    -     0K   944K WAIT    6  47.5H  5.66% [intr{irq286: igb3:que}]
   11 root          -92    -     0K   944K WAIT    7  45.6H  5.47% [intr{irq271: igb1:que}]
   11 root          -92    -     0K   944K WAIT    1  47.7H  5.27% [intr{irq265: igb1:que}]
 
Another example from a real-world server, also NOT at rush hour.

netstat -m

Code:
8566/6839/15405 mbufs in use (current/cache/total)
8427/6435/14862/25600 mbuf clusters in use (current/cache/total/max)
8427/3093 mbuf+clusters out of packet secondary zone in use (current/cache)
110/2625/2735/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
19442K/25079K/44522K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

pfctl -si

Code:
Status: Enabled for 34 days 21:44:45          Debug: Urgent

State Table                          Total             Rate
  current entries                    76039               
  searches                     38756825346        12850.9/s
  inserts                        761965298          252.7/s
  removals                       761931432          252.6/s
Counters
  match                          770177899          255.4/s
  bad-offset                             0            0.0/s
  fragment                              91            0.0/s
  short                                 17            0.0/s
  normalize                            365            0.0/s
  memory                              5998            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                    199773            0.1/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                          103561            0.0/s

/boot/loader.conf

Code:
# igb(4) NIC tuning
hw.igb.enable_aim="1"                # adaptive interrupt moderation
hw.igb.max_interrupt_rate="32000"    # cap on interrupts per second per queue
hw.igb.num_queues="0"                # 0 = autodetect (one queue per core, up to the HW max)
hw.igb.enable_msix="1"               # use MSI-X interrupts
hw.igb.txd="2048"                    # TX descriptor ring size
hw.igb.rxd="2048"                    # RX descriptor ring size
hw.igb.rx_process_limit="1000"       # max RX packets handled per interrupt
# Network memory and queue limits
kern.ipc.nmbclusters="32768"         # mbuf cluster limit
kern.ipc.maxsockets="204800"         # system-wide socket limit
net.inet.tcp.tcbhashsize="32768"     # TCP control-block hash table size
net.link.ifqmaxlen="10240"           # default interface queue length
 
gkontos said:
Another example from a real-world server, also NOT at rush hour.

pfctl -si

Code:
Status: Enabled for 34 days 21:44:45          Debug: Urgent
  memory                              5998            0.0/s

This doesn't look good; I think you should increase "set limit frags". Absolute OpenBSD says:

"If a packet cannot be coherently reassembled, PF will drop the pieces. 5 "Normalize" shows how many packets have been dropped after scrubbing. Similarly, the 6 "memory" entry shows how many packets have been dropped because PF doesn't have enough memory to hold on to the packet fragments before reassembling them. If you start to lose packets due to memory shortages, you need to increase the memory you have allocated to PF (see "PF Memory Limits")."

Does anyone have full documentation of the statistics provided by pfctl -sinfo?

Here's an example, but on OpenBSD 5.1:
Code:
# pfctl -si                                                              
Status: Enabled for 41 days 04:35:46             Debug: err

Interface Stats for all               IPv4             IPv6
  Bytes In                  78846831992961        492977744
  Bytes Out                 78906718029725        580841248
  Packets In
    Passed                    106760315564           564984
    Blocked                     1407780449          6133518
  Packets Out
    Passed                    106389400013          7500894
    Blocked                      684497551            59193

State Table                          Total             Rate
  current entries                   305617               
  searches                    217019721676        60978.6/s
  inserts                       3597831129         1010.9/s
  removals                      3597525512         1010.8/s
Counters
  match                         5759240852         1618.2/s
  bad-offset                             0            0.0/s
  fragment                           50581            0.0/s
  short                             696360            0.2/s
  normalize                         202918            0.1/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                      29327261            8.2/s
  ip-option                         208668            0.1/s
  proto-cksum                            0            0.0/s
  state-mismatch                   9777969            2.7/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s
 
plamaiziere said:
This doesn't look good; I think you should increase "set limit frags". Absolute OpenBSD says:

"If a packet cannot be coherently reassembled, PF will drop the pieces. 5 "Normalize" shows how many packets have been dropped after scrubbing. Similarly, the 6 "memory" entry shows how many packets have been dropped because PF doesn't have enough memory to hold on to the packet fragments before reassembling them. If you start to lose packets due to memory shortages, you need to increase the memory you have allocated to PF (see "PF Memory Limits")."

Actually, what you saw in my statistics is related to the state entries limit. The firewall was indeed dropping packets, and I had to raise the limit to 100000. The default in FreeBSD is 10000.
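
For reference, that is a single line in pf.conf, using the value mentioned above:

Code:
# pf.conf: raise the state table ceiling from the 10000 default
set limit states 100000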
 