Poor network performance with PF firewall

Hello everybody,

I'm running 8.2-RELEASE with an Intel Gigabit CT Desktop Adapter (82574) NIC and a PF firewall.

Code:
FreeBSD microserver 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

Code:
em0: <Intel(R) PRO/1000 Network Connection 7.1.9> port 0xe800-0xe81f mem 0xfe8e0000-0xfe8fffff,0xfe800000-0xfe87ffff,0xfe8dc000-0xfe8dffff irq 16 at device 0.0 on pci2
em0: Using MSIX interrupts with 3 vectors
em0: [ITHREAD]
em0: [ITHREAD]
em0: [ITHREAD]
em0: Ethernet address: 00:1b:21:xx:xx:xx

Code:
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
        ether 00:1b:21:xx:xx:xx
        inet 10.0.0.15 netmask 0xffffff00 broadcast 10.0.0.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active

With the PF firewall enabled, throughput is approximately 6.3 MB/s according to benchmarks/iperf:

Code:
------------------------------------------------------------
Client connecting to 10.0.0.15, TCP port 5001
TCP window size: 0.13 MByte (default)
------------------------------------------------------------
[  3] local 10.0.0.11 port 64306 connected with 10.0.0.15 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  5.50 MBytes  5.50 MBytes/sec
[  3]  1.0- 2.0 sec  6.50 MBytes  6.50 MBytes/sec
[  3]  2.0- 3.0 sec  6.38 MBytes  6.38 MBytes/sec
[  3]  3.0- 4.0 sec  6.38 MBytes  6.38 MBytes/sec
[  3]  4.0- 5.0 sec  6.38 MBytes  6.38 MBytes/sec
[  3]  5.0- 6.0 sec  6.38 MBytes  6.38 MBytes/sec
[  3]  6.0- 7.0 sec  6.38 MBytes  6.38 MBytes/sec
[  3]  7.0- 8.0 sec  6.25 MBytes  6.25 MBytes/sec
[  3]  8.0- 9.0 sec  6.38 MBytes  6.38 MBytes/sec
[  3]  9.0-10.0 sec  6.25 MBytes  6.25 MBytes/sec
[  3]  0.0-10.0 sec  62.9 MBytes  6.28 MBytes/sec
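For reference, output in this format would come from an invocation along these lines (a reconstruction for anyone reproducing the test, not necessarily the exact command used):

Code:
# on the server (10.0.0.15)
iperf -s -f M
# on the client (10.0.0.11), reporting once per second in MBytes/sec
iperf -c 10.0.0.15 -i 1 -f M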

With the firewall disabled, throughput jumps to approximately 11.2 MB/s, which is effectively the maximum for the internal network (the link negotiated 100baseTX, and 11.2 MB/s is about the practical ceiling of 100 Mbit/s Ethernet once TCP/IP overhead is accounted for):

Code:
------------------------------------------------------------
Client connecting to 10.0.0.15, TCP port 5001
TCP window size: 0.13 MByte (default)
------------------------------------------------------------
[  3] local 10.0.0.11 port 64268 connected with 10.0.0.15 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  11.2 MBytes  11.2 MBytes/sec
[  3]  1.0- 2.0 sec  11.1 MBytes  11.1 MBytes/sec
[  3]  2.0- 3.0 sec  11.2 MBytes  11.2 MBytes/sec
[  3]  3.0- 4.0 sec  11.1 MBytes  11.1 MBytes/sec
[  3]  4.0- 5.0 sec  11.2 MBytes  11.2 MBytes/sec
[  3]  5.0- 6.0 sec  11.1 MBytes  11.1 MBytes/sec
[  3]  6.0- 7.0 sec  11.2 MBytes  11.2 MBytes/sec
[  3]  7.0- 8.0 sec  11.1 MBytes  11.1 MBytes/sec
[  3]  8.0- 9.0 sec  11.1 MBytes  11.1 MBytes/sec
[  3]  9.0-10.0 sec  11.2 MBytes  11.2 MBytes/sec
[  3]  0.0-10.0 sec   112 MBytes  11.2 MBytes/sec

Copying larger files over SMB or with scp(1) also results in similar speeds.
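For anyone who wants to reproduce the comparison: pf can be disabled and re-enabled at runtime with pfctl(8), no reboot needed:

Code:
pfctl -d               # disable pf
pfctl -e               # re-enable pf
pfctl -f /etc/pf.conf  # reload the ruleset after editing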

My /etc/pf.conf is as follows (tun0 is a Huawei 3G USB modem):

Code:
int_if="em0"
ext_if="tun0"
int_gw="10.0.0.2"
icmp_types="{ echoreq, unreach }"
martians = "{ 127.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12, \
              10.0.0.0/8, 169.254.0.0/16, 192.0.2.0/24, \
              0.0.0.0/8, 240.0.0.0/4 }"

set block-policy return
set loginterface $ext_if
set skip on lo

scrub in all

block in log
pass out all

antispoof quick for { $int_if $ext_if }

block drop in quick on $ext_if from $martians to any
block drop out quick on $ext_if from any to $martians

pass in inet proto icmp all icmp-type $icmp_types
# Enforce symmetric routing for incoming connections on $int_if 
pass in on $int_if reply-to ($int_if $int_gw)

Does anyone have any ideas why network throughput is so poor with PF enabled, and is there anything I could do to remedy the problem? Any settings to check or tunables to tweak?
 
pva said:
Does anyone have any ideas why network throughput is so poor with PF enabled, and is there anything I could do to remedy the problem? Any settings to check or tunables to tweak?

No idea... Check the output of pfctl -s info. There are some interesting counters (congestion, search...).

Also, do not use a macro for the martians; a table will perform better (but I don't think this is the problem).
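Something like this, for example (an untested sketch reusing your address list):

Code:
table <martians> const { 127.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12, \
                         10.0.0.0/8, 169.254.0.0/16, 192.0.2.0/24, \
                         0.0.0.0/8, 240.0.0.0/4 }

block drop in quick on $ext_if from <martians> to any
block drop out quick on $ext_if from any to <martians>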

Regards.
 
plamaiziere said:
No idea... Check the output of pfctl -s info. There are some interesting counters (congestion, search...).

I changed the log interface to em0 and reran the iperf test. This resulted in the following statistics:

Code:
No ALTQ support in kernel
ALTQ related functions disabled
Status: Enabled for 0 days 00:00:47           Debug: Urgent

Interface Stats for em0               IPv4             IPv6
  Bytes In                       141072250                0
  Bytes Out                        4430988                0
  Packets In
    Passed                          104839                0
    Blocked                             36                0
  Packets Out
    Passed                           70027                0
    Blocked                              0                0

State Table                          Total             Rate
  current entries                       51               
  searches                           79999         1702.1/s
  inserts                               60            1.3/s
  removals                              55            1.2/s
Counters
  match                                 80            1.7/s
  bad-offset                             0            0.0/s
  fragment                               0            0.0/s
  short                                  0            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                         0            0.0/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

There doesn't seem to be any congestion, but the search rate seems rather high to my eye.

Also, do not use a macro for the martians; a table will perform better (but I don't think this is the problem).

Yeah, leaving out the martian rules has no perceptible effect on performance. (The macro, by the way, comes from Peter N.M. Hansteen's PF tutorial.)
 
I finally had some time to investigate the problem further, and it seems the culprit is the reply-to option I'm using to enforce symmetric routing for incoming connections on the internal interface ($int_if). Leaving out the option increases network throughput from roughly 6.5 to 11 MB/s.
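If anyone else runs into this, one workaround I may experiment with (an untested sketch, so treat it accordingly) is to apply reply-to only to traffic that does not originate on the local subnet, leaving LAN-to-LAN connections alone:

Code:
# Local hosts: plain stateful pass, no reply-to
pass in on $int_if from $int_if:network to any
# Everything else arriving on $int_if: enforce symmetric routing
pass in on $int_if reply-to ($int_if $int_gw) from ! $int_if:network to any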

I found a year-old post on the freebsd-pf mailing list, which mentions the same problem occurring on 8.1; thankfully, in my case, the performance degradation isn't as severe. Unfortunately, the post hasn't received any replies.

I believe achieving 11 MB/s throughput shouldn't be a problem on my hardware (even with the reply-to option enabled), so this might be a bug in the FreeBSD PF port, and I should probably file a PR about it.
 