[Solved] pf not creating state table entries with Intel (ixgbe) NIC?

Hi,

I have a pf ruleset that has not been changed for 4 months and has worked without any problems so far.

A couple of days ago I upgraded my NIC to an Intel 82599EB 10-Gigabit SFI/SFP+ (ixgbe) from a Broadcom NetXtreme II BCM5709 (bce).
Additionally, I updated the server's (Dell PowerEdge R610) firmware via iDRAC.

The problem is that some computers on the LAN (not all at the same time) now randomly can't reach my router's gateway. It happens rarely. It's as if their packets never hit my LAN rules (/etc/pf.lan_computers) in pf and no state gets created.

At the same time I can easily ping them from my router as if nothing had happened (which at least means that the corresponding pf rule is evaluated correctly).

tcpdump -i ix1 src ip_addr_of_affected_pc
shows that no traffic is coming in from ip_addr_of_affected_pc.

pftop reports no states for ip_addr_of_affected_pc

The very moment I do a pfctl -f /etc/pf.conf to reload the rules, BAM, states get created normally and traffic resumes on the PC that couldn't reach my gateway.

This never happened during the 4 months with my older NIC, and I am clueless as to what could cause it. The pf rules themselves haven't been changed since then.

Unfortunately I don't have much access to the LAN computers when it happens; I only get a call, so there isn't much I can check beyond that.

Below is some info on my setup:

OS:
FreeBSD 10.1-RELEASE-p6 amd64
FreeBSD 10.1-STABLE #3 r279781M amd64 (happens with release and stable versions with the new NIC)

My /etc/pf.conf ruleset can be simplified to this:
Code:
### TABLES ###
table <conflicker> persist
table <ssh_abusers> persist

### MACROS ###
WAN0=ix0
LAN0=ix1

### OPTIONS ###
set skip on lo0
set block-policy return
set limit table-entries     500000
set limit frags             50000
set limit states            500000
set limit src-nodes         500000

### SCRUB ###
scrub in all fragment reassemble

### NAT ###
# ip addresses are fake
nat on $WAN0 from 10.10.10.0/24  to any -> 33.5.24.10
nat on $WAN0 from 10.10.20.0/24 to any -> 33.5.24.11
# ...

### anti virus message ###
rdr pass on $LAN0 proto tcp from <conflicker> to any port http -> 33.5.24.10 port 300

### DEFAULT DENY ###
block in all
block out all

antispoof quick for $WAN0
antispoof quick for $LAN0

block quick proto { tcp, udp } from any port = 0 to any
block quick proto { tcp, udp } from any to any port = 0

### WAN ###
block in quick on $WAN0 from <ssh_abusers>

pass in quick on $WAN0 inet proto icmp from any to $WAN0 keep state
pass in quick on $WAN0 proto tcp from any to $WAN0 port ssh synproxy state (max-src-conn 5, max-src-conn-rate 10/30, overload <ssh_abusers> flush)
pass out quick on $WAN0 keep state

### LAN ###
anchor lan_computers
load anchor lan_computers from "/etc/pf.lan_computers"

pass out quick on $LAN0 from {10.10.10.1, 10.10.20.1, ...} to ($LAN0:network) keep state
File /etc/pf.lan_computers:
Code:
table <computers_net_a> persist { 10.10.10.2, 10.10.10.3, ... more ips }
table <computers_net_b> persist { 10.10.20.2, 10.10.20.3, ... more ips }
# ...  more networks

# this is using the ifconfig group name
pass in quick on lanif from <computers_net_a> keep state (source-track rule, max-src-states 1200, max-src-conn-rate 200/1)
pass in quick on lanif from <computers_net_b> keep state (source-track rule, max-src-states 300, max-src-conn-rate 100/1)
# ...
This Pastebin contains the usual ifconfig, netstat, /etc/sysctl.conf, /boot/loader.conf, etc. (I couldn't reasonably include it all inline in the post):
http://pastebin.com/Z4sN7mAT

I would be glad if someone could point me to what might be wrong as I have no idea.

My only workaround currently is to reload the ruleset with pfctl -f /etc/pf.conf from crontab every couple of minutes to ensure no one loses access to the internet.
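
For reference, an /etc/crontab entry along these lines (just a sketch; the 10-minute interval and output redirection are arbitrary choices on my part):
Code:
# stopgap: periodically reload the pf ruleset to recreate lost states
*/10  *  *  *  *  root  /sbin/pfctl -f /etc/pf.conf > /dev/null 2>&1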

Thank you
 
Now that this has happened again, I had a bit of a chance to check it.

ip address of affected pc: 10.10.20.2
ip address of my gateway: 10.10.20.1

tcpdump -i ix1 src 10.10.20.2
Code:
13:27:38.778926 ARP, Request who-has 10.10.20.1 tell 10.10.20.2, length 46
13:27:44.571924 ARP, Request who-has 10.10.20.1 tell 10.10.20.2, length 46
13:27:58.134455 ARP, Request who-has 10.10.20.1 tell 10.10.20.2, length 46
...
As can be seen, the PC is periodically requesting the MAC address of my gateway but receives no response.

Meanwhile I can arping (and ping as well) the affected PC from the router with no problem.

arping -I ix1 10.10.20.2
Code:
60 bytes from 00:24:e8:34:ab:a3 (10.10.20.2): index=0 time=955.104 usec

I am using staticarp on the LAN interface (ix1):
Code:
ifconfig ix1 inet 10.10.20.1 netmask 255.255.255.0 group lanif staticarp -tso -lro
...
arp -a | grep '10.10.20.2'
Code:
? (10.10.20.2) at 00:24:e8:34:ab:a3 on ix1 permanent [ethernet]
The IP:MAC mappings are kept in the file /etc/ethers:

cat /etc/ethers
Code:
...
10.10.20.2 00:24:e8:34:ab:a3
...
Reloading it has no effect: arp -f /etc/ethers

Only after reloading pf with pfctl -f /etc/pf.conf is the PC able to send data to my router again.

The exact same setup with staticarp was used on the Broadcom NIC before.
 
Good troubleshooting. I don't have any answers, but what you have observed appears to show the issue: you should be seeing ARP replies from your box back to the client. It doesn't make sense that reloading pf(4) makes it work; I didn't even think the layer 2 stuff mattered for pf(4). What does netstat -s -p arp look like when this is happening? Is it registering ARP requests and no replies? Is there any indication of limits or counters triggering in pfctl -vs info? Something like the following should be enough to check both:
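Code:
netstat -s -p arp   # per-protocol ARP statistics (requests/replies sent and received)
pfctl -vs info      # verbose pf status, including the Counters and Limit Counters sections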
 
So I had at least two occurrences today, with staticarp turned off on the NIC. I wasn't around, so I couldn't debug. Anyway, I have set up crontab to reload pf every 10 minutes, so it alleviates the problem somewhat for now...

Oh! I totally forgot about the verbose output of pfctl -vs info. Thanks junovitch.
There's something interesting in it:

Code:
State Table                          Total             Rate
  current entries                   176344
  searches                     14652698513       103553.4/s
  inserts                         89361952          631.5/s
  removals                        89185508          630.3/s
Source Tracking Table
  current entries                    10547
  searches                        49701595          351.3/s
  inserts                           247319            1.7/s
  removals                          236765            1.7/s
Counters
  match                          113753638          803.9/s
  bad-offset                             0            0.0/s
  fragment                             721            0.0/s
  short                               1073            0.0/s
  normalize                            588            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                           3933            0.0/s
  proto-cksum                          243            0.0/s
  state-mismatch                    103950            0.7/s
  state-insert                         400            0.0/s
  state-limit                            0            0.0/s
  src-limit                        1526320           10.8/s
  synproxy                           50294            0.4/s
Limit Counters
  max states per rule                    0            0.0/s
  max-src-states                   1526106           10.8/s
  max-src-nodes                          0            0.0/s
  max-src-conn                          34            0.0/s
  max-src-conn-rate                    181            0.0/s
  overload table insertion              34            0.0/s
  overload flush states                 34            0.0/s

Notice the high max-src-states counter, which means rules are hitting the maximum src-states limits I defined for them.

Pretty weird, because I have quite high limits. Why would the counter be so high?
Code:
table <computers_net_a> persist { 10.10.10.2, 10.10.10.3, ... more ips }
table <computers_net_b> persist { 10.10.20.2, 10.10.20.3, ... more ips }
# ... more networks

# this is using the ifconfig group name
pass in quick on lanif from <computers_net_a> keep state (source-track rule, max-src-states 1200, max-src-conn-rate 200/1)
pass in quick on lanif from <computers_net_b> keep state (source-track rule, max-src-states 300, max-src-conn-rate 100/1)
# ...

pfctl -s Sources | less
Code:
10.10.90.61 -> 0.0.0.0 ( states 7, connections 4, rate 0.0/1s )
10.10.90.61 -> 0.0.0.0 ( states 3, connections 3, rate 0.0/1s )
10.10.90.61 -> 0.0.0.0 ( states 1, connections 1, rate 0.0/1s )
10.10.90.61 -> 0.0.0.0 ( states 4294967293, connections 4294967293, rate 0.0/1s )
10.10.33.83 -> 0.0.0.0 ( states 2, connections 0, rate 0.0/1s )
10.10.100.180 -> 0.0.0.0 ( states 1, connections 1, rate 0.0/1s )
10.10.100.180 -> 0.0.0.0 ( states 1, connections 1, rate 0.0/1s )
10.10.28.17 -> 0.0.0.0 ( states 4294967295, connections 0, rate 0.0/1s )
10.10.18.74 -> 0.0.0.0 ( states 1, connections 1, rate 0.0/1s )
10.10.18.74 -> 0.0.0.0 ( states 15, connections 17, rate 0.0/1s )
10.10.18.74 -> 0.0.0.0 ( states 1, connections 1, rate 0.0/1s )
10.10.18.74 -> 0.0.0.0 ( states 1, connections 1, rate 0.0/1s )
10.10.18.74 -> 0.0.0.0 ( states 2, connections 2, rate 0.0/1s )
10.10.100.64 -> 0.0.0.0 ( states 128, connections 128, rate 0.0/1s )
10.10.100.64 -> 0.0.0.0 ( states 1, connections 1, rate 0.0/1s )
10.10.100.64 -> 0.0.0.0 ( states 7, connections 7, rate 0.0/1s )
10.10.100.64 -> 0.0.0.0 ( states 1, connections 1, rate 0.0/1s )
...

Whaaat? Some IP addresses report crazy amounts of states and connections. This can't be right.

pftop (with filter: src 10.10.90.61)
Code:
pfTop: Up State 1-20/20 (190298), View: default, Order: none, Cache: 100   19:12:28
PR    D SRC                   DEST                 STATE   AGE   EXP  PKTS BYTES
tcp   I 10.10.90.61:52117     31.13.93.3:443       10:10    46    79    36  7237
tcp   I 10.10.90.61:37304     173.194.67.147:443    9:4    340   800    19  5962
tcp   I 10.10.90.61:49477     216.58.209.78:443     9:4    666   475    38 14888
tcp   I 10.10.90.61:55235     88.220.177.161:80     4:4    106 86294     7   884
tcp   I 10.10.90.61:55236     88.220.177.161:80     4:4    106 86295    10  1584
tcp   I 10.10.90.61:55234     65.52.233.45:80       4:4    106 86295     8  1706
tcp   I 10.10.90.61:55237     157.55.253.50:80      4:4    106 86295     6  1243
tcp   I 10.10.90.61:55233     65.52.233.45:80       4:4    106 86295    10  2470
tcp   I 10.10.90.61:56222     31.13.93.3:443       10:10    35    79    27  3884
tcp   I 10.10.90.61:39008     216.58.209.74:443     9:4   1004   137    21  7756
tcp   I 10.10.90.61:40456     64.233.163.188:5228   4:4   1187 86353    39  7602
tcp   I 10.10.90.61:55060     216.58.209.78:443     4:4   3216 86400   638  352K
udp   I 10.10.90.61:43417     31.13.83.11:50030     2:2    377    60   178 32340
tcp   I 10.10.90.61:55227     195.149.238.207:443   4:4    661 86393 14851   14M
tcp   I 10.10.90.61:45548     31.13.93.3:443        9:4    433   587    27  7952
tcp   I 10.10.90.61:51705     31.13.93.3:443       10:10     5    88    29  4842
tcp   I 10.10.90.61:36343     69.171.235.48:443     4:4   1240 86388  1166  134K
udp   I 10.10.90.61:39340     69.171.239.36:3478    2:2    378    33    32  1728
udp   I 10.10.90.61:39340     31.13.100.97:54752    2:2    374    60 13278 1951K
tcp   I 10.10.90.61:55238     88.220.177.161:80     4:4    105 86295     7  1470

According to pftop, the IPs that show crazy amounts of states/connections in pfctl -s Sources actually have a normal number of them. Why would pf report such values then?

Also, pfctl -s Sources reports some IP addresses multiple times, e.g. 10.10.90.61 is reported 4 times in total. From my pf rules I would expect each IP address to be accounted for only once. All rules in /etc/pf.lan_computers are annotated with keep state.

It's really weird, and I hadn't checked this before with the Broadcom NIC. It could explain the errant behaviour where people sometimes randomly can't connect to my gateway: pf simply assumes an IP has exceeded its state limit and drops all its traffic. That would be why a reload with pfctl -f /etc/pf.conf magically fixes it.
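
If that theory is right, flushing just the source-tracking table might be a less disruptive workaround than reloading the whole ruleset, since it resets the per-source counters without touching the rules or existing states (I haven't verified that this helps here):
Code:
pfctl -F Sources   # flush only the source tracking table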

But then the question is: Why would the NIC change have anything to do with it?

I am perplexed.

Is anyone using similar rules, e.g. pass in on $IF from <some_table> keep state (source-track rule, max-src-states XXX, max-src-conn-rate XXX)? Do you also see IP addresses reported multiple times by pfctl -s Sources?
 
Anytime you see 4294967295, don't think it's legit; think 32-bit integer overflow (4294967295 is 2^32 - 1, i.e. an unsigned 32-bit counter that was decremented below zero). See PR 182401. I don't know how the NIC change would come into play, but if you chime in on that PR with what you are seeing, you'll probably get some more technical details.
 
Thanks! That's definitely the bug I am hitting. I will try some of the patches later. For the time being I will have to disable source-track rule.
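
Disabling it just means dropping the source-tracking options from my LAN rules, roughly like this (a sketch based on the rules quoted above; note the limits are simply gone, so abusive clients are no longer throttled):
Code:
# /etc/pf.lan_computers with source tracking removed
pass in quick on lanif from <computers_net_a> keep state
pass in quick on lanif from <computers_net_b> keep state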
 
I am marking this as solved. I applied the patch access pf_src_node->states only under pf_srchash lock from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=182401 and recompiled the kernel. It has been running fine for a few days already. It also turns out that I had changed my ruleset a few days prior to the NIC change but forgot about it. Thus the NIC change had nothing to do with the bug; it was just the source-track rule counters overflowing.
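
For anyone else hitting this, the rebuild was roughly the following (a sketch, assuming sources in /usr/src that match the running system and a GENERIC kernel config; the patch file name is only an example):
Code:
cd /usr/src
# apply the patch attached to PR 182401 (adjust the -p level to how the diff was generated)
patch < /root/pf_src_node_lock.patch
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now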
 