PF: I do not understand why certain packets are blocked when the src and dst IPs are explicitly and unconditionally allowed

We have a PF firewall with NAT running on FreeBSD 12, set up as a gateway router. We are attempting to transition from IPTables on CentOS to PF on FreeBSD. However, we have encountered a serious problem with a third-party ssh client running on MS Windows. This software randomly drops its connection with the upstream VAN provider, which mandates the use of this software. This problem only began following the move to the FreeBSD PF gateway, so suspicion falls mainly there. We have discovered that PF is blocking some packets to and from the VAN despite there being specific allowances for such traffic in the rule set.

I do not understand TCP to any great extent, so I cannot decipher what is causing PF to ignore the specific PF rules provided for the VAN. I can guess that it is something about the TCP flags or timeouts, but I cannot interpret the traces. I tried simply removing the Calomel macros and going with nothing on the rules, which should have given me the default 'keep state flags S/SA'. But that did not seem to work.

Switching to explicitly setting 'keep state flags S/SA' appears to eliminate the problem but I do not understand why this is necessary.

In our original PF ruleset we have the following macros taken from the Calomel example:

Code:
TcpState = "flags S/UAPRSF modulate state"
SshSTO   = "(max 5, source-track rule, max-src-conn 5, max-src-nodes 5, max-src-conn-rate 5/30, overload <BLOCKTEMP> flush global)"

We have these default block rules:

Code:
block return  out log   all
block drop    in  log   all

We also have these specific rules further down:

Code:
pass          in  log   quick \
                  from  11.22.123.34 \
                  to    any $TcpState $SshSTO

pass          out log   quick \
                  from  any \
                  to    11.22.123.34 $TcpState $SshSTO
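For clarity, here is what the first pass rule above looks like with the macros substituted (a sketch of the expansion for reference, not an additional rule):

Code:
pass          in  log   quick \
                  from  11.22.123.34 to any \
                  flags S/UAPRSF modulate state \
                  (max 5, source-track rule, max-src-conn 5, max-src-nodes 5, \
                   max-src-conn-rate 5/30, overload <BLOCKTEMP> flush global)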
However, some packets to and from nat'ed connections with 11.22.123.34 are blocked by the default rules. Tcpdump shows this:
Code:
00:00:00.035645 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.61121: Flags [.], ack 2086591712, win 160, length 0
00:00:00.008608 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.58929: Flags [.], ack 4167718526, win 160, length 0
00:00:00.048635 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.61121: Flags [.], ack 1, win 160, length 0
00:00:00.045995 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.58929: Flags [.], ack 1, win 160, length 0
00:00:00.176116 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.61121: Flags [.], ack 1, win 160, length 0
00:00:00.181632 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.58929: Flags [.], ack 1, win 160, length 0
00:00:00.037239 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.57395: Flags [.], ack 3896367661, win 160, length 0
00:00:00.160030 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.61121: Flags [.], ack 1, win 160, length 0
00:00:00.049054 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.58929: Flags [.], ack 1, win 160, length 0
00:00:00.130106 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.57395: Flags [.], ack 1, win 160, length 0


I cannot interpret the TCP flags and I have checked the timeouts but do not see anything unusual there:
Code:
pfctl -s timeouts
tcp.first                   120s
tcp.opening                  30s
tcp.established           86400s
tcp.closing                 900s
tcp.finwait                  45s
tcp.closed                   90s
tcp.tsdiff                   30s
udp.first                    60s
udp.single                   30s
udp.multiple                 60s
icmp.first                   20s
icmp.error                   10s
other.first                  60s
other.single                 30s
other.multiple               60s
frag                         30s
interval                     10s
adaptive.start            24000 states
adaptive.end              48000 states
src.track                     0s

Can someone explain to me what it is about 'flags S/UAPRSF modulate state' that causes the problem we are seeing? Or is this issue related to NAT'ing in some way? It seems that the stateful TCP connection initially established gets forgotten somehow, and the packets that follow get treated as a new connection, thus causing the failure. But I do not know why.
 
I'd start by testing whether you still have the problem without 'flags S/UAPRSF' set. I'd need more time and a full Wireshark capture to debug that in depth, but that's the first thing that springs to mind.
 
I have tested the pass rules using the default setting for PF (keep state flags S/SA) and with keep state flags S/SA explicitly set. I have also tried keep state flags any. With this last setting there are no more block records logged in the pflog, but the application keeps faulting out and the connection drops on the clients regardless. This seems to say that there is something wrong with the way I have implemented nat, or with how the established tcp connections are being managed by PF. I have zero experience with this so I have no idea what or where the problem is.

I can provide a wireshark capture dump for a period in which several sessions dropped but nothing was logged by pf.
 
Code:
pass in log quick \
    from 11.22.123.34 \
    to any $TcpState $SshSTO

pass out log quick \
    from any \
    to 11.22.123.34 $TcpState $SshSTO
First off, these rules effectively allow any traffic (tcp, udp, any port number, in and out). If that is what was intended, then it's fine, but you were specifically talking about SSH.
However, some packets to and from nat'ed connections with 11.22.123.34 are blocked by the default rules. Tcpdump shows this:

Code:
00:00:00.035645 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.61121: Flags [.], ack 2086591712, win 160, length 0
00:00:00.008608 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.58929: Flags [.], ack 4167718526, win 160, length 0
00:00:00.048635 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.61121: Flags [.], ack 1, win 160, length 0
00:00:00.045995 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.58929: Flags [.], ack 1, win 160, length 0
00:00:00.176116 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.61121: Flags [.], ack 1, win 160, length 0
00:00:00.181632 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.58929: Flags [.], ack 1, win 160, length 0
00:00:00.037239 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.57395: Flags [.], ack 3896367661, win 160, length 0
00:00:00.160030 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.61121: Flags [.], ack 1, win 160, length 0
00:00:00.049054 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.58929: Flags [.], ack 1, win 160, length 0
00:00:00.130106 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.57395: Flags [.], ack 1, win 160, length 0
The "Flags [.]" means the ACK flag. I tend to think that when ACKs get lost, the connection will sooner or later break as a result. It is important to understand that PF rules only apply to the initial packet that establishes a (stateful) connection. Everything after that is handled by PF's state table. The reason why those packets are dropped is likely caused by a non-existent/invalid state. When the packets do not match any entry in the state table and the packet's flags do not correspond with those in the filter rule that creates a new state (S/SA), there is nothing else left but to drop the packet.

A few things that might help you track down this problem further...

Verify that the effective rules are actually what you wanted them to be. There are a few things at play here (ruleset optimization, implicit defaults, etc.), so the effective rules might differ from those in your config: pfctl -s rules. Check the state table for anything that could help identify the cause of the problem: pfctl -s state. You might also try to simplify those rules and restrict them to only the intended protocol/port number. You can add other stuff like source tracking once the rules have proven to work without those features. Assuming the intended purpose is for any host to SSH into host 11.22.123.34 on port 2148, your rule could look like this:
Code:
pass log quick inet proto tcp from any to 11.22.123.34 port 2148 keep state
As the rule has no direction specified, it will match in both directions, so you only need one rule instead of two.
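When reworking rules like this, it may also help to syntax-check the file before loading it (standard pfctl usage; the path assumes the default config location):

Code:
pfctl -nf /etc/pf.conf    # parse only; report errors without loading
pfctl -f /etc/pf.conf     # actually load the ruleset
pfctl -s rules            # inspect the effective rules afterwards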
 
Thank you. Originally I had specified the protocol and had one rule. In order to assist with determining what was going on I revised it many times. What I have shown here is the current state of the rules respecting this one specific address.

I follow your comments respecting the invalid tcp packets. That is what I thought was one of the possibilities. However, the fact remains that this problem does not happen, or if it does, does not cause the same result, when the gateway service is provided by iptables on centos. That is what is puzzling me. I have attached a png screenshot of the wireshark trace at the time one of the sessions crashed.
 

Attachments

  • Wireshark_NetTerm_2019-02-22_16-05-40.png (138.8 KB)
I follow your comments respecting the invalid tcp packets. That is what I thought was one of the possibilities. However, the fact remains that this problem does not happen, or if it does, does not cause the same result, when the gateway service is provided by iptables on centos. That is what is puzzling me. I have attached a png screenshot of the wireshark trace at the time one of the sessions crashed.
It's hard to compare oranges with tomatoes, and while I have no experience whatsoever with iptables, I believe its inner workings are quite different from PF's, as are the rulesets involved. The wireshark trace shows a lot of retransmissions, which would fit with ACKs not getting through, I guess. Did you experience connection problems with any other connections passing through your gateway, specifically TCP connections? Any chance the interface you are performing NAT on has multiple IP addresses assigned? Perhaps you could post the entire pf.conf?
 
The pf and rc config files for that gateway host are attached below. There is only one IPv4 address assigned to the external i/f. The internal i/f has several aliases.
 

Attachments

  • pf.conf.gway04.txt (14.3 KB)
  • rc.conf.gway04.txt (4.4 KB)
The pf and rc config files for that gateway host are attached below. There is only one IPv4 address assigned to the external i/f. The internal i/f has several aliases.
That's quite the config you got there... When there's only a single address on the NAT interface then this shouldn't be a problem, as long as all traffic that needs to be NATed is actually passed to the nat rule. I have a few suggestions that should help to simplify your config and also make debugging easier (a consolidated sketch follows the list):
  1. There's no need to put statements in curly braces when they contain only a single item, i.e. { tcp } -> tcp
  2. You are using macros to define several larger lists of addresses / network blocks, which results in PF creating one rule for every item in the list. It's more efficient to use tables instead, where a single table can hold any number of addresses/networks. Rules can then simply reference the table. PF's ruleset optimizer would probably already do this, but you explicitly disabled it by using set ruleset-optimization none. Then again, relying on the optimizer leaves you with a loaded ruleset that differs considerably from what you had in your config file, which makes it harder to trace down problems.
  3. I would rather not use hostnames in /etc/pf.conf as there's a good chance DNS is not up/reachable at the time when the configuration gets loaded during system boot.
  4. For well-known port numbers that are defined in /etc/services you can use the service name to make your configuration more readable and to eliminate the use of macros for this purpose, i.e. instead of defining a macro port_dns = "{ 53 }", remove the macro and just use domain instead.
  5. You should never have to deal with any other ICMP types than echoreq, which is needed to allow for ping to pass through. All other icmp types like timex, unreach, paramprob and also the answers to previous echo requests - echorep - should automatically and statefully be handled by PF without needing any rules whatsoever.
  6. There's no reason to disable traffic normalization (scrubbing), which performs fragment reassembly and sanitizes packets that could otherwise cause problems later on. I suggest adding something like scrub in all in the normalization section of your config.
  7. Why do you disable filtering on the internal interface completely by using set skip on em0?
  8. The rules that provide the default (deny) action for anything not matched otherwise should be the first filter rules in your ruleset, so those should be moved before the antispoof rule(s). The antispoof rules could also use quick, so that processing of filter rules stops when those match.
  9. You have two rules to block connections from addresses in the temp/perm tables. Is there a reason why one rule applies to tcp+udp only and the other does not? Also, you can use a single rule to handle both tables at once by using curly braces, i.e. block in from { <BLOCKTEMP> <BLOCKPERM> }
  10. Do not use flags any on a rule that also uses keep state as that would have any packet create a state which is certainly not what you want and likely to break something.
  11. Double check your filtering rules for traffic direction. I'm not actually sure whether those rules serve the purpose that is intended. Remember that stateful rules create a state that will allow for successive traffic belonging to that connection to flow in both directions. So you don't need a separate rule if all you want is to allow for the return traffic to pass. A separate rule is only needed if you actually want to allow for connections to be set up in both directions.
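To make a few of these points concrete, here is a minimal sketch of what the reordered and simplified portions might look like (interface and table names are illustrative, not taken from the attached config):

Code:
scrub in all                                   # 6: re-enable normalization

block return out log all                       # 8: defaults first...
block drop   in  log all
antispoof log quick for { em0 em1 }            # 8: ...then antispoof, with quick

block in log quick from { <BLOCKTEMP> <BLOCKPERM> } to any   # 9: both tables, one rule

pass in inet proto icmp icmp-type echoreq      # 5: ping only; replies are stateful
pass in proto tcp from any to any port domain  # 4: service name instead of a macro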
 
Ah. Well, the pf.conf I ended up with is not the pf.conf I intended, nor is it what I expect to have in place at the end of this exercise, if I can get things to work as I require. Much of this is the consequence of trying to make things clear to ME and to avoid having to edit multiple files. I will attempt to answer your questions to the best of my ability. And thank you for your assistance.

1. Curly braces and single items. Understood; the setup I have is designed so that I can insert and remove addresses and ports easily for testing. In the case of addresses, this is because I wish to test certain network configurations from non-standard (for us) server hosts. For ports, the same holds true. Once this is all working much of this will be unnecessary.

2. Lists vs. tables and no optimization are in place to allow single source editing and ease relating specific log entries to specific configuration file entries. Once things are stabilised then much of this can be simplified, as you suggest.

3. With respect to DNS entries in pf.conf, these were put in place as an experiment, which did not prove useful, and were replaced with the IPv4 address, as shown in the pf.conf provided.

4. I prefer to work with port numbers mostly, and I chose to use macros, even when there is only one real choice, to provide a consistent syntax for my personal use. This is entirely idiosyncratic (de gustibus non est disputandum).

5. I will return the icmp types back to only allowing echoreq. Again, the current state is the result of successive iterations attempting to figure out what is going on with the dropped packets.

6. Scrubbing was turned on up to this point and was turned off to see if that had any effect on the problem. I will turn it back on.

7. Em0 is not filtered because of our internal, and as I learned highly unusual, network topology, outlined in a separate thread. Basically we are running multiple net-masks over the same wire without the benefit of hardware that supports vlans. This has to be dealt with at some point but cannot be addressed at the moment. This situation evolved over time because iptables on our earlier gateway never exhibited any problems, given that we only filtered forwarded packets.

8. I will move the antispoof rule to follow the default.

9. The separation of tcp and udp and blockperm from blocktemp was to provide finer granularity in the logging. Recall that PF is completely new to me.

10. With flags any the frequency of dropped connections is much reduced. Again, this was a blind attempt at trying to discover what is causing the problem with dropped connections to begin with. It has had an effect.

11. The existing rule set contains artefacts from multiple tests wherein I needed to see in the logs exactly how traffic was affected.

I will reduce the pf.conf to the minimal needed and incorporate your suggestions to the best of my ability and understanding. Then I will report the results here.
 
The previous post was written on Saturday. I just did not press send.

Simplifying the pf.conf did not change the outcome. However, as a test I created two ssh connections to the problem dst ipv4 from the same host, each employing differing src addresses. One was an alias from our public netblock. The other from a private IP that must be NATed.

The public address passes through the pf filter without problems. The connection does not drop until terminated by the client. The one which uses NAT failed after the usual brief period.
 
2. Lists vs. tables and no optimization are in place to allow single source editing and ease relating specific log entries to specific configuration file entries. Once things are stabilised then much of this can be simplified, as you suggest.
The table vs. macro question is actually pretty simple math. If you define a macro that holds 10 addresses and then use that macro in three different places, the result is 30 rules! Implementing the same thing with a table takes almost the same number of lines in your config, but the result is just one table and three rules instead of 30. And it's easy to write too:
Code:
table <something> const persist { \
        10.1.1.0/24, 10.1.5.0/24, 10.1.19.0/24, \
        192.168.100.1, 192.168.100.28, 192.168.100.142, \
        fd96:7824:eab4:c68a::/64 }

...

block log quick from <something> to any
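As a side note (not covered above): if the table is declared without const, its contents can also be changed at runtime without reloading the whole ruleset:

Code:
pfctl -t something -T add 10.1.7.0/24      # add an entry on the fly
pfctl -t something -T show                 # list the current entries
pfctl -t something -T delete 10.1.7.0/24   # remove it again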

Simplifying the pf.conf did not change the outcome. However, as a test I created two ssh connections to the problem dst ipv4 from the same host, each employing differing src addresses. One was an alias from our public netblock. The other from a private IP that must be NATed.

The public address passes through the pf filter without problems. The connection does not drop until terminated by the client. The one which uses NAT failed after the usual brief period.
So it seems the problem is somehow related to NAT. I think this leaves three possible causes:
  1. Not all traffic that should be is passed to the nat rule.
  2. NAT does something terribly wrong
  3. A filter rule blocks (portions of) the traffic after NAT.
In order to eliminate 1 I would try to make sure that the rules after loading (check pfctl -s rules) result in exactly one nat rule, not several, and to keep that rule as broad and generic as possible, even if it's only for testing. If possible, use some broad address/mask to feed traffic to the nat rule, like 192.168.0.0/16. As to 2, you might want to try adding :0 to the interface name of the nat target address and see whether that makes any difference, i.e. nat on $ext_if from 192.168/16 to any -> ($ext_if:0). This should make sure that nat only uses the first address of that interface, even if there are several. As for 3, you can probably only check any traffic that is blocked using tcpdump -netttti pflog0 and see if there's something that shouldn't be blocked.
 
IS this the problem?

nat on em1 inet from 192.168.8.0/24 to any -> (em1) round-robin
nat on em1 inet from 192.168.150.0/24 to any -> (em1) round-robin
nat on em1 inet from 192.168.216.0/24 to any -> (em1) round-robin
 
IS this the problem?

nat on em1 inet from 192.168.8.0/24 to any -> (em1) round-robin
nat on em1 inet from 192.168.150.0/24 to any -> (em1) round-robin
nat on em1 inet from 192.168.216.0/24 to any -> (em1) round-robin
I think it's worth a shot. Those three rules are probably the result of using the $net_nat macro, which would not happen with either tables or a plain address/mask specification. Reduce that to a single rule and see if it makes any difference.
Code:
nat on em1 inet from 192.168/16 to any -> (em1:0)
That should also make the 'round-robin' go away if I remember it right.
 
This I have done. I am awaiting a suitable opportunity to reload the rules. All the NAT connections drop when I do that. At the moment the misery is spread around. I would rather not provoke them all at once.
 
Reloaded PF:

Code:
pfctl -s all | grep nat
nat on em1 inet from 192.168.0.0/16 to any -> (em1:0)

We will see if that solves the problem.
 
This did not resolve the issue. We have however established the following:
  1. The problem only affects natted tcp connections - direct connections using public IPv4 through the gateway are not affected.
  2. All natted connections are broken at the same time.
 
This did not resolve the issue. We have however established the following:
  1. The problem only affects natted tcp connections - direct connections using public IPv4 through the gateway are not affected.
  2. All natted connections are broken at the same time.
Looks like we're back to square one, in a way. That single NAT rule should effectively NAT all traffic with a source address within 192.168/16 that traverses the em1 interface to em1's first address only. I don't see what could possibly go wrong on that part. Your earlier findings seemed to suggest that portions of the traffic (more specifically ACKs) got dropped by the default deny rule. But as ACKs are usually part of an established TCP connection - which means your ruleset is not supposed to have any rules specifically dealing with ACKs - this can only be a symptom, not the cause. The ACKs should not have been passed through the filter ruleset in the first place, but rather should have been matched by the state table. That leaves the question: why aren't they?

Now that I think about it, something else seems odd:
However, some packets to and from nat'ed connections with 11.22.123.34 are blocked by the default rules. Tcpdump shows this:
Code:
00:00:00.035645 rule 1/0(match): block in on em1: 11.22.123.34.2148 > 75.232.5.45.61121: Flags [.], ack 2086591712, win 160, length 0
Neither the source, nor the destination address appear to be from a private network block, so how can there be NAT involved at all?

Also you mentioned that:
Switching to explicitly setting 'keep state flags S/SA' appears to eliminate the problem but I do not understand why this is necessary.
Does this still fix the connection problems? If so, how about flags S/SAFR? Still working? Broken again?
 
The problem exists irrespective of the flags setting. I have actually tried using flags S/S and the problem persists. There is a noticeable reduction in the number of incidents when flags any or flags S/S is in place on the rules concerning the host in question. But the problem persists.

I am attaching a couple of wireshark packet extracts each showing 13 packets received at about the same time that a connection failure was noted. These cover the remote host and the NAT external address.
 

Attachments

  • ssh_drop.txt (7.6 KB)
  • ssh_drop_2.txt (8.3 KB)
RE: Neither the source, nor the destination address appear to be from a private network block, so how can there be NAT involved at all?

What happened is that I likely, and inadvisedly, manually altered the addresses in that post to obscure the actual IPs. Replace 75.232.5.45 with 192.250.28.164 and 11.22.123.34 with 72.142.105.234. The private network address is natted before it reaches the filter rules, and the nat rule itself is not logged, so it never appears in the tcpdump.
 
Attached is a wireshark packet trace with 100 entries, commencing before and ending after a connection drop, which provides details of the traffic leading up to the drop and the period immediately after.
 

Attachments

  • ssh_drop_3.txt (113.7 KB)
RE: Neither the source, nor the destination address appear to be from a private network block, so how can there be NAT involved at all?

What happened is that I likely, and inadvisedly, manually altered the addresses in that post to obscure the actual IPs. Replace 75.232.5.45 with 192.250.28.164 and 11.22.123.34 with 72.142.105.234. The private network address is natted before it reaches the filter rules, and the nat rule itself is not logged, so it never appears in the tcpdump.
So that explains that. Looking at the two packet traces you provided, there seem to be multiple connections involved (given the src/dst port numbers). The most interesting one seems to be 72.142.105.234:64366 -> 192.250.28.164:2148. You can see that at 09:32:09 the initial three-way handshake takes place, which establishes the TCP connection and should also create a state table entry that allows successive traffic to flow in both directions between these source/destination addresses/ports without the need to consult the filtering rules again:
Code:
09:32:09.841899882 72.142.105.234:64366 -> 192.250.28.164:2148 SYN
09:32:09.847559378 72.142.105.234:64366 <- 192.250.28.164:2148 SYN+ACK
09:32:09.847751386 72.142.105.234:64366 -> 192.250.28.164:2148 ACK
The second log shows a total of 8 retransmissions (192.250.28.164:2148 -> 72.142.105.234:64366) starting at 09:43:00, all with Seq=14061 Ack=4015. This seems to indicate that the server did not see an acknowledgement from the client for data it had sent previously and is therefore resending the data. But as the retransmissions remain unanswered, the connection probably breaks when one side gives up on it.

So that is what's happening, but it still does not explain why it's happening. First thing to check for would be if you still see blocked packets from such connections. You can use tcpdump -netttti pflog0 to check for blocked packets in real-time, or use the recorded log file with tcpdump -nettttr /var/log/pflog. With some luck you might still find something that belonged to the above mentioned connection. You can restrict the display of blocked packets by specifying further arguments to tcpdump, something like tcpdump -nettttr /var/log/pflog host 72.142.105.234 and port 64366 should work to see only those packets involving the specified IP and port number. If there are still blocked packets next thing to take note of is which rule did block those packets. I suspect it will be the default deny rule at the start of the ruleset, which essentially means the entire filter ruleset was checked and that there was no other rule which explicitly allowed the packet in question.

Anyway, that packet should not have been passed through the filter ruleset at all, because the expected situation would be that a state table entry exists which allows the traffic to pass without requiring it to pass through the filter ruleset (again). So this is merely an indication that for some reason the state table did not match the packet although it should have. You can view the state table with pfctl -s state. This will of course show all currently active states, so you might want to pipe the output through grep to narrow it down to just the interesting parts. Adding the -v switch to the pfctl command will show verbose information for each state table entry, including age and expiration. Two things are of particular interest: 1) does a newly created connection actually create the corresponding state at all, and 2) is this state still in effect at the time when the connection starts to break?
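For convenience, here are the checks described above collected in one place (the address and port are the example values from this thread):

Code:
# watch blocked packets in real time
tcpdump -netttti pflog0
# or read the recorded log, narrowed down to one connection
tcpdump -nettttr /var/log/pflog host 72.142.105.234 and port 64366
# inspect the state table, verbosely, for the interesting entries
pfctl -vs state | grep 72.142.105.234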
 
Just making sure: are you starting a connection, then (while the connection is live) changing the rules, and then finding that it breaks (disconnects)? I believe you can get into some odd states by doing that — you want to have the full ruleset in place before the connection is established.

I haven’t grokked the whole (long) thread here, but my dad always said to try the easy things first. ;)
 
No, I do not load revised pf.conf while there are active connections.

As an experiment I took the PF system down and disabled hyper-threading. The system is an Atom D525 with two cores on a single cpu. After restarting we could barely establish an ssh or https connection over nat before the connection broke. However, pass-through ssh/https connections from hosts with public addresses remained unaffected. So, that did not work.

We have had to return to IPTables on the older router for now. I will set up the test LAN again and work there. But I sense that this problem is sensitive to the number of connections being natted. I tested this setup extensively before switching to it and never ran into this problem until it went live. The only difference that I can see is that the test setup usually consisted of one, or maybe two, concurrent nat connections, and neither was generating much in the way of actual traffic, whereas the live load was much greater both in connection numbers and in data volume.
 
I am resurrecting this thread, given that I have returned to the problem.

We have a solution, which is to return to FreeBSD-11.2p9 from 12.0p3. The problem does not surface on 11.2.

We still have the problem of SSH traffic routed on the LAN being blocked by the default drop rules, despite the fact that the client and the server are talking on the same internal network (192.168.216.0/24) and that the gateway is set up to respond to 192.168.0.0/16. If we 'skip' em0 then ssh connections succeed, and if we do not skip em0 then they fail. We have quick rules to permit in and out on the internal network interface for all local addresses but they are never triggered, for reasons that escape me.

For example: ssh from 192.168.216.44 to 192.168.216.31 gets this treatment:

Code:
00:00:00.020825 rule 5/0(match):
block in on em0: 192.168.216.31.22 > 192.168.216.44.64123: Flags [S.],
seq 283093415, ack 1513496196, win 65535,
options [mss 1440,nop,wscale 6,sackOK,TS val 956747543 ecr 3683277312],
length 0

despite all these rules, none of which seem to apply:
Code:
# rc.conf
. . .
ifconfig_em0_alias6="inet 192.168.0.1/16"
. . .
Code:
# pf.conf
. . .
nat               log   on $ext_if \
                  from  192.168.8.0/24 \
                  to    any -> ($ext_if:0)
. . .
pass                    quick on $int_if \
                  from  $int_if:network \
                  to    $int_if:network
. . .
pass                    quick on $int_if \
                  from  { self 192.168.0.0/16 216.185.71.0/25 } \
                  to    { 192.168.0.0/16 216.185.71.0/25 }
. . .
pass              log   quick proto { tcp } \
                  from  { $int_if:network } \
                  to    { $int_if:network } port  $port_ssh \
                  $TcpState $SshSTO
. . .

If somebody could explain to me why none of these rules apply and why the ssh server reply to the client gets blocked then I would appreciate it.
 
What is rule 5 (the one that is blocking it)?

pfctl -vv -s rules

Is it possible the software you are using is tripping the overload rule you've got in SshSTO, such that the states are flushed and what was an active connection (in a state table) is no longer recognized as one? Is an address showing up in BLOCKTEMP (pfctl -t BLOCKTEMP -T show) when the problem is occurring? How do you flush out BLOCKTEMP? I assume there is a rule somewhere using it?
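For reference, a table's contents can be inspected and cleared manually (this does not answer how your config expires entries automatically, which is the question above):

Code:
pfctl -t BLOCKTEMP -T show          # list the current entries
pfctl -t BLOCKTEMP -T flush         # remove all entries
pfctl -t BLOCKTEMP -T expire 3600   # remove entries older than 3600 seconds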

In general, rather than starting with Calomel's optimized/tweaked ruleset, you should start with a simple one (block in log) and add only the things you need as you find them. You've got lots of extra options (you likely don't need to play with flags, for example) that add to the debugging complexity.
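As a rough illustration of that approach, a minimal starting point might look like this (interface names and the final rule are placeholders to be extended step by step, not a drop-in config):

Code:
ext_if = "em1"
int_if = "em0"

set skip on lo0
scrub in all

nat on $ext_if inet from 192.168.0.0/16 to any -> ($ext_if:0)

block in log all
pass out keep state
pass in quick on $int_if proto tcp to any port ssh keep state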
 