[Solved] Losing access to (some jailed) sshd after pfctl -F rules

Hey guys,
I'm seeing some very strange behavior and I've run out of ideas:

Situation:
  • FreeBSD server running 13.0-RELEASE
  • several (bastille) jails running 13.0-RELEASE
  • some older (bastille) jails running 12.4-RELEASE
  • SSH access to host (x.x.0.0/24 network) works
  • SSH access to all jails (x.x.6.0/23 network on the same physical NIC as the host) works
  • LAN interface is lagg0 <-- all jails go here

Pattern:
Reloading the pf firewall with pfctl -f /etc/pf.conf -F rules or pfctl -f /etc/pf.conf -F all leads to the following situation:

Symptoms:
  • Host:
    • access to all services
  • 13.0-jails:
    • SSH access to the jail from host/LAN clients is TIMEOUT (see below for more details on the SSH situation)
      • even for 13.0-jails created after the pf reload mentioned above
    • SSH access from the jail to some LAN client (not the host) works
    • SSH access from the jail to the host does NOT work (timeout)
    • SSH access from the jail to the 12.4-jail does NOT work (timeout)
    • Access to and from other services (HTTPS, DNS, etc.) still works
  • 12.4-jail:
    • SSH access to the jail from host/LAN clients still works
  • Access to all other services on all jails still works (so it doesn't seem to be a routing or firewall problem)
  • tshark shows NO blocked packets on the pflog0 interface
  • tshark on the lagg0 interface shows only the initial TCP handshake for SSH, then the client just keeps waiting ...
  • Jail: No entries or errors in /var/log/messages, /var/log/auth.log, /var/log/debug.log - even when setting "LogLevel DEBUG3" in /etc/ssh/sshd_config
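For anyone reproducing these captures without tshark, here is a sketch using tcpdump(1) from the FreeBSD base system. The function names are hypothetical; the interface names come from this post. Keep in mind that pf only copies packets to pflog0 for rules that carry the log keyword, so an empty pflog0 capture means no logging rule blocked the traffic, not necessarily that nothing was blocked.

```sh
#!/bin/sh
# Sketch: tcpdump equivalents of the tshark captures in this post.
# Function names are hypothetical; set TCPDUMP to a stub to dry-run.
TCPDUMP=${TCPDUMP:-tcpdump}

# Watch packets logged by pf. Only rules containing the "log" keyword
# send packets to pflog0; -e adds the rule number and action.
watch_pf_log() {
    $TCPDUMP -n -e -ttt -i pflog0
}

# Watch SSH traffic for one jail address on the LAN interface.
# $1 = jail address (e.g. the x.x.6.27 placeholder from this post)
watch_jail_ssh() {
    $TCPDUMP -n -i lagg0 "tcp port 22 and host $1"
}
```

Run both as root in separate terminals while retrying the SSH connection.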
Client trying to connect:
Code:
user@client:[~]0$ ssh -Tv user@x.x.6.27
OpenSSH_8.8p1, OpenSSL 1.1.1m  14 Dec 2021
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to x.x.6.27 [x.x.6.27] port 22.
debug1: Connection established.
debug1: identity file /home/user/.ssh/id_rsa type 0
debug1: identity file /home/user/.ssh/id_rsa-cert type -1
debug1: identity file /home/user/.ssh/id_dsa type -1
debug1: identity file /home/user/.ssh/id_dsa-cert type -1
debug1: identity file /home/user/.ssh/id_ecdsa type -1
debug1: identity file /home/user/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/user/.ssh/id_ecdsa_sk type -1
debug1: identity file /home/user/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /home/user/.ssh/id_ed25519 type -1
debug1: identity file /home/user/.ssh/id_ed25519-cert type -1
debug1: identity file /home/user/.ssh/id_ed25519_sk type -1
debug1: identity file /home/user/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /home/user/.ssh/id_xmss type -1
debug1: identity file /home/user/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.8
...and then the connection times out.


tshark on lagg0:
Code:
  148 3273.720887483   x.x.0.19 → x.x.6.27   TCP 74 57186 → 22 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=3795582793 TSecr=0 WS=128
  149 3273.720920821   x.x.6.27 → x.x.0.19   TCP 58 22 → 57186 [SYN, ACK] Seq=0 Ack=1 Win=0 Len=0 MSS=1460
  150 3273.721031844   x.x.0.19 → x.x.6.27   TCP 60 57186 → 22 [ACK] Seq=1 Ack=1 Win=64240 Len=0
  151 3273.926837596   x.x.0.19 → x.x.6.27   TCP 60 [TCP Keep-Alive] 57186 → 22 [ACK] Seq=0 Ack=1 Win=64240 Len=0
  152 3274.346844154   x.x.0.19 → x.x.6.27   TCP 60 [TCP Keep-Alive] 57186 → 22 [ACK] Seq=0 Ack=1 Win=64240 Len=0
  153 3275.173162155   x.x.0.19 → x.x.6.27   TCP 60 [TCP Keep-Alive] 57186 → 22 [ACK] Seq=0 Ack=1 Win=64240 Len=0
x.x.0.19 = LAN client trying to reach SSH in the jail
x.x.6.27 = one of the 13.0 jails

I managed to temporarily "fix" this with a brute-force approach:
Code:
bastille stop ALL    # stops all jails and removes their IP aliases
service netif restart
service routing restart
pfctl -n -f /etc/pf.conf && pfctl -F all -f /etc/pf.conf
bastille start ALL

Does anyone have some pointers for me?
I'm out of ideas, as it seems to simply drop SSH availability on the ifconfig alias.
 
If you don't have any specific pf rules for SSH and everything else works, it's probably DNS.
sshd tries to reverse-resolve the client, and that can take a long time.
Try host someclientip from one of the sshd boxes.
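A sketch of that check, run from the host: it performs the same reverse lookup sshd would do, but from inside the jail. jexec(8) and host(1) ship with the FreeBSD base system; the function name is hypothetical, and the jail name and client IP are placeholders. If the answer takes several seconds or fails, reverse DNS is a plausible cause of the delay.

```sh
#!/bin/sh
# Sketch: reverse-resolve a client address from inside a jail, the way
# sshd would. Names are placeholders; set JEXEC to a stub to dry-run.
JEXEC=${JEXEC:-jexec}

# $1 = jail name or JID, $2 = client IP address
check_reverse_dns() {
    $JEXEC "$1" host "$2"
}
```

For example: check_reverse_dns myjail x.x.0.19 (both placeholders).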
 
Bastille rdr works by adding rules to an anchor in PF. A pfctl -F all or pfctl -F rules would clear those anchors too.

And please show us your pf.conf. It's going to be difficult to guess where the issue is if we don't know how your rules are set up.
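To see what a flush actually removes, the anchors can be listed before and after the reload. A sketch, assuming Bastille keeps its redirect rules in sub-anchors named like rdr/<jail> below an rdr-anchor "rdr/*" line (check the anchor lines in your own pf.conf); the function names and the jail name are hypothetical.

```sh
#!/bin/sh
# Sketch: inspect pf anchors. Run as root; set PFCTL to a stub to dry-run.
PFCTL=${PFCTL:-pfctl}

# List anchors; with -v, pfctl walks the whole anchor tree recursively.
list_anchors() {
    $PFCTL -v -s Anchors
}

# Show the translation (nat/rdr) rules loaded in one anchor.
# $1 = anchor path, e.g. "rdr/myjail" (hypothetical jail name)
show_anchor_rdr() {
    $PFCTL -a "$1" -s nat
}
```

Comparing the output of both functions before and after pfctl -F rules shows whether the Bastille rules survived the reload.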
 
Thanks for the extremely fast responses. It's not a DNS issue; I tried with the IPs just to rule that out ;-)

Ah, I see. My pf.conf is quite long, but I was not aware that Bastille actively uses the anchors. Now that you mention it, Bastille does show the pf messages when you work with the jails, but I just ignored them. I will investigate further in that direction.
 
Also look out for rules that use interfaces, for example pass in on $ext_if from any to $ext_if. That second $ext_if is translated into the interface's IP addresses once, when PF loads the ruleset. If the IP addresses on the interface change after that point, PF isn't going to notice. Use pass in on $ext_if from any to ($ext_if) instead. The parentheses make PF track the interface's addresses dynamically, so it notices when addresses change or are added or removed.
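One way to see the difference is to let pfctl parse a small test ruleset without loading it: -n only parses, and -v prints the rules as parsed, so the bare $int_if should appear as lagg0's current addresses while ($int_if) stays an interface reference. A sketch to run as root; the macro name and lagg0 are taken from this thread, while the two rules and the function names are hypothetical.

```sh
#!/bin/sh
# Sketch: compare a bare interface macro with a parenthesised one.
# Run as root. The ruleset is illustrative; PFCTL can point to a stub.
PFCTL=${PFCTL:-pfctl}

# Write a two-rule test ruleset to the file given as $1.
write_demo_ruleset() {
    cat > "$1" <<'EOF'
int_if="lagg0"
# Bare macro: translated to lagg0's addresses once, at load time
pass in on $int_if proto tcp from any to $int_if port ssh
# Parenthesised: pf follows lagg0's addresses as they change
pass in on $int_if proto tcp from any to ($int_if) port ssh
EOF
}

# Parse (-n) the test ruleset and print it as pfctl sees it (-v).
show_expansion() {
    demo_file=$(mktemp) || return 1
    write_demo_ruleset "$demo_file"
    $PFCTL -n -v -f "$demo_file"
    demo_rc=$?
    rm -f "$demo_file"
    return "$demo_rc"
}
```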
 
Using the IPs doesn't rule it out:
sshd tries to resolve 192.168.77.55, or whatever the client IP is.
If that does not resolve and takes a long time to time out ....
Understood. But as far as I know, all DNS resolution (forward and reverse) works.
 
Also look out for rules that use interfaces, for example pass in on $ext_if from any to $ext_if. That second $ext_if is translated into the interface's IP addresses once, when PF loads the ruleset. If the IP addresses on the interface change after that point, PF isn't going to notice. Use pass in on $ext_if from any to ($ext_if) instead. The parentheses make PF track the interface's addresses dynamically, so it notices when addresses change or are added or removed.
Ah, there's something ;-) I tried this rule from the pf.conf in the Bastille manual:
Code:
nat on $int_if      from <bastillejails> to any     -> ($int_if)
It gives me a syntax error with the "(..)" when I check it with pfctl -n -f /etc/pf.conf. That may be why I had the following (SSH) rule without the parentheses:
Code:
pass in quick on $int_if proto tcp from <mynets> to $int_if port ssh $SynState

OK, so I found the reason for the syntax error: I had int_if="{ lagg0 }" instead of int_if="lagg0" in /etc/pf.conf. With the macro expanding to a brace list, the "(..)" form couldn't work.
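A quick way to spot this mistake in a long pf.conf is to search for interface macros whose value starts with a brace. A sketch, assuming the interface macros follow the *_if naming convention used in this thread (int_if, ext_if); the function name is hypothetical, and the pattern will not catch every way of writing a list.

```sh
#!/bin/sh
# Sketch: flag interface macros defined as brace lists, such as
#   int_if="{ lagg0 }"
# A list cannot be wrapped in "( )", so "($int_if)" fails to parse.
# Assumes interface macros end in "_if"; adjust the pattern if not.

# $1 = path to a pf.conf file. Prints offending lines with line numbers.
# Returns 1 if any are found, 2 if the file is unreadable, 0 otherwise.
find_list_if_macros() {
    [ -r "$1" ] || return 2
    if grep -nE '^[[:space:]]*[A-Za-z0-9_]*_if[[:space:]]*=[[:space:]]*"[[:space:]]*\{' "$1"; then
        return 1
    fi
    return 0
}
```

For example: find_list_if_macros /etc/pf.conf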
 
The problematic line in my /etc/pf.conf was the SSH access rule: the interface macro was declared wrong, the parentheses were missing, and the flags/modulate state part was wrong.
Based on the Bastille manual, the corrected version is:
Code:
int_if="lagg0"
pass in quick on $int_if proto tcp from <mynets> to ($int_if) port ssh flags S/SA modulate state
Interestingly, the old rule worked for years before this problem emerged.
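To confirm a change like this, the ruleset can be syntax-checked, loaded, and the active SSH rule printed, which should show (lagg0) rather than a fixed address. A sketch: PFCTL and PF_CONF are hypothetical overrides (handy for a dry run with a stub), the function name is hypothetical, and the grep pattern assumes pfctl prints the port as ssh or 22. It loads with plain pfctl -f and passes no -F option.

```sh
#!/bin/sh
# Sketch: check, load, and show the active SSH pass rule. Run as root.
PFCTL=${PFCTL:-pfctl}
PF_CONF=${PF_CONF:-/etc/pf.conf}

reload_and_show_ssh_rule() {
    # -n parses the file without loading it; stop on syntax errors
    $PFCTL -n -f "$PF_CONF" || return 1
    # Load the ruleset for real
    $PFCTL -f "$PF_CONF" || return 1
    # List the active filter rules and keep only the SSH ones
    $PFCTL -s rules | grep -E 'port = (ssh|22)'
}
```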

Thanks for this excellent support, as always!
 