IPFW How to pair in-kernel NAT with check-state / keep-state

supportsobaka · Apr 20, 2018

I can't get in-kernel NAT to work together with truly stateful ipfw rules. All the supposedly solved examples I have seen online are in fact not safe.

The rule set I have now works well: it allows outbound web access (e.g. 'telnet google.com 80') both from the jails and from the host system. (I'm not talking about access to a web server on my box.)

Code:

00220  nat 3 tcp from 10.1.1.11 to any dst-port 80,443 out via em0 #out of jail
00230  allow tcp from a.b.c.d to any dst-port 80,443 out via em0 #when leave NAT we have external IP, so allow it, leave ipfw here
00240  nat 3 tcp from any 80,443 to a.b.c.d in via em0 #return packet must be NATed again
00250  allow tcp from any 80,443 to 10.1.1.11 in via em0 #after NAT we are "to internal", so allow it, leave ipfw here, everything fine in jail now
00260  allow tcp from any 80,443 to a.b.c.d in via em0 #this is return for the host, when 'telnet example.com 80' issued from the host
65535  deny ip from any to any

But it's not safe. Anyone who sends packets from source port 80 or 443 gets access to every otherwise filtered port on the system.
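
To illustrate the hole: rule 260 matches on the source port only, so a probe sent from an outside machine with its source port fixed to 80 should reach ports that are meant to be filtered. A minimal sketch, assuming nmap is available on that outside machine. The helper name probe_cmd and the address 192.0.2.10 (standing in for a.b.c.d) are made up for illustration, and the function only prints the command instead of running it:

```shell
#!/bin/sh
# Print (do not run) an nmap SYN probe that uses a fixed source port.
# usage: probe_cmd <source-port> <target-port> <target-address>
probe_cmd() {
    printf 'nmap -Pn -sS --source-port %s -p %s %s\n' "$1" "$2" "$3"
}

# Probe sshd on the firewall host while pretending to be a web server.
# 192.0.2.10 is a documentation address used as a placeholder for a.b.c.d.
probe_cmd 80 22 192.0.2.10
```

If such a probe reports the port as anything other than filtered, the static rules are passing the packet.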

I came up with the following set instead, but it freezes my box so badly that I have to reboot it!

Code:

00220 skipto 3000 tcp from any 80,443 to a.b.c.d in via em0
00230 nat 3 tcp from 10.1.1.11 to any dst-port 80,443 out via em0 keep-state :accesswebint
00240 allow tcp from a.b.c.d to any dst-port 80,443 out via em0
03000 nat 3 tcp from any 80,443 to a.b.c.d in via em0
03010 check-state :accesswebint
65535  deny ip from any to any

My understanding is the following:

230 will get me out of the jail and create a dynamic rule for the ports "80 - someport" and the addresses "10.1.1.11 - someIP".
240 will let this packet out of ipfw to the world.
The return packet will hit 220, which bypasses rule 230; that rule would otherwise create a wrong dynamic rule because it has keep-state, and we need to NAT this packet back to the jail first.
So 3000 will NAT it, and I then expected 3010 to allow this packet because of the dynamic rule that was created:

Code:

00230  1    60 (18s) STATE tcp 10.1.1.11 60491 <-> 172.217.21.174 80 :accesswebint

But as I said, this set freezes the system so that only a reboot helps.

What's wrong with my understanding?

obsigna · Apr 20, 2018

See if these posts help:
https://forums.freebsd.org/threads/ipfw-share-internet.62149/post-358758
https://forums.freebsd.org/threads/substitute-external-address.60718/#post-349379

PMc · Apr 21, 2018

supportsobaka said:
I came up with the following set instead, but it freezes my box so badly that I have to reboot it!

Code:

00220 skipto 3000 tcp from any 80,443 to a.b.c.d in via em0
00230 nat 3 tcp from 10.1.1.11 to any dst-port 80,443 out via em0 keep-state :accesswebint
00240 allow tcp from a.b.c.d to any dst-port 80,443 out via em0
03000 nat 3 tcp from any 80,443 to a.b.c.d in via em0
03010 check-state :accesswebint
65535 deny ip from any to any

My understanding is the following:

230 will get me out of the jail and create a dynamic rule for the ports "80 - someport" and the addresses "10.1.1.11 - someIP".
240 will let this packet out of ipfw to the world.
The return packet will hit 220, which bypasses rule 230; that rule would otherwise create a wrong dynamic rule because it has keep-state, and we need to NAT this packet back to the jail first.
So 3000 will NAT it, and I then expected 3010 to allow this packet because of the dynamic rule that was created:

Is it allowed to use keep-state on a "nat" rule?
And: from what I remember, a check-state performs the action (allow or deny) of the rule where the keep-state was set. So in this case, it might repeat the nat. I ran into this problem when trying to use pipe with keep-state - that gave me loops - and it might be similar in Your case.

ipfw(8) says:
check-state [:flowname | :any]
Checks the packet against the dynamic ruleset. If a match is found, execute the action associated with the rule which generated this dynamic rule, otherwise move to the next rule.

I think this is Your problem - You repeat the nat on check-state (with whatever undefined result).
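
One way to see which action a check-state would re-run: in the `ipfw -d show` listing, the first column of a dynamic rule is the number of the static rule that created it. A small sketch that pulls that number and the flow label out of the dynamic rule line from the first post. The function name state_parent is hypothetical, and the field positions are an assumption based on that single sample line:

```shell
#!/bin/sh
# Read dynamic-rule lines on stdin and print "<parent rule> <flow label>".
# Assumes the layout seen in the sample: rule, counters, (ttl), STATE, ...
state_parent() {
    awk '$5 == "STATE" { sub(/^0+/, "", $1); print $1, $NF }'
}

# Sample line copied from the first post.
echo '00230  1    60 (18s) STATE tcp 10.1.1.11 60491 <-> 172.217.21.174 80 :accesswebint' \
    | state_parent
```

Here the parent is rule 230, whose action is nat 3 and not allow, so a match at check-state would hand the packet to nat again.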

Then, concerning Your general approach: when I built that, I found handling jails rather difficult. But at that time there were no labelled keepstates, and nat worked with divert. It may be simpler today. But in any case, it is possible.

I found it helpful to strictly separate all incoming and outgoing traffic right at the beginning, and to work on these separately (obviously this needs net.inet.ip.fw.one_pass=0):

Code:

# in/out separation:
add 9999 skipto 30000 all from any to any out
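
For reference, a sketch of how the sysctl and the separation rule could be applied together. The `run` variable that defaults to a dry-run `echo` and the function name separate_in_out are conventions invented for this example, not part of the setup described here; set run to an empty string to execute the commands for real.

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
run="${run-echo}"

separate_in_out() {
    # With one_pass=0 a packet continues at the next rule after nat/pipe
    # actions instead of being accepted right away.
    ${run} sysctl net.inet.ip.fw.one_pass=0
    # Everything outgoing continues at rule 30000; incoming falls through.
    ${run} ipfw -q add 9999 skipto 30000 all from any to any out
}

separate_in_out
```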

Then, there is a difficulty with jails and nat, because there is no proper place to do keep-state. We need to do keep-state before the nat (so that we can check-state the answers after the nat). Therefore, we need an allow rule before the nat - but if we allow, we don't reach the nat rule anymore.

For the local machine itself (not the jails) we can do keep-state after the nat, because nat should not change it.
For the other connected machines on the LAN the solution is simple: we do the allow + keep-state when the packet is incoming from LAN, while the nat then handles the packet when outgoing to WAN.

But traffic from jails is never incoming, and nevertheless needs nat - so I decided to make it incoming: I created an artificial loopback interface with netgraph(*), and then push all traffic originating from jails onto that interface first:

Code:

add 101 skipto 110 all from any to any out recv $echoif
add 102 fwd $echoip all from $jail1_ip to not $lanall out jail 1
add 103 fwd $echoip all from $jail2_ip to not $lanall out jail 2
add 104 fwd $echoip all from $jail3_ip to not $lanall out jail 3
...

So, when it comes back from the $echoif, it is incoming, and an allow keep-state can be placed there. When it then reappears as outgoing, nat does its work. And finally the reply from the WAN will be handled by check-state as normal.

(*) There is another little problem with this: the kernel will not return a packet into the firewall if it didn't leave kernel space. So an ng_echo(4) did not work, and I had to get the stuff out to user space before returning it - really crude, but works:

Code:

echo "preparing interface for jail loopback"
# We need a tempfile to get two commands to ngctl, so to get a chance
# to grab the new device and give it a name:
TEMPFILE=/tmp/makeloop.$$
echo "mkpeer iface crhook inet" > $TEMPFILE
echo "name .:crhook jloopif" >> $TEMPFILE
ngctl -f $TEMPFILE
rm $TEMPFILE
ngctl mkpeer jloopif: device inet inet
ngctl name jloopif:inet jloopdev

# fetch the name of the iface
iface=`ngctl msg jloopif: getifname | \
        awk '$1 == "Args:" { print substr($2, 2, length($2)-2)}'`

# ng_device cannot tell us its devname (would need C code). Lets hope
# this is the first of its kind...
cat /dev/ngd0 > /dev/ngd0 &
sleep 1
ifconfig $iface inet 192.168.4.1 netmask 0xffffffff 192.168.4.2
ifconfig $iface up

The whole approach is ugly, and I know that.

If anybody did come up with a more elegant, or a more modern solution (I built this in ~2004), I am eager to learn.

supportsobaka · Apr 21, 2018

obsigna said:
see if this posts help:
https://forums.freebsd.org/threads/ipfw-share-internet.62149/post-358758

Unfortunately not. I've seen lots of similar examples. The theory is correct, but the template is not safe.
The last rule, "65534 allow ip from any to any", will pass all traffic to any port on the box even if it is not allowed by the other rules. Yes, the packets go through the NAT first. But once you remove that rule, nothing works.

supportsobaka · Apr 21, 2018

PMc

>Is it allowed to use keep-state on a "nat" rule?
At least it creates dynamic rule.

>If a match is found, execute the action associated with the rule which generated this dynamic rule, otherwise move to the next rule.

Yes, I've been thinking about this for the last few days, trying to understand what *exactly* it means for the NAT, and I think I'm on the right track. In fact I built something that works, but I need more time for thorough tests with nmap before I accept it as the right solution.

>I think this is Your problem - You repeat the nat on check-state (with whatever undefined result).

Correct, I had already found the reason for the problem. I had never used check-state before, only keep-state, so I thought check-state was a kind of allow in itself that would not send the packet back to the rule where the keep-state was set. Maybe it is the language barrier, but that's how I read the manpage where it says "execute the action associated with".

>But at that time there were no labelled keepstates

Again correct. Seems labels helped me too.

supportsobaka · Jun 22, 2018

Code:

${ipfw} add 7000 nat 3 tcp from any 80,443 to ${IpExternal} in via ${LanExternal}
##${ipfw} add 7010 skipto 7020 tcp from ${JailsNET} to any 80,443 out via ${LanExternal} ##setup keep-state :downloadfromjail  
${ipfw} add 7020 nat 3 tcp from ${JailsNET} to any 80,443 out via ${LanExternal}  setup keep-state :downloadfromjail               
${ipfw} add 7030 allow tcp from ${IpExternal} to any 80,443 out via ${LanExternal} setup keep-state :downloadfromhost        
${ipfw} add 7040 allow tcp from any 80,443 to ${JailsNET} in via ${LanExternal}

I was mistaken in assuming that I had found the correct solution. In fact I had just built a static firewall. If I remove 7040, nothing works for the jails (LAN), but everything is fine for the host. I don't know how it passed my tests before (maybe I missed that I was not testing from a jail), but now I clearly see that it also works if I remove the keep-state, because of 7040.

Also 7010 is useless.

PMc said:
The whole approach is ugly, and I know that. If anybody did come up with a more elegant, or a more modern solution (I built this in ~2004), I am eager to learn.

There is no easy solution with what ipfw offers us now. The core of the issue is definitely here:

PMc said:
Then, there is a difficulty with jails and nat, because there is no proper place to do keep-state. We need to do keep-state before the nat (so that we can check-state the answers after the nat). Therefore, we need an allow rule before the nat - but if we allow, we don't reach the nat rule anymore.

You explained it very well.

Provided that there truly is no nice solution with ipfw, do you think it is possible to request a feature to solve this? Labels don't help with NAT, but they solve other things well.

My idea is to have a kind of skipto for check-state that points to the rule to use instead of the rule that generated the dynamic rule. Or maybe just an argument for check-state (like the labels they added) that says "please treat this dynamic rule this way, instead of looking at the rule that generated it". Something like:

Code:

check-state :downloadfromjail  :allow

where the first argument is the label we already have, and the second is an optional action (if not set, use the rule that generated this dynamic rule).

supportsobaka · Jun 22, 2018

Then the following elegant construction would work for both NAT (jails/LAN) and the host, and would help the thousands who are searching for a solution to a problem that currently has no non-ugly one:

Code:

${ipfw} add 7000 nat 3 tcp from any 80,443 to ${IpExternal} in via ${LanExternal} 
${ipfw} add 7010 check-state :downloadfromjail :allow
${ipfw} add 7020 nat 3 tcp from ${JailsNET} to any 80,443 out via ${LanExternal}  setup keep-state :downloadfromjail               
${ipfw} add 7030 allow tcp from ${IpExternal} to any 80,443 out via ${LanExternal} setup keep-state :downloadfromhost         
##REMOVED##${ipfw} add 7040 allow tcp from any 80,443 to ${JailsNET} in via ${LanExternal}

This would be a really safe, stateful ipfw solution.

A dreamer?

jef · Jun 22, 2018

I manage the challenges of IPv4 NAT with keep-state of count tag for IPv4 just prior to acceptance of the packet on output. This lets me tag the packet as "response to packet already sent" (out of a particular interface) when it arrives and make decisions based on that tag. It works on both "in" as well as "out", both before and after the NAT.

The tricky/ugly part is that on check-state, rule evaluation will continue after the declaration of the keep-state rule number, not the check-state rule number. This led me to keep-state on a call followed by a branch on "in" with a skipto action (and associated rule-numbering changes as skipto can only be "forward" in the rule set).

supportsobaka · Jun 23, 2018

jef
I couldn't figure it out with either skipto or count. The problem is that we need an allow when check-state is evaluated, not another skipto or count. You are getting somewhere with your "forward" loop. Do you have an example?

But the fact is that ipfw will not provide a smart way to manage stateful NAT without new logic or features. On a box with 50+ NATed jails, any ugly tricks lead to maintenance headaches, and the box ends up doing nothing but firewalling instead of its actual job.

We absolutely need to have a facility to override an action on check-state evaluation. This will solve all problems.

jef · Jun 24, 2018

Can't "allow" as one way or the other with NAT in there, the address is "wrong"

Here's a sketch of the approach -- by tagging on receipt, you can make choices based on the tag, rather than just the addressing of the packet.

call MMM looks like:

accept out // to "outside"
count tag ${return_packet} in
return
drop all from any to any // overrun protection
# and the "next" rule declared is
skipto NNNNN

Connection from "inside":

* in
  * check packet sanity
  * check packet "permission"
  * allow in

* out
  * potentially NAT
  * call MMM keep-state :ifaceID-outside
    * accept out (to "outside")
    * (return never reached)

Return packet from "outside":

* in
  * check packet sanity
  * check-state :ifaceID-outside
    * "match" calls MMM
      * count tag ${return_packet}
      * return
      * skipto NNNNN
    * "no match" -- pre-filter incoming requests that aren't established
  * NNNNN nat K proto ip4 in recv ifK
  * accept in tagged ${return_packet}
  * post-filtering of incoming requests
    * which are perhaps accepted in

* out
  * potentially NAT
  * call MMM keep-state :ifaceID-inside
    * accept out (to "inside")
    * (return never reached)

supportsobaka · Jun 25, 2018

jef said:
Here's a sketch

Thanks!
You encouraged me to post a feature request and, in the meantime, to continue using stateless NAT, which is of course a pity because of its poor security.
But I think you agree that the above is a workaround rather than a solution.

Having such a set of rules on a busy box with dozens of jails and different services is not a good idea. Stateful NAT shouldn't be that ugly.

jef · Jun 25, 2018

Meh, while something like Linux's conntrack is syntactically simpler, if anything that functionality is more complex under the covers. (I am making assumptions based on functionality as I haven't examined the source due to GPL licensing.) True, it handles some of the userland applications and protocols like old-school FTP, IRC, and various inbound streaming protocols, but I don't run any of those on my servers and have never had a problem with others' servers being unable to provide variants of the services to clients within my FreeBSD-based firewalls.

tagged ${return_packet} is basically the equivalent of ct state related (in nftables syntax).

That said, something equivalent to conntrack would be far more welcome for me than syntactic sugar in ipfw.

Angelo Klin · Jun 28, 2018

jef said:
Can't "allow" as one way or the other with NAT in there, the address is "wrong"

Here's a sketch of the approach -- by tagging on receipt, you can make choices based on the tag, rather than just the addressing of the packet.

Hello jef,

If I understood you correctly, you are tagging packets on the way out and verifying them when they come back.
If that is the case, there is a problem, as tags are lost once packets leave the kernel.
From ipfw(8):

Code:

tag number
  ...
  Tags are kept with the packet everywhere within the kernel, but are lost when packet 
leaves the kernel, for example, on transmitting packet out to the network or sending 
packet to a divert(4) socket.

Would you have a template or sample code you could share to aid understanding?

Thanks and Regards

jef · Jun 28, 2018

In short, the "trick" is to keep-state on the call rather than on the accept when a NAT-ed packet leaves the host, which stores the tuple in the state table at that point. You've already decided to accept before the call -- the call action replaces the typical accept keep-state action. On the way out, the call accepts the packet (checking "out"). When a packet returns, the corresponding check-state makes the call again. Within the call, "in" is seen and the packet is tagged, the return executes, and you now have a tag on the packet that indicates that it is associated with an "existing packet flow". The tag persists through one or more NAT modifications, both on the "in" pass and, if forwarded, on the "out" pass.
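
To make the point about tags concrete: nothing is attached to the packet while it is on the wire. The flow is remembered in the state table on the way out, and the tag is applied locally when the reply comes in and matches. A toy model of that logic in plain sh, purely for illustration; the "state table" here is a shell variable, the function names flow_out and flow_in are invented, and the addresses are placeholders:

```shell
#!/bin/sh
# Toy model only: this variable stands in for the kernel's state table.
STATE="|"

# Outbound: remember the flow (the role of "call ... keep-state" on "out").
# usage: flow_out <src> <sport> <dst> <dport>
flow_out() {
    STATE="${STATE}$1:$2>$3:$4|"
}

# Inbound: a reply has source and destination swapped. If the reversed
# flow is known, the packet gets tagged (the role of "count tag ... in").
# usage: flow_in <src> <sport> <dst> <dport>
flow_in() {
    case "$STATE" in
        *"|$3:$4>$1:$2|"*) echo tagged ;;
        *)                 echo untagged ;;
    esac
}

flow_out 203.0.113.5 60491 172.217.21.174 80   # post-NAT packet leaving
flow_in 172.217.21.174 80 203.0.113.5 60491    # reply to that flow
flow_in 198.51.100.9 80 203.0.113.5 22         # unsolicited, source port 80
```

The second inbound packet comes from source port 80 as well, but it has no matching entry, so it stays untagged and is left to the normal filtering rules.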

With all the attendant caveats that you are responsible for your own security and that what follows is supplied without any warranty of any sort, the following snippets outline one approach for implementing this kind of tagging.

This is not a complete rule set and provides no security if one were to use this in a running system.

Code:

${ipfw} add 5 set ${rs} reass via any

#
# Dispatch first on where the packet is in the process
#

# ether_demux or bdg_forward
${ipfw} add 91 set ${rs} skipto 1000 layer2 in

# ip_input
${ipfw} add 92 set ${rs} skipto 7000 not layer2 in

# ip_output -- forwarded
${ipfw} add 93 set ${rs} skipto 3000 not layer2 out recv '*'

# ip_output -- self-generated
${ipfw} add 94 set ${rs} skipto 3500 not layer2 out // not recv '*'

# ether_output_frame -- forwarded / bridged
${ipfw} add 95 set ${rs} skipto 5000 layer2 out recv '*'

# ether_output_frame -- self-generated
${ipfw} add 96 set ${rs} skipto 6000 layer2 out // not recv '*'

# FreeBSD 11.1 man ipfw:
# (yes, at the moment there is no way to differentiate
# between ether_demux and bdg_forward)

${ipfw} add 99 set ${rs} deny log via any \
    // first-stage dispatch problem

#
# This can be declared "anywhere"
#

######################################################
# call 400 -- IPv4 keep-state / check-state handling #
######################################################

#
# Yes, tagging on "out" serves no function
# rules 410 and 420 could be swapped -- this is just the way it evolved
#

${ipfw} add 410 set ${rs} count tag ${tag_ip4_outer} proto ip4

${ipfw} add 420 set ${rs} allow proto ip4 out

${ipfw} add 430 set ${rs} return proto ip4 in // then send IPv4 to NAT

${ipfw} add 499 set ${rs} deny log all from any to any \
    // IPv4 state tagging overrun


#
# output needs to be declared lower in rule number than input
# since the check-state executes the rule of the matching keep-state
# and the return action on "in" packets will return to the rule following
# skipto only goes forward in rule numbers, so rules around "in"
# need to follow that call keep-state action that is executed on "out"
#

###################################
# 3000s -- ip_output -- forwarded #
###################################

${ipfw} add 3000 set ${rs} count all from any to any \
    // ip_output -- forwarded

# "Related" IPv4 should already be tagged from ip_input phase

${ipfw} add 3004 set ${rs} skipto 4000 tagged ${tag_ip4_outer} \
    // IPv4 established, ip_output, forward

###
### Here: Insert rules to manage forwarded packets
### skipto 4000 for the ones that are acceptable
###

${ipfw} add 3499 set ${rs} deny log all from any to any \
    // ip_output -- forwarded -- DENY remaining


########################################
# 3500s -- ip_output -- self-generated #
########################################

${ipfw} add 3500 set ${rs} count all from any to any \
    // ip_output -- self-generated

###
### Here: Insert rules to manage packets generated on this host
### skipto 4000 for the ones that are acceptable
###

${ipfw} add 3999 set ${rs} deny log all from any to any \
    // ip_output -- self-generated -- DENY remaining


#######################################
# 4000s -- ip_output -- common output #
#######################################

${ipfw} add 4000 set ${rs} count all from any to any \
    // ip_output -- common output

#
# Assumption is that by the time this point is reached, it's OK
# at least in the pre-NAT format
#

#
# IPv4, clear the tag first as will use it later to allow them
#

${ipfw} add 4200 set ${rs} count untag ${tag_ip4_outer} proto ip4

${ipfw} add 4204 set ${rs} nat 1 proto ip4 xmit ${outside_interface}

###
### Here: Insert post-NAT rules here drop everything *except* what is OK
### since the "call 400" is effectively an "accept" action
###

${ipfw} add 4301 set ${rs} call 400 keep-state :IP4_TAG_OUTER_outside \
    xmit ${outside_interface} proto ip4
#
# IPv4 output will have been accepted in the 400s
# check-state on :IP4_TAG_OUTER_outside executes as rule 4301
# so when the "return" action is executed it continues from this point
#
# If tagged ("related") my choice is to have the flow skip to the NAT at 7501
# If and how much of your input chain you wish to skip is a personal choice
#

${ipfw} add 4400 set ${rs} skipto 7500 tagged ${tag_ip4_outer} ip4 in

${ipfw} add 4402 set ${rs} drop log tagged ${tag_ip4_outer} ip4 out \
    // Never should get here

${ipfw} add 4999 set ${rs} deny log all from any to any \
    // ip_output -- common output -- DENY remaining

#####################
# 7000s -- ip_input #
#####################

# Note: These have been pushed to 7000s since the check-state
#       will execute in the 4000s and needs to skipto from there

###
### Here: Insert rules to block "bad" packets,
### even from established connections
###

#
# For tagged ("related") IPv4, skip the remaining checks and proceed to NAT
#

${ipfw} add 7104 set ${rs} check-state :IP4_TAG_OUTER_outside \
        recv ${outside_interface} proto ip4

###
### Here: Insert rules to check "new" connections, ICMP, ... prior to NAT,
### dropping "unacceptable" packets
###

${ipfw} add 7501 set ${rs} nat 1 proto ip4 recv ${outside_interface}

${ipfw} add 7510 set ${rs} allow tagged ${tag_ip4_outer} \
    // \"established\" IPv4 connections, post-NAT

###
### Here: Insert rules to allow desired incoming connections
###

${ipfw} add 7999 set ${rs} deny log all from any to any \
    // ip_input -- DENY remaining
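
The snippets above rely on a few shell variables that are not shown. A hypothetical preamble follows; every value in it is an assumption chosen for illustration (interface name, set number and tag number are not taken from the original configuration), and `ipfw` defaults to a dry-run `echo` so that nothing is loaded by accident:

```shell
#!/bin/sh
# Dry run: print each command. Use ipfw="/sbin/ipfw -q" to apply for real.
ipfw="${ipfw:-echo ipfw -q}"
rs=1                     # rule set number (0-30 are usable; 31 holds the default rule)
tag_ip4_outer=400        # any tag number in the range 1-65534
outside_interface=em0    # assumed name of the WAN-facing interface

# One rule from the snippets, expanded with the values above.
show_example() {
    ${ipfw} add 4204 set ${rs} nat 1 proto ip4 xmit ${outside_interface}
}

show_example
```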
