Vnet jail with IPFW NAT: outbound traffic no longer works after upgrade from 12.2-RELEASE to 13.0-RELEASE

Hi,

I'm new to FreeBSD (I only started tinkering with it last week). After a lot of digging through the documentation, the Handbook, and many other online resources, I managed to get my vnet(9)-enabled jail(8) working in conjunction with ipfw(8) in-kernel NAT. Both inbound and outbound traffic worked properly within the jail on FreeBSD 12.2-RELEASE-p7.

After upgrading to FreeBSD 13.0-RELEASE-p3, without touching anything else, outbound traffic from the jail stopped working. Inbound traffic still works fine. I read through the 13.0-RELEASE release notes to see whether any relevant backward-incompatible changes were listed, and I figured the routing stack rewrite might have something to do with it, although I can't figure out how it would affect my NAT setup.

This is a VirtualBox VM running on a Windows 10 host.

/etc/ipfw.rules
Code:
#!/bin/sh

cmd="ipfw -q add"
skip="skipto 1000"
pif=em0
ks="keep-state"

# Delete all rules
ipfw -q -f flush

# Disable one_pass and setup NAT with port forwarding
ipfw disable one_pass
ipfw -q nat 1 config if $pif same_ports unreg_only reset \
  redirect_port tcp 10.0.0.10:80 80


$cmd 010 allow all from any to any via lo0
$cmd 099 reass all from any to any in

# Inbound NAT
$cmd 100 nat 1 ip from any to any in via $pif

# Stateful firewall
$cmd 101 check-state

# SSH
$cmd 110 allow tcp from any to me ssh setup $ks

# ICMP
$cmd 111 allow icmp from any to any

# Outbound NAT
$cmd 112 $skip tcp from any to any out via $pif setup $ks
$cmd 113 $skip ip from any to any out via $pif $ks

$cmd 999 deny log all from any to any
$cmd 1000 nat 1 ip from any to any out via $pif
$cmd 1001 allow ip from any to any

# Deny the rest
$cmd 65500 deny ip from any to any

/etc/jail.conf
Code:
# Global settings applied to all jails.
path = "/zroot/jails/$name";
devfs_ruleset = 4;

# VNET & Bridge
exec.clean;
vnet;
vnet.interface  = "epair${epair}b";
exec.prestart   = "ifconfig bridge0 > /dev/null 2> /dev/null || ( ifconfig bridge0 create && ifconfig bridge0 addm em0 && ifconfig bridge0 up )";
exec.prestart  += "ifconfig epair${epair} create up";
exec.prestart  += "ifconfig bridge0 addm epair${epair}a";

# Standard recipe
exec.start      = "/bin/sh /etc/rc";
exec.start     += "ifconfig epair${epair}b inet ${ipv4}";
exec.start     += "route add default ${gw4}";
exec.stop       = "/bin/sh /etc/rc.shutdown";
exec.poststop   = "ifconfig bridge0 deletem epair${epair}a";
exec.poststop  += "ifconfig epair${epair}a destroy";
exec.consolelog = "/var/log/jail_${name}_console.log";
mount.devfs;

# Per-jail settings
nginx {
    host.hostname = "nginx";
    $ipv4 = "10.0.0.10/24";
    $gw4 = "10.0.0.1";
    $epair = "0";
}

/etc/rc.conf
Code:
clear_tmp_enable="YES"
syslogd_flags="-ss"
sendmail_enable="NONE"
hostname="freebsd.lan"
ifconfig_em0="DHCP"
ifconfig_em0_ipv6="inet6 accept_rtadv"
sshd_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"
jail_enable="YES"
gateway_enable="YES"
firewall_enable="YES"
firewall_script="/etc/ipfw.rules"
firewall_nat_enable="YES"

ifconfig
Code:
em0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4810099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,VLAN_HWFILTER,NOMAP>
        ether 08:00:27:9f:6c:2f
        inet6 fe80::a00:27ff:fe9f:6c2f%em0 prefixlen 64 scopeid 0x1
        inet6 fd57:ff64:a85a:0:a00:27ff:fe9f:6c2f prefixlen 64 autoconf
        inet 10.0.0.225 netmask 0xffffff00 broadcast 10.0.0.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 58:9c:fc:10:ff:b9
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 4 priority 128 path cost 2000
        member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 20000
        groups: bridge
        nd6 options=9<PERFORMNUD,IFDISABLED>
epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:3a:96:e3:ce:0a
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

What am I missing here?

Thanks in advance!
 
In a spray-and-pray attempt to figure out what is happening, I tried every possible interface for the jail's outbound traffic in my IPFW NAT configuration, also to no avail.

At this point I'm starting to think it may not be related to IPFW after all, although I wouldn't know where else to look. Any input is highly appreciated.
 
Does anyone have jails with VNET/NAT working on 13.0? I can't imagine I'm the only one attempting this, so if someone who got it working could share their config, I could compare the two and maybe find the cause that way, even if they don't use IPFW.
 
sysctl net.link.bridge.inherit_mac

Is it set to 0 or 1? I have never been able to get a bridge to behave properly in a VM when this is set to 0. Try setting it to 1 and then run service netif restart. The bridge "should" then inherit em0's MAC address. We'll see if that makes a difference.
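As a sketch of the suggestion above, the runtime change, a persistent entry in /etc/sysctl.conf, and the netif restart could be wrapped up like this. The function name is hypothetical; the sysctl and service commands are the ones mentioned in the post.

```shell
#!/bin/sh
# Hypothetical helper: make bridges inherit a member's MAC address.
enable_bridge_inherit_mac() {
    # Runtime change; the net.link.bridge tree exists only while if_bridge is loaded.
    sysctl net.link.bridge.inherit_mac=1
    # Persist across reboots, appending only if the key is not already present.
    grep -q '^net\.link\.bridge\.inherit_mac=' /etc/sysctl.conf ||
        echo 'net.link.bridge.inherit_mac=1' >> /etc/sysctl.conf
    # Restart the interfaces configured in rc.conf, as suggested above.
    service netif restart
}
```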
 
This is what I'm getting:

sysctl: unknown oid 'net.link.bridge.inherit_mac'

Here are examples for all three firewalls:

Thanks, though I'm still not quite able to figure out where the issue resides.

The setup was working perfectly fine on 12.2, so I'm just wondering what changed in 13.0 that broke it. Documentation on this subject seems sparse in general, let alone for the newest FreeBSD release.
 
Thomas, can you run an ifconfig from the host just to verify again that your bridge is currently up and running? I tried to reproduce the "sysctl: unknown oid" error message you received above, but was only able to reproduce it by shutting the bridge down and then checking the value of the net.link.bridge.inherit_mac setting.
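A quick way to check both things from the host is sketched below. The function name is hypothetical, and it is an assumption on my part that an "unknown oid" error most simply means if_bridge was not loaded at that moment, since the net.link.bridge sysctls are registered by that module.

```shell
#!/bin/sh
# Hypothetical helper: confirm if_bridge is loaded and bridge0 is up.
check_bridge() {
    # kldstat -q -m exits 0 only if the named module is present.
    if kldstat -q -m if_bridge; then
        echo "if_bridge module loaded"
    else
        echo "if_bridge module NOT loaded"
    fi
    # Shows the UP/RUNNING flags and the member list (em0, epair0a).
    ifconfig bridge0
    sysctl net.link.bridge.inherit_mac
}
```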
 
Also I configure my bridge in the rc.conf of the host, but add the members in jail.conf. I noticed you do both of those things in jail.conf. Might not make a difference but it might be something worth trying.
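For reference, creating the bridge from the host's rc.conf usually looks something like the fragment below (a sketch using the interface names from this thread). With this in place, jail.conf would only need the per-jail epair lines, and the exec.prestart line that creates bridge0 could be dropped.

```shell
# Host /etc/rc.conf fragment (sketch): create bridge0 at boot with em0 as a member.
cloned_interfaces="bridge0"
ifconfig_bridge0="addm em0 up"
```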
 
I use a very similar jail setup to yours (actually, mine is more complicated - I have a segregated VNET jail LAN and I dial in via OpenVPN into it).
13.0-RELEASE works for me just fine, I had to do only 1 modification.

Just a shot in the dark, can you change devfs_ruleset = 4; in your jail.conf to devfs_ruleset = 5; ? Does it work? In 13.0 they added an extra section in devfs.rules specially for vnet jails.

Also - did you upgrade BOTH jails and host, or only the host? You need your jail to use the same OS as the running kernel, otherwise you might experience hard to explain problems.

I tested your configuration with 13.0-RELEASE and the connections work fine.
Please also post the contents of /etc/sysctl.conf and /etc/defaults/devfs.rules, the output of kldstat, and the output of jexec nginx ipfw list.
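To gather all of that in one go, something like the following should do. The function name is hypothetical, and the jail name defaults to nginx from the jail.conf in the first post.

```shell
#!/bin/sh
# Hypothetical helper: collect the requested files and command output.
collect_jail_debug_info() {
    jail="${1:-nginx}"   # jail name as defined in jail.conf
    for f in /etc/sysctl.conf /etc/defaults/devfs.rules; do
        echo "==> $f"
        cat "$f"
    done
    echo "==> kldstat"
    kldstat
    echo "==> ipfw list inside $jail"
    jexec "$jail" ipfw list
}
```

Usage: `collect_jail_debug_info nginx > debug.txt 2>&1`, then paste debug.txt.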
 
For a moment I thought I had broken the universe as the bridge was no longer showing up in ifconfig either. Turns out I was being silly and accidentally started the wrong VM (I cloned the 12.2-RELEASE VM before the upgrade to 13.0-RELEASE).

After firing up the correct VM, sysctl net.link.bridge.inherit_mac reported 0. I've set net.link.bridge.inherit_mac="1" in /etc/sysctl.conf, but unfortunately the issue persists.

I've changed the ruleset to 5; although it didn't fix the problem, I'll keep it in as that does indeed seem to be a more appropriate ruleset.

I upgraded the jail immediately after upgrading the host following the instructions from the handbook, yes. Though just for good measure, I just installed a completely new VM with a fresh 13.0-RELEASE image and replicated the exact same configuration - same issue.

/etc/sysctl.conf
Code:
# $FreeBSD$
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
security.bsd.see_other_uids=0
security.bsd.see_other_gids=0
security.bsd.see_jail_proc=0
security.bsd.unprivileged_read_msgbuf=0
security.bsd.unprivileged_proc_debug=0
kern.randompid=1
vfs.zfs.min_auto_ashift=12
net.inet.tcp.tso="0"
net.link.bridge.inherit_mac="1"

/etc/defaults/devfs.rules
Code:
#
# The following are some default rules for devfs(5) mounts.
# The format is very simple. Empty lines and lines beginning
# with a hash '#' are ignored. If the hash mark occurs anywhere
# other than the beginning of a line, it and any subsequent
# characters will be ignored.  A line in between brackets '[]'
# denotes the beginning of a ruleset. In the brackets should
# be a name for the rule and its ruleset number. Any other lines
# will be considered to be the 'action' part of a rule
# passed to the devfs(8) command. These will be passed
# "as-is" to the devfs(8) command with the exception that
# any references to other rulesets will be expanded first. These
# references must include a dollar sign '$' in front of the
# name to be expanded properly.
#
# $FreeBSD$
#

# Very basic and secure ruleset: Hide everything.
# Used as a basis for other rules.
#
[devfsrules_hide_all=1]
add hide

# Basic devices typically necessary.
# Requires: devfsrules_hide_all
#
[devfsrules_unhide_basic=2]
add path log unhide
add path null unhide
add path zero unhide
add path crypto unhide
add path random unhide
add path urandom unhide

# Devices typically needed to support logged-in users.
# Requires: devfsrules_hide_all
#
[devfsrules_unhide_login=3]
add path 'ptyp*' unhide
add path 'ptyq*' unhide
add path 'ptyr*' unhide
add path 'ptys*' unhide
add path 'ptyP*' unhide
add path 'ptyQ*' unhide
add path 'ptyR*' unhide
add path 'ptyS*' unhide
add path 'ptyl*' unhide
add path 'ptym*' unhide
add path 'ptyn*' unhide
add path 'ptyo*' unhide
add path 'ptyL*' unhide
add path 'ptyM*' unhide
add path 'ptyN*' unhide
add path 'ptyO*' unhide
add path 'ttyp*' unhide
add path 'ttyq*' unhide
add path 'ttyr*' unhide
add path 'ttys*' unhide
add path 'ttyP*' unhide
add path 'ttyQ*' unhide
add path 'ttyR*' unhide
add path 'ttyS*' unhide
add path 'ttyl*' unhide
add path 'ttym*' unhide
add path 'ttyn*' unhide
add path 'ttyo*' unhide
add path 'ttyL*' unhide
add path 'ttyM*' unhide
add path 'ttyN*' unhide
add path 'ttyO*' unhide
add path ptmx unhide
add path pts unhide
add path 'pts/*' unhide
add path fd unhide
add path 'fd/*' unhide
add path stdin unhide
add path stdout unhide
add path stderr unhide

# Devices usually found in a jail.
#
[devfsrules_jail=4]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add path fuse unhide
add path zfs unhide

[devfsrules_jail_vnet=5]
add include $devfsrules_jail
add path pf unhide
 
Your config files look fine. devfs.rules looks unchanged. What does the command jexec nginx ipfw list return?
 
Initially it only returned this:

Code:
65535 deny ip from any to any

I went ahead and enabled IPFW inside the jail using the open type, just to test:

Code:
00100 allow ip from any to any via lo0
00200 deny ip from any to 127.0.0.0/8
00300 deny ip from 127.0.0.0/8 to any
00400 deny ip from any to ::1
00500 deny ip from ::1 to any
00600 allow ipv6-icmp from :: to ff02::/16
00700 allow ipv6-icmp from fe80::/10 to fe80::/10
00800 allow ipv6-icmp from fe80::/10 to ff02::/16
00900 allow ipv6-icmp from any to any icmp6types 1
01000 allow ipv6-icmp from any to any icmp6types 2,135,136
65000 allow ip from any to any
65535 deny ip from any to any

Without luck.
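A vnet jail gets its own ipfw instance, and with ipfw.ko loaded its ruleset starts out containing only the default rule 65535 deny (the first listing above). Enabling the built-in "open" ruleset inside the jail from the host could be sketched like this; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: enable ipfw with the built-in "open" ruleset inside a jail.
enable_open_ipfw_in_jail() {
    jail="${1:-nginx}"
    # sysrc -j operates on the rc.conf inside the given jail.
    sysrc -j "$jail" firewall_enable="YES" firewall_type="open"
    jexec "$jail" service ipfw start
    jexec "$jail" ipfw list
}
```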

kldstat
Code:
Id Refs Address                Size Name
 1   32 0xffffffff80200000  1f11f28 kernel
 2    1 0xffffffff82113000   67feb0 zfs.ko
 3    1 0xffffffff82793000     ae38 cryptodev.ko
 4    1 0xffffffff82918000     3218 intpm.ko
 5    1 0xffffffff8291c000     2180 smbus.ko
 6    2 0xffffffff8291f000    27040 ipfw.ko
 7    1 0xffffffff82947000     42a0 ipfw_nat.ko
 8    1 0xffffffff8294c000     b852 libalias.ko
 9    1 0xffffffff82958000     2a08 mac_ntpd.ko
10    1 0xffffffff8295b000     7638 if_bridge.ko
11    1 0xffffffff82963000     50d8 bridgestp.ko
12    1 0xffffffff82969000     33cc if_epair.ko
 
Although we may be on to something now - whereas previously I got a "Permission denied" error pretty much instantly when I tried to telnet to the outside world from within the jail, it now fails with a different error message: "Operation timed out".

I don't see why though, the firewall should allow the traffic to pass, right? The same operation does work on the host:

jail:
Code:
root@nginx:/ # telnet 142.250.179.227 80
Trying 142.250.179.227...
telnet: connect to address 142.250.179.227: Operation timed out
telnet: Unable to connect to remote host

host:
Code:
root@freebsd-nginx:~ # telnet 142.250.179.227 80
Trying 142.250.179.227...
Connected to lhr25s31-in-f3.1e100.net.
Escape character is '^]'.
 
Run tcpdump on the bridge or the epair interface.
Also, can you ssh from the jail to the host?
Yes, I am able to ssh from jail to host.

tcpdump on the bridge does show the traffic:

Code:
18:21:43.496107 ARP, Request who-has 10.0.0.1 tell 10.0.0.10, length 28
18:21:43.496668 ARP, Reply 10.0.0.1 is-at 32:46:9a:fe:3b:17, length 46
18:21:43.496730 IP 10.0.0.10.35198 > 10.0.0.1.53: 10604+ PTR? 227.179.250.142.in-addr.arpa. (46)
18:21:48.505708 IP 10.0.0.10.25688 > 10.0.0.1.53: 10604+ PTR? 227.179.250.142.in-addr.arpa. (46)
18:21:58.529391 IP 10.0.0.10.30215 > 142.250.179.227.80: Flags [S], seq 25434630, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3363634130 ecr 0], length 0
18:21:59.522835 IP 10.0.0.10.30215 > 142.250.179.227.80: Flags [S], seq 25434630, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3363635124 ecr 0], length 0
18:22:01.728607 IP 10.0.0.10.30215 > 142.250.179.227.80: Flags [S], seq 25434630, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3363637330 ecr 0], length 0
18:22:05.959508 IP 10.0.0.10.30215 > 142.250.179.227.80: Flags [S], seq 25434630, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3363641560 ecr 0], length 0
18:22:14.202643 IP 10.0.0.10.30215 > 142.250.179.227.80: Flags [S], seq 25434630, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3363649803 ecr 0], length 0
18:22:30.412774 IP 10.0.0.10.30215 > 142.250.179.227.80: Flags [S], seq 25434630, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3363666013 ecr 0], length 0
18:23:02.619297 IP 10.0.0.10.30215 > 142.250.179.227.80: Flags [S], seq 25434630, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3363698220 ecr 0], length 0
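A capture like the one above can be taken on the host with something like the following. The function name and its defaults (bridge0, 10.0.0.10) are illustrative assumptions taken from this thread's setup.

```shell
#!/bin/sh
# Hypothetical helper: watch the jail's traffic on one interface from the host.
capture_jail_traffic() {
    iface="${1:-bridge0}"      # also try epair0a or em0
    jail_ip="${2:-10.0.0.10}"
    # -n: don't resolve addresses; -i: interface to listen on
    tcpdump -n -i "$iface" host "$jail_ip"
}
```

Running it on epair0a, bridge0 and em0 in turn shows how far the SYNs get before they disappear.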
 
Looks like forwarding is not enabled?
Try ping instead of telnet (ping 8.8.8.8), because ICMP seems to be exempt from the NAT rules (see whether anything goes out on em0).
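The checks suggested above could be sketched as follows, assuming the jail is named nginx and em0 is the uplink. The function name is hypothetical; timeout(1) just keeps the background capture from running forever if nothing arrives.

```shell
#!/bin/sh
# Hypothetical helper for the forwarding and ICMP checks.
check_forwarding_and_icmp() {
    jail="${1:-nginx}"
    # 1 means the host forwards IPv4 (gateway_enable="YES" sets this at boot).
    sysctl net.inet.ip.forwarding
    # Watch ICMP on the uplink in the background for up to 10 seconds.
    timeout 10 tcpdump -n -i em0 icmp &
    sleep 1
    jexec "$jail" ping -c 3 8.8.8.8
    wait
}
```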
 
sysctl net.inet.ip.forwarding returns 1 on the host; I'm guessing that's what you meant? Pinging 8.8.8.8 from the jail also works.

Code:
18:45:18.993839 IP 10.0.0.10 > 8.8.8.8: ICMP echo request, id 21868, seq 2, length 64
18:45:19.003613 IP 8.8.8.8 > 10.0.0.10: ICMP echo reply, id 21868, seq 2, length 64
 
Do you see your telnet packets being blocked by rule 999?
That actually does seem to be the case, yes:

Code:
Oct 10 19:54:57 freebsd-nginx kernel: ipfw: 999 Deny UDP 10.0.0.10:62484 10.0.0.1:53 in via epair0a
Oct 10 19:55:02 freebsd-nginx kernel: ipfw: 999 Deny UDP 10.0.0.10:52494 10.0.0.1:53 in via epair0a
Oct 10 19:55:12 freebsd-nginx kernel: ipfw: 999 Deny TCP 10.0.0.10:45060 142.250.179.227:80 in via epair0a
Oct 10 19:55:19 freebsd-nginx syslogd: last message repeated 3 times
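Besides the log lines, the rule's counters are another way to confirm which rule drops the packets. This is a sketch; the function name is hypothetical, and /var/log/security is where ipfw log entries usually end up with the default syslog.conf.

```shell
#!/bin/sh
# Hypothetical helper: check whether rule 999 is what drops the jail's packets.
watch_rule_999() {
    # "log" rules only reach syslog when verbose logging is enabled.
    sysctl net.inet.ip.fw.verbose=1
    ipfw zero 999              # reset rule 999's packet/byte counters
    echo "Reproduce the failing connection from the jail, then press Enter."
    read -r _
    ipfw show 999              # non-zero counters mean packets hit the deny rule
    grep 'ipfw: 999' /var/log/security
}
```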
 
Add a rule at the top, something like skipto 1000 ip from 10.0.0.0/24 to any in via epair0a.
Anyway, it seems your setup may work without NAT on the host at all, just using your router's NAT
(you may need a static route on your router for 10.0.0.0/24 via the IP of em0).
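In terms of the original /etc/ipfw.rules, the suggested rule could be added roughly like this. Rule number 050 is an arbitrary choice; it only needs to be below 999 (the "deny log" rule). The function name is hypothetical.

```shell
#!/bin/sh
# Sketch of the suggested rule, reusing the variable names from /etc/ipfw.rules.
add_jail_skipto_rule() {
    cmd="ipfw -q add"
    skip="skipto 1000"
    # $skip is intentionally unquoted so it splits into "skipto" and "1000".
    $cmd 050 $skip ip from 10.0.0.0/24 to any in via epair0a
}
```

As it turned out later in the thread, similar rules were also needed for bridge0.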
 
Awesome, you found it! I ended up having to add a few additional rules (in via em0 and the bridge, and also out via the epair iface and bridge).

/etc/ipfw.rules now looks like this. As my understanding of how IPFW works is still limited, I'm wondering whether I did this properly (particularly regarding keep-state and overall correctness).
Code:
#!/bin/sh

cmd="ipfw -q add"
skip="skipto 1000"
pif=em0
ks="keep-state"

# Delete all rules
ipfw -q -f flush

# Disable one_pass and setup NAT with port forwarding
ipfw disable one_pass
ipfw -q nat 1 config if $pif same_ports unreg_only reset \
  redirect_port tcp 10.0.0.10:80 80


$cmd 010 allow all from any to any via lo0
$cmd 099 reass all from any to any in

# Inbound NAT
$cmd 100 nat 1 ip from any to any in via $pif

# Stateful firewall
$cmd 101 check-state

# Send port-forwarded HTTP and traffic from the jail subnet past rule 999 to the NAT section
$cmd 102 $skip tcp from any to 10.0.0.10 http in via $pif setup $ks
$cmd 103 $skip tcp from 10.0.0.0/24 to any in via epair0a setup $ks
$cmd 104 $skip ip from 10.0.0.0/24 to any in via epair0a $ks
$cmd 105 $skip tcp from 10.0.0.0/24 to any in via bridge0 setup $ks
$cmd 106 $skip ip from 10.0.0.0/24 to any in via bridge0 $ks

# SSH
$cmd 110 allow tcp from any to me ssh setup $ks

# ICMP
$cmd 111 allow icmp from any to any

# Outbound NAT
$cmd 112 $skip tcp from any to any out via $pif setup $ks
$cmd 113 $skip ip from any to any out via $pif $ks
$cmd 114 $skip tcp from any to any out via bridge0 setup $ks
$cmd 115 $skip ip from any to any out via bridge0 $ks
$cmd 116 $skip tcp from any to any out via epair0a setup $ks
$cmd 117 $skip ip from any to any out via epair0a $ks

$cmd 999 deny log all from any to any
$cmd 1000 nat 1 ip from any to any out via $pif
$cmd 1001 allow ip from any to any

# Deny the rest
$cmd 65500 deny ip from any to any

DNS doesn't work yet, though. I think that's because the resolver (the router) is on 10.0.0.1. The resolver does respond to the query; the response just isn't reaching the jail:
Code:
20:33:30.933456 IP 10.0.0.143.57041 > 10.0.0.1.53: 61261+ A? google.nl. (27)
20:33:30.947617 IP 10.0.0.1.53 > 10.0.0.143.57041: 61261 1/0/0 A 142.251.36.35 (43)
20:33:30.947723 IP 10.0.0.10 > 10.0.0.143: ICMP 10.0.0.10 udp port 57041 unreachable, length 79
20:33:30.947745 IP 10.0.0.10 > 10.0.0.1: ICMP 10.0.0.10 udp port 57041 unreachable, length 79

Works fine when I change the nameserver to, for example, 8.8.8.8.

Running without NAT on the host and relying on the router's NAT instead sounds logical. One of the main reasons I'm currently experimenting with FreeBSD is that I want to set up a new server for a fun little personal project, which I intend to host on a VPS somewhere. Until now I've mainly used Linux for that purpose, but I've always wanted to try and learn BSD, and this project seems like the perfect opportunity (I'm hoping it lets me gain some "real world" experience without too serious consequences if things go haywire). That's why I'm trying to simulate the VPS setup locally as closely as possible: to find potential problems early and to get some basic experience with the OS first.
 
Cool that it works.
You have too many NAT rules.
For a start, just look at rc.conf / rc.firewall and set
firewall_nat_enable and firewall_nat_interface;
it will automatically add the required NAT rules.
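A sketch of what that could look like in /etc/rc.conf, assuming the custom firewall_script line is removed so the default /etc/rc.firewall runs. As far as I can tell from rc.firewall, the automatic NAT rules are only added for the built-in "open" and "client" types. Note that "open" allows all traffic, trading the hand-written stateful rules for simplicity. Passing the port forward through firewall_nat_flags is also an assumption worth checking against rc.conf(5).

```shell
# /etc/rc.conf fragment (sketch): let rc.firewall configure in-kernel NAT.
firewall_enable="YES"
firewall_type="open"             # built-in type; "client" also sets up NAT
firewall_nat_enable="YES"
firewall_nat_interface="em0"
# Assumption: extra options appended to the generated "ipfw nat ... config" line.
firewall_nat_flags="same_ports unreg_only redirect_port tcp 10.0.0.10:80 80"
```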
 