jails VNET Jail with ZFS howto

Hello folks,

Yesterday I wrote a Twitter thread giving an example of how to deploy VNET jails in a ZFS environment.

Here it is again for this forum.

A guide to deploying a VNET jail on a FreeBSD 13.0 server with ZFS and a populated /usr/src. We start by preparing the file tree. I use /l/prison (in zpool/prison) as the starting point.
I assume /usr/src is in zpool/usr/src and /usr/ports is in zpool/usr/ports. We need snapshots of /usr/src and /usr/ports:
Code:
# zfs snapshot zpool/usr/src@jail-template
# zfs snapshot zpool/usr/ports@jail-template

We prepare a jail template according to jail(8):
Code:
# zfs create zpool/prison/template
# cd /usr/src
# make world DESTDIR=/l/prison/template
# make distribution DESTDIR=/l/prison/template
# zfs snapshot zpool/prison/template@jail-template

Now we create the ZFS datasets for the jail. I put these steps in a simple shell script, which uses zfs clone quite often:
Code:
:
target=myvnetjail
source="zpool/prison/template@jail-template"
t="zpool/prison/$target"
path="/l/prison/$target"
zfs clone -o exec=on -o setuid=on -o compression=off $source $t
cd $path || exit 1
tar cvf /tmp/$target.$$ var
chflags -R noschg var usr
rm -rf usr var
zfs create -o mountpoint="$path/var" -o exec=off -o setuid=off -o compression=off "${t}/var"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/mail"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/log"
zfs create -o exec=off -o setuid=off -o compression=off "${t}/var/run"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/tmp"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/db"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/db/pkg"
zfs create -o exec=off -o setuid=off -o compression=off "${t}/var/db/portsnap"
zfs create -o exec=off -o setuid=off -o compression=off -o readonly=on "${t}/var/empty"
zfs create -o exec=off -o setuid=off -o compression=off "${t}/var/local"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/spool"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/local/spool"
zfs create -o exec=off -o setuid=off -o compression=lz4 "${t}/var/local/log"
zfs create -o exec=on -o setuid=on -o compression=off -o mountpoint=${path}/l "${t}/localdisk"
zfs create -o exec=on -o setuid=on -o compression=off "${t}/localdisk/home"
zfs create -o exec=on -o setuid=on -o compression=off "${t}/localdisk/local"
zfs create -o exec=on -o setuid=off -o compression=lz4 "${t}/tmp"
cd $path || exit 1
tar xvpf /tmp/$target.$$
rm /tmp/$target.$$
#
zfs clone -o exec=on -o setuid=on -o compression=off  $source ${t}/usr
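cd ${path}/usr || exit 1
# The clone holds a complete template tree; keep only the contents of its usr directory.
chflags -R noschg lib libexec sbin var
rm -r .cshrc .profile COPYRIGHT bin boot dev etc lib libexec media mnt net proc rescue root sbin tmp var
cd usr || exit 1
mv * ..
cd ..
rmdir usr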
#
zfs create -o exec=on -o setuid=on -o compression=off "${t}/usr/local"
zfs clone -o exec=on -o setuid=off -o compression=lz4  zpool/usr/ports@jail-template ${t}/usr/ports
zfs clone -o exec=on -o setuid=off -o compression=lz4 zpool/usr/src@jail-template ${t}/usr/src
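
The long run of zfs create lines for the datasets below var follows one pattern, so it could also be generated from a small table. The following is only a dry-run sketch under assumptions: the function names var_datasets and print_var_cmds are hypothetical, var itself (which needs a mountpoint) and var/empty (which needs readonly=on) are left out, and nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch: print the zfs(8) commands for the datasets below var
# instead of running them.

# One line per dataset: <name below var> <compression>.
# Parents (db, local) must come before their children.
var_datasets() {
    cat <<'EOF'
mail lz4
log lz4
run off
tmp lz4
db lz4
db/pkg lz4
db/portsnap off
local off
spool lz4
local/spool lz4
local/log lz4
EOF
}

# print_var_cmds DATASET: print one "zfs create" line per table entry.
print_var_cmds() {
    var_datasets | while read -r name comp; do
        echo "zfs create -o exec=off -o setuid=off -o compression=$comp \"$1/var/$name\""
    done
}

print_var_cmds "zpool/prison/myvnetjail"
```

Review the output first; once it matches what you want, it can be piped to sh as root.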

Now we prepare the host environment. I put local add-ons in /l/local:
Code:
# cp -p /usr/src/share/examples/jails/jib /l/local/sbin

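If you want to use netgraph(4), look at the jng script. I use if_bridge(4) here because of some issues with netgraph. I apply a few small patches to jib: the first uses cksum instead of sum to derive the MAC address, because sum returns the same value for some short names; the second copies the MTU of the bridge to the new epair interface.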

Code:
--- /usr/src/share/examples/jails/jib   2021-09-12 03:13:47.057333000 +0200
+++ /l/local/sbin/jib   2021-06-09 14:30:57.000000000 +0200
@@ -215,11 +215,25 @@
        # the MAC address will be recalculated to a new, similarly
        # unique value preventing conflict.
        #
+       # ARNE ## OM ## START #
+       # That's wrong. The jails of good-hope and odin get the same
+       # MAC address!
+       # Why?
+       # arne@trajan:~-<10> echo -n gh | sum
+       # 32923 1
+       # arne@trajan:~-<11> echo -n od | sum
+       # 32923 1
+       # ARNE ## OM ## END   #
+       #
+       #
        __iface_devid=$( ifconfig $__iface ether | awk '/ether/,$0=$2' )
        # ??:??:??:II:II:II
        __new_devid=${__iface_devid#??:??:??} # => :II:II:II
        # => :SS:SS:II:II:II
-       __num=$( set -- `echo -n "$__name" | sum` && echo $1 )
+       # ARNE ## OM ## START #
+       # __num=$( set -- `echo -n "$__name" | sum` && echo $1 )
+       __num=$( set -- `echo -n "$__name" | cksum` && echo $1 )
+       # ARNE ## OM ## END   #
        __new_devid=$( printf :%02x:%02x \
                $(( $__num >> 8 & 255 )) $(( $__num & 255 )) )$__new_devid
        # => P:SS:SS:II:II:II
@@ -307,6 +321,10 @@

                # Create a new interface to the bridge
                new=$( ifconfig epair create ) || return
+               # ARNE # OM # START #
+               mtu=$( ifconfig $iface$bridge | head -1 | sed -e 's/^.*mtu //') || return
+               ifconfig $new mtu $mtu || return
+               # ARNE # OM # END   #
                ifconfig "$iface$bridge" addm $new || return

                # Rename the new interface
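
To see what the two patches do, here is a small sketch that compares both checksums for the two names from the comment, derives the two MAC bytes the way the patched line does, and extracts the MTU with the same sed expression. The helper names name_sum, name_cksum, mac_bytes and mtu_of are hypothetical and not part of jib, and the ifconfig line fed to mtu_of is a made-up sample.

```shell
#!/bin/sh
# First field of sum(1) for NAME, as used by the stock jib.
name_sum() { set -- $(printf %s "$1" | sum); echo "$1"; }

# First field of cksum(1) for NAME, as used by the patched jib.
name_cksum() { set -- $(printf %s "$1" | cksum); echo "$1"; }

# The two MAC bytes taken from the low 16 bits of the checksum.
mac_bytes() {
    num=$(name_cksum "$1")
    printf ':%02x:%02x\n' $(( num >> 8 & 255 )) $(( num & 255 ))
}

# Read ifconfig(8) output on stdin and print the MTU from its first line.
mtu_of() { head -1 | sed -e 's/^.*mtu //'; }

echo "sum:   gh=$(name_sum gh) od=$(name_sum od)"
echo "cksum: gh=$(name_cksum gh) od=$(name_cksum od)"
echo "MAC bytes for myjail: $(mac_bytes myjail)"

# Made-up sample of a first ifconfig line with the MTU at the end.
printf '%s\n' 'mynet0bridge: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000' | mtu_of
```

Only the low 16 bits of the checksum end up in the MAC address, so collisions remain possible in principle even with cksum.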

Of course, if_bridge(4) must be compiled into the kernel or available as a kernel module. Make the script executable:
Code:
# chmod 755 /l/local/sbin/jib

Now the most important part: /etc/jail.conf. Here is an example for a pretty normal jail.

Code:
# ATTENTION in case you have firewall code inside the kernel (drop as default)
# Increasing the secure level here means that inside the Jail
# the firewall configurations can no longer be made (== opened)!
# securelevel 3 only works with Netgraph or in traditional (non-vnet) jails
# according to this

myjail {
        host.hostname = "myjail.example.com";
        path = "/l/prison/myvnetjail";
        devfs_ruleset = "7";
        securelevel = 0;
        vnet = "new";
        vnet.interface = e0b_myjail, e1b_myjail;
        exec.fib = "8";
        exec.system_user = "root";
        exec.jail_user = "root";
        exec.consolelog = "/var/local/log/jails/myjail_console.log";
        exec.clean;
        exec.prestart += "/l/local/sbin/jib addm myjail mynet0 mynet1";
        exec.start = "/bin/sh /etc/rc";
        exec.stop = "/bin/sh /etc/rc.shutdown";
        exec.poststop += "/l/local/sbin/jib destroy myjail";
        mount.devfs;
        enforce_statfs = "1";
        persist;
}
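
The names in vnet.interface have to match what jib creates. In the example above, jib addm myjail mynet0 mynet1 goes together with e0b_myjail and e1b_myjail, that is one interface per host interface, numbered from 0. A small sketch that prints the matching line, assuming this naming pattern holds for your jail name (the function name vnet_interface_line is hypothetical):

```shell
#!/bin/sh
# vnet_interface_line JAIL HOSTIF...: print the vnet.interface line that
# matches "jib addm JAIL HOSTIF...", assuming the e<N>b_<JAIL> pattern
# seen in the example above.
vnet_interface_line() {
    name=$1
    shift
    i=0
    list=
    for nic in "$@"; do
        # Append with ", " between entries, nothing before the first one.
        list="${list:+$list, }e${i}b_${name}"
        i=$((i + 1))
    done
    echo "vnet.interface = $list;"
}

# prints: vnet.interface = e0b_myjail, e1b_myjail;
vnet_interface_line myjail mynet0 mynet1
```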

Create the directory and the file for the console log:
Code:
# mkdir /var/local/log/jails
# touch /var/local/log/jails/myjail_console.log
# chmod 600 /var/local/log/jails/myjail_console.log

In this example I grant the jail access to two network interfaces on the host side. No further preparation is necessary on the host side. The host doesn't even need access to these networks.
Example rc.conf entry (host side):
Code:
ifconfig_mce0_name="mynet0"
ifconfig_mce1_name="mynet1"
ifconfig_mynet0="-lro -tso4 -tso6 -vlanhwtso mtu 9000 up"
ifconfig_mynet1="-lro -tso4 -tso6 -vlanhwtso mtu 9000 up"

You have to switch off some fancy hardware features on certain Ethernet controllers (LRO, TSO and VLAN hardware TSO in my example) to get VNET to work, but in most cases you won't have problems. With Mellanox Bluefield controllers there are some additional tasks to do, but this is out of scope here.

Of course, the host can also set an IP address on mynet0/1 and use it for its own network connectivity.

As you can see in my example, I use FIBs here. The number of FIBs you set in /boot/loader.conf and/or /etc/sysctl.conf must be high enough to cover the FIB you use in /etc/jail.conf (FIBs are numbered from 0, so exec.fib = "8" needs net.fibs of at least 9). Since FreeBSD 13 you ALSO have to set net.fibs in the jail's sysctl.conf, /l/prison/myvnetjail/etc/sysctl.conf!

In my example:
/boot/loader.conf:
[...]
net.fibs="16"
net.add_addr_allfibs="0"
[...]

/etc/sysctl.conf AND /l/prison/myvnetjail/etc/sysctl.conf:
[...]
net.fibs=16
net.add_addr_allfibs=0
[...]
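
A quick way to check that the FIB from /etc/jail.conf fits the configured number of FIBs is a simple range test. This is only a sketch; the function name fib_in_range is hypothetical, and the numbers are the ones from this example.

```shell
#!/bin/sh
# fib_in_range FIB NFIBS: succeed if FIB is a valid index for NFIBS FIBs.
# FIBs are numbered from 0, so the highest usable index is NFIBS - 1.
fib_in_range() {
    [ "$1" -ge 0 ] && [ "$1" -lt "$2" ]
}

# Values from the example: exec.fib = "8", net.fibs = 16.
if fib_in_range 8 16; then
    echo "exec.fib 8 fits into net.fibs=16"
else
    echo "exec.fib 8 does not fit into net.fibs=16"
fi
```

On a running host the second argument can come from the live value, for example fib_in_range 8 "$(sysctl -n net.fibs)".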

As I use pf in the kernel I need access to pf from inside the jail:
Code:
# cp -p /etc/defaults/devfs.rules /etc/devfs.rules

Add something like the following to /etc/devfs.rules. The ruleset number (here 7) must match devfs_ruleset in /etc/jail.conf:

Code:
# Devices usually found in a jail.
#
[devfsrules_jail=4]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add path fuse unhide
add path zfs unhide

# Jail with Berkeley Packet Filter (DHCP client and server)
#
[devfsrules_jail_bpf=5]
add include $devfsrules_jail
add path 'bpf*' unhide

# Jail with Berkeley Packet Filter (DHCP client and server) and pf
#
[devfsrules_jail_bpf_pf=6]
add include $devfsrules_jail_bpf
add path pf unhide
add path pflog unhide
add path pfsync unhide

# Jail with pf
#
[devfsrules_jail_pf=7]
add include $devfsrules_jail
add path pf unhide
add path pflog unhide
add path pfsync unhide
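
Because the number in devfs.rules and devfs_ruleset in /etc/jail.conf have to agree, it can help to look the number up by ruleset name instead of reading it off by eye. A minimal sketch, assuming ruleset headers of the form [name=number] as shown above; the function name ruleset_number is hypothetical.

```shell
#!/bin/sh
# ruleset_number NAME FILE: print the number of the ruleset called NAME
# from a devfs.rules style FILE, based on its "[NAME=number]" header.
ruleset_number() {
    sed -n "s/^\[$1=\([0-9][0-9]*\)\]\$/\1/p" "$2"
}

# Only meaningful on a host that has the file; elsewhere this is skipped.
if [ -r /etc/devfs.rules ]; then
    ruleset_number devfsrules_jail_pf /etc/devfs.rules
fi
```

With the rules shown above, the lookup for devfsrules_jail_pf should give the same number that is set as devfs_ruleset in /etc/jail.conf.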

Now we have to customize the jail:
Code:
# chroot /l/prison/myvnetjail /bin/sh
# cd etc
# vipw
# vi /etc/group /etc/resolv.conf /etc/sysctl.conf /etc/make.conf # .... as you like
# cp -p /usr/share/zoneinfo/Europe/Berlin /etc/localtime # Whatever you need
# cap_mkdb /etc/login.conf
# cap_mkdb -f /usr/share/misc/termcap /etc/termcap
# vi /etc/rc.conf

With ipfw and pf in the kernel and drop as default, you need at least something like this in the jail's /etc/rc.conf:

Code:
hostname="myjail.example.com"
#
ifconfig_e0b_myjail="inet 192.0.2.2 netmask 255.255.255.0 mtu 9000"
ifconfig_e1b_myjail="inet 198.51.100.2 netmask 255.255.255.0 mtu 9000"
#
# We need this because we use net.add_addr_allfibs="0" !
static_routes="lo0ifroute e0bifroute e1bifroute"
route_lo0ifroute="-host 127.0.0.1 -iface lo0"
route_e0bifroute="-net 192.0.2.0/24 -iface e0b_myjail"
route_e1bifroute="-net 198.51.100.0/24 -iface e1b_myjail"
#
ipv6_activate_all_interfaces="NO"
defaultrouter="198.51.100.1"
gateway_enable="NO"
#
firewall_enable="YES"           # Set to YES to enable firewall functionality
firewall_script="/etc/rc.firewall"      # Which script to run to set up the firewall
firewall_type="OPEN"            # Firewall type (see /etc/rc.firewall)
#
pf_enable="YES"
#
sshd_enable="YES"
#

Don't forget a "pass all" rule in the jail's /etc/pf.conf.
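
A minimal sketch of how such a file could be written from the host, assuming a fully permissive ruleset is what you want to start with; the function name write_pf_conf is hypothetical.

```shell
#!/bin/sh
# write_pf_conf ETCDIR: write a minimal pf.conf that passes all traffic
# into ETCDIR. Any existing pf.conf in that directory is overwritten.
write_pf_conf() {
    printf 'pass all\n' > "$1/pf.conf"
}

# Example call with the jail path used in this guide (run on the host):
# write_pf_conf /l/prison/myvnetjail/etc
```

This is only meant to get the jail on the network; tighten the rules once everything works.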

Now something strange: copy the host's /boot/kernel into the jail:

Code:
# cp -pr /boot/kernel /l/prison/myvnetjail/boot

Why that? All ports depending on Perl will break, because they are unable to configure the DTrace stuff without the kernel.

Now add this to your host's /etc/rc.conf:
Code:
jail_enable="YES"
jail_confwarn="YES"
jail_parallel_start="NO"
jail_list="myjail"
jail_reverse_stop="YES"

and type "service jail start" as the root user. jls and friends won't show the jail's IP addresses. This is correct for VNET jails.
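
Since jls does not show the addresses of a VNET jail, one way to look at them is to run ifconfig inside the jail with jexec(8). A minimal sketch; the wrapper name jail_ifconfig and the JEXEC override (only there to allow a dry run without a jail) are hypothetical.

```shell
#!/bin/sh
# jail_ifconfig JAIL [IFACE]: run ifconfig inside JAIL via jexec(8).
# Set JEXEC=echo to print the arguments instead of running jexec.
jail_ifconfig() {
    jail=$1
    shift
    ${JEXEC:-jexec} "$jail" ifconfig "$@"
}

# Dry run, prints: myjail ifconfig e0b_myjail
JEXEC=echo jail_ifconfig myjail e0b_myjail
```

On the host, as root and without the JEXEC override, jail_ifconfig myjail e0b_myjail runs ifconfig for that interface inside the jail.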

One last addition: don't even think about using NFS, client or server side, anywhere near VNET jails. The NFS guys never ported the NFS kernel code to VNET, and this is a real pain in the ass because it makes some very important use cases in my professional environment difficult.
 