Jails -- vimage stability; epair/if_bridge vs. netgraph -- recommendations?

Now that FreeBSD 9.0 is out, I'm finally getting around to upgrading my jail servers. There seems to be quite a bit of information written in the 7.x and 8.x timeframe, but not so much that is current. Before embarking on the task, I'd appreciate some insight into 9.0 implementations.

The goal is to be able to configure publicly available servers in their own jails, each with a public interface and a private (management) interface and connect them with virtual routers, each with ipfw and NAT (as appropriate) to more easily manage communication to and between them (as opposed to a monolithic set of ipfw rules). Also, being able to jail DHCP so that I don't have promiscuous mode on a "real" interface is very attractive.

First, I understand that vimage is still considered an "experimental" feature. At least in 8.0, it was incompatible with SCTP. Some posts suggest that this is (was) the main reason that vimage was considered experimental. Is vimage still incompatible with SCTP in 9.0? Are there any significant issues with vimage in 9.0 beyond SCTP incompatibility?

Second, while most information I have seen uses epair/if_bridge to handle networking within the jail server, I have also seen netgraph used (http://druidbsd.sourceforge.net/vimage.shtml). What experience, if any, is out there to recommend one approach over the other?
 
I am also looking for good pointers about this.

My experience so far is that VIMAGE is experimental indeed, but the main problem is shutting down jails. Many people experience easily reproducible panics when tearing down a jail, due to vnet. I have had the problem with both epair/if_bridge and netgraph. Removing PF from the kernel config completely (I wasn't actually using it, hadn't gotten that far) stopped the panics and reduced the problem to a memory leak.

Many use cases involve very few shutdowns of jails, mine included, so I am now planning production using VIMAGE.

One annoying thing about epairs is that the way they are numbered makes them hard to handle gracefully in a config with 20+ jails. Also, there is no built-in support in /etc/rc.d/jail, and the most popular patch, jailv2, requires a horrible number of config lines per jail and doesn't scale very well. You also have to decide at config time which epairN belongs to which jail, which might be OK for a very static setup, but doesn't scale if you want to be able to simply provision jails on demand. I patched /etc/rc.d/jail rather heavily to support epairs better, especially in a setup similar to what you described, but then I found DruidBSD's vimage package. It uses netgraph instead of epair, and the netgraph interfaces can be named and numbered per jail instead of globally, which scales a lot better from a configuration point of view. I'm down to three simple lines per jail, and IP config is in the jail's own /etc/rc.conf:

Code:
gateway_enable="YES"

vimage_enable="YES"
vimage_fdescfs_enable="YES"
vimage_procfs_enable="YES"
vimage_mount_enable="YES"
vimage_devfs_enable="YES"
vimage_devfs_ruleset="devfsrules_jail"

vimage_list="palle palle2"

# jail "palle"
vimage_palle_rootdir="/tank/master"
vimage_palle_hostname="palle.example.com"
vimage_palle_bridges="bce0 bce1"

# jail "palle2"
vimage_palle2_rootdir="/tank/masterpp"
vimage_palle2_hostname="palle2.example.com"
vimage_palle2_bridges="bce0 bce1"

and in the jail's /etc/rc.conf:

Code:
ifconfig_ng0_palle="inet 192.168.1.155/24"
ifconfig_ng1_palle="inet 10.0.0.155/8"
ifconfig_ng0_palle2="inet 192.168.1.156/24"
ifconfig_ng1_palle2="inet 10.0.0.156/8"

vimage seems to be worth the effort. I've used aliases before, but without vimage you don't get a proper loopback interface (at least I failed to get it working) unless you give each jail a different loopback address, e.g. 127.0.0.155, which sucks.

Right now I'm looking for pointers as to which of if_bridge+epair or netgraph has the best performance. Seems more people are biased to netgraph, it is said to scale better and has been tested with 65534 nodes, FWIW... :eek:)

To test the teardown memory leak and estimate how bad it is, I'm now looping vimage start; vimage stop. So far that is 1300+ iterations on two jails and the machine is still OK.
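
For anyone who wants to repeat that test, here is a minimal sketch of such a loop in plain sh. It assumes the script is invoked as vimage start and vimage stop, as written above; VIMAGE_CMD and cycle_vimage are names made up for this example, so point the variable at wherever the vimage script lives on your host.

```shell
#!/bin/sh
# Hypothetical helper: start and stop all vimage jails N times in a row.
# VIMAGE_CMD is an assumption; set it to the vimage script on your system.
: "${VIMAGE_CMD:=vimage}"

cycle_vimage() {
    iterations=$1
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Stop the loop at the first failure so the console state is preserved.
        $VIMAGE_CMD start || return 1
        $VIMAGE_CMD stop || return 1
        i=$((i + 1))
        echo "completed iteration $i"
    done
}
```

Calling cycle_vimage 1000 and watching the console (or vmstat -z) in another terminal should show whether the leak grows to a point where it matters.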

For firewall, you should not use PF, as far as I can tell. IPFW is reported to support vimage.

Here's the error message from tearing down a jail:
Code:
Freed UMA keg was not empty (70 items).  Lost 7 pages of memory.
Freed UMA keg was not empty (672 items).  Lost 4 pages of memory.
Freed UMA keg was not empty (40 items).  Lost 4 pages of memory.
Freed UMA keg was not empty (20 items).  Lost 5 pages of memory.
hhook_vnet_uninit: hhook_head type=1, id=1 cleanup required
hhook_vnet_uninit: hhook_head type=1, id=0 cleanup required

There is a PR: http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/164763
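
To put a rough number on the leak, the "Lost N pages of memory" lines can be summed up. This is only a sketch, assuming the message format shown above; lost_pages is a name chosen for this example, and the 4 KiB page size is an assumption that holds on i386 and amd64.

```shell
#!/bin/sh
# Hypothetical helper: sum the "Lost N pages of memory" lines read from stdin.
lost_pages() {
    awk '/Freed UMA keg was not empty/ {
             # The page count is the field right after the word "Lost".
             for (i = 1; i < NF; i++)
                 if ($i == "Lost") pages += $(i + 1)
         }
         END { printf "%d pages (%d KiB)\n", pages, pages * 4 }'
}
```

Used as dmesg | lost_pages, it counts whatever is still in the kernel message buffer. Fed the six lines above, it prints "20 pages (80 KiB)", so a single teardown is cheap; it is the accumulation over many teardowns that counts.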
 
@girgen:

You wrote such a good reply and no one even replied or thanked you for it? I'm surprised! Let me be the first one to thank you for your description.

Have you had any new insights since you wrote this?

I have no experience with DruidBSD and I am using a simple self-written (zsh) shell script that is able to start and stop jails. It's tailored for my own needs but might be useful for others. I did notice the same problems as you describe when shutting down one or more jails.

Here is my script:

Code:
#!/usr/local/bin/zsh
# Yes, ZSH, the superior shell.

# Where are the jails located?
jaildir="/usr/jails"

# Warden is the jail host.
warden="192.168.1.254"

# Use an associative array.
typeset -A jailhosts
jailhosts=( template    template:192.168.1.253/24:epair0
            node1       node1.mydom.tld:192.168.1.1/24:epair1
            node2       node2.mydom.tld:192.168.1.2/24:epair2
            node3       node3.mydom.tld:192.168.1.3/24:epair3
            node4       node4.mydom.tld:192.168.1.4/24:epair4)

# configuration ends

myname=$0

usage() {
    print "Usage: $myname jailname [on|off]" >&2
    exit 1
}

if (( ${#argv} != 2 )) ; then
    usage
fi

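# The jail must be listed in jailhosts and have a directory under $jaildir.
if [[ -z ${jailhosts[$1]} || ! -d ${jaildir}/$1 ]]; then
    print "Jail $1 doesn't exist." >&2
    usage
fi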


jail=$1
jailhn=${${(s.:.)jailhosts[$jail]}[1]}	# Jail hostname
jailip=${${(s.:.)jailhosts[$jail]}[2]}	# Jail IP
jailif=${${(s.:.)jailhosts[$jail]}[3]}	# Jail epair name

#print "jailhn: '$jailhn' jailip: '$jailip' jailif: '$jailif'"
#exit 0

case "$2" in
	on)
		ifconfig ${jailif} create
		ifconfig bridge0 addm ${jailif}a
		ifconfig ${jailif}a up
		mount -t devfs devfs ${jaildir}/${1}/dev
		test -d ${jaildir}/${jail}/usr/ports || mkdir ${jaildir}/${jail}/usr/ports
		mount_nullfs -o noatime /usr/ports/ ${jaildir}/${jail}/usr/ports
		test -d ${jaildir}/${jail}/usr/ports/packages || mkdir ${jaildir}/${jail}/usr/ports/packages
		mount_nullfs -o noatime /usr/ports/packages ${jaildir}/${jail}/usr/ports/packages
		test -d ${jaildir}/${jail}/usr/ports/distfiles || mkdir ${jaildir}/${jail}/usr/ports/distfiles
		mount_nullfs -o noatime /usr/ports/distfiles/ ${jaildir}/${jail}/usr/ports/distfiles
		test -d ${jaildir}/${jail}/usr/sysccache || mkdir ${jaildir}/${jail}/usr/sysccache
		mount_nullfs -o noatime /usr/sysccache ${jaildir}/${jail}/usr/sysccache
		jail -c vnet name=${jail} host.hostname=${jailhn} path=${jaildir}/${1} persist
		ifconfig ${jailif}b vnet ${jail}
        jexec ${jail} ifconfig lo0 127.0.0.1/8 alias
		jexec ${jail} ifconfig ${jailif}b ${jailip}
		jexec ${jail} route add default ${warden}
		#jexec ${jail} /etc/rc.d/netif start
		#jexec ${jail} /etc/rc.d/routing start
		jexec ${jail} /bin/sh /etc/rc
		;;
	off)
        print "Shutting down ${jail} in 3 seconds. Press ^C if this is not what you want."
        sleep 1
        print -n "."
        sleep 1
        print -n "."
        sleep 1
        print -n "."
        sleep 1
        print " running rc.shutdown"
        sleep 1
        jexec ${jail} /bin/sh /etc/rc.shutdown
        print
        print -n "Waiting 3s to settle down"
        sleep 1
        print -n "."
        sleep 1
        print -n "."
        sleep 1
        print -n "."
        sleep 1
        print " proceeding."
        print
		jail -r ${jail}
		umount ${jaildir}/${jail}/usr/sysccache
		umount ${jaildir}/${jail}/usr/ports/packages
		umount ${jaildir}/${jail}/usr/ports/distfiles
		umount ${jaildir}/${jail}/usr/ports
		umount ${jaildir}/${jail}/dev
		ifconfig bridge0 deletem ${jailif}a
		ifconfig ${jailif}a destroy
		;;
     *) usage ;;
esac


# vim: ft=zsh ts=4 sw=4 et
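
A side note on the fixed epairN column in a table like the one above: ifconfig epair create prints the name of the interface it allocated (the "a" side, for example epair3a), so the pair number does not have to be chosen in advance. A minimal sketch in plain sh follows; new_epair is a made-up name, and the teardown side would then need to remember the allocated name somewhere, which is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: create an epair and print its base name (e.g. "epair3").
# Relies on "ifconfig epair create" printing the a-side name, e.g. "epair3a".
new_epair() {
    a_side=$(ifconfig epair create) || return 1
    # Strip the trailing "a" to get the common prefix of both ends.
    echo "${a_side%a}"
}

# Possible use in the "on" branch (bridge0 and ${jail} as in the script above):
#   jailif=$(new_epair)
#   ifconfig bridge0 addm ${jailif}a
#   ifconfig ${jailif}b vnet ${jail}
```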
 
Thanks girgen!

I'm looking into this again, and one concerning note for anyone working with VNET is kern/164763.

Apparently this memory leak still exists in yesterday's build of 9.1-STABLE (based on the steps to repeat from that PR) and the guidance is to stop processes within a jail that uses VNET, but not to tear it down. According to the PR, a reboot is required to reclaim the memory.

kern/164763 said:
Our current rule of thumb is "if you need to shutdown one vimage jail
then you need to reboot the host."

So we just shut down the services in each jail, leave the jails
themselves up, and just reboot the host.

Of course this is far from optimal. Is this PR still on the radar of
anyone?
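
In practice that rule of thumb comes down to running each jail's rc.shutdown without ever calling jail -r. A minimal sketch, assuming the jails were created with persist so they stay around once their last process exits; quiesce_jails is a name invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: stop the services inside each named jail but leave the
# jail and its vnet in place, so the vnet teardown path is never exercised.
quiesce_jails() {
    for j in "$@"; do
        echo "stopping services in ${j}"
        jexec "$j" /bin/sh /etc/rc.shutdown ||
            echo "rc.shutdown failed in ${j}" >&2
    done
}
```

For example, quiesce_jails node1 node2 before rebooting the host.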
 
As a follow-up on this, I've been running "VIMAGE" jails under FreeBSD 9-STABLE (with ZFS filesystems) for some time now. Jail configuration is all done through jail.conf, with pre-start/stop scripts that plumb the jails in using netgraph tee, bridge, and eiface devices. It all runs pretty smoothly, though there are warnings about memory leaks on the console when you shut down a jail.

I haven't yet successfully tackled passing in both a netgraph eiface and a WiFi device to a single jail.

Running poudriere on that machine results in the generation and destruction of hundreds of jails, but I haven't seen any critical effects yet, just the warnings about memory leaks.
 
The qjail utility from ports has an option for creating VIMAGE jails under 9.2 using jail.conf. The SCTP problem is fixed in 9.2. Read the VIMAGE details in qjail for info about firewall usage in 9.2 VNET jails.

The PR kern/164763: VNET Memory leak problem still exists in 9.2.

There were big internal changes in jails between 8.X/9.0/9.1 and 9.2, and even bigger internal changes between 9.2 and 10.0. To use netgraph you need an expert understanding of networking and netgraph. epair/if_bridge is much simpler to use.

Recommendation: vnet/vimage is not ready for production usage as of 9.2. Best to wait for 10.0 and see how it works then.
 
Thanks for the heads-up on the SCTP conflict. I didn't know that had been resolved. Do you know which svn revision finally resolved it?

I'm not sure that you need to be an expert in networking and netgraph to hook up a jail. I would call it more of an interesting experience in learning a part of FreeBSD that I wasn't familiar with before. Certainly it isn't turn-key the first time through, but it is fully scripted for me here now: I can just declare a new jail in /etc/jail.conf and it will be plumbed in when created, and the plumbing torn down when it is stopped.

The basic steps that I am using are more complicated than needed -- you can skip the addition of the snoop/debug tees I use.

  1. If an ng_bridge doesn't already exist for the desired interface (real, VLAN, what have you), create an ng_bridge and attach the upper and lower hooks of the "real" interface to the first two hooks of the bridge (optionally use two tees here).
  2. Find an open hook on the bridge. Connect a new ng_eiface to the bridge (again a tee can be used).
  3. I rename the eiface and the newly created interface here to make my life easier.
  4. Assign a unique MAC address to the new interface. Do other ifconfig, as you see fit.
  5. Let jail -c give the new interface to the jail.
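
For readers who want something concrete, the steps above (without the optional tees) might look roughly like the sketch below. It is not my actual script: the function names, em0, br_em0, ng0_web, the MAC address, and the jail path are all placeholders, the sketch always creates the bridge instead of checking for it first, and it assumes the new eiface shows up as ngeth0. With DRY_RUN=1, the default here, the commands are only printed.

```shell
#!/bin/sh
# Sketch of steps 1-5 using ngctl, ifconfig, and jail. All names are placeholders.
: "${DRY_RUN:=1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

plumb_jail() {
    phys=$1 bridge=$2 link=$3 ifname=$4 mac=$5 jail=$6

    # Step 1: create the bridge on the real interface
    # (no check here for an already existing bridge).
    run ngctl mkpeer "${phys}:" bridge lower link0
    run ngctl name "${phys}:lower" "$bridge"
    run ngctl connect "${phys}:" "${bridge}:" upper link1
    run ngctl msg "${phys}:" setpromisc 1
    run ngctl msg "${phys}:" setautosrc 0

    # Step 2: attach a new ng_eiface to a free hook (link2 or higher).
    run ngctl mkpeer "${bridge}:" eiface "link${link}" ether

    # Step 3: rename the node and the interface it created.
    run ngctl name "${bridge}:link${link}" "$ifname"
    run ifconfig ngeth0 name "$ifname"   # assumption: new interface is ngeth0

    # Step 4: give the interface a unique MAC address.
    run ifconfig "$ifname" ether "$mac"

    # Step 5: let jail -c hand the interface to the jail.
    run jail -c vnet name="$jail" path="/usr/jails/${jail}" \
        vnet.interface="$ifname" persist
}
```

Running plumb_jail em0 br_em0 2 ng0_web 02:00:00:00:00:01 web prints the ten commands in order, which is a cheap way to review them before setting DRY_RUN=0.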

About the only thing that tripped me up was the restriction on name length for a netgraph node and an interface:
NG_NODESIZ=32 # src/sys/netgraph/ng_message.h
IF_NAMESIZE=16 # src/sys/net/if.h
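
A small guard in the plumbing script can catch this before ngctl or ifconfig complains. This is a minimal sketch with a made-up function name; both sizes include the terminating NUL, so the usable lengths are 31 and 15 characters, and since the same name is used for the node and the interface the shorter limit is the one that bites.

```shell
#!/bin/sh
# Limits quoted above; both include the terminating NUL byte.
NG_NODESIZ=32
IF_NAMESIZE=16

# Hypothetical helper: succeed only if the name is usable both as a
# netgraph node name and as an interface name.
name_fits() {
    name=$1
    len=${#name}
    [ "$len" -lt "$IF_NAMESIZE" ] && [ "$len" -lt "$NG_NODESIZ" ]
}
```

For example, name_fits "ng0_${jail}" || exit 1 near the top of the pre-start script.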
 