
Jails -- vimage stability; epair/if_bridge vs. netgraph -- recommendations?

Discussion in 'Networking' started by jef, May 1, 2012.

  1. jef

    jef New Member

    Messages:
    64
    Thanks Received:
    1
    Now that FreeBSD 9.0 is out, I'm finally getting around to upgrading my jail servers. There seems to be quite a bit of information written in the 7.x and 8.x timeframe, but not so much that is current. Before embarking on the task, I'd appreciate some insight into 9.0 implementations.

    The goal is to be able to configure publicly available servers in their own jails, each with a public interface and a private (management) interface and connect them with virtual routers, each with ipfw and NAT (as appropriate) to more easily manage communication to and between them (as opposed to a monolithic set of ipfw rules). Also, being able to jail DHCP so that I don't have promiscuous mode on a "real" interface is very attractive.

    First, I understand that vimage is still considered an "experimental" feature. At least in 8.0, it was incompatible with SCTP. Some posts suggest that this is (was) the main reason that vimage was considered experimental. Is vimage still incompatible with SCTP in 9.0? Are there any significant issues with vimage in 9.0 beyond SCTP incompatibility?

    Second, while most of the information I have seen uses epair/if_bridge to handle networking within the jail server, I have also seen netgraph used (http://druidbsd.sourceforge.net/vimage.shtml). What experience, if any, is out there to recommend one approach over the other?
     
  2. girgen@

    girgen@ New Member Developer

    Messages:
    10
    Thanks Received:
    9
    I am also looking for good pointers about this.

    My experience so far is that VIMAGE is experimental indeed, but the main problem is when shutting down jails. Many people experience easily reproducible panics when tearing down a jail, due to vnet. I have had the problem with both epair/if_bridge and netgraph. Removing PF from the kernel config completely (I wasn't actually using it, hadn't gotten that far) removed my panics, and reduced the problem to a memory leak.

    Many use cases involve very few shutdowns of jails, mine included, so I am now planning production using VIMAGE.

    One annoying thing about epairs is that the way they are numbered makes it hard to handle them gracefully in a config with 20+ jails. Also, there is no built-in support in /etc/rc.d/jail, and the most popular patch, jailv2, requires a horrible number of config lines per jail and doesn't scale very well. You also have to decide at config time which epairN belongs to which jail, which might be OK for a very static setup, but doesn't scale if you want to be able to simply provision jails on demand. I patched /etc/rc.d/jail rather heavily to support epairs better, especially in a setup similar to what you described, but then I found DruidBSD's vimage package. It uses netgraph instead of epair, and the netgraph interfaces can be named and numbered per jail instead of globally, which scales a lot better from a configuration point of view. I'm down to three simple lines per jail, and IP config is in the jail's own /etc/rc.conf:

    Code:
    gateway_enable="YES"
    
    vimage_enable="YES"
    vimage_fdescfs_enable="YES"
    vimage_procfs_enable="YES"
    vimage_mount_enable="YES"
    vimage_devfs_enable="YES"
    vimage_devfs_ruleset="devfsrules_jail"
    
    vimage_list="palle palle2"
    
    # jail "palle"
    vimage_palle_rootdir="/tank/master"
    vimage_palle_hostname="palle.example.com"
    vimage_palle_bridges="bce0 bce1"
    
    # jail "palle2"
    vimage_palle2_rootdir="/tank/masterpp"
    vimage_palle2_hostname="palle2.example.com"
    vimage_palle2_bridges="bce0 bce1"
    


    and in the jail's /etc/rc.conf:

    Code:
    ifconfig_ng0_palle="inet 192.168.1.155/24"
    ifconfig_ng1_palle="inet 10.0.0.155/8"
    ifconfig_ng0_palle2="inet 192.168.1.156/24"
    ifconfig_ng1_palle2="inet 10.0.0.156/8"
    

    vimage seems to be worth the effort. I used IP aliases before, but without vimage you don't get a proper loopback interface (at least I failed to get it working) unless you give each jail a different loopback address, e.g. 127.0.0.155... that sucks...

    Right now I'm looking for pointers as to which of if_bridge+epair or netgraph has the best performance. More people seem to favor netgraph; it is said to scale better and has been tested with 65534 nodes, FWIW... :eek:)

    To test the teardown memory leak and estimate how bad it is, I'm now looping vimage start; vimage stop. So far it has run 1300+ iterations on two jails and the machine is still OK.
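
    A start/stop loop like the one described could be sketched as follows. This is only an illustration: the function name cycle is made up here, and the commands to run are passed in as arguments because the exact invocation of the vimage rc script is an assumption on my part.

```shell
#!/bin/sh
# cycle START_CMD STOP_CMD COUNT
# Runs START_CMD then STOP_CMD, COUNT times, and stops at the first failure.
# The commands are passed in as strings, so nothing here is tied to one
# jail manager; the rc script path in the usage note is an assumption.
cycle() {
    start_cmd=$1
    stop_cmd=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # Unquoted on purpose so "cmd arg" splits into command and argument.
        $start_cmd || { echo "start failed at iteration $i" >&2; return 1; }
        $stop_cmd || { echo "stop failed at iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $i iterations"
}
```

    Something like cycle "/etc/rc.d/vimage start" "/etc/rc.d/vimage stop" 1300 would then repeat the test, assuming the rc script lives at that path. Checking dmesg afterwards shows how many leak warnings piled up.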

    As for the firewall, you should not use PF, as far as I can tell. IPFW is reported to support vimage.

    Here's the error message from tearing down a jail:
    Code:
    Freed UMA keg was not empty (70 items).  Lost 7 pages of memory.
    Freed UMA keg was not empty (672 items).  Lost 4 pages of memory.
    Freed UMA keg was not empty (40 items).  Lost 4 pages of memory.
    Freed UMA keg was not empty (20 items).  Lost 5 pages of memory.
    hhook_vnet_uninit: hhook_head type=1, id=1 cleanup required
    hhook_vnet_uninit: hhook_head type=1, id=0 cleanup required


    There is a PR: http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/164763
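
    To get a rough idea of how much memory the teardown warnings add up to, the "Lost N pages" figures can be summed from the console log. A minimal sketch, assuming the message format stays exactly as quoted above (lost_pages is a name made up for this example):

```shell
#!/bin/sh
# lost_pages: read kernel messages on stdin and print the total number of
# pages reported lost by "Freed UMA keg was not empty" warnings.
lost_pages() {
    awk '/Freed UMA keg was not empty/ {
        # The page count is the field right after the word "Lost".
        for (i = 1; i < NF; i++)
            if ($i == "Lost") total += $(i + 1)
    } END { print total + 0 }'
}
```

    Run as dmesg | lost_pages on the jail host. It only counts what is still in the message buffer, so it is a lower bound at best.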
     
  3. donduq

    donduq New Member

    Messages:
    38
    Thanks Received:
    5
    @girgen:

    You wrote such a good reply and no one even replied or thanked you for it? I'm surprised! Let me be the first one to thank you for your description. :beergrin

    Have you had any new insights since you wrote this?

    I have no experience with DruidBSD and I am using a simple self-written (zsh) shell script that is able to start and stop jails. It's tailored for my own needs but might be useful for others. I did notice the same problems as you describe when shutting down one or more jails.

    Here is my script:

    Code:
    #!/usr/local/bin/zsh
    # Yes, ZSH, the superior shell.
    
    # Where are the jails located?
    jaildir="/usr/jails"
    
    # Warden is the jail host.
    warden="192.168.1.254"
    
    # Use an associative array.
    typeset -A jailhosts
    jailhosts=( template    template:192.168.1.253/24:epair0
                node1       node1.mydom.tld:192.168.1.1/24:epair1
                node2       node2.mydom.tld:192.168.1.2/24:epair2
                node3       node3.mydom.tld:192.168.1.3/24:epair3
                node4       node4.mydom.tld:192.168.1.4/24:epair4)
    
    # configuration ends
    
    myname=$0
    
    usage() {
        print "Usage: $myname jailname [on|off]" >&2
        exit 1
    }
    
    if (( ${#argv} != 2 )) ; then
        usage
    fi
    
    # Look for the jail under $jaildir, not in the current directory.
    if [[ ! -d ${jaildir}/$1 ]]; then
        print "Jail $1 doesn't exist in ${jaildir}." >&2
        usage
    fi
    
    
    jail=$1
    jailhn=${${(s.:.)jailhosts[$jail]}[1]}	# Jail hostname
    jailip=${${(s.:.)jailhosts[$jail]}[2]}	# Jail IP
    jailif=${${(s.:.)jailhosts[$jail]}[3]}	# Jail epair name
    
    #print "jailhn: '$jailhn' jailip: '$jailip' jailif: '$jailif'"
    #exit 0
    
    case "$2" in
    	on)
    		ifconfig ${jailif} create
    		ifconfig bridge0 addm ${jailif}a
    		ifconfig ${jailif}a up
    		mount -t devfs devfs ${jaildir}/${1}/dev
    		test -d ${jaildir}/${jail}/usr/ports || mkdir ${jaildir}/${jail}/usr/ports
    		mount_nullfs -o noatime /usr/ports/ ${jaildir}/${jail}/usr/ports
    		test -d ${jaildir}/${jail}/usr/ports/packages || mkdir ${jaildir}/${jail}/usr/ports/packages
    		mount_nullfs -o noatime /usr/ports/packages ${jaildir}/${jail}/usr/ports/packages
    		test -d ${jaildir}/${jail}/usr/ports/distfiles || mkdir ${jaildir}/${jail}/usr/ports/distfiles
    		mount_nullfs -o noatime /usr/ports/distfiles/ ${jaildir}/${jail}/usr/ports/distfiles
    		test -d ${jaildir}/${jail}/usr/sysccache || mkdir ${jaildir}/${jail}/usr/sysccache
    		mount_nullfs -o noatime /usr/sysccache ${jaildir}/${jail}/usr/sysccache
    		jail -c vnet name=${jail} host.hostname=${jailhn} path=${jaildir}/${1} persist
    		ifconfig ${jailif}b vnet ${jail}
    		jexec ${jail} ifconfig lo0 127.0.0.1/8 alias
    		jexec ${jail} ifconfig ${jailif}b ${jailip}
    		jexec ${jail} route add default ${warden}
    		#jexec ${jail} /etc/rc.d/netif start
    		#jexec ${jail} /etc/rc.d/routing start
    		jexec ${jail} /bin/sh /etc/rc
    		;;
    	off)
    		print "Shutting down ${jail} in 3 seconds. Press ^C if this is not what you want."
    		# Count down one dot per second so the delay matches the message.
    		for i in 1 2 3; do sleep 1; print -n "."; done
    		print " running rc.shutdown"
    		jexec ${jail} /bin/sh /etc/rc.shutdown
    		print
    		print -n "Waiting 3s to settle down"
    		for i in 1 2 3; do sleep 1; print -n "."; done
    		print " proceeding."
    		print
    		jail -r ${jail}
    		umount ${jaildir}/${jail}/usr/sysccache
    		umount ${jaildir}/${jail}/usr/ports/packages
    		umount ${jaildir}/${jail}/usr/ports/distfiles
    		umount ${jaildir}/${jail}/usr/ports
    		umount ${jaildir}/${jail}/dev
    		ifconfig bridge0 deletem ${jailif}a
    		ifconfig ${jailif}a destroy
    		;;
         *) usage ;;
    esac
    
    
    # vim: ft=zsh ts=4 sw=4 et
    
     
  4. jef

    jef New Member

    Messages:
    64
    Thanks Received:
    1
    Thanks girgen!

    I'm looking into this again, and one concerning note for anyone working with VNET is kern/164763.

    Apparently this memory leak still exists in yesterday's build of 9.1-STABLE (based on the steps to repeat from that PR) and the guidance is to stop processes within a jail that uses VNET, but not to tear it down. According to the PR, a reboot is required to reclaim the memory.
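
    In practice, "stop the processes but don't tear the jail down" could look something like the sketch below. It assumes the jail was created with persist (as in donduq's script above), since otherwise the jail would go away once its last process exits. The helper names are made up for this example, and with DRY_RUN left at its default the commands are only printed.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default in this sketch) commands are printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop everything inside the jail but leave the jail (and its vnet) in place,
# i.e. no "jail -r" afterwards.
jail_quiesce() {
    run jexec "$1" /bin/sh /etc/rc.shutdown
}

# Bring the services in the still-existing jail back up.
jail_resume() {
    run jexec "$1" /bin/sh /etc/rc
}
```

    The jail's interfaces and mounts stay in place in between, so the vnet is never destroyed and the leak from the PR should not be triggered.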

     
  5. jef

    jef New Member

    Messages:
    64
    Thanks Received:
    1
    As a follow-up on this, I've been running "VIMAGE" jails under FreeBSD 9-STABLE (with ZFS filesystems) for some time now. Jail configuration is all done through jail.conf, with pre-start and post-stop scripts that plumb the jails in using netgraph tee, bridge, and eiface devices. It all runs pretty smoothly, though there are warnings about memory leaks on the console when you shut down a jail.

    I haven't yet successfully tackled passing in both a netgraph eiface and a WiFi device to a single jail.

    Running poudriere on that machine results in the generation and destruction of hundreds of jails, but I haven't seen any critical effects yet, just the warnings about memory leaks.
     
  6. fbsd1

    fbsd1 New Member

    Messages:
    213
    Thanks Received:
    47
    The ports utility qjail has an option for creating VIMAGE jails under 9.2 using jail.conf. The SCTP problem is fixed in 9.2. Read the VIMAGE details in qjail for info about firewall usage in 9.2 VNET jails.

    The PR kern/164763: VNET Memory leak problem still exists in 9.2.

    There were big internal changes in jails between 8.X/9.0/9.1 and 9.2, and even bigger internal changes between 9.2 and 10.0. To use netgraph you need an expert understanding of networking and netgraph. epair/if_bridge is much simpler to use.

    Recommendation: vnet/vimage is not ready for production usage as of 9.2. Best to wait for 10.0 and see how it works then.
     
  7. jef

    jef New Member

    Messages:
    64
    Thanks Received:
    1
    Thanks for the heads-up on the SCTP conflict. I didn't know that had been resolved. Do you know which svn revision finally resolved it?

    I'm not sure that you need to be an expert in networking and netgraph to hook up a jail. I would call it more of an interesting experience in learning a part of FreeBSD that I wasn't familiar with before. Certainly it isn't turn-key the first time through, but it is fully scripted for me here now and I can just declare a new jail in /etc/jail.conf and it will plumb it in when created, and tear down the plumbing when it is stopped.

    The basic steps that I am using are more complicated than needed -- you can skip the addition of the snoop/debug tees I use.

    1. If an ng_bridge doesn't already exist for the desired interface (real, VLAN, what have you), create an ng_bridge and attach the upper and lower hooks of the "real" interface to the first two hooks of the bridge (optionally use two tees here).
    2. Find an open hook on the bridge. Connect a new ng_eiface to the bridge (again a tee can be used).
    3. I rename the eiface and the newly created interface here to make my life easier.
    4. Assign a unique MAC address to the new interface. Do other ifconfig, as you see fit.
    5. Let jail -c give the new interface to the jail.
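
    Without the tees, steps 1 through 4 might be sketched roughly as below. This is not my actual script, just an outline: the function names, em0, br_em0, web_pub and the MAC address in the usage note are placeholders, the free link number and the kernel-assigned ngethN name are passed in rather than discovered, and with DRY_RUN left at its default the commands are only printed.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default in this sketch) commands are printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: bridge_create PHYS BRIDGE
# Hang a new ng_bridge off the physical interface's lower and upper hooks.
bridge_create() {
    phys=$1
    bridge=$2
    run ngctl mkpeer "${phys}:" bridge lower link0
    run ngctl name "${phys}:lower" "$bridge"
    run ngctl connect "${phys}:" "${bridge}:" upper link1
    # The bridge forwards frames for other MACs, so the NIC has to accept
    # them and must not rewrite their source address.
    run ngctl msg "${phys}:" setpromisc 1
    run ngctl msg "${phys}:" setautosrc 0
}

# Steps 2-4: eiface_attach BRIDGE LINKNUM NGETH NEWNAME MAC
# LINKNUM must be a free hook on the bridge; NGETH is the name the kernel
# gave the new ng_eiface (ngeth0, ngeth1, ...). Both are passed in here
# rather than discovered, to keep the sketch short.
eiface_attach() {
    bridge=$1
    linknum=$2
    ngeth=$3
    newname=$4
    mac=$5
    run ngctl mkpeer "${bridge}:" eiface "link${linknum}" ether
    # Rename both the netgraph node and the network interface.
    run ngctl name "${ngeth}:" "$newname"
    run ifconfig "$ngeth" name "$newname"
    run ifconfig "$newname" ether "$mac"
}
```

    For example, bridge_create em0 br_em0 followed by eiface_attach br_em0 2 ngeth0 web_pub 02:00:00:00:00:01 prints the nine commands in order. Step 5 is then left to the jail definition itself.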

    About the only thing that tripped me up was the restriction on name length for a netgraph node and an interface:
    NG_NODESIZ=32 # src/sys/netgraph/ng_message.h
    IF_NAMESIZE=16 # src/sys/net/if.h
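
    A small check along these lines can catch an over-long name before any plumbing is attempted. It is a sketch only: the function names are made up, and it assumes (as I read the headers) that both sizes include the terminating NUL, so the usable length is one less.

```shell
#!/bin/sh
# Sizes quoted in the post. Assumption: both include the terminating NUL,
# so a name may use at most SIZE - 1 characters.
NG_NODESIZ=32
IF_NAMESIZE=16

# name_fits NAME SIZE: succeeds if NAME plus its NUL fits in SIZE bytes.
name_fits() {
    [ "${#1}" -lt "$2" ]
}

# check_names NODE IFNAME: report which of the two names is too long.
check_names() {
    rc=0
    if ! name_fits "$1" "$NG_NODESIZ"; then
        echo "netgraph node name too long: $1" >&2
        rc=1
    fi
    if ! name_fits "$2" "$IF_NAMESIZE"; then
        echo "interface name too long: $2" >&2
        rc=1
    fi
    return "$rc"
}
```

    Since I use the same name for the node and the interface, the 15-character interface limit is the one that bites in practice.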