VNET networking in jails with NAT

Intro:
It appears that ~Luna in [this] post and I are trying to do something very similar. Instead of hijacking the original thread, I decided to create a new one (with a link to ~Luna's original question, in case it helps them).

Current State:
I am currently running a handful of applications in jails, and because there are only a few of them, I was able to "get by" with cloning a lo1 interface on the host, assigning a predefined IPv4 range, and using addresses from that range in the jail definitions (really, hardcoding them in the jail configuration file). I am also using pf to NAT traffic in and out of the jails where necessary. Cloning the lo1 interface and hardcoding IPv4 addresses in the jails works around how the lo0 interface behaves in jails (it basically prevents jailed services from binding to the host's loopback interface).

Desired State:
Minimally, I want to run a set of jails (let's say, 100 different services) on their own subnet, with IP addresses handed out by a dhcpd service running in one of the jails on that subnet. Obviously, manually hardcoding 100 IP addresses in the jail configuration files is no longer sustainable, so I definitely need DHCP paired with BIND for name resolution.
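To make the target concrete, the scope I have in mind would look roughly like this in isc-dhcpd terms (just a sketch; the subnet, range, and addresses are made-up examples, and I have not actually set this up yet):

```
# dhcpd.conf sketch (example addresses only)
subnet 10.0.0.0 netmask 255.255.255.0 {
    range 10.0.0.100 10.0.0.200;          # pool handed out to jails
    option routers 10.0.0.254;            # NAT gateway on the host
    option domain-name-servers 10.0.0.53; # the BIND jail
}
```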

So far I tried:
1. VNET jails with Netgraph (I ruled out epair bridging because it leaves the a side of each epair visible on the host, and I would eventually end up seeing hundreds of interfaces). I used the `jng` script to create a Netgraph (ng) bridge with the host's "real" interface and to create additional eifaces for individual jails. This *almost* worked: I was able to give static IP addresses to VNET jails if I wanted to, and once the dhcpd service was running in a jail, it was happy to hand out IP addresses to any other jails with a dynamic IP configuration. The "almost" part is that I realized the DHCP server running in the jail was also responding to DHCP requests made by devices outside of my jail subnet. I believe this is a side effect of using the *bridge*, which passes broadcast traffic in either direction between its Ethernet neighbors. The stumbling block is that I can't figure out the pf rule to block broadcast traffic from leaving the Netgraph bridge, especially since that traffic may not be visible to pf at all (the Netgraph bridge does not show up in the output of ifconfig on the host). I can write a rule that matches on the subnet, but writing a rule without an interface to attach it to is beyond my skill level :).
2. This led me to the next solution: I created a new Netgraph eiface device for the host and bridged it with the "real" interface using Netgraph's bridge. The idea is to use this new interface (ngeth0) in pf rules to block traffic at the interface level. The device creation part was not trivial, but I was able to do it yesterday. However, when I tried to use this new eiface (ngeth0) in the `jng` script, it didn't work. After some troubleshooting, it appeared to fail at enabling promiscuous mode on the "fake" eiface ngeth0 (or maybe it was the new ng bridge; I can't remember now).
3. This whole thing may be easier if I just added another network interface to the host (it only occurred to me yesterday, doh!!!) and used one for jails and one for everything else; the whole thing is running in a bhyve virtual machine anyway. But I can't help wondering what would happen when I need to deploy it on a real box (dedicate a set of network cards to jails? is this what people do?). I haven't tried this one yet, but I don't see why it wouldn't work.
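For what it's worth, attempt #1 was driven from jail.conf roughly like this (a sketch from memory; the jail name and em0 are placeholders, and the jng invocation follows the comments in /usr/share/examples/jails/jng):

```
# jail.conf sketch (names are placeholders)
myjail {
    vnet;
    vnet.interface = "ng0_myjail";             # eiface jng creates for this jail
    exec.prestart += "jng bridge myjail em0";  # bridge em0, create ng0_myjail
    exec.poststop += "jng shutdown myjail";    # tear the eiface down
}
```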

Still, the open questions:
1. (Probably simple) Can I use a pf rule to block broadcast traffic coming from the bridge connected to the network interface (without blocking broadcast traffic on the network interface itself)? I prefer pf since I have already invested a lot of time in learning its syntax, etc.
2. (Open-ended) Did anyone (other than ~Luna) attempt a similar setup and care to share experience?
 
Do you still have your configuration for your attempt #1 (VNET jails with Netgraph)? If you don't connect your bridge to your physical network interfaces, the DHCPd jail cannot give out IP addresses to "public" machines, only to jails.

I am currently also playing with jails and jail networking but I find it hard to find decent documentation on the subject, especially when it concerns Netgraph. Perhaps we could form some kind of "study group" with a wiki where we share information, notes, et cetera?

In my notes I wrote "(...) the jng script does not work out of the box [for me]." I do not believe I currently have a working example, nor clear notes on what I attempted and what failed. But I will look into it and provide some pointers where possible.
 
I am sharing my experience/setup with VNET and Netgraph... but it was my first and only time using vnet/netgraph and pf. I read your thread out of curiosity, hoping to learn more... and from what I read, you seem to be looking for a pf answer more than a vnet one...

jng did not work for me when used from jail.conf following the docs' examples; I posted about it here a few months ago. If I run it from a shell, it works, like the other scripts I tested.

I read the jng, virtual.lan, and vnet scripts. They all do basically the same thing, with differences such as whether or not they create the IPv4/IPv6 addresses, the MAC address, and so on.

Basic steps:
1. Create a bridge if one does not exist
2. Create an interface (eiface) and link it to the bridge
3. Rename the eiface if you want
4. Connect the xn0 upper hook to xn0bridge

xn0 is the ether node on a micro AWS EC2 instance. On medium instances it is ena0.

I found more Google results when searching for ngctl.

I got stuck when I tried to create a bridge and link it to the ether node as in the ngctl docs: it dropped my SSH connection. I was testing this on AWS EC2.

Here is the command I use to create a bridge
Code:
ngctl mkpeer xn0: bridge lower link0

But when I put the 4 steps in a script and run it, it works.

This could explain why I was losing the SSH connection: https://serverfault.com/questions/487775/creating-a-bridge-on-ec2-causes-connectivity-loss

Here is my example Netgraph script, creating the bridge and connecting the ether node and an eiface.
Bash:
# vi ng_ta.sh
#!/bin/sh

# PROVIDE: ng_ta
# REQUIRE: LOGIN FILESYSTEMS
# BEFORE: securelevel jail

. /etc/rc.subr

name="ng_ta"
rcvar=ng_ta_enable
start_cmd="${name}_start"
stop_cmd=":"

ng_ta_start()
{
# load ng_ether; it was not loaded on boot
kldload ng_ether
# ngctl msg xn0: setpromisc 1
# ngctl msg xn0: setautosrc 0

# create a bridge and link to xn0
ngctl mkpeer xn0: bridge lower link0
# change name to xn0bridge
ngctl name xn0:lower xn0bridge
# connect upper xn0 to xn0bridge
ngctl connect xn0: xn0bridge: upper link1

# create an eiface and connect it to xn0bridge
ngctl mkpeer xn0bridge: eiface link2 ether
# change name
ngctl name ngeth0: xn0_haproxy
ifconfig ngeth0 name xn0_haproxy

# assign ipv4 and ipv6
ifconfig xn0_haproxy 172.18.0.1/24
ifconfig xn0_haproxy inet6 fd12:f18:c26:4426::1 prefixlen 128 alias

}

load_rc_config $name
run_rc_command "$1"


In my tests, mysql (inside a jail) was getting errors, since the jail came up before the Netgraph setup. So I put the # BEFORE: jail line in my Netgraph script.

On my host rc.conf
Code:
# vi /etc/rc.conf
# Netgraph
ng_ta_enable="YES"

And in the host's jail.conf I put the IPv4 and IPv6 addresses. But since you are dealing with 100 jails, you will want to do this another way.
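In my case that is just a couple of per-jail lines; something roughly like this (a sketch, I don't have the exact file in front of me; the jail name and addresses are examples):

```
# jail.conf sketch (example addresses)
haproxy {
    ip4.addr = "172.18.0.2";
    ip6.addr = "fd12:f18:c26:4426::2";
}
```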

I put the Netgraph script in /usr/local/etc/rc.d/, but I tested it from /usr/local/sbin first.

Here is the result after boot:
Code:
# ngctl ls -l
There are 4 total nodes:
  Name: xn0             Type: ether           ID: 00000001   Num hooks: 2
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  upper           xn0bridge       bridge       00000003        link1
  lower           xn0bridge       bridge       00000003        link0
  Name: xn0bridge       Type: bridge          ID: 00000003   Num hooks: 3
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  link2           xn0_haproxy     eiface       00000007        ether
  link1           xn0             ether        00000001        upper
  link0           xn0             ether        00000001        lower
  Name: xn0_haproxy     Type: eiface          ID: 00000007   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  ether           xn0bridge       bridge       00000003        link2
  Name: ngctl1055       Type: socket          ID: 00000009   Num hooks: 0


Code:
# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9001
    options=503<RXCSUM,TXCSUM,TSO4,LRO>
    ether 02:89:e3:cb:5e:24
    inet6 fe80::89:e3ff:fecb:5e24%xn0 prefixlen 64 scopeid 0x2
    inet 172.30.0.239 netmask 0xffffff00 broadcast 172.30.0.255
    media: Ethernet manual
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
xn0_haproxy: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=28<VLAN_MTU,JUMBO_MTU>
    ether 00:00:00:00:00:00
    inet 172.18.0.1 netmask 0xffffff00 broadcast 172.18.0.255
    inet6 fe80::1427:e888:767c:dce1%xn0_haproxy prefixlen 64 scopeid 0x3
    inet6 fd12:f18:c26:4426::1 prefixlen 128
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>


I did not run into the dhcpd request issue. As for pf, I also use pf but do not have a rule like your case needs. I think you can use a block rule for it.
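If I were to try it, a block rule of roughly this shape on the member interface is where I would start (an untested sketch; xn0_haproxy is my eiface name, adjust to yours, and note that traffic bridged purely at the Netgraph level may never reach pf at all):

```
# pf.conf sketch (untested; interface name is an example)
block drop quick on xn0_haproxy inet proto udp \
    from any port { 67, 68 } to any port { 67, 68 }
```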
 
Hey keafao, thank you for sharing your trials and the script. It will take me some time to figure out exactly what you are doing as I find the terminology and concepts of Netgraph quite weird.

Below are my notes.

First create three Ethernet interfaces. You can't name them while
creating them. The first parameter to mkpeer is the type of node you
want to create. In our case we want an Ethernet interface, as we will
want to bridge them together.

I'm not sure about the last two parameters. Perhaps they
indicate which type of hook you want; in the case of an eiface, the
only supported type is 'ether'. It would make sense to require the peer
to be of the same type.
Code:
# ngctl mkpeer eiface ether ether
# ifconfig ngeth0 name ng0_host
# ngctl mkpeer eiface ether ether
# ifconfig ngeth1 name ng0_jail1
# ngctl mkpeer eiface ether ether
# ifconfig ngeth2 name ng0_jail2

Next create a bridge connecting them together. This syntax is even
fuzzier.
Code:
# ngctl mkpeer bridge link0 link0
# ngctl name [1e]: br0
# ngctl connect br0: ngeth0: link0 ether
# ngctl connect br0: ngeth1: link1 ether
# ngctl connect br0: ngeth2: link2 ether

I expected this to create a node of type "bridge" with both its local
and peer hooks named "link0". However, ngctl list does not show
this newly created bridge, so I must have done something wrong.

Code:
# ngctl list
There are 4 total nodes:
  Name: ngeth0          Type: eiface          ID: 00000007   Num hooks: 0
  Name: ngeth1          Type: eiface          ID: 00000009   Num hooks: 0
  Name: ngctl37171      Type: socket          ID: 0000001a   Num hooks: 0
  Name: ngeth2          Type: eiface          ID: 0000000e   Num hooks: 0

Note also that although I have renamed the interfaces, I did so
only in the 'global' environment, when issuing an ifconfig. To rename
the interfaces in ngctl as well, you need to rename them separately.
Code:
ngctl name ngeth0: ng0_host
ngctl name ngeth1: ng0_jail1
ngctl name ngeth2: ng0_jail2
Note that the colon after the name is required.
I don't understand why you have to rename the interfaces twice.
What would be the added value?

I hope I can find some time this week to play around with Netgraph again. Hopefully I can now figure out how it all works.

I'm currently playing around again with aliases, giving jails IP addresses attached to a loopback address so they are on a "private" network and let PF use NAT to connect them to the outside world. This makes sense to me for my VPS where I only have one public IP address.
 
Thank you both. I figured it out, and I *think* it all works now.

I am using the "vnet" script (an extension of jng), which allows me to either statically configure IP addresses in jails or omit them and use DHCP inside a jail. Another benefit is that it creates the eiface on the host, which can be used in pf rules.

I struggled a bit with the pf block rule on the host machine, and I am still not 100% sure it works. The DHCP broadcast traffic I am seeing on my host may have something to do with my test environment (a physical box running bhyve VMs with a shared vm-switch set in promiscuous mode). I am pretty sure this is a non-issue on any public cloud provider: in a shared environment you always want to block broadcast traffic aggressively. I was able to solve it with a pf block rule that stops broadcast traffic from other networks from ever entering the bhyve test VM (where the jails are deployed), although ideally I want this to happen on the jails host itself.

What is left to do is to test this again and to document it all. I will share the link here once done.
 
I'm currently playing around again with aliases, giving jails IP addresses attached to a loopback address so they are on a "private" network and let PF use NAT to connect them to the outside world. This makes sense to me for my VPS where I only have one public IP address.

This is my "current" setup as well: a cloned loopback interface with aliases assigned to it. Basically, it is like this:
1. In /etc/rc.conf
Code:
cloned_interfaces="lo1"
ifconfig_lo1_alias0="inet 10.0.0.10-20 netmask 255.255.255.0"

pf_enable="YES"
gateway_enable="YES"

jail_enable="YES"
jail_list="test-01 test-02"

2. in /etc/pf.conf
Code:
ext_if="vnet0"
jail_if="lo1"
jail_net="10.0.0.0/24"
ext_ip = "$ext_if:0"
ext_net = "10.10.10.0/24"

front_end="10.0.0.1"
front_end_services="{ http, https }"

set block-policy drop
set loginterface $ext_if
set optimization normal

set skip on $jail_if

# Allow jail subnet to access the internet on the external interface
nat pass on $ext_if from $jail_net to any -> ($ext_if)

# Redirect http(s) ports on the external interface to the front end services running in jails
rdr pass on $ext_if proto tcp from any to ($ext_if) port $front_end_services -> $front_end

pass out all

3. In /etc/jail.conf
Code:
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;
mount.devfs;

$domain="foo.com";
$subnet="10.0.0";

host.hostname="$name.$domain";
path="/usr/local/jails/run/$name";
interface="lo1";
ip4.addr = "$interface|$subnet.$ip";

test-01 {
  $ip = 1;
}

test-02 {
  $ip = 2;
}

This approach is actually quite useful, e.g. the traffic between the app in one jail and the database in another does not have to go over the network at all. The problem is that the addresses are hardcoded in the jail config file. That is fine for a handful of services, but I found it difficult to keep track of IP address assignments once I had more than 10 or so jails. This is what prompted me to look for a proper networking setup with dhcpd, DNS, etc.
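Until a proper DHCP setup is in place, one stopgap for the bookkeeping problem would be to generate the per-jail blocks instead of typing them; a hypothetical sketch (the service names are examples):

```shell
#!/bin/sh
# Emit jail.conf blocks with sequential $ip values (hypothetical helper).
i=1
for name in web db cache; do
    printf '%s {\n  $ip = %d;\n}\n' "$name" "$i"
    i=$((i + 1))
done
```

Appending the output to a jail.conf like the one above keeps the assignments in one predictable sequence instead of scattered by hand.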
 
Code:
ip4.addr = "$interface|$subnet.$ip";

test-01 { $ip = 1; }
That is a cool trick I did not know about. Thanks!

I don't like your idea of a PF block rule. I want the Netgraph bridge to be completely isolated from the outside host network. I'm looking at options for NAT with Netgraph, or connecting a loopback interface on the host to the Netgraph bridge and again using NAT on the PF host.
 
Hey tommiie,

I also have only one public IP address for this server and use PF/NAT to connect the jails to the outside.

I read your notes; about the colon note, I spent a few minutes because of it. I remember typing ngctl info xn0_haproxy and getting an error until I spotted the missing colon... I also expected that when I changed the eiface name it would be reflected in ifconfig, but it is not. In jng, they rename using both.


Hey rf10,

I do not remember now which script, but with one of them I got a strange conflict. It conflicted with another command while I was doing these tests, but I did not take notes.

It is nice that you are using bhyve. I want to get a laptop to install FreeBSD on in the future to test it.

About assigning several IPs and keeping track of them in jail.conf: read the virtual.lan script, it may give you an idea of how to solve this. It creates the Netgraph setup and the jails.

/usr/share/examples/netgraph/virtual.lan
 
About assigning several IPs and keeping track of them in jail.conf: read the virtual.lan script, it may give you an idea of how to solve this. It creates the Netgraph setup and the jails.

/usr/share/examples/netgraph/virtual.lan

I am aware of this script. I had a couple of design problems with it.
1. It mixes networking and jails. Calling it "virtual.lan" implies it creates a network topology, which is what I wanted to use it for. But then it also does something with jails. I think whoever created it needed it for jails, so... they combined the two.
2. It was just... a script you need to run manually. In most scenarios, you would want to set up your network at startup. So I turned it into a service which you can turn on and off in your rc.conf, with additional parameters you can specify. Let me know if you are interested; I can share it.

I didn't actually use it for my jail solution, but it is useful for general networking setup using netgraph.
 
I don't like your idea of a PF block rule. I want the Netgraph bridge to be completely isolated from the outside host network. I'm looking at options for NAT with Netgraph, or connecting a loopback interface on the host to the Netgraph bridge and again using NAT on the PF host.

Yes, I scratched that off. In my new solution (as of last night), I do not bridge with the host network; I just use NAT to let jails talk to the outside, and RDR to let the outside world talk to jails on specific ports.
 
rf10, I thought you had not seen it... anyway, maybe inserting the ip4.addr line into jail.conf from your Netgraph script can solve it.

About sharing your solution, yes I would like to read. I clicked on your thread to see how others do.
 