PF Load balancing VPN tunnels using PF

I've read a number of threads on this topic but couldn't find a solution, and I'm also quite confused trying to reconcile the OpenBSD PF and FreeBSD PF examples (neither of which I understand well).

Objective
I have a bhyve VM with a single network interface, and in it I open 2 VPN tunnels, which gives me these interfaces:
  1. vtnet0 (primary virtual network interface, gigabit internet)
  2. tun0 (OpenVPN privacy tunnel, 150Mbps)
  3. tun1 (OpenVPN privacy tunnel, 150Mbps)
I'd like to load-balance these VPN tunnels to achieve greater collective throughput (my connection is gigabit, but each tunnel maxes out at 150 Mbps). I'm limiting this to 2 tunnels to keep the question simple.

My application is a torrent client that runs on this same virtual host and binds to a single local IP address. When running only one tunnel, that is typically the tunnel's own IP, such as 10.120.1.118. With both tunnels I would like to bind to my local IP 10.0.6.47 and NAT it, since I can't think of other options.

Network Configuration

Code:
$ ifconfig
vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
        ether 00:a0:98:85:cc:5a
        inet 10.0.6.47 netmask 0xfffff000 broadcast 10.0.6.255
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=0<> metric 0 mtu 33160
        groups: pflog
tun0: flags=8151<UP,POINTOPOINT,RUNNING,PROMISC,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        inet 10.190.1.146 --> 10.190.1.145 netmask 0xffffffff
        groups: tun
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        Opened by PID 6964
tun1: flags=8151<UP,POINTOPOINT,RUNNING,PROMISC,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        inet 10.120.1.118 --> 10.120.1.117 netmask 0xffffffff
        groups: tun
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        Opened by PID 7041

routes look like this:

Code:
$ netstat -rn4               
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            10.0.0.1           UGS      vtnet0    <-- my local router
10.0.0.0/20        link#1             U        vtnet0    <-- my local subnet
10.0.6.47          link#1             UHS         lo0    <-- my local IP
10.120.0.1         10.120.1.117       UGHS       tun1    <-- tun1 gateway
10.120.1.117       link#5             UH         tun1    <-- tun1 gateway
10.120.1.118       link#5             UHS         lo0    <-- tun1 ip
10.190.0.1         10.190.1.145       UGHS       tun0    <-- tun0 gateway
10.190.1.145       link#4             UH         tun0    <-- tun0 gateway
10.190.1.146       link#4             UHS         lo0    <-- tun0 ip
xxx.xxx.xxx.xxx    10.0.0.1           UGHS     vtnet0    <-- tun0 remote host IP
xxx.xxx.xxx.xxx    10.0.0.1           UGHS     vtnet0    <-- tun1 remote host IP
127.0.0.1          link#2             UH          lo0

I can use each tunnel one at a time, by adding appropriate routes pointing to each tunnel's gateway:

tun0:
Code:
route add 0.0.0.0/1    10.190.1.145
route add 128.0.0.0/1  10.190.1.145
or
tun1:
Code:
route add 0.0.0.0/1    10.120.1.117
route add 128.0.0.0/1  10.120.1.117

I am of course unable to use both at the same time; only one tunnel is usable - the one which has its respective routes added.

I tried to use PF (with my rudimentary understanding of it) to load-balance these tunnels via round-robin, but have been only partially successful: I see round-robin traffic on both tun0 and tun1, but the source IP is my local IP (10.0.6.47) and not the local IP of the respective tunnel (10.190.1.146, 10.120.1.118).

pf.conf:
Code:
lan_net = "10.0.0.0/20"
int_if = "vtnet0"

ext_if1 = "tun0"
ext_if2 = "tun1"

ext_gw1 = "10.190.1.145"
ext_gw2 = "10.120.1.117"

ext_ip1 = "10.190.1.146"
ext_ip2 = "10.120.1.118"

# Presumably this creates a NAT
nat on $ext_if1 from $lan_net to any -> $ext_ip1
nat on $ext_if2 from $lan_net to any -> $ext_ip2

# Testing only with TCP traffic on port 80
pass out on $int_if route-to { ($ext_if1 $ext_gw1), ($ext_if2 $ext_gw2) } round-robin proto tcp to any port 80

I run a test to check how things are working (two requests):

Code:
$ fetch -qo - http://wtfismyip.com/text
$ fetch -qo - http://wtfismyip.com/text

I see activity in both tunnels, one after another, but the source IP is still my local IP and not the one that was assigned to each of my tunnels (which I would expect to see if NAT were to be configured correctly, to my rudimentary understanding).

Outcome on tun0:
Code:
# tcpdump -ni tun0 ip
19:21:31.569393 IP 10.0.6.47.20735 > 64.120.19.134.80: Flags [S], seq 1254479142, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3530623286 ecr 0], length 0
19:21:32.602185 IP 10.0.6.47.20735 > 64.120.19.134.80: Flags [S], seq 1254479142, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3530624320 ecr 0], length 0
19:21:34.828108 IP 10.0.6.47.20735 > 64.120.19.134.80: Flags [S], seq 1254479142, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3530626546 ecr 0], length 0

Outcome on tun1:
Code:
# tcpdump -ni tun1 ip
19:21:29.017998 IP 10.0.6.47.60392 > 64.120.19.134.80: Flags [S], seq 659876046, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2511861594 ecr 0], length 0
19:21:30.043188 IP 10.0.6.47.60392 > 64.120.19.134.80: Flags [S], seq 659876046, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2511862619 ecr 0], length 0

But what I would expect to see in tcpdump, is something like this:

Expected output on tun1, for example:
Code:
# tcpdump -ni tun1 ip
19:56:35.557762 IP 10.120.1.118.46555 > 64.120.19.134.80: Flags [S], seq 3812475477, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3960557121 ecr 0], length 0
19:56:35.589131 IP   64.120.19.134.80 > 10.120.1.118.46555: Flags [S.], seq 2182009712, ack 3812475478, win 65160, options [mss 1103,sackOK,TS val 3087361557 ecr 3960557121,nop,wscale 8], length 0
19:56:35.589144 IP 10.120.1.118.46555 > 64.120.19.134.80: Flags [.], ack 1, win 1035, options [nop,nop,TS val 3960557152 ecr 3087361557], length 0
19:56:35.589346 IP 10.120.1.118.46555 > 64.120.19.134.80: Flags [P.], seq 1:108, ack 1, win 1035, options [nop,nop,TS val 3960557152 ecr 3087361557], length 107: HTTP: GET /text HTTP/1.1
19:56:35.620359 IP   64.120.19.134.80 > 10.120.1.118.46555: Flags [.], ack 108, win 255, options [nop,nop,TS val 3087361589 ecr 3960557152], length 0
19:56:35.637612 IP   64.120.19.134.80 > 10.120.1.118.46555: Flags [P.], seq 1:202, ack 108, win 255, options [nop,nop,TS val 3087361607 ecr 3960557152], length 201: HTTP: HTTP/1.1 200 OK
19:56:35.637624 IP   64.120.19.134.80 > 10.120.1.118.46555: Flags [F.], seq 202, ack 108, win 255, options [nop,nop,TS val 3087361607 ecr 3960557152], length 0
19:56:35.637630 IP 10.120.1.118.46555 > 64.120.19.134.80: Flags [.], ack 203, win 1032, options [nop,nop,TS val 3960557201 ecr 3087361607], length 0
19:56:35.637917 IP 10.120.1.118.46555 > 64.120.19.134.80: Flags [F.], seq 108, ack 203, win 1035, options [nop,nop,TS val 3960557201 ecr 3087361607], length 0
19:56:35.670674 IP   64.120.19.134.80 > 10.120.1.118.46555: Flags [.], ack 109, win 255, options [nop,nop,TS val 3087361640 ecr 3960557201], length 0

Then I would like to see 10.120.1.118 mapped back over NAT to 10.0.6.47, so that my application listening on 10.0.6.47 receives the traffic and achieves greater throughput.

Could someone please point me in the right direction of how to make this happen? Is my NAT misconfigured, or am I misunderstanding how this configuration might work?
 
I'm not sure when the NAT is applied (I'm not familiar with pf), but if the rules are executed in order and packets pass through pf only once, you probably need the route-to before the NAT rules.
Later edit:
A very crude solution (without pf or NAT) is:
route add 0.0.0.0/1 10.190.1.145
route add 128.0.0.0/1 10.120.1.117
It won't work for getmyipstuff.com, but for a torrent client it might have some effect.
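
To illustrate how this split behaves, here is a minimal sh sketch. `tunnel_for` is a hypothetical helper, not part of any tool. It assumes the two routes above are installed and that no more specific route covers the destination, and it only reports which half of the IPv4 space an address falls into.

```sh
#!/bin/sh
# Hypothetical helper: which tunnel would a destination use under the
# split 0.0.0.0/1 -> tun0 (10.190.1.145), 128.0.0.0/1 -> tun1 (10.120.1.117)?
tunnel_for() {
    first_octet=${1%%.*}          # text before the first dot
    if [ "$first_octet" -lt 128 ]; then
        echo tun0
    else
        echo tun1
    fi
}

tunnel_for 64.120.19.134    # address seen in the tcpdump output -> tun0
tunnel_for 192.0.2.1        # documentation address, upper half  -> tun1
```

Each destination always maps to the same tunnel, so a single site only ever sees one exit IP, while a torrent swarm with peers spread over both halves would exercise both tunnels.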
 
In pf, NAT (translation) rules must come before the route-to rule (filtering). If we follow your suggestion and reorder them, pf outputs an error:

Code:
Rules must be in order: options, normalization, queueing, translation, filtering

I appreciate the segmentation suggestion, but splitting the network like that is indeed quite crude :) Plus, 2 tunnels is just an example; my intent is to scale up to 8 in order to take full advantage of my internet connection. I may wish to hold my breath on this topic a bit longer :)
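
For reference, the crude split does generalize to any power-of-two number of tunnels. The sketch below is only an illustration: `split_routes` is a hypothetical helper that prints `route add` commands without running them, and it assumes the number of gateways passed in is a power of two.

```sh
#!/bin/sh
# Hypothetical helper: print one "route add" line per gateway, dividing the
# IPv4 space into equal prefixes. Assumes $# is a power of two (1..256).
split_routes() {
    n=$#
    bits=0
    size=$n
    while [ "$size" -gt 1 ]; do       # bits = log2(n)
        size=$((size / 2))
        bits=$((bits + 1))
    done
    step=$((256 / n))                 # first-octet width of each slice
    i=0
    for gw in "$@"; do
        echo "route add $((i * step)).0.0.0/$bits $gw"
        i=$((i + 1))
    done
}

# The two tunnel gateways from this thread:
split_routes 10.190.1.145 10.120.1.117
```

With 8 gateways this would print eight /3 routes starting at 0.0.0.0, 32.0.0.0, and so on. It remains a static split by destination, not real load balancing.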
 
If you control the VPN exit points, you can add an iroute for your VM's LAN IP there and skip the local NAT.
The exit points already do NAT, but they don't know what to do with 10.0.6.47.
 
PF applies the last matching rule, unless it's marked with a 'quick' designation.
If you have a 'pass all' at the very bottom of your ruleset and no rules that are marked as 'quick',
all traffic will pass with no filtering.
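
A toy sh model of that evaluation order may help; it is not PF itself. `pf_verdict` is a hypothetical function that takes the actions of the rules that match a packet, in ruleset order, with a `:quick` suffix marking a quick rule.

```sh
#!/bin/sh
# Toy model of PF filter evaluation. Every argument is assumed to be a
# rule that matches the packet; the question is only which action wins.
pf_verdict() {
    verdict=pass                      # PF's implicit default when nothing matches
    for rule in "$@"; do
        verdict=${rule%%:*}           # this rule is now the last match
        case $rule in
            *:quick) break ;;         # quick: stop evaluating further rules
        esac
    done
    echo "$verdict"
}

pf_verdict block pass          # last matching rule wins   -> pass
pf_verdict block:quick pass    # quick ends evaluation     -> block
```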
 
The problem is a bit more subtle (to my ignorant mind).

I was able to get NAT to work, but only if I NAT out from the internal interface (vtnet0), and I can't seem to intercept egress traffic on the external interfaces (tun0, tun1) via PF.

Here's a sample where NAT works, but I can only configure the outbound NAT on vtnet0 for a single outgoing IP:

pf.conf

Code:
# Only the first NAT will work (to $ext_ip1), unsurprisingly
nat on $int_if from $lan_net to any port 80 -> $ext_ip1
nat on $int_if from $lan_net to any port 80 -> $ext_ip2

pass out on $int_if route-to { ($ext_if1 $ext_gw1), ($ext_if2 $ext_gw2) } round-robin proto tcp to any port 80 keep state allow-opts

I understand why NAT in the above config works for only one of the IPs, and I understand why the config above does not achieve what I want. But it does actually mutate traffic in a predictable way.

Here's an outline for what I had in mind, although it's not what I actually did in my pf.conf examples above:

1. The traffic is expected to originate from $int_if (from a $lan_net IP)
2. The $int_if gets routed-to (via round robin) to $ext_if1 or $ext_if2
3. At this point, in the context of $ext_if<1|2> NAT is supposed to translate $lan_net source IP to $ext_ip1 or $ext_ip2
4. Traffic exits the $ext_if<1|2> through each interface's respective NAT'd IP

The problem is that after I use PF to redirect traffic from $int_if to either of the $ext_if interfaces, I can't seem to have any control over any traffic on $ext_if interfaces via PF (ingress or egress).

Why is that?

For example, when I do this in order to realize the plan outlined above, there is nothing being captured on egress $ext_if1 or $ext_if2:

Code:
nat on $ext_if1 from $lan_net to any port 80 -> $ext_ip1
nat on $ext_if2 from $lan_net to any port 80 -> $ext_ip2

pass out on $int_if route-to { $ext_if1 , $ext_if2 } round-robin proto tcp to any port 80

# At this point I have no control over $ext_if1 or $ext_if2, and NAT doesn't happen on those interfaces

It seems like my goal should be first to change outbound context from $int_if to $ext_if<1|2>, and then to expect the above NAT configuration to work. Does anyone have any ideas on how I could do this?

Edit: to be clear, when I say nothing gets captured on $ext_if1 or $ext_if2 - I mean strictly in PF context. I do see traffic on those interfaces via tcpdump, I'm just unable to manipulate it at all from PF.
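
One way to check whether PF evaluates anything on the tunnel interfaces is to look at the per-rule counters and the state table while repeating the fetch test. A minimal sketch, assuming root privileges: `pf_snapshot` is a hypothetical wrapper name, and the pfctl invocations inside it only read information and do not change the ruleset.

```sh
#!/bin/sh
# Hypothetical wrapper: dump PF counters and states for inspection.
# Call it as root, once before and once after the fetch test.
pf_snapshot() {
    pfctl -v -s nat      # NAT rules with evaluation and packet counters
    pfctl -v -s rules    # filter rules with evaluation and packet counters
    pfctl -s state       # state table, showing any translated addresses
}
```

If the counters on the `nat on $ext_if1` and `nat on $ext_if2` rules stay at zero while the route-to rule's counters increase, that would suggest the translation rules are never consulted for the rerouted packets.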
 
covacat that's a very interesting way to add an abstraction layer! I tried to take a similarly spirited route entirely locally by introducing custom loopback interfaces to change directionality context, but that didn't work out.

Your suggestion is quite brilliant, and may very well be a plausible workaround. I could even reuse the same jails for multiple hosts that require such a config.

Not to sound greedy, but I would very much like to learn of any solutions that can be configured locally without adding such a complex abstraction layer. Still, it is quite an acceptable strategy for my situation. Many thanks.
 