[Solved] FIB 1 routing on a directly connected interface stops working after upgrade from 12.4 to 13.2

rowan194 · Jan 27, 2024

[SOLUTION: add net.add_addr_allfibs=1 to /etc/sysctl.conf]

----

I'm having trouble with routing on FIB 1 that worked until an upgrade from 12.4R to 13.2R. The device is a 4G mobile data stick, but it presents as USB Ethernet (ue0) with a DHCP server and gateway. After the upgrade, the gateway is no longer reachable on FIB 1.

I feel like I'm missing something fundamental here. Shouldn't any FIB be able to send to directly reachable Ethernet hosts? Is there some behaviour that changed between 12 and 13 that requires a change in config?

(Edit: I have confirmed the same happens with a standard 'em' Ethernet interface. FIB 1 will not allow me to add a route via a directly reachable host.)

Further info...

The interface exists, is up, and has the usual IP (.182) assigned by the DHCP server on the 4G stick:

Code:

# ifconfig ue0
ue0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether 36:4b:50:b7:ef:2d
        inet 192.168.0.182 netmask 0xffffff00 broadcast 192.168.0.255
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

The gateway is pingable via the main FIB 0:

Code:

# ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=1.646 ms

...but the gateway is not pingable via FIB 1:

Code:

# setfib 1 ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host

And that also means I cannot add any routes on FIB 1:

Code:

# setfib 1 route add default 192.168.0.1
route: writing to routing socket: Network is unreachable
add net default: gateway 192.168.0.1 fib 1: Network is unreachable

Thanks for any help.

sko · Jan 27, 2024

The interface em0 belongs to the context of FIB 0. You have to either put the interface into the context of that FIB (via ifconfig em0 fib 1) or place the packets from FIB 0 into FIB 1, e.g. via the 'rtable' parameter in PF. For example, pass in on em0 to !<localnets> rtable 1 places all traffic destined for anything not defined in <localnets> into FIB 1.

You can always check the contents of a FIB via netstat -nrF<fibnum> to see what is reachable via that FIB.
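For reference, a few read-only commands for inspecting FIB state on FreeBSD. This is a sketch, not a complete checklist; ue0 and FIB 1 are simply the values from this thread, so substitute your own interface and FIB number.

```sh
#!/bin/sh
# Number of FIBs the kernel provides.
sysctl net.fibs

# Current value of the setting discussed in this thread.
sysctl net.add_addr_allfibs

# Routing tables of FIB 0 and FIB 1 (-F selects the FIB to display).
netstat -rn -F 0
netstat -rn -F 1

# Same as above, but by running netstat itself inside FIB 1.
setfib 1 netstat -rn

# Interface details; a "fib: N" line should appear if a non-default FIB is assigned.
ifconfig ue0
```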

Routing between FIBs can get quite messy and hard to debug, and IMHO the PF syntax on FreeBSD is a bit weird for that, too. But I might be a bit biased, because I usually use OpenBSD on routers, so I'm more used to their PF syntax and the nuances of routing domains. The best advice I can give regardless of the PF variant: try to avoid inter-FIB/rdomain routing at any cost.
FIBs (and rdomains) are great for completely segregating interfaces and traffic flows. For example, on a router you can keep the management interface and most services in the default FIB, which (only) has access to the management network, and place all routing/uplink interfaces into another FIB/rdomain, running any services for those networks (e.g. routing daemons, DHCP etc.) within the context of that FIB/rdomain.

edit: regarding why your setup doesn't work any more after the update:
from the 13.0-RELEASE notes:

The net.add_addr_allfibs sysctl default has been changed to 0. (commit 2d3982419593)

If you set that sysctl to "1" you should be able to restore the previous behaviour.
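A minimal sketch of applying that, assuming a root shell on the 13.x host and that the sysctl is writable at runtime on your release. The persistent setting goes into /etc/sysctl.conf, as in the solution at the top of this thread. A runtime change presumably only affects addresses configured after it, so existing interfaces may need their addresses re-applied (or the host rebooted) before their routes appear in the other FIBs.

```sh
#!/bin/sh
# Enable adding interface-address routes to all FIBs immediately.
sysctl net.add_addr_allfibs=1

# Make the setting persistent across reboots.
echo 'net.add_addr_allfibs=1' >> /etc/sysctl.conf

# After a reboot, confirm the value really took effect (1, not the new default of 0).
sysctl -n net.add_addr_allfibs
```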

PMc · Jan 27, 2024

As sko already mentioned, the behaviour changed significantly somewhere in 13.x.
There should be a message during startup like this one:
WARNING: Adding ifaddrs to all fibs has been turned off by default. Consider tuning net.add_addr_allfibs if needed

What that means: whenever you configure an address on an interface with ifconfig, two basic routes are automatically inserted into the routing table:

Code:

192.168.0.0/25    link#3             U          igb2
192.168.0.2       link#3             UHS         lo0

Only these routes make connections to directly reachable hosts work.

Before 13.x these routes were inserted into all fibs. Now they are only inserted into the fib designated by the fib keyword to ifconfig (or into fib 0 by default). This has advantages if you intend to configure some more sophisticated behaviour instead, but in any case you must now take care to configure the appropriate routes yourself, or else try the sysctl mentioned above.

rowan194 · Jan 28, 2024

Glad to know I'm not going crazy.

The reason I'm using a non-default FIB is that the 4G stick is a backup link, and I need to maintain a consistent connection with the endpoint of a VPN tunnel, which I do by routing the IP of the endpoint via FIB #1.

It's probably better that I do the config properly rather than just change the sysctl, but I'm wondering how ifconfig is going to work with a pluggable USB device and an IP assigned via DHCP. A simple ifconfig ue0 fib 1 does not add directly connected routes. dhclient already runs on fib 1 (and has since before the upgrade), so I'm not sure why it also isn't picking up the routes (e.g. the default route) that the DHCP server on the 4G stick offers.

(Edit: after setting net.add_addr_allfibs=1 and rebooting, ifconfig ue0 shows fib: 1, presumably because the script which brought it up runs as setfib 1 script.sh. So why is the 192.168.0.0/24 route not installed for that fib when the sysctl is 0?)
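If you would rather not rely on the sysctl, one option is a dhclient exit hook that installs the missing routes into FIB 1 whenever a lease is obtained. The following is an untested sketch. It assumes FreeBSD's dhclient-script sources /etc/dhclient-exit-hooks and provides $reason, $interface, $new_ip_address, $new_subnet_mask and $new_routers there, and it hard-codes ue0 and FIB 1 from this thread. The helper name fib1_net_address is hypothetical, made up for illustration.

```sh
#!/bin/sh
# /etc/dhclient-exit-hooks (hypothetical sketch)
# This file is sourced by dhclient-script, so it must not call "exit",
# and it uses prefixed variable names to avoid clobbering the script's own.

fib1_if="ue0"   # assumption: the 4G stick's interface
fib1_num="1"    # assumption: the FIB used for the backup link

# Print the IPv4 network address for address $1 and dotted netmask $2,
# e.g. 192.168.0.182 with 255.255.255.0 gives 192.168.0.0.
fib1_net_address() {
    fib1_old_ifs=$IFS
    IFS=.
    # Split both dotted quads into eight positional parameters.
    set -- $1 $2
    IFS=$fib1_old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

case "${reason:-}" in
BOUND|RENEW|REBIND|REBOOT)
    if [ "${interface:-}" = "$fib1_if" ] && [ -n "${new_ip_address:-}" ] &&
       [ -n "${new_subnet_mask:-}" ]; then
        fib1_net=$(fib1_net_address "$new_ip_address" "$new_subnet_mask")
        # First router offered by the DHCP server (new_routers may be a list).
        fib1_gw=${new_routers:-}
        fib1_gw=${fib1_gw%% *}

        # Connected route for the DHCP subnet, so the gateway is on-link in this FIB.
        route -q add -net "$fib1_net" -netmask "$new_subnet_mask" \
            -iface "$interface" -fib "$fib1_num" 2>/dev/null || true

        # Default route via the 4G stick; ignore "File exists" on lease renewal.
        if [ -n "$fib1_gw" ]; then
            route -q add default "$fib1_gw" -fib "$fib1_num" 2>/dev/null || true
        fi
    fi
    ;;
esac
```

Since dhclient already runs under setfib 1 here, dhclient-script's own attempt to add the default route presumably fails with the same "Network is unreachable" error shown in the first post; the hook adds the connected route first and then retries the default route.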

PMc · Jan 28, 2024

rowan194 said:
A simple ifconfig ue0 fib 1 does not add directly connected routes

No. This command in itself does something different; it should give you a fib entry on the interface:

Code:

nlan_2u: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=28<VLAN_MTU,JUMBO_MTU>
        fib: 1

And this entry will cause all packets arriving on that interface to be automatically tagged with fib 1.

We need to consider the two directions of packet flow here. The routing table (selected by the fib tag on a packet) determines which outgoing interface will be used, whereas the fib entry on an interface determines which fib tag an incoming packet gets.

Then, running ifconfig inet some_address ... after ifconfig fib N should bring these routes into the desired fib (even though the two actually have nothing to do with each other, since they concern opposite directions).
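A sketch of the ordering described above, using the interface and address from the opening post (ue0, 192.168.0.182/24, FIB 1). As noted, not every code path behaves as expected, so treat this as something to test rather than a guaranteed recipe. With DHCP, dhclient normally sets the address itself, so the second step here only stands in for whatever configures the address.

```sh
#!/bin/sh
# Incoming direction: tag packets arriving on ue0 with FIB 1.
ifconfig ue0 fib 1

# Outgoing direction: (re)applying the address after setting the fib
# should install the connected routes into FIB 1.
ifconfig ue0 inet 192.168.0.182/24

# Check which FIB the routes actually ended up in.
netstat -rn -F 0 | grep 192.168.0
netstat -rn -F 1 | grep 192.168.0
```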
Further, there are a lot of possible code paths, like running ifconfig under setfib etc., and not all of them result in the expected behaviour (or let's say, the behaviour I would have expected).
This definitely needs a bit of testing and experimentation within the actual use case.

So, if your environment did work just fine in R.12, then I would put my bet onto the sysctl and otherwise leave it as it is.

In any case, you can also insert (or remove) the required routes explicitly with commands like these (matching my snippet from above):

Code:

route add -host 192.168.0.2 -iface lo0 -fib 1       # the interface's own address, handled via lo0
route add -net 192.168.0.0/25 -iface igb2 -fib 1    # the directly connected subnet on igb2

rowan194 said:
dhclient already runs on fib 1 (and has since before the upgrade), so I'm not sure why that also isn't picking up the routes (eg default) that the DHCP server on the 4G stick offers.

Sorry, but DHCP is something I try to stay away from, as much as possible.

rowan194 · Jan 28, 2024

PMc said:
Further, there are a lot of possible codepaths, like running ifconfig under setfib etc., and not all of them result in the expected behaviour (or let's say, the behaviour I would have expected).
This definitely needs a bit of testing and experimentation within the actual use-case.

So, if your environment did work just fine in R.12, then I would put my bet onto the sysctl and otherwise leave it as it is.

My concern is that at some point in a future release, net.add_addr_allfibs may be removed or superseded by some other functionality, and everything will break again. I guess for now it's best to stick with what works.

With regard to expected behaviour, I've done both explicit (setfib 1 ...) and implied/inherited (the entire script that uses routing-related commands runs via setfib 1). Again, something that has worked until now, but probably not the best way to go.
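The two styles can be compared with the read-only net.my_fibnum sysctl, which reports the FIB of the calling process. A minimal sketch; route(8) and netstat(8) should likewise default to the FIB of the process that runs them, which is why the inherited style works at all.

```sh
#!/bin/sh
# FIB of the current shell.
sysctl -n net.my_fibnum

# Explicit: only this one command runs in FIB 1.
setfib 1 sysctl -n net.my_fibnum

# Inherited: everything started by the child shell runs in FIB 1,
# so netstat (and route) inside it default to FIB 1 as well.
setfib 1 sh -c 'sysctl -n net.my_fibnum; netstat -rn'
```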

There is something else odd happening that is possibly related to (other?) routing changes made between 12.x and 13.x. I have an SSH connection (on fib 1) that uses remote port forwarding: a connection to localhost:port on the remote end is forwarded to IP:port on the local end. When connecting to that IP on the local side, ssh now uses the source IP of the WAN interface rather than the source IP of the LAN interface (and it's via the LAN interface that the connection goes out).

FreeBSD 12.x: ssh local connect uses 203.x.x.193 (em0 [LAN interface] IP) -> 203.x.x.20
FreeBSD 13.x: ssh local connect uses 192.168.0.182 (ue0 [WAN interface] IP) -> 203.x.x.20

192.168.0.182 is (deliberately) not an address known to any machines on the LAN, which means that it's impossible to establish a connection with 203.x.x.20

Unsure if this is a 13.x thing, or something has changed in ssh. Any ideas?

(Setting up a kludge route for 192.168.0.182 won't work, because two different routers have the same 4G stick, and thus, the same local IP for each of their respective ue0 interfaces...)
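One way to narrow this down is to ask FIB 1 which route and interface it would use for the LAN host. If em0's connected prefix is missing from FIB 1, the lookup presumably falls through to another route (e.g. the default via ue0), and the source address then comes from that interface. A small sketch; since the LAN host's address is elided in the post, the script (hypothetical, name it as you like) takes it as an argument.

```sh
#!/bin/sh
# Usage: sh fibcheck.sh <lan-host-address>
dest="${1:?usage: $0 <lan-host-address>}"

# Which route, gateway and interface does FIB 1 pick for this destination?
setfib 1 route -n get "$dest"

# Compare with the main table.
setfib 0 route -n get "$dest"

# Are any of em0's routes present in FIB 1 at all?
netstat -rn -F 1 | grep em0
```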

Luckily this is a backup router, with a LAN and WAN interface, that sets up a couple of tunnels, then sits there monitoring for loss of WAN connectivity on the other router. Not looking forward to upgrading the main router.

Cath O'Deray · Jan 28, 2024

rowan194 said:
… Is there some behaviour that changed between 12 and 13 that requires a change in config? …

sko said:
… from the 13.0-RELEASE notes: …

Mentions of fib in two sets of notes include:

sko · Jan 28, 2024

rowan194 said:
The reason I'm using a non default FIB is that the 4G stick is a backup link, and I need to maintain a consistent connection with the endpoint of a VPN tunnel, which I do by routing the IP of the endpoint via FIB #1.

TBH, this seemingly simple scenario is rather clunky to achieve on FreeBSD, because sadly it lacks route priorities.
On OpenBSD you can simply set priorities on the interface (or directly on the learned/configured route), and the default routes in the routing table are used according to their priorities. The only caveat is that the route actually has to disappear or become unreachable if the uplink goes down.

rowan194 said:
There is something else odd happening that is possibly related to (other?) routing changes made between 12.x and 13.x. I have an SSH connection (on fib 1) that uses remote port forwarding: a connection to localhost:port on the remote end is forwarded to IP:port on the local end. When connecting to that IP on the local side, ssh now uses the source IP of the WAN interface rather than the source IP of the LAN interface (and it's via the LAN interface that the connection goes out).

I also remember having some weird (wrong) source IPs and unwanted cross-talk between seemingly separated interfaces when dealing with multiple FIBs and trying to (not) route between them. Hence I now use FIBs exclusively for jails, e.g. if I need trimmed-down routing tables or ones that differ from the host's.
For actual (and more complex) routing I now exclusively use OpenBSD with routing domains, which IMHO is more predictable in behavior and easier to grasp: if an interface or service is running in an rdomain, all routes, traffic etc. from it belong *only* to that rdomain, NO exceptions. Every interaction between rdomains has to be done explicitly via PF rules.

But in your case, setting the sysctl *should* already be sufficient to restore your previously working config. If there were plans to remove that sysctl, it would have been mentioned in the commit message, so I doubt it will go anywhere, especially because the old behavior is explicitly wanted in several use cases.

rowan194 · Jan 28, 2024

sko said:
But in your case, setting the sysctl *should* already be sufficient to restore your previously working config. If there were plans to remove that sysctl, it would have been mentioned in the commit message, so I doubt it will go anywhere, especially because the old behavior is explicitly wanted in several use cases.

Turns out net.add_addr_allfibs used to work in loader.conf, but now needs to be in sysctl.conf. I was trying to figure out why all interfaces were not showing in FIB 1 like they do on the other 12.x router, but it was because the value of net.add_addr_allfibs was still the (new) default. Fixed, and ssh is now working. It's been a long day...

BTW, it looks like the original intent was to remove the old behaviour entirely, but presumably that was walked back after those use cases were mentioned:

"The goal is to make net.add_addr_allfibs=0 default behaviour and remove net.add_addr_allfibs."

net.add_addr_allfibs=1 behaviour deprecation