Questions Regarding bhyve and FreeBSD Bridging (11.2)

Hey gang -

Here's my server:
  • interfaces em0 and em1 bond together into interface lagg1
  • interface lagg1 gets added to interface bridge0 to share the network with bhyve VMs
  • the server's IP address is applied to bridge0, not lagg1
Questions and/or Challenges:
1. I have the first tap interface for each VM added to bridge0 so the VMs can access my internal network here. Each time a VM boots or gets nuked, and its tap interface is added to or removed from the bridge, the interfaces that make up the lagg flap. That causes the lagg interface to flap, which drops the server off the 'net for a few seconds. I read a back-and-forth thread on the FreeBSD Bugzilla:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221122

and it looks like there's a list of flags that can be added to the ifconfig line?

Code:
-tso -lro -toe -txcsum -txcsum6

Am I misunderstanding that? Adding them doesn't seem to make a difference. VM creation and/or deletion still causes the lagg to flap, which I just can't have happen. That's massively no bueno, and I'm hoping there's a way to keep the lagg (and bridge) up. Any suggestions?
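For reference, here is roughly how the PR's flags (with -rxcsum added as well) would be made persistent in /etc/rc.conf. This is a sketch under assumptions: the post doesn't say which laggproto lagg1 uses, so LACP is assumed, and 192.0.2.10/24 is a placeholder for the host's real address on bridge0. The idea is that the offload capabilities are already off on em0 and em1 before lagg1 joins bridge0; I haven't confirmed that rc.d applies them in that order.

```sh
# /etc/rc.conf fragment (sketch, not a confirmed fix).
# Turn off checksum/segmentation offload on the lagg members up front,
# so bridge0 has less to toggle when tap members come and go.
ifconfig_em0="up -rxcsum -txcsum -txcsum6 -tso -lro -toe"
ifconfig_em1="up -rxcsum -txcsum -txcsum6 -tso -lro -toe"

# Assumption: LACP; use whatever laggproto the switch side expects.
cloned_interfaces="lagg1 bridge0"
ifconfig_lagg1="laggproto lacp laggport em0 laggport em1 up"

# The host's address lives on bridge0, not lagg1 (placeholder address).
ifconfig_bridge0="inet 192.0.2.10/24 addm lagg1 up"
```

After a reboot, "ifconfig em0" should no longer list RXCSUM or TXCSUM under options= if the flags took effect. Later in the thread the original poster reports the capabilities still came back, so treat this as the PR workaround to try rather than a known fix.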

2. Once the VMs boot, they attempt to DHCP out. Nyet. The DHCP server (my FreeBSD router) doesn't see the requests at all, so it never replies. I can statically assign IP addresses to the interfaces and they have full network access. But the DHCP broadcasts aren't getting through. Is the bridge0 interface stomping on them? Or the lagg?

Help? :)

Thanks!
 
  1. Basically it sounds like the flags on the lagg interface should mirror what the bridge interface supports, or this can cause flapping. I'd check the flags before and after bridge creation to see which ones you need to disable. I have a similar setup at work (I think it uses em NICs, but I don't recall off the top of my head), but I'm also using vlans on top of lagg, so from the VM's PoV packets go tap <-> bridge <-> vlan <-> lagg <-> emX. That may be why I'm not experiencing the problem you are. I'll check when I'm next in or have my laptop handy to VPN in.
  2. IIRC, I use static configuration on our Windows guests, but I think DHCP should work. Do you firewall any of the interfaces? What driver are you using for the virtual Ethernet within Windows? Have you run Wireshark from within Windows or tcpdump on the FreeBSD interfaces?
 
I'd check the flags before and after bridge creation to see which ones you need to disable.

While the VM is running:
Code:
# ifconfig em0
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=2098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>

And after the VM is destroyed:
Code:
# ifconfig em0
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>

So RXCSUM and TXCSUM are being added back. What do I do here?
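A small sh sketch for doing that comparison mechanically: save "ifconfig em0" output while a VM is running and again after it's destroyed, then print which capabilities appeared or disappeared. capdiff and caps_of are made-up helper names; the parsing assumes the usual options=HEX<NAME,...> line and deliberately skips the "nd6 options=" line.

```sh
#!/bin/sh
# capdiff.sh BEFORE AFTER - compare the enabled capabilities ("options=")
# in two saved "ifconfig <if>" outputs for a single interface.

# Print the capability names from the options=HEX<...> line, one per line.
# options= must start the (indented) line, so "nd6 options=..." is ignored.
caps_of() {
    sed -n 's/^[[:space:]]*options=[0-9a-fA-F]*<\([^>]*\)>.*/\1/p' | tr ',' '\n' | sort
}

# "+NAME" = enabled only in AFTER, "-NAME" = enabled only in BEFORE.
capdiff() {
    before=$(caps_of < "$1")
    after=$(caps_of < "$2")
    for c in $after; do
        printf '%s\n' "$before" | grep -qx "$c" || echo "+$c"
    done
    for c in $before; do
        printf '%s\n' "$after" | grep -qx "$c" || echo "-$c"
    done
}

if [ "$#" -eq 2 ]; then
    capdiff "$1" "$2"
fi
```

Usage: "ifconfig em0 > running.txt", destroy the VM, "ifconfig em0 > stopped.txt", then "sh capdiff.sh running.txt stopped.txt". With the two em0 listings above it prints +RXCSUM and +TXCSUM, one per line.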

IIRC, I use static configuration on our Windows guests, but I think DHCP should work. Do you firewall any of the interfaces? What driver are you using for the virtual Ethernet within Windows? Have you run Wireshark from within Windows or tcpdump on the FreeBSD interfaces?

Windows isn't running in this case; the VM is a Linux image that's wide open, and on other hypervisors it DHCPs out just fine. On the FreeBSD hypervisor, I have these lines in my pf.conf file:

Code:
# Allow DHCP
pass in quick proto udp from port = 68 to port = 67
pass out quick proto udp from port = 67 to port = 68

And nothing is being logged in pf when the VM tries to DHCP out. So as far as I can see: pf isn't dropping anything for DHCP.
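Two things worth checking here, as I understand pf(4) and if_bridge(4). First, pf only writes to pflog0 for rules that carry the log keyword, so an empty log doesn't rule out a default block rule dropping the packets silently. Second, with net.link.bridge.pfil_member=1 (the default), pf also evaluates bridged traffic on the member interfaces in each direction, so a DISCOVER leaving via lagg1 is an outbound packet from port 68 to port 67, which the "pass out" rule above (67 to 68) doesn't match; if the ruleset blocks by default, that would drop it. The sketch below reads those sysctls and captures DHCP on both sides of the bridge at once. dhcp_debug is a made-up helper name and tap0/lagg1 are example interface names.

```sh
#!/bin/sh
# dhcp_debug TAP UPLINK - sketch for checking where VM DHCP traffic stops.
# TAP is the VM's tap interface, UPLINK the bridge's uplink (e.g. lagg1).
dhcp_debug() {
    tap=$1
    uplink=$2
    # 1 = pf also filters bridged packets on member interfaces / on the bridge.
    sysctl net.link.bridge.pfil_member net.link.bridge.pfil_bridge
    # Capture DHCP on both sides of the bridge for 30 seconds.
    tcpdump -n -i "$tap" 'udp port 67 or udp port 68' &
    tap_pid=$!
    tcpdump -n -i "$uplink" 'udp port 67 or udp port 68' &
    uplink_pid=$!
    sleep 30
    kill "$tap_pid" "$uplink_pid"
}

if [ "$#" -eq 2 ]; then
    dhcp_debug "$1" "$2"
fi
```

Run it as root while the VM boots, e.g. "sh dhcp_debug.sh tap0 lagg1". If DISCOVERs show up on the tap but not on lagg1, the drop is on the hypervisor. A direction-less rule such as "pass quick proto udp from port { 67 68 } to port { 67 68 }" may then help, or add log to the block rule and watch "tcpdump -n -e -ttt -i pflog0" to see what is being dropped.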

Thanks.
 
I still haven't found a solution to the bridge and interface bouncing. It looks like the last tap interface to get pulled from bridge0 causes the lagg interface to bounce. If there are already other tap interfaces in the bridge and one is removed, nothing happens. Hopefully a solution can be found, but I can work around it for now.
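Since nothing happens as long as at least one tap remains in bridge0, one workaround that follows from that observation is to keep a permanent, otherwise unused tap in the bridge, so removing a VM's tap never leaves lagg1 as the only member. A sketch of the /etc/rc.conf side, assuming tap99 (a hypothetical name your VM tooling won't reuse) and 192.0.2.10/24 as a stand-in for the host's real address. Expect one capability change, and probably one flap, at boot when tap99 joins.

```sh
# /etc/rc.conf fragment (sketch): a permanent, unused tap in bridge0 so the
# VM taps are never the last ones to leave. tap99 is a hypothetical name.
# Keep your existing ifconfig_em0/em1/lagg1 lines as they are.
cloned_interfaces="lagg1 tap99 bridge0"
ifconfig_tap99="up"
ifconfig_bridge0="inet 192.0.2.10/24 addm lagg1 addm tap99 up"
```

The same thing can be tried live, without a reboot, with "ifconfig tap99 create" followed by "ifconfig bridge0 addm tap99".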

Through some combination of changes to the pf filtering, I've managed to get the DHCP requests out and back in again. So I think I'm good there. The VMs are snagging their management IP and all is well.

Another bridging question: now that I have the VMs running and a series of bridge and tap interfaces created, I'm noticing that certain L2 traffic doesn't transit the bridges properly. LLDP is what caught my eye. Each VM-to-VM interconnect is a bridge interface created with the "vm switch create" command. Each bridge has 2 tap interfaces added to it, acting as virtual point-to-point network cables between the VMs. For instance:

Code:
# ifconfig vm-spn01-lf01
vm-spn01-lf01: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 4a:78:50:ca:00:e2
    nd6 options=1<PERFORMNUD>
    groups: bridge vm-switch viid-1c2d6@ 
    id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    member: tap22 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 46 priority 128 path cost 2000000
    member: tap7 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 31 priority 128 path cost 2000000

Interface tap7 is part of one VM, tap22 the other. Once the VMs boot, they can see the opposing interface via IP. I can set /31 IP addresses on each interface within the VM, and they can ping each other, create BGP sessions with each other, etc. But what they can't do is exchange LLDP info.

The bridge interface is eating the L2 traffic for some reason. I know this because I can attach tcpdump to each tap interface and I clearly see LLDP traffic egressing each one. But it never ingresses. It seems like it's going right from the VM to the tap interface to the hypervisor, which is not what I want.

Is there a sysctl call or some ifconfig line I can add to the bridges to stop this from happening?

Thanks.
 
Interesting that the RXCSUM and TXCSUM are causing issues. From our bhyve server:
Code:
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

Exemplar VLAN/VM config:
vlan254: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1<RXCSUM>
vm-data: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
        member: vlan254 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
tap1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>

Then again, I almost never shut down our bhyve guests, so maybe it's a "problem" and I just haven't seen it. I'd try disabling the options on the interfaces and see if that fixes it.
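For comparison, a sketch of how the layout in that listing (LACP lagg0 over em0/em1, vlan254 on top of it, and vlan254 rather than lagg0 as the bridge member) might be expressed in /etc/rc.conf. LACP is inferred from the ACTIVE,COLLECTING,DISTRIBUTING port flags. The vm-bhyve commands are shown as comments and assume a switch named data, which vm-bhyve appears to turn into the vm-data bridge.

```sh
# /etc/rc.conf fragment (sketch) mirroring the listing above.
cloned_interfaces="lagg0"
ifconfig_em0="up"
ifconfig_em1="up"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 up"

# Create a VLAN interface named vlan254, tag 254, with lagg0 as its parent.
vlans_lagg0="vlan254"
create_args_vlan254="vlan 254"
ifconfig_vlan254="up"

# Then attach vlan254 (not lagg0) to the VM switch with vm-bhyve:
#   vm switch create data
#   vm switch add data vlan254
```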

As for LLDP, it's not supposed to transfer past the bridge (like STP), so you shouldn't be seeing it. I'm not sure if there's any way to turn a bridge into a hub (or a more stupid switch), as I've never wanted to.
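For context: LLDP frames go to 01:80:c2:00:00:0e (ethertype 0x88cc, so "tcpdump -n -e -i tap7 ether proto 0x88cc" isolates them), which sits in the 01:80:c2:00:00:00 to 01:80:c2:00:00:0f block that IEEE 802.1D reserves for link-local protocols and that bridges are not supposed to forward; STP's 01:80:c2:00:00:00 is in the same block. Below is a tiny sh helper (hypothetical name) to check whether a destination MAC from tcpdump -e output falls in that block. I haven't verified how FreeBSD's if_bridge treats each individual address in the range.

```sh
#!/bin/sh
# is_8021d_reserved MAC - succeed if MAC is in 01:80:c2:00:00:00..0f,
# the IEEE 802.1D block (STP, LACP, LLDP, ...) that bridges don't forward.
is_8021d_reserved() {
    # Normalize to lowercase, colon-separated (accepts 01-80-C2-00-00-0E too).
    mac=$(printf '%s' "$1" | tr 'A-F-' 'a-f:')
    case "$mac" in
        01:80:c2:00:00:0[0-9a-f]) return 0 ;;
        *) return 1 ;;
    esac
}

if [ "$#" -eq 1 ]; then
    if is_8021d_reserved "$1"; then
        echo "$1: reserved, not forwarded by an 802.1D bridge"
    else
        echo "$1: ordinary address"
    fi
fi
```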
 
Interesting that the RXCSUM and TXCSUM are causing issues.

I don't think they're causing issues specifically. Just that they're added as options back to em0 and em1 when the last tap is removed from bridge0. And that addition (or subtraction when the first VM is started) causes the interface flaps. Just the fact that the options are changing is doing it, I think. I could be wrong.

As for LLDP, it's not supposed to transfer past the bridge (like STP), so you shouldn't be seeing it. I'm not sure if there's any way to turn a bridge into a hub (or a more stupid switch), as I've never wanted to.

Yeah, I'm learning that now. Not sure if there's another way to accomplish what I'm trying to do but I'm guessing I'm sorta stuck with this.
 
I don't think they're causing issues specifically. Just that they're added as options back to em0 and em1 when the last tap is removed from bridge0. And that addition (or subtraction when the first VM is started) causes the interface flaps. Just the fact that the options are changing is doing it, I think. I could be wrong.
Right, that's what I gathered from the PR. However, I've left them enabled without issue, but as I said, our VMs are running almost all the time. Also, ours are tied to vlan interfaces and not directly to lagg, so that might negate the issue as well.
 
I think a fixed MAC on bridge is wanted for reliability. Especially with your troubles with members departing. Give it a try.

My pea brain thinks of the scenario where the bridge is being rebuilt on member departure, and maybe on reconstitution it lags while working out the MAC details. Provide one and the issue disappears. It's pretty easy to check, and I see no problem with using the existing MAC it is already using. Providing a MAC takes one piece of the puzzle out of the equation.

Here is a sample for /etc/rc.conf:
Code:
ifconfig_bridge0="ether 26:3d:2b:f1:73:7a addm igb0 addm tap0 up"
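If you'd rather not reuse the bridge's current MAC, a locally administered unicast address (first octet with the 0x02 bit set and the 0x01 multicast bit clear, as 26 in the sample is) won't clash with vendor-assigned addresses. A sh sketch; random_laa_mac is a made-up name and it reads six random bytes from /dev/urandom via od.

```sh
#!/bin/sh
# random_laa_mac - print a random locally administered, unicast MAC address.
random_laa_mac() {
    # Six random bytes as two-digit hex values, e.g. "3a 1f 07 c2 9b 10".
    set -- $(od -An -N6 -tx1 /dev/urandom)
    # Set the locally administered bit (0x02), clear the multicast bit (0x01).
    first=$(( (0x$1 | 0x02) & 0xfe ))
    printf '%02x:%s:%s:%s:%s:%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

random_laa_mac
```

Paste the result into the ether argument of the sample line above.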
 
I think a fixed MAC on bridge is wanted for reliability. Especially with your troubles with members departing. Give it a try.

Thanks for the suggestion, but I'm pretty sure the MAC address isn't the issue. The issue is centered around the interface options changing on em0 and em1. I'm not sure there's a way to prevent that from happening, which is unfortunate.
 
I'm not sure there's a way to prevent that from happening, which is unfortunate.
Why not try the workaround recommended in the PR? It's not a "fix" but it seems to completely work around the issue, and should prevent flapping.
 
Why not try the workaround recommended in the PR? It's not a "fix" but it seems to completely work around the issue, and should prevent flapping.

I guess I'm not following the PR properly then? I thought the ifconfig em0 -tso -lro -toe -txcsum -txcsum6 was what was being suggested?
 
Those were the options for that person's specific interface. You'll also need -rxcsum, and if you're already using -txcsum, that shouldn't be appearing in the ifconfig output (though strangely it is).
 
Those were the options for that person's specific interface. You'll also need -rxcsum, and if you're already using -txcsum, that shouldn't be appearing in the ifconfig output (though strangely it is).

Right. The suggestions in the PR aren't working. I tried -rxcsum and -txcsum and it doesn't matter. The options keep getting put back in the em0 and em1 interfaces.
 
I just wonder if when a member leaves the bridge you are not experiencing this:

https://www.freebsd.org/cgi/man.cgi?ifconfig(8)
The following parameters are specific to bridge interfaces:

deletem interface
Remove the interface named by interface from the bridge. Promiscuous mode is disabled on the interface when it is removed from the bridge.

Maybe when promiscuous mode is removed, it adds the capabilities back regardless of your /etc/rc.conf settings.
 
I just wonder if when a member leaves the bridge you are not experiencing this

In my case, the interface that stays in the bridge is the one that flaps. The last remaining interface in bridge0 when all VMs are shut down is lagg1, which is made up of em0 and em1. When the last VM is removed (shut down) lagg1 flaps but remains in the bridge when it comes back up with the options set.
 
What about MTUs, then? The MTU of the bridge is set by the first member. Is the last member also the first?

The MTU of the bridge is 1500 and stays 1500. I'm pretty sure this isn't an MTU thing. It's the fact that rxcsum and txcsum get added back onto em0 and em1 when the last tap interface is removed from the bridge. Why that's happening is the question.
 
To debug, set this in /etc/sysctl.conf:
Code:
dev.em.0.debug=1
Then reboot and check /var/log/messages.

This is a server that I need running constantly. It was up for almost 600 days before I finally rebooted it to upgrade to 11.2. So, I'm going to pass on this suggestion for debugging, but thanks. :)
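A lower-impact alternative that needs no reboot: sysctl.conf entries are only read at boot, but a writable node can usually be set on the running system with sysctl(8), e.g. "sysctl dev.em.0.debug=1" (check what the node does first with "sysctl -d dev.em.0.debug"). Separately, a small sh loop can log every change to em0's enabled options to syslog, to line up against VM starts and stops and the kernel's own "link state changed" messages in /var/log/messages. watch_caps is a made-up helper; it just polls ifconfig and calls logger(1).

```sh
#!/bin/sh
# watch_caps IFNAME [INTERVAL] [COUNT] - poll "ifconfig IFNAME" and send a
# syslog message (tag "capwatch") whenever its options=HEX<...> line changes.
# COUNT=0 (the default) polls until interrupted.
watch_caps() {
    ifname=$1
    interval=${2:-1}
    count=${3:-0}
    prev=""
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        # Only the interface's own options= line, not "nd6 options=".
        cur=$(ifconfig "$ifname" | sed -n 's/^[[:space:]]*options=\([0-9a-fA-F]*<[^>]*>\).*/\1/p')
        if [ "$cur" != "$prev" ]; then
            logger -t capwatch "$ifname options changed: ${prev:-none} -> $cur"
            prev=$cur
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}

if [ "$#" -ge 1 ]; then
    watch_caps "$@"
fi
```

For example, start "sh watch_caps.sh em0 &" and "tail -f /var/log/messages", then start and stop a VM; the first message only records the starting state.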
 