Solved FreeBSD 11.1 - Bridge IP numbered interface not working at all

Hi all,

I am running 11.1-RELEASE-p1 on a server with 4 'oce' interfaces - oce0, oce1, oce2, oce3.
For that setup only oce3 is connected and used - oce2 will come later.
No firewall is enabled.
When I bond interfaces, I change the switch configuration to match. Bottom line: the problem does not seem to be related to lagg but rather to bridge.
I started troubleshooting with Scenario D, which is the target configuration, and then tried to isolate the problem by validating the other scenarios below.

Scenario A:
oce3 only + IP address on oce3 - connectivity: OK

Scenario B:
[oce3]-lagg0 + IP address on lagg0 - connectivity: OK

Scenario C:
oce3 only in a bridge0 + IP address on bridge0 - connectivity: NOK (the upstream router sees the server's ARP, and FreeBSD shows the router's MAC address in the bridge MAC address table, but FreeBSD never gets an ARP entry for the router. Ping fails.)

Scenario D:
[oce3]-lagg0 in a bridge0 + IP address on bridge0 - connectivity: NOK (same symptoms as Scenario C: the upstream router sees the server's ARP, and FreeBSD shows the router's MAC address in the bridge MAC address table, but FreeBSD never gets an ARP entry for the router. Ping fails.)

So far I have tried the following to identify what is wrong in scenarios C & D:
- ran tcpdump on bridge0 and oce3: it shows nothing at all related to a ping issued from either the router or server_bridge0_ip
- disabled TSO, LRO etc. on the oce3 interface -> no change
- changed all the net.link.bridge.* sysctls to 0 to prevent any unexpected filtering
- enabled IP forwarding, although it should not make any difference: my first ping is towards the router in the same connected network, so no routing is involved.
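
For anyone repeating these checks, here is a rough sketch of the steps above as a small sh function. The function names (run, bridge_diag) and the DRY_RUN switch are my own invention for illustration, the interface names default to the ones in this post, and only the pfil-related bridge sysctls are shown rather than every net.link.bridge.* knob. Run it as root on the FreeBSD host.

```sh
# Sketch only. "run" and "bridge_diag" are hypothetical helper names.
# With DRY_RUN set, each command is printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bridge_diag() {
    nic=${1:-oce3}      # physical member interface
    br=${2:-bridge0}    # bridge carrying the IP address

    # Turn off TSO and LRO on the member interface.
    run ifconfig "$nic" -tso -lro
    # Disable packet filtering hooks on the bridge and its members.
    run sysctl net.link.bridge.pfil_member=0 net.link.bridge.pfil_bridge=0 net.link.bridge.pfil_onlyip=0
    # Show the MAC addresses the bridge has learned.
    run ifconfig "$br" addr
    # Show the host ARP table for comparison with the bridge table.
    run arp -an
    # Capture a few ARP/ICMP frames, with link-level headers, on the member.
    run tcpdump -c 20 -eni "$nic" arp or icmp
}
```

Calling `DRY_RUN=1 bridge_diag oce3 bridge0` only prints the commands, which is handy for reviewing them before running anything on a remote box.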

I also tried an Ubuntu live CD, on which I installed the bridge-utils package, and Scenario C worked there. I did not go on to Scenario D, but it looks to me like if_bridge has a bit of an issue.

What else should I consider to troubleshoot this issue?
 
Bridge works fine:
Code:
root@molly:~ # vm switch info public
------------------------
Virtual Switch: public
------------------------
  type: auto
  ident: bridge0
  vlan: -
  nat: -
  physical-ports: em0
  bytes-in: 272968945 (260.323M)
  bytes-out: 20267354236 (18.875G)

  virtual-port
    device: tap0
    vm: wintermute

root@molly:~ # ifconfig bridge0
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 6000
        description: vm-public
        ether 02:91:b0:69:06:00
        nd6 options=1<PERFORMNUD>
        groups: bridge
        id 00:00:00:00:00:00 priority 0 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 0 ifcost 0 port 0
        member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 5 priority 128 path cost 2000000
        member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 2 priority 128 path cost 20000

How exactly are you configuring the interfaces? And why the need for both lagg(4) and bridge(4)?
 
Thank you for the reply.
What I meant by "not working" is that I followed the recommendation in the documentation to number the bridge interface with an IP instead of the em0 interface (to use your example).
Last point of Section 30.6.1 in https://www.freebsd.org/doc/handbook/network-bridging.html

When it comes to the need for lagg and bridge, I have a server attached with dual 1G links to a piece of network equipment. On that server, I then want to run bhyve VMs, hence the need for the bridge.
When I got the whole thing set up, the VM did not get connectivity through the bridge despite showing ARP resolution, which puzzled me. From there, I stripped the setup down to the most fundamental case, which is:
[ network equipment-single routed interface]-----------[oce3---bridge0(numbered with IP)-SERVER]. That setup should let the SERVER ping the IP address of the network equipment's routed interface, since this is a connected route. I see the MAC address of the network equipment when I look at ifconfig bridge0 addr, which really puzzled me.

So, to clarify my question based on your working setup: do you have the em0 interface numbered with an IP since the bridge0 is unnumbered?

UPDATED the title to reflect the fundamental issue I am trying to address here, which is bridge0 IP connectivity.
 
When it comes to the need for lagg and bridge, I have a server attached with dual 1G links to a piece of network equipment. On that server, I then want to run bhyve VMs, hence the need for the bridge.
Ok, that would mean it's oce2+oce3 (for example) for lagg0. The bridge should then be bound to the lagg0 interface. The lagg0 interface will have the host's IP address configured. The bridge shouldn't have an IP address.
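
As a rough /etc/rc.conf sketch of that layout: the LACP protocol choice and the 192.0.2.10/24 address are placeholders I made up for illustration, since neither is given in this thread, so adjust both to your network and switch configuration.

```sh
# /etc/rc.conf fragment (sketch). Assumed values: laggproto lacp and
# the documentation address 192.0.2.10/24.
cloned_interfaces="lagg0 bridge0"
ifconfig_oce2="up"
ifconfig_oce3="up"
# The host's IP address lives on the lagg interface.
ifconfig_lagg0="laggproto lacp laggport oce2 laggport oce3 192.0.2.10/24"
# The bridge only has lagg0 as a member and carries no IP address.
ifconfig_bridge0="addm lagg0 up"
```

The tap interfaces of the VMs would then be added to bridge0 as further members.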

So, to clarify my question based on your working setup: do you have the em0 interface numbered with an IP since the bridge0 is unnumbered?
In my case, yes. The em0 interface is the host's interface. The bridge is for vm(8) virtuals. So I'm bridging my virtuals to the same network as the host.
 
Technically the bridge interface is no different from a bridge on other operating systems. It should be possible to number it with an IP and get it to ping.
Would you mind adding an IP on your bridge (using a free IP address in the same subnet as your em0) and tell me if that works?

Thanks.
 
Testing on another setup running 11.0-RELEASE, I can confirm there is something here that needs explaining:

Physical interface
Code:
bce1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=800b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE>
    ether 00:0a:f7:00:f6:a6
    inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active

Bridge interface numbered
Code:
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 02:50:31:0f:7d:00
    inet 192.168.0.248 netmask 0xffffff00 broadcast 192.168.0.255
    nd6 options=9<PERFORMNUD,IFDISABLED>
    groups: bridge
    id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    member: bce1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
           ifmaxaddr 0 port 2 priority 128 path cost 20000


From a remote host, ping towards that bridge interface succeeds for a few seconds and then stops working.
At first the ARP entry is there:
Code:
? (192.168.0.248) at 2:50:31:f:7d:0 on en0 ifscope [ethernet]

Ping works right after applying ifconfig bridge0 192.168.0.248/24, but only for a fixed amount of time:
Code:
ping 192.168.0.248
PING 192.168.0.248 (192.168.0.248): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
64 bytes from 192.168.0.248: icmp_seq=2 ttl=64 time=92.922 ms
64 bytes from 192.168.0.248: icmp_seq=3 ttl=64 time=0.363 ms
64 bytes from 192.168.0.248: icmp_seq=4 ttl=64 time=0.353 ms
64 bytes from 192.168.0.248: icmp_seq=5 ttl=64 time=0.358 ms
64 bytes from 192.168.0.248: icmp_seq=6 ttl=64 time=0.358 ms
64 bytes from 192.168.0.248: icmp_seq=7 ttl=64 time=0.294 ms
64 bytes from 192.168.0.248: icmp_seq=8 ttl=64 time=0.255 ms
64 bytes from 192.168.0.248: icmp_seq=9 ttl=64 time=0.352 ms
64 bytes from 192.168.0.248: icmp_seq=10 ttl=64 time=0.363 ms
64 bytes from 192.168.0.248: icmp_seq=11 ttl=64 time=0.383 ms
64 bytes from 192.168.0.248: icmp_seq=12 ttl=64 time=0.434 ms
64 bytes from 192.168.0.248: icmp_seq=13 ttl=64 time=0.356 ms
64 bytes from 192.168.0.248: icmp_seq=14 ttl=64 time=0.350 ms
64 bytes from 192.168.0.248: icmp_seq=15 ttl=64 time=0.341 ms
64 bytes from 192.168.0.248: icmp_seq=16 ttl=64 time=0.325 ms
64 bytes from 192.168.0.248: icmp_seq=17 ttl=64 time=0.306 ms
64 bytes from 192.168.0.248: icmp_seq=18 ttl=64 time=0.323 ms
64 bytes from 192.168.0.248: icmp_seq=19 ttl=64 time=0.365 ms
64 bytes from 192.168.0.248: icmp_seq=20 ttl=64 time=0.415 ms
64 bytes from 192.168.0.248: icmp_seq=21 ttl=64 time=0.357 ms
64 bytes from 192.168.0.248: icmp_seq=22 ttl=64 time=0.427 ms
64 bytes from 192.168.0.248: icmp_seq=23 ttl=64 time=0.367 ms
64 bytes from 192.168.0.248: icmp_seq=24 ttl=64 time=0.303 ms
64 bytes from 192.168.0.248: icmp_seq=25 ttl=64 time=0.356 ms
64 bytes from 192.168.0.248: icmp_seq=26 ttl=64 time=0.411 ms
64 bytes from 192.168.0.248: icmp_seq=27 ttl=64 time=0.404 ms
64 bytes from 192.168.0.248: icmp_seq=28 ttl=64 time=0.372 ms
64 bytes from 192.168.0.248: icmp_seq=29 ttl=64 time=0.460 ms
64 bytes from 192.168.0.248: icmp_seq=30 ttl=64 time=0.434 ms
Request timeout for icmp_seq 31
Request timeout for icmp_seq 32
Request timeout for icmp_seq 33
Request timeout for icmp_seq 34
Request timeout for icmp_seq 35
Request timeout for icmp_seq 36
Request timeout for icmp_seq 37

After that, the ARP entry on macOS is gone:
Code:
? (192.168.0.248) at (incomplete) on en0 ifscope [ethernet]

When I try from pfSense on the same LAN, the ping keeps going, which I guess is because the ARP entry expires much sooner on macOS than on FreeBSD.

ARP on pfSense on the same LAN:
Code:
? (192.168.0.248) at 02:50:31:0f:7d:00 on igb1 expires in 1065 seconds [ethernet]

After clearing the ARP cache on pfSense using arp -d -a, and without touching anything on the server, the ping fails.

It looks like ARP resolution only happens right after the bridge interface is configured, and no further requests seem to get through. Is the bridge not replying to ARP requests?
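
To watch for the moment the entry goes stale, a small filter over the arp -an output can help. This is only a sketch: arp_state is a hypothetical helper name of mine, and it assumes the line layout seen in the arp output quoted in this thread (address in parentheses in field 2, MAC or "(incomplete)" in field 4).

```sh
# Sketch: report the ARP state for one IP address.
# Reads "arp -an" style output on stdin and prints the MAC address,
# "incomplete", or "missing" when no entry exists for that IP.
arp_state() {
    awk -v ip="($1)" '
        $2 == ip {
            found = 1
            if ($4 == "(incomplete)")
                print "incomplete"
            else
                print $4
        }
        END { if (!found) print "missing" }
    '
}

# Example invocation on the client:
#   arp -an | arp_state 192.168.0.248
```

Running it in a loop next to the ping makes it easy to see whether the replies stop at the same time the entry turns incomplete.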

The setup is simple:

Code:
Server ----1Gb----Switch----1Gb----MacOS
                     |
                     | 1Gb
                     |
                  pfsense

Any idea?
 
I just tried on yet another system, this one equipped with bge interfaces. I configured something very simple on 11.1-RELEASE:
- bge with an IP address
- bridge0 created with an IP address
and ping works from and to the bridge interface.

I am not sure how the NIC driver would be involved, but it would be a differentiator. If you wouldn't mind, could you run the same check on a system using the 'em' driver? That would help me isolate the issue.

Thanks in advance.
 
Would you mind adding an IP on your bridge (using a free IP address in the same subnet as your em0) and tell me if that works?
Code:
root@molly:~ # ifconfig em0
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 6000
        options=42098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
        ether 68:05:ca:22:91:9c
        hwaddr 68:05:ca:22:91:9c
        inet 192.168.10.190 netmask 0xffffff00 broadcast 192.168.10.255
        inet 192.168.10.202 netmask 0xffffffff broadcast 192.168.10.202
        inet6 fe80::6a05:caff:fe22:919c%em0 prefixlen 64 scopeid 0x2
        inet6 2001:470:1f15:bcd::190 prefixlen 64
        inet6 2001:470:1f15:bcd::202 prefixlen 128
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
root@molly:~ #
root@molly:~ # ifconfig bridge0 inet 192.168.10.250 netmask 255.255.255.0
root@molly:~ # ping 192.168.10.250
PING 192.168.10.250 (192.168.10.250): 56 data bytes
64 bytes from 192.168.10.250: icmp_seq=0 ttl=64 time=0.044 ms
64 bytes from 192.168.10.250: icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from 192.168.10.250: icmp_seq=2 ttl=64 time=0.099 ms
64 bytes from 192.168.10.250: icmp_seq=3 ttl=64 time=0.044 ms
64 bytes from 192.168.10.250: icmp_seq=4 ttl=64 time=0.026 ms
64 bytes from 192.168.10.250: icmp_seq=5 ttl=64 time=0.043 ms
64 bytes from 192.168.10.250: icmp_seq=6 ttl=64 time=0.044 ms
^C
--- 192.168.10.250 ping statistics ---
7 packets transmitted, 7 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.018/0.045/0.099/0.024 ms
 
Thank you! On my system, I have to set the following for the bridge to become pingable:
net.link.bridge.inherit_mac: 1

That being done, there is no connectivity from the bhyve VM over the tap interface so for now I will consider my primary issue solved.
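
To keep the setting across reboots, it can go into /etc/sysctl.conf. A minimal fragment follows. As far as I understand it, this sysctl makes the bridge take the MAC address of its first member, and it is consulted when that first member is added, so the bridge may need to be recreated for the change to take effect. Treat that last part as my assumption rather than something confirmed in this thread.

```
# /etc/sysctl.conf
net.link.bridge.inherit_mac=1
```

At runtime the same value can be applied with sysctl net.link.bridge.inherit_mac=1.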
 