Lagg failover is not working (at all)

nc3b · Aug 6, 2010

Hello. I am using FreeBSD 8.0-STABLE in VirtualBox.
I have two interfaces, em1 and em2, and I use the following config:

Code:

cloned_interfaces="lagg0"
ifconfig_em1="up"
ifconfig_em2="up"

ifconfig_lagg0="laggproto failover laggport em1 laggport em2 192.168.1.99 netmask 255.255.255.0"

lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 08:00:27:78:54:9e
        inet 192.168.1.99 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect
        status: active
        laggproto failover
        laggport: em2 flags=0<>
        laggport: em1 flags=5<MASTER,ACTIVE>

When I ping 192.168.1.1, everything goes fine. I move to another console and take down the em1 interface:

Code:

ifconfig em1 down
ifconfig lagg0
...same...
        laggport: em2 flags=4<ACTIVE>
        laggport: em1 flags=1<MASTER>

Ping 192.168.1.1 doesn't work anymore. I waited for some 5 minutes and it wouldn't budge. Then I brought em1 up. Hello sunshine! It was working. Brought it down again and it stopped working. What's more, I brought em0 up with some IP and it STILL wouldn't ping 192.168.1.1.

Code:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 08:00:27:37:ca:72
        inet 192.168.1.121 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

I ssh into 192.168.1.1 and ping 192.168.1.121 works!

So I start thinking FreeBSD gave up on me with an I-won't-ping-your-stupid-IP attitude. On 192.168.1.1, tcpdump shows the pings arriving. Back on FreeBSD it shows the same (it actually shows outgoing packets and orderly replies). But ping doesn't show ANYTHING, 100% packet loss, woohoo. So I decide to really look into tcpdump.

I see ICMP requests originating from 08:00:27:78:54:9e to the MAC on 192.168.1.1. And I see ICMP replies originating on 192.168.1.1 to the MAC 08:00:27:78:54:9e. Then I see it's the MAC of em1 and lagg0.

So, although I took the interface down, it still sends packets with the same source MAC. And so it doesn't work, in a very creative way: I can ping from 192.168.1.1 but not the other way around.

Mind you, the same happens when, instead of taking down the interface, I unplug the cable. This way I can't see how failover works. How can I solve this problem?
Any help is appreciated.

loop · Aug 9, 2010

I think that you need to have the MAC address common between the two em(4) interfaces. What you have done is set up an aggregated link (802.3ad).

Try this:

Code:

cloned_interfaces="lagg0"
ifconfig_em1="up"
ifconfig_em2="ether 08:00:27:78:54:9e up"
ifconfig_lagg0="laggproto failover laggport em1 laggport em2 192.168.1.99 netmask 255.255.255.0"

nc3b · Aug 9, 2010

Thank you, loop. I saw in the handbook that specifying the MAC is required for the wired+wireless teaming. I have now added the MAC as you said:

Code:

ifconfig_em2="ether 08:00:27:78:54:9e up"

When I take down em1 ping still doesn't work, but for some reason, if tcpdump is running in another console ping works. If I stop tcpdump, ping stops working :q (I tested a few times).

DutchDaemon · Aug 9, 2010

Try tcpdump -p. Maybe the promiscuous mode screws up here.

nc3b · Aug 9, 2010

DutchDaemon said:
Try tcpdump -p. Maybe the promiscuous mode screws up here.

Thanks DutchDaemon, indeed after disabling promiscuous mode ping stops acting that way. Any ideas how I can get lagg failover to work?

IKC · Aug 11, 2010

try this config

rc.conf

Code:

cloned_interfaces="lagg0"
ifconfig_em0="up"
ifconfig_em1="up"
ifconfig_lagg0="laggproto failover laggport em0 laggport em1 "
ipv4_addrs_lagg0="192.168.1.50/24"

It works on a VMware virtual FreeBSD box.

nc3b · Aug 12, 2010

@IKC

I reluctantly tried your config, and it still doesn't work. Thank you for your answer, I did not know about the ipv4_addrs parameter.

Cheers.

rfranzke · Sep 14, 2010

This may be more of a switching/network problem than a LAGG issue. I am having a similar issue using LAGG myself. What I have observed is that LAGG does not seem to have any mechanism to tell the network that a particular MAC address has moved. Since LAGG uses the same MAC address for the LAGG interface, when the MAC moves somewhere else on the network (a different port), the switches still seem to think that MAC is on the old port, which it no longer is, as that link is down. So you ping the host and get nothing, because the ping traffic is getting sent to the wrong port on the switches at layer 2. It seems like LAGG is not working, but if you look into things it actually is working as you would expect, just not correctly, if that makes any sense.
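
The stale-entry behaviour described above can be sketched as a toy model in shell. This is only an illustration under simplifying assumptions: a single switch tracking one MAC, hypothetical port names `p1` and `p2`, and a table entry that stays in place until the host transmits again. All function names are made up for the example; nothing here touches a real switch or interface.

```shell
#!/bin/sh
# Toy model: one switch remembering on which port it last saw the lagg MAC.
# p1/p2 and all function names are hypothetical, chosen for illustration.

cam_port=""     # port where the switch last saw the lagg MAC as a source address
host_port="p1"  # switch port behind the host's currently active lagg member

# The host transmits a frame: the switch (re)learns the MAC on the ingress port.
host_sends() {
    cam_port="$host_port"
}

# lagg fails over: the host now sits behind another switch port,
# but nothing is transmitted, so the switch learns nothing.
host_fails_over() {
    host_port="$1"
}

# A frame addressed to the lagg MAC: the switch forwards it to the port it remembers.
frame_to_host() {
    if [ "$cam_port" = "$host_port" ]; then
        echo "delivered via $cam_port"
    else
        echo "lost: forwarded to $cam_port, host is on $host_port"
    fi
}

host_sends;          frame_to_host   # delivered via p1
host_fails_over p2;  frame_to_host   # lost: forwarded to p1, host is on p2
host_sends;          frame_to_host   # delivered via p2
```

The last line is the point of the model: one outbound frame after the failover is enough for the switch to relearn the MAC on the new port.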

You might check the CAM tables on your switches to see if the MAC addresses move around. On Cisco switches, if a port is shut down, the CAM table entry associated with that port is removed immediately. If nothing says "hey, MAC X is over here", then the switches use what they know, which is now wrong. You could check the CAM entries in the switches and see what port the switch thinks your LAGG MAC is on, then down the port, and check the CAM again. On my stuff the CAM entries totally disappear. The host becomes available after the CAM timers expire (5 minutes default on Cisco) or if I ping out to the network and the switches relearn the host MAC addresses as being on the new port. If you are pinging out from the host to the network, then the CAM tables should be repopulated from the traffic, so your issue may be a different one.

There should be some sort of process which informs the layer 2 network that LAGG is moving the layer 2 addresses around. Otherwise the switches just use what's cached and the host looks dead. I think some of the Broadcom Windows drivers issue a gratuitous ARP to do this.

BTW here is my rc.conf LAGG config which works great aside from the MAC issue:

Code:

cloned_interfaces="lagg0"
ifconfig_bce0="up"
ifconfig_bce1="up"
ifconfig_lagg0="laggproto failover laggport bce0 laggport bce1 x.x.x.x/24"
defaultrouter="x.x.x.x"

This is on 8.1 Release. Good luck. Again this could be totally unrelated to your issue but thought I would include it anyway.

loop · Sep 14, 2010

The gratuitous ARP is not a bad idea. It might be good to put it to reyk@openbsd.org, since he wrote the driver initially.

rfranzke · Sep 16, 2010

Seems like folks may be looking into this G-Arp idea already. See Item 6:

http://wiki.freebsd.org/EdMaste/ToDo

pv2b · Apr 6, 2011

I'm seeing the exact same behaviour on my FreeBSD 8.2 machine, with lagg-failover with network cards using the em driver.

Code:

em0: <Intel(R) PRO/1000 Network Connection 7.1.9> port 0x9c00-0x9c1f mem 0xfb8e0000-0xfb8fffff,0xfb8dc000-0xfb8dffff irq 16 at device 0.0 on pci3

When I take down one of the ports the NIC is plugged into by disabling the port in the switch, LAGG properly switches over, but no packet is sent to the switch informing it to update its CAM. That only happens once the CAM entry expires (I imagine) or when I manually send out a packet, with ping for example.
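
Until the driver announces the move itself, the manual ping mentioned above could be automated. The following is a minimal sketch, not a tested fix: it assumes the `laggport:` lines look like the `ifconfig` output quoted earlier in this thread, and `LAGG_IF`, `GATEWAY`, `POLL_SECS` and every function name are illustrative choices rather than anything FreeBSD provides.

```shell
#!/bin/sh
# Workaround sketch: send one packet whenever the active lagg port changes,
# so the switch relearns the lagg MAC on the new port.
# LAGG_IF, GATEWAY, POLL_SECS and all function names are illustrative.

LAGG_IF="${LAGG_IF:-lagg0}"
GATEWAY="${GATEWAY:-192.168.1.1}"
POLL_SECS="${POLL_SECS:-1}"

# Read `ifconfig lagg0` output on stdin and print the port flagged ACTIVE.
active_laggport() {
    awk '$1 == "laggport:" && $3 ~ /ACTIVE/ { print $2; exit }'
}

# Any outbound frame will do; a single ping is the simplest one.
send_nudge() {
    ping -c 1 "$GATEWAY" >/dev/null 2>&1 || true
}

# Nudge only when there is an active port ($2) and it differs from the previous one ($1).
nudge_if_changed() {
    if [ -n "$2" ] && [ "$2" != "$1" ]; then
        send_nudge
    fi
}

# Poll forever; call this function to start watching.
watch_lagg() {
    prev=$(ifconfig "$LAGG_IF" | active_laggport)
    while :; do
        sleep "$POLL_SECS"
        cur=$(ifconfig "$LAGG_IF" | active_laggport)
        nudge_if_changed "$prev" "$cur"
        prev="$cur"
    done
}
```

Nothing runs until `watch_lagg` is called, for example at the end of the script. Polling leaves a window of up to `POLL_SECS` seconds in which inbound traffic can still be lost, so this only narrows the gap that a gratuitous ARP from the driver would close.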

pv2b · Apr 6, 2011

I have filed a FreeBSD Problem Report regarding this issue. Hopefully that'll actually bring this to the attention of somebody capable of fixing it.

shitson · Jan 16, 2012

I can confirm that I was having the same problem. It's not a problem with the lagg driver per se, but with the switches' and hosts' ability to update their ARP tables. If you have traffic in both directions you will force another ARP request and this will fix the problem, but if your failover host has no outgoing traffic you will not force a switch/host refresh. Try pinging in both directions during your testing and I guarantee it will work. I also agree a gratuitous ARP would fix this problem!

shitson · Mar 2, 2012

I can also confirm that this is on a to-do list of things to fix:

Code:

lagg failover discarding input -> Tushar

    disconnect master link
    switch's mac table assigns mac to backup link
    reconnect master link
    switch continues to send frames to backup link
    lagg discards the frames
    easy hack: allow accept on any port via sysctl setting
    better fix: allow lagg to send gratuitous ARP when failover active port changes
    need link state notification input to lagg to do the latter

http://wiki.freebsd.org/EdMaste/ToDo
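
For the "easy hack" item, later FreeBSD releases document a sysctl named `net.link.lagg.failover_rx_all` in lagg(4), which lets failover mode accept input on any port rather than only the active one. Whether it is available on the 8.x systems discussed in this thread is not established here, so treat the following as a sketch to try, not a known fix; the first command simply fails if the running kernel lacks the variable.

```shell
# Assumption: the running kernel provides this sysctl (older releases may not).
sysctl net.link.lagg.failover_rx_all      # show the current value; errors out if unsupported
sysctl net.link.lagg.failover_rx_all=1    # accept frames on any lagg port, not only the active one

# To keep the setting across reboots, the same assignment can go in /etc/sysctl.conf:
#   net.link.lagg.failover_rx_all=1
```

This only addresses the discarded-input case from the list; it does not make the switch relearn the MAC any sooner.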

shitson · Mar 10, 2012

After speaking with the guys who maintain this driver, here is a patch which needs to be tested:

http://lists.freebsd.org/pipermail/freebsd-net/2012-February/031328.html

I will be testing it on my hardware ASAP.

rfranzke · Mar 10, 2012

Great news. I'll install on my hardware as well and post results.
