CARP + LAGG = fail

Hey guys,

I found a few posts about CARP and LAGG, everyone seems to be having issues and I haven't found any good working examples. So, I thought I'd start a new thread.

Anyway, I have a HAST/CARP/iSCSI SAN setup and it works flawlessly. Now I've just stuck in a couple of new NICs and want to get some more performance out of my iSCSI, as well as some port failover.

I'm not using rc.conf to get things going; instead, I'm scripting it, because the cas(4) driver will not load properly at boot time. I have to load it after boot via a startup script, so I have to wait until after it loads to create my lagg device.
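For reference, the startup script is roughly along these lines (a sketch of what I'm doing; the module name if_cas and the sleep are from my setup, your mileage may vary):

```shell
#!/bin/sh
# Load the cas(4) driver after boot, since it won't load during boot.
kldload if_cas

# Give the interfaces a moment to attach before building the lagg.
sleep 2

# Create the lagg device and aggregate the cas ports with LACP.
ifconfig lagg0 create
ifconfig lagg0 up laggproto lacp laggport cas0 laggport cas1
ifconfig lagg0 inet 10.1.101.2 netmask 255.255.0.0
```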

Here's my current configuration:
Code:
nas1# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:30:48:c3:42:5b
        inet 10.1.101.1 netmask 0xffff0000 broadcast 10.1.255.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

nas1# ifconfig cas0
cas0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        ether 00:03:ba:95:c2:0a
        media: Ethernet autoselect (1000baseT <full-duplex,master>)
        status: active

nas1# ifconfig cas1
cas1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        ether 00:03:ba:95:c2:0a
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

nas1# ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        ether 00:03:ba:95:c2:0a
        inet 10.1.101.2 netmask 0xffff0000 broadcast 10.1.255.255
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: cas1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: cas0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

nas1# ifconfig carp1
carp1: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 10.1.101.3 netmask 0xffff0000
        carp: MASTER vhid 1 advbase 1 advskew 0

So, I have 3 addresses on the 10.1/16 network: 101.1 (em0), 101.2 (lagg0), and 101.3 (carp1). All are plugged into the same switch, and I can ping each successfully.

Link failover appears to work fine with lagg0: if I just pull the plug on one of its ports, it keeps going, showing one port down and one up. With both down, it stops completely.

But here's the rub: when I bring em0 down with [CMD=]ifconfig em0 down[/CMD], I can no longer ping carp1, nor can I ping lagg0. I can't get out from the system either; networking is 100% down. In fact, I can't even just bring em0 back up, I have to reboot the system to get it working right again. Frustrating.

I really don't know what the relationship might be. At first I thought the carp1 interface might be binding to em0, but then why would lagg0 also stop working? Oh, and it does the same even if there is no CARP device configured.

Any words of wisdom would be greatly appreciated =)

UPDATE:
I just noticed that my routing table isn't showing the cas(4) interfaces. This is what it looks like:
Code:
[tim@nas1 ~]$ netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            10.1.1.254         UGS         0       30    em0
10.1.0.0/16        link#1             U           0        0    em0
10.1.101.1         link#1             UHS         0        0    lo0
10.10.10.0/24      link#2             U           0        0    em1
10.10.10.1         link#2             UHS         0        0    lo0
127.0.0.1          link#8             UH          0        0    lo0

So, if em0 is the only link on the 10.1/16 network, then it makes sense that everything goes down when it does. But since I can't load my cas(4) driver at boot, how do I update the routing table after the link comes up?
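I was thinking of something along these lines at the end of my startup script, though I haven't confirmed this is the right approach (just a guess on my part):

```shell
# Hypothetical fix-up: after lagg0 is up and addressed, re-point the
# connected route for the 10.1/16 network at lagg0 instead of em0.
route delete -net 10.1.0.0/16
route add -net 10.1.0.0/16 -iface lagg0
```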
 
You can't have 2 interfaces on the same network and expect it to work the way you have it setup. Is there a reason that em0 isn't part of lagg0?
 
gordon@ said:
You can't have 2 interfaces on the same network and expect it to work the way you have it setup. Is there a reason that em0 isn't part of lagg0?

At the most basic level, having 3 interfaces with 3 distinct MAC and IP addys allows MPIO, as well as access to the system should any of the 3 fail, even without LAGG or CARP.

If you're right, and I can't add the others to the routing table, meaning if one fails, they all fail, then something is VERY WRONG or I am missing something.

And I did try it with em0 being part of lagg0 too. The same issue, em0 goes down, everything goes down. So, that isn't a solution nor is leaving em0 out the problem.

And what I mean by "down" is NOT that the remaining interfaces show as being down, they actually still show as up, but all network access is unavailable when only em0 is down.

So, I'm still open to suggestions :e
 
I tested this multilink setup on another BSD box which all use the em driver. No LAGG, no CARP, just 3 physical interfaces all connected to the same network.

Here's the relevant rc.conf info:
Code:
ifconfig_em0="inet 10.2.1.1 netmask 255.255.0.0"
ifconfig_em1="inet 10.2.1.2 netmask 255.255.0.0"
ifconfig_em2="inet 10.2.1.3 netmask 255.255.0.0"
defaultrouter="10.2.1.254"
Here's the interface config:
Code:
[tim@f1 ~]$ ifconfig
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:0b:d2
        inet 10.2.1.1 netmask 0xffff0000 broadcast 10.2.255.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:52:48
        inet 10.2.1.2 netmask 0xffff0000 broadcast 10.2.255.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:37:c5
        inet 10.2.1.3 netmask 0xffff0000 broadcast 10.2.255.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

Here's the routing table output:
Code:
[tim@f1 ~]$ netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
10.2.0.0/16        link#1             U           1       32    em0
10.2.1.1           link#1             UHS         0        0    lo0
10.2.1.2           link#2             UHS         0        0    lo0
10.2.1.3           link#3             UHS         0        0    lo0
127.0.0.1          link#5             UH          0        0    lo0

Each interface's own address gets a host route via lo0, but only em0 carries the 10.2.0.0/16 network route.

The only thing different here is that I'm assigning IP addresses to the interfaces. In my first post I didn't show that, even though I did test it that way as well as with the LAGG config.

So, this is not an issue with LAGG, CARP, or the cas(4) driver.

And again, when em0 goes down, it all goes down. I have to get around this issue to move forward ...or so I think ;)

Then I decided to challenge Gordon's statement that having multiple links on the same network was not possible. FALSE. It is quite possible.

Check this out. On Debian 6, I have configured 3 interfaces:
Code:
tim@deb6:~$ sudo ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:ad:4c:3c
          inet addr:38.100.208.12  Bcast:38.100.208.127  Mask:255.255.255.128
          inet6 addr: fe80::250:56ff:fead:4c3c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3612 errors:0 dropped:0 overruns:0 frame:0
          TX packets:214 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:231078 (225.6 KiB)  TX bytes:21986 (21.4 KiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:ad:68:05
          inet addr:38.100.208.13  Bcast:38.100.208.127  Mask:255.255.255.128
          inet6 addr: fe80::250:56ff:fead:6805/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3348 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:208654 (203.7 KiB)  TX bytes:1842 (1.7 KiB)

eth2      Link encap:Ethernet  HWaddr 00:50:56:ad:55:5f
          inet addr:38.100.208.9  Bcast:38.100.208.127  Mask:255.255.255.128
          inet6 addr: fe80::250:56ff:fead:555f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3453 errors:0 dropped:0 overruns:0 frame:0
          TX packets:246 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:216708 (211.6 KiB)  TX bytes:18864 (18.4 KiB)

And here is the routing table:
Code:
tim@deb6:~$ netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
38.100.208.0    0.0.0.0         255.255.255.128 U         0 0          0 eth0
38.100.208.0    0.0.0.0         255.255.255.128 U         0 0          0 eth1
38.100.208.0    0.0.0.0         255.255.255.128 U         0 0          0 eth2

As you can see quite plainly, three interfaces, three links, all on the same network. One goes down, the rest stay up. That's how it should be on FreeBSD too.

So, can someone please enlighten me as to what FreeBSD is doing differently or what I'm missing?
 
Let me ask this, when you ask the Debian box to connect to some place on the internet, how does it select the source IP address? According to your routing table, there are 3 valid options.

You cannot have more than one entry for a given route in FreeBSD. The way to do it is to include all physical interfaces into the lagg0 lacp group. Then make sure your routing table is correctly setup to have your default route through the lagg0 interface. Can you post the output from ifconfig and netstat -rn with that setup? Don't even worry about carp at this point, just get the lagg interface working first.
 
gordon@ said:
Let me ask this, when you ask the Debian box to connect to some place on the internet, how does it select the source IP address? According to your routing table, there are 3 valid options.

You cannot have more than one entry for a given route in FreeBSD. The way to do it is to include all physical interfaces into the lagg0 lacp group. Then make sure your routing table is correctly setup to have your default route through the lagg0 interface. Can you post the output from ifconfig and netstat -rn with that setup? Don't even worry about carp at this point, just get the lagg interface working first.

You're trying to tell me that I can't have multiple interfaces on the same network at the same time? Can you show me any man page, the FreeBSD Handbook, or anywhere else that states this? Really? I'm having a very hard time believing that BSD would be incapable of something so trivial :(

And here I thought I was being smart about bringing this down to some basics before working with LAGG or CARP again. Oh well. Still, in answer to your question about the multiple physical non-aggregated interfaces: it doesn't matter which interface is selected for outbound routing, although technically it's going to be the first one in the routing table. And as for inbound, I just select which IP I want to use and the responses should stay on that interface. That's how every other system I've worked with does it.

Then what matters most at this point is that all interfaces must be available at the same time on the same network, or it's basically broken. And, as I mentioned, I did actually try adding all the physical interfaces to lagg0, but it didn't work at all. Still, I'm happy to try it again and post that config if it helps.

rc.conf bits:
Code:
hostname="f1.exit.local"

# lagg: create, assign all physical interfaces
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 laggport em2 10.2.1.1 netmask 255.255.0.0"

# physical interfaces up
ifconfig_em0="up"
ifconfig_em1="up"
ifconfig_em2="up"

defaultrouter="10.2.1.254"

Routing table after boot:
Code:
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            10.2.1.254         UGS         1       34  lagg0
10.2.0.0/16        link#6             U           0        3  lagg0
10.2.1.1           link#6             UHS         0        0    lo0
127.0.0.1          link#5             UH          0        0    lo0

ifconfig output:
Code:
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:0b:d2
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:0b:d2
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:0b:d2
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
plip0: flags=8810<POINTOPOINT,SIMPLEX,MULTICAST> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:0b:d2
        inet 10.2.1.1 netmask 0xffff0000 broadcast 10.2.255.255
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: em2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

Pinging the default gateway shows duplicates (not good):
Code:
PING 10.2.1.254 (10.2.1.254): 56 data bytes
64 bytes from 10.2.1.254: icmp_seq=0 ttl=64 time=0.473 ms
64 bytes from 10.2.1.254: icmp_seq=0 ttl=64 time=0.488 ms (DUP!)
64 bytes from 10.2.1.254: icmp_seq=0 ttl=64 time=0.498 ms (DUP!)
64 bytes from 10.2.1.254: icmp_seq=1 ttl=64 time=0.492 ms
64 bytes from 10.2.1.254: icmp_seq=1 ttl=64 time=0.535 ms (DUP!)
64 bytes from 10.2.1.254: icmp_seq=1 ttl=64 time=0.547 ms (DUP!)
64 bytes from 10.2.1.254: icmp_seq=2 ttl=64 time=0.401 ms
64 bytes from 10.2.1.254: icmp_seq=2 ttl=64 time=0.416 ms (DUP!)
64 bytes from 10.2.1.254: icmp_seq=2 ttl=64 time=0.427 ms (DUP!)

--- 10.2.1.254 ping statistics ---
3 packets transmitted, 3 packets received, +6 duplicates, 0.0% packet loss
round-trip min/avg/max/stddev = 0.401/0.475/0.547/0.048 ms

Now, when I have it configured like this, it works, on my virtual system anyway, but pinging to/from the system, I'm getting dupes. Looks like a switch incompatibility; I'm using the VMware vSwitch for this testing right now, and it's kind of limited.

But since I got a little further this time, I just tried the same on the real box and now I'm locked out. I'll have to wait until I'm back in the office to fix it. Grr.

So, I have good reasons to keep one physical interface nice and simple: so that I can test things without locking myself out of the box when it fails, LOL!

Then, getting back to the basic problem once again: if I have just three plain old physical interfaces configured, as I showed in my previous post, when em0 goes down, they all stop working. That simply can't be correct operation, and I feel it has to do with the routing table only having em0 in it.

But maybe that's just the case, which is a bummer. I don't see any way to update the routing table to add more than one device to the same network, except for lo0. That's a serious design flaw, imho.

Thanks for the help though, really :)
 
I seriously doubt that the virtual system's network supports 802.3ad (corresponds to the ifconfig option laggproto lacp) since it requires switch configuration and support. On the real host, did you configure the switch side to support 802.3ad? If you don't have switch support, you will likely need to use the laggproto roundrobin or laggproto loadbalance. Check the lagg(4) manpage for the different options.
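If you want to try it without switch support, it should look something like this (untested sketch on my part, using the interface names and address from your earlier posts):

```shell
# Rebuild lagg0 with loadbalance instead of LACP. Unlike 802.3ad/LACP,
# loadbalance (and roundrobin) need no configuration on the switch side.
ifconfig lagg0 destroy
ifconfig lagg0 create
ifconfig lagg0 up laggproto loadbalance laggport em1 laggport em2
ifconfig lagg0 inet 10.2.1.1 netmask 255.255.0.0
```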
 
I was just checking my real switch. Yes, it supports LACP. I enabled it on the ports that are plugged in, but no joy, not one ping response. Funny, even though the VMware vSwitch doesn't really support it, I can still ping the test box and get in.

I don't think I've tried roundrobin or loadbalance on the real system, but as I have just locked myself out of both of my physical servers, because I have to have all my ports in the lagg0 device <grrr>, I'll have to try that tomorrow.

However, I still need to use CARP, and I believe it only works with laggproto lacp. I've spent three days now on something I got working on Linux in 5 minutes. There has to be a simpler/better/easier solution. Is this a well-known issue in the FreeBSD community? Any plans to fix it? Any workarounds?

If I could just have all my NICs on the same network, I'd be able to implement other solutions that don't require LAGG at all and I wouldn't be locking myself out of the box I'm experimenting with each and every time the configuration fails x(
 
You get a lot more benefit from using LACP than individual network interfaces. Also, it's probably a better practice to have a dedicated management network to host the em0 interface.

I don't know of any plans to fix this. I'm not sure I would call it broken.
 
gordon@ said:
You get a lot more benefit from using LACP than individual network interfaces. Also, it's probably a better practice to have a dedicated management network to host the em0 interface.

I don't know of any plans to fix this. I'm not sure I would call it broken.

Well, maybe not "broken" per se, but it does seem like a design flaw. I mean, I just don't see any good reason not to be able to do something as simple as having multiple devices on the same network. But this isn't Linux or Windows, so sometimes one just has to accept the nature of the beast, I reckon.

Funny you should mention a dedicated management network, that's exactly what I was just testing out!

So, here's my new config:
Code:
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:0b:d2
        inet 10.1.1.1 netmask 0xffff0000 broadcast 10.1.255.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:52:48
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:52:48
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
plip0: flags=8810<POINTOPOINT,SIMPLEX,MULTICAST> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:52:48
        inet 10.2.1.1 netmask 0xffff0000 broadcast 10.2.255.255
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: em2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

and the routing table:
Code:
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            10.1.1.254         UGS         0        0    em0
10.1.0.0/16        link#1             U           1       86    em0
10.1.1.1           link#1             UHS         0        0    lo0
10.2.0.0/16        link#6             U           0        6  lagg0
10.2.1.1           link#6             UHS         0        0    lo0
127.0.0.1          link#5             UH          0        0    lo0

While this isn't what I was going for (having all my interfaces on the same logical network), they are on the same physical network, so it's doable, just a bit more maintenance for the router and VMware hosts. If I can't get LAGG to work on my switch (I hope it does, though), I can assign each interface to its own subnet and just configure the multipathing in the iSCSI initiator accordingly.
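That fallback would look something like this in rc.conf (a sketch only; the subnets here are made up for illustration):

```shell
# Each NIC on its own subnet, so each gets its own routing table entry.
# The iSCSI initiator handles multipathing across the three addresses,
# so no lagg device is needed at all.
ifconfig_em0="inet 10.1.1.1 netmask 255.255.255.0"
ifconfig_em1="inet 10.2.1.1 netmask 255.255.255.0"
ifconfig_em2="inet 10.3.1.1 netmask 255.255.255.0"
```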

So, it seems like I have a reasonable workaround one way or the other. Wish I would have seen it sooner, but it was a good experience trying all this out just the same. Using LAGG, though, I still get those ping (DUP!)'s no matter which laggproto I select. My guess is that the VMware vSwitch doesn't do LACP, so the same MAC address showing up on more than one port is causing it. Would that be correct?

Again, thanks for your time with this Gordon!
 
I messed around with it briefly on my VMware Player host, and while I didn't get the DUPs, offlining the interfaces individually didn't fail over correctly.

If you want to do multipathing, you can use the geom_fox(4) module.
 
gordon@ said:
I messed around with it briefly on my VMware Player host, and while I didn't get the DUPs, offlining the interfaces individually didn't fail over correctly.

If you want to do multipathing, you can use the geom_fox(4) module.

Wow! That's an interesting little tool. Thanks for turning me on to that. I'm implementing network multipathing though, never even considered that the concept could apply to disks too.

Yeah, I believe VMware's virtual switch implementation is its weakest link. XenServer (or XCP) is much more robust in that department, but VMware is far simpler to manage in my experience. There is an implementation of a virtual Cisco switch available for VMware, but I haven't tried it yet. Since having a virtual test system that matches my real SAN is kind of important, I just might have to bump that project up the list!
 
IT WORKS!

At this point, I'm not exactly sure what was wrong. There are three things that needed to get sorted, though:

1. Set lagg0 and em0 on separate subnets.
This ensured that both were listed in the routing table, so one being down or misconfigured didn't affect the other.

2. Enable LACP on the switch. DOH!
This didn't occur to me because it worked without that at first, but come to find out, everything was being routed through em0, so it wasn't really working at that point.

3. Reboot the server with #1 and #2 correctly set up.
This must have helped register the lagg0 MAC address with the switch or something. It should have worked with all interfaces assigned to lagg0 but didn't. Weird?

My last test was to configure a CARP device on the same network as lagg0. Since one cannot assign a specific parent interface in the CARP config, this is another good reason for #1.
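For completeness, the CARP device goes up roughly like this (a sketch of my setup; the vhid, password, and address are placeholders):

```shell
# Create a CARP virtual address on lagg0's subnet. CARP picks its parent
# interface by matching the subnet, which is why step 1 matters.
ifconfig carp1 create
ifconfig carp1 vhid 1 pass mysecret 10.2.1.100/16
```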

So it all seems to be working flawlessly now =)

I'm calling this one solved!
 