Issues with CARP under Qemu

Postby mvip » 08 Mar 2011, 13:16

Hey guys,

First, a big thanks to the developers for all the hard work. You guys rock!

Now to the issue. I've been using CARP on a few servers in the past without any issues; it usually works without any hiccups. Now I'm planning to move our company's infrastructure from physical hardware to a virtual environment over at CloudSigma (http://www.cloudsigma.com). Unfortunately I'm having some issues getting CARP to work there. For the record, they're using Qemu as the virtualization platform.

Let me start by describing my setup in more detail.

I have two nodes: [FILE]nas0[/FILE] and [FILE]nas1[/FILE]. Both these nodes have two interfaces, one public and one private. I'm obviously using the private one for CARP. [FILE]nas0[/FILE] is using the IP 192.168.1.11 and [FILE]nas1[/FILE] is using the IP 192.168.1.12. The CARP interface is configured to use the IP 192.168.1.10. The internal network is using a dedicated VLAN. Only these two nodes are using this VLAN to eliminate any possible conflicts.

I've also disabled all software firewalls, so we should also be able to exclude that from the equation.

Both nodes are running FreeBSD 8.2, and both the internal and external interfaces are working (i.e. the two nodes can ping each other on the private interfaces).

In [FILE]rc.conf[/FILE] on [FILE]nas0[/FILE], I have the following lines:
Code: Select all
cloned_interfaces="carp0"
ifconfig_carp0="vhid 1 pass foobar 192.168.1.10/24"


On [FILE]nas1[/FILE] (which is the failover), the equivalent lines are:
Code: Select all
cloned_interfaces="carp0"
ifconfig_carp0="vhid 1 advskew 100 pass foobar 192.168.1.10/24"


(note the [FILE]advskew[/FILE] value on [FILE]nas1[/FILE])
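As a rough illustration of why the [FILE]advskew[/FILE] value matters: per carp(4), advskew is measured in 1/256 of a second and is added to advbase, so the host with the lower value advertises more often and is preferred as master. The sketch below only does that arithmetic; the function name carp_interval is made up for this example and is not part of any FreeBSD tool.

```shell
#!/bin/sh
# Approximate CARP advertisement interval in seconds:
#   advbase + advskew / 256
# (advskew is expressed in units of 1/256 second, per carp(4)).
# carp_interval is a hypothetical helper name used only in this sketch.
carp_interval() {
    # $1 = advbase in seconds, $2 = advskew
    awk -v base="$1" -v skew="$2" 'BEGIN { printf "%.3f\n", base + skew / 256 }'
}

carp_interval 1 0     # values configured on nas0
carp_interval 1 100   # values configured on nas1
```

With the settings above, nas0 should advertise roughly every second and nas1 a bit less often, which is why nas0 is expected to win the election.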

To verify that CARP is enabled and configured etc., here's the [FILE]sysctl[/FILE] output (same on both nodes):
Code: Select all
net.inet.carp.allow: 1
net.inet.carp.preempt: 1
net.inet.carp.log: 1
net.inet.carp.arpbalance: 0
net.inet.carp.suppress_preempt: 0


Normally, that should be it. [FILE]nas0[/FILE] should automatically become the master, and [FILE]nas1[/FILE] the backup/failover. Unfortunately that doesn't happen. Instead, I get this on the node with the lowest [FILE]advskew[/FILE] value ([FILE]nas0[/FILE], but if I raise the [FILE]advskew[/FILE] on [FILE]nas0[/FILE], the error moves to [FILE]nas1[/FILE]):
Code: Select all
Mar  7 14:42:57 nas0 kernel: carp0: MASTER -> BACKUP (more frequent advertisement received)
Mar  7 14:42:57 nas0 kernel: carp0: 2 link states coalesced
Mar  7 14:42:57 nas0 kernel: carp0: link state changed to DOWN


When checking the CARP interface status, I get the following on [FILE]nas0[/FILE]:
Code: Select all
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 192.168.1.10 netmask 0xffffff00
        carp: BACKUP vhid 1 advbase 1 advskew 0


and the following on [FILE]nas1[/FILE]:
Code: Select all
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 192.168.1.10 netmask 0xffffff00
        carp: BACKUP vhid 1 advbase 1 advskew 100


I've googled this error ([FILE]carp0: 2 link states coalesced[/FILE]), and some forum posts mention seeing it with faulty NICs or switches. However, I've reached out to CloudSigma, and they've been very helpful and set up a replica of my setup (but on 8.1). Their head network guy was able to reproduce the same errors, and he was also able to confirm (using tcpdump) that the packets were indeed sent and received on both nodes. His conclusion was that this is likely a bug in CARP. It's also worth mentioning that VRRP does work.
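When comparing logs from several nodes, it can help to reduce the kernel messages to just the CARP state transitions. The following is a minimal sketch that assumes the syslog line format quoted earlier in this post; carp_transitions is a hypothetical helper name, and in practice the input would come from something like /var/log/messages.

```shell
#!/bin/sh
# Print "<iface> <old-state> <new-state>" for every CARP state transition
# found in syslog-style lines read from stdin. Other lines are ignored.
# carp_transitions is a hypothetical helper name used only in this sketch.
carp_transitions() {
    grep -E 'kernel: carp[0-9]+: [A-Z]+ -> [A-Z]+' |
        sed -E 's/^.*kernel: (carp[0-9]+): ([A-Z]+) -> ([A-Z]+).*$/\1 \2 \3/'
}

# Example using the three log lines from nas0 quoted in this thread.
carp_transitions <<'EOF'
Mar  7 14:42:57 nas0 kernel: carp0: MASTER -> BACKUP (more frequent advertisement received)
Mar  7 14:42:57 nas0 kernel: carp0: 2 link states coalesced
Mar  7 14:42:57 nas0 kernel: carp0: link state changed to DOWN
EOF
```

Only the first of the three lines is a state transition, so that is the only one the filter keeps.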

Since it's also in their interest to get this working for us (as this is what is holding us back from moving), they've been kind enough to provide access to their CARP test nodes to any developer who wants to take a stab at it. I have the credentials and details; I don't want to post them here, but I will provide them by DM to anyone interested.
mvip
Junior Member
 
Posts: 5
Joined: 16 Feb 2009, 06:29

Works on OpenBSD 4.8

Postby mvip » 15 Mar 2011, 13:43

Since I didn't get any reply here, I continued my investigation. Since FreeBSD's CARP driver originates from OpenBSD, I figured I'd see if it works with OpenBSD. Turns out it did, which indicates that this really is a bug in FreeBSD.

The setup is using the same private network, and OpenBSD 4.8 (amd64). There were two nodes: obsd0 (192.168.10.11) and obsd1 (192.168.10.12). The CARP interface was set to 192.168.10.10. These are fresh installs, and nothing else was installed.

On both obsd0 and obsd1:
Code: Select all
# sysctl net.inet.carp
net.inet.carp.allow=1
net.inet.carp.preempt=1
net.inet.carp.log=2


on obsd0:
Code: Select all
# ifconfig carp1
carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:01
        priority: 0
        carp: MASTER carpdev em1 vhid 1 advbase 1 advskew 10
        groups: carp
        status: master
        inet6 fe80::200:5eff:fe00:101%carp1 prefixlen 64 scopeid 0x6
        inet 192.168.10.10 netmask 0xffffff00 broadcast 192.168.10.255


on obsd1:
Code: Select all
# ifconfig carp1
carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:01
        priority: 0
        carp: BACKUP carpdev em1 vhid 1 advbase 1 advskew 100
        groups: carp
        status: backup
        inet6 fe80::200:5eff:fe00:101%carp1 prefixlen 64 scopeid 0x6
        inet 192.168.10.10 netmask 0xffffff00 broadcast 192.168.10.255


When I bump up the advskew on obsd0, obsd1 automatically picks up as master.
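Both the FreeBSD and the OpenBSD ifconfig listings in this thread contain a line starting with "carp:" followed by the current state, so the state can be pulled out with a one-line awk filter. This is only a sketch based on the output format shown here; carp_state is a hypothetical helper name, and real input would come from running ifconfig on the CARP interface.

```shell
#!/bin/sh
# Read ifconfig output for a CARP interface on stdin and print the state
# (the word following "carp:"), e.g. MASTER or BACKUP.
# carp_state is a hypothetical helper name used only in this sketch.
carp_state() {
    awk '$1 == "carp:" { print $2; exit }'
}

# Example using the obsd1 listing from this post (abbreviated).
carp_state <<'EOF'
carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:01
        carp: BACKUP carpdev em1 vhid 1 advbase 1 advskew 100
EOF
```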

two masters

Postby Pfarthing6 » 24 Mar 2011, 00:46

So carp isn't working for me either on FreeBSD 8.2, at least not in an intelligible way.

I have two hosts configured identically as follows:

hostA: 10.1.10.89/24
hostB: 10.1.10.90/24

[FILE]sysctl.conf[/FILE], identical on both:
Code: Select all
net.inet.carp.preempt=1
net.inet.carp.allow=1
net.inet.carp.log=1


Set up CARP:
Code: Select all
hostA# ifconfig carp0 create
hostA# ifconfig carp0 vhid 1 pass CARP_PASS  10.1.10.252/24

hostB# ifconfig carp0 create
hostB# ifconfig carp0 vhid 1 pass CARP_PASS  10.1.10.252/24


I am leaving the default advbase and advskew so as to let CARP decide which should be master. I would think it should be the first one up.


Then, after initializing the carp0 interface on hostA, I check the status:
Code: Select all
hostA# ifconfig
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:0b:d2
        inet 10.1.10.89 netmask 0xffffff00 broadcast 10.1.10.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
plip0: flags=8810<POINTOPOINT,SIMPLEX,MULTICAST> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 10.1.10.252 netmask 0xffffff00
        carp: MASTER vhid 1 advbase 1 advskew 0


Now I check hostB:
Code: Select all
hostB# ifconfig
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:50:56:ad:57:ec
        inet 10.1.10.90 netmask 0xffffff00 broadcast 10.1.10.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
plip0: flags=8810<POINTOPOINT,SIMPLEX,MULTICAST> metric 0 mtu 1500
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 10.1.10.252 netmask 0xffffff00
        carp: MASTER vhid 1 advbase 1 advskew 0


Uggh! What's up with that? Both nodes are master. Weird.
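Two failure modes show up in this thread: both nodes reporting BACKUP (the Qemu case) and both reporting MASTER (this case). A small sketch for classifying a pair of states is shown below; carp_pair_status is a hypothetical helper name, and the two arguments are assumed to be the state words read from each node's ifconfig output.

```shell
#!/bin/sh
# Classify the CARP states of two nodes sharing one vhid.
# $1, $2 = state of each node (MASTER, BACKUP or INIT).
# carp_pair_status is a hypothetical helper name used only in this sketch.
carp_pair_status() {
    case "$1/$2" in
        MASTER/MASTER)
            echo "split-brain: both nodes are MASTER" ;;
        MASTER/BACKUP|BACKUP/MASTER)
            echo "ok: one MASTER, one BACKUP" ;;
        *)
            echo "no master: $1/$2" ;;
    esac
}

carp_pair_status MASTER MASTER   # hostA and hostB above
carp_pair_status BACKUP BACKUP   # nas0 and nas1 earlier in the thread
```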

They can talk to each other as I can ping between both hosts on their local IP.

If I ping the virtual IP from a 3rd host, I get a MAC address, but ifconfig doesn't tell me what the MAC is for carp0, so it's difficult to figure out which host I'm pinging.

As a test, I ping continuously from the 3rd host while both hostA and hostB are up. I bring down carp0 on both and ping fails. I bring up carp0 on hostA and ping succeeds again. I bring up carp0 on hostB, no change. I bring down carp0 on hostA and ping fails a couple of times, then picks up again. So, obviously hostB is responding now.

Something is kind of working, but not the right way. And since both links are UP all the time unless I force one to be down, I can't use devd events for LINK_UP and LINK_DOWN to trigger any role switching.
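For context, the log lines earlier in the thread show the carp0 link state going DOWN when a node drops from MASTER to BACKUP, which is what makes link events usable as a role-change trigger. Below is a rough sketch of the kind of handler a devd action could call once the states behave; the function name carp_on_event and the "promote"/"demote" actions are placeholders invented for this example, and the devd.conf wiring itself is not shown.

```shell
#!/bin/sh
# Map a link event type to a role-change action.
# $1 = event type passed in by the caller (assumed: LINK_UP or LINK_DOWN).
# carp_on_event and the action names are hypothetical placeholders.
carp_on_event() {
    case "$1" in
        LINK_UP)   echo "promote" ;;  # interface became MASTER
        LINK_DOWN) echo "demote" ;;   # interface dropped to BACKUP
        *)         echo "ignore" ;;   # anything else is left alone
    esac
}

carp_on_event LINK_UP
carp_on_event LINK_DOWN
```

A real handler would replace the echo calls with whatever needs to start or stop on a role switch.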

So, I'm basically stuck without being able to use CARP/DEVD for failover.

One last tidbit of info on this is that I'm running these as virtual machines on VMware ESXi 4. But since I can ping the VIP between each host, I'm assuming that VMware isn't at issue? ...unless the multicast is the culprit, which I'm trying to investigate now.

Update: I am using promiscuous mode in the VMware vSwitch as suggested by others.

Any suggestions would be greatly appreciated!

thanks!
Pfarthing6
Junior Member
 
Posts: 53
Joined: 23 Jun 2010, 03:11

Postby mvip » 24 Mar 2011, 10:52

Pfarthing6,

I'm no CARP expert by any means, but I think you need to bump up the [FILE]advskew[/FILE] on one node, which would make the other node (the one with the lower value) master. With [FILE]net.inet.carp.preempt[/FILE] enabled, the node with the lower [FILE]advskew[/FILE] should take over as master. Check out the CARP handbook section for details.

Anyways, I've found the solution to the issue this thread was created for. In order to get CARP working on Qemu (and it turns out VMware ESX has the same issue), you need to apply a patch to the CARP module.

The instructions on how to do that are available here.

Postby Pfarthing6 » 24 Mar 2011, 22:45

The link here that mvip supplied did have the answer for me.

I didn't compile in the patch as suggested in the article, though I was prepared to. Instead, I thought I'd try just the
Code: Select all
net.inet.carp.drop_echoed=1
setting and it worked!

So, my [FILE]sysctl.conf[/FILE] now looks like this:
Code: Select all
net.inet.carp.preempt=1
net.inet.carp.allow=1
net.inet.carp.log=1
net.inet.carp.drop_echoed=1


For VMware I also made sure to update the security settings for the vSwitch and portgroup to enable: Promiscuous Mode, MAC Address Changes, and Forged Transmits.

I'm using 8.2-RELEASE btw, not tested in other releases.
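Since the sysctl listing posted at the top of this thread does not include a drop_echoed entry, it may be worth confirming that a given kernel actually exposes the knob before relying on it in [FILE]sysctl.conf[/FILE]. The sketch below checks a sysctl listing read from stdin; has_sysctl is a hypothetical helper name, and whether the OID is present on a particular build is an assumption to verify, not something this thread confirms.

```shell
#!/bin/sh
# Succeed (exit 0) if the named OID appears in a sysctl listing on stdin.
# Accepts both "name: value" and "name=value" line formats.
# has_sysctl is a hypothetical helper name used only in this sketch.
has_sysctl() {
    # $1 = OID name to look for
    awk -v oid="$1" -F'[:=]' '$1 == oid { found = 1 } END { exit !found }'
}

# Example using the listing from the first post in this thread.
listing='net.inet.carp.allow: 1
net.inet.carp.preempt: 1
net.inet.carp.log: 1
net.inet.carp.arpbalance: 0
net.inet.carp.suppress_preempt: 0'

if printf '%s\n' "$listing" | has_sysctl net.inet.carp.drop_echoed; then
    echo "drop_echoed is available"
else
    echo "drop_echoed is not listed"
fi
```

On a live system the listing would come from running sysctl on the net.inet.carp subtree instead of the hard-coded text.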

Hope this helps others!

