CARP: advbase advskew understanding issues

Dear @ll

I'm toying around with CARP, and it's working, but unfortunately I'm not able to reduce the failover time. It works if the advbase difference is greater than 2 seconds: I can shut down HOST A and HOST B takes over. If I bring up HOST A, it resumes its MASTER position.

HOST A
Code:
#/boot/loader.conf

carp_load="YES"

Code:
#/etc/rc.conf

ifconfig_vtnet0="inet 10.0.0.1/24"
ifconfig_vtnet0_alias0="inet 10.0.0.3/24 vhid 1 pass secretpw advbase 1 advskew 0"

HOST B
Code:
#/boot/loader.conf

carp_load="YES"

Code:
#/etc/rc.conf

ifconfig_vtnet0="inet 10.0.0.2/24"
ifconfig_vtnet0_alias0="inet 10.0.0.3/24 vhid 1 pass secretpw advbase 3 advskew 0"

What I don't understand is why it doesn't work when I set advbase to 1 for hosts A and B and set host B's advskew to 100 (please see the configuration below). I shut down host A and host B takes over, but when I bring up host A again, it never resumes its position as MASTER. Perhaps I'm not patient enough, but I already waited several minutes. 😆

HOST A
Code:
#/boot/loader.conf

carp_load="YES"

Code:
#/etc/rc.conf

ifconfig_vtnet0="inet 10.0.0.1/24"
ifconfig_vtnet0_alias0="inet 10.0.0.3/24 vhid 1 pass secretpw advbase 1 advskew 0"

HOST B
Code:
#/boot/loader.conf

carp_load="YES"

Code:
#/etc/rc.conf

ifconfig_vtnet0="inet 10.0.0.2/24"
ifconfig_vtnet0_alias0="inet 10.0.0.3/24 vhid 1 pass secretpw advbase 1 advskew 100"

I'm using FreeBSD-14.1-RELEASE and the hypervisor is FreeBSD-14.0-p6.

Any advice would be highly appreciated.

Thanks.

tanis
 
from the carp manual :)

net.inet.carp.preempt

Allow virtual hosts to preempt
each other. When enabled, a vhid
in a backup state would preempt a
master that is announcing itself
with a lower advskew. Disabled
by default
 
So this forces the MASTER position on host A, or is there more to this?
To expand on the excerpt from the man page iRobbery quotes, a host in the MASTER state will remain in that state until it fails if net.inet.carp.preempt is not enabled. My guess is this is to prevent flapping. A flaky host with a low advskew that is stuck in some kind of crash loop would cause constant CARP failovers if preempt is enabled.

I personally prefer the behavior without preempt. I like to have two identical hosts and be able to update them separately. I take the master down, update it, and let it run for a while as the fallback host. If the update proves stable after some burn-in period, I repeat the process on the second host, thereby promoting the original master back into that role.
 
There are reasons for both options, allowing or not allowing preemption. As the man page states, when multiple interfaces are doing CARP and one of them fails, you probably want to allow it to preempt the other server. When you are doing upgrades, on the other hand, it is indeed best to first check that all services are properly updated and running before switching traffic back. But there are ways to avoid a preempt in those situations: change the advskew via net.inet.carp.demotion, and I think one can change the state via ifconfig too, though I have never used that myself.
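A rough sketch of those options, assuming the vtnet0 interface and vhid 1 from the original post. Per carp(4), a signed value written to net.inet.carp.demotion is added to the current demotion factor rather than replacing it; the value 240 is only an illustration, and the commands need root on a FreeBSD host with CARP configured.

```sh
# Enable preemption at runtime; add net.inet.carp.preempt=1 to
# /etc/sysctl.conf to keep it across reboots (usually on every host in the group).
sysctl net.inet.carp.preempt=1

# Before maintenance: raise the demotion factor so this host announces a worse skew.
sysctl net.inet.carp.demotion=240

# After maintenance: a negative value lowers the demotion factor again.
sysctl net.inet.carp.demotion=-240

# Alternatively, push a single vhid into the BACKUP state by hand.
ifconfig vtnet0 vhid 1 state backup
```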
 
Thank you very much Jose and iRobbery. I’m going to read the man page more carefully in the future. 😅
 
Thank you very much Jose and iRobbery. I’m going to read the man page more carefully in the future. 😅
You're welcome! FWIW, I find the wording in the man page confusing.
  • "Allow virtual hosts to preempt each other." A "virtual host" in the context of CARP is a group of concrete hosts with virtual interfaces. I understand how members of this group might preempt each other, but I'm boggled by the idea of the whole group itself doing so.
  • "...a vhid in a backup state would preempt a master that is announcing itself with a lower advskew." A vhid is "...a common virtual host ID...on each machine which is to take part in the virtual group". How can a virtual host id do anything? The vhid is the same for all hosts in the virtual group. This sentence makes no sense to me. Also, my understanding of advskew is that hosts with lower values have higher priority for becoming the master host.
 
English is my second language, which sometimes makes me miss the intricacies of that language. 😆

So from my understanding, if I set that flag, the host trumps all other hosts in that virtual host group and becomes MASTER. However, I was under the impression that by setting advskew to 100 on the backup and leaving the MASTER at zero, the MASTER would regain its position after entering the group again.

Am I mistaken here? 🤓
 
You know, the more I read the man page, the more confused I become. I suspect advskew 0 is "special"*. Would you mind running your tests again with preempt off and advskew 100 on the preferred master and advskew 200 on the backup?

English is my second language too, but I've lived in the US for a long time.

* This sentence in the examples section has me wondering "...advskew is above 0 so it could be overwritten in the emergency situation from the other host".
 
I tried, as suggested, the following configuration and it's not working:

sh:
# Host A
advskew 100

# Host B
advskew 200

I also tried the following settings without success:

sh:
# Host A
advskew 10

# Host B
advskew 240


As suggested, I can force the MASTER role on host A by setting:

sh:
sysctl -w net.inet.carp.preempt=1

Any suggestions are highly appreciated. :)
 
Interesting. If you have time and patience for more tests, what does sysctl net.inet.carp.demotion report on Host A with preempt disabled and after a failover? I.e., when things are not working the way you want them to.

As suggested, I can force the MASTER role on host A by setting:

sh:
sysctl -w net.inet.carp.preempt=1
You've only enabled preempt on one host?
 
Just for the record, all my previous tests had been conducted by powering off and on the particular host to force the failover.

Now:

Host A: advbase 1 advskew 10
Host B: advbase 1 advskew 240

Host A (MASTER) force CARP failover:
sh:
$ ifconfig vtnet0 down

Host A reports:
sh:
$ sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 240
net.inet.carp.log: 1
net.inet.carp.preempt: 0
net.inet.carp.dscp: 56
net.inet.carp.allow: 1
$

sh:
$ dmesg | grep carp
carp: 1@vtnet0: MASTER -> INIT (hardware interface down)
carp: demoted by 240 to 240 (interface down)
$

Host B successfully gained MASTER position.

Trying to regain MASTER position on Host A by executing:
sh:
$ ifconfig vtnet0 up

Host A reports:
sh:
$ sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 0
net.inet.carp.dscp: 56
net.inet.carp.allow: 1
$

sh:
$ dmesg | grep carp
carp: 1@vtnet0: MASTER -> INIT (hardware interface down)
carp: demoted by 240 to 240 (interface down)
carp: 1@vtnet0: INIT -> BACKUP (initailization complete)
carp: demoted by -240 to 0 (interface up)
$

Host B remains MASTER, Host A is not able to claim back the MASTER position.

If I now set net.inet.carp.preempt to 1 on Host A:
sh:
$ sysctl -w net.inet.carp.preempt=1
net.inet.carp.preempt: 0 -> 1
$

Host A immediately regains its MASTER position.

sh:
$ dmesg | grep carp
carp: 1@vtnet0: MASTER -> INIT (hardware interface down)
carp: demoted by 240 to 240 (interface down)
carp: 1@vtnet0: INIT -> BACKUP (initailization complete)
carp: demoted by -240 to 0 (interface up)
carp: 1@vtnet0: BACKUP -> MASTER (preempting a slower master)
$

Hmm unfortunately the following command is blocking now:

sh:
$ sysctl -a | grep carp
<6>carp: 1@vtnet0: MASTER -> INIT (hardware interface down)
<6>carp: demoted by 240 to 240 (interface down)
<6>carp: 1@vtnet0: INIT -> BACKUP (initailization complete)
<6>carp: demoted by -240 to 0 (interface up)
<6>carp: 1@vtnet0: BACKUP -> MASTER (preempting a slower master)
*** Blocking, no further response ***

sh:
$ ps -ax | grep sysctl
28171  u0    T+  0:00.01 sysctl -a
$ kill -15 28171
$ ps -ax | grep sysctl
28171  u0    T+  0:00.01 sysctl -a
$ kill -9 28171
$ ps -ax | grep sysctl
28171  u0    D+  0:00.01 sysctl -a
$

sh:
$ sysctl -a 
[...]
kern.geom.collectstatus: 1
kern.geom.notaste: 0
kern.geom.debugflags: 0
*** Blocking, no further response ***

sh:
$ ps -ax | grep sysctl
28171  u0    D+  0:00.01 sysctl -a
98725    0    D+  0:00.01 sysctl -a
$
 
Firstly, thanks so much for jumping through all these hoops!
Host A reports:
sh:
$ sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 240
net.inet.carp.log: 1
net.inet.carp.preempt: 0
net.inet.carp.dscp: 56
net.inet.carp.allow: 1
Yep, the demotion factor is why Host A remains in the backup state.
Trying to regain MASTER position on Host A by executing:
sh:
$ ifconfig vtnet0 up

Host A reports:
sh:
$ sysctl -a | grep carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 0
net.inet.carp.dscp: 56
net.inet.carp.allow: 1
$

sh:
$ dmesg | grep carp
carp: 1@vtnet0: MASTER -> INIT (hardware interface down)
carp: demoted by 240 to 240 (interface down)
carp: 1@vtnet0: INIT -> BACKUP (initailization complete)
carp: demoted by -240 to 0 (interface up)
I'm guessing the demotion factor is stored in the CARP multicast packets, and that's how it persists across a reboot. Interestingly, the carp(4) man page says 240 is the maximum advskew value, but the ifconfig(8) page says acceptable values are in the range of 0 to 256. Looks like the CARP page is right.

Hmm unfortunately the following command is blocking now:

sh:
$ sysctl -a | grep carp
<6>carp: 1@vtnet0: MASTER -> INIT (hardware interface down)
<6>carp: demoted by 240 to 240 (interface down)
<6>carp: 1@vtnet0: INIT -> BACKUP (initailization complete)
<6>carp: demoted by -240 to 0 (interface up)
<6>carp: 1@vtnet0: BACKUP -> MASTER (preempting a slower master)
*** Blocking, no further response ***
That sounds like a bug to me.
 
So I guess I will just stick with advbase 1 on host A and advbase 3 on host B which is reliable enough.

Jose, thank you very much again. :)
 
advskew and the demotion factor are two different things.

If the sum advskew + carp_demotion is greater than 240 (CARP_MAXSKEW), DEMOTE_ADVSKEW is set to 240.
If the sum advskew + carp_demotion is less than 0, DEMOTE_ADVSKEW is set to 0; otherwise DEMOTE_ADVSKEW equals advskew + carp_demotion.
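The clamping can be sketched in plain sh. This is an illustration, not the kernel code: the function name demoted_advskew is made up here, and the sketch assumes the result is clamped to the range 0 to CARP_MAXSKEW (240). The inputs are Host A's advskew of 10 and the demotion values seen in the thread.

```sh
#!/bin/sh
CARP_MAXSKEW=240

# Effective advskew: configured advskew plus the demotion factor,
# clamped to 0..CARP_MAXSKEW (assumed behaviour, hypothetical helper name).
demoted_advskew() {
    sum=$(( $1 + $2 ))
    if [ "$sum" -gt "$CARP_MAXSKEW" ]; then
        echo "$CARP_MAXSKEW"
    elif [ "$sum" -lt 0 ]; then
        echo 0
    else
        echo "$sum"
    fi
}

demoted_advskew 10 240    # advskew 10, demotion 240: prints 240
demoted_advskew 10 0      # advskew 10, demotion 0:   prints 10
demoted_advskew 10 -240   # negative sum:             prints 0
```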
 
but unfortunately I'm not able to reduce the failover time. It works if the advbase difference is greater than 2 seconds: I can shut down HOST A and HOST B takes over. If I bring up HOST A, it resumes its MASTER position.
I have no idea why only a difference greater than 2 works. If you have net.inet.carp.preempt disabled, and if you bring HOST A back up it should not become MASTER again as I see it. When you have net.inet.carp.preempt enabled the "competition" who becomes master and who stays in the backup position (old terminology: slave) is determined by the algorithm based on the frequency of the CARP advertisements sent out; parameters such as advskew etc. influence that frequency.
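To make the effect of these parameters on timing concrete, here is a small sh sketch working in ticks of 1/256 s, the unit in which advskew is specified. The advertisement interval (advbase seconds plus advskew/256 seconds) follows ifconfig(8). The master-down timeout of 3 * advbase plus advskew/256 seconds is an assumption based on my reading of the CARP code, and both function names are made up for this example.

```sh
#!/bin/sh
# All results are in ticks of 1/256 s.

# Advertisement interval of a MASTER: advbase seconds plus advskew/256 seconds.
adv_interval_ticks() {
    echo $(( $1 * 256 + $2 ))
}

# ASSUMPTION: a BACKUP declares the master dead after 3 * advbase seconds
# plus advskew/256 seconds without hearing an advertisement.
master_down_ticks() {
    echo $(( 3 * $1 * 256 + $2 ))
}

echo "Host A (advbase 1, advskew 0):   advertises every $(adv_interval_ticks 1 0)/256 s"
echo "Host B (advbase 1, advskew 100): advertises every $(adv_interval_ticks 1 100)/256 s"
echo "Host B (advbase 1, advskew 100): takes over after $(master_down_ticks 1 100)/256 s"
echo "Host B (advbase 3, advskew 0):   takes over after $(master_down_ticks 3 0)/256 s"
```

If the assumed timeout formula holds, Host B with advbase 1 and advskew 100 would take over after 868/256 s (about 3.4 seconds), versus 2304/256 s (9 seconds) with advbase 3, which is why the advskew variant is the one to use for a short failover time.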

Introduction to CARP by Mariusz Zaborski (FreeBSD Journal • September/October 2022) may be helpful. Michael Lucas has written some notes: CARP and devd on FreeBSD. Keep in mind though that FreeBSD's carp(4) did lose its load balancing capabilities sometime after the fork of OpenBSD; it was significantly rewritten in FreeBSD 10.0 (see carp(4) - History). However, with this caveat taken into consideration, for some more info about CARP you can look at Chapter 28. Introduction to the Common Address Redundancy Protocol (CARP) and CARP The Free Fail-over Protocol.

When experimenting with CARP, besides using:
sh:
# ifconfig <interface> down # stop carp
# ifconfig <interface> up  # up carp
you can try # ifconfig <interface> vhid 1 state backup, as used in 34.11. Common Address Redundancy Protocol (CARP). The example given there is of a setup with two masters and one common backup.

FWIW, I find the wording in the man page confusing.
I agree.
* This sentence in the examples section has me wondering "...advskew is above 0 so it could be overwritten in the emergency situation from the other host".
However hard I try, I get the distinct feeling that I'm missing the necessary context (within carp(4) and elsewhere) to understand what is really meant by this remark. A little more context to offset the terseness would be welcome. The only small clue is perhaps from the log messages: Change examples to have master skew above 0 to have ability to overwrite this.
 
However hard I try, I get the distinct feeling that I'm missing the necessary context (within carp(4) and elsewhere) to understand what is really meant by this remark. A little more context to offset the terseness would be welcome. The only small clue is perhaps from the log messages: Change examples to have master skew above 0 to have ability to overwrite this.
That commit message does not clear things up for me. I might have to give up and try to read the CARP source. I have the lazies, though.
 