Hello,
I recently migrated two firewalls in a CARP setup from OpenBSD to FreeBSD 12.1-RELEASE, running as VMware VMs (ESXi 6.5) with E1000 interfaces. Both hosts have 10Gbit interfaces connected to a single 10Gbit switch.
The CARP setup works fine during the day, but during the night I got alerts about packet loss to the internal CARP IP address, and dmesg shows lots of CARP failovers.
It appears to be caused by high network load during znapzend backups.
Today I successfully reproduced it by running a simple scp of a large file between the two firewalls.
My setup:
Primary firewall:
Code:
carp: MASTER vhid 201 advbase 1 advskew 0
Backup firewall:
Code:
carp: BACKUP vhid 201 advbase 1 advskew 100
Command executed:
Code:
root@fw1:/usr/home/k_georgiev # scp /home/k_georgiev/test.file me@fw2.deltanews.lan:/usr/home/k_georgiev
Password for k_georgiev@fw2.deltanews.lan:
test.file 2% 165MB 47.2MB/s 02:37 ETA
root@fw1:/usr/home/k_georgiev #
The result:
Code:
Jan 1 15:01:54 fw2 su[96881]: k_georgiev to root on /dev/pts/1
Jan 1 15:15:36 fw2 kernel: carp: 203@em3: BACKUP -> MASTER (master timed out)
Jan 1 15:15:36 fw2 kernel: carp: 205@em0: BACKUP -> MASTER (master timed out)
Jan 1 15:15:36 fw2 kernel: carp: 200@em4: BACKUP -> MASTER (master timed out)
Jan 1 15:15:36 fw2 kernel: carp: 201@em1: BACKUP -> MASTER (master timed out)
Jan 1 15:15:36 fw2 kernel: carp: 202@em2: BACKUP -> MASTER (master timed out)
Jan 1 15:15:36 fw2 kernel: carp: 202@em2: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:36 fw2 kernel: em2: deletion failed: 3
Jan 1 15:15:36 fw2 kernel: carp: 201@em1: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:36 fw2 kernel: em1: deletion failed: 3
Jan 1 15:15:36 fw2 syslogd: last message repeated 1 times
Jan 1 15:15:36 fw2 kernel: carp: 205@em0: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:36 fw2 kernel: em0: deletion failed: 3
Jan 1 15:15:36 fw2 kernel: carp: 200@em4: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:36 fw2 kernel: em4: deletion failed: 3
Jan 1 15:15:36 fw2 syslogd: last message repeated 1 times
Jan 1 15:15:36 fw2 kernel: carp: 203@em3: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:36 fw2 kernel: em3: deletion failed: 3
Jan 1 15:15:39 fw2 kernel: carp: 201@em1: BACKUP -> MASTER (master timed out)
Jan 1 15:15:39 fw2 kernel: carp: 202@em2: BACKUP -> MASTER (master timed out)
Jan 1 15:15:39 fw2 kernel: carp: 203@em3: BACKUP -> MASTER (master timed out)
Jan 1 15:15:39 fw2 kernel: carp: 200@em4: BACKUP -> MASTER (master timed out)
Jan 1 15:15:39 fw2 kernel: carp: 205@em0: BACKUP -> MASTER (master timed out)
Jan 1 15:15:40 fw2 kernel: carp: 200@em4: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:40 fw2 kernel: em4: deletion failed: 3
Jan 1 15:15:40 fw2 syslogd: last message repeated 1 times
Jan 1 15:15:40 fw2 kernel: carp: 203@em3: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:40 fw2 kernel: em3: deletion failed: 3
Jan 1 15:15:40 fw2 kernel: carp: 202@em2: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:40 fw2 kernel: em2: deletion failed: 3
Jan 1 15:15:40 fw2 kernel: carp: 205@em0: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:40 fw2 kernel: em0: deletion failed: 3
Jan 1 15:15:40 fw2 kernel: carp: 201@em1: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:40 fw2 kernel: em1: deletion failed: 3
Jan 1 15:15:40 fw2 syslogd: last message repeated 1 times
Jan 1 15:15:43 fw2 kernel: carp: 200@em4: BACKUP -> MASTER (master timed out)
Jan 1 15:15:43 fw2 kernel: carp: 205@em0: BACKUP -> MASTER (master timed out)
Jan 1 15:15:43 fw2 kernel: carp: 202@em2: BACKUP -> MASTER (master timed out)
Jan 1 15:15:43 fw2 kernel: carp: 203@em3: BACKUP -> MASTER (master timed out)
Jan 1 15:15:43 fw2 kernel: carp: 201@em1: BACKUP -> MASTER (master timed out)
Jan 1 15:15:44 fw2 kernel: carp: 202@em2: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:44 fw2 kernel: em2: deletion failed: 3
Jan 1 15:15:44 fw2 kernel: carp: 201@em1: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:44 fw2 kernel: em1: deletion failed: 3
Jan 1 15:15:44 fw2 syslogd: last message repeated 1 times
Jan 1 15:15:44 fw2 kernel: carp: 205@em0: MASTER -> BACKUP (more frequent advertisement received)
Jan 1 15:15:44 fw2 kernel: em0: deletion failed: 3
Jan 1 15:15:44 fw2 kernel: carp: 200@em4: MASTER -> BACKUP (more frequent advertisement received)
..................
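To make the flapping easier to quantify, here is a minimal Python sketch that counts CARP state transitions per vhid and interface. It assumes the kernel messages have exactly the format shown in the log above; the function name `summarize_carp` and the two sample lines (copied from that log) are only for illustration.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in the log above, e.g.
#   carp: 201@em1: BACKUP -> MASTER (master timed out)
CARP_RE = re.compile(r"carp: (\d+)@(\w+): (\w+) -> (\w+) \((.+)\)")


def summarize_carp(lines):
    """Count CARP transitions keyed by (vhid, interface, old, new, reason)."""
    counts = Counter()
    for line in lines:
        match = CARP_RE.search(line)
        if match:  # non-CARP lines ("deletion failed", syslogd) are skipped
            vhid, iface, old, new, reason = match.groups()
            counts[(int(vhid), iface, old, new, reason)] += 1
    return counts


# Two sample lines taken from the log above (illustration only).
sample = [
    "Jan 1 15:15:36 fw2 kernel: carp: 201@em1: BACKUP -> MASTER (master timed out)",
    "Jan 1 15:15:36 fw2 kernel: carp: 201@em1: MASTER -> BACKUP (more frequent advertisement received)",
]

for key, count in sorted(summarize_carp(sample).items()):
    print(count, *key)
```

To run it against a real log, pass an open file object instead of `sample`; for example, `summarize_carp(open("/var/log/messages"))`, assuming that is where the kernel messages are written on your system.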
This time I noticed CARP switches only on the backup firewall; however, during the night I have seen the same behavior on the primary one as well.
The strangest thing is that the failovers and timeouts continue for 10-15 minutes after the scp has been stopped.
This is very strange behavior; it's hard for me to believe that a simple scp can cause firewall failovers.
Has anybody seen this behavior before? Could it be a configuration issue?
I would appreciate any ideas about what may be causing it.
Thanks in advance