CARP (dns/dhcp/tftp) network boot.

Hi,
My setup is as follows.
server1 = primary server set up with CARP; it runs dnsmasq and is also the gateway for my local network 192.168.1.0/24.
server2 = secondary server set up with CARP; it also runs dnsmasq (/var/db/dnsmasq.leases is copied from server1) and is also the gateway for my local network 192.168.1.0/24.
The machines on the local network boot from the network; the TFTP and HTTP server is on the 10.0.0.0/24 network.

server1 and server2 are identical apart from the advskew value: it is set to 10 on server1 and to 100 on server2, so that server1 always becomes master when it is up.
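To illustrate why the lower advskew wins: CARP advertises roughly every advbase + advskew/256 seconds, and the host that advertises most often is preferred as master. The sketch below only models that arithmetic. It assumes advbase is left at 1 second on both hosts (the post does not state it), and the function names are made up for illustration. Whether server1 takes the role back after a failover also depends on the preemption settings, which are not modelled here.

```python
# Sketch of how advskew biases the CARP master election.
# Assumption: advbase = 1 second on both hosts (not stated in the post).

def advert_interval(advbase, advskew):
    """Seconds between CARP advertisements: advbase + advskew/256."""
    return advbase + advskew / 256


def preferred_master(hosts):
    """Return the host with the shortest advertisement interval."""
    return min(hosts, key=lambda name: advert_interval(*hosts[name]))


# host name -> (advbase, advskew), using the advskew values from the post
hosts = {"server1": (1, 10), "server2": (1, 100)}

for name, (advbase, advskew) in hosts.items():
    print(f"{name}: advertises every {advert_interval(advbase, advskew):.4f} s")
print("preferred master:", preferred_master(hosts))
```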
When server1 is up, the machines on the local network boot from the network without any problem. When I bring server1 down, server2 becomes master; connectivity is fine and everything works apart from the network boot. While server1 was active, the network boot of machine-1 worked, with the following logs on the TFTP/PXE boot server.
Code:
18:26:15.104738 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 101: 192.168.0.113.25303 > 10.0.0.4.69: TFTP, length 59, RRQ "/grub/x86_64-efi/terminal.lst" octet blksize 1024 tsize 0
18:26:15.105433 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25303 > 10.0.0.4.32957: UDP, length 4
18:26:15.105868 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25303 > 10.0.0.4.32957: UDP, length 4
18:26:15.111715 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 86: 192.168.0.113.25304 > 10.0.0.4.69: TFTP, length 44, RRQ "/grub/grub.cfg" octet blksize 1024 tsize 0
18:26:15.112330 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25304 > 10.0.0.4.60529: UDP, length 4
18:26:15.112771 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25304 > 10.0.0.4.60529: UDP, length 4
18:26:15.118482 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 93: 192.168.0.113.25305 > 10.0.0.4.69: TFTP, length 51, RRQ "/hosts/machine-1.cfg" octet blksize 1024 tsize 0
18:26:15.119114 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25305 > 10.0.0.4.41332: UDP, length 4
18:26:15.119580 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25305 > 10.0.0.4.41332: UDP, length 4
18:26:20.132284 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [S], seq 12513, win 8192, length 0
18:26:20.132700 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [.], ack 4066878295, win 8192, length 0   << this ACK never shows up in the server2 capture
18:26:20.132730 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 169: 192.168.0.113.21550 > 10.0.0.4.80: Flags [P.], seq 0:115, ack 1, win 8192, length 115: HTTP: GET /pxe/folder/vmlinuz-5.19.0-41-generic HTTP/1.1
18:26:20.134678 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [.], ack 537, win 8192, length 0
18:26:20.134701 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [.], ack 1073, win 8192, length 0
18:26:20.134723 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [.], ack 1609, win 8192, length 0

For the same machine-1, when server2 is active the logs are as follows (there is no ACK packet, why? UDP seems to succeed in this case but TCP fails, as far as I can see).
Code:
20:17:17.804445 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 101: 192.168.1.113.25303 > 10.0.0.4.69: TFTP, length 59, RRQ "/grub/x86_64-efi/terminal.lst" octet blksize 1024 tsize 0
20:17:17.805032 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25303 > 10.0.0.4.39191: UDP, length 4
20:17:17.805425 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25303 > 10.0.0.4.39191: UDP, length 4
20:17:17.811267 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 86: 192.168.1.113.25304 > 10.0.0.4.69: TFTP, length 44, RRQ "/grub/grub.cfg" octet blksize 1024 tsize 0
20:17:17.811911 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25304 > 10.0.0.4.45806: UDP, length 4
20:17:17.812314 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25304 > 10.0.0.4.45806: UDP, length 4
20:17:17.818021 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 93: 192.168.1.113.25305 > 10.0.0.4.69: TFTP, length 51, RRQ "/hosts/machine-1.cfg" octet blksize 1024 tsize 0
20:17:17.818656 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25305 > 10.0.0.4.33872: UDP, length 4
20:17:17.819077 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25305 > 10.0.0.4.33872: UDP, length 4
20:17:22.829446 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0
20:17:23.228637 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0
20:17:23.628777 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0


When server2 becomes the master and I reboot machine-1, it does not boot from the network, as explained earlier. When I then switch the active role back to server1, machine-1 still gives me the same error and the same tcpdump output, while the rest of the machines, i.e. machine-2 and machine-3, boot from the network without any problems. I suspect MAC addresses are cached somewhere, which either stops machine-1 from sending the ACK packet or keeps it from receiving the reply packets from 10.0.0.4. But then why does TFTP, which is hosted on the same server (10.0.0.4), work?

Does it have something to do with TCP vs. UDP communication? It seems the UDP communication works fine and files get downloaded via TFTP; the problem occurs when communication switches to TCP, i.e. HTTP.
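The UDP-works, TCP-stalls pattern can be checked mechanically. Below is a minimal sketch that counts UDP/TFTP and TCP lines in tcpdump text output and flags SYNs that are resent with the same sequence number, which is what a client does when it never receives a SYN-ACK. The sample lines are shortened from the server2 capture (link-layer header dropped), and it is an assumption that the empty "Flags ," field originally read "Flags [S]". The function name and regular expression are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Shortened lines in the format of the server2 capture.
# Assumption: the empty "Flags ," field was originally "Flags [S]" (a SYN).
CAPTURE = """\
20:17:17.818021 192.168.1.113.25305 > 10.0.0.4.69: TFTP, length 51, RRQ "/hosts/machine-1.cfg" octet blksize 1024 tsize 0
20:17:17.818656 192.168.1.113.25305 > 10.0.0.4.33872: UDP, length 4
20:17:22.829446 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0
20:17:23.228637 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0
20:17:23.628777 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0
"""

# src > dst: Flags [X], optionally followed by ", seq N"
FLAGS_RE = re.compile(r"(\S+) > (\S+): Flags \[([^\]]*)\](?:, seq (\d+))?")


def summarize(capture):
    """Count UDP vs TCP lines and collect SYNs seen more than once."""
    counts = Counter()
    syns = Counter()  # (src, dst, seq) -> number of times seen
    for line in capture.splitlines():
        if not line.strip():
            continue
        m = FLAGS_RE.search(line)
        if m is None:
            counts["udp"] += 1  # TFTP and plain UDP lines have no Flags field
            continue
        counts["tcp"] += 1
        src, dst, flags, seq = m.groups()
        if flags == "S":
            syns[(src, dst, seq)] += 1
        elif "." in flags:
            counts["tcp_ack"] += 1  # segment carrying an ACK
    retransmitted = {key: n for key, n in syns.items() if n > 1}
    return {"counts": dict(counts), "retransmitted_syns": retransmitted}


result = summarize(CAPTURE)
print(result["counts"])
for (src, dst, seq), n in result["retransmitted_syns"].items():
    print(f"SYN {src} > {dst} seq {seq} sent {n} times, no ACK from the client seen")
```

A repeated SYN with no later ACK from the client suggests the SYN-ACK from 10.0.0.4 is either never sent or never reaches machine-1; the capture alone does not say which.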
The content of machine-1.cfg is as follows.
Code:
set default="1"
set timeout=5
menuentry ' folder- DISK' {
  linux (http,10.0.0.4)/pxe/folder/vmlinuz-5.19.0-41-generic root=http://10.0.0.4/pxe/folder/.squashfs loglevel=6 overlayroot=crypt:mkfs=1,dev=/dev/disk/by-partlabel/overlay
  initrd (http,10.0.0.4)/pxe/folder/initrd.img-5.19.0-41-generic
}
When server2 is active, this screen appears on machine-1 after the GRUB timer expires; I then press any key and get the main GRUB screen with the menu (second picture).
1693481570595.png

Then the following appears, and pressing Enter boots machine-1 properly.
1693481729066.png
 
Your issue can be anywhere, from the firewall on server1/2 to the network switch or misconfigured services on server1/2.
Try comparing the configurations (including software) of server1 and server2. Then check the firewall on server1 and server2. Maybe also extend the tcpdump output to look inside the packets (the "-X" option), or check the logs of your web server.
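One way to do the comparison is a plain text diff of the two config files. The sketch below uses Python's difflib for that. The config excerpts are hypothetical placeholders (the thread does not show the real files), and the interface name, address range and boot file are invented only to give the diff something to show.

```python
import difflib

# Hypothetical excerpts; the real configs are not shown in the thread.
SERVER1_CONF = """\
interface=vlan10
dhcp-range=192.168.1.100,192.168.1.200,12h
dhcp-boot=grubx64.efi,,10.0.0.4
"""

SERVER2_CONF = """\
interface=vlan10
dhcp-range=192.168.1.100,192.168.1.150,12h
dhcp-boot=grubx64.efi,,10.0.0.4
"""


def config_diff(a, b):
    """Return unified-diff lines between two config texts."""
    return list(
        difflib.unified_diff(
            a.splitlines(),
            b.splitlines(),
            fromfile="server1",
            tofile="server2",
            lineterm="",
        )
    )


for line in config_diff(SERVER1_CONF, SERVER2_CONF):
    print(line)
```

An empty result means the two texts are identical; the same function can be pointed at firewall rules or any other text config read from both machines.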
 
Thanks for replying. How is the traffic passed to the VLAN interface? Does it arrive at the parent interface (with the MAC address of the parent interface) and then get routed to the VLAN interface?
I checked the traffic arriving at the parent interface with tcpdump: the destination MAC address is the MAC address of the VLAN interface, not that of the parent (physical) interface.
 
It isn't called "routing" in the classic sense; it's more like a trunk port on a switch. But loosely speaking, yes.
On the parent interface you can see the MACs of all descendant VLANs, but on a VLAN interface you see only that VLAN's traffic.
A note about MACs and why you don't see requests addressed to the parent MAC: when you want to reach some network object (site, PC, Samba share, ...), your device looks up its IP and then asks for the MAC that corresponds to this IP. Your PC then sends the traffic to that MAC. If the parent interface only carries VLANs and has no IPs of its own, it is normal not to see requests to the parent's MAC, because it holds no IPs or resources at layer 3 of the OSI model. The descendant interface, however, holds the IP.
You see the parent MAC only when network devices query each other from time to time at layer 2 of the OSI model (for example, a switch asking "who is connected to my port" and obtaining all MACs, including parent and descendants).
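The IP-to-MAC lookup described above can be pictured with a toy model. This is a simplified sketch of the idea, not how any real network stack is implemented; the interface names, IP and MAC addresses are all made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interface:
    name: str
    mac: str
    ip: Optional[str] = None  # None: no layer-3 address configured


def arp_resolve(target_ip, interfaces):
    """Return the MAC of the interface that owns target_ip, or None."""
    for iface in interfaces:
        if iface.ip == target_ip:
            return iface.mac
    return None


# Hypothetical gateway: a parent interface without an IP and one VLAN on top of it.
interfaces = [
    Interface("parent0", "02:00:00:00:00:01"),
    Interface("vlan10", "02:00:00:00:00:10", "192.168.1.1"),
]

# A client looking for the gateway IP learns the VLAN interface's MAC.
print(arp_resolve("192.168.1.1", interfaces))
# No IP lookup can ever return the parent's MAC, since it holds no IP.
print(arp_resolve("192.168.1.2", interfaces))
```

Because frames are addressed to whatever MAC the lookup returns, a capture on the parent interface shows the VLAN interface's MAC as destination, matching what tcpdump showed.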
 