Hi,
My setup is as follows.
server1 = primary server, set up with CARP. It runs dnsmasq and is also the gateway for my local network, 192.168.1.0/24.
server2 = secondary server, set up with CARP. It also runs dnsmasq (/var/db/dnsmasq.leases is copied from server1) and is also the gateway for my local network, 192.168.1.0/24.
The machines on the local network boot from the network; the TFTP and HTTP server is on the 10.0.0.0/24 network.
server1 and server2 are identical apart from the advskew value: it is set to 10 on server1 and to 100 on server2, so that server1 always becomes master when it is up.
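As a rough illustration of why the lower advskew wins: per carp(4), a host sends advertisements every advbase + advskew/256 seconds, and the host that advertises most frequently becomes master. The sketch below assumes advbase is at its default of 1 second, which is not stated above; the function names are made up for illustration.

```python
def carp_advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between CARP advertisements: advbase + advskew/256."""
    return advbase + advskew / 256


def expected_master(hosts: dict) -> str:
    """Return the host with the shortest advertisement interval."""
    return min(hosts, key=lambda name: carp_advert_interval(*hosts[name]))


# (advbase, advskew) per host; advbase=1 is an assumption, advskew is from the setup above.
hosts = {"server1": (1, 10), "server2": (1, 100)}

for name, (advbase, advskew) in hosts.items():
    print(name, carp_advert_interval(advbase, advskew))
print("expected master:", expected_master(hosts))
```

With these values server1 advertises roughly every 1.04 s and server2 roughly every 1.39 s, so server1 is preferred whenever both are up.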
When server1 is up, the machines on the local network boot from the network without any problem. When I bring server1 down, server2 becomes master; connectivity is fine and everything works apart from the network boot. While server1 was active, network boot worked for machine-1, with the following logs on the TFTP/PXE boot server.
Code:
18:26:15.104738 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 101: 192.168.0.113.25303 > 10.0.0.4.69: TFTP, length 59, RRQ "/grub/x86_64-efi/terminal.lst" octet blksize 1024 tsize 0
18:26:15.105433 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25303 > 10.0.0.4.32957: UDP, length 4
18:26:15.105868 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25303 > 10.0.0.4.32957: UDP, length 4
18:26:15.111715 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 86: 192.168.0.113.25304 > 10.0.0.4.69: TFTP, length 44, RRQ "/grub/grub.cfg" octet blksize 1024 tsize 0
18:26:15.112330 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25304 > 10.0.0.4.60529: UDP, length 4
18:26:15.112771 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25304 > 10.0.0.4.60529: UDP, length 4
18:26:15.118482 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 93: 192.168.0.113.25305 > 10.0.0.4.69: TFTP, length 51, RRQ "/hosts/machine-1.cfg" octet blksize 1024 tsize 0
18:26:15.119114 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25305 > 10.0.0.4.41332: UDP, length 4
18:26:15.119580 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.25305 > 10.0.0.4.41332: UDP, length 4
18:26:20.132284 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [S], seq 12513, win 8192, length 0
18:26:20.132700 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [.], ack 4066878295, win 8192, length 0 << (I can't see this ACK in the logs taken while server2 is active)
18:26:20.132730 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 169: 192.168.0.113.21550 > 10.0.0.4.80: Flags [P.], seq 0:115, ack 1, win 8192, length 115: HTTP: GET /pxe/folder/vmlinuz-5.19.0-41-generic HTTP/1.1
18:26:20.134678 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [.], ack 537, win 8192, length 0
18:26:20.134701 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [.], ack 1073, win 8192, length 0
18:26:20.134723 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.0.113.21550 > 10.0.0.4.80: Flags [.], ack 1609, win 8192, length 0
For the same machine-1, the logs while server2 is active are as follows. There is no ACK packet; why? UDP seems to succeed in this case, but TCP fails as far as I can see.
Code:
20:17:17.804445 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 101: 192.168.1.113.25303 > 10.0.0.4.69: TFTP, length 59, RRQ "/grub/x86_64-efi/terminal.lst" octet blksize 1024 tsize 0
20:17:17.805032 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25303 > 10.0.0.4.39191: UDP, length 4
20:17:17.805425 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25303 > 10.0.0.4.39191: UDP, length 4
20:17:17.811267 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 86: 192.168.1.113.25304 > 10.0.0.4.69: TFTP, length 44, RRQ "/grub/grub.cfg" octet blksize 1024 tsize 0
20:17:17.811911 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25304 > 10.0.0.4.45806: UDP, length 4
20:17:17.812314 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25304 > 10.0.0.4.45806: UDP, length 4
20:17:17.818021 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 93: 192.168.1.113.25305 > 10.0.0.4.69: TFTP, length 51, RRQ "/hosts/machine-1.cfg" octet blksize 1024 tsize 0
20:17:17.818656 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25305 > 10.0.0.4.33872: UDP, length 4
20:17:17.819077 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.25305 > 10.0.0.4.33872: UDP, length 4
20:17:22.829446 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0
20:17:23.228637 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0
20:17:23.628777 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), length 60: 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0
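The last three lines of this capture repeat the same seq towards port 80 about 0.4 s apart, which is what a client re-sending its opening TCP segment looks like when no reply reaches it. A minimal sketch for spotting that pattern in saved tcpdump text follows. The function name is illustrative, and the sample lines are abridged to the IP part with the flag field written as [S]; the parser also accepts lines where that field is blank.

```python
import re
from collections import Counter

# Matches e.g. "192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, ..."
# The bracketed flag field is optional because it can be lost in pasted output.
TCP_LINE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+\.\d+) > (?P<dst>\d+\.\d+\.\d+\.\d+\.\d+): "
    r"Flags (?P<flags>\[[^\]]*\])?, seq (?P<seq>\d+)"
)


def repeated_initial_segments(lines):
    """Count segments that carry a seq but no ack, keyed by (src, dst, seq).

    Only keys seen more than once are returned: those are re-sent segments.
    """
    counts = Counter()
    for line in lines:
        m = TCP_LINE.search(line)
        if m and " ack " not in line:
            counts[(m["src"], m["dst"], int(m["seq"]))] += 1
    return {key: n for key, n in counts.items() if n > 1}


server2_capture = [
    "20:17:22.829446 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0",
    "20:17:23.228637 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0",
    "20:17:23.628777 192.168.1.113.21550 > 10.0.0.4.80: Flags [S], seq 12595, win 8192, length 0",
]

print(repeated_initial_segments(server2_capture))
```

Run against the capture taken while server1 was active, the same function returns an empty dict, because the opening segment appears only once there.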
When server2 becomes master and I reboot machine-1, it does not boot from the network, as explained earlier. When I then switch the active role back to server1, machine-1 still gives the same error and the same tcpdump output, while the remaining machines, i.e. machine-2 and machine-3, boot from the network without any problems. I suspect that MAC addresses are cached somewhere, which keeps machine-1 from sending the ACK packet, or maybe it does not receive the reply packets from 10.0.0.4. But then why does TFTP work, when it is hosted on the same server (10.0.0.4)?
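One way to check the cached-MAC idea against the captures is to pull the MAC pair and the IP addresses out of each log and compare them. The sketch below does only that and does not say which difference matters; the function name is illustrative and the two sample lines are copied from the captures above. In the lines as pasted, the MAC pair is identical in both captures, while the source address reads 192.168.0.113 in the first and 192.168.1.113 in the second.

```python
import re

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
FRAME = re.compile(
    rf"(?P<src_mac>{MAC}) > (?P<dst_mac>{MAC}), .*?: "
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.\d+ > (?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.\d+:"
)


def endpoints(lines):
    """Return the set of (src_mac, dst_mac, src_ip, dst_ip) seen in a capture."""
    found = set()
    for line in lines:
        m = FRAME.search(line)
        if m:
            found.add(m.group("src_mac", "dst_mac", "src_ip", "dst_ip"))
    return found


with_server1 = endpoints([
    '18:26:15.118482 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), '
    'length 93: 192.168.0.113.25305 > 10.0.0.4.69: TFTP, length 51, RRQ "/hosts/machine-1.cfg"',
])
with_server2 = endpoints([
    '20:17:17.818021 d4:5d:64:bb:95:e4 > 56:8d:a0:19:e8:91, ethertype IPv4 (0x0800), '
    'length 93: 192.168.1.113.25305 > 10.0.0.4.69: TFTP, length 51, RRQ "/hosts/machine-1.cfg"',
])

macs1 = {(e[0], e[1]) for e in with_server1}
macs2 = {(e[0], e[1]) for e in with_server2}
print("same MAC pair:", macs1 == macs2)
print("source IPs:", sorted(e[2] for e in with_server1), sorted(e[2] for e in with_server2))
```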
Does this have something to do with TCP versus UDP? UDP communication seems to work fine and the files are downloaded via TFTP; the problem occurs when communication switches to TCP, i.e. HTTP.
The content of machine-1.cfg is as follows.
Code:
set default="1"
set timeout=5
menuentry ' folder- DISK' {
    linux (http,10.0.0.4)/pxe/folder/vmlinuz-5.19.0-41-generic root=http://10.0.0.4/pxe/folder/.squashfs loglevel=6 overlayroot=crypt:mkfs=1,dev=/dev/disk/by-partlabel/overlay
    initrd (http,10.0.0.4)/pxe/folder/initrd.img-5.19.0-41-generic
}
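The (http,10.0.0.4)/... paths in this file are the ones GRUB fetches over HTTP, i.e. the TCP step that fails in the server2 capture. A small sketch, with an illustrative function name, that turns those device paths into plain URLs so the same files can be requested from another host on the local network while server2 is master:

```python
import re

# GRUB network device syntax: (http,<server>)/<path>
GRUB_HTTP = re.compile(r"\(http,(?P<host>[^)]+)\)(?P<path>/\S+)")


def http_urls(cfg_text):
    """List the http:// URLs that correspond to GRUB (http,host)/path entries."""
    return [f"http://{m['host']}{m['path']}" for m in GRUB_HTTP.finditer(cfg_text)]


cfg = """\
menuentry ' folder- DISK' {
    linux (http,10.0.0.4)/pxe/folder/vmlinuz-5.19.0-41-generic loglevel=6
    initrd (http,10.0.0.4)/pxe/folder/initrd.img-5.19.0-41-generic
}
"""

for url in http_urls(cfg):
    print(url)
```

The root= URL on the linux line is not matched on purpose: it is a kernel parameter, not a file GRUB itself downloads.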
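When server2 is active, a screen appears on machine-1 after the GRUB timer expires; if I press any key there, I get the main GRUB screen with the menu. Pressing Enter in that menu then boots machine-1 properly.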