I found that the connection from my secondary nameserver to the primary is producing errors: the primary ignores the XFR (zone transfer) connection attempts from the secondary.
The primary is running:
Code:
# ps ax | grep named
13667 - IsJ 0:00.39 /usr/local/sbin/named -n 1 -u bind -c /usr/local/etc/namedb/named.conf
It is configured to listen on:
Code:
listen-on port 53 { 192.168.97.24; };
listen-on-v6 port 53 { fd00::118; };
And it seems to listen on these ports:
Code:
$ netstat -an
tcp6 0 0 fd00::118.53 *.* LISTEN
tcp4 0 0 192.168.97.24.53 *.* LISTEN
udp6 0 0 fd00::118.53 *.*
udp4 0 0 192.168.97.24.53 *.*
When I open a telnet connection to port 53 from a neighbouring node, it looks like this:
Code:
pmc@disp:511:1~$ telnet 192.168.97.24 53
Trying 192.168.97.24...
Connected to admn-e.intra.daemon.contact.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
That is what I normally expect. However, when I do the same from the secondary nameserver, it looks like this:
Code:
$ telnet 192.168.97.24 53
Trying 192.168.97.24...
telnet: connect to address 192.168.97.24: Operation timed out
telnet: Unable to connect to remote host
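The telnet test can also be scripted. The following is a minimal Python sketch (standard library only; the function name `probe` and the 3-second timeout are arbitrary choices, not part of the setup above). It tells apart the three outcomes that matter here: a completed handshake, an active refusal (the peer answered the SYN with an RST), and a timeout, where nothing came back at all, which is what the secondary sees.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect, like the telnet test, and classify the outcome."""
    try:
        # create_connection performs the full three-way handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"      # the peer answered the SYN with an RST
    except (socket.timeout, TimeoutError):
        return "timed out"    # neither a SYN-ACK nor an RST came back
    except OSError as exc:
        return f"error: {exc}"  # e.g. no route to host
```

Run on the secondary, `probe("192.168.97.24", 53)` should report `"timed out"` if it behaves like the telnet test; `"refused"` would instead point at a closed port rather than dropped packets.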
One would suspect a network issue, but there is none: the packets do arrive at the destination; they just get ignored there.
This is how the working telnet connection from the neighbouring node looks at the destination:
Code:
# tcpdump -xxninadmn1l
20:40:01.604840 ARP, Request who-has 192.168.97.24 tell 192.168.97.18, length 46
0x0000: ffff ffff ffff 061d 9201 0222 0806 0001
0x0010: 0800 0604 0001 061d 9201 0222 c0a8 6112
0x0020: 0000 0000 0000 c0a8 6118 0000 0000 0000
0x0030: 0000 0000 0000 0000 0000 0000
20:40:01.604913 ARP, Reply 192.168.97.24 is-at 06:1d:92:01:01:05, length 28
0x0000: 061d 9201 0222 061d 9201 0105 0806 0001
0x0010: 0800 0604 0002 061d 9201 0105 c0a8 6118
0x0020: 061d 9201 0222 c0a8 6112
20:40:01.604993 IP 192.168.97.18.64497 > 192.168.97.24.53: Flags [S], seq 1911977491, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3668593643 ecr 0], length 0
0x0000: 061d 9201 0105 061d 9201 0222 0800 4510
0x0010: 003c 0000 4000 4006 f730 c0a8 6112 c0a8
0x0020: 6118 fbf1 0035 71f6 7613 0000 0000 a002
0x0030: ffff f9be 0000 0204 05b4 0103 0306 0402
0x0040: 080a daaa 4beb 0000 0000
20:40:01.605033 IP 192.168.97.24.53 > 192.168.97.18.64497: Flags [S.], seq 1403100325, ack 1911977492, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 244552087 ecr 3668593643], length 0
0x0000: 061d 9201 0222 061d 9201 0105 0800 4500
0x0010: 003c 0000 4000 4006 f740 c0a8 6118 c0a8
0x0020: 6112 0035 fbf1 53a1 9ca5 71f6 7614 a012
0x0030: ffff 693c 0000 0204 05b4 0103 0306 0402
0x0040: 080a 0e93 9197 daaa 4beb
And this is the connection attempt from the secondary nameserver, captured the same way:
Code:
20:40:56.717735 IP 192.168.99.1.41219 > 192.168.97.24.53: Flags [S], seq 3201504914, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 67229358 ecr 0], length 0
0x0000: 061d 9201 0105 061d 9201 0202 0800 4510
0x0010: 003c 0000 4000 3e06 f741 c0a8 6301 c0a8
0x0020: 6118 a103 0035 bed3 1692 0000 0000 a002
0x0030: ffff e396 0000 0204 05b4 0103 0306 0402
0x0040: 080a 0401 d6ae 0000 0000
20:40:57.717784 IP 192.168.99.1.41219 > 192.168.97.24.53: Flags [S], seq 3201504914, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 67230358 ecr 0], length 0
0x0000: 061d 9201 0105 061d 9201 0202 0800 4510
0x0010: 003c 0000 4000 3e06 f741 c0a8 6301 c0a8
0x0020: 6118 a103 0035 bed3 1692 0000 0000 a002
0x0030: ffff dfae 0000 0204 05b4 0103 0306 0402
0x0040: 080a 0401 da96 0000 0000
20:40:59.917842 IP 192.168.99.1.41219 > 192.168.97.24.53: Flags [S], seq 3201504914, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 67232558 ecr 0], length 0
0x0000: 061d 9201 0105 061d 9201 0202 0800 4510
0x0010: 003c 0000 4000 3e06 f741 c0a8 6301 c0a8
0x0020: 6118 a103 0035 bed3 1692 0000 0000 a002
0x0030: ffff d716 0000 0204 05b4 0103 0306 0402
0x0040: 080a 0401 e32e 0000 0000
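Since the `-xx` dumps include the link-level header, the two SYNs can be compared field by field. Here is a minimal Python sketch that turns such hex lines back into bytes and pulls out the Ethernet addresses, the IPv4 TTL and addresses, the TCP ports and the SYN/ACK flags. It assumes an untagged Ethernet II frame carrying IPv4 and TCP, as in these captures; the helper names are made up for illustration.

```python
import ipaddress


def dump_to_bytes(dump: str) -> bytes:
    """Rebuild a frame from tcpdump -xx lines such as '0x0000: 061d 9201 ...'."""
    out = bytearray()
    for line in dump.strip().splitlines():
        _, _, words = line.partition(":")  # drop the '0x0000' offset column
        out += bytes.fromhex("".join(words.split()))
    return bytes(out)


def mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def decode(frame: bytes) -> dict:
    """Extract the Ethernet, IPv4 and TCP fields worth comparing."""
    ip = frame[14:]                # untagged Ethernet II header is 14 bytes
    ihl = (ip[0] & 0x0F) * 4       # IPv4 header length in bytes
    tcp = ip[ihl:]
    flags = tcp[13]                # TCP flags byte
    return {
        "eth_dst": mac(frame[0:6]),
        "eth_src": mac(frame[6:12]),
        "ttl": ip[8],
        "ip_src": str(ipaddress.IPv4Address(ip[12:16])),
        "ip_dst": str(ipaddress.IPv4Address(ip[16:20])),
        "sport": int.from_bytes(tcp[0:2], "big"),
        "dport": int.from_bytes(tcp[2:4], "big"),
        "syn": bool(flags & 0x02),
        "ack": bool(flags & 0x10),
    }
```

Fed the first SYN of each capture, it reports TTL 64 and source MAC 06:1d:92:01:02:22 for the neighbour (192.168.97.18), and TTL 62 and source MAC 06:1d:92:01:02:02 for the secondary (192.168.99.1). So the secondary's SYN is put onto this link by a different station, presumably a router, since 192.168.99.1 is on another subnet.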
The packets do appear at the destination, but nothing is done with them.
I did switch the local firewall to pass-through, so that cannot be the reason.
I have no idea what else on a machine could silently swallow packets, and do so selectively, only for exactly those connections I need.
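One thing a receiving TCP stack normally discards silently, without answering with an RST, is a segment whose checksum does not verify. Here is a minimal Python sketch (standard library only, helper names made up for illustration) that recomputes the IPv4 header checksum and the TCP checksum over the pseudo-header (source, destination, protocol, TCP length), using the RFC 1071 one's-complement sum, from `tcpdump -xx` hex lines like the ones above. It assumes an untagged Ethernet II frame carrying IPv4 and TCP, as in these captures.

```python
def dump_to_bytes(dump: str) -> bytes:
    """Rebuild a frame from tcpdump -xx lines such as '0x0000: 061d 9201 ...'."""
    hexdigits = "".join("".join(line.partition(":")[2].split())
                        for line in dump.strip().splitlines())
    return bytes.fromhex(hexdigits)


def ones_complement_sum(data: bytes) -> int:
    """RFC 1071 sum of 16-bit big-endian words, carries folded back in."""
    if len(data) % 2:
        data += b"\x00"            # pad an odd trailing byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksums(frame: bytes) -> dict:
    """Return stored and recomputed IPv4 header and TCP checksums."""
    ip = frame[14:]                               # skip Ethernet II header
    ihl = (ip[0] & 0x0F) * 4
    total_len = int.from_bytes(ip[2:4], "big")
    header, segment = ip[:ihl], ip[ihl:total_len]
    # IPv4 header checksum: sum the header with its checksum field zeroed
    ip_calc = ~ones_complement_sum(header[:10] + b"\x00\x00" + header[12:]) & 0xFFFF
    # TCP checksum: pseudo-header (src, dst, zero, protocol, TCP length)
    # followed by the segment with its checksum field (offset 16) zeroed
    pseudo = ip[12:20] + bytes([0, ip[9]]) + len(segment).to_bytes(2, "big")
    tcp_calc = ~ones_complement_sum(pseudo + segment[:16] + b"\x00\x00" + segment[18:]) & 0xFFFF
    return {
        "ip_hdr": int.from_bytes(header[10:12], "big"),
        "ip_calc": ip_calc,
        "tcp_hdr": int.from_bytes(segment[16:18], "big"),
        "tcp_calc": tcp_calc,
    }
```

Applied to the first SYN of each capture, both IPv4 header checksums verify (0xf730 and 0xf741), and so does the neighbour's TCP checksum (0xf9be). For the secondary's SYN, however, the recomputed TCP checksum is 0xb148, while the header carries 0xe396. Checksums of outgoing packets captured on the sending host are not meaningful when checksum offload is enabled, so this check only applies to received packets such as these SYNs.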
The good news is that this happens only with IPv4; the same link works over IPv6. That is good, because despite the errors the nameserver keeps working. It is also bad, because the problem may have been there for quite a while, and I have no idea when it appeared (I don't think it was there from the beginning; I should have noticed).