[Solved] Socket selectively ignores IPv4 (nameserver XFR failure)

I found that the connection from my secondary nameserver to the primary was producing errors: the primary ignores the XFR (zone transfer) connection attempts from the secondary.

The primary is running:

Code:
# ps ax | grep named
13667  -  IsJ  0:00.39 /usr/local/sbin/named -n 1 -u bind -c /usr/local/etc/namedb/named.conf

It has these listen addresses configured:

Code:
        listen-on port 53       { 192.168.97.24; };
        listen-on-v6 port 53    { fd00::118; };

And it seems to listen on these ports:

Code:
$ netstat -an
tcp6       0      0 fd00::118.53           *.*                    LISTEN
tcp4       0      0 192.168.97.24.53       *.*                    LISTEN
udp6       0      0 fd00::118.53           *.*
udp4       0      0 192.168.97.24.53       *.*

When I telnet to port 53 from a neighbouring node, it looks like this:

Code:
pmc@disp:511:1~$ telnet 192.168.97.24 53
Trying 192.168.97.24...
Connected to admn-e.intra.daemon.contact.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

That is what I normally expect. However, when I do the same from the secondary nameserver, it looks like this:

Code:
$ telnet 192.168.97.24 53
Trying 192.168.97.24...
telnet: connect to address 192.168.97.24: Operation timed out
telnet: Unable to connect to remote host

One would assume a network issue, but there is none: the packets do arrive at the destination and are simply ignored there.

This is how the working telnet connection from the neighbouring node looks at the destination:

Code:
# tcpdump -xxninadmn1l
20:40:01.604840 ARP, Request who-has 192.168.97.24 tell 192.168.97.18, length 46
        0x0000:  ffff ffff ffff 061d 9201 0222 0806 0001
        0x0010:  0800 0604 0001 061d 9201 0222 c0a8 6112
        0x0020:  0000 0000 0000 c0a8 6118 0000 0000 0000
        0x0030:  0000 0000 0000 0000 0000 0000
20:40:01.604913 ARP, Reply 192.168.97.24 is-at 06:1d:92:01:01:05, length 28
        0x0000:  061d 9201 0222 061d 9201 0105 0806 0001
        0x0010:  0800 0604 0002 061d 9201 0105 c0a8 6118
        0x0020:  061d 9201 0222 c0a8 6112
20:40:01.604993 IP 192.168.97.18.64497 > 192.168.97.24.53: Flags [S], seq 1911977491, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3668593643 ecr 0], length 0
        0x0000:  061d 9201 0105 061d 9201 0222 0800 4510
        0x0010:  003c 0000 4000 4006 f730 c0a8 6112 c0a8
        0x0020:  6118 fbf1 0035 71f6 7613 0000 0000 a002
        0x0030:  ffff f9be 0000 0204 05b4 0103 0306 0402
        0x0040:  080a daaa 4beb 0000 0000
20:40:01.605033 IP 192.168.97.24.53 > 192.168.97.18.64497: Flags [S.], seq 1403100325, ack 1911977492, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 244552087 ecr 3668593643], length 0
        0x0000:  061d 9201 0222 061d 9201 0105 0800 4500
        0x0010:  003c 0000 4000 4006 f740 c0a8 6118 c0a8
        0x0020:  6112 0035 fbf1 53a1 9ca5 71f6 7614 a012
        0x0030:  ffff 693c 0000 0204 05b4 0103 0306 0402
        0x0040:  080a 0e93 9197 daaa 4beb

And this is the connection from the secondary nameserver:

Code:
20:40:56.717735 IP 192.168.99.1.41219 > 192.168.97.24.53: Flags [S], seq 3201504914, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 67229358 ecr 0], length 0
        0x0000:  061d 9201 0105 061d 9201 0202 0800 4510
        0x0010:  003c 0000 4000 3e06 f741 c0a8 6301 c0a8
        0x0020:  6118 a103 0035 bed3 1692 0000 0000 a002
        0x0030:  ffff e396 0000 0204 05b4 0103 0306 0402
        0x0040:  080a 0401 d6ae 0000 0000
20:40:57.717784 IP 192.168.99.1.41219 > 192.168.97.24.53: Flags [S], seq 3201504914, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 67230358 ecr 0], length 0
        0x0000:  061d 9201 0105 061d 9201 0202 0800 4510
        0x0010:  003c 0000 4000 3e06 f741 c0a8 6301 c0a8
        0x0020:  6118 a103 0035 bed3 1692 0000 0000 a002
        0x0030:  ffff dfae 0000 0204 05b4 0103 0306 0402
        0x0040:  080a 0401 da96 0000 0000
20:40:59.917842 IP 192.168.99.1.41219 > 192.168.97.24.53: Flags [S], seq 3201504914, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 67232558 ecr 0], length 0
        0x0000:  061d 9201 0105 061d 9201 0202 0800 4510
        0x0010:  003c 0000 4000 3e06 f741 c0a8 6301 c0a8
        0x0020:  6118 a103 0035 bed3 1692 0000 0000 a002
        0x0030:  ffff d716 0000 0204 05b4 0103 0306 0402
        0x0040:  080a 0401 e32e 0000 0000

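The packets do appear at the destination, but nothing is done with them: no SYN-ACK goes back, and the client retransmits the same SYN after about 1 s and again about 2.2 s later.
I did switch the local firewall to pass-through, so that cannot be the reason.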

I have no idea what else in a machine could silently swallow packets, and do so selectively, only for the connections I need.
The good thing is that this happens only with IPv4; the same link over IPv6 does work. That is good, because the nameserver keeps working while it spits errors. It is also bad, because the problem may have been there for a long time already, and I have no idea when it appeared (I don't think it was there from the beginning; I should have noticed).
 
This turned out to be a checksum error on the packets, originally caused by libalias.
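The checksum error can be confirmed offline from the two `tcpdump -xx` captures above. Below is a minimal Python sketch, assuming each dump is one complete frame with an untagged 14-byte Ethernet header followed by IPv4 and TCP; the helper names `parse_xx_dump`, `ones_complement_sum` and `check_frame` are hypothetical, written only for this illustration. It verifies the IPv4 header checksum and the TCP checksum (computed over the TCP pseudo-header plus segment) with the RFC 1071 one's complement sum. On these two captures the IPv4 header checksum is valid in both SYNs, but the TCP checksum of the SYN from the secondary is not, which is consistent with the primary's TCP stack discarding it without any answer.

```python
import struct

# Hex lines copied verbatim from the two `tcpdump -xx` captures above.
NEIGHBOUR_SYN = """
0x0000:  061d 9201 0105 061d 9201 0222 0800 4510
0x0010:  003c 0000 4000 4006 f730 c0a8 6112 c0a8
0x0020:  6118 fbf1 0035 71f6 7613 0000 0000 a002
0x0030:  ffff f9be 0000 0204 05b4 0103 0306 0402
0x0040:  080a daaa 4beb 0000 0000
"""

SECONDARY_SYN = """
0x0000:  061d 9201 0105 061d 9201 0202 0800 4510
0x0010:  003c 0000 4000 3e06 f741 c0a8 6301 c0a8
0x0020:  6118 a103 0035 bed3 1692 0000 0000 a002
0x0030:  ffff e396 0000 0204 05b4 0103 0306 0402
0x0040:  080a 0401 d6ae 0000 0000
"""


def parse_xx_dump(dump):
    """Turn the hex lines of a `tcpdump -xx` dump into raw frame bytes."""
    hex_digits = ""
    for line in dump.strip().splitlines():
        # Drop the "0x0010:" offset column, keep the 16-bit hex groups.
        hex_digits += "".join(line.split(":", 1)[1].split())
    return bytes.fromhex(hex_digits)


def ones_complement_sum(data):
    """16-bit one's complement sum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total > 0xFFFF:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def check_frame(frame):
    """Return (ip_ok, tcp_ok) for an untagged Ethernet/IPv4/TCP frame."""
    ip = frame[14:]                   # skip the 14-byte Ethernet header
    ihl = (ip[0] & 0x0F) * 4          # IPv4 header length in bytes
    total_len = struct.unpack("!H", ip[2:4])[0]
    # Summing a header including a correct checksum field yields 0xFFFF.
    ip_ok = ones_complement_sum(ip[:ihl]) == 0xFFFF

    tcp = ip[ihl:total_len]
    # TCP pseudo-header: source and destination address, zero, protocol, length.
    pseudo = ip[12:20] + struct.pack("!BBH", 0, ip[9], len(tcp))
    tcp_ok = ones_complement_sum(pseudo + tcp) == 0xFFFF
    return ip_ok, tcp_ok


for name, dump in [("neighbour SYN", NEIGHBOUR_SYN),
                   ("secondary SYN", SECONDARY_SYN)]:
    ip_ok, tcp_ok = check_frame(parse_xx_dump(dump))
    print(f"{name}: IPv4 header checksum {'ok' if ip_ok else 'BAD'}, "
          f"TCP checksum {'ok' if tcp_ok else 'BAD'}")
# neighbour SYN: IPv4 header checksum ok, TCP checksum ok
# secondary SYN: IPv4 header checksum ok, TCP checksum BAD
```

On the receiving host, running tcpdump with `-v` should also flag such a segment by printing `incorrect` next to its `cksum` value, without the need to decode the hex by hand.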
 
Nothing to do with the issue, but did you know that instead of telnet you can simply check whether the port accepts connections with nc -zv 192.168.97.24 53?

Code:
     -v      Have nc give more verbose output.
Code:
     -z      Specifies that nc should just scan for listening daemons, without
             sending any data to them.  It is an error to use this option in
             conjunction with the -l option.
nc(1)
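The `nc -z` check can also be scripted, for example to run it repeatedly from the secondary. Below is a minimal Python sketch using only the standard `socket` module; `probe_tcp` is a hypothetical helper, not part of nc. Unlike a plain yes/no, it separates the three outcomes relevant to this thread: a completed handshake (the neighbouring node), an active refusal, and no answer at all (the secondary's "Operation timed out").

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try a TCP handshake with host:port without sending data (like `nc -z`).

    Returns "open" if the handshake completes, "refused" if the connection
    is actively rejected (typically a RST), and "timeout" if nothing comes
    back within `timeout` seconds. Other errors (e.g. no route) propagate.
    """
    try:
        # The socket is closed again right after the handshake succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
```

Run on the secondary while the fault was present, `probe_tcp("192.168.97.24", 53)` would be expected to return `"timeout"`, matching the telnet output above, while from the neighbouring node it should return `"open"`.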
 