Local network with DNS problem

I have a strange local network configuration to deal with. The connection is NATed in a 172.16 segment, and many useful ports are closed (no NTP service, for example). The only usable DNS servers are the two addresses in that segment handed out by DHCP, something like 172.16.1.254; there is no access to external DNS servers.

Still, I want a caching DNS server on my machine, maybe also for a 192.168 WiFi subnet (that is the next step, with its own additional problems).
But neither bind/named nor unbound works as expected.

I managed to start named, configured with a "forward" directive pointing to the known local DNS resolvers, but
Code:
dig @localhost google.com
always gives no answer (a SERVFAIL response), although
Code:
dig @172.16.1.254 google.com
has no problem resolving that name.

Is there a way to get a working local caching DNS server on my machine? And how do I diagnose what is really wrong?
 
Have you tried a simple nameserver directive in /etc/resolv.conf, i.e. nameserver 172.16.1.254?
Yes, this works. But it does not give me a local caching DNS, and every request still goes upstream to that 172.16 address, doesn't it?
 
I edited my original post when I realised you wanted to do caching (while you were responding). If you can get to 172.16.1.254 for DNS queries without problems, then you should be able to do a caching name server. The method I use on my Raspberry Pi is to use dnsmasq(8). Here is my dnsmasq.conf:
Code:
domain-needed
bogus-priv
strict-order
no-resolv
server=43.229.60.176
server=1.0.0.1
server=208.67.220.220
server=8.8.4.4
server=9.9.9.9
listen-address=127.0.0.1
listen-address=192.168.1.254
cache-size=10000
no-negcache
conf-dir=/etc/dnsmasq.d
dhcp-mac=set:client_is_a_pi,B8:27:EB:*:*:*
dhcp-reply-delay=tag:client_is_a_pi,2
dhcp-name-match=set:wpad-ignore,wpad
dhcp-ignore-names=tag:wpad-ignore
I have removed the lines relating to the (optional) DHCP service. Note that my subnet is 192.168.1.0/24 and the DNS cache service listens on 192.168.1.254. You can ignore the two lines relating to "client_is_a_pi". Also, your "conf-dir" will probably be /usr/local/etc/dnsmasq.d. You would also need to change the "server=" lines to point to your DNS server(s).

Also, /etc/resolv.conf needs to point to the local host:
Code:
$ cat /etc/resolv.conf
nameserver 127.0.0.1
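
Adapted to the network described in the first post, a stripped-down version of the config above might look like the sketch below. It assumes 172.16.1.254 is one of the DHCP-supplied resolvers (the second address is not given in the thread, so it is left out), and it writes to a scratch file (the `CONF` path is made up for illustration) rather than to the live config.

```shell
#!/bin/sh
# Sketch only: write a forwarding-and-caching dnsmasq config to a scratch file.
# CONF is a hypothetical path; review the result before copying it to
# /usr/local/etc/dnsmasq.conf.
CONF="${CONF:-/tmp/dnsmasq.conf.example}"

cat > "$CONF" <<'EOF'
# Never forward plain names or private reverse lookups upstream
domain-needed
bogus-priv
# Ignore /etc/resolv.conf; use only the server= lines below
no-resolv
# Upstream resolver handed out by DHCP (example address from the thread)
server=172.16.1.254
# Answer local queries only
listen-address=127.0.0.1
cache-size=10000
EOF

echo "wrote $CONF"
```

Running `dnsmasq --test -C "$CONF"` should syntax-check the file without starting the daemon.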
 
Try dig @127.0.0.1 google.com or dig @localhost -4 google.com. Your name server is not listening on IPv6.
 
My nameserver (now) listens on IPv6. If it did not, I would get a "no servers could be reached" error, not a response with "SERVFAIL" in the status field and "QUERY: 1, ANSWER: 0" in the flags line.
And yes, I tried both "-4" and "127.0.0.1", without luck.
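
The distinction drawn here (a SERVFAIL response versus no response at all) can be checked mechanically. A minimal sketch, assuming dig's usual `status:` header line; `dig_status` is a helper name made up for this example.

```shell
#!/bin/sh
# Sketch: report what the server actually said in a dig run.
# dig_status reads dig output on stdin and prints the status field from the
# response header (NOERROR, SERVFAIL, ...), or NOREPLY when no response
# header is present (e.g. "no servers could be reached").
dig_status() {
    awk '/status:/ { sub(/.*status: /, ""); sub(/,.*/, ""); print; found = 1; exit }
         END { if (!found) print "NOREPLY" }'
}

# Usage (addresses are the ones from the thread):
#   dig @127.0.0.1 google.com | dig_status
#   dig @172.16.1.254 google.com | dig_status
```

A SERVFAIL from 127.0.0.1 alongside a NOERROR from the upstream address means named is reachable and answering, so the fault lies in how it processes the upstream reply rather than in the transport.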
 
For a caching name server, I would recommend dnsmasq(8).
Isn't dnsmasq also a DHCP server? I'll try it, although I'm more accustomed to the ISC DHCP server and the BIND DNS server, which worked fine for me at home when I used them to turn my FreeBSD machine into a WiFi AP.

I have problems with DHCP too.
For some reason ifconfig shows two IPv4 addresses on the wired interface, one of them 0.0.0.0. I'm trying dhcpcd(8) now, without much improvement though.
 
I can't reproduce your problem. Though I'm running bind918.
I can't reproduce it on my home machine either. So the question is how to debug it, diagnose the problem, and solve it.

It looks very strange to me that I have put working local DNS server addresses in the forwarders section of named.conf, yet I get a SERVFAIL answer from it.

I rebuilt the bind918 port a dozen times, even with different compilers. Nothing changed.
 
tcpdump -i lo0 udp port 53 or tcp port 53

This should give you an idea where to look next.
 
Code:
15:33:57.126723 IP (tos 0x0, ttl 64, id 56596, offset 0, flags [none], proto UDP (17), length 79, bad cksum 0 (->9f87)!)
    localhost.41482 > localhost.domain: 44668+ [1au] A? google.com. (51)
15:33:57.134712 IP (tos 0x0, ttl 64, id 37103, offset 0, flags [none], proto UDP (17), length 95, bad cksum 0 (->eb9c)!)
    localhost.domain > localhost.41482: 44668 ServFail 0/0/1 (67)
15:33:57.503246 IP (tos 0x0, ttl 64, id 37104, offset 0, flags [none], proto UDP (17), length 79, bad cksum 0 (->ebab)!)
    localhost.38457 > localhost.domain: 1225+ [1au] PTR? 1.0.0.127.in-addr.arpa. (51)
15:33:57.503482 IP (tos 0x0, ttl 64, id 56600, offset 0, flags [none], proto UDP (17), length 102, bad cksum 0 (->9f6c)!)
    localhost.domain > localhost.38457: 1225* 1/0/1 1.0.0.127.in-addr.arpa. PTR localhost. (74)
15:33:59.430778 IP6 (flowlabel 0x53082, hlim 64, next-header UDP (17) payload length: 59) localhost.36292 > localhost.domain: [bad udp cksum 0x004e -> 0x1c4b!] 11412+ [1au] A? google.com. (51)
15:33:59.432794 IP6 (hlim 64, next-header UDP (17) payload length: 75) localhost.domain > localhost.36292: [bad udp cksum 0x005e -> 0x9007!] 11412 ServFail 0/0/1 (67)
15:33:59.502724 IP (tos 0x0, ttl 64, id 37106, offset 0, flags [none], proto UDP (17), length 129, bad cksum 0 (->eb77)!)
    localhost.19603 > localhost.domain: 15950+ [1au] PTR? 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa. (101)
15:33:59.502954 IP (tos 0x0, ttl 64, id 56601, offset 0, flags [none], proto UDP (17), length 152, bad cksum 0 (->9f39)!)
    localhost.domain > localhost.19603: 15950* 1/0/1 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa. PTR localhost. (124)

I'm in doubt. What is the UDP checksum error, and where could it come from?
 
Your problem could be anywhere. Apps doing DNS lookups and NTP simply open UDP sockets and let the kernel do the rest. DHCP, OTOH, does funky things with packets, so it needs raw sockets to manipulate them.

All we see ATM is a snippet of tcpdump output. My wild guess is that you may have a driver or hardware problem. If you do have a hardware issue, disabling hardware checksum offload (rxcsum, rxcsum6, txcsum, and txcsum6) may be a workaround.

It may not be a hardware problem but the driver may not be communicating with your NIC properly.

Which NIC are the packets passing through? Is it a member of a lagg(4) or bridge(4)?

Would you by chance be using any of the packet filters (ipfw, pf, ipfilter)?
 
Well, as far as I can see, the loopback interface does not compute checksums, and that's O.K. The same happens on my well-behaved home computer: it also shows "bad cksum" on localhost, but with no ServFail and with good answers from bind/named.

No lagg or bridge for now (I am only planning to bridge the Ethernet and wireless interfaces), and the same DNS problem occurred over wlan (ath and run) and now occurs over Ethernet (age). As far as I am aware, no packet filters are active. How can I be totally sure?
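
One way to check, sketched below, is to look for firewall modules in the kernel and for the matching rc.conf knobs. `fw_modules` is a made-up helper, and the module names it greps for (pf, ipfw, and ipl/ipfilter for IPFilter) are assumed to be the usual FreeBSD ones.

```shell
#!/bin/sh
# Sketch: pick firewall-related module names out of `kldstat -v` output.
# The name list is an assumption; extend it if your system uses others.
fw_modules() {
    grep -Eow 'pf|ipfw|ipl|ipfilter' | sort -u
}

# Usage on the affected machine:
#   kldstat -v | fw_modules     # empty output: none loaded or compiled in
#   sysrc -a | grep -E '^(pf|firewall|ipfilter)_enable'
```

If both commands print nothing, no packet filter is loaded or enabled at boot, which matches the later remark that an unconfigured firewall cannot be the cause.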
 
I missed that this was on lo0.

If you haven't set up a packet filter (kernel firewall), then it's not an issue.

I suspect some kind of corruption somewhere. Bad RAM???

If this is a recently updated system could you have been bitten by one of the ZFS regressions?
 
If it were bad RAM, it would show up in other activities, such as compiling the kernel, world, and a bunch of ports, wouldn't it? And as far as I remember, memcheck was O.K.

The system has only a UFS disk.

And how do I check that there is no filtering?
Are there any debugging options for bind itself?
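
On the BIND side, one option is a dedicated log channel for the resolver and DNSSEC categories. The fragment below is a sketch: `LOGCONF` and the log file path are made-up examples, and the categories are standard BIND 9 ones.

```shell
#!/bin/sh
# Sketch: write a named.conf logging fragment to a scratch file.
# LOGCONF is a hypothetical path; the log file named inside the fragment is
# also an example and must be writable by the user named runs as.
LOGCONF="${LOGCONF:-/tmp/named-logging.conf.example}"

cat > "$LOGCONF" <<'EOF'
logging {
    channel debug_log {
        file "/var/log/named-debug.log" versions 3 size 5m;
        severity debug 3;
        print-time yes;
        print-category yes;
    };
    category resolver     { debug_log; };
    category dnssec       { debug_log; };
    category lame-servers { debug_log; };
};
EOF

echo "wrote $LOGCONF"
```

The fragment can be pulled into named.conf with an `include` statement and checked with `named-checkconf`. Debug-level messages only appear once named is in debugging mode, for example after `rndc trace 3`; alternatively, `named -g -d 3` runs it in the foreground and sends all logging to stderr.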
 
It's hard to tell without more information.

Are there any out of the ordinary messages in /var/log/messages?

Do you see any other packets with bad checksums?

Have you tried disabling rxcsum, rxcsum6, txcsum, and txcsum6, as suggested previously?
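
Spelled out, the suggestion amounts to one ifconfig call per interface. The sketch below only prints the command rather than running it; `age0` is an example name based on the age driver mentioned earlier in the thread, and a driver without IPv6 offload support may reject the `*6` flags.

```shell
#!/bin/sh
# Sketch: build the ifconfig command that turns off checksum offload for
# one interface. It is printed, not executed, so nothing changes until the
# output is run as root.
csum_off_cmd() {
    printf 'ifconfig %s -rxcsum -txcsum -rxcsum6 -txcsum6\n' "$1"
}

csum_off_cmd age0
```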
 
Hoorah! The problem was that the DNS servers provided on the local network do not return responses that named can trust.
"broken trust chain" appeared in the logs once I managed to configure logging (I'm not too friendly with bind).
And only "dnssec-validation no;" in named's options solved the problem (somehow, although without trust).
Maybe I should consider using DNS-over-TLS or DNS-over-HTTPS, shouldn't I? Does anyone have experience with it, or a tutorial?
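
For anyone landing here with the same symptom, the fix described in this post might look roughly like the fragment below. This is a sketch under assumptions, not the poster's actual named.conf: 172.16.1.254 is the example resolver address from the first post, `forward only;` is a guess at the "forward" directive mentioned there, and `NAMEDCONF` is a made-up scratch path so that nothing live gets overwritten.

```shell
#!/bin/sh
# Sketch only: write a forwarding named.conf options fragment with DNSSEC
# validation turned off. NAMEDCONF is a hypothetical scratch path.
NAMEDCONF="${NAMEDCONF:-/tmp/named-options.conf.example}"

cat > "$NAMEDCONF" <<'EOF'
options {
    // Forward everything to the DHCP-supplied resolver (example address)
    forwarders { 172.16.1.254; };
    forward only;
    // Upstream answers fail validation ("broken trust chain"), so skip it
    dnssec-validation no;
    listen-on    { 127.0.0.1; };
    listen-on-v6 { ::1; };
};
EOF

echo "wrote $NAMEDCONF"
```

`named-checkconf "$NAMEDCONF"` checks the syntax. To confirm that validation is the cause before switching it off, compare `dig @127.0.0.1 google.com` with `dig +cdflag @127.0.0.1 google.com`: if only the second one returns an answer, named is most likely rejecting the upstream data at the validation step.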
 