Solved Local unbound DNS cache returns SERVFAIL for my own domain

I have a couple of jails in a bridged network:
  • ns0.test.domain (10.0.48.10): authoritative DNS server with dns/nsd
  • ns1.test.domain (10.0.48.11): DNS cache with dns/unbound; queries for the zone "test.domain." go to ns0, and everything else is forwarded to outside Internet DNS servers.
  • ci.test.domain (10.0.48.43): uses local_unbound as a host-local DNS cache, forwarding everything to ns1. (I know that a DNS cache on ci is unnecessary; I just want to understand why it does not work.)
ns0 and ns1 work exactly as expected. ns1 caches everything correctly and answers queries for "test.domain." as well as external queries.
The local_unbound on ci, however, cannot resolve anything under "test.domain.", although it is supposed to forward all queries to ns1. The return code is SERVFAIL, and I do not understand why.

Here is a log demonstrating the issue:
Bash:
# jls
   JID  IP Address      Hostname                      Path
   ...
    53                  ci.test.domain                /.../ci.test.domain/mnt

# jexec ci_test_domain

root@ci:/ # cat /etc/resolv.conf
search test.domain
# nameserver 10.0.48.11
nameserver 127.0.0.1
options edns0
The initial local_unbound setup commented out ns1 and set 127.0.0.1 as expected.
Bash:
root@ci:/ # cat /etc/resolvconf.conf
# This file was generated by local-unbound-setup.
# Modifications will be overwritten.
resolv_conf="/dev/null" # prevent updating /etc/resolv.conf
unbound_conf="/var/unbound/forward.conf"
unbound_pid="/var/run/local_unbound.pid"
unbound_service="local_unbound"
unbound_restart="service local_unbound reload"

root@ci:/ # sysrc local_unbound_enable
local_unbound_enable: YES

root@ci:/ # service local_unbound status
local_unbound is running as pid 71504.
local_unbound is enabled and running.
Bash:
root@ci:/ # ls -lad /etc/unbound
lrwxr-xr-x  1 root  wheel  14 23 Okt.  2020 /etc/unbound -> ../var/unbound

root@ci:/ # ls /etc/unbound/
conf.d         control.conf   forward.conf   lan-zones.conf root.key       unbound.conf

root@ci:/ # cat /etc/unbound/forward.conf
# This file was generated by local-unbound-setup.
# Modifications will be overwritten.
forward-zone:
        name: .
        forward-addr: 10.0.48.11
Unbound is automatically configured to forward all queries to ns1 (as expected).
Bash:
root@ci:/ # ping 10.0.48.11
PING 10.0.48.11 (10.0.48.11): 56 data bytes
64 bytes from 10.0.48.11: icmp_seq=0 ttl=64 time=0.197 ms
64 bytes from 10.0.48.11: icmp_seq=1 ttl=64 time=0.248 ms
64 bytes from 10.0.48.11: icmp_seq=2 ttl=64 time=0.563 ms
64 bytes from 10.0.48.11: icmp_seq=3 ttl=64 time=0.367 ms
^C
--- 10.0.48.11 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.197/0.344/0.563/0.141 ms
The network connection to ns1 works properly.
Bash:
root@ci:/ # drill ci.test.domain
;; ->>HEADER<<- opcode: QUERY, rcode: SERVFAIL, id: 47176
;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; ci.test.domain.      IN      A

;; ANSWER SECTION:

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 24 msec
;; SERVER: 127.0.0.1
;; WHEN: Mon Sep 27 18:23:23 2021
;; MSG SIZE  rcvd: 32
Resolving ci.test.domain fails: local_unbound returns SERVFAIL.
Bash:
root@ci:/ # drill @10.0.48.11 ci.test.domain
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 10855
;; flags: qr rd ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; ci.test.domain.      IN      A

;; ANSWER SECTION:
ci.test.domain. 46085   IN      CNAME   test.domain.
test.domain.    46085   IN      A       10.0.48.43

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 0 msec
;; SERVER: 10.0.48.11
;; WHEN: Mon Sep 27 18:23:32 2021
;; MSG SIZE  rcvd: 62
Resolving works when querying ns1 directly.
Bash:
root@ci:/ # drill www.google.com
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 42504
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; www.google.com.      IN      A

;; ANSWER SECTION:
www.google.com. 300     IN      A       172.217.23.100

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 156 msec
;; SERVER: 127.0.0.1
;; WHEN: Wed Sep 29 18:35:32 2021
;; MSG SIZE  rcvd: 48
Resolving an external domain name through local_unbound also works.

What am I missing in this picture?

P.S. I searched for similar threads but could not find any. I do not use DNSSEC (at least not intentionally).
 
Try removing that options edns0 line from your /etc/resolv.conf.
SirDice, thanks for the suggestion. I tried that, but the behavior did not change :(
Bash:
 # service jail restart ci.test.domain
Stopping jails: ci_test_domain.
Starting jails: ci_test_domain.
 # jexec ci_test_domain
root@ci:/ # sysrc local_unbound_enable
local_unbound_enable: YES
root@ci:/ # service local_unbound status
local_unbound is running as pid 55772.
root@ci:/ # cat /etc/resolv.conf
# Generated by lansnap, Mon 27 Sep 2021 18:45:52 CEST
search test.domain
# nameserver 10.0.48.11
nameserver 127.0.0.1
# options edns0
root@ci:/ # host ci.test.domain
Host ci.test.domain not found: 2(SERVFAIL)
My assumption is that the local_unbound on ci is misconfigured, because resolving works as soon as I point resolv.conf at ns1 instead of 127.0.0.1 (even with the edns0 option).
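One way to test that assumption is to repeat the failing query with the CD (checking disabled) header bit set and compare the response codes. This is only a sketch, and it assumes that DNSSEC validation inside local_unbound is involved, which nothing at this point confirms. drill sets header bits with its -o option. The helper name rcode_of is hypothetical; it only pulls the rcode out of drill's header line.

```shell
#!/bin/sh
# rcode_of (hypothetical helper): read drill output on stdin and print
# the rcode from the ";; ->>HEADER<<-" line, e.g. SERVFAIL or NOERROR.
rcode_of() {
    sed -n 's/^;; ->>HEADER<<- .*rcode: \([A-Z0-9]*\),.*/\1/p'
}

# Intended use inside the jail (not executed here):
#   drill ci.test.domain       | rcode_of   # normal query via 127.0.0.1
#   drill -o CD ci.test.domain | rcode_of   # same query, CD bit set
```

If the first command prints SERVFAIL and the second prints NOERROR, that would point at the validator rather than at the forwarding setup.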
 
Update: I have tried the workaround proposed in this thread: https://forums.freebsd.org/threads/unbound-fails-to-resolve-some-hostnames.53269/#post-299270
I changed the module-config to "iterator" and now resolving works:
Bash:
root@ci:/ # cat /etc/resolv.conf
search test.domain
# nameserver 10.0.48.11
nameserver 127.0.0.1
# options edns0

root@ci:/ # cat /etc/unbound/conf.d/test.domain.conf
server:
    module-config: "iterator"

root@ci:/ # host ci.test.domain
ci.test.domain is an alias for test.domain.
test.domain has address 10.0.48.43

root@ci:/ # drill ci.test.domain
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 51868
;; flags: qr rd ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; ci.test.domain.      IN      A

;; ANSWER SECTION:
ci.test.domain. 86330   IN      CNAME   test.domain.
test.domain.    86330   IN      A       10.0.48.43

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 0 msec
;; SERVER: 127.0.0.1
;; WHEN: Fri Oct  1 13:54:53 2021
;; MSG SIZE  rcvd: 62

If I understand correctly, this turns off DNSSEC validation in the caching resolver: Unbound's default module-config is "validator iterator", so setting it to "iterator" alone drops the validator module. I am not sure why validation makes unbound return SERVFAIL for my own zone. Anyway, I will mark the thread as SOLVED, but if anyone can shed some more light on this, it would be awesome.
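A narrower alternative to dropping the validator globally, offered as an untested sketch: Unbound's domain-insecure option exempts a single zone from DNSSEC validation while the validator stays active for everything else. A plausible cause of the SERVFAIL, not confirmed in this thread, is that the validator cannot build a chain of trust from the signed root to the private, unsigned zone "test.domain.". The helper name write_insecure_zone and the file name insecure-zone.conf are hypothetical; the directory /var/unbound/conf.d is the one shown in the listing above.

```shell
#!/bin/sh
# write_insecure_zone (hypothetical helper): write a drop-in that marks one
# zone as exempt from DNSSEC validation.
#   $1 = directory for drop-in files (assumed: /var/unbound/conf.d)
#   $2 = zone name with trailing dot, e.g. "test.domain."
write_insecure_zone() {
    conf_dir=$1
    zone=$2
    cat > "${conf_dir}/insecure-zone.conf" <<EOF
server:
    domain-insecure: "${zone}"
EOF
}

# Intended use inside the jail (not executed here):
#   write_insecure_zone /var/unbound/conf.d test.domain.
#   service local_unbound restart
```

For this to have any effect, the module-config override in conf.d/test.domain.conf would have to be removed again so that the validator is loaded.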
 