At home, I run a FreeBSD server that acts as router, firewall, and DNS/DHCP/NTP/... server. It has one internal ethernet port, to which all computers in the house are connected, and two external ethernet ports, one each for the old and the new internet provider: the old one has been in use for many years, the new one was added last week. I use pf both as a firewall (to make sure no obnoxious traffic enters, and to filter some undesired connections) and to provide NAT from the many internal hosts to the outside world. Selecting which internet provider to use requires only "route change default <which gateway>". All of that works fine, except that named fails. And without name service, nothing is really usable.
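As a sanity check on the switch itself, the sketch below pulls the interface name out of "route -n get default" output, so it can be compared with the interface named in the pf NAT rule. The gateway 203.0.113.1 and the interface em2 are placeholders, and default_iface is a hypothetical helper name, not part of any tool.

```sh
#!/bin/sh
# Print the interface name from `route -n get default` output read on stdin.
default_iface() {
    awk '$1 == "interface:" { print $2 }'
}

# Typical use on the router (placeholder gateway, needs root):
#   route change default 203.0.113.1
#   route -n get default | default_iface
#   pfctl -s nat     # check that the NAT rule names the same interface
```

If the NAT rule is tied to the old external interface, traffic leaving through the new one may go out with the wrong source address, so this is worth ruling out before looking at named.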
So let's talk about the DNS configuration. I run a full installation of bind, because (a) our external internet has always been unreliable, and I want the internal network to continue functioning, and (b) I have quite a few internal computers that should have names but are intentionally invisible from the outside world, so they are not listed in the public name service that runs in the cloud (that's called split-horizon DNS). The way named is configured is: it is the authoritative server for our internal domain zone (and the reverse zones that go with it), and otherwise it uses the default setup of going to the root servers (not to 8.8.8.8 or to our internet provider's DNS server). /etc/resolv.conf points at 192.168.0.1 as the name server, and the same address is given (via DHCP) to all internal clients.
Here is what goes wrong: If I just use the route command (above) to switch to the new internet connection, all DNS queries for external names fail; my DNS server returns rcode 2 (SERVFAIL). Restarting the named server (with "service named restart") after switching to the new connection does not help, but it at least gives me one extra message in the log: "managed-keys-zone: No DNSKEY RRSIGs found for '.': success" (whatever that might mean). If I don't restart named, it eventually puts out lots of warning messages: "validating <domainname>com.wlan0/NS: bad cache hit (wlan0/DS)". Again, I have no idea what this means, and note that my server does not have any WiFi hardware and no wlan0 device.
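Since both log messages mention DNSSEC validation, one way to narrow this down is to repeat a failing query with the CD (checking disabled) bit set and compare the two status codes. This is a minimal sketch: dig_status and classify are hypothetical helper names, and example.com is only a sample query name.

```sh
#!/bin/sh
# Extract the response status (NOERROR, SERVFAIL, ...) from dig output on stdin.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Compare a normal query with one that sets the CD (checking disabled) bit.
# $1 = status of the normal query, $2 = status of the +cd query.
classify() {
    case "$1:$2" in
        NOERROR:*)        echo "resolution works" ;;
        SERVFAIL:NOERROR) echo "DNSSEC validation is failing" ;;
        *)                echo "failure is not limited to validation" ;;
    esac
}

# Typical use against the local named:
#   normal=$(dig @192.168.0.1 example.com A | dig_status)
#   nocheck=$(dig @192.168.0.1 +cd example.com A | dig_status)
#   classify "$normal" "$nocheck"
```

If the +cd query succeeds while the normal one returns SERVFAIL, named can reach the outside world and only the validation step is rejecting the answers.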
My suspicion is that the DNS server has some form of internal cache (perhaps related to DNSSEC and keys), and that this cache records which external IP address the cached data was received from. And I suspect that the /usr/local/etc/namedb/working/managed-keys.bind file plays a role in that. I've tried flushing all caches and reloading the named daemon with "rndc flush" and "rndc reload", plus restarting it with "service named restart"; none of that helps.
I have verified that the new internet service passes all packets, including DNS packets to/from port 53, using nc. The problem seems to be solely on my end, with my named refusing to cooperate. So what am I doing wrong? How does named remember what IP address it saw recently, to know to behave badly when that IP address changes?
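A plain nc test only shows that port 53 is reachable; it does not show whether DNSSEC records survive the trip. A rough check, assuming dig is installed, is to ask a root server directly (bypassing named) for the root DNSKEY set and count the RRSIG records that come back, once over UDP and once over TCP. count_rrsig is a hypothetical helper name; 198.41.0.4 is a.root-servers.net.

```sh
#!/bin/sh
# Count RRSIG records in dig answer text read on stdin.
count_rrsig() {
    awk '$4 == "RRSIG" { n++ } END { print n + 0 }'
}

# Typical use, run on the router over each internet connection in turn:
#   dig @198.41.0.4 +norec +dnssec +ignore . DNSKEY | count_rrsig   # UDP only, no TCP retry
#   dig @198.41.0.4 +norec +dnssec +tcp . DNSKEY    | count_rrsig   # TCP
```

A count of 0 over the new connection but a non-zero count over the old one would suggest that something on the new path drops or rewrites the signed responses, which would be consistent with the "No DNSKEY RRSIGs found for '.'" message.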