DNSSEC stopped working.

byrnejb · Friday at 1:31 AM

bind918-9.18.28

We have had our DNSSEC setup working for years. At some point in the recent past it simply failed. The proximate cause is that the DS records at the .ca registrar no longer match what the master DNS zone is providing. There were no recent changes to the configuration files. Everything is in its expected place. I have run the zone through DNSViz and this is what it tells me:

ca to harte-lyne.ca: No valid RRSIGs made by a key corresponding to a DS RR were found covering the DNSKEY RRset, resulting in no secure entry point (SEP) into the zone. See RFC 4035, Sec. 2.2, RFC 6840, Sec. 5.11. (216.185.71.33, 216.185.71.34, UDP_-_EDNS0_4096_D_KN)

ca to harte-lyne.ca: The DS RRset for the zone included algorithm 8 (RSASHA256), but no DS RR matched a DNSKEY with algorithm 8 that signs the zone's DNSKEY RRset. See RFC 4035, Sec. 2.2, RFC 6840, Sec. 5.11. (216.185.71.33, 216.185.71.34, UDP_-_EDNS0_4096_D_KN)

harte-lyne.ca zone: The server(s) were not responsive to queries over UDP. See RFC 1035, Sec. 4.2. (216.185.71.133, 216.185.71.134)

harte-lyne.ca/A: No response was received from the server over UDP (tried 12 times). See RFC 1035, Sec. 4.2. (216.185.71.133, 216.185.71.134, UDP_-_NOEDNS_)

harte-lyne.ca/NS: No response was received from the server over UDP (tried 12 times). See RFC 1035, Sec. 4.2. (216.185.71.133, 216.185.71.134, UDP_-_NOEDNS_)

harte-lyne.ca/SOA: No RRSIG covering the RRset was returned in the response. See RFC 4035, Sec. 3.1.1. (216.185.71.33, 216.185.71.34, TCP_-_EDNS0_4096_D_N, UDP_-_EDNS0_4096_D_KN, UDP_-_EDNS0_4096_D_KN_0x20)

We have a network connectivity issue with our offsite servers (dns02 and dns04), so the no-response errors are expected.

Now the key signing key files have not changed for years. I am at a loss to understand what has happened. The only unusual event in the recent past is that a journal file became corrupted and prevented the master zone from loading. This was not caught until several days after it happened. But purging the jnl file and reloading cleared that issue.

I would appreciate very much if someone with more experience than me could shed some light on what has happened and how I go about curing it.

PMc · Friday at 11:31 AM

Hm hm, well, what has happened is, it is broken. But You knew that already.

From what I can see, your .ca registrar holds a DS record for key # 37852, while your DNS presents key # 57965.
How you managed to achieve that, I have no idea. If, as You say, this has run for years, then one of the two ends must have changed recently: either the registrar's record, or the key in your DNS.
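
To check such a mismatch locally: the key tag in a DS record is a simple checksum over the DNSKEY RDATA (RFC 4034, Appendix B, for algorithms other than 1), so it can be recomputed from the published key and compared with what the parent holds. A minimal Python sketch; the function names and the toy key are made up for illustration, and the real flags, protocol, algorithm and public-key fields would have to come from your own DNSKEY records.

```python
import base64
import struct


def key_tag(flags: int, protocol: int, algorithm: int, public_key_b64: str) -> int:
    """Key tag of a DNSKEY per RFC 4034 Appendix B (not valid for algorithm 1)."""
    # DNSKEY RDATA wire format: 2 bytes flags, 1 byte protocol,
    # 1 byte algorithm, then the raw public key.
    rdata = struct.pack("!HBB", flags, protocol, algorithm)
    rdata += base64.b64decode(public_key_b64)
    acc = 0
    for i, byte in enumerate(rdata):
        # Even offsets are the high byte of a 16-bit word, odd offsets the low byte.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def unmatched_ds_tags(ds_tags, dnskeys):
    """Return the DS key tags that no published DNSKEY corresponds to.

    ds_tags: key tags taken from the parent's DS records.
    dnskeys: (flags, protocol, algorithm, public_key_b64) tuples from the zone.
    """
    published = {key_tag(*key) for key in dnskeys}
    return sorted(set(ds_tags) - published)


# Toy two-byte "key" (hypothetical, only to show the arithmetic).
toy_key = (257, 3, 8, "AQI=")
print(key_tag(*toy_key))                          # 1291
print(unmatched_ds_tags([1291, 37852], [toy_key]))  # [37852]
```

An empty result means every DS at the parent has a matching DNSKEY by tag. A matching tag alone does not prove the digest matches, so this only narrows down where to look.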

I would suggest You work through Your change logs, compare with Your backup, or something along that line.

Generally, if the problem persists or goes deeper, I have experienced the BIND users mailing list as extremely helpful - the developers are there and are very engaged.

Now, as we are here already - when I approached the task of setting up DNSSEC, I decided to do a few things substantially differently, because I wanted to never run into a situation like this. (These ideas are probably not suitable for everybody, and they likely sacrifice performance for improved reliability.)

Don't run the DNSSEC key management inside the BIND server daemon. Run it on a separate instance that does key management only, and then just send the already-signed zone files to the primary server. That separate instance could even be disconnected from the network entirely, giving you kind-of military-grade security and no need for a hardware crypto device. This makes it a lot easier to understand what has happened when and why, because the concerns are strictly separated.
Make the keys redundant. This is problematic, because it makes the DNS replies big, and that means they have to switch to TCP, and that does really cost performance. But then, keys are redundant - if you lose connection to one key (like in this case), there is still another that works.
Run continuous rollover. Setting up a static config may seem easier, but then, if you ever need to do a rollover (key compromised or whatever), you're in hell, because you haven't done it in years. With continuous rollover, the rollover is what already happens, because it always happens.

That's how it looks: https://dnsviz.net/d/daemon.contact/dnssec/
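
The redundancy and continuous-rollover ideas above can be sketched as a staggered schedule in which a new key is published before the previous one is withdrawn, so at least two keys are visible at any time. This is only an illustration in Python: the names `KeyWindow`, `rolling_schedule` and `published_on` and the 60/30-day numbers are assumptions, and it ignores TTLs and the timing of DS updates at the parent, which a real rollover must respect.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class KeyWindow(NamedTuple):
    name: str
    publish: date  # first day the key is in the DNSKEY RRset
    retire: date   # first day the key is no longer published


def rolling_schedule(start: date, lifetime_days: int,
                     stagger_days: int, count: int) -> List[KeyWindow]:
    """Publish a new key every stagger_days; each key stays for lifetime_days."""
    return [
        KeyWindow(
            name=f"key{i}",
            publish=start + timedelta(days=i * stagger_days),
            retire=start + timedelta(days=i * stagger_days + lifetime_days),
        )
        for i in range(count)
    ]


def published_on(schedule: List[KeyWindow], day: date) -> List[str]:
    """Names of the keys that are published on the given day."""
    return [k.name for k in schedule if k.publish <= day < k.retire]


start = date(2024, 1, 1)
schedule = rolling_schedule(start, lifetime_days=60, stagger_days=30, count=4)
print(published_on(schedule, start + timedelta(days=45)))  # ['key0', 'key1']
print(published_on(schedule, start + timedelta(days=60)))  # ['key1', 'key2']
```

With a lifetime of twice the stagger interval, every day after the first interval has two overlapping keys, which is the property that keeps the zone valid when one key drops out.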
