How to make your software completely bugfree!

PMc · Nov 15, 2021

Okay, I'm getting error messages from named/BIND. They read
named[2453]: dnssec: warning: managed-keys-zone: Failed to create fetch for DNSKEY update

That definitely doesn't look good. It is related to dnssec, so something with the DNS security does not work.
Searching the internet for the message does not find anything useful - it seems nobody else gets that error. But I get it on every machine where a named is running, no matter how it is configured.

Looking into the code - a comment in the code just says

Code:

                /*
                 * Something is broken.
                 */

when the error message is triggered. Well then, who wouldn't have imagined that?

Now given that

my nameservers otherwise work just fine, and
according to the docs there is no special configuration needed for DNSSEC, and basic recursive operation should all work with the defaults,

it then follows that the thing that is broken here is, with the highest probability, the nameserver code itself.

So, friendly&helpful as I am, I decided to report the matter to the developers and ask why it happens (and why it apparently happens only here).

The answer I got was this:

We can’t really help you if you don’t share any details of your installation
and configuration (hint: You can use `named-checkconf -px` to scrub the
configuration).

So far, you shared a **single line** from the log and nothing else.

Alright, fair enough. I won't start by sending unsolicited bulk data anywhere, but it's fine when requested.
So I collected these checkconfs, added the full logfile data, and sent that back.
And now I got a different reply:

----- The following addresses had permanent fatal errors -----
<ondrej@isc.org>
(reason: 550 5.7.1 Blocked by SpamAssassin)
<bind-users@lists.isc.org>
(reason: 550 5.7.1 Blocked by SpamAssassin)

----- Transcript of session follows -----
... while talking to mx.pao1.isc.org.:
>>> DATA
<<< 550 5.7.1 Blocked by SpamAssassin
554 5.0.0 Service unavailable

So, what do we have here? I think we have the perfect scheme for 100% bugfree software:

require your users to send defects to a mailinglist
inform your users that defect reports will only be considered when they contain detailed configuration data
configure the spamblocker on the mailinglist so that it rejects mails with such detailed configuration data.

From then on, you will not be bothered by defect reports anymore!

So now, we have already seen many approaches by which the Ivory-Tower-League (aka the developer elite) manages to separate itself from the ordinary user-plebs and to disable working communication with the inferiors.
But this one I couldn't have imagined. It is full of elegance, truly recommendable, and actually brilliant. This would indeed fit into modern-day business-consulting.

OTOH, from the user perspective, concerning support... Stop, let's clarify this: this is NOT about "support". Support is something different, and as long as you don't pay for the service, you cannot expect support. Support is also not an issue, because you have the sourcecode and could always DIY - to just make the thing somehow work.

So this is about something different: this is about the developers not allowing people to tell them where their crap is broken.
Now this is understandable: all people want to feel great, and don't like critical statements - and defect reports are in fact critical statements - even worse: critical statements from the user-plebs!
But then, covering this with a statement like "we are not able to support you" and trying to turn things around that way - well, that is simply a lie.

Jose · Nov 15, 2021

Ghostbusters code comments!

There's something weird, and it don't look good...

mer · Nov 15, 2021

Having written lots of code over the years, I've come across a lot of comments "should never get here" and you wind up seeing the log "you got here".

Now sometimes the problem happens because other people change code. Adding to a switch statement and forgetting to put a break; in? Yeah, weird stuff happens.

PMc · Nov 15, 2021

That post fired too early, it was posted when only half-written.

mer · Nov 15, 2021

PMc your last couple sentences, true. Part of the problem is that when developing code one often gets a sense of infallibility. Unit testing is typically testing for success. "let me test this value in the valid range". Good QA goes and tests "what about values outside the range? +/- 1, way out, etc" Lots and lots of code bugs are "off by one".
Human nature makes it hard to accept "You made a mistake". Even harder when it's in public (mailing lists). Good developers (that I've worked with) go "oh crap, you think I mucked up? I don't think so, but if so, prove it" Sounds harsh, but if you do prove it, about all you get is "oh crap. right. thanks" and maybe a mention in a bug report.
Fair? No, but too often bug reports are "this sucks, fix it" and even the best get jaded after a bit.

Please note: I am in no way excusing any bad or suspect behavior, just that human nature seems to abhor admitting fault.
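
A minimal sketch of the difference between "testing for success" and testing the edges, as described above. The range 1..100 and both function names are made up for illustration; the buggy variant contains a deliberate off-by-one.

```python
def in_range_buggy(value, low=1, high=100):
    """Hypothetical check with a deliberate off-by-one: the upper edge is rejected."""
    return low <= value < high


def in_range(value, low=1, high=100):
    """Hypothetical check with both ends inclusive."""
    return low <= value <= high


# Expected results at each edge, one step either side, and far outside the range.
BOUNDARY_CASES = {
    -1000: False,
    0: False,
    1: True,
    2: True,
    99: True,
    100: True,
    101: False,
    1000: False,
}


def failures(check):
    """Return the inputs for which `check` disagrees with the expected result."""
    return [value for value, expected in BOUNDARY_CASES.items() if check(value) != expected]


# A "test for success" with a mid-range value passes for both variants ...
print(in_range_buggy(50), in_range(50))  # True True
# ... while the boundary cases expose the off-by-one.
print(failures(in_range_buggy))  # [100]
print(failures(in_range))        # []
```

The mid-range check cannot tell the two variants apart; only the value sitting exactly on the upper edge does.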

kpedersen · Nov 15, 2021

Hehe, that is amusingly unfortunate. However, possibly they just have a poorly thought out spam filter (I find email is getting more and more flakey these days as Google, Outlook and friends are slowly reducing all credibility). Perhaps pastebin and send a link?

PMc · Nov 15, 2021

kpedersen said:
Hehe, that is amusingly unfortunate. However, possibly they just have a poorly thought out spam filter

Yes, probably they have. Or probably, the data (which they explicitly requested, and which was autocreated with their tool!), namely my nameserver configuration, contains some 20'000 zones with the names of all known spammers, in order to block resolution of these names.
These lists are public on the internet, so this is a known usecase of a nameserver.

(I nowadays often start asking myself: am I the only one remaining who does use that grey stuff between the ears?)

kpedersen said:
Perhaps pastebin and send a link?

I am very tired of this all. I am tired of always being brought to my knees only to be allowed to report a defect. As if it were a gratitude from the developers to listen to it.

eternal_noob · Nov 15, 2021

PMc said:
Ivory-Tower-League

PMc said:
As if it were a gratitude from the developers to listen to it.

Yes. That's why I will never open a PR again. They just don't take you seriously.

PMc · Nov 15, 2021

mer said:
PMc your last couple sentences, true. Part of the problem is that when developing code one often gets a sense of infallibility. Unit testing is typically testing for success. "let me test this value in the valid range". Good QA goes and tests "what about values outside the range? +/- 1, way out, etc" Lots and lots of code bugs are "off by one".

That's too true. Coding something that does something is rather simple. Covering the corner cases, so that it can actually run in production, is a multiple of that work.
And testing is high art, and, as I think, overvalued: you can only test against things that you already know - and when you already know them, you could as well just code them correctly in the first place. Therefore, the main usefulness of testing is to protect against regressions.

mer said:
Human nature makes it hard to accept "You made a mistake". Even harder when it's in public (mailing lists). Good developers (that I've worked with) go "oh crap, you think I mucked up? I don't think so, but if so, prove it" Sounds harsh, but if you do prove it, about all you get is "oh crap. right. thanks" and maybe a mention in a bug report.
Fair? No, but too often bug reports are "this sucks, fix it" and even the best get jaded after a bit.

I'm well with You. And what makes the issue more dramatic is the hierarchy, the separation between producers and mere consumers. Because, when you have a hierarchy, you must be better than those below: you cannot admit faults.
In the old times all of us were just computer people, were somehow equal, and there was not much of a problem with fixing the stuff, because all of us knew that it is not yet perfect by any means.
Nowadays it is still not yet perfect by any means, but nowadays it is big business, and some people make billions with it and therefore propagate the lie that it all is perfect if only you decide to buy their crap.
And that does certainly not improve things.

mer · Nov 15, 2021

Sometimes the hardest thing is letting go. I know I've done that a time or two, but step back, deep breath it's all good.

PMc · Nov 15, 2021

eternal_noob said:
Yes. That's why I will never open a PR again. They just don't take you seriously.

Yes, I need actionable wording. Not sure what's on Your mind.
In this specific case, I have the error message appear every time named is started. From the code I can see, the task will then be retried after one hour - in accordance with RFC 5011. There is no more error message after that hour, and from the timestamps on the files I can see that the task is then actually done and appears to be successful.
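
For reference, the timing rules as I read them in RFC 5011, section 2.3, written out as a small Python sketch. The function and parameter names are mine, and whether named computes its timers exactly this way is an assumption, not something verified against the BIND source here.

```python
HOUR = 3600          # seconds
DAY = 24 * HOUR


def query_interval(orig_ttl, rrsig_expiration_interval):
    """Regular refresh: MAX(1 hr, MIN(15 days, 1/2*OrigTTL, 1/2*RRSigExpirationInterval))."""
    return max(HOUR, min(15 * DAY, orig_ttl // 2, rrsig_expiration_interval // 2))


def retry_time(orig_ttl, expire_interval):
    """After a failed query: MAX(1 hour, MIN(1 day, 0.1*OrigTTL, 0.1*ExpireInterval))."""
    return max(HOUR, min(DAY, orig_ttl // 10, expire_interval // 10))


# Illustrative inputs only (in seconds), not measured values from any real zone.
print(retry_time(3600, 7 * DAY))        # 3600, the one-hour floor applies
print(query_interval(172800, 14 * DAY)) # 86400
```

In both formulas one hour is the lower bound, which matches a retry showing up an hour after the failed attempt at startup.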

At this point, the error message might be considered harmless and the issue closed.

But then, the fact that an error condition is reached during the startup of named, while the same task works correctly at a later time, shows that there is a logical flaw in the startup sequence. It may be that just the RFC-5011 stuff is done too early, or it may be a more serious matter.
The other matter is that I spent some effort to figure that out so far. And I don't like that effort being spent just for my curiosity, without achieving some greater good for all of us.

But this does not work when it takes more effort to find a way to talk to the people responsible than to properly pinpoint the issue. After I get the issue fixed for me (or identified as currently harmless on my site), I just cannot put much further work into it, because there are so many other issues: Two days ago I identified and fixed an issue with net/dhcpcd not working properly on FreeBSD - I then found that already discussed here https://githubmemory.com/repo/rsmarples/dhcpcd/issues/59, and while the proposed solution does not work well and instead renders my entire site stuck in "no buffer space available", the developer explains that it is necessary to do things in that way, for reasons I don't fully grasp (but then, I'm not so very long into IPv6 yet.) You can find my comment on the matter here.

Before that there was the issue that suricata-6 consumes 6-8 Watts more per instance than suricata-5 when run mostly-idle in a guest OS. But my tools are running suricata in IPFW mode, and I have a statement from the developers that they don't support that mode anymore because nobody is running it (or something along that line).

I tend to forget these things a few weeks after having them resolved. I think the one before was the firefox/stapling/TLSv1.3 matter - which I did drive to its very end, and which then resolved as being magically fixed in the very newest release of firefox (which was not yet in existence when I reported the issue).

Oh yes, in between after firefox were the three or four kernel patches needed to get IPv6 to integrate properly with IPFW... but this one is not yet concluded, just postponed for now.

So yes, indeed, if anybody thinks proper PRs should be filed for all of this, they are absolutely welcome to do just that! (and cope with all the bureaucracy and arguing). Because I am already doing fulltime, with no reserve.

PMc · Nov 19, 2021

Finally,
in case anybody comes along here because of the quote of this very error message, I will post the explanation as I figured it out:

The error appears when slaving the root-zones. named.conf (the default) gives you two options: either use a hint-file or set up zone replication from the root server. When doing the second, this error may come up, because the init sequence seems to do the rfc-5011 stuff before loading the zones.
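
A toy model of that ordering problem, in Python. This is not BIND code: the class, the step names and the assumption that the key refresh simply needs the root zone to be loaded first are all made up to illustrate the reasoning.

```python
class ToyServer:
    """Toy stand-in for a nameserver; not BIND's real data structures."""

    def __init__(self):
        self.root_zone_loaded = False
        self.log = []

    def load_zones(self):
        # Assumption: once this has run, the secondary root zone is usable.
        self.root_zone_loaded = True

    def refresh_managed_keys(self):
        """Return True on success; log a warning and return False otherwise."""
        if not self.root_zone_loaded:
            self.log.append("warning: Failed to create fetch for DNSKEY update")
            return False
        return True


def start(steps):
    """Run the init steps in the given order, then replay a pending retry."""
    server = ToyServer()
    retry_pending = False
    for step in steps:
        if step == "load_zones":
            server.load_zones()
        elif step == "refresh_keys":
            retry_pending = not server.refresh_managed_keys()
    # The retry happens later (an hour later in the real log), after all init steps.
    retry_ok = server.refresh_managed_keys() if retry_pending else None
    return server.log, retry_ok


early_log, early_retry = start(["refresh_keys", "load_zones"])
late_log, late_retry = start(["load_zones", "refresh_keys"])
print(len(early_log), early_retry)  # 1 True
print(len(late_log), late_retry)    # 0 None
```

With the key refresh first, the model logs one warning and the later retry succeeds; with the zones loaded first, nothing is logged. That is the same pattern as seen in the real logs.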

Developers have been informed, and may now do what they want.

Cath O'Deray · Nov 20, 2021

PMc said:
Yes, I need actionable wording.

PMc said:
How to make your software completely bugfree!

– is not truly actionable.

hardworkingnewbie · Nov 20, 2021

PMc said:
So, what do we have here? I think we have the perfect scheme for 100% bugfree software:

require your users to send defects to a mailinglist

inform your users that defect reports will only be considered when they contain detailed configuration data

configure the spamblocker on the mailinglist so that it rejects mails with such detailed configuration data.

We are talking here about BIND9 by the ISC, right?

To quote https://www.isc.org/reportbug/:

Reporting issues in BIND 9

To report a non-security-related bug or request a feature in BIND, please navigate to our GitLab instance and enter your issue there. You will have the option of marking your issue confidential, if necessary. You will need to create an account on GitLab, but you can link your credentials from another GitLab instance or social media account. Once you have logged your issue, you may follow up with us via email to info@isc.org. We do not enable new issue creation via email because of spam problems.

So it makes me wonder why you didn't find their bug tracker and just file it there.

Issues · ISC Open Source Projects / BIND · GitLab

Welcome to the public repository for BIND 9 source code and issues. Classic, full-featured and mostly standards-compliant DNS.

gitlab.isc.org

It took me just one minute to find it.

To sum it up: they've got a fully operational and public bug tracker. Nobody is forced to report errors by email to them.

jamie · Jun 3, 2022

PMc said:
Finally,
in case anybody comes along here because of the quote of this very error message, I will post the explanation as I figured it out:

The error appears when slaving the root-zones. named.conf (the default) gives you two options: either use a hint-file or set up zone replication from the root server. When doing the second, this error may come up, because the init sequence seems to do the rfc-5011 stuff before loading the zones.

Developers have been informed, and may now do what they want.

Ahhhhh. Thanks for posting the update. I did indeed arrive here after googling. I'd made some unrelated changes to the dns server, and saw this message pop up. It wasn't in the logs because the server is rarely restarted, and yes, I secondary the root zone too.

I know not to worry about it now. Cheers!