Inspired by some of Trevor's recent git bisecting to find issues, I'll lay out the details of an issue I discovered recently.
I had a super, super weird DNS issue this morning.
I got notice that there were some new PowerDNS releases, both recursive and authoritative. These were also noted on the Debian security announce list, so I installed all of them on my various servers without thinking too much. Almost immediately after upgrading the ones in Winnipeg I started getting bombarded with notifications, mostly about one particular zone check failing -- ciscodude.net.
I started investigating my various local resolvers to nail down the problem, and I noticed a lot of records missing all over the place, and missing inconsistently too. An A record for something would be there but not its AAAA. NS records would be missing or incomplete. I used the handy `rec_control trace-regex <domain>` on the recursors. This started to show very weird things, like my closest nameserver returning a different zone's NS records when queried for NS records.
How could this even be possible, I thought.
I then started `dig`ing the same queries against the suspect nameserver, and YES indeed it was returning NS records from a whole different zone!!! So, of course, when the PowerDNS recursor checked with the 2nd, 3rd and 4th nameservers, they disagreed with what had been returned by the first one.
This caused DNS for ciscodude.net to fail, with some scattered fallout for other domains that use ciscodude.net nameserver records (most of my domains), for as long as the primary served random records.
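For anyone wanting to repeat that comparison across all of a zone's nameservers at once, here is a rough sketch in Python. It is not what I actually ran (I used `dig` by hand); the function names `query_ns`, `find_outliers` and `check_zone` are made up for illustration, and it assumes `dig` is installed and on the PATH. It asks each server directly for the zone's NS records and reports any server whose answer differs from the most common one.

```python
import subprocess
from collections import Counter


def query_ns(server, zone):
    """Ask one nameserver directly for a zone's NS records via dig.

    Hypothetical helper: assumes the `dig` binary is on PATH.
    """
    result = subprocess.run(
        ["dig", "+short", "+norecurse", "NS", zone, f"@{server}"],
        capture_output=True, text=True, timeout=15, check=False,
    )
    names = set()
    for line in result.stdout.splitlines():
        line = line.strip()
        # Skip blank lines and dig's ";;" diagnostic lines (e.g. timeouts).
        if not line or line.startswith(";"):
            continue
        names.add(line.lower().rstrip("."))
    return frozenset(names)


def find_outliers(answers):
    """Return the servers whose NS set differs from the most common answer.

    `answers` maps server name -> frozenset of NS hostnames.
    """
    if not answers:
        return {}
    majority, _count = Counter(answers.values()).most_common(1)[0]
    return {srv: ns for srv, ns in answers.items() if ns != majority}


def check_zone(zone, servers):
    """Query every server for the zone's NS set and print any that disagree."""
    answers = {srv: query_ns(srv, zone) for srv in servers}
    for srv, ns in sorted(find_outliers(answers).items()):
        print(f"{srv} disagrees: {sorted(ns)}")
    return answers
```

Calling something like `check_zone("example.net", ["ns1.example.net", "ns2.example.net", "ns3.example.net"])` (placeholder names) would have flagged the one misbehaving server right away, since the other three agreed with each other.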
Investigating the issue on #powerdns IRC, it was quickly identified that the configuration of 2 backends on this one server (mysql + bind files) was likely causing the issue, and then I was quickly pointed to a commit, and then a second commit [1] for 4.x which likely introduced the bug. I was also pointed to the package archives where I was able to install the previous version for now on that one server to get things back up and running.
GitHub was simultaneously having an HTTP issue of some sort, so I haven't yet been able to compile the sources minus that particular commit to confirm that the commit is indeed the problem.
[1] https://github.com/PowerDNS/pdns/commit/b854d9fec6f7e5636ab4742d716c7d848e0c...
Theodore Baschak - AS395089 - Hextet Systems https://ciscodude.net/ - https://hextet.systems/ http://mbix.ca/