help! dns zone delegation wonky for AAAA

Trevor Cordes

21 Nov 2016

9:57 a.m.

I'm seeing some weird behaviour related to AAAA and delegation that I'd like to correct in a BIND DNS setup. I have no AAAA records anywhere. Some lookup tools/libraries insist on looking up AAAA; I want those lookups to fail immediately. All servers/clients involved are run with the -4 option to run all traffic over IPv4.

The problem is that I'm seeing occasional lookup delays for AAAA records on some boxes (the ones that delegate), but not on others (every other box).

On my box (BOX1) I'm authoritative for foo.com (only for my internal networks). On the same box, I delegate sub.foo.com to ns.com (BOX2).

BOX2 is authoritative for foo.com and sub.foo.com. I do this so BOX1 can have local dynamic DNS for local Windows boxes, etc, on foo.com. Whereas the BOX2 view is for the whole world, to which I don't want to share the existence of windows.foo.com, etc. A bit messy, but this has worked for me for 15 years.

The problem symptoms:

I run "host bar.sub.foo.com " on the boxes:

BOX1:
bar.sub.foo.com has address 1.2.3.4
Host bar.sub.foo.com not found: 2(SERVFAIL)
bar.sub.foo.com mail is handled by 5 bar.sub.foo.com.
(often delays 5-10 sec before giving the SERVFAIL)

BOX2 (and every other box in the world except BOX1!!):
bar.sub.foo.com has address 1.2.3.4
bar.sub.foo.com mail is handled by 5 bar.sub.foo.com.

I don't want the delay or the SERVFAIL on BOX1.

The host command by default does a lookup of A, AAAA and MX in that order. That's fine. But I want them all to run without delay, and the AAAA to be ignored like it is on BOX2. Again, there are no AAAA records in any of these zone files.

I think I'm seeing the precise bug discussed here:

https://tools.ietf.org/html/draft-ietf-dnsop-misbehavior-against-aaaa-00 search to: 4.4 Make Lame Delegation

That document doesn't seem to provide any solutions.

I think the issue is when BOX2 (or any box but BOX1) does a lookup, it checks only with BOX2, sees there's no AAAA and happily ignores AAAA. I think in essence it's like "I'm BOX2, I'm authoritative and I have no AAAA". host is happy with this.

With BOX1, it does a lookup with BOX1's named which recurses out to the delegation on BOX2. BOX2 says the same as it did above, but this time it's talking to BOX1 named, not the host command. BOX1 named must be saying "I thought BOX2 was authoritative, but I find no AAAA so it's not authoritative after all, and I don't know anyone who is so I'm spewing this error SERVFAIL". I'm just guessing here.

I want the host command on BOX1 to behave the same as BOX2. Can it be done? I actually was seeing the exact same problem with the nonexistent bar.sub.foo.com MX record and I solved it by adding an MX record for it on BOX2. However, I don't want any AAAA record on any box, as none of them have IPv6 addresses! Surely there must be a solution to this weird problem.

Possibly relevant is how dig behaves with different usage:

BOX1#dig -tAAAA @localhost bar.sub.foo.com
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 2619
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; QUESTION SECTION:
;bar.sub.foo.com. IN AAAA

BOX1#dig -tAAAA @ns.com bar.sub.foo.com
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5477
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;bar.sub.foo.com. IN AAAA

;; AUTHORITY SECTION:
foo.com. 86400 IN SOA ns.com. 17 1800 300 604800 86400

BOX2#dig -tAAAA @localhost bar.sub.foo.com
**pretty much the same output as above 2nd example, NOERROR**

BOX2#dig -tAAAA @ns.com bar.sub.foo.com
**pretty much the same output as above 2nd example, NOERROR**

It's that SERVFAIL in example dig #1 above that I want to eliminate, and thus also the SERVFAIL with host.
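To compare the two kinds of reply side by side, here is a minimal Python sketch that pulls the status, flags and answer count out of a dig header. The function name and the labels "servfail", "nodata" and "answer" are made up for illustration; the two header strings are copied from the dig output above. It only classifies text that dig has already printed and does not send any queries itself.

```python
import re

STATUS_RE = re.compile(r"status:\s*([A-Z]+)")
FLAGS_RE = re.compile(r"flags:\s*([a-z ]*);")
ANSWER_RE = re.compile(r"ANSWER:\s*(\d+)")


def summarize_dig_header(text):
    """Extract status, flags and answer count from dig header text.

    Returns a dict with a 'kind' key (illustrative labels only):
      'servfail' - the server reported SERVFAIL
      'nodata'   - NOERROR with zero answers (name exists, no such record type)
      'answer'   - NOERROR with at least one answer
      'other'    - anything else, or a header that could not be parsed
    """
    status = STATUS_RE.search(text)
    flags = FLAGS_RE.search(text)
    answer = ANSWER_RE.search(text)
    if not (status and flags and answer):
        return {"kind": "other"}

    rcode = status.group(1)
    answers = int(answer.group(1))
    if rcode == "SERVFAIL":
        kind = "servfail"
    elif rcode == "NOERROR":
        kind = "nodata" if answers == 0 else "answer"
    else:
        kind = "other"

    return {
        "kind": kind,
        "status": rcode,
        "flags": set(flags.group(1).split()),
        "answers": answers,
    }


# Header lines copied from the two BOX1 dig runs in this post.
BOX1_VIA_LOCALHOST = (
    ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 2619\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n"
)
BOX1_VIA_NS_COM = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5477\n"
    ";; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1\n"
)
```

Run over the two headers, the direct query to ns.com comes out as an authoritative (aa flag set) empty answer, while the query through BOX1's own named comes out as a server failure. That difference is the one I'm trying to get rid of.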

Thanks!


Adam Thompson

21 Nov

1:32 p.m.

Sounds like a bug in host(1), which has been deprecated for several years now. Recommended solution: switch to "dig +short" instead. -Adam


-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

Trevor Cordes

22 Nov

6:03 a.m.

On 2016-11-21 Adam Thompson wrote:

...

> Sounds like a bug in host(1), which has been deprecated for several years now. Recommended solution: switch to "dig +short" instead.

I know host is outdated and maybe obsolete, but I see/saw no mention that it is deprecated (or unsupported). I guess I could try filing a bug for it to see. I use it mostly out of habit, and to save typing +short :-)

The reason this bug piqued my interest is actually not host(1); it is ssh when connecting to one of the "out" boxes from BOX1. Periodically all ssh attempts to the out box will take about 1-2 mins to start up. If I do ssh -vvv I can see it taking about 10s to do the initial name lookup (meaning it too is fetching more than just A records), but worst of all, the GSSAPI negotiation takes about 30s for each of 3 attempts.

GSSAPI always fails to all my boxes it seems (maybe because no kerberos??) but the failures happen in a fraction of a second, so I don't care. Google says to disable all GSSAPI in ssh config but it seems to be there by default now, and it doesn't hurt anything in every case except for this buggy one, so my preference is to leave it as-is and fix the DNS issue. (Besides, it's in my nature to solve the root problem and not resort to workarounds.)

So far, it's just host and ssh that seem to exhibit this behavior, but I guessed there would be more. Maybe that's wrong. There might be a way to force ssh to not do other-than-A lookups, and that would be a possible solution to this... I'll investigate some more.
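One way to see whether the extra (non-A) lookups are what costs the time is to time the resolver call with and without an address-family restriction. The following is a rough Python sketch, not something from my actual setup: timed_lookup is a made-up helper name, and it assumes the program under test resolves names through getaddrinfo(). With AF_UNSPEC the resolver is typically asked for both A and AAAA; with AF_INET only IPv4 addresses are requested.

```python
import socket
import time


def timed_lookup(host, port, family=socket.AF_UNSPEC):
    """Resolve host with getaddrinfo() and return (seconds_taken, addresses).

    A failed lookup is reported as an empty address list, so a slow failure
    (e.g. a delayed SERVFAIL) still shows up in the elapsed time.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, family=family,
                                   type=socket.SOCK_STREAM)
    except socket.gaierror:
        infos = []
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


# Example usage (substitute a real name for the placeholder from this thread):
#   print(timed_lookup("bar.sub.foo.com", 22))                  # any family
#   print(timed_lookup("bar.sub.foo.com", 22, socket.AF_INET))  # IPv4 only
```

If the IPv4-only call returns quickly and the unrestricted one stalls, the AAAA side of the lookup is the likely culprit. For ssh itself the comparable knob is the AddressFamily inet setting (or ssh -4).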

I can't believe there's not more BIND gurus in the club??

Adam Thompson

11:30 a.m.

Well, it's not a bind(8) problem. Nor is it a generic libc problem, by the sounds of it.

The GSSAPI thing is a royal PITA - I have to turn it off for significant numbers of hosts in ~/.ssh/config, and that’s in an environment *with* functional forward and reverse IPv6 mappings. (Yes, it's Kerberos-related. Usually. At least nothing else we're ever likely to run into uses GSSAPI.) On the rare occasions that GSSAPI works (ssh as yourself to another machine joined to the same AD domain, for example) it's mildly handy but not exactly a life-changing feature. TBH, it's more surprising/startling than helpful...

Also, my mistake - it was nslookup(1) that was deprecated. And apparently then rewritten and un-deprecated in BIND 9.9.0a3 (see https://kb.isc.org/article/AA-00496/0/BIND-9.9.0a3-Release-Notes.htm, #1700).

I suppose it could be a libc bug... you'd think it would affect more than just host(1) and sshd(8), though...? Or is that the extent of software that normally does reverse lookups nowadays?

In the problematic host(1) call, add "-d", and specify "A" records only using "-t", is the best I can suggest.

You can also influence resolver behaviour with /etc/gai.conf and /etc/host.conf - not sure if any of those knobs will solve your exact problem or not. I think promoting the IPv4-compatibility rules in gai.conf *might* be useful to you, not sure.

-Adam


Trevor Cordes

24 Nov

7:42 a.m.

On 2016-11-22 Adam Thompson wrote:

...

> Well, it's not a bind(8) problem. Nor is it a generic libc problem, by the sounds of it. The GSSAPI thing is a royal PITA - I have to turn it off for significant numbers of hosts in ~/.ssh/config, and

OK, I guess I'll disable GSSAPI in my confs too since it seems to have no upside (I never need to connect to an AD), and from what I've read on the net it can cause more problems besides mine.

I wonder why ssh now turns it on by default when it seems so unlikely to be used.

...

> #1700). I suppose it could be a libc bug... you'd think it would affect more than just host(1) and sshd(8), though...?

It very well might... I just notice it in those 2 right now. Well, and dig, but dig won't hang like host/ssh does.

...

> Or is that the extent of software that normally does reverse lookups nowadays? In

I'm not sure what I'm describing is reverse lookups, is it?

...

> the problematic host(1) call, add "-d", and specify "A" records only using "-t", is the best I can suggest.

Yes, aliasing host to host -t A is a good bandaid option.

...

> You can also influence resolver behaviour with /etc/gai.conf and /etc/host.conf - not sure

Wow, didn't know about those. They could be handy, esp gai.conf. However, I just played with them both and they won't help here. You can use gai to reorder the results so programs prefer 6 or 4, but they'll still return all the entries (4&6). It appears Fedora still has 4 as preferred(? as gai.conf on Fed is empty!), though lots of Ubuntu chatter about it preferring 6 and how to do it.

Interestingly, a doc I read said that most apps will ignore gai.conf anyhow. I ran a test with atime on and confirmed that nearly every command-line net app I could think of ignores gai.conf, as the atime never changes. The only way I could get gai.conf read was by doing a manual getaddrinfo() with sample code:

    import socket

    # Resolve pool.ntp.org for UDP port 123 (NTP) and print each returned address.
    results = socket.getaddrinfo('pool.ntp.org', 123, 0, socket.SOCK_DGRAM)
    print(', '.join(entry[4][0] for entry in results))

Perhaps most progs don't use getaddrinfo, and use some other library call instead.

Even though it didn't help, I'm glad I now know about gai.conf!

Maybe I should now reformulate the crux of my problem as this: Can I configure bind to return for all AAAA requests in the local zone "I'm authoritative but I don't have the answer" instead of SERVFAIL *even if the subzone has been delegated*. Or even specify a delegation for certain records (A & MX) only (not AAAA), though I specifically read somewhere that that's impossible on purpose.

I think the next step is to hit the BIND mailing list to see if they think it's a bug or even an issue that needs thinking about.

Either that or I'm doing an entirely unsupported, insane thing with my BIND having 2 different authoritative NSs, each with a different idea of what the zone contents should be (though mostly overlapping). :-)

Trevor Cordes

9:54 a.m.

On 2016-11-24 Trevor Cordes wrote:

...

> Maybe I should now reformulate the crux of my problem as this: Can I configure bind to return for all AAAA requests in the local zone "I'm authoritative but I don't have the answer" instead of SERVFAIL *even if the subzone has been delegated*. Or even specify a delegation for certain records (A & MX) only (not AAAA), though I specifically read somewhere that that's impossible on purpose.

Eureka!! The path you set me out on, which led to my above reformulation, led me to some other avenues of google attack. Two ideas in, I found a solution!

First I found this named option: filter-aaaa-on-v4 (and -v6) "It is intended to help the transition from IPv4 to IPv6 by not giving IPv6 addresses to DNS clients unless they have connections to the IPv6 Internet." Super description and chart here: https://kb.isc.org/article/AA-00576/0/Filter-AAAA-option-in-BIND-9-.html

Perfect!! It did indeed filter AAAA when I tested with names like google.com. But it failed for my own problematic sites.

So, I turned on more debugging in named and saw that the external NS responses for the subdomain were giving me:
  lame-servers: info: FORMERR resolving ...
then my named would give me:
  query failed (SERVFAIL)

So it wasn't the external NS giving me SERVFAIL, but FORMERR... which then turned into a SERVFAIL from my command's point of view.

After some more searching, armed with new keywords, I found: https://lists.isc.org/pipermail/bind-users/2012-April/087465.html

"The root cause is that the name servers for www.ryanair.com are misconfigured. They are returning answers as if they are configured for ryanair.com (see the SOA record) instead of www.ryanair.com as can be seen below."

Aha! Ding! named was barfing because I had two NS's authoritative for the same domain and one referencing the other. Even though I was only trying to reference the delegated subdomain, named didn't like that arrangement... but not in general, only as it pertained to non-existent records. Weird! (Would have been easier to debug if it didn't work for any records at all!)
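To make the mismatch concrete: in the earlier dig output, the server that sub.foo.com was delegated to answered the AAAA query for bar.sub.foo.com with an SOA owned by foo.com, i.e. a name above the zone that was delegated to it. The Python sketch below is only a simplified model of the kind of consistency a resolver could expect from such a negative answer. It is not BIND's actual validation code, and the function names are made up for illustration.

```python
def labels(name):
    """Lower-case a DNS name and split it into labels, ignoring a trailing dot."""
    return [label for label in name.lower().rstrip(".").split(".") if label]


def is_at_or_below(name, zone):
    """Return True if `name` equals `zone` or is a subdomain of it."""
    name_labels = labels(name)
    zone_labels = labels(zone)
    if not zone_labels:
        return True  # everything is at or below the root
    if len(zone_labels) > len(name_labels):
        return False
    # Compare whole labels from the right, so "notfoo.com" is not under "foo.com".
    return name_labels[-len(zone_labels):] == zone_labels


def soa_matches_delegation(soa_owner, delegated_zone, qname):
    """Simplified sanity check for a negative answer from a delegated server.

    The SOA owner should lie inside the zone that was delegated, and the
    queried name should lie at or below the SOA owner.
    """
    return (is_at_or_below(soa_owner, delegated_zone)
            and is_at_or_below(qname, soa_owner))


# Names taken from this thread: the delegation is for sub.foo.com,
# but the SOA returned in the AUTHORITY section was for foo.com.
print(soa_matches_delegation("foo.com.", "sub.foo.com.", "bar.sub.foo.com."))
print(soa_matches_delegation("sub.foo.com.", "sub.foo.com.", "bar.sub.foo.com."))
```

Under this model the first case (what my external server was returning) fails the check and the second (an SOA for the delegated zone itself) passes. That is consistent with the ryanair.com explanation quoted above, though it is a guess at why named logged FORMERR.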

Crossing my fingers, I changed the external server to break out the zones (the root zone, and the delegated subzone) into two zone files, so now both BOX1 and BOX2 have very similar zone files with regard to the handling of the ".out." subdomain... they both just delegate it to the root zone. It's like I was running the out subdomain on an entirely separate box from both root NS's. Restart, pray, and it works!

host problem gone. sshd problem gone. dig results are the same everywhere I attempt the query. So the problem was a misconfig on my part because of this very convoluted example when trying to delegate on "shared but different" domains. I hope if anyone else ever has this problem this thread can help them solve it more easily.

Finally, I guess the filter-aaaa-on-v4 didn't help here because of the nature of the FORMERR. I guess named was trying to tell me something.

c0l0nelFlagg

21 Nov

4:10 p.m.

New subject: network interruptions need solution

I have a home network with a dedicated firewall box (smoothwall) that assigns reserved IP addresses based on the MAC addresses of the network machines.

I need to find a way to configure things so that when I lose the RED www inet input connection to said firewall it doesn't hang up normal home network use by other workstations.

Right now, as long as inet is there, all works normally; however, if I drop the public inet input, everything seems to hang up. It seems that specifying the firewall box as DHCP server as well as gateway is OK as long as it has full normal DNS access to the outside world, but it gets bogged down when the outside world disappears and it has to rely on the gateway as a local caching DNS server, which does not appear to be working right.

I tried to use bind9 at one time but that ground things to a total halt for some reason. Does anyone have a link to a good solution? Thanks

Adam Thompson

7:08 p.m.

New subject: network interruptions need solution

1) You've hijacked a thread by replying to an unrelated email. Don't do that. Send a new message instead. More people will see your message that way.

2) You haven't told us what kind of firewall you're using, so the email is equivalent to: "I have a truck with problems. I tried installing some Ford accessories but it made it worse. What car should I get?"

We need specific details about how your firewall is configured before we can hope to help you.

If you aren't familiar with asking for help on public email lists, try reading http://www.catb.org/~esr/faqs/smart-questions.html first.

Good luck, -Adam

