I ran dig +short gymcan.org a whole pile of times and it never failed for me. I also ran it directly against the authoritative name server (dig @ns06.domaincontrol.com. gymcan.org) with the same result (no failures).
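
For anyone who wants to repeat that test and get a count rather than eyeballing it, here is a rough Python sketch. It assumes dig is on the PATH; the helper names (dig_status, run_dig, tally) and the "NO_REPLY" label are made up for illustration. It reads the "status:" field from dig's normal (non +short) output, since +short prints nothing at all on a SERVFAIL.

```python
import re
import subprocess
from collections import Counter

# dig's header line looks like: ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1234"
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def dig_status(output):
    """Return the response code from dig output, or "NO_REPLY" if there is none."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else "NO_REPLY"


def run_dig(name, server=None):
    """Run one dig query (single try, 3 second timeout) and return its stdout."""
    cmd = ["dig", "+tries=1", "+time=3"]
    if server:
        cmd.append("@" + server)
    cmd.append(name)
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def tally(name, count, server=None, runner=run_dig):
    """Query `name` `count` times and count how often each status comes back."""
    results = Counter()
    for _ in range(count):
        results[dig_status(runner(name, server))] += 1
    return results
```

Something like tally("gymcan.org", 200) exercises the local resolver, and tally("gymcan.org", 200, server="ns06.domaincontrol.com.") goes straight to the authoritative server. Any SERVFAIL or NO_REPLY entries in the result are the interesting ones.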

Also, I monitored it with tcpdump and the packet size is not larger than 72 bytes so fragmentation is unlikely.

I'm still suspicious of the iptables setup. I'd try taking the firewall out of the picture entirely (set the default policy on every chain to ACCEPT and flush the rules) and run the tests again just to fully rule that out.
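
As a sketch of what that means in practice, the Python below only builds the iptables command lines and does not run them. It assumes the standard filter-table chains (INPUT, FORWARD, OUTPUT) and leaves the nat and mangle tables alone; the function name is hypothetical. Saving the current rules first (for example with iptables-save) makes it easy to put everything back afterwards.

```python
def open_firewall_commands(chains=("INPUT", "FORWARD", "OUTPUT")):
    """Build the commands that set each chain's policy to ACCEPT, then flush.

    Only the default (filter) table is covered; this is an assumption about
    where the suspect rules live.
    """
    # Policies go first so nothing is dropped once the rules disappear.
    commands = [["iptables", "-P", chain, "ACCEPT"] for chain in chains]
    # With no chain argument, -F flushes every chain in the filter table.
    commands.append(["iptables", "-F"])
    return commands


for argv in open_firewall_commands():
    print(" ".join(argv))
```

The printed lines can be run by hand as root once you are happy with them.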

I think the thing you need to solve is why you are dropping packets. That isn't normal, and since it's spread across multiple servers on different providers, it's most likely your config.

John


On Thu, Apr 21, 2016 at 12:55 AM, Trevor Cordes <trevor@tecnopolis.ca> wrote:
I turned on extreme debug logging on BIND named and triggered a
SERVFAIL and here's what it shows:

21-Apr-2016 00:44:55.592 client: debug 3: client 127.0.0.1#42594: UDP request
21-Apr-2016 00:44:55.592 client: debug 5: client 127.0.0.1#42594: using view '_default'
21-Apr-2016 00:44:55.592 security: debug 3: client 127.0.0.1#42594: request is not signed
21-Apr-2016 00:44:55.592 security: debug 3: client 127.0.0.1#42594: recursion available
21-Apr-2016 00:44:55.592 client: debug 3: client 127.0.0.1#42594: query
21-Apr-2016 00:44:55.592 client: debug 10: client 127.0.0.1#42594 (gymcan.org): ns_client_attach: ref = 1
21-Apr-2016 00:44:55.592 security: debug 3: client 127.0.0.1#42594 (gymcan.org): query (cache) 'gymcan.org/A/IN' approved
21-Apr-2016 00:44:55.592 client: debug 3: client 127.0.0.1#42594 (gymcan.org): replace
21-Apr-2016 00:44:55.592 client: debug 3: client @0x7f438001c6a0: udprecv
21-Apr-2016 00:44:56.224 query-errors: debug 1: client 127.0.0.1#42594 (gymcan.org): query failed (SERVFAIL) for gymcan.org/IN/A at query.c:7769
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 (gymcan.org): error
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 (gymcan.org): send
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 (gymcan.org): sendto
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 (gymcan.org): senddone
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 (gymcan.org): next
21-Apr-2016 00:44:56.225 client: debug 10: client 127.0.0.1#42594 (gymcan.org): ns_client_detach: ref = 0
21-Apr-2016 00:44:56.225 client: debug 3: client 127.0.0.1#42594 (gymcan.org): endrequest
21-Apr-2016 00:44:56.225 query-errors: debug 2: fetch completed at resolver.c:3658 for gymcan.org/A in 0.632030: SERVFAIL/success [domain:gymcan.org,referral:2,restart:4,qrysent:2,timeout:0,lame:0,neterr:2,badresp:0,adberr:0,findfail:0,valfail:0]
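
The counters in square brackets on that last "fetch completed" line are easier to compare across many failures once they are parsed. Below is a minimal Python sketch that does so; the regular expression is written against the single line shown above and may need adjusting for other BIND versions, and the function names are made up for illustration.

```python
import re

# Matches the tail of a BIND "fetch completed" query-errors line.
FETCH_RE = re.compile(
    r"fetch completed at (?P<where>\S+) for (?P<name>\S+)/(?P<rtype>\w+) "
    r"in (?P<seconds>[\d.]+): (?P<result>\S+) \[(?P<fields>[^\]]*)\]"
)


def parse_fetch_line(line):
    """Return the fetch details and counters as a dict, or None if no match."""
    match = FETCH_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["seconds"] = float(info["seconds"])
    counters = {}
    for item in info.pop("fields").split(","):
        key, _, value = item.partition(":")
        # Numeric counters become ints; "domain" stays a string.
        counters[key] = int(value) if value.isdigit() else value
    info["counters"] = counters
    return info


def nonzero_counters(info):
    """Keep only the numeric counters that are greater than zero."""
    return {k: v for k, v in info["counters"].items()
            if isinstance(v, int) and v > 0}
```

Run over a whole log, this makes it quick to check whether every SERVFAIL carries a nonzero neterr or whether other counters show up too.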

Too bad they don't show even more info, but we can still wireshark the details.

So the error seems to be a "neterr", which the BIND docs describe as:
The number of erroneous results that the resolver encountered in
sending queries at the domain zone. One common case is the remote
server is unreachable and the resolver receives an ICMP unreachable
error message.

But I confirmed no ICMP unreachable came in.  "One common case"...
I wonder what the other cases are!

Aside: I wiresharked, making sure to capture ICMP as well, and no ICMP
came across during the SERVFAIL, so that also helps to exclude
fragmentation issues, as they should trigger an ICMP can't-fragment
packet.
_______________________________________________
Roundtable mailing list
Roundtable@muug.mb.ca
http://www.muug.mb.ca/mailman/listinfo/roundtable



--
John Lange
www.johnlange.ca