I ran dig +short gymcan.org a whole pile of times and it never failed for me. I also ran it directly against the authoritative name server (dig @ns06.domaincontrol.com. gymcan.org) with the same result (no failures).
Also, I monitored it with tcpdump and the packet size is not larger than 72 bytes so fragmentation is unlikely.
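To put a number on "a whole pile of times", a loop along these lines can repeat the lookup and count failures. This is only a rough sketch, not what I actually ran: count_failures and dig_short are made-up names, it assumes dig is on the PATH, and treating an empty +short answer as a failure is my assumption.

```python
import subprocess


def dig_short(name, server=None):
    """Run `dig +short NAME`, optionally against @server; return (exit_code, stdout)."""
    cmd = ["dig", "+short"]
    if server:
        cmd.append("@" + server)
    cmd.append(name)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout


def count_failures(name, attempts=100, server=None, query=dig_short):
    """Repeat the lookup and count the attempts that look like failures.

    Assumption: a non-zero exit code or an empty +short answer is a failure.
    """
    failures = 0
    for _ in range(attempts):
        code, output = query(name, server)
        if code != 0 or not output.strip():
            failures += 1
    return failures
```

For example, count_failures("gymcan.org", 200) goes through the local resolver, and count_failures("gymcan.org", 200, server="ns06.domaincontrol.com.") asks the authoritative server directly.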
I'm still suspicious of the iptables setup. I'd try stopping the firewall entirely (set all the default policies to ACCEPT and flush the rules) and run the tests again just to fully rule that out.
I think the thing you need to figure out is why you're dropping packets. That isn't normal, and since it's spread across multiple servers on different providers, it's most likely your config.
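By "stopping the firewall entirely" I mean roughly the commands this sketch prints. It only builds the command strings and runs nothing; open_firewall_commands is a made-up name, and the list of tables is my guess at what is in use on your box.

```python
def open_firewall_commands(tables=("filter", "nat", "mangle")):
    """Return iptables commands that set default policies to ACCEPT, then flush.

    Nothing is executed here; the strings are meant to be reviewed and then
    run by hand as root. The table list is an assumption.
    """
    commands = []
    # Set the filter-table policies to ACCEPT first, so that flushing the
    # rules afterwards cannot cut off an existing remote session.
    for chain in ("INPUT", "FORWARD", "OUTPUT"):
        commands.append(f"iptables -P {chain} ACCEPT")
    for table in tables:
        commands.append(f"iptables -t {table} -F")  # flush all rules
        commands.append(f"iptables -t {table} -X")  # delete user-defined chains
    return commands


for command in open_firewall_commands():
    print(command)
```

Saving the current rules with iptables-save to a file beforehand makes it easy to put everything back with iptables-restore once the test is done.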
John
On Thu, Apr 21, 2016 at 12:55 AM, Trevor Cordes trevor@tecnopolis.ca wrote:
I turned on extreme debug logging on BIND named and triggered a SERVFAIL and here's what it shows:
21-Apr-2016 00:44:55.592 client: debug 3: client 127.0.0.1#42594: UDP request
21-Apr-2016 00:44:55.592 client: debug 5: client 127.0.0.1#42594: using view '_default'
21-Apr-2016 00:44:55.592 security: debug 3: client 127.0.0.1#42594: request is not signed
21-Apr-2016 00:44:55.592 security: debug 3: client 127.0.0.1#42594: recursion available
21-Apr-2016 00:44:55.592 client: debug 3: client 127.0.0.1#42594: query
21-Apr-2016 00:44:55.592 client: debug 10: client 127.0.0.1#42594 ( gymcan.org): ns_client_attach: ref = 1
21-Apr-2016 00:44:55.592 security: debug 3: client 127.0.0.1#42594 ( gymcan.org): query (cache) 'gymcan.org/A/IN' approved
21-Apr-2016 00:44:55.592 client: debug 3: client 127.0.0.1#42594 ( gymcan.org): replace
21-Apr-2016 00:44:55.592 client: debug 3: client @0x7f438001c6a0: udprecv
21-Apr-2016 00:44:56.224 query-errors: debug 1: client 127.0.0.1#42594 ( gymcan.org): query failed (SERVFAIL) for gymcan.org/IN/A at query.c:7769
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 ( gymcan.org): error
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 ( gymcan.org): send
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 ( gymcan.org): sendto
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 ( gymcan.org): senddone
21-Apr-2016 00:44:56.224 client: debug 3: client 127.0.0.1#42594 ( gymcan.org): next
21-Apr-2016 00:44:56.225 client: debug 10: client 127.0.0.1#42594 ( gymcan.org): ns_client_detach: ref = 0
21-Apr-2016 00:44:56.225 client: debug 3: client 127.0.0.1#42594 ( gymcan.org): endrequest
21-Apr-2016 00:44:56.225 query-errors: debug 2: fetch completed at resolver.c:3658 for gymcan.org/A in 0.632030: SERVFAIL/success [domain: gymcan.org ,referral:2,restart:4,qrysent:2,timeout:0,lame:0,neterr:2,badresp:0,adberr:0,findfail:0,valfail:0]
Too bad they don't show even more info, but we can still wireshark the details.
So the error seems to be a "neterr", which the BIND docs describe as: "The number of erroneous results that the resolver encountered in sending queries at the domain zone. One common case is the remote server is unreachable and the resolver receives an ICMP unreachable error message."
But I confirmed no ICMP unreachable came in. "One common case"... I wonder what the other cases are!
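For pulling those counters out of a pile of query-errors lines, something like the following could work as a rough sketch. parse_fetch_counters is just a name I made up, and the regex is based only on the one "fetch completed" line above, not on any documented log format.

```python
import re

# Matches the bracketed summary at the end of a "fetch completed" line, e.g.
# [domain: gymcan.org ,referral:2,...,valfail:0]
# The optional whitespace around the domain is an assumption based on how
# the single sample line is spaced.
_SUMMARY = re.compile(r"\[domain:\s*(?P<domain>[^,\s]+)\s*,(?P<rest>[^\]]*)\]")


def parse_fetch_counters(line):
    """Return (domain, {counter: int}) for a 'fetch completed' line, or None."""
    match = _SUMMARY.search(line)
    if match is None:
        return None
    counters = {}
    for field in match.group("rest").split(","):
        key, _, value = field.partition(":")
        if value.strip().isdigit():
            counters[key.strip()] = int(value)
    return match.group("domain"), counters


def nonzero(counters):
    """Keep only the counters that are not zero."""
    return {key: value for key, value in counters.items() if value}
```

Filtering to the non-zero counters makes the interesting part stand out: for the line above that leaves referral, restart, qrysent and neterr.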
Aside: I wiresharked again, making sure to capture ICMP as well, and no ICMP came across during the SERVFAIL. That also helps to exclude fragmentation issues, as they should trigger an ICMP can't-fragment packet.
_______________________________________________
Roundtable mailing list
Roundtable@muug.mb.ca
http://www.muug.mb.ca/mailman/listinfo/roundtable