On 2016-04-20 John Lange wrote:
I don't think what you are seeing is normal and to me it's all hinting at something local. I feel that there is something common to all your setups causing this. I don't think it's the upstream DNS providers.
Actually, you are probably correct in that this now seems to be a local BIND + upstream DNS problem. I guess I could try to set up dnsmasq, courtesy of MUUG's recent daemon-dash presentation, temporarily to see how that fares. I'm not sure what I'll find...
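Something like this is what I have in mind for the temporary swap (just a sketch, assuming a systemd box where BIND runs as named; the Google upstreams are placeholders):

    # stop the local BIND instance first
    systemctl stop named
    # run dnsmasq in the foreground as a plain forwarder, logging every query
    dnsmasq --no-daemon --no-resolv --port=53 \
            --server=8.8.8.8 --server=8.8.4.4 \
            --log-queries

Then point /etc/resolv.conf at 127.0.0.1 and rerun the same dig loop to compare.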
One thing that pops to mind is UDP packet fragmentation. Perhaps there is something in the network setup or filtering (iptables) which is causing UDP packets to fragment but is dropping the second part of the fragment? This is a surprisingly common problem on a lot of firewalls, for example SonicWall.
I have this:

    $iptables -N fragments
    $iptables -A fragments -m limit --limit 10/minute -j LOG --log-prefix $p"99d fragments: "
    $iptables -A fragments -j DROP
    $iptables -A aad_first -f -j fragments
That runs very early in my rules to ditch all frags, but I just checked both /v/l/messages (where these are logged) and iptables -L -v | grep fragments, and both show zero hits, nada, on all boxes I am testing on, even immediately after these SERVFAIL tests.
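For the record, the two checks spelled out (log path expanded from my shorthand; the grep pattern matches my --log-prefix above):

    # any dropped frags would have been logged with my prefix
    grep 'fragments: ' /var/log/messages
    # per-rule packet counters on the fragments chain; all zero here
    iptables -L fragments -v -n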
So that can't be it. (In general I have a (rate-limited) LOG before nearly every DROP in my iptables, so I should see entries coming across in /v/l/messages if iptables were throwing things away during these tests. And I just confirmed that I am not hitting any of the drops that aren't logged.)
I thought about kernel-level frag dropping (apart from iptables) but I see nothing user-settable (I thought there might be an "on/off" switch like /proc/sys/net/ipv4/conf/all/rp_filter). It appears to be something you only play with in iptables, not via the kernel's sysctl interface.
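The closest knobs I can find are reassembly tuning, not a drop switch, which seems to confirm that (assuming a reasonably current kernel):

    # reassembly timeout and memory thresholds; nothing here drops frags outright
    sysctl net.ipv4.ipfrag_time net.ipv4.ipfrag_high_thresh net.ipv4.ipfrag_low_thresh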
Perhaps force dig to use TCP to see if the results are different (dig +tcp <host>).
Good idea. Curiouser and curiouser... I get 1-2 lookup failures on almost every single test when I use +tcp +short. That's worse than the previous tests (0-1 failures). That really does start to narrow down the problem!
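For reference, a sketch of the kind of loop I'm running (the domains here are placeholders, not the actual problem domains):

    for d in example.com example.net example.org; do
        for i in $(seq 1 20); do
            # with +short, a SERVFAIL comes back as empty output
            out=$(dig +tcp +short "$d")
            [ -z "$out" ] && echo "FAIL: $d (run $i)"
        done
    done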
That's just one possibility among many. Swap out one of the machines for a totally different system (Windows laptop maybe?) and repeat the tests.
Windows won't help because it isn't running a local recursive resolver (well, I guess I could try Windows Server, but that is beyond the scope...). It is a good idea, though, to replace what I can: perhaps a different distro, a BSD, or a different resolver.
The fact that +trace has yet to produce any error at all means it may be possible to build a resolver that won't fail this way on these domains. That's why I suspect it might be something specific to BIND. I doubt this happens if you point your resolv.conf at 8.8.8.8, because I bet the "big guys" are doing something more robust than BIND.
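That comparison is easy to make without even touching resolv.conf, since dig takes the server directly (placeholder domain again):

    # same query, but sent straight to Google's public resolver instead of local BIND
    dig +short example.com @8.8.8.8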
For kicks I just tried adding +tries=X to the dig commands, first =5, then =10, then =100, and the failure rate appears to stay pretty constant. Strangely, the time the commands take doesn't really change(?!).
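Although, if I'm reading the dig man page right, that may not be so strange: +tries only controls UDP retransmits on timeout, and a SERVFAIL is a reply, so dig accepts it on the first attempt and never retries (and over TCP, +tries doesn't apply at all):

    # +tries only matters when a UDP query times out; a prompt SERVFAIL
    # reply ends the query on the first attempt no matter how high it is
    dig +short +tries=100 example.com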
Also, look into state and connection tracking in your iptables rules.
I'm using pretty stock idioms:

    $iptables -A inextern -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
very early in the ruleset. Also, my temporary accept-everything-with-SPT=53 rules (UDP and TCP) should have caught any weird-state packets. Not ruling it out completely, as this is very complex stuff, but it's been ages (10+ years) since I've had a conntrack bug.
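Roughly what those temporary rules looked like, for the record (a hypothetical reconstruction; chain name matches my ruleset above, and I pulled them out after testing):

    # accept anything with source port 53, both protocols, ahead of everything else
    $iptables -I inextern 1 -p udp --sport 53 -j ACCEPT
    $iptables -I inextern 1 -p tcp --sport 53 -j ACCEPT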
I'll keep hunting...
Thanks for all the tips y'all; keep them coming and I'll try 'em!