On 2016-04-20 John Lange wrote:
I don't think what you are seeing is normal and to me it's all hinting at something local. I feel that there is something common to all your setups causing this. I don't think it's the upstream DNS providers.
Actually, you are probably correct in that this now seems to be a local BIND + upstream DNS problem. I guess I could try to set up dnsmasq, courtesy of MUUG's recent daemon-dash presentation, temporarily to see how that fares. I'm not sure what I'll find...
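Something like this is what I have in mind for the temporary swap (just a sketch, assuming a systemd box where BIND runs as named; the Google upstreams are placeholders):

    # stop the local BIND instance first
    systemctl stop named
    # run dnsmasq in the foreground as a plain forwarder, logging every query
    dnsmasq --no-daemon --no-resolv --port=53 \
            --server=8.8.8.8 --server=8.8.4.4 \
            --log-queries

Then point /etc/resolv.conf at 127.0.0.1 and rerun the same dig loop to compare.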
One thing that pops to mind is UDP packet fragmentation. Perhaps there is something in the network setup or filtering (iptables) which is causing UDP packets to fragment but is dropping the second part of the fragment? This is a surprisingly common problem on a lot of firewalls, for example SonicWall.
I have this:

    $iptables -N fragments
    $iptables -A fragments -m limit --limit 10/minute -j LOG --log-prefix $p"99d fragments: "
    $iptables -A fragments -j DROP
    $iptables -A aad_first -f -j fragments
That runs very early in my rules to ditch all frags, but I just checked both /v/l/messages (where these are logged) and iptables -L -v | grep fragments, and both show zero hits, nada, on all boxes I am testing on, even immediately after these SERVFAIL tests.
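For the record, the two checks spelled out (log path expanded from my shorthand; the grep pattern matches my --log-prefix above):

    # any dropped frags would have been logged with my prefix
    grep 'fragments: ' /var/log/messages
    # per-rule packet counters on the fragments chain; all zero here
    iptables -L fragments -v -n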
So that can't be it. (In general I have a (rate-limited) LOG before nearly every DROP in my iptables, so I should see entries coming across in /v/l/messages if iptables were throwing things away during these tests. And I just confirmed that I am not hitting any of the drops that aren't logged.)
I thought about kernel-level frag dropping (apart from iptables) but I see nothing user-settable (I thought there might be an "on/off" switch like /proc/sys/net/ipv4/conf/all/rp_filter). It appears to be something you only play with in iptables, not via the kernel's sysctl interface.
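The closest knobs I can find are reassembly tuning, not a drop switch, which seems to confirm that (assuming a reasonably current kernel):

    # reassembly timeout and memory thresholds; nothing here drops frags outright
    sysctl net.ipv4.ipfrag_time net.ipv4.ipfrag_high_thresh net.ipv4.ipfrag_low_thresh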
Perhaps force dig to use TCP to see if the results are different (dig +tcp <host>).
Good idea. Curiouser and curiouser... I get 1-2 lookup failures on almost every single test when I use +tcp +short. That's worse than the previous tests (0-1 failures). That really does start to narrow down the problem!
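For reference, a sketch of the kind of loop I'm running (the domains here are placeholders, not the actual problem domains):

    for d in example.com example.net example.org; do
        for i in $(seq 1 20); do
            # with +short, a SERVFAIL comes back as empty output
            out=$(dig +tcp +short "$d")
            [ -z "$out" ] && echo "FAIL: $d (run $i)"
        done
    done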
That's just one possibility among many. Swap out one of the machines for a totally different system (Windows laptop maybe?) and repeat the tests.
Windows won't help because it isn't running a local recursive resolver (well, I guess I could try Windows Server, but that is beyond the scope...). It is a good idea, though, to replace what I can: perhaps a different distro, a BSD, or a different resolver.
The fact that +trace has yet to produce any error at all means it may be possible to build a resolver that won't fail this way on these domains. That's why I suspect it might be something specific to BIND. I doubt this happens if you point your resolv.conf at 8.8.8.8, because I bet the "big guys" are doing something more robust than BIND.
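That comparison is easy to make without even touching resolv.conf, since dig takes the server directly (placeholder domain again):

    # same query, but sent straight to Google's public resolver instead of local BIND
    dig +short example.com @8.8.8.8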
For kicks I just tried adding +tries=X to the dig commands, first =5, then =10, then =100, and the failure rate appears to stay pretty constant. Strangely, the time the commands take doesn't really change(?!).
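Although, if I'm reading the dig man page right, that may not be so strange: +tries only controls UDP retransmits on timeout, and a SERVFAIL is a reply, so dig accepts it on the first attempt and never retries (and over TCP, +tries doesn't apply at all):

    # +tries only matters when a UDP query times out; a prompt SERVFAIL
    # reply ends the query on the first attempt no matter how high it is
    dig +short +tries=100 example.com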
Also, look into state and connection tracking in your iptables rules.
I'm using pretty stock idioms:

    $iptables -A inextern -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
very early in the ruleset. Also, my temporary accept-everything-with-SPT=53 rules (UDP and TCP) should have caught any weird-state packets. Not ruling it out completely, as this is very complex stuff, but it's been ages (10+ years) since I've had a conntrack bug.
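Roughly what those temporary rules looked like, for the record (a hypothetical reconstruction; chain name matches my ruleset above, and I pulled them out after testing):

    # accept anything with source port 53, both protocols, ahead of everything else
    $iptables -I inextern 1 -p udp --sport 53 -j ACCEPT
    $iptables -I inextern 1 -p tcp --sport 53 -j ACCEPT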
I'll keep hunting...
Thanks for all the tips y'all; keep them coming and I'll try 'em!