Just upgraded a few boxes to kernel 4.8.8-100.fc23. It seems this latest kernel (or maybe 1 or 2 versions going back 1-2 months) changed something.
I have a script that generates pings (using SOCK_RAW to make its own packets) that I run as root. Worked fine until now. In the new kernel I get "Operation not permitted" on the socket() call. Digging around the net, I found that I need to do:
setcap cap_net_raw+p /foo/myscript
Then when I run it, it works fine.
Ok, great, but *** I'm running the script as root ***!!! Huh? Since when did root need to specify capabilities to run stuff as root? What is this, Windows?
Is there some major paradigm change in the latest kernels where this is a "feature" and not a bug? Just wanted to do a sanity check before I file a bz.
Oh ya, selinux is disabled, so that's not the problem. Lots of chatter on the net about this problem but everyone talking about it is talking about the non-root use case. It would appear my issue is something brand new.
Aside: As for the script, I'm doing really wacky stuff on purpose, and I really needed direct control over the packet, so I can't just abandon SOCK_RAW.
Hold that thought... it looks like there's something majorly amiss with the new kernel. My first test after setcap succeeded, but my subsequent tests all fail with the original error. Even if I wipe the cap and reset it, and/or reboot, I just get the original error now.
I have proof in my scrollback buffer, however, that this did work once, so I know I'm not nuts... Very strange. I'll come up with a small test script and see if someone else can reproduce this.
Yikes, even weirder than I thought. It's not a cap issue. I put some debugs in it and it's getting around 30-220 pings out the door before it randomly fails on one of them with the "Operation not permitted" error.
If I flush and reload my iptables then it sometimes gets through all 254 pings I'm doing (it's my own subnet, I'm allowed to scan!). After that first scan, the count it allows seems to reduce and then fluctuate.
I checked any iptables rules that could apply and for kicks I changed everywhere I did icmp rate limiting on type 8 (echo) to unlimited (just -j ACCEPT). I also cleared the iptables counters, ran my test, and checked what rules were incrementing and nothing was standing out, and the ones that did increment I got rid of and the problem remains.
Sure feels like an iptables rate-limiting thing, but I just can't spot the problem after extensive checking of my rules and their counters. Besides, I have almost no iptables rules on the OUTPUT chain, and the default policy on OUTPUT is ACCEPT.
It's like some other rate limiter has been put into the kernel.
If I add a sleep(.002) to the script before the send() call (hi-res sleep enabled in Perl), the bug doesn't occur! Works every time! If I set it to sleep(.001), the bug starts to occur again about half the time. So something introduced in the newest kernel is limiting me to around 500 send() calls per second.
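For anyone wanting to try the same workaround outside Perl, here is a minimal Python sketch of the pacing idea. It is not the original script: the function name `send_paced`, its parameters, and the retry/backoff behaviour are assumptions made for illustration. A 0.002 s pause between sends caps the rate at 1 / 0.002 = 500 sends per second, and a send that fails with EPERM ("Operation not permitted") is retried after a short pause instead of killing the run.

```python
import time


def send_paced(send, packets, interval=0.002, retries=3, backoff=0.01,
               sleep=time.sleep):
    """Call send(payload, addr) for each pair, pausing between sends.

    `send` is any callable, e.g. a raw socket's sendto. In Python 3 an
    EPERM failure from the kernel surfaces as PermissionError.
    All names and defaults here are hypothetical.
    """
    sent = 0
    for payload, addr in packets:
        for attempt in range(retries + 1):
            try:
                send(payload, addr)
                sent += 1
                break
            except PermissionError:
                if attempt == retries:
                    raise  # still refused after all retries: give up
                sleep(backoff * (attempt + 1))  # wait a bit longer each time
        sleep(interval)  # pacing: at most 1/interval sends per second
    return sent
```

With a real raw socket this would be called as `send_paced(sock.sendto, packets)`; the `sleep` parameter exists only so the pacing can be checked without actually waiting.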
I'll see if I can get an internet-unplugged, iptables completely off test in soon to completely eliminate iptables: tough because I am accessing this (offsite) test box via ssh at the moment.
Anyone aware of anything else/new that could cause this? Anything else in the kernel that rate limits stuff like this? Other ideas to test to rule out? Like I said, this bug does not occur in 4.7.10-100.fc23, just 4.8.8-100.fc23. I hope I don't have to git bisect to solve this one!
Thanks! (Sorry for the copious emails!)
I'm running 4.8.{8,9,10} kernels on a couple of systems, however it's on Debian, and it's not stock -- I've compiled packages for my infrastructure based on the coldkernel patchset we maintain. https://github.com/coldhakca/coldkernel
I'd be willing to test out something if need be.
Theodore Baschak - AS395089 - Hextet Systems https://ciscodude.net/ - https://hextet.systems/ http://mbix.ca/
_______________________________________________
Roundtable mailing list
Roundtable@muug.ca
https://muug.ca/mailman/listinfo/roundtable
On 2016-11-29 Theodore Baschak wrote:
I'm running 4.8.{8,9,10} kernels on a couple of systems, however it's on Debian, and it's not stock -- I've compiled packages for my infrastructure based on the coldkernel patchset we maintain. https://github.com/coldhakca/coldkernel
I'd be willing to test out something if need be.
Thanks a ton! I'm attaching as simplified a test prog as I could make that shows the bug. Sorry it's such a mess, I just C&P'd as little code as I could to trigger the bug. (My code is heavily based on a sample from PerlMonks, so credit where it's due.) The code simply creates 253 ICMP echo packets and sends them out to the LAN as fast as it can. The sample ignores the responses, as they aren't required to repro the bug.
Change the $subnet at the top to be any of your local LAN /24 subnets. I guess you could test a /16, might work as-is. Have no idea about ipv6.
On 4.8.8 or newer, run as-is, it should die with the error on most runs (but not all!). (I've confirmed on 4.8.8 and 4.8.10 now.)
CURIOUS!!!: If you uncomment the $single= at the top and put in any single IP on your subnet, the bug disappears!! So the bug only hits when you are scanning a large number of IPs and not a single IP! Even though in both cases it's sending the same number of icmp packets out! BIZARRE! This might rule out iptables, because AFAIK there's no rule to match "variability of hosts".
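The attached Perl test program is not reproduced in this archive. As a rough stand-in, here is a Python sketch of what the description above amounts to: build ICMP echo requests and fire them at either every host on a /24 or one single host repeated. Everything named here (`inet_checksum`, `build_echo_request`, `targets`, `send_all`, the 0x1234 identifier, and the choice of hosts .1 through .253) is an assumption for illustration, not taken from the real script.

```python
import socket
import struct


def inet_checksum(data):
    """16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident, seq, payload=b""):
    """ICMP type 8 (echo request), code 0, with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def targets(subnet, single=None, count=253):
    """Hypothetical stand-in for the script's $subnet / $single switch."""
    if single is not None:
        return [single] * count  # same packet count, one destination
    # Assumption: hosts .1 through .253 of the given /24 prefix.
    return ["%s.%d" % (subnet, host) for host in range(1, count + 1)]


def send_all(addrs, ident=0x1234):
    """Send one echo request per address, unpaced.

    Needs root or CAP_NET_RAW; replies are ignored.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                         socket.IPPROTO_ICMP)
    try:
        for seq, addr in enumerate(addrs):
            sock.sendto(build_echo_request(ident, seq & 0xFFFF), (addr, 0))
    finally:
        sock.close()
```

Something like `send_all(targets("192.168.101"))` would correspond to the many-hosts case and `send_all(targets("192.168.101", single="192.168.101.102"))` to the single-host case. On an affected kernel the failure would show up as a PermissionError from `sendto`.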
I confirmed this bug does not exist in 4.7.10 (on the same box, all else equal).
I found a bunch of icmp and net tweaks in sysfs that possibly could relate, and tweaked all of them to (near-)unlimited, but it didn't help at all. I checked and their defaults were the same as they are on 4.7.10.
Strange, my test is pretty much like:
nmap -sP 192.168.101.0/24
Yet nmap runs perfectly fine. Unless it catches these errors and retries/ratelimits?
It's like something new in the kernel is trying to throttle ping floods / host scans? I'm still digging around in changelogs trying to figure it out.
If you (or anyone with 4.8.8+) can confirm the bug hits with $single off, and doesn't hit with $single on, that would be great! Also, letting me know your iptables setup would help as I still haven't ruled that out.
Thanks a ton!
No solution yet, but I was able to reproduce the bug with nmap, yay!!!
# nmap -PE 192.168.101.0/24
Starting Nmap 7.12 ( https://nmap.org ) at 2016-12-01 02:50 CST
sendto in send_ip_packet_sd: sendto(5, packet, 44, 0, 192.168.101.102,16) => Operation not permitted
Offending packet: TCP 192.168.101.1:57520 > 192.168.101.102:21 S ttl=51 id=5430 iplen=44 seq=879361804 win=1024 <mss 1460>
Looks like -sP does more than just ping and it's not fast enough to trigger the bug. -PE is what I need to reproduce the conditions of my script.
My two F23 boxes are only at 4.4.9 and 4.7.9. No problems with your script and nmap on them.
I'll see if I have time to do an update tonight and test with the newest kernel.
-- Wyatt Zacharias
Just ran this on a physical system at home with the following kernel: Linux hypnotoad 4.8.10-coldkernel-grsec-1 #1 SMP Tue Nov 22 19:05:17 CST 2016 x86_64 GNU/Linux
I'm not running any iptables rules on this system at all, and I was able to run the test on a sample /24 without error. Then I modified the source to ping my entire internal /19, with the same result.
No errors on my end tho :-(
Similarly, with the nmap -PE command on a /24 or even a whole /19 I didn't get any send errors.
Theodore Baschak - AS395089 - Hextet Systems https://ciscodude.net/ - https://hextet.systems/ http://mbix.ca/
On 2016-12-01 Theodore Baschak wrote:
Just ran this on a physical system at home with the following kernel: Linux hypnotoad 4.8.10-coldkernel-grsec-1 #1 SMP Tue Nov 22 19:05:17 CST 2016 x86_64 GNU/Linux
I'm not running any iptables rules on this system at all, and I was able to run the test on a sample /24 without error. Then I modified the source to ping my entire internal /19, with the same result.
Weird! I'm puzzled. Must be something in my config (or iptables), Fedora's patches, or Fedora's kernel tune default choices.
Can you send me the output of:
tail -c+1 `find /proc /sys -type f | grep icmp | grep -v '/proc/[0-9]'`
tail -c+1 `find /proc | grep -P 'net.*(limit|interv|max|conntrack)' \
  | grep -vP '/proc/[0-9]|hop_lim|igmp|mldv|router|icmp|ip6frag|ipv6'`
(you can send offlist as it might be long)
That will let me see the tuning choices of your kernel. Thanks!
On 2016-12-01 Wyatt Zacharias wrote:
My two F23 boxes are only at 4.4.9 and 4.7.9. No problems with your script and nmap on them.
I'll see if I have time to do an update tonight and test with the newest kernel.
Thanks Wyatt! I finally found one other hit on the net of a guy having the same problem, from just a couple days ago. He's on Ubuntu. He says the problem wasn't in 4.4 but was in 4.8. If you can reproduce it after a kernel update to 4.8, then it looks like the change was between 4.7 and 4.8. I'll await your results.
(Boy, I hope it doesn't turn out to be some stupid iptables thing on my end!)
Thanks guys!
Updated my desktop to 4.8.10 (4.8.10-100.fc23.x86_64) last night. Trying your perl script and the nmap command, both still work without error.
I do have iptables running on that box, but I don't have any rate limiting rules of any kind.
I'll send you the kernel tuning parameters off list.
-- Wyatt Zacharias
Followup to my posts last month re: bursty packets being dropped with "operation not permitted".
LKML guys finally got around to fixing it and (soon) committing the fix to the stable tree. Their fix for 4.8 was to just revert the two patches that I had bisected down to. However, for 4.9 they say they have done a "real" fix rather than a simple revert. That won't help me with F24 though until F24 rebases to 4.9 (which could be whenever).
Relevant LKML patches-for-review here:
https://lkml.org/lkml/2017/1/4/936
https://lkml.org/lkml/2017/1/4/946
Funny, but to work on this bisection and LKML submission I followed my MUUG presentation notes from 2015. (Slides still available! https://muug.ca/meetings/150609-git-bisect.pdf and I'm currently working on another bisect on another box for another bug! Isn't bleeding edge fun!)
In the end, I hope every MUUGer obtains the knowledge to bisect and get fixed any kernel bug they are running into. The whole process for this one took about a month, with maybe another few weeks until the fix is in mainline for F24 and I can simply dnf to update to it. If it was urgent, I could easily whip up my own rpm with the patches in it for use in the meantime.
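For anyone who hasn't bisected before, the idea behind git bisect is just a binary search over the commit history between a known-good and a known-bad kernel. Here is a toy Python sketch of that search; it does not call git, and `first_bad`, the commit labels, and the `is_bad` test are all hypothetical stand-ins for "build this commit, boot it, run the ping test".

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    Assumes `commits` is ordered oldest to newest, the last one is
    known bad, and every commit after the first bad one is also bad.
    Returns (commit, number_of_tests_run).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1  # one build + boot + test cycle in real life
        if is_bad(commits[mid]):
            hi = mid  # culprit is mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return commits[lo], steps
```

The point of the halving is that even a range of a thousand commits needs only about ten test cycles, which is why a bisect that sounds hopeless is usually a few evenings of build-and-reboot rather than months.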
Extra special thanks to Theo & Wyatt for helping to pin down and confirm this bug!
Happy New Year Trevor! Your links to the relevant LKML patches are saying 'bad gateway' 'website is offline'. Is this to say you contribute to the Linux kernel? Or to Fedora? Looks like a lot of work, but rewarding.
Frank
From: Trevor Cordes trevor@tecnopolis.ca To: Continuation of Round Table discussion roundtable@muug.ca Sent: Thursday, January 5, 2017 12:34 AM Subject: Re: [RndTbl] latest kernel rate limits icmp to different hosts? (** nmap shows bug! **) SOLVED + FIXED
On 2017-01-05 Frank H wrote:
Happy New Year Trevor! Your links to LKML relevant patches are saying 'bad gateway' 'website is offline'
Hmm, must have been a transient outage... it works for me now. If it still doesn't work for you, maybe something is blocking your port 443 access? Can you reach https://muug.ca?
Is this to say you contribute to the linux kernel? or to Fedora? Looks like a lot of work but rewarding
Hmm, I guess I do "contribute" to the kernel in a manner of speaking :-) I'm by no means a kernel hacker, nor do I submit patches of my own. I guess I'm now a "kernel debugger and tester" :-) (I do have my name/email in the git tree now!)
Same with Fedora: I'm very active on Fedora bugzilla. Basically when I hit a bug that impacts me or any of the servers I admin, I do all the work to try to isolate the bug and help the actual code authors come to a speedy fix. I find you get better responsiveness (and faster errata releases) that way. Instead of saying a nebulous "it crashes sometimes", you say "it crashes but only after XYZABC code commit".