This was a lovely "bug" to run into during a server migration tonight.
As root, running grep -r foobar / caused a new system to crash and reboot. I ran it again and it rebooted at exactly the same place.
The last line of output was an error about not being able to read /dev/watchdog (or something similar).
Rackspace thought it was a bug too, and quite odd; then they looked around and decided this is "normal" when watchdogs are turned on. Huh? I haven't checked Bugzilla yet, but this smells funny to me. A read of a file should never *trigger* an action, right? It's like /sys files: you write to them (echo > file) to get them to do something, never *read* them.
Any thoughts? If I don't find a Bugzilla entry for it, I'm definitely going to file one. (RHEL6)
Rackspace claims the only "fix" is to disable the watchdog or make sure not to grep -r /.
I know grep -r / is cheesy, lame-o and wasteful, as there's not going to be anything in /dev I'll need (and reading /dev/kmem is really cheesy), but sometimes I really need to find all occurrences of something fast with the least hassle possible (without going into /, ls'ing, and deciding what to include/exclude, which differs from system to system). My point is, grep -r / has never given me trouble in the past 20 years, but then again I usually don't have a watchdog.
On 04/05/2015 6:23 AM, Trevor Cordes wrote:
> Rackspace thought it was a bug too, and quite odd, then they looked around and decided this is "normal" when watchdogs are turned on. Huh? I haven't bz-checked this yet, but this smells funny to me. A read of a file should never *trigger* an action, right? It's like a /sys file: you echo > to them to get it to do something, never *read*.
A read on a file in /dev will almost always trigger an action (usually causing data to be read from a device, of course), but I can't imagine any normal scenario where that action should be a crash or reboot.
> Any thoughts? If I don't find a bz about it, I'm definitely going to make one. (RHEL6)
This does sound like a bug to me.
> RS claims the only "fix" is to disable watchdog or make sure not to grep -r /.
> I know grep -r / is really cheesy and lame-o and wasteful as there's not going to be something in /dev I'll need (and reading /dev/kmem is really cheesy) but sometimes I really need to find all occurrences of something fast with the least hassle possible (not going into / and ls'ing and deciding what to include/exclude which is different on many systems). My point is, grep -r / has never given me trouble in the past 20 years, but then again I usually don't have a watchdog.
I'd avoid grep -r from the root directory (especially as user root). The grep command is simplistic in its directory traversal, so (as I've seen in the past with it and other commands like "diff -r") you can end up in symlink-induced loops. I'd also avoid going through /dev (even just for reads), as this can cause hanging at the very least, and possibly other more serious unintended consequences. I have, in the past, done recursive traversals through /etc, but have encountered symlink loops that way.
On 2015-05-04 Gilbert E. Detillieux wrote:
> A read on a file in /dev will almost always trigger an action (usually causing data to be read from a device, of course), but I
Hmm, that's a good point. I suppose a read on a device could "steal" the input another program was expecting.
It's strange, though: now that I think about it, when I run grep -r /, I'm pretty sure it's not reading all the block device files, etc., as it runs way too fast for that...
Ah, I forgot I had this set as an alias for grep on every system I use: /bin/grep -s --devices=skip
I just checked, and this new Rackspace box did have my alias set! So the plot thickens! That makes two possible bugs: why was grep reading the watchdog device in the first place, and why did reading it trigger a reboot?
> can end up in symlink-induced loops. I'd also avoid going
I've never actually ended up in a symlink loop, but I can see how it would be easy to run into. I guess I don't use many dir symlinks on my systems.
Or... I just RTFM'd, and it looks like GNU grep solved the symlink problem already:
    -r, --recursive
        Read all files under each directory, recursively, following symbolic links only if they are on the command line. Note that if no file operand is given, grep searches the working directory. This is equivalent to the -d recurse option.
    -R, --dereference-recursive
        Read all files under each directory, recursively. Follow all symbolic links, unlike -r.
So that explains why I've never hit a symlink loop: I've never used -R (capital).
Ah, even better:
    -D ACTION, --devices=ACTION
        If an input file is a device, FIFO or socket, use ACTION to process it. By default, ACTION is read, which means that devices are read just as if they were ordinary files. If ACTION is skip, devices are silently skipped.
So my -D should skip every "weird" file, which means I've definitely, for sure, hit some bugs here. There's absolutely nothing that should stop anyone from safely doing:
    grep -r -D skip foobar /
on any system with a modern GNU grep.
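A minimal POSIX sh sketch of the file-type test that --devices=skip implies: classify a path as a device, FIFO or socket using test(1)'s stat-based checks, without ever opening it. The function name is_special is hypothetical, and this is not a claim about how grep implements the check internally.

```shell
# is_special PATH: succeed if PATH is a character device, block device,
# FIFO or socket, i.e. the "weird" file types that grep -D skip passes over.
# test(1) only stats the path (following symlinks); it never opens it, so
# checking /dev/watchdog this way does not call the driver's open().
is_special() {
  [ -c "$1" ] || [ -b "$1" ] || [ -p "$1" ] || [ -S "$1" ]
}

# Example: decide per file whether a "skip devices" search would read it.
for f in /dev/null /etc/passwd; do
  if is_special "$f"; then
    echo "skip: $f"
  else
    echo "read: $f"
  fi
done
```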
The main reason I sometimes want to grep -r / is that I need to find a string (like an IP address) that may be scattered literally anywhere on the filesystem. In the past, whenever I told myself "oh, it must only be in /etc or /var or /home", there ended up being some little file in /usr (not even /usr/local) or wherever that was changed ages ago to work around some problem, and it got missed unless I just "grepped the whole darn thing". Running ls / and going through the 10-15 entries in my head, deciding which are greppable and which are "system/non-greppable", takes time and is error- and omission-prone, and listing each one on the command line is a pain. This would be easier with a shell like zsh that has "negative globs", where I could specify "* except /dev /sys" (i.e. just make an exclude list, not an include list), but my beloved tcsh doesn't have that and I'm not ready to switch to zsh yet. And like I said before, each UNIX has its own set of "don't grep this" directories, and they change over time (even in Linux), so a "one alias fits all" is a non-solution.
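tcsh lacks zsh-style negative globs, but the exclude-list idea can be sketched in plain POSIX sh, which is available alongside tcsh on the same box. grep_except is a hypothetical helper, not an existing command; it assumes GNU grep (for -D skip) and exclude names without spaces, and it matches excludes against top-level names only.

```shell
# grep_except PATTERN ROOT [NAME...]: recursively grep every top-level entry
# of ROOT except the named ones (an exclude list, not an include list).
# Assumptions: GNU grep (for -D skip), exclude names contain no spaces, and
# hidden top-level entries are ignored because the * glob skips dotfiles.
# Plain sh has no "local", so these are ordinary shell variables.
grep_except() {
  pattern=$1
  root=${2%/}            # "/" becomes "", so the glob below is "/*"
  shift 2
  excludes=" $* "        # e.g. " dev proc sys "
  rc=1                   # grep convention: 1 means "no match anywhere"
  for entry in "$root"/*; do
    # An unmatched glob stays literal; skip it unless it really exists.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    name=${entry##*/}
    case $excludes in
      *" $name "*) continue ;;   # on the exclude list
    esac
    grep -r -s -D skip -- "$pattern" "$entry" && rc=0
  done
  return "$rc"
}
```

For example, grep_except 192.0.2.10 / dev proc sys would search everything under / except those three trees (192.0.2.10 is just a documentation-range placeholder address). The exclude list stays short and is the same on every system; entries that don't exist simply never match.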
Does your own system have /dev/watchdog? Mine does, but I'm in the middle of a bunch of stuff so I'm not going to go poking at it.
Also, did you read this: https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt
-- Wyatt Zacharias
On 2015-05-04 Wyatt Zacharias wrote:
> Does your own system have /dev/watchdog? Mine does, but I'm in the middle of a bunch of stuff so I'm not going to go poking at it.
Ya, grep it for fun and profit! Maybe quit all daemons first, and type sync every second until it blows up.
> Also, did you read this: https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt
Ya, I hadn't done the research yet. I was trying to think the philosophy through before delving into the details to find out why someone thinks this is a feature, not a bug.
It appears this is the problem:
> When the device is closed, the watchdog is disabled, unless the "Magic Close" feature is supported (see below). ... If a driver supports "Magic Close", the driver will not disable the watchdog unless a specific magic character 'V' has been sent to /dev/watchdog just before closing the file. If the userspace daemon closes the file without sending this special character, the driver will assume that the daemon (and userspace in general) died, and will stop pinging the watchdog without disabling it first. This will then cause a reboot if the watchdog is not re-opened in sufficient time.
That's almost certainly what is happening. I think the driver, if it can, should look at the type of open (read or write) and ignore reads: an open-for-read followed by a close should do nothing. Right now it appears the driver treats open-for-read + close the same as open-for-write + close.
Again, it's a philosophy thing, and I see no reason why a read should ever "do" an action other than obtain data (yes, possibly rewinding a tape too, etc), especially a hard reset!
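To make the quoted "Magic Close" rule concrete, here is a sketch of the deliberate shutdown that watchdog-api.txt expects from a daemon: write the magic 'V', then close. The function name and its optional path argument are hypothetical (the argument exists so it can be exercised on an ordinary file). Pointed at a real /dev/watchdog it will open, and therefore arm, the timer before asking the driver to disable it, and drivers configured with nowayout will not stop at all, so treat it as an illustration of the protocol rather than a tool.

```shell
# magic_close [DEVICE]: open DEVICE for writing, send the magic 'V', close it.
# A driver that supports Magic Close treats this as "the daemon is stopping
# on purpose" and disables the watchdog instead of letting it reboot.
# By contrast, to such a driver an open followed by a close with no 'V'
# (all that a plain read such as cat or a recursive grep does) looks like
# a dead daemon.
magic_close() {
  dev=${1:-/dev/watchdog}
  # One redirection = open, write one byte, close.
  printf 'V' > "$dev"
}
```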
Forgetting all that, my first task is to find out why grep -D skip is still opening the watchdog file at all...
On 2015-05-04 Adam Thompson wrote:
> I've found "find / -xdev -type f -print0 | xargs grep string /dev/null" to be completely reliable and reasonably portable.
Ya, that would work as long as I don't need separate filesystems searched too (a separate /home, for instance); or just drop the -xdev. That would make a good semi-universal alias.
Oh, and you forgot the -0 to xargs. And you probably want a /dev/null after the grep too.
The plot thickens...
I double-checked the exact command I was running when I did the grep...
I ran:
    n19 grep -r foobar /
n19 is an alias I've been using forever (and as per last month's RTFM):
    /bin/nice -19 /usr/bin/ionice -c2 -n7 -t
I n19 almost everything long-lasting / non-interactive I run.
It just dawned on me: "nah, the shell (tcsh) couldn't be expanding the n19 alias and *not* expanding the grep alias, could it?"
Sure enough, after a few tests, it is clear the shell only expands the first alias on the line. So that means (tada) my grep -r wasn't being run with the --devices option! That is why it was opening the device files. In my quest to be "nice", I shot myself in the foot.
So now the question is why doesn't the shell expand both aliases (I guess it's a safety / can't-tell-what-you-mean issue); is there a way to make the n19 alias expand the command listed after it too; or can I tell the shell to expand aliases after "modifier" commands (nice, xargs, etc).
Nice test case (may have to be modified for bash):
    # alias n19 '/bin/nice -19 /usr/bin/ionice -c2 -n7 -t'
    # nice n19 echo bobo
    /bin/nice: n19: No such file or directory
    # n19 n19 echo bobo
    ionice: failed to execute n19: No such file or directory
In a perfect world the system would "do what I mean" and both of the commands above would succeed, just as this does:
    nice nice echo bobo
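One shell-independent way around the one-alias-per-line limit, sketched here as an alternative that nobody in the thread proposed: put the grep defaults in a tiny executable script on the PATH instead of an alias. nice, ionice and xargs exec their arguments directly and never see shell aliases, but they can find a real file. The name sgrep and the install directory are hypothetical.

```shell
# install_sgrep DIR: create DIR/sgrep, a wrapper that runs grep with the
# same defaults as the grep alias in this thread (-s, --devices=skip).
install_sgrep() {
  dir=$1
  mkdir -p "$dir" || return 1
  cat > "$dir/sgrep" <<'EOF'
#!/bin/sh
# A real executable, not an alias, so nice/ionice/xargs can exec it.
exec grep -s --devices=skip "$@"
EOF
  chmod +x "$dir/sgrep"
}
```

After running install_sgrep ~/bin from sh or bash (with ~/bin on the tcsh path, plus a rehash so tcsh sees the new command), n19 sgrep -r foobar / expands only the n19 alias, and ionice then execs the wrapper with the device-skipping option already applied.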
Oh dear. Yeah, that would definitely do it.
The shell won't expand the second alias because it's an argument, not a command, from the shell's perspective. In bash et al., you might be able to do "alias1 ( alias2 )". No clue about tcsh.
Checking the man pages, bash accommodates this if the final character of the alias value is a space (!), but I don't see anything similar for tcsh. Using a shell function in bash instead of an alias would also enable the desired behavior; maybe that would work in tcsh, too?
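The trailing-space rule Adam mentions is also part of POSIX sh alias substitution, so a sketch in sh/bash looks like the following. The aliases hello and n19_nospace are made up for the demonstration, and ionice is dropped from n19 to keep it portable; as Adam notes, he found no equivalent in tcsh.

```shell
# Non-interactive bash ignores aliases unless expand_aliases is on
# (bash in POSIX mode and dash expand them anyway).
if [ -n "${BASH_VERSION:-}" ]; then shopt -s expand_aliases; fi

alias hello='echo hello from an alias'
# Trailing space: the shell also checks the NEXT word ("hello", "grep", ...)
# for an alias, so "n19 hello" becomes "nice -n 19 echo hello from an alias".
alias n19='nice -n 19 '
# No trailing space: "hello" reaches nice as a literal program name, the
# same kind of failure as Trevor's "/bin/nice: n19: No such file or directory".
alias n19_nospace='nice -n 19'

# Aliases apply only to lines parsed after they are defined, e.g. later:
#   n19 hello           -> hello from an alias
#   n19_nospace hello   -> nice fails: there is no program named hello
```

In bash, Trevor's alias would become alias n19='/bin/nice -19 /usr/bin/ionice -c2 -n7 -t ' (note the space before the closing quote), after which n19 grep -r foobar / picks up the grep alias as well.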
-Adam
On May 5, 2015 1:08:10 AM CDT, Trevor Cordes trevor@tecnopolis.ca wrote:
> The plot thickens...
I've found "find / -xdev -type f -print0 | xargs grep string /dev/null" to be completely reliable and reasonably portable, not to mention a lot faster. It's faster yet if you use any of the #NN arguments to xargs to batch the grep invocations. I started doing that because every time I ran grep -r, it would just hang somewhere in /etc.
-Adam
*xargs -0
Oops.
Roundtable mailing list Roundtable@muug.mb.ca http://www.muug.mb.ca/mailman/listinfo/roundtable
On 04/05/2015 10:59 PM, Adam Thompson wrote:
> I've found "find / -xdev -type f -print0 | xargs grep string /dev/null" to be completely reliable and reasonably portable. Not to mention a lot faster. Faster yet if you use any of the #NN arguments to xargs to batch the grep invocations.
I'd also use the xargs -r (or --no-run-if-empty) option, to avoid running the command if there are no args. (It may not particularly matter in this case, but it can sometimes be quite significant.)
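Folding the thread's corrections into Adam's pipeline (the -0 that Adam and Trevor both flagged, plus Gilbert's -r) gives something like the sketch below. find_grep is a hypothetical name; -print0, xargs -0 and xargs -r are GNU options, so check your platform's find and xargs before relying on them. The ROOT argument exists only so the function can be tried on a small tree.

```shell
# find_grep PATTERN [ROOT]: list lines matching PATTERN in regular files
# under ROOT (default /), staying on ROOT's filesystem.
find_grep() {
  pattern=$1
  root=${2:-/}
  # -type f: regular files only, so device nodes such as /dev/watchdog,
  #   FIFOs and sockets are never opened; find also does not follow
  #   symlinks by default, so symlink loops are not an issue.
  # -xdev: do not descend into other mounted filesystems (drop it to also
  #   search a separately mounted /home, as Trevor notes).
  # -print0 / xargs -0: NUL-separated names survive spaces and newlines.
  # xargs -r: do not run grep at all if find printed nothing.
  # /dev/null: an extra file operand so grep always prefixes the file name,
  #   even when xargs hands it a single file.
  find "$root" -xdev -type f -print0 |
    xargs -0 -r grep -- "$pattern" /dev/null
}
```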
> I started doing that because every time I ran grep -r, it would just hang somewhere in /etc.
Yeah, that's been my experience as well...
> On May 4, 2015 10:51:08 PM CDT, Trevor Cordes trevor@tecnopolis.ca wrote:
>> ...
>> Or... I just RTFM and it looks like gnu grep solved the symlink problem already:
>>     -r, --recursive
>>         Read all files under each directory, recursively, following symbolic links only if they are on the command line. Note that if no file operand is given, grep searches the working directory. This is equivalent to the -d recurse option.
>>     -R, --dereference-recursive
>>         Read all files under each directory, recursively. Follow all symbolic links, unlike -r.
>> So that explains why I've never hit a symlink loop: I've never used -R (capital).
This is a more recent addition to GNU grep. For example, on RHEL 6 systems, -R and -r are the same, and the man page makes no mention of special handling of symlinks (hence the possibility of loops and hanging).