Well, ya... I guess I did the equivalent (though not so concise) method after sending the first email to roundtable...
echo APP-AM005-a | sed 's/[[:alpha:]]//g;s/[[:punct:]]//g'
I like the search inversion though, Sean. Much cleaner!
So the problem I have is solved, thanks Sean. But why won't my original method work? The [[:digit:]]* should have matched all the consecutive digits, shouldn't it? And then the \( \) brackets should place the match into buffer 1.
Steve
IBM Global Services sjm@ca.ibm.com (204)792-3245
----- Forwarded by Steve Moffat/CanWest/IBM on 05/09/2007 04:08 PM -----
"Sean Walberg" <sean@ertw.com>
Sent by: swalberg@gmail.com
05/09/2007 04:05 PM
To: Steve Moffat/CanWest/IBM@IBMCA
cc: roundtable@muug.mb.ca
Subject: Re: [RndTbl] Oh great RE master
# echo BUILD-AM005-a | sed 's/[^0-9]//g'
005
Sean
On 5/9/07, Steve Moffat <Steve.Moffat@ca.ibm.com> wrote:
Hi All;
I've been trying to write a sed function to return only the numeric portion of a string, but can't seem to get it working. The input is a single string of letters and numbers, with the numbers always consecutive. For example: BUILD-AM005-a
I want to get the 005 out of this string.
echo BUILD-AM005-a | sed 's/.*\([[:digit:]]\).*/\1/g'
will return the digit 5. This is good!
So I add an asterisk to try to match multiple digits, like:
echo BUILD-AM005-a | sed 's/.*\([[:digit:]]*\).*/\1/g'
and instead of returning 005, it doesn't match anything, so returns nothing.
Can any of you RE masters help me out?
Steve Moffat IBM Global Services sjm@ca.ibm.com (204)792-3245
_______________________________________________ Roundtable mailing list Roundtable@muug.mb.ca http://www.muug.mb.ca/mailman/listinfo/roundtable
-- Sean Walberg sean@ertw.com http://ertw.com/
What you had was:
Match anything, followed by zero or more digits, followed by anything.
The first "match anything" matched the entire string, which also satisfied the other two conditions, since both of them are allowed to match nothing.
The * operator is greedy. In Perl, .*? probably would have worked; I'm not sure if that feature exists in sed. Google around for "backtracking"; O'Reilly had an excellent article on how it works in Perl, which should be much the same for any regex library.
Sean
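To make the greedy behaviour visible, here is a small sketch that wraps the captured group in square brackets so an empty capture still shows up. The show_capture helper is a made-up name for illustration, not something from the thread.

```shell
#!/bin/sh
# show_capture is a hypothetical helper: $1 = sed script, $2 = input string.
show_capture() {
    printf '%s\n' "$2" | sed "$1"
}

# Exactly one digit required: the leading .* has to give back just enough
# for one digit, so it stops right before the LAST digit in the line.
show_capture 's/.*\([[:digit:]]\).*/[\1]/' 'BUILD-AM005-a'    # prints [5]

# Zero or more digits allowed: the leading .* keeps the whole line and the
# group is satisfied by matching nothing at all.
show_capture 's/.*\([[:digit:]]*\).*/[\1]/' 'BUILD-AM005-a'   # prints []
```

So the second command does match; it just captures an empty string, and the whole line gets replaced by it.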
On 5/9/07, Trevor Cordes trevor@tecnopolis.ca wrote:
On 9 May, Sean Walberg wrote:
The * operator is greedy, in perl .*? probably would have worked, I'm not sure if that feature exists in sed. Google around for "backtracking",
Sean beat me to it. Perl's non-greedy *? is what you want. Without it, the first .* grabs as much as it can, starting from the leftmost position. I use Perl's non-greedy modifiers *all* the time. Plus, Perl lets you use \d instead of the horrific POSIX [[:digit:]] syntax.
On 5/9/07, Dan Martin ummar143@cc.umanitoba.ca wrote:
I tried to use a sed command to extract alphanumeric names, one per line, from a file which also included comments (lines starting with '#') and blank lines. I wanted to refer to each in a loop.
I tried
for MACHINE in `sed '/^[0-9A-Za-z][0-9A-Za-z]*/' ~/MPI-SRC/machine_names.txt`
which is a little awkward because sed has no '+' to indicate "at least one alphanumeric".
I got an error that there was no command given to sed. I thought it printed by default.
I tried
for MACHINE in `sed '/^[0-9A-Za-z][0-9A-Za-z]*/p' ~/MPI-SRC/machine_names.txt`
and it printed all the names twice.
Finally, I had to use
for MACHINE in `sed -n '/^[0-9A-Za-z][0-9A-Za-z]*/p' ~/MPI-SRC/machine_names.txt`
to suppress one copy.
Why do I get "double or nothing"?
I guess the simple answer is because you told it to... sed prints every line by default once it has finished with it, the p command prints the current pattern space a second time, and -n suppresses the default print. I'm not sure how to use sed like grep, which is basically what you're doing :)
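A small sketch of the "double or nothing" effect. The sample function below is a made-up stand-in for machine_names.txt, not the real file.

```shell
#!/bin/sh
# sample is a hypothetical stand-in for the machine list: one comment,
# one blank line, and two names.
sample() {
    printf '%s\n' '# compute nodes' 'alpha' '' 'beta2'
}

# Default auto-print plus p: every line once, matching lines twice.
sample | sed '/^[0-9A-Za-z]/p'

# -n switches the auto-print off, so only the explicit p output is left.
sample | sed -n '/^[0-9A-Za-z]/p'

# The same filter written with grep.
sample | grep '^[0-9A-Za-z]'
```

The first command prints six lines (the comment, alpha twice, the blank line, beta2 twice); the other two print just alpha and beta2.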
This may be a silly observation, but a regexp like ^[0-9A-Za-z][0-9A-Za-z]* is the same as ^[0-9A-Za-z] (with or without a + at the end) if you're looking for a simple match and not replacing anything. The second alphanum doesn't have to match at all because of the *, so there's not much point in having it!
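To check that claim, this sketch runs both patterns over the same made-up input (the lines function is hypothetical) and compares which lines they select.

```shell
#!/bin/sh
# lines is a hypothetical sample input with a comment and a blank line.
lines() {
    printf '%s\n' '# comment' 'alpha' '' 'beta2' '9lives'
}

# Long form: one alphanumeric, then zero or more alphanumerics.
long=$(lines | grep '^[0-9A-Za-z][0-9A-Za-z]*')

# Short form: a single leading alphanumeric is all a plain match needs.
short=$(lines | grep '^[0-9A-Za-z]')

[ "$long" = "$short" ] && echo "same lines selected"
```

The difference only matters when the matched text itself is used, for example in a substitution.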
Quite frankly, sed is too much of a pain for anything but the simplest substitutions. In your case I'd look at egrep (which supports the + operator, previous comment notwithstanding), and for anything more complex, a Perl one-liner.
sed 's/something/complex/g' file
is the same as
perl -pe 's/something/complex/g' file
and you can do a lot more things in the code. You can even edit a file in place with -i:
perl -i.bak -pe 's/sed/perl -pe/' myscript.sh
or only print certain lines with -n (no automatic print, as opposed to -p, which prints every line):
perl -ne 'if (/\/([^\/]+$)/) { print $1; }' file
(the last one should print the last component of a /path/to/file/name on lines that look like a file path, and nothing for the rest)
Sean
I think Gilles had another good idea with the [^[:digit:]]* to strip out all the leading non-digits instead of the first greedy .*
Most often I find that if I start a regexp with .* it can be rewritten much more simply by rethinking, often ending up in a [^X]*([X]+) pattern like Gilles or the s/[^X]//g pattern like I did. .*? does work wonders too, but regexps written that way suffer from the "what the heck does this do?" syndrome 6 months down the road :)
That said, having two .* in the same pattern usually ends up causing problems because of the very reasons we've gone through, and is a good sign to rethink the way you're matching.
Sean
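The two rewrites can be compared side by side in sed. This is only a sketch: first_digits and all_digits are made-up names, and since basic sed regexps have no +, the [X]+ part is spelled [0-9][0-9]*.

```shell
#!/bin/sh
# [^X]*\(X+\) style: skip leading non-digits, keep the first run of digits.
first_digits() {
    sed 's/[^0-9]*\([0-9][0-9]*\).*/\1/'
}

# s/[^X]//g style: delete every non-digit, wherever it is.
all_digits() {
    sed 's/[^0-9]//g'
}

echo BUILD-AM005-a | first_digits   # prints 005
echo BUILD-AM005-a | all_digits     # prints 005
echo v12-build345  | first_digits   # prints 12
echo v12-build345  | all_digits     # prints 12345
```

The two agree when there is a single run of digits and differ when there are several. On a line with no digits at all, first_digits leaves the line untouched while all_digits empties it.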
The problem with 's/.*\([[:digit:]]*\).*/\1/g' is that the first .* will swallow up as many characters as it can while still letting the rest of the expression match something. Now, because * means zero or more of the preceding item, the [[:digit:]]* and the trailing .* will happily match nothing at all, so the initial .* still swallows everything. The fix is to make the first part more restrictive than .*, e.g. [^0-9]* or [^[:digit:]]*, so it won't chew up your digits, but then Sean's RE is even simpler, so long as you want all the digits and it doesn't matter where they are. If you needed to extract the first contiguous string out of possibly several strings of digits, though, you'd need to get more elaborate.
An equivalent to Sean's command would be:
echo BUILD-AM005-a | tr -dc '0-9'
This would chew up the newline character as well, but that doesn't matter if you're going to capture the result in a variable using var=`...` or var=$(...).
Gilles
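A short sketch of the newline point. The variable name num and the second sample string APP-AM017-b are made up for illustration.

```shell
#!/bin/sh
# Command substitution drops trailing newlines anyway, so losing the
# newline inside tr makes no difference here.
num=$(echo BUILD-AM005-a | tr -dc '0-9')
echo "$num"                                                # prints 005

# When filtering several lines at once, adding \n to the kept set
# preserves the line structure.
printf '%s\n' BUILD-AM005-a APP-AM017-b | tr -dc '0-9\n'   # prints 005 and 017
```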