Fw: [RndTbl] Oh great RE master

Oh great RE master

Steve Moffat

9 May 2007, 9:14 p.m.

Well, ya... I guess I did the equivalent (though not so concise) method after sending the first email to roundtable...

echo APP-AM005-a | sed 's/[[:alpha:]]//g;s/[[:punct:]]//g'

I like the search inversion though Sean. Much cleaner!

So the problem I have is solved, thanks Sean. But why won't my original method work? The [[:digit:]]* should have matched all the consecutive digits, shouldn't it? And then the \( \) brackets should place the match into buffer 1.

Steve

IBM Global Services sjm@ca.ibm.com (204)792-3245

----- Forwarded by Steve Moffat/CanWest/IBM on 05/09/2007 04:08 PM -----

From: "Sean Walberg" <sean@ertw.com>
Sent by: swalberg@gmail.com
To: Steve Moffat/CanWest/IBM@IBMCA
cc: roundtable@muug.mb.ca
Date: 05/09/2007 04:05 PM
Subject: Re: [RndTbl] Oh great RE master

# echo BUILD-AM005-a | sed 's/[^0-9]//g'
005

Sean

On 5/9/07, Steve Moffat <Steve.Moffat@ca.ibm.com> wrote:

Hi All;

I've been trying to write a sed function to return only the numeric portion of a string, but can't seem to get it working. The input is a single string of letters and numbers, with the numbers always consecutive. For example: BUILD-AM005-a

I want to get the 005 out of this string.

echo BUILD-AM005-a | sed 's/.*\([[:digit:]]\).*/\1/g'

will return the digit 5. This is good!

So I add an asterisk to try to match multiple digits, like:

echo BUILD-AM005-a | sed 's/.*\([[:digit:]]*\).*/\1/g'

and instead of returning 005, it doesn't match anything, so returns nothing.

Can any of you RE masters help me out?

Steve Moffat IBM Global Services sjm@ca.ibm.com (204)792-3245

_______________________________________________ Roundtable mailing list Roundtable@muug.mb.ca http://www.muug.mb.ca/mailman/listinfo/roundtable

-- Sean Walberg sean@ertw.com http://ertw.com/


Sean Walberg

9 May, 9:26 p.m.

New subject: Oh great RE master

What you had was:

Match anything, followed by zero or more digits, followed by anything.

The first "match anything" matched the entire string, which also satisfied the other two conditions, since both of them are allowed to match nothing.

The * operator is greedy. In Perl, .*? probably would have worked; I'm not sure if that feature exists in sed. Google around for "backtracking"; O'Reilly had an excellent article on how it works in Perl, which should be the same for any regex library.
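The greedy behaviour described above can be seen by running both of Steve's expressions side by side. This is only a small sketch: the variable names are made up for illustration, and the backslashed \( \) grouping is assumed to be what the original commands used.

```shell
input='BUILD-AM005-a'

# One mandatory digit: the leading .* has to give back just enough
# of the line for a single digit to match, so the group gets the last digit.
one_digit=$(printf '%s\n' "$input" | sed 's/.*\([[:digit:]]\).*/\1/')

# Zero or more digits: the leading .* can keep the whole line,
# because the group and the trailing .* are both happy to match nothing.
zero_or_more=$(printf '%s\n' "$input" | sed 's/.*\([[:digit:]]*\).*/\1/')

printf 'one digit:    [%s]\n' "$one_digit"
printf 'zero or more: [%s]\n' "$zero_or_more"
```

The second command does match the line; it is the captured group that is empty, which is why sed prints a blank line rather than the unchanged input.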

Sean


Trevor Cordes

11:59 p.m.

New subject: Oh great RE master

On 9 May, Sean Walberg wrote:

...

The * operator is greedy, in perl .*? probably would have worked, I'm not sure if that feature exists in sed. Google around for "backtracking",

Sean beat me to it. Perl's non-greedy *? is what you want. Without it, the leading .* grabs as much of the string as it can before anything else gets a chance to match. I use perl's non-greedy modifiers *all* the time. Plus, perl lets you use \d instead of the horrific POSIX [[:digit:]] syntax.
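For anyone who wants to try the non-greedy form, here is a rough sketch, assuming perl is installed; the variable names are invented for the example. One thing it shows is that the non-greedy .*? needs a group that must match at least one digit (\d+), because a group that may be empty (\d*) is satisfied at the very start of the line.

```shell
input='BUILD-AM005-a'

# Non-greedy .*? stops as soon as \d+ can match, so the group
# receives the whole run of digits.
lazy_plus=$(printf '%s\n' "$input" | perl -pe 's/.*?(\d+).*/$1/')

# With \d* the group matches nothing at position 0, and the trailing
# .* takes the rest of the line, so the result is empty again.
lazy_star=$(printf '%s\n' "$input" | perl -pe 's/.*?(\d*).*/$1/')

printf 'lazy with +: [%s]\n' "$lazy_plus"
printf 'lazy with *: [%s]\n' "$lazy_star"
```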

Dan Martin

10 May, 1:38 a.m.

New subject: Another RE problem with sed

I tried to use a sed command to extract alphanumeric names, one per line, from a file which also included comments (lines starting with '#') and blank lines. I wanted to refer to each in a loop.

I tried

for MACHINE in `sed '/^[0-9A-Za-z][0-9A-Za-z]*/' ~/MPI-SRC/machine_names.txt`

which is a little awkward because sed has no '+' to indicate "at least one alphanumeric".

I got an error that there was no command given to sed. I thought it printed by default.

I tried

for MACHINE in `sed '/^[0-9A-Za-z][0-9A-Za-z]*/p' ~/MPI-SRC/machine_names.txt`

and it printed all the names twice.

Finally, I had to use

for MACHINE in `sed -n '/^[0-9A-Za-z][0-9A-Za-z]*/p' ~/MPI-SRC/machine_names.txt`

to suppress one copy.

Why do I get "double or nothing"?

-- -Dan Dr. Dan Martin, MD, CCFP, BSc, BCSc (Hon) GP Hospital Practitioner Computer Science grad student ummar143@cc.umanitoba.ca (204) 831-1746 answering machine always on

Sean Walberg

3:02 a.m.

New subject: Another RE problem with sed

I guess the simple answer is because you told it to... the p command prints the current pattern space, and -n suppresses the automatic print that sed otherwise does for every line. I'm not sure how to use sed like grep, which is basically what you're doing :)
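The "double or nothing" effect can be reproduced with a few lines of sample input. This is a sketch only: the sample names below are a hypothetical stand-in for machine_names.txt, and the shortened pattern is used purely to keep the example compact.

```shell
# Hypothetical stand-in for the contents of machine_names.txt
names=$(printf '%s\n' '# compute nodes' 'alpha' '' 'beta2')

# Default behaviour: every line is printed automatically at the end of
# its cycle, and p prints the matching lines one extra time.
doubled=$(printf '%s\n' "$names" | sed '/^[0-9A-Za-z]/p')

# -n switches off the automatic print, so only the explicit p remains.
once=$(printf '%s\n' "$names" | sed -n '/^[0-9A-Za-z]/p')

printf '%s\n' '--- without -n ---' "$doubled"
printf '%s\n' '--- with -n ---' "$once"
```

Without -n the comment and blank lines also come through once each, since the automatic print applies to every line, not just the matching ones.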

This may be a silly observation, but a regexp like ^[0-9A-Za-z][0-9A-Za-z]* is the same as ^[0-9A-Za-z] (with or without a + at the end) if you're looking for a simple match and not replacing anything. The second alphanum doesn't have to match at all because of the *, so there's not much point in having it!

Quite frankly, sed is too much of a pain for anything but the simplest substitutions. In your case I'd look at egrep (which supports the + operator, previous comment notwithstanding), and for anything more complex, a perl one-liner.

sed 's/something/complex/g' file

is the same as

perl -pe 's/something/complex/g' file

and you can do a lot more things in the code. You can even edit a file in place with -i:

perl -i.bak -pe 's/sed/perl -pe/' myscript.sh

or only print certain lines with -n (no print rather than the -p meaning print)

perl -ne 'if (/\/([^\/]+$)/) { print $1; }' file

(the last one should print the last component of a /path/to/file/name of lines looking like a file path, and nothing on the rest)
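Going back to the egrep suggestion earlier in this message, the machine-name loop might be written along these lines. It is a sketch under assumptions: the temporary file is a hypothetical stand-in for ~/MPI-SRC/machine_names.txt, and the trailing $ anchor is an added tightening (not in the original pattern) that makes the + actually matter by requiring the whole line to be alphanumeric.

```shell
# Hypothetical sample file standing in for ~/MPI-SRC/machine_names.txt
list=$(mktemp)
printf '%s\n' '# compute nodes' 'alpha' '' 'beta2' > "$list"

seen=''
# grep -E (the modern spelling of egrep) supports +, and grep prints
# only matching lines, so no -n/p juggling is needed.
for MACHINE in $(grep -E '^[0-9A-Za-z]+$' "$list"); do
    seen="$seen$MACHINE "
done
rm -f "$list"

printf 'machines: %s\n' "$seen"
```

Because the loop relies on shell word splitting, it only behaves well when each name is a single word, which the anchored pattern guarantees here.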

Sean


Sean Walberg

3:13 a.m.

New subject: Oh great RE master

I think Gilles had another good idea with the [^[:digit:]]* to strip out all the leading non-digits instead of the first greedy .*

Most often I find that if I start a regexp with .* it can be rewritten much more simply by rethinking, often ending up in a [^X]*([X]+) pattern like Gilles or the s/[^X]//g pattern like I did. .*? does work wonders too, but regexps written that way suffer from the "what the heck does this do?" syndrome 6 months down the road :)

That said, having two .* in the same pattern usually ends up causing problems because of the very reasons we've gone through, and is a good sign to rethink the way you're matching.
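As a sketch of the [^X]*([X]+) shape with X being a digit: the example below assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), which lets the pattern be written with bare parentheses and +, close to the notation above. The variable names are illustrative.

```shell
input='BUILD-AM005-a'

# [^0-9]* cannot consume digits, so it stops right before the first one
# and the group is forced to take the whole run that follows.
digits=$(printf '%s\n' "$input" | sed -E 's/^[^0-9]*([0-9]+).*$/\1/')

printf '%s\n' "$digits"
```

There is still one .* at the end, but it only mops up what follows the digits, so it has nothing to compete with.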

Sean


Gilles Detillieux

9 May, 9:45 p.m.

New subject: Fw: Oh great RE master

The problem with 's/.*\([[:digit:]]*\).*/\1/g' is that the first .* will swallow up as many characters as it can while still letting the rest of the expression match something. Because * means zero or more of the preceding item, the [[:digit:]]* and the trailing .* will happily match nothing at all, so the initial .* still swallows everything.

The fix is to make the first part more restrictive than .*, e.g. [^0-9]* or [^[:digit:]]*, so it won't chew up your digits. Then again, Sean's RE is even simpler, so long as you want all the digits and it doesn't matter where they are. If you needed to extract the first contiguous string of possibly several strings of digits, though, you'd need to get more elaborate.
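To illustrate the difference when there are several strings of digits, here is a small sketch. The input string is hypothetical (it is not from the original problem), and the second command is just one possible way of being "more elaborate" in plain BRE, where "one or more digits" has to be spelled [0-9][0-9]*.

```shell
# Hypothetical input with more than one run of digits
input='REL2-AM005-b17'

# Deleting every non-digit glues all the runs together.
all_digits=$(printf '%s\n' "$input" | sed 's/[^0-9]//g')

# Skipping leading non-digits, capturing one run, and discarding the
# rest of the line keeps only the first contiguous run.
first_run=$(printf '%s\n' "$input" | sed 's/^[^0-9]*\([0-9][0-9]*\).*$/\1/')

printf 'all digits: %s\n' "$all_digits"
printf 'first run:  %s\n' "$first_run"
```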

An equivalent to Sean's command would be:

echo BUILD-AM005-a | tr -dc '0-9'

This would chew up the newline character as well, but that doesn't matter if you're going to use the result in a variable using var=`...` or var=$(...) .
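A quick sketch of the tr version used inside a command substitution; the variable names are made up for the example.

```shell
# tr -dc deletes the complement of the set: everything that is not 0-9,
# including the newline that echo appends.
build=$(echo BUILD-AM005-a | tr -dc '0-9')

# Counting bytes confirms that only the three digits survive.
bytes=$(echo BUILD-AM005-a | tr -dc '0-9' | wc -c)

printf 'build number: %s (%d bytes)\n' "$build" "$bytes"
```

Command substitution strips trailing newlines anyway, so the sed and tr versions leave the same value in the variable.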

Gilles


-- Gilles R. Detillieux E-mail: grdetil@scrc.umanitoba.ca Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/ Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
