Re: [RndTbl] Command line challenge: trim garbage from start and end of a file.

10 Nov 2010

I may have misinterpreted the question before.  If you want the "output 
start" and "output end" marker lines in the output (which I guess your 
grep pipeline would do), then Adam's sed script will do that.  Mine, 
using the "d" commands, will output only the data in between.  The 
shortest awk scripts for each behaviour (with markers, then without) 
would be:

awk '/output start/{s=1};s==1;/output end/{s=0};'

or

awk '/output end/{s=0};s==1;/output start/{s=1};'

The first is a simplification of Adam's and, like his, prints the marker 
lines, while the second, using the same statements in the opposite 
order, suppresses them.  Of perl, awk and sed, I suspect sed is 
the most lightweight, and probably the quickest, unless perl can 
outperform sed on larger files.  awk has a reputation for being pretty 
slow.  I tend to favour sed unless awk or perl makes the job a lot easier.
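
To see the difference between the two orderings, here is a small sketch 
that runs both awk one-liners against a cut-down copy of John's sample 
file.  The temporary file and the variable names are just for 
illustration, and the sed line is my guess at a "d"-based equivalent, 
not necessarily the exact script from the earlier message.

```shell
#!/bin/sh
# Build a small sample file with the structure from the original question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
garbage
output start
.. good data 1
.. good data 2
output end
garbage
EOF

# Set the flag, print, then clear the flag: marker lines are included.
with_markers=$(awk '/output start/{s=1};s==1;/output end/{s=0};' "$sample")

# Clear the flag, print, then set the flag: marker lines are suppressed.
without_markers=$(awk '/output end/{s=0};s==1;/output start/{s=1};' "$sample")

# Assumed sed equivalent using "d": delete everything up to and including
# the start marker, and everything from the end marker onward.
sed_without=$(sed -e '1,/output start/d' -e '/output end/,$d' "$sample")

printf '%s\n' "--- awk, with markers ---" "$with_markers"
printf '%s\n' "--- awk, without markers ---" "$without_markers"
printf '%s\n' "--- sed, without markers ---" "$sed_without"

rm -f "$sample"
```

One caveat with the sed line: the range 1,/output start/ only looks for 
the marker from line 2 onward, so it assumes the start marker is not the 
very first line of the file.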

Gilles

On 11/10/2010 11:13 AM, Adam Thompson wrote:
...
The AWK version is functionally identical, and not very much shorter, or 
any more elegant:
awk '/output start/ {s=1};{if (s==1) print $0};/output end/ {s=0}'
(the perl version can generally be made that small, too.)
I would instead suggest sed(1), since this is precisely what it’s 
designed for:
sed -n '/output start/,/output end/p' < infile
-Adam
*From:* roundtable-bounces@muug.mb.ca 
[mailto:roundtable-bounces@muug.mb.ca] *On Behalf Of *Sean Walberg
*Sent:* Wednesday, November 10, 2010 10:56
*To:* Continuation of Round Table discussion
*Subject:* Re: [RndTbl] Command line challenge: trim garbage from start 
and end of a file.
OTTOMH:
perl -n -e 'BEGIN {$state = 0} $state = 1 if ($state == 0 and /output start/);
$state = 2 if ($state == 1 and /output end/); print if ($state == 1)' < infile > outfile
I'll bet there's a shorter AWK version though.
Sean
On Wed, Nov 10, 2010 at 10:51 AM, John Lange <john@johnlange.ca> wrote:
I have files with the following structure:
garbage
garbage
garbage
output start
.. good data
.. good data
.. good data
.. good data
output end
garbage
garbage
garbage
How can I extract the good data from the file trimming the garbage
from the beginning and end?
The following works just fine but it's dirty because I don't like the
fact that I have to pick an arbitrarily large number for the "before"
and "after" values.
grep -A 999999 "output start" <infile> | grep -B 999999 "output end" > newfile
Can anyone come up with something more elegant?
--
John Lange
http://www.johnlange.ca
-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 0J9  (Canada)
