On 2010-11-10 Sean Walberg wrote:
Adam and I were having an offline discussion, and some testing shows that AWK outperforms SED by a slight margin:
I know it's an old thread... but I had to have a go at you awk/sed weenies. ;-)
My solution is a Perl regex:
perl -e '$/=undef;open I,$ARGV[0];$_=<I>;/(?:^|\n)(output start\n.*\noutput end\n)/s and print $1' infile
It's not a filter (requires a filename) but could probably easily be made into one.
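For instance, a filter version might look like this (a sketch using perl's -0777 slurp switch so it reads stdin instead of opening a named file):

perl -0777 -ne 'print $1 if /(?:^|\n)(output start\n.*\noutput end\n)/s' < infile

-0777 sets $/ to undef, so the single -n pass gets the whole input in $_.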
I recall reading in Perl books that Perl's regex engine is faster than sed/awk, and the one-liner above takes advantage of the whole-file slurp that undefining $/ allows.
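For reference, the same slurp idiom spelled out in script form (a minimal sketch; 'infile' is just a placeholder filename):

{
    local $/;                        # undef $/ so the next read slurps the whole file
    open my $fh, '<', 'infile' or die "open infile: $!";
    my $text = <$fh>;                # entire file in one string
    print $1 if $text =~ /(?:^|\n)(output start\n.*\noutput end\n)/s;
}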
On my computer the awk/sed/perl times compare like so:
time sed -n '/output start/,/output end/p' < infile > /dev/null
0.264+0.002c 0:00.26s 100.0% 0+0<774k | 1+39cs 0+259pg 0sw 0sg

time awk '/output start/,/output end/' < infile > /dev/null
0.183+0.003c 0:00.18s 100.0% 0+0<774k | 1+28cs 0+298pg 0sw 0sg

time perl -e '$/=undef;open I,$ARGV[0];$_=<I>;/(?:^|\n)(output start\n.*\noutput end\n)/s and print $1' infile > /dev/null
0.032+0.017c 0:00.05s 80.0% 0+0<8168k | 1+19cs 0+4196pg 0sw 0sg
Wow! But yikes, look at the mem usage. Good thing RAM is plentiful these days. In 1980 sed would be the better bet for sure.
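(If memory were the constraint, a line-by-line Perl version keeps it flat at the cost of the slurp trick; a sketch using the flip-flop range operator, assuming the markers each appear once:

perl -ne 'print if /output start/ .. /output end/' infile

That is essentially the sed/awk range rewritten in Perl, reading one line at a time.)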
[sean@bob tmp]$ W=/usr/share/dict/words
[sean@bob tmp]$ (tail -1000 $W; echo output start; cat $W; echo output end; head -1000 $W) > infile
[sean@bob tmp]$ wc -l infile
481831 infile