[RndTbl] fast counting with find

Peter O'Gorman peter at pogma.com
Sun Nov 6 21:04:10 CST 2011


On 11/05/2011 05:11 PM, Adam Thompson wrote:
>> -----Original Message-----
>> From: roundtable-bounces at muug.mb.ca [mailto:roundtable-
>> bounces at muug.mb.ca] On Behalf Of Trevor Cordes
>> Sent: Saturday, November 05, 2011 10:00
>> To: MUUG RndTbl
>> Subject: [RndTbl] fast counting with find
>>
>> I found myself needing a type of -limit -quit option in find.  I
>> couldn't
>> Why do I want to quit at all?  Why not just do find|wc -l?  The
>> dirs I'm
>> scanning have about 200k files and are sometimes over NFS.  Either
>> way, a
>> full find|wc takes a long time and a lot of resources, especially
>> if the
>> find has to do a stat (for mtime, etc).  With find|wc my 1 find
>> command
>> took 10+ mins.  With my new method, it's a few seconds.
>>
>
>
> Doesn't "find /path -args | head -1000 | wc -l" give you nearly the same
> result?  It may generate more disk i/o in the background (depending on
> pipe buffering and signalling semantics) but should be just as fast when used
> interactively.
>
> (For the pedantic among us, that should read "find /path -args -print |
> head -n 1000 | wc -l" since direct specification of the line count to
> head(1) in option-style syntax is deprecated in POSIX.)

I regularly use sed Nq (where N is a number) instead of head because sed 
100q is universal, and head sometimes requires -n and sometimes doesn't, 
and that's annoying.
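
A minimal sketch of the sed Nq idiom, in case it helps. The demo
directory and file names are made up for illustration, and I'm assuming
mktemp -d is available:

```shell
# Hypothetical demo tree: 50 empty files in a temporary directory.
demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
    : > "$demo_dir/file$i"
    i=$((i + 1))
done

# sed prints lines up to and including line 10, then quits;
# wc -l counts whatever made it through the pipe.
capped=$(find "$demo_dir" -type f -print | sed 10q | wc -l)

# For comparison, the uncapped count of the same tree.
total=$(find "$demo_dir" -type f -print | wc -l)

echo "capped=$capped total=$total"
rm -rf "$demo_dir"
```

On a big tree, find normally gets SIGPIPE on a later write once sed has
exited, so it stops well before walking everything.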

Limiting the number of matches may not be the real goal, though. Perhaps
it would be better to limit the resources that find uses? For example,
with recent coreutils you can limit how long it runs with the
timeout(1) command.
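
Roughly what I have in mind with timeout(1), as a sketch only. The
5-second and 1-second limits and the demo directory are arbitrary
choices for illustration, and this assumes the GNU coreutils timeout:

```shell
# Hypothetical demo tree with two files.
demo_dir=$(mktemp -d)
: > "$demo_dir/a"
: > "$demo_dir/b"

# Give find at most 5 seconds; count whatever it printed in that time.
seen=$(timeout 5 find "$demo_dir" -type f -print | wc -l)

# GNU timeout exits with status 124 when the time limit was hit.
# sleep stands in here for a find that runs too long.
status=0
timeout 1 sleep 3 || status=$?

echo "seen=$seen status=$status"
rm -rf "$demo_dir"
```

If the limit is hit, the count is only a lower bound on the number of
matches, so the 124 status is worth checking.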

The difference becomes obvious if you think about which option you'd
like added to findutils to give the desired result: one which stops find
when a given number of matches is reached, or one which stops find after
some number of paths have been seen. Depending on the other options
supplied, these two could behave very differently.
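
The two behaviours can be approximated with pipes today. This is only
an emulation of the two hypothetical options, not anything findutils
provides; the demo tree, the '*.log' test and the limit of 3 are made
up for illustration:

```shell
# Hypothetical demo tree: 40 non-matching files and 3 matching ones.
demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 40 ]; do
    : > "$demo_dir/data$i.txt"
    i=$((i + 1))
done
for name in a b c; do
    : > "$demo_dir/$name.log"
done

limit=3

# "Stop after N matches": find filters first, then the cap is applied.
by_matches=$(find "$demo_dir" -name '*.log' -print | sed "${limit}q" | wc -l)

# "Stop after N paths seen": the cap is applied first, then the filter.
# The first path find prints is the starting directory itself.
by_paths=$(find "$demo_dir" -print | sed "${limit}q" | grep '\.log$' | wc -l)

echo "by_matches=$by_matches by_paths=$by_paths"
rm -rf "$demo_dir"
```

The second count depends on the order in which find walks the
directory, so it can be anywhere from zero up to the first count.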

Peter



