[RndTbl] fast counting with find

Sat Nov 5 10:00:24 CDT 2011

I found myself needing a type of -limit -quit option in find.  I couldn't 
see a built-in way to do it, even with GNU find.  GNU find does let you 
count to 1 and quit, simply by using -quit, but not count to X then quit.
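
For reference, the count-to-1 case that GNU find already handles might look like the sketch below. has_any_file is a made-up name for illustration, and -quit is a GNU extension, so this assumes GNU find.

```shell
# Sketch: does DIR contain at least one regular file?
# has_any_file is a hypothetical name; -quit assumes GNU find.
has_any_file() {
  # -print -quit emits the first match and stops the scan right there,
  # so the test is true as soon as a single file is seen
  [ -n "$(find "$1" -type f -print -quit)" ]
}
```

Something like "has_any_file /some/dir && echo yes" then returns after the first match instead of walking the whole tree.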

Why do I want to quit at all?  Why not just do find|wc -l?  The dirs I'm 
scanning have about 200k files and are sometimes over NFS.  Either way, a 
full find|wc takes a long time and a lot of resources, especially if the 
find has to do a stat (for mtime, etc).  With find|wc, a single find 
command took 10+ mins.  With my new method, it takes a few seconds.

Here's the best solution I could think up.  It's sub-optimal I'm sure 
(requires execs and a temp file), but I couldn't see an easier way to do 
it within the confines of find (without writing my own find, which I 
didn't want to do in this case).

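See the find command example on line 6 of the script.  The helper's arg 1 
is a temp file path (normal race condition safety precautions apply).  
Arg 2 is the number to count to.  The helper exits non-zero until the 
count is reached, so -exec is false and find skips -quit; once the helper 
exits 0, -exec is true and find runs -quit.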

cat find-count-helper
#!/usr/bin/perl -w
#
# allows a type of counting short-circuit in find
# much faster in huge dirs than doing a find | wc -l
# use:
# find path \( -name 'exclude-dir' -prune \) -o -type f -print -exec 
# /usr/local/script/find-count-helper /tmp/unique-temp-file 5 \; -quit
# will find the first 5 matching files then quit

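use strict;
use warnings;

my ($file, $max) = @ARGV;
die "usage: $0 temp-file count\n" unless defined $max && $max =~ /^\d+$/;

# read the running count; a missing or empty temp file counts as 0
my $count = 0;
if (open(my $in, '<', $file)) {
  my $line = <$in>;
  $count = $1 if defined $line && $line =~ /^(\d+)/;
  close($in);
}
$count++;

# limit reached: clean up and exit 0 so -exec is true and find runs -quit
if ($count >= $max) {
  unlink $file;
  exit 0;
}

# otherwise save the new count and exit 1 so -exec is false and find goes on
open(my $out, '>', $file) or die "can't write $file: $!\n";
print $out "$count\n";
close($out);
exit 1;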
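
For anyone who wants to try the idea without installing the helper, one way to wrap it in a shell function might look like the sketch below. It assumes GNU find (for -quit) and mktemp. count_up_to is a made-up name, and the inline sh -c body is a shell port of the helper's counting logic, not the Perl script itself.

```shell
# Sketch: count at most LIMIT regular files under DIR, stopping find early.
# count_up_to is a hypothetical wrapper; assumes GNU find and mktemp.
count_up_to() {
  local dir=$1 limit=$2 counter
  counter=$(mktemp) || return 1   # unique temp file holding the running count
  find "$dir" -type f -print -exec sh -c '
      n=$(cat "$1" 2>/dev/null)
      n=$(( ${n:-0} + 1 ))
      # limit reached: exit 0 makes -exec true, so find goes on to -quit
      [ "$n" -ge "$2" ] && exit 0
      # not there yet: save the count and exit 1 so -quit is skipped
      echo "$n" > "$1"
      exit 1
    ' sh "$counter" "$limit" \; -quit | wc -l
  rm -f "$counter"
}
```

"count_up_to /some/dir 5" should report 5 once the directory holds at least five regular files, or the smaller total if it holds fewer.  It still costs one exec per matched file, same as the Perl version.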