I found myself needing a type of -limit -quit option in find. I couldn't see a built-in way to do it, even with GNU find. GNU find does let you count to 1 and quit, simply by using -quit, but not count to X then quit.
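For reference, the count-to-1 case with GNU find is just (the path and test here are placeholders):

  find /path -type f -print -quit

which prints the first match and stops.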
Why do I want to quit at all? Why not just do find|wc -l? The dirs I'm scanning have about 200k files and are sometimes over NFS. Either way, a full find|wc takes a long time and a lot of resources, especially if the find has to stat() each file (for mtime, etc.). With find|wc, one of my find commands took 10+ minutes. With my new method, it's a few seconds.
Here's the best solution I could think up. It's sub-optimal I'm sure (requires execs and a temp file), but I couldn't see an easier way to do it within the confines of find (without writing my own find, which I didn't want to do in this case).
See the find command example on lines 6-7 of the script (the usage comment). Arg 1 is a temp-file path (normal race-condition safety precautions apply; see the mktemp sketch after the script). Arg 2 is the number to count to.
cat find-count-helper
#!/usr/bin/perl -w
#
# allows a type of counting short-circuit in find
# much faster in huge dirs than doing a find | wc -l
# use:
# find path \( -name 'exclude-dir' -prune \) -o -type f -print -exec \
#   /usr/local/script/find-count-helper /tmp/unique-temp-file 5 \; -quit
# will find the first 5 matching files then quit

$ENV{'SHELL'}='/bin/bash';

$file = $ARGV[0];   # temp file holding the running count
$max  = $ARGV[1];   # quit after this many matches

# read the current count; treat a missing or empty file as zero
$_ = `cat $file 2>/dev/null`;
chop;
$_ = 0 if !$_;
$_++;

# at the limit: clean up and exit 0 (true) so find's -quit fires
if ($_ >= $max) {
    unlink $file;
    exit 0;
}

# below the limit: save the count and exit 1 (false) so -quit is skipped
open(O, '>', $file) or die;
print O "$_\n";
close(O);
exit 1;
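For the temp-file race precautions, here's a sketch of a full invocation that uses mktemp(1) to create a private counter file (the path, prune pattern, and count of 1000 are just examples):

  tmp=$(mktemp) || exit 1
  find /path \( -name 'exclude-dir' -prune \) -o -type f -print \
      -exec /usr/local/script/find-count-helper "$tmp" 1000 \; -quit \
      | wc -l
  rm -f "$tmp"   # the helper unlinks it at the limit; this cleans up
                 # the case where fewer than 1000 files matched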
-----Original Message-----
From: roundtable-bounces@muug.mb.ca [mailto:roundtable-bounces@muug.mb.ca] On Behalf Of Trevor Cordes
Sent: Saturday, November 05, 2011 10:00
To: MUUG RndTbl
Subject: [RndTbl] fast counting with find
> I found myself needing a type of -limit -quit option in find. [...]
> With find|wc, one of my find commands took 10+ minutes. With my new
> method, it's a few seconds.
Doesn't " find /path -args | head -1000 | wc -l" give you nearly the same result? It may generate more disk i/o in the background (depending on pipe buffering and signalling semantics) but should just as fast when used interactively.
(For the pedantic among us, that should read "find /path -args -print | head -n 1000 | wc -l", since specifying the line count directly as an option to head(1) is deprecated in POSIX.)
head(1) will exit immediately upon counting 1000 (or whatever the maximum is) lines, which will generate SIGPIPE to find the next time it write(2)s to the pipe, and find will then (more or less) immediately exit. This can generate a bit of extra I/O: head(1) blocks in read(2) until the pipe(7) fills enough to satisfy the call, read(2)s from the pipe, counts to X and terminates; in the interim, find(1) has continued processing to fill the next BUFSIZ's worth of pipe(7). Meanwhile, wc never receives SIGPIPE, because it is the reader rather than the writer; it processes normally and exits when head(1)'s termination produces EOF on its input pipe.
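In bash you can watch this happen via PIPESTATUS, which records each stage's exit status (141 = 128 + SIGPIPE); the directory below is a placeholder:

  find /some/big/dir -type f -print | head -n 1000 | wc -l
  echo "${PIPESTATUS[@]}"   # typically "141 0 0": find was killed by SIGPIPE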
Based on a naïve test I just ran, the amount of extra disk I/O involved is below the threshold of human measurement, but the numbers I generated indicate that the pipe writer produces anywhere between another 100 bytes and 4 KB of output. Four kilobytes of find(1) output could easily represent a couple dozen megabytes of disk I/O, or even more in pathological cases. However, this isn't an issue (as long as you don't have disk IOPS contention from other processes) because the wall time remains the same.
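One crude way to reproduce that sort of naïve test is to tee find's output to a file and compare its line count against head's limit (the directory is a placeholder, and the exact overshoot will vary with pipe buffering):

  find /some/big/dir -type f -print | tee /tmp/all-output | head -n 1000 > /dev/null
  wc -l < /tmp/all-output   # anything over 1000 is output find produced
                            # after head already had what it needed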
Or am I misinterpreting what you want to accomplish altogether?
-Adam
On 11/05/2011 05:11 PM, Adam Thompson wrote:
Doesn't " find /path -args | head -1000 | wc -l" give you nearly the same result? It may generate more disk i/o in the background (depending on pipe buffering and signalling semantics) but should just as fast when used interactively.
(For the pedantic among us, that should read "find /path -args -print | head -n 1000 | wc -l" since direct specification of the line count to head(1) in option-style syntax is deprecated in POSIX.)
I regularly use sed Nq (where N is a number) instead of head because sed 100q is universal, and head sometimes requires -n and sometimes doesn't, and that's annoying.
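With sed standing in for head, Adam's pipeline becomes (the path, test, and count are placeholders):

  find /path -type f -print | sed 1000q | wc -l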
It seems like limiting the number of matches may not be the goal after all; perhaps it would be better to limit the resources that find uses. E.g., with recent coreutils you can limit how long it runs with the timeout(1) command.
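A sketch of that time-budget approach (the 30-second limit is arbitrary): timeout(1) kills find when the budget expires, and wc still prints the partial count it received up to that point:

  timeout 30 find /path -type f -print | wc -l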
The difference becomes immediately obvious if you think about what option you'd like added to findutils to give the desired result: one that stops find when the number of matches is reached, or one that stops find after some number of paths has been seen. Depending on the options supplied, these two could behave very differently.
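With the tools already in this thread, the two semantics would look roughly like this (the paths and the '*.log' pattern are placeholders):

  # stop after 1000 matches: head only ever sees matching paths
  find /path -name '*.log' -print | head -n 1000 | wc -l

  # stop after 1000 paths seen: cap everything find prints, then count
  # the matches among just those first 1000 paths
  find /path -print | head -n 1000 | grep -c '\.log$'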
Peter