On 11/05/2011 05:11 PM, Adam Thompson wrote:
-----Original Message-----
From: roundtable-bounces@muug.mb.ca [mailto:roundtable-bounces@muug.mb.ca] On Behalf Of Trevor Cordes
Sent: Saturday, November 05, 2011 10:00
To: MUUG RndTbl
Subject: [RndTbl] fast counting with find
I found myself needing a kind of -limit or -quit option in find, but couldn't find one. Why do I want to quit at all? Why not just do find|wc -l? The dirs I'm scanning hold about 200k files and are sometimes mounted over NFS. Either way, a full find|wc takes a long time and a lot of resources, especially when find has to stat() each file (for mtime tests, etc.). With find|wc, one of my find commands took 10+ minutes; with my new method it takes a few seconds.
Doesn't "find /path -args | head -1000 | wc -l" give you nearly the same result? It may generate more disk I/O in the background (depending on pipe buffering and signalling semantics), but it should be just as fast when used interactively.
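As a sketch of why the pipe stops early (the temporary directory, file count, and limit of 20 here are made up for the demo):

```shell
# Demo setup: a throwaway tree of 50 files.
dir=$(mktemp -d)
for i in $(seq 1 50); do : > "$dir/file$i"; done

# head exits after emitting 20 lines; on its next write, find gets
# SIGPIPE and dies, so the walk stops early instead of visiting
# every entry in the tree.
count=$(find "$dir" -type f | head -n 20 | wc -l)
echo "$count"    # 20

rm -rf "$dir"
```

The early exit is what makes this cheap on a 200k-file tree: find never finishes the traversal, it is killed as soon as the reader has enough lines.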
(For the pedantic among us, that should read "find /path -args -print | head -n 1000 | wc -l" since direct specification of the line count to head(1) in option-style syntax is deprecated in POSIX.)
I regularly use sed Nq (where N is a number) instead of head, because "sed 100q" works everywhere, whereas head sometimes requires -n and sometimes doesn't, which is annoying.
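The sed idiom drops into the same pipeline unchanged (again, the demo directory and the limit of 5 are invented for illustration):

```shell
# Demo setup: 10 files in a throwaway directory.
dir=$(mktemp -d)
for i in $(seq 1 10); do : > "$dir/f$i"; done

# "sed 5q" prints the first 5 lines and then quits -- same early-exit
# effect as head -n 5, but with one spelling that is portable across
# old and new heads.
count=$(find "$dir" -type f | sed 5q | wc -l)
echo "$count"    # 5

rm -rf "$dir"
```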
It seems limiting the number of matches may not be the real goal after all; perhaps it would be better to limit the resources that find uses? E.g., with recent coreutils you can cap how long it runs with the timeout(1) command.
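A sketch of the time-budget approach (the 5-second budget and the demo directory are arbitrary; this assumes GNU coreutils' timeout is available):

```shell
# Demo setup: a tiny tree that find will finish well inside the budget.
dir=$(mktemp -d)
: > "$dir/a"
: > "$dir/b"

# timeout kills find after 5 seconds of wall time; whatever paths were
# printed before the deadline still flow through to wc -l, so you get
# a partial count rather than waiting 10+ minutes.
count=$(timeout 5 find "$dir" -type f | wc -l)
echo "$count"    # 2

rm -rf "$dir"
```

On a slow NFS tree the count would be whatever find managed to emit before the deadline, so this trades an exact limit on matches for a hard limit on elapsed time.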
The difference becomes obvious if you think about which option you'd want added to findutils: one that stops find once a given number of matches has been produced, or one that stops find after a given number of paths has been examined. Depending on the other options supplied, those two can behave very differently.
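For the "stop on match" side of that distinction, GNU find already has a -quit action (the directory and file names below are invented for the demo):

```shell
# Demo setup: mostly hay, one needle.
dir=$(mktemp -d)
: > "$dir/hay1"
: > "$dir/needle"
: > "$dir/hay2"

# GNU find's -quit stops the walk as soon as the preceding actions run,
# i.e. at the first match -- a "limit matches" behaviour, not a
# "limit paths examined" one. (-quit is a GNU extension.)
match=$(find "$dir" -name 'needle*' -print -quit)
echo "$match"

rm -rf "$dir"
```

A "stop after N paths seen" limit would fire even when nothing matches, which is exactly why the two hypothetical options could give very different results on a selective test like -mtime.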
Peter