9.20. Finding Files (Much) Faster with a find Database

If you use find to search for files, you know that it can take a long time to work, especially when there are lots of directories to search. Here are some ideas for speeding up your finds.

NOTE: By design, setups like these that build a file database won't have absolutely up-to-date information about all your files.

If your system has "fast find" or locate, that's probably all you need. It lets you search a list of all pathnames on the system.

Even if you have "fast find" or locate, it still might not do what you need. For example, those utilities search only for pathnames. To find files by the owner's name, the number of links, the size, and so on, you have to use "slow find." In that case -- or when you don't have "fast find" or locate -- you may want to set up your own version. (slocate can build and update its own database, with its -u option, as well as search the database.)

The basic "fast find" has two parts. One part is a command, a shell script usually named updatedb or locate.updatedb, that builds a database of the files on your system; if your system has one, take a look at it to see a fancy way to build the database. The other part is the find or locate command itself: it searches the database for pathnames that match the name (regular expression) you type.

To make your own "fast find":
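You need the same two pieces. Here's a minimal sketch, assuming the database is a plain list of pathnames kept in $HOME/.fastfind (the same file the examples below read); the name fastfind.update is just a placeholder. First, a script to rebuild the database -- you might run it from cron late at night. It writes a temporary file first, then renames it, so a search never sees a half-built list:

#! /bin/sh
# fastfind.update - rebuild the pathname database under your home directory
cd "$HOME" || exit 1
find . -print > .fastfind.new &&
mv -f .fastfind.new .fastfind

Second, the ffind script that searches the database can be little more than an egrep (egrep so that extended regular expressions, like the one in the second example below, work):

#! /bin/sh
# ffind - search the pathname database for an extended regular expression
egrep "$1" $HOME/.fastfind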
To search the database, type:

% ffind somefile
/usr/freddie/lib/somefile
% ffind '/(sep|oct)[^/]*$'
/usr/freddie/misc/project/september
/usr/freddie/misc/project/october

You can do much more: I'll get you started. If you have room to store more information than just pathnames, you can feed your find output to a command like ls -l. For example, if you do a lot of work with links, you might want to keep the files' i-numbers as well as their names. You'd build your database with a command like this:

% cd
% find . -print | xargs ls -id > .fastfind.new
% mv -f .fastfind.new .fastfind

Or, if your version of find has the handy -ls operator, use the next script. Watch out for really large i-numbers; they might shift the columns and make cut give wrong output. The exact column numbers will depend on your system:

% cd
% find . -ls | cut -c1-7,67- > .fastfind.new
% mv -f .fastfind.new .fastfind

Then your ffind script could search for files by i-number. For instance, if you had a file with i-number 1234 and you wanted to find all its links:

% ffind "^1234 "

The space at the end of that regular expression prevents matches with i-numbers like 12345. You could search by pathname in the same way.

To get a bit fancier, you could make your ffind a little perl or awk script that searches your database by field. For instance, here's how to make awk do the previous i-number search; the output is just the matching pathnames:

awk '$1 == 1234 {print $2}' $HOME/.fastfind
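If you take the awk route, you might wrap that search in the ffind script itself. Here's a sketch -- not the article's script, just one way to package it -- that assumes the two-field i-number database built above and an awk that accepts -v (any modern awk does):

#! /bin/sh
# ffind - print pathnames whose i-number (field 1 of $HOME/.fastfind)
# matches the first argument; adjust the fields to match your database layout
awk -v inum="$1" '$1 == inum { print $2 }' $HOME/.fastfind

Now "% ffind 1234" prints every link to the file with i-number 1234; because awk compares whole fields, you don't need the trailing-space trick.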
With some information about Unix shell programming and utilities like awk, the techniques in this article should let you build and search a sophisticated file database -- and get information much faster than with plain old find.

-- JP