17.19 Finding Files (Much) Faster with a find Database

If you use find (17.2) to search for files, you know that it can take a long time to work, especially when there are lots of directories to search. Here are some ideas for speeding up your finds.

NOTE: By design, setups like these that build a file database won't have absolutely up-to-date information about all your files.

If your system has "fast find" or GNU locate (17.18), that's probably all you need. It lets you search a list of all pathnames on the system.

Even if you have fast find or locate, it still might not do what you need. For example, those utilities only search for pathnames. To find files by the owner's name, the number of links, the size, and so on, you have to use "slow" find. In that case, or when you don't have fast find or locate, you may want to set up your own version.

The basic fast find has two parts. One part is a command, a shell script named /usr/lib/find/updatedb, that builds a database of the files on your system. If your system has it, take a look to see a fancy way to build the database. The other part is the find command itself, which searches the database for pathnames that match the name (regular expression) you type.

To make your own fast find :

Pick a filename for the database. We'll use $HOME/.fastfind (some systems use $LOGDIR instead of $HOME).
Design the find command you want to use. The command to build a database of all the files in your home directory might look like this:
```
cd
find . -print | sed 's@^\./@@' > $HOME/.fastfind
```
If you're short on disk space, use gzip to compress the database instead:
```
cd
find . -print | sed 's@^\./@@' | gzip > $HOME/.fastfind.gz
```
To save disk space, the script starts from your home directory, then uses sed (34.24) to strip the leading ./ from every entry. (If you're building a database of the whole filesystem, don't do that!)
Set up cron (40.12) or at (40.3) to run that find as often as you want. Usually once early every morning is fine.
Finally, make a shell script (1.5) (I call mine ffind) to search the database. It's usually fastest to use egrep (27.5), and that lets you search with flexible regular expressions (26.4), too:
```
egrep "$1" $HOME/.fastfind | sed "s@^@$HOME/@"
```
or, for a gzipped database (if your system has no gzcat, gzip -dc does the same job):
```
gzcat $HOME/.fastfind.gz | egrep "$1" | sed "s@^@$HOME/@"
```
The sed expressions add your home directory's pathname (like /usr/freddie) to each line.
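The build step above can be wrapped in a small shell function. This is only a sketch: the name build_fastfind and its two optional arguments are made up for illustration. Writing to a temporary file before renaming it is an addition, meant to keep a search that runs during the rebuild from seeing a half-written database.

```shell
# build_fastfind [top-directory] [database-file]
# Hypothetical helper; the name and arguments are assumptions, not a standard tool.
build_fastfind() {
    top=${1:-$HOME}
    db=${2:-$HOME/.fastfind}
    tmp=$db.tmp.$$          # temporary name; $$ is this shell's process ID

    # The subshell keeps the cd from changing the caller's directory.
    if ( cd "$top" && find . -print | sed 's@^\./@@' ) > "$tmp"
    then
        mv "$tmp" "$db"     # replace the old database in one step
    else
        rm -f "$tmp"        # cd failed: leave any old database alone
        return 1
    fi
}
```

Saved in a file whose last line calls build_fastfind, this could be run from a crontab entry such as `15 4 * * * sh $HOME/bin/build_fastfind.sh` (the path and the time are only examples).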

To search the database, type:

% ffind somefile
/usr/freddie/lib/somefile
% ffind '/(sep|oct)[^/]*$'
/usr/freddie/misc/project/september
/usr/freddie/misc/project/october
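A slightly more defensive ffind might look like the sketch below. The variables FASTFIND_DB and FASTFIND_TOP are hypothetical overrides invented for this example, and grep -E is used here as the POSIX spelling of egrep. It assumes the top directory's pathname contains no @ or & characters, which would confuse sed.

```shell
# ffind regexp
# Search the pathname database and print absolute pathnames.
# FASTFIND_DB and FASTFIND_TOP are assumed names, not part of the article's setup.
ffind() {
    if [ $# -ne 1 ]; then
        echo "usage: ffind regexp" >&2
        return 2
    fi
    db=${FASTFIND_DB:-$HOME/.fastfind}
    top=${FASTFIND_TOP:-$HOME}
    if [ ! -r "$db" ]; then
        echo "ffind: cannot read $db" >&2
        return 1
    fi
    # -- keeps a pattern that starts with a dash from being read as an option.
    grep -E -- "$1" "$db" | sed "s@^@$top/@"
}
```

Defined in your shell setup file, it is used exactly like the script version: `ffind '/(sep|oct)[^/]*$'`.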

You can do much more. I'll get you started. If you have room to store more information than just pathnames, you can feed your find output to a command like ls -l or sls (16.29). For example, if you do a lot of work with links (18.3), you might want to keep the files' i-numbers (1.22) as well as their names. You'd build your database with a command like the one below, using xargs (9.21) or something like it (9.20):

cd
find . -print | xargs ls -id > $HOME/.fastfind
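One caution about that pipeline: xargs splits its input at spaces and newlines, so a pathname containing a space reaches ls as two separate arguments. A possible workaround, assuming your find supports the POSIX `-exec ... {} +` form (older versions may not), is to let find batch the arguments itself. The function name build_inode_db is hypothetical.

```shell
# build_inode_db [top-directory] [database-file]
# Hypothetical helper: writes "i-number pathname" lines to the database.
build_inode_db() {
    top=${1:-$HOME}
    db=${2:-$HOME/.fastfind}
    # find hands each pathname to ls as a single argument,
    # so embedded spaces survive; {} + batches many names per ls run.
    ( cd "$top" && find . -exec ls -id {} + ) > "$db"
}
```

Pathnames that contain newlines will still produce confusing entries, because the database itself is line-oriented.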

Or, if your version of find has the handy -ls operator, use the next script. Watch out for really large i-numbers; they might shift the columns and make cut (35.14) give wrong output.

cd
find . -ls | cut -c1-7,67- > $HOME/.fastfind

The exact column numbers will depend on your system. Then, your ffind script could search for files by i-number. For instance, if you had a file with i-number 1234 and you wanted to find all its links:

% ffind "^1234 "

(The space at the end prevents matches with i-numbers like 12345.) You could also search by pathname.
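To avoid looking up the i-number by hand, a small function could do both steps. This is a sketch under assumptions: the name fflinks and the FASTFIND_DB variable are invented here, and the database is assumed to hold "i-number pathname" lines as written by ls -id. Because some versions of ls pad the i-number with leading spaces, the pattern allows them.

```shell
# fflinks file
# List every database entry with the same i-number as the named file.
# Hypothetical helper; FASTFIND_DB is an assumed override variable.
fflinks() {
    db=${FASTFIND_DB:-$HOME/.fastfind}
    # ls -id prints "i-number pathname"; keep only the number.
    inum=$(ls -id -- "$1" | awk '{ print $1 }')
    [ -n "$inum" ] || return 1
    # Optional leading spaces, the number, then a space to end the match.
    grep -E "^ *$inum " "$db"
}
```

Remember that i-numbers are unique only within one filesystem, so if the database spans several filesystems, two unrelated files can share a number.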

Article 16.21 shows another find database setup, a list of directories or files with the same names.

With some information about UNIX shell programming and utilities like awk (33.11), the techniques in this article should let you build and search a sophisticated file database - and get information much faster than with plain old find.
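As one small illustration of what awk can add, the sketch below lists every database entry whose i-number appears more than once, that is, files with more than one link inside the tree you indexed. It assumes a database of "i-number pathname" lines like the one built with ls -id; the name ffmulti and the FASTFIND_DB variable are hypothetical.

```shell
# ffmulti
# Print database entries whose i-number occurs on more than one line.
# Hypothetical helper; FASTFIND_DB is an assumed override variable.
ffmulti() {
    db=${FASTFIND_DB:-$HOME/.fastfind}
    awk '
        # $1 is the i-number; remember each full line under it.
        { count[$1]++; entries[$1] = entries[$1] $0 "\n" }
        END {
            for (inum in count)
                if (count[inum] > 1)
                    printf "%s", entries[inum]
        }
    ' "$db"
}
```

The groups come out in no particular order, and links that live outside the indexed tree are not counted.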

- JP