27.8 glimpse and agrep
Before you use glimpse, you need to index your files by running glimpseindex. You'll probably want to run it every night from cron (40.12). That means your searches will miss files that have been added since the last glimpseindex run. But, other than that problem (which can't be avoided in an indexed system like this), glimpse is fantastic, especially because it's (usually) so fast.

The speed depends on the size of the index file you build: a bigger index makes the searches faster. But even with the smallest index file, I can search my entire 70-megabyte email archive, on a fairly slow workstation, in less than 30 seconds. With faster CPUs and disks, the search could be much quicker. One weakness is a search pattern that could match many files, which can take a long time: glimpse will print a warning and ask if you want to continue the search. (After glimpse checks its index for possible matches, it runs agrep on the possibly matching files to check them and get the exactly matching records.)

agrep is one of the nicer additions to the grep family. It's not only one of the faster greps around; it also has the unique feature that it will look for approximate matches. It's also record-oriented rather than line-oriented. glimpse calls agrep, but you can also use agrep without using glimpse. The three most significant features of agrep that are not supported by the grep family are:
Putting these options together one can write queries like:
%

which outputs all paragraphs referencing articles in CACM between 1985 and 1989 by TheAuthor dealing with Curriculum. Two errors are allowed, but they cannot be in either CACM or the year. (The <> brackets forbid errors in the pattern between them.)

Other agrep features include searching for regular expressions (with or without errors); unlimited wildcards; limiting the errors to only insertions or only substitutions or any combination; allowing each deletion, for example, to be counted as, say, 2 substitutions or 3 insertions; restricting parts of the query to be exact and parts to be approximate; and many more.

Email glimpse-request@cs.arizona.edu to be added to the glimpse mailing list. Email glimpse@cs.arizona.edu to report bugs, ask questions, discuss tricks for using glimpse, etc. (This is a moderated mailing list with very little traffic, mostly announcements.)
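If you want a feel for what approximate, record-oriented matching means, here is a minimal Python sketch. It is not agrep's code or its algorithm: it uses a plain dynamic-programming edit distance, and the names min_errors and search_records, the sample bibliography, and the per-pattern error limits are all made up for illustration. Records are paragraphs separated by blank lines, every pattern must match somewhere in a record, and a limit of 0 plays the part of a pattern that must match exactly.

```python
def min_errors(pattern: str, text: str) -> int:
    """Smallest number of insertions, deletions, and substitutions needed
    to turn `pattern` into some substring of `text`."""
    # prev[i] = cheapest way to match pattern[:i] ending at the current text position
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start at any position in the text
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[i] + 1,               # extra character in the text
                cur[i - 1] + 1,            # pattern character missing from the text
                prev[i - 1] + (pc != ch),  # exact match or substitution
            ))
        prev = cur
        best = min(best, prev[-1])
    return best


def search_records(text: str, query: dict, delimiter: str = "\n\n") -> list:
    """Return the records in which every pattern matches within its own
    error limit. `query` maps pattern -> allowed errors (hypothetical
    interface, not agrep's command-line syntax)."""
    hits = []
    for record in text.split(delimiter):
        lowered = record.lower()
        if all(min_errors(p.lower(), lowered) <= k for p, k in query.items()):
            hits.append(record)
    return hits


# Invented sample data: three one-line "paragraphs" separated by blank lines.
bib = (
    "Smith, J. Curiculum design for CS1. CACM 1987.\n\n"
    "Jones, A. Compiler notes. CACM 1991.\n\n"
    "Smith, J. Curriculum survey. JACM 1986."
)

# CACM must match exactly; the other two patterns may be off by up to 2 errors.
hits = search_records(bib, {"CACM": 0, "Smith": 2, "Curriculum": 2})
for record in hits:
    print(record)
```

Only the first record comes back: its misspelled "Curiculum" is one error away from the pattern, the second record has no Curriculum at all, and the third fails the exact CACM test.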
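The two-stage search mentioned earlier (look in the index for files that might match, then scan only those files) can be sketched in a few lines of Python. This is a toy model, not glimpse's index format: a dictionary stands in for the files on disk, and build_index and search are hypothetical names. It also shows why a file added after the last indexing run is invisible until you reindex.

```python
import re
from collections import defaultdict


def build_index(files: dict) -> dict:
    """Map each lowercase word to the set of file names that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index: dict, files: dict, word: str) -> list:
    """Stage 1: ask the index for candidate files.
    Stage 2: scan only those files for the matching lines."""
    word = word.lower()
    hits = []
    for name in sorted(index.get(word, ())):
        for line in files.get(name, "").splitlines():
            if word in line.lower():
                hits.append((name, line))
    return hits


# Invented sample "files": name -> contents.
files = {
    "mail/1": "Lunch on Friday?\nBring the glimpse manual.",
    "mail/2": "Budget report attached.",
}
index = build_index(files)

# A file that arrives after indexing is not in the index yet...
files["mail/3"] = "New glimpse release announced."
stale = search(index, files, "glimpse")

# ...and shows up only after the index is rebuilt.
fresh = search(build_index(files), files, "glimpse")
print(stale)
print(fresh)
```

The first search scans a single file and misses mail/3; the second, run against a rebuilt index, finds both. That is the trade-off the nightly reindexing is meant to manage.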