[Chapter 27] 27.12 Make Custom grep Commands (etc.) with perl

All of the various grep-like utilities perform pretty much the same function, with minor differences: they search for a specified pattern in some or all of a file, and then display the matching text with varying amounts of surrounding context.

As you use UNIX more and more, you will find yourself wanting to do an increasing number of grep-like tasks, but no particular UNIX utility will quite suit them all (hence the need for the various grep utilities discussed earlier in this section). You'll start accumulating C programs, awk scripts, and shell scripts to do these different tasks, and you'll be craving one utility that can easily encompass them all so you don't have to waste the disk space for all of those binaries. That utility is Perl (37.1), the "Practical Extraction and Report Language" developed by Larry Wall. According to the documentation accompanying Perl, it is "an interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information." If you don't already have perl installed on your system, you can get it from the CD-ROM.

For example, to search for a pattern in the header of a Usenet message:

perl -ne 'exit if (/^$/); print if (/pattern/);' filename

[This works because mail and Usenet (1.33) messages always use a blank line, indicated by ^$ in regular expression syntax, to separate the header from the body of the message. -TOR]

To search for a pattern and print the paragraphs in which it appears:

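perl -ne 'BEGIN { $/ = "\n\n"; } print if (/pattern/);' filename

(The BEGIN block sets the input record separator before the first record is read. If $/ were assigned inside the loop, the first line of the file would already have been read as an ordinary line by the time the assignment ran.)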

[This assumes that paragraphs are delimited by a double linefeed, that is, a blank line. You'd have to adjust this script for a troff or TeX document where paragraphs are separated by special codes. -TOR]

Searching through files is one of Perl's strengths, but certainly not its only strength. Perl encompasses all of the functionality of sed, awk, grep, find, and other UNIX utilities. Furthermore, a Perl program to do something originally done with one or more of these utilities is usually faster and easier to read than the non-Perl solution. [I agree that Perl is usually faster than a bunch of separate UNIX utilities strung together by pipes and temporary files. It also beats many utilities running standalone. But, in my experience, sed beats Perl's running speed almost every time. That could be partly because I have a slow disk, and the 40-kbyte sed binary takes less time to load than the 700-kbyte Perl 5 binary. Make your own tests, and I'll make room for Jonathan's rebuttal in the third edition of this book. ;-) -JP]

- JIK