

UNIX Power Tools
Chapter 37: Perl, a Pathologically Eclectic Rubbish Lister

37.4 Why Learn Perl? #2

Donning my vestments as devil's advocate, let me start by saying that just because you learn something new, you shouldn't entirely forget the old. UNIX is a pluralistic environment in which many paths can lead to the solution, some more circuitously than others. Different problems can call for different solutions. If you force yourself to program in nothing but Perl, you may be short-changing yourself and taking the more tortuous route for some problems.

That being said, I shall now reveal my true colors as a Perl disciple and perhaps not infrequent evangelist. Perl is without question the greatest single program to appear in the UNIX community (although it runs elsewhere, too) in the last ten years. [Tom wrote this in 1992 or so, but I'd bet his opinion hasn't changed since then. ;-) -JP] It makes programming fun again. It's simple enough to get a quick start on, but rich enough for some very complex tasks. I frequently learn new things about it despite having used it nearly daily since Larry Wall first released it to the general public back in 1987. Heck, sometimes even Larry learns something new about Perl! The Artist is not always aware of the breadth and depth of his own work.

It is indeed the case that Perl is a strict superset of sed and awk, so much so that s2p and a2p translators exist for these utilities. You can do anything in Perl that you can do in the shell, although Perl is, strictly speaking, not a command interpreter. It's more of a programming language.
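
To make the overlap concrete, here is a hedged sketch of the Perl equivalent of the classic awk one-liner that prints the first colon-separated field of every line (the /etc/passwd file is just the usual example):

    # awk -F: '{ print $1 }' /etc/passwd, rewritten as a Perl one-liner:
    perl -ne '@f = split(/:/); print "$f[0]\n"' /etc/passwd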

Most of us have written, or at least seen, shell scripts from hell. While often touted as one of UNIX's strengths because they're conglomerations of small, single-purpose tools, these shell scripts quickly grow so complex that they're cumbersome and hard to understand, modify, and maintain. After a certain point of complexity, the strength of the UNIX philosophy of having many programs that each does one thing well becomes its weakness.

The big problem with piping tools together is that there is only one pipe. This means that several different data streams have to get multiplexed into a single data stream, then demuxed on the other end of the pipe. This wastes processor time as well as human brain power.

For example, you might be shuffling a list of filenames through a pipe, but you also want to indicate that certain files have a particular attribute, and others don't. (For example, certain files are more than ten days old.) Typically, this information is encoded in the data stream by appending or prepending some special marker string to the filename. This means that both the pipe feeder and the pipe reader need to know about it. Not a pretty sight.

Because perl is one program rather than a dozen others (sh, awk, sed, tr, wc, sort, grep, and so on), it is usually clearer to express yourself in perl than in sh and its allies, and often more efficient as well. You don't need as many pipes, temporary files, or separate processes to do the job. You don't need to go shoving your data stream out to tr and back, and to sed and back, and to awk and back, and to sort and back, and then back to sed, and back again. Doing so can often be slow, awkward, and/or confusing.
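
As a rough sketch of what this buys you, here is a small script that does in one process what might otherwise take a find | awk | sort pipeline with marker strings: report files more than ten days old, largest first. (The current directory and the ten-day cutoff are just illustrative choices.)

    #!/usr/bin/perl
    # One process, no marker strings multiplexed through a pipe:
    # the age test, the size lookup, and the sorting all happen here.
    opendir(DIR, ".") || die "can't opendir .: $!";
    foreach $file (readdir(DIR)) {
        next if $file =~ /^\.\.?$/;          # skip . and ..
        push(@old, $file) if -M $file > 10;  # -M gives age in days
    }
    closedir(DIR);
    foreach $file (sort { -s $b <=> -s $a } @old) {
        printf "%8d %s\n", -s $file, $file;
    }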

Anyone who's ever tried to pass command-line arguments into a sed script of moderate complexity or above can attest to the fact that getting the quoting right is not a pleasant task. In fact, quoting in general in the shell is just not a pleasant thing to code or to read.

In a heterogeneous computing environment, the available versions of many tools vary too much from one system to the next to be utterly reliable. Does your sh understand functions on all your machines? What about your awk? What about local variables? It is very difficult to do complex programming without being able to break a problem up into subproblems of lesser complexity. You're forced to resort to using the shell to call other shell scripts and allow UNIX's power of spawning processes (38.2) to serve as your subroutine mechanism, which is inefficient at best. That means your script will require several separate scripts to run, and getting all of these installed, working, and maintained on all the different machines in your local configuration is painful. With perl, all you need to do is get it installed on the system (which is really pretty easy, thanks to Larry's Configure program) and after that you're home free.

Perl is even beginning to be included in some software and hardware vendors' standard software distributions. I predict we'll see a lot more of this in the next couple of years.

Besides being faster, perl is a more powerful tool than sh, sed, or awk. I realize these are fighting words in some camps, but so be it. There exists a substantial niche between shell programming and C programming that perl conveniently fills. Tasks of this nature seem to arise with extreme frequency in the realm of system administration. Since system administrators almost invariably have far too much to do to devote a week to coding up every task before them in C, perl is especially useful for them. Larry Wall, Perl's author, has been known to call it "a shell for C programmers." I like to think of it as a "BASIC for UNIX." I realize that this carries both good and bad connotations.

In what ways is perl more powerful than the individual tools? The list is long, so what follows is not exhaustive. To begin with, you don't have to worry about arbitrary and annoying restrictions on string length, input line length, or number of elements in an array. These are all virtually unlimited; i.e., limited only by your system's address space and virtual memory size.

Perl's regular expression (26.4) handling is far and away the best I've ever seen. For one thing, you don't have to remember which tool wants which particular flavor of regular expressions, or lament the fact that one tool doesn't allow (..|..) constructs, or +'s, or \b's, or whatever. With Perl, it's all the same - and, as far as I can tell, a proper superset of all the others.
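
For instance, here is a hedged sketch of a pattern that quietly uses alternation, one-or-more repetition, and word boundaries all at once, with no worrying about which of those this particular tool happens to support:

    # Print lines mentioning "error" or "warning" as whole words,
    # case-insensitively, with an optional trailing "s".
    while (<>) {
        print if /\b(error|warning)s?\b/i;
    }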

Perl has a fully functional symbolic debugger (written, of course, in Perl) that is an indispensable aid in debugging complex programs. Neither the shell nor sed/awk/sort/tr/... has such a thing.
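
A minimal sketch of what a session looks like (the script name and line number are made up; b, c, p, and s are the debugger's breakpoint, continue, print, and single-step commands):

    $ perl -d myscript.pl
      DB<1> b 12          # set a breakpoint at line 12
      DB<2> c             # run until we hit it
      DB<3> p $count      # inspect a variable
      DB<4> s             # step into the next statement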

Perl has a loop control mechanism that's more powerful even than C's. You can do the equivalent of a break or continue (last and next in Perl) of any arbitrary loop, not merely the nearest enclosing one. You can even do a kind of continue that doesn't trigger the re-initialization part of a loop, something you may, from time to time, want to do.
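
Here is a hedged sketch of how that reads; the label name and the input format are invented for illustration:

    # Labels let last and next act on an outer loop, not just the
    # innermost one; redo re-runs the loop body without the usual
    # re-initialization step.
    LINE: while ($line = <STDIN>) {
        foreach $word (split(' ', $line)) {
            next LINE if $word eq '#';    # rest of the line is a comment
            last LINE if $word eq 'END';  # bail out of the outer loop
            print "$word\n";
        }
    }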

Perl's data types and operators are richer than the shells' or awk's, because you have scalars, numerically indexed arrays (lists), and string-indexed (hashed) arrays. Each of these holds arbitrary data values, including floating-point numbers, for which built-in mathematical functions and power operators are available. It can handle binary data of arbitrary size.
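
A hedged sketch of all three types side by side (the variable names and values are invented):

    # Scalars, a numerically indexed array, and a string-indexed
    # (associative) array, with floating point and the ** power operator.
    $r     = 2.5;
    @files = ('notes', 'draft', 'final');
    %age   = ('notes', 12.5, 'draft', 3.2);        # key/value pairs
    printf "circle area: %.3f\n", 3.14159265 * $r ** 2;
    printf "%s is %.1f days old\n", $files[0], $age{'notes'};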

In a LISP-like vein, you can generate strings, perhaps with sprintf(), and then eval them. That way you can generate code on the fly. You can even do lambda-type functions that return newly created functions that you can call later. The scoping of variables is dynamic; fully recursive subroutines are supported; and you can pass or return any type of data into or out of your subroutines.
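
A hedged sketch of the generate-and-eval idea (the function name being spliced in is arbitrary):

    # Build a line of code as a string, then eval it at run time.
    $op   = 'sqrt';
    $code = sprintf('$result = %s(2);', $op);
    eval $code;
    die "generated code failed: $@" if $@;
    print "$op(2) is $result\n";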

You have a built-in automatic formatter for generating pretty-printed forms with automatic pagination and headers and center-justified and text-filled fields like %(|fmt)s, if you can imagine what that would actually be were it legal.
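
What a real one looks like is a picture line followed by the values that fill it; this is a hedged sketch with made-up field names and data. (Note that in an actual script the lone period that ends the format must sit at the left margin.)

    # A report format for STDOUT: write() fills in the picture line
    # and handles pagination for you.
    format STDOUT =
    @<<<<<<<<<<<<<<<<<<<  @>>>>>>>
    $filename,            $size
    .

    %sizes = ('report.txt', 1024, 'notes.txt', 2048);   # sample data
    while (($filename, $size) = each %sizes) {
        write;
    }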

There's a mechanism for writing SUID (1.23) programs that can be made more secure than even C programs, thanks to an elaborate data-tracing mechanism that understands the "taintedness" of data derived from external sources. It won't let you do anything really stupid that you might not have thought of.
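
A hedged sketch of how this feels in practice, using the -T switch that modern perls spell it with (the filename pattern and messages are invented):

    #!/usr/bin/perl -T
    # Under taint checking, data from outside the program (arguments,
    # environment, input) can't be used to affect the outside world
    # until it is explicitly untainted, usually by pattern-matching
    # out exactly what you expected.
    $file = $ARGV[0];
    if ($file =~ /^([\w.-]+)$/) {
        $file = $1;                       # captured text is untainted
    } else {
        die "suspicious file name: $file\n";
    }
    open(LOG, ">> $file") || die "can't append to $file: $!";
    print LOG "ok\n";
    close(LOG);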

You have access to just about any system-related function or system call, like ioctls, fcntl, select, pipe and fork, getc, socket and bind, connect and accept, and indirect syscall invocation, as well as things like getpwuid, gethostbyname, etc. You can read in binary data laid out by a C program or system call using structure-conversion templates.
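
A hedged sketch of the structure-template idea, with a made-up file name and record layout (a 32-bit integer id followed by a 16-byte name field):

    # Read fixed-size records written by a C program and pick them
    # apart with an unpack() template.
    open(RECORDS, "< records.dat") || die "can't open records.dat: $!";
    $recsize = 4 + 16;                      # sizeof(long) + char[16]
    while (read(RECORDS, $rec, $recsize) == $recsize) {
        ($id, $name) = unpack("l A16", $rec);
        print "$id $name\n";
    }
    close(RECORDS);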

At the same time you can get at the high-level shell-type operations like the -r or -w tests (44.20) on files or `backquote` (9.16) command interpolation. You can do file globbing with the <*.[ch]> (15.1) notation or do low-level readdir calls as suits your fancy.
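
A hedged sketch mixing those high-level conveniences in a few lines (the hostname command and the *.h pattern are just examples):

    # Shell-flavored operations without leaving Perl:
    @headers = <*.h>;                  # file globbing
    $host    = `hostname`;             # backquote command interpolation
    chop($host);                       # strip the trailing newline
    foreach $f (@headers) {
        print "$f is writable on $host\n" if -w $f;   # -r/-w file tests
    }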

DBM files can be accessed using simple array notation. This is really nice for dealing with system databases (aliases, news, ...), for efficient access to large data sets, and for keeping persistent data.
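
A hedged sketch using dbmopen(); the database name "visits" and the per-user counting are invented for illustration:

    # Tie an associative array to a DBM file; entries persist on disk
    # between runs of the program.
    dbmopen(%visits, "visits", 0644) || die "can't open DBM file: $!";
    $visits{$ENV{'USER'}}++;                # bump this user's count
    while (($user, $count) = each %visits) {
        print "$user: $count\n";
    }
    dbmclose(%visits);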

Don't be dismayed by the apparent complexity of what I've just discussed. Perl is actually very easy to learn because so much of it derives from existing tools. It's like interpreted C with sh, sed, awk, and a lot more built into it. And, finally, there's a lot of code out there already written in Perl, including libraries to handle things you don't feel like re-implementing.

- TC

