Using Perl (Running Linux)

13.4. Using Perl

Perl's main strength is that it incorporates the most widely used features of languages such as C, sed, awk, and various shells into a single interpreted scripting language. In the past, getting a complicated job done meant juggling these languages in complex arrangements, often with sed scripts piping into awk scripts piping into shell scripts and eventually into a C program. Perl departs from the common Unix philosophy of using many small tools to handle small parts of one large problem. Instead, Perl does it all, and it provides many different ways of doing the same thing. In fact, this chapter was written by an Artificial Intelligence program developed in Perl. (Just kidding, Larry.)

Perl provides a nice programming interface to many features that were sometimes difficult to use in other languages. For example, a common task of many Unix system administration scripts is to scan a large amount of text, cut fields out of each line based on a pattern (usually represented as a regular expression), and produce a report based on the data. Let's say that you want to process the output of the Unix last command, which displays a record of login times for all users on the system, like this:

mdw       ttypf    loomer.vpizza.co Sun Jan 16 15:30 - 15:54  (00:23)
larry     ttyp1    muadib.oit.unc.e Sun Jan 16 15:11 - 15:12  (00:00)
johnsonm  ttyp4    mallard.vpizza.c Sun Jan 16 14:34 - 14:37  (00:03)
jem       ttyq2    mallard.vpizza.c Sun Jan 16 13:55 - 13:59  (00:03)
linus     FTP      kruuna.helsinki. Sun Jan 16 13:51 - 13:51  (00:00)
linus     FTP      kruuna.helsinki. Sun Jan 16 13:47 - 13:47  (00:00)

If we wanted to count up the total login time for each user (given in parentheses in the last field), we could write a sed script to extract the time values from the input, an awk script to sort the data for each user and add up the times, and another awk script to produce a report based on the accumulated data. Or we could write a somewhat complex C program to do the entire task, complex because, as any C programmer knows, C's text-processing functions are somewhat limited.

However, this task can be easily accomplished by a simple Perl script. The facilities of I/O, regular-expression pattern matching, sorting, associative arrays, and number crunching are all easily accessed from a Perl program with little overhead. Perl programs are generally short and to the point, without a lot of technical mumbo-jumbo getting in the way of what you want your program to actually do.

 1  #!/usr/bin/perl
 2
 3  while (<STDIN>) {   # While we have input...
 4    # Find lines and save username, login time
 5    if (/^(\S*)\s*.*\((.*):(.*)\)$/) {
 6      # Increment total hours, minutes, and logins
 7      $hours{$1} += $2;
 8      $minutes{$1} += $3;
 9      $logins{$1}++;
10    }
11  }
12
13  # For each user in the array...
14  foreach $user (sort(keys %hours)) {
15    # Calculate hours from total minutes
16    $hours{$user} += int($minutes{$user} / 60);
17    $minutes{$user} %= 60;
18    # Print the information for this user
19    print "User $user, total login time ";
20    # Perl has printf, too
21    printf "%02d:%02d, ", $hours{$user}, $minutes{$user};
22    print "total logins $logins{$user}.\n";
23  }

papaya$ last | logintime
User johnsonm, total login time 01:07, total logins 11.
User kibo, total login time 00:42, total logins 3.
User linus, total login time 98:50, total logins 208.
User mdw, total login time 153:03, total logins 290.
papaya$
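To see exactly what the pattern on line 5 pulls out of each line, here is a small sketch that applies the same regular expression to two of the sample lines shown earlier. The helper name parse_last_line is ours, chosen for illustration. Lines without a parenthesized duration, such as a summary line at the end of the output, simply fail to match and are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from line 5 of the script: capture the username at the start
# of the line, then the hours and minutes inside the final parentheses.
my $pattern = qr/^(\S*)\s*.*\((.*):(.*)\)$/;

# Hypothetical helper: return (user, hours, minutes) for a matching line,
# or an empty list if the line does not match.
sub parse_last_line {
    my ($line) = @_;
    return $line =~ $pattern ? ($1, $2, $3) : ();
}

# Two lines copied from the sample `last` output shown earlier.
for my $line (
    "mdw       ttypf    loomer.vpizza.co Sun Jan 16 15:30 - 15:54  (00:23)",
    "johnsonm  ttyp4    mallard.vpizza.c Sun Jan 16 14:34 - 14:37  (00:03)",
) {
    my ($user, $hours, $minutes) = parse_last_line($line);
    print "user=$user hours=$hours minutes=$minutes\n";
}
```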

13.4.2. More Features

The previous example introduced the most commonly used Perl features by demonstrating a living, breathing program. There is much more where that came from--in the way of both well-known and not-so-well-known features.

Perl also provides a report-generation mechanism beyond the standard print and printf functions. Using this feature, the programmer defines a report "format" that describes how each page of the report will look. For example, we could have included the following format definition in our example:

format STDOUT_TOP = 
User           Total login time     Total logins
-------------- -------------------- -------------------
.
format STDOUT =
@<<<<<<<<<<<<< @<<<<<<<<            @####
$user,         $thetime,            $logins{$user}
.

The STDOUT_TOP definition describes the header of the report, which will be printed at the top of each page of output. The STDOUT format describes the look of each line of output. Each field is described beginning with the @ character; @<<<< specifies a left-justified text field, and @#### specifies a numeric field. The line below the field definitions gives the names of the variables to use in printing the fields. Here, we have used the variable $thetime to store the formatted time string.
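The two field types used above are not the only ones. As a sketch, the core formline function, which Perl's write uses internally, can render a single picture line into the $^A accumulator variable, which makes it easy to try out a layout before committing it to a format. The column layout, the helper name render_row, and the sample values below are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render one picture line into a string with formline().
sub render_row {
    my @values = @_;
    # Single quotes keep Perl from interpolating the @ characters.
    # @<<<<<<<<< : 10-character left-justified text field
    # @>>>>>     :  6-character right-justified text field
    # @||||||||| : 10-character centered text field
    # @##.##     :  6-character numeric field with two decimal places
    my $picture = '@<<<<<<<<< @>>>>> @||||||||| @##.##' . "\n";
    local $^A = "";              # start with an empty accumulator
    formline($picture, @values); # formline appends its output to $^A
    my $out = $^A;
    return $out;
}

print render_row("mdw", 290, "bash", 0.5);
print render_row("johnsonm", 11, "tcsh", 12.5);
```

Values longer than a text field are truncated to the field's width, so the picture line alone determines the column layout.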

To use this report for the output, we replace lines 19-22 in the original script with the following:

$thetime = sprintf("%02d:%02d", $hours{$user}, $minutes{$user});
write;

The first line uses the sprintf function to format the time string and save it in the variable $thetime; the second line is a write command that tells Perl to go off and use the given report format to print a line of output.
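The arithmetic on lines 16 and 17 and the sprintf call above fit together as in this small sketch, where login_time is a hypothetical name. Note that %02d pads to at least two digits but never truncates, which is why a total like 153 hours still prints in full:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fold surplus minutes into hours (as lines 16-17 of
# the script do), then format the result as HH:MM with sprintf.
sub login_time {
    my ($hours, $minutes) = @_;
    $hours += int($minutes / 60);   # e.g. 67 minutes adds 1 hour...
    $minutes %= 60;                 # ...and leaves 7 minutes
    # %02d pads to at least two digits but never truncates.
    return sprintf("%02d:%02d", $hours, $minutes);
}

print login_time(0, 67), "\n";     # prints 01:07
print login_time(150, 183), "\n";  # prints 153:03
```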

Using this report format, we'll get something looking like:

User           Total login time     Total logins
-------------- -------------------- -------------------
johnsonm       01:07                   11
kibo           00:42                    3
linus          98:50                  208
mdw            153:03                 290

Using other report formats we can achieve different (and better-looking) results.

Perl comes with a huge number of modules that you can plug in to your programs for quick access to very powerful features. A popular online archive called CPAN (for Comprehensive Perl Archive Network) contains even more modules: net modules that let you send mail and carry on other networking tasks, modules for dumping data and debugging, modules for manipulating dates and times, modules for math functions--the list could go on for pages.

If you hear of an interesting module, check first to see whether it's already loaded on your system. You can look at the directories where modules are located (probably under /usr/lib/perl5) or just try loading the module and see if it works. Thus, the command:

$ perl -MCGI -e 1
Can't locate CGI in @INC...

gives you the sad news that the CGI.pm module (which we'll use in Section 16.1.5.2, "Writing the CGI script," in Chapter 16, "The World Wide Web and Electronic Mail," to handle a web form) is not on your system. CGI.pm is popular enough to be included in the standard Perl distribution, and you can install it from there, but for many modules you will have to go to CPAN (and some don't make it into CPAN either). CPAN, which is maintained by Jarkko Hietaniemi and Andreas König, resides on dozens of mirror sites around the world because so many people want to download its modules. The easiest way to get onto CPAN is to visit http://www.perl.com/CPAN-local/.
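The same test can be made from inside a program, for instance to fall back to simpler code when an optional module is missing. Here is a minimal sketch using only core Perl; the helper name module_installed is ours. It turns a module name into the file path that require expects and traps the failure with eval. Unlike perl -M, it does not call the module's import method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded, else 0.
sub module_installed {
    my ($module) = @_;
    # Turn "Mail::Mailer" into "Mail/Mailer.pm", the form require wants
    # when it is given a string rather than a bareword.
    (my $file = $module) =~ s{::}{/}g;
    $file .= ".pm";
    # require dies if the file cannot be found or compiled; eval traps that.
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(CGI Date::Manip Mail::Mailer)) {
    printf "%-15s %s\n", $module,
        module_installed($module) ? "installed" : "not found";
}
```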

The following program--which we wanted to keep short, and therefore we neglected to find a useful task to perform--shows two modules, one that manipulates dates and times in a sophisticated manner and another that sends mail. The disadvantage of using such powerful features is that a huge amount of code is loaded from them, making the runtime size of the program quite large:

#! /usr/local/bin/perl

# We will illustrate Date and Mail modules
use Date::Manip;
use Mail::Mailer;

# Illustration of Date::Manip module
if ( Date_IsWorkDay( "today", 1) )  {

    # Today is a work day
    $date = ParseDate( "today" );

}
else {

    # Today is not a work day, so choose next work day
    $date=DateCalc( "today" , "+ 1 business day" );

}

# Convert date from compact string to readable string like "April  8"
$printable_date = UnixDate( $date , "%B %e" );

# Illustration of Mail::Mailer module
my ($to) = "the_person\@you_want_to.mail_to";
my ($from) = "owner_of_script\@system.name";

$mail = Mail::Mailer->new;

$mail->open(
            {
                From => $from,
                To => $to,
                Subject => "Automated reminder",
            }
           );

print $mail <<"MAIL_BODY";
If you are at work on or after
$printable_date,
you will get this mail.
MAIL_BODY

$mail->close;

# The mail has been sent! (Assuming there were no errors.)
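If loading all of Date::Manip seems too heavy for a job this small, the core POSIX module's strftime can produce the same kind of "April  8" string. The sketch below is a simplified stand-in, not a replacement: the hypothetical next_work_day helper treats Monday through Friday as work days and knows nothing about holidays.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: return the epoch time of the first Monday-Friday
# day at or after $time. Holidays are NOT taken into account.
sub next_work_day {
    my ($time) = @_;
    while (1) {
        # Element 6 of localtime is the day of the week: 0 = Sunday, 6 = Saturday.
        my $wday = (localtime $time)[6];
        return $time if $wday >= 1 && $wday <= 5;
        $time += 24 * 60 * 60;      # step forward one day
    }
}

my $date = next_work_day(time);
# "%B %e" gives the full month name and a space-padded day, e.g. "April  8".
my $printable_date = strftime("%B %e", localtime $date);
print "$printable_date\n";
```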

The reason modules are so easy to use is that Perl added object-oriented features in version 5. The Date::Manip module used in the previous example is not object oriented, but Mail::Mailer is. The $mail variable in the example is a Mail::Mailer object, and it makes mailing a message straightforward through methods like new, open, and close.
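Under the hood, a Perl object is usually just a reference that has been blessed into a package, and its methods are ordinary subroutines in that package that receive the object as their first argument. The following sketch defines a hypothetical Reminder class (it is not part of Mail::Mailer or any CPAN module) to show the mechanics:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Reminder;    # hypothetical class, for illustration only

# Constructor: build a hash and bless it so Perl knows it is a Reminder.
sub new {
    my ($class, %args) = @_;
    my $self = { to => $args{to}, lines => [] };
    return bless $self, $class;
}

# Method: the object arrives as the first argument.
sub add_line {
    my ($self, $text) = @_;
    push @{ $self->{lines} }, $text;
    return $self;
}

# Method: assemble a simple header plus the body lines.
sub as_string {
    my ($self) = @_;
    return "To: $self->{to}\n\n" . join("", map { "$_\n" } @{ $self->{lines} });
}

package main;

my $note = Reminder->new(to => 'someone@example.com');
$note->add_line("The meeting moved to Friday.");
print $note->as_string;
```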

To do some major task like handling the input from a web form, just load the CGI module and call new to create a CGI object; all the methods you need for reading the form's parameters and generating HTML will then be available.

If you want to give a graphical interface to your Perl script, you can use the Tk module, which originally was developed for use with the Tcl language, or the Gtk module, which uses the newer GIMP Toolkit (GTK). The book Learning Perl/Tk by Nancy Walsh shows you how to do graphics with that module. Both Tcl and Tk are discussed later in the chapter.

Another abstruse feature of Perl is its ability to (more or less) directly access several Unix system calls, including interprocess communications. For example, Perl provides the functions msgctl, msgget, msgsnd, and msgrcv from System V IPC. Perl also supports the BSD socket implementation, allowing communications via TCP/IP directly from a Perl program. No longer is C the exclusive language of networking daemons and clients. A Perl program loaded with IPC features can be very powerful indeed, especially considering that many client-server implementations call for advanced text-processing features such as those provided by Perl. It is generally easier to parse protocol commands transmitted between client and server in a Perl script than to write a complex C program to do the work.
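As a sketch of the socket support, the core IO::Socket::INET module wraps the BSD socket calls in an object interface. Below, one process plays both client and server on the loopback address purely to keep the example short; round_trip is a hypothetical helper, and the "250" reply merely imitates an SMTP-style success code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: send one line from a client to a server over TCP
# and return the server's one-line reply. A real daemon would loop on
# accept() and serve each client separately.
sub round_trip {
    my ($message) = @_;

    # LocalPort => 0 asks the kernel for any free port.
    my $server = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,
        Listen    => 1,
        Proto     => 'tcp',
    ) or die "cannot listen: $@";

    my $client = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $server->sockport,
        Proto    => 'tcp',
    ) or die "cannot connect: $@";

    my $conn = $server->accept or die "accept failed: $!";

    # IO::Socket handles are autoflushed, so each print is sent at once.
    print $client "$message\r\n";
    my $request = <$conn>;
    $request =~ s/\r?\n\z//;              # strip the CRLF line ending
    print $conn "250 OK: $request\r\n";   # SMTP-style success reply

    my $reply = <$client>;
    $reply =~ s/\r?\n\z//;
    close $_ for $conn, $client, $server;
    return $reply;
}

print round_trip("HELO client.example"), "\n";
```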

As an example, take an SMTP daemon, which handles the sending and receiving of electronic mail. The SMTP protocol uses commands such as MAIL FROM and RCPT TO to enable the client to communicate with the server. Either the client or the server, or both, can be written in Perl and can have full access to Perl's text- and file-manipulation features as well as the vital socket communication functions.
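Parsing the commands themselves is the kind of text processing Perl makes easy. Here is a minimal sketch of a server-side command parser; parse_smtp_command is a hypothetical helper and the addresses are invented. It handles MAIL FROM:<address> and RCPT TO:<address> plus simple one-word commands such as HELO and QUIT, not the whole SMTP protocol.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one SMTP command line into (verb, argument).
# A sketch only, not a complete SMTP parser.
sub parse_smtp_command {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                       # drop the CRLF terminator
    if ($line =~ /^(MAIL FROM|RCPT TO):\s*<([^>]*)>/i) {
        return (uc $1, $2);                     # envelope command + address
    }
    if ($line =~ /^([A-Za-z]+)(?:\s+(.*))?$/) {
        return (uc $1, defined $2 ? $2 : "");   # e.g. ("HELO", "papaya")
    }
    return;                                     # not a command we recognize
}

# A short client session; the addresses are made up for illustration.
for my $line ("HELO papaya\r\n",
              "MAIL FROM:<mdw\@example.com>\r\n",
              "rcpt to:<larry\@example.com>\r\n",
              "QUIT\r\n") {
    my ($verb, $arg) = parse_smtp_command($line);
    print "$verb [$arg]\n";
}
```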

Perl is a fixture of CGI programming, that is, writing small programs that run on a web server and help make web pages more interactive.

As a far-out example of the kinds of things Perl and IPC can do, Larry Wall was reportedly considering rewriting the rn newsreader entirely in Perl.
