12. Packages, Libraries, and Modules

Like all those possessing a library, Aurelian was aware that he was guilty of not knowing his in its entirety.

- Jorge Luis Borges, The Theologians

12.0. Introduction

Imagine that you have two separate programs, both of which work fine by themselves, and you decide to make a third program that combines the best features from the first two. You copy both programs into a new file or cut and paste selected pieces. You find that the two programs had variables and functions with the same names that should remain separate. For example, both might have an init function or a global $count variable. When merged into one program, these separate parts would interfere with each other.

The solution to this problem is packages. Perl uses packages to partition the global namespace. The package is the basis for both traditional modules and object-oriented classes. Just as directories contain files, packages contain identifiers. Every global identifier (variables, functions, file and directory handles, and formats) has two parts: its package name and the identifier proper. These two pieces are separated from one another with a double colon. For example, the variable $CGI::needs_binmode is a global variable named $needs_binmode, which resides in package CGI.

Where the filesystem uses slashes to separate the directory from the filename, Perl uses a double colon (prior to release 5.000, you could only use a single quote mark, as in $CGI'needs_binmode). $Names::startup is the variable named $startup in the package Names, whereas $Dates::startup is the $startup variable in package Dates. Saying $startup by itself without a package name means the global variable $startup in the current package. (This assumes that no lexical $startup variable is currently visible. Lexical variables are explained in Chapter 10, Subroutines.) When looking at an unqualified variable name, a lexical takes precedence over a global. Lexicals live in scopes; globals live in packages. If you really want the global instead, you need to fully qualify it.
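Here's a small sketch of that precedence rule, reusing the $Names::startup example. The values "global" and "lexical" are made up for illustration; the point is that inside the block the unqualified name finds the lexical, while the fully qualified name still reaches the package global.

```perl
use strict;
use warnings;

package Names;
our $startup = "global";            # the package global $Names::startup

my ($inner, $qualified);
{
    my $startup = "lexical";        # a lexical hides the global in this block
    $inner     = $startup;          # unqualified name: the lexical wins
    $qualified = $Names::startup;   # fully qualified name: the global
}
print "$inner / $qualified\n";      # lexical / global
```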

package is a compile-time declaration that sets the default package prefix for unqualified global identifiers, just as chdir sets the default directory prefix for relative pathnames. This effect lasts until the end of the current scope (a brace-enclosed block, file, or eval). The effect is also terminated by any subsequent package statement in the same scope. (See the following code.) All programs are in package main until they use a package statement to change this.

package Alpha;
$name = "first";

package Omega;
$name = "last";

package main;
print "Alpha is $Alpha::name, Omega is $Omega::name.\n";

Alpha is first, Omega is last.
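The example above switches packages with successive package statements. The other way a package declaration ends is at the close of its enclosing block. This sketch (the whoami function is purely illustrative) declares package Alpha inside a bare block; once the brace closes, the code is back in main. Perl 5.14 and later also accept the block form package Alpha { ... }, which does the same thing more compactly.

```perl
use strict;
use warnings;

{
    package Alpha;                      # in effect only until the closing brace
    sub whoami { return __PACKAGE__ }   # compiled into package Alpha
}

# The block has ended, so unqualified names belong to main again.
my $here = __PACKAGE__;
print "$here\n";                        # main
print Alpha::whoami(), "\n";            # Alpha
```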

Unlike user-defined identifiers, built-in variables with punctuation names (like $_ and $.) and the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are all forced to be in package main when unqualified. That way things like STDIN, @ARGV, %ENV, and $_ are always the same no matter what package you're in; for example, @ARGV always means @main::ARGV, even if you've used package to change the default package. A fully qualified @ElseWhere::ARGV, on the other hand, is an ordinary package variable with no special built-in meaning. Because $_ is shared by everyone, make sure to localize it if you use it in your module.
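This sketch shows both halves of that paragraph. The contents of @ARGV and the trim function in the made-up package Other are illustrative assumptions. Inside Other, an unqualified @ARGV is still @main::ARGV, and trim uses local so that its edits to $_ don't clobber the caller's $_.

```perl
use strict;
use warnings;

@ARGV = qw(alpha beta);              # this is @main::ARGV

package Other;

# Unqualified @ARGV still means @main::ARGV inside package Other.
my $same = \@ARGV == \@main::ARGV;   # true: the very same array

# A module function that modifies $_ should localize it first,
# so the caller's $_ survives the call.
sub trim {
    local $_ = shift;                # private value of $_ for this call
    s/^\s+//;
    s/\s+$//;
    return $_;
}

package main;

$_ = "caller's value";
my $t = Other::trim("  padded  ");
print "$t\n";                        # padded
print "$_\n";                        # caller's value
```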

Modules

The unit of software reuse in Perl is the module, a file that has a collection of related functions designed to be used by other programs and library modules. Every module has a public interface, a set of variables and functions that outsiders are encouraged to use. From inside the module, the interface is defined by initializing certain package variables that the standard Exporter module looks at. From outside the module, the interface is accessed by importing symbols as a side effect of the use statement. The public interface of a Perl module is whatever is documented to be public. In the case of undocumented interfaces, it's whatever is vaguely intended to be public. When we talk about modules in this chapter, and traditional modules in general, we mean those that use the Exporter.

The require and use statements both pull a module into your program, although their semantics are slightly different. require loads modules at runtime, with a check to avoid the redundant loading of a given module. use is like require, with two added properties: compile-time loading and automatic importing.

Modules included with use are processed at compile time, but require processing happens at run time. This is important because if a module that a program needs is missing, the program won't even start because the use fails during compilation of your script. Another advantage of compile-time use over run-time require is that function prototypes in the module's subroutines become visible to the compiler. This matters because only the compiler cares about prototypes, not the interpreter. (Then again, we don't usually recommend prototypes except for replacing built-in commands, which do have them.)
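Perl's own documentation describes use Module LIST as equivalent to BEGIN { require Module; Module->import(LIST) }. The sketch below spells that out with the standard List::Util module, then contrasts it with a run-time require wrapped in eval. No::Such::Module is a deliberately nonexistent, hypothetical name: had it been loaded with use, the program would never have started, but a run-time require can be trapped and the program carries on.

```perl
use strict;
use warnings;

# "use List::Util qw(max);" is equivalent to this BEGIN block:
BEGIN {
    require List::Util;             # load the module while compiling
    List::Util->import('max');      # then alias max() into this package
}
my $top = max(3, 9, 4);             # 9

# A run-time require fails only when it runs, so it can be trapped.
# No::Such::Module is a hypothetical module that doesn't exist.
my $loaded = eval { require No::Such::Module; 1 };
my $msg = $loaded ? "loaded\n" : "optional module missing, carrying on\n";
print $msg;
```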

use is suitable for giving hints to the compiler because of its compile-time behavior. A pragma is a special module that acts as a directive to the compiler to alter how Perl compiles your code. A pragma's name is always all lowercase, so when writing a regular module instead of a pragma, choose a name that starts with a capital letter. Pragmas supported by Perl 5.004 include autouse, constant, diagnostics, integer, lib, locale, overload, sigtrap, strict, subs, and vars. Each has its own manpage.
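Three of the pragmas just listed appear in this sketch: strict, constant, and integer. The constant name CARDS_PER_DECK is an illustrative assumption. Note that use integer affects only the block it appears in, because pragmas are lexically scoped: the same expression 10 / 3 yields 3 inside the block and a floating-point result outside it.

```perl
use strict;                          # pragma: stricter compile-time checks
use warnings;
use constant CARDS_PER_DECK => 52;   # pragma: compile-time constant

my $float_div = 10 / 3;              # ordinary floating-point division
my $int_div;
{
    use integer;                     # pragma, lexically scoped to this block
    $int_div = 10 / 3;               # integer division: 3
}
print "$float_div $int_div ", CARDS_PER_DECK, "\n";
```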

The other difference between require and use is that use performs an implicit import on the included module's package. Importing a function or variable from one package to another is a form of aliasing; that is, it makes two different names for the same underlying thing. It's like linking in files from another directory to your current one by the command ln /somedir/somefile. Once it's linked in, you no longer have to use the full pathname to access the file. Likewise, an imported symbol no longer needs to be fully qualified by package name (or predeclared with use vars or use subs). You can use imported variables as though they were part of your package. If you imported $English::OUTPUT_AUTOFLUSH in the current package, you could refer to it as $OUTPUT_AUTOFLUSH.
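Here is that English example in runnable form. Because the imported $OUTPUT_AUTOFLUSH is an alias rather than a copy, assigning to it changes $| itself. The -no_match_vars import option simply leaves out the aliases for $`, $&, and $', which this sketch doesn't need.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # import readable aliases for punctuation variables

$OUTPUT_AUTOFLUSH = 1;            # imported name for the same variable as $|
print "\$| is now $|\n";          # $| is now 1
```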

The required file extension for a Perl module is ".pm". The module named FileHandle would be stored in the file FileHandle.pm. The full path to the file depends on your include path, which is stored in the global @INC variable. Recipe 12.7 shows how to manipulate this array for your own purposes.
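As a quick preview of that recipe, the lib pragma adds directories to the front of @INC at compile time, so they are searched before the standard locations. The directory /opt/myapp/lib is a made-up example; it doesn't need to exist for this sketch to run.

```perl
use strict;
use warnings;
use lib '/opt/myapp/lib';           # hypothetical directory, prepended to @INC

print "searched first: $INC[0]\n";  # searched first: /opt/myapp/lib
print "$_\n" for @INC;              # the full search path, in order
```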

If the module name itself contains one or more double colons, these are translated into your system's directory separator. That means that the File::Find module resides in the file File/Find.pm under most filesystems. For example:

require "FileHandle.pm";            # run-time load
require FileHandle;                 # ".pm" assumed; same as previous
use FileHandle;                     # compile-time load

require "Cards/Poker.pm";           # run-time load
require Cards::Poker;               # ".pm" assumed; same as previous
use Cards::Poker;                   # compile-time load
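The name-to-file translation can be sketched in a few lines. The helper module_to_path is hypothetical, written only to mirror what require does with a bareword. After a successful load, %INC records the relative name (always with forward slashes) as the key and the full path actually found in @INC as the value.

```perl
use strict;
use warnings;
use File::Find ();                  # load File/Find.pm, but import nothing

# Hypothetical helper: map a module name to the relative path
# that require searches for in each @INC directory.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

my $path = module_to_path('File::Find');       # File/Find.pm
print "$path was loaded from $INC{$path}\n";
print module_to_path('Cards::Poker'), "\n";    # Cards/Poker.pm
```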

Import/Export Regulations

The following is a typical setup for a hypothetical module named Cards::Poker that demonstrates how to manage its exports. The code goes in the file named Poker.pm within the directory Cards: that is, Cards/Poker.pm. (See Recipe 12.7 for where the Cards directory should reside.) Here's that file, with line numbers included for reference:

1    package Cards::Poker;
2    use Exporter;
3    @ISA = ('Exporter');
4    @EXPORT = qw(&shuffle @card_deck);
5    @card_deck = ();                       # initialize package global
6    sub shuffle { }                        # fill-in definition later
7    1;                                     # don't forget this

Line 1 declares the package that the module will put its global variables and functions in. Typically, a module first switches to a particular package so that it has its own place for global variables and functions, one that won't conflict with that of another program. This must be written exactly as the corresponding use statement will be written when the module is loaded.

Don't say package Poker just because the basename of your file is Poker.pm . Rather, say package Cards::Poker because your users will say use Cards::Poker . This common problem is hard to debug. If you don't make the package and use statements exactly the same, you won't see a problem until you try to call imported functions or access imported variables, which will be mysteriously missing.

Line 2 loads in the Exporter module, which manages your module's public interface as described below. Line 3 initializes the special, per-package array @ISA to contain the word "Exporter". When a user says use Cards::Poker, Perl implicitly calls a special method, Cards::Poker->import(). You don't have an import method in your package, but that's OK, because the Exporter package does, and you're inheriting from it because of the assignment to @ISA (read it as "is a": Cards::Poker is an Exporter). Perl looks at the package's @ISA for resolution of undefined methods. Inheritance is a topic of Chapter 13, Classes, Objects, and Ties. You may ignore it for now, so long as you put code as shown in lines 2 and 3 into each module you write.

Line 4 assigns the list ('&shuffle', '@card_deck') to the special, per-package array @EXPORT. When someone imports this module, variables and functions listed in that array are aliased into the caller's own package. That way they don't have to call the function Cards::Poker::shuffle(23) after the import. They can just write shuffle(23) instead. This won't happen if they load Cards::Poker with require Cards::Poker; only a use imports.
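To see that require alone imports nothing, here's a sketch using the standard List::Util module in place of Cards::Poker. After the require, max is reachable only by its full name. Calling the inherited import method by hand does the aliasing a use would have done, though at run time, so the call to min needs parentheses because the compiler didn't know about it yet.

```perl
use strict;
use warnings;

require List::Util;                       # loads the module, imports nothing
my $biggest = List::Util::max(4, 8, 2);   # so the name must be fully qualified
my $imported_max = main->can('max');      # undef: nothing was aliased into main

List::Util->import('min');                # import by hand, at run time
my $smallest = min(4, 8, 2);              # parentheses needed: min wasn't known at compile time
print "$biggest $smallest\n";             # 8 2
```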

Lines 5 and 6 set up the package global variables and functions to be exported. (We presume you'll actually flesh out their initializations and definitions more than in these examples.) You're free to add other variables and functions to your module as well, including ones you don't put in the public interface via @EXPORT . See Recipe 12.1 for more about using the Exporter.
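Here's a sketch of the whole arrangement as a single runnable file, written in a more modern style with use strict and our. Two parts are assumptions for demonstration only: the BEGIN block stands in for Cards/Poker.pm, with the %INC assignment telling require the file is already loaded, and the Fisher-Yates body of shuffle and the deck contents are hypothetical fill-ins. Because the import happens at compile time, the caller can use the unqualified @card_deck even under strict.

```perl
use strict;
use warnings;

# This BEGIN block plays the part of Cards/Poker.pm so the sketch runs as one file.
BEGIN {
    package Cards::Poker;
    use Exporter;
    our @ISA       = ('Exporter');
    our @EXPORT    = qw(&shuffle @card_deck);
    our @card_deck = ();                  # initialize package global

    # Hypothetical body: Fisher-Yates shuffle of @card_deck, in place.
    sub shuffle {
        for (my $i = $#card_deck; $i > 0; $i--) {
            my $j = int rand($i + 1);
            @card_deck[$i, $j] = @card_deck[$j, $i];
        }
        return;
    }

    $INC{'Cards/Poker.pm'} = __FILE__;    # mark Cards/Poker.pm as already loaded
}

use Cards::Poker;      # require sees the %INC entry; import() is inherited from Exporter

@card_deck = map { "card $_" } 1 .. 52;   # hypothetical deck contents
shuffle();
print scalar(@card_deck), " cards, first is $card_deck[0]\n";
```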

Finally, line 7 is a simple 1, indicating the overall return value of the module. If the last evaluated expression in the module doesn't produce a true value, an exception will be raised. Trapping this is the topic of Recipe 12.2. Any old true value will do, like 6.02e23 or "Because tchrist and gnat told us to put this here"; however, 1 is the canonical true value used by almost every module.
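To watch that exception happen, this sketch writes a throwaway module (the package name Falsy is made up) whose last expression is 0, loads it with require, and traps the failure with eval. It assumes the standard File::Temp module for the scratch file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a throwaway module whose last evaluated expression is false.
my ($fh, $file) = tempfile(SUFFIX => '.pm', UNLINK => 1);
print {$fh} "package Falsy;\nsub hello { return 'hi' }\n0;\n";
close $fh or die "close: $!";

my $ok = eval { require $file; 1 };
print "require failed: $@" unless $ok;    # ...did not return a true value...
```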

Packages group and organize global identifiers. They have nothing to do with privacy. Code compiled in package Church can freely examine and alter variables in package State. Package variables are always global and are used for sharing. But that's okay, because a module is more than just a package; it's also a file, and files count as their own scope. So if you want privacy, use lexical variables instead of globals. This is the topic of Recipe 12.4.
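A sketch of the difference, using a made-up Counter package. In a real Counter.pm the file itself would be the scope hiding $count; here a bare block plays that role so everything fits in one file. Outside code can reach in and change $Counter::visits at will, but it can only read $count through the function the package chooses to provide.

```perl
use strict;
use warnings;

# The bare block stands in for a module file, which is its own lexical scope.
{
    package Counter;
    my  $count  = 0;     # lexical: invisible outside this scope
    our $visits = 0;     # package global: anyone can use $Counter::visits
    sub bump    { $visits++; return ++$count }
    sub current { return $count }   # outsiders read $count only through this
}

Counter::bump() for 1 .. 3;
$Counter::visits = 100;                    # any code can alter a package global
print "count=", Counter::current(), "\n";  # count=3; the lexical was untouched
```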

Other Kinds of Library Files

A library is a collection of loosely related functions designed to be used by other programs. It lacks the rigorous semantics of a Perl module. The file extension .pl indicates that it's a Perl library file. Examples include syslog.pl and chat2.pl .

Perl libraries - or in fact, any arbitrary file with Perl code in it - can be loaded in using do 'file.pl' or with require 'file.pl' . The latter is preferred in most situations, because unlike do , require does implicit error checking. It raises an exception if the file can't be found in your @INC path, doesn't compile, or if it doesn't return a true value when any initialization code is run. (The last part is what the 1; was for earlier.) Another advantage of require is that it keeps track of which files have already been loaded in the global hash %INC . It doesn't reload the file if %INC indicates that the file has already been read in.
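The sketch below contrasts the two, assuming File::Temp for a scratch library file whose only job is to bump a counter. The second require is skipped because %INC already has the file, while do reruns it every time. Since do raises no exception of its own, the error checks after it follow the pattern suggested in the do entry of perlfunc: $@ for compile errors, $! when the file couldn't be read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our $loads = 0;

# A throwaway library file that counts how many times its code runs.
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "\$main::loads++;\n1;\n";
close $fh or die "close: $!";

require $file;      # compiles and runs the file, then records it in %INC
require $file;      # $INC{$file} is already set, so nothing happens
print "loads after two requires: $loads\n";    # 1

# do FILE rereads every time and reports problems only through $@ and $!.
my $result = do $file;
die "couldn't parse $file: $@" if $@;
die "couldn't do $file: $!"    unless defined $result;
print "loads after do: $loads\n";              # 2
```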

Libraries work well when used by a program, but problems can arise when libraries use one another. Consequently, simple Perl libraries have been rendered mostly obsolete, replaced by the more modern modules. But some programs still use libraries, usually loading them in with require instead of do .

Other file extensions are occasionally seen in Perl. A ".ph" is used for C header files that have been translated into Perl libraries using the h2ph tool, as discussed in Recipe 12.14. A ".xs" indicates an augmented C source file, possibly created by the h2xs tool, which will be compiled by the xsubpp tool and your C compiler into native machine code. This process of creating mixed-language modules is discussed in Recipe 12.15.

So far we've only talked about traditional modules, which export their interface by allowing the caller direct access to particular subroutines and variables. Most modules fall into this category. But some problems (and some programmers) lend themselves to more intricately designed modules: those involving objects. An object-oriented module seldom uses the import-export mechanism at all. Instead, it provides an object-oriented interface full of constructors, destructors, methods, inheritance, and operator overloading. This is the subject of Chapter 13.
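For contrast with Cards::Poker, here's a minimal sketch of the object-oriented style, using a hypothetical Hypothetical::Deck class. It has no Exporter, no @ISA entry for it, and no @EXPORT; callers reach everything through a constructor and method calls, so nothing is ever aliased into their package. The // operator used for the default size requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# A hypothetical class: no Exporter, no @EXPORT; callers use methods instead.
package Hypothetical::Deck;

sub new {                                   # constructor
    my ($class, %args) = @_;
    my $self = { cards => [ 1 .. ($args{size} // 52) ] };
    return bless $self, $class;
}

sub count {                                 # method: cards remaining
    my $self = shift;
    return scalar @{ $self->{cards} };
}

sub deal {                                  # method: remove and return top card
    my $self = shift;
    return shift @{ $self->{cards} };
}

package main;

my $deck = Hypothetical::Deck->new(size => 10);
my $top  = $deck->deal;                     # 1
print "dealt $top, ", $deck->count, " left\n";   # dealt 1, 9 left
```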

Not Reinventing the Wheel

CPAN, the Comprehensive Perl Archive Network, is a gigantic repository of nearly everything about Perl you could imagine, including source, documentation, alternate ports, and above all, modules. Before you write a new module, check with CPAN to see whether one already exists that does what you need. Even if one doesn't, something close enough might give you ideas.

You can access CPAN at http://www.perl.com/CPAN/CPAN.html (or ftp://www.perl.com/pub/perl/CPAN/CPAN.html). This file briefly describes each of CPAN's modules, but because it's manually edited, it may not always have the very latest modules' descriptions. You can find out about those in the CPAN/RECENT or CPAN/RECENT.html file.

The module directory itself is in CPAN/modules. It contains indices of all registered modules plus three convenient subdirectories: by-module, by-author, and by-category. All modules are available through each of these, but the by-category directory is probably the most useful. There you will find directories covering specific application areas, including operating system interfaces; networking, modems, and interprocess communication; database interfaces; user interfaces; interfaces to other programming languages; authentication, security, and encryption; World Wide Web, HTML, HTTP, CGI, and MIME; and images, pixmap and bitmap manipulation, drawing, and graphing, just to name a few.