Chapter 10. Packages

We've all fallen into the trap of using cut-and-paste when we should have defined a higher-level abstraction, if only just a loop or subroutine.[1] To be sure, some folks have gone to the opposite extreme of defining ever-growing mounds of higher-level abstractions when they should have used cut-and-paste.[2] Generally, though, most of us need to think about using more abstraction rather than less.

[1] This is a form of False Laziness.

[2]This is a form of False Hubris.

Caught somewhere in the middle are the people who have a balanced view of how much abstraction is good, but who jump the gun on writing their own abstractions when they should be reusing existing code.[3]

[3] You guessed it--this is False Impatience. But if you're determined to reinvent the wheel, at least try to invent a better one.

Whenever you're tempted to do any of these things, you need to sit back and think about what will do the most good for you and your neighbor over the long haul. If you're going to pour your creative energies into a lump of code, why not make the world a better place while you're at it? (Even if you're only aiming for the program to succeed, you need to make sure it fits the right ecological niche.)

The first step toward ecologically sustainable programming is simply this: don't litter in the park. When you write a chunk of code, think about giving the code its own namespace, so that your variables and functions don't clobber anyone else's, or vice versa. A namespace is a bit like your home, where you're allowed to be as messy as you like, as long as you keep your external interface to other citizens moderately civil. In Perl, a namespace is called a package. Packages provide the fundamental building block upon which the higher-level concepts of modules and classes are constructed.

Like the notion of "home", the notion of "package" is a bit nebulous. Packages are independent of files. You can have many packages in a single file, or a single package that spans several files, just as your home could be one small garret in a larger building (if you're a starving artist), or it could comprise several buildings (if your name happens to be Queen Elizabeth). But the usual size of a home is one building, and the usual size of a package is one file. Perl provides some special help for people who want to put one package in one file, as long as you're willing to give the file the same name as the package and use an extension of .pm, which is short for "perl module". The module is the fundamental unit of reusability in Perl. Indeed, the way you use a module is with the use command, which is a compiler directive that controls the importation of subroutines and variables from a module. Every example of use you've seen until now has been an example of module reuse.

The Comprehensive Perl Archive Network, or CPAN, is where you should put your modules if other people might find them useful. Perl has thrived because of the willingness of programmers to share the fruits of their labor with the community. Naturally, CPAN is also where you can find modules that others have thoughtfully uploaded for everyone to use. See Chapter 22, "CPAN", and www.cpan.org for details.

The trend over the last 25 years or so has been to design computer languages that enforce a state of paranoia. You're expected to program every module as if it were in a state of siege. Certainly there are some feudal cultures where this is appropriate, but not all cultures are like this. In Perl culture, for instance, you're expected to stay out of someone's home because you weren't invited in, not because there are bars on the windows.[4]

[4] But Perl provides some bars if you want them, too. See "Handling Insecure Code" in Chapter 23, "Security".

This is not a book about object-oriented methodology, and we're not here to convert you into a raving object-oriented zealot, even if you want to be converted. There are already plenty of books out there for that. Perl's philosophy of object-oriented design fits right in with Perl's philosophy of everything else: use object-oriented design where it makes sense, and avoid it where it doesn't. Your call.

In OO-speak, every object belongs to a grouping called a class. In Perl, classes and packages and modules are all so closely related that novices can often think of them as being interchangeable. The typical class is implemented by a module that defines a package with the same name as the class. We'll explain all of this in the next few chapters.

When you use a module, you benefit from direct software reuse. With classes, you benefit from indirect software reuse when one class uses another through inheritance. And with classes, you get something more: a clean interface to another namespace. Everything in a class is accessed indirectly, insulating the class from the outside world.

As we mentioned in Chapter 8, "References", object-oriented programming in Perl is accomplished through references whose referents know which class they belong to. In fact, now that you know about references, you know almost everything difficult about objects. The rest of it just "lays under the fingers", as a pianist would say. You will need to practice a little, though.

One of your basic finger exercises consists of learning how to protect different chunks of code from inadvertently tampering with each other's variables. Every chunk of code belongs to a particular package, which determines what variables and subroutines are available to it. As Perl encounters a chunk of code, it is compiled into what we call the current package. The initial current package is called "main", but you can switch the current package to another one at any time with the package declaration. The current package determines which symbol table is used to find your variables, subroutines, I/O handles, and formats.

Any variable not declared with my is associated with a package--even seemingly omnipresent variables like $_ and %SIG. In fact, there's really no such thing as a global variable in Perl, just package variables. (Special identifiers like _ and SIG merely seem global because they default to the main package instead of the current one.)

The scope of a package declaration is from the declaration itself through the end of the enclosing scope (block, file, or eval--whichever comes first) or until another package declaration at the same level, which supersedes the earlier one. (This is a common practice).

All subsequent identifiers (including those declared with our, but not including those declared with my or those qualified with a different package name) will be placed in the symbol table belonging to the current package. (Variables declared with my are independent of packages; they are always visible within, and only within, their enclosing scope, regardless of any package declarations.)

Typically, a package declaration will be the first statement of a file meant to be included by require or use. But again, that's by convention. You can put a package declaration anywhere you can put a statement. You could even put it at the end of a block, in which case it would have no effect whatsoever. You can switch into a package in more than one place; a package declaration merely selects the symbol table to be used by the compiler for the rest of that block. (This is how a given package can span more than one file.)

You can refer to identifiers[5] in other packages by prefixing ("qualifying") the identifier with the package name and a double colon: $Package::Variable. If the package name is null, the main package is assumed. That is, $::sail is equivalent to $main::sail.[6]

[5] By identifiers, we mean the names used as symbol table keys for accessing scalar variables, array variables, hash variables, subroutines, file or directory handles, and formats. Syntactically speaking, labels are also identifiers, but they aren't put into a particular symbol table; rather, they are attached directly to the statements in your program. Labels cannot be package qualified.

[6] To clear up another bit of potential confusion, in a variable name like $main::sail, we use the term "identifier" to talk about main and sail, but not main::sail. We call that a variable name instead, because identifiers cannot contain colons.

The old package delimiter was a single quote, so in old Perl programs you'll see variables like $main'sail and $somepack'horse. But the double colon is now the preferred delimiter, in part because it's more readable to humans, and in part because it's more readable to emacs macros. It also makes C++ programmers feel like they know what's going on--as opposed to using the single quote as the separator, which was there to make Ada programmers feel like they knew what's going on. Because the old-fashioned syntax is still supported for backward compatibility, if you try to use a string like "This is $owner's house", you'll be accessing $owner::s; that is, the $s variable in package owner, which is probably not what you meant. Use braces to disambiguate, as in "This is ${owner}'s house".

The double colon can be used to chain together identifiers in a package name: $Red::Blue::var. This means the $var belonging to the Red::Blue package. The Red::Blue package has nothing to do with any Red or Blue packages that might happen to exist. That is, a relationship between Red::Blue and Red or Blue may have meaning to the person writing or using the program, but it means nothing to Perl. (Well, other than the fact that, in the current implementation, the symbol table Red::Blue happens to be stored in the symbol table Red. But the Perl language makes no use of that directly.)

For this reason, every package declaration must declare a complete package name. No package name ever assumes any kind of implied "prefix", even if (seemingly) declared within the scope of some other package declaration.

Only identifiers (names starting with letters or an underscore) are stored in a package's symbol table. All other symbols are kept in the main package, including all the nonalphabetic variables, like $!, $?, and $_. In addition, when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are forced to be in package main, even when used for other purposes than their built-in ones. Don't name your package m, s, y, tr, q, qq, qr, qw, or qx unless you're looking for a lot of trouble. For instance, you won't be able to use the qualified form of an identifier as a filehandle because it will be interpreted instead as a pattern match, a substitution, or a transliteration.

Long ago, variables beginning with an underscore were forced into the main package, but we decided it was more useful for package writers to be able to use a leading underscore to indicate semi-private identifiers meant for internal use by that package only. (Truly private variables can be declared as file-scoped lexicals, but that works best when the package and module have a one-to-one relationship, which is common but not required.)

The %SIG hash (which is for trapping signals; see Chapter 16, "Interprocess Communication") is also special. If you define a signal handler as a string, it's assumed to refer to a subroutine in the main package unless another package name is explicitly used. Use a fully qualified signal handler name if you want to specify a particular package, or avoid strings entirely by assigning a typeglob or a function reference instead:

$SIG{QUIT} = "Pkg::quit_catcher"; # fully qualified handler name
$SIG{QUIT} = "quit_catcher";      # implies "main::quit_catcher"
$SIG{QUIT} = *quit_catcher;       # forces current package's sub
$SIG{QUIT} = \&quit_catcher;      # forces current package's sub
$SIG{QUIT} = sub { print "Caught SIGQUIT\n" };   # anonymous sub

The notion of "current package" is both a compile-time and run-time concept. Most variable name lookups happen at compile time, but run-time lookups happen when symbolic references are dereferenced, and also when new bits of code are parsed under eval. In particular, when you eval a string, Perl knows which package the eval was invoked in and propagates that package inward when evaluating the string. (You can always switch to a different package inside the eval string, of course, since an eval string counts as a block, just like a file loaded in with do, require, or use.)

Alternatively, if an eval wants to find out what package it's in, the special symbol __PACKAGE__ contains the current package name. Since you can treat it as a string, you could use it in a symbolic reference to access a package variable. But if you were doing that, chances are you should have declared the variable with our instead so it could be accessed as if it were a lexical.

10.1. Symbol Tables

The contents of a package are collectively called a symbol table. Symbol tables are stored in a hash whose name is the same as the package, but with two colons appended. The main symbol table's name is thus %main::. Since main also happens to be the default package, Perl provides %:: as an abbreviation for %main::.

Likewise, the symbol table for the Red::Blue package is named %Red::Blue::. As it happens, the main symbol table contains all other top-level symbol tables, including itself, so %Red::Blue:: is also %main::Red::Blue::.

When we say that a symbol table "contains" another symbol table, we mean that it contains a reference to the other symbol table. Since main is the top-level package, it contains a reference to itself, with the result that %main:: is the same as %main::main::, and %main::main::main::, and so on, ad infinitum. It's important to check for this special case if you write code that traverses all symbol tables.

Inside a symbol table's hash, each key/value pair matches a variable name to its value. The keys are the symbol identifiers, and the values are the corresponding typeglobs. So when you use the *NAME typeglob notation, you're really just accessing a value in the hash that holds the current package's symbol table. In fact, the following have (nearly) the same effect:

*sym = *main::variable;
*sym = $main::{"variable"};

The first is more efficient because the main symbol table is accessed at compile time. It will also create a new typeglob by that name if none previously exists, whereas the second form will not.

Since a package is a hash, you can look up the keys of the package and get to all the variables of the package. Since the values of the hash are typeglobs, you can dereference them in several ways. Try this:

foreach $symname (sort keys %main::) {
    local *sym = $main::{$symname};
    print "\$$symname is defined\n" if defined $sym;
    print "\@$symname is nonnull\n" if         @sym;
    print "\%$symname is nonnull\n" if         %sym;
}

Since all packages are accessible (directly or indirectly) through the main package, you can write Perl code to visit every package variable in your program. The Perl debugger does precisely that when you ask it to dump all your variables with the V command. Note that if you do this, you won't see variables declared with my since those are independent of packages, although you will see variables declared with our. See Chapter 20, "The Perl Debugger".

Earlier we said that only identifiers are stored in packages other than main. That was a bit of a fib: you can use any string you want as the key in a symbol table hash--it's just that it wouldn't be valid Perl if you tried to use a non-identifier directly:

$!@#$%           = 0;         # WRONG, syntax error.
${'!@#$%'}       = 1;         # Ok, though unqualified.

${'main::!@#$%'} = 2;         # Can qualify within the string.
print ${ $main::{'!@#$%'} }   # Ok, prints 2!

Assignment to a typeglob performs an aliasing operation; that is,

*dick = *richard;

causes variables, subroutines, formats, and file and directory handles accessible via the identifier richard to also be accessible via the symbol dick. If you want to alias only a particular variable or subroutine, assign a reference instead:

*dick = \$richard;

That makes $richard and $dick the same variable, but leaves @richard and @dick as separate arrays. Tricky, eh?

This is how the Exporter works when importing symbols from one package to another. For example:

*SomePack::dick = \&OtherPack::richard;

imports the &richard function from package OtherPack into SomePack, making it available as the &dick function. (The Exporter module is described in the next chapter.) If you precede the assignment with a local, the aliasing will only last as long as the current dynamic scope.

This mechanism may be used to retrieve a reference from a subroutine, making the referent available as the appropriate data type:

*units = populate() ;         # Assign \%newhash to the typeglob
print $units{kg};             # Prints 70; no dereferencing needed!

sub populate {
    my %newhash = (km => 10, kg => 70);
    return \%newhash;
}

Likewise, you can pass a reference into a subroutine and use it without dereferencing:

%units = (miles => 6, stones => 11);  
fillerup( \%units );          # Pass in a reference
print $units{quarts};         # Prints 4

sub fillerup {
    local *hashsym = shift;   # Assign \%units to the typeglob
    $hashsym{quarts} = 4;     # Affects %units; no dereferencing needed!
}

These are tricky ways to pass around references cheaply when you don't want to have to explicitly dereference them. Note that both techniques only work with package variables; they would not have worked had we declared %units with my.

Another use of symbol tables is for making "constant" scalars:

*PI = \3.14159265358979;

Now you cannot alter $PI, which is probably a good thing, all in all. This isn't the same as a constant subroutine, which is optimized at compile time. A constant subroutine is one prototyped to take no arguments and to return a constant expression; see Section 10.4.1, "Inlining Constant Functions" in Chapter 6, "Subroutines", for details. The use constant pragma (see Chapter 31, "Pragmatic Modules") is a convenient shorthand:

use constant PI => 3.14159;

Under the hood, this uses the subroutine slot of *PI, instead of the scalar slot used earlier. It's equivalent to the more compact (but less readable):

*PI = sub () { 3.14159 };

That's a handy idiom to know anyway--assigning a sub {} to a typeglob is the way to give a name to an anonymous subroutine at run time.

Assigning a typeglob reference to another typeglob (*sym = \*oldvar) is the same as assigning the entire typeglob, because Perl automatically dereferences the typeglob reference for you. And when you set a typeglob to a simple string, you get the entire typeglob named by that string, because Perl looks up the string in the current symbol table. The following are all equivalent to one another, though the first two compute the symbol table entry at compile time, while the last two do so at run time:

*sym =   *oldvar;
*sym =  \*oldvar;       # autodereference
*sym = *{"oldvar"};     # explicit symbol table lookup
*sym =   "oldvar";      # implicit symbol table lookup

When you perform any of the following assignments, you're replacing just one of the references within the typeglob:

*sym = \$frodo;
*sym = \@sam;
*sym = \%merry;
*sym = \&pippin;

If you think about it sideways, the typeglob itself can be viewed as a kind of hash, with entries for the different variable types in it. In this case, the keys are fixed, since a typeglob can contain exactly one scalar, one array, one hash, and so on. But you can pull out the individual references, like this:

*pkg::sym{SCALAR}      # same as \$pkg::sym
*pkg::sym{ARRAY}       # same as \@pkg::sym
*pkg::sym{HASH}        # same as \%pkg::sym
*pkg::sym{CODE}        # same as \&pkg::sym
*pkg::sym{GLOB}        # same as \*pkg::sym
*pkg::sym{IO}          # internal file/dir handle, no direct equivalent
*pkg::sym{NAME}        # "sym" (not a reference)
*pkg::sym{PACKAGE}     # "pkg" (not a reference)

You can say *foo{PACKAGE} and *foo{NAME} to find out what name and package the *foo symbol table entry comes from. This may be useful in a subroutine that is passed typeglobs as arguments:

sub identify_typeglob {
    my $glob = shift;
    print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
}

identify_typeglob(*foo);
identify_typeglob(*bar::glarch);

This prints:

You gave me main::foo
You gave me bar::glarch

The *foo{THING} notation can be used to obtain references to individual elements of *foo. See the section Section 10.2.5, "Symbol Table References" in Chapter 8, "References" for details.

This syntax is primarily used to get at the internal filehandle or directory handle reference, because the other internal references are already accessible in other ways. (The old *foo{FILEHANDLE} is still supported to mean *foo{IO}, but don't let its name fool you into thinking it can distinguish filehandles from directory handles.) But we thought we'd generalize it because it looks kind of pretty. Sort of. You probably don't need to remember all this unless you're planning to write another Perl debugger.

Chapter 10. Packages

Contents:

10.1. Symbol Tables