Names (Programming Perl)

2.5. Names

We've talked about storing values in variables, but the variables themselves (their names and their associated definitions) also need to be stored somewhere. In the abstract, these places are known as namespaces. Perl provides two kinds of namespaces, which are often called symbol tables and lexical scopes.[6] You may have an arbitrary number of symbol tables or lexical scopes, but every name you define gets stored in one or the other. We'll explain both kinds of namespaces as we go along. For now we'll just say that symbol tables are global hashes that happen to contain symbol table entries for global variables (including the hashes for other symbol tables). In contrast, lexical scopes are unnamed scratchpads that don't live in any symbol table, but are attached to a block of code in your program. They contain variables that can only be seen by the block. (That's what we mean by a scope. The lexical part just means, "having to do with text", which is not at all what a lexicographer would mean by it. Don't blame us.)

[6] We also call them packages and pads when we're talking about Perl's specific implementations, but those longer monikers are the generic industry terms, so we're pretty much stuck with them. Sorry.

Within any given namespace (whether global or lexical), every variable type has its own subnamespace, determined by the funny character. You can, without fear of conflict, use the same name for a scalar variable, an array, or a hash (or, for that matter, a filehandle, a subroutine name, a label, or your pet llama). This means that $foo and @foo are two different variables. Together with the previous rules, it also means that $foo[1] is an element of @foo totally unrelated to the scalar variable $foo. This may seem a bit weird, but that's okay, because it is weird.
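
As a minimal sketch of those separate subnamespaces (the variable contents are made up for illustration), the following declares a scalar, an array, a hash, and a subroutine that all share the identifier foo without colliding:

```perl
use strict;
use warnings;

# Three unrelated variables that share the identifier "foo".
my $foo = "scalar";
my @foo = ("zero", "one");
my %foo = (key => "value");

# A subroutine named foo lives in yet another subnamespace.
sub foo { return "sub" }

# $foo[1] is an element of @foo and $foo{key} an element of %foo;
# neither has anything to do with the scalar $foo.
my $summary = join ",", $foo, $foo[1], $foo{key}, foo();
print "$summary\n";
```

The funny character on the left, together with any subscript on the right, tells Perl which of the four things named foo is meant.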

Subroutines may be named with an initial &, although the funny character is optional when calling the subroutine. Subroutines aren't generally considered lvalues, though recent versions of Perl allow you to return an lvalue from a subroutine and assign to that, so it can look as though you're assigning to the subroutine.
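
A small sketch of both points, assuming a Perl recent enough to support the lvalue attribute (the names greet and counter are invented for this example):

```perl
use strict;
use warnings;

my $count = 0;

# The :lvalue attribute lets a caller assign to the sub's return value.
sub counter :lvalue { $count }

sub greet { return "hello" }

my $with_sigil    = &greet();   # explicit & funny character
my $without_sigil = greet();    # the same call with the & omitted

counter() = 5;                  # looks like assigning to the subroutine
print "$with_sigil $without_sigil $count\n";
```

The assignment really lands in $count, the variable that counter returns; the subroutine itself is unchanged.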

Sometimes you just want a name for "everything named foo" regardless of its funny character. So symbol table entries can be named with an initial *, where the asterisk stands for all the other funny characters. These are called typeglobs, and they have several uses. They can also function as lvalues. Assignment to typeglobs is how Perl implements importing of symbols from one symbol table to another. More about that later too.
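
As a rough illustration of typeglob assignment (the names original and alias are hypothetical), assigning one typeglob to another makes every variable type under the second name an alias for the first:

```perl
use strict;
use warnings;

our $original = "scalar value";
our @original = (1, 2, 3);

# Declare the alias names so that strict accepts the unqualified uses below.
our ($alias, @alias);

# One assignment aliases the scalar, the array, and everything else named "alias".
*alias = *original;

$alias = "changed through the alias";
print "$original (", scalar(@alias), " elements)\n";
```

Typeglobs only deal with package variables, so this sketch uses our rather than my.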

Like most computer languages, Perl has a list of reserved words that it recognizes as special keywords. However, because variable names always start with a funny character, reserved words don't actually conflict with variable names. Certain other kinds of names don't have funny characters, though, such as labels and filehandles. With these, you do have to worry (a little) about conflicting with reserved words. Since most reserved words are entirely lowercase, we recommend that you pick label and filehandle names that contain uppercase letters. For example, if you say open(LOG, "logfile") rather than the regrettable open(log, "logfile"), you won't confuse Perl into thinking you're talking about the built-in log operator (which does logarithms, not tree trunks). Using uppercase filehandles also improves readability[7] and protects you from conflict with reserved words we might add in the future. For similar reasons, user-defined modules are typically named with initial capitals so that they'll look different from the built-in modules known as pragmas, which are named in all lowercase. And when we get to object-oriented programming, you'll notice that class names are usually capitalized for the same reason.
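
Here is a sketch of the uppercase filehandle convention. To stay self-contained it writes to an in-memory string using the three-argument form of open, which assumes a Perl newer than the one this chapter describes; the names LOG and $buffer are just for illustration:

```perl
use strict;
use warnings;

my $buffer = "";

# An uppercase bareword filehandle cannot collide with a lowercase built-in.
# Writing to an in-memory string keeps the sketch free of real files.
open(LOG, '>', \$buffer) or die "Can't open in-memory log: $!";
print LOG "started\n";
close(LOG) or die "Can't close log: $!";

# Here log is unambiguously the built-in natural logarithm operator.
my $one = log(exp(1));
print $buffer;
```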

[7] One of the design principles of Perl is that different things should look different. Contrast this with languages that try to force different things to look the same, to the detriment of readability.

As you might deduce from the preceding paragraph, case is significant in identifiers--FOO, Foo, and foo are all different names in Perl. Identifiers start with a letter or underscore and may be of any length (for values of "any" ranging between 1 and 251, inclusive) and may contain letters, digits, and underscores. This includes Unicode letters and digits. Unicode ideographs also count as letters, but we don't recommend you use them unless you can read them. See Chapter 15, "Unicode".

Names that follow funny characters don't have to be identifiers, strictly speaking. They can start with a digit, in which case they may only contain more digits, as in $123. Names that start with anything other than a letter, digit, or underscore are (usually) limited to that one character (like $? or $$), and generally have a predefined significance to Perl. For example, just as in the Bourne shell, $$ is the current process ID and $? the exit status of your last child process.

As of version 5.6, Perl also has an extensible syntax for internal variable names. Any variable of the form ${^NAME} is a special variable reserved for use by Perl. All these non-identifier names are forced to be in the main symbol table. See Chapter 28, "Special Names", for some examples.
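
A short sketch of these non-identifier names in use. It assumes a Perl that provides ${^TAINT} (which appeared after 5.6), and the package name Zoo is invented for the example:

```perl
use strict;
use warnings;

# Punctuation names are predefined, so strict does not ask for a declaration.
my $pid = $$;                      # current process ID

# ${^TAINT} is one of the ${^NAME} variables; it reports whether taint mode is on.
my $taint = ${^TAINT};

# Digit-only names such as $1 are set by a successful pattern match.
my $first;
$first = $1 if "llama-42" =~ /(\d+)/;

package Zoo;
# Even inside another package, these names still refer to main's variables.
my $same_pid = $$;

package main;
print "pid=$pid same_pid=$same_pid taint=$taint first=$first\n";
```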

It's tempting to think of identifiers and names as the same thing, but when we say name, we usually mean a fully qualified name, that is, a name that says which symbol table it lives in. Such names may be formed of a sequence of identifiers separated by the :: token:

$Santa::Helper::Reindeer::Rudolph::nose

That works just like the directories and filenames in a pathname:

/Santa/Helper/Reindeer/Rudolph/nose

In the Perl version of that notion, all the leading identifiers are the names of nested symbol tables, and the last identifier is the name of the variable within the most deeply nested symbol table. For instance, in the variable above, the symbol table is named Santa::Helper::Reindeer::Rudolph::, and the actual variable within that symbol table is $nose. (The value of that variable is, of course, "red".)

A symbol table in Perl is also known as a package, so these are often called package variables. Package variables are nominally private to the package in which they exist, but are global in the sense that the packages themselves are global. That is, anyone can name the package to get at the variable; it's just hard to do this by accident. For instance, any program that mentions $Dog::bert is asking for the $bert variable within the Dog:: package. That is an entirely separate variable from $Cat::bert. See Chapter 10, "Packages".
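
To make the Dog and Cat example concrete, here is a minimal sketch (the values are made up). Fully qualified names can be used from anywhere, while an unqualified name depends on the package currently in effect:

```perl
use strict;
use warnings;

# Fully qualified names are exempt from strict's declaration requirement.
$Dog::bert = "woof";
$Cat::bert = "meow";

package Dog;
our $bert;                  # unqualified $bert now means $Dog::bert
my $inside_dog = $bert;

package main;
print "$inside_dog $Cat::bert\n";
```

Nothing stops the code in main from reading $Cat::bert; it simply has to name the package to do so.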

Variables attached to a lexical scope are not in any package, so lexically scoped variable names may not contain the :: sequence. (Lexically scoped variables are declared with a my declaration.)

2.5.1. Name Lookups

So the question is, what's in a name? How does Perl figure out what you mean if you just say $bert? Glad you asked. Here are the rules the Perl parser uses while trying to understand an unqualified name in context:

1. First, Perl looks earlier in the immediately enclosing block to see whether the variable is declared in that same block with a my (or our) declaration (see those entries in Chapter 29, "Functions", as well as the section "Scoped Declarations" in Chapter 4, "Statements and Declarations"). If there is a my declaration, the variable is lexically scoped and doesn't exist in any package--it exists only in that lexical scope (that is, in the block's scratchpad). Because lexical scopes are unnamed, nobody outside that chunk of program can even name your variable.[8]

[8] If you use an our declaration instead of a my declaration, this only declares a lexically scoped alias (a nickname) for a package variable, rather than declaring a true lexically scoped variable the way my does. Outside code can still get at the real variable through its package, but in all other respects an our declaration behaves like a my declaration. This is handy when you're trying to limit your own use of globals with the use strict pragma (see the strict pragma in Chapter 31, "Pragmatic Modules"). But you should always prefer my if you don't need a global.

2. If that doesn't work, Perl looks for the block enclosing that block and tries again for a lexically scoped variable in the larger block. Again, if Perl finds one, the variable belongs only to the lexical scope from the point of declaration through the end of the block in which it is declared--including any nested blocks, like the one we just came from in step 1. If Perl doesn't find a declaration, it repeats step 2 until it runs out of enclosing blocks.

3. When Perl runs out of enclosing blocks, it examines the whole compilation unit for declarations as if it were a block. (A compilation unit is just the entire current file, or the string currently being compiled by an eval STRING operator.) If the compilation unit is a file, that's the largest possible lexical scope, and Perl will look no further for lexically scoped variables, so we go to step 4. If the compilation unit is a string, however, things get fancier. A string compiled as Perl code at run time pretends that it's a block within the lexical scope from which the eval STRING is running, even though the actual boundaries of the lexical scope are the limits of the string containing the code rather than any real braces. So if Perl doesn't find the variable in the lexical scope of the string, we pretend that the eval STRING is a block and go back to step 2, only this time starting with the lexical scope of the eval STRING operator instead of the lexical scope inside its string.

4. If we get here, it means Perl didn't find any declaration (either my or our) for your variable. Perl now gives up on lexically scoped variables and assumes that your variable is a package variable. If the strict pragma is in effect, you will now get an error, unless the variable is one of Perl's predefined variables or has been imported into the current package. This is because that pragma disallows the use of unqualified global names. However, we aren't done with lexical scopes just yet. Perl does the same search of lexical scopes as it did in steps 1 through 3, only this time it searches for package declarations instead of variable declarations. If it finds such a package declaration, it knows that the current code is being compiled for the package in question and prepends the declared package name to the front of the variable.

5. If there is no package declaration in any surrounding lexical scope, Perl looks for the variable name in the unnamed top-level package, which happens to have the name main when it isn't going around without a name tag. So in the absence of any declarations to the contrary, $bert means the same as $::bert, which means the same as $main::bert. (But because main is just another package in the top-level unnamed package, it's also $::main::bert, and $main::main::bert, $::main::main::bert and so on. This could be construed as a useless fact. But see "Symbol Tables" in Chapter 10, "Packages".)
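
The lookup rules above can be traced with a small sketch. The package name Zoo, the array @seen, and the variable $ernie are all invented for illustration, and the final fallback to main is not exercised here:

```perl
use strict;
use warnings;

package Zoo;
our $bert = "package";            # lexically scoped alias for $Zoo::bert

my @seen;
{
    my $bert = "lexical";
    push @seen, $bert;            # my found earlier in the same block
    {
        push @seen, $bert;        # my found in the enclosing block
        push @seen, eval q{$bert};    # the string acts like a nested block
    }
}
push @seen, $bert;                # only the file-level our is visible here
{
    no strict 'vars';             # under strict this would be a compile-time error
    $ernie = "undeclared";        # no declaration at all, so it is $Zoo::ernie
}
push @seen, $Zoo::ernie;

package main;
print join(",", @seen), "\n";
```

Each push records what the unqualified name resolved to at that point, so the printed list reads in the same order as the comments.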

There are several implications to these search rules that might not be obvious, so we'll make them explicit.

* Because the file is the largest possible lexical scope, a lexically scoped variable can never be visible outside the file in which it's declared. File scopes do not nest.

* Any particular bit of Perl is compiled in at least one lexical scope and exactly one package scope. The mandatory lexical scope is, of course, the file itself. Additional lexical scopes are provided by each enclosing block. All Perl code is also compiled in the scope of exactly one package, and although the declaration of which package you're in is lexically scoped, packages themselves are not lexically constrained. That is, they're global.

* An unqualified variable name may therefore be searched for in many lexical scopes, but only one package scope, whichever one is currently in effect (which is lexically determined).

* A variable name may only attach to one scope. Although at least two different scopes (lexical and package) are active everywhere in your program, a variable can only exist in one of those scopes.

* An unqualified variable name can therefore resolve to only a single storage location, either in the first enclosing lexical scope in which it is declared, or else in the current package--but not both. The search stops as soon as that storage location is resolved, and any storage location that it would have found had the search continued is effectively hidden.

* The location of the typical variable name can be completely determined at compile time.
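
A brief sketch of the hiding described above (the subroutine name show_global is hypothetical). The lexical $bert hides the package variable only for unqualified uses inside its block; the package variable remains reachable by its full name, and code compiled outside the block is unaffected:

```perl
use strict;
use warnings;

our $bert = "global";               # really $main::bert

# Compiled where only the our declaration is visible, so it always
# refers to $main::bert.
sub show_global { return $bert }

my $result;
{
    my $bert = "lexical";           # hides the package variable in this block
    $result = join ",", $bert, $main::bert, show_global();
}
print "$result\n";
```

Because the choice is made when the code is compiled, calling show_global from inside the block does not make it see the block's lexical.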

Now that you know all about how the Perl compiler deals with names, you sometimes have the problem that you don't know the name of what you want at compile time. Sometimes you want to name something indirectly; we call this the problem of indirection. So Perl provides a mechanism: you can always replace an alphanumeric variable name with a block containing an expression that returns a reference to the real data. For instance, instead of saying:

$bert

you might say:

${ some_expression() }

and if the some_expression() function returns a reference to variable $bert (or even the string, "bert"), it will work just as if you'd said $bert in the first place. On the other hand, if the function returns a reference to $ernie, you'll get his variable instead. The syntax shown is the most general (and least legible) form of indirection, but we'll cover several convenient variations in Chapter 8, "References".
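
As a minimal sketch of this indirection, here is a hypothetical some_expression that takes an argument (the original takes none) and returns a reference to one of two package variables. The string form is shown as well; it is a symbolic reference, which works only for package variables and only where strict refs is switched off:

```perl
use strict;
use warnings;

our $bert  = "Bert's value";
our $ernie = "Ernie's value";

# Hypothetical stand-in for some_expression(): returns a reference
# to whichever variable the caller asks for.
sub some_expression {
    my ($who) = @_;
    return $who eq "ernie" ? \$ernie : \$bert;
}

my $via_ref = ${ some_expression("bert") };
my $other   = ${ some_expression("ernie") };

# A plain string is treated as the name of a package variable, but
# use strict forbids that unless 'refs' is disabled.
my $via_name;
{
    no strict 'refs';
    $via_name = ${ "bert" };
}
print "$via_ref | $other | $via_name\n";
```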