4. References and Nested Data Structures

Contents:
What Is a Reference?
Creating Hard References
Using Hard References
Symbolic References
Braces, Brackets, and Quoting
A Brief Tutorial: Manipulating Lists of Lists
Data Structure Code Examples

For both practical and philosophical reasons, Perl has always been biased in favor of flat, linear data structures. And for many problems, this is exactly what you want. But occasionally you need to set up something just a little more complicated and hierarchical. Under older versions of Perl you could construct complex data structures indirectly by using eval or typeglobs.

Suppose you wanted to build a simple table (two-dimensional array) showing vital statistics - say, age, eye color, and weight - for a group of people. You could do this by first creating an array for each individual:

@john = (47, "brown", 186);
@mary = (23, "hazel", 128);
@bill = (35, "blue", 157);

and then constructing a single, additional array consisting of the names of the other arrays:

@vitals = ('john', 'mary', 'bill');

Unfortunately, actually using this table as a two-dimensional data structure is cumbersome. To change John's eyes to "red" after a night on the town, you'd have to say something like:

$vitals = $vitals[0];
eval "\$${vitals}[1] = 'red'";

A much more efficient (but not more readable) way to do the same thing is to use a typeglob assignment to temporarily alias one symbol table entry to another:

local(*array) = $vitals[0];  # Alias *array to *john.
$array[1] = 'red';           # Actually sets $john[1].

Alternatively, you could avoid the symbol table altogether by doing everything with a set of parallel hash arrays, emulating pointers symbolically by doing key lookups in the appropriate hash. Finally, you could define all your structures operationally, using pack and unpack , or join and split .

So even though you could use a variety of techniques to emulate pointers and data structures, all of them could get to be unwieldy. To be sure, Perl still supports these older mechanisms, since they remain quite useful for simple problems. But now Perl also supports references .

4.1 What Is a Reference?

In the preceding example using eval , $vitals[0] had the value 'john' . That is, it happened to contain a string that was also the name for another variable. You could say that the first variable referred to the second. We will speak of this sort of reference as a symbolic reference. You can think of it as analogous to symbolic links in UNIX filesystems. Perl now provides some simplified mechanisms for using symbolic references; in particular, the need for an eval or a typeglob assignment in our example disappears. See "Symbolic References" later in this chapter.

The other kind of reference is the hard reference.[ 1 ] A hard reference refers not to the name of another variable (which is just a container for a value) but rather to an actual value, some internal glob of data, which we will call a "thingy", in honor of that thingy that hangs down in the back of your throat. (You may also call it a "referent", if you prefer to live a joyless existence.) Suppose, for example, that you create a hard reference to the thingy contained in the variable @array . This hard reference and the thingy it refers to will continue to exist even after @array goes out of scope. Only when the reference count of the thingy itself goes to zero is the thingy actually destroyed.

[1] If you like, you can think of hard references as real references, and symbolic references as fake references. It's like the difference between real friendship and mere name-dropping.

To put it another way, a Perl variable lives in a symbol table and holds one hard reference to its underlying thingy (which may be a simple thingy like a number, or a complex thingy like an array or hash, but there's still only one reference from the variable to the value). There may be other hard references to the same thingy, but if so, the variable doesn't know (or care) about them. A symbolic reference names another variable, so there's always a named location involved, but a hard reference just points to a thingy. It doesn't know (or care) whether there are any other references to the thingy, or whether any of those references are through variables. Hence, a hard reference can refer to an anonymous thingy. All such anonymous thingies are accessed through hard references. But the converse is not necessarily true - just because something has a hard reference to it doesn't necessarily mean it's anonymous. It might have another reference through a named variable. (It can even have more than one name, if it is aliased with typeglobs.)

To reference a variable, in the terminology of this chapter, is to create a hard reference to the thingy underlying the variable. (There's a special operator to do this creative act.) The hard reference so created is simply a scalar value, which behaves in all familiar contexts just like any other scalar value should. To dereference this scalar value is to use it to refer back to the original thingy, as you must do when reading or writing to the thingy. Both referencing and dereferencing occur only when you invoke certain explicit mechanisms; no implicit referencing or dereferencing occurs in Perl.[ 2 ][ 3 ]

[2] Actually, a function with a prototype can use implicit pass-by-reference if explicitly declared that way. If so, then the caller of the function doesn't need to know he's passing a reference, but you still have to dereference it explicitly within the function. See Chapter 2, The Gory Details .

[3] Actually, to be perfectly honest, there's also some mystical automatic dereferencing when you use certain kinds of filehandles, but that's for backward compatibility, and is transparent to the casual user.

Any scalar may hold a hard reference, and such a reference may point to any data structure. Since arrays and hashes contain scalars, you can build arrays of arrays, arrays of hashes, hashes of arrays, arrays of hashes and functions, and so on.

Keep in mind, though, that Perl arrays and hashes are internally one-dimensional. They can only hold scalar values (strings, numbers, and references). When we use a phrase like "array of arrays", we really mean "array of references to arrays". But since that's the only way to implement an array of arrays in Perl, it follows that the shorter, less accurate phrase is not so inaccurate as to be false, and therefore should not be totally despised, unless you're into that sort of thing.