Using Hard References (Programming Perl)

8.3. Using Hard References

Just as there are numerous ways to create references, there are also several ways to use, or dereference, a reference. There is just one overriding principle: Perl does no implicit referencing or dereferencing.[4] When a scalar is holding a reference, it always behaves like a simple scalar. It doesn't magically start being an array or hash or subroutine; you have to tell it explicitly to do so, by dereferencing it.

[4] We already confessed that this was a small fib. We're not about to do so again.

8.3.1. Using a Variable as a Variable Name

When you encounter a scalar like $foo, you should be thinking "the scalar value of foo." That is, there's a foo entry in the symbol table, and the $ funny character is a way of looking at whatever scalar value might be inside. If what's inside is a reference, you can look inside that (dereferencing $foo) by prepending another funny character. Or looking at it the other way around, you can replace the literal foo in $foo with a scalar variable that points to the actual referent. This is true of any variable type, so not only is $$foo the scalar value of whatever $foo refers to, but @$bar is the array value of whatever $bar refers to, %$glarch is the hash value of whatever $glarch refers to, and so on. The upshot is that you can put an extra funny character on the front of any simple scalar variable to dereference it:

$foo         = "three humps";
$scalarref   = \$foo;         # $scalarref is now a reference to $foo
$camel_model = $$scalarref;   # $camel_model is now "three humps"

Here are some other dereferences:

$bar = $$scalarref;

push(@$arrayref, $filename);
$$arrayref[0] = "January";            # Set the first element of @$arrayref
@$arrayref[4..6] = qw/May June July/; # Set several elements of @$arrayref

%$hashref = (KEY => "RING", BIRD => "SING");  # Initialize whole hash
$$hashref{KEY} = "VALUE";                     # Set one key/value pair
@$hashref{"KEY1","KEY2"} = ("VAL1","VAL2");   # Set two more pairs

&$coderef(1,2,3);

print $handleref "output\n";

This form of dereferencing can only make use of a simple scalar variable (one without a subscript). That is, dereferencing happens before (or binds tighter than) any array or hash lookups. Let's use some braces to clarify what we mean: an expression like $$arrayref[0] is equivalent to ${$arrayref}[0] and means the first element of the array referred to by $arrayref. That is not at all the same as ${$arrayref[0]}, which is dereferencing the first element of the (probably nonexistent) array named @arrayref. Likewise, $$hashref{KEY} is the same as ${$hashref}{KEY}, and has nothing to do with ${$hashref{KEY}}, which would be dereferencing an entry in the (probably nonexistent) hash named %hashref. You will be miserable until you understand this.
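To make the binding rule concrete, here is a minimal sketch. It assumes a scalar $arrayref and an unrelated array @arrayref that merely share a name; both are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: a scalar holding an array reference, plus an unrelated
# array that happens to share the name "arrayref".
my $arrayref = [ "January", "February" ];
my @arrayref = ( \"not the same thing" );

my $first  = $$arrayref[0];      # element 0 of the array $arrayref points to
my $braced = ${$arrayref}[0];    # identical, with the implied braces written out
my $other  = ${ $arrayref[0] };  # dereferences element 0 of @arrayref instead

print "$first\n$braced\n$other\n";
```

The first two lines of output are both January; the third comes from the other variable entirely.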

You can achieve multiple levels of referencing and dereferencing by concatenating the appropriate funny characters. The following prints "howdy":

$refrefref = \\\"howdy";
print $$$$refrefref;

You can think of the dollar signs as operating right to left. But the beginning of the chain must still be a simple, unsubscripted scalar variable. There is, however, a way to get fancier, which we already sneakily used earlier, and which we'll explain in the next section.

8.3.2. Using a BLOCK as a Variable Name

Not only can you dereference a simple variable name, you can also dereference the contents of a BLOCK. Anywhere you'd put an alphanumeric identifier as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type. In other words, the earlier examples could all be disambiguated like this:

$bar = ${$scalarref};
push(@{$arrayref}, $filename);
${$arrayref}[0] = "January";
@{$arrayref}[4..6] = qw/May June July/;
${$hashref}{"KEY"} = "VALUE";
@{$hashref}{"KEY1","KEY2"} = ("VAL1","VAL2");
&{$coderef}(1,2,3);

not to mention:

$refrefref = \\\"howdy";
print ${${${$refrefref}}};

Admittedly, it's silly to use the braces in these simple cases, but the BLOCK can contain any arbitrary expression. In particular, it can contain subscripted expressions. In the following example, $dispatch{$index} is assumed to contain a reference to a subroutine (sometimes called a "coderef"). The example invokes the subroutine with three arguments.

&{ $dispatch{$index} }(1, 2, 3);

Here, the BLOCK is necessary. Without that outer pair of braces, Perl would have treated $dispatch as the coderef instead of $dispatch{$index}.
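A dispatch table of this kind might be set up as follows. The table entries and the value of $index are assumptions made up for this sketch, not part of the original example.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: each value is a coderef.
my %dispatch = (
    sum     => sub { my $total = 0; $total += $_ for @_; return $total },
    product => sub { my $total = 1; $total *= $_ for @_; return $total },
);

my $index  = "sum";
my $result = &{ $dispatch{$index} }(2, 3, 4);       # calls the "sum" entry
my $other  = &{ $dispatch{"product"} }(2, 3, 4);    # calls the "product" entry

print "$index: $result, product: $other\n";    # sum: 9, product: 24
```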

8.3.3. Using the Arrow Operator

For references to arrays, hashes, or subroutines, a third method of dereferencing involves the use of the -> infix operator. This is a form of syntactic sugar that makes it easier to get at individual array or hash elements, or to call a subroutine indirectly.

The type of the dereference is determined by the right operand, that is, by what follows directly after the arrow. If the next thing after the arrow is a bracket or brace, the left operand is treated as a reference to an array or a hash, respectively, to be subscripted by the expression on the right. If the next thing is a left parenthesis, the left operand is treated as a reference to a subroutine, to be called with whatever parameters you supply in the parentheses on the right.

Each of these next trios is equivalent, corresponding to the three notations we've introduced. (We've inserted some spaces to line up equivalent elements.)

$  $arrayref  [2] = "Dorian";         #1
${ $arrayref }[2] = "Dorian";         #2
   $arrayref->[2] = "Dorian";         #3

$  $hashref  {KEY} = "F#major";       #1
${ $hashref }{KEY} = "F#major";       #2
   $hashref->{KEY} = "F#major";       #3

&  $coderef  (Presto => 192);         #1
&{ $coderef }(Presto => 192);         #2
   $coderef->(Presto => 192);         #3

You can see that the initial funny character is missing from the third notation in each trio. The funny character is guessed at by Perl, which is why it can't be used to dereference complete arrays, complete hashes, or slices of either. As long as you stick with scalar values, though, you can use any expression to the left of the ->, including another dereference, because multiple arrow operators associate left to right:

print $array[3]->{"English"}->[0];

You can deduce from this expression that the fourth element of @array is intended to be a hash reference, and the value of the "English" entry in that hash is intended to be an array reference.

Note that $array[3] and $array->[3] are not the same. The first is talking about the fourth element of @array, while the second one is talking about the fourth element of the (possibly anonymous) array whose reference is contained in $array.
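The difference is easy to see when both variables exist at once. In this sketch the contents of @array and $array are invented purely for illustration.

```perl
use strict;
use warnings;

my @array = qw(zero one two three);        # a real array named @array
my $array = [ qw(cero uno dos tres) ];     # an unrelated scalar holding an arrayref

my $from_named = $array[3];     # fourth element of @array
my $from_ref   = $array->[3];   # fourth element of the array $array refers to

print "$from_named $from_ref\n";   # three tres
```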

Suppose now that $array[3] is undefined. The following statement is still legal:

$array[3]->{"English"}->[0] = "January";

This is one of those cases mentioned earlier in which references spring into existence (or "autovivify") when used as an lvalue (that is, when a value is being assigned to it). If $array[3] was undefined, it's automatically defined as a hash reference so that we can set a value for $array[3]->{"English"} in it. Once that's done, $array[3]->{"English"} is automatically defined as an array reference so that we can assign something to the first element in that array. Note that rvalues are a little different: print $array[3]->{"English"}->[0] only defines $array[3] and $array[3]->{"English"}, not $array[3]->{"English"}->[0], since the final element is not an lvalue. (The fact that it defines the first two at all in an rvalue context could be considered a bug. We may fix that someday.)
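Here is a small sketch of both cases, using a second, hypothetical array @other for the rvalue half so that the two experiments don't interfere with each other.

```perl
use strict;
use warnings;

my @array;

# Lvalue use: every missing level is created, including the final element.
$array[3]->{"English"}->[0] = "January";
my $stored = $array[3]->{"English"}->[0];

# Rvalue use on a fresh array: the intermediate levels spring into
# existence, but the final element is not created.
my @other;
my $value    = $other[3]->{"English"}->[0];           # just reading
my $hash_ok  = ref($other[3]);                        # "HASH"
my $array_ok = ref($other[3]->{"English"});           # "ARRAY"
my $count    = scalar @{ $other[3]->{"English"} };    # 0 elements

print "$stored $hash_ok $array_ok $count\n";
```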

The arrow is optional between brackets or braces, or between a closing bracket or brace and a parenthesis for an indirect function call. So you can shrink the previous code down to:

$dispatch{$index}(1, 2, 3);
$array[3]{"English"}[0] = "January";

In the case of ordinary arrays, this gives you multidimensional arrays that are just like C's arrays:

$answer[$x][$y][$z] += 42;

Well, okay, not entirely like C's arrays. For one thing, C doesn't know how to grow its arrays on demand, while Perl does. Also, some constructs that are similar in the two languages parse differently. In Perl, the following two statements do the same thing:

$listref->[2][2] = "hello";    # Pretty clear
$$listref[2][2]  = "hello";    # A bit confusing

The second of these statements may disconcert the C programmer, who is accustomed to using *a[i] to mean "what's pointed to by the ith element of a". But in Perl, the five characters ($ @ * % &) effectively bind more tightly than braces or brackets.[5] Therefore, it is $$listref and not $listref[2] that is taken to be a reference to an array. If you want the C behavior, either you have to write ${$listref[2]} to force the $listref[2] to get evaluated before the leading $ dereferencer, or you have to use the -> notation:

$listref[2]->[$greeting] = "hello";

[5] But not because of operator precedence. The funny characters in Perl are not operators in that sense. Perl's grammar simply prohibits anything more complicated than a simple variable or block from following the initial funny character, for various funny reasons.
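The following sketch pulls these points together: an array that grows on demand, the two equivalent spellings through $listref, and the braces needed to reach through an element of a separate array @listref. The index values and strings are arbitrary choices for illustration.

```perl
use strict;
use warnings;

# The array grows on demand; nothing is declared with fixed dimensions.
my @answer;
my ($x, $y, $z) = (1, 2, 3);
$answer[$x][$y][$z] += 42;

# Two spellings of the same kind of assignment through a reference.
my $listref = [];
$listref->[2][2] = "hello";
$$listref[2][1]  = "howdy";     # parsed as ${$listref}[2][1]

# A separate array named @listref whose element 2 holds an arrayref.
my @listref = ( undef, undef, [] );
${ $listref[2] }[0] = "hi";     # braces force $listref[2] to be evaluated first
$listref[2]->[1]    = "hey";    # same target array, arrow notation

print "$answer[1][2][3] $listref->[2][1] $listref->[2][2] @{ $listref[2] }\n";
```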

8.3.4. Using Object Methods

If a reference happens to be a reference to an object, then the class that defines that object probably provides methods to access the innards of the object, and you should generally stick to those methods if you're merely using the class (as opposed to implementing it). In other words, be nice, and don't treat an object like a regular reference, even though Perl lets you when you really need to. Perl does not enforce encapsulation. We are not totalitarians here. We do expect some basic civility, however.

In return for this civility, you get complete orthogonality between objects and data structures. Any data structure can behave as an object when you want it to. Or not, when you don't.

8.3.5. Pseudohashes

A pseudohash is any reference to an array whose first element is a reference to a hash. You can treat the pseudohash reference as either an array reference (as you would expect) or a hash reference (as you might not expect). Here's an example of a pseudohash:

$john = [ {age => 1, eyes => 2, weight => 3}, 47, "brown", 186 ];

The underlying hash in $john->[0] defines the names ("age", "eyes", "weight") of the array elements that follow (47, "brown", 186). Now you can access an element with both hash and array notations:

$john->{weight}             # Treats $john as a hashref
$john->[3]                  # Treats $john as an arrayref

Pseudohash magic is not deep; it only knows one "trick": how to turn a hash dereference into an array dereference. When adding another element to a pseudohash, you have to explicitly tell the underlying mapping hash where the element will reside before you can use the hash notation:

$john->[0]{height} = 4;     # height is to be element 4
$john->{height} = "tall";   # Or $john->[4] = "tall"
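The one trick can also be written out by hand with ordinary array and hash operations. The sketch below does so with a hypothetical helper named field; it deliberately avoids pseudohash syntax, so it illustrates the lookup rather than relying on the built-in feature.

```perl
use strict;
use warnings;

# An ordinary array laid out like the pseudohash above: element 0 maps
# field names to positions. No pseudohash support is assumed here.
my $john = [ { age => 1, eyes => 2, weight => 3 }, 47, "brown", 186 ];

# Hypothetical helper doing by hand what a pseudohash lookup does.
sub field {
    my ($record, $name) = @_;
    my $position = $record->[0]{$name};
    die "No such field: $name\n" unless defined $position;
    return $record->[$position];
}

my $weight = field($john, "weight");    # same slot as $john->[3]

# Register a new field in the mapping hash before storing its value.
$john->[0]{height} = 4;
$john->[4] = "tall";
my $height = field($john, "height");

print "$weight $height\n";    # 186 tall
```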

Perl raises an exception if you try to delete a key from a pseudohash, although you can always delete keys from the mapping hash. Perl also raises an exception if you try to access a nonexistent key, where "existence" means presence in the mapping hash:

delete $john->[0]{height};  # Deletes from the underlying hash only
$john->{height};            # This now raises an exception
$john->[4];                 # Still prints "tall"

Don't try to splice the array unless you know what you're doing. If the array elements move around, the mapping hash values will still refer to the old element positions, unless you change those explicitly, too. Pseudohash magic is not deep.

To avoid inconsistencies, you can use the fields::phash function provided by the use fields pragma to create a pseudohash:

use fields;
$ph = fields::phash(age => 47, eyes => "brown", weight => 186);
print $ph->{age};

There are two ways to check for the existence of a key in a pseudohash. The first is to use exists, which checks whether the given field has ever been set. It acts this way to match the behavior of a real hash. For instance:

use fields;
$ph = fields::phash([qw(age eyes weight)], [47]);
$ph->{eyes} = undef;

print exists $ph->{age};     # True, 'age' was set in declaration.
print exists $ph->{weight};  # False, 'weight' has not been used.
print exists $ph->{eyes};    # True, your 'eyes' have been touched.

The second way is to use exists on the mapping hash sitting in the first array element. This checks whether the given key is a valid field for that pseudohash:

print exists $ph->[0]{age};   # True, 'age' is a valid field
print exists $ph->[0]{name};  # False, 'name' can't be used

Unlike what happens in a real hash, calling delete on a pseudohash element deletes only the array value corresponding to the key, not the real key in the mapping hash. To delete the key, you have to explicitly delete it from the mapping hash. Once you do that, you may no longer use that key name as a pseudohash subscript:

print delete $ph->{age};     # Removes and returns $ph->[1], 47
print exists $ph->{age};     # Now false
print exists $ph->[0]{age};  # True, 'age' key still usable
print delete $ph->[0]{age};  # Now 'age' key is gone
print $ph->{age};            # Run-time exception

You've probably begun to wonder what could possibly have motivated this masquerade of arrays prancing about in hashes' clothing. Arrays provide faster lookups and more efficient storage, while hashes offer the convenience of naming (instead of numbering) your data; pseudohashes provide the best of both worlds. But it's not until you consider Perl's compilation phase that the greatest benefit becomes apparent. With the help of a pragma or two, the compiler can verify proper access to valid fields, so you can find out about nonexistent subscripts (or spelling errors) before your program starts to run.

Pseudohashes' properties of speed, efficiency, and compile-time access checking (you might even think of it as type safety) are especially handy for creating efficient and robust class modules. See the discussion of the use fields pragma in Chapter 12, "Objects" and Chapter 31, "Pragmatic Modules".

Pseudohashes are a new and relatively experimental feature; as such, the underlying implementation may well change in the future. To protect yourself from such changes, always go through the fields module's documented interface via its phash and new functions.

8.3.6. Other Tricks You Can Do with Hard References

As mentioned earlier, the backslash operator is usually used on a single referent to generate a single reference, but it doesn't have to be. When used on a list of referents, it produces a list of corresponding references. The second line of the following example does the same thing as the first line, since the backslash is automatically distributed throughout the whole list.

@reflist = (\$s, \@a, \%h, \&f);     # List of four references
@reflist = \($s,  @a,  %h,  &f);     # Same thing

If a parenthesized list contains exactly one array or hash, then all of its values are interpolated and references to each returned:

@reflist = \(@x);                    # Interpolate array, then get refs
@reflist = map { \$_ } @x;           # Same thing

This also occurs when there are internal parentheses:

@reflist = \(@x, (@y));              # But only single aggregates expand
@reflist = (\@x, map { \$_ } @y);    # Same thing

If you try this with a hash, the result will contain references to the values (as you'd expect), but references to copies of the keys (as you might not expect).

Since array and hash slices are really just lists, you can backslash a slice of either of these to get a list of references. Each of the next three lines does exactly the same thing:

@envrefs = \@ENV{'HOME', 'TERM'};         # Backslashing a slice
@envrefs = \( $ENV{HOME},  $ENV{TERM} );  # Backslashing a list
@envrefs = ( \$ENV{HOME}, \$ENV{TERM} );  # A list of two references
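One way to convince yourself that these forms really produce references to the original elements is to write through them. This sketch uses a hypothetical hash %env as a stand-in for %ENV so that the real environment is left alone.

```perl
use strict;
use warnings;

my @x = (1, 2, 3);

my @by_backslash = \(@x);             # one reference per element
my @by_map       = map { \$_ } @x;    # same thing, spelled out

# References compare equal numerically when they point at the same referent.
my $same = @by_backslash == @by_map
        && !grep { $by_backslash[$_] != $by_map[$_] } 0 .. $#x;

${ $by_backslash[0] } = 10;           # writes through to $x[0]

# Backslashing a hash slice yields references to the selected values.
my %env = (HOME => "/home/camel", TERM => "vt100");   # hypothetical stand-in for %ENV
my @envrefs = \@env{'HOME', 'TERM'};
${ $envrefs[1] } = "xterm";           # writes through to $env{TERM}

print "@x $env{TERM} ", ($same ? "same" : "different"), "\n";
```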

Since functions can return lists, you can apply a backslash to them. If you have more than one function to call, first interpolate each function's return values into a larger list and then backslash the whole thing:

@reflist = \fx();
@reflist = map { \$_ } fx();                # Same thing

@reflist = \( fx(), fy(), fz() );
@reflist = ( \fx(), \fy(), \fz() );         # Same thing
@reflist = map { \$_ } fx(), fy(), fz();    # Same thing

The backslash operator always supplies a list context to its operand, so those functions are all called in list context. If the backslash is itself in scalar context, you'll end up with a reference to the last value of the list returned by the function:

@reflist = \localtime();      # Ref to each of nine time elements
$lastref = \localtime();      # Ref to whether it's daylight savings time

In this regard, the backslash behaves like the named Perl list operators, such as print, reverse, and sort, which always supply a list context on their right no matter what might be happening on their left. As with named list operators, use an explicit scalar to force what follows into scalar context:

$dateref = \scalar localtime();    # \"Sat Jul 16 11:42:18 2000"
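The same behavior can be observed without depending on the clock. The two helper subroutines below, context and three, are hypothetical and exist only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper that reports the context it was called in.
sub context { return wantarray ? "list" : "scalar" }

# Hypothetical helper that returns a three-element list.
sub three { return (10, 20, 30) }

my @refs    = \three();            # three references, one per value
my $lastref = \three();            # reference to the last value only
my $listctx = \context();          # the function still sees list context
my $forced  = \scalar context();   # scalar() overrides that

print scalar(@refs), " $$lastref $$listctx $$forced\n";
```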

You can use the ref operator to determine what a reference is pointing to. Think of ref as a "typeof" operator that returns true if its argument is a reference and false otherwise. The value returned depends on the type of thing referenced. Built-in types include SCALAR, ARRAY, HASH, CODE, GLOB, REF, LVALUE, IO, IO::Handle, and Regexp. Here, we use it to check subroutine arguments:

sub sum {
    my $arrayref = shift;
    warn "Not an array reference" if ref($arrayref) ne "ARRAY";
    return eval join("+", @$arrayref);
}

If you use a hard reference in a string context, it'll be converted to a string containing both the type and the address: SCALAR(0x1fc0e). (The reverse conversion cannot be done, since reference count information is lost during stringification--and also because it would be dangerous to let programs access a memory address named by an arbitrary string.)

You can use the bless operator to associate a referent with a package functioning as an object class. When you do this, ref returns the class name instead of the internal type. An object reference used in a string context returns a string with the external and internal types, and the address in memory: MyType=HASH(0x20d10) or IO::Handle=IO(0x186904). See Chapter 12, "Objects" for more details about objects.
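A quick sketch of what ref and stringification report for a few kinds of referent. MyType here is just a hypothetical class name, and the addresses will differ from run to run.

```perl
use strict;
use warnings;

my %type_of = (
    scalar => ref(\"howdy"),
    array  => ref([]),
    hash   => ref({}),
    code   => ref(sub { }),
    refref => ref(\\"howdy"),
    regexp => ref(qr/camel/),
    plain  => ref("not a reference"),   # empty string, which is false
);

my $plain_text  = "" . [];               # something like ARRAY(0x1fc0e)
my $object      = bless {}, "MyType";    # MyType is a hypothetical class name
my $class       = ref($object);          # the class name, not HASH
my $object_text = "$object";             # something like MyType=HASH(0x20d10)

print "$type_of{$_}\n" for qw(scalar array hash code refref regexp);
print "$class\n$plain_text\n$object_text\n";
```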

Since the way in which you dereference something always indicates what sort of referent you're looking for, a typeglob can be used the same way a reference can, despite the fact that a typeglob contains multiple referents of various types. So ${*main::foo} and ${\$main::foo} both access the same scalar variable, although the latter is more efficient.

Here's a trick for interpolating the return value of a subroutine call into a string:

print "My sub returned @{[ mysub(1,2,3) ]} that time.\n";

It works like this. At compile time, when the @{...} is seen within the double-quoted string, it's parsed as a block that returns a reference. Within the block, there are square brackets that create a reference to an anonymous array from whatever is in the brackets. So at run time, mysub(1,2,3) is called in list context, and the results are loaded into an anonymous array, a reference to which is then returned within the block. That array reference is then immediately dereferenced by the surrounding @{...}, and the array value is interpolated into the double-quoted string just as an ordinary array would be. This chicanery is also useful for arbitrary expressions, such as:

print "We need @{ [$n + 5] } widgets!\n";

Be careful though: square brackets supply a list context to their expression. In this case it doesn't matter, although the earlier call to mysub might care. When it does matter, use an explicit scalar to force the context:

print "mysub returns @{ [scalar mysub(1,2,3)] } now.\n";
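To see the effect of context here, suppose mysub is a hypothetical subroutine that returns its arguments in list context and their count in scalar context:

```perl
use strict;
use warnings;

# Hypothetical subroutine whose return value depends on calling context.
sub mysub { return wantarray ? @_ : scalar @_ }

my $n = 7;
my $list_form   = "My sub returned @{[ mysub(1,2,3) ]} that time.";
my $expr_form   = "We need @{[ $n + 5 ]} widgets!";
my $scalar_form = "mysub returns @{[ scalar mysub(1,2,3) ]} now.";

print "$list_form\n$expr_form\n$scalar_form\n";
```

The list form interpolates the three values separated by spaces, while the scalar form interpolates only the count.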

8.3.7. Closures

Earlier we talked about creating anonymous subroutines with a nameless sub {}. You can think of those subroutines as defined at run time, which means that they have a time of generation as well as a location of definition. Some variables might be in scope when the subroutine is created, and different variables might be in scope when the subroutine is called.

Forgetting about subroutines for a moment, consider a reference that refers to a lexical variable:

{
    my $critter = "camel";
    $critterref = \$critter;
}

The value of $$critterref will remain "camel" even though $critter disappears after the closing curly brace. But $critterref could just as well have referred to a subroutine that refers to $critter:

{
    my $critter = "camel";
    $critterref = sub { return $critter };
}

This is a closure, which is a notion out of the functional programming world of LISP and Scheme.[6] It means that when you define an anonymous function in a particular lexical scope at a particular moment, it pretends to run in that scope even when later called from outside that scope. (A purist would say it doesn't have to pretend--it actually does run in that scope.)

[6] In this context, the word "functional" should not be construed as an antonym of "dysfunctional".

In other words, you are guaranteed to get the same copy of a lexical variable each time, even if other instances of that lexical variable have been created before or since for other instances of that closure. This gives you a way to set values used in a subroutine when you define it, not just when you call it.
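A counter is perhaps the smallest demonstration of this. The generator make_counter below is a hypothetical name used for this sketch; each call gives its closure a private $count.

```perl
use strict;
use warnings;

# Hypothetical generator: each call creates a fresh $count for its closure.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $from_ten = make_counter(10);
my $from_one = make_counter(1);

# Each closure keeps advancing its own copy of $count.
my @seen = ( $from_ten->(), $from_ten->(), $from_one->(), $from_ten->() );
print "@seen\n";    # 10 11 1 12
```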

You can also think of closures as a way to write a subroutine template without using eval. The lexical variables act as parameters for filling in the template, which is useful for setting up little bits of code to run later. These are commonly called callbacks in event-based programming, where you associate a bit of code with a keypress, mouse click, window exposure, and so on. When used as callbacks, closures do exactly what you expect, even if you don't know the first thing about functional programming. (Note that this closure business only applies to my variables. Global variables work as they've always worked, since they're neither created nor destroyed the way lexical variables are.)

Another use for closures is within function generators; that is, functions that create and return brand new functions. Here's an example of a function generator implemented with closures:

sub make_saying  {
    my $salute = shift;
    my $newfunc = sub {
        my $target = shift;
        print "$salute, $target!\n";
    };
    return $newfunc;            # Return a closure
}

$f = make_saying("Howdy");      # Create a closure
$g = make_saying("Greetings");  # Create another closure

# Time passes...

$f->("world");
$g->("earthlings");

This prints:

Howdy, world!
Greetings, earthlings!

Note in particular how $salute continues to refer to the actual value passed into make_saying, despite the fact that the my $salute has gone out of scope by the time the anonymous subroutine runs. That's what closures are all about. Since $f and $g hold references to functions that, when called, still need access to the distinct versions of $salute, those versions automatically stick around. If you now overwrite $f, its version of $salute would automatically disappear. (Perl only cleans up when you're not looking.)

Perl doesn't provide references to object methods (described in Chapter 12, "Objects") but you can get a similar effect using a closure. Suppose you want a reference not just to the subroutine the method represents, but one which, when invoked, would call that method on a particular object. You can conveniently remember both the object and the method as lexical variables bound up inside a closure:

sub get_method_ref {
    my ($self, $methodname) = @_;
    my $methref = sub {
        # the @_ below is not the same as the one above!
        return $self->$methodname(@_);
    };
    return $methref;
}

my $dog = new Doggie::
            Name => "Lucky",
            Legs => 3,
            Tail => "clipped";

our $wagger = get_method_ref($dog, 'wag');
$wagger->("tail");        # Calls $dog->wag('tail').

Not only can you get Lucky to wag what's left of his tail now, even once the lexical $dog variable has gone out of scope and Lucky is nowhere to be seen, the global $wagger variable can still get him to wag his tail, wherever he is.
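For a version you can actually run, you need a Doggie class. The minimal one below is an assumption invented for this sketch (its wag method simply returns a string), and the object is built with the Doggie->new(...) form of the constructor call.

```perl
use strict;
use warnings;

# Hypothetical minimal class, defined only so the example can run.
{
    package Doggie;
    sub new {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }
    sub wag {
        my ($self, $part) = @_;
        return "$self->{Name} wags his $part";
    }
}

sub get_method_ref {
    my ($self, $methodname) = @_;
    return sub { return $self->$methodname(@_) };
}

our $wagger;
{
    my $dog = Doggie->new(Name => "Lucky", Legs => 3, Tail => "clipped");
    $wagger = get_method_ref($dog, 'wag');
}   # $dog is out of scope here, but the closure keeps the object alive

my $report = $wagger->("tail");
print "$report\n";    # Lucky wags his tail
```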

8.3.7.1. Closures as function templates

Using a closure as a function template allows you to generate many functions that act similarly. Suppose you want a suite of functions that generate HTML font changes for various colors:

print "Be ", red("careful"), " with that ", green("light"), "!!!";

The red and green functions would be very similar. We'd like to name our functions, but closures don't have names since they're just anonymous subroutines with an attitude. To get around that, we'll perform the cute trick of naming our anonymous subroutines. You can bind a coderef to an existing name by assigning it to a typeglob of the name of the function you want. (See Section 10.1, "Symbol Tables", in Chapter 10, "Packages".) In this case, we'll bind it to two different names, one uppercase and one lowercase:

@colors = qw(red blue green yellow orange purple violet);
for my $name (@colors) {
    no strict 'refs';       # Allow symbolic references
    *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
}

Now you can call functions named red, RED, blue, BLUE, and so on, and the appropriate closure will be invoked. This technique reduces compile time and conserves memory, and is less error-prone as well, since syntax checks happen during compilation. It's critical that any variables in the anonymous subroutine be lexicals in order to create a closure. That's the reason for the my above.

This is one of the few places where giving a prototype to a closure makes sense. If you wanted to impose scalar context on the arguments of these functions (probably not a wise idea for this example), you could have written it this way instead:

*$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };

That's almost good enough. However, since prototype checking happens during compile time, the run-time assignment above happens too late to be of much use. You could fix this by putting the whole loop of assignments within a BEGIN block, forcing it to occur during compilation. (More likely, you'd put it out in a module that you use at compile time.) Then the prototypes will be visible during the rest of the compilation.
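Here is a sketch of the BEGIN approach, with a shortened, arbitrary list of colors. Because the assignments run during compilation, the ($) prototype is already in force when the calls further down are compiled.

```perl
use strict;
use warnings;

BEGIN {
    for my $name (qw(red green blue)) {
        no strict 'refs';       # allow the symbolic glob reference *$name
        *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
    }
}

my @words = qw(be very careful);

# With the ($) prototype visible at compile time, the array argument is
# evaluated in scalar context, so the function receives the element count.
my $counted = red(@words);
my $single  = green("light");

print "$counted\n$single\n";
```

The first line of output shows 3 between the tags rather than the words, which is exactly why imposing scalar context is probably not a wise idea for this example.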

8.3.7.2. Nested subroutines

If you are accustomed (from other programming languages) to using subroutines nested within other subroutines, each with their own private variables, you'll have to work at it a bit in Perl. Named subroutines do not nest properly, although anonymous ones do.[7] Anyway, we can emulate nested, lexically scoped subroutines using closures. Here's an example:

sub outer {
    my $x = $_[0] + 35;
    local *inner = sub { return $x * 19 };
    return $x + inner();
}

Now inner can only be called from within outer, because of the temporary assignments of the closure. But when it is, it has normal access to the lexical variable $x from the scope of outer.

[7] To be more precise, globally named subroutines don't nest. Unfortunately, that's the only kind of named subroutine declaration we have. We haven't yet implemented lexically scoped, named subroutines (known as my subs), but when we do, they should nest correctly.

This has the interesting effect of creating a function local to another function, something not normally supported in Perl. Because local is dynamically scoped, and because function names are global to their package, any other function that outer called could also call the temporary version of inner. To prevent that, you'd need an extra level of indirection:

sub outer {
    my $x = $_[0] + 35;
    my $inner = sub { return $x * 19 };
    return $x + $inner->();
}
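To compare the two approaches side by side, the sketch below renames the two versions outer_local and outer_lexical and adds a hypothetical helper that tries to call inner. All three names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper that outer_local happens to call.
sub helper { return defined(&inner) ? inner() : "inner is not defined" }

sub outer_local {
    my $x = $_[0] + 35;
    local *inner = sub { return $x * 19 };
    return helper();             # helper can see the temporary inner
}

sub outer_lexical {
    my $x = $_[0] + 35;
    my $inner = sub { return $x * 19 };
    return $x + $inner->();      # only this scope holds $inner
}

my $leaked  = outer_local(1);    # 684, computed by helper calling inner
my $outside = helper();          # inner is gone once outer_local returns
my $total   = outer_lexical(1);  # 36 + 684 = 720

print "$leaked\n$outside\n$total\n";
```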