4.3 Using Hard References

Just as there are numerous ways to create references, there are also several ways to use, or dereference, a reference.

Using a Variable as a Variable Name

Anywhere you might ordinarily put an alphanumeric identifier as part of a variable or subroutine name, you can just replace the identifier with a simple scalar variable containing a reference of the correct type. For example:

$foo         = "two humps";
$scalarref   = \$foo;
$camel_model = $$scalarref;  # $camel_model is now "two humps"

Here are various dereferences:

$bar = $$scalarref;
push(@$arrayref, $filename);
$$arrayref[0] = "January";
$$hashref{"KEY"} = "VALUE";
&$coderef(1,2,3);
print $globref "output\n";

It's important to understand that we are specifically not dereferencing $arrayref[0] or $hashref{"KEY"} there. The dereferencing of the scalar variable happens before any array or hash lookups. To dereference anything more complicated than a simple scalar variable, you must use one of the two methods described next. However, a "simple scalar" can itself be a dereference written with this first method, applied recursively. Therefore, the following prints "howdy":

$refrefref = \\\"howdy";
print $$$$refrefref;

You can think of the dollar signs as executing right to left.
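To watch the dollar signs peel off one level at a time, here is a minimal sketch that builds the same kind of chain in separate steps. The variable names ($string, $ref, $refref, $level1, $level2, $value) are ours, chosen only for illustration.

```perl
use strict;
use warnings;

# Build the chain one level at a time instead of writing \\\"howdy".
my $string    = "howdy";
my $ref       = \$string;     # reference to a scalar
my $refref    = \$ref;        # reference to a reference
my $refrefref = \$refref;     # reference to a reference to a reference

# Each extra $ peels off one more level, starting next to the name.
my $level2 = $$refrefref;      # the same reference that $refref holds
my $level1 = $$$refrefref;     # the same reference that $ref holds
my $value  = $$$$refrefref;    # the string itself

print ref($refrefref), " ", ref($level2), " ", ref($level1), "\n";  # REF REF SCALAR
print "$value\n";                                                   # howdy
```

The rightmost dollar sign belongs to $refrefref itself, and each one to its left dereferences whatever the part to its right produced.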

Using a BLOCK as a Variable Name

The second way is just like the first, except using a BLOCK instead of a variable. Anywhere you'd put an alphanumeric identifier as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type. In other words, the previous examples could also be handled like this:

$bar = ${$scalarref};
push(@{$arrayref}, $filename);
${$arrayref}[0] = "January";
${$hashref}{"KEY"} = "VALUE";
&{$coderef}(1,2,3);

Admittedly, it's silly to use the braces in these simple cases, but the BLOCK can contain any arbitrary expression. In particular, it can contain subscripted expressions. In the following example, $dispatch{$index} is assumed to contain a reference to a subroutine. The example invokes the subroutine with three arguments.

&{ $dispatch{$index} }(1, 2, 3);
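For the curious, a dispatch table of that sort might be set up as follows. This is only a sketch: the keys "sum" and "product" and the bodies of the subroutines are our own invention, not anything the example above requires.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: names mapped to anonymous subroutines.
my %dispatch = (
    sum     => sub { my $total = 0; $total += $_ for @_; return $total },
    product => sub { my $total = 1; $total *= $_ for @_; return $total },
);

# The BLOCK holds a subscripted expression that yields a code reference.
my $index = "sum";
my $sum = &{ $dispatch{$index} }(2, 3, 4);

$index = "product";
my $product = &{ $dispatch{$index} }(2, 3, 4);

print "sum=$sum product=$product\n";   # sum=9 product=24
```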

Using the Arrow Operator

For references to arrays or hashes, a third method of dereferencing the reference involves the use of the -> infix operator. This is a form of syntactic sugar that makes it easier to get at individual array or hash elements, especially when the reference expression is complicated. Each of these trios is equivalent, corresponding to the three notations we've introduced. (We've inserted some spaces to line up equivalent elements.)

$  $arrayref  [0] = "January";        #1
${ $arrayref }[0] = "January";        #2
   $arrayref->[0] = "January";        #3
$  $hashref  {KEY} = "F#major";       #1
${ $hashref }{KEY} = "F#major";       #2
   $hashref->{KEY} = "F#major";       #3

You can see from this example that the first $ is missing from the third notation. It is, however, implied, and since it is implied, the notation can only be used to reference scalar values, not slices. But just as with the second notation, you can use any expression to the left of the ->, including another dereference, because arrow operators associate left to right:

print $array[3]->{"English"}->[0];

Note that $array[3] and $array->[3] are not the same. The first is talking about the fourth element of @array, while the second one is talking about the fourth element of the (possibly anonymous) array whose reference is contained in $array.
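A short sketch makes the difference concrete. Here @array and $array are two unrelated variables that merely share a name, and the element values are made up for the demonstration.

```perl
use strict;
use warnings;

my @array = ("a0", "a1", "a2", "named array, element 3");
my $array = ["r0", "r1", "r2", "referenced array, element 3"];

my $from_named     = $array[3];     # fourth element of @array
my $from_reference = $array->[3];   # fourth element of the array $array refers to

print "$from_named\n";       # named array, element 3
print "$from_reference\n";   # referenced array, element 3
```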

Suppose now that $array[3] is undefined. The following statement is still legal:

$array[3]->{"English"}->[0] = "January";

This is one of those cases mentioned earlier in which references spring into existence when used in an lvalue context. Supposing $array[3] to have been undefined, it's automatically defined with a hash reference so that we can look up {"English"} in it. Once that's done, $array[3]->{"English"} will automatically get defined with an array reference so that we can look up [0] in it. But the element itself is created only when you assign to it. If you were just trying to print out the value, the final element would not spring into existence, and you'd just get the undefined value out of it. (Be aware that the intermediate references needed to reach that element are still created along the way, even when you are only reading.)
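The following sketch shows the references springing into existence on assignment. The "French" key is hypothetical, added only to show that reading a missing element of an existing hash hands back the undefined value without creating that element.

```perl
use strict;
use warnings;

my @array;

my $before = defined($array[3]) ? "defined" : "undefined";

# Assigning through the chain creates the hash and the array on the way.
$array[3]->{"English"}->[0] = "January";

my $outer_type = ref($array[3]);                 # HASH
my $inner_type = ref($array[3]->{"English"});    # ARRAY

# Reading a missing element returns undef and does not create the key.
my $missing = $array[3]->{"French"};
my $created = exists($array[3]->{"French"}) ? "yes" : "no";

print "$before $outer_type $inner_type\n";                          # undefined HASH ARRAY
print defined($missing) ? "defined" : "undefined", " $created\n";   # undefined no
```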

One more shortcut here. The arrow is optional between brace- or bracket-enclosed subscripts, so you can shrink the above code down to:

$array[3]{"English"}[0] = "January";

Which, in the case of ordinary arrays, gives you multi-dimensional arrays just like C's arrays:

$answer[$x][$y][$z] += 42;

Well, okay, not entirely like C's arrays. For one thing, C doesn't know how to grow its arrays on demand, while Perl does. Also, there are similar constructs in the two languages that parse differently. In Perl, the following two statements do the same thing:

$listref->[2][2] = "hello";    # pretty clear
$$listref[2][2] = "hello";     # a bit confusing

The second of these statements may disconcert the C programmer, who is accustomed to using *a[i] to mean "what's pointed to by the ith element of a". But in Perl, the five prefix dereferencers ($ @ * % &) effectively bind more tightly than the subscripting braces or brackets.[5] Therefore, it is $$listref and not $listref[$i] that is taken to be a reference to an array. If you want the C notion, you either have to write ${$listref[$i]} to force the $listref[$i] to get evaluated before the leading $ dereferencer, or you have to use the -> notation:

$listref[$i]->[$j] = "hello";

[5] But not because of operator precedence. The funny characters in Perl are not operators in that sense. The grammar simply prohibits anything more complicated than a simple variable or block from following the initial funny character, for various funny reasons.
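Here is a small sketch that puts the two readings side by side. It assumes both a scalar $listref (holding a reference to an array of arrays) and an unrelated named array @listref (holding array references). Both are set up here purely for illustration, as are the strings stored in them.

```perl
use strict;
use warnings;

my $listref = [ [], [], [] ];   # scalar holding a reference to an array of arrays
my @listref = ( [], [], [] );   # unrelated named array of array references

# Perl's reading: $$listref is dereferenced first, then subscripted twice.
$listref->[2][2] = "via the scalar";
my $same_element = $$listref[2][2];

# The C notion: $listref[$i] is looked up in @listref first, then dereferenced.
my ($i, $j) = (1, 0);
$listref[$i]->[$j] = "via the named array";
my $block_form = ${ $listref[$i] }[$j];

print "$same_element\n";   # via the scalar
print "$block_form\n";     # via the named array
```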

Using Object Methods

If a reference happens to be a reference to an object (a blessed thingy, that is), then there are probably methods to access the innards of the object, and you should probably stick to those methods unless you're writing the class package that defines the object's methods. (Such a package is allowed to treat the object as a mere thingy when it wants to.) In other words, be nice, and don't violate the object's encapsulation without a very good reason. Perl does not enforce encapsulation. We are not totalitarians here. We do expect some basic civility, however.

Other Tricks You Can Do with Hard References

You can use the ref operator to determine what type of thingy a reference is pointing to. Think of ref as a "typeof" operator that returns true if its argument is a reference and false otherwise. The value returned depends on the type of thing referenced. Built-in types include:

REF
SCALAR
ARRAY
HASH
CODE
GLOB
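A quick way to see these strings for yourself is to take one reference of each kind and hand it to ref. The hash %type_of and its keys are hypothetical names for this sketch only.

```perl
use strict;
use warnings;

my $scalar = "text";

my %type_of = (
    scalar_ref => ref(\$scalar),
    ref_ref    => ref(\\$scalar),
    array_ref  => ref([1, 2, 3]),
    hash_ref   => ref({ key => "value" }),
    code_ref   => ref(sub { 1 }),
    glob_ref   => ref(\*STDOUT),
    plain      => ref($scalar),   # not a reference: empty string, which is false
);

for my $name (sort keys %type_of) {
    printf "%-10s '%s'\n", $name, $type_of{$name};
}
```

Because the empty string is false and every type name is true, the same call serves both as a test for "is this a reference?" and as a way to find out what kind.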

If you simply use a hard reference in a string context, it'll be converted to a string containing both the type and the address: SCALAR(0x1fc0e). (The reverse conversion cannot be done, since reference count information has been lost.)

You can use the bless operator to associate a referenced thingy with a package functioning as an object class. When you do this, ref will return that package name instead of the internal type. An object reference used in a string context returns a string with both the external and internal types, along with the address: MyType=HASH(0x20d10). See Chapter 5, Packages, Modules, and Object Classes for more details about objects.
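As a rough sketch of what bless does to ref and to the string form, using MyType as a made-up class name with no methods of its own:

```perl
use strict;
use warnings;

my $thingy = { name => "Fido" };
my $plain_type = ref($thingy);     # HASH

bless $thingy, "MyType";           # MyType is a hypothetical class name
my $class_type = ref($thingy);     # MyType
my $as_string  = "$thingy";        # MyType=HASH(0x...), where the address varies

print "$plain_type $class_type\n";   # HASH MyType
print "$as_string\n";
```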

Since the dereference syntax always indicates the kind of reference desired, a typeglob can be used the same way a reference can, despite the fact that a typeglob contains multiple thingies of various types. So ${*foo} and ${\$foo} both refer to the same scalar variable. The latter is more efficient though.
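A minimal sketch of that equivalence follows, assuming a package variable $foo declared with our (a typeglob only knows about package variables, not my variables).

```perl
use strict;
use warnings;

our $foo = "package variable";

my $via_glob = ${*foo};     # the scalar slot of the typeglob *foo
my $via_ref  = ${\$foo};    # dereference of a reference to $foo

# Assigning through the typeglob changes the very same scalar.
${*foo} = "changed through the glob";

print "$via_glob\n";   # package variable
print "$via_ref\n";    # package variable
print "$foo\n";        # changed through the glob
```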

Here's a trick for interpolating the value of a subroutine call into a string:

print "My sub returned @{[ mysub(1,2,3) ]} that time.\n";

It works like this. At compile time, when the @{...} is seen within the double-quoted string, it's parsed as a block that will return a reference. Within the block, there are square brackets that will create a reference to an anonymous array from whatever is in the brackets. So at run-time, mysub(1,2,3) is called, and the results are loaded into an anonymous array, a reference to which is then returned within the block. That array reference is then immediately dereferenced by the surrounding @{...}, and the array value is interpolated into the double-quoted string just as an ordinary array would be. This chicanery is also useful for arbitrary expressions, such as:

print "That yields @{[ $n + 5 ]} widgets\n";

Be careful though. The inside of the square brackets is supplying a list context to its expression. In this case it doesn't matter, although it's possible that the above call to mysub() might care. When it does matter, a similar trick can be done with a scalar reference. It just isn't quite as pretty:

print "That yields ${ \($n + 5) } widgets.";
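To see the difference in context, here is a sketch with a stand-in mysub() that simply reports how it was called, using wantarray. That body is our assumption. Note also that the scalar version spells out scalar() explicitly, which is our addition to make the context unambiguous for a subroutine call.

```perl
use strict;
use warnings;

# Hypothetical context-sensitive subroutine: reports its calling context.
sub mysub {
    return wantarray ? "list" : "scalar";
}

# The anonymous array constructor supplies a list context.
my $list_version = "Called in @{[ mysub(1,2,3) ]} context";

# An explicit scalar() inside the scalar reference trick forces scalar context.
my $scalar_version = "Called in ${ \( scalar(mysub(1,2,3)) ) } context";

print "$list_version\n";     # Called in list context
print "$scalar_version\n";   # Called in scalar context
```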

Closures

Earlier we talked about creating anonymous subroutines with a nameless sub {}. Since anonymous subroutines have to be generated someplace within your code (in order to generate the reference that you poke into some variable), such routines can be thought of as coming into existence at run-time. (That is, they have a time of generation as well as a location of definition.) Because of this fact, anonymous subroutines can act as closures with respect to my variables--that is, with respect to variables visible lexically within the current scope. Closure is a notion out of the Lisp world that says if you define an anonymous function in a particular lexical context at a particular moment, it pretends to run in that context even when it's called outside of the context. In other words, you are guaranteed to get the same copy of a lexical variable, even though many other instances of the same lexical variable may have been created before or since. This gives you a way to pass arguments to a subroutine when you define it as well as when you call it. It's useful for setting up little bits of code to run later, such as callbacks.

You can also think of closures as a way to write a subroutine template without using eval. The lexical variables are like parameters to fill in the template.

Here's a small example of how closures work:

sub newprint {
    my $x = shift;
    return sub { my $y = shift; print "$x, $y!\n"; };
}
$h = newprint("Howdy");
$g = newprint("Greetings");
# Time passes...
&$h("world");
&$g("earthlings");

This prints:

Howdy, world!
Greetings, earthlings!

Note in particular how $x continues to refer to the value passed into newprint() despite the fact that the my $x has seemingly gone out of scope by the time the anonymous subroutine runs. That's what closures are all about.
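The point about each closure getting its own copy of the lexical shows up even more clearly when the closure modifies the variable. The following counter generator is a sketch of our own (make_counter and its variables are hypothetical names), not part of the newprint() example.

```perl
use strict;
use warnings;

# Each call to make_counter creates a fresh $count for the closure to keep.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $from_one = make_counter(1);
my $from_ten = make_counter(10);

# The two closures advance independently of each other.
my @seen = (&$from_one(), &$from_one(), &$from_ten(), &$from_one(), &$from_ten());

print "@seen\n";   # 1 2 10 3 11
```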

This method only applies to my variables. Global variables work as they always worked (since they're neither created nor destroyed the way lexical variables are). By and large, closures are not something you need to trouble yourself about. When you do need them, they just sorta do what you expect.[6]

[6] Always presuming you expect the right thing, of course.
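The notion of a closure as a subroutine template, mentioned earlier, might be sketched like this. The make_tagger routine and the tag names are invented for the illustration. The lexical $tag is the parameter that fills in the template.

```perl
use strict;
use warnings;

# One template, parameterized by the lexical $tag.
sub make_tagger {
    my $tag = shift;
    return sub { return "<$tag>@_</$tag>" };
}

my $bold   = make_tagger("b");
my $italic = make_tagger("i");

my $html = &$bold("very") . " " . &$italic("nice");

print "$html\n";   # <b>very</b> <i>nice</i>
```

No eval is involved: both subroutines share the same compiled code and differ only in which $tag they captured.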

Perl doesn't provide member pointers like C++ does, but you can get a similar effect using a closure. Suppose you want a pointer to a method for a particular object. You can remember both the object and the method as lexical variables bound to a closure:

sub get_method_ref {
    my ($self, $method) = @_;
    return sub { return $self->$method(@_) };
}
$dog_wag = get_method_ref($dog, 'wag');
&$dog_wag("tail");  # Calls $dog->wag('tail').
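To try this out you need some object to call methods on. Here is a self-contained sketch with a throwaway Dog class. The class, its constructor, and what wag returns are all assumptions made up for the demonstration.

```perl
use strict;
use warnings;

package Dog;   # hypothetical class, for illustration only

sub new { my ($class, $name) = @_; return bless { name => $name }, $class }
sub wag { my ($self, $part) = @_; return "$self->{name} wags its $part" }

package main;

# Remember both the object and the method name in a closure.
sub get_method_ref {
    my ($self, $method) = @_;
    return sub { return $self->$method(@_) };
}

my $dog     = Dog->new("Fido");
my $dog_wag = get_method_ref($dog, 'wag');
my $result  = &$dog_wag("tail");    # same as $dog->wag("tail")

print "$result\n";   # Fido wags its tail
```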