Creating References (Programming Perl)

8.2. Creating References

There are several ways to create references, most of which we will describe before explaining how to use (dereference) the resulting references.

8.2.1. The Backslash Operator

You can create a reference to any named variable or subroutine with a backslash. (You may also use it on an anonymous scalar value like 7 or "camel", although you won't often need to.) This operator works like the & (address-of) operator in C--at least at first glance.

Here are some examples:

$scalarref = \$foo;
$constref  = \186_282.42;
$arrayref  = \@ARGV;
$hashref   = \%ENV;
$coderef   = \&handler;
$globref   = \*STDOUT;

The backslash operator can do more than produce a single reference. It will generate a whole list of references if applied to a list. See Section 8.3.6, "Other Tricks You Can Do with Hard References" for details.
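As a rough illustration of the list behavior just mentioned, the following sketch applies the backslash to a parenthesized list. The variable names ($x, $y, @array, @scalar_refs, @element_refs) are made up for this example, and the ${ ... } dereferencing it relies on is explained later in the chapter.

```perl
use strict;
use warnings;

my ($x, $y) = (1, 2);
my @array   = (10, 20, 30);

# Backslash on a parenthesized list yields one reference per item.
my @scalar_refs = \($x, $y);        # same as (\$x, \$y)

# \(@array) returns references to the array's elements,
# not a single reference to the array itself.
my @element_refs = \(@array);

${ $scalar_refs[0] }  = 100;        # modifies $x through its reference
${ $element_refs[2] } = 300;        # modifies $array[2] through its reference

print "x=$x array=@array\n";        # prints "x=100 array=10 20 300"
```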

8.2.2. Anonymous Data

In the examples just shown, the backslash operator merely makes a duplicate of a reference that is already held in a variable name--with one exception. The 186_282.42 isn't referenced by a named variable--it's just a value. It's one of those anonymous referents we mentioned earlier. Anonymous referents are accessed only through references. This one happens to be a number, but you can create anonymous arrays, hashes, and subroutines as well.

8.2.2.1. The anonymous array composer

You can create a reference to an anonymous array with square brackets:

$arrayref = [1, 2, ['a', 'b', 'c', 'd']];

Here we've composed an anonymous array of three elements, whose final element is a reference to an anonymous array of four elements (depicted in Figure 8.2). (The multidimensional syntax described later can be used to access this. For example, $arrayref->[2][1] would have the value "b".)

Figure 8.2. A reference to an array, whose third element is itself an array reference

We now have one way to represent the table at the beginning of the chapter:

$table = [ [ "john", 47, "brown", 186],
           [ "mary", 23, "hazel", 128],
           [ "bill", 35, "blue",  157] ];
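To make the shape of this array of arrays concrete, here is a small sketch that reads values back out of $table. It borrows the arrow syntax that is properly introduced later in the chapter; the names $mary_eyes and $row_count are illustrative only.

```perl
use strict;
use warnings;

my $table = [ [ "john", 47, "brown", 186 ],
              [ "mary", 23, "hazel", 128 ],
              [ "bill", 35, "blue",  157 ] ];

my $mary_eyes = $table->[1][2];     # row 1, column 2: "hazel"
my $row_count = scalar @$table;     # number of rows: 3

# Each row is itself a reference to an anonymous array.
for my $row (@$table) {
    print "$row->[0] is $row->[1] years old\n";
}
```

Because rows are looked up by position, this layout suits data you walk through in order; looking someone up by name means scanning every row.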

Square brackets work like this only where the Perl parser is expecting a term in an expression. They should not be confused with the brackets in an expression like $array[6]--although the mnemonic association with arrays is intentional. Inside a quoted string, square brackets don't compose anonymous arrays; instead, they become literal characters in the string. (Square brackets do still work for subscripting in strings, or you wouldn't be able to print string values like "VAL=$array[6]\n". And to be totally honest, you can in fact sneak anonymous array composers into strings, but only when embedded in a larger expression that is being interpolated. We'll talk about this cool feature later in the chapter because it involves dereferencing as well as referencing.)
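The difference between subscripting and literal brackets inside a string can be seen in a minimal sketch like the one below. The array contents and variable names are invented for the example.

```perl
use strict;
use warnings;

my @array = (10 .. 16);

my $subscripted = "VAL=$array[6]";      # brackets subscript @array here
my $literal     = "list: [1, 2, 3]";    # no variable in front: literal characters

print "$subscripted\n";                 # prints "VAL=16"
print "$literal\n";                     # prints "list: [1, 2, 3]"
```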

8.2.2.2. The anonymous hash composer

You can create a reference to an anonymous hash with braces:

$hashref = {
    'Adam'   => 'Eve',
    'Clyde'  => $bonnie,
    'Antony' => 'Cleo' . 'patra',
};

For the values (but not the keys) of the hash, you can freely mix other anonymous array, hash, and subroutine composers to produce as complicated a structure as you like.

We now have another way to represent the table at the beginning of the chapter:

$table = {
            "john" => [ 47, "brown", 186 ],
            "mary" => [ 23, "hazel", 128 ],
            "bill" => [ 35, "blue",  157 ],
};

That's a hash of arrays. Choosing the best data structure is a tricky business, and the next chapter is devoted to it. But as a teaser, we could even use a hash of hashes for our table:

$table = {
           "john" => { age    => 47,
                       eyes   => "brown",
                       weight => 186,
                     },
           "mary" => { age    => 23,
                       eyes   => "hazel",
                       weight => 128,
                     },
           "bill" => { age    => 35,
                       eyes   => "blue",
                       weight => 157,
                     },
 };
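For comparison with the positional layout, here is a sketch of reading the hash of hashes. It uses a two-person subset of the table to stay short, and the dereferencing syntax it uses is covered later in the chapter; $john_age and @names are illustrative names.

```perl
use strict;
use warnings;

my $table = {
    "john" => { age => 47, eyes => "brown", weight => 186 },
    "mary" => { age => 23, eyes => "hazel", weight => 128 },
};

my $john_age = $table->{john}{age};     # 47
my @names    = sort keys %$table;       # ("john", "mary")

# Both levels are looked up by name rather than by position.
for my $name (@names) {
    print "$name has $table->{$name}{eyes} eyes\n";
}
```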

As with square brackets, braces work like this only where the Perl parser is expecting a term in an expression. They should not be confused with the braces in an expression like $hash{key}--although the mnemonic association with hashes is (again) intentional. The same caveats apply to the use of braces within strings.

There is one additional caveat which didn't apply to square brackets. Since braces are also used for several other things (including blocks), you may occasionally have to disambiguate braces at the beginning of a statement by putting a + or a return in front, so that Perl realizes the opening brace isn't starting a block. For example, if you want a function to make a new hash and return a reference to it, you have these options:

sub hashem {        { @_ } }   # Silently WRONG -- returns @_.
sub hashem {       +{ @_ } }   # Ok.
sub hashem { return { @_ } }   # Ok.
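One way to check which form you actually wrote is to ask ref what came back. The sketch below gives the three variants distinct, made-up names (hashem_block, hashem_plus, hashem_return) so they can coexist in one file.

```perl
use strict;
use warnings;

sub hashem_block  {        { @_ } }   # parsed as a bare block, not a hash composer
sub hashem_plus   {       +{ @_ } }   # unary plus forces an expression
sub hashem_return { return { @_ } }   # return expects an expression too

my $bad = hashem_block(color => 'red');
my $ok1 = hashem_plus(color => 'red');
my $ok2 = hashem_return(color => 'red');

# ref returns an empty string for anything that is not a reference.
for my $result ($bad, $ok1, $ok2) {
    print ref($result) eq 'HASH' ? "hash reference\n" : "not a hash reference\n";
}
```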

8.2.2.3. The anonymous subroutine composer

You can create a reference to an anonymous subroutine by using sub without a subroutine name:

$coderef = sub { print "Boink!\n" };  # Now &$coderef prints "Boink!"

Note the presence of the semicolon, required here to terminate the expression. (It isn't required after the more common usage of sub NAME {} that declares and defines a named subroutine.) A nameless sub {} is not so much a declaration as it is an operator--like do {} or eval {}--except that the code inside isn't executed immediately. Instead, it just generates a reference to the code, which in our example is stored in $coderef. However, no matter how many times you execute the line shown above, $coderef will still refer to the same anonymous subroutine.[2]

[2] But even though there's only one anonymous subroutine, there may be several copies of the lexical variables in use by the subroutine, depending on when the subroutine reference was generated. These are discussed later in Section 8.3.7, "Closures".
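Since a code reference is an ordinary scalar value, it can be stored and passed around like any other. The following sketch, with a hypothetical %dispatch table and $greet subroutine, shows one common use; the ->() call syntax is described later in the chapter.

```perl
use strict;
use warnings;

my $coderef = sub { return "Boink!" };
my $greet   = sub { my ($name) = @_; return "Hello, $name" };

# Code references can live in data structures, here a dispatch table.
my %dispatch = (
    boink => $coderef,
    greet => $greet,
);

my $first  = $coderef->();                   # call with no arguments
my $second = $dispatch{greet}->("camel");    # look up by name, then call

print "$first\n$second\n";
```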

8.2.3. Object Constructors

Subroutines can also return references. That may sound trite, but sometimes you are supposed to use a subroutine to create a reference rather than creating the reference yourself. In particular, special subroutines called constructors create and return references to objects. An object is simply a special kind of reference that happens to know which class it's associated with, and constructors know how to create that association. They do so by taking an ordinary referent and turning it into an object with the bless operator, so we can speak of an object as a blessed reference. There's nothing religious going on here; since a class acts as a user-defined type, blessing a referent simply makes it a user-defined type in addition to a built-in one. Constructors are often named new--especially by C++ programmers--but they can be named anything in Perl.

Constructors can be called in any of these ways:

$objref = Doggie::->new(Tail => 'short', Ears => 'long');  #1
$objref = new Doggie:: Tail => 'short', Ears => 'long';    #2
$objref = Doggie->new(Tail => 'short', Ears => 'long');    #3
$objref = new Doggie Tail => 'short', Ears => 'long';      #4

The first and second invocations are the same. They both call a function named new that is supplied by the Doggie module. The third and fourth invocations are the same as the first two, but are slightly more ambiguous: the parser will get confused if you define your own subroutine named Doggie. (Which is why people typically stick with lowercase names for subroutines and uppercase for modules.) The fourth invocation can also get confused if you've defined your own new subroutine and don't happen to have done either a require or a use of the Doggie module, either of which has the effect of declaring the module. Always declare your modules if you want to use #4. (And watch out for stray Doggie subroutines.)

See Chapter 12, "Objects" for a discussion of Perl objects.
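As a minimal sketch of what such a constructor might look like, assuming a Doggie class that keeps its attributes in an anonymous hash (the default attribute values here are invented for the example):

```perl
use strict;
use warnings;

package Doggie;

sub new {
    my ($class, %args) = @_;
    # Start with an ordinary anonymous hash as the referent...
    my $self = { Tail => 'long', Ears => 'short', %args };
    # ...and bless it so that it knows which class it belongs to.
    return bless $self, $class;
}

package main;

my $objref = Doggie->new(Tail => 'short', Ears => 'long');

print ref($objref), "\n";       # prints "Doggie" rather than "HASH"
print $objref->{Tail}, "\n";    # prints "short"
```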

8.2.4. Handle References

References to filehandles or directory handles can be created by referencing the typeglob of the same name:

splutter(\*STDOUT);

sub splutter {
    my $fh = shift;
    print $fh "her um well a hmmm\n";
}

$rec = get_rec(\*STDIN);
sub get_rec {
    my $fh = shift;
    return scalar <$fh>;
}

If you're passing around filehandles, you can also use the bare typeglob to do so: in the example above, you could have used *STDOUT or *STDIN instead of \*STDOUT and \*STDIN.

Although you can usually use typeglobs and references to typeglobs interchangeably, there are a few places where you can't. Simple typeglobs can't be blessed into objectdom, and typeglob references can't be passed back out of the scope of a localized typeglob.

When generating new filehandles, older code would often do something like this to open a list of files:

for $file (@names) {
    local *FH;
    open(*FH, $file) || next;
    $handle{$file} = *FH;
}

That still works, but now it's just as easy to let an undefined variable autovivify an anonymous typeglob:

for $file (@names) {
    my $fh;
    open($fh, $file) || next;
    $handle{$file} = $fh;
}
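To see what open leaves behind in the undefined scalar, here is a self-contained sketch. So that it runs without any files on disk, it opens in-memory strings with the three-argument form of open, which is an assumption of this example rather than what the loop above does; %text and %handle are illustrative names.

```perl
use strict;
use warnings;

# Stand-ins for real files, keyed by a made-up name.
my %text = (
    a => "first line of a\n",
    b => "first line of b\n",
);

my %handle;
for my $name (sort keys %text) {
    my $fh;                                     # undefined so far
    open($fh, '<', \$text{$name}) || next;      # open fills in a typeglob reference
    $handle{$name} = $fh;
}

my $fh   = $handle{b};
my $line = <$fh>;

print ref($handle{a}), ": $line";               # prints "GLOB: first line of b"
```

Each pass through the loop gets a fresh $fh, so every entry in %handle refers to its own handle.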

With indirect filehandles, it doesn't matter whether you use typeglobs, references to typeglobs, or one of the more exotic I/O objects. You just use a scalar that--one way or another--gets interpreted as a filehandle. For most purposes, you can use either a typeglob or a typeglob reference almost indiscriminately. As we admitted earlier, there is some implicit dereferencing magic going on here.

8.2.5. Symbol Table References

In unusual circumstances, you might not know what type of reference you need when your program is written. A reference can be created by using a special syntax, affectionately known as the *foo{THING} syntax. *foo{THING} returns a reference to the THING slot in *foo, which is the symbol table entry holding the values of $foo, @foo, %foo, and friends.

$scalarref = *foo{SCALAR};   # Same as \$foo
$arrayref  = *ARGV{ARRAY};   # Same as \@ARGV
$hashref   = *ENV{HASH};     # Same as \%ENV
$coderef   = *handler{CODE}; # Same as \&handler
$globref   = *foo{GLOB};     # Same as \*foo
$ioref     = *STDIN{IO};     # Er...

All of these are self-explanatory except for *STDIN{IO}. It yields the actual internal IO::Handle object that the typeglob contains, that is, the part of the typeglob that the various I/O functions are actually interested in. For compatibility with previous versions of Perl, *foo{FILEHANDLE} is a synonym for the hipper *foo{IO} notation.

In theory, you can use a *HANDLE{IO} anywhere you'd use a *HANDLE or a \*HANDLE, such as for passing handles into or out of subroutines, or storing them in larger data structures. (In practice, there are still some wrinkles to be ironed out.) Their advantage is that they access only the real I/O object you want, not the whole typeglob, so you run no risk of clobbering more than you want to through a typeglob assignment (although if you always assign to a scalar variable instead of to a typeglob, you'll be okay). One disadvantage is that there's no way to autovivify one as of yet.[3]

splutter(*STDOUT);
splutter(*STDOUT{IO});

sub splutter {
    my $fh = shift;
    print $fh "her um well a hmmm\n";
}

Both invocations of splutter() print "her um well a hmmm".

[3] Currently, open my $fh autovivifies a typeglob instead of an IO::Handle object, but someday we may fix that, so you shouldn't rely on the typeglobbedness of what open currently autovivifies.

The *foo{THING} thing returns undef if that particular THING hasn't been seen by the compiler yet, except when THING is SCALAR. It so happens that *foo{SCALAR} returns a reference to an anonymous scalar even if $foo hasn't been seen yet. (Perl always adds a scalar to any typeglob as an optimization to save a bit of code elsewhere. But don't depend on it to stay that way in future releases.)
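A quick way to observe this is to compare slots that have been used with one that hasn't. In this sketch the package variables @seen, the subroutine handler, and the never-declared name unseen are all invented for the example; the SCALAR slot is left out on purpose, since its behavior is the part you shouldn't rely on.

```perl
use strict;
use warnings;
no warnings 'once';     # *unseen is deliberately mentioned only once

our @seen = (1, 2, 3);
sub handler { return 1 }

my $arrayref = *seen{ARRAY};        # same reference as \@seen
my $coderef  = *handler{CODE};      # same reference as \&handler
my $missing  = *unseen{HASH};       # %unseen was never used, so this is undef

print "array slot matches\n" if $arrayref == \@seen;
print "code slot matches\n"  if $coderef  == \&handler;
print "hash slot is empty\n" unless defined $missing;
```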

8.2.6. Implicit Creation of References

A final method for creating references is not really a method at all. References of an appropriate type simply spring into existence if you dereference them in an lvalue context that assumes they exist. This is extremely useful, and is also What You Expect. This topic is covered later in this chapter, where we'll discuss how to dereference all of the references we've created so far.
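As a small preview, the sketch below assigns through a scalar that starts out undefined; the intermediate hash and array references appear on their own. The key and values are arbitrary, and the dereferencing syntax is explained in the sections that follow.

```perl
use strict;
use warnings;

my $ref;                            # undefined: no reference yet

# Assigning through $ref as though it already pointed at a hash of
# arrays makes Perl create both the hash and the array on demand.
$ref->{camel}[2] = 'hump';

print ref($ref), "\n";                      # prints "HASH"
print ref($ref->{camel}), "\n";             # prints "ARRAY"
print scalar @{ $ref->{camel} }, "\n";      # prints "3"
```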