Managing Instance Data (Programming Perl)

12.7. Managing Instance Data

Most classes create objects that are essentially just data structures with several internal data fields (instance variables) plus methods to manipulate them.

Perl classes inherit methods, not data, but as long as all access to the object is through method calls anyway, this works out fine. If you want data inheritance, you have to effect it through method inheritance. By and large, this is not a necessity in Perl, because most classes store the attributes of their object in an anonymous hash. The object's instance data is contained within this hash, which serves as its own little namespace to be carved up by whatever classes do something with the object. For example, if you want an object called $city to have a data field named elevation, you can simply access $city->{elevation}. No declarations are necessary. But method wrappers have their uses.
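As a minimal sketch of the hash-as-object idea, the following assumes a hypothetical City class whose constructor simply blesses its arguments into an anonymous hash; neither the class nor its constructor comes from the text above:

```perl
use strict;
use warnings;

package City;    # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    # The anonymous hash is the object; its keys hold the instance data.
    return bless { %args }, $class;
}

package main;

my $city = City->new(name => "Denver");
$city->{elevation} = 5280;    # no declaration needed for a new field
print "$city->{name}: $city->{elevation} ft\n";
```

Reaching into the hash like this is convenient inside the class, but it is exactly the kind of access that method wrappers are meant to hide from outside users.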

Suppose you want to implement a Person object. You decide to have a data field called "name", which by a strange coincidence you'll store under the key name in the anonymous hash that will serve as the object. But you don't want users touching the data directly. To reap the rewards of encapsulation, users need methods to access that instance variable without lifting the veil of abstraction.

For example, you might make a pair of accessor methods:

sub get_name {
    my $self = shift;
    return $self->{name};
}

sub set_name {
    my $self      = shift;
    $self->{name} = shift;
}

which leads to code like this:

$him = Person->new();
$him->set_name("Frodo");
$him->set_name( ucfirst($him->get_name) );

You could even combine both methods into one:

sub name {
    my $self = shift;
    if (@_) { $self->{name} = shift }
    return $self->{name};
}

This would then lead to code like this:

$him = Person->new();
$him->name("Frodo");
$him->name( ucfirst($him->name) );

The advantage of writing a separate function for each instance variable (which for our Person class might be name, age, height, and so on) is that it is direct, obvious, and flexible. The drawback is that every time you want a new class, you end up defining one or two nearly identical methods per instance variable. This isn't too bad for the first few, and you're certainly welcome to do it that way if you'd like. But when convenience is preferred over flexibility, you might prefer one of the techniques described in the following sections.

Note that we will be varying the implementation, not the interface. If users of your class respect the encapsulation, you'll be able to transparently swap one implementation for another without the users noticing. (Family members in your inheritance tree using your class for a subclass or superclass might not be so forgiving, since they know you far better than strangers do.) If your users have been peeking and poking into the private affairs of your class, the inevitable disaster is their own fault and none of your concern. All you can do is live up to your end of the contract by maintaining the interface. Trying to stop everyone else in the world from ever doing something slightly wicked will take up all your time and energy--and in the end, fail anyway.

Dealing with family members is more challenging. If a subclass overrides a superclass's attribute accessor, should it access the same field in the hash, or not? An argument can be made either way, depending on the nature of the attribute. For the sake of safety in the general case, each accessor can prefix the name of the hash field with its own classname, so that subclass and superclass can both have their own version. Several of the examples below, including the standard Class::Struct module, use this subclass-safe strategy. You'll see accessors resembling this:

sub name {
    my $self = shift;
    my $field = __PACKAGE__ . "::name";
    if (@_) { $self->{$field} = shift }
    return $self->{$field};
}
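To see why the prefix matters, here is a small sketch in which a hypothetical Wizard subclass overrides name using the same pattern. The Wizard class and the bare-bones constructor are assumptions made for the demonstration:

```perl
use strict;
use warnings;

package Person;

sub new { my $class = shift; return bless {}, $class }

sub name {
    my $self  = shift;
    my $field = __PACKAGE__ . "::name";    # "Person::name"
    if (@_) { $self->{$field} = shift }
    return $self->{$field};
}

package Wizard;    # hypothetical subclass
our @ISA = ("Person");

sub name {
    my $self  = shift;
    my $field = __PACKAGE__ . "::name";    # "Wizard::name"
    if (@_) { $self->{$field} = shift }
    return $self->{$field};
}

package main;

my $mage = Wizard->new;
$mage->name("Gandalf");           # stored under "Wizard::name"
$mage->Person::name("Olorin");    # stored under "Person::name"
print join(", ", sort keys %$mage), "\n";
```

Each class ends up with its own slot in the same hash, so neither accessor can clobber the other's value.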

In each of the following examples, we create a simple Person class with fields name, race, and aliases, each with an identical interface but a completely different implementation. We're not going to tell you which one we like the best, because we like them all the best, depending on the occasion. And tastes differ. Some folks prefer stewed conies; others prefer fissssh.

12.7.1. Field Declarations with use fields

Objects don't have to be implemented as anonymous hashes. Any reference will do. For example, if you used an anonymous array, you could set up a constructor like this:

sub new {
    my $invocant = shift;
    my $class = ref($invocant) || $invocant;
    return bless [], $class;
}

and have accessors like these:

sub name {
    my $self = shift;
    if (@_) { $self->[0] = shift }
    return $self->[0];
}

sub race {
    my $self = shift;
    if (@_) { $self->[1] = shift }
    return $self->[1];
}

sub aliases {
    my $self = shift;
    if (@_) { $self->[2] = shift }
    return $self->[2];
}

Arrays are somewhat faster to access than hashes and don't take up quite as much memory, but they're not at all convenient to use. You have to keep track of the index numbers (not just in your class, but in your superclass, too), which must somehow indicate which pieces of the array your class is using. Otherwise, you might reuse a slot.
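One way to ease the index bookkeeping by hand (an illustration of the problem, not something the text prescribes) is to name the slots with use constant. The constant names below are assumptions:

```perl
use strict;
use warnings;

package Person;

# Named slot indices (illustrative). A subclass adding fields would have
# to know about these and continue numbering from 3.
use constant { NAME => 0, RACE => 1, ALIASES => 2 };

sub new {
    my $invocant = shift;
    my $class = ref($invocant) || $invocant;
    return bless [], $class;
}

sub name {
    my $self = shift;
    if (@_) { $self->[NAME] = shift }
    return $self->[NAME];
}

sub race {
    my $self = shift;
    if (@_) { $self->[RACE] = shift }
    return $self->[RACE];
}

package main;

my $him = Person->new;
$him->name("Frodo");
$him->race("Hobbit");
print $him->name, " is a ", $him->race, "\n";
```

This removes the magic numbers, but the coordination problem between superclass and subclass remains.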

The use fields pragma addresses all of these points:

package Person;
use fields qw(name race aliases);

This pragma does not create accessor methods for you, but it does rely on some built-in magic (called pseudohashes) to do something similar. (You may wish to wrap accessors around the fields anyway, as we do in the following example.) Pseudohashes are array references that you can use like hashes because they have an associated key map table. The use fields pragma sets this key map up for you, effectively declaring which fields are valid for the Person object; this makes the Perl compiler aware of them. If you declare the type of your object variable (as in my Person $self, in the next example), the compiler is smart enough to optimize access to the fields into straight array accesses. Perhaps more importantly, it validates field names for type safety (well, typo safety, really) at compile time. (See Section 8.3.5, "Pseudohashes" in Chapter 8, "References".)

A constructor and sample accessors would look like this:

package Person;
use fields qw(name race aliases);
sub new {
    my $type = shift;
    my Person $self = fields::new(ref $type || $type);
    $self->{name} = "unnamed";
    $self->{race}  = "unknown";
    $self->{aliases} = [];
    return $self;
}
sub name {
    my Person $self = shift;
    $self->{name} = shift if @_;
    return $self->{name};
}
sub race {
    my Person $self = shift;
    $self->{race} = shift if @_;
    return $self->{race};
}
sub aliases {
    my Person $self = shift;
    $self->{aliases} = shift if @_;
    return $self->{aliases};
}
1;

If you misspell one of the literal keys used to access the pseudohash, you won't have to wait until run time to learn about this. The compiler knows what type of object $self is supposed to refer to (because you told it), so it can check that the code accesses only those fields that Person objects actually have. If you have horses on the brain and try to access a nonexistent field (such as $self->{mane}), the compiler can flag this error right away and will never turn the erroneous program over to the interpreter to run.
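The compile-time check can be observed by compiling a snippet inside a string eval, so that the failure can be caught instead of aborting the whole program. This is a sketch only; the exact wording of the diagnostic may vary between Perl releases:

```perl
use strict;
use warnings;

package Person;
use fields qw(name race aliases);

package main;

# Compile a small sub at run time. The misspelled key is a literal on a
# variable declared as "my Person", so the compiler can check it.
my $compiled = eval q{
    sub { my Person $self = shift; return $self->{mane} };
};
my $error = $@;

print defined $compiled ? "compiled\n" : "rejected: $error";
```

The anonymous sub is never called; the error is raised while the code is being compiled.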

There's still a bit of repetition in declaring methods to get at instance variables, so you still might like to automate the creation of simple accessor methods using one of the techniques below. However, because all these techniques use some sort of indirection, if you use them, you will lose the compile-time benefits of typo-checking lexically typed hash accesses. You'll still keep the (small) time and space advantages, though.

If you do elect to use a pseudohash to implement your class, any class that inherits from this one must be aware of that underlying pseudohash implementation. If an object is implemented as a pseudohash, all participants in the inheritance hierarchy should employ the use base and use fields declarations. For example,

package Wizard;
use base "Person";
use fields qw(staff color sphere);

This makes the Wizard module a subclass of class Person, and loads the Person.pm file. It also registers three new fields in this class to go along with those from Person. That way when you write:

my Wizard $mage = fields::new("Wizard");

you'll get a pseudohash object with access to both classes' fields:

$mage->name("Gandalf");
$mage->color("Grey");

Since all subclasses must know that they are using a pseudohash implementation, they should use the direct pseudohash notation for both efficiency and type safety:

$mage->{name} = "Gandalf";
$mage->{color} = "Grey";

If you want to keep your implementations interchangeable, however, outside users of your class must use the accessor methods.

Although use base supports only single inheritance, this is seldom a severe restriction. See the descriptions of use base and use fields in Chapter 31, "Pragmatic Modules".
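Putting the Person and Wizard fragments together, a single-file sketch might look like the following. The color accessor is an assumption (the text calls it but never shows it), and keeping both packages in one file instead of separate .pm files is only a convenience for the demonstration:

```perl
use strict;
use warnings;

package Person;
use fields qw(name race aliases);

sub new {
    my $type = shift;
    # fields::new builds an object with the fields of the class named,
    # so a call through Wizard yields a Wizard with all six fields.
    my Person $self = fields::new(ref $type || $type);
    $self->{name} = "unnamed";
    return $self;
}

sub name {
    my Person $self = shift;
    $self->{name} = shift if @_;
    return $self->{name};
}

package Wizard;
use base "Person";                    # inherits methods and field list
use fields qw(staff color sphere);

sub color {                           # hypothetical accessor
    my Wizard $self = shift;
    $self->{color} = shift if @_;
    return $self->{color};
}

package main;

my Wizard $mage = Wizard->new;
$mage->name("Gandalf");
$mage->color("Grey");
print $mage->name, " the ", $mage->color, "\n";
```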

12.7.2. Generating Classes with Class::Struct

The standard Class::Struct module exports a function named struct. This creates all the trappings you'll need to get started on an entire class. It generates a constructor named new, plus accessor methods for each of the data fields (instance variables) named in that structure.

For example, if you put the class in a Person.pm file:

package Person;
use Class::Struct;
struct Person => {    # create a definition for a "Person"
    name    => '$',   #    name field is a scalar
    race    => '$',   #    race field is also a scalar
    aliases => '@',   #    but aliases field is an array ref
};
1;

Then you could use that module this way:

use Person;
my $mage = Person->new();
$mage->name("Gandalf");
$mage->race("Istar");
$mage->aliases( ["Mithrandir", "Olorin", "Incanus"] );

The Class::Struct module created all four of those methods. Because it follows the subclass-safe policy of always prefixing the field name with the class name, it also permits an inherited class to have its own separate field of the same name as a base class field without conflict. That means in this case that "Person::name" rather than just "name" is used for the hash key for that particular instance variable.
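The prefixed key can be seen by peeking inside a generated object. This sketch declares the class from the main program instead of a separate Person.pm file, purely to keep the example in one piece, and it breaks encapsulation only to make the point:

```perl
use strict;
use warnings;
use Class::Struct;

# Same declaration as in Person.pm, written inline for a one-file demo.
struct( Person => {
    name    => '$',
    race    => '$',
    aliases => '@',
} );

my $mage = Person->new( name => "Gandalf" );
$mage->race("Istar");

# Peek under the hood (illustration only; real users should not do this).
print "stored under: $_\n" for sort keys %$mage;
```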

Fields in a struct declaration don't have to be basic Perl types. They can also specify other classes, but classes created with struct work best because the function makes assumptions about how the classes behave that aren't generally true of all classes. For example, the new method for the appropriate class is invoked to initialize the field, but many classes have constructors with other names.

See the description of Class::Struct in Chapter 32, "Standard Modules", and its online documentation for more information. Many standard modules use Class::Struct to implement their classes, including User::pwent and Net::hostent. Reading their code can prove instructive.

12.7.3. Generating Accessors with Autoloading

As we mentioned earlier, when you invoke a nonexistent method, Perl has two different ways to look for an AUTOLOAD method, depending on whether you declared a stub method. You can use this property to provide access to the object's instance data without writing a separate function for each instance variable. Inside the AUTOLOAD routine, the name of the method actually invoked can be retrieved from the $AUTOLOAD variable. Consider the following code:

use Person;
$him = Person->new;
$him->name("Aragorn");
$him->race("Man");
$him->aliases( ["Strider", "Estel", "Elessar"] );
printf "%s is of the race of %s.\n", $him->name, $him->race;
print "His aliases are: ", join(", ", @{$him->aliases}), ".\n";

As before, this version of the Person class implements a data structure with three fields: name, race, and aliases:

package Person;
use Carp;

my %Fields = (
    "Person::name"  => "unnamed",
    "Person::race"   => "unknown",
    "Person::aliases"  => [],
);

# The next declaration guarantees we get our own autoloader.
use subs qw(name race aliases);

sub new {
    my $invocant = shift;
    my $class = ref($invocant) || $invocant;
    my $self  = { %Fields, @_ };    # clone like Class::Struct
    bless $self, $class;
    return $self;
}

sub AUTOLOAD {
    my $self = shift;
    # only handle instance methods, not class methods
    croak "$self not an object" unless ref($self);
    my $name = our $AUTOLOAD;
    return if $name =~ /::DESTROY$/;
    unless (exists $self->{$name}) {
        croak "Can't access `$name' field in $self";
    }
    if (@_) { return $self->{$name} = shift }
    else    { return $self->{$name} }
}

As you see, there are no methods named name, race, or aliases anywhere to be found. The AUTOLOAD routine takes care of all that. When someone uses $him->name("Aragorn"), the AUTOLOAD subroutine is called with $AUTOLOAD set to "Person::name". Conveniently, by leaving it fully qualified, it's in exactly the right form for accessing fields of the object hash. That way if you use this class as part of a larger class hierarchy, you don't conflict with uses of the same name in other classes.
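The flip side is that a call to any method name without a matching field is caught at run time. The following condensed sketch of the same technique (two fields only, and without the use subs declaration) shows both paths; the height method is a deliberately invalid name chosen for the demonstration:

```perl
use strict;
use warnings;

package Person;
use Carp;

my %Fields = (
    "Person::name" => "unnamed",
    "Person::race" => "unknown",
);

sub new {
    my $invocant = shift;
    my $class = ref($invocant) || $invocant;
    return bless { %Fields, @_ }, $class;
}

sub AUTOLOAD {
    my $self = shift;
    croak "$self not an object" unless ref($self);
    my $name = our $AUTOLOAD;             # e.g. "Person::name"
    return if $name =~ /::DESTROY$/;      # ignore implicit destructor calls
    croak "Can't access '$name' field in $self"
        unless exists $self->{$name};
    $self->{$name} = shift if @_;
    return $self->{$name};
}

package main;

my $him = Person->new;
$him->name("Aragorn");                    # valid field: handled
my $error = eval { $him->height(198); 1 } ? "" : $@;
print "name:  ", $him->name, "\n";
print "error: $error";
```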

12.7.4. Generating Accessors with Closures

Most accessor methods do essentially the same thing: they simply fetch or store a value from that instance variable. In Perl, the most natural way to create a family of near-duplicate functions is looping around a closure. But closures are anonymous functions lacking names, and methods need to be named subroutines in the class's package symbol table so that they can be called by name. This is no problem--just assign the closure reference to a typeglob of the appropriate name.

package Person;

sub new {
    my $invocant = shift;
    my $self = bless({}, ref $invocant || $invocant);
    $self->init();
    return $self;
}

sub init {
    my $self = shift;
    $self->name("unnamed");
    $self->race("unknown");
    $self->aliases([]);
}

for my $field (qw(name race aliases)) {
    my $slot = __PACKAGE__ . "::$field";
    no strict "refs";          # So symbolic ref to typeglob works.

    *$field = sub {
        my $self = shift;
        $self->{$slot} = shift if @_;
        return $self->{$slot};
    };
}

Closures are the cleanest hand-rolled way to create a multitude of accessor methods for your instance data. It's efficient for both the computer and you. Not only do all the accessors share the same bit of code (they only need their own lexical pads), but later if you decide to add another attribute, the changes required are minimal: just add one more word to the for loop's list, and perhaps something to the init method.
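As a sketch of how small that change is, here is the same loop with one extra, hypothetical attribute named age. The constructor is reduced to a bare bless to keep the example short:

```perl
use strict;
use warnings;

package Person;

sub new {
    my $invocant = shift;
    return bless {}, ref $invocant || $invocant;
}

# "age" is the only addition needed to get a fourth accessor.
for my $field (qw(name race aliases age)) {
    my $slot = __PACKAGE__ . "::$field";
    no strict "refs";          # So symbolic ref to typeglob works.
    *$field = sub {
        my $self = shift;
        $self->{$slot} = shift if @_;
        return $self->{$slot};
    };
}

package main;

my $him = Person->new;
$him->age(87);
print "age accessor installed\n" if Person->can("age");
print "age is ", $him->age, "\n";
```

Because the closures are installed in the package's symbol table, can finds them just as it would find hand-written methods.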

12.7.5. Using Closures for Private Objects

So far, these techniques for managing instance data have offered no mechanism for "protection" from external access. Anyone outside the class can open up the object's black box and poke about inside--if they don't mind voiding the warranty. Enforced privacy tends to get in the way of people trying to get their jobs done. Perl's philosophy is that it's better to encapsulate one's data with a sign that says:

IN CASE OF FIRE
  BREAK GLASS

You should respect such encapsulation when possible, but still have easy access to the contents in an emergency situation, like for debugging.

But if you do want to enforce privacy, Perl isn't about to get in your way. Perl offers low-level building blocks that you can use to surround your class and its objects with an impenetrable privacy shield--one stronger, in fact, than that found in many popular object-oriented languages. Lexical scopes and the lexical variables inside them are the key components here, and closures play a pivotal role.

In Section 12.5.5, "Private Methods" we saw how a class can use closures to implement methods that are invisible outside the module file. Later we'll look at accessor methods that regulate class data so private that not even the rest of the class has unrestricted access. Those are still fairly traditional uses of closures. The truly interesting approach is to use a closure as the very object itself. The object's instance variables are locked up inside a scope to which the object alone--that is, the closure--has free access. This is a very strong form of encapsulation; not only is it proof against external tampering, even other methods in the same class must use the proper access methods to get at the object's instance data.

Here's an example of how this might work. We'll use closures both for the objects themselves and for the generated accessors:

package Person;
sub new {
    my $invocant  = shift;
    my $class = ref($invocant) || $invocant;
    my $data = {
       NAME     => "unnamed",
       RACE     => "unknown",
       ALIASES  => [],
    };
    my $self = sub {
       my $field = shift;
       #############################
       ### ACCESS CHECKS GO HERE ###
       #############################
       if (@_) { $data->{$field} = shift }
       return    $data->{$field};
    };
    bless($self, $class);
    return $self;
}
# generate method names
for my $field (qw(name race aliases)) {
    no strict "refs";  # for access to the symbol table
    *$field = sub {
        my $self = shift;
        return $self->(uc $field, @_);
    };
}

The object created and returned by the new method is no longer a hash, as it was in other constructors we've looked at. It's a closure with unique access to the attribute data stored in the hash referred to by $data. Once the constructor call is finished, the only access to $data (and hence to the attributes) is via the closure.

In a call like $him->name("Bombadil"), the invoking object stored in $self is the closure that was blessed and returned by the constructor. There's not a lot one can do with a closure beyond calling it, so we do just that with $self->(uc $field, @_). Don't be fooled by the arrow; this is just a regular indirect function call, not a method invocation. The initial argument is the string "NAME" (the uppercased field name), and any remaining arguments are whatever else was passed in.[7] Once we're executing inside the closure, the hash reference inside $data is again accessible. The closure is then free to permit or deny access to whatever it pleases.

[7] Sure, the double-function call is slow, but if you wanted fast, would you really be using objects in the first place?

No one outside the closure object has unmediated access to this very private instance data, not even other methods in the class. They could try to call the closure the way the methods generated by the for loop do, perhaps setting an instance variable the class never heard of. But this approach is easily blocked by inserting various bits of code in the constructor where you see the comment about access checks. First, we need a common preamble:

use Carp;
local $Carp::CarpLevel = 1;  # Keeps croak messages short
my ($cpack, $cfile) = caller();

Now for each of the checks. The first one makes sure the specified attribute name exists:

croak "No valid field '$field' in object"
    unless exists $data->{$field};

This one allows access only by callers from the same file:

croak "Unmediated access denied to foreign file"
    unless $cfile eq __FILE__;

This one allows access only by callers from the same package:

croak "Unmediated access denied to foreign package ${cpack}::"
    unless $cpack eq __PACKAGE__;

And this one allows access only by callers whose classes inherit ours:

croak "Unmediated access denied to unfriendly class ${cpack}::"
    unless $cpack->isa(__PACKAGE__);

All these checks block unmediated access only. Users of the class who politely use the class's designated methods are under no such restriction. Perl gives you the tools to be just as persnickety as you want to be. Fortunately, not many people want to be.
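Assembled into one piece, a closure-based Person with two of these checks might look like the sketch below. It raises an exception with croak, so that a refused call really does fail, and it leaves out the $Carp::CarpLevel adjustment for brevity; both choices are assumptions of this sketch:

```perl
use strict;
use warnings;

package Person;
use Carp;

sub new {
    my $invocant = shift;
    my $class = ref($invocant) || $invocant;
    my $data = { NAME => "unnamed", RACE => "unknown", ALIASES => [] };
    my $self = sub {
        my $field = shift;
        my ($cpack) = caller();    # package of whoever called the closure
        croak "No valid field '$field' in object"
            unless exists $data->{$field};
        croak "Unmediated access denied to foreign package ${cpack}::"
            unless $cpack eq __PACKAGE__;
        if (@_) { $data->{$field} = shift }
        return $data->{$field};
    };
    return bless $self, $class;
}

# Generated accessors live in package Person, so the closure trusts them.
for my $field (qw(name race aliases)) {
    no strict "refs";
    *$field = sub {
        my $self = shift;
        return $self->(uc $field, @_);
    };
}

package main;

my $him = Person->new;
$him->name("Bombadil");                          # mediated: allowed
my $ok = eval { $him->("NAME", "Sauron"); 1 };   # unmediated: refused
print $him->name, "\n";
print "direct call refused\n" unless $ok;
```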

But some people ought to be. Persnickety is good when you're writing flight control software. If you either want or ought to be one of those people, and you prefer using working code over reinventing everything on your own, check out Damian Conway's Tie::SecureHash module on CPAN. It implements restricted hashes with support for public, protected, and private persnicketations. It also copes with the inheritance issues that we've ignored in the previous example. Damian has also written an even more ambitious module, Class::Contract, that imposes a formal software engineering regimen over Perl's flexible object system. This module's feature list reads like a checklist from a computer science professor's software engineering textbook,[8] including enforced encapsulation, static inheritance, and design-by-contract condition checking for object-oriented Perl, along with a declarative syntax for attribute, method, constructor, and destructor definitions at both the object and class level, and preconditions, postconditions, and class invariants. Whew!

[8] Can you guess what Damian's job is? By the way, we highly recommend his book, Object Oriented Perl (Manning Publications, 1999).

12.7.6. New Tricks

As of release 5.6 of Perl, you can also declare a method to indicate that it returns an lvalue. This is done with the lvalue subroutine attribute (not to be confused with object attributes). This experimental feature allows you to treat the method as something that would appear on the lefthand side of an equal sign:

package Critter;

sub new {
    my $class = shift;
    my $self = { pups => 0, @_ };    # Override default.
    bless $self, $class;
}

sub pups : lvalue {                  # We'll assign to pups() later.
    my $self = shift;
    $self->{pups};
}

package main;
$varmint = Critter->new(pups => 4);
$varmint->pups *= 2;                 # Assign to $varmint->pups!
$varmint->pups =~ s/(.)/$1$1/;       # Modify $varmint->pups in place!
print $varmint->pups;                # Now we have 88 pups.

This lets you pretend $varmint->pups is a variable while still obeying encapsulation. See Section 6.5.2, "The lvalue Attribute" in Chapter 6, "Subroutines".

If you're running a threaded version of Perl and want to ensure that only one thread can call a particular method on an object, you can use the locked and method attributes to do that:

sub pups : locked method {
    ...
}

When any thread invokes the pups method on an object, Perl locks the object before execution, preventing other threads from doing the same. See Section 6.5.1, "The locked and method Attributes" in Chapter 6, "Subroutines".