Tying Arrays (Programming Perl)

14.2. Tying Arrays

A class implementing a tied array must define at least the methods TIEARRAY, FETCH, and STORE. There are many optional methods: the ubiquitous DESTROY method, of course, but also the STORESIZE and FETCHSIZE methods used to provide $#array and scalar(@array) access. In addition, CLEAR is triggered when Perl needs to empty the array, and EXTEND when Perl would have pre-extended allocation in a real array.

You may also define the POP, PUSH, SHIFT, UNSHIFT, SPLICE, DELETE, and EXISTS methods if you want the corresponding Perl functions to work on the tied array. The Tie::Array class can serve as a base class to implement the first five of those functions in terms of FETCH and STORE. (Tie::Array's default implementation of DELETE and EXISTS simply calls croak.) As long as you define FETCH and STORE, it doesn't matter what kind of data structure your object contains.

On the other hand, the Tie::StdArray class (defined in the standard Tie::Array module) provides a base class with default methods that assume the object contains a regular array. Here's a simple array-tying class that makes use of this. Because it uses Tie::StdArray as its base class, it only needs to define the methods that should be treated in a nonstandard way.

#!/usr/bin/perl
package ClockArray;
use Tie::Array;
our @ISA = 'Tie::StdArray';
sub FETCH {
    my($self,$place) = @_;
    $self->[ $place % 12 ];
}
sub STORE {
    my($self,$place,$value) = @_;
    $self->[ $place % 12 ] = $value;
}

package main;
tie my @array, 'ClockArray';
@array = ( "a" ... "z" );
print "@array\n";

When run, the program prints out "y z o p q r s t u v w x". This class provides an array with only a dozen slots, like hours of a clock, numbered 0 through 11. If you ask for the 15th array index, you really get the 3rd one. Think of it as a travel aid for people who haven't learned how to read 24-hour clocks.

14.2.1. Array-Tying Methods

That's the simple way. Now for some nitty-gritty details. To demonstrate, we'll implement an array whose bounds are fixed at its creation. If you try to access anything beyond those bounds, an exception is raised. For example:

use BoundedArray;
tie @array, "BoundedArray", 2;

$array[0] = "fine";
$array[1] = "good";
$array[2] = "great";
$array[3] = "whoa";   # Prohibited; displays an error message.

The preamble code for the class is as follows:

package BoundedArray;
use Carp;
use strict;

To avoid having to define SPLICE later, we'll inherit from the Tie::Array class:

use Tie::Array;
our @ISA = ("Tie::Array");

CLASSNAME->TIEARRAY(LIST)

As the constructor for the class, TIEARRAY should return a blessed reference through which the tied array will be emulated.

In this next example, just to show you that you don't really have to return an array reference, we'll choose a hash reference to represent our object. A hash works out well as a generic record type: the value in the hash's "BOUND" key will store the maximum bound allowed, and its "DATA" value will hold the actual data. If someone outside the class tries to dereference the object returned (doubtless thinking it an array reference), an exception is raised.

sub TIEARRAY {
    my $class = shift;
    my $bound = shift;
    confess "usage: tie(\@ary, 'BoundedArray', max_subscript)"
        if @_ || $bound =~ /\D/;
    return bless { BOUND => $bound, DATA => [] }, $class;
}

We can now say:

tie(@array, "BoundedArray", 3);  # maximum allowable index is 3

to ensure that the array will never have more than four elements. Whenever an individual element of the array is accessed or stored, FETCH and STORE will be called just as they were for scalars, but with an extra index argument.

SELF->FETCH(INDEX)

This method is run whenever an individual element in the tied array is accessed. It receives one argument after the object: the index of the value we're trying to fetch.

sub FETCH {
    my ($self, $index) = @_;
    if ($index > $self->{BOUND}) {
        confess "Array OOB: $index > $self->{BOUND}";
    }
    return $self->{DATA}[$index];
}

SELF->STORE(INDEX, VALUE)

This method is invoked whenever an element in the tied array is set. It takes two arguments after the object: the index at which we're trying to store something and the value we're trying to put there. For example:

sub STORE {
    my($self, $index, $value) = @_;
    if ($index > $self->{BOUND} ) {
        confess "Array OOB: $index > $self->{BOUND}";
    }
    return $self->{DATA}[$index] = $value;
}

SELF->DESTROY

Perl calls this method when the tied variable needs to be destroyed and its memory reclaimed. This is almost never needed in a language with garbage collection, so for this example we'll just leave it out.

SELF->FETCHSIZE

The FETCHSIZE method should return the total number of items in the tied array associated with SELF. It's equivalent to scalar(@array), which is usually equal to $#array + 1.

sub FETCHSIZE {
    my $self = shift;
    return scalar @{$self->{DATA}};
}

SELF->STORESIZE(COUNT)

This method sets the total number of items in the tied array associated with SELF to be COUNT. If the array shrinks, you should remove entries beyond COUNT. If the array grows, you should make sure the new positions are undefined. For our BoundedArray class, we also ensure that the array doesn't grow beyond the limit initially set.

sub STORESIZE {
    my ($self, $count) = @_;
    if ($count > $self->{BOUND}) {
        confess "Array OOB: $count > $self->{BOUND}";
    }
    $#{$self->{DATA}} = $count;
}

SELF->EXTEND(COUNT)

Perl uses the EXTEND method to indicate that the array is likely to expand to hold COUNT entries. That way you can can allocate memory in one big chunk instead of in many successive calls later on. Since our BoundedArrays have fixed upper bounds, we won't define this method.

SELF->EXISTS(INDEX)

This method verifies that the element at INDEX exists in the tied array. For our BoundedArray, we just employ Perl's built-in exists after verifying that it's not an attempt to look past the fixed upper bound.

sub EXISTS  {
    my ($self, $index) = @_;
    if ($index > $self->{BOUND}) {
        confess "Array OOB: $index > $self->{BOUND}";
    }
    exists $self->{DATA}[$index];
}

SELF->DELETE(INDEX)

The DELETE method removes the element at INDEX from the tied array SELF. For our BoundedArray class, the method looks nearly identical to EXISTS, but this is not the norm.

sub DELETE {
    my ($self, $index) = @_;
    print STDERR "deleting!\n";
    if ($index > $self->{BOUND}) {
        confess "Array OOB: $index > $self->{BOUND}";
    }
    delete $self->{DATA}[$index];
}

SELF->CLEAR

This method is called whenever the array has to be emptied. That happens when the array is set to a list of new values (or an empty list), but not when it's provided to the undef function. Since a cleared BoundedArray always satisfies the upper bound, we don't need check anything here:

sub CLEAR {
    my $self = shift;
    $self->{DATA} = [];
}

If you set the array to a list, CLEAR will trigger but won't see the list values. So if you violate the upper bound like so:

tie(@array, "BoundedArray", 2);
@array = (1, 2, 3, 4);

the CLEAR method will still return successfully. The exception will only be raised on the subsequent STORE. The assignment triggers one CLEAR and four STOREs.

SELF->PUSH(LIST)

This method appends the elements of LIST to the array. Here's how it might look for our BoundedArray class:

sub PUSH    {
    my $self = shift;
    if (@_ + $#{$self->{DATA}} > $self->{BOUND}) {
        confess "Attempt to push too many elements";
    }
    push @{$self->{DATA}}, @_;
}

SELF->UNSHIFT(LIST)

This method prepends the elements of LIST to the array. For our BoundedArray class, the subroutine would be similar to PUSH.

SELF->POP

The POP method removes the last element of the array and returns it. For BoundedArray, it's a one-liner:

sub POP { my $self = shift; pop @{$self->{DATA}} }

SELF->SHIFT

The SHIFT method removes the first element of the array and returns it. For BoundedArray, it's similar to POP.

SELF->SPLICE(OFFSET, LENGTH, LIST)

This method lets you splice the SELF array. To mimic Perl's built-in splice, OFFSET should be optional and default to zero, with negative values counting back from the end of the array. LENGTH should also be optional, defaulting to rest of the array. LIST can be empty. If it's properly mimicking the built-in, the method will return a list of the original LENGTH elements at OFFSET (that is, the list of elements to be replaced by LIST).

Since splicing is a somewhat complicated operation, we won't define it at all; we'll just use the SPLICE subroutine from the Tie::Array module that we got for free when we inherited from Tie::Array. This way we define SPLICE in terms of other BoundedArray methods, so the bounds checking will still occur.

That completes our BoundedArray class. It warps the semantics of arrays just a little. But we can do better, and in very much less space.

14.2.2. Notational Convenience

One of the nice things about variables is that they interpolate. One of the not-so-nice things about functions is that they don't. You can use a tied array to make a function that can be interpolated. Suppose you want to interpolate random integers in a string. You can just say:

#!/usr/bin/perl
package RandInterp;
sub TIEARRAY { bless \my $self };
sub FETCH { int rand $_[1] };

package main;
tie @rand, "RandInterp";

for (1,10,100,1000) {
    print "A random integer less than $_ would be $rand[$_]\n";
}
$rand[32] = 5;    # Will this reformat our system disk?

When run, this prints:

A random integer less than 1 would be 0
A random integer less than 10 would be 3
A random integer less than 100 would be 46
A random integer less than 1000 would be 755
Can't locate object method "STORE" via package "RandInterp" at foo line 10.

As you can see, it's no big deal that we didn't even implement STORE. It just blows up like normal.