|
Chapter 5 Packages, Modules, and Object Classes |
|
In older versions of Perl, a user could call dbmopen to
tie a hash to a UNIX DBM file. Whenever the
hash was accessed, the database file on disk (really just a hash,
not a full relational database) would be magically[16]
read from or written to. In modern versions of Perl, you can bind any
ordinary variable (scalar, array, or hash) to an implementation
class by using tie. (The class may or may not implement a
DBM file.) You can break this association with untie.
The tie function creates the association by creating an object
internally to represent the variable to the class. If you have a tied
variable, but want to get at the underlying object, there are
two ways to do it. First, the tie
function returns a reference to the object.
But if you didn't bother to store that object reference
anywhere, you could still retrieve it using the tied function.
$object = tie VARIABLE, CLASSNAME, LIST
untie VARIABLE
$object = tied VARIABLE
The tie function binds the variable to the class package that
provides the methods for that variable. Once
this magic has been performed, accessing a tied variable automatically
triggers method calls in the proper class. All the complexity of the
class is hidden behind magic method calls. The method names are
predetermined, since they're called implicitly from within the innards
of Perl. These names are in ALL CAPS, which is a convention in Perl
culture that indicates that the routines are called implicitly rather than
explicitly--just like BEGIN, END, and DESTROY.
And AUTOLOAD too, for that matter.
You can almost think of tie as a funny kind of bless, except
that it blesses a bare variable instead of a thingy reference, and
takes extra parameters, like a constructor. That's because it actually
does call a constructor internally. (That's one of the magic methods we
mentioned.) This constructor is passed the CLASSNAME you specified, as
well as any additional arguments you supply in the LIST. It is not
passed the VARIABLE, however. The only way the constructor can
tell which kind of VARIABLE is being tied is by knowing its
own method name. This is not the customary constructor name,
new, but rather one of TIESCALAR, TIEARRAY, or
TIEHASH. (You can likely figure out which name goes with which
variable type.) The constructor just returns an object reference in the
normal fashion, and doesn't worry about whether it was called from
tie--which it may not have been, since you can call these methods
directly if you like. (Indeed, if you've tied your variable to a class
that provides other methods not accessible through the variable,
you must call the other methods directly yourself, via the object
reference. These extra methods might provide services like file locking
or other forms of transaction protection.)
As in any constructor, these constructors must bless a reference to a
thingy and return it as the implementation object. The thingy inside
the implementation object doesn't have to be of the same type as the
variable you're tying to. It does have to be a properly blessed
object, though. See the example below on tied arrays, which uses a hash
object to hold information about an array.
The tie function will not use or require a module for you--you
must do that explicitly yourself. (On the other hand, the dbmopen
emulator function will, for backward compatibility, attempt to
use one or another DBM implementation. But you can
preempt its selection with an explicit use, provided the module
you use is one of the modules in dbmopen's list of modules
to try.
See the AnyDBM_File module in Chapter 7, The Standard Perl Library for a fuller explanation.
A class implementing a tied scalar must define the following methods:
TIESCALAR, FETCH, STORE, and possibly
DESTROY. These routines will be invoked implicitly when you
tie a variable (TIESCALAR), read a tied variable
(FETCH), or assign a value to a tied variable (STORE). The
DESTROY method is called (as always) when the last reference to
the object disappears. (This may or may not happen when you call
untie, which destroys the reference used by the tie, but doesn't
destroy any outstanding references you may have squirreled away
elsewhere.) The FETCH and STORE methods are triggered
when you access the variable that's been tied, not the object it's been
tied to. If you have a handle on the object (either returned by the
initial tie or retrieved later via tied), you can access the
underlying object yourself without automatically triggering its
FETCH or STORE methods.
Let's look at each of these methods in turn, using as our example an
imaginary class called Nice.[17]
Variables tied to this class are scalars containing process priorities,
and each such variable is implicitly associated with an object that
contains a particular process ID, such the ID of the currently running
process or of the parent process. (Presumably you'd name your variables
to remind you which process you're referring to.) Variables are tied to
the class this way:
use Nice; # load the Nice.pm module
tie $his_speed, 'Nice', getppid();
tie $my_speed, 'Nice', $$;
Once the variables have been tied, their previous contents are no longer
accessible. The internally forged connection between the variable and
the object takes precedence over ordinary variable semantics.
For example, let's say you copy a variable that's been tied:
Instead of reading the value in the ordinary fashion from the
$his_speed scalar variable, Perl implicitly calls the
FETCH method on the associated underlying object. It's as
though you'd written this:
$speed = (tied $his_speed)->FETCH():
Or if you'd captured the object returned by the tie, you
could simply use that reference instead of the tied function, as in the following sample code.
$myobj = tie $my_speed, 'Nice', $$;
$speed = $my_speed; # through the implicit interface
$speed = $myobj->FETCH(); # same thing, explicitly
You can use $myobj to call methods other than the implicit ones, such
as those provided by the DB_File class (see Chapter 7, The Standard Perl Library).
However, one normally minds one's own business and leaves the underlying
object alone, which is why you often see the return value from tie
ignored. You can still get at it if you need it later.
That's the external view of it. For our implementation, we'll use the
BSD::Resource class (found in CPAN, but not included with
Perl) to access the PRIO_PROCESS, PRIO_MIN, and
PRIO_MAX constants from your system. Here's the
preamble of our class, which we will put into a file named Nice.pm:
package Nice;
use Carp; # Propagates error messages nicely.
use BSD::Resource; # Use these hooks into the OS.
use strict; # Enforce some discipline on ourselves,
use vars '$DEBUG'; # but exempt $DEBUG from discipline.
The Carp module provides methods carp(), croak(),
and confess(), which we'll use in various spots below. As
usual, see Chapter 7, The Standard Perl Library for more about Carp.
The use strict would ordinarily disallow the use of unqualified
package variables like $DEBUG, but we then declared the global
with use vars, so it's exempt. Otherwise we'd have to say
$Nice::DEBUG everywhere. But it is a global, and other modules
can turn on debugging in our module by setting $Nice::DEBUG to
some other value before using our module.
- TIESCALAR CLASSNAME, LIST
-
The TIESCALAR method of the class (that is, the
class package, but we're going to stop reminding you of that) is
implicitly invoked whenever tie is
called on a scalar variable. The LIST contains
any optional parameters needed to properly initialize an object of the
given class. (In our example, there is only one parameter, the
process ID.) The method is expected to return an object, which may or
may not contain an anonymous scalar as its blessed thingy. In our
example, it does.
sub TIESCALAR {
my $class = shift;
my $pid = shift;
$pid ||= $$; # arg of 0 defaults to my process
if ($pid =~ /\D/) {
carp "Nice::TIESCALAR got non-numeric pid $pid" if $^W;
return undef;
}
unless (kill 0, $pid) { # EPERM or ERSCH, no doubt
carp "Nice::TIESCALAR got bad pid $pid: $!" if $^W;
return undef;
}
return bless \$pid, $class;
}
Recall that the statement with the ||= operator is just shorthand for
$pid = $pid || $$; # set if not set
We say the object contains an anonymous scalar, but it doesn't really
become anonymous until my $pid goes out of scope, since that's the
variable we're generating a reference to when we bestow the blessing.
When returning a reference to an array or hash, one could use the same
approach by employing a lexically scoped array or hash variable, but
usually people just use the anonymous array or hash composers,
[] and {}. There is no similar composer for anonymous
scalars.
On the subject of subterfuge, the kill isn't really killing the
process. On most UNIX systems, a signal 0 merely checks to see whether the
process is there.
This particular tie class has chosen to return an error value rather
than raise an exception if its constructor fails. Other classes may
not wish to be so forgiving. (In any event, the tie itself will
throw an exception when the constructor fails to return an object. But
you get more error messages this way, which many folks seem to prefer.)
This routine checks the global variable $^W (which reflects Perl's
-w flag) to see whether to emit its extra bit of noise.
But for all that, it's an ordinary constructor, and doesn't know it's being
called from tie. It just suspects it strongly.
- FETCH THIS
-
This method is triggered every time the tied variable is accessed
(that is, read). It takes no arguments beyond a reference to the object
that is tied to the variable. (The FETCH methods for arrays and
hashes do, though.) Since in this case we're just using a scalar
thingy as the tied object, a simple scalar dereference, $$self,
allows the method to get at the real value stored in its object. In the
example below, that real value is the process ID to which we've tied our
variable.
sub FETCH {
my $self = shift; # ref to scalar
confess "wrong type" unless ref $self;
croak "too many arguments" if @_;
my $nicety;
local $! = 0; # preserve errno
$nicety = getpriority(PRIO_PROCESS, $$self);
if ($!) { croak "getpriority failed: $!" }
return $nicety;
}
This time we've decided to blow up (raise an exception) if the
getpriority function fails--there's no place for us to return an
error otherwise, and it's probably the right thing to do.
Note the absence of a $ on PRIO_PROCESS. That's really a
subroutine call into BSD::Resource that returns the appropriate constant
to feed back into getpriority. The PRIO_PROCESS
declaration was imported by the use declaration. And that's why
there's no $ on the front of it--it's not a variable. (If you
had put a $, the use strict would have caught it for you
as an unqualified global.)
- STORE THIS, VALUE
-
This method is triggered every time the tied variable is set
(assigned). The first argument, THIS, is again a reference to
the object associated with the variable, and VALUE is the value the
user is assigning to the variable.
sub STORE {
my $self = shift;
my $new_nicety = shift;
confess "wrong type" unless ref $self;
croak "too many arguments" if @_;
if ($new_nicety < PRIO_MIN) {
carp sprintf
"WARNING: priority %d less than minimum system priority %d",
$new_nicety, PRIO_MIN if $^W;
$new_nicety = PRIO_MIN;
}
if ($new_nicety > PRIO_MAX) {
carp sprintf
"WARNING: priority %d greater than maximum system priority %d",
$new_nicety, PRIO_MAX if $^W;
$new_nicety = PRIO_MAX;
}
unless (defined setpriority(PRIO_PROCESS, $$self, $new_nicety)) {
confess "setpriority failed: $!";
}
return $new_nicety;
}
There doesn't appear to be anything worth explaining there, except maybe
that we return the new value because that's what an assignment returns.
- DESTROY THIS
-
This method is triggered when the object associated with the
tied variable needs to be destructed (usually only when it goes out of
scope). As with other object classes, such a method is seldom necessary,
since Perl deallocates the moribund object's memory for you automatically.
Here, we'll use a DESTROY method for debugging purposes only.
sub DESTROY {
my $self = shift;
confess "wrong type" unless ref $self;
carp "[ Nice::DESTROY pid $$self ]" if $DEBUG;
}
That's about all there is to it. Actually, it's more than all there
is to it, since we've done a few nice things here for the sake
of completeness, robustness, and general aesthetics (or lack thereof).
Simpler TIESCALAR classes are certainly possible.
A class implementing a tied ordinary array must define the following
methods: TIEARRAY, FETCH, STORE, and perhaps
DESTROY.
Tied arrays are incomplete. There are, as yet,
no defined methods to deal with $#ARRAY access
(which is hard, since it's an lvalue), nor with the other obvious
array functions, like push, pop, shift,
unshift, and splice. This means that a tied array doesn't
behave like an untied one. You can't even determine the length of the
array. But if you use the tied arrays only for simple read and write
access you'll be OK. These restrictions will be removed in a future
release.
For the purpose of this discussion, we will implement an array whose indices are fixed at
its creation. If you try to access anything beyond those bounds, you will
cause an exception.
require Bounded_Array;
tie @ary, 'Bounded_Array', 2; # maximum allowable subscript is 2
$| = 1;
for $i (0 .. 10) {
print "setting index $i: ";
$ary[$i] = 10 * $i; # should raise exception on 3
print "value of element $i now $ary[$i]\n";
}
The preamble code for the class is as follows:
package Bounded_Array;
use Carp;
use strict;
- TIEARRAY CLASSNAME, LIST
-
This is the constructor for the class. That means it is expected to
return a blessed reference through which the new array (probably an
anonymous array reference) will be accessed.
In our example, just to demonstrate that you don't really have to
use an array thingy, we'll choose a hash thingy to
represent our object. A hash works out well as a generic record type:
the {BOUND} field will store the maximum bound allowed, and the
{ARRAY} field will hold the true array reference. Anyone outside
the class who tries to dereference the object returned (doubtless thinking
it an array reference), will blow up. This just goes to show
that you should respect an object's privacy (unless you're well
acquainted and committed to maintaining a good relationship for the
rest of your life).
sub TIEARRAY {
my $class = shift;
my $bound = shift;
confess "usage: tie(\@ary, 'Bounded_Array', max_subscript)"
if @_ or $bound =~ /\D/;
return bless {
BOUND => $bound,
ARRAY => [],
}, $class;
}
In this case we have used the anonymous hash composer rather than a
lexically scoped variable that goes out of scope. We also used the
array composer within the hash composer.
- FETCH THIS, INDEX
-
This method will be triggered every time an individual element in the
tied array is accessed (read). It takes one argument beyond its self
reference: the index we're trying to fetch. (The index is an
integer, but just because the caller thinks of it as a mundane integer
doesn't mean you have to do anything "linear" with it.
You could use it to seed a random number generator, for instance, or
process it with a hash function to do a random lookup in a hash table.)
Here we use list assignment rather than shift to process the
method arguments. TMTOWTDI.
sub FETCH {
my ($self, $idx) = @_;
if ($idx > $self->{BOUND}) {
confess "Array OOB: $idx > $self->{BOUND}";
}
return $self->{ARRAY}[$idx];
}
As you may have noticed, the names of the FETCH, STORE, and
DESTROY methods are
the same for all tied classes, even though the constructors differ in name
(TIESCALAR versus TIEARRAY). While in theory you could have the
same class servicing several tied types, in practice this becomes
cumbersome, and it's easiest to simply write them with one type per
class.
- STORE THIS, INDEX, VALUE
-
This method will be triggered every time an element in the tied array is set
(written). It takes two arguments beyond its self reference: the index at
which we're trying to store something and the value we're trying to put
there. For example:
sub STORE {
my ($self, $idx, $value) = @_;
if ($idx > $self->{BOUND} ) {
confess "Array OOB: $idx > $self->{BOUND}";
}
return $self->{ARRAY}[$idx] = $value;
}
- DESTROY THIS
-
This method will be triggered when the tied object needs to be
deallocated. As with the scalar tie class, this is almost never needed
in a language that does its own storage allocation, so this time we'll
just leave it out.
The code we presented at the beginning of this section attempts several
out-of-bounds accesses. It will therefore generate the following output:
setting index 0: value of element 0 now 0
setting index 1: value of element 1 now 10
setting index 2: value of element 2 now 20
setting index 3: Array OOB: 3 > 2 at Bounded_Array.pm line 39
Bounded_Array::FETCH called at testba line 12
For historical reasons, hashes have the most complete and useful tie
implementation. A class implementing a tied associative array must
define various methods. TIEHASH is the constructor. FETCH and
STORE access the key/value pairs. EXISTS reports
whether a key is present in the hash, and DELETE deletes one.
CLEAR empties the hash by deleting all the key/value pairs.
FIRSTKEY and NEXTKEY implement the keys and
each built-in functions to iterate over all the keys. And
DESTROY (if defined) is called when the tied object is
deallocated.
If this seems like a lot, then feel free to inherit most of these
methods from the standard Tie::Hash module, redefining only the
interesting ones. See the Tie::Hash module documentation in Chapter 7, The Standard Perl Library for
details.
Remember that Perl distinguishes a key not existing in the hash from
a key that exists with an undefined value.
The two possibilities can be tested with the exists and
defined functions, respectively.
Because functions like keys and values may return huge array
values when used on large hashes (like tied DBM files), you may
prefer to use the each function to iterate over such.
For example:
# print out B-news history file offsets
use NDBM_File;
tie(%HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0);
while (($key,$val) = each %HIST) {
print $key, ' = ', unpack('L',$val), "\n";
}
untie(%HIST);
(But does anyone run B-news any more?)
Here's an example of a somewhat peculiar tied hash class: it gives you
a hash representing a particular user's dotfiles (that is, files whose
names begin with a period). You index into the hash
with the name of the file (minus the period) and you get back that dotfile's
contents. For example:
use DotFiles;
tie %dot, "DotFiles";
if ( $dot{profile} =~ /MANPATH/ or
$dot{login} =~ /MANPATH/ or
$dot{cshrc} =~ /MANPATH/ )
{
print "you've set your manpath\n";
}
Here's another way to use our tied class:
# third argument is name of user whose dot files we will tie to
tie %him, 'DotFiles', 'daemon';
foreach $f ( keys %him ) {
printf "daemon dot file %s is size %d\n",
$f, length $him{$f};
}
In our DotFiles example we implement the object as a regular hash
containing several important fields, of which only the {CONTENTS}
field will be what the user thinks of as the real hash. Here are the
fields:
- USER
-
Whose dot files this object represents
- HOME
-
Where those dotfiles live
- CLOBBER
-
Whether we are allowed to change or remove those dot files
- CONTENTS
-
The hash of dotfile names and content mappings
Here's the start of DotFiles.pm:
package DotFiles;
use Carp;
sub whowasi { (caller(1))[3] . '()' }
my $DEBUG = 0;
sub debug { $DEBUG = @_ ? shift : 1 }
For our example, we want to be able to emit debugging information to help in
tracing during development. We also keep one convenience function around
internally to help print out warnings; whowasi() returns the name
of the function that called the current function (whowasi()'s
"grandparent" function).
Here are the methods for the DotFiles tied hash.
- TIEHASH CLASSNAME, LIST
-
This is the constructor for the class. That means it is expected to
return a blessed reference through which the new object may be accessed.
Again, the user of the tied class probably has little need of the object.
It's Perl itself that needs the returned object so that it can magically
call the right methods when the tied variable is accessed.
Here's the constructor:
sub TIEHASH {
my $self = shift;
my $user = shift || $>;
my $dotdir = shift || "";
croak "usage: @{[&whowasi]} [USER [DOTDIR]]" if @_;
$user = getpwuid($user) if $user =~ /^\d+$/;
my $dir = (getpwnam($user))[7]
or croak "@{[&whowasi]}: no user $user";
$dir .= "/$dotdir" if $dotdir;
my $node = {
USER => $user,
HOME => $dir,
CONTENTS => {},
CLOBBER => 0,
};
opendir DIR, $dir
or croak "@{[&whowasi]}: can't opendir $dir: $!";
foreach $dot ( grep /^\./ && -f "$dir/$_", readdir(DIR)) {
$dot =~ s/^\.//;
$node->{CONTENTS}{$dot} = undef;
}
closedir DIR;
return bless $node, $self;
}
It's probably worth mentioning that if you're going to filetest the
return values returned by that readdir, you'd better prepend the
directory in question (as we do). Otherwise, since no chdir was
done, you'd test the wrong file.
- FETCH THIS, KEY
-
This method will be triggered every time an element in the tied hash is
accessed (read). It takes one argument beyond its self reference: the key
whose value we're trying to fetch. The key is a string, and you can
do anything you like with it (consistent with its being a string).
Here's the fetch for our DotFiles example.
sub FETCH {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
my $dir = $self->{HOME};
my $file = "$dir/.$dot";
unless (exists $self->{CONTENTS}->{$dot} || -f $file) {
carp "@{[&whowasi]}: no $dot file" if $DEBUG;
return undef;
}
# Implement a cache.
if (defined $self->{CONTENTS}->{$dot}) {
return $self->{CONTENTS}->{$dot};
} else {
return $self->{CONTENTS}->{$dot} = `cat $dir/.$dot`;
}
}
This function was easy to write by having it call the UNIX
cat (1) command, but it would be more portable (and
somewhat more efficient) to open the file ourselves. On the other hand,
since dot files are a UNIXy concept, we're not that concerned.
- STORE THIS, KEY, VALUE
-
This method will be triggered every time an element in the tied hash is set
(written). It takes two arguments beyond its self reference: the key under
which we're storing the value and the value we're putting there.
Here in our DotFiles example we won't let users overwrite a file without
first calling the clobber()
method on the original object reference returned by tie.
sub STORE {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
my $value = shift;
my $file = $self->{HOME} . "/.$dot";
croak "@{[&whowasi]}: $file not clobberable"
unless $self->{CLOBBER};
open(F, "> $file") or croak "can't open $file: $!";
print F $value;
close(F);
}
If they want to clobber something, they can say:
$ob = tie %daemon_dots, 'daemon';
$ob->clobber(1);
$daemon_dots{signature} = "A true daemon\n";
But there's also the tied function, so they could alternatively
set clobber using:
tie %daemon_dots, 'daemon';
tied(%daemon_dots)->clobber(1);
The clobber method is simply:
sub clobber {
my $self = shift;
$self->{CLOBBER} = @_ ? shift : 1;
}
- DELETE THIS, KEY
-
This method is triggered when we remove an element from the hash,
typically by using the delete function. Again, we'll
be careful to check whether the user really wants to clobber files.
sub DELETE {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
my $file = $self->{HOME} . "/.$dot";
croak "@{[&whowasi]}: won't remove file $file"
unless $self->{CLOBBER};
delete $self->{CONTENTS}->{$dot};
unlink $file or carp "@{[&whowasi]}: can't unlink $file: $!";
}
- CLEAR THIS
-
This method is triggered when the whole hash is to be cleared, usually by
assigning the empty list to it.
In our example, that would remove all the user's dotfiles! It's such a
dangerous thing that we'll require CLOBBER
to be set higher than 1 before this can happen.
sub CLEAR {
carp &whowasi if $DEBUG;
my $self = shift;
croak "@{[&whowasi]}: won't remove all dotfiles for $self->{USER}"
unless $self->{CLOBBER} > 1;
my $dot;
foreach $dot ( keys %{$self->{CONTENTS}}) {
$self->DELETE($dot);
}
}
- EXISTS THIS, KEY
-
This method is triggered when the user invokes the exists function
on a particular hash. In our example, we'll look at the {CONTENTS}
hash element to find the answer:
sub EXISTS {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
return exists $self->{CONTENTS}->{$dot};
}
- FIRSTKEY THIS
-
This method is triggered when the user begins to iterate through the
hash, such as with a keys or each call. By calling keys in a scalar context, we reset its internal
state to ensure that the next each used
in the return statement will get the
first key.
sub FIRSTKEY {
carp &whowasi if $DEBUG;
my $self = shift;
my $a = keys %{$self->{CONTENTS}};
return scalar each %{$self->{CONTENTS}};
}
- NEXTKEY THIS, LASTKEY
-
This method is triggered during a keys
or each iteration. It has a second
argument which is the last key that has been accessed. This is useful
if the NEXTKEY method needs to know its previous
state to calculate the next state.
For our example, we are using a real hash to represent the tied
hash's data, except that this hash is stored in the hash's
CONTENTS field instead of in the hash itself. So we can just
rely on Perl's each iterator:
sub NEXTKEY {
carp &whowasi if $DEBUG;
my $self = shift;
return scalar each %{ $self->{CONTENTS} }
}
- DESTROY THIS
-
This method is triggered when a tied hash's object is about to be
deallocated. You don't really need it except for debugging and extra
cleanup. Here's a very simple
function:
sub DESTROY {
carp &whowasi if $DEBUG;
}
|