In older versions of Perl, a user could call
dbmopen
to
tie a hash to a UNIX DBM file. Whenever the
hash was accessed, the database file on disk (really just a hash,
not a full relational database) would be magically[
]
read from or written to. In modern versions of Perl, you can bind any
ordinary variable (scalar, array, or hash) to an implementation
class by using
tie
. (The class may or may not implement a
DBM file.) You can break this association with
untie
.
The
tie
function creates the association by creating an object
internally to represent the variable to the class. If you have a tied
variable, but want to get at the underlying object, there are
two ways to do it. First, the
tie
function returns a reference to the object.
But if you didn't bother to store that object reference
anywhere, you could still retrieve it using the
tied
function.
$object = tie
VARIABLE
,
CLASSNAME
,
LIST
untie
VARIABLE
$object = tied
VARIABLE
The
tie
function binds the variable to the class package that
provides the methods for that variable. Once
this magic has been performed, accessing a tied variable automatically
triggers method calls in the proper class. All the complexity of the
class is hidden behind magic method calls. The method names are
predetermined, since they're called implicitly from within the innards
of Perl. These names are in
ALL CAPS
, which is a convention in Perl
culture that indicates that the routines are called implicitly rather than
explicitly - just like
BEGIN
,
END
, and
DESTROY
.
And
AUTOLOAD
too, for that matter.
You can almost think of
tie
as a funny kind of
bless
, except
that it blesses a bare variable instead of a thingy reference, and
takes extra parameters, like a constructor. That's because it actually
does call a constructor internally. (That's one of the magic methods we
mentioned.) This constructor is passed the
CLASSNAME
you specified, as
well as any additional arguments you supply in the
LIST
. It is not
passed the
VARIABLE
, however. The only way the constructor can
tell which kind of
VARIABLE
is being tied is by knowing its
own method name. This is not the customary constructor name,
new
, but rather one of
TIESCALAR
,
TIEARRAY
, or
TIEHASH
. (You can likely figure out which name goes with which
variable type.) The constructor just returns an object reference in the
normal fashion, and doesn't worry about whether it was called from
tie
- which it may not have been, since you can call these methods
directly if you like. (Indeed, if you've tied your variable to a class
that provides other methods not accessible through the variable,
you
must
call the other methods directly yourself, via the object
reference. These extra methods might provide services like file locking
or other forms of transaction protection.)
As in any constructor, these constructors must
bless
a reference to a
thingy and return it as the implementation object. The thingy inside
the implementation object doesn't have to be of the same type as the
variable you're tying to. It does have to be a properly blessed
object, though. See the example below on tied arrays, which uses a hash
object to hold information about an array.
The
tie
function will not
use
or
require
a module for you - you
must do that explicitly yourself. (On the other hand, the
dbmopen
emulator function will, for backward compatibility, attempt to
use one or another DBM implementation. But you can
preempt its selection with an explicit
use
, provided the module
you
use
is one of the modules in
dbmopen
's list of modules
to try.
See the AnyDBM_File module in
Chapter 7
for a fuller explanation.)
A class implementing a tied scalar must define the following methods:
TIESCALAR
,
FETCH
,
STORE
, and possibly
DESTROY
. These routines will be invoked implicitly when you
tie
a variable (
TIESCALAR
), read a tied variable
(
FETCH
), or assign a value to a tied variable (
STORE
). The
DESTROY
method is called (as always) when the last reference to
the object disappears. (This may or may not happen when you call
untie
, which destroys the reference used by the tie, but doesn't
destroy any outstanding references you may have squirreled away
elsewhere.) The
FETCH
and
STORE
methods are triggered
when you access the variable that's been tied, not the object it's been
tied to. If you have a handle on the object (either returned by the
initial
tie
or retrieved later via
tied
), you can access the
underlying object yourself without automatically triggering its
FETCH
or
STORE
methods.
Let's look at each of these methods in turn, using as our example an
imaginary class called
Nice
.[
]
Variables tied to this class are scalars containing process priorities,
and each such variable is implicitly associated with an object that
contains a particular process ID, such the ID of the currently running
process or of the parent process. (Presumably you'd name your variables
to remind you which process you're referring to.) Variables are tied to
the class this way:
use Nice; # load the Nice.pm module
tie $his_speed, 'Nice', getppid();
tie $my_speed, 'Nice', $$;
Once the variables have been tied, their previous contents are no longer
accessible. The internally forged connection between the variable and
the object takes precedence over ordinary variable semantics.
For example, let's say you copy a variable that's been tied:
$speed = $his_speed;
Instead of reading the value in the ordinary fashion from the
$his_speed
scalar variable, Perl implicitly calls the
FETCH
method on the associated underlying object. It's as
though you'd written this:
$speed = (tied $his_speed)->FETCH():
Or if you'd captured the object returned by the
tie
, you
could simply use that reference instead of the
tied
function, as in the following sample code.
$myobj = tie $my_speed, 'Nice', $$;
$speed = $my_speed; # through the implicit interface
$speed = $myobj->FETCH(); # same thing, explicitly
You can use
$myobj
to call methods other than the implicit ones, such
as those provided by the
DB_File
class (see
Chapter 7
).
However, one normally minds one's own business and leaves the underlying
object alone, which is why you often see the return value from
tie
ignored. You can still get at it if you need it later.
That's the external view of it. For our implementation, we'll use the
BSD::Resource
class (found in CPAN, but not included with
Perl) to access the
PRIO_PROCESS
,
PRIO_MIN
, and
PRIO_MAX
constants from your system. Here's the
preamble of our class, which we will put into a file named
Nice.pm
:
package Nice;
use Carp; # Propagates error messages nicely.
use BSD::Resource; # Use these hooks into the OS.
use strict; # Enforce some discipline on ourselves,
use vars '$DEBUG'; # but exempt $DEBUG from discipline.
The Carp module provides methods
carp()
,
croak()
,
and
confess()
, which we'll use in various spots below. As
usual, see
Chapter 7
for more about Carp.
The
use strict
would ordinarily disallow the use of unqualified
package variables like
$DEBUG
, but we then declared the global
with
use vars
, so it's exempt. Otherwise we'd have to say
$Nice::DEBUG
everywhere. But it is a global, and other modules
can turn on debugging in our module by setting
$Nice::DEBUG
to
some other value before using our module.
-
TIESCALAR
CLASSNAME
,
LIST
-
The
TIESCALAR
method of the class (that is, the
class package, but we're going to stop reminding you of that) is
implicitly invoked whenever
tie
is
called on a scalar variable. The
LIST
contains
any optional parameters needed to properly initialize an object of the
given class. (In our example, there is only one parameter, the
process ID.) The method is expected to return an object, which may or
may not contain an anonymous scalar as its blessed thingy. In our
example, it does.
sub TIESCALAR {
my $class = shift;
my $pid = shift;
$pid ||= $$; # arg of 0 defaults to my process
if ($pid =~ /\D/) {
carp "Nice::TIESCALAR got non-numeric pid $pid" if $^W;
return undef;
}
unless (kill 0, $pid) { # EPERM or ERSCH, no doubt
carp "Nice::TIESCALAR got bad pid $pid: $!" if $^W;
return undef;
}
return bless \$pid, $class;
}
Recall that the statement with the
||=
operator is just shorthand for
$pid = $pid || $$; # set if not set
We say the object contains an anonymous scalar, but it doesn't really
become anonymous until
my $pid
goes out of scope, since that's the
variable we're generating a reference to when we bestow the blessing.
When returning a reference to an array or hash, one could use the same
approach by employing a lexically scoped array or hash variable, but
usually people just use the anonymous array or hash composers,
[]
and
{}
. There is no similar composer for anonymous
scalars.
On the subject of subterfuge, the
kill
isn't really killing the
process. On most UNIX systems, a signal 0 merely checks to see whether the
process is there.
This particular
tie
class has chosen to return an error value rather
than raise an exception if its constructor fails. Other classes may
not wish to be so forgiving. (In any event, the
tie
itself will
throw an exception when the constructor fails to return an object. But
you get more error messages this way, which many folks seem to prefer.)
This routine checks the global variable
$^W
(which reflects Perl's
-w
flag) to see whether to emit its extra bit of noise.
But for all that, it's an ordinary constructor, and doesn't know it's being
called from
tie
. It just suspects it strongly.
-
FETCH
THIS
-
This method is triggered every time the tied variable is accessed
(that is, read). It takes no arguments beyond a reference to the object
that is tied to the variable. (The
FETCH
methods for arrays and
hashes do, though.) Since in this case we're just using a scalar
thingy as the tied object, a simple scalar dereference,
$$self
,
allows the method to get at the real value stored in its object. In the
example below, that real value is the process ID to which we've tied our
variable.
sub FETCH {
my $self = shift; # ref to scalar
confess "wrong type" unless ref $self;
croak "too many arguments" if @_;
my $nicety;
local $! = 0; # preserve errno
$nicety = getpriority(PRIO_PROCESS, $$self);
if ($!) { croak "getpriority failed: $!" }
return $nicety;
}
This time we've decided to blow up (raise an exception) if the
getpriority
function fails - there's no place for us to return an
error otherwise, and it's probably the right thing to do.
Note the absence of a
$
on
PRIO_PROCESS
. That's really a
subroutine call into
BSD::Resource
that returns the appropriate constant
to feed back into
getpriority
. The
PRIO_PROCESS
declaration was imported by the
use
declaration. And that's why
there's no
$
on the front of it - it's not a variable. (If you
had put a
$
, the
use strict
would have caught it for you
as an unqualified global.)
-
STORE
THIS, VALUE
-
This method is triggered every time the tied variable is set
(assigned). The first argument,
THIS
, is again a reference to
the object associated with the variable, and
VALUE
is the value the
user is assigning to the variable.
sub STORE {
my $self = shift;
my $new_nicety = shift;
confess "wrong type" unless ref $self;
croak "too many arguments" if @_;
if ($new_nicety < PRIO_MIN) {
carp sprintf
"WARNING: priority %d less than minimum system priority %d",
$new_nicety, PRIO_MIN if $^W;
$new_nicety = PRIO_MIN;
}
if ($new_nicety > PRIO_MAX) {
carp sprintf
"WARNING: priority %d greater than maximum system priority %d",
$new_nicety, PRIO_MAX if $^W;
$new_nicety = PRIO_MAX;
}
unless (defined setpriority(PRIO_PROCESS, $$self, $new_nicety)) {
confess "setpriority failed: $!";
}
return $new_nicety;
}
There doesn't appear to be anything worth explaining there, except maybe
that we return the new value because that's what an assignment returns.
-
DESTROY
THIS
-
This method is triggered when the object associated with the
tied variable needs to be destructed (usually only when it goes out of
scope). As with other object classes, such a method is seldom necessary,
since Perl deallocates the moribund object's memory for you automatically.
Here, we'll use a
DESTROY
method for debugging purposes only.
sub DESTROY {
my $self = shift;
confess "wrong type" unless ref $self;
carp "[ Nice::DESTROY pid $$self ]" if $DEBUG;
}
That's about all there is to it. Actually, it's more than all there
is to it, since we've done a few nice things here for the sake
of completeness, robustness, and general aesthetics (or lack thereof).
Simpler
TIESCALAR
classes are certainly possible.
A class implementing a tied ordinary array must define the following
methods:
TIEARRAY
,
FETCH
,
STORE
, and perhaps
DESTROY
.
Tied arrays are incomplete. There are, as yet,
no defined methods to deal with
$#ARRAY
access
(which is hard, since it's an lvalue), nor with the other obvious
array functions, like
push
,
pop
,
shift
,
unshift
, and
splice
. This means that a tied array doesn't
behave like an untied one. You can't even determine the length of the
array. But if you use the tied arrays only for simple read and write
access you'll be OK. These restrictions will be removed in a future
release.
For the purpose of this discussion, we will implement an array whose indices are fixed at
its creation. If you try to access anything beyond those bounds, you will
cause an exception.
require Bounded_Array;
tie @ary, 'Bounded_Array', 2; # maximum allowable subscript is 2
$| = 1;
for $i (0 .. 10) {
print "setting index $i: ";
$ary[$i] = 10 * $i; # should raise exception on 3
print "value of element $i now $ary[$i]\n";
}
The preamble code for the class is as follows:
package Bounded_Array;
use Carp;
use strict;
-
TIEARRAY
CLASSNAME
,
LIST
-
This is the constructor for the class. That means it is expected to
return a blessed reference through which the new array (probably an
anonymous array reference) will be accessed.
In our example, just to demonstrate that you don't really have to
use an array thingy, we'll choose a hash thingy to
represent our object. A hash works out well as a generic record type:
the
{BOUND}
field will store the maximum bound allowed, and the
{ARRAY}
field will hold the true array reference. Anyone outside
the class who tries to dereference the object returned (doubtless thinking
it an array reference), will blow up. This just goes to show
that you should respect an object's privacy (unless you're well
acquainted and committed to maintaining a good relationship for the
rest of your life).
sub TIEARRAY {
my $class = shift;
my $bound = shift;
confess "usage: tie(\@ary, 'Bounded_Array', max_subscript)"
if @_ or $bound =~ /\D/;
return bless {
BOUND => $bound,
ARRAY => [],
}, $class;
}
In this case we have used the anonymous hash composer rather than a
lexically scoped variable that goes out of scope. We also used the
array composer within the hash composer.
-
FETCH
THIS, INDEX
-
This method will be triggered every time an individual element in the
tied array is accessed (read). It takes one argument beyond its self
reference: the index we're trying to fetch. (The index is an
integer, but just because the caller thinks of it as a mundane integer
doesn't mean you have to do anything "linear" with it.
You could use it to seed a random number generator, for instance, or
process it with a hash function to do a random lookup in a hash table.)
Here we use list assignment rather than
shift
to process the
method arguments. TMTOWTDI.
sub FETCH {
my ($self, $idx) = @_;
if ($idx > $self->{BOUND}) {
confess "Array OOB: $idx > $self->{BOUND}";
}
return $self->{ARRAY}[$idx];
}
As you may have noticed, the names of the
FETCH
,
STORE
, and
DESTROY
methods are
the same for all tied classes, even though the constructors differ in name
(
TIESCALAR
versus
TIEARRAY
). While in theory you could have the
same class servicing several tied types, in practice this becomes
cumbersome, and it's easiest to simply write them with one type per
class.
-
STORE
THIS, INDEX, VALUE
-
This method will be triggered every time an element in the tied array is set
(written). It takes two arguments beyond its self reference: the index at
which we're trying to store something and the value we're trying to put
there. For example:
sub STORE {
my ($self, $idx, $value) = @_;
if ($idx > $self->{BOUND} ) {
confess "Array OOB: $idx > $self->{BOUND}";
}
return $self->{ARRAY}[$idx] = $value;
}
-
DESTROY
THIS
-
This method will be triggered when the tied object needs to be
deallocated. As with the scalar tie class, this is almost never needed
in a language that does its own storage allocation, so this time we'll
just leave it out.
The code we presented at the beginning of this section attempts several
out-of-bounds accesses. It will therefore generate the following output:
setting index 0: value of element 0 now 0
setting index 1: value of element 1 now 10
setting index 2: value of element 2 now 20
setting index 3: Array OOB: 3 > 2 at Bounded_Array.pm line 39
Bounded_Array::FETCH called at testba line 12
For historical reasons, hashes have the most complete and useful
tie
implementation. A class implementing a tied associative array must
define various methods.
TIEHASH
is the constructor.
FETCH
and
STORE
access the key/value pairs.
EXISTS
reports
whether a key is present in the hash, and
DELETE
deletes one.
CLEAR
empties the hash by deleting all the key/value pairs.
FIRSTKEY
and
NEXTKEY
implement the
keys
and
each
built-in functions to iterate over all the keys. And
DESTROY
(if defined) is called when the tied object is
deallocated.
If this seems like a lot, then feel free to inherit most of these
methods from the standard Tie::Hash module, redefining only the
interesting ones. See the Tie::Hash module documentation in
Chapter 7
for
details.
Remember that Perl distinguishes a key not existing in the hash from
a key that exists with an undefined value.
The two possibilities can be tested with the
exists
and
defined
functions, respectively.
Because functions like
keys
and
values
may return huge array
values when used on large hashes (like tied DBM files), you may
prefer to use the
each
function to iterate over such.
For example:
# print out B-news history file offsets
use NDBM_File;
tie(%HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0);
while (($key,$val) = each %HIST) {
print $key, ' = ', unpack('L',$val), "\n";
}
untie(%HIST);
(But does anyone run B-news any more?)
Here's an example of a somewhat peculiar tied hash class: it gives you
a hash representing a particular user's dotfiles (that is, files whose
names begin with a period). You index into the hash
with the name of the file (minus the period) and you get back that dotfile's
contents. For example:
use DotFiles;
tie %dot, "DotFiles";
if ( $dot{profile} =~ /MANPATH/ or
$dot{login} =~ /MANPATH/ or
$dot{cshrc} =~ /MANPATH/ )
{
print "you've set your manpath\n";
}
Here's another way to use our tied class:
# third argument is name of user whose dot files we will tie to
tie %him, 'DotFiles', 'daemon';
foreach $f ( keys %him ) {
printf "daemon dot file %s is size %d\n",
$f, length $him{$f};
}
In our DotFiles example we implement the object as a regular hash
containing several important fields, of which only the
{CONTENTS}
field will be what the user thinks of as the real hash. Here are the
fields:
-
USER
-
Whose dot files this object represents
-
HOME
-
Where those dotfiles live
-
CLOBBER
-
Whether we are allowed to change or remove those dot files
-
CONTENTS
-
The hash of dotfile names and content mappings
Here's the start of
DotFiles.pm
:
package DotFiles;
use Carp;
sub whowasi { (caller(1))[3] . '()' }
my $DEBUG = 0;
sub debug { $DEBUG = @_ ? shift : 1 }
For our example, we want to be able to emit debugging information to help in
tracing during development. We also keep one convenience function around
internally to help print out warnings;
whowasi()
returns the name
of the function that called the current function (
whowasi()
's
"grandparent" function).
Here are the methods for the DotFiles tied hash.
-
TIEHASH
CLASSNAME
,
LIST
-
This is the constructor for the class. That means it is expected to
return a blessed reference through which the new object may be accessed.
Again, the user of the tied class probably has little need of the object.
It's Perl itself that needs the returned object so that it can magically
call the right methods when the tied variable is accessed.
Here's the constructor:
sub TIEHASH {
my $self = shift;
my $user = shift || $>;
my $dotdir = shift || "";
croak "usage: @{[&whowasi]} [USER [DOTDIR]]" if @_;
$user = getpwuid($user) if $user =~ /^\d+$/;
my $dir = (getpwnam($user))[7]
or croak "@{[&whowasi]}: no user $user";
$dir .= "/$dotdir" if $dotdir;
my $node = {
USER => $user,
HOME => $dir,
CONTENTS => {},
CLOBBER => 0,
};
opendir DIR, $dir
or croak "@{[&whowasi]}: can't opendir $dir: $!";
foreach $dot ( grep /^\./ && -f "$dir/$_", readdir(DIR)) {
$dot =~ s/^\.//;
$node->{CONTENTS}{$dot} = undef;
}
closedir DIR;
return bless $node, $self;
}
It's probably worth mentioning that if you're going to filetest the
return values returned by that
readdir
, you'd better prepend the
directory in question (as we do). Otherwise, since no
chdir
was
done, you'd test the wrong file.
-
FETCH
THIS, KEY
-
This method will be triggered every time an element in the tied hash is
accessed (read). It takes one argument beyond its self reference: the key
whose value we're trying to fetch. The key is a string, and you can
do anything you like with it (consistent with its being a string).
Here's the fetch for our DotFiles example.
sub FETCH {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
my $dir = $self->{HOME};
my $file = "$dir/.$dot";
unless (exists $self->{CONTENTS}->{$dot} || -f $file) {
carp "@{[&whowasi]}: no $dot file" if $DEBUG;
return undef;
}
# Implement a cache.
if (defined $self->{CONTENTS}->{$dot}) {
return $self->{CONTENTS}->{$dot};
} else {
return $self->{CONTENTS}->{$dot} = `cat $dir/.$dot`;
}
}
This function was easy to write by having it call the UNIX
cat
(1) command, but it would be more portable (and
somewhat more efficient) to open the file ourselves. On the other hand,
since dot files are a UNIXy concept, we're not that concerned.
-
STORE
THIS, KEY, VALUE
-
This method will be triggered every time an element in the tied hash is set
(written). It takes two arguments beyond its self reference: the key under
which we're storing the value and the value we're putting there.
Here in our DotFiles example we won't let users overwrite a file without
first calling the
clobber()
method on the original object reference returned by
tie
.
sub STORE {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
my $value = shift;
my $file = $self->{HOME} . "/.$dot";
croak "@{[&whowasi]}: $file not clobberable"
unless $self->{CLOBBER};
open(F, "> $file") or croak "can't open $file: $!";
print F $value;
close(F);
}
If they want to clobber something, they can say:
$ob = tie %daemon_dots, 'daemon';
$ob->clobber(1);
$daemon_dots{signature} = "A true daemon\n";
But there's also the
tied
function, so they could alternatively
set
clobber
using:
tie %daemon_dots, 'daemon';
tied(%daemon_dots)->clobber(1);
The
clobber
method is simply:
sub clobber {
my $self = shift;
$self->{CLOBBER} = @_ ? shift : 1;
}
-
DELETE
THIS, KEY
-
This method is triggered when we remove an element from the hash,
typically by using the
delete
function. Again, we'll
be careful to check whether the user really wants to clobber files.
sub DELETE {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
my $file = $self->{HOME} . "/.$dot";
croak "@{[&whowasi]}: won't remove file $file"
unless $self->{CLOBBER};
delete $self->{CONTENTS}->{$dot};
unlink $file or carp "@{[&whowasi]}: can't unlink $file: $!";
}
-
CLEAR
THIS
-
This method is triggered when the whole hash is to be cleared, usually by
assigning the empty list to it.
In our example, that would remove all the user's dotfiles! It's such a
dangerous thing that we'll require
CLOBBER
to be set higher than
1
before this can happen.
sub CLEAR {
carp &whowasi if $DEBUG;
my $self = shift;
croak "@{[&whowasi]}: won't remove all dotfiles for $self->{USER}"
unless $self->{CLOBBER} > 1;
my $dot;
foreach $dot ( keys %{$self->{CONTENTS}}) {
$self->DELETE($dot);
}
}
-
EXISTS
THIS, KEY
-
This method is triggered when the user invokes the
exists
function
on a particular hash. In our example, we'll look at the
{CONTENTS}
hash element to find the answer:
sub EXISTS {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
return exists $self->{CONTENTS}->{$dot};
}
-
FIRSTKEY
THIS
-
This method is triggered when the user begins to iterate through the
hash, such as with a
keys
or
each
call. By calling
keys
in a scalar context, we reset its internal
state to ensure that the next
each
used
in the
return
statement will get the
first key.
sub FIRSTKEY {
carp &whowasi if $DEBUG;
my $self = shift;
my $a = keys %{$self->{CONTENTS}};
return scalar each %{$self->{CONTENTS}};
}
-
NEXTKEY
THIS, LASTKEY
-
This method is triggered during a
keys
or
each
iteration. It has a second
argument which is the last key that has been accessed. This is useful
if the
NEXTKEY
method needs to know its previous
state to calculate the next state.
For our example, we are using a real hash to represent the tied
hash's data, except that this hash is stored in the hash's
CONTENTS
field instead of in the hash itself. So we can just
rely on Perl's
each
iterator:
sub NEXTKEY {
carp &whowasi if $DEBUG;
my $self = shift;
return scalar each %{ $self->{CONTENTS} }
}
-
DESTROY
THIS
-
This method is triggered when a tied hash's object is about to be
deallocated. You don't really need it except for debugging and extra
cleanup. Here's a very simple
function:
sub DESTROY {
carp &whowasi if $DEBUG;
}
|