8. Object Orientation: The Next Few StepsNo ties bind so strongly as the links of inheritance. - Stephen Jay Gould This chapter is essentially a motley collection of ideas, techniques, and opinions related to Perl objects. I have not attempted to weave these threads too closely. The topics are as follows:
8.1 Efficient Attribute StorageHash tables have traditionally been used for storing object attributes. There are good reasons for doing this:
Of course, not every problem requires this degree of generality. In addition, while Perl's hash tables are fast (within 15% of the speed of arrays) and reasonably compact (key strings are not duplicated), they are not exactly inexpensive. Creating 100 objects means that you have 100 hash tables, each of which tends to optimistically allocate extra space to accommodate future insertions. This section illustrates two alternate approaches, one using arrays and another using typeglobs. Both approaches are less general than the hash table approach but are faster and leaner. The first is a module called ObjectTemplate developed for this book.[ 1 ] The other uses typeglobs and has seen limited application in some standard CPAN modules, most notably IO and Net. I hesitate to suggest this as an alternative approach because it is way too "hackish," but I present it here to enable you to understand these standard modules.
8.1.1 ObjectTemplate: Attribute Storage Using ArraysThe module presented in this section uses arrays to store attributes (but not the array per object approach). Let us briefly see its usage before moving on to the implementation. To implement the Employee class, with the attributes "name," "age," and "position," you simply inherit from ObjectTemplate, and supply a list of attribute names to a static method called attributes (exported by ObjectTemplate ), as follows: package Employee; use ObjectTemplate; # Import ObjectTemplate @ISA = qw(ObjectTemplate); # Inherit from it. attributes qw(name age position); # Declare your attributes That's all. A user of this module can now create Employee objects using a dynamically generated method called new and retrieve and modify attributes using accessor methods (also created automagically): use Employee; $obj = Employee->new( "name" => "Norma Jean", "age" => 25 ); # new() created by ObjectTemplate $obj->position("Actress"); print $obj->name, ":", $obj->age, "\n"; Note that Perl permits you to omit the trailing parentheses for any method call in which there is no ambiguity about its usage. Any word following an arrow is automatically treated as a method, as in the preceding case. ObjectTemplate provides the following features for an inherited class:
8.1.1.1 ObjectTemplate internals overviewFigure 8.1 shows how ObjectTemplate organizes object attributes. Figure 8.1: ObjectTemplate's attribute storage schemeThe data structure is quite simple. Instead of allocating one array or hash per object, ObjectTemplate creates only as many arrays as there are attributes (the columns shown in the figure). Each object is merely a "horizontal slice" across these attribute columns. When new() is called, it allocates a new logical row and inserts each element of the initializer array in the corresponding attribute column at the new row offset. The "object," therefore, is merely a blessed scalar containing that row index. This scheme is more space-efficient than the hash approach, because it creates so few container arrays (only as many as there are attributes), and it is faster because array accesses are always a little faster than hash accesses.
There's a slight hitch when an object is deleted. Although the corresponding row is logically free, we can't really move up the rest of the rows below, because the other object references (which are indices) and their data will get out of sync. ObjectTemplate therefore reuses deallocated (free) rows by maintaining a per-package "free list" called Since the freed and active rows do not overlap, we take the liberty of using one of the attribute columns (the first one) to hold @_free . This is done using typeglob aliasing. Figure 8.2 shows a snapshot of this structure. Figure 8.2: ObjectTemplate's scheme for managing holes created by deleted objectsYou might have noticed that I'm using the same identifier name, _free , for two variables, $_free and @free . Although I frown on this idea in general, I have used it here for two reasons. First, both are required for the same task; second, one typeglob alias gives us access to both variables in one shot. This is important for performance, as we shall see soon. 8.1.1.2 ObjectTemplate implementationObjectTemplate uses objects, typeglob aliasing, symbolic references, and eval liberally, so if you understand the code below, you can consider yourself a Perl hacker! One way to pore through this code is to read the descriptions supplied in this section while using the debugger to step through a small example that uses this module. Of course, you don't have to understand the code to use it. package ObjectTemplate; require Exporter; @ObjectTemplate::ISA = qw(Exporter); @ObjectTemplate::EXPORT = qw(attributes); my $debugging = 0; # assign 1 to it to see code generated on the fly # Create accessor methods, and new() sub attributes { my ($pkg) = caller; @{"${pkg}::_ATTRIBUTES_"} = @_; my $code = ""; foreach my $attr (get_attribute_names($pkg)) { # If a field name is "color", create a global array in the # calling package called @color @{"${pkg}::_$attr"} = (); # Define accessor only if it is not already present unless ($pkg->can("$attr")) { $code .= _define_accessor ($pkg, $attr); } } $code .= _define_constructor($pkg); eval $code; if ($@) { die "ERROR defining constructor and attributes for '$pkg':" . "\n\t$@\n" . "-----------------------------------------------------" . $code; } }
attributes
uses symbolic references to create a global array called
@_ATTRIBUTES
that remembers the attribute names. This array is then used by
get_attribute_names
to access all attributes defined in the current package and all its super classes. For each such attribute, sub _define_accessor { my ($pkg, $attr) = @_; # This code creates an accessor method for a given # attribute name. This method returns the attribute value # if given no args, and modifies it if given one arg. # Either way, it returns the latest value of that attribute # qq makes this block behave like a double-quoted string my $code = qq{ package $pkg; sub $attr { # Accessor ... \@_ > 1 ? \$_${attr} \[\${\$_[0]}] = \$_[1] # set : \$_${attr} \[\${\$_[0]}]; # get } if (!defined \$_free) { # Alias the first attribute column to _free \*_free = \*_$attr; \$_free = 0; }; }; $code; } _define_accessor is called for every field name given to attributes and for every attribute found in the module's superclasses. For an attribute called age in the Employee module, for example, it generates the following code: package Employee; sub age { # Accessor @_ ? $_age[$$_[0]] = $_[1]; # set : $_age[$$_[0]]; # get } if (!defined $_free) { *_free = *_age; # Alias the first attribute column #to _free $_free = 0; }; $_[0] contains the object, and $_[1] contains the attribute value. Therefore $$_[0] contains the row index, and $_age[$$_[0]] contains the value of the age attribute of that object. In addition, _define_accessor aliases _free to _age if the aliases don't already exist. sub _define_constructor { my $pkg = shift; my $code = qq{ package $pkg; sub new { my \$class = shift; my \$inst_id; if (defined(\$_free[\$_free])) { \$inst_id = \$_free; \$_free = \$_free[\$_free]; undef \$_free[\$inst_id]; } else { \$inst_id = \$_free++; } my \$obj = bless \\\$inst_id, \$class; \$obj->set_attributes(\@_) if \@_; \$obj->initialize; \$obj; } }; $code; } _define_constructor generates code for a constructor called new to be installed in the calling package. new checks the free list, and if it contains rows to spare, it uses the row number from the top of that list. It then undef 's the head of the list, because the free list is aliased to the first attribute column, and we don't want that attribute's assessor picking up garbage values. If the free list does not contain any spare rows, the object is assigned the next logical index. sub get_attribute_names { my $pkg = shift; $pkg = ref($pkg) if ref($pkg); my @result = @{"${pkg}::_ATTRIBUTES_"}; if (defined (@{"${pkg}::ISA"})) { foreach my $base_pkg (@{"${pkg}::ISA"}) { push (@result, get_attribute_names($base_pkg)); } } @result; } get_attribute_names recurses through the package's @ISA array to fetch all attribute names. This can be used by anyone requiring object meta-data (such as object persistence modules). # $obj->set_attributes (name => 'John', age => 23); # Or, $obj->set_attributes (['age'], [# sub set_attributes { my $obj = shift; my $attr_name; if (ref($_[0])) { my ($attr_name_list, $attr_value_list) = @_; my $i = 0; foreach $attr_name (@$attr_name_list) { $obj->$attr_name($attr_value_list->[$i++]); } } else { my ($attr_name, $attr_value); while (@_) { $attr_name = shift; $attr_value = shift; $obj->$attr_name($attr_value); } } } set_attributes is given a list of attribute name-value pairs and simply calls the accessor method for each attribute. It can also be called with two parameters; an array of names and an array of values. # @attrs = $obj->get_attributes (qw(name age)); sub get_attributes { my $obj = shift; my (@retval); map $obj->${_}(), @_; }
sub set_attribute { my ($obj, $attr_name, $attr_value) = @_; my ($pkg) = ref($obj); ${"${pkg}::_$attr_name"}[$$obj] = $attr_value; } sub get_attribute { my ($obj, $attr_name, $attr_value) = @_; my ($pkg) = ref($obj); return ${"${pkg}::_$attr_name"}[$$obj]; } The get/set_attribute pair updates a single attribute. Unlike the earlier pair of methods, this pair does not call an accessor; it updates the attribute directly. We saw earlier that attributes does not attempt to create accessor methods for those that already exist. But if the custom accessors still want to use the storage scheme provided by ObjectTemplate, they can use the get/set_attribute pair. The expression ${pkg}::_$attr_name represents the appropriate column attribute, and $$obj represents the logical row. (Recall that the object is simply a reference to an array index.) These methods are clearly not as fast as the generated accessor methods, because they use symbolic references (which involve variable interpolation in a string and an extra hash lookup). sub DESTROY { # release id back to free list my $obj = $_[0]; my $pkg = ref($obj); local *_free = *{"${pkg}::_free"}; my $inst_id = $$obj; # Release all the attributes in that row local(*attributes) = *{"${pkg}::_ATTRIBUTES_"}; foreach my $attr (@attributes) { undef ${"${pkg}::_$attr"}[$inst_id]; } $_free[$inst_id] = $_free; $_free = $inst_id; } DESTROY releases all attribute values corresponding to that object. This is necessary because the object is merely a reference to an array index, which, when freed, won't touch the reference counts of any of the attributes. A module defining its own DESTROY method must make sure that it always calls ObjectTemplate::DESTROY . sub initialize { }; # dummy method, if subclass doesn't define one. Modules are expected to override this method if they want to do specific initialization, in addition to what the automatically generated new() does. 8.1.1.3 Suggested changes to ObjectTemplateThere are (at least) two areas that could use considerable improvement. One is that get_attributes and set_attributes are slow because they always call accessor methods, even if they know which accessors are artificially provided. Because set_attributes is called by the automatically generated new , it slows down object construction dramatically. (Using this new without arguments is twice as fast as allocating an anonymous hash, but after invoking set_attributes , it is around three times slower.) Second, custom accessor methods suffer in speed because they are forced to invoke the other slow pair, get_attribute and set_attribute . Possibly a better alternative is to dynamically generate accessor methods prefixed with an "_", so that the developer can write normal accessor methods (without the prefix), and also call these private methods. You might also want to check out the MethodMaker module available on CPAN, and the Class::Template module that is bundled with the standard distribution. These modules also create accessor methods automatically but assume that the object representation is a hash table. If you like the interface these modules provide, you can attempt to merge their interface with the attribute storage scheme of ObjectTemplate. 8.1.2 Attribute Storage Using TypeglobsThis approach, as we mentioned earlier, is not exactly a paragon of readability and is presented here only because it is used in some freely available libraries on CPAN, like the IO and Net distributions. If you don't wish to understand how these modules work, you can easily skip this section without loss of continuity. We learned from Chapter 3, Typeglobs and Symbol Tables , that a typeglob contains pointers to different types of values. If we somehow make a typeglob into an object reference, we can treat these value pointers as attributes and access them very quickly. Consider the following foo typeglob: ${*foo} = "Oh, my!!" ; # Use the scalar part to store a string @{*foo} = (10, 20); # Use the array part to store an array open (foo, "foo.txt"); # Use it as a filehandle We are able to hang different types of values (at most one of each type) from just one identifier, foo . If we want many such objects, we can use the Symbol module in the Perl library to create references to dynamically created typeglobs: use Symbol; $obj = Symbol::gensym(); # ref to typeglob $obj contains a reference to a typeglob. The different parts of a typeglob can be individually accessed (by replacing foo with $obj ): ${*$obj} = "Oh, my!!" ; # Use the scalar part to store a string @{*$obj} = (10, 20); # Use the array part to store an array open ($obj, "foo"); # Use it as a filehandle Clearly, this is a hideous approach for most general objects; if you need another scalar-valued attribute, for example, you have no option but to put it in the hash part of this typeglob. The reason why the IO group of modules uses this hack is that an instance of any of these modules can be treated as a filehandle and passed directly (without dereferencing) to the built-in I/O functions such as read and write . For example:
$sock = new IO::Socket( ...
We'll use IO::Socket module extensively in the chapters on networking with sockets.[ 2 ]
Let us build a small module called File to examine this technique in greater detail. This module allows you to open a file and read the next line; in addition, it allows you to put back a line so that the next attempt to read the file returns that line: package main; $obj = File->open("File.pm"); print $obj->next_line(); $obj->put_back ("------------------------\n"); print $obj->next_line(); # Should print the string put back above print $obj->next_line(); Since this code opens the File module itself, it should print the following: package File; ------------------------ use Symbol; This module uses the scalar part of the typeglob object as a "putback" buffer, the array part of the typeglob to store all the lines read from the file, and the filehandle part of the typeglob to store the filehandle. The implementation of the File module is shown in Example 8.1 . Example 8.1: File Module, Built Using a Typeglob Representationpackage File; use Symbol; sub open { my ($pkg, $filename) = @_; $obj = gensym(); # Allocate a typeglob open ($obj, $filename) || return undef; # Use it as a filehandle bless $obj, $pkg; # Upgrade to a File "object" } sub put_back { my ($r_obj, $line) = @_; ${*$r_obj} = $line; # The scalar part holds the } # current line sub next_line { my $r_obj = $_[0]; my $retval; if (${*$r_obj}) { # Check putback buffer $retval = ${*$r_obj}; # yep, it's got stuff ${*$r_obj} = ""; # empty it. } else { $retval = <$r_obj>; # no. read from file push(@{*$r_obj}, $retval); # add to history list. } $retval; } 1; |
|