More Elaborate Records (Programming Perl)

9.6. More Elaborate Records

So far, what we've seen in this chapter are simple, two-level, homogeneous data structures: each element contains the same kind of referent as all the other elements at that level. It certainly doesn't have to be that way. Any element can hold any kind of scalar, which means that it could be a string, a number, or a reference to anything at all. The reference could be an array or hash reference, or a pseudohash, or a reference to a named or anonymous function, or an object. The only thing you can't do is to stuff multiple referents into one scalar. If you find yourself trying to do that, it's a sign that you need an array or hash reference to collapse multiple values into one.
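
As a minimal sketch of that last point, a scalar slot holds exactly one value, so several values have to be wrapped in an anonymous array and stored by reference. The %person hash and its phones field are illustrative names, not part of the examples that follow.

```perl
use strict;
use warnings;

my %person;

# A list assigned to a scalar slot does not store the list; only the
# last value survives (and Perl warns about the discarded constant).
# $person{phones} = ("555-1212", "555-3434");   # would store "555-3434"

# Wrap the values in an anonymous array instead; the slot then holds
# a single scalar, namely a reference to that array.
$person{phones} = [ "555-1212", "555-3434" ];

my $count = @{ $person{phones} };   # array in scalar context: element count
print "has $count numbers, first is $person{phones}[0]\n";
```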

In the sections that follow, you will find code examples designed to illustrate many of the possible types of data you might want to store in a record, which we'll implement using a hash reference. The keys are uppercase strings, a convention sometimes employed (and occasionally unemployed, but only briefly) when the hash is being used as a specific record type.

9.6.1. Composition, Access, and Printing of More Elaborate Records

Here is a record with six disparate fields:

$rec = {
    TEXT      => $string,
    SEQUENCE  => [ @old_values ],
    LOOKUP    => { %some_table },
    THATCODE  => \&some_function,
    THISCODE  => sub { $_[0] ** $_[1] },
    HANDLE    => \*STDOUT,
};

The TEXT field is a simple string, so you can just print it:

print $rec->{TEXT};

SEQUENCE and LOOKUP are regular array and hash references:

print $rec->{SEQUENCE}[0];
$last = pop @{ $rec->{SEQUENCE} };

print $rec->{LOOKUP}{"key"};
($first_k, $first_v) = each %{ $rec->{LOOKUP} };

THATCODE is a named subroutine and THISCODE is an anonymous subroutine, but they're invoked identically:

$that_answer = $rec->{THATCODE}->($arg1, $arg2);
$this_answer = $rec->{THISCODE}->($arg1, $arg2);

With an extra pair of braces, you can treat $rec->{HANDLE} as an indirect object:

print { $rec->{HANDLE} } "a string\n";

If you're using the FileHandle module, you can even treat the handle as a regular object:

use FileHandle;
$rec->{HANDLE}->autoflush(1);
$rec->{HANDLE}->print("a string\n");
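
The fragments above assume that $string, @old_values, %some_table, and some_function already exist. The following sketch supplies hypothetical stand-ins for them (the values and the body of some_function are invented for illustration) so that the record can be built and exercised in one file:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the variables the record is built from.
my $string     = "hello";
my @old_values = (10, 20, 30);
my %some_table = (key => "value");
sub some_function { return $_[0] + $_[1] }

my $rec = {
    TEXT      => $string,
    SEQUENCE  => [ @old_values ],        # copy of the array
    LOOKUP    => { %some_table },        # copy of the hash
    THATCODE  => \&some_function,
    THISCODE  => sub { $_[0] ** $_[1] },
    HANDLE    => \*STDOUT,
};

my $last        = pop @{ $rec->{SEQUENCE} };   # removes 30 from the copy
my $that_answer = $rec->{THATCODE}->(2, 5);    # 2 + 5
my $this_answer = $rec->{THISCODE}->(2, 5);    # 2 ** 5

print { $rec->{HANDLE} } "$rec->{TEXT}: $last $that_answer $this_answer\n";

# @old_values is untouched because SEQUENCE holds a copy of it.
print { $rec->{HANDLE} } scalar(@old_values), " original values remain\n";
```

Because [ @old_values ] and { %some_table } build new anonymous structures, later changes made through the record do not affect the original variables.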

9.6.2. Composition, Access, and Printing of Even More Elaborate Records

Naturally, the fields of your data structures can themselves be arbitrarily complex data structures in their own right:

%TV = (
    flintstones => {
        series   => "flintstones",
        nights   => [ "monday", "thursday", "friday" ],
        members  => [
            { name => "fred",    role => "husband", age  => 36, },
            { name => "wilma",   role => "wife",    age  => 31, },
            { name => "pebbles", role => "kid",     age  =>  4, },
        ],
    },

    jetsons     => {
        series   => "jetsons",
        nights   => [ "wednesday", "saturday" ],
        members  => [
            { name => "george",  role => "husband", age  => 41, },
            { name => "jane",    role => "wife",    age  => 39, },
            { name => "elroy",   role => "kid",     age  =>  9, },
        ],
    },

    simpsons    => {
        series   => "simpsons",
        nights   => [ "monday" ],
        members  => [
            { name => "homer", role => "husband", age => 34, },
            { name => "marge", role => "wife",    age => 37, },
            { name => "bart",  role => "kid",     age => 11, },
        ],
    },
);
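
Reaching into a structure like this is a matter of chaining subscripts. The sketch below uses a trimmed copy of %TV containing only the simpsons entry so that it runs on its own; the variable names on the left-hand sides are illustrative.

```perl
use strict;
use warnings;

# A trimmed copy of the structure above, so the sketch is self-contained.
my %TV = (
    simpsons => {
        series  => "simpsons",
        nights  => [ "monday" ],
        members => [
            { name => "homer", role => "husband", age => 34 },
            { name => "marge", role => "wife",    age => 37 },
            { name => "bart",  role => "kid",     age => 11 },
        ],
    },
);

# Arrows between adjacent subscripts are optional, so each level is
# just another [...] or {...} appended to the expression.
my $first_night = $TV{simpsons}{nights}[0];
my $second_name = $TV{simpsons}{members}[1]{name};

# Dereference a whole inner array to loop over it or to count it.
my @names = map { $_->{name} } @{ $TV{simpsons}{members} };
my $cast  = @{ $TV{simpsons}{members} };

print "$first_night: $second_name, one of $cast (@names)\n";
```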

9.6.3. Generation of a Hash of Complex Records

Because Perl is quite good at parsing complex data structures, you might just put your data declarations in a separate file as regular Perl code, and then load them in with the do or require built-in functions. Another popular approach is to use a CPAN module (such as XML::Parser) to load in arbitrary data structures expressed in some other language (such as XML).
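
A minimal sketch of the first approach, assuming the data file's last expression is an anonymous hash. The file is created with File::Temp only to keep the example self-contained; in practice it would already exist on disk. The leading + is there to make sure Perl parses the braces as a hash constructor rather than as a block.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small data file containing ordinary Perl code.
my ($fh, $filename) = tempfile(SUFFIX => ".pl", UNLINK => 1);
print $fh <<'END_DATA';
+{
    series => "jetsons",
    nights => [ "wednesday", "saturday" ],
};
END_DATA
close $fh or die "cannot close $filename: $!";

# do FILE compiles and runs the file, returning the value of its last
# expression, which here is the anonymous hash.
my $rec = do $filename;
unless ($rec) {
    die "cannot parse $filename: $@" if $@;
    die "cannot read $filename: $!";
}

print "$rec->{series} airs on @{ $rec->{nights} }\n";
```

Since do executes whatever code the file contains, this technique is suitable only for files you trust.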

You can build data structures piecemeal:

$rec = {};
$rec->{series} = "flintstones";
$rec->{nights} = [ find_days() ];

Or read them in from a file (here, assumed to be in field=value syntax):

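@members = ();
while (<>) {
    %fields = split /[\s=]+/;     # "name=fred role=husband" becomes (name => "fred", role => "husband")
    push @members, { %fields };   # store a fresh copy of the hash for each line
}
$rec->{members} = [ @members ];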

And fold them into larger data structures keyed by one of the subfields:

$TV{ $rec->{series} } = $rec;
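
The reading and folding steps can be tried without an input file by opening a filehandle on a string. This is only a sketch: the sample lines are invented, and it assumes that neither field names nor values contain whitespace or equals signs.

```perl
use strict;
use warnings;

# Sample input in the assumed field=value syntax, one member per line.
my $input = <<'END_INPUT';
name=fred role=husband age=36
name=wilma role=wife age=31
name=pebbles role=kid age=4
END_INPUT

# Read from the string through an in-memory filehandle.
open my $in, '<', \$input or die "cannot open in-memory file: $!";

my @members;
while (my $line = <$in>) {
    next unless $line =~ /\S/;            # skip blank lines
    $line =~ s/^\s+//;                    # leading space would yield an empty first field
    my %fields = split /[\s=]+/, $line;   # (name, fred, role, husband, age, 36)
    push @members, { %fields };
}
close $in;

# Build the record and file it under its series name.
my %TV;
my $rec = { series => "flintstones", members => [ @members ] };
$TV{ $rec->{series} } = $rec;

print "$_->{name} is the $_->{role}\n" for @{ $TV{flintstones}{members} };
```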

You can use extra pointer fields to avoid duplicate data. For example, you might want a "kids" field included in a person's record, which might be a reference to an array containing references to the kids' own records. By having parts of your data structure refer to other parts, you avoid the data skew that would result from updating the data in one place but not in another:

for $family (keys %TV) {
    my $rec = $TV{$family};   # temporary pointer
    @kids = ();
    for $person ( @{$rec->{members}} ) {
        if ($person->{role} =~ /kid|son|daughter/) {
            push @kids, $person;
        }
    }
    # $rec and $TV{$family} point to same data!
    $rec->{kids} = [ @kids ];
}

The $rec->{kids} = [ @kids ] assignment copies the array contents, but those contents are merely references; the records they point to are not copied. This means that if you age Bart as follows:

$TV{simpsons}{kids}[0]{age}++;            # increments to 12

then you'll see the following result, because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2] both point to the same underlying anonymous hash table:

print $TV{simpsons}{members}[2]{age};     # also prints 12
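
When sharing is not what you want, each record has to be copied one level further down. The sketch below contrasts the two cases on a single-element list; @shared and @independent are illustrative names.

```perl
use strict;
use warnings;

my @members = (
    { name => "bart", role => "kid", age => 11 },
);

# Copies the list of references; both arrays point at the same hash.
my @shared = @members;

# Copies each hash as well, giving independent records. The leading +
# tells Perl the inner braces are a hash constructor, not a block.
my @independent = map { +{ %$_ } } @members;

$shared[0]{age}++;

print "members:     $members[0]{age}\n";       # changed through @shared
print "independent: $independent[0]{age}\n";   # unaffected
```

Note that { %$_ } copies only the top level of each record; any references stored inside it would still be shared.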

Now, to print the entire %TV structure:

for $family ( keys %TV ) {
    print "the $family";
    print " is on ", join (" and ", @{ $TV{$family}{nights} }), "\n";
    print "its members are:\n";
    for $who ( @{ $TV{$family}{members} } ) {
        print " $who->{name} ($who->{role}), age $who->{age}\n";
    }
    print "children: ";
    print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
    print "\n\n";
}