5.15. Representing Relationships Between Data


You want to represent relationships between elements of data - for instance, the mother of relationship in a family tree or parent process for a process table. This is closely related to representing tables in relational databases (tables represent relationships between information) and to representing computer science graph structures (edges represent relationships between nodes).


Use a hash to represent the relationship.


Here's part of the family tree from the Bible:

%father = ( 'Cain'      => 'Adam',
            'Abel'      => 'Adam',
            'Seth'      => 'Adam',
            'Enoch'     => 'Cain',
            'Irad'      => 'Enoch',
            'Mehujael'  => 'Irad',
            'Methusael' => 'Mehujael',
            'Lamech'    => 'Methusael',
            'Jabal'     => 'Lamech',
            'Jubal'     => 'Lamech',
            'Tubalcain' => 'Lamech',
            'Enos'      => 'Seth' );

This lets us, for instance, easily trace a person's lineage:

while (<>) {
    do {
        print "$_ ";        # print the current name
        $_ = $father{$_};   # set $_ to $_'s father
    } while defined;        # until we run out of fathers
    print "\n";

We can already ask questions like "Who begat Seth?" by checking the %father hash. By inverting this hash, we invert the relationship. This lets us use Recipe 5.8 to answer questions like "Whom did Lamech beget?"

while ( ($k,$v) = each %father ) {
    push( @{ $children{$v} }, $k );

$" = ', ';                  # separate output with commas
while (<>) {
    if ($children{$_}) {
        @children = @{$children{$_}};
    } else {
        @children = "nobody";
    print "$_ begat @children.\n";

Hashes can also represent relationships such as the C language #include s. A includes B if A contains #include B . This code builds the hash (it doesn't look for files in /usr/include as it should, but that is a minor change):

foreach $file (@files) {
    local *F;               # just in case we want a local FH
    unless (open (F, "<$file")) {
        warn "Couldn't read $file: $!; skipping.\n";
    while (<F>) {
        next unless /^\s*#\s*include\s*<([^>]+)>/;
        push(@{$includes{$1}}, $file);
    close F;

This shows which files don't include any others:

@include_free = ();                 # list of files that don't include others
@uniq{map { @$_ } values %includes} = undef;
foreach $file (sort keys %uniq) {
        push( @include_free , $file ) unless $includes{$file};

The values of %includes are anonymous arrays because a single file can (and often does) include more than one other file. We use map to build up a big list of all the included files and remove duplicates by using a hash.

See Also

Recipe 4.6 ; the more complex data structures in Recipe 11.9 through Recipe 11.14