home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


5.15. Representing Relationships Between Data

Problem

You want to represent relationships between elements of data - for instance, the mother of relationship in a family tree or parent process for a process table. This is closely related to representing tables in relational databases (tables represent relationships between information) and to representing computer science graph structures (edges represent relationships between nodes).

Solution

Use a hash to represent the relationship.

Discussion

Here's part of the family tree from the Bible:

%father = ( 'Cain'      => 'Adam',
            'Abel'      => 'Adam',
            'Seth'      => 'Adam',
            'Enoch'     => 'Cain',
            'Irad'      => 'Enoch',
            'Mehujael'  => 'Irad',
            'Methusael' => 'Mehujael',
            'Lamech'    => 'Methusael',
            'Jabal'     => 'Lamech',
            'Jubal'     => 'Lamech',
            'Tubalcain' => 'Lamech',
            'Enos'      => 'Seth' );

This lets us, for instance, easily trace a person's lineage:

while (<>) {
    chomp;
    do {
        print "$_ ";        # print the current name
        $_ = $father{$_};   # set $_ to $_'s father
    } while defined;        # until we run out of fathers
    print "\n";
}

We can already ask questions like "Who begat Seth?" by checking the %father hash. By inverting this hash, we invert the relationship. This lets us use Recipe 5.8 to answer questions like "Whom did Lamech beget?"

while ( ($k,$v) = each %father ) {
    push( @{ $children{$v} }, $k );
}

$" = ', ';                  # separate output with commas
while (<>) {
    chomp;
    if ($children{$_}) {
        @children = @{$children{$_}};
    } else {
        @children = "nobody";
    }
    print "$_ begat @children.\n";
}

Hashes can also represent relationships such as the C language #include s. A includes B if A contains #include B . This code builds the hash (it doesn't look for files in /usr/include as it should, but that is a minor change):

foreach $file (@files) {
    local *F;               # just in case we want a local FH
    unless (open (F, "<$file")) {
        warn "Couldn't read $file: $!; skipping.\n";
        next;
    }
    
    while (<F>) {
        next unless /^\s*#\s*include\s*<([^>]+)>/;
        push(@{$includes{$1}}, $file);
    }
    close F;
}

This shows which files don't include any others:

@include_free = ();                 # list of files that don't include others
@uniq{map { @$_ } values %includes} = undef;
foreach $file (sort keys %uniq) {
        push( @include_free , $file ) unless $includes{$file};
}

The values of %includes are anonymous arrays because a single file can (and often does) include more than one other file. We use map to build up a big list of all the included files and remove duplicates by using a hash.

See Also

Recipe 4.6 ; the more complex data structures in Recipe 11.9 through Recipe 11.14