2.3 Professors, Students, CoursesThis example shows how you might represent professor, student, and course data as hierarchical records and how to link them up. Assume that the data files look like this: #file: professor.dat id : 42343 #Employee Id Name : E.F.Schumacher Office Hours: Mon 34, Wed 89 Courses : HS201, SS343 #Course taught ... #file: student.dat id : 52003 # Registration id Name : Garibaldi Courses : H301, H302, M201 # Courses taken ... #file: courses.dat id : HS201 Description : Small is beautiful Class Hours : Mon 24, Wed 910, Thu 45 ... Each "id:" line starts a new record. Among other tasks, let us say we are required to find out whether there is a scheduling conflict on professors' and students' hours. Because our focus is on data representation and getting a feel for Perl's reference syntax, we will look at implementing only some parts of the problem. 2.3.1 RepresentationA hash table is a good representation for a heterogeneous record, as we mentioned earlier, so a student structure may be implemented like this: $student{42343} = { 'Name' => 'Garibaldi', 'Courses' => [ ]}; A number of subtle design choices have been made here. We could have replaced "foreign keys" (to use the database term) such as "HS201" with references to the corresponding course data structures. We didn't, because it is then tempting to directly dereference these references, in which case the student code is aware of how the course data is structured. We maintain separate global hash tables for students, courses, and professors  yet another effort to keep mostly unrelated data completely separate and to make it possible to change a part of the system without affecting everyone. There is one piece of data we haven't discussed before: time ranges. Both professors and courses have certain "busy" or "active" hours. What is a good representation for this? You might choose to represent the line "Mon 23, Tue 46" as follows: $time_range = { 'Mon' => [2,3], 'Tue' => [4,6] }; There is a much simpler representation, in case you haven't already guessed it. The key insight is that since we are concerned only with clashes in time, the system should be able to quickly tell us whether a professor or a course is "active" in a given hour of the week or not. Considering that there are only 24 * 7 = 168 hours in a week, the entire week's schedule can be represented by a bitmap vector of 21 bytes (168/8). If a bit is set, we know that the professor is teaching something in that hour. In fact we can reduce the storage requirements further if we only account for the effective hours in a week (say, 7:00 A.M. to 7:00 P.M., Monday to Friday). That brings it down to 8 bytes (12 hours * 5 days / 8). The nice thing here is that an entire sequence of time ranges boils down to one scalar containing a bitmap vector. The other cool thing is that you can obtain time conflicts by logically ANDing two bitmaps. Having settled the representation, let us write some code to read the professor.dat file, and construct the data structures. Example 2.3: Read professor.dat and Create Hierarchical Records in Memorymy (%profs); # prof_read_file() populates this data structure from file sub prof_read_file { my ($filename) = @_; my ($line, $curr_prof); open (F, $filename)  die "Could not open $filename"; while ($line = <F>) { chomp($line); next if $line =~ /^\s*$/; # skip blank lines if ($line =~ /^id.*:\s*(.*)/) { # Use an anonymous hash to store a professor's data $profs{$1} = $curr_prof = {}; } elsif ($line =~ /^Office Hours.*:\s*(.*)/) { # $1 contains a string like 'Mon 23, Tue 46' $curr_prof>{Office Hours} = interval_parse($1); } elsif ($line =~ /^Courses.*:\s*(.*)/) { # $1 contains something like 'HS201, MA101' my (@courses_taught) = split(/[\s,]+/, $1); $curr_prof>{Courses} = \@courses_taught; } } } Notice that the courses_taught array is local to the block. When the block ends, $curr_prof>{Courses} continues to hang on to this array. You can omit one step like this: $curr_prof>{Courses} = [split(/[\s,]+/, $1)]; I prefer the earlier approach because it is more readable. The interval_parse method parses a string such as "Mon 35, Wed 26" into a bit string, as was mentioned earlier. The code looks like this: # Each hour in a week (with days from 7am to 7pm) gets its own # unique bit in an 8byte string. # Mon 78 is the 0th bit, Mon 67pm is 11, ... Fri 67 (pm) is 60th. my %base_hours = ( mon => 0, tue => 12, wed => 24 , thu => 36, fri => 48 ); sub interval_parse { my ($interval_sequence) = @_; #contains "Mon 35, Tue 26" my ($time_range) = ""; foreach $day_hours (split /,/, $interval_sequence) { # $day_hours contains "Mon 35" etc. my ($day, $from, $to) = ($day_hours =~ /([AZaz]+).*(\d+)(\d+)/); # if $from or $to is less than 7, it must be afternoon. Normalize # it by adding 12. Then reduce it to a zero base by subtracting 7 # (that is, 7 hrs to 19 hrs becomes 0  12. Finally, # normalize each hour in a day with respect to weekly hours, # by adding in the day's "base hour" $to = 19 if $to == 7; $from += 12 if $from < 7 ; $to += 12 if $to <= 7; my $base = $base_hours{lc $day}; $from += $base  7; $to += $base  7; # At this point Tue 7a.m ==> 12 and Tue 4 p.m => 21 for ($i = $from; $i < $to; $i++) { # Set the corresponding bit vec($time_range, $i, 1) = 1; } } $time_range; } To check for scheduling constraints on a professor's time, we have to calculate overlapping hours between the professor's office hours and each course he or she teaches and between the courses themselves, as shown in Example 2.4 . Example 2.4: Checking Constraints on a Professor's Timesub prof_check_constraints { my ($prof) = @_; my $r_prof = $profs{$prof}; # %profs created by prof_read_file my $office_hours = $r_prof>{Office Hours}; my $rl_courses = $r_prof>{Courses} ; for $i (0 .. $#{$rl_courses}) { $course_hours = course_get_hours($rl_courses>[$i]); if (interval_conflicts($office_hours, $course_hours)) { print "Prof. ", $r_prof>{name}, " Office hours conflict with course $course_taught\n"; } for $j ($i .. $#{$rl_courses}) { my ($other_course_hours) = course_get_hours($rl_courses>[$j]); if (interval_conflicts ($course_hours, $other_course_hours)) { print "Prof. ", $r_prof>{name}, ": Course conflict: ", $rl_courses>[$i], " " $rl_courses>[$j], "\n"; } } }
The subroutine sub interval_conflicts { my ($t1, $t2) = @_; my ($combined) = $t1 & $t2; # $combined will have at least one bit set if there's a conflict my $offset = length($combined) * 8; # start counting down from last bit, and see if any is set while ($offset >= 0) { return 1 if vec($combined,$offset,1); } return 0; }
Note that all knowledge of the internal representation of a time interval is encapsulated in functions with the prefix 
