5.17. Program: dutree
The
dutree program, shown in Example 5-3, turns the output of du:
% du pcb
19 pcb/fix
20 pcb/rev/maybe/yes
10 pcb/rev/maybe/not
705 pcb/rev/maybe
54 pcb/rev/web
1371 pcb/rev
3 pcb/pending/mine
1016 pcb/pending
2412 pcb
into sorted, indented output:
2412 pcb
| 1371 rev
| | 705 maybe
| | | 675 .
| | | 20 yes
| | | 10 not
| | 612 .
| | 54 web
| 1016 pending
| | 1013 .
| | 3 mine
| 19 fix
| 6 .
The arguments you give dutree are passed through
to du. That way you could call
dutree in any of these ways, or maybe more if
your du supports other options:
% dutree
% dutree /usr
% dutree -a
% dutree -a /bin
The %Dirsize hash maintains the mapping of names
to sizes. For example, $Dirsize{"pcb"} contains
2412 in this sample run. We'll use that hash both for output and for
sorting each directory's subdirectories by size.
%Kids is more interesting. For any given path
PATH,
$Kids{PATH} contains a
(reference to an) array of names of subdirectories of this one. The
"pcb" entry contains a reference to an anonymous
array containing "fix", "rev",
and "pending". The "rev" entry
contains "maybe" and "web". The
"maybe" entry contains "yes"
and "not", which do not have their own entries
because they are end nodes in the tree.
The
output function is passed the start of the
tree—the last line read in from the output of
du. First it prints that directory and its size.
Then the function sorts the directory's children (if any) so that
those with the most disk usage float to the top. Finally,
output calls itself, recursing on each child in
order. The extra arguments are used in formatting.
This program is inherently recursive because the filesystem is
recursive. However, its data structure is not recursive; at least,
not the way a circular linked list is. Each value is an array of
further keys to process. The recursion resides in the processing, not
in the storage.
Example 5-3. dutree
#!/usr/bin/perl -w
# dutree - print sorted indented rendition of du output
use strict;
my %Dirsize;
my %Kids;
getdots(my $topdir = input( ));
output($topdir);
# run du, read in input, save sizes and kids
# return last directory (file?) read
sub input {
my($size, $name, $parent);
@ARGV = ("du @ARGV |"); # prep the arguments
while (<>) { # magic open is our friend
($size, $name) = split;
$Dirsize{$name} = $size;
($parent = $name) =~ s#/[^/]+$##; # dirname
push @{ $Kids{$parent} }, $name unless eof;
}
return $name;
}
# figure out how much is taken up in each directory
# that isn't stored in subdirectories. add a new
# fake kid called "." containing that much.
sub getdots {
my $root = $_[0];
my($size, $cursize);
$size = $cursize = $Dirsize{$root};
if ($Kids{$root}) {
for my $kid (@{ $Kids{$root} }) {
$cursize -= $Dirsize{$kid};
getdots($kid);
}
}
if ($size != $cursize) {
my $dot = "$root/.";
$Dirsize{$dot} = $cursize;
push @{ $Kids{$root} }, $dot;
}
}
# recursively output everything,
# passing padding and number width in as well
# on recursive calls
sub output {
my($root, $prefix, $width) = (shift, shift || '', shift || 0);
my $path;
($path = $root) =~ s#.*/##; # basename
my $size = $Dirsize{$root};
my $line = sprintf("%${width}d %s", $size, $path);
print $prefix, $line, "\n";
for ($prefix .= $line) { # build up more output
s/\d /| /;
s/[^|]/ /g;
}
if ($Kids{$root}) { # not a bachelor node
my @Kids = @{ $Kids{$root} };
@Kids = sort { $Dirsize{$b} <=> $Dirsize{$a} } @Kids;
$Dirsize{$Kids[0]} =~ /(\d+)/;
my $width = length $1;
for my $kid (@Kids) { output($kid, $prefix, $width) }
}
}
Before Perl supported hashes of arrays directly, Herculean efforts
were required to emulate these higher order constructs. Some folks
used repeated calls to split and
join, but these were exceedingly slow.
Example 5-4 is a version of
dutree from those days of Perl antiquity.
Because we didn't have proper array references, we had to usurp the
Perl symbol table itself. This program created variables on the fly
with bizarre names. Can you find which hash this program is using?
The @{"pcb"} array contains
"pcb/fix", "pcb/rev", and
"pcb/pending". The @{"pcb/rev"}
array contains "pcb/rev/maybe" and
"pcb/rev/web". The
@{"pcb/rev/maybe"} array contains
"pcb/rev/yes" and
"pcb/rev/not".
When you assign something like "pcb/fix" to
*kid, it promotes the string on the righthand side
to a typeglob. This makes @kid an alias for
@{"pcb/fix"}—among other things. It would
also alias &kid to
&{"pcb/fix"}, and so on.
If that isn't interesting enough, consider how the
local is using dynamic scoping of global variables
to avoid passing in extra arguments. Check out what's happening with
the $width variable in the
output routine.
Example 5-4. dutree-orig
#!/usr/bin/perl
# dutree_orig: the old version pre-perl5 (early 90s)
@lines = `du @ARGV`;
chop(@lines);
&input($top = pop @lines);
&output($top);
exit;
sub input {
local($root, *kid, $him) = @_[0,0];
while (@lines && &childof($root, $lines[$#lines])) {
&input($him = pop(@lines));
push(@kid, $him);
}
if (@kid) {
local($mysize) = ($root =~ /^(\d+)/);
for (@kid) { $mysize -= (/^(\d+)/)[0]; }
push(@kid, "$mysize .") if $size != $mysize;
}
@kid = &sizesort(*kid);
}
sub output {
local($root, *kid, $prefix) = @_[0,0,1];
local($size, $path) = split(' ', $root);
$path =~ s!.*/!!;
$line = sprintf("%${width}d %s", $size, $path);
print $prefix, $line, "\n";
$prefix .= $line;
$prefix =~ s/\d /| /;
$prefix =~ s/[^|]/ /g;
local($width) = $kid[0] =~ /(\d+)/ && length("$1");
for (@kid) { &output($_, $prefix); };
}
sub sizesort {
local(*list, @index) = shift;
sub bynum { $index[$b] <=> $index[$a]; }
for (@list) { push(@index, /(\d+)/); }
@list[sort bynum 0..$#list];
}
sub childof {
local(@pair) = @_;
for (@pair) { s/^\d+\s+//g; s/$/\//; }
index($pair[1], $pair[0]) >= 0;
}
The answer to the question posed earlier — "Which hash is the
old dutree using?" — is
%main::, that is, the Perl symbol table itself.
Needless to say, this program will never run under
use strict. We're happy to
report that the updated version runs three times as fast as the old
one. That's because the old one keeps looking up variables in the
symbol table, and the new one doesn't have to. It's also because we
avoid all that slow splitting of the space used and the directory
name. But we thought we'd show you the old version because it is
instructive, too.
 |  |  | 5.16. Representing Relationships Between Data |  | 6. Pattern Matching |
Copyright © 2003 O'Reilly & Associates. All rights reserved.
|
|