20.7. Finding Stale Links

Problem

You want to check whether a document contains invalid links.

Solution
Use the technique outlined in Recipe 20.3 to extract each link, and then use the LWP::Simple module's head function to check that the link exists.

Discussion
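Example 20.5 is an applied example of the link-extraction technique. Instead of just printing the name of the link, we call the LWP::Simple module's head function on it. head issues an HTTP HEAD request, which asks for the document's headers without downloading its body. If head returns false, the program reports the link as BAD; otherwise it reports OK.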
Because this program uses the get function from LWP::Simple, it expects a URL, not a filename.

Example 20.5: churl

    #!/usr/bin/perl -w
    # churl - check urls

    use HTML::LinkExtor;
    use LWP::Simple qw(get head);

    $base_url = shift
        or die "usage: $0 <start_url>\n";

    # get returns undef on failure, so stop before parsing nothing
    $html = get($base_url);
    defined $html
        or die "$0: can't fetch $base_url\n";

    # passing $base_url makes the parser return absolute URI objects
    $parser = HTML::LinkExtor->new(undef, $base_url);
    $parser->parse($html);
    @links = $parser->links;

    print "$base_url: \n";
    foreach $linkarray (@links) {
        # each entry is [ tag, attr_name => attr_value, ... ]
        my @element  = @$linkarray;
        my $elt_type = shift @element;
        while (@element) {
            my ($attr_name , $attr_value) = splice(@element, 0, 2);
            # only check schemes that head can handle
            if ($attr_value->scheme =~ /\b(ftp|https?|file)\b/) {
                print "  $attr_value: ", head($attr_value) ? "OK" : "BAD", "\n";
            }
        }
    }

Here's an example of a program run:

    % churl http://www.wizards.com

This program has the same limitation as the HTML::LinkExtor program in Recipe 20.3.

See Also

The documentation for the CPAN modules HTML::LinkExtor, LWP::Simple, LWP::UserAgent, and HTTP::Response; Recipe 20.8
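The churl program reports only OK or BAD, because LWP::Simple's head returns nothing more than success or failure. As a rough sketch of a variation, the LWP::UserAgent and HTTP::Response modules can report why a link failed. The names churl2, check_link, run, and $fetch below are hypothetical, and the retry with GET when a server answers HEAD with 405 or 501 is an added assumption, not part of the original recipe.

```perl
#!/usr/bin/perl
# churl2 - hypothetical variant of churl that reports the HTTP status line
use strict;
use warnings;

# Classify one link. $fetch is a code ref taking (method, url) and returning
# an HTTP::Response-like object (is_success, code, and status_line methods).
sub check_link {
    my ($url, $fetch) = @_;
    my $res = $fetch->('HEAD', $url);
    # Assumption: some servers reject HEAD (405 Method Not Allowed,
    # 501 Not Implemented), so retry those with GET before giving up.
    if (!$res->is_success && ($res->code == 405 || $res->code == 501)) {
        $res = $fetch->('GET', $url);
    }
    return $res->is_success ? 'OK' : 'BAD (' . $res->status_line . ')';
}

# Fetch the start page, extract its links, and check each distinct one.
sub run {
    my ($base_url) = @_;
    require HTML::LinkExtor;
    require LWP::UserAgent;

    my $ua    = LWP::UserAgent->new(timeout => 10);
    my $fetch = sub {
        my ($method, $url) = @_;
        return $method eq 'HEAD' ? $ua->head($url) : $ua->get($url);
    };

    my $page = $ua->get($base_url);
    die "$base_url: ", $page->status_line, "\n" unless $page->is_success;
    my $html = $page->decoded_content;
    $html = $page->content unless defined $html;

    # Supplying the base URL makes the parser return absolute URI objects.
    my $parser = HTML::LinkExtor->new(undef, $base_url);
    $parser->parse($html);
    $parser->eof;

    print "$base_url:\n";
    my %seen;
    for my $link ($parser->links) {
        my ($tag, %attrs) = @$link;
        for my $url (values %attrs) {
            next unless ref $url;
            my $scheme = $url->scheme;
            next unless defined $scheme && $scheme =~ /^(?:ftp|https?|file)$/;
            next if $seen{$url}++;    # check each distinct URL only once
            print "  $url: ", check_link($url, $fetch), "\n";
        }
    }
}

# Only touch the network when a start URL is given on the command line.
run(@ARGV) if @ARGV;
```

Run it the same way as churl, giving a start URL as the only argument. Passing the fetching code to check_link as a code ref keeps the OK/BAD decision separate from the network, so that logic can be exercised with stand-in response objects.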