11.16. Program: Finding Fresh Links

Example 11-6, fresh-links.php, is a modification of the program in Recipe 11.15 that produces a list of links and their last modified times. If the server on which a URL lives doesn't provide a last modified time, the program reports the URL's last modified time as the time the URL was requested. If the program can't retrieve the URL successfully, it prints out the status code it got when it tried to retrieve the URL. Run the program by passing it a URL to scan for links:

    % fresh-links.php http://www.oreilly.com
    http://www.oreilly.com/index.html: Fri Aug 16 16:48:34 2002
    http://www.oreillynet.com: Mon Aug 19 10:18:54 2002
    http://conferences.oreilly.com: Fri Aug 16 19:41:46 2002
    http://international.oreilly.com: Fri Mar 29 18:06:32 2002
    http://safari.oreilly.com: 302
    http://www.oreilly.com/catalog/search.html: Tue Apr 2 19:05:57 2002
    http://www.oreilly.com/oreilly/press/: 302
    ...

This output is from a run of the program at about 10:20 A.M. EDT on August 19, 2002. The link to http://www.oreillynet.com is very fresh, but the others are of varying ages. The link to http://www.oreilly.com/oreilly/press/ doesn't have a last modified time next to it; instead, it has an HTTP status code (302). This means it's been moved elsewhere, as reported by the output of stale-links.php in Recipe 11.15.

The program to find fresh links is conceptually almost identical to the program to find stale links. It uses the same pc_link_extractor( ) function from Recipe 11.10; however, it uses the HTTP_Request class instead of cURL to retrieve URLs. The code to get the base URL specified on the command line is inside a loop so that it can follow any redirects that are returned.

Once a page has been retrieved, the program uses the pc_link_extractor( ) function to get a list of links in the page. Then, after prepending a base URL to each link if necessary, the program calls sendRequest( ) on each link found in the original page. Since we need just the headers of these responses, we use the HEAD method instead of GET. Instead of printing out a new location for moved links, however, the program prints out a formatted version of the Last-Modified header if it's available.

Example 11-6. fresh-links.php
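
The per-link decision described above (print a formatted Last-Modified time, fall back to the request time, or print the status code) can be sketched as a small function. This is an illustration only, not the listing from Example 11-6: the function name describe_link_freshness( ) and its parameters are hypothetical, it takes an already-parsed status code and header array rather than an HTTP_Request object, it assumes PHP 7 or later for the type declarations, and it formats times in GMT so that the result doesn't depend on the local time zone.

```php
<?php
// Hypothetical helper, not part of the original program: decide what to
// print for one link, given its HTTP status code and response headers.
function describe_link_freshness(int $status, array $headers, int $requestTime): string
{
    // Anything other than 200 OK is reported as the bare status code.
    if ($status !== 200) {
        return (string) $status;
    }

    // Header names are case-insensitive, so normalize before the lookup.
    $headers = array_change_key_case($headers, CASE_LOWER);

    $modified = false;
    if (isset($headers['last-modified'])) {
        // strtotime() returns false if the header value can't be parsed.
        $modified = strtotime($headers['last-modified']);
    }

    // Fall back to the request time when the header is missing or unparsable.
    if ($modified === false) {
        $modified = $requestTime;
    }

    // Assumption: GMT output in a "Fri Aug 16 16:48:34 2002" style layout.
    return gmdate('D M j H:i:s Y', $modified);
}
```

To use it, pass the status code and headers from a HEAD response along with the time the request was made, then print the link followed by the returned string. Keeping the decision separate from the network call makes it easy to check the three cases without contacting a server.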
Copyright © 2003 O'Reilly & Associates. All rights reserved.