7.3. Individual Tokens

Now that you know the composition of the various types of tokens, let's see how to use HTML::TokeParser to write useful programs. Many problems are quite simple and require only one token at a time. Programs to solve these problems consist of a loop over all the tokens, with an if statement in the body of the loop identifying the interesting parts of the HTML.
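The general shape of such a program can be sketched as follows. This is only an illustration, assuming the HTML is already in a string; the helper name count_token_types is hypothetical and not part of HTML::TokeParser. It relies on the token layout described earlier, where the first element of each token is its type code.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not part of HTML::TokeParser): tally the tokens
# in a string of HTML by kind.
sub count_token_types {
    my ($html) = @_;

    # A scalar reference means "parse this string"; a plain scalar
    # would be treated as a filename instead.
    my $stream = HTML::TokeParser->new(\$html)
        or die "Couldn't create a token stream";

    my %count;
    while (my $token = $stream->get_token) {
        if    ($token->[0] eq 'S') { $count{start}++ }  # start-tag
        elsif ($token->[0] eq 'E') { $count{end}++   }  # end-tag
        elsif ($token->[0] eq 'T') { $count{text}++  }  # text
        else                       { $count{other}++ }  # comment, declaration, PI
    }
    return %count;
}

my %count = count_token_types('<p>Hi <b>there</b></p>');
print "$_: $count{$_}\n" for sort keys %count;
```

A real program would replace the counting with whatever work each kind of token calls for, and would usually test the tag name in $token->[1] as well as the type.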
7.3.1. Checking Image Tags

Example 7-1 complains about any img tags in a document that are missing alt, height, or width attributes:

Example 7-1. Check <img> tags
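A check of this kind might be sketched as below. It assumes the HTML is held in a string, and the helper name missing_img_attrs and the "(no src)" placeholder are illustrative choices rather than anything HTML::TokeParser provides. For a start-tag token, element 1 is the lowercased tag name and element 2 is a hash reference of its attributes.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return one report line for each img start-tag
# that lacks an alt, height, or width attribute.
sub missing_img_attrs {
    my ($html) = @_;
    my $stream = HTML::TokeParser->new(\$html)
        or die "Couldn't create a token stream";

    my @reports;
    while (my $token = $stream->get_token) {
        # Only img start-tags are of interest.
        next unless $token->[0] eq 'S' and $token->[1] eq 'img';

        my $attr    = $token->[2];    # hash ref: attribute name => value
        my @missing = grep { !exists $attr->{$_} } qw(alt height width);
        next unless @missing;

        my $src = defined $attr->{src} ? $attr->{src} : '(no src)';
        push @reports, "Missing for $src: @missing";
    }
    return @reports;
}

my $html = <<'END_HTML';
<p><img src="liza.jpg" alt="Liza">
<img src="aimee.jpg" height="40" width="60">
<img src="ok.jpg" alt="Fine" height="10" width="10"></p>
END_HTML

print "$_\n" for missing_img_attrs($html);
```

Returning the report lines instead of printing them inside the loop keeps the check easy to reuse, for example to feed the same list of image URLs into a later validation step.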
When run on an HTML stream (whether from a file or a string), Example 7-1 produces output such as:

    Missing for liza.jpg: height width
    Missing for aimee.jpg: alt
    Missing for laurie.jpg: alt height width

Identifying images has many applications: making HEAD requests to ensure the URLs are valid, or making a GET request to fetch the image and using Image::Size from CPAN to check or insert the height and width attributes.

7.3.2. HTML Filters

A similar while loop can use HTML::TokeParser as a simple code filter. You just pass through the $source from each token you don't mean to alter. One such filter passes through everything that it sees (by just printing its source as HTML::TokeParser passes it in), except for img start-tags, which it replaces with the content of their alt attributes.
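A filter along these lines could be sketched as follows. The helper name img_to_alt is hypothetical, and collecting the output in a string (rather than printing each piece as it arrives) is a choice made here for illustration. The position of the source text differs by token type: element 4 for start-tags, element 2 for end-tags and processing instructions, and element 1 for text, comments, and declarations.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: copy HTML through unchanged, except that each
# img start-tag is replaced by the value of its alt attribute.
sub img_to_alt {
    my ($html) = @_;
    my $stream = HTML::TokeParser->new(\$html)
        or die "Couldn't create a token stream";

    my $out = '';
    while (my $token = $stream->get_token) {
        my $type = $token->[0];
        if ($type eq 'S') {
            if ($token->[1] eq 'img') {
                # Emit the alt text, or nothing if there is no alt.
                my $alt = $token->[2]{alt};
                $out .= defined $alt ? $alt : '';
            }
            else {
                $out .= $token->[4];    # original source of the start-tag
            }
        }
        elsif ($type eq 'E' or $type eq 'PI') {
            $out .= $token->[2];        # original source of end-tag or PI
        }
        else {
            $out .= $token->[1];        # text, comment, or declaration
        }
    }
    return $out;
}

print img_to_alt('<p>See <img src="cat.jpg" alt="my cat"> here.</p>'), "\n";
```

One caveat with this sketch: the parser decodes entities in attribute values, so an alt value containing &amp; comes out as a bare &, whereas text tokens are passed through exactly as they appeared in the source.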
So, for example, a document consisting just of this:

    <!-- new entry -->
    <p>Dear Diary,
    <br>This is me & my balalaika, at BalalaikaCon 1998:
    <img src="mybc1998.jpg" alt="BC1998! WHOOO!">
    Rock on!</p>

is then spat out as this:

    <!-- new entry -->
    <p>Dear Diary,
    <br>This is me & my balalaika, at BalalaikaCon 1998:
    BC1998! WHOOO!
    Rock on!</p>
Copyright © 2002 O'Reilly & Associates. All rights reserved.