20.19. Extracting Table Data

20.19.1. Problem

You have data in an HTML table, and you would like to turn it into a Perl data structure. For example, you want to monitor changes to an author's CPAN module list.

20.19.2. Solution

Use the HTML::TableContentParser module from CPAN:
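A minimal sketch of that approach follows. The helper name table_grids and the one-line input document are hypothetical, chosen only for illustration. The sketch assumes the module's documented interface: new, parse, and the rows, cells, and data keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableContentParser;

# Return one entry per table; each entry is an array of rows,
# and each row is an array holding the HTML contents of its cells.
sub table_grids {
    my ($html) = @_;
    my $parser = HTML::TableContentParser->new;
    my $tables = $parser->parse($html) || [];    # guard: no tables found
    my @grids;
    foreach my $table (@$tables) {
        my @rows;
        foreach my $row (@{ $table->{rows} || [] }) {
            # Each cell is a hash ref; its contents live under "data".
            push @rows, [ map { defined $_->{data} ? $_->{data} : '' }
                          @{ $row->{cells} || [] } ];
        }
        push @grids, \@rows;
    }
    return \@grids;
}

# Hypothetical input, for illustration only.
my $html  = '<table border="1"><tr><td>a</td><td>b</td></tr></table>';
my $grids = table_grids($html);

# First row of the first table.
print join(", ", @{ $grids->[0][0] }), "\n";
```

Returning plain nested arrays discards the tag attributes. If you need them, work with the hash references that parse returns directly.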
20.19.3. Discussion

The HTML::TableContentParser module converts every table in an HTML document into a Perl data structure. As with HTML tables, the data structure has three layers of nesting: the table, the row, and the data in that row. Each table, row, and data tag is represented as a hash reference. The hash keys correspond to attributes of the tag that defined that table, row, or cell. In addition, the value for a special key gives the contents of the table, row, or cell. In a table, the value for the rows key is a reference to an array of rows. In a row, the cells key points to an array of cells. In a cell, the data key holds the HTML contents of the data tag. For example, take the following table:
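As an illustration, the sketch below embeds a small table in a here-document and parses it. The table contents and attribute values are hypothetical stand-ins, not data from a real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableContentParser;

# Illustrative table with hypothetical data. Note the entity and the
# <b> tag, which the parser leaves untouched inside each cell.
my $html = <<'END_HTML';
<table width="100%" bgcolor="#ffffff">
  <tr>
    <td>Perl &amp; CGI</td>
    <td><b>Stable</b></td>
  </tr>
  <tr>
    <td>Regex Tools</td>
    <td>Beta</td>
  </tr>
</table>
END_HTML

my $tables = HTML::TableContentParser->new->parse($html) || [];
my $table  = $tables->[0];

# Attributes of the table tag are available as keys in the hash.
print "width: $table->{width}\n";

# Walk the three layers: table -> rows -> cells -> data.
foreach my $row (@{ $table->{rows} }) {
    print join(" | ", map { $_->{data} } @{ $row->{cells} }), "\n";
}
```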
The parse method returns this data structure:
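The shape can be sketched by hand for a hypothetical two-row table with width and bgcolor attributes. This is a hand-written approximation, not actual parser output, and the real structure may carry additional keys (for instance, whitespace found between tags).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written approximation of a parse result (not real parser output).
# parse returns a reference to an array with one hash per table.
my $tables = [
    {
        width   => '100%',       # attributes of the table tag
        bgcolor => '#ffffff',
        rows    => [             # one hash per row
            { cells => [ { data => 'Perl &amp; CGI' }, { data => '<b>Stable</b>' } ] },
            { cells => [ { data => 'Regex Tools' },    { data => 'Beta' } ] },
        ],
    },
];

# Second cell of the first row of the first table.
my $cell = $tables->[0]{rows}[0]{cells}[1]{data};
print "$cell\n";    # prints "<b>Stable</b>"
```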
The data tags still contain tags and entities. If you don't want the tags and entities, remove them by hand using techniques from Recipe 20.6.

Example 20-11 fetches a particular CPAN author's page and displays in plain text the modules they own. You could use this as part of a system that notifies you when your favorite CPAN authors do something new.

Example 20-11. Dump modules for a particular CPAN author
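The following is a hedged sketch in the spirit of that example, not the original listing. It fetches a page, picks one table by position, and prints each row with tags stripped and entities decoded. The helper names cell_text and table_rows_as_text are hypothetical, and the URL and table index are left as command-line arguments because the page layout is an assumption: you would need to inspect the author's page to find which table holds the module list.

```perl
#!/usr/bin/perl
# Sketch: print the rows of one table on a web page as plain text.
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TableContentParser;
use HTML::Entities qw(decode_entities);

# Strip tags, decode entities, and tidy whitespace in one cell's HTML.
sub cell_text {
    my ($html) = @_;
    return '' unless defined $html;
    $html =~ s/<[^>]*>//g;              # crude tag removal for simple markup
    my $text = decode_entities($html);
    $text =~ s/\s+/ /g;                 # collapse runs of whitespace
    $text =~ s/^ | $//g;                # trim leading and trailing space
    return $text;
}

# Return the rows of table number $index (0-based) as arrays of plain text.
sub table_rows_as_text {
    my ($html, $index) = @_;
    my $tables = HTML::TableContentParser->new->parse($html) || [];
    my $table  = $tables->[$index] or return;
    return map { [ map { cell_text($_->{data}) } @{ $_->{cells} || [] } ] }
           @{ $table->{rows} || [] };
}

# Usage: script URL [TABLE_INDEX]. Does nothing without arguments.
if (@ARGV) {
    my ($url, $index) = @ARGV;
    $index = 0 unless defined $index;
    my $page = get($url);
    die "Couldn't fetch $url\n" unless defined $page;
    print join("\t", @$_), "\n" for table_rows_as_text($page, $index);
}
```

Saving the output of each run and comparing it with the previous run (with diff, for example) is one simple way to notice when the list changes.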
20.19.4. See Also

The documentation for the CPAN module HTML::TableContentParser; http://search.cpan.org
Copyright © 2003 O'Reilly & Associates. All rights reserved.