20.5. Converting HTML to ASCII

20.5.1. Problem

You want to convert an HTML file into formatted, plain ASCII. For example, you want to mail a web document to someone.

20.5.2. Solution

If you have an external formatter like lynx, call it as an external program:

    $ascii = `lynx -dump $filename`;

If you want to do it within your program and don't care about the things that the HTML::FormatText formatter doesn't yet handle well (tables and frames):

    use HTML::FormatText 3;
    $ascii = HTML::FormatText->format_file(
        $filename,
        leftmargin  => 0,
        rightmargin => 50
    );

20.5.3. Discussion

These examples both assume the HTML is in a file. If your HTML is in a variable, you need to write it to a file for lynx to read. With HTML::FormatText, use the format_string( ) method and pass it the string holding the HTML rather than a filename:

    use HTML::FormatText 3;
    $ascii = HTML::FormatText->format_string(
        $html,
        leftmargin  => 0,
        rightmargin => 50
    );

If you use Netscape, its "Save as" option with the type set to "Text" does the best job with tables.

20.5.4. See Also

The documentation for the CPAN modules HTML::TreeBuilder and HTML::FormatText; your system's lynx(1) manpage; Recipe 20.6

Copyright © 2003 O'Reilly & Associates. All rights reserved.
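The Discussion notes that HTML held in a variable has to be written to a file before lynx can read it. The following is a minimal sketch of that step, not part of the original recipe. The helper name html_string_to_ascii is hypothetical, and the sketch assumes a Unix-like system with lynx on your PATH and the standard File::Temp module. The optional @cmd argument is also an assumption, added only so that a different formatter command can be substituted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the recipe): write $html to a temporary
# file, run a formatter command on that file, and return its output.
# The command defaults to "lynx -dump".
sub html_string_to_ascii {
    my ($html, @cmd) = @_;
    @cmd = ('lynx', '-dump') unless @cmd;

    # The .html suffix helps the formatter recognize the file type;
    # UNLINK => 1 removes the file when the program exits.
    my ($fh, $tmpname) = tempfile(SUFFIX => '.html', UNLINK => 1);
    print {$fh} $html;
    close($fh) or die "Can't close $tmpname: $!";

    # The list form of a piped open keeps the filename away from the shell.
    open(my $pipe, '-|', @cmd, $tmpname)
        or die "Can't run $cmd[0]: $!";
    my $ascii = do { local $/; <$pipe> };   # slurp all output
    close($pipe) or die "$cmd[0] failed: status $?";

    return $ascii;
}
```

With the HTML in $html, a call such as $ascii = html_string_to_ascii($html) would then stand in for the backtick form shown in the Solution.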