You want to convert an HTML file into formatted plain ASCII.
If you have an external formatter like lynx, call an external program:
$ascii = `lynx -dump $filename`;
If you want to do the conversion within your program and don't mind that the HTML::FormatText formatter doesn't yet handle tables and frames:
use HTML::FormatText;
use HTML::Parse;

$html = parse_htmlfile($filename);     # parse the file into an HTML tree
$formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
$ascii = $formatter->format($html);    # render the tree as plain text
Both examples assume the HTML text is in a file. If your HTML is in a variable, you must first write it to a file for lynx to read. With HTML::FormatText, parse the string with the HTML::TreeBuilder module instead:
use HTML::TreeBuilder;
use HTML::FormatText;

$html = HTML::TreeBuilder->new();
$html->parse($document);               # $document holds the HTML text
$html->eof;                            # signal end of input so parsing completes
$formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
$ascii = $formatter->format($html);
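For the lynx route with HTML held in a variable, here is a minimal sketch using the core File::Temp module. The helper name `html_string_to_ascii` is hypothetical, and it assumes lynx is on your PATH and that `$html` holds bytes rather than wide characters. It gives the temporary file an `.html` suffix because lynx picks the content type from the extension, and it runs the command through a list-form pipe open so the filename never passes through the shell. The command is a parameter only so another dumper can be swapped in.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write an HTML string to a temporary file and
# return what an external dumper (lynx -dump by default) prints for it.
sub html_string_to_ascii {
    my ($html, @cmd) = @_;
    @cmd = ('lynx', '-dump') unless @cmd;

    # lynx guesses the content type from the file extension, so give the
    # temp file an .html suffix; UNLINK removes it when the program exits.
    my ($fh, $filename) = tempfile(SUFFIX => '.html', UNLINK => 1);
    print {$fh} $html;
    close($fh) or die "can't write $filename: $!";

    # List-form pipe open runs the command without a shell, so the
    # filename is never subject to shell quoting.
    open(my $pipe, '-|', @cmd, $filename)
        or die "can't run $cmd[0]: $!";
    my $ascii = do { local $/; <$pipe> } // '';

    # close on a pipe fails if the command exited with nonzero status.
    close($pipe) or die "$cmd[0] exited with status " . ($? >> 8) . "\n";
    return $ascii;
}
```

Usage: `$ascii = html_string_to_ascii($document);` returns the same text that `lynx -dump` would produce for a file containing `$document`.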
Copyright © 2002 O'Reilly & Associates. All rights reserved.