20.4. Converting ASCII to HTMLProblemYou want to convert ASCII text to HTML. SolutionUse the simple little encoding filter in Example 20.3 . Example 20.3: text2html#!/usr/bin/perl -w -p00 # text2html - trivial html encoding of normal text # -p means apply this script to each record. # -00 mean that a record is now a paragraph use HTML::Entities; $_ = encode_entities($_, "\200-\377"); if (/^\s/) { # Paragraphs beginning with whitespace are wrapped in <PRE> s{(.*)$} {<PRE>\n$1</PRE>\n}s; # indented verbatim } else { s{^(>.*)} {$1<BR>}gm; # quoted text s{<URL:(.*?)>} {<A HREF="$1">$1</A>}gs # embedded URL (good) || s{(http:\S+)} {<A HREF="$1">$1</A>}gs; # guessed URL (bad) s{\*(\S+)\*} {<STRONG>$1</STRONG>}g; # this is *bold* here s{\b_(\S+)\_\b} {<EM>$1</EM>}g; # this is _italics_ here s{^} {<P>\n}; # add paragraph tag } DiscussionConverting arbitrary plain text to HTML has no general solution because there are too many different, conflicting ways of representing formatting information in a plain text file. The more you know about the input, the better the job you can do of formatting it. For example, if you knew that you would be fed a mail message, you could add this block to format the mail headers: BEGIN { print "<TABLE>"; $_ = encode_entities(scalar <>); s/\n\s+/ /g; # continuation lines while ( /^(\S+?:)\s*(.*)$/gm ) { # parse heading print "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n"; } print "</TABLE><HR>"; } See AlsoThe documentation for the CPAN module HTML::Entities Copyright © 2001 O'Reilly & Associates. All rights reserved. |
|