11.11. Converting HTML to ASCII

11.11.2. Solution

If you have access to an external program that formats HTML as ASCII, such as lynx, call it like so:

    $file = escapeshellarg($file);
    $ascii = `lynx -dump $file`;

11.11.3. Discussion

If you can't use an external formatter, the pc_html2ascii( ) function shown in Example 11-4 handles a reasonable subset of HTML (no tables or frames, though).

Example 11-4. pc_html2ascii( )

    function pc_html2ascii($s) {
        // convert links to "text (url)"
        $s = preg_replace('/<a\s+.*?href="?([^\" >]*)"?[^>]*>(.*?)<\/a>/i',
                          '$2 ($1)', $s);

        // convert <br>, <hr>, <p>, <div> to line breaks
        $s = preg_replace('@<(b|h)r[^>]*>@i',"\n",$s);
        $s = preg_replace('@<p[^>]*>@i',"\n\n",$s);
        $s = preg_replace('@<div[^>]*>(.*)</div>@i',"\n".'$1'."\n",$s);

        // convert bold and italic
        $s = preg_replace('@<b[^>]*>(.*?)</b>@i','*$1*',$s);
        $s = preg_replace('@<i[^>]*>(.*?)</i>@i','/$1/',$s);

        // decode named entities
        $s = strtr($s,array_flip(get_html_translation_table(HTML_ENTITIES)));

        // decode numbered entities such as &#65; (a callback is used because
        // the /e modifier was removed in PHP 7; chr() covers single bytes only)
        $s = preg_replace_callback('/&#(\d+);/',
                                   function ($m) { return chr((int) $m[1]); }, $s);

        // remove any remaining tags
        $s = strip_tags($s);

        return $s;
    }

11.11.4. See Also

Recipe 9.9 for more on get_html_translation_table( ); documentation on preg_replace( ) at http://www.php.net/preg-replace, get_html_translation_table( ) at http://www.php.net/get-html-translation-table, and strip_tags( ) at http://www.php.net/strip-tags.

Copyright © 2003 O'Reilly & Associates. All rights reserved.
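
The two entity-decoding steps in pc_html2ascii( ) can also be approximated with PHP's built-in html_entity_decode( ), which handles named, decimal, and hexadecimal entities in one call. The following is a minimal sketch, not part of the original recipe: it assumes PHP 7 or later and UTF-8 output, and the helper name decode_entities_sketch( ) is hypothetical.

```php
<?php
// Hypothetical helper (not from the original recipe): decode named and
// numeric HTML entities in a single pass, producing UTF-8 text.
function decode_entities_sketch(string $s): string
{
    // ENT_QUOTES decodes both double- and single-quote entities;
    // ENT_HTML5 selects the HTML5 named-entity table.
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_entities_sketch('Caf&eacute; &amp; &#65;&#x42;'), "\n"; // prints "Café & AB"
```

Unlike chr( ), which only produces single bytes, this approach can return multibyte characters, so it is likely a better fit when the input contains entities outside the ASCII range.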