

PHP Cookbook

11.8. Marking Up a Web Page

11.8.3. Discussion

The regular expression used with preg_match( ) matches as much text as possible before an HTML tag, then the HTML tag itself, and then the rest of the content. The text before the tag has the highlighting applied to it, the tag is printed without any highlighting, and the rest of the content is run through the same match again. This keeps words inside HTML tags (in URLs or alt text, for example) from being highlighted, which would otherwise prevent the page from displaying properly.
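To make the splitting concrete, here is a minimal sketch that applies the same regular expression to a short string and records each piece instead of printing it. The sample string, the $parts array, and the guard against passes that consume nothing are illustrative assumptions, not part of the recipe.

```php
<?php
// Hypothetical sample input: "php" appears both in text and inside a URL.
$s = 'Learn php at <a href="http://php.net/">the php site</a>.';

$parts = array();
while ($s !== '') {
    // Group 1: text before the next tag; group 2: the tag; group 3: the rest.
    if (! preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s', $s, $m)
        || strlen($m[3]) >= strlen($s)) {
        // Nothing was consumed (or the match failed): keep the remainder
        // as plain text and stop, so the loop cannot run forever.
        $parts[] = array('text', $s);
        break;
    }
    if ($m[1] !== '') { $parts[] = array('text', $m[1]); }
    if ($m[2] !== '') { $parts[] = array('tag', $m[2]); }
    $s = $m[3];
}

foreach ($parts as $part) {
    print $part[0] . ': ' . $part[1] . "\n";
}
```

With this input the loop yields five pieces: three runs of text and two tags. Only the text pieces would be handed to preg_replace( ), so the "php" inside the href attribute is never touched.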

The following program retrieves the URL in $url and highlights the words in the $words array. Words are not highlighted when they are part of larger words, because each pattern wraps the word in \b, the Perl-compatible regular expression assertion that matches a word boundary.

$colors = array('FFFF00','FF9900','FF0000','FF00FF',
                '99FF33','33FFCC','FF99FF','00CC33'); 

// build search and replace patterns for regex 
$patterns = array();
$replacements = array();
for ($i = 0, $j = count($words); $i < $j; $i++) {
    $patterns[$i] = '/\b'.preg_quote($words[$i], '/').'\b/';
    $replacements[$i] = '<b style="color:black;background-color:#' .
                         $colors[$i % 8] .'">' . $words[$i] . '</b>';
}

// retrieve page 
$s = '';   // start with an empty string so .= has something to append to
$fh = fopen($url,'r') or die("Can't open $url");
while (! feof($fh)) {
    $s .= fread($fh,4096);
}
fclose($fh);

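if ($j) {
    // peel off one run of text and one tag per pass until nothing is left
    while (strlen($s)) {
        if (preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s',$s,$matches) &&
            (strlen($matches[3]) < strlen($s))) {
            // highlight the text, pass the tag through untouched
            print preg_replace($patterns,$replacements,$matches[1]);
            print $matches[2];
            $s = $matches[3];
        } else {
            // nothing was consumed (a stray "<" with no closing ">", for
            // example): print the rest as is instead of looping forever
            print $s;
            break;
        }
    }
} else {
    print $s;
}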
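To try the approach without fetching a URL, the same loop can be sketched as a function that takes a string and returns the marked-up result. The function name highlight_words( ), the sample markup, and the check that stops the loop when a pass consumes nothing are assumptions made for illustration; the patterns and colors follow the program above.

```php
<?php
// Hypothetical helper: highlights $words in $html, skipping text inside tags.
function highlight_words($html, $words) {
    $colors = array('FFFF00','FF9900','FF0000','FF00FF',
                    '99FF33','33FFCC','FF99FF','00CC33');
    $patterns = array();
    $replacements = array();
    foreach (array_values($words) as $i => $word) {
        $patterns[$i] = '/\b' . preg_quote($word, '/') . '\b/';
        $replacements[$i] = '<b style="color:black;background-color:#' .
                            $colors[$i % count($colors)] . '">' . $word . '</b>';
    }
    if (! $patterns) {
        return $html;   // nothing to highlight
    }

    $out = '';
    while ($html !== '') {
        if (! preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s', $html, $m)
            || strlen($m[3]) >= strlen($html)) {
            $out .= $html;   // no progress: emit the rest untouched
            break;
        }
        // Highlight only the text before the tag; copy the tag as is.
        $out .= preg_replace($patterns, $replacements, $m[1]) . $m[2];
        $html = $m[3];
    }
    return $out;
}

// "cat" occurs in two attributes, as a word, and inside "category".
$page = '<a href="/cat.html" title="cat">cat and category</a>';
$result = highlight_words($page, array('cat'));
print $result . "\n";
// prints: <a href="/cat.html" title="cat"><b style="color:black;background-color:#FFFF00">cat</b> and category</a>
```

Only the standalone "cat" in the link text is wrapped: the occurrences in the attributes sit inside the tag, and "category" fails the \b test. One caveat applies when several words are supplied: preg_replace( ) applies the patterns one after another, so a later word such as "color" could match inside markup inserted for an earlier word.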

11.8.4. See Also

Recipe 13.8 for information on capturing text inside HTML tags; documentation on preg_match( ) at http://www.php.net/preg-match and preg_replace( ) at http://www.php.net/preg-replace.




Copyright © 2003 O'Reilly & Associates. All rights reserved.