Programming PHP

4.5. Encoding and Escaping

Because PHP programs often interact with HTML pages, web addresses (URLs), and databases, there are functions to help you work with those types of data. HTML, web page addresses, and database commands are all strings, but they each require different characters to be escaped in different ways. For instance, a space in a web address must be written as %20, while a literal less-than sign (<) in an HTML document must be written as &lt;. PHP has a number of built-in functions to convert to and from these encodings.
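As a quick illustration of the two escapings mentioned above, the sketch below encodes the same string once for a URL and once for HTML. The variable names are made up for this example; rawurlencode( ) and htmlspecialchars( ) are PHP built-ins.

```php
<?php
// The same text needs different escaping depending on where it ends up.
$text = 'angle < 30';

$asUrl  = rawurlencode($text);      // for use inside a web address
$asHtml = htmlspecialchars($text);  // for use inside an HTML page

echo $asUrl, "\n";   // angle%20%3C%2030
echo $asHtml, "\n";  // angle &lt; 30
```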

4.5.1. HTML

Special characters in HTML are represented by entities such as &amp; and &lt;. There are two PHP functions for turning special characters in a string into their entities, one for removing HTML tags, and one for extracting only meta tags.

4.5.1.2. Entity-quoting only HTML syntax characters

The htmlspecialchars( ) function converts the smallest set of characters needed to generate valid HTML. The following characters are converted:

  • Ampersands (&) are converted to &amp;

  • Double quotes (") are converted to &quot;

  • Single quotes (') are converted to &#039; (if ENT_QUOTES is on, as described for htmlentities( ))

  • Less-than signs (<) are converted to &lt;

  • Greater-than signs (>) are converted to &gt;

If you have an application that displays data that a user has entered in a form, you need to run that data through htmlspecialchars( ) before displaying or saving it. If you don't, and the user enters a string like "angle < 30" or "sturm & drang", the browser will think the special characters are HTML, and you'll have a garbled page.
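A minimal sketch of that situation follows. The $comment variable is a stand-in for a value that would normally arrive from a form (for instance through $_POST); it is hardcoded here so the example runs on its own.

```php
<?php
// Hypothetical user input; in a real page this would come from a form field.
$comment = 'sturm & drang, angle < 30';

// Escape the HTML syntax characters before placing the text in the page.
$safe = htmlspecialchars($comment);

echo "<p>$safe</p>\n";
// <p>sturm &amp; drang, angle &lt; 30</p>
```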

Like htmlentities( ), htmlspecialchars( ) can take up to three arguments:

$output = htmlspecialchars(input, [quote_style, [charset]]);

The quote_style and charset arguments have the same meaning that they do for htmlentities( ).
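The effect of the quote_style argument can be seen by encoding one string three ways. This is a sketch that assumes the default HTML 4.01 entity set and passes UTF-8 explicitly as the charset; ENT_COMPAT is the PHP constant for "double quotes only" and is not named in the text above.

```php
<?php
$input = 'She said "it\'s <fine>"';

$compat   = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');   // double quotes only
$quotes   = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');   // both kinds of quotes
$noquotes = htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8'); // quotes left alone

echo $compat, "\n";   // She said &quot;it's &lt;fine&gt;&quot;
echo $quotes, "\n";   // She said &quot;it&#039;s &lt;fine&gt;&quot;
echo $noquotes, "\n"; // She said "it's &lt;fine&gt;"
```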

PHP 4.3 added html_entity_decode( ), and PHP 5.1 added htmlspecialchars_decode( ), for converting entities back to the original text. In older versions, or when you need a customized mapping, there is a relatively simple way to do the conversion by hand. Use the get_html_translation_table( ) function to fetch the translation table used by either htmlentities( ) or htmlspecialchars( ) in a given quote style. For example, to get the translation table that htmlentities( ) uses, do this:

$table = get_html_translation_table(HTML_ENTITIES);

To get the table for htmlspecialchars( ) in ENT_NOQUOTES mode, use:

$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);

A nice trick is to use this translation table, flip it using array_flip( ), and feed it to strtr( ) to apply it to a string, thereby effectively doing the reverse of htmlentities( ):

$str = htmlentities("Einstürzende Neubauten");  // now it is encoded

$table = get_html_translation_table(HTML_ENTITIES);
$rev_trans = array_flip($table);

echo strtr($str,$rev_trans);  // back to normal
Einstürzende Neubauten
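On PHP 4.3 and later (5.1 for the second function), the built-in decoders html_entity_decode( ) and htmlspecialchars_decode( ) give the same result without building a table by hand. The sketch below assumes the script file is saved as UTF-8 and passes the charset explicitly.

```php
<?php
// Encode, then decode with the built-in function instead of a flipped table.
$encoded = htmlentities("Einstürzende Neubauten", ENT_QUOTES, 'UTF-8');
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n";  // Einst&uuml;rzende Neubauten
echo $decoded, "\n";  // Einstürzende Neubauten

// The counterpart of htmlspecialchars( ):
$plain = htmlspecialchars_decode('angle &lt; 30');
echo $plain, "\n";    // angle < 30
```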

You can, of course, also fetch the translation table, add whatever other translations you want to it, and then do the strtr( ). For example, if you wanted htmlentities( ) to also encode spaces to &nbsp;s, you would do:

$table = get_html_translation_table(HTML_ENTITIES); 
$table[' '] = '&nbsp;'; 
$encoded = strtr($original, $table);
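A runnable variant of the same idea, with a sample value for $original and the charset stated explicitly (both are choices made for this sketch):

```php
<?php
$original = 'a < b';

// Start from the htmlentities( ) table and add one extra mapping.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
$table[' '] = '&nbsp;';

$encoded = strtr($original, $table);
echo $encoded, "\n";  // a&nbsp;&lt;&nbsp;b
```

Because strtr( ) with an array argument does not rescan text it has already replaced, the ampersand in each inserted &nbsp; is not encoded a second time.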

4.5.2. URLs

PHP provides functions to convert to and from URL encoding, which allows you to build and decode URLs. There are actually two types of URL encoding, which differ in how they treat spaces. The first (specified by RFC 1738) treats a space as just another illegal character in a URL and encodes it as %20. The second (implementing the application/x-www-form-urlencoded system) encodes a space as a + and is used in building query strings.
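The difference between the two encodings is easiest to see side by side. This sketch uses PHP's built-in rawurlencode( ) for the RFC 1738 style and urlencode( ) for the form style; the sample string is arbitrary.

```php
<?php
$value = 'hello world & more';

$raw  = rawurlencode($value);  // RFC 1738 style: space becomes %20
$form = urlencode($value);     // form style: space becomes +

echo $raw, "\n";   // hello%20world%20%26%20more
echo $form, "\n";  // hello+world+%26+more
```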

Note that you don't want to use these functions on a complete URL, like http://www.example.com/hello, as they will escape the colons and slashes to produce http%3A%2F%2Fwww.example.com%2Fhello. Only encode partial URLs (the bit after http://www.example.com/), and add the protocol and domain name later.
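One way to follow this advice is to encode each piece separately and then join the pieces to an unencoded base. The path and query values below are invented for illustration.

```php
<?php
$base  = 'http://www.example.com/';                 // protocol and domain, left as is
$path  = rawurlencode('hello world');               // one path segment
$query = 'q=' . urlencode('sturm & drang');         // one query-string value

$url = $base . $path . '?' . $query;
echo $url, "\n";
// http://www.example.com/hello%20world?q=sturm+%26+drang

// Encoding the complete URL escapes the colons and slashes too:
$broken = rawurlencode('http://www.example.com/hello');
echo $broken, "\n";
// http%3A%2F%2Fwww.example.com%2Fhello
```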




Copyright © 2003 O'Reilly & Associates. All rights reserved.