Chapter 11 - C API Reference Guide, Part II / String and URI Manipulation
URI Parsing and Manipulation
In addition to the general string manipulation routines described above, Apache provides specific routines for manipulating URIs. With these routines you can break a URI into its components and put it back together again.
The main data structure used by these routines is the uri_components
struct. The typedef for uri_components is found in the util_uri.h
header file and reproduced in Example 11-5.
For your convenience, a preparsed uri_components struct is contained
in every incoming request, in the field parsed_uri. The various
fields of the parsed URI are as follows:
char *scheme
This field contains the URI's scheme. Possible values include http, https, ftp, and file.
char *hostinfo
This field contains the part of the URI between the pair of initial slashes and the beginning
of the document path. It is often just the hostname for the request, but its full
form includes the port and the username/password combination needed to gain access
under certain protocols (such as nonanonymous FTP). Here's an example hostinfo string
that shows all the optional parts:
doug:xyzzy@ftp.modperl.com:23
char *user
This field contains the username part of the hostinfo field or an empty string if absent.
char *password
This field contains the password part of the hostinfo field or an empty string if absent.
char *port_str
This field contains the string representation of the port. You can fetch the numeric representation
from the port field.
char *path
This field corresponds to the path portion of the URI, namely everything after the
hostinfo. Neither the query string (the optional text that follows the ? symbol) nor the
optional #anchor names that appear at the ends of many HTTP URLs are part of the
path. It is equivalent to r->uri.
char *query
The query field holds the query string, that is, everything after the ? in the path but not
including the #anchor fragment, if any. It is equivalent to r->args.
char *fragment
This field contains the #anchor fragment, if any. The # symbol itself is omitted.
unsigned short port
port holds the port number of the URI, in integer form. For the same information in
text form, see port_str.
The other fields in the uri_components record are for internal use only and are not to be relied on.
Example 11-5. The uri_components
Data Type
typedef struct {
char *scheme; /* scheme ("http"/"ftp"/...) */
char *hostinfo; /* combined [user[:password]@]host[:port] */
char *user; /* user name, as in http://user:passwd@host:port/ */
char *password; /* password, as in http://user:passwd@host:port/ */
char *hostname; /* hostname from URI (or from Host: header) */
char *port_str; /* port string (integer representation is in "port") */
char *path; /* the request path
(or "/" if only scheme://host was given) */
char *query; /* Everything after a '?' in the path, if present */
char *fragment; /* Trailing "#fragment" string, if present */
struct hostent *hostent;
unsigned short port; /* The port number, numeric,
valid only if port_str != NULL */
unsigned is_initialized:1;
unsigned dns_looked_up:1;
unsigned dns_resolved:1;
} uri_components;
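To make the relationship between hostinfo and the user, password, hostname, and port fields concrete, here is a minimal, self-contained sketch that splits a string of the form [user[:password]@]host[:port]. It is not Apache's implementation: the hostinfo_parts type and the split_hostinfo() and copy_span() helpers are hypothetical names invented for this illustration, fixed-size buffers stand in for pool allocation, and bracketed IPv6 literals are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the hostinfo-related fields; not Apache's struct. */
typedef struct {
    char user[64];
    char password[64];
    char hostname[128];
    char port_str[16];
    unsigned short port;   /* meaningful only when port_str is non-empty */
} hostinfo_parts;

/* Copy at most dstsize-1 bytes of [src, src+len) into dst and terminate it. */
static void copy_span(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split "[user[:password]@]host[:port]" into its parts.
 * Parts that are absent are left as empty strings. */
static void split_hostinfo(const char *hostinfo, hostinfo_parts *out)
{
    const char *at = strrchr(hostinfo, '@');   /* end of the user info, if any */
    const char *host = hostinfo;
    const char *colon;

    memset(out, 0, sizeof(*out));

    if (at != NULL) {
        /* Look for the user/password separator only inside the user info. */
        const char *sep = memchr(hostinfo, ':', (size_t)(at - hostinfo));
        if (sep != NULL) {
            copy_span(out->user, sizeof(out->user),
                      hostinfo, (size_t)(sep - hostinfo));
            copy_span(out->password, sizeof(out->password),
                      sep + 1, (size_t)(at - sep - 1));
        } else {
            copy_span(out->user, sizeof(out->user),
                      hostinfo, (size_t)(at - hostinfo));
        }
        host = at + 1;
    }

    colon = strchr(host, ':');
    if (colon != NULL) {
        copy_span(out->hostname, sizeof(out->hostname),
                  host, (size_t)(colon - host));
        copy_span(out->port_str, sizeof(out->port_str),
                  colon + 1, strlen(colon + 1));
        out->port = (unsigned short)atoi(out->port_str);
    } else {
        copy_span(out->hostname, sizeof(out->hostname), host, strlen(host));
    }
}
```

Given the string doug:xyzzy@ftp.modperl.com:23, this sketch fills in doug, xyzzy, ftp.modperl.com, and 23 respectively.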
In addition to the uri_components record located in the request record's parsed_uri field, you can access Apache's URI parsing and manipulation package using a series of routines variously declared in httpd.h and util_uri.h:
int ap_unescape_url (char *url)
(Declared in the header file httpd.h.) This routine will unescape URI hex escapes. The
unescaping is performed in place, replacing the original string. During the unescaping process,
Apache performs some basic consistency checking on the URI and returns the
result of this check as the function result code. The function will return HTTP_BAD_REQUEST
if it encounters an invalid hex escape (for example, %1g), and HTTP_NOT_FOUND
if replacing a hex escape with its text equivalent results in either the character /
or \0. If the URI passes these checks, the function returns OK.
if (ap_unescape_url(url) != OK) {
ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_WARNING,
r->server, "bad URI during unescaping");
}
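The in-place decoding and the three result codes described above can be illustrated with a small standalone sketch. This is not the Apache source: sketch_unescape_url() and hex_value() are hypothetical names, and the SKETCH_* constants are local stand-ins for OK, HTTP_BAD_REQUEST, and HTTP_NOT_FOUND so that the code compiles without the Apache headers.

```c
#include <ctype.h>

/* Local stand-ins for Apache's status codes (assumed values for this sketch). */
#define SKETCH_OK            0
#define SKETCH_BAD_REQUEST 400
#define SKETCH_NOT_FOUND   404

/* Convert one hex digit to its value; the caller guarantees isxdigit(c). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Decode %XX escapes in place. The decoded string is never longer than
 * the original, so writing through dst cannot overtake src. */
static int sketch_unescape_url(char *url)
{
    int badesc = 0, badpath = 0;
    char *src = url, *dst = url;

    while (*src != '\0') {
        if (*src != '%') {
            *dst++ = *src++;
        } else if (!isxdigit((unsigned char)src[1]) ||
                   !isxdigit((unsigned char)src[2])) {
            badesc = 1;          /* malformed escape such as %1g */
            *dst++ = *src++;     /* keep the literal '%' and move on */
        } else {
            char c = (char)(hex_value(src[1]) * 16 + hex_value(src[2]));
            if (c == '/' || c == '\0')
                badpath = 1;     /* escape decoded to '/' or NUL */
            *dst++ = c;
            src += 3;
        }
    }
    *dst = '\0';

    if (badesc)
        return SKETCH_BAD_REQUEST;
    if (badpath)
        return SKETCH_NOT_FOUND;
    return SKETCH_OK;
}
```

With this sketch, the buffer "/a%20b" becomes "/a b" and the result is SKETCH_OK, while "%1g" yields SKETCH_BAD_REQUEST and "a%2Fb" yields SKETCH_NOT_FOUND.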
char *ap_os_escape_path (pool *p, const char *path, int partial)
(Declared in the header file httpd.h.) ap_os_escape_path() takes a filesystem pathname in
path and converts it into a properly escaped URI in an operating system-dependent way,
returning the new string as its function result. If the partial flag is false, then the function
will add a / to the beginning of the URI if the path does not already begin with one.
If the partial flag is true, the function will not add the slash.
char *escaped = ap_os_escape_path(p, url, 1);
int ap_is_url (const char *string)
(Declared in the header file httpd.h.) This function returns true if string is a fully qualified
URI (including scheme and hostname), false otherwise. Among other things it is handy
when processing configuration directives that are expected to accept URIs.
if(ap_is_url(string)) {
...
}
char *ap_construct_url (pool *p, const char *uri, const request_rec
*r)
This function builds a fully qualified URI string from the path specified by uri, using the
information stored in the request record r to determine the server name and port. The
port number is not included in the string if it is the same as the default port 80. For example,
imagine that the current request is directed to the virtual server
www.modperl.com at port 80. Then the following call will return the string
http://www.modperl.com/index.html:
char *url = ap_construct_url(r->pool, "/index.html", r);
char *ap_construct_server (pool *p, const char *hostname, unsigned
port, const request_rec *r)
(Declared in the header file httpd.h.) The ap_construct_server() function builds the hostname:port
part of a URI and returns it as a new string. The port will not be included in
the string if it is the same as the default. You provide a resource pool in p, the name of
the host in hostname, the port number in port, and the current request record in r. The
request record is used to determine the default port number only and is not otherwise
involved in constructing the string. For example, if hostname contains www.modperl.com, the following code will return www.modperl.com:8001:
char *server = ap_construct_server(r->pool, hostname, 8001, r);
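The rule of leaving out the port when it matches the default can be sketched in a few lines of standalone C. The function sketch_construct_server() is a hypothetical name, and unlike the Apache routine it takes the default port as an explicit argument and writes into a caller-supplied buffer rather than allocating from a pool.

```c
#include <stdio.h>

/* Build "hostname" or "hostname:port" in buf, omitting the port when it
 * equals default_port. Output is truncated if buf is too small. */
static void sketch_construct_server(char *buf, size_t buflen,
                                    const char *hostname,
                                    unsigned port, unsigned default_port)
{
    if (port == default_port)
        snprintf(buf, buflen, "%s", hostname);
    else
        snprintf(buf, buflen, "%s:%u", hostname, port);
}
```

Called with www.modperl.com, port 8001, and a default of 80, the sketch produces www.modperl.com:8001; with port 80 it produces just www.modperl.com.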
unsigned short ap_default_port_for_scheme (const char *scheme)
(Declared in the header file util_uri.h.) This handy routine returns the default port number
for the given URL scheme. The scheme you provide is compared in a case-insensitive
manner to an internal list maintained by Apache. For example, here's how to determine
the default port for the secure HTTPS scheme:
unsigned short port = ap_default_port_for_scheme("https");
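A case-insensitive lookup of this kind can be sketched with a small table. The table below is illustrative only: it lists a few well-known port assignments, and Apache's internal list may contain different entries. The names scheme_port, sketch_schemes, equal_ignoring_case(), and sketch_default_port_for_scheme() are hypothetical, and returning 0 for an unknown scheme is a choice made for this sketch.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned short port;
} scheme_port;

/* Illustrative table of well-known ports; not Apache's internal list. */
static const scheme_port sketch_schemes[] = {
    { "http",   80  },
    { "https",  443 },
    { "ftp",    21  },
    { "gopher", 70  }
};

/* Return 1 if a and b are equal when letter case is ignored, else 0. */
static int equal_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Look up the default port for a scheme; 0 means "unknown" in this sketch. */
static unsigned short sketch_default_port_for_scheme(const char *scheme)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_schemes) / sizeof(sketch_schemes[0]); i++) {
        if (equal_ignoring_case(scheme, sketch_schemes[i].name))
            return sketch_schemes[i].port;
    }
    return 0;
}
```

Both "https" and "HTTPS" map to 443 here, since the comparison ignores case.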
unsigned short ap_default_port_for_request (const request_rec *r)
(Declared in the header file util_uri.h.) The ap_default_port_for_request() function looks up
the scheme from the request record argument, then calls ap_default_port_for_scheme() to return the
default port for that scheme. It is almost exactly equivalent to calling
ap_default_port_for_scheme(r->parsed_uri.scheme).
unsigned short port = ap_default_port_for_request(r);
struct hostent * ap_pgethostbyname (pool *p, const char *hostname)
(Declared in the header file util_uri.h.) This function is a wrapper around the standard
gethostbyname() function. The struct hostent pointer normally returned by the standard
function lives in static storage space, so ap_pgethostbyname() makes a copy of this structure
in memory allocated from the passed resource pool in order to avoid any trouble
this might cause. This allows the call to be thread-safe.
int ap_parse_uri_components (pool *p, const char *uri, uri_components
*uptr)
(Declared in the header file util_uri.h.) Given a pool pointer p, a URI uri, and a
uri_components structure pointer uptr, this routine will parse the URI and place the extracted
components in the appropriate fields of uptr. The return value is either HTTP_OK (integer
200, not to be confused with the usual OK which is integer 0) to indicate parsing
success or HTTP_BAD_REQUEST to indicate that the string did not look like a valid
URI.
uri_components uri;
int rc = ap_parse_uri_components(p, "http://www.modperl.com/index.html", &uri);
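To show roughly what such a parse involves, here is a simplified standalone sketch that breaks a fully qualified URI into scheme, hostinfo, path, query, and fragment. It is an illustration under stated assumptions, not the Apache parser: sketch_uri, copy_part(), and sketch_parse_uri() are hypothetical names, fixed-size buffers replace pool allocation, only URIs containing scheme:// are accepted, and the return values 0 and -1 are this sketch's own convention rather than HTTP_OK and HTTP_BAD_REQUEST.

```c
#include <string.h>

/* Hypothetical container for the parsed pieces; absent parts stay empty. */
typedef struct {
    char scheme[16];
    char hostinfo[128];
    char path[256];
    char query[256];
    char fragment[64];
} sketch_uri;

/* Copy at most dstsize-1 bytes of [src, src+len) into dst and terminate it. */
static void copy_part(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Returns 0 on success, -1 if the string does not start with "scheme://". */
static int sketch_parse_uri(const char *uri, sketch_uri *out)
{
    const char *sep = strstr(uri, "://");
    const char *cur;
    size_t n;

    memset(out, 0, sizeof(*out));
    if (sep == NULL || sep == uri)
        return -1;
    copy_part(out->scheme, sizeof(out->scheme), uri, (size_t)(sep - uri));

    /* hostinfo runs up to the start of the path, query, or fragment */
    cur = sep + 3;
    n = strcspn(cur, "/?#");
    copy_part(out->hostinfo, sizeof(out->hostinfo), cur, n);
    cur += n;

    /* path runs up to '?' or '#'; default to "/" if only scheme://host given */
    n = strcspn(cur, "?#");
    if (n == 0)
        copy_part(out->path, sizeof(out->path), "/", 1);
    else
        copy_part(out->path, sizeof(out->path), cur, n);
    cur += n;

    /* query is everything after '?' up to, but not including, '#' */
    if (*cur == '?') {
        cur++;
        n = strcspn(cur, "#");
        copy_part(out->query, sizeof(out->query), cur, n);
        cur += n;
    }

    /* fragment is stored without the leading '#' */
    if (*cur == '#')
        copy_part(out->fragment, sizeof(out->fragment), cur + 1, strlen(cur + 1));

    return 0;
}
```

For http://www.modperl.com/index.html?a=1#top the sketch yields the scheme http, the hostinfo www.modperl.com, the path /index.html, the query a=1, and the fragment top.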
char *ap_unparse_uri_components (pool *p, const uri_components *uptr,
unsigned flags)
(Declared in the header file util_uri.h.) The interesting ap_unparse_uri_components() routine
reverses the effect of the previous call, using a populated uri_components record to
create a URI string, which is returned as the function result. The flags argument is a bit
mask of options that modify the constructed URI string. Possible values for flags
include:
UNP_OMITSITEPART
Suppress the scheme and hostinfo parts from the constructed URI.
UNP_OMITUSER
Suppress the username from the hostinfo part of the URI.
UNP_OMITPASSWORD
Suppress the password from the hostinfo part of the URI.
UNP_REVEALPASSWORD
For security reasons, unless the UNP_REVEALPASSWORD bit is explicitly set, the password part of the URI will be replaced with a series of X characters.
UNP_OMITPATHINFO
If this bit is set, completely suppress the path part of the URI, including the query string.
UNP_OMITQUERY
Suppress the query string and the fragment, if any.
The following example will re-create the URI without the username and password parts.
char *string = ap_unparse_uri_components(p, &uri,
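To make the effect of a flags bit mask concrete, here is a self-contained sketch of rebuilding a URI string under options like those listed above. Everything in it is an assumption made for illustration: the SK_* constants are hypothetical values defined locally (the real UNP_* constants come from util_uri.h), sketch_parts is a simplified stand-in for uri_components, the port is left out for brevity, and the output goes to a caller-supplied buffer instead of pool memory.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values for this sketch only. */
#define SK_OMITSITEPART   (1U << 0)
#define SK_OMITUSER       (1U << 1)
#define SK_OMITPASSWORD   (1U << 2)
#define SK_REVEALPASSWORD (1U << 3)
#define SK_OMITPATHINFO   (1U << 4)
#define SK_OMITQUERY      (1U << 5)

/* Simplified stand-in for uri_components. scheme, hostname, and path must
 * be non-NULL; user, password, query, and fragment may be NULL if absent. */
typedef struct {
    const char *scheme, *user, *password, *hostname;
    const char *path, *query, *fragment;
} sketch_parts;

/* Append s to the string in buf, truncating if buf is full. */
static void append(char *buf, size_t buflen, const char *s)
{
    size_t used = strlen(buf);
    if (used < buflen - 1)
        snprintf(buf + used, buflen - used, "%s", s);
}

static void sketch_unparse(char *buf, size_t buflen,
                           const sketch_parts *u, unsigned flags)
{
    buf[0] = '\0';

    if (!(flags & SK_OMITSITEPART)) {
        int show_user = u->user != NULL && !(flags & SK_OMITUSER);
        int show_pass = u->password != NULL && !(flags & SK_OMITPASSWORD);

        append(buf, buflen, u->scheme);
        append(buf, buflen, "://");
        if (show_user)
            append(buf, buflen, u->user);
        if (show_pass) {
            append(buf, buflen, ":");
            /* mask the password unless the caller explicitly reveals it */
            append(buf, buflen,
                   (flags & SK_REVEALPASSWORD) ? u->password : "XXXXXXXX");
        }
        if (show_user || show_pass)
            append(buf, buflen, "@");
        append(buf, buflen, u->hostname);
    }

    if (!(flags & SK_OMITPATHINFO)) {
        append(buf, buflen, u->path);
        if (!(flags & SK_OMITQUERY)) {
            if (u->query != NULL) {
                append(buf, buflen, "?");
                append(buf, buflen, u->query);
            }
            if (u->fragment != NULL) {
                append(buf, buflen, "#");
                append(buf, buflen, u->fragment);
            }
        }
    }
}
```

For a record holding ftp, doug, xyzzy, ftp.modperl.com, and /pub/file.txt, passing SK_OMITUSER|SK_OMITPASSWORD gives ftp://ftp.modperl.com/pub/file.txt, while passing 0 gives ftp://doug:XXXXXXXX@ftp.modperl.com/pub/file.txt.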
Copyright © 1999 by O'Reilly & Associates, Inc.