Chapter 11 - C API Reference Guide, Part II / String and URI Manipulation
String Comparison, Pattern Matching, and Transformation
The following group of functions provides string pattern matching, substitution, and transformation operations similar to (but more limited than) Perl's built-in operators. Most of these functions are declared in httpd.h. The few exceptions are listed separately.
int ap_fnmatch (const char *pattern, const char *string, int flags)
(Declared in the header file fnmatch.h.) The ap_fnmatch()
function is based on the POSIX.2 fnmatch() function. You provide
a search pattern, a string to search, and a bit mask of option flags. The
function will return 0 if a match is found, or the nonzero constant FNM_NOMATCH
otherwise. Note that the function result is the reverse of what you would
expect. It is done this way in order to be compatible with strcasecmp().
It may be less confusing to compare the function result to the constant FNM_NOMATCH
than to test for zero.

The pattern you provide is not a regular expression, but a shell-style glob
pattern. In addition to the wildcard characters * and ?,
patterns containing string sets like foo.{h,c,cc} and character
ranges like .[a-zA-Z]* are allowed.

The flags argument is the
bitwise combination of zero or more of the following constants (defined in
fnmatch.h):
FNM_NOESCAPE
If set, treat the backslash character as an ordinary character instead of
as an escape.
FNM_PATHNAME
If set, allow a slash in string to match only a slash in pattern and never a wildcard character or character range.
FNM_PERIOD
If this flag is set, a leading period in string must match exactly with a period in pattern. A period is considered to be leading if it is the first character in string or if FNM_PATHNAME is set and the period immediately follows a slash.
FNM_CASE_BLIND
If this flag is set, then a case-insensitive comparison is performed. This is an Apache extension and not part of the POSIX.2 standard.
Typically you will use ap_fnmatch() to match filename patterns. In fact, this function is
used internally for matching glob-style patterns in configuration sections such as <Files>
and <Location> (the FilesMatch and LocationMatch variants take regular expressions instead). Example:
if(ap_fnmatch("*.html", filename, FNM_PATHNAME|FNM_CASE_BLIND)
!= FNM_NOMATCH) {
...
}
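To see the return convention and the effect of the flags outside of Apache, here is a minimal sketch built on the system's POSIX fnmatch() rather than ap_fnmatch(). The wrapper name glob_matches() is our own invention for illustration. Because fnmatch() may also return other nonzero values on error, the sketch tests for zero. FNM_CASE_BLIND is not used here because it is an Apache extension.

```c
#include <fnmatch.h>

/* Hypothetical wrapper: turn fnmatch()'s "0 means match" result
 * into an ordinary boolean (1 = match, 0 = no match or error). */
static int glob_matches(const char *pattern, const char *string, int flags)
{
    return fnmatch(pattern, string, flags) == 0;
}
```

With no flags, "*.html" matches "docs/index.html" because * is allowed to swallow the slash. With FNM_PATHNAME the same call fails, and with FNM_PERIOD a bare "*" will not match a dot file such as ".htaccess".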
int ap_is_fnmatch (const char *pattern)
(Declared in the header file fnmatch.h.) This function returns true if pattern contains glob
characters, false otherwise. It is useful in deciding whether to perform an ap_fnmatch()
pattern search or an ordinary string comparison.
if (ap_is_fnmatch(target)) {
    /* target is the pattern, so it goes first */
    file_matches = !ap_fnmatch(target, filename, FNM_PATHNAME);
}
else {
    file_matches = !strcmp(filename, target);
}
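The same decision can be tried in a standalone program. The following is only a sketch: looks_like_glob() is a simplified stand-in of our own for ap_is_fnmatch() (it merely looks for the characters *, ?, and [), and the system fnmatch() stands in for ap_fnmatch().

```c
#include <fnmatch.h>
#include <string.h>

/* Simplified, hypothetical stand-in for ap_is_fnmatch():
 * true if the string contains any of the characters * ? [ */
static int looks_like_glob(const char *s)
{
    return strpbrk(s, "*?[") != NULL;
}

/* Use glob matching only when the target actually is a pattern;
 * otherwise fall back to a plain string comparison. */
static int target_matches(const char *target, const char *filename)
{
    if (looks_like_glob(target))
        return fnmatch(target, filename, FNM_PATHNAME) == 0;
    return strcmp(target, filename) == 0;
}
```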
int ap_strcmp_match (const char *string, const char *pattern)
Just to add to the confusion, ap_strcmp_match() provides functionality similar to
ap_fnmatch() but only recognizes the * and ? wildcards. The function returns 0 if a match is
found, nonzero otherwise. This is an older function, and there is no particular reason to
prefer it. However, you'll see it used in some standard modules, including in mod_autoindex
where it is called on to determine what icon applies to a filename.
if(!ap_strcmp_match(filename, "*.html")) {
...
}
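If you are curious how little machinery a matcher of this kind needs, here is a minimal sketch of one. It is not Apache's source code: wild_match() is a hypothetical name, and the only things it borrows from ap_strcmp_match() are the argument order and the convention of returning 0 on a match.

```c
/* Hypothetical matcher that understands only * and ?.
 * Returns 0 if str matches pat, nonzero otherwise. */
static int wild_match(const char *str, const char *pat)
{
    for (; *pat != '\0'; ++str, ++pat) {
        if (*pat == '*') {
            while (*pat == '*')          /* collapse runs of stars */
                ++pat;
            if (*pat == '\0')            /* trailing * matches the rest */
                return 0;
            for (; *str != '\0'; ++str)  /* try every possible suffix */
                if (wild_match(str, pat) == 0)
                    return 0;
            return 1;
        }
        if (*str == '\0')                /* string ended before pattern */
            return 1;
        if (*pat != '?' && *pat != *str) /* literal mismatch */
            return 1;
    }
    return *str != '\0';                 /* match only if both are used up */
}
```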
int ap_strcasecmp_match (const char *str, const char *exp)
ap_strcasecmp_match() is the same as ap_strcmp_match() but performs a case-insensitive comparison.
int ap_is_matchexp (const char *string)
This function returns true if the string contains either of the wildcard characters * and ?,
false otherwise. It is useful for testing whether a user-provided configuration string
should be treated as a pattern to be passed to ap_strcmp_match() or as an ordinary string.
Example:
if (ap_is_matchexp(target)) {
    file_matches = !ap_strcmp_match(filename, target);
}
else {
    file_matches = !strcmp(filename, target);
}
int ap_checkmask (const char *string, const char *mask)
(Declared in the header file util_date.h.) The ap_checkmask() function will attempt to
match the given string against the character mask. Unlike the previous string matching
functions, ap_checkmask() will return true (nonzero) for a successful match, false (zero) if
the match fails. The mask is constructed from the following characters:
@    uppercase letter
$    lowercase letter
&    hex digit
#    digit
~    digit or space
*    swallow remaining characters
x    exact match for any other character
For example, ap_parseHTTPdate() uses this function to determine
the date format, such as RFC 1123:
if (ap_checkmask(date, "## @$$ #### ##:##:## *")) {
...
}
Because it was originally written to support date and time parsing routines,
this function is declared in util_date.h.
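The mask characters listed above translate almost directly into a switch statement. The following is a sketch under that reading of the list, not Apache's implementation; the name check_mask() is hypothetical.

```c
#include <ctype.h>

/* Hypothetical mask matcher following the character classes
 * described above. Returns 1 on a match, 0 otherwise. */
static int check_mask(const char *data, const char *mask)
{
    for (;; ++data, ++mask) {
        unsigned char d = (unsigned char)*data;
        switch (*mask) {
        case '\0': return d == '\0';   /* both must end together */
        case '*':  return 1;           /* swallow whatever remains */
        case '@':  if (!isupper(d)) return 0; break;
        case '$':  if (!islower(d)) return 0; break;
        case '#':  if (!isdigit(d)) return 0; break;
        case '&':  if (!isxdigit(d)) return 0; break;
        case '~':  if (d != ' ' && !isdigit(d)) return 0; break;
        default:   if (*mask != *data) return 0; break;
        }
    }
}
```

Given the RFC 1123 mask from the example, the string "06 Nov 1994 08:49:37 GMT" matches, while the older "06-Nov-94 08:49:37 GMT" form fails at the first hyphen.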
int ap_ind (const char *s, char c)
This function is similar to the standard C library index() function. It will scan the
character string s from left to right until it finds the character c, returning the location of
the first occurrence of c, or -1 if the character is not found. Note that the function result
is the integer index of the located character, not a string pointer as in the standard C
function.
int ap_rind (const char *s, char c)
ap_rind() behaves like ap_ind(), except that it scans the string from right to left, returning
the index of the rightmost occurrence of character c. This function is particularly
useful for Hebrew and Arabic texts.
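The difference between an integer index and a string pointer is easy to see in a sketch built on the standard strchr() and strrchr() functions. The names index_of() and rindex_of() are hypothetical stand-ins for ap_ind() and ap_rind().

```c
#include <string.h>

/* Index of the first occurrence of c in s, or -1 if absent. */
static int index_of(const char *s, char c)
{
    const char *p = strchr(s, c);
    return p != NULL ? (int)(p - s) : -1;
}

/* Index of the last occurrence of c in s, or -1 if absent. */
static int rindex_of(const char *s, char c)
{
    const char *p = strrchr(s, c);
    return p != NULL ? (int)(p - s) : -1;
}
```

A common use is splitting a path: the value returned by rindex_of(path, '/') plus one is the offset at which the final path component begins.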
regex_t *ap_pregcomp (pool *p, const char *pattern, int cflags);
void ap_pregfree (pool *p, regex_t *reg);
Apache supports regular expression matching using the system library's regular expression
routines regcomp(), regexec(), regerror(), and regfree(). If these functions are not available, then Apache uses its own package of regular expression routines. Documentation
for the regular expression routines can be found in your system manual pages. If your
system does not support these routines, the documentation for Apache's regular expression
package can be found in the regex/ subdirectory of the Apache source tree.

We won't try to document the complexities of regular expression matching here, except
to remind you that regular expression matching occurs in two phases. In the first phase,
you call regcomp() to compile a regular expression pattern string into a compiled form. In
the second phase, you pass the compiled pattern to regexec() to match the search pattern
against a source string. In the course of performing its regular expression match,
regexec() writes the offsets of each matched parenthesized subexpression into an array
named pmatch[]. The significance of this array will become evident in the next section
when we discuss ap_pregsub().

For your convenience, Apache provides wrapper routines around regcomp() and regfree()
that make working with regular expressions somewhat simpler. ap_pregcomp() works like
regcomp() to compile a regular expression string, except that it automatically allocates
memory for the compiled expression from the provided resource pool pointer. pattern
contains the string to compile, and cflags is a bit mask of flags that control the type of
regular expression to perform. The full list of flags can be found in the regcomp() manual
page.

In addition to allocating the regular expression, ap_pregcomp() automatically installs a
cleanup handler that calls regfree() to release the memory used by the compiled regular
expression when the transaction is finished. This relieves you of the responsibility of
doing this bit of cleanup yourself.

Speaking of which, the cleanup handler installed by ap_pregcomp() is ap_pregfree(). It frees
the regular expression by calling regfree() and then removes itself from the cleanup handler
list to ensure that it won't be called twice. You may call ap_pregfree() yourself if, for
some unlikely reason, you need to free up the memory used by the regular expression
before the cleanup would have been performed normally.
char *ap_pregsub (pool *p, const char *input, const char *source, size_t nmatch, regmatch_t pmatch[])
After performing a regular expression match with regexec(), you may use ap_pregsub() to
perform a series of string substitutions based on subexpressions that were matched during
the operation. The function is broadly similar in concept to what happens in the
right half of a Perl s/// operation.

This function uses the pmatch[] array, which regexec() populates with the start and end
positions of all the parenthesized subexpressions matched by the regular expression.
You provide ap_pregsub() with five arguments: p, a resource pool pointer; input, a character
string describing the substitutions to perform; source, the source string used for the
regular expression match; nmatch, the size of the pmatch array; and pmatch itself.

input is any arbitrary string containing the expressions $1 through $9. ap_pregsub()
replaces these expressions with the corresponding matched subexpressions from the
source string. $0 is also available for your use: it corresponds to the entire matched
string. The return value will be a newly allocated string formed from the substituted input string.

The following example shows ap_pregsub() being used to replace the .htm and .HTM filename
extensions with .html. We begin by calling ap_pregcomp() to compile the desired regular
expression and return the compiled pattern in memory allocated from the resource
pool. We specify flags that cause the match to be case-insensitive and to use the modern
regular expression syntax. We then declare the pmatch[] array to hold two
regmatch_t elements. Two elements are needed: the first corresponds to $0 and
the second to the single parenthesized subexpression in the pattern. Next we call
regexec() with the compiled pattern, the requested filename, the pmatch[] array, and its
length. The last argument to regexec(), which is used for passing various additional option
flags, is set to zero. If regexec() returns zero, we go on to call ap_pregsub() to interpolate
the matched subexpression (the filename minus its extension) into the string $1.html,
effectively replacing the extension.
regmatch_t pmatch[2];
regex_t *cpat = ap_pregcomp(r->pool, "(.+)\\.htm$", REG_EXTENDED|REG_ICASE);
if (regexec(cpat, r->filename, cpat->re_nsub+1, pmatch, 0) == 0) {
    r->filename = ap_pregsub(r->pool, "$1.html",
                             r->filename, cpat->re_nsub+1, pmatch);
}
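To experiment with the two-phase match and the substitution step without a running server, you can reproduce the idea with the plain POSIX routines. The sketch below is an approximation under stated assumptions: it uses malloc() instead of resource pools, calls regfree() by hand, and its hypothetical expand_subst() helper understands only $0 through $9. The names group_ref(), expand_subst(), and htm_to_html() are ours, not Apache's.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* If t points at "$N", report where group N lies in source and how
 * long it is (length 0 if the group did not match) and return 1. */
static int group_ref(const char *t, const char *source, size_t nmatch,
                     const regmatch_t pmatch[], const char **start, size_t *len)
{
    size_t n;
    if (t[0] != '$' || t[1] < '0' || t[1] > '9')
        return 0;
    n = (size_t)(t[1] - '0');
    *start = source;
    *len = 0;
    if (n < nmatch && pmatch[n].rm_so >= 0) {
        *start = source + pmatch[n].rm_so;
        *len = (size_t)(pmatch[n].rm_eo - pmatch[n].rm_so);
    }
    return 1;
}

/* Expand $0..$9 in templ; the caller must free() the result. */
static char *expand_subst(const char *templ, const char *source,
                          size_t nmatch, const regmatch_t pmatch[])
{
    const char *t, *start;
    size_t len, total = 0;
    char *out, *dst;

    for (t = templ; *t != '\0'; ++t) {          /* pass 1: measure */
        if (group_ref(t, source, nmatch, pmatch, &start, &len)) {
            total += len;
            ++t;                                /* skip the digit */
        } else {
            ++total;
        }
    }
    if ((out = malloc(total + 1)) == NULL)
        return NULL;
    dst = out;
    for (t = templ; *t != '\0'; ++t) {          /* pass 2: copy */
        if (group_ref(t, source, nmatch, pmatch, &start, &len)) {
            memcpy(dst, start, len);
            dst += len;
            ++t;
        } else {
            *dst++ = *t;
        }
    }
    *dst = '\0';
    return out;
}

/* Compile, match, substitute, and clean up. Returns NULL if the
 * filename does not end in .htm (in any mix of upper and lower case). */
static char *htm_to_html(const char *filename)
{
    regex_t re;
    regmatch_t pmatch[2];
    char *result = NULL;

    if (regcomp(&re, "(.+)\\.htm$", REG_EXTENDED | REG_ICASE) != 0)
        return NULL;
    if (regexec(&re, filename, 2, pmatch, 0) == 0)
        result = expand_subst("$1.html", filename, 2, pmatch);
    regfree(&re);
    return result;
}
```

The explicit regfree() and free() calls are exactly the bookkeeping that ap_pregcomp() and pool allocation take off your hands inside a module.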
char *ap_escape_shell_cmd (pool *p, const char *string)
If you must pass a user-provided string to a shell command, you should first use
ap_escape_shell_cmd() to escape characters that might otherwise be interpreted as shell metacharacters.
The function inserts backslashes in front of the potentially unsafe characters
and returns the result as a new string. Unsafe characters include the following:
& ; ` ' " | * ? ~ < > ^ ( ) [ ] { } $ \n
Example:
char *escaped_cmd = ap_escape_shell_cmd(r->pool, command);
Do not rely only on this function to make your shell commands safe. The commands themselves may behave unpredictably if presented with unreasonable input, even if the shell behaves well. The best policy is to use a regular expression match to sanity-check the contents of all user-provided data before passing it on to external programs.
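The escaping step itself is simple enough to sketch in a few lines. This is an illustration rather than Apache's code: escape_shell() is a hypothetical name, the result comes from malloc() instead of a pool, and the set of unsafe characters is our reading of the list above, with the backslash itself included as an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical escaper: put a backslash in front of every character
 * in the unsafe set. The caller must free() the result. */
static char *escape_shell(const char *s)
{
    /* Assumed unsafe set, taken from the list above plus backslash */
    static const char unsafe[] = "&;`'\"|*?~<>^()[]{}$\\\n";
    char *out = malloc(2 * strlen(s) + 1);   /* worst case: all escaped */
    char *dst = out;

    if (out == NULL)
        return NULL;
    for (; *s != '\0'; ++s) {
        if (strchr(unsafe, *s) != NULL)
            *dst++ = '\\';
        *dst++ = *s;
    }
    *dst = '\0';
    return out;
}
```

Ordinary characters, including spaces, pass through untouched, which is one reason escaping alone does not make an arbitrary command line safe.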
char *ap_escape_quotes (pool *p, const char *string)
This function behaves similarly to the previous one but only escapes double quotes.
char *escaped_string = ap_escape_quotes(r->pool, string);
void ap_str_tolower (char *string)
This function converts all uppercase characters in the given string to lowercase characters,
modifying the string in place.
ap_str_tolower(string);
char *ap_escape_html (pool *p, const char *string)
The ap_escape_html() function takes a character string and returns a modified copy in
which all special characters (such as > and <) are replaced with their HTML entities.
This makes the string safe to use inside an HTML page. For example, after the following
statement is run, the resulting string will read &lt;h1&gt;Header Level 1 Example&lt;/h1&gt;:
char *display_html = ap_escape_html(p, "<h1>Header Level 1 Example</h1>");
char *ap_uuencode (pool *p, const char *string)
This function takes a string, base64-encodes it, and returns the encoded version in a
new string allocated from the provided resource pool. Despite the function name, the
output is base64, the encoding widely used by the MIME system for packaging binary
email enclosures, rather than the older format traditionally produced by the uuencode program.
char *encoded = ap_uuencode(p, string);
char *ap_uudecode (pool *p, char *string)
ap_uudecode() reverses the effect of the previous function, transforming a
base64-encoded string into its original representation.
char *decoded = ap_uudecode(p, encoded);
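For readers who have not met base64 before, the encoding direction can be sketched as follows. This is a generic illustration of the algorithm for NUL-terminated input, not Apache's implementation; base64_encode() is a hypothetical name and the result is allocated with malloc() rather than from a pool.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical base64 encoder: every 3 input bytes become 4 output
 * characters, with '=' padding. The caller must free() the result. */
static char *base64_encode(const char *in)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t len = strlen(in);
    char *out = malloc(4 * ((len + 2) / 3) + 1);
    char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i += 3) {
        /* pack up to three bytes into a 24-bit group */
        unsigned long v = (unsigned long)(unsigned char)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)(unsigned char)in[i + 1] << 8;
        if (i + 2 < len)
            v |= (unsigned long)(unsigned char)in[i + 2];
        /* emit four 6-bit digits, padding where input ran out */
        *p++ = tbl[(v >> 18) & 0x3F];
        *p++ = tbl[(v >> 12) & 0x3F];
        *p++ = (i + 1 < len) ? tbl[(v >> 6) & 0x3F] : '=';
        *p++ = (i + 2 < len) ? tbl[v & 0x3F] : '=';
    }
    *p = '\0';
    return out;
}
```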
Copyright © 1999 by O'Reilly & Associates, Inc.