regcomp(3C)

HP-UX 11i Version 3: February 2007

NAME

regcomp(), regerror(), regexec(), regfree() — regular expression matching routines

SYNOPSIS

#include <regex.h>

int regcomp(regex_t *__restrict preg, const char *__restrict pattern,
    int cflags);

int regexec(const regex_t *__restrict preg, const char *__restrict string,
    size_t nmatch, regmatch_t pmatch[__restrict], int eflags);

void regfree(regex_t *preg);

size_t regerror(int errcode, const regex_t *__restrict preg,
    char *__restrict errbuf, size_t errbuf_size);

DESCRIPTION

These functions interpret regular expressions as described in regexp(5). They support both basic and extended regular expressions.

The structures regex_t and regmatch_t are defined in the header <regex.h>.

The regex_t structure contains at least the following member (use of other members results in non-portable code):

size_t re_nsub

Number of parenthesized subexpressions.

The regmatch_t structure contains at least the following members:

regoff_t rm_so

Byte offset from start of string to start of substring.

regoff_t rm_eo

Byte offset from start of string to the first character after the end of the substring.
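As a minimal sketch of how the two offsets are typically used together, the following helper copies the substring described by a regmatch_t into a caller-supplied buffer. The name copy_match and its parameters are illustrative assumptions; they are not part of <regex.h>.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: copy the substring of string delimited by m
 * into out (at most outsize - 1 bytes, always null terminated).
 * Returns the full substring length, or -1 if m records no match.
 */
long copy_match(const char *string, const regmatch_t *m,
                char *out, size_t outsize)
{
    size_t len, n;

    if (outsize == 0 || m->rm_so == -1)
        return -1;
    len = (size_t)(m->rm_eo - m->rm_so);   /* rm_eo is one past the end */
    n = len < outsize - 1 ? len : outsize - 1;
    memcpy(out, string + m->rm_so, n);
    out[n] = '\0';
    return (long)len;
}
```

Because rm_eo points one byte past the substring, the length is simply rm_eo - rm_so and no adjustment by one is needed.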

regcomp() compiles the regular expression specified by the pattern argument and places the results in the structure pointed to by preg. The cflags argument is the bitwise inclusive OR of zero or more of the following flags (defined in <regex.h>):

REG_EXTENDED

Use extended regular expressions.

REG_NEWLINE

If REG_NEWLINE is not set in cflags, a newline character in pattern or string is treated as an ordinary character. If REG_NEWLINE is set, newlines are treated as ordinary characters except as follows:

1.

A newline in string is not matched by a period outside of a bracket expression or by any form of a nonmatching list.

2.

A circumflex (^) in pattern, when used to specify expression anchoring, matches the zero-length string immediately after a newline in string, regardless of the setting of REG_NOTBOL.

3.

A dollar-sign ($) in pattern, when used to specify expression anchoring, matches the zero-length string immediately before a newline in string, regardless of the setting of REG_NOTEOL.
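The three exceptions above can be observed with a small test helper. This is only a sketch: matches_with is a hypothetical name, and the helper reports regexec() errors as "no match" for brevity.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the BRE pattern matches text when
 * compiled with cflags, 0 if it does not, -1 on a compile error.
 */
int matches_with(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only success or failure is of interest here. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With the string "first\nsecond", the pattern ^second matches only when REG_NEWLINE is passed, first$ likewise, and first.second matches only when REG_NEWLINE is not passed, because the period then no longer matches the newline.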

REG_ICASE

Ignore case in match. If a character in pattern is defined in the current LC_CTYPE locale as having one or more opposite-case counterparts, both the character and any counterparts match the pattern character. This applies to all portions of the pattern, including a string of characters specified to be matched via a back-reference expression (\n).

Within bracket expressions: Collation ranges, character classes, and equivalence classes are effectively expanded into equivalent lists of collation elements and characters. Opposite-case counterparts are then generated for each collation element or character to form the complete matching list or non-matching list for the bracket expression. Opposite-case counterparts for a multi-character collating element include all possible combinations of opposite-case counterparts for each individual character comprising the collating element. These are then combined to form new valid multi-character collating elements. For example, the opposite-case counterparts for [.ch.] could be [.Ch.], [.cH.], and [.CH.].

REG_NOSUB

Report only success or failure in regexec(); no substring offsets are reported.

The default regular expression type for pattern is Basic Regular Expression. The application can specify Extended Regular Expressions by using the REG_EXTENDED cflags value.

If the function regcomp() succeeds, it returns zero; otherwise it returns a non-zero value indicating the error.

If regcomp() succeeds, and if the REG_NOSUB flag was not set in cflags, regcomp() sets re_nsub to the number of parenthesized subexpressions (delimited by \( and \) in basic regular expressions or ( and ) in extended regular expressions) found in pattern.
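A short sketch of reading re_nsub after a successful compile follows. The function name count_subexpressions is an assumption made for illustration.

```c
#include <regex.h>

/*
 * Hypothetical helper: return the number of parenthesized
 * subexpressions in pattern, or -1 if it does not compile.
 * cflags must not include REG_NOSUB, or re_nsub is not set.
 */
long count_subexpressions(const char *pattern, int cflags)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    n = (long)re.re_nsub;      /* read before regfree() invalidates re */
    regfree(&re);
    return n;
}
```

A pmatch array of re_nsub + 1 elements is large enough to hold the whole match plus every subexpression. Note that ( and ) are ordinary characters in a basic regular expression, so the pattern (a) compiled without REG_EXTENDED has no subexpressions.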

regexec() matches the null-terminated string specified by string against the compiled regular expression preg initialized by a previous call to regcomp(). If it finds a match, regexec() returns zero; otherwise it returns a non-zero value indicating either no match or an error. The eflags argument is the bitwise inclusive OR of zero or more of the following flags:

REG_NOTBOL

The first character of the string pointed to by string is not the beginning of the line. Therefore, the circumflex character (^), when taken as a special character, does not match the beginning of string.

REG_NOTEOL

The last character of the string pointed to by string is not the end of the line. Therefore, the dollar sign ($), when taken as a special character, does not match the end of string.

If nmatch is not zero, and REG_NOSUB was not set in the cflags argument to regcomp(), then regexec() fills in the pmatch array with byte offsets to the substrings of string that correspond to the parenthesized subexpressions of pattern: pmatch[i].rm_so is the byte offset of the beginning and pmatch[i].rm_eo is the byte offset one byte past the end of the substring i. (Subexpression i begins at the ith matched left parenthesis, counting from 1). Offsets in pmatch[0] identify the substring that corresponds to the entire regular expression. Unused elements of pmatch are set to -1. If there are more than nmatch subexpressions in pattern (pattern itself counts as a subexpression), regexec() still does the match, but only records the first nmatch substrings.
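To illustrate how pmatch is filled in, here is a minimal sketch that splits a "name=value" line with two subexpressions. The function split_assignment, its pattern, and its buffer convention are assumptions chosen for the example.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: split a "name=value" line using an ERE with two
 * subexpressions. Copies the pieces into name and value (each of size
 * bufsize) and returns 1, or returns 0 if the line does not match or
 * a piece does not fit.
 */
int split_assignment(const char *line, char *name, char *value,
                     size_t bufsize)
{
    regex_t re;
    regmatch_t pm[3];          /* whole match plus two subexpressions */
    size_t nlen, vlen;
    int ok = 0;

    if (regcomp(&re, "^([A-Za-z_]+)=(.*)$", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, line, 3, pm, 0) == 0) {
        nlen = (size_t)(pm[1].rm_eo - pm[1].rm_so);
        vlen = (size_t)(pm[2].rm_eo - pm[2].rm_so);
        if (nlen < bufsize && vlen < bufsize) {
            memcpy(name, line + pm[1].rm_so, nlen);
            name[nlen] = '\0';
            memcpy(value, line + pm[2].rm_so, vlen);
            value[vlen] = '\0';
            ok = 1;
        }
    }
    regfree(&re);
    return ok;
}
```

Here pm[0] covers the entire line, pm[1] the name, and pm[2] the value. Passing an nmatch smaller than 3 would still let the match succeed but would leave the later subexpressions unrecorded.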

When matching a regular expression, any given parenthesized subexpression of pattern might participate in the match of several different substrings of string, or it might not match any substring, even though the pattern as a whole did match. The following explains which substrings are reported in pmatch when matching regular expressions:

1.

If subexpression i in a regular expression is not contained within another subexpression, and it participated in the match several times, the byte offsets in pmatch[i] delimit the last such match.

2.

If subexpression i is not contained within another subexpression, and it did not participate in an otherwise successful match (because either *, ?, or | was used), then the byte offsets in pmatch[i] are -1.

3.

If subexpression i is contained in subexpression j, and a match of subexpression j is reported in pmatch[j], the match or no-match reported in pmatch[i] is the last one that occurred within the substring in pmatch[j].

4.

If subexpression i is contained in subexpression j, and the offsets in pmatch[j] are -1, the offsets in pmatch[i] will also be -1.

5.

If subexpression i matched a zero-length string, both offsets in pmatch[i] refer to the character immediately following the zero-length substring.
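Rules 1 and 2 above can be checked with a small wrapper such as the following sketch. The name ere_offsets is hypothetical; the wrapper simply compiles an ERE, runs it once, and hands back the offsets.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile pattern as an ERE, match it against
 * text, and store up to nmatch offset pairs in pm. Returns the
 * regcomp() or regexec() result (zero on a successful match).
 */
int ere_offsets(const char *pattern, const char *text,
                regmatch_t *pm, size_t nmatch)
{
    regex_t re;
    int rc;

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;
    rc = regexec(&re, text, nmatch, pm, 0);
    regfree(&re);
    return rc;
}
```

Matching ([a-c])* against "abc" uses the subexpression three times, and by rule 1 pm[1] delimits only the last use (offsets 2 and 3). Matching (a)|(b) against "b" succeeds through the second alternative, so by rule 2 both offsets in pm[1] are -1 while pm[2] holds offsets 0 and 1.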

If REG_NOSUB was set in cflags in the call to regcomp(), and nmatch is not zero in the call to regexec(), the content of the pmatch array is unspecified.

regfree() frees any memory allocated by regcomp() associated with preg.

If the preg argument to regexec() or regfree() is not a compiled regular expression returned by regcomp(), the result is undefined. A preg can no longer be treated as a compiled regular expression after it is given to regfree().

regerror() provides a mapping from error codes returned by regcomp() and regexec() to printable strings. regerror() generates a string corresponding to the value of the errcode parameter, which was the last non-zero value returned by regcomp() or regexec() with the given value of preg. The errcode parameter can take on any of the error values defined in <regex.h>. If errbuf_size is not zero, regerror() copies an appropriate error message into the buffer specified by errbuf. If the error message (including the terminating null) cannot fit in the buffer, it is truncated to errbuf_size - 1 bytes and null terminated.

If errbuf_size is zero, the errbuf parameter is ignored, but the return value is as defined below.

regerror() returns the size of the buffer (including terminating null) that is required to hold the entire error message.
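Because the return value is the size needed for the whole message, a caller can avoid truncation by asking for the size first. The following is a sketch of that two-call pattern; regerror_alloc is a hypothetical name, not a library function.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the complete
 * message for errcode, or NULL if memory is exhausted. The caller
 * frees the result.
 */
char *regerror_alloc(int errcode, const regex_t *preg)
{
    /* With errbuf_size zero, errbuf is ignored; only the size is returned. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(need);

    if (msg != NULL)
        (void)regerror(errcode, preg, msg, need);
    return msg;
}
```

A fixed buffer, as in the EXAMPLES section, is simpler and usually adequate; the dynamic form is only worthwhile when a truncated message is unacceptable.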

EXTERNAL INFLUENCES

Locale

The LC_COLLATE category determines the collating sequence used in compiling and executing regular expressions.

The LC_CTYPE category determines the interpretation of text as single and/or multi-byte characters, the characters matched by character-class expressions in regular expressions, and the opposite-case counterpart for each character.

International Code Set Support

Single- and multi-byte character code sets are supported. However, if the LC_COLLATE and LC_CTYPE variables specify locale categories that are not based upon the same underlying codeset, the result of regcomp() is undefined.

RETURN VALUE

regcomp() returns zero for success and non-zero for an invalid expression or other failure. regexec() returns zero if it finds a match and non-zero for no match or other failure.

ERRORS

If regcomp() or regexec() detects one of the error conditions listed below, it returns the corresponding non-zero error code. The error codes are defined in the header <regex.h>.

REG_BADBR

The contents within the pair \{ (backslash left brace) and \} (backslash right brace) are unusable: not a number, number too large, more than two numbers, or first number larger than second.

REG_BADPAT

An invalid regular expression.

REG_BADRPT

The ? (question mark), * (asterisk), or + (plus sign) symbols are not preceded by a valid regular expression.

REG_EBRACE

The use of a pair of \{ (backslash left brace) and \} (backslash right brace) or {} (braces) is unbalanced.

REG_EBRACK

The use of [] (brackets) is unbalanced.

REG_EBOL

The ^ (caret) anchor is used, but not at the beginning of a line.

REG_ECHAR

There is an invalid multibyte character.

REG_ECOLLATE

There is an unusable collating element referenced.

REG_ECTYPE

There is an unusable character class type referenced.

REG_EEOL

The $ (dollar) anchor is used, but not at the end of a line.

REG_EESCAPE

There is a trailing \ in the pattern.

REG_EPAREN

The use of a pair of \( (backslash left parenthesis) and \) (backslash right parenthesis) or () is unbalanced.

REG_ERANGE

There is an unusable endpoint in the range expression.

REG_ESPACE

There is insufficient memory space.

REG_ESUBREG

The number in \digit is invalid or in error.

REG_NOMATCH

The regexec() function failed to match.
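When logging failures it can help to print the symbolic code next to the regerror() text. The sketch below maps codes to names; reg_errname is a hypothetical helper, and it covers only the codes common to POSIX implementations. REG_EBOL, REG_ECHAR, and REG_EEOL are left to the default case because they may not exist on other systems.

```c
#include <regex.h>

/*
 * Hypothetical helper: return the symbolic name of an error code
 * returned by regcomp() or regexec(). Codes not listed here,
 * including REG_EBOL, REG_ECHAR, and REG_EEOL, yield "unknown".
 */
const char *reg_errname(int code)
{
    switch (code) {
    case 0:            return "0 (success)";
    case REG_NOMATCH:  return "REG_NOMATCH";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_BADBR:    return "REG_BADBR";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    case REG_BADRPT:   return "REG_BADRPT";
    default:           return "unknown";
    }
}
```

Which code a given malformed pattern produces can differ between implementations, so applications should branch only on zero, REG_NOMATCH, and "anything else" unless they target one platform.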

EXAMPLES

#include <regex.h>
#include <stdio.h>

/*
 * Match string against the extended regular expression in pattern,
 * treating errors as no match. Return 1 for match, 0 for no match.
 * Print an error message if an error occurs.
 */
int match(const char *string, const char *pattern)
{
    int i;
    regex_t re;
    char buf[256];

    i = regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB);
    if (i != 0) {
        (void)regerror(i, &re, buf, sizeof buf);
        printf("%s\n", buf);
        return(0);                  /* report error */
    }
    i = regexec(&re, string, (size_t) 0, NULL, 0);
    if (i != 0 && i != REG_NOMATCH) {
        /* build the message before regfree() invalidates re */
        (void)regerror(i, &re, buf, sizeof buf);
        printf("%s\n", buf);        /* report error */
    }
    regfree(&re);
    return(i == 0);
}

The following demonstrates how the REG_NOTBOL flag could be used with regexec() to find all substrings in a line that match a pattern supplied by a user.

(void) regcomp(&re, pattern, 0);
/* look for first match at start of line */
p = buffer;                     /* p is a char * into buffer */
error = regexec(&re, p, 1, &pm, 0);
while (error == 0) {            /* while matches found */
    /* pm offsets are relative to p, so step past this match */
    p += pm.rm_eo;
    /* find next match on line */
    error = regexec(&re, p, 1, &pm, REG_NOTBOL);
}
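A self-contained variant of the same loop is sketched below. It checks the regcomp() result and also guards against patterns that can match the empty string, which would otherwise never advance. The function count_matches is a hypothetical name, and stepping by a single byte after an empty match is a simplification that assumes a single-byte code set.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the non-overlapping matches of the BRE
 * pattern in line. Returns -1 if pattern does not compile.
 */
int count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line;
    int eflags = 0;
    int count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        count++;
        /* Offsets are relative to p, not to the start of line. */
        if (pm.rm_eo > 0) {
            p += pm.rm_eo;
        } else {
            /* Empty match at p: step one byte so the loop advances. */
            if (*p == '\0')
                break;
            p++;
        }
        /* p no longer points at the beginning of the line. */
        eflags = REG_NOTBOL;
    }
    regfree(&re);
    return count;
}
```

For example, count_matches("^ab", "abab") returns 1: the second "ab" is not reported because REG_NOTBOL stops ^ from matching at the resumed position.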

AUTHOR

regcomp(), regerror(), regexec(), and regfree() were developed by OSF and HP.

SEE ALSO

regexp(5).

STANDARDS CONFORMANCE

regcomp(): XPG4, POSIX.2

regerror(): XPG4, POSIX.2

regexec(): XPG4, POSIX.2

regfree(): XPG4, POSIX.2

© 1983-2007 Hewlett-Packard Development Company, L.P.