cc/td/doc/product/software/ssr921
hometocprevnextglossaryfeedbacksearchhelp
PDF

Table of Contents

Regular Expressions

Regular Expressions

This appendix describes regular expressions. Regular expressions can be used in addresses and fields for the X.25 switching feature, in chat scripts for the dial-on-demand routing (DDR) feature and in access lists for BGP routes. In addition to describing regular expressions, this appendix provides a description of the regular expression pattern matching scheme specifically used by the X.25 switching feature.

Regular Expressions

Regular expressions allow pattern matching operations on strings contained in commands, for example, wide ranges of X.121 addresses and Call User Data fields. If you are familiar with regular expressions from UNIX programs such as regexp, you are already familiar with much of Cisco Systems' regular expression implementation.

Writing regular expressions is simple once you see and try a few examples. A regular expression is a formula for generating a set of strings. If a particular string can be generated by a given regular expression, then that string and regular expression match. In many ways, a regular expression is a program, and the regular expression matches the strings it generates.

A regular expression is a branch or any number of branches separated by a vertical bar (|). A string is said to match the regular expression if it is generated by the "program" specified in any of the branches. Of course, a string can be generated by more than one branch. For example, "abc" is generated by all branches in the regular expression

abc|a*(bc)+|(ab)?c|.*

Also remember that if a regular expression can match two different parts of an input string, it will match the earliest part first.

The regular expression support and the technical information for this portion of the documentation is based on Henry Spencer's public domain rgrep(3) library package.

A regular expression is built up of different components, each of which is used to build the regular expression string-generating program. The possible components of a regular expression are as follows:

Ranges

A range is a sequence of characters contained within left and right square brackets ([ ]). A character matches a range if that character is contained within the range; for example, the following syntax forms the range consisting of the characters "a," "q," "c," "s," "b," "v," and "d." The order of characters is usually not important; however, there are exceptions and these will be noted.

[aqcsbvd]

You can specify an ASCII sequence of characters by specifying the first and last characters in that sequence, and separating them with a hyphen (-).

[a-dqsv]

The above example could also be written so as to specify right square brackets ( ]) as a character in a range. To do so, enter the bracket as the first character after the initial left square bracket that starts the range.

The following example matches right bracket and the letter "d."

[ ]d]

To include a hyphen (-), enter it as either the first or the last character of the range.

You can reverse the matching of the range by including a wedge (caret) at the start of the range. The following example matches any letter except the ones listed. When using the wedge with the special rules for including a bracket or hyphen, make the wedge the very first character.

[^a-dqsv]

The following example matches anything except a right square bracket ( ])or the letter "d":

[^]d]

Atoms

Atoms are the most primitive usable part of regular expressions. An atom can be as simple as a single character. The letter "a" is an atom, for example. It is also a very simple regular expression; that is, a program that generates only one string, which is the single-letter string made up of the letter "a." While this might seem trivial, it is important to understand the set of strings that your regular expression program generates. As will be seen in upcoming explanations and examples, much larger sets of strings can be generated from more complex regular expressions.

Certain characters have a special meaning when used as atoms; refer to Table C-1.


Special Symbols Used as Atoms
Character Description
       . Matches any single character.
       ^ Matches the beginning of the input string.
       $ Matches the end of the input string.
\character Matches character.

       _
Matches the beginning of the input string, the end of the input string, or a space.

Note another use for the ^ symbol.

As an example, the regular expression matches "abcd" only if "abcd" starts the full string to be matched:

^abcd

In the following expression, an atom is a range that matches any single letter, as long as it is not the letters "a," "b," "c," or "d."

[^abcd]

It was previously stated that a single character string such as the letter "a" is an atom. A character by itself, such as $, means "match the null string at the end of the input string."

$

Whereas this atom matches a dollar sign ($). Preceding a character with a backslash (\) removes the special meaning of that character.

\$

Any character can be preceded with the backslash character with no adverse affect.

\a

Atoms are also full regular expressions surrounded by parentheses. For example, both "a" and "(a)" are atoms matching the letter "a." This will be important later, as we see patterns to manipulate entire regular expressions.

Pieces

A piece is an atom optionally followed by one of the symbols listed in Table C-2.


Special Symbols Used with Pieces
Character Description
       * Matches 0 or more sequences of the atom.
       + Matches 1 or more sequences of the atom.
       ? Matches the atom or the null string.

The following example matches any number of occurrences of the letter "a," including none.

a*

The following string requires there to be at least one letter "a" in the string to be matched:

a+

The following string means that the letter "a" can be there once, but it does not have to be:

a?

The following string matches any number of asterisks (*).

\**

The following example uses parentheses. This string matches any number of the two-atom string "ab."

(ab)*

As a more complex example, the following string matches one or more instances of alphanumeric pairs (but not none, that is, an empty string is not a match):

([A-Za-z][0-9])+

The order for matches using the optional *, +, or ? symbols is longest construct first. Nested constructs are matched from outside to inside. Concatenated constructs are matched beginning at the left side of the construct.

Branch

A branch is simply a set of zero or more concatenated pieces. The previous alphanumeric example was an example of a branch as concatenated pieces. Branches are matched in the order normally read--from left to right. For example, in the previous example, the regular expression matches "A9b3," but not "9Ab3" because the alphabet is given first in the two-atom branch of [A-Za-z][0-9].

Regular Expression Pattern Matching for X.25 Switching Feature

This section provides regular expression pattern matching information specific to the X.25 switching feature. Pattern matching for this feature is only one example of how regular expressions are used.

X.121 address and Call User Data are used to find a matching routing table entry. The list is scanned from the beginning to the end and each entry is pattern-matched with the incoming X.121 address and Call User Data to the X.121 and Call User Data in the routing table entry. If the pattern match for both entries succeeds, then that route is used. If the incoming call does not have any Call User Data, then only the X.121 address pattern match need succeed with an entry that only contains an X.121 pattern. If Call User Data is present, and while scanning, a route is found that matches the X.121 address but does not have a Call User Data pattern, then that route is used when an exact match cannot be found.

Regular expressions are used to allow pattern-matching operations on the X.121 addresses and Call User Data. The most common operation is to do prefix matching on the X.121 DNIC field and route accordingly. For example, the pattern ^3306 will match all X.121 addresses with a DNIC of 3306. The caret (^) is a special regular expression character that means to anchor the match at the beginning of the pattern.

If a matching route is found, the incoming call is forwarded to the next hop depending on the routing entry. If no match is found, the call is cleared. If the route specifies a serial interface running X.25, the call will attempt to be forwarded over that interface. If the interface is not operational or out of available virtual circuits, the call will be cleared. Otherwise, the expected Clear Request or Call Accepted message will be forwarded back toward the originator. The "null 0" interface can be used to early-terminate or refuse calls to specific locations.

If the matching route specifies an IP address, a TCP connection will be established to port 1998 at the specified IP address, which must be another Cisco protocol translator. The Call Request packet will be forwarded to the remote protocol translator, where it will be processed in a similar fashion. If a routing table entry is not present or the serial interface is down or out of virtual circuits, a Clear Request will be sent back and the TCP connection will be closed. Otherwise, the call will be forwarded over the serial interface and the expected Clear Request or Call Accepted packet will be returned. Incoming calls received via TCP connections that match a routing entry specifying an IP address will be cleared. This restriction prevents protocol translators from establishing a TCP connection to another protocol translator that would establish yet another TCP connection. A protocol translator must always connect to the remote protocol translator with the destination DTE directly attached.

hometocprevnextglossaryfeedbacksearchhelp
Copyright 1989-1997 © Cisco Systems Inc.