regex

Use regular expressions

V8.9 and above

The regex type allows you to parse tokens in the workspace using POSIX regular expressions. For information on how to use regular expressions see the online manuals ed(1) and regexp(1). A regex database-map type is declared like this:

Kname regex expression

The name is the symbolic name you will use to reference this database map from inside the RHS of rule sets. The expression is the literal text that composes your regular expression. Here is a simple example:

Knumberedname regex   ^[0-9]+<@(aol|msn).com.?>

The intention here is for this regular expression to match any address that has an all-numeric user part (the part before the <@), and a domain part that is either aol.com or (the | character) msn.com. To make rules that use this type easier to write, you can add a -a switch to the declaration:

Knumberedname regex -a.FOUND ^[0-9]+<@(aol|msn).com.?>

Here the -a database switch causes .FOUND to be appended to any successful match.

Note that because of the way we have declared this database map, nothing but the suffix will be returned on a successful match. To have the original key returned as well, you must also use the -m database switch.
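The interaction between -a and -m can be modeled outside of sendmail. The following Python sketch is only an approximation under stated assumptions: the function name lookup and its parameters are hypothetical, Python's re module stands in for the regular-expression library that sendmail uses, and the key is treated as a plain string rather than as a sequence of workspace tokens.

```python
import re

# Hypothetical model of the numberedname map shown above; not sendmail code.
PATTERN = re.compile(r"^[0-9]+<@(aol|msn).com.?>")

def lookup(key, append=".FOUND", match_only=False):
    """Model a regex map lookup with -a (append) and, optionally, -m (match_only)."""
    if PATTERN.search(key) is None:
        return key             # no match: the key comes back unchanged
    if match_only:
        return key + append    # with -m: the original key plus the -a suffix
    return append              # without -m: only the -a suffix is returned
```

In this model, lookup("12345<@aol.com.>") yields .FOUND, and the same call with match_only=True yields 12345<@aol.com.>.FOUND. Because the dots in the expression are not escaped, each one matches any character, so in this model a key such as 1<@aolxcom> also matches.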

This regex type can use a number of switches to good advantage. The complete list is shown in Table 23-24.

Table 23-24. The regex database-map type K command switches
Switch	§	Description
`-a`	-a	Append tag on successful match
`-b`	See this section	Use basic, not extended, regular expression matching
`-D`	-D	Don't use this database map if DeliveryMode=defer
`-d`	See this section	The delimiting string
`-f`	-f	Don't fold keys to lowercase, and cause the regular expression to match in a case-sensitive manner
`-m`	-m	Suppress replacement on match
`-n`	See this section	NOT—that is, invert the test
`-q`	-q	Don't strip quotes from key
`-S`	-S	Space replacement character
`-s`	See this section	Substring to match and return
`-T`	-T	Suffix to append on temporary failure
`-t`	-t	Ignore temporary errors

Note that some additional explanation for a few of these switches is provided in the sections that follow. Also, for an actual example of the regex type, see the file cf/cf/knecht.mc, which demonstrates a way to deal with one type of spam email.

The -b regex database-map switch

The -b switch restricts the regular expression to basic, rather than extended, matching, which is more limited but faster. If you are using only simple regular expressions, such as those defined by ed(1), you can use the -b switch to speed up matching slightly:

Kmatch regex -b -aLOCAL @localhost

Here, the search is for a workspace that contains the substring @localhost. Because this is a very simple regular expression, the -b switch is appropriate. If you use the -b switch with a complex expression (such as the -s2,3 examples in the sections that follow), you might see an error such as this:

configfile: line num: field (2) out of range, only 1 substring in pattern

The -d regex database-map switch

There might be times when you would prefer some other character, operator, or token to replace the $| that is returned when using the -s switch. If so, you can specify a different one with the -d database switch. Consider:

Kmatch regex -s2,3 -d+|+ -a.FOUND (\<a\>|\<b\>)@(\<bob\>|\<ted\>).(\<com\>|\<org\>)

Here we specify that the three characters +|+ will replace the single operator $| in the returned value:

> test a@bob.com
test               input: a @ bob . com
test             returns: bob+|+com . FOUND

Note that here the bob+|+com is a single token.

You can opt to have the original key returned. This is done by specifying the -m database switch:

Kmatch regex -s2,3 -m -d+|+ -a.FOUND (\<a\>|\<b\>)@(\<bob\>|\<ted\>).(\<com\>|\<org\>)

Note that the -m switch overrides the presence of the -s and -d switches:

> test a@bob.com
test               input: a @ bob . com
test             returns: a @ bob . com . FOUND

The -n regex database-map switch

The -n switch inverts the entire sense of the regular expression lookup. It returns a successful match only if the regular expression does not match. Consider:

Kmatch regex -m -n -a.FOUND (\<a\>|\<b\>)@(\<bob\>|\<ted\>).(\<com\>|\<org\>)

If you view the effect of this switch in rule-testing mode, you will see that the result is inverted:

> test a@bob.com
test               input: a @ bob . com
test             returns: a @ bob . com
> test x@y.net
test               input: x @ y . net
test             returns: x @ y . net . FOUND
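The inversion can be sketched in Python. This is a hypothetical model rather than sendmail code: the function name lookup_inverted is invented for illustration, \b stands in for the \< and \> word anchors used above (which Python's re module lacks), and keys are plain strings rather than workspace tokens.

```python
import re

# Hypothetical stand-in for the expression in the Kmatch declaration above.
PATTERN = re.compile(r"(\ba\b|\bb\b)@(\bbob\b|\bted\b).(\bcom\b|\borg\b)")

def lookup_inverted(key, append=".FOUND"):
    """Model a lookup with -m, -n, and -a: succeed only when the pattern does NOT match."""
    matched = PATTERN.search(key) is not None
    success = not matched          # -n inverts the sense of the test
    if success:
        return key + append        # -m keeps the key; -a appends the suffix
    return key                     # an unsuccessful lookup returns the key unchanged
```

In this model, a@bob.com comes back unchanged because the expression matches it, whereas x@y.net comes back as x@y.net.FOUND.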

The -s regex database-map switch

The -s database-map switch is used with the regex type to specify a substring to match and return. To illustrate, consider the following mini-configuration file:

V10
Kmatch regex -s (\<bob\>|\<ted\>)
Stest
R $*       $@ $(match $1 $)

The regular expression looks to match either the name bob or ted, but no other names. The -s says to return the substring actually matched in the expression along with the key, the two separated from each other by a $| operator. Now, observe this mini-configuration file in rule-testing mode:

% /usr/sbin/sendmail -bt -Cdemo.cf
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>
> test bob
test               input: bob
test             returns: bob $| bob
> test alice
test               input: alice
test             returns: alice

By adding a -a switch, which appends text to the matched key:

Kmatch regex -s -a.FOUND (bob|ted)

we see that the matched key with -s is second:

> test bob
test               input: bob
test             returns: bob $| bob . FOUND

When multiple substrings can be matched, the -s database switch can be used to specify which substring match to return. Consider:

Kmatch regex -s2 -a.FOUND (\<a\>|\<b\>)@(\<bob\>|\<ted\>)

There are two substring searches here, first the (\<a\>|\<b\>) choice, then the (\<bob\>|\<ted\>) choice. Because the -s has a 2 as its argument, the second matched substring will be returned, not the first:

> test a@bob
test               input: a @ bob
test             returns: bob . FOUND

In more complex expressions, it might be desirable to return multiple substrings. To do that, list them after the -s, separating each from the next with a comma:

Kmatch regex -s2,3 -a.FOUND (\<a\>|\<b\>)@(\<bob\>|\<ted\>).(\<com\>|\<org\>)

When multiple substrings are listed in this way, they are separated by the $| operator when they are returned:

> test a@bob.com
test               input: a @ bob . com
test             returns: bob $| com . FOUND
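The field selection performed by -s, together with the delimiter substitution performed by -d, can be sketched in Python. This is a hypothetical model and not sendmail code: the names select_fields, fields, and delim are invented for illustration, \b stands in for the \< and \> word anchors (which Python's re module lacks), and the result is a plain string in which $| is ordinary text rather than a sendmail operator token.

```python
import re

# Hypothetical stand-in for the three-group expression used above.
PATTERN = re.compile(r"(\ba\b|\bb\b)@(\bbob\b|\bted\b).(\bcom\b|\borg\b)")

def select_fields(key, fields, delim="$|", append=".FOUND"):
    """Model -s<fields> with an optional -d delimiter and an -a suffix."""
    match = PATTERN.search(key)
    if match is None:
        return key                                 # no match: key is unchanged
    parts = [match.group(n) for n in fields]       # pick the numbered substrings
    return delim.join(parts) + append              # join them, then add the suffix
```

In this model, select_fields("a@bob.com", [2, 3]) yields bob$|com.FOUND, and passing delim="+|+" yields bob+|+com.FOUND instead.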
