[Chapter 8] 8.7 Wildcard Operators

8.7 Wildcard Operators

Rules would be pretty useless if they always had to match the workspace exactly. Fortunately, that is not the case. In addition to literal text, you can also use wildcard operators that allow the LHS of rules to match arbitrary text in the workspace. To illustrate, consider this rule:

R$+     rhs is here
 ^
 |
 lhs

The LHS begins with the first character following the R. The LHS in this example is

$+

This is a wildcard operator. The truth of this if statement is determined by a process called pattern matching. The LHS $+ (a single token) is a pattern that means "match one or more tokens." The address that is being evaluated is tokenized and placed into the workspace, and then the workspace is compared to that pattern:

gw@wash.dc.gov
      |
      v                  tokenized into
gw @ wash . dc . gov     <- in the workspace

When matching the workspace to an LHS pattern, sendmail scans the workspace from left to right. Each token in the workspace is compared to the wildcard operator (the $+) in the LHS pattern. If the tokens all match the pattern, the if part of the if-then pair is true.

The $+ wildcard operator simply matches any one or more tokens:

workspace       pattern
gw              $+        <- match one token ("one")
@                         <- and optionally more ("or more")
wash
.
dc
.
gov

As you can see, if there are any tokens in the address at all (that is, the workspace is not empty), an LHS of $+ evaluates to true.
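The tokenizing and the lone $+ test described above can be sketched in Python. This is only a simplified model, not sendmail's actual code: the names tokenize and matches_one_or_more are hypothetical, the separator set is the one this section lists, and details such as quoting, comments, and whitespace handling are ignored.

```python
# Simplified model only; sendmail's real tokenizer handles much more.
# Separator characters as listed in this section.
SEPARATORS = set('.:@[]' + '()<>,;\\"\r\n')


def tokenize(address):
    """Split an address into tokens; each separator is a token of its own."""
    tokens, current = [], ""
    for ch in address:
        if ch in SEPARATORS:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def matches_one_or_more(workspace):
    """An LHS of just $+ is true when the workspace has at least one token."""
    return len(workspace) >= 1


workspace = tokenize("gw@wash.dc.gov")
print(workspace)                          # ['gw', '@', 'wash', '.', 'dc', '.', 'gov']
print(matches_one_or_more(workspace))     # True
print(matches_one_or_more(tokenize("")))  # False (empty workspace)
```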

8.7.1 Other Text in the LHS

A rule of $+ (match one or more tokens) is not sufficient to handle all possible addresses (especially bad addresses):

gw@wash.dc.gov         $+      <- should match and does
@wash.dc.gov           $+      <- matches an incomplete address

To make matching in the LHS more effective, sendmail allows other text to appear in the pattern. To make sure that the address in the workspace contains a user part, the @ character, and a host part, the following LHS pattern can be used:

$+@$+

Just like the address in the workspace, this pattern is tokenized before it is compared for a match. Wildcard operators (like $+) count as one token, and @ is a token because it is a separator character:

.:@[]            <- you can change these
()<>,;\"\r\n     <- you cannot change these

The pattern of $+@$+ is separated into three tokens:

$+ @ $+
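As a rough illustration of how the pattern splits into three tokens, here is a small Python sketch. It assumes a hypothetical helper, tokenize_pattern, that recognizes only the $+ operator and the separator characters listed above. It is not how sendmail itself parses rules.

```python
# Simplified model only: recognizes just the $+ operator and the
# separator characters listed in this section.
SEPARATORS = set('.:@[]' + '()<>,;\\"\r\n')


def tokenize_pattern(pattern):
    """Split an LHS pattern into tokens; $+ counts as a single token."""
    tokens, current, i = [], "", 0
    while i < len(pattern):
        if pattern.startswith("$+", i):
            step, token = 2, "$+"
        elif pattern[i] in SEPARATORS:
            step, token = 1, pattern[i]
        else:
            # Ordinary character: accumulate into the current text token.
            current += pattern[i]
            i += 1
            continue
        if current:
            tokens.append(current)
            current = ""
        tokens.append(token)
        i += step
    if current:
        tokens.append(current)
    return tokens


print(tokenize_pattern("$+@$+"))  # ['$+', '@', '$+']
```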

Text in the pattern must match text in the workspace exactly (token for token) if there is to be a match. A good address in the workspace (one containing a user part and a host part) will match our new LHS ($+@$+):

workspace       pattern
gw              $+        <- match one or more
@               @         <- match exactly
wash            $+        <- match one
.                            or more
dc
.
gov

Here, the flow of matching begins with the first $+, which matches one token (of the one or more) in the workspace. The @ matches the identical token in the workspace. At this point, the $+@ part of the pattern has been satisfied. All that remains is for the final $+ to match its one or more of all the remaining tokens in the workspace, which it does.

A bad address in the workspace, on the other hand, will not match the pattern. Consider an address, for example, that lacks a user part:

@wash.dc.gov    <- in the workspace

workspace       pattern
@               $+        <- match one
wash                         or more
.
dc
.
gov
                @         <- match exactly (fails!)
                $+

Here, the first $+ incorrectly matches the @ in the workspace. Since there is no other @ in the workspace to be matched by the @ in the pattern, the first $+ matches the entire workspace. Because there is nothing left in the workspace, the attempt to match the @ fails. When any part of a pattern fails to match the workspace, the entire LHS fails (the if part of the if-then is false).
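The success and failure cases above can be reproduced with a small Python sketch. It is a simplified model under stated assumptions: the function name match is hypothetical, only the $+ operator is supported, literal tokens are compared exactly, and the workspace and pattern are supplied as already-tokenized lists. Real sendmail matching has more operators and rules than this.

```python
# Simplified model only: supports literal tokens and the $+ operator.
def match(pattern, workspace):
    """Return True if the pattern tokens match the whole workspace."""
    if not pattern:
        # The pattern is used up; it matched only if the workspace is too.
        return not workspace
    head, rest = pattern[0], pattern[1:]
    if head == "$+":
        # $+ must take at least one token; try the shortest span first.
        for n in range(1, len(workspace) + 1):
            if match(rest, workspace[n:]):
                return True
        return False
    # Literal text must equal the next workspace token exactly.
    return bool(workspace) and workspace[0] == head and match(rest, workspace[1:])


LHS = ["$+", "@", "$+"]
good = ["gw", "@", "wash", ".", "dc", ".", "gov"]
bad = ["@", "wash", ".", "dc", ".", "gov"]

print(match(LHS, good))  # True
print(match(LHS, bad))   # False: no @ remains after the first $+
```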

8.7.2 Minimal Matching

One small bit of confusion may yet remain. When a wildcard operator such as $+ is used to match the workspace, sendmail always does a minimal match. That is, it matches only what it needs to for the next part of the rule to work. Consider the following:

R$+@$+

In this LHS the first $+ matches everything in the workspace up to the first @ character. For example, consider the following workspace:

a@b@c

In the above, $+@ causes the $+ to match only the tokens up to the first @ character, that is, the a. This is the minimum that needs to be matched, and so it is the maximum that will be matched.
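To make the minimal match concrete, the following Python sketch records which workspace tokens each $+ ends up matching. It is a simplified model, not sendmail's implementation: match_bindings is a hypothetical name, only $+ and literal tokens are handled, and the inputs are already tokenized.

```python
# Simplified model only: shows what each $+ matches under minimal matching.
def match_bindings(pattern, workspace):
    """Return the tokens matched by each $+, or None if there is no match."""
    if not pattern:
        return [] if not workspace else None
    head, rest = pattern[0], pattern[1:]
    if head == "$+":
        # Minimal match: take as few tokens as will let the rest succeed.
        for n in range(1, len(workspace) + 1):
            tail = match_bindings(rest, workspace[n:])
            if tail is not None:
                return [workspace[:n]] + tail
        return None
    if workspace and workspace[0] == head:
        return match_bindings(rest, workspace[1:])
    return None


print(match_bindings(["$+", "@", "$+"], ["a", "@", "b", "@", "c"]))
# [['a'], ['b', '@', 'c']]
```

In this model the first $+ stops at a because that is the shortest span that still lets the following @ match. The last $+ has nothing after it, so it must take everything that remains.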

8.7.3 More Play with LHS Matching

Take a moment to replace the previous demo rules with the following three new demo rules in the client.cf file:

S0
R@         one
R@$+       two
R$+@$+     three

Again, these three rules are for demonstration purposes only (you'll see how to declare a real one soon enough). We've given each temporary RHS a number to see whether it is selected. Now run sendmail in rule-testing mode:

% ./sendmail -Cclient.cf -bt
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>

Now print the rules to remind yourself what they are:

> =S 0
R@              one
R@ $+           two
R$+ @ $+                three

We'll test those rules with an assortment of test addresses. The first address to try is a lone @:

> 0 @
rewrite: ruleset  0   input: @
rewrite: ruleset  0 returns: one

The @ causes the first temporary RHS to be selected because the rule is

R@      one

The LHS here (the pattern to match) contains the lone @. That pattern matches the tokenized workspace @ exactly, so the RHS for that rule rewrites the workspace to contain one. Since one does not contain an @ character, neither the second nor third rules match, so the entire rule set returns one.

Next enter an address that just contains a host and domain part but not a user part:

> 0 @your.domain
rewrite: ruleset  0   input: @ your . domain
rewrite: ruleset  0 returns: two

The first thing to notice is what was not printed! The workspace does not match the pattern of the first rule. But instead of returning an error, the workspace is carried down as is to the next rule - where it does match:

@your.domain    <- does not match, so ...
   |
R@      one
   |
   v            <- try the next rule
R@$+    two

Now enter an address that fails to match the first two rules but successfully matches the third:

> 0 you@your.domain
rewrite: ruleset  0   input: you @ your . domain
rewrite: ruleset  0 returns: three

The flow for this address is

you@your.domain <- does not match, so ...
   |
R@      one
   |
   v            <- try the next rule, which also does not match, so ...
R@$+    two
   |
   v            <- try the next rule, which does match.
R$+@$+  three

Try other addresses such as your login name or UUCP addresses such as you@host.uucp and host!you. Can you predict what will happen with weird addresses like @@ or a@b@c?
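If you want to check your predictions away from a sendmail installation, the three demo rules can be approximated in Python. This is only a sketch under several assumptions: tokenize, match, and run_ruleset are hypothetical names, only the separator characters listed in this chapter are used, only the $+ operator is supported, and each rule is tried once, in order, with a matching rule replacing the workspace by its RHS. Real sendmail behavior, as shown by -bt, is the authority.

```python
import re

# Simplified model of the three demo rules; not sendmail's real algorithm.
SEP = r'.:@\[\]()<>,;\\"\r\n'
TOKEN_RE = re.compile(rf'[{SEP}]|[^{SEP}\s]+')

# Each rule is (tokenized LHS, RHS text).
RULES = [
    (["@"], "one"),
    (["@", "$+"], "two"),
    (["$+", "@", "$+"], "three"),
]


def tokenize(address):
    """Split an address into tokens; each separator is its own token."""
    return TOKEN_RE.findall(address)


def match(pattern, workspace):
    """Return True if the pattern tokens match the whole workspace."""
    if not pattern:
        return not workspace
    head, rest = pattern[0], pattern[1:]
    if head == "$+":
        # One or more tokens, shortest span first (minimal match).
        return any(match(rest, workspace[n:])
                   for n in range(1, len(workspace) + 1))
    return bool(workspace) and workspace[0] == head and match(rest, workspace[1:])


def run_ruleset(address):
    """Try each rule in order; a matching rule rewrites the workspace."""
    workspace = tokenize(address)
    for lhs, rhs in RULES:
        if match(lhs, workspace):
            workspace = [rhs]
    return " ".join(workspace)


for address in ("@", "@your.domain", "you@your.domain"):
    print(address, "->", run_ruleset(address))
```

Calling run_ruleset on your own test addresses shows what this model would return. Compare the result with what sendmail prints in rule-testing mode.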

When you are done experimenting, exit rule-testing mode and delete the four temporary lines that you added for this demonstration.