18.3 Tokenizing Rules
The sendmail program views the text that makes
up rules and addresses as being composed of individual tokens. Rules
are tokenized—divided into individual
parts—while the configuration file is being read and while they
are being normalized. Addresses are tokenized at another time (as
we'll show later), but the process is the same for
both.
The text our.domain, for example, is composed of
three tokens: our, a dot, and
domain. Tokens are separated by special
characters that are defined by the OperatorChars
option, or by the $o
macro prior to V8.7:
define(`confOPERATORS', `.:%@!^/[ ]+')     ← m4 configuration
O OperatorChars=.:%@!^/[ ]+                ← V8.7 and above
Do.:%@!^=/[ ]                              ← prior to V8.7
When any of these separation characters is recognized in text, it
is treated as an individual token. Any leftover text is then gathered
into the remaining tokens:
xxx@yyy;zzz becomes xxx @ yyy;zzz
@ is defined to be a token, but
; is not. Therefore, the text
xxx@yyy;zzz is divided into three tokens.
In addition to the characters in the OperatorChars
option, sendmail also defines 10 tokenizing
characters internally:
( )<>,;"\r\n
This internal list, and the list defined by the
OperatorChars option, are combined into one master
list that is used for all tokenizing. The previous example, when
divided by using this master list, becomes five tokens instead of
just three:
xxx@yyy;zzz becomes xxx @ yyy ; zzz
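The difference between the two lists can be illustrated with a small Python sketch. It is only a model of the splitting behavior described here, not sendmail's actual code. The function name split_on and both constants are hypothetical; the operator characters are the V8.7 defaults shown later in this section, and the space character is left out of the internal list on the assumption that it is handled separately.

```python
# Illustrative model only; sendmail's real tokenizer is written in C
# and handles many more cases (quoting, comments, $-operators).
OPERATOR_CHARS = ".:%@!^/[]"       # default OperatorChars (V8.7 and above)
INTERNAL_CHARS = "()<>,;\"\r\n"    # internal list; space omitted (assumption)

def split_on(text, separators):
    """Split text so that each separator character becomes its own token."""
    tokens, current = [], ""
    for ch in text:
        if ch in separators:
            if current:                # flush any text gathered so far
                tokens.append(current)
                current = ""
            tokens.append(ch)          # the separator is itself a token
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

# OperatorChars alone: the semicolon is not a separator.
print(split_on("xxx@yyy;zzz", OPERATOR_CHARS))
# ['xxx', '@', 'yyy;zzz']

# Master list: the semicolon now divides tokens too.
print(split_on("xxx@yyy;zzz", OPERATOR_CHARS + INTERNAL_CHARS))
# ['xxx', '@', 'yyy', ';', 'zzz']
```

Passing the separator set as an argument makes it easy to see the effect of adding a character to OperatorChars without touching the rest of the sketch.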
In rules, quotation marks can be used to override the meaning of
tokenizing characters defined in the master list. For example:
"xxx@yyy";zzz becomes "xxx@yyy" ; zzz
Here, three tokens are produced because the @
appears inside quotation marks. Note that the quotation marks are
retained.
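A rough Python sketch of this quoting behavior follows. The name tokenize_quoted and the SEPARATORS constant are hypothetical, and the sketch assumes that an opening quotation mark starts a new token and the matching closing mark ends it; sendmail's own handling of quotes adjacent to other text may differ.

```python
# Illustrative model only, not sendmail's implementation.
# The quotation mark is handled explicitly below, so it is not listed here.
SEPARATORS = set(".:%@!^/[]" + "()<>,;\r\n")

def tokenize_quoted(text):
    """Tokenize text, keeping everything inside quotation marks together."""
    tokens, current, in_quotes = [], "", False
    for ch in text:
        if ch == '"':
            if in_quotes:
                tokens.append(current + ch)   # closing quote ends the token
                current = ""
            else:
                if current:                   # flush text before the quote
                    tokens.append(current)
                current = ch                  # opening quote is retained
            in_quotes = not in_quotes
        elif in_quotes:
            current += ch                     # separators lose their meaning
        elif ch in SEPARATORS:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

print(tokenize_quoted('"xxx@yyy";zzz'))
# ['"xxx@yyy"', ';', 'zzz']
```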
Because the configuration file is read sequentially from start to
finish, the OperatorChars option should be defined
before any rules are declared. Note, however, that beginning with V8.7
sendmail, omitting this option causes the
separation characters to default to:
. : % @ ! ^ / [ ]
Also note that beginning with V8.10, if you declare the
OperatorChars option after any rule, the following
error will be produced:
Warning: OperatorChars is being redefined.
It should only be set before ruleset definitions.
To prevent this error, declare the OperatorChars
option in your mc configuration file only with
the confOPERATORS m4 macro:
define(`confOPERATORS', `.:%@!^/[ ]-')
Here, we have added a dash character (-) to the
default list. Note that you should not define your own operator
characters unless you first create and examine a configuration file
with the default settings. That way you can be sure you always
augment the actual defaults you find, and avoid the risk that you
might miss new defaults in the future.
18.3.1 $-operators Are Tokens
As we
progress into the details of rules, you will see that certain
characters become operators when prefixed with a $
character. Operators cause sendmail to perform
actions, such as looking for a match ($* is a
wildcard operator) or replacing tokens with others by position
($1 is a replacement operator).
For tokenizing purposes, operators always divide one token from
another, just as the characters in the master list do. For example:
xxx$*zzz becomes xxx $* zzz
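As a hedged illustration, the following Python sketch treats a $ and the single character after it as one token. That is a simplification made for this example: the function name tokenize_ops is hypothetical, and operators that span more than two characters are not modeled.

```python
# Illustrative model only; real $-operators can be longer than two characters.
SEPARATORS = set(".:%@!^/[]" + "()<>,;\"\r\n")

def tokenize_ops(text):
    """Tokenize text, treating '$' plus the next character as one operator."""
    tokens, current, i = [], "", 0
    while i < len(text):
        ch = text[i]
        if ch == "$" and i + 1 < len(text):
            if current:                    # operator ends the previous token
                tokens.append(current)
                current = ""
            tokens.append(text[i:i + 2])   # e.g. '$*' or '$1'
            i += 2
            continue
        if ch in SEPARATORS:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
        i += 1
    if current:
        tokens.append(current)
    return tokens

print(tokenize_ops("xxx$*zzz"))
# ['xxx', '$*', 'zzz']
print(tokenize_ops("$1@$2"))
# ['$1', '@', '$2']
```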
18.3.2 The Space Character Is Special
The space character is special for two
reasons. First, although the space character is not in the master
list, it always separates one token from
another:
xxx zzz becomes xxx zzz
Second, although the space character separates tokens, it is not
itself a token. That is, in this example the seven characters on the
left (the fourth is the space in the middle) become two tokens of
three letters each, not three tokens. Therefore, the space character
can be used inside the LHS or RHS of rules for improved clarity but
does not itself become a token or change the meaning of the rule.
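The two properties of the space character can be modeled with a short Python sketch. As before, this is an illustration under assumptions rather than sendmail's code, and the name tokenize_spaces is hypothetical.

```python
# Illustrative model only: a space ends the current token but is discarded.
SEPARATORS = set(".:%@!^/[]" + "()<>,;\"\r\n")

def tokenize_spaces(text):
    """Tokenize text; spaces separate tokens without becoming tokens."""
    tokens, current = [], ""
    for ch in text:
        if ch == " " or ch in SEPARATORS:
            if current:                # either kind of character ends a token
                tokens.append(current)
                current = ""
            if ch != " ":              # only real separators are kept
                tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

print(tokenize_spaces("xxx zzz"))
# ['xxx', 'zzz']

# Spaces added for clarity do not change the resulting tokens.
print(tokenize_spaces("xxx @ yyy") == tokenize_spaces("xxx@yyy"))
# True
```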
18.3.3 Pasting Addresses Back Together
After an address has passed through all the rules (and has been
modified by rewriting), the tokens that form it are pasted back
together to form a single string. The pasting process is very
straightforward in that it mirrors the tokenizing process:
xxx @ yyy becomes xxx@yyy
The only exception to this straightforward pasting process occurs
when two adjoining tokens are both simple text. Simple text is
anything other than the separation characters (those defined by the
OperatorChars option and those defined internally by
sendmail) or the operators (characters prefixed
by a $ character). The xxx and
yyy in the preceding example are both simple text.
When two
tokens of simple text are pasted together, the character defined by
the BlankSub option
is inserted between them. Usually, that option is defined as a
dot, so two tokens of simple text would have a dot inserted between
them when they are joined:
xxx yyy becomes xxx.yyy
Note that the improper use of a space character in the LHS or RHS of
rules can lead to addresses that have a dot (or other character)
inserted where one was not intended.
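The pasting step, including the BlankSub exception, might be sketched in Python as follows. The names paste and is_simple_text are hypothetical, the blank_sub parameter stands in for the BlankSub option with a dot as its value, and the test for simple text is a simplification of the definition given above.

```python
# Illustrative model only, not sendmail's implementation.
SEPARATORS = set(".:%@!^/[]" + "()<>,;\"\r\n")

def is_simple_text(token):
    """Simple text is neither a separation character nor a $-operator."""
    return token not in SEPARATORS and not token.startswith("$")

def paste(tokens, blank_sub="."):
    """Join tokens, inserting blank_sub between adjoining simple-text tokens."""
    pieces, prev_simple = [], False
    for token in tokens:
        simple = is_simple_text(token)
        if prev_simple and simple:
            pieces.append(blank_sub)   # two simple-text tokens in a row
        pieces.append(token)
        prev_simple = simple
    return "".join(pieces)

print(paste(["xxx", "@", "yyy"]))
# xxx@yyy
print(paste(["xxx", "yyy"]))
# xxx.yyy
```

The second call shows how a stray space in a rule, which yields two adjoining simple-text tokens, ends up as a dot in the rebuilt address.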