
-r--r--r-- 1 root other 29641 Oct 21 1999 mail.cs.cf
-r--r--r-- 1 root other 1250 Dec 29 1998 mail.cs.mc

Although the existing tools help in managing the sendmail configuration, a thorough understanding of the contents of the sendmail.cf file is crucial for successful sendmail administration.

20.3 The Parsing of E-mail Addresses

Rewrite rules are the core of the sendmail.cf file. Rulesets are groups of associated rewrite rules that can be referenced by a number or, in more recent versions, by an alphanumeric name. In the Sn command syntax, n is the number that identifies the ruleset. Normally, numbers in the range 0 to 99 are used, but there are no restrictions on ruleset numbering. Of all the rulesets, ruleset 0 is the most important. However, each ruleset contributes to successful address parsing and helps sendmail accomplish its basic task: to deliver e-mail.

20.3.1 Rewriting an E-mail Address

A thorough knowledge of rewrite rules is required to fully understand how address parsing is accomplished; the following text should help with this topic. Each rewrite rule is defined by the R command. The syntax of the R command is:

R lhs rhs comment

where

lhs      Left-hand side; specifies the pattern against which the input address is matched. If the address matches, the transformation specified by the rhs is applied to it.
rhs      Right-hand side; specifies the transformation applied to the input address when it matches the lhs pattern.
comment  Comments referring to this entry; the field is ignored by sendmail, but good comments are very important for understanding what the line does.
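To make the three-field layout concrete, here is a minimal Python sketch that splits one R line into its lhs, rhs, and comment. It relies on the sendmail convention that the fields are separated by one or more tab characters. The function name `parse_rule` and the sample rule are hypothetical illustrations, not taken from a real sendmail.cf.

```python
import re

def parse_rule(line):
    """Split a sendmail.cf R line into (lhs, rhs, comment)."""
    if not line.startswith("R"):
        raise ValueError("not a rewrite rule: %r" % line)
    # Drop the leading R; the fields are separated by one or more tabs.
    fields = re.split(r"\t+", line[1:].rstrip("\n"))
    lhs = fields[0]
    rhs = fields[1] if len(fields) > 1 else ""
    comment = fields[2] if len(fields) > 2 else ""  # comment is optional
    return lhs, rhs, comment

# Hypothetical rule, written only for this illustration.
rule = "R$-@$+\t$1<@$2>\thypothetical example rule"
lhs, rhs, comment = parse_rule(rule)
print("lhs     =", lhs)
print("rhs     =", rhs)
print("comment =", comment)
```

Spaces may appear inside a field, which is why the sketch splits on tabs only.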

20.3.2 Pattern Matching

The lhs specifies the pattern against which the input address is matched; if a match is found, the address is rewritten in a new format using the rules defined in the rhs. A rule may process the same address several times because, after being rewritten, the address is again compared against the pattern. If it still matches, it is rewritten again. This cycle of pattern matching and rewriting continues until the address no longer matches the pattern.

Macros, classes, literals, and special metasymbols provide the pattern matching. The macros, classes, and literals provide the values against which the input is compared, while the metasymbols define the rules used in matching the pattern. Some metasymbols used for pattern matching are:

Metasymbol  Meaning
$*          Match zero or more tokens
$+          Match one or more tokens
$-          Match exactly one token
$=x         Match any token in class x
$~x         Match any token not in class x
$x          Match all tokens in macro x
$%x         Match any token in the NIS map named in macro x
$!x         Match any token not in the NIS map named in macro x
$%y         Match any token in the NIS hosts.byname map

We see that all metasymbols request a match for some number of tokens. What is the token itself? A token is a string of characters in an e-mail address delimited by an operator; the operators are the characters defined in the macro o in the sendmail.cf file. Operators are also counted as tokens when an e-mail address is parsed.

Let us examine an e-mail address and its parsing. sendmail first tokenizes the address; for example:

bjl@patsy.myschool.scps.edu = bjl, @, patsy, ., myschool, ., scps, ., edu

This e-mail address contains nine tokens, which are stored internally in a buffer called the workspace. When the lhs of a rule is evaluated, the corresponding pattern is also tokenized, and its tokens are then compared to the tokens in the workspace. If the tokens in the workspace satisfy the tokens of the lhs, a match is found, and the lhs comparison is true.
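The tokenization step can be sketched in a few lines of Python. This is only an illustration of the idea, not sendmail's actual parser. The operator set in `OPERATORS` is an assumed value; on a real system it comes from the macro o in sendmail.cf. The function name `tokenize` is likewise hypothetical.

```python
# Assumed operator characters; the real set is defined by macro o.
OPERATORS = ".:%@!^/[]"

def tokenize(address, operators=OPERATORS):
    """Split an address into tokens; each operator is a token of its own."""
    tokens = []
    current = ""
    for ch in address:
        if ch in operators:
            if current:
                tokens.append(current)  # close the string before the operator
                current = ""
            tokens.append(ch)           # the operator itself counts as a token
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

workspace = tokenize("bjl@patsy.myschool.scps.edu")
print(workspace)
# ['bjl', '@', 'patsy', '.', 'myschool', '.', 'scps', '.', 'edu']
print(len(workspace))  # 9
```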
Assume the pattern $-@$+ in the lhs; after tokenizing it:

$-@$+ = $-, @, $+

The previous address matches the pattern because:

• It has exactly one token before the @ literal, so it matches the requirement of the $- metasymbol.
• It has an @ symbol that matches the pattern's literal @.
• It has one or more tokens after the @ literal, so it matches the requirement of the $+ metasymbol.

When an address matches a pattern, the strings from the address that match the metasymbols are assigned to indefinite tokens, so called because they may contain more than one token value. The indefinite tokens are identified numerically according to the relative position in the pattern of the metasymbol that they matched. This means that the indefinite token produced by the match of the first metasymbol is called $1; the match of the second metasymbol is called $2; the third is $3, and so on. The indefinite tokens created by the pattern matching can then be referenced by their new names: $1, $2, $3, etc. From the previous example:

$1 = bjl
$2 = patsy.myschool.scps.edu   It contains seven tokens.
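The matching of a tokenized pattern against the workspace, and the numbering of the indefinite tokens, can be illustrated with a small Python sketch. It handles only the $*, $+, and $- metasymbols plus literals, and it tries the shortest possible match first; that ordering is an assumption of this sketch rather than a statement about sendmail's internals. The function name `match_lhs` is hypothetical, and both token lists are written out by hand.

```python
def match_lhs(pattern, workspace):
    """Return the indefinite tokens [$1, $2, ...] or None if there is no match."""
    def helper(p, w):
        if p == len(pattern):
            # The pattern is used up; succeed only if the workspace is too.
            return [] if w == len(workspace) else None
        sym = pattern[p]
        if sym in ("$*", "$+", "$-"):
            lowest = 0 if sym == "$*" else 1
            highest = 1 if sym == "$-" else len(workspace) - w
            for n in range(lowest, highest + 1):
                if w + n > len(workspace):
                    break
                rest = helper(p + 1, w + n)
                if rest is not None:
                    return [workspace[w:w + n]] + rest
            return None
        # Anything else is a literal and must equal the next workspace token.
        if w < len(workspace) and workspace[w] == sym:
            return helper(p + 1, w + 1)
        return None
    return helper(0, 0)

pattern = ["$-", "@", "$+"]
workspace = ["bjl", "@", "patsy", ".", "myschool", ".", "scps", ".", "edu"]
indefinite = match_lhs(pattern, workspace)
for number, tokens in enumerate(indefinite, start=1):
    print("$%d = %s (%d token(s))" % (number, "".join(tokens), len(tokens)))
# $1 = bjl (1 token(s))
# $2 = patsy.myschool.scps.edu (7 token(s))
```

An address such as a.b@c does not match this pattern, because two tokens follow the first one before the @ literal, and $- accepts exactly one token.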

20.3.3 Address Transformation