17. The Hub's Complex Rules

Contents:
Rule Set 3
Rule Set 96
Rule Set 0
Rule Set 4
Things to Try

In this chapter we look at some of the rules that are needed to make a hub function. Until now we have focused on the client form of the configuration file. Since the role of the client is narrow (to forward all mail to the hub), its configuration file is simple. But a hub can be a very busy machine, receiving and sending mail for many client machines, and because its role is broad, its configuration file is complex.

Fundamentally, all configuration files, simple and complex, tend to look pretty much the same. Both kinds begin by selecting delivery agents using rule sets 3 and 0. Both then process recipient or sender addresses with rule sets 3, 1 or 2, R= or S=, then 4, but the hub's rules are more complex:

The hub needs to recognize more than simple Internet-style addresses. It may need to handle UUCP-style addresses or reverse-style addresses such as those used in parts of the United Kingdom. It needs rules to convert all such addresses into a form that it can understand.
The hub needs not only to forward mail (like the client), but also to deliver it to the mail spool directory, to pipe through programs, and to form mailing lists.
The hub needs to handle all error conditions gracefully and to emit helpful and clear error messages.
The hub needs to know how to connect to many different kinds of machines worldwide.

In this chapter we explore the high points of the V8 configuration files. Along the way, we also mix in rules contributed by others to help illustrate difficult concepts.

17.1 Rule Set 3

Recall that all addresses are first processed by rule set 3. Its job is to find an address among other clutter and to normalize all addresses into a form that other rules can recognize.

17.1.1 Find the Address

Recall that addresses can legally assume two forms:

address (comment)
comment <address>

In the first form, sendmail strips (and saves) the parenthesized comment, then gives the naked address to rule set 3. In the second form, sendmail passes the entire address, angle brackets and all, to rule set 3.

The rules to strip the angle brackets look like this: [1]

[1] These ingenious rules were designed by LeRoy Eide, with surrounding commentary inspired by John Halleck.

S3
R$*         $: <$1>      Guarantee at least one <> pair
R$+ <$*>       <$2>      Remove everything before the last <
R<$*> $+    $: <$1>      Remove everything after the first >
R<>         $@ <@>       Null address to @
R<$*>       $:  $1       Strip remaining <>

In the following, we discuss each of these rules individually.

17.1.1.1 At least one <> pair

To find the address in addresses of the form

comment <address>

we must use rules to search for the < and > characters. Designing rules that do this is easier if we can be sure that every address has at least one surrounding angle bracket pair:

R$*        $: <$1>      Guarantee at least one <> pair

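This rule places angle brackets around all addresses, even those that already have them. Note that the $: that prefixes the RHS causes the rule to be applied only once. Without it, the LHS ($*) would match the rewritten address again and again, and the rule would never stop adding brackets.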

A side benefit of this rule is that it also surrounds an empty (null) address with angle brackets. This allows old versions of sendmail to detect null addresses without needing to use the new (beginning with V8.7 sendmail) $@ LHS operator. We'll cover this in more detail soon.

17.1.1.2 Strip to left of <

A common problem is that of finding the address when it is deeply nested in many pairs of angle brackets. Consider an address like this:

<<<<address>>>>

Such addresses are not common but do appear every now and then as a result of overzealous users or MUAs. Another problem address looks like this:

comment <phone> <address>

Here, just noting the outermost pair of angle brackets is not sufficient because the rightmost pair contains the address.

The process of finding the rightmost innermost pair of angle brackets requires two rules:

R$+ <$*>       <$2>      Remove everything before the last <
R<$*> $+    $: <$1>      Remove everything after the first >

The first recursively discards everything (including angle brackets) to the left of the rightmost balanced < character. The second truncates to the correct address by discarding everything following the innermost remaining angle bracket pair.

The behavior of these two rules may not be obvious. To better understand them, first create a small configuration file (called x.cf) that includes the following two lines: [2]

[2] Note that when a configuration file lacks an S command (to declare a rule set), all rules become part of rule set 0.

R$+ <$*>       <$2>
R<$*> $+    $: <$1>

Then run sendmail in rule-testing mode with a command like this:

% /usr/lib/sendmail -Cx.cf -bt
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>

Enter a series of addresses, one at a time, to see how each is handled. Be as extreme as you want when nesting angle brackets:

> 0 <<<<<a>>>>>
rewrite: ruleset  0   input: < < < < < a > > > > >
rewrite: ruleset  0 returns: < a >
> 0 <a> <b>
rewrite: ruleset  0   input: < a > < b >
rewrite: ruleset  0 returns: < b >
> 0 <<a> <b>>
rewrite: ruleset  0   input: < < a > < b > >
rewrite: ruleset  0 returns: < b >
>

If you want to see, step by step, how each rule works, run sendmail again, this time with the -d21.12 debugging switch (see Section 37.5.72, -d21.12). With that switch, the first example above will print like this:

> 0 <<<<<a>>>>>
rewrite: ruleset  0   input: < < < < < a > > > > >
---trying rule: $+ < $* >
---rule matches: < $2 >
rewritten as: < < < < a > > > > >
---trying rule: $+ < $* >
---rule matches: < $2 >
rewritten as: < < < a > > > > >
---trying rule: $+ < $* >
---rule matches: < $2 >
rewritten as: < < a > > > > >
---trying rule: $+ < $* >
---rule matches: < $2 >
rewritten as: < a > > > > >
---trying rule: $+ < $* >
--- rule fails
---trying rule: < $* > $+
---rule matches: $: < $1 >
rewritten as: < a >
rewrite: ruleset  0 returns: < a >

17.1.1.3 Handle null address

The fourth rule in rule set 3 is designed to convert a null (empty) address into the magic symbol @:

R<>        $@ <@>       Null address to @

The @ symbol is surrounded by angle brackets ("focused"). It needs to be focused because later rules expect all addresses to have the host part in this form. Still later, the angle brackets will be removed, and the @ will be discarded by rule set 4.

The $@ prefix to the RHS causes all further rules in rule set 3 to be skipped. The focused address <@> is returned. If <@> were to be handled by the next rule, its angle brackets would be stripped, and this is not what we desire.

17.1.1.4 Remove remaining angle brackets

The last of our five preliminary rules simply removes the angle brackets from whatever remains:

R<$*>      $:  $1       Strip remaining <>
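
The combined effect of these five rules can be mimicked outside of sendmail. The following Python sketch is only an illustration under simplifying assumptions: it works on plain strings rather than on sendmail's tokenized workspace, it folds the last two rules into one step, and the function name find_address is our own invention, not anything provided by sendmail.

```python
def find_address(raw: str) -> str:
    """String-level sketch of the five preliminary rules of rule set 3."""
    # Rule 1: guarantee at least one <> pair.
    ws = "<" + raw.strip() + ">"
    # Rule 2: repeatedly discard everything before the next "<" until
    # only the last "<" remains, at the front of the workspace.
    while True:
        i = ws.find("<", 1)
        if i == -1:
            break
        ws = ws[i:]
    # Rule 3: discard everything after the first ">".
    ws = ws[: ws.find(">") + 1]
    # Rules 4 and 5: a null address becomes the focused magic symbol;
    # anything else has its remaining angle brackets stripped.
    inner = ws[1:-1].strip()
    return inner if inner else "<@>"


if __name__ == "__main__":
    for raw in ("comment <phone> <address>", "<<<<<a>>>>>", "<a> <b>", ""):
        print(repr(raw), "->", find_address(raw))
```

Running the sketch on the problem addresses discussed above is a quick way to check your understanding before trying the real rules in rule-testing mode.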

17.1.2 Normalize the Address

The rules that we have just looked at isolate the address from other possible information and leave it in its initial form, not surrounded by angle brackets. The rest of the rules in rule set 3 are designed to highlight the host part of any address. They assume that all addresses are composed of a user and a host part.

17.1.2.1 A rule to handle List:;

RFC822 allows addresses of the form

name : address(s) ;

Here, name is the name of a mailing list that can contain multiple words and spaces, for example,

Undisclosed Recipients :;

The colon and semicolon are mandatory. One or more addresses, which may themselves be lists, may appear between them. [3] Rule set 3 needs to check for the presence of an empty list (one with no addresses between the colon and semicolon). The following rule does just that: it appends the magic token <@> to the empty list and returns the result:

[3] Which tends to complicate the algorithm.

R$* :;      $@ $1 :; <@>       Handle empty List:;
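
As a rough illustration of what this rule matches, here is a minimal Python sketch. It assumes that a plain string stands in for sendmail's tokenized workspace, and the function name handle_empty_list is hypothetical.

```python
import re


def handle_empty_list(addr: str) -> str:
    """Sketch of the empty List:; rule (string-level approximation)."""
    # The LHS "$* :;" matches only when the address ends with a colon
    # followed directly by a semicolon (white space is ignored).
    m = re.fullmatch(r"(.*?):\s*;\s*", addr)
    if m is None:
        return addr  # not an empty list; later rules would handle it
    # The RHS keeps the list name and appends the magic token <@>.
    return f"{m.group(1).strip()} :; <@>"


if __name__ == "__main__":
    print(handle_empty_list("Undisclosed Recipients :;"))
    print(handle_empty_list("friends: george@wash.dc.gov;"))
```

A list that still has addresses between the colon and the semicolon does not match and is passed through unchanged.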

17.1.3 Internet Addresses

After lists have been disposed of, domain-type addresses need to be handled. Domain-type addresses are of the form user@host:

R$+ @ $+                $: $1 <@$2>                 Focus on host
R$+ < $+ @ $+ >            $1 $2 <@$3>              move gaze right
R$* < @ $* : $* > $*       $1 <@ $2$3> $4           strip colons
R$+ < @ $+ >            $@ $>96 $1<@$2>             localize and canonicalize

The first rule detects addresses of the form something@something and rewrites them in such a way that the second something becomes the focused host part.

The second rule handles addresses with multiple @ symbols (such as a@b@c). It recursively moves the focus to the rightmost host.

The third rule recursively removes any colons from the resulting host part as a "sanity check." This is necessary because strange forms of route addresses may have bypassed earlier rules (see the DontPruneRoutes option in Section 34.8.20, DontPruneRoutes (R), how route addresses are handled in rules in Section 29.4.3, "Handling Routing Addresses", and the F=d delivery agent flag in Section 30.8.16, F=d), or a colon may be left over from the mailertable feature (see Section 19.6.14, FEATURE(mailertable)).

The fourth rule passes any addresses that have been successfully focused to rule set 96 (which will be discussed in Section 17.2, "Rule Set 96") so that the local host can be detected and the host part canonicalized. The result from rule set 96 is returned.
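
The way the first three rules move the focus can be sketched in Python. This is an approximation under stated assumptions: it operates on a plain string, it does not model the call to rule set 96 made by the fourth rule, and the function name focus_host is our own.

```python
def focus_host(addr: str) -> str:
    """Sketch of the first three Internet-address rules of rule set 3."""
    # Rule 1: focus on everything after the first "@". Both sides must
    # be non-empty, because the LHS uses $+ on each side.
    user, at, host = addr.partition("@")
    if not (user and at and host):
        return addr
    # Rule 2: while the focused part holds another "@" with something
    # after it, move the focus one host to the right.
    while True:
        left, at, right = host.partition("@")
        if not (at and right):
            break
        user, host = f"{user}@{left}", right
    # Rule 3: strip any colons from the focused host part.
    host = host.replace(":", "")
    return f"{user}<@{host}>"


if __name__ == "__main__":
    for addr in ("user@host", "a@b@c"):
        print(addr, "->", focus_host(addr))
```

For a@b@c the loop runs once, leaving a@b as the user part and c as the focused host.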

17.1.4 UUCP Addresses

UUCP addresses contain one or more exclamation points (such as lady!sonya!george). They fall into two categories: those that are delivered locally by uux(8) and those that are forwarded to another host. The rules to handle them look like this:

R$- ! $+          $@ $>96 $2 <@ $1.UUCP>      host!user uucp
R$+ . $- ! $+     $@ $>96 $3 <@ $1.$2>        Domain style uucp
R$+ ! $+          $@ $>96 $2 <@ $1.UUCP>      Bang path uucp

The first rule looks for a single token hostname followed by an exclamation point. A single token host always becomes the next host in line for delivery. The .UUCP suffix added in the RHS allows rule set 0 to recognize this address as one requiring uux(8) delivery.

The second rule looks for a dot in the hostname part of the address. A dot indicates the new-style, domain-based hostname, such as host.domain!user. Such names are assumed to have MX records pointing to service providers and are rewritten into the normal user@host.domain form.

The third rule catches any remaining addresses with exclamation points in them. The host to the left of the leftmost exclamation point is taken as the next hop in the UUCP path for delivery. A .UUCP suffix is added to that host, just as in the first rule.

All three rules exit (the leading $@ in the RHS) after the address is normalized by rule set 96 (which leaves .UUCP suffixed addresses unchanged). They are then handed as is to rule set 0, which selects a delivery agent (usually uux(8)).
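
To see which of the three rules a given UUCP address would trigger, consider the following Python sketch. It rests on several assumptions: regular expressions stand in for sendmail's token matching, the set of separator characters is assumed to be the usual V8 operator characters plus the characters sendmail always treats as separators, the call to rule set 96 is not modeled, and the name normalize_uucp is hypothetical.

```python
import re

# Assumed token separators (an assumption, not read from any real
# configuration file): operator characters plus built-in separators.
OPS = r".:%@!^/\[\]+()<>,;\s"
TOKEN = f"[^{OPS}]+"  # one token, as matched by $-


def normalize_uucp(addr: str) -> str:
    """Sketch of the three UUCP rules of rule set 3."""
    # Rule 1: "$- ! $+" is a single-token host, then "!", then the rest.
    m = re.fullmatch(f"({TOKEN})!(.+)", addr)
    if m:
        return f"{m.group(2)}<@{m.group(1)}.UUCP>"
    # Rule 2: "$+ . $- ! $+" is a domain-style host such as host.domain.
    m = re.fullmatch(f"(.+?)\\.({TOKEN})!(.+)", addr)
    if m:
        return f"{m.group(3)}<@{m.group(1)}.{m.group(2)}>"
    # Rule 3: "$+ ! $+" catches any remaining bang path; the host is
    # whatever lies to the left of the leftmost "!".
    m = re.fullmatch(r"(.+?)!(.+)", addr)
    if m:
        return f"{m.group(2)}<@{m.group(1)}.UUCP>"
    return addr


if __name__ == "__main__":
    for addr in ("lady!sonya!george", "host.domain!user"):
        print(addr, "->", normalize_uucp(addr))
```

Note that only the first hop is focused. The rest of the bang path (sonya!george in the first example) stays in the user part for the next host to interpret.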

17.1.5 The % Hack

A common technique in mail debugging is to send mail to one host and have that host deliver it to another. Often, this is done by sending the mail with a command something like this:

% mail user%second@first

Here, the intention is to send mail to first and from there to user@second. This type of addressing is nonstandard. Essentially, it is route addressing with % characters substituted for @ characters. Enabling this behavior requires three rules:

R$*%$*             $1 @ $2                  Convert all % to @
R$*@$*@$*          $1 % $2 @ $3             Undo all but last @
R$*@$*          $@ $>96 $1 <@$2>            Focus on rightmost

Here, the first rule changes all the percent characters into @ characters. The intention is to focus on the rightmost host, whether it is prefixed with a % or an @. The second rule changes all but the rightmost @ back into percent characters, even if they were originally @ characters. The last rule takes the result and focuses on the rightmost host, just as was done in the domain form of addressing above.
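
The net effect of these three rules can be sketched in a few lines of Python. As before, this is an illustration only: it works on a plain string instead of sendmail's workspace, it leaves out the call to rule set 96, and the function name percent_hack is our own.

```python
def percent_hack(addr: str) -> str:
    """Sketch of the three % hack rules (string-level approximation)."""
    # Rule 1: convert every "%" into "@".
    ws = addr.replace("%", "@")
    # Find the rightmost "@"; with none at all, no rule applies.
    head, at, host = ws.rpartition("@")
    if not at:
        return addr
    # Rule 2: turn every "@" except the last back into "%", even those
    # that were "@" characters to begin with.
    head = head.replace("@", "%")
    # Rule 3: focus on the rightmost host.
    return f"{head}<@{host}>"


if __name__ == "__main__":
    for addr in ("user%second@first", "user%host"):
        print(addr, "->", percent_hack(addr))
```

For user%second@first the focus lands on first, and user%second is left for first to resolve when the message arrives there.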