What Is a Mail Message?
How Is Mail Delivered?
How Does Mail Routing Work?
Electronic mail transport has been one of the most prominent uses of networking since the first networks were devised. Email started as a simple service that copied a file from one machine to another and appended it to the recipient's mailbox file. The concept remains the same, although an ever-growing net, with its complex routing requirements and its ever increasing load of messages, has made a more elaborate scheme necessary.
Various standards of mail exchange have been devised. Sites on the Internet adhere to one laid out in RFC-822, augmented by some RFCs that describe a machine-independent way of transferring just about anything, including graphics, sound files, and special characters sets, by email. CCITT has defined another standard, X.400. It is still used in some large corporate and government environments, but is progressively being retired.
Quite a number of mail transport programs have been implemented for Unix systems. One of the best known is sendmail, which was developed by Eric Allman at the University of California at Berkeley. Eric Allman now offers sendmail through a commercial venture, but the program remains free software. sendmail is supplied as the standard mail agent in some Linux distributions. We describe sendmail configuration in Chapter 18, Sendmail.
Linux also uses Exim, written by Philip Hazel of the University of Cambridge. We describe Exim configuration in Chapter 19, Getting Exim Up and Running.
Compared to sendmail, Exim is rather young. For the vast bulk of sites with email requirements, their capabilities are pretty close.
Both Exim and sendmail support a set of configuration files that have to be customized for your system. Apart from the information that is required to make the mail subsystem run (such as the local hostname), there are many parameters that may be tuned. sendmail's main configuration file is very hard to understand at first. It looks as if your cat has taken a nap on your keyboard with the shift key pressed. Exim configuration files are more structured and easier to understand than sendmail's. Exim, however, does not provide direct support for UUCP and handles only domain addresses. Today that isn't as big a limitation as it once might have been; most sites stay within Exim's limitations. However, for most sites, the work required in setting up either of them is roughly the same.
In this chapter, we deal with what email is and what issues administrators have to deal with. Chapter 18 and Chapter 19 provide instructions on setting up sendmail and Exim and for the first time. The included information should help smaller sites become operational, but there are several more options and you can spend many happy hours in front of your computer configuring the fanciest features.
Toward the end of this chapter we briefly cover setting up elm, a very common mail user agent on many Unix-like systems, including Linux.
For more information about issues specific to electronic mail on Linux, please refer to the Electronic Mail HOWTO by Guylhem Aznar, which is posted to comp.os.linux.answers regularly. The source distributions of elm, Exim, and sendmail also contain extensive documentation that should answer most questions on setting them up, and we provide references to this documentation in their respective chapters. If you need general information on email, a number of RFCs deal with this topic. They are listed in the bibliography at the end of the book.
A mail message generally consists of a message body, which is the text of the message, and special administrative data specifying recipients, transport medium, etc., like what you see when you look at a physical letter's envelope.
This administrative data falls into two categories. In the first category is any data that is specific to the transport medium, like the address of sender and recipient. It is therefore called the envelope. It may be transformed by the transport software as the message is passed along.
The second variety is any data necessary for handling the mail message, which is not particular to any transport mechanism, such as the message's subject line, a list of all recipients, and the date the message was sent. In many networks, it has become standard to prepend this data to the mail message, forming the so-called mail header. It is offset from the mail body by an empty line.
Most mail transport software in the Unix world use a header format outlined in RFC-822. Its original purpose was to specify a standard for use on the ARPANET, but since it was designed to be independent from any environment, it has been easily adapted to other networks, including many UUCP-based networks.
RFC-822 is only the lowest common denominator, however. More recent standards have been conceived to cope with growing needs such as data encryption, international character set support, and MIME (Multipurpose Internet Mail Extensions, described in RFC-1341 and other RFCs).
In all these standards, the header consists of several lines separated by an end-of-line sequence. A line is made up of a field name, beginning in column one, and the field itself, offset by a colon and white space. The format and semantics of each field vary depending on the field name. A header field can be continued across a newline if the next line begins with a whitespace character such as tab. Fields can appear in any order.
A typical mail header may look like this:
Return-Path: <firstname.lastname@example.org> Received: ursa.cus.cam.ac.uk (email@example.com [184.108.40.206]) by al.animats.net (8.9.3/8.9.3/Debian 8.9.3-6) with ESMTP id WAA04654 for <firstname.lastname@example.org>; Sun, 30 Jan 2000 22:30:01 +1100 Received: from ph10 (helo=localhost) by ursa.cus.cam.ac.uk with local-smtp (Exim 3.13 #1) id 12EsYC-0001eF-00; Sun, 30 Jan 2000 11:29:52 +0000 Date: Sun, 30 Jan 2000 11:29:52 +0000 (GMT) From: Philip Hazel <email@example.com> Reply-To: Philip Hazel <firstname.lastname@example.org> To: Terry Dawson <email@example.com>, Andy Oram <firstname.lastname@example.org> Subject: Electronic mail chapter In-Reply-To: <38921283.A58948F2@animats.net> Message-ID: <Pine.SOL.3.96.1000130111515.5800Aemail@example.com>
Usually, all necessary header fields are generated by the mailer interface you use, like elm, pine, mush, or mailx. However, some are optional and may be added by the user. elm, for example, allows you to edit part of the message header. Others are added by the mail transport software. If you look into a local mailbox file, you may see each mail message preceded by a "From" line (note: no colon). This is not an RFC-822 header; it has been inserted by your mail software as a convenience to programs reading the mailbox. To avoid potential trouble with lines in the message body that also begin with "From," it has become standard procedure to escape any such occurrence by preceding it with a > character.
This list is a collection of common header fields and their meanings:
This contains the sender's email address and possibly the "real name." A complete zoo of formats is used here.
This is a list of recipient email addresses. Multiple recipient addresses are separated by a comma.
This is a list of email addresses that will receive "carbon copies" of the message. Multiple recipient addresses are separated by a comma.
This is a list of email addresses that will receive "carbon copies" of the message. The key difference between a "Cc:" and a "Bcc:" is that the addresses listed in a "Bcc:" will not appear in the header of the mail messages delivered to any recipient. It's a way of alerting recipients that you've sent copies of the message to other people without telling them who those others are. Multiple recipient addresses are separated by a comma.
Describes the content of the mail in a few words.
Supplies the date and time the mail was sent.
Specifies the address the sender wants the recipient's reply directed to. This may be useful if you have several accounts, but want to receive the bulk of mail only on the one you use most frequently. This field is optional.
The organization that owns the machine from which the mail originates. If your machine is owned by you privately, either leave this out, or insert "private" or some complete nonsense. This field is not described by any RFC and is completely optional. Some mail programs support it directly, many don't.
A string generated by the mail transport on the originating system. It uniquely identifies this message.
Every site that processes your mail (including the machines of sender and recipient) inserts such a field into the header, giving its site name, a message ID, time and date it received the message, which site it is from, and which transport software was used. These lines allow you to trace which route the message took, and you can complain to the person responsible if something went wrong.
No mail-related programs should complain about any header that starts with
X-. It is used to implement additional features that have not yet made it into an RFC, or never will. For example, there was once a very large Linux mailing list server that allowed you to specify which channel you wanted the mail to go to by adding the string
followed by the channel name.
Generally, you will compose mail using a mailer interface like mail or mailx, or more sophisticated ones like mutt, tkrat, or pine. These programs are called mail user agents, or MUAs. If you send a mail message, the interface program will in most cases hand it to another program for delivery. This is called the mail transport agent, or MTA. On most systems the same MTA is used for both local and remote delivery and is usually invoked as /usr/sbin/sendmail, or on non-FSSTND compliant systems as /usr/lib/sendmail. On UUCP systems it is not uncommon to see mail delivery handled by two separate programs: rmail for remote mail delivery and lmail for local mail delivery.
Local delivery of mail is, of course, more than just appending the incoming message to the recipient's mailbox. Usually, the local MTA understands aliasing (setting up local recipient addresses pointing to other addresses) and forwarding (redirecting a user's mail to some other destination). Also, messages that cannot be delivered must usually be bounced, that is, returned to the sender along with some error message.
For remote delivery, the transport software used depends on the nature of the link. Mail delivered over a network using TCP/IP commonly uses Simple Mail Transfer Protocol (SMTP), which is described in RFC-821. SMTP was designed to deliver mail directly to a recipient's machine, negotiating the message transfer with the remote side's SMTP daemon. Today it is common practice for organizations to establish special hosts that accept all mail for recipients in the organization and for that host to manage appropriate delivery to the intended recipient.
Mail is usually not delivered directly in UUCP networks, but rather is forwarded to the destination host by a number of intermediate systems. To send a message over a UUCP link, the sending MTA usually executes rmail on the forwarding system using uux, and feeds it the message on standard input.
Since uux is invoked for each message separately, it may produce a considerable workload on a major mail hub, as well as clutter the UUCP spool queues with hundreds of small files taking up a disproportionate amount of disk space. Some MTAs therefore allow you to collect several messages for a remote system in a single batch file. The batch file contains the SMTP commands that the local host would normally issue if a direct SMTP connection were used. This is called BSMTP, or batched SMTP. The batch is then fed to the rsmtp or bsmtp program on the remote system, which processes the input almost as if a normal SMTP connection has occurred.
Email addresses are made up of at least two parts. One part is the name of a mail domain that will ultimately translate to either the recipient's host or some host that accepts mail on behalf of the recipient. The other part is some form of unique user identification that may be the login name of that user, the real name of that user in "Firstname.Lastname" format, or an arbitrary alias that will be translated into a user or list of users. Other mail addressing schemes, like X.400, use a more general set of "attributes" that are used to look up the recipient's host in an X.500 directory server.
How email addresses are interpreted depends greatly on what type of network you use. We'll concentrate on how TCP/IP and UUCP networks interpret email addresses.
Internet sites adhere to the RFC-822 standard, which requires the familiar notation of firstname.lastname@example.org, for which host.domain is the host's fully qualified domain name. The character separating the two is properly called a "commercial at" sign, but it helps if you read it as "at." This notation does not specify a route to the destination host. Routing of the mail message is left to the mechanisms we'll describe shortly.
You will see a lot of RFC-822 if you run an Internet connected site. Its use extends not only to mail, but has also spilled over into other services, such as news. We discuss how RFC-822 is used for news in Chapter 20, Netnews.
In the original UUCP environment, the prevalent form was path!host!user, for which path described a sequence of hosts the message had to travel through before reaching the destination host. This construct is called the bang path notation, because an exclamation mark is colloquially called a "bang." Today, many UUCP-based networks have adopted RFC-822 and understand domain-based addresses.
Other networks have still different means of addressing. DECnet-based networks, for example, use two colons as an address separator, yielding an address of host::user. The X.400 standard uses an entirely different scheme, describing a recipient by a set of attribute-value pairs, like country and organization.
Lastly, on FidoNet, each user is identified by a code like 2:320/204.9, consisting of four numbers denoting zone (2 is for Europe), net (320 being Paris and Banlieue), node (the local hub), and point (the individual user's PC). Fidonet addresses can be mapped to RFC-822; the above, for example, would be written as Thomas.Quinot@p9.f204.n320.z2.fidonet.org. Now didn't we say domain names were easy to remember?
It is inevitable that when you bring together a number of different systems and a number of clever people, they will seek ways to interconnect the differing systems so they are capable of internetworking. Consequently, there are a number of different mail gateways that are able to link two different email systems together so that mail may be forwarded from one to another. Addressing is the critical question when linking two systems. We won't look at the gateways themselves in any detail, but let's take a look at some of the addressing complications that may arise when gateways of this sort are used.
Consider mixing the UUCP style bang-path notation and RFC-822. These two types of addressing don't mix too well. Assume there is an address of domainA!user@domainB. It is not clear whether the @ sign takes precedence over the path, or vice versa: do we have to send the message to domainB, which mails it to domainA!user, or should it be sent to domainA, which forwards it to user@domainB?
Addresses that mix different types of address operators are called hybrid addresses. The most common type, which we just illustrated, is usually resolved by giving the @ sign precedence over the path. In domainA!user@domainB, this means sending the message to domainB first.
However, there is a way to specify routes in RFC-822 conformant ways: <@domainA,@domainB:user@domainC> denotes the address of user on domainC, where domainC is to be reached through domainA and domainB (in that order). This type of address is frequently called a source routed address. It's not a good idea to rely on this behavior, as revisions to the RFCs describing mail routing recommend that source routing in a mail address be ignored and instead an attempt should be made to deliver directly to the remote destination.
Then there is the % address operator: user%domainB@domainA is first sent to domainA, which expands the rightmost (in this case, the only) percent sign to an @ sign. The address is now user@domainB, and the mailer happily forwards your message to domainB, which delivers it to user. This type of address is sometimes referred to as "Ye Olde ARPAnet Kludge," and its use is discouraged.
There are some implications to using these different types of addressing that will be described throughout the following sections. In an RFC-822 environment, you should avoid using anything other than absolute addresses, such as email@example.com.
The process of directing a message to the recipient's host is called routing. Apart from finding a path from the sending site to the destination, it involves error checking and may involve speed and cost optimization.
There is a big difference between the way a UUCP site handles routing and the way an Internet site does. On the Internet, the main job of directing data to the recipient host (once it is known by its IP address) is done by the IP networking layer, while in the UUCP zone, the route has to be supplied by the user or generated by the mail transfer agent.
On the Internet, the destination host's configuration determines whether any specific mail routing is performed. The default is to deliver the message to the destination by first determining what host the message should be sent to and then delivering it directly to that host. Most Internet sites want to direct all inbound mail to a highly available mail server that is capable of handling all this traffic and have it distribute the mail locally. To announce this service, the site publishes a so-called MX record for its local domain in its DNS database. MX stands for Mail Exchanger and basically states that the server host is willing to act as a mail forwarder for all mail addresses in the domain. MX records can also be used to handle traffic for hosts that are not connected to the Internet themselves, like UUCP networks or FidoNet hosts that must have their mail passed through a gateway.
MX records are always assigned a preference. This is a positive integer. If several mail exchangers exist for one host, the mail transport agent will try to transfer the message to the exchanger with the lowest preference value, and only if this fails will it try a host with a higher value. If the local host is itself a mail exchanger for the destination address, it is allowed to forward messages only to MX hosts with a lower preference than its own; this is a safe way of avoiding mail loops. If there is no MX record for a domain, or no MX records left that are suitable, the mail transport agent is permitted to see if the domain has an IP address associated with it and attempt delivery directly to that host.
Suppose that an organization, say Foobar, Inc., wants all its mail handled by its machine mailhub. It will then have MX records like this in the DNS database:
green.foobar.com. IN MX 5 mailhub.foobar.com.
This announces mailhub.foobar.com as a mail exchanger for green.foobar.com with a preference of 5. A host that wishes to deliver a message to firstname.lastname@example.org checks DNS and finds the MX record pointing at mailhub. If there's no MX with a preference smaller than 5, the message is delivered to mailhub, which then dispatches it to green.
This is a very simple description of how MX records work. For more information on mail routing on the Internet, refer to RFC-821, RFC-974, and RFC-1123.
Mail routing on UUCP networks is much more complicated than on the Internet because the transport software does not perform any routing itself. In earlier times, all mail had to be addressed using bang paths. Bang paths specified a list of hosts through which to forward the message, separated by exclamation marks and followed by the user's name. To address a letter to a user called Janet on a machine named moria, you would use the path eek!swim!moria!janet. This would send the mail from your host to eek, from there on to swim, and finally to moria.
The obvious drawback of this technique is that it requires you to remember much more about network topology, fast links, etc. than Internet routing requires. Much worse than that, changes in the network topology -- like links being deleted or hosts being removed -- may cause messages to fail simply because you aren't aware of the change. And finally, in case you move to a different place, you will most likely have to update all these routes.
One thing, however, that made the use of source routing necessary was the presence of ambiguous hostnames. For instance, assume there are two sites named moria, one in the U.S. and one in France. Which site does moria!janet refer to now? This can be made clear by specifying what path to reach moria through.
The first step in disambiguating hostnames was the founding of the UUCP Mapping Project. It is located at Rutgers University and registers all official UUCP hostnames, along with information on their UUCP neighbors and their geographic location, making sure no hostname is used twice. The information gathered by the Mapping Project is published as the Usenet Maps, which are distributed regularly through Usenet. A typical system entry in a map (after removing the comments) looks like this:
moria bert(DAILY/2), swim(WEEKLY)
This entry says moria has a link to bert, which it calls twice a day, and swim, which it calls weekly. We will return to the map file format in more detail later.
Using the connectivity information provided in the maps, you can automatically generate the full paths from your host to any destination site. This information is usually stored in the paths file, also called the pathalias database. Assume the maps state that you can reach bert through ernie; a pathalias entry for moria generated from the previous map snippet may then look like this:
If you now give a destination address of email@example.com, your MTA will pick the route shown above and send the message to ernie with an envelope address of bert!moria!janet.
Building a paths file from the full Usenet maps is not a very good idea, however. The information provided in them is usually rather distorted and occasionally out of date. Therefore, only a number of major hosts use the complete UUCP world maps to build their paths files. Most sites maintain routing information only for sites in their neighborhood and send any mail to sites they don't find in their databases to a smarter host with more complete routing information. This scheme is called smart-host routing. Hosts that have only one UUCP mail link (so-called leaf sites) don't do any routing of their own; they rely entirely on their smart host.
The best cure for the problems of mail routing in UUCP networks so far is the adoption of the domain name system in UUCP networks. Of course, you can't query a name server over UUCP. Nevertheless, many UUCP sites have formed small domains that coordinate their routing internally. In the maps, these domains announce one or two hosts as their mail gateways so that there doesn't have to be a map entry for each host in the domain. The gateways handle all mail that flows into and out of the domain. The routing scheme inside the domain is completely invisible to the outside world.
This works very well with the smart-host routing scheme. Global routing information is maintained by the gateways only; minor hosts within a domain get along with only a small, handwritten paths file that lists the routes inside their domain and the route to the mail hub. Even the mail gateways do not need routing information for every single UUCP host in the world anymore. Besides the complete routing information for the domain they serve, they only need to have routes to entire domains in their databases now. For instance, this pathalias entry will route all mail for sites in the sub.org domain to smurf:
Mail addressed to firstname.lastname@example.org will be sent to swim with an envelope address of smurf!jones!claire.
The hierarchical organization of the domain namespace allows mail servers to mix more specific routes with less specific ones. For instance, a system in France may have specific routes for subdomains of fr, but route any mail for hosts in the us domain toward some system in the U.S. In this way, domain-based routing (as this technique is called) greatly reduces the size of routing databases, as well as the administrative overhead needed.
The main benefit of using domain names in a UUCP environment, however, is that compliance with RFC-822 permits easy gatewaying between UUCP networks and the Internet. Many UUCP domains nowadays have a link with an Internet gateway that acts as their smart host. Sending messages across the Internet is faster, and routing information is much more reliable because Internet hosts can use DNS instead of the Usenet Maps.
In order to be reachable from the Internet, UUCP-based domains usually have their Internet gateway announce an MX record for them (MX records were described previously in the section "Mail Routing on the Internet"). For instance, assume that moria belongs to the orcnet.org domain. gcc2.groucho.edu acts as its Internet gateway. moria would therefore use gcc2 as its smart host so that all mail for foreign domains is delivered across the Internet. On the other hand, gcc2 would announce an MX record for *.orcnet.org and deliver all incoming mail for orcnet sites to moria. The asterisk in *.orcnet.org is a wildcard that matches all hosts in that domain that don't have any other record associated with them. This should normally be the case for UUCP-only domains.
The only remaining problem is that the UUCP transport programs can't deal with fully qualified domain names. Most UUCP suites were designed to cope with site names of up to eight characters, some even less, and using nonalphanumeric characters such as dots is completely out of the question for most.
Therefore, we need mapping between RFC-822 names and UUCP hostnames. This mapping is completely implementation-dependent. One common way of mapping FQDNs to UUCP names is to use the pathalias file:
This will produce a pure UUCP-style bang path from an address that specifies a fully qualified domain name. Some mailers provide a special file for this; sendmail, for instance, uses the uucpxtable.
The reverse transformation (colloquially called domainizing) is sometimes required when sending mail from a UUCP network to the Internet. As long as the mail sender uses the fully qualified domain name in the destination address, this problem can be avoided by not removing the domain name from the envelope address when forwarding the message to the smart host. However, there are still some UUCP sites that are not part of any domain. They are usually domainized by appending the pseudo-domain uucp.
The pathalias database provides the main routing information in UUCP-based networks. A typical entry looks like this (site name and path are separated by tabs):
moria.orcnet.org ernie!bert!moria!%s moria ernie!bert!moria!%s
This makes any message to moria be delivered via ernie and bert. Both moria's fully qualified name and its UUCP name have to be given if the mailer does not have a separate way to map between these namespaces.
If you want to direct all messages to hosts inside a domain to its mail relay, you may also specify a path in the pathalias database, giving the domain name preceded by a dot as the target. For example, if all hosts in sub.org can be reached through swim!smurf, the pathalias entry might look like this:
Writing a pathalias file is acceptable only when you are running a site that does not have to do much routing. If you have to do routing for a large number of hosts, a better way is to use the pathalias command to create the file from map files. Maps can be maintained much more easily, because you may simply add or remove a system by editing the system's map entry and recreating the map file. Although the maps published by the Usenet Mapping Project aren't used for routing very much anymore, smaller UUCP networks may provide routing information in their own set of maps.
A map file mainly consists of a list of sites that each system polls or is polled by. The system name begins in the first column and is followed by a comma-separated list of links. The list may be continued across newlines if the next line begins with a tab. Each link consists of the name of the site followed by a cost given in brackets. The cost is an arithmetic expression made up of numbers and symbolic expressions like DAILY or WEEKLY. Lines beginning with a hash sign are ignored.
As an example, consider moria, which polls swim.twobirds.com twice a day and bert.sesame.com once per week. The link to bert uses a slow 2,400 bps modem. moria would publish the following maps entry:
moria.orcnet.org bert.sesame.com(DAILY/2), swim.twobirds.com(WEEKLY+LOW) moria.orcnet.org = moria
The last line makes moria known under its UUCP name, as well. Note that its cost must be specified as DAILY/2 because calling twice a day actually halves the cost for this link.
Using the information from such map files, pathalias is able to calculate optimal routes to any destination site listed in the paths file and produce a pathalias database from this which can then be used for routing to these sites.
pathalias provides a couple of other features like site-hiding (i.e., making sites accessible only through a gateway). See the pathalias manual page for details and a complete list of link costs.
Comments in the map file generally contain additional information on the sites described in it. There is a rigid format in which to specify this information so that it can be retrieved from the maps. For instance, a program called uuwho uses a database created from the map files to display this information in a nicely formatted way. When you register your site with an organization that distributes map files to its members, you generally have to fill out such a map entry. Below is a sample map entry (in fact, it's the one for Olaf's site):
#N monad, monad.swb.de, monad.swb.sub.org #S AT 486DX50; Linux 0.99 #O private #C Olaf Kirch #E email@example.com #P Kattreinstr. 38, D-64295 Darmstadt, FRG #L 49 52 03 N / 08 38 40 E #U brewhq #W firstname.lastname@example.org (Olaf Kirch); Sun Jul 25 16:59:32 MET DST 1993 # monad brewhq(DAILY/2) # Domains monad = monad.swb.de monad = monad.swb.sub.org
The whitespace after the first two characters is a tab. The meaning of most of the fields is pretty obvious; you will receive a detailed description from whichever domain you register with. The L field is the most fun to find out: it gives your geographical position in latitude/longitude and is used to draw the PostScript maps that show all sites for each country, as well as worldwide.
elm stands for "electronic mail" and is one of the more reasonably named Unix tools. It provides a full-screen interface with a good help feature. We won't discuss how to use elm here, but only dwell on its configuration options.
Theoretically, you can run elm unconfigured, and everything works well -- if you are lucky. But there are a few options that must be set, although they are required only on occasion.
When it starts, elm reads a set of configuration variables from the elm.rc file in /etc/elm. Then it attempts to read the file .elm/elmrc in your home directory. You don't usually write this file yourself. It is created when you choose "Save new options" from elm's options menu.
The set of options for the private elmrc file is also available in the global elm.rc file. Most settings in your private elmrc file override those of the global file.
In the global elm.rc file, you must set the options that pertain to your host's name. For example, at the Virtual Brewery, the file for vlager contains the following:
# # The local hostname hostname = vlager # # Domain name hostdomain = .vbrew.com # # Fully qualified domain name hostfullname = vlager.vbrew.com
These options set elm's idea of the local hostname. Although this information is rarely used, you should set the options. Note that these particular options only take effect when giving them in the global configuration file; when found in your private elmrc, they will be ignored.
A set of standards and RFCs have been developed that amend the RFC-822 standard to support various types of messages, such as plain text, binary data, PostScript files, etc. These standards are commonly referred to as MIME, or Multipurpose Internet Mail Extensions. Among other things, MIME also lets the recipient know if a character set other than standard ASCII has been used when writing the message, for example, using French accents or German umlauts. elm supports these characters to some extent.
The character set used by Linux internally to represent characters is usually referred to as ISO-8859-1, which is the name of the standard it conforms to. It is also known as Latin-1. Any message using characters from this character set should have the following line in its header:
Content-Type: text/plain; charset=iso-8859-1
The receiving system should recognize this field and take appropriate measures when displaying the message. The default for text/plain messages is a charset value of us-ascii.
To be able to display messages with character sets other than ASCII, elm must know how to print these characters. By default, when elm receives a message with a charset field other than us-ascii (or a content type other than text/plain, for that matter), it tries to display the message using a command called metamail. Messages that require metamail to be displayed are shown with an M in the very first column in the overview screen.
Since Linux's native character set is ISO-8859-1, calling metamail is not necessary to display messages using this character set. If elm is told that the display understands ISO-8859-1, it will not use metamail, but will display the message directly instead. This can be enabled by setting the following option in the global elm.rc:
displaycharset = iso-8859-1
Note that you should set this option even when you are never going to send or receive any messages that actually contain characters other than ASCII. This is because people who do send such messages usually configure their mailer to put the proper
Content-Type: field into the mail header by default, whether or not they are sending ASCII-only messages.
However, setting this option in elm.rc is not enough. When displaying the message with its built-in pager, elm calls a library function for each character to determine whether it is printable. By default, this function will only recognize ASCII characters as printable and display all other characters as
^?. You may overcome this function by setting the environment variable LC_CTYPE to ISO-8859-1, which tells the library to accept Latin-1 characters as printable. Support for this and other features have been available since Version 4.5.8 of the Linux standard library.
When sending messages that contain special characters from ISO-8859-1, you should make sure to set two more variables in the elm.rc file:
charset = iso-8859-1 textencoding = 8bit
This makes elm report the character set as ISO-8859-1 in the mail header and send it as an 8-bit value (the default is to strip all characters to 7-bit).
Of course, all character set options we've discussed here may also be set in the private elmrc file instead of the global one so individual users can have their own default settings if the global one doesn't suit them.