knowledge guide: Message Formats

RFC 822

* Messages consist of a primitive envelope, some number of header fields, a blank line, and then the message body. Each header field consists of a single line of ASCII text containing the field name, a colon, and, for most fields, a value.

* RFC 822 was designed decades ago and does not clearly distinguish the envelope fields from the header fields. Although it was revised in RFC 2822, completely redoing it was not possible due to its widespread usage.

* In normal usage, the user agent builds a message and passes it to the message transfer agent, which then uses some of the header fields to construct the actual envelope, a somewhat old-fashioned mixing of message and envelope.

Figure1. RFC 822 header fields related to message transport.

* The next two fields, From: and Sender:, tell who wrote and sent the message, respectively. These need not be the same. For example, a business executive may write a message, but her secretary may be the one who actually transmits it.

* In this case, the executive would be listed in the From: field and the secretary in the Sender: field. The From: field is required, but the Sender: field may be omitted if it is the same as the From: field. These fields are needed in case the message is undeliverable and must be returned to the sender.

* A line containing Received: is added by each message transfer agent along the way. The line contains the agent's identity, the date and time the message was received, and other information that can be used for finding bugs in the routing system.

* The Return-Path: field is added by the final message transfer agent and was intended to tell how to get back to the sender. In theory, this information can be gathered from all the Received: headers, but it is rarely filled in as such and typically just contains the sender's address.

Figure2. Some fields used in the RFC 822 message header.

* The Reply-To: field is sometimes used when neither the person composing the message nor the person sending the message wants to see the reply. For example, a marketing manager writes an e-mail message telling customers about a new product.

* The message is sent by a secretary, but the Reply-To: field lists the head of the sales department, who can answer questions and take orders. This field is also useful when the sender has two e-mail accounts and wants the reply to go to the other one.

* The RFC 822 document explicitly says that users are allowed to invent new headers for their own private use, provided that these headers start with the string X-. It is guaranteed that no future headers will use names starting with X-, to avoid conflicts between official and private headers.

MIME—The Multipurpose Internet Mail Extensions

* In the early days of the ARPANET, e-mail consisted exclusively of text messages written in English and expressed in ASCII. For this environment, RFC 822 did the job completely: it specified the headers but left the content entirely up to the users.

* Nowadays, on the worldwide Internet, this approach is no longer adequate. The problems include sending and receiving.

1. Messages in languages with accents (e.g., French and German).

2. Messages in non-Latin alphabets (e.g., Hebrew and Russian).

3. Messages in languages without alphabets (e.g., Chinese and Japanese).

4. Messages not containing text at all (e.g., audio or images).

* A solution was proposed in RFC 1341 and updated in RFCs 2045–2049. This solution, called MIME (Multipurpose Internet Mail Extensions) is now widely used. The basic idea of MIME is to continue to use the RFC 822 format, but to add structure to the message body and define encoding rules for non-ASCII messages.

* By not deviating from RFC 822, MIME messages can be sent using the existing mail programs and protocols. All that has to be changed are the sending and receiving programs, which users can do for themselves.

Figure3. RFC 822 headers added by MIME.

* The Content-Description: header is an ASCII string telling what is in the message. This header is needed so the recipient will know whether it is worth decoding and reading the message. The Content-Id: header identifies the content. It uses the same format as the standard Message-Id: header.

* The Content-Transfer-Encoding: tells how the body is wrapped for transmission through a network that may object to most characters other than letters, numbers, and punctuation marks. Five schemes are provided. The simplest scheme is just ASCII text. ASCII characters use 7 bits and can be carried directly by the e-mail protocol provided that no line exceeds 1000 characters.

* The correct way to encode binary messages is to use base64 encoding, sometimes called ASCII armor. In this scheme, groups of 24 bits are broken up into four 6-bit units, with each unit being sent as a legal ASCII character. The coding is ''A'' for 0, ''B'' for 1, and so on, followed by the 26 lower-case letters, the ten digits, and finally + and / for 62 and 63, respectively.

* The == and = sequences indicate that the last group contained only 8 or 16 bits, respectively. Carriage returns and line feeds are ignored, so they can be inserted at will to keep the lines short enough. Arbitrary binary text can be sent safely using this scheme.

* For messages that are almost entirely ASCII but with a few non-ASCII characters, base64 encoding is somewhat inefficient. Instead, an encoding known as quoted-printable encoding is used. This is just 7-bit ASCII, with all the characters above 127 encoded as an equal sign followed by the character's value as two hexadecimal digits.

* The subtype must be given explicitly in the header; no defaults are provided. The initial list of types and subtypes specified in RFC 2045 is given in Fig. 5-12. Many new ones have been added since then, and additional entries are being added all the time as the need arises.

Figure4. The MIME types and subtypes defined in RFC 2045.

* Let us now go briefly through the list of types. The text type is for straight ASCII text. The text/plain combination is for ordinary messages that can be displayed as received, with no encoding and no further processing. This option allows ordinary messages to be transported in MIME with only a few extra headers.

* The text/enriched subtype allows a simple markup language to be included in the text. This language provides a system-independent way to express boldface, italics, smaller and larger point sizes, indentation, justification, sub- and superscripting, and simple page layout.

* The markup language is based on SGML, the Standard Generalized Markup Language also used as the basis for the World Wide Web's HTML. For example, the message

The <bold> time </bold> has come the <italic> walrus </italic> said ...

would be displayed as The time has come the walrus said ...

* The next MIME type is image, which is used to transmit still pictures. Many formats are widely used for storing and transmitting images nowadays, both with and without compression. Two of these, GIF and JPEG, are built into nearly all browsers, but many others exist as well and have been added to the original list.

* The audio and video types are for sound and moving pictures, respectively. Please note that video includes only the visual information, not the soundtrack. If a movie with sound is to be transmitted, the video and audio portions may have to be transmitted separately, depending on the encoding system used.

* The first video format defined was the one devised by the modestly-named Moving Picture Experts Group (MPEG), but others have been added since. In addition to audio/basic, a new audio type, audio/mpeg was added in RFC 3003 to allow people to e-mail MP3 audio files.

* The application type is a catchall for formats that require external processing not covered by one of the other types. An octet-stream is just a sequence of uninterpreted bytes. Upon receiving such a stream, a user agent should probably display it by suggesting to the user that it be copied to a file and prompting for a file name.

* The other defined subtype is postscript, which refers to the PostScript language defined by Adobe Systems and widely used for describing printed pages. Many printers have built-in The partial subtype makes it possible to break an encapsulated message into pieces and send them separately.

* Instead of including the MPEG file in the message, an FTP address is given and the receiver's user agent can fetch it over the network at the time it is needed. This facility is especially useful when sending a movie to a mailing list of people, only few are expected to view it.

* The final type is multipart, which allows a message to contain more than one part, with the beginning and end of each part being clearly delimited. The mixed subtype allows each part to be different, with no additional structure imposed. Many e-mail programs allow the user to provide one or more attachments to a text message using the multipart type.

Figure 5. A multipart message containing enriched and audio alternatives.

* Note that the Content-Type header occurs in three positions within this example. At the top level, it indicates that the message has multiple parts. Within each part, it gives the type and subtype of that part.

* Finally, within the body of the second part, it is required to tell the user agent what kind of an external file it is to fetch. To indicate this slight difference in usage, we have used lower case letters here, although all headers are case insensitive.

* The content- transfer-encoding is similarly required for any external body that is not encoded as 7-bit ASCII. Getting back to the subtypes for multipart messages, two more possibilities exist. The parallel subtype is used when all parts must be ''viewed'' simultaneously.

* Finally, the digest subtype is used when many messages are packed together into a composite message. For example, some discussion groups on the Internet collect messages from subscribers and then send them out to the group as a single multipart/digest message.

knowledge guide

Saturday, 16 February 2013

Message Formats

1 comment:

Translate

Search This Blog