HTML - User's Guide
An introduction to the HTTP protocol

Contents:
1. OVERVIEW
2. MESSAGE FORMAT
2.1 Generic message format
2.2 Request format
2.3 Response format
3. MESSAGE ELEMENTS
3.1 The request URI
3.2 HTTP methods
3.3 Message headers
3.3.1 General headers
3.3.2 Request headers
3.3.3 Response headers
3.3.4 Entity headers

HTTP (HyperText Transfer Protocol) is a communication protocol used on the Internet to transfer documents such as HTML pages.

1. OVERVIEW

HTTP communication takes place through an exchange of request and response messages.
Request & Response
Communication begins with a request, sent by one of the partners, for some action on a resource accessible to the other partner. The response sent back by the other partner completes the exchange.

In HTTP terminology, the partner that sends the request is the client side of the communication, and the partner that sends back the response is the server. This implies nothing about the capabilities of the partners; they can switch roles over time.
HTTP communication
In practice, communication on the Internet involves a user agent, usually a browser(1), on the user's side, and a Web server at some remote site. Each can act in turn as client and as server in the sense defined by the HTTP specification.

In HTML parlance, however, the user's side, where the HTML page is displayed, is called the client side, and the other side, where a Web server usually runs, is called the server side. This is not quite consistent with the HTTP definition just given, but in practice there is little risk of confusion, because the HTTP exchanges are hidden from the user.
(1) The user agents used in this book to test the HTML code are Internet Explorer and Netscape Navigator.

2. MESSAGE FORMAT

2.1 Generic message format

Requests and responses are messages.

The start-line and the header lines of an HTTP message are text, written by default in the ISO-8859-1 character set defined by the ISO (International Organization for Standardization). The message body, by contrast, is a sequence of bytes and can carry binary data, such as images.

A message is composed of lines that are sequences of characters terminated by a carriage return and a line feed character. This 2-character sequence is usually denoted by the acronym CRLF.

A message is composed of the following lines:

    start-line CRLF
    message-header CRLF
    ...
    message-header CRLF
    CRLF
    message-body
The start-line is:
- a request-line in a request
- a status-line in a response
They are described in sections 2.2 and 2.3, respectively.
The header group is terminated by an empty line, that is, a line consisting of a CRLF alone, after which comes the message body. The body carries the content of the message, in its original or encoded form. The original content to be transmitted is called the entity body. To ensure that it travels safely through the network, the entity body can be encoded; this is called a transfer coding (for example, the "chunked" coding). The result is the message body actually carried in the message. If no transfer coding is applied, the message body is identical to the entity body.
Message headers in both request and response messages describe such message characteristics as the length of the message body (the Content-Length header) or the transfer coding applied to the entity body (the Transfer-Encoding header).
In a request message, headers can tell the server under which condition to perform the requested action (for instance, return the information only if the resource was modified since a certain date), or which characteristics of the response are acceptable, such as the character set (the Accept-Charset header) or the language (the Accept-Language header).
Headers in a response message may convey a warning (the Warning header) or any indication useful to the requester, for example, a different location where the requested resource can be found (the Location header).
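The message layout described above can be illustrated with a short sketch in Python 3 (standard library only). It splits a complete message at the first empty line, treats the start-line and header lines as text, and leaves the body as raw bytes. The function name split_message and the sample response are hypothetical, and the sketch assumes the entire message is already in memory; a real client or server reads from a network connection and relies on Content-Length or Transfer-Encoding to know where the body ends.

```python
# Minimal sketch: split a raw HTTP message into start-line, header lines and body.
# Assumption: the whole message is already in memory; a real program reads it
# from a connection and uses Content-Length or Transfer-Encoding to find the
# end of the body.

CRLF = b"\r\n"


def split_message(raw: bytes):
    """Return (start_line, header_lines, body) for a raw HTTP message."""
    # The header group ends at the first empty line, i.e. CRLF CRLF.
    head, sep, body = raw.partition(CRLF + CRLF)
    if not sep:
        raise ValueError("no empty line terminating the header group")
    # Start-line and headers are text; the body is left as raw bytes.
    lines = head.decode("iso-8859-1").split("\r\n")
    return lines[0], lines[1:], body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
start_line, header_lines, body = split_message(raw)
print(start_line)    # HTTP/1.1 200 OK
print(header_lines)  # ['Content-Type: text/plain', 'Content-Length: 5']
print(body)          # b'hello'
```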

2.2 Request format

A request message is composed of the following lines:
    request-line CRLF
    message-header CRLF
    ...
    message-header CRLF
    CRLF
    message-body
The request-line is composed of the following items, separated by one space:
- method, defines the action to be performed on the resource identified by the request_uri
- request_uri, identifies the resource to which the request applies; in a very common situation, the request is addressed to a Web server and the resource is an application to be run.
- version, indicates the HTTP version to be used.
The general look of a request line is as follows:
method request_uri version CRLF
Methods are described in section 3.2.
Message headers are described in section 3.3.

An example of a request-line is:
GET /applis/jsp/MyApps.jsp?first=John&last=Blow&push=ENTER HTTP/1.1
In this:
- The method is GET.
- The request URI is /applis/jsp/MyApps.jsp?first=John&last=Blow&push=ENTER
It tells the Web server to run the JSP application stored in the file whose path is /applis/jsp/MyApps.jsp and to pass it the data first=John, last=Blow, push=ENTER.
- The version is HTTP/1.1.
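As an illustration, the request-line can be taken apart with Python's urllib.parse module. This is a sketch, not a complete parser: parse_request_line is a hypothetical helper, and the sample line writes the request URI as an absolute path beginning with '/', the form normally sent to an origin server.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_request_line(line: str):
    """Split 'method request_uri version' into its three items."""
    # Items are separated by single spaces; unpacking fails with
    # ValueError if there are not exactly three of them.
    method, request_uri, version = line.split(" ")
    return method, request_uri, version


line = "GET /applis/jsp/MyApps.jsp?first=John&last=Blow&push=ENTER HTTP/1.1"
method, request_uri, version = parse_request_line(line)
parts = urlsplit(request_uri)   # path left of '?', query string right of it
data = parse_qsl(parts.query)   # list of (name, value) pairs

print(method)      # GET
print(parts.path)  # /applis/jsp/MyApps.jsp
print(data)        # [('first', 'John'), ('last', 'Blow'), ('push', 'ENTER')]
print(version)     # HTTP/1.1
```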

2.3 Response format

A response message is composed of the following lines:
    status-line CRLF
    message-header CRLF
    ...
    message-header CRLF
    CRLF
    message-body
The status line is composed of the following items, separated by one space:
- version, indicates the HTTP version being used.
- status_code, indicates how the requested operation was performed
- reason_phrase, a short comment on the status code, for example, to explain what the error was.
The general look of a status line is as follows:
version status_code reason_phrase CRLF
A status code is a 3-digit number. The leftmost digit identifies the category of the response. There are 5 such categories. Here is how the HTTP specification describes these categories and the corresponding codes:
    1xx  Informational  Request received, continuing process
    2xx  Success        The action was successfully received, understood, and accepted
    3xx  Redirection    Further action must be taken in order to complete the request
    4xx  Client Error   The request contains bad syntax or cannot be fulfilled
    5xx  Server Error   The server failed to fulfill an apparently valid request
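For example, the '200' (OK) status signals that the request was processed successfully, and the '400' (Bad Request) status signals that the server could not understand the request because of malformed syntax. The '404' (Not Found) status signals that the specified resource was not found, while the '500' (Internal Server Error) status signals that the server met an unexpected condition that prevented it from fulfilling the request.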

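The reason phrase is a short text that explains the meaning of the numeric code to a human reader, for example, "Not Found" in the status line HTTP/1.1 404 Not Found. Programs act on the status code; they are not required to examine the reason phrase.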
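A status line can be decoded in the same way; the sketch below also derives the category from the leftmost digit. The helper parse_status_line and the CATEGORIES table are hypothetical names; the category labels are those of the table above. Since a program should act on the numeric code, the reason phrase is simply returned as text.

```python
# The five categories, keyed by the leftmost digit of the status code.
CATEGORIES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}


def parse_status_line(line: str):
    """Return (version, status_code, reason_phrase, category)."""
    # The reason phrase may contain spaces ("Not Found"), so split at most twice.
    version, code, reason = line.split(" ", 2)
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not a 3-digit status code: {code!r}")
    category = CATEGORIES.get(code[0], "Unknown")
    return version, int(code), reason, category


print(parse_status_line("HTTP/1.1 404 Not Found"))
# ('HTTP/1.1', 404, 'Not Found', 'Client Error')
```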

3. MESSAGE ELEMENTS

3.1 The request URI

The request URI identifies the resource to process.
Usually, the request is sent by a user agent to a Web server.
The resource can be an HTML page to send back, or an application to run. When the resource is a JSP page, the request asks the server to process the page and send back the result.

The resource is identified by the part of the request URI to the left of the question mark. This part is the path of the resource on the server.

To the right of the question mark is the data string, also called the query string. When an HTML form is submitted with the GET method, the form data is placed there.

Individual data items are separated by the ampersand (&) sign; each item has the form name=value.

Special characters in the request URI are represented by their hexadecimal code, preceded by the % sign. For example, a space is a special character; it is represented as %20 (in HTML form data it can also appear as +).
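The data string and its %-escaping can be produced and decoded with Python's urllib.parse module. The field names and values below are made up for the example. Note that urlencode escapes a space as '+' by default, the convention used by HTML form submissions; passing quote_via=quote produces the %20 form instead.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical form data; "city" contains a space, a special character.
fields = {"first": "John", "last": "Blow", "city": "New York"}

# quote_via=quote escapes the space as %20; urlencode's default
# (quote_plus) would write it as '+', the HTML form convention.
query = urlencode(fields, quote_via=quote)
request_uri = "/applis/jsp/MyApps.jsp?" + query
print(request_uri)
# /applis/jsp/MyApps.jsp?first=John&last=Blow&city=New%20York

# The server side reverses the process: split on '&' and '=', then unescape.
print(parse_qs(query))
# {'first': ['John'], 'last': ['Blow'], 'city': ['New York']}
```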

3.2 HTTP methods

The method tells the Web Server how to process the resource.

The methods are:
- GET: requests the information represented or produced by the resource identified by the request URI
- POST: requests the server to accept the entity body enclosed in the request as a new subordinate of the resource identified by the request URI
- OPTIONS: enquires about the communication options available for the resource identified by the request URI, or for the server itself
- PUT: requests that the transmitted entity be stored under the request URI; for example, if the entity is data and the request URI a file name (complete with its path), the PUT method requests the server to create the file with this name, containing the transmitted entity
- HEAD: requests a response with headers only (no message body); it is otherwise identical to GET
- DELETE: requests the deletion of the resource identified by the request URI
- TRACE: requests that the server loop the received request message back to the client, which can then see what reached the server
In HTML, forms can use only the GET and POST methods.

The GET method requests the server to retrieve and send back the information identified by the request URI. If the resource is an application, it is passed the data part (to the right of the question mark), and its result is returned to the client as an entity in the body of the response message. The result can be cached, that is, saved together with the request on the client side (or by a proxy along the way); if a later request is identical to the saved one, the saved result is returned to the user from the cache, without going all the way to the server.

The POST method, on the other hand, sends the entity body of the request to the resource identified by the request URI, asking the server, as the HTTP specification puts it, to accept this entity as a "new subordinate" of this resource. In the most frequent situation, where the resource is an application, the server runs the application, passes it the entity as data to process, and returns the result to the client. Responses to POST are not cached unless the response explicitly allows it with a Cache-Control or Expires header, so the request normally goes all the way to the destination specified by the request URI.

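When the resource identified by the request URI is an application, the results returned to the user by the GET and POST methods can be identical. The differences are:
- in how the data is placed in the request message:
. with the GET method, the data is contained in the request URI. The HTTP specification sets no limit on URI length, but it warns that servers should be cautious about URIs longer than 255 bytes, because some older clients and proxies may not support them; in practice, browsers and servers impose their own limits. Apart from the %-escaping of special characters, the data cannot be encoded, and it remains visible in the URI.
. with the POST method, the data is enclosed in the message body; the protocol sets no size limit, and the body can be encoded (for example, compressed). Such encoding does not, however, make the data confidential.
- in caching: the results of a GET request can be cached; those of a POST request normally cannot.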
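To make the difference concrete, the sketch below builds the same form data as a GET request and as a POST request. The host name www.example.com and the helper names build_get and build_post are hypothetical, and the messages are only built, not sent. For the POST request it uses the media type application/x-www-form-urlencoded, which browsers use by default when submitting a form, and computes Content-Length from the encoded body.

```python
from urllib.parse import urlencode


def build_get(host: str, path: str, fields: dict) -> bytes:
    """GET: the form data travels in the request URI."""
    request_uri = path + "?" + urlencode(fields)
    lines = [f"GET {request_uri} HTTP/1.1", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")  # ends with the empty line


def build_post(host: str, path: str, fields: dict) -> bytes:
    """POST: the same data travels in the message body."""
    body = urlencode(fields).encode("ascii")
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # bytes in the body
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii") + body


fields = {"first": "John", "last": "Blow"}
get_msg = build_get("www.example.com", "/applis/jsp/MyApps.jsp", fields)
post_msg = build_post("www.example.com", "/applis/jsp/MyApps.jsp", fields)
print(get_msg.decode("ascii"))
print(post_msg.decode("ascii"))
```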

3.3 Message headers

A message header field has the following syntax:
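field-name: field-value CRLF
The field name is case-insensitive, and spaces between the colon and the field value are optional. A field value can be composed of multiple items, usually separated by commas, as in Allow: GET,POST.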
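A minimal sketch of reading header lines with this syntax follows. It lower-cases field names, since HTTP field names are case-insensitive, and splits list values on commas, the separator used by list-valued headers such as Allow and Accept-Encoding in the examples of this chapter. Splitting is left to a separate helper because some values, such as dates, contain commas that are not list separators. The function names parse_headers and list_items are hypothetical.

```python
def parse_headers(header_lines):
    """Map lower-cased field names to their (stripped) field values."""
    headers = {}
    for line in header_lines:
        # Split at the first colon only: values such as dates contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a header line: {line!r}")
        key = name.strip().lower()  # field names are case-insensitive
        value = value.strip()       # spaces after the colon are optional
        # Repeated fields are joined with a comma; this is only meaningful
        # for headers whose value is a comma-separated list.
        headers[key] = f"{headers[key]}, {value}" if key in headers else value
    return headers


def list_items(value: str):
    """Split a comma-separated field value such as 'GET,POST'."""
    return [item.strip() for item in value.split(",") if item.strip()]


h = parse_headers([
    "Allow: GET,POST",
    "Date: Wed, 10 Dec 2003 22:17:54 GMT",
    "accept-encoding: gzip",
    "Accept-Encoding: deflate",
])
print(list_items(h["allow"]))  # ['GET', 'POST']
print(h["date"])               # Wed, 10 Dec 2003 22:17:54 GMT
print(h["accept-encoding"])    # gzip, deflate
```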

Some headers, the general headers, can be used in both requests and responses. Others are valid only in requests (request headers) or only in responses (response headers). A fourth category, the entity headers, describes the entity body.

3.3.1 General headers

Some of the general headers are:
- Allow: specifies the list of methods applicable to the resource identified by the request URI (the HTTP/1.1 specification classifies it as an entity header). An example:
Allow: GET,POST
- Cache-Control: specifies the caching directives to be obeyed by the caching mechanisms along the way from the origin to the destination of the message. Some examples are:
Cache-Control: no-cache - a cached copy must not be used without first revalidating it with the origin server
Cache-Control: max-age=300 - the cached information is valid for 300 seconds
- Connection: specifies options for the connection. Examples are:
Connection: keep-alive - the connection is to be persistent
Connection: close - the connection is not to be persistent; it is closed after the response (persistent connections are the default in HTTP/1.1)
- Date: indicates the date and time at which the message was generated. An example is:
Date: Wed, 10 Dec 2003 22:17:54 GMT
- Transfer-Encoding: indicates the transfer coding that has been applied to the message body, for example, chunked
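HTTP dates use one fixed format (RFC 1123: weekday, two-digit day, month abbreviation, year, time, and the literal GMT). Python's email.utils module happens to produce and parse this format, as sketched below; using it for HTTP is a convenience choice, not something the HTTP specification prescribes.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime

# Format a moment as an HTTP date: RFC 1123 form, always in GMT.
moment = datetime(2003, 12, 10, 22, 17, 54, tzinfo=timezone.utc)
date_value = formatdate(moment.timestamp(), usegmt=True)
print("Date: " + date_value)  # Date: Wed, 10 Dec 2003 22:17:54 GMT

# Parse a header value back into a timezone-aware datetime.
parsed = parsedate_to_datetime(date_value)
print(parsed == moment)  # True

# Parsed dates can be compared, e.g. to decide whether a resource changed.
earlier = parsedate_to_datetime("Mon, 08 Dec 2003 17:30:00 GMT")
print(earlier < parsed)  # True
```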

3.3.2 Request headers

Some of the request headers are:
- Accept: specifies the media types acceptable in the response; a q parameter (called the quality factor), with a value from 0 to 1, indicates the preference of the user or user agent for the media type (the default value is 1). Example:
Accept: text/html;q=1, text/plain;q=0.3
- Accept-Charset: indicates the character sets acceptable in the response; a q parameter can be used to indicate the preference level of a character set. An example is:
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
ISO-8859-1 is acceptable with preference 1, utf-8 with preference 0.7, and any other character set with preference 0.7.
- Accept-Encoding: restricts the content codings acceptable in the response to the specified list; if no Accept-Encoding header is present, the server may assume that all encodings are acceptable. Example:
Accept-Encoding: gzip,deflate
- Accept-Language: specifies the languages acceptable in the response. Example:
Accept-Language: en-us,en;q=0.5
en-us is preferred (preference 1); other forms of English are acceptable with preference 0.5.
- Authorization: contains the credentials that give access to the requested resource
- From: gives the e-mail address of the user who caused the request to be sent. Example:
From: www.information@HatayServices.com
- Host: specifies the host and port number of the requested resource, as given by the original request URI; every HTTP/1.1 request must include it
- User-Agent: gives information on the user agent that originates the request. Example:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
- If-Modified-Since: used with the GET method to request that the operation be performed, and the information returned to the client, only if the requested resource has been modified since the specified date; otherwise the server answers with the 304 (Not Modified) status and no message body. HTTP dates are always expressed in GMT. An example:
If-Modified-Since: Mon, 08 Dec 2003 17:30:00 GMT
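The quality factors used by Accept, Accept-Charset, Accept-Language and Accept-Encoding can be read with a short helper such as the hypothetical parse_quality_list below. It is a simplified sketch: it keeps only the q parameter, drops any other parameters, and does not perform the wildcard matching (such as text/* or *) that a server carries out when it chooses a response.

```python
def parse_quality_list(value: str):
    """Return (item, q) pairs from an Accept-style header, best first."""
    prefs = []
    for element in value.split(","):
        item, *params = [part.strip() for part in element.split(";")]
        if not item:
            continue  # tolerate empty elements such as a trailing comma
        q = 1.0  # the quality factor defaults to 1
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        prefs.append((item, q))
    # sort() is stable, so items with equal q keep their order in the header.
    prefs.sort(key=lambda pair: -pair[1])
    return prefs


print(parse_quality_list("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))
# [('ISO-8859-1', 1.0), ('utf-8', 0.7), ('*', 0.7)]
print(parse_quality_list("text/plain;q=0.3, text/html"))
# [('text/html', 1.0), ('text/plain', 0.3)]
```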

3.3.3 Response headers

Some of the response headers are:
- Age: used when the response goes through proxies on its way to the final recipient. The Age header is generated by an intermediate cache and indicates the number of seconds elapsed since the origin server generated the response
- Location: specifies a location, different from the one given by the request URI, where the client can find the resource; it is used, for example, with the 3xx (Redirection) status codes
- Retry-After: used with the 503 (Service Unavailable) status to indicate how long to wait before renewing the request
- Server: describes the server software that sends back the response
- Warning: contains a warning message
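A cache combines the Age header with the Cache-Control directives described in section 3.3.1 to decide whether a stored response can still be used. The sketch below is deliberately simplified and its function names are hypothetical: the HTTP specification computes a response's current age from several values (Age, Date, and the request and response times), and it also takes Expires and heuristics into account, all of which are ignored here.

```python
def directives(cache_control: str) -> dict:
    """Parse 'no-cache, max-age=300' into {'no-cache': None, 'max-age': '300'}."""
    result = {}
    for part in cache_control.split(","):
        name, sep, value = part.strip().partition("=")
        if name:
            result[name.lower()] = value if sep else None
    return result


def is_fresh(cache_control: str, age: int) -> bool:
    """Simplified check: may a cache reuse a response whose Age is `age`?"""
    d = directives(cache_control)
    if "no-cache" in d:
        return False  # must be revalidated with the origin server first
    max_age = d.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False  # absent or malformed; Expires and heuristics ignored here
    return age < int(max_age)


print(is_fresh("max-age=300", 120))  # True: 120 s old, fresh for 300 s
print(is_fresh("max-age=300", 400))  # False: stale
print(is_fresh("no-cache", 0))       # False
```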

3.3.4 Entity headers

The entity headers describe the properties of the entity body enclosed in a message. They can be found in requests or in responses since both requests and responses can transport an entity. Some of the entity headers are:
- Content-Base: specifies the base URI for resolving the relative URIs found in the entity body
- Content-Encoding: indicates the encoding that has been applied to the entity body. Example:
Content-Encoding: gzip
- Content-Language: indicates the natural language of the entity body. Example:
Content-Language: en-us
- Content-Length: indicates the number of bytes contained in the message body
- Content-Location: specifies the location of the resource that supplies the entity body; it is specified as an absolute or relative URI
- Content-Type: indicates the media type of the entity body. An example:
Content-Type: text/html; charset=ISO-8859-1
- ETag: specifies an entity tag, an opaque string of characters that the server assigns to a particular version of the entity body. A client that has cached the entity can later send this tag back (in the If-None-Match request header); if the entity has not changed, the server answers with the 304 (Not Modified) status, and the client uses its cached copy instead of retrieving the entity again from the remote server.
- Expires: specifies the date and time after which the entity body is considered stale
- Last-Modified: specifies the date and time at which the resource was last modified
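The relationship between Content-Type, Content-Encoding and Content-Length can be sketched as follows: the text is first turned into bytes in the declared character set, optionally compressed with gzip, and Content-Length is then computed on the bytes actually sent. The helper encode_entity is hypothetical, and the charset default (utf-8) is an assumption for the example.

```python
import gzip


def encode_entity(text: str, charset: str = "utf-8", compress: bool = True):
    """Return (headers, body) for an HTML entity body."""
    body = text.encode(charset)  # characters -> bytes in the declared charset
    headers = {"Content-Type": f"text/html; charset={charset}"}
    if compress:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    # Content-Length counts the bytes actually sent, not the characters.
    headers["Content-Length"] = str(len(body))
    return headers, body


plain_headers, plain_body = encode_entity("<p>Café</p>", compress=False)
print(plain_headers["Content-Length"])  # 12: 11 characters, but 'é' takes 2 bytes

gz_headers, gz_body = encode_entity("<p>Café</p>")
print(gz_headers["Content-Encoding"])            # gzip
print(gzip.decompress(gz_body).decode("utf-8"))  # <p>Café</p>
```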

Reference

R. Fielding (UC Irvine), J. Gettys (DEC), J. Mogul (DEC), H. Frystyk (MIT/LCS), T. Berners-Lee (MIT/LCS), "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, Network Working Group, January 1997.