APACHE WHAT IS IT


Chapter 1. What is Apache?

CONTENTS:

1. What does Apache do?

1.1 Serving a file
1.2 Passing the request to an application processor
1.3 Accessing a database

2. How does Apache do it?

2.1 The HTTP request

2.1.1 A request entered by the user
2.1.2 Request messages containing data

2.2 Listening on a port
2.3 Locating the requested file

2.3.1 The document root
2.3.2 Alias and user directories

2.4 Processing the request

2.4.1 A simple response
2.4.2 Server side processing

3. Apache additional functions

3.1 Apache security functions
3.2 Apache modules

4. Configuring Apache

4.1 Directives
4.2 The configuration and .htaccess files
4.3 Directives restricted to specific directories, locations, files

4.3.1 Directives restricted to specific directories
4.3.2 Directives restricted to specific locations
4.3.3 Directives restricted to specific files

4.4 The content-type files

1. What does Apache do?

A request is a message that identifies a file, sent by a user to the computer that hosts that file. The file is either sent back to the user as is, or it is a program whose output is sent back to the user.

Apache is a Web server, that is, a software system that accepts requests from a user through the Web, and returns the appropriate response to the user. This is done either by sending back the file identified in the request, or by running the program identified in the request, then sending back the result of this program.

[Figure: the user, on the client side, exchanges HTTP messages over the Web with the HTTP server, Apache.]

Apache is also called an HTTP server, because it uses the HTTP protocol in its communication with the Web.

1.1 Serving a file

An instance of a request you can send to Apache, from your browser's screen (e.g. from Internet Explorer or Netscape Navigator) is:

http://myhost.com/somefile.html

where "myhost.com" is the Internet address of a host computer where an Apache server is installed.

[Figure: the user, on the client side, sends an HTTP request to the host computer myhost.com, where Apache retrieves somefile.html.]

Your browser would translate this line into an HTTP message that requests the "somefile.html" file contained in a dedicated directory in the addressed host computer. How Apache locates this directory is summarily dealt with in 2.3 Locating the requested file below.

If the Apache server has not been tinkered with, it would fetch the "somefile.html" file and send it back to your browser, for display on your screen.

1.2 Passing the request to an application processor

A more sophisticated treatment involves the passing of the request to an application processor (such as PHP).

[Figure: the user, on the client side, sends an HTTP request to the host computer myhost.com, where Apache passes the request for phppage.php to PHP.]

In addition to the file identification, APACHE can receive data from the browser, and pass them on to the application processor, along with the file identification.

Usually, the requested file is a script which uses some input data and produces an output result. In a realistic situation, the application processor sets the data received from Apache into some conventional format and has the script executed, with the formatted data as input. The output result from the script is returned to APACHE which sends it back to the browser, for display onto the user screen.

The application processor to be called is generally indicated by the file name extension of the requested file, or by the directory in which this file is located. Examples:

http://myhost/phppage.php	the `.php` extension indicates that the request is to be passed on to the `PHP` processor
http://myhost/cgi-bin/calculus.pl	the `cgi-bin` directory is known by Apache to contain files to be processed by a CGI processor

All of this is controlled by the Apache settings (see section 4. Configuring Apache, below).

Some examples of application processors are:

PHP	(PHP: Hypertext Preprocessor)
JSP	(JavaServer Pages)

Earlier generations of Apache were mainly used to run CGI (Common Gateway Interface) programs. As its name implies, CGI is not a processor, but an interface. Programs that run as CGI applications can be written in many languages: Perl, C, C++, Basic, etc. Under UNIX, shell scripts can also be used. Perl has traditionally been the most common choice for CGI.

A script run under these processors generally produces an HTML page that contains the results computed from the input data. This HTML page is the result sent back to the requesting browser which then displays it for the user to see.
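As a minimal sketch of such a script, the Python code below builds the kind of HTML page a CGI program might return. It assumes the CGI convention of receiving the request data in the QUERY_STRING environment variable; the function name render_page, the field names and the page layout are made up for this illustration.

```python
import html
import os
from urllib.parse import parse_qs


def render_page(query_string: str) -> str:
    """Build a small HTML page from URL-encoded input data (hypothetical helper)."""
    fields = parse_qs(query_string)
    # parse_qs maps each field name to a list of values; keep the first one.
    name = fields.get("name", ["unknown"])[0]
    surname = fields.get("surname", ["unknown"])[0]
    # Escape the user input before placing it in the HTML page.
    body = f"<p>Hello, {html.escape(name)} {html.escape(surname)}!</p>"
    return f"<html><body>{body}</body></html>"


if __name__ == "__main__":
    # Under CGI, the Web server passes the request data in QUERY_STRING.
    query = os.environ.get("QUERY_STRING", "")
    # A CGI response is a header block, an empty line, then the page itself.
    print("Content-Type: text/html")
    print()
    print(render_page(query))
```

The script knows nothing about Apache itself: it only reads its input and writes a page, which is why the same program could run behind another Web server.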

To develop an application running under one of these processors (PHP, JSP, Perl, etc.), a programmer has to know the language appropriate for this processor. Such a language is independent from Apache. An application developer need not know how Apache or any other Web server works.

1.3 Accessing a database

A script called upon by Apache may have the capability of accessing a database. This is true of the above-mentioned processors: PHP, JSP, Perl, C, etc.

Most databases nowadays can be distributed among multiple hosts. Using a distributed database, a user connected to an Apache system can access data held in different hosts. This capability belongs to the database management system. It is not related to Apache.

Connection to a database usually involves a database driver which is a piece of software that pertains to the database system, and is installed in the host of the requesting script.

2. How does Apache do it?

2.1 The HTTP request

The procedure starts with an HTTP request from the client side.

2.1.1 A request entered by the user

A sample HTTP request is this line entered from the address line of your browser (e.g. Netscape Navigator, Internet Explorer, etc.):

http://www.hata.com:4080/profile/people.php?name=Fanny&surname=Adams

Your browser translates this line into a message that looks like this:

GET http://www.hata.com:4080/profile/people.php?name=Fanny&surname=Adams HTTP/1.1CRLF
Accept: text/html, image/gif, image/jpegCRLF
Accept-Language: en-us, fr;q=0.5CRLF
Accept-Encoding: gzip, deflateCRLF
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)CRLF
Host: www.hata.com:4080CRLF
Connection: Keep-AliveCRLF
CRLF

This message is given here for concreteness' sake. An application developer never sees it, and an Apache administrator seldom has to deal with it.

The CRLF symbol represents the "Carriage Return / Line Feed" sequence that marks the end of a line of text. The rest of the message is composed of readable characters: HTTP is a text-based protocol.

As you can see, the first line contains the request that was entered by the user:
http://www.hata.com:4080/profile/people.php?name=Fanny&surname=Adams

In this:

`www.hata.com`	is the server's host Internet name
`4080`	is the port at the server's host where the message is expected
`/profile/people.php`	is the requested file URI
`name=Fanny&surname=Adams`	is the data sequence to be passed as input data to the `people.php` script.

The remaining lines are the message headers, which contain information on how to handle the message.

This message is sent by the browser to the Apache system in the www.hata.com host.
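The breakdown of the request line can be reproduced with Python's standard urllib.parse module. This is only an illustrative sketch of how the parts are separated; it is not how Apache itself is implemented.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.hata.com:4080/profile/people.php?name=Fanny&surname=Adams"
parts = urlsplit(url)

host = parts.hostname         # the server's host Internet name
port = parts.port             # the port where the message is expected
path = parts.path             # the requested file URI
data = parse_qs(parts.query)  # the input data for the people.php script

print(host, port, path, data)
```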

2.1.2 Request messages containing data

In the above example, the HTTP message conveys the data to be processed (name=Fanny&surname=Adams) on the line entered by the user. Nothing follows the headers except the empty line (a CRLF on its own) that marks their end.

However, the browser can generate messages where the data are placed after the headers. The usual cases are messages generated from an HTML form. A sample message is as follows:

POST http://www.hata.com:4080/profile/people.php HTTP/1.1CRLF
Accept: text/html, image/gif, image/jpegCRLF
Accept-Language: en-us, fr;q=0.5CRLF
Accept-Encoding: gzip, deflateCRLF
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)CRLF
Host: www.hata.com:4080CRLF
Connection: Keep-AliveCRLF
Content-Type: application/x-www-form-urlencodedCRLF
Content-Length: 24CRLF
CRLF
name=Fanny&surname=Adams

The data coming after the headers make up the message content. The Content-Type and Content-Length headers describe its format and its length in bytes.
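To make the layout of such a message concrete, here is a minimal Python sketch that separates the request line, the headers and the content. The function name split_message is hypothetical, the sample is shortened to a few headers, and Content-Type and Content-Length are the standard HTTP headers a browser normally adds to a form submission. A real server performs many more checks.

```python
def split_message(raw: bytes):
    """Split a raw HTTP request into request line parts, headers and content."""
    # The empty line (CRLF CRLF) separates the headers from the content.
    head, _, content = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    method, target, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, content


raw = (
    b"POST http://www.hata.com:4080/profile/people.php HTTP/1.1\r\n"
    b"Host: www.hata.com:4080\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 24\r\n"
    b"\r\n"
    b"name=Fanny&surname=Adams"
)

method, target, version, headers, content = split_message(raw)
print(method, headers["content-length"], content)
```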

2.2 Listening on a port

How the message finds its way to its destination host is beyond the scope of this book. Let's pick up the story when the message arrives there.

In this example, Apache listens on port 4080 for incoming messages. That is, when a message destined for port 4080 is received, the Operating System (Unix, Windows, etc.) dispatches it to Apache.

There are 65535 usable ports (2 to the power 16, minus 1), numbered from 1 up. Each of the programs that communicate with the outside world is assigned one of these ports. Port 80 is assigned to HTTP communications. If Apache uses it, this number may be omitted from the request URL, as it is implied. Here 4080 is used, possibly because 80 is assigned to another HTTP server, for example IIS (Internet Information Services), in Windows.
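The rule about the implied port can be sketched in a few lines of Python. The function name effective_port is made up for this illustration; it only shows how a client could decide which port to contact.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80   # the port assigned to HTTP communications
HIGHEST_PORT = 2**16 - 1  # port numbers are 16-bit values


def effective_port(url: str) -> int:
    """Return the port a request for this URL is sent to."""
    port = urlsplit(url).port  # None when the URL names no port
    return port if port is not None else DEFAULT_HTTP_PORT


print(effective_port("http://myhost.com/somefile.html"))
print(effective_port("http://www.hata.com:4080/somefile.html"))
```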

2.3 Locating the requested file

Files accessible to remote users are ordinarily contained in a directory called the document root, or its descendants. An installation can also set up alias directories (so called because they are requested not by their name, but by an alias) or user directories (not shown on the figure below) to contain accessible files.

2.3.1 The document root

APACHE
    other       (known to users by the alias appl)
    htdocs      (document root)
        files
        docs
        profile

In the Apache environment, one directory structure contains most of the files accessible to remote users. It starts with a directory called the document root. The name of this directory is usually htdocs.

The document root can contain files and other directories. The figure above shows 2 of these, called docs and profile.

The path information is the part of a request that comes between the host name and port information and the question mark, if any, or the end of the line. It specifies the relative path of the requested file (in the above sample request: profile/people.php). Except for references to an alias or a user directory (to be seen below), this path is evaluated relative to the document root.

In the above example, the people.php file is to be found in the profile directory, under the document root.

If somefile.html is a file contained in the document root, a request for this file, in the same host as above, is:

http://www.hata.com:4080/somefile.html

Names that come directly after the host and port identifications identify files contained in the document root.

2.3.2 Alias and user directories

It is possible to set up directories outside the document root to contain files that users can request. These are:

-	alias directories
-	user directories

Apache knows their names and where they are located.

In the structure shown in the figure above, the "other" directory is known to users by an alias defined in the Apache settings, for example "appl". A request starting with http://www.hata.com:4080/appl/... then refers to a file found under the "other" directory. The use of aliases hides the true organization of the Apache file system from users. It also makes users independent of this organization.

The figure below shows one of the possible user directory organizations. The users are named "joe", "sam" and "willy". Each has a directory of the same name.

[Figure: a sample user directory organization. A "users" directory contains one directory each for joe, sam and willy; the user directories hold files and subdirectories such as appl, docs, libr, scrip, prog and infos.]

A file named phppage.php contained in sam's scrip directory is requested by:

http://www.hata.com:4080/~sam/scrip/phppage.php
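The three ways of locating a file (document root, alias, user directory) can be summed up in a short Python sketch. All file system locations below are hypothetical, as is the function name locate; Apache takes the real locations from its settings, and a real server also normalizes the path and rejects attempts to climb out of these directories.

```python
import posixpath

# Hypothetical locations, chosen only for this illustration.
DOCUMENT_ROOT = "/usr/apache/htdocs"
ALIASES = {"/appl/": "/usr/apache/other/"}
USER_DIR_BASE = "/usr/apache/users"


def locate(url_path: str) -> str:
    """Map the path information of a request to a file system path."""
    # User directory: /~sam/scrip/phppage.php
    if url_path.startswith("/~"):
        user, _, rest = url_path[2:].partition("/")
        return posixpath.join(USER_DIR_BASE, user, rest)
    # Alias directory: the alias is replaced by the true directory.
    for alias, target in ALIASES.items():
        if url_path.startswith(alias):
            return posixpath.join(target, url_path[len(alias):])
    # Everything else is evaluated relative to the document root.
    return posixpath.join(DOCUMENT_ROOT, url_path.lstrip("/"))


print(locate("/profile/people.php"))
print(locate("/appl/calc.php"))
print(locate("/~sam/scrip/phppage.php"))
```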

2.4 Processing the request

2.4.1 A simple response

In the simplest case, Apache serves a file by sending it as is to the requesting browser. This is the case with an HTML page.

2.4.2 Server side processing

When a request requires processing to produce a response, the request message is passed on to the appropriate program.

The program is generally selected according to the extension of the requested file. Another method is to assign the files contained in one or more directories to a processing program, regardless of their extensions. Both methods are based on the settings of Apache.
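Both selection methods can be sketched as table lookups. In this hypothetical Python example the two tables stand in for the Apache settings, and the handler labels are plain strings chosen for illustration.

```python
import posixpath

# Hypothetical settings: extension-based and directory-based assignments.
HANDLERS_BY_EXTENSION = {".php": "PHP", ".jsp": "JSP"}
HANDLERS_BY_DIRECTORY = {"/cgi-bin/": "CGI"}


def select_handler(url_path: str) -> str:
    """Pick the program that processes a request, or serve the file as is."""
    # Directory-based assignment disregards the file extension.
    for prefix, handler in HANDLERS_BY_DIRECTORY.items():
        if url_path.startswith(prefix):
            return handler
    # Otherwise the extension of the requested file decides.
    extension = posixpath.splitext(url_path)[1].lower()
    return HANDLERS_BY_EXTENSION.get(extension, "send file as is")


print(select_handler("/phppage.php"))
print(select_handler("/cgi-bin/calculus.pl"))
print(select_handler("/somefile.html"))
```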

A simple treatment is filtering: the requested file is passed to a filter program which modifies, then returns it to Apache for sending to the client. An example of such a treatment is the Server Side Include (SSI) procedure. In this, the requested file is an HTML page which has special elements inserted in it. The SSI processor reads through the page and replaces these elements by the result of the operations they describe (such a result is for instance the current date and time). The resulting transformed HTML page is sent to the client. Files to be so treated are usually characterized by the .shtml extension ("usually", because this can be changed by Apache settings).
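The filtering idea can be illustrated by a toy Python filter that handles a single kind of SSI element, the echo of the DATE_LOCAL variable. This is only a sketch: the function name ssi_filter, the date format and the "(none)" placeholder for unknown variables are choices made for this example, not a description of Apache's own SSI processor.

```python
import re
from datetime import datetime

# Matches elements such as <!--#echo var="DATE_LOCAL" -->
SSI_ECHO = re.compile(r'<!--#echo\s+var="([A-Za-z_]+)"\s*-->')


def ssi_filter(page: str, now: datetime) -> str:
    """Replace each echo element by the value of the variable it names."""
    variables = {"DATE_LOCAL": now.strftime("%Y-%m-%d %H:%M:%S")}

    def replace(match: re.Match) -> str:
        # Variables this sketch does not know are shown as "(none)".
        return variables.get(match.group(1), "(none)")

    return SSI_ECHO.sub(replace, page)


page = '<p>It is <!--#echo var="DATE_LOCAL" -->.</p>'
print(ssi_filter(page, datetime.now()))
```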

A more elaborate treatment involves the execution of a program that runs under an application processor such as PHP, JSP, Perl, etc. The requested file is then a program (often a script) developed in the language required by the application processor. Apache passes the HTTP request message on to the application processor, which retrieves the requested file and has it run. The data contained in the request message is made available to the program, in the form defined by the application processor language.

The output from this treatment is generally an HTML page which contains the sought-for results. This page is sent to the client to be displayed by the browser on the user's screen. HTML pages are widely used today, because they are currently the type of document that browsers know best how to handle. In the near future, application programs may generate XML pages instead.

3. Apache additional functions

3.1 Apache security functions

Apache offers a number of functions to control user access. These are:

- host authorization	which accepts or rejects requests based on their originating host
- user authentication	which requires that users enter their name and password, to access certain directories
- per user grouping	which assigns a separate file structure to each user

The user authentication function is supported by software components which provide for encrypting the passwords to be stored in disk files. At an elaborate level, it is also possible to encrypt passwords sent in by the users, which involves an agreement on a key (usually emitted by the server) prior to the procedure.

3.2 Apache modules

Apache security functions are implemented by modules that are dynamically loaded when Apache is started.

Apache can accept new modules to carry out new ways of handling requests. These modules can be copied into the Apache environment or stay outside. One or more file extensions are to be defined to identify the files to be processed by such a module. This capability makes Apache a system open to new functionalities.

Such modules can vary in size. They can be complete application systems like PHP.

4. Configuring Apache

Apache is composed of a core which is produced in one compilation run, and a number of dynamic modules also called Dynamic Shared Objects (DSO) which are loaded when Apache is started. These modules are called "dynamic" because they are not hard compiled into the Apache core, and can be loaded at run time. In the course of time, they can be added to the system without recompiling the core.

The options that govern the inclusion of dynamic modules, the directory organization and the functioning of Apache are defined by a set of directives. The bulk of these are contained in the Apache configuration file. Directives that affect the access to a specific directory can be set in a .htaccess file contained in that directory.

4.1 Directives

Some of the directives are:

`Alias`	defines a user accessible directory structure independent of the document root
`UserDir`	defines a directory organization, partitioned on a per-user basis
`LoadModule`	locates a module to be loaded when Apache starts -- among these modules are those to which Apache passes incoming requests for processing, such as the SSI processor, the PHP or JSP application processors, and those which implement the security procedures.
`AddType`	assigns one or more file extensions to a data type
`AddHandler`	assigns one or more file extensions to a handling requirement
`Action`	assigns a CGI script to process requests for a given data type or handling requirement

Some of the directives are discussed in the next chapter.

Directives fall into 2 categories:

-	those pertaining to the Apache core, which are always available
-	those pertaining to a specific module, available only when the module is installed

An exhaustive directive index is found in the file manual/mod/directives.html.en, where a hyperlink leads to the full description of each directive. See also: Appendix 4. The Apache manual

4.2 The configuration and .htaccess files

The main file to contain directives is the so-called configuration file, the path of which is conf/httpd.conf.

This is a text file which can be displayed and edited using a simple text editor. In this, the lines that start with a # are comments.

Directives controlling the access to a specific directory can be set in a file named .htaccess contained in that directory. These directives affect the handling of requests to access that directory and all of the directories contained therein.

4.3 Directives restricted to specific directories, locations or files

Directives can be set up with a scope restricted to specific:
- directories
- locations
- files

4.3.1 Directives restricted to specific directories

Directives related to a specific directory are set up in the configuration file, within a section delimited by the <Directory > and </Directory> tags. The syntax is:

<Directory directory-path>

     directives

</Directory>

where directory-path can be:

- a true path name. Examples:

/usr/apache/htdocs	in a UNIX system
C:/soft/apache2/htdocs	in a Windows system

- a path name containing wild-card strings such as:

`?`	to match any single character
`*`	to match any character string
`[a-x]`	to match a range of characters from "a" to "x"

- a regular expression set within quotes, and preceded by the tilde sign ~.
Example: ~ "^/www/lib/"
would match any directory name starting with /www/lib/

Example:

<Directory C:/Apache/htdocs/ >
   Options Indexes FollowSymLinks
   AllowOverride none
   Order allow,deny
   Allow from all
</Directory>

In this example, the directives enclosed within the <Directory> and </Directory> tags apply to the C:/Apache/htdocs/ directory and its descendants, if any.

The directives to be applied to a directory can also be set in a file named .htaccess contained in that directory.

4.3.2 Directives restricted to specific locations

A section delimited by the <Location> and </Location> tags contains the directives that apply to the requests whose path information starts with the location string specified in the <Location> tag.

The path information is that part of the request URL located between the host and port information and the end of the request, or the ? (question mark) sign, whichever comes first:
http://host:port/path?info

The syntax is:

<Location location-string>

     directives

</Location>

where location-string is a character string to be matched with the start of the path information of an incoming request.
For example, this Location section, defined, say, in the mysite host:

<Location /private/>
  Order Allow,Deny
  Allow from hatayservices.com
</Location>

would deny access to requests from all origins, except hatayservices.com, with a path information string starting with the string /private/, such as:

http://mysite/private/secret.html
or
http://mysite/private/cgi-bin/prog.pl

The location-string can contain the wild-card characters ? or *. Also, it can be a regular expression enclosed within quotes and preceded by a tilde (~) sign.
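The effect of the /private/ example can be approximated by the Python sketch below. It is a simplified model written for illustration: the function names are hypothetical, only the single location and the single allowed domain of the example are covered, and a host is assumed to be accepted when its name is the allowed domain or ends with it as a whole name component.

```python
LOCATION = "/private/"
ALLOWED_DOMAIN = "hatayservices.com"


def location_applies(url_path: str) -> bool:
    """True when the path information starts with the location string."""
    return url_path.startswith(LOCATION)


def host_allowed(remote_host: str) -> bool:
    """True when the host is the allowed domain or belongs to it."""
    host = remote_host.lower()
    return host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)


def access_granted(url_path: str, remote_host: str) -> bool:
    # Requests outside the location are not affected by this section;
    # this sketch simply lets them through.
    if not location_applies(url_path):
        return True
    return host_allowed(remote_host)


print(access_granted("/private/secret.html", "www.hatayservices.com"))
print(access_granted("/private/cgi-bin/prog.pl", "elsewhere.example"))
```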

4.3.3 Directives restricted to specific files

Directives related to files with a name that matches a certain pattern are set up in a section delimited by the <Files > and </Files> tags. The syntax is:

<Files file-name-pattern>

     directives

</Files>

The file name considered here is the last component in the full name of a file. It excludes the names of the directories contained in the file path.

The file-name-pattern argument in the Files tag can contain the wild-card characters

`?`	which represents any single character
`*`	which represents any character string

This argument can be a regular expression, enclosed within quotes, and preceded by a tilde sign. Example - the directives in the section starting with:

<Files ~ "\.(gif|jpg|jpeg|png|bmp)$">

apply to all files with a name ending with one of the extensions .gif, .jpg, .jpeg, .png or .bmp, which are image files.
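The same regular expression can be tried out in Python, whose re module accepts it unchanged. The sketch below applies it to the last component of a file path only, as described above; the function name is_image and the sample paths are made up, and Python's fnmatch module is used merely as an approximation of the ? and * wild cards.

```python
import posixpath
import re
from fnmatch import fnmatchcase

IMAGE_FILES = re.compile(r"\.(gif|jpg|jpeg|png|bmp)$")


def is_image(file_path: str) -> bool:
    """Test the pattern against the last component of the path only."""
    file_name = posixpath.basename(file_path)
    return IMAGE_FILES.search(file_name) is not None


print(is_image("/usr/apache/htdocs/docs/logo.png"))
print(is_image("/usr/apache/htdocs/png/readme.txt"))
# Wild cards: ? stands for one character, * for any character string.
print(fnmatchcase("page1.html", "page?.html"))
print(fnmatchcase("somefile.html", "*.html"))
```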

4.4 The content-type files

Two files contained in the conf directory serve to determine the data type of a file (hence the appropriate program to process it):

- the `mime.types` file	which is the main source for assigning extensions to a data type -- the `AddType` directives (discussed in Chapter 3, below) come in as a complement to the `mime.types` file
- the `magic` file	which contains the information to determine the data type of a file by looking at its first bytes.
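The two approaches can be compared in a small Python sketch. Python's standard mimetypes module plays a role similar to the mime.types file, mapping extensions to data types, while the short signature table below is a hand-made stand-in for the magic file. The function names and the choice of signatures are assumptions made for this example.

```python
import mimetypes

# A few well-known file signatures (first bytes of the file).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def type_from_extension(file_name: str):
    """Determine the data type from the file extension."""
    content_type, _encoding = mimetypes.guess_type(file_name)
    return content_type


def type_from_first_bytes(data: bytes):
    """Determine the data type by looking at the first bytes of the file."""
    for signature, content_type in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return content_type
    return None  # unknown signature


print(type_from_extension("somefile.html"))
print(type_from_first_bytes(b"GIF89a" + b"\x00" * 10))
```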
