URI - Uniform Resource Identifier

1. SAMPLES
1.1 URL
1.1.1 A resource in an Internet site - accessed through HTTP
1.1.2 A resource in an Internet site - accessed through FTP
1.1.3 A local resource absolute URL
1.1.4 Relative URI for a resource in the same directory
1.1.5 Relative URI for a local resource in a descendant directory
1.1.6 Relative URI for a local resource in a far away directory
1.2 URN
2. USAGE
2.1 Absolute URL
2.2 Relative URL
2.3 Port identification
3. SYNTAX
3.1 Absolute URL
3.2 Relative URL
3.2.1 Directory tree
3.2.2 Relative URI

A URI (Uniform Resource Identifier) is a code that identifies a resource, which may be found on the Internet or elsewhere in the world.

Subcategories of the URI are:
- the URL (Uniform Resource Locator), which identifies a resource on the Internet or an intranet
- the URN (Uniform Resource Name), which identifies a resource by a name that is unique worldwide (such a name is, for example, an ISBN that identifies a book)

1. SAMPLES

Here are some URI samples.

1.1 URL

The samples include:
- a URI to identify a resource in an Internet site - accessed through HTTP
- a URI to identify a resource in an Internet site - accessed through FTP
- an absolute URI to identify a file in the local host
- a relative URI to identify a file in the same directory
- a relative URI to identify a file in a descendant directory
- a relative URI to identify a file in a directory far apart

1.1.1 A resource in an Internet site - accessed through HTTP

http://www.w3.org/TR/html4/about.html

This URL identifies a file:
- located in the host whose Internet address is 'www.w3.org'
- contained in the directory path 'TR/html4/'
- whose file name is 'about.html'

This can be represented by the following figure.

[Figure: TR/html4/about.html]

The light blue area represents the content of host "www.w3.org". Both 'TR' and 'html4' directories are contained in this host. File 'about.html' is contained in 'html4'; 'html4' is contained in 'TR', and 'TR' in a directory denoted 'Internet root'. Internet communications are handled by a Web server, which takes as a parameter an 'Internet root' directory: the root of all directories visible from the Internet.

The http prefix states that the resource is to be accessed using the HTTP communication protocol, which is the preferred protocol for exchanging HTML documents ('http' stands for HyperText Transfer Protocol).
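The breakdown above (protocol prefix, host address, directory path, file name) can be reproduced with a short Python sketch. It uses the standard library's urllib.parse module; the variable names are illustrative only and are not part of the original tutorial.

```python
from urllib.parse import urlsplit

url = "http://www.w3.org/TR/html4/about.html"
parts = urlsplit(url)

# Separate the directory part of the path from the file name.
directory, _, file_name = parts.path.rpartition("/")

print(parts.scheme)      # the protocol prefix
print(parts.hostname)    # the Internet address of the host
print(directory + "/")   # the directory path, from the Internet root
print(file_name)         # the file name
```

Note that the parsed path starts with a slash, which stands for the 'Internet root' directory of the host.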

1.1.2 A resource in an Internet site - accessed through FTP

ftp://ftp.epix.net/pub/apache/httpd/httpd-2.0.47.tar.gz

This URL identifies a file:
- located in the host whose Internet address is 'ftp.epix.net'
- contained in the directory path 'pub/apache/httpd/'
- whose file name is 'httpd-2.0.47.tar.gz'

The ftp prefix states that the resource is to be accessed using the FTP communication protocol, which is used for file transfer, typically involving a large quantity of data ('ftp' stands for File Transfer Protocol).

1.1.3 A local resource absolute URL

file:///C:/Tutorials/HTML/docs/HTMLObject.html

This URL identifies a file:
- contained in disk drive 'C:'
- whose directory path is 'Tutorials/HTML/docs/'
- whose file name is 'HTMLObject.html'

This can be represented by the figure:

[Figure: C:/Tutorials/HTML/docs/HTMLObject.html file]

The file: prefix is the protocol specification; it indicates that the resource is to be accessed using a local file access protocol. The prefix can sometimes be omitted; the URL is then simply:

C:/Tutorials/HTML/docs/HTMLObject.html

i.e. the URL is the path name of the file.

1.1.4 Relative URI for a resource in the same directory

TextWaldenPond.html

This URI identifies a file that is in the same directory as the document where the URI is used, as shown in the following figure.

[Figure: URI - file in same directory]

The document that contains the URI "TextWaldenPond.html" and the referenced document "TextWaldenPond.html" are in the same directory.

1.1.5 Relative URI for a local resource in a descendant directory

JAVA/applet/applet.YinYang.class

This URI identifies a resource that is a descendant of the directory that contains the document where the URI is used, as shown in the figure:

[Figure: Referenced file descends from same directory]

The referenced object, applet "applet.YinYang.class", is contained in the directory path JAVA/applet, which is in the same directory as the document where the URI is used.

"applet.YinYang.class" is the full name (complete with package name) of a Java applet. This applet is in the file named "YinYang.class"; so, what appears in the URI is not a file name but the full name of a Java class.

The directory that contains the document and the path to the referenced object (in rosy tint) is called the URI base for resolving the JAVA/applet/applet.YinYang.class relative URI.

1.1.6 Relative URI for a local resource in a far away directory

../../objects/images/ThePond.gif

This URI identifies an image file named "ThePond.gif" which is not a descendant of the directory containing the document where the URI is used.

[Figure: URI for document not descending from base]

The path to the referenced file is ../../objects/images/. It is resolved relative to the directory that contains the document where the URI is used; this directory is called the URI base (in rosy tint). The ../ sign signifies "go up one level". The URI directs the browser (or whatever agent that processes the document) to go up 2 levels before going down to the image file.

1.2 URN

An instance of the URN is the ISBN (International Standard Book Number). This is a sample URN using an ISBN:

URN:ISBN:2-901683-06-1

The acronyms URN:ISBN: are to be coded as is. The sequence of dash-separated digits is the ISBN of a book.
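As a small illustration, the sample URN can be split into its three colon-separated parts in Python. This is only a sketch of the layout described above; the variable names are illustrative, and no validation of the ISBN itself is attempted.

```python
urn = "URN:ISBN:2-901683-06-1"

# Split on the first two colons only: the fixed "URN" prefix,
# the kind of name ("ISBN"), and the name itself.
prefix, name_kind, name = urn.split(":", 2)

print(prefix)     # URN
print(name_kind)  # ISBN
print(name)       # 2-901683-06-1
```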

2. USAGE

A URI, in the special form of a URL, can be:
- an absolute URI or
- a relative URI

To state the facts precisely, we shall deal with URLs, which are the most usual form of URI today.

2.1 Absolute URL

An absolute URL uniquely defines a resource worldwide. Examples in 1.1.1, 1.1.2 and 1.1.3 above are absolute URIs.

The URIs in 1.1.1 and 1.1.2 reference remote resources. They specify the Internet address of the host containing the resource, and the location of the resource file within the host.

The host address (www.w3.org and ftp.epix.net in the examples) is unique worldwide when it is an Internet address, as in the examples. When it is an intranet address, it is unique within a private network. Such an address has three parts:
- the domain identifier (org and net in the examples)
- the host name within the domain (w3 and epix in the examples)
- a subdivision of the host space (www and ftp in the examples); the latter may be further subdivided into one or more levels as suits the host owner, since the host space belongs to the owner.
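The three-part reading of a host address given above can be sketched in Python as follows. The function name split_host_address is hypothetical, and the code assumes the simple layout of the examples (a single-label domain identifier at the right end); it is not a general-purpose domain parser.

```python
def split_host_address(host):
    """Split a host address into the three parts described above."""
    labels = host.split(".")
    return {
        "domain": labels[-1],         # e.g. 'org' or 'net'
        "host_name": labels[-2],      # e.g. 'w3' or 'epix'
        "subdivisions": labels[:-2],  # e.g. ['www']; may hold several levels
    }

print(split_host_address("www.w3.org"))
print(split_host_address("ftp.epix.net"))
```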

The resource file location within the host is the directory path and the file name of the file.

The path to the file starts from a directory sometimes called the Network root directory. Access to a resource in the host is handled by a processor determined by the first part of the URI, which specifies a communication protocol (http and ftp in the above examples; the http protocol is handled by the HTTP processor, and so on). This processor interprets the path part of the URI as a path starting from the Network root directory. This directory is selected as part of the customization of the processor.

[Figure: Network root]

File 'index.html' which is directly in the Network root directory has the absolute URI http://www.xyz.com/index.html

File 'intro.html' in the directory path docs/html/ has the absolute URI http://www.xyz.com/docs/html/intro.html

In example 1.1.3 the URL specifies the location of a file within the local host. The address part of the URL, which comes after the protocol specification "file:///", starts with the identification of the disk drive that holds the file, then locates the file within the drive by its directory path and its file name. Note that directory and file names are separated by slashes, not backslashes, even on Windows systems. This is URI notation, not system-dependent notation.
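The conversion from a Windows path name to the URI notation can be sketched in Python. The helper name windows_path_to_file_url is hypothetical; the sketch assumes an absolute path that starts with a drive letter, and it percent-encodes characters such as spaces, a detail that this tutorial does not cover.

```python
from urllib.parse import quote

def windows_path_to_file_url(path):
    # URI notation uses slashes, whatever the operating system.
    forward = path.replace("\\", "/")
    # Keep '/' and ':' as they are; percent-encode characters such as spaces.
    return "file:///" + quote(forward, safe="/:")

print(windows_path_to_file_url(r"C:\Tutorials\HTML\docs\HTMLObject.html"))
```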

2.2 Relative URL

A relative URL defines the location of a resource within the local host by specifying a path to the resource relative to a directory called the URI base. The URI base is generally the directory that contains the document where the URI is used, but this is not necessarily so.

The relative URI describes the path to follow to get from the URI base directory to the file or directory resource. It records the names of the directories encountered when going from a directory to a directory contained in it (going down the directory tree). Different names are separated by a slash sign. A move one step upward (from a directory to the directory it is contained in) is denoted by the sign ../.

To locate a resource from a relative URI, the corresponding absolute URI must be derived using the URI base. Deriving an absolute URI from a relative URI is called resolving it. A relative URI can be resolved by appending it to the absolute URI of the URI base directory, as demonstrated in the following figure.

[Figure: Relative and base URI]

Suppose the URI base directory is the 'HTML' directory (in rosy tint); its absolute URI is file:///C:/Tutorials/HTML/

The relative URI of file 'Nature.java', relative to the URI base is ../JAVA/applet/Nature.java.
(one step up to 'Tutorials', then down to 'JAVA', to 'applet' and to 'Nature.java')

A correct absolute URI for file 'Nature.java' is file:///C:/Tutorials/HTML/../JAVA/applet/Nature.java, obtained by appending ../JAVA/applet/Nature.java to file:///C:/Tutorials/HTML/

Obviously a more palatable expression is file:///C:/Tutorials/JAVA/applet/Nature.java

But the other form is correct.
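Both forms can be obtained in Python. In the sketch below, plain string concatenation gives the appended form, while urljoin from the standard library's urllib.parse module resolves the ../ step and gives the shorter form. The variable names are illustrative only.

```python
from urllib.parse import urljoin

base = "file:///C:/Tutorials/HTML/"       # absolute URI of the URI base directory
relative = "../JAVA/applet/Nature.java"   # relative URI of the file

# Appending the relative URI to the base, as described above.
appended = base + relative

# Letting the library resolve the "../" step.
resolved = urljoin(base, relative)

print(appended)
print(resolved)
```

The trailing slash on the base matters: without it, urljoin treats 'HTML' as a file name and resolves relative to the 'Tutorials' directory instead.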

2.3 Port identification

A computer can host a variety of servers: Web servers, database servers, etc. Each server monitors one of the computer's ports for incoming requests.

A port is identified by its number, which can range from 1 to 65535.

The lower numbers are usually assigned to standard tasks. For example, port 80 is the default port for the HTTP protocol server.

The port that a server is to monitor is defined at the server installation time.

When a request is addressed to a server, the server port number is specified in the request URL, separated from the host identification by a colon.

Example:

http://www.hatayser.com:4080/index.html

In HTTP communications, the default port is 80. So when sending a request to the standard HTTP protocol processor, the port number need not be indicated.
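The port rule can be illustrated with a short Python sketch. The function name effective_port is hypothetical, and the second URL (the same host without a port) is an assumption made for the illustration; only the HTTP default of 80 stated above is handled.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80

def effective_port(url):
    """Return the port a request for this HTTP URL would be sent to."""
    port = urlsplit(url).port  # None when the URL carries no port number
    return port if port is not None else DEFAULT_HTTP_PORT

print(effective_port("http://www.hatayser.com:4080/index.html"))
print(effective_port("http://www.hatayser.com/index.html"))
```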

3. SYNTAX

3.1 Absolute URL

An absolute URL has the following form:

protocol://host_address:port/path

where:

protocol is the name of the protocol used to access the resource. Some of the protocols are:
- http: hypertext transfer protocol, used to access documents such as HTML pages
- ftp: file transfer protocol, used in file transfer
- mailto: a protocol for mail exchange
- file: a protocol used for local file access

host_address is the Internet address of the host; this can be:
- the real numeric 4-byte address, coded in the form of 4 decimal numbers, each representing the decimal value of a byte, such as 192.108.24.201
- a symbolic name such as www.xyz.com that some processor on the Internet will transform into the numeric address

port is the port number monitored by the protocol processor, to which a request for the resource should be sent; this is a number that ranges from 1 to 65535.

path is the directory path, ending with the file name if the resource is a file, or with a directory name and a slash if the resource is a directory.

The colons and slash signs are to be coded as shown in the above formula.
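The formula can be exercised with a small Python sketch that assembles an absolute URL from its parts. The function name build_absolute_url and the port number 8080 are assumptions made for the illustration; urlunsplit is the standard library call that puts the colons and slashes in place.

```python
from urllib.parse import urlunsplit

def build_absolute_url(protocol, host_address, path, port=None):
    """Assemble protocol://host_address:port/path; the port is optional."""
    netloc = host_address if port is None else f"{host_address}:{port}"
    # The last two fields (query and fragment) are not covered here.
    return urlunsplit((protocol, netloc, path, "", ""))

print(build_absolute_url("http", "www.xyz.com", "/docs/html/intro.html", port=8080))
print(build_absolute_url("http", "www.xyz.com", "/docs/html/intro.html"))
```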

When using the 'file' protocol, the host is the local host and the host address can be coded as 'localhost', for example:
file://localhost/C:/tutorial/docs/index.html
or it can be omitted altogether:
file:///C:/tutorial/docs/index.html

The port on which a protocol processor monitors incoming requests is defined as part of the processor setting up.

HTTP processors generally use port 80. Accordingly, if the port number is omitted from an HTTP URL, it defaults to 80.

3.2 Relative URL

Relative URLs reference a directory tree. Micro-computer users are familiar with such a tree.

Nevertheless, here is an example of a directory tree.

3.2.1 Directory tree

[Figure: A directory tree]

A tree is a set of nodes which have a parent-child relation among themselves. Each node has one parent, except one which has none; this special node is called the root node.

In the figure, the root node is represented by a circle, the other nodes by a rectangle. The parent-child relation is represented by a line joining two nodes. In this relation, the upper node is the parent, the lower the child.

A set of nodes linked without interruption by relation lines is a path. An example of a path is the set of the Applications-JAVA-applet-Nature.class nodes.

The tree is considered to be fanning out downward from the root node, as shown in the figure. So it makes sense to talk about going up or down a path.

Computer files and directories contained in one disk drive have a tree-structured relationship among themselves. In the tree, a file or a directory contained in another directory is a child of the latter, which is its parent. The root node represents the set of all the files and directories contained in the disk drive without any containing directory.

In the figure directories are colored in yellow, files in a blueish tint. The labels in the rectangles are the names of the directories or files.

3.2.2 Relative URI

A relative URI locates a resource (usually a file or a directory) by defining the path to it from a directory called the URI base.

The following illustrates the relative URI of file "pond.jpeg" with directory "Text" as the URI base.
[Figure: Relative URI]

The URI base directory is marked out with a rosy tint. The path is indicated by a red curved line.

The path first goes up from the URI base 'Text' directory, then down to the 'Images' directory, to the 'pond.jpeg' file.

The corresponding relative URI is:

../Images/pond.jpeg

The relative URI records the steps gone through as one moves along the path from the URI base directory to the identified resource. One step upward to the parent node is denoted by a ".." sign. A step downward to a node is denoted by the name of that node. Different steps are separated by a slash character "/". Note that to describe a move upward, there is no need to specify the destination node name, since from a given node there is only one way to move upward, that is to the parent node which is unique.

The relative URI of file "YinYang.class" (with the "Text" directory as the URI base) is:

../../Applications/JAVA/applet/YinYang.class
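The step-recording rule described above can be sketched in Python. The function name relative_uri is hypothetical, and so is the directory name 'Documents': the figure is not reproduced here, so the parent of 'Text' and 'Images' is given an assumed name. Both arguments are paths written from the root of the tree.

```python
def relative_uri(base_dir, target):
    """Record the steps from the URI base directory to the target resource."""
    base = [name for name in base_dir.split("/") if name]
    dest = [name for name in target.split("/") if name]

    # Skip the leading directories that both paths share.
    common = 0
    while common < len(base) and common < len(dest) and base[common] == dest[common]:
        common += 1

    # One ".." per step upward, then one node name per step downward.
    steps = [".."] * (len(base) - common) + dest[common:]
    return "/".join(steps)

print(relative_uri("Documents/Text", "Documents/Images/pond.jpeg"))
print(relative_uri("Documents/Text", "Applications/JAVA/applet/YinYang.class"))
```

Under the assumed tree layout, the two calls reproduce the two relative URIs given in this section.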