ITEC310 Computer Networks II: Objectives
Computer Networks II
Chapter 27
WWW and HTTP
Architecture
Web Documents
HTTP
Objectives
• After completing this chapter you should be able to
do the following:
– Discuss the ideas and issues behind the World Wide Web
(WWW).
– Describe HTTP, the client/server application protocol
that is commonly used to access the Web.
Introduction
• The World Wide Web (WWW) is a source of information
linked together from points all over the world.
Client (Browser)
Architecture
Server
Web Documents
Uniform Resource Locator
HTTP
Cookies
Architecture
• The WWW today is a distributed client/server service in which a
client, using a browser, accesses a service provided by a server.
– The service is distributed over many locations called sites.
• Each site holds one or more documents, referred to as Web
pages.
– Each Web page can contain a link to other pages in the same
site or at other sites.
– The pages can be retrieved and viewed by using browsers.
• A variety of vendors offer commercial browsers that interpret and
display a Web document, and all use nearly the same architecture.
• Each browser usually consists of three parts: a controller, client
protocols, and interpreters.
– The controller receives input from the keyboard or the mouse and
uses the client protocols to access the document.
• After the document has been accessed, the controller uses one of the
interpreters to display the document on the screen.
– The client protocol can be one of several protocols, such as FTP or
HTTP.
– The interpreter can be for HTML, Java, or JavaScript, depending on
the type of document.
• The Web page is stored at the server.
• Each time a client request arrives, the corresponding
document is sent to the client.
• To improve efficiency, servers normally store requested files
in a cache in memory.
– Memory is faster to access than disk.
• A server can also become more efficient through
multithreading or multiprocessing.
– In this case, a server can answer more than one request at a
time.
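The two efficiency ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DISK dictionary, the load_document function, and the document paths are hypothetical stand-ins for files on a real server's disk, and the thread pool stands in for a multithreaded server.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical documents standing in for files stored on the server's disk.
DISK = {
    "/index.html": "<HTML>home</HTML>",
    "/about.html": "<HTML>about</HTML>",
}
disk_reads = 0  # counts how often the slow "disk" is actually touched


@lru_cache(maxsize=128)
def load_document(path):
    """Return the document for path; the body runs only on a cache miss."""
    global disk_reads
    disk_reads += 1
    return DISK[path]


# Two requests for the same page: the second is answered from memory.
load_document("/index.html")
load_document("/index.html")
reads_after_warmup = disk_reads

# A pool of threads lets the server answer several requests at a time.
requests = ["/index.html", "/about.html", "/index.html"]
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(load_document, requests))
```

Only the first request for each page reaches the simulated disk; later requests are served from the in-memory cache.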
• A client that wants to access a Web page needs the address.
• To facilitate the access of documents distributed throughout
the world, HTTP uses locators.
• The uniform resource locator (URL) is a standard for
specifying any kind of information on the Internet.
• The URL defines four things: protocol, host computer, port,
and path.
• The protocol is the client/server program used to retrieve the
document.
– Many different protocols can retrieve a document; among them are
FTP and HTTP.
• The most common today is HTTP.
• The host is the computer on which the information is located,
although the name of the computer can be an alias.
– Web pages are usually stored on computers that are given alias
names, which usually begin with the characters "www".
• This is not mandatory, however, as the host can be any name given to the
computer that hosts the Web page.
• The URL can optionally contain the port number of the
server.
– If the port is included, it is inserted between the host and the
path, and it is separated from the host by a colon.
• Path is the pathname of the file where the information is
located.
– The path can itself contain slashes that, in the UNIX operating
system, separate the directories from the subdirectories and
files.
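The four parts of a URL can be pulled apart with Python's standard urllib.parse module. The URL below is a made-up example (www.example.com, port 8080, and the path are assumptions chosen for illustration), not one taken from the chapter.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only to illustrate the four parts.
url = "http://www.example.com:8080/docs/chapter27/index.html"
parts = urlsplit(url)

protocol = parts.scheme   # the client/server program used to retrieve the document
host = parts.hostname     # the computer (or alias) holding the information
port = parts.port         # optional; placed after the host and a colon
path = parts.path         # pathname of the file, with slashes between directories

# When the port is omitted, urlsplit reports it as None.
no_port = urlsplit("http://www.example.com/index.html").port
```

When the port is left out, the client falls back on the protocol's well-known port.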
• The Web is used for purposes such as those listed below:
1. Some websites need to allow access to registered clients
only.
2. Websites are being used as electronic stores that allow users
to browse through the store, select wanted items, put them in
an electronic cart, and pay at the end with a credit card.
3. Some websites are used as portals: the user selects the Web
pages they want to see.
4. Some websites are just advertising.
• For these purposes, the cookie mechanism was devised.
• The creation and storage of cookies depend on the implementation;
however, the principle is the same.
1. When a server receives a request from a client, it stores information
about the client in a file or a string.
2. The server includes the cookie in the response that it sends to the
client.
3. When the client receives the response, the browser stores the cookie
in the cookie directory, which is sorted by the domain server name.
• When a client sends a request to a server, the browser looks
in the cookie directory to see if it can find a cookie sent by
that server.
– If found, the cookie is included in the request.
• When the server receives the request, it knows that this is
an old client, not a new one.
– The browser does not interpret the contents of the cookie; it only
stores the cookie and returns it to the server that created it.
• It is a cookie made by the server and eaten by the server.
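The cookie round trip can be sketched with Python's standard http.cookies module. This is a simplified illustration, not how any particular browser or server is implemented: the cookie name client_id, the value abc123, the domain, and the dictionary used as a cookie directory are all assumptions.

```python
from http.cookies import SimpleCookie

cookie_directory = {}  # browser side: domain server name -> stored cookie


def server_make_cookie(client_id):
    """Server: build the header line that carries the cookie in a response."""
    cookie = SimpleCookie()
    cookie["client_id"] = client_id
    return cookie.output(header="Set-Cookie:")


def browser_store(domain, set_cookie_line):
    """Browser: file the cookie under the domain of the server that sent it."""
    cookie_directory[domain] = set_cookie_line.split(":", 1)[1].strip()


def browser_request_headers(domain):
    """Browser: include the cookie if one was stored for this server."""
    headers = {"Host": domain}
    if domain in cookie_directory:
        headers["Cookie"] = cookie_directory[domain]
    return headers


def server_is_old_client(headers):
    """Server: an old client is one that sends the cookie back."""
    return "client_id" in SimpleCookie(headers.get("Cookie", ""))


domain = "www.example.com"  # hypothetical server name
first_visit = server_is_old_client(browser_request_headers(domain))
browser_store(domain, server_make_cookie("abc123"))
second_visit = server_is_old_client(browser_request_headers(domain))
```

On the first visit no cookie is sent, so the server treats the client as new; on the second visit the stored cookie identifies it as an old client.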
• Now let us see how a cookie is used for the four previously
mentioned purposes:
1. The site that restricts access to registered clients only sends a cookie
to the client when the client registers for the first time.
• For any repeated access, only those clients that send the appropriate
cookie are allowed.
2. An electronic store (e-commerce) can use a cookie for its client
shoppers.
• When a client selects an item and inserts it into a cart, a cookie that
contains information about the item, such as its number and unit price, is
sent to the browser.
• If the client selects a second item, the cookie is updated with the new
selection information. And so on.
• When the client finishes shopping and wants to check out, the last cookie
is retrieved and the total charge is calculated.
ITEC310 Computer Networks II
Eastern Mediterranean University, Department of Information Technology
3. A Web portal uses the cookie in a similar way.
• When a user selects her favorite pages, a cookie is made and sent.
• If the site is accessed again, the cookie is sent to the server to show
what the client is looking for.
4. A cookie is also used by advertising agencies.
• When a user visits the main website and clicks on the icon of an
advertised corporation, a request is sent to the advertising agency.
• The advertising agency sends the banner, a GIF file, for example, but it
also includes a cookie with the ID of the user.
• Any future use of the banners adds to the database that profiles the
Web behavior of the user.
Architecture Static Documents
Web Documents Dynamic Documents
HTTP Active Documents
Web Documents
• The documents in the WWW can be grouped into three
broad categories: static, dynamic, and active.
– The category is based on the time at which the contents of
the document are determined.
• Static documents are fixed-content documents that are created
and stored in a server.
– The contents of the file are determined when the file is created, not
when it is used.
– When a client accesses the document, a copy of the document is
sent.
– The user can then use a browsing program to display the document.
• Hypertext Markup Language (HTML) is a language for
creating Web pages.
– To display part of a text in boldface with HTML, we
put beginning and ending boldface tags (marks) in the text, as
shown in the figure.
• The two tags <B> and </B> are instructions for the browser.
– When the browser sees these two marks, it knows that the text
must be boldfaced.
• HTML lets us use only ASCII characters for both the main
text and formatting instructions.
– In this way, every computer can receive the whole document
as an ASCII document.
• A Web page is made up of two parts: the head and the body.
– The head is the first part of a Web page.
• The head contains the title of the page and other parameters that
the browser will use.
– The actual contents of a page are in the body, which includes
the text and the tags.
• The text is the actual information contained in a page.
• The tags define the appearance of the document.
– Every HTML tag is a name followed by an optional list of attributes,
all enclosed between less-than and greater-than symbols (< and >).
• An attribute, if present, is followed by an equals sign and the
value of the attribute.
• Some tags can be used alone; others must be used in pairs.
• Those that are used in pairs are called beginning and ending
tags.
– The beginning tag can have attributes and values and starts
with the name of the tag.
– The ending tag cannot have attributes or values but must
have a slash before the name of the tag.
• The browser makes a decision about the structure of the
text based on the tags, which are embedded into the text.
• The figure shows the format of a tag.
– One commonly used tag category is the text formatting tags, such
as <B> and </B>, which make the text bold; <I> and </I>, which
make the text italic; and <U> and </U>, which underline the text.
• Another interesting tag category is the image tag.
• Non-textual information such as digitized photos or graphic images
is not a physical part of an HTML document.
– We can use an image tag to point to the file of a photo or image.
– The image tag defines the address (URL) of the image to be retrieved.
• It also specifies how the image can be inserted after retrieval.
• We can choose from several attributes.
– The most common are SRC (source), which defines the source
(address), and ALIGN, which defines the alignment of the image.
• Most browsers accept images in the GIF or JPEG formats.
– For example, the following tag can retrieve an image stored as
image1.gif in the directory /bin/images:
<IMG SRC="/bin/images/image1.gif" ALIGN=MIDDLE>
• Another category is the hyperlink tag, which is needed to
link documents together.
• Any item (word, phrase, paragraph, or image) can refer to
another document through a mechanism called an anchor.
– The anchor is defined by <A ... > and </A> tags, and the anchored
item uses the URL to refer to another document.
– The user can click on the anchored item to go to another document.
• The reference phrase is embedded between the beginning
and ending tags.
• The beginning tag can have several attributes, but the one
required is HREF (hyperlink reference), which defines the
address (URL) of the linked document.
– For example, the link to the EMU online learning management
system can be
<A HREF="http://lms.emu.edu.tr">EMU LMS</A>
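To see how a browser's HTML interpreter might separate tags, attributes, and text, the sketch below feeds the three tag examples from this section to Python's standard html.parser module. The TagLister class and the sample sentence around the tags are assumptions made for illustration; note that this parser reports tag and attribute names in lowercase.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record beginning tags (with attributes), ending tags, and text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Ending tags carry a name only, never attributes.
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


page = (
    'This is <B>bold</B> text. '
    '<IMG SRC="/bin/images/image1.gif" ALIGN=MIDDLE> '
    '<A HREF="http://lms.emu.edu.tr">EMU LMS</A>'
)
parser = TagLister()
parser.feed(page)
parser.close()

start_tags = [event for event in parser.events if event[0] == "start"]
```

The IMG tag is used alone, while B and A appear as beginning and ending pairs, with the reference phrase of the anchor between them.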
• A dynamic document is created by a Web server
whenever a browser requests the document.
• When a request arrives, the Web server runs an application
program or a script that creates the dynamic document.
• The server returns the output of the program or script as a
response to the browser that requested the document.
• Because a fresh document is created for each request, the
contents of a dynamic document can vary from one request
to another.
• A very simple example of a dynamic document is the
retrieval of the time and date from a server.
– Time and date are kinds of information that are dynamic in that
they change from moment to moment.
• The client can ask the server to run a program such as the
date program in UNIX and send the result of the program to
the client.
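A date-and-time document of this kind could be produced by a small program like the following sketch. The function name date_document and the HTML layout are assumptions; a real server would run such a program for each request and return its output as the response.

```python
from datetime import datetime


def date_document(now=None):
    """Build a fresh HTML document whose body is the current date and time."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "<HTML><HEAD><TITLE>Date</TITLE></HEAD>"
        f"<BODY>{stamp}</BODY></HTML>"
    )


# Two requests at different moments produce different documents.
morning = date_document(datetime(2024, 1, 2, 9, 0, 0))
evening = date_document(datetime(2024, 1, 2, 21, 30, 0))
```

Because the contents are determined when the request arrives, the same request can yield a different document each time.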
• Common Gateway Interface (CGI) is a technology that
creates and handles dynamic documents.
• CGI is a set of standards that defines how a dynamic
document is written, how data are input to the program, and
how the output result is used.
• CGI is not a new language; instead, it allows programmers to
use any of several languages, such as C, C++, shell script, or Perl.
• The only thing that CGI defines is a set of rules and terms
that the programmer must follow.
• Any programmer who can encode a sequence of thoughts in
a program and knows the syntax of one of the languages
mentioned above can write a simple CGI program.
• The figure illustrates the steps in creating a dynamic document
using CGI technology.
• The input from a browser to a server is sent by using a form.
• If the information in a form is small (such as a word), it can
be appended to the URL after a question mark.
– For example, the following URL is carrying form information
(23, a value):
http://www.deanza.edu/cgi-bin/prog.pl?23
• When the server receives the URL, it uses the part of the
URL before the question mark to access the program to be
run, and it interprets the part after the question mark (23) as
the input sent by the client.
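The split at the question mark can be sketched as follows. The helper names split_cgi_url and prog are hypothetical; the sketch assumes the server hands the input to the program through the QUERY_STRING environment variable, which is the usual CGI convention, and the response text is invented for illustration.

```python
from urllib.parse import urlsplit


def split_cgi_url(url):
    """Return (program path, input) for a URL carrying form information."""
    parts = urlsplit(url)
    return parts.path, parts.query


def prog(environ):
    """Hypothetical CGI program: reads its input from QUERY_STRING."""
    value = int(environ.get("QUERY_STRING", "0"))
    return f"<HTML><BODY>You sent {value}</BODY></HTML>"


program, program_input = split_cgi_url("http://www.deanza.edu/cgi-bin/prog.pl?23")

# The server would place the input in the program's environment and run it.
output = prog({"QUERY_STRING": program_input})
```

The part before the question mark selects the program to run, and the part after it becomes the program's input.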
• The problem with CGI technology is the inefficiency that results if
part of the dynamic document to be created is fixed and does not
change from request to request.
• The solution is to create a file containing the fixed part of the
document in HTML and to embed a script (source code) that the
server runs to provide the varying section.
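The idea of a fixed HTML file with an embedded varying section can be imitated with Python's standard string.Template. This is only an analogy for server-side scripting, not a real scripting technology: the page text, the $today placeholder, and the render function are assumptions.

```python
from datetime import date
from string import Template

# Fixed part of the document, written once; $today marks the varying section.
PAGE = Template(
    "<HTML><BODY><H1>Welcome</H1><P>Today is $today.</P></BODY></HTML>"
)


def render(today):
    """Fill in only the varying section; the fixed HTML is reused as is."""
    return PAGE.substitute(today=today.isoformat())


monday = render(date(2024, 5, 6))
tuesday = render(date(2024, 5, 7))
```

Only the marked section is computed for each request; the rest of the document is never regenerated.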
• Several technologies create dynamic documents using scripts.
– Among the most common are
• Hypertext Preprocessor (PHP), which uses its own scripting
language with syntax similar to Perl and C;
• Java Server Pages (JSP), which uses the Java language for
scripting;
• Active Server Pages (ASP), a Microsoft product which typically
uses VBScript, a Visual Basic dialect, for scripting;
• ColdFusion, which embeds SQL database queries in the HTML
document.
• For many applications, we need a program or a script to be
run at the client site.
• These are called active documents.
– For example,
• Suppose we want to run a program that creates animated
graphics on the screen or a program that interacts with the user.
• The program definitely needs to be run at the client site where the
animation or interaction takes place.
• When a browser requests an active document, the server sends a
copy of the document or a script.
• The document is then run at the client (browser) site.
• One way to create an active document is to use Java applets.
– Java is a combination of a high-level programming language, a
run-time environment, and a class library that allows a
programmer to write an active document (an applet) and a
browser to run it.
• A Java applet can be run by the browser in two ways.
– The browser can directly request the Java applet program in
the URL and receive the applet in binary form.
– The browser can retrieve and run an HTML file that has
embedded the address of the applet as a tag.
• The figure shows how Java applets are used in the first method;
the second is similar but needs two transactions.
• The idea of scripts in dynamic documents can also be used
for active documents.
– If the active part of the document is small, it can be written in a
scripting language; then it can be interpreted and run by the
client at the same time.
• The script is in source code (text) and not in binary form.
• The scripting technology used in this case is usually
JavaScript.
– JavaScript is a very high level scripting language developed for
this purpose.
• The figure shows how JavaScript is used to create an active
document.
HTTP
• Hypertext Transfer Protocol (HTTP) is a protocol used
mainly to access data on the World Wide Web.
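• HTTP functions as a combination of FTP and SMTP.
– It is similar to FTP because it transfers files and uses the
services of TCP; the HTTP server listens on well-known port 80.
– It is much simpler than FTP because it uses only one TCP
connection.
– It is similar to SMTP because the format of its messages
resembles that of SMTP messages.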
Architecture HTTP Transaction
Web Documents Persistent Versus Nonpersistent Connection
HTTP Proxy Server
• The figure illustrates an HTTP transaction between client and server.
– The client initiates the transaction by sending a request message.
– The server replies by sending a response.
• Formats of the request and response messages are similar.
• The most common status codes and phrases are listed in the table.
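A status line consists of the HTTP version, a status code, and a status phrase, and the first digit of the code gives its general class. The sketch below splits a sample status line and looks up standard phrases in Python's built-in http.HTTPStatus enumeration. The helper names parse_status_line and status_class are assumptions, and the sample lines are illustrative rather than copied from the chapter's table.

```python
from http import HTTPStatus

# General meaning of each leading digit of a status code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split a status line into (version, code, phrase)."""
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase


def status_class(code):
    """Map a status code to its general class using the first digit."""
    return CLASSES[code // 100]


version, code, phrase = parse_status_line("HTTP/1.1 404 Not Found")
standard_phrase = HTTPStatus(code).phrase
```

The phrase is for human readers; software normally acts on the numeric code alone.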
Example 27.1
This example retrieves a document. We use the GET method to retrieve an image
with the path /usr/bin/image1. The request line shows the method (GET), the URL,
and the HTTP version (1.1). The header has two lines that show that the client can
accept images in the GIF or JPEG format. The request does not have a body. The
response message contains the status line and four lines of header. The header lines
define the date, server, MIME version, and length of the document. The body of the
document follows the header.
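The request message described in this example can be assembled as plain text. The sketch below is a reconstruction from the description only: the build_request helper is hypothetical, and the exact header lines (Accept: image/gif and Accept: image/jpeg) are an assumption about how the two header lines would be written.

```python
def build_request(method, url, version="HTTP/1.1", headers=()):
    """Assemble a request message: request line, header lines, blank line."""
    lines = [f"{method} {url} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Lines end with CRLF; an empty line marks the end of the header.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "GET",
    "/usr/bin/image1",
    headers=[("Accept", "image/gif"), ("Accept", "image/jpeg")],
)
# The request has no body, so nothing follows the blank line.
# Note: a real HTTP/1.1 request would also need a Host header line.
request_line = request.split("\r\n")[0]
```

The request line carries the method, the URL, and the version, in that order.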
• HTTP prior to version 1.1 specified a nonpersistent
connection, while a persistent connection is the default in
version 1.1.
• In a nonpersistent connection, one TCP connection is
made for each request/response.
– The following lists the steps in this strategy:
1. The client opens a TCP connection and sends a request.
2. The server sends the response and closes the connection.
3. The client reads the data until it encounters an end-of-file
marker; it then closes the connection.
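The three steps can be tried on one machine with a pair of sockets over the loopback address. This is a toy sketch, not a real Web server: the fixed RESPONSE bytes, the serve_once and fetch helpers, and the use of a background thread are all assumptions made so the example is self-contained.

```python
import socket
import threading

RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello"


def serve_once(listener):
    """Server: accept one connection, send one response, then close it."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(1024)          # read the request (ignored in this sketch)
        conn.sendall(RESPONSE)   # step 2: send the response
    # leaving the with-block closes the connection


def fetch(port):
    """Client: open a connection, send a request, read until end of file."""
    with socket.create_connection(("127.0.0.1", port)) as sock:  # step 1
        sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
        chunks = []
        while True:
            data = sock.recv(1024)
            if not data:         # step 3: an empty read means the server closed
                break
            chunks.append(data)
    return b"".join(chunks)      # the client's side is closed on exit


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the system pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = fetch(port)
server.join()
listener.close()
```

Fetching a second document in this strategy would repeat all three steps on a new TCP connection.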
• In a persistent connection, the server leaves the
connection open for more requests after sending a response.
• The server can close the connection at the request of a client
or if a time-out has been reached.
• HTTP supports proxy servers.
– A proxy server is a computer that keeps copies of responses to
recent requests.
• HTTP client sends a request to the proxy server.
– The proxy server checks its cache.
• If the response is not stored in the cache, the proxy server sends the
request to the corresponding server.
– Incoming responses are sent to the proxy server and stored for future
requests from other clients.
• The proxy server reduces the load on the original server,
decreases traffic, and reduces latency.
• To use the proxy server, the client must be configured to access
the proxy instead of the target server.
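The cache check performed by a proxy server can be sketched with a dictionary. The ProxyServer class, the origin function, and the URL are hypothetical; a real proxy would also have to decide when a stored response is too old to reuse, which this sketch ignores.

```python
class ProxyServer:
    """Keep copies of responses and answer repeated requests from the cache."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # forwards to the target server
        self.cache = {}                             # URL -> stored response

    def handle(self, url):
        if url not in self.cache:
            # Not in the cache: forward the request and store the response.
            self.cache[url] = self.fetch_from_origin(url)
        return self.cache[url]


origin_requests = []  # records every request that reaches the original server


def origin(url):
    """Stand-in for the original server."""
    origin_requests.append(url)
    return f"response for {url}"


proxy = ProxyServer(origin)
first = proxy.handle("http://www.example.com/index.html")
second = proxy.handle("http://www.example.com/index.html")
```

The second request never reaches the original server, which is where the savings in load and traffic come from.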
Summary
• The World Wide Web (WWW) is a repository of information
linked together from points all over the world.
• Hypertexts are documents linked to one another through the
concept of pointers.
• Browsers interpret and display a Web document.
• A browser consists of a controller, client programs, and
interpreters.
• A Web document can be classified as static, dynamic, or
active.
• A static document is one in which the contents are fixed and
stored in a server. The client can make no changes in the
server document.
• Hypertext Markup Language (HTML) is a language used to
create static Web pages.
• Any browser can read formatting instructions (tags)
embedded in an HTML document.
• Tags provide structure to a document, define titles and
headers, format text, control the data flow, insert figures, link
different documents together, and define executable code.
• A dynamic Web document is created by a server only at a
browser request.
• The Common Gateway Interface (CGI) is a standard for
creating and handling dynamic Web documents.
• A CGI program with its embedded CGI interface tags can be
written in a language such as C, C++, Shell Script, or Perl.
• An active document is a copy of a program retrieved by the
client and run at the client site.
• Java is a combination of a high-level programming
language, a run-time environment, and a class library that
allows a programmer to write an active document and a
browser to run it.
• Java is used to create applets (small application programs).
• The Hypertext Transfer Protocol (HTTP) is the main protocol
used to access data on the World Wide Web (WWW).
• HTTP uses a TCP connection to transfer files.
• An HTTP message is similar in form to an SMTP message.
• The HTTP request line consists of a request type, a URL,
and the HTTP version number.
• The uniform resource locator (URL) consists of a protocol,
host computer, optional port number, and path name to
locate information on the WWW.
• The HTTP request type or method is the actual command or
request issued by the client to the server.
• The status line consists of the HTTP version number, a
status code, and a status phrase.
• The HTTP status code relays general information,
information related to a successful request, redirection
information, or error information.
• The HTTP header relays additional information between the
client and server.
• An HTTP header consists of a header name and a header
value.
• HTTP, version 1.1, specifies a persistent connection.
• A proxy server keeps copies of responses to recent
requests.