Computer Communications
This lecture introduces the ISO-OSI layered architecture of networks. According to the
ISO standard, a network is divided into 7 layers according to the complexity of
the functionality each layer provides. Each of these layers is described in detail
in the notes below. We first list the layers as defined by the standard,
in increasing order of functional complexity:
1. Physical Layer
2. Data Link Layer
3. Network Layer
4. Transport Layer
5. Session Layer
6. Presentation Layer
7. Application Layer
Physical Layer
This layer is the lowest layer in the OSI model. It handles the transmission of data
between two machines that communicate through a physical medium, which can be
optical fibre, copper wire, wireless, etc. The following are the main functions of the
physical layer:
2. Encoding and Signalling: How the bits are encoded on the medium is also
decided by this layer. For example, on a copper wire medium, we can use
different voltage levels for a certain time interval to represent '0' and '1'. We may
use +5mV for 1nsec to represent '1' and -5mV for 1nsec to represent '0'. All
issues of modulation are dealt with in this layer. E.g., we may use binary phase shift
keying to represent '1' and '0', rather than different voltage
levels, if we have to transmit over RF waves.
Binary Phase Shift Keying
3. Data Transmission and Reception: The transfer of each bit of data is the
responsibility of this layer. This layer delivers each bit with a
high probability. The transmission of the bits is not completely reliable, as there is
no error correction in this layer.
4. Topology and Network Design: Network design is an integral part of the
physical layer. In which part of the network the router is going to be placed, where
the switches will be used, where we will put the hubs, how many machines
each switch is going to handle, which server is going to be placed where, and many
such concerns are taken care of by the physical layer. The various kinds of
topologies that we may decide to use are ring, bus, star or a hybrid of these,
depending on our requirements.
Data Link Layer
This layer provides reliable transmission of a packet by using the services of the physical
layer which transmits bits over the medium in an unreliable fashion. This layer is
concerned with :
1. Framing : Breaking input data into frames (typically a few hundred bytes) and
caring about the frame boundaries and the size of each frame.
2. Acknowledgment : Sent by the receiving end to inform the source that the frame
was received without any error.
3. Sequence Numbering : To acknowledge which frame was received.
4. Error Detection : The frames may be damaged, lost or duplicated, leading to
errors. The error control is on a link to link basis.
5. Retransmission : The packet is retransmitted if the source fails to receive
an acknowledgment.
6. Flow Control : Necessary for a fast transmitter to keep pace with a slow receiver.
Data Link Layer
Network Layer
Its basic functions are routing and congestion control.
Routing: This deals with determining how packets will be routed (transferred) from
source to destination. It can be of three types :
• Static : Routes are based on static tables that are "wired into" the network and are
rarely changed.
• Dynamic : All packets of one application can follow different routes depending
upon the topology of the network, the shortest path and the current network load.
• Semi-Dynamic : A route is chosen at the start of each conversation and then all
the packets of the application follow the same route.
Congestion Control: A router can be connected to 4-5 networks. If all the networks send
packets at the same time at the maximum possible rate, the router may not be able to
handle all the packets and may drop some or all of them. In this context the dropping of
packets should be minimized and the source whose packet was dropped should be
informed. The control of such congestion is also a function of the network layer. Other
issues related to this layer are transmission time, delay and jitter.
Internetworking: Internetworks are multiple networks that are connected in such a way
that they act as one large network, connecting multiple office or department networks.
Internetworks are connected by networking hardware such as routers, switches, and
bridges. Internetworking is a solution born of three networking problems: isolated LANs,
duplication of resources, and the lack of a centralized network management system. With
connected LANs, companies no longer have to duplicate programs or resources on each
network. This in turn gives way to managing the network from one central location
instead of trying to manage each separate LAN. We should be able to transmit any packet
from one network to any other network even if they follow different protocols or use
different addressing modes.
Network Layer does not guarantee that the packet will reach its intended destination.
There are no reliability guarantees.
Transport Layer
Its functions are :
Fragmentation Reassembly
• Types of service : The transport layer also decides the type of service that should
be provided to the session layer. The service may be perfectly reliable, or may be
reliable within certain tolerances or may not be reliable at all. The message may
or may not be received in the order in which it was sent. The decision regarding
the type of service to be provided is taken at the time when the connection is
• Error Control : If reliable service is provided then error detection and error
recovery operations are also performed. It provides error control mechanism on
end to end basis.
• Flow Control : A slow host cannot keep pace with a fast one. Hence, this is a
mechanism to regulate the flow of information.
• Connection Establishment / Release : The transport layer also establishes and
releases the connection across the network. This requires some sort of naming
mechanism so that a process on one machine can indicate with whom it wants to
Session Layer
It deals with the concept of sessions, i.e. when a user logs in to a remote server he should
be authenticated before getting access to the files and application programs. Another job
of the session layer is to establish and maintain sessions. If the session breaks down
during the transfer of data between two machines, it is the session layer which
re-establishes the connection. It also ensures that the data transfer resumes from where it
broke off, keeping this transparent to the end user. E.g. in the case of a session with a database
server, this layer introduces checkpoints at various places so that if the connection
is broken and re-established, the transaction running on the database is not lost even if the
user has not committed. This activity is called Synchronization. Another function of this
layer is Dialogue Control, which determines whose turn it is to speak in a session. It is
useful in video conferencing.
Presentation Layer
This layer is concerned with the syntax and semantics of the information transmitted. In
order to make it possible for computers with different data representations to
communicate, the data structures to be exchanged can be defined in an abstract way, along with
a standard encoding. The layer also manages these abstract data structures and allows
higher-level data structures to be defined and exchanged. It encodes the data in a standard agreed
way (network format). Suppose there are two machines A and B, one of which follows 'Big Endian'
and the other 'Little Endian' for data representation. This layer ensures that the data
transmitted by one is converted into a form compatible with the other machine.
Other functions include compression, encryption, etc.
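The Big Endian / Little Endian conversion described above can be sketched in Python with the standard struct module. This is only an illustration: the helper names to_network and from_network and the sample value are assumptions made for this sketch, not part of any OSI specification.

```python
import struct

# Hypothetical 32-bit value that machine A wants to send to machine B.
value = 0x12345678

# The same integer laid out in the two host byte orders.
big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first


def to_network(number):
    """Encode an unsigned 32-bit integer in network (big-endian) byte order."""
    return struct.pack("!I", number)


def from_network(data):
    """Decode 4 bytes in network byte order back into an integer."""
    return struct.unpack("!I", data)[0]


if __name__ == "__main__":
    print(big.hex(), little.hex())
    print(hex(from_network(to_network(value))))
```

Because both sides agree on the network format, neither needs to know the native byte order of the other.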
Application Layer
The seventh layer contains the application protocols with which the user gains access to
the network. The choice of which specific protocols and their associated functions are to
be used at the application level is up to the individual user. Thus the boundary between
the presentation layer and the application layer represents a separation of the protocols
imposed by the network designers from those being selected and implemented by the
network users. For example, commonly used protocols are HTTP (for web browsing),
FTP (for file transfer), etc.
Physical Layer
The physical layer is concerned with transmitting raw bits over a communication channel.
The design issues have to do with making sure that when one side sends a 1 bit, it is
received by the other side as a 1 bit and not as a 0 bit. In the physical layer we deal with the
communication medium used for transmission.
Types of Medium
Medium can be classified into 2 categories.
1. Guided Media : Guided media means that the signal is guided by the presence of a
physical medium, i.e. the signal is under control and remains in the physical wire. E.g.
copper wire.
2. Unguided Media : Unguided media means that there is no physical path for the
signal to propagate along. Unguided media are essentially electromagnetic waves.
There is no control over the flow of the signal. E.g. radio waves.
Communication Links
In a network, nodes are connected through links. Communication through links can be
classified as
1. Simplex : Communication can take place only in one direction. E.g. T.V.
2. Half-duplex : Communication can take place in one direction at a time. Suppose
nodes A and B are connected; half-duplex communication means that at any time
data can flow from A to B or from B to A, but not simultaneously. E.g. two persons
talking to each other such that when one speaks the other listens, and vice versa.
3. Full-duplex : Communication can take place simultaneously in both directions.
E.g. a discussion in a group without discipline.
1. Point to Point : In this communication only two nodes are connected to each
other. When a node sends a packet, it can be received only by the node on the
other side and none else.
2. Multipoint : It is a kind of shared communication, in which the signal can be
received by all nodes. This is also called broadcast.
Bandwidth simply means how many bits can be transmitted per second in the
communication channel. In technical terms it indicates the width of frequency spectrum.
Transmission Media
Guided Transmission Media
In guided transmission media, generally two kinds of materials are used.
1. Copper
o Coaxial Cable
o Twisted Pair
2. Optical Fiber
2. Twisted Pair: A twisted pair consists of two insulated copper wires, typically
1mm thick. The wires are twisted together in a helical form; the purpose of
twisting is to reduce crosstalk interference between several pairs. Twisted pair is
much cheaper than coaxial cable, but it is susceptible to noise and electromagnetic
interference, and its attenuation is large.
3. Optical Fiber: In optical fiber, light is used to send data. In general terms the
presence of light is taken as bit 1 and its absence as bit 0. Optical fiber consists
of an inner core of either glass or plastic. The core is surrounded by a cladding of the same
material but of a different refractive index. This cladding is surrounded by a plastic
jacket which protects the optical fiber from electromagnetic interference and harsh
environments. The fiber uses the principle of total internal reflection to carry data.
Optical fiber offers much higher bandwidth than copper
wire, since there is hardly any attenuation or electromagnetic interference in
optical fiber. Hence there is less need to restore the quality of the signal in long
distance transmission. A disadvantage of optical fiber is that the end points are fairly
expensive (e.g. switches).
1. Depending on material
Made of glass
Made of plastic.
2. Depending on radius
Thin optical fiber
Thick optical fiber
3. Depending on light source
LED (for low bandwidth)
Injection laser diode (for high bandwidth)
Wireless Transmission
1. Radio: Radio is a general term that is used for any kind of frequency. But higher
frequencies are usually termed microwave and the lower frequency band comes
under radio frequency. There are many applications of radio, e.g. cordless
keyboards, wireless LAN and wireless Ethernet, but it is limited in range to only a few
hundred meters. Depending on the frequency, radio offers different bandwidths.
2. Terrestrial microwave: In terrestrial microwave two antennas are used for
communication. A focused beam emerges from one antenna and is received by the
other, provided that the antennas face each other with no
obstacle in between. For this reason, and because of the curvature of the earth,
antennas are situated on high towers. Terrestrial microwave can be used for long distance
communication with high bandwidth. Telecom departments also use it for
long distance communication. An advantage of wireless communication is that
no wires need to be laid in the city, hence no permissions are required.
3. Satellite communication: A satellite acts as a switch in the sky. On earth, VSATs (Very
Small Aperture Terminals) are used to transmit and receive data from the satellite.
Generally one station on earth transmits a signal to the satellite and it is received by
many stations on earth. Satellite communication is generally used in those places
where it is very difficult to obtain line of sight, i.e. in highly irregular terrain.
In terms of noise, wireless media are not as good as wired media. There
are frequency bands in wireless communication, and two stations should not be
allowed to transmit simultaneously in the same frequency band. The most promising
advantage of satellites is broadcasting. If satellites are used for point to point
communication, they are expensive compared to wired media.
Data Encoding
Digital data to analog signals
A modem (modulator-demodulator) converts digital data to analog signal. There are 3
ways to modulate a digital signal on an analog carrier signal.
3. Phase shift keying (PSK): The phase of the carrier is discretely varied in relation
either to a reference phase or to the phase of the immediately preceding signal
element, in accordance with data being transmitted. Phase of carrier signal is
shifted to represent '0' , '1'.
Digital data to digital signals
A digital signal is a sequence of discrete, discontinuous voltage pulses. Each pulse is a
signal element. The encoding scheme is an important factor in how successfully the receiver
interprets the incoming signal.
Encoding Techniques
Following are several ways to map data bits to signal elements.
• Non return to zero (NRZ): NRZ codes share the property that the voltage level is
constant during a bit interval. High level voltage = bit 1 and low level voltage =
bit 0. A problem arises when there is a long sequence of 0s or 1s and the voltage
level is maintained at the same value for a long time. This creates a problem at
the receiving end because clock synchronization is lost due to the lack of
any transitions, and hence it is difficult to determine the exact number of 0s or 1s
in the sequence.
NRZ-I has an advantage over NRZ-L. Consider the situation when two data wires
are wrongly connected in each other's place. In NRZ-L all bit sequences will get
inverted (because the voltage levels get swapped), whereas in NRZ-I, since bits are
recognized by transitions, the bits will be correctly interpreted. A disadvantage of
NRZ codes is that a string of 0's or 1's will prevent synchronization of the transmitter
clock with the receiver clock, and a separate clock line needs to be provided.
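The swapped-wire argument can be checked with a small Python sketch. It assumes signal levels of +1 and -1, and an NRZ-I convention in which a '1' is a transition at the start of the bit interval and a '0' is no transition; the function names are illustrative only.

```python
def nrz_l_encode(bits):
    """NRZ-L: the level itself carries the bit (1 -> high, 0 -> low)."""
    return [+1 if b else -1 for b in bits]


def nrz_l_decode(levels):
    return [1 if v > 0 else 0 for v in levels]


def nrz_i_encode(bits, start_level=-1):
    """NRZ-I (assumed convention): invert the level for a 1, hold it for a 0."""
    levels, level = [], start_level
    for b in bits:
        if b:
            level = -level
        levels.append(level)
    return levels


def nrz_i_decode(levels, start_level=-1):
    """A bit is 1 when the level differs from the previous level."""
    bits, prev = [], start_level
    for v in levels:
        bits.append(1 if v != prev else 0)
        prev = v
    return bits


def swap_wires(levels):
    """Swapping the two data wires inverts every level on the line."""
    return [-v for v in levels]


if __name__ == "__main__":
    data = [0, 1, 0, 0, 1, 1, 1, 0]
    # NRZ-L: every bit comes out inverted.
    print(nrz_l_decode(swap_wires(nrz_l_encode(data))))
    # NRZ-I: the idle level seen before the first bit is inverted too,
    # so the receiver's reference level is +1 instead of -1.
    print(nrz_i_decode(swap_wires(nrz_i_encode(data)), start_level=+1))
```

With swapped wires the NRZ-L receiver recovers the complement of the data, while the NRZ-I receiver recovers the data unchanged, because only transitions matter.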
Thus it is ensured that we can never have more than three consecutive 0s.
Now these 5-bit codes are transmitted using NRZI coding thus problem of
consecutive 1s is solved.
Of the remaining 16 codes, 7 are invalid and others are used to send some
control information like line idle(11111), line dead(00000), Halt(00100)
There are other variants for this scheme viz. 5B/6B, 8B/10B etc. These
have self suggesting names.
• Pulse code modulation (PCM): Here intervals are equally spaced. 8-bit PCM uses
256 different levels of amplitude. In non-linear encoding levels may be unequally
• Delta Modulation(DM): Since successive samples do not differ very much we
send the differences between previous and present sample. It requires fewer bits
than in PCM.
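As a rough illustration of delta modulation, the sketch below uses the common one-bit form: each sample is compared with a running approximation, and a single bit says whether the approximation should step up or down. The fixed step size, the starting value of 0 and the function names are assumptions of this sketch.

```python
def dm_encode(samples, step=1.0):
    """One bit per sample: 1 = step the approximation up, 0 = step it down."""
    bits, approx = [], 0.0
    for s in samples:
        if s > approx:
            bits.append(1)
            approx += step
        else:
            bits.append(0)
            approx -= step
    return bits


def dm_decode(bits, step=1.0):
    """Rebuild the staircase approximation from the bit stream."""
    out, approx = [], 0.0
    for b in bits:
        approx += step if b else -step
        out.append(approx)
    return out


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 2.0, 1.0]
    bits = dm_encode(samples)
    print(bits)
    print(dm_decode(bits))
```

Each sample costs one bit here, compared with 8 bits per sample in 8-bit PCM, at the price of tracking the signal only as fast as the step size allows.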
Transmission Techniques:
Start bit: It is prefixed to each byte and equals 0. Thus it ensures a transition
from 1 to 0 at the onset of transmission of a byte. The leading edge of the start bit is used as
a reference for generating clock pulses at the required sampling instants. Thus each
onset of a byte results in resynchronization of the receiver clock.
Bit Stuffing: Suppose our flag bits are 01111110 (six 1's). The transmitter will
always insert an extra 0 bit after each occurrence of five 1's (except for flags).
After detecting a starting flag the receiver monitors the bit stream. If a pattern of
five 1's appears, the sixth bit is examined: if it is 0 it is deleted, else if it is 1 and
the next is 0 the combination is accepted as a flag. Similarly, byte stuffing is used for
byte oriented transmission. Here we use an escape sequence to prefix a byte
similar to the flag, and 2 escape sequences if the byte is itself an escape sequence.
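The bit stuffing rule can be sketched in Python as follows. Bit streams are represented as strings of '0' and '1' for readability, and the function names are illustrative. The unstuff function assumes it is given a correctly stuffed payload with the flags already stripped.

```python
FLAG = "01111110"


def stuff(payload):
    """Insert a 0 after every run of five consecutive 1s."""
    out, run = [], 0
    for bit in payload:
        out.append(bit)
        if bit == "1":
            run += 1
            if run == 5:
                out.append("0")  # stuffed bit
                run = 0
        else:
            run = 0
    return "".join(out)


def unstuff(stuffed):
    """Delete the 0 that follows every run of five consecutive 1s."""
    out, run, i = [], 0, 0
    while i < len(stuffed):
        bit = stuffed[i]
        out.append(bit)
        if bit == "1":
            run += 1
            if run == 5:
                i += 1  # skip the stuffed 0 that must follow
                run = 0
        else:
            run = 0
        i += 1
    return "".join(out)


def frame(payload):
    """Delimit the stuffed payload with a flag on each side."""
    return FLAG + stuff(payload) + FLAG


if __name__ == "__main__":
    data = "0111111110"
    print(stuff(data))
    print(unstuff(stuff(data)) == data)
    print(frame(data))
```

After stuffing, the payload can never contain six 1s in a row, so the flag pattern cannot appear inside a frame by accident.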
When two communicating nodes are connected through a media, it generally happens
that bandwidth of media is several times greater than that of the communicating nodes.
Transfer of a single signal at a time is both slow and expensive. The whole capacity of the
link is not being utilized in this case. This link can be further exploited by sending several
signals combined into one. This combining of signals into one is called multiplexing.
2. Asynchronous TDM: In this method, slots are not fixed. They are allotted
dynamically depending on speed of sources, and whether they are ready
for transmission.
Network Topologies
A network topology is the basic design of a computer network. It is very much like a map
of a road. It details how key network components such as nodes and links are
interconnected. A network's topology is comparable to the blueprints of a new home in
which components such as the electrical system, heating and air conditioning system, and
plumbing are integrated into the overall design. Taken from the Greek word "Topos"
meaning "Place," Topology, in relation to networking, describes the configuration of the
network; including the location of the workstations and wiring connections. Basically it
provides a definition of the components of a Local Area Network (LAN). A topology,
which is a pattern of interconnections among nodes, influences a network's cost and
performance. There are three primary types of network topologies which refer to the
physical and logical layout of the Network cabling. They are:
1. Star Topology: All devices connected with a Star setup communicate through a
central Hub by cable segments. Signals are transmitted and received through the
Hub. It is the simplest and the oldest and all the telephone switches are based on
this. In a star topology, each network device has a home run of cabling back to a
network hub, giving each device a separate connection to the network. So, there
can be multiple connections in parallel.
o Broadcasting and multicasting is simple since you just need to send out
one message
o Less expensive since less cable footage is required
o It is guaranteed that each host will be able to transmit within a finite time
o Very orderly network where every device has access to the token and the
opportunity to transmit
o Performs better than a star network under heavy network load
Generally, a BUS architecture is preferred over the other topologies - of course, this is a
very subjective opinion and the final design depends on the requirements of the network
more than anything else. Lately, most networks are shifting towards the STAR topology.
Ideally we would like to design networks, which physically resemble the STAR topology,
but behave like BUS or RING topology.
The Aloha protocol was designed as part of a project at the University of Hawaii. It
provided data transmission between computers on several of the Hawaiian Islands using
radio transmissions.
• Communication was typically between remote stations and a central site named
Menehune, or vice versa.
• All messages to the Menehune were sent using the same frequency.
• When it received a message intact, the Menehune would broadcast an ack on a
distinct outgoing frequency.
• The outgoing frequency was also used for messages from the central site to
remote computers.
• All stations listened for messages on this second frequency.
Pure Aloha
Pure Aloha is an unslotted, fully-decentralized protocol. It is extremely simple and trivial
to implement. The ground rule is - "when you want to talk, just talk!". So, a node which
wants to transmit will go ahead and send the packet on its broadcast channel, with no
consideration whatsoever as to whether anybody else is transmitting or not.
One serious drawback here is that you don't know whether what you are sending has been
received properly or not (so to say, "whether you've been heard and understood?"). To
resolve this, in Pure Aloha, when a node finishes speaking, it expects an
acknowledgement within a finite amount of time - otherwise it simply retransmits the data.
This scheme works well in small networks where the load is not high. But in large, load
intensive networks where many nodes may want to transmit at the same time, this scheme
fails miserably. This led to the development of Slotted Aloha.
Slotted Aloha
This is quite similar to Pure Aloha, differing only in the way transmissions take place.
Instead of transmitting right at demand time, the sender waits for some time. This delay is
specified as follows - the timeline is divided into equal slots and then it is required that
transmission should take place only at slot boundaries. To be more precise, the slotted-
Aloha makes the following assumptions:
In this way, the number of collisions that can possibly take place is reduced by a huge
margin, and hence the performance becomes much better compared to Pure Aloha.
Collisions may still take place, but only with nodes that are ready to speak at the same
time. Nevertheless, this is a substantial reduction.
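The size of the improvement can be estimated with the standard textbook throughput model, which is not derived in these notes. It assumes fixed-length frames and a Poisson offered load of G frames per frame time, giving S = G e^(-2G) for Pure Aloha and S = G e^(-G) for Slotted Aloha. A minimal Python sketch of these two formulas:

```python
import math


def pure_aloha_throughput(G):
    """Expected successful frames per frame time for Pure Aloha (Poisson load G)."""
    return G * math.exp(-2 * G)


def slotted_aloha_throughput(G):
    """Expected successful frames per slot for Slotted Aloha (Poisson load G)."""
    return G * math.exp(-G)


if __name__ == "__main__":
    # The maxima occur at G = 0.5 (pure) and G = 1.0 (slotted).
    print(round(pure_aloha_throughput(0.5), 3))
    print(round(slotted_aloha_throughput(1.0), 3))
```

Under this model the best case for Pure Aloha is 1/(2e), about 18% of the channel, while Slotted Aloha reaches 1/e, about 37%, which is twice as much.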
1. Listen before speaking: If someone else is speaking, wait until they are done. In
the networking world, this is termed carrier sensing - a node listens to the channel
before transmitting. If a frame from another node is currently being transmitted
into the channel, a node then waits ("backs off") a random amount of time and
then again senses the channel. If the channel is sensed to be idle, the node then
begins frame transmission. Otherwise, the node waits another random amount of
time and repeats this process.
2. If someone else begins talking at the same time, stop talking. In the
networking world, this is termed collision detection - a transmitting node listens
to the channel while it is transmitting. If it detects that another node is
transmitting an interfering frame, it stops transmitting and uses some protocol to
determine when it should next attempt to transmit.
It is evident that the end-to-end channel propagation delay of a broadcast channel - the
time it takes for a signal to propagate from one end of the channel to the other - will play a
crucial role in determining its performance. The longer this propagation delay, the larger
the chance that a carrier-sensing node is not yet able to sense a transmission that has
already begun at another node in the network.
So, we need an improvement over CSMA - this led to the development of CSMA/CD.
CSMA/CD doesn't work in some wireless scenarios called "hidden node" problems.
Consider a situation, where there are 3 nodes - A, B and C communicating with each
other using a wireless protocol. Moreover, B can communicate with both A and C, but A
and C lie outside each other's range and hence can't communicate directly with each
other. Now, suppose both A and C want to communicate with B simultaneously. They
both will sense the carrier to be idle and hence will begin transmission, and even if there
is a collision, neither A nor C will ever detect it. B on the other hand will receive 2
packets at the same time and might not be able to understand either of them. To get
around this problem, a better version called CSMA/CA was developed, specially for
wireless applications.
One issue that needs to be addressed is how long the rest of the nodes should wait before
they can transmit data over the network. The answer is that the RTS and CTS would carry
some information about the size of the data that B intends to transfer. So, they can
calculate time that would be required for the transmission to be over and assume the
network to be free after that. Another interesting issue is what a node should do if it hears
RTS but not a corresponding CTS. One possibility is that it assumes the recipient node
has not responded and hence no transmission is going on, but there is a catch in this. It is
possible that the node hearing RTS is just on the boundary of the node sending CTS.
Hence, it does hear CTS but the signal is so deteriorated that it fails to recognize it as a
CTS. Hence to be on the safer side, a node will not start transmission if it hears either of
an RTS or a CTS.
The assumption made in this whole discussion is that if a node X can send packets to a
node Y, it can also receive a packet from Y, which is a fair enough assumption given the
fact that we are talking of a local network where standard instruments would be used. If
that is not the case additional complexities would get introduced in the system.
Let us try to parametrize the above problem. Suppose "t" is the time taken by node A
to transmit the packet onto the cable and "T" is the time the packet takes to travel from A
to B. Suppose transmission at A starts at time t0. In the worst case the collision takes
place just when the first packet is about to reach B, say at t0+T-e (e being very small).
The collision information then takes T-e time to propagate back to A. So, at t0+2(T-e)
A should still be transmitting. Hence, for the correct detection of collision (ignoring e)
t > 2T
t increases with the number of bits to be transferred and decreases with the rate of transfer
(bits per second). T increases with the distance between the nodes and decreases with the
speed of the signal (usually 2/3c). We need to either keep t large enough or T small enough.
We do not want to live with lower rate of bit transfer and hence slow networks. We can
not do anything about the speed of the signal. So what we can rely on is the minimum
size of the packet and the distance between the two nodes. Therefore, we fix some
minimum size of the packet and if the size is smaller than that, we put in some extra bits
to make it reach the minimum size. Accordingly we fix the maximum distance between
the nodes. Here too, there is a tradeoff to be made. We do not want the minimum size of
the packets to be too large since that wastes lots of resources on cable. At the same time
we do not want the distance between the nodes to be too small. Typical minimum packet
size is 64 bytes and the corresponding distance is 2-5 kilometers.
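The condition t > 2T can be turned into a small calculation. The Python sketch below assumes a 10 Mbps link and a signal speed of 2e8 m/s (roughly 2/3 c), and it ignores repeater and interface delays, so the numbers are only indicative; the function names are illustrative.

```python
def transmission_time(frame_bits, rate_bps):
    """t: time to put the whole frame on the cable, in seconds."""
    return frame_bits / rate_bps


def propagation_time(distance_m, signal_speed_mps=2e8):
    """T: time for the signal to travel between the two nodes, in seconds."""
    return distance_m / signal_speed_mps


def collision_detectable(frame_bits, rate_bps, distance_m, signal_speed_mps=2e8):
    """True when the sender is still transmitting after a round trip (t > 2T)."""
    t = transmission_time(frame_bits, rate_bps)
    T = propagation_time(distance_m, signal_speed_mps)
    return t > 2 * T


def max_distance(frame_bits, rate_bps, signal_speed_mps=2e8):
    """Largest distance in meters for which t = 2T still holds."""
    return transmission_time(frame_bits, rate_bps) * signal_speed_mps / 2


if __name__ == "__main__":
    frame_bits = 64 * 8  # 64-byte minimum packet
    print(max_distance(frame_bits, 10e6))
    print(collision_detectable(frame_bits, 10e6, 2500))
```

For a 64-byte packet at 10 Mbps this idealized bound is about 5 km, which is consistent with the 2-5 kilometer range quoted above once real-world delays are allowed for.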
Bit-Map Method
In this method, there are N slots. If node 0 has a frame to send, it transmits a 1 bit during the
first slot. No other node is allowed to transmit during this period. Next, node 1 gets a
chance to transmit a 1 bit if it has something to send, regardless of what node 0 had
transmitted. This is done for all the nodes. In general, node j may declare the fact that it
has a frame to send by inserting a 1 into slot j. Hence after all nodes have passed, each
node has complete knowledge of who wants to send a frame. Now they begin
transmitting in numerical order. Since everyone knows who is transmitting and when,
there could never be any collision.
The basic problem with this protocol is its inefficiency during low load. If a node has to
transmit and no other node needs to do so, even then it has to wait for the bitmap to
finish. Hence the bitmap will be repeated over and over again if very few nodes want to
send wasting valuable bandwidth.
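One round of the bit-map method, and its overhead at low load, can be sketched in Python as below. The function names are illustrative, and the efficiency figure assumes one-bit reservation slots and exactly one node sending per round.

```python
def bitmap_round(wants_to_send):
    """Run one reservation round.

    wants_to_send[j] is True when node j has a frame queued.
    Returns the N-slot bitmap and the order in which nodes then transmit.
    """
    # Reservation period: node j writes a 1 into slot j if it has a frame.
    bitmap = [1 if w else 0 for w in wants_to_send]
    # Data period: the nodes that reserved transmit in numerical order.
    order = [j for j, bit in enumerate(bitmap) if bit]
    return bitmap, order


def low_load_efficiency(frame_bits, n_nodes):
    """Fraction of channel time carrying data when one node sends per round."""
    return frame_bits / (frame_bits + n_nodes)


if __name__ == "__main__":
    print(bitmap_round([False, True, False, True]))
    print(low_load_efficiency(1000, 8))
```

The N reservation slots are paid on every round regardless of how many nodes actually send, which is the low-load inefficiency described above.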
Binary Countdown
In this protocol, a node which wants to signal that it has a frame to send does so by
writing its address into the header as a binary number. The arbitration is such that as soon
as a node sees that a higher bit position that is 0 in its address has been overwritten with a
1, it gives up. The final result is the address of the node which is allowed to send. After
the node has transmitted the whole process is repeated all over again. Given below is an
example situation.
Nodes Addresses
A 0010
B 0101
C 1010
D 1001
Node C, having the highest priority, gets to transmit. The problem with this protocol is that the
node with the highest address always wins. Hence this creates a priority which is highly
unfair and hence undesirable.
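The arbitration in the example can be traced with a short Python sketch. It models the channel as the logical OR of the bits sent by all contending nodes, processed from the most significant bit down, and assumes unique addresses and at least one contender; the function name is illustrative.

```python
def binary_countdown(addresses, width=4):
    """Return the name of the node that wins the arbitration.

    addresses maps a node name to its integer address.
    """
    contenders = dict(addresses)
    for bit in range(width - 1, -1, -1):  # most significant bit first
        # The channel carries the OR of the bits written by all contenders.
        channel = 0
        for addr in contenders.values():
            channel |= (addr >> bit) & 1
        if channel:
            # Nodes that wrote a 0 see it overwritten by a 1 and give up.
            contenders = {
                name: addr
                for name, addr in contenders.items()
                if (addr >> bit) & 1
            }
    (winner,) = contenders  # unique addresses leave exactly one node
    return winner


if __name__ == "__main__":
    nodes = {"A": 0b0010, "B": 0b0101, "C": 0b1010, "D": 0b1001}
    print(binary_countdown(nodes))
```

In the example, A and B drop out at the first bit, and D drops out at the third bit, leaving C as the winner.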
Obviously it would be better if one could combine the best properties of the contention
and contention-free protocols, that is, a protocol which uses contention at low loads to
provide low delay, but uses a contention-free technique at high load to provide good
channel efficiency. Such protocols do exist and are called limited contention protocols.
It is obvious that the probability of some station acquiring the channel can only be
increased by decreasing the amount of competition. The limited contention protocols do
exactly that. They first divide the stations up into (not necessarily disjoint) groups. Only
the members of group 0 are permitted to compete for slot 0. The competition for acquiring
the slot within a group is contention based. If one of the members of that group succeeds,
it acquires the channel and transmits a frame. If there is a collision, or no node of a particular
group wants to send, then the members of the next group compete for the next slot. The
probability of a particular node is set to a particular value (optimum).
Many improvements could be made to the algorithm. For example, consider the case of
nodes G and H being the only ones wanting to transmit. At slot 1 a collision will be
detected and so 2 will be tried and it will be found to be idle. Hence it is pointless to
probe 3 and one should directly go to 6,7.
10Base5 means it operates at 10 Mbps, uses baseband signaling and can support
segments of up to 500 meters. The 10Base5 cabling is popularly called the Thick
Ethernet. Vampire taps are used for their connections where a pin is carefully forced
halfway into the co-axial cable's core as shown in the figure below. The 10Base2 or Thin
Ethernet bends easily and is connected using standard BNC connectors to form T
junctions (shown in the figure below). In the 10Base-T scheme a different kind of wiring
pattern is followed in which all stations have a twisted-pair cable running to a central hub
(see below). The difference between the different physical connections is shown below:
(a) 10Base5 (b)10Base2 (c)10Base-T
All 802.3 baseband systems use Manchester encoding, which is a way for receivers to
unambiguously determine the start, end or middle of each bit without reference to an
external clock. There is a restriction on the minimum node spacing (segment length
between two nodes) in 10Base5 and 10Base2: 2.5 meters and 0.5 meters
respectively. The reason is that if two nodes are closer than the specified limit then there
will be a very high current, which may cause trouble in detecting the signal at the receiver
end. Connections from station to cable are generally made using vampire taps for 10Base5
(Thick Ethernet) and using industry standard BNC connectors, forming T junctions, for
10Base2 (Thin Ethernet). To allow larger networks, multiple
segments can be connected by repeaters as shown. A repeater is a physical layer device. It
receives, amplifies and retransmits signals in either direction.
Note: To connect multiple segments, an amplifier is not used because an amplifier also
amplifies the noise in the signal, whereas a repeater regenerates the signal after removing the
noise.
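The self-clocking property of Manchester encoding mentioned above can be illustrated with a minimal sketch. The notes do not say which polarity stands for which bit, so the mapping used here (0 as high-then-low, 1 as low-then-high) is an assumption, and the function names are hypothetical.

```python
def manchester_encode(bits):
    """Encode 0/1 bits as half-bit signal levels (1 = high, 0 = low).

    Assumed convention: bit 0 -> high then low, bit 1 -> low then high.
    """
    levels = []
    for bit in bits:
        levels.extend((0, 1) if bit else (1, 0))
    return levels


def manchester_decode(levels):
    """Recover bits from pairs of half-bit levels.

    Every valid bit has a transition in its middle, which is what lets the
    receiver stay synchronised without an external clock.
    """
    bits = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if first == second:
            raise ValueError("no mid-bit transition: not valid Manchester data")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits
```

Because every bit cell contains a transition, a pair of equal half-bit levels can never be ordinary data, which is why the decoder rejects it.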
• Preamble: Each frame starts with a preamble of 7 bytes, each byte containing the
bit pattern 10101010. Manchester encoding is employed here and this enables the
receiver's clock to synchronize with the sender's and initialise itself.
• Start of Frame Delimiter: This field, containing the bit sequence 10101011,
denotes the start of the frame itself.
• Dest. Address: The standard allows 2-byte and 6-byte addresses. Note that the
2-byte addresses are always local addresses while the 6-byte ones can be local or
global.
• Preamble: The Preamble and Start of Frame Delimiter are merged into one in the
Ethernet standard. However, the contents of the first 8 bytes remain the same in
both.
• Type: The length field of IEEE 802.3 is replaced by a Type field, which denotes the
type of packet being sent, viz. IP, ARP, RARP, etc. If the field holds a value of
1500 or less then it is the length field of 802.3; if it holds 1536 (0x0600) or more
it is the type field of an Ethernet packet.
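The length/type distinction can be sketched as a small classifier. The boundary values used here (1500 as the largest length, 0x0600 = 1536 as the smallest type) are the conventional ones and are stated as assumptions; the function name and the "undefined" label for the gap between them are hypothetical.

```python
MAX_LENGTH = 1500        # largest value treated as an 802.3 length (assumed boundary)
MIN_ETHERTYPE = 0x0600   # 1536: smallest value treated as an Ethernet type (assumed boundary)


def classify_type_length(value):
    """Say whether a 16-bit Type/Length field is a length or a type."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("the field is 16 bits wide")
    if value <= MAX_LENGTH:
        return "length"
    if value >= MIN_ETHERTYPE:
        return "type"
    return "undefined"
```

For instance, 0x0800 (IP) and 0x0806 (ARP) fall in the type range, while a value such as 46 can only be a length.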
Collision    Random number of slots
1st          0-1
2nd          0-3
3rd          0-7
...          ...
10th         0-1023
11th         0-1023
12th         0-1023
...          ...
16th         0-1023
In general, after i collisions a random number between 0 and 2^i - 1 is chosen, and that number
of slots is skipped. However, once 10 collisions have been reached the randomization
interval is frozen at a maximum of 1023 slots. After 16 collisions the controller reports
failure back to the computer.
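The backoff rule can be sketched as follows. This is an illustrative sketch rather than a controller implementation: the names are hypothetical, and it takes one reading of the text in which a range still exists for the 16th collision and failure is reported for anything beyond it.

```python
import random

FREEZE_AFTER = 10     # the interval stops doubling after 10 collisions
MAX_COLLISIONS = 16   # beyond this the controller reports failure (assumed reading)


def backoff_upper_bound(collisions):
    """Largest number of slots that may be skipped after `collisions` collisions."""
    if collisions < 1:
        raise ValueError("at least one collision is required")
    if collisions > MAX_COLLISIONS:
        raise RuntimeError("too many collisions: report failure to the host")
    return 2 ** min(collisions, FREEZE_AFTER) - 1


def backoff_slots(collisions, rng=random):
    """Pick a uniformly random slot count in [0, backoff_upper_bound(collisions)]."""
    return rng.randint(0, backoff_upper_bound(collisions))
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible when experimenting.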
5-4-3 Rule
Each version of 802.3 has a maximum cable length per segment because long
propagation time leads to difficulty in collision detection. To compensate for this the
transmission time has to be increased which can be achieved by slowing down the
transmission rate or increasing the packet size, neither of which is desirable. Hence to
allow for large networks, multiple cables are connected via repeaters. Between any two
nodes on an Ethernet network, there can be at most five segments, four repeaters and
three populated segments (non-populated segments are those which do not have any
machine connected between the two repeaters). This is known as the 5-4-3 Rule.
propagation delay + transmission of n bits (1-bit delay in each node) > transmission
time of the token
A station may hold the token for the token-holding time, which is 10 ms unless the
installation sets a different value. If there is enough time left after the first frame has been
transmitted to send more frames, then these frames may be sent as well. After all pending
frames have been transmitted, or when transmitting another frame would exceed the token-holding
time, the station regenerates the 3-byte token frame and puts it back on the ring.
Modes of Operation
1. Listen Mode: In this mode the node listens to the data and transmits the data to
the next node. In this mode there is a one-bit delay associated with the
transmission.
2. Transmit Mode: In this mode the node discards any incoming data and puts its own
data onto the network.
3. By-pass Mode: This mode is reached when the node is down. Any data is just
bypassed. There is no one-bit delay in this mode.
1. The source itself removes the packet after one full round in the ring.
2. The destination removes it after accepting it: This has two potential problems.
Firstly, the solution won't work for broadcast or multicast, and secondly, there
would be no way to acknowledge the sender about the receipt of the packet.
3. Have a specialized node only to discard packets: This is a bad solution as the
specialized node would know that the packet has been received by the destination
only when it receives the packet the second time, and by that time the packet may
have actually made about one and a half (or almost two in the worst case) rounds in
the ring.
Thus the first solution is adopted, with the source itself removing the packet from the ring
after one full round. With this scheme, broadcasting and multicasting can be handled, and
the destination can acknowledge the source about the receipt of the packet (or can
tell the source about some error).
Token Format
J = Code Violation
K = Code Violation
T = 0 for Token
T = 1 for Frame
When a station with a Frame to transmit detects a token which has a priority equal to or
less than that of the Frame to be transmitted, it may change the token to a start-of-frame
sequence and transmit the Frame.
P = Priority
Priority bits indicate the token's priority, and therefore which stations are allowed to use it.
A station can transmit if its priority is at least as high as that of the token.
M = Monitor
The monitor bit is used to prevent a token whose priority is greater than 0, or any frame,
from continuously circulating on the ring. If an active monitor detects a frame or a high
priority token with the monitor bit equal to 1, the frame or token is aborted. This bit shall
be transmitted as 0 in all frames and tokens. The active monitor inspects and modifies this
bit. All other stations shall repeat this bit as received.
R = Reservation bits
The reservation bits allow stations with high priority Frames to request that the next token be
issued at the requested priority.
J = Code Violation
K = Code Violation
I = Intermediate Frame Bit
E = Error Detected Bit
Frame Format:
Data Format:
No upper limit on the amount of data as such, but it is limited by the token holding time.
Checksum:
The source computes and sets this value. The destination also calculates this value. If the two
are different, it indicates an error; otherwise the data may be correct.
Frame Status:
This arrangement provides an automatic acknowledgement for each frame. The A and C
bits are present twice in the Frame Status to increase reliability, inasmuch as they are not
covered by the checksum.
J = Code Violation
K = Code Violation
I = Intermediate Frame Bit
If this bit is set to 1, it indicates that this packet is an intermediate part of a bigger packet;
the last packet would have this bit set to 0.
E = Error Detected Bit
This bit is set if any interface detects an error.
Each node in a ring introduces a 1-bit delay. So, one approach might be to set the
minimum number of nodes in a ring to 24. But this is not a viable option.
The actual solution is as follows. We have one node in the ring designated as the
"monitor". The monitor maintains a 24-bit buffer with the help of which it introduces a
24-bit delay. The catch here is: what if the clocks of the nodes following the source are faster
than the source? In this case the 24-bit delay of the monitor would be less than the 24-bit
delay desired by the host. To avoid this situation the monitor maintains 3 extra bits to
compensate for the faster bits. The 3 extra bits suffice even if bits are 10% faster. This
compensation is called Phase Jitter Compensation.
Initially the reservation bits are set to 000. When a node wants to transmit a priority n
frame, it must wait until it can capture a token whose priority is less than or equal to n.
Furthermore, when a data frame goes by, a station can try to reserve the next token by
writing the priority of the frame it wants to send into the frame's Reservation bits.
However, if a higher priority has already been reserved there, the station cannot make a
reservation. When the current frame is finished, the next token is generated at the priority
that has been reserved.
A slight problem with the above reservation procedure is that the reservation priority
keeps on increasing. To solve this problem, the station raising the priority remembers the
reservation priority that it replaces and when it is done it reduces the priority to the
previous priority.
Ring Maintenance
Each token ring has a monitor that oversees the ring. Among the monitor's
responsibilities are seeing that the token is not lost, taking action when the ring breaks,
cleaning the ring when garbled frames appear and watching out for orphan frames. An
orphan frame occurs when a station transmits a short frame in its entirety onto a long
ring and then crashes or is powered down before the frame can be removed. If nothing is
done, the frame circulates indefinitely.
• Detection of orphan frames: The monitor detects orphan frames by setting the
monitor bit in the Access Control byte whenever a frame passes through. If an incoming
frame has this bit set, something is wrong since the same frame has passed the
monitor twice. Evidently it was not removed by the source, so the monitor drains
it.
• Lost Tokens: The monitor has a timer that is set to the longest possible tokenless
interval: the case in which each node transmits for the full token holding time. If this timer
goes off, the monitor drains the ring and issues a fresh token.
• Garbled frames: The monitor can detect such frames by their invalid format or
checksum, drain the ring and issue a fresh token.
Control field   Name                      Meaning
                Duplicate address test    Test if two stations have the same address
00000010        Beacon                    Used to locate breaks in the ring
00000011        Claim token               Attempt to become monitor
00000100        Purge                     Reinitialize the ring
00000101        Active monitor present    Issued periodically by the monitor
                Standby monitor present   Announces the presence of potential monitors
The monitor periodically issues an "Active Monitor Present" message informing all nodes
of its presence. When this message is not received for a specific time interval, the nodes
detect a monitor failure. Each node that believes it can function as a monitor broadcasts a
"Standby Monitor Present" message at regular intervals, indicating that it is ready to take
on the monitor's job. Any node that detects failure of the monitor issues a "Claim" token.
There are 3 possible outcomes:
1. If the issuing node gets back its own claim token, then it becomes the monitor.
2. If a packet different from a claim token is received, apparently a wrong guess of
monitor failure was made. In this case the node discards its own claim token if it
comes back. Note that the claim token may already have been removed by some other
node which has detected this error.
3. If some other node has also issued a claim token, then the node with the larger
address becomes the monitor.
In order to resolve errors of duplicate addresses, whenever a node comes up it sends a
"Duplicate Address Detection" message (with the destination = source) across the
network. If the address recognized bit has been set on receipt of the message, the issuing
node realizes that there is a duplicate address and goes to standby mode. A node informs
other nodes of the removal of a packet from the ring through a "Purge" message. One maintenance
function that the monitor cannot handle is locating breaks in the ring. If there is no
activity detected in the ring (e.g. failure of the monitor to issue the Active Monitor Present
token), the usual procedure of sending a claim token is followed. If neither the claim token
itself nor packets of any other kind are received, the node then sends "Beacons" at
regular intervals until a message is received indicating that the broken ring has been
repaired.
Slotted Ring:
In this system, the ring is slotted into a number of fixed size frames which are
continuously moving around the ring. This makes it necessary that there be a large enough
number of nodes (large ring size) to ensure that all the bits can stay on the ring at the
same time. The frame header contains information as to whether the slots are empty or
full. The usual disadvantages of overhead/wastage associated with fixed size frames are
also present.
Two major disadvantages of this topology are complicated hardware and difficulty in the
detection of start/end of packets.
Contention Ring
Frame Structure
Ring Maintenance:
When the first node on the token bus comes up, it sends a Claim_token packet to
initialize the ring. If more than one station sends this packet at the same time, there is a
collision. The collision is resolved by a contention mechanism, in which the contending
nodes send random data for 1, 2, 3 or 4 units of time depending on the first two bits of
their address. The node sending data for the longest time wins. If two nodes have the
same first two bits in their addresses, then contention is done again based on the next two
bits of their address and so on.
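The contention mechanism can be sketched as a loop over successive pairs of address bits. The notes do not state which bit pattern maps to which duration, so the mapping used here (00 gives 1 unit, up to 11 giving 4 units) is an assumption, as are the function name and the 8-bit default address width.

```python
def resolve_contention(addresses, address_bits=8):
    """Return the address that wins the contention among `addresses`.

    Each round looks at the next two address bits (most significant first);
    a node transmits for (two-bit value + 1) time units and the longest
    transmitter stays in. Tied nodes go on to the next pair of bits.
    """
    if address_bits % 2:
        raise ValueError("address_bits must be even")
    contenders = set(addresses)
    if not contenders:
        raise ValueError("at least one contender is required")
    for shift in range(address_bits - 2, -1, -2):
        durations = {a: ((a >> shift) & 0b11) + 1 for a in contenders}
        longest = max(durations.values())
        contenders = {a for a, d in durations.items() if d == longest}
        if len(contenders) == 1:
            break
    return contenders.pop()
```

Under this assumed mapping the winner is always the numerically largest address, since the comparison proceeds from the most significant bits downwards.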
After the ring is set up, new nodes which are powered up may wish to join the ring. For
this a node sends Solicit_successor_1 packets from time to time, inviting bids from new
nodes to join the ring. This packet contains the address of the current node and its current
successor, and asks for nodes in between these two addresses to reply. If more than one
node responds, there will be a collision. The node then sends a Resolve_contention packet,
and the contention is resolved using a mechanism similar to the one described previously. Thus
only one node at a time gets to enter the ring. The last node in the ring will send a
Solicit_successor_2 packet containing its own address and that of its successor. This packet
asks nodes whose addresses do not lie between these two addresses to respond.
How frequently should a node send a Solicit_successor packet? If it
is sent too frequently, then the overhead will be too high. If it is sent too rarely, nodes
will have to wait for a long time before joining the ring. If the channel is not busy, a node
will send a Solicit_successor packet after a fixed number of token rotations. This number
can be configured by the network administrator. However, if there is heavy traffic in the
network, then a node will defer the sending of bids for successors to join in.
There may be problems in the logical ring due to sudden failure of a node. What happens
when a node goes down along with the token? After passing the token, a node, say node
A, listens to the channel to see if its successor either transmits the token or passes a
frame. If neither happens, it resends a token. Still if nothing happens, A sends a
Who_follows packet, containing the address of the down node. The successor of the
down node, say node C, will now respond with a Set_successor packet, containing its
own address. This causes A to set its successor node to C, and the logical ring is restored.
However, if two successive nodes go down suddenly, the ring will be dead and will have
to be built afresh, starting from a Claim_token packet.
When a node wants to shutdown normally, it sends a Set_successor packet to its
predecessor, naming its own successor. The ring then continues unbroken, and the node
goes out of the ring.
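The effect of Who_follows / Set_successor and of a normal shutdown on the logical ring can be sketched with a successor table. This is a simplified model under stated assumptions: the real exchange happens through packets on the bus, whereas here the whole ring is one dictionary, and the function names are hypothetical.

```python
def repair_ring(successor, sender, failed):
    """Splice a failed node out of the ring after a Who_follows exchange.

    `successor` maps each node to the node it passes the token to. The node
    that followed `failed` answers with Set_successor, so `sender` adopts it.
    """
    if successor[sender] != failed:
        raise ValueError("failed node is not the sender's successor")
    new_successor = successor[failed]
    successor[sender] = new_successor
    del successor[failed]
    return new_successor


def leave_ring(successor, leaving):
    """Normal shutdown: the predecessor is told to use the leaver's successor."""
    predecessor = next(n for n, s in successor.items() if s == leaving)
    successor[predecessor] = successor[leaving]
    del successor[leaving]
```

For example, with the ring A -> B -> C -> A, a failure of B makes A adopt C as its successor, giving A -> C -> A.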
The various control frames used for ring maintenance are shown below:
Priority Scheme:
0 is the lowest priority level and 6 the highest. The following times are defined by the
token bus:
• THT: Token Holding Time. A node holding the token can send priority 6 data for
a maximum of this amount of time.
• TRT_4: Token Rotation Time for class 4 data. This is the maximum time a token
can take to circulate and still allow transmission of class 4 data.
• TRT_2 and TRT_0: Similar to TRT_4.
When a node receives the token, it proceeds as follows:
• It transmits priority 6 data for at most THT time, or as long as it has data.
• Now if the time for the token to come back to it is less than TRT_4, it will
transmit priority 4 data, and for the amount of time allowed by TRT_4. Therefore
the maximum time for which it can send priority 4 data is= Actual TRT - THT -
• Similarly for priority 2 and priority 0 data.
This mechanism ensures that priority 6 data is always sent, making the system suitable
for real time data transmission. In fact this was one of the primary aims in the design of
token bus.
What is Framing?
Since the physical layer merely accepts and transmits a stream of bits without any regard
to meaning or structure, it is up to the data link layer to create and recognize frame
boundaries. This can be accomplished by attaching special bit patterns to the beginning
and end of the frame. If these bit patterns can accidentally occur in data, special care must
be taken to make sure these patterns are not incorrectly interpreted as frame delimiters.
The four framing methods that are widely used are
• Character count
• Starting and ending characters, with character stuffing
• Starting and ending flags, with bit stuffing
• Physical layer coding violations
Character Count
This method uses a field in the header to specify the number of characters in the frame.
When the data link layer at the destination sees the character count, it knows how many
characters follow, and hence where the end of the frame is. The disadvantage is that if the
count is garbled by a transmission error, the destination will lose synchronization and will
be unable to locate the start of the next frame. So, this method is rarely used.
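A minimal sketch of character-count framing, and of how a single garbled count desynchronizes the receiver, is given here. It assumes a one-byte count that includes the count byte itself, which the text does not specify; the function names are hypothetical.

```python
def frame_with_count(payloads):
    """Prefix each payload with a one-byte count (count includes itself)."""
    stream = bytearray()
    for payload in payloads:
        if len(payload) > 254:
            raise ValueError("payload too long for a one-byte count")
        stream.append(len(payload) + 1)
        stream.extend(payload)
    return bytes(stream)


def split_by_count(stream):
    """Cut a byte stream into payloads by trusting each count byte."""
    frames = []
    position = 0
    while position < len(stream):
        count = stream[position]
        if count == 0:
            raise ValueError("a count of zero can never advance")
        frames.append(stream[position + 1:position + count])
        position += count
    return frames
```

If the first count byte of a stream carrying `b"abc"` and `b"de"` is corrupted from 4 to 3, the receiver reads `b"ab"` and then treats the data byte `c` as the next count, so every later frame boundary is wrong.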
Character stuffing
In the second method, each frame starts with the ASCII character sequence DLE STX
and ends with the sequence DLE ETX (where DLE is Data Link Escape, STX is Start of
TeXt and ETX is End of TeXt). This method overcomes the drawbacks of the character
count method. If the destination ever loses synchronization, it only has to look for DLE
STX and DLE ETX characters. If however, binary data is being transmitted then there
exists a possibility of the characters DLE STX and DLE ETX occurring in the data. Since
this can interfere with the framing, a technique called character stuffing is used. The
sender's data link layer inserts an ASCII DLE character just before the DLE character in
the data. The receiver's data link layer removes this DLE before this data is given to the
network layer. However character stuffing is closely associated with 8-bit characters and
this is a major hurdle in transmitting arbitrary sized characters.
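Character stuffing can be sketched in a few lines. The byte values are the standard ASCII codes for DLE, STX and ETX; the function names are hypothetical, and the receiver below only handles well-formed frames (it does not try to diagnose a lone DLE inside the body).

```python
DLE, STX, ETX = b"\x10", b"\x02", b"\x03"


def stuff(payload):
    """Frame a payload as DLE STX ... DLE ETX, doubling every DLE in the data."""
    return DLE + STX + payload.replace(DLE, DLE + DLE) + DLE + ETX


def unstuff(frame):
    """Strip the delimiters and collapse each doubled DLE back to one."""
    if not (frame.startswith(DLE + STX) and frame.endswith(DLE + ETX)):
        raise ValueError("missing DLE STX / DLE ETX delimiters")
    return frame[2:-2].replace(DLE + DLE, DLE)
```

A payload that happens to contain DLE ETX, such as `b"A\x10\x03B"`, is sent with the DLE doubled, so the receiver does not mistake it for the end of the frame.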
Bit stuffing
The third method allows data frames to contain an arbitrary number of bits and allows
character codes with an arbitrary number of bits per character. At the start and end of
each frame is a flag byte consisting of the special bit pattern 01111110. Whenever the
sender's data link layer encounters five consecutive 1s in the data, it automatically stuffs a
zero bit into the outgoing bit stream. This technique is called bit stuffing. When the
receiver sees five consecutive 1s in the incoming data stream, followed by a zero bit, it
automatically destuffs the 0 bit. The boundary between two frames can be determined by
locating the flag pattern.
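The stuffing and destuffing rules can be sketched on strings of '0' and '1' characters, which keeps the bit positions easy to read. The function names are hypothetical and the receiver assumes a well-formed frame.

```python
FLAG = "01111110"


def bit_stuff(bits):
    """Insert a 0 after every run of five 1s, then add the flags."""
    out = []
    run = 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == "1" else 0
        if run == 5:
            out.append("0")   # stuffed bit
            run = 0
    return FLAG + "".join(out) + FLAG


def bit_unstuff(frame):
    """Remove the flags and drop the bit that follows every run of five 1s."""
    if len(frame) < 16 or not (frame.startswith(FLAG) and frame.endswith(FLAG)):
        raise ValueError("missing flag bytes")
    out = []
    run = 0
    skip_next = False
    for bit in frame[8:-8]:
        if skip_next:         # this is the stuffed 0
            skip_next = False
            run = 0
            continue
        out.append(bit)
        run = run + 1 if bit == "1" else 0
        if run == 5:
            skip_next = True
    return "".join(out)
```

After stuffing, the body can never contain six consecutive 1s, so the flag pattern only ever appears at the frame boundaries.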
Error Control
The bit stream transmitted by the physical layer is not guaranteed to be error free. The
data link layer is responsible for error detection and correction. The most common error
control method is to compute and append some form of a checksum to each outgoing
frame at the sender's data link layer and to recompute the checksum and verify it with the
received checksum at the receiver's side. If both of them match, then the frame is
correctly received; else it is erroneous. The checksums may be of two types:
• Error detecting: The receiver can only detect the error in the frame and inform the sender
about it.
• Error detecting and correcting: The receiver can not only detect the error but
also correct it.
Examples of Error Detecting methods:
• Parity bit:
A simple example of an error detection technique is the parity bit. The parity bit is chosen
so that the number of 1 bits in the code word is either even (for even parity) or odd
(for odd parity). For example, when 10110101 is transmitted, then for even parity
a 1 will be appended to the data and for odd parity a 0 will be appended. This
scheme reliably detects only single-bit errors (more generally, an odd number of flipped
bits). If two bits, or any even number of bits, are changed then the error cannot
be detected.
• Longitudinal Redundancy Checksum:
Longitudinal Redundancy Checksum (LRC) is an error detecting scheme which
overcomes the problem of two erroneous bits. The concept of the parity bit is used here,
but with slightly more intelligence. With each byte we send one parity bit, and then we
send one additional byte which holds the parity corresponding to each bit
position of the sent bytes. So the parity is set in both the horizontal and the vertical
direction. If one bit gets flipped we can tell which row and which column have an error; we
then find the intersection of the two and determine the erroneous bit. If 2 bits are in
error and they are in different columns and rows then they can be detected. If the
errors are in the same column then the row parities will reveal them, and vice versa. Parity
can detect only an odd number of errors in a row or column. If the errors are even in number
and distributed so that every affected row and column contains an even number of them,
then LRC may not be able to find the error.
• Cyclic Redundancy Checksum (CRC):
We have an n-bit message. The sender adds a k-bit Frame Check Sequence (FCS)
to this message before sending. The resulting (n+k)-bit message is divisible by
some (k+1)-bit number. The receiver divides the (n+k)-bit message by the same
(k+1)-bit number and, if there is no remainder, assumes that there was no error.
How do we choose this number?
For example, if k=12 then 1000000000000 (a 13-bit number) can be chosen, but
this is a poor choice, because it will result in a zero remainder for all
(n+k)-bit messages with the last 12 bits zero. Thus, any bits flipping beyond the
last 12 go undetected. Suppose instead that k=12 and we take 1110001000110 as the
13-bit number (incidentally, in decimal representation this turns out to be 7238). This will be
unable to detect errors only if the corrupt message and the original message have a
difference that is a multiple of 7238. The probability of this is low, much lower than
the probability that anything beyond the last 12 bits flips. In practice, this number
is chosen after analyzing common network transmission errors and then selecting
a number which is likely to detect these common errors.
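The sender and receiver sides of the CRC scheme can be sketched as follows. One assumption should be stated loudly: CRCs as normally implemented use modulo-2 (XOR) division rather than ordinary integer division, so this sketch uses XOR where the description above speaks of divisibility. The function names are hypothetical, and messages are held as Python integers, so leading zero bits are not represented.

```python
def mod2_remainder(value, generator):
    """Remainder of `value` divided by `generator` in modulo-2 arithmetic."""
    gen_len = generator.bit_length()
    while value.bit_length() >= gen_len:
        # Align the generator under the leading 1 of value and subtract (XOR).
        value ^= generator << (value.bit_length() - gen_len)
    return value


def append_fcs(message, generator):
    """Shift the message left by k bits and fill those bits with the FCS."""
    k = generator.bit_length() - 1
    shifted = message << k
    return shifted | mod2_remainder(shifted, generator)


def is_error_free(codeword, generator):
    """The receiver accepts the frame only if the remainder is zero."""
    return mod2_remainder(codeword, generator) == 0
```

With the 13-bit number from the text, `append_fcs(message, 0b1110001000110)` yields a codeword for which `is_error_free` holds, while flipping a single bit of that codeword makes the check fail.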
Flow Control
Consider a situation in which the sender transmits frames faster than the receiver can
accept them. If the sender keeps pumping out frames at high rate, at some point the
receiver will be completely swamped and will start losing some frames. This problem
may be solved by introducing flow control. Most flow control protocols contain a
feedback mechanism to inform the sender when it should transmit the next frame.
• Stop and Wait Protocol: This is the simplest flow control protocol, in which the
sender transmits a frame and then waits for an acknowledgement, either positive
or negative, from the receiver before proceeding. If a positive acknowledgement
is received, the sender transmits the next frame; else it retransmits the same
frame. However, this protocol has one major flaw. If a frame or an
acknowledgement is completely destroyed in transit due to a noise burst, a
deadlock will occur because the sender cannot proceed until it receives an
acknowledgement. This problem may be solved using timers on the sender's side.
When the frame is transmitted, the timer is set. If there is no response from the
receiver within a certain time interval, the timer goes off and the frame may be
retransmitted.
• Sliding Window Protocols: In spite of the use of timers, the stop and wait
protocol still suffers from a few drawbacks. Firstly, if the receiver has the
capacity to accept more than one frame, its resources are being underutilized.
Secondly, if the receiver is busy and does not wish to receive any more frames,
it may delay the acknowledgement. However, the timer on the sender's side may
go off and cause an unnecessary retransmission. These drawbacks are overcome
by the sliding window protocols.
In sliding window protocols the sender's data link layer maintains a 'sending
window' which consists of a set of sequence numbers corresponding to the frames
it is permitted to send. Similarly, the receiver maintains a 'receiving window'
corresponding to the set of frames it is permitted to accept. The window size is
dependent on the retransmission policy and it may differ in values for the
receiver's and the sender's window. The sequence numbers within the sender's
window represent the frames sent but as yet not acknowledged. Whenever a new
packet arrives from the network layer, the upper edge of the window is advanced
by one. When an acknowledgement arrives from the receiver the lower edge is
advanced by one. The receiver's window corresponds to the frames that the
receiver's data link layer may accept. When a frame with sequence number equal
to the lower edge of the window is received, it is passed to the network layer, an
acknowledgement is generated and the window is rotated by one. If, however, a
frame falling outside the window is received, the receiver's data link layer has two
options. It may either discard this frame and all subsequent frames until the
desired frame is received, or it may accept these frames and buffer them until the
appropriate frame is received and then pass the frames to the network layer in
sequence.
In this simple example, there is a 4-byte sliding window. Moving from left to
right, the window "slides" as bytes in the stream are sent and acknowledged.
Most sliding window protocols also employ ARQ ( Automatic Repeat reQuest )
mechanism. In ARQ, the sender waits for a positive acknowledgement before
proceeding to the next frame. If no acknowledgement is received within a certain
time interval it retransmits the frame. ARQ is of two types :
2. Selective Repeat: In this protocol, rather than discarding all the subsequent
frames following a damaged or lost frame, the receiver's data link layer
simply stores them in buffers. When the sender does not receive an
acknowledgement for the first frame, its timer goes off after a certain time
interval and it retransmits only the lost frame. Assuming error-free
transmission this time, the receiver's data link layer will have a sequence of
many correct frames which it can hand over to the network layer. Thus
there is less overhead in retransmission than in the case of Go Back n.
In case of the selective repeat protocol the window size may be calculated as
follows. Assume that the size of both the sender's and the receiver's
window is w. So initially both of them contain the values 0 to (w-1).
Suppose that the sender's data link layer transmits all the w frames, and the
receiver's data link layer receives them correctly and sends
acknowledgements for each of them. However, all the acknowledgements
are lost and the sender does not advance its window. The receiver window
at this point contains the values w to (2w-1). To avoid overlap when the
sender's data link layer retransmits, the sum of these two
windows must not exceed the sequence number space. Hence, we get the condition
2w <= (size of the sequence number space).
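The lost-acknowledgement scenario above can be checked numerically. The sketch below builds the sender's stale window and the receiver's advanced window modulo the sequence number space and reports whether they share a sequence number; the function names and the parameter `sequence_space` (the count of distinct sequence numbers) are assumptions for illustration.

```python
def windows_overlap(window_size, sequence_space):
    """True if retransmitted frames could be mistaken for new ones.

    The sender still holds sequence numbers 0..w-1, while the receiver has
    moved on to w..2w-1, both taken modulo the sequence number space.
    """
    sender = {seq % sequence_space for seq in range(window_size)}
    receiver = {seq % sequence_space
                for seq in range(window_size, 2 * window_size)}
    return bool(sender & receiver)


def max_selective_repeat_window(sequence_space):
    """Largest window for which the two windows can never overlap."""
    return sequence_space // 2
```

With 8 sequence numbers (a 3-bit field), a window of 4 is safe, whereas a window of 5 lets the receiver's new window wrap around onto numbers the sender may still retransmit.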
Network Layer
What is Network Layer?
The network layer is concerned with getting packets from the source all the way to the
destination. The packets may need to make many hops at the intermediate routers while
reaching the destination. This is the lowest layer that deals with end to end transmission.
In order to achieve its goals, the network layer must know about the topology of the
communication network. It must also take care to choose routes to avoid overloading of
some of the communication lines while leaving others idle. The network layer-transport
layer interface frequently is the interface between the carrier and the customer, that is the
boundary of the subnet. The functions of this layer include :
1. Routing - The process of transferring packets received from the Data Link Layer
of the source network to the Data Link Layer of the correct destination network is
called routing. It involves decision making at each intermediate node on where to
send the packet next so that it eventually reaches its destination. The node which
makes this choice is called a router. For routing we require some mode of
addressing which is recognized by the Network Layer. This addressing is different
from the MAC layer addressing.
2. Inter-networking - The network layer is the same across all physical networks
(such as Token-Ring and Ethernet). Thus, if two physically different networks
have to communicate, the packets that arrive at the Data Link Layer of the node
which connects these two physically different networks, would be stripped of
their headers and passed to the Network Layer. The network layer would then
pass this data to the Data Link Layer of the other physical network.
3. Congestion Control - If the incoming rate of the packets arriving at any router is
more than the outgoing rate, then congestion is said to occur. Congestion may be
caused by many factors. If suddenly, packets begin arriving on many input lines
and all need the same output line, then a queue will build up. If there is
insufficient memory to hold all of them, packets will be lost. But even if routers
have an infinite amount of memory, congestion gets worse, because by the time
packets reach the front of the queue, they have already timed out (repeatedly),
and duplicates have been sent. All these packets are dutifully forwarded to the
next router, increasing the load all the way to the destination. Another cause of
congestion is slow processors. If the routers' CPUs are slow at performing the
bookkeeping tasks required of them, queues can build up, even though there is
excess line capacity. Similarly, low-bandwidth lines can also cause congestion.
Addressing Scheme
IP addresses are of 4 bytes and consist of :
i) The network address, followed by
ii) The host address
The first part identifies a network on which the host resides and the second part identifies
the particular host on the given network. Nodes which have more than one interface
to a network must be assigned a separate internet address for each interface. This
multi-layer addressing makes it easier to find and deliver data to the destination. A fixed size
for each of these parts would lead to wastage or under-usage: either there will be too
many network addresses and few hosts in each (which causes problems for routers, which
route based on the network address) or there will be very few network addresses and lots
of hosts (which will be a waste for small network requirements). Thus, we do away with
any notion of fixed sizes for the network and host addresses.
We classify networks as follows:
1. Large Networks: 8-bit network address and 24-bit host address. There are
approximately 16 million hosts per network and a maximum of 126 (2^7 - 2)
Class A networks can be defined. The calculation requires that 2 be subtracted
because 0.0.0.0 is reserved for use as the default route and 127.0.0.0 is reserved
for the loopback function. Moreover, each Class A network can support a
maximum of 16,777,214 (2^24 - 2) hosts per network. The host calculation
requires that 2 be subtracted because all 0s are reserved to identify the network
itself and all 1s are reserved for broadcast addresses. The reserved numbers may
not be assigned to individual hosts.
2. Medium Networks : 16-bit network address and 16-bit host address. There are
approximately 65000 hosts per network and a maximum of 16,384 (2^14) Class B
networks can be defined with up to (2^16-2) hosts per network.
3. Small networks : 24-bit network address and 8-bit host address. There are
approximately 250 hosts per network.
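The host counts quoted above follow directly from the number of host bits, and the class of an address can be read off its first octet. The first-octet boundaries used below (below 128 for Class A, below 192 for Class B, below 224 for Class C) are the standard classful convention and are not spelled out in the text, so treat them, and the function names, as assumptions of this sketch.

```python
HOST_BITS = {"A": 24, "B": 16, "C": 8}   # host part of each class, as listed above


def address_class(address):
    """Classify a dotted-decimal IPv4 address by its first octet."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("first octet out of range")
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    return "other"   # addresses outside the three classes discussed here


def usable_hosts(network_class):
    """Hosts per network: all-0s and all-1s host IDs are reserved."""
    return 2 ** HOST_BITS[network_class] - 2
```

This gives 16,777,214 hosts for a Class A network, 65,534 for Class B and 254 for Class C, matching the approximate figures in the list.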
You might think that Large and Medium networks are sort of a waste as few
corporations/organizations are large enough to have 65000 different hosts. (By the way,
there are very few corporations in the world with even close to 65000 employees, and
even in these corporations it is highly unlikely that each employee has his/her own
computer connected to the network.) Well, if you think so, you're right. This decision
seems to have been a mistake.
Address Classes
Internet Protocol
Special Addresses : There are some special IP addresses :
1. Broadcast Addresses : They are of two types :
(i) Limited Broadcast : It consists of all 1's, i.e., the address is 255.255.255.255.
It is used only on the LAN, and not for any external network.
(ii) Directed Broadcast : It consists of the network number followed by all 1's in the
remaining bits. It reaches the router corresponding to the network number, which then
broadcasts it to all the nodes in that network. This method is a major security
problem, and is not used anymore. So now if we find that all the bits are 1 in the
host number field, the packet is simply dropped. Therefore, we can now only
broadcast within our own network, using Limited Broadcast.
2. Network ID = 0
It means we are referring to this network and for local broadcast we make the host
ID zero.
3. Host ID = 0
This is used to refer to the entire network in the routing table.
4. Loop-back Address
Here we have addresses of the type 127.x.y.z. A packet sent to such an address goes all the
way down to the IP layer and comes back up to the application layer on the same host. This
is used to test network applications before they are used commercially.
Subnetting means organizing hierarchies within the network by dividing the host ID as
our network requires. For example, consider the network ID : 150.29.x.y
We could organize the remaining 16 bits in any way, like :
4 bits - department
4 bits - LAN
8 bits - host
This gives some structure to the host IDs. This division is not visible to the outside world,
which still sees just the network number and the host number (as a whole). The network will
have an internal routing table which stores information about which router to send a
packet for a given address to. Now consider the case where we have : 8 bits - subnet number,
and 8 bits - host number. Each router on the network must know which bits of an address
form the subnet number. This information is
called the subnet mask. We set the network number and subnet number bits to 1 and the
host bits to 0. Therefore, in this example the subnet mask becomes : 255.255.255.0. The
hosts also need to know the subnet mask when they send a packet. To find out whether two
addresses are on the same subnet, we can AND the source address with the subnet mask, and
the destination address with the subnet mask, and see if the two results are the same. The
basic reason for subnetting was to avoid broadcast. But if, at the lower level, our
switches are smart enough to send directed messages, then we do not need subnetting.
However, subnetting has some security-related advantages.
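The AND test described above can be sketched in Python as follows. The host addresses are made-up examples inside the 150.29.x.y network, the mask assumes the 8-bit subnet / 8-bit host split (first 24 bits set to 1), and the function name same_subnet is only illustrative.

```python
import ipaddress


def same_subnet(src, dst, mask):
    """Return True if src and dst give the same result when ANDed with mask."""
    src_int = int(ipaddress.IPv4Address(src))
    dst_int = int(ipaddress.IPv4Address(dst))
    mask_int = int(ipaddress.IPv4Address(mask))
    return (src_int & mask_int) == (dst_int & mask_int)


# Hypothetical hosts; 16 network bits + 8 subnet bits are 1 in the mask.
MASK = "255.255.255.0"
print(same_subnet("150.29.4.10", "150.29.4.200", MASK))  # same subnet 4
print(same_subnet("150.29.4.10", "150.29.7.10", MASK))   # subnets 4 and 7
```

A host would run this kind of check before sending: if the result is True the destination can be reached directly, otherwise the packet has to go to a router.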
This is a step towards class-less addressing. For a block of 8 contiguous Class C networks,
we could say that the network number is 21 bits, or say that it is 24 bits together with the
7 network numbers following it.
For example : a.b.c.d / 21 This means only look at the first 21 bits as the network address.
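As a rough illustration of the / 21 notation, Python's standard ipaddress module can show that a 21-bit network number covers exactly 8 blocks of Class C size. The block 192.168.8.0/21 is a hypothetical choice made only for this example.

```python
import ipaddress

# Hypothetical /21 block: only the first 21 bits are the network address.
net = ipaddress.ip_network("192.168.8.0/21")

# Split it into 24-bit (Class C sized) networks.
class_c_blocks = list(net.subnets(new_prefix=24))

print(net.netmask)            # mask with the first 21 bits set
print(len(class_c_blocks))    # number of Class C sized blocks covered
print(ipaddress.ip_address("192.168.13.7") in net)
```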
Addressing on IITK Network
If we do not have a direct connection with the outside world, then we could use Private
IP addresses ( 172.31 ), which are not to be publicised or routed to the outside world.
Switches will make sure that they do not send packets with such addresses to the
outside world. The basic reason for implementing subnetting was to avoid broadcast. So
in our case we can have some subnets for security and other reasons, although if the
switches could do the routing properly, we would not need subnets. In the IITK network
we have three subnets : CC and the CSE building are two subnets, and the rest of the campus is
one subnet.
Packet Structure
1. Header Length : Headers can vary in size, so we need this field.
The header is always a multiple of 4 bytes, and the maximum value of this 4-bit
field is 15, so the maximum size of the header is 60 bytes ( 20 bytes are
mandatory ).
2. Type Of Service (ToS) : This helps the router in taking the right routing
decisions. The structure is :
First three bits : They specify the precedences i.e. the priority of the packets.
Next three bits :
o D bit - D stands for delay. If the D bit is set to 1, then this means that the
application is delay sensitive, so we should try to route the packet with
minimum delay.
o T bit - T stands for throughput. This tells us that this particular operation is
throughput sensitive.
o R bit - R stands for reliability. This tells us that we should route this packet
through a more reliable network.
Last two bits: The last two bits are never used. Unfortunately, no router in this
world looks at these bits and so no application sets them nowadays.
The second word of the header is meant for handling fragmentation. If a link cannot
transmit large packets, then we fragment the packet and put sufficient information in the
header for reassembly at the destination.
3. ID Field : The source address and ID field together identify the fragments of a
unique packet. So all fragments of one packet carry the same ID, and fragments of
different packets carry different IDs.
4. Offset : It is a 13-bit field that represents where in the packet the current
fragment starts. Each unit of the offset represents 8 bytes of the packet, so the packet
size can be at most 64 kB ( 2^13 x 8 bytes ). Every fragment except the last one must have
a size in bytes that is a multiple of 8 in order to comply with this structure. The
position of a fragment is given as an offset value, instead of simply numbering
each fragment, because refragmentation may occur somewhere on the path to the
other node. Fragmentation, though supported by IPv4, is not encouraged. This is
because if even one fragment is lost, the entire packet needs to be discarded. A
quantity called the M.T.U (Maximum Transmission Unit) is defined for each link in the
route. It is the size of the largest packet that can be handled by the link. The Path
M.T.U is then defined as the size of the largest packet that can be handled by the
path. It is the smallest of all the MTUs along the path. Given information about
the path MTU, we can send packets no larger than the path MTU and thus
avoid fragmentation. This will not completely prevent it, because routing tables
may change, leading to a change in the path.
5. Flags : It has three bits -
o M bit : If M is 1, then there are more fragments on the way, and if M is
0, then it is the last fragment.
o DF bit : If this bit is set to 1, then we should not fragment such a packet.
o Reserved bit : This bit is not used.
Reassembly can be done only at the destination and not at any intermediate node.
This is because we are considering Datagram Service, and so it is not guaranteed
that all the fragments of the packet will be sent through the node at which we
wish to do reassembly.
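The interplay of the Offset field, the M bit and the MTU can be illustrated with a small sketch. It assumes a 20-byte header with no options on every fragment, and the function names fragment and path_mtu are hypothetical; it only computes the values a router would place in the header and does not build real packets.

```python
def fragment(payload_len, mtu, header_len=20):
    """Split a payload into (offset_in_8_byte_units, data_length, m_bit) tuples.

    Assumes every fragment carries a header of header_len bytes.
    """
    # Data per fragment must be a multiple of 8 (except for the last one).
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    start = 0
    while start < payload_len:
        length = min(max_data, payload_len - start)
        more = (start + length) < payload_len  # M bit: more fragments follow
        fragments.append((start // 8, length, more))
        start += length
    return fragments


def path_mtu(link_mtus):
    """The path MTU is the smallest MTU of all links on the path."""
    return min(link_mtus)


for frag in fragment(4000, 1500):
    print(frag)
print(path_mtu([1500, 576, 1500]))
```

Because positions are stored as offsets, a later router can split any of these fragments again without renumbering the others.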
6. Total Length : It includes the IP header and everything that comes after it.
7. Time To Live (TTL) : Using this field, we can set the time within which the
packet should be delivered or else destroyed. In practice it is treated strictly as a
number of hops: the packet should reach the destination within this number of hops. Every
router decrements the value as the packet goes through it, and if this value becomes
zero at a particular router, the packet is destroyed.
8. Protocol : This specifies the module to which we should hand over the packet
( UDP or TCP ). It is the next encapsulated protocol.
Value Protocol
0 IPv6 Hop-by-Hop Option.
1 ICMP, Internet Control Message Protocol.
2 IGMP, Internet Group Management Protocol. RGMP,
Router-port Group Management Protocol.
3 GGP, Gateway to Gateway Protocol.
4 IP in IP encapsulation.
5 ST, Internet Stream Protocol.
6 TCP, Transmission Control Protocol.
8 EGP, Exterior Gateway Protocol.
10 BBN RCC Monitoring.
11 NVP, Network Voice Protocol.
12 PUP.
14 EMCON, Emission Control Protocol.
15 XNET, Cross Net Debugger.
16 Chaos.
17 UDP, User Datagram Protocol.
18 TMux, Transport Multiplexing Protocol.
19 DCN Measurement Subsystems.
9. Header Checksum : This is the usual checksum field used to detect errors. Since
the TTL field changes at every router, the header checksum ( which covers the header up to
and including the options field ) is verified and recalculated at every router.
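The per-router recalculation can be sketched as below. The notes do not say which checksum is used; this sketch assumes the standard IPv4 one, the 16-bit one's complement of the one's complement sum of the header words. The function names and the sample header bytes are illustrative only.

```python
def internet_checksum(header: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the header."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(header: bytearray) -> None:
    """What a router does to the header: decrement TTL, redo the checksum."""
    if header[8] == 0:
        raise ValueError("TTL already zero, packet must be destroyed")
    header[8] -= 1                      # TTL is byte 8 of the header
    header[10:12] = b"\x00\x00"         # checksum field is zero while summing
    header[10:12] = internet_checksum(bytes(header)).to_bytes(2, "big")


# Sample 20-byte header with the checksum field (bytes 10-11) still zero.
sample = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
sample[10:12] = internet_checksum(bytes(sample)).to_bytes(2, "big")
forward(sample)
print(sample.hex())
```

A receiver verifies a header by running the same sum over it with the checksum field included; a result of zero means no error was detected.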
10. Source : It is the IP address of the source node
11. Destination : It is the IP address of the destination node.
12. IP Options : The options field was created in order to allow features to be added
into IP as time passes and requirements change. Currently 5 options are specified
although not all routers support them. They are:
o Security: It tells us how secret the information is. In theory a military
router might use this field to specify not to route through certain routers.
In practice no routers support this field.
o Source Routing: It is used when we want the source to dictate how the
packet traverses the network. It is of 2 types :
-> Loose Source Record Routing (LSRR): It requires that the packet
traverse a list of specified routers, in the order specified, but the packet
may pass through some other routers as well.
-> Strict Source Record Routing (SSRR): It requires that the packet
traverse only the set of specified routers and nothing else. If this is not
possible, the packet is dropped and an error message is sent to the host.
The above is the format for SSRR. For LSRR the code is 131.
o Record Routing :
In this the intermediate routers put their IP addresses in the header, so that
the destination knows the entire path of the packet. Space for storing the
IP addresses is reserved by the source itself. The pointer field points to the
position where the next IP address has to be written. The length field gives the
number of bytes reserved by the source for writing the IP addresses. If the
space provided runs out before the addresses of all the routers visited have
been stored, then the subsequent routers do not
write their IP addresses.
o Time Stamp :
It is similar to the record route option except that nodes also add their
timestamps to the packet. The new fields in this option are
-> Flags: It can have the following values
-> Overflow: It stores the number of nodes that were unable to add their
timestamps to the packet. The maximum value is 15.
For all options a length field is put in order that a router not familiar with
the option will know how many bytes to skip. Thus every option is of the
• Routing
• Congestion Control
• Internetworking
Routing is the process of forwarding of a packet in a network so that it reaches its
intended destination. The main goals of routing are:
1. Correctness: The routing should be done properly and correctly so that the
packets may reach their proper destination.
2. Simplicity: The routing should be done in a simple manner so that the overhead is
as low as possible. With increasing complexity of the routing algorithms the
overhead also increases.
3. Robustness: Once a major network becomes operative, it may be expected to run
continuously for years without any failures. The algorithms designed for routing
should be robust enough to handle hardware and software failures and should be
able to cope with changes in the topology and traffic without requiring all jobs in
all hosts to be aborted and the network rebooted every time some router goes
down.
4. Stability: The routing algorithms should be stable under all possible
circumstances.
5. Fairness: Every node connected to the network should get a fair chance of
transmitting their packets. This is generally done on a first come first serve basis.
6. Optimality: The routing algorithms should be optimal in terms of maximizing throughput
and minimizing mean packet delays. There is a trade-off between the two, and one has to
choose depending on what suits the network.
2. Random Walk: In this method a packet is sent by the node to one of its
neighbours randomly. This algorithm is highly robust. When the network
is highly interconnected, this algorithm has the property of making
excellent use of alternative routes. It is usually implemented by sending
the packet onto the least queued link.
Delta Routing
Delta routing is a hybrid of the centralized and isolated routing algorithms. Here each
node computes the cost of each line (i.e. some function of the delay, queue length,
utilization, bandwidth etc.) and periodically sends a packet giving these values to the
central node, which then computes the k best paths from node i to node j. Let Cij1 be the
cost of the best i-j path, Cij2 the cost of the next best path and so on. If Cijn - Cij1 <
delta (where Cijn is the cost of the n'th best i-j path and delta is some constant), then
path n is regarded as equivalent to the best i-j path, since their costs differ by so
little. When delta -> 0 this
algorithm becomes centralized routing and when delta -> infinity all the paths become
equivalent.
Multipath Routing
In the above algorithms it has been assumed that there is a single best path between any
pair of nodes and that all traffic between them should use it. In many networks however
there are several paths between pairs of nodes that are almost equally good. Sometimes in
order to improve the performance multiple paths between single pair of nodes are used.
This technique is called multipath routing or bifurcated routing. In this each node
maintains a table with one row for each possible destination node. A row gives the best,
second best, third best, etc outgoing line for that destination, together with a relative
weight. Before forwarding a packet, the node generates a random number and then
chooses among the alternatives, using the weights as probabilities. The tables are worked
out manually and loaded into the nodes before the network is brought up and not changed
Hierarchical Routing
In this method of routing the nodes are divided into regions based on hierarchy. A
particular node can communicate with nodes at the same hierarchical level or with the nodes
at a lower level directly under it. Here, the path from any source to a destination is fixed,
and there is exactly one such path if the hierarchy is a tree.
Routing Algorithms
Non-Hierarchical Routing
In this type of routing, interconnected networks are viewed as a single network, where
bridges, routers and gateways are just additional nodes.
• Every node keeps information about every other node in the network
• In case of adaptive routing, the routing calculations are done and updated for all
the nodes.
The above two are also the disadvantages of non-hierarchical routing, since the table
sizes and the routing calculations become too large as the networks get bigger. So this
type of routing is feasible only for small networks.
Hierarchical Routing
This is essentially a 'Divide and Conquer' strategy. The network is divided into different
regions and a router for a particular region knows only about its own domain and other
routers. Thus, the network is viewed at two levels:
1. The Sub-network level, where each node in a region has information about its
peers in the same region and about the region's interface with other regions.
Different regions may have different 'local' routing algorithms. Each local
algorithm handles the traffic between nodes of the same region and also directs
the outgoing packets to the appropriate interface.
2. The Network Level, where each region is considered as a single node connected
to its interface nodes. The routing algorithms at this level handle the routing of
packets between two interface nodes, and is isolated from intra-regional transfer.
Networks can be organized in hierarchies of many levels; e.g. local networks of a city at
one level, the cities of a country at a level above it, and finally the network of all nations.
• All nodes in its region which are at one level below it.
• Its peer interfaces.
• At least one interface at a level above it, for outgoing packets.
Disadvantage :
Source Routing
Source routing is similar in concept to virtual circuit routing. It is implemented as under:
• Bridges do not need to lookup their routing tables since the path is already
specified in the packet itself.
• The throughput of the bridges is higher, and this may lead to better utilization of
bandwidth, once a route is established.
• Establishing the route at first needs an expensive search method like flooding.
• To cope with dynamic relocation of nodes in a network, frequent updates of
tables are required, else all packets would be sent in wrong direction. This too is
• Minimum number of hops: If each link is given a unit cost, the shortest path is
the one with minimum number of hops. Such a route is easily obtained by a
breadth first search method. This is easy to implement but ignores load, link
capacity etc.
• Transmission and Propagation Delays: If the cost is fixed as a function of
transmission and propagation delays, it will reflect the link capacities and the
geographical distances. However these costs are essentially static and do not
consider the varying load conditions.
• Queuing Delays: If the cost of a link is determined through its queuing delays, it
takes care of the varying load conditions, but not of the propagation delays.
Ideally, the cost parameter should consider all the above mentioned factors, and it should
be updated periodically to reflect the changes in the loading conditions. However, if the
routes are changed according to the load, the load changes again. This feedback effect
between routing and load can lead to undesirable oscillations and sudden swings.
Routing Algorithms
As mentioned above, the shortest paths are calculated using suitable algorithms on the
graph representations of the networks. Let the network be represented by graph G ( V,
E ) and let the number of nodes be 'N'. For all the algorithms discussed below, the costs
associated with the links are assumed to be positive. A node has zero cost w.r.t itself.
Further, all the links are assumed to be symmetric, i.e. if di,j = cost of link from node i
to node j, then d i,j = d j,i . The graph is assumed to be complete. If there exists no edge
between two nodes, then a link of infinite cost is assumed. The algorithms given below
find costs of the paths from all nodes to a particular node; the problem is equivalent to
finding the cost of paths from a source to all destinations.
Bellman-Ford Algorithm
This algorithm iterates on the number of edges in a path to obtain the shortest path. Since
the number of hops possible is limited (cycles are implicitly not allowed), the algorithm
terminates giving the shortest path.
d i,j = Length of the link between nodes i and j, indicating the cost of the link.
h = Number of hops.
D[ i,h] = Shortest path length from node i to node 1, with up to 'h' hops.
D[ 1,h] = 0 for all h .
Algorithm :
Initial condition : D[ i,0] = infinity, for all i ( i != 1 )
Iteration : D[ i,h+1] = min over all j of [ d i,j + D[ j,h] ]
For zero hops, the minimum length path has length of infinity, for every node other than
node 1. For one hop the shortest-path length associated with a node is equal to the length
of the edge between that node and node 1. Thereafter, we increment the number of hops allowed
(from h to h+1 ) and find out whether a shorter path exists through each of the other
nodes. If it exists, say through node 'j', then its length must be the sum of the length
of the link between these two nodes (i.e. d i,j ) and the shortest path between j and 1
obtainable in up to h hops. If such a path doesn't exist, then the path length remains the
same. The algorithm is guaranteed to terminate, since there are at most N nodes, and so a
shortest path has at most N-1 hops. It
has time complexity of O ( N^3 ) .
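The iteration on the number of hops can be sketched in Python as follows. This is a minimal sketch under the assumptions stated earlier (complete graph, symmetric positive costs, infinite cost for missing links, zero cost from a node to itself). Nodes are numbered from 0 here, so the notes' node 1 is index 0, and the 4-node cost matrix is a made-up example.

```python
import math

INF = math.inf


def bellman_ford(d, dest=0):
    """Shortest path cost from every node to dest.

    d is an N x N cost matrix with d[i][i] == 0 and INF for missing links.
    """
    n = len(d)
    # Zero hops: only the destination itself is reachable.
    D = [INF] * n
    D[dest] = 0
    # Allow one more hop per round; N-1 hops are always enough.
    for _ in range(n - 1):
        new_D = [min(d[i][j] + D[j] for j in range(n)) for i in range(n)]
        new_D[dest] = 0
        if new_D == D:      # no estimate improved, so we can stop early
            break
        D = new_D
    return D


# Hypothetical 4-node network.
costs = [
    [0,   1,   4,   INF],
    [1,   0,   2,   6],
    [4,   2,   0,   3],
    [INF, 6,   3,   0],
]
print(bellman_ford(costs))
```

Each round costs O(N^2) and there are at most N-1 rounds, which gives the O(N^3) bound quoted above.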
Dijkstra's Algorithm
Di = Length of shortest path from node 'i' to node 1.
di,j = Length of the link between nodes i and j .
Each node j is labeled with Dj, which is an estimate of the cost of the path from node j to
node 1. Initially, let the estimates be infinity, indicating that nothing is known about the
paths. We now iterate on the length of paths, each time revising our estimates to lower
values as we obtain them. We divide the nodes into two groups : the first one, called set P,
contains the nodes whose shortest distances have been found, and the other, Q, contains
all the remaining nodes. Initially P contains only the node 1. At each step, we select the
node in Q that has the minimum cost path to node 1. This node is transferred to set P. At the
first step, this corresponds to shifting the node closest to 1 into P. Its minimum cost to
node 1 is now known. At the next step, select the next closest node from set Q and update the
labels corresponding to each node using :
Dj = min [ Dj , Di + dj,i ]
Finally, after N-1 iterations, the shortest paths for all nodes are known, and the algorithm
terminates.
Let the closest node to 1 at some step be i. Then i is shifted to P. Now, for each node j ,
the shortest path to 1 either passes through i or it doesn't. If it doesn't, Dj remains the
same. If it does, the revised estimate of Dj is the sum Di + di,j . So we take the
minimum of these two cases and update Dj accordingly. As each of the nodes gets
transferred to set P, the estimates get closer to the lowest possible value. When a node is
transferred, its shortest path length is known. So finally all the nodes are in P and the Dj 's
represent the minimum costs. The algorithm is guaranteed to terminate in N-1 iterations
and its complexity is O( N^2 ).
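A minimal array-based sketch of the procedure, under the same assumptions as before (symmetric positive costs, infinite cost for missing links), might look like this. Nodes are numbered from 0, so the notes' node 1 is index 0; the set P is kept as a list of booleans, and the cost matrix is a made-up example.

```python
import math

INF = math.inf


def dijkstra(d, dest=0):
    """Shortest path cost from every node to dest, in O(N^2) time."""
    n = len(d)
    D = [INF] * n          # labels: current estimates of the cost to dest
    D[dest] = 0
    in_P = [False] * n     # True once a node's shortest distance is final
    in_P[dest] = True
    i = dest               # the node most recently moved into P
    for _ in range(n - 1):
        # Update the labels of nodes still in Q through node i.
        for j in range(n):
            if not in_P[j]:
                D[j] = min(D[j], D[i] + d[j][i])
        # Move the closest node remaining in Q over to P.
        i = min((j for j in range(n) if not in_P[j]), key=lambda j: D[j])
        in_P[i] = True
    return D


# Hypothetical 4-node network.
costs = [
    [0,   1,   4,   INF],
    [1,   0,   2,   6],
    [4,   2,   0,   3],
    [INF, 6,   3,   0],
]
print(dijkstra(costs))
```

Each of the N-1 iterations scans the labels once, which is where the O(N^2) complexity comes from.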
Floyd-Warshall Algorithm
Di,j [n] = Length of shortest path between the nodes i and j using only the nodes
1,2,....n as intermediate nodes.
Initial Condition
Di,j[0] = di,j for all nodes i,j .
Initially, n = 0. At each iteration, add the next node to the set of allowed intermediate
nodes, i.e. for n = 0,1, .....N-1 ,
Di,j[n+1] = min [ Di,j[n] , Di,n+1[n] + Dn+1,j[n] ]
Suppose the shortest path between i and j using nodes 1,2,...n is known. Now, if node n+1
is allowed to be an intermediate node, then the shortest path under the new conditions either
passes through node n+1 or it doesn't. If it does not pass through the node n+1, then
Di,j[n+1] is the same as Di,j[n] . Else, we find the cost of the new route, which is obtained
from the sum Di,n+1[n] + Dn+1,j[n]. So we take the minimum of these two cases at each
step. After adding all the nodes to the set of intermediate nodes, we obtain the shortest
paths between all pairs of nodes together. The complexity of the Floyd-Warshall algorithm
is O ( N^3 ).
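The Floyd-Warshall iteration over the set of allowed intermediate nodes can be sketched as three nested loops. Nodes are numbered from 0 in this sketch, and the cost matrix is a made-up example with infinite cost for missing links.

```python
import math

INF = math.inf


def floyd_warshall(d):
    """Shortest path costs between all pairs of nodes."""
    n = len(d)
    D = [row[:] for row in d]      # no intermediate nodes allowed yet
    for k in range(n):             # now also allow node k as an intermediate
        for i in range(n):
            for j in range(n):
                # Either keep the old path or go through node k.
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D


# Hypothetical 4-node network.
costs = [
    [0,   1,   4,   INF],
    [1,   0,   2,   6],
    [4,   2,   0,   3],
    [INF, 6,   3,   0],
]
for row in floyd_warshall(costs):
    print(row)
```

Updating the matrix in place is safe because row k and column k do not change during round k.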
It is observed that all the three algorithms mentioned above give comparable
performance, depending upon the exact topology of the network.
Address Resolution Protocol
If a machine talks to another machine in the same network, it requires the other machine's
physical or MAC address. But since the application has given the destination's IP address,
it requires some mechanism to bind the IP address to the MAC address. This is done through
the Address Resolution Protocol (ARP). The IP address of the destination node is broadcast
and the destination node informs the source of its MAC address.
But this means that every time machine A wants to send packets to machine B, A has to
send an ARP packet to resolve the MAC address of B, and this would increase the
traffic load too much. So, to reduce the communication cost, computers that use ARP
maintain a cache of recently acquired IP_to_MAC address bindings, i.e. they don't have
to use ARP repeatedly.
ARP Refinements
Several refinements of ARP are possible:
• When machine A wants to send packets to machine B, it is likely that machine B is going to
send packets to machine A in the near future. So, to save B from having to use ARP for A,
A should put its own IP_to_MAC address binding in the special packet while requesting the
MAC address of B.
• Since A broadcasts its initial request for the MAC address of B, every
machine on the network should extract and store in its cache the IP_to_MAC address
binding of A.
• When a new machine appears on the network (e.g. when an operating
system reboots) it can broadcast its IP_to_MAC address binding so that all other
machines can store it in their caches. This will eliminate a lot of ARP packets by all other
machines, when they want to communicate with this new machine.
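The effect of the cache and of the refinements can be seen in a toy simulation. This is only a sketch: the Host and Lan classes, the addresses and the broadcast counter are all hypothetical, and no real ARP packets are built or sent.

```python
class Host:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}            # IP_to_MAC address bindings

    def handle_request(self, sender_ip, sender_mac, target_ip):
        # Refinement: every machine that hears the broadcast stores the
        # sender's binding, which the sender included in its request.
        self.arp_cache[sender_ip] = sender_mac
        # Only the target answers with its own MAC address.
        return self.mac if target_ip == self.ip else None


class Lan:
    def __init__(self, hosts):
        self.hosts = hosts
        self.broadcasts = 0            # number of ARP requests sent

    def resolve(self, sender, target_ip):
        if target_ip in sender.arp_cache:      # cache hit: no ARP traffic
            return sender.arp_cache[target_ip]
        self.broadcasts += 1
        reply = None
        for host in self.hosts:
            if host is not sender:
                mac = host.handle_request(sender.ip, sender.mac, target_ip)
                if mac is not None:
                    reply = mac
        if reply is not None:
            sender.arp_cache[target_ip] = reply
        return reply


# Hypothetical machines on one LAN.
a = Host("192.0.2.1", "aa:aa:aa:aa:aa:01")
b = Host("192.0.2.2", "aa:aa:aa:aa:aa:02")
c = Host("192.0.2.3", "aa:aa:aa:aa:aa:03")
lan = Lan([a, b, c])

print(lan.resolve(a, b.ip), lan.broadcasts)   # A asks for B: one broadcast
print(lan.resolve(b, a.ip), lan.broadcasts)   # B already knows A: no new one
```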
Example displaying the use of Address Resolution Protocol:
Consider a scenario where a computer tries to contact some remote machine using the ping
program, assuming that there has been no exchange of IP datagrams previously between
the two machines and therefore an arp packet must be sent to identify the MAC address of
the remote machine.
The arp request message (who is A.A.A.A tell B.B.B.B, where the two are IP addresses) is
broadcast on the local area network with an Ethernet protocol type of 0x806. The packet is
discarded by all the machines except the target machine, which responds with an arp
response message (A.A.A.A is hh:hh:hh:hh:hh:hh, where hh:hh:hh:hh:hh:hh is the
Ethernet source address). This packet is unicast to the machine with IP address B.B.B.B.
Since the arp request message included the hardware address (Ethernet source address) of
the requesting computer, the target machine doesn't require another arp message to figure it
out.
Detailed Mechanism
Both the machine that issues the request and the server that responds use physical
network addresses during their brief communication. Usually, the requester does not
know the physical address of the server. So, the request is broadcast to all the machines
on the network. Now, the requester must identify itself uniquely to the server. For this,
either the CPU serial number or the machine's physical network address can be used. But
using the physical address as a unique id has two advantages.
• These addresses are always available and do not have to be bound into bootstrap
• Because the identifying information depends on the network and not on the CPU
vendor, all machines on a given network will supply unique identifiers.
Like an ARP message, a RARP message is sent from one machine to another
encapsulated in the data portion of a network frame. An Ethernet frame carrying a RARP
request has the usual preamble, Ethernet source and destination addresses, and packet type
field at the front of the frame. The type field contains the value 8035 (base 16) to
identify the contents of the frame as a RARP message. The data portion of the frame contains
the 28-octet RARP message. The sender broadcasts a RARP request that specifies itself as both
the sender and target machine, and supplies its physical network address in the target
hardware address field. All machines on the network receive the request, but only those
authorised to supply RARP services process the request and send a reply; such
machines are known informally as RARP servers. For RARP to succeed, the network
must contain at least one RARP server.
A server answers a request by filling in the target protocol address field, changing the
message type from request to reply, and sending the reply back directly to the machine
making the request.
Drawbacks of RARP
• Since it operates at a low level, it requires direct access to the network, which
makes it difficult for an application programmer to build a server.
• It doesn't fully utilize the capability of a network like Ethernet, which enforces
a minimum packet size, since the reply from the server contains only one
small piece of information, the 32-bit internet address.
This protocol provides a mechanism that gateways and hosts use to communicate control
or error information. The Internet Protocol provides an unreliable, connectionless datagram
service, and a datagram travels from gateway to gateway until it reaches one that can
deliver it directly to its final destination. If a gateway cannot route or deliver a
datagram, or if the gateway detects an unusual condition, like network congestion, that
affects its ability to forward the datagram, it needs to instruct the original source to take
action to avoid or correct the problem. The Internet Control Message Protocol (ICMP) allows
gateways to send error or control messages to other gateways or hosts; ICMP provides
communication between the Internet Protocol software on one machine and the Internet
Protocol software on another. It is a special purpose message mechanism added by the
designers to the TCP/IP protocols, to allow gateways in an internet to report errors
or provide information about unexpected circumstances. The IP protocol itself contains
nothing to help the sender test connectivity or learn about failures.
1. A high speed computer may be able to generate traffic faster than a network can
transfer it .
2. If many computers simultaneously need to send datagrams through a single
gateway, the gateway can experience congestion, even though no single source
causes the problem.
When datagrams arrive too quickly for a host or a gateway to process, it enqueues them
in memory temporarily. If the traffic continues, the host or gateway eventually exhausts
memory and must discard additional datagrams that arrive. A machine uses ICMP source
quench messages to relieve congestion. A source quench message is a request for the
source to reduce its current rate of datagram transmission.
There is no ICMP message to reverse the effect of a source quench.
Source Quench :
Source quench messages have a field that contains a datagram prefix in addition to the
usual ICMP TYPE, CODE and CHECKSUM fields. Congested gateways send one source
quench message each time they discard a datagram; the datagram prefix identifies the
datagram that was dropped.
TCP was specifically designed to provide a reliable end-to-end byte stream over an
unreliable internetwork. Each machine supporting TCP has a TCP transport entity, either a
user process or part of the kernel, that manages TCP streams and interfaces to the IP layer.
A TCP entity accepts user data streams from local processes, breaks them up into pieces not
exceeding 64KB and sends each piece as a separate IP datagram. A client-server
mechanism is not necessary for TCP to behave properly.
TCP connection is a duplex connection. That means there is no difference between two
sides once the connection is established.
The simplest three-way handshake is shown in figure below. The figures should be
interpreted in the following way. Each line is numbered for reference purposes. Right
arrows (-->) indicate departure of a TCP segment from TCP A to TCP B, or arrival of a
segment at B from A. Left arrows (<--), indicate the reverse. Ellipsis (...) indicates a
segment which is still in the network (delayed). TCP states represent the state AFTER the
departure or arrival of the segment (whose contents are shown in the center of each line).
Segment contents are shown in abbreviated form, with sequence number, control flags,
and ACK field. Other fields such as window, addresses, lengths, and text have been left
out in the interest of clarity.
In line 2 of above figure, TCP A begins by sending a SYN segment indicating that it will
use sequence numbers starting with sequence number 100. In line 3, TCP B sends a SYN
and acknowledges the SYN it received from TCP A. Note that the acknowledgment field
indicates TCP B is now expecting to hear sequence 101, acknowledging the SYN which
occupied sequence 100.
At line 4, TCP A responds with an empty segment containing an ACK for TCP B's SYN;
and in line 5, TCP A sends some data. Note that the sequence number of the segment in
line 5 is the same as in line 4 because the ACK does not occupy sequence number space
(if it did, we would wind up ACKing ACK's!).
Simultaneous initiation is only slightly more complex, as is shown in figure below. Each
The first two figures show how a three way handshake deals with problems of
duplicate/delayed connection requests and duplicate/delayed connection
acknowledgements in the network. The third figure highlights the problem of spoofing
associated with a two way handshake.
Some Conventions
1. The ACK contains 'x+1' if the sequence number received is 'x'.
2. If 'ISN' is the sequence number of the connection packet, then the 1st data packet has
the seq number 'ISN+1'.
3. Seq numbers are 32 bits long. They are byte seq numbers (every byte has a seq number).
With a packet, the seq number of its 1st byte and the length of the packet are sent.
4. Acknowledgements are cumulative.
5. Acknowledgements have a seq number of their own but with a length of 0. So the next
data packet has the same seq number as the ACK.
Connection Establish
• The sender sends a SYN packet with sequence number, say, 'x'.
• The receiver, on receiving the SYN packet, responds with a SYN packet with sequence
number 'y' and an ACK with seq number 'x+1'.
• On receiving both the SYN and ACK packets, the sender responds with an ACK packet
with seq number 'y+1'.
• The receiver, when it receives the ACK packet, initiates the connection.
Connection Release
• The initiator sends a FIN with the current sequence and acknowledgement numbers.
• On receiving this, the responder informs the application program that it will
receive no more data and sends an acknowledgement of the packet. The
connection is now closed from one side.
• Now the responder will follow similar steps to close the connection from its side.
Once this is done the connection will be fully closed.
Transport Layer Protocol (continued)
A TCP connection is a duplex connection. That means there is no difference between the two
sides once the connection is established.
Salient Features of TCP
• Source and destination port : These fields identify the local endpoints of the
connection. Each host may decide for itself how to allocate its own ports starting
at 1024. The source and destination socket numbers together identify the
connection.
• Sequence and ACK number : The sequence number field is used to give a sequence number to
each and every byte transferred. This has an advantage over giving sequence
numbers to packets, because the data of many small packets can be combined
into one at the time of retransmission, if needed. The ACK signifies the next byte
expected from the source and not the last byte received. The ACKs are cumulative
instead of selective. The sequence number space is as large as 32 bits, although 17 bits
would have been enough if the packets were delivered in order. If packets arrive in
order, then according to the following formula:
(sender's window size) + (receiver's window size) < (sequence number space)
a 17-bit sequence number space would suffice, since the 16-bit Window Size field
limits each window to 2^16 - 1 bytes. But packets may take different
routes and arrive out of order, so we need a larger sequence number space. For
optimisation, this is 32 bits.
• Header length : This field tells how many 32-bit words are contained in the TCP
header. This is needed because the options field is of variable length.
• Flags : There are six one-bit flags.
1. URG : This bit indicates whether the urgent pointer field in this packet is
being used.
2. ACK : This bit is set to indicate that the ACK number field in this packet is
valid.
3. PSH : This bit indicates PUSHed data. The receiver is requested to deliver
the data to the application upon arrival and not buffer it until a full buffer
has been received.
4. RST : This flag is used to reset a connection that has become confused
due to a host crash or some other reason. It is also used to reject an invalid
segment or refuse an attempt to open a connection. This causes an abrupt
end to the connection, if it existed.
5. SYN : This bit is used to establish connections. The connection
request (1st packet in 3-way handshake) has SYN=1 and ACK=0. The
connection reply (2nd packet in 3-way handshake) has SYN=1 and
ACK=1.
6. FIN : This bit is used to release a connection. It specifies that the sender
has no more fresh data to transmit. However, it will retransmit any lost or
delayed packet. Also, it will continue to receive data from other side.
Since SYN and FIN packets have to be acknowledged, they must have a
sequence number even if they do not contain any data.
• Window Size : Flow control in TCP is handled using a variable-size sliding
window. The Window Size field tells how many bytes may be sent starting at the
byte acknowledged. Sender can send the bytes with sequence number between
(ACK#) to (ACK# + window size - 1). A window size of zero is legal and says that
the bytes up to and including ACK# -1 have been received, but the receiver would
like no more data for the moment. Permission to send can be granted later by
sending a segment with the same ACK number and a nonzero Window Size field.
• Checksum : This is provided for extreme reliability. It checksums the header, the
data, and the conceptual pseudoheader. The pseudoheader contains the 32-bit IP
address of the source and destination machines, the protocol number for TCP(6),
and the byte count for the TCP segment (including the header). Including the
pseudoheader in TCP checksum computation helps detect misdelivered packets,
but doing so violates the protocol hierarchy since the IP addresses in it belong to
the IP layer, not the TCP layer.
• Urgent Pointer : Indicates a byte offset from the current sequence number at
which urgent data are to be found. Urgent data continues till the end of the
segment. This is not used in practice. The same effect can be had by using two
TCP connections, one for transferring urgent data.
• Options : Provides a way to add extra facilities not covered by the regular header.
1. Maximum Segment Size : It refers to the maximum size of segment (MSS) that
is acceptable to both ends of the connection. TCP negotiates the MSS using the
OPTION field. In the Internet environment the MSS has to be selected carefully. An
arbitrarily small segment size will result in poor bandwidth utilization, since the data to
overhead ratio remains low. On the other hand, an extremely large segment size will
necessitate large IP datagrams, which require fragmentation. As there is a finite
chance of a fragment getting lost, a segment size above the "fragmentation threshold"
decreases the throughput. Theoretically, the optimum segment size is the size that
results in the largest IP datagram that does not require fragmentation anywhere
en route from source to destination. However, it is very difficult to find such an
optimum segment size. In System V a simple technique is used to identify the MSS. If
H1 and H2 are on the same network use MSS=1024. If on different networks then
2. Flow Control : TCP uses a sliding window mechanism at the octet level. The window
size can vary over time. This is achieved by utilizing the concept of
"Window Advertisement" based on :
1. Buffer availability at the receiver
2. Network conditions (traffic load etc.)
In the former case the receiver varies its window size depending upon the space
available in its buffers. This window is referred to as the RECEIVE WINDOW
(Recv_Win). When the receiver's buffers begin to fill, it advertises a small Recv_Win so
that the sender doesn't send more data than it can accept. If all buffers are full, the
receiver sends a "Zero" size advertisement, which stops all transmission. When buffers
become available, the receiver advertises a nonzero window to resume transmission.
The sender also periodically probes the "Zero" window to avoid any deadlock if
the nonzero window advertisement from the receiver is lost. The variable-size
Recv_Win provides efficient end-to-end flow control.
The second case arises when some intermediate node (e.g. a router) controls the
source to reduce the transmission rate. Here another window, referred to as the
CONGESTION WINDOW (C_Win), is utilized. Advertisement of C_Win helps to
check and avoid congestion.
When a source sends a segment, TCP sets a timer. If this value is set too low, it will
result in many unnecessary retransmissions. If it is set too high, it results in wastage of
bandwidth and hence lower throughput. In the Fast Retransmit scheme the timer value
is set fairly higher than the RTT. The sender can therefore detect segment loss
before the timer expires. This scheme presumes that the sender will get repeated
ACKs for a lost packet.
6. Round Trip Time (RTT) : In the Internet environment, segments may travel
across different intermediate networks and through multiple routers. The networks
and routers may have different delays, which may vary over time. The RTT
is therefore also variable, which makes it difficult to set timers. TCP allows varying
timers by using an adaptive retransmission algorithm. It works as follows.
1. Note the time (t1) when a segment is sent and the time (t2) when its ACK
is received.
2. Compute RTT(sample) = (t2 - t1)
3. Compute RTT(new) in the same way for the next segment.
4. Compute the average RTT as a weighted average of the old and new values of RTT:
5. RTT(est) = a * RTT(old) + (1-a) * RTT(new) where 0 < a < 1
A high value of 'a' makes the estimated RTT insensitive to changes that
last for a short time, so that the RTT relies on the history of the network. A low
value makes it sensitive to the current state of the network. A typical value of
'a' is 0.75.
6. Compute Time Out = b * RTT(est) where b > 1
A low value of 'b' will ensure quick detection of a packet loss. Any small
delay will, however, cause unnecessary retransmission. A typical value of
'b' is 2.
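The adaptive retransmission steps above can be sketched in C as follows. This is only an illustration of the weighted average and the timeout formula: the type rtt_estimator and the functions rtt_update and rtt_timeout are hypothetical names, times are assumed to be in milliseconds, and seeding the estimate with the first sample is an assumption not stated in the notes.

```c
/* Hypothetical helper type: holds the smoothed estimate RTT(est), in ms. */
typedef struct {
    double rtt_est;     /* RTT(est) */
    int    has_sample;  /* 0 until the first sample arrives */
} rtt_estimator;

/* Fold one measured sample (t2 - t1) into the estimate:
   RTT(est) = a * RTT(old) + (1 - a) * RTT(new), with 0 < a < 1. */
double rtt_update(rtt_estimator *e, double sample_ms, double a)
{
    if (!e->has_sample) {
        e->rtt_est = sample_ms;   /* assumption: first sample seeds the estimate */
        e->has_sample = 1;
    } else {
        e->rtt_est = a * e->rtt_est + (1.0 - a) * sample_ms;
    }
    return e->rtt_est;
}

/* Time Out = b * RTT(est), with b > 1. */
double rtt_timeout(const rtt_estimator *e, double b)
{
    return b * e->rtt_est;
}
```

With a = 0.75, a first sample of 100 ms followed by a sample of 200 ms moves the estimate only to 125 ms, which shows how a high 'a' damps short-lived changes.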
State Diagram
The state diagram approach to view the TCP connection establishment and closing
simplifies the design of TCP implementation. The idea is to represent the TCP connection
state, which progresses from one state to other as various messages are exchanged. To
simplify the matter, we considered two state diagrams, viz., for TCP connection
establishment and TCP connection closing.
Fig 1 shows the state diagram for the TCP connection establishment and associated table
briefly explains each state.
TCP Connection establishment
The table gives brief description of each state of the above diagram.
Table 1. State descriptions
State         Description
LISTEN        Represents the state when waiting for a connection request from any
              remote host and port. This specifically applies to a server.
              From this state, the server can close the service or actively open a
              connection by sending SYN.
TIME_WAIT     Represents waiting long enough for the packets to reach their
              destination. This waiting time is usually 4 min.
CLOSE_WAIT    Represents the state when the server receives a FIN from the remote
              TCP, sends an ACK, and issues a close call, sending a FIN.
Quiet Time
It might happen that a host currently in communication crashes and reboots. At startup
time, all the data structures and timers will be reset to an initial value. To make sure that
earlier connection packets are gracefully rejected, the local host is not allowed to make
any new connection for a small period at startup. This time will be set in accordance with
reboot time of the operating system.
The first option is recommended here because the assumption is that this queue of
requests is a coincidence and that, some time later, the server should be free to process the new
request. Hence if we drop the packet, the client will go through the time-out and
retransmission and server will be free to process it.
Also, Standard TCP does not define any strategy/option of knowing who requested the
connection. Only Solaris 2.2 supports this option.
Delayed Acknowledgment
TCP will piggyback the acknowledgment on its data. But if the peer does not have
any data to send at that moment, the acknowledgment should not be delayed too long.
Hence a timer for 200 ms will be used. At every 200 ms, TCP will check for any
acknowledgment to be sent and send them as individual packets.
Small packets
TCP implementation discourages small packets. Especially if a previous relatively large
packet has been sent and no acknowledgment has been received so far, then this small
packet will be stored in the buffer until the situation improves.
But there are some applications for which delayed data is worse than bad data. For
example, in telnet, each key stroke will be processed by the server and hence no delay
should be introduced. As we have seen in Unix network programming, the TCP_NODELAY
socket option can be set so that small packets are not discouraged.
Retransmission Timeout
In some implementations (e.g. Linux), RTO = RTT + 4 * delay variance is used instead
of the constant factor 2.
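A rough C sketch of this variance-based timeout is given below. It follows the update rules of RFC 6298, which track the mean deviation of the RTT rather than a true statistical variance; the gains 1/8 and 1/4 and the initial deviation of half the first sample come from that RFC, not from these notes. The names rto_state and rto_update are hypothetical.

```c
/* Hypothetical estimator state, all values in milliseconds. */
typedef struct {
    double srtt;    /* smoothed round trip time */
    double rttvar;  /* smoothed mean deviation of the RTT */
    int    seeded;  /* 0 until the first sample arrives */
} rto_state;

/* Fold in one RTT sample and return RTO = srtt + 4 * rttvar.
   Gains alpha = 1/8 and beta = 1/4 are taken from RFC 6298 (an assumption
   here; the notes do not state them). */
double rto_update(rto_state *s, double sample_ms)
{
    const double alpha = 0.125, beta = 0.25;

    if (!s->seeded) {
        s->srtt = sample_ms;
        s->rttvar = sample_ms / 2.0;
        s->seeded = 1;
    } else {
        double err = s->srtt - sample_ms;
        if (err < 0) err = -err;                 /* |srtt - sample| */
        /* The deviation is updated first, using the old srtt. */
        s->rttvar = (1.0 - beta) * s->rttvar + beta * err;
        s->srtt = (1.0 - alpha) * s->srtt + alpha * sample_ms;
    }
    return s->srtt + 4.0 * s->rttvar;
}
```

When samples are steady the deviation term shrinks and the timeout approaches the RTT itself; a noisy path keeps the deviation, and hence the safety margin, large.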
Also, instead of calculating RTT(est) from scratch, a cache will be used to store the
history from which new values are calculated, as discussed in the previous classes.
Standard values for Maximum Segment Life (MSL) will be between 0.5 and 2 minutes,
and Time wait state = f(MSL)
Persist Timer
As we saw in TCP window management, when the source sends one full window of packets,
it will set its window size to 0 and expect an ACK from the remote TCP to increase its
window size. Suppose such an ACK has been sent and is lost. The source will then have
current window size = 0 and cannot send, while the destination is expecting the next byte. To avoid
such a deadlock, a Persist Timer will be used. When this timer goes off, the source will
send the last byte again, in the hope that the situation has improved and an ACK to
increase the current window size will be received.
What is a Socket ?
In Unix, whenever there is a need for inter-process communication within the same
machine, we use mechanisms like signals or pipes (named or unnamed). Similarly, when
we desire communication between two applications possibly running on different
machines, we need sockets. Sockets are treated as another entry in the Unix open file
table, so all the system calls which can be used for any I/O in Unix can be used on a socket.
The server and client applications use various system calls to connect, all of which use the basic
construct called a socket. A socket is one end of the communication channel between two
applications running on different machines.
The steps followed by a client are:
1. Create a socket
2. Connect the socket to the address of the server
3. Send/Receive data
4. Close the socket
The steps followed by a server are:
1. Create a socket
2. Bind the socket to the port number known to all clients
3. Listen for the connection request
4. Accept connection request
5. Send/Receive data
AF stands for Address Family and PF stands for Protocol Family. In most modern
implementations only the AF is being used. The various kinds of AF are as follows:
Name                  Purpose
AF_UNIX, AF_LOCAL     Local communication
AF_INET               IPv4 Internet protocols
AF_INET6              IPv6 Internet protocols
AF_IPX                IPX - Novell protocols
AF_NETLINK            Kernel user interface device
AF_X25                ITU-T X.25 / ISO-8208 protocol
AF_AX25               Amateur radio AX.25 protocol
AF_ATMPVC             Access to raw ATM PVCs
AF_PACKET             Low level packet interface
In all the sample programs given below, we will be using AF_INET.
struct sockaddr_in: This construct holds the information about the address family, port
number and Internet address, padded to the size of struct sockaddr.
struct sockaddr_in {
short int sin_family; // Address family
unsigned short int sin_port; // Port number
struct in_addr sin_addr; // Internet address
unsigned char sin_zero[8]; // Same size as struct sockaddr
};
Some systems (like the Intel x86 family) are Little Endian, i.e. the least significant byte is
stored at the lowest address, whereas in Big Endian systems the most significant byte is stored
at the lowest address. Consider a situation where a Little Endian system wants to communicate
with a Big Endian one: if there is no standard for data representation, then the data sent by
one machine is misinterpreted by the other. So a standard has been defined for data
representation in the network (called Network Byte Order), which is Big Endian. The
calls that convert a short/long from Host Byte Order to Network Byte
Order and vice versa are htons(), htonl(), ntohs() and ntohl().
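As a small illustration of byte order conversion, the sketch below uses the standard htons() and ntohl() calls from <arpa/inet.h>. The helper names put_port and get_u32 are hypothetical and exist only for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit port into a 2-byte buffer in Network Byte Order. */
void put_port(uint8_t out[2], uint16_t host_port)
{
    uint16_t net = htons(host_port);   /* host -> network (Big Endian) */
    memcpy(out, &net, sizeof net);
}

/* Read a 32-bit value that arrived in Network Byte Order. */
uint32_t get_u32(const uint8_t in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);                 /* network -> host */
}
```

Whatever the byte order of the host, put_port writes the most significant byte first, so both ends of a connection agree on the value.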
IP addresses
Assuming that we are dealing with IPv4 addresses, the address is a 32bit integer.
Remembering a 32 bit number is not convenient for humans. So, the address is written as
a set of four integers separated by dots, where each integer is a representation of 8 bits.
The representation is like a.b.c.d, where a is the representation of the most significant
byte. The function which converts this representation into Network Byte Order is:
int inet_aton(const char *cp, struct in_addr *inp);
inet_aton() converts the Internet host address cp from the standard numbers-and-dots
notation into binary data and stores it in the structure that inp points to. inet_aton returns
nonzero if the address is valid, zero if not.
For example, if we want to initialize the sockaddr_in construct by the IP address and
desired port number, it is done as follows:
struct sockaddr_in sockaddr;
sockaddr.sin_family = AF_INET;
sockaddr.sin_port = htons(21);
inet_aton("", &(sockaddr.sin_addr));
memset(&(sockaddr.sin_zero), '\0', 8);
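The same initialization can be wrapped in a helper that also reports an invalid address. This is a minimal sketch: the function name make_ipv4_addr is hypothetical, and the address 192.0.2.1 mentioned below is a documentation-only example address, not one taken from these notes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *out for the given dotted-quad address and port.
   Returns 1 on success and 0 if ip is not a valid address,
   mirroring the return convention of inet_aton(). */
int make_ipv4_addr(struct sockaddr_in *out, const char *ip, uint16_t port)
{
    memset(out, 0, sizeof *out);      /* also clears sin_zero */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);      /* Network Byte Order */
    return inet_aton(ip, &out->sin_addr) != 0;
}
```

A call such as make_ipv4_addr(&sa, "192.0.2.1", 21) prepares sa for a later connect() or bind(); checking the return value catches malformed address strings early.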
Since we have only one protocol for each kind of socket, it does not matter if we
do not define any protocol at all. So for simplicity, we can put "0" (zero) in the
protocol field.
Server: It is normally defined as the program which provides some services to the client programs.
However, we will have a deeper look at the concept of a "service" in this respect later.
The most important feature of a server is that it is a passive entity, one that listens for
requests from the clients.
Client: It is the active entity of the architecture, one that generates the request to connect
to a particular port number on a particular server.
Communication takes the form of the client process sending a message over the network
to the server process. The client process then waits for a reply message. When the server
process gets the request, it performs the requested work and sends back a reply. The server
that the client will try to connect to should be up and running before the client can be
executed. In most cases, the server runs continuously as a daemon.
There is a general misconception that a server necessarily provides some service and is
therefore called a server. For example, an e-mail client provides as much service as a
mail server does. Actually the term service is not very well defined. So it would be better
not to refer to the term at all. In fact servers can be programmed to do practically
anything that a normal application can do. In brief, a server is just an entity that
listens/waits for requests.
To send a request, the client needs to know the address of the server as well as the port
number which has to be supplied to establish a connection. One option is to make the
server choose a random number as a port number, which will be somehow conveyed to
the client. Subsequently the client will use this port number to send requests. This method
has severe limitations as such information has to be communicated offline, the network
connection not yet being established. A better option would be to ensure that the server
runs on the same port number always and the client already has knowledge as to which
port provides which service. Such a standardization already exists. The port numbers 0-
1023 are reserved for the use of the superuser only. The list of the services and the ports
can be found in the file /etc/services.
Connectionless Communication
This is analogous to the postal service. Packets (letters) are sent one at a time to a particular
destination. For greater reliability, the receiver may send an acknowledgement (a receipt
for the registered letters).
Based on these two types of communication, two kinds of sockets are used: stream sockets
(SOCK_STREAM) for connection-oriented communication and datagram sockets (SOCK_DGRAM)
for connectionless communication.
The sequence of system calls that have to be made in order to set up a connection is given
below.
1. The socket system call is used to obtain a socket descriptor on both the client and
the server. Both these calls need not be synchronous or related in the time at
which they are called. The synopsis is given below:
int socket(int domain, int type, int protocol);
3. Both the client and the server 'bind' to a particular port on their machines using
the bind system call. This function has to be called only after a socket has been
created and has to be passed the socket descriptor returned by the socket call.
Again this binding on both the machines need not be in any particular order.
Moreover the binding procedure on the client is entirely optional. The bind
system call requires the address family, the port number and the IP address. The
address family is known to be AF_INET, the IP address of the client is already
known to the operating system. All that remains is the port number. Of course the
programmer can specify which port to bind to, but this is not necessary. The
binding can be done on a random port as well and still everything would work
fine. The way to make this happen is not to call bind at all. Alternatively bind can
be called with the port number set to 0. This tells the operating system to assign a
random port number to this socket. This way whenever the program tries to
connect to a remote machine through this socket, the operating system binds this
socket to a random local port. This procedure as mentioned above is not
applicable to a server, which has to listen at a standard predetermined port.
4. The next call, listen, has to be made on the server. The synopsis of the listen
call is given below.
int listen(int skfd, int backlog);
5. skfd is the socket descriptor of the socket on which the machine should start
listening.
backlog is the maximum length of the queue for accepting requests.
6. The listen system call signifies that the server is willing to accept connections
and thereby start communicating.
7. Actually what happens is that in the TCP suite, there are certain messages that are
sent to and fro and certain initializations have to be performed. Some finite
amount of time is required to setup the resources and allocate memory for
whatever data structures that will be needed. In this time if another request arrives
at the same port, it has to wait in a queue. Now this queue cannot be arbitrarily
large. After the queue reaches a particular size limit no more requests are
accepted by the operating system. This size limit is precisely the backlog
argument in the listen call and is something that the programmer can set. Today's
processors are pretty speedy in their computations and memory allocations. So
under normal circumstances the length of the queue never exceeds 2 or 3. Thus a
backlog value of 2-3 would be fine, though the value typically used is around
5. Note that the backlog is different from the concept of "parallel" connections. The
established connections are not counted in the backlog. So, we may have 100 parallel
connections running at a time when the backlog is 5.
9. The connect function is then called on the client with three arguments, namely the
socket descriptor, the remote server address and the length of the address data
structure. The synopsis of the function is as follows:
#include <netinet/in.h> /* only for AF_INET, or the INET Domain */
int connect(int skfd, struct sockaddr* addr, int addrlen);
10. The request generated by this connect call is processed by the remote server and is
placed in an operating system buffer, waiting to be handed over to the application
which will be calling the accept function. The accept call is the mechanism by
which the networking program on the server receives the requests that have been
accepted by the operating system. The synopsis of the accept system call is given
below:
int accept(int skfd, struct sockaddr* addr, int* addrlen);
skfd is the socket descriptor of the socket on which the machine had performed a
listen call and now desires to accept a request on that socket.
addr is the address structure that will be filled in by the operating system by the
port number and IP address of the client which has made this request. This
sockaddr pointer can be cast to a sockaddr_in pointer for subsequent
operations on it.
addrlen points to the length of the socket address structure, the pointer to which is
the second argument.
connection: defined as a 4-tuple : (Local IP, Local port, Foreign IP, Foreign port)
For each connection at least one of these has to be unique. Therefore multiple
connections on one port of the server are actually different connections.
11. Finally when both connect and accept return the connection has been established.
12. The socket descriptors that are with the server and the client can now be used
identically as a normal I/O descriptor. Both the read and the write calls can be
performed on this socket descriptor. The close call can be performed on this
descriptor to close the connection. Man pages on any UNIX type system will
furnish further details about these generic I/O calls.
13. Variants of read and write also exist, which were specifically designed for
networking applications. These are recv and send.
Except for the flags argument the rest is identical to the arguments of the read and
write calls. Possible values for the flags are:
14. To close a particular connection the shutdown call can also be used to achieve
greater flexibility.
SHUT_RD     0   stop all read operations on this socket, but continue writing
SHUT_WR     1   stop all write operations on this socket, but keep receiving data
SHUT_RDWR   2   same as close
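The connection-oriented sequence described above (socket, bind, listen on the server; socket, connect on the client; then accept and ordinary read/write) can be sketched as follows. This is a minimal illustration restricted to the loopback address so that it can be tried on one machine; the function names tcp_listen_loopback and tcp_connect_loopback are hypothetical. Passing port 0 to bind lets the operating system choose a free port, which is then read back with getsockname.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: socket -> bind -> listen on 127.0.0.1.
   If *port is 0 the OS picks a free port; the port in use is written back
   to *port. Returns the listening descriptor, or -1 on error. */
int tcp_listen_loopback(unsigned short *port, int backlog)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(*port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Client side: socket -> connect. No bind is done, so the OS assigns
   a local port. Returns the connected descriptor, or -1 on error. */
int tcp_connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After tcp_connect_loopback returns, the server calls accept on the listening descriptor to obtain a new descriptor for that client, and both sides can then use read, write and close on their descriptors.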
int sendto(int skfd, void *buf, int buflen, int flags, struct sockaddr* to, int tolen);
int recvfrom(int skfd, void *buf, int buflen, int flags, struct sockaddr* from, int
*fromlen);
sendto sends a datagram packet containing the data present in buf addressed to
the address present in the sockaddr structure, to.
recvfrom fills in the buf structure with the data received from a datagram packet
and the sockaddr structure, from with the address of the client from which the
packet was received.
Both these calls block until a packet is sent in case of sendto and a packet is
received in case of recvfrom. In the strict sense though sendto is not blocking as
the packet is sent out in most cases and sendto returns immediately.
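A connectionless exchange with sendto and recvfrom might look like the sketch below. It is limited to the loopback address for illustration, and the helper names udp_bind_loopback and udp_send_loopback are hypothetical. Note that no listen, connect or accept call is involved.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a datagram socket bound to 127.0.0.1. If *port is 0 the OS picks
   a free port; the port in use is written back to *port.
   Returns the descriptor, or -1 on error. */
int udp_bind_loopback(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(*port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Send one datagram from fd to 127.0.0.1:port.
   Returns the number of bytes sent, or -1 on error. */
ssize_t udp_send_loopback(int fd, unsigned short port,
                          const void *buf, size_t len)
{
    struct sockaddr_in to;

    memset(&to, 0, sizeof to);
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    to.sin_port = htons(port);

    return sendto(fd, buf, len, 0, (struct sockaddr *)&to, sizeof to);
}
```

The receiver calls recvfrom on its own bound socket; the from structure it gets back carries the sender's address and port, which is what a server would use to address its reply.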
There are various options which can be set for a socket and there are multiple ways to set
options that affect a socket.
Of these, the setsockopt() system call is the one specifically designed for this purpose. Also,
we can retrieve the options which are currently set for a socket by means of the getsockopt()
system call.
int setsockopt(int socket, int level, int option_name, const void *option_value,
socklen_t option_len);
The socket argument must refer to an open socket descriptor. The level specifies who in
the system is to interpret the option: the general socket code, the TCP/IP code, or the
XNS code. This function sets the option specified by the option_name, at the protocol
level specified by the level, to the value pointed to by the option_value for the socket
associated with the file descriptor specified by the socket. The level argument specifies
the protocol level at which the option resides. To set options at the socket level, we need
to specify the level argument as SOL_SOCKET. To set options at other levels, we need
to supply the appropriate protocol number for the protocol controlling the option. The
option_name specifies a single option to set. The option_name and any specified options
are passed uninterpreted to the appropriate protocol module for interpretations. The list of
options available at the socket level (SOL_SOCKET) are:
SO_BROADCAST
Permits sending of broadcast messages, if this is supported by the protocol. This option
takes an int value. This is a boolean option.
SO_REUSEADDR
Specifies that the rules used in validating addresses supplied to bind() should allow reuse
of local addresses, if this is supported by the protocol. This option takes an int value. This
is a boolean option.
SO_LINGER
Lingers on a close() if data is present. This option controls the action taken when unsent
messages queue on a socket and close() is performed. If SO_LINGER is set, the system
blocks the process during close() until it can transmit the data or until the time expires. If
SO_LINGER is not specified, and close() is issued, the system handles the call in a way
that allows the process to continue as quickly as possible. This option takes a linger
structure, as defined in the <sys/socket.h> header, to specify the state of the option and
linger interval.
SO_OOBINLINE
Leaves received out-of-band data (data marked urgent) in line. This option takes an int
value. This is a boolean option.
SO_DONTROUTE
Requests that outgoing messages bypass the standard routing facilities. The destination
must be on a directly-connected network, and messages are directed to the appropriate
network interface according to the destination address. The effect, if any, of this option
depends on what protocol is in use. This option takes an int value. This is a boolean
option.
SO_RCVLOWAT
Sets the minimum number of bytes to process for socket input operations. The default
value for SO_RCVLOWAT is 1. If SO_RCVLOWAT is set to a larger value, blocking
receive calls normally wait until they have received the smaller of the low water mark
value or the requested amount. (They may return less than the low water mark if an error
occurs, a signal is caught, or the type of data next in the receive queue is different than
that returned, e.g. out of band data). This option takes an int value. Note that not all
implementations allow this option to be set.
SO_RCVTIMEO
Sets the timeout value that specifies the maximum amount of time an input function waits
until it completes. It accepts a timeval structure with the number of seconds and
microseconds specifying the limit on how long to wait for an input operation to complete.
If a receive operation has blocked for this much time without receiving additional data, it
returns with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data
were received. The default for this option is zero, which indicates that a receive operation
will not time out. This option takes a timeval structure. Note that not all implementations
allow this option to be set.
SO_SNDLOWAT
Sets the minimum number of bytes to process for socket output operations. Non-blocking
output operations will process no data if flow control does not allow the smaller of the
send low water mark value or the entire request to be processed. This option takes an int
value. Note that not all implementations allow this option to be set.
SO_SNDTIMEO
Sets the timeout value specifying the amount of time that an output function blocks
because flow control prevents data from being sent. If a send operation has blocked for
this time, it returns with a partial count or with errno set to [EAGAIN] or
[EWOULDBLOCK] if no data were sent. The default for this option is zero, which
indicates that a send operation will not time out. This option stores a timeval structure.
Note that not all implementations allow this option to be set.
For boolean options, 0 indicates that the option is disabled and 1 indicates that the option
is enabled. Options at other protocol levels vary in format and name.
TCP_MAXSEG
Returns the maximum segment size in use for the socket. The typical value for a 4.3BSD
socket using an Ethernet is 1024 bytes.
TCP_NODELAY
When TCP is being used for a remote login, there will be many small data packets sent
from the client's system to the server. Each packet can contain a single character that the
user enters, which is sent to the server for echoing and processing. It might be desirable to
reduce the number of such small packets by combining a number of them into one big
packet. But this causes a delay between the typing of a character by the user and its
appearance on the monitor. This is certainly not something the user will appreciate. For
such services it is desirable that the client's packets be sent as soon as they are ready. The
TCP_NODELAY option is used for these clients to defeat the buffering algorithm, and
allow the client's TCP to send small packets as soon as possible.
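Setting this option could look like the sketch below. TCP_NODELAY is defined in <netinet/tcp.h> and is set at the IPPROTO_TCP level rather than at SOL_SOCKET; the wrapper name set_tcp_nodelay is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable the small-packet buffering (on = 1) or restore it (on = 0)
   for a TCP socket. Returns 0 on success, -1 on error. */
int set_tcp_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```

An interactive client such as a remote login program would call set_tcp_nodelay(fd, 1) once after creating its socket.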
int getsockopt(int socket, int level, int option_name, void *option_value, socklen_t
*option_len);
This function retrieves the value for the option specified by the option_name argument
for the socket. If the size of the option value is greater than option_len, the value stored in
the object pointed to by the option_value will be silently truncated. Otherwise, the object
pointed to by the option_len will be modified to indicate the actual length of the value.
The level specifies the protocol level at which the option resides. To retrieve options at
the socket level, we need to specify the level argument as SOL_SOCKET. To retrieve
options at other levels, we need to supply the appropriate protocol number for the
protocol controlling the option. The socket in use may require the process to have
appropriate privileges to use the getsockopt() function. The list of options for
option_name is the same as those available for setsockopt() system call.
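Reading an option back can be sketched as follows for options that take an int value. The wrapper name get_int_option is hypothetical; it only shows how option_len is passed as a pointer that holds the buffer size on entry and the actual length on return.

```c
#include <sys/socket.h>

/* Read an int-valued socket option into *value.
   Returns 0 on success, -1 on error. */
int get_int_option(int fd, int level, int name, int *value)
{
    socklen_t len = sizeof *value;   /* in: buffer size, out: bytes written */
    return getsockopt(fd, level, name, value, &len);
}
```

For example, get_int_option(fd, SOL_SOCKET, SO_TYPE, &v) reports whether fd is a SOCK_STREAM or SOCK_DGRAM socket.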
Normally a client process is started on the same system or on another system that is
connected to the server's system with a network. Client processes are often initiated by
the interactive user entering a command to a time-sharing system. The client process
sends a request across the network to the server requesting service of some form. In this
way, normally a server handles only one client at a time. If multiple client connections
arrive at about the same time, the kernel queues them up to some limit, and returns them
to accept function one at a time. But if the server takes more time to service each client
(say a few seconds or minutes), we would need some way to overlap the service of one
client with another client. Multiple requests can be handled in two ways. In fact servers
are classified on the basis of the way they handle multiple requests.
Servers must keep running at all times. However, all the servers are not working all this
time but merely waiting for a request from a client. To avoid this waste of resources, a
single server is run which waits on all the port numbers. This Internet super server is
called inetd in UNIX. It is referred to as the "Internet Super-Server" because it manages
connections for several daemons. Programs that provide network service are commonly
known as daemons. inetd serves as a managing server for other daemons. When a
connection is received by inetd, it determines which daemon the connection is destined
for, spawns the particular daemon and delegates the socket to it. Running one instance of
inetd reduces the overall system load as compared to running each daemon individually
in stand-alone mode. This daemon provides two features -
2. It simplifies the writing of the server processes to handle the requests, since many of
the start-up details are handled by inetd.
The inetd process uses the fork and exec system calls to invoke the actual server process. The
only way the server can obtain the identity of the client is by calling the getpeername
system call.
Getpeername returns the name of the peer connected to socket sockfd. The addrlen
parameter should be initialised to indicate the amount of space pointed to by peer. On
return it contains the actual size of the name returned (in bytes). The name is truncated if
the buffer provided is too small.
A similar system call is getsockname. Getsockname returns the current name for the
specified socket. The addrlen parameter should be initialized to indicate the amount of
space pointed to by name. On return it contains the actual size of the name returned (in
bytes).
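The two calls can be tried out over the loopback interface. The following is a small sketch in Python; the helper loopback_pair is hypothetical and assumes that 127.0.0.1 is usable on the machine.

```python
import socket

def loopback_pair():
    """Return (client, server_side) TCP sockets connected over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    return client, server_side

client, server_side = loopback_pair()
# The server learns who its client is with getpeername() ...
print("peer of server:", server_side.getpeername())
# ... and its own local address with getsockname().
print("name of server:", server_side.getsockname())
client.close()
server_side.close()
```

In Python the length bookkeeping (addrlen) is handled by the library, so only the address itself is returned.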
Topics in TCP
TCP Congestion Control
If the receiver advertises a large window size, larger than what the network en route can
handle, then there will invariably be packet losses, and hence retransmissions as
well. However, the sender cannot resend all the packets for which an ACK has not been
received, because this would cause even more congestion in the network.
Moreover, the sender at this point cannot be sure how many packets have
actually been lost. It might be that only one packet has been lost, and some
following it have actually been received and buffered by the receiver. In that case, the
sender would have unnecessarily resent a number of packets.
Congestion Window
We have already seen one bound on the amount of data the sender may send, namely
the receiver window that the receiver advertises. However, there could be a bottleneck
created by some intermediate network that is getting clogged up. The net effect is that
just having the receiver window is not enough. There should be some bound relating to
the congestion of the network path; the congestion window captures exactly this bound.
Similar to the receiver window, we have another window, the Congestion Window, and the
maximum amount of data sent is bounded by the minimum of the sizes of the two
windows. E.g., if the receiver says "send 8K" (size of the receiver window), but the
sender knows that bursts of more than 4K (size of the congestion window) clog the network
up, then it sends 4K. On the other hand, if the congestion window were of size 32K, then
the sender would send at most 8K.
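The rule can be written down in one line. The function name usable_window below is only illustrative.

```python
def usable_window(receiver_window: int, congestion_window: int) -> int:
    """The sender is limited by whichever of the two windows is smaller."""
    return min(receiver_window, congestion_window)

K = 1024
print(usable_window(8 * K, 4 * K))    # congestion window is the bottleneck
print(usable_window(8 * K, 32 * K))   # receiver window is the bottleneck
```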
Notice that TCP always tries to keep the flow rate slightly below the maximum value. So
if the network traffic fluctuates slightly, then a lot of packets might be lost. Packet losses
cause a terrible loss in throughput.
In all these schemes, we have been assuming that any packet loss occurs only due to
network congestion. What happens if some packet loss occurs not due to some congestion
but due to some random factors?
When a packet is lost, the congestion window size is set to 1. Then when we retransmit
the packet, if we receive a cumulative ACK for a lot of subsequent packets, we can
assume that the packet loss was not due to congestion, but because of some random
factors. So we give up slow start and straightaway set the size of Congestion Window to
the threshold value.
Nagle's algorithm
When data comes to the sender one byte at a time, send the first byte and buffer all the
remaining bytes till the outstanding byte is acknowledged. Then send all the buffered
characters in one segment and start buffering again till they are acknowledged. This can help
reduce the bandwidth usage, for example when the user is typing quickly into a telnet
connection and the network is slow.
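The behaviour described above can be simulated in a few lines. This is a simplified sketch: the event format is an assumption made for the example, and the maximum segment size is ignored.

```python
def nagle_segments(events):
    """Return the segments sent for a list of ("data", b"...") and ("ack",) events."""
    sent, buffered, unacked = [], b"", False
    for event in events:
        if event[0] == "data":
            if unacked:
                buffered += event[1]      # something is in flight: hold the data
            else:
                sent.append(event[1])     # nothing outstanding: send at once
                unacked = True
        elif event[0] == "ack":
            unacked = False
            if buffered:
                sent.append(buffered)     # flush everything held so far as one segment
                buffered, unacked = b"", True
    return sent

typed = [("data", b"h"), ("data", b"e"), ("data", b"l"), ("ack",),
         ("data", b"l"), ("data", b"o"), ("ack",)]
print(nagle_segments(typed))
```

Five typed characters leave the sender in three segments instead of five.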
Persistent Timer
Consider the following deadlock situation. The receiver sends an ACK with a zero-sized
window, telling the sender to wait. Later it sends an ACK with a non-zero window, but this
ACK packet gets lost. Then both the receiver and the sender will be waiting for each
other to do something. So we keep another timer. When this timer goes off, the sender
transmits a probe packet to the receiver with an ACK number that is old. The receiver
responds with an ACK with the updated window size and transmission resumes.
Now we look at the solutions to the last two problems, namely the Problem of Random
Losses and Sequence Number Wrap Around.
Selective Acknowledgement
We need a selective acknowledgement, but that creates a problem in TCP because we use
byte sequence numbers. So what we do is send the sequence number and the
length. We may have to send a large number of such Selective Acknowledgements, which
will increase the overhead. So whenever we get out-of-sequence packets we send the
information a few times, not in all the packets. Hence the sender cannot rely on Selective
Acknowledgement alone. If we have a 32-bit sequence number and a 32-bit length, then
we already have too much overhead. One proposal is to use a 16-bit length field. If
we have very small gaps then we assume that random losses have occurred and we need to
fill them. If large gaps are there we assume that there is congestion and we need to slow
down.
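Kind: 8
Length: 10 bytes
+--------+--------+-------------------+-------------------+
| Kind=8 |   10   |     TS Value      |   TS Echo Reply   |
+--------+--------+-------------------+-------------------+
    1        1             4                   4            (length in bytes)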
The Timestamps option carries two four-byte timestamp fields. The Timestamp Value
field (TSval) contains the current value of the timestamp clock of the TCP sending the
option. The Timestamp Echo Reply field (TSecr) is only valid if the ACK bit is set in the
TCP header; if it is valid, it echoes a timestamp value that was sent by the remote TCP in
the TSval field of a Timestamps option. When TSecr is not valid, its value must be zero.
The TSecr value will generally be the time stamp for the last in-sequence packet received.
Sequence of packets sent:      1 (t1)  2 (t2)  3 (t3)  4 (t4)  5 (t5)  6 (t6)
Sequence of packets received:  1  2  4  3  5  6
Timestamp copied in ACK:       t1  t2  t3
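The 10-byte layout of the Timestamps option (kind, length, TSval, TSecr) can be encoded and decoded with Python's struct module. The function names below are hypothetical; this is only a sketch of the wire format, not of a TCP implementation.

```python
import struct

# kind (1 byte), length (1 byte), TSval (4 bytes), TSecr (4 bytes), network byte order
TS_OPTION = struct.Struct("!BBII")

def pack_timestamps(tsval: int, tsecr: int) -> bytes:
    """Build a Timestamps option; TSecr should be 0 when the ACK bit is not set."""
    return TS_OPTION.pack(8, 10, tsval & 0xFFFFFFFF, tsecr & 0xFFFFFFFF)

def unpack_timestamps(option: bytes):
    """Return (TSval, TSecr) from a 10-byte Timestamps option."""
    kind, length, tsval, tsecr = TS_OPTION.unpack(option)
    if kind != 8 or length != 10:
        raise ValueError("not a Timestamps option")
    return tsval, tsecr
```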
In both the PAWS and the RTTM mechanism, the "timestamps" are 32-bit unsigned
integers in a modular 32-bit space. Thus, "less than" is defined the same way it is for TCP
sequence numbers, and the same implementation techniques apply. If s and t are
timestamp values, s < t if 0 < (t - s) < 2**31, computed in unsigned 32-bit arithmetic.
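Python integers are unbounded, so the unsigned 32-bit subtraction has to be made explicit with a mask. A small sketch of the comparison (the function name ts_less_than is hypothetical):

```python
def ts_less_than(s: int, t: int) -> bool:
    """True if timestamp s is 'before' t in modular 32-bit space."""
    diff = (t - s) & 0xFFFFFFFF       # unsigned 32-bit subtraction
    return 0 < diff < 2**31

print(ts_less_than(1, 2))             # ordinary case
print(ts_less_than(0xFFFFFFF0, 5))    # t has wrapped around past zero
```

Both calls report that s precedes t, even though in the second case t is numerically smaller.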
The choice of incoming timestamps to be saved for this comparison must guarantee a
value that is monotone increasing. For example, we might save the timestamp from the
segment that last advanced the left edge of the receive window, i.e., the most recent in-
sequence segment. Instead, we choose the value TS.Recent for the RTTM mechanism,
since using a common value for both PAWS and RTTM simplifies the implementation of
both. TS.Recent differs from the timestamp from the last in-sequence segment only in
the case of delayed ACKs, and therefore by less than one window. Either choice will
therefore protect against sequence number wrap-around.
RTTM was specified in a symmetrical manner, so that TSval timestamps are carried in
both data and ACK segments and are echoed in TSecr fields carried in returning ACK or
data segments. PAWS submits all incoming segments to the same test, and therefore
protects against duplicate ACK segments as well as data segments. (An alternative un-
symmetric algorithm would protect against old duplicate ACKs: the sender of data would
reject incoming ACK segments whose TSecr values were less than the TSecr saved from
the last segment whose ACK field advanced the left edge of the send window. This
algorithm was deemed to lack economy of mechanism and symmetry.)
TSval timestamps sent on {SYN} and {SYN,ACK} segments are used to initialize
PAWS. PAWS protects against old duplicate non-SYN segments, and duplicate SYN
segments received while there is a synchronized connection. Duplicate {SYN} and
{SYN,ACK} segments received when there is no connection will be discarded by the
normal 3-way handshake and sequence number checks of TCP.
Header Prediction
For each incoming packet we want to know which TCP connection it belongs to.
Matching the header of every new packet against the database of connections takes a lot
of time, so we first compare the header with that of the last received
packet. On average this reduces the work, on the assumption that the packet comes from the
same TCP connection as the previous one (locality principle).
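The idea amounts to a one-entry cache in front of the connection lookup. The sketch below is illustrative only: the class name, the key format (source address, source port, destination address, destination port) and the hit counter are all assumptions made for the example.

```python
class ConnectionTable:
    """Connection lookup with a one-entry cache for the last matched connection."""

    def __init__(self):
        self.table = {}           # (src_ip, src_port, dst_ip, dst_port) -> state
        self.last_key = None
        self.last_state = None
        self.fast_hits = 0        # how often the prediction was right

    def add(self, key, state):
        self.table[key] = state

    def lookup(self, key):
        if key == self.last_key:  # same connection as the previous packet
            self.fast_hits += 1
            return self.last_state
        state = self.table[key]   # fall back to the full lookup
        self.last_key, self.last_state = key, state
        return state
```

When most consecutive packets belong to the same connection, nearly all lookups take the fast path.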
UDP's main purpose is to abstract network traffic in the form of datagrams. A datagram
comprises one single "unit" of binary data; the first eight (8) bytes of a datagram contain
the header information and the remaining bytes contain the data itself.
UDP Headers
The UDP header consists of four (4) fields of two bytes each:
source port    destination port
length         checksum
UDP port numbers allow different applications to maintain their own "channels" for data;
both UDP and TCP use this mechanism to support multiple applications sending and
receiving data concurrently. The sending application (that could be a client or a server)
sends UDP datagrams through the source port, and the recipient of the packet accepts this
datagram through the destination port. Some applications use static port numbers that are
reserved for or registered to the application. Other applications use dynamic
(unregistered) port numbers. Because the UDP port fields are two bytes long, valid port
numbers range from 0 to 65535; by convention, values above 49151 represent dynamic
ports.
The datagram size is a simple count of the number of bytes contained in the header and
data sections. Because the header length is a fixed size, this field essentially refers to the
length of the variable-sized data portion (sometimes called the payload). The maximum
size of a datagram varies depending on the operating environment. With a two-byte size
field, the theoretical maximum size is 65535 bytes. However, some implementations of
UDP restrict the datagram to a smaller number -- sometimes as low as 8192 bytes.
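As a sketch, the eight-byte header (source port, destination port, length, checksum) can be split off a datagram with Python's struct module. The function name parse_udp and the returned dictionary are illustrative choices.

```python
import struct

# source port, destination port, length, checksum: four 16-bit fields, network byte order
UDP_HEADER = struct.Struct("!HHHH")

def parse_udp(datagram: bytes) -> dict:
    """Split a raw UDP datagram into its header fields and payload."""
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    payload = datagram[UDP_HEADER.size:length]   # length counts header plus data
    return {"src": src, "dst": dst, "length": length,
            "checksum": checksum, "payload": payload}
```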
UDP checksums work as a safety feature. The checksum value represents an encoding of
the datagram data that is calculated first by the sender and later by the receiver. Should an
individual datagram be tampered with (due to a hacker) or get corrupted during
transmission (due to line noise, for example), the calculations of the sender and receiver
will not match, and the UDP protocol will detect this error. The algorithm is not
foolproof, but it is effective in many cases. In UDP, check summing is optional -- turning it
off squeezes a little extra performance from the system -- as opposed to TCP where
checksums are mandatory. It should be remembered that check summing is optional only
for the sender, not the receiver. If the sender has used checksum then it is mandatory for
the receiver to do so.
Usage of the checksum in UDP is optional. If the sender does not use it, it sets the
checksum field to all 0's. If the sender does compute the checksum, then the recipient
must also compute the checksum and compare it with the field. If the checksum is
calculated and turns out to be all 0's, then the sender sends all 1's instead of all 0's. This is
possible since, in the algorithm for checksum computation used by UDP, a checksum of all 1's is
equivalent to a checksum of all 0's. Now the checksum field is unambiguous for the
recipient: if it is all 0's then the checksum has not been used; in any other case the checksum
has to be computed.
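A sketch of the computation follows. UDP uses the 16-bit one's complement sum; in a real stack the covered bytes are a pseudo-header, the UDP header with a zeroed checksum field, and the data. Here the caller simply passes those bytes in, and the function names are hypothetical.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of data (padded with a zero byte if odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def udp_checksum_field(covered: bytes) -> int:
    """Value to place in the checksum field; never 0, which means 'not used'."""
    checksum = ~ones_complement_sum(covered) & 0xFFFF
    return 0xFFFF if checksum == 0 else checksum
```

The receiver repeats the sum over the same bytes with the received checksum included; the result is all 1's when nothing was corrupted.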
There is also another motivation for DNS. All the related information about a particular
network (generally maintained by an organization, firm or university) should be available
at one place. The organization should have complete control over what it includes in its
network and how it "organizes" its network. Meanwhile, all this information should
be available transparently to the outside world.
Conceptually, the internet is divided into several hundred top level domains where each
domain covers many hosts. Each domain is partitioned into subdomains, which may be
further partitioned into subsubdomains, and so on. So the domain space is partitioned in
a tree like structure as shown below. It should be noted that this tree hierarchy has
nothing in common with the IP address hierarchy or organization.
The internet uses a hierarchical tree structure of Domain Name Servers for IP address
resolution of a host name.
The top level domains are either generic or names of countries. Examples of generic top level
domains are .edu, .mil, .gov, .org, .net, .com and .int. For countries we have one entry for
each country as defined in ISO 3166, e.g. .in (India) and .uk (United Kingdom).
The leaf nodes of this tree are target machines. Obviously we would have to ensure that
the names in a row in a subdomain are unique. The max length of any name between two
dots can be 63 characters. The absolute address should not be more than 255 characters.
Domain names are case insensitive. Also in a name only letters, digits and hyphen are
allowed. For eg. is a domain name corresponding to a machine named
www under the subsubdomain
Resource Records:
Every domain whether it is a single host or a top level domain can have a set of resource
records associated with it. Whenever a resolver (this will be explained later) gives the
domain name to DNS it gets the resource record associated with it. So DNS can be
looked upon as a service which maps domain names to resource records. Each resource
record has five fields and looks as below:
Domain Name    Class    Type    Time to Live    Value
DNS (Contd...)
Resource Record
A Resource Record (RR) has the following:
Note: While short TTLs can be used to minimize caching, and a zero TTL prohibits
caching, the realities of Internet performance suggest that these times should be on the
order of days for the typical host. If a change can be anticipated, the TTL can be reduced
prior to the change to minimize inconsistency during the change, and then increased back
to its former value following the change. The data in the RDATA section of RRs is
carried as a combination of binary strings and domain names. The domain names are
frequently used as "pointers" to other data in the DNS.
When a name server fails to find a desired RR in the resource set associated with the
domain name, it checks to see if the resource set consists of a CNAME record with a
matching class. If so, the name server includes the CNAME record in the response and
restarts the query at the domain name specified in the data field of the CNAME record.
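The CNAME rule can be sketched as a small loop. In the illustration below the record store is a plain dictionary, the class field is ignored for brevity, and the function name, the hop limit and the example names are hypothetical.

```python
def lookup(records, name, rtype, max_hops=8):
    """records: dict mapping domain name -> list of (type, value) pairs."""
    answer = []
    for _ in range(max_hops):                # guard against CNAME loops
        rrset = records.get(name, [])
        matches = [(name, t, v) for t, v in rrset if t == rtype]
        if matches:
            return answer + matches
        cnames = [v for t, v in rrset if t == "CNAME"]
        if not cnames:
            return answer                    # nothing more to follow
        answer.append((name, "CNAME", cnames[0]))
        name = cnames[0]                     # restart the query at the canonical name
    return answer

records = {
    "www.example.test": [("CNAME", "host1.example.test")],
    "host1.example.test": [("A", "192.0.2.10")],
}
print(lookup(records, "www.example.test", "A"))
```

The response carries both the CNAME record and the address record found at the canonical name.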
Name Servers
Name servers are the repositories of information that make up the domain database. The
database is divided up into sections called zones, which are distributed among the name
servers. Name servers can answer queries in a simple manner; the response can always be
generated using only local data, and either contains the answer to the question or a
referral to other name servers "closer" to the desired information. The way that the name
server answers the query depends upon whether it is operating in recursive mode or
iterative mode:
• The simplest mode for the server is non-recursive, since it can answer queries
using only local information: the response contains an error, the answer, or a
referral to some other server "closer" to the answer. All name servers must
implement non-recursive queries.
• The simplest mode for the client is recursive, since in this mode the name server
acts in the role of a resolver and returns either an error or the answer, but never
referrals. This service is optional in a name server, and the name server may also
choose to restrict the clients which can use recursive mode.
In iterative mode, on the other hand, if the server does not have the requested information
locally, then it returns the address of some name server that might have the information
about the query. It is then the responsibility of the contacting application to contact the
next name server to resolve its query, and to do this iteratively until it gets an answer or an
error.
Relative Names
In place of giving full DNS names like or one can
give just cu2 or bhaskar. This can be used by the server side as well as the client side. But
for this one has to manually specify these extensions in the database of the servers
holding the resource records.
BOOTP uses UDP/IP. It is run when the machine boots. The protocol allows diskless
machines to discover their IP address and the address of the server host. Additionally, the
name of the file to be loaded into memory and executed is also supplied to the machine.
This protocol is an improvement over RARP, which has the following limitations:
1. Networks which do not have a broadcast method can't support RARP as it uses
the broadcast method of the MAC layer underneath the IP layer.
2. RARP is heavily dependent on the MAC protocol.
3. RARP just supplies the IP address corresponding to a MAC address. It cannot
respond with any more data.
4. RARP uses the computer hardware's address to identify the machine and hence
cannot be used in networks that dynamically assign hardware addresses.
Events in BOOTP
1. The Client broadcasts its MAC address (or other unique hardware identity
number) asking for help in booting.
2. The BOOTP Server responds with the data that specifies how the Client should be
configured (pre-configured for the specific client)
Note: BOOTP doesn't use the MAC layer broadcast but uses UDP/IP.
Configuration Information
The important information provided is:
• IP address
• IP address of the default router for that particular subnet
• Subnet mask
• IP addresses of the primary and secondary nameservers
Timers Used
Note that lease time is the time specified by the server for which the services have been
provided to the client.
• Lease Renewal Timer - When this timer expires machine will ask the server for
more time sending a DHCP Request.
• Lease Rebinding Timer - When this timer expires, we have not been
receiving any response from the server and so we can assume the server is down.
The machine therefore sends a DHCP Request to all the servers using the IP broadcast facility. This is
the only point of difference between lease renewal and rebinding.
• Lease Expiry Timer - When this timer expires, the host must stop
using the network, as it no longer has a valid IP address in the network.
Routing in Internet
The Origin of Internet
The response of the Internet to the issue of choosing routing tables with complete/partial
information is shown by the following architecture. There are a few nodes having
complete routing information and a large number of nodes with partial information. The
nodes with complete information, called core gateways, are well connected by a
Backbone Network. These nodes talk to each other to keep themselves updated. The non-
core gateways are connected to the core gateways. (Historically, this architecture comes
from the ARPANET.)
The original internet was structured around a backbone of ARPANET with several core
gateways connected to it .These core gateways connected some Local Area Networks
(LANs) to the rest of the network. These core gateways talked among themselves and
exchanged routing information. Every core gateway contained complete information
about all possible destinations.
How do you do routing ?
The usual IP routing algorithm employs an internet routing table (sometimes called an IP
routing table) on each machine that stores information about the possible
destinations and how to reach them.
Default Routes
This technique, used to hide information and keep routing table size small, consolidates
multiple entries into a default case. If no route appears in the routing table, the routing
routine sends the datagram to the default router.
Default routing is especially useful when a site has a small set of local addresses and only
one connection to the rest of the internet.
Host-Specific Routes
Most IP routing software allows per-host routes to be specified as a special case. Having
per-host routes gives the local network administrator more control over network use,
permits testing, and can also be used to control access for security purposes. When
debugging network connections or routing tables, the ability to specify a special route to
one individual machine turns out to be especially useful.
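The order in which the three kinds of routes are consulted can be sketched as follows. The function name, the table layout and the router names are hypothetical, and the network routes are assumed not to overlap (a real router would pick the most specific match).

```python
import ipaddress

def next_hop(destination, host_routes, network_routes, default_router):
    """Pick a router: host-specific route first, then network route, then default."""
    if destination in host_routes:                 # per-host special case
        return host_routes[destination]
    dest = ipaddress.ip_address(destination)
    for network, router in network_routes.items():
        if dest in ipaddress.ip_network(network):  # ordinary network route
            return router
    return default_router                          # no entry matched

host_routes = {"10.1.2.3": "R-test"}
network_routes = {"10.1.0.0/16": "R1", "192.168.5.0/24": "R2"}
print(next_hop("10.1.2.3", host_routes, network_routes, "R-default"))
print(next_hop("10.1.9.9", host_routes, network_routes, "R-default"))
print(next_hop("8.8.8.8", host_routes, network_routes, "R-default"))
```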
Internet with Two Backbones
As long as there was just one single router connecting ARPANET with NSFNET there
was no problem. The core gateways of ARPANET had information about all destinations
and the routers inside NSFNET contained information about local destinations and used a
default route to send all non-NSFNET traffic to the core gateways through the
router between ARPANET and NSFNET. However, as multiple connections were made
between the two backbones, problems arose. Which route should a packet from net1 to
net2 take? Should it be R1 or R2 or R3 or R4 or R5? For this, some exchange of routing
information between the two backbones was necessary. But this was again a problem:
how should we compare information between NSFNET and ARPANET, as
both of them used different metrics to measure costs?
As the number of networks and routers increased, to reduce the load on the core gateways
because of the enormous amount of calculations, routing was done with some core
gateways keeping complete information and the non-core gateways keeping partial
information.
In this architecture, G1, G2 and G3 are all core gateways and G4 and G5 are non-core gateways.
We must have a mechanism for someone to tell G2 that it is connected to net2, net3 and
net4, besides net1. Only G5 can tell this to G2 and so we must provide a mechanism
for G2 to talk to G5. A concept of one backbone with core gateways connected to
Autonomous Systems was developed. An Autonomous System is a group of networks
controlled by a single administrative authority. Routers within an autonomous system are
free to choose their own mechanisms for discovering, propagating, validating, and
checking the consistency of routes. Each autonomous system must agree to advertise
network reachability information to other autonomous systems. Each advertisement
propagates through a core router. The assumption made is that most of the routers in the
autonomous system have complete information about the autonomous system. One such
router will be assigned the task of talking to the core gateway.
This is one of the most widely used IGPs. It was developed at Berkeley. It is also known
by the name of the program that implements it, routed. It implements the Distance Vector
algorithm. Features of RIP:
This is an Interior Gateway Protocol designed by the Internet Engineering Task Force
(IETF). This algorithm scales better than the vector distance algorithms. This protocol
tackles several goals:
• OSPF includes type of service (ToS) routing. So, you can install multiple routes to
a given destination, one for each type of service. When routing a datagram, a
router running OSPF uses both the destination address and type of service fields
in the IP Header to choose a route.
• OSPF provides load balancing. If there are multiple routes to a given destination
at the same cost, OSPF distributes traffic over all the routes equally.
• OSPF allows for creation of AREA HIERARCHIES. This makes the growth of
the network easier and makes the network at a site easier to manage. Each area is
self contained, so, multiple groups within a site can cooperate in the use of OSPF
for routing.
• OSPF protocol specifies that all exchanges between the routers be authenticated.
OSPF allows variety of authentication schemes, and even allows one area to
choose a different scheme from the other areas.
• To accommodate multi-access networks like Ethernet, OSPF allows every multi-
access network to have a designated router (designated gateway).
• To permit maximum flexibility, OSPF allows the description of a virtual network
topology that abstracts away from details of physical connections.
• OSPF also allows for routers to exchange routing information learned from other
sites. The message format distinguishes between information acquired from
external sources and information acquired from routers interior to the site, so
there is no ambiguity about the source or reliability of routes.
• It has too much overhead of sending LSPs but is gradually becoming popular.
EGP is used only to find network reachability and not for differentiating between good
and bad routes. We can only use the distance metric to declare a route plausible and not for
comparing it with some other route (unless the two routes form part of the same autonomous
system). Since there cannot be two different routes to the same network, EGP restricts
the topology of any internet to a tree structure in which a core system forms the root.
There are no loops among other autonomous systems connected to it. This leads to
several problems:
Routing (Continued)
Shortest Path Algorithm
1. Dijkstra's Algorithm:
At the end each node will be labeled (see Figure.1) with its distance from source node
along the best known path. Initially, no paths are known, so all nodes are labeled with
infinity. As the algorithm proceeds and paths are found, the labels may change reflecting
better paths. Initially, all labels are tentative. When it is discovered that a label represents
the shortest possible path from the source to that node, it is made permanent and never
changed thereafter.
Look at the weighted undirected graph of Figure.1(a), where the weights represent, for
example, distance. We want to find shortest path from A to D. We start by making node A
as permanent, indicated by a filled in circle. Then we examine each of the nodes adjacent
to A (the working node), relabeling each one with the distance to A. Whenever a node is
relabeled, we also label it with the node from which the probe was made so that we can
construct the final path later. Having examined each of the nodes adjacent to A, we
examine all the tentatively labeled nodes in the whole graph and make the one with the
smallest label permanent, as shown in Figure.1(b). This one becomes new working node.
We now start at B, and examine all nodes adjacent to it. If the sum of the label on B and
the distance from B to the node being considered is less than the label on the node, we
have a shorter path, so the node is relabeled. After all the nodes adjacent to the working
node have been inspected and the tentative labels changed if possible, the entire graph is
searched for the tentatively labeled node with the smallest value. This node is made
permanent and becomes the working node for the next round. Figure 1 shows the
first five steps of the algorithm.
Note: Dijkstra's Algorithm is applicable only when the cost of every link is non-negative.
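The labeling procedure can be sketched in Python as below. A priority queue is used to find the tentative node with the smallest label, which is an implementation choice rather than part of the description. The example graph is hypothetical and is not the graph of Figure 1.

```python
import heapq

def dijkstra(graph, source):
    """graph: dict node -> {neighbor: weight}. Returns (distance labels, predecessors)."""
    dist = {node: float("inf") for node in graph}   # initially no paths are known
    prev = {}
    dist[source] = 0
    permanent = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)               # smallest tentative label
        if node in permanent:
            continue
        permanent.add(node)                         # this label never changes again
        for neighbor, weight in graph[node].items():
            if d + weight < dist[neighbor]:         # shorter path found: relabel
                dist[neighbor] = d + weight
                prev[neighbor] = node               # remember where the probe came from
                heapq.heappush(heap, (dist[neighbor], neighbor))
    return dist, prev

def path_to(prev, source, target):
    """Rebuild the path from the recorded predecessor labels."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

graph = {"A": {"B": 2, "C": 6}, "B": {"A": 2, "C": 3, "D": 7},
         "C": {"A": 6, "B": 3, "D": 1}, "D": {"B": 7, "C": 1}}
dist, prev = dijkstra(graph, "A")
print(dist["D"], path_to(prev, "A", "D"))
```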
We look at the distributed version which works on the premise that the information about
far away nodes can be had from the adjoining links.
The algorithm works as follows.
o Compute the link costs from the starting node to every directly connected
node.
o Select the cheapest links for every node (if there is more than one).
o For every directly connected node, compute the link costs for all these
o Select the cheapest route for any node.
Repeat until all nodes have been processed.
Every node should have information about its immediate neighbors, and over a period
of time it will acquire information about other nodes. Within n units of time, where n is
the diameter of the network, every node will have the complete information. The nodes do not
need to be synchronized, i.e. they do not need to exchange information at the same time.
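The spreading of information from neighbor to neighbor can be simulated in rounds. The sketch below is a simplification: all nodes exchange tables at the same instant in each round, which the real algorithm does not require, and the function name and the example network are hypothetical.

```python
def distance_vector(links, rounds):
    """links: dict node -> {neighbor: cost}. Returns node -> {destination: cost}."""
    # Each node starts out knowing only itself and its immediate neighbors.
    tables = {n: {n: 0, **nbrs} for n, nbrs in links.items()}
    for _ in range(rounds):
        snapshot = {n: dict(t) for n, t in tables.items()}   # tables as advertised
        for node, nbrs in links.items():
            for nbr, link_cost in nbrs.items():
                for dest, cost in snapshot[nbr].items():
                    if link_cost + cost < tables[node].get(dest, float("inf")):
                        tables[node][dest] = link_cost + cost
    return tables

# A chain A - B - C - D with unit link costs (diameter 3).
links = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "D": 1}, "D": {"C": 1}}
print(distance_vector(links, 1)["A"])
print(distance_vector(links, 3)["A"])
```

After one round A has heard of C but not yet of D; after as many rounds as the diameter its table is complete.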
Routing algorithms based on Dijkstra's algorithm are called Link State Algorithms.
Distance Vector Protocols are based on the distributed Bellman-Ford algorithm. In the former we
are sending little information to many nodes while in the latter we send huge information
to few neighbors.
Count-to-Infinity problem:
Suppose the link between A and E goes down. The events that may occur are:
(1) F tells A that it has a path to E with cost 6
(2) A sets cost to E to be 11, and advertise to F again
(3) F sets the cost to E to be 16, and advertise to A again
This cycle will continue and the cost to E goes to infinity. The core of the problem is that
when X tells Y that it has a path somewhere, Y has no way to know whether it itself is on
the path.
During this process of counting to infinity, packets from A or F destined to E are likely
to loop back and forth between A and F, causing congestion for other packets.
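The exchange between A and F can be replayed numerically. In this sketch the A-F link cost of 5 is inferred from the numbers in the events above (6, 11, 16), and the value used for "infinity" is an arbitrary assumption.

```python
def count_to_infinity(link_cost_af=5, f_cost=6, infinity=64):
    """Replay the A/F exchange after the A-E link fails; returns (node, cost to E) steps."""
    history = []
    while f_cost < infinity:
        a_cost = f_cost + link_cost_af    # A believes F's advertisement
        history.append(("A", a_cost))
        if a_cost >= infinity:
            break
        f_cost = a_cost + link_cost_af    # F in turn believes A
        history.append(("F", f_cost))
    return history

for node, cost in count_to_infinity():
    print(node, "sets cost to E to", cost)
```

The costs climb by one link cost per exchange until the chosen infinity is reached; only then do both nodes conclude that E is unreachable.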
Design of a bad routing protocol can lead to highly undesirable results. Consider the
following scenario to understand this. We are having 16 nodes logically connected in a
ring as shown in Figure 1.
Each node sends one unit of data in unit time except one node Q which sends e (0<e<1)
unit of data in unit time. We consider cost of the link as the traffic in that link. We
consider P as the only receiver in the ring. Ideally, the nodes to the left of the
diagonal PQ should send data clockwise and those to the right of PQ counterclockwise, as
shown in Figure 2. We may assume that Q sends data counterclockwise. Assume that this
ideal distribution was achieved at some time. Now we can see that cost of links to the left
of PQ are respectively 1,2,3 ..,7 while that on the right of PQ are e,1+e,2+e ..,7+e.
Therefore when we reconsider the shortest path the node immediately to the right of Q
will see traffic 28 to the left while 28+7e to the right and therefore will start sending data
clockwise and same is true for Q also. This will heavily change the traffic on the network.
Now the traffic load will shift to the left of PQ and next reconsideration will make a lot of
nodes from left of PQ send data counterclockwise. This may keep oscillating and will
cost a lot to the network. To prevent this, two steps can be taken:
Available Methods :
• Abstract Syntax Notation (ASN.1) : This is the notation developed by the ISO, to
describe the data structures used in communication, in a flexible yet standard
enough way. The basic idea is to define all the data structure types ( i.e., data
types) needed by each application in ASN.1 and package them together in a
module. When an application wants to transmit a data structure, it can pass the
data structure to the presentation layer, along with the ASN.1 name of the data
structure. Using the ASN.1 definition as a guide, the presentation layer then
knows what the types and sizes of the fields are, and thus knows how to encode
them for transmission ( Explicit Typing). It can be implemented with Asymmetric
or Symmetric data conversion.
• External Data Representation (XDR) : This is implemented using Symmetric
data representation. XDR is the standard representation used for data traversing
the network. XDR provides representations for most of the structures that a C
program can specify. However, the encodings contain only data items and not
information about their types. Thus, clients and servers using XDR must agree on
the exact format of the messages that they will exchange (Implicit Typing).
The chief advantage lies in flexibility: neither the server nor the client needs to
understand the architecture of the other. Computational overhead is the main
disadvantage. Nevertheless, it simplifies programming, reduces errors, increases
interoperability among programs and makes network management easier.
The Buffer Paradigm : XDR uses a buffer paradigm which requires a program to
allocate a buffer large enough to hold the external representation of a message and
to add items (i.e., fields) one at a time. Thus a complete message is created in
XDR format.
Remote Procedure Call (RPC)
Now let us see how a typical remote procedure call gets executed :-
1. The client program calls the stub procedure linked within its own address space. It is a
normal local call.
2. The client stub then collects the parameters and packs them into a message
(Parameter Marshalling). The message is then given to the transport layer for
transmission.
3. The transport entity just attaches a header to the message and puts it out on the
network without further ado.
4. When the message arrives at the server, the transport entity there passes it to the
server stub, which unmarshals the parameters.
5. The server stub then calls the server procedure, passing the parameters in the
standard way.
6. After it has completed its work, the server procedure returns, the same way as any
other procedure returns when it is finished. A result may also be returned.
7. The server stub then marshals the result into a message and hands it off at the
transport interface.
8. The reply gets back to the client machine.
9. The transport entity hands the result to the client stub.
10. Finally, the client stub returns to its caller, the client procedure, along with the
value returned by the server in step 6.
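The ten steps above can be sketched inside a single process, with a byte array standing in for the network. This is only an illustration under stated assumptions: the names client_stub_add, server_stub and server_add, and the message layout of two 32-bit integers, are made up here; a real stub would hand the message to the transport layer instead of calling the server stub directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian (network) byte order. */
static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Read a 32-bit big-endian value back. */
static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Steps 5-6: the actual procedure, living on the "server". */
static int32_t server_add(int32_t a, int32_t b)
{
    return a + b;
}

/* Steps 4-7: the server stub unmarshals, calls, and marshals the result. */
static size_t server_stub(const unsigned char *request, unsigned char *reply)
{
    int32_t a = (int32_t)get_u32(request);
    int32_t b = (int32_t)get_u32(request + 4);
    put_u32(reply, (uint32_t)server_add(a, b));
    return 4;                       /* length of the reply message */
}

/* Steps 1-3 and 8-10: the client stub looks like a local procedure. */
int32_t client_stub_add(int32_t a, int32_t b)
{
    unsigned char request[8], reply[4];

    put_u32(request, (uint32_t)a);      /* parameter marshalling */
    put_u32(request + 4, (uint32_t)b);
    server_stub(request, reply);        /* stands in for the network round trip */
    return (int32_t)get_u32(reply);     /* unmarshal the result */
}
```

The caller simply writes client_stub_add(2, 3) and never sees the message buffers, which is the transparency the stubs are meant to provide.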
This whole mechanism is used to give the client procedure the illusion that it is making a
direct call to a distant server procedure. To the extent the illusion succeeds, the mechanism
is said to be transparent. But the transparency fails in parameter passing. Passing any
data (or data structure) by value is OK, but passing a parameter 'by reference' causes
problems. This is because the pointer in question points to an address in the address
space of the client process, and this address space is not shared by the server process. So
the server will try to look up the address held in the passed pointer in its own
address space. This address may not hold the same value as on the client side, or it
may not lie in the server process' address space, or such an address may not even exist in
the server address space.
One solution to this can be Copy-in Copy-out. What we pass is the value pointed to,
instead of the pointer itself. A local pointer to this value is created on the server
side (Copy-in). When the server procedure returns, the modified 'value' is returned, and is
copied back to the address from where it was taken (Copy-out). But this is
disadvantageous when the pointer involved points to a huge data structure. Also, this
approach is not foolproof. Consider the following example (C code) :
The procedure 'myfunction()' resides on the server machine. If the program executes on a
single machine then we must expect the output to be '4'. But when run in the client-server
model we get '3'. Why? Because 'x' and 'y' point to different memory locations holding the
same value. Each increment is then applied to its own copy and the incremented value is
returned. Thus '3' is passed back and not '4'.
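The listing that this discussion refers to is not reproduced in these notes. The sketch below is a reconstruction under assumptions: the variable starts at 2, the same address is passed for both parameters, and 'myfunction()' increments each of them once. This is consistent with the outputs '4' and '3' quoted above, but the original code may have differed. The copy-in copy-out behaviour is simulated locally with two temporaries.

```c
/* The "remote" procedure: increments what each parameter points to. */
static void myfunction(int *x, int *y)
{
    *x = *x + 1;
    *y = *y + 1;
}

/* Ordinary local call: x and y alias the same variable. */
int call_locally(void)
{
    int a = 2;                  /* initial value assumed */
    myfunction(&a, &a);
    return a;                   /* incremented twice */
}

/* Simulated remote call with copy-in copy-out. */
int call_with_copy_in_copy_out(void)
{
    int a = 2;                  /* initial value assumed */
    int x_copy = a;             /* copy-in of the first parameter  */
    int y_copy = a;             /* copy-in of the second parameter */

    myfunction(&x_copy, &y_copy);   /* server works on two separate copies */

    a = x_copy;                 /* copy-out of the first parameter  */
    a = y_copy;                 /* copy-out of the second parameter */
    return a;                   /* each copy was incremented only once */
}
```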
Many RPC systems finesse the whole problem by prohibiting the use of reference
parameters, pointers, and function or procedure parameters on remote calls (Copy-in only).
This makes the implementation easier, but breaks the transparency.
Protocol : Another key implementation issue is the protocol to be used - TCP or UDP. If
TCP is used then there may be a problem in case of a network breakdown. No problem
occurs if the breakdown happens before the client sends its request (the client will be
notified of this), or after the request is sent and the reply is not received (a time-out
will occur). But if the breakdown occurs just after the server has sent the reply, then the
server won't be able to figure out whether its response has reached the client or not. This
could be devastating for bank servers, which need to make sure that their reply has in fact
reached the client (probably an ATM). So UDP is generally preferred over TCP for making
remote procedure calls.
Idempotent Operations:
If the server crashes in the middle of the computation of a procedure on behalf of a
client, then what must the client do? Suppose it sends its request again when the server
comes up. Then some part of the procedure will be re-computed. It may have instructions
whose repeated execution gives different results each time. If the side effect of
multiple executions of the procedure is exactly the same as that of one execution, then we
call such procedures Idempotent Procedures. In general, such operations are called
Idempotent Operations.
For example, consider ATM banking. If I send a request to withdraw Rs. 200 from my account
and somehow the request is executed twice, then two transactions of 'withdrawing
Rs. 200' will be recorded, whereas I will get only Rs. 200. Thus 'withdrawing' is a
non-idempotent operation. Now consider the case when I send a request to 'check my
balance'. No matter how many times this request is executed, no inconsistency will
arise. This is an idempotent operation.
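The ATM example can be written down as a small sketch. The struct account type and both function names are hypothetical and exist only to show the difference between the two kinds of operation.

```c
struct account {
    long balance;               /* in rupees */
};

/* Non-idempotent: every execution changes the state again. */
long withdraw(struct account *acct, long amount)
{
    acct->balance -= amount;
    return acct->balance;
}

/* Idempotent: executing it any number of times leaves the state alone. */
long check_balance(const struct account *acct)
{
    return acct->balance;
}
```

If a retransmitted 'withdraw' request for Rs. 200 is executed twice on a balance of Rs. 1000, the account ends at Rs. 600 instead of Rs. 800, whereas a repeated 'check_balance' request does no harm.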
Semantics of RPC :
If all operations could be cast into an idempotent form, then time-out and retransmission
will work. But unfortunately, some operations are inherently non-idempotent (e.g.,
transferring money from one bank account to another ). So the exact semantics of RPC
systems were categorized as follows:
• Exactly once : Here every call is carried out 'exactly once', no more, no less. But
this goal is unachievable, as after a server crash it is impossible to tell whether a
particular operation was carried out or not.
• At most once : When this form is used, control always returns to the caller. If
everything has gone right, then the operation will have been performed exactly
once. But if a server crash is detected, retransmission is not attempted, and
further recovery is left up to the client.
• At least once : Here the client stub keeps trying over and over, until it gets a
proper reply. When the caller gets control back it knows that the operation has
been performed one or more times. This is ideal for idempotent operations, but
fails for non-idempotent ones.
• Last of many : This is a version of 'At least once', where the client stub uses a
different transaction identifier in each retransmission. Now the result returned is
guaranteed to be the result of the final operation, not the earlier ones. So it will be
possible for the client stub to tell which reply belongs to which request and thus
filter out all but the last one.
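The difference between 'at least once' and 'at most once' can be seen in a toy model. Everything below is an assumption made for illustration: struct fake_server simulates a server whose first few replies are lost, and a lost reply is treated as a timeout at the client.

```c
struct fake_server {
    int executions;         /* how many times the procedure actually ran */
    int replies_to_drop;    /* replies that get lost before one arrives  */
};

/* One request/reply attempt: returns 1 if a reply arrived, 0 on timeout. */
static int send_request(struct fake_server *s)
{
    s->executions++;                /* the request arrives and is executed */
    if (s->replies_to_drop > 0) {   /* ...but the reply is lost on the way */
        s->replies_to_drop--;
        return 0;
    }
    return 1;
}

/* At least once: retransmit until a reply arrives.
 * Returns the number of times the procedure was executed. */
int call_at_least_once(struct fake_server *s)
{
    while (!send_request(s))
        ;                           /* timeout, so try again */
    return s->executions;
}

/* At most once: a single attempt; failure is reported to the caller.
 * Returns 1 if a reply arrived, 0 otherwise. */
int call_at_most_once(struct fake_server *s)
{
    return send_request(s);
}
```

With two lost replies the 'at least once' caller gets its answer only after the procedure has run three times, which is harmless for an idempotent procedure and wrong for a withdrawal.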
Sun RPC allows both TCP and UDP for communication between remote procedures and the
programs calling them. It uses the at least once semantics, i.e., the remote procedure is
executed at least once. It uses the copy-in method of parameter passing but does not
support the copy-out style. It uses XDR for data representation. It does not handle orphans
(which are servers whose corresponding clients have died). Thus if a client gives a request
to a server for execution of a remote procedure and dies before accepting the
results, the server does not know whom to reply to. It also uses a tool called rpcgen to
generate stubs automatically.
Let us suppose that a client (say client1) wants to execute procedure P1 (in the figure
above). Another client (say client2) wants to execute procedure P2 (in the figure above).
Since both P1 and P2 access common global variables they must be executed in a
mutually exclusive manner. In view of this, Sun RPC provides mutual exclusion by
default, i.e. no two procedures in a program can be active at the same time. This
introduces some delay in the execution of procedures, but mutual exclusion is the more
fundamental thing to provide, since without it the results may go wrong.
Thus we see that anything which can be a threat to application programmers, is provided
How A Client Invokes A Procedure On Another Host
The remote procedure is a part of a program executing in a remote host. Thus we would
have to properly locate the host, the program in it, and the procedure in the program.
Each host can be specified by a unique 32-bit integer. The Sun RPC standard specifies that
each remote program executing on a computer must be assigned a unique 32-bit integer
that the caller uses to identify it. Furthermore, Sun RPC assigns a 32-bit integer identifier
to each remote procedure inside a given remote program. The procedures are numbered
sequentially: 1, 2, ..., N. To help ensure that program numbers defined by separate
organizations do not conflict, Sun RPC has divided the set of program numbers into eight
groups.
Thus it seems sufficient that if we are able to locate the host, the program in the host, and
the procedure in the program, we would be able to uniquely locate the remote procedure
which is to be executed.
If an RPC program does not use a reserved, well-known protocol port, clients cannot
contact it directly. This is because, when the server (remote program) begins execution, it
asks the operating system to allocate an unused protocol port number. The server uses the
newly allocated protocol port for all communication. The system may choose a different
protocol port number each time the server begins (i.e., the server may have a different port
assigned each time the system boots).
The client (the program that issues the remote procedure call) knows the machine address
and RPC program number for the remote program it wishes to contact. However, because
the RPC program (server) only obtains a protocol port after it begins execution, the client
cannot know which protocol port the server obtained. Thus, the client cannot contact the
remote program directly.
To allow clients to contact remote programs, the Sun RPC mechanism includes a
dynamic mapping service. The RPC port mapping mechanism uses a server to maintain a
small database of port mappings on each machine. This server waits on a particular
port number (111) and receives the registration and lookup requests for all remote
programs on that machine.
Whenever a remote program (i.e., a server) begins execution, it allocates a local port that
it will use for communication. The remote program then contacts the server on its local
machine for registration and adds a pair of integers to the database:
(RPC program number, protocol port number)
Once an RPC program has registered itself, callers on other machines can find its
protocol port by sending a request to the server. To contact a remote program, a caller
must know the address of the machine on which the remote program executes as well as
the RPC program number assigned to the program. The caller first contacts the server on
the target machine, and sends an RPC program number. The server returns the protocol
port number that the specified program is currently using. This server is called the RPC
port mapper or simply the port mapper. A caller can always reach the port mapper
because it communicates using the well-known protocol port, 111. Once a caller knows
the protocol port number the target program is using, it can contact the remote
program directly.
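The database kept by the port mapper can be pictured as a small table. The sketch below is a toy model and not the real port mapper interface: the names toy_pmap_register and toy_pmap_lookup and the fixed table size are assumptions, and only the program number is used as the key, as in the description above.

```c
#include <stddef.h>

#define PORTMAPPER_PORT 111     /* well-known port of the port mapper */
#define MAX_MAPPINGS    16      /* arbitrary size for this sketch */

struct mapping {
    unsigned long  prog;        /* RPC program number */
    unsigned short port;        /* protocol port currently in use */
};

struct portmap {
    struct mapping entries[MAX_MAPPINGS];
    size_t count;
};

/* Called when a remote program starts: record (program number, port).
 * Returns 0 on success, -1 if the table is full. */
int toy_pmap_register(struct portmap *pm, unsigned long prog, unsigned short port)
{
    size_t i;

    for (i = 0; i < pm->count; i++) {
        if (pm->entries[i].prog == prog) {  /* program restarted on a new port */
            pm->entries[i].port = port;
            return 0;
        }
    }
    if (pm->count == MAX_MAPPINGS)
        return -1;
    pm->entries[pm->count].prog = prog;
    pm->entries[pm->count].port = port;
    pm->count++;
    return 0;
}

/* Called on behalf of a caller: returns the port, or 0 if not registered. */
unsigned short toy_pmap_lookup(const struct portmap *pm, unsigned long prog)
{
    size_t i;

    for (i = 0; i < pm->count; i++)
        if (pm->entries[i].prog == prog)
            return pm->entries[i].port;
    return 0;
}
```

A caller would first send the program number to port 111 on the target machine, receive the result of the lookup, and then talk to the remote program on the returned port.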
RPC Programming
RPC Programming can be thought of at multiple levels. At one extreme, the user writing the
application program uses the RPC library and need not worry about the
communication through the network. At the other end there are the low-level details of
network communication. To execute a remote procedure the client would have to go
through a lot of overhead, e.g., calling XDR routines to format the data, putting it in the
output buffer, connecting to the port mapper and subsequently connecting to the port
through which the remote procedure communicates. The RPC library contains procedures that
provide almost everything required to make a remote procedure call. It contains
procedures for marshaling the arguments and unmarshaling the results. Different XDR
routines are available to convert data from native format to XDR and from XDR to native
format. But a lot of overhead still remains in calling the library routines properly. To
minimize this overhead for the application programmer, a tool named rpcgen was devised
which generates the client and server stubs. Because the stubs are generated automatically,
they offer little flexibility, e.g., the timeout and the number of retransmissions are fixed.
The program specification file is given as input, and both the server and client stubs are
generated by rpcgen. The specification file should have a .x extension. It
contains the following information:-
• constant declarations,
• global data (if any),
• information about all remote procedures, i.e.
    • procedure argument type,
    • return type.
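/* the program and version names are assumed; the opening lines are missing in these notes */
program MESSAGEPROG {
    version MESSAGEVERS {
        int PRINTMESSAGE ( string ) = 1;
    } = 1;
} = 99;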
We will have to make some changes on the server side as well as on the client side. The
server program ( msg_proc.c ) will look like this :
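#include <stdio.h>
#include <rpc/rpc.h>
#include "msg.h"
int *printmessage_1( msg )
char **msg;
{
    static int result = 1;  /* static: the stub uses its address after we return */
    printf("%s\n", *msg);   /* body assumed; the listing is cut off in these notes */
    return &result;
}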
After creating the specification file we give the command $rpcgen spec.x ( where
spec.x is the name of the specification file ). The files spec.h, spec_svc.c and
spec_clnt.c get created.
Once we have these files we write
$cc msg_proc.c spec_svc.c -o server
$cc client.c spec_clnt.c -o client
1. When we start the server program it creates a socket and binds a local port to it. It
then calls svc_register to register the program number and version. This function contacts
the port mapper to register the program.
2. When the client program is started it calls clnt_create. This call specifies the name of
the remote system, the program number, the version number, and the protocol. This function
contacts the port mapper and finds the port for the server ( Sun RPC supports both TCP
and UDP).
3. The client now calls a remote procedure defined in the client stub. This stub sends the
datagram/packet to the server, using the port number obtained in step two. The client
waits for a response, retransmitting the request a fixed number of times in case of a
timeout. The datagram/packet is received by the server stub associated with the server
program. The server stub executes the called procedure. When the function returns to the
server stub, the stub takes the return value, converts it to the XDR format and transmits it
back to the client. The client stub receives the response, converts it as required and
returns to the client program.
Distributed Applications
We can use any of the following two approaches in designing a distributed application.
Semantics of Applications
• Normal Application: A main program which may call procedures defined within
the program (proc A in this case). On return from this procedure the program
continues. This procedure (proc A) may itself call other procedures (proc B in this
case). Refer to the figure below:
• Distributed Application: A client program executing on machine 1 may call a
procedure (proc A) which is defined and run on another machine (we say machine 2 is
a server for machine 1). Upon return from the call the program on machine 1
continues. The server program on machine 2 may in turn act as a client and call
procedures on another machine, machine 3 (now machine 3 is a server for machine 2).
Refer to the figure below:
Passing Arguments in Distributed Programs
Possible Solutions
• One solution may be to find out the architecture of the receiving end, convert the data
to be sent to that architecture and then send the data. However, this will lead to the
following problems:
1. It is not easy to find out the architecture of a machine.
2. If I change the architecture of my machine then this information has to be
conveyed to the client.
• Another solution is to have a standard format for networks. This may lead to
inefficiency in the case when the two communicating machines have the same
architecture, because in this case the conversion is unnecessary.
XDR (External Data Representation)
XDR was the solution adopted by Sun RPC. RPC was mainly the outcome of the need
for distributed filesystems (NFS).
Buffer Paradigm
The program allocates a buffer large enough to hold the external representation of a
message and adds items one at a time. The library routine invoked to set up the buffer
as an XDR stream is xdrmem_create. After that we may append data to this buffer
using various conversion library routines like xdr_int (xdr_int converts an integer to its
external representation and appends it to the buffer), which convert native objects to
the external representation and then append them to the buffer. After all the data to be
passed has been converted and appended, we send the buffer.
First add the information related to the data being sent to the buffer and then append
the data to the buffer. For example, to send a character followed by an integer (if the
sending machine uses one byte for char and two bytes for integers) we send the
information as - one byte char, two byte integer ...
The routines for encoding and decoding are the same. Depending on the type of the buffer,
which may be XDR_ENCODE or XDR_DECODE (specified at the time the buffer is set up),
encoding or decoding is performed respectively.
There are also routines (like xdrstdio_create) to write to and read from streams instead of a
memory buffer.
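The buffer paradigm, and the idea that one routine serves for both encoding and decoding, can be sketched without the RPC library. The names toy_xdrmem_create and toy_xdr_int below are hypothetical stand-ins modeled on the library routines mentioned above; they are not the library's own code. The sketch assumes only what XDR specifies for integers, namely four bytes in big-endian order.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_op { TOY_ENCODE, TOY_DECODE };

struct toy_xdr {
    enum toy_op    op;      /* direction, fixed when the stream is set up */
    unsigned char *buf;     /* caller-allocated buffer */
    size_t         size;    /* capacity of the buffer in bytes */
    size_t         pos;     /* next byte to write or read */
};

/* Attach a stream to a memory buffer for encoding or decoding. */
void toy_xdrmem_create(struct toy_xdr *x, unsigned char *buf, size_t size,
                       enum toy_op op)
{
    x->op = op;
    x->buf = buf;
    x->size = size;
    x->pos = 0;
}

/* One routine for both directions: appends *v when encoding, fills *v
 * when decoding. Returns 1 on success, 0 if the buffer is exhausted. */
int toy_xdr_int(struct toy_xdr *x, int32_t *v)
{
    unsigned char *p;

    if (x->size - x->pos < 4)
        return 0;
    p = x->buf + x->pos;
    if (x->op == TOY_ENCODE) {
        uint32_t u = (uint32_t)*v;
        p[0] = (unsigned char)(u >> 24);
        p[1] = (unsigned char)(u >> 16);
        p[2] = (unsigned char)(u >> 8);
        p[3] = (unsigned char)u;
    } else {
        uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        *v = (int32_t)u;
    }
    x->pos += 4;
    return 1;
}
```

The sender creates the stream with TOY_ENCODE, calls toy_xdr_int once per field and sends the first pos bytes of the buffer; the receiver creates a TOY_DECODE stream over the received bytes and makes the same calls in the same order.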
Given a reliable end-to-end transport protocol like TCP, File Transfer might seem trivial.
But details such as authorization and data representation among heterogeneous machines
make the protocol complex.
FTP allows concurrent accesses by multiple clients. Clients use TCP to connect to the
server. A master server awaits connections and creates a slave process to handle each
connection. Unlike most servers, the slave process does not perform all the necessary
computation. Instead the slave accepts and handles the control connection from the client,
but uses an additional process to handle a separate data transfer connection. The control
connection carries the command that tells the server which file to transfer.
Data transfer connections and the data transfer processes that use them can be created
dynamically when needed, but the control connection persists throughout a session. Once
the control connection disappears, the session is terminated and the software at both ends
terminates all data transfer processes.
In addition to passing user commands to the server, FTP uses the control connection to
allow client and server processes to coordinate their use of dynamically assigned TCP
protocol ports and the creation of data transfer processes that use those ports.
Proxy commands - allow one to copy files from any machine to any other arbitrary
machine, i.e. the machine the files are being copied to need not be the client but can be
any other machine.
Sometimes some special processing can be done which is not part of the protocol, e.g. if
a request for copying a file is made by issuing the command 'get file_A.gz' and the zipped
file does not exist but the file file_A does, then the file is automatically zipped and sent.
Consider what happens when the connection breaks during an FTP session. Two things
may happen. Certain FTP servers restart from the beginning, and whatever
portion of the file had been copied is overwritten. Other FTP servers ask the client
how much it has already read and simply continue from that point.
TFTP stands for Trivial File Transfer Protocol. Many applications do not need the full
functionality of FTP, nor can they afford the complexity. TFTP provides an inexpensive
mechanism that does not need complex interactions between the client and the server.
TFTP restricts operations to simple file transfer and does not provide authentication.
Diskless devices have TFTP encoded in read-only memory (ROM) and use it to obtain an
initial memory image when the machine is powered on. The advantage of using TFTP is
that it allows bootstrapping code to use the same underlying TCP/IP protocols that the
operating system uses once it begins execution. Thus it is possible for a computer to
bootstrap from a server on another physical network. TFTP does not have a reliable
stream transport service. It runs on top of UDP or any other unreliable packet delivery
system, using timeout and retransmission to ensure that data arrives. The sending side
transmits a file in fixed size blocks and awaits an acknowledgement for each block before
sending the next.
The first packet sent requests a file transfer and establishes the connection between server
and client. It also specifies the file name and whether the file is to be transferred to the
client or to the server. Blocks of the file are numbered starting from 1; each data packet
has a header that specifies the number of the block it carries, and each acknowledgement
contains the number of the block being acknowledged. A block of less than 512 bytes signals
end of file. There are five types of TFTP packets. The initial packet must use operation
code 1 or 2, specifying either a read request or a write request, and also carries the
filename. Once the read request or write request has been made, the server uses the IP
address and UDP port number of the client to identify subsequent operations. Thus data or
ack messages do not contain the filename. The final message type is used to report errors.
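The packet layout described above can be sketched as follows. The opcode values 3, 4 and 5 and the mode string (such as "octet") are taken from the TFTP specification (RFC 1350), since the text above names only codes 1 and 2; the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum { TFTP_RRQ = 1, TFTP_WRQ = 2, TFTP_DATA = 3, TFTP_ACK = 4, TFTP_ERROR = 5 };

#define TFTP_BLOCK_SIZE 512

/* Build a read or write request: 2-byte opcode, file name, NUL, mode, NUL.
 * Returns the packet length, or 0 if the buffer is too small. */
size_t tftp_build_request(unsigned char *pkt, size_t cap, int opcode,
                          const char *filename, const char *mode)
{
    size_t flen = strlen(filename);
    size_t mlen = strlen(mode);
    size_t need = 2 + flen + 1 + mlen + 1;

    if (need > cap)
        return 0;
    pkt[0] = 0;                                 /* opcode, high byte */
    pkt[1] = (unsigned char)opcode;             /* opcode, low byte  */
    memcpy(pkt + 2, filename, flen + 1);        /* name with its NUL */
    memcpy(pkt + 2 + flen + 1, mode, mlen + 1); /* mode with its NUL */
    return need;
}

/* Build an acknowledgement: 2-byte opcode followed by 2-byte block number.
 * The buffer must hold at least 4 bytes. */
size_t tftp_build_ack(unsigned char *pkt, unsigned block)
{
    pkt[0] = 0;
    pkt[1] = TFTP_ACK;
    pkt[2] = (unsigned char)(block >> 8);
    pkt[3] = (unsigned char)(block & 0xFF);
    return 4;
}

/* A data block shorter than 512 bytes signals end of file. */
int tftp_is_last_block(size_t data_len)
{
    return data_len < TFTP_BLOCK_SIZE;
}
```

Note that only the request carries the file name; the acknowledgement is identified purely by its block number, as stated above.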
TFTP supports symmetric retransmission. Each side has a timeout and retransmission. If
the side sending data times out, then it retransmits the last data block. If the receiving side
times out, it retransmits the last acknowledgement. This ensures that the transfer will not
fail after a single packet loss.
Problem caused by symmetric retransmission - Sorcerer's Apprentice Bug
When an ack for a data packet is delayed but not lost, the sender retransmits the same
data packet, which the receiver acknowledges again. Thus both acks eventually arrive at the
sender, and the sender now transmits the next data packet once for each ack.
Therefore a retransmission of all the subsequent packets is triggered. The
receiver will acknowledge both copies of this packet and send two acks, which causes the
sender in turn to send two copies of the next packet. The cycle continues with each
packet being transmitted twice.
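The effect can be counted with a very small model. This is a sketch under simplifying assumptions: exactly one ack is delayed, nothing is ever lost, and the function name and parameters are made up. The flag ignore_duplicate_acks models a commonly cited remedy (the sender reacts only to the first ack for a block), which these notes do not otherwise cover.

```c
/* Count the data packets sent for a file of `blocks` blocks when the ack
 * for block number `delayed` arrives late (pass 0 for "no ack is delayed"). */
long packets_sent(int blocks, int delayed, int ignore_duplicate_acks)
{
    long sent = 0;
    int copies = 1;         /* copies of the current block put on the wire */
    int block;

    for (block = 1; block <= blocks; block++) {
        if (block == delayed)
            copies++;       /* sender times out and retransmits this block */
        sent += copies;

        /* The receiver acknowledges every copy, so `copies` acks come back.
         * Buggy sender: each ack triggers one copy of the next block, so
         * `copies` carries over. Careful sender: only the first ack counts. */
        if (ignore_duplicate_acks)
            copies = 1;
    }
    return sent;
}
```

For a 10-block file with the ack for block 3 delayed, the buggy sender transmits 18 data packets, while the sender that ignores duplicate acks transmits 11.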
TFTP supports multiple file types just like FTP, i.e. binary and ASCII data. TFTP may also
be integrated with email. When the file type is mail, the FILENAME field is
to be considered as the name of the mailbox, and instead of writing the mail to a new file
it should be appended to the mailbox. However, this implementation is not commonly used.
When an email is sent, it is the mail transfer agent (MTA) of the source that contacts the
MTA of the destination. The protocol used by the MTAs on the source and destination
side is called SMTP. SMTP stands for Simple Mail Transfer Protocol. There are some
protocols that come between the user agent and the MTA, e.g. POP and IMAP, which are
discussed later.
Mail Gateways -
Mail gateways are also called mail relays or mail bridges. In such systems the sender's
machine does not contact the receiver's machine directly but sends mail across one or
more intermediate machines that forward it on. These intermediate machines are called
mail gateways. Mail gateways introduce unreliability. Once the sender has sent the mail to
the first intermediate machine, it discards its local copy. So failure at an intermediate
machine may result in message loss without informing the sender or the receiver. Mail
gateways also introduce delays. Neither the sender nor the receiver can determine how long
the delay will last or where the mail has been delayed.
However, mail gateways have the advantage of providing interoperability, i.e. they provide
connections among standard TCP/IP mail systems and other mail systems as well as
between TCP/IP internets and networks that do not support Internet protocols. So when
there is a change in protocol, the mail gateway helps in translating the mail message
from one protocol to another, since it is designed to understand both.
The TCP/IP protocol suite specifies a standard for the exchange of mail between machines.
It was derived from the Mail Transfer Protocol (MTP). It deals with how the underlying mail
delivery system passes messages across a link from one machine to another. The mail is
enclosed in what is called an envelope. The envelope contains the To and From fields,
and these are followed by the mail. The mail consists of two parts, namely the Header
and the Data.
The Header has the To and From fields. If Headers are defined by us they should start
with X-. The standard headers do not start with X-.
In SMTP the data portion can contain only printable ASCII characters. The old method of
sending a binary file was to send it in uuencoded form, but there was no way to
distinguish between the many types of binary files possible, e.g. .tar, .gz, .dvi etc.
MIME-Version: 1.0
Content-Type: image/gif
Content-Transfer-Encoding: base64
Content-Description : contains the file name of the file that is being sent.
Content-Type : is an important field that specifies the data format, i.e. tells what kind
of data is being sent. It contains two identifiers, a content type and a subtype, separated
by a slash, e.g. image/gif
There are 7 Content Types -
1. text
2. image
3. video
4. audio
5. application
6. multipart
7. message
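The base64 transfer encoding named in the header example above turns every three bytes of binary data into four printable ASCII characters, which is what lets a binary file travel through SMTP. A minimal encoder sketch follows; the function name is an assumption, and the line breaking that MIME applies to long encoded bodies is left out.

```c
#include <stddef.h>

/* Encode `len` bytes from `in` as base64 into `out`, which must have room
 * for 4 * ((len + 2) / 3) + 1 bytes. Returns the number of characters
 * written, not counting the terminating NUL. */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;

    /* Full groups of three bytes become four characters. */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) |
                          (unsigned long)in[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }
    /* One or two leftover bytes are padded with '='. */
    if (len - i == 1) {
        unsigned long v = (unsigned long)in[i] << 16;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = '=';
        out[o++] = '=';
    } else if (len - i == 2) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8);
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```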
So a better protocol was proposed - ESMTP. ESMTP stands for Extended Simple Mail
Transfer Protocol. It is compatible with SMTP. Just as the first command sent in SMTP is
HELO, in ESMTP the first command is EHLO. If the receiver supports
ESMTP then it answers the EHLO by stating what data types and what
kinds of encoding it supports. Even an SMTP-based receiver can reply to it: if there is
an error message or no answer, then the sender falls back to SMTP.
The delivery protocols determine how the mail is transferred by the mail transfer agent
to the user agent, which provides an interface for reading mail.
There are 3 kinds :
1. POP3 (Post Office Protocol) Here the user accesses the mailbox from,
say, a PC, while the mail accumulates on a server. In POP3 the mail
is downloaded to the PC at a time interval which can be specified by the
user. POP3 is used when the mail is always read from the same machine,
so it helps to download the mail to it in advance.
2. IMAP (Internet Message Access Protocol) Here the user may access the mailbox
on the server from different machines, so there is no point in downloading
the mail beforehand. Instead, when the mail has to be read one has to log
on to the server. (IMAP thus provides authentication.) The mailbox on the
server can be looked upon as a relational database.
3. DMSP (Distributed Mail System Protocol) There are multiple mailboxes on
servers. To read the mail I connect to them from time to time, and whenever I
do so the mail is downloaded. When a reply is sent, the message is put
in a queue. Thus DMSP is like a pseudo MTA.
1. Managed Nodes
2. Management Stations
3. Management Information (called Object)
4. A management protocol
Internet is addressed as 1.3.6.1. All the objects under this domain have this string at the
beginning. The information is exchanged in a standard and vendor-neutral way. All the
data are represented in Abstract Syntax Notation 1 (ASN.1). It plays a role similar to that
of XDR in RPC, but it has a widely different representation scheme. A part of it was actually
adopted in SNMP and modified to form the Structure of Management Information (SMI). The
protocol specifies various kinds of messages that can be exchanged between the managed
nodes and the management station.
   Message              Description
1. Get_Request          Request the value of a variable
2. Get_Response         Returns the value of the variable asked for
3. Get_Next_Request     Request the variable next to the previous one
4. Set_Request          Set the value of an Object
5. Trap                 Agent-to-manager trap report
6. Get_bulk_request     Request a set of variables of the same type
7. Inform_Request       Exchange of MIB among Management stations
The last two messages were added in SNMPv2. The fourth message needs
some kind of authentication from the management station.
Addressing Example :
Following is an example of the kind of address one can refer to when fetching a value
in the table :-
So when accessing the netmask of some IP entity the variable name would be : .1.3.key-value
Here, since the IP address is the unique key that indexes any member of the array, the
address can be like :-
This lecture discusses a security mechanism in the Internet, namely the Firewall. In
brief, it is a configuration of routers and networks placed between an organization's
internal internet and a connection to an external internet to provide security. In other
words, a Firewall is a mechanism to provide limited access to machines, either from the
outside world to the internal internet or from the internal world to the outside world. By
providing these security mechanisms, we increase the processing time before one can access
a machine. So there is a trade-off between security and ease of use. A firewall partitions
an internet into two regions, referred to informally as the inside and the outside.
                            Firewall
 ____________________        _____        ____________________
|                    |      |     |      |                    |
|                    |      |     |      |                    |
|  Rest of Internet  |______|     |______|      Intranet      |
|                    |      |     |      |                    |
|____________________|      |_____|      |____________________|
       Outside                                   Inside
Security Lapses
• Vulnerable Services - NFS : A user should not be allowed to export certain files
to the outside world, and someone from the outside world should not be
allowed to export our files either.
• Routing based attacks : Some kinds of ICMP messages should not be allowed to
enter my network, e.g. source routing and change-route ICMPs.
• Controlled access to our systems : e.g. the mail server and web pages should be
accessible from outside, but our individual PCs should not be accessible from the
outside world.
• Authentication : Encryption can be used between hosts on different networks.
• Enhanced Privacy : Some applications should be blocked, e.g. finger ...
• PING & SYN attack : Since these messages are sent very frequently, the target
is not able to do anything except reply to them. So I should not
allow these messages to enter my network.
So whatever I provide for my security is called a Firewall. It is a mechanism and not
just a piece of hardware or software.
Firewall Mechanisms
1. Network Policy : Here we take into consideration what services are allowed for
outside and inside users; the services which are allowed can have additional
restrictions, e.g. I might be allowed to download things from the net but not upload,
i.e. some outside users cannot download things from our net. There might be some
exceptional cases which have to be handled separately. And if some new application comes
up, we choose an appropriate network policy for it.
2. Authentication mechanism : An application can be designed which asks for a password
for authentication.
3. Packet Filtering : Routers have information about some particular packets which should
not be allowed.
1. Complacency : There are lots of attacks on the firewall from internal users, and
therefore its limitations should be understood.
3. Throughput : In order to check which packets are allowed and which are not, we do
some processing, which can be an overhead and thus affects throughput.
• One time passwords : a password is used only once and then it changes. Only
the user and the machine know the changing passwords.
• password aging : Users are forced to change passwords at regular intervals.
• smart cards : swiped through the PC.
• biometrics : eyes or finger prints are used.
Packet Filtering :
Terms associated:
• Source IP address
• Destination IP address
• Source port #
• Destination port #
• protocol
• interface
Many commercial routers offer a mechanism that augments normal routing and
permits a manager to further control packet processing. Informally called a packet filter,
the mechanism requires the manager to specify how the router should dispose of each
datagram. For example, the manager might choose to filter (i.e. block) all datagrams that
come from a particular source or those used by a particular application, while choosing to
route other datagrams to their destination.
The term packet filter arises because the filtering mechanism does not keep a record
of interaction or a history of previous datagrams. Instead, the filter considers each
datagram separately. When a datagram first arrives, the router passes the datagram
through its packet filter before performing any other processing. If the filter rejects the
datagram, the router drops it immediately.
For example, normally I won't allow TFTP, openwin, RPC, rlogin or rsh packets to pass
through the router, whether from inside or outside; the router just discards these packets.
But I might put some restrictions on telnet, ftp, http and smtp packets passing
through the router, and therefore some processing has to be done before discarding or
allowing these packets.
Because TCP/IP does not dictate a standard for packet filters, each router vendor is free
to choose the capabilities of their packet filter as well as the interface the manager uses to
configure the filter. Some routers permit a manager to configure separate filter actions
for each interface, while others have a single configuration for all interfaces. Usually,
when specifying datagrams that the filter should block, a manager can list any
combination of source IP address, destination IP address, protocol, source protocol port
number, and destination protocol port number.
So, these filtering rules may become tricky with complex network policies.
Since filtering rules are based on port numbers, there is a problem with RPC
applications. First, the number of well-known ports is large and growing. Thus, a
manager would need to update such a list continually because a simple error of omission
could leave the firewall vulnerable. Second, much of the traffic on an internet does not
travel to or from a well-known port. In addition to programmers who can choose port
numbers for their private client-server applications, services like Remote Procedure Call
(RPC) assign ports dynamically. Third, listing ports of well-known services leaves the
firewall vulnerable to tunneling, a technique in which one datagram is temporarily
encapsulated in another for transfer across part of an internet.
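The rule matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Packet` and `Rule` classes, the `filter_packet` function, the addresses and the sample policy are all hypothetical and do not correspond to any vendor's filter syntax. The point it shows is that the filter is stateless: each datagram is compared against the rule list on its own, and the first matching rule decides its fate.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str        # "tcp" or "udp"
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class Rule:
    action: str                      # "allow" or "block"
    src_net: str = "0.0.0.0/0"       # any source by default
    dst_net: str = "0.0.0.0/0"       # any destination by default
    protocol: Optional[str] = None   # None matches any protocol
    dst_port: Optional[int] = None   # None matches any destination port

    def matches(self, pkt: Packet) -> bool:
        return (
            ip_address(pkt.src_ip) in ip_network(self.src_net)
            and ip_address(pkt.dst_ip) in ip_network(self.dst_net)
            and self.protocol in (None, pkt.protocol)
            and self.dst_port in (None, pkt.dst_port)
        )


def filter_packet(rules, pkt, default="block"):
    """Return the action of the first matching rule; no history is kept."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default


# Hypothetical policy: block TFTP everywhere, allow SMTP only to the mail server.
RULES = [
    Rule("block", protocol="udp", dst_port=69),
    Rule("allow", dst_net="192.0.2.25/32", protocol="tcp", dst_port=25),
]
```

Anything that no rule mentions falls through to the default action, which is why an omitted port in a "block list" style policy can leave a hole, while a "default block" policy fails safe.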
I can run multiple proxies on the same machine. They may detect misuse by keeping logs.
For example, some machines give logins to Ph.D. students. In this case it is better to
keep proxy servers than to give logins on those machines. But the disadvantage is
that there are two connections for each process.
 _________                 _________
|         |               |         |
|  User   |_______________|  Proxy  |___________ Outside
|_________|      1.       |_________|     2.
           __________                           __________
          |          |                         |          |
Inside ___| Router 1 |_________________________| Router 2 |______
          |__________|            |            |__________|
                                  |
                              | Proxy |
The problem with this is that there is only one proxy and thus it may get overloaded.
Therefore, to reduce load, we can use multiple screened host firewalls. This is what
is normally used.
           __________                           __________
          |          |                         |          |
Inside ___| Router 1 |_________________________| Router 2 |
          |__________|       |          |      |__________|
                             |          |
                        | Proxy 1 | Proxy 2 | .......
Modem pool
A user can dial in and open only a terminal server, and he has to give a password. But
TELNET and FTP clients do not understand proxies. Therefore, people came up with the
transparent proxy, which keeps some memory of whether a packet of this kind was
allowed earlier, so that it need not be checked again. The client
does not know that there is somebody in between checking its authentication.
So, a transparent proxy is used only for checking IP packets, whereas a proxy is used
when many IP addresses are not available.
X -------| |
| NAT |
Y -------|___________ |
I may not like to have a global IP address because then anybody can contact me in spite of
these security measures. So, I work with a private IP address. In that case, there has to be
a one-to-one mapping between private IP and global IP.
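A minimal sketch of such a one-to-one mapping, assuming a static table kept by the NAT box. The class name `StaticNat`, its methods and the addresses are hypothetical and chosen only for illustration.

```python
class StaticNat:
    """One-to-one mapping between private and global addresses."""

    def __init__(self, mapping):
        self.out = dict(mapping)                         # private -> global
        self.back = {g: p for p, g in self.out.items()}  # global -> private
        if len(self.back) != len(self.out):
            raise ValueError("mapping must be one-to-one")

    def outbound(self, src_ip):
        """Rewrite the source address of a packet leaving the network."""
        return self.out[src_ip]

    def inbound(self, dst_ip):
        """Rewrite the destination of an incoming packet; None means drop it."""
        return self.back.get(dst_ip)


nat = StaticNat({"10.0.0.5": "203.0.113.5", "10.0.0.6": "203.0.113.6"})
```

A host that has no entry in the table has no global address at all, so nobody outside can address it directly.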
Wireless Networks
As the demands on communication grew, new technologies in
the field of networks developed. One of them is the use of wireless networks: the
transmission of data from source to destination without the use of wires as the physical
medium.
1. They are ubiquitous networks. As they do not require messy wires as a medium of
communication, they can be used to connect far-off places.
2. They are cheaper than wired networks, especially in the case of long-distance
communication.
3. They are quite effective and fast, especially with the modern advancements in
this field.
1. Circuit Switching: In this technology, when a user makes a call, the resources are
reserved for him. The advantage of this technology is that it prevents collisions
among various users. But the disadvantage is that it leads to inefficient utilization
of bandwidth: if the user fails to send data, or if the transmission speed is faster
than the speed at which data is produced, then most of the bandwidth is wasted.
2. Packet Switching: In this technology, resources are never reserved for any
particular user. The advantage of this technology is that it leads to efficient
utilization of bandwidth, i.e. the channel is never idle unless there are no
users. But the disadvantage is that it causes many collisions.
ATM was built as a combination of the best features of these two. ATM also provides
QoS (Quality of Service) based on the following priority pattern:
1. CBR-Constant Bit Rate: Jobs that can tolerate no delay are assigned the CBR
priority. These jobs are provided the same number of bits in every frame. For
example, viewing a video reel definitely requires some blocks in every frame.
2. VBR-Variable Bit Rate: Jobs that may produce different sized packets at
different times are assigned VBR priority. They are provided with a variable
number of bits, varying between a maximum and a minimum, in different frames.
e.g. a document may be compressed differently by different machines, so
transmitting it will be a variable transmission.
3. ABR-Available Bit Rate: This is the same as VBR except that it has only the
minimum fixed. If there are no CBR or VBR jobs left, it can use the entire frame.
4. UBR-Unspecified Bit Rate: These jobs are the lowest priority jobs. The network
does not promise anything but simply tries its best to transmit them.
WLAN-Wireless LAN
This is currently in use, as dictated by the IEEE 802.11 standards. It can be
installed at the medium access layer, and data transmission can occur using a
converter to reach the wired LAN network (IEEE 802.x).
WATM-Wireless ATM
It is the wireless version of ATM. It provides QoS. It is not yet available in the market,
because installing it would require the simultaneous installation of ATM infrastructure. It is
currently being tested thoroughly.
Coupling of Networks:
The alternatives are:
1. WLAN-LAN is the simplest of the above. According to the IEEE standards, the
IEEE 802.11 (WLAN) can be used with IEEE 802.x (LAN) as follows:
Spread Spectrum: To reduce the effect of noise signals, the bandwidth of the signal is
increased tremendously. This is costly but assures better transmission. This is called
SPREAD-SPECTRUM. It is used in two ways:
• FHSS (Frequency hopping spread spectrum): The entire packet is not sent in
the same frequency band. Say it is sent at frequency range A for time T1, frequency
range B for time T2, A for T1, B for T2 and so on. The receiver also knows this
sequence and so listens at A for time T1, then at B for time T2 and so on. This
shared understanding between the sender and receiver prevents the signal from
being completely garbled.
• DSSS (Direct Sequence Spread Spectrum): This involves sending coded
data instead of the actual data. The code is known only to the destination, which
can therefore decipher the data.
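The DSSS idea can be illustrated with a short sketch. It assumes that sender and receiver share an 11-chip Barker sequence as the spreading code and that chips are combined with data bits by XOR; the function names `spread` and `despread` are made up for this example, and real radios work on modulated signals rather than lists of bits.

```python
# 11-chip Barker sequence, assumed here as the code shared by both ends.
BARKER_11 = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]


def spread(bits, code=BARKER_11):
    """Replace every data bit by len(code) chips: chip = bit XOR code chip."""
    return [bit ^ c for bit in bits for c in code]


def despread(chips, code=BARKER_11):
    """Recover each data bit by a majority vote over its group of chips."""
    n = len(code)
    bits = []
    for i in range(0, len(chips), n):
        group = chips[i:i + n]
        ones = sum(chip ^ c for chip, c in zip(group, code))
        bits.append(1 if ones > n // 2 else 0)
    return bits


data = [1, 0, 1, 1]
chips = spread(data)

# Flip three chips of the first symbol to imitate a burst of interference.
noisy = chips[:]
for i in (0, 4, 9):
    noisy[i] ^= 1
```

Each data bit occupies eleven chips, so the transmitted signal needs roughly eleven times the bandwidth; in exchange, a few damaged chips per symbol are outvoted by the rest, and `despread(noisy)` still yields the original `data`.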
The problem still left unaddressed is that of bursty errors. If there is a lot of traffic,
interference may hinder the Base Station from receiving data for a burst of time. This is
called "Bursty Errors".
Such problems are handled by MAC (Medium Access Control).
The Ad-Hoc network can be set up anytime. It does not require a Base Station. It is
generally used for indoor purposes.
The Infrastructure network involves Base Station and Mobile Terminals. It provides
uplink facility ( link from MT to BS) and downlink facility (link from BS to MT).
This protocol decides how to assign data slots to different users. The various policies it
uses are:
a fixed bandwidth in which he can speak at all
3. CDMA (CODE DIVISION MULTIPLE ACCESS): Each user is given different
frequencies at different times. This ensures that each user gets a fair amount of
Also, sometimes, statistical multiple access is used in which a slot is assigned to a user
only if it has data to send.
1. FDD (Frequency Division Duplex): This provides two separate bands for
uplink and downlink transmission. This leads to inefficient utilization of
bandwidth, as there is more traffic on the downlink than on the uplink.
2. TDD (Time Division Duplex): This provides an adaptive boundary between
uplink and downlink, which depends on what is being used at that
particular time. It works as follows:
Any mobile terminal can be in 3 states : empty state, request state and ready-to-transmit
state.
Network Security
Data on the network is analogous to possessions of a person. It has to be kept secure from
others with malicious intent. This intent ranges from bringing down servers on the
network to using people's private information like credit card numbers to sabotage of
major organizations with a presence on a network. To secure data, one has to ensure that
it makes sense only to those for whom it is meant. This is the case for data transactions
where we want to prevent eavesdroppers from listening to and stealing data. Other
aspects of security involve protecting user data on a computer by providing password
restricted access to the data and maybe some resources so that only authorized people get
to use these, and identifying miscreants and thwarting their attempts to cause damage to
the network among other things.
1. Authentication: We have to check that the person who has requested
something or has sent an e-mail is indeed who he claims to be. In this process we will
also look at how the person authenticates his identity to a remote machine.
2. Integrity: We have to check that the message which we have received is indeed
the message which was sent. Here CRC will not be enough because somebody
may deliberately change the data. Nobody along the route should be able to
change the data.
3. Confidentiality: Nobody should be able to read the data on the way, so we need
to encrypt it.
4. Non-repudiation: Once we have sent a message, there should be no way that we can
deny sending it; we have to accept that we sent it.
5. Authorization: This refers to the kind of service which is allowed for a particular
client. Even though a user is authenticated we may decide not to authorize him to
use a particular service.
For authentication, if two persons know a secret then we just need to prove that no third
person could have generated the message. But for Non-repudiation we need to prove that
even the sender could not have generated the message. So authentication is easier than
Non-repudiation. To ensure all this, we take the help of cryptography. We can have two
kinds of encryption :
1. Symmetric Key Encryption: There is a single key which is shared between the
two users, and the same key is used for encrypting and decrypting the message.
2. Public Key Encryption: There are two keys with each user : a public key and a
private key. The public key of a user is known to all, but the private key is not
known to anyone except the owner of the key. If a user encrypts a message with his
private key, then it can be decrypted by anyone using the sender's public key.
To send a message securely, we encrypt the message with the public key of the
receiver, so that it can only be decrypted by the receiver with his private key.
Symmetric key encryption is much faster and efficient in terms of performance. But it
does not give us Non-repudiation. And there is a problem of how do the two sides agree
on the key to be used assuming that the channel is insecure ( others may snoop on our
packet ). In symmetric key exchange, we need some amount of public key encryption for
authentication. However, in public key encryption, we can send the public key in plain
text and so key exchange is trivial. But this does not authenticate anybody. So along with
the public key, there needs to be a certificate. Hence we would need a public key
infrastructure to distribute such certificates in the world.
Key Exchange in Symmetric Key Schemes
We will first look at the case where we can use public key encryption for this key
exchange. The sender first encrypts the message using the symmetric key. Then the
sender encrypts the symmetric key, first using its private key and then using the receiver's
public key. So we are doing the encryption twice. If we also send the certificate along
with this, then we have authentication as well. So what we finally send looks like this :
Z : Certificate_sender + Public_receiver ( Private_sender ( Ek ) ) + Ek ( M )
Here Ek stands for the symmetric key and Ek ( M ) for the message which has been
encrypted with this symmetric key.
However this still does not ensure integrity. The reason is that if there is some change in
the middle element, then we will not get the correct key and hence the message which we
decrypt will be junk. So we need something similar to CRC but slightly more
complicated. This is because somebody might change the CRC and the message
consistently. This function is called Digital Signature.
Digital Signatures
Suppose A has to send a message to B. A computes a hash function of the message and
then sends this after encrypting it using its own private key. This constitutes the signature
produced by A. B can now decrypt it, recompute the hash function of the message it has
received and compare the two. Obviously, we would need the hash functions to be such
that the probability of two messages hashing to the same value is extremely low. Also, it
should be difficult to compute a message with the same hash function as another given
message. Otherwise any intruder could replace the message with another that has the
same hash value and leave the signatures intact leading to loss of integrity. So the
message along with the digital signature looks like this :
Z + Private_sender ( Hash ( M ) )
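The sign-and-verify steps can be sketched with textbook RSA. Everything below is a toy under stated assumptions: the key is built from two small Mersenne primes picked only so that the example runs, the digest is reduced modulo N to fit the key, and no padding is applied, so this must not be mistaken for a usable signature scheme. The names `digest`, `sign` and `verify` are hypothetical.

```python
import hashlib

# Toy RSA key from two known Mersenne primes; far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Hash(M), reduced modulo N so that it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Private_sender(Hash(M)): transform the digest with the private key."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Undo the signature with the public key and compare with a fresh digest."""
    return pow(signature, E, N) == digest(message)


msg = b"pay 100 to B"
sig = sign(msg)
```

`verify(msg, sig)` succeeds, while the same signature attached to an altered message such as `b"pay 900 to B"` is rejected, because an intruder cannot produce the matching signature without the private exponent.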
Digital Certificates
In addition to using the public key we would like to have a guarantee of talking to a
known person. We assume that there is an entity which is trusted by everyone and whose
public key is known to everybody. This entity gives a certificate to the sender containing the
sender's name, some other information and the sender's public key. This whole
information is encrypted with the private key of the trusted entity. A person can decrypt this
message using the public key of the trusted authority. But how can we be sure that the
public key of the authority is correct ? In this respect Digital signatures are like I-Cards.
Let us ask ourselves the question : How safe are we with I-Cards? Consider a situation
where you go to the bank and need to prove your identity. I-Card is used as a proof of
your identity. It contains your signature. How does the bank know you did not make the
I-Card yourself? It needs some proof of that and in the case of I-Cards they contain a
counter signature by the director for the purpose. Now how does the bank know the
signature I claim to be of the director indeed belongs to him? Probably the director will
also have an I-Card with a counter signature of a higher authority. Thus we will get a
chain of signing authorities. Thus in addition to signing we need to prove that the
signatures are genuine and for that purpose we would probably use multiple I-Cards each
carrying a higher level of signature-counter signature pair.
Key exchange in symmetric key schemes is a tricky business because anyone snooping
on the exchange can get hold of the key if we are not careful, and since there is no
public-private key arrangement here, he can obtain full control over the communication.
There are various approaches to the secure exchange of keys in these schemes. We look at
one approach, which is as follows:
A and B are two persons wishing to communicate. Each of them generates a random
number, say x and y respectively. There is a function f which is practically impossible
to invert. Now A sends f(x) to B and B sends f(y) to A. So now A knows x and f(y), and B
knows y and f(x). There is another function g such that g(x, f(y)) = g(y, f(x)). The key
used by A is g(x, f(y)) and that used by B is g(y, f(x)). Both are actually the same. The
implementation of this approach is described below :
1. A picks two large numbers n and g, where n is a prime. There are other conditions also
that these numbers must satisfy.
2. A sends n, g and g^x mod n to B in a message. B evaluates (g^x mod n)^y mod n to be
used as the key.
3. B sends g^y mod n to A. A evaluates (g^y mod n)^x mod n to be used as the key. So now
both parties have the common number g^(xy) mod n. This is the symmetric (secret
communication) key used by both A and B now.
This works because, although other people know n, g, g^x mod n and g^y mod n, they still
cannot evaluate the key, since they do not know either x or y.
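The three steps map directly onto Python's built-in modular exponentiation `pow(base, exp, mod)`. The sketch below is illustrative only: the public parameters (the Mersenne prime 2**127 - 1 for n and 3 for g) are assumptions made so that the example runs, not values recommended by these notes, and the helper name `keypair` is hypothetical.

```python
import secrets

# Hypothetical public parameters: n is the Mersenne prime 2**127 - 1, g = 3.
# Real deployments use standardized and much larger groups.
n = 2**127 - 1
g = 3


def keypair():
    """Pick a private random number and compute the value sent on the wire."""
    secret = secrets.randbelow(n - 3) + 2      # 2 <= secret <= n - 2
    return secret, pow(g, secret, n)


x, gx = keypair()          # A keeps x and sends n, g, g^x mod n
y, gy = keypair()          # B keeps y and sends g^y mod n

key_a = pow(gy, x, n)      # A computes (g^y mod n)^x mod n
key_b = pow(gx, y, n)      # B computes (g^x mod n)^y mod n
```

Both sides end up with `g^(xy) mod n` without x or y ever being transmitted.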
However, there is a security problem even then. Though this system cannot be broken,
it can be bypassed. The situation we are referring to is called the man-in-the-middle
attack. We assume that there is a guy C in between A and B. C has the ability to
capture packets and create new packets. When A sends n, g and g^x mod n, C captures
them and sends n, g and g^z mod n to B. On receiving this, B sends n, g and g^y mod n, but
again C captures these and sends n, g and g^z mod n to A. So A will use the key
(g^z mod n)^x and B will use the key (g^z mod n)^y. Both these keys are known to C, so
when a packet comes from A, C decrypts it using A's key, encrypts it with the key it
shares with B and then sends it to B. Again, when a packet comes from B, it does a similar
thing before sending the packet to A. So effectively there are two keys: one operating
between A and C and the other between C and B.
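The attack can be replayed numerically. As before, the parameters n and g are assumptions made for the sake of a runnable example, and the variable names are made up; the sketch only shows that C can compute both keys from the values it captured.

```python
import secrets

n, g = 2**127 - 1, 3       # hypothetical public parameters (n is a Mersenne prime)


def secret():
    return secrets.randbelow(n - 3) + 2


x, y, z = secret(), secret(), secret()   # private numbers of A, B and intruder C

gx = pow(g, x, n)          # sent by A, captured by C
gy = pow(g, y, n)          # sent by B, captured by C
gz = pow(g, z, n)          # what C forwards to both A and B instead

key_a = pow(gz, x, n)      # A believes it shares this key with B
key_b = pow(gz, y, n)      # B believes it shares this key with A

key_c_with_a = pow(gx, z, n)   # C derives A's key from the captured g^x mod n
key_c_with_b = pow(gy, z, n)   # C derives B's key from the captured g^y mod n
```

Neither A nor B can notice anything from the arithmetic alone, which is why the exchange needs some form of authentication on top.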
There must be some solution to this problem. The solution may be such that we are
not able to communicate further ( because our keys are different ), but at least we can
prevent C from looking at the data. We have to do something so that C cannot encrypt or
decrypt the data. We use a policy that A sends only half a packet at a time. C cannot
decrypt half a packet and so it is stuck. A sends the other half only when it receives a
half-packet from B. C has two options when it receives half a packet :
1. It does not send the packet to B at all and dumps it. In this case B will anyway
come to know that there is some problem, and so it will not send its half-packet.
2. It forwards the half-packet as it is to B. Now when B sends its half-packet, A
sends the remaining half. When B decrypts this entire packet it sees that the data
is junk, and so it comes to know that there is some problem in communication.
Here we have assumed that there is some application level understanding between A and
B, like the port number. If A sends a packet at port number 25 and receives a packet at
port number 35, then it will come to know that there is some problem. At the very least
we have ensured that C cannot read the packets, though it can block the communication.
There is another much simpler method of exchanging keys which we now discuss :
Network Security(Contd...)
Key Distribution Centre(Recap.)
There is a central trusted node called the Key Distribution Center ( KDC ). Every node
has a key which is shared between it and the KDC. Since no one else knows node A's
secret key KA, the KDC is sure that a message encrypted with KA has come from A. When A
wants to communicate with B it could do one of two things:
1. A sends a message encrypted with its key KA to the KDC. The KDC then sends a
common key KS to both A and B, encrypted with their respective keys KA and KB. A
and B can communicate safely using this key.
2. Otherwise A sends a key KS to the KDC, encrypted with the key KA, saying that it
wants to talk to B. The KDC sends a message to B saying that A wants to communicate
with it using KS.
There is a problem with this implementation. It is prone to replay attacks. The messages
are in encrypted form and hence would not make sense to an intruder, but they may be
replayed to the listener again and again, with the listener believing that the messages are
from the correct source. When A sends a message KA(M), C can send the same message to
B by using the IP address of A. A solution is to use the key only once. If the
first message KA(A,KS) is also replayed along with KS(M), then again we may have
trouble. In case this happens, B should accept packets only with higher sequence
numbers.
• Timestamps which however don't generally work because of the offset in time
between machines. Synchronization over the network becomes a problem.
• Nonce numbers which are like ticket numbers. B accepts a message only if it has
not seen this nonce number before.
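A small sketch of the nonce idea follows. It makes one substitution that should be stated plainly: instead of encrypting the message, it attaches an HMAC tag computed with the shared key, which is enough to show how the receiver detects a replay. The `Receiver` class and `send` function are hypothetical names.

```python
import hashlib
import hmac
import secrets


class Receiver:
    """Accepts a message only if its tag is valid and its nonce is new."""

    def __init__(self, key: bytes):
        self.key = key
        self.seen = set()          # nonces that were already accepted

    def accept(self, nonce: bytes, body: bytes, tag: bytes) -> bool:
        expected = hmac.new(self.key, nonce + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False           # forged or corrupted message
        if nonce in self.seen:
            return False           # replay of an old message
        self.seen.add(nonce)
        return True


def send(key: bytes, body: bytes):
    """Sender side: attach a fresh random nonce and a tag over nonce + body."""
    nonce = secrets.token_bytes(16)
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce, body, tag


key = secrets.token_bytes(32)      # stands in for a key shared by A and B
b = Receiver(key)
packet = send(key, b"transfer 10")
```

The first `b.accept(*packet)` returns True and an identical second delivery returns False. The price is that B has to remember every nonce it has seen, which is the practical drawback of this approach.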
In general, 2-way handshakes are always prone to attacks. So we now look at another
scheme.
This is like a bug-fix to the KDC scheme to eliminate replay attacks. A 3-way handshake
(using nonce numbers), very similar to the ubiquitous TCP 3-way handshake, is used
between the communicating parties. A sends a random number RA to the KDC. The KDC sends
back a ticket to A which contains the common key to be used.
RA, RB and RA2 are nonce numbers. RA is used by A to communicate with the
KDC. On getting the appropriate reply from the KDC, A starts communicating
with B, whence another nonce number RA2 is used. The first three messages tell B
that the message has come from KDC and it has authenticated A. The second last
message authenticates B. The reply from B contains RB, which is a nonce number
generated by B. The last message authenticates A. The last two messages also
remove the possibility of replay attack.
However, the problem with this scheme is that if somehow an intruder gets to
know the key KS ( maybe a year later ), then he can replay the entire thing
( provided he had stored the packets ). One possible solution can be that the ticket
contains a time stamp. We could also put a condition that A and B should change
the key every month or so. To improve upon the protocol, B should also involve
the KDC for authentication. We look at one possible improvement here, which is a
different protocol.
Suppose nodes A and B have a shared key KAB which was somehow pre-decided
between them. Can we have a secure communication between A and B ? We must
have some kind of a three way handshake to avoid replay attacks. So, we need to
have some interaction before we start sending the data. A challenges B by sending
it a random number RA and expects a reply encrypted using the pre-decided key
KAB. B then challenges A by sending it a random number RB and expects a
reply encrypted using the pre-decided key KAB.
1. A, RA------------->
2. <--------KAB(RA), RB
3. KAB(RB)---------->
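The three messages can be walked through in code. One assumption is made explicit: `respond` uses an HMAC of the challenge as a stand-in for the encryption KAB(R), since the only property needed here is that nobody without the shared key can produce the answer. All names are hypothetical.

```python
import hashlib
import hmac
import secrets


def respond(key: bytes, challenge: bytes) -> bytes:
    """Stand-in for KAB(R): only a holder of the shared key can compute it."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


k_ab = secrets.token_bytes(32)     # pre-decided shared key

# 1. A -> B : A, RA
r_a = secrets.token_bytes(16)

# 2. B -> A : KAB(RA), RB
reply_b, r_b = respond(k_ab, r_a), secrets.token_bytes(16)

# A checks B's answer to its challenge.
b_is_authentic = hmac.compare_digest(reply_b, respond(k_ab, r_a))

# 3. A -> B : KAB(RB)
reply_a = respond(k_ab, r_b)

# B checks A's answer to its challenge.
a_is_authentic = hmac.compare_digest(reply_a, respond(k_ab, r_b))
```

Note that B answers A's challenge in message 2 before A has proved anything, which is exactly what the attack discussed next exploits.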
Unfortunately this scheme is so simple that it does not work. The protocol
relies on the assumption that there is a unique connection between A and B. If
multiple connections are possible, then the protocol fails. In a replay attack, we
could repeat the message KAB(M) if we can somehow convince B that we are A.
Here, a node C need not know the shared key to communicate with B. To identify
itself as A, C just needs to send KAB(RB1) as the response to the challenge value
RB1 given by B in the first connection. Remarkably, C can get this value through a
second connection by asking B itself to provide the response to its own challenge.
Thus, C can authenticate itself and start communicating freely with B.
Thus, replay of messages becomes possible using the second connection. Any
encryption desired can be obtained by sending the value as the challenge in the second
connection and obtaining its encrypted value from B itself.
1st Connection: A, RA------------->
                <----------KAB(RA), RB1
2nd Connection: A, RB1------------>
                <--------- KAB(RB1), RB2
1st Connection: KAB(RB1)--------->
Can we have a simple solution apart from time-stamps ? We could send KAB(RA,RB)
in the second message instead of KAB(RA) and RB. It may help if we keep two
different keys for the two directions. So we share two keys, one from A to B and
the other from B to A. If we use only one key, then we could use different number
spaces ( like even and odd ) for the two directions. Then an intruder posing as A
would not be able to send RB back as its own challenge. So basically we are trying
to treat the traffic in the two directions as two different traffics. This particular
type of attack is called a reflection attack.
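The reflection attack and the "treat the two directions differently" fix can both be sketched in a few lines. The assumptions are the same as before: an HMAC stands in for KAB(R), and mixing a direction label into the HMAC plays the role of keeping separate keys or number spaces per direction. The class `B`, the function `reflection_attack` and the labels are hypothetical.

```python
import hashlib
import hmac
import secrets

K_AB = secrets.token_bytes(32)          # known to A and B only, never to C


def enc(direction: bytes, challenge: bytes) -> bytes:
    """Stand-in for KAB(R); `direction` separates A->B from B->A traffic."""
    return hmac.new(K_AB, direction + challenge, hashlib.sha256).digest()


class B:
    """B's side of the 3-message protocol, serving several connections."""

    def __init__(self, directional: bool):
        self.directional = directional
        self.pending = set()            # challenges B has issued

    def open(self, challenge: bytes):
        """Message 2: answer the caller's challenge and issue our own."""
        rb = secrets.token_bytes(16)
        self.pending.add(rb)
        label = b"B->A" if self.directional else b""
        return enc(label, challenge), rb

    def finish(self, rb: bytes, answer: bytes) -> bool:
        """Message 3: check the caller's answer to our challenge."""
        label = b"A->B" if self.directional else b""
        return rb in self.pending and hmac.compare_digest(answer, enc(label, rb))


def reflection_attack(b: B) -> bool:
    """C, who never calls enc() itself, tries to log in as A."""
    _, rb1 = b.open(secrets.token_bytes(16))   # 1st connection: obtain RB1
    stolen, _ = b.open(rb1)                    # 2nd connection: B encrypts RB1
    return b.finish(rb1, stolen)               # 1st connection: reflect it back
```

With `B(directional=False)` the attack succeeds; with `B(directional=True)` the value B produces in its own direction is useless as an answer in the other direction, so the attack fails while an honest A still gets through.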
5 - way handshake
We should tell the sender that the person who initiates the connection should
authenticate himself first. So we look at another protocol. Here we are using a
5-way handshake, but it is more secure. When we combine the messages, we
change the order of authentication, which leads to problems. Basically
KAB(RB) should be sent before KAB(RA).
1. A------------------>
2. <-----------------RB
3. KAB(RB)---------->
4. RA---------------->
5. <----------KAB(RA)
Fig: 5-way handshake in Challenge-Response Protocol
However, in this case also, if we have a node C in the middle, then C can pose as
B and talk to A. So C can mount a replay attack by sending messages which it had
stored some time ago.
Kerberos was created by Massachusetts Institute of Technology as a solution to many
network security problems. It is being used in the MIT campus for reliability. The basic
features of Kerberos may be put as:
1. Client to the Authentication Server (AS): The following data in plain text form
are sent:
   o Username.
   o Ticket Granting Server (TGS) name.
   o A nonce id 'n'.
2. Response from the Authentication Server (AS) to the Client: The following data
in encrypted form with the key shared between the AS and the Client is sent:
   o The TGS session key.
   o The Ticket Granting Ticket. This contains the following data encrypted
     with the TGS password and can be decrypted by the TGS only.
       - The TGS name.
       - The Work Station address.
       - The TGS session key.
   o The nonce id 'n'.
3. Client to the Ticket Granting Server: This contains the following data
   o The Ticket Granting Ticket.
   o Authenticator.
   o The Application Server.
   o The nonce id 'n'.
4. Ticket Granting Server to the Client: The following data encrypted by the TGS
session key is sent:
   o The new session key.
   o The nonce id 'n'.
   o Ticket for the application server. The ticket contains the following data
     encrypted by the application server's key:
       - Server name.
       - The Workstation address.
       - The new session key.
After these exchanges the identity of the user is confirmed and the normal exchange of
data in encrypted form using the new session key can take place. The current version of
Kerberos being developed is Kerberos V5.
Types of Tickets
By establishing "inter-realm" keys, the administrators of two realms can allow a client
authenticated in the local realm to use its authentication remotely. (Of course, with
appropriate permission the client could arrange registration of a separately-named
principal in a remote realm, and engage in normal exchanges with that realm's services.
However, for even small numbers of clients this becomes cumbersome, and more
automatic methods as described here are necessary.) The exchange of inter-realm keys (a
separate key may be used for each direction) registers the ticket-granting service of each
realm as a principal in the other realm. A client is then able to obtain a ticket-granting
ticket for the remote realm's ticket-granting service from its local realm. When that
ticket-granting ticket is used, the remote ticket-granting service uses the inter-realm key
(which usually differs from its own normal TGS key) to decrypt the ticket-granting ticket,
and is thus certain that it was issued by the client's own TGS. Tickets issued by the
remote ticket-granting service will indicate to the end-service that the client was
authenticated from another realm.
Limitations of Kerberos
Whether it came from A or from someone else, but he plays along and sends A back a
message containing A's n1, his own random number n2 and a proposed session key, Ks.
When A gets this message, he decrypts it using his private key. He sees n1 in it, and
hence is sure that B actually got the message. The message must have come from B,
since no one else can determine n1. A agrees to the session by sending back a message. When
B sees n2 encrypted with the session key he just generated, he knows A got the message and
verified n1.
Digital Signatures
The authenticity of many legal, financial and other documents is determined by the
presence or absence of an authorized handwritten signature. The problem of devising a
replacement for handwritten signatures is a difficult one. Basically, what is needed is a
system by which one party can send a signed message to another party in such a way that:
Message Digest
One criticism of signature methods is that they often couple two distinct functions :
authentication and secrecy. Often, authentication is needed but secrecy is not. Since
cryptography is slow, it is frequently desirable to be able to send signed plaintext
documents. One scheme, known as MESSAGE DIGEST, is based on the idea of a one-way
hash function that takes an arbitrarily long piece of plaintext and from it computes a
fixed length bit string. This hash function has three important properties:
A DFS (distributed file system) is a file system whose clients, servers and storage devices
are dispersed among the machines of a distributed system. A file system provides a set of
file operations like
read, write, open, close, delete etc. which forms the file services. The clients are provided
with these file services. The basic features of DFS are multiplicity and autonomy of
clients and servers.
NFS follows almost the same directory structure as a non-NFS system, but there are
some differences between them with respect to:
• Naming
• Path Names
• Semantics
Naming is a mapping between logical and physical objects. For example, users refer to a
file by a textual name, but it is mapped to disk blocks. There are two notions regarding
name mapping used in DFS.
• Location Transparency: The name of a file does not give any hint of file's
physical storage location.
• Location Independence: The name of a file does not need to be changed when
file's physical storage location changes.
A location independent naming scheme is basically a dynamic mapping. NFS does not
support location independence.
There are three major naming schemes used in DFS. In the simplest approach, files are
named by some combination of the machine (host) name and the path name. This naming
scheme is neither location independent nor location transparent. It may be used on the
server side. The second approach is to attach, or mount, remote directories onto local
directories, which gives the appearance of a coherent directory tree. This scheme is used
by NFS. Early NFS allowed access only to remote directories that had been mounted
beforehand. With the advent of automount, remote directories are mounted on demand,
based on a table of mount points and file structure names. This has further advantages:
the file-mount table is much smaller, and several servers can be specified for each
mount point. The third approach is to use a single name space that is identical on all
machines. In practice, there are many special files that make this approach difficult to
implement.
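The mount-based approach can be illustrated with a small lookup routine. This is a
sketch under simple assumptions, not the way an NFS client is actually implemented: the
mount table contents, the server names and the resolve function are hypothetical. A path
is matched against the longest mount point that contains it and is then rewritten into a
path on the remote server.

```python
# Hypothetical client-side mount table:
# local mount point -> (server, remote directory).
MOUNT_TABLE = {
    "/usr/share": ("serverA", "/export/share"),
    "/home": ("serverB", "/export/home"),
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Return (server, remote_path) for a path under a mount point,
    or (None, path) if the path is purely local.
    The longest matching mount point wins."""
    best = None
    for mount_point in mount_table:
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return (None, path)
    server, remote_dir = mount_table[best]
    return (server, remote_dir + path[len(best):])
```

For example, resolve("/home/alice/notes.txt") yields the pair
("serverB", "/export/home/alice/notes.txt"), while a path such as "/etc/passwd" is left
on the local machine.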
The mount protocol is used to establish the initial logical connection between a server
and a client. A mount operation includes the name of the remote directory to be mounted
and the name of the server machine storing it. The server maintains an export list which
specifies the local file systems that it exports for mounting, along with the names of
the machines permitted to mount them. Unix uses /etc/exports for this purpose. Since the
list has a maximum length, NFS is limited in scalability. Any directory within an exported
file system can be mounted remotely on a machine. When the server receives a mount
request, it returns a file handle to the client. A file handle is basically a data
structure 32 bytes long. It serves as the key for further access to files within the
mounted file system. In Unix terms, the file handle consists of a file system identifier,
which is stored in the super block, and an inode number that identifies the exact mounted
directory within the exported file system. In NFS, one new field, called the generation
number, is added to the inode.
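A rough sketch of the mount step is given below. The export list contents, the client
names and the byte layout of the handle are assumptions made for illustration. Real
servers treat the 32-byte handle as opaque data and choose their own layout. Here the
handle simply packs the file system identifier, the inode number and the extra per-inode
number (called generation in the code) and pads the result to 32 bytes.

```python
import struct

# Hypothetical export list: exported file system -> permitted client names.
EXPORTS = {"/export/home": {"clientA", "clientB"}}

HANDLE_SIZE = 32  # file handle length stated in the text above


def make_handle(fsid, inode, generation):
    # Assumed layout: three unsigned 32-bit integers in network byte
    # order, zero-padded to 32 bytes.
    packed = struct.pack("!III", fsid, inode, generation)
    return packed.ljust(HANDLE_SIZE, b"\0")


def parse_handle(handle):
    return struct.unpack("!III", handle[:12])


def mount(client, directory, fsid, inode, generation):
    """Return a file handle if `directory` lies inside a file system
    that is exported to `client`; otherwise refuse the request."""
    for exported, allowed in EXPORTS.items():
        inside = directory == exported or directory.startswith(exported + "/")
        if inside and client in allowed:
            return make_handle(fsid, inode, generation)
    raise PermissionError(f"{directory} is not exported to {client}")
```

The client never interprets the handle. It only stores it and sends it back with later
requests, which is what allows the server to remain stateless.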
Except for opening and closing a file, there is an almost one-to-one mapping between Unix
system calls for file operations and the NFS protocol RPCs. A remote file operation can
be translated directly to the corresponding RPC. Although NFS conceptually adheres to
the remote service paradigm, in practice it uses buffering and caching. File blocks and
attributes are fetched by RPCs and cached locally. Future remote operations use the
cached data, subject to consistency constraints.
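Since NFS runs on RPC, and RPC runs on UDP/IP, which is unreliable, requests may be lost
and retransmitted. Operations should therefore be idempotent, so that executing a request
more than once has the same effect as executing it once.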
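The difference between an idempotent and a non-idempotent operation can be seen in a
small experiment. The two functions below are illustrative and are not NFS procedures: a
write at an absolute offset can safely be repeated after a retransmission, whereas an
append cannot.

```python
def write_at(data: bytearray, offset: int, payload: bytes) -> None:
    """Idempotent: writing the same bytes at the same absolute offset
    twice leaves the file exactly as writing them once."""
    end = offset + len(payload)
    if len(data) < end:
        data.extend(b"\0" * (end - len(data)))
    data[offset:end] = payload


def append(data: bytearray, payload: bytes) -> None:
    """Not idempotent: a retransmitted request appends the payload again."""
    data.extend(payload)


f1 = bytearray(b"hello")
write_at(f1, 5, b" world")
write_at(f1, 5, b" world")   # duplicate caused by a retransmission

f2 = bytearray(b"hello")
append(f2, b" world")
append(f2, b" world")        # same duplicate, different outcome
```

After the duplicate request, f1 still holds "hello world" while f2 holds
"hello world world". This is why a request that carries the absolute file offset can be
retried safely.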
In NFS, clients use delayed write, but they do not free a delayed-written block until the
server confirms that the data have been written to disk. So here Unix semantics are not
preserved. NFS does not handle client crash recovery as Unix does. Since NFS servers are
stateless, there is no need to handle server crash recovery either.
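The delayed-write rule described above can be sketched as a small client-side buffer.
The class and the send callable are hypothetical. The only point illustrated is that a
block leaves the buffer when the server confirms it, not when it is sent.

```python
class DelayedWriteCache:
    """Hypothetical client buffer for delayed writes."""

    def __init__(self, send):
        # send(block_no, data) -> True if the server confirmed the write
        self._send = send
        self.dirty = {}

    def write(self, block_no, data):
        # Delayed write: the block is only buffered, nothing is sent yet.
        self.dirty[block_no] = data

    def flush(self):
        for block_no in list(self.dirty):
            if self._send(block_no, self.dirty[block_no]):
                # Free the block only after the server has confirmed it.
                del self.dirty[block_no]
```

Blocks that are not confirmed stay in the buffer and are sent again on the next flush,
so a lost reply does not lose data.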
Time Skew
This problem occurs because of differences between the clocks of the server and the
client. It may cause trouble for operations that compare timestamps, such as "make".
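A small numerical example shows how the skew can mislead a timestamp comparison of the
kind make performs. All numbers are hypothetical: the server clock is assumed to run
120 seconds behind the client clock, the target is stamped by the client clock and the
source is stamped by the server clock.

```python
def needs_rebuild(source_mtime, target_mtime):
    # make rebuilds a target when its source is newer than the target.
    return source_mtime > target_mtime


SKEW = 120                     # server clock is 120 s behind the client
client_now = 10_000            # client clock when the target is built

target_mtime = client_now      # target stamped with the client clock

# The source is edited 60 s later and stamped with the server clock.
source_mtime = (client_now + 60) - SKEW

stale_build = not needs_rebuild(source_mtime, target_mtime)
```

Although the source was edited after the target was built, its timestamp looks older, so
the rebuild is skipped and stale_build ends up True.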
Performance Issues
To increase the reliability and system performance, the following things are generally
This is a brief description of NFS version 2. NFS version 3 has already been released; it
is an enhancement of the previous version and removes many of the difficulties and
drawbacks of NFS 2.
• Whole File Serving: The entire file is transferred in one go, limited only by the
maximum size UDP/IP supports
• Whole File Caching: The entire file is cached in the local machine cache,
reducing file-open latency, and frequent read/write requests to the server
• Write On Close: Writes are propagated to the server side copy only when the
client closes the local copy of the file
In AFS, the server keeps track of which files are opened by which clients (which was not
the case in NFS). In other words, AFS has stateful servers, whereas NFS has stateless
servers. Another difference between the two file systems is that AFS provides location
independence (the physical storage location of the file can be changed without having to
change the path of the file, etc.) as well as location transparency (the file name does not
hint at its physical storage location). But as was seen in the last lecture, NFS provides
only location transparency. Stateful servers in AFS allow the server to inform all clients
with open files about any updates made to that file by another client, through what is
known as a callback. Callbacks reach every client holding a copy of the file because the
server issues a callback promise to a client whenever it requests a copy of a file.
• Vice: The server-side process that resides on top of the Unix kernel, providing
shared file services to each client
• Venus: The client side cache manager which acts as an interface between the
application program and the Vice
All the files in AFS are distributed among the servers. The set of files in one server is
referred to as a volume. If a request cannot be satisfied from this set of files, the
Vice server informs the client where it can find the required file.
• Open a file: Venus traps file open system calls generated by the application and
checks whether the call can be serviced locally (i.e. a copy of the file already
exists in the cache) before requesting the file from Vice. It then returns a file
descriptor to the calling application. When Venus requests a file, Vice transfers
a callback promise along with a copy of the file.
• Read and Write: Reads/Writes are done from/to the cached copy.
• Close a file: Venus traps file close system calls and closes the cached copy of the
file. If the file has been updated, it informs the Vice server, which then replaces
its copy with the updated one and issues callbacks to all clients holding
callback promises on this file. On receiving a callback, a client discards its copy
and works on a fresh copy.
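The open, read/write and close steps above can be sketched with two small classes. This
is a toy, single-process model under simplifying assumptions: the class names follow the
text, but the method names, the in-memory "files" dictionary and the direct method call
used for the callback are hypothetical, and whole files are plain strings.

```python
class Vice:
    """Toy server: stores files and remembers callback promises."""

    def __init__(self):
        self.files = {}
        self.promises = {}   # file name -> set of Venus clients

    def fetch(self, name, client):
        # Handing out a copy also records a callback promise.
        self.promises.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, name, data, writer):
        self.files[name] = data
        # Break the promises of every other client holding a copy.
        for client in self.promises.get(name, set()) - {writer}:
            client.callback(name)
        self.promises[name] = {writer}


class Venus:
    """Toy client cache manager."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def open(self, name):
        if name not in self.cache:            # not serviceable locally
            self.cache[name] = self.server.fetch(name, self)
        return name

    def read(self, name):
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data               # only the cached copy changes
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:                # write on close
            self.server.store(name, self.cache[name], self)
            self.dirty.discard(name)

    def callback(self, name):
        # Simplification: the stale copy and any unsaved changes are dropped.
        self.cache.pop(name, None)
        self.dirty.discard(name)
```

If two clients open the same file and one of them writes and closes it, the other
client's cached copy is discarded by the callback, and its next open fetches the updated
version from Vice.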
The server wishes to maintain its state at all times, so that no information is lost due
to crashes. This is ensured by Vice, which writes the state to disk. When the server
comes up again, it also informs all the other servers about its crash, so that
information about updates may be passed to it.
A client may issue an open immediately after it issued a close (this may happen if it has
recovered from a crash very quickly). It will wish to work on the same copy. For this
reason, Venus waits a while (depending on the cache capacity) before discarding copies
of closed files. In case the application had not updated the copy before it closed it, it may
continue to work on the same copy. However, if the copy had been updated, and the client
issued a file open after a certain time interval (say 30 seconds), it will have to ask the
server for the last modification time and, accordingly, request a new copy. For this, the
clocks will have to be synchronized.
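The reopen rule can be written as two small checks. This is a sketch of the decision as
described above, not of any actual Venus code: the function names are hypothetical, and
the 30-second default is only the example interval from the text.

```python
def must_ask_server(was_updated, closed_at, now, max_age=30.0):
    """True if a reopened cached copy has to be validated with the server:
    the copy was updated before it was closed and the reopen comes more
    than max_age seconds after the close."""
    return was_updated and (now - closed_at) > max_age


def needs_new_copy(cached_mtime, server_mtime):
    # Meaningful only if client and server clocks are synchronized.
    return server_mtime > cached_mtime
```

A copy that was closed unmodified, or reopened within the interval, is reused directly.
Otherwise the modification times are compared, which is where the need for synchronized
clocks comes from.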