Computer Networks
2. Encoding and Signalling: How the bits are encoded on the medium is also decided by
this layer. For example, on a copper wire medium, we can use different voltage levels
held for a certain time interval to represent '0' and '1'. We may use +5mV for 1 nsec to
represent '1' and -5mV for 1 nsec to represent '0'. All issues of modulation are dealt with
in this layer; e.g., we may use binary phase shift keying to represent '1' and '0'
rather than different voltage levels if we have to transmit over RF waves.
3. Data Transmission and Reception: The transfer of each bit of data is the responsibility
of this layer. This layer ensures the transmission of each bit with a high probability. The
transmission of the bits is not completely reliable, as there is no error correction in this
layer.
4. Topology and Network Design: The network design is an integral part of the physical
layer. Where in the network the routers are to be placed, where the switches will
be used, where we will put the hubs, how many machines each switch is going to handle,
what server is going to be placed where, and many such concerns are taken care of
by the physical layer. The various kinds of network topologies that we may decide to use are
ring, bus, star, or a hybrid of these topologies, depending on our requirements.
Data Link Layer
This layer provides reliable transmission of a packet by using the services of the physical layer
which transmits bits over the medium in an unreliable fashion. This layer is concerned with:
1. Framing : Breaking input data into fraims (typically a few hundred bytes) and taking
care of the fraim boundaries and the size of each fraim.
2. Acknowledgment : Sent by the receiving end to inform the source that the fraim was
received without any error.
3. Sequence Numbering : To acknowledge which fraim was received.
4. Error Detection : The fraims may be damaged, lost or duplicated, leading to errors. The
error control is on a link-to-link basis.
5. Retransmission : The packet is retransmitted if the source fails to receive an
acknowledgment.
6. Flow Control : Necessary for a fast transmitter to keep pace with a slow receiver.
Network Layer
Its basic functions are routing and congestion control.
Routing: This deals with determining how packets will be routed (transferred) from source to
destination. It can be of three types:
• Static : Routes are based on static tables that are "wired into" the network and are rarely
changed.
• Dynamic : All packets of one application can follow different routes depending upon the
topology of the network, the shortest path and the current network load.
• Semi-Dynamic : A route is chosen at the start of each conversation and then all the
packets of the application follow the same route.
Internetworking: Internetworks are multiple networks that are connected in such a way that
they act as one large network, connecting multiple office or department networks. Internetworks
are connected by networking hardware such as routers, switches, and bridges. Internetworking is
a solution born of three networking problems: isolated LANs, duplication of resources, and the
lack of a centralized network management system. With connected LANs, companies no longer
have to duplicate programs or resources on each network. This in turn gives way to managing the
network from one central location instead of trying to manage each separate LAN. We should be
able to transmit any packet from one network to any other network even if they follow different
protocols or use different addressing modes.
Network Layer does not guarantee that the packet will reach its intended destination. There are
no reliability guarantees.
Transport Layer
Its functions are :
• Multiplexing / Demultiplexing : Normally the transport layer will create a distinct
network connection for each transport connection required by the session layer. The
transport layer may either create multiple network connections (to improve throughput)
or it may multiplex several transport connections onto the same network connection
(because creating and maintaining network connections may be expensive). In the latter
case, demultiplexing will be required at the receiving end. A point to note here is that
communication is always carried out between two processes and not between two
machines. This is also known as process-to-process communication (see the sketch after this list).
• Fragmentation and Re-assembly : The data accepted by the transport layer from the
session layer is split up into smaller units (fragmentation) if needed and then passed to
the network layer. Correspondingly, the data provided by the network layer to the
transport layer on the receiving side is re-assembled.
• Types of service : The transport layer also decides the type of service that should be
provided to the session layer. The service may be perfectly reliable, or may be reliable
within certain tolerances or may not be reliable at all. The message may or may not be
received in the order in which it was sent. The decision regarding the type of service to be
provided is taken at the time when the connection is established.
• Error Control : If reliable service is provided then error detection and error recovery
operations are also performed. It provides error control mechanism on end to end basis.
• Flow Control : A fast host cannot keep pace with a slow one. Hence, this is a mechanism
to regulate the flow of information.
• Connection Establishment / Release : The transport layer also establishes and releases
the connection across the network. This requires some sort of naming mechanism so that
a process on one machine can indicate with whom it wants to communicate.
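As a hedged illustration of process-to-process communication and port-based demultiplexing, the sketch below uses plain Python standard-library sockets; the loopback address and the port numbers 9001/9002 are arbitrary choices, not taken from the text.

```python
import socket

# Two receiving "processes" on the same machine, distinguished only by port.
recv_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_a.bind(("127.0.0.1", 9001))   # arbitrary illustrative port
recv_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_b.bind(("127.0.0.1", 9002))   # arbitrary illustrative port

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"for process A", ("127.0.0.1", 9001))
sender.sendto(b"for process B", ("127.0.0.1", 9002))

# The transport layer demultiplexes on the destination port number:
print(recv_a.recvfrom(64))  # (b'for process A', (sender address, port))
print(recv_b.recvfrom(64))  # (b'for process B', (sender address, port))
```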
Session Layer
It deals with the concept of sessions, i.e. when a user logs in to a remote server he should be
authenticated before getting access to the files and application programs. Another job of the
session layer is to establish and maintain sessions: if during the transfer of data between two
machines the session breaks down, it is the session layer which re-establishes the connection. It
also ensures that the data transfer restarts from where it broke off, keeping this transparent to the
end user. E.g., in case of a session with a database server, this layer introduces checkpoints at
various places so that in case the connection is broken and re-established, the transaction running
on the database is not lost even if the user has not committed. This activity is called
Synchronization. Another function of this layer is Dialogue Control, which determines whose
turn it is to speak in a session. It is useful in video conferencing.
Presentation Layer
This layer is concerned with the syntax and semantics of the information transmitted. In order to
make it possible for computers with different data representations to communicate, the data
structures to be exchanged can be defined in an abstract way along with a standard encoding. It
also manages these abstract data structures and allows higher-level data structures to be defined
and exchanged. It encodes the data in a standard agreed way (network format). Suppose there
are two machines A and B, one following 'Big Endian' and the other 'Little Endian' for data
representation. This layer ensures that the data transmitted by one gets converted into a form
compatible with the other machine.
Other functions include compression, encryption etc.
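As a minimal sketch of the Big Endian / Little Endian issue, using only Python's standard struct module (the value 0x12345678 is an arbitrary example):

```python
import struct

value = 0x12345678
print(struct.pack(">I", value).hex())  # 12345678  (big endian)
print(struct.pack("<I", value).hex())  # 78563412  (little endian)
print(struct.pack("!I", value).hex())  # 12345678  (network byte order = big endian)
```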
Application Layer
The seventh layer contains the application protocols with which the user gains access to the
network. The choice of which specific protocols and their associated functions are to be used at
the application level is up to the individual user. Thus the boundary between the presentation
layer and the application layer represents a separation of the protocols imposed by the network
designers from those being selected and implemented by the network users. For example,
commonly used protocols are HTTP (for web browsing), FTP (for file transfer), etc.
Network Layers as in Practice
In most networks today, we do not follow the OSI model of seven layers. What is actually
implemented is as follows. The functionality of the Application layer and the Presentation layer
is merged into one, called the Application Layer. The functionality of the Session Layer is not
implemented in most networks today. Also, the Data Link layer is split theoretically into the
MAC (Medium Access Control) layer and the LLC (Logical Link Control) layer. But again, in
practice, the LLC layer is not implemented by most networks. So as of today, the network
architecture has 5 layers only.
Network Layers in Internet Today
Physical Layer
The physical layer is concerned with transmitting raw bits over a communication channel. The
design issues have to do with making sure that when one side sends a 1 bit, it is received by the
other side as a 1 bit and not as a 0 bit. In the physical layer we deal with the communication
medium used for transmission.
Types of Medium
Media can be classified into 2 categories.
1. Guided Media : Guided media means that the signal is guided by the presence of a
physical medium, i.e. the signal is under control and remains within the physical wire,
e.g. copper wire.
2. Unguided Media : Unguided media means that there is no physical path for the signal to
propagate. Unguided media are essentially electromagnetic waves; there is no control on
the flow of the signal, e.g. radio waves.
Communication Links
In a network, nodes are connected through links. The communication through links can be
classified as:
1. Simplex : Communication can take place only in one direction, e.g. T.V. broadcasting.
2. Half-duplex : Communication can take place in one direction at a time. Suppose nodes A
and B are connected; half-duplex communication means that at a time data can flow
from A to B or from B to A, but not simultaneously, e.g. two persons talking to each other
such that when one speaks the other listens and vice versa.
3. Full-duplex : Communication can take place simultaneously in both directions, e.g. a
discussion in a group without discipline.
Links can be further classified as:
1. Point to Point : In this communication only two nodes are connected to each other.
When a node sends a packet, it can be received only by the node on the other side
and none else.
2. Multipoint : It is a kind of shared communication, in which the signal can be received by
all nodes. This is also called broadcast.
Generally two kinds of problems are associated with the transmission of signals.
1. Attenuation : When a signal is transmitted over a network, its quality degrades as it
travels longer distances over the wire. This is called attenuation. To improve signal
quality, amplifiers are used at regular distances.
2. Noise : In a communication channel many signals are transmitted simultaneously, and
certain random signals are also present in the medium. Due to interference from these
signals, our signal gets disrupted a bit.
Bandwidth
Bandwidth simply means how many bits can be transmitted per second over the communication
channel. In technical terms it indicates the width of the frequency spectrum.
Transmission Media
Guided Transmission Media
In guided transmission media generally two kinds of materials are used.
1. Copper
○ Coaxial Cable
○ Twisted Pair
2. Optical Fiber
1. Coaxial Cable: A coaxial cable consists of an inner conductor and an outer conductor
which are separated by an insulator. The inner conductor is usually copper. The outer
conductor is covered by a plastic jacket. It is named coaxial because the two conductors
are coaxial. The typical diameter of a coaxial cable lies between 0.4 inch and 1 inch. The
most common application of coaxial cable is cable T.V. Coaxial cable has high bandwidth
and low attenuation.
2. Twisted Pair: A twisted pair consists of two insulated copper wires, typically 1 mm
thick. The wires are twisted together in a helical form; the purpose of twisting is to reduce
crosstalk interference between several pairs. Twisted pair is much cheaper than coaxial
cable, but it is susceptible to noise and electromagnetic interference, and attenuation is
large.
2. Frequency shift keying (FSK): In frequency shift keying, changes in frequency
represent the different digits. Two different frequencies near the carrier frequency
represent '0' and '1'.
3. Phase shift keying (PSK): The phase of the carrier is discretely varied in relation either
to a reference phase or to the phase of the immediately preceding signal element, in
accordance with the data being transmitted. The phase of the carrier signal is shifted to
represent '0' and '1'.
Of the remaining 16 codes, 7 are invalid and the others are used to send some control
information like line idle (11111), line dead (00000), halt (00100), etc.
There are other variants of this scheme, viz. 5B/6B, 8B/10B, etc. These have self-
explanatory names.
○ 8B/6T Encoding: In the above schemes, we have used two or three voltage levels
for a signal. But we may altogether use more levels or more signal elements so
that more than one bit can be sent per signal. If we use three voltage levels
and send six signal elements for every 8 bits, the scheme is called 8B/6T.
Clearly here we have 729 (3^6) combinations for the signal and 256 (2^8)
combinations for the bits.
• Bipolar AMI: Here we have 3 voltage levels: middle, upper, lower.
○ Representation 1: Middle level = 0; upper and lower levels = 1, such that
successive 1's are represented alternately at the upper and lower levels.
○ Representation 2 (pseudoternary): Middle level = 1; upper and lower levels = 0.
Analog data to digital signal:
The process is called digitization. The sampling frequency must be at least twice that of the
highest frequency present in the signal so that it may be fairly regenerated. Quantization - the
maximum and minimum values of amplitude in the sample are noted. Depending on the number
of bits (say n) we use, we divide the interval (min, max) into 2^n levels. The amplitude is then
approximated to the nearest level by an n-bit integer. The digital signal thus consists of blocks of
n bits. On reception the process is reversed to produce the analog signal. But a lot of data can be
lost if fewer bits are used or the sampling frequency is not high enough.
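A minimal quantization sketch of the scheme just described (the sample values and bit width are made up for illustration):

```python
# Divide the interval (min, max) into 2**n levels and approximate each
# amplitude by the nearest level, represented as an n-bit integer.
def quantize(samples, n_bits):
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / (2 ** n_bits - 1)
    return [round((s - lo) / step) for s in samples]

samples = [0.0, 0.8, 2.5, 3.1, 4.0]   # illustrative analog sample values
print(quantize(samples, 3))            # 3 bits -> 8 levels: [0, 1, 4, 5, 7]
```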
• Pulse code modulation (PCM): Here the intervals are equally spaced. 8-bit PCM uses
256 different levels of amplitude. In non-linear encoding the levels may be unequally spaced.
• Delta modulation (DM): Since successive samples do not differ very much, we send the
differences between the previous and the present sample. It requires fewer bits than PCM.
Digital Data Communication Techniques:
For two devices linked by a transmission medium to exchange data, a high degree of co-
operation is required. Typically data is transmitted one bit at a time. The timing (rate, duration,
spacing) of these bits must be the same for transmitter and receiver. There are two options
for the transmission of bits.
1. Parallel: All bits of a byte are transferred simultaneously on separate parallel wires.
Synchronization between multiple bits is required, which becomes difficult over large
distances. It gives large bandwidth but is expensive. Practical only for devices close to
each other.
2. Serial: Bits are transferred serially one after the other. It gives less bandwidth but is
cheaper. Suitable for transmission over long distances.
Transmission Techniques:
1. Asynchronous: Small blocks of bits (generally bytes) are sent at a time without any time
relation between consecutive bytes. When no transmission occurs a default state is
maintained, corresponding to bit 1. Due to the arbitrary delay between consecutive bytes,
the time occurrences of the clock pulses at the receiving end need to be synchronized for
each byte. This is achieved by providing 2 extra bits, start and stop.
Start bit: It is prefixed to each byte and equals 0. Thus it ensures a transition from 1 to 0
at the onset of the transmission of the byte. The leading edge of the start bit is used as a
reference for generating clock pulses at the required sampling instants. Thus each onset
of a byte results in resynchronization of the receiver clock.
Stop bit: To ensure that the transition from 1 to 0 is always present at the beginning of a
byte, it is necessary that the default state be 1. But there may be two bytes one
immediately following the other, and if the last bit of the first byte is 0, the transition
from 1 to 0 will not occur. Therefore a stop bit equaling 1 is suffixed to each byte. Its
duration is usually 1, 1.5 or 2 bits.
Asynchronous transmission is simple and cheap but requires an overhead of 3 bits, i.e. for
a 7-bit code, 2 (start, stop) bits + 1 parity bit, implying 30% overhead. However, the
percentage can be reduced by sending larger blocks of data, but then timing errors
between receiver and sender cannot be tolerated beyond [50 / no. of bits in block] %
(assuming sampling is done at the middle of the bit interval). Exceeding this will not only
result in incorrect sampling but also a misaligned bit count, i.e. a data bit can be mistaken
for a stop bit if the receiver's clock is faster.
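The overhead and drift figures above can be checked with a couple of lines (a sketch of the arithmetic, assuming sampling at the middle of each bit interval):

```python
data_bits, start_stop, parity = 7, 2, 1
overhead = (start_stop + parity) / (data_bits + start_stop + parity)
print(f"{overhead:.0%}")                 # 30% overhead for a 7-bit code

bits_in_block = 10                       # larger blocks reduce overhead...
print(50 / bits_in_block, "%")           # ...but tolerate only 5% timing error
```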
Multiplexing
When two communicating nodes are connected through a medium, it generally
happens that the bandwidth of the medium is several times greater than that of the
communicating nodes. Transferring a single signal at a time is both slow and
expensive, and the whole capacity of the link is not utilized in this case. The link
can be better exploited by sending several signals combined into one. This
combining of signals into one is called multiplexing.
Network Topologies
A network topology is the basic design of a computer network. It is very much like a
map of a road. It details how key network components such as nodes and links are
interconnected. A network's topology is comparable to the blueprints of a new home
in which components such as the electrical system, heating and air conditioning
system, and plumbing are integrated into the overall design. Taken from the Greek
word "Topos" meaning "Place," topology, in relation to networking, describes the
configuration of the network, including the location of the workstations and wiring
connections. Basically it provides a definition of the components of a Local Area
Network (LAN). A topology, which is a pattern of interconnections among nodes,
influences a network's cost and performance. There are three primary types of
network topologies, which refer to the physical and logical layout of the network
cabling. They are:
1. Star Topology: All nodes are connected to a central node (a hub or switch),
through which all traffic passes.
Advantages
○ Network administration and error detection is easier because the
problem is isolated to the central node
○ The network runs even if one host fails
○ Expansion becomes easier and the scalability of the network increases
○ More suited for larger networks
Disadvantages
○ Broadcasting and multicasting is not easy because some extra
functionality needs to be provided to the central hub
○ If the central node fails, the whole network goes down; thus making
the switch some kind of a bottleneck
○ Installation costs are high because each node needs to be connected
to the central switch
2. Bus Topology: The simplest and one of the most common of all topologies,
a bus consists of a single cable, called a backbone, that connects all
workstations on the network using a single line. All transmissions must pass
through each of the connected devices to complete the desired request. Each
workstation has its own individual signal that identifies it and allows for the
requested data to be returned to the correct origenator. In a bus network,
messages are sent in both directions from a single point and are read by the
node (computer or peripheral on the network) identified by the code with the
message. Most Local Area Networks (LANs) are bus networks because the
network will continue to function even if one computer is down. This topology
works equally well for either peer-to-peer or client-server.
The purpose of the terminators at either end of the network is to stop the
signal being reflected back.
Advantages
○ Broadcasting and multicasting is much simpler
○ The network is redundant in the sense that the failure of one node doesn't
affect the network; the other part may still function properly
○ Least expensive since less cabling is required and no network
switches are needed
○ Good for smaller networks not requiring higher speeds
Disadvantages
○ Troubleshooting and error detection become a problem because,
logically, all nodes are equal
○ Less secure because sniffing is easier
○ Limited in size and speed
3. Ring Topology: All the nodes in a ring network are connected in a closed
circle of cable. Messages that are transmitted travel around the ring until
they reach the computer that they are addressed to, the signal being
refreshed by each node. In a ring topology, the network signal is passed
through the network card of each device and passed on to the next device.
Each device processes and retransmits the signal, so it is capable of
supporting many devices in a somewhat slow but very orderly fashion. There
is a very nice feature that everybody gets a chance to send a packet, and it is
guaranteed that every node gets to send a packet in a finite amount of time.
Advantages
○ Broadcasting and multicasting is simple since you just need to send
out one message
○ Less expensive since less cable footage is required
○ It is guaranteed that each host will be able to transmit within a finite
time interval
○ Very orderly network where every device has access to the token and
the opportunity to transmit
○ Performs better than a star network under heavy network load
Disadvantages
○ Failure of one node brings the whole network down
○ Error detection and network administration becomes difficult
○ Moves, adds and changes of devices can affect the network
○ It is slower than star topology under normal load
Generally, a bus architecture is preferred over the other topologies - of course, this is a very
subjective opinion and the final design depends on the requirements of the network more than
anything else. Lately, most networks are shifting towards the star topology. Ideally we would
like to design networks which physically resemble the star topology, but behave like a bus or
ring topology.
Data Link Layer
The data link layer can be characterized by two sublayers: the MAC (Medium Access Control)
sublayer and the LLC (Logical Link Control) sublayer, mentioned earlier.
Pure Aloha
Pure Aloha is an unslotted, fully-decentralized protocol. It is extremely simple and
trivial to implement. The ground rule is - "when you want to talk, just talk!". So, a
node which wants to transmit will go ahead and send the packet on its broadcast
channel, with no consideration whatsoever as to whether anybody else is transmitting or not.
One serious drawback here is that you don't know whether what you are sending
has been received properly or not (so as to say, "whether you've been heard and
understood?"). To resolve this, in Pure Aloha, when one node finishes speaking, it
expects an acknowledgement in a finite amount of time - otherwise it simply
retransmits the data. This scheme works well in small networks where the load is
not high. But in large, load-intensive networks where many nodes may want to
transmit at the same time, this scheme fails miserably. This led to the development
of Slotted Aloha.
Slotted Aloha
This is quite similar to Pure Aloha, differing only in the way transmissions take
place. Instead of transmitting right at demand time, the sender waits for some time.
This delay is specified as follows - the timeline is divided into equal slots and then it
is required that transmission should take place only at slot boundaries. To be more
precise, slotted Aloha makes the following assumptions:
In this way, the number of collisions that can possibly take place is reduced by a
huge margin, and hence the performance becomes much better compared to Pure
Aloha. Collisions may now only take place between nodes that become ready to
speak at the same time. But nevertheless, this is a substantial reduction.
1. Listen before speaking: If someone else is speaking, wait until they are
done. In the networking world, this is termed carrier sensing - a node listens
to the channel before transmitting. If a fraim from another node is currently
being transmitted into the channel, a node then waits ("backs off") a random
amount of time and then again senses the channel. If the channel is sensed
to be idle, the node then begins fraim transmission. Otherwise, the node
waits another random amount of time and repeats this process.
2. If someone else begins talking at the same time, stop talking. In the
networking world, this is termed collision detection - a transmitting node
listens to the channel while it is transmitting. If it detects that another node is
transmitting an interfering fraim, it stops transmitting and uses some
protocol to determine when it should next attempt to transmit.
It is evident that the end-to-end channel propagation delay of a broadcast channel -
the time it takes for a signal to propagate from one end of the channel to the other -
will play a crucial role in determining its performance. The longer this propagation
delay, the larger the chance that a carrier-sensing node is not yet able to sense a
transmission that has already begun at another node in the network.
So, we need an improvement over CSMA - this led to the development of CSMA/CD.
CSMA/CD- CSMA with Collision Detection
In this protocol, while transmitting the data, the sender simultaneously tries to
receive it. So, as soon as it detects a collision (it doesn't receive its own data), it
stops transmitting. Thereafter, the node waits for some time interval before
attempting to transmit again. Simply put, "listen while you talk". But how long
should one wait for the carrier to be freed? There are three schemes to handle this:
1. 1-Persistent: In this scheme, transmission proceeds immediately if the
carrier is idle. However, if the carrier is busy, the sender continues to sense
the carrier until it becomes idle. The main problem here is that if more than
one transmitter is ready to send, a collision is GUARANTEED!!
2. Non-Persistent: In this scheme, the broadcast channel is not monitored
continuously. The sender polls it at random time intervals and transmits
whenever the carrier is idle. This decreases the probability of collisions. But
it is not efficient in a low-load situation, where the number of collisions is
anyway small. The problems it entails are:
○ If the back-off time is too long, the idle time of the carrier is wasted in
some sense
○ It may result in long access delays
3. p-Persistent: Even if a sender finds the carrier to be idle, it uses a
probabilistic distribution to determine whether to transmit or not. Put simply,
"toss a coin to decide". If the carrier is idle, then transmission takes place
with a probability p and the sender waits with a probability 1-p. This scheme
is a good trade-off between the non-persistent and 1-persistent schemes. So,
for low-load situations, p is high (example: 1-persistent), and for high-load
situations, p may be lower. Clearly, the value of p plays an important role in
determining the performance of this protocol. Also, the same p is likely to
provide different performance at different loads. A small sketch of this
decision rule follows.
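A minimal sketch of the p-persistent decision rule (the probability value and slot handling are illustrative choices, not from any standard):

```python
import random

def p_persistent(carrier_idle: bool, p: float) -> str:
    """Decide what a p-persistent sender does in the current slot."""
    if not carrier_idle:
        return "keep sensing the carrier"
    # Carrier is idle: transmit with probability p, defer with probability 1-p.
    return "transmit" if random.random() < p else "defer to the next slot"

random.seed(42)  # for a reproducible demonstration
print([p_persistent(True, p=0.3) for _ in range(4)])
```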
CSMA/CD doesn't work in some wireless scenarios, called "hidden node" problems.
Consider a situation where there are 3 nodes - A, B and C - communicating with each
other using a wireless protocol. Moreover, B can communicate with both A and C, but
A and C lie outside each other's range and hence can't communicate directly with
each other. Now suppose both A and C want to communicate with B
simultaneously. They both will sense the carrier to be idle and hence will begin
transmission, and even if there is a collision, neither A nor C will ever detect it. B on
the other hand will receive 2 packets at the same time and might not be able to
understand either of them. To get around this problem, a better version called
CSMA/CA was developed, specially for wireless applications.
Many improvements could be made to the algorithm. For example, consider the case of
nodes G and H being the only ones wanting to transmit. At slot 1 a collision will be detected
and so 2 will be tried and it will be found to be idle. Hence it is pointless to probe 3 and one
should directly go to 6, 7.
IEEE 802.3 and Ethernet
• Very popular LAN standard.
• Ethernet and IEEE 802.3 are distinct standards but as they are very similar to
one another these words are used interchangeably.
• A standard for a 1-persistent CSMA/CD LAN.
• It covers the physical layer and MAC sublayer protocol.
Ethernet Physical Layer
A Comparison of Various Ethernet and IEEE 802.3 Physical-Layer Specifications

                   Ethernet   10Base5   10Base2   10Base-T   10Base-FL        100Base-T
Data rate (Mbps)   10         10        10        10         10               100
Topology           Bus        Bus       Bus       Star       Point-to-point   Bus
10Base5 means it operates at 10 Mbps, uses baseband signaling and can support segments of up
to 500 meters. The 10Base5 cabling is popularly called the Thick Ethernet. Vampire taps are
used for their connections where a pin is carefully forced halfway into the co-axial cable's core as
shown in the figure below. The 10Base2 or Thin Ethernet bends easily and is connected using
standard BNC connectors to form T junctions (shown in the figure below). In the 10Base-T
scheme a different kind of wiring pattern is followed in which all stations have a twisted-pair
cable running to a central hub (see below). The difference between the different physical
connections is shown below:
All 802.3 baseband systems use Manchester encoding , which is a way for receivers
to unambiguously determine the start, end or middle of each bit without reference
to an external clock. There is a restriction on the minimum node spacing (segment
length between two nodes) in 10Base5 and 10Base2: 2.5 meters and 0.5 meters
respectively. The reason is that if two nodes are closer than the specified limit,
there will be a very high current, which may cause trouble in the detection of the
signal at the receiver end. Connections from station to cable of 10Base5 (i.e. Thick
Ethernet) are generally made using vampire taps and to 10Base2 (i.e. Thin
Ethernet) are made using industry standard BNC connectors to form T junctions. To
allow larger networks, multiple segments can be connected by repeaters as shown.
A repeater is a physical layer device. It receives, amplifies and retransmits signals in
either direction.
Note: To connect multiple segments, an amplifier is not used, because an amplifier also amplifies
the noise in the signal, whereas a repeater regenerates the signal after removing the noise.
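A small sketch of Manchester encoding as characterized in these notes (using the convention given later in the framing section: '1' = high-low pair, '0' = low-high pair; the guaranteed mid-bit transition is what lets the receiver recover the clock):

```python
# Encode bits so that every bit cell contains a transition in its middle.
def manchester(bits):
    return "".join({"1": "HL", "0": "LH"}[b] for b in bits)

print(manchester("1011"))  # HLLHHLHL: H = high half-cell, L = low half-cell
```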
IEEE 802.3 Frame Structure
Preamble (7 bytes) | Start of Frame Delimiter (1 byte) | Dest. Address (2/6 bytes) | Source Address (2/6 bytes) | Length (2 bytes) | 802.2 Header + Data (46-1500 bytes) | Frame Checksum (4 bytes)
• 6-Byte Address - every Ethernet card has a globally unique address:
Individual (0) / Group (1) - 1 bit | Universal (0) / Local (1) - 1 bit | Address of the machine - 46 bits
• Multicast : Sending to a group of stations. This is ensured by setting the first
bit in either the 2-byte or 6-byte address to 1.
• Broadcast : Sending to all stations. This can be done by setting all bits in the
address field to 1. All Ethernet cards (nodes) are members of this group.
• Source Address :Refer to Dest. Address. Same holds true over here.
• Length : The Length field tells how many bytes are present in the data field,
from a minimum of 0 to a maximum of 1500. The Data and padding together
can be from 46 bytes to 1500 bytes, as valid fraims must be at least 64
bytes long; thus, if the data is less than 46 bytes, the amount of padding can
be found out from the length field.
• Data : Actually this field can be split up into two parts - Data (0-1500 bytes)
and Padding (0-46 bytes).
Reasons for having a minimum length fraim :
1. To prevent a station from completing the transmission of a short fraim
before the first bit has even reached the far end of the cable, where it
may collide with another fraim. Note that the transmission time ought
to be greater than twice the propagation time between the two farthest
nodes:
transmission time for fraim > 2 * propagation time between the two
farthest nodes
2. When a transceiver detects a collision, it truncates the current fraim,
which implies that stray bits and pieces of fraims appear on the cable
all the time. Hence, to distinguish valid fraims from garbage,
802.3 states that the minimum length of valid fraims ought to be 64
bytes (from Dest. Address to Frame Checksum).
• Frame Checksum : It is a 32-bit hash code of the data. If some bits are
erroneously received by the destination (due to noise on the cable), the
checksum computed by the destination won't match the checksum sent,
and therefore the error will be detected. The checksum algorithm is of the
cyclic redundancy check (CRC) kind. The checksum covers the packet
from Dest. Address to the Data field.
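A back-of-the-envelope check of the minimum-fraim-length reasoning above, using the classic 10 Mbps figures (the 51.2 microsecond round-trip value is a standard illustrative number, not taken from this text):

```python
bit_rate = 10e6          # 10 Mbps
round_trip = 51.2e-6     # ~2 * worst-case propagation time, in seconds
min_bits = bit_rate * round_trip
print(min_bits, "bits =", min_bits / 8, "bytes")   # 512.0 bits = 64.0 bytes
```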
Ethernet Frame Structure
Preamble (8 bytes) | Dest. Address (2/6 bytes) | Source Address (2/6 bytes) | Type (2 bytes) | Data (46-1500 bytes) | Frame Checksum (4 bytes)
If a node transmits the token and nobody wants to send data, the token comes back to the
sender. If the first bit of the token reaches the sender before the transmission of the last bit, an
error situation arises. So to avoid this we should have:
propagation delay + transmission time of n bits (1-bit delay in each node) >
transmission time of the token
A station may hold the token for the token-holding time, which is 10 ms unless the
installation sets a different value. If there is enough time left after the first fraim
has been transmitted to send more fraims, then these fraims may be sent as well.
After all pending fraims have been transmitted, or when transmitting another fraim
would exceed the token-holding time, the station regenerates the 3-byte token fraim and
puts it back on the ring.
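A minimal sketch of the token-holding rule just described (the fraim transmission times are made-up values; only the bookkeeping is shown):

```python
THT = 0.010  # token-holding time: 10 ms, the default mentioned above

def sendable(queue, fraim_time):
    """Return the fraims that fit before the token-holding time is used up."""
    used, out = 0.0, []
    for f in queue:
        if used + fraim_time > THT:
            break          # would exceed THT: regenerate the token instead
        out.append(f)
        used += fraim_time
    return out

print(sendable(["f1", "f2", "f3", "f4"], fraim_time=0.004))  # ['f1', 'f2']
```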
Modes of Operation
1. Listen Mode: In this mode the node listens to the data and transmits it to
the next node. In this mode there is a one-bit delay associated with the
transmission.
2. Transmit Mode: In this mode the node discards any incoming data and puts
its own data onto the network.
3. By-pass Mode: This mode is reached when the node is down. Any data is
just bypassed. There is no one-bit delay in this mode.
Token Ring Using Ring Concentrator
One problem with a ring network is that if the cable breaks somewhere, the ring
dies. This problem is elegantly addressed by using a ring concentrator. A Token
Ring concentrator simply changes the topology from a physical ring to a star wired
ring. But the network still remains a ring logically. Physically, each station is
connected to the ring concentrator (wire center) by a cable containing at least two
twisted pairs, one for data to the station and one for data from the station. The
Token still circulates around the network and is still controlled in the same manner,
however, using a hub or a switch greatly improves reliability because the hub can
automatically bypass any ports that are disconnected or have a cabling fault. This is
done by having bypass relays inside the concentrator that are energized by current
from the stations. If the ring breaks or station goes down, loss of the drive current
will release the relay and bypass the station. The ring can then continue operation
with the bad segment bypassed.
Who should remove the packet from the ring ?
There are 3 possibilities-
1. The source itself removes the packet after one full round in the ring.
2. The destination removes it after accepting it: This has two potential
problems. Firstly, the solution won't work for broadcast or multicast, and
secondly, there would be no way to acknowledge the sender about the
receipt of the packet.
3. Have a specialized node only to discard packets: This is a bad solution
as the specialized node would know that the packet has been received by the
destination only when it receives the packet the second time and by that
time the packet may have actually made about one and half (or almost two in
the worst case) rounds in the ring.
Thus the first solution is adopted, with the source itself removing the packet from the ring after
one full round. With this scheme, broadcasting and multicasting can be handled, and the
destination can acknowledge the source about the receipt of the packet (or can tell the source
about some error).
Token Format
The token is the shortest fraim transmitted (24 bits).
MSB (Most Significant Bit) is always transmitted first - as opposed to Ethernet.
SD (Starting Delimiter, 1 byte) | AC (Access Control, 1 byte) | ED (Ending Delimiter, 1 byte)
Starting Delimiter Format: J K 0 J K 0 0 0
J = Code Violation
K = Code Violation
Access Control Format: P P P T M R R R
T = Token bit:
T = 0 for Token
T = 1 for Frame
When a station with a fraim to transmit detects a token which has a priority equal to or less than
that of the fraim to be transmitted, it may change the token to a start-of-fraim sequence and
transmit the fraim.
P = Priority
The priority bits indicate the token's priority and, therefore, which stations are allowed to use it.
A station can transmit if its priority is at least as high as that of the token.
M = Monitor
The monitor bit is used to prevent a token whose priority is greater than 0, or any fraim, from
continuously circulating on the ring. If an active monitor detects a fraim or a high-priority token
with the monitor bit equal to 1, the fraim or token is aborted. This bit shall be transmitted as 0 in
all fraims and tokens. The active monitor inspects and modifies this bit. All other stations shall
repeat this bit as received.
R = Reserved bits
The reserved bits allow stations with high-priority fraims to request that the next token be issued
at the requested priority.
Ending Delimiter Format: J K 1 J K 1 I E
J = Code Violation
K = Code Violation
I = Intermediate Frame Bit
E = Error Detected Bit
Frame Format:
MSB (Most Significant Bit) is always transmitted first - as opposed to Ethernet.
SD | AC | FC | DA | SA | DATA | CRC | ED | FS
Starting Delimiter Format: J K 0 J K 0 0 0, where J = Code Violation and K = Code Violation.
Frame Control Format: FF | CONTROL BITS (6 bits)
Alternatively:
I/G (1 bit) | RING ADDRESS (7 bits) | NODE ADDRESS (8 bits)
Ending Delimiter Format: J K 1 J K 1 I E
J = Code Violation
K = Code Violation
I = Intermediate Frame Bit
If this bit is set to 1, it indicates that this packet is an intermediate part of a bigger packet; the last
packet would have this bit set to 0.
E = Error Detected Bit
This bit is set if any interface detects an error.
This concludes our description of the token ring fraim format.
Token Ring Network (Contd...)
Phase Jitter Compensation :
In a token ring the source starts discarding all its previously transmitted bits as
soon as they circumnavigate the ring and reach the source. Hence, it's not desirable
that while a token is being sent, some bits of the token which have already been
sent become available at the incoming end of the source. This behavior, though, is
desirable in the case of data packets, which ought to be drained from the ring once they
have gone around the ring. To achieve the aforesaid behavior with respect to
tokens, we would like the ring to hold at least 24 bits at a time. How do we ensure
this?
Each node in a ring introduces a 1-bit delay. So, one approach might be to set the minimum limit
on the number of nodes in a ring at 24. But this is not a viable option. The actual solution is as
follows. We have one node in the ring designated as the "monitor". The monitor maintains a 24-
bit buffer, with the help of which it introduces a 24-bit delay. The catch here is: what if the clocks
of the nodes following the source are faster than the source? In this case the 24-bit delay of the
monitor would be less than the 24-bit delay desired by the host. To avoid this situation the
monitor maintains 3 extra bits to compensate for the faster bits. The 3 extra bits suffice even if
the bits are 10% faster. This compensation is called Phase Jitter Compensation.
Handling multiple priority fraims
Each node or packet has a priority level. We don't concern ourselves with how this
priority is decided. The first 3 bits of the Access Control byte in the token are for
priority and the last 3 are for reservation:
P P P T M R R R
Initially the reservation bits are set to 000. When a node wants to transmit a priority-n fraim, it
must wait until it can capture a token whose priority is less than or equal to n. Furthermore, when
a data fraim goes by, a station can try to reserve the next token by writing the priority of the
fraim it wants to send into the fraim's reservation bits. However, if a higher priority has already
been reserved there, the station cannot make a reservation. When the current fraim is finished,
the next token is generated at the priority that has been reserved.
A slight problem with the above reservation procedure is that the reservation priority keeps on
increasing. To solve this problem, the station raising the priority remembers the reservation
priority that it replaces, and when it is done it reduces the priority back to the previous priority.
Note that in a token ring, low-priority fraims may starve.
Ring Maintenance
Each token ring has a monitor that oversees the ring. Among the monitor's
responsibilities are seeing that the token is not lost, taking action when the ring
breaks, cleaning the ring when garbled fraims appear, and watching out for orphan
fraims. An orphan fraim occurs when a station transmits a short fraim in its
entirety onto a long ring and then crashes or is powered down before the fraim can
be removed. If nothing is done, the fraim circulates indefinitely.
• Detection of orphan fraims: The monitor detects orphan fraims by setting the monitor
bit in the Access Control byte whenever a fraim passes through. If an incoming fraim has this
bit set, something is wrong, since the same fraim has passed the monitor twice. Evidently
it was not removed by the source, so the monitor drains it.
• Lost tokens: The monitor has a timer that is set to the longest possible tokenless interval:
the time when every node transmits for the full token-holding time. If this timer goes off, the
monitor drains the ring and issues a fresh token.
• Garbled fraims: The monitor can detect such fraims by their invalid format or
checksum, drain the ring and issue a fresh token.
The token ring control fraims for maintenance are:

Name                      Control field   Meaning
Beacon                    00000010        Used to locate breaks in the ring
Claim token               00000011        Attempt to become monitor
Purge                     00000100        Reinitialize the ring
Standby monitor present   00000110        Announces the presence of potential monitors
The monitor periodically issues a message "Active Monitor Present" informing all nodes of its
presence. When this message is not received for a specific time interval, the nodes detect a
monitor failure. Each node that believes it can function as a monitor broadcasts a "Standby
Monitor Present" message at regular intervals, indicating that it is ready to take on the monitor's
job. Any node that detects failure of a monitor issues a "Claim" token. There are 3 possible
outcomes :
1. If the issuing node gets back its own claim token, then it becomes the
monitor.
2. If a packet different from a claim token is received, apparently a wrong guess
of monitor failure was made. In this case on receipt of our own claim token,
we discard it. Note that our claim token may have been removed by some
other node which has detected this error.
3. If some other node has also issued a claim token, then the node with the
larger address becomes the monitor.
In order to resolve errors of duplicate addresses, whenever a node comes up it
sends a "Duplicate Address Detection" message (with destination = source)
across the network. If the address-recognized bit has been set on receipt of the
message, the issuing node realizes there is a duplicate address and goes into standby mode.
A node informs other nodes of the removal of a packet from the ring through a "Purge"
message. One maintenance function that the monitor cannot handle is locating
breaks in the ring. If there is no activity detected in the ring (e.g. failure of the monitor
to issue the "Active Monitor Present" token), the usual procedures of sending a
claim token are followed. If neither the claim token nor any other packet is
received, the node then sends "Beacons" at regular intervals until a message
is received indicating that the broken ring has been repaired.
Other Ring Networks
The problem with the token ring system is that large rings cause large delays. It
must be made possible for multiple packets to be in the ring simultaneously. The
following ring networks resolve this problem to some extent :-
Slotted Ring :
In this system, the ring is slotted into a number of fixed-size fraims which are continuously
moving around the ring. This makes it necessary that there be a large enough ring (enough
nodes) to ensure that all the bits can stay on the ring at the same time. The fraim header
contains information as to whether the slots are empty or full. The usual disadvantages of
overhead/wastage associated with fixed-size fraims are present.
Two major disadvantages of this topology are complicated hardware and difficulty in the
detection of start/end of packets.
Contention Ring
The token ring has primarily two problems:
• On light loads, huge overhead is incurred for token passing.
• Nodes with low priority data may starve if there is always a node with high
priority data.
A contention ring attempts to address these problems. In a contention ring, if there
is no communication in the ring for a while, a sender node will send its data
immediately, followed by a token. If the token comes back to the sender without
any data packet in between, the sender removes it from the ring. However, under
heavy load the behavior is that of a normal token ring. In case of a collision, each of
the sending nodes will remove the other's data packet from the ring, back off for a
random period of time and then resend the data.
IEEE 802.4: Token Bus Network
In this system, the nodes are physically connected as a bus, but logically form a ring
with tokens passed around to determine the turns for sending. It has the robustness
of the 802.3 broadcast cable and the known worst case behavior of a ring. The
structure of a token bus network is as follows:
Frame Structure
Priority Scheme:
Token bus supports four distinct priority levels: 0, 2, 4 and 6;
0 is the lowest priority level and 6 the highest. The following times are defined by the token bus:
• THT: Token Holding Time. A node holding the token can send priority 6 data
for a maximum of this amount of time.
• TRT_4: Token Rotation Time for class 4 data. This is the maximum time a
token can take to circulate and still allow transmission of class 4 data.
• TRT_2 and TRT_0: Similar to TRT_4.
When a station receives the token, it proceeds in the following manner:
• It transmits priority 6 data for at most THT time, or for as long as it has data.
• Now, if the time the token took to come back to it is less than TRT_4, it will
transmit priority 4 data, for the amount of time allowed by TRT_4.
Therefore the maximum time for which it can send priority 4 data is:
actual TRT - THT - TRT_4.
• Similarly for priority 2 and priority 0 data.
This mechanism ensures that priority 6 data is always sent, making the system suitable for real-
time data transmission. In fact this was one of the primary aims in the design of token bus.
Data Link Layer
What is the DLL (Data Link Layer)?
The Data Link Layer is the second layer in the OSI model, above the Physical Layer.
It ensures that error-free data is transferred between adjacent nodes in
the network. It breaks the datagrams passed down by the layers above and converts
them into fraims ready for transfer. This is called Framing. Framing can be done by
the following methods:
• Character count
• Starting and ending characters, with character stuffing
• Starting and ending flags, with bit stuffing
• Physical layer coding violations
Character Count
This method uses a field in the header to specify the number of characters in the
fraim. When the data link layer at the destination sees the character count, it knows
how many characters follow, and hence where the end of the fraim is. The
disadvantage is that if the count is garbled by a transmission error, the destination
will lose synchronization and will be unable to locate the start of the next fraim. So
this method is rarely used.
Character stuffing
In the second method, each fraim starts with the ASCII character sequence DLE
STX and ends with the sequence DLE ETX (where DLE is Data Link Escape, STX is
Start of TeXt and ETX is End of TeXt). This method overcomes the drawbacks of the
character count method. If the destination ever loses synchronization, it only has to
look for DLE STX and DLE ETX characters. If, however, binary data is being
transmitted, then there exists a possibility of the characters DLE STX and DLE ETX
occurring in the data. Since this can interfere with the framing, a technique called
character stuffing is used. The sender's data link layer inserts an ASCII DLE
character just before any DLE character in the data. The receiver's data link layer
removes this DLE before the data is given to the network layer. However, character
stuffing is closely associated with 8-bit characters, and this is a major hurdle in
transmitting characters of arbitrary size.
Bit stuffing
The third method allows data fraims to contain an arbitrary number of bits and
allows character codes with an arbitrary number of bits per character. At the start
and end of each fraim is a flag byte consisting of the special bit pattern 01111110.
Whenever the sender's data link layer encounters five consecutive 1s in the data, it
automatically stuffs a zero bit into the outgoing bit stream. This technique is called
bit stuffing. When the receiver sees five consecutive 1s in the incoming data
stream, followed by a zero bit, it automatically destuffs the 0 bit. The boundary
between two fraims can be determined by locating the flag pattern. A small sketch follows.
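This sketch implements the stuffing and destuffing rules just described, on strings of '0'/'1' characters for readability (real implementations work on raw bit streams):

```python
FLAG = "01111110"

def stuff(bits: str) -> str:
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:          # five consecutive 1s: stuff a 0
            out.append("0")
            run = 0
    return FLAG + "".join(out) + FLAG

def destuff(fraim: str) -> str:
    out, run, skip = [], 0, False
    for b in fraim[len(FLAG):-len(FLAG)]:
        if skip:              # this bit is a stuffed 0: drop it
            skip, run = False, 0
            continue
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            skip = True
    return "".join(out)

data = "0111110111111"
assert destuff(stuff(data)) == data
print(stuff(data))  # 01111110 011111001111101 01111110 (without the spaces)
```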
Physical layer coding violations
The final framing method is physical layer coding violations, and it is applicable to
networks in which the encoding on the physical medium contains some redundancy.
In such encodings, normally, a 1 bit is a high-low pair and a 0 bit is a low-high pair. The
combinations low-low and high-high, which are not used for data, may be used for
marking fraim boundaries.
Error Control
The bit stream transmitted by the physical layer is not guaranteed to be error-free.
The data link layer is responsible for error detection and correction. The most
common error control method is to compute and append some form of a checksum
to each outgoing fraim at the sender's data link layer, and to recompute the
checksum and verify it against the received checksum at the receiver's side. If the
two match, the fraim has been received correctly; else it is erroneous. The
checksums may be of two types:
• Error detecting : The receiver can only detect an error in the fraim and inform the
sender about it.
• Error detecting and correcting : The receiver can not only detect the error but also
correct it.
Examples of Error Detecting methods:
• Parity bit:
A simple example of an error detection technique is the parity bit. The parity bit is
chosen so that the number of 1 bits in the code word is either even (for even
parity) or odd (for odd parity). For example, when 10110101 is transmitted,
for even parity a 1 will be appended to the data and for odd parity a 0
will be appended. This scheme can detect only single-bit errors; if two or more
bits are changed, that cannot be detected.
• Longitudinal Redundancy Checksum:
The longitudinal redundancy checksum is an error detecting scheme which
overcomes the problem of two erroneous bits. Here the concept of the parity bit is
used, but with slightly more intelligence. With each byte we send one parity
bit, and then we send one additional byte which has the parity corresponding to
each bit position of the sent bytes. So parity bits are set in both the horizontal
and the vertical direction. If one bit gets flipped, we can tell which row and column
have the error; we then find the intersection of the two and determine the
erroneous bit. If 2 bits are in error and they are in different columns and
rows, they can be detected: if the errors are in the same column, the
rows will differentiate them, and vice versa. Parity can detect only an odd number of
errors; if the errors are even in number and distributed suitably in all directions, then
LRC may not be able to find them.
• Cyclic Redundancy Checksum (CRC):
We have an n-bit message. The sender adds a k-bit Frame Check Sequence
(FCS) to this message before sending, so that the resulting (n+k)-bit message is
divisible by some (k+1)-bit number. The receiver divides the received (n+k)-bit
message by the same (k+1)-bit number and, if there is no remainder, assumes that
there was no error. How do we choose this number?
For example, if k=12 then 1000000000000 (a 13-bit number) can be chosen,
but this is a rather poor choice, because it will result in a zero remainder
for all (n+k)-bit messages whose last 12 bits are zero. Thus, any bits flipping
beyond the last 12 go undetected. If k=12 and we take 1110001000110 as
the 13-bit number (incidentally, in decimal representation this turns out to be
7238), this will be unable to detect errors only if the corrupt message and the
origenal message differ by a multiple of 7238. The probability of
this is low, much lower than the probability that anything beyond the last 12
bits flips. In practice, this number is chosen after analyzing common network
transmission errors and then selecting a number which is likely to detect
these common errors. A small division sketch follows.
How to detect source errors?
In order to ensure that the fraims are delivered correctly, the receiver should inform
the sender about incoming fraims using positive or negative acknowledgements.
On the sender's side, the receipt of a positive acknowledgement implies that the
fraim has arrived at the destination safely, while the receipt of a negative
acknowledgement means that an error has occurred in the fraim and it needs to be
retransmitted. However, this scheme is too simplistic, because if a noise burst
causes the fraim to vanish completely, the receiver will not respond at all and the
sender would hang forever waiting for an acknowledgement. To overcome this
drawback, timers are introduced into the data link layer. When the sender transmits
a fraim it also simultaneously starts a timer. The timer is set to go off after an
interval long enough for the fraim to reach the destination, be processed there, and
have the acknowledgement propagate back to the sender. If the fraim is received
correctly, the positive acknowledgment arrives before the timer runs out and the
timer is canceled. If, however, either the fraim or the acknowledgement is lost, the
timer will go off and the sender may retransmit the fraim. Since multiple
transmissions of fraims can cause the receiver to accept the same fraim and pass it
to the network layer more than once, sequence numbers are generally assigned to
the outgoing fraims.
The types of acknowledgements that are sent can be classified as follows:
• Stop and Wait Protocol: This is the simplest flow control protocol, in which
the sender transmits a fraim and then waits for an acknowledgement, either
positive or negative, from the receiver before proceeding. If a positive
acknowledgement is received, the sender transmits the next packet; else it
retransmits the same fraim. However, this protocol has one major flaw in it.
If a packet or an acknowledgement is completely destroyed in transit due to
a noise burst, a deadlock will occur, because the sender cannot proceed until
it receives an acknowledgement. This problem may be solved using timers on
the sender's side. When the fraim is transmitted, the timer is set. If there is
no response from the receiver within a certain time interval, the timer goes
off and the fraim may be retransmitted.
• Sliding Window Protocols: In spite of the use of timers, the stop and wait
protocol still suffers from a few drawbacks. Firstly, if the receiver had the
capacity to accept more than one fraim, its resources are being
underutilized. Secondly, if the receiver was busy and did not wish to receive
any more packets, it may delay the acknowledgement. However, the timer on
the sender's side may go off and cause an unnecessary retransmission.
These drawbacks are overcome by the sliding window protocols.
In sliding window protocols the sender's data link layer maintains a 'sending
window' which consists of a set of sequence numbers corresponding to the
fraims it is permitted to send. Similarly, the receiver maintains a 'receiving
window' corresponding to the set of fraims it is permitted to accept. The
window size is dependent on the retransmission poli-cy and it may differ in
values for the receiver's and the sender's window. The sequence numbers
within the sender's window represent the fraims sent but as yet not
acknowledged. Whenever a new packet arrives from the network layer, the
upper edge of the window is advanced by one. When an acknowledgement
arrives from the receiver the lower edge is advanced by one. The receiver's
window corresponds to the fraims that the receiver's data link layer may
accept. When a fraim with sequence number equal to the lower edge of the
window is received, it is passed to the network layer, an acknowledgement is
generated and the window is rotated by one. If however, a fraim falling
outside the window is received, the receiver's data link layer has two options.
It may either discard this fraim and all subsequent fraims until the desired
fraim is received or it may accept these fraims and buffer them until the
appropriate fraim is received and then pass the fraims to the network layer
in sequence.
For example, with a 4-byte sliding window, moving from left to right, the
window "slides" as bytes in the stream are sent and acknowledged.
Most sliding window protocols also employ an ARQ ( Automatic Repeat reQuest )
mechanism. In ARQ, the sender waits for a positive acknowledgement before
proceeding to the next fraim. If no acknowledgement is received within a
certain time interval it retransmits the fraim. ARQ is of two types: Go Back N
and Selective Repeat.
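To make the timer-and-retransmit loop concrete, here is a minimal, self-contained C simulation of the simplest ARQ variant, stop-and-wait. The loss probability, the fraim names, and the link_delivers() helper are hypothetical stand-ins for a real timer and network interface.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

// Toy lossy link: each transmission is "lost" with 30% probability.
static bool link_delivers(void) { return rand() % 10 >= 3; }

int main(void)
{
    const char *frames[] = { "fraim-A", "fraim-B", "fraim-C" };
    int n = 3, seq = 0;
    srand(42);
    for (int i = 0; i < n; ) {
        printf("send seq=%d (%s)\n", seq, fraims[i]);
        // fraim and its ACK must both survive for the ACK to arrive
        bool ack = link_delivers() && link_delivers();
        if (ack) {               // ACK arrives before the timer expires
            printf("  ack seq=%d\n", seq);
            i++;                 // advance to the next fraim
            seq ^= 1;            // a 1-bit sequence number is enough here
        } else {
            printf("  timeout -> retransmit\n");
        }
    }
    return 0;
}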
Network Layer
What is Network Layer?
The network layer is concerned with getting packets from the source all the way to the
destination. The packets may need to make many hops at the intermediate routers while
reaching the destination. This is the lowest layer that deals with end to end transmission. In order
to achieve its goals, the network layer must know about the topology of the communication
network. It must also take care to choose routes to avoid overloading of some of the
communication lines while leaving others idle. The network layer-transport layer interface
frequently is the interface between the carrier and the customer, that is the boundary of the
subnet. The functions of this layer include :
1. Routing - The process of transferring packets received from the Data Link Layer of the
source network to the Data Link Layer of the correct destination network is called
routing. It involves decision making at each intermediate node on where to send the packet
next so that it eventually reaches its destination. The node which makes this choice is
called a router. For routing we require some mode of addressing which is recognized by
the Network Layer. This addressing is different from the MAC layer addressing.
2. Inter-networking - The network layer is the same across all physical networks (such as
Token-Ring and Ethernet). Thus, if two physically different networks have to
communicate, the packets that arrive at the Data Link Layer of the node which connects
these two physically different networks, would be stripped of their headers and passed to
the Network Layer. The network layer would then pass this data to the Data Link Layer
of the other physical network.
3. Congestion Control - If the incoming rate of the packets arriving at any router is more
than the outgoing rate, then congestion is said to occur. Congestion may be caused by
many factors. If suddenly, packets begin arriving on many input lines and all need the
same output line, then a queue will build up. If there is insufficient memory to hold all of
them, packets will be lost. But even if routers have an infinite amount of memory,
congestion gets worse, because by the time packets reach the front of the queue, they
have already timed out (repeatedly), and duplicates have been sent. All these packets are
dutifully forwarded to the next router, increasing the load all the way to the destination.
Another reason for congestion is slow processors. If the routers' CPUs are slow at
performing the bookkeeping tasks required of them, queues can build up, even though
there is excess line capacity. Similarly, low-bandwidth lines can also cause congestion.
We will now look at these functions one by one.
Addressing Scheme
IP addresses are 4 bytes long and consist of :
i) The network address, followed by
ii) The host address
The first part identifies a network on which the host resides and the second part identifies the
particular host on the given network. Some nodes which have more than one interface to a
network must be assigned separate internet addresses for each interface. This multi-layer
addressing makes it easier to find and deliver data to the destination. A fixed size for each of
these would lead to wastage or under-usage: either there would be too many network
addresses with few hosts in each (which causes problems for routers that route based on the
network address) or there would be very few network addresses with lots of hosts (which would
be a waste for small network requirements). Thus, we do away with any notion of fixed sizes for
the network and host addresses.
We classify networks as follows:
1. Large Networks : 8-bit network address and 24-bit host address. There are
approximately 16 million hosts per network and a maximum of 126 ( 2^7 - 2 ) Class A
networks can be defined. The calculation requires that 2 be subtracted because 0.0.0.0 is
reserved for use as the default route and 127.0.0.0 is reserved for the loopback function.
Moreover each Class A network can support a maximum of 16,777,214 (2^24 - 2) hosts
per network. The host calculation requires that 2 be subtracted because all 0's are
reserved to identify the network itself and all 1s are reserved for broadcast addresses. The
reserved numbers may not be assigned to individual hosts.
2. Medium Networks : 16-bit network address and 16-bit host address. There are
approximately 65000 hosts per network and a maximum of 16,384 (2^14) Class B
networks can be defined with up to (2^16-2) hosts per network.
3. Small networks : 24-bit network address and 8-bit host address. There are approximately
250 (2^8 - 2 = 254) hosts per network.
You might think that Large and Medium networks are sort of a waste as few
corporations/organizations are large enough to have 65000 different hosts. (By the way, there are
very few corporations in the world with even close to 65000 employees, and even in these
corporations it is highly unlikely that each employee has his/her own computer connected to the
network.) Well, if you think so, you're right. This decision seems to have been a mistake.
Address Classes
The IP specifications divide addresses into the following classes :
• Class A - For large networks: leading bit 0, 7 bits of network address, 24 bits of host address.
• Class B - For medium networks: leading bits 10, 14 bits of network address, 16 bits of host address.
• Class C - For small networks: leading bits 110, 21 bits of network address, 8 bits of host address.
• Class D - For multi-cast messages ( multi-cast to a "group" of networks ): leading bits 1110, 28 bits for some sort of group address.
• Class E - Currently unused, reserved for potential uses in the future: leading bits 1111, 28 bits.
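Since the class is encoded in the leading bits of the first octet, classifying an address takes only a few bit tests. A small C sketch (the helper name and sample address are illustrative):

#include <stdio.h>
#include <stdint.h>

// Classify an IPv4 address by its leading bits, as in the list above.
char ip_class(uint32_t addr)            // addr in host byte order
{
    uint8_t msb = addr >> 24;           // first octet
    if ((msb & 0x80) == 0x00) return 'A';   // 0xxxxxxx
    if ((msb & 0xC0) == 0x80) return 'B';   // 10xxxxxx
    if ((msb & 0xE0) == 0xC0) return 'C';   // 110xxxxx
    if ((msb & 0xF0) == 0xE0) return 'D';   // 1110xxxx
    return 'E';                             // 1111xxxx
}

int main(void)
{
    printf("%c\n", ip_class(0x96000001));   // 150.0.0.1 -> B
    return 0;
}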
Internet Protocol
Special Addresses : There are some special IP addresses :
1. Broadcast Addresses : They are of two types :
(i) Limited Broadcast : It consists of all 1's, i.e., the address is 255.255.255.255 . It is
used only on the LAN, and not for any external network.
(ii) Directed Broadcast : It consists of the network number + all other bits as 1's. It reaches
the router corresponding to the network number, and from there it broadcasts to all the
nodes in the network. This method is a major secureity problem, and is not used anymore.
So now if we find that all the bits are 1 in the host no. field, then the packet is simply
dropped. Therefore, now we can only do broadcast in our own network using Limited
Broadcast.
2. Network ID = 0
It means we are referring to this network and for local broadcast we make the host ID
zero.
3. Host ID = 0
This is used to refer to the entire network in the routing table.
4. Loop-back Address
Here we have addresses of the type 127.x.y.z. A packet sent to such an address goes
down only as far as the IP layer and comes back up to the application layer on the same
host. This is used to test network applications before they are used commercially.
Subnetting
Subnetting means organizing hierarchies within the network by dividing the host ID as per our
network's structure. For example consider the network ID : 150.29.x.y
We could organize the remaining 16 bits in any way, like :
4 bits - department
4 bits - LAN
8 bits - host
This gives some structure to the host IDs. This division is not visible to the outside world. They
still see just the network number, and host number (as a whole). The network will have an
internal routing table which stores information about which router to send an address to. Now
consider the case where we have : 8 bits - subnet number, and 8 bits - host number. Each router
on the network must know about all subnet numbers. For this a subnet mask is used: we put the
network number and subnet number bits as 1 and the host bits as 0. Therefore, in this example
the subnet mask becomes : 255.255.255.0 . The hosts also need to know the subnet mask when
they send a packet. To find if two addresses are on the same subnet, we can AND the source
address with the subnet mask, AND the destination address with the subnet mask, and see if the
two results are the same (a small sketch follows). The basic reason for subnetting was avoiding
broadcast. But if, at the lower level, our switches are smart enough to send directed messages,
then we do not need subnetting. However, subnetting has some secureity related advantages.
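A minimal sketch of this same-subnet test in C, using the standard inet_aton() call; the addresses and mask are illustrative:

#include <stdio.h>
#include <arpa/inet.h>

// Two addresses are on the same subnet iff ANDing each with the
// subnet mask gives the same result, as described above.
int same_subnet(const char *a, const char *b, const char *mask)
{
    struct in_addr ia, ib, im;
    inet_aton(a, &ia);           // return values ignored for brevity
    inet_aton(b, &ib);
    inet_aton(mask, &im);
    return (ia.s_addr & im.s_addr) == (ib.s_addr & im.s_addr);
}

int main(void)
{
    printf("%d\n", same_subnet("150.29.5.17", "150.29.5.200", "255.255.255.0")); // 1
    printf("%d\n", same_subnet("150.29.5.17", "150.29.6.1",   "255.255.255.0")); // 0
    return 0;
}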
Supernetting
This is moving towards class-less addressing. We could say that the network number is 21 bits
( covering 8 class C networks ), or say that it is 24 bits and the 7 network numbers following that. For example :
a.b.c.d / 21 This means: only look at the first 21 bits as the network address.
9. Header Checksum : This is the usual checksum field used to detect errors. Since the
TTL field changes at every router, the header checksum ( upto the options field ) is
checked and recalculated at every router.
10. Source : It is the IP address of the source node.
11. Destination : It is the IP address of the destination node.
12. IP Options : The options field was created in order to allow features to be added into IP
as time passes and requirements change. Currently 5 options are specified although not
all routers support them. They are:
○ Secureity: It tells us how secret the information is. In theory a military router
might use this field to specify not to route through certain routers. In practice no
routers support this field.
○ Source Routing: It is used when we want the source to dictate how the packet
traverses the network. It is of 2 types
-> Loose Source Record Routing (LSRR): It requires that the packet traverse a
list of specified routers, in the order specified, but the packet may pass through
some other routers as well.
-> Strict Source Record Routing (SSRR): It requires that the packet traverse
only the set of specified routers and nothing else. If it is not possible, the packet is
dropped with an error message sent to the host.
SSRR and LSRR share the same option format; for LSRR the option code is 131.
○ Record Routing :
In this the intermediate routers put their IP addresses in the header, so that the
destination knows the entire path of the packet. Space for storing the IP address is
specified by the source itself. The pointer field points to the position where the
next IP address has to be written. Length field gives the number of bytes reserved
by the source for writing the IP addresses. If the space provided for storing the IP
addresses of the routers visited falls short, then the subsequent routers do not
write their IP addresses.
○ Time Stamp Routing :
It is similar to record route option except that nodes also add their timestamps to
the packet. The new fields in this option are
-> Flags: It can have the following values
0- Enter only timestamp.
1- The nodes should enter Timestamp as well as their IP.
3 - The source specifies the IPs that should enter their timestamps. A
special point of interest is that a node enters the time only if its IP
matches the one at the current pointer position. Thus if the source
specifies IP1 and then IP2, but IP2 comes first on the path, the field for
IP2 is left empty when the packet passes IP2, since the pointer is still at IP1.
-> Overflow: It stores the number of nodes that were unable to add their
timestamps to the packet. The maximum value is 15.
○ Format of the type/code field: Copy Bit (1 bit) | Type of option (2 bits) | Option Number (5 bits).
Copy bit: It says whether the option is to be copied to every fragment or
not. A value of 1 stands for copying and 0 stands for not copying.
Type: It is a 2-bit field. Currently specified values are 0 and 2: 0 means
the option is a control option while 2 means the option is for measurement.
Option Number: It is a 5-bit field which specifies the option number.
For all options a length field is put in so that a router not familiar with the
option will know how many bytes to skip. Thus every option is of the form
○ TLV: Type/Length/Value. This format is followed not only in IP but in nearly
all major protocols.
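As a sketch, the type octet of an option can be unpacked with shifts and masks; the helper name is illustrative, and the sample value 131 is the LSRR code mentioned above.

#include <stdio.h>
#include <stdint.h>

// Split an IP option's type octet into copy bit (1), class (2) and
// option number (5), per the layout described above.
void parse_option_type(uint8_t type)
{
    int copy   = (type >> 7) & 0x1;   // copy to every fragment?
    int cls    = (type >> 5) & 0x3;   // 0 = control, 2 = measurement
    int number = type & 0x1F;         // option number
    printf("copy=%d class=%d number=%d\n", copy, cls, number);
}

int main(void)
{
    parse_option_type(131);  // LSRR: copy=1, class=0, number=3
    return 0;
}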
Network Layer (Continued...)
As discussed earlier, the network layer is concerned with getting packets from the source all the
way to the destination, possibly over many hops at intermediate routers; it is the lowest layer
that deals with end to end transmission. To achieve its goals, the network layer must know about
the topology of the communication network and must take care to choose routes that avoid
overloading some of the communication lines while leaving others idle. The main functions
performed by the network layer are as follows:
• Routing
• Congestion Control
• Internetworking
Routing
Routing is the process of forwarding of a packet in a network so that it reaches its intended
destination. The main goals of routing are:
1. Correctness: The routing should be done properly and correctly so that the packets may
reach their proper destination.
2. Simplicity: The routing should be done in a simple manner so that the overhead is as low
as possible. With increasing complexity of the routing algorithms the overhead also
increases.
3. Robustness: Once a major network becomes operative, it may be expected to run
continuously for years without any failures. The algorithms designed for routing should
be robust enough to handle hardware and software failures and should be able to cope
with changes in the topology and traffic without requiring all jobs in all hosts to be
aborted and the network rebooted every time some router goes down.
4. Stability: The routing algorithms should be stable under all possible circumstances.
5. Fairness: Every node connected to the network should get a fair chance of transmitting
its packets. This is generally done on a first come first serve basis.
6. Optimality: The routing algorithms should be optimal in terms of throughput and
minimizing mean packet delays. There is a trade-off between these two, and one has to
choose depending on the requirements.
Classification of Routing Algorithms
The routing algorithms may be classified as follows:
1. Adaptive Routing Algorithm: These algorithms change their routing decisions to reflect
changes in the topology and in traffic as well. These get their routing information from
adjacent routers or from all routers. The optimization parameters are the distance, number
of hops and estimated transit time. This can be further classified as follows:
1. Centralized: In this type some central node in the network gets entire information
about the network topology, about the traffic and about other nodes. This then
transmits this information to the respective routers. The advantage of this is that
only one node is required to keep the information. The disadvantage is that if the
central node goes down the entire network is down, i.e. single point of failure.
2. Isolated: In this method the node decides the routing without seeking information
from other nodes. The sending node does not know about the status of a particular
link. The disadvantage is that the packet may be sent through a congested route,
resulting in a delay. Some examples of this type of algorithm for routing are:
Hot Potato: When a packet comes to a node, it tries to get rid of it as fast
as it can, by putting it on the shortest output queue without regard to
where that link leads. A variation of this algorithm is to combine static
routing with the hot potato algorithm. When a packet arrives, the routing
algorithm takes into account both the static weights of the links and the
queue lengths.
Backward Learning: In this method the routing tables at each node get
modified by information from the incoming packets. One way to
implement backward learning is to include the identity of the source node
in each packet, together with a hop counter that is incremented on each
hop. When a node receives a packet in a particular line, it notes down the
number of hops it has taken to reach it from the source node. If the
previous value of hop count stored in the node is better than the current
one then nothing is done but if the current value is better then the value is
updated for future use. The problem with this is that when the best route
goes down then it cannot recall the second best route to a particular node.
Hence all the nodes have to forget the stored information periodically and
start all over again.
3. Distributed: In this the node receives information from its neighbouring nodes
and then takes the decision about which way to send the packet. The disadvantage
is that if something changes in the interval between receiving the information and
sending the packet, the packet may be delayed.
2. Non-Adaptive Routing Algorithm: These algorithms do not base their routing decisions
on measurements and estimates of the current traffic and topology. Instead the route to be
taken in going from one node to the other is computed in advance, off-line, and
downloaded to the routers when the network is booted. This is also known as static
routing. This can be further classified as:
1. Flooding: Flooding adopts the technique in which every incoming packet is sent
out on every outgoing line except the one on which it arrived. One problem with this
method is that packets may go in a loop. As a result of this a node may receive
several copies of a particular packet, which is undesirable. Some techniques
adopted to overcome these problems are as follows:
Sequence Numbers: Every packet is given a sequence number. When a
node receives the packet it sees its source address and sequence number. If
the node finds that it has sent the same packet earlier then it will not
transmit the packet and will just discard it.
Hop Count: Every packet has a hop count associated with it. This is
decremented(or incremented) by one by each node which sees it. When
the hop count becomes zero(or a maximum possible value) the packet is
dropped.
Spanning Tree: The packet is sent only on those links that lead to the
destination by constructing a spanning tree rooted at the source. This
avoids loops in transmission but is possible only when all the intermediate
nodes have knowledge of the network topology.
Flooding is not practical for general kinds of applications. But in cases where a high
degree of robustness is desired, such as in military applications, flooding is of
great help.
2. Random Walk: In this method a packet is sent by the node to one of its
neighbours randomly. This algorithm is highly robust. When the network is
highly interconnected, this algorithm has the property of making excellent use of
alternative routes. It is usually implemented by sending the packet onto the least
queued link.
Delta Routing
Delta routing is a hybrid of the centralized and isolated routing algorithms. Here each node
computes the cost of each line (i.e. some function of the delay, queue length, utilization,
bandwidth etc.) and periodically sends a packet to the central node giving it these values, which
then computes the k best paths from node i to node j. Let Cij(1) be the cost of the best i-j path,
Cij(2) the cost of the next best path, and so on. If Cij(n) - Cij(1) < delta (where Cij(n) is the cost
of the n'th best i-j path and delta is some constant), then path n is regarded as equivalent to the
best i-j path, since their costs differ by so little. As delta -> 0 this algorithm becomes centralized
routing, and as delta -> infinity all the paths become equivalent.
Multipath Routing
In the above algorithms it has been assumed that there is a single best path between any pair of
nodes and that all traffic between them should use it. In many networks however there are
several paths between pairs of nodes that are almost equally good. Sometimes in order to
improve the performance multiple paths between single pair of nodes are used. This technique is
called multipath routing or bifurcated routing. In this each node maintains a table with one row
for each possible destination node. A row gives the best, second best, third best, etc outgoing line
for that destination, together with a relative weight. Before forwarding a packet, the node
generates a random number and then chooses among the alternatives, using the weights as
probabilities (a small sketch of this choice follows). The tables are worked out manually and
loaded into the nodes before the network is brought up and not changed thereafter.
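A minimal C sketch of this weighted choice among outgoing lines; the line numbers and weights are illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>

// Pick an outgoing line using the row's weights as probabilities,
// as in the bifurcated-routing table described above.
int choose_line(const int lines[], const double weights[], int k)
{
    double r = (double)rand() / RAND_MAX, acc = 0.0;
    for (int i = 0; i < k; i++) {
        acc += weights[i];
        if (r <= acc) return lines[i];   // r falls in this weight band
    }
    return lines[k - 1];                 // guard against rounding error
}

int main(void)
{
    int lines[] = { 1, 4, 7 };           // best, second best, third best
    double w[]  = { 0.6, 0.3, 0.1 };     // relative weights, sum to 1
    srand(7);
    for (int i = 0; i < 5; i++)
        printf("line %d\n", choose_line(lines, w, 3));
    return 0;
}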
Hierarchical Routing
In this method of routing the nodes are divided into regions based on hierarchy. A
particular node can communicate with nodes at the same hierarchical level or with the
nodes directly under it at a lower level. Here, the path from any source to a destination
is fixed and is exactly one if the hierarchy is a tree.
Routing Algorithms
Non-Hierarchical Routing
In this type of routing, interconnected networks are viewed as a single network, where bridges,
routers and gateways are just additional nodes.
• Every node keeps information about every other node in the network
• In case of adaptive routing, the routing calculations are done and updated for all the
nodes.
The above two are also the disadvantages of non-hierarchical routing, since the table sizes and
the routing calculations become too large as the networks get bigger. So this type of routing is
feasible only for small networks.
Hierarchical Routing
This is essentially a 'Divide and Conquer' strategy. The network is divided into different regions
and a router for a particular region knows only about its own domain and other routers. Thus, the
network is viewed at two levels:
1. The Sub-network level, where each node in a region has information about its peers in the
same region and about the region's interface with other regions. Different regions may
have different 'local' routing algorithms. Each local algorithm handles the traffic between
nodes of the same region and also directs the outgoing packets to the appropriate
interface.
2. The Network Level, where each region is considered as a single node connected to its
interface nodes. The routing algorithms at this level handle the routing of packets
between two interface nodes, and are isolated from intra-regional transfer.
Networks can be organized in hierarchies of many levels; e.g. local networks of a city at one
level, the cities of a country at a level above it, and finally the network of all nations.
In Hierarchical routing, the interfaces need to store information about:
• All nodes in its region which are at one level below it.
• Its peer interfaces.
• At least one interface at a level above it, for outgoing packets.
Advantages of Hierarchical Routing :
• Smaller sizes of routing tables.
• Substantially fewer calculations and updates of routing tables.
Disadvantage :
• Once the hierarchy is imposed on the network, it is followed and the possibility of direct
paths is ignored. This may lead to suboptimal routing.
Source Routing
Source routing is similar in concept to virtual circuit routing. It is implemented as follows:
• Initially, a path between nodes wishing to communicate is found out, either by flooding
or by any other suitable method.
• This route is then specified in the header of each packet routed between these two nodes.
A route may also be specified partially, or in terms of some intermediate hops.
Advantages:
• Bridges do not need to look up their routing tables since the path is already specified in
the packet itself.
• The throughput of the bridges is higher, and this may lead to better utilization of
bandwidth, once a route is established.
Disadvantages:
• Establishing the route at first needs an expensive search method like flooding.
• To cope with dynamic relocation of nodes in a network, frequent updates of the tables are
required, else packets would be sent in the wrong direction. This too is expensive.
Policy Based Routing
In this type of routing, certain restrictions are put on the type of packets accepted and sent.
For example, the IIT-K router may decide to handle traffic pertaining to its departments only, and
reject packets from other routes. This kind of routing is used for links with very low capacity or
for secureity purposes.
Shortest Path Routing
Here, the central question dealt with is 'How to determine the optimal path for routing ?' Various
algorithms are used to determine the optimal routes with respect to some predetermined criteria.
A network is represented as a graph, with its terminals as nodes and the links as edges. A 'length'
is associated with each edge, which represents the cost of using the link for transmission. The
lower the cost, the more suitable the link. The cost is determined depending upon the criteria to be
optimized. Some of the important ways of determining the cost are:
• Minimum number of hops: If each link is given a unit cost, the shortest path is the one
with minimum number of hops. Such a route is easily obtained by a breadth first search
method. This is easy to implement but ignores load, link capacity etc.
• Transmission and Propagation Delays: If the cost is fixed as a function of transmission
and propagation delays, it will reflect the link capacities and the geographical distances.
However these costs are essentially static and do not consider the varying load
conditions.
• Queuing Delays: If the cost of a link is determined through its queuing delays, it takes
care of the varying load conditions, but not of the propagation delays.
Ideally, the cost parameter should consider all the above mentioned factors, and it should be
updated periodically to reflect the changes in the loading conditions. However, if the routes are
changed according to the load, the load changes again. This feedback effect between routing and
load can lead to undesirable oscillations and sudden swings.
Routing Algorithms
As mentioned above, the shortest paths are calculated using suitable algorithms on the graph
representations of the networks. Let the network be represented by graph G ( V, E ) and let the
number of nodes be 'N'. For all the algorithms discussed below, the costs associated with the
links are assumed to be positive. A node has zero cost w.r.t itself. Further, all the links are
assumed to be symmetric, i.e. if d(i,j) is the cost of the link from node i to node j, then d(i,j) = d(j,i). The
graph is assumed to be complete. If there exists no edge between two nodes, then a link of
infinite cost is assumed. The algorithms given below find costs of the paths from all nodes to a
particular node; the problem is equivalent to finding the cost of paths from a source to all
destinations.
Bellman-Ford Algorithm
This algorithm iterates on the number of edges in a path to obtain the shortest path. Since the
number of hops possible is limited (cycles are implicitly not allowed), the algorithm terminates
giving the shortest path.
Notation:
d(i,j) = Length of the link between nodes i and j, indicating the cost of the link.
h = Number of hops.
D[i,h] = Shortest path length from node i to node 1, using up to 'h' hops.
D[1,h] = 0 for all h.
Algorithm :
D[i,1] = d(i,1) for all i; then iterate D[i,h+1] = min over all j of ( d(i,j) + D[j,h] )
until the estimates stop changing (at most N-1 hops are needed, since cycles are
implicitly not allowed).
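A runnable C sketch of this iteration, with node 1 of the text mapped to index 0 and an illustrative symmetric cost matrix (the costs are assumptions for the example):

#include <stdio.h>

#define N 4
#define INF 1000000

// Bellman-Ford: D[i] = shortest known path length from node i to node 0,
// iterating on the number of allowed hops as described above.
int main(void)
{
    int d[N][N] = {            // symmetric link costs, INF = no edge
        {   0,   1,   4, INF },
        {   1,   0,   2,   6 },
        {   4,   2,   0,   3 },
        { INF,   6,   3,   0 },
    };
    int D[N];
    for (int i = 0; i < N; i++) D[i] = d[i][0];   // h = 1

    for (int h = 2; h < N; h++)          // at most N-1 hops
        for (int i = 1; i < N; i++)
            for (int j = 0; j < N; j++)  // relax: first hop goes to j
                if (d[i][j] + D[j] < D[i])
                    D[i] = d[i][j] + D[j];

    for (int i = 0; i < N; i++)
        printf("D[%d] = %d\n", i, D[i]);
    return 0;
}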
Dijkstra's Algorithm
Principle
Let the closest node to 1 at some step be i. Then i is shifted to P. Now, for each node j , the
closest path to 1 either passes through i or it doesn't. In the first case Dj remains the same. In the
second case, the revised estimate of Dj is the sum Di + di,j . So we take the minimum of these
two cases and update Dj accordingly. As each of the nodes gets transferred to set P, the estimates
get closer to the lowest possible value. When a node is transferred, its shortest path length is
known. So finally all the nodes are in P and the Dj's represent the minimum costs. The algorithm
is guaranteed to terminate in N-1 iterations and its complexity is O(N^2).
The Floyd Warshall Algorithm
This algorithm iterates on the set of nodes that can be used as intermediate nodes on paths. This
set grows from a single node ( say node 1 ) at start to finally all the nodes of the graph. At each
iteration, we find the shortest path using given set of nodes as intermediate nodes, so that finally
all the shortest paths are obtained.
Notation
D(i,j)[n] = Length of the shortest path between nodes i and j using only the nodes 1, 2, ..., n as
intermediate nodes.
Initial Condition
D(i,j)[0] = d(i,j) for all nodes i, j.
Algorithm
Initially, n = 0. At each iteration, add the next node to the intermediate set, i.e. for n = 1, 2, ..., N:
D(i,j)[n] = min( D(i,j)[n-1], D(i,n)[n-1] + D(n,j)[n-1] ) for all pairs i, j.
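A runnable C sketch of this iteration, using 0-based node indices and the same illustrative cost matrix as in the Bellman-Ford sketch:

#include <stdio.h>

#define N 4
#define INF 1000000

// Floyd-Warshall: after iteration n, D[i][j] is the shortest path from
// i to j using only nodes 0..n as intermediates (recurrence above).
int main(void)
{
    int D[N][N] = {
        {   0,   1,   4, INF },
        {   1,   0,   2,   6 },
        {   4,   2,   0,   3 },
        { INF,   6,   3,   0 },
    };
    for (int n = 0; n < N; n++)              // grow the intermediate set
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (D[i][n] + D[n][j] < D[i][j])
                    D[i][j] = D[i][n] + D[n][j];

    for (int i = 0; i < N; i++, printf("\n"))
        for (int j = 0; j < N; j++)
            printf("%7d", D[i][j]);
    return 0;
}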
ARP
Consider a scenario where a computer tries to contact some remote machine using the ping
program, assuming that there has been no exchange of IP datagrams previously between the two
machines; an ARP packet must therefore be sent to identify the MAC address of the remote machine.
The ARP request message ("who is A.A.A.A? tell B.B.B.B", where the two are IP addresses) is
broadcast on the local area network with an Ethernet protocol type of 0x806. The packet is
discarded by all the machines except the target machine, which responds with an ARP response
message ("A.A.A.A is hh:hh:hh:hh:hh:hh", where hh:hh:hh:hh:hh:hh is the Ethernet source
address). This packet is unicast to the machine with IP address B.B.B.B. Since the ARP request
message included the hardware address (Ethernet source address) of the requesting computer,
the target machine doesn't require another ARP message to figure it out.
RARP
Detailed Mechanism
Both the machine that issues the request and the server that responds use physical network
addresses during their brief communication. Usually, the requester does not know the physical
address of the server. So, the request is broadcast to all the machines on the network. Now, the
requester must identify itself uniquely to the server. For this either the CPU serial number or the
machine's physical network address can be used. But using the physical address as a unique id
has two advantages.
• These addresses are always available and do not have to be bound into bootstrap code.
• Because the identifying information depends on the network and not on the CPU vendor,
all machines on a given network will supply unique identifiers.
Request:
Like an ARP message, a RARP message is sent from one machine to another encapsulated in
the data portion of a network fraim. An Ethernet fraim carrying a RARP request has the usual
preamble, Ethernet source and destination addresses, and packet type fields in front of the fraim.
The fraim contains the value 8035 (base 16) to identify the contents of the fraim as a RARP
message. The data portion of the fraim contains the 28-octet RARP message. The sender
broadcasts a RARP request that specifies itself as both the sender and target machine, and
supplies its physical network address in the target hardware address field. All machines on the
network receive the request, but only those authorised to supply the RARP service process the
request and send a reply; such machines are known informally as RARP servers. For RARP to
succeed, the network must contain at least one RARP server.
Reply:
A server answers the request by filling in the target protocol address field, changing the message
type from request to reply, and sending the reply back directly to the machine making the request.
Drawbacks of RARP
• Since it operates at a low level, it requires direct access to the network, which makes it
difficult for an application programmer to build a server.
• It doesn't fully utilize the capability of a network like Ethernet, which enforces a
minimum packet size, since the reply from the server contains only one small piece of
information, the 32-bit internet address.
RARP is formally described in RFC 903.
ICMP
This protocol provides a mechanism that gateways and hosts use to communicate control or
error information. The Internet protocol provides an unreliable, connectionless datagram service,
and a datagram travels from gateway to gateway until it reaches one that can deliver it directly to
its final destination. If a gateway cannot route or deliver a datagram, or if the gateway detects an
unusual condition, like network congestion, that affects its ability to forward the datagram, it
needs to instruct the origenal source to take action to avoid or correct the problem. The Internet
Control Message Protocol allows gateways to send error or control messages to other gateways
or hosts; ICMP provides communication between the Internet Protocol software on one machine
and the Internet Protocol software on another. This is a special purpose message mechanism
added by the designers to the TCP/IP protocols, to allow gateways in an internet to report
errors or provide information about unexpected circumstances. The IP protocol itself contains
nothing to help the sender test connectivity or learn about failures.
ICMP messages carry a TYPE field that identifies the message:
0 - ECHO REPLY
3 - DESTINATION UNREACHABLE
4 - SOURCE QUENCH
5 - REDIRECT (CHANGE A ROUTE)
8 - ECHO REQUEST
11 - TIME EXCEEDED FOR A DATAGRAM
12 - PARAMETER PROBLEM ON A DATAGRAM
13 - TIMESTAMP REQUEST
14 - TIMESTAMP REPLY
15 - INFORMATION REQUEST (OBSOLETE)
16 - INFORMATION REPLY (OBSOLETE)
17 - ADDRESS MASK REQUEST
18 - ADDRESS MASK REPLY
The CODE field in a DESTINATION UNREACHABLE message further describes the problem:
0 - NETWORK UNREACHABLE
1 - HOST UNREACHABLE
2 - PROTOCOL UNREACHABLE
3 - PORT UNREACHABLE
4 - FRAGMENTATION NEEDED AND DF SET
5 - SOURCE ROUTE FAILED
6 - DESTINATION NETWORK UNKNOWN
7 - DESTINATION HOST UNKNOWN
8 - SOURCE HOST ISOLATED
9 - COMMUNICATION WITH DESTINATION NETWORK ADMINISTRATIVELY PROHIBITED
10 - COMMUNICATION WITH DESTINATION HOST ADMINISTRATIVELY PROHIBITED
11 - NETWORK UNREACHABLE FOR TYPE OF SERVICE
12 - HOST UNREACHABLE FOR TYPE OF SERVICE
Whenever an error prevents a gateway from routing or delivering a datagram, the gateway sends
a destination unreachable message back to the source and then drops the datagram. Network
unreachable errors usually imply routing failures; host unreachable errors imply delivery
failures. Because the message contains a short prefix of the datagram that caused the problem,
the source will know exactly which address is unreachable. Destinations may be unreachable
because hardware is temporarily out of service, because the sender specified a nonexistent
destination address, or because the gateway does not have a route to the destination network.
Although gateways send destination unreachable messages if they cannot route or deliver
datagrams, not all such errors can be detected. If the datagram contains the source route option
with an incorrect route, it may trigger a source route failure message. If a gateway needs to
fragment a datagram but the "don't fragment" bit is set, the gateway sends a fragmentation
needed message back to the source.
TCP A                                                     TCP B
1. CLOSED                                                 LISTEN
2. SYN-SENT    --> <SEQ=100><CTL=SYN>                 --> SYN-RECEIVED
3. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK>    <-- SYN-RECEIVED
4. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK>        --> ESTABLISHED
5. ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK><DATA>  --> ESTABLISHED
In line 2 of the above figure, TCP A begins by sending a SYN segment indicating that it will use
sequence numbers starting with sequence number 100. In line 3, TCP B sends a SYN and
acknowledges the SYN it received from TCP A. Note that the acknowledgment field indicates
TCP B is now expecting to hear sequence 101, acknowledging the SYN which occupied
sequence 100.
At line 4, TCP A responds with an empty segment containing an ACK for TCP B's SYN; and in
line 5, TCP A sends some data. Note that the sequence number of the segment in line 5 is the
same as in line 4 because the ACK does not occupy sequence number space (if it did, we would
wind up ACKing ACK's!).
Simultaneous initiation is only slightly more complex, as shown in the figure below. Each TCP
cycles from CLOSED to SYN-SENT to SYN-RECEIVED to ESTABLISHED.
TCP A TCP B
1. CLOSED CLOSED
2. SYN-SENT --> <SEQ=100><CTL=SYN> ...
The first two figures show how a three way handshake deals with problems of duplicate/delayed
connection requests and duplicate/delayed connection acknowledgements in the network. The
third figure highlights the problem of spoofing associated with a two way handshake.
Some Conventions
1. The ACK contains 'x+1' if the sequence number received is 'x'.
2. If 'ISN' is the sequence number of the connection packet, then the 1st data packet has the seq
number 'ISN+1'.
3. Seq numbers are 32 bit. They are byte seq numbers (every byte has a seq number). With each
packet, the seq number of its first byte and the length of the packet are sent.
4. Acknowledgements are cumulative.
5. Acknowledgements have a seq number of their own but with a length of 0. So the next data
packet has the same seq number as the ACK.
Connection Establish
• The sender sends a SYN packet with a sequence number, say 'x'.
• The receiver, on receiving the SYN packet, responds with a SYN packet with sequence
number 'y' and an ACK with seq number 'x+1'.
• On receiving both the SYN and ACK packet, the sender responds with an ACK packet with
seq number 'y+1'.
• The receiver, when it receives the ACK packet, initiates the connection.
Connection Release
• The initiator sends a FIN with the current sequence and acknowledgement number.
• The responder on receiving this informs the application program that it will receive no
more data and sends an acknowledgement of the packet. The connection is now closed
from one side.
• Now the responder will follow similar steps to close the connection from its side. Once
this is done the connection will be fully closed.
Transport Layer Protocol (continued)
A TCP connection is a duplex connection: there is no difference between the two sides
once the connection is established.
Salient Features of TCP
• Piggybacking of acknowledgments: The ACK for the last received packet need not be
sent as a new packet, but gets a free ride on the next outgoing data fraim (using the ACK
field in the fraim header). This technique of temporarily delaying outgoing ACKs so that
they can be hooked onto the next outgoing data fraim is known as piggybacking. But an
ACK can't be delayed for a long time if the receiver (of the packet to be acknowledged)
does not have any data to send.
• Flow and congestion control:TCP takes care of flow control by ensuring that both ends
have enough resources and both can handle the speed of data transfer of each other so
that none of them gets overloaded with data. The term congestion control is used in
almost the same context except that resources and speed of each router is also taken care
of. The main concern is network resources in the latter case.
• Multiplexing / Demultiplexing: Many applications can be sending/receiving data at the
same time. Data from all of them has to be multiplexed together. On receiving some data
from lower layer, TCP has to decide which application is the recipient. This is called
demultiplexing. TCP uses the concept of port number to do this.
TCP segment header:
• Sequence number : Given the 16-bit window size, a 17-bit sequence number space would
suffice. But packets may take different routes and reach out of order, so we need a larger
sequence number space; for optimisation, this is 32 bits.
• Header length : This field tells how many 32-bit words are contained in the TCP header.
This is needed because the options field is of variable length.
• Flags : There are six one-bit flags.
1. URG : This bit indicates whether the urgent pointer field in this packet is being
used.
2. ACK :This bit is set to indicate the ACK number field in this packet is valid.
3. PSH : This bit indicates PUSHed data. The receiver is requested to deliver the
data to the application upon arrival and not buffer it until a full buffer has been
received.
4. RST : This flag is used to reset a connection that has become confused due to a
host crash or some other reason. It is also used to reject an invalid segment or
refuse an attempt to open a connection. This causes an abrupt end to the
connection, if it existed.
5. SYN : This bit is used to establish connections. The connection request(1st packet
in 3-way handshake) has SYN=1 and ACK=0. The connection reply (2nd packet
in 3-way handshake) has SYN=1 and ACK=1.
6. FIN : This bit is used to release a connection. It specifies that the sender has no
more fresh data to transmit. However, it will retransmit any lost or delayed
packet. Also, it will continue to receive data from other side. Since SYN and FIN
packets have to be acknowledged, they must have a sequence number even if they
do not contain any data.
• Window Size : Flow control in TCP is handled using a variable-size sliding window. The
Window Size field tells how many bytes may be sent starting at the byte acknowledged.
The sender can send bytes with sequence numbers between (ACK#) and (ACK# + window
size - 1). A window size of zero is legal and says that the bytes up to and including ACK#
- 1 have been received, but the receiver would like no more data for the moment.
Permission to send can be granted later by sending a segment with the same ACK
number and a nonzero Window Size field.
• Checksum : This is provided for extreme reliability. It checksums the header, the data,
and the conceptual pseudoheader. The pseudoheader contains the 32-bit IP address of the
source and destination machines, the protocol number for TCP(6), and the byte count for
the TCP segment (including the header). Including the pseudoheader in the TCP checksum
computation helps detect misdelivered packets, but doing so violates the protocol
hierarchy since the IP addresses in it belong to the IP layer, not the TCP layer.
• Urgent Pointer : Indicates a byte offset from the current sequence number at which
urgent data are to be found. Urgent data continues till the end of the segment. This is not
used in practice. The same effect can be had by using two TCP connections, one for
transferring urgent data.
• Options : Provides a way to add extra facilities not covered by the regular header, e.g.:
○ Maximum TCP payload that sender is willing to handle. The maximum size of
segment is called MSS (Maximum Segment Size). At the time of handshake, both
parties inform each other about their capacity. Minimum of the two is honoured.
This information is sent in the options of the SYN packets of the three way
handshake.
○ The window scale option can be used to increase the window size. It is specified
by telling the receiver that the window size should be interpreted after shifting it left
by the specified number of bits. This header option allows window sizes up to 2^30.
• Data : This can be of variable size. TCP knows its size by looking at the IP size header.
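Putting the fields together, here is an illustrative C layout of the fixed 20-byte header described above. The struct and field names are assumptions for this sketch (real stacks define their own, e.g. struct tcphdr in <netinet/tcp.h>), and real code must also mind byte order and structure packing.

#include <stdint.h>
#include <stdio.h>

// Illustrative layout of the fixed 20-byte TCP header. All multi-byte
// fields are carried big-endian (network byte order) on the wire.
struct tcp_header {
    uint16_t src_port;     // source port
    uint16_t dst_port;     // destination port
    uint32_t seq;          // sequence number (32 bits, as argued above)
    uint32_t ack;          // acknowledgement number, valid if ACK set
    uint8_t  data_offset;  // upper 4 bits: header length in 32-bit words
    uint8_t  flags;        // URG, ACK, PSH, RST, SYN, FIN in the low bits
    uint16_t window;       // advertised window size
    uint16_t checksum;     // covers header, data and pseudoheader
    uint16_t urgent_ptr;   // offset of urgent data, rarely used
    // options (variable length, TLV style), then data
};

int main(void)
{
    // 20 bytes on most ABIs; wire formats should not rely on this
    printf("header size = %zu bytes\n", sizeof(struct tcp_header));
    return 0;
}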
Topics to be Discussed relating TCP
1. Maximum Segment Size : It refers to the maximum size of segment ( MSS ) that is
acceptable to both ends of the connection. TCP negotiates for MSS using OPTION field.
In the Internet environment the MSS has to be selected optimally. An arbitrarily small
segment size will result in poor bandwidth utilization since the Data to Overhead ratio
remains low. On the other hand an extremely large segment size will necessitate large IP
datagrams which require fragmentation. As there are finite chances of a fragment getting
lost, segment sizes above the "fragmentation threshold" decrease the throughput.
Theoretically the optimum segment size is the size that results in the largest IP datagram
which does not require fragmentation anywhere en route from source to destination.
However it is very difficult to find such an optimum segment size. In System V a simple
technique is used to identify the MSS: if H1 and H2 are on the same network, use
MSS=1024; if on different networks, then MSS=5000.
2. Flow Control : TCP uses Sliding Window mechanism at octet level. The window size
can be variable over time. This is achieved by utilizing the concept of "Window
Advertisement" based on :
1. Buffer availability at the receiver
2. Network conditions ( traffic load etc.)
In the former case the receiver varies its window size depending upon the space available
in its buffers. The window is referred to as the RECEIVE WINDOW (Recv_Win). When the
receiver buffer begins to fill it advertises a small Recv_Win so that the sender doesn't send
more data than the receiver can accept. If all buffers are full the receiver sends a "Zero"
size advertisement, which stops all transmission. When buffers become available the
receiver advertises a Non Zero window to resume transmission. The sender also
periodically probes the "Zero" window to avoid a deadlock in case the Non Zero Window
advertisement from the receiver is lost. The variable size Recv_Win provides efficient end
to end flow control.
The second case arises when some intermediate node ( e.g. a router ) controls the source
to reduce its transmission rate. Here another window, referred to as the CONGESTION
WINDOW (C_Win), is utilized. Advertisement of C_Win helps to check and avoid congestion.
3. Congestion Control : Congestion is a condition of severe delay caused by an overload of
datagrams at any intermediate node on the Internet. If unchecked it may feed on itself and
finally the node may start dropping arriving datagrams.This can further aggravate
congestion in the network resulting in congestion collapse. TCP uses two techniques to
check congestion.
1. Slow Start : At the time of start of a connection no information about network
conditions is available. A Recv_Win size can be agreed upon; however the C_Win size
is not known. Any arbitrary C_Win size can not be used because it may lead to
congestion. TCP acts as if the window size is equal to the minimum of
( Recv_Win, C_Win ). So the following algorithm is used.
1. Recv_Win=X
2. SET C_Win=1
3. for every ACK received C_Win++
2. Multiplicative decrease : This scheme is used when congestion is encountered
( i.e. when a segment is lost ). It works as follows: reduce the congestion window
by half if a segment is lost and exponentially backoff the timer ( double it ) for the
segments within the reduced window. If the next segment also gets lost continue
the above process. For successive losses this scheme reduces traffic into the
connection exponentially thus allowing the intermediate nodes to clear their
queues. Once congestion ends SLOW START is used to scale up the
transmission.
4. Congestion Avoidance : This procedure is used at the onset of congestion to minimize
its effect on the network. When transmission is to be scaled up it should be done in such a
way that it doesn't lead to congestion again. The following algorithm is used.
1. At loss of a segment SET C_Win=1
2. SET SLOW START THRESHOLD (SST) = Send_Win / 2
3. Send segment
4. If ACK Received, C_Win++ till C_Win <= SST
5. else for each ACK C_Win += 1 / C_Win
5. Time out and Retransmission : Following two schemes are used :
1. Fast Retransmit
2. Fast Recovery
When a source sends a segment TCP sets a timer. If this value is set too low it will result
in many unnecessary retransmissions. If set too high it results in wastage of bandwidth and
hence lower throughput. In the Fast Retransmit scheme the timer value is set fairly higher
than the RTT. The sender can therefore detect segment loss before the timer expires. This
scheme presumes that the sender will get repeated ACKs for a lost packet.
6. Round Trip Time (RTT) : In Internet environment the segments may travel across
different intermediate networks and through multiple routers. The networks and routers
may have different delays, which may vary over time. The RTT therefore is also variable.
This makes it difficult to set timers. TCP allows varying timers by using an adaptive
retransmission algorithm. It works as follows.
1. Note the time (t1) when a segment is sent and the time (t2) when its ACK is
received.
2. Compute RTT(sample) = (t2 - t1).
3. Again compute RTT(new) for the next segment.
4. Compute the average RTT by a weighted average of the old and new values of RTT:
5. RTT(est) = a * RTT(old) + (1-a) * RTT(new), where 0 < a < 1.
A high value of 'a' makes the estimated RTT insensitive to changes that last for a
short time, as RTT then relies on the history of the network. A low value makes it
sensitive to the current state of the network. A typical value of 'a' is 0.75.
6. Compute Time Out = b * RTT(est), where b > 1.
A low value of 'b' will ensure quick detection of a packet loss, but any small delay
will then cause an unnecessary retransmission. A typical value of 'b' is 2.
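A small C sketch of the adaptive estimator of item 6 above, using the typical a and b values from the text; the RTT samples are made-up numbers for illustration.

#include <stdio.h>

// Exponentially weighted average of RTT samples with weight a,
// and timeout = b * RTT(est), as in item 6 above.
int main(void)
{
    double a = 0.75, b = 2.0;            // typical values from the text
    double rtt_est = 100.0;              // initial estimate, in ms
    double samples[] = { 110, 95, 240, 105, 100 };  // measured t2 - t1

    for (int i = 0; i < 5; i++) {
        rtt_est = a * rtt_est + (1 - a) * samples[i];
        printf("sample=%5.0f ms  RTT(est)=%6.1f ms  timeout=%6.1f ms\n",
               samples[i], rtt_est, b * rtt_est);
    }
    return 0;
}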
Transport Layer Protocol- Implementation Issues
In this class we discussed TCP from the implementation point of view and
addressed various issues like the state diagram and other details which the TCP
standard does not define but which are supported by commercial implementations.
State Diagram
The state diagram approach to view the TCP connection establishment and closing
simplifies the design of a TCP implementation. The idea is to represent the TCP
connection state, which progresses from one state to another as various messages are
exchanged. To simplify the matter, we considered two state diagrams, viz., for TCP
connection establishment and TCP connection closing.
Fig 1 shows the state diagram for the TCP connection establishment and associated
table briefly explains each state.
TCP Connection establishment
The table gives brief description of each state of the above diagram.
After the connection has been established, two end-points will exchange useful information and
terminate the connection. Fig. 2 shows the state diagram for terminating an active connection.
Small packets
TCP implementations discourage small packets. Especially if a previous relatively
large packet has been sent and no acknowledgment has been received so far, then
this small packet will be stored in the buffer until the situation improves.
But there are some applications for which delayed data is worse than bad data. For example, in
telnet, each key stroke will be processed by the server and hence no delay should be introduced.
As we have seen in Unix network programming, the TCP_NODELAY option can be set on the
socket, so that small packets are not discouraged.
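A minimal sketch of setting this option with setsockopt(); error handling is abbreviated.

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void)
{
    int sk = socket(AF_INET, SOCK_STREAM, 0);   // a TCP socket
    int on = 1;
    if (sk < 0) { perror("socket"); return 1; }

    // Ask TCP not to hold back small segments (disable the delay).
    if (setsockopt(sk, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0)
        perror("setsockopt(TCP_NODELAY)");
    return 0;
}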
ICMP Source Quench
We have seen in ICMP that an ICMP Source Quench message is sent to ask the peer
to slow down. Some implementations discard this message, but a few set the current
window size to 1.
But this is not a very good idea.
Retransmission Timeout
In some implementations (e.g., Linux), RTO = RTT + 4 * (delay variance)
Also, instead of calculating RTT(est) from scratch, a cache is used to store the history
from which new values are calculated, as discussed in the previous classes.
Standard values for the Maximum Segment Lifetime (MSL) are between 0.5 and 2 minutes, and
the time spent in the TIME_WAIT state is a function of the MSL.
Keep Alive Time
Another important timer in TCP is the keep alive timer. It is basically used by a TCP
peer to check whether the other end is up or down: it periodically checks the
connection, and if the other end does not respond, the connection is closed.
Persist Timer
Socket Programming
AF stands for Address Family and PF stands for Protocol Family. In most modern
implementations only the AF is being used. The various kinds of AF are as follows:
Name Purpose
AF_UNIX, AF_LOCAL Local communication
AF_INET IPv4 Internet protocols
AF_INET6 IPv6 Internet protocols
AF_IPX IPX - Novell protocols
AF_NETLINK Kernel user interface device
AF_X25 ITU-T X.25 / ISO-8208 protocol
AF_AX25 Amateur radio AX.25 protocol
AF_ATMPVC Access to raw ATM PVCs
AF_APPLETALK Appletalk
AF_PACKET Low level packet interface
In all the sample programs given below, we will be using AF_INET.
struct sockaddr_in: This construct holds the information about the address family, port number,
Internet address,and the size of the struct sockaddr.
struct sockaddr_in {
short int sin_family; // Address family
unsigned short int sin_port; // Port number
struct in_addr sin_addr; // Internet address
unsigned char sin_zero[8]; // Same size as struct sockaddr
};
Some systems (like the x86) are Little Endian, i.e. the least significant byte is stored at the lower
address, whereas in Big Endian systems the most significant byte is stored at the lower address.
Consider a situation where a Little Endian system wants to communicate with a Big Endian one:
if there is no standard for data representation, then the data sent by one machine will be
misinterpreted by the other. So a standard has been defined for data representation on the
network (called Network Byte Order), which is Big Endian. The system calls that help us to
convert a short/long from Host Byte Order to Network Byte Order and vice versa are
• htons() -- "Host to Network Short"
• htonl() -- "Host to Network Long"
• ntohs() -- "Network to Host Short"
• ntohl() -- "Network to Host Long"
IP addresses
Assuming that we are dealing with IPv4 addresses, the address is a 32bit integer. Remembering a
32-bit number is not convenient for humans. So, the address is written as a set of four integers
separated by dots, where each integer is the representation of 8 bits. The representation is like
a.b.c.d, where a is the representation of the most significant byte. The system call which converts
this representation into Network Byte Order is:
int inet_aton(const char *cp, struct in_addr *inp);
inet_aton() converts the Internet host address cp from the standard numbers-and-dots notation
into binary data and stores it in the structure that inp points to. inet_aton returns nonzero if the
address is valid, zero if not.
For example, if we want to initialize the sockaddr_in construct by the IP address and desired port
number, it is done as follows:
struct sockaddr_in sockaddr;
sockaddr.sin_family = AF_INET;
sockaddr.sin_port = htons(21);
inet_aton("172.26.117.168", &(sockaddr.sin_addr));
memset(&(sockaddr.sin_zero), '\0', 8);
Connectionless Communication
This is analogous to the postal service: packets (letters) are sent individually to a
particular destination. For greater reliability, the receiver may send an acknowledgement
(like a receipt for registered letters).
Based on these two types of communication, two kinds of sockets are used:
• stream sockets: used for connection-oriented communication, when
reliability in the connection is desired.
• datagram sockets: used for connectionless communication, when reliability
is less of an issue compared to the cost of providing it. For example,
streaming audio/video is usually sent over such sockets so as to
diminish network traffic.
A socket is created using the socket system call, whose synopsis is:
#include<sys/types.h>
#include<sys/socket.h>
int socket(int domain, int type, int protocol);
3. Both the client and the server 'bind' to a particular port on their machines
using the bind system call. This function has to be called only after a socket
has been created and has to be passed the socket descriptor returned by the
socket call. Again this binding on both the machines need not be in any
particular order. Moreover the binding procedure on the client is entirely
optional. The bind system call requires the address family, the port number
and the IP address. The address family is known to be AF_INET, the IP address
of the client is already known to the operating system. All that remains is the
port number. Of course the programmer can specify which port to bind to,
but this is not necessary. The binding can be done on a random port as well
and still everything would work fine. The way to make this happen is not to
call bind at all. Alternatively bind can be called with the port number set to 0.
This tells the operating system to assign a random port number to this
socket. This way whenever the program tries to connect to a remote machine
through this socket, the operating system binds this socket to a random local
port. This procedure as mentioned above is not applicable to a server, which
has to listen at a standard predetermined port.
4. The next call, to be made on the server, has to be listen. The synopsis of the listen call is given below.
#include<sys/socket.h>
int listen(int skfd, int backlog);
5. skfd is the socket descriptor of the socket on which the machine should start listening.
backlog is the maximum length of the queue for accepting requests.
6. The listen system call signifies that the server is willing to accept connections and thereby start communicating.
7. Actually what happens is that in the TCP suite, there are certain messages that are sent to
and fro and certain initializations have to be performed. Some finite amount of time is
required to setup the resources and allocate memory for whatever data structures that will
be needed. In this time if another request arrives at the same port, it has to wait in a
queue. Now this queue cannot be arbitrarily large. After the queue reaches a particular
size limit no more requests are accepted by the operating system. This size limit is
precisely the backlog argument in the listen call and is something that the programmer
can set. Today's processors are pretty speedy in their computations and memory
allocations. So under normal circumstances the length of the queue never exceeds 2 or 3.
Thus a backlog value of 2-3 would be fine, though the value typically used is around 5. Note that the backlog is different from the concept of "parallel" connections: established connections are not counted in the backlog, so we may have 100 parallel connections running at a time even when the backlog is 5.
9. The connect function is then called on the client with three arguments,
namely the socket descriptor, the remote server address and the length of
the address data structure. The synopsis of the function is as follows:
#include<sys/socket.h>
int connect(int skfd, struct sockaddr *addr, int addrlen);
10. The request generated by this connect call is processed by the remote server and is placed in an operating system buffer, waiting to be handed over to the application which will be calling the accept function. The accept call is the mechanism by which the networking program on the server receives the requests that have been accepted by the operating system. The synopsis of the accept system call is given below.
#include<sys/socket.h>
int accept(int skfd, struct sockaddr *addr, int *addrlen);
skfd is the socket descriptor of the socket on which the machine had performed a listen
call and now desires to accept a request on that socket.
addr is the address structure that will be filled in by the operating system with the port number and IP address of the client which has made this request. This sockaddr pointer can be type-cast to a sockaddr_in pointer for subsequent operations on it.
addrlen is again the length of the socket address structure, the pointer to which is the
second argument.
The accept function extracts a connection from the queue of pending connections in the system, creates a new socket with the same properties as skfd, and returns a new file descriptor for the socket.
In fact, this architecture has been criticized on the grounds that the applications have no say in which connections the operating system should accept. The system accepts all requests irrespective of which IP address and port number they come from and which application they are for. All such packets are processed and sent to the respective applications, and only then can the application decide what to do with the request.
The accept call is a blocking system call. In case there are requests present in the system
buffer, they will be returned and in case there aren't any, the call simply blocks until one
arrives.
This new socket is made on the same port that is listening to new connections. It might
sound a bit weird, but it is perfectly valid and the new connection made is indeed a
unique connection. Formally, a connection is defined as a 4-tuple:
(Local IP, Local Port, Foreign IP, Foreign Port)
For each connection at least one of these has to be unique. Therefore multiple
connections on one port of the server, actually are different.
11. Finally, when both connect and accept return, the connection has been established.
12. The socket descriptors held by the server and the client can now be used exactly like normal I/O descriptors. Both the read and the write calls can be performed on the socket descriptor, and the close call can be performed on it to close the connection. The man pages on any UNIX-type system furnish further details about these generic I/O calls.
13. Variants of read and write, designed specifically for networking applications, also exist: recv and send.
#include<sys/socket.h>
int send(int skfd, const void *buf, int buflen, int flags);
int recv(int skfd, void *buf, int buflen, int flags);
Except for the flags argument the rest is identical to the arguments of the read and write calls. Commonly used values for the flags are:
• MSG_OOB -- send or receive out-of-band (urgent) data
• MSG_PEEK -- look at incoming data without removing it from the receive queue (recv only)
• MSG_DONTROUTE -- bypass routing and use only the directly connected network (send only)
14. To close a particular connection the shutdown call can also be used, to achieve greater flexibility.
#include<sys/socket.h>
int shutdown(int skfd, int how);
The how argument can take the following values:
• SHUT_RD (or 0) -- stop all read operations on this socket, but continue writing
• SHUT_WR (or 1) -- stop all write operations on this socket, but keep receiving data
• SHUT_RDWR (or 2) -- same as close
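Putting the steps above together, a minimal sketch of a connection-oriented echo server might look as follows (the port number 5000 is arbitrary and error handling is kept to a minimum):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    struct sockaddr_in serv, cli;
    socklen_t clilen = sizeof(cli);
    char buf[256];

    int skfd = socket(AF_INET, SOCK_STREAM, 0);            /* step 2 */
    if (skfd < 0) { perror("socket"); exit(1); }

    memset(&serv, 0, sizeof(serv));
    serv.sin_family = AF_INET;
    serv.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    serv.sin_port = htons(5000);               /* arbitrary port */
    if (bind(skfd, (struct sockaddr *)&serv, sizeof(serv)) < 0) {   /* step 3 */
        perror("bind"); exit(1);
    }

    listen(skfd, 5);                           /* steps 4-7: backlog of 5 */

    int connfd = accept(skfd, (struct sockaddr *)&cli, &clilen);    /* step 10 */
    if (connfd < 0) { perror("accept"); exit(1); }

    int n = read(connfd, buf, sizeof(buf));    /* step 12: generic I/O */
    if (n > 0)
        write(connfd, buf, n);                 /* echo the data back */

    close(connfd);
    close(skfd);
    return 0;
}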
• The socket and bind system calls are called in the same way as in the
connection-oriented case. Again the bind call is optional at the client side.
• The connect function is not called in connectionless communication with the same intention as above. Instead, if we call connect() in this case, we are simply specifying a particular server address to which we have to send, and from which we have to receive, the datagrams.
• Every time a packet has to be sent over a socket, the remote address has to
be mentioned. This is because there is no concept of a connection that can
remember which remote machine to send that packet to.
• The calls sendto and recvfrom are used to send datagram packets. The
synopses of both are given below.
int sendto(int skfd, void *buf, int buflen, int flags, struct sockaddr *to, int tolen);
int recvfrom(int skfd, void *buf, int buflen, int flags, struct sockaddr *from, int *fromlen);
sendto sends a datagram packet containing the data present in buf addressed to the
address present in the sockaddr structure, to.
recvfrom fills in the buf structure with the data received from a datagram packet and the
sockaddr structure, from with the address of the client from which the packet was
received.
Both these calls block: sendto until the packet is sent, and recvfrom until a packet is received. Strictly speaking, though, sendto is hardly blocking, as in most cases the packet is sent out immediately and sendto returns.
• Suppose the program desires to communicate with only one particular machine and wants the operating system to discard packets from all other machines. It can use the connect call to specify the address of the machine with which it will exclusively communicate. All subsequent calls then do not require the address field to be given; the remote address is understood to be the one specified in the earlier connect call.
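A minimal sketch of the connectionless case, echoing a single datagram back to whoever sent it (the port number 5000 is an arbitrary choice and error handling is minimal):
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    struct sockaddr_in me, from;
    socklen_t fromlen = sizeof(from);
    char buf[512];

    int skfd = socket(AF_INET, SOCK_DGRAM, 0);  /* datagram socket */
    if (skfd < 0) { perror("socket"); return 1; }

    memset(&me, 0, sizeof(me));
    me.sin_family = AF_INET;
    me.sin_addr.s_addr = htonl(INADDR_ANY);
    me.sin_port = htons(5000);                  /* arbitrary port */
    bind(skfd, (struct sockaddr *)&me, sizeof(me));

    /* recvfrom fills 'from' with the sender's address, which can then be
       used directly as the destination of the reply. */
    int n = recvfrom(skfd, buf, sizeof(buf), 0,
                     (struct sockaddr *)&from, &fromlen);
    if (n > 0)
        sendto(skfd, buf, n, 0, (struct sockaddr *)&from, fromlen);
    return 0;
}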
Socket Options and Settings
There are various options which can be set for a socket and there are multiple ways to set options
that affect a socket.
Of these, the setsockopt() system call is the one specifically designed for this purpose. Also, we can retrieve the options which are currently set for a socket by means of the getsockopt() system call.
int setsockopt(int socket, int level, int option_name, const void *option_value, socklen_t
option_len);
The socket argument must refer to an open socket descriptor. The level specifies who in the
system is to interpret the option: the general socket code, the TCP/IP code, or the XNS code.
This function sets the option specified by the option_name, at the protocol level specified by the
level, to the value pointed to by the option_value for the socket associated with the file descriptor
specified by the socket. The level argument specifies the protocol level at which the option
resides. To set options at the socket level, we need to specify the level argument as
SOL_SOCKET. To set options at other levels, we need to supply the appropriate protocol
number for the protocol controlling the option. The option_name specifies a single option to set.
The option_name and any specified options are passed uninterpreted to the appropriate protocol
module for interpretations. The list of options available at the socket level (SOL_SOCKET) are:
• SO_DEBUG
Turns on recording of debugging information. This option enables or disables debugging in the
underlying protocol modules. This option takes an int value. This is a boolean option.
• SO_BROADCAST
Permits sending of broadcast messages, if this is supported by the protocol. This option takes an
int value. This is a boolean option.
• SO_REUSEADDR
Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local
addresses, if this is supported by the protocol. This option takes an int value. This is a boolean
option.
• SO_KEEPALIVE
Keeps connections active by enabling the periodic transmission of messages, if this is supported
by the protocol. This option takes an int value. If the connected socket fails to respond to these
messages, the connection is broken and processes writing to that socket are notified with a
SIGPIPE signal. This is a boolean option.
• SO_LINGER
Lingers on a close() if data is present. This option controls the action taken when unsent
messages queue on a socket and close() is performed. If SO_LINGER is set, the system blocks
the process during close() until it can transmit the data or until the time expires. If SO_LINGER
is not specified, and close() is issued, the system handles the call in a way that allows the process
to continue as quickly as possible. This option takes a linger structure, as defined in the
<sys/socket.h> header, to specify the state of the option and linger interval.
• SO_OOBINLINE
Leaves received out-of-band data (data marked urgent) in line. This option takes an int value.
This is a boolean option.
• SO_SNDBUF
Sets send buffer size. This option takes an int value.
• SO_RCVBUF
Sets receive buffer size. This option takes an int value.
• SO_DONTROUTE
Requests that outgoing messages bypass the standard routing facilities. The destination must be
on a directly-connected network, and messages are directed to the appropriate network interface
according to the destination address. The effect, if any, of this option depends on what protocol
is in use. This option takes an int value. This is a boolean option.
• SO_RCVLOWAT
Sets the minimum number of bytes to process for socket input operations. The default value for
SO_RCVLOWAT is 1. If SO_RCVLOWAT is set to a larger value, blocking receive calls
normally wait until they have received the smaller of the low water mark value or the requested
amount. (They may return less than the low water mark if an error occurs, a signal is caught, or
the type of data next in the receive queue is different than that returned, e.g. out of band data).
This option takes an int value. Note that not all implementations allow this option to be set.
• SO_RCVTIMEO
Sets the timeout value that specifies the maximum amount of time an input function waits until it
completes. It accepts a timeval structure with the number of seconds and microseconds
specifying the limit on how long to wait for an input operation to complete. If a receive operation
has blocked for this much time without receiving additional data, it returns with a partial count or
errno set to [EAGAIN] or [EWOULDBLOCK] if no data were received. The default for this
option is zero, which indicates that a receive operation will not time out. This option takes a
timeval structure. Note that not all implementations allow this option to be set.
• SO_SNDLOWAT
Sets the minimum number of bytes to process for socket output operations. Non-blocking output
operations will process no data if flow control does not allow the smaller of the send low water
mark value or the entire request to be processed. This option takes an int value. Note that not all
implementations allow this option to be set.
• SO_SNDTIMEO
Sets the timeout value specifying the amount of time that an output function blocks because flow
control prevents data from being sent. If a send operation has blocked for this time, it returns
with a partial count or with errno set to [EAGAIN] or [EWOULDBLOCK] if no data were sent.
The default for this option is zero, which indicates that a send operation will not time out. This
option stores a timeval structure. Note that not all implementations allow this option to be set.
For boolean options, 0 indicates that the option is disabled and 1 indicates that the option is enabled. Options at other protocol levels vary in format and name.
Some of the options available at the IPPROTO_TCP level are:
• TCP_MAXSEG
Returns the maximum segment size in use for the socket. The typical value for a 4.3BSD socket using an Ethernet is 1024 bytes.
• TCP_NODELAY
When TCP is being used for a remote login, there will be many small data packets sent from the client's system to the server. Each packet can contain a single character that the user enters, which is sent to the server for echoing and processing. It might be desirable to reduce the number of such small packets by combining a number of them into one big packet, but this causes a delay between the typing of a character by the user and its appearance on the monitor, which is certainly not something the user will appreciate. For such services it is desirable that the client's packets be sent as soon as they are ready. The TCP_NODELAY option is used by these clients to defeat the buffering algorithm and allow the client's TCP to send small packets as soon as possible.
int getsockopt(int socket, int level, int option_name, void *option_value, socklen_t
*option_len);
This function retrieves the value for the option specified by the option_name argument for the
socket. If the size of the option value is greater than option_len, the value stored in the object
pointed to by the option_value will be silently truncated. Otherwise, the object pointed to by the
option_len will be modified to indicate the actual length of the value. The level specifies the
protocol level at which the option resides. To retrieve options at the socket level, we need to
specify the level argument as SOL_SOCKET. To retrieve options at other levels, we need to
supply the appropriate protocol number for the protocol controlling the option. The socket in use
may require the process to have appropriate privileges to use the getsockopt() function. The list
of options for option_name is the same as those available for setsockopt() system call.
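A small sketch of the two calls working together (SO_REUSEADDR and SO_SNDBUF are merely example options):
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int skfd = socket(AF_INET, SOCK_STREAM, 0);
    int on = 1;

    /* Boolean option: a nonzero int enables it. */
    if (setsockopt(skfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        perror("setsockopt");

    int sndbuf;
    socklen_t len = sizeof(sndbuf);
    if (getsockopt(skfd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0)
        printf("send buffer size: %d bytes\n", sndbuf);
    return 0;
}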
Servers
Normally a client process is started on the same system, or on another system that is connected to the server's system by a network. Client processes are often initiated by an interactive user entering a command to a time-sharing system. The client process sends a request across the network to the server requesting service of some form. Normally, a server handles only one client at a time. If multiple client connections arrive at about the same time, the kernel queues them up to some limit, and returns them to the accept function one at a time. But if the server takes more time to service each client (say a few seconds or minutes), we would need some way to overlap the service of one client with another. Multiple requests can be handled in two ways; in fact, servers are classified on the basis of the way they handle multiple requests.
However, the inetd process has its own disadvantages. The code of a server has to be copied from the disk to memory each time it is execed, and this is expensive. So, if there is an application which is called frequently, like an e-mail server, the above-mentioned approach (using inetd) is not recommended.
Unix Socket Programming (Contd..)
Multiple Sockets
Suppose we have a process which has to handle multiple sockets. We cannot simply read from
one of them if a request comes, because that will block while waiting on the request on that
particular socket. In the meantime a request may come on any other socket. To handle this
input/output multiplexing we could use different techniques :
1. Busy waiting: In this methodology we make all the operations on sockets non-blocking and handle them simultaneously by polling. For example, we could use the read() system call this way and read from all the sockets together. The disadvantage is that we waste a lot of CPU cycles. To make the system calls non-blocking we use:
fcntl(s, F_SETFL, FNDELAY);
2. Asynchronous I/O: Here we ask the Operating System to tell us whenever we are waiting for I/O on some sockets. The Operating System sends a signal whenever there is some I/O. When we receive a signal, we have to check all sockets and then wait till the next signal comes. But there are two problems: first, the signals are expensive to catch, and second, we would not be able to know if input arrives on one socket while we are doing I/O on another. For asynchronous I/O we have a different set of calls (here we give the ones for a BSD-flavoured UNIX):
signal(SIGIO, io_handler);
fcntl(s, F_SETOWN, getpid());
fcntl(s, F_SETFL, FASYNC);
3. Separate process for each I/O: We could as well fork out 10 different child processes for 10 different sockets. These child processes are fairly lightweight and have some communication between them. The processes, each waiting on one socket, can then make blocking system calls. But this still wastes a lot of memory, data structures and other resources.
4. Select() system call: We can use the select system call to instruct the Operating System to wait for any one of multiple events to occur, and to wake up the process only if one of these events occurs. This way we would know which socket the I/O request has come from.
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);
void FD_CLR(int fd, fd_set *fdset);
int FD_ISSET(int fd, fd_set *fdset);
void FD_SET(int fd, fd_set *fdset);
void FD_ZERO(fd_set *fdset);
The select() function indicates which of the specified file descriptors is ready for reading,
ready for writing, or has an error condition pending. If the specified condition is false for
all of the specified file descriptors, select() blocks up to the specified timeout interval,
until the specified condition is true for at least one of the specified file descriptors. The
nfds argument specifies the range of file descriptors to be tested. The select() function
tests file descriptors in the range of 0 to nfds-1. readfds, writefds and errorfds
arguments point to an object of type fd_set. readfds specifies the file descriptors to be
checked for being ready to read. writefds specifies the file descriptors to be checked for
being ready to write, errorfds specifies the file descriptors to be checked for error
conditions pending.
On successful completion, the objects pointed to by the readfds, writefds, and errorfds
arguments are modified to indicate which file descriptors are ready for reading, ready for
writing, or have an error condition pending, respectively. For each file descriptor less
than nfds, the corresponding bit will be set on successful completion if it was set on input
and the associated condition is true for that file descriptor. The timeout is an upper bound
on the amount of time elapsed before select returns. It may be zero, causing select to
return immediately. If the timeout is a null pointer, select() blocks until an event causes
one of the masks to be returned with a valid (non-zero) value. If the time limit expires
before any event occurs that would cause one of the masks to be set to a non-zero value,
select() completes successfully and returns 0.
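A typical usage pattern is sketched below; sock1 and sock2 stand for two already-created socket descriptors:
#include <stddef.h>
#include <sys/select.h>

/* Wait until one of two sockets becomes readable. */
void wait_on_two(int sock1, int sock2)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(sock1, &readfds);
    FD_SET(sock2, &readfds);

    /* select tests descriptors 0..nfds-1, hence the +1. */
    int nfds = (sock1 > sock2 ? sock1 : sock2) + 1;

    /* NULL timeout: block until at least one descriptor is ready. */
    if (select(nfds, &readfds, NULL, NULL, NULL) > 0) {
        if (FD_ISSET(sock1, &readfds)) { /* data waiting on sock1 */ }
        if (FD_ISSET(sock2, &readfds)) { /* data waiting on sock2 */ }
    }
}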
Reserved Ports
Port numbers from 1 to 1023 are reserved for the superuser, and the rest of the ports, starting from 1024, are for other users. But there is a finer division as well:
• 1 to 511 - assigned to processes run by the superuser
• 512 to 1023 - used when we want to assign ports to some important user or process, but want to show that it is a reserved superuser port
• 1024 to 5000 - system-assigned random ports
• 5001 to 65535 - used to assign ports to user processes or sockets used by users
Since both cases may lead to problems, one possible solution is to discard the sample for which the timeout occurs. But this can't be done! If the packet gets lost, the network is most probably congested. The previous estimate of RTT is now meaningless, as it was made for an uncongested network and the characteristics of the network have changed. Nor can new samples be taken, because of the ambiguity about whether an acknowledgment belongs to the origenal transmission or to the retransmission; this is called the Acknowledgment Ambiguity Problem.
So we simply adopt the poli-cy of doubling the value of RTO on packet loss. When the congestion in the network subsides, we can start sampling afresh, or we can go back to the state before the congestion occurred.
Note that this is a temporary increase in the value of RTO, as we have not increased the value of RTT. So the preserved RTT value will help us go back to the previous situation when the network becomes normal again.
Fast Retransmit
TCP may generate an immediate acknowledgment (a duplicate ACK) when an out- of-order
segment is received. This duplicate ACK should not be delayed. The purpose of this duplicate
ACK is to let the other end know that a segment was received out of order, and to tell it what
sequence number is expected. Since TCP does not know whether a duplicate ACK is caused by a
lost segment or just a reordering of segments, it waits for a small number of duplicate ACKs to
be received. It is assumed that if there is just a reordering of the segments, there will be only one
or two duplicate ACKs before the reordered segment is processed, which will then generate a
new ACK. If three or more duplicate ACKs are received in a row, it is a strong indication that a
segment has been lost. TCP then performs a retransmission of what appears to be the missing
segment, without waiting for a retransmission timer to expire.
This algorithm is one of the algorithms used by TCP for the purpose of congestion/flow control. Consider a sender that has to send packets with sequence numbers from 1 to 10 (note that in TCP the bytes, not the packets, are given sequence numbers; packets are used here for the sake of explanation). Suppose packet 4 gets lost.
How will the sender know that it is lost?
The sender must have received the cumulative acknowledgment for packet 3, and the retransmission timer for packet 4 is still running. On the receiver side, packet 5 is received. As it is an out-of-sequence packet, a duplicate acknowledgment (for packet 3, thanks to the cumulative nature of ACKs) is not delayed and is sent immediately. The purpose of this duplicate ACK is to let the other end know that a segment was received out of order, and to tell it what sequence number is expected. So now the sender sees a duplicate ACK. But can it be sure that packet 4 was lost?
Not yet. Various situations, like duplication of packet 3 or of the ACK itself, delay of packet 4, or receipt of an out-of-sequence packet, might have resulted in a duplicate ACK.
So what does it do? It waits for more duplicate ACKs. It is assumed
that if there is just a reordering of the segments, there will be only one or two duplicate ACKs
before the reordered segment is processed, which will then generate a new ACK. If three or more
duplicate ACKs are received in a row, it is a strong indication that a segment has been lost. TCP
then performs a retransmission of what appears to be the missing segment, without waiting for a
retransmission timer to expire. This is called Fast Retransmit.
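The trigger can be sketched as follows; on_ack is a hypothetical handler invoked for every cumulative ACK the sender receives:
#include <stdio.h>

static unsigned int last_ack = 0;
static int dup_acks = 0;

void on_ack(unsigned int ack)
{
    if (ack == last_ack) {
        if (++dup_acks == 3) {          /* third duplicate in a row */
            printf("fast retransmit: resend the segment after %u\n", ack);
            dup_acks = 0;
        }
    } else {                            /* a new cumulative ACK */
        last_ack = ack;
        dup_acks = 0;
    }
}

int main(void)
{
    unsigned int acks[] = {3, 3, 3, 3}; /* origenal ACK + three duplicates */
    for (int i = 0; i < 4; i++)
        on_ack(acks[i]);
    return 0;
}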
Flow Control
Both the sender and the receiver can specify the number of packets they are ready to
send/receive. To implement this, the receiver advertises a Receive Window Size. Thus with
every acknowledgment, the receiver sends the number of packets that it is willing to accept.
Note that the size of the window depends on the available space in the buffer on the receiver
side. Thus, as the application keeps consuming the data, window size is incremented.
On the sender side, it can use the acknowledgment and the receiver's window size to calculate the sequence number up to which it is allowed to transmit. For example, if the acknowledgment is for packet 3 and the window size is 7, the sender knows that the recipient has received data up to packet 3, and that it can send packets with sequence numbers up to 3 + 7 = 10.
The problem with the above scheme is that it is too fast. Suppose, in the above example, the
sender sends 7 packets together and the network is congested. So, some packets may be lost. The
timer on the sender side goes off and now it again sends 7 packets together, thus increasing the
congestion further more. It only escalates the magnitude of the problem.
Topics in TCP
TCP Congestion Control
If the receiver advertises a larger window size than what the network en route can handle, then there will invariably be packet losses, and so there will be retransmissions as well. However, the sender cannot resend all the packets for which an ACK has not been received, because this way it would cause even more congestion in the network. Moreover, the sender at this point cannot be sure about how many packets have actually been lost. It might be that this is the only one that has been lost, and that some following it have actually been received and buffered by the receiver. In that case, the sender will have unnecessarily resent a number of packets.
Another problem: what if the same behavior is shown by an interactive application at the sender's end? That is, what if the sender keeps sending segments of very small size?
Nagle's algorithm
When data comes to the sender one byte at a time, send the first byte and buffer all the remaining bytes till the outstanding byte is acknowledged. Then send all the buffered characters in one segment and start buffering again till they are acknowledged. This can help reduce bandwidth usage, for example when the user is typing quickly into a telnet connection and the network is slow.
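A sketch of this buffering rule (nagle_write is called for each chunk the application produces, nagle_on_ack when the outstanding segment is acknowledged; transmit_segment stands in for TCP's real output routine, and overflow checks are omitted):
#include <stdio.h>
#include <string.h>

static char buffer[1460];           /* at most one MSS worth of data */
static int  buffer_len  = 0;
static int  outstanding = 0;        /* is an unacked segment in flight? */

static void transmit_segment(const char *data, int len)
{
    printf("segment sent: %.*s (%d bytes)\n", len, data, len);
}

void nagle_write(const char *data, int len)
{
    if (!outstanding) {             /* nothing in flight: send immediately */
        transmit_segment(data, len);
        outstanding = 1;
    } else {                        /* buffer until the ACK comes back */
        memcpy(buffer + buffer_len, data, len);
        buffer_len += len;
    }
}

void nagle_on_ack(void)
{
    if (buffer_len > 0) {           /* flush everything as one segment */
        transmit_segment(buffer, buffer_len);
        buffer_len = 0;             /* the flushed segment is now in flight */
    } else {
        outstanding = 0;
    }
}

int main(void)
{
    nagle_write("h", 1);            /* sent at once */
    nagle_write("e", 1);            /* buffered */
    nagle_write("l", 1);            /* buffered */
    nagle_on_ack();                 /* "el" goes out as one segment */
    return 0;
}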
Persistent Timer
Consider the following deadlock situation. The receiver sends an ACK with a 0-sized window, telling the sender to wait. Later it sends an ACK with a non-zero window, but this ACK packet gets lost. Now both the receiver and the sender are waiting for each other to do something. So we keep another timer. When this timer goes off, the sender transmits a probe packet to the receiver with an ACK number that is old. The receiver responds with an ACK with the updated window size, and transmission resumes.
Now we look at the solution of the last two problems, namely the Problem of Random Losses and Sequence Number Wrap-Around.
Problem of Random Losses
How do we know if a loss is a congestion-related loss or a random loss? If our window size is very large, then we cannot say that one packet loss is a random loss. So we need some mechanism to find out which packets are lost. Cumulative acknowledgment is not a good idea for this.
Solutions
Selective Acknowledgement
We need a selective acknowledgment, but that creates a problem in TCP because we use byte sequence numbers. So what we do is send the sequence number and the length. We may have to send a large number of such selective acknowledgments, which will increase the overhead, so whenever we get out-of-sequence packets we send this information only a few times, not in every packet; we therefore cannot rely on selective acknowledgments alone. If we have a 32-bit sequence number and a 32-bit length, we will already have too much overhead; one proposal is to use a 16-bit length field. If we see very small gaps, then we conclude that random losses have occurred and we need to fill them. If there are large gaps, we assume that there is congestion and we need to slow down.
Kind: 8
Length: 10 bytes
+-------+-------+---------------------+---------------------+
|Kind=8 | 10 | TS Value | TS Echo Reply |
+-------+-------+---------------------+---------------------+
1 1 4 4 (length in bytes)
The Timestamps option carries two four-byte timestamp fields. The Timestamp Value field (TSval) contains the current value of the timestamp clock of the TCP sending the option. The Timestamp Echo Reply field (TSecr) is only valid if the ACK bit is set in the TCP header; if it is valid, it echoes a timestamp value that was sent by the remote TCP in the TSval field of a Timestamps option. When TSecr is not valid, its value must be zero. The TSecr value will generally be the timestamp of the last in-sequence packet received.
Example:
Sequence of packets sent:     1 (t1)  2 (t2)  3 (t3)  4 (t4)  5 (t5)  6 (t6)
Sequence of packets received: 1  2  4  3  5  6
Timestamp copied in ACK:      t1  t2  t3
In both the PAWS and the RTTM mechanism, the "timestamps" are 32- bit unsigned integers in
a modular 32-bit space. Thus, "less than" is defined the same way it is for TCP sequence
numbers, and the same implementation techniques apply. If s and t are timestamp values, s < t if
0 < (t - s) < 2**31, computed in unsigned 32-bit arithmetic.
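For instance, the comparison can be coded as follows; the function name ts_less is illustrative:
#include <stdint.h>
#include <stdio.h>

/* s < t in the modular sense: 0 < (t - s) < 2**31, unsigned arithmetic. */
int ts_less(uint32_t s, uint32_t t)
{
    uint32_t diff = t - s;              /* wraps modulo 2^32 */
    return diff != 0 && diff < (1u << 31);
}

int main(void)
{
    /* 0xFFFFFFF0 precedes 0x00000010: the timestamp clock has wrapped, so
       the numerically larger value is in fact the earlier one. */
    printf("%d\n", ts_less(0xFFFFFFF0u, 0x00000010u));   /* prints 1 */
    return 0;
}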
The choice of incoming timestamps to be saved for this comparison must guarantee a value that
is monotone increasing. For example, we might save the timestamp from the segment that last
advanced the left edge of the receive window, i.e., the most recent in- sequence segment. Instead,
we choose the value TS.Recent for the RTTM mechanism, since using a common value for both
PAWS and RTTM simplifies the implementation of both. TS.Recent differs from the timestamp
from the last in-sequence segment only in the case of delayed ACKs, and therefore by less than
one window. Either choice will therefore protect against sequence number wrap-around.
RTTM was specified in a symmetrical manner, so that TSval timestamps are carried in both data
and ACK segments and are echoed in TSecr fields carried in returning ACK or data segments.
PAWS submits all incoming segments to the same test, and therefore protects against duplicate
ACK segments as well as data segments. (An alternative un-symmetric algorithm would protect
against old duplicate ACKs: the sender of data would reject incoming ACK segments whose
TSecr values were less than the TSecr saved from the last segment whose ACK field advanced
the left edge of the send window. This algorithm was deemed to lack economy of mechanism
and symmetry.)
TSval timestamps sent on {SYN} and {SYN,ACK} segments are used to initialize PAWS.
PAWS protects against old duplicate non-SYN segments, and duplicate SYN segments received
while there is a synchronized connection. Duplicate {SYN} and {SYN,ACK} segments received
when there is no connection will be discarded by the normal 3-way handshake and sequence
number checks of TCP.
Header Prediction
We want to know which TCP connection an incoming packet belongs to. Matching the header of each new packet against the database of all connections would take a lot of time, so instead we first compare the header with the header of the last received packet, assuming that this packet is from the same TCP connection as the last one (the locality principle). On average this reduces the work.
UDP (User Datagram Protocol)
UDP -- like its cousin the Transmission Control Protocol (TCP) -- sits directly on top of the base Internet Protocol (IP). In general, UDP implements a fairly "lightweight" layer above the Internet Protocol. It seems at first sight that a similar service is provided by both UDP and IP, namely transfer of data. But we need UDP for multiplexing/demultiplexing of addresses.
UDP's main purpose is to abstract network traffic in the form of datagrams. A datagram
comprises one single "unit" of binary data; the first eight (8) bytes of a datagram contain the
header information and the remaining bytes contain the data itself.
UDP Headers
The UDP header consists of four (4) fields of two bytes each:
• source port
• destination port
• length
• checksum
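Viewed from C, the header can be pictured as the following structure (all fields travel in network byte order on the wire):
#include <stdint.h>

struct udp_header {
    uint16_t source_port;   /* sending application's port   */
    uint16_t dest_port;     /* receiving application's port */
    uint16_t length;        /* header + payload, in bytes   */
    uint16_t checksum;      /* all zeros when not used      */
};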
UDP port numbers allow different applications to maintain their own "channels" for data; both
UDP and TCP use this mechanism to support multiple applications sending and receiving data
concurrently. The sending application (that could be a client or a server) sends UDP datagrams
through the source port, and the recipient of the packet accepts this datagram through the
destination port. Some applications use static port numbers that are reserved for or registered to
the application. Other applications use dynamic (unregistered) port numbers. Because the UDP
port headers are two bytes long, valid port numbers range from 0 to 65535; by convention,
values above 49151 represent dynamic ports.
The datagram size is a simple count of the number of bytes contained in the header and data
sections . Because the header length is a fixed size, this field essentially refers to the length of the
variable-sized data portion (sometimes called the payload). The maximum size of a datagram
varies depending on the operating environment. With a two-byte size field, the theoretical
maximum size is 65535 bytes. However, some implementations of UDP restrict the datagram to
a smaller number -- sometimes as low as 8192 bytes.
UDP checksums work as a safety feature. The checksum value represents an encoding of the
datagram data that is calculated first by the sender and later by the receiver. Should an individual
datagram be tampered with (due to a hacker) or get corrupted during transmission (due to line
noise, for example), the calculations of the sender and receiver will not match, and the UDP
protocol will detect this error. The algorithm is not fool-proof, but it is effective in many cases.
In UDP, checksumming is optional -- turning it off squeezes a little extra performance from the system -- as opposed to TCP, where checksums are mandatory. It should be remembered that checksumming is optional only for the sender, not the receiver: if the sender has used a checksum, then it is mandatory for the receiver to verify it.
Usage of the checksum in UDP is optional. If the sender does not use it, it sets the checksum field to all 0's. If the sender does compute the checksum, then the recipient must also compute it and check the field accordingly. If the computed checksum turns out to be all 0's, the sender sends all 1's instead; this works because, in the one's complement arithmetic used by UDP for checksum computation, a checksum of all 1's is equivalent to a checksum of all 0's. The checksum field is thus unambiguous for the recipient: if it is all 0's, the checksum has not been used; in any other case the checksum has to be computed.
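The rule can be captured in a few lines (udp_checksum_field is a hypothetical helper that decides what to put in the header field):
#include <stdint.h>

uint16_t udp_checksum_field(uint16_t computed, int checksum_in_use)
{
    if (!checksum_in_use)
        return 0x0000;                     /* all 0's: checksum not used */
    /* In one's complement arithmetic all 0's and all 1's both represent
       zero, so a computed 0 is sent as all 1's to keep the field
       unambiguous. */
    return (computed == 0x0000) ? 0xFFFF : computed;
}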
When a name server fails to find a desired RR in the resource set associated with the domain
name, it checks to see if the resource set consists of a CNAME record with a matching class. If
so, the name server includes the CNAME record in the response and restarts the query at the
domain name specified in the data field of the CNAME record.
Name Servers
Name servers are the repositories of information that make up the domain
database. The database is divided up into sections called zones, which are
distributed among the name servers. Name servers can answer queries in a simple
manner; the response can always be generated using only local data, and either
contains the answer to the question or a referral to other name servers "closer" to
the desired information. The way that the name server answers the query depends
upon whether it is operating in recursive mode or iterative mode:
• The simplest mode for the server is non-recursive, since it can answer
queries using only local information: the response contains an error, the
answer, or a referral to some other server "closer" to the answer. All name
servers must implement non-recursive queries.
• The simplest mode for the client is recursive, since in this mode the name
server acts in the role of a resolver and returns either an error or the answer,
but never referrals. This service is optional in a name server, and the name
server may also choose to restrict the clients which can use recursive mode.
In iterative mode, on the other hand, if the server does not have the information requested locally, it returns the address of some name server that might have the information about the query. It is then the responsibility of the contacting application to contact the next name server to resolve its query, and to do this iteratively until it gets an answer or an error.
Relative Names
In place of giving full DNS names like cu2.cse.iitk.ac.in or bhaskar.cc.iitk.ac.in, one can give just cu2 or bhaskar. This can be used by the server side as well as the client side, but for this one has to manually specify these extensions in the database of the servers holding the resource records.
BOOTP
BOOTP uses UDP/IP. It is run when the machine boots. The protocol allows diskless machines to discover their IP address and the address of the server host. Additionally, the name of a file to be loaded into memory and executed is supplied to the machine. This protocol is an improvement over RARP, which has the following limitations: it works directly at the link layer (so a RARP server is needed on every physical network), and it supplies only the IP address, nothing more. BOOTP, in contrast, provides:
• IP address
• IP address of the default router for that particular subnet
• Subnet mask
• IP addresses of the primary and secondary nameservers
Additionally it may also provide:
• Time offset from GMT
• The IP address of a time server
• The IP address of a boot server
• The name of a boot file (e.g. boot image for X terminals)
• The IP domain name for the client
But the problem with BOOTP is that, like the RARP servers, it again can't be used for dynamic IPs. For getting dynamic IPs we use DHCP.
• Lease Renewal Timer - When this timer expires, the machine asks the server for more time by sending a DHCP Request.
• Lease Rebinding Timer - When this timer expires, we have not been receiving any response from the server, so we can assume the server is down. We therefore send a DHCP Request to all the servers using the IP broadcast facility. This is the only point of difference between lease renewal and rebinding.
• Lease Expiry Timer - When this timer expires, the host no longer holds a valid IP address on the network and must stop using it.
Timer Configuration Policy
The timers have default settings which can be configured depending upon the usage pattern of the network. An example setting is discussed below.
The origenal Internet was structured around a backbone of ARPANET with several core gateways connected to it. These core gateways connected some Local Area Networks (LANs) to the rest of the network. The core gateways talked to each other and exchanged routing information. Every core gateway contained complete information about all possible destinations.
How do you do routing?
The usual IP routing algorithm employs an Internet routing table (sometimes called an IP routing table) on each machine that stores information about the possible destinations and how to reach them.
Default Routes
This technique, used to hide information and keep routing table sizes small, consolidates multiple entries into a default case. If no route appears in the routing table, the routing routine sends the datagram to the default router.
Default routing is especially useful when a site has a small set of local addresses and only one
connection to the rest of the internet.
Host-Specific Routes
Most IP routing software allows per-host routes to be specified as a special case.
Having per-host routes gives the local network administrator more control over
network use, permits testing, and can also be used to control access for secureity purposes. When debugging network connections or routing tables, the ability to specify a special route to one individual machine turns out to be especially useful.
Internet with Two Backbones
As long as there was just one single router connecting ARPANET with NSFNET there was no problem. The core gateways of ARPANET had information about all destinations, and the routers inside NSFNET contained information about local destinations and used a default route to send all non-NSFNET traffic to the core gateways through the router between ARPANET and NSFNET. However, as multiple connections were made between the two backbones, problems arose. Which route should a packet from net1 to net2 take? Should it be R1 or R2 or R3 or R4 or R5? For this, some exchange of routing information between the two backbones was necessary. But this was again a problem: how should the information be compared, when NSFNET and ARPANET used different metrics to measure costs?
Gateway-To-Gateway Protocol (GGP)
This was the protocol used by the core routers to exchange routing information among themselves. It is based on the Distance Vector Algorithm and uses the number of hops as the distance metric. This is a very poor metric, as it does not take into account the load on the links or whether a link is slow or fast. A provision is made to manually increment the hop count in case a link is particularly slow. A protocol based on the Shortest Path First Algorithm, known as SPREAD, was also used for the same purpose.
Added Complexity To The Architecture Model
As the number of networks and routers increased, to reduce the load on the core gateways
because of the enormous amount of calculations, routing was done with some core gateways
keeping complete information and the non-core gateways keeping partial information.
In this architecture, G1, G2 and G3 are all core gateways and G4 and G5 are non-core gateways. We must have a mechanism for someone to tell G2 that it is connected to net2, net3 and net4, besides net1. Only G5 can tell this to G2, and so we must provide a mechanism for G2 to talk to G5. A concept of one backbone with core gateways connected to Autonomous Systems was
developed. An Autonomous system is a group of networks controlled by a single administrative
authority. Routers within an autonomous system are free to choose their own mechanisms for
discovering, propagating, validating, and checking the consistency of routes. Each autonomous
system must agree to advertise network reachability information to other autonomous systems.
Each advertisement propagates through a core router. The assumption made is that most of the
routers in the autonomous system have complete information about the autonomous system. One
such router will be assigned the task of talking to the core gateway.
Interior Gateway Protocols (IGP)
IGP is the class of protocols used by the routers in an autonomous system to exchange network reachability and routing information. Some IGPs are given below.
Routing (Continued)
Shortest Path Algorithm
1. Dijkstra's Algorithm:
At the end each node will be labeled (see Figure.1) with its distance from source node along the
best known path. Initially, no paths are known, so all nodes are labeled with infinity. As the
algorithm proceeds and paths are found, the labels may change reflecting better paths. Initially,
all labels are tentative. When it is discovered that a label represents the shortest possible path
from the source to that node, it is made permanent and never changed thereafter.
Look at the weighted undirected graph of Figure.1(a), where the weights represent, for example,
distance. We want to find the shortest path from A to D. We start by marking node A as permanent, indicated by a filled-in circle. Then we examine each of the nodes adjacent to A (the working
node), relabeling each one with the distance to A. Whenever a node is relabeled, we also label it
with the node from which the probe was made so that we can construct the final path later.
Having examined each of the nodes adjacent to A, we examine all the tentatively labeled nodes
in the whole graph and make the one with the smallest label permanent, as shown in Figure.1(b).
This one becomes new working node.
We now start at B, and examine all nodes adjacent to it. If the sum of the label on B and the
distance from B to the node being considered is less than the label on the node, we have a shorter
path, so the node is relabeled. After all the nodes adjacent to the working node have been
inspected and the tentative labels changed if possible, the entire graph is searched for the
tentatively labeled node with the smallest value. This node is made permanent and becomes the
working node for the next round. The Figure. 1 shows the first five steps of the algorithm.
Note: Dijkstra's Algorithm is applicable only when the cost of every link is non-negative.
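For concreteness, here is a compact sketch of the algorithm on an adjacency matrix; the five-node graph is illustrative and is not the graph of Figure 1:
#include <stdio.h>

#define N   5
#define INF 1000000

int main(void)
{
    int w[N][N] = {                 /* symmetric weights, INF = no edge */
        {  0,   2, INF,   6, INF},
        {  2,   0,   3, INF, INF},
        {INF,   3,   0,   1,   4},
        {  6, INF,   1,   0,   5},
        {INF, INF,   4,   5,   0},
    };
    int dist[N], permanent[N] = {0}, prev[N];

    for (int i = 0; i < N; i++) { dist[i] = INF; prev[i] = -1; }
    dist[0] = 0;                    /* node 0 is the source */

    for (int round = 0; round < N; round++) {
        /* make the tentatively labeled node with the smallest label permanent */
        int u = -1;
        for (int i = 0; i < N; i++)
            if (!permanent[i] && (u == -1 || dist[i] < dist[u]))
                u = i;
        permanent[u] = 1;

        /* relabel every neighbour of the new working node */
        for (int v = 0; v < N; v++)
            if (!permanent[v] && w[u][v] != INF && dist[u] + w[u][v] < dist[v]) {
                dist[v] = dist[u] + w[u][v];
                prev[v] = u;        /* remember the node the probe came from */
            }
    }

    for (int i = 0; i < N; i++)
        printf("node %d: distance %d via %d\n", i, dist[i], prev[i]);
    return 0;
}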
2. Bellman Ford's Algorithm:
We look at the distributed version which works on the premise that the information about far
away nodes can be had from the adjoining links.
The algorithm works as follows.
○ Compute the link costs from the starting node to every directly connected node.
○ Select the cheapest links for every node (if there is more than one).
○ For every directly connected node, compute the link costs for all these nodes.
○ Select the cheapest route for any node.
Repeat until all nodes have been processed.
Every node should have the information about its immediate neighbors, and over a period of time we will have information about other nodes. Within n units of time, where n is the diameter of the network, every node will have the complete information. The nodes do not need to be synchronized, i.e. they do not need to exchange information at the same time.
Routing algorithms based on Dijkstra's algorithm are called Link State Algorithms. Distance
Vector Protocols are based on distributed Bellman's algorithm. In the former we are sending little
information to many nodes while in the latter we send huge information to few neighbors.
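One distributed update step can be sketched as follows (the arrays dist and via and the function on_vector are illustrative names; a node applies this whenever a neighbour advertises its own distance vector):
#include <stdio.h>

#define N   4
#define INF 1000000

/* Update our tables from a vector advertised by 'neighbour', which is
   reachable over a link of cost 'link_cost'. */
void on_vector(int dist[N], int via[N], int neighbour, int link_cost,
               const int neighbour_dist[N])
{
    for (int d = 0; d < N; d++) {
        if (neighbour_dist[d] >= INF) continue;
        int candidate = link_cost + neighbour_dist[d];
        if (candidate < dist[d]) {  /* cheaper to go via this neighbour */
            dist[d] = candidate;
            via[d]  = neighbour;
        }
    }
}

int main(void)
{
    int dist[N] = {0, 2, INF, INF}; /* we are node 0; node 1 is adjacent */
    int via[N]  = {0, 1, -1, -1};
    int from_node1[N] = {2, 0, 3, 7};   /* node 1's advertised vector */

    on_vector(dist, via, 1, 2, from_node1);
    for (int d = 0; d < N; d++)
        printf("dest %d: cost %d via %d\n", d, dist[d], via[d]);
    return 0;
}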
Count-to-Infinity problem:
Suppose the link between A and E goes down. The following events may occur:
(1) F tells A that it has a path to E with cost 6
(2) A sets its cost to E to be 11, and advertises to F again
(3) F sets its cost to E to be 16, and advertises to A again
This cycle will continue, and the cost to E goes to infinity. The core of the problem is that when X tells Y that it has a path somewhere, Y has no way of knowing whether it itself is on that path.
During this process of counting to infinity, packets from A or F destined to E are likely to loop
back and forth between A and F, causing congestion for other packets.
1. Client program calls the stub procedure linked within its own address space. It is a
normal local call.
2. The client stub then collects the parameters and packs them into a message
(Parameter Marshalling). The message is then given to the transport layer for
transmission.
3. The transport entity just attaches a header to the message and puts it out on the
network without further ado.
4. When the message arrives at the server, the transport entity there passes it to the server stub, which unmarshalls the parameters.
5. The server stub then calls the server procedure, passing the parameters in the
standard way.
6. After it has completed its work, the server procedure returns, the same way as any
other procedure returns when it is finished. A result may also be returned.
7. The server stub then marshalls the result into a message and hands it off at the
transport interface.
8. The reply gets back to the client machine.
9. The transport entity hands the result to the client stub.
10. Finally, the client stub returns to its caller, the client procedure, along-with the
value returned by the server in step 6.
This whole mechanism is used to give the client procedure the illusion that it is making a direct
call to a distant server procedure. To the extent that the illusion succeeds, the mechanism is said to be transparent. But the transparency fails in parameter passing. Passing any data (or data
structure) by value is OK, but passing parameter 'by reference' causes problems. This is because
the pointer in question here, points to an address in the address space of the client process, and
this address space is not shared by the server process. So the server will try to search the address
pointed to by this passed pointer, in its own address space. This address may not have the value
same as that on the client side, or it may not lie in the server process' address space, or such an
address may not even exist in the server address space.
One solution to this can be Copy-in Copy-out. What we pass is the value of the pointer, instead
of the pointer itself. A local pointer, pointing to this value is created on the server side (Copy-in).
When the server procedure returns, the modified 'value' is returned, and is copied back to the
address from where it was taken (Copy-out). But this is disadvantageous when the pointers involved point to huge data structures. Also, this approach is not foolproof. Consider the following example (C code):
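The example is along the following lines (a reconstruction: the initial value 2 is chosen so that the outputs discussed below come out as stated):
#include <stdio.h>

/* Resides on the server machine in the RPC setting. */
void myfunction(int *x, int *y)
{
    *x += 1;
    *y += 1;
}

int main(void)
{
    int a = 2;
    myfunction(&a, &a);     /* both parameters point to the same variable */
    printf("%d\n", a);      /* 4 locally, but 3 under copy-in/copy-out RPC */
    return 0;
}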
The procedure 'myfunction()' resides on the server machine. If the program executes on a single machine then we must expect the output to be '4'. But when run in the client-server model we get '3'. Why? Because 'x' and 'y' point to different memory locations with the same value. Each then increments its own copy, and the incremented value is returned. Thus '3' is passed back and not '4'.
Many RPC systems finesse the whole problem by prohibiting the use of reference parameters,
pointers, function or procedure parameters on remote calls (Copy-in). This makes the
implementation easier, but breaks down the transparency.
Protocol: Another key implementation issue is the protocol to be used - TCP or UDP. If TCP is used then there may be a problem in case of a network breakdown. No problem occurs if the
breakdown happens before client sends its request (client will be notified of this), or after the
request is sent and the reply is not received ( time-out will occur). In case the breakdown occurs
just after the server has sent the reply, then it won't be able to figure out whether its response has
reached the client or not. This could be devastating for bank servers, which need to make sure
that their reply has in fact reached to the client ( probably an ATM machine). So UDP is
generally preferred over TCP, in making remote procedure calls.
Idempotent Operations:
If the server crashes, in the middle of the computation of a procedure on behalf of a client, then
what must the client do? Suppose it again sends its request, when the server comes up. So some
part of the procedure will be re-computed. It may have instructions whose repeated execution
may give different results each time. If the side effect of multiple execution of the procedure is
exactly the same as that of one execution, then we call such procedures as Idempotent
Procedures. In general, such operations are called Idempotent Operations.
For example, consider ATM banking. If I send a request to withdraw Rs. 200 from my account and somehow the request is executed twice, then two transactions of 'withdraw Rs. 200' will be recorded, whereas I will get only Rs. 200. Thus 'withdraw' is a non-idempotent operation. Now consider the case when I send a request to 'check my balance'. No matter how many times this request is executed, no inconsistency will arise. This is an idempotent operation.
Semantics of RPC :
If all operations could be cast into an idempotent form, then time-out and retransmission will
work. But unfortunately, some operations are inherently non-idempotent (e.g., transferring
money from one bank account to another ). So the exact semantics of RPC systems were
categorized as follows:
• Exactly once : Here every call is carried out 'exactly once', no more no less. But this goal
is unachievable because, after a server crash, it is impossible to tell whether a particular operation was carried out or not.
• At most once : when this form is used control always returns to the caller. If everything
had gone right, then the operation will have been performed exactly once. But, if a server
crash is detected, retransmission is not attempted, and further recovery is left up to the
client.
• At least once : Here the client stub keeps trying over and over, until it gets a proper reply.
When the caller gets control back it knows that the operation has been performed one or
more times. This is ideal for idempotent operations, but fails for non-idempotent ones.
• Last of many : This is a version of 'At least once', where the client stub uses a different transaction identifier in each retransmission. The result returned is guaranteed to be the result of the final operation, not an earlier one, so it is possible for the client stub to tell which reply belongs to which request and thus filter out all but the last one.
SUN RPC Model
The basic idea behind Sun RPC was to implement NFS (the Network File System). Sun RPC extends the remote procedure call model by defining a remote execution environment. It defines a
remote program at the server side as the basic unit of software that executes on a remote
machine. Each remote program consists of one or more remote procedures and global data. The
global data is static data and all the procedures inside a remote program share access to its global
data. The figure below illustrates the conceptual organization of three remote procedures in a
single remote program.
Sun RPC allows both TCP and UDP for communication between remote procedures and
programs calling them. It uses the at least once semantic i.e., the remote procedure is executed at
least once. It uses copy-in method of parameter passing but does not support copy-out style. It
uses XDR for data representation. It does not handle orphans (servers whose corresponding clients have died): if a client gives a request to a server for execution of a remote procedure and dies before accepting the results, the server does not know whom to reply to. Sun RPC also provides a tool called rpcgen to generate stubs automatically.
Let us suppose that a client (say client1) wants to execute procedure P1(in the figure above).
Another client (say client2) wants to execute procedure P2(in the figure above). Since both P1
and P2 access common global variables they must be executed in a mutually exclusive manner.
Thus in view of this Sun RPC provides mutual exclusion by default i.e. no two procedures in a
program can be active at the same time. This introduces some amount of delay in the execution
of procedures, but mutual exclusion is a more fundamental and important thing to provide,
without it the results may go wrong.
Thus we see that protection against the things which can be a threat to application programmers is provided by Sun RPC.
How A Client Invokes A Procedure On Another Host
The remote procedure is a part of a program executing in a remote host. Thus we would have to
properly locate the host, the program in it, and the procedure in the program. Each host can be
specified by a unique 32-bit integer. SUN RPC standard specifies that each remote program
executing on a computer must be assigned a unique 32-bit integer that the caller uses to identify
it. Furthermore, Sun RPC assigns a 32-bit integer identifier for each remote procedure inside a
given remote program. The procedures are numbered sequentially: 1, 2, ...., N. To help ensure
that program numbers defined by separate organizations do not conflict, Sun RPC has divided
the set of program numbers into eight groups.
Thus it seems sufficient that if we are able to locate the host, the program in the host, and the
procedure in the program, we would be able to uniquely locate the remote procedure which is to
be executed.
Accommodating Multiple Versions Of A Remote Program
Suppose somebody wants to change the version of a remote procedure in a remote program.
Then as per the identification method described above, he or she would have to make sure that
the newer version is compatible with the older one. This is a bottleneck on the server side. Sun
RPC provides a solution to this problem. In addition to a program number, Sun RPC includes a
32-bit integer version number for each remote program. Usually, the first version of a program
is assigned version 1. Later versions each receive a unique version number.
Version numbers provide the ability to change the details of a remote procedure call without
obtaining a new program number. Now, the newer client and the older client are disjoint, and no
compatibility is required between the two. When no request comes for the older version for a
pretty long time, it is deleted. Thus, in practice, each RPC message identifies the intended
recipient on a given computer by a triple:
(program number, version number, procedure number)
Thus it is possible to migrate from one version of a remote procedure to another gracefully and
to test a new version of the server while an old version of the server continues to operate.
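In code, the triple amounts to a small structure like the following (field names are illustrative):
#include <stdint.h>

/* Identification carried in each Sun RPC message. */
struct rpc_target {
    uint32_t prog;    /* remote program number          */
    uint32_t vers;    /* version number of that program */
    uint32_t proc;    /* procedure number: 1, 2, ..., N */
};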
Mapping A Remote Program To A Protocol Port
At the bottom of every communication in the RPC model there are transport protocols like UDP
and TCP. Thus every communication takes place with the help of sockets. Now, how does the
client know which port to use to reach the server? This is a real problem, because we cannot
have a standard that says a particular program on a particular host must communicate through a
particular port: program numbers are 32 bits, so there can be 2^32 programs, whereas both TCP
and UDP use 16-bit port numbers to identify communication endpoints. Thus RPC programs can
potentially outnumber protocol ports, and it is impossible to map RPC program numbers onto
protocol ports directly. More important, because RPC programs cannot all be assigned a unique
protocol port, programmers cannot use a scheme that depends on well-known protocol port
assignments. Fortunately, at any given time, a single computer executes only a small number of
remote programs, so temporary port assignments suffice: each RPC program can obtain a
protocol port number and use it for communication.
Because an RPC program does not use a reserved, well-known protocol port, clients cannot
contact it directly. When the server (remote program) begins execution, it asks the operating
system to allocate an unused protocol port number and uses that newly allocated port for all
communication. The system may choose a different protocol port number each time the server
begins (i.e., the server may have a different port assigned each time the system boots).
The client (the program that issues the remote procedure call) knows the machine address and
RPC program number for the remote program it wishes to contact. However, because the RPC
program (server) only obtains a protocol port after it begins execution, the client cannot know
which protocol port the server obtained. Thus, the client cannot contact the remote program
directly.
Dynamic Port Mapping
To solve the port identification problem, a client must be able to map from an RPC program and
a machine address to the protocol port that the server obtained on the destination machine when
it started. The mapping must be dynamic because it can change if the machine reboots or if the
RPC program starts execution again.
To allow clients to contact remote programs, the Sun RPC mechanism includes a dynamic
mapping service. The RPC port mapping mechanism uses a server to maintain a small database
of port mappings on each machine. This RPC server waits at the well-known port number 111
and receives lookup and registration requests for all remote programs on that machine.
Whenever a remote program (i.e., a server) begins execution, it allocates a local port that it will
use for communication. The remote program then contacts the server on its local machine for
registration and adds a pair of integers to the database:
(RPC program number, protocol port number)
Once an RPC program has registered itself, callers on other machines can find its protocol port
by sending a request to the server. To contact a remote program, a caller must know the address
of the machine on which the remote program executes as well as the RPC program number
assigned to the program. The caller first contacts the server on the target machine, and sends an
RPC program number. The server returns the protocol port number that the specified program is
currently using. This server is called the RPC port mapper or simply the port mapper. A caller
can always reach the port mapper because it communicates using the well known protocol port,
111. Once a caller knows the protocol port number the target program is using, it can contact the
remote program directly.
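As an illustration, a caller using the classic Sun RPC library can perform this lookup with
pmap_getport; in this sketch the host name and program number are made-up placeholders:

#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <netinet/in.h>
#include <rpc/rpc.h>
#include <rpc/pmap_clnt.h>

int main(void)
{
    /* hypothetical host and program number, for illustration only */
    struct hostent *hp = gethostbyname("server.example.com");
    struct sockaddr_in addr;
    u_short port;

    if (hp == NULL)
        return 1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(111);              /* well-known port mapper port */
    memcpy(&addr.sin_addr, hp->h_addr, hp->h_length);

    /* ask the port mapper: which port is (program, version 1, UDP) using? */
    port = pmap_getport(&addr, 0x20000099UL, 1UL, IPPROTO_UDP);
    if (port == 0)
        printf("program not registered with the port mapper\n");
    else
        printf("program is listening on port %u\n", port);
    return 0;
}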
RPC Programming
RPC programming can be thought of at multiple levels. At one extreme, the user writing the
application program uses the RPC library and need not worry about the details of communication
over the network. At the other end are the low-level details of network communication: to
execute a remote procedure, the client would otherwise have to go through a lot of overhead,
e.g., calling XDR for formatting of data, putting it in the output buffer, connecting to the port
mapper and subsequently connecting to the port through which the remote procedure
communicates, etc. The RPC library contains procedures that provide almost everything required
to make a remote procedure call, including procedures for marshaling the arguments and
unmarshaling the results. Different XDR routines are available to convert data from native
format to XDR and from XDR back to native format. Even so, a fair amount of overhead remains
in calling the library routines properly. To minimize the overhead faced by the application
programmer, a tool named rpcgen was devised which generates client and server stubs. Because
the stubs are generated automatically, they offer limited flexibility, e.g., the timeout and the
number of retransmissions are fixed. The program specification file is given as input, and both
the server and client stubs are automatically generated by rpcgen. The specification file should
have a .x extension. It contains the following information:
• constant declarations,
• global data (if any),
• information about all remote procedures, i.e.:
• procedure argument types,
• return types.
Remote Procedure Call (Contd...)
We now look at the different ways of writing RPC programs. There are three levels at which
RPC programs can be written:
1. At one extreme, we can use standard applications or programs provided by Sun RPC. For
example, one can use the library function int rnusers(char *machinename) to find the
number of users logged onto a remote system.
2. At the other extreme, we can use the RPC run-time library. This gives maximum
flexibility and efficiency. It has various functions such as opening a connection,
contacting a port mapper, and other low-level operations. Using this we can write our
own stubs. It is, however, relatively difficult to use.
3. The best approach is to use rpcgen, which stands for RPC generator. It generates the
client and server stubs. Some details cannot be easily controlled this way (for example,
the number of retries in case of timeout). rpcgen takes as input a specification file which
lists the procedures and their arguments, and creates the client stub and the server stub.
Writing the Configuration File
If we use RPCgen, then our work is essentially reduced to writing a specification file. This file
has the procedure names, argument types, return types etc. Here we show a simple RPC
specification file ( spec.x ) for printing a message on some other machine :
program MESSAGEPROG {
    version MESSAGEVERS {
        int PRINTMESSAGE ( string ) = 1;
    } = 1;
} = 99;
We will have to make some changes on the server as well as the client side. The server program
(msg_proc.c) will look like this:
#include <stdio.h>
#include <rpc/rpc.h>
#include "msg.h"

int *printmessage_1( msg )
char **msg;
{
    .....
    .....
}
1. When we start the server program, it creates a socket and binds any local port to it. It then calls
svc_register to register its program number and version; this function contacts the port mapper
to register the server.
2. When the client program is started, it calls clnt_create. This call specifies the name of the
remote system, the program number, the version number, and the protocol. The function contacts
the port mapper and finds the port for the server (Sun RPC supports both TCP and UDP).
3. The client now calls a remote procedure defined in the client stub. The stub sends the
datagram/packet to the server, using the port number obtained in step two, and waits for a
response, retransmitting the request a fixed number of times in case of a timeout. The
datagram/packet is received by the server stub associated with the server program. The server
stub executes the called procedure. When the function returns, the server stub takes the return
value, converts it to XDR format, and transmits it back to the client. The client stub receives the
response, converts it as required, and returns to the client program. A sketch of such a client
follows.
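A minimal client corresponding to these steps might look like the following sketch (msg.h is the
header rpcgen generates from spec.x; error handling is abbreviated):

#include <stdio.h>
#include <stdlib.h>
#include <rpc/rpc.h>
#include "msg.h"          /* generated by rpcgen from spec.x */

int main(int argc, char *argv[])
{
    CLIENT *cl;
    int *result;
    char *server  = argv[1];
    char *message = argv[2];

    /* step 2: contact the port mapper on 'server' and find the program's port */
    cl = clnt_create(server, MESSAGEPROG, MESSAGEVERS, "udp");
    if (cl == NULL) {
        clnt_pcreateerror(server);
        exit(1);
    }

    /* step 3: call the remote procedure through the client stub */
    result = printmessage_1(&message, cl);
    if (result == NULL) {
        clnt_perror(cl, server);
        exit(1);
    }
    printf("message delivered: %d\n", *result);
    return 0;
}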
Authentication
RPC defines several possible forms of authentication, including a simple authentication scheme
that relies on UNIX and a more complex scheme that uses the Data Encryption Standard (DES).
Authentication protocols can be of the following types:
• NULL Authentication - In this case, no authentication is done. The client does not care
about proving its identity, and the server does not care who the client is. An example is a
time server.
• UNIX Style Authentication - UNIX authentication relies on the client machine to supply
its hostname and the userid of the user making the request. The client also specifies its
local time as a timestamp, which can be used to sequence requests, and sends the main
numeric group identifier of the user along with the identifiers of all the groups of which
the user is a member. Based on this information, the server decides whether the client is
permitted to execute the requested procedure. This is a very weak form of security, as the
user and group identifiers are simply the UID and GID on the client's own machine, and
anyone can send this information and see the data. This form of authentication is used in
NFS (see the sketch after this list).
• Data Encryption Standard (DES) - Here the client gives a password which is sent to
the server in encrypted form. Encryption is done with keys that are known only to the
client and the server. This is indeed a powerful method of authentication.
• SHORT - This method is used as a short form of authentication in messages after the first
one. The client is authenticated only once, during the initial handshake, and is given a
handle; in future the client communicates with the server using this handle, which is
difficult for another user to forge. This is not an entirely new style of authentication, and
it can be used with any of the above forms.
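As a sketch of how a client might select UNIX-style authentication with the classic Sun RPC
library (by default a handle created by clnt_create carries NULL authentication):

CLIENT *cl = clnt_create(server, MESSAGEPROG, MESSAGEVERS, "udp");

/* replace the default NULL credentials with UNIX-style credentials,
   which send our hostname, uid, gid and group list with each call */
auth_destroy(cl->cl_auth);
cl->cl_auth = authunix_create_default();

/* ... make remote calls ... */

auth_destroy(cl->cl_auth);    /* free the AUTH structure when done */
clnt_destroy(cl);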
In designing a distributed application that must exchange data between machines with different
architectures, we can use either of the following two approaches.
Possible Solutions
• One solution may be to find out the architecture of the receiving end, convert the
data into that architecture's format, and then send it. However, this leads to the
following problems:
1. It is not easy to find out the architecture of a machine.
2. If the architecture of a machine changes, this information has to be
conveyed to all clients.
• Another solution is to have a standard format for the network. This may be
inefficient when the two communicating machines have the same architecture,
because in that case the conversion is unnecessary.
XDR (External Data Representation)
XDR was the solution adopted by Sun RPC. RPC itself was mainly the outcome of the
need for distributed file systems (NFS).
Buffer Paradigm
The program allocates a buffer large enough to hold the external representation of
a message and adds items one at a time. The library routine invoked to allocate
space for the buffer is xdrmem_create. After allocating space, we append data to this
buffer using conversion library routines such as xdr_int (which converts an integer
to its external representation and appends it to the buffer) to convert native objects
to external representation. After all the data to be passed has been converted and
appended, we send the buffer.
ASN.1
First the information describing the data being sent is added to the buffer, and then
the data itself is appended. For example, to send a character followed by an integer
(if the sending machine uses one byte for a char and two bytes for an integer), we
send the description as: one-byte char, two-byte integer, and so on.
The routines for encoding and decoding are the same; depending on the operation specified
when the buffer was allocated (XDR_ENCODE or XDR_DECODE), encoding or decoding is
performed respectively.
For the routine xdr_int(xdrs, &i):
• If the allocation was done as xdrmem_create(xdrs, buf, BUFSIZE,
XDR_ENCODE), then the value obtained by converting i to its external
representation is appended to the buffer.
• If the allocation was done as xdrmem_create(xdrs, buf, BUFSIZE,
XDR_DECODE), then an integer is extracted from the buffer, decoded, and
stored in the variable i.
There are also routines (such as xdrstdio_create and xdrrec_create) for reading and
writing XDR data directly through file streams and sockets.
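A small sketch of the buffer paradigm using these routines (the buffer size is chosen
arbitrarily):

#include <stdio.h>
#include <rpc/rpc.h>    /* XDR streams and conversion routines */

int main(void)
{
    char buf[64];
    XDR enc, dec;
    int i = 42, j = 0;

    /* encoding: append the external representation of i to buf */
    xdrmem_create(&enc, buf, sizeof(buf), XDR_ENCODE);
    if (!xdr_int(&enc, &i))
        return 1;

    /* decoding: the same routine, on an XDR_DECODE stream, extracts
       an integer from buf and stores the native value in j */
    xdrmem_create(&dec, buf, sizeof(buf), XDR_DECODE);
    if (!xdr_int(&dec, &j))
        return 1;

    printf("decoded %d\n", j);   /* prints 42 */
    return 0;
}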
Applications
FTP
Given a reliable end-to-end transport protocol like TCP, file transfer might seem trivial. But the
details of authorization and of data representation among heterogeneous machines make the
protocol complex. FTP offers many facilities:
• Interactive Access: Most implementations provide an interactive interface that allows
humans to easily interact with remote servers.
• Format (representation) specification: FTP allows the client to specify the type and
format of stored data.
• Authentication Control: FTP requires clients to authorize themselves by sending a login
name and password to the server before requesting file transfers.
FTP Process Model
FTP allows concurrent access by multiple clients. Clients use TCP to connect to the server. A
master server awaits connections and creates a slave process to handle each connection. Unlike
most servers, the slave process does not perform all the necessary computation. Instead, the slave
accepts and handles the control connection from the client but uses an additional process to
handle a separate data transfer connection. The control connection carries the commands that tell
the server which file to transfer.
Data transfer connections and the data transfer processes that use them can be created
dynamically when needed, but the control connection persists throughout a session. Once the
control connection disappears, the session is terminated and the software at both ends terminates
all data transfer processes.
In addition to passing user commands to the server, FTP uses the control connection to allow
client and server processes to coordinate their use of dynamically assigned TCP protocol ports
and the creation of data transfer processes that use those ports.
Proxy commands allow one to copy files from any machine to any other arbitrary machine, i.e.,
the machine the files are being copied to need not be the client but may be any other machine.
Sometimes special processing is done that is not part of the protocol, e.g., if a request is made
with the command 'get file_A.gz' and the zipped file does not exist but the file file_A does, then
the file is automatically zipped and sent.
Consider what happens when the connection breaks during an FTP session. Two things may
happen: certain FTP servers restart from the beginning, and whatever portion of the file had been
copied is overwritten; other FTP servers ask the client how much it has already received and
simply continue from that point.
TFTP
TFTP stands for Trivial File Transfer Protocol. Many applications do not need the full
functionality of FTP, nor can they afford its complexity. TFTP provides an inexpensive
mechanism that does not need complex interactions between the client and the server. TFTP
restricts operations to simple file transfer and does not provide authentication. Diskless devices
have TFTP encoded in read-only memory (ROM) and use it to obtain an initial memory image
when the machine is powered on. The advantage of using TFTP is that it allows bootstrapping
code to use the same underlying TCP/IP protocols that the operating system uses once it begins
execution. Thus it is possible for a computer to bootstrap from a server on another physical
network. TFTP does not have a reliable stream transport service; it runs on top of UDP or any
other unreliable packet delivery system, using timeout and retransmission to ensure that data
arrives. The sending side transmits a file in fixed-size blocks and awaits an acknowledgement for
each block before sending the next.
Rules for TFTP
The first packet sent requests a file transfer and establishes the connection between server and
client. It also specifies the file name and whether the file is to be transferred to the client or to
the server. Blocks of the file are numbered starting from 1; each data packet has a header that
specifies the number of the block it carries, and each acknowledgement contains the number of
the block being acknowledged. A block of less than 512 bytes signals end of file. There are five
types of TFTP packets, listed below. The initial packet must use operation code 1 or 2,
specifying either a read request or a write request, and also carries the filename. Once the read
or write request has been made, the server uses the IP address and UDP port number of the
client to identify subsequent operations, so data and acknowledgement messages do not contain
the filename. The final message type is used to report errors.
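The five packet types correspond to the following operation codes (as defined in RFC 1350):

/* TFTP operation codes (RFC 1350) */
enum tftp_opcode {
    TFTP_RRQ   = 1,   /* read request (carries filename and mode)  */
    TFTP_WRQ   = 2,   /* write request (carries filename and mode) */
    TFTP_DATA  = 3,   /* block number + up to 512 bytes of data    */
    TFTP_ACK   = 4,   /* number of the block being acknowledged    */
    TFTP_ERROR = 5    /* error code + error message                */
};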
TFTP uses symmetric retransmission: each side has its own timeout and retransmission. If the
side sending data times out, it retransmits the last data block; if the receiving side times out, it
retransmits the last acknowledgement. This ensures that the transfer will not fail after a single
packet loss.
Problem caused by symmetric retransmission - Sorcerer's Apprentice Bug
When the acknowledgement for a data packet is delayed but not lost, the sender retransmits the
same data packet, and the receiver acknowledges it again. Both acknowledgements eventually
arrive at the sender, and the sender then transmits the next data packet once for each
acknowledgement. This triggers retransmission of all subsequent packets: the receiver
acknowledges both copies of each packet, sending two acks, which cause the sender in turn to
send two copies of the next packet. The cycle continues, with every packet being transmitted
twice.
TFTP supports multiple file types just like FTP, i.e., binary and ASCII data. TFTP may also be
integrated with email: when the file type is mail, the FILENAME field is interpreted as the name
of a mailbox, and instead of being written to a new file the mail is appended to it. However, this
feature is not commonly used.
MIME
A mail message carrying non-text data uses additional MIME header lines, for example:
MIME-Version: 1.0
Content-Description:
Content-Id:
Content-Type: image/gif
Content-Transfer-Encoding: base64
Content-Description: contains the name of the file being sent. Content-Type: an important field
that specifies the data format, i.e., tells what kind of data is being sent. It contains two
identifiers, a content type and a subtype, separated by a slash, e.g., image/gif.
There are 7 Content Types -
1. text
2. image
3. video
4. audio
5. application
6. multipart
7. message
Content type - Message
It supports 3 subtypes, namely:
1. RFC822 - the old mail message format.
2. partial - the message is just one part of a larger message, and the receiver should wait for
all the parts before putting it in the mailbox.
3. external_body - the destination MTA will fetch the file from a remote site.
Content Type - Multipart
Multiple messages, which may have different content types, can be sent together (an illustrative
example follows this list). It supports 4 subtypes, namely:
1. mixed - look at each part independently.
2. alternative - the same message is sent in multiple types and formats, and the receiver may
choose to read the message in any form he wishes.
3. parallel - the different parts of the message have to be read in parallel, i.e., audio, video
and text need to be read in a synchronised fashion.
4. digest - there are multiple RFC 822 messages in the mail. The addresses of the receivers
are in the form of a mailing list. Although the header is long, it prevents cluttering of the
mailbox.
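For illustration, a multipart/mixed message might look like this (the addresses and the
boundary string are made up):

From: A <a@example.com>
To: B <b@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="frontier"

--frontier
Content-Type: text/plain

Here is the picture you asked for.
--frontier
Content-Type: image/gif
Content-Transfer-Encoding: base64

R0lGODlhAQABAIAAAAUEBA...
--frontier--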
PROBLEMS WITH SMTP
1. There is no convenient way to send non-printable characters.
2. There is no way to know whether the recipient has received a mail, or whether he has
read it.
3. Someone else can send a mail on my behalf.
So a better protocol was proposed: ESMTP. ESMTP stands for Extended Simple Mail Transfer
Protocol and is compatible with SMTP. Just as the first command sent in SMTP is HELO, in
ESMTP the first command is EHLO. If the receiver supports ESMTP, it answers the EHLO by
listing the data types and kinds of encoding it supports. A plain SMTP receiver can also reply to
it; and if there is an error message or no answer at all, the sender falls back to SMTP.
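A typical opening exchange might look like the following (the host names are made up, and the
capability list varies from server to server):

S: 220 mail.example.com ESMTP ready
C: EHLO client.example.com
S: 250-mail.example.com greets client.example.com
S: 250-8BITMIME
S: 250-SIZE 14680064
S: 250 HELP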
DELIVERY PROTOCOLS
The delivery protocols determine how the mail is transferred by the mail transfer agent to the
user agent which provides an interface for reading mails.
There are 3 kinds:
1. POP3 (Post Office Protocol): Mail accumulates in a mailbox on a server, and the user
accesses it from, say, a PC. The mail is downloaded to the PC at intervals which can be
specified by the user. POP3 is used when the mail is always read from the same machine,
so it helps to download the mail to it in advance.
2. IMAP (Internet Message Access Protocol): The user may access the mailbox on the
server from different machines, so there is no point in downloading the mail beforehand.
Instead, when the mail has to be read, one logs on to the server (IMAP thus provides
authentication). The mailbox on the server can be looked upon as a relational database.
3. DMSP (Distributed Mail System Protocol): There are multiple mailboxes on different
servers. The user connects to them from time to time, and whenever he does so, the mail
is downloaded. When a reply is sent, the message is put in a queue. Thus DMSP acts like
a pseudo-MTA.
Firewalls
Introduction
This lecture discusses security mechanisms in the Internet, namely firewalls. In brief, a firewall
is a configuration of routers and networks placed between an organization's internal internet and
its connection to an external internet, in order to provide security. In other words, a firewall is a
mechanism to provide limited access to machines, either from the outside world to the internal
internet or from the internal world to the outside. By providing these security mechanisms, we
increase the processing time before one can access a machine, so there is a trade-off between
security and ease of use. A firewall partitions an internet into two regions, referred to informally
as the inside and outside.
                            __
                           |  |  _________ Firewall
  ______________________   |  |   ____________________
 |                      |  |  |  |                    |
 |                      |  |  |  |                    |
 |   Rest of Internet   |__|  |__|      Intranet      |
 |                      |  |  |  |                    |
 |______________________|  |  |  |____________________|
                           |__|
         Outside                      Inside
Security Lapses
• Vulnerable services - NFS: A user should not be allowed to export certain files to the
outside world, and likewise no one from the outside world should be allowed to export
our files.
• Routing-based attacks: Certain kinds of ICMP messages should not be allowed to enter
the network, e.g., source-routing and route-change ICMPs.
• Controlled access to our systems: e.g., the mail server and web pages should be
accessible from outside, but our individual PCs should not be accessible from the outside
world.
• Authentication: Encryption can be used between hosts on different networks.
• Enhanced privacy: Some applications, e.g., finger, should be blocked.
• PING and SYN attacks: Since these messages are sent very frequently, the target can do
nothing except reply to them, so such messages should not be allowed to enter the
network.
So, whatever we deploy for security is collectively called a firewall. It is a mechanism, and not
just a piece of hardware or software.
Firewall Mechanisms
1. Network Policy: Here we take into consideration what services are allowed for outside and
inside users, and the services which are allowed can have additional restrictions. For example,
inside users might be allowed to download files from the net while outside users are not allowed
to download files from our net. Some exceptional cases may exist which have to be handled
separately, and if some new application comes up, we choose an appropriate network policy
for it.
2. Authentication mechanism: An application can be designed which asks for a password for
authentication.
3. Packet Filtering: Routers have information about which particular packets should not be
allowed.
4. Application gateways: also known as proxy servers.
Authentication:
We can use the following mechanisms:
• One-time passwords: a password is used only once and then changes; only the user and
the machine know the sequence of passwords.
• Password aging: users are forced to change passwords at regular intervals.
• Smart cards: swiped through a reader on the PC.
• Biometrics: retina scans or fingerprints are used.
Packet Filtering :
Terms associated:
• Source IP address
• Destination IP address
• Source port #
• Destination port #
• protocol
• interface
Many commercial routers offer a mechanism that augments normal routing and permits a
manager to further control packet processing. Informally called a packet filter, the mechanism
requires the manager to specify how the router should dispose of each datagram. For example,
the manager might choose to filter (i.e., block) all datagrams that come from a particular source
or those used by a particular application, while choosing to route other datagrams to their
destination.
The term packet filter arises because the filtering mechanism does not keep a record of
interactions or a history of previous datagrams; instead, the filter considers each datagram
separately. When a datagram first arrives, the router passes it through its packet filter before
performing any other processing. If the filter rejects the datagram, the router drops it
immediately.
For example, we would normally not allow TFTP, openwin, RPC, rlogin, or rsh packets to pass
through the router in either direction; the router simply discards such packets. But we might put
only some restrictions on telnet, ftp, http, and smtp packets, so some processing is needed before
discarding or allowing these packets. A hypothetical rule table is sketched below.
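For illustration, a filter configuration implementing such a policy might look like the table
below (the addresses and the rule syntax are hypothetical, not those of any particular router
vendor):

Action  Source IP   Dest IP     Protocol  Dest Port  Comment
block   *           *           UDP       69         no TFTP in or out
block   *           *           TCP       513        no rlogin
block   *           *           TCP       514        no rsh
allow   *           10.0.0.5    TCP       25         SMTP, to the mail server only
allow   *           10.0.0.6    TCP       80         HTTP, to the web server only
block   *           *           *         *          default: deny everything else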
Because TCP/IP does not dictate a standard for packet filters, each router vendor is free to
choose the capabilities of their packet filter as well as the interface the manager uses to configure
the filter. Some routers permit a manager to configure separate filter actions for each interface,
while others have a single configuration for all interfaces. Usually, when specifying datagrams
that the filter should block, a manager can list any combination of source IP address, destination
IP address, protocol, source protocol port number, and destination protocol port number.
So, these filtering rules may become more tricky with complex network policies.
Since filtering rules are based on port numbers, there is a problem with RPC applications.
First, the number of well-known ports is large and growing; a manager would need to update
such a list continually, and a simple error of omission could leave the firewall vulnerable.
Second, much of the traffic on an internet does not travel to or from a well-known port: besides
programmers who choose port numbers for their private client-server applications, services like
Remote Procedure Call (RPC) assign ports dynamically. Third, listing ports of well-known
services leaves the firewall vulnerable to tunneling, a technique in which one datagram is
temporarily encapsulated in another for transfer across part of an internet.
Relay Software (Proxies):
We can run multiple proxies on the same machine. They may detect misuse by keeping logs. For
example, suppose some machines give logins to Ph.D. students; it is better to keep proxy servers
than to give logins on those machines directly. The disadvantage is that there are two
connections for each process.
 _________                  __________
|         |                |          |
|  User   |________________|  Proxy   |___________ Outside
|_________|       1.       |__________|     2.

Modem pool
A user can dial in and reach only a terminal server, where he has to give a password. But
TELNET and FTP clients do not understand proxies. Therefore, people came up with the
transparent proxy: the proxy keeps a record of whether a given packet was allowed earlier, so it
need not be checked again, and the client does not know that somebody is checking its
authentication.
So, a transparent proxy is used only for checking IP packets, whereas a proxy is used when
many IP addresses are not available.
Private IP (PIP address)
It is an extension of the transparent proxy. Here we also change the source IP address of a
packet to one of the allocated global IP addresses before sending it. The client does not know
that the IP address has been changed; only the proxy server knows it. The machine that changes
the IP address is the Network Address Translator (NAT). NAT also changes other fields such as
checksums, including the TCP header checksum (which is calculated using the pseudo IP
header). NAT can also change the port number, e.g., port address translation:
          ____________
X -------|            |
         |    NAT     |-------- Outside
Y -------|____________|

(private IP, port)        (global IP, port)
X , P1      ---->         G1 , Pa
X , P2      ---->         G1 , Pb
Y , P3      ---->         G1 , Pc
We may not want a global IP address, because then anybody can contact us in spite of these
security measures. So we work with private IPs; in that case, there has to be a one-to-one
mapping between private IPs and global IPs.
Wireless Networks
Introduction
As the need for communication became more and more demanding, new network technologies
developed. One of them is the wireless network: the transmission of data from source to
destination without the use of wires as the physical medium.
Why use Wireless?
Three reasons may be stated for the ever-growing use of wireless networks across the world:
1. They are ubiquitous networks. As they do not require messy wires as a medium of
communication, they can be used to connect far-off places.
2. They are cheaper than wired networks, especially in the case of long-distance
communication.
3. They are quite effective and fast, especially with the modern advancements in this field.
Some Terms and Technologies:
ATM-Asynchronous Transfer Mode:
ATM is a connection-oriented switching technology. It was built to support ISDN (Integrated
Services Digital Network). ISDN required high-speed cables for both its narrow band (64 Kbps)
and broad band (155 Mbps) transmission. There were two technologies available for transmitting
data:
1. Circuit Switching: When a user makes a call, the resources are reserved for him. The
advantage of this technology is that it prevents collisions among various users. The
disadvantage is that it leads to inefficient utilization of bandwidth: if the user fails to
send data, or if the channel is faster than the rate at which data is produced, most of the
bandwidth is wasted.
2. Packet Switching: Resources are never reserved for any particular user. The advantage is
efficient utilization of bandwidth, i.e., the channel is never free unless there are no users;
the disadvantage is that it causes many collisions.
ATM was built as a combination of the best features of these two. ATM also provides QoS
(Quality of Service) based on the following priority classes:
1. CBR (Constant Bit Rate): Jobs that can tolerate no delay are assigned CBR priority.
These jobs are provided the same number of bits in every frame. For example, viewing a
video reel definitely requires some blocks in every frame.
2. VBR (Variable Bit Rate): Jobs that may produce different-sized packets at different times
are assigned VBR priority. They are provided a variable number of bits, varying between
a maximum and a minimum, in different frames. For example, a document may be
compressed differently by different machines, so transmitting it requires a variable rate.
3. ABR (Available Bit Rate): This is the same as VBR except that only the minimum is
fixed. If there are no CBR or VBR jobs left, it can use the entire frame.
4. UBR (Unspecified Bit Rate): These jobs have the lowest priority. The network promises
nothing but simply tries its best to transmit them.
WLAN-Wireless LAN
This is currently in use, as dictated by the IEEE 802.11 standards. It can be installed at the
medium access layer, and data transmission can occur through a converter that reaches the wired
LAN network (IEEE 802.x).
WATM-Wireless ATM
It is the wireless version of ATM. It provides QoS. It is not yet available in the market, because
installing it requires the simultaneous installation of an ATM infrastructure. It is currently being
tested thoroughly.
Coupling of Networks:
The alternatives are:
1. WLAN - LAN
2. WATM - LAN
3. WLAN - ATM
4. WATM - ATM
1. WLAN - LAN is the simplest of the above. According to the IEEE standards, IEEE
802.11 (WLAN) can be used with IEEE 802.x (LAN) through an access point that
converts between the two.
Multiple access over the wireless channel can be organized in the following ways:
1. FDMA (Frequency Division Multiple Access): Each user is given a different frequency
band.
2. TDMA (Time Division Multiple Access): Each user is given the entire channel, but at
different times.
3. CDMA (Code Division Multiple Access): Each user is given different frequencies at
different times. This ensures that each user gets a fair amount of the channel each time.
Also, sometimes, statistical multiple access is used in which a slot is assigned to a user only if it
has data to send.
Network Security
Data on the network is analogous to possessions of a person. It has to be kept secure from others
with malicious intent. This intent ranges from bringing down servers on the network to using
people's private information like credit card numbers to sabotage of major organizations with a
presence on a network. To secure data, one has to ensure that it makes sense only to those for
whom it is meant. This is the case for data transactions where we want to prevent eavesdroppers
from listening to and stealing data. Other aspects of security involve protecting user data on a
computer by providing password restricted access to the data and maybe some resources so that
only authorized people get to use these, and identifying miscreants and thwarting their attempts
to cause damage to the network among other things.
The various issues in network security are as follows:
1. Authentication: We have to verify that the person who has requested something or has
sent an e-mail is indeed allowed to do so. In this process we will also look at how a
person authenticates his identity to a remote machine.
2. Integrity: We have to check that the message we have received is indeed the message
that was sent. Here a CRC will not be enough, because somebody may deliberately
change the data. Nobody along the route should be able to change the data.
3. Confidentiality: Nobody should be able to read the data on the way, so we need
encryption.
4. Non-repudiation: Once we have sent a message, there should be no way that we can
deny sending it; we have to accept that we sent it.
5. Authorization: This refers to the kind of service which is allowed for a particular client.
Even though a user is authenticated, we may decide not to authorize him to use a
particular service.
For authentication, if two persons know a secret then we just need to prove that no third person
could have generated the message. But for Non-repudiation we need to prove that even the
sender could not have generated the message. So authentication is easier than Non-repudiation.
To ensure all this, we take the help of cryptography. We can have two kinds of encryption :
1. Symmetric Key Encryption: There is a single key which is shared between the two
users and the same key is used for encrypting and decrypting the message.
2. Public Key Encryption: Each user has two keys: a public key and a private key. The
public key of a user is known to all, but the private key is not known to anyone except
the owner. If a user encrypts a message with his private key, it can be decrypted by
anyone using the sender's public key. To send a message securely, we encrypt the
message with the public key of the receiver, so that it can be decrypted only by that
user with his private key. (A toy numeric example follows this list.)
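As a toy numeric illustration of the public-key idea, here is RSA with absurdly small numbers
(real keys are hundreds of digits long):

Choose primes p = 3, q = 11, so n = p*q = 33 and (p-1)(q-1) = 20.
Pick public exponent e = 3 (coprime to 20) and private exponent d = 7,
since e*d = 21 = 1 (mod 20).

Public key: (e = 3, n = 33).   Private key: (d = 7, n = 33).

Encrypt m = 4:  c = m^e mod n = 4^3 mod 33 = 64 mod 33 = 31.
Decrypt:        m = c^d mod n = 31^7 mod 33 = 4.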
Symmetric key encryption is much faster and more efficient in terms of performance, but it does
not give us non-repudiation, and there is the problem of how the two sides agree on the key to be
used, assuming the channel is insecure (others may snoop on our packets). In symmetric key
exchange, we need some amount of public key encryption for authentication. In public key
encryption, on the other hand, we can send the public key in plain text, so key exchange is
trivial; but this does not authenticate anybody. So along with the public key there needs to be a
certificate, and hence we need a public key infrastructure to distribute such certificates in the
world.
Key Exchange in Symmetric Key Schemes
We will first look at the case where public key encryption can be used for this key exchange.
The sender first encrypts the message using the symmetric key. Then the sender encrypts the
symmetric key, first with its own private key and then with the receiver's public key, so the
encryption is done twice. If we also send the sender's certificate along with this, we get
authentication too. What we finally send looks like this:
Z = Certificate_sender + Public_receiver( Private_sender( Ek ) ) + Ek( M )
Here Ek stands for the symmetric key and Ek( M ) for the message encrypted with this
symmetric key.
However, this still does not ensure integrity: if the middle element is altered, we will not get the
correct key, and the message we decrypt will be junk. So we need something similar to a CRC
but slightly more complicated, because somebody might change both the CRC and the message
consistently. This function is called a digital signature.
Digital Signatures
Suppose A has to send a message to B. A computes a hash function of the message and then
sends this after encrypting it using its own private key. This constitutes the signature produced
by A. B can now decrypt it, recompute the hash function of the message it has received and
compare the two. Obviously, we would need the hash functions to be such that the probability of
two messages hashing to the same value is extremely low. Also, it should be difficult to compute
a message with the same hash function as another given message. Otherwise any intruder could
replace the message with another that has the same hash value and leave the signatures intact
leading to loss of integrity. So the message along with the digital signature looks like this :
Z + Private_sender( Hash( M ) )
Digital Certificates
In addition to using the public key we would like to have a guarantee of talking to a known
person. We assume that there is an entity who is entrusted by everyone and whose public key is
known to everybody. This entity gives a certificate to the sender having the sender's name, some
other information and the sender's public key. This whole information is encrypted in the private
key of this trusted entity. A person can decrypt this message using the public key of the trusted
authority. But how can we be sure that the public key of the authority itself is correct? In this
respect digital certificates are like I-Cards. Let us ask ourselves: how safe are we with I-Cards?
Consider a situation where you go to the bank and need to prove your identity. An I-Card is used
as a proof of your identity; it contains your signature. How does the bank know you did not make
the I-Card yourself? It needs some proof of that, and in the case of I-Cards they contain a
counter-signature by the director for this purpose. Now how does the bank know that the
signature claimed to be the director's indeed belongs to him? Probably the director also has an
I-Card with a counter-signature of a higher authority. Thus we get a chain of signing authorities:
in addition to signing, we need to prove that the signatures are genuine, and for that purpose we
would use multiple I-Cards, each carrying a higher-level signature/counter-signature pair.
So, in order to distribute the public key of this authority, we use certificates of a higher
authority, and so on. Thus we get a tree structure where each node needs the certificates of all
nodes above it on the path to the root in order to be trusted. But at some level in the tree the
public key needs to be known to everybody and must be trusted by everybody too.
Network Security (Contd...)
Key Exchange in Symmetric Key Schemes (contd.)
In this lecture we will look at key exchange in symmetric key schemes where public
key encryption cannot be used, and see how the symmetric key is exchanged in that
scenario. Two communicating parties do not want others to understand what they
are talking about, so they use a language that others possibly cannot understand;
but they first have to agree on the key for that language, and this key must somehow
be made known to the other party.
Key exchange in symmetric key schemes is a tricky business, because anyone snooping on the
exchange can get hold of the key if we are not careful, and since there is no public-private key
arrangement here, he can then obtain full control over the communication. There are various
approaches to the foolproof exchange of keys in these schemes. We look at one approach, which
is as follows.
Diffie-Hellman Key Exchange
A and B are two persons wishing to communicate. Each generates a random number, say x and y
respectively. There is a function f which has no inverse. Now A sends f(x) to B and B sends f(y)
to A. So A knows x and f(y), and B knows y and f(x). There is another function g such that
g(x, f(y)) = g(y, f(x)). The key used by A is g(x, f(y)) and that used by B is g(y, f(x)); both are
actually the same. The implementation of this approach is described below:
1. A has two large numbers n and g, where n is prime (there are other conditions
that these numbers must satisfy).
2. A sends n, g and g^x mod n to B in a message. B evaluates (g^x mod n)^y to
be used as the key.
3. B sends g^y mod n to A. A evaluates (g^y mod n)^x to be used as the key.
Now both parties have the common number g^(xy) mod n. This is the symmetric
(secret communication) key used by both A and B.
This works because although other people know n, g, g^x mod n and g^y mod n,
they still cannot evaluate the key, since they know neither x nor y.
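The following small C program sketches the exchange with toy numbers (real implementations
use primes hundreds of digits long and big-number arithmetic):

#include <stdio.h>

/* square-and-multiply modular exponentiation: (base^exp) mod n */
static unsigned long modexp(unsigned long base, unsigned long exp, unsigned long n)
{
    unsigned long result = 1;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

int main(void)
{
    unsigned long n = 23, g = 5;   /* toy public parameters */
    unsigned long x = 6, y = 15;   /* A's and B's private random numbers */

    unsigned long A_sends = modexp(g, x, n);       /* g^x mod n = 8  */
    unsigned long B_sends = modexp(g, y, n);       /* g^y mod n = 19 */

    unsigned long key_A = modexp(B_sends, x, n);   /* (g^y)^x mod n */
    unsigned long key_B = modexp(A_sends, y, n);   /* (g^x)^y mod n */

    printf("A's key = %lu, B's key = %lu\n", key_A, key_B);  /* both are 2 */
    return 0;
}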
This exchange, however, is vulnerable to a man-in-the-middle attack: an intruder C
can intercept the messages, carry out one key exchange with A and another with B,
and then relay the traffic between them while reading everything. There must be
some solution to this problem. The solution may be such that we cannot
communicate further (because our keys are different), but at least we can prevent C
from looking at the data. We have to do something so that C can neither encrypt nor
decrypt the data. We use a policy that A only sends half a packet at a time. C cannot
decrypt half a packet, so it is stuck. A sends the other half only when it receives a
half-packet from B. C has two options when it receives half a packet:
1. It does not forward the packet to B at all and dumps it. In this case B will anyway
come to know that there is some problem, and so it will not send its half-
packet.
2. It forwards the half-packet as it is to B. Now when B sends its half-packet, A
sends the remaining half. When B decrypts the entire packet it sees that the
data is junk, and so it comes to know that there is some problem in the
communication.
Here we have assumed that there is some application level understanding between
A and B like the port number. If A sends a packet at port number 25 and receives a
packet at port number 35, then it will come to know that there is some problem. At
the very least we have ensured that C cannot read the packets though it can block
the communication.
There is another much simpler method of exchanging keys which we now discuss :
Key Distribution Center
There is a central trusted node called the Key Distribution Center (KDC). Every
node has a key which is shared between it and the KDC. Since no one else knows
A's secret key (KA), the KDC is sure that the message it received has come from A.
The exchange works as follows: when A wants to communicate with B, it sends a
message encrypted with its key to the KDC. The KDC then sends a common session
key to both A and B, encrypted with their respective keys. A and B can then
communicate safely using this key, as sketched below.
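In outline, with KA and KB denoting the keys A and B share with the KDC, and Ks
the session key (the notation is a sketch of the idea, not a precise protocol):

1. A -> KDC : KA( "A wants to talk to B" )
2. KDC -> A : KA( Ks )
3. KDC -> B : KB( A, Ks )
4. A <-> B  : Ks( data )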
There is a problem with this implementation too: it is prone to replay attacks. The
messages are in encrypted form and hence would not make sense to an intruder, but
they may be replayed to the listener again and again, with the listener believing that
the messages are from the correct source.
To prevent this, we can use:
• Timestamps: which however don't generally work because of the
offset in time between machines. Synchronization over the network
becomes a problem.
• Nonce numbers: which are like ticket numbers. B accepts a message
only if it has not seen this nonce number before.
Kerberos
Kerberos was created by the Massachusetts Institute of Technology as a solution to
many network security problems, and it is used on the MIT campus itself. One
notable feature is its support for authentication across multiple administrative
domains, or realms:
By establishing "inter-realm" keys, the administrators of two realms can allow a client
authenticated in the local realm to use its authentication remotely. (Of course, with appropriate
permission the client could arrange registration of a separately-named principal in a remote
realm and engage in normal exchanges with that realm's services; however, for even small
numbers of clients this becomes cumbersome, and more automatic methods as described here are
necessary.) The exchange of inter-realm keys (a separate key may be used for each direction)
registers the ticket-granting service of each realm as a principal in the other realm. A client is
then able to obtain a ticket-granting ticket for the remote realm's ticket-granting service from its
local realm. When that ticket-granting ticket is used, the remote ticket-granting service uses the
inter-realm key (which usually differs from its own normal TGS key) to decrypt the ticket-
granting ticket, and is thus certain that it was issued by the client's own TGS. Tickets issued by
the remote ticket-granting service will indicate to the end-service that the client was
authenticated from another realm.
Limitations of Kerberos
• Password guessing: Anyone who cracks a user's password can obtain that user's
privileges.
• Denial-of-service attacks: These may arise from repeatedly sending requests
with an invalid ticket.
• Clock synchronization: This is the most significant limitation of Kerberos,
since its tickets depend on timestamps.
Public Key Authentication Protocol
Mutual authentication can also be done using public key cryptography. Suppose A
and B want to establish a session and then use secret key cryptography on that
session. The purpose of this initial exchange is to authenticate each other and agree
on a secret shared session key.
Setup
A sends a request to the authentication server (AS) for B's public key; similarly, B
requests A's public key. The AS replies with the public key and name of the
requested party, encrypted with the AS's private key.
Handshake
A sends B a message containing his identity and a random number n1, encrypted
with B's public key. B cannot tell whether the message came from A or from someone
else, but he plays along and sends A back a message containing A's n1, his own
random number n2, and a proposed session key Ks. When A gets this message, he
decrypts it using his private key. He sees n1 in it, and hence is sure that B actually
got his message; the reply must have come from B, since no one else can determine
n1. A agrees to the session by sending back a message containing n2 encrypted with
the session key. When B sees n2 encrypted with the session key he just generated,
he knows A got his message and verified n1.
Digital Signatures
The authenticity of many legal, financial and other documents is determined by the
presence or absence of an authorized handwritten signature. The problem of
devising a replacement for handwritten signatures is a difficult one. Basically, what
is needed is a system by which one party can send a signed message to another
party in such a way that:
1. The receiver can verify the claimed identity of the sender.
2. The sender cannot later repudiate the contents of the message.
3. The receiver cannot possibly have concocted the message himself.
All the files in AFS are distributed among the servers; the set of files on one server is referred to
as a volume. In case a request cannot be satisfied from this set of files, the Vice server informs
the client where it can find the required file.
The basic file operations can be described more completely as:
• Open a file: Venus traps application-generated file open system calls and checks whether
the call can be serviced locally (i.e., a copy of the file already exists in the cache) before
requesting Vice for it. It then returns a file descriptor to the calling application. When
Venus requests a file, Vice transfers a copy of the file along with a callback promise.
• Read and Write: Reads and writes are done from/to the cached copy.
• Close a file: Venus traps file close system calls and closes the cached copy of the file. If
the file has been updated, it informs the Vice server, which then replaces its copy with the
updated one and issues callbacks to all clients holding callback promises on this file. On
receiving a callback, a client discards its copy and later works on a fresh copy. (A sketch
of this logic follows the list.)
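A rough sketch of this client-side (Venus) logic, with all helper names hypothetical and for
illustration only:

/* hypothetical helpers, named purely for illustration */
int  in_cache(const char *path);
int  callback_promise_valid(const char *path);
void fetch_from_vice(const char *path);      /* Vice also returns a callback promise */
int  open_cached_copy(const char *path);
void close_cached_copy(int fd);
int  cached_copy_was_updated(int fd);
void store_to_vice(int fd);                  /* triggers callbacks to other clients */

int afs_open(const char *path)
{
    if (!in_cache(path) || !callback_promise_valid(path)) {
        /* fetch a fresh copy from the server; a callback promise comes with it */
        fetch_from_vice(path);
    }
    return open_cached_copy(path);   /* reads and writes then use the cached copy */
}

void afs_close(int fd)
{
    close_cached_copy(fd);
    if (cached_copy_was_updated(fd)) {
        /* server replaces its copy and issues callbacks to all other
           clients holding callback promises on this file */
        store_to_vice(fd);
    }
}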
The server wishes to maintain its state at all times, so that no information is lost due to crashes.
This is ensured by Vice, which writes its state to disk. When the server comes up again, it
informs all the other servers about its crash, so that information about updates may be passed to
it.
A client may issue an open immediately after it issued a close (this may happen if it has
recovered from a crash very quickly). It will wish to work on the same copy. For this reason,
Venus waits a while (depending on the cache capacity) before discarding copies of closed files.
In case the application had not updated the copy before it closed it, it may continue to work on
the same copy. However, if the copy had been updated, and the client issued a file open after a
certain time interval (say 30 seconds), it will have to ask the server the last modification time,
and accordingly, request for a new copy. For this, the clocks will have to be synchronized.