Distributed OS
Lecture #1: Introduction
OS vs distributed OS
Operating systems:
o Imagine the steps required to run a program that prints a single character to the screen
without the help of an OS.
An operating system allows (1) users to access system resources painlessly, and (2) system
resources to be used effectively.
o Resource management: time management, space management, process
synchronization, deadlock handling, accounting and status information.
o User friendliness: execution environment, error detection and handling,
protection, security, fault tolerance and failure recovery.
Traditional operating systems: run on a stand-alone computer with a single processor.
Advanced operating systems: distributed operating systems, multiprocessor operating
systems and database operating systems.
o running on a system with multiple (autonomous) computers/processors
o usually assuming that the individual computers run traditional operating systems.
o One goal of these advanced operating systems is to hide the details of multiple
computers and make users feel that they are using a single computer.
o We need different kinds of OS because of differences in system architecture (shared-memory
systems vs. distributed-memory systems) and differences in application
requirements (database and real-time).
o An example: a large-scale distributed file system, the Andrew File System.
Some examples of the issues to be considered in advanced operating systems (this is what we will
be covering in the course)
Summary:
This course will NOT cover techniques used in traditional operating systems.
This course will cover techniques to build high-level system services that manage a
group of individual computers.
Lecture #2: Distributed Operating Systems: an introduction
o Resource sharing
(but not as easily as if on the same machine)
o Enhanced performance
(but 2 machines are not as good as a single machine that is 2
times as fast)
o Improved reliability & availability
(but probability of single failure increases, as does difficulty of
recovery)
o Modular expandability
Distributed OSs have not been economically successful!
System models:
the minicomputer model (several minicomputers with each computer supporting multiple
users and providing access to remote resources).
the workstation model (each user has a workstation, the system provides some common
services, such as a distributed file system).
the processor pool model (the model allocates processors to users according to their
needs).
Naming
Scalability
Compatibility
Process synchronization
Data migration: data are brought to the location that needs them.
Security
Structuring
Communication Networks
Communication Models
message passing
remote procedure call (RPC)
#include <sys/socket.h>

ssize_t sendto(int socket, const void *message, size_t length, int flags,
               const struct sockaddr *dest_addr, socklen_t dest_len);

ssize_t recvfrom(int socket, void *buffer, size_t length, int flags,
                 struct sockaddr *address, socklen_t *address_len);

int poll(struct pollfd fds[], nfds_t nfds, int timeout);

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *errorfds, struct timeval *timeout);
You can find more information on these and other socket I/O operations in the Unix man pages.
RPC
With message passing, the application programmer must worry about many details:
parsing messages
pairing responses with request messages
converting between data representations
knowing the address of the remote machine/server
handling communication and system failures
RPC Issues
Binding method
RPC Diagram
Lecture #3: Theoretical Foundations -- Clocks in a Distributed Environment
Topics for today
Some inherent limitations of a distributed system and their
implication.
Lamport logical clocks
Vector clocks
Distributed systems
A collection of computers that share neither a common clock nor a
common memory.
Processes in a distributed system exchange information over
communication channels; the message delay is unpredictable.
The events in a distributed system are not total chaos. Under some
conditions, it is possible to ascertain the order of the events.
Lamport's logical clocks try to capture this.
Total Ordering
We can extend the partial ordering of the happened-before relation to a total
ordering on events, by using the logical clocks and resolving any ties with an
arbitrary rule based on the processor/process ID. Event a at Pi precedes event b at Pj iff:
Ci(a) < Cj(b), or
Ci(a) = Cj(b) and Pi < Pj
That is, the ordering we get from Lamport's clocks is not enough to
guarantee that if two events precede one another in the ordering relation
they are also causally related. The following Vector Clock scheme is
intended to improve on this.
Vector Clocks
Clock values are vectors
Vector length is n, the number of processes
Ci[i](a) = local time of Pi at event a
Ci[j](a) = time Cj[j](b) of last event b at Pj that is known to happen
before local event a
Vector Clocks
a → b if ta < tb
b → a if tb < ta
otherwise a and b are concurrent
a → b iff ta < tb
Note: These methods serialize the actions of the system. That makes the
behavior more predictable, but also may mean loss of performance, due to
idle time. That, plus scaling problems, means these algorithms are not likely
to be of much use for high-performance computing.
A message m from Pi, timestamped VTm, is delivered to Pj only when:
1. VTPj[i] = VTm[i] - 1
2. VTPj[k] ≥ VTm[k] for all k ≠ i
Delayed messages are queued at each process, sorted by their
vector timestamps, with concurrent messages ordered by time
of receipt.
3. When m is delivered to Pj, VTPj is updated as usual for vector clocks.
Schiper-Eggli-Sandoz Protocol
Generalizes the above, so that messages do not need to be broadcast, but are
just sent between pairs of processes, and the communication channels do not
need to be FIFO.