
COMPUTER ORGANIZATION

UNIT-1
Basic Structure of Computers: Computer Types, Functional Units, Basic

Operational Concepts, Bus Structures, Performance – Processor Clock,

Basic Performance Equation, Clock Rate, Performance Measurement,

Historical Perspective.

Machine Instructions and Programs: Numbers, Arithmetic Operations

and Characters, Memory Location and Addresses, Memory Operations,

Instructions and Instruction Sequencing, Addressing Modes, Assembly

Language, Basic Input and Output Operations, Stacks and Queues,

Subroutines, Additional Instructions, Encoding of Machine Instructions


CHAPTER – 1
BASIC STRUCTURE OF COMPUTERS

Computer types
A computer can be defined as a fast electronic calculating machine that accepts digitized input information (data), processes it according to a list of internally stored instructions, and produces the resulting output information.

A list of instructions is called a program, and the internal storage is called computer memory.

The different types of computers are


1. Personal computers: This is the most common type, found in homes, schools, business offices, etc. It is the most common type of desktop computer, with processing and storage units along with various input and output devices.
2. Notebook computers: These are compact and portable versions of the PC.
3. Workstations: These have high-resolution input/output (I/O) graphics capability, but with the same dimensions as a desktop computer. They are used in engineering applications and interactive design work.
4. Enterprise systems: These are used for business data processing in medium to large corporations that require much more computing power and storage capacity than workstations. The Internet, together with its associated servers, has become a dominant worldwide source of all types of information.
5. Supercomputers: These are used for the large-scale numerical calculations required in applications such as weather forecasting.

Functional unit
A computer consists of five functionally independent main parts: input, memory, arithmetic logic unit (ALU), output, and control unit.


Fig a: Functional units of a computer — the input and output units and the memory exchange information with the processor, which consists of the ALU and the control unit.


The input device accepts coded information, such as a source program written in a high-level language. This information is either stored in the memory or used immediately by the processor to perform the desired operations. The program stored in the memory determines the processing steps; essentially, the computer converts a source program into an object program, i.e., into machine language.

Finally, the results are sent to the outside world through an output device. All of these actions are coordinated by the control unit.

Input unit: -
The source program / high-level language program / coded information / simply data is fed to the computer through input devices; the keyboard is the most common type. Whenever a key is pressed, the corresponding letter or digit is translated into its equivalent binary code and transmitted over a cable to either the memory or the processor.

Joysticks, trackballs, mice, and scanners are other input devices.

Memory unit: -
Its function is to store programs and data. It is basically of two types:

1. Primary memory
2. Secondary memory

1. Primary memory: This is the memory exclusively associated with the processor; it operates at electronic speeds, and programs must be stored in it while they are being executed. The memory contains a large number of semiconductor storage cells, each capable of storing one bit of information. These cells are processed in groups of fixed size called words.

To provide easy access to a word in memory, a distinct address is associated with each word location. Addresses are numbers that identify memory locations.

The number of bits in each word is called the word length of the computer. Programs must reside in the memory during execution. Instructions and data can be written into the memory or read out under the control of the processor.

Memory in which any location can be reached in a short and fixed amount of
time after specifying its address is called random-access memory (RAM).

The time required to access one word is called the memory access time. Memory that can only be read by the user, and whose contents cannot be altered, is called read-only memory (ROM); it often holds operating system code.

Caches are small, fast RAM units which are coupled with the processor and are often contained on the same IC chip to achieve high performance. Although primary storage is essential, it tends to be expensive.
2. Secondary memory: This is used where large amounts of data and programs have to be stored, particularly information that is accessed infrequently.

Examples: magnetic disks and tapes, optical disks (i.e., CD-ROMs), floppies, etc.

Arithmetic logic unit (ALU):-


Most computer operations are executed in the ALU of the processor: addition, subtraction, division, multiplication, etc. The operands are brought into the ALU from memory and stored in high-speed storage elements called registers. Then, according to the instructions, the operation is performed in the required sequence.

The control unit and the ALU are many times faster than other devices connected to a computer system. This enables a single processor to control a number of external devices such as keyboards, displays, magnetic and optical disks, sensors, and other mechanical controllers.

Output unit:-
These are the counterparts of the input units. Their basic function is to send the processed results to the outside world.

Examples:- Printer, speakers, monitor etc.


Control unit:-
It effectively is the nerve center that sends signals to other units and senses their
states. The actual timing signals that govern the transfer of data between input unit,
processor, memory and output unit are generated by the control unit.

Basic operational concepts

To perform a given task, an appropriate program consisting of a list of instructions is stored in the memory. Individual instructions are brought from the memory into the processor, which executes the specified operations. Data to be used as operands are also stored in the memory.

Example: Add LOCA, R0

This instruction adds the operand at memory location LOCA to the operand in register R0 and places the sum back into register R0. This instruction requires the performance of several steps:

1. First the instruction is fetched from the memory into the processor.
2. The operand at LOCA is fetched and added to the contents of R0
3. Finally the resulting sum is stored in the register R0

The preceding Add instruction combines a memory access operation with an ALU operation. In some other types of computers, these two types of operations are performed by separate instructions for performance reasons:
Load LOCA, R1
Add R1, R0
Transfers between the memory and the processor are started by sending the
address of the memory location to be accessed to the memory unit and issuing the
appropriate control signals. The data are then transferred to or from the memory.


Fig b: Connections between the processor and the memory. The processor contains the MAR, MDR, PC, IR, n general-purpose registers R0 through Rn-1, the ALU, and the control circuitry.

The figure shows how the memory and the processor can be connected. In addition to the ALU and the control circuitry, the processor contains a number of registers used for several different purposes.

The instruction register (IR): Holds the instruction that is currently being executed. Its output is available to the control circuits, which generate the timing signals that control the various processing elements involved in executing the instruction.

The program counter PC:-


This is another specialized register that keeps track of execution of a program. It
contains the memory address of the next instruction to be fetched and executed.

Besides the IR and PC, there are n general-purpose registers, R0 through Rn-1.


The other two registers which facilitate communication with memory are: -
1. MAR – (Memory Address Register):- It holds the address of the location to be
accessed.
2. MDR (Memory Data Register): It contains the data to be written into or read out of the addressed location.

Operating steps are


1. Programs reside in the memory; they usually enter the memory through the input unit.
2. Execution of the program starts when the PC is set to point at the first instruction
of the program.
3. Contents of PC are transferred to MAR and a Read Control Signal is sent to the
memory.
4. After the time required to access the memory elapses, the addressed word is read out of the memory and loaded into the MDR.
5. Now contents of MDR are transferred to the IR & now the instruction is ready to
be decoded and executed.
6. If the instruction involves an operation by the ALU, it is necessary to obtain the
required operands.
7. An operand in the memory is fetched by sending its address to MAR & Initiating
a read cycle.
8. When the operand has been read from the memory to the MDR, it is transferred
from MDR to the ALU.
9. After one or two such repeated cycles, the ALU can perform the desired
operation.
10. If the result of this operation is to be stored in the memory, the result is sent to
MDR.
11. Address of location where the result is stored is sent to MAR & a write cycle is
initiated.
12. The contents of PC are incremented so that PC points to the next instruction that
is to be executed.
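The register traffic in steps 3-5 and 12 can be sketched in a few lines of C. The fragment below is only an illustration; the 16-word memory, its contents, and the word-indexed addressing are assumptions, not any particular machine's design:

    #include <stdio.h>
    #include <stdint.h>

    /* Minimal sketch of steps 3-5 and 12 above: PC -> MAR, a memory read
       into MDR, MDR -> IR, and incrementing the PC.  The 16-word memory,
       its contents, and the word-indexed addressing are illustrative only. */
    static uint32_t memory[16] = { 0xA1000004, 0xB2000008, 0xC0000000 };

    int main(void) {
        uint32_t PC = 0, MAR, MDR, IR;

        MAR = PC;                /* step 3: contents of PC are transferred to MAR  */
        MDR = memory[MAR];       /* step 4: the addressed word is read into MDR    */
        IR  = MDR;               /* step 5: contents of MDR are transferred to IR  */
        PC  = PC + 1;            /* step 12: PC now points to the next instruction */

        printf("IR = 0x%08X, next PC = %u\n", (unsigned)IR, (unsigned)PC);
        return 0;
    }

A real processor performs these transfers with dedicated control signals rather than assignment statements, but the ordering of the steps is the same.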

Normal execution of a program may be preempted (temporarily interrupted) if some device requires urgent servicing; to do this, the device raises an interrupt signal.

An interrupt is a request signal from an I/O device for service by the processor.
The processor provides the requested service by executing an appropriate interrupt
service routine.

Since this diversion may change the internal state of the processor, its state must be saved in memory locations before the interrupt is serviced. When the interrupt-service routine is completed, the state of the processor is restored so that the interrupted program may continue.

Bus structure

A bus is the simplest and most common way of interconnecting the various parts of a computer. To achieve a reasonable speed of operation, a computer must be organized so that all its units can handle one full word of data at a given time. A group of lines that serves as a connecting path for several devices is called a bus.

In addition to the lines that carry the data, the bus must have lines for address and control purposes. The simplest way to interconnect the units is to use a single bus, as shown below.

Fig c: Single bus structure — the input, memory, processor, and output units are all connected to a single common bus.

Since the bus can be used for only one transfer at a time, only two units can
actively use the bus at any given time. Bus control lines are used to arbitrate multiple
requests for use of one bus.

A single bus structure is:

• Low cost
• Very flexible for attaching peripheral devices

A multiple bus structure certainly increases performance, but it also increases the cost significantly.


The interconnected devices do not all operate at the same speed, and this leads to a problem. It is solved by using buffer registers. These buffers are electronic registers of small capacity when compared to the main memory, but of speed comparable to the processor.

The data from the processor are loaded into these buffers at once, and the complete transfer of data then takes place from the buffer at a rate the device can handle.

Performance

The most important measure of the performance of a computer is how quickly it can execute programs. The speed with which a computer executes programs is affected by the design of its hardware. For best performance, it is necessary to design the compiler, the machine instruction set, and the hardware in a coordinated way.

The total time required to execute a program, called the elapsed time, is a measure of the performance of the entire computer system. It is affected by the speed of the processor, the disk, and the printer. The time during which the processor is actively executing instructions is called the processor time.

Just as the elapsed time for the execution of a program depends on all units in a
computer system, the processor time depends on the hardware involved in the execution
of individual machine instructions. This hardware comprises the processor and the
memory which are usually connected by the bus as shown in the fig c.

Fig d: The processor cache — the main memory is connected over the bus to the processor, which includes a cache memory.

The pertinent parts of the fig. c are repeated in fig. d which includes the cache
memory as part of the processor unit.


Let us examine the flow of program instructions and data between the memory
and the processor. At the start of execution, all program instructions and the required data
are stored in the main memory. As the execution proceeds, instructions are fetched one by one over the bus into the processor, and a copy is placed in the cache. Later, if the same instruction or data item is needed a second time, it is read directly from the cache.

The processor and relatively small cache memory can be fabricated on a single
IC chip. The internal speed of performing the basic steps of instruction processing on
chip is very high and is considerably faster than the speed at which the instruction and
data can be fetched from the main memory. A program will be executed faster if the
movement of instructions and data between the main memory and the processor is
minimized, which is achieved by using the cache.

For example:- Suppose a number of instructions are executed repeatedly over a short
period of time as happens in a program loop. If these instructions are available in the
cache, they can be fetched quickly during the period of repeated use. The same applies to
the data that are used repeatedly.

Processor clock: -
Processor circuits are controlled by a timing signal called the clock. The clock defines regular time intervals called clock cycles. To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps, such that each step can be completed in one clock cycle. The length P of one clock cycle is an important parameter that affects processor performance; it is the inverse of the clock rate R, that is, P = 1/R.

Processors used in today's personal computers and workstations have clock rates that range from a few hundred million to over a billion cycles per second.

Basic performance equation

We now focus our attention on the processor time component of the total elapsed time. Let T be the processor time required to execute a program that has been prepared in some high-level language. The compiler generates a machine language object program that corresponds to the source program. Assume that complete execution of the program requires the execution of N machine language instructions. The number N is the actual number of instruction executions, and is not necessarily equal to the number of machine instructions in the object program: some instructions may be executed more than once, as is the case for instructions inside a program loop, while others may not be executed at all, depending on the input data used.


Suppose that the average number of basic steps needed to execute one machine instruction is S, where each basic step is completed in one clock cycle. If the clock rate is R cycles per second, the program execution time is given by

T = (N × S) / R

This is often referred to as the basic performance equation.

We must emphasize that N, S, and R are not independent parameters; changing one may affect another. Introducing a new feature in the design of a processor will lead to improved performance only if the overall result is to reduce the value of T.
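As a quick numeric check of this equation, the fragment below evaluates T = (N × S) / R; the values of N, S, and R are made up purely for illustration:

    #include <stdio.h>

    /* Basic performance equation T = (N x S) / R, evaluated for
       assumed (illustrative) values of N, S and R. */
    int main(void) {
        double N = 50e6;    /* instructions actually executed (assumed)      */
        double S = 4.0;     /* average basic steps per instruction (assumed) */
        double R = 500e6;   /* clock rate in cycles per second (assumed)     */

        double T = (N * S) / R;
        printf("Execution time T = %.3f seconds\n", T);   /* 0.400 s here */
        return 0;
    }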

Pipelining and superscalar operation: -


So far we have assumed that instructions are executed one after the other. Hence the value of S is the total number of basic steps, or clock cycles, required to execute one instruction. A substantial improvement in performance can be achieved by overlapping the execution of successive instructions, using a technique called pipelining.

Consider the instruction
Add R1, R2, R3
This adds the contents of R1 and R2 and places the sum into R3.

The contents of R1 and R2 are first transferred to the inputs of the ALU. After the addition operation is performed, the sum is transferred to R3. The processor can read the next instruction from the memory while the addition operation is being performed. Then, if that instruction also uses the ALU, its operands can be transferred to the ALU inputs at the same time that the result of the Add instruction is being transferred to R3.

In the ideal case, if all instructions are overlapped to the maximum degree possible, execution proceeds at the rate of one instruction completed in each clock cycle. Individual instructions still require several clock cycles to complete, but for the purpose of computing T, the effective value of S is 1.

A higher degree of concurrency can be achieved if multiple instruction pipelines are implemented in the processor. This means that multiple functional units are used, creating parallel paths through which different instructions can be executed in parallel. With such an arrangement, it becomes possible to start the execution of several instructions in every clock cycle. This mode of operation is called superscalar execution. If it can be sustained for a long time during program execution, the effective value of S can be reduced to less than one. But the parallel execution must preserve the logical correctness of programs; that is, the results produced must be the same as those produced by serial execution of the program instructions. Nowadays many processors are designed in this manner.

Clock rate

There are two possibilities for increasing the clock rate R:

1. Improving the IC technology makes logic circuits faster, which reduces the time needed to complete a basic step. This allows the clock period P to be reduced and the clock rate R to be increased.
2. Reducing the amount of processing done in one basic step also makes it possible to reduce the clock period P. However, if the actions that have to be performed by an instruction remain the same, the number of basic steps needed may increase.

Increases in the value of R that are entirely caused by improvements in IC technology affect all aspects of the processor's operation equally, with the exception of the time it takes to access the main memory. In the presence of a cache, the percentage of accesses to the main memory is small. Hence much of the performance gain expected from the use of faster technology can be realized.

Instruction set CISC & RISC:-


Simple instructions require a small number of basic steps to execute. Complex instructions involve a large number of steps. For a processor that has only simple instructions, a large number of instructions may be needed to perform a given programming task. This could lead to a large value of N and a small value of S; on the other hand, if individual instructions perform more complex operations, fewer instructions will be needed, leading to a lower value of N and a larger value of S. It is not obvious which choice is better.

Complex instructions combined with pipelining (an effective value of S close to 1) would achieve the best performance. However, it is much easier to implement efficient pipelining in processors with simple instruction sets.

Performance measurements
It is very important to be able to assess the performance of a computer; computer designers use performance estimates to evaluate the effectiveness of new features.

The previous argument suggests that the performance of a computer is given by the execution time T for the program of interest.


In spite of the performance equation being so simple, the evaluation of T is highly complex. Moreover, parameters such as the clock speed and various architectural features are not reliable indicators of the expected performance.

Hence, computer performance is measured using benchmark programs; to make comparisons possible, standardized programs must be used.

The performance measure is the time taken by the computer to execute a given benchmark. Initially, some attempts were made to create artificial programs that could be used as benchmark programs, but such synthetic programs do not properly predict the performance obtained when real application programs are run.

A nonprofit organization called SPEC (System Performance Evaluation Corporation) selects and publishes benchmarks.

The programs selected range from game playing, compilers, and database applications to numerically intensive programs in astrophysics and quantum chemistry. In each case, the program under test is compiled, and its running time on a real computer is measured. The same program is also compiled and run on a computer selected as a reference.
The ‘SPEC’ rating is computed as follows.

SPEC rating = (Running time on the reference computer) / (Running time on the computer under test)

A SPEC rating of 50 means that the computer under test is 50 times as fast as the reference computer (an UltraSPARC 10 workstation for that suite). This measurement is repeated for all the programs in the SPEC suite, and the geometric mean of the results is computed.

Let SPECi be the rating for program ‘i’ in the suite. The overall SPEC rating for
the computer is given by

SPEC rating = (SPEC1 × SPEC2 × … × SPECn)^(1/n)

where n is the number of programs in the suite; that is, the overall rating is the geometric mean of the n individual ratings.


Since the actual execution time is measured, the SPEC rating is a measure of the combined effect of all factors affecting performance, including the compiler, the operating system, the processor, and the memory of the computer being tested.
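The geometric-mean computation can be written out directly. In the fragment below the per-program ratings are invented for illustration, and pow() from <math.h> supplies the n-th root:

    #include <stdio.h>
    #include <math.h>

    /* Overall SPEC rating = (product of SPECi for i = 1..n)^(1/n).
       The individual ratings below are invented for illustration. */
    int main(void) {
        double spec[] = { 40.0, 55.0, 62.5, 48.0 };   /* assumed per-program ratings */
        int n = sizeof(spec) / sizeof(spec[0]);

        double product = 1.0;
        for (int i = 0; i < n; i++)
            product *= spec[i];

        double rating = pow(product, 1.0 / n);        /* geometric mean */
        printf("Overall SPEC rating = %.2f\n", rating);
        return 0;
    }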

Multiprocessors and multicomputers:-

• Large computers that contain a number of processor units are called multiprocessor systems.
• These systems either execute a number of different application tasks in parallel, or execute subtasks of a single large task in parallel.
• All processors usually have access to all memory locations in such systems, and hence they are called shared-memory multiprocessor systems.
• The high performance of these systems comes with much increased complexity and cost.
• In contrast to multiprocessor systems, it is also possible to use an interconnected group of complete computers to achieve high total computational power. These computers normally have access only to their own memory units; when the tasks they are executing need to communicate data, they do so by exchanging messages over a communication network. This property distinguishes them from shared-memory multiprocessors, leading to the name message-passing multicomputers.

1.10 Number Representation

Consider an n-bit vector


B = bn-1 … b1 b0

where bi = 0 or 1 for 0 ≤ i ≤ n−1. This vector can represent unsigned integer values V in the range 0 to 2^n − 1, where

V(B) = bn-1 × 2^(n-1) + … + b1 × 2^1 + b0 × 2^0
We obviously need to represent both positive and negative numbers. Three systems are
used for representing such numbers :
• Sign-and-magnitude
• 1’s-complement
• 2’s-complement
In all three systems, the leftmost bit is 0 for positive numbers and 1 for negative numbers. Fig 2.1 illustrates all three representations using 4-bit numbers. Positive values have identical representations in all systems, but negative values have different representations. In the sign-and-magnitude system, negative values are represented by changing the most significant bit (b3 in figure 2.1) from 0 to 1 in the B vector of the corresponding positive value. For example, +5 is represented by 0101, and -5 is represented by 1101. In 1's-complement representation, negative values are obtained by complementing each bit of the corresponding positive number. Thus, the representation for -3 is obtained by complementing each bit in the vector 0011 to yield 1100. Clearly, the same operation, bit complementing, is done in converting a negative number to the corresponding positive value. Converting either way is referred to as forming the 1's-complement of a given number. Finally, in the 2's-complement system, forming the 2's-complement of an n-bit number is done by subtracting that number from 2^n.

B = b3 b2 b1 b0    Sign and magnitude    1's complement    2's complement
0 1 1 1                   +7                   +7                +7
0 1 1 0                   +6                   +6                +6
0 1 0 1                   +5                   +5                +5
0 1 0 0                   +4                   +4                +4
0 0 1 1                   +3                   +3                +3
0 0 1 0                   +2                   +2                +2
0 0 0 1                   +1                   +1                +1
0 0 0 0                   +0                   +0                +0
1 0 0 0                   -0                   -7                -8
1 0 0 1                   -1                   -6                -7
1 0 1 0                   -2                   -5                -6
1 0 1 1                   -3                   -4                -5
1 1 0 0                   -4                   -3                -4
1 1 0 1                   -5                   -2                -3
1 1 1 0                   -6                   -1                -2
1 1 1 1                   -7                   -0                -1

Fig 2.1: Binary, signed-integer representations using 4-bit numbers
Hence, the 2’s complement of a number is obtained by adding 1 to the 1’s complement of
that number.
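The two complement operations can be checked with a short fragment. The sketch below uses 4-bit values masked with 0xF, matching the -3 example above; the choice of +3 as the operand is arbitrary:

    #include <stdio.h>

    /* For a 4-bit number, the 1's complement inverts every bit and the
       2's complement adds 1 to the 1's complement (equivalently, it
       subtracts the number from 2^4 = 16).  Masking with 0xF keeps
       the results within 4 bits. */
    int main(void) {
        unsigned v = 3;                       /* 0011 represents +3    */
        unsigned ones = (~v) & 0xF;           /* 1100 : 1's complement */
        unsigned twos = (ones + 1) & 0xF;     /* 1101 : 2's complement */

        printf("+%u -> 1's complement %X, 2's complement %X\n", v, ones, twos);
        return 0;
    }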

Addition of Positive numbers:-


Consider adding two 1-bit numbers. The results are shown in figure 2.2. Note that the sum of 1 and 1 requires the 2-bit vector 10 to represent the value 2; we say that the sum is 0 and the carry-out is 1. In order to add multiple-bit numbers, we use a method analogous to that used for manual computation with decimal numbers: we add bit pairs starting from the low-order (right) end of the bit vectors, propagating carries toward the high-order (left) end.

0 + 0 = 0        0 + 1 = 1        1 + 0 = 1        1 + 1 = 10 (sum 0, carry-out 1)

Figure 2.2: Addition of 1-bit numbers
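The bit-pair addition with carry propagation can be spelled out for 4-bit vectors; the operand values in the sketch below are arbitrary:

    #include <stdio.h>

    /* Adds two 4-bit numbers bit by bit, propagating the carry from the
       low-order (right) end to the high-order (left) end, as in Figure 2.2. */
    int main(void) {
        unsigned a = 0x5, b = 0x6, carry = 0, sum = 0;   /* assumed operands */

        for (int i = 0; i < 4; i++) {
            unsigned ai = (a >> i) & 1;
            unsigned bi = (b >> i) & 1;
            unsigned s  = ai + bi + carry;     /* 0, 1, 2 or 3            */
            sum  |= (s & 1) << i;              /* sum bit for position i  */
            carry = s >> 1;                    /* carry into position i+1 */
        }
        printf("0x%X + 0x%X = 0x%X, carry-out %u\n", a, b, sum, carry);
        return 0;
    }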

Memory locations and addresses

Number and character operands, as well as instructions, are stored in the memory
of a computer. The memory consists of many millions of storage cells, each of which can
store a bit of information having the value 0 or 1. Because a single bit represents a very
small amount of information, bits are seldom handled individually. The usual approach is
to deal with them in groups of fixed size. For this purpose, the memory is organized so
that a group of n bits can be stored or retrieved in a single, basic operation. Each group of
n bits is referred to as a word of information, and n is called the word length. The
memory of a computer can be schematically represented as a collection of words as
shown in figure (a).

Modern computers have word lengths that typically range from 16 to 64 bits. If
the word length of a computer is 32 bits, a single word can store a 32-bit 2’s complement
number or four ASCII characters, each occupying 8 bits. A unit of 8 bits is called a byte.


Accessing the memory to store or retrieve a single item of information, either a word or a byte, requires distinct names or addresses for each item location. It is customary to use the numbers from 0 through 2^k − 1, for some suitable value of k, as the addresses of successive locations in the memory. The 2^k addresses constitute the address space of the computer, and the memory can have up to 2^k addressable locations. For example, a 24-bit address generates an address space of 2^24 (16,777,216) locations, and a 32-bit address creates an address space of 2^32, or 4G (4 giga), locations.

BYTE ADDRESSABILITY:-
We now have three basic information quantities to deal with: the bit, byte and
word. A byte is always 8 bits, but the word length typically ranges from 16 to 64 bits.
The most practical assignment is to have successive addresses refer to successive byte locations in the memory. This is the assignment used in most modern computers, and is the one we will normally use in this book. The term byte-addressable memory is used for this assignment. Byte locations have addresses 0, 1, 2, …. Thus, if the word length of the machine is 32 bits, successive words are located at addresses 0, 4, 8, …, with each word consisting of four bytes.

Fig a: Memory words. The memory is a collection of words of n bits each (first word, second word, …, i-th word, …, last word). With a 32-bit word length, a word may hold (a) a signed integer, with bits b31 b30 … b1 b0 and sign bit b31 = 0 for positive numbers and b31 = 1 for negative numbers, or (b) four ASCII characters, each occupying 8 bits.

BIG-ENDIAN AND LITTLE-ENDIAN ASSIGNMENTS:-


There are two ways that byte addresses can be assigned across words, as shown
in fig b. The name big-endian is used when lower byte addresses are used for the more
significant bytes (the leftmost bytes) of the word. The name little-endian is used for the
opposite ordering, where the lower byte addresses are used for the less significant bytes
(the rightmost bytes) of the word.

In addition to specifying the address ordering of bytes within a word, it is also necessary to specify the labeling of bits within a byte or a word. The bits of a word are labeled b31, b30, …, b1, b0 from the most significant (leftmost) to the least significant (rightmost) bit, as in Fig a. The same ordering is also used for labeling bits within a byte, that is, b7, b6, …, b0, from left to right.
Word address    Byte addresses (big-endian)             Byte addresses (little-endian)
0               0  1  2  3                              3  2  1  0
4               4  5  6  7                              7  6  5  4
…               …                                       …
2^k − 4         2^k−4  2^k−3  2^k−2  2^k−1              2^k−1  2^k−2  2^k−3  2^k−4

(a) Big-endian assignment                               (b) Little-endian assignment

Fig b: Byte and word addressing — big-endian and little-endian assignments
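Whether the machine running a program uses the big-endian or little-endian assignment can be observed by storing a known 32-bit word and examining its bytes in address order; the sketch below assumes nothing beyond standard C:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Stores the 32-bit word 0x01020304 in memory and prints its four
       bytes in address order.  A big-endian machine prints 01 02 03 04
       (most significant byte at the lowest address); a little-endian
       machine prints 04 03 02 01. */
    int main(void) {
        uint32_t word = 0x01020304;
        unsigned char bytes[4];

        memcpy(bytes, &word, sizeof word);
        for (int i = 0; i < 4; i++)
            printf("byte address %d holds %02X\n", i, bytes[i]);
        return 0;
    }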


WORD ALIGNMENT:-
In the case of a 32-bit word length, natural word boundaries occur at addresses 0, 4, 8, …, as shown in the figure above. We say that the word locations have aligned addresses. In general, words are said to be aligned in memory if they begin at a byte address that is a multiple of the number of bytes in a word. The number of bytes in a word is a power of 2. Hence, if the word length is 16 (2 bytes), aligned words begin at byte addresses 0, 2, 4, …, and for a word length of 64 (2^3 = 8 bytes), aligned words begin at byte addresses 0, 8, 16, ….

There is no fundamental reason why words cannot begin at an arbitrary byte


address. In that case, words are said to have unaligned addresses. While the most
common case is to use aligned addresses, some computers allow the use of unaligned
word addresses.
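The alignment rule is a simple divisibility test; the sketch below assumes a 4-byte (32-bit) word and checks a few made-up byte addresses:

    #include <stdio.h>

    /* A word address is aligned if it is a multiple of the number of
       bytes in a word (here assumed to be 4, i.e. a 32-bit word). */
    int main(void) {
        unsigned word_size = 4;                       /* bytes per word (assumed) */
        unsigned addresses[] = { 0, 4, 8, 10, 17, 64 };

        for (int i = 0; i < 6; i++) {
            unsigned a = addresses[i];
            printf("address %2u is %s\n", a,
                   (a % word_size == 0) ? "aligned" : "unaligned");
        }
        return 0;
    }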

ACCESSING NUMBERS, CHARACTERS, AND CHARACTER STRINGS:-


A number usually occupies one word. It can be accessed in the memory by
specifying its word address. Similarly, individual characters can be accessed by their byte
address.

In many applications, it is necessary to handle character strings of variable


length. The beginning of the string is indicated by giving the address of the byte
containing its first character. Successive byte locations contain successive characters of
the string. There are two ways to indicate the length of the string. A special control
character with the meaning “end of string” can be used as the last character in the string,
or a separate memory word location or processor register can contain a number indicating
the length of the string in bytes.
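The two string-length conventions can be contrasted in a few lines; the C '\0' terminator stands in for the special end-of-string control character, and the separate count plays the role of the length kept in a memory word or register (the string itself is arbitrary):

    #include <stdio.h>
    #include <string.h>

    /* Two conventions for recording the length of a character string:
       (1) a special character marks the end (C uses the '\0' terminator),
       (2) the length is kept separately in its own word or register. */
    int main(void) {
        const char terminated[] = "HELLO";              /* ends with '\0'         */
        const char counted[]    = { 'H', 'E', 'L', 'L', 'O' };
        size_t count = 5;                               /* length kept separately */

        printf("terminated string: %s (%zu characters)\n", terminated, strlen(terminated));
        printf("counted string: ");
        fwrite(counted, 1, count, stdout);
        printf(" (%zu characters)\n", count);
        return 0;
    }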

Memory operations
Both program instructions and data operands are stored in the memory. To
execute an instruction, the processor control circuits must cause the word (or words)
containing the instruction to be transferred from the memory to the processor. Operands
and results must also be moved between the memory and the processor. Thus, two basic
operations involving the memory are needed, namely, Load (or Read or Fetch) and Store
(or Write).

The load operation transfers a copy of the contents of a specific memory location
to the processor. The memory contents remain unchanged. To start a Load operation, the
processor sends the address of the desired location to the memory and requests that its


contents be read. The memory reads the data stored at that address and sends them to the
processor.

The store operation transfers an item of information from the processor to a


specific memory location, destroying the former contents of that location. The processor
sends the address of the desired location to the memory, together with the data to be
written into that location.

An information item of either one word or one byte can be transferred between the processor and the memory in a single operation. Actually, this transfer takes place between a CPU register and the main memory.

Instructions and instruction sequencing

A computer must have instructions capable of performing four types of


operations.
• Data transfers between the memory and the processor registers
• Arithmetic and logic operations on data
• Program sequencing and control
• I/O transfers

REGISTER TRANSFER NOTATION:-


We often need to describe the transfer of information from one location in the computer to another. Possible locations that may be involved in such transfers are memory locations, processor registers, or registers in the I/O subsystem. Most of the time, we identify a location by a symbolic name standing for its hardware binary address.

Example, names for the addresses of memory locations may be LOC, PLACE, A,
VAR2; processor registers names may be R0, R5; and I/O register names may be
DATAIN, OUTSTATUS, and so on. The contents of a location are denoted by placing
square brackets around the name of the location. Thus, the expression
R1 ← [LOC]
means that the contents of memory location LOC are transferred into processor register R1.

As another example, consider the operation that adds the contents of registers R1
and R2, and then places their sum into register R3. This action is indicated as
R3 ← [R1] + [R2]


This type of notation is known as Register Transfer Notation (RTN). Note that the right-hand side of an RTN expression always denotes a value, and the left-hand side is the name of a location where the value is to be placed, overwriting the old contents of that location.

ASSEMBLY LANGUAGE NOTATION:-


We use another type of notation to represent machine instructions and programs: an assembly language format. For example, an instruction that causes the transfer described above, from memory location LOC to processor register R1, is specified by the statement
Move LOC, R1

The contents of LOC are unchanged by the execution of this instruction, but the
old contents of register R1 are overwritten.

The second example of adding two numbers contained in processor registers R1


and R2 and placing their sum in R3 can be specified by the assembly language statement
Add R1, R2, R3

BASIC INSTRUCTIONS:-
The operation of adding two numbers is a fundamental capability in any
computer. The statement
C=A+B

In a high-level language program is a command to the computer to add the


current values of the two variables called A and B, and to assign the sum to a third
variable, C. When the program containing this statement is compiled, the three variables,
A, B, and C, are assigned to distinct locations in the memory. We will use the variable
names to refer to the corresponding memory location addresses. The contents of these
locations represent the values of the three variables. Hence, the above high-level
language statement requires the action.
C ← [A] + [B]

To carry out this action, the contents of memory locations A and B are fetched
from the memory and transferred into the processor where their sum is computed. This
result is then sent back to the memory and stored in location C.

Let us first assume that this action is to be accomplished by a single machine


instruction. Furthermore, assume that this instruction contains the memory addresses of


the three operands – A, B, and C. This three-address instruction can be represented


symbolically as
Add A, B, C

Operands A and B are called the source operands, C is called the destination
operand, and Add is the operation to be performed on the operands. A general instruction
of this type has the format.
Operation Source1, Source 2, Destination

If k bits are needed to specify the memory address of each operand, the encoded form of the above instruction must contain 3k bits for addressing purposes, in addition to the bits needed to denote the Add operation.

An alternative approach is to use a sequence of simpler instructions to perform


the same task, with each instruction having only one or two operands. Suppose that two-
address instructions of the form
Operation Source, Destination

Are available. An Add instruction of this type is


Add A, B

which performs the operation B ← [A] + [B].

A single two-address instruction cannot be used to solve our original problem,


which is to add the contents of locations A and B, without destroying either of them, and
to place the sum in location C. The problem can be solved by using another two-address
instruction that copies the contents of one memory location into another. Such an
instruction is
Move B, C

which performs the operation C ← [B], leaving the contents of location B unchanged.

Using only one-address instructions, the operation C ← [A] + [B] can be performed by executing the sequence of instructions
Load A
Add B
Store C

Some early computers were designed around a single accumulator structure.


Most modern computers have a number of general-purpose processor registers – typically


8 to 32, and even considerably more in some cases. Access to data in these registers is
much faster than to data stored in memory locations because the registers are inside the
processor.

Let Ri represent a general-purpose register. The instructions


Load A, Ri
Store Ri, A and
Add A, Ri

are generalizations of the Load, Store, and Add instructions for the single-accumulator
case, in which register Ri performs the function of the accumulator.

When a processor has several general-purpose registers, many instructions


involve only operands that are in registers. In fact, in many modern processors, computations can be performed directly only on data held in processor registers. Instructions such as
Add Ri, Rj
or
Add Ri, Rj, Rk
are of this type.
In both of these instructions, the source operands are the contents of registers Ri
and Rj. In the first instruction, Rj also serves as the destination register, whereas in the
second instruction, a third register, Rk, is used as the destination.

It is often necessary to transfer data between different locations. This is achieved


with the instruction
Move Source, Destination
When data are moved to or from a processor register, the Move instruction can be
used rather than the Load or Store instructions because the order of the source and
destination operands determines which operation is intended. Thus,
Move A, Ri
Is the same as
Load A, Ri
And
Move Ri, A
Is the same as
Store Ri, A

In processors where arithmetic operations are allowed only on operands that are
processor registers, the C = A + B task can be performed by the instruction sequence
Move A, Ri
Move B, Rj


Add Ri, Rj
Move Rj, C
In processors where one operand may be in the memory but the other must be in
register, an instruction sequence for the required task would be
Move A, Ri
Add B, Ri
Move Ri, C
The speed with which a given task is carried out depends on the time it takes to
transfer instructions from memory into the processor and to access the operands
referenced by these instructions. Transfers that involve the memory are much slower than
transfers within the processor.
We have discussed three-, two-, and one-address instructions. It is also possible
to use instructions in which the locations of all operands are defined implicitly. Such
instructions are found in machines that store operands in a structure called a pushdown
stack. In this case, the instructions are called zero-address instructions.
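For such a zero-address machine, C = A + B would be expressed as Push A, Push B, Add, Pop C. The sketch below, with an invented 16-entry stack and made-up operand values, shows how the operands stay implicit:

    #include <stdio.h>

    /* Sketch of a zero-address (pushdown-stack) evaluation of C = A + B:
       Push A; Push B; Add; Pop C.  The stack and the memory operands are
       invented purely to illustrate implicit operand addressing. */
    int stack[16], top = -1;

    void push(int v)  { stack[++top] = v; }
    int  pop(void)    { return stack[top--]; }

    int main(void) {
        int A = 10, B = 32, C;     /* assumed memory operands */

        push(A);                   /* Push A */
        push(B);                   /* Push B */
        push(pop() + pop());       /* Add: the two topmost stack entries are implicit operands */
        C = pop();                 /* Pop C  */

        printf("C = %d\n", C);     /* 42 */
        return 0;
    }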

INSTRUCTION EXECUTION AND STRAIGHT-LINE SEQUENCING:-


In the preceding discussion of instruction formats, we used the task C ← [A] + [B]. Fig 2.8 shows a possible program segment for this task as it appears in the memory of a computer. We have assumed that the computer allows one memory operand per instruction and has a number of processor registers. The three instructions of the program are in successive word locations, starting at location i. Since each instruction is 4 bytes long, the second and third instructions start at addresses i + 4 and i + 8.
Address       Contents
i             Move A, R0      <- begin execution here
i + 4         Add  B, R0         (3-instruction program segment)
i + 8         Move R0, C
…
A
B                                (data for the program)
C

Fig 2.8: A program for C ← [A] + [B]


Let us consider how this program is executed. The processor contains a register
called the program counter (PC), which holds the address of the instruction to be
executed next. To begin executing a program, the address of its first instruction (I in our
example) must be placed into the PC. Then, the processor control circuits use the
information in the PC to fetch and execute instructions, one at a time, in the order of
increasing addresses. This is called straight-line sequencing. During the execution of each
instruction, the PC is incremented by 4 to point to the next instruction. Thus, after the
Move instruction at location i + 8 is executed, the PC contains the value i + 12, which is
the address of the first instruction of the next program segment.

Executing a given instruction is a two-phase procedure. In the first phase, called


instruction fetch, the instruction is fetched from the memory location whose address is in
the PC. This instruction is placed in the instruction register (IR) in the processor. The
instruction in IR is examined to determine which operation is to be performed. The
specified operation is then performed by the processor. This often involves fetching
operands from the memory or from processor registers, performing an arithmetic or logic
operation, and storing the result in the destination location.

BRANCHING:-
Consider the task of adding a list of n numbers. Instead of using a long list of add
instructions, it is possible to place a single add instruction in a program loop, as shown in
fig b. The loop is a straight-line sequence of instructions executed as many times as
needed. It starts at location LOOP and ends at the instruction Branch > 0. During each
pass through this loop, the address of the next list entry is determined, and that entry is
fetched and added to

i             Move NUM1, R0
i + 4         Add  NUM2, R0
i + 8         Add  NUM3, R0
…
i + 4n − 4    Add  NUMn, R0
i + 4n        Move R0, SUM

Fig a: A straight-line program for adding n numbers

        Move N, R1
        Clear R0
LOOP    Determine address of "Next" number and add "Next" number to R0   \
        Decrement R1                                                      | program loop
        Branch>0 LOOP                                                    /
        Move R0, SUM
        …
SUM
N
NUM1
NUM2
…
NUMn

Fig b: Using a loop to add n numbers

Assume that the number of entries in the list, n, is stored in memory location N, as shown. Register R1 is used as a counter to determine the number of times the loop is executed. Hence, the contents of location N are loaded into register R1 at the beginning of the program. Then, within the body of the loop, the instruction
Decrement R1
reduces the contents of R1 by 1 each time through the loop.
The loop must be repeated as long as the counter is greater than zero; this is achieved with a branch instruction. A branch instruction loads a new value into the program counter. As a result, the processor fetches and executes the instruction at this new address, called the branch target, instead of the instruction at the location that follows the branch instruction in sequential address order. A conditional branch instruction causes a branch only if a specified condition is satisfied. If the condition is not satisfied, the PC is incremented in


the normal way, and the next instruction in sequential address order is fetched and
executed.
Branch > 0 LOOP

(branch if greater than 0) is a conditional branch instruction that causes a branch to location LOOP if the result of the immediately preceding instruction, which is the decremented value in register R1, is greater than zero. This means that the loop is repeated as long as there are entries in the list that are yet to be added to R0. At the end of the nth pass through the loop, the Decrement instruction produces a value of zero, and hence, branching does not occur.
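The same counting-down loop reads naturally in a high-level language; in the sketch below R0 and R1 mimic the two registers, and the list contents are made up:

    #include <stdio.h>

    /* High-level equivalent of the loop of fig b: a counter is loaded
       from N, one list entry is added to the running sum on each pass,
       the counter is decremented, and the loop repeats while it is > 0. */
    int main(void) {
        int NUM[] = { 3, 8, 1, 7, 5 };     /* assumed list contents            */
        int N = 5;                         /* number of entries                */
        int R0 = 0;                        /* running sum (as in register R0)  */
        int R1 = N;                        /* loop counter (as in register R1) */
        int i = 0;

        do {
            R0 += NUM[i++];                /* add "next" number to R0 */
            R1--;                          /* Decrement R1            */
        } while (R1 > 0);                  /* Branch>0 LOOP           */

        printf("SUM = %d\n", R0);          /* 24 */
        return 0;
    }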

CONDITION CODES:-
The processor keeps track of information about the results of various operations
for use by subsequent conditional branch instructions. This is accomplished by recording
the required information in individual bits, often called condition code flags. These flags
are usually grouped together in a special processor register called the condition code
register or status register. Individual condition code flags are set to 1 or cleared to 0,
depending on the outcome of the operation performed.

Four commonly used flags are

N (negative)   Set to 1 if the result is negative; otherwise, cleared to 0
Z (zero)       Set to 1 if the result is 0; otherwise, cleared to 0
V (overflow)   Set to 1 if arithmetic overflow occurs; otherwise, cleared to 0
C (carry)      Set to 1 if a carry-out results from the operation; otherwise, cleared to 0

The instruction Branch > 0, discussed in the previous section, is an example of a


branch instruction that tests one or more of the condition flags. It causes a branch if the
value tested is neither negative nor equal to zero. That is, the branch is taken if neither N
nor Z is 1. The conditions are given as logic expressions involving the condition code
flags.

In some computers, the condition code flags are affected automatically by


instructions that perform arithmetic or logic operations. However, this is not always the
case. A number of computers have two versions of an Add instruction.
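The four flags can be computed explicitly for an 8-bit addition. The helper below is only a sketch of how N, Z, C and V can be derived from a result; it is not any particular processor's flag logic:

    #include <stdio.h>
    #include <stdint.h>

    /* Computes the N, Z, C and V flags for an 8-bit addition a + b.
       N: result is negative (bit 7 set), Z: result is zero,
       C: carry out of bit 7, V: signed overflow (both operands have the
       same sign but the result has the opposite sign). */
    void add_flags(uint8_t a, uint8_t b) {
        uint16_t wide = (uint16_t)a + (uint16_t)b;
        uint8_t  r = (uint8_t)wide;

        int N = (r & 0x80) != 0;
        int Z = (r == 0);
        int C = (wide & 0x100) != 0;
        int V = (~(a ^ b) & (a ^ r) & 0x80) != 0;

        printf("%3u + %3u = %3u  N=%d Z=%d C=%d V=%d\n",
               (unsigned)a, (unsigned)b, (unsigned)r, N, Z, C, V);
    }

    int main(void) {
        add_flags(100, 100);   /* signed overflow: V = 1 */
        add_flags(200, 100);   /* carry-out:       C = 1 */
        add_flags(0, 0);       /* zero result:     Z = 1 */
        return 0;
    }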

GENERATING MEMORY ADDRESSES:-


Let us return to fig b. The purpose of the instruction block at LOOP is to add a
different number from the list during each pass through the loop. Hence, the Add
instruction in the block must refer to a different address during each pass. How are the


addresses to be specified? The memory operand address cannot be given directly in a single Add instruction in the loop, because it would then need to be modified on each pass through the loop.

The instruction set of a computer typically provides a number of such methods,


called addressing modes. While the details differ from one computer to another, the
underlying concepts are the same.


Addressing modes:
In general, a program operates on data that reside in the computer’s memory.
These data can be organized in a variety of ways. If we want to keep track of students’
names, we can write them in a list. Programmers use organizations called data structures
to represent the data used in computations. These include lists, linked lists, arrays,
queues, and so on.

Programs are normally written in a high-level language, which enables the


programmer to use constants, local and global variables, pointers, and arrays. The
different ways in which the location of an operand is specified in an instruction are
referred to as addressing modes.

Table 2.1 Generic addressing modes

Name                           Assembler syntax     Addressing function

Immediate                      #Value               Operand = Value
Register                       Ri                   EA = Ri
Absolute (Direct)              LOC                  EA = LOC
Indirect                       (Ri)                 EA = [Ri]
                               (LOC)                EA = [LOC]
Index                          X(Ri)                EA = [Ri] + X
Base with index                (Ri, Rj)             EA = [Ri] + [Rj]
Base with index and offset     X(Ri, Rj)            EA = [Ri] + [Rj] + X
Relative                       X(PC)                EA = [PC] + X
Autoincrement                  (Ri)+                EA = [Ri]; Increment Ri
Autodecrement                  -(Ri)                Decrement Ri; EA = [Ri]

EA = effective address
Value = a signed number
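A few of the addressing functions in Table 2.1 can be traced by hand with a small fragment; the register contents, PC value, and constant X below are all invented:

    #include <stdio.h>

    /* Traces the effective-address (EA) computation for a few of the
       modes in Table 2.1, using an invented register file and constants. */
    int main(void) {
        int R[4] = { 0, 1000, 20, 0 };     /* assumed register contents   */
        int PC = 2000;                     /* assumed program counter     */
        int X  = 20;                       /* constant in the instruction */

        int ea_index   = R[1] + X;         /* Index:           X(R1)     -> 1020 */
        int ea_base    = R[1] + R[2];      /* Base with index: (R1, R2)  -> 1020 */
        int ea_rel     = PC + X;           /* Relative:        X(PC)     -> 2020 */
        int ea_autoinc = R[1];             /* Autoincrement:   (R1)+               */
        R[1] += 4;                         /* ...then R1 is advanced by the operand size */

        printf("Index EA=%d, Base-with-index EA=%d, Relative EA=%d, Autoincrement EA=%d (R1 now %d)\n",
               ea_index, ea_base, ea_rel, ea_autoinc, R[1]);
        return 0;
    }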


IMPLEMENTATION OF VARIABLES AND CONSTANTS:-


Variables and constants are the simplest data types and are found in almost every
computer program. In assembly language, a variable is represented by allocating a
register or memory location to hold its value. Thus, the value can be changed as needed
using appropriate instructions.

Register mode - The operand is the contents of a processor register; the name (address)
of the register is given in the instruction.

Absolute mode – The operand is in a memory location; the address of this location is
given explicitly in the instruction. (In some assembly languages, this mode is called
Direct).

The instruction
Move LOC, R2
uses the Absolute mode for the source operand LOC and the Register mode for the destination R2. Processor registers are used as temporary storage locations, and the data in a register are accessed using the Register mode. The Absolute mode can represent global variables in a program. A declaration such as
Integer A, B;
in a high-level language program causes the compiler to allocate a memory location to each of the variables A and B, which can then be accessed using the Absolute mode.

Immediate mode – The operand is given explicitly in the instruction.


For example, the instruction
Move 200immediate, R0

Places the value 200 in register R0. Clearly, the Immediate mode is only used to
specify the value of a source operand. Using a subscript to denote the Immediate mode is
not appropriate in assembly languages. A common convention is to use the sharp sign (#)
in front of the value to indicate that this value is to be used as an immediate operand.
Hence, we write the instruction above in the form
Move #200, R0

INDIRECTION AND POINTERS:-


In the addressing modes that follow, the instruction does not give the operand or its address explicitly. Instead, it provides information from which the memory address of the operand can be determined. We refer to this address as the effective address (EA) of the operand.

Indirect mode – The effective address of the operand is the contents of a register or
memory location whose address appears in the instruction.


To execute the Add instruction in fig (a), the processor uses the value B, which is in register R1, as the effective address of the operand. It requests a read operation from the memory to read the contents of location B. The value read is the desired operand, which the processor adds to the contents of register R0. Indirect addressing through a memory location is also possible, as shown in fig (b). In this case, the processor first reads the contents of memory location A, then requests a second read operation using the value B as an address to obtain the operand.

Fig: Indirect addressing. (a) Through a general-purpose register: Add (R1), R0, where register R1 contains B, the address of the operand in main memory. (b) Through a memory location: Add (A), R0, where memory location A contains the address B at which the operand is found.

The following program uses indirect addressing through register R2 to add a list of n numbers:

Move N, R1
Move #NUM1, R2
Clear R0
LOOP Add (R2), R0
Add #4, R2
Decrement R1
Branch > 0 LOOP
Move R0, SUM

The register or memory location that contains the address of an operand is called
a pointer. Indirection and the use of pointers are important and powerful concepts in
programming.


In the program shown, register R2 is used as a pointer to the numbers in the list,
and the operands are accessed indirectly through R2. The initialization section of the
program loads the counter value n from memory location N into R1 and uses the
immediate addressing mode to place the address value NUM1, which is the address of the
first number in the list, into R2. Then it clears R0 to 0. The first two instructions in the
loop implement the unspecified instruction block starting at LOOP. The first time
through the loop, the instruction Add (R2), R0 fetches the operand at location NUM1
and adds it to R0. The second Add instruction adds 4 to the contents of the pointer R2, so
that it will contain the address value NUM2 when the above instruction is executed in the
second pass through the loop.

Consider the C-language statement
A = *B;
where B is a pointer variable. This statement may be compiled into
Move B, R1
Move (R1), A
Using indirect addressing through memory, the same action can be achieved with
Move (B), A

Indirect addressing through registers is used extensively. The above program


shows the flexibility it provides. Also, when absolute addressing is not available, indirect
addressing through registers makes it possible to access global variables by first loading
the operand’s address in a register.
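The two-instruction sequence Move B, R1 / Move (R1), A is exactly what a C pointer dereference does; the fragment below keeps the names A and B from the text, and everything else is illustrative:

    #include <stdio.h>

    /* C-level view of indirect addressing: B holds the address of the
       operand, and A receives the value found at that address, mirroring
       Move B, R1 followed by Move (R1), A. */
    int main(void) {
        int  value = 7;          /* the operand itself            */
        int *B = &value;         /* B is a pointer to the operand */
        int  A;

        int *R1 = B;             /* Move B, R1  : load the pointer          */
        A = *R1;                 /* Move (R1), A: fetch through the pointer */

        printf("A = %d\n", A);   /* 7 */
        return 0;
    }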

INDEXING AND ARRAYS:-


A different kind of flexibility for accessing operands is useful in dealing with
lists and arrays.

Index mode – the effective address of the operand is generated by adding a constant
value to the contents of a register.

The register used may be either a special register provided for this purpose or, more commonly, any one of a set of general-purpose registers in the processor. In either case, it is referred to as an index register. We indicate the Index mode symbolically as
as
X (Ri)

where X denotes the constant value contained in the instruction and Ri is the name of the register involved. The effective address of the operand is given by

EA = X + [Ri]


The contents of the index register are not changed in the process of generating
the effective address. In an assembly language program, the constant X may be given
either as an explicit number or as a symbolic name representing a numerical value.

Fig a illustrates two ways of using the Index mode. In fig a, the index register,
R1, contains the address of a memory location, and the value X defines an offset (also
called a displacement) from this address to the location where the operand is found. An
alternative use is illustrated in fig b. Here, the constant X corresponds to a memory
address, and the contents of the index register define the offset to the operand. In either
case, the effective address is the sum of two values; one is given explicitly in the
instruction, and the other is stored in a register.

Fig (a) Offset is given as a constant:
    Add 20(R1), R2
Register R1 contains 1000, the offset 20 is given in the instruction, and the operand is found at address 1000 + 20 = 1020.

Fig (b) Offset is in the index register:
    Add 1000(R1), R2
The constant 1000 is given in the instruction, register R1 contains the offset 20, and the operand is again found at address 1020.


Move #LIST, R0
Clear R1
Clear R2
Clear R3
Move N, R4
LOOP Add 4(R0), R1
Add 8(R0), R2
Add 12(R0), R3
Add #16, R0
Decrement R4
Branch>0 LOOP
Move R1, SUM1
Move R2, SUM2
Move R3, SUM3

The program above uses the most basic form of indexed addressing. Several variations of this basic form provide very efficient access to memory operands in practical programming situations. For example, a second register may be used to contain the offset X, in which case we can write the Index mode as

(Ri, Rj)

The effective address is the sum of the contents of registers Ri and Rj. The
second register is usually called the base register. This form of indexed addressing
provides more flexibility in accessing operands, because both components of the effective
address can be changed.

Another version of the Index mode uses two registers plus a constant, which can
be denoted as

X(Ri, Rj)

In this case, the effective address is the sum of the constant X and the contents of
registers Ri and Rj. This added flexibility is useful in accessing multiple components
inside each item in a record, where the beginning of an item is specified by the (Ri, Rj)
part of the addressing mode. In other words, this mode implements a three-dimensional
array.


RELATIVE ADDRESSING:-
We have defined the Index mode using general-purpose processor registers. A
useful version of this mode is obtained if the program counter, PC, is used instead of a
general purpose register. Then, X(PC) can be used to address a memory location that is X
bytes away from the location presently pointed to by the program counter.

Relative mode – The effective address is determined by the Index mode using the
program counter in place of the general-purpose register Ri.

This mode can be used to access data operands. But, its most common use is to
specify the target address in branch instructions. An instruction such as

Branch > 0 LOOP

Causes program execution to go to the branch target location identified by the


name LOOP if the branch condition is satisfied. This location can be computed by
specifying it as an offset from the current value of the program counter. Since the branch
target may be either before or after the branch instruction, the offset is given as a signed
number.

Autoincrement mode – The effective address of the operand is the contents of a register specified in the instruction. After accessing the operand, the contents of this register are automatically incremented to point to the next item in a list.

(Ri)+

Autodecrement mode – the contents of a register specified in the instruction are first
automatically decremented and are then used as the effective address of the operand.

-(Ri)
Move N, R1
Move #NUM1, R2
Clear R0
LOOP Add (R2)+, R0
Decrement R1
Branch>0 LOOP
Move R0, SUM

Fig c The Autoincrement addressing mode used in the program of fig 2.12
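The Autoincrement mode corresponds closely to the C idiom *p++: the operand is fetched through the pointer and the pointer is then advanced. The sketch below redoes the loop of fig c with made-up list contents:

    #include <stdio.h>

    /* C counterpart of the Autoincrement loop of fig c: each pass adds
       the element the pointer refers to and then advances the pointer,
       just as (R2)+ uses R2 as the effective address and then increments it. */
    int main(void) {
        int NUM[] = { 4, 9, 2, 6 };        /* assumed list contents         */
        int N = 4;
        int *p  = NUM;                     /* R2: pointer to the next entry */
        int R0  = 0;                       /* running sum                   */
        int R1  = N;                       /* loop counter                  */

        do {
            R0 += *p++;                    /* Add (R2)+, R0 */
            R1--;                          /* Decrement R1  */
        } while (R1 > 0);                  /* Branch>0 LOOP */

        printf("SUM = %d\n", R0);          /* 21 */
        return 0;
    }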


ASSEMBLY LANGUAGE
Machine instructions are represented by patterns of 0s and 1s. Such patterns are
awkward to deal with when discussing or preparing programs. Therefore, we use
symbolic names to represent the pattern. So far, we have used normal words, such as
Move, Add, Increment, and Branch, for the instruction operations to represent the
corresponding binary code patterns. When writing programs for a specific computer, such
words are normally replaced by acronyms called mnemonics, such as MOV, ADD, INC,
and BR. Similarly, we use the notation R3 to refer to register 3, and LOC to refer to a
memory location. A complete set of such symbolic names and rules for their use
constitute a programming language, generally referred to as an assembly language.

Programs written in an assembly language can be automatically translated into a sequence of machine instructions by a program called an assembler. When the assembler
program is executed, it reads the user program, analyzes it, and then generates the desired
machine language program. The latter contains patterns of 0s and 1s specifying
instructions that will be executed by the computer. The user program in its original
alphanumeric text format is called a source program, and the assembled machine
language program is called an object program.

ASSEMBLER DIRECTIVES:-
In addition to providing a mechanism for representing instructions in a program,
the assembly language allows the programmer to specify other information needed to
translate the source program into the object program. We have already mentioned that we
need to assign numerical values to any names used in a program. Suppose that the name
SUM is used to represent the value 200. This fact may be conveyed to the assembler
program through a statement such as

SUM EQU 200

This statement does not denote an instruction that will be executed when the
object program is run; in fact, it will not even appear in the object program. It simply
informs the assembler that the name SUM should be replaced by the value 200 wherever
it appears in the program. Such statements, called assembler directives (or commands),
are used by the assembler while it translates a source program into an object program.


The memory arrangement is as follows (addresses on the left):

100   Move N, R1
104   Move #NUM1, R2
108   Clear R0
112   LOOP  Add (R2), R0
116   Add #4, R2
120   Decrement R1
124   Branch>0 LOOP
128   Move R0, SUM
...
200   SUM
204   N (contains 100, the number of entries)
208   NUM1
212   NUM2
...
604   NUMn

Fig 2.17 Memory arrangement for the program in fig b.

ASSEMBLY AND EXECUTION OF PROGRAMS:-


A source program written in an assembly language must be assembled into a
machine language object program before it can be executed. This is done by the
assembler program, which replaces all symbols denoting operations and addressing
modes with the binary codes used in machine instructions, and replaces all names and
labels with their actual values.

The assembler assigns addresses to instructions and data blocks, starting at the
address given in the ORIGIN assembler directives. It also inserts constants that may be
given in DATAWORD commands and reserves memory space as requested by
RESERVE commands.
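
The use of these directives can be illustrated with a sketch of how the program and data of fig 2.17 might be written in source form (the exact directive syntax varies from one assembler to another; this layout is only an illustration consistent with the directives named above):

        ORIGIN    200
SUM     RESERVE   4
N       DATAWORD  100
NUM1    RESERVE   400
        ORIGIN    100
START   Move      N, R1
        Move      #NUM1, R2
        Clear     R0
LOOP    Add       (R2), R0
        Add       #4, R2
        Decrement R1
        Branch>0  LOOP
        Move      R0, SUM
        END       START

The first ORIGIN places the data block at address 200 (SUM at 200, N at 204, NUM1 at 208), and the second places the first instruction at address 100, matching the memory arrangement of fig 2.17.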

As the assembler scans through a source program, it keeps track of all names
and the numerical values that correspond to them in a symbol table. Thus, when a name
appears a second time, it is replaced with its value from the table. A problem arises when


a name appears as an operand before it is given a value. For example, this happens if a
forward branch is required. A simple solution to this problem is to have the assembler
scan through the source program twice. During the first pass, it creates a complete
symbol table. At the end of this pass, all names will have been assigned numerical values.
The assembler then goes through the source program a second time and substitutes values
for all names from the symbol table. Such an assembler is called a two-pass assembler.

The assembler stores the object program on a magnetic disk. The object program
must be loaded into the memory of the computer before it is executed. For this to happen,
another utility program called a loader must already be in the memory.

When the object program begins executing, it proceeds to completion unless there are logical errors in the program. The user must be able to find errors easily. The
assembler can detect and report syntax errors. To help the user find other programming
errors, the system software usually includes a debugger program. This program enables
the user to stop execution of the object program at some points of interest and to examine
the contents of various processor registers and memory locations.

NUMBER NOTATION:-
When dealing with numerical values, it is often convenient to use the familiar
decimal notation. Of course, these values are stored in the computer as binary numbers.
In some situations, it is more convenient to specify the binary patterns directly. Most
assemblers allow numerical values to be specified in different ways, using conventions
that are defined by the assembly language syntax. Consider, for example, the number 93,
which is represented by the 8-bit binary number 01011101. If this value is to be used as an immediate operand, it can be given as a decimal number, as in the instruction

ADD #93, R1

Or as a binary number identified by a prefix symbol such as a percent sign, as in

ADD #%01011101, R1

Binary numbers can be written more compactly as hexadecimal, or hex, numbers, in which four bits are represented by a single hex digit. In hexadecimal representation,
the decimal value 93 becomes 5D. In assembly language, a hex representation is often
identified by a dollar sign prefix. Thus, we would write

ADD #$5D, R1


Basic input/output operations


We now examine the means by which data are transferred between the memory
of a computer and the outside world. Input/Output (I/O) operations are essential, and the
way they are performed can have a significant effect on the performance of the computer.

Consider a task that reads in character input from a keyboard and produces
character output on a display screen. A simple way of performing such I/O tasks is to use
a method known as program-controlled I/O. The rate of data transfer from the keyboard
to a computer is limited by the typing speed of the user, which is unlikely to exceed a few
characters per second. The rate of output transfers from the computer to the display is
much higher. It is determined by the rate at which characters can be transmitted over the
link between the computer and the display device, typically several thousand characters
per second. However, this is still much slower than the speed of a processor that can
execute many millions of instructions per second. The difference in speed between the
processor and I/O devices creates the need for mechanisms to synchronize the transfer of
data between them.

Fig a Bus connection for processor, keyboard, and display: the processor, the keyboard (with its DATAIN buffer and SIN flag) and the display (with its DATAOUT buffer and SOUT flag) are all connected to a common bus.

The keyboard and the display are separate devices, as shown in fig a. The action of
striking a key on the keyboard does not automatically cause the corresponding character
to be displayed on the screen. One block of instructions in the I/O program transfers the
character into the processor, and another associated block of instructions causes the
character to be displayed.

Striking a key stores the corresponding character code in an 8-bit buffer register
associated with the keyboard. Let us call this register DATAIN, as shown in fig a. To


inform the processor that a valid character is in DATAIN, a status control flag, SIN, is set
to 1. A program monitors SIN, and when SIN is set to 1, the processor reads the contents
of DATAIN. When the character is transferred to the processor, SIN is automatically
cleared to 0. If a second character is entered at the keyboard, SIN is again set to 1, and the
process repeats.

An analogous process takes place when characters are transferred from the
processor to the display. A buffer register, DATAOUT, and a status control flag, SOUT,
are used for this transfer. When SOUT equals 1, the display is ready to receive a
character.

In order to perform I/O transfers, we need machine instructions that can check
the state of the status flags and transfer data between the processor and the I/O device.
These instructions are similar in format to those used for moving data between the
processor and the memory. For example, the processor can monitor the keyboard status
flag SIN and transfer a character from DATAIN to register R1 by the following sequence
of operations.
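
A typical wait loop for this transfer, in the notation of these notes (the original notes omit the sequence here; this is the standard form of such a loop), is

READWAIT    Branch to READWAIT if SIN = 0
            Input from DATAIN to R1

The first operation is repeated until the status flag SIN becomes 1; the second then transfers the character from DATAIN into register R1. An analogous loop that tests SOUT is used before a character is moved from a processor register to DATAOUT.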

Stacks and queues


A computer program often needs to perform a particular subtask using the
familiar subroutine structure. In order to organize the control and information linkage
between the main program and the subroutine, a data structure called a stack is used. This
section will describe stacks, as well as a closely related data structure called a queue.

Data operated on by a program can be organized in a variety of ways. We have already encountered data structured as lists. Now, we consider an important data structure
known as a stack. A stack is a list of data elements, usually words or bytes, with the
accessing restriction that elements can be added or removed at one end of the list only.
This end is called the top of the stack, and the other end is called the bottom. Another
descriptive phrase, last-in-first-out (LIFO) stack, is also used to describe this type of
storage mechanism; the last data item placed on the stack is the first one removed when
retrieval begins. The terms push and pop are used to describe placing a new item on the
stack and removing the top item from the stack, respectively.
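
With the autodecrement and autoincrement addressing modes introduced earlier, and a stack that grows toward lower addresses, push and pop each take a single instruction (an added illustration; NEWITEM and ITEM are assumed memory locations):

Move NEWITEM, -(SP)
Move (SP)+, ITEM

The first instruction decrements SP and then stores the new item at the new top of the stack (push); the second copies the top item into ITEM and then increments SP (pop). Both assume word-sized stack elements and that SP always points to the element currently on top.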


Fig b shows a stack of word data items in the memory of a computer. It contains
numerical values, with 43 at the bottom and -28 at the top. A processor register is used to
keep track of the address of the element of the stack that is at the top at any given time.
This register is called the stack pointer (SP). It could be one of the general-purpose
registers or a register dedicated to this function.
Fig b A stack of words in the memory

[Figure: memory addresses run from 0 at the top to 2^k - 1 at the bottom. The current stack occupies a block of words; the stack pointer register SP points to the top element, which contains -28, followed by 17, 739, and further elements down to the bottom element, 43, located at address BOTTOM.]

Another useful data structure that is similar to the stack is called a queue. Data
are stored in and retrieved from a queue on a first-in-first-out (FIFO) basis. Thus, if we
assume that the queue grows in the direction of increasing addresses in the memory,
which is a common practice, new data are added at the back (high-address end) and
retrieved from the front (low-address end) of the queue.

There are two important differences between how a stack and a queue are
implemented. One end of the stack is fixed (the bottom), while the other end rises and
falls as data are pushed and popped. A single pointer is needed to point to the top of the
stack at any given time. On the other hand, both ends of a queue move to higher
addresses as data are added at the back and removed from the front. So two pointers are
needed to keep track of the two ends of the queue.


Another difference between a stack and a queue is that, without further control, a
queue would continuously move through the memory of a computer in the direction of
higher addresses. One way to limit the queue to a fixed region in memory is to use a
circular buffer. Let us assume that memory addresses from BEGINNING to END are
assigned to the queue. The first entry in the queue is entered into location BEGINNING,
and successive entries are appended to the queue by entering them at successively higher
addresses. By the time the back of the queue reaches END, space will have been created
at the beginning if some items have been removed from the queue. Hence, the back
pointer is reset to the value BEGINNING and the process continues. As in the case of a
stack, care must be taken to detect when the region assigned to the data structure is either
completely full or completely empty.
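
A sketch of how a new item might be appended at the back of such a circular queue, in the notation of these notes (the descriptive branch form and the choice of R1 as the back pointer are assumptions made for this illustration):

        Move NEWITEM, (R1)+
        Branch to NOWRAP if R1 <= END
        Move #BEGINNING, R1
NOWRAP  ...

The item is stored at the back of the queue and the pointer advanced; if the pointer has moved past END, it is reset to BEGINNING. A separate check, not shown, is still needed to detect the completely full and completely empty conditions mentioned above.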

Subroutines
In a given program, it is often necessary to perform a particular subtask many
times on different data-values. Such a subtask is usually called a subroutine. For example,
a subroutine may evaluate the sine function or sort a list of values into increasing or
decreasing order.

It is possible to include the block of instructions that constitute a subroutine at every place where it is needed in the program. However, to save space, only one copy of
the instructions that constitute the subroutine is placed in the memory, and any program
that requires the use of the subroutine simply branches to its starting location. When a
program branches to a subroutine we say that it is calling the subroutine. The instruction
that performs this branch operation is named a Call instruction.

After a subroutine has been executed, the calling program must resume
execution, continuing immediately after the instruction that called the subroutine. The
subroutine is said to return to the program that called it by executing a Return instruction.

The way in which a computer makes it possible to call and return from
subroutines is referred to as its subroutine linkage method. The simplest subroutine
linkage method is to save the return address in a specific location, which may be a
register dedicated to this function. Such a register is called the link register. When the
subroutine completes its task, the Return instruction returns to the calling program by
branching indirectly through the link register.

The Call instruction is just a special branch instruction that performs the
following operations

• Store the contents of the PC in the link register


• Branch to the target address specified by the instruction


The Return instruction is a special branch instruction that performs the operation
• Branch to the address contained in the link register .

Fig b illustrates this procedure. The calling program contains the instruction Call SUB at memory location 200, with the next instruction at location 204; the subroutine SUB begins at location 1000 and ends with a Return instruction. Executing the Call loads 1000 into the PC and saves 204 in the link register; executing the Return branches back to location 204.

Fig b Subroutine linkage using a link register

SUBROUTINE NESTING AND THE PROCESSOR STACK:-


A common programming practice, called subroutine nesting, is to have one
subroutine call another. In this case, the return address of the second call is also stored in
the link register, destroying its previous contents. Hence, it is essential to save the
contents of the link register in some other location before calling another subroutine.
Otherwise, the return address of the first subroutine will be lost.

Subroutine nesting can be carried out to any depth. Eventually, the last
subroutine called completes its computations and returns to the subroutine that called it.
The return address needed for this first return is the last one generated in the nested call


sequence. That is, return addresses are generated and used in a last-in-first-out order. This
suggests that the return addresses associated with subroutine calls should be pushed onto
a stack. A particular register is designated as the stack pointer, SP, to be used in this
operation. The stack pointer points to a stack called the processor stack. The Call
instruction pushes the contents of the PC onto the processor stack and loads the
subroutine address into the PC. The Return instruction pops the return address from the
processor stack into the PC.

PARAMETER PASSING:-
When calling a subroutine, a program must provide to the subroutine the
parameters, that is, the operands or their addresses, to be used in the computation. Later,
the subroutine returns other parameters, in this case, the results of the computation. This
exchange of information between a calling program and a subroutine is referred to as
parameter passing. Parameter passing may be accomplished in several ways. The
parameters may be placed in registers or in memory locations, where they can be
accessed by the subroutine. Alternatively, the parameters may be placed on the processor
stack used for saving the return address.

The purpose of the subroutine is to add a list of numbers. Instead of passing the
actual list entries, the calling program passes the address of the first number in the list.
This technique is called passing by reference. The second parameter is passed by value,
that is, the actual number of entries, n, is passed to the subroutine.
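
A sketch of the calling sequence this describes, in the notation of these notes (the subroutine name LISTADD, the stack adjustments, and the convention that the subroutine leaves the sum in the upper parameter slot are assumptions made for this illustration):

        Move   #NUM1, -(SP)
        Move   N, -(SP)
        Call   LISTADD
        Move   4(SP), SUM
        Add    #8, SP

The address of the first number (passed by reference) and the value n (passed by value) are pushed onto the stack before the Call; after the return, the result is picked up from the stack and the two parameter locations are removed by adding 8 to SP.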

THE STACK FRAME:-


Now, observe how space is used in the stack in the example. During execution of
the subroutine, six locations at the top of the stack contain entries that are needed by the
subroutine. These locations constitute a private workspace for the subroutine, created at
the time the subroutine is entered and freed up when the subroutine returns control to the
calling program. Such space is called a stack frame.
Fig b A subroutine stack frame example. From the top of the stack downward, the frame contains:

    saved [R1]            <- SP (stack pointer)
    saved [R0]
    localvar3
    localvar2
    localvar1
    saved [FP]            <- FP (frame pointer)
    return address for the called subroutine
    param1
    param2
    param3
    param4
    old TOS (top-of-stack element before the call)

Fig b shows an example of a commonly used layout for information in a stack frame. In addition to the stack pointer SP, it is useful to have another pointer register,
called the frame pointer (FP), for convenient access to the parameters passed to the
subroutine and to the local memory variables used by the subroutine. These local
variables are only used within the subroutine, so it is appropriate to allocate space for
them in the stack frame associated with the subroutine. We assume that four parameters
are passed to the subroutine, three local variables are used within the subroutine, and
registers R0 and R1 need to be saved because they will also be used within the
subroutine.

The pointers SP and FP are manipulated as the stack frame is built, used, and
dismantled for a particular invocation of the subroutine. We begin by assuming that SP points to the
old top-of-stack (TOS) element in fig b. Before the subroutine is called, the calling
program pushes the four parameters onto the stack. The call instruction is then executed,
resulting in the return address being pushed onto the stack. Now, SP points to this return
address, and the first instruction of the subroutine is about to be executed. This is the
point at which the frame pointer FP is set to contain the proper memory address. Since FP
is usually a general-purpose register, it may contain information of use to the Calling
program. Therefore, its contents are saved by pushing them onto the stack. Since the SP
now points to this position, its contents are copied into FP.

Thus, the first two instructions executed in the subroutine are

Move FP, -(SP)
Move SP, FP

After these instructions are executed, both SP and FP point to the saved FP contents. Space for the three local variables is then allocated on the stack by executing the instruction

Subtract #12, SP

Finally, the contents of processor registers R0 and R1 are saved by pushing them
onto the stack. At this point, the stack frame has been set up as shown in the fig.

The subroutine now executes its task. When the task is completed, the subroutine
pops the saved values of R1 and R0 back into those registers, removes the local variables
from the stack frame by executing the instruction.

Add #12, SP


And pops the saved old value of FP back into FP. At this point, SP points to the
return address, so the Return instruction can be executed, transferring control back to the
calling program.
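
Putting these pieces together, the entry and exit code implied by the description above can be sketched as follows (using the notation of these notes; the figure of 12 bytes assumes three 4-byte local variables, as in the example):

SUBR    Move      FP, -(SP)
        Move      SP, FP
        Subtract  #12, SP
        Move      R0, -(SP)
        Move      R1, -(SP)
        ...
        Move      (SP)+, R1
        Move      (SP)+, R0
        Add       #12, SP
        Move      (SP)+, FP
        Return

The first three instructions save the caller's frame pointer, establish the new frame, and allocate the local variables; the next two save R0 and R1. On exit the registers are restored, the local variables are released, the old FP is popped back, and the Return instruction then finds the return address on top of the stack.
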
Logic instructions

Logic operations such as AND, OR, and NOT, applied to individual bits, are the basic building blocks of digital circuits. It is also useful to be able to perform logic operations in software, which is done using instructions that apply these operations to all bits of a word or byte independently and in parallel. For example, the instruction

Not dst

complements all bits contained in the destination operand, changing 0s to 1s and 1s to 0s.
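
A further illustration (not in the original notes) is the use of the And instruction to clear selected bits. For example,

And #$FF, R1

ANDs each bit of R1 with the corresponding bit of the immediate pattern, so the low-order 8 bits of R1 are left unchanged while all higher-order bits are cleared to 0. This is a common way of isolating a byte, such as an ASCII character code, within a word.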

SHIFT AND ROTATE INSTRUCTIONS:-


There are many applications that require the bits of an operand to be shifted right
or left some specified number of bit positions. The details of how the shifts are performed
depend on whether the operand is a signed number or some more general binary-coded
information. For general operands, we use a logical shift. For a number, we use an
arithmetic shift, which preserves the sign of the number.

Logical shifts:-
Two logical shift instructions are needed, one for shifting left (LShiftL) and
another for shifting right (LShiftR). These instructions shift an operand over a number of
bit positions specified in a count operand contained in the instruction. The general form
of a logical left shift instruction is

LShiftL count, dst

(a) Logical shift left        LShiftL #2, R0        (C <- R0 <- 0)

    before:   C = 0    R0 = 0 1 1 1 0 . . . 0 1 1
    after:    C = 1    R0 = 1 1 0 . . . 0 1 1 0 0

(b) Logical shift right       LShiftR #2, R0        (0 -> R0 -> C)

    before:   R0 = 0 1 1 1 0 . . . 0 1 1    C = 0
    after:    R0 = 0 0 0 1 1 1 0 . . . 0    C = 1

(c) Arithmetic shift right    AShiftR #2, R0        (sign bit replicated, R0 -> C)

    before:   R0 = 1 0 0 1 1 . . . 0 1 0    C = 0
    after:    R0 = 1 1 1 0 0 1 1 . . . 0    C = 1
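
A common use of these shift instructions (an added note, not from the original text) is fast multiplication or division by powers of two. For example,

LShiftL #2, R0

multiplies the unsigned number in R0 by 4, provided no significant bits are shifted out, while AShiftR #1, R0 divides a signed number in R0 by 2 while preserving its sign (rounding toward the more negative value for odd numbers).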

Rotate Operations:-
In the shift operations, the bits shifted out of the operand are lost, except for the
last bit shifted out which is retained in the Carry flag C. To preserve all bits, a set of
rotate instructions can be used. They move the bits that are shifted out of one end of the
operand back into the other end. Two versions of both the left and right rotate instructions


are usually provided. In one version, the bits of the operand are simply rotated. In the
other version, the rotation includes the C flag.

(a) Rotate left without carry     RotateL #2, R0

    before:   C = 0    R0 = 0 1 1 1 0 . . . 0 1 1
    after:    C = 1    R0 = 1 1 0 . . . 0 1 1 0 1

(b) Rotate left with carry        RotateLC #2, R0

    before:   C = 0    R0 = 0 1 1 1 0 . . . 0 1 1
    after:    C = 1    R0 = 1 1 0 . . . 0 1 1 0 0

(c) Rotate right without carry    RotateR #2, R0

    before:   R0 = 0 1 1 1 0 . . . 0 1 1    C = 0
    after:    R0 = 1 1 0 1 1 1 0 . . . 0    C = 1

(d) Rotate right with carry       RotateRC #2, R0

    before:   R0 = 0 1 1 1 0 . . . 0 1 1    C = 0
    after:    R0 = 1 0 0 1 1 1 0 . . . 0    C = 1

Encoding of machine instructions


We have introduced a variety of useful instructions and addressing modes. These
instructions specify the actions that must be performed by the processor circuitry to carry
out the desired tasks. We have often referred to them as machine instructions. Actually,
the form in which we have presented the instructions is indicative of the form used in
assembly languages, except that we tried to avoid using acronyms for the various
operations, which are awkward to memorize and are likely to be specific to a particular
commercial processor. To be executed in a processor, an instruction must be encoded in a
compact binary pattern. Such encoded instructions are properly referred to as machine
instructions. The instructions that use symbolic names and acronyms are called assembly


language instructions, which are converted into the machine instructions using the
assembler program.

We have seen instructions that perform operations such as add, subtract, move,
shift, rotate, and branch. These instructions may use operands of different sizes, such as 32-
bit and 8-bit numbers or 8-bit ASCII-encoded characters. The type of operation that is to
be performed and the type of operands used may be specified using an encoded binary pattern
referred to as the OP code for the given instruction. Suppose that 8 bits are allocated for
this purpose, giving 256 possibilities for specifying different instructions. This leaves 24
bits to specify the rest of the required information.

Let us examine some typical cases. The instruction


Add R1, R2

Has to specify the registers R1 and R2, in addition to the OP code. If the processor has 16
registers, then four bits are needed to identify each register. Additional bits are needed to
indicate that the Register addressing mode is used for each operand.
The instruction
Move 24(R0), R5

Requires 16 bits to denote the OP code and the two registers, and some bits to express
that the source operand uses the Index addressing mode and that the index value is 24.
The shift instruction
LShiftR #2, R0

And the move instruction


Move #$3A, R1
Have to indicate the immediate values 2 and #$3A, respectively, in addition to the 18
bits used to specify the OP code, the addressing modes, and the register. This limits the
size of the immediate operand to what is expressible in 14 bits.
Consider next the branch instruction
Branch >0 LOOP

Again, 8 bits are used for the OP code, leaving 24 bits to specify the branch
offset. Since the offset is a 2’s-complement number, the branch target address must be
within 2^23 bytes of the location of the branch instruction. To branch to an instruction
outside this range, a different addressing mode has to be used, such as Absolute or
Register Indirect. Branch instructions that use these modes are usually called Jump
instructions.


In all these examples, the instructions can be encoded in a 32-bit word; part (a) below depicts a
possible format. There is an 8-bit Op-code field and two 7-bit fields for specifying the
source and destination operands. The 7-bit field identifies the addressing mode and the
register involved (if any). The “Other info” field allows us to specify the additional
information that may be needed, such as an index value or an immediate operand.

But, what happens if we want to specify a memory operand using the Absolute
addressing mode? The instruction

Move R2, LOC

(a) One-word instruction

Opcode Source Dest Other info

(b) Two-Word instruction

Opcode Source Dest Other info

Memory address/Immediate operand

(c ) Three-operand instruction

Op code Ri Rj Rk Other info

Requires 18 bits to denote the OP code, the addressing modes, and the register. This leaves 14 bits to express the address that corresponds to LOC, which is clearly insufficient. The solution is to use a second word of the instruction, as in the two-word format shown in (b), to hold the full 32-bit address. The same format can also accommodate a 32-bit immediate operand, as in

And #$FF000000, R2

In which case the second word gives a full 32-bit immediate operand.

If we want to allow an instruction in which two operands can be specified using the Absolute addressing mode, for example


Move LOC1, LOC2

Then it becomes necessary to use two additional words for the 32-bit addresses of
the operands.

This approach results in instructions of variable length, dependent on the number of operands and the type of addressing modes used. Using multiple words, we can
implement quite complex instructions, closely resembling operations in high-level
programming languages. The term complex instruction set computer (CISC) has been
used to refer to processors that use instruction sets of this type.

The restriction that an instruction must occupy only one word has led to a style of
computers that have become known as reduced instruction set computer (RISC). The
RISC approach introduced other restrictions, such as that all manipulation of data must be
done on operands that are already in processor registers. This restriction means that an addition whose source operand is in the memory would need a two-instruction sequence of the form

Move (R3), R1
Add R1, R2

If the Add instruction only has to specify the two registers, it will need just a
portion of a 32-bit word. So, we may provide a more powerful instruction that uses three
operands

Add R1, R2, R3

Which performs the operation

R3 ← [R1] + [R2]

A possible format for such an instruction is shown in fig c. Of course, the processor has to be able to deal with such three-operand instructions. In an instruction set
where all arithmetic and logical operations use only register operands, the only memory
references are made to load/store the operands into/from the processor registers.
RISC-type instruction sets typically have fewer and less complex instructions
than CISC-type sets. We will discuss the relative merits of RISC and CISC approaches in
Chapter 8, which deals with the details of processor design.

UNIT-2
Processing Unit:-

Register Transfers
 Computer registers are designated by capital letters (sometimes followed by numerals) to
denote the function of the register.
 For example, the register that holds an address for the memory unit is usually called a
memory address register and is designated by the name MAR.
 Other designations for registers are PC (for program counter), IR (for instruction register,
and R1 (for processor register).
 The individual flip-flops in an n-bit register are numbered in sequence from 0 through n −
1, starting from 0 in the rightmost position and increasing the numbers toward the left.
Figure 2-1 shows the representation of registers in block diagram form.
 The most common way to represent a register is by a rectangular box with the name of
the register inside, as in Fig. 2-1(a).
 The individual bits can be distinguished as in (b).
 The numbering of bits in a 16-bit register can be marked on top of the box as shown in
(c).
 A 16-bit register is partitioned into two parts in (d).
 Bits 0 through 7 are assigned the symbol L (for low byte) and bits 8 through 15 are
assigned the symbol H (for high byte).
 The name of the 16-bit register is PC. The symbol PC (0−7) or PC(L) refers to the low-
order byte and PC(8−15) or PC(H) to the high-order byte.
 Information transfer from one register to another is designated in symbolic form by means of a replacement operator. The statement R2 ← R1 denotes a transfer of the content of register R1 into register R2.
 It designates a replacement of the content of R2 by the content of R1. By definition, the
content of the source register R1 does not change after the transfer.
 A statement that specifies a register transfer implies that circuits are available from the
outputs of the source register to the inputs of the destination register and that the
destination register has a parallel load capability. Normally, we want the transfer to occur only under a predetermined control condition. This can be shown by means of an
if-then statement.

If (P = 1) then (R2 ← R1)

 where P is a control signal generated in the control section. It is sometimes convenient to


separate the control variables from the register transfer operation by specifying a control
function. A control function is a Boolean variable that is equal to 1 or 0. The control
function is included in the statement as follows

P: R2 ← R1

 The control condition is terminated with a colon. It symbolizes the requirement that the
transfer operation be executed by the hardware only if P = 1.
 Every statement written in a register transfer notation implies a hardware construction for
implementing the transfer. Figure 2.2 shows the block diagram that depicts the transfer
from R1 to R2.
 The letter n will be used to indicate any number of bits for the register. It will be replaced
by an actual number when the length of the register is known. Register R2 has a load
input that is activated by the control variable P.
 It is assumed that the control variable is synchronized with the same clock as the one
applied to the register.
 As shown in the timing diagram, P is activated in the control section by the rising edge of
a clock pulse at time t.
 The next positive transition of the clock at time t + 1 finds the load input active and the
data inputs of R2 are then loaded into the register in parallel.
 P may go back to 0 at time t + 1; otherwise, the transfer will occur with every clock pulse
transition while P remains active.
 Note that the clock is not included as a variable in the register transfer statements. It is
assumed that all transfers occur during a clock edge transition.
 Even though the control condition such as P becomes active just after time t, the actual transfer does not occur until the register is triggered by the next positive transition of the clock at time t + 1.
 The basic symbols of the register transfer notation are listed in Table 2-1. Registers are
denoted by capital letters, and numbers may follow the letters.
 Parentheses are used to denote a part of a register by specifying the range of bits or by
giving a symbol name to a portion of a register. The arrow denotes a transfer of
information and the direction of transfer. A comma is used to separate two or more
operations that are executed at the same time.

T: R2 ← R1, R1 ←R2

The statement denotes an operation that exchanges the contents of two registers during
one common clock pulse provided that T = 1. This simultaneous operation is possible
with registers that have edge-triggered flip-flops.

HARDWIRED CONTROL:
 To execute instructions, the processor must have some means of generating the control
signals needed in the proper sequence.
 Computer designers use a wide variety of techniques to solve this problem. The
approaches used fall into one of two categories:
 hardwired control and micro programmed control.
 Consider the sequence of control signals given in Figure 7. Each step in this sequence is
completed in one clock period.

Fig 7
 A counter may be used to keep track of the control steps, as shown in Figure 11. Each
state, or count, of this counter corresponds to one control step. The required control
signals are determined by the following information:
1. Contents of the control step counter
2. Contents of the instruction register
3. Contents of the condition code flags
4. External input signals, such as MFC and interrupt requests

Fig 11
 To gain insight into the structure of the control unit, we start with a simplified view of the
hardware involved.
 The decoder/encoder block in Figure 11 is a combinational circuit that generates the
required control outputs, depending on the state of all its inputs. By separating the
decoding and encoding functions, we obtain the more detailed block diagram in Figure
12.
 The step decoder provides a separate signal line for each step, or time slot, in the control
sequence.
 Similarly, the output of the instruction decoder consists of a separate line for each
machine instruction.
 For any instruction loaded in the IR, one of the output lines INS1 through INSm is set to
1, and all other lines are set to 0.
 The input signals to the encoder block in Figure 12 are combined to generate the
individual control signals Yin,PCout, Add, End, and so on.
 An example of how the encoder generates the Zin control signal for the processor organization in Figure 2 is given in Figure 13. This circuit implements the logic function Zin = T1 + T6 · ADD + T4 · BR + …
 This signal is asserted during time slot T1 for all instructions, during T6 for an Add instruction, during T4 for an unconditional branch instruction, and so on. The logic function for Zin is derived from the control sequences in Figures 7 and 8.
 As another example, Figure 14 gives a circuit that generates the End control signal from the logic function

End = T7 · ADD + T5 · BR + (T5 · N + T4 · N′) · BRN + …

where N′ denotes the complement of the condition-code flag N.
 The End signal starts a new instruction fetch cycle by resetting the control step counter to
its starting value. Figure 12 contains another control signal called RUN. When set to 1, RUN causes the counter to be incremented by one at the end of every clock cycle. When
RUN is equal to 0, the counter stops counting. This is needed whenever the WMFC signal is
issued, to cause the processor to wait for the reply from the memory

Fig 13a

 The control hardware shown can be viewed as a state machine that changes from one
state to another in every clock cycle, depending on the contents of the instruction register,
the condition codes, and the external inputs.
 The outputs of the state machine are the control signals. The sequence of operations
carried out by this machine is determined by the wiring of the logic elements, hence the
name "hardwired." A controller that uses this approach can operate at high speed.
However, it has little flexibility, and the complexity of the instruction set it can
implement is limited.

Fig 13 b

Fetching a word from memory:


To fetch a word from memory, the CPU transfers the address of the needed word to the memory address register (MAR). From the MAR, the address is transferred to the primary memory over the memory bus.

o In the meantime, the CPU uses the control lines of the memory bus to mention that a read
operation is needed.
o After issuing this request, the CPU waits till it retains an answer from the memory,
informing it that the required function has been finished. It is accomplished through the
use of another control signal on the memory bus, which will be denoted as Memory
Function Completed (MFC).
o The memory sets this signal to one to mention that the contents of the particular location
in the memory have been read and are available on the data lines of the memory bus.
o We will suppose that as soon as the MFC signal is set to one, the information on the data
lines is loaded into

MDR and is therefore available for use inside the CPU. It finishes the memory fetch operation.

The actions required for instruction Move (R1), R2 are:

o MAR ← [R1]
o Begin Read operation on the memory bus
o Wait for the response of the MFC from the memory
o Load MDR from the memory bus
o R2 ← [MDR]

The control signals activated for this transfer, in order, are:

o R1out, MARin, Read
o MDRinE, WMFC
o MDRout, R2in

Execution of Complete Instruction



MICROPROGRAMMED CONTROL
• Control-signals are generated by a program similar to machine language programs.
• Control word(CW) is a word whose individual bits represent various control-signals(like
Add, End, Zin). {Each of the control-steps in control sequence of an instruction defines a
unique combination of 1s & 0s in the CW}.
• Individual control-words in microroutine are referred to as microinstructions.
• A sequence of CWs corresponding to control-sequence of a machine instruction constitutes the
microroutine.
• The microroutines for all instructions in the instruction-set of a computer are stored in a special memory called the control store (CS).
• Control-unit generates control-signals for any instruction by sequentially reading CWs of the corresponding microroutine from CS.
• Microprogram counter(µPC) is used to read CWs sequentially from CS.
• Every time a new instruction is loaded into IR, output of "starting address generator" is loaded
into µPC.
• Then, µPC is automatically incremented by clock causing successive microinstructions to be
read from CS. Hence, control-signals are delivered to various parts of processor in correct
sequence.]

ORGANIZATION OF MICROPROGRAMMED CONTROL UNIT (TO SUPPORT


CONDITIONAL BRANCHING)
• In case of conditional branching, microinstructions specify which of the external inputs, condition-codes should be checked as a condition for branching to take place.
• The starting and branch address generator block loads a new address into µPC when a microinstruction instructs it to do so.
• To allow implementation of a conditional branch, inputs to this block consist of
→ external inputs and condition-codes
→ contents of IR
• µPC is incremented every time a new microinstruction is fetched from microprogram memory except in the following situations:
i) When a new instruction is loaded into IR, µPC is loaded with the starting-address of the microroutine for that instruction.
ii) When a Branch microinstruction is encountered and the branch condition is satisfied, µPC is loaded with the branch-address.
iii) When an End microinstruction is encountered, µPC is loaded with the address of the first CW in the microroutine for the instruction fetch cycle.

MICROINSTRUCTIONS

• Drawbacks of microprogrammed control:


1) Assigning individual bits to each control-signal results in long microinstructions
because the number of required signals is usually large.
2) Available bit-space is poorly used because only a few bits are set to 1 in any given
microinstruction.
• Solution: Signals can be grouped because
1) Most signals are not needed simultaneously.
2) Many signals are mutually exclusive.
• Grouping control-signals into fields requires a little more hardware because decoding-circuits
must be used to decode bit patterns of each field into individual control signals.
• Advantage: This method results in a smaller control-store (only 20 bits are needed to store the patterns for the 42 signals).
Vertical organization: Highly encoded schemes that use compact codes to specify only a small number of control functions in each microinstruction are referred to as a vertical organization. This approach results in considerably slower operating speeds because more microinstructions are needed to perform the desired control functions.

Horizontal organization: The minimally encoded scheme in which many resources can be controlled with a single microinstruction is called a horizontal organization. This approach is useful when a higher operating speed is desired and when the machine structure allows parallel use of resources.
Microinstruction

MICROPROGRAM SEQUENCING
• Two major disadvantages of microprogrammed control are:
1) Having a separate microroutine for each machine instruction results in a large total number of microinstructions and a large control-store.
2) Execution time is longer because it takes more time to carry out the required branches.
 Consider the instruction Add src,Rdst; which adds the source-operand to the contents of Rdst and places the sum in Rdst.
 Let the source-operand be specified in the following addressing modes: register, autoincrement, autodecrement and indexed, as well as the indirect forms of these 4 modes.
 Each box in the chart corresponds to a microinstruction that controls the transfers and operations indicated within the box.
 The microinstruction is located at the address indicated by the octal number (001, 002).
WIDE BRANCH ADDRESSING
• The instruction-decoder (InstDec) generates the starting-address of the microroutine that implements the instruction that has just been loaded into the IR.
• Here, register IR contains the Add instruction, for which the instruction decoder generates
the microinstruction address 101. (However, this address cannot be loaded as is into the
μPC).
The source-operand can be specified in any of several addressing-modes. The bit-ORing technique can be used to modify the starting-address generated by the instruction-decoder to reach the appropriate path.
Use of WMFC
• WMFC signal is issued at location 112 which causes a branch to the microinstruction in
location 171.
• WMFC signal means that the microinstruction may take several clock cycles to complete.
If the branch is allowed to happen in the first clock cycle, the microinstruction at location
171 would be fetched and executed prematurely. To avoid this problem, WMFC signal
must inhibit any change in the contents of the μPC during the waiting-period.
Detailed Examination
• Consider Add (Rsrc)+,Rdst; which adds Rsrc content to Rdst content, then stores the
sum in Rdst and finally increments Rsrc by 4 (i.e. auto-increment mode).
• In bit 10 and 9, bit-patterns 11, 10, 01 and 00 denote indexed, auto-decrement, auto-
increment and register modes respectively. For each of these modes, bit 8 is used to
specify the indirect version.
• The processor has 16 registers that can be used for addressing purposes; each is specified using a 4-bit code.
• There are 2 stages of decoding:
1) The microinstruction field must be decoded to determine that an Rsrc or Rdst register
is involved.
2) The decoded output is then used to gate the contents of the Rsrc or Rdst fields in
the IR into a second decoder, which produces the gating-signals for the actual
registers R0 to R15.
MICROINSTRUCTIONS WITH NEXT-ADDRESS FIELDS

• The microprogram requires several branch microinstructions which perform no useful


operation. Thus, they detract from the operating speed of the computer.
• Solution: Include an address-field as a part of every microinstruction to indicate the
location of the next microinstruction to be fetched. (This means every microinstruction
becomes a branch microinstruction).
• The flexibility of this approach comes at the expense of additional bits for the address-field.
• Advantage: Separate branch microinstructions are virtually eliminated. There are few
limitations in assigning addresses to microinstructions. There is no need for a counter to
keep track of sequential addresses. Hence, the μPC is replaced with a μAR
(Microinstruction Address Register). {which is loaded from the next-address field in each
microinstruction}.
• The next-address bits are fed through the OR gate to the μAR, so that the address can be
modified on the basis of the data in the IR, external inputs and condition-codes.
• The decoding circuits generate the starting-address of a given microroutine on the basis of the
opcode in the IR.

PREFETCHING MICROINSTRUCTIONS
• Drawback of microprogrammed control: Slower operating speed because of the time it takes to fetch microinstructions from the control-store.
• Solution: Faster operation is achieved if the next microinstruction is pre-fetched while the current one is being executed.
Emulation
• The main function of microprogrammed control is to provide a means for simple, flexible and relatively inexpensive execution of machine instructions.
• Its flexibility in using a machine's resources allows diverse classes of instructions to be implemented.
• Suppose we add to the instruction repertoire of a given computer M1 an entirely new set of instructions that is in fact the instruction-set of a different computer M2.
• Programs written in the machine language of M2 can then be run on computer M1, i.e. M1 emulates M2.
• Emulation allows us to replace obsolete equipment with more up-to-date machines.
• If the replacement computer fully emulates the original one, then no software changes have to be made to run existing programs.
• Emulation is easiest when the machines involved have similar architectures.
UNIT-III PROCESS MANAGEMENT
1. Process Concept
The Process
A process is a program in execution. A process is more than the program code, which is sometimes known as the text section. It also includes the current activity, as represented by the value of the program counter and the contents of the processor's registers. A process generally also includes the process stack, which contains temporary data (such as function parameters, return addresses, and local variables), and a data section, which contains global variables. A process may also include a heap, which is memory that is dynamically allocated during process run time. The structure of a process in memory is shown in Figure.

A program is a passive entity, such as a file containing a list of instructions stored on disk (often called an executable file), whereas a process is an active entity, with a program counter specifying the next instruction to execute and a set of associated resources. A program becomes a process when an executable file is loaded into memory.
Process State
As a process executes, it changes state. The state of a process is defined in part by the current
activity of that process. Each process may be in one of the following states:
• New. The process is being created.
• Running. Instructions are being executed.
• Waiting. The process is waiting for some event to occur (such as an I/O completion or reception of a
signal).
• Ready. The process is waiting to be assigned to a processor.
• Terminated. The process has finished execution.
It is important to realize that only one process can be running on any processor at any instant.
Many processes may be ready and waiting, however. The state diagram corresponding to these states
is presented in Figure.
Process Control Block
Each process is represented in the operating system by a process control block(PCB)—also
called a task control block. A PCB is shown in Figure . It contains many pieces of information associated
with a specific process, including these:

• Process state. The state may be new, ready, running, waiting, halted, and so on.
• Program counter. The counter indicates the address of the next instruction to be executed for this
process.
• CPU registers. The registers vary in number and type, depending on the computer architecture. They
include accumulators, index registers, stack pointers, and general-purpose registers, plus any condition-
code information. Along with the program counter, this state information must be saved when an
interrupt occurs, to allow the process to be continued correctly afterward.
• CPU-scheduling information. This information includes a process priority, pointers to scheduling
queues, and any other scheduling parameters.
• Memory-management information. This information may include such information as the value of
the base and limit registers, the page tables, or the segment tables, depending on the memory system used
by the operating system.
• Accounting information. This information includes the amount of CPU and real time used,
time limits, account numbers, job or process numbers, and so on.
• I/O status information. This information includes the list of I/O devices allocated to the process, a list
of open files, and so on. In brief, the PCB simply serves as the repository for any information that may
vary from process to process.

CPU Switch from Process to Process


2. Process Scheduling
The objective of multiprogramming is to have some process running at all times, to maximize
CPU utilization. The objective of time sharing is to switch the CPU among processes so frequently that
users can interact with each program while it is running. To meet these objectives, the process scheduler
selects an available process (possibly from a set of several available processes) for program execution on
the CPU. For a single-processor system, there will never be more than one running process. If there are
more processes, the rest will have to wait until the CPU is free and can be rescheduled.
Scheduling Queues
As processes enter the system, they are put into a job queue, which consists of all processes in
the system. The processes that are residing in main memory and are ready and waiting to execute are kept
on a list called the ready queue. This queue is generally stored as a linked list. A ready-queue header
contains pointers to the first and final PCBs in the list. Each PCB includes a pointer field that points to the next PCB in the ready queue.
The system also includes other queues. When a process is allocated the CPU, it executes for a while and eventually quits, is interrupted, or waits for the occurrence of a particular event, such as the completion of an I/O request. Suppose the process makes an I/O request to a shared device, such as a disk. Since there are many processes in the system, the disk may be busy with the I/O request of some other process. The process therefore may have to wait for the disk. The list of processes waiting for a particular I/O device is called a device queue. Each device has its own device queue (Figure).

The ready queue and various I/O device queues.


A common representation for a discussion of process scheduling is a queueing diagram, such as that in Figure. Each rectangular box represents a queue. Two types of queues are present: the ready queue and a set of device queues. The circles represent the resources that serve the queues, and the arrows indicate the flow of processes in the system.
Queueing-diagram representation of process scheduling.
A new process is initially put in the ready queue. It waits there until it is selected for execution, or is dispatched. Once the process is allocated the CPU and is executing, one of several events could occur:
• The process could issue an I/O request and then be placed in an I/O queue.
• The process could create a new subprocess and wait for the subprocess's termination.
• The process could be removed forcibly from the CPU, as a result of an interrupt, and be put back in the ready queue.
Schedulers
A process migrates among the various scheduling queues throughout its lifetime. The operating system must select, for scheduling purposes, processes from these queues in some fashion. The selection process is carried out by the appropriate scheduler.
The long-term scheduler, or job scheduler, selects processes from this pool and loads them into memory for execution. The short-term scheduler, or CPU scheduler, selects from among the processes that are ready to execute and allocates the CPU to one of them.
The primary distinction between these two schedulers lies in frequency of execution. The short-term scheduler must select a new process for the CPU frequently. A process may execute for only a few milliseconds before waiting for an I/O request. Often, the short-term scheduler executes at least once every 100 milliseconds. Because of the short time between executions, the short-term scheduler must be fast. The long-term scheduler executes much less frequently; minutes may separate the creation of one new process and the next. The long-term scheduler controls the degree of multiprogramming.
It is important that the long-term scheduler make a careful selection. In general, most processes can be described as either I/O bound or CPU bound. An I/O-bound process is one that spends more of its time doing I/O than it spends doing computations. A CPU-bound process, in contrast, generates I/O requests infrequently, using more of its time doing computations. It is important that the long-term scheduler select a good process mix of I/O-bound and CPU-bound processes.
Addition of medium-term scheduling to the queueing diagram.

If all processes are I/O bound, the ready queue will almost always be empty, and the short-term scheduler will have little to do. If all processes are CPU bound, the I/O waiting queue will almost always be empty, devices will go unused, and again the system will be unbalanced. The system with the best performance will thus have a combination of CPU-bound and I/O-bound processes.
Some operating systems, such as time-sharing systems, may introduce an additional, intermediate level of scheduling. This medium-term scheduler is diagrammed in Figure. The key idea behind a medium-term scheduler is that sometimes it can be advantageous to remove processes from memory (and from active contention for the CPU) and thus reduce the degree of multiprogramming. Later, the process can be reintroduced into memory, and its execution can be continued where it left off. This scheme is called swapping. The process is swapped out, and is later swapped in, by the medium-term scheduler. Swapping may be necessary to improve the process mix or because a change in memory requirements has overcommitted available memory, requiring memory to be freed up.
Context Switch
An interrupt causes the operating system to change a CPU from its current task and to run a kernel routine. Such operations happen frequently on general-purpose systems. When an interrupt occurs, the system needs to save the current context of the process currently running on the CPU so that it can restore that context when its processing is done, essentially suspending the process and then resuming it.
The context is represented in the PCB of the process; it includes the value of the CPU registers, the process state (see Figure), and memory-management information. Generically, we perform a state save of the current state of the CPU, be it in kernel or user mode, and then a state restore to resume operations.
Switching the CPU to another process requires performing a state save of the current process and a state restore of a different process. This task is known as a context switch. When a context switch occurs, the kernel saves the context of the old process in its PCB and loads the saved context of the new process scheduled to run. Context-switch time is pure overhead, because the system does no useful work while switching.
3. Operations on Processes
The processes in most systems can execute concurrently, and they may be created and deleted
dynamically. Thus, these systems must provide a mechanism for process creation and termination.
Process Creation
A process may create several new processes, via a create-process system call, during the course of
execution. The creating process is called a parent process, and the new processes are called the children
of that process. Each of these
new processes may in turn create other processes, forming a tree of processes.
Most operating systems identify processes according to a unique process identifier (or pid),
which is typically an integer number. Figure illustrates a typical process tree for the Solaris operating
system, showing the name of each process and its pid. In Solaris, the process at the top of the tree is the
sched process, with pid of 0. The sched process creates several children processes, including pageout and fsflush. These processes are responsible for managing memory and file systems. The sched process also creates the init process, which serves as the root parent process for all user processes.

A tree of processes on a typical Solaris system


In general, a process will need certain resources (CPU time, memory, files, I/O devices) to
accomplish its task. When a process creates a subprocess, that subprocess may be able to obtain its
resources directly from the operating system, or it may be constrained to a subset of the resources of the
parent process. The parent may have to partition its resources among its children, or it may be able to
share some resources (such as memory or files) among several of its children. Restricting a child process
to a subset of the parent's resources prevents any process from overloading the system by creating too
many subprocesses.
When a process creates a new process, two possibilities exist in terms of execution:

1. The parent continues to execute concurrently with its children.


2. The parent waits until some or all of its children have terminated.
There are also two possibilities in terms of the address space of the new process:
1. The child process is a duplicate of the parent process (it has the same program and data as the parent).
2. The child process has a new program loaded into it.
Process Termination
A process terminates when it finishes executing its final statement and asks the operating system to delete it by using the exit() system call. At that point, the process may return a status value (typically an integer) to its parent process (via the wait() system call). All the resources of the process—including physical and virtual memory, open files, and I/O buffers—are deallocated by the operating system.
A parent may terminate the execution of one of its children for a variety of reasons, such as these:
• The child has exceeded its usage of some of the resources that it has been allocated. (To determine whether this has occurred, the parent must have a mechanism to inspect the state of its children.)
• The task assigned to the child is no longer required.
• The parent is exiting, and the operating system does not allow a child to continue if its parent terminates.
Some systems, including VMS, do not allow a child to exist if its parent has terminated. In such systems, if a process terminates (either normally or abnormally), then all its children must also be terminated. This phenomenon, referred to as cascading termination, is normally initiated by the operating system.
To illustrate process execution and termination, consider that, in UNIX, we can terminate a process by using the exit() system call; its parent process may wait for the termination of a child process by using the wait() system call. The wait() system call returns the process identifier of a terminated child so that the parent can tell which of its possibly many children has terminated.
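To make this concrete, the following is a minimal sketch (not from the original text) of a POSIX parent process that forks a child, loads a new program into the child with exec(), and then waits for it to terminate; the /bin/ls program is only an illustrative choice.

/* Minimal sketch (POSIX): fork a child, exec a new program in it,
 * and wait in the parent for the child to terminate. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();                        /* create a child process */

    if (pid < 0) {                             /* fork failed */
        perror("fork");
        return 1;
    } else if (pid == 0) {                     /* child: a duplicate of the parent */
        execlp("/bin/ls", "ls", (char *)NULL); /* load a new program into the child */
        perror("exec");                        /* reached only if exec fails */
        exit(1);
    } else {                                   /* parent: wait for the child */
        int status;
        pid_t done = wait(&status);
        printf("child %d terminated\n", (int)done);
    }
    return 0;
}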
4. Inter process Communication
Processes executing concurrently in the operating system may be either independent processes or
cooperating processes. A process is independent if it cannot affect or be affected by the other processes
executing in the system. Any process that does not share data with any other process is independent.
A process is cooperating if it can affect or be affected by the other processes executing in the system.
Clearly, any process that shares data with other processes is a cooperating process.
There are several reasons for providing an environment that allows process cooperation:
• Information sharing. Since several users may be interested in the same piece of information (for
instance, a shared file), we must provide an environment to allow concurrent access to such
information.
• Computation speedup. If we want a particular task to run faster, we must break it into subtasks,
each of which will be executing in parallel with the others.
• Modularity. We may want to construct the system in a modular fashion, dividing the system
functions into separate processes or threads.
• Convenience. Even an individual user may work on many tasks at the same time. For instance, a user
may be editing, printing, and compiling in parallel.
Cooperating processes require an interprocess communication (IPC) mechanism that will allow them to exchange data and information. There are two fundamental models of interprocess communication: (1) shared memory and (2) message passing.
In the shared-memory model, a region of memory that is shared by cooperating processes is established. Processes can then exchange information by reading and writing data to the shared region. In the message-passing model, communication takes place by means of messages exchanged between the cooperating processes. The two communication models are contrasted in Figure.

Communications models, (a) Message passing, (b) Shared memory.

Both of the models just discussed are common in operating systems, and many systems
implement both. Message passing is useful for exchanging smaller amounts of data, because no conflicts
need be avoided. Message passing is also easier to implement than is shared memory for intercomputer
communication. Shared memory allows maximum speed and convenience of communication, as it can be
done at memory speeds when within a computer.
Shared memory is faster than message passing, as message-passing systems are typically implemented using system calls and thus require the more time-consuming task of kernel intervention. In contrast, in shared-memory systems, system calls are required only to establish shared-memory regions. Once shared memory is established, all accesses are treated as routine memory accesses, and no assistance from the kernel is required.
Shared-Memory Systems
Interprocess communication using shared memory requires communicating processes to establish a region of shared memory. Typically, a shared-memory region resides in the address space of the process creating the shared-memory segment. Other processes that wish to communicate using this shared-memory segment must attach it to their address space. They can then exchange information by reading and writing data in the shared areas. The form of the data and the location are determined by these processes and are not under the operating system's control. The processes are also responsible for ensuring that they are not writing to the same location simultaneously.
To illustrate the concept of cooperating processes, let's consider the producer-consumer problem, which is a common paradigm for cooperating processes. A producer process produces information that is consumed by a consumer process.
Message-Passing Systems
Message passing provides a mechanism to allow processes to communicate and to synchronize
their actions without sharing the same address space and is particularly useful in a distributed
environment, where the communicating
processes may reside on different computers connected by a network.
A message-passing facility provides at least two operations: send(message) and receive(message).
Messages sent by a process can be of either fixed or variable size. If only fixed-sized messages can be
sent, the system-level implementation is straightforward. This restriction, however, makes the task of
programming more difficult. Conversely, variable-sized messages require a more complex system-level
implementation, but the programming task becomes simpler. This is a common kind of tradeoff seen
throughout operating system design.

If processes P and Q want to communicate, they must send messages to and receive messages
from each other; a communication link must exist between them. This link can be implemented in a
variety of ways. Here are several methods for logically implementing a link and the send()/receive() operations:
• Direct or indirect communication
• Synchronous or asynchronous communication
• Automatic or explicit buffering
We look at issues related to each of these features next.
Naming
Processes that want to communicate must have a way to refer to each other. They can use either direct or indirect communication. Under direct communication, each process that wants to communicate must explicitly name the recipient or sender of the communication. In this scheme, the send() and receive() primitives are defined as:
• send(P, message)—Send a message to process P.
• receive (Q, message)—Receive a message from process Q.

• A link is established automatically between every pair of processes that want to communicate. The
processes need to know only each other's identity to communicate.
• A link is associated with exactly two processes.
• Between each pair of processes, there exists exactly one link.
This scheme exhibits symmetry in addressing; that is, both the sender process and the receiver process
must name the other to communicate. A variant of this scheme employs asymmetry in addressing. Here,
only the sender names the recipient; the recipient is not required to name the sender. In this scheme, the send() and receive() primitives are defined as follows:
• send(P, message)—Send a message to process P.
• receive(id, message)—Receive a message from any process; the variable id is set to the name of the
process with which communication has taken place.
With indirect communication, the messages are sent to and received from mailboxes, or ports. A
mailbox can be viewed abstractly as an object into which messages can be placed by processes and from
which messages can be removed.
Each mailbox has a unique identification.
Two processes can communicate only if the processes have a shared mailbox. The send() and receive() primitives are defined as follows:
• send(A, message)—Send a message to mailbox A.
• receive(A, message)—Receive a message from mailbox A.

In this scheme, a communication link has the following properties:
• A link is established between a pair of processes only if both members of the pair have a shared mailbox.
• A link may be associated with more than two processes.
• Between each pair of communicating processes, there may be a number of different links, with each link corresponding to one mailbox.
In contrast, a mailbox that is owned by the operating system has an existence of its own. It is independent and is not attached to any particular process. The operating system then must provide a mechanism that allows a process to do the following:

• Create a new mailbox.


• Send and receive messages through the mailbox.
• Delete a mailbox.
The process that creates a new mailbox is that mailbox's owner by default. Initially, the owner is the only process that can receive messages through this mailbox. However, the ownership and receiving privilege may be passed to other processes through appropriate system calls. Of course, this provision could result in multiple receivers for each mailbox.
Synchronization
Communication between processes takes place through calls to send() and receive() primitives. There are different design options for implementing each primitive. Message passing may be either blocking or nonblocking, also known as synchronous and asynchronous.
• Blocking send. The sending process is blocked until the message is received by the receiving process or by the mailbox.
• Nonblocking send. The sending process sends the message and resumes operation.
• Blocking receive. The receiver blocks until a message is available.
• Nonblocking receive. The receiver retrieves either a valid message or a null.
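The following is a minimal sketch of one blocking exchange using POSIX message queues; the queue name /demo_mq, the message size, and the queue attributes are illustrative choices, and on Linux such a program is typically linked with -lrt. mq_receive() blocks by default, so it behaves as a blocking receive.

/* Sketch of message passing with POSIX message queues. */
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 64 };
    mqd_t mq = mq_open("/demo_mq", O_CREAT | O_RDWR, 0600, &attr);
    if (mq == (mqd_t)-1) { perror("mq_open"); return 1; }

    const char *msg = "hello";
    mq_send(mq, msg, strlen(msg) + 1, 0);      /* send(message) */

    char buf[64];
    mq_receive(mq, buf, sizeof buf, NULL);     /* blocking receive(message) */
    printf("received: %s\n", buf);

    mq_close(mq);
    mq_unlink("/demo_mq");
    return 0;
}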
Buffering
Whether communication is direct or indirect, messages exchanged by communicating processes reside in a temporary queue. Basically, such queues can be implemented in three ways:
• Zero capacity. The queue has a maximum length of zero; thus, the link cannot have any messages waiting in it. In this case, the sender must block until the recipient receives the message.
• Bounded capacity. The queue has finite length n; thus, at most n messages can reside in it. If the queue is not full when a new message is sent, the message is placed in the queue (either the message is copied or a pointer to the message is kept), and the sender can continue execution without waiting. The link's capacity is finite, however. If the link is full, the sender must block until space is available in the queue.
• Unbounded capacity. The queue's length is potentially infinite; thus, any number of messages can wait in it. The sender never blocks.
The zero-capacity case is sometimes referred to as a message system with no buffering; the other cases are referred to as systems with automatic buffering.
5. Overview of Threads
A thread is a basic unit of CPU utilization; it comprises a thread ID, a program counter, a register set, and a stack. It shares with other threads belonging to the same process its code section, data section, and other operating-system resources, such as open files and signals. A traditional (or heavyweight) process has a single thread of control. If a process has multiple threads of control, it can perform more than one task at a time. Figure illustrates the difference between a traditional single-threaded process and a multithreaded process.

Single-threaded and multithreaded processes.

The benefits of multithreaded programming can be broken down into four major categories:
1. Responsiveness. Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user. For instance, a multithreaded web browser could still allow user interaction in one thread while an image was being loaded in another thread.
2. Resource sharing. By default, threads share the memory and the resources of the process to which they belong. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.
3. Economy. Allocating memory and resources for process creation is costly. Because threads share resources of the process to which they belong, it is more economical to create and context-switch threads. Empirically gauging the difference in overhead can be difficult, but in general it is much more time consuming to create and manage processes than threads.
4. Utilization of multiprocessor architectures. The benefits of multithreading can be greatly increased in a multiprocessor architecture, where threads may be running in parallel on different processors. A single-threaded process can run on only one CPU, no matter how many are available. Multithreading on a multi-CPU machine increases concurrency.
Multithreading Models
Support for threads may be provided either at the user level, for user threads, or by the kernel, for kernel threads. User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system.
Ultimately, there must exist a relationship between user threads and kernel threads. In this section, we look at three common ways of establishing this relationship.
Many-to-One Model
The many-to-one model maps many user-level threads to one kernel thread. Thread management is done by the thread library in user space, so it is efficient; but the entire process will block if a thread makes a blocking system call. Also, because only one thread can access the kernel at a time, multiple threads are unable to run in parallel on multiprocessors. Green threads—a thread library available for Solaris—uses this model, as does GNU Portable Threads.

One-to-One Model
The one-to-one model maps each user thread to a kernel thread. It provides more concurrency than the many-to-one model by allowing another thread to run when a thread makes a blocking system call; it also allows multiple threads to run in parallel on multiprocessors. The only drawback to this model is that creating a user thread requires creating the corresponding kernel thread. Because the overhead of creating kernel threads can burden the performance of an application, most implementations of this model restrict the number of threads supported by the system. Linux, along with the family of Windows operating systems (including Windows 95, 98, NT, 2000, and XP), implements the one-to-one model.
Many-to-Many Model
The many-to-many model multiplexes many user-level threads to a smaller or equal number of kernel threads. The number of kernel threads may be specific to either a particular application or a particular machine (an application may be allocated more kernel threads on a multiprocessor than on a uniprocessor). Whereas the many-to-one model allows the developer to create as many user threads as she wishes, true concurrency is not gained because the kernel can schedule only one thread at a time. The one-to-one model allows for greater concurrency, but the developer has to be careful not to create too many threads within an application (and in some instances may be limited in the number of threads she can create). The many-to-many model suffers from neither of these shortcomings: developers can create as many user threads as necessary, and the corresponding kernel threads can run in parallel on a multiprocessor. Also, when a thread performs a blocking system call, the kernel can schedule another thread for execution.
One popular variation on the many-to-many model still multiplexes many user-level threads to a smaller or equal number of kernel threads but also allows a user-level thread to be bound to a kernel thread. This variation, sometimes referred to as the two-level model (Figure), is supported by operating systems such as IRIX, HP-UX, and Tru64 UNIX. The Solaris operating system supported the two-level model.

6. CPU Scheduling
CPU-scheduling decisions may take place under the following four circumstances:
1. When a process switches from the running state to the waiting state
2. When a process switches from the running state to the ready state
3. When a process switches from the waiting state to the ready state
4. When a process terminates
For situations 1 and 4, there is no choice in terms of scheduling. A new process (if one exists in the ready queue) must be selected for execution. There is a choice, however, for situations 2 and 3.
When scheduling takes place only under circumstances 1 and 4, we say that the scheduling scheme is nonpreemptive or cooperative; otherwise, it is preemptive. Under nonpreemptive scheduling, once the CPU has been allocated to a process, the process keeps the CPU until it releases the CPU either by terminating or by switching to the waiting state. This scheduling method was used by Microsoft Windows 3.x; Windows 95 introduced preemptive scheduling, and all subsequent versions of Windows operating systems have used preemptive scheduling.

Unfortunately, preemptive scheduling incurs a cost associated with access to shared data. Consider the case of two processes that share data. While one is updating the data, it is preempted so that the second process can run. The second process then tries to read the data, which are in an inconsistent state.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following:
• Switching context
• Switching to user mode
• Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The time it takes for the dispatcher to stop one process and start another running is known as the dispatch latency.
Scheduling Criteria
Different CPU scheduling algorithms have different properties, and the choice of a particular algorithm may favor one class of processes over another. Many criteria have been suggested for comparing CPU scheduling algorithms. The criteria include the following:
• CPU utilization. We want to keep the CPU as busy as possible. Conceptually, CPU utilization can range from 0 to 100 percent. In a real system, it should range from 40 percent (for a lightly loaded system) to 90 percent.
• Throughput. One measure of CPU work is the number of processes that are completed per time unit, called throughput. For long processes, this rate may be one process per hour; for short transactions, it may be 10 processes per second.
• Turnaround time. Turnaround time is how long it takes to execute a process: the interval from the time of submission of the process to the time of completion. It is the sum of the periods spent waiting to get into memory, waiting in the ready queue, executing on the CPU, and doing I/O.
• Waiting time. The amount of time that a process spends waiting in the ready queue. Waiting time is the sum of the periods spent waiting in the ready queue.
• Response time. The time from the submission of a request until the first response is produced. This measure, called response time, is the time it takes to start responding, not the time it takes to output the response. The turnaround time is generally limited by the speed of the output device.
It is desirable to maximize CPU utilization and throughput and to minimize turnaround time, waiting time, and response time. In most cases, we optimize the average measure. However, under some circumstances, it is desirable to optimize the minimum or maximum values rather than the average.
Scheduling Algorithms
CPU scheduling deals with the problem of deciding which of the processes in the ready queue is to be allocated the CPU. There are many different CPU scheduling algorithms. In this section, we describe several of them.
1. First-Come, First-Served Scheduling
The simplest CPU-scheduling algorithm is the first-come, first-served (FCFS) scheduling algorithm. With this scheme, the process that requests the CPU first is allocated the CPU first. The implementation of the FCFS policy is easily managed with a FIFO queue. When a process enters the ready queue, its PCB is linked onto the tail of the queue. When the CPU is free, it is allocated to the process at the head of the queue. The running process is then removed from the queue. The code for FCFS scheduling is simple to write and understand.
The average waiting time under the FCFS policy, however, is often quite long. Consider the following set of processes that arrive at time 0, with the length of the CPU burst given in milliseconds:
Process Burst Time
P1 24
P2 3
P3 3

If the processes arrive in the order P1, P2, P3, and are served in FCFS order, we get the result shown in the following Gantt chart:

P1 P2 P3
0 24 27 30
The waiting time is 0 milliseconds for process P1, 24 milliseconds for process P2, and 27 milliseconds for process P3. Thus, the average waiting time is (0 + 24 + 27)/3 = 17 milliseconds. If the processes arrive in the order P2, P3, P1, however, the results will be as shown in the following Gantt chart:

P2 P3 P1

0 3 6 30
The average waiting time is now (6 + 0 + 3)/3 = 3 milliseconds. This reduction is substantial. Thus, the average waiting time under an FCFS policy is generally not minimal and may vary substantially if the process's CPU burst times vary greatly.
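The arithmetic above can be reproduced with a short program; the sketch below (not part of the original text) computes FCFS waiting times for the burst times 24, 3, and 3 used in the example, with all processes arriving at time 0.

/* FCFS waiting times: each process waits for the total burst time of
 * every process ahead of it in the queue. */
#include <stdio.h>

int main(void)
{
    int burst[] = {24, 3, 3};              /* P1, P2, P3 in arrival order */
    int n = 3, wait = 0, total_wait = 0;

    for (int i = 0; i < n; i++) {
        printf("P%d waits %d ms\n", i + 1, wait);
        total_wait += wait;
        wait += burst[i];                  /* the next process waits this much more */
    }
    printf("average waiting time = %.2f ms\n", (double)total_wait / n);
    return 0;
}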
In addition, consider the performance of FCFS scheduling in a dynamic situation. Assume we have one CPU-bound process and many I/O-bound processes. As the processes flow around the system, the following scenario may result. The CPU-bound process will get and hold the CPU. During this time, all the other processes will finish their I/O and will move into the ready queue, waiting for the CPU. While the processes wait in the ready queue, the I/O devices are idle. Eventually, the CPU-bound process finishes its CPU burst and moves to an I/O device. All the I/O-bound processes, which have short CPU bursts, execute quickly and move back to the I/O queues. At this point, the CPU sits idle. The CPU-bound process will then move back to the ready queue and be allocated the CPU. Again, all the I/O processes end up waiting in the ready queue until the CPU-bound process is done.
There is a convoy effect as all the other processes wait for the one big process to get off the CPU. This effect results in lower CPU and device utilization than might be possible if the shorter processes were allowed to go first.
The FCFS scheduling algorithm is nonpreemptive. Once the CPU has been allocated to a process, that process keeps the CPU until it releases the CPU, either by terminating or by requesting I/O. The FCFS algorithm is thus particularly troublesome for time-sharing systems, where it is important that each user get a share of the CPU at regular intervals.
2. Shortest-Job-First Scheduling
The shortest-job-first (SJF) scheduling algorithm associates with each process the length of the process's next CPU burst. When the CPU is available, it is assigned to the process that has the smallest next CPU burst. If the next CPU bursts of two processes are the same, FCFS scheduling is used to break the tie. Note that a more appropriate term for this scheduling method would be the shortest-next-CPU-burst algorithm, because scheduling depends on the length of the next CPU burst of a process, rather than its total length. As an example of SJF scheduling, consider the following set of processes, with the length of the CPU burst given in milliseconds:
Process Burst Time
P1 6
P2 8
P3 7
P4 3
Using SJF scheduling, we would schedule these processes according to the following Gantt chart:

P4 P1 P3 P2
0 3 9 16 24
The waiting time is 3 milliseconds for process P1, 16 milliseconds for process P2, 9 milliseconds for process P3, and 0 milliseconds for process P4. Thus, the average waiting time is (3 + 16 + 9 + 0)/4 = 7 milliseconds. By comparison, if we were using the FCFS scheduling scheme, the average waiting time would be 10.25 milliseconds.

The SJF scheduling algorithm is provably optimal, in that it gives the minimum average waiting time for a given set of processes. Moving a short process before a long one decreases the waiting time of the short process more than it increases the waiting time of the long process. Consequently, the average waiting time decreases.
The real difficulty with the SJF algorithm is knowing the length of the next CPU request. There is no way to know the length of the next CPU burst. One approach is to try to approximate SJF scheduling. We may not know the length of the next CPU burst, but we may be able to predict its value. We expect that the next CPU burst will be similar in length to the previous ones. Thus, by computing an approximation of the length of the next CPU burst, we can pick the process with the shortest predicted CPU burst.
The next CPU burst is generally predicted as an exponential average of the measured lengths of previous CPU bursts. Let tn be the length of the nth CPU burst, and let Tn+1 be our predicted value for the next CPU burst. Then, for a, 0 <= a <= 1, define

Tn+1 = a * tn + (1 - a) * Tn

This formula defines an exponential average. The value of tn contains our most recent information; Tn stores the past history. The parameter a controls the relative weight of recent and past history in our prediction. If a = 0, then Tn+1 = Tn, and recent history has no effect (current conditions are assumed to be transient); if a = 1, then Tn+1 = tn, and only the most recent CPU burst matters (history is assumed to be old and irrelevant). More commonly, a = 1/2, so recent history and past history are equally weighted. The initial T0 can be defined as a constant or as an overall system average.
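A small sketch of this prediction rule follows; the value a = 0.5, the initial estimate T0 = 10, and the sequence of measured bursts are illustrative numbers only.

/* Exponential averaging: Tn+1 = a * tn + (1 - a) * Tn. */
#include <stdio.h>

double predict_next(double a, double measured_burst, double previous_estimate)
{
    return a * measured_burst + (1.0 - a) * previous_estimate;
}

int main(void)
{
    double a = 0.5, tau = 10.0;                 /* T0: initial guess */
    double measured[] = {6, 4, 6, 4, 13, 13, 13};

    for (int i = 0; i < 7; i++) {
        tau = predict_next(a, measured[i], tau);
        printf("after burst %d (t = %.0f): next prediction = %.2f\n",
               i + 1, measured[i], tau);
    }
    return 0;
}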
The SJF algorithm can be either preemptive or nonpreemptive. The choice arises when a new process arrives at the ready queue while a previous process is still executing. The next CPU burst of the newly arrived process may be shorter than what is left of the currently executing process. A preemptive SJF algorithm will preempt the currently executing process, whereas a nonpreemptive SJF algorithm will allow the currently running process to finish its CPU burst. Preemptive SJF scheduling is sometimes called shortest-remaining-time-first scheduling.
As an example, consider the following four processes, with the length of the CPU burst given in milliseconds:
Process Arrival Time Burst Time
P1 0 8
P2 1 4
P3 2 9
P4 3 5
If the processes arrive at the ready queue at the times shown and need the indicated burst times, then the resulting preemptive SJF schedule is as depicted in the following Gantt chart:

P1 P2 P4 P1 P3
0 1 5 10 17 26
Process P1 is started at time 0, since it is the only process in the queue. Process P2 arrives at time 1. The remaining time for process P1 (7 milliseconds) is larger than the time required by process P2 (4 milliseconds), so process P1 is preempted, and process P2 is scheduled. The average waiting time for this example is ((10 - 1) + (1 - 1) + (17 - 2) + (5 - 3))/4 = 26/4 = 6.5 milliseconds. Nonpreemptive SJF scheduling would result in an average waiting time of 7.75 milliseconds.
3 Priority Scheduling
The SJF algorithm is a special case of the general priority scheduling algorithm. A priority is associated with each process, and the CPU is allocated to the process with the highest priority. Equal-priority processes are scheduled in FCFS order. An SJF algorithm is simply a priority algorithm where the priority (p) is the inverse of the (predicted) next CPU burst. The larger the CPU burst, the lower the priority, and vice versa.
Note that we discuss scheduling in terms of high priority and low priority. Priorities are generally indicated by some fixed range of numbers, such as 0 to 7 or 0 to 4,095. However, there is no general agreement on whether 0 is the highest or lowest priority. Some systems use low numbers to represent low priority; others use low numbers for high priority. This difference can lead to confusion. In this text, we assume that low numbers represent high priority.
As an example, consider the following set of processes, assumed to have arrived at time 0, in the order P1, P2, ..., P5, with the length of the CPU burst given in milliseconds:
Process Burst Time Priority
P1 10 3
P2 1 1
P3 2 4
P4 1 5
P5 5 2

Using priority scheduling, we would schedule these processes according to the following Gantt chart:

P2 P5 P1 P3 P4
0 1 6 16 18 19

The average waiting time is 8.2 milliseconds.
Priorities can be defined either internally or externally. Internally defined priorities use some measurable quantity or quantities to compute the priority of a process.
Priority scheduling can be either preemptive or nonpreemptive. When a process arrives at the ready queue, its priority is compared with the priority of the currently running process. A preemptive priority scheduling algorithm will preempt the CPU if the priority of the newly arrived process is higher than the priority of the currently running process. A nonpreemptive priority scheduling algorithm will simply put the new process at the head of the ready queue.
A major problem with priority scheduling algorithms is indefinite blocking, or starvation. A process that is ready to run but waiting for the CPU can be considered blocked. A priority scheduling algorithm can leave some low-priority processes waiting indefinitely. In a heavily loaded computer system, a steady stream of higher-priority processes can prevent a low-priority process from ever getting the CPU. Generally, one of two things will happen. Either the process will eventually be run, or the computer system will eventually crash and lose all unfinished low-priority processes.
A solution to the problem of indefinite blockage of low-priority processes is aging. Aging is a technique of gradually increasing the priority of processes that wait in the system for a long time. For example, if priorities range from 127 (low) to 0 (high), we could increase the priority of a waiting process by 1 every 15 minutes. Eventually, even a process with an initial priority of 127 would have the highest priority in the system and would be executed. In fact, it would take no more than 32 hours for a priority-127 process to age to a priority-0 process.
4 Round-Robin Scheduling
The round-robin (RR) scheduling algorithm is designed especially for time-sharing systems. It is similar to FCFS scheduling, but preemption is added to switch between processes. A small unit of time, called a time quantum or time slice, is defined. A time quantum is generally from 10 to 100 milliseconds. The ready queue is treated as a circular queue. The CPU scheduler goes around the ready queue, allocating the CPU to each process for a time interval of up to 1 time quantum.
To implement RR scheduling, we keep the ready queue as a FIFO queue of processes. New processes are added to the tail of the ready queue. The CPU scheduler picks the first process from the ready queue, sets a timer to interrupt after 1 time quantum, and dispatches the process. One of two things will then happen. The process may have a CPU burst of less than 1 time quantum. In this case, the process itself will release the CPU voluntarily. The scheduler will then proceed to the next process in the ready queue. Otherwise, if the CPU burst of the currently running process is longer than 1 time quantum, the timer will go off and will cause an interrupt to the operating system. A context switch will be executed, and the process will be put at the tail of the ready queue. The CPU scheduler will then select the next process in the ready queue.

The average waiting time under the RR policy is often long. Consider the following set of processes that arrive at time 0, with the length of the CPU burst given in milliseconds:
Process Burst Time
P1 24
P2 3
P3 3
If we use a time quantum of 4 milliseconds, then process P1 gets the first 4 milliseconds. Since it requires another 20 milliseconds, it is preempted after the first time quantum, and the CPU is given to the next process in the queue, process P2. Since process P2 does not need 4 milliseconds, it quits before its time quantum expires. The CPU is then given to the next process, process P3. Once each process has received 1 time quantum, the CPU is returned to process P1 for an additional time quantum. The resulting RR schedule is

P1 P2 P3 P1 P1 P1 P1 P1
0 4 7 10 14 18 22 26 30
The average waiting time is 17/3 = 5.66 milliseconds.
In the RR scheduling algorithm, no process is allocated the CPU for more than 1 time quantum in a row (unless it is the only runnable process). If a process's CPU burst exceeds 1 time quantum, that process is preempted and is put back in the ready queue. The RR scheduling algorithm is thus preemptive. If there are n processes in the ready queue and the time quantum is q, then each process gets 1/n of the CPU time in chunks of at most q time units.
Each process must wait no longer than (n - 1) x q time units until its next time quantum. For example, with five processes and a time quantum of 20 milliseconds, each process will get up to 20 milliseconds every 100 milliseconds.
The performance of the RR algorithm depends heavily on the size of the time quantum. At one extreme, if the time quantum is extremely large, the RR policy is the same as the FCFS policy. If the time quantum is extremely small (say, 1 millisecond), the RR approach is called processor sharing and (in theory) creates the appearance that each of n processes has its own processor running at 1/n the speed of the real processor.
In software, we also need to consider the effect of context switching on the performance of RR scheduling. Let us assume that we have only one process of 10 time units. If the quantum is 12 time units, the process finishes in less than 1 time quantum, with no overhead. If the quantum is 6 time units, however, the process requires 2 quanta, resulting in a context switch. If the time quantum is 1 time unit, then nine context switches will occur, slowing the execution of the process accordingly.
Thus, we want the time quantum to be large with respect to the context-switch time. If the context-switch time is approximately 10 percent of the time quantum, then about 10 percent of the CPU time will be spent in context switching. In practice, most modern systems have time quanta ranging from 10 to 100 milliseconds. The time required for a context switch is typically less than 10 microseconds; thus, the context-switch time is a small fraction of the time quantum.

Time Quantum and Context Switch Time


Turnaround time also depends on the size of the time quantum. As we can see from Figure, the average turnaround time of a set of processes does not necessarily improve as the time-quantum size increases. In general, the average turnaround time can be improved if most processes finish their next CPU burst in a single time quantum. For example, given three processes of 10 time units each and a quantum of 1 time unit, the average turnaround time is 29. If the time quantum is 10, however, the average turnaround time drops to 20. If context-switch time is added in, the average turnaround time increases for a smaller time quantum, since more context switches are required. Although the time quantum should be large compared with the context-switch time, it should not be too large. If the time quantum is too large, RR scheduling degenerates to an FCFS policy. A rule of thumb is that 80 percent of the CPU bursts should be shorter than the time quantum.
5 Multilevel Queue Scheduling
Another class of scheduling algorithms has been created for situations in which processes are easily classified into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. In addition, foreground processes may have priority (externally defined) over background processes.
A multilevel queue scheduling algorithm partitions the ready queue into several separate queues (Figure). The processes are permanently assigned to one queue, generally based on some property of the process, such as memory size, process priority, or process type. Each queue has its own scheduling algorithm.

Multilevel queue scheduling


For example, separate queues might be used for foreground and background processes. The foreground queue might be scheduled by an RR algorithm, while the background queue is scheduled by an FCFS algorithm. In addition, there must be scheduling among the queues, which is commonly implemented as fixed-priority preemptive scheduling. For example, the foreground queue may have absolute priority over the background queue. Let's look at an example of a multilevel queue scheduling algorithm with five queues, listed below in order of priority:
1. System processes
2. Interactive processes
3. Interactive editing processes
4. Batch processes
5. Student processes
Each queue has absolute priority over lower-priority queues. No process in the batch queue, for example, could run unless the queues for system processes, interactive processes, and interactive editing processes were all empty. If an interactive editing process entered the ready queue while a batch process was running, the batch process would be preempted.
6 Multilevel Feedback-Queue Scheduling
Normally, when the multilevel queue scheduling algorithm is used, processes are permanently assigned to a queue when they enter the system. If there are separate queues for foreground and background processes, for example, processes do not move from one queue to the other, since processes do not change their foreground or background nature. This setup has the advantage of low scheduling overhead, but it is inflexible.
The multilevel feedback-queue scheduling algorithm, in contrast, allows a process to move between queues. The idea is to separate processes according to the characteristics of their CPU bursts. If a process uses too much CPU time, it will be moved to a lower-priority queue. This scheme leaves I/O-bound and interactive processes in the higher-priority queues. In addition, a process that waits too long in a lower-priority queue may be moved to a higher-priority queue. This form of aging prevents starvation.
For example, consider a multilevel feedback-queue scheduler with three queues, numbered from 0 to 2. The scheduler first executes all processes in queue 0. Only when queue 0 is empty will it execute processes in queue 1. Similarly, processes in queue 2 will be executed only if queues 0 and 1 are empty. A process that arrives for queue 1 will preempt a process in queue 2. A process in queue 1 will in turn be preempted by a process arriving for queue 0. A process entering the ready queue is put in queue 0. A process in queue 0 is given a time quantum of 8 milliseconds. If it does not finish within this time, it is moved to the tail of queue 1. If queue 0 is empty, the process at the head of queue 1 is given a quantum of 16 milliseconds. If it does not complete, it is preempted and is put into queue 2. Processes in queue 2 are run on an FCFS basis but are run only when queues 0 and 1 are empty.
This scheduling algorithm gives highest priority to any process with a CPU burst of 8 milliseconds or less. Such a process will quickly get the CPU, finish its CPU burst, and go off to its next I/O burst. Processes that need more than 8 but less than 24 milliseconds are also served quickly, although with lower priority than shorter processes. Long processes automatically sink to queue 2 and are served in FCFS order with any CPU cycles left over from queues 0 and 1.

Multilevel Feedback Queues

In general, a multilevel feedback-queue scheduler is defined by the following parameters:
• The number of queues
• The scheduling algorithm for each queue
• The method used to determine when to upgrade a process to a higher-priority queue
• The method used to determine when to demote a process to a lower-priority queue
• The method used to determine which queue a process will enter when that process needs service
The definition of a multilevel feedback-queue scheduler makes it the most general CPU-scheduling algorithm.
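As a rough illustration, these parameters could be captured in a configuration structure like the sketch below; the field names are hypothetical, and the three levels mirror the 8 ms / 16 ms / FCFS example given earlier.

/* Hypothetical configuration for a three-level feedback-queue scheduler. */
#include <limits.h>

#define NUM_QUEUES 3

struct mlfq_level {
    int quantum_ms;        /* time slice for this queue (INT_MAX ~ FCFS) */
    int demote_to;         /* queue a process moves to if it uses its full quantum */
    int promote_after_ms;  /* aging: promote after waiting this long (-1 = never) */
};

static const struct mlfq_level mlfq[NUM_QUEUES] = {
    { .quantum_ms = 8,       .demote_to = 1, .promote_after_ms = -1 },
    { .quantum_ms = 16,      .demote_to = 2, .promote_after_ms = -1 },
    { .quantum_ms = INT_MAX, .demote_to = 2, .promote_after_ms = -1 }, /* FCFS level */
};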
7. Algorithm Evaluation
The first problem is defining the criteria to be used in selecting an algorithm. As we saw, criteria are often defined in terms of CPU utilization, response time, or throughput. To select an algorithm, we must first define the relative importance of these measures. Our criteria may include several measures, such as:
• Maximizing CPU utilization under the constraint that the maximum response time is 1 second
• Maximizing throughput such that turnaround time is (on average) linearly proportional to total execution time
Once the selection criteria have been defined, we want to evaluate the algorithms under consideration. We next describe the various evaluation methods we can use.
1 Deterministic Modeling
One major class of evaluation methods is analytic evaluation. Analytic evaluation uses the given algorithm and the system workload to produce a formula or number that evaluates the performance of the algorithm for that workload.
One type of analytic evaluation is deterministic modeling. This method takes a particular predetermined workload and defines the performance of each algorithm for that workload.
Deterministic modeling is simple and fast. It gives us exact numbers, allowing us to compare the algorithms. However, it requires exact numbers for input, and its answers apply only to those cases.
2 Queuing Models
On many systems, the processes that are run vary from day to day, so there is no static set of processes (or times) to use for deterministic modeling. What can be determined, however, is the distribution of CPU and I/O bursts. These distributions can be measured and then approximated or simply estimated. The result is a mathematical formula describing the probability of a particular CPU burst. Commonly, this distribution is exponential and is described by its mean. Similarly, we can describe the distribution of times when processes arrive in the system (the arrival-time distribution). From these two distributions, it is possible to compute the average throughput, utilization, waiting time, and so on for most algorithms.
Queueing analysis can be useful in comparing scheduling algorithms, but it also has limitations. At the moment, the classes of algorithms and distributions that can be handled are fairly limited. The mathematics of complicated algorithms and distributions can be difficult to work with.
3 Simulations
To get a more accurate evaluation of scheduling algorithms, we can use simulations. Running simulations involves programming a model of the computer system. Software data structures represent the major components of the system. Simulations can be expensive, often requiring hours of computer time. A more detailed simulation provides more accurate results, but it also requires more computer time. In addition, trace tapes can require large amounts of storage space. Finally, the design, coding, and debugging of the simulator can be a major task.
4 Implementation
Even a simulation is of limited accuracy. The only completely accurate way to evaluate a scheduling algorithm is to code it up, put it in the operating system, and see how it works. This approach puts the actual algorithm in the real system for evaluation under real operating conditions.
The major difficulty with this approach is the high cost. Another difficulty is that the environment in which the algorithm is used will change. The environment will change not only in the usual way, as new programs are written and the types of problems change, but also as a result of the performance of the scheduler.

Thread Libraries
A thread library provides the programmer with an API for creating and managing threads.
There are two primary ways of implementing a thread library. The first approach is to provide a library
entirely in user space with no kernel support. All code and data structures for the library exist in user
space. This means that invoking a function in the library results in a local function call in user space and
not a system call.
The second approach is to implement a kernel-level library supported directly by the operating
system. In this case, code and data structures for the library exist in kernel space. Invoking a function in
the API for the library typically results in a system call to the kernel.
Three main thread libraries are in use today: POSIX Pthreads, Windows, and Java. Pthreads, the
threads extension of the POSIX standard, may be provided as either a user-level or a kernel-level library.
The Windows thread library is a kernel-level library available on Windows systems. The Java thread
API allows threads to be created and managed directly in Java programs. However, because in most
instances the JVM is running on top of a host operating system, the Java thread API is generally
implemented using a thread library available on the host system. This means that on Windows systems,
Java threads are typically implemented using the Windows API; UNIX and Linux systems often use
Pthreads.
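A minimal Pthreads sketch of creating and joining a thread follows; it assumes a POSIX system and compilation with -pthread.

/* Create one thread, pass it an argument, and wait for it to finish. */
#include <pthread.h>
#include <stdio.h>

void *worker(void *arg)
{
    int id = *(int *)arg;
    printf("thread %d running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int id = 1;

    if (pthread_create(&tid, NULL, worker, &id) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);        /* wait for the thread to terminate */
    printf("thread joined\n");
    return 0;
}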
Threading Issues

In this section, we discuss some of the issues to consider in designing multithreaded programs.

1. The fork() and exec() System Calls


If one thread in a program calls fork(), does the new process duplicate all threads, or is the new
process single-threaded? Some UNIX systems have chosen to have two versions of fork(), one that
duplicates all threads and another that duplicates only the thread that invoked the fork() system call.
The exec() system call typically works in the same way for threads as for processes: if a thread invokes the exec() system call, the program specified in the parameter to exec() will replace the entire process—including all threads.

2. Signal Handling
A signal is used in UNIX systems to notify a process that a particular event has occurred. A
signal may be received either synchronously or asynchronously, depending on the source of and the
reason for the event being signaled. All signals, whether synchronous or asynchronous, follow the same
pattern:
1. A signal is generated by the occurrence of a particular event.
2. The signal is delivered to a process.

3. Once delivered, the signal must be handled.

Examples of synchronous signals include illegal memory access and division by 0. If a running program performs either of these actions, a signal is generated. Synchronous signals are delivered to the same process that performed the operation that caused the signal (that is the reason they are considered synchronous).
A signal may be handled by one of two possible handlers:
1. A default signal handler
2. A user-defined signal handler
Every signal has a default signal handler that the kernel runs when handling that signal. This
default action can be overridden by a user-defined signal handler that is called to handle the signal.

Signals are handled in different ways. Some signals (such as changing the size of a window) are simply
ignored; others (such as an illegal memory access) are handled by terminating the program.
The standard UNIX function for delivering a signal is
kill(pid_t pid, int signal)

POSIX Pthreads provides the following function, which allows a signal to be delivered to a specified thread (tid):
pthread_kill(pthread_t tid, int signal)

Although Windows does not explicitly provide support for signals, it allows us to emulate them
using asynchronous procedure calls (APCs). The APC facility enables a user thread to specify a
function that is to be called when the user thread receives notification of a particular event.
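The sketch below illustrates installing a user-defined handler for SIGINT with sigaction(); the default handler for SIGINT would terminate the process. The handler only sets a flag, a common pattern because handler code should stay minimal and async-signal-safe.

/* Replace the default SIGINT handler with a user-defined one. */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int signo)
{
    (void)signo;
    got_signal = 1;                   /* just record that the signal arrived */
}

int main(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = handler;
    sigaction(SIGINT, &sa, NULL);     /* override the default signal handler */

    printf("press Ctrl-C to deliver SIGINT...\n");
    while (!got_signal)
        pause();                      /* wait until a signal is delivered */
    printf("SIGINT handled by the user-defined handler\n");
    return 0;
}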
Thread Cancellation
Thread cancellation involves terminating a thread before it has completed. For example, if
multiple threads are concurrently searching through a database and one thread returns the result, the
remaining threads might be canceled. Another situation might occur when a user presses a button on a web
browser that stops a web page from loading any further. A thread that is to be canceled is often referred to as the target thread.
Cancellation of a target thread may occur in two different scenarios:

1. Asynchronous cancellation. One thread immediately terminates the target thread.

2. Deferred cancellation. The target thread periodically checks whether it should terminate, allowing it
an opportunity to terminate itself in an orderly fashion.

The difficulty with cancellation occurs in situations where resources have been allocated to a
canceled thread or where a thread is canceled while in the midst of updating data it is sharing with other
threads. This becomes especially troublesome with asynchronous cancellation.
In Pthreads, a cancellation request for a target thread is issued with pthread_cancel(tid).
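A minimal sketch of deferred cancellation with Pthreads follows; deferred cancellation is the default cancellation type, and pthread_testcancel() marks a point at which a pending request is honored. The work loop is illustrative only.

/* Deferred cancellation: the target thread checks for cancellation
 * at explicit cancellation points. Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *target(void *arg)
{
    (void)arg;
    for (;;) {
        /* ... perform one unit of work ... */
        pthread_testcancel();   /* safe point at which to honor a cancellation request */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, target, NULL);

    sleep(1);                   /* let the target run for a while */
    pthread_cancel(tid);        /* request (deferred) cancellation */
    pthread_join(tid, NULL);    /* reclaim the canceled thread */
    printf("target thread canceled\n");
    return 0;
}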

Thread-Local Storage
Threads belonging to a process share the data of the process. Indeed, this data sharing provides
one of the benefits of multithreaded programming. However, in some circumstances, each thread might
need its own copy of certain data. We will call such data thread-local storage (or TLS). For example, in
a transaction-processing system, we might service each transaction in a separate thread. Furthermore,
each transaction might be assigned a unique identifier. To associate each thread with its unique identifier,
we could use thread-local storage.
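A minimal sketch of thread-local storage using POSIX thread-specific data follows; the key name and the per-thread transaction identifiers are illustrative.

/* Each thread stores its own transaction id under the same key. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_key_t tls_key;

void *transaction(void *arg)
{
    int *id = malloc(sizeof *id);
    *id = *(int *)arg;
    pthread_setspecific(tls_key, id);          /* this thread's private copy */

    int *my_id = pthread_getspecific(tls_key);
    printf("thread handling transaction %d\n", *my_id);
    return NULL;
}

int main(void)
{
    pthread_key_create(&tls_key, free);        /* free() runs per thread at exit */
    pthread_t t1, t2;
    int a = 1, b = 2;
    pthread_create(&t1, NULL, transaction, &a);
    pthread_create(&t2, NULL, transaction, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_key_delete(tls_key);
    return 0;
}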

Scheduler Activations
A final issue to be considered with multithreaded programs concerns communication between the kernel and the thread library, which may be required by the many-to-many and two-level models. Such coordination allows the number of kernel threads to be dynamically adjusted to help ensure the best performance.
Many systems implementing either the many-to-many or the two-level model place an intermediate data structure between the user and kernel threads. This data structure is typically known as a lightweight process, or LWP. To the user-thread library, the LWP appears to be a virtual processor on which the application can schedule a user thread to run. Each LWP is attached to a kernel thread, and it
which the application can schedule a user thread to run. Each LWP is attached to a kernel thread, and it
is kernel threads that the operating system schedules to run on physical processors. If a kernel thread
blocks (such as while waiting for an I/O operation to complete), the LWP blocks as well. Up the chain,
the user-level thread attached to the LWP also blocks.

Lightweight process (LWP) – two-level threading


Introduction to Operating Systems
A computer system has many resources (hardware and software), which may be required to complete a task.
The commonly required resources are input/output devices, memory, file storage space, CPU, etc. The
operating system acts as a manager of the above resources and allocates them to specific programs and
users, whenever necessary to perform a particular task. Therefore the operating system is the resource manager, i.e. it manages the resources of a computer system internally. The resources are processor,
memory, files, and I/O devices. In simple terms, an operating system is an interface between the
computer user and the machine.
An operating system acts similarly to a government: it performs no useful function by itself, but it provides an environment within which other programs can do useful work.
Below we have an abstract view of the components of the computer system:

• The Computer Hardware contains a central processing unit(CPU), the memory, and the
input/output (I/O) devices and it provides the basic computing resources for the system.
• The Application programs like spreadsheets, Web browsers, word processors, etc. are used to
define the ways in which these resources are used to solve the computing problems of the users. And
the System program mainly consists of compilers, loaders, editors, OS, etc.
• The Operating System is mainly used to control the hardware and coordinate its use among the
various application programs for the different users.
• Basically, Computer System mainly consists of hardware, software, and data.
An OS is mainly designed in order to serve two basic purposes:
1. The operating system controls the allocation and use of the computing system's resources among the various users and tasks.
2. It provides an interface between the computer hardware and the programmer that simplifies and makes feasible the coding, creation, and debugging of application programs.
Two Views of Operating System
1. User's View
2. System View
Operating System: User View
The user view of the computer refers to the interface being used. Such systems are designed for one user to
monopolize its resources, to maximize the work that the user is performing. In these cases, the operating
system is designed mostly for ease of use, with some attention paid to performance, and none paid to
resource utilization.
Operating System: System View
The operating system can be viewed as a resource allocator also. A computer system consists of many
resources like - hardware and software - that must be managed efficiently. The operating system acts as the
manager of the resources, decides between conflicting requests, controls the execution of programs, etc.
Operating System Management Tasks
1. Process management which involves putting the tasks into order and pairing them into manageable
size before they go to the CPU.
2. Memory management which coordinates data to and from RAM (random-access memory) and
determines the necessity for virtual memory.
3. Device management provides an interface between connected devices.
4. Storage management which directs permanent data storage.
5. An application programming interface (API) that allows standard communication between software and your computer.
6. The user interface allows you to communicate with your computer.
1. Types of Operating System
Given below are different types of Operating System:
1. Simple Batch System
2. Multiprogramming Batch System
3. Multiprocessor System
4. Desktop System
5. Distributed Operating System
6. Clustered System
7. Realtime Operating System
8. Handheld System
Functions of Operating System
1. It boots the computer
2. It performs basic computer tasks e.g. managing the various peripheral devices e.g. mouse, keyboard
3. It provides a user interface, e.g. command line, graphical user interface (GUI)
4. It handles system resources such as the computer's memory and sharing of the central processing
unit(CPU) time by various applications or peripheral devices.
5. It provides file management which refers to the way that the operating system manipulates, stores,
retrieves, and saves data.
6. Error Handling is done by the operating system. It takes preventive measures whenever required to
avoid errors.
Advantages of Operating System
Given below are some advantages of the Operating System:
• The operating system helps to improve the efficiency of the work and helps to save a lot of time by
reducing the complexity.
• The different components of a system are independent of each other, thus failure of one component
does not affect the functioning of another.
• The operating system mainly acts as an interface between the hardware and the software.
• Users can easily access the hardware without writing large programs.
• With the help of an Operating system, sharing data becomes easier with a large number of users.
• Users can easily install and run games or applications on the operating system.
• The operating system can be refreshed easily from time to time without any problems.
• The operating system can be updated easily.
Disadvantages of an Operating system
Given below are the drawbacks of using an operating system:
• Expensive
There are some open-source platforms like Linux. But some operating systems are expensive.
• Virus Threat
Operating Systems are open to virus attacks and sometimes it happens that many users download the
malicious software packages on their system which pauses the functioning of the Operating system
and also slows it down.
• Complexity
Some operating systems are complex in nature because the language used to build them is not
clear and well defined. If an issue occurs in the operating system, the user is unable
to resolve it.
• System Failure
An operating system is the heart of the computer system; if for any reason it stops functioning,
the whole system will crash.
Examples of Operating System
• Windows
• Android
• iOS
• Mac OS
• Linux
• Windows Phone OS
• Chrome OS
2.Evolution of Operating Systems
The evolution of operating systems is directly dependent on the development of computer systems and how
users use them. Here is a quick tour of computing systems through the past fifty years in the timeline.
Early Evolution
• 1945: ENIAC, Moore School of Engineering, University of Pennsylvania
• 1949: EDSAC and EDVAC
• 1949: BINAC - a successor to the ENIAC
• 1951: UNIVAC by Remington
• 1952: IBM 701
• 1954-1957: FORTRAN was developed
• 1956: The interrupt
Operating Systems - Late 1950s
By the late 1950s, operating systems were well improved and had started supporting the following usages:
• It was able to perform Single stream batch processing.
• It could use Common, standardized, input/output routines for device access.
• Program transition capabilities to reduce the overhead of starting a new job was added.
• Error recovery to clean up after a job terminated abnormally was added.
• Job control languages that allowed users to specify the job definition and resource requirements were
made possible.
Operating Systems - In 1960s
• 1961: The dawn of minicomputers
• 1962: Compatible Time-Sharing System (CTSS) from MIT - powerful, and really useful
• 1964 and onward: Multics
• 1967-1968: The mouse was invented
• 1969: The UNIX Time-Sharing System from Bell Telephone Laboratories
Supported OS Features by 1970s
• Multi User and Multi tasking was introduced.
• Dynamic address translation hardware and Virtual machines came into picture.
• Modular architectures came into existence.
• Personal, interactive systems came into existence.
Accomplishments after 1970
• 1971: Intel announces the microprocessor
• 1972: IBM comes out with VM: the Virtual Machine Operating System
• 1973: UNIX 4th Edition is published
• 1973: Ethernet
• 1974 The Personal Computer Age begins
• 1974: Gates and Allen wrote BASIC for the Altair
• 1976: Apple II
• August 12, 1981: IBM introduces the IBM PC
• 1983: Microsoft begins work on MS-Windows
• 1984: Apple Macintosh comes out
• 1990: Microsoft Windows 3.0 comes out
• 1991: GNU/Linux
• 1992: The first Windows virus comes out
• 1993: Windows NT
• 2007: iOS
• 2008: Android OS
3. Operating System Operations
An operating system is a construct that allows the user application programs to interact with the system
hardware. The operating system by itself does not perform any useful function, but it provides an environment in which
different applications and programs can do useful work.
The major operations of the operating system are process management, memory management, device
management and file management. These are given in detail as follows:
Process Management
The operating system is responsible for managing the
processes i.e assigning the processor to a process at a
time. This is known as process scheduling. The
different algorithms used for process scheduling are
FCFS (first come first served), SJF (shortest job
first), priority scheduling, round robin scheduling etc.
There are many scheduling queues that are used to
handle processes in process management. When the
processes enter the system, they are put into the job
queue. The processes that are ready to execute in the
main memory are kept in the ready queue. The
processes that are waiting for the I/O device are kept
in the device queue.
Memory Management
Memory management plays an important part in operating system. It deals with memory and the moving of
processes from disk to primary memory for execution and back again.
The activities performed by the operating system for memory management are −
• The operating system assigns memory to the processes as required. This can be done using best fit,
first fit and worst fit algorithms.
• All the memory is tracked by the operating system i.e. it notes which memory parts are in use by the
processes and which are empty.
• The operating system deallocates memory from processes as required. This may happen when a
process has been terminated or if it no longer needs the memory.
Device Management
There are many I/O devices handled by the operating system such as mouse, keyboard, disk drive etc. There
are different device drivers that can be connected to the operating system to handle a specific device. The
device controller is an interface between the device and the device driver. The user applications can access
all the I/O devices using the device drivers, which are device specific codes.
File Management
Files are used to provide a uniform view of data storage by the operating system. All the files are mapped
onto physical devices that are usually non volatile so data is safe in the case of system failure.
The files can be accessed by the system in two ways i.e. sequential access and direct access −
• Sequential Access
The information in a file is processed in order using sequential access. The file's records are accessed
one after another. Most programs, such as editors, compilers etc., use sequential access.
• Direct Access
In direct access or relative access, the files can be accessed randomly for read and write operations.
The direct access model is based on the disk model of a file, since it allows random accesses.
4.Operating System Structure
An operating system is a construct that allows the user application programs to interact with the system
hardware. Since the operating system is such a complex structure, it should be created with utmost care so it
can be used and modified easily. An easy way to do this is to create the operating system in parts. Each of
these parts should be well defined with clear inputs, outputs and functions.
Simple Structure
There are many operating systems that have a rather simple structure. These started as small systems and
rapidly expanded much further than their original scope. A common example of this is MS-DOS. It was designed
for a small group of people, and there was no indication that it would become so popular.
An image to illustrate the structure of MS-DOS is as follows −
It is better that operating systems have a modular structure, unlike MS-DOS. That would lead to greater
control over the computer system and its various applications. The modular structure would also allow the
programmers to hide information as required and implement internal routines as they see fit without
changing the outer specifications.
Layered Structure
One way to achieve modularity in the operating system is the layered approach. In this, the bottom layer is
the hardware and the topmost layer is the user interface.
An image demonstrating the layered approach is as
follows −
As seen from the image, each upper layer is built on the bottom layer. All the layers hide some structures,
operations etc from their upper layers.
One problem with the layered structure is that each layer needs to be carefully defined. This is necessary
because the upper layers can only use the functionalities of the layers below them.
5. Operating System - Services
An Operating System provides services to both the users and to the programs.
• It provides programs an environment to execute.
• It provides users the services to execute the programs in a convenient manner.
Following are a few common services provided by an operating system −
• Program execution
• I/O operations
• File System manipulation
• Communication
• Error Detection
• Resource Allocation
• Protection
Program execution
Operating systems handle many kinds of activities from user programs to system programs like printer
spooler, name servers, file server, etc. Each of these activities is encapsulated as a process.
A process includes the complete execution context (code to execute, data to manipulate, registers, OS
resources in use). Following are the major activities of an operating system with respect to program
management −
• Loads a program into memory.
• Executes the program.
• Handles program's execution.
• Provides a mechanism for process synchronization.
• Provides a mechanism for process communication.
• Provides a mechanism for deadlock handling.
I/O Operation
An I/O subsystem comprises I/O devices and their corresponding driver software. Drivers hide the
peculiarities of specific hardware devices from the users.
An Operating System manages the communication between user and device drivers.
• I/O operation means read or write operation with any file or any specific I/O device.
• Operating system provides the access to the required I/O device when required.
File system manipulation
A file represents a collection of related information. Computers can store files on the disk (secondary
storage), for long-term storage purpose. Examples of storage media include magnetic tape, magnetic disk
and optical disk drives like CD, DVD. Each of these media has its own properties like speed, capacity, data
transfer rate and data access methods.
A file system is normally organized into directories for easy navigation and usage. These directories may
contain files and other directories. Following are the major activities of an operating system with respect to
file management −
• Program needs to read a file or write a file.
• The operating system gives the permission to the program for operation on file.
• Permission varies from read-only, read-write, denied and so on.
• Operating System provides an interface to the user to create/delete files.
• Operating System provides an interface to the user to create/delete directories.
• Operating System provides an interface to create the backup of file system.
Communication
In case of distributed systems which are a collection of processors that do not share memory, peripheral
devices, or a clock, the operating system manages communications between all the processes. Multiple
processes communicate with one another through communication lines in the network.
The OS handles routing and connection strategies, and the problems of contention and security. Following
are the major activities of an operating system with respect to communication −
• Two processes often require data to be transferred between them
• Both the processes can be on one computer or on different computers, but are connected through a
computer network.
• Communication may be implemented by two methods, either by Shared Memory or by Message
Passing.
Error handling
Errors can occur anytime and anywhere. An error may occur in CPU, in I/O devices or in the memory
hardware. Following are the major activities of an operating system with respect to error handling −
• The OS constantly checks for possible errors.
• The OS takes an appropriate action to ensure correct and consistent computing.
Resource Management
In case of multi-user or multi-tasking environment, resources such as main memory, CPU cycles and files
storage are to be allocated to each user or job. Following are the major activities of an operating system
with respect to resource management −
• The OS manages all kinds of resources using schedulers.
• CPU scheduling algorithms are used for better utilization of CPU.
Protection
Considering a computer system having multiple users and concurrent execution of multiple processes, the
various processes must be protected from each other's activities.
Protection refers to a mechanism or a way to control the access of programs, processes, or users to the
resources defined by a computer system. Following are the major activities of an operating system with
respect to protection −
• The OS ensures that all access to system resources is controlled.
• The OS ensures that external I/O devices are protected from invalid access attempts.
• The OS provides authentication features for each user by means of passwords.
User Operating-System Interface
There are two fundamental approaches for users to interface with the operating system. One technique is to
provide a command-line interface or command interpreter that allows users to directly enter commands that
are to be performed by the operating system. The second approach allows the user to interface with the
operating system via a graphical user interface or GUI.
Command Interpreter
Some operating systems include the command interpreter in the kernel. Others, such as Windows XP and
UNIX, treat the command interpreter as a special program that is running when a job is initiated or when a
user first logs on (on interactive systems). On systems with multiple command interpreters to choose from,
the interpreters are known as shells. For example, on UNIX and Linux systems, there are several different
shells a user may choose from including the Bourne shell, C shell, Bourne-Again shell, the Korn shell, etc.
Most shells provide similar functionality with only minor differences; most users choose a shell based upon
personal preference. The main function of the command interpreter is to get and execute the next user-
specified command. Many of the commands given at this level manipulate files: create, delete, list, print,
copy, execute, and so on. The MS-DOS and UNIX shells operate in this way. There are two general ways
in which these commands can be implemented. In one approach, the command interpreter itself contains
the code to execute the command.
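A minimal sketch of the fork/exec approach on a POSIX system is shown below; it handles only single-word commands, and the prompt string and variable names are purely illustrative.

/* Minimal command-interpreter loop (a sketch only, POSIX assumed).
 * It reads a command, forks a child and uses exec to run it, illustrating
 * the approach in which the interpreter does not itself contain the code
 * for every command. Handles single-word commands only. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    char line[256];
    while (1) {
        printf("mysh> ");                     /* prompt (name is illustrative) */
        fflush(stdout);
        if (fgets(line, sizeof line, stdin) == NULL)
            break;
        line[strcspn(line, "\n")] = '\0';     /* strip trailing newline */
        if (strcmp(line, "exit") == 0)
            break;
        pid_t pid = fork();                   /* create a new process */
        if (pid == 0) {
            execlp(line, line, (char *)NULL); /* child: run the command */
            perror("exec failed");
            exit(1);
        } else if (pid > 0) {
            waitpid(pid, NULL, 0);            /* parent: wait for completion */
        }
    }
    return 0;
}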
Graphical User Interfaces
A second strategy for interfacing with the operating system is through a userfriendly graphical user
interface or GUI. Rather than having users directly enter commands via a command-line interface, a GUI
provides a mouse-based window-and-menu system as an interface. A GUI provides a desktop
metaphor where the mouse is moved to position its pointer on images, or icons, on the screen (the desktop)
that represent programs, files, directories, and system functions. Depending on the mouse pointer's location,
clicking a button on the mouse can invoke a program, select a file or directory—known as a folder— or
pull down a menu that contains commands. Graphical user interfaces first appeared due in part to research
taking place in the early 1970s at Xerox PARC research facility. The first GUI appeared on the Xerox Alto
computer in 1973.
However, with the release of Mac OS X (which is in part implemented using a UNIX kernel), the operating
system now provides both a new Aqua interface and a command-line interface. The user interface can
vary from system to system and even from user to user within a system. It typically is substantially
removed from the actual system structure. The design of a useful and friendly user interface is therefore not
a direct function of the operating system. In this book, we concentrate on the fundamental problems of
providing adequate service to user programs. From the point of view of the operating system, we do not
distinguish between user programs and system programs.
System Calls in an Operating System
The interface between a process and an operating system is provided by system calls. In general, system
calls are available as assembly language instructions. They are also included in the manuals used by the
assembly level programmers. System calls are usually made when a process in user mode requires access to
a resource. Then it requests the kernel to provide the resource via a system call.
A figure representing the execution of the system call is given as follows −
As can be seen from this diagram, the processes execute normally in the user mode until a system call
interrupts this. Then the system call is executed on a priority basis in the kernel mode. After the execution of
the system call, the control returns to the user mode and execution of user processes can be resumed.
In general, system calls are required in the following situations −
• If a file system requires the creation or deletion of files. Reading and writing from files also require a
system call.
• Creation and management of new processes.
• Network connections also require system calls. This includes sending and receiving packets.
• Access to hardware devices such as a printer, scanner etc. requires a system call.
Types of System Calls
There are mainly five types of system calls. These are explained in detail as follows −
Process Control
These system calls deal with processes such as process creation, process termination etc.
File Management
These system calls are responsible for file manipulation such as creating a file, reading a file, writing into a file etc.
Device Management
These system calls are responsible for device manipulation such as reading from device buffers, writing into
device buffers etc.
Information Maintenance
These system calls handle information and its transfer between the operating system and the user program.
Communication
These system calls are useful for interprocess communication. They also deal with creating and deleting a
communication connection.
Some of the examples of all the above types of system calls in Windows and Unix are given as follows −
Types of System Calls        Windows                   Linux
Process Control              CreateProcess()           fork()
                             ExitProcess()             exit()
                             WaitForSingleObject()     wait()
File Management              CreateFile()              open()
                             ReadFile()                read()
                             WriteFile()               write()
                             CloseHandle()             close()
Device Management            SetConsoleMode()          ioctl()
                             ReadConsole()             read()
                             WriteConsole()            write()
Information Maintenance      GetCurrentProcessID()     getpid()
                             SetTimer()                alarm()
                             Sleep()                   sleep()
Communication                CreatePipe()              pipe()
                             CreateFileMapping()       shmget()
                             MapViewOfFile()           mmap()
There are many different system calls as shown above. Details of some of those system calls are as follows −
open()
The open() system call is used to provide access to a file in a file system. This system call allocates
resources to the file and provides a handle that the process uses to refer to the file. A file can be opened by
multiple processes at the same time or be restricted to one process. It all depends on the file organisation and
file system.
read()
The read() system call is used to access data from a file that is stored in the file system. The file to read can
be identified by its file descriptor and it should be opened using open() before it can be read. In general, the
read() system call takes three arguments i.e. the file descriptor, the buffer which stores the read data and the number of
bytes to be read from the file.
write()
The write() system call writes the data from a user buffer into a device such as a file. This system call is
one of the ways to output data from a program. In general, the write() system call takes three arguments i.e. the
file descriptor, a pointer to the buffer where the data is stored and the number of bytes to write from the buffer.
close()
The close() system call is used to terminate access to a file system. Using this system call means that the file
is no longer required by the program and so the buffers are flushed, the file metadata is updated and the file
resources are de-allocated.
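The following sketch, assuming a POSIX system, strings these four calls together to copy one file to another; the file names input.txt and copy.txt are only illustrative.

/* Sketch of the open()/read()/write()/close() sequence described above. */
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    char buf[1024];
    ssize_t n;

    int in  = open("input.txt", O_RDONLY);                       /* get file descriptors */
    int out = open("copy.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) {
        perror("open");
        return 1;
    }

    /* read() takes the descriptor, a buffer and a byte count;
     * write() takes the descriptor, the data buffer and a byte count */
    while ((n = read(in, buf, sizeof buf)) > 0)
        write(out, buf, n);

    close(in);        /* release the descriptors; buffers are flushed and metadata updated */
    close(out);
    return 0;
}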
UNIT-IV PROCESS SYNCHRONIZATION
 Concurrent access to shared data may result in data inconsistency
 Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes
 Consider the consumer-producer problem: in which buffer is filled by the producer and emptied by the
consumer. For this, we need to maintain an integer count that keeps track of the number of full buffers.
Initially, count is set to 0.
 Counter is incremented by the producer after it produces a new buffer and is decremented by the consumer
after it consumes a buffer. The producer needs to wait if the buffer reaches its maximum size i.e.
count==BUFFER_SIZE
 Counter is decremented by the consumer after it consumes a buffer. The consumer needs to wait if the buffer is
empty i.e. count==0
Producer Program:
while (true)
{
    /* produce an item and put in nextProduced */
    while (count == BUFFER_SIZE)
        ; // do nothing
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    count++;
}

Consumer Program:
while (true)
{
    while (count == 0)
        ; // do nothing
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    count--;
    /* consume the item in nextConsumed */
}
 Although both producer and consumer routines are correct separately, they may not function correctly
when executed concurrently
Race Condition:
 Internally count value is incremented with some local CPU register like register 1, then count++ could
be implemented as:
register1 = count
register1 = register1 + 1
count = register1
 Similarly count value is decremented with some other local CPU register like register 2, then count--
could be implemented as:
register2 = count
register2 = register2 - 1
count = register2
 Consider this Producer & Consumer execution by interleaving with “count = 5” initially:
S0: producer execute register1 = count {register1 = 5}
S1: producer execute register1 = register1 + 1 {register1 = 6}
S2: consumer execute register2 = count {register2 = 5}
S3: consumer execute register2 = register2 - 1 {register2 = 4}
S4: producer execute count = register1 {count = 6 }
S5: consumer execute count = register2 {count = 4}
 With this the count variable ends up in an incorrect state i.e. count==4, indicating that four buffers are full, but
actually five buffers are full. This is due to the manipulation of the same variable count by both processes.
 A situation like this, where several processes access and manipulate the same data concurrently and the
outcome of execution depends on the particular order in which the accesses take place, is called a Race Condition.
 Process synchronization: is required to guard against the Race condition by ensuring that only one
process at a time is allowed to manipulate the data.
Critical-Section Problem:
• Consider system consisting of n process {p1, p2, …Pn}
• Critical Section: code in which process may be changing common variables
• No other process is allowed, when a process executing critical section
• Critical Section Problem:
– is to design protocol that process can use to cooperate
– Each process must request permission to enter its critical section
– Section of code implementing this request is Entry section
– It may be followed by Exit section
– Remaining code is called Remainder Section
A critical section problem can be represented as follows:
while(TRUE)
{
    Entry Section
        Critical Section
    Exit Section
    Remainder Section
}
Solution to Critical Section Problem:
• Mutual Exclusion
– If process Pi is executing in its critical section, then no other processes can be executing in their
critical sections
• Progress
– If no process is executing in its critical section and some processes wish to enter their critical sections
– Then only those processes that are not executing in their remainder sections can participate in the decision
on which will enter its critical section next
– This decision of entry cannot be postponed indefinitely
• Bounded Waiting
– There exists bound or limit on number of time that other process are allowed to enter their critical
sections after process has made request to enter its critical section and before that request granted
• Relative speed of N process: assume that each process is executing at non zero speed
• Many kernel mode process may be active in OS
• Kernel code is subject to several possible race conditions
• Ex: a kernel data structure that maintains a list of all open files
– The list may be modified when a new file is opened or closed
– If two process were to open files simultaneously, separate updates to this list could result race
conditions
• Two general approaches to handle critical sections in OS:
– Preemptive Kernels: allow process to be preempted while it is running in kernel mode
– Non-preemptive Kernels: does not allow process running in kernel mode to be preempted
• Preemptive kernel is suitable for Real Time programming
Peterson’s Solution:
• Software based solution for critical solution problem
• Assume that the LOAD and STORE instructions are atomic; that is, cannot be interrupted.
Solution:
• Restricted to two process, alternatively executing critical & remainder section
• Let P0 & P1 be the two process, they will share two variables:
– int turn;
– boolean flag[2];
• turn indicates whose turn it is to enter its critical section
• flag array is used to indicate which process is ready for critical section entry
• To enter critical section process Pi sets flag[i] to True & sets turn to value j
• We can prove that this solution is correct:
– Mutual exclusion is preserved
– Progress requirements is satisfied
– Bounded waiting requirements is met
Algorithm for process Pi:
while (true)
{
    flag[i] = TRUE;
    turn = j;
    while ( flag[j] && turn == j );
        CRITICAL SECTION
    flag[i] = FALSE;
        REMAINDER SECTION
}

Algorithm for process Pj:
while (true)
{
    flag[j] = TRUE;
    turn = i;
    while ( flag[i] && turn == i );
        CRITICAL SECTION
    flag[j] = FALSE;
        REMAINDER SECTION
}
 Property1: Pi enters its critical section only if either flag[j]==false or turn==i
 Property 2 &3: Pi can be prevented from entering critical section only if it is stuck in while loop until
condition is satisfied
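A possible C rendering of Peterson's solution is sketched below. Because ordinary loads and stores may be reordered on modern hardware, the sketch uses C11 atomics (which default to sequentially consistent ordering) so that the atomic LOAD/STORE assumption above actually holds; the function names enter_region/leave_region are illustrative.

/* Peterson's solution for two threads, sketched with C11 atomics. */
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool flag[2];        /* flag[i]: thread i wants to enter */
atomic_int  turn;           /* whose turn it is */

void enter_region(int i)    /* i is 0 or 1 */
{
    int j = 1 - i;
    atomic_store(&flag[i], true);
    atomic_store(&turn, j);
    /* busy-wait while the other thread wants in and it is its turn */
    while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
        ;                   /* spin */
}

void leave_region(int i)
{
    atomic_store(&flag[i], false);
}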
Synchronization Hardware:
 Any solution to the critical section problem requires a simple tool: a Lock
 Race conditions are prevented by requiring that critical regions are protected by locks
 Hardware features make any programming task easier & improve system efficiency
 Uni Processor environment: could disable interrupts
 Disable interrupts while shared variables was being modified
 Currently running code would execute without preemption
 This is the approach taken by Non-preemptive kernels
 Multiprocessor environment:
 Time consuming: a message needs to be passed to all processors, which delays execution
 Many modern machines provide special atomic hardware instructions
 Either test a memory word and set its value
 Or swap the contents of two memory words
Definition of TestAndSet() Instruction:
boolean TestAndSet (boolean *target)
{
    boolean rv = *target;
    *target = TRUE;
    return rv;
}

Mutual Exclusion implementation with TestAndSet():
while (true)
{
    while ( TestAndSet (&lock) )
        ; // do nothing
    // critical section
    lock = FALSE;
    // remainder section
}
Definition of Swap() Instruction:
void Swap (boolean *a, boolean *b)
{
    boolean temp = *a;
    *a = *b;
    *b = temp;
}

Mutual Exclusion implementation with Swap():
while (true)
{
    key = TRUE;
    while ( key == TRUE )
        Swap (&lock, &key);
    // critical section
    lock = FALSE;
    // remainder section
}
Bounded Waiting implementation with TestAndSet():
while (true)
{ waiting[i]=TRUE;
key = TRUE;
while ( waiting[i] && key)
key=TestAndSet(&lock);
waiting[i]=FALSE;
// critical section
j=(i+1)%n;
while((j!=i)&&!waiting[j])
j=(j+1)%n;
if(j==i) lock=FALSE;
else waiting[j]=FALSE;
// remainder section
}
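On real systems the role of the TestAndSet() instruction is typically exposed through an atomic library. A minimal spinlock sketch using C11's atomic_flag (whose test-and-set operation is atomic) is given below, with acquire/release as illustrative names.

/* Spinlock sketch using C11 atomic_flag, which provides atomic
 * test-and-set behaviour in portable form. */
#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;    /* clear == unlocked */

void acquire(void)
{
    /* atomically sets the flag and returns its old value;
     * spin while it was already set (lock held) */
    while (atomic_flag_test_and_set(&lock))
        ;   /* do nothing */
}

void release(void)
{
    atomic_flag_clear(&lock);           /* lock = FALSE */
}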
Semaphore:
• Hardware implementation is complicated for application programmers to use
• Semaphore is a synchronization tool to overcome shared data problems
• Concept:
– Semaphore S is integer variable
– Two standard atomic operations: wait() and signal()
Definition of wait() is:
wait(S)
{
    while (S <= 0)
        ; // no operation
    S--;
}

Definition of signal() is:
signal(S)
{
    S++;
}
• When one process modifies semaphore value, no other process is allowed to modify semaphore value
• Testing (S<=0) and Modification (S--) must be executed without interruption
Usage:
• Counting semaphore: integer value can range over an unrestricted domain
– can be used to control access to given resource consisting of finite number
– It is initialized to number of resources available
– Each process wishes to user resource performs wait() operation
– When process releases resource, it perform signal()
– When count for semaphore goes to 0, all block resources are being used
– Process that wish to use resource will block until count becomes greater than 0
• Binary semaphore: integer value can range only between 0 and 1
– Also known as mutex locks, Provides mutual exclusion
– Mutual Exclusion implementation with Semaphores
while(TRUE)
{
    wait(mutex);
        // Critical Section
    signal(mutex);
        // Remainder Section
}
• We can use semaphores to solve various synchronization problems.
• Ex: two concurrent processes, P1 containing statement S1 and P2 containing statement S2, where S2 must be executed
only after S1 has completed. Using a semaphore synch initialized to 0, P1 executes S1; signal(synch); and P2 executes
wait(synch); S2; (a sketch follows below).
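A sketch of this ordering example with POSIX semaphores and two threads is given below; the thread functions p1/p2 and the semaphore name synch are illustrative.

/* Ordering sketch: S2 runs only after S1, enforced by a semaphore. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t synch;                       /* initialized to 0 below */

void *p1(void *arg) {
    printf("S1\n");                /* statement S1 */
    sem_post(&synch);              /* signal(synch) */
    return NULL;
}

void *p2(void *arg) {
    sem_wait(&synch);              /* wait(synch): blocks until S1 is done */
    printf("S2\n");                /* statement S2 */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&synch, 0, 0);        /* value 0 so p2 must wait for p1 */
    pthread_create(&t2, NULL, p2, NULL);
    pthread_create(&t1, NULL, p1, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&synch);
    return 0;
}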
Implementation:
• Main problem in implementation is: Busy Waiting
• Busy Waiting:
– while process is in its critical section, any other process that tries to enter its critical section must
loop continuously in entry code
– This continual looping is clearly problem in real Multi programming systems
– It wastes CPU cycles that some other process might be able to use productively
– This type of semaphore is known as Spinlock because process spins while waiting for lock
• Solution: wait(), signal() definitions should be modified
– Wait() operation finds semaphore value is not positive, it must wait
– Instead of busy waiting, process can Block it self, by placing it in waiting queue of semaphore
– Control transfers to CPU to select another process
– Blocked process will be restarted, when some other process executes Signal() operation
– Process is restarted by wakeup() operation: which changes process from waiting to ready
C semaphore structure:
typedef struct
{
    int value;
    struct process *list;
} semaphore;

Wait() semaphore operation:
wait(semaphore *s)
{
    s->value--;
    if (s->value < 0)
    {
        add this process to s->list;
        block();
    }
}

Signal() semaphore operation:
signal(semaphore *s)
{
    s->value++;
    if (s->value <= 0)
    {
        remove a process P from s->list;
        wakeup(P);
    }
}
• Two operations are provided by OS as basic system calls:
– Block() operation, Wakeup(P) operation
• If semaphore value is negative, its magnitude is number of processes waiting on that semaphore
• List of waiting process can be easily implemented by link field in each PCB
• Each semaphore contains integer value & pointer to list of PCB’s
• FIFO queue is used for bounded waiting
• Critical aspects of semaphore:
– There must be a guarantee that no two processes can execute wait() & signal() on the same semaphore at the same time
– Uni-processor environment: It can be solve by simply disable interrupts while executing wait()
& signal()
– Multi processor environment: must provide alternative locking techniques such as Spinlocks to
ensure that wait() & signal() are performed atomically
• This does not completely remove busy waiting; it is moved from the entry sections of applications to the short critical sections of wait() & signal()
Deadlock and Starvation:
• Deadlock: two or more processes are waiting indefinitely for an event that can be caused by only one
of the waiting processes
• Let us consider two processes P0 and P1, each accessing two semaphores S and Q, both
initialized to 1:
P0 P1
wait (S); wait (Q);
wait (Q); wait (S);
. .
. .
signal (S); signal (Q);
signal (Q); signal (S);
• Resource acquisition & release
• Starvation: indefinite blocking, a situation in which processes wait indefinitely within the semaphore
• Indefinite blocking may occur if we add & remove processes from the list associated with a semaphore in
LIFO order
Classical Problems of Synchronization:
1. Bounded-Buffer Problem:
Problem:
• There are two process Producer & Consumer
• Producer produces items, Consumer consumes items
• Common Buffer is used to store items produced by producer until consumer consumes them
Implementation:
• Pool of N buffers, each can hold one item
• mutex Semaphore provides mutual exclusion, initialized to the value 1
• empty & full semaphores count number of empty & full buffers
• full initialized to the value 0, empty initialized to the value N
• Data Structures:
semaphores mutex, full, empty;
item nextp,nextc;
Producer Program:
while(TRUE)
{
    // produce an item in nextp
    ..
    wait(empty);
    wait(mutex);
    ..
    // add nextp to the buffer
    ..
    signal(mutex);
    signal(full);
}

Consumer Program:
while(TRUE)
{
    wait(full);
    wait(mutex);
    ..
    // remove an item from the buffer to nextc
    ..
    signal(mutex);
    signal(empty);
    ..
    // consume the item in nextc
}
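A runnable sketch of the same scheme using POSIX semaphores and Pthreads is given below; the buffer size, item values and loop counts are illustrative choices, not part of the classical statement of the problem.

/* Bounded-buffer sketch with POSIX semaphores and threads. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 5                          /* pool of N buffers */

int buffer[N];
int in = 0, out = 0;
sem_t empty, full, mutex;            /* empty = N, full = 0, mutex = 1 */

void *producer(void *arg) {
    for (int item = 1; item <= 10; item++) {
        sem_wait(&empty);            /* wait for an empty slot */
        sem_wait(&mutex);            /* enter critical section */
        buffer[in] = item;
        in = (in + 1) % N;
        sem_post(&mutex);
        sem_post(&full);             /* one more full slot */
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < 10; i++) {
        sem_wait(&full);             /* wait for a full slot */
        sem_wait(&mutex);
        int item = buffer[out];
        out = (out + 1) % N;
        sem_post(&mutex);
        sem_post(&empty);            /* one more empty slot */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&empty, 0, N);
    sem_init(&full, 0, 0);
    sem_init(&mutex, 0, 1);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}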
2. Reader Writers Problem:
Problem:
• Data base is to be shared among several concurrent process
• Some of them want only Read known as Readers, some may want only Write known as Writers
• It require that the writers have exclusive access to share database
• It has several variations
– Case 1: no reader should wait for other reader to finish simply because writer is waiting, writer
may starve
– Case 2: if writer is waiting to access object, no new readers may start reading, reader may starve
Implementation for Case 1:
• mutex & wrt semaphores are initialized to 1
• readcount is initialized to 0, which tracks how many process are reading
• wrt semaphore is common to both reader & writer process
• Data Structure:
semaphore mutex, wrt;
int readcount;
Reader Program:
while(TRUE)
{
    wait(mutex);
    readcount++;
    if (readcount == 1)
        wait(wrt);
    signal(mutex);
    ..
    // reading is performed
    ..
    wait(mutex);
    readcount--;
    if (readcount == 0)
        signal(wrt);
    signal(mutex);
}

Writer Program:
while(TRUE)
{
    wait(wrt);
    ..
    // writing is performed
    ..
    signal(wrt);
}
• This problem is generalized as Reader Writer locks
• Acquiring locks requires specifying mode of lock: Read or Write
• When process wishes to read shared data, it requests lock in Read mode
• When process wishes to modify shared data, it requests lock in Write mode
• Multiple processes may concurrently hold the lock in read mode, but only one process may hold the lock in write mode
• Locks are useful in :
– Applications where it is easy to identify which process only read shared data & which threads
only write shared data
– Applications that have more readers than writers
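Pthreads exposes exactly this kind of lock as pthread_rwlock_t; a minimal sketch is shown below, where shared_data and the thread bodies are illustrative.

/* Reader-writer lock sketch with Pthreads, matching the read/write modes above. */
#include <pthread.h>
#include <stdio.h>

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
int shared_data = 0;

void *reader(void *arg) {
    pthread_rwlock_rdlock(&rwlock);      /* request lock in Read mode */
    printf("read %d\n", shared_data);    /* many readers may hold this together */
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

void *writer(void *arg) {
    pthread_rwlock_wrlock(&rwlock);      /* request lock in Write mode */
    shared_data++;                       /* exclusive access */
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}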
3. Dining Philosophers Problem:
Problem:
• Consider five Philosophers who spend their lives Thinking &
Eating
• They share circular table surrounded by five chairs
• In the center of the table is a bowl of rice, and the table is laid with five single
chopsticks
• When philosopher thinks, he does not interact with his
colleagues
• When philosopher got hungry, he tries to pick up two
chopsticks that are closest to him
• He picks up only one chopstick at a time
• He can not pick up chopstick that is already in hand of neighbor
• Simple solution is to represent each chopstick with semaphore & initialized to 1
Data Structure:
Semaphore chopstick [5];
Structure of Philosopher
while(TRUE)
{ wait(chopstick[i]); wait(chopstick[(i+1)%5]);
..
//eat
..
signal(chopstick[i]);
signal(chopstick[(i+1)%5]);
..
//think
..
}
Implementation Problem:
• It guarantees that no two neighbors are eating simultaneously
• It may cause Deadlock:
– if all five philosophers become hungry simultaneously
– Each one grabs his left chopstick
– When each philosopher then tries to grab his right chopstick, a deadlock occurs
• Solution to prevent deadlock:
– Allow at most four philosophers to sit simultaneously
– Allow philosopher to pick up chopsticks, if both are available only
– Use an asymmetric solution:
• odd philosophers pick left chopstick first and then right chopstick
• Even philosophers pick right chopstick first and then left
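A sketch of the asymmetric solution with one semaphore per chopstick is given below (POSIX semaphores assumed; the philosopher() body is illustrative).

/* Asymmetric dining-philosophers sketch: odd philosophers take the left
 * chopstick first, even ones the right chopstick first, so a circular
 * wait cannot form. Each semaphore is assumed initialized to 1. */
#include <semaphore.h>

#define PHILOSOPHERS 5
sem_t chopstick[PHILOSOPHERS];

void philosopher(int i)
{
    int left  = i;
    int right = (i + 1) % PHILOSOPHERS;

    while (1) {
        /* think ... */
        if (i % 2 == 1) {                /* odd: left first, then right */
            sem_wait(&chopstick[left]);
            sem_wait(&chopstick[right]);
        } else {                         /* even: right first, then left */
            sem_wait(&chopstick[right]);
            sem_wait(&chopstick[left]);
        }
        /* eat ... */
        sem_post(&chopstick[left]);
        sem_post(&chopstick[right]);
    }
}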
Monitor:
Problems with Semaphores
• Incorrect use of semaphores can result in timing errors
• Correct sequence of mutex semaphore is:
– wait(mutex) …. Signal(mutex)
• Incorrect use of semaphore operations:
– signal (mutex) …. wait (mutex)
• In this case, several process may be
executing in their critical sections
simultaneously
• Violates mutual exclusion requirements
– wait (mutex) … wait (mutex)
• In this case, dead lock will occur
– Omitting of wait (mutex) or signal (mutex) (or
both)
• In this case, there is no use of creating
semaphore
• To solve all these problems, researchers have developed
high level language construct
• High level synchronization construct - Monitor
Usage:
• A monitor is a type, or abstract data type, that encapsulates private data with public methods to operate on that data
• A monitor presents a set of programmer-defined operations that are provided mutual exclusion within the monitor.
Syntax of Monitor is:
monitor monitor-name
{
    // shared variable declarations
    procedure P1 (…) { …. }
    …
    procedure Pn (…) { …… }
    initialization code (….) { … }
}
• Representation of monitor type cannot be used directly by various process
• Procedure defined within monitor can access only those variables declared locally with in monitor and
its formal parameters
• Local variables can be accessed by only local procedures
• Monitor can ensure that only one process at a time can be active with in the monitor
• It need some synchronization mechanism as Conditions
• One or more variable of conditions:
condition x,y;
• Operations on condition variables:
x.wait();
x.signal();
• When x.signal() operation is invoked by process P,
there is suspended process Q associated with
condition x.
• If Q is allowed to resume its execution, signaling
process P must Wait
• Otherwise, both P & Q would be active simultaneously
within the monitor, which is not allowed
• Both can continue with two possibilities :
• Signal & Wait:
– P either waits until Q leaves the monitor,
– or waits for another condition
• Signal & Continue:
– Q either waits until P leaves the monitor,
– or waits for another condition
Dining Philosophers solution using Monitors:
Data Structure:
enum {thinking, hungry, eating}state[5];
Condition self[5];
Dining Philosopher Program:
monitor DP
{ enum { THINKING, HUNGRY, EATING } state[5];
condition self [5];
void pickup (int i)
{ state[i] = HUNGRY;
test(i);
if (state[i] != EATING) self[i].wait();
}
void putdown (int i)
{ state[i] = THINKING;
test((i + 4) % 5);
test((i + 1) % 5);
}
void test (int i)
{ if ( (state[(i + 4) % 5] != EATING) && (state[i] == HUNGRY) && (state[(i + 1) % 5] != EATING) )
{ state[i] = EATING ;
self[i].signal () ;
}
}
initialization_code()
{ for (int i = 0; i < 5; i++)
state[i] = THINKING;
}
}
Sequence of operations a philosopher i needs to invoke:
dp.pickup(i);
    ... eat ...
dp.putdown(i);
Monitor Implementation Using Semaphores:
• For each monitor, a semaphore mutex (initializes to 1) is provided
• Process must execute wait(mutex) before entering monitor,
• Must execute signal(mutex) after leaving monitor
• Data structure:
semaphore mutex; // (initially = 1)
semaphore next; // (initially = 0)
int next-count = 0;
• Each procedure F will be replaced by:
wait(mutex);
    …
    body of F;
    …
if (next-count > 0)
    signal(next);
else
    signal(mutex);
• Mutual exclusion within a monitor is ensured. For each condition variable x, we have:
semaphore x-sem; // (initially = 0)
int x-count = 0;
The operation x.wait() can be implemented as:
x-count++;
if (next-count > 0)
    signal(next);
else
    signal(mutex);
wait(x-sem);
x-count--;

The operation x.signal() can be implemented as:
if (x-count > 0)
{
    next-count++;
    signal(x-sem);
    wait(next);
    next-count--;
}
Resuming Processes within Monitor:
• Process resumption order within the monitor:
• Several processes are suspended on condition x, and x.signal() is executed by some process; how do we find
which process needs to be resumed next?
• One solution is FCFS ordering
• Conditional-Wait construct can be used
x.wait(c);
• Value of c, which is called Priority Number
• When x.signal() is executed, process with smallest associated priority number is resumed next
• Following problems can occur in monitors:
– Process might access a resource without first gaining access permission to the resource
– Process might never release resource once it has been granted access to resource
– Process might attempt to release resource that it never requested
– Process might request same resource twice
• One possible solution to the current problem is to include the resource access operations within the
ResourceAllocator monitor
Monitor to allocate a single resource:
Monitor ResourceAllocator
{
    boolean busy;
    condition x;
    void acquire(int time)
    {
        if (busy)
            x.wait(time);
        busy = TRUE;
    }
    void release()
    {
        busy = FALSE;
        x.signal();
    }
    initialization_code()
    {
        busy = FALSE;
    }
}

Sequence of execution:
R.acquire(t);
…
Access resource;
…
R.release();
• Check two conditions to establish correctness of system:
– User processes must always make their calls on the monitor in the correct sequence
– We must be sure that an uncooperative process does not simply ignore the mutual exclusion
gateway
Synchronization Examples:
Solaris Synchronization:
• Adaptive mutex: protects access to every critical data item
• On multi processor system, Adaptive mutex starts as standard semaphore implemented as Spinlock
• Adaptive mutex will do any on of them, if data is locked:
– If lock is held by process that is currently running on another CPU, process spins while waiting
for lock to become available
– If the process holding the lock is not currently in the run state, the waiting process blocks and sleeps until the lock is released
• Reader Writer locks are used to protect data that are accessed frequently but are usually accessed in read
only manner
• Turnstile is queue structure containing process blocked in lock
• Each synchronized object with at least one process blocked on objects lock require separate turnstile
• To prevent Priority Inversion, turnstile is organized according to Priority Inheritance Protocol
• To optimize Solaris performance, developers have refined and fine tuned this locking methods
Windows XP Synchronization:
• It is multi threaded kernel that provides support for Real time applications & Multi processors
• When XP kernel access global resource on uni-processor system, it temporarily masks interrupts for all
interrupt handlers
• On multi processor system, it protects access to global resources using spinlocks
• For thread synchronization outside kernel, XP provides Dispatcher Object
• System protects shared data by requiring thread to gain ownership of mutex to access data & to release
owner ship when it is finished
• Dispatch object has two states:
– Signaled State: indicates that an object is available
& thread will not block when acquiring object
– Non-signaled State: indicates that an object is not
available & thread will block when attempting to
acquire object
Linux Synchronization:
• Prior to version 2.6, Linux provided a non-preemptive kernel, meaning that a process running in kernel mode could not be
preempted even if a higher-priority process was available to run
• It provides spinlocks and semaphores
• On SMP machines, fundamental locking mechanism is Spinlock
• On single processor machines, enabling & disabling kernel preemption
Single Processor Multiple Processor
Disable Kernel Preemption Acquire Spin Lock
Enable Kernel Preemption Release Spin Lock
• It provides two simple system calls to disable or enable kernel preemption:
– preempt_disable(), prempt_enable()
• Each task has thread_info structure containing counter(preempt_count), to indicate number of locks
being held by task
Pthreads Synchronization:
• Pthreads API provides mutex locks, condition variables, read-write locks for thread synchronization
• Mutex locks is used to protect critical sections of code
• Condition variables behaves much as Monitors
• Read-write locks behave similarly to the reader-writer locking mechanism described earlier
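A minimal sketch of the first two primitives listed above - a mutex lock guarding a critical section and a condition variable used to wait for a state change - is shown below; counter, THRESHOLD and the thread bodies are illustrative.

/* Pthreads sketch: a mutex protecting a shared counter plus a condition
 * variable used to wait until the counter reaches a threshold. */
#include <pthread.h>

#define THRESHOLD 10

pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
int counter = 0;

void *incrementer(void *arg)
{
    pthread_mutex_lock(&lock);          /* entry section */
    counter++;                          /* critical section */
    if (counter >= THRESHOLD)
        pthread_cond_signal(&ready);    /* wake a waiting thread */
    pthread_mutex_unlock(&lock);        /* exit section */
    return NULL;
}

void *waiter(void *arg)
{
    pthread_mutex_lock(&lock);
    while (counter < THRESHOLD)         /* re-check the condition after every wakeup */
        pthread_cond_wait(&ready, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}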
PRINCIPLES OF DEADLOCKS
System Model:
System consists of finite number of resources to be distributed among number of competing processes
 Resources are partitioned into several types, each consisting of some number of identical instances
 If system has two CPUs, then resource type CPU has two instances
 If process requests an instance of resource type, allocation of any instance of type will satisfy request
 Ex: system may have two printers, these two printers may be defined to same resource class if no one cares
which printer prints which output
 A process must request resources before using it and must release resource after using it
 A process may request as many resources as it requires to carry out its designated task
 Number of resources requested may not exceed total number of resources available in system
 Process may utilize resource in only following sequence:
o Request: if request cannot be granted immediately, then requesting process must wait until it can
acquire resource
o Use: process can operate on resource
o Release: process releases resource
 Request & Release of resources are System calls
 They are managed by OS and accomplished through wait() and signal() operations or through acquisition
& release of mutex locks
 System table records whether each resource is free or allocated
 When a resource is allocated, it also stores process to which it is allocated
 If process request resource that is currently allocated to another process, it can be added to queue of
processes waiting for this resource
 Set of processes is in dead lock state, when every process in the set is waiting for an event that can be
caused only by another process in the set
Deadlock Characterization:
Necessary Conditions:
 Deadlock can arise if four conditions hold simultaneously.
o Mutual exclusion: only one process at a time can use a resource.
o Hold and wait: a process holding at least one resource is waiting to acquire additional resources
held by other processes.
o No preemption: a resource can be released only voluntarily by the process holding it, after that
process has completed its task.
o Circular wait: there exists a set {P0, P1, …, Pn} of waiting processes such that P0 is waiting for a
resource that is held by P1, P1 is waiting for a resource that is held by P2, …, Pn–1 is waiting for a
resource that is held by Pn, and Pn is waiting for a resource that is held by P0.
Resource Allocation Graph:
 Deadlocks are described in terms of Directed Graph called System Resource Allocation graph
 This graph consists of vertices V, set of edges E
 Set of vertices are partitioned into two different types of nodes:
o P={P1, P2, …Pn}, set consisting of all active processes in system
o R={R1, R2, .. Rn}, set consisting of all resource types in system
 A directed edge from process Pi to resource type Rj is denoted by Pi → Rj
o It signifies that Pi has requested an instance of resource type Rj
o Currently Pi is waiting for that resource
o It is called as Request edge
 A directed edge from resource type Rj to process Pi is denoted by Rj → Pi
o It signifies that an instance of resource type Rj has been allocated to process Pi
o It is called as Assignment edge
 Circle is used for process, Rectangle is used for resource for representation
 Edge transformation:
o When a process requests resources Request edge is placed
o later it is transformed to Assignment Edge
o Finally edge is removed when operation is completed
Example of a Resource Allocation Graph
 Set P, R, E:
o P={P1, P2, P3}
o R={R1, R2, R3, R4}
o E={P1 → R1, P2 → R3, R1 → P2, R2 → P2, R2 → P1,
R3 → P3}
 Resource Instances:
o One instance of R1
o Two instances of R2
o One instance of R3
o Three instances of R4
 Process States:
o P1 is holding R2, waiting for R1
o P2 is holding R1 & R2, waiting for R3
o P3 is holding R3
 If graph contains no cycles, then no process in system is deadlocked
 If graph contains cycles, then deadlock may exist
 If each resource has exactly one instance, then cycle in the graph is necessary & sufficient condition for
existence of dead lock
 If each resource has several instances, then cycle does not necessarily imply that dead lock has occurred
With deadlock:
P1 → R1 → P2 → R3 → P3 → R2 → P1
P2 → R3 → P3 → R2 → P2
With cycle but no deadlock:
P1 → R1 → P3 → R2 → P1
Methods for handling Deadlocks:
 Deadlock can be solved with one of three ways:
o Use protocol to prevent or avoid deadlocks, ensuring that system will never enter deadlock
o Allow system to enter deadlock state, detect it & recover
o Ignore problem altogether & assume that deadlocks never occur in system
 Third solution is used by most OS, including UNIX & WINDOWS
 Deadlock Prevention:
o It provides set of methods for ensuring that at least one of necessary conditions can not hold
o These methods prevent deadlocks by restricting how requests for resources can be made
 Deadlock Avoidance:
o It requires that OS be given in advance additional information concerning which resources process
will request and use during its life time
o With this knowledge, it can decided for each request whether or not process should wait
o To decide whether current request can be satisfied or must delayed, system must consider:
 Resources currently available
 Resources currently allocated to each process
 Future requests & release of each process
 Ignore Problem:
o If system does not employ either Deadlock prevention or avoidance algorithm, then dead lock
situation may arise
o In this environment, system can provide algorithm to examine dead lock state of the system
o If system neither ensure that deadlock will never occur nor provides mechanism for deadlock
detection & recovery, then deadlock may not be recognized
o In this case, undetected deadlock will decrease performance of system
 Eventually, system will stop functioning and will need to restart manually
 In many systems, deadlocks occur infrequently so this method is cheaper than others
Deadlock Prevention:
By ensuring that at least one of these conditions cannot hold, we can prevent the occurrence of deadlock
Mutual Exclusion:
 Mutual Exclusion must hold for non sharable resources
 Ex: a printer cannot be simultaneously shared by several processes
 Sharable resources do not require mutually exclusive access & cannot be involved in a deadlock
 Ex: read only files are sharable resources
 A process never need to wait for sharable resource
Hold & Wait:
 To ensure that hold & wait never occurs in the system, we must guarantee that
 whenever a process requests a resource, it does not hold any other resources
 Two Protocols are used for this:
o One protocol allows each process to request and be allocated all its resources before it begins
execution. System call can be used for providing this protocol
o Second protocol allows process to request resources only when it has no holding resource. All
resources must be released to request any additional resources
 Ex: Consider process that copies file from DVD to disk, sorts file, and print results
 Method1: Taking all resources at the beginning
o If all resources must be requested at the beginning of process, then it must initially request DVD,
Disk, Printer
o It will hold printer for its entire execution, even though it needs printer only at end
 Method2: Request resource by releasing all current resources
o Allow process to request initially only DVD and Disk
o It copies file from DVD to Disk then release both DVD and Disk
o Then again request Disk and Printer
o After sending file from Disk to Printer, release these two resources
 Disadvantage of Method1: Resource utilization may be low, since resources are unused for a long time
after allocation
 Disadvantage of Method2: Starvation is possible, process that needs several resources may have to wait
indefinitely
No Preemption:
 We need to use a protocol to ensure this condition
 General method:
o If process is holding some resources & requests another resource that cannot be immediately
allocated to it, then all resources currently being held are preempted
o All these resources are implicitly released
o Preempted resources are added to the list of resources for which process is waiting
o Process will be restarted only when it can regain its old resources as well as new ones that it is
requesting
 Alternative method:
o If process requests some resource, first check whether they are available
o If they are free, allocate them
o If they are not free, check whether they are allocated to some other process that is waiting for other
resource
o If so, preempt the desired resource from the waiting process & allocate them to requesting process
o If resources are neither available nor holding by waiting process, requesting process must wait
o While it is waiting some of its resources may be preempted, if only another process requests
o This protocol can be used to resources whose state can be easily saved & stored
Circular Wait:
 Simple method:
o Impose total ordering of all resource types and to require that each process requests resources in an
increasing order of enumeration
o Let R={R1,R2,…Rm} be set of resource types, each resource types is assigned with unique integer
number which allows to compare
o Define a one-to-one function F: R → N, where N is the set of natural numbers
o Ex: F(tape drive)=1, F(disk drive)=5, F(printer)=12
o Consider following protocol for prevention:
 Each process can request resources only in increasing order of enumeration
 Process can request any number of instances of resource type say Ri
 Process can request resource type Rj if and only if F(Rj)>F(Ri)
 If several instances of same resource are needed, single request for all of them must be issued
 Whenever process requests an instance of resource type Rj, it has released Ri such that
F(Ri)>=F(Rj)
o Let the set of processes involved in the circular wait be {P0, P1, ... , Pn}, where Pi is waiting for a
resource Ri, which is held by process Pi+1
o Then, since process Pi+1 is holding resource Ri while requesting resource Ri+1, we must have F(Ri) <
F(Ri+1) for all i
o But this condition means that F(R0) < F(R1) < ... < F(Rn) < F(R0), which is impossible, so the circular wait cannot hold
o If these two protocols are used circular wait condition cannot hold
o Developing ordering & hierarchy itself does not prevent deadlock
o It is up to the application developer to write programs that follow the ordering (a sketch is given after this list)
o Certain software can be used to verify that locks are acquired in proper order
o One lock order verifier that works on UNIX such as FreeBSD, is known as Witness
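The sketch below illustrates the ordering protocol in code: every thread acquires the two Pthreads mutexes in the same fixed order, so the circular-wait condition can never hold (the mutex names and the transfer() task are illustrative).

/* Resource-ordering sketch: assign each lock a position in a total order
 * and always acquire locks in increasing order, never the reverse. */
#include <pthread.h>

pthread_mutex_t first_mutex  = PTHREAD_MUTEX_INITIALIZER;   /* ordered first  */
pthread_mutex_t second_mutex = PTHREAD_MUTEX_INITIALIZER;   /* ordered second */

void transfer(void)
{
    /* every thread takes first_mutex before second_mutex */
    pthread_mutex_lock(&first_mutex);
    pthread_mutex_lock(&second_mutex);

    /* ... use both resources ... */

    pthread_mutex_unlock(&second_mutex);
    pthread_mutex_unlock(&first_mutex);
}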
Deadlock Avoidance:
 Disadvantages with deadlock prevention:
o Low device utilization
o Reduced system through put
 Deadlock avoidance require additional information about how resources are to be requested
 Ex: System has Printer & Tape Drive along with two Process P, Q
o P request first tap drive, later printer and are allotted
o Before releasing both resources, Q requests first printer & then tape drive
o With this knowledge of complete sequence of requests & release, we can decide for each request
whether or not process should wait
o System will consider current available resources and future requests & releases of each process to
issue request to avoid deadlocks
 Simplest & most useful avoidance algorithm requires
o Max. number of resources of each type that process may need
o Max. Number of resources of each type that process may request
o It dynamically examines resource allocation state to check circular wait
o Deadlock State is defined by no. of available & allocated resources and Max. demands of process
Safe State:
 State is Safe if the system can allocate resources to each process in some order & still avoid deadlock
 A sequence of processes <P1, P2, .. Pn> is a safe sequence for the current allocation state if
o for each Pi, the resource requests that Pi can still make can be satisfied
by the currently available resources plus the resources held by all Pj
with j < i
o If resource that Pi needs are not immediately available, then Pi
can wait until all Pj have finished
o When they have finished, Pi can obtain all of its needed
resources, complete its task and return its allocated resources
and terminate
o When Pi terminate, Pi+1 can obtain its needed resources and so
on
 If state is Safe, OS can avoid deadlock states; If state is Unsafe, OS
cannot prevent process from requesting resources such that deadlock occurs
 Ex: consider system with 12 magnetic tape drives & 3 Process
 At time T0: if 5 tapes for P0, 2 tapes for P1, 2 tapes for P2 are allotted then, system is in Safe State
Process   Max. Need   At time T0    After some time →
P0        10          5 (5 need)    5 (5 need)    5 (5 need)    10            Completed      Completed
P1        4           2 (2 need)    4             Completed     Completed     Completed      Completed
P2        9           2 (7 need)    2 (7 need)    2 (7 need)    2 (7 need)    2 (7 need)     9
Total     23          9 (3 free)    11 (1 free)   7 (5 free)    12 (0 free)   2 (10 free)    9 (3 free)
The allocations can proceed in the safe sequence <P1, P0, P2>.
 At time T1: if 5 tapes for P0, 2 tapes for P1, 3 tapes for P2 are allotted then, system is in Unsafe State
Process   Max. Need   At time T1    After some time →
P0        10          5 (5 need)    5 (5 need)    5 (5 need)
P1        4           2 (2 need)    4             Completed
P2        9           3 (6 need)    3 (6 need)    3 (6 need)
Total     23          10 (2 free)   12 (0 free)   8 (4 free)
P0 still requires 5 tapes and P2 requires 6 tapes to complete, but only 4 tapes are free. So P0 and P2 will be deadlocked.
Resource-Allocation Graph:
 In addition to the request and assignment edges, we introduce a new type of edge, called a claim edge
 A claim edge Pi → Rj indicates that process Pi may request resource Rj at
some time in the future
 This edge resembles a request edge in direction, but is represented by a
dashed line
 When process Pi requests resource Rj, the claim edge Pi → Rj is converted
to a request edge
 Similarly, when a resource Rj is released by Pi, the assignment edge Rj →
Pi is reconverted to a claim edge Pi → Rj.
 Suppose that process Pi requests resource Rj.
 The request can be granted only if converting the request edge Pi → Rj to an assignment edge Rj → Pi does
not result in the formation of a cycle in the resource-allocation graph.
 If no cycle exists, then the allocation of the resource will leave the system in
a safe state. If a cycle is found, then the allocation will put the system in an
unsafe state.
 Therefore, process Pi will have to wait for its requests to be satisfied. If P1
requests R2, and P2 requests R1, then a deadlock will occur.
Banker’s Algorithm:
 The deadlock-avoidance algorithm described next is applicable to a system with multiple instances of each
resource type, but it is less efficient than the resource-allocation graph scheme.
 This algorithm is commonly known as the banker's algorithm.
 Data Structures for the Banker’s Algorithm:
Let n = number of processes, and m = number of resources types.
o Available: Vector of length m. If available [j] = k, there are k instances of resource type Rj
available.
o Max: n x m matrix. If Max [i,j] = k, then process Pi may request at most k instances of resource type
Rj.
o Allocation: n x m matrix. If Allocation[i,j] = k then Pi is currently allocated k instances of Rj.
o Need: n x m matrix. If Need[i,j] = k, then Pi may need k more instances of Rj to complete its task.
Need [i,j] = Max[i,j] – Allocation [i,j].
 Safety Algorithm:
o Step 1: Let Work and Finish be vectors of length m and n, respectively. Initialize:
 Work = Available
 Finish[i] = false for i = 0, 1, …, n-1
o Step 2: Find an i such that both:
 Finish[i] == false
 Needi ≤ Work
If no such i exists, go to step 4.
o Step 3: Work = Work + Allocationi
Finish[i] = true
go to step 2.
o Step 4: If Finish[i] == true for all i, then the system is in a safe state.
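A minimal sketch of this safety check in Python follows (illustrative only; the function and variable names are assumptions, not part of the original notes). Available is a vector of length m; Allocation and Need are n x m matrices.

    def is_safe(available, allocation, need):
        """Banker's safety algorithm: return (True, safe_sequence) if the
        current allocation state is safe, otherwise (False, [])."""
        n = len(allocation)            # number of processes
        m = len(available)             # number of resource types
        work = list(available)         # Step 1: Work = Available
        finish = [False] * n           # Finish[i] = false for all i
        sequence = []
        while True:
            found = False
            # Step 2: find an i with Finish[i] == false and Need_i <= Work
            for i in range(n):
                if not finish[i] and all(need[i][j] <= work[j] for j in range(m)):
                    # Step 3: pretend Pi runs to completion and releases its resources
                    for j in range(m):
                        work[j] += allocation[i][j]
                    finish[i] = True
                    sequence.append(i)
                    found = True
            if not found:
                break
        # Step 4: safe iff every process could finish
        return (all(finish), sequence if all(finish) else [])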
 Resource-Request Algorithm for Process Pi:
o Requesti = request vector for process Pi. If Requesti[j] = k, then process Pi wants k instances of
resource type Rj.
o Step 1: If Requesti ≤ Needi, go to step 2. Otherwise, raise an error condition, since the process has
exceeded its maximum claim.
o Step 2: If Requesti ≤ Available, go to step 3. Otherwise Pi must wait, since the resources are not available.
o Step 3: Pretend to allocate the requested resources to Pi by modifying the state as follows:
 Available = Available – Requesti
 Allocationi = Allocationi + Requesti
 Needi = Needi – Requesti
 If the resulting state is safe ⇒ the resources are allocated to Pi
 If unsafe ⇒ Pi must wait, and the old resource-allocation state is restored
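The same steps can be sketched in Python, reusing the is_safe function from the previous sketch (again, the helper name and in-place list updates are assumptions for illustration):

    def request_resources(i, request, available, allocation, need):
        """Resource-request algorithm for process Pi.
        Commits the allocation and returns True if the new state is safe;
        otherwise restores the old state and returns False (Pi must wait)."""
        m = len(available)
        # Step 1: Request_i must not exceed Need_i
        if any(request[j] > need[i][j] for j in range(m)):
            raise ValueError("process has exceeded its maximum claim")
        # Step 2: Request_i must not exceed Available, else Pi must wait
        if any(request[j] > available[j] for j in range(m)):
            return False
        # Step 3: pretend to allocate, then test safety
        for j in range(m):
            available[j] -= request[j]
            allocation[i][j] += request[j]
            need[i][j] -= request[j]
        safe, _ = is_safe(available, allocation, need)
        if not safe:
            # Unsafe: restore the old resource-allocation state
            for j in range(m):
                available[j] += request[j]
                allocation[i][j] -= request[j]
                need[i][j] += request[j]
        return safe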
Illustrative Example:
 Consider system with 5 processes P0 through P4, 3 resource types: A (10 instances), B (5instances), and C
(7 instances).
 Snapshot at time T0:

             Allocation   Max      Need     Available
             A B C        A B C    A B C    A B C
      P0     0 1 0        7 5 3    7 4 3    3 3 2
      P1     2 0 0        3 2 2    1 2 2
      P2     3 0 2        9 0 2    6 0 0
      P3     2 1 1        2 2 2    0 1 1
      P4     0 0 2        4 3 3    4 3 1
 To find whether the system is in safe state or not, apply the Safety algorithms as follows:
 Now initialize Work and Finish as Work := Available and Finish[i] := false
 Repeatedly pick an i with Finish[i] == false and Needi ≤ Work, and update Work = Work + Allocationi and Finish[i] = true
 Since all Finish values eventually become true, the system is in a safe state and the safe
sequence is <P1, P3, P4, P0, P2>.
 If process P1 requests one instance of A and 2 instances of C, then Request1 = (1, 0, 2).
 To decide whether this request can be granted, we apply the resource-request algorithm; if the request were granted, the state would be:
             Allocation   Max      Need     Available
             A B C        A B C    A B C    A B C
      P0     0 1 0        7 5 3    7 4 3    2 3 0
      P1     3 0 2        3 2 2    0 2 0
      P2     3 0 2        9 0 2    6 0 0
      P3     2 1 1        2 2 2    0 1 1
      P4     0 0 2        4 3 3    4 3 1
 Applying the resource-request algorithm:
o Request1(1, 0, 2) ≤ Need1(1, 2, 2) is true
o Request1(1, 0, 2) ≤ Available(3, 3, 2) is true
o Since both conditions are satisfied, we pretend to allocate the request by updating the data structures as
follows:
 Available(2,3,0) := Available(3,3,2) – Request1(1,0,2)
 Allocation1(3,0,2) := Allocation1(2,0,0) + Request1(1,0,2)
 Need1(0,2,0) := Need1(1,2,2) – Request1(1,0,2)
 Executing safety algorithm shows that sequence < P1, P3, P4, P0, P2> satisfies safety requirement.
Deadlock Detection:
 If a system does not employ either a deadlock-prevention or a deadlock-avoidance algorithm, then a
deadlock situation may occur
 In this environment, the system must provide:
1. An algorithm that examines the state of the system to determine whether a deadlock has occurred
2. An algorithm to recover from the deadlock
Single Instance of Each Resource Type:
 If all resources have only a single instance, then resource-allocation graph is sufficient for detecting
deadlock by constructing Wait-for graph.
 Wait for graph is obtained from resource-allocation graph by removing the nodes of type resource and
collapsing the appropriate edges
 An edge Pi → Pj exists in a wait-for graph if and only if the corresponding resource-allocation graph
contains the two edges Pi → Rq and Rq → Pj for some resource Rq.
 The resource-allocation graph and its wait-for graph are as follows:
Converting edges in the resource-allocation graph to the wait-for graph:
P1 → R1 → P2  =>  P1 → P2
P2 → R4 → P3  =>  P2 → P3
P2 → R5 → P4  =>  P2 → P4
P2 → R3 → P5  =>  P2 → P5
P3 → R5 → P4  =>  P3 → P4
P4 → R2 → P1  =>  P4 → P1
The wait-for graph contains only processes, and the 6 edges listed above must be included as shown.
Several Instances of a Resource Type:
 The wait-for graph scheme is not applicable to a resource-allocation system with multiple instances of each
resource type.
 Periodically invoke an algorithm that searches for a cycle in the graph. If there is a cycle, there exists a
deadlock.
 An algorithm to detect a cycle in a graph requires on the order of n² operations, where n is the number of
vertices in the graph.
 Data Structures for the Detection Algorithm:
o Available: A vector of length m indicates the number of available resources of each type.
o Allocation: An n x m matrix defines the number of resources of each type currently allocated to
each process.
o Request: An n x m matrix indicates the current request of each process. If Request[i,j] = k, then
process Pi is requesting k more instances of resource type Rj.
 Detection Algorithm:
o Step 1: Let Work and Finish be vectors of length m and n, respectively. Initialize:
 Work = Available
 For i = 1, 2, …, n: if Allocationi ≠ 0, then Finish[i] = false;
 otherwise, Finish[i] = true.
o Step 2: Find an index i such that both:
 Finish[i] == false
 Requesti ≤ Work
If no such i exists, go to step 4.
o Step 3: Work = Work + Allocationi
Finish[i] = true
Go to step 2.
o Step 4: If Finish[i] == false for some i, 1 ≤ i ≤ n, then the system is in a deadlocked state. Moreover, if
Finish[i] == false, then Pi is deadlocked.
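A short Python sketch of this detection algorithm (the function name and data layout are illustrative assumptions; it differs from the safety algorithm in using Request instead of Need and in how Finish is initialized):

    def detect_deadlock(available, allocation, request):
        """Return the list of deadlocked process indices (empty if none)."""
        n, m = len(allocation), len(available)
        work = list(available)
        # Finish[i] starts true only for processes holding no resources
        finish = [all(a == 0 for a in allocation[i]) for i in range(n)]
        changed = True
        while changed:
            changed = False
            for i in range(n):
                if not finish[i] and all(request[i][j] <= work[j] for j in range(m)):
                    # Pi's outstanding request can be granted; assume it finishes
                    for j in range(m):
                        work[j] += allocation[i][j]
                    finish[i] = True
                    changed = True
        return [i for i in range(n) if not finish[i]]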
Example of Detection Algorithm:
 Consider a system with five processes P0 to P4 and three resource types A with 7 instances, B with 2
instances and C with 6 instances.
 Suppose that, at time T0 resource allocation state is as shown.
             Allocation   Request   Available
             A B C        A B C     A B C
      P0     0 1 0        0 0 0     0 0 0
      P1     2 0 0        2 0 2
      P2     3 0 3        0 0 0
      P3     2 1 1        1 0 0
      P4     0 0 2        0 0 2
 Now apply the detection algorithm to find out whether the system is in a deadlocked state.
 First initialize Work and Finish: Work := Available; if Allocationi ≠ 0 then Finish[i] := false, otherwise
Finish[i] := true.
 Continue the steps, updating Work whenever there is an i with Finish[i] == false and Requesti ≤ Work.
 The system is not in a deadlocked state: the sequence <P0, P2, P3, P4, P1> results in Finish[i] == true for all i.
 Suppose now P2 makes one additional request for an instance of type C. Then only P0's resources can be
reclaimed, and the remaining free resources are not enough to satisfy the requests of the other processes, so
processes P1, P2, P3 and P4 are deadlocked.
Detection-Algorithm Usage:
 When should we invoke the detection algorithm? The answer depends on two factors:
o How often is a deadlock likely to occur?
o How many processes will be affected by deadlock when it happens?
 If deadlocks occur frequently, then the detection algorithm should be invoked frequently.
 Resources allocated to deadlocked processes will be idle until the deadlock can be broken.
 In addition, the number of processes involved in the deadlock cycle may grow.
 Invoke Deadlock Detection algorithm every time a request for allocation cannot be granted immediately.
 In this case, we can identify not only the set of processes that is deadlocked, but also the specific process
that "caused" the deadlock.
 If this algorithm is invoked for every resource request, this will incur computational overhead
 A less expensive alternative is simply to invoke the algorithm at less frequent intervals. Ex: once per hour
or whenever CPU utilization drops below 40% etc.
Recovery from Deadlock:
 One possibility is to inform the operator that a deadlock has occurred, and to let the operator deal with the
deadlock manually.
 The other possibility is to let the system recover from the deadlock automatically.
 There are two options for breaking a deadlock.
o One solution is simply to abort one or more processes to break the circular wait.
o Second option is to preempt some resources from one or more of the deadlocked processes.
Process Termination:
 To eliminate deadlocks by aborting a process, we use one of two methods. In both methods, the system
reclaims all resources allocated to the terminated processes.
o Abort all deadlocked processes:
 This method clearly will break the deadlock cycle
 It is more expensive, since these processes may have computed for a long time, and the results
of these partial computations must be discarded, and probably must be recomputed later.
o Abort one process at a time until the deadlock cycle is eliminated:
 This method incurs considerable overhead, since after each process is aborted, deadlock-
detection algorithm must be invoked to determine whether any processes are still deadlocked.
 Aborting a process may not be easy: if the process was in the midst of updating a file, terminating it will leave the
file in an incorrect state.
 If the partial termination method is used, then we must determine which deadlocked processes should be terminated
in an attempt to break the deadlock.
 This determination is a policy decision. Many factors may affect which process is chosen, including:
o What the priority of the process is
o How long the process has computed, and how much longer the process will compute before
completing its designated task
o How many and what type of resources the process has used
o How many more resources the process needs in order to complete
o How many processes will need to be terminated
o Whether the process is interactive or batch
Resource Preemption:
 Successively preempt some resources until deadlock cycle is broken
 If preemption is required to deal with deadlocks, then three issues need to be addressed:
o Selecting a victim:
 Which resources and which processes are to be preempted?
 As in process termination, we must determine the order of preemption to minimize cost
 Cost factors may include such parameters as the number of resources a deadlock process is
holding, and the amount of time a deadlocked process has thus far consumed.
o Rollback:
 If we preempt a resource from a process, what should be done with that process?
 It cannot continue with its normal execution, because it is missing a needed resource
 We must roll back the process to some safe state, and restart it from that state
 Since it is generally difficult to determine a safe state, the simplest solution is a total rollback: abort
the process and restart it, repeating until the deadlock is broken
o Starvation:
 How do we ensure that starvation will not occur?
 Since the victim is selected based on cost factors, the same process may always be picked as the victim
 That process may then never complete because of repeated preemption
 The solution is to ensure a process can be picked as a victim only a finite number of times; the most common
approach is to include the number of rollbacks in the cost factor
UNIT V
MEMORY MANAGEMENT
Memory: - it is a large array of words or bytes, each with its own address.

Address binding:-
The binding of instructions and data to memory addresses can be called address binding.
The binding can be done in 3 ways: 1) compile time 2) load time 3) execution time.
Compile time:- binding is done at compilation time, i.e. if it is known at compile time where
the process will reside in memory, then absolute code can be generated.
Load time:- binding is done at load time. If it is not known at compile time where the process will
reside in memory, then the compiler must generate relocatable code; in this case the final binding is
delayed until load time.
Execution time:- the binding must be delayed until run time if the process can be moved during its execution from one memory segment to another.

Dynamic loading:-
Loading of a routine is postponed until execution time. To obtain better memory utilization we can use
dynamic loading: a routine is not loaded until it is called. All routines are kept on disk in a
relocatable load format, so an unused routine is never loaded.
Dynamic linking:- linking is postponed until execution time. Many O.S support static linking, in
which all system language libraries are treated as modules and combined into the load image; with this, much memory is
wasted. In dynamic linking a stub is used: a small piece of code that indicates how to load the
library routine if it is not already present in memory.
OVERLAYS:- Normally the entire program and data of a process must be in physical memory for the process to
execute, so the size of a process is limited to the size of physical memory. To allow a process to be larger than the
amount of memory allocated to it, a technique called overlays is used: only the instructions and data needed at any given time are kept in memory.
Logical versus physical address space:-
Logical address:-
1) The address generated by the C.P.U or user process is commonly referred to as a logical address.
2) It is a relative address.
3) The set of all logical addresses generated by a program is called the logical address space. The user
programmer deals only with logical addresses.
4) Logical addresses are used in user mode.
Physical address:-
1) An address seen by the memory unit is called a physical address.
2) It is an absolute address.
3) The set of all physical addresses corresponding to these logical addresses is referred
to as the physical address space. User programs never see the real physical addresses.
4) A computer system has a physical memory, which is a hardware device.
5) Physical addresses are used only in system mode.
In compile-time and load-time address binding the logical addresses and physical addresses are the same.

Swapping:-
A process needs to be in memory to be executed. A process however can be swapped temporarily out of
memory to a backing store, and then brought back in to memory for continued execution.
EX:- In round robin C.P.U scheduling algorithm, when a quantum expires, the memory manager will start
to swap out the process from the memory, and to swap in another process to that memory space.
EX:- In a preemptive priority-based algorithm, if a higher-priority process arrives and wants service, the
memory manager can swap out the lower-priority process so that the higher-priority process can be loaded and
executed. When the higher-priority process finishes, the lower-priority process can be swapped back in
and continued. This is called Roll Out and Roll In.
Swapping requires a backing store. The backing store is commonly a fast disk. It must be large enough to
accommodate copies of the memory images of all users, and it must provide direct access to these memory
images.
A process with pending I/O should never be swapped out.

CONTIGUOUS ALLOCATION:-
The memory is usually divided in to two partitions. One for resident O.S, and one for the user processes.
It is possible to place the O.S in either low or high memory. But usually O.S is kept in low memory because
the interrupt vectors are often in low memory.
There are two types:
1) Single partition allocation   2) Multiple partition allocation
   a) fixed-sized partitions (M.F.T)   b) variable-sized partitions (M.V.T)
Single partition allocation:-
The O.S is residing in low memory and the user processes are executing in high memory. We need to
protect the O.S code and data from changes by the user processes. We can provide this protection by using a
relocation register and limit registers. The relocation register contains value of the smallest physical address
and limit register contains the range of Logical address.

In the above figure the address generated by the C.P.U is compared with the limit register: if the logical address is
less than the limit register, the logical address is added to the relocation register and the mapped address is sent
to memory; otherwise a trap is generated.
Disadvantages:-
1) Much memory is wasted.
2) Only one user process can run at a time.
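The relocation- and limit-register mapping described above can be sketched as follows (a minimal illustration; the register values in the example are assumptions, not taken from the notes):

    def map_address(logical_address, limit_register, relocation_register):
        """Dynamic relocation with a limit register: trap if the logical
        address is outside the process, otherwise add the relocation value."""
        if logical_address >= limit_register:
            raise MemoryError("trap: addressing error (logical address out of range)")
        return logical_address + relocation_register

    # Example: relocation register = 14000, limit register = 3000
    print(map_address(346, 3000, 14000))   # physical address 14346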

Multiple partition allocation:-
If there are several user processes residing in memory at the same time, the problem arises of how
to allocate the available memory to the various processes.
One of the simplest schemes for memory allocation is to divide memory into partitions.
1) Fixed sized partition (M.F.T):-
Memory is divided in to a no. of fixed sized partitions. Each partition may contain exactly one
process. Thus the degree of multiprogramming is bound by the no .of partitions. When a partition is free, a
process is selected from the input queue and is loaded in to the free partitions. When the process terminates
the partition becomes available for another process.
Disadvantages:-
1) In M.F.T, internal fragmentation can occur.
2) If a process needs less memory than its allocated partition, the remaining memory in the partition is wasted.
3) Similarly, if a process needs more memory than the partition size, it cannot fit in a single partition.
Variable sized partitions (M.V.T):- here the O.S maintains a table indicating which parts of memory are
available and which are occupied. Initially all memory is available for user processes, and is considered as
one large block of available memory, called hole. When a process arrives and needs a memory, we search
for a hole large enough for this process. If we find one, we allocate only as much memory as is needed;
the rest is kept available to satisfy other requests.
EX: - fig 8.8
Generally, as processes enter the system, they are put into an input queue. The O.S takes from each process the
information about how much memory it needs, and it knows how much memory is available, so it can determine which
processes are allocated memory. A process is loaded into memory, it executes, and when it terminates it releases its
memory; the O.S can then fill this memory with another process.

We search among the set of available holes for a hole to allocate to the process; 3 strategies are used to select a
free hole from the set of available holes:
1) First fit  2) Best fit  3) Worst fit
1) First fit:- allocate the first hole that is big enough.
2) Best fit:- allocate the smallest hole that is big enough.
3) Worst fit:- allocate the largest hole.
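The three placement strategies can be sketched as follows (hole sizes, the request size and the helper name are assumptions for illustration):

    def pick_hole(holes, request, strategy="first"):
        """Return the index of the hole chosen for 'request' units, or None.
        holes is a list of free-hole sizes."""
        candidates = [i for i, size in enumerate(holes) if size >= request]
        if not candidates:
            return None
        if strategy == "first":
            return candidates[0]                             # first hole big enough
        if strategy == "best":
            return min(candidates, key=lambda i: holes[i])   # smallest adequate hole
        if strategy == "worst":
            return max(candidates, key=lambda i: holes[i])   # largest hole
        raise ValueError("unknown strategy")

    holes = [100, 500, 200, 300, 600]
    print(pick_hole(holes, 212, "first"))  # index 1 (the 500 hole)
    print(pick_hole(holes, 212, "best"))   # index 3 (the 300 hole)
    print(pick_hole(holes, 212, "worst"))  # index 4 (the 600 hole)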

Disadvantages:-
1) External fragmentation can occur.

Fragmentation:-
The wastage of memory space is called fragmentation. There are two types .
1) Internal fragmentation 2) External fragmentation.
1) External fragmentation:- it exists when enough total memory space is available to satisfy a request, but
the space is not contiguous.
In this example process P5 requests 500K; the total available memory space is
(300K + 260K) = 560K, but that space is not contiguous.
2) Internal fragmentation:-
The wastage of memory inside an allocated block.
EX:- In M.F.T a process is allocated a fixed partition. The size of each partition is 5 bytes and the
process requests 4 bytes of memory; 1 byte of memory is wasted inside the allocated block. This is called
internal fragmentation.
Another form of this problem arises in multiple partition allocation. Suppose the next process requests
18,462 bytes and we allocate exactly the requested block; a hole of 2 bytes is left free. The overhead to
keep track of this 2-byte hole is larger than the hole itself, so the general approach is to allocate
very small holes as part of the larger request. Thus the allocated memory may be slightly larger than the
requested memory; the difference between these two numbers is internal fragmentation.
Compaction:- one solution to the problem of external fragmentation is compaction.
The goal of compaction is to shuffle the memory contents so as to place all free memory together in one
large block. Here the three holes of size 100K, 300K and 260K can be compacted into one hole of size 660K.
Compaction is not always possible. If we move the processes, then for these processes to be able to
execute in their new locations, all internal addresses must be relocated. If relocation is static,
compaction is not possible; if relocation is dynamic, then compaction is possible.
When compaction is possible, we must determine its cost. The simplest compaction algorithm is to move all
processes toward one end of memory while all holes move in the other direction, producing one large hole
of available memory, but this is very expensive. In other schemes the one large hole of available memory
ends up not at the end of memory but in the middle.

Paging:-
Another possible solution to the external fragmentation problem is paging.
The physical memory is broken into fixed sized blocks called frames.
The logical memory is also broken in to blocks of the same size called pages.
When the process is to be executed, its pages are loaded in to any available memory frames from the
backing store. The backing store is divided in to fixed sized blocks that are of the same size the memory
frames.

Paging hardware:-
Every address generated by the C.P.U is divided in to two parts:
1) page number 2) page offset
the page number is used as an index in to a page table. The page table contains the base address of each
page in physical memory. This base address is combined with the page offset to define the physical
memory.

The page size is defined by the hardware. The size of a page is typically a power of 2. A power of 2 is chosen as the
page size because it makes translation of a logical address into a page number and page offset easy. If the size of the
logical address space is 2^m and the page size is 2^n bytes, then the high-order m-n bits of a logical address contain the page
number and the n low-order bits designate the page offset.
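For example, the split of a logical address into page number and offset can be sketched like this (page size, addresses and the page-table contents are assumed values, not from the notes):

    PAGE_SIZE = 2 ** 12            # 4 KB pages, so n = 12 offset bits

    def translate(logical_address, page_table):
        page_number = logical_address // PAGE_SIZE     # high-order m-n bits
        offset      = logical_address %  PAGE_SIZE     # low-order n bits
        frame_number = page_table[page_number]
        return frame_number * PAGE_SIZE + offset       # physical address

    page_table = {0: 5, 1: 6, 2: 1, 3: 2}              # page -> frame
    print(translate(0x1ABC, page_table))               # page 1, offset 0xABC lands in frame 6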
By using paging we have no external fragmentation: any free frame can be allocated to a process that needs it.
However, we may have some internal fragmentation. Frames are allocated as units, so if the memory
requirements of a process do not fall on page boundaries, the last frame allocated may not be
completely full.
Ex:- if a process needs 'n' pages plus one byte, it is allocated n+1 frames, resulting in internal
fragmentation in the last frame.
When a process arrives in the system to be executed, its size, expressed in pages, is examined.
Each page of the process needs one frame; thus if the process requires 'n' pages, there must be at least 'n'
frames available in memory. The first page of the process is loaded into one of the allocated frames and the
frame number is put in the page table for this process. The next page is loaded into another frame and its
frame number is put into the page table, and so on.
The O.S manages physical memory, so it must maintain the allocation details of physical
memory: which frames are allocated, which frames are available, how many total frames there are, and so on.
This information is generally kept in a data structure called the frame table.
The O.S also maintains a copy of the page table for each process; paging therefore increases the context-switch
time.

Structure of the page table:-
The page table can be implemented in 3 different ways.
1) The page table can be implemented as a set of dedicated registers. These registers are built with very
high-speed logic to make translation of logical addresses fast. The C.P.U dispatcher reloads these registers.
The use of registers for the page table is satisfactory if the page table is small; if the page table is
very large, registers are not feasible.
2) In this method the page table is kept in main memory and a page-table base register (PTBR) points to the
page table. Changing the page table requires changing only this one register, reducing context-switch time.
The problem with this approach is the time required to access a user memory location. If we want to
access location i, we must first index into the page table using the PTBR; this requires a memory access. It
provides the frame number, which is combined with the page offset to produce the actual address, and then we
access the desired place in memory. This scheme therefore requires two memory accesses per reference.
3) The solution to this problem is to use special registers called associative registers or a translation look-
aside buffer (TLB). A set of associative registers is built of very high-speed memory. Each register
consists of two parts:
1) a key and 2) a value.
When the associative registers are presented with an item, it is compared with all keys simultaneously. If the
item is found, the corresponding value field is output. The search is very fast.

In the above fig the associative registers contains only a few of the page table entries. When a logical
address is generated by C.P.U its page number is presented as a set of associative registers that contains
page number and their corresponding frame number. If a page number is found in associative registers, its
frame number is immediately available and is used to access memory.
If page number is not in associative registers, a memory reference to the page table must be made.
Suppose if TLB is full then the O.S must select one for replacement. The percentage of times that a page
number is found in the associative registers is called hit ratio. An 80 percent of hit ratio means that we find
the desired page number in associative registers 80 percent of time.
If the page number is in the associative registers then it takes 20 nanoseconds to search the associative
registers and 100 nanoseconds to access memory. so the mapped memory access takes 120 nanoseconds.
If the page number is not in associative registers then it takes 220 nanoseconds.
i.e 20 for searching in associative registers
100 for first access memory for the page table and frame number
100 for access the desired byte in memory.
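Using the figures above, the effective access time for an 80 percent hit ratio can be checked with a quick computation (a worked example, not part of the original notes):

    hit_ratio     = 0.80
    tlb_hit_time  = 120   # 20 ns TLB search + 100 ns memory access
    tlb_miss_time = 220   # 20 ns search + 100 ns page-table access + 100 ns memory access

    effective_access_time = hit_ratio * tlb_hit_time + (1 - hit_ratio) * tlb_miss_time
    print(effective_access_time)   # 140.0 nanoseconds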

PROTECTION:-
Memory protection in paged environment is accomplished by two ways.
1) by using valid-invalid bit.
2) Read only (or) read& write.
These protection bits are attached to the each entry of the page table.
Here a valid-invalid bit is attached to each entry in the page table. The bit contains "v" if the entry is
valid, i.e. the page is in the process's logical address space.
The bit contains "i" if it is invalid, i.e. the page is not in the process's logical address space.
Here a process contains pages 0 to 5 (six pages), so page-table entries 6, 7 and 8 will have "i" in the valid-invalid
bit; if pages 6, 7 or 8 are referenced, a trap occurs.
The other method is to use read-write or read-only bits in each entry of the page table. The page table can
be checked to verify that no writes are being made to a read-only page.

MULTI LEVEL paging:-
Suppose a process requires 2^32 bytes of memory and the page size is 2^12 bytes; then the number of pages will
be 2^20.
Instead of maintaining a single huge page table, we apply paging to the page table itself.
TWO level paging:-
If the logical address is 32 bits, the logical address is divided as follows:
| page number | offset |
    20 bits      12 bits
The page number is again divided into two parts:
| page1 | page2 | d |
  10 bits  10 bits  12 bits
Here page1 is an index into the outer page table and page2 is the displacement within the page of the outer page
table. If page1 is again split into a page number and an offset, the scheme is called three-level paging.
Ex:- for a system with a 64-bit logical address space, a two-level paging scheme is no longer appropriate, so a
three-level (or deeper) paging scheme must be used.
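A small sketch of how the three fields of a 32-bit logical address would be extracted under this two-level split (field widths taken from the division above; the sample address is an assumption):

    def split_two_level(logical_address):
        """Split a 32-bit address into 10-bit p1, 10-bit p2 and 12-bit offset."""
        offset = logical_address & 0xFFF            # low 12 bits
        p2     = (logical_address >> 12) & 0x3FF    # next 10 bits (index into inner page table)
        p1     = (logical_address >> 22) & 0x3FF    # top 10 bits (index into outer page table)
        return p1, p2, offset

    print(split_two_level(0x00403ABC))   # p1 = 1, p2 = 3, offset = 0xABC (2748)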

INVERTED PAGE TABLE:-
1) Normally each process has a page table associated with it, which has one entry for each page the process uses.
2) This table representation is natural, since processes reference pages through the pages' logical addresses.
3) The logical address is translated into a physical memory address.
4) Since the table is sorted by logical address, the O.S can calculate where in the table the corresponding
physical-address entry is and use that value directly.
Drawbacks:-
In this technique each page table may consist of millions of entries, and these tables consume large amounts
of physical memory.
This problem is solved by using an inverted page table. It has one entry for each real page (frame) of memory.
Each entry consists of the virtual address of the page stored in that real memory location, together with
information about the process that owns the page.
Thus there is only one page table in the system, and it has only one entry for each page of physical memory.
Each logical address in the system consists of a triple <process-id, page number, offset>.
Each inverted page table entry is a pair <process-id, page number>.
When a memory reference occurs, the part of the virtual address consisting of <process-id, page number> is
presented to the memory subsystem. The inverted page table is then searched for a match. If a match is found, say at
entry i, then the physical address <i, offset> is generated. If no match is found, an illegal address access has been
attempted.
This scheme decreases the amount of memory needed to store the page tables, but searching the table can take far too
long. To address this problem, hash tables are used to limit the search.
Shared pages:- Another advantage of paging is the possibility of sharing common code among processes.
SEGMENTATION:-
Paging is an arbitrary division of the logical address space in to small fixed sized pieces. Instead of using
pages we could divide the logical address space of a process in to pieces based on the semantic of program.
Such pieces are called segments.
Segmentation is a memory management scheme, that support the user’s view of memory. A segment is a
division of logical address space that is visible to the programmer.
Segments can be of variable length. Segmentation leads to a two-dimensional address space because each
memory cell is addressed with a segment name and an offset within that segment. For ease of
implementation the segments are numbered and are referred to by a segment number:
<segment number, offset>
Segments have variable lengths. The user program is compiled and the compiler automatically constructs
segments from the input program; the loader takes all these segments and assigns them segment numbers.
SEGMENTATION H/W:-
A logical address consists of two parts: a segment number s and an offset d. The segment number is used
as an index into the segment table. Each segment-table entry contains a segment base and a segment limit. The
segment base contains the starting physical address where the segment resides in main memory, and the segment
limit contains the length of the segment. The offset d is compared with the limit: if d is less than the limit, d is
added to the base value to produce the address in physical memory; if d is greater than or equal to the limit, a trap is generated.
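This check-and-add step can be sketched as follows (the segment-table contents are assumed for the example, not taken from the notes):

    def translate_segment(s, d, segment_table):
        """segment_table[s] = (base, limit). Trap if offset d exceeds the limit."""
        base, limit = segment_table[s]
        if d >= limit:
            raise MemoryError("trap: offset beyond end of segment")
        return base + d

    segment_table = {0: (1400, 1000), 1: (6300, 400), 2: (4300, 400)}
    print(translate_segment(2, 53, segment_table))   # byte 53 of segment 2 -> 4300 + 53 = 4353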

Example of Segmentation
IMPLEMENTATION OF Segment table:-
The segment table can be implemented in 3 ways, the same as page table implementation.
1) The segment table is kept in registers, so it can be referenced quickly; the addition to the base and the
comparison with the limit can be done simultaneously to save time. But if a program consists of a
large number of segments, it is not feasible to keep the table in registers.
2) In the second method the segment table is kept in memory. A segment-table base register (STBR)
points to the segment table, and an STLR (segment-table length register) is used because the number of segments used
by a program may vary. We check that the segment number is legal (s < STLR), then add s to the STBR (s + STBR),
giving the address in memory of the segment-table entry. This entry is read from memory and we
proceed as before. This also takes two memory references.
3) Similar to paging, here we can use associative registers.
3) Similar to paging here we use associative registers.
PROTECTION:-
We can protect memory by using segments in two ways:
1) We can protect a segment by using protection bits. The memory-management hardware checks the
protection bits associated with each segment-table entry to prevent illegal access to memory.
2) Another method is to place an array in its own segment; the memory-management hardware then automatically
checks that array indexes are legal.
SHARING:- another advantage of segmentation is the sharing of code or data. Segments are shared
when entries in the segment tables of two different processes point to the same physical locations.
Sharing occurs at the segment level; any information can be shared if it is defined to be a segment.

SEGMENTATION with PAGING:-
Segments can be of different lengths, so it is more difficult to find a place in memory for a segment than for a page.
With a segmented logical address we get the benefits of logical memory, but we still have to do dynamic
storage allocation of physical memory. To avoid this it is possible to combine segmentation with paging into
a two-level logical memory system similar to two-level paging. Each segment descriptor points to a page
table for that segment.
This gives some of the advantages of paging (easy placement) with some of the
advantages of segments (logical division of the program).
Segmentation requires dynamic allocation of variable-sized blocks, which leads to external fragmentation
and extra time; paging divides physical memory into equal-sized frames and makes dynamic allocation
of space simple.
MULTICS:- in the MULTICS system a logical address is formed from an 18-bit segment number and a 16-bit
offset. Although this scheme creates a 34-bit address space, the segment-table overhead is tolerable.

Virtual memory
Virtual memory is a technique that allows the execution of processes that may not be completely in main
memory. The main visible advantage of this scheme is that programs can be larger than physical memory.
Advantages:-
1) a program size may not be constrained by the amount of the physical memory that is available.
2) Each user would be able to write a programs for an extremely large virtual address space.
3) Each user program could take less physical memory, so more programs could be run at the same time, increasing
CPU utilization and throughput.
4) Less I/O would be need to load or swap each user program in to memory. So each user program would
run faster.
5) Virtual memory makes the task of programming much easier , because the programmer no longer needs
to worry about the amount of P.M
6) virtual memory is commonly implemented by using demand paging. It can also be implemented in a
segmentation system.
Demand paging:-
A demand-paging system is similar to a paging system with swapping. Processes reside on
secondary memory. When we want to execute a process, we swap it into memory; but rather than swapping the
entire process into memory, we use a lazy swapper. A lazy swapper never swaps a page into memory
unless that page will be needed.
Here we need some hardware support to distinguish between those pages that are in memory and those
pages that are on the disk. The valid-invalid bit scheme can be used for this purpose. When this bit is set to
"valid", the associated page is both legal and in memory. If the bit is set to "invalid",
the page either is not valid (i.e. not in the logical address space of the process) or is valid but currently on the
disk. The page-table entry for a page that is not currently in main memory is simply marked invalid or contains
the address of the page on disk.
When the process executes and accesses pages that are memory resident, execution
proceeds normally.
If the process tries to use a page that was not brought into memory, a page fault
occurs.
Page fault:-
If the process tries to use a page that is not currently in memory, this
situation is called a page fault.
What happens when a page fault occurs in our system:-
There are six steps that occur when a page fault occurs.
1) We check an internal table (usually kept with the PCB) for this process, to determine
whether the reference was a valid or an invalid memory access.
2) If the reference was invalid, we terminate the process. If it was valid, but the page has
not yet been brought into memory and is on the disk, we page it in.
3) We find a free frame.
4) We schedule a disk operation to read the desired page into the newly allocated frame.
5) When the disk read is completed, we modify the internal table kept with the process and the page table to
indicate that the page is now in memory.
6) We restart the instruction that was interrupted by the illegal-address trap.

Pure demand paging:-
Never bring a page into memory until it is required.
Suppose we start executing a process with no pages in memory. When the O.S sets the instruction pointer to
the first instruction of the process, which is not in main memory, the process immediately faults for
the page. After the page is brought into main memory, the process continues to execute,
faulting as necessary until every page it needs is actually in memory. This is called pure demand paging.
Page replacement:-
A user process is executing when a page fault occurs. The hardware traps to the O.S, which checks the page
table to confirm that this is a page fault for a valid page that is on the disk. The O.S determines where the desired
page resides on the disk, but then finds that there are no free frames on the free-frame list: all memory is in use.
The O.S has several options at this point. One way is to use page replacement.
The page replacement takes the following approach. If no frames is free, we find one that is not
currently being used. We can free a frame by writing its contents to swap space and changing the page
table. The free frame can now be used to place the page for which the process faulted. The page fault
service routine is now modified to include page replacement.
1) Find the location of the desired page on the disk.
2) Find a free frame.
a) If there is a free frame , use it.
b) Otherwise, use a page replacement algorithm to select a victim frame.
c) Write the victim page to the disk; change the page and frame tables accordingly.
3) Read the desired page in to the free frame , change the page and frame table.
4) Restart the user process.
If no frames are free, two page transfers (one page out and one page in) are required, which increases the
page-fault service time.
To reduce this overhead we use a modify (dirty) bit.
Modify bit:-
Each page-table entry can maintain a modify bit. The modify bit for a page is set by the hardware whenever any
word or byte in the page is written. When we select a page for replacement, we examine its modify bit: if the bit is
set, the page has been modified since it was read in from the disk and must be written back to the disk; if the bit is
not set, the page has not been modified and need not be written out, which reduces the page-fault service time.
Page replacement algorithms:-
There are many different page-replacement algorithms, and each algorithm has its own unique features.
How do we select a particular replacement algorithm? We want the one with the lowest page-fault rate.
We evaluate an algorithm using a reference string: a string of memory references.
Using this reference string and the number of available frames, we run the algorithm and count the number of
page faults.
To determine the number of page faults for a particular reference string, we also need to know the number of
page frames available. In general, as the number of available frames increases, the number of page faults
decreases.
FIFO ALGORITHM:-
The simplest page-replacement algorithm is the FIFO algorithm. A FIFO replacement algorithm
associates with each page the time when it was brought into memory; to replace, we select the oldest page.
EX:- (problem)
Reference string is
It is very easy to understand, but its performance is not always good.
This algorithm suffers from Belady's anomaly.

Belady's anomaly:-
The number of faults for n+1 frames may be greater than the number of faults for n frames,
i.e. the page-fault rate may increase as the number of allocated frames increases.
EX:- SEE text book
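A short sketch that counts page faults under FIFO replacement (the reference string used here is an assumed example, the commonly used textbook string; it is not the one elided above):

    from collections import deque

    def fifo_faults(reference_string, num_frames):
        """Count page faults with FIFO replacement."""
        frames = deque()                  # oldest page at the left
        faults = 0
        for page in reference_string:
            if page not in frames:
                faults += 1
                if len(frames) == num_frames:
                    frames.popleft()      # evict the oldest page
                frames.append(page)
        return faults

    refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
    print(fifo_faults(refs, 3))           # 15 faults for this string with 3 frames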
Optimal algorithm:-
It is also called OPT or MIN.
FIFO performance is not always good, so we look for an optimal page-replacement algorithm. This algorithm
uses the time when a page will next be used.
An optimal page-replacement algorithm has the lowest page-fault rate of all algorithms,
and it never suffers from Belady's anomaly.
REPLACE THE PAGE THAT WILL NOT BE USED FOR THE LONGEST PERIOD OF TIME.

L.R.U:-
Replace the page that has not been used for the longest period of time.
Here we get 12 page faults; using FIFO we get 15 faults and using optimal we get 9.
The major problem is how to implement LRU replacement: we must determine an order
for the frames defined by the time of last use.
LRU does not suffer from Belady's anomaly. It is a stack algorithm, and stack algorithms can never exhibit
Belady's anomaly. A stack algorithm is an algorithm for which it can be shown that the set of pages in memory for 'n'
frames is always a subset of the set of pages that would be in memory with n+1 frames.
The remaining problem with L.R.U is determining an order for the frames defined by the time of
last use. This can be implemented using 2 methods:
1) counters  2) stack
Counters:- a counter or logical clock is added to the C.P.U; the value of the counter is incremented
for every memory reference.
One extra field, the time-of-last-use field, is added to each entry of the page table. Whenever a
reference is made to a page, the contents of the counter register are copied to the time-of-use field in the page
table entry for that page. In this way we always have the "time" of the last reference to each page. We replace the
page with the smallest time value.
This scheme has some drawbacks:
1) It requires a search of the page table to find the LRU page.
2) The counter value must be written to the page table on every memory reference.
3) The times must also be maintained when page tables are changed, and overflow of the clock can occur.
Stacks:- another approach to implementing LRU replacement is to keep a stack of page numbers. Whenever a
page is referenced, it is removed from the stack and placed on top. In this way the top of the stack is always the most
recently used page, and the bottom is the LRU page.
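The stack idea can be mimicked with an ordered list; a minimal sketch counting LRU faults (same assumed reference string as in the FIFO sketch above):

    def lru_faults(reference_string, num_frames):
        """Count page faults with LRU replacement, keeping pages ordered by recency."""
        stack = []                        # most recently used page at the end
        faults = 0
        for page in reference_string:
            if page in stack:
                stack.remove(page)        # move the page to the top of the stack
            else:
                faults += 1
                if len(stack) == num_frames:
                    stack.pop(0)          # bottom of the stack is the LRU page
            stack.append(page)
        return faults

    refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
    print(lru_faults(refs, 3))            # 12 faults, versus 15 for FIFO and 9 for OPT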

LRU APPROXIMATION ALGORITHM:-
Here we use a reference bit. The reference bit for a page is set by the hardware: whenever a page is referenced,
its reference bit is set to 1. The reference bits are kept with each entry in the page table.
Initially all bits are cleared to '0' by the O.S. As a user process executes, the reference bit of each page it
references is set to 1 by the hardware. After some time we can determine which pages have been used and which have
not been used by examining the reference bits, but we do not know the order of use or how many times each page was
referenced.
Counting algorithms:-
There are other algorithms used for page replacement in which we keep a counter of the number of references
that have been made to each page.
LFU:- the least frequently used page-replacement algorithm requires that the page with the smallest count be replaced.
MFU:- the most frequently used page-replacement algorithm is based on the argument that the page with the smallest
count was probably just brought in and has yet to be used.
Allocation of frames:-
How do we allocate the fixed amount of free memory among the various processes?
Ex:-
Suppose we have 128K of memory, there is only one process in the system, and the page size is 1K;
then there are 128 frames. No allocation problem arises when there is only one process in the system.
Suppose there are 2 processes and the O.S uses 35K of memory; the remaining 93K of memory is
available for user processes. Then how many frames does each process get?
Under pure demand paging all 93 free frames would initially be put on the free-frame list. When a
user process starts execution it generates a sequence of page faults; the first 93 page faults would all get
frames from the free-frame list. When the list is exhausted, a page-replacement
algorithm is used to select one of the 93 in-memory pages to be replaced with the 94th, and so on.
When a process terminates, its 93 frames are once again placed on the free-frame list. In this method
also we do not get any particular problem in the allocation of frames.
Minimum number of frames:-
The minimum number of frames per process is defined by the architecture, whereas the maximum number
is defined by the amount of available physical memory.
Allocation algorithm:-
If we have 'm' frames among 'n' processes, we can allocate m/n frames to each process.
Ex:- if there are 93 frames and 5 processes, each process will get 18 frames and the remaining 3 frames are
added to the free-frame list. This scheme is called equal allocation.
Equal allocation ignores the fact that processes have different sizes; to solve this problem we use proportional
allocation, allocating available memory to each process according to its size.
Let the size of process pi be si, and define
S = Σ si
Then, if the total number of available frames is m, we allocate ai frames to process pi, where
ai = (si / S) × m
Here ai must be at least the minimum number of frames required by that process, and must not exceed m.
For proportional allocation, we would split 62 frames between two processes, one of 10 pages and one of
127 pages, as
10/137 × 62 ≈ 4 frames and 127/137 × 62 ≈ 57 frames.
In this way both processes share the available frames according to their needs rather than
equally.
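The proportional split above can be checked with a quick computation (process sizes taken from the example; the helper name is an assumption):

    def proportional_allocation(process_sizes, total_frames):
        """Allocate frames to each process in proportion to its size."""
        total_size = sum(process_sizes)
        return [int(size / total_size * total_frames) for size in process_sizes]

    print(proportional_allocation([10, 127], 62))   # [4, 57]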

Global versus local allocation:-
With multiple processes competing for frames, we can classify page-replacement algorithms into two
broad categories:
Global replacement:-
It allows a process to select a replacement frame from the set of all frames, even if that frame is currently
allocated to some other process; one process can take frames from another.
Local replacement:- it requires that each process select only from its own set of allocated frames.

Thrashing:-
A process may have a large number of pages in active use. If the process does not have enough frames for these
pages, it will very quickly page-fault. At this point it must replace some page; but since all its pages are in active
use, it must replace a page that will be needed again almost immediately, so faults come again and again.
The process continues to fault, replacing pages that it will then fault on and bring back. This high
paging activity is called thrashing. A process is thrashing if it is spending more time paging than executing.

Causes of the thrashing:-
The O.S monitors CPU utilization. If CPU utilization is too low, the O.S increases the degree of
multiprogramming by introducing new processes to the system. Suppose a global page-replacement algorithm is
used, replacing pages with no regard to the process to which they belong.
A process enters the system and needs more frames. Since there are no free frames, it starts
page-faulting and takes pages away from other processes. Those processes need the pages they lost, so they
also fault, taking pages from still other processes. The faulting processes must use the
paging device, so the ready queue empties. As processes wait for the paging device, CPU utilization
decreases.
The CPU scheduler sees the falling CPU utilization and increases the degree of multiprogramming. The
new process tries to get started by taking pages from running processes, causing more page faults and a
longer queue for the paging device. As a result, CPU utilization drops further, and the scheduler tries to
increase the degree of multiprogramming even more. Thrashing has occurred and throughput drops.
Fig: CPU utilization versus degree of multiprogramming.
The graph is drawn between the degree of multiprogramming and CPU utilization. As the degree of
multiprogramming increases, CPU utilization also increases, though more slowly, until a maximum is reached. If the
degree of multiprogramming is increased even further, thrashing sets in and CPU utilization drops
sharply.
At this point, to increase CPU utilization and stop thrashing, we must decrease the degree of
multiprogramming.
The effect of thrashing can be limited by using a local replacement algorithm. With local replacement each
process selects only from its own set of allocated frames. If one process starts thrashing, it cannot steal
frames from another process, so the other processes are not affected by its thrashing.
To prevent thrashing, we must provide a process with as many frames as it needs. But how do we
know how many frames it needs? There are several techniques for estimating how many frames a process is
actually using.
This approach defines the locality model of process execution.
Locality:- a set of pages that are actively used together. The locality model states that, as a process
executes, it moves from one locality to another.
Localities are defined by the program structure and its data structures.
Working-set model:-
The working-set model is based on the assumption of locality. This model uses a parameter Δ to
define the working-set window: the set of pages in the most recent Δ page references is the working set.
If a page is in active use, it will be in the working set. If it is no longer being used, it will drop out of the
working set Δ time units after its last reference.
Ex:- see text book FIG
The working set depends on the selection of the working-set window Δ. We compute the size of the
working set, WSSi, for each process in the system and define
D = Σ WSSi
where D is the total demand for frames. If the total demand is greater than the total number of available
frames (D > m), thrashing will occur. The use of the working-set model is simple, but the difficulty is keeping
track of the working set, since the working-set window is a moving window: at each memory reference a new
reference appears at one end and the oldest reference drops off the other end.
Page-fault frequency:- To prevent thrashing we can also use the page-fault frequency strategy. Thrashing has a
high page-fault rate, so we want to control the page-fault rate. If the page-fault rate is too high, we know
that the process needs more frames; if the page-fault rate is too low, the process may have too many frames.
We can establish upper and lower bounds on the desired page-fault rate. If the actual page-fault rate
exceeds the upper limit, we allocate another frame to that process; if the page-fault rate falls below the
lower limit, we remove a frame from that process.

File System Interface
A file is a named collection of related information that is recorded on secondary storage.
The information in a file is defined by its creator. Many different types of information may be stored in a file.
File attributes:-
A file is named and, for the user's convenience, is referred to by its name. A name is usually a string of
characters. One user might create a file, and another user might edit that file by specifying its name. A file has
several attributes:
1) name:- the name in human-readable form.
2) type:- this information is needed for systems that support different file types.
3) location:- a pointer to the device and to the location of the file on that device.
4) size:- the size of the file in bytes or words.
5) protection:- access-control information that determines who can read, write or execute the file.
6) time, date, and user identification:- this information may be kept for creation, last modification and last use.
The information about all files is kept in the directory structure, which also resides on secondary storage.

File operations:-
Creating a file:-
Two steps are necessary to create a file: first, space in the file system must be found for the file; second, an
entry for the new file must be made in the directory. The directory entry records the name of the file and its
location in the file system.
Writing a file:-
To write to a file, we give the name of the file and the system searches the directory to find the location of the
file. The system must keep a write pointer to the location in the file where the next write is to take place. The
write pointer must be updated whenever a write occurs.
Reading a file:- to read from a file, we specify the name of the file; the directory is searched for the associated
directory entry, and the system keeps a read pointer to the location in the file where the next read is to take
place. Once the read has taken place, the read pointer is updated.

Repositioning within a file:-
The directory is searched for the appropriate entry, and the current-file-position pointer is set to the given value.
This is also known as a file seek.
Deleting a file:- to delete a file, we search the directory for the named file. Having found it in the directory,
we release all the file's space and erase the directory entry.
Truncating a file:- this operation allows all attributes to remain unchanged (except the file length), but resets the
file to length zero.
Appending:- add new information to the end of an existing file.
Renaming:- give a new name to an existing file.
Opening a file:- if a file is to be used, the first step is to open it, using the open system call.
Closing a file:- close is a system call used to terminate the use of an already opened file.
File Types:-
1) A common technique for implementing file types is to include the type as part of the file name.
2) The name is split into two parts: 1) the name and 2) an extension.
The system uses the extension to indicate the type of the file and the type of operations that can be done on that
file. See fig 10.2
ACCESS METHODS:-
There are several ways in which the information in a file can be accessed:
1) sequential access  2) direct access  3) other access methods.
1) Sequential access method:-
The simplest access method is sequential access: information in the file is processed in order, one record after
the other. The bulk of the operations on a file are reads and writes. It is based on a tape model of a file. Fig 10.3
2) Direct access (or relative access):-
A file is made up of fixed-length records that allow programs to read and write records rapidly in no
particular order. For direct access, the file is viewed as a numbered sequence of blocks or records. A direct-access
file allows arbitrary blocks to be read or written.
So we may read block 15, then block 54, or write block 10; there are no restrictions on the order of reading or
writing for a direct-access file. It is very useful for immediate access to large amounts of information.
The file operations must be modified to include the block number as a parameter; we have "read n", where n
is the block number.
3) Other access methods:-
Other access methods are based on an index for the file. The index contains pointers to the various
blocks. To find an entry in the file, we first search the index and then use the pointer to access the file directly
and find the desired entry. With large files the index file itself may become too large to be kept in
memory. One solution is to create an index for the index file: the primary index file contains pointers to
secondary index files, which point to the actual data items. Fig 10.5.

Directory structures:-
(Operations that are performed on a directory: read in text book.)
Single-level directory:-
The simplest directory structure is the single-level directory. All files are contained in the same directory,
which is easy to understand. Since all files are in the same directory, they must have unique names.
A single-level directory has limitations when the number of files increases or when there is more
than one user. As the number of files increases, it becomes difficult to remember the
names of all the files.
Two-level directory:-
The major disadvantage of a single-level directory is the confusion of file names between different users.
The standard solution is to create a separate directory for each user.
In the 2-level directory structure, each user has his or her own user file directory (UFD). Each UFD has a similar
structure. The system first searches the master file directory (MFD); the MFD is indexed by user name and each entry
points to the UFD for that user. Fig 10.8
To create a file for a user, the O.S searches only that user's UFD to find whether another file of that name exists.
To delete a file the O.S searches only the local UFD, so it cannot accidentally delete another user's file that has
the same name.
This solves the name-collision problem, but it has another disadvantage: it gets in the way when users want to
cooperate on some task and access one another's files. Some systems simply do not allow local
user files to be accessed by other users.
Any file is accessed using a path name; here a user name and a file name define a path name.
Ex:- user1/ob
In MS-DOS a file specification is
C:\directory name\file name

Tree-structured directory:-
This allows users to create their own subdirectories and to organize their files accordingly. Here the tree
has a root directory, and every file in the system has a unique path name. A path name is the path from the
root, through all the subdirectories, to a specified file. FIG 10.9.

A directory contains a set of subdirectories or files. A directory is simply another file, but it is treated in a
special way. Here the path names can be of two types.
1)absolute path and 2) relative path.
An absolute path name begins at the root and follows a path down to the specified file, giving the directory
name on the path.
Ex:- root/spell/mail/prt/first.
A relative pathname defines a path from the current directory ex:- prt/first is relative path name.
Acyclic-graph directory:-
Consider two programmers who are working on a joint project. The files associated with that project can
be stored in a subdirectory, separating them from other projects and files of the two programmers. The
common subdirectory is shared by both programmers. A shared directory or file will exist in the file system
in two places at once. Notice that a shared file is not the same as two copies of the file: with two copies, each
programmer can view a copy rather than the original, but if one programmer changes the file, the changes
will not appear in the other's copy. With a shared file there is only one actual file, so any changes made by one
person are immediately visible to the other.
A tree structure prohibits the sharing of files or directories. An acyclic graph allows directories to have
shared subdirectories and files.
FIG 10.10. It is more complex and more flexible, but several problems may occur when traversing the graph or
deleting file contents.
General graph directory:-
Protection:
When information is kept in the system, the major worry is its protection from both physical
damage (reliability) and improper access (protection).
Reliability is generally provided by keeping duplicate copies of files.
Protection can be provided in many ways. For a small single-user system, we might provide protection
by physically removing the floppy disks. In a multi-user system, other mechanisms are needed.
1) Types of access:-
If the system does not permit access to the files of other users, protection is not needed. Protection
mechanisms are provided by controlling access, i.e. by defining the types of file access. Access is
permitted or denied depending on several factors; for example, if a file is marked read, it can only be read from.
Read:- read from the file.
Write:- write or rewrite the file.
Execute:- load the file into memory and execute it.
Append:- write new information at the end of the file.
Delete:- delete the file and free its space for possible reuse.
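A minimal sketch of how such access types might be checked is shown below, assuming the file system keeps one permission bit per access type in a small bitmask; the names PERM_READ and access_allowed are hypothetical, not from the text.

#include <stdio.h>

/* Hypothetical permission bits, one per access type listed above. */
#define PERM_READ    0x01
#define PERM_WRITE   0x02
#define PERM_EXECUTE 0x04
#define PERM_APPEND  0x08
#define PERM_DELETE  0x10

/* Access is permitted only if every requested mode bit is present in the
   file's permission mask. */
int access_allowed(unsigned file_perms, unsigned requested)
{
    return (file_perms & requested) == requested;
}

int main(void)
{
    unsigned perms = PERM_READ | PERM_EXECUTE;   /* this file allows read and execute only */
    printf("write allowed? %d\n", access_allowed(perms, PERM_WRITE));  /* prints 0 */
    printf("read allowed?  %d\n", access_allowed(perms, PERM_READ));   /* prints 1 */
    return 0;
}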
2) Access Control:-
FILE SYSTEM IMPLEMENTATION

File allocation methods:-

There are three major methods of allocating disk space.
1) Contiguous allocation:-
1. The contiguous allocation method requires each file to occupy a set of contiguous blocks on the disk.
2. Contiguous allocation of a file is defined by the disk address of the first block and the length (in blocks). If the file is 'n'
blocks long and starts at location 'b', then it occupies blocks b, b+1, b+2, ....., b+n-1.
3. The directory entry for each file indicates the address of the starting block and the length of the area allocated for this file.
4. A contiguously allocated file is very easy to access. For sequential access, the file system remembers the
disk address of the last block referenced and, when necessary, reads the next block. For direct access to block 'i' of a
file that starts at block 'b', we can immediately access block b+i. Thus both sequential and direct access can be
supported by contiguous allocation (see the short sketch after this section).
5. One difficulty with this method is finding space for a new file.
6. There are also other problems with this method:
a. External fragmentation:- As files are allocated and deleted, the free disk space is broken into little pieces. External
fragmentation exists when the free space is broken into chunks and no single chunk is large enough to satisfy the request for a
new file.
There is a solution for external fragmentation, namely compaction: all free space is compacted into one contiguous region. But the cost of
compaction is time.
b. Another problem is determining how much space is needed for a file. When a file is created, the creator must
specify the size of that file. This can be a big problem: if we allocate too little space to a file, it
may not be sufficient; if we allocate too much, space is wasted.
c. Another problem: if one large file is deleted, a large space becomes empty. If another, much smaller file is then loaded into
that space, some space is wasted; that wastage of space is called internal fragmentation.
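The direct-access rule from point 4 can be made concrete with a minimal C sketch, assuming a directory entry that stores just the start block b and the length n; the names dir_entry and block_of are illustrative, not from the text.

#include <stdio.h>

/* Hypothetical directory entry for a contiguously allocated file. */
struct dir_entry {
    int start;    /* first disk block b */
    int length;   /* number of blocks n */
};

/* Direct access: logical block i of the file lives at disk block b+i. */
int block_of(const struct dir_entry *f, int i)
{
    if (i < 0 || i >= f->length)
        return -1;               /* out of range */
    return f->start + i;
}

int main(void)
{
    struct dir_entry f = { 19, 6 };   /* a file occupying blocks 19..24 */
    printf("logical block 4 -> disk block %d\n", block_of(&f, 4));  /* prints 23 */
    return 0;
}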
2) Linked allocation:-
1) Linked allocation solves all the problems of contiguous allocation. With linked allocation, each file is a linked
list of disk blocks; the disk blocks may be scattered anywhere on the disk.
2) The directory contains a pointer to the first and last blocks of the file.

Ex:- a file of five blocks might start at block 9, continue at block 16, then block 1, block 10 and finally block 25. Each block
contains a pointer to the next block. These pointers are not available to the user.
3) To create a new file we simply create a new entry in the directory. With linked allocation, each directory entry has a
pointer to the first disk block of the file.
4) There is no external fragmentation with linked allocation. Also there is no need to declare the size of a
file when that file is created. A file can continue to grow as long as there are free blocks.
5) But it has disadvantages. The major problem is that it can be used effectively only for sequential-access files.
6) To find the i-th block of a file, we must start at the beginning of that file and follow the pointers until
we get to the i-th block. It cannot support direct access.
7) Another disadvantage is that it requires space for the pointers. If a pointer requires 4 bytes out of a 512-byte
block, then 0.78% of the disk is being used for pointers rather than for information.
8) The solution to this problem is to collect blocks into multiples, called clusters, and to allocate
clusters rather than blocks.
9) Another problem is reliability. The files are linked together by pointers scattered all over the disk; what
happens if a pointer is lost or damaged?
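As a rough sketch of why linked allocation supports only sequential access (point 6 above), the following C fragment walks the example chain 9 -> 16 -> 1 -> 10 -> 25; the in-memory array next[] standing in for the per-block pointers is an assumption made for illustration.

#include <stdio.h>

#define NBLOCKS     32
#define END_OF_FILE -1

/* next[b] holds the block that follows block b in the file, or END_OF_FILE.
   In a real system this pointer lives inside the data block itself. */
int next[NBLOCKS];

/* Reaching logical block i requires following i pointers from the first block. */
int block_of(int first, int i)
{
    int b = first;
    while (i-- > 0 && b != END_OF_FILE)
        b = next[b];
    return b;
}

int main(void)
{
    for (int b = 0; b < NBLOCKS; b++)
        next[b] = END_OF_FILE;
    /* the example chain from the text: 9 -> 16 -> 1 -> 10 -> 25 */
    next[9] = 16; next[16] = 1; next[1] = 10; next[10] = 25;
    printf("logical block 3 -> disk block %d\n", block_of(9, 3));  /* prints 10, the fourth block in the chain */
    return 0;
}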
FAT (file allocation table):-
An important variation on the linked allocation method is the use of a file allocation table.
The table has one entry for each disk block and is indexed by block number. The FAT is used much as a linked list is.
The directory entry contains the block number of the first block of the file. The table entry indexed by that
block number then contains the block number of the next block in the file. This chain continues until the
last block, whose table entry holds a special end-of-file value. Unused blocks are indicated by a '0' table
value. Allocating a new block to a file is simple: find the first 0-valued table entry, replace
the previous end-of-file value with the address of the new block, and then replace that 0 with the end-of-file
value.
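Below is a minimal sketch of this allocation step, assuming the FAT is held in an in-memory array where 0 means free and a negative sentinel stands in for the end-of-file value; the name fat_append and the reserved blocks 0 and 1 are illustrative assumptions, not from the text.

#include <stdio.h>

#define NBLOCKS  32
#define FREE      0     /* unused block, as in the text */
#define EOF_MARK -2     /* stand-in for the special end-of-file table value */

int fat[NBLOCKS];       /* one entry per disk block, indexed by block number */

/* Append a new block to a file whose current last block is 'last'
   (pass -1 for an empty file). Returns the new block number, or -1 if the disk is full. */
int fat_append(int last)
{
    for (int b = 2; b < NBLOCKS; b++) {      /* blocks 0 and 1 assumed reserved */
        if (fat[b] == FREE) {
            fat[b] = EOF_MARK;               /* the new block becomes the end of file */
            if (last >= 0)
                fat[last] = b;               /* the old last block now points to it */
            return b;
        }
    }
    return -1;
}

int main(void)
{
    int first  = fat_append(-1);             /* first block of a new file */
    int second = fat_append(first);          /* grow the file by one block */
    printf("file chain: %d -> %d -> EOF\n", first, second);
    return 0;
}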
3) Indexed allocation:-
1) Linked allocation solves the external-fragmentation and size-declaration problems of contiguous
allocation. However, in the absence of a FAT, linked allocation cannot support efficient direct access.
2) The pointers to the blocks are scattered with the blocks themselves all over the disk and need to be retrieved in order.
3) Indexed allocation solves this problem by bringing all the pointers together into one location, the index block.
4) Each file has its own index block, which is an array of disk-block addresses. The i-th entry in the index block points to the
i-th block of the file.
5) The directory contains the address of the index block.
To read the i-th block, we use the pointer in the i-th index-block entry to find and read the desired block.
6) When the file is created, all pointers in the index block are set to nil. When the i-th block is first written, a block is obtained from
the free-space manager, and its address is put in the i-th index-block entry.
7) It supports direct access without suffering from external fragmentation, but it suffers from
wasted space. The pointer overhead of the index block is generally greater than the pointer overhead of
linked allocation.
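A minimal sketch of the lookup in point 5 follows, assuming the index block is simply an in-memory array of disk-block addresses initialised to nil; the struct and function names are illustrative, not from the text.

#include <stdio.h>

#define NIL      -1
#define MAX_PTRS 16

/* Hypothetical index block: one array of disk-block addresses per file. */
struct index_block {
    int ptr[MAX_PTRS];
};

/* Direct access: the i-th entry points to the i-th block of the file,
   or NIL if that block has not been written yet. */
int block_of(const struct index_block *ib, int i)
{
    if (i < 0 || i >= MAX_PTRS)
        return NIL;
    return ib->ptr[i];
}

int main(void)
{
    struct index_block ib;
    for (int i = 0; i < MAX_PTRS; i++)
        ib.ptr[i] = NIL;                            /* all pointers start out nil */
    ib.ptr[0] = 9; ib.ptr[1] = 16; ib.ptr[2] = 1;   /* first three blocks written */
    printf("logical block 2 -> disk block %d\n", block_of(&ib, 2));  /* prints 1 */
    return 0;
}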
Free space management:-
1) To keep track of free disk space, the system maintains a free-space list. The free-space list records all disk
blocks that are free.
2) To create a file, we search the free-space list for the required amount of space and allocate that space to the
new file. This space is then removed from the free-space list.
3) When a file is deleted, its disk space is added to the free-space list.
There are several methods of implementing the free-space list.
1) Bit vector:-
The free-space list is implemented as a bit map or bit vector. Each block is represented by 1 bit: if the block is
free the bit is 1; if the block is allocated the bit is 0.
Ex:- consider a disk where blocks 2,3,4,5,8,9,10,11,12,13,17,18,25 are free and the rest of the blocks are
allocated. The free-space bit map would be
001111001111110001100000010000……
The main advantage of this approach is that it is relatively simple and efficient to find the first free block or 'n'
consecutive free blocks on the disk.
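To make the "find the first free block" operation concrete, here is a minimal C sketch that scans the bit map from the example above; the unpacked int array is an assumption made for readability (a real bit map packs the bits into machine words).

#include <stdio.h>

#define NBLOCKS 32

/* freebit[b] is 1 if block b is free, 0 if allocated (an unpacked bit map). */
int freebit[NBLOCKS];

/* Return the first free block on the disk, or -1 if no block is free. */
int first_free(void)
{
    for (int b = 0; b < NBLOCKS; b++)
        if (freebit[b])
            return b;
    return -1;
}

int main(void)
{
    int free_blocks[] = { 2,3,4,5,8,9,10,11,12,13,17,18,25 };   /* the example above */
    for (unsigned i = 0; i < sizeof free_blocks / sizeof free_blocks[0]; i++)
        freebit[free_blocks[i]] = 1;
    printf("first free block = %d\n", first_free());   /* prints 2 */
    return 0;
}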
2) Linked list:-
Another approach is to link together all the free disk blocks, keeping a pointer to the first free block in
a special location on the disk and caching it in memory. The first free block contains a pointer to the next free disk
block, and so on.
However, this scheme is not efficient: to traverse the list, we must read each block, which requires substantial I/O
time.
Disk space is also used up to hold the pointers to the next free block.
3) Grouping:-
Another method is to store the addresses of 'n' free blocks in the first free block. The first (n-1) of these blocks are actually free. The last
block contains the addresses of another 'n' free blocks, and so on.

Advantage:- the main advantage of this approach is that the addresses of a large number of free blocks can be found quickly.
4) Counting:-
Another approach is counting. Generally, several contiguous blocks may be allocated or freed simultaneously, particularly when
space is allocated with the contiguous-allocation algorithm. Rather than
keeping a list of 'n' free disk addresses, we can keep the address of the first
free block and the number 'n' of free contiguous blocks that follow the
first block. Each entry in the free-space list then consists of a disk
address and a count.
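As a small illustration of the counting scheme, the free blocks from the bit-vector example (2-5, 8-13, 17-18 and 25) collapse into four (address, count) entries; the struct name free_run is illustrative only.

#include <stdio.h>

/* Counting scheme: each free-space entry records the first free block of a
   run and how many contiguous free blocks the run contains. */
struct free_run {
    int first;   /* disk address of the first free block in the run */
    int count;   /* number 'n' of contiguous free blocks */
};

int main(void)
{
    /* free blocks 2,3,4,5, 8..13, 17,18 and 25 written as runs */
    struct free_run list[] = { { 2, 4 }, { 8, 6 }, { 17, 2 }, { 25, 1 } };
    int total = 0;
    for (unsigned i = 0; i < sizeof list / sizeof list[0]; i++)
        total += list[i].count;
    printf("%u entries describe %d free blocks\n",
           (unsigned)(sizeof list / sizeof list[0]), total);   /* 4 entries, 13 blocks */
    return 0;
}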
Mass-Storage Systems
Topics: Disk Structure, Disk Attachment, Disk Scheduling, Disk Management
➢ Magnetic disks provide the bulk of secondary storage for modern computers
Drives rotate at 60 to 200 times per second
Transfer rate is the rate at which data flow between the drive and the computer
Positioning time (random-access time) is the time to move the disk arm to the desired cylinder (seek time)
plus the time for the desired sector to rotate under the disk head (rotational latency)
A head crash results from the disk head making contact with the disk surface: that's bad
➢ Disks can be removable
The drive is attached to the computer via an I/O bus
Buses vary, including EIDE, ATA, SATA, USB, Fibre Channel, SCSI
The host controller in the computer uses the bus to talk to the disk controller built into the drive or storage array

Magnetic tape
➢ Was an early secondary-storage medium
➢ Relatively permanent and holds large quantities of data
➢ Access time is slow
➢ Random access is ~1000 times slower than disk
➢ Mainly used for backup, storage of infrequently used data, and as a transfer
medium between systems
➢ Kept in a spool and wound or rewound past a read-write head
➢ Once the data are under the head, transfer rates are comparable to disk
➢ 20-200 GB typical storage
➢ Common technologies are 4mm, 8mm, 19mm, LTO-2 and SDLT

Disk Structure
Disk drives are addressed as large 1-dimensional arrays of logical blocks,
where the logical block is the smallest unit of transfer.

The 1-dimensional array of logical blocks is mapped into the sectors of the disk sequentially.
➢ Sector 0 is the first sector of the first track on the outermost cylinder.
➢ Mapping proceeds in order through that track, then the rest of the tracks in that cylinder, and
then through the rest of the cylinders from outermost to innermost.
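A minimal C sketch of this mapping is given below, assuming a fixed, idealised geometry (constant sectors per track and heads per cylinder); real drives use zoned recording and hide the geometry behind the controller, so the constants are illustrative assumptions, not values from the text.

#include <stdio.h>

/* Assumed, idealised geometry (illustrative values only). */
#define SECTORS_PER_TRACK  63
#define HEADS_PER_CYLINDER 16

/* Map a logical block number to (cylinder, head/track, sector) following the
   order described above: through a track, then the tracks of a cylinder,
   then cylinder by cylinder from outermost to innermost. */
void lba_to_chs(int lba, int *cyl, int *head, int *sec)
{
    *sec  = lba % SECTORS_PER_TRACK;
    *head = (lba / SECTORS_PER_TRACK) % HEADS_PER_CYLINDER;
    *cyl  = lba / (SECTORS_PER_TRACK * HEADS_PER_CYLINDER);
}

int main(void)
{
    int c, h, s;
    lba_to_chs(5000, &c, &h, &s);
    printf("logical block 5000 -> cylinder %d, head %d, sector %d\n", c, h, s);
    return 0;
}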
Disk Attachment
Host-attached storage is accessed through I/O ports talking to I/O buses
SCSI itself is a bus, with up to 16 devices on one cable; the SCSI initiator requests an operation and the SCSI targets
perform the tasks
Each target can have up to 8 logical units (disks attached to the device controller)
FC (Fibre Channel) is a high-speed serial architecture
Can be a switched fabric with a 24-bit address space – the basis of storage area networks (SANs),
in which many hosts attach to many storage units
Can be an arbitrated loop (FC-AL) of 126 devices
Network-Attached Storage
Network-attached storage (NAS) is storage made available over a network rather than over a local connection (such as a bus)
1. NFS and CIFS are common protocols
2. Implemented via remote procedure calls (RPCs) between host and storage
The newer iSCSI protocol uses an IP network to carry the SCSI protocol
Storage Area Network
1. Common in large storage environments (and becoming more common)
2. Multiple hosts attached to multiple storage arrays - flexible

Disk Scheduling
➢ The operating system is responsible for using hardware efficiently; for the disk drives, this means
having a fast access time and high disk bandwidth.
➢ Access time has two major components
✓ Seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector.
✓ Rotational latency is the additional time spent waiting for the disk to rotate the desired sector under the disk head.
➢ We want to minimize seek time
➢ Seek time ≈ seek distance
➢ Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request
for service and the completion of the last transfer.
➢ Several algorithms exist to schedule the servicing of disk I/O requests.
➢ We illustrate them with a request queue (cylinders 0-199):
98, 183, 37, 122, 14, 124, 65, 67, with the head pointer at cylinder 53
1.FCFS
Illustration shows total head movement of 640 cylinders

2.SSTF
➢ Selects the request with the minimum seek time from the current head position.
➢ SSTF scheduling is a form of SJF scheduling; it may cause starvation of some requests.
➢ Illustration shows total head movement of 236 cylinders (a short numerical check of the FCFS and SSTF totals appears after the list of algorithms below).
3.SCAN
➢ The disk arm starts at one end of the disk, and moves toward the other end, servicing requests until it gets to the
other end of the disk, where the head movement is reversed and servicing continues.
➢ Sometimes called the elevator algorithm.
➢ Illustration shows total head movement of 208 cylinders.

4.C-SCAN
➢ Provides a more uniform wait time than SCAN.
➢ The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other
end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
➢ Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.

5.C-LOOK
➢ Version of C-SCAN
➢ Arm only goes as far as the last request in each direction, then reverses direction immediately, without
first going all the way to the end of the disk.
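To check the FCFS and SSTF figures quoted above (640 and 236 cylinders) against the request queue 98, 183, 37, 122, 14, 124, 65, 67 with the head at cylinder 53, here is a minimal C sketch; it is an illustration of the two policies, not code from the notes.

#include <stdio.h>
#include <stdlib.h>

#define NREQ 8

/* The request queue and starting head position used in the illustrations above. */
int queue[NREQ] = { 98, 183, 37, 122, 14, 124, 65, 67 };
int start = 53;

/* FCFS: service requests in arrival order and add up the head movement. */
int fcfs(void)
{
    int head = start, total = 0;
    for (int i = 0; i < NREQ; i++) {
        total += abs(queue[i] - head);
        head = queue[i];
    }
    return total;                 /* 640 cylinders for this queue */
}

/* SSTF: always pick the pending request closest to the current head position. */
int sstf(void)
{
    int done[NREQ] = { 0 }, head = start, total = 0;
    for (int served = 0; served < NREQ; served++) {
        int best = -1;
        for (int i = 0; i < NREQ; i++)
            if (!done[i] && (best < 0 || abs(queue[i] - head) < abs(queue[best] - head)))
                best = i;
        total += abs(queue[best] - head);
        head = queue[best];
        done[best] = 1;
    }
    return total;                 /* 236 cylinders for this queue */
}

int main(void)
{
    printf("FCFS total head movement: %d\n", fcfs());
    printf("SSTF total head movement: %d\n", sstf());
    return 0;
}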

Selecting a Disk-Scheduling Algorithm

➢ SSTF is common and has a natural appeal
➢ SCAN and C-SCAN perform better for systems that place a heavy load on the disk.
➢ Performance depends on the number and types of requests.
➢ Requests for disk service can be influenced by the file-allocation method.
➢ The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it
to be replaced with a different algorithm if necessary.
