CO & OS 5 Units
UNIT-1
Basic Structure of Computers: Computer Types, Functional Units, Basic Operational Concepts, Historical Perspective.
CHAPTER – 1
BASIC STRUCTURE OF COMPUTERS
Computer types
A computer can be defined as a fast electronic calculating machine that accepts digitized input information (data), processes it according to a list of internally stored instructions, and produces the resulting output information.
The list of instructions is called a program, and the internal storage is called computer memory.
Functional units
A computer consists of five functionally independent main parts: input, memory, arithmetic logic unit (ALU), output, and control units.
[Figure: The basic functional units of a computer — input and output units, memory, and the processor containing the ALU and the control unit.]
Finally, the results are sent to the outside world through the output unit. All of these actions are coordinated by the control unit.
Input unit: -
The source program, high-level language program, coded information, or simply data is fed to the computer through input devices; the keyboard is the most common type. Whenever a key is pressed, the corresponding letter or digit is translated into its equivalent binary code and transmitted over a cable to either the memory or the processor.
Memory unit: -
Its function is to store programs and data. It is basically of two types:
1. Primary memory
2. Secondary memory
1. Primary memory: - This is the memory exclusively associated with the processor, and it operates at electronic speeds. Programs must be stored in this memory while they are being executed. The memory contains a large number of semiconductor storage cells, each capable of storing one bit of information. These cells are processed in groups of fixed size called words.
The number of bits in each word is called the word length of the computer. Programs must reside in the memory during execution. Instructions and data can be written into or read out of the memory under the control of the processor.
Memory in which any location can be reached in a short and fixed amount of time after specifying its address is called random-access memory (RAM).
The time required to access one word is called the memory access time. Memory that can only be read by the user, and whose contents cannot be altered, is called read-only memory (ROM); it typically holds fixed programs such as parts of the operating system.
Caches are small, fast RAM units coupled with the processor; they are often contained on the same IC chip to achieve high performance. Although primary storage is essential, it tends to be expensive.
2. Secondary memory: - This is used where large amounts of data and programs have to be stored, particularly information that is accessed infrequently.
Examples: magnetic disks and tapes, optical disks (i.e., CD-ROMs), floppies, etc.
The control unit and the ALU are many times faster than the other devices connected to a computer system. This enables a single processor to control a number of external devices such as keyboards, displays, magnetic and optical disks, sensors, and other mechanical controllers.
Output unit:-
The output unit is the counterpart of the input unit. Its basic function is to send the processed results to the outside world.
Control unit:-
It is effectively the nerve center that sends signals to the other units and senses their states. The actual timing signals that govern the transfer of data between the input unit, processor, memory, and output unit are generated by the control unit.
Consider, for example, the instruction Add LOCA, R0, which adds the operand at memory location LOCA to the contents of processor register R0. It is executed as follows:
1. First the instruction is fetched from the memory into the processor.
2. The operand at LOCA is fetched and added to the contents of R0.
3. Finally the resulting sum is stored in register R0.
The preceding Add instruction combines a memory access operation with an ALU operation. In some other types of computers, these two operations are performed by separate instructions for performance reasons:
Load LOCA, R1
Add R1, R0
Transfers between the memory and the processor are started by sending the
address of the memory location to be accessed to the memory unit and issuing the
appropriate control signals. The data are then transferred to or from the memory.
[Figure: Connections between the processor and the memory. The processor contains the control unit, the ALU, the registers MAR, MDR, PC, and IR, and n general-purpose registers R0 through Rn-1.]
The figure shows how the memory and the processor can be connected. In addition to the ALU and the control circuitry, the processor contains a number of registers used for several different purposes.
The instruction register (IR) holds the instruction that is currently being executed. Its output is available to the control circuits, which generate the timing signals that control the various processing elements involved in executing the instruction.
The program counter (PC) keeps track of the execution of a program; it contains the memory address of the next instruction to be fetched and executed.
Besides the IR and PC, there are n general-purpose registers R0 through Rn-1.
The other two registers, which facilitate communication with the memory, are:
1. MAR (Memory Address Register): It holds the address of the location to be accessed.
2. MDR (Memory Data Register): It contains the data to be written into or read out of the addressed location.
An interrupt is a request signal from an I/O device for service by the processor.
The processor provides the requested service by executing an appropriate interrupt
service routine.
Since this diversion may change the internal state of the processor, its state must be saved in memory locations before the interrupt is serviced. When the interrupt-service routine is completed, the state of the processor is restored so that the interrupted program may continue.
Bus structure
Buses are the simplest and most common way of interconnecting the various parts of a computer. To achieve a reasonable speed of operation, a computer must be organized so that all its units can handle one full word of data at a given time. A group of lines that serves as a connecting path for several devices is called a bus.
In addition to the lines that carry the data, the bus must have lines for address and control purposes. The simplest way to interconnect the units is to use a single bus, in which all units are connected to a common set of lines.
Since the bus can be used for only one transfer at a time, only two units can actively use the bus at any given time. Bus control lines are used to arbitrate multiple requests for use of the bus. The main virtues of the single-bus structure are:
Low cost
Very flexible for attaching peripheral devices
A multiple-bus structure certainly increases performance, but it also increases the cost significantly.
The devices connected to a bus do not all operate at the same speed, and this creates a timing problem. It is solved by using buffer registers. These buffers are electronic registers of small capacity compared to the main memory, but of comparable speed.
The processor first loads the data to be transferred into a device's buffer register at high speed; the complete transfer to the slower device can then take place from the buffer, freeing the processor and the bus for other activity.
Performance
The total time required to execute a program, called the elapsed time, is a measure of the performance of the entire computer system. It is affected by the speed of the processor, the disk, and the printer. The time needed by the processor to execute the instructions of a program is called the processor time.
Just as the elapsed time for the execution of a program depends on all units in a computer system, the processor time depends on the hardware involved in the execution of individual machine instructions. This hardware comprises the processor and the memory, which are usually connected by a bus, as shown in fig. c.
[Figure c: The processor and the memory connected by a bus.]
The pertinent parts of fig. c are repeated in fig. d, which includes the cache memory as part of the processor unit.
Let us examine the flow of program instructions and data between the memory and the processor. At the start of execution, all program instructions and the required data are stored in the main memory. As the execution proceeds, instructions are fetched one by one over the bus into the processor, and a copy is placed in the cache. Later, if the same instruction or data item is needed a second time, it is read directly from the cache.
The processor and a relatively small cache memory can be fabricated on a single IC chip. The internal speed of performing the basic steps of instruction processing on such a chip is very high and is considerably faster than the speed at which instructions and data can be fetched from the main memory. A program will be executed faster if the movement of instructions and data between the main memory and the processor is minimized, which is achieved by using the cache.
For example:- Suppose a number of instructions are executed repeatedly over a short
period of time as happens in a program loop. If these instructions are available in the
cache, they can be fetched quickly during the period of repeated use. The same applies to
the data that are used repeatedly.
Processor clock: -
Processor circuits are controlled by a timing signal called a clock. The clock defines regular time intervals called clock cycles. To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps such that each step can be completed in one clock cycle. The length P of one clock cycle is an important parameter that affects processor performance; the corresponding clock rate is R = 1/P cycles per second.
Processors used in today's personal computers and workstations have clock rates that range from a few hundred million to over a billion cycles per second.
We now focus our attention on the processor time component of the total elapsed time. Let T be the processor time required to execute a program that has been prepared in some high-level language. The compiler generates a machine language object program that corresponds to the source program. Assume that complete execution of the program requires the execution of N machine language instructions. The number N is the actual number of instruction executions, and is not necessarily equal to the number of machine instructions in the object program: some instructions may be executed more than once, as is the case for instructions inside a program loop, while others may not be executed at all, depending on the input data used.
Suppose that the average number of basic steps needed to execute one machine instruction is S, where each basic step is completed in one clock cycle. If the clock rate is R cycles per second, the program execution time is given by

    T = (N × S) / R

This is often referred to as the basic performance equation.
We must emphasize that N, S, and R are not independent parameters; changing one may affect another. Introducing a new feature in the design of a processor will lead to improved performance only if the overall result is to reduce the value of T.
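As a quick illustration with assumed round numbers (not taken from any particular machine), suppose a program executes N = 2 × 10^9 instructions on a processor that needs an average of S = 4 basic steps per instruction and runs at R = 2 × 10^9 cycles per second (2 GHz). Then

    T = (N × S) / R = (2 × 10^9 × 4) / (2 × 10^9) = 4 seconds

Halving S through better overlap of instruction execution, with N and R unchanged, would halve T.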
Consider the instruction
Add R1, R2, R3
This adds the contents of R1 and R2 and places the sum into R3.
The contents of R1 and R2 are first transferred to the inputs of the ALU. After the addition operation is performed, the sum is transferred to R3. The processor can read the next instruction from the memory while the addition operation is being performed. If that instruction also uses the ALU, its operands can be transferred to the ALU inputs at the same time that the result of the Add instruction is being transferred to R3.
In the ideal case, if all instructions are overlapped to the maximum degree possible, execution proceeds at the rate of one instruction completed in each clock cycle. Individual instructions still require several clock cycles to complete, but for the purpose of computing T, the effective value of S is 1.
This overlapping of instruction execution substantially reduces the total execution time compared with the serial execution of program instructions. Nowadays many processors are designed in this manner.
Clock rate
There are two possibilities for increasing the clock rate R:
1. Improving the IC technology makes logic circuits faster, which reduces the time needed to complete a basic step. This allows the clock period P to be reduced and the clock rate R to be increased.
2. Reducing the amount of processing done in one basic step also makes it possible to reduce the clock period P. However, if the actions that have to be performed by an instruction remain the same, the number of basic steps needed may increase.
Performance measurements
It is very important to be able to assess the performance of a computer. Computer designers use performance estimates to evaluate the effectiveness of new features.
The performance measure is the time taken by the computer to execute a given benchmark. Initially, some attempts were made to create artificial programs that could be used as benchmark programs, but synthetic programs do not properly predict the performance obtained when real application programs are run.
The programs selected range from game playing, compilers, and database applications to numerically intensive programs in astrophysics and quantum chemistry. In each case, the program is compiled for the computer under test, and the running time on a real computer is measured. The same program is also compiled and run on one computer selected as a reference.
The ‘SPEC’ rating is computed as follows:

    SPEC rating = (Running time on the reference computer) / (Running time on the computer under test)

Thus, a SPEC rating of 50 means that the computer under test is 50 times as fast as the reference UltraSPARC 10 for this program. This is repeated for all the programs in the SPEC suite, and the geometric mean of the results is computed.
Let SPECi be the rating for program i in the suite. The overall SPEC rating for the computer is given by

    SPEC rating = ( Π(i = 1..n) SPECi )^(1/n)

where n is the number of programs in the suite.
Since actual execution time is measured, the SPEC rating is a measure of the combined effect of all factors affecting performance, including the compiler, the operating system, the processor, and the memory of the computer being tested.
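The geometric mean above is easy to compute in code. The following C sketch (illustrative only; the per-program ratings are made-up values) averages a set of SPEC ratings:

    #include <math.h>
    #include <stdio.h>

    /* Geometric mean of n per-program SPEC ratings:
       (spec[0] * spec[1] * ... * spec[n-1])^(1/n).
       Summing logarithms avoids overflow for large products. */
    double spec_rating(const double *spec, int n)
    {
        double log_sum = 0.0;
        for (int i = 0; i < n; i++)
            log_sum += log(spec[i]);
        return exp(log_sum / n);
    }

    int main(void)
    {
        double ratings[] = { 50.0, 40.0, 62.5 };  /* hypothetical ratings */
        printf("Overall SPEC rating: %.2f\n", spec_rating(ratings, 3));
        return 0;
    }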
NUMBER REPRESENTATION:-
Consider an n-bit vector
B = b(n-1) … b1 b0
where bi = 0 or 1 for 0 ≤ i ≤ n−1. This vector can represent unsigned integer values V in the range 0 to 2^n − 1, where

    V(B) = b(n-1) × 2^(n-1) + … + b1 × 2^1 + b0 × 2^0
We obviously need to represent both positive and negative numbers. Three systems are
used for representing such numbers :
• Sign-and-magnitude
• 1’s-complement
• 2’s-complement
In all three systems, the leftmost bit is 0 for positive numbers and 1 for negative numbers.
Fig 2.1 illustrates all three representations using 4-bit numbers. Positive values have
identical representations in all systems, but negative values have different representations.
In the sign-and-magnitude systems, negative values are represented by changing the most
significant bit (b3 in figure 2.1) from 0 to 1 in the B vector of the corresponding positive
value. For example, +5 is represented by 0101, and −5 is represented by 1101. In 1's-complement representation, negative values are obtained by complementing each bit of the corresponding positive number. Thus, the representation for −3 is obtained by complementing each bit in the vector 0011 to yield 1100.
B = b3 b2 b1 b0    Sign and magnitude    1's complement    2's complement
0 1 1 1                   +7                   +7                +7
0 1 1 0                   +6                   +6                +6
0 1 0 1                   +5                   +5                +5
0 1 0 0                   +4                   +4                +4
0 0 1 1                   +3                   +3                +3
0 0 1 0                   +2                   +2                +2
0 0 0 1                   +1                   +1                +1
0 0 0 0                   +0                   +0                +0
1 0 0 0                   -0                   -7                -8
1 0 0 1                   -1                   -6                -7
1 0 1 0                   -2                   -5                -6
1 0 1 1                   -3                   -4                -5
1 1 0 0                   -4                   -3                -4
1 1 0 1                   -5                   -2                -3
1 1 1 0                   -6                   -1                -2
1 1 1 1                   -7                   -0                -1
Fig 2.1 Binary, signed-integer representations (4-bit numbers)
Hence, the 2’s complement of a number is obtained by adding 1 to the 1’s complement of
that number.
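A small C sketch makes the relationship concrete (the 4-bit width simply matches Fig 2.1):

    #include <stdio.h>

    int main(void)
    {
        unsigned n = 4;                 /* word length used in Fig 2.1 */
        unsigned mask = (1u << n) - 1;  /* keep only the low n bits: 0xF */
        unsigned x = 0x5;               /* +5 = 0101 */

        unsigned ones = (~x) & mask;        /* 1's complement: 1010 */
        unsigned twos = (ones + 1) & mask;  /* 2's complement: 1011 */

        printf("x    = %X\n", x);      /* 5 */
        printf("1's  = %X\n", ones);   /* A */
        printf("2's  = %X\n", twos);   /* B */
        return 0;
    }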
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 10 (sum 0, carry-out 1)
Figure 2.2 Addition of 1-bit numbers
Number and character operands, as well as instructions, are stored in the memory
of a computer. The memory consists of many millions of storage cells, each of which can
store a bit of information having the value 0 or 1. Because a single bit represents a very
small amount of information, bits are seldom handled individually. The usual approach is
to deal with them in groups of fixed size. For this purpose, the memory is organized so
that a group of n bits can be stored or retrieved in a single, basic operation. Each group of
n bits is referred to as a word of information, and n is called the word length. The
memory of a computer can be schematically represented as a collection of words as
shown in figure (a).
Modern computers have word lengths that typically range from 16 to 64 bits. If
the word length of a computer is 32 bits, a single word can store a 32-bit 2’s complement
number or four ASCII characters, each occupying 8 bits. A unit of 8 bits is called a byte.
BYTE ADDRESSABILITY:-
We now have three basic information quantities to deal with: the bit, byte and
word. A byte is always 8 bits, but the word length typically ranges from 16 to 64 bits.
The most practical assignment is to have successive addresses refer to successive byte
[Figure a: Memory words — the memory viewed as a sequence of n-bit words (first word, second word, …, i-th word, …, last word). Figure b: A 32-bit word, bits b31 b30 … b1 b0.]
locations in the memory. This is the assignment used in most modern computers, and is the one we will normally use in this book. The term byte-addressable memory is used for this assignment. Byte locations have addresses 0, 1, 2, …. Thus, if the word length of the machine is 32 bits, successive words are located at addresses 0, 4, 8, …, with each word consisting of four bytes.
[Figure: Byte addresses in memory for a 32-bit word length — word addresses 0, 4, …, 2^k − 4. In the big-endian assignment the bytes of the word at address 0 are numbered 0, 1, 2, 3 from the most significant end; in the little-endian assignment the same bytes are numbered 3, 2, 1, 0.]
WORD ALIGNMENT:-
In the case of a 32-bit word length, natural word boundaries occur at addresses 0, 4, 8, …, as shown in the above figure. We say that the word locations have aligned addresses. In general, words are said to be aligned in memory if they begin at a byte address that is a multiple of the number of bytes in a word. The number of bytes in a word is a power of 2. Hence, if the word length is 16 (2 bytes), aligned words begin at byte addresses 0, 2, 4, …, and for a word length of 64 (2^3 bytes), aligned words begin at byte addresses 0, 8, 16, ….
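A quick way to check alignment in code: an address is aligned for a word of 2^k bytes exactly when its low k bits are zero. A minimal C sketch:

    #include <stdio.h>
    #include <stdbool.h>

    /* True if addr is a multiple of word_bytes
       (word_bytes must be a power of 2). */
    bool is_aligned(unsigned long addr, unsigned long word_bytes)
    {
        return (addr & (word_bytes - 1)) == 0;
    }

    int main(void)
    {
        printf("%d\n", is_aligned(8, 4));    /* 1: 8 is a 4-byte boundary  */
        printf("%d\n", is_aligned(6, 4));    /* 0: 6 is not                */
        printf("%d\n", is_aligned(16, 8));   /* 1: 16 is an 8-byte boundary */
        return 0;
    }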
Memory operations
Both program instructions and data operands are stored in the memory. To
execute an instruction, the processor control circuits must cause the word (or words)
containing the instruction to be transferred from the memory to the processor. Operands
and results must also be moved between the memory and the processor. Thus, two basic
operations involving the memory are needed, namely, Load (or Read or Fetch) and Store
(or Write).
The load operation transfers a copy of the contents of a specific memory location
to the processor. The memory contents remain unchanged. To start a Load operation, the
processor sends the address of the desired location to the memory and requests that its
contents be read. The memory reads the data stored at that address and sends them to the
processor.
An information item of either one word or one byte can be transferred between the processor and the memory in a single operation. In practice, this transfer takes place between a CPU register and the main memory.
For example, names for the addresses of memory locations may be LOC, PLACE, A,
VAR2; processor registers names may be R0, R5; and I/O register names may be
DATAIN, OUTSTATUS, and so on. The contents of a location are denoted by placing
square brackets around the name of the location. Thus, the expression
R1 ← [LOC]
means that the contents of memory location LOC are transferred into processor register
R1.
As another example, consider the operation that adds the contents of registers R1
and R2, and then places their sum into register R3. This action is indicated as
R3 ← [R1] + [R2]
This type of notation is known as Register Transfer Notation (RTN). Note that the right-hand side of an RTN expression always denotes a value, and the left-hand side is the name of a location where the value is to be placed, overwriting the old contents of that location.
The contents of LOC are unchanged by the execution of this instruction, but the old contents of register R1 are overwritten.
BASIC INSTRUCTIONS:-
The operation of adding two numbers is a fundamental capability in any computer. The statement
C = A + B
in a high-level language program instructs the computer to add the current values of the two variables A and B, and to assign the sum to a third variable, C. To carry out this action, the contents of memory locations A and B are fetched from the memory and transferred into the processor, where their sum is computed. The result is then sent back to the memory and stored in location C.
Operands A and B are called the source operands, C is called the destination operand, and Add is the operation to be performed on the operands. A general instruction of this type, called a three-address instruction, has the format
Operation Source1, Source2, Destination
If k bits are needed to specify the memory address of each operand, the encoded form of the above instruction must contain 3k bits for addressing purposes, in addition to the bits needed to denote the Add operation.
An example of a two-address instruction is
Move B, C
which performs the operation C ← [B], leaving the contents of location B unchanged.
Using only one-address instructions, the operation C ← [A] + [B] can be performed by executing the sequence of instructions
Load A
Add B
Store C
The number of general-purpose registers in a processor typically ranges from 8 to 32, and can be even considerably more in some cases. Access to data in these registers is much faster than to data stored in memory locations, because the registers are inside the processor.
Instructions such as
Load A, Ri
Store Ri, A
Add A, Ri
are generalizations of the Load, Store, and Add instructions for the single-accumulator case, in which register Ri performs the function of the accumulator.
In processors where arithmetic operations are allowed only on operands that are in processor registers, the C = A + B task can be performed by the instruction sequence
Move A, Ri
Move B, Rj
23
COMPUTER ORGANIZATION
Add Ri, Rj
Move Rj, C
In processors where one operand may be in the memory but the other must be in a register, an instruction sequence for the required task would be
Move A, Ri
Add B, Ri
Move Ri, C
The speed with which a given task is carried out depends on the time it takes to
transfer instructions from memory into the processor and to access the operands
referenced by these instructions. Transfers that involve the memory are much slower than
transfers within the processor.
We have discussed three-, two-, and one-address instructions. It is also possible
to use instructions in which the locations of all operands are defined implicitly. Such
instructions are found in machines that store operands in a structure called a pushdown
stack. In this case, the instructions are called zero-address instructions.
[Figure: A program for C ← [A] + [B] stored in the memory — Move A, R0 at address i, Add B, R0 at i + 4, and Move R0, C at i + 8, with the data locations A, B, and C elsewhere in the memory.]
Let us consider how this program is executed. The processor contains a register
called the program counter (PC), which holds the address of the instruction to be
executed next. To begin executing a program, the address of its first instruction (i in our
example) must be placed into the PC. Then, the processor control circuits use the
information in the PC to fetch and execute instructions, one at a time, in the order of
increasing addresses. This is called straight-line sequencing. During the execution of each
instruction, the PC is incremented by 4 to point to the next instruction. Thus, after the
Move instruction at location i + 8 is executed, the PC contains the value i + 12, which is
the address of the first instruction of the next program segment.
BRANCHING:-
Consider the task of adding a list of n numbers. Instead of using a long list of add
instructions, it is possible to place a single add instruction in a program loop, as shown in
fig b. The loop is a straight-line sequence of instructions executed as many times as
needed. It starts at location LOOP and ends at the instruction Branch > 0. During each
pass through this loop, the address of the next list entry is determined, and that entry is
fetched and added to R0.
i          Move NUM1, R0
i+4        Add NUM2, R0
i+8        Add NUM3, R0
…
i+4n-4     Add NUMn, R0
i+4n       Move R0, SUM
Fig a A straight-line program for adding n numbers
        Move N, R1
        Clear R0
LOOP    Determine address of "Next" number and add "Next" number to R0
        Decrement R1
        Branch>0 LOOP      <- the program loop runs from LOOP to this branch
        Move R0, SUM
        …
Data:   SUM, N (contains n), NUM1, NUM2, …, NUMn
Fig b Using a loop to add n numbers
Assume that the number of entries in the list, n, is stored in memory location N, as shown. Register R1 is used as a counter to determine the number of times the loop is executed. Hence, the contents of location N are loaded into register R1 at the beginning of the program. Then, within the body of the loop, the instruction
Decrement R1
reduces the contents of R1 by 1 each time through the loop.
This type of instruction loads a new value into the program counter. As a result,
the processor fetches and executes the instruction at this new address, called the branch
target, instead of the instruction at the location that follows the branch instruction in
sequential address order. A conditional branch instruction causes a branch only if a
specified condition is satisfied. If the condition is not satisfied, the PC is incremented in the normal way, and the next instruction in sequential address order is fetched and executed. An example is the instruction
Branch>0 LOOP
which causes a branch to location LOOP if the result of the immediately preceding instruction is greater than zero.
CONDITION CODES:-
The processor keeps track of information about the results of various operations for use by subsequent conditional branch instructions. This is accomplished by recording the required information in individual bits, often called condition code flags. These flags are usually grouped together in a special processor register called the condition code register or status register. Individual condition code flags are set to 1 or cleared to 0, depending on the outcome of the operation performed. Four commonly used flags are N (negative), Z (zero), V (overflow), and C (carry).
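As a rough illustration (a sketch, not any particular processor's hardware), the N and Z flags after an arithmetic operation could be modeled like this in C:

    #include <stdio.h>
    #include <stdbool.h>

    /* Model of two condition code flags updated after an ALU operation. */
    struct flags { bool N; bool Z; };

    int alu_sub(int a, int b, struct flags *f)
    {
        int result = a - b;
        f->N = (result < 0);    /* N: set if the result is negative */
        f->Z = (result == 0);   /* Z: set if the result is zero     */
        return result;
    }

    int main(void)
    {
        struct flags f;
        alu_sub(3, 5, &f);
        /* Branch>0 is taken only when the result is neither negative nor zero. */
        printf("N=%d Z=%d -> Branch>0 taken: %d\n", f.N, f.Z, !f.N && !f.Z);
        return 0;
    }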
Addressing modes:
In general, a program operates on data that reside in the computer’s memory.
These data can be organized in a variety of ways. If we want to keep track of students’
names, we can write them in a list. Programmers use organizations called data structures
to represent the data used in computations. These include lists, linked lists, arrays,
queues, and so on.
Notation used below: EA = effective address; Value = a signed number.
Register mode - The operand is the contents of a processor register; the name (address)
of the register is given in the instruction.
Absolute mode – The operand is in a memory location; the address of this location is
given explicitly in the instruction. (In some assembly languages, this mode is called
Direct).
The instruction
Move LOC, R2
uses the Absolute mode for the source operand and the Register mode for the destination; it moves the contents of memory location LOC into register R2.
Processor registers are used as temporary storage locations, and the data in a register are accessed using the Register mode. The Absolute mode can represent global variables in a program. A declaration such as
Integer A, B;
in a high-level language program will cause the compiler to allocate memory locations to the variables A and B, which can then be accessed using the Absolute mode.
Immediate mode – The operand is given explicitly in the instruction. For example, the instruction
Move 200, R0
places the value 200 in register R0. Clearly, the Immediate mode is only used to specify the value of a source operand. Using a subscript to denote the Immediate mode is not appropriate in assembly languages. A common convention is to use the sharp sign (#) in front of the value to indicate that this value is to be used as an immediate operand. Hence, we write the instruction above in the form
Move #200, R0
Indirect mode – The effective address of the operand is the contents of a register or
memory location whose address appears in the instruction.
To execute the Add instruction in fig (a), the processor uses the value in register R1 as the effective address of the operand. It requests a read operation from the memory to read the contents of location B. The value read is the desired operand, which the processor adds to the contents of register R0. Indirect addressing through a memory location is also possible, as shown in fig (b). In this case, the processor first reads the contents of memory location A, then requests a second read operation using the value B as an address to obtain the operand.
Add (R1), R0      [fig a: register R1 contains B, the address of the operand in the main memory]
Add (A), R0       [fig b: memory location A contains B, the address of the operand]
[Figure: Indirect addressing — through a register (a) and through a memory location (b).]
Move N, R1
Move #NUM1, R2
Clear R0
LOOP Add (R2), R0
Add #4, R2
Decrement R1
Branch > 0 LOOP
Move R0, SUM
The register or memory location that contains the address of an operand is called
a pointer. Indirection and the use of pointers are important and powerful concepts in
programming.
In the program shown, register R2 is used as a pointer to the numbers in the list, and the operands are accessed indirectly through R2. The initialization section of the program loads the counter value n from memory location N into R1 and uses the immediate addressing mode to place the address value NUM1, which is the address of the first number in the list, into R2. Then it clears R0 to 0. The first two instructions in the loop implement the block "determine address of the next number and add that number to R0" of the program in fig b. The first time through the loop, the instruction Add (R2), R0 fetches the operand at location NUM1 and adds it to R0. The second Add instruction adds 4 to the contents of the pointer R2, so that it will contain the address value NUM2 when the Add (R2), R0 instruction is executed in the second pass through the loop.
Index mode – the effective address of the operand is generated by adding a constant
value to the contents of a register.
The register used may be either a special register provided for this purpose, or,
more commonly, it may be any one of a set of general-purpose registers in the processor.
In either case, it is referred to as index register. We indicate the Index mode symbolically
as
X(Ri)
where X denotes the constant value contained in the instruction and Ri is the name of the register involved. The effective address of the operand is given by
EA = X + [Ri]
The contents of the index register are not changed in the process of generating
the effective address. In an assembly language program, the constant X may be given
either as an explicit number or as a symbolic name representing a numerical value.
The figures below illustrate two ways of using the Index mode. In fig a, the index register, R1, contains the address of a memory location, and the value X defines an offset (also called a displacement) from this address to the location where the operand is found. An alternative use is illustrated in fig b. Here, the constant X corresponds to a memory address, and the contents of the index register define the offset to the operand. In either case, the effective address is the sum of two values; one is given explicitly in the instruction, and the other is stored in a register.
Add 20(R1), R2       [fig a: R1 contains the base address 1000; the offset X = 20 gives the operand address 1020]
Add 1000(R1), R2     [fig b: the constant 1000 is a memory address; R1 contains the offset 20; the operand is again at 1020]
[Figure: Two uses of the Index mode.]
Move #LIST, R0
Clear R1
Clear R2
Clear R3
Move N, R4
LOOP Add 4(R0), R1
Add 8(R0), R2
Add 12(R0), R3
Add #16, R0
Decrement R4
Branch>0 LOOP
Move R1, SUM1
Move R2, SUM2
Move R3, SUM3
The preceding discussion covered the most basic form of indexed addressing. Several variations of this basic form provide very efficient access to memory operands in practical programming situations. For example, a second register may be used to contain the offset X, in which case we can write the Index mode as
(Ri, Rj)
The effective address is the sum of the contents of registers Ri and Rj. The
second register is usually called the base register. This form of indexed addressing
provides more flexibility in accessing operands, because both components of the effective
address can be changed.
Another version of the Index mode uses two registers plus a constant, which can
be denoted as
X(Ri, Rj)
In this case, the effective address is the sum of the constant X and the contents of
registers Ri and Rj. This added flexibility is useful in accessing multiple components
inside each item in a record, where the beginning of an item is specified by the (Ri, Rj)
part of the addressing mode. In other words, this mode implements a three-dimensional
array.
RELATIVE ADDRESSING:-
We have defined the Index mode using general-purpose processor registers. A
useful version of this mode is obtained if the program counter, PC, is used instead of a
general purpose register. Then, X(PC) can be used to address a memory location that is X
bytes away from the location presently pointed to by the program counter.
Relative mode – The effective address is determined by the Index mode using the
program counter in place of the general-purpose register Ri.
This mode can be used to access data operands. But its most common use is to specify the target address in branch instructions. An instruction such as
Branch>0 LOOP
causes program execution to go to the branch target location identified by the name LOOP if the branch condition is satisfied; this location can be computed by specifying it as an offset from the current value of the program counter.
Autoincrement mode – The effective address of the operand is the contents of a register specified in the instruction. After accessing the operand, the contents of this register are automatically incremented to point to the next item in a list. We denote this mode as
(Ri)+
Autodecrement mode – The contents of a register specified in the instruction are first automatically decremented, and are then used as the effective address of the operand. We denote it as
-(Ri)
Move N, R1
Move #NUM1, R2
Clear R0
LOOP Add (R2)+, R0
Decrement R1
Branch>0 LOOP
Move R0, SUM
Fig c The Autoincrement addressing mode used in the program of fig 2.12
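C's post-increment pointer dereference is a direct analogue of the Autoincrement mode. The loop below is a sketch of the same summation: it fetches an element and advances the pointer in one expression, just as Add (R2)+, R0 does.

    #include <stdio.h>

    int main(void)
    {
        int num[] = { 3, 1, 4, 1, 5 };   /* the list NUM1..NUMn */
        int n = 5;                       /* location N          */
        int sum = 0;                     /* register R0         */
        int *p = num;                    /* register R2 = #NUM1 */

        while (n-- > 0)                  /* Decrement R1; Branch>0 LOOP */
            sum += *p++;                 /* Add (R2)+, R0: fetch, then advance */

        printf("SUM = %d\n", sum);       /* Move R0, SUM -> 14 */
        return 0;
    }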
ASSEMBLY LANGUAGE
Machine instructions are represented by patterns of 0s and 1s. Such patterns are
awkward to deal with when discussing or preparing programs. Therefore, we use
symbolic names to represent the pattern. So far, we have used normal words, such as
Move, Add, Increment, and Branch, for the instruction operations to represent the
corresponding binary code patterns. When writing programs for a specific computer, such
words are normally replaced by acronyms called mnemonics, such as MOV, ADD, INC,
and BR. Similarly, we use the notation R3 to refer to register 3, and LOC to refer to a
memory location. A complete set of such symbolic names and rules for their use
constitute a programming language, generally referred to as an assembly language.
ASSEMBLER DIRECTIVES:-
In addition to providing a mechanism for representing instructions in a program,
the assembly language allows the programmer to specify other information needed to
translate the source program into the object program. We have already mentioned that we
need to assign numerical values to any names used in a program. Suppose that the name
SUM is used to represent the value 200. This fact may be conveyed to the assembler program through a statement such as
SUM EQU 200
This statement does not denote an instruction that will be executed when the
object program is run; in fact, it will not even appear in the object program. It simply
informs the assembler that the name SUM should be replaced by the value 200 wherever
it appears in the program. Such statements, called assembler directives (or commands),
are used by the assembler while it translates a source program into an object program.
Memory address          Contents
100            Move N, R1
104            Move #NUM1, R2
108            Clear R0
112   LOOP     Add (R2), R0
116            Add #4, R2
120            Decrement R1
124            Branch>0 LOOP
128            Move R0, SUM
…
200   SUM
204   N        100
208   NUM1
212   NUM2
…
604   NUMn
[Figure: Memory arrangement for the program; location N holds the number of entries, n = 100.]
The assembler assigns addresses to instructions and data blocks, starting at the
address given in the ORIGIN assembler directive. It also inserts constants that may be
given in DATAWORD commands and reserves memory space as requested by
RESERVE commands.
As the assembler scans through a source program, it keeps track of all names and the numerical values that correspond to them in a symbol table. Thus, when a name appears a second time, it is replaced with its value from the table. A problem arises when
a name appears as an operand before it is given a value. For example, this happens if a
forward branch is required. A simple solution to this problem is to have the assembler
scan through the source program twice. During the first pass, it creates a complete
symbol table. At the end of this pass, all names will have been assigned numerical values.
The assembler then goes through the source program a second time and substitutes values
for all names from the symbol table. Such an assembler is called a two-pass assembler.
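A toy illustration of the two-pass idea in C (a sketch only — real assemblers parse full instruction syntax; the 4-byte instruction size and label convention are assumptions): pass 1 counts addresses and records each label in the symbol table; pass 2 substitutes the recorded addresses.

    #include <stdio.h>
    #include <string.h>

    #define MAXSYM 16

    static char sym_name[MAXSYM][16];
    static int  sym_addr[MAXSYM];
    static int  nsym = 0;

    static int lookup(const char *name)
    {
        for (int i = 0; i < nsym; i++)
            if (strcmp(sym_name[i], name) == 0)
                return sym_addr[i];
        return -1;   /* undefined name */
    }

    int main(void)
    {
        /* Each entry is a label definition ("LOOP:") or an instruction. */
        const char *prog[] = { "Move N,R1", "Clear R0", "LOOP:",
                               "Add (R2),R0", "Decrement R1",
                               "Branch>0 LOOP", NULL };

        /* Pass 1: assign addresses; record labels in the symbol table. */
        int addr = 100;                        /* e.g., ORIGIN 100 */
        for (int i = 0; prog[i]; i++) {
            size_t len = strlen(prog[i]);
            if (prog[i][len - 1] == ':') {     /* label definition */
                strncpy(sym_name[nsym], prog[i], len - 1);
                sym_name[nsym][len - 1] = '\0';
                sym_addr[nsym++] = addr;       /* label = next instruction */
            } else {
                addr += 4;                     /* fixed 4-byte instructions */
            }
        }

        /* Pass 2: emit instructions with label operands resolved. */
        addr = 100;
        for (int i = 0; prog[i]; i++) {
            if (prog[i][strlen(prog[i]) - 1] == ':')
                continue;
            if (strncmp(prog[i], "Branch>0 ", 9) == 0)
                printf("%d: Branch>0 %d\n", addr, lookup(prog[i] + 9));
            else
                printf("%d: %s\n", addr, prog[i]);
            addr += 4;
        }
        return 0;
    }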
The assembler stores the object program on a magnetic disk. The object program
must be loaded into the memory of the computer before it is executed. For this to happen,
another utility program called a loader must already be in the memory.
NUMBER NOTATION:-
When dealing with numerical values, it is often convenient to use the familiar
decimal notation. Of course, these values are stored in the computer as binary numbers.
In some situations, it is more convenient to specify the binary patterns directly. Most
assemblers allow numerical values to be specified in different ways, using conventions
that are defined by the assembly language syntax. Consider, for example, the number 93, which is represented by the 8-bit binary number 01011101. If this value is to be used as an immediate operand, it can be given as a decimal number, as in the instruction
ADD #93, R1
or as a binary number identified by a prefix symbol such as a percent sign, as in
ADD #%01011101, R1
or as a hexadecimal number identified by the prefix $, as in
ADD #$5D, R1
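All three notations name the same bit pattern, as a quick C check shows (hexadecimal 0x5D literals are standard C; binary literals are a common compiler extension, so the binary form is shown only as a comment):

    #include <stdio.h>

    int main(void)
    {
        int dec = 93;      /* decimal            #93            */
        int bin = 0x5D;    /* binary  %01011101, i.e., 0x5D     */
        int hex = 0x5D;    /* hexadecimal        #$5D           */

        /* All three are the same 8-bit pattern 01011101. */
        printf("%d %d %d\n", dec, bin, hex);              /* 93 93 93 */
        printf("equal: %d\n", dec == bin && bin == hex);  /* 1        */
        return 0;
    }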
Consider a task that reads in character input from a keyboard and produces
character output on a display screen. A simple way of performing such I/O tasks is to use
a method known as program-controlled I/O. The rate of data transfer from the keyboard
to a computer is limited by the typing speed of the user, which is unlikely to exceed a few
characters per second. The rate of output transfers from the computer to the display is
much higher. It is determined by the rate at which characters can be transmitted over the
link between the computer and the display device, typically several thousand characters
per second. However, this is still much slower than the speed of a processor that can
execute many millions of instructions per second. The difference in speed between the
processor and I/O devices creates the need for mechanisms to synchronize the transfer of
data between them.
[Figure a: A bus connecting the processor, the keyboard (with buffer register DATAIN and status flag SIN), and the display (with buffer register DATAOUT and status flag SOUT).]
The keyboard and the display are separate devices, as shown in fig a. The action of striking a key on the keyboard does not automatically cause the corresponding character to be displayed on the screen. One block of instructions in the I/O program transfers the character into the processor, and another associated block of instructions causes the character to be displayed.
Striking a key stores the corresponding character code in an 8-bit buffer register
associated with the keyboard. Let us call this register DATAIN, as shown in fig a. To
inform the processor that a valid character is in DATAIN, a status control flag, SIN, is set to 1. A program monitors SIN, and when SIN is set to 1, the processor reads the contents of DATAIN. When the character is transferred to the processor, SIN is automatically cleared to 0. If a second character is entered at the keyboard, SIN is again set to 1, and the process repeats.
An analogous process takes place when characters are transferred from the
processor to the display. A buffer register, DATAOUT, and a status control flag, SOUT,
are used for this transfer. When SOUT equals 1, the display is ready to receive a
character.
In order to perform I/O transfers, we need machine instructions that can check the state of the status flags and transfer data between the processor and the I/O device. These instructions are similar in format to those used for moving data between the processor and the memory. For example, the processor can monitor the keyboard status flag SIN and transfer a character from DATAIN to register R1 by the following sequence of operations:
READWAIT    Branch to READWAIT if SIN = 0
            Input from DATAIN to R1
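In C terms this is a busy-wait polling loop. The sketch below models the device registers as ordinary variables; on real hardware they would be memory-mapped and declared volatile (all names here are illustrative):

    #include <stdio.h>

    /* Models of the keyboard interface registers (illustrative only). */
    static int sin_flag = 0;       /* SIN: 1 when DATAIN holds a character */
    static unsigned char datain;   /* DATAIN: keyboard buffer register     */

    static void simulate_keystroke(unsigned char c)
    {
        datain = c;                /* striking a key fills DATAIN ... */
        sin_flag = 1;              /* ... and sets SIN to 1           */
    }

    int main(void)
    {
        simulate_keystroke('A');

        while (sin_flag == 0)      /* READWAIT: branch back while SIN = 0 */
            ;                      /* busy-wait                           */

        unsigned char r1 = datain; /* Input from DATAIN to R1 */
        sin_flag = 0;              /* reading clears SIN      */

        printf("received '%c'\n", r1);
        return 0;
    }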
STACKS:-
A stack is a list of data elements, usually words or bytes, with the accessing restriction that elements can be added or removed at one end of the list only. This end is called the top of the stack; the structure is also known as a pushdown stack, and it operates on a last-in-first-out (LIFO) basis.
Fig b shows a stack of word data items in the memory of a computer. It contains
numerical values, with 43 at the bottom and -28 at the top. A processor register is used to
keep track of the address of the element of the stack that is at the top at any given time.
This register is called the stack pointer (SP). It could be one of the general-purpose
registers or a register dedicated to this function.
[Fig b: A stack of words in the memory. Addresses run from 0 to 2^k − 1. The stack pointer register SP points to the current top element, −28; beneath it lie 17 and 739, and the bottom element of the stack is 43.]
Another useful data structure that is similar to the stack is called a queue. Data
are stored in and retrieved from a queue on a first-in-first-out (FIFO) basis. Thus, if we
assume that the queue grows in the direction of increasing addresses in the memory,
which is a common practice, new data are added at the back (high-address end) and
retrieved from the front (low-address end) of the queue.
There are two important differences between how a stack and a queue are
implemented. One end of the stack is fixed (the bottom), while the other end rises and
falls as data are pushed and popped. A single pointer is needed to point to the top of the
stack at any given time. On the other hand, both ends of a queue move to higher
addresses as data are added at the back and removed from the front. So two pointers are
needed to keep track of the two ends of the queue.
Another difference between a stack and a queue is that, without further control, a
queue would continuously move through the memory of a computer in the direction of
higher addresses. One way to limit the queue to a fixed region in memory is to use a
circular buffer. Let us assume that memory addresses from BEGINNING to END are
assigned to the queue. The first entry in the queue is entered into location BEGINNING,
and successive entries are appended to the queue by entering them at successively higher
addresses. By the time the back of the queue reaches END, space will have been created
at the beginning if some items have been removed from the queue. Hence, the back
pointer is reset to the value BEGINNING and the process continues. As in the case of a
stack, care must be taken to detect when the region assigned to the data structure is either
completely full or completely empty.
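A minimal sketch of such a circular queue in C (array size and names are illustrative; the element count is tracked explicitly, which is exactly how the full/empty ambiguity noted above is resolved):

    #include <stdio.h>

    #define QSIZE 8                  /* locations BEGINNING..END */

    static int queue[QSIZE];
    static int front = 0;            /* index of the oldest item      */
    static int back  = 0;            /* index of the next free slot   */
    static int count = 0;            /* distinguishes full from empty */

    static int enqueue(int v)
    {
        if (count == QSIZE) return -1;   /* queue full */
        queue[back] = v;
        back = (back + 1) % QSIZE;       /* wrap from END to BEGINNING */
        count++;
        return 0;
    }

    static int dequeue(int *v)
    {
        if (count == 0) return -1;       /* queue empty */
        *v = queue[front];
        front = (front + 1) % QSIZE;     /* wrap at the end */
        count--;
        return 0;
    }

    int main(void)
    {
        for (int i = 1; i <= 10; i++)
            if (enqueue(i) < 0) printf("full at %d\n", i);  /* 9 and 10 */
        int v;
        while (dequeue(&v) == 0)
            printf("%d ", v);            /* 1 2 3 4 5 6 7 8 */
        printf("\n");
        return 0;
    }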
Subroutines
In a given program, it is often necessary to perform a particular subtask many
times on different data-values. Such a subtask is usually called a subroutine. For example,
a subroutine may evaluate the sine function or sort a list of values into increasing or
decreasing order.
After a subroutine has been executed, the calling program must resume
execution, continuing immediately after the instruction that called the subroutine. The
subroutine is said to return to the program that called it by executing a Return instruction.
The way in which a computer makes it possible to call and return from
subroutines is referred to as its subroutine linkage method. The simplest subroutine
linkage method is to save the return address in a specific location, which may be a
register dedicated to this function. Such a register is called the link register. When the
subroutine completes its task, the Return instruction returns to the calling program by
branching indirectly through the link register.
The Call instruction is just a special branch instruction that performs the following operations:
• Store the contents of the PC in the link register
• Branch to the target address specified by the Call instruction
The Return instruction is a special branch instruction that performs the operation:
• Branch to the address contained in the link register
[Figure: Subroutine linkage using a link register. The calling program executes Call SUB at memory location 200; the next instruction is at 204. The subroutine SUB begins at location 1000 and ends with a Return instruction. The Call loads 1000 into the PC and saves 204 in the link register; the Return branches through the link register, restoring 204 to the PC.]
Subroutine nesting can be carried out to any depth. Eventually, the last
subroutine called completes its computations and returns to the subroutine that called it.
The return address needed for this first return is the last one generated in the nested call
sequence. That is, return addresses are generated and used in a last-in-first-out order. This
suggests that the return addresses associated with subroutine calls should be pushed onto
a stack. A particular register is designated as the stack pointer, SP, to be used in this
operation. The stack pointer points to a stack called the processor stack. The Call
instruction pushes the contents of the PC onto the processor stack and loads the
subroutine address into the PC. The Return instruction pops the return address from the
processor stack into the PC.
PARAMETER PASSING:-
When calling a subroutine, a program must provide to the subroutine the
parameters, that is, the operands or their addresses, to be used in the computation. Later,
the subroutine returns other parameters, in this case, the results of the computation. This
exchange of information between a calling program and a subroutine is referred to as
parameter passing. Parameter passing may be accomplished in several ways. The
parameters may be placed in registers or in memory locations, where they can be
accessed by the subroutine. Alternatively, the parameters may be placed on the processor
stack used for saving the return address.
The purpose of the subroutines is to add a list of numbers. Instead of passing the
actual list entries, the calling program passes the address of the first number in the list.
This technique is called passing by reference. The second parameter is passed by value,
that is, the actual number of entries, n, is passed to the subroutine.
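The same distinction exists in C: passing a pointer to the first element is passing by reference, while passing n itself is passing by value. A sketch of the list-adding subroutine:

    #include <stdio.h>

    /* 'list' is passed by reference (the address of the first number);
       'n' is passed by value (the actual count). */
    long listadd(const int *list, int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++)
            sum += list[i];
        return sum;                 /* result returned to the caller */
    }

    int main(void)
    {
        int num[] = { 10, 20, 30 };
        printf("%ld\n", listadd(num, 3));   /* 60 */
        return 0;
    }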
[Figure b: A subroutine stack frame built on top of the old top-of-stack (TOS) element.]
The pointers SP and FP are manipulated as the stack frame is built, used, and dismantled for a particular invocation of the subroutine. We begin by assuming that SP points to the old top-of-stack (TOS) element in fig b. Before the subroutine is called, the calling program pushes the four parameters onto the stack. The Call instruction is then executed, resulting in the return address being pushed onto the stack. Now, SP points to this return address, and the first instruction of the subroutine is about to be executed. This is the point at which the frame pointer FP is set to contain the proper memory address. Since FP is usually a general-purpose register, it may contain information of use to the calling program. Therefore, its contents are saved by pushing them onto the stack. Since SP now points to this position, its contents are copied into FP.
After these instructions are executed, both SP and FP point to the saved FP contents. Space for three 4-byte local variables is then allocated on the stack frame by executing the instruction
Subtract #12, SP
Finally, the contents of processor registers R0 and R1 are saved by pushing them
onto the stack. At this point, the stack frame has been set up as shown in the fig.
The subroutine now executes its task. When the task is completed, the subroutine
pops the saved values of R1 and R0 back into those registers, removes the local variables
from the stack frame by executing the instruction.
Add #12, SP
And pops the saved old value of FP back into FP. At this point, SP points to the
return address, so the Return instruction can be executed, transferring control back to the
calling program.
Logic instructions
Logic operations such as AND, OR, and NOT, applied to individual bits, are the basic building blocks of digital circuits, as described. It is also useful to be able to perform logic operations in software, which is done using instructions that apply these operations to all bits of a word or byte independently and in parallel. For example, the instruction
Not dst
complements all bits contained in the destination operand, changing 0s to 1s and 1s to 0s.
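C exposes the same word-wide logic operations directly. A short sketch (the values are arbitrary examples):

    #include <stdio.h>

    int main(void)
    {
        unsigned x = 0x0F0Fu;
        unsigned y = 0x00FFu;

        printf("%04X\n", x & y);    /* AND: 000F - clears bits      */
        printf("%04X\n", x | y);    /* OR:  0FFF - sets bits        */
        printf("%08X\n", ~x);       /* NOT: complements all bits    */

        /* Typical use: AND with a mask to isolate a byte. */
        unsigned ch = 0x4142u & 0x00FFu;   /* low byte: 0x42 = 'B' */
        printf("%c\n", (char)ch);
        return 0;
    }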
Logical shifts:-
Two logical shift instructions are needed, one for shifting left (LShiftL) and
another for shifting right (LShiftR). These instructions shift an operand over a number of
bit positions specified in a count operand contained in the instruction. The general form
of a logical left shift instruction is
LShiftL count, dst
[Figure: LShiftL #2, R0 — before: R0 = 0 0 1 1 1 0 … 0 1 1; after: R0 = 1 1 1 0 … 0 1 1 0 0. Vacated low-order positions are filled with 0s, and the last bit shifted out is retained in the carry flag C.]
[Figure: LShiftR #2, R0 — before: R0 = 0 1 1 1 0 … 0 1 1 0; after: R0 = 0 0 0 1 1 1 0 … 0, with the last bit shifted out retained in C. Vacated high-order positions are filled with 0s.
AShiftR #2, R0 (arithmetic shift right) — before: R0 = 1 0 0 1 1 … 0 1 0 0; after: R0 = 1 1 1 0 0 1 1 … 0 1. The sign bit is replicated into the vacated high-order positions, preserving the sign of a 2's-complement operand.]
Rotate Operations:-
In the shift operations, the bits shifted out of the operand are lost, except for the
last bit shifted out which is retained in the Carry flag C. To preserve all bits, a set of
rotate instructions can be used. They move the bits that are shifted out of one end of the
operand back into the other end. Two versions of both the left and right rotate instructions
are usually provided. In one version, the bits of the operand are simply rotated. In the
other version, the rotation includes the C flag.
[Figure: Rotate operations, each rotating R0 by 2 bit positions —
RotateL #2, R0: rotate left without carry; bits leaving the high end re-enter at the low end, and the last bit moved is also copied into C.
RotateLC #2, R0: rotate left with carry; the C flag participates as an extra bit position in the rotation.
RotateR #2, R0: rotate right without carry.
RotateRC #2, R0: rotate right with carry.]
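C has shift operators but no rotate operator, so a rotate is usually composed from two shifts and an OR. A sketch for 32-bit rotates (without the carry flag):

    #include <stdio.h>
    #include <stdint.h>

    /* Rotate left: bits shifted out of the high end re-enter at the low end. */
    uint32_t rotl32(uint32_t x, unsigned k)
    {
        k &= 31;                               /* keep the count in 0..31 */
        return (x << k) | (x >> ((32 - k) & 31));
    }

    uint32_t rotr32(uint32_t x, unsigned k)
    {
        k &= 31;
        return (x >> k) | (x << ((32 - k) & 31));
    }

    int main(void)
    {
        uint32_t r0 = 0xC0000003u;             /* high and low bits set */
        printf("%08X\n", rotl32(r0, 2));       /* 0000000F */
        printf("%08X\n", rotr32(r0, 2));       /* F0000000 */
        return 0;
    }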
ENCODING OF MACHINE INSTRUCTIONS:-
So far we have used assembly language instructions, which are converted into machine instructions by the assembler program.
We have seen instructions that perform operations such as add, subtract, move, shift, rotate, and branch. These instructions may use operands of different sizes, such as 32-bit and 8-bit numbers or 8-bit ASCII-encoded characters. The type of operation to be performed and the type of operands used may be specified using an encoded binary pattern referred to as the OP code for the given instruction. Suppose that 8 bits are allocated for this purpose, giving 256 possibilities for specifying different instructions. This leaves 24 bits to specify the rest of the required information.
The instruction
Add R1, R2
has to specify the registers R1 and R2, in addition to the OP code. If the processor has 16 registers, then four bits are needed to identify each register. Additional bits are needed to indicate that the Register addressing mode is used for each operand.
The instruction
Move 24(R0), R5
requires 16 bits to denote the OP code and the two registers, and some bits to express that the source operand uses the Index addressing mode and that the index value is 24.
The shift instruction
LShiftR #2, R0
has to indicate the immediate value 2, in addition to the OP code, the addressing mode, and the register.
Consider next the branch instruction
Branch>0 LOOP
Again, 8 bits are used for the OP code, leaving 24 bits to specify the branch offset. Since the offset is a 2's-complement number, the branch target address must be within 2^23 bytes of the location of the branch instruction. To branch to an instruction
outside this range, a different addressing mode has to be used, such as Absolute or
Register Indirect. Branch instructions that use these modes are usually called Jump
instructions.
In all these examples, the instructions can be encoded in a 32-bit word. The figure depicts a possible format. There is an 8-bit Op-code field and two 7-bit fields for specifying the
source and destination operands. The 7-bit field identifies the addressing mode and the
register involved (if any). The “Other info” field allows us to specify the additional
information that may be needed, such as an index value or an immediate operand.
But what happens if we want to specify a memory operand using the Absolute addressing mode? The instruction
Move R2, LOC
requires 18 bits to denote the OP code, the addressing modes, and the register. This leaves 14 bits to express the address that corresponds to LOC, which is clearly insufficient.
One way to handle such operands is to use a second instruction word, as in
And #$FF000000, R2
in which case the second word gives the full 32-bit immediate operand.
If an instruction is allowed to specify two operands in the Absolute addressing mode, it becomes necessary to use two additional words for the 32-bit addresses of the operands.
The restriction that an instruction must occupy only one word has led to a style of
computers that have become known as reduced instruction set computers (RISC). The
RISC approach introduced other restrictions, such as that all manipulation of data must be
done on operands that are already in processor registers. This restriction means that the
above addition would need a two-instruction sequence
Move (R3), R1
Add R1, R2
If the Add instruction only has to specify the two registers, it will need just a portion of a 32-bit word. So, we may provide a more powerful instruction that uses three operands
Add R1, R2, R3
which performs the operation
R3 ← [R1] + [R2]
UNIT-2
Processing Unit:-
Register Transfers
Computer registers are designated by capital letters (sometimes followed by numerals) to
denote the function of the register.
For example, the register that holds an address for the memory unit is usually called a
memory address register and is designated by the name MAR.
Other designations for registers are PC (for program counter), IR (for instruction register,
and R1 (for processor register).
The individual flip-flops in an n-bit register are numbered in sequence from 0 through n −
1, starting from 0 in the rightmost position and increasing the numbers toward the left.
Figure 2-1 shows the representation of registers in block diagram form.
The most common way to represent a register is by a rectangular box with the name of
the register inside, as in Fig. 2-1(a).
The individual bits can be distinguished as in (b).
The numbering of bits in a 16-bit register can be marked on top of the box as shown in
(c).
A 16-bit register is partitioned into two parts in (d).
Bits 0 through 7 are assigned the symbol L (for low byte) and bits 8 through 15 are
assigned the symbol H (for high byte).
The name of the 16-bit register is PC. The symbol PC (0−7) or PC(L) refers to the low-
order byte and PC(8−15) or PC(H) to the high-order byte.
Information transfer from one register to another is designated in symbolic form by means of a replacement operator. The statement
R2 ← R1
denotes a transfer of the content of register R1 into register R2.
It designates a replacement of the content of R2 by the content of R1. By definition, the
content of the source register R1 does not change after the transfer.
A statement that specifies a register transfer implies that circuits are available from the
outputs of the source register to the inputs of the destination register and that the
destination register has a parallel load capability. Normally, we want the transfer to
occur only under a predetermined control condition. This can be shown by means of an
if-then statement.
P: R2 ← R1
The control condition is terminated with a colon. It symbolizes the requirement that the
transfer operation be executed by the hardware only if P = 1.
Every statement written in a register transfer notation implies a hardware construction for
implementing the transfer. Figure 2.2 shows the block diagram that depicts the transfer
from R1 to R2.
The letter n will be used to indicate any number of bits for the register. It will be replaced
by an actual number when the length of the register is known. Register R2 has a load
input that is activated by the control variable P.
It is assumed that the control variable is synchronized with the same clock as the one
applied to the register.
As shown in the timing diagram, P is activated in the control section by the rising edge of
a clock pulse at time t.
The next positive transition of the clock at time t + 1 finds the load input active and the
data inputs of R2 are then loaded into the register in parallel.
P may go back to 0 at time t + 1; otherwise, the transfer will occur with every clock pulse
transition while P remains active.
Note that the clock is not included as a variable in the register transfer statements. It is
assumed that all transfers occur during a clock edge transition.
Even though the control condition such as P becomes active just after time t, the actual transfer does not occur until the register is triggered by the next positive transition of the clock at time t + 1.
The basic symbols of the register transfer notation are listed in Table 2-1. Registers are
denoted by capital letters, and numbers may follow the letters.
Parentheses are used to denote a part of a register by specifying the range of bits or by
giving a symbol name to a portion of a register. The arrow denotes a transfer of
information and the direction of transfer. A comma is used to separate two or more
operations that are executed at the same time.
T: R2 ← R1, R1 ←R2
The statement denotes an operation that exchanges the contents of two registers during
one common clock pulse provided that T = 1. This simultaneous operation is possible
with registers that have edge-triggered flip-flops.
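A quick way to see why edge-triggered flip-flops make the simultaneous exchange work: every register samples its input from the old state, and all registers update together at the clock edge. A C sketch of that two-phase behavior (the struct and names are illustrative):

    #include <stdio.h>

    /* Registers sample inputs from the current state, then all update
       together - modeling edge-triggered flip-flops on a common clock. */
    struct state { int r1, r2; };

    struct state clock_edge(struct state cur, int T)
    {
        struct state next = cur;   /* default: registers hold their values */
        if (T) {                   /* T: R2 <- R1, R1 <- R2 */
            next.r2 = cur.r1;      /* both transfers read the OLD contents */
            next.r1 = cur.r2;
        }
        return next;               /* all flip-flops load at the edge */
    }

    int main(void)
    {
        struct state s = { 5, 9 };
        s = clock_edge(s, 1);      /* one clock pulse with T = 1 */
        printf("R1=%d R2=%d\n", s.r1, s.r2);   /* R1=9 R2=5: exchanged */
        return 0;
    }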
HARDWIRED CONTROL:
To execute instructions, the processor must have some means of generating the control
signals needed in the proper sequence.
Computer designers use a wide variety of techniques to solve this problem. The
approaches used fall into one of two categories:
hardwired control and microprogrammed control.
Consider the sequence of control signals given in Figure 7. Each step in this sequence is completed in one clock period.
Fig 7
A counter may be used to keep track of the control steps, as shown in Figure 11. Each
state, or count, of this counter corresponds to one control step. The required control
signals are determined by the following information:
1. Contents of the control step counter
2. Contents of the instruction register
3. Contents of the condition code flags
4. External input signals, such as MFC and interrupt requests
Fig 11
To gain insight into the structure of the control unit, we start with a simplified view of the
hardware involved.
The decoder/encoder block in Figure 11 is a combinational circuit that generates the
required control outputs, depending on the state of all its inputs. By separating the
decoding and encoding functions, we obtain the more detailed block diagram in Figure
12.
The step decoder provides a separate signal line for each step, or time slot, in the control
sequence.
Similarly, the output of the instruction decoder consists of a separate line for each
machine instruction.
For any instruction loaded in the IR, one of the output lines INS1 through INSm is set to
1, and all other lines are set to 0.
The input signals to the encoder block in Figure 12 are combined to generate the individual control signals Yin, PCout, Add, End, and so on.
An example of how the encoder generates the Zin control signal for the processor
organization in Figure 2 is given in Figure 13. This circuit implements the logic function
Zin = T1 + T6 • ADD + T4 • BR + • • •
This signal is asserted during time slot T1 for all instructions, during T6 for an Add
instruction, during T4 for an unconditional branch instruction, and so on. The logic
function for Zin is derived from the control sequences in Figures 7 and 8.
As another example, Figure 14 gives a circuit that generates the End control signal from
the logic
function
End = T7 • ADD + T5 • BR + (T5 • N + T4 • N̄) • BRN + • • •
The End signal starts a new instruction fetch cycle by resetting the control step counter to
its starting value. Figure 12 contains another control signal called RUN. When
set to 1, RUN causes the counter to be incremented by one at the end of every clock cycle. When
RUN is equal to 0, the counter stops counting. This is needed whenever the WMFC signal is
issued, to cause the processor to wait for the reply from the memory.
Fig 13a
The control hardware shown can be viewed as a state machine that changes from one
state to another in every clock cycle, depending on the contents of the instruction register,
the condition codes, and the external inputs.
The outputs of the state machine are the control signals. The sequence of operations
carried out by this machine is determined by the wiring of the logic elements, hence the
name "hardwired." A controller that uses this approach can operate at high speed.
However, it has little flexibility, and the complexity of the instruction set it can
implement is limited.
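To make the encoder structure concrete, the sketch below expresses the Zin and End functions in C as pure combinational logic. The signal names follow the equations above, but packing them into a struct (and the array of step signals) is purely an illustrative choice, not part of the original hardware.

#include <stdbool.h>

/* Inputs to the encoder block: step-decoder outputs T1..T7,
 * instruction-decoder lines, and the condition-code flag N. */
typedef struct {
    bool T[8];          /* T[1]..T[7]; index 0 unused */
    bool ADD, BR, BRN;  /* one line per decoded instruction */
    bool N;             /* condition-code flag */
} ControlInputs;

/* Zin = T1 + T6.ADD + T4.BR + ... */
bool zin(const ControlInputs *in) {
    return in->T[1] || (in->T[6] && in->ADD) || (in->T[4] && in->BR);
}

/* End = T7.ADD + T5.BR + (T5.N + T4.N').BRN + ... */
bool end_signal(const ControlInputs *in) {
    return (in->T[7] && in->ADD) ||
           (in->T[5] && in->BR)  ||
           (((in->T[5] && in->N) || (in->T[4] && !in->N)) && in->BRN);
}

Evaluating these functions once per control step mirrors what the combinational encoder does in every clock cycle.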
Fig 13b
o In the meantime, the CPU uses the control lines of the memory bus to indicate that a read
operation is needed.
o After issuing this request, the CPU waits until it receives an answer from the memory,
informing it that the required function has been completed. This is accomplished through the
use of another control signal on the memory bus, which will be denoted as Memory
Function Completed (MFC).
o The memory sets this signal to 1 to indicate that the contents of the particular location
in the memory have been read and are available on the data lines of the memory bus.
o We will suppose that as soon as the MFC signal is set to 1, the information on the data
lines is loaded into MDR and is therefore available for use inside the CPU. This completes
the memory fetch operation.
MAR ← [R1]
o R1out, MARin, Read
o MDRinE, WMFC
o MDRout, R2in
MICROPROGRAMMED CONTROL
• Control-signals are generated by a program similar to machine language programs.
• Control word (CW) is a word whose individual bits represent various control-signals (like
Add, End, Zin). {Each of the control-steps in the control sequence of an instruction defines a
unique combination of 1s & 0s in the CW}.
• A sequence of CWs corresponding to the control-sequence of a machine instruction constitutes
the microroutine.
• Individual control-words in a microroutine are referred to as microinstructions.
• The microroutines for all instructions in the instruction-set of a computer are stored in a
special memory called the control store (CS).
• Control-unit generates control-signals for any instruction by sequentially reading CWs of the
corresponding microroutine from CS.
• Microprogram counter(µPC) is used to read CWs sequentially from CS.
• Every time a new instruction is loaded into IR, output of "starting address generator" is loaded
into µPC.
• Then, µPC is automatically incremented by the clock, causing successive microinstructions to be
read from CS. Hence, control-signals are delivered to various parts of the processor in the correct
sequence.
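The C sketch below illustrates this mechanism. The bit assignments, control-store size and the reset-to-fetch behaviour of the End bit are hypothetical choices made for the example; the point is only that each CW is a bit pattern read out under µPC control.

#include <stdint.h>

/* Hypothetical bit positions: one control signal per bit of the CW. */
enum { PC_OUT = 1u << 0, MAR_IN = 1u << 1, READ    = 1u << 2,
       MDR_OUT = 1u << 3, Z_IN   = 1u << 4, END_BIT = 1u << 5 };

#define CS_SIZE 256
static uint32_t control_store[CS_SIZE]; /* the microroutines live here */
static uint32_t upc;                    /* microprogram counter */

/* One clock cycle of the control unit: read a CW and advance the uPC. */
uint32_t next_control_word(void) {
    uint32_t cw = control_store[upc++]; /* uPC auto-increments */
    if (cw & END_BIT)
        upc = 0;  /* return to the instruction-fetch microroutine */
    return cw;    /* each set bit drives one control line this cycle */
}

When a new instruction enters the IR, the starting address produced by the "starting address generator" would simply be written into upc before the next cycle.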
MICROINSTRUCTIONS
MICROPROGRAM SEQUENCING
• Two major disadvantages of microprogrammed control are:
1) Having a separate microroutine for each machine instruction results in a
large total number of microinstructions and a large control-store.
2) Execution time is longer because it takes more time to carry out the required
branches.
Consider the instruction Add src,Rdst; which adds the source-operand to the contents of
Rdst and places the sum in Rdst.
The source-operand can be specified in the following addressing modes:
register, autoincrement, autodecrement and indexed, as well as the indirect forms of these 4
modes.
Each box in the chart corresponds to a microinstruction that controls the transfers and
operations indicated within the box.
The microinstruction is located at the address indicated by the octal number (001, 002).
WIDE BRANCH ADDRESSING
• The instruction-decoder (InstDec) generates the starting-address of the microroutine
that implements the instruction that has just been loaded into the IR.
• Here, register IR contains the Add instruction, for which the instruction decoder generates
the microinstruction address 101. (However, this address cannot be loaded as is into the
μPC).
The source-operand can be specified in any of several addressing-modes. The bit-ORing
technique can be used to modify the starting-address generated by the instruction-decoder to reach
the appropriate path.
Use of WMFC
• WMFC signal is issued at location 112 which causes a branch to the microinstruction in
location 171.
• WMFC signal means that the microinstruction may take several clock cycles to complete.
If the branch is allowed to happen in the first clock cycle, the microinstruction at location
171 would be fetched and executed prematurely. To avoid this problem, WMFC signal
must inhibit any change in the contents of the μPC during the waiting-period.
Detailed Examination
• Consider Add (Rsrc)+,Rdst; which adds the Rsrc content to the Rdst content, then stores the
sum in Rdst and finally increments Rsrc by 4 (i.e. auto-increment mode).
• In bits 10 and 9, the bit-patterns 11, 10, 01 and 00 denote the indexed, auto-decrement, auto-
increment and register modes respectively. For each of these modes, bit 8 is used to
specify the indirect version.
• The processor has 16 registers that can be used for addressing purposes; each specified using a
4-bit-code.
• There are 2 stages of decoding:
1) The microinstruction field must be decoded to determine that an Rsrc or Rdst register
is involved.
2) The decoded output is then used to gate the contents of the Rsrc or Rdst fields in
the IR into a second decoder, which produces the gating-signals for the actual
registers R0 to R15.
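As a sketch of the first decoding stage, the helpers below extract the mode, indirect and register fields from a 16-bit IR value in C. Bits 10-9 and bit 8 follow the description above; placing the Rsrc number in bits 3-0 is an assumption made only for this example.

#include <stdint.h>

enum Mode { MODE_REGISTER = 0,  /* bit pattern 00 */
            MODE_AUTOINC  = 1,  /* bit pattern 01 */
            MODE_AUTODEC  = 2,  /* bit pattern 10 */
            MODE_INDEXED  = 3 };/* bit pattern 11 */

enum Mode src_mode(uint16_t ir)     { return (enum Mode)((ir >> 9) & 0x3); }
int       src_indirect(uint16_t ir) { return (ir >> 8) & 0x1; }
int       src_register(uint16_t ir) { return ir & 0xF; } /* R0..R15 */

The second decoding stage would then use the extracted register number to produce the gating signals for the selected register.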
MICROINSTRUCTIONS WITH NEXT-ADDRESS FIELDS
PREFETCHING MICROINSTRUCTIONS
• Drawback of microprogrammed control: Slower operating speed because of the time it
takes to fetch microinstructions from the control-store.
• Solution: Faster operation is achieved if the next microinstruction is pre-fetched while
the current one is being executed.
Emulation
• The main function of microprogrammed control is to provide a means for simple,
flexible and relatively inexpensive execution of machine instructions.
• Its flexibility in using a machine's resources allows diverse classes of instructions to be
implemented.
• Suppose we add to the instruction set of a given computer M1 an entirely new set
of instructions that is in fact the instruction set of a different computer M2.
• Programs written in the machine language of M2 can then be run on computer M1, i.e. M1
emulates M2.
• Emulation allows us to replace obsolete equipment with more up-to-date machines.
• If the replacement computer fully emulates the original one, then no software changes
have to be made to run existing programs.
• Emulation is easiest when the machines involved have similar architectures.
UNIT-III PROCESS MANAGEMENT
1. Process Concept
The Process
A process is a program in execution. A process is more than the program code, which is
sometimes known as the text section. It also includes the current activity, as represented by the value of
the program counter and the contents of the processor's registers. A process generally also includes the
process stack, which contains temporary data (such as function parameters, return addresses, and local
variables), and a data section, which contains global variables. A process may also include a heap,
which is memory that is dynamically allocated during process run time. The structure of a process in
memory is shown in Figure.
A program is a passive entity, such as a file containing a list of instructions stored on disk (often
called an executable file), whereas a process is an active entity, with a program counter specifying the
next instruction to execute and a set of associated resources. A program becomes a process when an
executable file is loaded into memory.
Process State
As a process executes, it changes state. The state of a process is defined in part by the current
activity of that process. Each process may be in one of the following states:
• New. The process is being created.
• Running. Instructions are being executed.
• Waiting. The process is waiting for some event to occur (such as an I/O completion or reception of a
signal).
• Ready. The process is waiting to be assigned to a processor.
• Terminated. The process has finished execution.
It is important to realize that only one process can be running on any processor at any instant.
Many processes may be ready and waiting, however. The state diagram corresponding to these states
is presented in Figure.
Process Control Block
Each process is represented in the operating system by a process control block(PCB)—also
called a task control block. A PCB is shown in Figure . It contains many pieces of information associated
with a specific process, including these:
• Process state. The state may be new, ready, running, waiting, halted, and so on.
• Program counter. The counter indicates the address of the next instruction to be executed for this
process.
• CPU registers. The registers vary in number and type, depending on the computer architecture. They
include accumulators, index registers, stack pointers, and general-purpose registers, plus any condition-
code information. Along with the program counter, this state information must be saved when an
interrupt occurs, to allow the process to be continued correctly afterward.
• CPU-scheduling information. This information includes a process priority, pointers to scheduling
queues, and any other scheduling parameters.
• Memory-management information. This information may include such information as the value of
the base and limit registers, the page tables, or the segment tables, depending on the memory system used
by the operating system.
• Accounting information. This information includes the amount of CPU and real time used,
time limits, account numbers, job or process numbers, and so on.
• I/O status information. This information includes the list of I/O devices allocated to the process, a list
of open files, and so on. In brief, the PCB simply serves as the repository for any information that may
vary from process to process.
If all processes are I/O bound, the ready queue will almost always be empty, and the short-term
scheduler will have little to do. If all processes are CPU bound, the I/O waiting queue will almost always
be empty, devices will go unused, and again the system will be unbalanced. The system with the best
performance will thus have a combination of CPU-bound and I/O-bound processes.
Some operating systems, such as time-sharing systems, may introduce an additional, intermediate
level of scheduling. This medium-term scheduler is diagrammed in Figure. The key idea behind a
medium-term scheduler is that sometimes it can be advantageous to remove processes from memory
(and from active contention for the CPU) and thus reduce the degree of multiprogramming. Later, the
process can be reintroduced into memory, and its execution can be continued where it left off. This
scheme is called swapping. The process is swapped out, and is later swapped in, by the medium-term
scheduler. Swapping may be necessary to improve the process mix or because a change in memory
requirements has overcommitted available memory, requiring memory to be freed up.
Context Switch
Interrupts cause the operating system to change a CPU from its current task and to run a kernel
routine. Such operations happen frequently on general-purpose systems. When an interrupt occurs, the
system needs to save the current context of the process currently running on the CPU so that it can restore
that context when its processing is done, essentially suspending the process and then resuming it.
The context is represented in the PCB of the process; it includes the value of the CPU registers, the
process state (see Figure), and memory-management information. Generically, we perform a state save of
the current state of the CPU, be it in kernel or user mode, and then a state restore to resume operations.
Switching the CPU to another process requires performing a state save of the current process and
a state restore of a different process. This task is known as a context switch. When a context switch
occurs, the kernel saves the context of the old process in its PCB and loads the saved context of the new
process scheduled to run. Context-switch time is pure overhead, because the system does no useful work
while switching.
3. Operations on Processes
The processes in most systems can execute concurrently, and they may be created and deleted
dynamically. Thus, these systems must provide a mechanism for process creation and termination.
Process Creation
A process may create several new processes, via a create-process system call, during the course of
execution. The creating process is called a parent process, and the new processes are called the children
of that process. Each of these
new processes may in turn create other processes, forming a tree of processes.
Most operating systems identify processes according to a unique process identifier (or pid),
which is typically an integer number. Figure illustrates a typical process tree for the Solaris operating
system, showing the name of each process and its pid. In Solaris, the process at the top of the tree is the
sched process, with a pid of 0. The sched process creates several children processes, including pageout
and fsflush. These processes are responsible for managing memory and file systems. The sched process
also creates the init process, which serves as the root parent process for all user processes.
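A minimal, runnable illustration of process creation with the POSIX fork() call (general UNIX, not specific to Solaris): fork() returns the child's pid to the parent and 0 to the child.

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();              /* create a child process */
    if (pid < 0) {
        perror("fork failed");
        return 1;
    } else if (pid == 0) {
        printf("child:  pid=%d\n", getpid());
    } else {
        wait(NULL);                  /* parent waits for the child */
        printf("parent: pid=%d, child was %d\n", getpid(), pid);
    }
    return 0;
}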
Both of the models just discussed are common in operating systems, and many systems
implement both. Message passing is useful for exchanging smaller amounts of data, because no conflicts
need be avoided. Message passing is also easier to implement than is shared memory for intercomputer
communication. Shared memory allows maximum speed and convenience of communication, as it can be
done at memory speeds when within a computer.
Shared memory is faster than message passing, as message-passing systems are typically
implemented using system calls and thus require the more time-consuming task of kernel intervention.
In contrast, in shared-memory systems, system calls are required only to establish shared-memory
regions. Once shared memory is established, all accesses are treated as routine memory accesses, and
no assistance from the kernel is required.
Shared-Memory Systems
Interprocess communication using shared memory requires communicating processes to establish
a region of shared memory. Typically, a shared-memory region resides in the address space of the process
creating the shared-memory segment. Other processes that wish to communicate using this shared-memory
segment must attach it to their address space. They can then exchange information by reading and writing
data in the shared areas. The form of the data and the location are determined by these processes and are
not under the operating system's control. The processes are also responsible for ensuring that they are not
writing to the same location simultaneously.
To illustrate the concept of cooperating processes, let's consider the producer-consumer problem,
which is a common paradigm for cooperating processes. A producer process produces information that is
consumed by a consumer process.
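The classic bounded-buffer sketch below shows the kind of structure the two processes might agree to place in the shared region. The names are illustrative, the busy-waiting is a deliberate simplification, and this version can hold at most BUFFER_SIZE − 1 items at once.

#define BUFFER_SIZE 10

typedef struct {
    int items[BUFFER_SIZE];
    int in;   /* next free slot (advanced by the producer) */
    int out;  /* next full slot (advanced by the consumer) */
} SharedBuffer;

/* Producer: spin while the buffer is full, then append an item. */
void produce(SharedBuffer *b, int item) {
    while (((b->in + 1) % BUFFER_SIZE) == b->out)
        ;  /* buffer full */
    b->items[b->in] = item;
    b->in = (b->in + 1) % BUFFER_SIZE;
}

/* Consumer: spin while the buffer is empty, then remove an item. */
int consume(SharedBuffer *b) {
    while (b->in == b->out)
        ;  /* buffer empty */
    int item = b->items[b->out];
    b->out = (b->out + 1) % BUFFER_SIZE;
    return item;
}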
Message-Passing Systems
Message passing provides a mechanism to allow processes to communicate and to synchronize
their actions without sharing the same address space and is particularly useful in a distributed
environment, where the communicating
processes may reside on different computers connected by a network.
A message-passing facility provides at least two operations: send(message) and receive(message).
Messages sent by a process can be of either fixed or variable size. If only fixed-sized messages can be
sent, the system-level implementation is straightforward. This restriction, however, makes the task of
programming more difficult. Conversely, variable-sized messages require a more complex system-level
implementation, but the programming task becomes simpler. This is a common kind of tradeoff seen
throughout operating system design.
If processes P and Q want to communicate, they must send messages to and receive messages
from each other; a communication link must exist between them. This link can be implemented in a
variety of ways. Here are several methods for logically implementing a link and the send()/receive()
operations:
• Direct or indirect communication
• Synchronous or asynchronous communication
• Automatic or explicit buffering
We look at issues related to each of these features next.
Naming
Processes that want to communicate must have a way to refer to each other. They can use either
direct or indirect communication. Under direct communication, each process that wants to
communicate must explicitly name the recipient or sender of the communication. In this scheme,
the send() and receive() primitives are defined as:
• send(P, message)—Send a message to process P.
• receive (Q, message)—Receive a message from process Q.
• A link is established automatically between every pair of processes that want to communicate. The
processes need to know only each other's identity to communicate.
• A link is associated with exactly two processes.
• Between each pair of processes, there exists exactly one link.
This scheme exhibits symmetry in addressing; that is, both the sender process and the receiver process
must name the other to communicate. A variant of this scheme employs asymmetry in addressing. Here,
only the sender names the recipient; the recipient is not required to name the sender. In this scheme, the
send() and receive() primitives are defined as follows:
• send(P, message)—Send a message to process P.
• receive(id, message)—Receive a message from any process; the variable id is set to the name of the
process with which communication has taken place.
With indirect communication, the messages are sent to and received from mailboxes, or ports. A
mailbox can be viewed abstractly as an object into which messages can be placed by processes and from
which messages can be removed.
Each mailbox has a unique identification.
Two processes can communicate only if the processes have a shared mailbox, however. The send() and
receive() primitives are defined as follows:
• send(A, message)—Send a message to mailbox A.
• receive(A, message)—Receive a message from mailbox A.
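As a concrete mailbox example, the sketch below uses POSIX message queues (on Linux, link with -lrt). The mailbox name "/mbox_A" and the queue sizes are arbitrary choices for illustration.

#include <fcntl.h>
#include <mqueue.h>
#include <string.h>

int main(void) {
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 64 };
    mqd_t mq = mq_open("/mbox_A", O_CREAT | O_RDWR, 0600, &attr);

    const char *msg = "hello";
    mq_send(mq, msg, strlen(msg) + 1, 0);   /* send(A, message)    */

    char buf[64];
    mq_receive(mq, buf, sizeof buf, NULL);  /* receive(A, message) */

    mq_close(mq);
    mq_unlink("/mbox_A");
    return 0;
}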
The benefits of multithreaded programming can be broken down into four major categories:
1. Responsiveness. Multithreading an interactive application may allow a program to continue running
even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the
user. For instance, a multithreaded web browser could still allow user interaction in one thread while an
image was being loaded in another thread.
2. Resource sharing. By default, threads share the memory and the resources of the process to which
they belong. The benefit of sharing code and data is that it allows an application to have several different
threads of activity within the same address space.
3. Economy. Allocating memory and resources for process creation is costly. Because threads share
resources of the process to which they belong, it is more economical to create and context-switch threads.
Empirically gauging the difference in overhead can be difficult, but in general it is much more time
consuming to create and manage processes than threads.
4. Utilization of multiprocessor architectures. The benefits of multithreading can be greatly
increased in a multiprocessor architecture, where threads may be running in parallel on different
processors. A single-threaded process can only run on one CPU, no matter how many are available.
Multithreading on a multi-CPU machine increases concurrency.
Multithreading Models
Support for threads may be provided either at the user level, for user threads, or by the kernel, for kernel
threads. User threads are supported above the kernel and are managed without kernel support, whereas
kernel threads are supported and managed directly by the operating system.
Ultimately, there must exist a relationship between user threads and kernel threads. In this section,
we look at three common ways of establishing this relationship.
Many-to-One Model
The many-to-one model maps many user-level threads to one kernel thread. Thread management
is done by the thread library in user space, so it is efficient; but the entire process will block if a thread
makes a blocking system call. Also, because only one thread can access the kernel at a time, multiple
threads are unable to run in parallel on multiprocessors. Green threads—a thread library available for
Solaris—uses this model, as does GNU Portable Threads.
One-to-One Model
The one-to-one model maps each user thread to a kernel thread. It provides more concurrency than the
many-to-one model by allowing another thread to run when a thread makes a blocking system call; it also
allows multiple threads to run in parallel on multiprocessors. The only drawback to this model is that
creating a user thread requires creating the corresponding kernel thread. Because the overhead of creating
kernel threads can burden the performance of an application, most implementations of this model restrict
the number of threads supported by the system.
6. CPU Scheduling
CPU-scheduling decisions may take place under the following four circumstances:
1. When a process switches from the running state to the waiting state
2. When a process switches from the running state to the ready state
3. When a process switches from the waiting state to the ready state
4. When a process terminates
For situations 1 and 4, there is no choice in terms of scheduling. A new process (if one exists in
the ready queue) must be selected for execution. There is a choice, however, for situations 2 and 3.
When scheduling takes place only under circumstances 1 and 4, we say that the scheduling
scheme is nonpreemptive or cooperative; otherwise, it is preemptive. Under nonpreemptive scheduling,
once the CPU has been allocated to a process, the process keeps the CPU until it releases the CPU either
by terminating or by switching to the waiting state. This scheduling method was used by Microsoft
Windows 3.x; Windows 95 introduced preemptive scheduling, and all subsequent versions of Windows
operating systems have used preemptive scheduling.
Unfortunately, preemptive scheduling incurs a cost associated with access to shared data.
Consider the case of two processes that share data. While one is updating the data, it is preempted so that
the second process can run. The second process then tries to read the data, which are in an inconsistent
state.
Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The dispatcher is
the module that gives control of the CPU to the process selected by the short-term scheduler. This
function involves the following:
• Switching context
• Switching to user mode
• Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process switch. The
time it takes for the dispatcher to stop one process and start another running is known as the dispatch
latency.
Scheduling Criteria
Different CPU scheduling algorithms have different properties, and the choice of a particular
algorithm may favor one class of processes over another. Many criteria have been suggested for
comparing CPU scheduling algorithms. The criteria include the following:
• CPU utilization. We want to keep the CPU as busy as possible. Conceptually, CPU utilization can
range from 0 to 100 percent. In a real system, it should range from 40 percent (for a lightly loaded
system) to 90 percent.
• Throughput. One measure of CPU work is the number of processes that are completed per time
unit, called throughput. For long processes, this rate may be one process per hour; for short
transactions, it may be 10 processes per second.
• Turnaround time. The interval from the time of submission of a process to the time of completion is
the turnaround time; it is how long it takes to execute that process. Turnaround time is the sum of
the periods spent waiting to get into memory, waiting in the ready queue, executing on the CPU,
and doing I/O.
• Waiting time. The amount of time that a process spends waiting in the ready queue. Waiting time is the
sum of the periods spent waiting in the ready queue.
• Response time. The time from the submission of a request until the first response is produced. This
measure, called response time, is the time it takes to start responding, not the time it takes to output the
response. The turnaround time is generally limited by the speed of the output device.
It is desirable to maximize CPU utilization and throughput and to minimize turnaround time,
waiting time, and response time. In most cases, we optimize the average measure. However, under some
circumstances, it is desirable to optimize the minimum or maximum values rather than the average.
Scheduling Algorithms
CPU scheduling deals with the problem of deciding which of the processes in the ready queue is
to be allocated the CPU. There are many different CPU scheduling algorithms. In this section, we
describe several of them.
1. First-Come, First-Served Scheduling
The simplest CPU-scheduling algorithm is the first-come, first-served (FCFS) scheduling
algorithm. With this scheme, the process that requests the CPU first is allocated the CPU first. The
implementation of the FCFS policy is easily managed with a FIFO queue. When a process enters the
ready queue, its PCB is linked onto the tail of the queue. When the CPU is free, it is allocated to the
process at the head of the queue. The running process is then removed from the queue. The code for
FCFS scheduling is simple to write and understand. The average waiting time under the FCFS policy,
however, is often quite long. Consider the following set of processes that arrive at time 0, with the
length of the CPU burst given in milliseconds:
Process Burst Time
P1 24
P2 3
P3 3
If the processes arrive in the order P1, P2, P3, and are served in FCFS order,we get the result
shown in the following Gantt chart:
P1 P2 P3
0 24 27 30
The waiting time is 0 milliseconds for process P1, 24 milliseconds for process P2, and 27 milliseconds
for process P3. Thus, the average waiting time is (0 + 24 + 27)/3 = 17 milliseconds. If the processes
arrive in the order P2, P3, P1, however, the results will be as shown in the following Gantt chart:
P2 P3 P1
0 3 6 30
The average waiting time is now (6 + 0 + 3)/3 = 3 milliseconds. This reduction is substantial. Thus, the
average waiting time under an FCFS policy is generally not minimal and may vary substantially if the
processes' CPU burst times vary greatly.
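The small program below reproduces the arithmetic of this example: under FCFS, each process waits for the sum of the bursts queued ahead of it.

#include <stdio.h>

double fcfs_avg_wait(const int burst[], int n) {
    int elapsed = 0, total_wait = 0;
    for (int i = 0; i < n; i++) {
        total_wait += elapsed;   /* process i waits for all earlier bursts */
        elapsed += burst[i];
    }
    return (double)total_wait / n;
}

int main(void) {
    int order1[] = {24, 3, 3};   /* P1, P2, P3 */
    int order2[] = {3, 3, 24};   /* P2, P3, P1 */
    printf("avg wait (P1,P2,P3): %.2f ms\n", fcfs_avg_wait(order1, 3)); /* 17.00 */
    printf("avg wait (P2,P3,P1): %.2f ms\n", fcfs_avg_wait(order2, 3)); /*  3.00 */
    return 0;
}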
In addition, consider the performance of FCFS scheduling in a dynamic situation. Assume we
have one CPU-bound process and many I/O-bound processes. As the processes flow around the system,
the following scenario may result. The CPU-bound process will get and hold the CPU. During this time,
all the other processes will finish their I/O and will move into the ready queue, waiting for the CPU.
While the processes wait in the ready queue, the I/O devices are idle. Eventually, the CPU-bound process
finishes its CPU burst and moves to an I/O device. All the I/O-bound processes, which have short CPU
bursts, execute quickly and move back to the I/O queues. At this point, the CPU sits idle. The CPU-bound
process will then move back to the ready queue and be allocated the CPU. Again, all the I/O processes
end up waiting in the ready queue until the CPU-bound process is done.
There is a convoy effect as all the other processes wait for the one big process to get off the CPU.
This effect results in lower CPU and device utilization than might be possible if the shorter processes
were allowed to go first.
The FCFS scheduling algorithm is nonpreemptive. Once the CPU has been allocated to a
process, that process keeps the CPU until it releases the CPU, either by terminating or by requesting I/O.
The FCFS algorithm is thus particularly troublesome for time-sharing systems, where it is important that
each user get a share of the CPU at regular intervals.
2. Shortest-Job-First Scheduling
The shortest-job-first (SJF) scheduling algorithm associates with each process the length of the
process’s next CPU burst. When the CPU is available, it is assigned to the process that has the smallest
next CPU burst. If the next CPU bursts of two processes are the same, FCFS scheduling is used to break
the tie. Note that a more appropriate term for this scheduling method would be the shortest-next-CPU-
burst algorithm, because scheduling depends on the length of the next CPU burst of a process ,rather than
its total length. as an example of SJF scheduling, consider the following set of processes ,with the length
of the CPU burst given in milliseconds:
Process Burst Time
P1 6
P2 8
P3 7
P4 3
Using SJF scheduling, we would schedule these processes according to the following Gantt chart:
P4 P1 P3 P2
0 3 9 16 24
The waiting time is 3 milliseconds for process P1, 16 milliseconds for process P2, 9 milliseconds
for process P3, and 0 milliseconds for process P4. Thus, the average waiting time is (3 + 16 + 9 + 0)/4 = 7
milliseconds. By comparison, if we were using the FCFS scheduling scheme, the average waiting time
would be 10.25 milliseconds.
The SJF scheduling algorithm is provably optimal, in that it gives the minimum average waiting
time for a given set of processes. Moving a short process before a long one decreases the waiting time of
the short process more than it increases the waiting time of the long process. Consequently, the average
waiting time decreases.
The real difficulty with the SJF algorithm is knowing the length of the next CPU request. There is
no way to know the length of the next CPU burst. One approach is to try to approximate SJF scheduling.
We may not know the length of the next CPU burst, but we may be able to predict its value. We expect
that the next CPU burst will be similar in length to the previous ones. Thus, by computing an
approximation of the length of the next CPU burst, we can pick the process with the shortest predicted
CPU burst. The next CPU burst is generally predicted as an exponential average of the measured lengths
of previous CPU bursts. Let tn be the length of the nth CPU burst, and let τn+1 be our predicted value for
the next CPU burst. Then, for α, 0 ≤ α ≤ 1, define
τn+1 = α·tn + (1 − α)·τn
This formula defines an exponential average. The value of tn contains our most recent
information; τn stores the past history. The parameter α controls the relative weight of recent and past
history in our prediction. If α = 0, then τn+1 = τn, and recent history has no effect (current conditions are
assumed to be transient); if α = 1, then τn+1 = tn, and only the most recent CPU burst matters (history is
assumed to be old and irrelevant). More commonly, α = 1/2, so recent history and past history are equally
weighted. The initial τ0 can be defined as a constant or as an overall system average.
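The predictor is easy to state in code. This sketch applies the formula with α = 1/2 and an initial guess τ0 = 10 to a sample burst sequence (the sample values are illustrative).

#include <stdio.h>

/* tau(n+1) = alpha * t(n) + (1 - alpha) * tau(n) */
double predict_next_burst(double alpha, double tau, double measured) {
    return alpha * measured + (1.0 - alpha) * tau;
}

int main(void) {
    double tau = 10.0;                          /* initial guess tau0 */
    double bursts[] = {6, 4, 6, 4, 13, 13, 13};
    for (int i = 0; i < 7; i++) {
        tau = predict_next_burst(0.5, tau, bursts[i]);
        printf("after burst %.0f -> predict %.2f\n", bursts[i], tau);
    }
    return 0;
}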
The SJF algorithm can be either preemptive or nonpreemptive. The choice arises when a new
process arrives at the ready queue while a previous process is still executing. The next CPU burst of the
newly arrived process may be shorter than what is left of the currently executing process. A preemptive
SJF algorithm will preempt the currently executing process, whereas a nonpreemptive SJF algorithm will
allow the currently running process to finish its CPU burst. Preemptive SJF scheduling is sometimes
called shortest-remaining-time-first scheduling.
As an example, consider the following four processes, with the length ofthe CPU burst given in
milliseconds:
Process Arrival Time Burst Time
P1 0 8
P2 1 4
P3 2 9
P4 3 5
If the processes arrive at the ready queue at the times shown and need the indicated burst times,
then the resulting preemptive SJF schedule is as depicted in the following Gantt chart:
P1 P2 P4 P1 P3
0 1 5 10 17 26
Process P1 is started at time 0, since it is the only process in the queue. Process P2 arrives at time
1. The remaining time for process P1 (7 milliseconds) is larger than the time required by process P2 (4
milliseconds), so process P1 is preempted, and process P2 is scheduled. The average waiting time for this
example is ((10 − 1) + (1 − 1) + (17 − 2) + (5 − 3))/4 = 26/4 = 6.5 milliseconds. Nonpreemptive SJF
scheduling would result in an average waiting time of 7.75 milliseconds.
3. Priority Scheduling
The SJF algorithm is a special case of the general priority scheduling algorithm. A priority is
associated with each process, and the CPU is allocated to the process with the highest priority. Equal-
priority processes are scheduled in FCFS order. An SJF algorithm is simply a priority algorithm where the
priority (p) is the inverse of the (predicted) next CPU burst. The larger the CPU burst, the lower the
priority, and vice versa.
Note that we discuss scheduling in terms of high priority and low priority. Priorities are generally
indicated by some fixed range of numbers, such as 0 to 7 or 0 to 4,095. However, there is no general
agreement on whether 0 is the highest or lowest priority. Some systems use low numbers to represent low
priority; others use low numbers for high priority. This difference can lead to confusion. In this text,
we assume that low numbers represent high priority.
As an example, consider the following set of processes, assumed to have arrived at time 0, in the
order P1, P2, ..., P5, with the length of the CPU burst given in milliseconds:
Process Burst Time Priority
P1 10 3
P2 1 1
P3 2 4
P4 1 5
P5 5 2
Using priority scheduling, we would schedule these processes according to the following Gantt chart:
P2 P5 P1 P3 P4
0 1 6 16 18 19
The average waiting time is 8.2 milliseconds.
Priorities can be defined either internally or externally. Internally defined priorities use some
measurable quantity or quantities to compute the priority of a process.
Priority scheduling can be either preemptive or nonpreemptive. When a process arrives at the
ready queue, its priority is compared with the priority of the currently running process. A preemptive
priority scheduling algorithm will preempt the CPU if the priority of the newly arrived process is higher
than the priority of the currently running process. A nonpreemptive priority scheduling algorithm will
simply put the new process at the head of the ready queue.
A major problem with priority scheduling algorithms is indefinite blocking, or starvation. A
process that is ready to run but waiting for the CPU can be considered blocked. A priority scheduling
algorithm can leave some low-priority processes waiting indefinitely. In a heavily loaded computer
system, a steady stream of higher-priority processes can prevent a low-priority process from ever getting
the CPU. Generally, one of two things will happen. Either the process will eventually be run or the
computer system will eventually crash and lose all unfinished low-priority processes.
A solution to the problem of indefinite blockage of low-priority processes is aging. Aging is a
technique of gradually increasing the priority of processes that wait in the system for a long time. For
example, if priorities range from 127 (low) to 0 (high), we could increase the priority of a waiting process
by 1 every 15 minutes. Eventually, even a process with an initial priority of 127 would have the highest
priority in the system and would be executed. In fact, it would take no more than 32 hours for a priority-
127 process to age to a priority-0 process.
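A quick check of the 32-hour figure: climbing from priority 127 to priority 0 takes 127 steps of 15 minutes each.

#include <stdio.h>

int main(void) {
    int steps = 127;                 /* from priority 127 (low) to 0 (high) */
    int minutes = steps * 15;        /* one step every 15 minutes */
    printf("%d minutes = %.2f hours\n", minutes, minutes / 60.0); /* ~31.75 */
    return 0;
}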
4. Round-Robin Scheduling
The round-robin (RR) scheduling algorithm is designed especially for time-sharing systems. It is
similar to FCFS scheduling, but preemption is added to switch between processes. A small unit of time,
called a time quantum or time slice, is defined. A time quantum is generally from 10 to 100
milliseconds. The ready queue is treated as a circular queue. The CPU scheduler goes around the ready
queue, allocating the CPU to each process for a time interval of up to 1 time quantum.
To implement RR scheduling, we keep the ready queue as a FIFO queue of processes. New
processes are added to the tail of the ready queue. The CPU scheduler picks the first process from the
ready queue, sets a timer to interrupt after 1 time quantum, and dispatches the process. One of two things
will then happen.
The process may have a CPU burst of less than 1 time quantum. In this case, the process itself will
release the CPU voluntarily. The scheduler will then proceed to the next process in the ready queue.
Otherwise, if the CPU burst of the currently running process is longer than 1 time quantum, the timer will
go off and will cause an interrupt to the operating system. A context switch will be executed, and the
process will be put at the tail of the ready queue. The CPU scheduler will then select the next process in
the ready queue.
The average waiting time under the RR policy is often long. Consider the following set of
processes that arrive at time 0, with the length of the CPU burst given in milliseconds:
Process Burst Time
P1 24
P2 3
P3 3
If we use a time quantum of 4 milliseconds, then process P1 gets the first 4 milliseconds. Since it
requires another 20 milliseconds, it is preempted after the first time quantum, and the CPU is given to the
next process in the queue, process P2. Since process P2 does not need 4 milliseconds, it quits before its
time quantum expires. The CPU is then given to the next process, process P3. Once each process has
received 1 time quantum, the CPU is returned to process P1 for an additional time quantum. The resulting
RR schedule is
P1 P2 P3 P1 P1 P1 P1 P1
0 4 7 10 14 18 22 26 30
The average waiting time is 17/3 = 5.66 milliseconds.
In the RR scheduling algorithm, no process is allocated the CPU for more than 1 time quantum in
a row (unless it is the only runnable process). If a process's CPU burst exceeds 1 time quantum, that
process is preempted and is put back in the ready queue. The RR scheduling algorithm is thus
preemptive. If there are n processes in the ready queue and the time quantum is q, then each process gets
1/n of the CPU time in chunks of at most q time units.
Each process must wait no longer than (n − 1) × q time units until its next time quantum. For
example, with five processes and a time quantum of 20 milliseconds, each process will get up to 20
milliseconds every 100 milliseconds.
The performance of the RR algorithm depends heavily on the size of the time quantum. At one
extreme, if the time quantum is extremely large, the RR policy is the same as the FCFS policy. If the time
quantum is extremely small (say, 1 millisecond), the RR approach is called processor sharing and (in
theory) creates the appearance that each of n processes has its own processor running at 1/n the speed of
the real processor.
We also need to consider the effect of context switching on the performance of RR
scheduling. Assume that we have only one process of 10 time units. If the quantum is 12 time units,
the process finishes in less than 1 time quantum, with no overhead. If the quantum is 6 time units,
however, the process requires 2 quanta, resulting in a context switch. If the time quantum is 1 time unit,
then nine context switches will occur, slowing the execution of the process accordingly.
Thus, we want the time quantum to be large with respect to the context-switch time. If the context-
switch time is approximately 10 percent of the time quantum, then about 10 percent of the CPU time will
be spent in context switching. In practice, most modern systems have time quanta ranging from 10 to 100
milliseconds. The time required for a context switch is typically less than 10 microseconds; thus, the
context-switch time is a small fraction of the time quantum.
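The simulation sketch below (our own illustration, assuming all processes arrive at time 0 and at most 16 processes) reproduces the 17/3 = 5.66 ms average for the earlier example.

#include <stdio.h>

double rr_avg_wait(const int burst[], int n, int quantum) {
    int remaining[16], waited[16], done = 0;
    for (int i = 0; i < n; i++) { remaining[i] = burst[i]; waited[i] = 0; }
    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0) continue;
            int run = remaining[i] < quantum ? remaining[i] : quantum;
            for (int j = 0; j < n; j++)          /* every other process  */
                if (j != i && remaining[j] > 0)  /* still queued waits   */
                    waited[j] += run;
            remaining[i] -= run;
            if (remaining[i] == 0) done++;
        }
    }
    int total = 0;
    for (int i = 0; i < n; i++) total += waited[i];
    return (double)total / n;
}

int main(void) {
    int burst[] = {24, 3, 3};   /* P1, P2, P3 */
    printf("RR avg wait = %.2f ms\n", rr_avg_wait(burst, 3, 4)); /* 5.67 */
    return 0;
}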
Thread Libraries
A thread library provides the programmer with an API for creating and managing threads.
There are two primary ways of implementing a thread library. The first approach is to provide a library
entirely in user space with no kernel support. All code and data structures for the library exist in user
space. This means that invoking a function in the library results in a local function call in user space and
not a system call.
The second approach is to implement a kernel-level library supported directly by the operating
system. In this case, code and data structures for the library exist in kernel space. Invoking a function in
the API for the library typically results in a system call to the kernel.
Three main thread libraries are in use today: POSIX Pthreads, Windows, and Java. Pthreads, the
threads extension of the POSIX standard, may be provided as either a user-level or a kernel-level library.
The Windows thread library is a kernel-level library available on Windows systems. The Java thread
API allows threads to be created and managed directly in Java programs. However, because in most
instances the JVM is running on top of a host operating system, the Java thread API is generally
implemented using a thread library available on the host system. This means that on Windows systems,
Java threads are typically implemented using the Windows API; UNIX and Linux systems often use
Pthreads.
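A minimal Pthreads example of the API just described (compile with gcc -pthread); the summation task is illustrative.

#include <pthread.h>
#include <stdio.h>

/* Work done by the new thread. */
void *runner(void *param) {
    int n = *(int *)param, sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i;
    printf("sum of 1..%d = %d\n", n, sum);
    return NULL;
}

int main(void) {
    pthread_t tid;
    int n = 10;
    pthread_create(&tid, NULL, runner, &n);  /* spawn the worker thread */
    pthread_join(tid, NULL);                 /* wait for it to finish   */
    return 0;
}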
Threading Issues
In this section, we discuss some of the issues to consider in designing multithreaded programs.
Signal Handling
A signal is used in UNIX systems to notify a process that a particular event has occurred. A
signal may be received either synchronously or asynchronously, depending on the source of and the
reason for the event being signaled. All signals, whether synchronous or asynchronous, follow the same
pattern:
1. A signal is generated by the occurrence of a particular event.
2. The signal is delivered to a process.
Examples of synchronous signals include illegal memory access and division by 0. If a running
program performs either of these actions, a signal is generated. Synchronous signals are delivered to
the same process that performed the operation that caused the signal (that is the reason they are
considered synchronous).
A signal may be handled by one of two possible handlers:
1. A default signal handler
2. A user-defined signal handler
Every signal has a default signal handler that the kernel runs when handling that signal. This
default action can be overridden by a user-defined signal handler that is called to handle the signal.
Signals are handled in different ways. Some signals (such as changing the size of a window) are simply
ignored; others (such as an illegal memory access) are handled by terminating the program.
The standard UNIX function for delivering a signal is
kill(pid_t pid, int signal)
POSIX Pthreads provides the following function, which allows a signal to be delivered to a
specified thread (tid):
pthread_kill(pthread_t tid, int signal)
Although Windows does not explicitly provide support for signals, it allows us to emulate them
using asynchronous procedure calls (APCs). The APC facility enables a user thread to specify a
function that is to be called when the user thread receives notification of a particular event.
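As a sketch of overriding a default handler, the program below installs a user-defined handler for SIGINT with the standard sigaction() call; setting a flag and returning is one common idiom, since only async-signal-safe work should be done inside a handler.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigint = 0;

void handler(int sig) {
    (void)sig;
    got_sigint = 1;               /* just record that the signal arrived */
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = handler;      /* user-defined handler */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    while (!got_sigint)
        pause();                  /* sleep until a signal is delivered */
    printf("caught SIGINT, exiting cleanly\n");
    return 0;
}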
Thread Cancellation
Thread cancellation involves terminating a thread before it has completed. For example, if
multiple threads are concurrently searching through a database and one thread returns the result, the
remaining threads might be canceled. Another situation might occur when a user presses a button on a web
browser that stops a web page from loading any further. A thread that is to be canceled is often referred
to as the target thread.
Cancellation of a target thread may occur in two different scenarios:
1. Asynchronous cancellation. One thread immediately terminates the target thread.
2. Deferred cancellation. The target thread periodically checks whether it should terminate, allowing it
an opportunity to terminate itself in an orderly fashion.
The difficulty with cancellation occurs in situations where resources have been allocated to a
canceled thread or where a thread is canceled while in the midst of updating data it is sharing with other
threads. This becomes especially troublesome with asynchronous cancellation.
pthread_cancel(tid);
Thread-Local Storage
Threads belonging to a process share the data of the process. Indeed, this data sharing provides
one of the benefits of multithreaded programming. However, in some circumstances, each thread might
need its own copy of certain data. We will call such data thread-local storage (or TLS). For example, in
a transaction-processing system, we might service each transaction in a separate thread. Furthermore,
each transaction might be assigned a unique identifier. To associate each thread with its unique identifier,
we could use thread-local storage.
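One way to realize thread-local storage in C is the Pthreads key interface, sketched below; the transaction-id scenario mirrors the paragraph above, and the id values are purely illustrative.

#include <pthread.h>
#include <stdio.h>

static pthread_key_t tid_key;   /* one key, a private value per thread */

void *worker(void *arg) {
    pthread_setspecific(tid_key, arg);  /* this thread's own copy */
    long id = (long)pthread_getspecific(tid_key);
    printf("thread sees transaction id %ld\n", id);
    return NULL;
}

int main(void) {
    pthread_key_create(&tid_key, NULL);
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)101L);
    pthread_create(&t2, NULL, worker, (void *)202L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_key_delete(tid_key);
    return 0;
}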
Scheduler Activations
A final issue to be considered with multithreaded programs concerns communication between
the kernel and the thread library, which may be required by the many-to-many and two-level models.
Such coordination allows the number of kernel threads to be dynamically adjusted to help ensure the best
performance.
Many systems implementing either the many-to-many or the two-level model place an
intermediate data structure between the user and kernel threads. This data structure is typically known as
a lightweight process, or LWP. To the user-thread library, the LWP appears to be a virtual processor on
which the application can schedule a user thread to run. Each LWP is attached to a kernel thread, and it
is kernel threads that the operating system schedules to run on physical processors. If a kernel thread
blocks (such as while waiting for an I/O operation to complete), the LWP blocks as well. Up the chain,
the user-level thread attached to the LWP also blocks.
• The Computer Hardware contains a central processing unit(CPU), the memory, and the
input/output (I/O) devices and it provides the basic computing resources for the system.
• The Application programs like spreadsheets, Web browsers, word processors, etc. are used to
define the ways in which these resources are used to solve the computing problems of the users. And
the system programs mainly consist of compilers, loaders, editors, the OS, etc.
• The Operating System is mainly used to control the hardware and coordinate its use among the
various application programs for the different users.
• Basically, Computer System mainly consists of hardware, software, and data.
OS is mainly designed in order to serve two basic purposes:
1. The operating system mainly controls the allocation and use of the computing system's resources
among the various users and tasks.
2. It mainly provides an interface between the computer hardware and the programmer that simplifies
and makes feasible the coding, creation, and debugging of application programs.
Two Views of Operating System
1. User's View
2. System View
Operating System: User View
The user view of the computer refers to the interface being used. Such systems are designed for one user to
monopolize its resources, to maximize the work that the user is performing. In these cases, the operating
system is designed mostly for ease of use, with some attention paid to performance, and none paid to
resource utilization.
Operating System: System View
The operating system can be viewed as a resource allocator also. A computer system consists of many
resources like - hardware and software - that must be managed efficiently. The operating system acts as the
manager of the resources, decides between conflicting requests, controls the execution of programs, etc.
Operating System Management Tasks
1. Process management which involves putting the tasks into order and pairing them into manageable
size before they go to the CPU.
2. Memory management which coordinates data to and from RAM (random-access memory) and
determines the necessity for virtual memory.
3. Device management provides an interface between connected devices.
4. Storage management which directs permanent data storage.
5. An application interface that allows standard communication between software and your computer.
6. The user interface allows you to communicate with your computer.
1. Types of Operating System
Given below are different types of Operating System:
1. Simple Batch System
2. Multiprogramming Batch System
3. Multiprocessor System
4. Desktop System
5. Distributed Operating System
6. Clustered System
7. Realtime Operating System
8. Handheld System
Functions of Operating System
1. It boots the computer
2. It performs basic computer tasks, e.g. managing the various peripheral devices such as the mouse and keyboard
3. It provides a user interface, e.g. command line, graphical user interface (GUI)
4. It handles system resources such as the computer's memory and sharing of the central processing
unit(CPU) time by various applications or peripheral devices.
5. It provides file management which refers to the way that the operating system manipulates, stores,
retrieves, and saves data.
6. Error Handling is done by the operating system. It takes preventive measures whenever required to
avoid errors.
Advantages of Operating System
Given below are some advantages of the Operating System:
• The operating system helps to improve the efficiency of work and helps to save a lot of time by
reducing complexity.
• The different components of a system are independent of each other, so the failure of one component
does not affect the functioning of another.
• The operating system mainly acts as an interface between the hardware and the software.
• Users can easily access the hardware without writing large programs.
• With the help of an operating system, sharing data becomes easier with a large number of users.
• Users can easily install and run any game or application on the operating system.
• An operating system can be refreshed from time to time without any problems.
• An operating system can be updated easily.
Disadvantages of an Operating system
Given below are the drawbacks of using an operating system:
• Expensive
There are some open-source platforms like Linux. But some operating systems are expensive.
• Virus Threat
Operating systems are open to virus attacks, and sometimes users download malicious software
packages that pause the functioning of the operating system and also slow it down.
• Complexity
Some operating systems are complex in nature because the language used to build them is not
clear and well defined. If an issue occurs in the operating system, the user is unable to resolve it.
• System Failure
An operating system is the heart of the computer system; if it stops functioning for any reason,
the whole system crashes.
Examples of Operating System
• Windows
• Android
• iOS
• Mac OS
• Linux
• Windows Phone OS
• Chrome OS
2. Evolution of Operating Systems
The evolution of operating systems is directly dependent on the development of computer systems and how
users use them. Here is a quick tour of computing systems through the past fifty years in the timeline.
Early Evolution
• 1945: ENIAC, Moore School of Engineering, University of Pennsylvania
• 1949: EDSAC and EDVAC
• 1949: BINAC - a successor to the ENIAC
• 1951: UNIVAC by Remington Rand
• 1952: IBM 701
• 1954-1957: FORTRAN was developed
• 1956: The interrupt
Operating Systems - Late 1950s
By the late 1950s, operating systems were well improved and supported the following usages:
• It was able to perform Single stream batch processing.
• It could use Common, standardized, input/output routines for device access.
• Program transition capabilities to reduce the overhead of starting a new job was added.
• Error recovery to clean up after a job terminated abnormally was added.
• Job control languages that allowed users to specify the job definition and resource requirements were
made possible.
Operating Systems - In 1960s
• 1961: The dawn of minicomputers
• 1962: Compatible Time-Sharing System (CTSS) from MIT, powerful and really useful
• 1964 and onward: Multics
• 1967-1968: Mouse was invented
• 1969: The UNIX Time-Sharing System from Bell Telephone Laboratories
Supported OS Features by 1970s
• Multi User and Multi tasking was introduced.
• Dynamic address translation hardware and Virtual machines came into picture.
• Modular architectures came into existence.
• Personal, interactive systems came into existence.
Accomplishments after 1970
• 1971: Intel announces the microprocessor
• 1972: IBM comes out with VM: the Virtual Machine Operating System
• 1973: UNIX 4th Edition is published
• 1973: Ethernet
• 1974 The Personal Computer Age begins
• 1974: Gates and Allen wrote BASIC for the Altair
• 1976: Apple II
• August 12, 1981: IBM introduces the IBM PC
• 1983: Microsoft begins work on MS-Windows
• 1984: Apple Macintosh comes out
• 1990: Microsoft Windows 3.0 comes out
• 1991: GNU/Linux
• 1992: The first Windows virus comes out
• 1993: Windows NT
• 2007: iOS
• 2008: Android OS
3. Operations of Operating System
An operating system is a construct that allows the user application programs to interact with the system
hardware. The operating system by itself does not perform any useful function; it provides an environment
in which different applications and programs can do useful work.
The major operations of the operating system are process management, memory management, device
management and file management. These are given in detail as follows:
Process Management
The operating system is responsible for managing processes, i.e. assigning the processor to a process at a
time. This is known as process scheduling. The different algorithms used for process scheduling are FCFS
(first come first served), SJF (shortest job first), priority scheduling, round robin scheduling etc.
There are many scheduling queues that are used to handle processes in process management. When the
processes enter the system, they are put into the job queue. The processes that are ready to execute in the
main memory are kept in the ready queue. The processes that are waiting for the I/O device are kept in the
device queue.
Memory Management
Memory management plays an important part in an operating system. It deals with memory and the moving
of processes from disk to primary memory for execution and back again.
The activities performed by the operating system for memory management are −
• The operating system assigns memory to the processes as required. This can be done using the best fit,
first fit and worst fit algorithms.
• All the memory is tracked by the operating system, i.e. it notes which parts of memory are in use by the
processes and which are empty.
• The operating system deallocates memory from processes as required. This may happen when a
process has been terminated or if it no longer needs the memory.
Device Management
There are many I/O devices handled by the operating system such as mouse, keyboard, disk drive etc. There
are different device drivers that can be connected to the operating system to handle a specific device. The
device controller is an interface between the device and the device driver. The user applications can access
all the I/O devices using the device drivers, which are device specific codes.
File Management
Files are used to provide a uniform view of data storage by the operating system. All the files are mapped
onto physical devices that are usually non volatile so data is safe in the case of system failure.
The files can be accessed by the system in two ways i.e. sequential access and direct access −
• Sequential Access
The information in a file is processed in order using sequential access. The file's records are accessed
one after another. Most programs, such as editors and compilers, use sequential access.
• Direct Access
In direct access or relative access, the files can be accessed in random for read and write operations.
The direct access model is based on the disk model of a file, since it allows random accesses.
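The contrast between the two access methods can be sketched with standard C file I/O: reading records in a loop is sequential access, while seeking to a computed offset gives direct (relative) access. The 64-byte record size is an arbitrary assumption.

#include <stdio.h>

#define RECORD_SIZE 64

/* Sequential access: process the records one after another. */
void read_sequential(FILE *f) {
    char rec[RECORD_SIZE];
    while (fread(rec, RECORD_SIZE, 1, f) == 1) {
        /* process rec */
    }
}

/* Direct access: jump straight to record k, as in the disk model. */
int read_direct(FILE *f, long k, char rec[RECORD_SIZE]) {
    if (fseek(f, k * RECORD_SIZE, SEEK_SET) != 0)
        return -1;
    return fread(rec, RECORD_SIZE, 1, f) == 1 ? 0 : -1;
}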
4. Operating System Structure
An operating system is a construct that allows the user application programs to interact with the system
hardware. Since the operating system is such a complex structure, it should be created with utmost care so it
can be used and modified easily. An easy way to do this is to create the operating system in parts. Each of
these parts should be well defined with clear inputs, outputs and functions.
Simple Structure
There are many operating systems that have a rather simple structure. These started as small systems and
rapidly expanded far beyond their original scope. A common example of this is MS-DOS. It was designed
simply for a small group of people, and there was no indication that it would become so popular.
An image to illustrate the structure of MS-DOS is as follows −
It is better that operating systems have a modular structure, unlike MS-DOS. That would lead to greater
control over the computer system and its various applications. The modular structure would also allow the
programmers to hide information as required and implement internal routines as they see fit without
changing the outer specifications.
Layered Structure
One way to achieve modularity in the operating system is the layered approach. In this, the bottom layer is
the hardware and the topmost layer is the user interface.
An image demonstrating the layered approach is as
follows −
As seen from the image, each upper layer is built on the bottom layer. All the layers hide some structures,
operations etc from their upper layers.
One problem with the layered structure is that each layer needs to be carefully defined. This is necessary
because the upper layers can only use the functionalities of the layers below them.
5. Operating System - Services
An Operating System provides services to both the users and to the programs.
• It provides programs an environment to execute.
• It provides users the services to execute the programs in a convenient manner.
Following are a few common services provided by an operating system −
• Program execution
• I/O operations
• File System manipulation
• Communication
• Error Detection
• Resource Allocation
• Protection
Program execution
Operating systems handle many kinds of activities from user programs to system programs like printer
spooler, name servers, file server, etc. Each of these activities is encapsulated as a process.
A process includes the complete execution context (code to execute, data to manipulate, registers, OS
resources in use). Following are the major activities of an operating system with respect to program
management −
• Loads a program into memory.
• Executes the program.
• Handles program's execution.
• Provides a mechanism for process synchronization.
• Provides a mechanism for process communication.
• Provides a mechanism for deadlock handling.
I/O Operation
An I/O subsystem comprises I/O devices and their corresponding driver software. Drivers hide the
peculiarities of specific hardware devices from the users.
An Operating System manages the communication between user and device drivers.
• I/O operation means read or write operation with any file or any specific I/O device.
• Operating system provides the access to the required I/O device when required.
File system manipulation
A file represents a collection of related information. Computers can store files on the disk (secondary
storage), for long-term storage purpose. Examples of storage media include magnetic tape, magnetic disk
and optical disk drives like CD, DVD. Each of these media has its own properties like speed, capacity, data
transfer rate and data access methods.
A file system is normally organized into directories for easy navigation and usage. These directories may
contain files and other directories. Following are the major activities of an operating system with respect to
file management −
• Program needs to read a file or write a file.
• The operating system gives the permission to the program for operation on file.
• Permission varies from read-only, read-write, denied and so on.
• Operating System provides an interface to the user to create/delete files.
• Operating System provides an interface to the user to create/delete directories.
• Operating System provides an interface to create the backup of file system.
Communication
In case of distributed systems which are a collection of processors that do not share memory, peripheral
devices, or a clock, the operating system manages communications between all the processes. Multiple
processes communicate with one another through communication lines in the network.
The OS handles routing and connection strategies, and the problems of contention and security. Following
are the major activities of an operating system with respect to communication −
• Two processes often require data to be transferred between them
• Both the processes can be on one computer or on different computers, but are connected through a
computer network.
• Communication may be implemented by two methods, either by Shared Memory or by Message
Passing.
Error handling
Errors can occur anytime and anywhere. An error may occur in CPU, in I/O devices or in the memory
hardware. Following are the major activities of an operating system with respect to error handling −
• The OS constantly checks for possible errors.
• The OS takes an appropriate action to ensure correct and consistent computing.
Resource Management
In case of a multi-user or multi-tasking environment, resources such as main memory, CPU cycles and file
storage are to be allocated to each user or job. Following are the major activities of an operating system
with respect to resource management −
• The OS manages all kinds of resources using schedulers.
• CPU scheduling algorithms are used for better utilization of CPU.
Protection
Considering a computer system having multiple users and concurrent execution of multiple processes, the
various processes must be protected from each other's activities.
Protection refers to a mechanism or a way to control the access of programs, processes, or users to the
resources defined by a computer system. Following are the major activities of an operating system with
respect to protection −
• The OS ensures that all access to system resources is controlled.
• The OS ensures that external I/O devices are protected from invalid access attempts.
• The OS provides authentication features for each user by means of passwords.
User Operating-System Interface
There are two fundamental approaches for users to interface with the operating system. One technique is to
provide a command-line interface or command interpreter that allows users to directly enter commands that
are to be performed by the operating system. The second approach allows the user to interface with the
operating system via a graphical user interface or GUI.
Command Interpreter
Some operating systems include the command interpreter in the kernel. Others, such as Windows XP and
UNIX, treat the command interpreter as a special program that is running when a job is initiated or when a
user first logs on (on interactive systems). On systems with multiple command interpreters to choose from,
the interpreters are known as shells. For example, on UNIX and Linux systems, there are several different
shells a user may choose from including the Bourne shell, C shell, Bourne-Again shell, the Korn shell, etc.
Most shells provide similar functionality with only minor differences; most users choose a shell based upon
personal preference. The main function of the command interpreter is to get and execute the next user-
specified command. Many of the commands given at this level manipulate files: create, delete, list, print,
copy, execute, and so on. The MS-DOS and UNIX shells operate in this way. There are two general ways
in which these commands can be implemented. In one approach, the command interpreter itself contains
the code to execute the command. In the other, the interpreter merely uses the command name to identify a
separate system program that is loaded into memory and executed.
Graphical User Interfaces
A second strategy for interfacing with the operating system is through a user-friendly graphical user
interface or GUI. Rather than having users directly enter commands via a command-line interface, a GUI
provides a mouse-based window-and-menu system as an interface. A GUI provides a desktop
metaphor where the mouse is moved to position its pointer on images, or icons, on the screen (the desktop)
that represent programs, files, directories, and system functions. Depending on the mouse pointer's location,
clicking a button on the mouse can invoke a program, select a file or directory—known as a folder— or
pull down a menu that contains commands. Graphical user interfaces first appeared due in part to research
taking place in the early 1970s at the Xerox PARC research facility. The first GUI appeared on the Xerox
Alto computer in 1973.
However, with the release of Mac OS X (which is in part implemented using a UNIX kernel), the operating
system now provides both a new Aqua interface and command-line interface as well. The user interface can
vary from system to system and even from user to user within a system. It typically is substantially
removed from the actual system structure. The design of a useful and friendly user interface is therefore not
a direct function of the operating system. In this book, we concentrate on the fundamental problems of
providing adequate service to user programs. From the point of view of the operating system, we do not
distinguish between user programs and system programs.
System Calls in Operating System
The interface between a process and an operating system is provided by system calls. In general, system
calls are available as assembly language instructions. They are also included in the manuals used by the
assembly level programmers. System calls are usually made when a process in user mode requires access to
a resource. Then it requests the kernel to provide the resource via a system call.
A figure representing the execution of the system call is given as follows −
As can be seen from this diagram, the processes execute normally in the user mode until a system call
interrupts this. Then the system call is executed on a priority basis in the kernel mode. After the execution of
the system call, the control returns to the user mode and execution of user processes can be resumed.
In general, system calls are required in the following situations −
• Creation and deletion of files. Reading from and writing to files also require a
system call.
• Creation and management of new processes.
• Network connections also require system calls. This includes sending and receiving packets.
• Access to hardware devices such as a printer, scanner etc. requires a system call.
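As a concrete illustration, file I/O from a user program goes through such system calls. A minimal POSIX sketch follows (the file name data.txt is illustrative):

/* Each of open/read/write/close below traps into the kernel
   through a system call. */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    int fd = open("data.txt", O_RDONLY);    /* system call: open a file */
    if (fd < 0)
        return 1;                           /* file could not be opened */
    ssize_t n = read(fd, buf, sizeof buf);  /* system call: read from it */
    if (n > 0)
        write(1, buf, n);                   /* system call: write to stdout */
    close(fd);                              /* system call: release the descriptor */
    return 0;
}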
Synchronization Hardware:
Any solution to the critical-section problem requires a simple tool: a lock.
Race conditions are prevented by requiring that critical regions be protected by locks.
Hardware features can make the programming task easier and improve system efficiency.
Uniprocessor environment: we could disable interrupts
Disable interrupts while a shared variable is being modified
The currently running code would then execute without preemption
This is the approach taken by non-preemptive kernels
Multiprocessor environment:
Disabling interrupts is time consuming: the message must be passed to all processors, which delays execution
Many modern machines therefore provide special atomic hardware instructions that either
Test a memory word and set its value, or
Swap the contents of two memory words
Definition of the TestAndSet() instruction:

boolean TestAndSet (boolean *target)
{
    boolean rv = *target;
    *target = TRUE;
    return rv;
}

Mutual exclusion implementation with TestAndSet():

while (true)
{
    while ( TestAndSet (&lock) )
        ;               // do nothing
    // critical section
    lock = FALSE;
    // remainder section
}
Definition of the Swap() instruction:

void Swap (boolean *a, boolean *b)
{
    boolean temp = *a;
    *a = *b;
    *b = temp;
}

Mutual exclusion implementation with Swap():

while (true)
{
    key = TRUE;
    while ( key == TRUE )
        Swap (&lock, &key);
    // critical section
    lock = FALSE;
    // remainder section
}
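On real hardware these primitives are exposed through atomic instructions. As a hedged modern illustration (not part of the syllabus text), C11 provides an atomic test-and-set flag in <stdatomic.h> that can build the same spin lock; the function names here are invented:

/* Spin lock built on the C11 atomic_flag. */
#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;

void enter_critical(void)
{
    while (atomic_flag_test_and_set(&lock))
        ;   /* spin until the previous value was clear (FALSE) */
}

void exit_critical(void)
{
    atomic_flag_clear(&lock);   /* lock = FALSE */
}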
Bounded waiting implementation with TestAndSet():

while (true)
{
    waiting[i] = TRUE;
    key = TRUE;
    while ( waiting[i] && key )
        key = TestAndSet(&lock);
    waiting[i] = FALSE;
    // critical section
    j = (i + 1) % n;
    while ( (j != i) && !waiting[j] )
        j = (j + 1) % n;
    if (j == i)
        lock = FALSE;
    else
        waiting[j] = FALSE;
    // remainder section
}
Semaphore:
• Hardware-based solutions are complicated for application programmers to use
• A semaphore is a synchronization tool that overcomes shared-data problems
• Concept:
– Semaphore S is integer variable
– Two standard atomic operations: wait() and signal()
Definition of wait() is:

wait(S)
{
    while (S <= 0)
        ;    // no operation
    S--;
}

Definition of signal() is:

signal(S)
{
    S++;
}
• When one process modifies the semaphore value, no other process may simultaneously modify that value
• Testing (S <= 0) and modifying (S--) must be executed without interruption, i.e., atomically
Usage:
• Counting semaphore: integer value can range over an unrestricted domain
– can be used to control access to given resource consisting of finite number
– It is initialized to number of resources available
– Each process wishes to user resource performs wait() operation
– When process releases resource, it perform signal()
– When count for semaphore goes to 0, all block resources are being used
– Process that wish to use resource will block until count becomes greater than 0
• Binary semaphore: integer value can range only between 0 and 1
– Also known as mutex locks, Provides mutual exclusion
– Mutual exclusion implementation with semaphores (a POSIX version follows this pseudo-code):

while (TRUE)
{
    wait(mutex);
    // critical section
    signal(mutex);
    // remainder section
}
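In practice these operations are provided by the OS; POSIX, for instance, exposes them as sem_wait() and sem_post(). A minimal sketch of the mutual-exclusion pattern above (the names mutex, worker, counter and the loop counts are illustrative):

/* Two threads increment a shared counter under a binary semaphore. */
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t mutex;            /* binary semaphore guarding the counter */
int counter = 0;

void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);    /* wait(): decrements, blocks if value is 0 */
        counter++;           /* critical section */
        sem_post(&mutex);    /* signal(): increments, wakes a waiter */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);              /* initial value 1 => mutex lock */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter);   /* 200000: no update is lost */
    sem_destroy(&mutex);
    return 0;
}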
• We can use semaphores to solve various synchronization problems
• Ex: process P1 has a statement S1 and process P2 has a statement S2, and we require that S2 be
executed only after S1 has completed (a sketch follows)
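This ordering can be enforced with a semaphore sync shared by P1 and P2 and initialized to 0 (the name sync is illustrative):

// Process P1:
S1;
signal(sync);    // announce that S1 has completed

// Process P2:
wait(sync);      // blocks until P1 signals
S2;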
Implementation:
• The main problem in this implementation is busy waiting
• Busy waiting:
– While a process is in its critical section, any other process that tries to enter its critical section must
loop continuously in the entry code
– This continual looping is clearly a problem in real multiprogramming systems
– It wastes CPU cycles that some other process might be able to use productively
– This type of semaphore is known as a spinlock, because the process spins while waiting for the lock
• Solution: the wait() and signal() definitions should be modified
– When the wait() operation finds that the semaphore value is not positive, the process must wait
– Instead of busy waiting, the process can block itself, by placing itself in a waiting queue associated
with the semaphore
– Control transfers to the CPU scheduler, which selects another process to execute
– The blocked process will be restarted when some other process executes a signal() operation
– The process is restarted by a wakeup() operation, which changes the process from the waiting state
to the ready state
C semaphore structure:

typedef struct
{
    int value;
    struct process *list;
} semaphore;

wait() semaphore operation:

wait(semaphore *S)
{
    S->value--;
    if (S->value < 0)
    {
        add this process to S->list;
        block();
    }
}

signal() semaphore operation:

signal(semaphore *S)
{
    S->value++;
    if (S->value <= 0)
    {
        remove a process P from S->list;
        wakeup(P);
    }
}
• Two operations are provided by OS as basic system calls:
– Block() operation, Wakeup(P) operation
• If semaphore value is negative, its magnitude is number of processes waiting on that semaphore
• List of waiting process can be easily implemented by link field in each PCB
• Each semaphore contains integer value & pointer to list of PCB’s
• FIFO queue is used for bounded waiting
• Critical aspects of semaphores:
– There must be a guarantee that no two processes can execute wait() and signal() on the same
semaphore at the same time
– Uniprocessor environment: this can be solved by simply disabling interrupts while executing wait()
and signal()
– Multiprocessor environment: an alternative locking technique such as spinlocks must be provided to
ensure that wait() and signal() are performed atomically
• Note that busy waiting is not completely eliminated; it is moved from the entry sections of applications
to the short critical sections of the wait() and signal() operations themselves
Deadlock and Starvation:
• Deadlock: two or more processes are waiting indefinitely for an event that can be caused by only one
of the waiting processes
• Let use consider two process P0, P1 each accessing two semaphores S and Q be two semaphores
initialized to 1:
P0 P1
wait (S); wait (Q);
wait (Q); wait (S);
. .
. .
signal (S); signal (Q);
signal (Q); signal (S);
• Resource acquisition & release
• Starvation: indefinite blocking, a situation in which a process waits indefinitely within a semaphore
• Indefinite blocking may occur if we add and remove processes from the list associated with a semaphore
in LIFO order
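The same deadlock can be written with POSIX mutexes; a hedged sketch (thread and lock names are illustrative) in which thread A locks S then Q while thread B locks Q then S:

/* If A holds S while B holds Q, each blocks forever on its second lock. */
#include <pthread.h>

pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;

void *thread_a(void *arg)
{
    pthread_mutex_lock(&S);
    pthread_mutex_lock(&Q);     /* may block forever if B already holds Q */
    /* ... use both resources ... */
    pthread_mutex_unlock(&Q);
    pthread_mutex_unlock(&S);
    return NULL;
}

void *thread_b(void *arg)
{
    pthread_mutex_lock(&Q);
    pthread_mutex_lock(&S);     /* may block forever if A already holds S */
    /* ... use both resources ... */
    pthread_mutex_unlock(&S);
    pthread_mutex_unlock(&Q);
    return NULL;
}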
Deadlock Characterization:
Necessary Conditions:
Deadlock can arise if four conditions hold simultaneously.
o Mutual exclusion: only one process at a time can use a resource.
o Hold and wait: a process holding at least one resource is waiting to acquire additional resources
held by other processes.
o No preemption: a resource can be released only voluntarily by the process holding it, after that
process has completed its task.
o Circular wait: there exists a set {P0, P1, …, Pn} of waiting processes such that P0 is waiting for a
resource that is held by P1, P1 is waiting for a resource that is held by P2, …, Pn–1 is waiting for a
resource that is held by Pn, and Pn is waiting for a resource that is held by P0.
Resource Allocation Graph:
Deadlocks are described in terms of Directed Graph called System Resource Allocation graph
This graph consists of vertices V, set of edges E
Set of vertices are partitioned into two different types of nodes:
o P={P1, P2, …Pn}, set consisting of all active processes in system
o R={R1, R2, .. Rn}, set consisting of all resource types in system
A directed edge from process Pi to resource type Rj is denoted by Pi → Rj
o It signifies that Pi has requested an instance of resource type Rj
o Currently Pi is waiting for that resource
o It is called a Request edge
A directed edge from resource type Rj to process Pi is denoted by Rj → Pi
o It signifies that an instance of resource type Rj has been allocated to process Pi
o It is called an Assignment edge
A circle is used to represent a process and a rectangle to represent a resource type
Edge transformation:
o When a process requests resources Request edge is placed
o later it is transformed to Assignment Edge
o Finally edge is removed when operation is completed
Example of a Resource Allocation Graph
Set P, R, E:
o P={P1, P2, P3}
o R={R1, R2, R3, R4}
o E = {P1 → R1, P2 → R3, R1 → P2, R2 → P2, R2 → P1, R3 → P3}
Resource Instances:
o One instance of R1
o Two instances of R2
o One instance of R3
o Three instances of R4
Process States:
o P1 is holding R2, waiting for R1
o P2 is holding R1 & R2, waiting for R3
o P3 is holding R3
If the graph contains no cycles, then no process in the system is deadlocked
If the graph contains a cycle, then a deadlock may exist
If each resource has exactly one instance, then a cycle in the graph is a necessary and sufficient condition
for the existence of deadlock
If each resource has several instances, then a cycle does not necessarily imply that a deadlock has occurred
Deadlock Prevention:
By ensuring that at least one of these conditions cannot hold, we can prevent the occurrence of deadlock
Mutual Exclusion:
Mutual exclusion must hold for non-sharable resources
Ex: a printer cannot be simultaneously shared by several processes
Sharable resources do not require mutually exclusive access and thus cannot be involved in a deadlock
Ex: read-only files are sharable resources
A process never needs to wait for a sharable resource
Hold & Wait:
To guarantee that hold and wait never occurs in the system, we must ensure that
whenever a process requests a resource, it does not hold any other resources
Two protocols are used for this:
o One protocol allows each process to request and be allocated all its resources before it begins
execution. System calls can be used for providing this protocol
o The second protocol allows a process to request resources only when it is holding none. All held
resources must be released before requesting any additional resources
Ex: Consider a process that copies a file from DVD to disk, sorts the file, and then prints the results
Method1: Taking all resources at the beginning
o If all resources must be requested at the beginning of process, then it must initially request DVD,
Disk, Printer
o It will hold printer for its entire execution, even though it needs printer only at end
Method2: Request resource by releasing all current resources
o Allow process to request initially only DVD and Disk
o It copies file from DVD to Disk then release both DVD and Disk
o Then again request Disk and Printer
o After sending file from Disk to Printer, release these two resources
Disadvantage of Method1: Resource utilization may be low, since resources are unused for a long time
after allocation
Disadvantage of Method2: Starvation is possible, process that needs several resources may have to wait
indefinitely
No Preemption:
We need to use a protocol to ensure this condition
General method:
o If process is holding some resources & requests another resource that cannot be immediately
allocated to it, then all resources currently being held are preempted
o All these resources are implicitly released
o Preempted resources are added to the list of resources for which process is waiting
o Process will be restarted only when it can regain its old resources as well as new ones that it is
requesting
Alternative method:
o If process requests some resource, first check whether they are available
o If they are free, allocate them
o If they are not free, check whether they are allocated to some other process that is waiting for other
resource
o If so, preempt the desired resource from the waiting process & allocate them to requesting process
o If the resources are neither available nor held by a waiting process, the requesting process must wait
o While it is waiting, some of its resources may be preempted, but only if another process requests them
o This protocol can be applied to resources whose state can be easily saved and restored later
Circular Wait:
Simple method:
o Impose total ordering of all resource types and to require that each process requests resources in an
increasing order of enumeration
o Let R = {R1, R2, … Rm} be the set of resource types; each resource type is assigned a unique integer
number, which allows us to compare two resources
o Define a one-to-one function F: R → N, where N is the set of natural numbers
o Ex: F(tape drive) = 1, F(disk drive) = 5, F(printer) = 12
o Consider the following protocol for prevention:
Each process can request resources only in an increasing order of enumeration
A process can initially request any number of instances of a resource type, say Ri
After that, the process can request instances of resource type Rj if and only if F(Rj) > F(Ri)
If several instances of the same resource type are needed, a single request for all of them must be issued
Alternatively, whenever a process requests an instance of resource type Rj, it must have released any
resources Ri such that F(Ri) >= F(Rj)
o To see why this works, let the set of processes involved in a circular wait be {P0, P1, …, Pn}, where Pi
is waiting for a resource Ri, which is held by process Pi+1
o Then, since process Pi+1 is holding resource Ri while requesting resource Ri+1, we must have F(Ri) <
F(Ri+1) for all i
o But this condition would mean that F(R0) < F(R1) < … < F(Rn) < F(R0), which is impossible
o Hence, if this protocol is used, the circular-wait condition cannot hold
o Developing an ordering, or hierarchy, does not in itself prevent deadlock
o It is up to application developers to write programs that follow the ordering
o Certain software can be used to verify that locks are acquired in the proper order
o One lock-order verifier that works on UNIX systems such as FreeBSD is known as Witness
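As a hedged sketch of resource ordering with two POSIX mutexes (the lock names and the chosen order are illustrative), every thread acquires the locks in the same global order, so a circular wait cannot form:

/* Both locks are always taken in the order first, then second. */
#include <pthread.h>

pthread_mutex_t first  = PTHREAD_MUTEX_INITIALIZER;   /* F(first)  = 1 */
pthread_mutex_t second = PTHREAD_MUTEX_INITIALIZER;   /* F(second) = 2 */

void do_work(void)
{
    pthread_mutex_lock(&first);     /* lock in increasing order only */
    pthread_mutex_lock(&second);
    /* ... use both resources ... */
    pthread_mutex_unlock(&second);
    pthread_mutex_unlock(&first);
}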
Deadlock Avoidance:
Disadvantages of deadlock prevention:
o Low device utilization
o Reduced system throughput
Deadlock avoidance requires additional information about how resources are to be requested
Ex: A system has a printer and a tape drive, along with two processes P and Q
o P requests first the tape drive, later the printer, and both are allotted
o Before releasing both resources, Q requests first the printer and then the tape drive
o With this knowledge of the complete sequence of requests and releases, we can decide for each request
whether or not the process should wait
o The system considers the currently available resources and the future requests and releases of each
process when deciding whether a request can be granted without risking deadlock
The simplest and most useful avoidance algorithm requires that
o Each process declare the maximum number of resources of each type that it may need
o The algorithm dynamically examines the resource-allocation state to ensure a circular wait can never arise
o The resource-allocation state is defined by the number of available and allocated resources and the
maximum demands of the processes
Safe State:
A state is safe if the system can allocate resources to each process in some order and still avoid deadlock
A sequence of processes {P1, P2, … Pn} is a safe sequence for the current allocation state if
o for each Pi, the resource requests that Pi can still make can be satisfied
by the currently available resources plus the resources held by all Pj
with j < i
o If the resources that Pi needs are not immediately available, then Pi
can wait until all Pj have finished
o When they have finished, Pi can obtain all of its needed
resources, complete its task, return its allocated resources,
and terminate
o When Pi terminates, Pi+1 can obtain its needed resources, and so
on
If a state is safe, the OS can avoid deadlock; if a state is unsafe, the OS
cannot prevent processes from requesting resources in such a way that deadlock occurs
Ex: consider a system with 12 magnetic tape drives and 3 processes (total max. need = 23, more than the
12 drives that exist)
At time T0: if 5 tapes for P0, 2 tapes for P1, 2 tapes for P2 are allotted, then the system is in a safe state,
because the sequence <P1, P0, P2> can run to completion:

Process   Max. Need   At T0        P1 gets 2    P1 done     P0 gets 5    P0 done      P2 gets 7
P0        10          5 (5 need)   5 (5 need)   5 (5 need)  10           Completed    Completed
P1        4           2 (2 need)   4            Completed   Completed    Completed    Completed
P2        9           2 (7 need)   2 (7 need)   2 (7 need)  2 (7 need)   2 (7 need)   9
Allotted (of 12)      9 (3 free)   11 (1 free)  7 (5 free)  12 (0 free)  2 (10 free)  9 (3 free)
At time T1: if 5 tapes for P0, 2 tapes for P1, 3 tapes for P2 are allotted, then the system is in an unsafe state:

Process   Max. Need   At T1        P1 gets 2    P1 done
P0        10          5 (5 need)   5 (5 need)   5 (5 need)
P1        4           2 (2 need)   4            Completed
P2        9           3 (6 need)   3 (6 need)   3 (6 need)
Allotted (of 12)      10 (2 free)  12 (0 free)  8 (4 free)

After P1 completes, P0 still requires 5 tapes and P2 requires 6 tapes to complete, but only 4 tapes are free.
So both P0 and P2 may end up deadlocked.
Resource-Allocation Graph:
In addition to the request and assignment edges, we introduce a new type of edge, called a claim edge
A claim edge Pi → Rj indicates that process Pi may request resource Rj at
some time in the future
This edge resembles a request edge in direction, but is represented by a
dashed line
When process Pi requests resource Rj, the claim edge Pi → Rj is converted
to a request edge
Similarly, when a resource Rj is released by Pi, the assignment edge
Rj → Pi is reconverted to a claim edge Pi → Rj
Suppose that process Pi requests resource Rj
The request can be granted only if converting the request edge Pi → Rj to an assignment edge Rj → Pi does
not result in the formation of a cycle in the resource-allocation graph
If no cycle exists, then the allocation of the resource will leave the system in
a safe state. If a cycle is found, then the allocation will put the system in an
unsafe state, so process Pi will have to wait for its request to be satisfied. In the example graph, if P1
requests R2 and P2 requests R1, then a deadlock will occur.
Banker’s Algorithm:
The deadlock-avoidance algorithm that we describe next is applicable to
such a system, but is less efficient than the resource-allocation graph scheme.
This algorithm is commonly known as the banker's algorithm.
Data Structures for the Banker’s Algorithm:
Let n = number of processes, and m = number of resources types.
o Available: Vector of length m. If available [j] = k, there are k instances of resource type Rj
available.
o Max: n x m matrix. If Max [i,j] = k, then process Pi may request at most k instances of resource type
Rj.
o Allocation: n x m matrix. If Allocation[i,j] = k then Pi is currently allocated k instances of Rj.
o Need: n x m matrix. If Need[i,j] = k, then Pi may need k more instances of Rj to complete its task.
Need [i,j] = Max[i,j] – Allocation [i,j].
Safety Algorithm:
1. Let Work and Finish be vectors of length m and n, respectively. Initialize:
Work = Available
Finish[i] = false for i = 0, 1, …, n-1
2. Find an i such that both:
Finish[i] == false
Needi <= Work
If no such i exists, go to step 4.
3. Work = Work + Allocationi
Finish[i] = true
Go to step 2.
4. If Finish[i] == true for all i, then the system is in a safe state.
Resource-Request Algorithm for Process Pi:
Requesti = request vector for process Pi. If Requesti[j] = k, then process Pi wants k instances of
resource type Rj.
1. If Requesti <= Needi, go to step 2. Otherwise, raise an error condition, since the process has exceeded
its maximum claim.
2. If Requesti <= Available, go to step 3. Otherwise, Pi must wait, since the resources are not available.
3. Pretend to allocate the requested resources to Pi by modifying the state as follows:
Available = Available – Requesti
Allocationi = Allocationi + Requesti
Needi = Needi – Requesti
If the resulting state is safe, the resources are allocated to Pi.
If unsafe, Pi must wait, and the old resource-allocation state is restored.
Illustrative Example:
Consider a system with 5 processes P0 through P4 and 3 resource types: A (10 instances), B (5 instances),
and C (7 instances).
Snapshot at time T0:

          Allocation   Max      Need     Available
          A B C        A B C    A B C    A B C
P0        0 1 0        7 5 3    7 4 3    3 3 2
P1        2 0 0        3 2 2    1 2 2
P2        3 0 2        9 0 2    6 0 0
P3        2 1 1        2 2 2    0 1 1
P4        0 0 2        4 3 3    4 3 1

Running the safety algorithm on this snapshot, all Finish values become true, so the system is in a safe
state and a safe sequence is <P1, P3, P4, P0, P2>.
Suppose process P1 now requests one instance of A and two instances of C, so Request1 = (1, 0, 2). To
decide whether this request can be granted, we apply the resource-request algorithm as follows:
o Request1(1, 0, 2) <= Need1(1, 2, 2) is true
o Request1(1, 0, 2) <= Available(3, 3, 2) is true
o Since both conditions are satisfied, we pretend to allocate the request by updating the data structures:
Available(2, 3, 0) = Available(3, 3, 2) – Request1(1, 0, 2)
Allocation1(3, 0, 2) = Allocation1(2, 0, 0) + Request1(1, 0, 2)
Need1(0, 2, 0) = Need1(1, 2, 2) – Request1(1, 0, 2)
Executing the safety algorithm on the resulting state shows that the sequence <P1, P3, P4, P0, P2> still
satisfies the safety requirement, so the request can be granted.
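The safety algorithm is easy to turn into code. A hedged C sketch follows, preloaded with the Allocation, Need and Available values of the T0 snapshot above (the function and variable names are invented; note the scan may print a different, equally valid safe sequence than the one in the text):

/* Banker's safety algorithm on the 5-process, 3-resource example. */
#include <stdio.h>
#include <stdbool.h>

#define N 5   /* processes P0..P4 */
#define M 3   /* resource types A, B, C */

int allocation[N][M] = {{0,1,0},{2,0,0},{3,0,2},{2,1,1},{0,0,2}};
int need[N][M]       = {{7,4,3},{1,2,2},{6,0,0},{0,1,1},{4,3,1}};
int available[M]     = {3,3,2};

bool is_safe(void)
{
    int work[M];
    bool finish[N] = {false};
    for (int j = 0; j < M; j++)
        work[j] = available[j];

    for (int done = 0; done < N; done++) {
        int i, j;
        for (i = 0; i < N; i++) {          /* find unfinished Pi with Needi <= Work */
            if (finish[i]) continue;
            for (j = 0; j < M; j++)
                if (need[i][j] > work[j]) break;
            if (j == M) break;             /* Pi can run to completion */
        }
        if (i == N) break;                 /* no such process exists */
        for (j = 0; j < M; j++)
            work[j] += allocation[i][j];   /* Pi finishes, returns resources */
        finish[i] = true;
        printf("P%d ", i);                 /* prints one safe sequence */
    }
    for (int i = 0; i < N; i++)
        if (!finish[i]) return false;
    return true;
}

int main(void)
{
    printf("\n%s\n", is_safe() ? "System is in a safe state" : "System is unsafe");
    return 0;
}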
Deadlock Detection:
If a system does not employ either a deadlock-prevention or a deadlock-avoidance algorithm, then a
deadlock situation may occur
In this environment, the system must provide:
1. An algorithm that examines the state of the system to determine whether a deadlock has occurred
2. An algorithm to recover from the deadlock
Single Instance of Each Resource Type:
If all resources have only a single instance, then resource-allocation graph is sufficient for detecting
deadlock by constructing Wait-for graph.
Wait for graph is obtained from resource-allocation graph by removing the nodes of type resource and
collapsing the appropriate edges
An edge Pi → Pj exists in a wait-for graph if and only if the corresponding resource-allocation graph
contains two edges Pi → Rq and Rq → Pj for some resource Rq.
The resource-allocation graph and its wait-for graph are related as follows.
Converting edges in the resource-allocation graph to wait-for edges:
P1 → R1 → P2 becomes P1 → P2
P2 → R4 → P3 becomes P2 → P3
P2 → R5 → P4 becomes P2 → P4
P2 → R3 → P5 becomes P2 → P5
P3 → R5 → P4 becomes P3 → P4
P4 → R2 → P1 becomes P4 → P1
The wait-for graph contains only processes, and the six edges above must be included as shown.
Detection-Algorithm Usage:
When should we invoke the detection algorithm? The answer depends on two factors:
o How often is a deadlock likely to occur?
o How many processes will be affected by deadlock when it happens?
If deadlocks occur frequently, then the detection algorithm should be invoked frequently.
Resources allocated to deadlocked processes will be idle until the deadlock can be broken.
In addition, the number of processes involved in the deadlock cycle may grow.
Invoke Deadlock Detection algorithm every time a request for allocation cannot be granted immediately.
In this case, we can identify not only the set of processes that is deadlocked, but also the specific process
that "caused" the deadlock.
If this algorithm is invoked for every resource request, this will incur computational overhead
A less expensive alternative is simply to invoke the algorithm at less frequent intervals. Ex: once per hour
or whenever CPU utilization drops below 40% etc.
Recovery from Deadlock:
One possibility is to inform the operator that a deadlock has occurred, and to let the operator deal with the
deadlock manually.
The other possibility is to let the system recover from the deadlock automatically.
There are two options for breaking a deadlock.
o One solution is simply to abort one or more processes to break the circular wait.
o Second option is to preempt some resources from one or more of the deadlocked processes.
Process Termination:
To eliminate deadlocks by aborting a process, we use one of two methods. In both methods, the system
reclaims all resources allocated to the terminated processes.
o Abort all deadlocked processes:
This method clearly will break the deadlock cycle
It is more expensive, since these processes may have computed for a long time, and the results
of these partial computations must be discarded, and probably must be recomputed later.
o Abort one process at a time until the deadlock cycle is eliminated:
This method incurs considerable overhead, since after each process is aborted, deadlock-
detection algorithm must be invoked to determine whether any processes are still deadlocked.
Aborting a process may not be easy; if the process was in the midst of updating a file, terminating it will
leave the file in an incorrect state
If the partial termination method is used, then we must determine which deadlocked process (or processes)
should be terminated in an attempt to break the deadlock.
This determination is a policy decision. Many factors may affect which process is chosen, including:
o What the priority of the process is
o How long the process has computed, and how much longer the process will compute before
completing its designated task
o How many and what type of resources the process has used
o How many more resources the process needs in order to complete
o How many processes will need to be terminated
o Whether the process is interactive or batch
Resource Preemption:
Successively preempt some resources until deadlock cycle is broken
If preemption is required to deal with deadlocks, then three issues need to be addressed:
o Selecting a victim:
Which resources and which processes are to be preempted?
As in process termination, we must determine the order of preemption to minimize cost
Cost factors may include such parameters as the number of resources a deadlocked process is
holding, and the amount of time a deadlocked process has thus far consumed.
o Rollback:
If we preempt a resource from a process, what should be done with that process?
It cannot continue with its normal execution, because it is missing some needed resource
We must roll back the process to some safe state and restart it from that state
Since it is difficult to determine a safe state, the simplest solution is a total rollback: abort the
process and restart it, continuing until the deadlock is broken
o Starvation:
How do we ensure that starvation will not occur?
Since the victim is selected based on cost factors, the same process may always be picked as the victim
That process may never complete because of repeated preemption
The solution is to guarantee that a process can be picked as a victim only a finite number of times;
the most common solution is to include the number of rollbacks in the cost factor
UNIT V
MEMORY MANAGEMENT
Memory: - it is a large array of words or bytes, each with its own address.
Address binding:-
The binding of instructions and data to memory addresses is called address binding.
The binding can be done in 3 ways: 1) Compile time 2) Load time 3) Execution time.
Compile time:- binding can be done at compilation time, i.e., if it is known at compile time where
the process will reside in memory, then absolute code can be generated.
Load time:- binding can be done at load time. If it is not known at compile time where the process will
reside in memory, then the compiler must generate relocatable code. In this case the final binding is
delayed until load time.
Execution time:- the binding must be delayed until run time.
Dynamic loading:-
Loading is postponed until execution time. To obtain better memory utilization we can use
dynamic loading. In this scheme a routine is not loaded until it is called. All routines are kept on disk in a
relocatable load format, and an unused routine is never loaded.
Dynamic linking:- linking is postponed until execution time. Many O.S support only static linking, in
which system language libraries are treated like any other module and combined into the program image;
with this approach memory is wasted, because every program carries its own copy of the libraries.
Dynamic linking instead uses a stub: a small piece of code that indicates how to load the library routine
if it is not already present in memory.
OVERLAYS:- Normally, the entire program and data of a process must be in physical memory for the
process to execute, so the size of a process is limited to the size of physical memory. To allow a process to
be larger than the amount of memory allocated to it, a technique called overlays is used: only the
instructions and data that are needed at any given time are kept in memory.
Logical versus physical address space:-
L.A:-
1) The address generated by the C.P.U or user process is commonly referred to as a logical address.
2) It is a relative address.
3) The set of all logical addresses generated by a program is called the logical address space. The user
programmer deals with logical addresses.
4) Logical addresses are used in user mode.
P.A:-
1) An address seen by the memory unit is called a physical address.
2) It is an absolute address.
3) The set of all physical addresses corresponding to these logical addresses is referred to as the physical
address space.
4) A computer system has a physical memory, which is a H/W device.
5) Physical addresses are used only in system mode.
6) The user program never sees the real physical addresses.
In compile-time and load-time address binding, the logical and physical addresses are the same.
Swapping:-
A process needs to be in memory to be executed. A process however can be swapped temporarily out of
memory to a backing store, and then brought back in to memory for continued execution.
EX:- In round robin C.P.U scheduling algorithm, when a quantum expires, the memory manager will start
to swap out the process from the memory, and to swap in another process to that memory space.
EX:- In a preemptive priority-based algorithm, if a higher priority process arrives and wants service, the
memory manager can swap out the lower priority process so that the higher priority process can be loaded
and executed. When the higher priority process finishes, the lower priority process can be swapped back in
and continued. This is called Roll Out and Roll In.
Swapping requires a backing store. The backing store is commonly a fast disk. It must be large enough to
accommodate copies of all memory images for all users, and it must provide direct access to these memory
images.
Never swap a process with pending I/O.
CONTIGUOUS ALLOCATION:-
The memory is usually divided in to two partitions. One for resident O.S, and one for the user processes.
It is possible to place the O.S in either low or high memory. But usually O.S is kept in low memory because
the interrupt vectors are often in low memory.
There are two types:
1) Single partition allocation
2) Multiple partition allocation
a) Fixed sized partitions (M.F.T) b) Variable sized partitions (M.V.T)
Single partition allocation:-
The O.S is residing in low memory and the user processes are executing in high memory. We need to
protect the O.S code and data from changes by the user processes. We can provide this protection by using a
relocation register and limit registers. The relocation register contains value of the smallest physical address
and the limit register contains the range of logical addresses.
In the above fig, the address generated by the C.P.U is compared with the limit register; if the L.A is less
than the limit register, the logical address is added to the relocation register and the mapped address is sent
to memory.
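For instance (illustrative numbers, not from the text): with a relocation register of 14000 and a limit register
of 1200, a logical address 346 passes the check (346 < 1200) and maps to physical address
14000 + 346 = 14346, while a logical address 1300 fails the check and causes a trap to the O.S.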
Disadvantages:-
1) Much memory is wasted.
2) Only one user process can run at a time.
Multiple partition allocation (M.V.T):-
We search for a hole among the set of holes to allocate a process; the three strategies used to select a
free hole from the set of available holes are first fit, best fit and worst fit.
Disadvantages:-
1) External fragmentation can occur.
Fragmentation:-
The wastage of memory space is called fragmentation. There are two types .
1) Internal fragmentation 2) External fragmentation.
1) External fragmentation:- exists when enough total memory space is available to satisfy a request, but
it is not contiguous.
In this example process P5 requests 500K. In the fig the total available memory space is
(300 + 260) = 560K, but that space is not contiguous.
2) Internal fragmentation:-
The wastage of memory inside an allocated block.
EX:- In M.F.T a process is allocated a fixed partition. Suppose the
size of each partition is 5 bytes and the process requests 4 bytes of
memory. Here 1 byte of memory is wasted inside the block. This
is called internal fragmentation.
Another type of problem arises in multiple partition
allocation.
Suppose that the next process requests 18,462 bytes. If we
allocate exactly the requested block, a hole of 2 bytes is left
free. The overhead to keep track of this 2-byte hole will be
larger than the hole itself. So the general approach is to
allocate very small holes as part of the larger request. Thus the
allocated memory may be slightly larger than the requested
memory. The difference between these two numbers is internal
fragmentation.
Compaction:- one solution to the problem of external
fragmentation is compaction.
The goal of compaction is to shuffle the memory contents so
as to place all free memory together in one large block.
Here the three holes of size 100K, 300K and
260K can be compacted into one hole of size
660K.
Compaction is not always possible. If we move
the processes, then for these processes to be able to
execute in their new locations, all internal addresses
must be relocated. If the relocation is static,
compaction is not possible.
If the relocation is dynamic, then compaction is
possible.
When a compaction is possible, we must
determine its cost. The simplest compaction
algorithm is to move all processes towards one
end of memory, all holes move in the other
direction, and producing one large hole of available memory. But it is very expensive.
We note that one large hole of available memory is not at the end of memory, but rather is in the middle.
Paging:-
Another possible solution to the external fragmentation problem is paging.
The physical memory is broken into fixed sized blocks called frames.
The logical memory is also broken in to blocks of the same size called pages.
When a process is to be executed, its pages are loaded into any available memory frames from the
backing store. The backing store is divided into fixed-sized blocks that are of the same size as the memory
frames.
Paging hardware:-
Every address generated by the C.P.U is divided in to two parts:
1) page number 2) page offset
The page number is used as an index into a page table. The page table contains the base address of each
page in physical memory. This base address is combined with the page offset to define the physical
memory address.
The page size is defined by the H/W. The size of a page is typically a power of 2. A power of 2 is selected
as the page size because it makes the translation of a logical address into a page number and page offset
easy. If the size of the logical address space is 2^m and the page size is 2^n bytes, then the high-order m-n
bits of a logical address designate the page number and the n low-order bits designate the page offset.
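A minimal C sketch of this split (the 4KB page size and the sample address are illustrative):

/* Split a logical address into page number and offset for 2^12-byte pages. */
#include <stdio.h>
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_SIZE (1u << PAGE_BITS)               /* 4096 bytes */

int main(void)
{
    uint32_t logical = 20500;                     /* example logical address */
    uint32_t page    = logical >> PAGE_BITS;      /* high-order m-n bits */
    uint32_t offset  = logical & (PAGE_SIZE - 1); /* low-order n bits */
    printf("page = %u, offset = %u\n", page, offset);   /* page = 5, offset = 20 */
    return 0;
}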
By using paging we have no external fragmentation: any free frame can be allocated to a process that needs
it. However, we may have some internal fragmentation. The frames are allocated as units, so if the memory
requirements of a process do not happen to fall on page boundaries, the last frame allocated may not be
completely full.
Ex:- if a process needs n pages plus one byte, it is allocated n+1 frames, resulting in internal
fragmentation.
When a process arrives in the system to be executed, its size, expressed in pages, is examined.
Each page of the process needs one frame. Thus if the process requires n pages, there must be at least n
frames available in memory. The first page of the process is loaded into one of the allocated frames and the
frame number is put in the page table for this process. The next page is loaded into another frame and its
frame number is put into the page table, and so on.
The O.S manages physical memory, so it must maintain the allocation details of physical memory: which
frames are allocated, which frames are available, how many total frames there are, and so on. This
information is generally kept in a data structure called the frame table.
The O.S also maintains a copy of the page table for each process. Paging therefore increases the
context-switch time.
In the above fig the associative registers (the TLB) contain only a few of the page table entries. When a
logical address is generated by the C.P.U, its page number is presented to the set of associative registers
that contain page numbers and their corresponding frame numbers. If the page number is found in the
associative registers, its frame number is immediately available and is used to access memory.
If the page number is not in the associative registers, a memory reference to the page table must be made.
If the TLB is full, the O.S must select one entry for replacement. The percentage of times that a page
number is found in the associative registers is called the hit ratio. An 80 percent hit ratio means that we
find the desired page number in the associative registers 80 percent of the time.
If the page number is in the associative registers, it takes 20 nanoseconds to search the associative
registers and 100 nanoseconds to access memory, so the mapped memory access takes 120 nanoseconds.
If the page number is not in the associative registers, it takes 220 nanoseconds:
20 for searching the associative registers,
100 for the first memory access, for the page table and frame number, and
100 for accessing the desired byte in memory.
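Putting these figures together gives the effective access time (a worked check using the numbers above):
effective access time = 0.80 × 120 + 0.20 × 220 = 96 + 44 = 140 nanoseconds.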
PROTECTION:-
Memory protection in paged environment is accomplished by two ways.
1) by using valid-invalid bit.
2) Read only (or) read& write.
These protection bits are attached to the each entry of the page table.
Here the valid-invalid bit is attached to each entry in the page table. The bit is set to “v” if the entry is
valid, i.e., the page is in the process's logical address space.
The bit is set to “i” if the entry is invalid, i.e., the page is not in the process's logical address space.
Suppose a process contains pages 0 to 5 (six pages). Then page numbers 6, 7 and 8 will have “i” in the
valid-invalid bit, and if pages 6, 7 or 8 are referenced, a trap will occur.
The other method is to attach read-write or read-only bits to each entry of the page table. The page table
can then be checked to verify that no writes are being made to a read-only page.
Ex:- for a system with a 64-bit logical address space, a two-level paging scheme is no longer appropriate,
so a three-level (or deeper) paging scheme must be used.
Inverted page table:-
Each logical address in the system consists of a triple <process-id, page no, offset>.
Each inverted page table entry is a pair <process-id, page no>.
When a memory reference occurs, the part of the virtual address consisting of <process-id, page no> is
presented to the memory subsystem. The I.P.T is then searched for a match. If a match is found, say at
entry i, then the physical address <i, offset> is generated. If no match is found, then an illegal address
access has been attempted.
This scheme decreases the amount of memory needed to store the page tables, but searching takes far too
long. To alleviate this problem, a hash table is used to limit the search.
Shared pages:- Another advantage of paging is the possibility of sharing common code among processes.
SEGMENTATION:-
Paging is an arbitrary division of the logical address space into small fixed-sized pieces. Instead of using
pages, we could divide the logical address space of a process into pieces based on the semantics of the
program. Such pieces are called segments.
Segmentation is a memory management scheme that supports the user's view of memory. A segment is a
division of the logical address space that is visible to the programmer.
Segments can be of variable length. Segmentation leads to a two-dimensional address space because each
memory cell is addressed with a segment name and an offset. For easy implementation the segments are
numbered and referred to by a segment number:
<segment number, offset>
The user program is compiled, and the compiler automatically constructs segments from the input
program. The loader takes all these segments and assigns them segment numbers.
SEGMENTATION H/W:-
A logical address consists of two parts: a segment number s and an offset d. The segment number is used
as an index into the segment table. The segment table contains a segment base and a segment limit for
each segment. The segment base contains the starting physical address where the segment resides in main
memory; the segment limit contains the length of the segment. The offset d is compared with the segment
limit; if d is less than the limit, d is added to the base value to produce the address in physical memory. If
d is greater than or equal to the limit, a trap is generated.
Example of Segmentation
Virtual memory
Virtual memory is a technique that allows the execution of processes that may not be completely in main
memory. The main visible advantage of this scheme is that programs can be larger than physical memory.
Advantages:-
1) a program size may not be constrained by the amount of the physical memory that is available.
2) Each user would be able to write a programs for an extremely large virtual address space.
3) Each user program could take less physical memory. So more programs could be run at the same time. so
the CPU utilization and through put also increases.
4) Less I/O would be need to load or swap each user program in to memory. So each user program would
run faster.
5) Virtual memory makes the task of programming much easier , because the programmer no longer needs
to worry about the amount of P.M
6) virtual memory is commonly implemented by using demand paging. It can also be implemented in a
segmentation system.
Demand paging:-
A demand paging system is similar to a paging system with swapping. Processes reside on
secondary memory. When we want to execute a process, we swap it into memory; but rather than swapping
the entire process into memory, we use a lazy swapper. A lazy swapper never swaps a page into memory
unless that page will be needed.
Here we need some hardware support to distinguish between the pages that are in memory and the
pages that are on the disk. The valid-invalid bit scheme can be used for this purpose. When this bit is set to
“valid”, it indicates that the associated page is both legal and in memory. If the bit is set to “invalid”,
it indicates that the page either is not valid (i.e., not in the logical address space of the process) or is
currently on the disk. A page that is not currently in main memory is simply marked invalid, or the page
table entry contains the address of the page on disk.
When the process executes and accesses pages that are memory resident, execution
proceeds normally.
If the process tries to use a page that was not brought into memory, a page fault
occurs.
Page fault:-
If a process tries to use a page that is not currently in memory, the
situation is called a page fault.
What happens when a page fault occurs in our system:-
Six steps occur when a page fault happens:
1) We check an internal table (the page table kept in the PCB) for this process to determine
whether the reference was a valid or an invalid memory access.
2) If the reference was invalid, we terminate the process. If it was valid but the page is
not yet in memory, it is on disk and must be brought in.
3) We find a free frame.
4) We schedule a disk operation to read the desired page into the newly allocated frame.
5) When the disk read is completed, we modify the internal table kept with the process and the page table
to indicate that the page is now in memory.
6) We restart the instruction that was interrupted by the illegal address trap.
FIFO- ALGORITHM:-
The simplest page- replacement algorithm is
a FIFO algorithm. A FIFO replacement algorithm
uses the time when a page was brought in to
memory. To replace we must select oldest page.
EX:- (problem)
Reference string is
It is very easy to understand, but its performance
is not always good.
This algorithm is affected by Belady's anomaly.
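A hedged C sketch of FIFO replacement (the reference string and the 3-frame memory are invented for illustration, not the textbook's example):

/* Count page faults under FIFO replacement with 3 frames. */
#include <stdio.h>

int main(void)
{
    int ref[] = {7, 0, 1, 2, 0, 3, 0, 4};
    int n = 8, frames[3] = {-1, -1, -1};
    int next = 0, faults = 0;          /* next = index of the oldest page */

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < 3; j++)
            if (frames[j] == ref[i]) hit = 1;
        if (!hit) {                    /* page fault: evict the oldest page */
            frames[next] = ref[i];
            next = (next + 1) % 3;
            faults++;
        }
    }
    printf("page faults = %d\n", faults);
    return 0;
}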
Belady's anomaly:-
The number of faults for n+1 frames can be greater than the number of faults for n frames,
i.e., the page fault rate may increase as the number of allocated frames increases.
EX:- SEE text book
Optimal algorithm:-
It is also called OPT or MIN.
FIFO performance is not always good, so we look for an
optimal page replacement algorithm, which uses the time
at which a page will next be used.
An optimal page replacement algorithm has the lowest
page-fault rate of all algorithms.
It never suffers from Belady's anomaly.
REPLACE THE PAGE THAT WILL NOT BE
USED FOR THE LONGEST PERIOD OF TIME.
L.R.U:-
Replace the page that has not been used for the longest
period of time.
Here (for the reference string in the text book) we get 12 page
faults; using FIFO we get 15 faults, and using optimal we get 9.
The major problem is how to implement LRU
replacement: we must determine an order for the
frames defined by the time of last use.
LRU does not suffer from Belady's anomaly. It belongs to a
class of algorithms, called stack algorithms, that can never
exhibit Belady's anomaly.
Stack algorithm is an algorithm, for which it can be shown that the set of the pages in memory for ‘n’
frames is always a subset of the set of pages that would be in memory with n+1 frames.
The problem with using L.R.U is determining an order for the frames defined by the time of
last use.
This can be implemented by using 2 methods:
1) counters 2) stack
Counters:- a counter, or logical clock, is added to the C.P.U. The value of the counter is incremented
on every memory reference.
One extra field, the time of last use, is added to each entry of the page table. Whenever a
reference is made to a page, the contents of the counter register are copied to the time-of-use field in the
page table for that page. In this way we always have the “time” of the last reference to each page. We
replace the page with the smallest time value.
This scheme has some drawbacks:
1) It requires a search of the page table to find the LRU page.
2) The contents of the counter must be written to the page table on every memory reference.
3) The times must also be maintained when page tables are changed, and overflow of the clock must be
handled.
Stacks:- another approach to implementing LRU replacement is to keep a stack of page numbers.
Whenever a page is referenced, it is moved to the top of the stack. In this way the top of the stack is always
the most recently used page and the bottom is the LRU page.
Allocation algorithm:-
If we have m frames and n processes, we can allocate m/n frames to each process.
Ex:- if there are 93 frames and 5 processes, each process gets 18 frames, and the remaining 3 frames are
added to the free-frame list. This scheme is called equal allocation.
Equal allocation ignores the fact that processes have different sizes. To solve this problem we use
proportional allocation: we allocate available memory to each process according to its size.
Let the size of the process pi be si, and define
S = Σ si
Then, if the total number of available frames is m, we allocate ai frames to process pi, where
ai = (si / S) × m
Here ai must be adjusted to an integer, must be at least the minimum number of frames required by the
process, and the total must not exceed m.
For proportional allocation, we would split 62 frames between two processes, one of 10 pages and one of
127 pages, as
10/137 × 62 ≈ 4
127/137 × 62 ≈ 57
In this way both processes share the available frames according to their needs rather than equally.
Thrashing:-
Suppose a process has a large number of pages that are all in active use. If the process does not have
enough frames for these pages, it will very quickly page fault. At this point it must replace some page; but
since all its pages are in active use, it must replace a page that will be needed again right away.
Consequently it faults again very quickly, and again and again.
The process continues to fault, replacing pages for which it will then fault and bring them back. This high
paging activity is called thrashing. A process is thrashing if it is spending more time paging than executing.
File attributes:-
A file is named, and for the user's convenience it is referred to by its name. A name is usually a string of
characters. One user might create a file, whereas another user might edit that file by specifying its name. A
file has different types of attributes:
1) Name:- the name is kept in human-readable form.
2) Type:- this information is needed for systems that support different file types.
3) Location:- this information is a pointer to a device and to the location of the file on that device.
4) Size:- this indicates the size of the file in bytes or words.
5) Protection:- access-control information determines who can read, write or execute the file.
6) Time, date, and user identification:- this information may be kept for creation, last modification and
last use.
the information about all files is kept in the directory structure, that also resides on secondary storage.
File operations:-
Creating a file:-
Two steps are necessary to create a file. First, space in the file system must be found for the file. Second, an entry for the new file must be made in the directory. The directory entry records the name of the file and its location in the file system.
Writing a file:-
To write a file, we give the name of the file; the system searches the directory to find the file's location. The system must keep a write pointer to the location in the file where the next write is to take place, and the write pointer must be updated whenever a write occurs.
Reading a file:-
To read from a file, we specify the name of the file; the directory is searched for the associated entry, and the system keeps a read pointer to the location in the file where the next read is to take place. Once the read has taken place, the read pointer is updated. Both pointers are sketched below.
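These pointers correspond to the file offset a program manipulates through the standard C library; a minimal sketch, assuming a hypothetical file demo.txt:

#include <stdio.h>

int main(void) {
    /* create: find space and make a directory entry (fopen "w+" does both) */
    FILE *f = fopen("demo.txt", "w+");
    if (f == NULL)
        return 1;
    fputs("hello", f);                 /* write: advances the write pointer */
    rewind(f);                         /* move the pointer back to the start */
    char buf[16];
    if (fgets(buf, sizeof buf, f) != NULL)
        printf("%s\n", buf);           /* read: advances the read pointer */
    fclose(f);
    return 0;
}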
File types:-
1) A common technique for implementing file types is to include the type as part of the file name.
2) The name is split into two parts: a name and an extension.
The system uses the extension to indicate the type of the file and the type of operations that can be done on that file. See fig 10.2.
ACCESS METHODS:-
There are several ways that the information in a file can be accessed:
1) sequential access 2) direct access 3) other access methods.
1) Sequential access:-
The simplest access method is sequential access: information in the file is processed in order, one record after the other. The bulk of the operations on a file are reads and writes. It is based on a tape model of a file (fig 10.3). The sketch below contrasts it with direct access.
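A sketch of both methods using the standard C library, assuming a hypothetical file data.bin and a 512-byte block size: sequential access reads the next block each time, while direct access positions to an arbitrary block first.

#include <stdio.h>

#define BLOCK 512   /* assumed block size */

int main(void) {
    char buf[BLOCK];
    FILE *f = fopen("data.bin", "rb");
    if (f == NULL)
        return 1;
    /* sequential access: process the blocks in order, one after the other */
    while (fread(buf, 1, BLOCK, f) == BLOCK)
        ;                                   /* ...process buf here... */
    /* direct access: jump straight to block n, then read it */
    long n = 7;
    fseek(f, n * BLOCK, SEEK_SET);
    fread(buf, 1, BLOCK, f);
    fclose(f);
    return 0;
}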
Directory structures:-
Operations that are performed on a directory (read in text book): search for a file, create a file, delete a file, list a directory, rename a file, and traverse the file system.
Two-level directory:-
The major disadvantage of a single-level directory is the confusion of file names between different users. The standard solution is to create a separate directory for each user.
In a 2-level directory structure, each user has her own user file directory (UFD). Each UFD has a similar structure. The user first searches the master file directory (MFD); the MFD is indexed by user name, and each entry points to the UFD for that user (fig 10.8).
To create a file for a user, the O.S. searches only that user's UFD to find whether another file of that name exists. To delete a file, the O.S. searches only the local UFD, so it cannot accidentally delete another user's file that has the same name.
This solves the name-collision problem, but it still has a disadvantage: when users want to cooperate on some task and to access one another's files, some systems simply do not allow local user files to be accessed by other users. The lookup is sketched below.
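A minimal C sketch of the two-level lookup, with hypothetical MFD/UFD structures; because the search is confined to the named user's UFD, equal file names under different users cannot collide:

#include <stdio.h>
#include <string.h>

struct ufd { const char *files[4]; };                     /* user file directory */
struct mfd_entry { const char *user; struct ufd *dir; };  /* one MFD entry */

/* The MFD is indexed by user name; only that user's UFD is searched. */
int lookup(struct mfd_entry *mfd, int n, const char *user, const char *file) {
    for (int i = 0; i < n; i++)
        if (strcmp(mfd[i].user, user) == 0)
            for (int j = 0; j < 4 && mfd[i].dir->files[j]; j++)
                if (strcmp(mfd[i].dir->files[j], file) == 0)
                    return 1;                 /* found in this user's UFD */
    return 0;                                 /* no such user or file */
}

int main(void) {
    struct ufd u1 = { { "ob", "test" } };
    struct mfd_entry mfd[] = { { "user1", &u1 } };
    printf("%d\n", lookup(mfd, 1, "user1", "ob"));   /* the path user1/ob */
    return 0;
}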
Any file is accessed by using a path name; here the user name and a file name together define a path name.
Ex:- user1/ob
In MS-DOS a file specification is
C:\directory name\file name
A directory contains a set of subdirectories or files. A directory is simply another file, but it is treated in a special way. Path names can be of two types:
1) absolute path and 2) relative path.
An absolute path name begins at the root and follows a path down to the specified file, giving the directory names on the path.
Ex:- root/spell/mail/prt/first
A relative path name defines a path from the current directory. Ex:- prt/first is a relative path name.
Acyclic-graph directory:-
Consider two programmers who are working on a joint project. The files associated with that project can be stored in a subdirectory, separating them from the other projects and files of the two programmers. The common subdirectory is shared by both programmers. A shared directory or file will exist in the file system in two places at once.
Notice that a shared file is not the same as two copies of the file. With two copies, each programmer can view the copy rather than the original, but if one programmer changes the file, the changes will not appear in the other's copy. With a shared file there is only one actual file, so any changes made by one person are immediately visible to the other.
A tree structure prohibits the sharing of files or directories. An acyclic graph allows directories to have shared subdirectories and files (fig 10.10). It is more complex but more flexible; several problems may occur when traversing the graph or when deleting shared files.
General graph directory:-
If links may point anywhere, cycles can appear in the directory graph; traversal and deletion then need extra care (for example, garbage collection) to avoid processing shared sections twice or looping forever.
Protection:-
When information is kept in the system, the major worry is its protection from both physical damage (reliability) and improper access (protection).
Reliability is generally provided by keeping duplicate copies of files.
Protection can be provided in many ways. For a small single-user system, we might provide protection by physically removing the floppy disks; in a multi-user system, other mechanisms are needed.
1) Types of access:-
If the system does not permit access to the files of other users, protection is not needed. Protection mechanisms work by controlling access: access is permitted or denied depending on several factors, one of which is the type of access requested. For example, a file marked for reading can only be read. The controllable access types are:
Read:- read from the file.
Write:- write or rewrite the file.
Execute:- load the file into memory and execute it.
Append:- write new information at the end of the file.
Delete:- delete the file and free its space for possible reuse.
One simple encoding of these access types is sketched below.
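A common way to record these access types is one bit per operation; a minimal C sketch (the bit layout is an assumption, not a description of any particular system):

#include <stdio.h>

/* one bit per access type */
enum { F_READ = 1, F_WRITE = 2, F_EXECUTE = 4, F_APPEND = 8, F_DELETE = 16 };

/* access is permitted only if every requested bit is set */
int allowed(unsigned perms, unsigned requested) {
    return (perms & requested) == requested;
}

int main(void) {
    unsigned file_perms = F_READ;                  /* a read-only file */
    printf("%d\n", allowed(file_perms, F_READ));   /* 1: permitted */
    printf("%d\n", allowed(file_perms, F_WRITE));  /* 0: denied */
    return 0;
}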
2) Access control:-
The most common approach is to make access depend on the identity of the user, for example by keeping with each file an access-control list (ACL) specifying which users may perform which operations.
FILE SYSTEM IMPLEMENTATION
Magnetic tape
➢ Was an early secondary-storage medium
➢ Relatively permanent and holds large quantities of data
➢ Access time is slow
➢ Random access ~1000 times slower than disk
➢ Mainly used for backup, storage of infrequently-used data, and as a transfer medium between systems
➢ Kept in a spool and wound or rewound past a read-write head
➢ Once the data is under the head, transfer rates are comparable to disk
➢ 20-200GB typical storage
➢ Common technologies are 4mm, 8mm, 19mm, LTO-2 and SDLT
Disk Structure
Disk drives are addressed as large 1-dimensional arrays of logical blocks,
where the logical block is the smallest unit of transfer.
The 1-dimensional array of logical blocks is mapped into the sectors of the disk sequentially.
➢ Sector 0 is the first sector of the first track on the outermost cylinder.
➢ Mapping proceeds in order through that track, then the rest of the tracks in that cylinder, and
then through the rest of the cylinders from outermost to innermost.
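A C sketch of this sequential mapping, assuming a fixed number of sectors per track and tracks per cylinder (real drives vary sectors per track across zones, so the actual mapping is more involved):

#include <stdio.h>

#define SECTORS_PER_TRACK 63    /* assumed constant; real disks vary by zone */
#define TRACKS_PER_CYLINDER 16  /* one track per recording surface */

/* Fill each track in order, then the other tracks of the cylinder,
   then the next cylinder, exactly as the mapping above describes. */
void map_block(long lba, long *cyl, long *track, long *sector) {
    *sector = lba % SECTORS_PER_TRACK;
    *track  = (lba / SECTORS_PER_TRACK) % TRACKS_PER_CYLINDER;
    *cyl    = lba / ((long)SECTORS_PER_TRACK * TRACKS_PER_CYLINDER);
}

int main(void) {
    long c, t, s;
    map_block(1000, &c, &t, &s);
    printf("block 1000 -> cylinder %ld, track %ld, sector %ld\n", c, t, s);
    return 0;
}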
Disk Attachment
Host-attached storage is accessed through I/O ports talking to I/O busses.
SCSI itself is a bus, with up to 16 devices on one cable; a SCSI initiator requests an operation and the SCSI targets perform the tasks.
Each target can have up to 8 logical units (disks attached to the device controller).
FC is a high-speed serial architecture:
➢ Can be a switched fabric with a 24-bit address space – the basis of storage-area networks (SANs), in which many hosts attach to many storage units
➢ Can be an arbitrated loop (FC-AL) of 126 devices
Network-Attached Storage
Network-attached storage (NAS) is storage made available over a network rather than over a local connection (such as a bus).
1. NFS and CIFS are common protocols
2. Implemented via remote procedure calls (RPCs) between host and storage
The newer iSCSI protocol uses an IP network to carry the SCSI protocol.
Storage Area Network
1. Common in large storage environments (and becoming more common)
2. Multiple hosts attached to multiple storage arrays - flexible
Disk Scheduling
➢ The operating system is responsible for using the hardware efficiently; for the disk drives, this means having fast access time and high disk bandwidth.
➢ Access time has two major components:
✓ Seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector.
✓ Rotational latency is the additional time spent waiting for the disk to rotate the desired sector under the disk head.
➢ We want to minimize seek time.
➢ Seek time ≈ seek distance.
➢ Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer.
➢ Several algorithms exist to schedule the servicing of disk I/O requests.
➢ We illustrate them with a request queue of cylinder numbers (0-199):
98, 183, 37, 122, 14, 124, 65, 67, with the head initially at cylinder 53
1.FCFS
➢ Services the requests in the order in which they arrive.
➢ Illustration shows total head movement of 640 cylinders.
2.SSTF
➢ Selects the request with the minimum seek time from the current head position.
➢ SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests.
➢ Illustration shows total head movement of 236 cylinders.
3.SCAN
➢ The disk arm starts at one end of the disk, and moves toward the other end, servicing requests until it gets to the
other end of the disk, where the head movement is reversed and servicing continues.
➢ Sometimes called the elevator algorithm.
➢ Illustration shows total head movement of 208 cylinders.
4.C-SCAN
➢ Provides a more uniform wait time than SCAN.
➢ The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other
end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
➢ Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.
5.C-LOOK
➢ Version of C-SCAN
➢ Arm only goes as far as the last request in each direction, then reverses direction immediately, without
first going all the way to the end of the disk.
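A small C sketch that reproduces the head-movement totals quoted above for FCFS and SSTF on the example queue, with the head initially at cylinder 53 (the function names are hypothetical):

#include <stdio.h>
#include <stdlib.h>

#define N 8

/* FCFS: service the requests in arrival order */
int fcfs(int head, const int *q) {
    int total = 0;
    for (int i = 0; i < N; i++) {
        total += abs(q[i] - head);
        head = q[i];
    }
    return total;
}

/* SSTF: always service the pending request nearest the head */
int sstf(int head, const int *q) {
    int done[N] = { 0 }, total = 0;
    for (int k = 0; k < N; k++) {
        int best = -1;
        for (int i = 0; i < N; i++)
            if (!done[i] && (best < 0 || abs(q[i] - head) < abs(q[best] - head)))
                best = i;
        total += abs(q[best] - head);
        head = q[best];
        done[best] = 1;
    }
    return total;
}

int main(void) {
    int q[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    printf("FCFS: %d cylinders\n", fcfs(53, q));   /* 640 */
    printf("SSTF: %d cylinders\n", sstf(53, q));   /* 236 */
    return 0;
}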