COA Full Notes


Gambella University Department of Computer Science

CHAPTER-1
CENTRAL PROCESSING UNIT
1.1 Introduction
Computer organization refers to the operational units and their interconnections that
realize the architectural specifications. Organization is the implementation of a computer
system in terms of the interconnection of its functional units: CPU, memory, bus, and I/O
devices.
Computer architecture is the computer structure and behavior visible to a
programmer. Architecture is concerned with the basic instruction design, which can
lead to better performance of the system. The basic instruction design includes the
instruction formats, addressing modes, the instruction set, and the general organization
of the CPU registers.
So, the organization of a computer is the implementation of its architecture, tailored
to fit the intended price and performance measures.

The CPU is the brain or engine of the PC: it performs the bulk of the system's calculating and data
processing. The CPU is an integrated circuit that performs most of the work of a computer.

The CPU is usually the most expensive component in the system, often costing four or more
times as much as the motherboard it plugs into.

The CPU is made up of three major parts, as shown in figure below.

Figure: Major Components of CPU


1. Arithmetic Logic Unit (ALU): performs the required micro-operations
(arithmetic and logical operations) for executing the instructions.
2. Register set: stores intermediate data used during the execution of the
instructions.


3. Control Unit: supervises the transfer of information among the registers and
instructs the ALU as to which operation to perform.

1.2 General Register Organization


Since having to refer to memory locations for different applications is time consuming, it
is more convenient and more efficient to store intermediate values (such as pointers,
counters, return addresses, temporary results, etc.) in processor registers. The registers
communicate with each other not only for direct data transfers, but also while performing
various micro-operations. Hence it is necessary to provide a common unit that can perform
all the arithmetic, logic, and shift micro-operations in the processor.

The design of a CPU is a task that involves choosing the hardware for implementing the
machine instructions.

Let us describe how registers communicate with the ALU through buses and explain the
operation of the memory stack.

Figure: Register set with common ALU


The control unit that operates the CPU bus system directs the information flow through
the registers and ALU by selecting the various components in the system.

For example, to perform the operation:


R1 R2 + R3
1. The control must provide binary selection variables to the following selector
inputs:
2. MUX A Selector (SELA): to place the content of R2 into bus A.
3. MUX B Selector (SELB): to place the content of R3 into bus B.
4. ALU operation selector (OPR): to provide the arithmetic addition A + B.
5. Decoder destination selector (SELD): to transfer the content of the output bus onto Rl.

There are therefore 14 binary selection inputs in the unit, and their combined value
specifies a control word. A control word (CW) is a word whose individual bits represent
the various control signals.

Table :Encoding of Register Selection Fields.

The 3-bit binary code listed in the first column of the table specifies the encoding for each
of the three register-selection fields.

The encoding of the ALU operations for the CPU is specified in table below.

Table: Encoding Of ALU Operation


The OPR field has five bits and each operation is designated with a symbolic name.

1.3 Stack Organization

A useful feature that is included in the CPU of most computers is a stack, or Last-In-First-Out
(LIFO) list. A stack is a storage device that stores information in such a manner that the item
stored last is the first item retrieved.

The stack in digital computers is essentially a memory unit with an address register that can
count only (after an initial value is loaded into it). The register that holds the address for the
stack is called a stack pointer (SP) because its value always points at the top item in the
stack. The physical registers of a stack are always available for reading or writing. It is the
content of the word that is inserted or deleted.

The two operations of a stack are the insertion and deletion of items.

 The operation of insertion is called push (or push-down) because it can be thought of
as the result of pushing a new item on top of a stack.
 The operation of deletion is called pop (or pop-up) because it can be thought of as the
result of removing one item so that the stack pops up. However, nothing is physically
pushed or popped in a computer stack; these operations are simulated by incrementing
or decrementing the stack pointer register.


Register Stack

A stack can be placed in a portion of a large memory or it can be organized as a collection of
a finite number of memory words or registers. The figure below shows the organization of a
64-word register stack.

Figure: Block Diagram of a 64 word stack

The stack pointer register SP contains a binary number whose value is equal to the address of
the word that is currently on top of the stack. Three items are placed in the stack: A, B, and
C, in that order. Item C is on top of the stack, so the content of SP is now 3.

In a 64-word stack, the stack pointer contains 6 bits because 2^6 = 64. Since SP has only six
bits, it cannot hold a number greater than 63 (111111 in binary). When 63 is incremented
by 1, the result is 0, since 111111 + 1 = 1000000 in binary, but SP can accommodate only the
six least significant bits. Similarly, when 000000 is decremented by 1, the result is 111111.
The one-bit register EMTY is set to 1 when the stack is empty of items, and the one-bit
register FULL is set to 1 when the stack is full. DR is the data register that holds the
binary data to be written into or read out of the stack.

Initially, SP is cleared to 0, EMTY is set to 1, and FULL is cleared to 0, so that SP points to
the word at address 0 and the stack is marked empty and not full. If the stack is not full (if
FULL = 0), a new item is inserted with a push operation. The push operation is implemented
with the following sequence of micro-operations:

SP SP + 1 Increment stack pointer


M[SP] DR Write item on top of the stack
If(SP=0) then (FULL1) Check if Stack is full
EMTY 0 Mark the stack not empty
A new item is deleted from the stack if the stack is not empty (if EMTY = 0). The pop
operation is implemented with the following sequence of micro-operation:

DR  M[SP] Read item from the top of stack.


SP  SP - 1 Decrement stack pointer
If (SP=0) then (EMTY  1) Check if stack is empty
FULL  0 Mark the stack not full
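The push and pop sequences above can be sketched as a small simulation. This is a minimal model, not an implementation from the text: the class name and method names are illustrative, while the SP/FULL/EMTY behavior and the 64-word size follow the micro-operations above (the modular arithmetic on SP mirrors the 6-bit counter wraparound).

```python
class RegisterStack:
    """64-word register stack with SP, FULL, and EMTY flags."""

    def __init__(self, size=64):
        self.size = size                 # 2**6 words -> 6-bit stack pointer
        self.mem = [0] * size
        self.sp = 0                      # SP cleared to 0
        self.full = False                # FULL cleared to 0
        self.empty = True                # EMTY set to 1

    def push(self, dr):
        if self.full:
            raise OverflowError("stack full")
        self.sp = (self.sp + 1) % self.size   # SP <- SP + 1 (6-bit wraparound)
        self.mem[self.sp] = dr                # M[SP] <- DR
        if self.sp == 0:                      # if (SP = 0) then FULL <- 1
            self.full = True
        self.empty = False                    # EMTY <- 0

    def pop(self):
        if self.empty:
            raise IndexError("stack empty")
        dr = self.mem[self.sp]                # DR <- M[SP]
        self.sp = (self.sp - 1) % self.size   # SP <- SP - 1
        if self.sp == 0:                      # if (SP = 0) then EMTY <- 1
            self.empty = True
        self.full = False                     # FULL <- 0
        return dr

s = RegisterStack()
for item in ("A", "B", "C"):
    s.push(item)
print(s.sp)     # 3 -> SP points at item C, as in the figure
print(s.pop())  # retrieves C first (LIFO)
```

Note how pushing A, B, and C leaves SP at 3, matching the text's description of the 64-word stack.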
Memory Stack
A stack can exist as a stand-alone unit or can be implemented in a RAM attached to a CPU.
The implementation of a stack in the CPU is done by assigning a portion of memory to a
stack operation and using a processor register as a stack pointer. The figure below shows a
portion of computer memory partitioned into three segments: program, data, and stack.

Figure: Computer memory with program, data, and stack segments


 The program counter (PC) points at the address of the next instruction in the program.


 The address register (AR) points at an array of data.


 The stack pointer (SP) points at the top of the stack.
 PC is used during the fetch phase to read an instruction.
 AR is used during the execute phase to read an operand.
 SP is used to push or pop items into or from the stack.

A new item is inserted with the push operation as follows:

SP ← SP - 1
M[SP] ← DR

A new item is deleted with a pop operation as follows:

DR ← M[SP]
SP ← SP + 1
Reverse Polish Notation
The common mathematical method of writing arithmetic expressions imposes difficulties
when they are evaluated by a computer.

 Common arithmetic expressions are written in infix notation, with each operator
written between the operands.
 Prefix notation (Polish notation) places the operator before the operands.
 Postfix notation (Reverse Polish Notation, RPN) places the operator after the
operands.

The following examples demonstrate the three representations:

A+B Infix notation

+AB Prefix or polish notation

AB+ Postfix or reverse polish notation

The reverse Polish notation is in a form suitable for stack manipulation. The expression:
A*B+C*D
is written in reverse Polish notation as:


AB*CD*+
Exercise:
Convert the following infix notation to reverse polish notation using stack data structure.
(A + B) * [C *(D + E) + F]
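One way to do the conversion asked for in the exercise is Dijkstra's shunting-yard algorithm, which uses an operator stack exactly as the exercise suggests. The sketch below is an illustrative implementation, not from the text: operands are single letters, and the square brackets are treated like parentheses.

```python
# Operator precedence: * and / bind tighter than + and -.
PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def infix_to_rpn(expr):
    """Convert an infix expression to RPN using an operator stack."""
    output, stack = [], []
    for tok in expr.replace('[', '(').replace(']', ')').replace(' ', ''):
        if tok.isalnum():
            output.append(tok)               # operands go straight to output
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':
            while stack[-1] != '(':          # unwind until the matching '('
                output.append(stack.pop())
            stack.pop()                      # discard the '('
        else:                                # an operator
            while stack and stack[-1] != '(' and PREC[stack[-1]] >= PREC[tok]:
                output.append(stack.pop())   # pop higher/equal precedence ops
            stack.append(tok)
    while stack:                             # flush remaining operators
        output.append(stack.pop())
    return ''.join(output)

print(infix_to_rpn('A*B+C*D'))               # AB*CD*+
print(infix_to_rpn('(A+B)*[C*(D+E)+F]'))     # the exercise expression
```

Running the second call answers the exercise; the first call reproduces the A*B+C*D example from the text.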
Evaluation of Arithmetic Expressions

Reverse Polish notation, combined with a stack arrangement of registers, is the most efficient
way known for evaluating arithmetic expressions. The stack is particularly useful for handling
long, complex problems involving chain calculations. It is based on the fact that any
arithmetic expression can be expressed in parentheses-free polish notation.

Using stack, evaluate the following arithmetic expression:

(3 * 4) + (5 * 6)
In reverse Polish notation, it is expressed as:
3 4 * 5 6 * +
Now consider the stack operations shown in figure below which shows the evaluation of the
postfix Expressions:

Figure: Stack operations to evaluate 3*4 + 5*6
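The stack operations in the figure can be sketched as a minimal RPN evaluator (an illustrative sketch, not code from the text): operands are pushed, and each operator pops the top two items, applies the operation, and pushes the result back.

```python
def eval_rpn(tokens):
    """Evaluate a list of RPN tokens with a stack."""
    stack = []
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b}
    for tok in tokens:
        if tok in ops:
            b = stack.pop()          # top of stack is the second operand
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))   # operand: push onto the stack
    return stack.pop()               # final result is left on top

print(eval_rpn(['3', '4', '*', '5', '6', '*', '+']))   # 42
```

The call evaluates 3 4 * 5 6 * + to 42, matching (3*4) + (5*6) from the text.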


CHAPTER -2
INSTRUCTION FORMATS
A computer will usually have a variety of instruction code formats. It is the function of the
control unit within the CPU to interpret each instruction code and provide the necessary
control functions needed to process the instruction.

The format of an instruction is usually depicted in a rectangular box symbolizing the bits of
the instruction as they appear in memory words or in a control register. A computer
instruction has the following format:

The bits of the instruction are divided into groups called fields.

The most common fields found in instruction formats are:

1. An operation code field that specifies the operation to be performed (add, subtract,
complement, shift, etc.).
2. An address field that designates a memory address or a processor register.
3. A mode field that specifies the way the operand or the effective address is determined.
(Optional)

Computers may have instructions of several different lengths containing a varying number
of address fields, which depends on the internal organization of their registers. Most computers
fall into one of three types of CPU organization:
1. Single accumulator organization.
2. General register organization.
3. Stack organization.
In a single accumulator organization, all operations are performed with an implied
accumulator register. The instruction format in this type of computer uses one address field.


2.1 Instruction type

1. Three- Address Instructions


Computers with three-address instruction formats can use each address field to specify
either a processor register or a memory operand. The program in assembly language
that evaluates X = (A+B)*(C+D) is shown below, together with comments that explain
the register transfer operation of each instruction.

ADD R1, A, B    R1 ← M[A] + M[B]
ADD R2, C, D    R2 ← M[C] + M[D]
MUL X, R1, R2   M[X] ← R1 * R2

2. Two-Address Instructions
Two-address instructions are the most common in commercial computers. Here again
each address field can specify either a processor register or a memory word. The
program to evaluate X = (A+B)*(C+D) is as follows:

MOV R1, A    R1 ← M[A]
ADD R1, B    R1 ← R1 + M[B]
MOV R2, C    R2 ← M[C]
ADD R2, D    R2 ← R2 + M[D]
MUL R1, R2   R1 ← R1 * R2
MOV X, R1    M[X] ← R1

3. One-Address Instructions
One-address instructions use an implied accumulator (AC) register for all data
manipulation. For multiplication and division there is a need for a second register.
However, here we will neglect the second register and assume that the AC contains the
result of all operations. The program to evaluate X= (A+B)*(C+D) is:
LOAD A AC M[A]
ADD B AC AC + M[B]
STORE T M[T]  AC
LOAD C AC  M[C]


ADD D      AC ← AC + M[D]
MUL T      AC ← AC * M[T]
STORE X    M[X] ← AC

T is the address of a temporary memory location required for storing the intermediate
result.
4. Zero-Address Instructions
A Stack-organized computer does not use an address field for the instructions ADD and
MUL. The PUSH and POP instructions, however, need an address field to specify the
operand that communicates with the stack. The following program shows how X =
(A+B)*(C+D) will be written for a stack-organized computer. (TOS stands for top of
stack.)
PUSH A TOS A

PUSH B TOS B

ADD TOS (A + B)

PUSH C TOS  C

PUSH D TOS D

ADD TOS (C +D)

MUL TOS (C+D) * (A+B)

POP X M[X] TOS
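The zero-address program above can be sketched as a tiny stack-machine simulation. This is an illustrative model only: memory is a dictionary, and the values of A, B, C, and D are hypothetical numbers chosen to make the result easy to check.

```python
# Memory with illustrative operand values (not from the text).
M = {'A': 2, 'B': 3, 'C': 4, 'D': 5}
stack = []

def PUSH(addr): stack.append(M[addr])          # TOS <- M[addr]
def POP(addr):  M[addr] = stack.pop()          # M[addr] <- TOS
def ADD():      stack.append(stack.pop() + stack.pop())
def MUL():      stack.append(stack.pop() * stack.pop())

PUSH('A'); PUSH('B'); ADD()   # TOS <- (A + B)
PUSH('C'); PUSH('D'); ADD()   # TOS <- (C + D)
MUL()                         # TOS <- (C + D) * (A + B)
POP('X')                      # M[X] <- TOS
print(M['X'])                 # (2 + 3) * (4 + 5) = 45
```

The instruction sequence is exactly the one listed for X = (A+B)*(C+D), and the stack is empty when the program finishes.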

RISC Instructions

The instruction set of a typical RISC processor is restricted to the use of load and store
instructions when communicating between memory and CPU. All other instructions are
executed within the registers of the CPU without referring to memory. The following is a
program to evaluate X = (A+B)*(C+D).

LOAD R1, A       R1 ← M[A]
LOAD R2, B       R2 ← M[B]
LOAD R3, C       R3 ← M[C]
LOAD R4, D       R4 ← M[D]
ADD R1, R1, R2   R1 ← R1 + R2
ADD R3, R3, R4   R3 ← R3 + R4
MUL R1, R1, R3   R1 ← R1 * R3
STORE X, R1      M[X] ← R1


2.2 Addressing Modes

The way the operands are chosen during program execution is dependent on the
addressing mode of the instruction. The addressing mode specifies a rule for
interpreting or modifying the address field of the instruction before the operand is
actually referenced. Computers use addressing mode techniques for the purpose of
accommodating one or both of the following provisions:

1. To give programming versatility to the user by providing such facilities as
pointers to memory, counters for loop control, indexing of data, and program
relocation.
2. To reduce the number of bits in the addressing field of the instruction.

To understand the various addressing modes, it is important to understand the following
basic instruction cycle phases of the computer:

1. Fetch the instruction from memory.


2. Decode the instruction
3. Execute the instruction.
 Program counter (PC) is a register that keeps track of the instructions in the
program stored in memory. PC holds the address of the instruction to be
executed next and is incremented each time an instruction is fetched from
memory.
 The decoding done in step 2 determines the operation to be performed, the
addressing mode of the instruction, and the location of the operands.
 The computer then executes the instruction and returns to step 1 to fetch the next
instruction in sequence.

The common categories of addressing modes are the following:

1. Implied Mode

In this mode the operands are specified implicitly in the definition of the instruction.

Examples:

 All register reference instructions that use an accumulator.


 Zero-address instructions in a stack-organized computer


2. Immediate Mode
In this mode the operand is specified in the instruction itself. In other words, an
immediate-mode instruction has an operand field rather than an address field.

Immediate-mode instructions are useful for initializing registers to a constant value.

Note:
Both implied and immediate modes do not need an address field at all.
3. Register Mode

In this mode, the address field specifies a processor register. The particular
register is selected from a register field in the instruction. A k-bit field can specify any
one of 2^k registers.

4. Register Indirect Mode


In this mode the instruction specifies a register in the CPU whose contents give the
address of the operand in memory. In other words, the selected register contains the
address of the operand rather than the operand itself.


5. Auto increment or Auto decrement Mode


This is similar to the register indirect mode except that the register is automatically
incremented or decremented after (or before) its value is used to access memory.

6. Indirect Address Mode

In this mode the address field of the instruction gives the address where the effective
address is stored in memory. Control fetches the instruction from memory and uses its
address part to access memory again to read the effective address.

A few addressing modes require that the address field of the instruction be added to the
content of a specific register in the CPU. The effective address in these modes is obtained
from the following computation:

Effective address = address part of instruction + content of CPU register


7. Relative Address Mode
In this mode the content of the program counter is added to the address part of the
instruction in order to obtain the effective address. The address part of the instruction is
usually a signed number (in 2’s complement representation) which can be either
positive or negative.


Example:
 Assume that a program counter contains the number 825 and the address part of
the instruction contains the number 24.
 The instruction at location 825 is read from memory during the fetch phase and
the program counter is then incremented by one to 826.
 The effective address = 826 + 24 = 850.
 This is 24 memory locations forward from the address of the next instruction.
8. Indexed Addressing Mode
In this mode the content of an index register is added to the address part of the
instruction to obtain the effective address. The index register is a special CPU register
that contains an index value. The address field of the instruction defines the beginning
address of a data array in memory.

9. Base Register Addressing Mode


In this mode the content of a base register is added to the address part of the instruction
to obtain the effective address. This is similar to the indexed addressing mode except
that the register is now called a base register instead of an index register. The
difference between the two modes is in the way they are used rather than in the way
that they are computed.
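The register-based modes above all form the effective address the same way, differing only in which register is used. The sketch below is a simplified illustrative model (the function, parameter names, and register values are assumptions, not from the text):

```python
def effective_address(mode, address_part, pc=0, xr=0, br=0, r1=0):
    """Return the effective address under a few of the modes described above."""
    if mode == 'direct':
        return address_part            # EA is the address part itself
    if mode == 'register_indirect':
        return r1                      # EA is the content of the register
    if mode == 'relative':
        return pc + address_part       # EA = PC + address part
    if mode == 'indexed':
        return xr + address_part       # EA = index register + address part
    if mode == 'base':
        return br + address_part       # EA = base register + address part
    raise ValueError(mode)

# The relative-mode example from the text: PC holds 826 after the fetch,
# the address part is 24, so the effective address is 826 + 24 = 850.
print(effective_address('relative', 24, pc=826))   # 850
```

Note how indexed and base-register modes compute the address identically; as the text says, the difference is in how they are used, not how the address is computed.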

2.3 Data Transfer and Manipulation


Most computer instructions can be classified into the following categories:

1. Data transfer instructions


2. Data manipulation instructions
3. Program control instructions

1. Data Transfer Instruction

Data transfer instructions move data from one place to another without changing the
data content. The most common transfers are between memory and processor registers,
between processor registers and input or output, and among the processor registers
themselves. The following table gives a list of eight data transfer instructions used in
many computers.


Table: Typical Data Transfer Instruction

Table: Eight Addressing Mode for the Load Instruction

2. Data Manipulation Instructions

Data manipulation instructions perform operations on data and provide the
computational capabilities for the computer. The data manipulation instructions in a
typical computer are usually divided into three basic types:

1. Arithmetic instructions
2. Logical and bit manipulation instructions
3. Shift instructions

Arithmetic Instructions
The four basic arithmetic operations are addition, subtraction, multiplication and
division. Most computers provide instructions for all four operations.

Table: Typical Arithmetic Instruction

The mnemonics for three add instructions that specify different data types are shown
below.
ADDI Add two binary integer numbers
ADDF Add two floating-point numbers
ADDD Add two decimal numbers in BCD
Logical and Bit Manipulation Instructions
Logical instructions perform binary operations on strings of bits stored in registers. They
are useful for manipulating individual bits or a group of bits that represent binary-coded
information.
Some typical logical and bit manipulation instructions are listed in table below.
Table: Typical Logical and Bit Manipulation Instructions


Shift Instructions

Instructions to shift the content of an operand are quite useful and are often provided in
several variations. Shifts are operations in which the bits of a word are moved to the left
or to the right. Shift instructions may specify either logical shifts, arithmetic shifts, or rotate-
type operations.

Table: Typical Shift Instructions

A possible instruction code format of a shift instruction may include five fields as follows:
OP REG TYPE RL COUNT

 OP is the operation code field.
 REG is a register address that specifies the location of the operand.
 TYPE is a 2-bit field specifying the four different types of shifts.
 RL is a 1-bit field specifying a shift right or left.
 COUNT is a k-bit field specifying up to 2^k - 1 shifts.

With such a format, it is possible to specify the type of shift, the direction, and the
number of shifts, all in one instruction.
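The five fields above can be packed into and extracted from a single instruction word with shift and mask operations. The field widths below (4-bit OP, 3-bit REG, 2-bit TYPE, 1-bit RL, 3-bit COUNT) are illustrative assumptions, not specified in the text; with k = 3, COUNT can specify up to 2^3 - 1 = 7 shifts.

```python
# Bit layout (illustrative): OP[12:9] REG[8:6] TYPE[5:4] RL[3] COUNT[2:0]
def pack(op, reg, typ, rl, count):
    """Assemble the five fields into one instruction word."""
    return (op << 9) | (reg << 6) | (typ << 4) | (rl << 3) | count

def unpack(word):
    """Extract the five fields back out of an instruction word."""
    return {'OP':    (word >> 9) & 0xF,
            'REG':   (word >> 6) & 0x7,
            'TYPE':  (word >> 4) & 0x3,
            'RL':    (word >> 3) & 0x1,
            'COUNT':  word       & 0x7}

w = pack(op=0b1010, reg=0b011, typ=0b01, rl=1, count=5)
print(unpack(w))
```

Round-tripping a word through pack and unpack recovers every field, which is essentially what the control unit's instruction decoder does in hardware.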

2.4 Program Control

Program control instructions specify conditions for altering the content of
the program counter, while data transfer and manipulation instructions specify
conditions for data-processing operations. The change in value of the program counter
as a result of the execution of a program control instruction causes a break in the
sequence of instruction execution. This is an important feature in digital computers, as it

provides control over the flow of program execution and a capability for branching to
different program segments.

Some typical program control instructions are listed in the following table.
Table: Typical Program Control Instructions

Status Bit Conditions

It is sometimes convenient to supplement the ALU circuit in the CPU with a status
register where status bit conditions can be stored for further analysis. Status bits are also
called condition-code bits or flag bits.

The following figure shows the block diagram of an 8-bit ALU with a 4-bit status register.

Figure: Status Register Bits


The four status bits are symbolized by C, S, Z, and V. The bits are set or cleared as a
result of an operation performed in the ALU.

1. Bit C (carry) is set to 1 if the end carry C8 is 1. It is cleared to 0 if the carry is 0.


2. Bit S (sign) is set to 1 if the highest-order bit F7 is 1. It is set to 0 if the bit is 0.
3. Bit Z (Zero) is set to 1 if the output of the ALU contains all 0’s. It is cleared to 0
otherwise. In other words, Z = 1 if the output is zero and Z = 0 if the output is not
zero.
4. Bit V (overflow) is set to 1 if the exclusive-OR of the last two carries is equal to 1,
and cleared to 0 otherwise. This is the condition for an overflow when negative
numbers are in 2's complement. For the 8-bit ALU, V = 1 if the output is greater
than +127 or less than -128.

Example:

Consider an 8-bit ALU. The largest unsigned number that can be accommodated in
8 bits is 255. The range of signed numbers is between +127 and -128. The subtraction
of two numbers is the same whether they are unsigned or in signed 2's complement
representation. Let A = 11110000 and B = 00010100. To perform A - B, the ALU takes the
2's complement of B and adds it to A.

The compare instruction updates the status bits as follows: C = 1 because there is a carry
out of the last stage; S = 1 because the leftmost bit is 1; V = 0 because the last two carries
are both equal to 1; and Z = 0 because the result is not equal to 0.
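The example can be checked with a short sketch that derives the four status bits for an 8-bit subtraction. This is an illustrative model (the function name and the signed-range test for V are my formulation; the signed-range test is equivalent to XOR-ing the last two carries):

```python
def sub_flags(a, b, bits=8):
    """Compute F and the C, S, Z, V bits for a - b on a `bits`-wide ALU."""
    mask = (1 << bits) - 1
    result = a + ((~b + 1) & mask)         # A plus the 2's complement of B
    c = 1 if result > mask else 0          # C: end carry out of the last stage
    f = result & mask                      # F: the 8-bit ALU output
    s = (f >> (bits - 1)) & 1              # S: highest-order (sign) bit
    z = 1 if f == 0 else 0                 # Z: output is all 0's
    # V: set if the true signed difference falls outside [-128, +127]
    sa = a - ((a >> (bits - 1)) & 1) * (1 << bits)
    sb = b - ((b >> (bits - 1)) & 1) * (1 << bits)
    v = 0 if -(1 << bits - 1) <= sa - sb < (1 << bits - 1) else 1
    return f, c, s, z, v

f, c, s, z, v = sub_flags(0b11110000, 0b00010100)
print(f"F={f:08b} C={c} S={s} Z={z} V={v}")   # F=11011100 C=1 S=1 Z=0 V=0
```

The output reproduces the flag values stated in the text for A = 11110000 and B = 00010100.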

Conditional Branch Instruction

The table below gives a list of the most common branch instructions. Each mnemonic is
constructed with the letter B (for branch) and an abbreviation of the condition name.
When the opposite condition state is used, the letter N (for no) is inserted to define the 0
state.


Table: Conditional Branch Instructions

Subroutine Call and Return


A subroutine is a self-contained sequence of instructions that performs a given
computational task.

A subroutine call is implemented with the following micro-operations:

SP ← SP - 1  Decrement stack pointer
M[SP] ← PC  Push content of PC onto the stack
PC ← effective address  Transfer control to the subroutine

The instruction that returns from the last subroutine is implemented by the micro-
operations:

PC ← M[SP]  Pop stack and transfer to PC
SP ← SP + 1  Increment stack pointer
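The call/return micro-operations above can be sketched as a tiny simulation. The memory model, initial SP, and PC values below are illustrative assumptions; the stack grows toward lower addresses, as in the memory-stack section.

```python
M = {}           # memory (stack region), modeled as a dict
SP = 100         # illustrative initial stack pointer
PC = 40          # illustrative address of the call instruction's successor

def call(effective_address):
    global SP, PC
    SP -= 1                  # SP <- SP - 1
    M[SP] = PC               # M[SP] <- PC   (save the return address)
    PC = effective_address   # PC <- effective address

def ret():
    global SP, PC
    PC = M[SP]               # PC <- M[SP]   (restore the return address)
    SP += 1                  # SP <- SP + 1

call(200)        # branch to a subroutine at address 200
print(PC, SP)    # 200 99
ret()
print(PC, SP)    # 40 100
```

After the return, PC again holds 40 and SP is back to its original value, showing why nested calls work: each call pushes its own return address.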


Program Interrupt
Program interrupt refers to the transfer of program control from a currently running
program to another service program as a result of an externally or internally generated
request. Control returns to the original program after the service program is executed.
The interrupt procedure is similar to a subroutine call except for three variations:
1. The interrupt is usually initiated by an internal or external signal rather than by
the execution of an instruction (except for software interrupts);
2. The address of the interrupt service program is determined by the hardware
rather than by the address field of an instruction; and
3. An interrupt procedure usually stores all the information necessary to define the
state of the CPU rather than storing only the program counter.
The state of the CPU at the end of the execution cycle (when the interrupt is
recognized) is determined from:
1. The content of the program counter
2. The content of all processor registers
3. The content of certain status conditions

Program status word


The collection of all status bit conditions in the CPU is sometimes called a program
status word or PSW. The PSW is stored in a separate hardware register and contains
the status information that characterizes the state of the CPU.
Types of Interrupts
There are three major types of interrupts that cause a break in the normal execution of a
program.
1. External interrupts
2. Internal interrupts
3. Software interrupts

External interrupts come from input - output (I/O) devices, from a timing device, from a
circuit monitoring the power supply, or from any other external source. External
interrupts depend on external conditions that are independent of the program being
executed at the time.
Internal interrupts arise from illegal or erroneous use of an instruction or data. Internal
interrupts are also called traps. Examples of interrupts caused by internal error
conditions are register overflow, attempt to divide by zero, an invalid operation code,
stack overflow, and protection violation.


Software interrupt is initiated by executing an instruction. A software interrupt is a
special call instruction that behaves like an interrupt rather than a subroutine call. It can
be used by the programmer to initiate an interrupt procedure at any desired point in the
program. The most common use of software interrupt is associated with a supervisor
call instruction.
2.5 CISC and RISC
CISC and RISC are two contrasting approaches to computer architecture.
Complex Instruction Set Computer (CISC)
The major characteristics of CISC architecture are:
1. A large number of instructions - typically from 100 to 250 instructions
2. Some instructions that perform specialized tasks and are used infrequently.
3. A large variety of addressing modes-typically from 5 to 20 different modes.
4. Variable-length instruction formats
5. Instructions that manipulate operands in memory.

Reduced Instruction Set Computer (RISC)

The major characteristics of RISC architecture are:

1. Relatively few instructions


2. Relatively few addressing modes.
3. Memory access limited to load and store instructions
4. All operations done within the registers of the CPU
5. Fixed-length, easily decoded instruction format
6. Single-cycle instruction execution
7. Hardwired rather than micro programmed control

Other characteristics attributed to RISC architecture are:

1. A relatively large number of registers in the processor unit
2. Use of overlapped register windows to speed up procedure call and return
3. Efficient instruction pipeline
4. Compiler support for efficient translation of high-level language programs into
machine language programs


CHAPTER - 3

PIPELINE AND VECTOR PROCESSING

3.1 Parallel processing


Parallel processing is a term used to denote a large class of techniques that provide
simultaneous data-processing tasks for the purpose of increasing the computational
speed of a computer system.

A parallel processing system is able to perform concurrent data processing to achieve
faster execution time.

Example: While an instruction is being executed in the ALU, the next instruction can be
read from memory.

A system may have two or more ALUs to execute two or more instructions at the same
time.

The purpose of parallel processing is to speed up the computer processing capability
and increase its throughput, which is the amount of processing that can be accomplished
during a given interval of time.

Parallel processing is established by distributing the data among the multiple functional
units.

Figure below shows one possible way of separating the execution unit into eight
functional units operating in parallel.


Figure: Processor with Multiple Functional Units


Parallel processing can be considered under the following topics:
1. Pipeline processing
2. Vector processing
3. Array processors

3.2 Pipeline

Pipelining is a technique of decomposing a sequential process into sub-operations, with
each sub-process being executed in a special dedicated segment that operates
concurrently with all other segments.

Example: To perform the combined multiply and add operations with a stream of
numbers:
Ai * Bi + Ci    for i = 1, 2, 3, …, 7
The sub-operations performed in each segment of the pipeline are as follows:

Figure: Example of Pipeline Processing
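The pipeline in the figure can be sketched as a three-segment simulation: segment 1 loads Ai and Bi into R1 and R2, segment 2 multiplies them into R3 and loads Ci into R4, and segment 3 adds R3 and R4. The data values below are illustrative, and the loop models the segments latching once per clock cycle (later segments read the values produced in the previous cycle).

```python
A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]

r1 = r2 = r3 = r4 = None
results = []
# 7 items flow through 3 segments in 7 + 2 clock cycles.
for i in range(len(A) + 2):
    # Segment 3: R5 <- R3 + R4 (uses last cycle's segment-2 outputs)
    if r3 is not None:
        results.append(r3 + r4)
    # Segment 2: R3 <- R1 * R2, R4 <- Ci (uses last cycle's segment-1 outputs)
    if i >= 1 and i - 1 < len(A):
        r3, r4 = r1 * r2, C[i - 1]
    elif i - 1 >= len(A):
        r3 = r4 = None                    # pipeline drains
    # Segment 1: R1 <- Ai, R2 <- Bi
    if i < len(A):
        r1, r2 = A[i], B[i]

print(results)   # [8, 13, 16, 17, 16, 13, 8]
```

After a two-cycle fill delay, the pipeline produces one Ai * Bi + Ci result per clock, which is the source of the speedup.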


3.2.1 Arithmetic Pipeline

Arithmetic pipelines are used to implement floating-point operations, multiplication of
fixed-point numbers, and similar computations encountered in scientific problems.

Example: Consider a pipeline unit for floating-point addition and subtraction. The inputs
are two normalized floating-point binary numbers:

X = A × 2^a
Y = B × 2^b

where A and B are two fractions that represent the mantissas and a and b are the
exponents.

1. Compare the exponents


2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result.

This procedure is outlined in the figure below.

Figure: Pipeline for Floating-Point Addition and Subtraction
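The four segments can be sketched as one function, with each segment marked by a comment. This is an illustrative model, not real FPU hardware: numbers are (mantissa, exponent) pairs as in X = A × 2^a, and a result is considered normalized when 0.5 ≤ |mantissa| < 1.

```python
def fp_add(x, y):
    """Add two (mantissa, exponent) pairs through the four pipeline segments."""
    (a, ea), (b, eb) = x, y
    # Segment 1: compare the exponents
    shift = abs(ea - eb)
    # Segment 2: align the mantissas (shift the smaller-exponent one right)
    if ea >= eb:
        b, exp = b / 2**shift, ea
    else:
        a, exp = a / 2**shift, eb
    # Segment 3: add the mantissas
    m = a + b
    # Segment 4: normalize so that 0.5 <= |mantissa| < 1
    while abs(m) >= 1:
        m, exp = m / 2, exp + 1
    while m != 0 and abs(m) < 0.5:
        m, exp = m * 2, exp - 1
    return m, exp

# 0.5 * 2^3 (= 4) plus 0.75 * 2^2 (= 3) should give 7 = 0.875 * 2^3
print(fp_add((0.5, 3), (0.75, 2)))   # (0.875, 3)
```

In a pipelined unit each of these four steps would run in its own segment, so four different additions can be in flight at once.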


3.2.2 Instruction Pipeline

An instruction pipeline is a technique for overlapping the execution of several instructions
to reduce the execution time of a set of instructions.


Six Phases in an Instruction Cycle:

1. Fetch the instruction from memory


2. Decode the instruction.
3. Calculate the effective address
4. Fetch the operands from memory
5. Execute the instruction
6. Store the result in the proper place.

Example: Four – segment instruction pipeline

The figure below shows how the instruction cycle in the CPU can be processed with a
four-segment pipeline.

Figure: Four Segment CPU Pipeline


The segments in the figure above are abbreviated as follows:
1. FI is the segment that fetches an instruction.
2. DA is the segment that decodes the instruction and calculates the effective address.


3. FO is the segment that fetches the operand.
4. EX is the segment that executes the instruction.

Figure: Timing of Instruction Pipeline
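The timing in the figure above can be reproduced with a short sketch: with no conflicts, instruction i (counting from 1) occupies segment s at clock i + s - 1, so n instructions finish in n + 3 clock cycles instead of 4n:

```python
# Sketch: timing table for a four-segment instruction pipeline (FI, DA, FO, EX).

SEGMENTS = ["FI", "DA", "FO", "EX"]

def pipeline_timing(n_instructions):
    """Return {clock_cycle: {segment_name: instruction_number}}."""
    timing = {}
    for i in range(n_instructions):          # instruction i + 1
        for s, name in enumerate(SEGMENTS):  # enters segment s at clock i+s+1
            timing.setdefault(i + s + 1, {})[name] = i + 1
    return timing

# Six instructions complete in 6 + 3 = 9 clock cycles instead of 24.
len(pipeline_timing(6))
```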


In general, there are three major difficulties that cause the instruction pipeline to deviate from its normal operation:

1. Resource conflicts, caused by access to memory by two segments at the same
   time. Most of these conflicts can be resolved by using separate instruction and
   data memories.
2. Data dependency conflicts, which arise when an instruction depends on the result of a
   previous instruction but this result is not yet available.
3. Branch difficulties, which arise from branch and other instructions that change the value
   of the PC.

3.2.3 RISC Pipeline

A major characteristic of the reduced instruction set computer (RISC) is its ability to use an efficient instruction pipeline.

A RISC machine has a very fast clock cycle and executes at the rate of one
instruction per cycle. This is aided by:
 Simple Instruction Set
 Fixed Length Instruction Format
 Register-to-Register Operations


Example: Three-segment instruction pipeline

Figure: Three-segment pipeline timing


The instruction cycle can be divided into three sub-operations and implemented in three
segments:

I: Instruction Fetch

A: ALU operation

E: Execute instruction


3.3 Vector Processing

Computers with vector processing capabilities are in demand in specialized applications.
Some of the major application areas of vector processing are:

 Long-range weather forecasting


 Petroleum explorations
 Seismic data analysis
 Medical diagnosis
 Aerodynamics and space flights simulations
 Artificial intelligence and expert systems
 Mapping the human genome
 Image processing

3.3.1 Vector operations

A vector is an ordered set of a one-dimensional array of data items. A vector V of
length n is represented as a row vector by V = [V1, V2, V3, ..., Vn]. It may be represented as a
column vector if the data items are listed in a column. Consequently, operations on vectors
must be broken down into single computations with subscripted variables. The element
Vi of vector V is written as V(I), and the index I refers to a memory address or register
where the number is stored. To examine the difference between a conventional scalar
processor and a vector processor, consider the following Fortran DO loop:

DO 20 I = 1,100

20 C(I) =B(I)+A(I)

This is a program for adding two vectors A and B of length 100 to produce a vector C.
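The difference can be sketched in Python (standing in for Fortran): the scalar processor executes the loop one element at a time, while a vector processor performs the whole addition as a single vector instruction C(1:100) = A(1:100) + B(1:100):

```python
# The DO loop above, first as a conventional scalar loop (one add instruction
# per iteration) and then as one conceptual vector operation.

A = list(range(100))
B = list(range(100, 200))

# Scalar processor: the loop issues 100 separate add instructions.
C_scalar = [0] * 100
for i in range(100):
    C_scalar[i] = B[i] + A[i]

# Vector processor: conceptually a single instruction on whole vectors,
# sketched here with an elementwise operation.
C_vector = [a + b for a, b in zip(A, B)]

assert C_scalar == C_vector
```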

a) Matrix Multiplication

Matrix multiplication is one of the most computationally intensive operations performed in
computers with vector processors. The multiplication of two n x n matrices consists of n^2
inner products or n^3 multiply-add operations. An n x m matrix of numbers has n rows
and m columns and may be considered as constituting a set of n row vectors or a set of m
column vectors. Consider, for example, the multiplication of two 3 x 3 matrices A and B.

The product matrix C is a 3 x 3 matrix whose elements are related to the elements of A
and B by the inner product:

cij = ai1*b1j + ai2*b2j + ai3*b3j

For example, the number in the first row and first column of matrix C is calculated by
letting i = 1, j = 1, to obtain

c11 = a11*b11 + a12*b21 + a13*b31

Figure: Instruction format for vector processor

In general, the inner product consists of the sum of k product terms of the form

C = A1*B1 + A2*B2 + A3*B3 + ... + Ak*Bk
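The inner-product computation can be sketched as follows (the matrix values are invented for illustration):

```python
# A 3 x 3 matrix product computed element by element with the inner product
# c_ij = sum over k of a_ik * b_kj.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C = matmul(A, B)
# c11 = a11*b11 + a12*b21 + a13*b31 = 1*9 + 2*6 + 3*3 = 30
```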

b) Memory Interleaving

Memory interleaving is the technique of using memory from two or more sources. An
instruction pipeline may require the fetching of an instruction and an operand at the same
time from two different segments. Similarly, an arithmetic pipeline usually requires two
or more operands to enter the pipeline at the same time. Instead of using two memory
buses for simultaneous access, the memory can be partitioned into a number of modules
connected to common memory address and data buses.


Figure: Multiple module memory organization

The advantage of a modular memory is that it allows the use of a technique called interleaving.
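A minimal sketch of four-way interleaving (assuming a power-of-two number of modules, so the low-order address bits select the module):

```python
# Sketch of four-way memory interleaving: consecutive addresses fall in
# different modules, so the low-order bits of the address select the module
# and the remaining bits select the word within that module.

N_MODULES = 4  # must be a power of two for this bit slicing to work

def interleaved(address):
    module = address % N_MODULES   # low-order 2 bits: which module
    word = address // N_MODULES    # remaining bits: word inside the module
    return module, word

# Four consecutive addresses land in four different modules, so their
# accesses can overlap in time.
[interleaved(a) for a in range(4)]
```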

3.4 Array Processors


An array processor is a processor that performs computations on large arrays of data.
The term is used to refer to two different types of processors. An attached array
processor is an auxiliary processor attached to a general-purpose computer. It is
intended to improve the performance of the host computer in specific numerical
computational tasks. An SIMD array processor is a processor that has a single-
instruction multiple-data organization. It manipulates vector instructions by means of
multiple functional units responding to a common instruction.
 Attached Array Processor
An attached array processor is designed as a peripheral for a conventional host computer,
and its purpose is to enhance the performance of the computer by providing vector
processing for complex scientific applications. It achieves high performance by means of
parallel processing with multiple functional units. It includes an arithmetic unit containing
one or more pipelined floating-point adders and multipliers. The array processor can be
programmed by the user to accommodate a variety of arithmetic problems.


Figure: Attached array processor with host computer


The above figure represents the host computer connected to the array processor.
 SIMD Array Processor
An SIMD array processor is a computer with multiple processing units operating in
parallel. The processing units are synchronized to perform the same operation under the
control of a common control unit, thus providing a single-instruction, multiple-data-
stream (SIMD) organization. A general block diagram of an array processor is shown in
the figure below. It contains a set of identical processing elements (PEs), each having a
local memory M.

Figure: SIMD array processor organization



CHAPTER 4
INPUT/OUTPUT ORGANIZATION
4.1 Peripheral Devices

Peripheral devices are the I/O devices that are externally connected to the machine to read or
write information.
Input Devices
 Keyboard
 Optical input devices
 Magnetic Input Devices- Magnetic Stripe Reader
 Screen Input Devices
- Touch Screen- Light Pen- Mouse
 Analog Input Devices
Output Devices
 Card Puncher, Paper Tape Puncher
 CRT
 Printer (Impact, Ink Jet, Laser, Dot Matrix)
 Plotter
 Analog
4.2 Input-Output Interface

 Input-output interface provides a method for transferring information between internal


storage (such as memory and CPU registers) and external I/O devices.
 Interface is a special hardware component between the CPU and peripherals
 Interface supervises and synchronizes all input and output transfers
I/O Bus and Interface Modules
 The I/O bus consists of data lines, address lines, and control lines (collectively called
system bus).


 Each peripheral has an interface module associated with it.


 Each interface:
 Decodes the device address (device code) and interprets it for the
peripheral
 Decodes control (operation/commands) received from the I/O bus and interprets
them for the peripherals
 Provides signals for the peripheral controller
 All peripherals whose address does not correspond to the address in the bus are
disabled by their interface.
 There are four types of commands that an interface may receive.
1) Control: is issued to activate the peripheral and to inform it what to do.
2) Status: is used to test various status conditions in the interface and the
peripheral.
3) Data Input: causes the interface to receive an item of data from the peripheral
and places it in its buffer register.
4) Data Output: causes the interface to respond by transferring data from the bus
into one of its registers.
Connection of I/O Bus to CPU

I/O Bus and Memory Bus


 Memory bus is for information transfers between CPU and the MU
 I/O bus is for information transfers between CPU and I/O devices through their I/O
interface.
 Like the I/O bus, the memory bus contains data, address, and read/write control lines.
 There are three ways that computer buses can be used to communicate with memory
and I/O:

1. Use two separate buses, one for memory and the other for I/O.


2. Use one common bus for both memory and I/O but have separate control lines
for each.

3. Use one common bus for memory and I/O with common control lines.
I/O Mapping
Two types of I/O mapping:
1. Isolated I/O
2. Memory-Mapped I/O
Isolated I/O
 Separate I/O read/write control lines in addition to memory read/write control lines
 Separate (isolated) memory and I/O address spaces
 Distinct input and output instructions
Memory-Mapped I/O
 A single set of read/write control lines (no distinction between memory and I/O transfer)
 Memory and I/O addresses share the common address space, which reduces the memory
address range available
 No specific input or output instructions; the same memory-reference instructions can be
used for I/O transfers
 Considerable flexibility in handling I/O operations

I/O Interface for an Input Device


The address decoder, the data and status registers, and the control circuitry required to
coordinate I/O transfers constitute the device’s interface circuit.


4.3 Types of Data Transfer

1. Synchronous -All devices derive the timing information from common clock line
2. Asynchronous -No common clock
Asynchronous data transfer
 Asynchronous data transfer between two independent units requires that control signals
be transmitted between the communicating units to indicate the time at which data is
being transmitted
 Two Asynchronous data transfer methods:
1. Strobe pulse- A strobe pulse is supplied by one unit to indicate to the other unit
when the transfer has to occur
2. Handshaking- A control signal is accompanied with each data being transmitted
to indicate the presence of data. The receiving unit responds with another control
signal to acknowledge receipt of the data
Strobe Control
 Employs a single control line to time each transfer
 The strobe may be activated by either the source or the destination unit

 Source-Initiated: the source unit that initiates the transfer has no way of knowing
whether the destination unit has actually received data.
 Destination-Initiated: The destination unit that initiates the transfer has no way of knowing
whether the source has actually placed the data on the bus.


Handshaking

 To solve the problem of the strobe method, the handshake method introduces a second control
signal to provide a reply to the unit that initiates the transfer.

 Handshaking provides a high degree of flexibility and reliability because the successful
completion of a data transfer relies on active participation by both units
 If one unit is faulty, data transfer will not be completed.=> it can be detected by means of
a timeout mechanism
Asynchronous Serial Transfer

 The transfer of data between two units may be done in parallel or serial.
 Parallel data transmission- each bit of the message has its own path and the total
message is transmitted at the same time.
 Serial data transmission- each bit in the message is sent in sequence one at a time.
 Serial transmission can be synchronous or asynchronous.


Asynchronous serial transfer:


 Binary information is sent only when it is available and the line remains idle when there is
no information to be transmitted.
 Employs special bits which are inserted at both ends of the character code
 Each character consists of three parts;
1. Start bit - always 0
2. Data bits
3. Stop bits - always 1

 A character can be detected by the receiver from the knowledge of 4 rules:-


 When data are not being sent, the line is kept in the I-state (Idle state)
 The initiation of a character transmission is detected by a Start bit, which is
always a 0 and is used to indicate the beginning of a character.
 The character bits always follow the Start bit
 After the last character, a Stop bit is detected when the line returns to the I-state
for at least 1 bit time.
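The framing rules above can be sketched as follows (the LSB-first data-bit order is an assumption following common serial practice; the text does not specify it):

```python
# Sketch: framing one 8-bit character for asynchronous serial transfer.
# Start bit (always 0), then the data bits, then stop bit(s) (always 1);
# between characters the line stays in the idle (1) state.

def frame(byte, n_stop_bits=1):
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first (assumed)
    return [0] + data_bits + [1] * n_stop_bits

frame(ord("A"))  # 'A' = 0x41 framed with one start and one stop bit
```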

Asynchronous Communication Interface

 Integrated circuits are available which are specifically designed to provide the interface
between the computer and similar interactive terminals.
 Known as the Universal Asynchronous Receiver-Transmitter (UART).


 Transmitter Register
 Accepts a data byte (from CPU) through the data bus
 Transferred to a shift register for serial transmission
 Receiver
 Receives serial information into another shift register
 Complete data byte is sent to the receiver register
 Status Register Bits
 Used for I/O flags and for recording errors
 Control Register Bits-
 Define baud rate, no. of bits in each character, whether to generate and check
parity, and number of stop bits
First-In-First-Out (FIFO) Buffer
 A first-in-first-out (FIFO) buffer is a memory unit that stores information in such a manner
that the item first in is the item first out.
 FIFO buffer is useful in some applications when data is transferred asynchronously
 The logic diagram of a typical 4 x 4 FIFO buffer is shown in the following figure. It
consists of four 4-bit registers Ri, i = 1, 2, 3, 4, and a control register with flip-flops Fi,
i = 1, 2, 3, 4, one for each register.
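A behavioral sketch of the FIFO (a software queue stands in for the register and flip-flop implementation shown in the figure; the full/empty results play the role of the control flags):

```python
# Behavioral sketch of a 4 x 4 FIFO buffer: items come out in the order they
# went in; occupancy plays the role of the control flip-flops F1..F4.

from collections import deque

class Fifo:
    def __init__(self, depth=4):
        self.depth = depth
        self.items = deque()

    def insert(self, word):
        """Insert at the tail; returns False (buffer full) if input must wait."""
        if len(self.items) < self.depth:
            self.items.append(word)
            return True
        return False

    def delete(self):
        """Remove from the head; returns None when the buffer is empty."""
        return self.items.popleft() if self.items else None
```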


4.4 Modes of Transfer

There are four different data transfer modes between the central computer (CPU & Memory) and peripherals:
1) Programmed-Controlled I/O
2) Interrupt-Initiated I/O
3) Direct Memory Access (DMA)
4) I/O Processor (IOP)
Programmed-Controlled I/O (I/O devices to CPU)
 Transfer of data under programmed I/O is between CPU and peripherals
 Programmed I/O operations are the result of I/O instructions written in the computer
program.
 An example of data transfer from an I/O device through an interface into the CPU is
shown in the following figure:


Interrupt - Initiated I/O

 Polling takes valuable CPU time


 Open communication only when some data has to be passed -> Interrupt.
 I/O interface, instead of the CPU, monitors the I/O device
 When the interface determines that the I/O device is ready for data transfer, it generates
an Interrupt Request to the CPU
 Upon detecting an interrupt, the CPU momentarily stops the task it is doing, branches to the
service routine to process the data transfer, and then returns to the task it was
performing
DMA (Direct Memory Access)
 DMA refers to the ability of an I/O device to transfer data directly to and from memory
without continuous CPU involvement
 Large blocks of data are transferred at high speed to or from high-speed devices: magnetic
drums, disks, tapes, etc.
 DMA controller:- Interface that provides I/O transfer of data directly to and from the
memory and the I/O device
 CPU initializes the DMA controller by sending a memory address and the number of
words to be transferred
 Actual transfer of data is done directly between the device and memory through the DMA
controller, freeing the CPU for other tasks

I/O Processor (IOP)


 Communicate directly with all I/O devices
 Fetch and execute its own instruction
 IOP instructions are specifically designed to facilitate I/O transfer
 DMAC must be set up entirely by the CPU
 Designed to handle the details of I/O processing
 The block diagram of a computer with two processors is shown in the following figure.

Command:
 Instructions that are read from memory by an IOP
 Distinguished from instructions that are read by the CPU
 Commands are prepared by experienced programmers and are stored in
memory
 Command word = IOP program
CPU - IOP Communication

 The memory unit acts as a message center: each processor leaves information for the other


I/O Channel

 Three types of channel


1) Multiplexer channel: slow-medium speed device, operating with a number of I/O
devices simultaneously
2) Selector channel: high-speed device, one I/O operation at a time
3) Block-Multiplexer channel: (1 + 2)

 I/O instruction format


Operation code (8 bits):

 Start I/O, Start I/O fast release (less CPU time), Test I/O, Clear I/O, Halt I/O, Halt
device, Test channel, Store channel ID
 Channel Status Word:
 Always stored in Address 64 in memory
 Key: Protection used to prevent unauthorized access
 Address: Last channel command word address used by channel
 Count: 0 (if successful transfer)


4.5 Priority Interrupt

 Identify the source of the interrupt when several sources will request service
simultaneously
 Determine which condition is to be serviced first when two or more requests arrive
simultaneously
 Priority interrupt can be done by:
1) Software: Polling
2) Hardware: Daisy chain, Parallel priority
Polling
 Identify the highest-priority source by software means
 One common branch address is used for all interrupts
 Program polls the interrupt sources in sequence
 The highest-priority source is tested first
 Drawback of polling:
 If there are many interrupt sources, the time required to poll them can exceed the
time available to service the I/O device
 In that case, a hardware priority interrupt unit can be used instead

Daisy-Chaining
 Either a serial or a parallel connection of interrupt lines can establish the hardware
priority function.
 The serial connection is known as the daisy-chaining method.


The following figure shows one stage of the daisy-chain priority arrangement:

1) No interrupt request
2) Invalid: interrupt request, but no acknowledge
3) No interrupt request: Pass to other device (other device requested interrupt)
4) Interrupt request
Parallel Priority Interrupt
 The parallel priority interrupt method uses a register whose bits are set separately by the
interrupt signal from each device.


 IEN: Set or Clear by instructions ION or IOF


 IST: Indicates that an unmasked interrupt has occurred. INTACK enables the IST-state bus
buffer to load the VAD generated by the priority logic
 Interrupt Register:-
 Each bit is associated with an Interrupt Request from different Interrupt Source –
different priority level
 Each bit can be cleared by a program instruction
 Mask Register: -
 Mask Register is associated with Interrupt Register
 Each bit can be set or cleared by an Instruction


Interrupt Priority Encoder:

 The priority encoder is a circuit that implements the priority function.


 Determines the highest-priority interrupt when more than one interrupt takes place
 The truth table of a four-input priority encoder is given in the table below
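A behavioral sketch of the four-input priority encoder (assuming input I0 has the highest priority, as in the usual truth table):

```python
# Sketch of a four-input priority encoder: I0 has the highest priority.
# Output (x, y) is the binary number of the highest-priority active input,
# and V = 1 when at least one input is set (otherwise x, y are don't-cares,
# returned here as 0).

def priority_encoder(I0, I1, I2, I3):
    for code, line in enumerate((I0, I1, I2, I3)):
        if line:
            return code >> 1, code & 1, 1   # x, y, V
    return 0, 0, 0                          # no request pending: V = 0

priority_encoder(0, 1, 1, 0)  # input 1 wins over input 2
```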

Interrupt Cycle

 At the end of each instruction cycle, the CPU checks IEN and IST
 If both IEN and IST are equal to 1, the CPU goes to an interrupt cycle; otherwise it
proceeds with the next instruction cycle

Initial Operation of ISR


1) Clear lower-level mask register bit
2) Clear interrupt status bit IST
3) Save contents of processor registers
4) Set interrupt enable bit IEN
5) Proceed with service routine
Final Operation of ISR:
1) Clear interrupt enable bit IEN
2) Restore contents of processor registers
3) Clear the bit in the interrupt register belonging to the source that has been serviced
4) Set lower-level priority bits in the mask register
5) Restore return address into PC and set IEN
4.6 Serial Communication

 Data communication can be classified into two types:


1. Serial communication
2. Parallel communication
 Serial communication: is the process of sending data one bit at a time sequentially
over a communication channel or computer bus.

 Serial communication is used for all long-distance communication and most computer
networks
 Slow data transfer
 Parallel communication: is the process of transferring all the data bits over a
communication channel or computer bus simultaneously.

 Parallel communication is used for all short-distance communication.


 High-data transfer rate
 Data can be transmitted between two points in three different modes:
1. Simplex,
2. Half-duplex
3. Full-duplex
 A simplex line carries information in one direction only (e.g. Radio and television
broadcasting).
 A half-duplex transmission system is one that is capable of transmitting in both
directions at different times (e.g. walkie-talkie)
 A full-duplex transmission can send and receive data in both directions simultaneously.
(e.g. Telephone)

CHAPTER -5


MEMORY ORGANIZATION

5.1 Memory Hierarchy


The memory hierarchy in a computer system is shown in the figure below. The goal of a memory
hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.

Figure: Memory hierarchy in a computer system


 Memory hierarchy in a computer system:

 Main Memory: memory unit that communicates directly with the CPU (RAM)
 Auxiliary Memory: device that provide backup storage (Disk Drives)
 Cache Memory: special very-high-speed memory to increase the processing
speed (Cache RAM)
 Multiprogramming: enables the CPU to process a number of independent
programs concurrently.
 Memory Management System: supervises the flow of information between auxiliary
memory and main memory.
5.2 Main Memory
 It is the memory unit that communicates directly with the CPU.
 It is the central storage unit in a computer system
 It is relatively large and fast memory used to store programs and data during the
computer operation.

a) RAM Chips


 RAM is used for storing the bulk of the programs and data that are subject to change.
 Available in two possible operating modes:
1. Static RAM- consists essentially of internal flip-flops that store binary information.
2. Dynamic RAM – stores the binary information in the form of electric charges that are
applied to capacitors.

Figure: Typical RAM chip


b) ROM chip
 ROM is used for storing programs permanently.
 The ROM portion is needed for storing an initial program called a bootstrap loader
 The bootstrap loader is a program whose function is to start the computer software operating
when power is turned on.
 The startup of a computer consists of turning the power on and starting the execution of
an initial program.


Figure: Typical ROM chip


Memory address map
 A memory address map is a pictorial representation of the address space assigned
to each memory chip in the system.
 Example: Memory configuration: 512 bytes RAM + 512 bytes ROM
 4 x 128-byte RAM chips + 1 x 512-byte ROM chip

Figure: Memory address map for microcomputer


Memory Connection to CPU
 RAM and ROM chips are connected to a CPU through the data and address buses
 The low-order lines in the address bus select the byte within the chips and other lines in
the address bus select a particular chip through its chip select inputs


Figure: Memory Connection to CPU


 Each RAM receives the seven low order bits of the address bus to select one of 128
possible bytes.
 The particular RAM chip selected is determined from lines 8 and 9 in the address bus.
This is done through a 2 x 4 decoder whose outputs go to the CS1 inputs in each RAM
chip.
 The selection between RAM and ROM is achieved through bus line 10.
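The decoding above can be sketched as follows, assuming the address assignments of the memory address map (RAM1–RAM4 at 0000–01FF and ROM at 0200–03FF hex, the usual layout for this configuration):

```python
# Sketch of the chip-select decoding described above. Bus lines are numbered
# from 1 in the text, so "line 10" is bit 9 here and "lines 8-9" are bits 7-8.

def select_chip(address):
    if address & (1 << 9):                  # line 10 set -> ROM
        return ("ROM", address & 0x1FF)     # 9 low bits address 512 bytes
    ram_chip = (address >> 7) & 0b11        # lines 8-9 feed the 2 x 4 decoder
    return (f"RAM{ram_chip + 1}", address & 0x7F)  # 7 low bits: byte in chip

select_chip(0x000)  # first byte of RAM1
select_chip(0x180)  # lines 8-9 = 11 select RAM4
select_chip(0x200)  # line 10 set selects the ROM
```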
5.3 Auxiliary Memory
Auxiliary memory are devices that provide backup storage.
The most common auxiliary memory devices used in computer systems are:
 Magnetic Disk: FDD, HDD
 Magnetic Tape: Backup or Program
 Optical Disk: CDR, ODD, DVD


5.4 Associative Memory


 It is also called Content Addressable Memory (CAM)
 It is a memory unit accessed by content of data rather than an address of data.

Figure: Block diagram of associative memory


 Compare each word in CAM in parallel with the content of A (Argument Register):
if CAM Word[i] = A, then M(i) = 1
 Read: sequentially access those CAM words for which M(i) = 1
 K (Key Register) provides a mask for choosing a particular field or key in the argument in
A(only those bits in the argument that have 1’s in their corresponding position of K are
compared)
 Example: Suppose that the argument register A and the key register K have the bit
configuration shown below. Only the three leftmost bits of A are compared with memory
words because K has 1’s in these positions.


 Word 2 matches the unmasked argument field because the three leftmost bits of the
argument and the word are equal.
 The relation between the memory array and external registers in an associated
memory is shown in figure below.

Figure: Associative memory of m word x n cells per word

 The cells in the array are marked by the letter C with two subscripts. The first
subscript gives the word number and the second specifies the bit position in the
word. Thus cell Cij is the cell for bit j in word i.
 A bit Aj in the argument register is compared with all the bits in column j of the
array, provided that Kj = 1. This is done for all columns j = 1, 2, ..., n.
 If a match occurs between all the unmasked bits of the argument and the bits in
word i, the corresponding bit Mi in the match register is set to 1. If one or more
unmasked bits of the argument and the word do not match, Mi is cleared to 0.
 The internal organization of a typical cell C is shown in figure below.


Figure: One Cell of Associative Memory


 It consists of a flip-flop storage element F, and the circuits for reading, writing, and
matching the cell.
 The input bit is transferred into the storage cell during a write operation. The bit
stored is read out during a read operation.
 The match logic compares the content of the storage cell with the corresponding
unmasked bit of the argument and provides an output for the decision logic that sets
the bit in M.
Match Logic
 The match logic for each word can be derived from the comparison algorithm for two
binary numbers.
 First neglect the key bits and compare the argument in A with the bits stored in the cells
of the words.
 Word i is equal to the argument in A if Aj = Fij for j = 1, 2, ..., n.
 Two bits are equal if they are both 1 or both 0.
 The equality of two bits can be expressed logically by the Boolean function.

xj = AjFij + A’jF’ij
Where xj = 1 if the pair of bits in position j are equal; otherwise xj = 0
 For a word i to be equal to the argument in A we must have all xj variables equal to 1.
This is the condition for setting the corresponding match bit Mi to 1.
 The Boolean function for this condition is

Mi = x1x2x3 ... xn

and constitutes the AND operation of all pairs of matched bits in a word.

 The requirement is that if Kj = 0, the corresponding bits of Aj and Fij need no comparison.
Only when Kj = 1 must they be compared. This requirement is achieved by ORing each
term with K’j, thus:

xj + K’j = xj if Kj = 1, and 1 if Kj = 0

 The match logic for word i in an associative memory can now be expressed by the
following Boolean function:

Mi = (x1 + K’1)(x2 + K’2)(x3 + K’3) ... (xn + K’n)

 If we substitute the original definition of xj, the Boolean function above can be expressed
as follows:

Mi = ∏ (AjFij + A’jF’ij + K’j), for j = 1, 2, ..., n

Where ∏ is a product symbol designating the AND operation of all n terms.

Figure: Match logic for one word of associative memory
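The masked comparison can be sketched behaviorally (the 9-bit patterns below are assumptions chosen to mirror the example given earlier: only the three leftmost bits are unmasked, word 1 differs there, word 2 matches):

```python
# Behavioral sketch of the match logic: bit j of word i is compared with Aj
# only where the key register has Kj = 1; Mi is the AND over all positions.

def match(word, A, K):
    """word, A, K are equal-length bit lists; returns Mi (0 or 1)."""
    return int(all((a == f) or (k == 0) for f, a, k in zip(word, A, K)))

A = [1, 0, 1, 1, 1, 1, 1, 0, 0]   # argument register (assumed values)
K = [1, 1, 1, 0, 0, 0, 0, 0, 0]   # only the three leftmost bits are compared
words = [
    [1, 0, 0, 1, 1, 1, 1, 0, 0],  # differs from A in an unmasked position
    [1, 0, 1, 0, 0, 0, 0, 0, 1],  # matches A on the unmasked field
]
M = [match(w, A, K) for w in words]
```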


5.5 Cache Memory


 Cache memory is a small fast memory and is sometimes used to increase the
speed of processing by making current programs and data available to the CPU
at a rapid rate.
 Keeping the most frequently accessed instructions and data in the fast cache memory
 Locality of Reference: the references to memory tend to be confined within a few
localized areas in memory.
The basic operation of the cache is as follows.
 When the CPU needs to access memory, the cache is examined. If the word is found in
the cache, it is read from the fast memory.
 If the word addressed by the CPU is not found in the cache, the main memory is
accessed to read the word.
 A block of words containing the one just accessed is then transferred from main memory
to cache memory.

Hit ratio: the quantity used to measure the performance of cache memory; the number of
hits divided by the total number of CPU references to memory (hits plus misses).

 Hit: the CPU finds the word in the cache
 Miss: the word is not found in cache (the CPU must read main memory)
Example: A computer with cache memory access time = 100 ns, main memory access time =
1000 ns, hit ratio = 0.9. Out of every 10 CPU references:

1 miss: 1 x 1000 ns = 1000 ns

9 hits: 9 x 100 ns = 900 ns

Average access time = 1900 ns / 10 = 190 ns
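The average-access-time computation above can be checked with a one-line sketch:

```python
# Average access time as the weighted average of hit and miss access times.

def average_access_time(hit_ratio, cache_ns, main_ns):
    return hit_ratio * cache_ns + (1 - hit_ratio) * main_ns

average_access_time(0.9, 100, 1000)  # -> about 190 ns
```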

Mapping is the transformation of data from main memory to the cache memory.
 There are three types mapping process:
1. Associative mapping
2. Direct mapping
3. Set-associative mapping


Example of cache memory

 Main memory: 32 K x 12 bit word (15 bit address lines)


 Cache memory: 512x 12 bit word
 CPU sends a 15-bit address to cache
 Hit: CPU accepts the 12-bit data from cache
 Miss: CPU reads the data from main memory (then data is written to cache)
1. Associative mapping
 The fastest and most flexible cache organization uses an associative memory.
 The associative memory stores both the address and content (data) of the memory
word.

Figure: Associative Mapping Cache (all number in octal)


 Any location in cache to store any word from main memory.
 The address value of 15 bits is shown as a five-digit octal number and its corresponding
12-bit word is shown as a four- digit octal number.
2. Direct mapping
 Each memory block has only one place to load in Cache
 Mapping Table is made of RAM instead of CAM

 In the general case, there are 2k words in cache memory and 2n words in main memory.
 n-bit memory address consists of 2 parts; k bits of Index field and n-k bits of Tag field
 n-bit addresses are used to access main memory and k-bit Index is used to access the
Cache

Figure: Addressing Relationships between Main and Cache Memories


Operation
 The CPU generates a memory request with the index field which is used for the address
to access the cache.
 The tag field of the CPU address is compared with the tag in the word read from the
cache.
 If the two tags match, there is a hit and the desired data word is in cache.
 If there is no match, there is a miss and the required word is read from main memory.
Example: Consider the numerical example of Direct Mapping shown in figure below


Figure: Direct Mapping Cache Organization


The word at address zero is presently stored in the cache (index=000, tag=00, data=1220).
Suppose that the CPU now wants to access the word at address 02000. The index address is
000, so it is used to access cache. The two tags are then compared. The cache tag is 00 but the
address tag is 02, which does not produce a match. Therefore, the main memory is accessed
and the data word 5670 is transferred to the CPU. The cache word at index address 000 is then
replaced with a tag of 02 and data of 5670.
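The lookup just traced can be sketched in Python. This is a minimal illustration under the section's parameters (9-bit index for the 512-word cache, 6-bit tag from the 15-bit address); the dicts stand in for the cache RAM and main memory, and the contents mirror the figure.

```python
# Minimal sketch of a direct-mapped lookup: 15-bit address = 6-bit tag
# followed by a 9-bit index (512-word cache, 32K-word main memory).

INDEX_BITS = 9
INDEX_MASK = (1 << INDEX_BITS) - 1

def split(address):
    """Split a 15-bit address into (tag, index)."""
    return address >> INDEX_BITS, address & INDEX_MASK

cache = {0o000: (0o00, 0o1220)}    # index -> (tag, data), as in the figure
main  = {0o02000: 0o5670}

def read(address):
    tag, index = split(address)
    if index in cache and cache[index][0] == tag:
        return cache[index][1], "hit"       # tags match: word is in cache
    data = main[address]                    # miss: read main memory
    cache[index] = (tag, data)              # replace old tag/data pair
    return data, "miss"

print(read(0o02000))   # cached tag 00 != address tag 02 -> miss, 5670 fetched
print(read(0o02000))   # tag now matches -> hit
```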

The direct-mapping example just described uses a block size of one word. The same
organization but using a block size of 8 words is shown in figure below.

Figure: Direct Mapping cache with block size of 8 words


3. Set-Associative mapping
 Each data word is stored together with its tag; the tag-data items stored in one word of
cache form a set.

Figure: Two-way set-associative mapping cache.


 Each memory block maps to a set of locations in the cache; the figure shows a
set-associative mapping cache with a set size of two.
 In general, a set-associative cache of set size k will accommodate k words of main
memory in each word of cache.
Operation
 When the CPU generates a memory request, the index value of the address is used to
access the cache.
 The tag field of the CPU address is then compared with both tags in the cache to
determine if a match occurs.
 The comparison logic is done by an associative search of the tags in the set, similar to an
associative memory search; hence the name "set-associative."
 The hit ratio will improve as the set size increases because more words with the same
index but different tags can reside in cache. However, an increase in the set size
increases the number of bits in words of cache and requires more complex comparison
logic.
 When a miss occurs in a set-associative cache and the set is full, it is necessary to
replace one of the tag-data items with a new value.
The most common replacement algorithms used are:
1. Random Replacement: With the random replacement policy the control chooses one
tag-data item for replacement at random.
2. First-In, First-Out (FIFO): The FIFO procedure selects for replacement the item that
has been in the set the longest.
3. Least Recently Used (LRU): The LRU algorithm selects for replacement the item that
has been least recently used by the CPU.
Both FIFO and LRU can be implemented by adding a few extra bits in each word of cache.
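The set-associative operation and LRU replacement can be sketched together. This is a minimal two-way example with invented addresses and sizes: each set keeps its two tag-data pairs ordered by recency, so the victim on a full-set miss is the least recently used pair, matching policy 3 above.

```python
# Minimal sketch: a two-way set-associative cache with LRU replacement.
# Each set holds up to two (tag, data) pairs.

from collections import OrderedDict

class TwoWaySetAssocCache:
    def __init__(self, num_sets):
        # one OrderedDict per set: tag -> data, ordered least-recently-used first
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.num_sets = num_sets

    def read(self, address, main_memory):
        index = address % self.num_sets       # index field selects the set
        tag = address // self.num_sets        # tag field identifies the word
        ways = self.sets[index]
        if tag in ways:                       # compare against both resident tags
            ways.move_to_end(tag)             # mark as most recently used
            return ways[tag], True            # hit
        data = main_memory[address]           # miss: read main memory
        if len(ways) == 2:                    # set full: evict the LRU item
            ways.popitem(last=False)
        ways[tag] = data
        return data, False

mem = {i: i * 10 for i in range(32)}          # toy main memory (assumed contents)
cache = TwoWaySetAssocCache(num_sets=4)
print(cache.read(5, mem))    # miss
print(cache.read(5, mem))    # hit
```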
Writing into Cache
There are two ways of writing into memory:
1. Write Through
When writing into memory:
 If Hit, both cache and main memory are written in parallel
 If Miss, only main memory is written
2. Write-Back
When writing into memory:
 If Hit, only cache is written


 If Miss, the missing block is brought into the cache and the write is performed in the cache
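A minimal sketch contrasting the two policies, with plain dicts standing in for cache and main memory. Names and addresses are illustrative, and a real write-back cache also needs a dirty bit to flush modified blocks on eviction, which is omitted here.

```python
# Write-through: memory is always written; cache is updated only on a hit.
def write_through(cache, memory, address, data):
    if address in cache:          # hit: cache and memory written in parallel
        cache[address] = data
    memory[address] = data        # memory is written in every case

# Write-back: only the cache copy is updated; on a miss the block is brought
# into the cache and written there (memory sees the value only on eviction).
def write_back(cache, memory, address, data):
    cache[address] = data         # hit or miss, the write lands in the cache

cache, memory = {}, {0x10: 1}
write_through(cache, memory, 0x10, 7)
print(memory[0x10])    # memory updated immediately
write_back(cache, memory, 0x20, 9)
print(0x20 in memory)  # memory will only see this value on write-back
```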


Cache Initialization
 Cache is initialized:
 When power is applied to the computer
 When main memory is loaded with a complete set of programs from auxiliary
memory.
 Valid bit:
 Indicate whether or not the word contains valid data
5.6 Virtual Memory
 Translates program-generated (auxiliary memory) addresses into main memory locations.
 Gives the programmer the illusion that the system has a very large memory, even
though the computer actually has a relatively small main memory
 It is an imaginary memory: it gives the illusion of a memory arrangement that is not
physically there.
 It allows main memory (DRAM) to act like a cache for secondary storage (magnetic
disk).
 It divides physical memory into blocks, assigns them to different processes
 VM address translation provides a mapping from the virtual address of the processor to
the physical address in main memory or on disk.

Address Space & Memory Space

 Address Space: Virtual Address


 Address used by a programmer
 Memory Space: Physical Address (Location)
 Address in main memory


Figure: Relation between address space and memory space in a virtual memory system.

Memory table for mapping a virtual address

Figure: Memory table for mapping a virtual address.


 Translate the 20-bit virtual address into the 15-bit physical address.

Address Mapping Using Pages


 The physical memory is broken down into groups of equal size called blocks, which may
range from 64 to 4096 words each.
 The term page refers to groups of address space of the same size.
 Although both a page and a block are split into groups of 1K words, a page refers to the
organization of address space, while a block refers to the organization of memory space.
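The page/block split above can be illustrated with a short sketch. With 1K-word pages, the low 10 bits of an address are the line (word-within-page) and pass through translation unchanged, while the page number is swapped for a block number; the page-table contents below are invented for illustration.

```python
# Minimal sketch of paged address mapping with 1K-word pages: a virtual
# address = page number | 10-bit line; the physical address keeps the same
# line and replaces the page number with the block number.

LINE_BITS = 10            # 1K words per page/block

def virtual_to_physical(virtual_address, page_table):
    page = virtual_address >> LINE_BITS
    line = virtual_address & ((1 << LINE_BITS) - 1)
    block = page_table[page]              # page number -> block number
    return (block << LINE_BITS) | line

page_table = {0: 3}                       # page 0 resides in block 3 (assumed)
print(oct(virtual_to_physical(0o0005, page_table)))  # block 3, line 5 -> 0o6005
```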


Figure: Address space and memory space split into group of 1K words.

Figure: Memory table in a paged system.


Associative memory page table

 A more efficient way to organize the page table would be to construct it with a number of
words equal to the number of blocks in main memory.

 Each word contains a page number together with its corresponding block number.
 The page field in each word is compared with the page number in the virtual address.
 If a match occurs, the word is read from memory and its corresponding block number is
extracted.
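The associative page-table search just described can be sketched as follows. The entries are invented for illustration; in hardware all page-field comparisons happen in parallel in the CAM, which the sequential loop here only imitates.

```python
# Minimal sketch of an associative page table: each entry stores
# (page number, block number); the page field of the virtual address is
# compared against every entry at once in hardware.

def lookup(page_table_entries, page_number):
    for page, block in page_table_entries:   # parallel comparison in a CAM
        if page == page_number:
            return block                     # match: extract block number
    return None                              # no match: page fault

entries = [(0o000, 0o00), (0o001, 0o01), (0o002, 0o10), (0o006, 0o11)]
print(lookup(entries, 0o006))   # resident: returns its block number
print(lookup(entries, 0o003))   # not resident -> page fault (None)
```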

Figure: An associative memory page table.

Page Replacement
 Page Fault: the page referenced by the CPU is not in main memory
 A new page should be transferred from auxiliary memory to main memory
 Replacement algorithms:
1. FIFO (First-In-First-Out)
 The FIFO algorithm selects the page that has been in memory the longest, using a queue:
every time a page is loaded, its identification is inserted into the queue
 Easy to implement
 May result in a frequent page fault
2. Optimal Replacement (OPT)
 The lowest page fault rate of all algorithms
 Replace that page which will not be used for the longest period of time
3. LRU (Least Recently Used)
 OPT is difficult to implement since it requires future knowledge
 LRU uses the recent past as an approximation of near future.
 Replace that page which has not been used for the longest period of time
 LRU may require substantial hardware assistance


 The problem is to determine an order for the frames defined by the time of last
use
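The FIFO and LRU policies above can be compared by counting page faults on the same reference string; the string and frame count below are illustrative, chosen so the two policies behave differently.

```python
# Minimal sketch: count page faults under FIFO and LRU with a fixed number
# of page frames.

from collections import OrderedDict, deque

def count_faults_fifo(refs, frames):
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == frames:        # evict page resident longest
                resident.discard(queue.popleft())
            resident.add(page)
            queue.append(page)
    return faults

def count_faults_lru(refs, frames):
    resident, faults = OrderedDict(), 0        # ordered least-recent first
    for page in refs:
        if page in resident:
            resident.move_to_end(page)         # refresh recency on a hit
        else:
            faults += 1
            if len(resident) == frames:        # evict least recently used
                resident.popitem(last=False)
            resident[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(count_faults_fifo(refs, 3))   # 9 faults
print(count_faults_lru(refs, 3))    # 10 faults
```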
LRU Implementation Methods:
 Counters
 For each page table entry
 Time-of-use register
 Incremented for every memory reference
 Page with the smallest value in time-of-use register is replaced
 Stack
 Stack of page numbers
 Whenever a page is referenced, its page number is removed from the stack and pushed
on top; the least recently used page number is at the bottom
 LRU Approximation
 Reference (or use) bit is used to approximate the LRU
 Turned on when the corresponding page is referenced after its initial loading
 Additional reference bits may be used
5.7 Memory Management Hardware
A memory management system is a collection of hardware and software procedures for
managing the various programs residing in memory.
Basic components of a Memory Management Unit:
1. Address mapping
2. Common program sharing
3. Program protection
The MMU is realized jointly by:
 OS
 CPU
 Memory controller
Segment
 A set of logically related instruction or data elements associated with a given name
 Example: a subroutine, an array of data, a table of symbol, user’s program
Logical Address
 The address generated by a segmented program
 Similar to virtual address
 Virtual Address: fixed-length page
 Logical Address: variable-length segment


Figure: Mapping in segmented - page MMU


Numerical Example

Logical address & Physical address


Logical address:
 4-bit segment: 16 segments
 8-bit page: 256 pages per segment
 8-bit word: 256 words per page
Physical address:
 12-bit block: 4096 blocks
 8-bit word: 256 words per block
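The translation in this numerical example can be sketched as follows. The segment+page pair indexes a mapping table that yields a 12-bit block number, and the 8-bit word offset is appended unchanged; the table contents below are illustrative assumptions.

```python
# Minimal sketch of segmented-page translation: a 20-bit logical address is
# 4-bit segment | 8-bit page | 8-bit word; the physical address is
# 12-bit block | 8-bit word.

def translate(logical, segment_page_to_block):
    segment = (logical >> 16) & 0xF      # 4-bit segment number
    page    = (logical >> 8) & 0xFF      # 8-bit page number
    word    = logical & 0xFF             # 8-bit word offset
    block   = segment_page_to_block[(segment, page)]   # 12-bit block number
    return (block << 8) | word           # word offset passes through unchanged

mapping = {(0x6, 0x35): 0x012}           # segment 6, page 0x35 -> block 0x012 (assumed)
print(hex(translate(0x635AB, mapping)))  # block 0x012 | word 0xAB -> 0x12ab
```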


Memory Protection
 Typical segment descriptor

Access Rights: protecting the programs residing in memory


 Full read and write privileges: no protection
 Read only: write protection
 Execute only: program protection
 System only: operating system protection

