COA Full Notes
CHAPTER-1
CENTRAL PROCESSING UNIT
1.1 Introduction
Computer organization refers to the operational units and their interconnections that
realize the architectural specifications. Organization is the implementation of a computer
system in terms of the interconnection of its functional units: CPU, memory, bus, and I/O
devices.
Computer architecture is the structure and behavior of the computer as visible to a
programmer. Architecture is concerned mainly with the basic instruction design, which may
lead to better performance of the system. The basic instruction design includes the
instruction formats, addressing modes, the instruction set, and the general organization
of the CPU registers.
So, the organization of a computer is the implementation of its architecture, tailored
to fit the intended price and performance measures.
The CPU is the brain or engine of the PC; it performs the bulk of the system’s calculation and data
processing. The CPU is an integrated circuit that carries out most of the work of a computer.
It is usually the most expensive component in the system, costing up to four or more
times as much as the motherboard it plugs into.
Gambella University Department of Computer Science
3. Control Unit: supervises the transfer of information among the registers and
instructs the ALU as to which operation to perform.
The design of a CPU is a task that involves choosing the hardware for implementing the
machine instructions.
Let us describe how registers communicate with the ALU through buses and explain the
operation of the memory stack.
R1 ← R2 + R3
The control must provide binary selection variables to the following selector
inputs:
1. MUX A selector (SELA): to place the content of R2 onto bus A.
2. MUX B selector (SELB): to place the content of R3 onto bus B.
3. ALU operation selector (OPR): to provide the arithmetic addition A + B.
4. Decoder destination selector (SELD): to transfer the content of the output bus into R1.
There are therefore 14 binary selection inputs in the unit, and their combined value
specifies a control word. A control word (CW) is a word whose individual bits represent the
various control signals.
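As a rough illustration, the 14-bit control word for R1 ← R2 + R3 can be assembled by packing the four fields side by side. The field widths (three 3-bit selectors and a 5-bit OPR) follow the text; the field order, the register codes, and the ADD code used below are assumptions made for this sketch, since the encoding tables are not reproduced here.

```python
# Hypothetical encodings, assumed for illustration only.
REG = {"R1": 0b001, "R2": 0b010, "R3": 0b011}
OPR = {"ADD": 0b00010}

def control_word(sela, selb, seld, opr):
    """Pack SELA(3) | SELB(3) | SELD(3) | OPR(5) into one 14-bit word."""
    return (REG[sela] << 11) | (REG[selb] << 8) | (REG[seld] << 5) | OPR[opr]

# R1 <- R2 + R3: R2 on bus A, R3 on bus B, result into R1, ALU adds.
cw = control_word("R2", "R3", "R1", "ADD")
print(format(cw, "014b"))
```

Each selector occupies its own bit range, so changing one field never disturbs the others.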
The 3-bit binary code listed in the first column of the table specifies the binary code for each
of the three fields.
The encoding of the ALU operations for the CPU is specified in table below.
The OPR field has five bits and each operation is designated with a symbolic name.
A useful feature that is included in the CPU of most computers is a stack or Last-In-First-Out
(LIFO) list. A stack is a storage device that stores information in such a manner that the item
stored last is the first item retrieved.
The stack in digital computers is essentially a memory unit with an address register that can
only count (after an initial value is loaded into it). The register that holds the address for the
stack is called a stack pointer (SP) because its value always points at the top item in the
stack. The physical registers of a stack are always available for reading or writing; it is the
content of the word that is inserted or deleted.
The two operations of a stack are the insertion and deletion of items.
The operation of insertion is called push (or push-down) because it can be thought of
as the result of pushing a new item on top of a stack.
The operation of deletion is called pop (or pop-up) because it can be thought of as the
result of removing one item so that the stack pops up. However, nothing is physically pushed or
popped in a computer stack. These operations are simulated by incrementing or
decrementing the stack pointer register.
Register Stack
The stack pointer register SP contains a binary number whose value is equal to the address of
the word that is currently on top of the stack. Three items are placed in the stack: A, B, and
C, in that order. Item C is on top of the stack, so the content of SP is now 3.
In a 64-word stack, the stack pointer contains 6 bits because $2^6 = 64$. Since SP has only six
bits, it cannot hold a number greater than 63 (111111 in binary). When 63 is incremented
by 1, the result is 0, since 111111 + 1 = 1000000 in binary but SP can accommodate only the
six least significant bits. Similarly, when 000000 is decremented by 1, the result is 111111.
A one-bit register is set to 1 when the stack is empty of items. DR is the data register that holds the
binary data to be written into or read out of the stack.
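The wrap-around behavior of the 6-bit stack pointer can be sketched in a few lines. This is a minimal model, not a description of any particular hardware: the class name, the `empty` and `full` flag names, and the convention that the first item is stored at address 1 are assumptions made for the example.

```python
class RegisterStack:
    """64-word register stack with a 6-bit stack pointer (SP)."""
    SIZE = 64

    def __init__(self):
        self.words = [0] * self.SIZE
        self.sp = 0          # 6-bit stack pointer
        self.empty = True    # one-bit flag: set when the stack is empty
        self.full = False    # one-bit flag: set when the stack is full (assumed)

    def push(self, dr):
        if self.full:
            raise OverflowError("stack full")
        self.sp = (self.sp + 1) % self.SIZE   # keep only the six low bits
        self.words[self.sp] = dr              # write DR at the new top
        self.full = (self.sp == 0)            # SP wrapped around to 0
        self.empty = False

    def pop(self):
        if self.empty:
            raise IndexError("stack empty")
        dr = self.words[self.sp]              # read the top word into DR
        self.sp = (self.sp - 1) % self.SIZE   # 000000 - 1 wraps to 111111
        self.empty = (self.sp == 0)
        self.full = False
        return dr

s = RegisterStack()
for item in ("A", "B", "C"):
    s.push(item)
```

After the three pushes SP holds 3 and item C is on top, matching the description above.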
The common arithmetic expressions are written in infix notation, with each operator
written between the operands.
Consider the simple arithmetic expression
Arithmetic expressions can also be represented in prefix notation (Polish notation),
which places the operator before the operands.
Postfix notation (reverse Polish notation, RPN) places the operator after the
operands.
Reverse Polish notation is in a form suitable for stack manipulation. The expression:
A*B+C*D
is written in reverse Polish notation as:
AB*CD*+
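One way to carry out this conversion with a stack is sketched below. It is a simplified operator-precedence scheme that assumes single-character operands, the four basic operators, and round parentheses only; the function name `to_rpn` is chosen for this example.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(expr):
    """Convert an infix string with single-character operands to RPN."""
    output, stack = [], []
    for ch in expr.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)                      # operands go straight out
        elif ch in PRECEDENCE:
            # Pop operators of higher or equal precedence before pushing.
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                            # discard the "("
    while stack:
        output.append(stack.pop())                 # flush remaining operators
    return "".join(output)

rpn = to_rpn("A*B+C*D")
print(rpn)
```

Operators wait on the stack until an operator of lower precedence (or the end of the input) forces them out, which is what moves each operator behind its operands.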
Exercise:
Convert the following infix expression to reverse Polish notation using a stack data structure.
(A + B) * [C *(D + E) + F]
Evaluation of Arithmetic Expressions
Reverse Polish notation, combined with a stack arrangement of registers, is the most efficient
way known for evaluating arithmetic expressions. The stack is particularly useful for handling
long, complex problems involving chain calculations. It is based on the fact that any
arithmetic expression can be expressed in parentheses-free polish notation.
For example, consider the expression:
(3 * 4) + (5 * 6)
In reverse Polish notation, it is expressed as:
3 4 * 5 6 * +
Now consider the stack operations shown in figure below which shows the evaluation of the
postfix Expressions:
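The stack operations for evaluating a postfix expression can be sketched as follows. Operands are pushed; each operator pops the top two items and pushes the result. The function name `eval_rpn` and the space-separated token format are assumptions of this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_rpn(tokens):
    """Evaluate a list of postfix tokens using a stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            b = stack.pop()            # top of stack is the second operand
            a = stack.pop()
            stack.append(OPS[tok](a, b))
        else:
            stack.append(int(tok))     # operand: push its value
    return stack.pop()

result = eval_rpn("3 4 * 5 6 * +".split())
print(result)  # 42
```

The stack never holds more than three items for this expression: 12 stays on the stack while 5 and 6 are pushed and multiplied, and the final + combines 12 and 30.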
CHAPTER-2
INSTRUCTION FORMATS
A computer will usually have a variety of instruction code formats. It is the function of the
control unit within the CPU to interpret each instruction code and provide the necessary
control functions needed to process the instruction.
The format of an instruction is usually depicted in a rectangular box symbolizing the bits of
the instruction as they appear in memory words or in a control register. A computer
instruction has the following format:
The bits of the instruction are divided into groups called fields.
1. An operation code field that specifies the operation to be performed (add, subtract,
complement, shift, etc.).
2. An address field that designates a memory address or a processor register.
3. A mode field (optional) that specifies the way the operand or the effective address is determined.
Computers may have instructions of several different lengths containing a varying number
of addresses. The number of address fields depends on the internal organization of the
computer's registers. Most computers fall into one of three types of CPU organization:
1. Single accumulator organization.
2. General register organization.
3. Stack organization.
All operations are performed with an implied accumulator register. The instruction
format in this type of computer uses one address field.
2. Two-Address Instructions
Two-address instructions are the most common in commercial computers. Here again
each address field can specify either a processor register or a memory word. The
program to evaluate X = (A+B)*(C+D) is as follows:
MOV X, R1      M[X] ← R1
3. One-Address Instructions
One-address instructions use an implied accumulator (AC) register for all data
manipulation. For multiplication and division there is a need for a second register.
However, here we will neglect the second register and assume that the AC contains the
result of all operations. The program to evaluate X= (A+B)*(C+D) is:
LOAD A         AC ← M[A]
ADD B          AC ← AC + M[B]
STORE T        M[T] ← AC
LOAD C         AC ← M[C]
ADD D          AC ← AC + M[D]
MUL T          AC ← AC * M[T]
STORE X        M[X] ← AC
T is the address of a temporary memory location required for storing the intermediate
result.
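The one-address program can be traced with a small accumulator-machine sketch. The interpreter function and the sample operand values are assumptions made for illustration; only the instruction sequence comes from the program above.

```python
def run_accumulator(program, memory):
    """Execute one-address instructions against an implied accumulator AC."""
    ac = 0
    for op, addr in program:
        if op == "LOAD":
            ac = memory[addr]          # AC <- M[addr]
        elif op == "ADD":
            ac = ac + memory[addr]     # AC <- AC + M[addr]
        elif op == "MUL":
            ac = ac * memory[addr]     # AC <- AC * M[addr]
        elif op == "STORE":
            memory[addr] = ac          # M[addr] <- AC
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory

memory = {"A": 2, "B": 3, "C": 4, "D": 5}   # sample values (assumed)
program = [("LOAD", "A"), ("ADD", "B"), ("STORE", "T"),
           ("LOAD", "C"), ("ADD", "D"), ("MUL", "T"), ("STORE", "X")]
run_accumulator(program, memory)
print(memory["X"])  # 45
```

The temporary location T is needed because AC is the only working register: A + B must be saved before AC is reused for C + D.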
4. Zero-Address Instructions
A stack-organized computer does not use an address field for the instructions ADD and
MUL. The PUSH and POP instructions, however, need an address field to specify the
operand that communicates with the stack. The following program shows how X =
(A+B)*(C+D) will be written for a stack-organized computer. (TOS stands for top of
stack.)
PUSH A         TOS ← A
PUSH B         TOS ← B
ADD            TOS ← (A + B)
PUSH C         TOS ← C
PUSH D         TOS ← D
ADD            TOS ← (C + D)
MUL            TOS ← (C + D) * (A + B)
POP X          M[X] ← TOS
RISC Instructions
The instruction set of a typical RISC processor is restricted to the use of load and store
instructions when communicating between memory and CPU. All other instructions are
executed within the registers of the CPU without referring to memory. The following is a
program to evaluate X = (A+B)*(C+D).
The way the operands are chosen during program execution is dependent on the
addressing mode of the instruction. The addressing mode specifies a rule for
interpreting or modifying the address field of the instruction before the operand is
actually referenced. Computers use addressing mode techniques for the purpose of
accommodating one or both of the following provisions:
1. Implied Mode
In this mode the operands are specified implicitly in the definition of the instruction.
Examples:
Immediate-mode instructions are useful for initializing registers to a constant value.
Note:
Neither the implied mode nor the immediate mode needs an address field.
3. Register Mode
In this mode, the address field (operand) specifies a processor register. The particular
register is selected from a register field in the instruction. A k-bit field can specify any
one of $2^k$ registers.
In this mode the address field of the instruction gives the address where the effective
address is stored in memory. Control fetches the instruction from memory and uses its
address part to access memory again to read the effective address.
The effective address in these modes is obtained from the following computation:
Example:
Assume that the program counter contains the number 825 and the address part of
the instruction contains the number 24.
The instruction at location 825 is read from memory during the fetch phase and
the program counter is then incremented by one to 826.
The effective address = 826 + 24 = 850.
This is 24 memory locations forward from the address of the next instruction.
8. Indexed Addressing Mode
In this mode the content of an index register is added to the address part of the
instruction to obtain the effective address. The index register is a special CPU register
that contains an index value. The address field of the instruction defines the beginning
address of a data array in memory.
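The effective-address computations for the relative and indexed modes can be written out as a short sketch. The function names are assumptions of this example, and the indexed-mode values (array base 500, index 3) are hypothetical; the relative-mode values are those of the example above.

```python
def relative_ea(pc_after_fetch, address_field):
    """Relative mode: EA = PC (already incremented) + address part."""
    return pc_after_fetch + address_field

def indexed_ea(index_register, address_field):
    """Indexed mode: EA = index register + address part (array base)."""
    return index_register + address_field

pc = 825 + 1                      # PC is incremented during the fetch phase
ea_rel = relative_ea(pc, 24)      # 826 + 24 = 850
ea_idx = indexed_ea(3, 500)       # hypothetical: element 3 of an array at 500
print(ea_rel, ea_idx)
```

In the indexed case the address field stays fixed while the index register changes, which is what makes the mode convenient for stepping through an array.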
Data transfer instructions move data from one place to another without changing the
data content. The most common transfers are between memory and processor registers,
between processor registers and input or output, and between the processor registers
themselves. The following table gives a list of eight data transfer instructions used in
many computers.
1. Arithmetic instructions
2. Logical and bit manipulation instructions
3. Shift instructions
Arithmetic Instructions
The four basic arithmetic operations are addition, subtraction, multiplication and
division. Most computers provide instructions for all four operations.
The mnemonics for three add instructions that specify different data types are shown
below.
ADDI    Add two binary integer numbers
ADDF    Add two floating-point numbers
ADDD    Add two decimal numbers in BCD
Logical and Bit Manipulation Instructions
Logical instructions perform binary operations on strings of bits stored in registers. They
are useful for manipulating individual bits or a group of bits that represent binary-coded
information.
Some typical logical and bit manipulation instructions are listed in the table below.
Table: Typical Logical and Bit Manipulation Instructions
Shift Instructions
Instructions to shift the content of an operand are quite useful and are often provided in
several variations. Shifts are operations in which the bits of a word are moved to the left
or to the right. Shift instructions may specify logical shifts, arithmetic shifts, or
rotate-type operations.
A possible instruction code format of a shift instruction may include five fields as follows:
OP REG TYPE RL COUNT
With such a format, it is possible to specify the type of shift, the direction, and the
number of shifts, all in one instruction.
In other words, program control instructions specify conditions for altering the content of
the program counter, while data transfer and manipulation instructions specify
conditions for data-processing operations. The change in value of the program counter
as a result of the execution of a program control instruction causes a break in the
sequence of instruction execution. This is an important feature in digital computers, as it
provides control over the flow of program execution and a capability for branching to
different program segments.
Some typical program control instructions are listed in the following table.
Table: Typical Program Control Instructions
It is sometimes convenient to supplement the ALU circuit in the CPU with a status
register where status bit conditions can be stored for further analysis. Status bits are also
called condition-code bits or flag bits.
The following figure shows the block diagram of an 8-bit ALU with a 4-bit status register.
The four status bits are symbolized by C, S, Z, and V. The bits are set or cleared as a
result of an operation performed in the ALU.
Example:
Consider an 8-bit ALU. The largest unsigned number that can be accommodated in
8 bits is 255. The range of signed numbers is between +127 and -128. The subtraction
of two numbers is the same whether they are unsigned or in signed 2’s complement
representation. Let A = 11110000 and B = 00010100. To perform A - B, the ALU takes the
2’s complement of B and adds it to A.
The compare instruction updates the status bits as shown. C=1 because there is a carry
out of the last stage. S=1 because the leftmost bit is 1. V=0 because the last two carries
are both equal to 1, and Z=0 because the result is not equal to 0.
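The status bits in this example can be checked with a small sketch of an 8-bit subtraction. The function name and the dictionary of flags are assumptions of the example, and B is taken as the 8-bit value 00010100. V is computed as the exclusive-OR of the last two carries, as described above.

```python
def alu_subtract(a, b, bits=8):
    """Compute A - B as A + (2's complement of B) and derive C, S, Z, V."""
    mask = (1 << bits) - 1
    b_comp = (~b + 1) & mask                 # 2's complement of B
    total = a + b_comp
    result = total & mask
    carry_out = total >> bits                # carry out of the last stage
    low = mask >> 1                          # all bits below the sign bit
    carry_in = ((a & low) + (b_comp & low)) >> (bits - 1)  # carry into last stage
    flags = {
        "C": carry_out,
        "S": result >> (bits - 1),           # leftmost bit of the result
        "Z": int(result == 0),
        "V": carry_in ^ carry_out,           # overflow: last two carries differ
    }
    return result, flags

result, flags = alu_subtract(0b11110000, 0b00010100)
print(format(result, "08b"), flags)
```

The result is 11011100, with C = 1, S = 1, Z = 0, and V = 0, which agrees with the values stated in the text.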
The table below gives a list of the most common branch instructions. Each mnemonic is
constructed with the letter B (for branch) and an abbreviation of the condition name.
When the opposite condition state is used, the letter N (for no) is inserted to define the 0
state.
Program Interrupt
Program interrupt refers to the transfer of program control from a currently running
program to another service program as a result of an externally or internally generated
request. Control returns to the original program after the service program is executed.
The interrupt procedure is similar to a subroutine call except for three variations:
1. The interrupt is usually initiated by an internal or external signal rather than by
the execution of an instruction (except for a software interrupt);
2. The address of the interrupt service program is determined by the hardware
rather than from the address field of an instruction; and
3. An interrupt procedure usually stores all the information necessary to define the
state of the CPU rather than storing only the program counter.
The state of the CPU at the end of the execution cycle (when the interrupt is
recognized) is determined from:
1. The content of the program counter
2. The content of all processor registers
3. The content of certain status conditions
External interrupts come from input-output (I/O) devices, from a timing device, from a
circuit monitoring the power supply, or from any other external source. External
interrupts depend on external conditions that are independent of the program being
executed at the time.
Internal interrupts arise from illegal or erroneous use of an instruction or data. Internal
interrupts are also called traps. Examples of interrupts caused by internal error
conditions are register overflow, attempt to divide by zero, an invalid operation code,
stack overflow, and protection violation.
CHAPTER-3
Example: While an instruction is being executed in the ALU, the next instruction can be
read from memory.
A system may have two or more ALUs so that it can execute two or more instructions at the same
time.
Parallel processing is established by distributing the data among the multiple functional
units.
Figure below shows one possible way of separating the execution unit into eight
functional units operating in parallel.
3.2 Pipeline
Example: To perform the combined multiply and add operations with a stream of
numbers:
Ai * Bi + Ci for i = 1, 2, 3, ..., 7
The sub-operations performed in each segment of the pipeline are as follows:
Example: Consider a floating-point adder pipeline whose inputs are two floating-point
binary numbers:
$X = A \times 2^a$
$Y = B \times 2^b$
where A and B are two fractions that represent the mantissas and a and b are the exponents.
The floating-point addition and subtraction can be performed in four segments as
follows:
The figure below shows how the instruction cycle in the CPU can be processed with a
four-segment pipeline.
Among the characteristics of the reduced instruction set computer (RISC) is its ability to
use an efficient instruction pipeline.
RISC is a machine with a very fast clock cycle that executes at the rate of one
instruction per cycle.
Simple Instruction Set
Fixed Length Instruction Format
Register-to-Register Operations
I: Instruction Fetch
A: ALU operation
E: Execute instruction
DO 20 I = 1,100
20 C(I) =B(I)+A(I)
This is a program for adding two vectors A and B of length 100 to produce a vector C.
a) Matrix Multiplication
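The product matrix C is a 3 x 3 matrix whose elements are related to the elements of A
and B by the inner product:
$c_{ij} = \sum_{k=1}^{3} a_{ik} \times b_{kj}$
For example, the number in the first row and first column of matrix C is calculated by
letting i = 1, j = 1, to obtain
$c_{11} = a_{11} b_{11} + a_{12} b_{21} + a_{13} b_{31}$
In general, the inner product consists of the sum of k product terms of the form
$C = A_1 B_1 + A_2 B_2 + A_3 B_3 + \cdots + A_k B_k$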
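The inner-product form of matrix multiplication can be sketched with three nested loops. The function name `matmul` and the sample matrices are assumptions made for this example; the innermost loop is the accumulation of product terms that a vector processor would pipeline.

```python
def matmul(a, b):
    """Multiply matrices a (n x m) and b (m x p) by inner products."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]   # one product term of the sum
    return c

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]          # sample values (assumed)
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C = matmul(A, B)
print(C[0][0])  # 1*9 + 2*6 + 3*3 = 30
```

For 3 x 3 matrices there are nine inner products of three terms each, so 27 multiply-add operations in total.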
b) Memory Interleaving
Memory interleaving is the technique of using memory from two or more sources. An
instruction pipeline may require the fetching of an instruction and an operand at the same
time from two different segments. Similarly, an arithmetic pipeline usually requires two
or more operands to enter the pipeline at the same time. Instead of using two memory
buses for simultaneous access, the memory can be partitioned into a number of modules
connected to common memory address and data buses.
The advantage of a modular memory is that it allows the use of a technique called interleaving.
CHAPTER-4
INPUT/OUTPUT ORGANIZATION
4.1 Peripheral Devices
Peripheral devices are the I/O devices that are externally connected to the machine to read or
write information.
Input Devices
Keyboard
Optical input devices
Magnetic Input Devices: Magnetic Stripe Reader
Screen Input Devices: Touch Screen, Light Pen, Mouse
Analog Input Devices
Output Devices
Card Puncher, Paper Tape Puncher
CRT
Printer (Impact, Ink Jet, Laser, Dot Matrix)
Plotter
Analog
4.2 Input-Output Interface
1. Use two separate buses, one for memory and the other for I/O.
2. Use one common bus for both memory and I/O but have separate control lines
for each.
3. Use one common bus for memory and I/O with common control lines.
I/O Mapping
Two types of I/O mapping:
1. Isolated I/O
2. Memory-Mapped I/O
Isolated I/O
Separate I/O read/write control lines in addition to memory read/write control lines
Separate (isolated) memory and I/O address spaces
Distinct input and output instructions
Memory-Mapped I/O
A single set of read/write control lines (no distinction between memory and I/O transfers)
Memory and I/O addresses share a common address space, which reduces the memory
address range available
No specific input or output instructions; the same memory reference instructions can be
used for I/O transfers
Considerable flexibility in handling I/O operations
1. Synchronous: all devices derive their timing information from a common clock line
2. Asynchronous: no common clock
Asynchronous data transfer
Asynchronous data transfer between two independent units requires that control signals
be transmitted between the communicating units to indicate the time at which data is
being transmitted.
Two asynchronous data transfer methods:
1. Strobe pulse: a strobe pulse is supplied by one unit to indicate to the other unit
when the transfer has to occur.
2. Handshaking: a control signal accompanies each data item being transmitted
to indicate the presence of data. The receiving unit responds with another control
signal to acknowledge receipt of the data.
Strobe Control
Employs a single control line to time each transfer
The strobe may be activated by either the source or the destination unit
Source-Initiated: the source unit that initiates the transfer has no way of knowing
whether the destination unit has actually received the data.
Destination-Initiated: the destination unit that initiates the transfer has no way of knowing
whether the source has actually placed the data on the bus.
Handshaking
To solve the problem of the strobe method, the handshake method introduces a second control
signal that provides a reply to the unit that initiates the transfer.
Handshaking provides a high degree of flexibility and reliability because the successful
completion of a data transfer relies on active participation by both units.
If one unit is faulty, the data transfer will not be completed. Such an error can be detected by means of
a timeout mechanism.
Asynchronous Serial Transfer
The transfer of data between two units may be done in parallel or serial.
Parallel data transmission- each bit of the message has its own path and the total
message is transmitted at the same time.
Serial data transmission- each bit in the message is sent in sequence one at a time.
Serial transmission can be synchronous or asynchronous.
Integrated circuits are available that are specifically designed to provide the interface
between a computer and similar interactive terminals.
Such a circuit is known as a Universal Asynchronous Receiver-Transmitter (UART).
Transmitter Register
Accepts a data byte (from CPU) through the data bus
Transferred to a shift register for serial transmission
Receiver
Receives serial information into another shift register
Complete data byte is sent to the receiver register
Status Register Bits
Used for I/O flags and for recording errors
Control Register Bits
Define the baud rate, the number of bits in each character, whether to generate and check
parity, and the number of stop bits
First-In-First-Out (FIFO) Buffer
A first-in-first-out (FIFO) buffer is a memory unit that stores information in such a manner
that the item first in is the item first out.
A FIFO buffer is useful in some applications when data is transferred asynchronously.
The logic diagram of a typical 4 x 4 FIFO buffer is shown in the following figure. It
consists of four 4-bit registers Ri, i = 1,2,3,4, and a control register with flip-flops Fi, i =
1,2,3,4, one for each register.
There are four different data transfer modes between the central computer (CPU and memory) and peripherals:
1) Programmed-Controlled I/O
2) Interrupt-Initiated I/O
3) Direct Memory Access (DMA)
4) I/O Processor (IOP)
Programmed-Controlled I/O (I/O devices to CPU)
Transfer of data under programmed I/O is between CPU and peripherals
Programmed I/O operations are the result of I/O instructions written in the computer
program.
An example of data transfer from an I/O device through an interface into the CPU is
shown in the following figure:
Command:
Instructions that are read from memory by an IOP
Distinguished from instructions that are read by the CPU
Commands are prepared by experienced programmers and are stored in
memory
Command word = IOP program
CPU - IOP Communication
I/O Channel
Start I/O, Start I/O fast release (less CPU time), Test I/O, Clear I/O, Halt I/O, Halt
device, Test channel, Store channel ID
Channel Status Word:
Always stored in Address 64 in memory
Key: Protection used to prevent unauthorized access
Address: Last channel command word address used by channel
Count: 0 (if successful transfer)
Identify the source of the interrupt when several sources request service
simultaneously
Determine which condition is to be serviced first when two or more requests arrive
simultaneously
Priority interrupt can be done by:
1) Software: Polling
2) Hardware: Daisy chain, Parallel priority
Polling
Identify the highest-priority source by software means
One common branch address is used for all interrupts
Program polls the interrupt sources in sequence
The highest-priority source is tested first
Drawback of polling priority interrupt:
If there are many interrupt sources, the time required to poll them can exceed the
time available to service the I/O device.
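The polling procedure can be sketched as a loop that tests the interrupt sources in priority order. The function name and the list-of-flags representation are assumptions of this example; in a real system the program would read device status registers instead.

```python
def poll(sources):
    """Return the index of the highest-priority source requesting service.

    `sources` is ordered from highest to lowest priority; each entry is
    True when that source has raised an interrupt request.
    """
    for priority, requesting in enumerate(sources):
        if requesting:
            return priority       # first match is the highest priority
    return None                   # no source is requesting service

# Sources 1 and 2 request service simultaneously; source 1 is served first.
served = poll([False, True, True])
print(served)
```

The loop makes the cost visible: a low-priority source is reached only after every higher-priority source has been tested, so the polling time grows with the number of sources.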
Hardware priority interrupt
Daisy-Chaining
Either a serial or a parallel connection of interrupt lines can establish the hardware
priority function.
The serial connection is known as the daisy-chaining method.
1) No interrupt request
2) Invalid: interrupt request, but no acknowledge
3) No interrupt request: Pass to other device (other device requested interrupt)
4) Interrupt request
Parallel Priority Interrupt
The parallel priority interrupt method uses a register whose bits are set separately by the
interrupt signal from each device.
Interrupt Cycle
At the end of each instruction cycle, the CPU checks IEN and IST.
If both IEN and IST are equal to 1, the CPU goes to an interrupt cycle.
Sequence of micro-operations during the interrupt cycle:
Serial communication is used for all long-distance communication and most computer
networks
Slow data transfer
Parallel communication is the process of transferring all the data bits over a
communication channel or computer bus simultaneously.
CHAPTER-5
MEMORY ORGANIZATION
Main Memory: the memory unit that communicates directly with the CPU (RAM)
Auxiliary Memory: devices that provide backup storage (disk drives)
Cache Memory: a special very-high-speed memory used to increase the processing
speed (cache RAM)
Multiprogramming: enables the CPU to process a number of independent
programs concurrently.
Memory Management System: supervises the flow of information between auxiliary
memory and main memory.
5.2 Main Memory
It is the memory unit that communicates directly with the CPU.
It is the central storage unit in a computer system
It is relatively large and fast memory used to store programs and data during the
computer operation.
a) RAM Chips
RAM is used for storing the bulk of the programs and data that are subject to change.
Available in two possible operating modes:
1. Static RAM- consists essentially of internal flip-flops that store binary information.
2. Dynamic RAM – stores the binary information in the form of electric charges that are
applied to capacitors.
Word 2 matches the unmasked argument field because the three leftmost bits of the
argument and the word are equal.
The relation between the memory array and external registers in an associative
memory is shown in the figure below.
The cells in the array are marked by the letter C with two subscripts. The first
subscript gives the word number and the second specifies the bit position in the
word. Thus cell $C_{ij}$ is the cell for bit j in word i.
A bit $A_j$ in the argument register is compared with all the bits in column j of the
array provided that $K_j = 1$. This is done for all columns j = 1, 2, ..., n.
If a match occurs between all the unmasked bits of the argument and the bits in
word i, the corresponding bit $M_i$ in the match register is set to 1. If one or more
unmasked bits of the argument and the word do not match, $M_i$ is cleared to 0.
The internal organization of a typical cell $C_{ij}$ is shown in the figure below.
$x_j = A_j F_{ij} + A'_j F'_{ij}$
where $x_j = 1$ if the pair of bits in position j are equal; otherwise $x_j = 0$.
For a word i to be equal to the argument in A, all $x_j$ variables must be equal to 1.
This is the condition for setting the corresponding match bit $M_i$ to 1.
The Boolean function for this condition is
$M_i = x_1 x_2 x_3 \cdots x_n$
and constitutes the AND operation of all pairs of matched bits in a word.
The requirement is that if $K_j = 0$, the corresponding bits of $A_j$ and $F_{ij}$ need no comparison.
Only when $K_j = 1$ must they be compared. This requirement is achieved by ORing each
term with $K'_j$, thus:
$x_j + K'_j = x_j$ if $K_j = 1$, and $x_j + K'_j = 1$ if $K_j = 0$
The match logic for word i in an associative memory can now be expressed by the
following Boolean function:
$M_i = (x_1 + K'_1)(x_2 + K'_2)(x_3 + K'_3) \cdots (x_n + K'_n)$
If we substitute the original definition of $x_j$, the Boolean function above can be expressed
as follows:
$M_i = \prod_{j=1}^{n} (A_j F_{ij} + A'_j F'_{ij} + K'_j)$
where $\prod$ is a product symbol designating the AND operation of all n terms.
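The match logic can be sketched with bitwise operations, treating each word as an n-bit integer. The function name and the 9-bit sample values are assumptions chosen for illustration; the key masks all but the three leftmost bits.

```python
def match_register(argument, key, words, n):
    """Return the match bits M_i for each word, given argument A and key K."""
    mask = (1 << n) - 1
    matches = []
    for word in words:
        x = ~(argument ^ word) & mask        # x_j = 1 where A_j equals F_ij
        terms = (x | ~key) & mask            # x_j + K'_j for every position j
        matches.append(int(terms == mask))   # AND of all n terms
    return matches

A = 0b101111100          # argument register (assumed sample value)
K = 0b111000000          # key register: compare only the three leftmost bits
words = [0b100111100, 0b101000001]
M = match_register(A, K, words, n=9)
print(M)
```

The first word differs from the argument in one of the unmasked bits and does not match; the second word agrees in all three unmasked bits and matches, even though its masked bits differ.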
Hit ratio: the quantity used to measure the performance of cache memory. It is
the number of hits divided by the total number of CPU references to memory (hits plus
misses).
1 miss: 1 x 1000 ns
9 hits: 9 x 100 ns
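The figures above can be turned into a hit ratio and an average access time. This sketch assumes, as the two lines above suggest, that a hit costs 100 ns and a miss costs 1000 ns in total; other accounting conventions charge a miss for both the cache and the main memory access. The function names are chosen for this example.

```python
def hit_ratio(hits, misses):
    """Hits divided by total CPU references to memory."""
    return hits / (hits + misses)

def average_access_time(hits, misses, t_hit_ns, t_miss_ns):
    """Average time per reference, weighting hits and misses by count."""
    return (hits * t_hit_ns + misses * t_miss_ns) / (hits + misses)

ratio = hit_ratio(9, 1)
avg_ns = average_access_time(9, 1, t_hit_ns=100, t_miss_ns=1000)
print(ratio, avg_ns)
```

With nine hits and one miss out of ten references, the hit ratio is 0.9 and the ten references take 1900 ns in total under the stated assumption.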
Mapping is the transformation of data from main memory to cache memory.
There are three types of mapping process:
1. Associative mapping
2. Direct mapping
3. Set-associative mapping
In the general case, there are $2^k$ words in cache memory and $2^n$ words in main memory.
The n-bit memory address consists of two parts: k bits for the index field and n-k bits for the tag field.
The n-bit address is used to access main memory and the k-bit index is used to access the
cache.
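The index/tag split can be illustrated with a minimal direct-mapped cache sketch. The class name, the choice k = 9 and the octal sample addresses are assumptions made for the example. Main memory is modelled as a plain dictionary from address to data.

```python
class DirectMappedCache:
    """Direct-mapped cache with 2**k_bits words and a block size of one word."""

    def __init__(self, k_bits):
        self.k_bits = k_bits
        self.lines = [None] * (1 << k_bits)   # each entry is (tag, data) or None

    def split(self, address):
        """Return (tag, index): the low k bits are the index, the rest the tag."""
        index = address & ((1 << self.k_bits) - 1)
        tag = address >> self.k_bits
        return tag, index

    def read(self, address, main_memory):
        """Return (data, hit). On a miss the word is loaded from main memory."""
        tag, index = self.split(address)
        entry = self.lines[index]
        if entry is not None and entry[0] == tag:
            return entry[1], True
        data = main_memory[address]
        self.lines[index] = (tag, data)       # the old word at this index is replaced
        return data, False


cache = DirectMappedCache(k_bits=9)
memory = {0o00000: 0o1220, 0o02000: 0o5670}   # hypothetical contents
print(cache.split(0o02777))
print(cache.read(0o00000, memory), cache.read(0o00000, memory))
```

Addresses 00000 and 02000 (octal) share index 000 but have different tags, so in this sketch each one evicts the other. That weakness is what set-associative mapping addresses.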
The direct-mapping example just described uses a block size of one word. The same
organization but using a block size of 8 words is shown in figure below.
Set-Associative Mapping
Each memory block has a set of locations in the cache into which it can be loaded.
Figure: Set-associative mapping cache with set size of two.
In general, a set-associative cache of set size k will accommodate k words of main
memory in each word of cache.
Operation
When the CPU generates a memory request, the index value of the address is used to
access the cache.
The tag field of the CPU address is then compared with both tags in the cache to
determine if a match occurs.
The comparison logic is done by an associative search of the tags in the set similar to an
associative memory search: thus the name “set-associative.”
The hit ratio will improve as the set size increases because more words with the same
index but different tags can reside in cache. However, an increase in the set size
increases the number of bits in words of cache and requires more complex comparison
logic.
When a miss occurs in a set-associative cache and the set is full, it is necessary to
replace one of the tag-data items with a new value.
The most common replacement algorithms used are:
1. Random Replacement: With the random replacement policy the control chooses one
tag-data item for replacement at random.
2. First-In, First-Out (FIFO): The FIFO procedure selects for replacement the item that
has been in the set the longest.
3. Least Recently Used (LRU): The LRU algorithm selects for replacement the item that
has been least recently used by the CPU.
Both FIFO and LRU can be implemented by adding a few extra bits in each word of cache.
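The set-associative operation and the FIFO and LRU policies described above can be sketched as follows. This is a software illustration under stated assumptions: the class name, the policy parameter and the sample addresses are hypothetical, and an ordered dictionary stands in for the extra bits a real cache would keep in each word.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Set-associative cache: each index holds up to set_size (tag, data) items."""

    def __init__(self, k_bits, set_size, policy="LRU"):
        self.k_bits = k_bits
        self.set_size = set_size
        self.policy = policy                  # "LRU" or "FIFO"
        # One ordered dict per index, mapping tag to data; first entry is the victim.
        self.sets = [OrderedDict() for _ in range(1 << k_bits)]

    def read(self, address, main_memory):
        """Return (data, hit), replacing an item if the set is full on a miss."""
        index = address & ((1 << self.k_bits) - 1)
        tag = address >> self.k_bits
        items = self.sets[index]
        if tag in items:
            if self.policy == "LRU":
                items.move_to_end(tag)        # now the most recently used item
            return items[tag], True
        if len(items) >= self.set_size:
            items.popitem(last=False)         # drop oldest (FIFO) or least recent (LRU)
        items[tag] = main_memory[address]
        return items[tag], False


memory = {0o00000: 1, 0o01000: 2, 0o02000: 3}   # three words with the same index
cache = SetAssociativeCache(k_bits=9, set_size=2, policy="LRU")
for addr in (0o00000, 0o01000, 0o00000, 0o02000, 0o00000):
    print(oct(addr), cache.read(addr, memory))
```

The only difference between the two policies in this sketch is whether a hit reorders the set: FIFO leaves the load order untouched, while LRU moves the referenced item to the most recent end.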
Writing into Cache
There are two ways of writing into memory:
1. Write-Through
When writing into memory:
If hit, both cache and memory are written in parallel
If miss, memory is written
2. Write-Back
When writing into memory:
If hit, only the cache is written
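The two write policies can be contrasted with a small sketch. Several details here are assumptions and are not taken from the notes: the function names, the dirty set that records modified cache words, the evict helper that copies a modified word back to memory, and the handling of a write-back miss (written straight to memory in this sketch). Cache and memory are modelled as dictionaries from address to value.

```python
def write_through(cache, memory, address, value):
    """Hit: cache and memory are both written. Miss: only memory is written."""
    if address in cache:
        cache[address] = value
    memory[address] = value


def write_back(cache, dirty, memory, address, value):
    """Hit: only the cache is written and the word is marked as modified."""
    if address in cache:
        cache[address] = value
        dirty.add(address)
    else:
        # Assumption: the miss case is not specified in the text above;
        # this sketch simply writes the value to memory.
        memory[address] = value


def evict(cache, dirty, memory, address):
    """Assumed helper: copy a modified word back to memory when it leaves the cache."""
    value = cache.pop(address)
    if address in dirty:
        memory[address] = value
        dirty.discard(address)


cache, memory, dirty = {5: 0}, {5: 0}, set()
write_back(cache, dirty, memory, 5, 42)
print(cache[5], memory[5])     # memory is stale until the word is evicted
evict(cache, dirty, memory, 5)
print(memory[5])
```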
Figure: Address space and memory space split into groups of 1K words.
A more efficient way to organize the page table would be to construct it with a number of
words equal to the number of blocks in main memory.
Each word in memory contains a page number together with its corresponding block
number.
The page field in each word is compared with the page number in the virtual address.
If a match occurs, the word is read from memory and its corresponding block number is
extracted.
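The page-table lookup just described can be sketched as follows, assuming the 1K-word page size from the figure caption. The function names and the sample table contents are hypothetical. The loop stands in for what the hardware does as one parallel associative compare.

```python
PAGE_SIZE_BITS = 10   # 1K words per page and per block, as in the figure caption


def lookup_block(page_table, page_number):
    """Compare the page field of every word with page_number; None means no match."""
    for page, block in page_table:
        if page == page_number:
            return block
    return None


def translate(page_table, virtual_address):
    """Map a virtual address to a main memory address, or None if the page is absent."""
    page = virtual_address >> PAGE_SIZE_BITS
    line = virtual_address & ((1 << PAGE_SIZE_BITS) - 1)
    block = lookup_block(page_table, page)
    if block is None:
        return None                    # page is not in main memory
    return (block << PAGE_SIZE_BITS) | line


# Hypothetical table with one (page, block) word per block of main memory.
table = [(1, 3), (2, 0), (5, 1), (6, 2)]
print(translate(table, (5 << 10) | 7))
print(translate(table, 3 << 10))
```

The table has only as many words as there are blocks in main memory, which is the point of this organization.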
Page Replacement
Page Fault: the page referenced by the CPU is not in main memory
A new page should be transferred from auxiliary memory to main memory
Replacement algorithms:
1. FIFO (First-In-First-Out)
The FIFO algorithm selects the page that has been in memory the longest, using a queue:
every time a page is loaded, its identification is inserted in the queue
Easy to implement
May result in frequent page faults
2. Optimal Replacement (OPT)
Has the lowest page fault rate of all algorithms
Replaces the page that will not be used for the longest period of time
3. LRU (Least Recently Used)
OPT is difficult to implement since it requires future knowledge
LRU uses the recent past as an approximation of the near future
Replaces the page that has not been used for the longest period of time
LRU may require substantial hardware assistance
The problem is to determine an order for the frames defined by the time of last
use
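The three replacement algorithms can be compared by counting page faults on a reference string. The sketch below is a simulation under stated assumptions: the function name, the three-frame memory and the reference string are chosen for illustration and do not come from the notes.

```python
def count_faults(references, frames, policy):
    """Count page faults for policy "FIFO", "OPT" or "LRU" with the given frame count."""
    if policy not in ("FIFO", "OPT", "LRU"):
        raise ValueError("unknown policy: " + policy)
    memory = []   # FIFO: load order; LRU: least recently used page first
    faults = 0
    for i, page in enumerate(references):
        if page in memory:
            if policy == "LRU":
                memory.remove(page)
                memory.append(page)      # most recently used page goes last
            continue
        faults += 1
        if len(memory) == frames:
            if policy == "OPT":
                # Victim: the page whose next use lies furthest in the future.
                future = references[i + 1:]
                victim = max(
                    memory,
                    key=lambda p: future.index(p) if p in future else len(future),
                )
            else:
                victim = memory[0]       # oldest load (FIFO) or least recent use (LRU)
            memory.remove(victim)
        memory.append(page)
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
for policy in ("FIFO", "OPT", "LRU"):
    print(policy, count_faults(refs, 3, policy))
```

On this string with three frames the counts are 15 faults for FIFO, 9 for OPT and 12 for LRU. OPT is usable here only because the simulation already knows the whole reference string.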
LRU Implementation Methods:
Counters
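Each page table entry has a time-of-use register
A counter is incremented for every memory reference, and its value is copied into the
time-of-use register of the referenced page
The page with the smallest value in its time-of-use register is replaced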
Stack
Stack of page numbers
Whenever a page is referenced, its page number is removed from the stack and
pushed on top
The least recently used page number is at the bottom
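Both LRU implementation methods can be sketched in a few lines. The class and method names are assumptions made for illustration. A dictionary and a list stand in for the time-of-use registers and the hardware stack.

```python
class CounterLRU:
    """Counter method: one time-of-use value per page."""

    def __init__(self):
        self.counter = 0
        self.time_of_use = {}            # page -> counter value at its last reference

    def reference(self, page):
        self.counter += 1                # incremented for every memory reference
        self.time_of_use[page] = self.counter

    def victim(self):
        # The page with the smallest time-of-use value is replaced.
        return min(self.time_of_use, key=self.time_of_use.get)


class StackLRU:
    """Stack method: the end of the list is the top of the stack."""

    def __init__(self):
        self.stack = []

    def reference(self, page):
        if page in self.stack:
            self.stack.remove(page)      # take the page out of the middle
        self.stack.append(page)          # and push it on top

    def victim(self):
        return self.stack[0]             # least recently used page is at the bottom


counter_lru, stack_lru = CounterLRU(), StackLRU()
for page in (1, 2, 3, 1):
    counter_lru.reference(page)
    stack_lru.reference(page)
print(counter_lru.victim(), stack_lru.victim())
```

Both methods pick the same victim. The counter method needs a search for the minimum at replacement time, while the stack method does its work at every reference.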
LRU Approximation
A reference (or use) bit is used to approximate LRU
It is turned on when the corresponding page is referenced after its initial loading
Additional reference bits may be used
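One way to use additional reference bits is sketched below. The scheme is an assumption, since the notes do not spell it out: each page keeps a short history byte, and at regular intervals the reference bit is shifted into the high-order end of that byte and then cleared. The class name, the tick method and the 8-bit history width are hypothetical.

```python
class ReferenceBitLRU:
    """Approximate LRU with a reference bit plus a history of past reference bits."""

    def __init__(self, history_bits=8):
        self.history_bits = history_bits
        self.ref_bit = {}                # page -> current reference bit
        self.history = {}                # page -> shifted record of earlier bits

    def load(self, page):
        self.ref_bit[page] = 0           # the bit is off right after loading
        self.history[page] = 0

    def reference(self, page):
        self.ref_bit[page] = 1           # turned on when the page is referenced

    def tick(self):
        """Periodic step: shift each reference bit into its history, then clear it."""
        top = self.history_bits - 1
        for page in self.history:
            self.history[page] = (self.ref_bit[page] << top) | (self.history[page] >> 1)
            self.ref_bit[page] = 0

    def victim(self):
        # The smallest history value marks the page used least recently.
        return min(self.history, key=self.history.get)


pages = ReferenceBitLRU()
for p in (1, 2, 3):
    pages.load(p)
pages.reference(1)
pages.reference(2)
pages.tick()
pages.reference(1)
pages.tick()
print(pages.victim(), bin(pages.history[1]), bin(pages.history[2]))
```

Reading the history as an unsigned number gives the approximation: a page referenced in recent intervals has 1s in the high-order bits and therefore a larger value.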
5.7 Memory Management Hardware
A memory management system is a collection of hardware and software procedures for managing the various
programs residing in memory.
Basic components of a Memory Management Unit:
1. Address mapping
2. Common program sharing
3. Program protection
MMU: OS, CPU, memory controller
Segment
A set of logically related instructions or data elements associated with a given name
Example: a subroutine, an array of data, a symbol table, a user's program
Logical Address
The address generated by a segmented program
Similar to virtual address
Virtual Address: fixed-length page
Logical Address: variable-length segment
Memory Protection
Figure: Typical segment descriptor.