COA Notes 4th Sem Computer Organization and Architecture
CONTENT

SL NO. CHAPTER
1 Functional blocks of a computer
  1. CPU
  2. Memory
  3. Input-output subsystems
  4. Control unit
  5. Instruction set architecture of a CPU
  6. Registers
  7. Instruction execution cycle
  8. RTL interpretation of instructions
  9. Addressing modes, instruction set
  10. Case study – instruction sets of some common CPUs
2 Data representation
  1. Signed number representation
  2. Fixed and floating point representations
  3. Character representation
  4. Computer arithmetic – integer addition and subtraction
  5. Ripple carry adder
  6. Carry look-ahead adder
  7. Multiplication – shift-and-add
  8. Booth multiplier
  9. Carry save multiplier
  10. Division – restoring and non-restoring techniques
  11. Floating point arithmetic
3 Pipeline hazards
5 Memory organization
  1. Memory interleaving
  3. Cache memory
  5. Mapping function
  6. Replacement algorithm
  7. Write policies
SYLLABUS
Functional blocks of a computer: CPU, memory, input-output subsystems, control unit. Instruction set
architecture of a CPU – registers, instruction execution cycle, RTL interpretation of instructions,
addressing modes, instruction set. Case study – instruction sets of some common CPUs.
Data representation: signed number representation, fixed and floating-point representations, character
representation. Computer arithmetic – integer addition and subtraction, ripple carry adder, carry look-
ahead adder, etc.; multiplication – shift-and-add, Booth multiplier, carry save multiplier, etc.; division –
restoring and non-restoring techniques; floating point arithmetic.
CPU control unit design: hardwired and micro-programmed design approaches, Case study – design of a
simple hypothetical CPU.
Peripheral devices and their characteristics: Input-Output subsystems, I/O device interface, I/O
transfers – program controlled, interrupt driven and DMA, privileged and non-privileged instructions,
software interrupts and exceptions, programs and processes – role of interrupts in process state
transitions, I/O device interfaces – SCSI, USB.
Parallel Processors: Introduction to parallel processors, Concurrent access to memory and cache
coherency CPU Basics: Multiple CPUs, Cores, and Hyper-Threading, Introduction to Multiple-Processor
Scheduling in Operating System.
Module-I:
Functional blocks of a computer:
Input Unit :The input unit consists of input devices that are attached to the
computer. These devices take input and convert it into binary language that
the computer understands. Some of the common input devices are
keyboard, mouse, joystick, scanner etc.
Central Processing Unit (CPU) : Once the information is entered into the
computer by the input device, the processor processes it. The CPU is called
the brain of the computer because it is the control center of the computer. It
first fetches instructions from memory and then interprets them so as to
know what is to be done. If required, data is fetched from memory or an input
device. Thereafter, the CPU executes the required computation and
then either stores the output or displays it on the output device. The CPU has
three main components, each responsible for a different function –
the Arithmetic Logic Unit (ALU), the Control Unit (CU) and memory registers.
Arithmetic and Logic Unit (ALU): The ALU, as its name suggests,
performs mathematical calculations and takes logical decisions. Arithmetic
calculations include addition, subtraction, multiplication and division. Logical
decisions involve comparing two data items to see which one is larger,
smaller, or equal.
Control Unit: The control unit coordinates and controls the flow of data into and
out of the CPU, and it also controls the operations of the ALU, the memory registers
and the input/output units. It is responsible for carrying out all the
instructions stored in the program: it decodes each fetched instruction,
interprets it, and sends control signals to the other units until the
required operation has been performed properly by the ALU and memory.
Types of Control Unit – There are two types of control units: Hardwired control
unit and Microprogrammable control unit.
1. Hardwired Control Unit – In the Hardwired control unit, the control signals
that are important for instruction execution control are generated by specially
designed hardware logical circuits, in which we cannot modify the signal
generation method without physical change of the circuit structure. The
operation code of an instruction contains the basic data for control signal
generation. In the instruction decoder, the operation code is decoded. The
instruction decoder constitutes a set of many decoders that decode different
fields of the instruction opcode. As a result, a few of the output lines going out of
the instruction decoder take active signal values. These output lines are
connected to the inputs of the matrix that generates control signals for the
execution units of the computer. This matrix implements logical combinations
of the decoded signals from the instruction opcode with the outputs of the
matrix that generates signals representing consecutive control unit states
and with signals coming from outside the processor,
e.g. interrupt signals. The matrices are built in a similar way to
programmable logic arrays. After executing an instruction
that ends program execution, the control unit enters an operating system
state, in which it waits for the next user directive.
2. Microprogrammable control unit – The fundamental difference between
these unit structures and the structure of the hardwired control unit is the
existence of the control store that is used for storing words containing
encoded control signals necessary for instruction execution. In
microprogrammed control units, successive instruction words are fetched
into the instruction register in the normal way. However, the operation code of
each instruction is not directly decoded to enable immediate control signal
generation; instead, it provides the initial address of a microprogram contained
in the control store.
With a single-level control store: In this, the instruction opcode from
the instruction register is sent to the control store address register. Based
on this address, the first microinstruction of a microprogram that
interprets execution of this instruction is read to the microinstruction
register. This microinstruction contains encoded control signals in its
operation part, normally as a few bit fields. The fields are decoded in a set
of microinstruction field decoders. The microinstruction also contains the
address of the next microinstruction of the given instruction's microprogram
and a control field used to control the activities of the microinstruction
address generator. The microprogram for each instruction ends with a
microinstruction that fetches the next instruction from main memory into
the instruction register.
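The single-level scheme above can be mimicked with a tiny table-driven sketch. Everything here – opcodes, addresses, signal names – is invented for illustration; it only shows how an opcode selects a microprogram whose words carry encoded control signals plus a next-address field.

```python
# Toy single-level control store: the opcode indexes the start of a
# microprogram; each microinstruction carries a set of control signals
# and the address of the next microinstruction (None ends the microprogram).
# All opcodes, addresses and signal names are invented for illustration.

MICROPROGRAM_START = {"LOAD": 0, "ADD": 2}   # opcode -> first microinstruction

CONTROL_STORE = {
    0: ({"MAR<-IR.addr"}, 1),
    1: ({"MBR<-Mem", "AC<-MBR"}, None),
    2: ({"MAR<-IR.addr"}, 3),
    3: ({"MBR<-Mem", "AC<-AC+MBR"}, None),
}

def run_microprogram(opcode):
    """Walk the control store, collecting the control signals issued."""
    addr = MICROPROGRAM_START[opcode]
    issued = []
    while addr is not None:
        signals, addr = CONTROL_STORE[addr]
        issued.append(signals)
    return issued

print(run_microprogram("ADD"))
```

A two-level store would add one more indirection: the microinstruction would hold a nano-store address, and the control signals would live in the nano-store.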
With a two-level control store: In this, in a control unit with a two-level
control store, besides the control memory for microinstructions, a nano-
instruction memory is included. In such a control unit, microinstructions
do not contain encoded control signals. The operation part of
microinstructions contains the address of the word in the nano-instruction
memory, which contains encoded control signals. The nano-instruction
memory contains all combinations of control signals that appear in
microprograms that interpret the complete instruction set of a given
computer, written once in the form of nano-instructions.
Memory: Memory attached to the CPU is used for storage of data and
instructions and is called internal memory. The internal memory is divided into
many storage locations, each of which can store data or instructions. Each memory
location is of the same size and has an address. With the help of the address, the
computer can read any memory location directly without having to search the entire
memory. When a program is executed, its data is copied to the internal memory
and is stored there until the execution ends. The internal memory is
also called primary memory or main memory. Because the time to access data is
independent of its location in memory, this memory is also called Random Access
Memory (RAM).
RAM: The term is based on the fact that any storage location can be
accessed directly by the processor.
Static RAM: SRAM retains data bits in its memory for as long as power is
supplied to it. Unlike DRAM, which stores bits in cells consisting of a
capacitor and a transistor, SRAM does not have to be periodically
refreshed.
Output Unit: The output unit consists of output devices that are attached to
the computer. It converts the binary data coming from the CPU into human-
understandable form. The common output devices are the monitor, printer, etc.
Registers: Registers are a type of computer memory used to quickly accept, store,
and transfer data and instructions that are being used immediately by the CPU. The
registers used by the CPU are often termed as Processor registers. A processor register
may hold an instruction, a storage address, or any data (such as bit sequence or
individual characters). The computer needs processor registers for manipulating data
and a register for holding a memory address. The register holding the memory location
is used to calculate the address of the next instruction after the execution of the current
instruction is completed.
Following is the list of some of the most common registers used in a basic
computer:
o The Memory unit has a capacity of 4096 words, and each word contains 16 bits.
o The Data Register (DR) contains 16 bits which hold the operand read from the
memory location.
o The Memory Address Register (MAR) contains 12 bits which hold the address for
the memory location.
o The Program Counter (PC) also contains 12 bits which hold the address of the
next instruction to be read from memory after the current instruction is executed.
o The Accumulator (AC) register is a general purpose processing register.
o The instruction read from memory is placed in the Instruction register (IR).
o The Temporary Register (TR) is used for holding the temporary data during the
processing.
o The Input Register (INPR) holds the input character given by the user.
o The Output Register (OUTR) holds the output after processing the input data.
Instruction Execution Cycle: The instruction cycle comprises three main stages and is
also addressed as the fetch-decode-execute cycle or fetch-execute cycle because of the steps
involved. The stages are as follows:
Fetch stage
Decode stage
Execute stage
In each pass through the cycle:
The CPU fetches the instruction from memory into the instruction register.
It decodes the instruction to determine the required operation.
If the instruction uses an indirect address, the CPU reads the effective address from memory.
It then executes the operation.
The loop repeats until a halt condition is met; until then, the cycle keeps restarting
after every instruction, and the control unit keeps track of which phase of the cycle
is currently in progress.
Fetch Cycle
The address of the next instruction to be executed is held in the program counter (PC). The processor
retrieves the instruction from the memory location the PC points to, and the PC is incremented so
that it holds the address of the following instruction. The fetched instruction is written to the
instruction register, where the processor decodes it and then executes it.
Execute Cycle
Data transfer during execution occurs in two ways, as follows:
Processor-memory transfer: data is sent from the processor to memory or from memory to the
processor. Processor-I/O transfer: data is transferred to or from a peripheral device via a
processor-I/O device transfer.
During the execute cycle, the processor performs the required operations on the data, and the
control unit may request a change in the sequence of execution (for example, a branch). Together,
these transfers and operations complete the execution cycle.
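The fetch-decode-execute loop can be sketched for a made-up accumulator machine; the opcodes LOAD/ADD/HALT are illustrative, not from any real ISA:

```python
# Minimal fetch-decode-execute loop for a toy accumulator machine.
# Each memory word is a (opcode, operand) pair; opcodes are invented.

def run(memory):
    pc, ac = 0, 0
    while True:
        opcode, operand = memory[pc]   # fetch
        pc += 1                        # PC now points at the next instruction
        if opcode == "LOAD":           # decode + execute
            ac = operand
        elif opcode == "ADD":
            ac += operand
        elif opcode == "HALT":         # halt condition ends the cycle
            return ac

program = [("LOAD", 5), ("ADD", 7), ("HALT", 0)]
print(run(program))   # 12
```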
The Register-reference instructions are represented by the Opcode 111 with a 0 in the
leftmost bit (bit 15) of the instruction. A Register-reference instruction specifies an
operation on or a test of the AC (Accumulator) register.
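For the 16-bit instruction format described above (I bit in bit 15, opcode in bits 14-12, address in bits 11-0), field extraction is a matter of shifts and masks; a sketch:

```python
# Splitting a 16-bit basic-computer instruction word into its fields:
# bit 15 is the I (indirect) bit, bits 14-12 the opcode, bits 11-0 the address.

def decode(word):
    i_bit   = (word >> 15) & 0x1
    opcode  = (word >> 12) & 0x7
    address = word & 0xFFF
    # opcode 111 with I = 0 marks a register-reference instruction
    reg_ref = (opcode == 0b111 and i_bit == 0)
    return i_bit, opcode, address, reg_ref

print(decode(0x7800))  # 0111 1000 0000 0000: a register-reference instruction
```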
Addressing Modes:
Instructions that define the address of a definite memory location are known as memory
reference instructions. The method in which a target address or effective address is
recognized within the instruction is known as addressing mode.
An instruction code is divided into two parts, as follows −
Operation code
The operation code (opcode) defines the operation to be performed. The total number of
operations supported by the computer determines the number of bits needed for the opcode:
at least n bits are needed for 2^n operations. These operations
are performed on data that is stored in processor registers or in memory.
Address
The address specifies the location in memory where an operand is found.
Sometimes the address bits of an instruction code are used as the operand itself and not as
an address; in that case, the instruction has an immediate operand. If the second
part holds the address of the operand, the instruction is said to have a direct address.
A further possibility is that the second part holds the address of a memory word that in
turn contains the address of the operand; this is referred to as an indirect address. One
bit of the instruction code can signify whether a direct or indirect address is used.
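A small sketch of the three cases (immediate, direct, indirect); the memory contents here are invented for illustration:

```python
# Resolving the operand under the three addressing cases described above.
# Address 10 holds 42, and address 42 holds 99 (made-up layout).

memory = {10: 42, 42: 99}

def operand(mode, addr_field):
    if mode == "immediate":          # the address field IS the operand
        return addr_field
    if mode == "direct":             # address field points at the operand
        return memory[addr_field]
    if mode == "indirect":           # address field points at the operand's address
        return memory[memory[addr_field]]

print(operand("immediate", 10), operand("direct", 10), operand("indirect", 10))
```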
The figure shows a diagram showing direct and indirect addresses.
Instruction sets of some common CPUs: An instruction is a set of codes that the
computer processor can understand. The code is usually in 1s and 0s, or machine language. It
contains instructions or tasks that control the movement of bits and bytes within the processor.
CISC Architecture
The CISC (Complex Instruction Set Computer) approach attempts to minimize the number of
instructions per program, but at the cost of an increase in the
number of cycles per instruction.
The design of an instruction set for a computer must take into consideration not only
machine language constructs but also the requirements imposed on the use of high
level programming languages.
The goal of CISC is to attempt to provide a single machine instruction for each
statement that is written in a high level language.
Characteristics of CISC
The characteristics of CISC are as follows −
A large number of instructions typically from 100 to 250 instructions.
Some instructions that perform specialized tasks and are used infrequently.
A large variety of addressing modes- typically from 5 to 20 different modes.
Variable length instruction formats.
Instructions that manipulate operands in memory.
RISC vs CISC:
o RISC emphasizes software to optimize the instruction set; CISC emphasizes hardware to optimize the instruction set.
o RISC uses a hardwired programming unit; CISC uses a microprogramming unit.
o RISC requires multiple register sets to store instructions; CISC requires a single register set.
o RISC has simple instruction decoding; CISC has complex instruction decoding.
o Use of the pipeline is simple in RISC; use of the pipeline is difficult in CISC.
o RISC uses a limited number of instructions that require less time to execute; CISC uses a large number of instructions that require more time to execute.
o RISC uses LOAD and STORE as independent instructions in register-to-register interaction; CISC uses LOAD and STORE within the memory-to-memory interaction of a program.
o RISC spends more transistors on memory registers; CISC uses transistors to store complex instructions.
o The execution time of RISC is very short; the execution time of CISC is longer.
o RISC architecture is used in high-end applications like telecommunication, image processing and video processing; CISC architecture is used in low-end applications like home automation and security systems.
o A program written for RISC architecture tends to take more space in memory; a program written for CISC architecture tends to take less space in memory.
o Examples of RISC: ARM, PA-RISC, Power Architecture, Alpha, AVR, ARC and SPARC. Examples of CISC: VAX, Motorola 68000 family, System/360, AMD and the Intel x86 CPUs.
Module-II:
Data representation:
Signed number representation: A signed integer is an integer with a positive ‘+’
or negative sign ‘-‘ associated with it. Since the computer only understands
binary, it is necessary to represent these signed integers in binary form.
In binary, signed Integer can be represented in three ways:
1. Signed bit.
2. 1’s Complement.
3. 2’s Complement.
Signed bit Representation: In the signed integer representation method the
following rules are followed:
1. The MSB (Most Significant Bit) represents the sign of the Integer.
2. The magnitude is represented by the bits other than the MSB, i.e. the remaining
(n-1) bits, where n is the number of bits.
3. If the number is positive, MSB is 0 else 1.
4. The range of signed integer representation of an n-bit number is
-(2^(n-1) - 1) to +(2^(n-1) - 1).
1’s Complement representation of a signed integer:
In 1’s complement representation the following rules are used:
1. For +ve numbers the representation rules are the same as signed integer
representation.
2. For –ve numbers, we can follow any one of the two approaches:
Write the +ve number in binary and take 1’s complement of it.
Write Unsigned representation of 2^n-1-X for –X.
2’s Complement representation:
In 2’s Complement representation the following rules are used:
1. For +ve numbers, the representation rules are the same as signed integer
representation.
2. For –ve numbers, there are two different ways we can represent the number.
Write an unsigned representation of 2^n-X for –X in n-bit representation.
Write a representation of +X and take 2’s Complement.
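The three encodings above can be checked quickly in code; shown here for n = 8 bits:

```python
# The three n-bit encodings of a negative integer, for n = 8.

def sign_magnitude(x, n=8):
    # MSB holds the sign, remaining n-1 bits hold |x|
    return (1 << (n - 1)) | -x if x < 0 else x

def ones_complement(x, n=8):
    # negatives are encoded as 2^n - 1 - |x|
    return (2**n - 1) + x if x < 0 else x

def twos_complement(x, n=8):
    # negatives are encoded as 2^n - |x|
    return (2**n + x) % 2**n

for f in (sign_magnitude, ones_complement, twos_complement):
    print(f.__name__, format(f(-5), "08b"))
```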
Fixed and floating point representations:
In floating point representation, the actual number is (-1)^s × (1 + m) × 2^(e - Bias), where s is
the sign bit, m is the mantissa (fraction), e is the stored exponent value, and Bias is the bias number.
Note that signed integers and exponents can be represented in sign-magnitude representation,
one's complement representation, or two's complement representation.
The floating point representation is more flexible. Any non-zero number can be
represented in the normalized form ±(1.b1b2b3...)₂ × 2^n; this is the normalized form of a
number x.
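As a sanity check of the formula above, a 32-bit float can be unpacked and its value recomputed from the sign, exponent and mantissa fields (Bias = 127 for single precision):

```python
import struct

# Decode a float32 into its fields and recompute (-1)^s * (1+m) * 2^(e-Bias).

def decode_float32(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32 bits
    s = bits >> 31                   # sign bit
    e = (bits >> 23) & 0xFF          # biased exponent, 8 bits
    m = (bits & 0x7FFFFF) / 2**23    # fraction part of the mantissa, 23 bits
    return (-1)**s * (1 + m) * 2**(e - 127)

print(decode_float32(-6.5))   # reconstructs -6.5 (= -1.625 x 2^2)
```

This sketch covers only normalized numbers; zeros, denormals, infinities and NaNs have special encodings.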
Character Representation:
Computers work in binary. As a result, all characters, whether they are letters,
punctuation or digits, are stored as binary numbers. All of the characters that a
computer can use are called a character set.
Character   ASCII (decimal)   Binary    Hex
A           65                1000001   41
Z           90                1011010   5A
a           97                1100001   61
z           122               1111010   7A
0           48                0110000   30
9           57                0111001   39
Space       32                0100000   20
!           33                0100001   21
When data is stored or transmitted, its ASCII or Unicode number is used, not the
character itself.
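A quick sketch of this in Python, using ord/chr-style conversions to move between characters and their code numbers:

```python
# Characters are stored and transmitted as their code numbers.
# 'A' is 65 (1000001 in binary, 41 in hex), as in the table above.

text = "Az!"
codes = [ord(c) for c in text]            # character -> number
print(codes)                              # [65, 122, 33]
print([format(n, "07b") for n in codes])  # 7-bit binary forms
print(bytes(codes).decode("ascii"))       # numbers -> characters again
```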
Computer arithmetic –
Integer addition and subtraction: There are eight conditions to consider when
adding or subtracting signed numbers. These conditions are based on the operation
performed and the signs of the numbers.
The table displays the algorithm for addition and subtraction. The first column of the
table lists these conditions. The other columns define the actual
operations to be performed on the magnitudes of the numbers. The last column of the
table is needed to avoid a negative zero: when two equal numbers are
subtracted, the output must not be -0; it should always be +0.
In the table, the magnitudes of the two numbers are denoted by P and Q.
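The conditions the table encodes can be sketched as follows: with equal signs the magnitudes are added; with opposite signs the smaller magnitude is subtracted from the larger and the sign of the larger wins, with an explicit check so that P − P gives +0, never −0.

```python
# Sign-magnitude addition of (sign_p, P) and (sign_q, Q), P, Q magnitudes.

def sm_add(sign_p, p, sign_q, q):
    if sign_p == sign_q:             # equal signs: add magnitudes
        return sign_p, p + q
    if p == q:                       # equal magnitudes, opposite signs:
        return "+", 0                # force +0, never -0
    if p > q:                        # sign of the larger magnitude wins
        return sign_p, p - q
    return sign_q, q - p

print(sm_add("+", 5, "-", 5))   # ('+', 0)
print(sm_add("-", 7, "+", 3))   # ('-', 4)
```

Subtraction P − Q is then just addition with Q's sign flipped.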
Ripple Carry Adder:
A ripple carry adder is built by chaining full adders, and its most important feature is that it can
add input bit sequences of any width, whether 4 bits, 5 bits, or more.
One of the most important points to consider in this carry adder is that the final output
is known only after the carry output of each full adder stage has been generated and
forwarded to the next stage. So there will be a delay in getting the result when using this
carry adder.
Let’s take an example of two input sequences 0101 and 1010. These are
representing the A4 A3 A2 A1 and B4 B3 B2 B1.
As per this adder concept, input carry is 0.
When A1 and B1 are applied at the 1st full adder along with input carry 0:
Here A1 = 1; B1 = 0; Cin = 0
Sum (S1) and carry (C1) will be generated as per the Sum and Carry
equations of this adder. As per its theory, the output equation for the Sum =
A1⊕B1⊕Cin and Carry = A1B1⊕B1Cin⊕CinA1
As per this equation, for 1st full adder S1 =1 and Carry output i.e., C1=0.
Similarly, for the next input bits A2 and B2, output S2 = 1 and C2 = 0. The
important point is that the second-stage full adder gets input carry C1, which
is the output carry of the first-stage full adder.
In this way we get the final output sequence (S4 S3 S2 S1) = (1 1 1 1) and
output carry C4 = 0.
This is the addition process for 4-bit input sequences when it’s applied to
this carry adder.
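The 4-bit example above can be replayed with a bit-level sketch of the full adder chain:

```python
# Bit-level ripple-carry addition of the example above (0101 + 1010).
# Each stage's carry-out feeds the next stage's carry-in, hence the delay.

def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (cin & a)
    return s, cout

def ripple_carry_add(a_bits, b_bits):
    """a_bits and b_bits are lists of 0/1, least significant bit first."""
    carry, sum_bits = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

# A = 0101 (A4 A3 A2 A1) and B = 1010 as LSB-first lists:
print(ripple_carry_add([1, 0, 1, 0], [0, 1, 0, 1]))  # ([1, 1, 1, 1], 0)
```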
In the case of binary multiplication, since the digits are 0 and 1, each step of the
multiplication is simple. If the multiplier digit is 1, a copy of the multiplicand (1 ×
multiplicand) is placed in the proper positions; if the multiplier digit is 0, a number of 0
digits (0 × multiplicand) are placed in the proper positions.
Consider the multiplication of positive numbers. The first version of the multiplier
circuit, which implements the shift-and-add multiplication method for two n-bit
numbers, is shown in Figure
The 2n-bit product register (A) is initialized to 0. Since the basic algorithm shifts
the multiplicand register (B) left one position each step to align the multiplicand
with the sum being accumulated in the product register, we use a 2n-bit
multiplicand register with the multiplicand placed in the right half of the register
and with 0 in the left half.
Figure shows the basic steps needed for the multiplication. The algorithm starts
by loading the multiplicand into the B register, loading the multiplier into the Q
register, and initializing the A register to 0. The counter N is initialized to n. The
least significant bit of the multiplier register (Q0) determines whether the
multiplicand is added to the product register. The left shift of the multiplicand has
the effect of shifting the intermediate products to the left, just as when multiplying
by paper and pencil. The right shift of the multiplier prepares the next bit of the
multiplier to examine in the following iteration.
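The register scheme above (2n-bit product register A, left-shifting multiplicand register B, right-shifting multiplier register Q) can be sketched as:

```python
# Shift-and-add multiplication of two n-bit unsigned numbers.
# A accumulates the 2n-bit product, B shifts left, Q shifts right.

def shift_add_multiply(multiplicand, multiplier, n=4):
    A, B, Q = 0, multiplicand, multiplier
    for _ in range(n):
        if Q & 1:            # Q0 decides whether the multiplicand is added
            A += B
        B <<= 1              # align multiplicand with the next partial product
        Q >>= 1              # expose the next multiplier bit
    return A & ((1 << (2 * n)) - 1)   # keep the 2n-bit product

print(shift_add_multiply(0b0110, 0b0101))  # 6 * 5 = 30
```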
Booth’s multiplication:
The Booth algorithm is a multiplication algorithm that allows us to multiply two signed
binary integers in 2's complement representation. It is also used to speed up the
multiplication process and is very efficient. It works on strings of 0's in the multiplier,
which require no addition but only shifting, and on strings of 1's in the multiplier from
bit weight 2^k down to weight 2^m, which can be treated as 2^(k+1) - 2^m.
In the above flowchart, initially, AC and Qn+1 are set to 0, and SC is a sequence counter
set to n, the number of bits in the multiplier. BR holds the multiplicand bits, and QR
holds the multiplier bits. We then examine two bits of the multiplier, Qn and Qn+1, where
Qn is the last (least significant) bit of QR and Qn+1 is an extra bit appended to the right
of Qn, initially 0. If the two bits equal 10, we subtract the multiplicand from the partial
product in the accumulator AC and then perform the arithmetic shift right operation (ashr).
If the two bits equal 01, we add the multiplicand to the partial product in AC and then
perform the arithmetic shift right operation (ashr), including Qn+1. The arithmetic shift
right operation in Booth's algorithm shifts the AC and QR bits one position to the right
while leaving the sign bit of AC unchanged. The sequence counter is decremented after
each iteration, and the computational loop is repeated n times, once for each bit of
the multiplier.
Example (ashr): 0100 + 0110 = 1010; after the addition, each bit is shifted one place to
the right and the sign bit (the MSB of the result) is replicated at the front, giving 1101.
Example: Multiply the two numbers 7 and 3 by using the Booth's multiplication
algorithm.
Ans. Here we have two numbers, 7 and 3. First of all, we need to convert 7 and 3 into
binary numbers like 7 = (0111) and 3 = (0011). Now set 7 (in binary 0111) as
multiplicand (M) and 3 (in binary 0011) as a multiplier (Q). And SC (Sequence Count)
represents the number of bits; here we have 4 bits, so set SC = 4. SC also gives the
number of iteration cycles of Booth's algorithm, and after each cycle SC = SC - 1.
Qn Qn+1   Operation            AC    Q     Qn+1   SC
          (M = 0111, M'+1 = 1001)
          Initial              0000  0011  0      4
1  0      Subtract (A + M'+1)  1001  0011  0
          ashr                 1100  1001  1      3
1  1      ashr                 1110  0100  1      2
0  1      Addition (A + M)     0101  0100  1
          ashr                 0010  1010  0      1
0  0      ashr                 0001  0101  0      0
The product is AC Q = 0001 0101 = 21.
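A sketch of Booth's procedure that mirrors the AC / Q / Qn+1 / SC registers of the worked example above (the helper name and default width n = 4 are illustrative):

```python
# Booth's algorithm for two n-bit two's-complement operands.

def booth_multiply(m, q, n=4):
    mask = (1 << n) - 1
    M = m & mask                                # multiplicand, n bits
    AC, Q, q_extra, SC = 0, q & mask, 0, n      # q_extra plays the role of Qn+1
    while SC > 0:
        pair = (Q & 1, q_extra)
        if pair == (1, 0):                      # 10: AC = AC - M
            AC = (AC - M) & mask
        elif pair == (0, 1):                    # 01: AC = AC + M
            AC = (AC + M) & mask
        # ashr of AC, Q, q_extra: the sign bit of AC is preserved
        q_extra = Q & 1
        Q = (Q >> 1) | ((AC & 1) << (n - 1))
        AC = (AC >> 1) | (AC & (1 << (n - 1)))
        SC -= 1
    result = (AC << n) | Q                      # 2n-bit product
    if result & (1 << (2 * n - 1)):             # reinterpret as signed
        result -= 1 << (2 * n)
    return result

print(booth_multiply(7, 3))    # 21
print(booth_multiply(-7, 3))   # -21
```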
Carry save multiplier:
In a carry save multiplier, the adders are arranged so that the carry-out from one adder is
not connected to the carry-in of the next adder in the same stage, hence preventing a ripple
carry; the carries are instead saved and added in a later stage.
In this section, we are going to perform the restoring division algorithm with the help of an
unsigned integer. We use the term restoring because the value of register A is restored
after each unsuccessful subtraction. We will also solve this problem using the flow chart
and bit operations.
Here, register Q is used to contain the quotient, and register A is used to contain the
remainder. The divisor is loaded into register M, and the n-bit dividend is loaded into
register Q. 0 is the starting value of register A. The value of register A is restored during
iterations in which the subtraction result is negative; that is why the method is known
as restoring.
Now we will learn some steps of restoring division algorithm, which is described as
follows:
Step 1: In this step, the corresponding value will be initialized to the registers, i.e.,
register A will contain value 0, register M will contain Divisor, register Q will contain
Dividend, and N is used to specify the number of bits in dividend.
Step 2: In this step, register A and register Q will be treated as a single unit, and the
value of both the registers will be shifted left.
Step 3: After that, the value of register M will be subtracted from register A. The result
of subtraction will be stored in register A.
Step 4: Now, check the most significant bit of register A. If this bit is 0, then
the least significant bit of register Q will be set to 1. If the most significant bit
of A is 1, then the least significant bit of register Q will be set to 0, and the value
of A will be restored, i.e., register A gets back the value it had before the subtraction
of M.
Step 5: After that, the value of N will be decremented. Here n is used as a counter.
Step 6: Now, if the value of N is 0, we will break the loop. Otherwise, we have to again
go to step 2.
Step 7: This is the last step. In this step, the quotient is contained in the register Q, and
the remainder is contained in register A.
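Steps 1-7 above can be sketched as follows; Python integers stand in for the registers, so a sign test replaces the check of A's most significant bit:

```python
# Restoring division of an n-bit unsigned dividend.

def restoring_divide(dividend, divisor, n=4):
    A, M, Q = 0, divisor, dividend
    for _ in range(n):
        # shift A,Q left as one unit: Q's MSB enters A
        A = (A << 1) | ((Q >> (n - 1)) & 1)
        Q = (Q << 1) & ((1 << n) - 1)
        A -= M
        if A < 0:            # subtraction failed: restore A, Q0 = 0
            A += M
        else:                # subtraction succeeded: Q0 = 1
            Q |= 1
    return Q, A              # quotient, remainder

print(restoring_divide(11, 3))   # (3, 2): quotient 3, remainder 2
```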
For example:
1. Dividend = 11
2. Divisor = 3
N M A Q Operation
Now we will learn steps of the non-restoring division algorithm, which are described as
follows:
Step 1: In this step, the corresponding value will be initialized to the registers, i.e.,
register A will contain value 0, register M will contain Divisor, register Q will contain
Dividend, and N is used to specify the number of bits in dividend.
Step 2: In this step, check the sign bit (most significant bit) of register A.
Step 3: If this bit of register A is 1, then shift the value of AQ left and perform
A = A + M. If this bit is 0, then shift the value of AQ left and perform A = A - M.
That means that in the case of 0, the 2's complement of M is added to register A, and the
result is stored in A.
Step 4: Now, check the sign bit of register A again.
Step 5: If this bit of register A is 1, then Q[0] will become 0. If this bit is 0, then Q[0] will
become 1. Here Q[0] indicates the least significant bit of Q.
Step 6: After that, the value of N will be decremented. Here N is used as a counter.
Step 7: If the value of N = 0, then we will go to the next step. Otherwise, we have to
go to step 2 again.
Step 8: If the sign bit of register A is 1, then perform A = A + M to correct the remainder.
Step 9: This is the last step. In this step, register A contains the remainder, and register
Q contains the quotient.
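The non-restoring steps can be sketched the same way; a final correction (A = A + M when A ends up negative) yields the true remainder:

```python
# Non-restoring division of an n-bit unsigned dividend.

def non_restoring_divide(dividend, divisor, n=4):
    A, M, Q = 0, divisor, dividend
    for _ in range(n):
        msb_q = (Q >> (n - 1)) & 1
        Q = (Q << 1) & ((1 << n) - 1)
        A = (A << 1) | msb_q             # shift A,Q left as one unit
        A = A + M if A < 0 else A - M    # sign of A picks add or subtract
        Q |= 0 if A < 0 else 1           # Q0 from the new sign of A
    if A < 0:                            # final correction of the remainder
        A += M
    return Q, A                          # quotient, remainder

print(non_restoring_divide(11, 3))   # (3, 2): quotient 3, remainder 2
```

Unlike the restoring version, a negative intermediate A is carried into the next iteration instead of being restored immediately.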
For example:
In this example, we will perform a Non-Restoring Division algorithm with the help of an
Unsigned integer.
1. Dividend = 11
2. Divisor = 3
3. -M = 11101
N M A Q Action
So, register A contains the remainder 2, and register Q contains the quotient 3.
Module III:
Introduction to x86 architecture:
x86 is an Intel CPU architecture used to denote the microprocessor family released after the
original 8086 processor. It originated with the 16-bit 8086 processor in 1978 and denotes
the microprocessor family based on the Intel 8086 and 8088 microprocessors. Generally,
x86 is the term for Intel processors that comprise the 286, 386, 486, and 586 processors. In
modern times, the term "x86" is used to denote any 32-bit processor that ensures backward
compatibility with the x86 instruction set architecture. Since the full names of the processors
are 80286, 80386, 80486, and 80586, the term x86 is short for 80x86; the leading "80" is
typically dropped to avoid redundancy.
Some of the highlights of the evolution of x86 architecture are:
1. 8080 – It was the world’s first general-purpose microprocessor. It was an 8-
bit machine, with an 8-bit data path to memory. It was used in the first
personal computer.
2. 8086 – It was a 16-bit machine and was far more powerful than the previous
one. It had a wider data path of 16-bits and larger registers along with an
instruction cache or queue that prefetches a few instructions before they are
executed. It is the first appearance of 8086 architecture. It has a real mode
and an addressable memory of 1 MB.
3. 80286 – It has an addressable memory of 16 MB instead of just 1 MB and
contains two modes-real mode and first-generation 16-bit protected mode. It
has a data transfer width of 16-bits and a programming model of 16-bits (16-
bits general purpose registers and 16-bit addressing).
4. 80386 – It was Intel’s first 32-bit machine. Due to its 32-bit architecture, it
was able to compete against the complexity and power of microcomputers
and mainframes introduced just a few years earlier. It was the first processor
to support multitasking and contained the 32-bit protected mode. It also
implemented the concept of paging (permitted 32-bit virtual memory address
to be translated into 32-bit physical memory address). It has an addressable
physical memory of 4 GB and a data transfer width of 32 bits.
5. 80486 – It introduced the concept of cache technology and instruction
pipelining. It contained a write protect feature and offered a built-in math co-
processor that offloaded complex math operations from the main CPU.
6. Pentium – The use of superscalar techniques was introduced as multiple
instructions started executing in parallel. The page size extension (PSE)
feature was added as a minor enhancement in paging.
7. Pentium Pro – It used register renaming, branch prediction, data flow
analysis, speculative execution, and more pipeline stages. Advanced
optimization techniques in microcode were also added along with level 2
cache. It implemented the second-generation address translation in which a
32-bit virtual address is translated into a 36-bit physical memory address.
Processor        8086    80286   80386   80486   Pentium
Memory Capacity  1 MB    16 MB   4 GB    4 GB    4 GB
PC Type (IBM)    PC-XT   PC-AT   PC-AT   PC-AT   PC-AT
Advantages:
Disadvantages:
Complex Instruction Set: The x86 architecture has a complex instruction set,
which makes it difficult to optimize code for performance. This complexity can
also make it harder to debug software and hardware issues.
Power Consumption: The evolution of x86 microprocessors has led to a
significant increase in power consumption. This has become a major issue in
mobile devices, where battery life is critical.
Heat Dissipation: As x86 processors have become more powerful, they have
also become hotter. This has led to the development of more sophisticated
cooling systems, which can add to the cost and complexity of systems.
Cost: The x86 architecture is licensed by Intel, which can make it more
expensive than other processor architectures that are available. This can be a
significant barrier to entry for smaller hardware and software vendors.
The hardwired control consists of a combinational circuit that outputs the desired control
signals for decoding and encoding functions. The instruction loaded into the IR is decoded
by the instruction decoder. If the IR is an 8-bit register, the instruction decoder
generates 2^8 = 256 output lines.
Inputs to the encoder are given from the instruction step decoder, external inputs, and
condition codes. All these inputs are used and individual control signals are generated.
The End signal is generated after the execution of an instruction is completed. It
resets the control step counter, making it ready to generate the control steps for the
next instruction.
The major goal of implementing the hardwired control is to minimize the cost of the
circuit and to achieve greater efficiency in the operation speed. Some of the methods
that have come up for designing the hardwired control logic are as follows −
The microprogrammed control unit follows these steps:

The CPU breaks each instruction down into a set of sequential micro-operations;
this set of operations is called a microinstruction sequence. The sequential
micro-operations need control signals to execute.

Control signals stored in the ROM are issued to execute the instruction on the
datapath. These control signals drive the micro-operations of the microinstruction
that is to be performed at each time step.

The address of the next microinstruction to be executed is generated.

The previous two steps are repeated until all the microinstructions associated with the
instruction have been executed.
The address supplied to the control ROM originates from the micro-counter register.
The micro-counter receives its input from a multiplexer that selects among the output
of an address ROM, the current address incremented by one, and the address saved in
the next-address field of the current microinstruction.
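The steps above can be sketched with a toy control store. The micro-operations, signal names, and next-address fields below are hypothetical, chosen only to illustrate how a micro-counter walks the control ROM:

```python
# Toy microprogrammed control sketch (all names are illustrative).
# Each microinstruction carries a set of control signals plus a next-address field.

CONTROL_ROM = {
    0: {"signals": ["PC_out", "MAR_in"], "next": 1},   # place PC on bus, latch into MAR
    1: {"signals": ["MEM_read", "IR_in"], "next": 2},  # read instruction word into IR
    2: {"signals": ["PC_inc"], "next": 0},             # increment PC, loop back to fetch
}

def run_microprogram(steps):
    """Step the micro-counter through the control store, emitting the control signals."""
    trace = []
    upc = 0  # micro program counter
    for _ in range(steps):
        micro = CONTROL_ROM[upc]
        trace.append(micro["signals"])
        upc = micro["next"]  # here the multiplexer would pick the next address
    return trace
```

Running three steps yields the fetch sequence once: the micro-counter emits each microinstruction's signals and then follows the next-address field, exactly as the multiplexer-driven scheme above describes.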
Case study – design of a simple hypothetical CPU:
A microprocessing unit (MPU) is synonymous with the central processing unit (CPU) used in a
traditional computer. The microprocessor acts as a device, or a group of devices, that
performs the following tasks.
Block Diagram
ALU
The ALU performs the computing functions of the microprocessor. It includes the accumulator,
a temporary register, the arithmetic and logic circuits, and five flags. The result is stored
in the accumulator and reflected in the flags.
Block Diagram
Accumulator
It is an 8-bit register that is part of the ALU. This register is used to store 8-bit data and
to perform arithmetic and logic operations. The result of an operation is stored in the accumulator.
Diagram
Flags
The ALU includes five flip-flops, the flags, which are set and reset according to the data
conditions in the accumulator and other registers. Their contents can be stored and
transferred using instructions.

S (Sign) flag − After the execution of an arithmetic operation, if bit D7 of the result
is 1, the sign flag is set. It is used for signed numbers: in a given byte, D7 = 1
indicates a negative number, and D7 = 0 a positive number.

Z (Zero) flag − The zero flag is set if the result of an ALU operation is 0.

AC (Auxiliary Carry) flag − In an arithmetic operation, when a carry is generated by
digit D3 and passed on to digit D4, the AC flag is set. This flag is used internally
for BCD operations.

P (Parity) flag − After an arithmetic or logic operation, if the result has an even number of
1s, the flag is set; if it has an odd number of 1s, the flag is reset.

C (Carry) flag − If an arithmetic operation results in a carry, the carry flag is set;
otherwise it is reset.
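As an illustration, the five flag conditions above can be modeled in a few lines. This is a simplified sketch of 8085-style flag setting after an 8-bit addition, not emulator-accurate code:

```python
# Sketch of deriving the five 8085 flags after an 8-bit add (illustrative only).

def add_with_flags(a, b):
    full = a + b
    result = full & 0xFF
    flags = {
        "S": (result >> 7) & 1,                            # bit D7 of the result
        "Z": 1 if result == 0 else 0,                      # result is zero
        "AC": 1 if ((a & 0xF) + (b & 0xF)) > 0xF else 0,   # carry out of digit D3 into D4
        "P": 1 if bin(result).count("1") % 2 == 0 else 0,  # even number of 1s sets parity
        "C": 1 if full > 0xFF else 0,                      # carry out of D7
    }
    return result, flags
```

For example, adding 0x80 + 0x80 wraps to 0x00 with the carry and zero flags set, matching the flag definitions above.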
Register section
The register section is basically a set of storage locations; data is transferred to and from the registers using instructions.
Stack Pointer (SP) − The stack pointer is a 16-bit register used as a memory pointer.
It points to a memory location in read/write memory known as the stack. During program
execution, data sometimes needs to be stored on the stack. The beginning of the stack
is defined by loading a 16-bit address into the stack pointer.
Program Counter (PC) − This 16-bit register sequences the execution of instructions.
It is also a memory pointer: memory locations have 16-bit addresses, and the function
of the program counter is to point to the memory address from which the next byte is
to be fetched.
Storage registers − These registers store 8-bit data during a program execution.
These registers are identified as B, C, D, E, H, L. They can be combined as
register pair BC, DE and HL to perform some 16 bit operations.
Time and Control Section
This unit is responsible for synchronizing microprocessor operations with the clock pulse
and for generating the control signals necessary for smooth communication between the
microprocessor and peripheral devices. The RD bar and WR bar signals are synchronous
pulses which indicate whether data is available on the data bus or not.
The control unit is responsible to control the flow of data between microprocessor,
memory and peripheral devices.
PIN diagram
1 Address bus
The 8085 microprocessor has 8 signal
lines, A15 - A8, which are unidirectional and
used as the high-order address bus.
2 Data bus
The signal lines AD7 - AD0 are bidirectional
and serve a dual purpose. They are
used as the low-order address bus as well as
the data bus.
3 Control signal and Status signal

Control Signal

RD bar − It is a read control signal (active
low). When it is active, data is read from the
selected memory or I/O device.

WR bar − It is a write control signal (active
low). It is active when data is written into the
selected memory or I/O location.

Status signal

ALE (Address Latch Enable) − When ALE is
high, the 8085 microprocessor uses AD7 - AD0
as the address bus; when ALE is low, it uses
them as the data bus.
IO/M bar − This is a status signal used to
differentiate between I/O and memory
operations. When it is high, it indicates an
I/O operation; when it is low, it indicates a
memory operation.
S1 and S0 − These status signals, together
with IO/M bar, can identify various
operations, but they are rarely used in
small systems.
Instruction Format
Each instruction is represented by a sequence of bits within the computer. The
instruction is divided into groups of bits called fields. The way an instruction is
expressed is known as the instruction format, usually represented in the form of a
rectangular box. Instruction formats may be of the following types.
Variable-length instruction format

Advantage

These formats have good code density.

Drawback

These instruction formats are difficult to decode and pipeline.

Fixed-length instruction format

Advantage

They are easy to decode and pipeline.

Drawback

They do not have good code density.
Thus semiconductor devices are preferred as primary memory. With the rapid growth in
the requirement for semiconductor memories there have been a number of technologies
and types of memory that have emerged. Names such as ROM, RAM, EPROM,
EEPROM,etc.
Electronic semiconductor memory technology can be split into two main types
or categories, according to the way in which the memory operates: RAM and ROM.
There is a large variety of types of ROM and RAM that are available. These
arise from the variety of applications and also the number of technologies
available.
DRAM
Dynamic RAM is a form of random access memory. DRAM uses a capacitor to store
each bit of data, and the level of charge on each capacitor determines whether that bit
is a logical 1 or 0. However these capacitors do not hold their charge indefinitely, and
therefore the data needs to be refreshed periodically. As a result of this dynamic
refreshing it gains its name of being a dynamic RAM.
DRAM is the form of semiconductor memory that is often used in equipment including
personal computers and workstations where it forms the main RAM for the computer.
The semiconductor devices are normally available as integrated circuits for use in PCB
assembly in the form of surface mount devices or less frequently now as leaded
components.
Disadvantages of DRAM
SRAM
However they consume more power, they are less dense and more expensive
than DRAM. As a result of this SRAM is normally used for caches, while
DRAM is used as the main semiconductor memory technology.
For example, the BIOS of a computer will be stored in ROM. As the name
implies, data cannot be easily written to ROM. Depending on the technology
used in the ROM, writing the data into the ROM initially may require special
hardware. Although it is often possible to change the data, this again requires
special hardware to erase the existing data before new data can be written in.
PROM
EPROM
window is normally covered by a label, especially when the data may need to
be preserved for an extended period.
EEPROM
Flash memory
Flash memory stores data in an array of memory cells. The memory cells are
made from floating-gate MOSFETs (FGMOS for short). These FGMOS
transistors have the ability to store an electrical charge
for extended periods of time (2 to 10 years) even without a connection to a
power supply.
Memory organization:
The memory is organized in the form of cells; each cell can be identified
by a unique number called its address. Each cell recognizes control
signals such as “read” and “write”, generated by the CPU when it wants to read or
write an address. Whenever the CPU executes a program, the instructions must be
transferred from memory to the CPU, because the program resides in
memory. To access an instruction, the CPU generates a memory request.
Memory Request:
A memory request contains the address along with the control signals. For
example, when inserting data into the stack, each block consumes memory
(RAM), and the number of memory cells is determined by the capacity of the
memory chip.
Word Size: It is the maximum number of bits that a CPU can process at a time
and it depends upon the processor. Word size is a fixed size piece of data
handled as a unit by the instruction set or the hardware of a processor.
Word size varies as per the processor architectures because of generation and
the present technology, it could be low as 4-bits or high as 64-bits depending on
what a particular processor can handle. Word size is used for a number of
concepts like Addresses, Registers, Fixed-point numbers, Floating-point
numbers.
handles all the input-output operations of the computer system. Input or
output devices that are connected to the computer are called peripheral
devices. There are three types of peripherals:
1. Input peripherals: allow user input from the outside world to the
computer. Examples: keyboard, mouse, etc.
Usually a program controls the data transfer between the CPU and a peripheral.
Transferring data under programmed I/O requires constant monitoring of the peripheral
by the CPU, which is time-consuming as it keeps the CPU busy needlessly.
DMA(Direct Memory Access): DMA transfer is used for large data transfers.
Here, a memory bus is used by the interface to transfer data in & out of a memory
unit. The CPU provides starting address & number of bytes to be transferred to the
interface to initiate the transfer, after that it proceeds to execute other tasks. DMA
requests a memory cycle through the memory bus when the transfer is made. DMA
transfers the data directly into the memory when the request is granted by the
memory controller. To allow direct memory transfer(I/O), the CPU delays its
memory access operation. So, DMA allows I/O devices to directly access memory
with less intervention of the CPU.
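The division of labour in a DMA transfer can be sketched as follows. The memory array and function names are hypothetical, but the structure mirrors the description above: the CPU supplies only the starting address and byte count, and the controller then moves the whole block without per-byte CPU involvement:

```python
# Simplified model of a DMA transfer (illustrative names throughout).

MEMORY = bytearray(64)  # stand-in for main memory

def dma_transfer(device_data, start_addr):
    """Copy a device's data block into MEMORY starting at start_addr.

    The CPU only programs the starting address and byte count; the
    'controller' performs the bulk move itself.
    """
    count = len(device_data)                              # byte count set by the CPU
    MEMORY[start_addr:start_addr + count] = device_data   # bulk move, no per-byte CPU work
    return count                                          # controller would raise an interrupt here

moved = dma_transfer(b"hello", 10)
```

After the call, bytes 10 through 14 of the simulated memory hold the device data; in real hardware the completion would be signalled to the CPU by an interrupt rather than a return value.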
interrupt occurred, so it is simply a matter of getting this address and making the
CPU continue to execute at this address.
When the parent process executes the fork system call, the new process enters the
created state and then moves into a state where it is ready to run (3 or 5).
The scheduler will eventually pick the process and the process enters the 'kernel
running' state where it completes its part of the fork system call. After the completion of
the system call, it may move to the 'user running'. When interrupts occur (such as
system call), it again moves to the state 'kernel running'.
After completing the task of the interrupt the kernel may decide to schedule another
process to execute, so the first process enters the state 'preempted'. The state
preempted is actually same as the state ready to run in memory, but they are denoted
separately to stress that a process executing in kernel mode can be preempted only
when it is about to return to user mode. Consequently, the kernel could swap a process
from the state preempted if necessary. Eventually, it will return to the 'user running'
again.
When the system call is executed, it leaves the state user running and enters the state
kernel running. If in kernel mode, the process needs to sleep for some reason (such as
waiting for I/O), it enters the state asleep in memory. When the event on it which it has
slept, happens, the interrupt handler awakens the process, and it enters the state ready
to run in memory.
Suppose the system is executing many processes that cannot all fit into main memory
at the same time; then the swapper (process 0) swaps out a process to make space for
another process that is in the state ready to run swapped. When a process is forcefully
taken from main memory, it enters the state ready to run swapped. Finally, the swapper
chooses the process as most eligible to run and it re-enters the state ready to run in
memory. And then when it is scheduled, it will enter the state kernel running. When a
process completes and invokes exit system call, thus entering the states kernel running
and finally, the zombie state.
Some state transitions can be managed by the users, but not all. User can create a
process. But the user has no control over when a process transitions to sleeping in
memory to sleeping in the swap, or ready to run in memory to ready to run in the swap,
etc. A process can make a system call to transition itself to kernel running state. But it
has no control over when it will return from kernel mode. Finally, a process can exit
whenever it wants, but that is not the only reason for exit to be called.
Two kernel data structures describe the state of a process: the process table entry and
the u-area as we have studied already. The process table contains information that
should be accessible to the kernel and the u-area contains the information that should
be accessible to the process only when it's running. The kernel allocates space for u-
area only when creating a process. It does not need u-area for process table entries
that do not have processes. For example, the process table entries that contain the
information about the kernel context, do not have processes.
and CD writers. Small Computer System Interface (SCSI) is most commonly used for RAID,
servers, highly efficient desktop computers, and storage area networks. SCSI has a
controller, which is responsible for transmitting data across the SCSI bus between the
devices and the computer. The controller can be built into the motherboard, or a host
adapter can be installed through an expansion slot on the computer's motherboard. The
controller also incorporates a small SCSI BIOS chip that provides the software needed
to access and control the equipment. Each device on the bus is identified by a number,
its SCSI ID. Newer serial variants such as serial attached SCSI, which use serial
storage architecture initiators, assign a 7-bit ID through an automatic process.
USB:
A USB (short for Universal Serial Bus) is a common computer port that allows
communication between a computer and peripheral or other devices. It is the most common
interface used in today's computers and can be used to connect printers,
scanners, keyboards, mice, game controllers, digital cameras, external hard drives, and flash
drives. USB has replaced a wide range of interfaces, such as the parallel and serial ports, because
it serves a wide variety of uses and offers better support for electrical power. With a
single USB port, up to 127 peripherals can be connected with the help of a few USB hubs,
although that will need quite a bit of dexterity.
In modern times, to connect with the computer, there are many different USB devices.
Some common are as follows:
o Keyboard
o Smartphone
o Tablet
o Webcams
o Keypad
o Microphone
o Mouse
o Joystick
In modern times, all computers contain at least one USB port in different locations.
Below, a list is given that contains USB port locations on the devices that may help
you out to find them.
o Laptop computer: A laptop computer may contain one to four ports on the left
or right side, and some laptops have ports on the back.
o Desktop computer: Usually, a desktop computer has 2 to 4 USB ports in the
front and 2 to 8 ports on the backside.
o Tablet computer: On the tablet, a USB connection is situated in the charging
port and is sometimes USB-C and usually micro USB.
o Smartphone: In the form of micro USB or USB-C, a USB port is used for both
data transfer and charging, similar to tablets on smartphones.
Module-IV:
Pipelining:
Pipelining defines the temporal overlapping of processing. Pipelines are essentially
assembly lines in computing, used either for instruction
processing or, in a more general sense, for executing any complex operation. A pipeline
can be used efficiently only for a sequence of the same or similar tasks, much like an
assembly line.
A basic pipeline processes a sequence of tasks, including instructions, as per the
following principle of operation −
Each task is subdivided into multiple successive subtasks as shown in the figure. For
instance, the execution of register-register instructions can be broken down into
instruction fetch, decode, execute, and writeback.
Stage 1:
Stage 1 is the instruction fetch. Here, an instruction is read from memory (I-Memory).
In this stage, the program counter (PC) holds the address of the instruction in memory.
The next program counter (NPC) is computed by incrementing the current PC. The pipeline
register (PR) is updated by writing the fetched instruction into it. The process of the
instruction fetch stage is described as follows:
Stage 2:
Stage 2 is the instruction decode stage. Here the instruction is decoded, and control
signals are generated from the opcode bits. In this stage, the source operands are read
from the register file, which is indexed by the register specifiers in the instruction.
The pipeline register sends the operands and immediate value to the next stage, and
also passes on the NPC and control signals. The process of the instruction decode
stage is described as follows:
Stage 3:
Stage 3 is the instruction execute stage. The ALU (arithmetic logic unit) operations
are performed in this stage, taking two operands. The first operand holds the contents
of a register, and the second holds either an immediate value or the contents of
another register. In this stage, the branch target can also be computed (typically as
the incremented PC plus the sign-extended offset). The pipeline register (PR) is
updated with the ALU result, branch target, control signals, and destination. The
process of instruction execution is described as follows:
Stage 4:
Stage 4 is the memory access stage. Here, memory operands are read from or written to
memory, as specified by the instruction. The pipeline register (PR) is updated with
the ALU result from execution, the destination register, and the data loaded from
D-Memory. The process of memory access is described as follows:
Stage 5:
Stage 5 is the write back stage. Here, the result is written back to the register
specified by the instruction. This stage needs only one write port, which is used
either to write the loaded data into the register file or to write the ALU result into
the destination register.
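The five stages above can be visualized with a small timing model. This is an illustrative sketch that ignores hazards and simply shows which instruction occupies which stage in each clock cycle:

```python
# Minimal 5-stage pipeline timing model (IF, ID, EX, MEM, WB), ignoring hazards.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(n_instructions):
    """Return, for each clock cycle, the stage each instruction occupies."""
    total_cycles = len(STAGES) + n_instructions - 1   # k + n - 1 cycles in total
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for i in range(n_instructions):
            stage_index = cycle - i          # instruction i enters the pipe at cycle i
            if 0 <= stage_index < len(STAGES):
                row[f"I{i+1}"] = STAGES[stage_index]
        schedule.append(row)
    return schedule
```

For three instructions, the schedule spans 5 + 3 - 1 = 7 cycles; in cycle 3 the pipeline is full, with I1 in EX, I2 in ID, and I3 in IF, which is exactly the overlap the stage descriptions above rely on.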
For a k-stage pipeline executing n tasks with clock period Tp, the execution time is:

ET pipeline = (k + n - 1) cycles = (k + n - 1) * Tp

In a non-pipelined processor, executing the same n instructions takes:

ET non-pipeline = n * k * Tp

So, when we perform n tasks on the same processor, the speedup (S) of the pipelined
processor over the non-pipelined processor is:

S = Performance of pipelined processor / Performance of non-pipelined processor

Since execution time and performance are inversely proportional to each other, we have
the following relation:

S = ET non-pipeline / ET pipeline
S = (n * k) / (k + n - 1)

If the number of tasks n is much larger than k (n >> k), the relation becomes:

S ≈ (n * k) / n
S ≈ k

Here S is the speedup, and its maximum value Smax equals the number of stages k.

Efficiency = S / k

Throughput = n / [(k + n - 1) * Tp]

Note: For the ideal pipeline processor, the value of cycles per instruction (CPI) is 1.
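Plugging sample numbers into the formulas above makes the asymptotic behaviour concrete; the values k = 4, n = 100, Tp = 1 ns are chosen only for illustration:

```python
# Worked example of the pipeline speedup formulas: k-stage pipeline, n tasks.
k, n, Tp = 4, 100, 1.0   # 4 stages, 100 instructions, 1 ns per stage

et_pipeline = (k + n - 1) * Tp   # (k + n - 1) cycles
et_non_pipeline = n * k * Tp     # n * k cycles without pipelining
speedup = et_non_pipeline / et_pipeline

# speedup = 400 / 103, a little under 4; as n >> k, speedup approaches k.
```

With 100 instructions the pipeline needs 103 cycles against 400 for the sequential machine, so the speedup is about 3.88, approaching but never reaching the maximum of k = 4.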
Pipeline Conflicts
The performance of pipelines is affected by various factors. Some of the factors are
described as follows:
Timing Variations
The pipeline stages do not all take the same amount of time. Problems related to
timing variation occur when the instructions being processed need different operands
and take different processing times.
Data Hazards
The problem of data hazards occurs when several instructions that reference the same
data execute in parallel. We must be careful that the next instruction does not try to
access data that is still being produced or used by the current instruction; if this
situation arises, it will generate an incorrect result.
Branching
We need to know the outcome of an instruction before we can fetch and execute the next
one. Suppose the current instruction contains a conditional branch, so its result
determines which instruction comes next. In this case, we cannot know the next
instruction as long as the current instruction is still in progress.
Interrupts
Interrupts inject unwanted instructions into the instruction stream and thereby
disturb the execution of the instructions already in the pipeline.
Data dependency
The situation of data dependency occurs when the next instruction depends on the
result of a previous instruction, and that result is not yet available.
Advantage of Pipelining
o The pipeline has the ability to increase the throughput of the system.
o We can use the pipeline in a modern processor. It is also used to reduce the cycle time of
processor.
o It is used to make the system reliable.
o It is used to arrange the hardware so that it can perform more than one operation at
once.
o Suppose there is an application that repeats the same task many times with different
sets of data. In this case, pipelining is very efficient.
Disadvantage of Pipelining
o In this pipeline, the instruction latency is more.
o The process of designing a pipeline is very costly and complex because it contains
additional hardware.
Pipeline hazards:
Pipeline hazards are conditions that can occur in a pipelined machine that impede the execution
of a subsequent instruction in a particular cycle for a variety of reasons.
There are three kinds of hazards:
Structural Hazards
Data Hazards
Control Hazards
There are many specific solutions to dependencies. The simplest is introducing a bubble, which
stalls the pipeline and reduces the throughput. The bubble makes the next instruction wait until
the earlier instruction has completed.
Structural Hazards
Structural hazards arise due to hardware resource conflicts among the instructions in
the pipeline. A resource here could be memory, a register in the GPR file, or the ALU. A
resource conflict occurs when more than one instruction in the pipe requires access to
the same resource in the same clock cycle, a situation the hardware cannot handle for
all possible combinations of instructions in overlapped pipelined execution.
Data Hazards
Data hazards in pipelining emerge when the execution of one instruction depends
on the result of another instruction that is still being processed in the pipeline.
Based on the order of the READ and WRITE operations on a register, data hazards are
classified into three groups: read after write (RAW), write after read (WAR), and
write after write (WAW).
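A minimal sketch of this classification, assuming each instruction is reduced to a destination register and a set of source registers (the representation is hypothetical, chosen only to make the three cases explicit):

```python
# Classify the data dependence between two register instructions.
# Each instruction is modeled as (dest_register, set_of_source_registers).

def classify_hazard(first, second):
    """Return 'RAW', 'WAR', 'WAW', or None for two instructions in program order."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    if dest1 in srcs2:
        return "RAW"   # second reads a value the first writes
    if dest2 in srcs1:
        return "WAR"   # second overwrites a value the first still reads
    if dest1 == dest2:
        return "WAW"   # both write the same register
    return None        # no dependence on these registers
```

For example, `ADD r1, r2, r3` followed by `SUB r4, r1, r5` is a RAW hazard, since the second instruction reads r1 before the first has written it back.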
Control Hazards
Branch hazards are caused by branch instructions and are known as control hazards in
computer architecture. The flow of program/instruction execution is controlled by branch
instructions.
Parallel Processors:
Parallel processing can be described as a class of techniques which enables the system
to achieve simultaneous data-processing tasks to increase the computational speed of a
computer system.
Cache Coherence
A cache coherence issue results from the concurrent operation of several processors and
the possibility that various caches may hold different versions of the identical memory
block. The practice of cache coherence makes sure that alterations in the contents of
associated operands are quickly transmitted across the system.
o Write Through
o Write Back
Write Through
The easiest and most popular method is to write through. Every memory write operation
updates the main memory. If the word is present in the cache memory at the requested
address, the cache memory is also updated simultaneously with the main memory.
The benefit of this approach is that the cache and main memory always hold the same
information. In systems with direct memory access (DMA) transfer, this quality is
crucial: it ensures the information in main memory is up to date at all times, so that
a device interacting over DMA can access the most recent information.
Write Back
Only the cache location is changed during a write operation in this approach. When the
word is updated, its location is flagged (marked dirty) so that it is copied back to
main memory when the word is removed from the cache. The write-back approach was
developed because words may be updated numerous times while they are in the cache; as
long as they remain there, it does not matter that the copy in main memory is
outdated, because requests for those words are fulfilled from the cache.

An accurate copy must be transferred back to main memory only when the word is
evicted from the cache. According to analytical findings, between 10% and 30%
of all memory references in a normal program are writes to memory.
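The contrast between the two policies can be sketched as follows. This toy model uses plain dictionaries in place of real cache lines, and all names are illustrative:

```python
# Toy model contrasting write-through and write-back policies.

main_memory = {}
cache = {}
dirty = set()   # addresses modified in the cache but not yet in main memory

def write_through(addr, value):
    """Every write updates both the cache and main memory."""
    cache[addr] = value
    main_memory[addr] = value

def write_back(addr, value):
    """Only the cache is updated; the memory update is deferred."""
    cache[addr] = value
    dirty.add(addr)

def evict(addr):
    """Copy a dirty line back to main memory only when it leaves the cache."""
    if addr in dirty:
        main_memory[addr] = cache[addr]
        dirty.discard(addr)
    cache.pop(addr, None)
```

Under write-through, main memory is always current (the DMA-friendly property noted above); under write-back, main memory only catches up at eviction, saving repeated memory writes for frequently updated words.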
The important terms related to the data or information stored in the cache as well as in
the main memory are as follows:
o Modified - The modified term signifies that the data stored in the cache and
main memory are different. This means the data in the cache has been modified,
and the changes need to be reflected in the main memory.
o Exclusive - The exclusive term signifies that the data is clean, i.e., the cache and
the main memory hold identical data.
o Shared - Shared refers to the fact that the cache value contains the most current
data copy, which is then shared across the whole cache as well as main memory.
o Owned - The owned term indicates that the block is currently held by the cache
and that it has acquired ownership of it, i.e., complete privileges to that specific
block.
o Invalid - When a cache block is marked as invalid, it means that it needs to be
fetched from another cache or main memory.
A multicore processor is an integrated circuit that has two or more processor cores
attached for enhanced performance and reduced power consumption. These
processors also enable more efficient simultaneous processing of multiple tasks, such
as with parallel processing and multithreading.
In hyper-threading, one physical core presents two or more hardware threads, and
these threads behave like logical CPUs. Because of this, several processes can run on
different threads, improving the performance of the system.
Advantages
Multitasking: If the system supports hyper-threading, several processes
can run simultaneously on the different threads of a CPU core.

Resource Utilization: With hyper-threading, resources do not stay idle
and are utilized to a greater extent, because several simultaneously running
processes may need them. Because of this, the system's overall
efficiency and performance increase.

Responsiveness: Concurrent execution helps to decrease the response
time for users.
Disadvantages
If many processes run simultaneously because of hyper-threading and compete
for limited resources, such as cache memory, those resources may become a
bottleneck.

Because many running processes share resources, contention and
input/output delays can increase.

If the number of threads per core is greater than 1, the system supports
hyper-threading. Overall, hyper-threading improves the performance of
multi-threaded applications on modern computers via concurrent execution.
The multiple CPUs in the system are in close communication, which shares a common
bus, memory, and other peripheral devices. So we can say that the system is tightly
coupled. These systems are used when we want to process a bulk amount of data, and
these systems are mainly used in satellite, weather forecasting, etc.
There are cases when the processors are identical, i.e., homogenous, in terms of their
functionality in multiple-processor scheduling. We can use any processor available to
run any process in the queue.
Processor Affinity
Processor Affinity means a process has an affinity for the processor on which it is
currently running. When a process runs on a specific processor, there are certain effects
on the cache memory. The data most recently accessed by the process populate the
cache for the processor. As a result, successive memory access by the process is often
satisfied in the cache memory.
Module-V:
Memory organization:
1. Registers
Registers are small, high-speed memory units located in the CPU. They are
used to store the most frequently used data and instructions. Registers have
the fastest access time and the smallest storage capacity, typically ranging from
16 to 64 bits.
2. Cache Memory
Cache memory is a small, fast memory unit located close to the CPU. It stores
frequently used data and instructions that have been recently accessed from
the main memory. Cache memory is designed to minimize the time it takes to
access data by providing the CPU with quick access to frequently used data.
3. Main Memory
Main memory, also known as RAM (Random Access Memory), is the primary
memory of a computer system. It has a larger storage capacity than cache
memory, but it is slower. Main memory is used to store data and instructions
that are currently in use by the CPU.
4. Secondary Storage
Secondary storage, such as hard disk drives (HDD) and solid-state drives
(SSD), is a non-volatile memory unit that has a larger storage capacity than
main memory. It is used to store data and instructions that are not currently in
use by the CPU. Secondary storage has the slowest access time and is
typically the least expensive type of memory in the memory hierarchy.
5. Magnetic Disk
Magnetic Disks are simply circular plates that are fabricated with either a metal
or a plastic or a magnetized material. The Magnetic disks work at a high speed
inside the computer and these are frequently used.
6. Magnetic Tape
Level       | 1               | 2             | 3                | 4
Name        | Register        | Cache         | Main Memory      | Secondary Memory
Size        | <1 KB           | <16 MB        | <16 GB           | >100 GB
Access Time | 0.25 to 0.5 ns  | 0.5 to 25 ns  | 80 to 250 ns     | about 5,000,000 ns
Bandwidth   | 20,000 to 100,000 MB/s | 5,000 to 15,000 MB/s | 1,000 to 5,000 MB/s | 20 to 150 MB/s
Managed by  | Compiler        | Hardware      | Operating System | Operating System
Cache memory:
Cache Memory is a special very high-speed memory. The cache is a smaller
and faster memory that stores copies of the data from frequently used main
memory locations. There are various different independent caches in a CPU,
which store instructions and data. The most important use of cache memory is
that it is used to reduce the average time to access data from the main
memory.
Cache Memory
Levels of Memory
Level 1 or Registers: registers hold the data and instructions that the CPU is
working on immediately. Commonly used registers include the accumulator,
program counter, and address register.
Level 2 or Cache memory: the fastest memory after registers, with a short access
time, where data is temporarily stored for faster access.
Level 3 or Main Memory: the memory on which the computer currently works. It is
comparatively small in size, and once power is off, data no longer stays in this
memory.
Level 4 or Secondary Memory: external memory that is not as fast as
the main memory, but where data stays permanently.
Cache Performance
When the processor needs to read or write a location in the main memory, it
first checks for a corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a Cache
Hit has occurred and data is read from the cache.
If the processor does not find the memory location in the cache, a cache
miss has occurred. For a cache miss, the cache allocates a new entry and
copies in data from the main memory, then the request is fulfilled from the
contents of the cache.
The performance of cache memory is frequently measured in terms of a
quantity called Hit ratio.
Hit Ratio (H) = hits / (hits + misses) = no. of hits / total accesses
Miss Ratio = misses / (hits + misses) = no. of misses / total accesses
           = 1 - Hit Ratio (H)
Cache performance can be improved by using a larger cache block size and
higher associativity, and by reducing the miss rate, the miss penalty, and the
time to hit in the cache.
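The hit- and miss-ratio formulas above can be sketched in Python. This is a minimal illustration, not from the notes: the access sequence and the cache capacity are made up, and for simplicity the cache never exceeds its capacity, so no replacement policy is needed.

```python
def hit_and_miss_ratio(accesses, capacity):
    """Count hits and misses for a sequence of block addresses.
    No eviction is modeled: the demo never exceeds capacity."""
    cache = set()      # blocks currently resident in the cache
    hits = misses = 0
    for block in accesses:
        if block in cache:
            hits += 1
        else:
            misses += 1
            if len(cache) < capacity:
                cache.add(block)
    total = hits + misses
    return hits / total, misses / total   # H and 1 - H

h, m = hit_and_miss_ratio([1, 2, 1, 3, 1, 2, 4, 1], capacity=4)
print(h, m)   # hit ratio 0.5, miss ratio 0.5
```

Note that the two ratios always sum to 1, matching Miss Ratio = 1 - H above.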
Cache Mapping
There are three different types of mapping used for the purpose of cache
memory which is as follows:
Direct Mapping
Associative Mapping
Set-Associative Mapping
1. Direct Mapping
The simplest technique, known as direct mapping, maps each block of main
memory into only one possible cache line. In other words, direct mapping
assigns each memory block to a specific line in the cache. If a line is already
occupied when a new block needs to be loaded, the old block is discarded.
The memory address is split into two parts, an index field and a tag field; the
cache stores the tag alongside the data so it can tell which block currently
occupies the line. Direct mapping's performance is directly proportional to the
hit ratio.
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
For purposes of cache access, each main memory address can be viewed as
consisting of three fields. The least significant w bits identify a unique word or
byte within a block of main memory; in most contemporary machines, the
address is at the byte level. The remaining s bits specify one of the 2^s blocks
of main memory. The cache logic interprets these s bits as a tag of s - r bits
(the most significant portion) and a line field of r bits. This latter field identifies
one of the m = 2^r lines of the cache. The line field serves as the index bits in
direct mapping.
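The address split and the mapping rule i = j mod m can be sketched as follows. The field widths (r = 4, w = 2) and the sample address are purely illustrative, not from any particular machine.

```python
def split_address(addr, r, w):
    """Split a byte address into (tag, line, word) fields.
    w: bits for the word/byte within a block
    r: bits for the cache line (index); the tag is the remaining s - r bits."""
    word = addr & ((1 << w) - 1)          # least significant w bits
    line = (addr >> w) & ((1 << r) - 1)   # next r bits: the cache line (index)
    tag  = addr >> (w + r)                # most significant bits: the tag
    return tag, line, word

# Block j of main memory maps to cache line i = j mod m, with m = 2**r.
m = 1 << 4                    # 16 cache lines (r = 4)
j = 35                        # main-memory block number
print(j % m)                  # -> 3: block 35 maps to line 3
print(split_address(0b10_1100_11_01, r=4, w=2))   # -> (tag 11, line 3, word 1)
```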
2. Associative Mapping
In this type of mapping, associative memory is used to store the content and
addresses of the memory word. Any block can go into any line of the cache.
This means that the word id bits are used to identify which word in the block is
needed, but the tag becomes all of the remaining bits. This enables the
placement of any word at any place in the cache memory. It is considered to be
the fastest and most flexible mapping form. In associative mapping, the index
bits are zero.
3. Set-Associative Mapping
This form of mapping is an enhanced form of direct mapping where the
drawbacks of direct mapping are removed. Set associative addresses the
problem of possible thrashing in the direct mapping method. It does this by
saying that instead of having exactly one line that a block can map to in the
cache, we will group a few lines together creating a set. Then a block in
memory can map to any one of the lines of a specific set. Set-associative
mapping thus allows two or more blocks of main memory that share the same
index to reside in the cache at the same time. It combines the best of the
direct and associative mapping techniques. In set-associative mapping, the
index bits are given by the set-offset bits. In this case, the cache consists of a
number of sets, each of which consists of a number of lines.
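The set-index computation can be sketched as below. The cache size and associativity are assumed values for illustration: with v sets, block j maps to set j mod v and may occupy any of the k lines within that set.

```python
def set_index(j, num_sets):
    """Set that main-memory block j maps to in a set-associative cache."""
    return j % num_sets

m = 16            # total cache lines (illustrative)
k = 4             # associativity: lines per set
v = m // k        # number of sets = 4
for j in [0, 4, 8, 35]:
    print(j, "->", set_index(j, v))
# Blocks 0, 4 and 8 all map to set 0, yet they can coexist in the
# cache, each occupying a different line of the set -- unlike direct
# mapping, where they would repeatedly evict one another (thrashing).
```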
Application of Cache Memory
Here are some of the applications of Cache Memory.
1. Primary Cache: A primary cache is always located on the processor chip.
This cache is small and its access time is comparable to that of processor
registers.
2. Block Size: Block size is the unit of information exchanged between the
cache and main memory. As the block size increases from very small to
larger sizes, the hit ratio will at first increase because of the principle of
locality: data in the neighborhood of a referenced word is likely to be
referenced in the near future, so a larger block brings more useful data
into the cache. The hit ratio will begin to decrease, however, as the block
becomes even larger and the probability of using the newly fetched data
becomes less than the probability of reusing the data that must be evicted
from the cache to make room for the new block.
Block size = cache block size = cache line size = line size.
Mapping functions:
Mapping functions are a group of functions that could be applied successively to one or
more lists of elements. The results of applying these functions to a list are placed in a
new list and that new list is returned.
For example, the mapcar function processes successive elements of one or more lists.
The first argument of the mapcar function should be a function and the remaining
arguments are the list(s) to which the function is applied.
The argument function is applied to the successive elements, and the results are
collected into a newly constructed list. If the argument lists are not equal in
length, mapping stops upon reaching the end of the shortest list, so the resulting
list has the same number of elements as the shortest input list.
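Python's built-in map() behaves like the mapcar described above: the function is applied to successive elements of the argument lists, and mapping stops at the end of the shortest list.

```python
from operator import add

# map() applies add to element pairs from both lists; the second list
# has only three elements, so mapping stops there.
result = list(map(add, [1, 2, 3, 4], [10, 20, 30]))
print(result)   # [11, 22, 33] -- same length as the shortest input list
```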
Replacement algorithms:
Caching is the process of storing data near where it is going to be used, rather
than fetching it from an expensive origin every time a request comes in.
Cache replacement algorithms do just that. They decide which objects can stay and
which objects should be evicted.
LRU
The least recently used (LRU) algorithm is one of the most famous cache replacement
algorithms and for good reason!
As the name suggests, LRU keeps the least recently used objects at the top and evicts
objects that haven't been used in a while if the list reaches the maximum capacity.
So it's simply an ordered list where objects are moved to the top every time they're
accessed; pushing other objects down.
LRU is simple and provides a good cache-hit rate for many use cases.
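The ordered-list behavior described above can be sketched with an OrderedDict; the capacity and keys below are illustrative only.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: every access moves an object to the
    most-recently-used end; the least recently used entry is
    evicted when capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                      # cache miss
        self.data.move_to_end(key)           # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" becomes most recently used
cache.put("c", 3)       # evicts "b", the least recently used
print(cache.get("b"))   # None -- "b" was evicted
print(cache.get("a"))   # 1
```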
LFU
The least frequently used (LFU) algorithm works similarly to LRU, except it keeps track of
how many times an object was accessed instead of how recently it was accessed.
Each object has a counter that counts how many times it was accessed. When the list
reaches the maximum capacity, objects with the lowest counters are evicted.
LFU has a famous problem. Imagine an object was repeatedly accessed for only
a short period. Its counter becomes far larger than the others, so it is very hard
to evict this object even if it is not accessed again for a long time.
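A counter-based LFU sketch (all names and values illustrative) also exposes the problem just noted: a key that was briefly very popular is hard to evict.

```python
class LFUCache:
    """Minimal LFU sketch: each key carries an access counter;
    the key with the lowest counter is evicted at capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.count = {}

    def get(self, key):
        if key not in self.data:
            return None
        self.count[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.count, key=self.count.get)  # lowest counter
            del self.data[victim], self.count[victim]
        self.data[key] = value
        self.count[key] = self.count.get(key, 0) + 1

cache = LFUCache(2)
cache.put("hot", 1)
for _ in range(100):
    cache.get("hot")        # briefly very popular: counter reaches 101
cache.put("x", 2)
cache.put("y", 3)           # evicts "x" (count 1), never "hot"
print(cache.get("hot"))     # 1 -- still resident despite no recent use
```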
Random Replacement (RR)
This algorithm randomly selects an object to evict when the cache reaches
maximum capacity. It has the benefit of not keeping any reference or history of
objects, and it is very simple to implement.
This algorithm has been used in ARM processors and the famous Intel i860.
Write policies:
A cache's write policy is the behavior of the cache while performing a write
operation, and it plays a central part in the characteristics the cache exposes.
To keep the cache and main memory consistent, two cache write methods are
commonly adopted:
1. Write-through policy
2. Write-back policy
Write-through policy
Write-through is the most commonly used method of writing into the cache memory.
In the write-through method, when the cache memory is updated, the main memory is
updated simultaneously. Thus at any given time, the main memory contains the same
data that is available in the cache memory.
Write-back policy
Write-back policy can also be used for cache writing.
Under the write-back method, only the cache location is updated during a write
operation. The updated location is marked by a flag known as the modified, or
dirty, bit.
When the word is replaced from the cache, it is written into main memory if its flag bit is set. The
logic behind this technique is based on the fact that during a cache write operation, the word
present in the cache may be accessed several times (temporal locality of reference). This
method helps reduce the number of references to main memory.
The only limitation of the write-back technique is that inconsistency may occur,
because two different copies of the same data exist: one in the cache and one in
main memory.
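The two policies can be contrasted in a short sketch. All names here are illustrative; a plain dict stands in for main memory, and only the write path is modeled.

```python
class WriteBackCache:
    """Writes go to the cache only; a dirty bit marks modified lines,
    which are written to main memory only when flushed (or evicted)."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}       # addr -> value
        self.dirty = set()    # addrs whose dirty bit is set

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)          # main memory NOT updated yet

    def flush(self, addr):
        if addr in self.dirty:        # write back only if the dirty bit is set
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

class WriteThroughCache:
    """Every write updates the cache and main memory simultaneously."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value     # main memory always consistent

mem = {0x10: 0}
wb = WriteBackCache(mem)
wb.write(0x10, 42)
print(mem[0x10])   # 0  -- main memory is stale until the line is flushed
wb.flush(0x10)
print(mem[0x10])   # 42 -- consistent again after the write-back
```

The stale value before the flush is exactly the inconsistency window that the write-through policy avoids, at the cost of a main-memory access on every write.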