CTE 433 Computer Architecture II
A microprocessing unit is synonymous with the central processing unit (CPU) used in a traditional computer. A microprocessor (MPU) acts as a device, or a group of devices, that performs the processing tasks of the system.
8085 Microprocessor
The 8085 microprocessor is an 8-bit general-purpose microprocessor capable of addressing 64 KB of
memory. The processor has forty pins, requires a single +5 V power supply, and uses a 3-MHz single-phase clock.
ALU: The ALU performs the computing functions of the microprocessor. It includes the accumulator, a
temporary register, the arithmetic and logic circuits, and five flags. Results are stored in the accumulator and the flags.
Accumulator: The accumulator is an 8-bit register that is part of the ALU. It is used to store 8-bit data and to
perform arithmetic and logic operations. The result of an operation is stored in the accumulator.
Flags: The ALU includes five flip-flops, called flags, that are set and reset according to the data conditions
in the accumulator and other registers. Instructions can test the flags to alter the flow of a program.
S (Sign) flag − After the execution of an arithmetic operation, if bit D7 of the result is 1, the sign flag
is set. It is used with signed numbers: in a given byte, if D7 is 1 the number is negative; if D7 is 0, the
number is positive.
Z (Zero) flag − The zero flag is set if the result of an ALU operation is 0.
AC (Auxiliary Carry) flag − In an arithmetic operation, when a carry is generated by digit D3 and passed
on to digit D4, the AC flag is set. This flag is used internally for BCD operations only.
P (Parity) flag − After an arithmetic or logic operation, if the result has an even number of 1s, the flag is
set; if it has an odd number of 1s, the flag is reset.
C (Carry) flag − If an arithmetic operation results in a carry, the carry flag is set; otherwise it is reset.
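The five flag rules above can be illustrated with a short sketch. This is not 8085 code; it is a hypothetical Python helper that mimics how the 8085 ALU would set its flags after an 8-bit addition.

```python
def add_8bit(a, b):
    """Add two 8-bit values and return (result, flags) in the style of the 8085 ALU."""
    total = a + b
    result = total & 0xFF                                    # accumulator keeps only 8 bits
    flags = {
        "S": (result >> 7) & 1,                              # sign: bit D7 of the result
        "Z": 1 if result == 0 else 0,                        # zero: result is 0
        "AC": 1 if ((a & 0x0F) + (b & 0x0F)) > 0x0F else 0,  # carry from D3 into D4
        "P": 1 if bin(result).count("1") % 2 == 0 else 0,    # even number of 1s
        "C": 1 if total > 0xFF else 0,                       # carry out of bit D7
    }
    return result, flags

result, flags = add_8bit(0x8C, 0x8C)   # 140 + 140 = 280, which overflows 8 bits
print(hex(result), flags)              # 0x18 with C=1, AC=1, P=1, S=0, Z=0
```

Note how a single addition can set several flags at once; the carry and auxiliary-carry flags come from intermediate bit positions, not from the stored 8-bit result.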
Register section: The register section is basically a storage area; data is transferred between its registers by instructions.
Stack Pointer (SP) − The stack pointer is also a 16-bit register which is used as a memory pointer. It
points to a memory location in read/write memory known as the stack. During program execution, data
sometimes needs to be stored on the stack. The beginning of the stack is defined by loading a 16-bit
address into the stack pointer.
Program Counter (PC) − This 16-bit register sequences the execution of instructions. It is also a
memory pointer: memory locations have 16-bit addresses, and the program counter holds the execution
address, pointing to the memory address from which the next byte is to be fetched.
Storage registers − These registers store 8-bit data during program execution. They are
identified as B, C, D, E, H, and L, and can be combined into the register pairs BC, DE, and HL to perform
some 16-bit operations.
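The pairing of 8-bit registers into 16-bit values can be sketched as below. The helper names are made up for illustration; the bit manipulation is what matters.

```python
def make_pair(high, low):
    """Combine two 8-bit registers (e.g. H and L) into one 16-bit value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

def split_pair(value):
    """Split a 16-bit value back into (high, low) 8-bit halves."""
    return (value >> 8) & 0xFF, value & 0xFF

# H = 0x20, L = 0x50 combine into HL = 0x2050, a 16-bit memory address
hl = make_pair(0x20, 0x50)
print(hex(hl))           # 0x2050
print(split_pair(hl))    # (32, 80), i.e. (0x20, 0x50)
```

This is exactly how the 8085 forms 16-bit addresses from its 8-bit register pairs: the first register supplies the high byte and the second the low byte.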
Timing and Control Section: This unit synchronizes the microprocessor's operation with the
clock pulse and generates the control signals necessary for smooth communication between the
microprocessor and peripheral devices. The RD and WR signals (active low) are synchronous pulses that
indicate whether data is available on the data bus. The control unit controls the flow
of data between the microprocessor, memory, and peripheral devices.
INSTRUCTION FORMAT
Each instruction is represented by a sequence of bits within the computer. The instruction is divided into
groups of bits called fields. The way an instruction is expressed is known as the instruction format, usually
represented in the form of a rectangular box. Instruction formats may be of the following types.
VARIABLE INSTRUCTION FORMATS
These are instruction formats in which the instruction length varies on the basis of the opcode and address
specifiers.
For example, VAX instructions vary between 1 and 53 bytes, while x86 instructions vary between 1 and 17
bytes.
FIXED INSTRUCTION FORMATS
In this type of instruction format, all instructions are of the same size. Examples include MIPS, PowerPC,
Alpha, and ARM.
Advantage of Fixed Instruction Format: They are easy to decode & pipeline.
Drawback of Fixed Instruction Format: They don't have good code density.
HYBRID INSTRUCTION FORMATS
In this type of instruction format, we have multiple format lengths specified by the opcode. Examples include IBM
360/70, MIPS 16, and Thumb.
These formats compromise between the code density of variable-length instructions and the easy decoding of
fixed-length instructions.
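The claim that fixed formats are "easy to decode" can be made concrete: every field sits at a known bit position, so decoding is just shifts and masks. The sketch below uses the 32-bit MIPS R-type layout (6/5/5/5/5/6-bit fields) as its example.

```python
def decode_r_type(word):
    """Extract the fixed-position fields of a 32-bit MIPS R-type instruction."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26
        "rs":     (word >> 21) & 0x1F,   # bits 25..21, first source register
        "rt":     (word >> 16) & 0x1F,   # bits 20..16, second source register
        "rd":     (word >> 11) & 0x1F,   # bits 15..11, destination register
        "shamt":  (word >> 6)  & 0x1F,   # bits 10..6, shift amount
        "funct":  word & 0x3F,           # bits 5..0, function code
    }

# "add $8, $9, $10" encodes as 0x012A4020
print(decode_r_type(0x012A4020))
```

With a variable-length format, by contrast, the decoder cannot even find the field boundaries until it has partially decoded the opcode, which is why fixed formats pipeline so well.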
ADDRESSING MODES
An addressing mode specifies how the processor locates the data an instruction operates on. Operand
data is stored in memory locations, and each instruction requires certain data on which to operate. The
various techniques for specifying the address of this data are called addressing modes.
Direct addressing mode − In the direct addressing mode, the address of the operand is given in the
instruction, and the data is available in the memory location provided in the instruction. This data is
then moved to the desired location.
Indirect addressing mode − In the indirect addressing mode, the instruction specifies a register
which contains the address of the operand. Both internal RAM and external RAM can be accessed via
the indirect addressing mode.
Immediate addressing mode − In the immediate addressing mode, the data itself is given in the
instruction as the operand and is moved, for example, into the accumulator. It is very fast.
Relative addressing mode − In the relative addressing mode, the effective address is determined as in
the index mode, but using the program counter instead of a general-purpose processor register.
Index addressing mode − In the index addressing mode, the effective address of the operand is
generated by adding a constant value to the contents of a register.
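Four of the modes above can be sketched with a toy memory (a Python list) and a toy register file (a dict). The helper functions are hypothetical and do not belong to any real instruction set; they only show how each mode resolves an operand.

```python
memory = [0] * 32
memory[5] = 99                    # data stored at address 5
memory[13] = 42                   # data for the indexed example (10 + 3 = 13)
registers = {"R1": 5, "X": 3}

def direct(addr):
    """Direct: the instruction carries the operand's address."""
    return memory[addr]

def indirect(reg):
    """Indirect: a register holds the operand's address."""
    return memory[registers[reg]]

def immediate(value):
    """Immediate: the operand IS the data in the instruction."""
    return value

def indexed(base_addr, reg):
    """Indexed: effective address = base address + register contents."""
    return memory[base_addr + registers[reg]]

print(direct(5), indirect("R1"), immediate(7), indexed(10, "X"))
```

Relative addressing works like `indexed` with the program counter as the register, which is why the text describes it as the index mode applied to the PC.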
THE INSTRUCTION CYCLE
The major steps in the instruction cycle are:
1. Fetch: In the fetch cycle, the CPU retrieves the instruction from memory. The instruction is typically
stored at the address specified by the program counter (PC). The PC is then incremented to point to the
next instruction in memory.
2. Decode: In the decode cycle, the CPU interprets the instruction and determines what operation needs to
be performed. This involves identifying the opcode and any operands that are needed to execute the
instruction.
3. Execute: In the execute cycle, the CPU performs the operation specified by the instruction. This may
involve reading or writing data from or to memory, performing arithmetic or logic operations on data, or
manipulating the control flow of the program.
There are also some additional steps that may be performed during the instruction cycle, depending on
the CPU architecture and instruction set:
4. Fetch operands: In some CPUs, the operands needed for an instruction are fetched during a separate
cycle before the execute cycle. This is called the fetch operands cycle.
5. Store results: In some CPUs, the results of an instruction are stored during a separate cycle after the
execute cycle. This is called the store results cycle.
6. Interrupt handling: In some CPUs, interrupt handling may occur during any cycle of the instruction
cycle. An interrupt is a signal that the CPU receives from an external device or software that requires
immediate attention. When an interrupt occurs, the CPU suspends the current instruction and executes an
interrupt handler to service the interrupt.
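The fetch-decode-execute loop above can be sketched as a minimal simulator. The two-field instruction format and the opcodes here are invented for illustration only.

```python
memory = [("LOAD", 7), ("ADD", 3), ("HALT", 0)]   # a tiny three-instruction program
acc, pc, running = 0, 0, True

while running:
    instr = memory[pc]            # 1. fetch: read the instruction the PC points at
    pc += 1                       #    ...then increment the PC to the next instruction
    opcode, operand = instr       # 2. decode: split into opcode and operand
    if opcode == "LOAD":          # 3. execute: perform the specified operation
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "HALT":
        running = False

print(acc)    # 10
```

Even this toy loop shows why the PC is incremented during fetch: by the time execution happens, the PC already points at the next instruction, which matters for the interrupt handling described below.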
INTERRUPTS
An interrupt is a signal from a device attached to a computer or from a program within the computer that
requires the operating system to stop and figure out what to do next.
Interrupts let the CPU keep processing while an I/O operation is pending: when the CPU needs an I/O
operation, the request is queued and the CPU continues with other work. When the I/O operation is ready,
the I/O device interrupts the CPU, which then performs the remaining processing. Without interrupts, the
CPU would have to sit idle until the I/O operation completed; interrupts exist to avoid this CPU waiting
time.
How the processor handles interrupts
Whenever an interrupt occurs, it causes the CPU to stop executing the current program. Control then
passes to the interrupt handler, or interrupt service routine (ISR).
The ISR handles an interrupt in the following steps −
Step 1 − When an interrupt occurs, assume the processor is executing the i-th instruction; the program
counter points to the next instruction, the (i+1)-th.
Step 2 − The program counter value is pushed onto the process stack, and the program counter is loaded
with the address of the interrupt service routine.
Step 3 − Once the interrupt service routine is completed, the address on the process stack is popped and
placed back in the program counter.
Step 4 − Execution then resumes from the (i+1)-th instruction.
Types of interrupts
There are two types of interrupts which are as follows −
HARDWARE INTERRUPTS
Hardware interrupts are signals generated by external devices and I/O devices to interrupt the CPU, for
example when data is ready.
Hardware interrupts are classified into two types which are as follows −
Maskable Interrupt − A hardware interrupt that can be delayed when a higher-priority interrupt
request has occurred to the processor.
Non-Maskable Interrupt − A hardware interrupt that cannot be delayed and must be serviced immediately
by the processor.
SOFTWARE INTERRUPTS
Software interrupts are signals generated internally, for example when a software program needs to
access a system call.
Software interrupt is divided into two types. They are as follows −
Normal Interrupts − The interrupts that are caused by software instructions are called normal (software)
interrupts.
Exception − An exception is an unplanned interruption that occurs while executing a program. For
example, if a value is divided by zero during execution, an exception is raised.
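The divide-by-zero example above can be shown directly in Python, where the same mechanism appears at the language level: the normal flow of the program is interrupted and control jumps to a handler.

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:          # the "unplanned interruption"
        print("exception: division by zero")
        return None                    # the handler decides how to recover

print(divide(10, 2))   # 5.0 - normal flow
print(divide(10, 0))   # handler runs instead, returns None
```

Hardware exceptions work the same way in spirit: the processor abandons the faulting instruction and transfers control to a handler routine, just as the `except` block takes over here.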
Von-Neumann Model
Von Neumann proposed his computer architecture design in 1945; it later became known as the Von
Neumann Architecture. It consists of a Control Unit, an Arithmetic and Logic Unit (ALU), a Memory Unit,
Registers, and Inputs/Outputs.
Von Neumann architecture is based on the stored-program computer concept, where instruction data and
program data are stored in the same memory. This design is still used in most computers produced today.
Arithmetic and Logic Unit (ALU): The Arithmetic and Logic Unit (ALU) performs the required micro-
operations for executing the instructions. In simple words, ALU allows arithmetic (add, subtract, etc.) and
logic (AND, OR, NOT, etc.) operations to be carried out.
Control Unit: The Control Unit of a computer system controls the operations of components like ALU,
memory and input/output devices. The Control Unit consists of a program counter that contains the address
of the instructions to be fetched and an instruction register into which instructions are fetched from memory
for execution.
Registers: Registers refer to high-speed storage areas in the CPU. The data processed by the CPU are
fetched from the registers.
The following is a list of registers that play a crucial role in data processing.
MAR (Memory Address Register) − This register holds the memory location of the data that needs to be accessed.
MDR (Memory Data Register) − This register holds the data that is being transferred to or from memory.
AC (Accumulator) − This register holds the intermediate arithmetic and logic results.
PC (Program Counter) − This register contains the address of the next instruction to be executed.
CIR (Current Instruction Register) − This register contains the current instruction during processing.
Buses: Buses are the means by which information is shared between the registers in a multiple-register
configuration system. A bus structure consists of a set of common lines, one for each bit of a register,
through which binary information is transferred one at a time. Control signals determine which register is
selected by the bus during each particular register transfer.
Von Neumann Architecture comprises three major bus systems for data transfer.
Address Bus − Carries the address of data (but not the data itself) between the processor and the memory.
Data Bus − Carries data between the processor, the memory unit and the input/output devices.
Control Bus − Carries control and timing signals between the processor, memory and input/output devices.
Memory Unit
A memory unit is a collection of storage cells together with associated circuits needed to transfer
information in and out of the storage. The memory stores binary information in groups of bits called words.
The internal structure of a memory unit is specified by the number of words it contains and the number of
bits in each word.
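The "number of words × bits per word" specification above can be checked with a short sketch: a memory of 2^k words needs k address bits, and total capacity is the product of the two numbers. The function name is made up for illustration.

```python
import math

def memory_spec(num_words, bits_per_word):
    """Return (address bits, total storage bits) for a word-organized memory.
    Assumes num_words is a power of two."""
    address_bits = int(math.log2(num_words))
    total_bits = num_words * bits_per_word
    return address_bits, total_bits

# A 4K x 16 memory: 4096 words of 16 bits each
addr_bits, total = memory_spec(4096, 16)
print(addr_bits, total)    # 12 address bits, 65536 bits of storage
```

This is why memory chips are conventionally described as, say, "4K × 16": the first figure fixes the width of the address, the pair together fixes the capacity.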
RISC ARCHITECTURE
Reduced Instruction Set Computing (RISC): RISC design, which emphasizes simplicity, efficiency, and
streamlined execution, marks a change from the prior complex instruction set computing
(CISC) methodology.
In RISC, the length of the instruction format is fixed; each instruction occupies one word of memory. The
fixed size of the instruction format benefits the program counter, since the fixed length of all instructions
means the processor always knows where the next instruction starts. In RISC, each instruction requires only
one clock cycle to execute. In addition, RISC architectures are designed to be highly scalable and to
accommodate a greater number of instructions.
The hardware of RISC architecture is designed to execute the instruction quickly, which is possible because
of the more precise and smaller number of instructions and a large number of registers.
The processor uses a cache to reduce the access time to the main memory. The instruction cache holds
frequently used instructions and speeds up instruction execution; the data cache provides storage for
frequently used data from the main memory.
The hallmark principles of RISC architecture are summarized in these five points:
Single-cycle execution: In most traditional central processing unit (CPU) designs, the peak possible
execution rate is one instruction per basic machine cycle. For a given technology, the cycle time is
determined by the slowest instruction in the instruction set. RISC designs aim to execute most
instructions in a single cycle, increasing the processor's overall speed.
Load-store architecture: RISC architectures use a load-store architecture, meaning only load and store
instructions can access memory. All other instructions must operate on data in registers. This simplifies
the instruction set and reduces the number of memory accesses required.
Simple instructions: RISC architectures use simple instructions that can be executed quickly. This
reduces the complexity of the processor and allows it to operate at a higher clock speed.
Large register set: RISC architectures have many registers, which reduces the number of memory
accesses required and allows for more efficient use of the processor's resources.
Pipelining: RISC architectures use pipelining to increase the speed of instruction execution. Pipelining
allows multiple instructions to be executed simultaneously, increasing the processor's overall
throughput.
Features of RISC architecture:
Simplified instruction set: RISC architecture uses a small, highly optimized and simple instruction set, so
instructions execute quickly.
Fixed format length: In RISC architecture the format length of instructions is fixed, which makes the
decoding of instructions faster.
Register-based architecture: Frequently used data is stored in registers within the processor. This improves
performance because it reduces the number of memory accesses.
Fewer cycles: In RISC architecture, an instruction requires fewer cycles to execute because of
the simple instruction set.
Load-store architecture: RISC architecture uses a load-store architecture, which separates memory access
from logical and arithmetic operations. Therefore, the processor's resources are used more
efficiently, improving performance.
Drawbacks of RISC architecture:
Complex instructions and addressing modes: It is not easy to process complex instructions and
complex addressing modes in the RISC architecture.
No direct memory-to-memory transfer: RISC uses a load-store architecture, so it does not allow a direct
memory-to-memory transfer.
Increase in program length: RISC architecture has a small and simple instruction set. However, it requires
more instructions than CISC architecture to perform the same task, increasing the program's length.
VECTOR ARCHITECTURE
Vector architecture includes instruction set extensions to an ISA to support vector operations, which are
deeply pipelined. Data is transferred between a vector register and the memory system. Each vector
operation takes vector registers or a vector register and a scalar value as input.
Vector architectures are only effective on applications that have significant data-level parallelism (DLP).
Vector processing greatly reduces the dynamic instruction bandwidth. Execution time is generally reduced
due to:
(2) stalls occurring only on the first vector element rather than on each vector element, and
Vector architectures grab sets of data elements scattered about memory, place them into large, sequential
register files, operate on data in those register files, and then disperse the results back into memory. A single
instruction operates on vectors of data, which results in dozens of register–register operations on
independent data elements.
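The "one instruction, many element operations" idea can be sketched in plain Python. Both functions below loop in Python, of course; the point is the programming model: the scalar version issues (and decodes) one add per element, while the vector version stands for a single instruction whose per-element adds happen inside a pipelined functional unit.

```python
def scalar_add(a, b):
    """Scalar model: one add instruction fetched and decoded per element."""
    result = []
    for x, y in zip(a, b):
        result.append(x + y)
    return result

def vector_add(a, b):
    """Vector model: conceptually ONE instruction operating on whole
    vector registers; the hardware iterates over the elements itself."""
    return [x + y for x, y in zip(a, b)]

v1, v2 = list(range(8)), [10] * 8
print(vector_add(v1, v2))   # [10, 11, 12, 13, 14, 15, 16, 17]
```

The reduction in dynamic instruction bandwidth follows directly: for a 64-element VMIPS vector register, one vector add replaces 64 scalar adds plus the loop-control instructions around them.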
VMIPS
The primary components of the instruction set architecture of VMIPS are the following:
■ Vector registers—Each vector register is a fixed-length bank holding a single vector. VMIPS has eight
vector registers, and each vector register holds 64 elements, each 64 bits wide. The vector register file needs
to provide enough ports to feed all the vector functional units. These ports will allow a high degree of
overlap among vector operations to different vector registers. The read and write ports, which total at least
16 read ports and 8 write ports, are connected to the functional unit inputs or outputs by a pair of crossbar
switches.
■ A Set of Scalar Registers—Scalar registers can also provide data as input to the vector functional units,
as well as compute addresses to pass to the vector load/store unit. These are the normal 32 general-purpose
registers and 32 floating-point registers of MIPS. One input of the vector functional units latches scalar
values as they are read out of the scalar register file.
THE CONTROL UNIT
The Control Unit is the part of the computer’s central processing unit (CPU), which directs the operation of
the processor.
It coordinates the sequence of data movements into, out of, and between a processor's many sub-units.
It interprets instructions.
It controls data flow inside the processor.
It receives external instructions or commands, which it converts to a sequence of control signals.
It controls the many execution units (e.g. the ALU, data buffers and registers) contained within a CPU.
It also handles multiple tasks, such as fetching, decoding, execution handling and storing results.
Types of Control Unit
Hardwired control unit.
Microprogrammed control unit.
In the Hardwired control unit, the control signals that are important for instruction execution control are
generated by specially designed hardware logical circuits, in which we cannot modify the signal generation
method without physical change of the circuit structure. The operation code of an instruction contains the
basic data for control signal generation. In the instruction decoder, the operation code is decoded. The
instruction decoder constitutes a set of many decoders that decode different fields of the instruction opcode.
The fundamental difference between a microprogrammed control unit and the hardwired control unit is
the existence of the control store, which is used for storing words containing the encoded control signals mandatory
for instruction execution. In microprogrammed control units, subsequent instruction words are fetched into
the instruction register in a normal way. However, the operation code of each instruction is not directly
decoded to enable immediate control signal generation but it comprises the initial address of a microprogram
contained in the control store.
Advantages of a well-designed control unit:
Efficient instruction execution: A well-designed control unit can execute instructions more
efficiently by optimizing the instruction pipeline and minimizing the number of clock cycles required
for each instruction.
Improved performance: A well-designed control unit can improve the performance of the CPU by
increasing the clock speed, reducing the latency, and improving the throughput.
Support for complex instructions: A well-designed control unit can support complex instructions
that require multiple operations, reducing the number of instructions required to execute a program.
Improved reliability: A well-designed control unit can improve the reliability of the CPU by
detecting and correcting errors, such as memory errors and pipeline stalls.
Lower power consumption: A well-designed control unit can reduce power consumption by
optimizing the use of resources, such as registers and memory, and reducing the number of clock
cycles required for each instruction.
Better branch prediction: A well-designed control unit can improve branch prediction accuracy,
reducing the number of branch mispredictions and improving performance.
Improved scalability: A well-designed control unit can improve the scalability of the CPU,
allowing it to handle larger and more complex workloads.
Better support for parallelism: A well-designed control unit can better support parallelism,
allowing the CPU to execute multiple instructions simultaneously and improve overall performance.
Improved security: A well-designed control unit can improve the security of the CPU by
implementing security features such as address space layout randomization and data execution
prevention.
Lower cost: A well-designed control unit can reduce the cost of the CPU by minimizing the number
of components required and improving manufacturing efficiency.
Drawbacks of a poorly-designed control unit:
Reduced performance: A poorly-designed control unit can reduce the performance of the CPU by
introducing pipeline stalls, increasing the latency, and reducing the throughput.
Increased complexity: A poorly-designed control unit can increase the complexity of the CPU,
making it harder to design, test, and maintain.
Higher power consumption: A poorly-designed control unit can increase power consumption by
inefficiently using resources, such as registers and memory, and requiring more clock cycles for each
instruction.
Reduced reliability: A poorly-designed control unit can reduce the reliability of the CPU by
introducing errors, such as memory errors and pipeline stalls.
Limitations on instruction set: A poorly-designed control unit may limit the instruction set of the
CPU, making it harder to execute complex instructions and limiting the functionality of the CPU.
Inefficient use of resources: A poorly-designed control unit may inefficiently use resources such as
registers and memory, leading to wasted resources and reduced performance.
Limited scalability: A poorly-designed control unit may limit the scalability of the CPU, making it
harder to handle larger and more complex workloads.
Poor support for parallelism: A poorly-designed control unit may limit the ability of the CPU to
support parallelism, reducing the overall performance of the system.
Security vulnerabilities: A poorly-designed control unit may introduce security vulnerabilities, such
as buffer overflows or code injection attacks.
Higher cost: A poorly-designed control unit may increase the cost of the CPU by requiring
additional components or increasing the manufacturing complexity.
Instruction set
A set of codes that can be understood directly by the processor (CPU) of a computer is known as an
instruction set. These machine-language codes are represented as 1s and 0s. The movements of bits and
bytes within the processor are controlled by the instruction set.
1. Reduced instruction set computer (RISC): RISC has only a few cycles per instruction. It has a simpler
form than a complex set of instructions. RISC is also used in many supercomputers.
2. Minimal instruction set computers (MISC): A few codes and a set of instructions are basic for any
processor. They also include sub-codes. As a result, they are smaller and faster. A disadvantage of MISC is
that it has more sequential dependencies.
3. Complex instruction set computer (CISC): CISC is an instruction set designed so that a program needs
only a few instructions, each of which may perform a complex operation. A CISC program therefore
contains fewer instructions than an equivalent RISC program.
4. Explicitly parallel instruction computing (EPIC): This is an instruction set that permits
microprocessors to execute instructions in parallel, with the parallelism made explicit by the software.
EPIC intends to deliver high performance with simpler hardware.
5. Very long instruction word (VLIW): VLIW exploits parallelism at the instruction level. In this set of
instructions, several operations are packed into one very long instruction word and issued together by the
CPU, improving the performance of the CPU.
6. Zero instruction set computer (ZISC): Architectures whose operation does not rely on conventional
(micro)instructions are known as ZISC. They are based on pattern matching and can be compared to
networks of synapses and neurons.
7. One instruction set computer (OISC): The OISC uses only one instruction in its machine language.
This kind of instruction set is used to teach computer architecture and in structural computing research.
Bus structure
In computer architecture, a bus is a subsystem that transfers data between components inside a computer, or
between computers. Early computer buses were literally parallel electrical wires with multiple connections,
but modern computer buses can use both parallel and bit-serial connections.
Parts of a system bus: The processor, memory, and input and output devices are connected by the system
bus, which consists of separate buses as shown in figure 1.3.2. They are:
(i) Address bus: The address bus is used to carry the address. It is a unidirectional bus: the address is sent
from the CPU to memory and I/O ports only. It consists of 16, 20, 24 or more parallel signal lines.
(ii) Data bus: The data bus is used to transfer data to and from memory and I/O ports. It is bidirectional:
the processor can read data from memory and I/O ports on the data lines, and can also write data to them.
It consists of 8, 16, 32 or more parallel signal lines.
(iii) Control bus: The control bus is used to carry control signals in order to regulate control activities. It is
bidirectional. The CPU sends control signals on the control bus to enable the outputs of addressed
memory devices or port devices.
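The address-bus widths listed above (16, 20, 24 lines) fix how much memory can be addressed: n address lines can select 2^n distinct locations. A one-line sketch makes the arithmetic explicit.

```python
def addressable_locations(address_lines):
    """Number of distinct addresses selectable with n address lines."""
    return 2 ** address_lines

for lines in (16, 20, 24):
    print(lines, "lines ->", addressable_locations(lines), "addresses")
# 16 lines -> 65536 (64 KB), 20 -> 1048576 (1 MB), 24 -> 16777216 (16 MB)
```

This is why the 8085, with its 16-bit addresses, tops out at 64 KB of memory, while the original PC's 20 address lines gave it a 1 MB address space.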
Single bus structure
When a word of data is transferred between units, all its bits are transferred in parallel; that is, the bits are
transferred simultaneously over many wires, or lines, one bit per line. A group of lines that serves as a
connecting path for several devices is called a bus. In addition to the lines that carry the data, the bus must
have lines for address and control purposes. The simplest way to interconnect functional units is to use a
single bus, as shown. All units are connected to this bus. Because the bus can be used for only one transfer
at a time, only two units can actively use the bus at any given time. Bus control lines are used to arbitrate
multiple requests for use of the bus. The main virtue of the single-bus structure is its low cost and its
flexibility for attaching peripheral devices.
In single bus organization, only one data item can be transferred over the bus in a clock cycle. To reduce the
number of steps needed, most commercial processors provide multiple internal paths that enable several
transfers to take place in parallel.
BUS TYPES
Bus designs have evolved in response to:
Faster CPUs
Increasing software demands
Greater multimedia requirements
ISA (Industry Standard Architecture): The 8-bit version came on the original PC and the AT, but the
latter uses an extension to make it 16-bit. It has a maximum data transfer rate of about 8 megabits per second
on an AT, which is actually well above the capability of disk drives, or most network and video cards. Its
design makes it difficult to mix 8- and 16-bit RAM or ROM within the same 128K block of upper memory.
EISA (Extended Industry Standard Architecture): An evolution of ISA and (theoretically) backward
compatible with it, including the speed (8 MHz), so the increased data throughput is mainly due to the bus
doubling in size, but you must use EISA expansion cards. One advantage of EISA (and Micro Channel) is the
ease of setting up expansion cards: plug them in and run the configuration software, which will automatically
detect their settings.
MCA (Micro Channel Architecture): It is incompatible with anything else. It comes in two versions, 16-bit
and 32-bit, and, in practical terms, is capable of transferring around 20 mbps.
Local Bus: The local bus is one more directly suited to the CPU; it's next door (hence local), has the same
bandwidth and runs at the same speed, so the bottleneck is less (ISA was local in the early days). Data is
therefore moved along the bus at processor speeds. There are two varieties:
VL-Bus, a 32-bit bus which allows bus mastering and uses two cycles to transfer a 32-bit word,
peaking at 66 Mb/sec.
PCI, which is a mezzanine bus, divorced from the CPU, giving it some independence and the ability
to cope with more devices, so it's more suited to cross-platform work. It is time multiplexed,
meaning that address and data lines share connections.
MEMORY HIERARCHY
Memory hierarchy is one of the most important concepts in computer memory, as it helps in optimizing
the memory available in the computer. There are multiple levels present in the memory, each one having a
different size, a different cost, etc.
Internal Memory or Primary Memory: This comprises main memory, cache memory and CPU registers,
and is directly accessible by the processor.
1. Registers: Registers are small, high-speed memory units located in the CPU. They are used to store the
most frequently used data and instructions. Registers have the fastest access time and the smallest storage
capacity, typically ranging from 16 to 64 bits.
2. Cache Memory: Cache memory is a small, fast memory unit located close to the CPU. It stores
frequently used data and instructions that have been recently accessed from the main memory. Cache
memory is designed to minimize the time it takes to access data by providing the CPU with quick access to
frequently used data.
3. Main Memory: Main memory, also known as RAM (Random Access Memory), is the primary memory
of a computer system. It has a larger storage capacity than cache memory, but it is slower. Main memory is
used to store data and instructions that are currently in use by the CPU.
Static RAM: Static RAM stores binary information in flip-flops, and the information remains valid as
long as power is supplied. It has a faster access time and is used in implementing cache memory.
Dynamic RAM: It stores the binary information as a charge on the capacitor. It requires refreshing
circuitry to maintain the charge on the capacitors after a few milliseconds. It contains more memory
cells per unit area as compared to SRAM.
4. Secondary Storage: Secondary storage, such as hard disk drives (HDD) and solid-state drives (SSD), is a
non-volatile memory unit that has a larger storage capacity than main memory. It is used to store data and
instructions that are not currently in use by the CPU. Secondary storage has the slowest access time and is
typically the least expensive type of memory in the memory hierarchy.
5. Magnetic Disk: Magnetic disks are circular plates, fabricated from metal or plastic and coated with a
magnetizable material. Magnetic disks work at high speed inside the computer and are frequently used.
6. Magnetic Tape: Magnetic Tape is simply a magnetic recording device that is covered with a plastic film.
It is generally used for the backup of data. In the case of a magnetic tape, the access time for a computer is a
little slower and therefore, it requires some amount of time for accessing the strip.
Capacity: It is the global volume of information the memory can store. As we move from top to
bottom in the Hierarchy, the capacity increases.
Access Time: It is the time interval between the read/write request and the availability of the data.
As we move from top to bottom in the Hierarchy, the access time increases.
Performance: Before the Memory Hierarchy design was adopted, the large difference in access
time between the CPU registers and Main Memory created a widening speed gap. This resulted in
lower system performance, and an enhancement was required. That enhancement was the Memory
Hierarchy design, which increases the performance of the system. One of the most significant ways
to increase system performance is minimizing how far down the memory hierarchy one has to go
to manipulate data.
Cost Per Bit: As we move from bottom to top in the Hierarchy, the cost per bit increases i.e. Internal
Memory is costlier than External Memory.
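The access-time trade-off above can be made concrete with a short calculation. The sketch below computes the effective (average) access time of a two-level hierarchy; the 2 ns cache, 100 ns main memory, and 95% hit ratio are illustrative assumptions, not figures taken from this text.

```python
# Effective access time of a two-level hierarchy: a hit is served by the
# fast cache, a miss falls through to the slower main memory.
def effective_access_time(hit_ratio, t_cache, t_memory):
    return hit_ratio * t_cache + (1 - hit_ratio) * t_memory

# Assumed illustrative numbers: 2 ns cache, 100 ns main memory, 95% hits.
print(effective_access_time(0.95, 2, 100))  # about 6.9 ns on average
```

Even though main memory is 50 times slower than the cache in this sketch, the average access time stays close to the cache's, which is exactly the benefit the hierarchy is designed to deliver.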
It helps reduce access bottlenecks and manage the memory in a better way.
It distributes data appropriately across the computer system.
It saves the user cost and time.
According to the memory Hierarchy, the system-supported memory standards are defined below:
Level            | 1                 | 2               | 3                       | 4
Name             | Register          | Cache           | Main Memory             | Secondary Memory
Size             | < 1 KB            | < 16 MB         | < 16 GB                 | > 100 GB
Implementation   | Multi-port memory | On-chip SRAM    | DRAM (capacitor memory) | Magnetic
Access Time      | 0.25 to 0.5 ns    | 0.5 to 25 ns    | 80 to 250 ns            | about 5,000,000 ns
Bandwidth (MB/s) | 20,000 to 100,000 | 5,000 to 15,000 | 1,000 to 5,000          | 20 to 150
Managed by       | Compiler          | Hardware        | Operating System        | Operating System
Backed by        | Cache             | Main Memory     | Secondary Memory        | -
COMPUTER REGISTERS
Registers are a type of computer memory used to quickly accept, store, and transfer data and instructions
that are being used immediately by the CPU. The registers used by the CPU are often termed processor
registers.
A processor register may hold an instruction, a storage address, or any data (such as bit sequence or
individual characters). The computer needs processor registers for manipulating data and a register for
holding a memory address. The register holding the memory location is used to calculate the address of the
next instruction after the execution of the current instruction is completed.
Following is the list of some of the most common registers used in a basic computer:
Register Symbol Number of bits Function
Data register DR 16 Holds memory operand
Address register AR 12 Holds address for the memory
Accumulator AC 16 Processor register
Instruction register IR 16 Holds instruction code
Program counter PC 12 Holds address of the instruction
Temporary register TR 16 Holds temporary data
Input register INPR 8 Carries input character
Output register OUTR 8 Carries output character
The following image shows the register and memory configuration for a basic computer.
VIRTUAL MEMORY
Virtual Memory is a storage allocation scheme in which secondary memory can be addressed as though it
were part of the main memory. The addresses a program may use to reference memory are distinguished
from the addresses the memory system uses to identify physical storage sites, and program-generated
addresses are translated automatically into the corresponding machine addresses.
The size of virtual storage is limited by the addressing scheme of the computer system and the amount of
secondary memory available, not by the actual number of main storage locations.
It is a technique that is implemented using both hardware and software. It maps memory addresses used by a
program, called virtual addresses, into physical addresses in computer memory.
1. All memory references within a process are logical addresses that are dynamically translated
into physical addresses at run time. This means that a process can be swapped in and out of the main
memory such that it occupies different places in the main memory at different times during the
course of its execution.
2. A process may be broken into a number of pieces, and these pieces need not be contiguously located
in the main memory during execution. The combination of dynamic run-time address translation and
the use of a page or segment table permits this.
Demand Paging
The process of loading a page into memory on demand (whenever a page fault occurs) is known as
demand paging. The process includes the following steps:
If the CPU tries to refer to a page that is currently not available in the main memory, it generates
an interrupt indicating a memory access fault.
The OS puts the interrupted process in a blocking state. For the execution to proceed the OS must
bring the required page into the memory.
The OS will locate the required page in secondary storage (the backing store).
The required page will be brought from secondary storage into a frame in physical memory. Page
replacement algorithms are used to decide which resident page to replace when no free frame is
available.
The page table will be updated accordingly.
The signal will be sent to the CPU to continue the program execution and it will place the process
back into the ready state.
Hence whenever a page fault occurs these steps are followed by the operating system and the required page
is brought into memory.
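The steps above can be sketched in a few lines of Python. The sketch below assumes a tiny physical memory of two frames and FIFO page replacement; the backing store stands in for secondary storage, and all names and sizes are hypothetical.

```python
# A toy sketch of demand paging: a 2-frame physical memory with FIFO
# replacement; the backing_store dict stands in for secondary storage.
from collections import OrderedDict

FRAMES = 2
memory = OrderedDict()        # resident pages: page number -> contents

def access(page, backing_store):
    if page not in memory:                  # page fault: page not resident
        if len(memory) >= FRAMES:           # no free frame available:
            memory.popitem(last=False)      # evict the oldest page (FIFO)
        memory[page] = backing_store[page]  # bring the page into memory
    return memory[page]                     # execution then continues

backing = {0: "A", 1: "B", 2: "C"}          # pages held in secondary storage
print(access(0, backing), access(1, backing), access(2, backing))  # A B C
print(list(memory))  # [1, 2]: page 0 was evicted to make room for page 2
```

The third access triggers the replacement decision described above: with both frames occupied, the oldest resident page is evicted before the new page is brought in.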
More processes may be maintained in the main memory: Because we are going to load only some
of the pages of any particular process, there is room for more processes. This leads to more efficient
utilization of the processor because it is more likely that at least one of the more numerous processes
will be in the ready state at any particular time.
A process may be larger than all of the main memory: One of the most fundamental restrictions
in programming is lifted. A process larger than the main memory can be executed because of
demand paging. The OS itself loads pages of a process in the main memory as required.
It allows greater multiprogramming levels by using less of the available (primary) memory for each
process.
It provides a virtual address space that can be far larger than the available main memory.
It makes it possible to run more applications at once.
Users are spared from having to add memory modules when RAM space runs out, and applications
are liberated from shared memory management.
Execution is faster when only a portion of a program is required, since less of it must be loaded.
Memory isolation has increased security.
It makes it possible for several larger applications to run at once.
Memory allocation is comparatively cheap.
It avoids external fragmentation.
It is efficient to manage logical partition workloads using the CPU.
Automatic data movement is possible.
It can slow down the system performance, as data needs to be constantly transferred between the
physical memory and the hard disk.
It can increase the risk of data loss or corruption, as data can be lost if the hard disk fails or if there is
a power outage while data is being transferred to or from the hard disk.
It can increase the complexity of the memory management system, as the operating system needs to
manage both physical and virtual memory.
Swapping
Swapping temporarily moves a process out of main memory to a backing store and later brings it back
into memory for continued execution.
MEMORY MANAGEMENT TECHNIQUES
Memory management techniques fall into two broad classes: Contiguous memory management schemes
and Non-Contiguous memory management schemes.
The main memory is divided into two main portions: one for the operating system and the other for the
user program. Contiguous memory allocation can be implemented by dividing the memory into
fixed-sized partitions.
In a Contiguous memory management scheme, each program occupies a single contiguous block of storage
locations, i.e., a set of memory locations with consecutive addresses.
The Single contiguous memory management scheme is the simplest memory management scheme used in
the earliest generation of computer systems. In this scheme, the main memory is divided into two contiguous
areas or partitions. The operating systems reside permanently in one partition, generally at the lower
memory, and the user process is loaded into the other partition.
Advantages of Single contiguous memory management schemes:
Simple to implement.
Easy to manage and design.
In a Single contiguous memory management scheme, once a process is loaded, it is given the full
processor's time, and no other process will interrupt it.
Disadvantages of Single contiguous memory management schemes:
Wastage of memory space due to unused memory as the process is unlikely to use all the
available memory space.
The CPU remains idle, waiting for the disk to load the binary image into the main memory.
A program cannot be executed if it is too large to fit in the entire available main memory space.
It does not support multiprogramming, i.e., it cannot handle multiple programs simultaneously.
Multiple Partitioning
The Single contiguous memory management scheme is inefficient, as it limits computers to executing only
one program at a time, resulting in wasted memory space and CPU time. The problem of inefficient CPU
use can be overcome using multiprogramming, which allows more than one program to run concurrently. To switch
between two processes, the operating systems need to load both processes into the main memory. The
operating system needs to divide the available main memory into multiple parts to load multiple processes
into the main memory. Thus multiple processes can reside in the main memory simultaneously.
Fixed Partitioning
Dynamic Partitioning
Fixed Partitioning: The main memory is divided into several fixed-sized partitions in a fixed partition
memory management scheme or static partitioning. These partitions can be of the same size or different
sizes. Each partition can hold a single process. The number of partitions determines the degree of
multiprogramming, i.e., the maximum number of processes in memory. These partitions are made at the
time of system generation and remain fixed after that.
Advantages of Fixed Partitioning memory management schemes:
Simple to implement.
Easy to manage and design.
Disadvantages of Fixed Partitioning memory management schemes:
Internal fragmentation: a process rarely fills its partition exactly, so the unused space inside the
partition is wasted.
A process larger than the largest partition cannot be loaded.
Dynamic Partitioning: In dynamic partitioning, partitions are created at load time to exactly fit the size
of each arriving process.
Advantages of Dynamic Partitioning memory management schemes:
Simple to implement.
Easy to manage and design.
Disadvantages of Dynamic Partitioning memory management schemes:
External fragmentation: as processes are loaded and removed, the free memory is broken into small
scattered holes.
In a Non-Contiguous memory management scheme, the program is divided into different blocks and loaded
at different portions of the memory that need not necessarily be adjacent to one another. This scheme can be
classified depending upon the size of blocks and whether the blocks reside in the main memory or not.
Paging: Paging is a technique that eliminates the requirement of contiguous allocation of main memory. In
it, the main memory is divided into fixed-size blocks of physical memory called frames. The size of a
frame is kept the same as that of a page to make full use of the main memory and avoid external
fragmentation.
Advantages of paging:
It eliminates external fragmentation, since any free frame can be allocated to any page.
Memory allocation is simple, because all pages and frames are the same size.
Paging
Paging is a memory management technique in which process address space is broken into blocks of the
same size called pages (size is power of 2, between 512 bytes and 8192 bytes). The size of the process is
measured in the number of pages. Similarly, main memory is divided into small fixed-sized blocks of
(physical) memory called frames and the size of a frame is kept the same as that of a page to have optimum
utilization of the main memory and to avoid external fragmentation.
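The page-to-frame mapping described above can be sketched directly. The page size and page-table contents below are hypothetical, chosen only to make the arithmetic visible.

```python
# A minimal sketch of page-to-frame address translation.
PAGE_SIZE = 4096                  # a power of 2, as the text requires

page_table = {0: 5, 1: 9, 2: 3}   # page number -> frame number

def translate(virtual_addr):
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table[page]      # a missing entry would be a page fault
    return frame * PAGE_SIZE + offset

print(translate(4100))  # page 1, offset 4, frame 9: address 36868
```

Because the page size is a power of two, the split into page number and offset is just a shift and a mask in hardware; `divmod` expresses the same split in Python.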
Segmentation
Segmentation is a memory management technique in which each job is divided into several segments of
different sizes, one for each module that contains pieces performing related functions. Each segment is
actually a different logical address space of the program. When a process is to be executed, its
corresponding segments are loaded into non-contiguous memory, though every segment is loaded into a
contiguous block of available memory. Segmentation memory management works much like paging, but
here segments are of variable length, whereas in paging pages are of fixed size.
A program segment contains the program's main function, utility functions, data structures, and so on. The
operating system maintains a segment map table for every process and a list of free memory blocks along
with segment numbers, their size and corresponding memory locations in main memory. For each segment,
the table stores the starting address of the segment and the length of the segment. A reference to a memory
location includes a value that identifies a segment and an offset.
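The segment map table lookup and length check described above can be sketched as follows; the base addresses and segment lengths are hypothetical.

```python
# A minimal sketch of a segment map table lookup with a limit check.
segment_table = {0: (1000, 400),   # segment -> (starting address, length)
                 1: (6300, 300)}

def translate(segment, offset):
    base, limit = segment_table[segment]
    if offset >= limit:            # offset beyond the segment's length
        raise MemoryError("segment limit violation")
    return base + offset

print(translate(1, 50))  # 6300 + 50 = 6350
```

The limit check is what gives segmentation its protection property: a reference past the end of a segment is trapped instead of silently reading another segment's memory.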
Paged segmentation and segmented paging are two methods that operating systems use to manage computer
memory allocation and address translation.
Paged segmentation is a hybrid memory management scheme that combines segmentation and paging.
Memory is divided into segments of different sizes, which are further divided into pages of fixed sizes. A
segment table tracks segment location and size, while a page table tracks page location and corresponding
physical addresses. When a program requests memory, the operating system allocates a segment of the
required size and divides it into pages.
Modern operating systems manage the memory of a computer system using a technique called paged
segmentation. It segments a computer's main memory into fixed-size pages and manages the mapping
between logical addresses and physical locations using tables.
Depending on the hardware platform and operating system, the page size can vary, but it is
commonly 4 KB or 8 KB.
The operating system keeps track of the mapping between logical addresses and physical addresses
via the page table. When a program attempts to access a page that is not currently in main
memory, the operating system must fetch the page from secondary storage (such as the hard
disk) and load it into main memory. This procedure is called page switching, or paging.
Paged segmentation makes possible the use of virtual memory, a method that lets programs access
more memory than is physically available in the system. When the main memory is full
and a new page needs to be loaded, the operating system must first make room for it by evicting a page
from the main memory.
Based on several factors, including the page's size, age, and frequency of access, the page
replacement algorithm chooses which page to remove.
Overall, paged segmentation is an effective method for controlling a computer system's memory
since it allows for the efficient use of memory and the usage of virtual memory.
Segmented Paging
A memory management strategy that combines segmentation with paging is called segmented paging. The
logical address space is first divided into segments of varying sizes, and each segment is then divided
into pages of a fixed size. A segment table tracks each segment's location and size, while a per-segment
page table maps that segment's pages to physical addresses. When a program demands memory, the
operating system allocates a segment of the requisite size and breaks it into pages.
It assigns each page to a corresponding page frame in physical memory and divides a process's
logical address space into segments of varying widths.
To translate logical addresses into physical addresses and to offer granular memory security against
unauthorized access, page tables are used.
By allocating memory in short, variable-sized segments, segmented paging makes optimum use of
the available memory and prevents fragmentation, enabling effective memory management.
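The two-level translation described above can be sketched like this: the segment number selects a per-segment page table, which then maps the page to a frame. All numbers below are invented for illustration.

```python
# A sketch of segmented paging's two-level lookup: segment -> page table,
# then page -> frame. Tables and sizes are hypothetical.
PAGE_SIZE = 256

segment_tables = {0: {0: 7, 1: 2},  # segment 0's page table: page -> frame
                  1: {0: 4}}        # segment 1's page table

def translate(segment, offset):
    page, off = divmod(offset, PAGE_SIZE)
    frame = segment_tables[segment][page]
    return frame * PAGE_SIZE + off

print(translate(0, 300))  # page 1, offset 44, frame 2: address 556
```

Splitting the tables per segment keeps each page table small and lets protection be enforced at segment granularity while allocation stays page-sized.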
PARALLEL COMPUTING
It is the use of multiple processing elements simultaneously for solving any problem. Problems are broken
down into instructions and are solved concurrently as each resource that has been applied to work is working
at the same time.
1. It saves time and money, as many resources working together reduce the time and cut potential
costs.
2. It can take advantage of non-local resources when the local resources are finite.
3. Serial computing 'wastes' the potential computing power, whereas parallel computing makes better
use of the hardware.
Types of Parallelism:
1. Bit-level parallelism – It is the form of parallel computing which is based on the increasing
processor’s size. It reduces the number of instructions that the system must execute in order to
perform a task on large-sized data.
2. Instruction-level parallelism – Without it, a processor can issue at most one instruction per
clock cycle. Instructions can be re-ordered and grouped so that they are later executed
concurrently without affecting the result of the program. This is called instruction-level parallelism.
3. Task Parallelism – Task parallelism employs the decomposition of a task into subtasks and then
allocating each of the subtasks for execution. The processors perform the execution of sub-tasks
concurrently.
4. Data-level parallelism (DLP) – Instructions from a single stream operate concurrently on several
data items. It is limited by non-regular data manipulation patterns and by memory bandwidth.
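Task parallelism (item 3 above) can be sketched in a few lines with Python's thread pool: one job is decomposed into two independent subtasks that are submitted for concurrent execution. The subtask functions and the sample text are invented for illustration.

```python
# Task parallelism: two independent subtasks of one job run concurrently.
from concurrent.futures import ThreadPoolExecutor

def word_count(text):      # subtask 1
    return len(text.split())

def char_count(text):      # subtask 2
    return len(text)

text = "parallel computing divides work among processing elements"
with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(word_count, text)   # both subtasks are in flight
    f2 = pool.submit(char_count, text)   # at the same time
print(f1.result(), f2.result())  # 7 57
```

Each subtask is independent of the other, which is exactly the decomposition task parallelism relies on; data parallelism would instead apply the same operation to many pieces of the data.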
The whole real world runs in a dynamic nature, i.e., many things happen at the same time but at
different places concurrently. This data is far too large to manage serially.
Real-world data needs dynamic simulation and modeling, and parallel computing is the key to
achieving this.
Complex, large datasets and their management can be organized only by using parallel
computing's approach.
It ensures the effective utilization of resources. The hardware is guaranteed to be used effectively,
whereas in serial computation only some part of the hardware is used and the rest is rendered idle.
Also, it is impractical to implement real-time systems using serial computing.
FAULT-TOLERANT COMPUTING
Fault tolerance is the ability of a system to continue operating correctly in spite of the occurrence of
failures in the system.
Any system has two major components – Hardware and Software. Fault may occur in either of it. So there
are separate techniques for fault-tolerance in both hardware and software.
1. BIST – BIST stands for Built-In Self-Test. The system carries out a test of itself repeatedly after a
certain period of time; this is the BIST technique for hardware fault tolerance. When the system
detects a fault, it switches out the faulty component and switches in its redundant copy. The system
basically reconfigures itself when a fault occurs.
2. TMR – TMR is Triple Modular Redundancy. Three redundant copies of a critical component are
generated, and all three copies are run concurrently. The results of all redundant copies are voted
on, and the majority result is selected. TMR can tolerate the occurrence of a single fault at a time.
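The majority vote at the heart of TMR can be sketched as follows, assuming the three redundant copies have already produced their results.

```python
# A sketch of TMR's majority vote over three redundant results.
from collections import Counter

def tmr_vote(r1, r2, r3):
    value, count = Counter([r1, r2, r3]).most_common(1)[0]
    if count < 2:                          # all three results disagree:
        raise RuntimeError("no majority")  # more than one fault, unmaskable
    return value

print(tmr_vote(42, 42, 7))  # 42: the single faulty result (7) is outvoted
```

This is why TMR masks exactly one fault: with two faulty copies there may be no majority left to select.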
1. N-version Programming – Several independent versions of the software are developed by different
teams from the same specification and run on the same input; a fault is suspected when the
result obtained differs between versions. The idea of n-version programming is basically to
catch all errors during development only.
2. Recovery Blocks – The recovery blocks technique is like n-version programming, but the
redundant copies are generated using different algorithms. In recovery blocks, the redundant
copies are not run concurrently; they are run one by one, each followed by an acceptance test.
The recovery block technique can only be used where the task deadlines are greater than the task
computation time.
3. Check-pointing and Rollback Recovery – This technique differs from the above two techniques of
software fault tolerance. The system state is saved at checkpoints, and the computation is checked
each time it is performed; on failure, the system rolls back to the last saved checkpoint. This
technique is particularly useful against processor failure or data corruption.
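The check-point-and-rollback idea above can be illustrated in miniature: the state is saved before a computation, and restored when a (simulated) fault is detected. The state contents and fault here are purely illustrative.

```python
# A toy illustration of check-pointing and rollback recovery.
import copy

state = {"balance": 100}
checkpoint = copy.deepcopy(state)      # save a consistent checkpoint

try:
    state["balance"] -= 30             # perform some computation...
    raise RuntimeError("corruption")   # ...then a simulated fault occurs
except RuntimeError:
    state = copy.deepcopy(checkpoint)  # roll back to the checkpoint

print(state)  # {'balance': 100}
```

A real system would write checkpoints to stable storage so that the saved state itself survives a processor failure.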
PIPELINING
Pipelining refers to a technique of decomposing a sequential process into sub-operations, with each
sub-operation being executed in a dedicated segment that operates concurrently with all other segments.
The most important characteristic of a pipeline technique is that several computations can be in progress in
distinct segments at the same time. The overlapping of computation is made possible by associating a
register with each segment in the pipeline. The registers provide isolation between each segment so that each
can operate on distinct data simultaneously.
The structure of a pipeline organization can be represented simply by including an input register for each
segment followed by a combinational circuit.
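The segment-register organization described above can be simulated with a toy three-segment pipeline. Each variable below plays the role of one segment register; on every clock tick the partial results shift forward one segment, so up to three items are in progress at once. The stage operations (fetch, double, add one) are arbitrary illustrative sub-operations.

```python
# A toy simulation of a 3-segment pipeline with per-segment registers.
def pipeline(items):
    items = list(items)        # copy: the input stream is consumed
    s1 = s2 = s3 = None        # the three segment registers
    done = []
    while items or s1 is not None or s2 is not None or s3 is not None:
        if s3 is not None:                        # segment 3's output retires
            done.append(s3)
        s3 = s2 + 1 if s2 is not None else None   # segment 3: add 1
        s2 = s1 * 2 if s1 is not None else None   # segment 2: double
        s1 = items.pop(0) if items else None      # segment 1: fetch next item
    return done

print(pipeline([1, 2, 3]))  # [3, 5, 7]: each item computed as 2*x + 1
```

Note the shift order inside the loop: each register is read before it is overwritten, which is exactly the isolation the pipeline registers provide so that all segments can work on distinct data in the same clock cycle.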
Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction stream as well.
Most digital computers with complex instructions require an instruction pipeline to carry out operations
such as fetching, decoding, and executing instructions.
In general, the computer needs to process each instruction with the following sequence of steps.