Computer Organisation and Architecture (1)
weightage : 9 - 12 marks
(i) Introduction
(ii) Components of computer
(iii) types of registers
(iv) instruction cycle
(v) memory concept
(vi) byte & word addressable
(vii) system bus
(viii)byte ordering
chapter - 4 (pipelining)
(i) Memory concept
(ii) types of memory organization
(iii) cache memory
- cache organization
- mapping technique
- replacement algorithm
- updating technique and multi level cache
topic - 1 (introduction)
Monday, July 8, 2024 4:45 AM
Introduction of COA
Note :
The first general-purpose electronic digital computer was ENIAC (Electronic Numerical Integrator and Computer); its construction began in 1943 and it was completed in 1945.
COA
Note :
Processors of the Intel x86 family share the same architecture, but their organisation is different (how the features are implemented differs).
intel x86
80186
80286 family
80386
80486
80586
2. Memory
- primary/main memory
- secondary/auxiliary memory
3. I/O
- input device
- output device
Register : stores data temporarily. Registers are present inside the CPU and are the fastest storage. They are made with flip-flops; a flip-flop is a 1-bit storage device.
types of registers
Special purpose register types :
Memory buffer register (MBR/DR/MDR) : holds the instructions or data.
Why MBR/DR/MDR (data)? Because it is connected to the data lines of the system bus.
8. Accumulator (AC)
Program counter (PC) : once an instruction has been fetched (fetch cycle executed), the PC holds the starting address of the next instruction.
1. Memory read : address in MAR/AR, data in MBR
2. Memory write : address in MAR/AR, data in MBR
Instruction cycle
(OR)
(OR)
1. Fetch cycle : fetches (brings) the instruction from main memory to the CPU. It works like a postman: it does not care what the instruction is.
Memory -> CPU (IR)
The instruction is stored in memory and executed in the CPU. The fetched instruction is loaded into the IR, and at the end of the fetch cycle the program counter is incremented.
Example : the instruction is stored at memory location 1000.
1000 I1 : Load [6000] (load : memory read; 6000 is an address)
AC <- M[6000] (put the data of memory location 6000 into the accumulator)
2. Execute cycle : the objective of the execute cycle is to execute (process) the fetched instruction. It decodes and analyses the instruction: what the OPCODE is, how many operands there are, operand address calculation, operand fetch, processing and result storage.
Example : I1 : Load [6000]
Operand fetch : the data at [6000] is brought from memory to the CPU with the help of the system buses.
AC <- M[6000]
Step 1 : At the beginning of each instruction cycle the processor fetches an instruction from memory.
Step 2 : The program counter (PC) holds the address of the instruction to be fetched next.
Step 3 : The processor increments the PC after each instruction fetch so that it will fetch the next instruction in sequence.
Step 4 : The fetched instruction is loaded into the instruction register (IR).
Step 5 : The processor interprets the instruction and performs the required action.
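The five steps above can be sketched as a small simulation. This is only an illustration: the tuple encoding of instructions, the ADD and HALT instructions and the data values are assumptions made for the example, not part of the notes.

```python
# Minimal fetch-execute sketch. The instruction encoding (opcode, address)
# and the HALT opcode are assumptions made for this example only.
memory = {
    1000: ("LOAD", 6000),   # I1: AC <- M[6000]
    1001: ("ADD", 6001),    # I2: AC <- AC + M[6001]
    1002: ("HALT", None),   # stop the simulation
    6000: 25,               # data
    6001: 17,               # data
}

def run(memory, start):
    pc, ac = start, 0
    while True:
        ir = memory[pc]           # Steps 1, 2 and 4: fetch from M[PC] into the IR
        pc += 1                   # Step 3: PC now points at the next instruction
        opcode, address = ir      # Step 5: decode ...
        if opcode == "LOAD":      # ... and execute (memory read into AC)
            ac = memory[address]
        elif opcode == "ADD":
            ac += memory[address]
        elif opcode == "HALT":
            return ac, pc

ac, pc = run(memory, 1000)
print(ac, pc)
```

Note that the PC is incremented during the fetch, before the instruction is executed, which is why it always holds the address of the next instruction.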
Instruction cycle with interrupt : start -> fetch cycle -> execute cycle -> service the interrupt
When the CPU encounters an interrupt, it pushes the PC (program counter) value onto the stack as the return address, and control transfers to the ISR (interrupt service routine).
Why is the PC value pushed onto the stack?
Because the stack works in LIFO order: the value pushed last is the first one we get back, so the return address saved just before servicing the interrupt is the one restored when the interrupt returns.
I1 : 1000
I2 : 1001
I3 : 1002 (interrupt)
I4 : 1003
When I3 is fetched, the PC changes from 1002 to 1003: the PC moves to the next instruction.
After the execution of I3 we have to service the interrupt. The interrupt routine can be at any address, so the PC will be loaded with the address of the interrupt routine.
Before going to service the interrupt, the CPU saves the address of the next instruction on the stack (1003), so that execution can continue from there afterwards.
After servicing the interrupt, the PC value is popped from the stack and we return to I4.
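The save and restore of the PC described above can be sketched as follows. This is a simplified model under stated assumptions: the ISR address 5000 and the function name are hypothetical, and servicing the interrupt is reduced to a single log entry.

```python
# Sketch of PC handling around an interrupt. The ISR address and the
# way the ISR is "serviced" (one log entry) are illustrative assumptions.
def run_with_interrupt(program, interrupt_after, isr_address):
    stack = []            # stack used to hold return addresses
    saved = []            # record of every PC value pushed (for inspection)
    log = []              # order in which things were executed
    pc = min(program)     # start at the lowest instruction address
    while pc in program:
        current = pc
        pc += 1                       # fetch: PC moves to the next instruction
        log.append(program[current])  # execute the fetched instruction
        if current == interrupt_after:
            stack.append(pc)          # push the return address (next instruction)
            saved.append(pc)
            pc = isr_address          # control transfers to the ISR
            log.append("ISR@%d" % pc) # service the interrupt
            pc = stack.pop()          # return: pop the saved PC
    return log, saved, stack

program = {1000: "I1", 1001: "I2", 1002: "I3", 1003: "I4"}
log, saved, stack = run_with_interrupt(program, interrupt_after=1002, isr_address=5000)
print(log, saved)
```

With the interrupt arriving during I3, the value pushed is 1003, and execution resumes at I4 after the ISR.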
memory cells
Byte-sized cells : 1 cell size = 8 bit, and 8 bit = 1 byte.
Word-sized cells : 1 cell size = 1 word, and 1 word = word size (for example, word size = 32 bit means 1 word = 32 bit).
note:
word size is given in the question
Example : two 32-bit instructions I1 and I2, starting address : 2000
Word-sized cells : I1 is at 2000 and I2 is at 2001.
8-bit cells : I1 occupies 2000 to 2003, I2 occupies 2004 to 2007, and the next instruction starts at 2008 (cells 2008 to 2011).
When I2 is executing, I2 has already been fetched, so the PC holds the starting address of the next instruction. So the value of the PC will be 2008.
Questions :
stack
- Memory is a storage element in the computer which stores instructions and data.
- Memory is organised into equal parts; each part is called a cell.
- Each cell is identified by a unique number called its address.
byte addressable
1 cell size = 8 bit. Every cell stores 8 bits of instructions and data.
With an 8-bit address, the cells are numbered from 00000000 to 11111111.
Every cell is identified by a unique number, which is called its address.
The operation that needs to be done on a particular cell is indicated by the control signal.
Address : the cell is reached through the address lines.
Data : the data to be read or written is carried on the data lines.
Control line : carries the work (operation) that needs to be performed.
note :
word size = number of data lines = data bus width = data register size (MBR/MDR/DR)
A word is the unit of data for the CPU: the CPU works in word units.
note :
MAR/AR register size = number of address lines = address bus width
ex. 32-bit processor : word size = data lines = data bus = MBR = DR/MDR = ALU = AC = register = 32 bit
ex. 24-bit address : address lines = address bus = MAR = AR = PC = 24 bit
Address line : specifies the capacity of the memory (for example, 0000 is a 4-bit address and 0010 0011 is an 8-bit address)
Data line : specifies the capacity of data (cell size)
or in some places :
Memory = N x M
N : total number of memory cells (memory locations)
M : size of each cell
N = 2^n, so to represent any one of the 2^n cells an n-bit address is required.
(or)
An n-bit address can represent 2^n memory cells.
Layout : the cells are numbered 0, 1, ..., x, ..., 2^n - 1, and each cell holds m bits (total 2^n memory cells).
The address of any cell 'x' is written in n bits (n : address lines).
Memory addresses start from 0, which is why the last address is 2^n - 1.
For example, 0 to 7 is 8 cells.
Example : my memory is 16 byte (1 byte = a group of 8 bits).
n = address (4 bit), m = cell size (8 bit), N = number of memory cells
16 byte = 2^n x m = 2^4 x 8 bit [16 cells]
16 cells need a 4-bit address, and 1 byte means 8 bit, so:
n = 4 bit (address lines)
m = 8 bit (data lines)
N = 2^4 = 16 memory locations/cells, addressed 0000, 0001, 0010, ..., 1110, 1111, each with cell size 8 bit (1 byte).
Example : 4 cells of 8 bit (cell 0, cell 1, cell 2, cell 3)
n = 2 (capacity of memory), address lines.
m = 8 (cell size/cell capacity), data lines.
N = 2^n memory locations/cells
N = 2^2 = 4 memory locations/cells
Summary :
memory = 2^n x m
each cell size = m bit
total 2^n memory cells
To represent any one of the 2^n cells, an n-bit address is required.
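The relation N = 2^n between the number of cells and the number of address bits can be checked with a short sketch. The function names are chosen for this illustration only.

```python
# Sketch of the N = 2^n relation; function names are illustrative.
def address_bits(num_cells):
    """Smallest n such that 2**n >= num_cells."""
    return (num_cells - 1).bit_length()

def capacity_in_bits(n, m):
    """Total capacity of a 2^n x m memory, in bits."""
    return (2 ** n) * m

print(address_bits(16))        # 16 cells of 1 byte
print(address_bits(4))         # 4 cells
print(capacity_in_bits(4, 8))  # 2^4 x 8 bit
```

For the 16-byte memory this gives a 4-bit address and 128 bits (16 bytes) of capacity.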
Questions :
Q.1. Memory : 1K byte
n = 10 (address lines)
m = 8 (data lines)
N = 2^10 = 1024 cells
The cells run from address 0000000000 (cell 0) to 1111111111 (cell 2^10 - 1). Each cell is 8 bit, and the address of any cell 'x' is written in 10 bits.
n = 3 (address lines)
m = 8 (data lines)
N = 2^3 = 8 cells
The cells run from address 000 (0x0, cell 0) to 111 (0x7, cell 2^3 - 1). Each cell is 8 bit, and the address of any cell 'x' is written in 3 bits (n : address lines).
n = 16 (address lines)
m = 8 (data lines)
N = 2^16 = 65536 cells
A 16-bit address is written as 4 hex digits (1 hex digit = 4 bit).
n = 30 (address lines)
m = 8 (data lines)
N = 2^30 = 1G cells
The cells run from address 0000 0000 .... 0000 (0x00000000, cell 0) to 1111 1111 .... 1111 (0x3FFFFFFF, cell 2^30 - 1). Each cell is 8 bit, and the address of any cell 'x' is written in 30 bits (n : address lines).
Hexadecimal symbol : 0x or ( )16
word addressable
cell size = 1 word (instead of 8 bit)
The size of the word is given in the question; it depends on the computer.
1. Byte addressable memory : when the cell size is 8 bit, the memory is byte addressable.
4 x 8 = 2-bit address
16 x 8 = 4-bit address
32 x 8 = 5-bit address
512 x 8 = 9-bit address
1K x 8 = 10-bit address
1M x 8 = 20-bit address
2^n x 8 = n-bit address
2. Word addressable memory : when the cell size is given in words, the memory is word addressable. The cell size must be one word, and it depends on the word length of the processor.
4 x w = 2-bit address
16 x w = 4-bit address
32 x w = 5-bit address
64 x w = 6-bit address
256 x w = 8-bit address
512 x w = 9-bit address
1K x w = 10-bit address
1M x w = 20-bit address
2^n x w = n-bit address
memory
64 KB
- The default data type in the CPU is the word, so operations in the CPU are performed in word format.
- To synchronise the memory data type (byte) with the CPU data type (word), the memory interfacing is adjusted by the designer.
- According to the word length of the CPU, multiple bytes are accessed to bring the data from memory to the CPU. Which byte is picked up first is decided by byte ordering.
If my processor works on 32 bits, it can perform an operation on 32 bits at a time, but in the memory chip the data is stored in 8-bit format, so we have to pick multiple memory cells. Which byte is to be picked is decided by byte ordering (the endian mechanism).
Questions :
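Memory = 4 Gbyte, 1 word = 2 byte, word addressable :
4 Gbyte / 2 byte = 2 Gwords = 2^1 x 2^30 = 2^31 words, so the address is 31 bit.
If the memory is only 128 Mbyte and 1 word = 4 byte :
128 Mbyte = 2^7 x 2^20 x 8 bit = 2^27 byte
We have to group the 128 Mbyte into units of 4 byte (1 word), because the memory is word addressable :
128 Mbyte / 4 byte = 2^25 words, so the address is 25 bit.
note :
cell size 8 bit : byte addressable
cell size a multiple of 8 bit : word addressable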
Q. Consider a 64-bit hypothetical processor which supports 512 Gbyte of memory. The system is enhanced (new design) with a word addressable memory. How many address lines are required in the new system?
512 Gbyte = 2^9 x 2^30 x 8 bit = 2^39 byte
1 word = 64 bit = 8 byte
We have to group the 512 Gbyte into units of 8 byte (1 word), because the memory is word addressable :
512 Gbyte / 8 byte = 2^39 / 2^3 = 2^36 words
So 36 address lines are required.
Q. consider a 64 bit hypothetical processor which support 4Gwords memory. system is enhanced (new design)
with a byte addressable memory. then how many address line are required in the new system?
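Both directions of this conversion (byte addressable to word addressable and back) follow the same arithmetic, sketched here. The function name is hypothetical, and the value computed for the 4 Gword question is derived by this sketch rather than stated in the notes.

```python
# Sketch: number of address lines = log2(number of addressable cells).
# The function name is illustrative; capacities are powers of two.
def address_lines(capacity_bytes, cell_bytes):
    cells = capacity_bytes // cell_bytes   # number of addressable cells
    return (cells - 1).bit_length()        # smallest n with 2**n >= cells

G = 2 ** 30
WORD_BYTES = 64 // 8                       # 64-bit processor: 1 word = 8 byte

# 512 Gbyte made word addressable: 2^39 / 2^3 = 2^36 cells
word_case = address_lines(512 * G, WORD_BYTES)

# 4 Gwords made byte addressable: 2^32 words x 8 byte = 2^35 cells
byte_case = address_lines(4 * G * WORD_BYTES, 1)

print(word_case, byte_case)
```

Going from byte addressable to word addressable removes log2(word size in bytes) address lines; going the other way adds the same number.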
note :
In the processor design, operations are always performed in word format, so when the word length of the CPU/processor is greater than 8 bit, multiple bytes (cells) must be accessed to process the data in parallel.
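Q. How to fill memory (starting from 1000, each instruction is 4 byte; 1 word = 4 byte)?
I1 occupies cells 1000, 1001, 1002 and 1003.
I2 occupies cells 1004, 1005, 1006 and 1007.
The diagram can be drawn with 1000 at the top and 1007 at the bottom, or with 1007 at the top and 1000 at the bottom; in both drawings I1 sits in 1000 to 1003 and I2 in 1004 to 1007.
note :
Memory is always filled in increasing order of address.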
system bus
The system bus is a collection of lines/wires which provide communication (the transmission medium) between the components of the computer [CPU, I/O, memory, etc].
The system bus consists of :
- address lines
- data lines
- control lines
note :
One line carries 1 bit of data at any point of time.
note :
Address lines are uni-directional (from the CPU to memory and I/O, or other components of the system).
note :
Data lines are bi-directional (from the CPU to memory and I/O, and vice versa).
note :
Control lines are individually uni-directional: each control signal travels on its own wire in one direction. For example, the read signal goes from the CPU to memory and I/O, and the interrupt signal goes from I/O to the CPU.
2 types of architecture
computer : a computational device used to process data under the control of a program.
input -> computer -> output
A program consists of instructions and data.
instruction : a binary sequence which is designed inside the processor to perform some task.
byte ordering
MSB : most significant portion of the data [big end of the data]
LSB : least significant portion of the data [little end of the data]
An instruction names the opcode (mnemonic, the type of operation), the destination register and the source (type of address).
Example (16 bit) : put the data of memory locations M[6000] and M[6001] into Ax.
Data (binary) : 0001 0001 0010 0001
Hexa : 1 1 2 1
4 hex digits : 4 x 4 = 16 bit.
Example (32 bit) : put the data of memory locations M[6000], M[6001], M[6002] and M[6003] into r1.
Data (binary) : 0001 0001 0010 0001 0101 0001 0101 0110
Hexa : 1 1 2 1 5 1 5 6
8 hex digits : 8 x 4 = 32 bit.
note :
- memory chip is byte addressable, so in the memory chip data is stored byte wise.
- in the processor operation are performed on word format.
when word size is greater than 8bit [1byte] (i.e
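Q. Consider a 32-bit processor with memory = 128 Gword. Find the number of address bits if the memory is byte addressable.
128 Gword = 2^7 x 2^30 = 2^37 words (word addressable : address = 37 bit)
1 word = 32 bit = 4 byte, so memory = 2^37 x 4 byte = 512 GB
512 GB = 2^9 x 2^30 x 8 bit = 2^39 byte
Byte addressable : address = 39 bit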
Conversions :
2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
128 64 32 16 8 4 2 1
write 11 in 8 bit :
0 0 0 0 1 0 1 1 = 0x0B
Check by doubling, reading 1011 from the left (double the running value, then add the next bit) :
1+1=2
2+2=4
4+1=5
5+5=10
10+1=11
BINARY : HEX
0000 : 0
0001 : 1
0010 : 2
0011 : 3
0100 : 4
0101 : 5
0110 : 6
0111 : 7
1000 : 8
1001 : 9
1010 : 10 = A
1011 : 11 = B
1100 : 12 = C
1101 : 13 = D
1110 : 14 = E
1111 : 15 = F
Why 4 digits? Because in hexadecimal 1 hex value = 4 bit (hex = 0-9 and A to F).
Q. Consider 32-bit data 0x (1A 2B 3C 4D) stored at memory location 2000 onwards. What is the result?
solution :
Data : 0x (1A 2B 3C 4D)
1A 2B 3C 4D
0001 1010 0010 1011 0011 1100 0100 1101
8bit 8bit 8bit 8bit
The memory is byte addressable, so each cell holds 8 bits (1 byte).
Other example data used in this topic : 0x 01 02 03 04 and 0x 9D 8F CD 36
types :
1. Little endian : the lower address contains the lower byte and the higher address contains the higher byte.
(OR)
the word is stored from right to left
(OR)
little endian starts with the little end and stores it at the lower memory address
(OR)
the little end of the word is stored at the lowest memory address
The memory is byte addressable, so each cell holds 8 bits.
data : 0x 9D 8F CD 36 (little end : 36)
data : 0x 01 02 03 04
1000 : 04
1001 : 03
1002 : 02
1003 : 01
data : 0x EF 11 CD AB
1000 : AB (1010 1011)
1001 : CD (1100 1101)
1002 : 11 (0001 0001)
1003 : EF (1110 1111)
data : 0x 11 AB 22 CD 33 EF 44 56
2000 : 56
2001 : 44
2002 : EF
2003 : 33
2004 : CD
2005 : 22
2006 : AB
2007 : 11
2. Big endian : the lower memory address contains the higher byte and the higher memory address contains the lower byte.
(OR)
the word is stored from left to right
(OR)
start from the MSB of the word and store it at the lower memory address
(OR)
the big end of the word is stored at the lowest memory address
The memory is byte addressable, so each cell holds 8 bits.
data : 0x 9D 8F CD 36 (big end : 9D)
data : 0x 01 02 03 04
1000 : 01
1001 : 02
1002 : 03
1003 : 04
data : 0x AB CD 11 EF, stored at locations starting from 1000 in big endian format
1000 : AB
1001 : CD
1002 : 11
1003 : EF
data : 0x 11 AB 22 CD 33 EF 44 56
2000 : 11
2001 : AB
2002 : 22
2003 : CD
2004 : 33
2005 : EF
2006 : 44
2007 : 56
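The two byte orders can be reproduced with Python's built-in `int.to_bytes`, which takes the byte order as a parameter. The helper name `store` and the dictionary layout are assumptions made for this sketch.

```python
# Sketch: lay out a multibyte value in byte-addressable memory.
# store() is an illustrative helper; int.to_bytes is the standard method.
def store(value, size, start, byteorder):
    """Return {address: hex byte} for `value` stored from address `start`."""
    data = value.to_bytes(size, byteorder)   # byteorder is "little" or "big"
    return {start + i: format(b, "02X") for i, b in enumerate(data)}

little = store(0x01020304, 4, 1000, "little")
big = store(0xABCD11EF, 4, 1000, "big")
print(little)
print(big)
```

In the little endian layout the little end (04) lands at the lowest address 1000; in the big endian layout the big end (AB) lands there.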
TYPE : 2
The data is already stored in main memory and we have to write the value in little endian and big endian format.
little endian : start from the little end; the little end is stored at the lower address.
big endian : start from the big end; the big end is stored at the lower address.
Memory : 2000 : 04, 2001 : 03, 2002 : 02, 2003 : 01
Little endian : 0x 01 02 03 04
Big endian : 0x 04 03 02 01
Memory : 2000 : AB, 2001 : 24, 2002 : 09, 2003 : BA
Little endian : 0x BA 09 24 AB
Big endian : 0x AB 24 09 BA
Memory : 2050 : 21, 2051 : 18, 2052 : 17, 2053 : 16
Little endian : 0x 16 17 18 21
Big endian : 0x 21 18 17 16
Two ways of storing 0x 1A 2B 3C 4D at addresses 80 to 83 :
80 : 4D, 81 : 3C, 82 : 2B, 83 : 1A (little end at the lowest address)
80 : 1A, 81 : 2B, 82 : 3C, 83 : 4D (big end at the lowest address)
TYPE : 3
How to write the value in little endian and big endian by looking at the memory layout, step by step. Read from the lowest address downwards for big endian, and from the highest address upwards for little endian.
Layout 1 :
80 : 1A
81 : 2B
82 : 3C
83 : 4D
big endian : 0x 1A 2B 3C 4D
Layout 2 :
80 : 4D
81 : 3C
82 : 2B
83 : 1A
big endian : 0x 4D 3C 2B 1A
Practise sheet :
1. Endianness (little endian and big endian) is decided at processor design time. It means endianness is a property of the CPU (processor), not a property of main memory.
2. Endianness does affect (does apply to) the byte ordering of an individual multibyte data value.
3. Endianness does not affect the ordering of the data items inside structures like strings, arrays and structs; BUT within each individual data item, if it is multibyte, the endianness (byte ordering) concept is applied.
- If a data item is 2 byte (multibyte), then endianness applies to each item.
- If a data item is 1 byte, then endianness does not apply.
Example : data 0x 01 02 03 04 at addresses 1000 to 1003
big endian : 1000 : 01, 1001 : 02, 1002 : 03, 1003 : 04 (the lower address contains the higher byte)
little endian : 1000 : 04, 1001 : 03, 1002 : 02, 1003 : 01 (the lower address contains the lower byte)
type : 4
If the data is given in little endian, how to write it in big endian.
Q. 0x 0001, 0x 6665, 0x 4243 and 0x 0100 are 2-byte unsigned integers stored in little endian format. Write them in big endian.
Each integer is 2 byte (multibyte), so the two bytes of each integer are swapped :
value : little endian bytes -> big endian bytes
0x 0001 : 01 00 -> 00 01
0x 6665 : 65 66 -> 66 65
0x 4243 : 43 42 -> 42 43
0x 0100 : 00 01 -> 01 00
If one integer is 4 byte, it is also multibyte, hence the ordering of its four bytes changes: the bytes 61 62 63 64 in one format appear as 64 63 62 61 in the other.
Q. An array of two 2-byte integers is stored in a big endian machine at the byte addresses shown below. What will be its storage pattern in a little endian machine?
Address : Data
0x104 : 78
0x103 : 56
0x102 : 34
0x101 : 12
The data is multibyte, but it consists of two 2-byte items, so the reordering happens within A and B individually.
A = 12 34 (addresses 0x101 and 0x102), B = 56 78 (addresses 0x103 and 0x104)
little endian :
0x104 : 56
0x103 : 78
0x102 : 12
0x101 : 34
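The same conversion can be checked with Python's standard `struct` module, where `>` selects big endian and `<` little endian, and `H` is a 2-byte unsigned integer. This is only a verification sketch; the variable names are illustrative.

```python
import struct

# Bytes as stored by the big endian machine at addresses 0x101 .. 0x104.
big_bytes = bytes([0x12, 0x34, 0x56, 0x78])

# Interpret them as two 2-byte unsigned integers in big endian order.
a, b = struct.unpack(">2H", big_bytes)

# Re-encode the same two integers for a little endian machine.
little_bytes = struct.pack("<2H", a, b)

layout = {0x101 + i: format(x, "02X") for i, x in enumerate(little_bytes)}
print(hex(a), hex(b), layout)
```

The array order (A before B) is unchanged; only the two bytes inside each element are swapped.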
(i) active low pin : this pin is enabled when the input is 0, that is, in the low state. It is denoted by a bar over the pin name.
(ii) active high pin : this pin is enabled when the input is 1, that is, in the high state.
ex. INTR, HLDA, ALE etc.
(iii) time multiplexed pin : this pin carries multiple meanings, but only one at a time.
- Address pins are time multiplexed with data pins to carry both the address and the data.
1. Address line : address lines are used to carry the address (CPU to memory). They are unidirectional.
note :
Based on the address lines we can determine the capacity of memory [number of main memory locations/cells].
ex. 8085 processor : AD0-AD7, A8-A15
A : address : 16 bits
D : data : 8 bits
ex. 8086 processor (20-bit address) : AD0-AD15, A16-A19
A : address : 20 bits, so 2^20 cells/locations = 1M cells
D : data : 16 bits
2. Data line : data lines are used to carry the data (between CPU and memory). They are bi-directional.
note :
• Based on the data lines we can determine the word length of the processor or CPU.
• The performance of the processor is measured by the word length of the CPU.
ex. 8085 processor : AD0-AD7, A8-A15
word length = 8 bit
operations are performed on 8-bit data format
ex. 8086 processor : AD0-AD15, A16-A19
word length = 16 bit
operations are performed on 16-bit data format
topic - Instruction : an instruction is a binary code (binary sequence) which is designed inside the processor to perform some operation.
An instruction has an OPCODE (the type of operation) and an address field (which gives either the operand or its address).
Address field : an n-bit address field can specify 2^n memory cells (locations).
1 KB memory, then address : log2(1K) = log2(2^10) = 10 bit
2. If the OPCODE is 6 bit, then how many operations can be performed?
6 bit = 2^6 = 64
For a given count, the field needs the smallest n with 2^n >= count. For a count between 65 and 128, n = 7 bits, because 2^6 = 64 and 2^7 = 128; hence 7 bits.
30 registers : 30 <= 2^5 = 32, so the register address field = 5 bits
count : field size
55 : 6 bit
210 : 8 bit
30 : 5 bit
50 : 6 bit
90 : 7 bit
25 registers : 5 bit
50 registers : 6 bit
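The field sizes in the list above all come from the same rule, the smallest n with 2^n >= count. A minimal sketch, with a function name chosen for this example:

```python
# Sketch: bits needed to encode `count` distinct operations or registers.
def field_bits(count):
    """Smallest n with 2**n >= count (at least 1 bit)."""
    return max(1, (count - 1).bit_length())

for count in (55, 210, 30, 50, 90, 25):
    print(count, field_bits(count))
```

For example, 30 registers need 5 bits because 2^4 = 16 is too small and 2^5 = 32 is enough.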
example : 16-bit instruction
OPCODE : 4 bits
OPERAND REFERENCE : 6 bits
OPERAND REFERENCE : 6 bits
total : 4 + 6 + 6 = 16 bits
The instruction set architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. The ISA serves as a boundary between software and hardware. Software is converted to machine instructions using software (a compiler), then the instructions are executed.
An ISA contains :
- the functional definition of storage locations (registers, memory) and operations (add, multiply, branch, load, store, etc)
- a precise description of how to invoke and access them
2. Location of the operands : where will the data be? In registers, the accumulator or memory?
5. supported operations :
- ADD
- SUB
- MUL
- AND
- OR
- CMP
- MOVE
(i) stack based organisation : in the stack based organisation ALU operations are performed on stack data. The ALU operands are found in the stack, and the result goes to the stack too.
In a stack CPU there are "0" (zero) explicit operands, (or) all operands are implicit: there is no need to pass an address separately, because both operands are found in the stack. That means the ALU operands (data) must be present in the stack before the ALU operation, and we write just ADD.
- what is a stack?
A stack is a block (part) of memory in RAM, but to control it the CPU keeps a stack pointer register. The stack is a part of memory; it is drawn separately only for visualisation.
[Figure : stack based organisation. The ALU operands come from the TOS (top of stack), and the stack lives in main memory.]
In depth explanation : (memory : A = 100, B = 600, result variable C)
After PUSH A the stack holds 100 (TOS).
I2: PUSH B : push whatever value is present at memory location B onto the top of the stack.
TOS <- M[B] (put the data of memory location B on the TOS)
stack : 600 (TOS), 100
I3: ADD : pop the top two (2) elements of the stack, perform the addition and save the result back to the top of the stack.
ADD : 100 + 600 = 700 (TOS)
I4: POP C : store the value (result) present at the top of the stack (TOS) to memory location C.
M[C] <- TOS
memory : C = 700
note:
- In a stack CPU, ALU operations have a 0 address field.
- In a stack CPU, data transfer operations (PUSH, POP) do not have a 0 address field; they have a 1 address field.
- PUSH A :
- PUSH B :
- ADD :
- POP C :
[Figure : stack contents while evaluating an expression on memory variables X, Y and Z. X and Y are pushed; Y and X are popped to perform MUL and the product XY is pushed; then Z is pushed, and Z and XY are popped to perform ADD.]
I1: PUSH A
I2: PUSH B
I3: ADD
I4: PUSH C
I5: PUSH D
I6: ADD
I7: MUL
I8: POP X
At the end the stack is EMPTY, because we popped the result from the stack.
Instruction sizes (operation = 4B, address = 3B) :
I2: PUSH B : 4B + 3B = 7B (operation + address)
I3: ADD : 4B = 4B (only operation)
I4: PUSH C : 4B + 3B = 7B
I5: PUSH D : 4B + 3B = 7B
I6: ADD : 4B = 4B
I7: MUL : 4B = 4B
I8: POP X : 4B + 3B = 7B
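A small interpreter makes the stack behaviour and the size calculation concrete. This is a sketch under assumptions: the program is taken to start with I1: PUSH A, the values of A to D are made up, and the 4-byte opcode and 3-byte address sizes are the ones used in the list above.

```python
# Sketch of a zero-address (stack) machine. Variable values and the
# presence of "PUSH A" as the first instruction are assumptions.
def run_stack_program(program, memory):
    stack = []
    for op, *operand in program:
        if op == "PUSH":                      # TOS <- M[addr]
            stack.append(memory[operand[0]])
        elif op == "POP":                     # M[addr] <- TOS
            memory[operand[0]] = stack.pop()
        elif op == "ADD":                     # pop two, push the sum
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":                     # pop two, push the product
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
    return stack

def program_size(program, opcode_bytes=4, address_bytes=3):
    # ALU instructions carry no address; PUSH and POP carry one address.
    return sum(opcode_bytes + address_bytes * (len(instr) - 1) for instr in program)

program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("PUSH", "C"),
           ("PUSH", "D"), ("ADD",), ("MUL",), ("POP", "X")]
memory = {"A": 1, "B": 2, "C": 3, "D": 4}
stack = run_stack_program(program, memory)
print(memory["X"], stack, program_size(program))
```

The five data transfer instructions cost 7 bytes each and the three ALU instructions 4 bytes each, so this version of the program occupies 47 bytes.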
Instruction format (32 bit)
OPCODE : 6 bit
REG1 : 6 bit
REG2 : 6 bit
IMMEDIATE : 14 bit
- After the processing, the result is also stored in the accumulator, (OR) the accumulator is used as the destination to store the result of the ALU operation.
- The operand in the accumulator is loaded from memory using the LOAD instruction, and the result is stored from the accumulator into memory using the STORE instruction.
note :
Here 1 operand is implicit (present in the accumulator) and 1 operand is explicit (present at the memory address).
Instruction : type of operation, then source
ADD B; AC <- AC + M[B] (destination : AC, source 1 : AC, source 2 : M[B])
[Figure : single accumulator organization. The AC and main memory feed the ALU, and the result goes back to the AC.]
ex. C = A + B (memory : A = 100, B = 500)
I1 : LOAD A; AC <- M[A]; load the content (data) of memory location (address) A into the accumulator. AC = 100
I2 : ADD B; AC <- AC + M[B]; fetch the content (data) from memory location (address) B, add it to the value present in the accumulator (AC) and save the result back to the accumulator. AC = 600
I3 : STORE C; M[C] <- AC; store the value (data) from the accumulator (AC) to the memory location C. C = 600
ex.3 Z = (X * Y)
I1 : LOAD X; AC <- M[X]
I2 : MUL Y; AC <- AC * M[Y]
I3 : STORE Z; M[Z] <- AC
ex.4 A = (X * Y) + Z
I1 : LOAD X; AC <- M[X]
I2 : MUL Y; AC <- AC * M[Y]
I3 : ADD Z; AC <- AC + M[Z]
I4 : STORE A; M[A] <- AC
ex.5 X = [(A+B) * (C+D)] ; A, B, C, D and X are variables in the memory.
I1 : LOAD A; AC <- M[A]
I2 : ADD B; AC <- AC + M[B]
I3 : STORE Temp; M[Temp] <- AC
memory spill : storing an intermediate result in the memory
memory spills : 1
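One possible completion of the (A+B) * (C+D) program on an accumulator machine is sketched below. The instructions after STORE Temp, the variable values and the function name are assumptions made for the illustration; the notes only give the first three instructions.

```python
# Sketch of a single-accumulator machine. Only LOAD/ADD/STORE on A, B and
# Temp come from the notes; the remaining instructions are one plausible
# completion, and the data values are made up.
def run_accumulator(program, memory):
    ac = 0
    for op, addr in program:
        if op == "LOAD":        # AC <- M[addr]
            ac = memory[addr]
        elif op == "ADD":       # AC <- AC + M[addr]
            ac += memory[addr]
        elif op == "MUL":       # AC <- AC * M[addr]
            ac *= memory[addr]
        elif op == "STORE":     # M[addr] <- AC
            memory[addr] = ac
    return ac

program = [
    ("LOAD", "A"), ("ADD", "B"), ("STORE", "Temp"),   # Temp = A + B (spill)
    ("LOAD", "C"), ("ADD", "D"),                      # AC = C + D
    ("MUL", "Temp"), ("STORE", "X"),                  # X = (C + D) * Temp
]
memory = {"A": 1, "B": 2, "C": 3, "D": 4}
run_accumulator(program, memory)
spills = sum(1 for op, addr in program if op == "STORE" and addr == "Temp")
print(memory["X"], spills)
```

Because there is only one accumulator, the sum A + B has to be parked in memory while C + D is computed, which is the single memory spill.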
(i) register - memory : in this organisation the first ALU operand is present in a register (general purpose register), the second ALU operand is present in memory, and after processing the result is stored in a register.
Instruction : type of operation, destination (register), source 1 (register), source 2 (memory)
[Figure : register file (R1, R2, R3, ...) and main memory feed the ALU; the result goes back to the register file.]
I1 : LOAD R1 X; R1 <- M[X]
I2 : ADD R1 Y; R1 <- R1 + M[Y]
I3 : STORE Z R1; M[Z] <- R1
I1 : LOAD R1 A; R1 <- M[A]
I2 : ADD R1 B; R1 <- R1 + M[B]
I2 : MUL R1 Y; R1 <- R1 * M[Y]
I3 : ADD R1 Z; R1 <- R1 + M[Z]
(ii) memory - memory : all the operands must be present in the memory (we don't study it).
ADD A, B, C
M[A] <- M[B] + M[C]
(iii) register - register (RISC) : in this architecture all the ALU operands must be present in the registers (the fastest storage).
Instruction : type of operation, destination (register), source 1 (register), source 2 (register)
note :
Arithmetic operations do not access the memory; only LOAD and STORE instructions are used to access the memory.
[Figure : register file (R1, R2, R3, ...) feeds the ALU and receives the result; LOAD and STORE connect the register file to main memory.]
ex.1 Z = X + Y
I1 : LOAD R1 X; R1 <- M[X]
I2 : LOAD R2 Y; R2 <- M[Y]
I3 : ADD R3 R1 R2; R3 <- R1 + R2
I4 : STORE Z R3; M[Z] <- R3
ex.2 C = A + B, where A, B and C are memory addresses
I1 : LOAD R1 A
I2 : LOAD R2 B
I3 : ADD R3 R1 R2; R3 <- R1 + R2
I4 : STORE C R3; M[C] <- R3
ex. (X * Y) + Z
I1 : LOAD R1 X
I2 : LOAD R2 Y
I3 : MUL R3 R1 R2; R3 <- R1 * R2
I4 : LOAD R4 Z
I5 : ADD R5 R3 R4; R5 <- R3 + R4
ex. X = (A + B) * (C + D)
I1 : LOAD R1 A
I2 : LOAD R2 B
I3 : ADD R3 R1 R2; R3 <- R1 + R2
I4 : LOAD R4 C
I5 : LOAD R5 D
I6 : ADD R6 R4 R5; R6 <- R4 + R5
I7 : MUL R7 R3 R6; R7 <- R3 * R6
I8 : STORE X R7; M[X] <- R7
The same expression using only 3 registers :
I1 : LOAD R1 A
I2 : LOAD R2 B
I3 : ADD R1 R1 R2; R1 <- A + B
I4 : LOAD R2 C
I5 : LOAD R3 D
I6 : ADD R2 R2 R3; R2 <- C + D
I7 : MUL R3 R1 R2; R3 <- R1 * R2
I8 : STORE X R3; M[X] <- R3
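The three-register version of X = (A + B) * (C + D) can be traced with a small load-store interpreter. The tuple format of the instructions, the data values and the function name are assumptions made for this sketch; the I3 instruction is written as ADD R1 R1 R2 so that it matches its stated effect R1 <- A + B.

```python
# Sketch of a register-register (load-store) machine. Instruction tuples
# and data values are illustrative assumptions.
def run_load_store(program, memory, num_registers=3):
    regs = {"R%d" % i: 0 for i in range(1, num_registers + 1)}
    memory_accesses = 0
    for instr in program:
        op = instr[0]
        if op == "LOAD":                       # Rd <- M[addr]
            _, rd, addr = instr
            regs[rd] = memory[addr]
            memory_accesses += 1
        elif op == "STORE":                    # M[addr] <- Rs
            _, addr, rs = instr
            memory[addr] = regs[rs]
            memory_accesses += 1
        elif op == "ADD":                      # Rd <- Rs1 + Rs2 (no memory access)
            _, rd, rs1, rs2 = instr
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "MUL":                      # Rd <- Rs1 * Rs2 (no memory access)
            _, rd, rs1, rs2 = instr
            regs[rd] = regs[rs1] * regs[rs2]
    return regs, memory_accesses

program = [
    ("LOAD", "R1", "A"), ("LOAD", "R2", "B"), ("ADD", "R1", "R1", "R2"),
    ("LOAD", "R2", "C"), ("LOAD", "R3", "D"), ("ADD", "R2", "R2", "R3"),
    ("MUL", "R3", "R1", "R2"), ("STORE", "X", "R3"),
]
memory = {"A": 1, "B": 2, "C": 3, "D": 4}
regs, memory_accesses = run_load_store(program, memory)
print(memory["X"], memory_accesses)
```

Only the four LOADs and the one STORE touch memory; the ALU instructions work purely on registers, and no memory spill is needed.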
3 Registers
16bit (instruction)
2AF : OPCODE (2 bit), AF1 (7 bit), AF2 (7 bit)
1AF : OPCODE (9 bit), AF1 (7 bit)
Expanding the opcode length is required in a CPU design that supports fixed length instructions, in order to implement various instructions with different formats (to support operations of different formats).
Assume category :
(iii) more (further) derived instruction : highest (more and more) bits in the OPCODE.
Step 1 : identify the primitive instruction (lowest number of OPCODE bits) according to the CPU. First write down the formats, for example 1AF and 0AF as given, then look at the OPCODE bits: the format with the fewest OPCODE bits is the primitive one (type 1).
Example : 6-bit instruction, 4-bit address field
1AF (type 1, primitive instruction, lowest OPCODE bits) : OPCODE (2 bit), AF (4 bit)
0AF (type 2, derived instruction) : OPCODE (6 bit)
step 3 : identify the number of free OPCODEs after allocating the 1 address instructions : 4 - 2 = 2
The 2 free OPCODEs (10 and 11) are each expanded with the 4 bits of the address field :
10 0000, 10 0001, ..., 10 1110, 10 1111 : 16 opcodes
11 0000, 11 0001, ..., 11 1110, 11 1111 : 16 opcodes
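The counting in this 6-bit example can be written as a short calculation. The function name and parameters are hypothetical; the sketch only covers one level of expansion (from 1-address to 0-address instructions).

```python
# Sketch of one expanding-opcode step: every unused primitive opcode is
# extended with the bits of the address field. Names are illustrative.
def derived_opcodes(instr_bits, addr_bits, primitive_used):
    primitive_total = 2 ** (instr_bits - addr_bits)   # e.g. 2-bit opcode: 4
    free = primitive_total - primitive_used           # unused primitive opcodes
    if free < 0:
        raise ValueError("more primitive instructions than opcodes")
    return free * 2 ** addr_bits                      # each free opcode expands

# 6-bit instruction, 4-bit address field, 2 one-address instructions used
print(derived_opcodes(6, 4, 2))
```

With 2 free opcodes (10 and 11) and a 4-bit field, 2 x 16 = 32 zero-address instructions are available.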
24-bit instruction, 8-bit address field
type : 1 : OPCODE (8 bit), AF1 (8 bit), AF2 (8 bit)
step 3 : identify the number of free OPCODEs after allocating the 2 address instructions : 256 - 254 = 2
type : 2 : OPCODE (16 bit), AF1 (8 bit)
identify the number of free OPCODEs after allocating the 1 address instructions : 512 - 256 = 256
type : 3 : OPCODE (24 bit)
16-bit instruction, 4-bit address field
type : 1 : OPCODE (8 bit), AF1 (4 bit), AF2 (4 bit)
step 3 : identify the number of free OPCODEs after allocating the 2 address instructions : 256 - x
type : 2 : OPCODE (12 bit), AF1 (4 bit)
11-bit instruction, address field : 4 bit
type : 1 : OPCODE (3 bit), AF1 (4 bit), AF2 (4 bit)
step 3 : identify the number of free OPCODEs after allocating the 2 address instructions : 8 - 5 = 3
type : 2 : OPCODE (7 bit), AF1 (4 bit)
type : 3 : OPCODE (11 bit)
16-bit instruction, address field : 6 bit
type : 1 : OPCODE (4 bit), AF1 (6 bit), AF2 (6 bit)
type : 2 : OPCODE (10 bit), AF1 (6 bit)
identify the number of free OPCODEs after allocating the 1 address instructions : 128 - 127 = 1
type : 3 : OPCODE (16 bit)
16-bit instruction with memory and register formats
type : 1 : OPCODE (4 bit), MEM (12 bit)
type : 2 : OPCODE (6 bit), REG1 (5 bit), REG2 (5 bit)
type : 3 : OPCODE (16 bit)
- An addressing mode is a technique used to calculate the effective address and obtain the operand.
Effective address
The effective address is the actual address of the operand.
Addressing mode
The different ways in which the location of the operand is specified in an instruction are referred to as addressing modes (where the operand is present and how to get it).
1. Fetch cycle : fetches (brings) the instruction from main memory to the CPU. It works like a postman: it does not care what the instruction is. The fetched instruction is loaded from memory into the IR with the help of the bus lines.
2. Execute cycle : the objective of the execute cycle is to execute (process) the fetched instruction. It decodes and analyses the instruction: what the OPCODE is, how many operands there are, operand address calculation, operand fetch, processing and result storage.
Example : I1 : Load [6000]
- decode : what does I1 want to do? Put the content of memory location 6000 into the accumulator.
- operand address calculation : [6000]
- operand fetch : whatever is stored at [6000] is fetched from memory with the help of the system buses.
- data process and result store : AC <- M[6000]
Load : memory read
Store : memory write
reason 2 :
Whenever we write a program in a high level language (C, C++) we use different structures, and the machine language has to deal with them :
(i) constants
(ii) variables (global, local, static)
(iii) pointers
The H.L.L. is converted to assembly language, and all these features are implemented by addressing modes in assembly language.
reason 3 : an instruction is in binary. We know the OPCODE, but what does a value such as 5 in the address part mean?
OPCODE | ADDRESS
The address part can be :
- a value (constant)
- a register (direct, indirect)
- memory (direct, indirect)
mode field / mode bits : tell you how to get the operand, (or) how to use this address part (as immediate, memory or register).
if there are 4 addressing modes, then mode field = 2 bit
if there are 7 addressing modes, then mode field = 3 bit
So in the computer, operands (data) are present in a register (or) in memory (or) in the instruction itself.
When the operand (data) is present in either the immediate field (or) a register (or) memory, why are various addressing modes used?
Because with these 3 alone we cannot implement variables, pointers, arrays, loops etc., so we need different addressing modes.
H.L.L. -> assembly language -> machine language
(i) immediate addressing mode : in this addressing mode the operand is present (placed) in the instruction itself
(OR)
the operand is present in the address field of the instruction.
instruction : OPCODE | ADDRESS, where the address field holds the DATA (operand)
note : the immediate addressing mode is used to access constants (or) to initialise a register (or) variable with a value.
MOV 100 R1 : not allowed, because R1 cannot be put into the constant 100.
move immediate : MOV R1 #4000 (OR) MVI R1 #4000
R1 <- 4000
MOV R5 #600
R5 <- 600 (the constant 600 itself is loaded, not the content 11 of memory location 600)
(ii) memory direct / absolute addressing mode : in this addressing mode the operand is present (placed) in
the memory and the instruction contains the effective
address.
OPCODE AF
the address field itself is the effective address,
which takes us to the operand in memory
memory
note :
- this addressing mode is used to access variables.
- 1 (one) memory reference is required to access (read (OR) write) the data.
example :
(i) ADD R1 [1000]
R1 <- R1 + M[1000]
add R1 and the data at memory location 1000 and put the result in R1.
(ii) MOV R3 [6000]
R3 <- M[6000]
put the data at memory location 6000 into R3.
with memory contents M[4000] = 600 and M[600] = 11 :
MOV R1 [4000]
R1 <- M[4000]
R1 <- 600
put the data at memory location 4000 (600) into R1. 1 memory reference for read.
ADD R1 [4000]
R1 <- R1 + M[4000]
R1 <- R1 + 600
add the data at memory location 4000 (600) to R1 and put the result in R1. 1 memory reference for read.
ADD R4 [600]
R4 <- R4 + M[600]
R4 <- R4 + 11
1 memory reference for read.
when memory is the destination, it is 1 memory reference for write.
memory reference : the number of times memory had to be accessed.
(iii) memory indirect addressing mode : in this addressing mode the operand is present (placed) in
the memory and the effective address (address of operand)
is also present in the memory; the instruction contains the
address of the effective address.
OPCODE AF
the address field of the instruction holds the address of the effective address;
the effective address stored in memory points to the operand
DATA (operand) - 2 memory references to access
the operand
memory
example :
(i) ADD R1 @1000
R1 <- R1 + M[M[1000]]
go to memory location 1000, take the address stored there, and add the data at that address to R1.
(ii) MOV R5 @7000
R5 <- M[M[7000]]
go to memory location 7000, take the address stored there, and put the data at that address into R5.
with memory contents M[4000] = 600 (EA), M[600] = 11 (DATA) (the diagram also shows M[6500] = 21) :
ADD R1 @4000
R1 <- R1 + M[M[4000]]
R1 <- R1 + M[600]
R1 <- R1 + 11
go to memory location 4000, find 600 there, then add the data at location 600 (11) to R1.
MOV R3 @4000
R3 <- M[M[4000]]
R3 <- M[600]
R3 <- 11
go to memory location 4000, find 600 there, then put the data at location 600 (11) into R3.
#indirect : instead of going straight to the data, we go to an address that passes us on to another address.
memory direct : 1 memory reference for read / write DATA
memory indirect : 1 memory reference for EA + 1 memory reference for read DATA
(iv) register addressing mode : this addressing mode is the same as memory direct addressing
mode, the difference being that here the operand is present in a
register instead of memory.
in this addressing mode the operand is stored in a register and that register's address
(register name) is held in the address field of the instruction.
(the address of a register means its name; it is found in the address field of the instruction, e.g. R3, and the
DATA (operand) is then found in R3 of the register file)
example :
(i) ADD [6000] R1 @7000
M[6000] <- R1 + M[M[7000]]
[6000] : memory direct, 1 memory reference
R1 : register direct, 1 register reference
@7000 : memory indirect, 2 memory references
(v) register indirect addressing mode : in this addressing mode the operand is present (placed)
in the memory and the effective address is present in a
register.
example :
(i) format : OPCODE destination source 1
LOAD R1 @R2
R1 <- M[R2]
register file : R2 = 2000 (EA) ; memory : M[2000] = 100
R1 <- M[2000]
R1 <- 100
R1 : register direct, 1 register reference
@R2 : register indirect, 1 register reference (for the EA), followed by 1 memory reference for the DATA
PC relative : effective address = PC value + address field (offset)
= current PC value + relative value
base register : effective address = base register + address field (offset)
(ix) auto decrement and increment addressing mode : it is similar to register indirect
addressing mode, but the register value is
decremented (or) incremented automatically
Decrement : Pre-decrement
Increment : Post-increment
(x) implied / implicit addressing mode : in this addressing mode the operand (data info) is given
by the OPCODE itself
OPCODE
important points
immediate : the operand is in the instruction itself
- access constant
- initialize variable
memory direct : the operand is in memory and the instruction carries the effective address of the operand
(1 memory reference used)
memory indirect : the operand is placed in memory and the effective address (address of operand) is also present
in memory; the instruction carries the address of the effective address (2 memory references used)
register : same as memory direct AM but the operand is present in a register instead of memory;
the instruction contains the register name (1 register reference used)
register indirect : the operand is placed in memory and the effective address is present in the register
worked example with address field 500 :
- direct : the effective address is in the instruction, so we go to 500 (500 is the effective address)
- indirect : the effective address is itself in memory, so 500 is the address of the effective address
values shown in the diagram (address : content) : 201 : 500, 222 : 155, 399 : 450, 400 : 700, 500 : 800,
600 : 900, 702 : 325, 800 : 300 ; R1 = 400 ; r1 = 111
assume,
if i write R1 then computer writes 00001
AC 20
(it will be mentioned)
format :
OPCODE REGISTER AF(A)
If Register is :
- PC (current value) : effective address = PC value + address field (offset)
= current PC value + relative value
format : OPCODE 500 (no register field, because the CPU has only one program counter)
- index register : effective address = index register value + address field (offset)
used in array implementation also
- base register : effective address = base register + address field (offset)
format : OPCODE BR 500
OPCODE R AF(A)
+ DATA(operand) read/write
memory
register file
(i) index register addressing mode : index addressing mode is used to implement arrays
starting / base address of array : present in the address field of the instruction (offset)
index value : stored in the index register
a[2] = 1000 + 2 x 4 (index value 2, stored in the index register; element size 4)
= 1000 + 8
= 1008
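The a[2] calculation can be written as a one-line function. A minimal sketch; the function name `indexed_address` and its parameter names are made up for illustration, and the element size of 4 is read off the "2 x 4" term in the example.

```python
def indexed_address(base, index, element_size):
    """Effective address of array element `index`.

    base         : starting address of the array (address field of the instruction)
    index        : index value (held in the index register)
    element_size : size of one element in addressable units
    """
    return base + index * element_size


print(indexed_address(1000, 2, 4))  # address of a[2]
```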
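format : OPCODE R AF(A), where R is the index register (XR)
example : load to AC with address field 500 and XR = 100
effective address = 500 + 100 = 600, so AC <- M[600] = 900
1 register reference for the index value
1 memory reference for the DATA
memory (address : content) : 500 : 800, 600 : 900, 702 : 325, 800 : 300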
at CPU design time, one special purpose register is made the index register.
format : OPCODE A RS, where RS contains the index value
effective address = A + RS
with A = 1000 and RS = 1 (index addressing mode used here) :
RD <- M[1000 + RS]
RD <- M[1000 + 1]
RD <- M[1001]
RD <- RD + 1000
RD <- 1 + 1000
RD <- 1001
when we want to access / perform the same operation on multiple locations (assume 100
locations) then a separate instruction is required for each location,
but with the help of auto decrement and auto increment AM it is possible, using only
one instruction, to perform the same operation on or access multiple locations.
example :
for (i = 1; i <= 100; i++)
----- address 1
.
.
so, the loop will continue to execute until R1 becomes 0, and in memory the decrement will continue.
in 1 loop / iteration
we access the memory 2 times
case (i) forward jumping
memory : 1000 : i1, 1001 : i2, 1002 : i3 (JMP 1051, or JMP +48), 1003 : i4, ..., 1051 : i51, 1052 : i52, 1053 : i53
i1 fetch PC : 1001
i2 fetch PC : 1002
i3 fetch PC : 1003
when i3 is decoded (executed) it finds out that control must not go to 1003;
instead it has to go to 1051. hence, target address : 1051
JMP 1051 PC : 1051
relative value = target - PC = 1051 - 1003 = +48, so JMP +48
case (ii) backward jumping
memory : 1001 : i2, 1002 : i3, ..., 1052 : i52 (JMP 1002, or JMP -51), 1053 : i53
i2 fetch PC : 1002
i3 fetch PC : 1003
i52 fetch PC : 1053
when i52 is decoded (executed) it finds out that control must not go to
1053; instead it has to go to 1002. hence, target address : 1002
JMP 1002 PC : 1002
relative value = target - PC = 1002 - 1053 = -51, so JMP -51
+ve relative value : forward jumping
-ve relative value : backward jumping
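The offsets in both cases follow from one subtraction, and the same offset keeps working after the program is moved. A minimal sketch; the function names are hypothetical, and the PC is taken to be the value after the jump instruction has been fetched, as in the two cases above.

```python
def relative_offset(pc_after_fetch, target):
    """Offset to store in a PC-relative jump (positive: forward, negative: backward)."""
    return target - pc_after_fetch


def branch_target(pc_after_fetch, offset):
    """Effective address = current PC value + relative value."""
    return pc_after_fetch + offset


print(relative_offset(1003, 1051))  # case (i), forward jump
print(relative_offset(1053, 1002))  # case (ii), backward jump
print(branch_target(4003, 48))      # same JMP +48 after the program is loaded at 4000
```

Because only the difference is stored, the instruction does not change when the whole program is loaded at another address.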
note : relative AM and base register AM are used to write relocatable code.
JMP : jump
BNZ : branch on not zero
GOTO : go to
JNZ : jump on not zero
SKIP : skip
BE : branch on equal
condition : if sub x y becomes 0 then go to 211 (branch on zero)
condition : if R1 and R2 are equal then go to 235 (branch if equal)
case (i)
memory : 1001 : i2, 1002 : i3 (JMP +48), 1003 : i4, ..., 1051 : i51, 1052 : i52, 1053 : i53
i2 fetch PC : 1002
i3 fetch PC : 1003
JMP +48 does the job of taking control to i51 : forward jumping by the displacement / offset / relative value.
case (ii)
i2 fetch PC : 1002
1001 i2
i3 fetch PC : 1003
1002 i3
benefit : relative AM and base register AM are used to write relocatable code.
after some time the same program is reloaded, but at a different address :
memory : 4001 : i2, 4002 : i3 (JMP +48), 4003 : i4, ..., 4051 : i51, 4053 : i53
i2 fetch PC : 4002
i3 fetch PC : 4003
JMP +48 still does the job of taking control to i51 (4003 + 48 = 4051) : forward jumping by the displacement /
offset / relative value.
JMP 1051 will not work here now.
benefit : even if the program is relocated to other memory locations, the displacement /
offset / relative value still works. with the previous method, the program containing JMP 1051
had to sit at address 1000, so it could only run there. with a displacement we can work
at any address, because JMP +48 is not tied to any particular address (i.e. 1000, 2000, etc.);
it simply jumps forward from wherever the program is loaded.
target = 202
PC : 211
offset : -9
target = 235
PC : 226
offset : +9
PC relative addressing mode : intra segment transfer of control (branching). when the target address is
present in the same segment, control is transferred within the segment during program execution;
this is called intra segment branching.
i1
i2
i3
segment 1 intra segment
.
(same segment)
.
ix
(ii) base register addressing mode : the only difference is that in base register AM we have to put the
starting address in the base register (because of relocation)
target = 211
PC : 204
offset : +7
target = 202
PC : 211
offset : -9
target = 235
PC : 226
offset : +9
ex. BR = 200
900
901 i1
word size = 16 bits = 2 byte 902 next instruction
instruction size = 4byte = 2 words
PC value = 902
displacement value = -32
jmp -32
8bit 8bit
PC value = 902
target address = 614
PC value
(i) before instruction fetch :
900
1000 - 1003
1004 - 1007
1008 - 1011 compare
1012 - 1015 branch if equal
1016 - 1019
unsigned : [+ve] only
signed : magnitude / complement forms; sign bit 0 [+ve], 1 [-ve]
a signed number is negative when it starts with 1
eg.
16bit number = -(2^(16-1)) to +(2^(16-1) - 1)
= (-32k to +32k - 1)
but if we want to store 51,000 then it is not possible with 16bit signed data because the range is
between (-32k to +32k - 1), hence this is not possible with fixed point.
syntax :
format : S | e | mantissa
s (sign bit) : 0 [+ve], 1 [-ve]
value = +/- 0.xxxxxx x 2^e (0.xxxxxx is the mantissa)
BE = AE + bias (biased exponent = actual exponent + bias)
M = mantissa
e = exponent / actual exponent AE
example : 110.1
= 0.1101 x 2^+3
the point moves three places to the left (the bits are aligned three places to the right),
hence the power is positive.
Q. +(4.5)
100.1
= 0.1001 x 2^+3
format : S | e | mantissa, value = +/- 0.xxxxxx x 2^e
Q. 100.11
= 0.10011 x 2^+3
Q. 0.00101
= 0.101 x 2^-2
format : S | e | mantissa, value = +/- 0.xxxxxx x 2^e
s (sign) = 0 (positive)
M = 101 (mantissa)
e = -2
in order to make all stored exponents positive, take the magnitude of the most negative exponent
and add it as a bias.
exponent : 5bit
hence, bias = 2^(5-1) = 2^4 = 16
E = e + bias
E = -2 + 16 = 14
in 5 bits : 01110
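normalized mantissa
explicit syntax : 0.1........ x 2^e (M is the part after the point and starts with 1)
implicit syntax : 1......... x 2^e (M is the part after the point; the leading 1 is not stored)
example : (101.11)
explicit : 0.10111 x 2^+3, M = 10111, e = 3
implicit : 1.0111 x 2^2, M = 0111, e = 2
E = e + bias
e = E - bias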
Q. +6.75
110.11 = +110.11 x 2^0
format : S (1bit) | E (4bit) | M (5bit)
exponent : 4bit, hence bias = 2^(4-1) = 2^3 = 8
E = e + bias = e + 8
explicit : 0.11011 x 2^3
E = 3 + 8 = 11 (1011), bias = 8
S E M = 0 1011 11011
implicit : 1.1011 x 2^2
E = 2 + 8 = 10 (1010), bias = 8
S E M = 0 1010 10110
Q. +5.5
101.1 = +101.1 x 2^0
format : S (1bit) | E (4bit) | M (5bit)
exponent : 4bit, hence bias = 2^(4-1) = 2^3 = 8
E = e + bias = e + 8
explicit : 0.1011 x 2^3
E = 3 + 8 = 11 (1011)
S E M = 0 1011 10110
hexa : (176)16
implicit : 1.011 x 2^2
E = 2 + 8 = 10 (1010)
S E M = 0 1010 01100
hexa : (14C)16
Q. +4.875
100.111 = +100.111 x 2^0
format : S (1bit) | E (4bit) | M (5bit)
exponent : 4bit, hence bias = 2^(4-1) = 2^3 = 8
E = e + bias = e + 8
explicit : 0.100111 x 2^3
E = 3 + 8 = 11 (1011), bias = 8
S E M = 0 1011 10011 (the sixth mantissa bit does not fit in 5 bits and is dropped)
implicit : 1.00111 x 2^2
E = 2 + 8 = 10 (1010), bias = 8
S E M = 0 1010 00111
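The explicit and implicit encodings worked out by hand can be reproduced with a short routine. This is a sketch under stated assumptions: the function name `encode` and its parameters are hypothetical, the bias is taken as 2^(k-1) as in these notes, and mantissa bits that do not fit are simply truncated.

```python
from fractions import Fraction


def encode(value, exp_bits, man_bits, implicit):
    """Return (S, E, M) bit strings for the conventional format with bias 2^(k-1)."""
    sign = "1" if value < 0 else "0"
    x = Fraction(abs(value))
    if x == 0:
        raise ValueError("0 has no normalized form in this format")
    # explicit normalization: 0.1xxx (1/2 <= x < 1); implicit: 1.xxx (1 <= x < 2)
    low, high = (Fraction(1), Fraction(2)) if implicit else (Fraction(1, 2), Fraction(1))
    e = 0
    while x >= high:   # move the point left, exponent goes up
        x /= 2
        e += 1
    while x < low:     # move the point right, exponent goes down
        x *= 2
        e -= 1
    if implicit:
        x -= 1         # the leading 1 is not stored
    biased = e + 2 ** (exp_bits - 1)
    if not 0 <= biased < 2 ** exp_bits:
        raise OverflowError("exponent does not fit in the exponent field")
    mantissa = int(x * 2 ** man_bits)  # truncate extra mantissa bits
    return sign, format(biased, f"0{exp_bits}b"), format(mantissa, f"0{man_bits}b")


for number in (6.75, 5.5, 4.875):
    print(number, encode(number, 4, 5, implicit=False), encode(number, 4, 5, implicit=True))
```

Joining the three fields of +5.5 and reading them as one binary number gives the hex values noted above for the explicit and implicit forms.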
mantissa : gives precision (accuracy); more bits in the mantissa means numbers are represented
more accurately, down to very small differences
exponent : gives range (power); more bits in the exponent means much larger numbers can be represented
instruction size
S E M
(or)
Q. -(29.75)10
11101.11 x 2^0
format : S | E | M, value = +/- 0.xxxxxx x 2^e
exponent field : k = 6bit
excess : 32
bias : 2^(k-1) = 2^(6-1) = 2^5 = 32
E = e + bias = e + 32
explicit normalized :
11101.11 x 2^0 = 0.1110111 x 2^5
e = 5
E = 5 + 32 = 37 (100101)
M = 1110111
S = 1 (negative)
Q. 21.75
10101.11 x 2^0
format : S (1bit) | E (7bit) | M (8bit), bias = 2^(7-1) = 64
explicit : 0.1010111 x 2^5
E = 5 + 64 = 69 (1000101)
S E M = 0 1000101 10101110
implicit : 1.010111 x 2^4
E = 4 + 64 = 68 (1000100)
S E M = 0 1000100 01011100
Q. 13.5
1101.1 x 2^0
format : S (1bit) | E (6bit) | M (9bit)
excess : 32
bias : 2^(k-1) = 2^(6-1) = 2^5 = 32
k = 6 bit
E = e + bias = e + 32
explicit : 0.11011 x 2^4
E = 4 + 32 = 36 (100100), bias = 32
S E M = 0 100100 110110000
hexa : 49B0
implicit : 1.1011 x 2^3
E = 3 + 32 = 35 (100011), bias = 32
S E M = 0 100011 101100000
hexa : 4760
excess : 64
bias : 2^(k-1) = 2^(7-1) = 2^6 = 64
k = 7 bit
E = e + bias = e + 64
implicit
(i) smallest mantissa : 0000 0000
(ii) largest / highest mantissa : 1111 1111
(iii) smallest exponent : 0000 000 (minimum value in exponent)
(iv) largest / highest exponent : 1111 111 (maximum value in exponent)
explicit
(i) smallest mantissa : 1000 0000 (in 0.M the mantissa is the part after the point and must start with 1)
(ii) largest / highest mantissa : 1111 1111
(iii) smallest exponent : 0000 000 (minimum value in exponent)
(iv) largest / highest exponent : 1111 111 (maximum value in exponent)
note :
(i) in the explicit representation we cannot represent 0 because 0.1something is not 0
(ii) in the implicit representation we cannot represent 0 because 1.something is not 0
for the smallest number we put all 0's in E and the mantissa, but the actual value is still not zero.
so to represent 0 we use IEEE 754 single precision and double precision.
format : S | E(7bit) | M(8bit), bias : 2^(7-1) = 2^6 = 64, implicit
smallest positive : 0 0000000 00000000
= +1.00000000 x 2^-64
second smallest positive (a 1 comes into the last mantissa bit) : 0 0000000 00000001
= +1.00000001 x 2^-64
difference between the two = 2^-8 x 2^-64 = 2^-72
largest positive : 0 1111111 11111111
= +1.11111111 x 2^63
second largest positive (a 0 comes into the last mantissa bit) : 0 1111111 11111110
= +1.11111110 x 2^63
proof :
0.111 x 2^0
= 111 x 2^-3 (left alignment)
= (2^3 - 1) x 2^-3
= 2^(3-3) - 1 x 2^-3
= 1 - 2^-3
largest number with S E M = 0 111111 111111111 (explicit)
bias : 2^(6-1) = 32, so e = 63 - 32 = 31
0.111111111 x 2^31 = (1 - 2^-9) x 2^31
IEEE 754 formats :
single precision (32bit) : S 1bit | E 8bit | M 23bit
double precision (64bit) : S 1bit | E 11bit | M 52bit
value = (-1)^S x 1.M x 2^(E-bias)
single precision
default : implicit
119 : 1110111.0 x 2^0 = 1.110111 x 2^6
bias = 127
E = 6 + 127 = 133 (10000101)
hexa : 42EE0000
double precision
default : implicit
119 : 1110111.0 x 2^0 = 1.110111 x 2^6
bias = 1023
E = 6 + 1023 = 1029
1029 = 1024 + 4 + 1 = 2^10 + 2^2 + 2^0
E = 10000000101
hexa : 405DC00000000000
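The hex patterns can be cross-checked with Python's standard `struct` module, which packs floats in IEEE 754 format. A minimal sketch; the helper names `ieee754_hex` and `single_fields` are made up for this example.

```python
import struct


def ieee754_hex(value, double=False):
    """Hex string of the IEEE 754 bit pattern (big-endian)."""
    fmt = ">d" if double else ">f"
    return struct.pack(fmt, value).hex().upper()


def single_fields(value):
    """Split a single precision number into (S, E, M) as integers."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


print(ieee754_hex(119.0))               # single precision
print(ieee754_hex(119.0, double=True))  # double precision
print(single_fields(119.0))             # S, biased exponent E, mantissa M
```

For 119 the E field comes out as 133, i.e. 6 + 127, matching the hand calculation.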
single precision
default : implicit
S : 0
M : 11000000000000000000000
e : 5
bias : 127
E : 5 + 127 = 132 (10000100)
hexa : 42600000
(-1)^S x 1.M x 2^(E-bias) = +1.11 x 2^5 = 56
single precision
default : implicit
-14.25 : -1110.01 x 2^0 = -1.11001 x 2^3
hexa : C1640000
in conventional representation
bias = 2^(k-1)
in single precision (E : 8bit, M : 23bit)
E = 0 (00000000), M = 0 : +/- 0
E = 0 (00000000), M != 0 : fraction / denormalized
E = 255 (11111111), M = 0 : +/- infinite
E = 255 (11111111), M != 0 : NAN (not a number)
in double precision (E : 11bit, M : 52bit)
E = 0 (00000000000), M = 0 : +/- 0
E = 0 (00000000000), M != 0 : fraction / denormalized
E = 2047 (11111111111), M = 0 : +/- infinite
E = 2047 (11111111111), M != 0 : NAN (not a number)
so both 0 and infinity can be represented, and denormalized numbers and NAN can be represented as well.
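The special cases above reduce to a small decision on the E and M fields. A sketch for single precision only; the function name `classify_single` is hypothetical.

```python
def classify_single(e_field, m_field):
    """Classify a single precision number from its 8-bit E and 23-bit M fields."""
    if e_field == 0:
        return "zero" if m_field == 0 else "denormalized"
    if e_field == 255:
        return "infinity" if m_field == 0 else "NaN"
    return "normalized"  # 1 <= E <= 254


print(classify_single(0, 0))
print(classify_single(255, 1))
print(classify_single(133, 0x6E0000))  # the fields of 119
```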
in single precision
if we take bias = 2^(k-1) = 2^(8-1) = 2^7 = 128 (or excess 128) then there is a
chance of getting E = 255.
assume e = 127 : taking only 2^(k-1) gives bias = 128, so
E = 127 + 128
E = 255
but 255 is reserved for infinity and NAN, so a problem will arise.
S E(8bit) M(23bit)
S = 0
E = 254 (largest exponent field of a normalized number)
1 <= E <= 254 (for single precision)
S E(6bit) M(7bit)
0 111111 1111111
S E(6bit) M(7bit)
0 111111 1111110
so that E reaches at most 254 and never overflows into 255, the bias is taken as 2^(k-1) - 1.
note : in single precision the smallest normalized number is 1.M x 2^-126 (E = 1);
a number that can be written this way is normalized, otherwise it is a denormalized number.
double precision
smallest normalized exponent field E = 1, bias = 1023
E = e + bias
1 = e + 1023
e = -1022
+/- 1.0 x 2^-1022
if e = -1023, -1024, ... then store as a
denormalized number
floating point addition / subtraction : let's say
A = ....... x 2^-2
and B = ...... x 2^-3
the powers must be the same before the mantissas can be added.
(i) choose the number with the smaller exponent and shift its mantissa right a number of
steps equal to the difference in exponents.
(ii) set the exponent of the result equal to the larger exponent.
(iii) perform addition / subtraction on the mantissas and determine the sign of the result.
example:
show the IEEE 754 binary representation of the number -0.75 (base 10) in single and double
precision
-0.75 = -0.11 x 2^0 = -1.1 x 2^-1
S = 1
e = -1
single precision
bias = 2^(8-1) - 1 = 128 - 1 = 127
E = e + bias = -1 + 127 = 126
E = 126 (01111110)
double precision
bias = 1023
E = e + bias = -1 + 1023 = 1022 (01111111110)
A : 9.999 x 10^1
B : 1.610 x 10^-1
align B to the larger exponent :
9.999 x 10^1
+ 0.016 x 10^1
= 10.015 x 10^1
normalized : 1.0015 x 10^2
example :
(i) add the exponents and subtract 127 :
259 - 127 = 132
E : range is 1 to 254, hence 132 is allowed
(ii) multiply the mantissas and determine the sign of the result.
1.110 (3 digits after the point)
x 9.200 (3 digits after the point)
= 10.212000 (6 digits after the point in total)
10.212 x 10^5
normalized : 1.0212 x 10^6
the addition or subtraction of 127 in the multiply and divide rules results from using excess-127
notation for exponents.
instruction cycle
sub cycle -
(i) fetch cycle : to fetch the instruction from memory to CPU
PC -> MAR, MAR -> Memory, Memory -> MBR, MBR -> IR
objective of micro operations : how do execution, the data path, registers, MUX, common bus and
system bus work here, and how are control signals
implemented? how, what, when and why?
Primary/main memory
2. Memory
secondary/auxiliary memory
Input device
3. I/O
Output device
cpu organisation
(i) registers
(ii) ALU
(iii) control unit
#memory address register : stores the address of the memory location used for a read/write operation.
why MAR/AR?
because it is connected to the address lines of the system bus; it holds the address, i.e. where
to go in memory.
why MBR/DR/MDR(data)?
connected to the data lines of the system bus
Rin : the content of the bus is loaded into the register (coming in)
Rout : the content of the register is placed on the bus (going out)
8. Accumulator (AC)
Program counter : when an instruction has been fetched (fetch cycle executed), the PC holds the starting
address of the next instruction.
15 16bit 0 15 16bit 0
TR DR
7 0 7 0 15 16bit 0
OUTR INPR AC
ALU (arithmetic logic unit) : the ALU is hardware that performs arithmetic and logical operations,
condition checking, etc.
(or)
it performs multiple operations
CU (control unit) : generates timing signals and control signals; it decides how and what each component does.
with 4 components (ALU, register, PSW, memory) :
4C2 = 6 connections. connecting all four to each other needs 6 connections.
q. consider if we have 16 registers, 1 memory, 1 ALU, 1 PSW and 1 other component (total 20
components) then total connections required?
20C2 = 190 connections.
so the solution is, instead of using 190 connections, connect all components to a common
bus (internal bus). which components communicate at a given time? for that control signals are required.
common bus using multiplexers :
(i) the number of multiplexers required = size of register (# of bits in register)
(ii) size of multiplexer required = number of registers
q. if we have m registers and each register size is n bits then what is the number of mux and size
of mux?
(i) the number of multiplexers required = n (size of register (# of bits in register))
(ii) size of multiplexer required = m x 1 (number of registers)
q. if we have 32 registers and each register size is 8 bits then what is the number of mux and
size of mux?
(i) the number of multiplexers required = 8 (size of register (# of bits in register))
(ii) size of multiplexer required = 32 x 1 (number of registers)
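The two counting rules, and the pairwise-connection count that motivates the common bus, fit in a few lines. A minimal sketch with hypothetical function names; the select-line count is not stated in the notes and is added here as the usual ceil(log2 m) figure for an m x 1 multiplexer.

```python
import math


def pairwise_connections(components):
    """Point-to-point links needed to connect every pair of components (nC2)."""
    return math.comb(components, 2)


def common_bus_muxes(m, n):
    """Multiplexers for a common bus over m registers of n bits each."""
    return {
        "mux_count": n,                        # one mux per bit position
        "mux_size": f"{m} x 1",                # each mux chooses among m registers
        "select_lines": (m - 1).bit_length(),  # assumption: ceil(log2(m)) select inputs
    }


print(pairwise_connections(4), pairwise_connections(20))
print(common_bus_muxes(32, 8))
```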
s1 s0 : register selected
register A to register B : RB <- RA
process : the content (data) of register A is given to the MUX, the MUX places it on the common
bus, and the data is then loaded into register B from the common bus.
S1 S0 = 0 0 : selects A (bits a1 a2 a3 a4), RA out = 1
S1 S0 = 0 1 : selects B
RB in = 1
when Rout is set to 1 the respective register's data is passed to the MUX and from the MUX to the common bus.
the common bus is connected to all registers; the register whose Rin is set to 1
is loaded with the bus data.
RA to RB : RA out, RB in
with select 00 [A] and RA out = 1, RA's data goes to the MUX and then to the common bus; with RB in = 1 the
common bus content is loaded into register B.
working of computer
[figure : basic computer registers and memory connected to a common bus]
instruction format : OPCODE 4bit | address 12bit
memory : 4096 words, 16 bits per word (4096 x 16 = 2^12 x 16)
PC : 12bit (11 - 0), AR : 12bit (11 - 0)
IR, TR, DR, AC : 16bit (15 - 0)
OUTR, INPR : 8bit (7 - 0)
the registers have LD / INC / CLR control inputs and a clock; AC is fed by the adder and logic circuit (with E)
bus select lines : S2 S1 S0
total components on the bus : 7
memory : 7
AR : 1
PC : 2
DR : 3
AC : 4
IR : 5
TR : 6
S2 S1 S0 enables
0 1 0 2 (PC)
1 0 1 5 (IR)
in memory we have 'n' locations, so which memory address has its content (data) loaded onto the bus? it is
given by AR (MAR), and the MAR gets the address from the PC.
111 : 7 (memory)
010 : 2 (PC); the PC is enabled and its content is placed on the common bus; the Load (IN) input of AR
(MAR) is set to 1 (active), so AR (MAR) gets the memory address.
components used in the fetch :
- memory
- PC
- IR
fetch cycle :
T1 : MAR <- PC (PCout, MARin)
T2 : Memory <- MAR (address lines, memory read control line)
T2 : MBR <- Memory
T3 : IR <- MBR (the fetched instruction is loaded into the IR; from the IR it goes to the decoder)
the address goes from the PC to AR and from there to memory; the content comes from memory to the MBR and
from the MBR into the IR. this path is what we call the ALU data path.
Rin : the content of the bus is loaded into the register (coming in)
Rout : the content of the register is placed on the bus (going out)
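The T1 to T3 steps can be traced with a toy register set. This is a sketch only: the function names and the dictionary-based memory are hypothetical, the sample word (1020)H at address (0064)H is borrowed from the example in these notes, and the 4-bit opcode / 12-bit address split follows the instruction format shown for the basic computer.

```python
def fetch(memory, pc):
    """Run the fetch micro-operations and return the register contents."""
    regs = {"PC": pc, "MAR": None, "MBR": None, "IR": None}
    regs["MAR"] = regs["PC"]            # T1 : MAR <- PC
    regs["MBR"] = memory[regs["MAR"]]   # T2 : MBR <- M[MAR] (memory read)
    regs["IR"] = regs["MBR"]            # T3 : IR <- MBR
    regs["PC"] += 1                     # PC is incremented at the end of the fetch cycle
    return regs


def decode(ir):
    """Split a 16-bit instruction into a 4-bit opcode and a 12-bit address."""
    return ir >> 12, ir & 0x0FFF


memory = {0x0064: 0x1020}
regs = fetch(memory, 0x0064)
print({name: hex(value) for name, value in regs.items()})
print(decode(regs["IR"]))
```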
(i) micro instruction/operation : the fetch cycle, execute cycle and interrupt cycle each consist of small
operations [x operations]
computer -> programs -> instruction/data -> instruction cycle -> sub cycles (fetch, execute, interrupt)
each sub cycle is made of very small operations, which
we call micro operations; they are carried out by control signals,
and the control signals come from the control unit.
(ii) micro refers to the fact that each step is very simple and accomplishes very little.
each instruction is executed during an instruction cycle made up of shorter sub cycles (fetch,
indirect, execute, interrupt).
the execution of each sub cycle involves one or more shorter operations (micro-operations).
the micro program is what carries out the work done at the hardware level.
instruction cycle
(ii) a register must not be read and written at the same time.
if different components are involved then the operations can be performed in a single cycle (no conflict).
example with PC = (0064)H and M[0064] = (1020)H :
(ii) PC to MAR : MAR = 0000 0000 0110 0100
(iv) at (0064)H we find (1020)H and it goes into the MBR : MBR = 0001 0000 0010 0000
(v) from the MBR it goes into the IR
at the end of the fetch cycle the instruction has been fetched from memory into the CPU (IR) register.
- decode the instruction (analysis of the instruction) [what is the opcode, how many operands, what is the instruction asking for]
LOAD 4000 : put whatever is available at memory location 4000 into the accumulator
AF (4000) will be given to the MAR
AC <- M[4000]
I1 : Load [4000]
(operand fetch) : the address and the data of [4000] move with the help of the system buses
(a) ID stage : enable the hardware to perform the operation (instruction decode / analysis)
(b) OF stage : AM's (addressing modes) are required to access the operand (operand fetch)
why ALU ? because when data of this kind has to be processed we need the ALU to
perform the operations.
i2 : ADD R1 X : R1 <- R1 + M[X]
direct AM : X is the EA; the location X must first be given to the MAR
load @4000 : 4000 is the address of the effective address
AC <- M[M[4000]]
AC <- M[EA]
operand fetch for LOAD @4000 : the MAR drives the address lines (AL), the MBR receives the data over the
data lines (DL) in binary, and the control line (CL) carries RD (read); the operand then goes to AC / ALU
(in the diagram M[4000] = 100 and M[100] = 11, so AC <- 11)
then data process and result store (memory)
q. R0 <- R1 + R2
T1 : T1 <- R1 ; R1 out, T1 in
T2 : T2 <- R2 ; R2 out, T2 in
T3 : AC <- T1 + T2 (ALU)
T4 : R0 <- AC ; AC out, R0 in
instruction given : R0 <- R1 + R2
fetch cycle :
step (i) 3. PC r, MAR w, MEM r : MAR <- PC and memory read
step (ii) 5. MDR r, IR w : IR <- MDR
execute cycle :
step (iii) 2. R1 r, temp1 w : temp1 <- R1
step (iv) 1. R2 r, temp1 r, ALU add, temp2 w : temp2 <- R2 + temp1
step (v) 4. temp2 r, R0 w : R0 <- temp2
memory read
load [4000]
AC <- M[4000]
RD
done
#memory store concept :
M[6000] <- AC
M[6000] <- data (11)
M[6000] <- content of accumulator
interrupt cycle : fetch cycle -> execute cycle -> check interrupt; if no, go on to the next fetch cycle, otherwise
service the interrupt.
interrupt : an unusual event that disturbs normal program execution.
when the CPU encounters an interrupt it pushes the PC (program counter) value onto the stack as the
return address, and control is transferred to the ISR (interrupt service routine).
the nature of this cycle varies greatly from one machine to another
step (i) content of PC are transferred into MBR so that they can be saved for
return from the interrupt.
stack
step (ii) then the MAR is loaded with the address at which the contents of PC are
to be saved, and the PC is loaded with the address of the start of the
interrupt processing routine.
step (iii) once this is done, the final step is to store the MBR, which contains the
old value of the PC, into memory
step (iv) the processor is now ready to begin the next instruction cycle.
[figure : SP -> MAR -> address lines (AL) : the stack pointer supplies the top of stack (tos) address; the WR (write)
control signal on the control line (CL) writes to the stack; the vector address is loaded into the PC]
interrupt vector table : a data structure that associates a list of interrupt handlers with a list of
interrupt requests in a table of interrupt vectors
micro program
the ISR address is given to the PC to service the interrupt; once the service is completed,
RETI/IRET returns from the interrupt and the PC value is POPped from the stack.
(i) by reducing the operation of the processor to its most fundamental level we are able
to define exactly what it is that the control unit must cause to happen.
control unit :
the control unit is the supervisor of the system that controls each and every activity.
the control unit takes several inputs and produces control signals :
(i) control signals are implemented in the control unit (the control unit generates the control signals).
(ii) control signals are required to execute the micro operations.
(iii) a micro operation is the elementary operation in the hardware.
(iv) the control unit generates the sequence of control signals.
(v) control signals are directly executed on the base hardware (H/W),
so the hardware generates the desired response.
the function of a computer system is program execution.
recap : the operand is in memory, so we studied addressing modes (there are 11
types of addressing modes); to see how data is brought in we studied the working of
registers and MUX (ALU data path); for syntax and format we studied
micro operations; for the fetch cycle we studied the micro program; and execution
itself is driven by the control unit through control signals, which are generated from
control words.
[figure : microprogrammed control unit : instruction register -> decoder -> sequencing logic -> control address
register -> control memory (read) -> control signals within CPU and control signals to system bus]
[figure : control unit inputs and outputs : inputs are the instruction register, flags, clock and control signals
from the control bus; outputs are control signals within the CPU and control signals to the control bus]
T1 : MAR <- PC (or) PC -> MAR : C2
T3 : IR <- (MBR) : C4
T1 : MAR <- IR (address) : C8
T3 : IR (address) <- MBR (address) : C4
T1 : MAR <- PC : C1
whatever component we have (register, ALU, MUX, bus, etc) we use control word for it.
PCin PCout MARin MARout IRin IRout MBRin MBRout ACin ACout MUX ALU GPR SPin SPout
0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
C2 : one
micro
operation
PCin PCout MARin MARout IRin IRout MBRin MBRout ACin ACout MUX ALU GPR SPin SPout
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
C5 : one
micro
operation
PCin PCout MARin MARout IRin IRout MBRin MBRout ACin ACout MUX ALU GPR SPin SPout
0 0 0 0 1 0 1 1 0 0 0 0 0 0 0
C2 : PCout, MARin (PC -> MAR)
C5 : memoryout, MBRin (MEM -> MBR)
C4 : MBRout, IRin (MBR -> IR)
C7 : ACout, ALUin
C9 : ACin, ALUout
design questions :
how many micro operations are required for each instruction [T1, T2, T3...]?
which control signals are required for each micro operation of each instruction?
T3 : MBR <- M(MAR)
T3 : AC/ALU <- MBR
working : the control unit generates the control signals. at control unit design time, the
designer decides which control signals are generated in which cycle (T1, T2, T3..)
of each instruction, and that is stored in a table.
control signals are implemented in the control unit using the following approaches :
control unit
(i) hardwired control unit : in a hardwired control unit, control signals are expressed in
S.O.P (sum of products) form and they are directly realized in
hardware.
- the hardwired control unit uses a fixed logic circuit to interpret the instruction and then
generate the control signals.
[figure : AND / OR gate network combining timing signals (T3, T5) and BR (branch) to produce yin]
Bout = T1.I1 + T1.I2 + T1.I3 + T3.I1 + T3.I2 + T3.I3
Bout = T1.(I1 + I2 + I3) + T3.(I1 + I2 + I3)
since (I1 + I2 + I3) = 1 :
Bout = T1 + T3
S5 = T1.I1 + T1.I2 + T1.I3 + T1.I4 + T3.I2 + T3.I4
S5 = T1.(I1 + I2 + I3 + I4) + T3.(I2 + I4)
S10 = T2.I2 + T2.I3 + T3.I4 + T4.I1 + T4.I3 + T5.I2 + T5.I4
S10 = T2.(I2 + I3) + T3.I4 + T4.(I1 + I3) + T5.(I2 + I4)
since (I1 + I2 + I3 + I4) = 1 :
S5 = T1 + T3.(I2 + I4)
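The Bout simplification can be checked by brute force. A small sketch, assuming (as the step (I1 + I2 + I3) = 1 implies) that exactly one of I1, I2, I3 is active at a time; the function names are hypothetical.

```python
from itertools import product


def bout_sop(t1, t3, i1, i2, i3):
    """Bout written as the full sum of products."""
    return ((t1 and i1) or (t1 and i2) or (t1 and i3)
            or (t3 and i1) or (t3 and i2) or (t3 and i3))


def bout_simplified(t1, t3):
    """Bout after using I1 + I2 + I3 = 1."""
    return t1 or t3


# one-hot instruction decode: exactly one of I1, I2, I3 is true
one_hot = [(True, False, False), (False, True, False), (False, False, True)]
same = all(
    bool(bout_sop(t1, t3, *instr)) == bool(bout_simplified(t1, t3))
    for t1, t3 in product([False, True], repeat=2)
    for instr in one_hot
)
print(same)
```

If no instruction line were active, the two forms would differ, which is why the simplification depends on I1 + I2 + I3 = 1.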
(ii) microprogrammed control unit : in a microprogrammed control unit, control words are stored in
the control memory;
then according to the type of operation, control signals are
generated.
control memory is associated with the CAR (control address
register) and the CDR (control data register), which hold the control
memory address and data respectively.
control unit
control signals
hardwired control unit
0 : disable
1 : enable
S 3 S 2 S1 S0
base hardware
(3) design a horizontal micro instruction for control signal (cs) [S 0 S2]
control field
BC flag 0 1 0 1 CM address
S 3 S 2 S1 S 0
base hardware
00 : S0 control field
01 : S1
10 : S2 decoder
11 : S3
S0 S 1 S2 S 3
(iii) design a vertical microinstruction for control signals : [S0 S2]
2 bit
BC flag control field CM address
FC1 FC2
00(S0) 10(S2)
the function code (FC) is generated by the control unit to signal the CPU to perform an operation
(iv)operational state :
BC flag 00 10 CM address
00 : S0
00 : S0
10 : S2 decoder
S 3 S 2 S1 S 0 S 3 S 2 S1 S 0
horizontal microprogramming : (i) control signals are expressed in decoded format.
vertical microprogramming : (i) control signals are expressed in encoded format.
note: the default microprogrammed control unit is the vertical microprogrammed control unit
used in CISC
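A small sketch of the decoded versus encoded control field, assuming the four signals S0 to S3 used in the design examples above. The helper names `horizontal_field`, `vertical_codes` and `decode` are made up for illustration.

```python
from math import ceil, log2

SIGNALS = ["S0", "S1", "S2", "S3"]

def horizontal_field(active):
    # Horizontal: one bit per control signal, written in the order S3 S2 S1 S0.
    return "".join("1" if s in active else "0" for s in reversed(SIGNALS))

def vertical_codes(active):
    # Vertical: each active signal is stored as an encoded function code
    # of ceil(log2(number of signals)) bits.
    width = ceil(log2(len(SIGNALS)))
    return [format(SIGNALS.index(s), f"0{width}b") for s in active]

def decode(code):
    # The decoder turns a function code back into the signal it enables.
    return SIGNALS[int(code, 2)]

print(horizontal_field({"S0", "S2"}))   # bits for S3 S2 S1 S0
print(vertical_codes(["S0", "S2"]))     # one 2 bit code per signal
```

The horizontal word needs no decoder but grows with the number of signals; the vertical word is shorter but needs a decoder per function code.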
125 - 22 = 103
32 branch : 5 bit
AF/NIA/CAR : 20bit
16 flag : 4bit
G1 G2
vertical horizontal
(none/one) (none/one)
400 cs 6 cs
9bit 6bit
125bit 10bit
(i) cycle (T1, T2, ..., Tn) : a cycle is one clock pulse transition, measured either from
rising edge to rising edge or from falling edge to falling edge.
(ii) cycle time : the time taken by one such transition (rising edge to rising
edge, or falling edge to falling edge) is called the cycle time
cycle time ∝ 1 / clock frequency
my computer properties :
64 bit processor (word length = 64 bit)(operations performed on 64 bits)
8GB RAM (AL)
1TB hard disk (2^40 byte hard disk)
clock generator --clk--> processor (CPU)
(the clock pulse comes out of the clock generator)
cycle time = 1 / clock frequency
example : cycle time = 1 / 1GHz = 1 nsec
some conversions :
2^10 = 1K ≈ 10^3 ≡ 1 milli second = 10^-3 second
2^20 = 1M ≈ 10^6 ≡ 1 micro second = 10^-6 second
2^30 = 1G ≈ 10^9 ≡ 1 nano second = 10^-9 second
2^40 = 1T ≈ 10^12
2^50 = 1P
2^60 = 1E
q. a CPU has a 1 GHz processor and program P1 has 100 instructions, each taking 5
cycles. what is the program execution time?
(i) in cycle
(ii) in time
1GHz processor
cycle time : 1 nano second
q. a CPU has a 2 GHz processor and program P1 has 100 instructions, each taking 5
cycles. what is the program execution time?
(i) in cycle
(ii) in time
2GHz processor
cycle time : 0.5 nano second
q. a CPU has a 4 GHz processor and program P1 has 100 instructions, each taking 5
cycles. what is the program execution time?
(i) in cycle
(ii) in time
4GHz processor
cycle time : 0.25 nano second
note :
- number of instructions : instruction count (IC)
- CPI : cycles per instruction
q. a CPU has a 1 GHz processor and program P1 has 100 instructions, each taking 5
cycles. what is the program execution time?
(i) in cycles
(ii) in time
1GHz processor
cycle time : 1 nano second
ET(P1) = IC x CPI x cycle time
ET(P1) = 100 instructions x 5 cycles x 1 nano second
(i) in cycles : 100 x 5 = 500 cycles
(ii) in time : 500 x 1 nsec = 500 nsec
q. consider a CPU with a clock frequency (rate) of 400MHz. if the CPU has an average CPI of 6,
then what is the average execution time of an instruction?
cycle time = 1 / (400 x 10^6) = 2.5 x 10^-9 sec
= 2.5 nsec
average execution time = CPI x cycle time = 6 x 2.5 = 15 nsec
q. consider a CPU operating at (running at) an 800MHz clock rate and executing a program consisting of
4000 instructions. if each instruction has a CPI of 5, what is the total program execution time (CPU
time)?
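The questions above all use the same relation, CPU time = IC x CPI x cycle time. A minimal sketch follows; the function name `cpu_time` is an illustrative choice, and the numbers are the ones given in the questions.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """Return (total cycles, execution time in nanoseconds)."""
    cycle_time_ns = 1e9 / clock_hz          # cycle time = 1 / clock frequency
    total_cycles = instruction_count * cpi  # IC x CPI
    return total_cycles, total_cycles * cycle_time_ns

# 100 instructions at 5 cycles each on 1, 2 and 4 GHz processors.
for ghz in (1, 2, 4):
    print(ghz, "GHz :", cpu_time(100, 5, ghz * 1e9))

# 4000 instructions, CPI 5, 800 MHz clock.
print("800 MHz :", cpu_time(4000, 5, 800e6))
```

The cycle count does not depend on the clock; only the time per cycle shrinks as the frequency goes up.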
note :
super computer : FLOPS
floating point operations per second
- aryan in 10 hours
- vipul in 4 hours
old design :
16.1 nsec
3.22 nsec
s=5
instruction pipelining
non-pipelining : a new input is accepted only after the old input has completed, i.e. a new
input is accepted only after the previously accepted input appears as output at the other end. in non-pipelining,
execution is non-overlapping.
(output end : s4, input end : s1)
s4                 i1          i2          i3          i4
s3              i1          i2          i3          i4
s2           i1          i2          i3          i4
s1        i1          i2          i3          i4
cycle     1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
everything is controlled by a common clock pulse
s : stage          total 16 cycles
pipelining : pipelining is a mechanism used to improve the performance of the system, in
which tasks (instructions) are executed in an overlapping (parallel) manner.
- pipelining is a decomposition technique : the problem is divided into sub-problems, each
sub-problem is assigned to a pipe, and all the pipes operate under the same clock
pipelining means accepting a new input at one end before the previously accepted input appears as output
at the other end. this means new inputs are executed along with old inputs in an overlapping manner (in a
pipeline)
instruction : i1 i2 i3 i4
stage: s1 s2 s3 s4
example : pipeline
box 1 box 2 box 3 box 4
input output
stage 4 box 4
stage 3 box 4 box 3 ......
stage 2 box 4 box 3 box 2 ......
stage 1 box 4 box 3 box 2 box 1 ......
1 2 3 4
new inputs are accepted at one end before previously accepted inputs
appear as output at the other end
example : non-pipeline
box 1 box 2 box 3 box 4
input output
stage 4 box 4
stage 3 box 4
stage 2 box 4
stage 1 box 4
1 2 3 4
box 1 box 2
input output box 3 box 4
stage 4 box 3
stage 3 box 3
stage 2 box 3
stage 1 box 3
5 6 7 8
output
stage = segment
every pipeline has 2 ends, an input end and an output end, and between these
ends multiple pipes are interconnected to make the pipeline function
- these pipes are called stages (or) segments
- between the stages, 'buffers' are used to store the intermediate results.
- these buffers are called pipeline registers (or) interface registers (or) buffers (or) latches
- all the stages along with the buffers are controlled (or) connected by a common clock.
S1's data is being passed to S2, but what if S2 is not free yet? that is why there is a buffer
between the stages: S1's output is moved out of S1 into the buffer,
so that S1 becomes free and I2 can enter it.
ETpipelining = [k+(n-1)] tp
k : no of stages (segments)
n : no of instructions
tp : each stage delay in pipeline.
(output end : s4, input end : s1)
s4                 i1 i2 i3 i4
s3              i1 i2 i3 i4
s2           i1 i2 i3 i4
s1        i1 i2 i3 i4
cycle     1  2  3  4  5  6  7
k : no of stages (segments) = 4
n : no of instructions = 4
tp : each stage delay in pipeline (assume) = 1 cycle
ETpipelining = [4 + (4-1)]1
ETpipelining = [4 + 3]1
ETpipelining = 7 cycle.
ETnonpipeline = n x tn
n : no of instruction
tn : each instruction execution time in non-pipeline
(output end : s4, input end : s1)
s4                 i1          i2          i3          i4
s3              i1          i2          i3          i4
s2           i1          i2          i3          i4
s1        i1          i2          i3          i4
cycle     1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
n : no of instruction = 4
tn : each instruction execution time in non-pipeline = 4
ETnonpipeline = 4 x 4
ETnonpipeline = 16 cycle.
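The two execution time formulas can be checked with a short sketch. The function names are illustrative; the inputs are the k = 4, n = 4 example worked above with a stage delay of 1 cycle.

```python
def et_pipeline(k, n, tp):
    # ETpipelining = [k + (n - 1)] x tp
    return (k + (n - 1)) * tp

def et_nonpipeline(n, tn):
    # ETnonpipeline = n x tn
    return n * tn

k, n, tp = 4, 4, 1
tn = k * tp  # each instruction passes through all 4 stages one after another
print(et_pipeline(k, n, tp))   # cycles with overlap
print(et_nonpipeline(n, tn))   # cycles without overlap
```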
performance ∝ 1 / ET
performance gain (speed up factor) (s) = (1/ETpipeline) / (1/ETnon-pipeline)
performance gain (speed up factor) (s) = (n x tn) / ([k + (n-1)] tp)
when a large number of instructions is executed (or) the n value is not given (or) ideal case
example (k = 4) :
n = 4 : (s) = (4 x tn) / ([4 + (4-1)] tp)
n = 100 : (s) = (100 x tn) / ([4 + (100-1)] tp)
n = 10000 : (s) = (10000 x tn) / ([4 + (10000-1)] tp)
when all stages are perfectly balanced (uniform delay), the ET of 1 task in
non-pipelining is
ETnon-pipeline = k x tp
tn : k x tp (only when perfectly balanced)
example : k = 4, tp = 2 nsec
ETnon-pipelining = 4 x 2 = 8 nsec
performance gain (speed up factor) (s) = tn / tp
performance gain (speed up factor) (s) = (k x tp) / tp = k
in the ideal case, when the pipeline is perfectly balanced, the maximum speed up factor is equal to the
number of stages in the pipeline
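A quick numeric sketch of why the speed up factor tends to tn / tp (and to k when balanced) as n grows. The function name `speedup` is illustrative; k = 4 and tp = 2 nsec are taken from the example above, so tn = k x tp = 8 nsec.

```python
def speedup(k, n, tp, tn):
    # s = (n x tn) / ([k + (n - 1)] x tp)
    return (n * tn) / ((k + (n - 1)) * tp)

k, tp = 4, 2
tn = k * tp  # perfectly balanced pipeline
for n in (4, 100, 10000):
    print(n, round(speedup(k, n, tp, tn), 4))
```

The printed values rise toward 4 but never reach it, because the first instruction always needs k cycles to fill the pipeline.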
(i) uniform delay pipeline (perfectly balanced) : in a uniform delay pipeline each stage
takes the same amount of time (delay) to complete its assigned task.
s : stage (figure labels : 1ns 1ns 1ns 1ns)
pipeline :
k=4
n=1
tp : each stage delay in pipeline
tp : 2nsec
ETpipeline = [4+(1-1)]2
ETpipeline = 8nsec
non-pipeline :
n : 1
tn : each instruction execution time
tn : 2+2+2+2 = 8 nsec
ETnon-pipeline = 1 x 8
ETnon-pipeline = 8nsec
in uniform delay :
1 task ET in pipeline = 1 task ET in non pipeline
tp with buffer delay : 2 + 1 = 3 nsec
ETnon-pipeline = k x tp
tn : k x tp (only when perfectly balanced)
(ii) non-uniform delay pipeline : in a non-uniform delay pipeline each stage takes
a different amount of time to complete its assigned task
s : stage (figure labels : 1ns 1ns 1ns 1ns)
for n = 1 :
pipeline :
k=4
n=1
tp : max (stage delay)
tp : max (2,4,8,2)
tp : 8nsec
ETpipeline = [4+(1-1)]8
ETpipeline = 32nsec
non-pipeline :
n : 1
tn : each instruction execution time
tn : 2+8+4+2 = 16 nsec
ETnon-pipeline = 1 x 16
ETnon-pipeline = 16nsec
for n = 1000 :
pipeline :
k=4
n = 1000
tp : max (stage delay)
tp : max (2,4,8,2)
tp : 8nsec
ETpipeline = [4+(1000-1)]8
ETpipeline = 8024nsec
non-pipeline :
n : 1000
tn : each instruction execution time
tn : 2+8+4+2 = 16 nsec
ETnon-pipeline = 1000 x 16
ETnon-pipeline = 16000 nsec
tp with buffer delay : 9nsec
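A sketch of the non-uniform case, assuming tp is the largest stage delay plus an optional buffer delay and tn is the sum of the stage delays. The function name `pipeline_times` and the `buffer_delay` parameter are illustrative.

```python
def pipeline_times(stage_delays, n, buffer_delay=0):
    """Return (ETpipeline, ETnon-pipeline) in the same unit as the delays."""
    k = len(stage_delays)
    tp = max(stage_delays) + buffer_delay  # slowest stage sets the clock
    tn = sum(stage_delays)                 # no buffers in the non-pipeline
    return (k + (n - 1)) * tp, n * tn

delays = [2, 8, 4, 2]  # nsec, from the example above
print(pipeline_times(delays, 1))      # one task: the pipeline is slower
print(pipeline_times(delays, 1000))   # many tasks: the pipeline is faster
```

For a single task the pipeline pays the maximum stage delay four times, which is why it loses; the overlap only pays off once n is large.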
when the pipeline is perfectly balanced, both take the same time, but when the delay is non-uniform, the times
are different
important points:
(i) when the pipeline is perfectly balanced (uniform delay), the ET of 1 task in pipelining is the same as the
ET of 1 task in non-pipelining.
(ii) when the pipeline is not perfectly balanced (non-uniform delay), the ET of 1 task in the pipeline is greater
than the ET of 1 task in non-pipeline.
T1 ≥ T2
T1 : 1 task execution time in pipeline
T2 : 1 task execution time in non-pipeline
(iii) but as the number of tasks (instructions) increases, pipeline performance
is better than non-pipeline, with non-uniform delay (or) uniform delay.
(iv) buffer delay is included only in the pipeline; in non-pipeline the intermediate
results are not stored, so buffer delay is not included.
k=4
s = tn / tp
s = 75 / 30 = 2.5
η = s / k (η, eta : efficiency)
k=4
s = tn / tp
s = 41 / 22 = 1.86
n : 1000
ETpipeline = [k+(n-1)] tp
ETpipeline = [4+(1000-1)]165
ETpipeline = 1003 x 165 x 10^-9
ETpipeline = 165.5 x 10^-6
k=5
n : 100
ETpipeline = [k+(n-1)] tp
ETpipeline = [5+(100-1)]165
ETpipeline = 104 x 165nsec
ETpipeline = 17160 nsec
k=3
n : 100
ETpipeline = [k+(n-1)] tp
ETpipeline = [3+(100-1)]20
ETpipeline = 102 x 20
ETpipeline = 2040 nsec
in general : throughput means rate of output (the rate at which output is produced)
for n instructions :
ETpipeline = [k+(n-1)] tp
throughput : n / ([k+(n-1)] tp)
when the number of instructions is large (or) not given (or) in ideal case:
throughput : 1 / tp
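A short sketch of the throughput relation, assuming throughput is n instructions divided by the pipeline execution time [k + (n-1)] tp. The function name `throughput` and the sample values (k = 4, tp = 2 nsec) are illustrative.

```python
def throughput(k, n, tp):
    # instructions completed per unit of time over the whole run
    return n / ((k + (n - 1)) * tp)

k, tp = 4, 2  # 4 stages, 2 nsec per stage
for n in (4, 100, 1_000_000):
    print(n, throughput(k, n, tp))
print("ideal 1/tp :", 1 / tp)
```

As n grows the k term stops mattering and the value approaches 1 / tp, one instruction per clock.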
summary:
ETnonpipeline = n x tn
n : no of instructions
tn : each instruction execution time in non-pipeline
ETpipeline = [k + (n-1)] tp
k : no of stages (segments)
n : no of instructions
tp : each stage delay in pipeline.
in non-uniform : tp : max (stage delay + buffer delay)
throughput : 1 / tp
η = s / k
η (eta) : efficiency
s : speed up factor
k : number of stages
performance gain (speed up factor) (s) = performance of pipe / performance of non-pipe
performance gain (speed up factor) (s) = ETnon-pipeline / ETpipeline
performance gain (speed up factor) (s) = (n x tn) / ([k + (n-1)] tp)
when the number of instructions is large (or) not given (or) in ideal case:
performance gain (speed up factor) (s) = tn / tp
when perfectly balanced / ideal case:
tn = k x tp
tp = tn / k
s = k
s = 20
η = s / k (η, eta : efficiency)
80% = 20 / k
k = (20 x 100) / 80 = 25
k=4
in new design :
the largest stage is split into 2 stages of equal delay
s = performance of new / performance of old
s = ETold / ETnew
s = 80 / 50 = 1.6
tp = 50nsec
frequency = 1 / time
frequency = 1 / (50 x 10^-9)
frequency = (1/50) x 10^9
frequency = (1000/50) x 10^6 = 20MHz
old design
tp : max (stage delay)
tp = max(900, 600, 550, 450, 400) = 900nsec
1 instruction takes : 900 x 10^-9 sec
in one second : (1/900) x 10^9 instructions
throughput : 1/900 instruction per nsec
new design
tp : max (stage delay)
tp = max(440, 460, 600, 550, 450, 400) = 600nsec
throughput : 1/600 instruction per nsec
% of increment in throughput : ((1/6 - 1/9) / (1/9)) x 100 = ((9-6)/54) x (9/1) x 100 = 50%
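The throughput comparison can be sketched as below, assuming ideal throughput 1 / tp with tp set by the slowest stage. The function name `throughput_gain_percent` is illustrative; the stage delays are the ones listed for the old and new designs.

```python
def throughput_gain_percent(old_delays, new_delays):
    tp_old = max(old_delays)
    tp_new = max(new_delays)
    # (1/tp_new - 1/tp_old) / (1/tp_old) simplifies to tp_old/tp_new - 1
    return (tp_old / tp_new - 1) * 100

old = [900, 600, 550, 450, 400]        # nsec
new = [440, 460, 600, 550, 450, 400]   # the 900 nsec stage split in two
print(throughput_gain_percent(old, new))
```

Splitting the slowest stage helps only until the next slowest stage (600 nsec here) becomes the bottleneck.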
design D1
k=5
tp : max (stage delay)
tp = max(3, 2, 4, 2, 3) = 4nsec
ETpipeline = [k+(n-1)] tp
ETpipeline = [5+(100-1)]4
design D2
k=8
tp : 2nsec
ETpipeline = [k+(n-1)] tp
ETpipeline = [8+(100-1)]2
ETpipeline = 107 x 2
- nonpipelined processor
- operating at 100MHz
so cycle time : 1 / (100 x 10^6) = 1 / 10^8 = 10^-8 sec
= 10 x 10^-9 sec
k=5
tp : max (stage delay + buffer delay)
tp = max(2+0.5, 1.5+0.5, 2+0.5, 1.5+0.5, 2.5+0.5) = 3nsec
s = tn / tp
s = 10 / 3 = 3.33
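A sketch of this calculation, assuming tn is the cycle time of the non-pipelined processor and every stage carries the same buffer delay. The function name `speedup_with_buffers` is illustrative; the numbers are from the worked example above.

```python
def speedup_with_buffers(nonpipelined_clock_hz, stage_delays_ns, buffer_delay_ns):
    """Return (tn, tp, speed up factor), times in nanoseconds."""
    tn = 1e9 / nonpipelined_clock_hz                         # 1 / frequency
    tp = max(d + buffer_delay_ns for d in stage_delays_ns)   # slowest stage + buffer
    return tn, tp, tn / tp

tn, tp, s = speedup_with_buffers(100e6, [2, 1.5, 2, 1.5, 2.5], 0.5)
print(tn, tp, round(s, 2))
```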
q.(iv) how to set this CPI in uniform delay? and how to set this CPI in non-uniform
delay?
RISC : 5stages
if we want to construct an N stage pipeline, then the entire CPU is divided into 'N' functional units
that are independent of each other.
independent functional unit : while one functional unit performs its task, at the same
time the other functional units perform other tasks (operations).
(functional unit : adder, subtractor, logic gate, hardware etc)
1 2
- because when we enable the clock, the operation is performed (data
moves from one register to another register (or) any other task)
q.(iv) how to set this CPI in uniform delay and in non-uniform delay?
time : 22 nsec
frequency = 1 / time = 1 / (22 nsec)
(each clock cycle : 22 nsec)
in uniform delay :
WB                 i1 i2 i3
EX              i1 i2 i3
ID           i1 i2 i3
IF        i1 i2 i3
cycle     1  2  3  4  5  6
at
clock cycle (CC) 4 : i1 out
clock cycle (CC) 5 : i2 out
clock cycle (CC) 6 : i3 out
ET = 4 x 22 + (n-1) x 22
time : 22 nsec
frequency = 1 / time = 1 / (22 nsec)
Stage 1 will complete its work in 22 nanoseconds, and data movement from stage 1 to
stage 2 will occur within this period. However, for the remaining stages, tasks will not
be completed within this same 22 nanosecond clock cycle. Therefore, the maximum
stage delay is considered for the proper functioning of the pipeline.
maximum :
tp : 30nsec
30 30
nsec nsec
In stage 1, the task will finish in 22 nanoseconds, but the clock cycle is set to 30
nanoseconds. This means that every 30 nanoseconds, the output from each stage
will become available. Therefore, after one complete cycle of 30 nanoseconds, the
output from stage 1 will be passed on to stage 2, and this sequence will continue
for each subsequent stage.
so synchronization!
non-pipelined processor
cycle time = 0.4 nsec
k=4
ETnon-pipeline = n x tn
ETnon-pipeline = 4 x 0.4 = 1.6 nsec
pipelined processor
- operating at 2 GHz
so cycle time : 1 / (2 x 10^9) = (1/2) x 10^-9 sec
= 0.5 nsec
k=5
ETpipeline = 0.5 nsec
s = tn / tp
s = 1.6 / 0.5 = 3.2
20% more CPI in P2
ETp2 = cycle time x 1.2 CPI
frequency = 1 / time2
frequency = 1 / (0.625 x 10^-9) = 1.6 x 10^9 Hz = 1.6 GHz
for i = 2 i1 7 9 11 13
i2 9 10 13 14
i3 10 11 15 16
i4 12 13 17 18
s4 i1 i1 i2 i3 i4 i1 i1
s3 i1 i2 i2 i3 i3 i4 i4 i1
s2 i1 i1 i2 i3 i4 i1 i1
s1 i1 i2 i2 i3 i4 i4 i1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
s1 s2 s3 s4
for i = 1 i1 2 3 4 5
i2 3 6 8 10
i3 5 7 9 13
i4 6 9 11 15
for i = 2 i1 8 10 12 16
i2 9 13 15 18
i3 11 14 16 21
i4 14 16 18 23
[weightage : 3-4marks]
cache memory
memory hierarchy : the hierarchical design organizes the memory supported by the system into 4
levels to minimize the access time.
locality of reference
cache hit
if not then cache miss
hit ratio = number of hits / total number of accesses
if the cache hit ratio is 80%, that means 80% of the references are found in the cache.
topic : memory
(ii) if the reference (respective data) is found in the cache, it is called a cache hit
(the operation is called a hit), and the respective data is given from cache to CPU in the form of words.
(iii) if the reference is not found in the cache, it is called a cache miss, and the reference is
forwarded to main memory.
(iv) if the reference is found in main memory, it is called a main memory hit (page
hit), and the respective data is given from main memory to cache in the form of blocks and
from cache to CPU in the form of words.
(v) if the reference is not found in main memory, it is called a main memory miss
(page fault), and the reference is forwarded to secondary memory.
(vi) secondary memory is the last level of memory, in which the hit ratio is always '1', so
the respective data is transferred from secondary memory to main memory in the form of
pages, from main memory to cache memory in the form of blocks, then from cache to CPU in the
form of words.
mapping : the process of transferring the data from main memory to cache memory is called
mapping.
h : cache hit ratio tm : main memory access time
tc : cache access time
register blocks
words
cache main memory
CPU
Tavg = H x (time taken by memory when there is a hit) + (1-H) x (time taken by
memory when there is a miss)
hit + miss = 1
hit ratio = number of hits / total number of accesses
hit ratio = 90 / 100 = 0.9
Tavg = 0.9 x 20 + 0.1 x 150
Tavg = 18 + 15
Tavg = 33ns
(or)
Tavg = 0.75 x 20 + 0.25 x 150
Tavg = 15 + 37.5
Tavg = 52.5ns
(or)
hit : 80%
time taken : 5nsec
hit miss : 20%
time taken : 50nsec
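A minimal sketch of the average access time formula used above, where the hit and miss cases each have their own time. The function names `hit_ratio` and `tavg` are illustrative; the inputs are the three sets of numbers from the notes.

```python
def hit_ratio(hits, accesses):
    return hits / accesses

def tavg(h, t_hit, t_miss):
    # Tavg = H x (time on a hit) + (1 - H) x (time on a miss)
    return h * t_hit + (1 - h) * t_miss

print(tavg(hit_ratio(90, 100), 20, 150))  # 90 hits out of 100 accesses
print(tavg(0.75, 20, 150))
print(tavg(0.80, 5, 50))                  # hit 80% at 5 nsec, miss 20% at 50 nsec
```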
(i) simultaneous access memory organisation : all the levels of memory are directly connected to
the CPU (or) the CPU communicates with all the
levels of memory directly, but accesses them
in sequence
- when there is a miss in level 1 (l1) and a hit in level 2 (l2), the data is given directly from level
2 (l2) memory to the CPU without copying it into level 1 (l1) memory
- when there is a miss in level 2 (l2) and a hit in level 3 (l3), the data is given directly from level
3 (l3) memory to the CPU without copying it into level 2 (l2) and level 1 (l1) memory
. . .
. . .
ln hn tn
(ii) hierarchical access memory organisation : in hierarchical access the CPU
communicates only with level 1 (l1)
memory
- when there is a miss in level 1 (l1) and a hit in level 2 (l2), the data is first copied from
level 2 (l2) to level 1 (l1), then from level 1 (l1) to the CPU
- when there is a miss in level 1 (l1) and level 2 (l2) but a hit in level 3 (l3), the data is first
copied from level 3 (l3) to level 2 (l2), then level 2 (l2) to level 1 (l1), then level 1 (l1) to the CPU
in this organisation locality of reference is present (cache works on locality of reference).
n level
h1 t1, h2 t2, h3 t3, . . . , hn tn
CPU l1 l2 l3 ln
found in the first level
register blocks
words
cache main memory
CPU
3 level
h1 h2
register
words
l1 cache l2 cache main memory
CPU
t1 t2
tm
everything was found in level 1 itself, so there is no need to go to main memory.
(or)
Tavg = tc + (1-h)tm
Tavg = 2 + 0.2 (100)
Tavg = 2 + 20
Tavg = 22nsec
(or)
Tavg = tc + (1-h)tm
Tavg = tc + 0.2 (tm)
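A sketch showing where Tavg = tc + (1-h)tm comes from in hierarchical access: on a miss the CPU pays for the cache and then for main memory. The function name is illustrative; h = 0.8, tc = 2 nsec and tm = 100 nsec match the worked numbers above.

```python
def tavg_two_level(h, tc, tm):
    # hit: cache only; miss: cache access followed by main memory access
    long_form = h * tc + (1 - h) * (tc + tm)
    short_form = tc + (1 - h) * tm
    assert abs(long_form - short_form) < 1e-9  # the two forms are algebraically equal
    return short_form

print(tavg_two_level(0.8, 2, 100))
```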
2 level l2 (t2)
l1 (t1)
register blocks
words
cache main memory
CPU
hit ratio : h
(or)
(or)
(or)
l1 = t1
l2 = t2
performance of l1 / performance of l2 = 5
(1/t1) / (1/t2) = t2 / t1 = 5
t2 = 5t1
t1 = Tavg - 10
Tavg = T1 + 10
T1 = 20nsec
T2 = 5 x 20 = 100ns
Tavg = 20 + 10 = 30
Tavg = h x t1 + (1-h)t2
30 = h x 20 + (1-h) 100
30 = 20h + 100 -100h
70 = 80h
h = 70/80 = 0.875
Tavg = 30 + 10% of 30
Tavg(new) = 30 + 3
Tavg(new) = 33
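The derivation above rearranges Tavg = h x t1 + (1-h) x t2 for h. A minimal sketch follows; the function name `hit_ratio_from_tavg` is illustrative, and applying it to the new Tavg of 33 is an assumption about what the question asks next.

```python
def hit_ratio_from_tavg(tavg, t1, t2):
    # Tavg = h*t1 + (1 - h)*t2  =>  h = (t2 - Tavg) / (t2 - t1)
    return (t2 - tavg) / (t2 - t1)

t1 = 20
t2 = 5 * t1
print(hit_ratio_from_tavg(30, t1, t2))   # matches 70/80
print(hit_ratio_from_tavg(33, t1, t2))   # hit ratio if Tavg rises by 10%
```

A larger Tavg gives a smaller h, which agrees with hit ratio being inversely related to Tavg.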
hit ratio ∝ 1 / Tavg
fetched
this word
1 block = 32word
but only 1 word that is requested/demanded by the CPU given from cache
memory to CPU.
locality of reference : accessing the data of the higher levels of memory from level 1 memory is called
locality of reference. (wherever the data is, we take it from cache memory)
types :
(i) temporal LOR
(ii) spatial LOR
(i) temporal LOR : the same word in the same block is referenced by the CPU again in the near
future (frequently)
(or)
the same data is accessed again and again; this kind of access shows
temporal LOR
1 block = 32word
(ii) spatial LOR : the adjacent word in the same block is referenced by the CPU in
a sequence.
x x+1
1 block = 32word
the CPU always accesses the data from the cache (faster/level one) memory. if there is a miss in
level one (cache) memory and a hit in main memory (level 2 memory), then one complete
block is transferred from l2 memory to l1 memory, and the addressed word
(requested/demanded by the CPU) is given from level 1 (cache) to the CPU.
l2
l1
block :
words : 32
register (block)
1 word word
cache main memory
CPU TB
t1
t2
3 level
(or)
Tavg = h1 t1 +m1 h2 (tb1+t1) + m1m2(tb2+tb1+t1)
m1 : miss in level 1
m2 : miss in level 2
Tavg = t1
h2 = 100% = 1
h1 = 0
Tavg = h1t1 + (1-h1)h2(t2+t1)
Tavg = 0xt1 + (1-0)1(t2+t1)
Tavg = 1(t2+t1)
Tavg = t2+t1
h3 = 1
h2 = 0
h1 = 0
Tavg = h1t1 + (1-h1)h2(t2+t1) + (1-h1)(1-h2)(t3+t2+t1)
Tavg = 0xt1 + (1-0)0(t2+t1) + (1-0)(1-0)(t3+t2+t1)
Tavg = 1x0(t2+t1) + (1)(1)(t3+t2+t1)
Tavg = t3+t2+t1
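The two-level and three-level formulas above follow one pattern: a hit at level i costs t1 + ... + ti and happens only after missing every earlier level. A sketch under that assumption; the function name `tavg_hierarchical` and the sample times are illustrative.

```python
def tavg_hierarchical(hit_ratios, access_times):
    total = 0.0
    miss_so_far = 1.0   # probability of having missed all earlier levels
    time_so_far = 0.0   # t1 + t2 + ... up to the current level
    for h, t in zip(hit_ratios, access_times):
        time_so_far += t
        total += miss_so_far * h * time_so_far
        miss_so_far *= (1 - h)
    return total

t = [2, 20, 200]  # illustrative t1, t2, t3
print(tavg_hierarchical([1, 1, 1], t))   # everything hits in level 1
print(tavg_hierarchical([0, 1, 1], t))   # always found in level 2
print(tavg_hierarchical([0, 0, 1], t))   # always found in level 3
```

For block-based transfers the same loop applies with TB1, TB2 in place of t2, t3.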
Tavg = t1
h2 = 100% = 1
h1 = 0
Tavg = h1t1 + (1-h1)h2(TB1+t1)
Tavg = 0xt1 + (1-0)1(TB1+t1)
Tavg = 1(TB1+t1)
Tavg = TB1+t1
h3 = 1
h2 = 0
h1 = 0
Tavg = h1t1 + (1-h1)h2(TB1+t1) + (1-h1)(1-h2)(TB2+TB1+t1)
(we calculate T1, T2 and Tavg for one word, but here the block
that arrives will be n words; in T1 only one word comes, but in T2
4 words come)
if block size is n words : TB = n x T2
TB = 4 x 250 = 1,000 nsec
(or)
Tavg = T1 + (1-h)TB
Tavg = 30 + (1-0.9)1000
Tavg = 30 + 0.1 x 1000
Tavg = 30 + 100
Tavg = 130nsec.
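A sketch of the block transfer calculation above, assuming a block of n words costs n times the per-word time of level 2. The function names are illustrative; n = 4, T2 = 250 nsec, T1 = 30 nsec and h = 0.9 are the values from the worked example.

```python
def block_transfer_time(n_words, t2):
    # TB = n x T2
    return n_words * t2

def tavg_with_block(t1, h, tb):
    # Tavg = T1 + (1 - h) x TB
    return t1 + (1 - h) * tb

tb = block_transfer_time(4, 250)
print(tb)
print(tavg_with_block(30, 0.9, tb))
```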
= (1000/14) x 10^6 words/sec
cache miss : 20ns for first word and 5ns for remaining words
word size : 64bits = 8bytes
words in a block : block size / word size = 256B / 8B = 32 words
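A small sketch of the block size arithmetic. It assumes "5ns for remaining words" means 5 nsec for each remaining word in the block, which the notes do not state explicitly; the function names are illustrative.

```python
def words_per_block(block_size_bytes, word_size_bits):
    word_size_bytes = word_size_bits // 8
    return block_size_bytes // word_size_bytes

def block_fetch_time_ns(n_words, first_word_ns, next_word_ns):
    # Assumption: the first word costs first_word_ns, every later word next_word_ns.
    return first_word_ns + (n_words - 1) * next_word_ns

n = words_per_block(256, 64)
print(n)
print(block_fetch_time_ns(n, 20, 5))
```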
if block size is n words : TB = n x T2
2mbps
cycle time = 1 / (60 x 10^6) sec
12 x 1 / (60 x 10^6) sec