NMAM Institute of Technology: Department of Computer Science and Engineering
1) You have a system that contains a special processor for doing floating-point operations. You have
determined that 50% of your computations can use the floating-point processor. The speedup of the
floating-point processor is 15.
a) What is the overall speedup achieved by using the floating-point processor?
b) What is the overall speedup achieved if you modify the compiler so that 75% of the
computations can use the floating-point processor?
c) What fraction of the computations should be able to use the floating-point processor in order to
achieve an overall speedup of 2.25?
Solution:
a) Overall speedup achieved by using the floating-point processor:
F = 0.5, S = 15
Speedup = 1/[(1-0.5) + (0.5/15)] = 1.875
b) Overall speedup achieved if you modify the compiler so that 75% of the computations can use
the floating-point processor:
F = 0.75, S = 15
Speedup = 1/[(1-0.75) + (0.75/15)] = 3.33
c) Fraction of the computations that should be able to use the floating-point processor in order to
achieve an overall speedup of 2.25:
F = ?, S = 15
2.25 = 1/[(1-F) + (F/15)], so (14/15)F = 1 - 1/2.25 = 0.556 and F = 0.595 (about 59.5%)
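All three parts apply Amdahl's law, Speedup = 1/[(1-F) + (F/S)], the same form used in problem 4. The sketch below is a minimal numerical check, not part of the original solution; the function names amdahl_speedup and fraction_for_speedup are illustrative.

```python
def amdahl_speedup(fraction, enhancement):
    """Overall speedup when `fraction` of the work runs `enhancement` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / enhancement)


def fraction_for_speedup(target, enhancement):
    """Invert Amdahl's law: fraction of work needed to reach `target` overall speedup."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / enhancement)


# a) 50% of computations use the floating-point processor
print(round(amdahl_speedup(0.5, 15), 3))         # 1.875
# b) 75% of computations use the floating-point processor
print(round(amdahl_speedup(0.75, 15), 3))        # 3.333
# c) fraction required for an overall speedup of 2.25
print(round(fraction_for_speedup(2.25, 15), 3))  # 0.595
```

Raising F from 0.5 to 0.75 nearly doubles the overall speedup, which shows how strongly the unenhanced fraction limits the gain.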
2) Suppose you have a load/store computer with the following instruction mix:
Operation   Frequency   No. of clock cycles
ALU ops     35%         1
Loads       25%         2
Stores      15%         2
Branches    25%         3
a) Compute the CPI.
b) We observe that 35% of the ALU ops are paired with a load, and we propose to replace these
ALU ops and their loads with a new instruction. The new instruction takes 1 clock cycle. With the
new instruction added, branches take 5 clock cycles. Compute the CPI for the new version.
c) If the clock of the old version is 20% faster than the new version, which version has the faster CPU
execution time, and by what percentage?
Solution:
a) CPI old = (0.35*1) + (0.25*2) + (0.15*2) + (0.25*3) = 1.9
b) Fraction of all instructions that are ALU ops paired with a load: 0.35*0.35 = 0.1225
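Parts b) and c) can be worked through numerically. The sketch below is one reading of the problem, not the original solution. It assumes that each paired ALU op and load collapses into a single new instruction, so the instruction count shrinks to 1 - 0.1225 = 0.8775 of the original, and that "20% faster" means the old clock rate is 1.2 times the new clock rate. All variable names are illustrative.

```python
# Old instruction mix: name -> (fraction of old instruction count, cycles).
old_mix = {
    "alu": (0.35, 1),
    "load": (0.25, 2),
    "store": (0.15, 2),
    "branch": (0.25, 3),
}

# Fraction of all old instructions that are ALU ops paired with a load.
paired = 0.35 * old_mix["alu"][0]  # 0.1225

# New mix, still expressed as fractions of the OLD instruction count.
# Assumption: each ALU+load pair becomes one 1-cycle instruction.
new_mix = {
    "alu": (0.35 - paired, 1),
    "load": (0.25 - paired, 2),
    "alu_load": (paired, 1),
    "store": (0.15, 2),
    "branch": (0.25, 5),
}


def cpi(mix):
    """Weighted-average CPI, normalised by the total instruction fraction."""
    total = sum(freq for freq, _ in mix.values())
    return sum(freq * cycles for freq, cycles in mix.values()) / total


cpi_old = cpi(old_mix)
cpi_new = cpi(new_mix)
ic_ratio = sum(freq for freq, _ in new_mix.values())  # new IC / old IC

# Execution time per old instruction, in units of the NEW clock period.
# Assumption: old clock rate = 1.2 x new clock rate.
time_old = cpi_old / 1.2
time_new = ic_ratio * cpi_new

print(f"CPI old = {cpi_old:.3f}, CPI new = {cpi_new:.3f}")
print(f"new version takes {time_new / time_old:.3f}x the time of the old version")
```

Under these assumptions the new CPI is about 2.46 and the old version is faster by roughly 36%. A different reading of "20% faster" (for example, a 20% shorter clock period) would change the final percentage.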
NMAM INSTITUTE OF TECHNOLOGY
(An Autonomous Institution affiliated to VTU, Belgaum)
(AICTE, approved, NBA Accredited, ISO 9001:2008 Certified)
Nitte – 574110, Karkala, Udupi District, Karnataka, India.
Department of Computer Science and Engineering
3) For the purpose of solving a given application problem, you benchmark a program on two
computer systems. On system A, the object code executed 80 million Arithmetic Logic Unit
operations (ALU ops), 40 million load instructions, and 25 million branch instructions. On system
B, the object code executed 50 million ALU ops, 50 million loads, and 40 million branch
instructions. In both systems, each ALU op takes 1 clock cycle, each load takes 3 clock cycles, and
each branch takes 5 clock cycles.
a) Compute the relative frequency of occurrence of each type of instruction executed in both
systems.
b) Find the CPI for each system.
c) Assuming that the clock on system B is 10% faster than the clock on system A, which system is
faster for the given application problem and by how much percent?
Solution:
a) Relative frequency of occurrence of each type of instruction executed in both systems
(total instructions: A = 145 million, B = 140 million):
            A                B
ALU ops     80/145 = 0.55    50/140 = 0.36
Loads       40/145 = 0.28    50/140 = 0.36
Branches    25/145 = 0.17    40/140 = 0.29
c)
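Parts b) and c) follow directly from the instruction counts and cycle costs in the problem statement. The sketch below is a numerical check rather than the original worked solution. It assumes "10% faster" means the clock rate of system B is 1.1 times that of system A. The names are illustrative.

```python
# Clock cycles per instruction type (same on both systems).
CYCLES = {"alu": 1, "load": 3, "branch": 5}

# Executed instruction counts, in millions.
counts_a = {"alu": 80, "load": 40, "branch": 25}
counts_b = {"alu": 50, "load": 50, "branch": 40}


def total_cycles(counts):
    """Total clock cycles (in millions) for the given instruction counts."""
    return sum(counts[kind] * CYCLES[kind] for kind in counts)


def cpi(counts):
    """Average clock cycles per instruction."""
    return total_cycles(counts) / sum(counts.values())


cpi_a = cpi(counts_a)  # 325 / 145
cpi_b = cpi(counts_b)  # 400 / 140

# Execution time in units of "million clock periods of system A".
# Assumption: clock rate of B = 1.1 x clock rate of A.
time_a = total_cycles(counts_a)
time_b = total_cycles(counts_b) / 1.1

print(f"CPI A = {cpi_a:.2f}, CPI B = {cpi_b:.2f}")
print(f"time B / time A = {time_b / time_a:.3f}")
```

With these numbers CPI is about 2.24 for A and 2.86 for B. Despite its slower clock, system A finishes first, by roughly 12%, because it needs 325 million cycles against 400 million on B.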
4) Suppose that a system contains a special floating-point processor for doing floating-point
operations. When a program uses the floating-point processor, the speedup that the floating-point
processor offers is 1.4.
In order to improve the speedup, two options are considered:
Option 1: Modifying the compiler so that 70% of the computations can use the floating-point
processor. Cost of this option is Rs. 2500.
Option 2: Modifying the floating-point processor. The speedup offered by the modified floating-
point processor is 2. Assume in this case that 50% of the computations can use the floating-point
processor. Cost of this option is Rs. 3000.
Which option would you recommend? Justify your answer quantitatively.
Solution:
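Option 1:
Speedup = 1/[(1-0.7) + (0.7/1.4)] = 1.25
Cost/speedup = 2500/1.25 = 2000
Option 2:
Speedup = 1/[(1-0.5) + (0.5/2)] = 1.333
Cost/speedup = 3000/1.333 = 2250
Option 1 costs less per unit of speedup (Rs. 2000 against Rs. 2250), so it is the more cost-effective
choice, although Option 2 gives the higher overall speedup.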
5) An unpipelined processor takes 6 ns to work on one instruction. The pipelined version of the
processor has 6 stages with the following lengths: 1.0ns; 0.8ns; 0.4ns; 1.2ns; 1.3ns; 1.3ns. It then
takes 0.3 ns to latch its results into latches. Answer the following, assuming that there are no stalls
in the pipeline.
a) What are the cycle times in both processors?
b) How long does it take (in nanoseconds) to finish one instruction in both processors?
(Note: Ignore the initial fill time in the pipelined processor.)
c) What is the speedup achieved by the 6-stage pipeline with respect to the unpipelined processor?
d) How long does it take (in nanoseconds) to finish 1000 instructions in both processors?
(Note: Do not ignore the initial fill time in the pipelined processor.)
6) Compute the overall CPI of a computer which executes a program with the following instruction
mix:
Operation   Frequency   No. of clock cycles
ALU ops     35%         1
Loads       25%         2
Stores      15%         2
Branches    25%         3
Solution:
CPI = (0.35*1) + (0.25*2) + (0.15*2) + (0.25*3) = 1.9
7) An unpipelined processor takes 6 ns to work on one instruction. The pipelined version of the
processor has 6 stages with the following lengths: 1.0ns; 0.8ns; 0.4ns; 1.2ns; 1.3ns; 1.3ns. It then
takes 0.3 ns to latch its results into latches. Answer the following, assuming that there are no stalls
in the pipeline.
1) What are the cycle times in both processors?
2) How long does it take (in nanoseconds) to finish one instruction in both processors? (Note:
Ignore the initial fill time in the pipelined processor.)
3) What is the speedup achieved by the 6-stage pipeline with respect to the unpipelined processor?
4) How long does it take (in nanoseconds) to finish 1000 instructions in both processors? (Note:
Do not ignore the initial fill time in the pipelined processor.)
Ans.
1) Cycle time, unpipelined = 6 ns
Cycle time, pipelined = 1.3 + 0.3 = 1.6 ns (longest stage plus latch overhead)
2) Execution time per instruction, unpipelined = 6 ns
Execution time per instruction, pipelined = 1.6 ns
3) Speedup = 6/1.6 = 3.75
4) Execution time for 1000 instructions, unpipelined = 1000 x 6 = 6000 ns
Execution time for 1000 instructions, pipelined = (6 + 999) x 1.6 = 1608 ns
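The answers above can be reproduced with a short script. This is a minimal sketch using the stage lengths and latch delay from the problem statement; the function names are illustrative. It takes the pipelined cycle time as the longest stage plus the latch delay, and counts the fill time as (stages - 1) extra cycles before the first instruction completes.

```python
STAGE_NS = [1.0, 0.8, 0.4, 1.2, 1.3, 1.3]  # stage lengths from the problem
LATCH_NS = 0.3                              # latch overhead per stage
UNPIPELINED_NS = 6.0                        # time per instruction, unpipelined

# The clock must accommodate the slowest stage plus the latch delay.
cycle_ns = max(STAGE_NS) + LATCH_NS


def unpipelined_time(n):
    """Time in ns to run n instructions with no overlap."""
    return n * UNPIPELINED_NS


def pipelined_time(n, stages=len(STAGE_NS), cycle=cycle_ns):
    """Time in ns for n instructions, including the initial fill of the pipeline."""
    return (stages + n - 1) * cycle


speedup = UNPIPELINED_NS / cycle_ns

print(round(cycle_ns, 2))              # 1.6
print(round(speedup, 2))               # 3.75
print(round(unpipelined_time(1000)))   # 6000
print(round(pipelined_time(1000), 1))  # 1608.0
```

The steady-state speedup of 3.75 is well below the ideal value of 6 because the stages are unbalanced and every stage pays the latch overhead.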
8) A 400 MHz processor was used to execute a benchmark program with the following
instruction mix and clock cycle counts:
Instruction type      Instruction count   Clock cycle count
Integer arithmetic    45000               1
Data transfer         32000               2
Floating point        15000               2
Control transfer      8000                2
Determine the effective CPI, MIPS rate and execution time for this program.
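No worked solution is given for this problem. The sketch below shows one way to compute the three quantities from the standard definitions: CPI = total cycles / total instructions, MIPS = clock rate / (CPI x 10^6), and execution time = total cycles / clock rate. The variable names are illustrative.

```python
CLOCK_HZ = 400e6  # 400 MHz

# (instruction type, instruction count, clock cycles per instruction)
mix = [
    ("integer arithmetic", 45000, 1),
    ("data transfer", 32000, 2),
    ("floating point", 15000, 2),
    ("control transfer", 8000, 2),
]

instructions = sum(count for _, count, _ in mix)       # 100000
cycles = sum(count * cpi_i for _, count, cpi_i in mix)  # 155000

cpi = cycles / instructions      # effective CPI
mips = CLOCK_HZ / (cpi * 1e6)    # millions of instructions per second
exec_time_s = cycles / CLOCK_HZ  # execution time in seconds

print(f"CPI = {cpi:.2f}")
print(f"MIPS = {mips:.2f}")
print(f"Execution time = {exec_time_s * 1e3:.4f} ms")
```

This gives an effective CPI of 1.55, a MIPS rate of about 258, and an execution time of 0.3875 ms.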
9) Consider a branch that is taken 80% of the time. On average, how many stalls are introduced for
this branch for each approach below:
i) Stall fetch until branch outcome is known
ii) Assume not-taken and squash if the branch is taken
iii) Assume a branch delay slot:
a. No instruction is found to put in the delay slot
b. An instruction before the branch is put in the delay slot
c. An instruction from the taken side is put in the delay slot
d. An instruction from the not-taken side is put in the delay slot
Solution:
i) Stall fetch until branch outcome is known: 1
ii) Assume not-taken and squash if the branch is taken: 0.8
iii) Assume a branch delay slot
a. No instruction is found to put in the delay slot: 1
b. An instruction before the branch is put in the delay slot: 0
c. An instruction from the taken side is put in the delay slot: 0.2
d. An instruction from the not-taken side is put in the delay slot: 0.8
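Each answer is an expected value: the stall incurred when the branch is taken, weighted by 0.8, plus the stall incurred when it is not taken, weighted by 0.2. The sketch below makes that explicit. It assumes a one-cycle branch penalty, which the problem does not state but which the answers imply. The labels and names are illustrative.

```python
P_TAKEN = 0.8  # the branch is taken 80% of the time

# approach -> (stall cycles if taken, stall cycles if not taken),
# assuming a one-cycle branch penalty.
approaches = {
    "stall until outcome known": (1, 1),
    "assume not-taken, squash if taken": (1, 0),
    "delay slot: nothing found (no-op)": (1, 1),
    "delay slot: instruction from before branch": (0, 0),
    "delay slot: instruction from taken side": (0, 1),
    "delay slot: instruction from not-taken side": (1, 0),
}


def expected_stalls(if_taken, if_not_taken, p_taken=P_TAKEN):
    """Average stall cycles per branch."""
    return p_taken * if_taken + (1 - p_taken) * if_not_taken


for name, (if_taken, if_not_taken) in approaches.items():
    print(f"{name}: {expected_stalls(if_taken, if_not_taken):.1f}")
```

Filling the slot from the taken side wastes a cycle only on the 20% of executions where the branch falls through, which is why it beats the not-taken side for this branch.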
10) Analyse the data dependence among the following statements in a given program.
Here (Ri) means the content of register Ri, and memory (10) initially contains 64.
12) For the following reservation table of a nonlinear pipeline, find the minimal average latency
(MAL) for collision-free scheduling and calculate the efficiency of the pipeline.
1 2 3 4 5 6 7 8
S1 X x
S2 x X x
S3 x X x
13) For the following reservation table of a nonlinear pipeline, find the minimal average latency
(MAL) for collision-free scheduling.
1 2 3 4 5
S1 x
S2 x x
S3 x x
14) Consider the following pipeline reservation table. Find the minimal average latency (MAL) for
collision-free scheduling.
1 2 3 4 5
S1 x x
S2 x
S3 x x