CSN-221 Pipelines-Quiz: Enrollment No.: 18114031 Name - Hemil Panchiwala

This document contains a student's responses to questions about pipelines and dependencies in assembly language code. The student lists all the data and control dependencies in a sample program. They also calculate speedup between single-cycle and pipelined processors for different stall rates. Finally, the student analyzes the latencies of processor stages and proposes pipeline implementations to minimize clock cycle time.

Uploaded by

Black Reaper

CSN-221

Pipelines-Quiz
Enrollment No.: 18114031
Name – Hemil Panchiwala

Question 01 . Consider the following assembly language program.


I1: MOV R3, R7
I2: LD R8, [R3]
I3: ADD R3, R3, 4
I4: LOAD R9, [R3]
I5: BNE R8, R9, I3
List all the dependencies in this code.
Answer:
Data Dependencies:
1) I1 – I2 : Read After Write for R3
2) I1 – I3 : Read After Write for R3
3) I1 – I4 : Read After Write for R3
4) I2 – I3 : Write After Read and Read After Read for R3
5) I2 – I4 : Read After Read for R3
6) I2 – I5 : Read After Write for R8
7) I3 – I4 : Read After Write for R3
8) I4 – I5 : Read After Write for R9
Control Dependencies:
1) I5 : conditional branch; its outcome decides which instruction is executed next
Question 02 . We have a single-stage, non-pipelined machine and a pipelined machine with 5 stages.
The cycle time for the former is 5 ns and for the latter is 1 ns.
a. Assuming no stalls, what is the speedup of the pipelined machine over the
single-stage machine?
b. Given that the pipeline stalls 1 cycle for 40% of the instructions, what is the speedup
now?

Answer:
a) Speedup = CPU time for single stage / CPU time for pipelined machine = 5 ns / 1 ns = 5
b) Cycles per instruction for pipelined = (CPI)normal + (fraction of instructions that stall) x penalty
= 1 + (0.4) x 1 = 1.4
Speedup = (1 x 5 ns) / (1.4 x 1 ns) = 5/1.4 = 3.57
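The arithmetic in (a) and (b) can be checked with a short Python sketch. The function name `speedup` and its parameters are illustrative and not part of the quiz. The sketch assumes both machines run the same number of instructions and that the single-stage machine has a CPI of 1.

```python
def speedup(single_cycle_ns, pipe_cycle_ns, stall_fraction=0.0, stall_penalty=1):
    """Speedup of the pipelined machine over the single-stage machine."""
    # Effective CPI of the pipeline: ideal CPI of 1 plus the average stall cycles.
    pipe_cpi = 1 + stall_fraction * stall_penalty
    # Time per instruction on each machine (single-stage CPI assumed to be 1).
    single_time = 1 * single_cycle_ns
    pipe_time = pipe_cpi * pipe_cycle_ns
    return single_time / pipe_time


print(speedup(5, 1))                 # part (a): no stalls
print(round(speedup(5, 1, 0.4), 2))  # part (b): 1-cycle stall on 40% of instructions
```

The two calls reproduce the values 5 and 3.57 obtained above.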

Question 03 . Use the following code fragment.


I1: Loop: LD R1, 0[R2]
I2: DADDI R1, R1, 1
I3: SD 0[R2], R1
I4: DADDI R2, R2, 4
I5: DSUB R4, R3, R2
I6: BNEZ R4, Loop
a. List all the True RAW data dependencies.
b. Show the timing of this instruction sequence for a 5-stage pipeline along with
the number of cycles required to execute one iteration of the loop with no
forwarding.
c. Show the timing of this instruction sequence for a 5-stage pipeline along with
the number of cycles required to execute one iteration of the loop with forwarding.
Assume registers can be written and read in the same cycle, during write back.
(The number of cycles for the execution of one iteration of the loop ends after the
A (ALU) stage of BNEZ instruction.)
Answer:

a)
1) I1 – I2 : RAW for R1
2) I2 – I3 : RAW for R1
3) I4 – I5 : RAW for R2
4) I5 – I6 : RAW for R4
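The list in (a) can be reproduced mechanically. The sketch below is one possible way to do it, not a standard tool: the read and write sets in `PROGRAM` are transcribed by hand from the loop body, and the helper name `raw_dependencies` is made up for illustration. It looks only at a single iteration and pairs each register read with the nearest earlier write, so dependencies carried from one iteration to the next are ignored.

```python
# Each entry: (label, registers written, registers read), transcribed from the loop body.
PROGRAM = [
    ("I1", ["R1"], ["R2"]),        # LD    R1, 0[R2]
    ("I2", ["R1"], ["R1"]),        # DADDI R1, R1, 1
    ("I3", [],     ["R2", "R1"]),  # SD    0[R2], R1
    ("I4", ["R2"], ["R2"]),        # DADDI R2, R2, 4
    ("I5", ["R4"], ["R3", "R2"]),  # DSUB  R4, R3, R2
    ("I6", [],     ["R4"]),        # BNEZ  R4, Loop
]


def raw_dependencies(program):
    """Return (producer, consumer, register) triples within one iteration."""
    last_writer = {}  # register -> label of the most recent instruction that wrote it
    deps = []
    for label, writes, reads in program:
        # Check the reads first so an instruction never depends on itself.
        for reg in reads:
            if reg in last_writer:
                deps.append((last_writer[reg], label, reg))
        for reg in writes:
            last_writer[reg] = label
    return deps


for producer, consumer, reg in raw_dependencies(PROGRAM):
    print(f"{producer} - {consumer} : RAW for {reg}")
```

Run as written, it reports the same four pairs: I1 - I2 and I2 - I3 on R1, I4 - I5 on R2, and I5 - I6 on R4.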

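b) Here the number indicates the instruction number; "-" marks an empty stage.

Cycle  Fetch  Decode  Execute  Memory  Write
1      1      -       -        -       -
2      2      1       -        -       -
3      2      -       1        -       -
4      2      -       -        1       -
5      3      2       -        -       1
6      3      -       2        -       -
7      3      -       -        2       -
8      4      3       -        -       2
9      5      4       3        -       -
10     5      -       4        3       -
11     5      -       -        4       3
12     6      5       -        -       4
13     6      -       5        -       -
14     6      -       -        5       -
15     -      6       -        -       5
16     -      -       6        -       -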

Therefore, one iteration of the loop takes 16 cycles (counted up to the ALU stage of BNEZ) on a pipeline with no forwarding.
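The 16-cycle count can be cross-checked with a small scheduling sketch. This is a simplified model written for this answer, not a general pipeline simulator: the name `cycles_without_forwarding` and the hand-transcribed read and write sets are assumptions. It tracks only the cycle in which each instruction reads its registers in Decode, which can be no earlier than the cycle in which every source register is written back.

```python
# (registers written, registers read) for I1..I6, transcribed from the loop body.
PROGRAM = [
    (["R1"], ["R2"]),        # I1: LD    R1, 0[R2]
    (["R1"], ["R1"]),        # I2: DADDI R1, R1, 1
    ([],     ["R2", "R1"]),  # I3: SD    0[R2], R1
    (["R2"], ["R2"]),        # I4: DADDI R2, R2, 4
    (["R4"], ["R3", "R2"]),  # I5: DSUB  R4, R3, R2
    ([],     ["R4"]),        # I6: BNEZ  R4, Loop
]


def cycles_without_forwarding(program):
    """Cycle in which the last instruction completes its ALU stage."""
    write_back = {}  # register -> cycle in which its newest value is written back
    decode = 1       # start value chosen so that the first instruction decodes in cycle 2
    for writes, reads in program:
        # A register written in cycle c can be read in Decode in the same cycle c.
        ready = max((write_back.get(reg, 0) for reg in reads), default=0)
        # In-order issue: decode at least one cycle after the previous instruction.
        decode = max(decode + 1, ready)
        for reg in writes:
            write_back[reg] = decode + 3  # Execute, Memory, Write follow Decode
    return decode + 1  # ALU stage of the last instruction


print(cycles_without_forwarding(PROGRAM))
```

Under these assumptions the model gives Decode cycles 2, 5, 8, 9, 12 and 15 for I1 to I6, so BNEZ is in its ALU stage in cycle 16.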
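c) Here the number indicates the instruction number; "-" marks an empty stage.

Cycle  Fetch  Decode  Execute  Memory  Write
1      1      -       -        -       -
2      2      1       -        -       -
3      3      2       1        -       -
4      3      2       -        1       -
5      4      3       2        -       1
6      5      4       3        2       -
7      6      5       4        3       2
8      -      6       5        4       3
9      -      -       6        5       4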

Therefore, one iteration of the loop takes 9 cycles (counted up to the ALU stage of BNEZ) on a pipeline with forwarding.
With forwarding, a value produced in the Execute stage goes straight to the Execute stage of the
next instruction that needs it. The load-use case (I1 – I2) still needs a one-cycle stall, because the loaded value is only available after the Memory stage.
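The 9-cycle count can be cross-checked in the same spirit. The sketch below is a simplified model, not a full simulator: the name `cycles_with_forwarding`, the `kind` tags and the read and write sets are assumptions made for illustration. It tracks only the Execute cycle of each instruction, and it treats the store's data register as needed at Execute, which is a simplification.

```python
# (kind, registers written, registers read) for I1..I6, transcribed from the loop body.
PROGRAM = [
    ("load",   ["R1"], ["R2"]),        # I1: LD    R1, 0[R2]
    ("alu",    ["R1"], ["R1"]),        # I2: DADDI R1, R1, 1
    ("store",  [],     ["R2", "R1"]),  # I3: SD    0[R2], R1
    ("alu",    ["R2"], ["R2"]),        # I4: DADDI R2, R2, 4
    ("alu",    ["R4"], ["R3", "R2"]),  # I5: DSUB  R4, R3, R2
    ("branch", [],     ["R4"]),        # I6: BNEZ  R4, Loop
]


def cycles_with_forwarding(program):
    """Cycle in which the last instruction completes its ALU stage."""
    available = {}  # register -> first cycle in which a consumer may be in Execute
    execute = 2     # start value chosen so that the first instruction executes in cycle 3
    for kind, writes, reads in program:
        ready = max((available.get(reg, 0) for reg in reads), default=0)
        # In-order issue: execute at least one cycle after the previous instruction.
        execute = max(execute + 1, ready)
        # An ALU result is forwarded to the very next cycle; a loaded value
        # exists only after Memory, one cycle later (the load-use stall).
        delay = 2 if kind == "load" else 1
        for reg in writes:
            available[reg] = execute + delay
    return execute


print(cycles_with_forwarding(PROGRAM))
```

Under these assumptions the Execute cycles come out as 3, 5, 6, 7, 8 and 9, with the only gap being the load-use stall between I1 and I2.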

Question 04 . Individual stages of a processor have the following latencies.

F       D      A       M       W
210 ps  90 ps  110 ps  240 ps  50 ps

If the processor is pipelined, each pipeline latch adds a latency of 20 ps to the
stage that precedes it. This is the so-called “setup latency”: the signals need to
be stable at the input of the latch for some amount of time before they can be
latched correctly at the end of the cycle.
In the single-cycle approach, no pipeline is used, and in each cycle one instruction is executed
from start (F) to finish (W).
a. What is the clock cycle time if we implement this processor using single-
cycle approach (in ps)?
b. What is the clock cycle time if we implement this processor using a 5-stage
pipeline (in ps)?
c. What is the speedup of the pipelined processor over a single-cycle processor
if the single cycle processor has a CPI of 1 and the pipelined processor
achieves a CPI of 1.2?
d. If the processor must be implemented with a 3-stage pipeline, some of the
existing 5 stages must be combined (assume that the existing 5 stages
cannot be split). Which of the existing five stages (F, D, A, M, W) should be
placed into which stage of the 3-stage pipeline to minimize the resulting
clock cycle time?
Stage-1:
Stage-2:
Stage-3:
e. If the processor is to be implemented with a 6-stage pipeline, but the
design effort and time to market are such that there is only enough time to
split one of the five existing (F, D, A, M, W) stages into two new stages,
which stage would you choose to split?

Answer:
a)
If we implement this using a single-cycle approach:
Cycle time = sum of the individual latencies of the stages
= 210 + 90 + 110 + 240 + 50 = 700 ps

b)
If we implement this using a 5-stage pipeline:
Cycle time = max(individual stage time) + 20 ps
= max(210, 90, 110, 240, 50) + 20
= 240 + 20 = 260 ps
c)
Speedup = (CPU time for single cycle) / (CPU time for pipelined processor)
= (CPI x cycle time)single cycle / (CPI x cycle time)pipelined
= (1 x 700) / (1.2 x 260)
= 700 / 312
= 2.2436
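The three results in (a) to (c) follow directly from the stage latencies, as the short sketch below shows. The variable names are illustrative; the latencies, the 20 ps latch overhead and the two CPI values are the ones given in the question.

```python
LATENCY_PS = {"F": 210, "D": 90, "A": 110, "M": 240, "W": 50}
LATCH_PS = 20  # setup latency added by each pipeline latch

# (a) Single-cycle: one clock must cover every stage in sequence.
single_cycle = sum(LATENCY_PS.values())

# (b) 5-stage pipeline: the clock is set by the slowest stage plus its latch.
pipelined = max(LATENCY_PS.values()) + LATCH_PS

# (c) Speedup = (CPI x cycle time) single cycle / (CPI x cycle time) pipelined.
speedup = (1.0 * single_cycle) / (1.2 * pipelined)

print(single_cycle, pipelined, round(speedup, 4))  # 700 260 2.2436
```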
d)
Stage 1 : F (210 ps)
Stage 2 : D + A (90 + 110 = 200 ps)
Stage 3 : M + W (240 + 50 = 290 ps)
Cycle time = max(210, 200, 290) + 20
= 290 + 20 = 310 ps
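The grouping in (d) can be confirmed by trying every way of merging the five stages into three. The sketch below assumes that only adjacent stages can be merged, so the F, D, A, M, W order is preserved; the helper name `best_grouping` is made up for illustration.

```python
from itertools import combinations

STAGES = [("F", 210), ("D", 90), ("A", 110), ("M", 240), ("W", 50)]
LATCH_PS = 20  # setup latency added by each pipeline latch


def best_grouping(stages, groups):
    """Return (cycle time in ps, group names) for the fastest contiguous grouping."""
    best = None
    # Pick groups-1 cut points between adjacent stages.
    for cuts in combinations(range(1, len(stages)), groups - 1):
        bounds = (0, *cuts, len(stages))
        parts = [stages[a:b] for a, b in zip(bounds, bounds[1:])]
        # The clock is set by the slowest merged stage plus the latch overhead.
        cycle = max(sum(lat for _, lat in part) for part in parts) + LATCH_PS
        names = ["+".join(name for name, _ in part) for part in parts]
        if best is None or cycle < best[0]:
            best = (cycle, names)
    return best


print(best_grouping(STAGES, 3))  # (310, ['F', 'D+A', 'M+W'])
```

Of the six possible groupings, the next best is F+D, A, M+W at 320 ps, so the choice above is the minimum.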
e)
Splitting stage M is the best choice. M has the largest latency (240 ps), so it alone sets the
clock cycle time of the 5-stage pipeline. Dividing it shortens the resulting stages, so the clock
cycle time reduces and performance increases.
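A quick comparison supports this choice. The sketch below assumes that a stage splits into two equal halves, which the question does not state; the helper name `cycle_after_split` is made up for illustration.

```python
LATENCY_PS = {"F": 210, "D": 90, "A": 110, "M": 240, "W": 50}
LATCH_PS = 20  # setup latency added by each pipeline latch


def cycle_after_split(latencies, stage):
    """Clock cycle time of the 6-stage pipeline if `stage` is split in half."""
    # Keep the other four stages unchanged.
    parts = [lat for name, lat in latencies.items() if name != stage]
    # Assumption: the chosen stage divides into two equal halves.
    parts += [latencies[stage] / 2] * 2
    return max(parts) + LATCH_PS


for stage in LATENCY_PS:
    print(stage, cycle_after_split(LATENCY_PS, stage))
```

Under the equal-split assumption, splitting M brings the cycle time down to 230 ps, at which point F (210 ps) becomes the limiting stage. Splitting any other stage leaves M in place and the cycle time at 260 ps.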
