Chapter_4

The document discusses performance metrics in computing, focusing on response time, throughput, execution time, and CPU time. It explains how to calculate performance using clock cycles, instruction counts, and cycles per instruction (CPI), along with examples comparing different computer architectures. Additionally, it introduces Amdahl's Law, which illustrates the limitations of performance enhancements based on the utilization of improved features.

Uploaded by popculturefan27
Copyright © All Rights Reserved

Unit 1.

Performance

ECE369
Defining Performance

• Which airplane has the best performance?

Response Time and Throughput

• Response time
– How long it takes to do a task
• Throughput
– Total work done per unit time
• e.g., tasks/transactions/… per hour
• How are response time and throughput
affected by
– Replacing the processor with a faster version?
– Adding more processors?
• We’ll focus on response time for now…

Relative Performance

• Define $\text{Performance} = \dfrac{1}{\text{Execution Time}}$
• "X is n times faster than Y" means

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$

• Example: time taken to run a program
– 10 s on A, 15 s on B
– Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5
– So A is 1.5 times faster than B
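The definition can be checked with a few lines of Python. This is a minimal sketch; the function name `times_faster` is hypothetical, and both execution times are assumed to be in the same unit.

```python
def times_faster(time_x: float, time_y: float) -> float:
    """Return n such that "X is n times faster than Y".

    Performance = 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Slide example: 10 s on A, 15 s on B
print(times_faster(10.0, 15.0))  # 1.5, so A is 1.5 times faster than B
```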
Execution Time

• Elapsed Time
– Total response time, including all aspects of processing, such as I/O, OS overhead, and idle time
– A useful number, but often not good for comparison purposes
• CPU time
– Time spent processing a given job
• Discounts I/O time and other jobs' shares
– Can be broken up into system time and user time

• Our focus: user CPU time
– Time spent executing the lines of code that are "in" our program

Clock Cycles

• Instead of reporting execution time in seconds, we often use cycles:

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

• Operation of digital hardware is governed by a constant-rate clock

[Figure: clock signal (cycles); within each cycle, data transfer and computation take place, then state is updated]

• Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/s)
• Clock period: duration of a clock cycle
– e.g., 250 ps = 0.25 ns = $250\times10^{-12}$ s, since 1 ps = 0.001 ns
• Clock frequency (rate): cycles per second
– e.g., 4.0 GHz = 4000 MHz = $4.0\times10^{9}$ Hz
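Because clock rate and clock period are reciprocals, the two examples above describe the same clock. A minimal Python sketch of the conversion (the helper names are hypothetical):

```python
def period_from_rate(rate_hz: float) -> float:
    """Clock period in seconds: one cycle lasts 1 / (cycles per second)."""
    return 1.0 / rate_hz


def rate_from_period(period_s: float) -> float:
    """Clock rate in Hz: the number of periods that fit in one second."""
    return 1.0 / period_s


period = period_from_rate(4.0e9)                      # 4.0 GHz clock
print(f"{period * 1e12:.0f} ps")                      # 250 ps
print(f"{rate_from_period(250e-12) / 1e9:.1f} GHz")   # 4.0 GHz
```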
CPU Time

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
• Performance improved by
– Reducing number of clock cycles
– Increasing clock rate
– Hardware designer must often trade off
clock rate against cycle count

CPU Time Example

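• Computer A: 2 GHz clock, 10 s CPU time
• Designing Computer B
– Aim for 6 s CPU time
– Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B's clock be?

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9} \quad (\text{since } 1\,\text{GHz} = 10^{9}\,\text{Hz})$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$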
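The same three steps as a small Python function. This is a sketch under the slide's assumptions (B runs the same program, and its cycle count is a fixed multiple of A's); `required_clock_rate` and its parameter names are hypothetical.

```python
def required_clock_rate(cpu_time_a: float, clock_rate_a: float,
                        cycle_factor: float, target_time_b: float) -> float:
    """Clock rate (Hz) that machine B needs to finish in target_time_b seconds.

    cycles_A = CPU time_A x clock rate_A
    cycles_B = cycle_factor x cycles_A
    rate_B   = cycles_B / CPU time_B
    """
    cycles_a = cpu_time_a * clock_rate_a
    cycles_b = cycle_factor * cycles_a
    return cycles_b / target_time_b


rate_b = required_clock_rate(10.0, 2.0e9, 1.2, 6.0)
print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```

With a 4 GHz clock for computer A instead, `required_clock_rate(10.0, 4.0e9, 1.2, 6.0)` gives 8 GHz.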
Instruction Count and CPI

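$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$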
• Instruction Count for a program
– Determined by program, ISA and compiler
• Average cycles per instruction
– Determined by CPU hardware
– If different instructions have different CPI
• Average CPI affected by instruction mix

CPI Example

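• Computer A: Cycle Time = 250 ps, CPI = 2.0
• Computer B: Cycle Time = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

• A is faster… by a factor of 1.2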
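A Python sketch of the comparison. Because both machines implement the same ISA, the instruction count is the same and cancels in the ratio; the value `I = 1e9` below is an arbitrary placeholder, not a measured count, and `cpu_time` is a hypothetical helper name.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Same ISA, so both machines execute the same instruction count I.
# I cancels in the ratio, so any positive value works; 1e9 is arbitrary.
I = 1e9
time_a = cpu_time(I, 2.0, 250e-12)
time_b = cpu_time(I, 1.2, 500e-12)
print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")      # A: 0.50 s, B: 0.60 s
print(f"A is {time_b / time_a:.1f} times faster")   # A is 1.2 times faster
```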
CPI in More Detail

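• If different instruction classes take different numbers of cycles:

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

• Weighted average CPI:

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \text{CPI}_i \times \underbrace{\left(\frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)}_{\text{relative frequency}}$$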
CPI Example

• Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

• Sequence 1: IC = 5
– Clock Cycles = 2×1 + 1×2 + 2×3 = 10
– Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
– Clock Cycles = 4×1 + 1×2 + 1×3 = 9
– Avg. CPI = 9/6 = 1.5
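The per-class sums can be automated. A minimal Python sketch, assuming the class CPIs from the table; the dictionary and function names are hypothetical.

```python
# CPI of each instruction class (from the table above)
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def cycles_and_cpi(counts: dict) -> tuple:
    """Return (total clock cycles, average CPI) for one code sequence.

    counts maps instruction class -> instruction count in the sequence.
    """
    ic = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return cycles, cycles / ic


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}
print(cycles_and_cpi(seq1))  # (10, 2.0)
print(cycles_and_cpi(seq2))  # (9, 1.5)
```

Sequence 2 executes more instructions but needs fewer clock cycles, so on the same machine it is faster, by 10/9 ≈ 1.11.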
Performance Summary

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

• Performance depends on
– Algorithm: affects IC, possibly CPI
– Programming language: affects IC, CPI
– Compiler: affects IC, CPI
– Instruction set architecture: affects IC, CPI, Tc

Component Analysis

Example

• Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?

• Don't panic: we can easily work this out from basic principles

Example

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

• Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?
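Working the problem from the equation above, using only the numbers in the problem statement:

```latex
\begin{aligned}
\text{Clock Cycles}_A &= \text{CPU Time}_A \times \text{Clock Rate}_A
  = 10\,\text{s} \times 4 \times 10^{9}\,\text{Hz} = 40 \times 10^{9} \\
\text{Clock Cycles}_B &= 1.2 \times \text{Clock Cycles}_A = 48 \times 10^{9} \\
\text{Clock Rate}_B &= \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B}
  = \frac{48 \times 10^{9}}{6\,\text{s}} = 8\,\text{GHz}
\end{aligned}
```

So the designer should target an 8 GHz clock.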

CPI Example (Repeated problem)

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

• Suppose we have two implementations of the same instruction set architecture (ISA). For some program,

Machine A has a clock cycle time of 250 ps and a CPI of 2.0
Machine B has a clock cycle time of 500 ps and a CPI of 1.2

Which machine is faster for this program, and by how much?

Let’s Complicate Things A Little Bit… (Repeated problem)

Which sequence will be faster? How much?

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).

What is the CPI for each sequence?

The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.

The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.

Scary Stuff (New problem)

Op        Frequency   Cycle Count
ALU       43%         1
Load      21%         1
Store     12%         2
Branch    24%         2

Let’s say we were able to reduce the cycle count for “Store” operations to 1 at the cost of slowing our clock by 15%. Is this new design feasible?

$$\text{CPI}_{\text{original}} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i}{\text{Instruction Count}} = \sum_{i=1}^{n} \text{CPI}_i \times \left(\frac{\text{IC}_i}{\text{Instruction Count}}\right)$$
Example (Contd.)

Old CPI = 0.43×1 + 0.21×1 + 0.12×2 + 0.24×2 = 1.36

New CPI = 0.43×1 + 0.21×1 + 0.12×1 + 0.24×2 = 1.24

Speedup = old time / new time
= (IC × old CPI × T) / (IC × new CPI × 1.15T)
= 1.36 / (1.24 × 1.15) = 1.36 / 1.426
≈ 0.95

The speedup is below 1 (the new design is slower), so don't make this change.
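A Python sketch that recomputes both CPIs from the instruction mix and the resulting speedup. It assumes, as the slide does, that the instruction count is unchanged and that the slower clock means a 1.15× longer cycle time; the names `MIX` and `weighted_cpi` are hypothetical.

```python
# Instruction mix from the table: operation -> (frequency, cycles)
MIX = {
    "ALU":    (0.43, 1),
    "Load":   (0.21, 1),
    "Store":  (0.12, 2),
    "Branch": (0.24, 2),
}


def weighted_cpi(mix: dict) -> float:
    """CPI = sum over classes of CPI_i x (IC_i / IC)."""
    return sum(freq * cycles for freq, cycles in mix.values())


old_cpi = weighted_cpi(MIX)
new_mix = dict(MIX, Store=(0.12, 1))   # stores now take 1 cycle
new_cpi = weighted_cpi(new_mix)

# Same IC; the new clock cycle is 15% longer (1.15 x T), so T cancels.
speedup = old_cpi / (new_cpi * 1.15)
print(f"old CPI = {old_cpi:.2f}, new CPI = {new_cpi:.2f}, speedup = {speedup:.2f}")
# old CPI = 1.36, new CPI = 1.24, speedup = 0.95
```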

Practice problems

What is MIPS?

• MIPS (millions of instructions per second): instruction execution rate => higher is better

• Issues:
– Cannot compare processors with different instruction sets
– Varies between programs on the same processor
– Can vary inversely with performance
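MIPS is instruction count / (execution time × 10^6), equivalently clock rate / (CPI × 10^6). The sketch below uses made-up numbers (hypothetical, not from the slides) to illustrate the last issue: two compilations of the same program on the same 1 GHz machine, where the one with the higher MIPS rating is actually slower. The helper names are also hypothetical.

```python
def mips(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


def exec_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time = IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical: two compilations of one program on the same 1 GHz machine.
CLOCK = 1e9
ic_x, cpi_x = 10e9, 1.0   # many simple instructions
ic_y, cpi_y = 5e9, 1.6    # fewer, slower instructions

t_x = exec_time(ic_x, cpi_x, CLOCK)
t_y = exec_time(ic_y, cpi_y, CLOCK)
print(f"X: {t_x:.1f} s, {mips(ic_x, t_x):.0f} MIPS")  # X: 10.0 s, 1000 MIPS
print(f"Y: {t_y:.1f} s, {mips(ic_y, t_y):.0f} MIPS")  # Y: 8.0 s, 625 MIPS
```

Y finishes sooner even though its MIPS rating is lower, which is why MIPS alone can be misleading.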

MIPS Example

Performance Measurement Overview
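$$\text{CPU time} = \text{CPU clock cycles for the program} \times \text{Clock cycle time}$$

$$\text{CPU time} = \frac{\text{CPU clock cycles for the program}}{\text{Clock rate}}$$

$$\text{CPI} = \frac{\text{CPU clock cycles for the program}}{\text{IC}}$$

$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$$

$$\text{CPU time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$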

Performance Measurement Overview

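$$\text{CPU clock cycles for the program} = \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i$$

$$\text{CPU time} = \left(\sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i\right) \times \text{Clock cycle time}$$

$$\text{Overall CPI} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i}{\text{Instruction Count}} = \sum_{i=1}^{n} \text{CPI}_i \times \left(\frac{\text{IC}_i}{\text{Instruction Count}}\right)$$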

Exercise

Suppose we have made the following measurements:


• Frequency of FP operations = 25%
• Average CPI of FP operations = 4.0
• Average CPI of other instructions = 1.33
• Frequency of FPSQR = 2%
• CPI of FPSQR = 20

Assume that the two design alternatives are:


a) to reduce the CPI of FPSQR to 2 or
b) to reduce the average CPI of all FP operations to 2.

Compare these alternatives.

Solution

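$$\text{CPI}_{\text{original}} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i}{\text{Instruction Count}} = \sum_{i=1}^{n} \text{CPI}_i \times \left(\frac{\text{IC}_i}{\text{Instruction Count}}\right) = 4 \times 25\% + 1.33 \times 75\% \approx 2.0$$

$$\text{CPI saved on FPSQR} = 2\% \times \left(\text{CPI}_{\text{old FPSQR}} - \text{CPI}_{\text{new FPSQR}}\right) = 2\% \times (20 - 2) = 0.36$$

$$\text{CPI}_{\text{overall, new FPSQR}} = \text{CPI}_{\text{original}} - \text{CPI saved on FPSQR} = 2 - 0.36 = 1.64$$

$$\text{CPI}_{\text{overall, new FP}} = 75\% \times 1.33 + 25\% \times 2.0 \approx 1.5$$

$$\text{Speedup}_{\text{FP}} = \frac{\text{CPU time}_{\text{original}}}{\text{CPU time}_{\text{new}}} = \frac{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{original}}}{\text{IC} \times \text{Clock cycle} \times \text{CPI}_{\text{new}}} = \frac{\text{CPI}_{\text{original}}}{\text{CPI}_{\text{new}}} = \frac{2.00}{1.5} = 1.33$$

$$\text{Speedup}_{\text{FPSQR}} = \frac{\text{CPI}_{\text{original}}}{\text{CPI}_{\text{overall, new FPSQR}}} = \frac{2.00}{1.64} \approx 1.22$$

• Reducing the average CPI of all FP operations (alternative b) gives the larger speedup.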
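A Python sketch that reproduces the solution without intermediate rounding. The frequencies and CPIs are the measured values from the exercise; the helper name `overall_cpi` is hypothetical.

```python
def overall_cpi(parts):
    """Weighted CPI: sum of (frequency x CPI) over instruction groups."""
    return sum(freq * cpi for freq, cpi in parts)


# FP operations are 25% of instructions (CPI 4.0); the rest have CPI 1.33
cpi_original = overall_cpi([(0.25, 4.0), (0.75, 1.33)])

# a) FPSQR (2% of instructions) goes from CPI 20 to CPI 2
cpi_new_fpsqr = cpi_original - 0.02 * (20 - 2)

# b) all FP operations go from average CPI 4.0 to 2.0
cpi_new_fp = overall_cpi([(0.25, 2.0), (0.75, 1.33)])

print(f"original CPI = {cpi_original:.2f}")
# original CPI = 2.00
print(f"a) FPSQR:  CPI = {cpi_new_fpsqr:.2f}, speedup = {cpi_original / cpi_new_fpsqr:.2f}")
# a) FPSQR:  CPI = 1.64, speedup = 1.22
print(f"b) all FP: CPI = {cpi_new_fp:.2f}, speedup = {cpi_original / cpi_new_fp:.2f}")
# b) all FP: CPI = 1.50, speedup = 1.33
```

Unrounded, the original CPI is 1.9975 rather than exactly 2.0, which does not change either speedup at two decimal places.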
Amdahl's Law

Amdahl's Law

• The performance enhancement of an improvement is limited by how much the improved feature is used. In other words: don’t expect an enhancement proportional to how much you enhanced something.

• Example:
"Suppose a program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
How about making it 5 times faster?

Amdahl’s Law

1. Speedup = 4
2. Old execution time = 100 s
3. New execution time = 100/4 = 25 s
4. 80 seconds are used by the affected part, so
5. Unaffected part = 100 − 80 = 20 s
6. Execution time_new = Execution time_unaffected + Execution time_affected / Improvement
7. 25 = 20 + 80/Improvement
8. Improvement = 16
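The same steps as a Python sketch, which also answers the 5× question. `required_improvement` is a hypothetical helper; it returns `math.inf` when no finite improvement of the affected part can reach the target.

```python
import math


def required_improvement(old_time: float, affected_time: float,
                         target_speedup: float) -> float:
    """Factor by which the affected part must speed up to reach target_speedup.

    Uses new_time = unaffected + affected / improvement (Amdahl's Law).
    Returns math.inf when the unaffected time alone already uses up the
    whole new-time budget, so no finite improvement is enough.
    """
    new_time = old_time / target_speedup
    unaffected = old_time - affected_time
    budget_for_affected = new_time - unaffected
    if budget_for_affected <= 0:
        return math.inf
    return affected_time / budget_for_affected


print(required_improvement(100, 80, 4))  # 16.0
print(required_improvement(100, 80, 5))  # inf
```

For a 5× speedup the new time must be 20 s, which the 20 s of unaffected time already consumes, so no amount of faster multiplication is enough.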

Example: Speed up using parallel processors

Suppose an application is “almost all” parallel: 90%.

What is the speedup using 10, 100, and 1000 processors?

new time = old time × 10% + (old time × 90%) / P, where P is the number of processors

Speedup = old time / new time

Speedup (P = 10) ≈ 5.3

Speedup (P = 100) ≈ 9.2

Speedup (P = 1000) ≈ 9.9
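A Python sketch of the calculation for any processor count P. It assumes the serial 10% does not shrink and the parallel 90% divides evenly across processors with no communication overhead; `parallel_speedup` is a hypothetical name.

```python
def parallel_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's Law: serial part unchanged, parallel part split over P processors."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


for p in (10, 100, 1000):
    print(p, round(parallel_speedup(0.9, p), 2))
# 10 5.26
# 100 9.17
# 1000 9.91
```

As P grows, the speedup approaches 1 / 0.1 = 10, the limit set by the serial 10%.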


Amdahl’s Law Overview
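The examples in this unit are all instances of one formula. Writing F for the fraction of the original execution time that can use an enhancement and S for the speedup of that fraction (symbols chosen here for convenience):

```latex
\text{Speedup}_{\text{overall}}
  = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}
  = \frac{1}{(1 - F) + \dfrac{F}{S}}
```

As S grows without bound, the overall speedup approaches 1/(1 − F).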

Example

• Suppose we are considering an enhancement that runs 10 times faster than the original machine but is only usable 40% of the time. What is the overall speedup gained by incorporating the enhancement?

$$\text{Speedup} = \frac{1}{\dfrac{0.4}{10} + 0.6} = \frac{1}{0.64} \approx 1.56$$
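The same calculation as a Python sketch; `amdahl_speedup` is a hypothetical helper, with F = 0.4 and S = 10 taken from the example.

```python
def amdahl_speedup(fraction_enhanced: float, enhancement_speedup: float) -> float:
    """Overall speedup when a fraction of the original time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    unaffected = 1.0 - fraction_enhanced
    return 1.0 / (unaffected + fraction_enhanced / enhancement_speedup)


print(f"{amdahl_speedup(0.4, 10):.2f}")  # 1.56
```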

Example

• Implementations of floating-point square root vary significantly in performance. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical benchmark on a machine. One proposal is to add FPSQR hardware that will speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions run faster; FP instructions are responsible for a total of 50% of the execution time. The design team believes that they can make all FP instructions run two times faster with the same effort as required for the fast square root. Compare these two design alternatives.

$$\text{Speedup}_{\text{FPSQR}} = \frac{1}{\dfrac{0.2}{10} + (1 - 0.2)} \approx 1.22$$

$$\text{Speedup}_{\text{FP}} = \frac{1}{\dfrac{0.5}{2} + (1 - 0.5)} \approx 1.33$$
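A Python sketch comparing the two alternatives with the same formula; the helper name `amdahl_speedup` is hypothetical, and the fractions and factors come from the problem statement.

```python
def amdahl_speedup(fraction_enhanced: float, enhancement_speedup: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_speedup)


fpsqr = amdahl_speedup(0.20, 10)   # FPSQR: 20% of time, made 10x faster
all_fp = amdahl_speedup(0.50, 2)   # all FP: 50% of time, made 2x faster
print(f"FPSQR: {fpsqr:.2f}, all FP: {all_fp:.2f}")  # FPSQR: 1.22, all FP: 1.33
```

Making all FP instructions only 2× faster wins because it affects 50% of the execution time rather than 20%.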