Performance Measures For Computers

The document discusses performance evaluation of computer systems and describes key parameters like execution time, throughput, and speedup. It explains that performance depends on factors like technology, instruction set architecture, organization, and software. Quantitative principles for comparing the performance of different systems using metrics like clock rate, clock cycles, and instructions per clock cycle are also presented.


Module 2

Performance Evaluation of Computer Systems
Content
• Evaluation Parameters
• Quantitative Principles: Amdahl’s Law

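A Computer System

Computer systems consist of:

– Processor
– Memory (holds instructions and data)
– Input/Output (input unit and output unit)
– Operating system
– Network (connects multiple processors, P)

[Figure: a processor connected to memory, an input unit, and an output unit; several processors (P) linked by a network]
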
Performance Factors

Performance depends on:


– Technology
– Instruction Set Architecture
– Organization
– Software

4
Technology

• In recent years, microprocessors have become smaller and denser.

Technology

                       1945         2020
Computer               ENIAC        Laptop
Devices                18 000       17 000 000 000
Weight (kg)            27 200       1.5
Size (m^3)             68           0.0018
Power (watts)          20 000       5.5
Cost ($)               4 630 000    1 000
Memory (bytes)         200          160 000 000 000
Performance (Flop/s)   800          3 000 000 000

Performance Units

Speed
1 Mflop/s    1 Megaflop/s    10^6 Flop/second
1 Gflop/s    1 Gigaflop/s    10^9 Flop/second
1 Tflop/s    1 Teraflop/s    10^12 Flop/second
1 Pflop/s    1 Petaflop/s    10^15 Flop/second
1 Eflop/s    1 Exaflop/s     10^18 Flop/second

Storage
1 MB    1 Megabyte    10^6 Bytes
1 GB    1 Gigabyte    10^9 Bytes
1 TB    1 Terabyte    10^12 Bytes
1 PB    1 Petabyte    10^15 Bytes

Instruction Set Architecture (ISA)
Instruction Set Design:

• RISC / CISC
– Code density

• Number of operands
– Stack machines (0-operand)
– Accumulator machines (1-operand)
– Register machines (2-operand, 3-operand)

9
Organization

Memory Hierarchy

Hierarchy                                              Speed     Size
Within the processor (CPU registers, on-chip cache)    1 ns      Byte
L2 cache (SRAM)                                        10 ns     KByte
Main memory (DRAM)                                     100 ns    MByte
Secondary storage (disk)                               10 ms     GByte
Tertiary storage (tape/disk)                           10 s      TByte

[Figure: hierarchy from fastest to slowest: CPU, Registers, L1 Cache, L2 Cache, Main Memory, Disk, Tape]

Organization

Multicore Chips

[Figure: a single-core chip (one CPU with its registers and L1 cache, then an L2 cache and main memory) beside a dual-core chip (two CPUs, each with its own registers and L1 cache, then an L2 cache and main memory)]

Performance Evaluation of Computer Systems

EVALUATION PARAMETERS [3]

15
The questions are:
– How to compare performance of two different machines?

– What factors affect performance?

– How to improve performance?

16
Defining Performance [4]
• If you were running a program on two different desktop computers, you’d say that the faster one is the desktop computer that gets the job done first.

• If you were running a datacenter that had several servers running jobs submitted by many users, you’d say that the faster computer was the one that completed the most jobs during a day.

• As an individual computer user, you are interested in reducing response time (the time between the start and completion of a task), also referred to as execution time.

• Datacenter managers are often interested in increasing throughput or bandwidth (the total amount of work done in a given time).

• Hence, in most cases, we will need different performance metrics.

• Response Time
– The time between the start and completion of a task, also referred to as execution time.
• Throughput
– The total amount of work done in a given time.

• Decreasing response time almost always improves throughput.

• To maximize performance, we want to minimize response time or execution time for some task. Thus, we can relate performance and execution time for a computer X:

• Performance_X = 1 / Execution time_X

• This means that for two computers X and Y, if the performance of X is greater than the performance of Y, we have

• Performance_X > Performance_Y

• 1 / Execution time_X > 1 / Execution time_Y

• Execution time_Y > Execution time_X

• That is, the execution time on Y is longer than that on X, if X is faster than Y.
• We often want to relate the performance of two different computers quantitatively. We will use the phrase “X is n times faster than Y” (or equivalently “X is n times as fast as Y”) to mean

• Performance_X / Performance_Y = n

• If X is n times as fast as Y, then the execution time on Y is n times as long as it is on X:

• Performance_X / Performance_Y = Execution time_Y / Execution time_X = n = Speedup

• If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?
• We know that A is n times as fast as B if

• Performance_A / Performance_B = Execution time_B / Execution time_A = n

• Thus the performance ratio is

• 15 / 10 = 1.5

• A is therefore 1.5 times as fast as B.

• In the above example, we could also say that computer B is 1.5 times slower than computer A, since

• Performance_A / Performance_B = 1.5

• Performance_A = 1.5 * Performance_B

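The ratio above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `times_as_fast` is hypothetical and is introduced here for illustration.

```python
def times_as_fast(time_other_s: float, time_this_s: float) -> float:
    """Return n such that 'this' machine is n times as fast as 'other'.

    Performance is the reciprocal of execution time, so the performance
    ratio equals the inverse ratio of the execution times.
    """
    return time_other_s / time_this_s


time_a = 10.0  # seconds on computer A
time_b = 15.0  # seconds on computer B

n = times_as_fast(time_b, time_a)
print(f"A is {n} times as fast as B")
```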
CPU Performance and Its Factors [4]
Clock cycle

Execution time depends on the clock cycle time.

[Figure: clock signal showing cycle 1, cycle 2, cycle 3]

CPU execution time for a program = CPU clock cycles for a program * Clock cycle time
• Clock cycle time = 1 / Clock rate
• So,
CPU execution time for a program = CPU clock cycles for a program / Clock rate
Example:
Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a computer designer build a computer, B, which will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?
Solution

• Clock cycles_A = Execution time_A * Clock rate_A

• = 10 sec * 2 * 10^9 cycles/sec = 20 * 10^9 cycles

• Clock cycles_B = 1.2 * Clock cycles_A

• = 24 * 10^9 cycles

• Clock rate_B = Clock cycles_B / Execution time_B

• = 24 * 10^9 cycles / 6 sec

• = 4 * 10^9 cycles/sec = 4 GHz

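The same steps can be sketched in Python. The function and parameter names below are hypothetical; the numbers are the ones from the example.

```python
def required_clock_rate_hz(time_a_s: float, clock_rate_a_hz: float,
                           cycle_factor: float, target_time_s: float) -> float:
    """Clock rate machine B needs in order to meet target_time_s.

    cycle_factor is how many times more clock cycles B needs than A
    for the same program.
    """
    cycles_a = time_a_s * clock_rate_a_hz   # cycles = time * rate
    cycles_b = cycle_factor * cycles_a      # B needs more cycles than A
    return cycles_b / target_time_s         # rate = cycles / time


rate_b = required_clock_rate_hz(time_a_s=10.0, clock_rate_a_hz=2e9,
                                cycle_factor=1.2, target_time_s=6.0)
print(f"Target clock rate for B: {rate_b / 1e9:.1f} GHz")
```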
The CPU clock cycles required for a program can be defined as:

CPU clock cycles for a program = Instructions for a program * Average clock cycles per instruction

Average clock cycles per instruction (CPI):

Defined as the average number of clock cycles each instruction takes to execute.

• Different instructions may take different amounts of time to execute, so the CPI is an average over all the instructions executed in the program.

Example

• Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program, and by how much?

• Let I be the number of instructions in the program (the same for both computers, since they share the instruction set architecture).

• CPU clock cycles_A = I * 2.0

• CPU clock cycles_B = I * 1.2

• CPU time_A = CPU clock cycles_A * Clock cycle time_A

• = I * 2.0 * 250 ps = 500 * I ps

• CPU time_B = I * 1.2 * 500 ps = 600 * I ps

• CPU performance_A / CPU performance_B = Execution time_B / Execution time_A

• = (600 * I ps) / (500 * I ps) = 1.2

• So A is 1.2 times as fast as B.

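Because the instruction count I is the same on both machines, it cancels in the ratio, so it is enough to compare the time per instruction. A minimal Python sketch of that comparison follows (the helper name is an assumption made for illustration):

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time spent per instruction, in picoseconds."""
    return cpi * cycle_time_ps


per_instr_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
per_instr_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)

# Performance ratio A/B equals the inverse ratio of the execution times.
ratio = per_instr_b / per_instr_a
print(f"A is {ratio:.2f} times as fast as B")
```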
Now we can write CPU execution time as,
• CPU time = Instruction count * CPI * Clock cycle time

• CPU time = ( Instruction count * CPI ) / Clock rate

27
CPU Execution Time: Example
• A program is running on a specific machine (CPU) with the following parameters:
– Total executed instruction count: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction
– CPU clock rate: 200 MHz
What is the execution time for this program?
CPU time = Instruction count x CPI x Clock cycle time
= 10,000,000 x 2.5 x (1 / clock rate)
= 10,000,000 x 2.5 x 5x10^-9
= 0.125 seconds
T = I x CPI x C
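The CPU time equation translates directly into code. The sketch below is illustrative only; `cpu_time_seconds` is a hypothetical helper name.

```python
def cpu_time_seconds(instruction_count: float, cpi: float,
                     clock_rate_hz: float) -> float:
    """T = I x CPI x C, where the clock cycle time C = 1 / clock rate."""
    clock_cycle_s = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_s


t = cpu_time_seconds(instruction_count=10_000_000, cpi=2.5,
                     clock_rate_hz=200e6)
print(f"CPU time: {t:.3f} seconds")
```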
Aspects of CPU Execution Time
CPU Time = Instruction count executed x CPI x Clock cycle

T = I x CPI x C

– Instruction count I (executed) depends on: program used, compiler, ISA
– CPI (average CPI) depends on: program used, compiler, ISA, CPU organization
– Clock cycle C depends on: CPU organization, technology (VLSI)

Factors Affecting CPU Performance

T = I x CPI x C

                                      Instruction   Cycles per    Clock Rate
                                      Count         Instruction   (1/C)
Program                               X             X
Compiler                              X             X
Instruction Set Architecture (ISA)    X             X
Organization (CPU Design)                           X             X
Technology (VLSI)                                                 X

Performance Comparison: Example
A program is running on a specific machine (CPU) with the following parameters:
– Total executed instruction count, I: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction
– CPU clock rate: 200 MHz. Thus: C = 1/(200x10^6) = 5x10^-9 seconds

• Using the same program with these changes:
– A new compiler used: New executed instruction count, I: 9,500,000
  New CPI: 3.0
– Faster CPU implementation: New clock rate = 300 MHz
• What is the speedup with the changes?

Speedup = Old Execution Time / New Execution Time
        = (I_old x CPI_old x Clock cycle_old) / (I_new x CPI_new x Clock cycle_new)
Speedup = (10,000,000 x 2.5 x 5x10^-9) / (9,500,000 x 3 x 3.33x10^-9)
= .125 / .095 = 1.32
or 32% faster after changes.

Clock Cycle = C = 1 / Clock Rate
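A short Python sketch of this before/after comparison is given here. The function name is hypothetical; the inputs are the figures from the example.

```python
def cpu_time_seconds(instruction_count: float, cpi: float,
                     clock_rate_hz: float) -> float:
    """T = I x CPI x C with C = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


old_time = cpu_time_seconds(10_000_000, 2.5, 200e6)  # original compiler and CPU
new_time = cpu_time_seconds(9_500_000, 3.0, 300e6)   # new compiler, faster clock

speedup = old_time / new_time
print(f"old = {old_time:.3f} s, new = {new_time:.3f} s, speedup = {speedup:.2f}")
```

Note that the new compiler raises the CPI from 2.5 to 3.0; the overall gain comes from the lower instruction count and the faster clock together.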


Instruction Types & CPI
• Given a program with n types or classes of instructions executed on a given CPU with the following characteristics:

C_i = Count of instructions of type i executed, i = 1, 2, …, n
CPI_i = Cycles per instruction for type i (depends on CPU design)

Then:
CPI = CPU clock cycles / Instruction count I (i.e. the average or effective CPI)

Where:
$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$

$\text{Executed instruction count } I = \sum_{i=1}^{n} C_i$

Instruction Types & CPI: An Example
• An instruction set has three instruction classes (for a specific CPU design):

Instruction class   CPI
A                   1
B                   2
C                   3

• Two code sequences have the following instruction counts:

                 Instruction counts for instruction class
Code sequence    A    B    C
1                2    1    2
2                4    1    1

• CPU cycles for sequence 1 = 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles
  CPI for sequence 1 = clock cycles / instruction count = 10 / 5 = 2 (i.e. the average or effective CPI)
• CPU cycles for sequence 2 = 4 x 1 + 1 x 2 + 1 x 3 = 9 cycles
  CPI for sequence 2 = 9 / 6 = 1.5

$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$
CPI = CPU cycles / I
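The per-class sums can be sketched in Python as follows. The dictionary layout and function names are assumptions made for illustration; the CPI values and instruction counts are those of the example.

```python
# Cycles per instruction for each instruction class (from the example).
CLASS_CPI = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """CPU clock cycles = sum over classes of CPI_i * C_i."""
    return sum(CLASS_CPI[cls] * n for cls, n in counts.items())


def effective_cpi(counts: dict) -> float:
    """Effective CPI = total clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    print(name, total_cycles(seq), "cycles, CPI =", effective_cpi(seq))
```

Sequence 2 executes more instructions (6 versus 5) but needs fewer cycles (9 versus 10), so it is the faster of the two on this CPU.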
Instruction Frequency & CPI
• Given a program with n types or classes of instructions with the following characteristics:
• C_i = Count of instructions of type i executed, i = 1, 2, …, n
• CPI_i = Average cycles per instruction of type i
• F_i = Frequency or fraction of instruction type i executed
      = C_i / total executed instruction count = C_i / I

Where: Executed instruction count $I = \sum_{i=1}^{n} C_i$

• Then:
$CPI = \sum_{i=1}^{n} (CPI_i \times F_i)$ (i.e. the average or effective CPI)

Fraction of total execution time for instructions of type i = (CPI_i x F_i) / CPI

Instruction Type Frequency & CPI:
A RISC Example

Program profile or executed instruction mix for a base machine (Reg / Reg). The mix is given (a typical mix); the CPI_i values depend on the CPU design.

Op       Freq, F_i   CPI_i   CPI_i x F_i   % Time = (CPI_i x F_i) / CPI
ALU      50%         1       .5            23% = .5/2.2
Load     20%         5       1.0           45% = 1/2.2
Store    10%         3       .3            14% = .3/2.2
Branch   20%         2       .4            18% = .4/2.2
                             Sum = 2.2

$CPI = \sum_{i=1}^{n} (CPI_i \times F_i)$ (i.e. the average or effective CPI)

CPI = .5 x 1 + .2 x 5 + .1 x 3 + .2 x 2
    = .5 + 1 + .3 + .4 = 2.2
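The table can be reproduced with a small Python sketch. The data structure and variable names are assumptions for illustration; the frequencies and CPI values come from the table.

```python
# Instruction mix: op -> (frequency F_i, cycles per instruction CPI_i)
MIX = {
    "ALU":    (0.5, 1),
    "Load":   (0.2, 5),
    "Store":  (0.1, 3),
    "Branch": (0.2, 2),
}

# Effective CPI = sum of CPI_i * F_i over all instruction types.
cpi = sum(freq * cycles for freq, cycles in MIX.values())

# Share of execution time for each type = (CPI_i * F_i) / CPI.
time_share = {op: freq * cycles / cpi for op, (freq, cycles) in MIX.items()}

print(f"Effective CPI = {cpi:.1f}")
for op, share in time_share.items():
    print(f"{op:<7}{share:6.1%}")
```

Loads are only 20% of the executed instructions but account for roughly 45% of the execution time, which is why they are a natural target for enhancement.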
Computer Performance Measures :
MIPS (Million Instructions Per Second) Rating
• For a specific program running on a specific CPU the MIPS rating is a measure of
how many millions of instructions are executed per second:
MIPS Rating = Instruction count / (Execution time x 10^6)
= Instruction count / (CPU clocks x Cycle time x 10^6)
= (Instruction count x Clock rate) / (Instruction count x CPI x 10^6)
= Clock rate / (CPI x 10^6)
• Major problem with MIPS rating: As shown above, the MIPS rating does not account for the count of instructions executed (I).
– A higher MIPS rating in many cases may not mean higher performance or better execution time, e.g. due to compiler design variations.
• In addition the MIPS rating:
– Does not account for the instruction set architecture (ISA) used.
• Thus it cannot be used to compare computers/CPUs with different instruction sets.
– Easy to abuse: Program used to get the MIPS rating is often omitted.
• Often the Peak MIPS rating is provided for a given CPU which is obtained using a
program comprised entirely of instructions with the lowest CPI for the given CPU
design which does not represent real programs.

T = I x CPI x C

36
Computer Performance Measures :
MIPS (Million Instructions Per Second) Rating

• Under what conditions can the MIPS rating be used to compare performance of different CPUs?
• The MIPS rating is only valid to compare the performance of different CPUs provided that the following conditions are satisfied:
1 The same program is used (actually this applies to all performance metrics)
2 The same ISA is used
3 The same compiler is used

(Thus the resulting programs used to run on the CPUs and obtain the MIPS rating are identical at the machine code (binary) level, including the same instruction count.)

Compiler Variations, MIPS & Performance:
An Example
• For a machine (CPU) with instruction classes:

Instruction class   CPI
A                   1
B                   2
C                   3

• For a given high-level language program, two compilers produced the following executed instruction counts:

               Instruction counts (in millions) for each instruction class
Code from:     A     B     C
Compiler 1     5     1     1
Compiler 2     10    1     1

• The machine is assumed to run at a clock rate of 100 MHz.

Compiler Variations, MIPS & Performance:
An Example (Continued)
MIPS = Clock rate / (CPI x 10^6) = (100 x 10^6) / (CPI x 10^6) = 100 / CPI

CPI = CPU execution cycles / Instruction count

$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$

CPU time = Instruction count x CPI / Clock rate

• For compiler 1:
– CPI_1 = (5 x 1 + 1 x 2 + 1 x 3) / (5 + 1 + 1) = 10 / 7 = 1.428
– MIPS Rating_1 = (100 x 10^6) / (1.428 x 10^6) = 70.0 MIPS
– CPU time_1 = ((5 + 1 + 1) x 10^6 x 1.428) / (100 x 10^6) = 0.10 seconds
• For compiler 2:
– CPI_2 = (10 x 1 + 1 x 2 + 1 x 3) / (10 + 1 + 1) = 15 / 12 = 1.25
– MIPS Rating_2 = (100 x 10^6) / (1.25 x 10^6) = 80.0 MIPS
– CPU time_2 = ((10 + 1 + 1) x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds

The MIPS rating indicates that compiler 2 is better, while in reality the code produced by compiler 1 is faster.

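The mismatch between MIPS rating and execution time can be reproduced with a short Python sketch. The function `analyze` and the constant names are hypothetical; the CPI values, instruction counts, and clock rate are those of the example.

```python
CLASS_CPI = {"A": 1, "B": 2, "C": 3}   # cycles per instruction, per class
CLOCK_RATE_HZ = 100e6                  # 100 MHz


def analyze(counts_millions: dict) -> tuple:
    """Return (CPI, MIPS rating, CPU time in seconds) for one compiler."""
    instructions = sum(counts_millions.values()) * 1e6
    cycles = sum(CLASS_CPI[c] * n for c, n in counts_millions.items()) * 1e6
    cpi = cycles / instructions
    mips = CLOCK_RATE_HZ / (cpi * 1e6)
    cpu_time = cycles / CLOCK_RATE_HZ   # same as instructions * CPI / clock rate
    return cpi, mips, cpu_time


cpi_1, mips_1, time_1 = analyze({"A": 5, "B": 1, "C": 1})
cpi_2, mips_2, time_2 = analyze({"A": 10, "B": 1, "C": 1})

print(f"Compiler 1: CPI={cpi_1:.3f}  MIPS={mips_1:.1f}  time={time_1:.2f} s")
print(f"Compiler 2: CPI={cpi_2:.3f}  MIPS={mips_2:.1f}  time={time_2:.2f} s")
```

Compiler 2 gets the higher MIPS rating only because it executes many more cheap class-A instructions; its total cycle count, and therefore its execution time, is larger.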
Computer Performance Measures :
MFLOPS (Million FLOating-Point Operations Per Second)
• A floating-point operation is an addition, subtraction, multiplication, or division
operation applied to numbers represented by a single or a double precision floating-
point representation.
• MFLOPS, for a specific program running on a specific computer, is a measure of millions of floating-point operations (megaflops) per second:
MFLOPS = Number of floating-point operations / (Execution time x 10^6)
• The MFLOPS rating is a better comparison measure between different machines than the MIPS rating.
– Applicable even if ISAs are different
• Program-dependent: Different programs have different percentages of floating-point operations present, e.g. compilers have no floating-point operations and yield a MFLOPS rating of zero.
• Dependent on the type of floating-point operations present in the program.
– Peak MFLOPS rating for a CPU: Obtained using a program comprised entirely
of the simplest floating point instructions (with the lowest CPI) for the given
CPU design which does not represent real floating point programs.

40
Quantitative Principles
of Computer Design
• Amdahl’s Law:
The performance gain from improving some portion of a computer (i.e. using some enhancement) is calculated by:

Speedup = (Performance for the entire task using the enhancement) / (Performance for the entire task without using the enhancement)

or Speedup = (Execution time for the entire task without the enhancement) / (Execution time for the entire task using the enhancement)

Here: Task = Program. Recall: Performance = 1 / Execution Time

Performance Enhancement Calculations:
Amdahl's Law
• Amdahl’s Law:
Speedup(E) = Execution time without enhancement / Execution time with enhancement
• Suppose a program contains a portion s of code that requires sequential execution and a portion f of code that can run in parallel.

• We can write s = 1 - f

• Let T be the execution time of the program on a single processor

• So, T = T(1-f) + Tf

• If N processors are used, only the parallel portion is divided among them, so the execution time becomes

• T_N = T(1-f) + Tf / N

$\text{Speedup} = \frac{T(1-f) + Tf}{T(1-f) + \frac{Tf}{N}} = \frac{1}{(1-f) + \frac{f}{N}}$
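The formula above is easy to explore numerically. The following Python sketch is illustrative only: the function name and the parallel fraction f = 0.9 are assumptions chosen for the demonstration, not values from the slides.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup = 1 / ((1 - f) + f / N) for parallel fraction f on N processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / n_processors)


f = 0.9  # hypothetical: 90% of the program can run in parallel
for n in (1, 2, 4, 8, 16, 1024):
    print(f"N = {n:5d}  speedup = {amdahl_speedup(f, n):.2f}")

# As N grows, the speedup approaches 1 / (1 - f) and never exceeds it.
print(f"Upper bound: {1.0 / (1.0 - f):.2f}")
```

The sequential portion sets the ceiling: with f = 0.9 the speedup can never exceed 10, however many processors are added.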
Pictorial Depiction of Amdahl’s Law

[Figure: Illustration of Amdahl’s Law [5]]

[Figure: Amdahl’s Law for Multiprocessors [5]]
Example
• Suppose that a task makes extensive use of floating-point operations, with 40% of the time consumed by floating-point operations. With a new hardware design, the floating-point module is sped up by a factor of 4. What is the overall speedup?

• Solution: Fraction of time in floating-point operations = 0.4
• Fraction of time in non-floating-point operations = 1 - 0.4 = 0.6
• Speedup = 1 / (0.6 + 0.4/4) = 1 / 0.7 = 1.43

Performance Enhancement Example
• For the RISC machine with the following instruction mix given earlier:

Op       Freq   Cycles   CPI(i)   % Time = (CPI_i x F_i) / CPI
ALU      50%    1        .5       23%
Load     20%    5        1.0      45%
Store    10%    3        .3       14%
Branch   20%    2        .4       18%
CPI = 2.2

• If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the resulting performance improvement from this enhancement?
Fraction enhanced = F = 45% or .45
Unaffected fraction = 1 - F = 100% - 45% = 55% or .55
Factor of enhancement = S = 5/2 = 2.5
Using Amdahl’s Law:
Speedup(E) = 1 / ((1 - F) + F/S) = 1 / (.55 + .45/2.5) = 1.37

An Alternative Solution Using CPU Equation

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        .5       23%
Load     20%    5        1.0      45%
Store    10%    3        .3       14%
Branch   20%    2        .4       18%
CPI = 2.2

• If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the resulting performance improvement from this enhancement?
The CPI of load is now 2 instead of 5.
Old CPI = 2.2
New CPI = .5 x 1 + .2 x 2 + .1 x 3 + .2 x 2 = 1.6
Speedup(E) = Original Execution Time / New Execution Time
           = (Instruction count x old CPI x clock cycle) / (Instruction count x new CPI x clock cycle)
           = old CPI / new CPI = 2.2 / 1.6 = 1.375
This agrees with the speedup of about 1.37 obtained from Amdahl’s Law in the first solution; the small difference comes from rounding the load fraction to 45%.

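A quick way to see that the two solutions agree is to compute both in Python. This is a sketch under the assumptions of the example; the helper name and data layout are hypothetical.

```python
# Instruction mix: op -> (frequency, cycles per instruction)
OLD_MIX = {"ALU": (0.5, 1), "Load": (0.2, 5), "Store": (0.1, 3), "Branch": (0.2, 2)}
NEW_MIX = dict(OLD_MIX, Load=(0.2, 2))  # enhancement: load CPI drops from 5 to 2


def effective_cpi(mix: dict) -> float:
    """Effective CPI = sum of F_i * CPI_i."""
    return sum(freq * cycles for freq, cycles in mix.values())


old_cpi = effective_cpi(OLD_MIX)
new_cpi = effective_cpi(NEW_MIX)

# Method 1: CPU equation. Instruction count and clock cycle cancel.
speedup_cpu_equation = old_cpi / new_cpi

# Method 2: Amdahl's Law, using the unrounded fraction of time spent in loads.
load_time_fraction = 0.2 * 5 / old_cpi   # about 0.4545, not the rounded 0.45
load_enhancement = 5 / 2
speedup_amdahl = 1.0 / ((1.0 - load_time_fraction)
                        + load_time_fraction / load_enhancement)

print(f"CPU equation: {speedup_cpu_equation:.3f}")
print(f"Amdahl's Law: {speedup_amdahl:.3f}")
```

With the unrounded load fraction, both methods give the same value.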
Performance Enhancement Example
• A program runs in 100 seconds on a machine with multiply operations responsible for 80 seconds of this time. By how much must the speed of multiplication be improved to make the program four times faster?

Speedup(E) = 1 / ((1 - F) + F/S)
4 = 1 / ((1 - .8) + .8/S) = 1 / (.2 + .8/S)

Solving for S gives S = 16.

Hence multiplication should be 16 times faster to get an overall speedup of 4.

Performance Enhancement Example
(Here: Machine = CPU)
• For the previous example, with a program running in 100 seconds on a machine with multiply operations responsible for 80 seconds of this time: by how much must the speed of multiplication be improved to make the program five times faster?

Speedup = 1 / (0.2 + (0.8/K))
5 = 1 / (0.2 + (0.8/K))
1 + 4/K = 1
K + 4 = K
4 = 0

This is a contradiction, so no amount of multiplication speed improvement can achieve this: the 20 seconds of non-multiply time alone already equals the 100/5 = 20 seconds allowed for the whole program.

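Both multiplication examples can be handled by one small routine that solves Amdahl's Law for the enhancement factor. The sketch below is illustrative; the function name is hypothetical, and it works with times in seconds (as the examples do) rather than fractions.

```python
from typing import Optional


def required_enhancement(total_time_s: float, affected_time_s: float,
                         target_speedup: float) -> Optional[float]:
    """Factor by which the affected part must be sped up to hit target_speedup.

    Returns None when the target is unreachable, i.e. when the unaffected
    time alone uses up the whole time budget.
    """
    unaffected_time_s = total_time_s - affected_time_s
    target_time_s = total_time_s / target_speedup
    budget_s = target_time_s - unaffected_time_s  # time left for the affected part
    if budget_s <= 0:
        return None
    return affected_time_s / budget_s


factor_for_4x = required_enhancement(100.0, 80.0, 4.0)
factor_for_5x = required_enhancement(100.0, 80.0, 5.0)

print("4x overall:", factor_for_4x)
print("5x overall:", factor_for_5x)
```

For a fourfold speedup the program must finish in 25 seconds, leaving 5 seconds for multiplication (80 / 5 = 16). For a fivefold speedup the budget is 20 seconds, all of which is consumed by the unaffected part, so the function returns `None`.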
Extending Amdahl's Law To Multiple Enhancements
n enhancements each affecting a different portion of execution time

• Suppose that enhancement E_i accelerates a fraction F_i of the original execution time by a factor S_i, and the remainder of the time is unaffected (i = 1, 2, …, n). Then:

$\text{Speedup} = \frac{\text{Original Execution Time}}{\left( \left(1 - \sum_i F_i\right) + \sum_i \frac{F_i}{S_i} \right) \times \text{Original Execution Time}}$

$\text{Speedup} = \frac{1}{\left(1 - \sum_i F_i\right) + \sum_i \frac{F_i}{S_i}}$

where $1 - \sum_i F_i$ is the unaffected fraction.

Note: All fractions F_i refer to original execution time before the enhancements are applied.

What if the fractions given are after the enhancements were applied? How would you solve the problem (i.e. find an expression for the speedup)?
Amdahl's Law With Multiple Enhancements:
Example
• Three CPU performance enhancements are proposed with the following speedups and percentage of the code execution time affected:
Speedup_1 = S_1 = 10    Percentage_1 = F_1 = 20%
Speedup_2 = S_2 = 15    Percentage_2 = F_2 = 15%
Speedup_3 = S_3 = 30    Percentage_3 = F_3 = 10%
• While all three enhancements are in place in the new design, each enhancement affects a different portion of the code and only one enhancement can be used at a time.
• What is the resulting overall speedup?

$\text{Speedup} = \frac{1}{\left(1 - \sum_i F_i\right) + \sum_i \frac{F_i}{S_i}}$

• Speedup = 1 / [(1 - .2 - .15 - .1) + .2/10 + .15/15 + .1/30]
          = 1 / [.55 + .0333]
          = 1 / .5833 = 1.71

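The multiple-enhancement form of Amdahl's Law can be sketched in Python as follows. The function name and the list-of-pairs input format are assumptions made for illustration; the fractions and speedups are the ones from the example.

```python
def multi_enhancement_speedup(enhancements: list) -> float:
    """Overall speedup for non-overlapping enhancements.

    enhancements is a list of (F_i, S_i) pairs, where F_i is the fraction of
    the ORIGINAL execution time affected and S_i is the speedup of that part.
    """
    total_fraction = sum(f for f, _ in enhancements)
    if total_fraction > 1.0:
        raise ValueError("fractions of original time cannot sum to more than 1")
    unaffected = 1.0 - total_fraction
    enhanced = sum(f / s for f, s in enhancements)
    return 1.0 / (unaffected + enhanced)


overall = multi_enhancement_speedup([(0.20, 10), (0.15, 15), (0.10, 30)])
print(f"Overall speedup: {overall:.2f}")
```

Even though the individual speedups are 10x to 30x, the overall gain stays below 2x because 55% of the original execution time is not affected by any enhancement.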
Pictorial Depiction of Example

Before:
Execution time with no enhancements: 1 (i.e. normalized to 1)

Unaffected, fraction: .55 | F1 = .2 (S1 = 10) | F2 = .15 (S2 = 15) | F3 = .1 (S3 = 30)

After:
The unaffected fraction (.55) is unchanged; F1, F2, and F3 are divided by 10, 15, and 30 respectively.
Execution time with enhancements: .55 + .02 + .01 + .00333 = .5833

Speedup = 1 / .5833 = 1.71

Note: All fractions F_i refer to original execution time.

What if the fractions given are after the enhancements were applied? How would you solve the problem?

That’s all for now…

53
References
• [1] “Cramming more components onto integrated circuits”, reprinted from Electronics, volume 38, number 8, April 19, 1965, pp. 114 ff.
• [2] https://www.top500.org/lists/top500/2020/06/highs
• [3] M. Morris Mano, “Computer System Architecture”, 3rd edition, Pearson Education
• [4] David A. Patterson (University of California, Berkeley) and John L. Hennessy (Stanford University), “Computer Organization and Design: The Hardware/Software Interface”
• [5] William Stallings, “Computer Organization and Architecture: Designing for Performance”, Tenth Edition

