Performance Measures For Computers
A Computer System
– Input/Output
– Operating system
– Network
[Figure: an input unit, a processor and an output unit; several processors (P) connected by a network]
Performance Factors
Technology
ENIAC (1945) compared with a laptop (2020)
                        1945 (ENIAC)     2020 (Laptop)
Devices                 18 000           17 000 000 000
Weight (kg)             27 200           1.5
Size (m³)               68               0.0018
Power (watts)           20 000           5.5
Cost ($)                4 630 000        1 000
Memory (bytes)          200              160 000 000 000
Performance (Flop/s)    800              3 000 000 000
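To make the scale of the change concrete, the short Python sketch below divides the 2020 column by the 1945 column for a few rows of the table. It uses only the figures listed above; the choice of metrics (including flop/s per watt as an efficiency measure) is ours, for illustration.

```python
# Values copied from the ENIAC (1945) vs. laptop (2020) table above.
eniac = {"weight_kg": 27_200, "power_w": 20_000, "flops": 800}
laptop = {"weight_kg": 1.5, "power_w": 5.5, "flops": 3_000_000_000}

# How many times more floating-point performance the laptop delivers.
perf_gain = laptop["flops"] / eniac["flops"]

# How many times lighter the laptop is (lower is better, so invert the ratio).
weight_gain = eniac["weight_kg"] / laptop["weight_kg"]

# Energy efficiency: flop/s delivered per watt consumed, laptop relative to ENIAC.
efficiency_gain = (laptop["flops"] / laptop["power_w"]) / (eniac["flops"] / eniac["power_w"])

print(f"Performance:     {perf_gain:,.0f}x higher")
print(f"Weight:          {weight_gain:,.0f}x lower")
print(f"Flop/s per watt: {efficiency_gain:.2e}x higher")
```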
Performance Units
Speed
1 Mflop/s   1 Megaflop/s   10^6 Flop/second
1 Gflop/s   1 Gigaflop/s   10^9 Flop/second
1 Tflop/s   1 Teraflop/s   10^12 Flop/second
1 Pflop/s   1 Petaflop/s   10^15 Flop/second
1 Eflop/s   1 Exaflop/s    10^18 Flop/second
Storage
1 MB        1 Megabyte     10^6 Bytes
1 GB        1 Gigabyte     10^9 Bytes
1 TB        1 Terabyte     10^12 Bytes
1 PB        1 Petabyte     10^15 Bytes
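The units in these tables are decimal powers of ten. A small Python helper (the name `format_si` is ours, not from the slides) can pick the largest prefix from the table for a raw number. It assumes the decimal prefixes listed above, not the binary KiB/MiB/GiB convention.

```python
# Decimal (SI) prefixes from the table above: each step is a factor of 1000.
PREFIXES = [(10**18, "E"), (10**15, "P"), (10**12, "T"), (10**9, "G"), (10**6, "M")]

def format_si(value: float, unit: str) -> str:
    """Scale a value by the largest prefix that keeps it >= 1 (e.g. flop/s or B)."""
    for scale, prefix in PREFIXES:
        if value >= scale:
            return f"{value / scale:.1f} {prefix}{unit}"
    return f"{value:.1f} {unit}"   # below 1 mega: print without a prefix

print(format_si(3_000_000_000, "flop/s"))   # laptop performance from the technology table
print(format_si(160_000_000_000, "B"))      # laptop memory from the technology table
```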
Performance Evaluation of
Computer Systems
Instruction Set Architecture (ISA)
Instruction Set Design:
• RISC / CISC
– Code density
• Number of operands
– Stack machines (0-operand)
– Accumulator machines (1-operand)
– Register machines (2-operand, 3-operand)
Organization
Multicore Chips
[Figure: a single-core chip (one CPU with its own registers, L1 cache and L2 cache) next to a dual-core chip (two CPUs, each with its own registers and L1 cache, sharing one L2 cache)]
The questions are:
– How to compare performance of two different machines?
Defining Performance [4]
• If you were running a program on two different desktop computers, you’d
say that the faster one is the desktop computer that gets the job done first.
• If you were running a datacenter that had several servers running jobs
submitted by many users, you’d say that the faster computer was the one that
completed the most jobs during a day.
• Response Time
– The time between the start and completion of a task, also referred to as
execution time.
• Throughput
– The total amount of work done in a given time.
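• To maximize performance, we want to minimize response time or execution
time for some task. Thus, we can relate performance and execution time for a
computer X:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
• This means that for two computers X and Y, if the performance of X is greater
than the performance of Y, we have
$$\frac{1}{\text{Execution time}_X} > \frac{1}{\text{Execution time}_Y} \quad\Longrightarrow\quad \text{Execution time}_Y > \text{Execution time}_X$$
• We say that X is n times as fast as Y if
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$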
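• If computer A runs a program in 10 seconds and computer B runs the same program in
15 seconds, how much faster is A than B?
• We know that A is n times as fast as B if
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$
• 15 / 10 = 1.5, so A is 1.5 times as fast as B.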
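The comparison above reduces to a ratio of execution times. A minimal Python sketch of that rule (the function name `times_as_fast` is ours) looks like this:

```python
def times_as_fast(time_a: float, time_b: float) -> float:
    """Return n such that A is n times as fast as B.

    Performance = 1 / Execution time, so
    Performance_A / Performance_B = Execution time_B / Execution time_A.
    """
    if time_a <= 0 or time_b <= 0:
        raise ValueError("execution times must be positive")
    return time_b / time_a

n = times_as_fast(time_a=10.0, time_b=15.0)
print(f"A is {n} times as fast as B")
```

A result below 1 means A is actually slower than B.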
CPU Performance and Its Factors [4]
CPU execution time for a program = CPU clock cycles for a program x Clock cycle time
• Clock cycle time = 1 / Clock rate
• So,
CPU execution time for a program = CPU clock cycles for a program / Clock rate
Example:
Our favorite program runs in 10 seconds on computer A, which has a 2 GHz
clock. We are trying to help a computer designer build a computer, B, which
will run this program in 6 seconds. The designer has determined that a
substantial increase in the clock rate is possible, but this increase will affect the
rest of the CPU design, causing computer B to require 1.2 times as many clock
cycles as computer A for this program. What clock rate should we tell the
designer to target?
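Solution
• CPU time_A = CPU clock cycles_A / Clock rate_A, so
  CPU clock cycles_A = 10 s x 2 x 10^9 cycles/s = 20 x 10^9 cycles
• CPU clock cycles_B = 1.2 x CPU clock cycles_A = 1.2 x 20 x 10^9 = 24 x 10^9 cycles
• Clock rate_B = CPU clock cycles_B / CPU time_B = (24 x 10^9 cycles) / 6 s = 4 x 10^9 cycles/s = 4 GHz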
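The same three steps (cycles on A, scaled cycles on B, divide by B's target time) can be written as a small Python function. The function and parameter names are our own, chosen for illustration.

```python
def required_clock_rate(time_a_s: float, clock_rate_a_hz: float,
                        cycle_factor_b: float, target_time_b_s: float) -> float:
    """Clock rate B needs to hit its target time, given A's time and clock rate."""
    cycles_a = time_a_s * clock_rate_a_hz        # CPU clock cycles_A = CPU time_A x Clock rate_A
    cycles_b = cycle_factor_b * cycles_a         # B needs cycle_factor_b times as many cycles
    return cycles_b / target_time_b_s            # Clock rate_B = cycles_B / CPU time_B

rate_b = required_clock_rate(time_a_s=10, clock_rate_a_hz=2e9,
                             cycle_factor_b=1.2, target_time_b_s=6)
print(f"Target clock rate for B: {rate_b / 1e9:.1f} GHz")
```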
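The number of CPU clock cycles required for a program can be defined as:
CPU clock cycles = Instruction count x Average clock cycles per instruction (CPI)
• Because different instructions may take different amounts of time to execute, the CPI is
the average number of clock cycles per instruction over all instructions executed in the program.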
Example
• CPU clock cycles_A = I x 2.0
Now we can write CPU execution time as:
• CPU time = Instruction count x CPI x Clock cycle time
CPU Execution Time: Example
• A program is running on a specific machine (CPU)
with the following parameters:
– Total executed instruction count: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction
– CPU clock rate: 200 MHz
What is the execution time for this program?
CPU time = Instruction count x CPI x Clock cycle time
         = 10,000,000 x 2.5 x 1 / clock rate
         = 10,000,000 x 2.5 x 5 x 10^-9
         = 0.125 seconds
T = I x CPI x C
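The CPU time equation T = I x CPI x C translates directly into code. A minimal Python sketch, with the clock cycle time derived from the clock rate as C = 1 / clock rate:

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time T = I x CPI x C, where the clock cycle time C = 1 / clock rate."""
    cycle_time_s = 1 / clock_rate_hz
    return instruction_count * cpi * cycle_time_s

t = cpu_time(instruction_count=10_000_000, cpi=2.5, clock_rate_hz=200e6)
print(f"T = {t:.3f} s")
```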
Aspects of CPU Execution Time
CPU time = Instruction count executed x CPI x Clock cycle
• Instruction count I (executed) depends on: program used, compiler, ISA
• CPI (average CPI) depends on: program used, compiler, ISA, CPU organization
• Clock cycle C depends on: CPU organization, technology (VLSI)
Factors Affecting CPU Performance
T = I x CPI x C
                                  Instruction   Cycles per          Clock rate
                                  count (I)     instruction (CPI)   (1/C)
Program                           X             X
Compiler                          X             X
Instruction Set Architecture      X             X
Organization (CPU design)                       X                   X
Technology (VLSI)                                                   X
Performance Comparison: Example
A program is running on a specific machine (CPU) with the following
parameters:
– Total executed instruction count, I: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction
– CPU clock rate: 200 MHz. Thus: C = 1/(200 x 10^6) = 5 x 10^-9 seconds
Instruction Types & CPI: An Example
• An instruction set has three instruction classes (CPI values for a specific CPU design):
  Instruction class   CPI
  A                   1
  B                   2
  C                   3
• Two code sequences have the following instruction counts:
                   Instruction counts for instruction class
  Code sequence    A     B     C
  1                2     1     2
  2                4     1     1
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$
where C_i is the number of executed instructions of class i.
CPI = CPU clock cycles / I
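Applying the two formulas above to both code sequences is a short loop. The Python sketch below (dictionary and function names are ours) counts instructions and cycles for each sequence and derives its CPI:

```python
# Per-class CPI for this specific CPU design (from the table above).
CLASS_CPI = {"A": 1, "B": 2, "C": 3}

# Executed instruction counts per class for each code sequence.
sequences = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}

def cycles_and_cpi(counts):
    """Return (CPU clock cycles, instruction count I, CPI) for one sequence."""
    cycles = sum(CLASS_CPI[cls] * n for cls, n in counts.items())   # sum of CPI_i x C_i
    instructions = sum(counts.values())                              # I
    return cycles, instructions, cycles / instructions               # CPI = cycles / I

for seq, counts in sequences.items():
    cycles, instructions, cpi = cycles_and_cpi(counts)
    print(f"Sequence {seq}: I = {instructions}, cycles = {cycles}, CPI = {cpi:.1f}")
```

Sequence 2 executes more instructions (6 vs. 5) yet needs fewer clock cycles (9 vs. 10), which is why instruction count alone does not predict execution time.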
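Instruction Frequency & CPI
• Given a program with n types or classes of instructions
with the following characteristics:
• C_i = count of instructions of type i executed, i = 1, 2, …, n
• CPI_i = average cycles per instruction for type i
• F_i = C_i / I = frequency of instructions of type i
• Then:
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$$
$$\text{Fraction of total execution time for instructions of type } i = \frac{\text{CPI}_i \times F_i}{\text{CPI}}$$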
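Instruction Type Frequency & CPI:
A RISC Example
Program profile or executed instructions mix for a base machine (Reg / Reg); CPI_i depends on the CPU design.
Typical mix:
  Op       Freq (F_i)   CPI_i   CPI_i x F_i   % Time (CPI_i x F_i / CPI)
  ALU      50%          1       .5            23%
  Load     20%          5       1.0           45%
  Store    10%          3       .3            14%
  Branch   20%          2       .4            18%
                                Sum = 2.2
i.e., the average or effective CPI:
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$$
CPI = .5 x 1 + .2 x 5 + .1 x 3 + .2 x 2 = .5 + 1 + .3 + .4 = 2.2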
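The effective CPI and the "% Time" column can both be computed from the mix. A minimal Python sketch, where the `mix` dictionary simply restates the table above:

```python
# Executed-instruction mix from the RISC example: op -> (frequency F_i, CPI_i)
mix = {
    "ALU":    (0.5, 1),
    "Load":   (0.2, 5),
    "Store":  (0.1, 3),
    "Branch": (0.2, 2),
}

def effective_cpi(mix):
    """CPI = sum over instruction types of CPI_i x F_i."""
    return sum(freq * cpi_i for freq, cpi_i in mix.values())

cpi = effective_cpi(mix)
print(f"CPI = {cpi:.1f}")
for op, (freq, cpi_i) in mix.items():
    # Fraction of execution time spent on type i = CPI_i x F_i / CPI
    print(f"{op:<6} {freq * cpi_i / cpi:.0%} of time")
```

Loads are only 20% of the instructions but take about 45% of the time, which makes them the natural target for the enhancement examples later on.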
Computer Performance Measures:
MIPS (Million Instructions Per Second) Rating
• For a specific program running on a specific CPU, the MIPS rating is a measure of
how many millions of instructions are executed per second:
MIPS Rating = Instruction count / (Execution time x 10^6)
            = Instruction count / (CPU clocks x Cycle time x 10^6)
            = (Instruction count x Clock rate) / (Instruction count x CPI x 10^6)
            = Clock rate / (CPI x 10^6)
• Major problem with the MIPS rating: as shown above, the MIPS rating does not account for the
count of instructions executed (I).
– A higher MIPS rating in many cases may not mean higher performance or better
execution time, e.g., due to compiler design variations.
• In addition, the MIPS rating:
– Does not account for the instruction set architecture (ISA) used.
• Thus it cannot be used to compare computers/CPUs with different instruction sets.
– Is easy to abuse: the program used to get the MIPS rating is often omitted.
• Often the peak MIPS rating is quoted for a given CPU; it is obtained using a
program composed entirely of the instructions with the lowest CPI for the given CPU
design, which does not represent real programs.
Compiler Variations, MIPS & Performance:
An Example
• For a machine (CPU) with instruction classes:
  Instruction class   CPI
  A                   1
  B                   2
  C                   3
Compiler Variations, MIPS & Performance:
An Example (Continued)
MIPS = Clock rate / (CPI x 10^6) = 100 MHz / (CPI x 10^6)
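The instruction counts produced by the two compilers are not reproduced here, so the Python sketch below uses HYPOTHETICAL counts (compiler X and compiler Y are made up for illustration). It keeps the 100 MHz clock and the class CPIs above and shows how a compiler can earn a higher MIPS rating while producing a slower program.

```python
CLOCK_RATE_HZ = 100e6                 # 100 MHz, as in the example
CLASS_CPI = {"A": 1, "B": 2, "C": 3}  # per-class CPI from the table

# HYPOTHETICAL executed instruction counts for two compilers (not from the slides).
compilers = {
    "compiler X": {"A": 4e6, "B": 1e6, "C": 1e6},
    "compiler Y": {"A": 8e6, "B": 1e6, "C": 1e6},
}

def evaluate(counts):
    """Return (execution time in seconds, MIPS rating) for one instruction mix."""
    instructions = sum(counts.values())
    cycles = sum(CLASS_CPI[c] * n for c, n in counts.items())
    cpi = cycles / instructions
    time_s = cycles / CLOCK_RATE_HZ       # CPU time = cycles / clock rate
    mips = CLOCK_RATE_HZ / (cpi * 1e6)    # MIPS = Clock rate / (CPI x 10^6)
    return time_s, mips

for name, counts in compilers.items():
    time_s, mips = evaluate(counts)
    print(f"{name}: time = {time_s:.2f} s, MIPS = {mips:.1f}")
```

With these made-up counts, compiler Y adds many cheap class-A instructions: its CPI drops (1.3 vs. 1.5), so its MIPS rating rises, but it needs more total cycles and runs longer.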
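Quantitative Principles
of Computer Design
• Amdahl’s Law:
The performance gain from improving some portion of a computer
(i.e., using some enhancement E) is calculated by:
$$\text{Speedup}(E) = \frac{\text{Performance with enhancement}}{\text{Performance without enhancement}} = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}}$$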
Performance Enhancement Calculations:
Amdahl's Law
• Amdahl’s Law:
$$\text{Speedup}(E) = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}}$$
• Suppose a program contains a portion s = 1 − f of code that requires sequential
execution and a portion f of code that can run in parallel on N processors.
• So, the execution time on N processors is
$$T_N = T(1-f) + \frac{Tf}{N}$$
$$\text{Speedup} = \frac{T}{T(1-f) + \frac{Tf}{N}} = \frac{1}{(1-f) + \frac{f}{N}}$$
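The speedup formula above is easy to tabulate. The Python sketch below assumes, purely for illustration, a parallel fraction f = 0.8 (not a value from the slides) and evaluates it for several processor counts N:

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup = 1 / ((1 - f) + f / N) for parallel fraction f on N processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if n < 1:
        raise ValueError("N must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)

# Illustrative parallel fraction (an assumption, not from the slides).
for n in (1, 2, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.8, n), 2))
```

As N grows, f / N vanishes and the speedup approaches 1 / (1 − f); with f = 0.8 that limit is 5 no matter how many processors are added.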
Pictorial Depiction of Amdahl’s Law
Performance Enhancement Example
• For the RISC machine with the following instruction mix given earlier:
  Op       Freq   Cycles   CPI(i) = CPI_i x F_i   % Time = CPI(i) / CPI
  ALU      50%    1        .5                     23%
  Load     20%    5        1.0                    45%
  Store    10%    3        .3                     14%
  Branch   20%    2        .4                     18%
  CPI = 2.2
An Alternative Solution Using CPU Equation
  Op       Freq   Cycles   CPI(i)   % Time
  ALU      50%    1        .5       23%
  Load     20%    5        1.0      45%
  Store    10%    3        .3       14%
  Branch   20%    2        .4       18%
  CPI = 2.2
• If a CPU design enhancement improves the CPI of load instructions from 5
to 2, what is the resulting performance improvement from this
enhancement?
New CPI of load is now 2 instead of 5
Old CPI = 2.2
New CPI = .5 x 1 + .2 x 2 + .1 x 3 + .2 x 2 = 1.6
$$\text{Speedup}(E) = \frac{\text{Original execution time}}{\text{New execution time}} = \frac{\text{Instruction count} \times \text{old CPI} \times \text{clock cycle}}{\text{Instruction count} \times \text{new CPI} \times \text{clock cycle}} = \frac{2.2}{1.6} = 1.375$$
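The CPU-equation answer and the Amdahl's Law answer should agree. The Python sketch below checks that: loads take 1.0 / 2.2 of the original time and become 5 / 2 = 2.5 times faster. The variable names are ours.

```python
# Instruction mix from the example: op -> (frequency, CPI)
old_mix = {"ALU": (0.5, 1), "Load": (0.2, 5), "Store": (0.1, 3), "Branch": (0.2, 2)}
new_mix = dict(old_mix, Load=(0.2, 2))    # enhancement: load CPI 5 -> 2

def cpi(mix):
    """Effective CPI = sum of frequency x CPI over instruction types."""
    return sum(freq * c for freq, c in mix.values())

# CPU-equation view: I and the clock cycle are unchanged, so speedup = old CPI / new CPI.
speedup_cpu_eq = cpi(old_mix) / cpi(new_mix)

# Amdahl view: fraction of time spent on loads, and how much faster loads become.
f_load = 0.2 * 5 / cpi(old_mix)
s_load = 5 / 2
speedup_amdahl = 1 / ((1 - f_load) + f_load / s_load)

print(f"CPU equation: {speedup_cpu_eq:.3f}, Amdahl's Law: {speedup_amdahl:.3f}")
```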
Performance Enhancement Example
• A program runs in 100 seconds on a machine with multiply operations
responsible for 80 seconds of this time. By how much must the speed
of multiplication be improved to make the program four times faster?
$$\text{Speedup}(E) = 4 = \frac{1}{(1 - F) + F/S} = \frac{1}{(1 - 0.8) + 0.8/S} = \frac{1}{0.2 + 0.8/S}$$
$$0.2 + \frac{0.8}{S} = 0.25 \;\Longrightarrow\; \frac{0.8}{S} = 0.05 \;\Longrightarrow\; S = 16$$
Multiplication must be made 16 times faster.
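Solving Speedup = 1 / ((1 − F) + F / S) for S is a one-line rearrangement. The Python sketch below (the function name is ours) uses exact rational arithmetic from the standard `fractions` module so that results such as 16 come out exactly, and it returns `None` when no finite S can reach the target.

```python
from fractions import Fraction

def required_enhancement(fraction, target_speedup):
    """Solve target = 1 / ((1 - F) + F / S) for S with exact rational arithmetic.

    Pass an exact value such as Fraction(80, 100); Fraction(0.8) would carry
    float rounding. Returns None when no finite S can reach the target.
    """
    f = Fraction(fraction)
    remaining = 1 / Fraction(target_speedup) - (1 - f)   # this equals F / S
    if remaining <= 0:
        return None
    return f / remaining

f_mult = Fraction(80, 100)   # multiply: 80 of the 100 seconds
print(required_enhancement(f_mult, 4))
```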
Machine = CPU
Performance Enhancement Example
• Consider the previous example: a program runs in 100 seconds on a
machine, with multiply operations responsible for 80 seconds of this
time. By how much must the speed of multiplication be improved to
make the program five times faster?
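Applying the same rearrangement to a fivefold target shows why the question has no finite answer when multiplication accounts for 80% of the time:

```latex
5 = \frac{1}{(1 - 0.8) + \frac{0.8}{S}}
\;\Longrightarrow\; 0.2 + \frac{0.8}{S} = 0.2
\;\Longrightarrow\; \frac{0.8}{S} = 0
```

No finite S satisfies the last equation. Even an infinitely fast multiplier leaves the 20 seconds of non-multiply work, so the speedup can only approach 100 / 20 = 5 without reaching it.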
Extending Amdahl's Law To Multiple Enhancements
n enhancements, each affecting a different portion of execution time:
$$\text{Speedup} = \frac{1}{\left(1 - \sum_i F_i\right) + \sum_i \frac{F_i}{S_i}}$$
Pictorial Depiction of Example
Before: execution time with no enhancements: 1 (i.e., normalized to 1)
  Unchanged portion plus three portions enhanced by S1 = 10, S2 = 15, S3 = 30
After: each enhanced portion is divided by its speedup (/ 10, / 15, / 30); the unchanged portion (.55) stays the same
  Execution time with enhancements: .55 + .02 + .01 + .00333 = .5833
Speedup = 1 / .5833 = 1.71
What if the fractions given are after the enhancements were applied?
How would you solve the problem?
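The multiple-enhancement formula can be checked against the figure. In the Python sketch below, the original fractions F1 = .2, F2 = .15 and F3 = .1 are recovered from the "after" values (.02 x 10, .01 x 15, .00333 x 30); that recovery is our reading of the figure, and the formula assumes the enhanced portions do not overlap.

```python
# (F_i, S_i): fraction of the original time and the speedup applied to it.
# Fractions recovered from the figure's "after" values (our reading).
enhancements = [(0.20, 10), (0.15, 15), (0.10, 30)]

def multi_enhancement_speedup(enhancements):
    """Speedup = 1 / ((1 - sum F_i) + sum F_i / S_i) for non-overlapping portions."""
    unchanged = 1 - sum(f for f, _ in enhancements)        # .55 in the example
    new_time = unchanged + sum(f / s for f, s in enhancements)
    return 1 / new_time

print(f"Speedup = {multi_enhancement_speedup(enhancements):.2f}")
```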
That’s all for now…
References
• [1]. "Cramming more components onto integrated circuits," reprinted from Electronics, volume 38, number 8, April 19, 1965, pp. 114 ff.
• [2]. TOP500, June 2020 list highlights: https://www.top500.org/lists/top500/2020/06/highs
• [3]. M. Morris R. Mano, "Computer System Architecture," 3rd edition, Pearson Education.
• [4]. David A. Patterson (University of California, Berkeley) and John L. Hennessy (Stanford University), "Computer Organization and Design: The Hardware/Software Interface."
• [5]. William Stallings, "Computer Organization and Architecture: Designing for Performance," 10th edition.