Computer Organization The Role of Performance
Computer Performance: TIME
• Response Time (elapsed time, latency): time to complete an operation
– How long does it take for my job to run?
– How long does it take to execute (start to finish) my job?
– How long must I wait for the database query?
– Individual user concerns (desktops or embedded systems)
• Throughput:
– How many jobs can the machine run at once?
– What is the average execution rate?
– How much work is getting done?
– Systems manager concerns (servers)
Elapsed time = CPU time + wait time (I/O, other programs, etc.)
Execution Time
• CPU time
– Doesn't count waiting for I/O or time spent running other programs
– Can be divided into user CPU time and system CPU time (OS calls on behalf of the
program)
Elapsed time = User CPU time + System CPU time + wait time
• Our focus:
• User CPU time (CPU execution time or, simply, execution time)
– time spent executing the lines of code that are in our program
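The split of elapsed time into user CPU time, system CPU time and wait time can be observed on a running program. The following is a minimal Python sketch, assuming the standard `time` and `os` modules; the helper name `measure` and the sample workload are illustrative and not part of the original slides.

```python
import os
import time


def measure(job):
    """Run job() and split its elapsed time into user CPU, system CPU and wait."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    job()
    cpu_end = os.times()
    wall_end = time.perf_counter()

    elapsed = wall_end - wall_start
    user = cpu_end.user - cpu_start.user        # time executing our own code
    system = cpu_end.system - cpu_start.system  # OS calls on behalf of the program
    # Whatever is left over was spent waiting (I/O, sleeping, other programs).
    # Clamp at zero because the clocks have different resolutions.
    wait = max(0.0, elapsed - user - system)
    return {"elapsed": elapsed, "user": user, "system": system, "wait": wait}


def sample_job():
    # Hypothetical workload: some computation, then a pause that stands in
    # for waiting on I/O.
    sum(i * i for i in range(200_000))
    time.sleep(0.05)


if __name__ == "__main__":
    for name, seconds in measure(sample_job).items():
        print(f"{name:8s} {seconds:.4f} s")
```

On a lightly loaded machine the wait component should be dominated by the 0.05 s pause, while the user CPU component reflects only the computation.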
Defining (Speed) Performance
• Performance = 1 / Execution Time
• X is n times faster than Y, if
PerformanceX / PerformanceY = n
equivalently
Execution TimeY / Execution TimeX = n
Performance Example:
A program takes 10s to run on machine A and 15s on machine B.
How many times faster is machine A than machine B?
Solution:
A is n times faster than B, if
Execution TimeB / Execution TimeA = n
so n = 15s / 10s = 1.5, and machine A is 1.5 times faster than machine B.
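The ratio in the example can be checked with a few lines of Python. This is only a sketch; the function name `times_faster` is chosen here for illustration.

```python
def times_faster(exec_time_slow, exec_time_fast):
    """Return n such that the faster machine is n times faster than the slower one."""
    # Performance is the inverse of execution time, so the performance ratio
    # equals the inverse ratio of the execution times.
    return exec_time_slow / exec_time_fast


# Numbers from the example: 10 s on machine A, 15 s on machine B.
n = times_faster(exec_time_slow=15.0, exec_time_fast=10.0)
print(n)  # 1.5
```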
[Figure: instructions (1st, 2nd, 3rd, 4th, 5th, 6th, ...) laid out along a time axis]
CPU execution time = Instruction count x Avg CPI x Clock cycle time
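As a rough illustration of this equation, the sketch below computes CPU execution time from the three factors. The function name `cpu_time` and the program figures (10 billion instructions, average CPI of 2.0, 4 GHz clock) are hypothetical and were picked only to give round numbers.

```python
def cpu_time(instruction_count, avg_cpi, clock_rate_hz):
    """CPU execution time in seconds.

    Instruction count x average CPI gives clock cycles; dividing by the clock
    rate is the same as multiplying by the clock cycle time (1 / clock rate).
    """
    clock_cycles = instruction_count * avg_cpi
    return clock_cycles / clock_rate_hz


# Hypothetical program: 10 billion instructions, average CPI 2.0, 4 GHz clock.
print(cpu_time(10e9, 2.0, 4e9))  # 5.0
```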
• $\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{InstructionCount}_i)$
CPI Example II
• A compiler designer is trying to decide between two code
sequences for a particular machine.
• Based on the hardware implementation, there are three different
classes of instructions: Class A, Class B and Class C,
which require 1, 2 and 3 cycles, respectively.
Instruction counts for instruction class
Code sequence    A    B    C
1                2    1    2
2                4    1    1
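The two code sequences can be compared by summing cycles per instruction class. The following Python sketch uses the cycle counts and instruction counts given above; the names `CYCLES_PER_CLASS`, `SEQUENCES`, `clock_cycles` and `average_cpi` are illustrative.

```python
# Cycles required by each instruction class.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each code sequence.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def clock_cycles(counts):
    """Sum of (CPI of class i) x (instruction count of class i)."""
    return sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """Total clock cycles divided by total instruction count."""
    return clock_cycles(counts) / sum(counts.values())


for seq, counts in SEQUENCES.items():
    print(f"sequence {seq}: {sum(counts.values())} instructions, "
          f"{clock_cycles(counts)} cycles, CPI = {average_cpi(counts)}")
```

With these counts, sequence 2 executes more instructions (6 versus 5) but needs fewer clock cycles (9 versus 10), so its average CPI is 1.5 versus 2.0 for sequence 1.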
• Benchmark suites
– Perfect Club: set of application codes
– Livermore Loops: 24 loop kernels
– Linpack: linear algebra package
– SPEC: mix of code from an industry organization
SPEC
• Programs used to measure performance
– Specialized benchmarks for particular classes of applications
• Standard Performance Evaluation Corporation (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
• Elapsed time to execute a selection of programs
– Negligible I/O, so focuses on CPU performance
• Normalize relative to reference machine
• Summarize as geometric mean of performance ratios
– SPEC CPU2006
• CINT2006 (integer) and CFP2006 (floating-point)
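The normalize-then-summarize step can be sketched as follows. The timings are made up for illustration and are not real SPEC results; the function names `spec_ratio` and `geometric_mean` are likewise chosen here.

```python
import math


def spec_ratio(reference_time, measured_time):
    """Performance ratio relative to the reference machine (higher is better)."""
    return reference_time / measured_time


def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (reference time, measured time) pairs in seconds, one per program.
timings = [(1000.0, 500.0), (2400.0, 300.0), (900.0, 450.0)]

ratios = [spec_ratio(ref, measured) for ref, measured in timings]
print(ratios)
print(round(geometric_mean(ratios), 3))
```

A geometric mean is used because ratios are being averaged: the ranking of two machines then does not depend on which machine is chosen as the reference.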
Specialized SPEC Benchmarks
• I/O
• Network
• Graphics
• Java
• Web server
• Transaction processing (databases)
Other Performance Metrics
• Power consumption
– Especially in the embedded market where battery life
is important
• For power-limited applications, the most important metric is
energy efficiency
– Clock rate and power are correlated
– Both increased rapidly for decades and then flattened off
– There is now a practical power limit for cooling
commodity microprocessors
Other Performance Metrics
• Design of Microprocessors
– The power limit forced a drastic change in design
– Rather than continuing to decrease the response time of a single
program running on a single processor,
• Manufacturers now build microprocessors with multiple processors (cores) per chip
– Where the benefit is often more on throughput than on
response time
Other Performance Metrics
• Amdahl's Law
– A rule stating that the performance enhancement
possible with a given improvement is limited by the
amount that the improved feature is used
Execution Time After Improvement =
Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
Amdahl's Law
• Suppose a program runs in 100 seconds on a machine,
with multiply operations responsible for 80 seconds of
this time.
– How much do I have to improve the speed of multiplication if I
want my program to run 5 times faster?
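The question can be worked through with Amdahl's Law directly. The sketch below is one way to set it up in Python; the function names are illustrative, and the second call (a 4 times speedup) is an extra case added here for contrast, not part of the original example.

```python
def time_after_improvement(unaffected, affected, improvement):
    """Amdahl's Law: only the affected part of the run time shrinks."""
    return unaffected + affected / improvement


def required_improvement(unaffected, affected, target_time):
    """Improvement factor needed to reach target_time, or None if unreachable."""
    remaining = target_time - unaffected  # time budget left for the affected part
    if remaining <= 0:
        return None  # even an infinitely fast improvement is not enough
    return affected / remaining


total, multiply = 100.0, 80.0
unaffected = total - multiply  # 20 s not spent in multiplication

# Running 5 times faster means 20 s in total, which the unaffected part already uses up.
print(required_improvement(unaffected, multiply, total / 5))  # None

# Extra case for contrast: 4 times faster means 25 s in total.
print(required_improvement(unaffected, multiply, total / 4))  # 16.0
```

For the 5 times target, the 20 seconds that do not involve multiplication already equal the whole time budget, so no improvement to multiplication alone can reach it.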
(Since CPU clock cycles = Instruction count x Average clock cycles per instruction)
So, the code from compiler 2 has a higher MIPS rating, but the
code from compiler 1 runs faster!
MIPS can fail to give a true picture of performance even when
comparing two versions of the same program on the same
machine
• MIPS doesn't account for
– Differences in ISAs between computers
– Differences in complexity between instructions
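To see how a higher MIPS rating can go with a longer run time, here is a small Python sketch. The worked numbers behind the compiler comparison are not reproduced in this section, so the instruction counts, the 4 GHz clock rate and the per-class cycle counts (1, 2 and 3) below are assumptions chosen for illustration.

```python
CLOCK_RATE_HZ = 4e9                         # assumed 4 GHz clock
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}  # assumed cycles per instruction class

# Assumed instruction counts per class for the code from each compiler.
COMPILERS = {
    1: {"A": 5_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000},
    2: {"A": 10_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000},
}


def execution_time(counts):
    """Clock cycles (sum of CPI_i x count_i) divided by the clock rate."""
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """Millions of instructions per second."""
    return sum(counts.values()) / (execution_time(counts) * 1e6)


for compiler, counts in COMPILERS.items():
    print(f"compiler {compiler}: {execution_time(counts)} s, {mips(counts)} MIPS")
```

Under these assumed numbers, the code from compiler 2 rates 3200 MIPS against 2800 MIPS for compiler 1, yet takes 3.75 s against 2.5 s, because its extra instructions are simple ones that raise the instruction rate without reducing the total work.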
Terminology
• A given program will require:
– some number of instructions (machine instructions)
– some number of cycles
– some number of seconds
• A vocabulary that relates these quantities:
– cycle time (seconds per cycle)
– clock rate (cycles per second)
– (average) CPI (cycles per instruction)
• A floating-point-intensive application might have a higher average CPI