Lecture 2: Performance: CMPS 255 - Computer Architecture

1. A computer's performance is determined by its clock rate, cycles per instruction (CPI), and number of instructions. While clock rate improvements drove performance gains for decades, power limitations now prevent further clock rate increases. 2. Transistor count continues increasing due to Moore's Law, but these transistors must now be used to improve CPI and instruction count through techniques like pipelining, caching, and specialized hardware rather than just increasing clock rate. 3. Both software and hardware techniques are needed to reduce CPI and instruction count and continue performance gains within power limits.

Uploaded by

samah
Lecture 2: Performance

CMPS 255 – Computer Architecture


Clocks
• A computer is driven by a clock that determines when events take place
• A clock cycle is a discrete time interval between two pulses of an oscillator
• A clock period is the duration of a clock cycle
• The clock rate or frequency is the number of clock cycles per second (inverse of the clock period)
• Example: the Intel Core i7-8700K has a clock rate of 3.7 GHz. What is its clock period?
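The example question can be worked out directly from the definition of clock period as the inverse of clock rate. The sketch below is only an illustration; the function name `clock_period_seconds` is made up for this example.

```python
def clock_period_seconds(clock_rate_hz: float) -> float:
    """Return the clock period, i.e. the inverse of the clock rate."""
    return 1.0 / clock_rate_hz


# 3.7 GHz is the clock rate quoted in the example above.
rate_hz = 3.7e9
period_ns = clock_period_seconds(rate_hz) * 1e9  # convert seconds to nanoseconds

print(f"Clock period: {period_ns:.3f} ns")  # roughly 0.270 ns
```

So a 3.7 GHz clock ticks roughly once every 0.27 nanoseconds.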
Which has better performance?
• CPU1: 2.4 GHz
• CPU2: 3.8 GHz

Trick question! It depends on the performance metric we care about:

• If we care about minimizing the time to transport one person from one place to another (i.e., execution time)… the car
• If we care about maximizing the number of people we can transport in a certain amount of time (i.e., throughput)… the bus
• If we care about minimizing the energy it takes to transport people (i.e., energy efficiency)… bikes

Which has better execution time?
• CPU1: 2.4 GHz
• CPU2: 3.8 GHz

Still a trick question!


Components of Execution Time
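\[
\frac{\text{Seconds}}{\text{Program}}
= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Instruction}}
= \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{Instruction Count}}
\times \underbrace{\frac{\text{Clock Cycles}}{\text{Instruction}}}_{\text{CPI}}
\times \underbrace{\frac{\text{Seconds}}{\text{Clock Cycle}}}_{\text{Clock Rate}^{-1}}
\]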


Which has better execution time?
        Instruction Count    CPI    Clock Rate
CPU1    ?                    ?      2.4 GHz
CPU2    ?                    ?      3.8 GHz

Cannot decide based on just the clock rate
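To see why the clock rate alone is not enough, here is a small sketch that fills in the unknown columns with made-up numbers. The instruction count and CPI values are purely hypothetical and chosen only for illustration; the two clock rates are the ones from the slide.

```python
def execution_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Seconds per program = instruction count x CPI x (1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical workload: one billion instructions on both CPUs.
instructions = 1_000_000_000

# Hypothetical CPIs: CPU1 averages 1 cycle per instruction, CPU2 averages 2.
t_cpu1 = execution_time(instructions, cpi=1.0, clock_rate_hz=2.4e9)
t_cpu2 = execution_time(instructions, cpi=2.0, clock_rate_hz=3.8e9)

print(f"CPU1 (2.4 GHz): {t_cpu1:.3f} s")
print(f"CPU2 (3.8 GHz): {t_cpu2:.3f} s")
```

With these assumed values, the 2.4 GHz CPU finishes first even though its clock is slower; different assumptions for CPI or instruction count could just as easily reverse the result.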


Improving a CPU’s Execution Time
Approaches to decreasing execution time involve decreasing one of its three components: instruction count, CPI, or clock period (the inverse of the clock rate).

For a long time, improvements in circuit technology enabled driving processors at higher clock rates, improving execution time without the need for additional effort by software developers and computer architects.

This trend was called the “free lunch”


Moore’s “Law”
[Figure: transistors (thousands) versus year, 1970-2020, logarithmic vertical axis from 10^0 to 10^7. Source: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten (1970-2010); K. Rupp (2010-2017).]

Moore’s “Law” predicted that the number of transistors per unit area would double every 18-24 months
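As a rough sketch of what "doubling every 18-24 months" implies, the snippet below computes the growth factor over a fixed span of years for both ends of that range. The 10-year horizon and the helper name `growth_factor` are assumptions picked for illustration, not figures from the lecture.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Growth after `years` if the quantity doubles every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


# Hypothetical horizon of 10 years, evaluated at both ends of the 18-24 month range.
fast = growth_factor(10, 18)
slow = growth_factor(10, 24)

print(f"Doubling every 18 months: about {fast:.0f}x in 10 years")
print(f"Doubling every 24 months: about {slow:.0f}x in 10 years")
```

The spread between the two results shows how sensitive exponential growth is to the assumed doubling period.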
No More Free Lunch
[Figure: transistors (thousands) and frequency (MHz) versus year, 1970-2020, logarithmic vertical axis from 10^0 to 10^7. Source: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten (1970-2010); K. Rupp (2010-2017).]

Processor frequency (clock rate) followed the same trend because smaller transistors can be switched faster… until around 2005.
Power Wall
[Figure: transistors (thousands) and frequency (MHz) versus year, 1970-2020, logarithmic vertical axis from 10^0 to 10^7, with the point where frequency levels off marked "Power Wall". Source: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten (1970-2010); K. Rupp (2010-2017).]

Around 2005, frequency stopped increasing due to the Power Wall


Power Breakdown

\[
P \propto C \, V^{2} f
\]

where P is power, C is capacitance, V is voltage, and f is frequency.

Increasing frequency increases power, which dissipates more heat, requiring more support for cooling the chip.

Historically, the increase in power was partially compensated for by a decrease in voltage, enabled by the decrease in transistor size (over 20 years, there was a 1,000x increase in frequency but only a 30x increase in power because voltage decreased by 5x).

Today, voltage can no longer be decreased because doing so makes transistors unreliable, and power can no longer be increased because we have reached the limit of what we can cool. Therefore, frequency can no longer be increased.
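The historical numbers can be checked against the proportionality above with a quick calculation. This is only a sketch: it treats capacitance as constant, which is a simplifying assumption not stated in the lecture, and the function name `relative_power` is made up for the example.

```python
def relative_power(cap_ratio: float, voltage_ratio: float, freq_ratio: float) -> float:
    """Power change implied by P proportional to C * V^2 * f, given per-factor ratios."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


# 1,000x frequency increase with voltage unchanged (capacitance assumed constant).
no_voltage_scaling = relative_power(1.0, 1.0, 1000.0)

# Same frequency increase, but with voltage reduced by 5x as quoted in the lecture.
with_voltage_scaling = relative_power(1.0, 1 / 5, 1000.0)

print(f"Without voltage scaling: about {no_voltage_scaling:.0f}x power")
print(f"With 5x lower voltage:   about {with_voltage_scaling:.0f}x power")
```

Under the constant-capacitance assumption this gives about 40x, the same order of magnitude as the 30x quoted above; the remaining gap would have to come from factors this simple model does not capture. The key point is that the squared voltage term absorbed most of the 1,000x frequency increase, and that lever is no longer available.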


Power Trend
[Figure: transistors (thousands), frequency (MHz), and typical power (Watts) versus year, 1970-2020, logarithmic vertical axis from 10^0 to 10^7. Source: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten (1970-2010); K. Rupp (2010-2017).]

Stagnation in frequency is associated with a stagnation in power

But we still get more transistors! What to do with them?


Where to invest transistors?
• Increase number of cores
  • Improves throughput
• Make cores more advanced
  • Improves execution time
• Tradeoff between execution time and throughput


Improving a CPU’s Execution Time
Computer architects have developed a wide variety of techniques for increasing the number of instructions that can be executed each clock cycle, i.e., reducing CPI.

Techniques for Reducing CPI
• Pipelining (Chapter 4)

• Caching (Chapter 5)

• Out-of-order Execution (not covered)

• Speculative Execution (not covered)


Improving a CPU’s Execution Time
Software plays a primary role in reducing a program’s instruction count (low-complexity algorithms, powerful compiler optimizations).

Computer architecture also plays a role by providing special-purpose hardware for common operations (an increasingly popular trend).

Pitfalls
A CPU manufacturer increases the clock rate of their processor, decreasing the clock cycle duration.

As a result, some instructions that used to take 1 cycle to complete now require 2 cycles, increasing the overall CPI.

If the CPI increase is disproportionate, execution time may increase.
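A small numeric sketch of this pitfall follows. All values (instruction count, clock rates, and CPIs) are hypothetical and chosen only so that the CPI grows by a larger factor than the clock rate.

```python
def execution_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Seconds per program = instruction count x CPI x (1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


instructions = 1_000_000_000  # hypothetical program size

# Hypothetical old design: 2.0 GHz with an average CPI of 1.0.
before = execution_time(instructions, cpi=1.0, clock_rate_hz=2.0e9)

# Hypothetical new design: clock rate up 1.5x, but average CPI up 1.6x
# because some instructions now need 2 cycles instead of 1.
after = execution_time(instructions, cpi=1.6, clock_rate_hz=3.0e9)

print(f"Before: {before:.3f} s")
print(f"After:  {after:.3f} s")
```

With these assumed numbers, the CPI grew faster than the clock rate, so the "faster" processor takes longer to run the same program.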


A CPU manufacturer creates a fused multiply-add (FMA) instruction, because a multiply followed by an add is a common operation in linear algebra.

Assuming an add instruction takes 4 cycles and a multiply instruction takes 8 cycles, if the FMA instruction takes 12 cycles, then execution time does not improve: the instruction count goes down, but the CPI goes up by the same factor, so the total number of cycles is unchanged.
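The FMA example can be checked by counting cycles. The sketch below uses the cycle counts from the slide (add = 4, multiply = 8, FMA = 12); the number of multiply-add pairs and the alternative 8-cycle FMA are hypothetical, added only for contrast.

```python
ADD_CYCLES = 4   # from the slide
MUL_CYCLES = 8   # from the slide


def cycles_separate(pairs: int) -> int:
    """Total cycles when each multiply-add pair is a multiply followed by an add."""
    return pairs * (MUL_CYCLES + ADD_CYCLES)


def cycles_fused(pairs: int, fma_cycles: int) -> int:
    """Total cycles when each multiply-add pair is a single FMA instruction."""
    return pairs * fma_cycles


pairs = 1_000_000  # hypothetical number of multiply-add pairs in a program

separate = cycles_separate(pairs)
fused_12 = cycles_fused(pairs, fma_cycles=12)  # FMA latency from the slide
fused_8 = cycles_fused(pairs, fma_cycles=8)    # hypothetical faster FMA

print(f"Separate mul + add: {separate} cycles for {2 * pairs} instructions")
print(f"FMA at 12 cycles:   {fused_12} cycles for {pairs} instructions")
print(f"FMA at 8 cycles:    {fused_8} cycles for {pairs} instructions")
```

The 12-cycle FMA halves the instruction count but doubles the average CPI from 6 to 12, so the cycle total is identical. The fused instruction only helps if it takes fewer cycles than the two instructions it replaces.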
