Lecture 1 Computer Abstraction and Performance

The document is a lecture on computer architecture, covering topics such as the evolution of computer technology, classes of computers, performance metrics, and the components of a computer system. It discusses the impact of algorithms, programming languages, and hardware on performance, as well as the importance of abstraction in design. Additionally, it addresses trends in semiconductor technology and power consumption in CPUs.


4/17/2025

VIETNAM NATIONAL UNIVERSITY HANOI (VNU)


VNU INFORMATION TECHNOLOGY INSTITUTE

Computer Architecture
Lecture 1: Computer abstraction & performance

Duy-Hieu Bui, PhD


AIoT Laboratory
Email: hieubd@vnu.edu.vn
https://duyhieubui.github.io

Content adapted from “Computer Organization and Design RISC-V Edition:
The Hardware Software Interface, Second Edition” by David A. Patterson,
John L. Hennessy, published by Morgan Kaufmann. © 2020 Elsevier Inc.
All rights reserved.

4/17/2025 VNU-ITI/CICA 2

The Computer Revolution

• Progress in computer technology


– Underpinned by domain-specific accelerators
• Makes novel applications feasible
– Computers in automobiles
– Cell phones
– Human genome project
– World Wide Web
– Search Engines
• Computers are pervasive

Classes of Computers

• Personal computers
– General purpose, variety of software
– Subject to cost/performance tradeoff

• Server computers
– Network based
– High capacity, performance, reliability
– Range from small servers to building sized

Classes of Computers

• Supercomputers
– Type of server
– High-end scientific and engineering calculations
– Highest capability but represent a small fraction of the overall
computer market

• Embedded computers
– Hidden as components of systems
– Stringent power/performance/cost constraints

The PostPC Era

• Personal Mobile Device (PMD)


– Battery operated
– Connects to the Internet
– Hundreds of dollars
– Smart phones, tablets, electronic glasses
• Cloud computing
– Warehouse Scale Computers (WSC)
– Software as a Service (SaaS)
– A portion of the software runs on a PMD and a portion runs in the cloud
– Amazon and Google

Understanding Performance

• Algorithm
– Determines number of operations executed
• Programming language, compiler, architecture
– Determine number of machine instructions executed
per operation
• Processor and memory system
– Determine how fast instructions are executed
• I/O system (including OS)
– Determines how fast I/O operations are executed

Seven Great Ideas

• Use abstraction to simplify design

• Make the common case fast

• Performance via parallelism

• Performance via pipelining

• Performance via prediction

• Hierarchy of memories

• Dependability via redundancy

Below Your Program

• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers

Levels of Program Code

• High-level language
– Level of abstraction closer to
problem domain
– Provides for productivity and
portability
• Assembly language
– Textual representation of
instructions
• Hardware representation
– Binary digits (bits)
– Encoded instructions and
data

Components of a Computer

• Same components for all kinds of computer
– Desktop, server, embedded
• Input/output includes
– User-interface devices
• Display, keyboard, mouse
– Storage devices
• Hard disk, CD/DVD, flash
– Network adapters
• For communicating with other computers

Touchscreen

• PostPC device
• Supersedes keyboard
and mouse
• Resistive and Capacitive
types
– Most tablets, smart phones
use capacitive
– Capacitive allows multiple
touches simultaneously

Through the Looking Glass

• LCD screen: picture elements (pixels)


– Mirrors content of frame buffer memory

Opening the Box

Inside the Processor (CPU)

• Datapath: performs operations on data


• Control: sequences datapath, memory, ...
• Cache memory
– Small fast SRAM memory for immediate access to data

Inside the Processor

• A12 processor

Abstractions

• Abstraction helps us deal with complexity


– Hide lower-level detail
• Instruction set architecture (ISA)
– The hardware/software interface
• Application binary interface
– The ISA plus system software interface
• Implementation
– The details underlying an interface

A Safe Place for Data

• Volatile main memory


– Loses instructions and data when powered off
• Non-volatile secondary memory
– Magnetic disk
– Flash memory
– Optical disk (CDROM, DVD)

Networks

• Communication, resource sharing, nonlocal access


• Local area network (LAN): Ethernet
• Wide area network (WAN): the Internet
• Wireless network: WiFi, Bluetooth

Technology Trends

• Electronics technology continues to evolve
– Increased capacity and performance
– Reduced cost

[Figure: DRAM capacity]

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2013   Ultra large scale IC          250,000,000,000

Semiconductor Technology

• Silicon: semiconductor
• Add materials to transform properties:
– Conductors
– Insulators
– Switch

Manufacturing ICs

• Yield: proportion of working dies per wafer

Intel® Core 10th Gen

• 300mm wafer, 506 chips, 10nm technology


• Each chip is 11.4 x 10.7 mm

Integrated Circuit Cost

• Nonlinear relation to area and defect rate


– Wafer cost and area are fixed
– Defect rate determined by manufacturing process
– Die area determined by architecture and circuit design
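\[ \text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}} \]

\[ \text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}} \]

\[ \text{Yield} = \frac{1}{\left(1 + \left(\text{Defects per area} \times \text{Die area}/2\right)\right)^{2}} \]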
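The cost model above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the wafer cost, wafer area, die areas, and defect density are hypothetical values, not figures from the lecture.

```python
def dies_per_wafer(wafer_area, die_area):
    """Approximate dies per wafer as wafer area / die area (ignores edge losses)."""
    return wafer_area / die_area


def die_yield(defects_per_area, die_area):
    """Yield = 1 / (1 + defects_per_area * die_area / 2)^2, as on the slide."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2


def cost_per_die(wafer_cost, wafer_area, die_area, defects_per_area):
    """Cost per die = wafer cost / (dies per wafer * yield)."""
    good_dies = dies_per_wafer(wafer_area, die_area) * die_yield(defects_per_area, die_area)
    return wafer_cost / good_dies


# Hypothetical inputs: areas in cm^2, defect density in defects per cm^2.
WAFER_COST = 5000.0
WAFER_AREA = 700.0
DEFECTS_PER_CM2 = 0.5

small = cost_per_die(WAFER_COST, WAFER_AREA, 1.0, DEFECTS_PER_CM2)
large = cost_per_die(WAFER_COST, WAFER_AREA, 2.0, DEFECTS_PER_CM2)
print(f"1 cm^2 die: {small:.2f}   2 cm^2 die: {large:.2f}")
```

With these assumed numbers, doubling the die area more than doubles the cost per die, which is the nonlinear relation the slide refers to.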

Defining Performance

• Which airplane has the best performance?

Response Time and Throughput


• Response time
– How long it takes to do a task
• Throughput
– Total work done per unit time
• e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?
• We’ll focus on response time for now…

Relative Performance

• Define Performance = 1/Execution Time
• “X is n times faster than Y”

\[ \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n \]

• Example: time taken to run a program
– 10s on A, 15s on B
– Execution Time_B / Execution Time_A = 15s / 10s = 1.5
– So A is 1.5 times faster than B

Measuring Execution Time

• Elapsed time
– Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
– Determines system performance
• CPU time
– Time spent processing a given job
• Discounts I/O time, other jobs’ shares
– Comprises user CPU time and system CPU time
– Different programs are affected differently by CPU and system
performance

CPU Clocking

• Operation of digital hardware governed by a constant-rate clock

[Figure: clock waveform marking the clock period; data transfer and computation within each cycle, followed by a state update]

• Clock period: duration of a clock cycle
– e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
• Clock frequency (rate): cycles per second
– e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
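Clock period and clock rate are reciprocals, so the two examples above describe the same clock. A minimal sketch, with function names chosen here for illustration only:

```python
def period_to_rate(clock_period_s):
    """Clock rate in Hz is the reciprocal of the clock period in seconds."""
    return 1.0 / clock_period_s


def rate_to_period(clock_rate_hz):
    """Clock period in seconds is the reciprocal of the clock rate in Hz."""
    return 1.0 / clock_rate_hz


rate = period_to_rate(250e-12)   # the 250 ps period from the slide
period = rate_to_period(4.0e9)   # the 4.0 GHz rate from the slide
print(f"{rate / 1e9:.1f} GHz, {period * 1e12:.0f} ps")
```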

CPU Time

• Performance improved by
– Reducing number of clock cycles
– Increasing clock rate
– Hardware designer must often trade off clock rate against cycle
count

\[ \text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}} \]

CPU Time Example

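• Computer A: 2GHz clock, 10s CPU time
• Designing Computer B
– Aim for 6s CPU time
– A faster clock is possible, but it requires 1.2 × as many clock cycles
• How fast must Computer B's clock be?

\[ \text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}} \]

\[ \text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9} \]

\[ \text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz} \]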
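The same calculation can be checked numerically. The sketch below uses only the numbers from the example; the function names are illustrative and not part of the lecture.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time * clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles, target_cpu_time_s):
    """Clock rate needed to finish the given number of cycles in the target time."""
    return cycles / target_cpu_time_s


cycles_a = clock_cycles(10.0, 2.0e9)           # Computer A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                      # Computer B needs 1.2x the cycles
rate_b = required_clock_rate(cycles_b, 6.0)    # target: 6 s CPU time
print(f"Computer B clock rate: {rate_b / 1e9:.1f} GHz")
```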

Instruction Count and CPI

• Instruction Count for a program
– Determined by program, ISA and compiler
• Average cycles per instruction
– Determined by CPU hardware
– If different instructions have different CPI
• Average CPI affected by instruction mix

\[ \text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction} \]

\[ \text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}} \]

CPI Example

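• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?

\[ \text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps} \]

\[ \text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps} \]

– A is faster…

\[ \frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2 \]

– …by this much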
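A short sketch of the comparison above. Because both computers implement the same ISA, the instruction count I is the same on both and cancels in the ratio; the value used for it below is an arbitrary placeholder, and the function name is illustrative.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_s


INSTRUCTION_COUNT = 1_000_000  # arbitrary placeholder; cancels in the ratio

time_a = cpu_time(INSTRUCTION_COUNT, 2.0, 250e-12)  # Computer A
time_b = cpu_time(INSTRUCTION_COUNT, 1.2, 500e-12)  # Computer B
ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")
```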

CPI in More Detail

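• If different instruction classes take different numbers of cycles

\[ \text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right) \]

• Weighted average CPI

\[ \text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right) \]

– The fraction Instruction Count_i / Instruction Count is the relative frequency of class i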

CPI Example

• Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

• Sequence 1: IC = 5
– Clock Cycles = 2×1 + 1×2 + 2×3 = 10
– Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
– Clock Cycles = 4×1 + 1×2 + 1×3 = 9
– Avg. CPI = 9/6 = 1.5
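The two sequences can be compared with a small weighted-average CPI helper. This is a sketch using the table values above; the dictionary layout and function names are choices made for this illustration.

```python
# CPI per instruction class, taken from the table in the example.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(instruction_counts):
    """Clock cycles = sum over classes of CPI_i * instruction count_i."""
    return sum(CPI_BY_CLASS[cls] * count for cls, count in instruction_counts.items())


def average_cpi(instruction_counts):
    """Weighted average CPI = total clock cycles / total instruction count."""
    return total_cycles(instruction_counts) / sum(instruction_counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, total_cycles(seq), average_cpi(seq))
```

Sequence 2 executes more instructions but needs fewer clock cycles, so instruction count alone does not decide which sequence is faster.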

Performance Summary

• Performance depends on
– Algorithm: affects IC, possibly CPI
– Programming language: affects IC, CPI
– Compiler: affects IC, CPI
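– Instruction set architecture: affects IC, CPI, Tc (clock cycle time)

\[ \text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} \]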

Power Trends

• In CMOS IC technology

\[ \text{Power} = \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency} \]

– Power: ×30; voltage: 5V → 1V; frequency: ×1000

Reducing Power

• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction

\[ \frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^{2} \times F_{old} \times 0.85}{C_{old} \times V_{old}^{2} \times F_{old}} = 0.85^{4} = 0.52 \]
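The ratio above can be reproduced with the power relation from the previous slide. In this sketch the old CPU's capacitive load, voltage, and frequency are normalized to 1.0, which is an assumption made for convenience since they cancel in the ratio.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Power = capacitive load * voltage^2 * frequency."""
    return capacitive_load * voltage ** 2 * frequency


# Old CPU normalized to 1.0 for every quantity (assumption; the values cancel).
p_old = dynamic_power(1.0, 1.0, 1.0)
# New CPU: 85% of the load, and voltage and frequency each reduced by 15%.
p_new = dynamic_power(0.85, 0.85, 0.85)

ratio = p_new / p_old
print(f"P_new / P_old = {ratio:.2f}")
```

Voltage enters squared, so the 15% voltage reduction contributes two of the four factors of 0.85.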

• The power wall


– We can’t reduce voltage further
– We can’t remove more heat
• How else can we improve performance?

Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency

Multiprocessors

• Multicore microprocessors
– More than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
– Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization

SPEC CPU Benchmark

• Programs used to measure performance


– Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
– Elapsed time to execute a selection of programs
• Negligible I/O, so focuses on CPU performance
– Normalize relative to reference machine
– Summarize as geometric mean of performance ratios
• CINT2006 (integer) and CFP2006 (floating-point)

\[ \sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i} \]
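A minimal sketch of the geometric-mean summary follows. The ratios in the usage lines are made-up numbers, not SPEC results, and the function name is illustrative. The mean is computed through logarithms, which is mathematically equivalent to taking the n-th root of the product.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n positive ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution time ratios (reference time / measured time).
example_ratios = [2.0, 8.0]
print(geometric_mean(example_ratios))
```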

SPECspeed 2017 Integer benchmarks on a 1.8 GHz Intel Xeon E5-2650L

SPEC Power Benchmark

• Power consumption of server at different workload levels


– Performance: ssj_ops/sec
– Power: Watts (Joules/sec)

\[ \text{Overall ssj\_ops per Watt} = \left( \sum_{i=0}^{10} \text{ssj\_ops}_i \right) \Big/ \left( \sum_{i=0}^{10} \text{power}_i \right) \]
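The overall metric is a ratio of sums over the eleven measurement points (i = 0 to 10), not an average of per-level ratios. The sketch below uses invented throughput and power numbers purely for illustration; they are not measurements from any real server.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by the sum of power readings."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical data for 11 load levels, from 100% load down to idle.
ops_by_level = [1000 * k for k in range(10, -1, -1)]         # 10000, 9000, ..., 0
power_by_level = [100 + 10 * k for k in range(10, -1, -1)]   # 200 W, 190 W, ..., 100 W

print(overall_ssj_ops_per_watt(ops_by_level, power_by_level))
```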

SPECpower_ssj2008 for Xeon E5-2650L

Pitfall: Amdahl’s Law

• Improving an aspect of a computer and expecting a proportional improvement in overall performance

\[ T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected} \]

• Example: multiply accounts for 80s/100s
– How much improvement in multiply performance to get 5× overall?

\[ 20 = \frac{80}{n} + 20 \]

– Can’t be done!
• Corollary: make the common case fast
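The example above can be explored numerically. In this sketch the function names are illustrative; the 80 s and 20 s split comes from the slide. However large the improvement factor for multiply, the overall speedup stays below 100/20 = 5.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected


def overall_speedup(t_affected, t_unaffected, improvement_factor):
    """Original total time divided by the improved total time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, improvement_factor)


# Multiply accounts for 80 s of a 100 s run; the other 20 s are unaffected.
for factor in (2, 10, 100, 1_000_000):
    print(factor, round(overall_speedup(80.0, 20.0, factor), 3))
```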

Fallacy: Low Power at Idle

• Look back at i7 power benchmark


– At 100% load: 258W
– At 50% load: 170W (66%)
– At 10% load: 121W (47%)
• Google data center
– Mostly operates at 10% – 50% load
– At 100% load less than 1% of the time
• Consider designing processors to make power
proportional to load

Pitfall: MIPS as a Performance Metric

• MIPS: Millions of Instructions Per Second


– Doesn’t account for
• Differences in ISAs between computers
• Differences in complexity between instructions

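\[ \text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}} \]

• CPI varies between programs on a given CPU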
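The sketch below checks that the two forms of the MIPS formula agree and shows why the rating depends on the program. The clock rate, instruction count, and CPI values are hypothetical, and the function names are illustrative.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips_from_time(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


CLOCK_RATE_HZ = 4.0e9   # hypothetical CPU
INSTRUCTIONS = 1.0e9    # hypothetical program size

# Two hypothetical programs on the same CPU with different average CPI.
for cpi in (1.0, 2.0):
    t = execution_time(INSTRUCTIONS, cpi, CLOCK_RATE_HZ)
    print(cpi, mips_from_time(INSTRUCTIONS, t), mips_from_cpi(CLOCK_RATE_HZ, cpi))
```

The same CPU gets a different MIPS rating for each program, so a single MIPS figure does not characterize the machine.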

Concluding Remarks

• Cost/performance is improving
– Due to underlying technology development
• Hierarchical layers of abstraction
– In both hardware and software
• Instruction set architecture
– The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
– Use parallelism to improve performance
