
CIE3301

Computer Architecture
2021-2022

Chapter 1: Computer Abstractions and Technology
Dr. Sedki B. T. Younis
Email: sedki.thanoon@uoninevah.edu.iq

CIE3301 Chapter 1.1 Dr. S. Younis, Ninevah University, 2021


Course Administration
• Instructor: Dr. S. Younis, sedki.thanoon@uoninevah.edu.iq
• Text (required):
  • Computer Organization and Design, 4th Ed., Revised Printing, Patterson & Hennessy, ©2012
  • Computer System Architecture, 3rd Ed., M. Mano, ©2004
  • Computer Organization and Architecture: Designing for Performance, 9th Ed., W. Stallings, ©2012
• Slides: hard copy handed out in class; PDF on Dropbox after lecture



Grading Information
• Grade determinants
  • First Term Exam ~15%
  • Second Term Exam ~15%
  • Homework (6) ~5%
    - To be submitted on the due date. No late assignments will be accepted.
  • Class participation & quizzes ~5%
  • Final Exam ~60%
• Let me know about exam conflicts ASAP
• Grades will be posted to your email
  • You must submit an email request for a change of grade after discussing it with the instructor.
  • There will be a deadline for filing grade corrections; no requests for grade changes will be accepted after this date.



Course Structure & Schedule
• Lectures: 10:30am to 12:25pm every Wednesday; 1:30am to 2:25am every Sunday
• Lectures:
  • 5 weeks review of the ISA and basic architecture
  • 1 week introduction to computer abstraction & technology
  • 2 weeks Instructions: Language of the Computer
  • 2 weeks Arithmetic for Computers



Course Structure & Schedule
• Lectures:
  • 3 weeks pipelined datapath design issues
  • 3 weeks superscalar/VLIW datapath design issues
  • 6 weeks memory hierarchies and memory design issues
  • 6 weeks storage and I/O design issues
  • 6 weeks multiprocessor design issues



Course Content
• Memory hierarchy and design, CPU design, pipelining, multiprocessor architecture.
• “This course will introduce students to the architecture-level design issues of a computer system. They will apply their knowledge of digital logic design to explore the high-level interaction of the individual computer system hardware components. Concepts of sequential and parallel architecture including the interaction of different memory components, their layout and placement, communication among multiple processors, effects of pipelining, and performance issues, will be covered. Students will apply these concepts by studying and evaluating the merits and demerits of selected computer system architectures.”
  - To learn what determines the capabilities and performance of computer systems and to understand the interactions between the computer’s architecture and its software so that future software designers (compiler writers, operating system designers, database programmers, application programmers, …) can achieve the best cost-performance trade-offs and so that future architects understand the effects of their design choices on software.



What You Should Know – CIE1301, CIE2301, & CIE2303
• Basic logic design & machine organization
  • Logic minimization, FSMs, component design
• Create, assemble, run, debug programs in an assembly language
  • MIPS preferred
• Create, simulate, and debug hardware structures in a hardware description language
  • VHDL



The Computer Revolution
• Progress in computer technology
  • Underpinned by Moore’s Law
• Makes novel applications feasible
  • Computers in automobiles
  • Cell phones
  • Human genome project
  • World Wide Web
  • Search engines
• Computers are pervasive



Classes of Computers
• Desktop computers
  • Designed to deliver good performance to a single user at low cost, usually executing third-party software and usually incorporating a graphics display, a keyboard, and a mouse
• Servers
  • Used to run larger programs for multiple, simultaneous users, typically accessed only via a network, with a greater emphasis on dependability and (often) security
• Supercomputers
  • A high-performance, high-cost class of servers with hundreds to thousands of processors, terabytes of memory, and petabytes of storage, used for high-end scientific and engineering applications
• Embedded computers (processors)
  • A computer inside another device, used for running one predetermined application

Review: Some Basic Definitions

• Kilobyte – $2^{10}$ or 1,024 bytes
• Megabyte – $2^{20}$ or 1,048,576 bytes
  • sometimes “rounded” to $10^{6}$ or 1,000,000 bytes
• Gigabyte – $2^{30}$ or 1,073,741,824 bytes
  • sometimes rounded to $10^{9}$ or 1,000,000,000 bytes
• Terabyte – $2^{40}$ or 1,099,511,627,776 bytes
  • sometimes rounded to $10^{12}$ or 1,000,000,000,000 bytes
• Petabyte – $2^{50}$ or 1,024 terabytes
  • sometimes rounded to $10^{15}$ or 1,000,000,000,000,000 bytes
• Exabyte – $2^{60}$ or 1,024 petabytes
  • sometimes rounded to $10^{18}$ or 1,000,000,000,000,000,000 bytes
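The gap between the power-of-two sizes and their rounded decimal forms grows with each unit. The sketch below tabulates both; the names `UNITS` and `unit_sizes` are illustrative and not part of the course material, and the decimal kilobyte ($10^3$) is included by analogy with the other rows.

```python
# Binary (power-of-two) unit sizes next to their "rounded" decimal counterparts.
UNITS = ["Kilobyte", "Megabyte", "Gigabyte", "Terabyte", "Petabyte", "Exabyte"]


def unit_sizes():
    """Return a list of (name, binary_bytes, decimal_bytes), one per unit."""
    rows = []
    for i, name in enumerate(UNITS, start=1):
        binary = 2 ** (10 * i)   # 2^10, 2^20, ..., 2^60
        decimal = 10 ** (3 * i)  # 10^3, 10^6, ..., 10^18
        rows.append((name, binary, decimal))
    return rows


if __name__ == "__main__":
    for name, binary, decimal in unit_sizes():
        # Percentage by which the binary size exceeds the decimal one.
        gap = (binary - decimal) / decimal * 100
        print(f"{name:9s} {binary:>26,} bytes vs {decimal:>26,} ({gap:.1f}% larger)")
```

Running it shows the difference widening from the kilobyte row to the exabyte row, which is why the "rounded" forms become less accurate for larger units.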



Growth in Cell Phone Sales (Embedded)
embedded growth >> desktop growth

• Where else are embedded processors found?



Embedded Processor Characteristics
The largest class of computers, spanning the widest range of applications and performance
• Often have minimum performance requirements. Example?
• Often have stringent limitations on cost. Example?
• Often have stringent limitations on power consumption. Example?
• Often have low tolerance for failure. Example?



What You Will Learn
• How programs are translated into the machine language
  • And how the hardware executes them
• The hardware/software interface
• What determines program performance
  • And how it can be improved
• How hardware designers improve performance
• What is parallel processing



Understanding Performance
• Algorithm
  • Determines number of operations executed
• Programming language, compiler, architecture
  • Determine number of machine instructions executed per operation
• Processor and memory system
  • Determine how fast instructions are executed
• I/O system (including OS)
  • Determines how fast I/O operations are executed



Below the Program
[Figure: layers of applications software, systems software, and hardware]

• System software
  • Operating system – supervising program that interfaces the user’s program with the hardware (e.g., Linux, MacOS, Windows)
    - Handles basic input and output operations
    - Allocates storage and memory
    - Provides for protected sharing among multiple applications
  • Compiler – translates programs written in a high-level language (e.g., C, Java) into instructions that the hardware can execute

Below the Program, Con’t
• High-level language program (in C)

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

  C compiler (one-to-many translation)

• Assembly language program (for MIPS)

    swap: sll $2, $5, 2      # $2 = k * 4 (byte offset of v[k])
          add $2, $4, $2     # $2 = address of v[k]
          lw  $15, 0($2)     # $15 = v[k]
          lw  $16, 4($2)     # $16 = v[k+1]
          sw  $16, 0($2)     # v[k] = old v[k+1]
          sw  $15, 4($2)     # v[k+1] = old v[k]
          jr  $31            # return to caller

  Assembler (one-to-one translation)

• Machine (object, binary) code (for MIPS)

    000000 00000 00101 0001000010000000
    000000 00100 00010 0001000000100000
    . . .
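The two machine-code words shown are the R-type encodings of the first two assembly instructions (`sll` and `add`). As a minimal sketch, assuming the standard MIPS R-type field layout (6-bit opcode, 5-bit rs, rt, rd, shamt, and 6-bit funct), the words can be reproduced as follows. The helper names `encode_r_type` and `as_slide_fields` are illustrative only.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the six R-type fields into one 32-bit MIPS instruction word."""
    fields = ((opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"field value {value} does not fit in {bits} bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def as_slide_fields(word):
    """Format a word the way the slide does: opcode, rs, rt, then the low 16 bits."""
    bits = f"{word:032b}"
    return f"{bits[0:6]} {bits[6:11]} {bits[11:16]} {bits[16:]}"


# sll $2, $5, 2  -> rs unused (0), rt = $5, rd = $2, shamt = 2, funct = 000000
SLL_WORD = encode_r_type(rs=0, rt=5, rd=2, shamt=2, funct=0b000000)
# add $2, $4, $2 -> rs = $4, rt = $2, rd = $2, shamt = 0, funct = 100000
ADD_WORD = encode_r_type(rs=4, rt=2, rd=2, shamt=0, funct=0b100000)

if __name__ == "__main__":
    print(as_slide_fields(SLL_WORD))
    print(as_slide_fields(ADD_WORD))
```

Each assembly instruction maps to exactly one 32-bit word, which is the "one-to-one" step performed by the assembler.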

Advantages of Higher-Level Languages?
• Higher-level languages
  • Allow the programmer to think in a more natural language and for their intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, …)
  • Improve programmer productivity – more understandable code that is easier to debug and validate
  • Improve program maintainability
  • Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)
  • Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine
• As a result, very little programming is done today at the assembler level



Under the Covers
• Five classic components of a computer – input, output, memory, datapath, and control
• datapath + control = processor (CPU)



Components of a Computer
The BIG Picture
• Same components for all kinds of computer
  • Desktop, server, embedded
• Input/output includes
  • User-interface devices
    - Display, keyboard, mouse
  • Storage devices
    - Hard disk, CD/DVD, flash
  • Network adapters
    - For communicating with other computers



Inside the Processor (CPU)
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
  • Small fast SRAM memory for immediate access to data

Instruction Set Architecture
• ISA, or simply architecture – the abstract interface between the hardware and the lowest level software that encompasses all the information necessary to write a machine language program, including instructions, registers, memory access, I/O, …
  • Enables implementations of varying cost and performance to run identical software



Moore’s Law
• In 1965, Intel’s Gordon Moore predicted that the number of transistors that can be integrated on a single chip would double about every two years
[Figure: Dual Core Itanium with 1.7B transistors; annotations: feature size & die size. Courtesy, Intel ®]

Technology Scaling Road Map (ITRS)

Year                   2004   2006   2008   2010   2012
Feature size (nm)        90     65     45     32     22
Intg. Capacity (BT)       2      4      6     16     32

• Fun facts about 45nm transistors
  • 30 million can fit on the head of a pin
  • You could fit more than 2,000 across the width of a human hair
  • If car prices had fallen at the same rate as the price of a single transistor has since 1968, a new car today would cost about 1 cent



Another Example of Moore’s Law Impact
[Figure: DRAM capacity growth over 3 decades; data points: 16K, 64K, 256K, 1M, 4M, 16M, 64M, 128M, 256M, 512M, 1G]





But What Happened to Clock Rates and Why?
• Clock rates hit a “power wall”
[Figure: Power (Watts), axis from 0 to 120]



A Sea Change is at Hand
• The power challenge has forced a change in the design of microprocessors
  • Since 2002 the rate of improvement in the response time of programs on desktop computers has slowed from a factor of 1.5 per year to less than a factor of 1.2 per year
• As of 2006 all desktop and server companies are shipping microprocessors with multiple processors – cores – per chip

Product          AMD Barcelona   Intel Nehalem   IBM Power 6   Sun Niagara 2
Cores per chip   4               4               2             8
Clock rate       2.5 GHz         ~2.5 GHz?       4.7 GHz       1.4 GHz
Power            120 W           ~100 W?         ~100 W?       94 W

• Plan of record is to double the number of cores per chip per generation (about every two years)

• End of Lecture 1
  • Lecture 1 was on the 3rd of October 2012
  • Next slides will be presented in Lecture 2 on the 4th of October 2012



Lecture 2 Outline Wednesday 20/10/2021
• Review of Lecture 1
• Definition of computer performance
• Performance metrics (execution time versus throughput)
• Performance factors
• Performance equation



Inspirational Quotes

“Never let an engineer get away with simply presenting the data. Always insist that he or she lead off with the conclusions to which the data led.”
Bob Colwell, Pentium Chronicles



Performance Metrics
• Purchasing perspective
  • given a collection of machines, which has the
    - best performance?
    - least cost?
    - best cost/performance?
• Design perspective
  • faced with design options, which has the
    - best performance improvement?
    - least cost?
    - best cost/performance?
• Both require
  • basis for comparison
  • metric for evaluation
• Our goal is to understand what factors in the architecture contribute to overall system performance and the relative importance (and cost) of these factors

Defining Performance

• Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers x mph]



Throughput versus Response Time
• Response time (execution time) – the time between the start and the completion of a task
  • Important to individual users
• Throughput (bandwidth) – the total amount of work done in a given time
  • Important to data center managers
• Will need different performance metrics as well as a different set of applications to benchmark embedded and desktop computers, which are more focused on response time, versus servers, which are more focused on throughput
• How are response time and throughput affected by
  - Replacing the processor with a faster version?
  - Adding more processors?
• We’ll focus on response time for now…

Defining (Speed) Performance
• To maximize performance, need to minimize execution time

\[ \text{performance}_X = \frac{1}{\text{execution\_time}_X} \]

If X is n times faster than Y, then

\[ \frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution\_time}_Y}{\text{execution\_time}_X} = n \]

• Decreasing response time almost always improves throughput


Relative Performance Example
• If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?

We know that A is n times faster than B if

\[ \frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = n \]

The performance ratio is

\[ \frac{15}{10} = 1.5 \]

So A is 1.5 times faster than B
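The same calculation can be written as a small sketch. The function names `performance` and `speedup` are illustrative; they simply restate the definition that performance is the reciprocal of execution time.

```python
def performance(execution_time_s):
    """Performance defined as the reciprocal of execution time."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time_s


def speedup(time_x_s, time_y_s):
    """How many times faster X is than Y: performance_X / performance_Y."""
    return performance(time_x_s) / performance(time_y_s)


if __name__ == "__main__":
    # Computer A takes 10 s and computer B takes 15 s for the same program.
    n = speedup(time_x_s=10, time_y_s=15)
    print(f"A is {n:.2f} times faster than B")
```

Note that the ratio of performances equals the inverse ratio of execution times, so the slower machine's time goes in the numerator.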



Performance Factors
• CPU execution time (CPU time) – time the CPU spends working on a task
  • Does not include time waiting for I/O or running other programs

\[ \text{CPU execution time for a program} = \text{\# CPU clock cycles for a program} \times \text{clock cycle time} \]

or

\[ \text{CPU execution time for a program} = \frac{\text{\# CPU clock cycles for a program}}{\text{clock rate}} \]

• Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program
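The two forms of the CPU time equation are equivalent because clock rate is the reciprocal of clock cycle time. A minimal sketch follows; the function names and the sample cycle count are illustrative assumptions, not values from this slide.

```python
def cpu_time_from_cycle_time(clock_cycles, clock_cycle_time_s):
    """CPU time = number of clock cycles x clock cycle time."""
    return clock_cycles * clock_cycle_time_s


def cpu_time_from_clock_rate(clock_cycles, clock_rate_hz):
    """CPU time = number of clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


if __name__ == "__main__":
    cycles = 20e9      # hypothetical cycle count for some program
    rate_hz = 2e9      # 2 GHz clock
    period_s = 1 / rate_hz
    print(cpu_time_from_cycle_time(cycles, period_s), "s")
    print(cpu_time_from_clock_rate(cycles, rate_hz), "s")
```

Both calls describe the same machine, so they should agree up to floating-point rounding; halving either the cycle count or the cycle time halves the CPU time.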

Review: Machine Clock Rate
• Clock rate (clock cycles per second in MHz or GHz) is the inverse of clock cycle time (clock period)

\[ CC = \frac{1}{CR} \]

[Figure: clock cycles; each cycle consists of data transfer and computation followed by update state]

10 nsec clock cycle => 100 MHz clock rate
5 nsec clock cycle => 200 MHz clock rate
2 nsec clock cycle => 500 MHz clock rate
1 nsec ($10^{-9}$ sec) clock cycle => 1 GHz ($10^{9}$ Hz) clock rate
500 psec clock cycle => 2 GHz clock rate
250 psec clock cycle => 4 GHz clock rate
200 psec clock cycle => 5 GHz clock rate
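The period-to-rate table can be regenerated from CC = 1 / CR. This sketch works in picoseconds and MHz so that the listed values come out exactly; the names `clock_rate_mhz` and `SLIDE_PERIODS_PS` are illustrative.

```python
def clock_rate_mhz(period_ps):
    """Clock rate in MHz for a clock period given in picoseconds."""
    if period_ps <= 0:
        raise ValueError("clock period must be positive")
    # 1 / (period_ps * 1e-12 s) = 1e12 / period_ps Hz = 1e6 / period_ps MHz
    return 1_000_000 / period_ps


# Clock periods listed on the slide, converted to picoseconds.
SLIDE_PERIODS_PS = [10_000, 5_000, 2_000, 1_000, 500, 250, 200]

if __name__ == "__main__":
    for period in SLIDE_PERIODS_PS:
        print(f"{period:>6} ps clock cycle => {clock_rate_mhz(period):g} MHz clock rate")
```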

Improving Performance Example
• A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must computer B run at to run this program in 6 seconds? Unfortunately, to accomplish this, computer B will require 1.2 times as many clock cycles as computer A to run the program.

\[ \text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{clock rate}_A} \]

\[ \text{CPU clock cycles}_A = 10\ \text{sec} \times 2 \times 10^{9}\ \text{cycles/sec} = 20 \times 10^{9}\ \text{cycles} \]

\[ \text{CPU time}_B = \frac{1.2 \times 20 \times 10^{9}\ \text{cycles}}{\text{clock rate}_B} \]

\[ \text{clock rate}_B = \frac{1.2 \times 20 \times 10^{9}\ \text{cycles}}{6\ \text{seconds}} = 4\ \text{GHz} \]
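The derivation above can be checked numerically. In this sketch, `required_clock_rate_hz` is an illustrative helper name; the inputs are the values given in the problem statement.

```python
def required_clock_rate_hz(cycles_a, cycle_factor, target_time_s):
    """Clock rate B needs to finish in target_time_s when it executes
    cycle_factor times as many clock cycles as computer A."""
    return cycle_factor * cycles_a / target_time_s


# Computer A: 10 seconds at 2 GHz gives the cycle count for the program.
CYCLES_A = 10 * 2e9
RATE_B_HZ = required_clock_rate_hz(CYCLES_A, cycle_factor=1.2, target_time_s=6)

if __name__ == "__main__":
    print(f"cycles on A: {CYCLES_A:.3g}")
    print(f"required clock rate for B: {RATE_B_HZ / 1e9:.2f} GHz")
```

Computer B has to run at twice A's clock rate to gain only a factor of 10/6 in execution time, because it needs 20% more cycles.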

Clock Cycles per Instruction
• Not all instructions take the same amount of time to execute
  • One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction

\[ \text{\# CPU clock cycles for a program} = \text{\# Instructions for a program} \times \text{Average clock cycles per instruction} \]

• Clock cycles per instruction (CPI) – the average number of clock cycles each instruction takes to execute
  • A way to compare two different implementations of the same ISA

        CPI for this instruction class
          A    B    C
  CPI     1    2    3

Using the Performance Equation
• Computers A and B implement the same ISA. Computer A has a clock cycle time of 250 ps and an effective CPI of 2.0 for some program and computer B has a clock cycle time of 500 ps and an effective CPI of 1.2 for the same program. Which computer is faster and by how much?

Each computer executes the same number of instructions, I, so

\[ \text{CPU time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps} \]

\[ \text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps} \]

Clearly, A is faster, by the ratio of execution times:

\[ \frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2 \]
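Because both machines execute the same instruction count I, it cancels out of the ratio, and only CPI x cycle time per machine matters. A small sketch of that comparison (the helper name `time_per_instruction_ps` is illustrative):

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x clock cycle time."""
    return cpi * cycle_time_ps


# Values from the problem statement.
TIME_A_PS = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)
TIME_B_PS = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)

# performance_A / performance_B = execution_time_B / execution_time_A
RATIO_A_OVER_B = TIME_B_PS / TIME_A_PS

if __name__ == "__main__":
    print(f"A: {TIME_A_PS:g} ps per instruction, B: {TIME_B_PS:g} ps per instruction")
    print(f"A is {RATIO_A_OVER_B:.2f} times faster than B")
```

Computer B's lower CPI does not make up for its clock cycle being twice as long.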



Effective (Average) CPI
• Computing the overall effective CPI is done by looking at the different types of instructions and their individual cycle counts and averaging

\[ \text{Overall effective CPI} = \sum_{i=1}^{n} (CPI_i \times IC_i) \]

  • Where $IC_i$ is the fraction (percentage) of executed instructions that belong to class i
  • $CPI_i$ is the (average) number of clock cycles per instruction for that instruction class
  • n is the number of instruction classes
• The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs
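As a sketch of the weighted sum, the function below takes an instruction mix as (fraction, CPI) pairs. The CPI values 1, 2, and 3 are those of classes A, B, and C from the earlier CPI slide; the 50/30/20 mix is a hypothetical example chosen for illustration, and `effective_cpi` is an illustrative name.

```python
def effective_cpi(mix):
    """Overall effective CPI for a mix given as (fraction, cpi) pairs.

    The fractions are the share of executed instructions in each class
    and must add up to 1.
    """
    total = sum(fraction for fraction, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"instruction fractions sum to {total}, expected 1.0")
    return sum(fraction * cpi for fraction, cpi in mix)


# Hypothetical mix: 50% class A (CPI 1), 30% class B (CPI 2), 20% class C (CPI 3).
EXAMPLE_MIX = [(0.5, 1), (0.3, 2), (0.2, 3)]

if __name__ == "__main__":
    print(f"effective CPI = {effective_cpi(EXAMPLE_MIX):.2f}")
```

Shifting the mix toward class C raises the effective CPI even though no individual instruction got slower, which is why CPI must always be quoted for a particular program or workload.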



THE Performance Equation
• Our basic performance equation is then

\[ \text{CPU time} = \text{Instruction\_count} \times CPI \times \text{clock\_cycle} \]

or

\[ \text{CPU time} = \frac{\text{Instruction\_count} \times CPI}{\text{clock\_rate}} \]

• These equations separate the three key factors that affect performance
  • Can measure the CPU execution time by running the program
  • The clock rate is usually given
  • Can measure overall instruction count by using profilers/simulators without knowing all of the implementation details
  • CPI varies by instruction type and ISA implementation for which we must know the implementation details
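Since CPU time, clock rate, and instruction count can all be measured or looked up, the performance equation can be rearranged to estimate the one factor that is hard to observe directly, the effective CPI. The sketch below does this with hypothetical measurements; the function names and the numbers are assumptions for illustration.

```python
def cpu_time_s(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instruction_count x CPI / clock_rate."""
    return instruction_count * cpi / clock_rate_hz


def derived_cpi(measured_cpu_time_s, clock_rate_hz, instruction_count):
    """Rearranged equation: CPI = CPU time x clock_rate / Instruction_count."""
    return measured_cpu_time_s * clock_rate_hz / instruction_count


if __name__ == "__main__":
    # Hypothetical measurements: 10 s run time, 2 GHz clock, 10 billion instructions.
    cpi = derived_cpi(measured_cpu_time_s=10.0, clock_rate_hz=2e9, instruction_count=10e9)
    print(f"effective CPI = {cpi:.2f}")
    # Plugging the result back in should reproduce the measured time.
    print(f"check: {cpu_time_s(10e9, cpi, 2e9):.2f} s")
```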

Determinants of CPU Performance

\[ \text{CPU time} = \text{Instruction\_count} \times CPI \times \text{clock\_cycle} \]

                       Instruction_count   CPI   clock_cycle
Algorithm                      X            X
Programming language           X            X
Compiler                       X            X
ISA                            X            X         X
Core organization                           X         X
Technology                                            X


A Simple Example

Op        Freq   CPI_i   Freq x CPI_i   Load in 2 cycles   Branch in 1 cycle   Two ALU at once
ALU       50%    1       .5             .5                 .5                  .25
Load      20%    5       1.0            .4                 1.0                 1.0
Store     10%    3       .3             .3                 .3                  .3
Branch    20%    2       .4             .4                 .2                  .4
Overall CPI =            2.2            1.6                2.0                 1.95

• How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
  CPU time new = 1.6 x IC x CC, so 2.2/1.6 = 1.375 means 37.5% faster
• How does this compare with using branch prediction to shave a cycle off the branch time?
  CPU time new = 2.0 x IC x CC, so 2.2/2.0 = 1.1 means 10% faster
• What if two ALU instructions could be executed at once?
  CPU time new = 1.95 x IC x CC, so 2.2/1.95 ≈ 1.128 means 12.8% faster
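The three what-if answers can be recomputed from the instruction mix. In this sketch the names `BASE_MIX`, `overall_cpi`, and `with_cpi` are illustrative, and executing two ALU instructions at once is modeled as an effective ALU CPI of 0.5, which matches the .25 entry in the table.

```python
# Instruction mix from the table: op -> (frequency, CPI).
BASE_MIX = {"ALU": (0.50, 1), "Load": (0.20, 5), "Store": (0.10, 3), "Branch": (0.20, 2)}


def overall_cpi(mix):
    """Sum of frequency x CPI over all instruction classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def with_cpi(mix, op, new_cpi):
    """Return a copy of the mix with one class's CPI replaced."""
    changed = dict(mix)
    freq, _ = changed[op]
    changed[op] = (freq, new_cpi)
    return changed


BASE_CPI = overall_cpi(BASE_MIX)
SCENARIOS = {
    "better data cache (load = 2 cycles)": with_cpi(BASE_MIX, "Load", 2),
    "branch prediction (branch = 1 cycle)": with_cpi(BASE_MIX, "Branch", 1),
    "two ALU instructions at once": with_cpi(BASE_MIX, "ALU", 0.5),
}

if __name__ == "__main__":
    print(f"base CPI = {BASE_CPI:.2f}")
    for name, mix in SCENARIOS.items():
        new_cpi = overall_cpi(mix)
        # IC and CC are unchanged, so the speedup is the ratio of CPIs.
        gain = (BASE_CPI / new_cpi - 1) * 100
        print(f"{name}: CPI = {new_cpi:.2f}, {gain:.1f}% faster")
```

The load improvement wins by a wide margin because loads contribute the largest share (1.0 of 2.2) of the base CPI.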

Power Trends

• In CMOS IC technology

\[ \text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency} \]

Annotations: power ×30, voltage 5V → 1V, frequency ×1000
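A rough sketch of how the three annotations fit together under the power relation above. The function names are illustrative, and the implied change in capacitive load is a back-of-the-envelope inference from the annotated factors, not a figure stated on the slide.

```python
def relative_power(cap_ratio, voltage_ratio, freq_ratio):
    """Power scaling factor: capacitive load x voltage^2 x frequency, as ratios."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


def implied_cap_ratio(power_ratio, voltage_ratio, freq_ratio):
    """Capacitive-load ratio that makes the other three ratios consistent."""
    return power_ratio / (voltage_ratio ** 2 * freq_ratio)


if __name__ == "__main__":
    voltage_ratio = 1 / 5   # 5 V -> 1 V
    freq_ratio = 1000       # frequency x1000
    # With unchanged capacitive load, power would grow by this factor.
    print(f"power growth at constant load: x{relative_power(1.0, voltage_ratio, freq_ratio):.0f}")
    # The annotated power growth is x30, which implies this load ratio.
    print(f"implied capacitive load ratio: {implied_cap_ratio(30, voltage_ratio, freq_ratio):.2f}")
```

The squared voltage term is what kept power growth far below the frequency growth: dropping from 5 V to 1 V cuts power by a factor of 25 on its own.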



Summary: Evaluating ISAs
• Design-time metrics:
  • Can it be implemented, in how long, at what cost?
  • Can it be programmed? Ease of compilation?
• Static metrics:
  • How many bytes does the program occupy in memory?
• Dynamic metrics:
  • How many instructions are executed? How many bytes does the processor fetch to execute the program?
  • How many clocks are required per instruction?
  • How "lean" a clock is practical?

Best metric: time to execute the program! It depends on the instruction set, the processor organization, and compilation techniques.
[Figure labels: Inst. Count, CPI, Cycle Time]



THANK YOU

ANY QUESTIONS



Next Week’s Material
• Next week
  • Chapter 2: Register Transfer and Computer Design Basics
  • Reading assignment – Morris Mano (Ch. 4, 5)

See Next Week


Have a Nice Weekend !!!
