ECE 338 Parallel Computer Architecture Spring 2022


ECE 338

Parallel Computer Architecture


Spring 2022

Administrivia
The need for Parallel Computing
Introduction to Parallel Computer Architecture

Nikos Bellas

Electrical and Computer Engineering Department


University of Thessaly

ECE 338: Parallel Computer Architecture 1


Administrivia
Instructor: Nikos Bellas (nbellas@uth.gr)

Class lectures - Labs: Tuesdays 11:00-13:00,
                       Thursdays 11:00-13:00
Location: #305

MS Teams page
Code: 9zj4f0d
To be used exclusively

Office: #422
Phone #: 24210-74704


Course prerequisites
• ECE232: Computer Organization
• Good knowledge of the C language
• Digital design and logic design
• Basic knowledge of Unix/Linux
• Very good command of English, especially the relevant terminology
• An appetite for work and a desire to learn about "Computer Architecture"
• The slides and the terminology will be in English


Curriculum I
• Introductory Material
– Introduction to Parallel Computer Architecture
– The need for parallel architectures
– Latest trends in system design
– Quick reminder of ECE232
• Instruction-Level Parallelism (ILP)
– Stalls and their effects on ILP
– Dynamic instruction scheduling. Tomasulo's algorithm. Reorder Buffers. Speculation
– Branch prediction
– Static instruction scheduling. VLIW technology. Loop scheduling. Software pipelining. Modulo scheduling
– Simultaneous Multithreading (SMT)
– Case study: Multicore architectures from Intel (Intel Core i7 and Itanium)


Curriculum II
• Memory Hierarchy
– Cache optimizations for performance improvements
– Virtual Memory
– DRAM organization and functionality
• Data Level Parallelism (DLP)
– Vector Architectures
– SIMD instructions in General Purpose CPUs
– GPU architectures and CUDA
• Thread Level Parallelism (TLP) - Multicores
– Centralized and Distributed shared memory multiprocessors
– Memory Coherence
– Memory Consistency
– Synchronization



Curriculum III

• Warehouse Scale Computing (WSC)


– Architectures for WSC
– Programming models and Workloads for WSC
– Efficiency and Cost of WSC
– Cloud computing
– Case study: Google WSC

• Various topics (time permitting)


– Domain specific architectures (Google’s TPU, FPGAs)
– Reliable computing. Approximate computing
– Memory Centric Computing
– Machine Learning and Computer Architecture



Textbooks
The course will be based on the 6th edition of the book
"Computer Architecture: A Quantitative Approach",
by J. Hennessy, D. Patterson, Morgan Kaufmann
Publishers, 6th edition, 2019.
Students will receive the 6th edition of the book,
which has been translated into Greek.

Selected publications from "Computer Architecture"
conferences such as ISCA, MICRO, HPCA, etc.

Tip: the internet has a huge number of resources on
"Computer Architecture". Use them.


Grading
• Final Exam: 50%
• You must score at least 5 to pass the class
• Homeworks: 50%
• Note: this is a demanding class. You have to devote a lot of time to keep
up with the lectures and to do the homework, the project, and the paper study.


Let's get started



The Past…

ENIAC, “US Army photo, around 1946”


The present…
Today a $500 laptop has more performance, more main memory, and more disk storage
than a computer bought in 1985 for $1M.

[Pictured: ARMv8 64-bit Microserver (Raspberry PI4, CS Lab@UTH); Android Smartphone;
Playstation 4; Facebook Datacenter; Xilinx Ultrascale+ FPGA]


What is Computer Architecture
• Computer architecture is a description of the
structure and the functionality of a computer system.

• Computer architecture comprises at least three main
subcategories:
• Instruction set architecture (ISA), also known as assembly language, is the lowest
point of control the programmer has over the processor.
• Microarchitecture, or Computer Organization, is a lower-level description of the
system: what the modules of the system are, how they interconnect, and how they
interact. Microarchitecture is beyond the control of the programmer. For example, the
number of functional units in a CPU is a microarchitectural detail.
• System Design, which includes all of the other hardware components within a
computing system, such as system interconnects, memory hierarchies, peripherals, etc.
System design is sometimes visible to the programmer.


ISA vs. Computer Architecture
• Old definition of computer architecture
= instruction set design
– Other aspects of computer design called implementation
– Insinuates implementation is uninteresting or less challenging
• Today computer architects do much more. The technical hurdles are
more challenging than in the early days
• Two very important trends:
– Implementation of microarchitecture has become critical, and more
so as technology scales down
– What really matters now is the whole system, NOT only the CPU
(end-to-end system design). Computer architecture is an integrated
approach
• All of these are on the plate of the computer architect.


So, what does a good computer architect do?
• Exploit quantitative principles of design
– Take Advantage of Parallelism
– Principle of Locality
– Focus on the Common Case
– Amdahl's Law
– The Processor Performance Equation
• Perform careful, quantitative comparisons
– Define, quantify, and summarize relative performance, cost,
dependability, and power dissipation of multiple solutions
• Anticipate and exploit advances in technology
• Define and thoroughly verify well-defined interfaces


1) Taking Advantage of Parallelism
Parallelism in Space
– Multiple CPUs running different threads of the program
– Carry-lookahead adders use parallelism to speed up computing sums from
linear to logarithmic in the number of bits per operand
– Multiple memory banks searched in parallel in set-associative caches
Parallelism in Time
– Overlap instruction execution (pipelining) to reduce the total time to
complete an instruction sequence
– Not every instruction depends on its immediate predecessor ⇒ executing
instructions completely/partially in parallel is possible
– Classic 5-stage pipeline:
1) Instruction Fetch (Ifetch),
2) Register Read (Reg),
3) Execute (ALU),
4) Data Memory Access (Dmem),
5) Register Write (Reg)
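The time benefit of pipelining can be sketched with a simple model (an illustrative sketch, not part of the slides): an n-instruction sequence on a k-stage pipeline finishes in k + (n - 1) cycles instead of n * k, so the speedup approaches k as n grows.

```python
def unpipelined_cycles(n_instructions, n_stages):
    # Each instruction occupies the whole datapath: n * k cycles.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    # The first instruction fills the pipeline (k cycles); after that,
    # one instruction completes per cycle.
    return n_stages + (n_instructions - 1)

# Four instructions on the classic 5-stage pipeline:
print(unpipelined_cycles(4, 5))  # 20
print(pipelined_cycles(4, 5))    # 8
```

With many instructions in flight, the ideal speedup tends toward the number of stages; hazards (next slides) keep real pipelines below this limit.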



Pipelined Instruction Execution

[Figure: four instructions in program order flowing through the 5-stage pipeline
(Ifetch, Reg, ALU, DMem, Reg) across clock cycles 1-7; each instruction starts one
cycle after its predecessor, so in steady state one instruction completes every cycle.]


Limits to pipelining

• Hazards prevent the next instruction from executing during its
designated clock cycle
– Structural hazards: attempt to use the same hardware to do two different
things at once
– Data hazards: an instruction depends on the result of a prior instruction still in the
pipeline
– Control hazards: caused by the delay between the fetching of instructions
and decisions about changes in control flow (branches and jumps)

[Figure: overlapped 5-stage pipeline diagram (Ifetch, Reg, ALU, DMem, Reg) for four
instructions in program order, illustrating where in-flight instructions can conflict.]
2) The Principle of Locality
• The Principle of Locality:
– Programs access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality:
– Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced
again soon (e.g., loops, reuse)
– Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are
close by tend to be referenced soon (e.g., straight-line code, array access)
• For the last 30 years, computer architecture has relied on locality for memory
performance

P → $ → MEM   (processor, cache, main memory)
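Locality is what makes the cache in the P → $ → MEM picture pay off. As an illustration (a toy sketch, not part of the course material), a tiny direct-mapped cache model shows why unit-stride array access, which has spatial locality, hits far more often than access strided by a whole block:

```python
def hit_rate(addresses, block_size=8, num_blocks=32):
    """Toy direct-mapped cache: one tag per cache frame."""
    tags = [None] * num_blocks
    hits = 0
    for addr in addresses:
        block = addr // block_size   # which memory block this word lives in
        index = block % num_blocks   # which cache frame that block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block      # miss: fill the frame with the new block
    return hits / len(addresses)

sequential = list(range(1024))           # unit stride: 7 of every 8 words hit
strided = list(range(0, 1024 * 8, 8))    # stride = block size: a new block every time
print(hit_rate(sequential))  # 0.875
print(hit_rate(strided))     # 0.0
```

With unit stride, only the first word of each block misses; with block-sized stride there is no spatial reuse at all, so every access goes to memory.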



Levels of the Memory Hierarchy

Level (upper, faster → lower, larger) | Capacity        | Access Time             | Cost         | Xfer Unit (staging)                        | Managed by
CPU Registers                         | 100s Bytes      | 300-500 ps (0.3-0.5 ns) |              | Instr. operands, 1-8 bytes                 | prog./compiler
L1 and L2 Cache                       | 10s-100s KBytes | ~1 ns - ~10 ns          | $1000s/GByte | Blocks, 32-64 bytes (L1), 64-128 bytes (L2)| cache cntl / cache-memory cntl
Main Memory                           | GBytes          | 80 ns - 200 ns          | ~$100/GByte  | Pages, 4K-8K bytes                         | OS
Disk                                  | 10s TBytes      | 10 ms (10,000,000 ns)   | ~$1/GByte    | Files, MBytes                              | user/operator
Storage Servers (in the Net)          | infinite        | sec-min                 | ~$1/GByte    |                                            |


3) Focus on the Common Case
• Common sense guides computer design
– Since we’re in engineering, common sense is valuable
• In making a design trade-off, favor the frequent case over
the infrequent case
– E.g., Instruction fetch and decode unit used more frequently than multiplier, so
optimize it first
• Frequent case is often simpler and can be done faster than
the infrequent case
– E.g., overflow is rare when adding 2 numbers, so improve performance by optimizing
more common case of no overflow
– May slow down overflow, but overall performance improved by optimizing for the
normal case
• What is the frequent case, and how much is performance improved
by making that case faster? ⇒ Amdahl's Law


4) Amdahl's Law

Total execution time T:
|----- Sequential Part -----|----- Parallelizable Part -----|
          (1-α)*T                         α*T

After the enhancement:
          (1-α)*T                  α*T / Speedup_enhanced

ExTime_new = ExTime_old * [ (1-α) + α / Speedup_enhanced ]

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [ (1-α) + α / Speedup_enhanced ]

Best you could ever hope to do (perfect speedup):

Speedup_maximum = 1 / (1-α)
Amdahl's Law example
• New CPU 10X faster
• I/O-bound server, so 60% of the time is spent waiting for I/O
(the enhanced fraction is α = 0.4)

Speedup_overall = 1 / [ (1-α) + α / Speedup_enhanced ]
                = 1 / [ (1 - 0.4) + 0.4/10 ] = 1.56

• Apparently, it's human nature to be attracted by 10X faster,
vs. keeping in perspective that it's just 1.6X faster
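The arithmetic above is easy to mechanize (a small sketch, reproducing the slide's numbers):

```python
def speedup_overall(alpha, speedup_enhanced):
    # Amdahl's Law: alpha is the fraction of time that benefits
    # from the enhancement.
    return 1.0 / ((1.0 - alpha) + alpha / speedup_enhanced)

def speedup_maximum(alpha):
    # Limit as the enhancement becomes infinitely fast.
    return 1.0 / (1.0 - alpha)

# The slide's example: CPU 10x faster, but only 40% of time is CPU-bound.
print(round(speedup_overall(0.4, 10), 2))  # 1.56
print(round(speedup_maximum(0.4), 2))      # 1.67
```

Even an infinitely fast CPU would cap this server at about 1.67x, which is why the sequential (here, I/O) part dominates the outcome.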



Amdahl's Law for different parallelism granularities

Amdahl's law's main idea: the sequential part of an application
limits performance scaling.


5) Processor performance equation
"Iron Law of Performance"

CPU time = Seconds / Program
         = (Instructions / Program) * (Cycles / Instruction) * (Seconds / Cycle)
         =        inst count        *          CPI           *    cycle time

               Inst Count   CPI   Clock Rate
Program            X
Compiler           X        (X)
Inst. Set          X         X
Organization                 X        X
Technology                            X
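The Iron Law is a product of three factors, so it plugs directly into code (an illustrative sketch with made-up example numbers):

```python
def cpu_time(inst_count, cpi, clock_hz):
    # CPU time = instructions * (cycles / instruction) * (seconds / cycle)
    return inst_count * cpi / clock_hz

# e.g., 10^9 instructions at CPI 1.5 on a 2 GHz clock:
print(cpu_time(1e9, 1.5, 2e9))  # 0.75 (seconds)
```

The table shows why this decomposition is useful: each layer of the stack (program, compiler, ISA, organization, technology) moves a different factor of the product.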



What drives new architectures ?
Why are there so many of them?
• Technology
– Determines what is plausible and what is not
– What is cheap and what is expensive in terms of performance, cost
and power
– For example, memory hierarchy (e.g. cache memories) became
necessary due to DRAM technology being much slower than CPU
technology
• Applications
– High volume, mainstream applications drive architectural decisions
– SIMD parallelism driven by a market need for multimedia products



Technology drives new architectures

• General-purpose single cores have stopped their
historic performance scaling
• Why?
– Power consumption
– DRAM access latency
– Diminishing returns of more instruction-level
parallelism


Power consumption problem

Robert H. Dennard (photo from Wikipedia)

1 transistor = 1x energy → (after 2 yrs) 2 transistors = 1x energy → (after 2 yrs) 4 transistors = 1x energy


Dennard Scaling (all dimensions and voltages scaled down by factor S)
L' = L/S
W' = W/S
tox' = tox/S
Xj' = Xj/S   -- junction depth
Vdd' = Vdd/S
Vth' = Vth/S
Na', Nd' = S*Na, S*Nd   -- doping concentrations
Id(lin)' = Id(lin)/S
Id(sat)' = Id(sat)/S
P' = Id' * Vds' = (Id/S) * (Vds/S) = P/S²
Power density' = Power'/Area' = (P/S²) / ((W*L)/S²) = Power/Area
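The derivation can be checked numerically (a quick sketch with normalized, illustrative values): scaling by S divides both per-transistor power and area by S², so power density is unchanged.

```python
def dennard_scale(power, area, S):
    # Voltage and current each drop by S, so P' = P / S^2;
    # W and L each drop by S, so Area' = Area / S^2.
    return power / S**2, area / S**2

P, A = 1.0, 1.0                       # normalized transistor power and area
P2, A2 = dennard_scale(P, A, S=1.4)   # one ~0.7x linear shrink
print(round(P2 / A2, 6))              # 1.0 -> power density unchanged
```

This constant power density is exactly what broke down in the mid-2000s, as the next slides explain.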



What is the problem?

After the mid-2000s:
Transistors are still getting smaller (Moore's law), but
energy increases!
WHY?

1 transistor = 1x energy → (after 2 yrs) 2 transistors > 1x energy → (after 2 yrs) 4 transistors >> 1x energy


Dennard Scaling no more



Technology drives new architectures
» High power dissipation (CV²f) drives lower clock frequency
» Simpler, lower-frequency unicore architecture
» Seek performance through more, simpler, slower cores


Reducing Power: Frequency
Growth in clock frequency has stalled since 2003/04



Reducing Power: Multicores
Multiple cores instead of a single, complex core

Before:                         After:
  one Processor at f              two Processors at f/2, each handling half the Input
  Capacitance = C                 Capacitance = 2.2C
  Clock frequency = f             Clock frequency = f/2
  Voltage = V                     Voltage = 0.6V
  Power = CV²f                    Power = 2.2C * (0.6V)² * (f/2) = 0.396 CV²f

Slower processors allow for a lower Vdd voltage.
Emphasis on parallelism NOT on clock frequency
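Plugging the slide's numbers into the dynamic-power formula confirms the claim (a small sketch in normalized units):

```python
def dynamic_power(C, V, f):
    # Dynamic power of CMOS logic: P = C * V^2 * f
    return C * V**2 * f

before = dynamic_power(C=1.0, V=1.0, f=1.0)   # single core: CV^2 f
after = dynamic_power(C=2.2, V=0.6, f=0.5)    # two slower cores at lower Vdd
print(before, round(after, 3))  # 1.0 0.396
```

The quadratic dependence on V is the key: halving frequency lets Vdd drop to 0.6V, and 0.6² = 0.36 more than pays for doubling the capacitance.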
Reducing Power: Heterogeneous computing
Specialization

• Domain specific processors (Google’s TPU, Security processors)


• FPGAs
Reducing Power: Smart Software
– Turn off the clock (or even Vdd) when cores are idle
(turbo mode in modern multicores)
– Dynamic Voltage-Frequency Scaling (DVFS)
• Under the control of the Operating System
– Low power state for DRAM, disks
– Approximate computing



Technology drives new architectures
» Wire speed scales slower than transistor speed
» Wire delays drive localized computing == multicores
» Super-pipelining to account for data transfer!


Technology drives new architectures
DRAM access latency
– External memory accesses are becoming more and more expensive
– On the order of hundreds of cycles for high-performance processors
– Need for caches or local memories


Technology drives new architectures
• Diminishing returns of instruction-level parallelism

– 50% performance improvement every year in the 80's
– Due to pipelining: 5 CPI → 1 CPI
– Diminishing returns in the 90's
– More complexity to detect the last available ILP
– Superscalar, VLIW, branch prediction
– Due to ILP: 1 CPI → 0.3 CPI
– The multicore era in the 00's
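At a fixed instruction count and clock rate, the Iron Law says performance scales as 1/CPI, so the CPI milestones above translate directly into speedups (a simple sketch):

```python
def speedup_from_cpi(cpi_old, cpi_new):
    # CPU time = IC * CPI * cycle_time, so at fixed IC and clock rate
    # the speedup is just the ratio of CPIs.
    return cpi_old / cpi_new

print(speedup_from_cpi(5, 1))              # 5.0  -> pipelining in the 80's
print(round(speedup_from_cpi(1, 0.3), 2))  # 3.33 -> ILP techniques in the 90's
```

Note the diminishing return: the second, far more complex generation of techniques bought a smaller factor than simple pipelining did.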



Technology drives new architectures
General-purpose unicores have stopped historic performance scaling

[Figure: single-core performance over time, from Hennessy and Patterson,
Computer Architecture: A Quantitative Approach, 6th edition, 2017]


Tremendous change in Design Technology
• Intel 4004 (1971): 4-bit processor,
2250 transistors, 750 KHz,
10 micron PMOS process, 11 mm² chip

• RISC II (1983): 32-bit, 5-stage pipeline,
40,760 transistors, 3 MHz,
3 micron NMOS process, 60 mm² chip

• IBM Power 9 (2017): 24-core, 64-bit, 96
threads, 8 billion transistors, 14nm FinFET
Silicon On Insulator (SOI) process, 695 mm² chip

• State of the art is 7nm (0.007 micron) in 2019


Computer Architecture Today (I)
• Today is a very exciting time to study computer architecture
• Industry is in a large paradigm shift (to heterogeneous or
accelerator-based computing) – many different potential system
designs possible
• Machine Learning applications have rejuvenated hardware design
• Many difficult problems motivating and caused by the shift
– Huge hunger for data and new data-intensive applications (ML, Big Data,
Robotics)
– Power/energy/thermal constraints
– Complexity of design due to Heterogeneity
– Difficulties in technology scaling
– Memory wall/gap
– Reliability problems
– Programmability problems
– Security and privacy issues
Computer Architecture Today (II)
• These problems affect all parts of the computing stack - if we do
not change the way we design systems

[Figure: the computing stack, top to bottom - Problem, Algorithm, Program/Language,
Runtime System (VM, OS, MM), ISA, Microarchitecture, Logic, Circuits, Electrons.
Many new demands come from the top (look up): fast-changing demands and
personalities of users. Many new issues arise at the bottom (look down).]

• No clear, definitive answers to these problems
Computer Architecture Today (III)
• Computing landscape is very different from 10-20 years ago
• Both UP (software and humanity trends) and DOWN (technologies
and their issues), FORWARD and BACKWARD, and the resulting
requirements and constraints

[Figure: heterogeneous processors and accelerators, general-purpose GPUs,
hybrid main memory, persistent memory/storage. Every component and its
interfaces, as well as entire system designs, are being re-examined.]


Future trends
• All exponential laws must come to an end
– Dennard scaling (constant power density)
• Stopped by threshold voltage
– Disk capacity
• 30-100% per year to 5% per year
• Moore’s Law has slowed
• Most visible with DRAM capacity
• Only four foundries left producing state-of-the-art
logic chips
– Taiwan Semi (TSMC), Intel, Samsung, and Global
Foundries (IBM, AMD, etc).
• 7 nm now, 3 nm might be the limit

