
COMPUTER ORGANIZATION AND ARCHITECTURE

II Year / III Semester

UNIT III

PARALLEL PROCESSOR
• Parallel processing and its challenges
• Instruction level parallelism
• Flynn's classification: SISD, MIMD, SIMD, SPMD and vector
• Hardware multithreading
• Multicore processors: shared memory multiprocessor and cluster multiprocessor
Parallel processing
• A parallel processing system can be achieved by having a multiplicity of functional units that perform identical or different operations simultaneously. The data can be distributed among the multiple functional units.
• The following diagram shows one possible way of
separating the execution unit into eight functional units
operating in parallel.
• The operation performed in each functional unit is indicated in each block of the diagram.
Parallel processing
• The adder and integer multiplier perform the arithmetic operations with integer numbers.
• The floating-point operations are separated into three
circuits operating in parallel.
• The logic, shift, and increment operations can be
performed concurrently on different data. All units are
independent of each other, so one number can be shifted
while another number is being incremented.
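
As a concrete illustration, the straight-line C fragment below contains six independent operations; a machine with the functional-unit split described above could dispatch each to a different unit in the same cycle (a sketch only; the variable names and values are illustrative):

```c
#include <stdio.h>

/* Six independent operations; with parallel functional units, each
   could execute simultaneously. */
int main(void) {
    int a = 6, b = 7, c = 2, d = 3, p = 0xF0, q = 0x3C, r = 5, s = 9;
    float x = 1.5f, y = 2.5f;

    int   sum  = a + b;    /* integer adder        */
    int   prod = c * d;    /* integer multiplier   */
    float fsum = x + y;    /* floating-point adder */
    int   mask = p & q;    /* logic unit           */
    int   shl  = r << 2;   /* shift unit           */
    int   inc  = s + 1;    /* incrementer          */

    printf("%d %d %f %d %d %d\n", sum, prod, fsum, mask, shl, inc);
    return 0;
}
```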
PARALLEL PROCESSING CHALLENGES
The Hardware Model
• An ideal processor is one where all constraints on ILP are
removed. The only limits on ILP in such a processor are those
imposed by the actual data flows through either registers or
memory.

• The assumptions made for an ideal or perfect processor are as follows:

1) Register renaming
There are an infinite number of virtual registers available, and hence
all WAW and WAR hazards are avoided and an unbounded
number of instructions can begin execution simultaneously.

2) Branch prediction
Branch prediction is perfect. All conditional branches are predicted
exactly.
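
Assumption 1 (register renaming) can be made concrete with a minimal sketch, using C variables to stand in for registers (the names r1..r5 and p7 are illustrative, not a real register file):

```c
/* A WAR hazard and its removal by renaming. */
int renaming_demo(void) {
    int r2 = 2, r3 = 3, r4 = 4, r5 = 5;

    /* WAR hazard: the second statement writes r2 while the first
       reads it, so the hardware must keep them in order. */
    int r1 = r2 + r3;   /* reads r2  */
    r2     = r4 * r5;   /* writes r2 */

    /* With renaming, the multiply writes a fresh virtual register
       p7 instead of r2; the two operations are now independent
       and can begin execution simultaneously. */
    int p7 = r4 * r5;

    return r1 + r2 + p7;
}
```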
PARALLEL PROCESSING CHALLENGES

3) Jump prediction
• All jumps (including jump register used for return and
computed jumps) are perfectly predicted. When combined with
perfect branch prediction, this is equivalent to having a processor
with perfect speculation and an unbounded buffer of instructions
available for execution.
 
4) Memory address alias analysis
• All memory addresses are known exactly, and a load can be
moved before a store provided that the addresses are not
identical. Note that this implements perfect address alias analysis.
PARALLEL PROCESSING CHALLENGES

5) Perfect caches
• All memory accesses take 1 clock cycle. In practice, superscalar
processors will typically consume large amounts of ILP hiding
cache misses, making these results highly optimistic.
PARALLEL PROCESSING CHALLENGES
Limitations on the Window Size and Maximum Issue Count

• To build a processor that even comes close to perfect branch prediction and perfect alias analysis requires extensive dynamic analysis, since static compile time schemes cannot be perfect. Of course, most realistic dynamic schemes will not be perfect, but the use of dynamic schemes will provide the ability to uncover parallelism that cannot be analyzed by static compile time analysis.

• Thus, a dynamic processor might be able to more closely match the amount of parallelism uncovered by our ideal processor.
PARALLEL PROCESSING CHALLENGES
The Effects of Realistic Branch and Jump Prediction

• Our ideal processor assumes that branches can be perfectly predicted: the outcome of any branch in the program is known before the first instruction is executed! Of course, no real processor can ever achieve this.
• We assume a separate predictor is used for jumps. Jump
predictors are important primarily with the most accurate branch
predictors, since the branch frequency is higher and the accuracy
of the branch predictors dominates.

1) Perfect: All branches and jumps are perfectly predicted at the start of execution.

2) Tournament-based branch predictor: The prediction scheme uses a correlating 2-bit predictor and a noncorrelating 2-bit predictor together with a selector, which chooses the best predictor for each branch.
PARALLEL PROCESSING CHALLENGES
The Effects of Finite Registers

• Our ideal processor eliminates all name dependences among register references using an infinite set of virtual registers.

• To date, the IBM Power5 has provided the largest numbers of virtual registers: 88 additional floating-point and 88 additional integer registers, in addition to the 64 registers available in the base architecture. All 240 registers are shared by two threads when executing in multithreading mode, and all are available to a single thread when in single-thread mode.
PARALLEL PROCESSING CHALLENGES
The Effects of Imperfect Alias Analysis

• Our optimal model assumes that it can perfectly analyze all memory dependences, as well as eliminate all register name dependences. Of course, perfect alias analysis is not possible in practice: the analysis cannot be perfect at compile time, and it requires a potentially unbounded number of comparisons at run time (since the number of simultaneous memory references is unconstrained).
INSTRUCTION-LEVEL-PARALLELISM

• All processors since about 1985 use pipelining to overlap the execution of instructions and improve performance. This potential overlap among instructions is called instruction-level parallelism (ILP), since the instructions can be evaluated in parallel.

• There are two largely separable approaches to exploiting ILP: an approach that relies on hardware to help discover and exploit the parallelism dynamically, and an approach that relies on software technology to find parallelism statically at compile time. Processors using the dynamic, hardware-based approach, including the Intel Pentium series, dominate in the market; those using the static approach, including the Intel Itanium, have more limited uses in scientific or application-specific environments.

• The value of the CPI (cycles per instruction) for a pipelined processor is the sum of the base CPI and all contributions from stalls:

Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
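
For example, assuming an ideal pipeline CPI of 1.0 and stall contributions of 0.1 (structural), 0.3 (data hazard), and 0.2 (control), the pipeline CPI would be 1.0 + 0.1 + 0.3 + 0.2 = 1.6, giving an IPC of 1/1.6 ≈ 0.63 (the numbers here are illustrative).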
INSTRUCTION-LEVEL-PARALLELISM

• The ideal pipeline CPI is a measure of the maximum performance attainable by the implementation. Reducing each of the terms on the right-hand side minimizes the overall pipeline CPI or, equivalently, increases the IPC (instructions per clock).

• The simplest and most common way to increase the ILP is to exploit parallelism among iterations of a loop. This type of parallelism is often called loop-level parallelism. There are a number of techniques for converting such loop-level parallelism into instruction-level parallelism. Basically, such techniques work by unrolling the loop either statically by the compiler or dynamically by the hardware. An important alternative method for exploiting loop-level parallelism is the use of vector instructions. A vector instruction exploits data-level parallelism by operating on data items in parallel.
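
A minimal sketch of static unrolling in C (assuming, for simplicity, that n is a multiple of 4):

```c
/* Loop-level parallelism converted to ILP by unrolling the loop by
   a factor of 4. */
void scale(float *x, float s, int n) {
    for (int i = 0; i < n; i += 4) {
        /* The four multiplies are independent of one another, so a
           multiple-issue processor can overlap them, and the loop
           overhead (increment, compare, branch) is paid once per
           four elements instead of once per element. */
        x[i]     *= s;
        x[i + 1] *= s;
        x[i + 2] *= s;
        x[i + 3] *= s;
    }
}
```

Compilers can perform this transformation automatically (for example, GCC's -funroll-loops option).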
Flynn's Classification
• In 1966, Michael Flynn proposed a classification for computer architectures based on the number of instruction streams and data streams (Flynn's Taxonomy).
• Flynn uses the stream concept for describing a machine's
structure.
• A stream simply means a sequence of items (data or
instructions).

Flynn’s Taxonomy
• SISD: Single instruction, single data
  – Classical von Neumann architecture
• SIMD: Single instruction, multiple data
• MISD: Multiple instructions, single data
  – Nonexistent, just listed for completeness
• MIMD: Multiple instructions, multiple data
  – Most common and general parallel machine
Multiple Processor Organization
• Single instruction, single data stream - SISD
• Single instruction, multiple data stream - SIMD
• Multiple instruction, single data stream - MISD
• Multiple instruction, multiple data stream - MIMD
Single Instruction, Single Data Stream - SISD
• SISD (Single-Instruction stream, Single-Data stream)

• Single processor
• Single instruction stream
• Data stored in single memory
• Uni-processor

• SISD corresponds to the traditional mono-processor (von Neumann computer). A single data stream is processed by one instruction stream.
• A single-processor computer (uni-processor) in which a single stream of instructions is generated from the program.
Single Instruction, Multiple Data Stream - SIMD
• SIMD (Single-Instruction stream, Multiple-Data streams)

• A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis.
• Each processing element has an associated data memory.
• Each instruction is executed on a different set of data by the different processors.
• Vector and array processors fall into this category.

• Each instruction is executed on a different set of data by different processors, i.e., multiple processing units of the same type operate on multiple data streams.
• This group is dedicated to array processing machines.
• Sometimes, vector processors can also be seen as a part of this group.
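
As an illustration, x86 SSE intrinsics express the SIMD idea directly: the single instruction generated by _mm_add_ps below performs four additions at once (a sketch; it assumes an SSE-capable processor and a compiler providing immintrin.h):

```c
#include <immintrin.h>

/* SIMD sketch: one instruction, four data elements. */
void add4(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);     /* load four floats from a    */
    __m128 vb = _mm_loadu_ps(b);     /* load four floats from b    */
    __m128 vc = _mm_add_ps(va, vb);  /* single instruction, 4 adds */
    _mm_storeu_ps(out, vc);          /* store the four results     */
}
```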
Multiple Instruction, Single Data Stream - MISD
MISD (Multiple-Instruction streams, Single-Data stream)

• A sequence of data is transmitted to a set of processors.
• Each processor executes a different instruction sequence.
• In MISD computers, multiple processing units operate on one single data stream.
• In practice, this kind of organization has never been implemented.
Multiple Instruction, Multiple Data Stream - MIMD
• MIMD (Multiple-Instruction streams, Multiple-Data streams)
• Set of processors
• Simultaneously execute different instruction sequences
• Different sets of data
• SMPs, clusters and NUMA systems

• Each processor has a separate program.
• An instruction stream is generated from each program.
• Each instruction operates on different data.
• This last machine type builds the group of traditional multiprocessors: several processing units operate on multiple data streams.
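
A minimal MIMD-style sketch using POSIX threads: two threads run different instruction streams on different data (all names are illustrative):

```c
#include <pthread.h>
#include <stdio.h>

/* Two different instruction streams operating on two data sets. */
void *sum_task(void *arg)  { int *a = arg; a[0] = a[1] + a[2]; return NULL; }
void *prod_task(void *arg) { int *a = arg; a[0] = a[1] * a[2]; return NULL; }

int main(void) {
    int x[3] = {0, 2, 3}, y[3] = {0, 4, 5};
    pthread_t t1, t2;

    pthread_create(&t1, NULL, sum_task, x);   /* stream 1, data set 1 */
    pthread_create(&t2, NULL, prod_task, y);  /* stream 2, data set 2 */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("%d %d\n", x[0], y[0]);            /* prints: 5 20 */
    return 0;
}
```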
Taxonomy of Parallel Processor Architectures (figure)
HARDWARE MULTITHREADING
Exploiting Thread-Level Parallelism within a Processor

• Multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion.
• To permit this sharing, the processor must duplicate the independent state of each thread.
• For example, a separate copy of the register file, a separate PC, and a separate page table are required for each thread.
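
Conceptually, the duplicated per-thread state might be pictured as the C struct below (a sketch only; the field names and sizes are illustrative, not any real machine's layout):

```c
#include <stdint.h>

/* Per-thread state that a multithreaded processor replicates; the
   functional units, caches, and memory remain shared. */
struct hw_thread_state {
    uint64_t regs[32];         /* private copy of the register file */
    uint64_t pc;               /* private program counter           */
    uint64_t page_table_base;  /* private page-table pointer        */
};

/* One copy per hardware thread, e.g. four threads: */
struct hw_thread_state thread_state[4];
```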
HARDWARE MULTITHREADING
There are two main approaches to multithreading:

• Fine-grained multithreading
• Coarse-grained multithreading
HARDWARE MULTITHREADING

Fine-grained multithreading

• Fine-grained multithreading switches between threads on each instruction, causing the execution of multiple threads to be interleaved.
• This interleaving is often done in a round-robin fashion, skipping any threads that are stalled at that time.
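
The selection policy can be sketched as follows (hypothetical code; the stalled[] flags are assumed to come from the pipeline):

```c
#define NTHREADS 4

/* Fine-grained selection: each cycle, move to the next thread in
   round-robin order, skipping any thread that is stalled. */
int next_thread(int current, const int stalled[NTHREADS]) {
    for (int i = 1; i <= NTHREADS; i++) {
        int t = (current + i) % NTHREADS;
        if (!stalled[t])
            return t;      /* issue this cycle from thread t */
    }
    return -1;             /* every thread is stalled this cycle */
}
```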
HARDWARE MULTITHREADING

Coarse-grained multithreading

• Coarse-grained multithreading was invented as an alternative to fine-grained multithreading. It switches threads only on costly stalls, such as level-two cache misses.
• This change relieves the need to have thread switching be essentially free and is much less likely to slow the processor down, since instructions from other threads will only be issued when a thread encounters a costly stall.
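
In contrast to the round-robin sketch above, a coarse-grained policy keeps issuing from one thread and switches only on a costly event (a sketch; the event codes are illustrative):

```c
enum stall_event { NO_STALL, L1_MISS, L2_MISS };

/* Coarse-grained selection: stay with the current thread unless it
   hits a costly stall such as a level-two cache miss. */
int select_thread(int current, enum stall_event e, int nthreads) {
    if (e == L2_MISS)
        return (current + 1) % nthreads;  /* switch on costly stall */
    return current;                       /* otherwise keep issuing */
}
```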
HARDWARE MULTITHREADING
Simultaneous Multithreading:

• Simultaneous multithreading (SMT) is a variation on multithreading that uses the resources of a multiple-issue, dynamically scheduled processor to exploit TLP at the same time it exploits ILP.
• The key insight that motivates SMT is that modern multiple-issue processors often have more functional unit parallelism available than a single thread can effectively use.
• Furthermore, with register renaming and dynamic scheduling, multiple instructions from independent threads can be issued without regard to the dependences among them; the resolution of the dependences can be handled by the dynamic scheduling capability.
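
The issue idea can be sketched as below: in a single cycle, issue slots are filled from whichever threads have ready instructions, rather than from one thread only (hypothetical code, not a real issue queue):

```c
#define SLOTS    4   /* issue slots per clock cycle */
#define NTHREADS 4

/* ready[t] holds how many independent, ready instructions thread t
   has this cycle; slots are filled from all threads together. */
int smt_issue(int ready[NTHREADS]) {
    int filled = 0;
    for (int t = 0; t < NTHREADS && filled < SLOTS; t++) {
        while (ready[t] > 0 && filled < SLOTS) {
            ready[t]--;   /* issue one instruction from thread t */
            filled++;     /* TLP and ILP fill the slots together */
        }
    }
    return filled;        /* slots used this cycle */
}
```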
Simultaneous Multithreading:

The following figure illustrates the differences in a processor's ability to exploit the resources of a superscalar for the following processor configurations:

• a superscalar with no multithreading support,
• a superscalar with coarse-grained multithreading,
• a superscalar with fine-grained multithreading, and
• a superscalar with simultaneous multithreading.
Simultaneous Multithreading:

• In the superscalar without multithreading support, the use of issue slots is limited by a lack of ILP.
• In the coarse-grained multithreaded superscalar, the long stalls are partially hidden by switching to another thread that uses the resources of the processor.
• In the fine-grained case, the interleaving of threads eliminates fully empty slots, but because only one thread issues instructions in a given clock cycle, ILP limitations still leave some slots unused.
• In the SMT case, thread-level parallelism (TLP) and instruction-level parallelism (ILP) are exploited simultaneously, with multiple threads using the issue slots in a single clock cycle.
• Although the figure greatly simplifies the real operation of these processors, it does illustrate the potential performance advantages of multithreading in general and SMT in particular.
A multi-core processor:
• A multi-core processor is a processing system composed of two or more independent cores (or CPUs). The cores are typically integrated onto a single integrated circuit die (known as a chip multiprocessor or CMP), or they may be integrated onto multiple dies in a single chip package.
• A multi-core processor implements multiprocessing in a single physical package. Cores in a multi-core device may be coupled together tightly or loosely. For example, cores may or may not share caches, and they may implement message passing or shared memory inter-core communication methods. Common network topologies to interconnect cores include bus, ring, 2-dimensional mesh, and crossbar.
A multi-core processor:
• All cores are identical in symmetric multi-core systems, and they are not identical in asymmetric multi-core systems. Just as with single-processor systems, cores in multi-core systems may implement architectures such as superscalar, vector processing, or multithreading.
• Multi-core processors are widely used across many application domains, including general-purpose, embedded, network, digital signal processing, and graphics.
A multi-core processor:
• The amount of performance gained by the use of a multi-core processor is strongly dependent on the software algorithms and implementation.
• Multi-core processing is a growing industry trend as single-core processors rapidly reach the physical limits of possible complexity and speed.
• Companies that have produced or are working on multi-core products include AMD, ARM, Broadcom, Intel, and VIA.
• With a shared on-chip cache memory, communication events can be reduced to just a handful of processor cycles.
• Therefore, with low latencies, communication delays have a much smaller impact on overall performance.
• Threads can also be much smaller and still be effective.
• Automatic parallelization becomes more feasible.


A multi-core processor:
• Multiple cores run in parallel (figure).
A multi-core processor:
Properties of multi-core systems

• Cores will be shared with a wide range of other applications dynamically.
• Load can no longer be considered symmetric across the cores.
• Cores will likely be asymmetric, as accelerators become common for scientific hardware.
• Source code will often be unavailable, preventing compilation against the specific hardware configuration.
A multi-core processor:
Applications that benefit from multi-core

• Database servers
• Web servers
• Telecommunication markets
• Multimedia applications
• Scientific applications
Shared Memory Multiprocessors
• In shared-memory multiprocessors, numerous processors are
accessing one or more shared memory modules. The processors
may be physically connected to the memory modules in many
ways, but logically every processor is connected to every memory
module.
• One of the major characteristics of shared memory
multiprocessors is that all processors have equally direct access
to one large memory address space. The limitation of shared
memory multiprocessors is memory access latency.
• The figure shows shared-memory multiprocessors.
Shared Memory Multiprocessors
• Shared memory multiprocessors have a major benefit over other multiprocessors, since all the processors share the same view of the memory.
• These processors are also termed Uniform Memory Access (UMA) systems. This term denotes that memory is equally accessible to every processor, providing access at the same performance rate.
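
A minimal shared-memory sketch using POSIX threads: every thread (standing in for a processor) accesses the same address space directly, with a mutex ordering the concurrent updates (names are illustrative):

```c
#include <pthread.h>
#include <stdio.h>

long counter = 0;   /* one shared address space, directly accessible */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                 /* direct access to shared memory */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%ld\n", counter);      /* prints: 400000 */
    return 0;
}
```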
Clustered Multiprocessors
• A clustered system integrates several machines into one system to complete tasks.
• Cluster systems are a mix of hardware clusters and software clusters.
• Hardware clusters help to share high-performance disks among the machines.
• The software clusters make the systems work together. Every node of the clustered system contains the cluster software.
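
Since cluster nodes do not share memory, they typically cooperate by message passing; the sketch below uses MPI, one common choice (it assumes an MPI installation and at least two processes, e.g. mpirun -np 2):

```c
#include <mpi.h>
#include <stdio.h>

/* Two cluster nodes cooperating by message passing. */
int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* to node 1 */
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                         /* from node 0 */
        printf("node 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```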
