
COMPUTER ORGANIZATION AND ARCHITECTURE

UNIT-V
Syllabus

• Reduced Instruction Set Computer: CISC Characteristics, RISC Characteristics.

• Pipeline and Vector Processing: Parallel Processing, Pipelining, Arithmetic Pipeline, Instruction Pipeline, RISC Pipeline, Vector
Processing, Array Processor. Multi Processors: Characteristics of Multiprocessors, Interconnection Structures, Interprocessor
arbitration, Interprocessor communication and synchronization, Cache Coherence.

List of Topics
Reduced Instruction Set Computer:
•CISC Characteristics,
•RISC Characteristics.
Pipeline and Vector Processing:
•Parallel Processing
•Pipelining, Arithmetic Pipeline
•Instruction Pipeline
•RISC Pipeline
•Vector Processing
•Array Processor
Multi Processors:
•Characteristics of Multiprocessors
•Interconnection Structures
•Interprocessor Arbitration
•Interprocessor Communication and Synchronization
•Cache Coherence

RISC and CISC Computers

• An important aspect of computer architecture is the design of the instruction set for the processor. The instruction set chosen for a particular computer determines the way that machine language programs are constructed.

• A computer with a large number of instructions is classified as a complex instruction set computer, abbreviated CISC.

• In the 1980s, a number of computer designers recommended that computers use fewer instructions with simple constructs so they can be executed much faster within the CPU without having to use memory as often. This type of computer is classified as a reduced instruction set computer (RISC).

CISC Characteristics
• One reason for the trend to provide a complex instruction set is
the desire to simplify the compilation and improve the overall
computer performance.
• The task of a compiler is to generate a sequence of machine
instructions for each high-level language statement.
• The task is simplified if there are machine instructions that
implement the statements directly.
• The essential goal of a CISC architecture is to attempt to provide a
single machine instruction for each statement that is written in a
high-level language.
• Examples of CISC architectures are the Digital Equipment Corporation VAX computer and the
IBM 370 computer.

CISC Characteristics

• Another characteristic of CISC architecture is the incorporation of variable-length instruction formats.
• Instructions that require register operands may be only two bytes in length, but
instructions that need two memory addresses may need five bytes to include
the entire instruction code.
• The instructions in a typical CISC processor provide direct manipulation of
operands residing in memory.
• However, as more instructions and addressing modes are incorporated into a
computer, the more hardware logic is needed to implement and support them,
and this may cause the computations to slow down.

CISC Characteristics

• In summary, the major characteristics of CISC architecture are:

1. A large number of instructions, typically from 100 to 250
2. Some instructions that perform specialized tasks and are used infrequently
3. A large variety of addressing modes, typically from 5 to 20 different modes
4. Variable-length instruction formats
5. Instructions that manipulate operands in memory

RISC Characteristics
• The concept of RISC architecture involves an attempt to reduce execution time
by simplifying the instruction set of the computer.
The major characteristics of a RISC processor are:
1. Relatively few instructions
2. Relatively few addressing modes
3. Memory access limited to load and store instructions
4. All operations done within the registers of the CPU
5. Fixed-length, easily decoded instruction format
6. Single-cycle instruction execution
7. Hardwired rather than microprogrammed control
8. A relatively large number of registers in the processor unit
9. Efficient instruction pipeline

RISC Characteristics

• A characteristic of RISC processors is their ability to execute one instruction per clock cycle. This is done by overlapping the fetch, decode, and execute phases of two or three instructions by using a procedure referred to as pipelining.

PIPELINING AND VECTOR PROCESSING

• Parallel Processing

• Pipelining

• Arithmetic Pipeline

• Instruction Pipeline

• RISC Pipeline

• Vector Processing

• Array Processors

PARALLEL PROCESSING

Execution of concurrent events in the computing process to achieve faster computational speed

Levels of Parallel Processing

- Job or Program level

- Task or Procedure level

- Inter-Instruction level

- Intra-Instruction level

PARALLEL COMPUTERS
Architectural Classification

• Flynn's classification
• Based on the multiplicity of Instruction Streams and Data Streams
• Instruction Stream
• Sequence of Instructions read from memory
• Data Stream
• Operations performed on the data in the processor

                               Number of Data Streams
                               Single      Multiple
Number of        Single        SISD        SIMD
Instruction
Streams          Multiple      MISD        MIMD
COMPUTER ARCHITECTURES FOR PARALLEL PROCESSING
SISD   (Von Neumann based)
       - Superscalar processors
       - Superpipelined processors
       - VLIW

MISD   Nonexistent

SIMD   - Array processors
       - Systolic arrays
       - Associative processors

MIMD   - Shared-memory multiprocessors
           Bus based
           Crossbar switch based
           Multistage IN based
       - Message-passing multicomputers
           Hypercube
           Mesh
           Reconfigurable

(Non-Von Neumann architectures: Dataflow, Reduction)
SISD COMPUTER SYSTEMS
[Diagram: Control Unit -> (instruction stream) -> Processor Unit -> (data stream) -> Memory Unit]

Characteristics

- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time

Limitations

Von Neumann bottleneck
- Maximum speed of the system is limited by the Memory Bandwidth (bits/sec or bytes/sec)
- Limitation on Memory Bandwidth
- Memory is shared by CPU and I/O

SISD PERFORMANCE IMPROVEMENTS

• Multiprogramming
• Spooling
• Multifunction processor
• Pipelining
• Exploiting instruction-level parallelism
- Superscalar
- Superpipelining
- VLIW (Very Long Instruction Word)


MISD COMPUTER SYSTEMS

[Diagram: n control units (CU), each with its own instruction stream from memory (M), drive n processors (P) that all operate on a single data stream.]

Characteristics
- There is no computer at present that can be
classified as MISD

SIMD COMPUTER SYSTEMS
[Diagram: a control unit fetches the program from memory over the data bus and broadcasts a single instruction stream to the processor units P; the processors exchange data streams with memory modules M through an alignment network.]

Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time
TYPES OF SIMD COMPUTERS

Array Processors

- The control unit broadcasts instructions to all PEs, and all active PEs execute the same instructions
- ILLIAC IV, GF-11, Connection Machine, DAP, MPP

Systolic Arrays

- Regular arrangement of a large number of very simple processors constructed on VLSI circuits
- CMU Warp, Purdue CHiP

Associative Processors

- Content addressing
- Data transformation operations over many sets of arguments with a single instruction
- STARAN, PEPE
MIMD COMPUTER SYSTEMS
[Diagram: processor/memory pairs (P+M) connect through an interconnection network to a shared memory.]

Characteristics
- Multiple processing units

- Execution of multiple instructions on multiple data

Types of MIMD computer systems
- Shared memory multiprocessors
- Message-passing multicomputers
SHARED MEMORY MULTIPROCESSORS
[Diagram: memory modules M connect through an interconnection network (buses, multistage IN, or crossbar switch) to processors P.]

Characteristics
All processors have equally direct access to
one large memory address space
Example systems
Bus and cache-based systems
- Sequent Balance, Encore Multimax
Multistage IN-based systems
- Ultracomputer, Butterfly, RP3, HEP
Crossbar switch-based systems
- C.mmp, Alliant FX/8
Limitations
Memory access latency
Hot spot problem
MESSAGE-PASSING MULTICOMPUTER
[Diagram: processors P, each with a local memory M, connected by point-to-point links through a message-passing network.]

Characteristics
- Interconnected computers
- Each processor has its own memory and communicates via message passing

Example systems
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III

Limitations
- Communication overhead
- Hard to program
PIPELINING
A technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments.
Example: Ai * Bi + Ci  for i = 1, 2, 3, ..., 7

[Diagram: Segment 1 holds registers R1 and R2, which load Ai and Bi from memory; Segment 2 holds a multiplier feeding R3 and a register R4 that loads Ci; Segment 3 holds an adder feeding R5.]

R1 <- Ai, R2 <- Bi          Load Ai and Bi
R3 <- R1 * R2, R4 <- Ci     Multiply and load Ci
R5 <- R3 + R4               Add
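A minimal Python sketch (not part of the original slides) that simulates the three-segment pipeline above one clock pulse at a time; the operand values are invented for illustration:

# Simulate the 3-segment pipeline computing Ai*Bi + Ci.
# Each clock pulse, every segment works on a different operand set.
A = [1, 2, 3, 4, 5, 6, 7]
B = [10, 20, 30, 40, 50, 60, 70]
C = [100, 200, 300, 400, 500, 600, 700]

n, k = len(A), 3          # 7 tasks, 3 segments
results = []
s1 = s2 = None            # contents of the interstage registers
for clock in range(n + k - 1):
    if s2 is not None:                    # Segment 3: R5 <- R3 + R4
        results.append(s2[0] + s2[1])
    # Segment 2: R3 <- R1 * R2, R4 <- Ci
    s2 = (s1[0] * s1[1], C[s1[2]]) if s1 is not None else None
    # Segment 1: R1 <- Ai, R2 <- Bi
    s1 = (A[clock], B[clock], clock) if clock < n else None

assert results == [a * b + c for a, b, c in zip(A, B, C)]
print(results)   # all 7 results after n + k - 1 = 9 clock pulses

The assertion confirms that the overlapped execution produces the same seven results as sequential evaluation, in 9 rather than 21 suboperation times, matching the space-time table on the next slide.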
OPERATIONS IN EACH PIPELINE STAGE

Clock Pulse   Segment 1      Segment 2            Segment 3
Number        R1    R2       R3         R4        R5
1             A1    B1
2             A2    B2       A1 * B1    C1
3             A3    B3       A2 * B2    C2        A1 * B1 + C1
4             A4    B4       A3 * B3    C3        A2 * B2 + C2
5             A5    B5       A4 * B4    C4        A3 * B3 + C3
6             A6    B6       A5 * B5    C5        A4 * B4 + C4
7             A7    B7       A6 * B6    C6        A5 * B5 + C5
8                            A7 * B7    C7        A6 * B6 + C6
9                                                 A7 * B7 + C7

GENERAL PIPELINE
General Structure of a 4-Segment Pipeline
Clock

Input S1 R1 S2 R2 S3 R3 S4 R4

Space-Time Diagram

             Clock cycles
             1    2    3    4    5    6    7    8    9
Segment 1    T1   T2   T3   T4   T5   T6
        2         T1   T2   T3   T4   T5   T6
        3              T1   T2   T3   T4   T5   T6
        4                   T1   T2   T3   T4   T5   T6

PIPELINE SPEEDUP
n: Number of tasks to be performed

Conventional Machine (Non-Pipelined)
  tn: Clock cycle
  t1: Time required to complete the n tasks
      t1 = n * tn

Pipelined Machine (k stages)
  tp: Clock cycle (time to complete each suboperation)
  tk: Time required to complete the n tasks
      tk = (k + n - 1) * tp

Speedup
  Sk: Speedup
      Sk = n*tn / [(k + n - 1)*tp]

      lim Sk = tn / tp    ( = k, if tn = k * tp )
      n->inf
PIPELINE AND MULTIPLE FUNCTION UNITS
Example
- 4-stage pipeline
- suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in the non-pipelined system: 20*4 = 80 ns

Pipelined System
  (k + n - 1)*tp = (4 + 99) * 20 = 2060 ns

Non-Pipelined System
  n*k*tp = 100 * 80 = 8000 ns

Speedup
  Sk = 8000 / 2060 = 3.88
A 4-stage pipeline is basically identical to a system with 4 identical function units P1..P4 operating in parallel on successive tasks Ii, Ii+1, Ii+2, Ii+3.
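The arithmetic above can be checked with a short Python sketch (values taken from this example):

def pipeline_time(n, k, tp):
    """Time for n tasks on a k-stage pipeline with clock period tp."""
    return (k + n - 1) * tp

def nonpipelined_time(n, k, tp):
    """Time for n tasks when each task runs all k suboperations serially."""
    return n * k * tp

n, k, tp = 100, 4, 20                  # 100 tasks, 4 stages, 20 ns each
tk = pipeline_time(n, k, tp)           # (4 + 99) * 20 = 2060 ns
t1 = nonpipelined_time(n, k, tp)       # 100 * 80 = 8000 ns
print(f"speedup = {t1 / tk:.2f}")      # 3.88, approaching k = 4 as n grows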

ARITHMETIC PIPELINE
Floating-point adder:    X = A x 2^a,  Y = B x 2^b

[1] Compare the exponents
[2] Align the mantissas
[3] Add/subtract the mantissas
[4] Normalize the result

[Diagram: exponents a, b and mantissas A, B enter through registers R. Segment 1 compares the exponents by subtraction; Segment 2 chooses the exponent and aligns the mantissa; Segment 3 adds or subtracts the mantissas; Segment 4 adjusts the exponent and normalizes the result, with registers R between the segments.]
4-STAGE FLOATING POINT ADDER
A = a x 2^p,  B = b x 2^q

[Diagram of the 4-stage floating-point adder:
 S1: an exponent subtractor computes t = |p - q|, and a fraction selector picks the fraction with the smaller exponent; r = max(p, q)
 S2: a right shifter aligns that fraction by t places; a fraction adder produces c
 S3: a leading-zero counter drives a left shifter that normalizes c to d
 S4: an exponent adder adjusts r to s]

C = A + B = c x 2^r = d x 2^s    (r = max(p, q), 0.5 <= d < 1)
INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place

* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase

==> 4-Stage Pipeline

[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
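An illustrative Python sketch (not from the slides) that prints how the four stages of consecutive instructions overlap:

STAGES = ["FI", "DA", "FO", "EX"]

def space_time(n_instructions):
    """One row per instruction; instruction i enters the pipe at cycle i+1."""
    return [["--"] * i + STAGES + ["--"] * (n_instructions - 1 - i)
            for i in range(n_instructions)]

for i, row in enumerate(space_time(3)):
    print(f"instruction i+{i}:", " ".join(row))
# instruction i+0: FI DA FO EX -- --
# instruction i+1: -- FI DA FO EX --
# instruction i+2: -- -- FI DA FO EX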
INSTRUCTION PIPELINE

Execution of Three Instructions in a 4-Stage Pipeline

Conventional (sequential):

i      FI DA FO EX
i+1                FI DA FO EX
i+2                            FI DA FO EX

Pipelined:

i      FI DA FO EX
i+1       FI DA FO EX
i+2          FI DA FO EX

INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE

[Flowchart: Segment 1 fetches the instruction from memory; Segment 2 decodes the instruction and calculates the effective address; if the instruction is a branch, the pipe is emptied; otherwise Segment 3 fetches the operand from memory and Segment 4 executes the instruction; if an interrupt is pending it is handled; then the PC is updated and the cycle repeats.]

Timing with a branch (instruction 3):

Step:            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1    FI  DA  FO  EX
            2        FI  DA  FO  EX
(Branch)    3            FI  DA  FO  EX
            4                FI  -   -   FI  DA  FO  EX
            5                                FI  DA  FO  EX
            6                                    FI  DA  FO  EX
            7                                        FI  DA  FO  EX
MAJOR HAZARDS IN PIPELINED EXECUTION
Structural hazards (Resource Conflicts)
  Hardware resources required by the instructions in simultaneous overlapped execution cannot be met

Data hazards (Data Dependency Conflicts)
  An instruction scheduled to be executed in the pipeline requires the result of a previous instruction, which is not yet available

      R1 <- B + C
      R1 <- R1 + 1    (must wait for the previous result; a bubble is inserted)

Control hazards
  Branches and other instructions that change the PC delay the fetch of the next instruction

      JMP: the next fetch is delayed until the branch address is known

Hazards in pipelines may make it necessary to stall the pipeline.
Pipeline Interlock: detect hazards and stall until they are cleared.
STRUCTURAL HAZARDS
Structural Hazards
  Occur when some resource has not been duplicated enough to allow all combinations of instructions in the pipeline to execute

Example: with one memory port, a data fetch and an instruction fetch cannot be initiated in the same clock cycle

i      FI DA FO EX
i+1       FI DA FO EX
i+2          stall stall FI DA FO EX

The pipeline is stalled for a structural hazard
<- Two loads with a one-port memory
-> A two-port memory would serve without a stall

DATA HAZARDS
Data Hazards

Occur when the execution of an instruction depends on the result of a previous instruction

    ADD R1, R2, R3
    SUB R4, R1, R5

Data hazards can be dealt with by either hardware or software techniques

Hardware Techniques
  Interlock
  - hardware detects the data dependencies and delays the scheduling of the dependent instruction by stalling enough clock cycles
  Forwarding (bypassing, short-circuiting)
  - Accomplished by a data path that routes a value from a source (usually an ALU) to a user, bypassing a designated register. This allows the value produced to be used at an earlier stage in the pipeline than would otherwise be possible

Software Technique
  Instruction scheduling (compiler) for delayed load
FORWARDING HARDWARE

Example:
    ADD R1, R2, R3
    SUB R4, R1, R5

3-stage pipeline:
    I: Instruction fetch
    A: Decode, read registers, ALU operations
    E: Write the result to the destination register

[Diagram: the register file feeds the ALU through two MUXes; a bypass path routes the ALU result buffer back to the MUX inputs, ahead of the result write bus.]

Without bypassing, SUB's A stage must wait until ADD's E stage has written R1; with bypassing, SUB takes ADD's ALU result directly from the bypass path and can proceed without stalling.
INSTRUCTION SCHEDULING
a = b + c;
d = e - f;

Unscheduled code: Scheduled Code:


LW Rb, b LW Rb, b
LW Rc, c LW Rc, c
ADD Ra, Rb, Rc LW Re, e
SW a, Ra ADD Ra, Rb, Rc
LW Re, e LW Rf, f
LW Rf, f SW a, Ra
SUB Rd, Re, Rf SUB Rd, Re, Rf
SW d, Rd SW d, Rd

Delayed Load
A load requiring that the following instruction not use its result
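A hypothetical Python sketch of the compiler pass described above: it looks for a load whose result is used by the very next instruction and tries to hoist a later, independent instruction into the delay slot. The tuple encoding (op, destination, sources) is invented for illustration; this naive single pass fixes the first conflict but, unlike the hand-scheduled code above, still leaves LW Rf adjacent to SUB.

def fill_load_delay_slots(code):
    """code: list of (op, dest, sources) tuples."""
    code = list(code)
    i = 0
    while i < len(code) - 1:
        op, dest, _ = code[i]
        _, _, next_srcs = code[i + 1]
        if op == "LW" and dest in next_srcs:       # load-use conflict
            for j in range(i + 2, len(code)):
                cand_op, cand_dest, cand_srcs = code[j]
                skipped = code[i + 1:j]
                defs = {d for _, d, _ in skipped}
                uses = {s for _, _, ss in skipped for s in ss}
                # Hoist only an instruction independent of everything it jumps over.
                if not (defs & set(cand_srcs)) and cand_dest not in defs | uses:
                    code.insert(i + 1, code.pop(j))
                    break
        i += 1
    return code

unscheduled = [
    ("LW",  "Rb", ()),           ("LW",  "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("SW",  "a",  ("Ra",)),
    ("LW",  "Re", ()),           ("LW",  "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW",  "d",  ("Rd",)),
]
for instr in fill_load_delay_slots(unscheduled):
    print(instr)      # LW Re is hoisted between LW Rc and ADD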

CONTROL HAZARDS
Branch Instructions

- Branch target address is not known until the branch instruction is completed

  Branch instruction   FI DA FO EX
  Next instruction                 FI DA FO EX
                       (target address available after EX)

- Stall -> waste of cycle times

Dealing with Control Hazards

* Prefetch Target Instruction
* Branch Target Buffer
* Loop Buffer
* Branch Prediction
* Delayed Branch
CONTROL HAZARDS
Prefetch Target Instruction
• Fetch instructions in both streams, branch not taken and branch taken
• Both are saved until the branch is executed. Then, select the right instruction stream and discard the wrong stream

Branch Target Buffer (BTB; Associative Memory)
• Entry: address of previously executed branches; target instruction and the next few instructions
• When fetching an instruction, search the BTB
• If found, fetch the instruction stream in the BTB
• If not, fetch a new stream and update the BTB

Loop Buffer (High-Speed Register File)
• Storage of an entire loop that allows executing the loop without accessing memory

Branch Prediction
• Guessing the branch condition, and fetching an instruction stream based on the guess. A correct guess eliminates the branch penalty

Delayed Branch
• The compiler detects the branch and rearranges the instruction sequence by inserting useful instructions that keep the pipeline busy in the presence of a branch instruction

RISC PIPELINE
RISC
- Machine with a very fast clock cycle that executes at the rate of one instruction per cycle
  <- Simple instruction set
     Fixed-length instruction format
     Register-to-register operations

Instruction Cycles of the Three-Stage Instruction Pipeline

Data Manipulation Instructions
  I: Instruction fetch
  A: Decode, read registers, ALU operation
  E: Write a register

Load and Store Instructions
  I: Instruction fetch
  A: Decode, evaluate effective address
  E: Register-to-memory or memory-to-register

Program Control Instructions
  I: Instruction fetch
  A: Decode, evaluate branch address
  E: Write register (PC)
DELAYED LOAD
LOAD:  R1 <- M[address 1]
LOAD:  R2 <- M[address 2]
ADD:   R3 <- R1 + R2
STORE: M[address 3] <- R3

Three-segment pipeline timing

Pipeline timing with data conflict:

clock cycle    1  2  3  4  5  6
Load R1        I  A  E
Load R2           I  A  E
Add R1+R2            I  A  E
Store R3                I  A  E

Pipeline timing with delayed load:

clock cycle    1  2  3  4  5  6  7
Load R1        I  A  E
Load R2           I  A  E
NOP                  I  A  E
Add R1+R2               I  A  E
Store R3                   I  A  E

The data dependency is taken care of by the compiler rather than the hardware.
DELAYED BRANCH
Compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps
Using no-operation instructions
Clock cycles: 1 2 3 4 5 6 7 8 9 10
1. Load I A E
2. Increment I A E
3. Add I A E
4. Subtract I A E
5. Branch to X I A E
6. NOP I A E
7. NOP I A E
8. Instr. in X I A E

Rearranging the instructions


Clock cycles: 1 2 3 4 5 6 7 8
1. Load I A E
2. Increment I A E
3. Branch to X I A E
4. Add I A E
5. Subtract I A E
6. Instr. in X I A E

VECTOR PROCESSING
Vector Processing Applications
• Problems that can be efficiently formulated in terms of vectors
• Long-range weather forecasting
• Petroleum explorations
• Seismic data analysis
• Medical diagnosis
• Aerodynamics and space flight simulations
• Artificial intelligence and expert systems
• Mapping the human genome
• Image processing

Vector Processor (computer)
  Ability to process vectors, and related data structures such as matrices and multi-dimensional arrays, much faster than conventional computers

Vector processors may also be pipelined

VECTOR PROGRAMMING

DO 20 I = 1, 100
20 C(I) = B(I) + A(I)

Conventional computer:

   Initialize I = 0
20 Read A(I)
   Read B(I)
   Store C(I) = A(I) + B(I)
   Increment I = I + 1
   If I <= 100 goto 20

Vector computer

C(1:100) = A(1:100) + B(1:100)
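The same contrast in executable form, using NumPy as a stand-in for a vector instruction set (a sketch, not part of the original notes):

import numpy as np

A = np.arange(100, dtype=np.float64)
B = np.arange(100, dtype=np.float64)

# Conventional computer: explicit scalar loop over the elements.
C_loop = np.empty(100)
for i in range(100):
    C_loop[i] = A[i] + B[i]

# Vector computer: one statement operates on all 100 elements at once.
C_vec = A + B

assert np.array_equal(C_loop, C_vec)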

VECTOR INSTRUCTIONS
f1: V -> V        f2: V -> S        V: Vector operand
f3: V x V -> V    f4: V x S -> V    S: Scalar operand

Type  Mnemonic  Description           (I = 1, ..., n)
f1    VSQR      Vector square root    B(I) <- SQR(A(I))
      VSIN      Vector sine           B(I) <- sin(A(I))
      VCOM      Vector complement     A(I) <- NOT A(I)
f2    VSUM      Vector summation      S <- sum of A(I)
      VMAX      Vector maximum        S <- max{A(I)}
f3    VADD      Vector add            C(I) <- A(I) + B(I)
      VMPY      Vector multiply       C(I) <- A(I) * B(I)
      VAND      Vector AND            C(I) <- A(I) AND B(I)
      VLAR      Vector larger         C(I) <- max(A(I), B(I))
      VTGE      Vector test >=        C(I) <- 0 if A(I) < B(I); 1 if A(I) >= B(I)
f4    SADD      Vector-scalar add     B(I) <- S + A(I)
      SDIV      Vector-scalar divide  B(I) <- A(I) / S

VECTOR INSTRUCTION FORMAT

Vector Instruction Format:

[ Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length ]

Pipeline for Inner Product:

[Diagram: Source A and Source B feed a multiplier pipeline; its products feed an adder pipeline that accumulates the inner product.]

MULTIPLE MEMORY MODULE AND INTERLEAVING

Multiple Module Memory

[Diagram: an address bus and a data bus connect four memory modules M0..M3; each module has its own address register (AR), memory array, and data register (DR).]

Address Interleaving

Different sets of addresses are assigned to different memory modules
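A minimal sketch of one common mapping, low-order interleaving (assumed here; the slide does not fix the scheme): consecutive addresses fall in consecutive modules, so a vector streaming through memory keeps all modules busy.

M = 4  # number of memory modules

def interleave(addr, m=M):
    """Map an address to (module number, word within the module)."""
    return addr % m, addr // m

for addr in range(8):
    module, word = interleave(addr)
    print(f"address {addr} -> module {module}, word {word}")
# Addresses 0,1,2,3 hit modules 0,1,2,3; address 4 wraps back to module 0.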
MULTIPROCESSORS

• Characteristics of Multiprocessors

• Interconnection Structures

• Interprocessor Arbitration

• Interprocessor Communication
and Synchronization

• Cache Coherence

TERMINOLOGY

Parallel Computing
  Simultaneous use of multiple processors, all components of a single architecture, to solve a task. Typically the processors are identical and single-user (even if the machine is multiuser).

Distributed Computing
  Use of a network of processors, each capable of being viewed as a computer in its own right, to solve a problem. Processors may be heterogeneous and multiuser; usually an individual task is assigned to a single processor.

Concurrent Computing
  All of the above?

TERMINOLOGY

Supercomputing
Use of fastest, biggest machines to solve big, computationally
intensive problems. Historically machines were vector computers,
but parallel/vector or parallel becoming the norm

Pipelining
Breaking a task into steps performed by different units, and multiple
inputs stream through the units, with next input starting in a unit when
previous input done with the unit but not necessarily done with the task

Vector Computing
Use of vector processors, where operation such as multiply
broken into several steps, and is applied to a stream of operands
(“vectors”). Most common special case of pipelining

Systolic
Similar to pipelining, but units are not necessarily arranged linearly,
steps are typically small and more numerous, performed in lockstep
fashion. Often used in special-purpose hardware such as image or signal
processors
SPEEDUP AND EFFICIENCY
A: Given problem

T*(n): Time of the best sequential algorithm to solve an instance of A of size n on 1 processor
Tp(n): Time needed by a given parallel algorithm and given parallel architecture to solve an instance of A of size n, using p processors

Note: T*(n) <= T1(n)

Speedup:    T*(n) / Tp(n)
Efficiency: T*(n) / [p Tp(n)]

[Plot: speedup vs. number of processors (1..10); perfect speedup is the line speedup = p]

Speedup should be between 0 and p, and efficiency should be between 0 and 1.
Speedup is linear if there is a constant c > 0 so that speedup is always at least cp.
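A short Python sketch computing speedup and efficiency from the definitions above (the timing numbers are invented):

def speedup(t_best_serial, t_parallel):
    return t_best_serial / t_parallel

def efficiency(t_best_serial, t_parallel, p):
    return t_best_serial / (p * t_parallel)

t_star = 100.0                                     # T*(n), seconds
timings = {1: 100.0, 2: 55.0, 4: 30.0, 8: 18.0}    # Tp(n) per processor count
for p, tp in timings.items():
    print(f"p={p}: speedup={speedup(t_star, tp):.2f}, "
          f"efficiency={efficiency(t_star, tp, p):.2f}")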
AMDAHL'S LAW

Given a program,
f: fraction of time that represents operations that must be performed serially

Maximum possible speedup S:

    S <= 1 / (f + (1 - f)/p),   with p processors
    S < 1/f,                    with an unlimited number of processors

- Ignores the possibility of a new algorithm, with a much smaller f
- Ignores the possibility that more of the program is run from higher-speed memory such as registers, cache, or main memory
- Often the problem is scaled with the number of processors, and f is a function of size that may be decreasing (serial code may take a constant amount of time, independent of size)
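The bound is easy to evaluate; a short sketch using f = 0.10, the example used later in these notes:

def amdahl_speedup(f, p):
    """Upper bound on speedup with serial fraction f and p processors."""
    return 1.0 / (f + (1.0 - f) / p)

for p in (2, 8, 64, 1024):
    print(f"p={p:5d}: S <= {amdahl_speedup(0.10, p):.2f}")
# As p -> infinity, S approaches 1/f = 10.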
FLYNN’s HARDWARE TAXONOMY
I: Instruction Stream    D: Data Stream
Classification: [S|M]I [S|M]D
SI: Single Instruction Stream
- All processors are executing the same instruction in the same cycle
- Instruction may be conditional
- For Multiple processors, the control processor issues an instruction
MI: Multiple Instruction Stream
- Different processors may be simultaneously
executing different instructions
SD: Single Data Stream
- All of the processors are operating on the same
data items at any given time
MD: Multiple Data Stream
- Different processors may be simultaneously
operating on different data items

SISD: standard serial computer
MISD: very rare
MIMD and SIMD: parallel processing computers
COUPLING OF PROCESSORS

Tightly Coupled System
- Tasks and/or processors communicate in a highly synchronized fashion
- Communicate through a common shared memory
- Shared memory system

Loosely Coupled System
- Tasks or processors do not communicate in a synchronized fashion
- Communicate by message-passing packets
- Overhead for data exchange is high
- Distributed memory system

GRANULARITY OF PARALLELISM
Coarse-grain
- A task is broken into a handful of pieces, each of which is executed by a powerful processor
- Processors may be heterogeneous
- Computation/communication ratio is very high

Medium-grain
- Tens to a few thousand pieces
- Processors typically run the same code
- Computation/communication ratio is often hundreds or more

Fine-grain
- Thousands to perhaps millions of small pieces, executed by very small, simple processors or through pipelines
- Processors typically have instructions broadcast to them
- Computation/communication ratio often near unity
MEMORY
Shared (Global) Memory
- A Global Memory Space accessible by all processors
- Processors may also have some local memory
Distributed (Local, Message-Passing) Memory
- All memory units are associated with processors
- To retrieve information from another processor's
memory a message must be sent there
Uniform Memory
- All processors take the same time to reach all memory locations
Nonuniform (NUMA) Memory
- Memory access is not uniform

[Diagrams: SHARED MEMORY - processors connect through a network to a common memory; DISTRIBUTED MEMORY - processor/memory nodes connect through a network.]
SHARED MEMORY MULTIPROCESSORS
[Diagram: memory modules M connect through an interconnection network (buses, multistage IN, or crossbar switch) to processors P.]

Characteristics
  All processors have equally direct access to one large memory address space

Example systems
- Bus and cache-based systems: Sequent Balance, Encore Multimax
- Multistage IN-based systems: Ultracomputer, Butterfly, RP3, HEP
- Crossbar switch-based systems: C.mmp, Alliant FX/8

Limitations
  Memory access latency; hot spot problem
MESSAGE-PASSING MULTIPROCESSORS
[Diagram: processors P, each with a local memory M, connected by point-to-point links through a message-passing network.]
Characteristics
- Interconnected computers
- Each processor has its own memory and communicates via message passing

Example systems
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III

Limitations
- Communication overhead; hard to program
INTERCONNECTION STRUCTURES

* Time-Shared Common Bus
* Multiport Memory
* Crossbar Switch
* Multistage Switching Network
* Hypercube System

Bus
  All processors (and memory) are connected to a common bus or busses
  - Memory access is fairly uniform, but not very scalable
BUS
- A collection of signal lines that carry module-to-module communication
- Data highways connecting several digital system elements

Operations of Bus

[Diagram: devices M3, S7, M6, S5, S2, M4 attached to a common bus]

Example: M3 wishes to communicate with S5
[1] M3 sends signals (address) on the bus that cause S5 to respond
[2] M3 sends data to S5, or S5 sends data to M3 (determined by the command line)

Master Device: device that initiates and controls the communication
Slave Device: responding device

Multiple-master buses
-> bus conflict
-> need bus arbitration
SYSTEM BUS STRUCTURE FOR MULTIPROCESSORS

[Diagram: each node has a local bus connecting a CPU, an IOP, and local memory through a system bus controller; the system bus controllers of all nodes, plus a common shared memory, attach to the SYSTEM BUS.]
MULTIPORT MEMORY

Multiport Memory Module
- Each port serves a CPU

Memory Module Control Logic
- Each memory module has control logic
- Resolves memory module conflicts with a fixed priority among CPUs

Advantages
- Multiple paths -> high transfer rate

Disadvantages
- Memory control logic
- Large number of cables and connections

[Diagram: CPUs 1-4 each have a dedicated port into each of the memory modules MM1-MM4.]
CROSSBAR SWITCH
[Diagram: a grid of crosspoint switches connects CPUs 1-4 (rows) to memory modules MM1-MM4 (columns).]

Block Diagram of a Crossbar Switch crosspoint:

[Diagram: each memory module has multiplexers and arbitration logic; the data, address, and control lines (R/W, memory enable) from each of CPUs 1-4 enter the multiplexers, and the arbitration logic selects which CPU's request reaches the module.]
MULTISTAGE SWITCHING NETWORK

Interstage Switch (2 x 2)

[Diagram: a 2 x 2 switch with inputs A and B and outputs 0 and 1 has four settings: A connected to 0, A connected to 1, B connected to 0, B connected to 1.]

MULTISTAGE INTERCONNECTION NETWORK
Binary Tree with 2 x 2 Switches

[Diagram: inputs P1 and P2 route through three levels of 2 x 2 switches to eight outputs labeled 000-111; each switch setting selects one bit of the destination address.]

8 x 8 Omega Switching Network

[Diagram: inputs 0-7 pass through three stages of 2 x 2 switches; between stages the wiring follows the perfect-shuffle pattern, and the outputs are labeled 000-111.]
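A minimal sketch of destination-tag routing through such an omega network: each stage performs a perfect shuffle (a left rotate of the position bits), and a 2 x 2 switch then sets the low bit from the destination address, MSB first. The function and its interface are invented for illustration.

def omega_route(src, dst, n):
    """Positions visited by a message in an N = 2**n omega network."""
    mask = (1 << n) - 1
    pos, path = src, [src]
    for stage in range(n):
        pos = ((pos << 1) | (pos >> (n - 1))) & mask   # perfect shuffle
        bit = (dst >> (n - 1 - stage)) & 1             # destination-tag bit
        pos = (pos & ~1) | bit                         # switch output 0 or 1
        path.append(pos)
    return path

path = omega_route(0b000, 0b101, 3)
print([format(p, "03b") for p in path])   # ['000', '001', '010', '101']
assert path[-1] == 0b101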

HYPERCUBE INTERCONNECTION

n-dimensional hypercube (binary n-cube)

- p = 2^n
- Processors are conceptually on the corners of an n-dimensional hypercube, and each is directly connected to the n neighboring nodes
- Degree = n

[Diagram: one-cube (nodes 0, 1), two-cube (nodes 00, 01, 10, 11), and three-cube (nodes 000-111); nodes whose labels differ in one bit are connected.]

INTERPROCESSOR ARBITRATION

Bus
  Board-level bus
  Backplane-level bus
  Interface-level bus

System Bus - A Backplane-Level Bus
- Printed circuit board
- Connects CPU, IOP, and memory
- Each CPU, IOP, and memory board can be plugged into a slot in the backplane (system bus)
- Bus signals are grouped into three groups (data, address, and control, plus power), e.g. the IEEE standard 796 bus, 86 lines:
      Data: 16 (multiple of 8)
      Address: 24
      Control: 26
      Power: 20
- Only one of the CPU, IOP, and memory boards can be granted use of the bus at a time
- An arbitration mechanism is needed to handle multiple requests
SYNCHRONOUS & ASYNCHRONOUS DATA TRANSFER

Synchronous Bus
  Each data item is transferred over a time slice known to both source and destination unit
  - Common clock source
  - Or separate clocks, with a synchronization signal transmitted periodically to synchronize the clocks in the system

Asynchronous Bus
  * Each data item is transferred by a handshake mechanism
    - The unit that transmits the data sends a control signal that indicates the presence of data
    - The unit receiving the data responds with another control signal to acknowledge receipt of the data
  * Strobe pulse - supplied by one of the units to indicate to the other unit when the data transfer has to occur
BUS SIGNALS
Bus signal allocation:
- address
- data
- control
- arbitration
- interrupt
- timing
- power, ground

IEEE Standard 796 Multibus Signals


Data and address
Data lines (16 lines) DATA0 - DATA15
Address lines (24 lines) ADRS0 - ADRS23
Data transfer
Memory read MRDC
Memory write MWTC
IO read IORC
IO write IOWC
Transfer acknowledge TACK (XACK)
Interrupt control
Interrupt request INT0 - INT7
Interrupt acknowledge INTA

BUS SIGNALS

IEEE Standard 796 Multibus Signals (Cont’d)

Miscellaneous control
Master clock CCLK
System initialization INIT
Byte high enable BHEN
Memory inhibit (2 lines) INH1 - INH2
Bus lock LOCK
Bus arbitration
Bus request BREQ
Common bus request CBRQ
Bus busy BUSY
Bus clock BCLK
Bus priority in BPRN
Bus priority out BPRO
Power and ground (20 lines)

INTERPROCESSOR ARBITRATION - STATIC ARBITRATION

Serial Arbitration Procedure

[Diagram: bus arbiters 1-4 are daisy-chained; the highest-priority arbiter's PI (priority in) line is tied to 1, each arbiter's PO (priority out) feeds the next arbiter's PI, and all arbiters share the bus busy line.]

Parallel Arbitration Procedure

[Diagram: bus arbiters 1-4 each raise a request (Req) into a 4 x 2 priority encoder; the encoder output drives a 2 x 4 decoder whose outputs acknowledge (Ack) exactly one arbiter; all arbiters share the bus busy line.]
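A minimal sketch of the parallel arbitration datapath: the priority encoder picks the highest-priority requester, and the decoder turns that index back into a one-hot acknowledge (a fixed priority ordering is assumed, arbiter 1 highest).

def priority_encode(requests):
    """requests: 4 bits, index 0 = highest priority; returns winner or None."""
    for i, r in enumerate(requests):
        if r:
            return i
    return None

def decode(index):
    """2 x 4 decoder: one-hot acknowledge lines."""
    return [1 if i == index else 0 for i in range(4)]

req = [0, 1, 0, 1]                    # arbiters 2 and 4 request the bus
print(decode(priority_encode(req)))   # [0, 1, 0, 0] -> arbiter 2 gets Ack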

INTERPROCESSOR ARBITRATION - DYNAMIC ARBITRATION

Priorities of the units can be dynamically changed while the system is in operation

Time Slice
  A fixed-length time slice is given sequentially to each processor, in round-robin fashion

Polling
  Unit address polling - the bus controller advances the address to identify the requesting unit

LRU

FIFO

Rotating Daisy Chain
  Conventional daisy chain - highest priority to the unit nearest the bus controller
  Rotating daisy chain - highest priority to the unit nearest the unit that has most recently accessed the bus
INTERPROCESSOR COMMUNICATION
Interprocessor Communication via Shared Memory

[Diagram: the sending processor writes a message into a communication area in shared memory and marks the intended receiver(s); each receiving processor polls the communication area for messages addressed to it.]

[Diagram variant with interrupt: in addition to writing the message, the sending processor issues an interrupt instruction that signals the receiving processor directly.]
INTERPROCESSOR SYNCHRONIZATION
Synchronization
  Communication of control information between processors
  - To enforce the correct sequence of processes
  - To ensure mutually exclusive access to shared writable data

Hardware Implementation: Mutual Exclusion with a Semaphore

Mutual Exclusion
- One processor excludes or locks out access to a shared resource by other processors when it is in a critical section
- A critical section is a program sequence that, once begun, must complete execution before another processor accesses the same shared resource

Semaphore
- A binary variable
- 1: a processor is executing a critical section, which is not available to other processors
  0: available to any requesting processor
- A software-controlled flag that is stored in a memory location that all processors can access
SEMAPHORE
Testing and Setting the Semaphore

- If two or more processors test and set the same semaphore at the same time, two or more processors may enter the same critical section simultaneously
- Testing and setting must therefore be implemented as an indivisible operation:

      R <- M[SEM]    / Test semaphore /
      M[SEM] <- 1    / Set semaphore /

  These are done while the bus is locked, so that other processors cannot test and set while the current processor is executing these instructions

- If R = 1, another processor is executing the critical section, and the processor that executed this instruction does not access the shared memory
- If R = 0, the semaphore was available; the processor sets it to 1 and accesses the shared memory

The last instruction in the program must clear the semaphore
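A minimal Python sketch of this protocol; threading.Lock stands in for the hardware's locked (indivisible) read-modify-write bus cycle:

import threading

class BinarySemaphore:
    def __init__(self):
        self.sem = 0                      # M[SEM]
        self._bus = threading.Lock()      # models the locked bus cycle

    def test_and_set(self):
        """Indivisibly read the old value and set the semaphore to 1."""
        with self._bus:
            r, self.sem = self.sem, 1
            return r

    def clear(self):
        self.sem = 0                      # last instruction of the program

counter = 0
sem = BinarySemaphore()

def worker():
    global counter
    for _ in range(10_000):
        while sem.test_and_set() == 1:    # R = 1: someone is in the section
            pass
        counter += 1                      # critical section
        sem.clear()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                            # 40000: no lost updates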
CACHE COHERENCE
Caches are coherent:

[Diagram: main memory holds X = 52; the caches of P1, P2, and P3 all hold X = 52.]

Cache incoherency with a write-through policy:

[Diagram: P1 writes X = 120, updating its cache and main memory, but the caches of P2 and P3 still hold the stale value X = 52.]

Cache incoherency with a write-back policy:

[Diagram: P1 writes X = 120 only in its own cache; main memory and the caches of P2 and P3 still hold X = 52.]
MAINTAINING CACHE COHERENCY
Shared Cache
- Disallow private cache
- Access time delay

Software Approaches
* Read-Only Data are Cacheable
- Private Cache is for Read-Only data
- Shared Writable Data are not cacheable
- Compiler tags data as cacheable and noncacheable
- Degrade performance due to software overhead

* Centralized Global Table


- Status of each memory block is maintained in CGT: RO(Read-Only); RW(Read and Write)
- All caches can have copies of RO blocks
- Only one cache can have a copy of RW block

Hardware Approaches
* Snoopy Cache Controller
  - Cache controllers monitor all the bus requests from CPUs and IOPs
  - All caches attached to the bus monitor the write operations
  - When a word in a cache is written, memory is also updated (write-through)
  - Local snoopy controllers in all other caches check their memory to determine if they have a copy of that word; if they have, that location is marked invalid (a future reference to this location causes a cache miss)
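A minimal sketch of write-through snoopy invalidation (class names invented; a real controller works on cache lines and bus transactions):

class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}                   # address -> value (valid lines only)
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from main memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value     # write-through to memory
        for cache in self.bus.caches:     # snoop: invalidate other copies
            if cache is not self:
                cache.lines.pop(addr, None)

class Bus:
    def __init__(self):
        self.caches, self.memory = [], {"X": 52}

bus = Bus()
p1, p2 = SnoopyCache(bus), SnoopyCache(bus)
p1.read("X"); p2.read("X")                # both caches hold X = 52
p1.write("X", 120)                        # P2's copy is marked invalid
print(p2.read("X"))                       # 120: the miss refetches from memory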
PARALLEL COMPUTING
Grosch's Law

Grosch's Law states that the speed of computers is proportional to the square of their cost. Thus if you are looking for a fast computer, you are better off spending your money buying one large computer than two small computers and connecting them.

Grosch's Law is true within classes of computers, but not true between classes. Computers may be priced according to Grosch's Law, but the Law cannot be true asymptotically.

Minsky's Conjecture

Minsky's conjecture states that the speedup achievable by a parallel computer increases as the logarithm of the number of processing elements, thus making large-scale parallelism unproductive.

Many experimental results have shown linear speedup for over 100 processors.
PARALLEL COMPUTING
History

History tells us that the speed of traditional single-CPU computers has increased tenfold every 5 years. Why should great effort be expended to devise a parallel computer that will perform tasks 10 times faster when, by the time the new architecture is developed and implemented, single-CPU computers will be just as fast?

Utilizing parallelism is better than waiting.

Amdahl's Law

A small number of sequential operations can effectively limit the speedup of a parallel algorithm.

Let f be the fraction of operations in a computation that must be performed sequentially, where 0 < f < 1. Then the maximum speedup S achievable by a parallel computer with p processors performing the computation is S <= 1 / [f + (1 - f)/p]. For example, if 10% of the computation must be performed sequentially, then the maximum speedup achievable is 10, no matter how many processors a parallel computer has.

There exist some parallel algorithms with almost no sequential operations. As the problem size n increases, f becomes smaller (f -> 0 as n -> infinity); in this case, lim S = p.
PARALLEL COMPUTING

Pipelined Computers are Sufficient

Most supercomputers are vector computers, and most of the successes attributed to supercomputers have been accomplished on pipelined vector processors, especially the Cray-1 and Cyber-205.

If only vector operations can be executed at high speed, supercomputers will not be able to tackle a large number of important problems. The latest supercomputers incorporate both pipelining and high-level parallelism (e.g., Cray-2).

Software Inertia

Billions of dollars worth of FORTRAN software exists. Who will rewrite it? Virtually no programmers have any experience with a machine other than a single-CPU computer. Who will retrain them?

INTERCONNECTION NETWORKS

Switching Network (Dynamic Network)

Processors (and memory) are connected to routing switches, as in a telephone system
- Switches might have queues (combining logic), which improve functionality but increase latency
- Switch settings may be determined by message headers or preset by a controller
- Connections can be packet-switched or circuit-switched (remain connected as long as needed)
- Usually NUMA, blocking, often scalable and upgradable

Point-to-Point (Static Network)

Processors are directly connected to only certain other processors and must go multiple hops to reach additional processors
- Usually distributed memory
- Hardware may handle only single hops, or multiple hops
- Software may mask hardware limitations
- Latency is related to graph diameter, among many other factors
- Usually NUMA, nonblocking, scalable, upgradable
INTERCONNECTION NETWORKS

[Diagram: a multistage interconnect of switches linking processors, contrasted with a single shared bus.]
INTERCONNECTION NETWORKS

Static Topology - Direct Connection
- Provides a direct inter-processor communication path
- Usually for distributed-memory multiprocessors

Dynamic Topology - Indirect Connection
- Provides a physically separate switching network for inter-processor communication
- Usually for shared-memory multiprocessors

Direct Connection
  Interconnection network: a graph G(V, E)
    V: a set of processors (nodes)
    E: a set of wires (edges)
  Performance measures: degree, diameter, etc.
INTERCONNECTION NETWORKS
Complete Connection

- Every processor is directly connected to every other processor
- Diameter = 1, Degree = p - 1
- Number of wires = p(p - 1)/2; the dominant cost
- Fan-in/fan-out limitation makes it impractical for large p
- Interesting as a theoretical model, because algorithm bounds for this model are automatically lower bounds for all direct-connection machines

Ring

- Degree = 2 (not a function of p)
- Diameter = floor(p/2)
INTERCONNECTION NETWORKS

• 2-Mesh (m x m, with m^2 = p)

[Diagram: an m x m grid of nodes]

- Degree = 4
- Diameter = 2(m - 1)
- In general, an n-dimensional mesh has diameter = n(p^(1/n) - 1)
- Diameter can be halved by having wrap-around connections (-> Torus)
- A ring is a 1-dimensional mesh with a wrap-around connection
INTERCONNECTION NETWORK

Binary Tree

- Degree = 3
- Diameter = 2 log2((p + 1)/2)
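The degree/diameter formulas above can be tabulated with a short sketch (formulas as given in these notes):

import math

def metrics(topology, p):
    if topology == "complete":
        return p - 1, 1
    if topology == "ring":
        return 2, p // 2
    if topology == "mesh2d":                  # p = m * m
        m = math.isqrt(p)
        return 4, 2 * (m - 1)
    if topology == "binary_tree":             # p = 2^k - 1 nodes
        return 3, 2 * int(math.log2((p + 1) // 2))
    if topology == "hypercube":               # p = 2^n
        n = int(math.log2(p))
        return n, n
    raise ValueError(topology)

for topo, p in [("complete", 64), ("ring", 64), ("mesh2d", 64),
                ("binary_tree", 63), ("hypercube", 64)]:
    degree, diameter = metrics(topo, p)
    print(f"{topo:12s} p={p:3d}  degree={degree:2d}  diameter={diameter}")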

MIN SPACE

MIN (Multistage Interconnection Network) taxonomy:

Banyan network (= unique-path network)
  Delta network [Patel81]
    • Baseline [Wu80]
    • Flip [Batcher76]
    • Indirect binary n-cube [Pease77]
    • Omega [Lawrie75]
    • Regular SW banyan [Goke73]

Multiple-Path Network
  PM2I network
    • Data Manipulator [Feng74]
    • Augmented DM [Siegel78]
    • Inverse ADM [Siegel79]
    • Gamma [Parker84]
  • Extra-stage Cube [Adams82]
  • Replicated/Dilated Delta network [Kruskal83]
  • B-delta [Yoon88]

Permutation/Sorting Network (N!)
  • Clos network [53]
  • Benes network [62]
  • Batcher sorting network [68]
SOME CURRENT PARALLEL COMPUTERS
DM-SIMD
• AMT DAP
• Goodyear MPP
• Thinking Machines CM series
• MasPar MP1
• IBM GF11

SM-MIMD
• Alliant FX
• BBN Butterfly
• Encore Multimax
• Sequent Balance/Symmetry
• CRAY 2, X-MP, Y-MP
• IBM RP3
• U. Illinois CEDAR

DM-MIMD
• Intel iPSC series, Delta machine
• NCUBE series
• Meiko Computing Surface
• Carnegie-Mellon/ Intel iWarp
