COMPUTER ARCHITECTURE

PARALLEL COMPUTING
Parallel computing refers to the process of executing several processes, applications, or computations
simultaneously. Generally, it is a kind of computing architecture in which large problems are broken into
independent, smaller, usually similar parts that can be processed at the same time. This is done by multiple
CPUs communicating via shared memory, with results combined upon completion. It helps in
performing large computations because it divides a large problem among more than one processor.
A problem is broken into discrete parts that can be solved concurrently. Each part is further broken
down into a series of instructions, and instructions from each part execute simultaneously on different
CPUs.
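As a hedged illustration of this decomposition, the sketch below splits a summation into chunks and processes them on multiple CPUs with Python's multiprocessing module; the chunk size and worker count are arbitrary choices made for the example, not part of these notes.

```python
# Minimal sketch: divide a large summation into independent chunks and
# process the chunks in parallel, then combine the partial results.
from multiprocessing import Pool

def partial_sum(chunk):
    """Work done independently on one part of the problem."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunk_size = 250_000                      # arbitrary split for illustration
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    with Pool(processes=4) as pool:           # 4 workers is an assumption
        partials = pool.map(partial_sum, chunks)   # parts solved concurrently

    print(sum(partials))                      # results combined upon completion
```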
Parallel Computers: Virtually all stand-alone computers today are parallel from a hardware
perspective:
 Multiple functional units (floating point, integer, GPU, etc.)
 Multiple execution units / cores
 Multiple hardware threads
Networks connect multiple stand-alone computers (nodes) to create larger parallel computer clusters:
 Each compute node is a multi-processor parallel computer in itself
 Multiple compute nodes are networked together with an InfiniBand network
 Special purpose nodes, also multi-processor, are used for other purposes

Serial Computing: Serial computing executes instructions sequentially on a single processor,
while parallel computing divides problems into smaller units processed concurrently on multiple
resources.
Distributed Computing
In distributed computing we have multiple autonomous computers which appear to the user as a single
system. In distributed systems there is no shared memory, and computers communicate with each
other through message passing. In distributed computing a single task is divided among different
computers.
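To make the message-passing idea concrete, here is a hedged sketch using Python's multiprocessing.Pipe so that two processes cooperate without any shared memory; the worker function and message contents are illustrative assumptions.

```python
# Minimal sketch: two processes with no shared memory cooperating
# purely by exchanging messages over a pipe.
from multiprocessing import Process, Pipe

def worker(conn):
    task = conn.recv()              # receive a message describing the task
    result = sum(task)              # do the assigned part of the work
    conn.send(result)               # send the result back as a message
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(list(range(100)))   # hand the subtask over via a message
    print(parent_conn.recv())            # 4950, returned via message passing
    p.join()
```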

Difference between Parallel Computing and Distributed Computing

S/N | Parallel Computing | Distributed Computing
1. | Many operations are performed simultaneously. | System components are located at different locations.
2. | A single computer is required. | Multiple computers are used.
3. | Multiple processors perform multiple operations. | Multiple computers perform multiple operations.
4. | It may have shared or distributed memory. | It has only distributed memory.
5. | Processors communicate with each other through a bus. | Computers communicate with each other through message passing.
6. | Improves system performance. | Improves system scalability, fault tolerance and resource sharing capabilities.

 Types of parallel computing (parallelism)


There are generally three types of parallel computing available, which are discussed below:
1. Bit-level parallelism: The form of parallel computing in which the amount of work done per
instruction depends on the processor word size. When performing a task on large-sized data, it reduces
the number of instructions the processor must execute; otherwise the operation has to be split into a
series of instructions. For example, suppose an 8-bit processor must perform an operation on 16-bit
numbers. It must first operate on the 8 lower-order bits and then on the 8 higher-order bits, so two
instructions are needed to execute the operation, whereas a 16-bit processor can perform the operation
with a single instruction (a short sketch of this decomposition follows after this list).

2. Instruction-level parallelism: Instruction-level parallelism determines how many instructions the
processor executes at the same time within a single CPU clock cycle. Without it, a processor can issue
at most one instruction in each clock cycle phase. The software approach to instruction-level
parallelism relies on static parallelism, where the compiler decides which instructions to execute
simultaneously.

3. Task Parallelism: Task parallelism is the form of parallelism in which tasks are decomposed
into subtasks. Each subtask is then allocated for execution, and the subtasks are executed
concurrently by different processors.
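Referring back to the bit-level example above, the hedged sketch below emulates a 16-bit addition on a machine limited to 8-bit arithmetic by operating on the lower and upper bytes separately; the helper name add16_on_8bit is purely illustrative.

```python
# Minimal sketch: a 16-bit addition carried out as two 8-bit operations,
# the way an 8-bit processor would need two instructions for it.
def add16_on_8bit(a: int, b: int) -> int:
    lo = (a & 0xFF) + (b & 0xFF)                            # first step: low bytes
    carry = lo >> 8
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry      # second step: high bytes + carry
    return ((hi & 0xFF) << 8) | (lo & 0xFF)                 # wraps around like 16-bit hardware

print(hex(add16_on_8bit(0x12F0, 0x0420)))  # 0x1710, same as a single 16-bit add
```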

INSTRUCTION-LEVEL PARALLELISM (ILP)


 ILP is a computer architecture concept that allows multiple instructions in a computer
program to be executed simultaneously.
 ILP is a technique used in computer architecture to increase the speed of execution by
overlapping individual machine operations. This means that multiple operations, such
as addition, subtraction, multiplication, and division, can be executed in parallel.
 ILP improves performance without requiring changes to the base code.
 ILP is achieved by keeping different functional units busy with different parts of
instructions, or by providing multiple functional units for the same operation. Some
techniques used to achieve ILP include: Pipelining, Superscalar execution, Out-of-order
execution, Instruction reordering, Speculation
ILP can be implemented in two ways, namely:
1. Hardware (Dynamic Parallelism): In this approach, the processor determines which
instructions can be executed in parallel. It is defined by machine architecture and hardware
multiplicity, focusing on the number of instructions issued per machine cycle. Modern
processors can issue two or more instructions per cycle, enabling simultaneous execution of
operations like arithmetic, memory access, and branching.
2. Software (Static Parallelism): The compiler decides which instructions to
execute in parallel. It is defined by the control and data dependency of the program,
influenced by algorithms, programming style, and compiler optimization.

Applications of Parallel Computing


There are various applications of Parallel Computing, which are as follows:
 One of the primary applications of parallel computing is Databases and Data mining.
 The real-time simulation of systems is another use of parallel computing.
 Technologies such as networked video and multimedia.
 Science and Engineering.
 Collaborative work environments.
 The concept of parallel computing is used by augmented reality, advanced graphics, and
virtual reality.
FLYNN’S CLASSIFICATION OF PARALLEL COMPUTERS
Flynn's classification is a way to categorize parallel computer architectures based on the concurrency
of instructions, data, or processing sequences:

1. Single Instruction Stream and Single Data Stream (SISD)


It depicts the structure of a single computer, which includes a control unit, a memory unit, and a
processor unit. This system may or may not consist of internal parallel processing capability;
therefore, instructions are performed sequentially. Like classic Von-Neumann computers, most
conventional computers utilize the SISD architecture. Multiple functional units or pipeline
processing can be used to achieve parallel processing in this case. The Control Unit decodes the
instructions before sending them to the processing units for their execution. The Data Stream is a bi-
directional data stream that moves between the memory and processors. Examples are
Minicomputers, workstations, and computers from previous generations.

2. Single Instruction and Multiple Data Stream (SIMD)


It represents an organization with a large number of processing units overseen by a common control
unit. SIMD units are hardware components that perform the very same operation simultaneously on
various data operands; in Flynn’s taxonomy it is a form of parallel processing. The control unit sends
the same instruction to all processors, but they work on separate data. To connect with all of the
processors at the same time, the shared memory unit must have numerous modules. SIMD was created
with array processing devices in mind, although vector processors can also be included in this category
according to Flynn’s taxonomy, and there are architectures that are SIMD but are not vector processors.
The Connection Machine and numerous GPUs are examples of multiple processors executing the same
instructions.
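As a hedged illustration of the SIMD style (one instruction applied to many data elements), the sketch below contrasts an element-by-element loop with a NumPy vectorized operation, which typically maps onto data-parallel hardware; treat it as illustrative rather than a guarantee of how any particular CPU executes it.

```python
# Minimal sketch: the same operation applied to many data operands at once.
import numpy as np

a = np.arange(8, dtype=np.int32)          # [0, 1, 2, ..., 7]
b = np.full(8, 10, dtype=np.int32)        # [10, 10, ..., 10]

# Scalar (SISD-style) view: one add per element, in sequence.
scalar = np.empty(8, dtype=np.int32)
for i in range(8):
    scalar[i] = a[i] + b[i]

# Data-parallel (SIMD-style) view: a single vectorized add over all elements.
vectorized = a + b

assert (scalar == vectorized).all()
print(vectorized)                         # [10 11 12 13 14 15 16 17]
```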

3. Multiple Instruction and Single Data stream (MISD)

It’s a parallel computing architecture type in which multiple functional units work on the same data
at the same time. Pipeline topologies fall under this category, though purists could argue that the data
after every stage in the pipeline is different. Because no real system has been built using the MISD
structure, it is primarily of theoretical importance. Multiple processing units work on a single data
stream in MISD. Each processing unit works on the data in its own way, using its own instruction
stream.

4. Multiple Instruction and Multiple Data Stream (MIMD).

All processors of a parallel computer may execute distinct instructions and act on different data at the
same time in this organization. Each processor in MIMD has its own program, and each program
generates an instruction stream.

MIMD (multiple instruction, multiple data) is the most fundamental and well-known form of parallel
architecture and is a technique used to achieve parallelism. Both the shared memory programming
model and the distributed memory programming model are used in MIMD architectures; each model
has its own set of benefits and drawbacks.

PIPELINING

To improve the performance of a CPU we have two options:

1) Improve the hardware by introducing faster circuits.

2) Arrange the hardware such that more than one operation can be performed at the same time
(pipeline)
Since there is a limit on the speed of hardware and the cost of faster circuits is quite high, we have
to adopt the 2nd option (pipelining).

Pipelining: Pipelining is a process of arrangement of hardware elements of the CPU such that its
overall performance is increased. Simultaneous execution of more than one instruction takes
place in a pipelined processor.

Let us see a real life example that works on the concept of pipelined operation. Consider a water
bottle packaging plant. Let there be 3 stages that a bottle should pass through, Inserting the bottle
(I), Filling water in the bottle (F), and Sealing the bottle (S). Let us consider these stages as stage
1, stage 2 and stage 3 respectively. Let each stage take 1 minute to complete its operation. Now,
in a non-pipelined operation, a bottle is first inserted in the plant, after 1 minute it is moved to
stage 2 where water is filled. Now, in stage 1 nothing is happening. Similarly, when the bottle
moves to stage 3, both stage 1 and stage 2 are idle. But in pipelined operation, when the bottle is
in stage 2, another bottle can be loaded at stage 1. Similarly, when the bottle is in stage 3, there
can be one bottle each in stage 1 and stage 2. So, after each minute, we get a new bottle at the
end of stage 3. Hence, the average time taken to manufacture 1 bottle is:

Without pipelining = 9/3 minutes = 3m

Bottle 1:  I F S . . . . . .
Bottle 2:  . . . I F S . . .
Bottle 3:  . . . . . . I F S   (9 minutes)

With pipelining = 5/3 minutes = 1.67m

Bottle 1:  I F S . .
Bottle 2:  . I F S .
Bottle 3:  . . I F S   (5 minutes)
Thus, pipelined operation increases the efficiency of a system.

The Design of a basic pipeline

In a pipelined processor, a pipeline has two ends, the input end and the output end. Between
these ends, there are multiple stages/segments such that output of one stage is connected to
input of next stage and each stage performs a specific operation.
Interface registers are used to hold the intermediate output between two stages.
These interface registers are also called latch or buffer.
All the stages in the pipeline along with the interface registers are controlled by a common
clock.

Execution in a pipelined processor

Execution sequence of instructions in a pipelined processor can be visualized using a space-time
diagram. For example, consider a processor having 4 stages and let there be 2 instructions to be
executed. We can visualize the execution sequence through the following space-time diagrams:

Non overlapped execution:

STAGE/CYCLE |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8
S1          | I1 |    |    |    | I2 |    |    |
S2          |    | I1 |    |    |    | I2 |    |
S3          |    |    | I1 |    |    |    | I2 |
S4          |    |    |    | I1 |    |    |    | I2

Total time = 8 cycles

Overlapped execution:

STAGE/CYCLE |  1 |  2 |  3 |  4 |  5
S1          | I1 | I2 |    |    |
S2          |    | I1 | I2 |    |
S3          |    |    | I1 | I2 |
S4          |    |    |    | I1 | I2

Total time = 5 cycles
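The hedged sketch below generates this kind of space-time table for an arbitrary number of stages and instructions under ideal conditions (one cycle per stage, no hazards); the function name pipeline_schedule is an illustrative choice, not standard terminology.

```python
# Minimal sketch: build the overlapped space-time diagram for an ideal pipeline,
# where instruction i enters stage s in cycle i + s (0-based), one cycle per stage.
def pipeline_schedule(num_stages: int, num_instructions: int):
    total_cycles = num_stages + num_instructions - 1
    table = [["  " for _ in range(total_cycles)] for _ in range(num_stages)]
    for i in range(num_instructions):
        for s in range(num_stages):
            table[s][i + s] = f"I{i + 1}"
    return table

for s, row in enumerate(pipeline_schedule(4, 2), start=1):
    print(f"S{s}: " + " | ".join(row))
# Prints the same 4-stage, 2-instruction overlapped diagram (5 cycles total).
```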

Pipeline Stages

A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC
instruction set. Following are the 5 stages of the RISC pipeline with their respective operations:

Stage 1 (Instruction Fetch)


In this stage the CPU reads instructions from the address in the memory whose value
is present in the program counter.

Stage 2 (Instruction Decode)


In this stage, instruction is decoded and the register file is accessed to get the values from
the registers used in the instruction.

Stage 3 (Instruction Execute)


In this stage, ALU operations are performed.
Stage 4 (Memory Access)

In this stage, memory operands referenced by the instruction are read from or written to
memory.
Stage 5 (Write Back)

In this stage, computed/fetched value is written back to the register present in the
instruction.

Performance of a pipelined processor

Consider a ‘k’ segment pipeline with clock cycle time as ‘Tp’. Let there be ‘n’ tasks to be
completed in the pipelined processor. Now, the first instruction is going to take ‘k’ cycles to
come out of the pipeline but the other ‘n – 1’ instructions will take only ‘1’ cycle each, i.e., a
total of ‘n – 1’ cycles. So, time taken to execute ‘n’ instructions in a pipelined processor:
ETpipeline = (k + n – 1) cycles = (k + n – 1) * Tp

So, speedup (S) of the pipelined processor over the non-pipelined processor, when ‘n’ tasks
are executed on the same processor, is:

S = Performance of pipelined processor / Performance of non-pipelined processor

As the performance of a processor is inversely proportional to the execution time, we have:

S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
=> S = [n * k] / [k + n – 1]

When the number of tasks ‘n’ is significantly larger than k, that is, n >> k:

S = [n * k] / n
=> S = k

where ‘k’ is the number of stages in the pipeline.

Also, Efficiency = Given speedup / Maximum speedup = S / Smax

We know that Smax = k

So, Efficiency = S / k

Throughput = Number of instructions / Total time to complete the instructions

So, Throughput = n / [(k + n – 1) * Tp]
Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1.
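A hedged numeric sketch of these formulas is given below; the choice of k = 5 stages, n = 100 instructions, and Tp = 1 ns is purely illustrative.

```python
# Minimal sketch: evaluate the pipeline performance formulas above.
k, n, tp = 5, 100, 1e-9     # stages, instructions, cycle time (assumed values)

et_non_pipeline = n * k * tp            # every instruction takes k cycles
et_pipeline = (k + n - 1) * tp          # first takes k cycles, the rest take 1 each

speedup = et_non_pipeline / et_pipeline          # = n*k / (k + n - 1)
efficiency = speedup / k                         # S / Smax, with Smax = k
throughput = n / ((k + n - 1) * tp)              # instructions per second

print(f"Speedup    = {speedup:.2f} (approaches k = {k} as n grows)")
print(f"Efficiency = {efficiency:.2f}")
print(f"Throughput = {throughput:.2e} instructions/second")
```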

PIPELINE HAZARD
There are some factors that cause the pipeline to deviate its normal performance. Some of these
factors are given below:
1. Timing Variations: All stages cannot take the same amount of time. This problem generally
occurs in instruction processing, where different instructions have different operand
requirements and thus different processing times.
2. Data Hazards: When several instructions are in partial execution and they reference the same
data, a problem arises. We must ensure that the next instruction does not attempt to read data
before the current instruction has finished writing it, because this would lead to incorrect
results (a small detection sketch follows after this list).
3. Branching: In order to fetch and execute the next instruction, we must know what that
instruction is. If the present instruction is a conditional branch, and its result will lead us to
the next instruction, then the next instruction may not be known until the current one is
processed.
4. Interrupts: Interrupts insert unwanted instructions into the instruction stream and affect the
execution of instructions.
5. Data Dependency: It arises when an instruction depends upon the result of a previous
instruction.
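As a hedged illustration of a data hazard, the sketch below scans a toy instruction list for read-after-write dependencies between adjacent instructions; the instruction encoding (destination register, source registers) is an assumption made for this example only.

```python
# Minimal sketch: detect read-after-write (RAW) data hazards between
# consecutive instructions in a toy program representation.
# Each instruction is (name, destination_register, source_registers).
program = [
    ("ADD", "R1", ("R2", "R3")),   # R1 = R2 + R3
    ("SUB", "R4", ("R1", "R5")),   # reads R1 right after it is written -> hazard
    ("MUL", "R6", ("R7", "R8")),   # independent of the previous instruction
]

for prev, curr in zip(program, program[1:]):
    _, prev_dest, _ = prev
    name, _, sources = curr
    if prev_dest in sources:
        print(f"RAW hazard: {name} reads {prev_dest} before the previous "
              f"instruction has written it back")
```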

Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system
3. It makes the system reliable.

Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The instruction latency increases.
How ILP impacts pipelines
ILP allows pipelines to overlap the execution of multiple instructions, which can increase CPU
throughput. Pipelines work by dividing instructions into a series of steps, or stages that are
performed by different processor units. When an instruction completes a stage, the functional unit
can be used to perform the same stage of another instruction.
How ILP is impacted by pipelines
The degree of ILP depends on how the program instructions depend on each other. If instructions
are independent, they can be executed in parallel, but if one instruction depends on another, they
must execute in order.
Techniques used in modern processors to improve performance and parallelize programs are:
 Out-of-order execution (OOO): Instructions are executed based on resource availability and
readiness of operands, rather than their original order in the program.

 Speculative execution: Instructions are executed before it is certain they will be needed.

MEMORY MANAGEMENT AND CONCEPT OF VIRTUAL MEMORY

Memory management is the functionality of an operating system which handles or manages primary
memory and moves processes back and forth between main memory and disk during execution.
 Memory management keeps track of each and every memory location, regardless of whether it
is allocated to some process or is free.
 It checks how much memory is to be allocated to processes.
 It decides which process will get memory at what time.
 It tracks whenever some memory gets freed or unallocated and correspondingly it updates the
status.

There are two Memory Management Techniques: Contiguous, and Non-Contiguous.
 In Contiguous Technique, executing process must be loaded entirely in main
memory.
 In contiguous memory allocation each process is contained in a single contiguous
block of memory.
 Memory is divided into several fixed size partitions.
 Each partition contains exactly one process.
 Contiguous Technique can be divided into:
1. Fixed (or static) partitioning
2. Variable (or dynamic) partitioning
Fixed Partitioning: This is the oldest and simplest technique used to put more than one
process in the main memory. In this partitioning, the number of partitions (non-
overlapping) in RAM is fixed, but the size of each partition may or may not be the same. As it is
contiguous allocation, no spanning is allowed. Partitions are made before
execution or during system configuration.
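A hedged sketch of fixed partitioning is shown below: the partition sizes and process sizes are invented for illustration, and the leftover space in each allocated partition is the internal fragmentation described under the disadvantages that follow.

```python
# Minimal sketch: fixed partitioning with first-fit placement.
# Leftover space inside an allocated partition is internal fragmentation.
partitions = [100, 200, 300, 400]          # KB, fixed at system configuration (assumed)
free = [True] * len(partitions)

def allocate(process_size_kb: int) -> None:
    for i, size in enumerate(partitions):
        if free[i] and process_size_kb <= size:
            free[i] = False
            wasted = size - process_size_kb
            print(f"Process of {process_size_kb} KB -> partition {i} "
                  f"({size} KB), internal fragmentation = {wasted} KB")
            return
    print(f"Process of {process_size_kb} KB cannot be allocated "
          f"(limits process size / degree of multiprogramming)")

for p in (90, 180, 350, 500):
    allocate(p)
```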

Advantages of Fixed Partitioning
1. Easy to implement
2. Little OS overhead
Disadvantages of Fixed Partitioning
 Internal Fragmentation
 External Fragmentation
 Limit on process size
 Limitation on Degree of Multiprogramming

Variable Partitioning
 It is a part of the Contiguous allocation technique.
 It is used to alleviate the problem faced by Fixed Partitioning.
 In contrast with fixed partitioning, partitions are not made before execution or during
system configuration.
Various features associated with variable Partitioning

1. Initially RAM is empty and partitions are made during run-time according to the
process’s need instead of during system configuration.
2. The size of a partition will be equal to the incoming process.
3. The partition size varies according to the need of the process so that internal
fragmentation can be avoided and RAM is utilized efficiently.
4. The number of partitions in RAM is not fixed and depends on the number of incoming
processes and the size of main memory.

Advantages of Variable Partitioning


1. No Internal Fragmentation
2. No restriction on Degree of Multiprogramming
3. No Limitation on the size of the process
Disadvantages of Variable Partitioning
1. Difficult Implementation
2. External Fragmentation
VIRTUAL MEMORY
 Virtual memory is the separation of logical memory from physical memory. This
separation provides large virtual memory for programmers when only small
physical memory is available.
 Virtual memory is used to give programmers the illusion that they have a very
large memory even though the computer has a small main memory. It makes the
task of programming easier because the programmer no longer needs to worry
about the amount of physical memory available.
 Virtual Memory is a storage allocation scheme in which secondary memory can
be addressed as though it were part of main memory.

 The addresses a program may use to reference memory are distinguished from the
addresses the memory system uses to identify physical storage sites, and program
generated addresses are translated automatically to the corresponding machine
addresses.
 The size of virtual storage is limited by the addressing scheme of the computer
system and the amount of secondary memory available, not by the actual number of
main storage locations.
 It is a technique that is implemented using both hardware and software.
 It maps memory addresses used by a program, called virtual addresses, into
physical addresses in computer memory.
 All memory references within a process are logical addresses that are dynamically
translated into physical addresses at run time. This means that a process can be
swapped in and out of main memory such that it occupies different places in main
memory at different times during the course of execution.
 A process may be broken into a number of pieces, and these pieces need not be
contiguously located in main memory during execution. The combination of dynamic
run-time address translation and the use of a page or segment table permits this.

 If these characteristics are present, it is not necessary that all the pages or segments
are present in main memory during execution.
 This means that pages are loaded into memory only when they are required.
 Virtual memory is implemented using Demand Paging or Demand Segmentation.
 There are two ways in which virtual memory is handled: paging and
segmentation.

PAGING
 Paging is a memory management scheme that eliminates the need for contiguous
allocation of physical memory.
 This scheme permits the physical address space of a process to be non-contiguous.
 Paging is a fixed size partitioning scheme.
 In paging, secondary memory and main memory are divided into equal fixed size
partitions.
 The partitions of secondary memory are called pages.
 The partitions of main memory are called frames.
 Each process is divided into parts, where the size of each part is the same as the page size.
 The size of the last part may be less than the page size.
 The pages of a process are stored in the frames of main memory depending upon their
availability.
The advantages of paging are
 It allows the parts of a single process to be stored in a non-contiguous fashion.
 It solves the problem of external fragmentation.
The disadvantages of paging are
 It suffers from internal fragmentation.
 There is an overhead of maintaining a page table for each process.
 The time taken to fetch the instruction increases since now two memory accesses are
required.
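To make the paging mechanism concrete, the hedged sketch below translates a logical address into a physical one using a toy page table; the page size of 1 KB and the page-table contents are assumptions made for illustration.

```python
# Minimal sketch: logical-to-physical address translation under paging.
PAGE_SIZE = 1024                      # bytes; assumed for the example

# page_table[page_number] = frame_number (toy contents)
page_table = {0: 5, 1: 2, 2: 7, 3: 0}

def translate(logical_address: int) -> int:
    page_number = logical_address // PAGE_SIZE
    offset = logical_address % PAGE_SIZE
    frame_number = page_table[page_number]    # the extra memory access in real hardware
    return frame_number * PAGE_SIZE + offset

print(translate(2100))   # page 2, offset 52 -> frame 7 -> 7*1024 + 52 = 7220
```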
SEGMENTATION:
 Segmentation is a memory management technique in which each job is divided into
several segments of different sizes, one for each module that contains pieces that perform
related functions.

 Each segment is actually a different logical address space of the program.
 When a process is to be executed, its corresponding segments are loaded into non-
contiguous memory, though every segment is loaded into a contiguous block of available
memory.
 Segmentation memory management works very similarly to paging, but here segments
are of variable length, whereas in paging the pages are of fixed size.
 The operating system maintains a segment map table for every process and a list of free
memory blocks along with segment numbers, their size and corresponding memory
locations in main memory.
 For each segment, the table stores the starting address of the segment and the length of
the segment.
 A reference to a memory location includes a value that identifies a segment and an
offset.
 Segment Table – It maps a two-dimensional logical address into a one-dimensional
physical address. Each table entry has:
1. Base Address: it contains the starting physical address where the segment resides in memory.
2. Limit: it specifies the length of the segment.
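A hedged sketch of the segment-table lookup is shown below; the segment table contents are invented, and the limit check mirrors the base/limit entries just described.

```python
# Minimal sketch: translating a (segment, offset) reference using a segment table
# whose entries hold a base address and a limit (segment length).
segment_table = {            # toy contents: segment -> (base, limit)
    0: (1400, 1000),
    1: (6300, 400),
    2: (4300, 1100),
}

def translate(segment: int, offset: int) -> int:
    base, limit = segment_table[segment]
    if offset >= limit:                      # reference outside the segment
        raise MemoryError("segmentation fault: offset exceeds segment limit")
    return base + offset

print(translate(2, 53))        # 4300 + 53 = 4353
try:
    translate(1, 500)          # offset 500 >= limit 400
except MemoryError as error:
    print(error)
```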
Advantages of Segmentation
 No Internal fragmentation.
 Segment Table consumes less space in comparison to Page table in paging.
Disadvantage of Segmentation
 As processes are loaded and removed from the memory, the free memory space is
broken into little pieces, causing External fragmentation.
FRAGMENTATION:
 As processes are loaded and removed from memory, the free memory space is broken
into little pieces.
 After some time, processes cannot be allocated to memory blocks because the blocks are
too small, and these memory blocks remain unused.
 This problem is known as Fragmentation.
Fragmentation is of two types:
1. External fragmentation: Total memory space is enough to satisfy a request or to
hold a process, but it is not contiguous, so it cannot be used.

2. Internal fragmentation: The memory block assigned to a process is bigger than the
process requires; the leftover portion is left unused, as it cannot be used by another process.
PAGE FAULT:
 A page fault occurs when a program attempts to access a block of memory that is not
stored in the physical memory, or RAM.
 The fault notifies the operating system that it must locate the data in virtual memory,
then transfer it from the storage device, such as an HDD or SSD, to the system RAM.
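The hedged sketch below simulates demand paging with a small number of frames: an access to a page that is not resident triggers a page fault and the page is loaded, evicting the oldest resident page (a FIFO policy chosen only for illustration).

```python
# Minimal sketch: demand paging with page faults and a FIFO replacement policy.
from collections import deque

NUM_FRAMES = 3                       # physical frames available (assumed)
resident = deque()                   # pages currently in RAM, oldest first

def access(page: int) -> None:
    if page in resident:
        print(f"page {page}: hit, already in RAM")
        return
    print(f"page {page}: PAGE FAULT, loading from secondary storage")
    if len(resident) == NUM_FRAMES:
        evicted = resident.popleft()     # make room by evicting the oldest page
        print(f"  evicting page {evicted}")
    resident.append(page)

for p in [1, 2, 3, 1, 4, 2]:
    access(p)
```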
