PARALLEL COMPUTING
Parallel computing refers to the process of using several processors to execute an application or
computation simultaneously. Generally, it is a kind of computing architecture in which a large problem is
broken into independent, smaller, usually similar parts that can be processed at the same time. The work is
done by multiple CPUs communicating via shared memory, and the results are combined upon completion. It
helps in performing large computations because it divides the large problem among more than one processor.
A problem is broken into discrete parts that can be solved concurrently. Each part is further broken
down to a series of instructions. Instructions from each part execute simultaneously on different
CPUs.
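The decomposition described above can be sketched in Python. This is only an illustrative sketch: the helper names split_into_parts and parallel_sum and the choice of four workers are assumptions, not part of these notes. It uses a thread pool to keep the example short; in standard CPython, threads illustrate the structure rather than a real speedup for CPU-bound work, and ProcessPoolExecutor could be swapped in for that.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(data, num_parts):
    """Break the problem into roughly equal, independent parts."""
    size = (len(data) + num_parts - 1) // num_parts  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data, num_workers=4):
    """Sum each part concurrently, then combine the partial results."""
    if not data:
        return 0
    parts = split_into_parts(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, parts))  # one part per worker
    return sum(partial_sums)


numbers = list(range(1, 101))
print(parallel_sum(numbers))  # 5050
```

Each part is independent of the others, so the partial sums can be computed in any order and combined at the end.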
Parallel Computers: Virtually all stand-alone computers today are parallel from a hardware
perspective:
Multiple functional units (floating point, integer, GPU, etc.)
Multiple execution units / cores
Multiple hardware threads
Networks connect multiple stand-alone computers (nodes) to create larger parallel computer clusters:
Each compute node is a multi-processor parallel computer in itself
Multiple compute nodes are networked together with an InfiniBand network
Special purpose nodes, also multi-processor, are used for other purposes
COMPUTER ARCHITECTURE
Difference between Parallel Computing and Distributed Computing
3. Task Parallelism: Task parallelism is the form of parallelism in which tasks are
decomposed into subtasks. Each subtask is then allocated to a processor, and the subtasks are
executed concurrently by the processors.
MISD (multiple instruction, single data) is a parallel computing architecture type in which multiple
functional units work on the same data at the same time. Pipeline topologies fall under this category,
though purists could argue that the data after every stage in the pipeline is different. Because very few
real systems have been built using the MISD structure, it is primarily of theoretical importance. Multiple
processing units work on a single data stream in MISD. Each processing unit works on the data in its own
way, using its own instruction stream.
In the MIMD organization, all processors of a parallel computer may execute distinct instructions and act
on different data at the same time. Each processor in MIMD has its own program, and each program
generates an instruction stream.
MIMD stands for multiple instruction, multiple data. It is the most fundamental and well-known type of
parallel processor, and its main purpose is to achieve parallelism. Both the shared memory programming
model and the distributed memory programming model are used in the MIMD architecture. Each
model has its own set of benefits and drawbacks.
PIPELINING
2) Arrange the hardware such that more than one operation can be performed at the same time
(pipeline)
Since there is a limit on the speed of hardware and the cost of faster circuits is quite high, we have
to adopt the 2nd option (pipelining).
Pipelining: Pipelining is the arrangement of the hardware elements of the CPU such that its
overall performance is increased. In a pipelined processor, more than one instruction is
executed simultaneously.
Let us see a real life example that works on the concept of pipelined operation. Consider a water
bottle packaging plant. Let there be 3 stages that a bottle should pass through, Inserting the bottle
(I), Filling water in the bottle (F), and Sealing the bottle(S). Let us consider these stages as stage
1, stage 2 and stage 3 respectively. Let each stage take 1 minute to complete its operation. Now,
in a non-pipelined operation, a bottle is first inserted in the plant, after 1 minute it is moved to
stage 2 where water is filled. Now, in stage 1 nothing is happening. Similarly, when the bottle
moves to stage 3, both stage 1 and stage 2 are idle. But in pipelined operation, when the bottle is
in stage 2, another bottle can be loaded at stage 1. Similarly, when the bottle is in stage 3, there
can be one bottle each in stage 1 and stage 2. So, after each minute, we get a new bottle at the
end of stage 3. Hence, the average time taken to manufacture 1 bottle is:
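Without pipelining (3 bottles take 9 minutes, so 9/3 = 3 minutes per bottle):
Minute     1  2  3  4  5  6  7  8  9
Bottle 1   I  F  S
Bottle 2            I  F  S
Bottle 3                     I  F  S
With pipelining (3 bottles take 5 minutes, so 5/3 ≈ 1.67 minutes per bottle):
Minute     1  2  3  4  5
Bottle 1   I  F  S
Bottle 2      I  F  S
Bottle 3         I  F  S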
Thus, pipelined operation increases the efficiency of a system.
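The arithmetic of the bottling example can be checked with a short Python sketch. The function names are hypothetical; the default of 3 stages at 1 minute each comes from the example above.

```python
def non_pipelined_minutes(bottles, stages=3, minutes_per_stage=1):
    """Each bottle passes through every stage before the next bottle starts."""
    return bottles * stages * minutes_per_stage


def pipelined_minutes(bottles, stages=3, minutes_per_stage=1):
    """The first bottle needs every stage; each later bottle finishes one stage-time after it."""
    if bottles == 0:
        return 0
    return (stages + bottles - 1) * minutes_per_stage


for label, total in (("non-pipelined", non_pipelined_minutes(3)),
                     ("pipelined", pipelined_minutes(3))):
    print(f"{label}: {total} minutes in total, {total / 3:.2f} minutes per bottle")
```

For 3 bottles this gives 9 minutes without pipelining and 5 minutes with it, matching the two diagrams.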
In a pipelined processor, a pipeline has two ends, the input end and the output end. Between
these ends, there are multiple stages/segments such that output of one stage is connected to
input of next stage and each stage performs a specific operation.
Interface registers are used to hold the intermediate output between two stages.
These interface registers are also called latches or buffers.
All the stages in the pipeline along with the interface registers are controlled by a common
clock.
Non-overlapped execution:
STAGE / CYCLE   1    2    3    4    5    6    7    8
S1              I1                  I2
S2                   I1                  I2
S3                        I1                  I2
S4                             I1                  I2
Overlapped execution:
STAGE / CYCLE   1    2    3    4    5
S1              I1   I2
S2                   I1   I2
S3                        I1   I2
S4                             I1   I2
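A stage-by-cycle table like the one above can be generated for any number of stages and instructions. The following Python sketch is illustrative only: the names space_time_diagram and render are hypothetical, it assumes at least one instruction, and it assumes every stage takes exactly one cycle.

```python
def space_time_diagram(num_stages, num_instructions, overlapped=True):
    """Return one row per stage; each cell holds an instruction label or ''."""
    # With overlap a new instruction enters every cycle; without it, the next
    # instruction waits until the previous one has left the last stage.
    start_gap = 1 if overlapped else num_stages
    total_cycles = (num_instructions - 1) * start_gap + num_stages
    rows = [["" for _ in range(total_cycles)] for _ in range(num_stages)]
    for instr in range(num_instructions):
        for stage in range(num_stages):
            rows[stage][instr * start_gap + stage] = f"I{instr + 1}"
    return rows


def render(rows):
    """Format the rows as a plain-text STAGE / CYCLE table."""
    header = "STAGE/CYCLE " + " ".join(f"{c + 1:>3}" for c in range(len(rows[0])))
    lines = [header]
    for s, row in enumerate(rows):
        lines.append(f"S{s + 1:<11}" + " ".join(f"{cell:>3}" for cell in row))
    return "\n".join(lines)


print(render(space_time_diagram(4, 2, overlapped=False)))
print(render(space_time_diagram(4, 2, overlapped=True)))
```

With 4 stages and 2 instructions, the non-overlapped table spans 8 cycles and the overlapped one spans 5.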
Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC
instruction set. The following are the 5 stages of the RISC pipeline with their respective operations:
Stage 4 (Memory Access)
In this stage, memory operands are read from or written to the memory location specified in
the instruction.
Stage 5 (Write Back)
In this stage, the computed/fetched value is written back to the register specified in the
instruction.
Performance of a pipelined processor
Consider a ‘k’ segment pipeline with clock cycle time as ‘Tp’. Let there be ‘n’ tasks to be
completed in the pipelined processor. Now, the first instruction is going to take ‘k’ cycles to
come out of the pipeline but the other ‘n – 1’ instructions will take only ‘1’ cycle each, i.e, a
total of ‘n – 1’ cycles. So, time taken to execute ‘n’ instructions in a pipelined processor:
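ETpipeline = (k + n – 1) cycles
= (k + n – 1) × Tp
On a non-pipelined processor, each of the ‘n’ instructions takes all ‘k’ cycles, so:
ETnon-pipeline = n × k × Tp
So, the speedup (S) of the pipelined processor over the non-pipelined processor, when ‘n’ tasks
are executed on the same processor, is:
S = Performance of pipelined processor /
Performance of non-pipelined processor
S = ETnon-pipeline / ETpipeline
S = (n × k × Tp) / ((k + n – 1) × Tp)
S = (n × k) / (k + n – 1)
When the number of tasks ‘n’ is significantly larger than k, that is, n >> k, we have k + n – 1 ≈ n, so:
S ≈ (n × k) / n
S ≈ k
where ‘k’ is the number of stages in the pipeline. This is the maximum speedup the pipeline can achieve.
So, Efficiency = S / k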
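The formulas above can be evaluated numerically. In the following Python sketch the function name pipeline_metrics is hypothetical, and the non-pipelined time n × k × Tp is the usual assumption that every task takes all k cycles when there is no overlap.

```python
def pipeline_metrics(k, n, tp=1.0):
    """Return (pipelined time, non-pipelined time, speedup, efficiency)."""
    et_pipeline = (k + n - 1) * tp   # first task takes k cycles, the rest 1 each
    et_non_pipeline = n * k * tp     # assumed: every task takes all k cycles
    speedup = et_non_pipeline / et_pipeline
    efficiency = speedup / k
    return et_pipeline, et_non_pipeline, speedup, efficiency


for n in (2, 100, 1_000_000):
    _, _, s, e = pipeline_metrics(k=4, n=n)
    print(f"n={n}: speedup={s:.3f}, efficiency={e:.3f}")
```

For k = 4 and n = 2 the speedup is 8/5 = 1.6. As n grows, the speedup approaches k = 4 and the efficiency approaches 1.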
PIPELINE HAZARD
There are some factors that cause the pipeline to deviate from its normal performance. Some of these
factors are given below:
1. Timing Variations: Not all stages take the same amount of time. This problem generally
occurs in instruction processing, where different instructions have different operand
requirements and thus different processing times.
2. Data Hazards: When several instructions are in partial execution and they reference the same
data, a problem arises. We must ensure that the next instruction does not attempt to access
the data before the current instruction has finished with it, because this will lead to incorrect results.
3. Branching: In order to fetch and execute the next instruction, we must know what that
instruction is. If the present instruction is a conditional branch whose result determines
the next instruction, then the next instruction may not be known until the current one is
processed.
4. Interrupts: Interrupts insert unwanted instructions into the instruction stream and affect
the execution of instructions.
5. Data Dependency: It arises when an instruction depends upon the result of a previous
instruction.
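The data hazard and data dependency cases in the list above can be illustrated with a small Python sketch that scans a program for an instruction reading a register that a nearby earlier instruction writes (often called a read-after-write dependency). Everything here is a simplified assumption: the Instruction tuple, the find_raw_dependencies helper, the register names and the window of 2 instructions are all hypothetical.

```python
from typing import NamedTuple, Tuple


class Instruction(NamedTuple):
    text: str
    dest: str                 # register written by the instruction
    sources: Tuple[str, ...]  # registers read by the instruction


def find_raw_dependencies(program, window=2):
    """Return (producer_index, consumer_index, register) triples where an
    instruction reads a register written by one of the `window` instructions before it."""
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            if program[i].dest in consumer.sources:
                found.append((i, j, program[i].dest))
    return found


program = [
    Instruction("ADD R1, R2, R3", "R1", ("R2", "R3")),
    Instruction("SUB R4, R1, R5", "R4", ("R1", "R5")),  # reads R1 written by ADD
    Instruction("OR  R6, R7, R8", "R6", ("R7", "R8")),  # independent of both
]
print(find_raw_dependencies(program))  # [(0, 1, 'R1')]
```

Here SUB must not read R1 before ADD has produced it, while OR depends on neither instruction and can overlap with them freely.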
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system
3. It makes the system reliable.
Disadvantages of Pipelining
1. The design of pipelined processor is complex and costly to manufacture.
2. The latency of an individual instruction increases.
How ILP impacts pipelines
Instruction-level parallelism (ILP) allows pipelines to overlap the execution of multiple instructions, which can increase CPU
throughput. Pipelines work by dividing instructions into a series of steps, or stages that are
performed by different processor units. When an instruction completes a stage, the functional unit
can be used to perform the same stage of another instruction.
How ILP is impacted by pipelines
The degree of ILP depends on how the program instructions depend on each other. If instructions
are independent, they can be executed in parallel, but if one instruction depends on another, they
must execute in order.
Techniques used in modern processors to improve performance and parallelize programs are:
Out-of-order execution (OOO): Instructions are executed based on resource availability and
readiness of operands, rather than their original order in the program.
Speculative execution: Instructions are executed before it is certain they will be needed.
Memory management is the functionality of an operating system which handles or manages primary
memory and moves processes back and forth between main memory and disk during execution.
Memory management keeps track of each and every memory location, regardless of whether it
is allocated to some process or free.
It checks how much memory is to be allocated to processes.
It decides which process will get memory at what time.
It tracks whenever some memory gets freed or unallocated and correspondingly it updates the
status.
There are two Memory Management Techniques: Contiguous, and Non-Contiguous.
In Contiguous Technique, executing process must be loaded entirely in main
memory.
In contiguous memory allocation each process is contained in a single contiguous
block of memory.
Memory is divided into several fixed size partitions.
Each partition contains exactly one process.
Contiguous Technique can be divided into:
1. Fixed (or static) partitioning
2. Variable (or dynamic) partitioning
Fixed Partitioning: This is the oldest and simplest technique used to put more than one
process in the main memory. In this partitioning, the number of (non-overlapping) partitions
in RAM is fixed, but the size of each partition may or may not be the same. As it is
contiguous allocation, no spanning is allowed. Here partitions are made before
execution or during system configuration.
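Fixed partitioning can be sketched in a few lines of Python. The placement policy used here (first fit, meaning the first free partition that is large enough) is an assumption, as these notes do not name one, and the function name allocate_fixed and the sizes are hypothetical. The sketch also counts the space left unused inside each occupied partition.

```python
def allocate_fixed(partition_sizes, process_sizes):
    """First-fit placement into fixed partitions, one process per partition.

    Returns (placement, wasted), where placement maps a process index to a
    partition index, or to None when no free partition is large enough.
    """
    free = [True] * len(partition_sizes)
    placement = {}
    wasted = 0
    for p, need in enumerate(process_sizes):
        placement[p] = None
        for i, size in enumerate(partition_sizes):
            if free[i] and size >= need:
                free[i] = False
                placement[p] = i
                wasted += size - need  # unused space inside the partition
                break
    return placement, wasted


placement, wasted = allocate_fixed([4, 8, 8, 16], [1, 7, 7, 14, 3])
print(placement)  # {0: 0, 1: 1, 2: 2, 3: 3, 4: None}
print(wasted)     # 7
```

The fifth process cannot be loaded even though 7 units are unused in total, because that space is locked inside partitions that already hold a process.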
Variable Partitioning: The features of variable (or dynamic) partitioning are:
1. Initially RAM is empty, and partitions are made at run time according to each
process’s need instead of during system configuration.
2. The size of each partition is equal to the size of the incoming process.
3. The partition size varies according to the need of the process, so that internal
fragmentation is avoided and RAM is utilized efficiently.
4. The number of partitions in RAM is not fixed; it depends on the number of incoming
processes and the size of main memory.
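The behaviour listed above can be sketched as a small Python class. This is a simplified model under stated assumptions: the class name VariableMemory, the first-fit choice of hole and the sizes are hypothetical and not taken from these notes.

```python
class VariableMemory:
    """Main memory carved into partitions at run time, each exactly as large as its process."""

    def __init__(self, total_size):
        self.total_size = total_size
        self.partitions = []  # (start, size, process_name), kept sorted by start

    def holes(self):
        """Return the free gaps as (start, size) pairs."""
        gaps, cursor = [], 0
        for start, size, _ in self.partitions:
            if start > cursor:
                gaps.append((cursor, start - cursor))
            cursor = start + size
        if cursor < self.total_size:
            gaps.append((cursor, self.total_size - cursor))
        return gaps

    def load(self, name, size):
        """Create a partition of exactly `size` in the first hole that fits."""
        for start, hole_size in self.holes():
            if hole_size >= size:
                self.partitions.append((start, size, name))
                self.partitions.sort()
                return start
        return None  # no single hole is large enough

    def unload(self, name):
        self.partitions = [p for p in self.partitions if p[2] != name]


ram = VariableMemory(100)
for name, size in (("P1", 30), ("P2", 40), ("P3", 30)):
    ram.load(name, size)
ram.unload("P1")
ram.unload("P3")
print(ram.holes())         # [(0, 30), (70, 30)]
print(ram.load("P4", 50))  # None
```

No space is wasted inside a partition, but after P1 and P3 leave, the 60 free units are split into two holes of 30, so a 50-unit process cannot be placed in one contiguous block.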
In a virtual memory system, the addresses a program may use to reference memory are
distinguished from the addresses the memory system uses to identify physical storage sites,
and program-generated addresses are translated automatically to the corresponding machine
addresses.
The size of virtual storage is limited by the addressing scheme of the computer
system and by the amount of secondary memory available, not by the actual number of
main storage locations. It is a technique that is implemented using both
hardware and software.
It maps memory addresses used by a program, called virtual addresses, into
physical addresses in computer memory.
All memory references within a process are logical addresses that are dynamically
translated into physical addresses at run time. This means that a process can be
swapped in and out of main memory such that it occupies different places in main
memory at different times during the course of execution.
A process may be broken into a number of pieces, and these pieces need not be
contiguously located in the main memory during execution. The combination of
dynamic run-time address translation and the use of a page or segment table permits
this.
If these characteristics are present then, it is not necessary that all the pages or segments
are present in the main memory during execution.
This means that the required pages need to be loaded into memory whenever required.
Virtual memory is implemented using Demand Paging or Demand Segmentation.
There are two ways in which virtual memory is handled: paging and
segmentation.
PAGING
Paging is a memory management scheme that eliminates the need for contiguous
allocation of physical memory.
This scheme permits the physical address space of a process to be non-contiguous.
Paging is a fixed size partitioning scheme.
In paging, secondary memory and main memory are divided into equal fixed size
partitions.
The partitions of secondary memory are called pages.
The partitions of main memory are called frames.
Each process is divided into parts where size of each part is same as page size.
The size of the last part may be less than the page size.
The pages of process are stored in the frames of main memory depending upon their
availability.
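The page-to-frame mapping described above can be sketched in Python. The page size of 1024 bytes, the contents of the page table and the function names are assumptions chosen only for illustration.

```python
PAGE_SIZE = 1024  # bytes per page and per frame (assumed value)

# Hypothetical page table for one process: page number -> frame number.
page_table = {0: 5, 1: 2, 2: 7}


def pages_for(process_size, page_size=PAGE_SIZE):
    """Return (number of pages, unused bytes in the last page)."""
    count = -(-process_size // page_size)  # ceiling division
    return count, count * page_size - process_size


def translate(logical_address, table=page_table, page_size=PAGE_SIZE):
    """Split a logical address into (page, offset) and map the page to its frame."""
    page, offset = divmod(logical_address, page_size)
    if page not in table:
        raise KeyError(f"page {page} is not mapped for this process")
    return table[page] * page_size + offset


print(pages_for(2500))  # (3, 572)
print(translate(1030))  # 2054
```

A 2500-byte process needs 3 pages, and 572 bytes of the last page stay unused. Logical address 1030 is offset 6 in page 1, which sits in frame 2, giving physical address 2 × 1024 + 6 = 2054.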
The advantages of paging are
It allows parts of a single process to be stored in a non-contiguous fashion.
It solves the problem of external fragmentation.
The disadvantages of paging are
It suffers from internal fragmentation.
There is an overhead of maintaining a page table for each process.
The time taken to fetch an instruction increases, since two memory accesses are now
required: one to read the page table and one to read the instruction itself.
SEGMENTATION:
Segmentation is a memory management technique in which each job is divided into
several segments of different sizes, one for each module that contains pieces that perform
related functions.
Each segment is actually a different logical address space of the program.
When a process is to be executed, its corresponding segments are loaded into non-contiguous
memory, though every segment is loaded into a contiguous block of available
memory.
Segmentation memory management works very similarly to paging, but here segments
are of variable length, whereas in paging pages are of fixed size.
The operating system maintains a segment map table for every process and a list of free
memory blocks along with segment numbers, their size and corresponding memory
locations in main memory.
For each segment, the table stores the starting address of the segment and the length of
the segment.
A reference to a memory location includes a value that identifies a segment and an
offset.
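The lookup described above, a segment number plus an offset checked against the stored length, can be sketched in Python. The segment table contents and the function name translate_segment are hypothetical values chosen for illustration.

```python
# Hypothetical segment table: segment number -> (starting address, length).
segment_table = {
    0: (1400, 1000),
    1: (6300, 400),
    2: (4300, 1100),
}


def translate_segment(segment, offset, table=segment_table):
    """Map a (segment, offset) reference to a physical address, checking the length."""
    if segment not in table:
        raise KeyError(f"unknown segment {segment}")
    base, length = table[segment]
    if not 0 <= offset < length:
        raise ValueError(f"offset {offset} is outside segment {segment} (length {length})")
    return base + offset


print(translate_segment(2, 53))  # 4353
```

Unlike paging, the offset has to be compared with the segment's own length, because segments differ in size.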
Segment Table – It maps the two-dimensional logical address into a one-dimensional
physical address. Each table entry has:
1. Base Address: It contains the starting physical address where the segment resides in memory.
2. Limit: It specifies the length of the segment.
Advantages of Segmentation
No Internal fragmentation.
Segment Table consumes less space in comparison to Page table in paging.
Disadvantage of Segmentation
As processes are loaded and removed from the memory, the free memory space is
broken into little pieces, causing External fragmentation.
FRAGMENTATION:
As processes are loaded and removed from memory, the free memory space is broken
into little pieces.
After some time, processes can no longer be allocated to the free memory blocks
because the blocks are too small, and these memory blocks remain unused.
This problem is known as Fragmentation.
Fragmentation is of two types:
1. External fragmentation: Total memory space is enough to satisfy a request or to
reside a process in it, but it is not contiguous, so it cannot be used.
2. Internal fragmentation: The memory block assigned to a process is bigger than the process
requires. Some portion of memory is left unused, as it cannot be used by another process.
PAGE FAULT:
A page fault occurs when a program attempts to access a block of memory that is not
stored in the physical memory, or RAM.
The fault notifies the operating system that it must locate the data in virtual memory,
then transfer it from the storage device, such as an HDD or SSD, to the system RAM.
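The sequence described above can be sketched as a tiny Python simulation. The class name DemandPagedMemory and its contents are hypothetical, and the sketch assumes RAM never fills up, so no page ever has to be evicted.

```python
class DemandPagedMemory:
    """Loads a page from backing storage only when it is first accessed."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # page number -> contents (stands in for the HDD/SSD)
        self.ram = {}                       # pages currently resident in physical memory
        self.page_faults = 0

    def read(self, page):
        if page not in self.ram:
            # Page fault: the page is not in RAM, so fetch it from storage.
            self.page_faults += 1
            self.ram[page] = self.backing_store[page]
        return self.ram[page]


memory = DemandPagedMemory({0: "code", 1: "data", 2: "stack"})
for page in (0, 1, 0, 2, 1):
    memory.read(page)
print(memory.page_faults)  # 3
```

Only the first access to each page causes a fault; later accesses find the page already in RAM.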