Parallel Processing Report
Coursework 2017
Parallel Processing
SUPERVISOR: Dr. SAGVAN ALI SALIH
Better algorithms.
Greater reliability.
Parallelism Methods
1) TEMPORAL PARALLELISM
2) DATA PARALLELISM
This structure of a computer was proposed by John Von Neumann in the mid 1940s
and is known as the Von Neumann Architecture. In this architecture, a program is first
stored in the memory. The PE retrieves one instruction of this program at a time,
interprets it, and executes it. The operation of this computer is thus sequential: the PE
can execute only one instruction at a time. The speed of this sequential computer is thus
limited by the speed at which a PE can retrieve instructions and data from the memory
and the speed at which it can process the retrieved data. To increase the speed of
processing of data one may increase the speed of the PE by increasing the clock
speed. The clock speed increased from a few hundred kHz in the 1970s to 3 GHz in
2005. Processor designers found it difficult to increase the clock speed further as the
chip was getting overheated. The number of transistors which could be integrated in a
chip could, however, be doubled every two years. Thus, processor designers placed
many processing “cores” inside the processor chip to increase its effective throughput.
The processor retrieves a sequence of instructions from the main memory and stores
them in an on-chip memory. The “cores” can then cooperate to execute these
instructions in parallel. A computer which consists of a number of inter-connected
computers which cooperatively execute a single program to solve a problem is called a
parallel computer. Rapid developments in electronics have led to the emergence of
processors which can process over 5 billion instructions per second. Such processors
cost only around $100. It is thus possible to economically construct parallel computers
which use around 4,000 such multicore processors to carry out ten trillion (10^13)
instructions per second, assuming 50% efficiency (4,000 × 5 × 10^9 × 0.5 = 10^13). The more difficult problem is to
perceive parallelism in algorithms and develop a software environment which will enable
application programs to utilize this potential parallel processing power.
Single Instruction Single Data (SISD):
A sequential computer which exploits no parallelism in either the instruction or the data
stream. A single control unit fetches a single instruction stream from memory; the control
unit then generates the appropriate control signals to direct a single processing unit (PU)
to operate on a single data stream, i.e., one operation at a time.
Single Instruction Multiple Data (SIMD):
Single instruction, multiple data, or SIMD, systems are parallel systems. As the name
suggests, SIMD systems operate on multiple data streams by applying the same
instruction to multiple data items, so an abstract SIMD system can be thought of as
having a single control unit and multiple ALUs.
Note that in a SIMD system the ALUs must operate synchronously; that is, each ALU
must wait for the next instruction to be broadcast before proceeding.
Finally, SIMD systems are ideal for parallelizing simple loops that operate on large
arrays of data. Parallelism that’s obtained by dividing data among the processors and
having the processors all apply the same instructions to their subsets of the data is
called data parallelism.
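As a rough illustration of this kind of data parallelism, the sketch below applies the same multiplication to every element of a large array; the function name, the array, and the OpenMP simd hint are illustrative rather than part of the report's material. A vectorizing compiler can map consecutive iterations of such a loop onto SIMD lanes.

    #include <stddef.h>

    /* Scale every element of a large array by the same factor.
       The same instruction (a multiply) is applied to many data items,
       so the loop maps naturally onto SIMD hardware; the pragma is only
       a hint to the compiler's vectorizer. */
    void scale_array(float *a, size_t n, float factor)
    {
    #pragma omp simd
        for (size_t i = 0; i < n; i++)
            a[i] = a[i] * factor;
    }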
Multiple Instruction Multiple Data (MIMD):
MIMD systems are currently the most common and can be broadly divided, according to
the organization of their memory, into three sub-classes:
The MIMD architecture is primarily used in a number of application areas, including the
following:
• Computer-aided design.
• Computer-aided manufacturing.
• Simulation.
• Modeling.
The first of these sub-classes is shared memory, in which all processors access a single
common memory space. In practice, however, this class is best suited to machines with a
limited number of processors, because increasing the number of processors may create
a bottleneck in access to the shared memory.
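As a minimal sketch of the shared-memory style, assuming an OpenMP-capable compiler, the fragment below lets several threads of one process work on the same array in a single shared address space; the function and variable names are illustrative.

    #include <omp.h>

    /* Sum a shared array using several threads of one process.
       All threads read the same memory; the reduction clause combines
       each thread's partial sum without any explicit communication. */
    double shared_sum(const double *a, long n)
    {
        double total = 0.0;
    #pragma omp parallel for reduction(+:total)
        for (long i = 0; i < n; i++)
            total += a[i];
        return total;
    }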
A distributed-memory system refers to a computer system in which each processor has
its own private memory space. Different processors communicate over an interconnection
network. In distributed-memory systems, therefore, the processors usually communicate
explicitly, by sending messages through the network or by using special functions that
provide access to the memory of another processor.
This communication model can allow a considerable increase in speed, but it is harder
to program, since the programmer has to handle all communication operations explicitly.
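A minimal sketch of this explicit message passing, assuming the MPI library is available (the ranks and the message contents are purely illustrative):

    #include <mpi.h>
    #include <stdio.h>

    /* Each process has its own private memory; data moves only through
       explicit messages. Process 0 sends a value to process 1. */
    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;   /* exists only in rank 0's memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d over the network\n", value);
        }

        MPI_Finalize();
        return 0;
    }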
The combination of both shared and distributed memory mechanisms (known as mixed
or hybrid memory architectures) provides a flexible means of adapting to various
computing platforms.
This combination may increase scalability, improve performance, speed up computation,
and permit efficient utilization of the existing hardware capacities. However, while this
type of architecture combines the advantages of both approaches, it may also combine
their disadvantages.
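As a hedged sketch of the hybrid style, the fragment below combines both mechanisms: OpenMP threads share memory within each process, while MPI combines results across the separate memories of the processes. The work-splitting and names are illustrative, and MPI_Init is assumed to have been called elsewhere.

    #include <mpi.h>

    /* Hybrid pattern: each MPI process sums its own part of the data with
       shared-memory threads, then the partial sums are combined across
       the distributed memories with an MPI reduction. */
    double hybrid_sum(const double *local_part, long local_n)
    {
        double local_total = 0.0, global_total = 0.0;

    #pragma omp parallel for reduction(+:local_total)   /* shared memory */
        for (long i = 0; i < local_n; i++)
            local_total += local_part[i];

        /* distributed memory: combine the per-process results */
        MPI_Allreduce(&local_total, &global_total, 1, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);
        return global_total;
    }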