KCS 713 Unit 1 Lecture 5
Parallel computing is accomplished by breaking the problem into independent parts so that each
processing element can execute its part of the algorithm simultaneously with the others
The computational problem should be:
Solved in less time with multiple compute resources than with a single
compute resource
In theory, throwing more resources at a task will shorten its time to completion, with potential
cost savings
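As a hedged illustration of this decomposition, the C sketch below breaks an array sum into independent slices, one per POSIX thread, so the slices are computed simultaneously; the thread count, array size, and function names are illustrative choices, not part of the lecture material.

/* Minimal sketch: decompose a sum into independent parts,
   one slice per thread, computed simultaneously.
   Compile with: gcc -pthread sum.c */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4

static double data[N];
static double partial[NTHREADS];   /* one result slot per thread */

static void *sum_slice(void *arg) {
    long t = (long)arg;                        /* thread index */
    long lo = t * (N / NTHREADS);
    long hi = (t == NTHREADS - 1) ? N : lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; i++)
        s += data[i];
    partial[t] = s;        /* each thread writes only its own slot */
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++)
        data[i] = 1.0;     /* known answer: total should equal N */

    pthread_t tid[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, sum_slice, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);            /* wait for each part */
        total += partial[t];
    }
    printf("total = %.0f\n", total);
    return 0;
}

With four cores, the four slices run concurrently, which is the sense in which more compute resources can shorten time to completion.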
Limits to serial computing
Limits to miniaturization
• Processor technology is allowing an increasing number of transistors to be placed
on a chip
• However, even with molecular or atomic-level components, a limit will be reached on how
small components can be
Economic limitations
• It is increasingly expensive to make a single processor faster
• Using a larger number of moderately fast commodity processors to achieve the
same or better performance is less expensive
Current computer architectures increasingly rely upon hardware-level
parallelism to improve performance
• Multiple execution units
• Pipelined instructions
• Multi-core
Von Neumann Architecture
Named after the Hungarian-American mathematician John von Neumann, who first set out the
general requirements for an electronic computer in his 1945 paper
Since then, virtually all computers have followed this basic design
• Single Data: Only one data stream is being used as input during any
one clock cycle
Parallel computer
Multiple-processor or multi-core system supporting parallel programming
Parallel programming
Programming in a language that supports concurrency explicitly
Distributed Memory
In hardware, refers to network-based access to physical memory that is not
common to all processors
As a programming model, tasks can only logically "see" local machine memory and must
use communications to access memory on other machines where other tasks are
executing
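A minimal sketch of this programming model, assuming an MPI installation (compile with mpicc, run with mpirun -np 2); the rank numbers and the value exchanged are illustrative. Rank 1 cannot "see" rank 0's memory, so the data must travel through an explicit communication call.

/* Distributed-memory sketch: the value lives only in rank 0's
   local memory and must be communicated to rank 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) {
        value = 42;        /* exists only in rank 0's address space */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* rank 1 cannot read rank 0's memory; it must receive */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}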
Communications
Parallel tasks typically need to exchange data. There are several ways this can be
accomplished, such as through a shared memory bus or over a network; however, the actual
event of data exchange is commonly referred to as communications regardless of the
method employed
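As a hedged sketch of data exchange through shared memory (one of the methods named above), the C fragment below has one thread deposit a value that another thread waits for; the variable names are illustrative choices.

/* Communication via shared memory: one thread "sends" by writing
   a shared variable, the other "receives" by waiting for it.
   Compile with: gcc -pthread comm.c */
#include <pthread.h>
#include <stdio.h>

static int mailbox;                 /* the shared data being exchanged */
static int ready = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    pthread_mutex_lock(&lock);
    mailbox = 99;                   /* "send": write to shared memory */
    ready = 1;
    pthread_cond_signal(&cond);     /* notify that the data arrived */
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *consumer(void *arg) {
    pthread_mutex_lock(&lock);
    while (!ready)                  /* "receive": wait for the data */
        pthread_cond_wait(&cond, &lock);
    printf("received %d\n", mailbox);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}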
Synchronization
Coordination of parallel tasks in real time, very often associated with
communications
Often implemented by establishing a synchronization point within an application where a
task may not proceed further until one or more other tasks reach the same or a logically
equivalent point
Synchronization usually involves waiting by at least one task, and can therefore cause a
parallel application's wall clock execution time to increase
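A minimal sketch of such a synchronization point, assuming a POSIX platform that provides pthread_barrier_t; the thread count and sleep times are illustrative. No thread passes the barrier until all of them arrive, so the faster tasks wait, which is exactly how synchronization can lengthen wall clock execution time.

/* Barrier synchronization point: every task blocks at the barrier
   until all NTHREADS tasks have reached it.
   Compile with: gcc -pthread barrier.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 3
static pthread_barrier_t barrier;

static void *phase_worker(void *arg) {
    long t = (long)arg;
    usleep(100000 * t);                 /* simulate uneven workloads */
    printf("task %ld reached the barrier\n", t);
    pthread_barrier_wait(&barrier);     /* the synchronization point */
    printf("task %ld continues\n", t);  /* only after ALL have arrived */
    return NULL;
}

int main(void) {
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    pthread_t tid[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, phase_worker, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}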
Parallel Computer Memory Architectures
Shared Memory
All processors access all memory as a global address space
Multiple processors can operate independently but share the same memory resources
Changes in a memory location effected by one processor are visible to all other processors
Shared memory machines are classified as UMA and NUMA, based upon memory access times
Uniform Memory Access (UMA)
• Identical processors
Cache coherent means that if one processor updates a location in shared memory, all the other
processors know about the update. Cache coherency is accomplished at the hardware level
Non-Uniform Memory Access (NUMA)
• Often made by physically linking two or more SMPs
• One SMP can directly access memory of another SMP
• Not all processors have equal access time to all memories
• Memory access across link is slower
• If cache coherency is maintained, it may also be called CC-NUMA (Cache Coherent NUMA)
Advantages
Global address space provides a user-friendly programming
perspective to memory
Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs
Disadvantages
Primary disadvantage is the lack of scalability between memory and CPUs
• Adding more CPUs can geometrically increase traffic on the shared memory-CPU
path and, for cache coherent systems, geometrically increase traffic associated with
cache/memory management
Programmer responsibility for synchronization constructs that ensure "correct" access of
global memory
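As a hedged sketch of that responsibility, the fragment below uses a mutex so two threads can increment a shared counter correctly; remove the lock/unlock pair and the increments race, illustrating why the programmer must supply synchronization constructs on shared memory.

/* Programmer-supplied synchronization for shared memory: the mutex
   ensures "correct" access to the global counter.
   Compile with: gcc -pthread counter.c */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;            /* global shared memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);  /* without this, updates can race */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, increment, NULL);
    pthread_create(&b, NULL, increment, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}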
Important Questions