Multiprocessors Shared Memory

Motivation
Multiprocessors
Challenges
Bus-based shared-memory multiprocessors
Shared-memory multiprocessor architecture
History
- Multiple processors on a single board, communicating over
  a shared bus, using loads/stores and a cache coherence
  protocol (1980s–1990s)
- Multiple processors on multiple boards in a single cabinet,
  communicating over a shared bus (on-board) and a
  scalable switch-based interconnection network (late 1990s)
- Multiple processors on a single chip, communicating over a
  shared bus (2004 onwards) or a scalable switch-based
  interconnection network (2008 onwards)
Dimitrios S. Nikolopoulos HY425 Lecture 16: Shared-Memory Multiprocessors 21 / 36

Shared-memory multiprocessor architecture

Technology trends
- The bus is a centralized bottleneck, and its bandwidth is not
  adequate to support more than a few (e.g. 10) processors.
  It has been replaced by switch-based interconnects.
- Cache coherence is desirable for programmability
- Processors communicate through loads and stores, a
  model that is more familiar to sequential programmers.
- Communication between producers and consumers is done
  by the producer storing data in its local cache and the
  consumer requesting the data from the remote cache
- The coherence protocol maintains consistency between
  replicas of data potentially written by one or more
  processors.

Cache coherence
Shared data
- Private cache per processor (or processor core) on SMPs
- Cache stores both
  - Private data used only by the owner processor
  - Shared data accessed by multiple processors
- Caches reduce the latency of access to shared data, memory
  bandwidth consumption, and interconnect bandwidth
  consumption
- Caching shared data introduces the cache coherence problem

Cache coherence problem
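[Figure: processors P1, P2 and P3, each with a private cache ($), connected through an interconnection network to DRAM modules. Step 1: P1's cache holds u=5; DRAM holds u=5.]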

[Figure: same system. Steps 1 and 2: the caches of P1 and P3 both hold u=5; P2's cache is empty; DRAM holds u=5.]

[Figure: same system. P1's cache holds u=5, P2's cache is empty, P3's cache holds u=7; DRAM still holds u=5.]
Processor 3 writes a new value of u (u=7). Processor 1 and processor 2 can still read the stale value u=5, from P1's cache and from DRAM respectively.
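The scenario in the figures can be replayed with a small model. This is a minimal sketch, not a description of real hardware: the `Cache` class and `run_scenario` function are hypothetical names, and the model assumes write-back caches with no coherence protocol at all.

```python
class Cache:
    """A private write-back cache with NO coherence protocol (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory   # shared dict standing in for DRAM
        self.lines = {}        # locally cached copies

    def read(self, addr):
        # Miss: fetch from DRAM. Hit: return the local copy, even if
        # another cache has written a newer value in the meantime.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back: only the local copy changes; DRAM is not updated.
        self.lines[addr] = value


def run_scenario():
    dram = {"u": 5}
    p1, p2, p3 = Cache(dram), Cache(dram), Cache(dram)
    p1.read("u")       # P1 caches u=5
    p3.read("u")       # P3 caches u=5
    p3.write("u", 7)   # P3 writes u=7 into its own cache only
    # What each processor now reads, plus the DRAM contents.
    return p1.read("u"), p2.read("u"), p3.read("u"), dram["u"]
```

In this model `run_scenario()` returns `(5, 5, 7, 5)`: P1 hits on its stale copy, P2 misses and fetches the stale value from DRAM, and only P3 sees its own write.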

Assume initial values A=0, flag=0

P1                  P2
A = 1;              while (flag==0); /* busy-wait */
flag = 1;           print A;
P2 expects that A=1 after exiting the while loop. This intuition is
not guaranteed by coherence. If the memory writes from P1 commit in
order, the intuition holds. If not, P2 may see A = 0!
The memory system is typically expected to preserve the ordering
of memory accesses by a single processor, but not across
processors.
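The ordering hazard can be explored by enumerating the orders in which the writer's two stores might become visible. The sketch below is a toy model under stated assumptions, not a model of any particular machine: `values_printed` is a hypothetical helper, the writer is the left-hand column of the example, and the reader leaves the busy-wait loop as soon as it observes flag=1.

```python
from itertools import permutations


def values_printed(in_order):
    """Return the set of values the reader can print for A.

    The writer issues A = 1 and then flag = 1. `in_order` says whether the
    (hypothetical) memory system makes these writes visible in program order.
    """
    writes = [("A", 1), ("flag", 1)]              # writer's program order
    orders = [tuple(writes)] if in_order else list(permutations(writes))
    printed = set()
    for order in orders:
        memory = {"A": 0, "flag": 0}              # initial values
        for name, value in order:
            memory[name] = value                  # one write becomes visible
            if memory["flag"] == 1:               # reader exits the busy-wait
                printed.add(memory["A"])          # ... and prints A
                break
    return printed
```

With in-order commits the only possible output is 1; once the two writes may be reordered, 0 becomes a possible output as well.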

Coherence versus consistency
What is the value of a memory location?

- Every read of a memory location should return the last
  value written to that memory location
- In uniprocessors this is easy to implement and guaranteed,
  except when the processor performs I/O (DMA)
- Coherence defines the values returned by a read
- Consistency defines when a write from a processor
  becomes visible to other processors
- Coherence defines the behavior of reads and writes to a single
  memory location, while consistency defines the behavior of reads
  and writes with respect to accesses to other memory locations

Coherent memory system
Preserving program order

A read by processor P to location X that follows a write by P to
X, with no writes to X made by other processors between the
write and the read by P, always returns the value written by P.
Coherent view of memory

A read by a processor to location X that follows a write by
another processor to X returns the value of the write if the two
accesses are sufficiently separated in time and no other
writes to X occur between the two accesses.

Coherent memory system

Write serialization
Two writes to the same location by any two processors are seen in the same order
by all processors
- Assume that writes are not serialized: two processors may proceed
  assuming different last values of the same location
Write consistency
A write does not complete, and does not allow the next write to occur, until all
processors have “seen” the effect of that write. The processor does not
change the order of any write with respect to other reads or writes
- If a processor writes location A and then location B, any processor that sees
  the new value of B must also see the new value of A
- Reads can be reordered (modulo dependencies), but writes must
  happen in program order
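Write serialization can be phrased as a check on what each processor observed. The sketch below is illustrative only: `is_subsequence` and `writes_serialized` are hypothetical helpers, and the model assumes that every write to the location stores a distinct value, so that an observed value identifies the write.

```python
from itertools import permutations


def is_subsequence(seen, order):
    """True if `seen` can be obtained from `order` by deleting elements."""
    remaining = iter(order)
    return all(value in remaining for value in seen)


def writes_serialized(writes, observations):
    """Check whether ONE global order of `writes` explains every observation.

    `writes` lists the distinct values written to a single location;
    `observations` maps each processor to the sequence of values it saw.
    """
    return any(
        all(is_subsequence(seen, order) for seen in observations.values())
        for order in permutations(writes)
    )
```

For example, if P1 sees the values in the order 1, 2 while P2 sees 2, 1, no single order explains both, and the check fails; this is the situation in which two processors proceed with different last values.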

Schemes for enforcing coherence

- Multiple processors may have copies of the same data
  (common in parallel programs)
- SMPs typically use a cache coherence protocol
  implemented in hardware, although slower software
  solutions are also available
- Key operations: replication and migration of data
  - Migration: data can be moved to the cache of a single
    processor and used for reading or writing transparently.
    Reduces latency and demand for bandwidth.
  - Replication: data can be simultaneously read by multiple
    processors, by having processors make copies of the data in
    their local caches. Reduces latency, demand for bandwidth
    and contention for accessing shared data.

Classes of cache coherence protocols

- Directory based: the sharing status of a cache block (i.e. which
  processors have a copy of the block in their cache and
  whether this copy has been updated) is kept in one
  location (in memory, or on-chip in recent multi-core
  processors) called the directory
- Snooping: every cache with a copy of a block also has
  information on the sharing status of the block, but no
  centralized state is kept.
  - All caches are accessible via a centralized broadcasting
    mechanism (typically a bus, nowadays a switch).
  - All cache controllers monitor (or snoop) the centralized
    medium to determine whether or not they have a copy of
    the block requested by another processor, and update
    the sharing state.
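To make the directory idea concrete, the sharing status of each block can be pictured as a set of sharers plus an "updated" flag held in one place. This is a minimal sketch under assumptions: the `Directory` class is hypothetical, it tracks state only (no data movement or write-back), and it assumes an invalidation-based protocol, which the description above does not prescribe.

```python
class Directory:
    """Sharing status kept in one location (hypothetical minimal model)."""

    def __init__(self):
        self.sharers = {}   # block -> set of processor ids holding a copy
        self.dirty = {}     # block -> True if the single holder updated it

    def read(self, block, proc):
        # Record the reader as a sharer. Any write-back of a modified
        # copy is assumed to have happened and is not modelled here.
        self.sharers.setdefault(block, set()).add(proc)
        self.dirty[block] = False

    def write(self, block, proc):
        # Return the processors whose copies must be invalidated, then
        # record the writer as the only holder of an updated copy.
        to_invalidate = self.sharers.get(block, set()) - {proc}
        self.sharers[block] = {proc}
        self.dirty[block] = True
        return to_invalidate
```

A snooping protocol keeps no such central table: each cache holds the state of its own copies and learns about other processors' requests from the broadcast medium.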

Snoopy cache coherence
- Cache controller snoops all transactions on the shared
  interconnect.
- A transaction is relevant if it involves a block stored in the
  cache of the snooping processor.
[Figure: processors P1 … Pn, each with a cache ($) that snoops the interconnection network; cache-to-memory transactions (e.g. writeback or writethrough) travel over the network to the DRAM modules.]

Snoopy cache coherence

- If a transaction is on a relevant block, the controller takes action to
  ensure coherence.
- The action may be invalidate (block written by another processor),
  update (block written by another processor and new value stored
  in the cache of the snooping processor), or supply new value
  (requested by another processor).
- A processor that needs to write either gets exclusive access to
  the block by invalidating other copies, or writes and updates
  other copies.
[Figure: processors P1 … Pn, each with a cache ($) that snoops the interconnection network; cache-to-memory transactions (e.g. writeback or writethrough) travel over the network to the DRAM modules.]

Example: write-through, write-invalidate
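[Figure: P3 has written u=7 and the write has gone through to DRAM (u=7). P1's cache still shows the old copy u=5, the copy that write-invalidate removes; P2's cache is empty.]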

Example: write-through, write-update
[Figure: P3 has written u=7; DRAM and P1's cached copy both hold u=7. P2's cache is empty.]
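The two write-through examples can be replayed with a toy snooping model. This is a sketch under assumptions rather than a real protocol implementation: `SnoopyBus` and `replay` are hypothetical names, each cache is a plain dictionary, a write also allocates the block in the writer's cache, and the broadcast is modelled as a loop over the other caches.

```python
class SnoopyBus:
    """Write-through caches on a broadcast medium (hypothetical model)."""

    def __init__(self, n_caches, policy):
        assert policy in ("invalidate", "update")
        self.policy = policy
        self.dram = {}
        self.caches = [dict() for _ in range(n_caches)]

    def read(self, proc, addr):
        cache = self.caches[proc]
        if addr not in cache:                 # miss: fetch from DRAM
            cache[addr] = self.dram[addr]
        return cache[addr]

    def write(self, proc, addr, value):
        self.caches[proc][addr] = value
        self.dram[addr] = value               # write-through to memory
        for other, cache in enumerate(self.caches):
            if other != proc and addr in cache:   # snoop hit in another cache
                if self.policy == "invalidate":
                    del cache[addr]           # drop the stale copy
                else:
                    cache[addr] = value       # refresh the copy in place


def replay(policy):
    bus = SnoopyBus(3, policy)
    bus.dram["u"] = 5
    bus.read(0, "u")          # P1 caches u=5
    bus.read(2, "u")          # P3 caches u=5
    bus.write(2, "u", 7)      # P3 writes u=7
    p1_kept_copy = "u" in bus.caches[0]
    return p1_kept_copy, bus.read(0, "u"), bus.dram["u"]
```

Under both policies P1's next read returns 7 and DRAM holds 7. The difference is that with `"invalidate"` P1 loses its copy and must miss to DRAM, while with `"update"` P1 keeps a refreshed copy and hits in its cache.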
