Multiprocessors Shared Memory
Multiprocessors
Challenges
Bus-based shared-memory multiprocessors
History
- Multiple processors on a single board, communicating over
  a shared bus, using loads/stores and a cache coherence
  protocol (80’s–90’s)
- Multiple processors on multiple boards in a single cabinet,
  communicating over a shared bus (on-board) and a
  scalable switch-based interconnection network (late 90’s)
- Multiple processors on a single chip, communicating over a
  shared bus (2004–onwards) or a scalable switch-based
  interconnection network (2008–onwards)
Cache coherence
Shared data
- Private cache per processor (or processor core) on SMPs
- Cache stores both
  - Private data, used only by the owner processor
  - Shared data, accessed by multiple processors
- Caches reduce the latency of access to shared data, memory
  bandwidth consumption, and interconnect bandwidth
  consumption
- Cache coherence problem
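[Figure: processors P1, P2, P3, each with a private cache ($), connected through an interconnection network to DRAM … DRAM. DRAM holds u=5 throughout. Three successive states are shown:]
1. P1's cache holds u=5.
2. P1's cache and P3's cache both hold u=5.
3. P3's cache holds u=7; P1's cache and DRAM still hold u=5.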
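The stale-copy situation in the figure can be reproduced with a toy model. The sketch below is only an illustration: it assumes private write-back caches with no coherence protocol at all (an assumption inferred from DRAM still holding u=5 after the write), and the names `memory`, `caches`, `read`, and `write` are hypothetical helpers, not part of the slides.

```python
# Toy model (assumption): private write-back caches, NO coherence protocol.
memory = {"u": 5}
caches = {"P1": {}, "P2": {}, "P3": {}}


def read(proc, addr):
    """Return the value proc observes; fetch from memory on a miss."""
    cache = caches[proc]
    if addr not in cache:
        cache[addr] = memory[addr]
    return cache[addr]


def write(proc, addr, value):
    """Write-back: only the writer's private copy changes."""
    caches[proc][addr] = value


first_p1 = read("P1", "u")   # P1 misses and caches u
first_p3 = read("P3", "u")   # P3 misses and caches u
write("P3", "u", 7)          # P3 updates its own copy only
stale_p1 = read("P1", "u")   # hit in P1's cache: the old value
from_mem = read("P2", "u")   # miss: P2 fetches the old value from memory

print(first_p1, first_p3, stale_p1, from_mem, caches["P3"]["u"])
```

After the write, P1 and P2 still observe the old value while P3 observes the new one, which is the inconsistency a coherence protocol has to rule out.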
Assume initial values A=0, flag=0

P1              P2
A = 1;          while (flag==0); /* busy-wait */
flag = 1;       print A;
Write consistency
A write does not complete, and does not allow the next write to occur, until all
processors have “seen” the effect of that write. The processor does not
change the order of any write with respect to other reads or writes.
- If a processor writes location A and then location B, any processor that sees
  the new value of B must also see the new value of A
- Reads can be reordered (modulo dependencies), but writes must
  happen in program order
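To see why write ordering matters for the A/flag example, the sketch below enumerates the orders in which P1's two writes may become visible to P2 and collects every value P2 could print. It is a deliberately simplified model, not a description of any real machine: the function `printed_values` and the "visibility order" abstraction are assumptions made for illustration.

```python
from itertools import permutations

# P1's writes in program order, as in the A/flag example.
PROGRAM = [("A", 1), ("flag", 1)]


def printed_values(writes_in_order):
    """Values P2 may print once its busy-wait on flag has ended.

    writes_in_order=True  : writes become visible in program order only.
    writes_in_order=False : writes may become visible in any order.
    """
    if writes_in_order:
        orders = [tuple(PROGRAM)]
    else:
        orders = list(permutations(PROGRAM))

    results = set()
    for order in orders:
        view = {"A": 0, "flag": 0}      # P2's view, initial values
        for addr, value in order:
            view[addr] = value          # one more write becomes visible
            if view["flag"] == 1:       # P2 leaves the loop, then reads A
                results.add(view["A"])
    return results


print(printed_values(True))
print(printed_values(False))
```

With in-order visibility the only possible result is 1; once the write to flag may become visible before the write to A, P2 can also print 0.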
[Figure: processors P1 … Pn, each with a private cache ($), connected through an interconnection network to DRAM … DRAM. Labels: snoop; cache-to-memory transaction (e.g. writeback or writethrough).]
- … block, controller takes action to ensure coherence.
- Action may be invalidate (block written by other processor),
  update (block written by another processor and new value stored
  in the cache of the snooping processor), or supply new value
  (requested by other processor).
- Processor that needs to write either gets exclusive access to
  the block by invalidating other copies, or writes and updates
  other copies.
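[Figure: the same three-processor system (P1, P2, P3 with private caches, interconnection network, DRAM … DRAM) in two further states:]
1. P3's cache holds u=7 and DRAM holds u=7; P1's cache still holds u=5.
2. P1's cache, P3's cache, and DRAM all hold u=7.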
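The invalidate and update actions described in this section can be sketched as a small simulation. This is a minimal model under stated assumptions: caches are write-through (so DRAM is updated on every write, as in the figure states with DRAM holding u=7), every write is broadcast to all other caches, and the third action (supplying a value requested by another processor) is not modelled. The classes `Bus` and `Cache` and the function `run` are hypothetical names.

```python
class Bus:
    """Shared interconnect (assumption: every write is seen by all caches)."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value            # write-through to memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr, value)


class Cache:
    def __init__(self, bus, policy):
        assert policy in ("invalidate", "update")
        self.bus = bus
        self.policy = policy
        self.lines = {}                      # addr -> cached value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr, value):
        if addr not in self.lines:
            return                           # not cached here: nothing to do
        if self.policy == "invalidate":
            del self.lines[addr]             # drop the stale copy
        else:
            self.lines[addr] = value         # refresh the copy in place


def run(policy):
    bus = Bus({"u": 5})
    p1, p2, p3 = (Cache(bus, policy) for _ in range(3))
    p1.read("u")
    p3.read("u")
    p3.write("u", 7)
    p1_kept_copy = "u" in p1.lines           # did P1's copy survive the write?
    return p1_kept_copy, p1.read("u"), bus.memory["u"]


print(run("invalidate"))
print(run("update"))
```

Under both policies P1's next read returns 7. The difference is that invalidation forces P1 to miss and refetch, whereas update keeps P1's copy valid at the cost of sending the new value to every sharer.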