15IF11 Multicore C PDF
Session-3
Figure: A multicore system. Cores 0-3 each have a private L2 cache and share an L3 cache; memory requests reach the DRAM banks through the DRAM memory controller and the DRAM interface.
DRAM (Dynamic Random Access Memory)
DRAM vs SRAM
DRAM
Slower access (capacitor)
Higher density (1T, 1C cell)
Lower cost
Requires refresh (power, performance, circuitry)
Manufacturing requires putting capacitor and logic together
SRAM
Faster access (no capacitor)
Lower density (6T cell)
Higher cost
No need for refresh
Manufacturing compatible with logic process (no capacitor)
DRAM Subsystem Organization
The hierarchy, from largest to smallest unit: Channel → DIMM → Rank → Chip → Bank → Row → Column → Cell
The DRAM subsystem
Figure: Side view of the DRAM subsystem: the processor connects to the DIMMs over a 64-bit (<0:63>) memory channel.
Breaking down a Rank
Figure: Rank 0 is built from Chips 0 to 7; each chip supplies one byte of the bus (chip 0 → <0:7>, chip 1 → <8:15>, ..., chip 7 → <56:63>), together forming the 64-bit Data <0:63>.
Breaking down a Chip
Figure: Chip 0 contains multiple banks (Bank 0, Bank 1, ...), all sharing the chip's 8-bit data interface (<0:7>).
Breaking down a Bank
Figure: Bank 0 is an array of 16k rows (row 0 to row 16k-1), each 2kB wide. An activated row is latched into the 2kB row buffer, from which individual 1B columns are selected and sent over the 8-bit interface (<0:7>).
DRAM Rank
Figure: Bank 0 of a rank: four 8-bit chips operated together supply 32 bits of data per access.
Rank: a set of chips that respond to the same command and the same address at the same time, each supplying a different piece of the requested data.
It is easier to produce an 8-bit chip than a 32-bit chip.
Produce 8-bit chips, but control and operate them together as a rank to get 32 bits of data in a single read (see the sketch below).
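As a rough illustration (not part of the slides), the sketch below models the eight-chip, 64-bit rank organization from the earlier figures: every chip sees the same row and column address, and chip c contributes bits <8c : 8c+7> of the data bus. The chip array, its tiny dimensions, and the function names are made-up toy values.

```c
#include <stdint.h>
#include <stdio.h>

#define CHIPS_PER_RANK 8
#define ROWS 4          /* toy array sizes, for illustration only */
#define COLS 4

/* Toy per-chip storage: chip[c][row][col] is the byte chip c returns. */
static uint8_t chip[CHIPS_PER_RANK][ROWS][COLS];

/* All chips in the rank get the same (row, col); each supplies one byte. */
static uint64_t rank_read64(uint32_t row, uint32_t col)
{
    uint64_t data = 0;
    for (int c = 0; c < CHIPS_PER_RANK; c++)
        data |= (uint64_t)chip[c][row][col] << (8 * c);   /* bits <8c:8c+7> */
    return data;
}

int main(void)
{
    for (int c = 0; c < CHIPS_PER_RANK; c++)
        chip[c][0][0] = (uint8_t)(0x10 + c);              /* fill one column */
    printf("64-bit word = 0x%016llx\n",
           (unsigned long long)rank_read64(0, 0));
    return 0;
}
```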
DRAM Bank Operation
Figure: DRAM bank operation, for the access sequence (Row 0, Column 0), (Row 0, Column 1), (Row 0, Column 85), (Row 1, Column 0). Starting from an empty row buffer, the first access must activate Row 0 through the row decoder; the next two accesses are row-buffer hits, served directly through the column mux. The access to (Row 1, Column 0) is a row-buffer conflict: Row 0 must be closed and Row 1 activated before the data can be read (sketched in code below).
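A minimal sketch of the three row-buffer outcomes above, assuming a single bank with one open-row register; the type and field names (bank_state_t, open_row) are illustrative, not from any real controller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { RB_EMPTY, RB_HIT, RB_CONFLICT } rb_outcome_t;

typedef struct {
    bool     row_valid;   /* is any row latched in the row buffer? */
    uint32_t open_row;    /* which row is open, if row_valid */
} bank_state_t;

static rb_outcome_t access_bank(bank_state_t *b, uint32_t row)
{
    if (!b->row_valid) {              /* empty buffer: ACTIVATE row, then CAS */
        b->row_valid = true;
        b->open_row  = row;
        return RB_EMPTY;
    }
    if (b->open_row == row)           /* row hit: CAS only */
        return RB_HIT;
    b->open_row = row;                /* conflict: PRECHARGE + ACTIVATE + CAS */
    return RB_CONFLICT;
}

int main(void)
{
    static const char *name[] = { "empty", "hit", "conflict" };
    bank_state_t bank = { false, 0 };
    uint32_t rows[] = { 0, 0, 0, 1 };         /* the sequence from the figure */
    for (int i = 0; i < 4; i++)
        printf("access to row %u: %s\n",
               rows[i], name[access_bank(&bank, rows[i])]);
    return 0;
}
```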
Transferring a cache block
Figure sequence: transferring a 64B cache block. The block's physical address (0x40 in the example, within the range 0x00 to 0xFFFF…F) selects Channel 0, DIMM 0, Rank 0 and maps to Row 0 of a bank. The controller issues column reads Col 0, Col 1, ..., and on each read every chip returns one byte (<0:7>, <8:15>, ..., <56:63>), so 8B cross the 64-bit bus Data <0:63> per I/O cycle.
A 64B cache block takes 8 I/O cycles to transfer.
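To make the arithmetic explicit: the bus carries 8B per I/O cycle, so a 64B block needs 64 / 8 = 8 beats. The sketch below only illustrates that bookkeeping; the function and constant names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_BYTES 64   /* cache block size from the slides */
#define BUS_BYTES    8   /* 64-bit data bus = 8B per I/O cycle */

/* Move a cache block out in 8B bus beats; purely illustrative. */
static void transfer_block(const uint8_t block[BLOCK_BYTES])
{
    for (int beat = 0; beat < BLOCK_BYTES / BUS_BYTES; beat++) {   /* 8 beats */
        uint64_t bus;
        memcpy(&bus, block + beat * BUS_BYTES, BUS_BYTES);
        printf("I/O cycle %d: 0x%016llx\n", beat, (unsigned long long)bus);
    }
}

int main(void)
{
    uint8_t block[BLOCK_BYTES];
    for (int i = 0; i < BLOCK_BYTES; i++)
        block[i] = (uint8_t)i;
    transfer_block(block);
    return 0;
}
```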
Physical address bit layouts (three alternatives; the field marked C sits in a different position in each):
Row (14 bits) | C | Bank (3 bits) | Column (11 bits) | Byte in bus (3 bits)
Row (14 bits) | Bank (3 bits) | C | Column (11 bits) | Byte in bus (3 bits)
Row (14 bits) | Bank (3 bits) | Column (11 bits) | C | Byte in bus (3 bits)
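A sketch of decoding a physical address with the field widths above (14-bit row, 3-bit bank, 11-bit column, 3-bit byte-in-bus). It assumes the fields are packed in exactly that order, ignores the extra C field, and uses made-up struct and function names.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t row, bank, column, byte_in_bus;
} dram_addr_t;

static dram_addr_t decode(uint32_t paddr)
{
    dram_addr_t a;
    a.byte_in_bus =  paddr        & 0x7;      /* bits  2:0  (3 bits)  */
    a.column      = (paddr >> 3)  & 0x7FF;    /* bits 13:3  (11 bits) */
    a.bank        = (paddr >> 14) & 0x7;      /* bits 16:14 (3 bits)  */
    a.row         = (paddr >> 17) & 0x3FFF;   /* bits 30:17 (14 bits) */
    return a;
}

int main(void)
{
    dram_addr_t a = decode(0x40);             /* the 0x40 block from the figure */
    printf("row=%u bank=%u col=%u byte=%u\n",
           a.row, a.bank, a.column, a.byte_in_bus);
    return 0;
}
```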
Basic DRAM Operation
CPU → controller transfer time
Controller latency
Queuing & scheduling delay at the controller
Access converted to basic commands
Controller → DRAM transfer time
DRAM bank latency (one of three cases; see the sketch after this list)
Simple CAS (column address strobe) if the row is "open", OR
RAS (row address strobe) + CAS if the array is precharged, OR
PRE + RAS + CAS (worst case)
DRAM → controller transfer time
Bus latency (BL)
Controller → CPU transfer time
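The three bank-latency cases above can be summarized as below. The timing values (tCAS, tRCD, tRP) are assumed example numbers, not from any datasheet; real timings are part-specific.

```c
/* Sketch of the three DRAM bank latency cases listed above. */
enum row_state { ROW_HIT, ROW_CLOSED, ROW_CONFLICT };

static int bank_latency_cycles(enum row_state s)
{
    const int tCAS = 14, tRCD = 14, tRP = 14;       /* assumed example timings */
    switch (s) {
    case ROW_HIT:      return tCAS;                 /* CAS only           */
    case ROW_CLOSED:   return tRCD + tCAS;          /* RAS (ACT) + CAS    */
    case ROW_CONFLICT: return tRP + tRCD + tCAS;    /* PRE + RAS + CAS    */
    }
    return 0;
}
```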
DRAM Controller Overview
DRAM Scheduling Policies
FCFS (first come first served)
Oldest request first
FR-FCFS (first ready, first come first served)
Row-hit requests first, then oldest first (see the sketch below)
Goal: maximize the row-buffer hit rate → maximize DRAM throughput
Actually, scheduling is done at the command level
Column commands (read/write) are prioritized over row commands (activate/precharge)
Within each group, older commands are prioritized over younger ones
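A minimal sketch of FR-FCFS selection at the request level (as the slides note, real controllers schedule at the command level): pick a row-hit request if one exists, otherwise the oldest. The queue layout and field names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t arrival_time;   /* lower = older */
    uint32_t row;
    bool     valid;
} mem_request_t;

/* Return the index of the request to schedule next, or -1 if the queue is empty. */
static int frfcfs_pick(const mem_request_t *q, int n, uint32_t open_row)
{
    int  best = -1;
    bool best_hit = false;
    for (int i = 0; i < n; i++) {
        if (!q[i].valid)
            continue;
        bool hit = (q[i].row == open_row);
        /* Prefer row hits; among equals, prefer the oldest (FCFS). */
        if (best < 0 ||
            (hit && !best_hit) ||
            (hit == best_hit && q[i].arrival_time < q[best].arrival_time)) {
            best = i;
            best_hit = hit;
        }
    }
    return best;
}
```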
DRAM Scheduling Policies
A scheduling policy is essentially a prioritization order
Prioritization can be based on
Request age
Row buffer hit/miss status
Request type (prefetch, read, write)
Request mode (load miss or store miss)
Requestor type (CPU, DMA, GPU)
Request criticality
Oldest miss in the core?
How many instructions in core are dependent on it?
Will it stall the processor?
Interference caused to other cores
Row Buffer Management Policies
Open row
Keep the row open after an access
Next access might need the same row → row hit
Next access might need a different row → row conflict, wasted energy
Closed row
Close the row after an access (if no other requests already in the request buffer need the same row)
Next access might need a different row → avoids a row conflict
Next access might need the same row → extra activate latency
Adaptive policies: predict whether or not the next access to the bank will be to the same row (see the sketch below)
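One possible flavor of an adaptive policy, sketched with a 2-bit saturating counter per bank: row hits push toward keeping the row open, misses push toward closing it. This is just one illustrative predictor, not the specific scheme the slides have in mind.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t keep_open_ctr;   /* 0..3 saturating counter, per bank */
} bank_policy_t;

/* Update the predictor after each access to the bank. */
static void policy_update(bank_policy_t *p, bool row_hit)
{
    if (row_hit && p->keep_open_ctr < 3)
        p->keep_open_ctr++;
    else if (!row_hit && p->keep_open_ctr > 0)
        p->keep_open_ctr--;
}

/* Predict whether the next access will reuse the same row. */
static bool policy_keep_row_open(const bank_policy_t *p)
{
    return p->keep_open_ctr >= 2;
}
```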
DRAM Refresh
DRAM capacitor charge leaks over time
The memory controller needs to read each row periodically to restore the charge
Activate + precharge each row every N ms
Typical N = 64 ms (refresh interval)
Implications on performance?
DRAM bank unavailable while being refreshed
Long pause times: if we refresh all rows in a burst, every 64 ms the DRAM will be unavailable until the refresh ends
DRAM Refresh
Burst refresh: all rows refreshed immediately after one another
Distributed refresh: each row refreshed at a different time, at regular intervals (see the sketch below)
Distributed refresh eliminates long pause times
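A quick worked number for distributed refresh, assuming 8192 rows per bank (a device-specific assumption, not from the slides): spreading the 64 ms interval over all rows means refreshing one row roughly every 7.8 µs instead of pausing for one long burst.

```c
#include <stdio.h>

int main(void)
{
    const double refresh_interval_ms = 64.0;   /* from the slides */
    const int    rows_per_bank       = 8192;   /* assumed, device-specific */

    /* 64 ms / 8192 rows ≈ 7.8 µs between individual row refreshes. */
    double per_row_us = refresh_interval_ms * 1000.0 / rows_per_bank;
    printf("one row refreshed every %.1f us\n", per_row_us);
    return 0;
}
```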
johnjose@iitg.ac.in
http://www.iitg.ac.in/johnjose/