Lecture 7 Main Memory

Memory Technology
• Random Access Memory (vs. Serial Access Memory)
• Cache uses SRAM: Static Random Access Memory
– No refresh (6 transistors/bit vs. 1 transistor/bit for DRAM)
Size: DRAM/SRAM
Cost/Cycle time: SRAM/DRAM
• Access Time - Time between when a read is requested
and when the requested data arrives
• Cycle Time - The minimum time between two unrelated
requests to the memory. Must include refreshing time if
any
• Main Memory is DRAM: Dynamic Random Access Memory
– Dynamic since needs to be refreshed periodically
– Addresses divided into 2 halves (Memory as a 2D matrix):
» RAS or Row Access Strobe
» CAS or Column Access Strobe
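The row/column split can be sketched as follows. This is a minimal illustration, assuming an even number of address bits with the upper half sent as the row address (with RAS) and the lower half as the column address (with CAS). Real devices need not split the address evenly, and the name `split_address` and the 20-bit width are hypothetical.

```python
def split_address(addr: int, addr_bits: int) -> tuple[int, int]:
    """Split a cell address into (row, column) halves.

    Assumption: addr_bits is even; the upper half is the row address
    (latched with RAS) and the lower half is the column address
    (latched with CAS).
    """
    if addr_bits % 2 != 0:
        raise ValueError("this sketch assumes an even number of address bits")
    if not 0 <= addr < (1 << addr_bits):
        raise ValueError("address out of range")
    half = addr_bits // 2
    row = addr >> half               # upper half of the address
    col = addr & ((1 << half) - 1)   # lower half of the address
    return row, col


# Hypothetical 20-bit address built from row 768 and column 5.
example_addr = (768 << 10) | 5
row, col = split_address(example_addr, 20)
print(row, col)
```

Sending the two halves one after the other lets the chip reuse the same address pins for both, at the cost of a two-step access.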
Dynamic RAM
• SRAM cells exhibit high speed but poor density (6 transistors per
cell)
• DRAM: simple transistor/capacitor pairs in high density
form implemented in CMOS
[Figure: DRAM cell array. Labels: word line (RAS or Row Access Strobe), bit line (CAS or Column Access Strobe), sense amp.]
Reading a DRAM cell drains its capacitor, so every read must be followed by a write-back that restores the charge on the capacitor
DRAM cell
• During a read or write, the wordline goes high
and the transistor connects the capacitor to
the bitline. On a write, the value on the bitline
('1' or '0') is stored in the capacitor; on a read,
the stored value is retrieved from the capacitor
onto the bitline.
• 1 GB of memory
• “2Rx8”. The 2R means that this module
is of rank 2, while the x8 (pronounced
“by eight”) denotes the output width of
the data coming from each DRAM chip.
• A rank is a separately addressable set
of DRAMs. In this case, one rank is a
set of four DRAM chips. Since there are
eight total (front/back), we have 2 ranks.
• The rank of a DRAM module is the
highest level of organization within a
DIMM. Below that, each chip is
organized into a number of banks and
memory arrays containing rows and
columns.
[Figure: a DRAM chip with four banks.]
Rank, Bank, Row, and Column
• The rank of a DRAM is a set of separately addressable DRAM chips.
• Each DRAM chip is further organized into a number of banks that
contain a set of memory arrays.
• The number of memory arrays per bank is equal to the size of the
output width. Therefore in a x4 DRAM chip, the internal banks would
each have four memory arrays.
[Figure: a single x4 bank.]
High Memory Demand for Multicore Processors
Aggregate peak bandwidth grows with # cores:
» Intel Core i7 can generate two data references
per core per clock
» Four cores and a 3.2 GHz clock give
25.6 billion 64-bit data references/second +
12.8 billion 128-bit instruction references/second
= 409.6 GB/s!
» DRAM bandwidth is only 6% of this (25
GB/s)
» Requires:
• Multi-port, pipelined caches
• Two levels of cache per core
• Shared third-level cache on chip
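The 409.6 GB/s figure can be reproduced with a few lines of arithmetic. This is a sketch using only the numbers quoted above. The one-instruction-reference-per-core-per-clock rate is an assumption inferred from the 12.8 billion figure, and all variable names are illustrative.

```python
CORES = 4
CLOCK_HZ = 3_200_000_000          # 3.2 GHz
DATA_REFS_PER_CLOCK = 2           # per core, 64 bits (8 bytes) each
INSTR_REFS_PER_CLOCK = 1          # per core, 128 bits (16 bytes) each (assumed)
DRAM_BANDWIDTH = 25_000_000_000   # 25 GB/s, as quoted on the slide

data_refs_per_s = CORES * CLOCK_HZ * DATA_REFS_PER_CLOCK    # 25.6 billion
instr_refs_per_s = CORES * CLOCK_HZ * INSTR_REFS_PER_CLOCK  # 12.8 billion

# Peak demand in bytes per second: data traffic plus instruction traffic.
peak_demand = data_refs_per_s * 8 + instr_refs_per_s * 16

print(f"peak demand: {peak_demand / 1e9:.1f} GB/s")
print(f"DRAM covers: {DRAM_BANDWIDTH / peak_demand:.1%}")
```

The gap between the two numbers is what the cache hierarchy listed above has to absorb.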
DRAM Banks
Figure 2.12 Internal organization of a DRAM. Modern DRAMs are organized in banks, typically four for DDR3. Each
bank consists of a series of rows. Sending a PRE (precharge) command opens or closes a bank. A row address is
sent with an Act (activate), which causes the row to transfer to a buffer. When the row is in the buffer, it can be
transferred by successive column addresses at whatever the width of the DRAM is (typically 4, 8, or 16 bits in DDR3)
or by specifying a block transfer and the starting address. Each command, as well as block transfers, are
synchronized with a clock.
Multi-channel memory architecture is a technology that increases the data
transfer rate between the DRAM memory and the memory controller by
adding more channels of communication between them.
Simplified View
DRAM Commands
Memory Interleaving
Also helps to reduce memory latency in single cores
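[Figure: four-way interleaved memory. The address is dispatched to one of four modules based on its 2 LSBs: one module holds addresses that are 0 mod 4, the next those that are 1 mod 4, then 2 mod 4, then 3 mod 4. Data in, data out, and return of data are shown for the modules. A timing diagram plots the module accessed (0, 1, 2, 3, 0, 1, 2, 3) against time, with the bus cycle and memory cycle marked.]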
Interleaved memory is more flexible than wide-access memory in
that it can handle multiple independent accesses at once.
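The dispatch rule for low-order interleaving (module chosen by the 2 LSBs of the address) can be sketched as below. The function name `bank_and_offset` is illustrative, and word addressing is assumed.

```python
NUM_BANKS = 4  # four-way interleaving, so the 2 LSBs select the module


def bank_and_offset(word_addr: int, num_banks: int = NUM_BANKS) -> tuple[int, int]:
    """Map a word address to (bank, offset within that bank).

    Assumption: low-order interleaving, so consecutive word addresses
    fall in consecutive banks.
    """
    bank = word_addr % num_banks      # which module holds the word
    offset = word_addr // num_banks   # where the word sits inside that module
    return bank, offset


# Eight consecutive words cycle through banks 0, 1, 2, 3, 0, 1, 2, 3.
mapping = [bank_and_offset(a) for a in range(8)]
print(mapping)
```

Because neighbouring words land in different banks, a sequential block fetch can keep all four banks busy at once.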
Memory and Bus Organizations
[Figure: three memory and bus organizations, each showing CPU, cache, bus, and memory.
(a) One-word wide memory organization.
(b) Wide memory organization: a row consists of many words, and a multiplexor is used. Expensive due to wide bus and control circuits.
(c) Interleaved memory organization (high or low-ordered), with memory banks 0 to 3.]
Memory Access Time Example
• Assume that it takes 1 cycle to send the address, 15 cycles for
each DRAM access and 1 cycle to send a word of data.
• Assuming a cache block of 4 words and one-word wide DRAM,
miss penalty = 1 + 4x15 + 4x1 = 65 cycles
• With main memory and bus width of 2 words, miss penalty = 1 +
2x15 + 2x1 = 33 cycles. For 4-word wide memory, miss penalty
is 1 + 1x15 + 1x1 = 17 cycles.
• With interleaved memory (word-interleaved) of 4 memory banks
and the same one-word bus width, the four bank accesses overlap,
so the miss penalty = 1 + 1x15 + 4x1 = 20
cycles. The memory controller must supply consecutive addresses
to different memory banks. Interleaving is universally adopted in
high-performance computers.
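The four cases above fit one formula, sketched here. The generalization (accesses to different banks fully overlap, and each bus transfer moves one bus-width of data) is an assumption chosen to match the slide's numbers; the function and parameter names are illustrative.

```python
def miss_penalty(block_words: int, width_words: int = 1, banks: int = 1,
                 addr_cycles: int = 1, access_cycles: int = 15,
                 transfer_cycles: int = 1) -> int:
    """Cycles to fetch one cache block from DRAM.

    Assumptions: the address is sent once, accesses to different banks
    overlap completely, and each bus transfer moves width_words words.
    """
    words_per_access = width_words * banks
    # Ceiling division: number of serialized DRAM access rounds.
    access_rounds = -(-block_words // words_per_access)
    # Ceiling division: number of bus transfers.
    transfers = -(-block_words // width_words)
    return addr_cycles + access_rounds * access_cycles + transfers * transfer_cycles


print(miss_penalty(4))                 # one-word wide memory
print(miss_penalty(4, width_words=2))  # two-word wide memory and bus
print(miss_penalty(4, width_words=4))  # four-word wide memory and bus
print(miss_penalty(4, banks=4))        # four interleaved one-word banks
```

Interleaving gets close to the 4-word-wide result while keeping the narrow bus, which is why it avoids the cost of wide buses and control circuits.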
PC100 is a standard for internal removable computer random access memory, defined by
JEDEC. PC100 refers to Synchronous DRAM operating at a clock frequency of 100 MHz
on a 64-bit-wide bus.
Q: What are PC2100 and Rambus? Refer to Wikipedia

Row Buffer
[Figure: the row buffer holds one page (row).]
Q: Can we schedule memory accesses to maximize row-buffer hits?
Memory access scheduling is a good research topic.
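To make the question concrete, here is a toy model of a single bank with an open-row policy. It is a sketch under strong assumptions: one bank, requests identified only by row number, and a scheduler that is free to reorder everything. Real schedulers must also respect ordering and fairness constraints. The function name and the request trace are hypothetical.

```python
def row_buffer_hits(rows: list[int]) -> int:
    """Count row-buffer hits for one bank that keeps the last row open."""
    hits = 0
    open_row = None
    for row in rows:
        if row == open_row:
            hits += 1        # row already in the buffer: column access only
        else:
            open_row = row   # miss: precharge, then activate the new row
    return hits


# Hypothetical trace: two request streams ping-ponging between rows 3 and 7.
arrival_order = [3, 7, 3, 7, 3, 7]
# Grouping requests to the same row (here simply by sorting) maximizes hits.
reordered = sorted(arrival_order)

print(row_buffer_hits(arrival_order))
print(row_buffer_hits(reordered))
```

In arrival order every access evicts the other stream's row; after grouping, only the first access to each row misses.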
Memory Technology
• Amdahl:
– Memory capacity should grow linearly with processor
speed
– Unfortunately, memory capacity and speed have not
kept pace with processors
• Some optimizations:
– Multiple accesses to same row
– Synchronous DRAM
» Added clock to DRAM interface
» Burst mode with critical word first
– Wider interfaces
– Double data rate (DDR)
– Multiple banks on each DRAM device
DRAM Technology
Constant improvement in packaging and data transfer rate, at the cost
of slower capacity growth
• DRAM followed Moore’s law for 20 years, bringing out a new chip with 4 times the
capacity every three years.
• Due to packaging constraints, new chips then doubled in capacity only every two years
• Since 2006, capacity has doubled only every four years
Improving DRAM Performance
• Add timing signals for repeated access to the row buffer without another row access
time. Each row buffer holds 1024-4096 bits, to be read with subsequent column
accesses
• Instead of asynchronous access and synchronization overhead, add a clock to allow
repeated access without overhead => Synchronous DRAM (SDRAM). With every
clock, send data in burst mode (8-16 bit transfers), with no need to send the address again
• Increase bandwidth by transferring data on both the rising and falling edges of the clock:
double data rate (DDR) technology
• Make memory buses wider. Initially, they offered 4-bit transfer mode. DDR2 and
DDR3 have 16-bit buses
• Add banks to allow interleaving and help power management, e.g., DDR3
Memory Optimizations
• DDR:
– DDR2
» Lower power (2.5 V -> 1.8 V)
» Higher clock rates (266 MHz, 333 MHz, 400 MHz)
– DDR3
» 1.5 V
» 800 MHz
– DDR4
» 1-1.2 V
» 1600 MHz
• GDDR5 is graphics memory based on DDR3
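The effect of double data rate on peak bandwidth can be sketched with simple arithmetic. The 64-bit bus width for the DDR3 case is an assumption carried over from the PC100 description (the slide lists only the 800 MHz clock), and the function name is illustrative.

```python
def peak_bandwidth_bytes_per_s(clock_hz: int, bus_bits: int,
                               transfers_per_clock: int) -> int:
    """Peak transfer rate: clock rate x transfers per clock x bus width in bytes."""
    return clock_hz * transfers_per_clock * bus_bits // 8


# PC100 SDRAM: 100 MHz, 64-bit bus, one transfer per clock.
pc100 = peak_bandwidth_bytes_per_s(100_000_000, 64, 1)
# DDR3 at 800 MHz: data moves on both clock edges (assumed 64-bit bus).
ddr3_800 = peak_bandwidth_bytes_per_s(800_000_000, 64, 2)

print(f"PC100:        {pc100 / 1e6:.0f} MB/s")
print(f"DDR3 800 MHz: {ddr3_800 / 1e9:.1f} GB/s")
```

These are peak rates only; sustained bandwidth is lower once row activations, precharges, and refresh are accounted for.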
Memory Technology
DIMM: Dual inline memory modules containing 4-16 DRAMs on a board

Other Types of DRAM
• Synchronous DRAM (SDRAM): Ability to transfer
a burst of data given a starting address and a burst length
– suitable for transferring a block of data from main
memory to cache.
• Page Mode DRAM: access multiple bits on the same
row (spatial locality)
– Don’t need to wait for wordline to recharge
– Toggle CAS with new column address
• Extended Data Out (EDO)
– Overlap data output with CAS toggle
– Successor: Burst EDO (CAS toggle used to get next address)
• Rambus DRAM (RDRAM)
– Pipelined control
Memory Technology
• Graphics memory:
– Achieve 2-5 X bandwidth per DRAM vs.
DDR3
» Wider interfaces (32 vs. 16 bit)
» Higher clock rate
» Possible because the chips are soldered to the board
instead of sitting on socketed DIMMs
• Reducing power in SDRAMs:
– Lower voltage
– Low power mode (ignores clock, continues to
refresh)
Memory Power Consumption
Power consumption for a DDR3 SDRAM operating under three conditions: low
power (shutdown) mode, typical system mode (DRAM active 30% for reads and
15% for writes), and fully active mode, where DRAM is continuously reading or
writing. All recent SDRAMs support power down mode.
Flash Memory
• Type of EEPROM
• Must be erased (in blocks) before being
overwritten
• Non-volatile
• Limited number of write cycles
• Cheaper than SDRAM, more expensive
than disk
• Slower than SRAM, faster than disk
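The erase-before-overwrite rule and the limited write cycles can be illustrated with a toy model. This is a sketch, not a description of any real device or driver: the class `FlashBlock`, its page sizes, and its error handling are all hypothetical.

```python
class FlashBlock:
    """Toy flash block: pages are programmed once, then the whole block is erased."""

    ERASED = 0xFF  # erased flash cells read back as all ones

    def __init__(self, pages: int = 4, page_bytes: int = 4) -> None:
        self.page_bytes = page_bytes
        self.pages = [bytes([self.ERASED]) * page_bytes for _ in range(pages)]
        self.erase_count = 0  # wear indicator: erase cycles are limited

    def erase(self) -> None:
        """Erase every page in the block at once."""
        self.pages = [bytes([self.ERASED]) * self.page_bytes for _ in self.pages]
        self.erase_count += 1

    def program(self, page: int, data: bytes) -> None:
        """Write one page; refuse to overwrite a page that is not erased."""
        if len(data) != self.page_bytes:
            raise ValueError("data must fill exactly one page")
        if self.pages[page] != bytes([self.ERASED]) * self.page_bytes:
            raise RuntimeError("page must be erased before being overwritten")
        self.pages[page] = bytes(data)


block = FlashBlock()
block.program(0, b"\x01\x02\x03\x04")
try:
    block.program(0, b"\x05\x06\x07\x08")   # overwrite without erase fails
except RuntimeError as err:
    print("rejected:", err)
block.erase()                                # erases all pages in the block
block.program(0, b"\x05\x06\x07\x08")
print(block.erase_count)
```

Updating a single page therefore costs an erase of the whole block, and each erase uses up one of the block's limited cycles.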
