Unit 3 - Memory Organization
Memory Characteristics & Hierarchy
Internal Memory
Memory size and Memory Cell
RAM, SRAM, DRAM
Advanced DRAM (SDRAM, DDR SDRAM, RDRAM)
ROM and its types
Cache Memory
Cache memory principles
Multi level cache
Cache Mapping techniques
Replacement Algorithms
Write policy – Write through and write back
External Memory
Magnetic disks, Disk Layout
Optical memory
Memory
Memory is one of the key components of an embedded system.
It is also one of the major limiting resources in embedded systems.
Building block of memory – bit
Stores one piece of Boolean information (0
or 1)
Memory Interfaces
The CPU reads or writes data to memory through the bus and memory controllers; each access requires an address, data, and a read/write control signal.
Memory Characteristics
Location
Volatility
Capacity
Unit of transfer
Access methods
Performance
Memory Characteristics
Location - refers to whether memory is internal or external to the computer.
Internal Memory = Main Memory (the processor also requires its own local memory, in the form of registers).
Internal: processor registers, cache, main memory.
External: optical disks, magnetic disks, tapes.
Volatility - refers to the ability of memory to store data without power.
Two types: volatile and non-volatile.
Volatile - requires power to retain data: SRAM, DRAM.
Non-volatile - retains data without power: ROM, flash memory.
Memory Characteristics
Capacity - refers to the amount of storage a memory can hold.
• Number of bytes (1 byte = 8 bits)
• Number of words - the largest chunk of bits a processor can process at once.
• WORD, DOUBLEWORD (DWORD) and QUADWORD (QWORD) are used for 2-, 4- and 8-byte sizes.
Unit of transfer
Internal - governed by the data bus width, i.e. the size of the bus; a 16-bit bus can transmit 16 bits of data at a time. (Core i7: 192-bit total data bus width, 40-bit address bus.)
External - usually a block, which is much larger than a word.
Block - a "memory block" is a contiguous chunk of memory.
Example - malloc(2*sizeof(int));
Design considerations
Tradeoff: faster access time means greater cost per bit.
How expensive? - the cost must be reasonable.
Memory Hierarchy
Smaller, more expensive,
faster memories.
Memory Type    Category     Erasure                    Write Mechanism   Volatility
ROM            Read-only    Not possible               Masks             Nonvolatile
PROM           Read-only    Not possible               Electrically      Nonvolatile
Flash memory   Read-mostly  Electrically, block-level  Electrically      Nonvolatile
Dynamic RAM – DRAM
RAM types-
• Dynamic RAM
• Static RAM
DRAM
Made with cells that store data as charge on
capacitors.
Presence or absence of charge in a capacitor is
interpreted as a binary 1 or 0.
Requires periodic charge refreshing to
maintain data storage.
The term dynamic refers to the tendency of the stored charge to leak away, even with power continuously applied.
DRAM cell structure for 1 bit
(Bits stored as charge in capacitors)
The address line is activated when the bit value is to be read or written.
WRITE OPERATION
A voltage signal is applied to the bit line; a high voltage represents 1, a low voltage represents 0.
A signal is applied to the address line, and the charge is transferred to the capacitor.
READ OPERATION
The stored bit is located using the address line.
The address line is selected and the transistor turns on.
The charge stored on the capacitor is fed out onto the bit line and to a sense amplifier.
Read Only Memory (ROM)
Contains a permanent pattern of data that
cannot be changed or added to.
No power source is required to maintain the bit
values in memory.
Data or program is permanently in main
memory and never needs to be loaded from a
secondary storage device.
Data is actually wired into the chip as part of the
fabrication process
Disadvantages of this:
No room for error; if one bit is wrong, the whole batch of ROMs must be thrown out.
The data insertion step includes a relatively large fixed cost.
Programmable ROM (PROM)
Less expensive alternative.
Nonvolatile and may be written into only once.
Writing process is performed electrically and may
be performed by supplier or customer at a time
later than the original chip fabrication.
Special equipment is required for the writing
process.
Provides flexibility and convenience.
(Masked ROM remains attractive for high-volume production runs.)
Read-Mostly Memory

EPROM - erasable programmable read-only memory
The erasure process can be performed repeatedly.
More expensive than PROM, but it has the advantage of the multiple update capability.

EEPROM - electrically erasable programmable read-only memory
Can be written into at any time without erasing prior contents.
Combines the advantage of non-volatility with the flexibility of being updatable in place.
More expensive than EPROM.

Flash memory
Intermediate between EPROM and EEPROM in both cost and functionality.
Uses an electrical erasing technology; does not provide byte-level erasure.
The microchip is organized so that a section of memory cells is erased in a single action or "flash".
Advanced DRAM Organization
One of the most critical system bottlenecks when using high-performance processors is the interface to main internal memory.
The traditional DRAM chip is constrained both by its internal architecture and by its interface to the processor's memory bus.
A number of enhancements to the basic DRAM architecture have been explored: SDRAM, DDR-DRAM, and RDRAM.
SDRAM
With synchronous access the DRAM moves data in and out under
control of the system clock.
• The processor issues the instruction and address information.
• The DRAM then responds after a set number of clock cycles.
• Meanwhile the master can perform other tasks while the
SDRAM is processing.
RDRAM
Developed by Rambus; adopted by Intel for its Pentium and Itanium processors.
Designed to transfer data at faster rates.
The bus delivers address and control information using an asynchronous block-oriented protocol.
The RDRAM gets a memory request over the high-speed bus; the request contains the desired address, the type of operation, and the number of bytes in the operation.
Cache
[Diagram: CPU <-> Cache Memory (SRAM) <-> Main Memory (DRAM), joined by bus connections]
If the needed instruction is in the cache, it is fetched from the cache - a very fast operation.
If not, the CPU has to fetch it from main memory - a much slower process.
Cache operation - overview
CPU requests contents of memory
location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from
main memory to cache
Then deliver from cache to CPU
Cache includes tags to identify which
block of main memory is in each cache
slot
Typical Cache Organization
Cache Memory Structure
Cache Design
Size does matter
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
Cache Addresses – Virtual Memory
Virtual memory
Facility that allows programs to address
memory from a logical point of view, without
regard to the amount of main memory
physically available.
When used, the address fields of machine
instructions contain virtual addresses.
For reads from and writes to main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory.
Logical and Physical Cache
Multilevel Cache
As logic density has increased it has become
possible to have a cache on the same chip as the
processor.
The on-chip cache reduces the processor’s
external bus activity and speeds up execution
time and increases overall system performance.
When the requested instruction or data is FOUND in the ON-CHIP CACHE, bus access is eliminated.
On-chip cache accesses complete appreciably FASTER, in zero-wait-state bus cycles.
During this process the bus is free to support other transfers.
Multilevel Cache
Two-level cache:
Internal (on-chip) cache as level L1
External (off-chip) cache as level L2
Potential savings due to the use of an L2 cache
depends on the hit rates in both the L1 and L2
caches.
Multilevel Cache design features:
For an off-chip L2 cache, use a separate data
path, to reduce the burden on system bus.
Incorporate the L2 cache on the processor chip
for improving performance.
The use of multilevel caches complicates all of the design issues related to caches, including size, mapping function, and replacement algorithm.
Mapping Functions
Because there are fewer cache lines than
main memory (MM) blocks, an algorithm is
needed for mapping main memory blocks
into cache lines.
The transformation of data from MM to CM is
referred to as a Mapping Process.
Tells which word of MM will be placed at
which location of cache.
Three techniques can be used -
Direct Mapping
Fully Associative Mapping
Set Associative Mapping
Assumptions for Mapping Functions
Cache Memory (CM): 64 KB; line size = 4 bytes = 2^2; i.e. 16K lines of 4 bytes each.
Main Memory (MM): 16 MB; block size = 4 bytes = 2^2; i.e. 4M blocks of 4 bytes each; addresses are 24 bits total.
Powers of two: 2^10 = 1 KB, so 64 KB = 2^10 · 2^6 = 2^16; 2^20 = 1 MB, so 16 MB = 2^20 · 2^4 = 2^24; 2^30 = 1 GB, so 4 GB = 2^30 · 2^2 = 2^32.
Direct Mapping
Each block of main memory maps to exactly one cache line: line number = block number mod m, where m is the number of lines.
i.e. the FIRST m blocks map to the m lines of the cache in order.
The NEXT m blocks map in the same fashion, i.e.
Block Bm to L0
Block Bm+1 to L1
Direct Mapping Address Structure
Tag (s-r bits) - distinguishes among MM blocks that map to the same cache line.
Line (r bits) - line number in the cache where the MM block will be placed.
Word (w bits) - identifies a unique word within a block of MM, i.e. the block offset.
Address length = s + w, i.e. Tag + Line + Word.
Tag bits = log2(MM blocks / CM lines), or total bits - line bits - word bits.
Line bits = log2(cache size / block size) = log2(# lines in cache).
Word bits = log2(# words per block).
Direct Mapping Address Structure
Tag s-r = 8 bits | Line or Slot r = 14 bits | Word w = 2 bits
24-bit address
4-byte block, i.e. 2^2, so Word w = 2 bits
24 - 2 = 22-bit block identifier
8-bit tag (= 22 - 14)
14-bit slot or line
No two blocks in the same line have the same Tag
field
Check contents of cache by finding line and
checking Tag
Direct Mapping Cache Organization
Direct Mapping Example
Direct Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = address space / block size = 2^(s+w) / 2^w = 2^s
Number of lines in cache = m = 2^r
Size of tag = (s - r) bits
Direct Mapping pros & cons
Simple
Inexpensive
There is a fixed cache location for any
given block.
If a program happens to reference words
repeatedly from two different blocks that
map into the same line, then the blocks
will be continually swapped in the cache,
and the hit ratio will be low.
Fully Associative Mapping
Fastest and most flexible.
Many to many function mapping.
Any block of MM can be presented in any
line of CM.
Memory address is interpreted as tag and
word.
Tag uniquely identifies block of memory.
Fully Associative Mapping
Compare tag field with tag entry in cache to
check for hit.
Every line's tag is examined for a match simultaneously, using one comparator per line; the comparator outputs are ORed together to produce the final HIT/MISS signal.
This cache-searching hardware gets expensive.
Fully Associative Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of lines in cache = undetermined (any block can occupy any line)
Size of tag = s bits

Set Associative Mapping
The cache consists of a number of sets, each of which consists of a number of lines.
Each set behaves as a small k-line associative-mapped cache, i.e. the number of lines in a set is k (k-way set associative).
The relationships are -
m = v × k and i = j mod v
(m = number of lines, v = number of sets, k = lines per set, i = set to which MM block j maps)
Example - 2-way set associative with the running assumptions (CM = 64 KB, MM = 16 MB, block size = 4 bytes):
Tag = 9 bits | Set = 13 bits | Word = 2 bits
(16K lines / 2 lines per set = 8K = 2^13 sets; tag = 24 - 13 - 2 = 9 bits)
Write Through
Simplest technique.
All writes go to main memory as well as
cache.
Multiple CPUs can monitor main memory
traffic to keep local CPU cache up to date.
Disadvantages:
Lots of traffic creating a bottleneck.
Slows down writes
Write Back
Updates are made only in the cache; a dirty bit is set for the line, and the block is written back to main memory only when the line is replaced.
Minimizes memory writes, but portions of main memory can be temporarily out of date.
Line Size
When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved.
Pentium 4 Core Processor
Fetch/Decode Unit
Fetches program instructions from L2 cache
Decode into a series of micro-ops
Stores the results in L1 instruction cache
Out of order execution logic
Schedules execution of micro-ops, based on data dependence and resource
availability.
Different fetching and executing order.
May speculatively execute
Execution units
Execute micro-ops
Fetch the required data from L1 cache
Temporary store results in registers
Memory subsystem
L2 cache + L3 cache + system bus
Used to access main memory when the L1 and L2 caches have a cache miss
and to access the system I/O resources
Pentium 4 Design Reasoning
Decodes instructions into RISC-like micro-ops before storing them in the L1 cache.
Performance improved by separating decoding from
scheduling & pipelining.
Data cache is configured as write through.
The L1 cache is controlled by 2 bits in a control register:
CD = cache disable
NW = not write-through
Two instructions are provided to invalidate (flush) the cache and to write back and then invalidate.
EXTERNAL
MEMORY
Types of External Memory
Magnetic Disk
RAID
Removable
Optical
CD-ROM
CD-Recordable (CD-R)
CD-R/W
DVD
Magnetic Tape
Read and Write Mechanisms
Recording & retrieval via conductive coil called a head
May be single read/write head or separate ones
During read/write, head is stationary, platter rotates
Write
Current through coil produces magnetic field
Pulses sent to head
Magnetic pattern recorded on surface below
Read (traditional)
Magnetic field moving relative to coil produces current
Coil is the same for read and write
Read (contemporary)
Separate read head, close to write head
Partially shielded magnetoresistive (MR) sensor
Electrical resistance depends on direction of magnetic field
High frequency operation
Higher storage density and speed
Data Organization and
Formatting
Concentric rings or tracks
Gaps between tracks
Reduce gap to increase capacity
Same number of bits per track (variable
packing density)
Constant angular velocity
Tracks divided into sectors
Minimum block size is one sector
May have more than one sector per
block
Magnetic Disk – Disk Layout