
15IF11: Multicore Technology @ PSG Tech, Coimbatore

Session-3

Dr. John Jose


Assistant Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati, Assam.
9th & 10th March 2019
Accessing Cache Memory
[Figure: CPU ↔ Cache ↔ Memory. A cache access takes the hit time; going to memory on a miss adds the miss penalty.]

Average memory access time (AMAT) = Hit time + (Miss rate × Miss penalty)
• Hit Time: Time to find the block in the cache and return it to the processor [indexing, tag comparison, transfer].
• Miss Rate: Fraction of cache accesses that result in a miss.
• Miss Penalty: Number of cycles required to fetch the block from the next level of the memory hierarchy.

AMAT  Thit  f miss * Tmiss


Tmiss means the extra (not total) time (or cycle) for a miss in
addition to Thit, which is incurred by all accesses
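As a quick sanity check, here is a minimal C sketch of the formula; the latency and miss-rate values are illustrative assumptions, not measurements:

    #include <stdio.h>

    /* AMAT = T_hit + f_miss * T_miss, where T_miss is the *extra* miss cost. */
    static double amat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }

    int main(void) {
        /* Assumed values: 1-cycle hit, 5% miss rate, 100-cycle penalty. */
        printf("AMAT = %.2f cycles\n", amat(1.0, 0.05, 100.0));  /* 6.00 */
        return 0;
    }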
How to optimize a cache?
• Reduce average memory access time
• AMAT = Hit Time + Miss Rate × Miss Penalty
• Approaches
  – Reducing the miss rate
  – Reducing the miss penalty
  – Reducing the hit time
Larger Block Size
• Larger block size to reduce miss rate
• Advantages
  – Exploits spatial locality
  – Reduces compulsory misses
• Disadvantages
  – Increases miss penalty: more time to fetch a block into the cache [bus width issue]
  – Increases conflict misses: more blocks map to the same location
  – May bring in useless data and evict useful data [cache pollution]
[Figure: Larger Block Size]
Larger Caches
• Larger cache to reduce miss rate
• Advantages
  – Reduces capacity misses
  – Can accommodate a larger memory footprint
• Drawbacks
  – Longer hit time
  – Higher cost, area, and power
[Figure: Larger Caches]
Higher Associativity
• Higher associativity to reduce miss rate
• Fully associative caches give the lowest miss rate, but at a high hit time; so raise associativity only as far as is practical.
• Advantages
  – Reduces conflict misses
  – Reduces miss rate and eviction rate
• Drawbacks
  – Increases hit time
  – More complex design than direct-mapped
  – More time to search within the set (tag comparison time); see the sketch below.
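To make the tag-comparison cost concrete, here is a minimal C sketch of a set-associative lookup; the geometry (4 ways, 64 sets, 64B blocks) and the struct layout are assumptions for illustration:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define WAYS  4    /* assumed associativity      */
    #define SETS  64   /* assumed number of sets     */
    #define BLOCK 64   /* assumed block size (bytes) */

    struct line { bool valid; uint32_t tag; };
    static struct line cache[SETS][WAYS];

    static bool lookup(uint32_t addr) {
        uint32_t set = (addr / BLOCK) % SETS;   /* index bits          */
        uint32_t tag = (addr / BLOCK) / SETS;   /* remaining high bits */
        for (int w = 0; w < WAYS; w++)          /* search every way... */
            if (cache[set][w].valid && cache[set][w].tag == tag)
                return true;                    /* ...tag match = hit  */
        return false;                           /* miss                */
    }

    int main(void) {
        cache[1][2] = (struct line){ .valid = true, .tag = 0 }; /* preload */
        printf("addr 64: %s\n", lookup(64) ? "hit" : "miss");   /* hit    */
        return 0;
    }

Higher associativity widens the inner loop: more tags must be compared per lookup, which is exactly the hit-time cost noted above.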
[Figure: AMAT vs. cache associativity]
Multilevel Caches
• Multilevel caches to reduce miss penalty
• There is a widening performance gap between processors and memory: caches should be faster to keep pace with the processor, AND larger to bridge the gap to main memory.
• Solution: add another level of cache between the first cache and memory.
  – The first-level cache (L1) can be small enough to match the clock cycle time of the fast processor. [Low hit time]
  – The second-level cache (L2) can be large enough to capture many accesses that would otherwise go to main memory, lessening the effective miss penalty. [Low miss rate] (See the two-level AMAT sketch below.)
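The L1 miss penalty is then itself an AMAT expression for the L2 lookup. A minimal sketch, with all latencies and miss rates as made-up example values:

    #include <stdio.h>

    /* Two-level AMAT: the L1 miss penalty is the AMAT of the L2 lookup.
     * AMAT = T_L1 + m_L1 * (T_L2 + m_L2 * T_mem)                        */
    int main(void) {
        double t_l1 = 2,  m_l1 = 0.10;  /* assumed: 2-cycle L1 hit, 10% miss  */
        double t_l2 = 12, m_l2 = 0.25;  /* assumed: 12-cycle L2 hit, 25% miss */
        double t_mem = 200;             /* assumed: 200-cycle memory penalty  */
        double amat = t_l1 + m_l1 * (t_l2 + m_l2 * t_mem);
        printf("Two-level AMAT = %.1f cycles\n", amat); /* 2 + 0.1*62 = 8.2 */
        return 0;
    }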
[Figure: Multilevel Caches]
Components of a Modern Computer
[Figure: components of a modern computer]
Main Memory in the System
[Figure: a four-core chip. Each core (Core 0–3) has a private L2 cache; all cores share an L3 cache. An on-chip DRAM memory controller drives the DRAM interface, which connects to the DRAM banks.]
DRAM (Dynamic Random Access Memory)
DRAM vs SRAM
• DRAM
  – Slower access (capacitor)
  – Higher density (1T-1C cell)
  – Lower cost
  – Requires refresh (power, performance, circuitry)
  – Manufacturing requires putting capacitor and logic together
• SRAM
  – Faster access (no capacitor)
  – Lower density (6T cell)
  – Higher cost
  – No need for refresh
  – Manufacturing compatible with logic process (no capacitor)
DRAM Subsystem Organization
Channel → DIMM → Rank → Chip → Bank → Row → Column → Cell
The DRAM Subsystem
[Figure: the processor connects over one or more memory channels to DIMMs (dual in-line memory modules); each channel hosts its own DIMM(s).]
Breaking down a DIMM
[Figure: side, front, and back views of a DIMM. The front of the DIMM carries Rank 0, a collection of 8 chips; the back carries Rank 1.]

Rank
[Figure: Rank 0 (front) and Rank 1 (back) each present a 64-bit data interface <0:63>. Both ranks share the memory channel's Addr/Cmd and Data <0:63> lines; chip select CS <0:1> picks which rank responds.]
Breaking down a Rank
[Figure: Rank 0 is built from Chips 0–7. Each chip drives 8 of the 64 data bits: Chip 0 drives <0:7>, Chip 1 drives <8:15>, ..., Chip 7 drives <56:63>, together forming Data <0:63>.]
Breaking down a Chip
[Figure: a chip (e.g., Chip 0) contains multiple banks (Bank 0, ...), all sharing the chip's 8-bit data interface <0:7>.]
Breaking down a Bank
[Figure: a bank is a 2D array of cells, e.g., 16k rows of 2kB each (2k one-byte columns). An activated row is latched into the row buffer; column accesses then read 1B at a time over the chip's <0:7> interface.]
DRAM Rank
[Figure: Bank 0 of a rank built from Chips 0–3, each contributing 8 bits to a 32-bit data path.]
• Rank: a set of chips that respond to the same command and the same address at the same time, but with different pieces of the requested data.
• It is easier to produce an 8-bit chip than a 32-bit chip.
• So produce 8-bit chips, but control and operate them together as a rank to get 32 bits of data in a single read (a sketch follows).
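A minimal sketch of the idea, assembling one 32-bit word from four hypothetical x8 chips; the function name and data values are illustrative:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch: a rank of four x8 chips delivers one 32-bit word per read.
     * Chip i contributes bits <8i : 8i+7>.                              */
    static uint32_t rank_read(const uint8_t chip_data[4]) {
        uint32_t word = 0;
        for (int i = 0; i < 4; i++)
            word |= (uint32_t)chip_data[i] << (8 * i);
        return word;
    }

    int main(void) {
        uint8_t chips[4] = {0xDD, 0xCC, 0xBB, 0xAA};  /* one byte per chip */
        printf("0x%08X\n", rank_read(chips));          /* 0xAABBCCDD       */
        return 0;
    }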
DRAM Bank Operation
[Figure: a bank with a row decoder, cell array, row buffer, and column mux. Access sequence: (Row 0, Column 0) finds the row buffer empty, so Row 0 is activated into the row buffer; (Row 0, Column 1) and (Row 0, Column 85) then hit in the open row, needing only a column access; (Row 1, Column 0) is a row conflict: open Row 0 must be closed before Row 1 can be activated.] A toy model of this behavior follows.
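A minimal C model of the bank state machine; the latency constants are placeholders, not datasheet timings:

    #include <stdio.h>

    /* Toy row-buffer model: returns the access latency in "cycles".
     * T_CAS/T_RAS/T_PRE are illustrative placeholder values.         */
    enum { T_CAS = 15, T_RAS = 15, T_PRE = 15 };

    static int open_row = -1;  /* -1: row buffer empty (bank precharged) */

    static int access_bank(int row) {
        if (row == open_row) return T_CAS;   /* row hit: column access only */
        if (open_row == -1) { open_row = row; return T_RAS + T_CAS; } /* empty */
        open_row = row;                      /* row conflict: close, reopen  */
        return T_PRE + T_RAS + T_CAS;
    }

    int main(void) {
        /* (Row0,Col0) (Row0,Col1) (Row0,Col85) (Row1,Col0) */
        int rows[] = {0, 0, 0, 1};
        for (int i = 0; i < 4; i++)
            printf("access row %d: %d cycles\n", rows[i], access_bank(rows[i]));
        return 0;   /* prints 30, 15, 15, 45 */
    }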
Transferring a Cache Block
[Figure sequence: a 64B cache block at physical address 0x40 in the physical memory space (0x00–0xFFFF…F) is mapped to Channel 0, DIMM 0, Rank 0. Each of the rank's 8 chips supplies 8 bits of Data <0:63> (Chip 0 → <0:7>, Chip 1 → <8:15>, ..., Chip 7 → <56:63>). Row 0 is activated in all chips, then Column 0, Column 1, ... are read in turn; each column read returns 8B of the block, 1B per chip.]
A 64B cache block takes 8 I/O cycles to transfer; during the process, 8 columns are read sequentially (a sketch follows).
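A minimal sketch of the transfer loop, assuming the 8-chip x8 rank above; the data values are stand-ins for whatever the chips return:

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch: 8 column reads (I/O cycles), each delivering 8B of the
     * 64B block, 1B per chip. The byte values are illustrative stubs. */
    int main(void) {
        uint8_t block[64];
        for (int col = 0; col < 8; col++)          /* 8 sequential columns */
            for (int chip = 0; chip < 8; chip++)   /* 8 chips in parallel  */
                block[col * 8 + chip] = (uint8_t)(col * 8 + chip); /* stub */
        printf("moved %d bytes in 8 I/O cycles\n", (int)sizeof block);
        return 0;
    }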


Multiple Banks and Channels
• Multiple banks
  – Enable concurrent DRAM accesses
  – Bits in the address determine which bank an address resides in
• Multiple independent channels
  – Fully parallel, since they have separate data buses
  – Increased bus bandwidth
  – But: more wires, area, and power consumption
  – More pins on the on-chip memory controller
• Enabling more concurrency requires reducing
  – Bank conflicts
  – Channel conflicts
[Figure: multiple banks to reduce delay]
Address Mapping (Single Channel)
• Single-channel system with an 8B memory bus
  – 2GB memory, 8 banks, 16K rows and 2K columns per bank
• Row interleaving: consecutive rows of memory in consecutive banks; accesses to consecutive cache blocks are serviced in a pipelined manner.

  | Row (14 bits) | Bank (3 bits) | Column (11 bits) | Byte in bus (3 bits) |

• Cache block interleaving: consecutive cache block addresses in consecutive banks (64-byte cache blocks); accesses to consecutive cache blocks proceed in parallel. The 11 column bits split into a high part (8 bits) and a low part (3 bits) around the bank bits. A decoding sketch follows.

  | Row (14 bits) | High Column (8 bits) | Bank (3 bits) | Low Column (3 bits) | Byte in bus (3 bits) |
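A C sketch of both decodings for the 31-bit address above; the field widths come from the example, and the masks/shifts follow directly from them:

    #include <stdint.h>
    #include <stdio.h>

    /* Widths: row 14, bank 3, column 11 (= high 8 + low 3), byte-in-bus 3. */
    static void row_interleave(uint32_t a) {
        uint32_t byte = a & 0x7,         col = (a >> 3) & 0x7FF;
        uint32_t bank = (a >> 14) & 0x7, row = (a >> 17) & 0x3FFF;
        printf("row-ilv:   row=%u bank=%u col=%u byte=%u\n", row, bank, col, byte);
    }

    static void block_interleave(uint32_t a) {
        uint32_t byte = a & 0x7,        lo = (a >> 3) & 0x7;
        uint32_t bank = (a >> 6) & 0x7, hi = (a >> 9) & 0xFF;
        uint32_t row = (a >> 17) & 0x3FFF;
        printf("block-ilv: row=%u bank=%u col=%u byte=%u\n",
               row, bank, (hi << 3) | lo, byte);
    }

    int main(void) {
        /* Consecutive 64B blocks: 0x00 and 0x40 */
        block_interleave(0x00);  /* bank 0                                   */
        block_interleave(0x40);  /* bank 1: consecutive blocks, diff. banks  */
        row_interleave(0x40);    /* same bank 0, column 8: pipelined, not ||  */
        return 0;
    }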
Address Mapping (Multiple Channels)
The channel bit (C) can be placed at different positions in the address:

  | C | Row (14 bits) | Bank (3 bits) | Column (11 bits) | Byte in bus (3 bits) |
  | Row (14 bits) | C | Bank (3 bits) | Column (11 bits) | Byte in bus (3 bits) |
  | Row (14 bits) | Bank (3 bits) | C | Column (11 bits) | Byte in bus (3 bits) |
  | Row (14 bits) | Bank (3 bits) | Column (11 bits) | C | Byte in bus (3 bits) |
Basic DRAM Operation
• CPU → controller transfer time
• Controller latency
  – Queuing and scheduling delay at the controller
  – Access converted into basic commands
• Controller → DRAM transfer time
• DRAM bank latency
  – Simple CAS (column address strobe) if the row is "open", OR
  – RAS (row address strobe) + CAS if the array is precharged, OR
  – PRE + RAS + CAS (worst case)
• DRAM → controller transfer time
  – Bus latency (BL)
• Controller → CPU transfer time
A worked sum appears below.
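Summing the components gives the end-to-end read latency; every number below is an assumed placeholder, not a real datasheet value:

    #include <stdio.h>

    /* End-to-end read latency as a sum of the components above. */
    int main(void) {
        int cpu_to_ctrl = 5, ctrl_queue = 10, ctrl_to_dram = 5;
        int bank = 15 + 15 + 15;   /* worst case: PRE + RAS + CAS */
        int bus = 8, dram_to_ctrl = 5, ctrl_to_cpu = 5;
        printf("total = %d cycles\n",
               cpu_to_ctrl + ctrl_queue + ctrl_to_dram + bank
               + bus + dram_to_ctrl + ctrl_to_cpu);   /* 83 */
        return 0;
    }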
[Figure: DRAM Controller Overview]
DRAM Scheduling Policies
• FCFS (first come, first served)
  – Oldest request first
• FR-FCFS (first ready, first come first served)
  – Row-hit requests first, then oldest first
  – Goal: maximize the row buffer hit rate → maximize DRAM throughput
• In practice, scheduling is done at the command level
  – Column commands (read/write) are prioritized over row commands (activate/precharge)
  – Within each group, older commands are prioritized over younger ones
A comparator sketch follows.
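A minimal sketch of the FR-FCFS priority rule as a comparator; the request struct is an illustrative assumption:

    #include <stdbool.h>
    #include <stdio.h>

    /* FR-FCFS: row hits first, then oldest first. */
    struct req { bool row_hit; long arrival; };

    /* Returns true if a should be scheduled before b. */
    static bool fr_fcfs_before(struct req a, struct req b) {
        if (a.row_hit != b.row_hit) return a.row_hit;  /* row-hit first */
        return a.arrival < b.arrival;                  /* then oldest   */
    }

    int main(void) {
        struct req young_hit = { true, 20 }, old_miss = { false, 5 };
        printf("%s\n", fr_fcfs_before(young_hit, old_miss)
                     ? "young row-hit wins" : "old miss wins");
        return 0;
    }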
DRAM Scheduling Policies
• A scheduling policy is essentially a prioritization order
• Prioritization can be based on
  – Request age
  – Row buffer hit/miss status
  – Request type (prefetch, read, write)
  – Request mode (load miss or store miss)
  – Requestor type (CPU, DMA, GPU)
  – Request criticality
    · Is it the oldest miss in the core?
    · How many instructions in the core depend on it?
    · Will it stall the processor?
  – Interference caused to other cores
Row Buffer Management Policies
• Open row
  – Keep the row open after an access
  – Next access might need the same row → row hit
  – Next access might need a different row → row conflict, wasted energy
• Closed row
  – Close the row after an access (if no other request already in the request buffer needs the same row)
  – Next access might need a different row → avoids a row conflict
  – Next access might need the same row → extra activate latency
• Adaptive policies: predict whether or not the next access to the bank will be to the same row
A latency comparison sketch follows.
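A quick latency comparison of the two static policies, reusing the toy timing constants from the bank-operation sketch (placeholders, not datasheet values):

    #include <stdio.h>

    /* Per-access latency under open- vs closed-row policy (toy numbers). */
    enum { T_CAS = 15, T_RAS = 15, T_PRE = 15 };

    int main(void) {
        /* Open row: a hit pays CAS only; a conflict pays PRE+RAS+CAS. */
        printf("open-row:   hit=%d conflict=%d\n", T_CAS, T_PRE + T_RAS + T_CAS);
        /* Closed row: the bank is precharged eagerly, so every access
         * pays RAS+CAS, regardless of which row it targets.           */
        printf("closed-row: any=%d\n", T_RAS + T_CAS);
        return 0;
    }

Open row wins when accesses tend to hit the same row; closed row wins when they tend not to, which is exactly what the adaptive policies try to predict.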
DRAM Refresh
• DRAM capacitor charge leaks over time
• The memory controller must read each row periodically to restore the charge
  – Activate + precharge each row every N ms
  – Typical N = 64 ms (refresh interval)
• Implications on performance?
  – A DRAM bank is unavailable while it is being refreshed
  – Long pause times: if we refresh all rows in a burst, every 64 ms the DRAM is unavailable until the refresh ends
DRAM Refresh
• Burst refresh: all rows refreshed immediately after one another
• Distributed refresh: each row refreshed at a different time, at regular intervals
• Distributed refresh eliminates long pause times (a worked interval follows)
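With distributed refresh, one row is refreshed every refresh-window / row-count. Using the numbers from these slides (16k rows per bank, 64 ms window):

    #include <stdio.h>

    /* Distributed refresh interval: refresh one row every window/rows. */
    int main(void) {
        double window_us = 64000.0;   /* 64 ms refresh window */
        int rows = 16 * 1024;         /* 16k rows per bank    */
        printf("one row every %.2f us\n", window_us / rows);  /* ~3.91 us */
        return 0;
    }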
johnjose@iitg.ac.in
http://www.iitg.ac.in/johnjose/
