CA MOD 2
Memory Hierarchy

[Figure: memory hierarchy pyramid ordered by speed, size and cost — cache (SRAM), main memory, magnetic disks, magnetic tape.]

* The CPU is faster than memory access.
* A hierarchical memory system can be used to close the speed gap. The higher levels are expensive, but they are fast.
* As we move down the hierarchy, the cost generally decreases, whereas the access time increases.

Memory Hierarchy contd.

* Cache is very high-speed memory used to increase the speed of processing by making the current program and data available to the CPU at a rapid rate. It is employed in computer systems to compensate for the speed difference between main memory and the processor. Cache memory consists of static RAM cells.
* Main memory (primary memory) stores the programs and data that are currently needed by the processor. All other information is stored in secondary memory and transferred to main memory when needed.
* Secondary memory provides backup storage. The most common secondary memories used in computer systems are magnetic disks and magnetic tapes. They are used for storing system programs, large data files and other backup information.

Locality of Reference

References to memory at any given interval of time tend to be confined within a few localized areas of memory. This phenomenon is known as the property of locality of reference. There are two types of locality of reference:
i) Temporal locality: recently referenced instructions are likely to be referenced again in the near future. In iterative loops and subroutines, a small code segment is referenced repeatedly.
ii) Spatial locality: the tendency of a program to access instructions whose addresses are near one another. For example, array accesses are generally to consecutive addresses.

Cache Memory

What is cache?
* Cache memory is a small-sized type of volatile memory.
* Cache is temporary memory. The data from the programs and files you use most is stored in this temporary memory, which is also the fastest memory in your computer.
* Cache storage is limited and very expensive for its space.

Cache Memory

The data or contents of main memory that are used frequently by the CPU are stored in the cache memory so that the processor can access that data in a shorter time. Whenever the CPU needs to access memory, it first checks the cache memory. If the data is not found in the cache, the CPU moves on to main memory. Cache memory is placed between the CPU and main memory.

[Block diagram: CPU <-> cache (byte or word transfer); cache <-> main memory (block transfer).]

The cache is the fastest component in the memory hierarchy and approaches the speed of the CPU components.

When the CPU refers to memory for a word: if the word is found in the cache it is called a cache hit, and if it is not found, a cache miss. The ratio of the number of hits to the total number of CPU references to memory is called the hit ratio.

Total number of CPU references to memory = number of hits + number of misses
Hit ratio = number of hits / (number of hits + number of misses)
Miss ratio = number of misses / (number of hits + number of misses) = 1 - hit ratio

Cache Memory contd.

Cache access time = t_c, main memory access time = t_m, hit ratio = h.
Average Memory Access Time = h x t_c + (1 - h) x (t_m + t_c)

Example: a computer has a cache access time of 100 ns, a main memory access time of 1000 ns and a hit ratio of 0.9. Calculate the Average Memory Access Time.
Average Memory Access Time = 0.9 x 100 + (1 - 0.9) x (1000 + 100) ns = 90 + 110 ns = 200 ns

Example: A three-level memory system has a cache access time of 15 ns, a main memory access time of 25 ns and a disk access time of 40 ns, with a cache hit ratio of 0.96 and a main memory hit ratio of 0.9.
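The single-level formula just applied can be checked numerically; a minimal sketch (the function name is my own choice), reproducing the 200 ns example above:

```python
def amat(hit_ratio, t_cache, t_main):
    """Average memory access time for one cache level.

    A miss costs the main memory access plus the cache lookup
    that discovered the miss: t_main + t_cache."""
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_main + t_cache)

# 100 ns cache, 1000 ns main memory, hit ratio 0.9
print(round(amat(0.9, 100, 1000)))  # 200
```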
What is the Average Memory Access Time?

Cache access time = t_c = 15 ns
Main memory access time = t_m = 25 ns
Disk access time = t_d = 40 ns
Hit ratio of cache = h_c = 0.96
Hit ratio of main memory = h_m = 0.9

Average Memory Access Time = h_c x t_c + (1 - h_c) x h_m x (t_m + t_c) + (1 - h_c) x (1 - h_m) x (t_d + t_m + t_c)
= 0.96 x 15 + 0.04 x 0.9 x (25 + 15) + 0.04 x 0.1 x (40 + 25 + 15) ns
= 14.4 + 1.44 + 0.32 ns = 16.16 ns

Cache Performance

One method to evaluate cache performance is to expand CPU execution time. CPU execution time is the product of the clock cycle time and the sum of the CPU clock cycles and the memory stall cycles.

CPU execution time = (CPU clock cycles + Memory stall cycles) x clock cycle time
CPU clock cycles = IC x CPI

CPU clock cycles is the product of IC (Instruction Count, the total number of instructions executed) and the number of clock cycles needed per instruction (CPI).

Memory stall cycles are the cycles during which the CPU is stalled waiting for a memory access. The number of memory stall cycles depends on both the number of misses and the miss penalty.
* In general, the miss penalty is the time needed to bring the desired information from a slower unit in the memory hierarchy to a faster unit.
* Here we consider the miss penalty to be the time needed to bring a block of data from main memory to cache.

Memory stall cycles = Number of misses x Miss penalty
                    = IC x (Misses / Instruction) x Miss penalty
                    = IC x (Memory accesses / Instruction) x Miss rate x Miss penalty

CPI Example

Assume a benchmark has 100 instructions:
* 25 instructions are loads/stores (each takes 2 cycles)
* 50 instructions are adds (each takes 1 cycle)
* 25 instructions are square roots (each takes 3 cycles)
a) What is the average CPI for this benchmark?
b) How many CPU clock cycles are required for all instructions?
a) Average CPI = ((25 x 2) + (50 x 1) + (25 x 3)) / 100 = (50 + 50 + 75) / 100 = 1.75
b) Clock cycles required for all instructions = IC x CPI = 100 x 1.75 = 175 clock cycles

Memory Stall Cycles

Memory stall cycles = Number of misses x Miss penalty
* The number of misses can be represented as the product of the total number of instructions and the number of misses per instruction.
* The total number of instructions is IC (Instruction Count).

Number of misses = IC x (Misses / Instruction)

* Misses per instruction can be represented as the product of memory accesses per instruction and the miss rate.

Misses / Instruction = (Memory accesses / Instruction) x Miss rate
Memory stall cycles = IC x (Memory accesses / Instruction) x Miss rate x Miss penalty

Cache Performance Examples

Example 1: Assume we have a computer where the clocks per instruction (CPI) is 1 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these are 50% of the total instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?

CPU execution time with all hits = (IC x CPI + 0) x clock cycle time = IC x 1 x clock cycle time

Memory accesses per instruction: every instruction needs one access for the instruction fetch, and 50% of instructions make one data access (operand fetch for a load or write-back for a store), so memory accesses per instruction = 1 + 0.5 = 1.5.

Memory stall cycles = IC x (Memory accesses / Instruction) x Miss rate x Miss penalty
                    = IC x 1.5 x 0.02 x 25 = IC x 0.75

CPU execution time with hits and misses = (IC x 1 + IC x 0.75) x clock cycle time = IC x 1.75 x clock cycle time

Performance ratio = (IC x 1.75 x clock cycle time) / (IC x 1 x clock cycle time) = 1.75

The computer with no cache misses is 1.75 times faster.

Example 2: A processor has an instruction cache and a data cache with miss rates of 2% and 4% respectively. The frequency of loads and stores is on average 36% and the CPI is 2.
The miss penalty can be taken to be 40 cycles for all misses. How much time is needed for CPU execution if 1000 instructions are present in the program? Consider a clock cycle time of 2 ns.

CPU execution time = (CPU clock cycles + Memory stall cycles) x clock cycle time
                   = (IC x CPI + Memory stall cycles) x clock cycle time

Memory stall cycles = IC x (Memory accesses / Instruction) x Miss rate x Miss penalty
                    = IC x (1 x 0.02 + 0.36 x 0.04) x 40 cycles
                    = 1000 x 0.0344 x 40 cycles = 1376 cycles

CPU execution time = (1000 x 2 + 1376) x 2 ns = 6752 ns

Memory Accesses per Instruction

* The instruction cycle has five stages:
  i) IF (Instruction Fetch) - memory access
  ii) ID (Instruction Decode)
  iii) OF (Operand Fetch) - memory access
  iv) EX (Execute)
  v) WB (Write Back) - memory access
* Of these five stages, only three - IF, OF and WB - involve a memory access.
* The IF stage is mandatory for all instructions, so a memory access for the instruction fetch is always required; that is, 100% of instructions.
* OF and WB are optional: OF corresponds to a load operation and WB to a store operation.
* "50% of instructions are loads and stores" means that for 50% of instructions either the OF or the WB stage is required.
* So, for executing an instruction, IF is always required (100% memory access), plus on average 50% additional memory accesses for OF and WB.

If there are two caches - an instruction cache and a data cache - the miss rate is considered twice. Instruction accesses go to the instruction cache and data accesses to the data cache:

(Memory accesses / Instruction) x Miss rate = (Instruction fetches / Instruction) x Instruction cache miss rate + (Data accesses / Instruction) x Data cache miss rate

MIPS

* MIPS (millions of instructions per second) is a metric for computer performance.
* MIPS = instruction count / (execution time x 10^6)
* For example, a program that executes 3 million instructions in 2 seconds has a MIPS rating of 1.5.
* Advantage: easy to understand and measure.
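The arithmetic above can be cross-checked in a few lines of Python. This is a sketch with function names of my own choosing; the numbers reproduce Example 2 (split caches, 1000 instructions, CPI 2, 40-cycle miss penalty, 2 ns clock) and the slide's MIPS example:

```python
def memory_stall_cycles(ic, access_miss_terms, miss_penalty):
    """Stall cycles = IC x sum(accesses/instr x miss rate) x miss penalty."""
    per_instr = sum(accesses * rate for accesses, rate in access_miss_terms)
    return ic * per_instr * miss_penalty

def mips(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

# Split caches: 1 instruction fetch per instruction at a 2% miss rate,
# plus 0.36 data accesses per instruction at a 4% miss rate.
stalls = memory_stall_cycles(1000, [(1.0, 0.02), (0.36, 0.04)], 40)
exec_time_ns = (1000 * 2 + stalls) * 2   # (IC x CPI + stalls) x clock cycle
print(round(stalls), round(exec_time_ns))  # 1376 6752
print(mips(3e6, 2.0))                      # 1.5
```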
Example: Two different compilers are being tested for a 500 MHz machine with three classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles respectively. The first compiler's code uses 5 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions. The second compiler's code uses 10 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions. What are the execution times of the two compilers? What are the MIPS ratings of the two compilers?

MIPS Example (contd.)

Instruction counts (in billions) for each instruction class:
Code from     A    B    C
Compiler 1    5    1    1
Compiler 2    10   1    1

Class A, Class B and Class C require 1, 2 and 3 cycles respectively. The clock frequency is 500 MHz.

CPU clock cycles1 = (5 x 1 + 1 x 2 + 1 x 3) x 10^9 = 10 x 10^9
CPU clock cycles2 = (10 x 1 + 1 x 2 + 1 x 3) x 10^9 = 15 x 10^9
CPU time1 = 10 x 10^9 / (500 x 10^6) = 20 seconds
CPU time2 = 15 x 10^9 / (500 x 10^6) = 30 seconds
MIPS1 = (5 + 1 + 1) x 10^9 / (20 x 10^6) = 350
MIPS2 = (10 + 1 + 1) x 10^9 / (30 x 10^6) = 400

Average Memory Access Time

* A better measure of memory performance is the average memory access time, defined as
  Average memory access time = Hit time + Miss rate x Miss penalty
* So the average memory access time depends on three factors: hit time, miss rate and miss penalty.
* The average memory access time is reduced if these three factors are reduced. First we describe the miss penalty reduction techniques:
  + Multi-level caches
  + Victim caches
  + Early restart and critical word first
  + Read priority over write on miss

Multi-level Caches

* Multi-level caches reduce the miss penalty. The first-level cache (L1 cache) is smaller in size than the second-level cache (L2 cache).
* The L1 cache is an on-chip cache whose access time is near the clock speed of the CPU.
* The L2 cache is an off-chip cache, large enough to capture many accesses that would otherwise go to main memory. So it reduces the miss penalty.
* The speed of the L1 cache affects the clock rate of the CPU, while the speed of the L2 cache affects only the miss penalty of L1.

Multi-level Caches contd.

* Multi-level inclusion: by the multi-level inclusion property, data present in the L1 cache are always present in the L2 cache. Inclusion is desirable because consistency between I/O and the caches can be determined just by checking the L2 cache. The disadvantage of multi-level inclusion is that the L2 cache holds a redundant copy of the L1 cache, so space is wasted in L2.
* Multi-level exclusion: by the multi-level exclusion property, data present in the L1 cache are never found in the L2 cache. A cache miss in L1 results in a swap of blocks between L1 and L2 instead of a replacement of an L1 block with an L2 block. The advantage of multi-level exclusion is that this policy prevents wasting space in the L2 cache.

Multi-level Caches contd.

The average memory access time for a two-level cache is defined by the following formulas:

Average memory access time = Hit time_L1 + Miss rate_L1 x Miss penalty_L1
Miss penalty_L1 = Hit time_L2 + Miss rate_L2 x Miss penalty_L2
Average memory access time = Hit time_L1 + Miss rate_L1 x (Hit time_L2 + Miss rate_L2 x Miss penalty_L2)

Here Miss rate_L1 and Miss rate_L2 are local miss rates. The global miss rate is the number of misses divided by the total number of memory accesses made by the CPU:
Global miss rate of L1 = Miss rate_L1
Global miss rate of L2 = Miss rate_L1 x Miss rate_L2

Memory stalls per instruction can be defined as:
Average memory stalls per instruction = Misses per instruction_L1 x Hit time_L2 + Misses per instruction_L2 x Miss penalty_L2

Numerical on Multi-level Caches

Example: In 1000 memory references there are 40 misses in the first-level cache and 20 misses in the second-level cache. What are the various miss rates? Assume the miss penalty from the L2 cache to memory is 100 clock cycles, the hit time of the L2 cache is 10 clock cycles, the hit time of L1 is 1 clock cycle and there are 1.6 memory references per instruction. What are the average memory access time and the average stall cycles per instruction?
Local miss rate of L1 cache = 40 / 1000 x 100 = 4%
Global miss rate of L1 cache = 4%
Local miss rate of L2 cache = 20 / 40 x 100 = 50%
Global miss rate of L2 cache = 20 / 1000 x 100 = 2%

Average memory access time = Hit time_L1 + Miss rate_L1 x (Hit time_L2 + Miss rate_L2 x Miss penalty_L2)
= 1 + 0.04 x (10 + 0.5 x 100) clock cycles
= 1 + 2.4 clock cycles = 3.4 clock cycles

Numerical on Multi-level Caches contd.

Average memory stalls per instruction = Misses per instruction_L1 x Hit time_L2 + Misses per instruction_L2 x Miss penalty_L2

Let x instructions be present. 1.6 x x = 1000, so x = 1000 / 1.6 = 625.
Average memory stalls per instruction = (40/625) x 10 + (20/625) x 100 clock cycles
= 400/625 + 2000/625 clock cycles = 2400/625 clock cycles = 3.84 clock cycles

Victim Caches

[Figure: processor <-> cache <-> main memory, with a small victim cache placed between the cache and its refill path.]

* Victim caches are another miss penalty reduction technique.
* Suppose a block was discarded and is needed again afterwards. Since the discarded block has already been fetched, it can be reused at small cost.
* Such recycling requires a small fully associative cache placed between the original cache and its refill path.
* This small cache is called a victim cache because it contains only blocks that were discarded from the cache due to a miss.
* The victim cache is checked on a miss, before accessing main memory, to see whether it contains the desired block.
* If the desired block is found in the victim cache, the victim block and the cache block are swapped.

On-chip Cache and Off-chip Cache

* The first-level cache (L1 cache) is smaller in size than the second-level cache (L2 cache).
* The L1 cache is an on-chip cache whose access time is near the clock speed of the CPU.
* The L2 cache is an off-chip cache, large enough to capture many accesses that would otherwise go to main memory.

Cache Mapping

1. Direct mapping
2. Associative mapping
3. Set-associative mapping

Direct Mapping

Main memory size = 128 B = 2^7, cache memory size = 32 B, block size = 4 B (direct mapping).
No. of words present in a block = 4 = 2^2
No. of blocks present in cache = cache memory size / block size = 32 / 4 = 8 = 2^3
No. of blocks present in main memory = main memory size / block size = 128 / 4 = 32 = 2^5
No. of tags present = main memory size / cache memory size = 128 / 32 = 4 = 2^2
Mapping function: main memory block i is mapped to cache block (i mod 8), i.e. i mod (no. of blocks present in cache).
Memory address = 7 bits: Tag (2 bits) | Block (3 bits) | Word (2 bits)

Associative Mapping

Main memory size = 128 B, cache memory size = 32 B, block size = 4 B.
No. of blocks present in cache = cache memory size / block size = 32 / 4 = 8 = 2^3
No. of blocks present in main memory = main memory size / block size = 128 / 4 = 32 = 2^5
Mapping function: any main memory block can be mapped to any block in the cache.
Memory address = 7 bits: Tag (5 bits) | Word (2 bits)

Set-Associative Mapping

n-way set-associative mapping, where n = 2^x and x = 1, 2, 3, ...
If x = 1 then n = 2: 2-way set-associative mapping.
If x = 2 then n = 4: 4-way set-associative mapping.
If x = 3 then n = 8: 8-way set-associative mapping.
No. of blocks = cache size / block size
No. of sets = no. of blocks / no. of ways

2-way Set-Associative Mapping

Main memory size = 128 B, cache memory size = 32 B, block size = 4 B, no. of ways = 2.
No. of blocks present in cache = 8
No. of sets present in cache = no. of blocks present in cache / no. of ways = 8 / 2 = 4 = 2^2
No. of tags present = (main memory size / cache memory size) x no. of ways = (128 / 32) x 2 = 8 = 2^3
Mapping function: main memory block i is mapped to set (i mod 4), i.e. i mod (no. of sets present in cache).
Memory address = 7 bits: Tag (3 bits) | Set (2 bits) | Word (2 bits)

Direct Mapping

Assume a system has a 2 KB cache, 64 KB main memory and a 16-byte block.
No. of blocks present in cache = cache memory size / block size = 2^11 / 2^4 = 2^7 = 128
No. of blocks present in main memory = main memory size / block size = 2^16 / 2^4 = 2^12 = 4096
No. of tags = main memory size / cache memory size = 2^16 / 2^11 = 2^5 = 32
Memory address = 16 bits: Tag (5 bits) | Block (7 bits) | Word (4 bits)
Advantage: very simple and easy to manage; the search space is minimal compared to the other mappings.
Disadvantage: if a program requires block 0 and block 128 repeatedly, cache misses occur, because block 0 and block 128 map to the same place in the cache.

Associative Mapping

Memory address = 16 bits: Tag (12 bits) | Word (4 bits)
Advantage: there is no limitation on block mapping.
Disadvantage: the search space is maximal compared to the other mappings.

2-way Set-Associative Mapping

Main memory size = 64 KB, cache memory size = 2 KB, block size = 16 B, no. of ways = 2.
No. of blocks present in cache = 128
No. of sets present in cache = no. of blocks present in cache / no. of ways = 128 / 2 = 64 = 2^6
No. of tags present = (main memory size / cache memory size) x no. of ways = 2^5 x 2 = 2^6
Memory address = 16 bits: Tag (6 bits) | Set (6 bits) | Word (4 bits)

Example

Given the following, determine the size of the sub-fields (in bits) in the address for the direct-mapped, associative and set-associative cache schemes.
* We have 256 MB main memory and 1 MB cache memory.
* The block size is 128 bytes.
* There are 8 blocks in a cache set.

Memory address = 28 bits (2^28 = 256 MB). The block size is 128 bytes = 2^7, so the word field is 7 bits.
No. of blocks present in cache = cache memory size / block size = 2^20 / 2^7 = 2^13

For direct mapping: no. of tags = main memory size / cache memory size = 2^28 / 2^20 = 2^8
For 8-way set-associative mapping:
No. of sets present in cache = no. of blocks present in cache / no. of ways = 2^13 / 2^3 = 2^10
No. of tags present = (main memory size / cache memory size) x no. of ways = 2^8 x 2^3 = 2^11

Direct mapping:        Memory address 28 bits: Tag (8 bits) | Block (13 bits) | Word (7 bits)
Associative mapping:   Memory address 28 bits: Tag (21 bits) | Word (7 bits)
8-way set-associative: Memory address 28 bits: Tag (11 bits) | Set (10 bits) | Word (7 bits)
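All of these field-width calculations follow one pattern; the helper below is a sketch of my own naming (it assumes power-of-two sizes), shown reproducing the 256 MB / 1 MB / 128 B example just worked out:

```python
from math import log2

def address_fields(main_bytes, cache_bytes, block_bytes, ways=None):
    """Return (tag, index, word) bit widths for a memory address.

    ways=1 means direct mapped, ways=None fully associative,
    any other value n-way set associative."""
    addr_bits = int(log2(main_bytes))
    word_bits = int(log2(block_bytes))
    blocks_in_cache = cache_bytes // block_bytes
    if ways is None:
        index_bits = 0                            # no block/set field
    else:
        index_bits = int(log2(blocks_in_cache // ways))
    return addr_bits - index_bits - word_bits, index_bits, word_bits

MB = 2 ** 20
print(address_fields(256 * MB, MB, 128, ways=1))  # (8, 13, 7)  direct
print(address_fields(256 * MB, MB, 128))          # (21, 0, 7)  associative
print(address_fields(256 * MB, MB, 128, ways=8))  # (11, 10, 7) 8-way
```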
\ D Dineet mopped cache re Coy Sek Associative | CAChA 4 No of worols psursunt iw block =6 N bi iu twit + Se at 0 TU = : of block pxrteut euBle I Doze 3 10 g Re AS —- Bi i ‘ 13 Q ee No of tags pruemr 2 » s: 4 se ot main memoxy size... aix2° 2g eeeeorr © 122 Coens memory S7e [ea | Block | word | > dinect NS ea \ mo af block prusint- in Cone oS No ef Sit psuant im coche = 10 | a s No of Taq Prusent = Main memory Aize Coren ‘remo sap Ame No peal Lome] gory ct 10 6 ea acta sae aeWrite-through and Write-back method Write-through : * The simplest and most commonly used procedure. * During write operation when the cache location is updated at the same time main memory also updated. + main memory always contains the same data as the cache Advantage: This characteristic is important in systems with Direct Memory Access (DMA) transfers. It ensures that the data residing in main memory are valid and DMA would transfer the most recent updated data. Disadvantage: For every modification of cache, main memory access required. write-back: * In this method only the cache location is updated during a write operation. * The location is then marked by a flag or modified bit so that later when the word is removed from the cache it is copied into main memory. Advantage: During the time a word resides in the cache, it may be updated several times. For this reason repeatedly memory access is not required for a word modification. Disadvantage: DMA transfer faces problem.Different types of misses * Compulsory miss- The very first access to a block can’t be in the cache, so the block must be brought into the cache. That means at the initial stage no blocks are present in cache when program begins. * Capacity miss- If the cache can’t contain all the blocks needed during execution of a program capacity misses will occur. Cache is too small. 
* Conflict miss: if the block placement strategy is set-associative or direct-mapped, conflict misses will occur, because a block may be discarded and later retrieved if too many blocks map to its set. These misses are also called collision misses.

Miss Rate Reduction Techniques

The following techniques are used to reduce the miss rate:
1) Larger block size
2) Larger caches
3) Compiler optimization

1) Larger block size:
* Using larger blocks in the cache, the miss rate can be reduced. Larger block sizes reduce compulsory misses, because a larger block takes advantage of spatial locality.
* At the same time, larger blocks increase the miss penalty. Since they reduce the number of blocks in the cache, larger blocks may also increase conflict misses, and even capacity misses if the cache is small.
* Choose the optimum block size such that the miss rate is reduced without increasing the other factors.

2) Larger caches:
* Larger caches reduce capacity misses.
* Drawbacks are a longer hit time and higher cost.
* This technique is especially popular in off-chip caches.

3) Compiler optimization:
* The previous miss rate reduction techniques require changes to the hardware: larger blocks, larger caches, higher associativity, or pseudo-associativity. This technique reduces the miss rate using a software approach.
* Loop interchange:

for (j = 0; j < 100; j++)
    for (i = 0; i < 5000; i++)
        x[i][j] = 2 * x[i][j];

This code has nested loops that access data in memory in non-sequential order. Simply exchanging the nesting of the loops makes the code access the data in the order in which they are stored:

for (i = 0; i < 5000; i++)
    for (j = 0; j < 100; j++)
        x[i][j] = 2 * x[i][j];

Here the memory access is sequential, and this technique reduces misses by improving spatial locality.

Hit Time Reduction Techniques

Hit time is critical because it affects the clock rate of the processor.
The following techniques are used to reduce cache hit time:

1) Small and simple caches:
* Smaller hardware is faster, so a small cache certainly has a lower hit time.
* A smaller cache is easier to fit on-chip; otherwise off-chip access time is added.
* A simple cache means a direct-mapped cache. Here the tag length is shorter than in the other cache mapping techniques, so the searching time is reduced.

2) Avoid address translation to the cache:
* Translating a virtual address to a physical address takes extra time.
* Use the virtual address for the cache, since hits are much more common than misses. A cache that uses virtual addresses is called a virtual cache.
* Protection security is reduced, so protection information is added to the virtual cache.

Main Memory Organizations for Improving Performance

* Performance measures of main memory emphasize both latency and bandwidth.
* Memory bandwidth is the number of bytes read or written per unit time. In other words, bandwidth is how many bytes are accessed from main memory per unit time.
* Memory latency is the time gap between two consecutive word accesses. Since main memory uses DRAM cells, an extra precharge time (periodic refresh time) is needed to access a word.

Wider Main Memory for Higher Bandwidth

[Figure: CPU - cache - main memory, with the cache and memory widened to multiple words.]

* First-level caches are often organized with a physical width of one word because most CPU accesses are that size.
* Doubling or quadrupling the width of the cache and the memory will therefore double or quadruple the memory bandwidth.
* A wider memory can pair a narrow L1 cache with a wide L2 cache.
* There is a cost in the wider connection between the CPU and memory, typically called a memory bus.
* The CPU will still access the cache one word at a time, so there now needs to be a multiplexer between the cache and the CPU. A second-level cache can help, since the multiplexing can be placed between the first- and second-level caches.
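The miss-penalty trade-off between these organizations can be made concrete with a small calculation. The helper below is a sketch of my own; its default cycle counts (4 cycles to send an address, 56 per access, 4 to transfer a word) are illustrative assumptions in the style of textbook examples, and it also models word-interleaved banks for comparison:

```python
def miss_penalty(words_per_block, width_words=1, banks=1,
                 addr=4, access=56, transfer=4):
    """Clock cycles to fetch one cache block from main memory.

    width_words: physical width of the memory and bus, in words.
    banks: number of interleaved 1-word-wide banks (overlapped access)."""
    if banks > 1:
        rounds = -(-words_per_block // banks)      # ceiling division
        # One overlapped access per round, then one transfer per word.
        return rounds * (addr + access) + words_per_block * transfer
    rounds = -(-words_per_block // width_words)
    return rounds * (addr + access + transfer)

for cfg in [dict(width_words=1), dict(width_words=2),
            dict(width_words=4), dict(banks=4)]:
    print(cfg, miss_penalty(4, **cfg))  # 256, 128, 64, 76 cycles
```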
Simple Interleaved Memory for Higher Bandwidth

* Memory chips can be organized in banks so as to read or write multiple words at a time rather than a single word.
* In general, the purpose of interleaved memory is to take advantage of the potential memory bandwidth of all the chips in the system.
* Most memory systems activate only those chips containing the needed words, so less power is required in the memory system.
* The banks are often one word wide so that the width of the bus and the cache need not change, but sending an address to several banks permits them all to read simultaneously.
* With four banks interleaved at the word level, bank 0 has all words whose address modulo 4 is 0, bank 1 has all words whose address modulo 4 is 1, and so on.

Example

Assume the performance of a 1-word-wide primary memory organization is:
* 4 clock cycles to send the address
* 56 clock cycles for the access time per word
* 4 clock cycles to send a word of data
Given a cache block of 4 words, and that a word is 8 bytes, calculate the miss penalty and the effective memory bandwidth. Recompute the miss penalty and the memory bandwidth assuming we have:
* a main memory width of 2 words
* a main memory width of 4 words
* interleaved main memory with 4 banks, each bank 1 word wide.
Ans:
* In the case of a main memory width of one word:
  Miss penalty = (4 + 56 + 4) clock cycles x 4 = 256 clock cycles
  Memory bandwidth = 32 / 256 = 1/8 byte per clock cycle
* In the case of a main memory width of 2 words:
  Miss penalty = (4 + 56 + 4) clock cycles x 2 = 128 clock cycles
  Memory bandwidth = 32 / 128 = 1/4 byte per clock cycle
* In the case of a main memory width of 4 words:
  Miss penalty = (4 + 56 + 4) clock cycles x 1 = 64 clock cycles
  Memory bandwidth = 32 / 64 = 1/2 byte per clock cycle
* In the case of interleaved main memory with 4 banks:
  Miss penalty = (4 + 56 + 4 x 4) clock cycles = 76 clock cycles
  Memory bandwidth = 32 / 76 ≈ 0.4 byte per clock cycle

Logical to Physical Address Translation Using the MMU

[Figure: CPU sends a logical address to the MMU; the relocation register (5000) is added to form the physical address in main memory.]

* The addresses generated by the CPU or a program are called logical addresses.
* The corresponding addresses in physical memory occupied by the executing program are called physical addresses.
* The memory management unit (MMU) maps each logical address to a physical address.
* For example, if the relocation register (base register) holds the value 5000, then logical address 0 is mapped to physical address 5000, and similarly logical address 0340 is mapped to physical address 5340.

Paging

* The program space is known as the logical address space and the main memory space is the physical address space.
* The total logical address space (generated by the CPU) is divided into equal-size partitions; each partition is known as a page.
* Similarly, main memory is divided into equal-size partitions; each partition is known as a frame.
* Page size = frame size, so when a particular page is loaded into a certain frame there is no free space remaining.

[Figure: paging hardware — the logical address (page number, offset) is translated through the page table into a physical address (frame number, offset).]

Paging contd.

Example: The size of the logical address space is 128 KB and the page size is 2 KB.
i) How many pages are there? How many bits are used to represent the page number and the page offset?
ii) How many frames are there if the main memory size is 2 MB? How many bits are used to represent the frame number?
To represent a logical address, 17 bits are required (2^17 = 128 KB).
Total number of pages = logical address space / page size = 2^17 / 2^11 = 2^6 = 64
6 bits (17 - 11) represent the page number; 11 bits represent the page offset.
To represent a physical address, 21 bits are required (2^21 = 2 MB).
Total number of frames = physical address space / page size = 2^21 / 2^11 = 2^10 = 1024
10 bits represent the frame number.

Paging Example for a 32-byte Memory with 4-byte Pages

* Logical address space is 16 bytes (2^4), page size is 4 bytes, physical memory is 32 bytes (2^5).
* Number of pages = logical address space / page size = 2^4 / 2^2 = 4, so the page number is 2 bits and the offset (d) is 2 bits.
* Since page size = frame size, number of frames = physical memory size / frame size = 2^5 / 2^2 = 8, so the frame number (f) is 3 bits.
* There are four pages (page 0 to page 3) and eight frames (frame 0 to frame 7).
* According to the page table, pages 0, 1, 2 and 3 are mapped to frames 5, 6, 1 and 2 respectively.

Page table: page 0 -> frame 5, page 1 -> frame 6, page 2 -> frame 1, page 3 -> frame 2.

Exercise (worked by hand in the original)

Logical address = 6 bits, page size = 8 words. Calculate the number of pages and the number of bits of the page number and the page offset.
Number of pages = 2^6 / 2^3 = 2^3 = 8, so the page number is 3 bits and the page offset is 3 bits.
With the physical address the same size as the logical address, the number of frames = 2^6 / 2^3 = 8 and the frame number is 3 bits.

Paging Hardware with TLB

[Figure: the logical address is first checked against the TLB; on a TLB hit the frame number comes from the TLB, on a TLB miss it comes from the page table in main memory.]
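The address splitting shown in these figures can be sketched directly; the page table below is the one from the 32-byte example above (pages 0-3 mapped to frames 5, 6, 1, 2):

```python
PAGE_SIZE = 4                     # bytes -> 2-bit offset
page_table = [5, 6, 1, 2]         # page number -> frame number

def translate(logical_addr):
    """Split the logical address into page number and offset, then map
    the page number through the page table to a frame number."""
    page, offset = divmod(logical_addr, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset

print(translate(0))   # page 0, offset 0 -> frame 5 -> physical 20
print(translate(13))  # page 3, offset 1 -> frame 2 -> physical 9
```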
‘To overcome this problem a high-speed cache is set up for page table entries called a Translation Look aside Buffer (TLB).. ‘Translation Look aside Buffer (TL8) is nothing but a special cache used to keep track of recently used transactions. For a logical address, the processor examines the TLB ifa page table entry Is present then itis called TLB hit, the frame number is retrieved and then physical address is formed. If a page table entry is not found in the TLB then TLB miss will occur, the page number is used to index the page table. Paging with TLB example {A paging scheme uses a Translation Look aside buffer (718). ‘ATLB access takes 20 ns and a main memory access takes 100 ns. What isthe effective memory access time (in ns) if the TLB hit ratio is 8096? Solution: TLB access time = 20 ns (given) Main memory acces time =200 ns (given) Hit ratio = 0.8 (given) 13 Effective memory access time menery Effective memory access time = hit ratio x ( TLB access time + memory access time) + {A hit ratio) x (TLB access time + 2 x memory access time) Effective memory access time = 0.8 ( 20+ 100) + (1- 0.8) x ( 20+ 2x 100) ns 8x 120+0.2* 220 ns 140 ns IFTLB Is not used here then the effective memory access time will be 200s.ae Virtual Memory + Virtual memory is commonly implemented by swap in and swap out. * Process reside on secondary memory. When the process is ready for execution it be brought in into main memory. + The page is swapped into main memory unless the page will be needed. After completion of execution the pages are swapped out from main memory Swap out OO OO oo 820 00 word 2D wO wow vO 2 Program a Program a Main memory Virtual memory * Virtual memory is a technique that allows the execution of processes that may 1 be completely in main memory . + It represents to the programmer in such a way that programmer thinks a large amount of main memory available, but it really does not exist. 
* The size of the virtual memory is equivalent to the size of secondary memory.
* Each address generated by the CPU, called a virtual address (logical address), is mapped to a physical address in main memory.

Advantages of virtual memory
i) A program larger than the free memory space can be executed using the virtual memory technique.
ii) Programmers do not need to worry about the size of the program.
iii) It allows multiprogramming, which increases CPU utilization.

Page Replacement Algorithms
1. FIFO (First In First Out) algorithm
2. Optimal page replacement algorithm
3. LRU (Least Recently Used) algorithm

FIFO Page Replacement Algorithm
In the FIFO page replacement algorithm, the page that came in first is replaced first.
(Worked trace with 3 available memory frames; the reference string is illegible in the original.)

LRU Page Replacement Algorithm
In the LRU page replacement algorithm, replace the page that has not been used for the longest period of time.
(Worked trace with 3 available memory frames. Number of page faults = 8.)

Optimal Page Replacement Algorithm
In the Optimal page replacement algorithm, replace the page that will not be used for the longest period of time.
(Worked trace with 3 available memory frames.)

(Handwritten trace tables for FIFO, Optimal and LRU page replacement; the reference string appears to be 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 1, 2, 0, but the tables themselves are illegible in the original.)
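The three policies above can be sketched directly. This is a minimal illustration, not the slides' own trace: the reference string and fault counts below are the classic textbook example with 3 frames (the slides' handwritten traces are illegible, so their numbers may differ).

```python
# Sketch (assumed example data): FIFO, LRU and Optimal page replacement,
# each returning the number of page faults for a reference string.

def fifo(refs, nframes):
    frames, faults = [], 0
    for p in refs:
        if p not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)              # evict the oldest arrival
            frames.append(p)
    return faults

def lru(refs, nframes):
    frames, faults = [], 0                 # kept ordered least -> most recent
    for p in refs:
        if p in frames:
            frames.remove(p)               # hit: refresh recency
        else:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)              # evict least recently used
        frames.append(p)
    return faults

def optimal(refs, nframes):
    frames, faults = [], 0
    for i, p in enumerate(refs):
        if p in frames:
            continue
        faults += 1
        if len(frames) < nframes:
            frames.append(p)
        else:
            # evict the page whose next use lies farthest in the future
            # (pages never used again rank farthest of all)
            future = refs[i + 1:]
            victim = max(frames, key=lambda q: future.index(q)
                         if q in future else len(future) + 1)
            frames[frames.index(victim)] = p
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo(refs, 3), lru(refs, 3), optimal(refs, 3))  # 15 12 9
```

Optimal requires knowing future references, so it is unrealizable in practice; it serves as the lower bound the other two are measured against.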
Summary (handwritten slide):
1) FIFO: hit ratio = number of hits / number of references
2) Optimal page replacement: replace the page which will not be used for the longest duration of time in the future.
3) LRU (Least Recently Used) page replacement: replace the page least recently used, i.e. the page referenced farthest back in the past.
(Handwritten worked traces comparing the three algorithms; illegible in the original.)
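The hit-ratio formula in the summary above is a one-liner; the numbers below are illustrative, not taken from the slides' traces:

```python
# Sketch: hit ratio = hits / references = (references - faults) / references.

def hit_ratio(num_references, num_faults):
    return (num_references - num_faults) / num_references

# e.g. 20 references producing 12 faults -> 8 hits -> hit ratio 0.4
print(hit_ratio(20, 12))  # 0.4
```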