Associative caches
  1. Associative caches (3rd Ed: pp. 496-504; 4th Ed: pp. 479-487)
     • flexible block placement schemes
     • overview of set associative caches
     • block replacement strategies
     • associative cache implementation
     • size and performance

  2. Direct-mapped caches
     • byte: smallest addressable unit of memory
     • word size: w >= 1, w in bytes
       – natural granularity of the CPU; size of its registers
     • block size: k >= 1, k in words
       – granularity of cache-to-memory transfers
     • cache size: c >= k, c in words
       – total size of cache storage; number of blocks is (c div k)
     • lookup of byte_address in a direct-mapped cache (see the sketch below):
       1. word_address = (byte_address div w);
       2. block_address = (word_address div k);
       3. block_index = (block_address mod (c div k));
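A minimal C sketch of this lookup arithmetic, assuming illustrative parameters (w = 4 bytes/word, k = 4 words/block, c = 1024 words) that are not taken from the slides:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative parameters only -- not from the slides. */
    #define W 4u                  /* word size in bytes   */
    #define K 4u                  /* block size in words  */
    #define C 1024u               /* cache size in words  */
    #define NUM_BLOCKS (C / K)    /* c div k = 256 blocks */

    int main(void) {
        uint32_t byte_address = 0x2A48;

        uint32_t word_address  = byte_address / W;           /* step 1 */
        uint32_t block_address = word_address / K;           /* step 2 */
        uint32_t block_index   = block_address % NUM_BLOCKS; /* step 3 */
        /* the remaining high-order bits form the tag that
           identifies which block occupies this cache slot */
        uint32_t tag = block_address / NUM_BLOCKS;

        printf("block index %u of %u, tag %u\n", block_index, NUM_BLOCKS, tag);
        return 0;
    }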

  3. Placement schemes
     • direct mapped: each address maps to exactly one cache block
       – simple, fast access, high miss rate
     • fully associative: each address can map to any of the (c div k) blocks
       – low miss rate; costly and slow (need to search the whole cache)
     • n-way set associative: each address maps to one set of n blocks
       – each set contains n blocks; number of sets s = (c div (n*k))
       – set_address = block_address mod s;
     • for a fixed number of blocks, increasing associativity means
       – more blocks per set: larger n
       – fewer sets per cache: smaller s
       – more placement flexibility, but more search overhead: n compares per lookup
     (a worked sketch of these formulas follows)
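A small C sketch of the formulas above, using made-up sizes (c = 64 words, k = 4 words/block, so c div k = 16 blocks) to show how the set count s and the set_address of one block change with associativity n:

    #include <stdio.h>

    int main(void) {
        unsigned c = 64, k = 4;        /* made-up cache and block sizes */
        unsigned block_address = 37;   /* an arbitrary block address    */

        /* from 1-way (direct mapped) up to fully associative */
        for (unsigned n = 1; n <= c / k; n *= 2) {
            unsigned s = c / (n * k);  /* number of sets */
            printf("n=%2u ways: s=%2u sets, block %u -> set %u\n",
                   n, s, block_address, block_address % s);
        }
        return 0;
    }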

  4. Possible organisations of 8-block cache

  5. Placement flexibility vs need for search
     [Figure: final contents of an 8-block cache organised as direct-mapped (n=1),
     2-way set associative (n=2), 4-way set associative (n=4) and fully associative
     (n=8), after accessing block addresses 6, 13, 33, 8, 17, 37. Higher
     associativity leaves more of the accessed blocks resident, at the cost of
     searching more blocks per set.]

  6. Real-world caches

     ------------------------------------------------------------
     Feature             Intel P4            AMD Opteron
     ------------------------------------------------------------
     L1 instruction      96KB                64KB
     L1 data             8KB                 64KB
     L1 associativity    4-way set assoc.    2-way set assoc.
     L2                  512KB               1024KB
     L2 associativity    8-way set assoc.    16-way set assoc.
     ------------------------------------------------------------

  7. n-way associative: lookup algorithm

     type block_t is { tag_t tag; bool valid; word_t data[k]; };
     type set_t   is block_t[n];
     type cache_t is set_t[s];
     cache_t cache;

     1. uint block_address = (word_address div k);
     2. uint block_offset  = (word_address mod k);
     3. uint set_index     = (block_address mod s);
     4. set_t set = cache[set_index];
     5. parallel_for (i in 0..n-1) {
          if (set[i].valid and set[i].tag = (block_address div s))
            return set[i].data[block_offset];
        }
     6. MISS! ...
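A runnable C version of this lookup, with a sequential loop standing in for the hardware's parallel tag compare; the geometry and the lookup_word helper are illustrative, not from the slides:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative geometry: 2-way, 4 sets, 4-word blocks. */
    #define N 2u    /* blocks (ways) per set */
    #define S 4u    /* sets per cache        */
    #define K 4u    /* words per block       */

    typedef struct {
        uint32_t tag;
        bool     valid;
        uint32_t data[K];
    } block_t;

    typedef block_t set_t[N];
    static set_t cache[S];

    /* Returns true on a hit and stores the word in *out; false means MISS. */
    static bool lookup_word(uint32_t word_address, uint32_t *out) {
        uint32_t block_address = word_address / K;
        uint32_t block_offset  = word_address % K;
        uint32_t set_index     = block_address % S;
        uint32_t tag           = block_address / S;

        set_t *set = &cache[set_index];
        for (size_t i = 0; i < N; i++) {   /* done in parallel in hardware */
            if ((*set)[i].valid && (*set)[i].tag == tag) {
                *out = (*set)[i].data[block_offset];
                return true;               /* HIT */
            }
        }
        return false;  /* MISS: fetch the block and choose a victim to evict */
    }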

  8. Direct-mapped multi-word cache
     (figure legend: w = bytes/word, k = words/block, s = blocks/set, c = bytes/cache)

  9. 4-way set associative cache
     (figure legend: w = bytes/word, k = words/block, s = blocks/set, c = bytes/cache)

  10. Block replacement policy
      • needed for fully associative or set associative caches
      • random selection
        – simple: just evict a randomly chosen block
        – easy to support in hardware
      • LRU: replace the block that has been unused for the longest time
      • as cache size increases:
        – miss rate decreases under both policies
        – the advantage of LRU over random shrinks

  11. Compare number of misses: LRU replacement
      • block addresses accessed: 0, 8, 0, 6, 8 (cache holds four 1-word blocks)
      • direct-mapped (address → block: 0 → 0, 6 → 2, 8 → 0)
        contents after each access: M0 | M8 | M0 | M0,M6 | M8,M6
        5 misses
      • 2-way set associative (address → set: 0 → 0, 6 → 0, 8 → 0)
        contents after each access: M0 | M0,M8 | M0,M8 | M0,M6 | M8,M6
        4 misses, 1 hit
      • fully associative
        contents after each access: M0 | M0,M8 | M0,M8 | M0,M8,M6 | M0,M8,M6
        3 misses, 2 hits
      (Mx denotes the memory block containing address x; a simulation sketch follows)
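A small C simulation of this example under LRU replacement; the simulate helper and its age-counter scheme are illustrative, not from the slides. Varying the number of ways reproduces the three miss counts above:

    #include <stdio.h>

    #define BLOCKS 4   /* the cache holds four 1-word blocks in total */

    /* Simulate an n-way LRU cache of BLOCKS blocks; returns the miss count. */
    static int simulate(int ways, const int *addrs, int count) {
        int sets = BLOCKS / ways;
        int block[BLOCKS];  /* block[set*ways + way] = resident address, -1 if empty */
        int age[BLOCKS];    /* larger age = more recently used */
        int misses = 0, clock = 0;

        for (int i = 0; i < BLOCKS; i++) { block[i] = -1; age[i] = 0; }

        for (int a = 0; a < count; a++) {
            int set = addrs[a] % sets, hit = -1, victim = set * ways;
            for (int w = 0; w < ways; w++) {
                int slot = set * ways + w;
                if (block[slot] == addrs[a]) hit = slot;
                if (age[slot] < age[victim]) victim = slot;  /* least recently used */
            }
            if (hit < 0) { misses++; hit = victim; block[hit] = addrs[a]; }
            age[hit] = ++clock;  /* mark as most recently used */
        }
        return misses;
    }

    int main(void) {
        const int addrs[] = {0, 8, 0, 6, 8};
        for (int ways = 1; ways <= BLOCKS; ways *= 2)  /* 1-, 2-, 4-way */
            printf("%d-way: %d misses\n", ways, simulate(ways, addrs, 5));
        return 0;
    }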

  12. Associative cache: size and performance
      • resources required
        – storage
        – processing
      • performance
        – miss rate
        – hit time
        – clock speed
      • what is the effect of increasing associativity on
        – resources?
        – performance?

  13. Multi-level on-chip caches
      Intel Nehalem, per core: 32KB L1 I-cache, 32KB L1 D-cache, 256KB L2 cache
      (see the table on the next slide)

  14. 3-level cache organization

      ----------------------------------------------------------------------------
                    Intel Nehalem                     AMD Opteron X4
      ----------------------------------------------------------------------------
      L1 caches     L1 I-cache: 32KB, 64-byte         L1 I-cache: 32KB, 64-byte
      (per core)    blocks, 4-way, approx LRU         blocks, 2-way, LRU replacement,
                    replacement, hit time n/a         hit time 3 cycles
                    L1 D-cache: 32KB, 64-byte         L1 D-cache: 32KB, 64-byte
                    blocks, 8-way, approx LRU         blocks, 2-way, LRU replacement,
                    replacement, write-back/          write-back/allocate,
                    allocate, hit time n/a            hit time 9 cycles
      L2 unified    256KB, 64-byte blocks, 8-way,     512KB, 64-byte blocks, 16-way,
      cache         approx LRU replacement,           approx LRU replacement,
      (per core)    write-back/allocate,              write-back/allocate,
                    hit time n/a                      hit time n/a
      L3 unified    8MB, 64-byte blocks, 16-way,      2MB, 64-byte blocks, 32-way,
      cache         replacement n/a,                  replace block shared by fewest
      (shared)      write-back/allocate,              cores, write-back/allocate,
                    hit time n/a                      hit time 32 cycles
      ----------------------------------------------------------------------------
      n/a: data not available

  15. Virtual memory and TLB (3rd Ed: pp. 511-594; 4th Ed: pp. 492-517)
      • virtual memory: overview
      • page table
      • page faults
      • TLB: accelerating address translation
      • TLB, page table and cache
      • memory read and write

  16. Virtual memory: motivation
      • use main memory as a 'cache' for secondary memory, e.g. disk
        (roughly 10^2 times cheaper per byte and 10^5 times slower than DRAM)
      • to allow
        – sharing memory among multiple programs
        – running programs too large for physical memory
        – automatic memory management
        – anything else?
      • problem:
        – sharing of physical memory (main memory) can only be managed at run time
        – requires mapping and replacement strategies

  17. Virtual memory: introduction
      • compile each program into its own virtual address space
      • translate each virtual address into a physical address or a disk address,
        keeping processes isolated from one another
      • terms
        – page: virtual memory block, usually of fixed size
        – page fault: virtual memory miss
        – relocation: a virtual address can be mapped to any physical address
        – address translation: mapping a virtual address to a physical or disk address

  18. Virtual address
      • a virtual address is the concatenation of a virtual page number and a page offset
      • the number of bits in the offset field determines the page size
      • a page miss is very costly
        – access time: DRAM ~10^2 ns, disk ~10^7 ns
        – so allow a page to be placed anywhere in DRAM and locate it via a page table
          (another example of associative placement)
      (a sketch of the address split follows)
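A C sketch of the split, assuming a hypothetical 32-bit virtual address and 4KB pages (a 12-bit offset field); these sizes are illustrative, not from the slides:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical geometry: 32-bit virtual addresses, 4KB pages. */
    #define OFFSET_BITS 12u   /* page size = 2^12 = 4096 bytes */

    int main(void) {
        uint32_t vaddr  = 0x00402A48;
        uint32_t vpn    = vaddr >> OFFSET_BITS;              /* virtual page number */
        uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1); /* byte within page    */

        printf("vaddr 0x%08x -> vpn 0x%05x, offset 0x%03x\n", vaddr, vpn, offset);
        return 0;
    }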

  19. Page table (for each process)

  20. Page table arrangement
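The two figure slides above show the page table structure; below is a minimal C sketch of the translation it implements, reusing the hypothetical 4KB pages from the previous sketch. The page_table_entry_t layout and the translate function are invented for illustration:

    #include <stdbool.h>
    #include <stdint.h>

    #define OFFSET_BITS 12u          /* 4KB pages, as before         */
    #define NUM_PAGES   (1u << 20)   /* 2^20 pages in a 32-bit space */

    typedef struct {
        bool     valid;              /* page resident in main memory?  */
        bool     dirty;              /* written since it was loaded?   */
        uint32_t frame;              /* physical page number if valid, */
                                     /* otherwise found via disk       */
    } page_table_entry_t;

    static page_table_entry_t page_table[NUM_PAGES];  /* one per process */

    /* Translate a virtual address; returns false to signal a page fault. */
    static bool translate(uint32_t vaddr, uint32_t *paddr) {
        uint32_t vpn    = vaddr >> OFFSET_BITS;
        uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1);

        if (!page_table[vpn].valid)
            return false;            /* page fault: OS must fetch the page */

        *paddr = (page_table[vpn].frame << OFFSET_BITS) | offset;
        return true;
    }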

  21. Design considerations
      • large page size, to amortise the long disk access time
      • fully flexible placement of pages, to reduce page faults
      • clever miss-handling algorithms, to reduce the miss rate (page replacement policy)
      • use write-back (see the sketch below)
        – accumulate writes to a page in memory
        – copy the page back to disk only during replacement
        – a dirty bit in the page table indicates whether the page was written while in memory
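A minimal C sketch of the write-back policy with a dirty bit; the pte_t layout and the note_store/evict_page helpers are invented for illustration:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 12u            /* 4KB pages, as in the earlier sketches */
    #define NUM_PAGES   (1u << 20)

    typedef struct { bool valid, dirty; uint32_t frame; } pte_t;
    static pte_t page_table[NUM_PAGES];

    static void write_page_to_disk(uint32_t vpn) {     /* stub for illustration */
        printf("writing page 0x%05x back to disk\n", vpn);
    }

    /* A store just sets the dirty bit; the written data stays in DRAM. */
    static void note_store(uint32_t vaddr) {
        page_table[vaddr >> OFFSET_BITS].dirty = true;
    }

    /* On replacement, copy the page back to disk only if it was written. */
    static void evict_page(uint32_t vpn) {
        if (page_table[vpn].dirty)
            write_page_to_disk(vpn);   /* all accumulated writes go back at once */
        page_table[vpn].valid = false;
        page_table[vpn].dirty = false;
    }

    int main(void) {
        note_store(0x00402A48);        /* dirties virtual page 0x00402 */
        evict_page(0x00402);           /* triggers a single write-back */
        return 0;
    }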
