Computer Architecture Memory System Virendra Singh Associate - PowerPoint PPT Presentation

Computer Architecture: Memory System
Virendra Singh, Associate Professor
Computer Architecture and Dependable Systems Lab (CADSL), Department of Electrical Engineering, Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in
CS-683: Advanced Computer Architecture, Lecture 6 (13 Aug 2013)

Memory Performance Gap

Why Memory Hierarchy?
• Need lots of bandwidth:
$$BW = \frac{1.0\ \text{inst}}{\text{cycle}} \times \left(\frac{1\ \text{Ifetch}}{\text{inst}} \times \frac{4\ \text{B}}{\text{Ifetch}} + \frac{0.4\ \text{Dref}}{\text{inst}} \times \frac{4\ \text{B}}{\text{Dref}}\right) \times \frac{1\ \text{Gcycles}}{\text{sec}} = 5.6\ \frac{\text{GB}}{\text{sec}}$$
• Need lots of storage
– 64 MB (minimum) to multiple TB
• Must be cheap per bit
– (TB × anything) is a lot of money!
• These requirements seem incompatible
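The bandwidth figure above is a product of rates: instructions per cycle, bytes moved per instruction, and cycles per second. A minimal Python sketch of that arithmetic follows; the function name `required_bandwidth` and its parameter names are illustrative, not taken from the slides.

```python
def required_bandwidth(ipc, fetch_per_inst, fetch_bytes, dref_per_inst, dref_bytes, clock_hz):
    """Bytes/second the memory system must supply (hypothetical helper)."""
    # Bytes moved per instruction: instruction fetches plus data references
    bytes_per_inst = fetch_per_inst * fetch_bytes + dref_per_inst * dref_bytes
    # inst/cycle x bytes/inst x cycles/sec = bytes/sec
    return ipc * bytes_per_inst * clock_hz

# Slide values: 1.0 inst/cycle, one 4 B Ifetch and 0.4 4 B Drefs per inst, 1 GHz clock
bw = required_bandwidth(1.0, 1, 4, 0.4, 4, 1e9)
print(f"{bw / 1e9:.1f} GB/s")
```

With the slide's numbers this prints 5.6 GB/s. Doubling the IPC or the clock doubles the demand.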

Memory Hierarchy Design
• Memory hierarchy design becomes more crucial with recent multi-core processors:
– Aggregate peak bandwidth grows with # cores:
  ● Intel Core i7 can generate two data references per core per clock
  ● Four cores and 3.2 GHz clock
    – 25.6 billion 64-bit data references/second
    – + 12.8 billion 128-bit instruction references/second
    – = 409.6 GB/s!
  ● DRAM bandwidth is only 6% of this (25 GB/s)
  ● Requires:
    – Multi-port, pipelined caches
    – Two levels of cache per core
    – Shared third-level cache on chip
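The Core i7 peak-demand arithmetic can be checked the same way. The core count, clock, reference widths and 25 GB/s DRAM figure come from the slide. The rate of one 128-bit instruction fetch per core per clock is our assumption; it is chosen because it reproduces the slide's 12.8 billion instruction references per second.

```python
cores = 4
clock_hz = 3.2e9
data_refs_per_core_per_cycle = 2   # two 64-bit data references (from the slide)
inst_refs_per_core_per_cycle = 1   # one 128-bit instruction fetch (assumption)

data_refs_per_s = cores * clock_hz * data_refs_per_core_per_cycle   # 25.6 billion
inst_refs_per_s = cores * clock_hz * inst_refs_per_core_per_cycle   # 12.8 billion

# 8 bytes per 64-bit data reference, 16 bytes per 128-bit instruction fetch
peak_bw = data_refs_per_s * 8 + inst_refs_per_s * 16
dram_bw = 25e9

print(f"peak demand: {peak_bw / 1e9:.1f} GB/s")
print(f"DRAM supplies {100 * dram_bw / peak_bw:.1f}% of it")
```

Each reference stream contributes 204.8 GB/s, for a total of 409.6 GB/s. The DRAM covers roughly 6.1% of that demand.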

Why Memory Hierarchy?
• Fast and small memories
– Enable quick access (fast cycle time)
– Enable lots of bandwidth (1+ loads/stores/instruction fetches per cycle)
• Slower larger memories
– Capture larger share of memory
– Still relatively fast
• Slow huge memories
– Hold rarely-needed state
– Needed for correctness
• All together: provide the appearance of a large, fast memory at the cost of a cheap, slow memory

Memory Hierarchy

Why Does a Hierarchy Work?
• Locality of reference
– Temporal locality
  ● Reference the same memory location repeatedly
– Spatial locality
  ● Reference near neighbors around the same time
• Empirically observed
– Significant!
– Even a small local storage (8 KB) often satisfies >90% of references to a multi-MB data set
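To see why a small store can satisfy most references, here is a rough simulation. It assumes an 8 KB fully associative cache with 64-byte blocks, LRU replacement and 4-byte words. These parameters and the helper `lru_hit_rate` are illustrative assumptions, not measurements from the lecture.

```python
from collections import OrderedDict

def lru_hit_rate(addresses, cache_bytes=8 * 1024, block_bytes=64):
    """Hit rate of a fully associative LRU cache (simplifying assumption)."""
    capacity = cache_bytes // block_bytes   # number of blocks the cache holds
    cache = OrderedDict()                   # keys are block numbers, oldest first
    hits = 0
    for addr in addresses:
        block = addr // block_bytes
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # now the most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used block
            cache[block] = True
    return hits / len(addresses)

# Spatial locality: one sequential pass over a 1 MB array of 4-byte words
sequential = range(0, 1 << 20, 4)
# Temporal locality: ten passes over a 4 KB array that fits in the cache
looped = [a for _ in range(10) for a in range(0, 4096, 4)]

print(f"sequential: {lru_hit_rate(sequential):.5f}")
print(f"looped:     {lru_hit_rate(looped):.5f}")
```

Under these assumptions the sequential sweep hits 15 of every 16 accesses (0.93750), because there is one miss per 64-byte block. The repeated loop misses only on its first pass (0.99375).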

Why Locality?
• Analogy:
– Library (Disk)
– Bookshelf (Main memory)
– Stack of books on desk (off-chip cache)
– Opened book on desk (on-chip cache)
• Likelihood of:
– Referring to the same book or chapter again?
  ● Probability decays over time
  ● Book moves to bottom of stack, then bookshelf, then library
– Referring to chapter n+1 if looking at chapter n?

Memory Hierarchy
• Temporal locality
– Keep recently referenced items at higher levels
– Future references satisfied quickly
• Spatial locality
– Bring neighbors of recently referenced items to higher levels
– Future references satisfied quickly
(Figure: CPU → I & D L1 Cache → Shared L2 Cache → Main Memory → Disk)

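Performance
$$\text{CPU execution time} = (\text{CPU clock cycles} + \text{Memory stall cycles}) \times \text{Clock cycle time}$$
$$\begin{aligned}
\text{Memory stall cycles} &= \text{Number of misses} \times \text{Miss penalty} \\
&= IC \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty} \\
&= IC \times \frac{\text{Memory accesses}}{\text{Instruction}} \times \text{Miss rate} \times \text{Miss penalty}
\end{aligned}$$
where IC is the instruction count.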
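The formulas above translate directly into code. In this sketch the 1.4 memory accesses per instruction reuses the earlier bandwidth slide (one Ifetch plus 0.4 Drefs). The instruction count, base CPI, 2% miss rate, 100-cycle miss penalty and 1 ns clock are hypothetical values chosen only for illustration.

```python
def cpu_time(ic, cpi_base, accesses_per_inst, miss_rate, miss_penalty, cycle_time):
    """CPU execution time = (CPU clock cycles + memory stall cycles) x cycle time."""
    cpu_cycles = ic * cpi_base
    # Memory stall cycles = IC x accesses/inst x miss rate x miss penalty
    stall_cycles = ic * accesses_per_inst * miss_rate * miss_penalty
    return (cpu_cycles + stall_cycles) * cycle_time

# Hypothetical machine: 1e9 instructions, CPI 1.0, 100-cycle penalty, 1 ns clock
t_ideal = cpu_time(1e9, 1.0, 1.4, 0.0, 100, 1e-9)   # perfect cache
t_real = cpu_time(1e9, 1.0, 1.4, 0.02, 100, 1e-9)   # 2% miss rate
print(f"{t_ideal:.2f} s vs {t_real:.2f} s")
```

Even a 2% miss rate nearly quadruples execution time here (1.00 s vs 3.80 s). This is why miss penalty dominates the discussion of memory hierarchies.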

Four Burning Questions
• These are:
– Placement
  ● Where can a block of memory go?
– Identification
  ● How do I find a block of memory?
– Replacement
  ● How do I make space for new blocks?
– Write Policy
  ● How do I propagate changes?
• Consider these for caches
– Usually SRAM
• Will consider main memory, disks later

Placement
Memory Type | Placement | Comments
Registers | Anywhere; Int, FP, SPR | Compiler/programmer manages
Cache (SRAM) | Fixed in H/W | Direct-mapped, set-associative, fully-associative
DRAM | Anywhere | O/S manages
Disk | Anywhere | O/S manages

Placement
• Address range
– Exceeds cache capacity
• Map address to finite capacity
– Called a hash
– Usually just masks off the high-order bits, keeping the index bits
• Direct-mapped
– Block can only exist in one location
– Hash collisions cause problems
(Figure: 32-bit address split into Index and Offset; the index hashes to one block of the SRAM cache, the block size sets the offset width, and the selected data is driven to Data Out.)
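The "hash" of a direct-mapped cache is only a shift and a mask. The sketch below assumes 64-byte blocks and 64 sets (a 4 KB cache); these sizes and the helper `dm_index` are hypothetical. It shows two addresses exactly one cache-size apart colliding on the same index.

```python
BLOCK_SIZE = 64                              # bytes per block (assumed)
NUM_SETS = 64                                # direct-mapped: one block per set (assumed)
OFFSET_BITS = BLOCK_SIZE.bit_length() - 1    # log2(64) = 6
INDEX_BITS = NUM_SETS.bit_length() - 1       # log2(64) = 6

def dm_index(addr):
    """Drop the offset bits, then mask off everything above the index bits."""
    return (addr >> OFFSET_BITS) & (NUM_SETS - 1)

a, b = 0x1040, 0x2040    # differ by 0x1000 = 4096 B = the cache size
print(dm_index(a), dm_index(b))
```

Both addresses map to index 1, so alternating between them evicts one block each time. That ping-pong effect is the "hash collision" problem named in the slide.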

Placement
• Fully-associative
– Block can exist anywhere
– No more hash collisions
• Identification
– How do I know I have the right block?
– Called a tag check
  ● Must store address tags
  ● Compare against address
• Expensive!
– Tag & comparator per block
(Figure: 32-bit address split into Tag and Offset; each stored tag is compared (?=) against the address tag to signal Hit, and the matching block is driven to Data Out.)
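A fully associative lookup must compare the address tag against every stored tag. Hardware does these comparisons in parallel, with one comparator per block, while the sketch below does them in a loop. It assumes 64-byte blocks, and the names `fa_lookup` and `tags` are hypothetical. `None` stands for an invalid block.

```python
OFFSET_BITS = 6   # 64-byte blocks (assumed); everything above the offset is the tag

def fa_lookup(tags, addr):
    """Return the block slot whose stored tag matches the address tag, or None on a miss."""
    tag = addr >> OFFSET_BITS
    for slot, stored in enumerate(tags):   # hardware: all comparators at once
        if stored == tag:
            return slot
    return None

# Four-block cache: two valid blocks, two invalid (None)
tags = [0x1040 >> OFFSET_BITS, 0x2040 >> OFFSET_BITS, None, None]
print(fa_lookup(tags, 0x2048))   # same block as 0x2040
print(fa_lookup(tags, 0x3040))   # not present
```

The two colliding addresses from the direct-mapped case now coexist. The price is a tag and a comparator for every block.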

Placement
• Set-associative
– Block can be in any of a locations
– Hash collisions:
  ● Up to a colliding blocks still OK
• Identification
– Still perform tag check
– However, only a few (a) in parallel
(Figure: 32-bit address split into Tag, Index, and Offset; the index selects a set holding a data blocks and a tags, and a comparators (?=) check the tags in parallel before the selected data goes to Data Out.)
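A set-associative lookup combines the two previous ideas: the index selects one set, then only that set's a tags are compared. The sketch assumes 64-byte blocks, 32 sets and a = 2 (a 4 KB cache). The names `split`, `tag_array` and `sa_lookup` are hypothetical.

```python
OFFSET_BITS = 6   # 64-byte blocks (assumed)
INDEX_BITS = 5    # 32 sets (assumed)
WAYS = 2          # associativity a (assumed)

def split(addr):
    """Return (tag, index) fields of an address."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index

# tag_array[set][way] holds a tag, or None when that way is invalid
tag_array = [[None] * WAYS for _ in range(1 << INDEX_BITS)]

def sa_lookup(addr):
    """Index selects one set; only its WAYS tags are compared."""
    tag, index = split(addr)
    for way in range(WAYS):
        if tag_array[index][way] == tag:
            return index, way
    return index, None

# Two addresses with the same index can both live in one 2-way set
for way, addr in enumerate((0x1040, 0x2040)):
    tag, index = split(addr)
    tag_array[index][way] = tag
print(sa_lookup(0x1040), sa_lookup(0x2040))
```

Both addresses land in set 1, one in each way, so they no longer evict each other. Only two comparators are needed instead of one per block.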

Placement and Identification
32-bit address: Tag | Index | Offset
Portion | Length | Purpose
Offset | o = log2(block size) | Select word within block
Index | i = log2(number of sets) | Select set of blocks
Tag | t = 32 - o - i | ID block within set
• Consider <BS=block size, S=sets, B=blocks>:
– <64,64,64>: o=6, i=6, t=20: direct-mapped (S=B)
– <64,16,64>: o=6, i=4, t=22: 4-way S-A (S = B/4)
– <64,1,64>: o=6, i=0, t=26: fully associative (S=1)
• Total size = BS × B = BS × S × (B/S)
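The field widths follow mechanically from <BS, S, B>. This sketch recomputes the three configurations above for a 32-bit address. It assumes all sizes are powers of two, and the helper names are hypothetical.

```python
def log2_exact(n):
    """log2 of a power of two, computed exactly with integer bit operations."""
    assert n > 0 and (n & (n - 1)) == 0, "sizes must be powers of two"
    return n.bit_length() - 1

def fields(block_size, sets, blocks, addr_bits=32):
    o = log2_exact(block_size)   # offset: selects the word within a block
    i = log2_exact(sets)         # index: selects the set
    t = addr_bits - o - i        # tag: identifies the block within the set
    ways = blocks // sets        # associativity a = B / S
    return o, i, t, ways

for bs, s, b in [(64, 64, 64), (64, 16, 64), (64, 1, 64)]:
    o, i, t, ways = fields(bs, s, b)
    print(f"<{bs},{s},{b}>: o={o} i={i} t={t} a={ways} size={bs * b} B")
```

All three caches hold 4096 B. Only the split between index and tag bits changes, and with it the associativity (1, 4 and 64 ways).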

Replacement
• Cache has finite size
– What do we do when it is full?
• Analogy: desktop full?
– Move books to bookshelf to make room
• Same idea:
– Move blocks to next level of cache

Replacement
• How do we choose a victim?
– Verbs: victimize, evict, replace, cast out
• Several policies are possible
– FIFO (first-in-first-out)
– LRU (least recently used)
– NMRU (not most recently used)
– Pseudo-random
• Pick victim within the set, where a = associativity
– If a <= 2, LRU is cheap and easy (1 bit)
– If a > 2, it gets harder
– Pseudo-random works pretty well for caches
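FIFO and LRU differ only in whether a hit updates the replacement order. The sketch below counts misses for a single set under each policy. The helper `count_misses` and the block trace are hypothetical examples, not from the slides.

```python
def count_misses(trace, ways, policy):
    """Misses for one cache set of `ways` blocks under 'LRU' or 'FIFO' replacement."""
    resident = []   # front of the list = next victim
    misses = 0
    for block in trace:
        if block in resident:
            if policy == "LRU":
                resident.remove(block)
                resident.append(block)   # a hit makes it most recently used
            continue                     # FIFO ignores hits
        misses += 1
        if len(resident) == ways:
            resident.pop(0)              # evict the victim at the front
        resident.append(block)
    return misses

trace = ["A", "B", "A", "C", "A", "D"]
print(count_misses(trace, 2, "LRU"), count_misses(trace, 2, "FIFO"))
```

LRU takes 4 misses and FIFO takes 5. FIFO evicts the frequently reused block A simply because it arrived first. With a = 2, the order of the two resident entries carries exactly one bit of state, which is why the slide calls 2-way LRU cheap.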

Write Policy
• Memory hierarchy
– 2 or more copies of same block
  ● Main memory and/or disk
  ● Caches
• What to do on a write?
– Eventually, all copies must be changed
– Write must propagate to all levels

Write Policy
• Easiest policy: write-through
• Every write propagates directly through the hierarchy
– Write in L1, L2, memory, disk (?!?)
• Why is this a bad idea?
– Very high bandwidth requirement
– Remember, large memories are slow
• Popular in real systems only to the L2
– Every write updates L1 and L2
– Beyond L2, use write-back policy

Write Policy
• Most widely used: write-back
• Maintain state of each line in a cache
– Invalid: not present in the cache
– Clean: present, but not written (unmodified)
– Dirty: present and written (modified)
• Store state in tag array, next to address tag
– Mark dirty bit on a write
• On eviction, check dirty bit
– If set, write back dirty line to next level
– Called a writeback or castout
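The bandwidth argument for write-back can be made concrete by counting writes that reach the next level. The model below is a deliberately tiny hypothetical: a one-block cache with write-allocate. In it, `None` plays the role of the Invalid state and the `dirty` flag distinguishes Clean from Dirty. The helper name and trace are illustrative.

```python
def next_level_writes(trace, policy):
    """Writes sent to the next level by a one-block cache (hypothetical model).
    trace: list of (op, block) pairs with op 'R' or 'W'."""
    cached, dirty, writes = None, False, 0   # None = Invalid
    for op, block in trace:
        if block != cached:                  # miss: replace the resident block
            if policy == "write-back" and dirty:
                writes += 1                  # writeback (castout) of the dirty line
            cached, dirty = block, False     # new line starts Clean
        if op == "W":
            if policy == "write-through":
                writes += 1                  # every store propagates immediately
            else:
                dirty = True                 # write-back: just mark the line Dirty
    if policy == "write-back" and dirty:
        writes += 1                          # flush a line still Dirty at the end
    return writes

trace = [("W", 0)] * 10 + [("R", 1)]   # ten stores to one block, then a read elsewhere
print(next_level_writes(trace, "write-through"), next_level_writes(trace, "write-back"))
```

Ten stores to the same block cost ten next-level writes under write-through, but only a single castout under write-back, when the read of block 1 evicts the dirty line.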

Write Policy
• Complications of write-back policy
– Stale copies lower in the hierarchy
– Must always check higher levels for dirty copies before accessing a copy in a lower level
• Not a big problem in uniprocessors
– In multiprocessors: the cache coherence problem
• I/O devices that use DMA (direct memory access) can cause problems even in uniprocessors
– Called coherent I/O
– Must check caches for dirty copies before reading main memory

Cache Example
• 32B cache: <BS=4, S=4, B=8>
– o=2, i=2, t=2; 2-way set-associative
– Initially empty (LRU bit = 0 in all four sets)
– Only the tag array is shown (columns: Tag0, Tag1, LRU; one row per set)
• Trace execution of:
Reference | Binary | Set/Way | Hit/Miss
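The reference trace for this example does not survive in the slide text. The sketch below simulates the same cache on a hypothetical trace instead: <BS=4, S=4, B=8>, o=2, i=2, t=2 (so 6-bit addresses), 2-way, with one LRU bit per set starting at 0. The function name `simulate` and the trace values are assumptions for illustration.

```python
def simulate(trace):
    """32 B cache <BS=4, S=4, B=8>: o=2, i=2, t=2, 2-way set-associative, 1-bit LRU."""
    offset_bits, index_bits, num_sets = 2, 2, 4
    tags = [[None, None] for _ in range(num_sets)]   # tag array (Tag0, Tag1), initially empty
    lru = [0] * num_sets                             # LRU bit per set: the way to replace next
    rows = []
    for addr in trace:
        index = (addr >> offset_bits) & (num_sets - 1)
        tag = addr >> (offset_bits + index_bits)
        if tag in tags[index]:
            way, outcome = tags[index].index(tag), "hit"
        else:
            way, outcome = lru[index], "miss"        # empty ways are filled before any eviction
            tags[index][way] = tag
        lru[index] = 1 - way                         # the other way is now least recently used
        rows.append((f"{addr:06b}", index, way, outcome))
    return rows

for row in simulate([0, 4, 16, 2, 32, 16, 0]):      # hypothetical reference trace
    print(*row)
```

Address 2 hits because it shares a block with address 0. Addresses 0, 16 and 32, however, all map to set 0 with three different tags. Because three blocks compete for two ways, every later access to set 0 misses.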
