CS 6354: Memory Hierarchy II Prioritize reads over writes Band- - PowerPoint PPT Presentation

CS 6354: Memory Hierarchy II Prioritize reads over writes Band- width Increase block size N Y Increase cache size N Y Increase associavity N Y Multilevel caches Y Y Hit Virtual-index = Physical Y Pipelined cache accesses Y Non-blocking caches Y Y Prefetching Y Y Way-prediction Y + complexity costs (adapted from tables in H&P B and H&P 2.2) rate penalty 31 August 2016 Smith, “Cache memories” 1 Memory Hierarchy Registers L1 cache L2 cache L3 cache main memory Image: approx 2004 AMD press image of Opteron die Miss Last time 2 Trace-based simulation of lots of cache parameters Suggestions for crypto implementors time Hit Technique Improves Last time: Cache optimizations 3 Suggestions for architects Overlap virtual to physical lookup and cache lookup Bernstein, “Cache timing attacks on AES” 4 < 1 ns ∼ 1 ns ∼ 5 ns ∼ 20 ns ∼ 100 ns Fighting for constant-time (with respect to secrets)

Homework 1 instruction ptr, … select tag cache way 1 cache way 2 tag predict way calcuation step 1 data bufger execution step 2 7 6 Why not direct-mapped? Virtual Page # Physical Page # Index of Set? Index of Set? Ofgset Avoid virtual caches Mitigation: way prediction Difgerent HW speed tradeofgs today? Way prediction execution step 2 Checkpoint due 12 September of main memory Intuition: 32KB much faster than 34KB, then 32KB cache Required for checkpoint: * For each data or unifjed (data and instruction) cache: - The size of that cache - The size of blocks (AKA lines) in that cache * For each data or unifjed TLB: - The size (number of entries) of that TLB * The single-core sequential throughput (read and write) * The single-core random throughput (read and write) of bufger main memory 5 Avoiding associativity = cache tag calcuation step 1 data 8 tag index ofgset tag index ofgset

Victim Caches Old prefetch strategies 0x0000 0000 0040 0000 Used by OS Virtual memory Stack Memory mappings Writable data Code + Constants Confmict in low-order bits? 10 Prefetch always V Fetch next on miss Tagged prefetch — next on non-prefetch use 11 Sequential access patterns Examples? Instructions Dense matrix/array math String processing Some database operations 0x7FFF … 0xFFFF 8000 0000 0000 0xFFFF FFFF FFFF FFFF Difgerent kinds of memory Tag Data 1 0x1000x2000x300AA BBCC DDEE FF V Address Data 1 0x1006 AA BB 1 0x10060x2006AA BBCC DD Access pattern: 0x1006 0x2006 0x3006 0x2006 0x1006 Direct-mapped Cache Victim Cache 9 12 Common goal: sequential access patterns

Stream bufgers Prefetching on recent Intel (1) current address and the stride. … prefetch is sent to the next address which is the sum of the prefetcher keeps track of individual load instructions. If a load … Two hardware prefetchers load data to the L1 DCache: hit: shift up From the Intel Optimization Manual on Sandy Bridge: 15 Performance Results 14 miss: clear LRU hit: shift up Multi-way stream bufgers 13 miss: clear 16 • Data cache unit (DCU) prefetcher . This prefetcher … is triggered by an ascending access to very recently loaded data. • Instruction pointer (IP)-based stride prefetcher . This instruction is detected to have a regular stride, then a

Prefetching on recent Intel (2) Sandy Bridge die Memory bandwidth Prefetchers Last level cache size Number of threads Cook’s Benchmark Categorization 19 Image: Intel’s Optimization Reference Manual Within each core 18 via anandtech (original is Intel press photo??) 17 From the Intel Optimization Manual on Sandy Bridge: fetch. operations and … the [L1] hardware prefetchers, and … code addresses. Monitored read requests include … load and store Streamer : This prefetcher monitors read requests from the that completes it to a 128-byte aligned chunk. Spatial Prefetcher : This prefetcher strives to complete memory to the L2 cache and last level cache: The following two hardware prefetchers fetched data from 20 every cache line fetched to the L2 cache with the pair line L1 cache for ascending and descending sequences of

Interference between programs 1111 11 0.5 MB Way 12 0.5 MB foreground application mask 1111 1000 0000 background application mask 0000 0111 1111 23 Page coloring 0101 1111 110110 1000 0.5 MB 11 0000 0000 … Physical Page # Index of Set Ofgset cache indices 0x000 – 0x3FF cache indices 0x400 – 0x7FF cache indices 0x800 – 0xBFF cache indices 0xCFF – 0xFFF cache (possibly direct-mapped) Page colors: 00, 01, 10, 11 Way 11 Way 10 21 0.5 MB Why a shared last-level cache? 22 Sandy Bridge’s cache partitioning 12-way cache way — ‘column’ of set-associative cache each way is like a direct-mapped cache Mask for which ways are used to store things on miss LLC: Way 1 0.5 MB Way 2 Way 3 0.5 MB 0.5 MB Way 4 0.5 MB Way 5 0.5 MB Way 6 0.5 MB Way 7 0.5 MB Way 8 0.5 MB Way 9 24

Experiment Design 25 Energy: Race-to-Halt 26 Phases 27 Dynamic partitioning Dynamic partitioning inputs: LLC misses over 100 ms, every 100 ms Thresholds for detecting changes 28 Increase to max allocation — then decrease slowly

Reproducibility 29

CS 6354: Memory Hierarchy II Prioritize reads over writes Band- - PowerPoint PPT Presentation

CS 6354: Memory Hierarchy II Prioritize reads over writes Band- width Increase block size N Y Increase cache size N Y Increase associavity N Y Multilevel caches Y Y Hit Virtual-index = Physical Y Pipelined cache accesses Y

CS 6354: Memory Hierarchy III for ( int i = 0; i < I; ++i) { 5 September 2016 Goto Fig. 4 3

Virtual Memory 1 Memory Hierarchy Memory 4GB Cache 1M Registers 1K Question: What if

Memory Hierarchy Design Memory Hierarchy Design Chapter 5 and Appendix C 1 Overview

Memory Hierarchy Motivation, Definitions, Four Questions about Memory Hierarchy Soner Onder

What Is Memory Hierarchy A typical memory hierarchy today: Lecture 13: Cache Basics and Cache

Abstractions for Practical Systems Caching and the memory hierarchy Operating systems and the

1 5.1 Introduction A Typical Memory Hierarchy A Typical Memory Hierarchy Memory Technology

Memory II. Memory improvement III. Problems with memory 3 systems/stages of Memory: memory

Memory Hierarchy: Caching CSE 141, S2'06 Jeff Brown The memory subsystem Computer Control

1 Basic use of caches Levels in the memory hierarchy When fetching an instruction, first

EE 457 Unit 7a Cache and Memory Hierarchy 2 Memory Hierarchy & Caching Use several

Why memory hierarchy (3 rd Ed: p.468-487, 4 th Ed: p. 452-470) users want unlimited fast

Memory Hierarchy: Cache Memory hierarchy Cache basics Locality Cache organization Cache-aware

Data Management Systems Storage Management The Memory hierarchy Memory hierarchy

Memory Hierarchy Design Issues Memory Hierarchy Design Issues in Many in Many-Core Processors

Hierarchy of School Marketing Needs Leadership Day - February 16, 2018 Maslows Hierarchy of

Careers in economics careers@bath.ac.uk Finance Huge variety of job roles e.g. investment

Systems H ADOOP Distributed File System Dr. Taieb Znati Computer Science Department University

UHD - USRP Hardware Driver Universal Software Radio Peripheral Hardware Driver A Brief USRP

Commissioning with cosmic rays of the Commissioning with cosmic rays of the Commissioning with

The role of adaptive streaming standardization activities in defining broadband television

Computing Resources for ProtoDUNE A. Norman, H. Schellman Software and Computing Questions

Charlie Garrod Michael Hilton School of Computer Science 15-214 1 Administrivia Homework 5

Principles of Programming Languages

Sambuz

Useful Links

Newsletter

Mail Us

CS 6354: Memory Hierarchy II Prioritize reads over writes Band- - PowerPoint PPT Presentation

CS 6354: Memory Hierarchy II Prioritize reads over writes Band- width Increase block size N Y Increase cache size N Y Increase associavity N Y Multilevel caches Y Y Hit Virtual-index = Physical Y Pipelined cache accesses Y

CS 6354: Memory Hierarchy III for ( int i = 0; i &lt; I; ++i) { 5 September 2016 Goto Fig. 4 3

Virtual Memory 1 Memory Hierarchy Memory 4GB Cache 1M Registers 1K Question: What if

Memory Hierarchy Design Memory Hierarchy Design Chapter 5 and Appendix C 1 Overview

Memory Hierarchy Motivation, Definitions, Four Questions about Memory Hierarchy Soner Onder

What Is Memory Hierarchy A typical memory hierarchy today: Lecture 13: Cache Basics and Cache

Abstractions for Practical Systems Caching and the memory hierarchy Operating systems and the

1 5.1 Introduction A Typical Memory Hierarchy A Typical Memory Hierarchy Memory Technology

Memory II. Memory improvement III. Problems with memory 3 systems/stages of Memory: memory

Memory Hierarchy: Caching CSE 141, S2'06 Jeff Brown The memory subsystem Computer Control

1 Basic use of caches Levels in the memory hierarchy When fetching an instruction, first

EE 457 Unit 7a Cache and Memory Hierarchy 2 Memory Hierarchy &amp; Caching Use several

Why memory hierarchy (3 rd Ed: p.468-487, 4 th Ed: p. 452-470) users want unlimited fast

Memory Hierarchy: Cache Memory hierarchy Cache basics Locality Cache organization Cache-aware

Data Management Systems Storage Management The Memory hierarchy Memory hierarchy

Memory Hierarchy Design Issues Memory Hierarchy Design Issues in Many in Many-Core Processors

Hierarchy of School Marketing Needs Leadership Day - February 16, 2018 Maslows Hierarchy of

Careers in economics careers@bath.ac.uk Finance Huge variety of job roles e.g. investment

Systems H ADOOP Distributed File System Dr. Taieb Znati Computer Science Department University

UHD - USRP Hardware Driver Universal Software Radio Peripheral Hardware Driver A Brief USRP

Commissioning with cosmic rays of the Commissioning with cosmic rays of the Commissioning with

The role of adaptive streaming standardization activities in defining broadband television

Computing Resources for ProtoDUNE A. Norman, H. Schellman Software and Computing Questions

Charlie Garrod Michael Hilton School of Computer Science 15-214 1 Administrivia Homework 5

Principles of Programming Languages

Sambuz

Useful Links

Newsletter

Mail Us

CS 6354: Memory Hierarchy III for ( int i = 0; i < I; ++i) { 5 September 2016 Goto Fig. 4 3

EE 457 Unit 7a Cache and Memory Hierarchy 2 Memory Hierarchy & Caching Use several