TLB and Pagewalk Performance in Multicore Architectures with Large Die-Stacked DRAM Cache - Adarsh Patil

  1. TLB and Pagewalk Performance in Multicore Architectures with Large Die-Stacked DRAM Cache
     Adarsh Patil
     Adviser: Prof. R Govindarajan
     Perspective Seminar, 6th Nov 2015

  2. Outline
     ■ Introduction
       - Address Translation: TLBs and Page Walks
       - Die stacked DRAM caches
     ■ Objective
     ■ Experimental Setup
       - Framework
       - Methodology
     ■ Results
     ■ Conclusion and Future Work
     CSA Perspective Seminar, 6th Nov 2015

  3. Computing Trends
     ■ Software
       - Large memory footprint
     * Apps in Big Data Bench / Cloud Suite Benchmark

  4. Computing Trends
     ■ Software
       - Large memory footprint
       - Virtualization and cloud computing
     * Source: VMware

  5. Computing Trends
     ■ Software
       - Large memory footprint
       - Virtualization and cloud computing
     ■ Architectural
       - Multicore / Manycore architectures
     [Figure: Intel Haswell-E & IBM Power 8]

  6. Computing Trends
     ■ Software
       - Large memory footprint
       - Virtualization and cloud computing
     ■ Architectural
       - Multicore / Manycore architectures
       - Large Die stacked DRAM cache
     * Source: Invensas, Tessera

  7. Paged Virtual Memory
     ■ Virtual address space divided into “pages”
     ■ “Page Table”: in-memory table, organized as a radix tree, to map virtual to physical addresses and store meta-information (replacement, access privilege, dirty bit etc.)
     ■ Page table entries are cached in fast lookup structures called “Translation Lookaside Buffers (TLBs)”
     ■ The page table has evolved into a 4-level tree to accommodate 48-bit VAs
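
The role of the TLB described on this slide can be illustrated with a minimal sketch. It assumes a toy, fully associative TLB with LRU replacement in front of a flat page table held in a Python dict; the class name `TinyTLB`, the LRU policy, and the dict-based page table are illustrative assumptions, not the organization of any real processor.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # 4KB pages: the low 12 bits are the page offset


class TinyTLB:
    """Toy fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries, page_table):
        self.entries = entries
        self.page_table = page_table  # dict: virtual page number -> physical page number
        self.cache = OrderedDict()    # cached translations, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            # The dict lookup stands in for a page walk through the page table.
            self.cache[vpn] = self.page_table[vpn]
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)  # evict the least recently used entry
        return (self.cache[vpn] << PAGE_SHIFT) | offset


# Usage: a 2-entry TLB over a hypothetical 3-page mapping.
tlb = TinyTLB(entries=2, page_table={0: 5, 1: 7, 2: 9})
for va in (0x0010, 0x0020, 0x1000, 0x2000, 0x0000):
    tlb.translate(va)
```

In this sketch every miss costs one dict lookup; in hardware the miss cost is the multi-reference page walk that the following slides describe.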

  8. Page Table Structure
     Virtual address bit fields:
       63:48  Sign extension
       47:39  PML4 | PL4
       38:30  PDP | PL3
       29:21  PD | PL2
       20:12  PTE | PL1
       11:0   Page Offset
     [Figure: page walk starting at the CR3 register and descending through the L4, L3, L2 and L1 tables to a data page, with an L2 entry that points directly to a data superpage; table entries are shown as example ppn values]
     ■ Hierarchical page table
     ■ 4 memory references - 4KB page
     ■ 3 memory references - 2MB superpage
     ■ Each entry is 8 bytes
     ■ TLB stores the VA to PA translation
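
The bit fields and reference counts on this slide can be sketched as follows. This is a toy model under stated assumptions: the page table is a tree of nested Python dicts, an integer stored at the PL2 level is taken to mean a 2MB superpage frame number, and the names `split_va` and `walk` are hypothetical helpers, not part of any real OS or simulator.

```python
PAGE_OFFSET_BITS = 12  # bits 11:0
INDEX_BITS = 9         # 512 entries of 8 bytes each fill one 4KB table


def split_va(vaddr):
    """Return (pl4, pl3, pl2, pl1, offset) for a 48-bit virtual address."""
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    indices = []
    for level in range(4):  # level 0 is PL1 (bits 20:12), level 3 is PL4 (bits 47:39)
        shift = PAGE_OFFSET_BITS + level * INDEX_BITS
        indices.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    pl1, pl2, pl3, pl4 = indices
    return pl4, pl3, pl2, pl1, offset


def walk(root, vaddr):
    """Walk the toy page table; return (physical address, memory references)."""
    pl4, pl3, pl2, pl1, offset = split_va(vaddr)
    refs = 0
    node = root
    for idx in (pl4, pl3, pl2):
        node = node[idx]  # one memory reference per level
        refs += 1
    if isinstance(node, int):
        # Assumption of this sketch: an int at PL2 is a 2MB superpage frame number.
        return (node << 21) | (vaddr & ((1 << 21) - 1)), refs
    ppn = node[pl1]  # fourth reference: the PL1 entry
    refs += 1
    return (ppn << PAGE_OFFSET_BITS) | offset, refs


# Usage: one 4KB mapping and one 2MB superpage mapping (hypothetical values).
root = {1: {2: {3: {4: 0x2A}, 7: 0x10}}}
va_small = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
va_super = (1 << 39) | (2 << 30) | (7 << 21) | 0x1234
print(walk(root, va_small))  # 4 references for the 4KB page
print(walk(root, va_super))  # 3 references for the 2MB superpage
```

The 9-bit index width follows from the slide's 8-byte entries: a 4KB table holds 512 of them.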

  18. Page Table Structure: Virtualization
      ■ Guest Page Table (gPT)
        - Translates guest virtual to guest physical
        - Set up and modified by the guest independently
      ■ Nested Page Table (nPT)
        - Translates guest physical to host physical
        - Controlled by the host
      ■ Up to 24 memory references on a page walk
      ■ TLB stores the end-to-end translation (guest virtual to host physical)
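
The figure of 24 references can be reproduced with a short count. The sketch assumes a 4-level guest table and a 4-level nested table, no TLB or page walk cache hits during the walk, and that every guest-physical address touched by the walk needs its own nested walk; the function name `nested_walk_refs` is a hypothetical helper.

```python
def nested_walk_refs(guest_levels=4, nested_levels=4):
    """Count memory references in a two-dimensional (guest + nested) page walk."""
    refs = 0
    for _ in range(guest_levels):
        refs += nested_levels  # nested walk to locate this guest table in host memory
        refs += 1              # read the guest page table entry itself
    refs += nested_levels      # nested walk for the final guest-physical data address
    return refs


print(nested_walk_refs())  # 4 * (4 + 1) + 4 = 24
```

In closed form the count is (n + 1)(m + 1) - 1 for n guest levels and m nested levels, which is why the worst case grows so quickly compared with the 4 references of a native walk.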

  19. Address Translation in Hardware
      [Figure: four cores, each with L1 and L2 caches and an MMU, sharing an L3 cache. MMU components: L1 instruction TLB, L1 data TLB, L2 TLB, page walk caches, superpage caches and a hardware page walker.]
      [Figure: translation path. The CPU issues a VA; the multi-level TLB (< 4 cycles) returns the PA on a hit, and a miss triggers a page walk. Latency labels: L1 cache 4 cycles, larger caches 6 / 10 cycles, memory 180-200 cycles. Cache labels include VIPT and PIPT.]

  20. TLB-reach & page walk latencies
      ■ TLB-reach: amount of data that can be accessed without causing a TLB miss
        - Clustered [HPCA '14] and Coalesced [MICRO '12] TLBs
        - Superpage-friendly TLBs [HPCA '15] using skewed TLBs
        - Shared last-level TLBs [HPCA '11]: evaluates shared TLBs for multi-cores
        - Direct segment [ISCA '13]: primary region abstraction to map part of the virtual address space using segment registers and avoid paging completely
        - Redundant memory mappings [ISCA '15]: allocation in units called ranges (eager paging in OS), with ranges maintained in a separate range-TLB; compatible with traditional paging
      ■ Speeding up miss handling
        - AMD proposed Accelerating 2D page walks [ASPLOS '08], using page walk caches for virtualization
        - Characterization of TLB behavior and sensitivity of individual SPEC 2000 [SIGMETRICS '02] and PARSEC [PACT '09] applications
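
TLB-reach as defined on this slide is simply the number of entries multiplied by the page size each entry maps. A minimal sketch follows; the entry counts are hypothetical and not taken from the presentation or from any specific processor.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries, page_size_bytes):
    """Bytes of memory covered by a TLB whose entries all map pages of one size."""
    return entries * page_size_bytes


# Hypothetical TLB sizes, chosen only to show the effect of page size.
small_pages = tlb_reach_bytes(64, 4 * KIB)  # 64 entries of 4KB pages
superpages = tlb_reach_bytes(32, 2 * MIB)   # 32 entries of 2MB superpages
print(small_pages // KIB, "KiB")  # 256 KiB
print(superpages // MIB, "MiB")   # 64 MiB
```

The gap between the two results is the motivation behind the superpage, coalescing and range-based proposals listed above: each raises the amount of memory covered per TLB entry.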
