Elastic Cuckoo Page Tables: Rethinking Virtual Memory Translation for Parallelism
Dimitrios Skarlatos, Apostolos Kokolis, Tianyin Xu, Josep Torrellas
University of Illinois at Urbana-Champaign
skarlat2.web.engr.illinois.edu
ASPLOS 2020
Virtual Memory Translation is Expensive
[Figure: a core with a TLB and L1/L2/L3 caches; main memory holds the application's data (PA 1, PA 4) and its page tables (VA 1 → PA 4, VA 8 → PA 1). The core issues LD VA 1 and misses in the TLB.]
TLB miss → “page walk” = fetch the entry (VA 1 → PA 4) from the page table.
x86-64 Radix Page Tables
[Figure: a virtual address split into four 9-bit indices and a page offset: bits 47…39 (pgd), 38…30 (pud), 29…21 (pmd), 20…12 (pte), 11…0 (page offset). Starting from CR3, each index is added to the base of the next table (PGD, PUD, PMD, PTE); the final PTE entry (VA 1 → PA 4) becomes the TLB entry.]
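The bit fields on this slide can be written down as a small sketch. The function name `split_va` is made up for illustration; the shifts and masks simply follow the 9/9/9/9-bit indices and the 12-bit offset (bits 11…0) shown in the figure.

```python
def split_va(va):
    """Split a 48-bit virtual address into its four radix indices and offset."""
    offset = va & 0xFFF          # bits 11..0, page offset
    pte = (va >> 12) & 0x1FF     # bits 20..12, index into the PTE table
    pmd = (va >> 21) & 0x1FF     # bits 29..21, index into the PMD table
    pud = (va >> 30) & 0x1FF     # bits 38..30, index into the PUD table
    pgd = (va >> 39) & 0x1FF     # bits 47..39, index into the PGD table
    return pgd, pud, pmd, pte, offset


# Build an address from known fields and take it apart again.
example_va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_va(example_va))
```

Each 9-bit index selects one of 512 entries in its table, which is why the four levels together cover bits 47…12.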
Virtual Memory Translation is Expensive
[Figure, four animation steps: on a TLB miss for LD VA 1, the pgd, pud, pmd, and pte entries are fetched one after another from the radix page tables (PGD, PUD, PMD, PTE) in main memory, through the L1/L2/L3 caches.]
TLB miss → “page walk” = fetch entry from radix page table.
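To make the cost of the walk concrete, here is a minimal sketch that counts the dependent accesses. It is a toy model, not the hardware walker: nested Python dicts stand in for the PGD/PUD/PMD/PTE tables, and `map_page`, `walk`, and `indices` are hypothetical names.

```python
def indices(va):
    """The four 9-bit indices (pgd, pud, pmd, pte) of a 48-bit address."""
    return [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]


def map_page(root, va, frame):
    """Install a translation for the page containing va (toy model)."""
    pgd, pud, pmd, pte = indices(va)
    root.setdefault(pgd, {}).setdefault(pud, {}).setdefault(pmd, {})[pte] = frame


def walk(root, va):
    """Return (physical address or None, number of table accesses)."""
    node, accesses = root, 0
    for idx in indices(va):
        accesses += 1            # one access per level
        node = node.get(idx)     # the next level is unknown until this returns
        if node is None:
            return None, accesses
    return (node << 12) | (va & 0xFFF), accesses


root = {}
va = 0x7F1234567ABC
map_page(root, va, 4)            # the page containing va maps to frame 4
print(walk(root, va))
```

The loop cannot be parallelized: each `node.get` needs the result of the previous one, which is the sequential behavior the contribution slide sets out to remove.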
Virtual Memory Translation is Expensive
[Figure: after the page walk, the TLB holds the translation VA 1 → PA 4.]
Multilevel TLBs
[Figure: the core now has an L1 TLB and an L2 TLB in front of the radix page tables (PGD, PUD, PMD, PTE) in main memory.]
Memory Management Unit (MMU) Cache
[Figure: an MMU cache is added next to the L1 TLB and L2 TLB.]
Translations in Data Caches
[Figure: copies of pgd, pud, pmd, and pte entries appear in the L1/L2/L3 data caches and in the MMU cache, alongside the L1 TLB and L2 TLB.]
NVM will Make the Problem Worse
Sunny Cove introduces 5-level radix page tables!
[Figure: the same hierarchy (L1/L2 TLBs, MMU cache, L1/L2/L3 caches), with main memory labeled Non-Volatile Memory Technology.]
Contribution: Elastic Cuckoo Page Tables
• Rethinking virtual memory translation for parallelism
• Idea: dynamically resizable page tables based on cuckoo hashing
• No sequential page table lookups → parallel single-step lookups
• Application speedup over state-of-the-art:
  • 3-28% with 4KB pages
  • 3-18% with huge pages
Alternative: A Global Hashed Page Table
The old approach from Intel and IBM.
[Figure: applications A and B hash their virtual addresses (VA 1, VA 9, VA 6) with H into a single global hash table of tagged entries.]
• COLLISIONS: the OS is invoked to resolve them!
• PAGE SHARING: how to share pages? New level of indirection!
• PAGE SIZES: multiple page sizes?
• DEAD END: switched to radix page tables!
Elastic Cuckoo Page Tables: Rethinking virtual memory translation for parallelism
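Cuckoo Hashing [Pagh 2001, Fotakis 2005]
[Figure: a d-ary cuckoo hash table with three ways T1, T2, T3 holding elements a, b, c, d, f, g. Element d is hashed with H1, H2, and H3, giving one candidate slot per way.]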
Insertions with Cuckoo Hashing
[Figure, animation steps on the d-ary cuckoo hash table (T1, T2, T3): (1) e is hashed with H1 and takes the slot of f; (2) the evicted f is hashed with H3 and takes the slot of b; (3) the evicted b is hashed with H2 and lands in a free slot. The table then holds a, b, c, d, e, f, g.]
[Final step: the table is annotated with the three issues COLLISIONS, PAGE SHARING, PAGE SIZES.]
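The eviction chain in the animation can be sketched as follows. This is an illustrative toy, not the authors' code: the class name `CuckooTable`, the hash built from Python's `hash`, the random choice of the next way, and the `max_kicks` bound are all assumptions. Keys are assumed to be integers (for example, virtual page numbers) that are inserted once.

```python
import random


class CuckooTable:
    """Toy d-ary cuckoo hash table: d ways, one candidate slot per way."""

    def __init__(self, d=3, size=64, max_kicks=32, seed=0):
        assert d >= 2, "evicted items need at least one other way to move to"
        self.d, self.size, self.max_kicks = d, size, max_kicks
        self.ways = [[None] * size for _ in range(d)]
        self.rng = random.Random(seed)

    def _slot(self, way, key):
        # Hypothetical stand-in for the per-way hash functions H1..Hd.
        return hash((way, key)) % self.size

    def lookup(self, key):
        # The d probes are independent, so hardware could issue them together.
        for way in range(self.d):
            entry = self.ways[way][self._slot(way, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        """Insert a new key; return how many elements were evicted on the way."""
        item = (key, value)
        way = self.rng.randrange(self.d)
        for kicks in range(self.max_kicks):
            s = self._slot(way, item[0])
            # Place the item and pick up whatever occupied the slot.
            item, self.ways[way][s] = self.ways[way][s], item
            if item is None:
                return kicks
            # The evicted element retries in a different way.
            way = (way + 1 + self.rng.randrange(self.d - 1)) % self.d
        raise RuntimeError("too many evictions; the table needs to be resized")


table = CuckooTable()
for vpn in range(10):
    table.insert(vpn, vpn + 100)     # vpn -> made-up frame number
print(table.lookup(3), table.lookup(999))
```

Because every element has exactly one candidate slot in each way, a lookup never follows a chain: it checks d slots and is done, whatever happened during insertion.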
Private Hashed Page Tables
[Figure: the same d-ary cuckoo hash table (T1, T2, T3), now labeled PRIVATE PAGE TABLES, next to the issue list COLLISIONS, PAGE SHARING, PAGE SIZES.]
Cannot Be Too Big → Waste Memory
[Figure: main memory holds separate page tables for App A (Page Tables A) and App B (Page Tables B).]
Private page tables cannot be too big.
Need to Dynamically Resize
[Figure: Page Tables A of App A hold VA 1 → PA 4 and VA 8 → PA 1; in the next step a larger New Page Tables A appears in main memory.]
Private page tables cannot be too big.
Need to dynamically resize.
Cannot Rehash All Entries at Once
[Figure: the entries of Page Tables A (VA 1 → PA 4, VA 8 → PA 1) have to move to New Page Tables A.]
Private page tables cannot be too big.
Need to dynamically resize while the program is running.
Gradual resizing!
Gradual Resizing Cuckoo Hash Tables
At every insert → rehash one element.
[Figure: an old d-ary cuckoo hash table (T1, T2, T3) holding a, b, c, e, f, g, k, l, m and a new d-ary cuckoo hash table (T1’, T2’, T3’); in the second step one element is rehashed into the new table with H'1.]
Lookup During Gradual Resizing
[Figure: a lookup of m hashes it with H'1, H'2, H'3 into the new table (T1’, T2’, T3’) and with H1, H2, H3 into the old table (T1, T2, T3).]
Problem of Resizing: Double #Lookups
[Figure: a lookup of m uses H'1, H'2, H'3 on the new table (T1’, T2’, T3’) and H1, H2, H3 on the old table (T1, T2, T3).]
2 × d lookups!
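A minimal sketch of why plain gradual resizing doubles the work: while both tables are live, an element may sit in either one, so every way of both tables has to be probed. The helper names `slot` and `lookup_during_resize` and the hash built from Python's `hash` are assumptions for illustration.

```python
def slot(way, key, size):
    # Hypothetical stand-in for H_i (old table) and H'_i (new table).
    return hash((way, key)) % size


def lookup_during_resize(old, new, key):
    """old/new: lists of d ways; each way is a list of (key, value) or None.

    Returns (value or None, number of slots probed).
    """
    probes = 0
    for table in (new, old):
        for way, slots in enumerate(table):
            probes += 1
            entry = slots[slot(way, key, len(slots))]
            if entry is not None and entry[0] == key:
                return entry[1], probes
    return None, probes


d = 3
old = [[None] * 4 for _ in range(d)]
new = [[None] * 8 for _ in range(d)]
old[2][slot(2, 42, 4)] = (42, "PA 4")         # key 42 still lives in the old table
print(lookup_during_resize(old, new, 42))     # hit found only on the last probe
print(lookup_during_resize(old, new, 7))      # a miss always costs 2 * d probes
```

The probes are counted one by one here for clarity; issued in parallel they would still be 2 × d memory accesses per lookup.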
Contribution: Elastic Cuckoo Hashing
Rehashing pointers P1, P2, P3.
[Figure: one rehashing pointer per way (T1, T2, T3) of the old d-ary cuckoo hash table, which holds a, b, c, e, f, g, k, l, m; the new d-ary cuckoo hash table (T1’, T2’, T3’) sits next to it.]
Elastic Cuckoo Migration
[Figure, animation steps: the rehashing pointers P1, P2, P3 mark a Migrated Region in the old d-ary cuckoo hash table (T1, T2, T3). Step by step, elements of the old table are rehashed with the new hash functions (H'1 is shown) into the new d-ary cuckoo hash table (T1’, T2’, T3’) and the pointers advance.]
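One possible reading of the rehashing pointers, sketched in code: if an element's old-table slot in way i lies below P_i, that slot has already been migrated, so the element can only be in the new table for that way; otherwise it can only be in the old table. Either way a lookup probes one slot per way, d in total instead of 2 × d. Everything below is an assumption made for illustration (class and method names, the hash family, random way selection, migrating one slot per insert); the paper's algorithm may differ in its details.

```python
import random


class ElasticCuckoo:
    """Toy d-ary cuckoo table caught mid-resize (old table -> new table)."""

    def __init__(self, d=3, old_size=16, new_size=32, seed=0):
        self.d = d
        self.old = [[None] * old_size for _ in range(d)]
        self.new = [[None] * new_size for _ in range(d)]
        self.ptr = [0] * d                  # rehashing pointers P_1..P_d
        self.rng = random.Random(seed)

    def _h(self, gen, way, key, size):
        # Hypothetical hash family: gen 0 plays H_i, gen 1 plays H'_i.
        return hash((gen, way, key)) % size

    def _locate(self, way, key):
        """The single slot in `way` where `key` may live right now."""
        i = self._h(0, way, key, len(self.old[way]))
        if i < self.ptr[way]:               # old slot already migrated
            return self.new[way], self._h(1, way, key, len(self.new[way]))
        return self.old[way], i

    def lookup(self, key):
        """Return (value or None, number of slots probed); probes <= d."""
        probes = 0
        for way in range(self.d):
            table, i = self._locate(way, key)
            probes += 1
            entry = table[i]
            if entry is not None and entry[0] == key:
                return entry[1], probes
        return None, probes

    def _place(self, item, way, max_kicks=64):
        for _ in range(max_kicks):
            table, i = self._locate(way, item[0])
            item, table[i] = table[i], item  # swap in, pick up any victim
            if item is None:
                return
            way = (way + 1 + self.rng.randrange(self.d - 1)) % self.d
        raise RuntimeError("too many evictions; a real table would resize again")

    def rehash_one(self):
        """Migrate the old-table slot under one pointer, then advance it."""
        live = [w for w in range(self.d) if self.ptr[w] < len(self.old[w])]
        if not live:
            return                           # migration finished
        way = self.rng.choice(live)
        i = self.ptr[way]
        item, self.old[way][i] = self.old[way][i], None
        self.ptr[way] += 1                   # slot i is now in the migrated region
        if item is not None:
            self._place(item, way)           # _locate now sends it to the new table

    def insert(self, key, value):
        """Insert a new key (assumed not present) and migrate one slot."""
        self.rehash_one()
        self._place((key, value), self.rng.randrange(self.d))


t = ElasticCuckoo()
for vpn in range(12):
    t.insert(vpn, vpn + 100)                 # vpn -> made-up frame number
print(t.lookup(5), t.lookup(999), t.ptr)
```

The invariant that keeps lookups at d probes is that pointers only move forward: an element placed in the new table stays reachable through `_locate`, and an element still in the old table sits at or above its way's pointer until `rehash_one` reaches it.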