SEESAW: Set Enhanced Superpage Aware caching
Mayank Parasar ∑, Abhishek Bhattacharjee Ω, Tushar Krishna ∑
∑ School of Electrical and Computer Engineering, Georgia Institute of Technology
Ω Department of Computer Science, Rutgers University
http://synergy.ece.gatech.edu/
mparasar3@gatech.edu
Outline
• Motivation
• SEESAW: Concept
• SEESAW: Micro-architecture
• Evaluation Methodology
• Results
• Conclusion
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech, 6/26/18
L1 Cache Characteristics
Fast lookup
High hit-rate
Energy efficiency
Virtually Indexed Physically Tagged [VIPT] Cache
[Figure: VIPT cache lookup. The virtual address (VA) is split into VPN and page offset; the page offset supplies the set index and block offset used to read a 4-way cache with sets 1 to N (valid bit, tag, data block per way). The TLB translates the VPN to the PPN of the physical address (PA), and the PPN is compared (=) against the stored tags to give HIT/MISS.]
Virtually Indexed Physically Tagged [VIPT] Cache
VIPT caches necessitate: (set-index + block-offset) <= page-offset
[Figure: the VIPT cache lookup diagram from the previous slide, with the constraint above overlaid.]
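The constraint on this slide ties cache capacity to associativity: if the set-index and block-offset bits must fit inside the page offset, one way can be no larger than one page. Below is a minimal sketch of that arithmetic. The 64-byte line size and the helper names are assumptions for illustration, not taken from the slides.

```python
def min_ways_for_vipt(cache_bytes, page_bytes=4096):
    """Smallest associativity for which one way is no larger than one page."""
    # Ceiling division: cache_bytes / ways must not exceed page_bytes.
    return max(1, -(-cache_bytes // page_bytes))


def index_bits_fit(cache_bytes, ways, line_bytes=64, page_bytes=4096):
    """Check (set-index bits + block-offset bits) <= page-offset bits."""
    sets = cache_bytes // (line_bytes * ways)
    set_bits = (sets - 1).bit_length()          # log2 for power-of-two sizes
    block_bits = (line_bytes - 1).bit_length()
    page_offset_bits = (page_bytes - 1).bit_length()
    return set_bits + block_bits <= page_offset_bits


if __name__ == "__main__":
    for size_kb in (32, 64):
        ways = min_ways_for_vipt(size_kb * 1024)
        print(f"{size_kb} kB VIPT cache with 4 kB pages needs at least {ways} ways")
```

Under these assumptions, growing a VIPT L1 without larger pages can only be done by adding ways, which is the pressure on latency and energy that the following slides examine.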
Impact of Associativity on Access Latency and Energy of cache
[Figure: two panels, Cache Access Latency and Cache Access Energy.]
Effect of associativity on MPKI of cache
High associativity hurts latency and energy without commensurately improving hit rate
Revisiting L1 Cache Characteristics for VIPT Cache
Fast lookup (Virtual memory!)
High hit-rate
Energy efficiency (Virtual memory!)
Opportunity: Superpage
Is it possible to relax the constraints of a traditional VIPT cache? Yes.
How? More page-offset bits for superpages!
Offset bits by page size: 4-KB baseline page: 12; 2-MB superpage: 21; 1-GB superpage: 30
HW and OS support for superpages in modern processors
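To make the opportunity concrete, the sketch below recomputes the offset widths listed on this slide and shows how many untranslated bits remain for the set index once the block offset is taken out. The 64-byte line size and the function names are assumptions for illustration.

```python
PAGE_SIZES = {
    "4-KB": 4 * 1024,
    "2-MB": 2 * 1024 ** 2,
    "1-GB": 1024 ** 3,
}
LINE_BYTES = 64  # assumed cache line size


def log2_exact(n):
    """log2 of a power-of-two size."""
    return (n - 1).bit_length()


def max_set_index_bits(page_bytes, line_bytes=LINE_BYTES):
    """Page-offset bits left for the set index after the block offset."""
    return log2_exact(page_bytes) - log2_exact(line_bytes)


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        print(f"{name}: offset bits = {log2_exact(size)}, "
              f"usable set-index bits = {max_set_index_bits(size)}")
```

With 64-byte lines, a 4-KB page leaves 6 bits for the set index while a 2-MB superpage leaves 15, so accesses to superpages can index more sets without waiting for translation.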
Prevalence of superpages in modern OSes under memory fragmentation
Ran on a 32-core Sandy Bridge system with 32 GB RAM
Memhog causes memory fragmentation; a higher percentage indicates higher fragmentation
SEESAW: Concept
[Figure: the same cache storage organized two ways. For a superpage: more sets, less associativity (faster, energy-efficient). For a base page: fewer sets, more associativity.]
SEESAW: Micro-architecture
[Figure: SEESAW cache lookup. The VA is split into VPN and basepage offset (block offset and set index); the superpage offset additionally covers the partition bit. The 4-way cache, sets 1 to N, is divided into Partition-0 and Partition-1. The TLB supplies the PPN of the PA for the tag compare.]
Partition decoder: decodes the partition index from the partition bit
Translation Filter Table (TFT): predicts whether the page is a superpage
SEESAW: Micro-architecture
[Figure: the micro-architecture diagram (VA fields, TFT, partition decoder, TLB, Partition-0 and Partition-1), shown without callouts.]
SEESAW: Superpage access
[Figure: the micro-architecture diagram with the TFT output marked "Super Page". The partition bit from the superpage offset feeds the partition decoder, and the PPN from the TLB is compared (=) against the tags to give HIT/MISS.]
SEESAW: Basepage access
[Figure: the micro-architecture diagram with the TFT output marked "Not a Super Page". Only the basepage offset (block offset and set index) is used to index the cache, and the PPN from the TLB is compared (=) against the tags to give HIT/MISS.]
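The two access slides can be read as a rule for which ways get probed. The sketch below illustrates one plausible reading: a superpage access uses the partition bit to probe a single partition, while a basepage access probes every way of the set. The concrete geometry (32 kB, 8 ways, 64-byte lines, two partitions, partition bit at VA bit 12) and all names are assumptions, and the `is_superpage` flag stands in for the TFT prediction.

```python
LINE_BYTES = 64          # assumed
CACHE_BYTES = 32 * 1024  # assumed
WAYS = 8                 # assumed
PARTITIONS = 2           # assumed

SETS = CACHE_BYTES // (LINE_BYTES * WAYS)
BLOCK_BITS = (LINE_BYTES - 1).bit_length()
SET_BITS = (SETS - 1).bit_length()
WAYS_PER_PARTITION = WAYS // PARTITIONS


def ways_to_probe(va, is_superpage):
    """Return (set_index, ways) that a lookup for `va` would read."""
    set_index = (va >> BLOCK_BITS) & (SETS - 1)
    if not is_superpage:
        # Basepage: bits above the 4 kB offset are not yet translated,
        # so every way of the set is probed.
        return set_index, list(range(WAYS))
    # Superpage: the bit(s) just above the basepage offset are also
    # untranslated offset bits, so they can select one partition.
    partition = (va >> (BLOCK_BITS + SET_BITS)) & (PARTITIONS - 1)
    start = partition * WAYS_PER_PARTITION
    return set_index, list(range(start, start + WAYS_PER_PARTITION))


if __name__ == "__main__":
    va = 0x1040
    print("superpage:", ways_to_probe(va, True))
    print("basepage: ", ways_to_probe(va, False))
```

In this reading a superpage hit reads half as many ways as a basepage hit, which is where the latency and energy savings would come from.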
SEESAW: TFT and Partition Decoder
[Figure: Translation Filter Table (TFT) with tag VA[63:21] and output "Super page?", next to the partition decoder.]
Translation Filter Table
- TFT Lookup
- Direct mapped
- False negative due to size
- TFT Update
- VA misprediction
- 2MB L1-TLB fill
- 2MB L1-TLB invalidation
Partition Decoder
- For 32kB cache
- For 64kB cache
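A direct-mapped TFT with the behaviour listed on this slide could look roughly like the sketch below. It is an illustration only: the class and method names are hypothetical, indexing by the low bits of VA[63:21] is an assumption, and the 16-entry default is borrowed from the TFT analysis slide.

```python
class TranslationFilterTable:
    """Direct-mapped table remembering which 2 MB regions are superpages."""

    def __init__(self, entries=16):
        self.entries = entries
        self.tags = [None] * entries

    @staticmethod
    def _region(va):
        # VA[63:21]: the 2 MB-aligned region number, used as the tag.
        return va >> 21

    def _index(self, va):
        # Assumed indexing: low bits of the region number.
        return self._region(va) % self.entries

    def lookup(self, va):
        """True if the table predicts that `va` lies in a superpage."""
        return self.tags[self._index(va)] == self._region(va)

    def fill(self, va):
        """Called on a 2 MB L1-TLB fill; may evict a conflicting entry."""
        self.tags[self._index(va)] = self._region(va)

    def invalidate(self, va):
        """Called on a 2 MB L1-TLB invalidation."""
        idx = self._index(va)
        if self.tags[idx] == self._region(va):
            self.tags[idx] = None


if __name__ == "__main__":
    tft = TranslationFilterTable()
    a = 5 << 21
    b = (5 + 16) << 21  # maps to the same entry as `a`
    tft.fill(a)
    print(tft.lookup(a))  # predicted superpage
    tft.fill(b)
    print(tft.lookup(a))  # conflict eviction: a false negative due to size
```

A false negative here only costs the superpage benefit for that access, since the access can fall back to the basepage path; the table never needs to hold every superpage.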
SEESAW: Cache line insertion policy
Which partition should the cache line be inserted into?
[Figure: the micro-architecture diagram (VA fields, TFT, partition decoder, TLB, Partition-0 and Partition-1).]
SEESAW: Cache line insertion policy
• 4way-8way
  - Superpage miss: victim within the partition
  - Basepage miss: victim within the set
• 4way
  - Uses LRU within the associated partition
  - Avoid installing the same line twice
  - Saves energy
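The two insertion policies differ only in which ways are candidates for eviction. The sketch below illustrates that scoping with a simple LRU choice, assuming an 8-way set split into two 4-way partitions. The names are hypothetical, and since the slide does not say how the partition is chosen for a base page under the 4way policy, `partition` is simply an input here.

```python
WAYS = 8                # assumed
WAYS_PER_PARTITION = 4  # assumed


def candidate_ways(policy, is_superpage, partition):
    """Ways that may hold the victim for a miss under the given policy."""
    if policy not in ("4way", "4way-8way"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "4way" or is_superpage:
        # Victim is restricted to the associated partition.
        start = partition * WAYS_PER_PARTITION
        return list(range(start, start + WAYS_PER_PARTITION))
    # 4way-8way with a basepage miss: victim anywhere in the set.
    return list(range(WAYS))


def choose_victim(last_used, policy, is_superpage, partition):
    """Pick the least recently used way among the candidates.

    `last_used[w]` is the timestamp of the most recent access to way w.
    """
    ways = candidate_ways(policy, is_superpage, partition)
    return min(ways, key=lambda w: last_used[w])


if __name__ == "__main__":
    last_used = [5, 6, 7, 8, 1, 9, 9, 9]
    print(choose_victim(last_used, "4way-8way", True, 0))   # within partition 0
    print(choose_victim(last_used, "4way-8way", False, 0))  # within the whole set
    print(choose_victim(last_used, "4way", False, 0))       # within partition 0
```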
SEESAW: System Level Optimization
• Cache coherence
  - Cache coherence lookups use the physical address
  - Snoopy coherence provides higher energy benefits than directory-based coherence
• Page table modifications
  - Superpage splintered into multiple basepages
  - Multiple basepages promoted to a superpage
SEESAW: Simulated system
[Table: simulated system configuration.]
SEESAW: Workloads
• Spec
• Parsec
• Cloudsuite
  - Tunkrank
• Biobench
  - Mummer
  - Tiger
• Server Workload
  - graph500
  - Nutch Hadoop
  - Social-event web service
  - Olia
• Key value store
  - Redis
  - MongoDB
SEESAW: Performance improvement
SEESAW achieves 3-10% better runtime than the baseline
SEESAW: Performance improvement
[Figure: two panels, out-of-order CPU and in-order CPU.]
~10% performance improvement for 64kB cache in OoO CPUs
SEESAW: Energy savings
10-20% more energy savings over CPUs using baseline VIPT caches!
Approx. one-third of energy savings from coherence
SEESAW: TFT analysis and Way-Prediction
[Figure: two panels, TFT Analysis and SEESAW + Way-prediction.]
16-entry TFT drives miss-rate under 10%
SEESAW+WP shows symbiotic behavior