Coordinated and Efficient Huge Page Management with Ingens



  1. Coordinated and Efficient Huge Page Management with Ingens
     Youngjin Kwon, Hangchen Yu, Simon Peter, Christopher J. Rossbach, and Emmett Witchel

  2. High address translation cost
     • Modern applications: large memory footprint, low memory access locality
     • TLB coverage using base pages is insufficient
     • % of CPU cycles spent by page walk
     [Figure: CPU cycles spent by page walk (axis 0% to 70%) for 429.mcf, Graph analytics, SVM, and MongoDB. Diagram: virtual address -> page table -> physical address.]

  3. High address translation cost
     • Virtualization requires additional address translation
     • % of CPU cycles spent by page walk
     [Figure: CPU cycles (axis 0% to 70%) split into guest page table walk and host page table walk for 429.mcf, Graph analytics, SVM, and MongoDB. Diagram: virtual address -> guest page table -> guest physical address -> host page table -> host physical address.]

  4. Huge pages improve TLB coverage
     • Architecture supports larger page size (e.g., 2MB page)
     • Intel: 0 to 1,536 entries in 2 years (2013 ~ 2015)
     • Operating system has the burden of better huge page support
     • TLB coverage proportional to 64 GB DRAM
     [Figure: TLB coverage as a percentage of 64 GB DRAM, 4KB page vs. 2MB page, for Sandy Bridge (2011), Ivy Bridge (2013), Haswell (2014), and Skylake (2015); labeled values range from 0.01% to 4.6%.]
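
The coverage figures on this slide follow from simple arithmetic: entries times page size, divided by DRAM size. The sketch below is only an illustration. The 1,536 entries and the 64 GB of DRAM come from the slide; treating all 1,536 entries as usable for either page size is an assumption, and the function name `tlb_coverage_percent` is made up for this example.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_coverage_percent(entries: int, page_size: int, dram_size: int) -> float:
    """Percentage of DRAM that a completely filled TLB can map at once."""
    return 100.0 * entries * page_size / dram_size


# 1,536 entries and 64 GB DRAM are taken from the slide. Assuming every
# entry can hold either page size is a simplification for illustration.
coverage_4k = tlb_coverage_percent(1536, 4 * KIB, 64 * GIB)
coverage_2m = tlb_coverage_percent(1536, 2 * MIB, 64 * GIB)

print(f"4KB pages: {coverage_4k:.4f}% of DRAM")
print(f"2MB pages: {coverage_2m:.4f}% of DRAM")
```

Under these assumptions the 2MB case comes out a little under 4.7% and the 4KB case a little under 0.01%, which is in line with the largest and smallest values labeled in the figure.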

  5. Operating system support for huge pages
     • OS transparently allocates/deallocates huge pages
     • Huge pages in both guest and host
     [Figure: FreeBSD, Linux (LWN.net, 2011)]

  6. Huge pages improve performance
     • Application speed up over using base pages only
     [Figure: speed up over base pages for each application and the average (axis 0% to 60%, higher is better).]

  7. Are huge pages a free lunch?

  12. Huge page pathologies in Linux
      • High page fault latency
      • Memory bloating
      • Unfair huge page allocation
      • Uncoordinated memory management

  14. Ingens: efficient huge page management system
      How to allocate huge pages?
      • High page fault latency: Linux uses synchronous allocation; Ingens uses asynchronous allocation
      • Memory bloating: Linux uses greedy allocation; Ingens uses spatial utilization based allocation

  15. High page fault latency

  16. Huge page allocation increases page fault latency
      • Page allocation path of both base and huge page: application pauses -> page fault handler allocates page(s) -> physical memory manager gets page(s) from free page list -> page fault handler zeroes the page(s) -> maps the page(s) to page table -> application resumes
      • Page fault latency: 3.6 us for a 4KB page, 378.0 us for a 2MB page (mostly from page zeroing)
      • Increases tail latency
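
A short calculation shows why the 2MB fault hurts tail latency even though it is cheaper per byte. The two latencies are the ones quoted on the slide. The comparison assumes, purely for illustration, that populating the same 2MB with base pages would take 512 faults at 3.6 us each.

```python
BASE_PAGE = 4 * 1024
HUGE_PAGE = 2 * 1024 * 1024

base_fault_us = 3.6    # 4KB page fault latency, from the slide
huge_fault_us = 378.0  # 2MB page fault latency, from the slide

pages_per_huge = HUGE_PAGE // BASE_PAGE  # base pages in one huge page

# How much longer a single huge page fault stalls the application.
latency_ratio = huge_fault_us / base_fault_us

# Cost of the huge page fault spread over every 4KB it covers.
amortized_us = huge_fault_us / pages_per_huge

# Assumption: filling 2MB with base pages costs one 3.6 us fault per page.
base_total_us = base_fault_us * pages_per_huge

print(f"one huge fault stalls {latency_ratio:.0f}x longer than one base fault")
print(f"amortized cost per 4KB: {amortized_us:.2f} us")
print(f"512 base faults in total: {base_total_us:.1f} us")
```

Under these assumptions the huge page is cheaper in aggregate, but the whole cost lands on a single fault, which is what shows up as tail latency.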

  17. Huge page allocation might require extra memory copying
      • Page allocation path of huge page: application pauses -> page fault handler allocates page(s) -> physical memory manager gets page(s) from free page list -> page fault handler zeroes the page(s) -> maps the page(s) to page table -> application resumes

  18. Huge page allocation might require extra memory copying
      • Page allocation path of huge page: same steps, except that the physical memory manager may find not enough contiguous memory when getting page(s) from the free page list

  19. External fragmentation
      Not enough contiguous memory

  20. External fragmentation
      Not enough contiguous memory
      • As system ages, physical memory is fragmented
      • 2 minutes to fragment 24 GB
      • All memory sizes eventually fragment
      • Linux compacts physical memory to create contiguous pages
      [Figure: virtual to physical address mapping with huge page boundaries; allocated base pages (B) are scattered across physical memory.]
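
External fragmentation can be sketched with a toy free-page map: a huge page needs an aligned run of 512 free base pages, so a handful of scattered allocations can block every huge page even when almost all memory is free. This is an illustrative model only, not how the Linux allocator tracks free memory; the function name `free_huge_regions` and the list-of-booleans layout are assumptions.

```python
PAGES_PER_HUGE = 512  # 2MB huge page / 4KB base page


def free_huge_regions(free: list[bool]) -> list[int]:
    """Return indices of aligned 512-page regions in which every page is free."""
    regions = []
    for start in range(0, len(free) - PAGES_PER_HUGE + 1, PAGES_PER_HUGE):
        if all(free[start:start + PAGES_PER_HUGE]):
            regions.append(start // PAGES_PER_HUGE)
    return regions


# Fresh system: four huge-page-sized regions, every base page free.
fresh = [True] * (4 * PAGES_PER_HUGE)

# Aged system: a single allocated base page sits inside each region.
fragmented = list(fresh)
for region in range(4):
    fragmented[region * PAGES_PER_HUGE + 7] = False

print("fresh:", free_huge_regions(fresh))
print("fragmented:", free_huge_regions(fragmented))
print("free pages while fragmented:", sum(fragmented), "of", len(fragmented))
```

In the fragmented map nearly all pages are free, yet no region qualifies. Compaction addresses this by moving the stray base pages elsewhere, which is the extra memory copying the slides refer to.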

  26. Huge page allocation might require extra memory copying
      • Page allocation path of huge page includes memory compaction: application pauses -> page fault handler allocates page(s) -> physical memory manager gets page(s) from free page list -> if not enough contiguous memory, compact physical memory -> page fault handler zeroes the page(s) -> maps the page(s) to page table -> application resumes
      • Compaction may or may not succeed

  27. Ingens: asynchronous allocation
      • Page fault handler only allocates base pages
      • Huge page allocation in background
      • Memory compaction in background
      Fast page fault handling:
      • No extra page fault latency
      • No huge page zeroing
      • No compaction
      [Figure: the page fault handler reads/updates a bit vector (1 bit per base page) on each base page fault; a promotion kernel thread performs asynchronous promotion.]
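
The split between a cheap fault path and background promotion can be sketched as follows. This is a user-space toy under stated assumptions, not the Ingens kernel code: the class name `AsyncPromoter`, the candidate queue, and the `min_pages` policy parameter are all hypothetical, and the background kernel thread is modeled as an explicit `promotion_pass()` call.

```python
from collections import deque

PAGES_PER_HUGE = 512  # 4KB base pages per 2MB huge page region


class AsyncPromoter:
    """Toy model: faults only set a bit; promotion happens in a later pass."""

    def __init__(self):
        self.bitvec = {}           # region index -> int used as a 512-bit vector
        self.promoted = set()      # regions currently backed by a huge page
        self.candidates = deque()  # regions waiting for the background pass

    def handle_fault(self, page_number: int) -> int:
        """Fault path: record one base page. No 2MB zeroing, no compaction."""
        region, offset = divmod(page_number, PAGES_PER_HUGE)
        self.bitvec[region] = self.bitvec.get(region, 0) | (1 << offset)
        # Linear membership test is fine for a sketch of this size.
        if region not in self.promoted and region not in self.candidates:
            self.candidates.append(region)
        return region

    def promotion_pass(self, min_pages: int) -> list[int]:
        """Background work: promote regions with enough populated base pages."""
        promoted_now = []
        for _ in range(len(self.candidates)):
            region = self.candidates.popleft()
            if bin(self.bitvec[region]).count("1") >= min_pages:
                self.promoted.add(region)
                promoted_now.append(region)
            else:
                self.candidates.append(region)  # revisit on a later pass
        return promoted_now


promoter = AsyncPromoter()
for page in range(500):      # touch 500 of the 512 pages in region 0
    promoter.handle_fault(page)
promoter.handle_fault(512)   # touch a single page in region 1

promoted = promoter.promotion_pass(min_pages=461)
print("promoted regions:", promoted)
print("still waiting:", list(promoter.candidates))
```

The point of the design is that everything expensive (finding contiguous memory, copying, zeroing) sits in the pass that runs off the fault path, so the faulting application only pays for a base page.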

  28. Page fault latency experiment
      Machine specification
      • Two Intel Xeon E5-2640 2.60GHz CPUs
      • 64GB memory and two 250 MB SSDs
      Cloudstone workload (latency sensitive)
      • Web service for social event planning
      • nginx/PHP/MySQL running in virtual machines
      • 85% read, 10% login, 5% write workloads
      • 2 of 7 web pages modified to use modern web page sizes
      • The average web page is 2.1 MB
      • https://www.soasta.com/blog/page-bloat-average-web-page-2-mb/

  29. Cloudstone result
      • Memory is highly fragmented
      • Throughput (requests/s): Linux 922.3, Ingens 1091.9 (+18%)
      • Ingens reduces average latency up to 29.2% and tail latency up to 41.4%
      • Linux page fault handler performs 461,383 memory compactions
      [Figure: latency in milliseconds (axis 0 to 600, lower is better), average and 90th percentile, Linux vs. Ingens, for "View event" and "Visit home page".]

  31. Memory bloating
      Application occupies more memory than it uses

  32. Internal fragmentation
      • Greedy allocation in Linux
      • Allocate a huge page on first fault to huge page region
      • The huge page region may not be fully used
      • Greedy allocation causes severe internal fragmentation
      • Memory use often sparse
      [Figure: virtual to physical address mapping with huge page boundaries; each huge page region is backed by a huge page (H) even though it mixes used and unused virtual addresses.]
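
The bloat from greedy allocation can be estimated by counting how much physical memory a sparse access pattern pins under each page size. The sketch below is a simplified model in which the first touch of a region backs the whole region with one page of the given size; the helper name `resident_bytes` and the access pattern are illustrative assumptions.

```python
BASE_PAGE = 4 * 1024
HUGE_PAGE = 2 * 1024 * 1024


def resident_bytes(touched_addresses, page_size: int) -> int:
    """Physical memory used if every touched page is backed by one page of page_size."""
    pages = {addr // page_size for addr in touched_addresses}
    return len(pages) * page_size


# Sparse use: a single address touched in each of 100 distinct 2MB regions.
sparse = [i * HUGE_PAGE for i in range(100)]

greedy = resident_bytes(sparse, HUGE_PAGE)     # huge page on first fault
base_only = resident_bytes(sparse, BASE_PAGE)  # base pages only

print(f"greedy huge pages: {greedy // (1024 * 1024)} MiB")
print(f"base pages only:   {base_only // 1024} KiB")
print(f"bloat factor:      {greedy // base_only}x")
```

This worst case, one touched base page per region, gives the upper bound on bloat for 2MB pages; real workloads fall somewhere between this and no bloat at all.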

  33. Memory bloating experiment
      • Redis: delete 70% objects after populating 8KB objects
      • MongoDB: 15 million get requests for 1KB object with YCSB
      Physical memory consumption
      • Redis: 20.7GB using huge page, 12.2GB using only base page (+69%)
      • MongoDB: 12.4GB using huge page, 10.1GB using only base page (+23%)
      Bloating makes memory consumption unpredictable
      Memory-intensive applications can't provision to avoid swap

  34. Ingens: Spatial utilization based allocation
      • Ingens monitors spatial utilization of each huge page region
      • Utilization-based allocation
      • Page fault handler requests promotion when the utilization is beyond a threshold (e.g., 90%)
      • Bounds the size of internal fragmentation
      [Figure: a region at 100% utilization is mapped with a huge page (H); regions at 75% and 25% utilization stay mapped with base pages (B).]
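
The utilization rule and the bound it implies can be written down in a few lines. The 90% threshold is the example value from the slide. The function names, the use of `>=` for "beyond a threshold", and the integer bit-vector representation are assumptions made for this sketch.

```python
import math

PAGES_PER_HUGE = 512  # 4KB base pages per 2MB huge page region
BASE_PAGE = 4096


def utilization(bitvec: int) -> float:
    """Fraction of base pages in a region that are populated (set bits)."""
    return bin(bitvec).count("1") / PAGES_PER_HUGE


def should_promote(bitvec: int, threshold: float = 0.9) -> bool:
    """Request promotion once utilization reaches the threshold (assumed >=)."""
    return utilization(bitvec) >= threshold


def max_unused_bytes(threshold: float = 0.9) -> int:
    """Most memory a promoted region can waste: pages not populated at promotion."""
    min_used = math.ceil(threshold * PAGES_PER_HUGE)
    return (PAGES_PER_HUGE - min_used) * BASE_PAGE


def region_with(pages: int) -> int:
    """Bit vector with the lowest `pages` bits set (hypothetical helper)."""
    return (1 << pages) - 1


for pages in (128, 384, 460, 461, 512):
    region = region_with(pages)
    print(f"{pages:3d} pages ({utilization(region):.1%}): promote={should_promote(region)}")

print("worst-case unused bytes per promoted region:", max_unused_bytes())
```

With a 90% threshold at least 461 of the 512 base pages must be in use before promotion, so a promoted region wastes at most 51 base pages (204 KiB), compared with up to 511 base pages under greedy allocation.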

  35. Redis memory bloating experiment
      Huge: 2MB page; Base: 4KB page
      • Physical memory consumption (lower is better): Linux (huge) 20.7 GB, Linux (base only) 12.2 GB, Ingens 12.3 GB
      • GET throughput (higher is better): Linux (base only) 19.0K, Ingens 20.9K, Linux (huge) 21.7K; Ingens is 4% below Linux (huge) and 10% above Linux (base only)
