
Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy
Po-An Tsai and Daniel Sanchez

Prior memory compression techniques are limited to compressing cache lines.


Compressing objects would be hard to do on conventional cache hierarchies. Ideally, we want a memory system that:
- Moves objects, rather than cache lines
- Transparently updates pointers during compression
Therefore, we realize our ideas on Hotpads [Tsai et al., MICRO'18], a recent object-based memory hierarchy.

Baseline system: Hotpads overview

[Figure: hierarchy of Core, L1 pad, L2 pad, L3 pad; each pad's data array holds objects followed by free space]

- Data array
  - Managed as a circular buffer using simple sequential allocation
  - Stores variable-sized objects compactly
  - Can store variable-sized compressed objects compactly too!
- Metadata
  - C-Tags: a decoupled tag store
  - Per-word/object metadata: pointer? valid? dirty? recently-used?
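The data array's allocation scheme above can be illustrated with a minimal Python sketch (a software toy, not the hardware design; wraparound is omitted, since a real pad reclaims space via bulk eviction):

```python
class Pad:
    """Toy model of a Hotpads-style data array: variable-sized objects
    are packed back to back by simple sequential (bump) allocation."""

    def __init__(self, size_words):
        self.size = size_words
        self.alloc_ptr = 0      # next free word in the data array
        self.objects = {}       # start address -> object length

    def allocate(self, length):
        """Allocate `length` words sequentially; return the start address,
        or None if the pad is full (a real pad would trigger a bulk eviction)."""
        if self.alloc_ptr + length > self.size:
            return None
        addr = self.alloc_ptr
        self.alloc_ptr += length
        self.objects[addr] = length
        return addr

pad = Pad(size_words=16)
a = pad.allocate(3)   # objects of different sizes are stored compactly,
b = pad.allocate(5)   # with no internal fragmentation between them
```

Because allocation is just a pointer bump, compressed objects of odd sizes can be stored just as compactly as uncompressed ones.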

Hotpads moves objects instead of cache lines

[Figure: RegFile, L1 pad, L2 pad, and main memory; initially, objects A and B sit in main memory, followed by free space]

- Example object:
  class ListNode {
    int value;
    ListNode next;
  }
- Step 0: Initial state. A and B live only in main memory.
- Step 1: Program code "int v = A.value;" copies A into the L1 pad.
- Step 2: Program code "v = A.next.value;" copies B into the L1 pad.
- Pointer format (uncompressed): the upper bits (63-50) encode the object's size, and the lower 48 bits (47-0) encode the object's address. Fetching size words from the starting address yields the entire object.
- Pointer format (compressed): the same fields instead hold the compressed size and the compressed object address. Fetching compressed-size words from the starting compressed address yields the entire compressed object.
- Hotpads takes control of the memory layout, hides pointers from software, and encodes object information in pointers.
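The pointer layout above can be sketched in Python (field positions follow the slide: size in bits 63-50, a 48-bit address in bits 47-0; this is an illustration, not the RTL):

```python
ADDR_BITS = 48
SIZE_SHIFT = 50   # size field occupies the upper bits (63..50) per the slide

def make_pointer(size_words, addr):
    """Pack an object's size and starting address into one 64-bit pointer."""
    return (size_words << SIZE_SHIFT) | (addr & ((1 << ADDR_BITS) - 1))

def decode_pointer(ptr):
    """Recover (size, address). Fetching `size` words starting at `address`
    yields the entire object, with no separate metadata lookup."""
    size = ptr >> SIZE_SHIFT
    addr = ptr & ((1 << ADDR_BITS) - 1)
    return size, addr

p = make_pointer(4, 0x1000)   # a 4-word object at address 0x1000
```

The same packing works unchanged for compressed objects: the fields simply hold the compressed size and compressed address instead.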

Hotpads updates pointers among objects on evictions

- Step 3: The L1 pad fills up because of fetched objects or newly allocated objects (A modified, B, C, and new object D), triggering a bulk eviction in hardware.
- Step 4: After the L1 bulk eviction, pointers are updated to point to the new locations. Copied objects (A) go back to their old locations, and new objects (D) are sequentially allocated.
- Bulk eviction amortizes the cost of finding and updating pointers across objects.
- Since updating pointers already happens in Hotpads, there is no extra cost to update them to compressed locations!
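The pointer-update step above can be sketched as a hypothetical software model (the relocation policy and data structures here are invented for illustration; Hotpads does this in hardware):

```python
def bulk_evict(objects, pointers):
    """Toy bulk eviction: relocate a batch of objects and rewrite pointers.

    `objects` maps old address -> object size; `pointers` is a list of old
    addresses stored inside other objects. Returns (relocation map, updated
    pointers)."""
    relocation, next_free = {}, 0
    for old_addr, size in objects.items():
        relocation[old_addr] = next_free   # sequential allocation at new level
        next_free += size
    # One pass rewrites every pointer; since this pass happens anyway, there
    # is no extra cost to redirect pointers to *compressed* locations.
    return relocation, [relocation[p] for p in pointers]

reloc, new_ptrs = bulk_evict({100: 2, 200: 3, 300: 1}, [200, 100])
```

Processing the whole batch at once is what amortizes the cost of finding and updating pointers across objects.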

Zippads: Locating objects without translations

- Zippads leverages Hotpads to
  - Manipulate and compress objects rather than cache lines
  - Avoid translation by pointing directly to compressed objects during evictions

[Figure: the core and private pads hold uncompressed data; the shared L3 pad and main memory hold compressed data. Objects are compressed when evicted into a compressed level and decompressed when fetched back.]

- Neutral to the compression algorithm
- Compresses both on-chip and off-chip memories

Zippads compresses objects when they move

- Objects are compressed during bulk object evictions.
- Case 1: Newly moved objects
  - Objects start their lifetime uncompressed in private levels (e.g., the L2 pad).
  - When objects are evicted into a compressed level (e.g., the L3 pad), compression hardware compresses them, and they are stored compactly in that level.
  - Zippads piggybacks on the bulk eviction process to find and update all pointers at once, amortizing update costs.
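Case 1 can be sketched as follows, using zlib as a software stand-in for the compression hardware (the real design uses hardware algorithms such as BDI, FPC, or COCO):

```python
import zlib

class CompressedLevel:
    """Toy compressed pad level: compresses objects on arrival and stores
    the variable-sized results back to back, with no alignment padding."""

    def __init__(self):
        self.data = bytearray()

    def evict_into(self, obj_bytes):
        """Compress an evicted object and append it compactly.
        Returns (address, compressed size) to encode into the pointer."""
        blob = zlib.compress(obj_bytes)
        addr = len(self.data)
        self.data += blob
        return addr, len(blob)

    def fetch(self, addr, csize):
        """Read exactly `csize` bytes at `addr` and decompress the object."""
        return zlib.decompress(bytes(self.data[addr:addr + csize]))

l3 = CompressedLevel()
obj = b"\x00" * 64            # a highly compressible object
addr, csize = l3.evict_into(obj)
```

Note that the (address, size) pair returned by `evict_into` is exactly what the updated pointer needs to hold, which is why no translation structure is required.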

Zippads compresses objects when they move (continued)

- Case 2: Dirty writebacks
  - When a dirty object is written back, compression hardware recompresses the updated (uncompressed) object.
  - If the updated compressed object no longer fits where the old compressed object was, it is allocated at a new location, and the old location is left holding a forwarding thunk plus some unused space.
  - Periodic compaction reclaims the unused space (bulk evictions in on-chip pads, GC in main memory).
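The forwarding-thunk mechanism in Case 2 can be modeled with a short sketch (a hypothetical software model; slot layout and allocation are invented for illustration):

```python
class Slot:
    """A compressed object's storage slot: holds data of up to `capacity`
    bytes, and may instead forward to a new location."""
    def __init__(self, capacity, data):
        self.capacity, self.data, self.forward_to = capacity, data, None

def writeback(store, addr, new_blob):
    """Write back a recompressed dirty object. If it still fits in its old
    slot, update in place; otherwise allocate a new slot and leave a
    forwarding thunk behind (reclaimed later by compaction/GC)."""
    slot = store[addr]
    if len(new_blob) <= slot.capacity:
        slot.data = new_blob
        return addr
    new_addr = max(store) + 1            # toy allocation of a fresh slot
    store[new_addr] = Slot(len(new_blob), new_blob)
    slot.forward_to = new_addr           # old slot now forwards to the copy
    return new_addr

store = {0: Slot(capacity=4, data=b"abcd")}
loc = writeback(store, 0, b"abcdefgh")   # object grew: no longer fits
```

The thunk keeps stale pointers functional until the next bulk eviction or GC pass rewrites them and reclaims the space.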

Zippads uses pointers to accelerate decompression

- Every object access starts with a pointer!
- Pointers are updated to the compressed locations, so no translation is needed.
- Prior work shows it is beneficial to use different compression algorithms for different data patterns.
- Zippads encodes compression metadata in pointers to decompress objects quickly:
  - Bits 63-50 hold the compressed size, the compressed object address takes 48-X bits, and X compression-encoding bits select the algorithm.
- Zippads thus knows where to locate an object and which decompression algorithm to use when accessing compressed objects through pointers.
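The encoding-bit scheme can be sketched like this (X=2 and the algorithm mapping are illustrative choices; the slide leaves X and the encoding table unspecified):

```python
# Hypothetical layout: bits 63-50 size, X encoding bits atop the address field.
X = 2                      # illustrative number of compression-encoding bits
SIZE_SHIFT = 50
ENC_SHIFT = 48 - X         # encoding bits sit just above the address bits
ALGOS = {0: "uncompressed", 1: "BDI", 2: "FPC", 3: "COCO"}  # example mapping

def decode(ptr):
    """Recover (compressed size, algorithm, address) from a single pointer,
    so the object can be located and decompressed without any translation."""
    size = ptr >> SIZE_SHIFT
    enc = (ptr >> ENC_SHIFT) & ((1 << X) - 1)
    addr = ptr & ((1 << ENC_SHIFT) - 1)
    return size, ALGOS[enc], addr

ptr = (3 << SIZE_SHIFT) | (1 << ENC_SHIFT) | 0x2000
```

A single pointer read tells the hardware everything it needs: how much to fetch, from where, and which decompressor to feed it to.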

COCO: Cross-object-compression algorithm

- COCO exploits similarity across objects by compressing them against shared base objects, a collection of representative objects.
- To compress an object, compression hardware diffs it against its base object; the compressed object stores a pointer to the base object plus the bytes that differ.
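COCO's diff-against-a-base idea can be sketched in Python (a software model with an invented diff encoding; the paper implements this in hardware):

```python
def coco_compress(obj, base, base_id):
    """Store only the (offset, byte) pairs where `obj` differs from its
    base object, plus a reference to that base. Assumes equal sizes."""
    diffs = [(i, b) for i, (a, b) in enumerate(zip(base, obj)) if a != b]
    return (base_id, diffs)

def coco_decompress(compressed, bases):
    """Rebuild the object by patching a copy of the base object."""
    base_id, diffs = compressed
    out = bytearray(bases[base_id])     # start from the shared base object
    for i, b in diffs:
        out[i] = b                      # apply the differing bytes
    return bytes(out)

bases = {0: b"\x01\x02\x03\x04\x05\x06\x07\x08"}
obj = b"\x01\x02\xff\x04\x05\x06\x07\xaa"   # differs from base in 2 bytes
c = coco_compress(obj, bases[0], 0)
```

When objects of the same type share most of their bytes with a representative instance, the stored diff is far smaller than the object itself.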

COCO: Cross-object-compression algorithm (continued)

- COCO requires accessing base objects for every compression/decompression.
- Caching base objects avoids the extra latency and bandwidth to fetch them.
  - A small (8 KB) base object cache works well, because a few types account for most accesses.
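The base object cache can be modeled as a small bounded cache in front of the base-object store (the capacity and LRU policy here are illustrative; the slide only says a small 8 KB cache suffices):

```python
from collections import OrderedDict

class BaseObjectCache:
    """Tiny LRU cache for base objects, so compression/decompression rarely
    pays the latency and bandwidth of fetching a base from memory."""

    def __init__(self, capacity):
        self.capacity, self.cache = capacity, OrderedDict()
        self.hits = self.misses = 0

    def get(self, base_id, fetch_from_memory):
        if base_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(base_id)      # mark most recently used
        else:
            self.misses += 1
            self.cache[base_id] = fetch_from_memory(base_id)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used
        return self.cache[base_id]

mem = {0: b"A" * 8, 1: b"B" * 8}
boc = BaseObjectCache(capacity=2)
boc.get(0, mem.__getitem__)   # cold miss
boc.get(0, mem.__getitem__)   # hit: a few hot types dominate accesses
```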

See the paper for additional features and details:
- Compressing large objects with subobjects and allocate-on-access
- COCO compression/decompression circuit RTL implementation details
- Details on integrating Zippads and COCO
- Discussion of using COCO with conventional memory hierarchies

Evaluation

- We simulate Zippads using MaxSim [Rodchenko et al., ISPASS'17], a simulator combining ZSim and the Maxine JVM.
- We compare 4 schemes:
  - Uncomp: Conventional 3-level cache hierarchy with no compression
  - CMH: Compressed memory hierarchy
    - LLC: VSC [Alameldeen and Wood, ISCA'04]
    - Main memory: LCP [Pekhimenko et al., MICRO'13]
    - Algorithm: HyComp-style hybrid algorithm [Arelakis et al., MICRO'15], combining BDI [Pekhimenko et al., PACT'12] and FPC [Alameldeen and Wood, ISCA'04]
  - Hotpads: The baseline system we build on
  - Zippads: With and without COCO
- Workloads: 8 Java apps with large memory footprints from different domains

Zippads improves compression ratio

[Figure: compression ratios of CMH, Zippads, and Zippads+COCO across apps. Zippads uses the same algorithm as CMH; Zippads+COCO uses the CMH algorithm plus COCO. CMH is only 24% better than Uncomp; Zippads is 70% better; Zippads+COCO is 2X better.]

1. Both Zippads and CMH work well in array-heavy apps.
2. Zippads works much better than CMH in object-heavy apps.

Zippads reduces memory traffic and improves performance

[Figure: normalized memory traffic per scheme; lower is better]

1. CMH reduces traffic by 15% with data compression.
2. Hotpads reduces traffic by 66% with object-based data movement.
3. Zippads combines the benefits of both, reducing traffic by 2X (70% less traffic than CMH).
