Compressing objects would be hard to do on cache hierarchies.
Ideally, we want a memory system that:
- Moves objects, rather than cache lines
- Transparently updates pointers during compression
Therefore, we realize our ideas on Hotpads [Tsai et al., MICRO'18], a recent object-based memory hierarchy.
Baseline system: Hotpads overview
[Figure: Core → L1 pad → L2 pad → L3 pad, each pad with a data array and a decoupled tag store]
Data array:
- Managed as a circular buffer using simple sequential allocation
- Stores variable-sized objects compactly (e.g., Obj. A and Obj. B packed together, followed by free space)
- Can store variable-sized compressed objects compactly too!
Decoupled tag store:
- C-Tags
- Per-word/object metadata: pointer? valid? dirty? recently-used?
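To make the sequential-allocation point concrete, here is a minimal Java sketch of a pad's data array as a bump-pointer allocator for variable-sized objects. The class and method names (PadDataArray, alloc) are illustrative, not Hotpads' actual interface.

```java
// Minimal sketch of a pad's data array: a circular buffer with simple
// bump-pointer (sequential) allocation of variable-sized objects.
final class PadDataArray {
    private final long[] words;   // the data array, at word granularity
    private int allocPtr = 0;     // bump pointer: index of the next free word

    PadDataArray(int sizeInWords) { words = new long[sizeInWords]; }

    /** Allocates sizeInWords contiguous words; returns the start index, or -1 if the pad is full. */
    int alloc(int sizeInWords) {
        if (allocPtr + sizeInWords > words.length) return -1; // pad full: a bulk eviction would compact and free space
        int start = allocPtr;
        allocPtr += sizeInWords;
        return start;             // variable-sized (and compressed) objects end up packed back to back
    }
}
```

Because allocation is just a pointer bump, compressed objects of arbitrary size fit as compactly as uncompressed ones.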
Hotpads moves objects instead of cache lines
Example object:
  class ListNode { int value; ListNode next; }
[Figure: RegFile (r0-r3), L1 pad, L2 pad, main memory]
0. Initial state: objects A and B live in main memory; the pads hold only free space.
1. int v = A.value;      → A is copied into the L1 pad.
2. v = A.next.value;     → B is copied into the L1 pad.
Hotpads takes control of the memory layout, hides pointers from software, and encodes object information in pointers.
Pointer format (uncompressed): | size | object address (48 bits) |
- Fetching "size" words from the starting address yields the entire object.
Pointer format (compressed): | compressed size | compressed object address (48 bits) |
- Fetching "compressed size" words from the starting compressed address yields the entire compressed object.
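A small sketch of packing and unpacking a Hotpads-style pointer. The slide fixes only the 48-bit address field, so the exact width and placement of the size field here are assumptions.

```java
// Sketch of a Hotpads-style fat pointer: a 48-bit object address in the low
// bits and the object size (in words) in the upper bits. Field widths beyond
// the 48-bit address are illustrative.
final class ObjPointer {
    static final int ADDR_BITS = 48;
    static final long ADDR_MASK = (1L << ADDR_BITS) - 1;

    static long encode(long addr, long sizeWords) {
        return (sizeWords << ADDR_BITS) | (addr & ADDR_MASK);
    }
    static long addr(long ptr) { return ptr & ADDR_MASK; }   // where the object starts
    static long size(long ptr) { return ptr >>> ADDR_BITS; } // how many words to fetch
    // Fetching size(ptr) words starting at addr(ptr) yields the whole object.
}
```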
Hotpads updates pointers among objects on evictions
3. The L1 pad fills up with fetched or newly allocated objects (A modified, B, C, D), triggering a bulk eviction in hardware.
4. After an L1 bulk eviction, pointers are updated to point to the new locations:
- Copied objects (A) are written back to their old locations
- New objects (D) are sequentially allocated in the next level
Bulk eviction amortizes the cost of finding and updating pointers across objects.
Since updating pointers already happens in Hotpads, there is no extra cost to update them to compressed locations!
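The pointer-update step of a bulk eviction can be sketched as a pass over each object that rewrites its pointer words through an old-address → new-address relocation map (reusing the ObjPointer sketch above). The names and the software-style map are illustrative stand-ins for the hardware mechanism.

```java
import java.util.Map;

// Sketch of the pointer-update pass of a bulk eviction: every moved object
// gets a new location, and pointer words in objects are rewritten through an
// old-address -> new-address relocation map built during the eviction.
final class BulkEviction {
    /** Rewrites one object's pointer fields in place using the relocation map. */
    static void updatePointers(long[] objWords, boolean[] isPointer, Map<Long, Long> relocation) {
        for (int i = 0; i < objWords.length; i++) {
            if (!isPointer[i]) continue;                  // per-word metadata marks pointer words
            long oldAddr = ObjPointer.addr(objWords[i]);
            Long newAddr = relocation.get(oldAddr);
            if (newAddr != null) {                        // target moved, possibly into a compressed level
                long size = ObjPointer.size(objWords[i]);
                objWords[i] = ObjPointer.encode(newAddr, size);
            }
        }
    }
}
```

Because this pass already runs on every bulk eviction, retargeting pointers to compressed locations adds no extra work.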
Zippads: Locating objects without translations
Zippads leverages Hotpads to:
- Manipulate and compress objects rather than cache lines
- Avoid translation by pointing directly to compressed objects during evictions
[Figure: Core → L1 pad → L2 pad (uncompressed) → L3 pad → main memory (compressed); objects are compressed on eviction into the compressed levels and decompressed on fetch]
- Compresses both on-chip and off-chip memories
- Neutral to the compression algorithm
Zippads compresses objects when they move
Objects are compressed during bulk object evictions.
Case 1: Newly moved objects
- Objects start their lifetime uncompressed in the private levels (e.g., the L2 pad).
- When objects are evicted into a compressed level (e.g., the L3 pad), compression hardware compresses them there and stores them compactly.
- Zippads piggybacks on the bulk eviction process to find and update all pointers at once, amortizing update costs.
Zippads compresses objects when they move
Objects are compressed during bulk object evictions.
Case 2: Dirty writebacks
- A dirty object evicted from the L2 pad is recompressed by the compression hardware before being written back to the compressed L3 pad.
- If the updated compressed object does not fit in its old location, it is written to a new location; the old location holds a forwarding thunk and becomes unused space.
- Periodic compaction reclaims this unused space (bulk evictions in on-chip pads, GC in main memory).
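A sketch of the Case 2 writeback path under stated assumptions: if the recompressed object still fits in its old slot it is overwritten in place, otherwise it is appended to the compressed level and a forwarding thunk is left behind. The in-place path, the thunk encoding, and the reuse of the PadDataArray sketch are illustrative assumptions, not the paper's exact mechanism.

```java
// Sketch of a dirty writeback into a compressed level.
final class CompressedLevel {
    static final long THUNK_TAG = 1L << 63;                 // assumed marker for a forwarding thunk

    long writeBack(long oldAddr, int oldSlotWords, long[] compressed, PadDataArray dataArray) {
        if (compressed.length <= oldSlotWords) {
            store(oldAddr, compressed);                     // still fits: overwrite in place
            return oldAddr;
        }
        int newAddr = dataArray.alloc(compressed.length);   // does not fit: append at the tail
        store(newAddr, compressed);
        store(oldAddr, new long[]{ THUNK_TAG | newAddr });  // forwarding thunk until pointers catch up
        return newAddr;                                     // old slot becomes unused space, reclaimed by compaction/GC
    }
    private void store(long addr, long[] words) { /* backing-store write elided in this sketch */ }
}
```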
Zippads uses pointers to accelerate decompression
- Every object access starts with a pointer!
- Pointers are updated to the compressed locations, so no translation is needed.
- Prior work shows it is beneficial to use different algorithms for different data patterns.
- Zippads encodes compression metadata in pointers to decompress objects quickly.
Pointer format: | compressed size | compressed object address (48−X bits) | compression encoding bits (X bits) |
Zippads thus knows, directly from a pointer, where a compressed object is located and which decompression algorithm to use.
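A sketch of how the encoding bits select a decompressor directly from the pointer. The value X = 2, the algorithm IDs, and the exact field layout are assumptions for illustration.

```java
// Sketch of decoding a Zippads-style compressed pointer: the low X bits pick
// the decompression algorithm, the next 48-X bits give the compressed address,
// and the upper bits give the compressed size.
final class CompressedPointer {
    static final int ENC_BITS = 2;                          // X: assumed two encoding bits
    static final int ADDR_BITS = 48 - ENC_BITS;
    static final long ENC_MASK  = (1L << ENC_BITS) - 1;
    static final long ADDR_MASK = (1L << ADDR_BITS) - 1;

    static long encoding(long ptr) { return ptr & ENC_MASK; }                 // which algorithm
    static long address(long ptr)  { return (ptr >>> ENC_BITS) & ADDR_MASK; } // where the object is
    static long size(long ptr)     { return ptr >>> 48; }                     // compressed size in words

    static long[] decompress(long ptr, long[] compressedWords) {
        switch ((int) encoding(ptr)) {
            case 0:  return compressedWords;                // stored uncompressed
            case 1:  return decodeHybrid(compressedWords);  // BDI/FPC-style hybrid algorithm
            default: return decodeCoco(compressedWords);    // COCO (diff against a base object)
        }
    }
    private static long[] decodeHybrid(long[] w) { return w; } // decoder bodies elided in this sketch
    private static long[] decodeCoco(long[] w)   { return w; } // decoder bodies elided in this sketch
}
```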
COCO: Cross-object-compression algorithm
COCO exploits similarity across objects by compressing them against shared base objects, a collection of representative objects.
- The compression hardware compares an uncompressed object with its base object.
- The compressed object stores a pointer to the base object plus only the bytes that differ from it.
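A minimal sketch of the COCO idea: the compressed form is a reference to a base object plus the bytes that differ from it. The list-of-diffs representation here is illustrative; the hardware packs this information into a compact bit format.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of COCO-style cross-object compression: diff an object against a
// shared base object, keeping only the (offset, value) pairs that differ.
final class Coco {
    static List<int[]> compress(byte[] obj, byte[] base) {
        List<int[]> diffs = new ArrayList<>();               // each entry: {offset, differing byte}
        for (int i = 0; i < obj.length; i++) {
            if (i >= base.length || obj[i] != base[i]) diffs.add(new int[]{i, obj[i] & 0xFF});
        }
        return diffs;                                        // small when the object closely matches its base
    }

    static byte[] decompress(List<int[]> diffs, byte[] base, int objLength) {
        byte[] obj = java.util.Arrays.copyOf(base, objLength); // start from the base object...
        for (int[] d : diffs) obj[d[0]] = (byte) d[1];          // ...then patch the differing bytes
        return obj;
    }
}
```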
COCO: Cross-object-compression algorithm
COCO requires accessing base objects on every compression/decompression.
- Caching base objects avoids the extra latency and bandwidth to fetch them.
- A small (8 KB) base object cache works well, because a few types account for most accesses.
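A sketch of a small base-object cache, here an 8 KB LRU cache keyed by the base object's address. The keying and replacement policy are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an 8 KB base-object cache used by the compression hardware so that
// base objects need not be refetched on every compression/decompression.
final class BaseObjectCache {
    private static final int CAPACITY_BYTES = 8 * 1024;
    private int usedBytes = 0;
    private final LinkedHashMap<Long, byte[]> cache = new LinkedHashMap<>(16, 0.75f, true); // access order = LRU

    byte[] get(long baseAddr) { return cache.get(baseAddr); } // null means fetch from the memory hierarchy

    void put(long baseAddr, byte[] baseObj) {
        byte[] prev = cache.put(baseAddr, baseObj);
        if (prev != null) usedBytes -= prev.length;
        usedBytes += baseObj.length;
        while (usedBytes > CAPACITY_BYTES && !cache.isEmpty()) {      // evict least-recently-used entries
            Map.Entry<Long, byte[]> lru = cache.entrySet().iterator().next();
            usedBytes -= lru.getValue().length;
            cache.remove(lru.getKey());
        }
    }
}
```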
See paper for additional features and details
- Compressing large objects with subobjects and allocate-on-access
- COCO compression/decompression circuit RTL implementation details
- Details on integrating Zippads and COCO
- Discussion on using COCO with conventional memory hierarchies
Evaluation
We simulate Zippads using MaxSim [Rodchenko et al., ISPASS'17], a simulator combining ZSim and the Maxine JVM.
We compare 4 schemes:
- Uncomp: Conventional 3-level cache hierarchy with no compression
- CMH: Compressed memory hierarchy
  - LLC: VSC [Alameldeen and Wood, ISCA'04]
  - Main memory: LCP [Pekhimenko et al., MICRO'13]
  - Algorithm: HyComp-style hybrid algorithm [Arelakis et al., MICRO'15] of BDI [Pekhimenko et al., PACT'12] + FPC [Alameldeen and Wood, ISCA'04]
- Hotpads: The baseline system we build on
- Zippads: With and without COCO
Workloads: 8 Java apps with large memory footprints from different domains
Zippads improves compression ratio
[Figure: compression ratio per app for CMH, Zippads (same algorithm as CMH), and Zippads + COCO (CMH's algorithm plus COCO)]
- CMH: only 24% better than Uncomp.
- Zippads (same algorithm as CMH): 70% better.
- Zippads + COCO (CMH's algorithm plus COCO): 2X better.
1. Both Zippads and CMH work well in array-heavy apps.
2. Zippads works much better than CMH in object-heavy apps.
Zippads reduces memory traffic and improves performance
[Figure: main memory traffic per app, normalized to Uncomp (lower is better)]
1. CMH reduces traffic by 15% with data compression.
2. Hotpads reduces traffic by 66% with object-based data movement.
3. Zippads combines the benefits of both, reducing traffic by 2X (70% less traffic than CMH).