
A True Hardware Read Barrier (Matthias Meyer)


  1. A True Hardware Read Barrier
     Matthias Meyer
     Institute of Communication Networks and Computer Engineering, University of Stuttgart, Germany
     matthias.meyer@ikr.uni-stuttgart.de
     International Symposium on Memory Management, June 10–11, 2006, Ottawa, Canada
     Slide header: Institut für Nachrichtenvermittlung und Datenverarbeitung / Institut für Kommunikationsnetze und Rechnersysteme, Universität Stuttgart, Prof. Dr.-Ing. Dr. h. c. mult. P. J. Kühn

  2. Outline: A True Hardware Read Barrier
     ❐ Real-time garbage collection: the synchronization problem
     ❐ A hardware-supported approach
       ✗ Novel processor architecture
       ✗ Garbage collection coprocessor
       ✗ Prototype
     ❐ The read barrier
       ✗ Effect on mutator progress
       ✗ A closer look at the read barrier fault handler
       ✗ Novel hardware read barrier design
     ❐ Conclusions and further work
     Institut für Kommunikationsnetze und Rechnersysteme, Universität Stuttgart

  3. Real-Time Garbage Collection: The synchronization problem
     [Figure: the application ("mutator") and the garbage collector (GC) both operate on the root set and the heap memory]
     ❐ Mutator and GC modify graph of objects
       ➠ read or write barriers
     ❐ Mutator and GC access same object
       ➠ mechanisms for mutual exclusion or atomic processing of objects
     ❐ Critical regions (root set processing)
       ➠ unbounded pauses
     ➠ high synchronization overhead
     ➠ no hard real-time capabilities

  4. A Novel Processor Architecture (1)
     Basic idea
     ❐ Hide garbage collection at the assembly language level
     ❐ Efficiently realize garbage collection and synchronization in hardware
     Precondition
     ❐ Knowledge of pointers and objects in hardware
     Novel approach
     ❐ Strictly separate pointers from non-pointer data
       ✗ in the register file
       ✗ in the instruction set
       ✗ in memory
     Object structure
     ❐ Attributes: π and δ
     ❐ Pointer area: entries 0 to π−1
     ❐ Data area: entries 0 to δ−1
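
The object structure on slide 4 can be illustrated with a small software model. This is only a sketch: the class name `HeapObject`, its method names, and the choice to count π and δ in entries are assumptions made for illustration, not part of the presented hardware.

```python
class HeapObject:
    """Hypothetical model of an object with separate pointer and data areas."""

    def __init__(self, pi, delta):
        self.pi = pi                       # attribute π: size of the pointer area
        self.delta = delta                 # attribute δ: size of the data area
        self.pointer_area = [None] * pi    # entries 0 .. π-1, pointers only
        self.data_area = [0] * delta       # entries 0 .. δ-1, non-pointer data only

    @staticmethod
    def _check(index, size):
        # Reject accesses outside the area described by the attribute.
        if not 0 <= index < size:
            raise IndexError("index outside the area described by the attributes")

    def load_pointer(self, index):
        self._check(index, self.pi)
        return self.pointer_area[index]

    def store_pointer(self, index, target):
        self._check(index, self.pi)
        if target is not None and not isinstance(target, HeapObject):
            raise TypeError("only pointers may be stored in the pointer area")
        self.pointer_area[index] = target

    def load_data(self, index):
        self._check(index, self.delta)
        return self.data_area[index]

    def store_data(self, index, value):
        self._check(index, self.delta)
        if isinstance(value, HeapObject):
            raise TypeError("pointers may not be stored in the data area")
        self.data_area[index] = value
```

Because pointers and data never share an area, a collector that knows π can find every pointer in an object without any type information from the compiler, which is the precondition named on the slide.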

  5. A Novel Processor Architecture (2)
     Extensions to a classical RISC pipeline
     ❐ Separate data and pointer registers
     ❐ Extend pointer registers by attributes
     ❐ Add PGU for operations that generate pointers (allocate, copy pointer)
     ❐ Add attribute stage for efficient attribute access
     [Figure: pipeline with stages Fetch, Decode, Execute, Memory, Attribute; blocks: instruction cache, register set (pointer registers with π and δ), ALU, AGU, PGU, data cache, attribute cache]

  6. A Novel Processor Architecture (3)
     Support for concurrent compaction
     ❐ Extend pointer register set by backlink entry
     ❐ Extend attribute cache by backlink entry
     ❐ AGU dynamically uses tospace pointer or backlink for address generation
     [Figure: fromspace and tospace; labels: forwarding pointer, scan, π, δ, backlink; pipeline diagram as on slide 5]

  7. A Novel Processor Architecture (4)
     The read barrier
     ❐ Two comparators check loaded pointers (hardware read barrier)
     ❐ Read barrier will trigger interrupt if loaded pointer refers to fromspace
     ❐ Interrupt handled by a dedicated garbage collection coprocessor
     [Figure: pipeline diagram as on slide 5, extended by the read barrier and the GC coprocessor]
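
The check on slide 7 can be sketched in software as follows. The slide only states that two comparators check loaded pointers; that they compare against a lower and an upper fromspace bound, the half-open range, and all function and parameter names here are assumptions for illustration.

```python
def read_barrier_faults(pointer, fromspace_base, fromspace_limit):
    """Return True if a loaded pointer refers to fromspace.

    Models two comparators, one per bound. The half-open range
    [fromspace_base, fromspace_limit) is an assumption.
    """
    at_or_above_base = pointer >= fromspace_base   # comparator 1
    below_limit = pointer < fromspace_limit        # comparator 2
    return at_or_above_base and below_limit


def load_pointer(memory, address, fromspace_base, fromspace_limit, on_fault):
    """Load a pointer and invoke a fault handler if it points into fromspace.

    `on_fault` stands in for the interrupt; it is assumed to return the
    tospace pointer that replaces the faulting one.
    """
    pointer = memory[address]
    if read_barrier_faults(pointer, fromspace_base, fromspace_limit):
        pointer = on_fault(address)
    return pointer
```

With this check on every pointer load, the mutator never holds a fromspace pointer in a register, which is the tospace invariant referred to later in the talk.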

  8. Garbage Collection Coprocessor
     Features
       ✗ performs garbage collection concurrently with application processing
       ✗ low-cost device, specialized for garbage collection
     Integration
       ✗ tightly coupled to main processor
       ✗ realized on same device
       ✗ separate ports to memory controller
     Memory interface
       ✗ no temporal locality: no cache!
       ✗ spatial locality: burst registers!
     Algorithm
       ✗ based on Baker’s algorithm
       ✗ directly implemented in microcode
     [Figure: main processor (with caches) and GC coprocessor, both connected to the memory controller]

  9. Prototype
     [Photo of the prototype board; labels: serial, parallel, Ethernet, PS/2, DVI, standard SDRAM modules, main processor with on-chip GC coprocessor]

  10. Prototype
      Hardware
      ❐ Main processor: 3-way multiple issue, “in order”
      ❐ GC coprocessor: 256 x 80 bit microcode memory
      ❐ Synchronously operated at 25 MHz
      Software
      ❐ Static Java compiler (bytecode to machine code)
      ❐ Subset of the Java class libraries
      Features
      ❐ Low-cost fine-grained synchronization
        ✗ independent of compiler and runtime system
        ✗ no code size overhead, little runtime overhead
      ❐ First known system that limits any GC-related pause to at most 500 clock cycles
      Question
      How are the pauses distributed over time?

  11. Read barrier effect on mutator progress
      Experimental results
      [Plot: percentage of pause cycles in intervals of 500 clock cycles, benchmark “database”; y-axis 0 to 100 %, x-axis 0 s to 15 s, with a zoomed view from 3.04 s to 3.20 s]
      Minimum mutator utilization
      ❐ 1 ms intervals: 7.2%
      ❐ 5 ms intervals: 8.3%
      ❐ 25 ms intervals: 11.4%
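
Minimum mutator utilization is the smallest fraction of any time window of a given length that is left to the mutator. A minimal sketch of how such a figure could be computed from a per-cycle pause trace is shown below; the trace format and the names `minimum_mutator_utilization` and `CYCLES_PER_MS` are assumptions, and this is not the measurement setup used for the slide. At the prototype's 25 MHz clock, a 1 ms window corresponds to 25,000 cycles.

```python
CLOCK_HZ = 25_000_000               # prototype clock frequency from slide 10
CYCLES_PER_MS = CLOCK_HZ // 1000    # 25,000 cycles per millisecond


def minimum_mutator_utilization(paused, window):
    """Return the worst-case mutator share over all windows of `window` cycles.

    `paused` holds one flag per clock cycle (True = mutator is paused).
    """
    if window <= 0 or window > len(paused):
        raise ValueError("window must be between 1 and the trace length")
    # Count pause cycles in the first window, then slide it one cycle at a time.
    pause_cycles = sum(1 for flag in paused[:window] if flag)
    worst = pause_cycles
    for i in range(window, len(paused)):
        pause_cycles += int(paused[i]) - int(paused[i - window])
        worst = max(worst, pause_cycles)
    return (window - worst) / window
```

A short pause burst hurts small windows most: the same trace yields a lower utilization for a 1 ms window than for a 25 ms window, which matches the trend in the figures above.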

  12. A closer look at the read barrier fault handler
      Trigger: Processor reads fromspace pointer
      [Figure: fromspace object with attributes π and δ; tospace with allocation pointer free; main processor (with caches) and GC coprocessor connected to the memory controller]

  13. A closer look at the read barrier fault handler
      Step 1: Coprocessor reads faulting pointer
      [Figure as on slide 12]

  14. A closer look at the read barrier fault handler
      Step 2: Coprocessor reads object attributes
      [Figure as on slide 12]

  15. A closer look at the read barrier fault handler
      Step 3: Coprocessor advances free
      free_new = free + 8 + π + δ
      [Figure as on slide 12, with free advanced to free_new]

  16. A closer look at the read barrier fault handler
      Step 4: Coprocessor overwrites fromspace attributes
      [Figure: fromspace and tospace; labels: forwarding pointer, π, free_new]

  17. A closer look at the read barrier fault handler
      Step 5: Coprocessor initializes tospace attributes
      [Figure: fromspace and tospace; labels: π, δ, backlink, free_new]

  18. A closer look at the read barrier fault handler
      Step 6: Coprocessor updates fromspace pointer
      [Figure: fromspace and tospace; labels: π, δ, free_new]
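
The six handler steps on slides 13 to 18 can be followed in a small software model. This is a hedged sketch, not the coprocessor's microcode: the dictionary-based memory, the `Heap` class, and all field names are hypothetical. It also assumes that the handler only reserves tospace memory and leaves the copying of the object body to a later scan (as the backlink on slide 6 suggests), and it does not model an object that has already been forwarded.

```python
HEADER_BYTES = 8  # attribute size implied by "free + 8 + π + δ" on slide 15


class Heap:
    """Toy memory model; every name in it is hypothetical."""

    def __init__(self, free):
        self.slots = {}    # address of a pointer field -> pointer stored there
        self.attrs = {}    # object address -> attribute fields of that object
        self.free = free   # tospace allocation pointer


def handle_read_barrier_fault(heap, slot_address):
    # Step 1: read the faulting pointer from the field the mutator loaded.
    from_address = heap.slots[slot_address]
    # Step 2: read the attributes of the fromspace object.
    pi = heap.attrs[from_address]["pi"]
    delta = heap.attrs[from_address]["delta"]
    # Step 3: advance free to reserve room for attributes, pointers, and data.
    to_address = heap.free
    heap.free = to_address + HEADER_BYTES + pi + delta
    # Step 4: overwrite the fromspace attributes with a forwarding pointer
    # (keeping π next to it is an assumed reading of slide 16).
    heap.attrs[from_address] = {"forward": to_address, "pi": pi}
    # Step 5: initialize the tospace attributes, including the backlink.
    heap.attrs[to_address] = {"pi": pi, "delta": delta, "backlink": from_address}
    # Step 6: replace the fromspace pointer in the faulting field.
    heap.slots[slot_address] = to_address
    return to_address
```

In this model every step except the third touches shared memory, which is consistent with the analysis on the following slide that fault handling stays expensive even with hardware support.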

  19. A novel hardware read barrier design
      Analysis
      ❐ Read barrier fault handling expensive despite hardware support
      ❐ Necessary to sacrifice the tospace invariant to avoid clustering?
      Insights
      1. Read barrier in hardware ... but read barrier fault handling still in software
      2. Processors communicate expensively via main memory ... because faulting pointer local to main processor, not to garbage collector
      Novel idea: Live with the clustering, save the tospace invariant
      1. Increase efficiency of the handler
         ➠ Realize fault handling completely in hardware!
      2. Resolve the locality issue
         ➠ Move fault handling to main processor!

  20. A novel hardware read barrier design
      Trigger: Processor reads fromspace pointer
      [Figure: fromspace object with attributes π and δ; tospace with allocation pointer free; main processor pipeline (Fetch, Decode, Execute, Memory, Attribute) with instruction cache, register set, ALU, AGU, PGU, data cache, attribute cache, and read barrier]

  21. A novel hardware read barrier design
      Step 1: Advance free, write fromspace attributes, update fromspace pointer
      free_new = free + 8 + π + δ
      [Figure as on slide 20, with free advanced to free_new]

  22. A novel hardware read barrier design
      Step 2: Initialize tospace attributes
      [Figure as on slide 20; tospace object with attributes π and δ, free_new]

  23. A novel hardware read barrier design
      Experimental results
      [Plot: percentage of stall cycles within intervals of 500 clock cycles, benchmark “database”; y-axis 0 to 100 %, x-axis 0 s to 15 s, with a zoomed view from 3.04 s to 3.20 s]
      Minimum mutator utilization (values in parentheses: coprocessor-based handler, slide 11)
      ❐ 1 ms intervals: 56.8% (7.2%)
      ❐ 5 ms intervals: 58.1% (8.3%)
      ❐ 25 ms intervals: 62.1% (11.4%)
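
The gain can be expressed as a ratio of the two sets of minimum mutator utilization figures from slides 11 and 23. The snippet below only does this arithmetic on the published numbers; the variable and function names are made up for illustration.

```python
# Minimum mutator utilization in percent: (coprocessor handler, in-pipeline handler)
mmu_percent = {
    "1 ms": (7.2, 56.8),
    "5 ms": (8.3, 58.1),
    "25 ms": (11.4, 62.1),
}


def improvement_factor(before, after):
    """Ratio of the new utilization to the old one."""
    return after / before


# One rounded factor per window length.
factors = {
    window: round(improvement_factor(before, after), 1)
    for window, (before, after) in mmu_percent.items()
}
```

For these windows the worst-case mutator share improves by a factor of roughly five to eight, with the largest gain for the shortest window.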
