Online Phase-Adaptive Data Layout Selection
Chengliang Zhang (Microsoft; former IBM intern) and Martin Hirzel (IBM)
ECOOP, 10 July 2008
Problem Statement
No training run.
[Figure: online phase-adaptive data layout selection; Objects 1-4 placed on cache lines or pages A and B; loop: measure, decide, try]
Data Layouts from Copying Garbage Collection
[Figure: the same objects laid out in BF (breadth-first) and HI (hierarchical) copying order]
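A copying collector lays objects out in the order it copies them, so its traversal order is the data layout. As a simplified illustration (not the hierarchical HI algorithm, which this slide does not detail), the Python sketch below computes the copy order of a breadth-first, Cheney-style scan and of a depth-first scan (another layout named on the Multi-Armed Bandit slide) over a hypothetical object graph; `breadth_first_order` and `depth_first_order` are illustrative names.

```python
from collections import deque
from typing import Dict, List

# Hypothetical heap: each object name maps to the objects it references.
Heap = Dict[str, List[str]]


def breadth_first_order(heap: Heap, roots: List[str]) -> List[str]:
    """Copy order of a Cheney-style copying collector: objects land in
    to-space in the order they are first discovered, scanned FIFO."""
    order: List[str] = []
    seen = set()
    queue = deque()
    for root in roots:
        if root not in seen:
            seen.add(root)
            queue.append(root)
    while queue:
        obj = queue.popleft()
        order.append(obj)  # next free to-space slot
        for child in heap.get(obj, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def depth_first_order(heap: Heap, roots: List[str]) -> List[str]:
    """Copy order when the collector scans with a LIFO stack instead."""
    order: List[str] = []
    seen = set()
    stack = list(reversed(roots))
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        order.append(obj)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(heap.get(obj, [])))
    return order


heap = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
print(breadth_first_order(heap, ["A"]))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(depth_first_order(heap, ["A"]))    # ['A', 'B', 'D', 'E', 'C', 'F']
```

BF places the siblings B and C next to each other, while depth-first places B directly before its children D and E; which arrangement keeps co-accessed objects on the same cache line or page depends on the program.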
Layout Performance Comparison
[Chart: HI vs. BF performance on an 8-processor AMD machine; regions labeled "HI faster" and "BF faster"]
Multi-Armed Bandit Problem
Candidate layouts (arms): HI, depth-first, BF, allocation order, size, popularity, type, random, thread
Layout Auditing
[Figure: feedback loop. The profiler measures the program's performance and turns it into rewards (profiling data); the controller makes a layout decision; the data layout reorganizer tries that layout on the program.]
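The loop in the figure can be sketched in Python as follows. This is a minimal skeleton under stated assumptions: `layout_auditing`, `greedy_controller`, and the idea that one reward is reported per interval are hypothetical, and the placeholder controller (try every layout once, then keep the best mean) is deliberately naive.

```python
from typing import Callable, Dict, List

History = Dict[str, List[float]]  # layout -> rewards observed so far


def greedy_controller(history: History) -> str:
    """Placeholder controller: try each layout once, then stick with the
    layout whose mean reward is highest (no further exploration)."""
    for layout, rewards in history.items():
        if not rewards:
            return layout
    return max(history, key=lambda l: sum(history[l]) / len(history[l]))


def layout_auditing(layouts: List[str],
                    run_interval: Callable[[str], float],
                    decide: Callable[[History], str],
                    intervals: int) -> List[str]:
    """Closed loop: decide -> try (reorganize + run) -> measure."""
    history: History = {layout: [] for layout in layouts}
    decisions: List[str] = []
    for _ in range(intervals):
        layout = decide(history)        # controller: layout decision
        reward = run_interval(layout)   # reorganizer tries it; profiler measures the run
        history[layout].append(reward)  # profiling data for the next decision
        decisions.append(layout)
    return decisions


# Hypothetical fixed rewards (higher is better).
rewards = {"BF": 1.0, "HI": 2.0}
print(layout_auditing(["BF", "HI"], rewards.__getitem__, greedy_controller, 4))
# ['BF', 'HI', 'HI', 'HI']
```

Because the placeholder controller never revisits a layout that looked worse early on, it cannot recover from noisy first samples or phase changes; the Confidence vs. Curiosity and Phase Changes vs. Noise slides address exactly these issues.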
Profiler
[Figure: timeline alternating data reorganization and program execution. In step $i$, the reorganizer installs layout $l_i$, taking physical time $r_i$; the program then runs under $l_i$ for physical time $e_i$ (wall clock) while allocating $v_i$ bytes (virtual time, always on). The profiler turns these measurements into rewards for the controller's layout decision.]
Reward for layout $l_i$ uses the historical average of:
• Virtual time / program execution time: $v_i / e_i$
• Virtual time / reorganizer time: $v_{i-1} / r_i$
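A minimal Python sketch of the profiler's bookkeeping, assuming each interval records the layout $l_i$, the reorganizer time $r_i$, the execution time $e_i$, and the allocated bytes $v_i$. The slide does not say how the two ratios are combined into a single reward, so the sketch (with hypothetical names `Interval` and `layout_rewards`) just reports both historical averages per layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Interval:
    layout: str        # l_i: layout installed by the reorganizer
    reorg_time: float  # r_i: wall-clock seconds spent reorganizing into l_i
    exec_time: float   # e_i: wall-clock seconds the program then ran under l_i
    alloc_bytes: int   # v_i: virtual time, bytes allocated while running under l_i


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs) if xs else float("nan")


def layout_rewards(intervals: List[Interval]) -> Dict[str, Tuple[float, float]]:
    """Per layout, the historical averages of v_i / e_i and v_{i-1} / r_i.
    Both are rates (bytes per second), so higher is better."""
    program_rates: Dict[str, List[float]] = defaultdict(list)
    reorg_rates: Dict[str, List[float]] = defaultdict(list)
    prev_alloc: Optional[int] = None
    for iv in intervals:
        program_rates[iv.layout].append(iv.alloc_bytes / iv.exec_time)
        if prev_alloc is not None:  # the first interval has no v_{i-1}
            reorg_rates[iv.layout].append(prev_alloc / iv.reorg_time)
        prev_alloc = iv.alloc_bytes
    return {layout: (_mean(program_rates[layout]), _mean(reorg_rates[layout]))
            for layout in program_rates}


# Hypothetical measurements for three intervals.
history = [Interval("BF", 1.0, 2.0, 100),
           Interval("HI", 0.5, 4.0, 200),
           Interval("BF", 2.0, 1.0, 50)]
print(layout_rewards(history))  # {'BF': (50.0, 100.0), 'HI': (50.0, 200.0)}
```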
Controller: Blind Justice
[Figure: the controller takes rewards from profiling and emits a layout decision]
Goals
• Match performance of best layout
• Online
Challenges
• Confidence vs. Curiosity
• Phase changes vs. Noise
Confidence vs. Curiosity
Pick layout l if either:
• High confidence that l gives the best reward
• High curiosity about l's reward
[Table: confidence and curiosity for a layout that was never tried (confidence 0, curiosity ∞), one with few samples / high variance, and one with many samples / low variance]
⇒ Use simulated annealing
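The slide names simulated annealing but not the exact rule, so the following Python sketch is one possible reading, not the paper's formula. It assumes a Boltzmann-style acceptance as in simulated annealing: the best-looking layout gets weight 1, and a worse layout gets weight exp(-z/T), where z is its reward gap to the best measured in standard errors of its own samples. Never-tried layouts are picked immediately, matching their infinite curiosity. `pick_layout`, `standard_error`, and the standard-error scaling are hypothetical.

```python
import math
import random
from typing import Dict, List


def standard_error(rewards: List[float]) -> float:
    """Standard error of the mean; infinite with fewer than two samples."""
    n = len(rewards)
    if n < 2:
        return math.inf
    mean = sum(rewards) / n
    variance = sum((r - mean) ** 2 for r in rewards) / (n - 1)
    return math.sqrt(variance / n)


def pick_layout(history: Dict[str, List[float]], temperature: float,
                rng: random.Random) -> str:
    """Boltzmann-style pick: the best-looking layout has weight 1; a worse
    layout has weight exp(-z / T), where z is its reward gap to the best
    measured in its own standard errors. Requires temperature > 0."""
    # Never tried: confidence 0, curiosity infinite -> try it now.
    for layout, rewards in history.items():
        if not rewards:
            return layout
    means = {l: sum(rs) / len(rs) for l, rs in history.items()}
    best = max(means.values())
    weights = []
    for layout, rewards in history.items():
        gap = best - means[layout]
        if gap == 0:
            weights.append(1.0)
            continue
        se = standard_error(rewards)
        # Few samples / high variance -> large se -> small z -> high curiosity.
        # Many samples / low variance -> small se -> large z -> confidence in best.
        z = gap / se if se > 0 else math.inf
        weights.append(math.exp(-z / temperature))
    return rng.choices(list(history), weights=weights)[0]


rng = random.Random(42)
print(pick_layout({"HI": [10.0], "BF": []}, 1.0, rng))  # BF (never tried)
```

To anneal, the caller lowers the temperature over time, for example multiplying T by a hypothetical factor such as 0.95 after each decision; as T shrinks, layouts whose gap to the best is statistically clear are tried less and less often.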
Phase Changes vs. Noise
Phase Adaptivity
– When layout performance changes, learn the new best layout
⇒ Forget historical rewards
Noise Tolerance
– Perturbation from extraneous causes
⇒ Remember historical rewards
⇒ Use exponential decay
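One way to realize exponential decay while keeping a historical average is to shrink the weight of every older reward by a factor `decay` whenever a new reward arrives. This Python sketch assumes that form (the class name `DecayedAverage` is hypothetical); it is chosen so that decay = 1.0 reduces to the plain historical average, matching the "No decay (Decay=1.0)" setting on the Stability and Settling slide.

```python
class DecayedAverage:
    """Historical average of rewards in which each older sample's weight
    is multiplied by `decay` whenever a new sample arrives."""

    def __init__(self, decay: float) -> None:
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.weighted_sum = 0.0
        self.total_weight = 0.0

    def add(self, reward: float) -> None:
        # Fade all history by `decay`, then add the new sample at weight 1.
        self.weighted_sum = self.decay * self.weighted_sum + reward
        self.total_weight = self.decay * self.total_weight + 1.0

    @property
    def value(self) -> float:
        if self.total_weight == 0.0:
            return float("nan")
        return self.weighted_sum / self.total_weight


# Hypothetical rewards: a phase change after ten samples.
steady = DecayedAverage(1.0)    # no decay: plain historical average
adaptive = DecayedAverage(0.9)  # older samples fade
for reward in [1.0] * 10 + [5.0] * 10:
    steady.add(reward)
    adaptive.add(reward)
print(steady.value)    # 3.0
print(adaptive.value)  # about 3.97, closer to the new phase's 5.0
```

A memoryless exponential moving average, avg = d * avg + (1 - d) * r, would be simpler, but with d = 1.0 it would never update, so it cannot express the no-decay setting.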
SASO Properties of Control Systems
• Stability
• Accuracy
• Phase adaptivity
• Settling
• Overshoot
• Overhead
Methodology
• 20 Java programs (DaCapo suite, SPECjvm98 suite, and a few more)
• J9 = IBM's product Java VM
• Configurations: HI = hierarchical, BF = breadth-first, LA = layout auditing
• 4 hardware platforms: Intel-2, AMD-2, AMD-4, AMD-8
Accuracy and Overhead
[Charts: BF, HI, and LA on Intel-2, AMD-2, AMD-4, and AMD-8; panels show (1) average % slowdown vs. best, (2) number of programs where the configuration is not optimal, and (3) worst % slowdown vs. best]
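The three metrics in these charts can be computed from per-program run times as in the Python sketch below. It assumes that "best" means the fastest configuration measured for each program and that any run slower than that counts as not optimal (no noise threshold); the function name `slowdown_stats` and the run times are hypothetical.

```python
from typing import Dict, Tuple

# times[program][config] = run time in seconds (lower is better).
Times = Dict[str, Dict[str, float]]


def slowdown_stats(times: Times, config: str) -> Tuple[float, int, float]:
    """Return (average % slowdown vs. best, number of programs where
    `config` is not optimal, worst % slowdown vs. best)."""
    slowdowns = []
    not_optimal = 0
    for by_config in times.values():
        best = min(by_config.values())  # fastest configuration for this program
        slowdowns.append((by_config[config] / best - 1.0) * 100.0)
        if by_config[config] > best:
            not_optimal += 1
    return sum(slowdowns) / len(slowdowns), not_optimal, max(slowdowns)


# Hypothetical run times for two programs.
times = {
    "prog1": {"BF": 10.0, "HI": 11.0, "LA": 10.1},
    "prog2": {"BF": 20.0, "HI": 18.0, "LA": 18.0},
}
for config in ("BF", "HI", "LA"):
    print(config, slowdown_stats(times, config))
```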
Stability and Settling
[Figure: reward and layout decision (HI or BF) over time, with decay = 0.9 and with no decay (decay = 1.0); intervals marked "HI better" and "BF better"; annotations 20s and 75s]
Related Work
• Lau/Arnold/Hind/Calder PLDI’06: performance auditing for JIT optimization
• Soman/Krintz/Bacon ISMM’04: switch between copying and mark-sweep, generational or not
• Chen/Bhansali/Chilimbi/Gao/Chuang PLDI’06: throttle unless miss rate reduced
• Saavedra/Park PACT’96: adapt prefetch distance based on cancellation & latency
Conclusions
• Accurate
• Phase adaptive (good settling/stability)
• Negligible profiling overhead
• Online, hardware independent
Clustering Layouts by Performance [SIGMETRICS 2007]
[Figure: layouts clustered by performance; HI and BF marked]