
Adaptive Sampling-Based Profiling Techniques for Optimizing the Distributed JVM Runtime

Adaptive Sampling-Based Profiling Techniques for Optimizing the Distributed JVM Runtime. King Tin Lam, Yang Luo, Cho-Li Wang. Speaker: King Tin Lam. Date: Apr 20, 2010. Systems Research Group, Department of Computer Science, The University


  1. Adaptive Sampling-Based Profiling Techniques for Optimizing the Distributed JVM Runtime
     King Tin Lam, Yang Luo, Cho-Li Wang
     Speaker: King Tin Lam
     Date: Apr 20, 2010
     Systems Research Group, Department of Computer Science, The University of Hong Kong
     IPDPS'10, Atlanta, Georgia, USA

  2. Outline
     1 Background
     2 Challenges and Problems
     3 Adaptive Object Sampling
     4 Adaptive Stack Sampling
     5 Performance Evaluation

  3. Parallel Programming Paradigms
     - For a single computer (multiprocessor, multicore):
       - Shared memory, e.g. OpenMP. Much easier.
     - For a multicomputer (distributed-memory system):
       - Message passing, e.g. MPI, PVM. Hard for programmers.
       - Shared virtual memory (SVM), a.k.a. software DSM, e.g. TreadMarks, CVM, JIAJIA. Bound to a memory consistency model; resembles the ease of shared memory; less efficient.

  4. Parallel Programming Paradigms

     System      | Developer                  | Implementation Level       | Granularity      | Consistency Model
     IVY         | Yale                       | Library + OS               | Page (1KB)       | SC
     Munin       | Rice                       | Library + OS               | Variable         | ERC
     TreadMarks  | Rice                       | Library                    | Page (4KB)       | LRC
     CVM         | Maryland                   | Library                    | Page             | LRC, SC
     Midway      | CMU                        | Library + Compiler         | Variable         | EC, PC, RC
     NCP2        | UFRJ, Brazil               | Library + Hardware support | Page (4KB)       | EC, RC
     Quarks      | Utah                       | Library                    | Region, Page     | RC, SC
     softFLASH   | Stanford                   | OS                         | Page (16KB)      | RC, DIRC
     Cashmere-2L | Rochester                  | Library                    | Page (8KB)       | HLRC
     Brazos      | Rice                       | Library                    | Page             | ScC
     Shasta      | DEC WRL                    | Compiler                   | Variable         | SC
     Mermaid     | Toronto                    | Library + OS               | Page (1KB, 8KB)  | SC
     Mirage      | UCLA                       | OS                         | 512 bytes        | SC
     JIAJIA      | CAS, China                 | Library                    | Page (4KB)       | ScC
     Simple-COMA | SICS (Sweden) and SUN      | OS                         | Page             | SC
     Blizzard-S  | Wisconsin                  | Library                    | Cache line       | SC
     Shrimp      | Princeton                  | OS + Hardware support      | Page             | AURC, SC
     Linda       | Yale                       | Language                   | Variable         | SC
     Orca        | Vrije Univ., Netherlands   | Language                   | Variable         | EC-like

  5. Parallel Programming Paradigms
     Memory consistency models:
     - Strict Consistency
     - Sequential Consistency (SC)
     - Release Consistency (RC)
     - Eager Release Consistency (ERC)
     - Lazy Release Consistency (LRC)
     - Scope Consistency (ScC)
     - Entry Consistency (EC)

  6. Parallel Programming Paradigms
     Remote memory access is the scalability killer!
     - Remote latency >> local latency (assumed to be 50-60 ns)
     - InfiniBand cluster (1-2 μs): 20x slower!
     - Ethernet cluster (100 μs): 2,000x slower!!
     - Grid/Internet (avg. 500 ms): 10,000,000x slower!!!
     "To speed up" ≈ "reduce remote access as much as possible"
     The key is to improve locality.
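The slowdown factors on this slide follow from dividing each remote latency by the local latency. The sketch below reproduces that arithmetic; it assumes a local latency of 50 ns (the low end of the slide's 50-60 ns range) and the low end of the InfiniBand range (1 μs). The class and method names are illustrative only.

```java
// Illustrative arithmetic only; the class and method names are hypothetical.
class LatencySlowdown {
    /** Ratio of remote to local access latency, both given in nanoseconds. */
    public static long slowdown(long remoteNanos, long localNanos) {
        return remoteNanos / localNanos;
    }

    public static void main(String[] args) {
        long localNanos = 50L; // assumption: low end of the 50-60 ns range
        System.out.println("InfiniBand (1 us):    " + slowdown(1_000L, localNanos) + "x");
        System.out.println("Ethernet (100 us):    " + slowdown(100_000L, localNanos) + "x");
        System.out.println("Grid/Internet (500 ms): " + slowdown(500_000_000L, localNanos) + "x");
    }
}
```

With these inputs the three ratios come out to 20, 2,000 and 10,000,000, matching the figures quoted on the slide.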

  7. The PGAS Model
     - User hints
       - Add annotations
       - Use special API constructs for locality hint inputs (e.g. X10's places)
     - PGAS (Partitioned Global Address Space)
       - A "hybrid" parallel paradigm
       - Essentially distributed shared memory (DSM), but incorporates some MPI-like constructs
     - Research languages: UPC, Co-Array Fortran (CAF), Titanium
     - HPCS languages: X10 (IBM), Chapel (Cray)
     - A burden to programmers

  8. Our Dream Model: PGPGAS or (PG)²AS
     - Profile-Guided PGAS (PG²AS)
       - A built-in runtime profiler, instead of humans, for digging out the locality hints
     - Profile-guided adaptive locality management
       - Thread migration
       - Object home migration
       - Object prefetching
       (Slide callout: "Something new in this paper")
     - API-free shared virtual memory
     - Transparent clustering and scaling
       - Automatic thread distribution
       - Location-transparent access
       - System instruments cluster-wide logics
       - No modification to existing applications
     - Previous distributed JVM research (e.g. cJVM, JavaSplit, JESSICA, …)
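To make "object home migration" driven by a runtime profile more concrete, here is a minimal sketch of one possible decision rule. It is not the policy from the paper or from JESSICA: the class name, the per-node counters, and the threshold rule are all assumptions made for illustration. The idea is simply that the profiler counts sampled accesses to an object per node, and the object's home moves to the node that accesses it far more often than the current home does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-object access profile; not taken from the paper or JESSICA.
class HomeMigrationSketch {
    private final Map<Integer, Integer> accessesByNode = new HashMap<>();
    private int homeNode;

    HomeMigrationSketch(int initialHome) {
        this.homeNode = initialHome;
    }

    /** Record one sampled access to this object from the given node. */
    public void recordAccess(int node) {
        accessesByNode.merge(node, 1, Integer::sum);
    }

    /**
     * Assumed rule: migrate the home to the most frequent accessor if its
     * count exceeds the current home's count by more than the given factor.
     * Returns the (possibly updated) home node.
     */
    public int maybeMigrate(double factor) {
        int homeCount = accessesByNode.getOrDefault(homeNode, 0);
        int bestNode = homeNode;
        int bestCount = homeCount;
        for (Map.Entry<Integer, Integer> e : accessesByNode.entrySet()) {
            if (e.getValue() > bestCount) {
                bestNode = e.getKey();
                bestCount = e.getValue();
            }
        }
        if (bestNode != homeNode && bestCount > factor * homeCount) {
            homeNode = bestNode;
            accessesByNode.clear(); // start a fresh profiling window
        }
        return homeNode;
    }

    public static void main(String[] args) {
        HomeMigrationSketch obj = new HomeMigrationSketch(0);
        for (int i = 0; i < 5; i++) obj.recordAccess(1); // node 1 accesses remotely
        for (int i = 0; i < 2; i++) obj.recordAccess(0); // home node accesses locally
        System.out.println("home after check: " + obj.maybeMigrate(2.0));
    }
}
```

The factor acts as hysteresis: a higher value makes the home less likely to bounce between nodes that access the object about equally often.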

  9. Techniques to improve locality
     Runtime techniques:
     - Migration
       - Thread
       - Object (home)
     - Prefetching
       - Spatial
       - Temporal
     [Figure labels: threads T1 and T2, objects, node 1, node 2, remote access]
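One common reading of spatial prefetching in an object-based system is to ship the objects reachable from a requested object together with it, so that later accesses to them need no extra round trip. The sketch below illustrates that reading only; the slide does not say how prefetching is done, and the object graph representation (integer IDs and reference lists), the depth limit, and all names are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of connectivity-based (spatial) prefetching.
class SpatialPrefetchSketch {
    /**
     * Breadth-first walk from the requested object, returning the IDs of all
     * objects reachable within maxDepth reference hops (the root included).
     */
    public static List<Integer> prefetchSet(Map<Integer, List<Integer>> refs,
                                            int root, int maxDepth) {
        List<Integer> result = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<int[]> queue = new ArrayDeque<>(); // entries are {objectId, depth}
        queue.add(new int[] {root, 0});
        seen.add(root);
        while (!queue.isEmpty()) {
            int[] cur = queue.poll();
            result.add(cur[0]);
            if (cur[1] == maxDepth) {
                continue; // do not follow references beyond the depth limit
            }
            List<Integer> out = refs.getOrDefault(cur[0], Collections.<Integer>emptyList());
            for (int next : out) {
                if (seen.add(next)) {
                    queue.add(new int[] {next, cur[1] + 1});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> refs = new HashMap<>();
        refs.put(1, List.of(2, 3));
        refs.put(2, List.of(4));
        refs.put(4, List.of(5));
        System.out.println(prefetchSet(refs, 1, 1));
    }
}
```

A larger depth limit saves more round trips but risks shipping objects the remote thread never touches, which is the kind of trade-off a runtime profile could help tune.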


  12. JESSICA Distributed Java VM
      JESSICA: Java-Enabled Single-System-Image Computing Architecture
      A cluster-wide JVM with:
      - Dynamic thread mobility in JIT mode
      - Global Object Space (GOS)
      [Figure: architecture diagram. Java source code passes through the Java compiler to class files and runs on a master JVM and worker JVMs joined by a communication network. Each JVM shows a thread scheduler, class loader, load monitor daemon, Java threads (PC, registers, stack frames), method area, execution engine, local heap and host manager, on top of the OS and hardware. Arrows between JVMs are labeled portable Java frames, thread migration and remote class loading.]

  13. JESSICA Distributed Java VM
      [Figure: the same architecture diagram as slide 12, with heap objects in the local heaps additionally labeled as the Global Object Space.]
