Optimization Coaching for Fork/Join Applications on the Java Virtual Machine
Eduardo Rosales
Advisor: Prof. Walter Binder
Research area: Parallel applications, performance analysis
PhD stage: Planner
EuroDW 2018, April 23, 2018, Porto, Portugal
Optimization Coaching for Fork/Join Applications on the Java Virtual Machine
§ The problem: despite the complexities associated with developing and tuning fork/join applications, there is little work focused on assisting developers in optimizing such applications on the JVM.
§ Relevance: fork/join parallelism is increasingly popular among developers targeting the JVM. It has been integrated to support parallel processing in the Java library, thread management in JVM languages, and a variety of parallel applications based on Actors, MapReduce, etc.
§ Our proposal: coaching developers towards optimizing fork/join applications by diagnosing performance issues in such applications and suggesting concrete code refactorings to solve them.
§ Expected outcome: in contrast to the manual experimentation often required to tune fork/join applications on the JVM, we devise a tool able to automatically assist developers in optimizing a fork/join application.
Fork/Join Application
§ What is a fork/join application?
    solve(Problem problem) {
      if (problem is small)
        directly solve problem sequentially
      else {
        recursively split problem into independent parts:
          fork new tasks to solve each part
          join all forked tasks
      }
    }
[Figure: task tree in which each task forks child tasks and later joins them.]
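The pseudocode above can be sketched in Java with the `RecursiveTask` class of the Java fork/join framework. This is only an illustrative sketch: the `SumTask` class, the array-summing problem, and the `THRESHOLD` value of 1,000 are assumptions made for the example and do not come from the slides.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Illustrative sketch: sums an array range using the fork/join pattern. */
class SumTask extends RecursiveTask<Long> {
    // Hypothetical cutoff below which the problem counts as "small".
    static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int lo; // inclusive start of the range
    private final int hi; // exclusive end of the range

    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Problem is small: solve it sequentially.
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Split into two independent parts.
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                      // run the left part asynchronously
        long rightSum = right.compute();  // solve the right part in this worker
        return left.join() + rightSum;    // wait for the forked part and combine
    }

    /** Convenience entry point that runs the task on the common pool. */
    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```

Calling `SumTask.parallelSum(array)` submits the root task and blocks until all forked subtasks have been joined.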
The Java Fork/Join Framework
§ The Java fork/join framework [1] is the implementation enabling fork/join applications on the JVM
§ It implements the work-stealing [2] scheduling strategy:
[Figure: work-stealing scheduling. Worker thread 1 and worker thread 2 each push and pop tasks on their own deque (Deque 1, Deque 2), a submission enters from outside, a worker steals (takes) a task from the other worker's deque, and each worker runs on a CPU core.]
[1] D. Lea. A Java Fork/Join Framework. JAVA 2000.
[2] Burton et al. Executing Functional Programs on a Virtual Tree of Processors. FPCA 1981.
The Java Fork/Join Framework
§ Supports parallel processing in the Java library:
  • java.util.Arrays
  • java.util.stream (package)
  • java.util.concurrent.CompletableFuture<T>
§ Supports thread management for other JVM languages:
  • Scala
  • Apache Groovy
  • Clojure
§ Supports diverse forms of fork/join parallelism, including applications based on Actors and MapReduce
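As a rough illustration of the library entry points listed above, the following sketch uses `Arrays.parallelSort`, a parallel stream, and `CompletableFuture.supplyAsync`, all of which by default run on the common fork/join pool. The `LibraryExamples` class and its method names are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.stream.LongStream;

/** Illustrative sketch of Java library features backed by fork/join. */
class LibraryExamples {

    /** Sorts a copy of the input in parallel with java.util.Arrays. */
    static int[] sorted(int[] input) {
        int[] copy = Arrays.copyOf(input, input.length);
        Arrays.parallelSort(copy);
        return copy;
    }

    /** Sums the squares of 1..n with a parallel stream. */
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .map(x -> x * x)
                .sum();
    }

    /** Computes a string length asynchronously and waits for the result. */
    static int asyncLength(String s) {
        return CompletableFuture.supplyAsync(s::length).join();
    }
}
```

None of these calls mention fork/join tasks explicitly, which is one reason performance issues in the underlying tasks can be hard for developers to spot.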
The Java Fork/Join Framework
§ Many of the design forces encountered when implementing fork/join designs surround task granularity at four levels [3]:
  • Maximizing locality
  • Minimizing contention
  • Maximizing parallelism
  • Minimizing overheads
[Figure: the four design forces arranged around "Task granularity".]
[3] D. Lea. Concurrent Programming in Java. Second Edition: Design Principles and Patterns. Addison-Wesley Professional, 2nd edition, 1999.
Examples of common performance issues 1/4
Suboptimal forking
§ Excessive forking: too fine-grained tasks
✗ Parallelization overheads due to excessive:
  • Deque accesses
  • Object creation/reclaiming
[Figure: four CPU cores whose workers perform many push, pop, and take operations on their deques.]
Examples of common performance issues 2/4
Suboptimal forking
§ Sparse forking: few coarse-grained tasks
✗ Missed parallelization opportunities:
  • Low CPU utilization
  • Load imbalance
[Figure: four CPU cores with few push, pop, take, and steal operations; some cores are idle.]
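The trade-off between excessive and sparse forking can be made concrete by counting how many tasks a recursive split creates for a given sequential cutoff. The sketch below is a hypothetical probe, not part of the proposed tool: the `GranularityProbe` class, its `threshold` parameter, and the `tasksCreated` helper are names introduced only for illustration, and the leaf tasks do no real work.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch: counts the tasks created by a binary split of [lo, hi). */
class GranularityProbe extends RecursiveAction {
    private final int lo;
    private final int hi;
    private final int threshold;      // hypothetical sequential cutoff, must be >= 1
    private final AtomicLong created; // shared counter of task objects created

    GranularityProbe(int lo, int hi, int threshold, AtomicLong created) {
        this.lo = lo;
        this.hi = hi;
        this.threshold = threshold;
        this.created = created;
        created.incrementAndGet();    // every task object is counted once
    }

    @Override
    protected void compute() {
        if (hi - lo <= threshold) {
            return; // leaf task: a real application would do its work here
        }
        int mid = (lo + hi) >>> 1;
        // Fork both halves and wait for them to complete.
        invokeAll(new GranularityProbe(lo, mid, threshold, created),
                  new GranularityProbe(mid, hi, threshold, created));
    }

    /** Returns the total number of tasks created for the given size and cutoff. */
    static long tasksCreated(int size, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        AtomicLong created = new AtomicLong();
        ForkJoinPool.commonPool().invoke(new GranularityProbe(0, size, threshold, created));
        return created.get();
    }
}
```

For a problem of size 1024, a cutoff of 1 creates 2047 task objects (excessive forking), while a cutoff of 1024 creates a single task that leaves all other cores idle (sparse forking); a suitable cutoff lies somewhere in between and generally depends on the workload.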
The problem
Despite the complexities associated with developing and tuning fork/join applications, there is little work focused on assisting developers towards optimizing such applications on the JVM.
The scope: fork/join applications running in a single JVM on a single shared-memory multicore.
[Figure: a shared-memory multicore machine with several CPUs, each with four cores, connected to a common memory.]
Our Approach
In contrast to the manual experimentation often used to tune a fork/join application, we propose an approach based on:
§ Profiling techniques
§ Optimization Coaching
Our Approach
§ Profiling techniques: static and dynamic analysis to automatically diagnose performance issues
Our Approach
Profiling techniques:
§ Static analysis: to automatically inspect the source code to detect fork/join anti-patterns.
§ Dynamic analysis: to automatically diagnose performance issues noticeable at runtime (e.g., suboptimal forking, excessive garbage collection, low CPU usage, contention).
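To give a sense of the kind of runtime data a dynamic analysis could start from, the sketch below reads a few counters that `ForkJoinPool` already exposes. It is not the tool proposed in these slides, which may collect entirely different metrics; the `PoolSnapshot` class and its field names are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;

/** Illustrative sketch: a point-in-time reading of ForkJoinPool counters. */
class PoolSnapshot {
    final int parallelism;    // targeted parallelism level of the pool
    final int activeThreads;  // estimate of threads currently stealing or executing tasks
    final long stealCount;    // estimate of tasks stolen between worker deques
    final long queuedTasks;   // estimate of tasks waiting in worker deques

    private PoolSnapshot(int parallelism, int activeThreads,
                         long stealCount, long queuedTasks) {
        this.parallelism = parallelism;
        this.activeThreads = activeThreads;
        this.stealCount = stealCount;
        this.queuedTasks = queuedTasks;
    }

    /** Reads the counters of the given pool; the values are estimates. */
    static PoolSnapshot of(ForkJoinPool pool) {
        return new PoolSnapshot(
                pool.getParallelism(),
                pool.getActiveThreadCount(),
                pool.getStealCount(),
                pool.getQueuedTaskCount());
    }

    @Override
    public String toString() {
        return "parallelism=" + parallelism
                + ", activeThreads=" + activeThreads
                + ", stealCount=" + stealCount
                + ", queuedTasks=" + queuedTasks;
    }
}
```

Sampling such snapshots periodically while an application runs might hint at sparse forking (few active threads, empty deques) or excessive forking (very large queues), although per-task information would require deeper instrumentation than these pool-level counters provide.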
Our Approach
§ Optimization coaching [4]: processing the output generated by the compiler's optimizer to suggest concrete code modifications that may enable the compiler to achieve missed optimizations.
[4] St-Amour et al. Optimization Coaching: Optimizers Learn to Communicate with Programmers. OOPSLA 2012.
Our Approach
§ Inspired by Optimization Coaching, the goal is to automatically suggest concrete code modifications to solve the detected issues.
Future Work
Methodology for the automatic diagnosis of performance issues:
§ Define a model to characterize fork/join tasks
§ Characterize all tasks spawned by a fork/join application
§ Determine the metrics and entities worth considering to automatically diagnose performance issues
Methodology for the automatic suggestion of optimizations:
§ Automatic recognition of fork/join anti-patterns and matching to concrete suggestions to avoid them
Validation of the results:
§ Discover fork/join workloads suitable for validating both aforementioned methodologies
BACKUP SLIDES
Related Work
§ Analysis of parallel applications on the JVM
§ A number of parallelism profilers focus on the JVM [9][10], including YourKit Java Profiler, JProfiler, Intel vTune, and Java Mission Control
  • The goal: characterizing processes or threads over time.
  • Limitations: none of the existing tools targets fork/join applications.
[9] Adhianto et al. HPCTOOLKIT: Tools for Performance Analysis of Optimized Parallel Programs. Concurr. Comput.: Pract. Exper., 22(6): pp. 685–701, 2010.
[10] Teng et al. THOR: a Performance Analysis Tool for Java Applications Running on Multicore Systems. IBM Journal of Research and Development, 54(5):4:1–4:17, 2010.