1 Thanh-Chung Dao Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao and Shigeru Chiba The University of Tokyo
Supercomputers
• Expensive clusters
  • Multi-core processors
  • Large capacity of main memory
  • High-speed network
• Focus mainly on compute-intensive applications
• Data-intensive workloads are emerging as supercomputing problems
  • Graph processing
  • Pre-processing of simulation data
MapReduce
• Simple parallel paradigm to process large datasets
• Hidden parallelization & communication
• PageRank example (rank contribution)
  • Input: PageA → PageB, PageC; PageB → PageC; PageC → PageA, PageB
  • Function Mapper (input: PageA → PageB, PageC): N = outbound links; for each outbound link, output <Page, 1/N>
  • Mapping: PageA emits <PageB, 0.5>, <PageC, 0.5>; PageB emits <PageC, 1>; PageC emits <PageA, 0.5>, <PageB, 0.5>
  • Shuffling: done automatically (users can ignore)
  • Function Reducer (input: <PageA, x1>, …, <PageA, xn>): rank = 0; for each item xi, rank += xi; output <PageA, rank>
  • Result: PageA 0.5, PageB 1, PageC 1.5
[Figure: data flow through the Input, Splitting, Mapping, Shuffling, Reducing, and Result stages]
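The PageRank example on this slide can be reproduced with a few lines of plain Python. This is only a minimal sketch of the mapper, shuffle, and reducer roles, not Hadoop code: the names `mapper`, `shuffle`, `reducer`, and `run_job` are illustrative assumptions, and the shuffle that Hadoop performs automatically is written out by hand here.

```python
from collections import defaultdict

# Link graph taken from the slide's example.
LINKS = {
    "PageA": ["PageB", "PageC"],
    "PageB": ["PageC"],
    "PageC": ["PageA", "PageB"],
}


def mapper(page, outlinks):
    """Emit <target, 1/N> for each of the N outbound links of a page."""
    n = len(outlinks)
    for target in outlinks:
        yield target, 1.0 / n


def shuffle(pairs):
    """Group values by key (hand-written stand-in for the framework's shuffle)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(page, contributions):
    """Sum the contributions x_1..x_n received by one page."""
    rank = 0.0
    for x in contributions:
        rank += x
    return page, rank


def run_job(links):
    """Run map, shuffle, and reduce once over the whole link graph."""
    mapped = [pair for page, out in links.items() for pair in mapper(page, out)]
    grouped = shuffle(mapped)
    return dict(reducer(page, xs) for page, xs in grouped.items())


result = run_job(LINKS)
print(result)
```

For the three-page graph above, the summed contributions match the result column of the slide (PageA 0.5, PageB 1, PageC 1.5).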
Hadoop MapReduce
• The standard MapReduce implementation
  • Provides easy-to-use MapReduce APIs
  • TCP/IP-based communication
• Designed to run on commodity clusters
  • Lab clusters, or Amazon EC2
• Scalability (32,000 nodes at Yahoo) & resilience
• Written in Java
Improving Hadoop MapReduce Performance on Supercomputers
• Hadoop MapReduce is a good choice on supercomputers
  • Maturity
  • Productivity

                                   Supercomputer        Hadoop
Resource allocation at runtime     Static               Dynamic
(# of processes, memory, CPU)
Communication                      MPI                  TCP/IP
Workload                           Compute-intensive    Data-intensive
Our Approach
• JVM Reuse
  • Statically create JVM processes and dynamically allocate them to Hadoop tasks
• Enable efficient MPI communication by Hadoop tasks
  • Statically created processes can exploit efficient MPI
  • Dynamic allocation makes it possible to use the original Hadoop implementation
• Shorten start-up time of processes
• Technique
  • A process pool is used to implement JVM Reuse
  • Minimize changes to the original Hadoop engine
Why MPI is required for Hadoop
• The de facto high-speed communication on supercomputers
• Improve slow MapReduce shuffling
[Figure: throughput (Mbps) vs. message size (bytes, 2^0 to 2^26) for MPI and TCP on the FX10 supercomputer; annotated "10 times faster"]
• Enable Hadoop to co-host traditional MPI applications
  • Combine MPI and MapReduce models
  • Rich data analysis workflow
  • Efficient data sharing between MPI and MapReduce models
  • E.g. MPI can access data located in the Hadoop file system (HDFS)
Slow MapReduce shuffling on Hadoop
• TCP/IP-based communication
• JVM-Bypass (Wang et al., IPDPS 2013)
[Figure: shuffling on the slave nodes. Mapping phase: MapTasks write map outputs 1..n to local disk. Shuffling phase: an HTTP servlet server serves the map outputs, with multiple requests at once, followed by sort & merge. Reducing phase: ReduceTasks perform the reducing.]
Dynamic Process Creation on MPI
• Discouraged on supercomputers
  • Performance reasons
  • Collective mechanism (MPI_Comm_spawn)
  • Gang scheduling (error-prone if there are not enough resources)
• Gerbil (Xu et al., CCGrid 2015)
  • Co-hosts MPI applications on Hadoop
  • Creates processes dynamically
  • Its experiments showed significant overhead
• Resources should be specified before running MPI applications
  • Number of processes is known (static)
  • Memory and CPU cores
Dynamic Process Creation on Hadoop
• Required
  • Resources are allocated on demand to run MapReduce applications
  • Number of processes is unknown (dynamic)
[Figure, animated over several slides: a master node and slave nodes 1..n]
  1. Job submission: the user submits a job to the master node
  2. Request sending: tasks are sent to the slave nodes (6, 6, and 8 tasks in the example)
  3. Process creation: each slave node creates processes (6, 6, and 8 processes); each task is run on a process
  4. Tasks running
  5. Processes terminated
  6. Job completion
Idea of Reusing
• JVM Pool added
  • Idle JVM processes
  • Number of processes is statically fixed
[Figure, animated over several slides: a master node and slave nodes 1..n, each with a JVM Pool of processes]
  1. All processes in every JVM Pool are idle
  2. Job submission and request sending: tasks are sent to the slave nodes (6, 6, and 8 tasks in the example)
  3. Allocation: idle processes are allocated to tasks and become busy
  4. Running
  5. Cleanup
  6. Processes return to idle
  7. Job completion: the pools remain, with all processes idle
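The pool life cycle shown on this slide (idle, allocated, running, cleanup, idle again) can be illustrated with a small fixed-size worker pool. This is a hedged sketch, not the authors' implementation: threads stand in for the pooled JVM processes, and `WorkerPool`, `run_job`, and `shutdown` are hypothetical names that do not belong to Hadoop.

```python
import queue
import threading


class WorkerPool:
    """Fixed-size pool: workers are created once and reused for every task."""

    def __init__(self, size):
        self._tasks = queue.Queue()
        self._workers = [
            threading.Thread(target=self._loop, args=(i,), daemon=True)
            for i in range(size)
        ]
        for worker in self._workers:
            worker.start()  # static creation, before any job arrives

    def _loop(self, worker_id):
        while True:
            item = self._tasks.get()  # idle until a task is allocated
            if item is None:  # shutdown signal
                self._tasks.task_done()
                return
            func, arg, results = item
            try:
                value = func(arg)  # running
            except Exception as exc:  # report failures instead of hanging
                value = exc
            results.put((worker_id, value))
            self._tasks.task_done()  # cleanup done, worker is idle again

    def run_job(self, func, args):
        """Dynamically allocate each task of a job to an idle worker."""
        args = list(args)
        results = queue.Queue()
        for arg in args:
            self._tasks.put((func, arg, results))
        self._tasks.join()  # wait until every task has finished
        return [results.get() for _ in args]

    def shutdown(self):
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


pool = WorkerPool(3)  # number of workers is statically fixed
first = pool.run_job(lambda x: x * x, range(6))   # job with 6 tasks
second = pool.run_job(lambda x: x + 1, range(8))  # job with 8 tasks, same workers
pool.shutdown()
```

Both jobs have more tasks than there are workers, so the queue hands each task to whichever worker becomes idle next, and no worker is created or destroyed between jobs.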
JVM Reuse enables MPI communication
• MPI communication is established at the beginning
• JVM Reuse keeps processes running
• MPI connection is always available
JVM Reuse shortens start-up time
[Figure: JVM start-up flow of Program A]
  1. Cmd: java A
  2. OS-level process creation
  3. Class loader subsystem: class loading, then class linking (verification & initializing)
  4. Invoke main() method of A
  5. Execution engine (JIT compiler): execute A instructions
After A finishes, Program B wants to reuse the JVM of A:
  1. Process creation & class loader are skipped
  2. Invoke main() method of B
  3. Execute instructions of B
Iterative jobs benefit from JVM Reuse
• Iterative jobs
  • Many short-running JVM processes
  • PageRank is an example
[Figure: iterative job flow. A pre-processing job produces the initial data; Job A (Map, Reduce) is then run once per iteration, and its Maps use the results of the previous iteration. After each iteration the stop condition is checked: if yes, quit; if no, run Job A again.]
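The iterative flow can be sketched as a driver loop that runs one short job per iteration until a stop condition holds. The sketch below assumes the common undamped PageRank update, in which each page passes rank/N to each of its N outbound links; the slides do not spell out the update rule, the tolerance, or the iteration limit, so `run_iteration`, `iterate`, `tolerance`, and `max_iterations` are illustrative assumptions. The link graph is the one from the earlier PageRank example.

```python
LINKS = {
    "PageA": ["PageB", "PageC"],
    "PageB": ["PageC"],
    "PageC": ["PageA", "PageB"],
}


def run_iteration(links, ranks):
    """One job: maps emit rank/N per outbound link, reduces sum per page."""
    new_ranks = {page: 0.0 for page in links}
    for page, outlinks in links.items():  # map phase
        share = ranks[page] / len(outlinks)
        for target in outlinks:
            new_ranks[target] += share  # shuffle and reduce, merged for brevity
    return new_ranks


def iterate(links, tolerance=1e-9, max_iterations=100):
    """Repeat the job until the ranks stop changing (the stop condition)."""
    ranks = {page: 1.0 for page in links}  # initial data from pre-processing
    for iteration in range(1, max_iterations + 1):
        new_ranks = run_iteration(links, ranks)  # uses previous iteration's results
        change = max(abs(new_ranks[p] - ranks[p]) for p in links)
        ranks = new_ranks
        if change < tolerance:
            break
    return ranks, iteration


ranks, iterations = iterate(LINKS)
print(iterations, ranks)
```

Each pass through the loop corresponds to one short MapReduce job, which is why per-job process start-up cost accumulates in iterative workloads when processes are not reused.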