 
Exploiting HPC Technologies to Accelerate Big Data Processing (Hadoop, Spark, and Memcached)

Talk at Intel HPC Developer Conference (SC ’16)
by
Dhabaleswar K. (DK) Panda, The Ohio State University
E-mail: panda@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~panda

Xiaoyi Lu, The Ohio State University
E-mail: luxi@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~luxi

Introduction to Big Data Applications and Analytics
• Big Data has become one of the most important elements of business analytics
• Provides groundbreaking opportunities for enterprise information management and decision making
• The amount of data is exploding; companies are capturing and digitizing more information than ever
• The rate of information growth appears to be exceeding Moore’s Law
Network Based Computing Laboratory, Intel HPC Dev Conf (SC ’16)

Data Management and Processing on Modern Clusters
• Substantial impact on designing and utilizing data management and processing systems in multiple tiers
  – Front-end data accessing and serving (Online)
    • Memcached + DB (e.g. MySQL), HBase
  – Back-end data analytics (Offline)
    • HDFS, MapReduce, Spark

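The front-end tier pairs Memcached with a database. One common way to combine them is the cache-aside pattern, sketched below under stated assumptions: the slide does not say which caching policy is used, plain dictionaries stand in for Memcached and MySQL, and the class and method names (`CacheAsideStore`, `get`, `put`, `db_reads`) are hypothetical.

```python
from typing import Dict, Optional


class CacheAsideStore:
    """Front-end read path: check the cache first, fall back to the database."""

    def __init__(self, database: Dict[str, str]) -> None:
        self.database = database          # stand-in for the database (e.g. MySQL)
        self.cache: Dict[str, str] = {}   # stand-in for Memcached
        self.db_reads = 0                 # counts trips to the slower tier

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:             # cache hit: no database access
            return self.cache[key]
        self.db_reads += 1                # cache miss: query the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value       # populate the cache for later reads
        return value

    def put(self, key: str, value: str) -> None:
        self.database[key] = value        # write to the database first
        self.cache.pop(key, None)         # invalidate the stale cache entry


store = CacheAsideStore({"user:1": "alice"})
first = store.get("user:1")    # miss, served by the database
second = store.get("user:1")   # hit, served by the cache
```

In this sketch only the first read reaches the database. In a real deployment the cache lookup is a network round trip, which is why the latency of that hop matters.
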
Trends for Commodity Computing Clusters in the Top 500 List (http://www.top500.org)
[Figure: number of clusters (axis 0-500) and percentage of clusters (axis 0-100) plotted over a timeline; the percentage of clusters is annotated at 85%.]

Drivers of Modern HPC Cluster Architectures
[Figure panels: Multi-core Processors; High Performance Interconnects - InfiniBand (<1usec latency, 100Gbps Bandwidth); Accelerators / Coprocessors - high compute density, high performance/watt, >1 TFlop DP on a chip; SSD, NVMe-SSD, NVRAM]
• Multi-core/many-core technologies
• Remote Direct Memory Access (RDMA)-enabled networking (InfiniBand and RoCE)
• Solid State Drives (SSDs), Non-Volatile Random-Access Memory (NVRAM), NVMe-SSD
• Accelerators (NVIDIA GPGPUs and Intel Xeon Phi)
[Systems pictured: Tianhe-2, Stampede, Titan, Tianhe-1A]

Trends in HPC Technologies
• Advanced Interconnects and RDMA protocols
  – InfiniBand
  – 10-40 Gigabit Ethernet/iWARP
  – RDMA over Converged Enhanced Ethernet (RoCE)
• Delivering excellent performance (Latency, Bandwidth and CPU Utilization)
• Has influenced re-designs of enhanced HPC middleware
  – Message Passing Interface (MPI) and PGAS
  – Parallel File Systems (Lustre, GPFS, ..)
• SSDs (SATA and NVMe)
• NVRAM and Burst Buffer

Interconnects and Protocols in OpenFabrics Stack for HPC (http://openfabrics.org)
[Figure: layered stack diagram. Layers, top to bottom: Application / Middleware; Interface (Sockets, Verbs); Protocol (TCP/IP, RSockets, SDP, RDMA, running in kernel space, in user space, or with hardware offload; Ethernet Driver, IPoIB); Adapter (Ethernet, InfiniBand, iWARP, RoCE); Switch (Ethernet, InfiniBand). Stack options shown: 1/10/40/100 GigE, IPoIB, 10/40 GigE-TOE, SDP, RSockets, iWARP, RoCE, IB Native.]

Large-scale InfiniBand Installations
• 205 IB Clusters (41%) in the Jun’16 Top500 list
  – (http://www.top500.org)
• Installations in the Top 50 (21 systems):
  – 220,800 cores (Pangea) in France (11th)
  – 462,462 cores (Stampede) at TACC (12th)
  – 185,344 cores (Pleiades) at NASA/Ames (15th)
  – 72,800 cores Cray CS-Storm in US (19th)
  – 72,800 cores Cray CS-Storm in US (20th)
  – 124,200 cores (Topaz) SGI ICE at ERDC DSRC in US (21st)
  – 72,000 cores (HPC2) in Italy (22nd)
  – 152,692 cores (Thunder) at AFRL/USA (25th)
  – 147,456 cores (SuperMUC) in Germany (27th)
  – 86,016 cores (SuperMUC Phase 2) in Germany (28th)
  – 74,520 cores (Tsubame 2.5) at Japan/GSIC (31st)
  – 88,992 cores (Mistral) at DKRZ Germany (33rd)
  – 194,616 cores (Cascade) at PNNL (34th)
  – 76,032 cores (Makman-2) at Saudi Aramco (39th)
  – 72,000 cores (Prolix) at Meteo France, France (40th)
  – 42,688 cores (Lomonosov-2) at Russia/MSU (41st)
  – 60,240 cores SGI ICE X at JAEA Japan (43rd)
  – 70,272 cores (Tera-1000-1) at CEA France (44th)
  – 54,432 cores (Marconi) at CINECA Italy (46th)
  – and many more!

Open Standard InfiniBand Networking Technology
• Introduced in Oct 2000
• High Performance Data Transfer
  – Interprocessor communication and I/O
  – Low latency (<1.0 microsec), high bandwidth (up to 12.5 GigaBytes/sec -> 100Gbps), and low CPU utilization (5-10%)
• Multiple Operations
  – Send/Recv
  – RDMA Read/Write
  – Atomic Operations (very unique)
    • high performance and scalable implementations of distributed locks, semaphores, collective communication operations
• Leading to big changes in designing
  – HPC clusters
  – File systems
  – Cloud computing systems
  – Grid computing systems

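The bandwidth figure quoted for InfiniBand is a unit conversion: 100 Gbps divided by 8 bits per byte gives 12.5 GigaBytes/sec. The sketch below reproduces that arithmetic and adds a rough transfer-time estimate. The latency-plus-size-over-bandwidth model is an assumption introduced here for illustration, not something taken from the talk. It ignores encoding and protocol overhead, and the function names are hypothetical.

```python
def gbps_to_gigabytes_per_sec(gbps: float) -> float:
    """Convert a link rate in gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbps / 8.0


def transfer_time_us(message_bytes: int, latency_us: float, gbps: float) -> float:
    """Estimate one-way transfer time as latency + size / bandwidth (assumed model)."""
    # 1 GB/s equals 1e9 bytes per second, i.e. 1e3 bytes per microsecond.
    bytes_per_us = gbps_to_gigabytes_per_sec(gbps) * 1e9 / 1e6
    return latency_us + message_bytes / bytes_per_us


peak = gbps_to_gigabytes_per_sec(100.0)          # peak rate in GB/s
small = transfer_time_us(64, 1.0, 100.0)         # 64-byte message
large = transfer_time_us(1_000_000, 1.0, 100.0)  # 1 MB message
```

Under this model the 64-byte message is dominated by the 1 microsecond latency term, and the 1 MB message is dominated by the bandwidth term.
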
How Can HPC Clusters with High-Performance Interconnect and Storage Architectures Benefit Big Data Applications?
• What are the major bottlenecks in current Big Data processing middleware (e.g. Hadoop, Spark, and Memcached)?
• Can the bottlenecks be alleviated with new designs by taking advantage of HPC technologies?
• Can RDMA-enabled high-performance interconnects benefit Big Data processing?
• Can HPC Clusters with high-performance storage systems (e.g. SSD, parallel file systems) benefit Big Data applications?
• How much performance benefits can be achieved through enhanced designs?
• How to design benchmarks for evaluating the performance of Big Data middleware on HPC clusters?
Bring HPC and Big Data processing into a “convergent trajectory”!

Designing Communication and I/O Libraries for Big Data Systems: Challenges
[Figure: layered architecture, top to bottom]
• Applications; Benchmarks
• Big Data Middleware (HDFS, MapReduce, HBase, Spark and Memcached): Upper level Changes?
• Programming Models (Sockets): Other Protocols?
• Communication and I/O Library
  – Point-to-Point Communication
  – Threaded Models and Synchronization
  – Virtualization
  – I/O and File Systems
  – QoS
  – Fault-Tolerance
• Networking Technologies (InfiniBand, 1/10/40/100 GigE and Intelligent NICs)
• Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators)
• Storage Technologies (HDD, SSD, and NVMe-SSD)

Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?
[Figure: two stacks side by side]
• Current Design: Application -> Sockets -> 1/10/40/100 GigE Network
• Our Approach: Application -> OSU Design -> Verbs Interface -> 10/40/100 GigE or InfiniBand
• Sockets not designed for high-performance
  – Stream semantics often mismatch for upper layers
  – Zero-copy not available for non-blocking sockets

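The stream-semantics mismatch can be illustrated with a small sketch. A stream socket delivers an undifferentiated byte stream, so middleware that exchanges discrete messages has to rebuild message boundaries itself, here with a 4-byte length prefix. This shows only the sockets side of the comparison and says nothing about how the OSU design uses verbs. The helper names and payloads are hypothetical.

```python
import socket
import struct


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Prefix each message with a 4-byte big-endian length, then send it all."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes; a single recv() may return fewer."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Read the length prefix, then read that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# A connected pair of stream sockets stands in for two cluster nodes.
left, right = socket.socketpair()
send_message(left, b"map-output-1")
send_message(left, b"map-output-2")
received = [recv_message(right), recv_message(right)]
left.close()
right.close()
```

Without the length prefix, the two payloads could arrive merged or split across `recv()` calls. The framing and the extra copies it implies are part of the overhead the slide attributes to sockets.
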
The High-Performance Big Data (HiBD) Project
• RDMA for Apache Spark
• RDMA for Apache Hadoop 2.x (RDMA-Hadoop-2.x)
  – Plugins for Apache, Hortonworks (HDP) and Cloudera (CDH) Hadoop distributions
• RDMA for Apache HBase
• RDMA for Memcached (RDMA-Memcached)
• RDMA for Apache Hadoop 1.x (RDMA-Hadoop)
(The RDMA packages above are available for InfiniBand and RoCE)
• OSU HiBD-Benchmarks (OHB)
  – HDFS, Memcached, and HBase Micro-benchmarks
• http://hibd.cse.ohio-state.edu
• User base: 195 organizations from 27 countries
• More than 18,500 downloads from the project site
• RDMA for Impala (upcoming)

RDMA for Apache Hadoop 2.x Distribution
• High-Performance Design of Hadoop over RDMA-enabled Interconnects
  – High performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs-level for HDFS, MapReduce, and RPC components
  – Enhanced HDFS with in-memory and heterogeneous storage
  – High performance design of MapReduce over Lustre
  – Memcached-based burst buffer for MapReduce over Lustre-integrated HDFS (HHH-L-BB mode)
  – Plugin-based architecture supporting RDMA-based designs for Apache Hadoop, CDH and HDP
  – Easily configurable for different running modes (HHH, HHH-M, HHH-L, HHH-L-BB, and MapReduce over Lustre) and different protocols (native InfiniBand, RoCE, and IPoIB)
• Current release: 1.1.0
  – Based on Apache Hadoop 2.7.3
  – Compliant with Apache Hadoop 2.7.1, HDP 2.5.0.3 and CDH 5.8.2 APIs and applications
  – Tested with
    • Mellanox InfiniBand adapters (DDR, QDR, FDR, and EDR)
    • RoCE support with Mellanox adapters
    • Various multi-core platforms
    • Different file systems with disks and SSDs and Lustre
http://hibd.cse.ohio-state.edu