Welcome the Second Spring of Dataflow and Parallel Computing -- Toward a Path of Convergence for Ecosystems of Extreme-Scale HPC, Big Data and Beyond. Guang R. Gao, ACM Fellow and IEEE Fellow, Endowed Distinguished Professor, University of Delaware, and Founder of ETI. A&M 05-16-2016
Outline • Introduction • Second Spring of HPC Parallel Computing • New Challenges: HPC vs. Big Data – Divergence or Convergence? • The Codelet Model and SWARM • Challenges/Opportunities: HPC + Big Data • Summary Remarks
Looking Back 20+ Years: The Pessimism over Our Field • HPC is a small and relatively unimportant field? • Is parallel computing dead? (Ken Kennedy) • Computer architecture is a dead field? • Full artificial intelligence is a “fantasy”? • The dataflow model of computation suffered a great setback ...
Looking Back 20+ Years ... • “Parallel Computing is dead” • “Death of computer architecture” • “Death of dataflow model of computation” • “Death of Artificial Intelligence!” • ... AIST-03-01-2016 talk
State of Parallel Computer Architecture Innovations – “... researchers basked in parallel-computing glory. They developed an amazing variety of parallel algorithms for every applicable sequential operation. They proposed every possible structure to interconnect thousands of processors ...” – But “... The market for massively parallel computers has collapsed, and many companies have gone out of business.” [IEEE Computer, Nov. 1994, pp. 74-75] IPDPS2005-Keynote
State of Parallel Computer Architecture Innovations • “... The term 'proprietary architecture' has become pejorative. For computer designers, the revolution is over and only 'fine tuning' remains ...” [“End of Architecture”, Burton Smith, 1990s]
Corporations Vanishing (1985 – 2005) [Figure: timeline, 1985 to 2005, of parallel-computing companies that disappeared: Myrias, BBN, Convex Computer, Kendall Square Research, nCube, ESCD, MasPar, Cray Research, Multiflow, ETA, Meiko Scientific, Thinking Machines, Pyramid, DEC, Sequent. From the keynote at the 2005 IPDPS Conference, Denver, CO.]
“Is Parallel Computing Dead?” - Ken Kennedy, 1994 “The announcement that Thinking Machines would seek Chapter 11 bankruptcy protection, although not unexpected, sent shock waves through the high-performance computing community. Coupled with the well-publicized problems of Kendall Square Research and the rumored problems of Intel Supercomputer Systems Division, this event has led many people to question the long-term viability of the parallel computing industry and even parallel computing itself. Meanwhile, the dramatic strides in the performance of scientific workstations continues to squeeze the market for parallel supercomputing. On several recent occasions, I have been asked whether parallel computing will soon be relegated to the trash heap reserved for promising technologies that never quite make it. Washington certainly seems to be looking in the other direction--agency program managers, if they talk of high-performance computing at all, seem to view it as a small and relatively unimportant subcomponent of the National Information Infrastructure.”
Outline • Introduction • Second Spring of HPC Parallel Computing • New Challenges: HPC vs. Big Data – Divergence or Convergence • The Codelet Model and SWARM • Challenges/Opportunities: HPC + Big Data • Summary Remarks
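“Where the mountains end and the waters run out, one doubts there is any road; then, past shady willows and bright blossoms, another village appears.” From “A Visit to a Village West of the Mountains” (《游山西村》) by the Song-dynasty poet Lu You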
2005-Present: A Second Spring of HPC Parallel Computing • Sequential processing hits serious walls – Heat wall – Memory wall – Other walls • Parallel processing appears to provide a powerful alternative to beat the walls • Moore's Law appears to have still enjoyed good years in the past decade • Two examples (see next 2 slides)
Cyclops-64 Programming Models and System Software Supports [Figure: software stack. Application programming API: Co-array Fortran, UPC, EARTH-C, OpenMP-XN, MPI, others. Advanced execution/programming model: location consistency, percolation. Infrastructure base and tools: Kcc/gcc compiler, simulation/emulation, analytical modeling. Cyclops Thread Virtual Machine: thread execution management, shared memory operations, fine-grain multithreading. System software: thread creation and termination, dynamic memory management, async function invocation chain, thread synchronization, acquire/release, put/get, put/get with sync, load balancing, scheduling, fine-grain multithreading fibers (e.g. EARTH, CARE). Bottom layer: Cyclops-64 ISA.] [Figure: Cyclops-64 hardware. Thread units (TU), SP, FPUs and memory banks connected by a crossbar network; A-switch, DMA, off-chip memory (4 GB/sec), 1 Gbit/s Ethernet, IDE HDD (50 MB/sec); communication ports for the 3D-mesh inter-chip network; system labels: 24x24, 24 PC cards, 1 PetaFlops.]
Outline • Introduction • The Second Spring of HPC Parallel Computing • HPC vs. Big Data – Divergence or Convergence • The Codelet Model and SWARM • Challenges/Opportunities: HPC + Big Data • Summary Remarks
What is HPC? High-Performance Computing: The term "high-performance computing" refers to systems that, through a combination of processing capability and storage capacity, can solve computational problems that are beyond the capability of small- to medium-scale systems. [Obama’s Executive Order] Gao-03-07-2016 MEXT talk
What is Big Data? Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. • Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. • The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.
Data analytics and computing ecosystems compared • Computational Science – FORTRAN, C, C++: languages – PAPI: performance and debugging tool – MPI/OpenMP: multi-core parallel model – SLURM: batch scheduler – Lustre: parallel file system • Data Analytics – Mahout: machine learning tool – Hive: data warehouse software – Pig: provides a high-level language for big data – Sqoop: exchanges data with traditional databases – Flume: log management – Zookeeper: maintaining consistency – Storm: real-time computation system – HBase: a distributed, scalable big data store – Avro: data serialization system • NOTE: The divergence of the Big Data and HPC ecosystems! Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015. Waseda-01-26-2016 talk
Key Insights • The tools and cultures of high-performance computing and big data analytics have diverged, to the detriment of both; unification is essential to address a spectrum of major research domains. • The challenges of scale tax our ability to transmit data, compute complicated functions on that data, or store a substantial part of it; new approaches are required to meet these challenges. • The international nature of science demands further development of advanced computer architectures and global standards for processing data, even as international competition complicates the openness of the scientific process. Courtesy of “Exascale Computing and Big Data”, Daniel A. Reed and Jack Dongarra, CACM, July 2015
Outline • Introduction • Second Spring of HPC Parallel Computing • New Challenges: HPC vs. Big Data – Divergence or Convergence • The Codelet Model and SWARM • Challenges/Opportunities: HPC + Big Data • Summary Remarks
A Quiz: Have you heard of the following terms? • Actors (dataflow) • Strand • Fiber • Codelet