
High Performance Data Intensive Computing
Dongfang Zhao, Assistant Professor
Department of Computer Science & Engineering
University of Nevada, Reno

Who am I
• 2017 – present, Assistant Professor, University of Nevada, Reno
• 2016, Postdoctoral Fellow, University of Washington, Seattle
• 2015, PhD in Computer Science, Illinois Institute of Technology, Chicago
• 2015, Summer Intern, IBM Research – Almaden, San Jose, CA
• 2009-2011, Software Engineer, Epic Systems, Madison, WI
• 2008, MS in Computer Science, Emory University, Atlanta, GA
• 2005, MS in Statistics, Katholieke Universiteit Leuven, Belgium

Outline
• Past Work
  – 2005-2008: Machine Intelligence, Computer Vision
  – 2012-2015: High Performance Computing, Distributed Systems
  – 2015-2016: Big Data Systems, Database Systems
• Current Status
  – Personnel
  – Facilities
• Future Research Directions
  – Distributed Memory Management for Big Data Systems
  – Locality-aware Resource Management in Virtualized Computing
  – High Performance Database Systems

Past Work: 2005-2008
• Incremental Dimensionality Reduction
• E.g., published in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)

Past Work: 2012-2015
• High Performance Computing
• E.g., published in IEEE Transactions on Parallel and Distributed Systems (TPDS)

Past Work: 2015-2016
• Big Data Systems
• E.g., published at Very Large Data Bases (VLDB)

Current Status: Personnel
• Currently at Nevada:
  – 1 PhD student starting Fall 2017
  – 1 master's student starting Fall 2017
  – Close collaboration with Prof. Feng Yan on data mining and performance modeling; he has published at KDD, SIGMETRICS, Supercomputing, CLOUD, NOMS, etc.
• Plan:
  – By Fall 2018, recruit two more PhD students and two more master's students

Current Status: Facilities
• Nevada’s HPC cluster
  – 56 compute nodes: PowerEdge C6320
    • 1792 cores
    • 128 (or 192?) GB RAM per node
  – 11 GPU nodes: PowerEdge C4130, each with 4x P100 with NVLink
    • 352 cores
    • 44 P100 GPUs
• Our lab’s 10-node GPU cluster; each node has
  – 12 CPU cores
  – 4 GeForce GTX 1080 cards
  – 64 GB RAM
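The per-node and aggregate figures on the facilities slide can be cross-checked with a few lines of arithmetic. This is only a sketch: the dictionary layout and variable names are invented for illustration, the per-node core counts are derived from the slide's totals rather than stated on it, and the uncertain 128-or-192 GB RAM figure is deliberately left out.

```python
# Figures copied from the facilities slide; the data layout is illustrative only.
hpc_cpu = {"nodes": 56, "total_cores": 1792}
hpc_gpu = {"nodes": 11, "total_cores": 352, "gpus_per_node": 4}
lab_gpu = {"nodes": 10, "cores_per_node": 12, "gpus_per_node": 4, "ram_gb_per_node": 64}

# Derived (not stated on the slide): cores per node on the campus cluster.
cpu_cores_per_node = hpc_cpu["total_cores"] // hpc_cpu["nodes"]
gpu_cores_per_node = hpc_gpu["total_cores"] // hpc_gpu["nodes"]

# Stated on the slide as 44 P100 GPUs; recomputed here from 11 nodes x 4 GPUs.
hpc_total_gpus = hpc_gpu["nodes"] * hpc_gpu["gpus_per_node"]

# Derived aggregates for the lab cluster, which the slide gives per node only.
lab_total_cores = lab_gpu["nodes"] * lab_gpu["cores_per_node"]
lab_total_gpus = lab_gpu["nodes"] * lab_gpu["gpus_per_node"]
lab_total_ram_gb = lab_gpu["nodes"] * lab_gpu["ram_gb_per_node"]

print(f"HPC CPU partition: {cpu_cores_per_node} cores/node")
print(f"HPC GPU partition: {gpu_cores_per_node} cores/node, {hpc_total_gpus} GPUs")
print(f"Lab cluster: {lab_total_cores} cores, {lab_total_gpus} GPUs, {lab_total_ram_gb} GB RAM")
```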

Future Directions
• Distributed Memory Management for Big Data Systems
  – Motivation: Modern big data systems do not have a coordinated way to manage memory
    • Users are asked to specify the memory allocation
    • Each node's local OS manages memory on its own
  – Objective
    • A middleware that automatically manages memory for big data systems
    • The middleware oversees the cluster-wide memory status rather than optimizing local usage
    • Users should be able to plug in ad-hoc strategies for the underlying memory management
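One way to picture the "plug in ad-hoc strategy" objective is a coordinator that keeps a cluster-wide view of capacity and delegates the actual split to a user-supplied function. The sketch below is a toy under stated assumptions: `MemoryCoordinator`, `proportional_share`, and `equal_share` are hypothetical names, not part of any system described in the talk, and real middleware would also have to track live usage and enforce the limits.

```python
from typing import Callable, Dict

# A strategy maps (per-job demand in MB, total cluster capacity in MB) to per-job grants.
Strategy = Callable[[Dict[str, int], int], Dict[str, int]]


def proportional_share(demands: Dict[str, int], capacity_mb: int) -> Dict[str, int]:
    """Scale all requests by the same factor when the cluster is oversubscribed."""
    total = sum(demands.values())
    if total <= capacity_mb:
        return dict(demands)
    return {job: mb * capacity_mb // total for job, mb in demands.items()}


def equal_share(demands: Dict[str, int], capacity_mb: int) -> Dict[str, int]:
    """Cap every job at an equal slice, never granting more than it asked for."""
    if not demands:
        return {}
    cap = capacity_mb // len(demands)
    return {job: min(mb, cap) for job, mb in demands.items()}


class MemoryCoordinator:
    """Hypothetical middleware: sees memory on all nodes, not just the local one."""

    def __init__(self, node_capacity_mb: Dict[str, int],
                 strategy: Strategy = proportional_share) -> None:
        self.node_capacity_mb = dict(node_capacity_mb)
        self.strategy = strategy  # user-pluggable policy
        self.demands: Dict[str, int] = {}

    def request(self, job: str, mb: int) -> None:
        self.demands[job] = mb

    def allocate(self) -> Dict[str, int]:
        capacity = sum(self.node_capacity_mb.values())
        return self.strategy(self.demands, capacity)


coordinator = MemoryCoordinator({"node1": 64, "node2": 64})
for job, mb in {"a": 100, "b": 60, "c": 40}.items():
    coordinator.request(job, mb)

proportional = coordinator.allocate()
coordinator.strategy = equal_share  # swap the policy without touching the jobs
equal = coordinator.allocate()
print(proportional, equal)
```

Keeping the policy as a plain callable means a user can swap strategies at run time without changing how jobs register their demands.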

Future Directions
• Locality-aware Resource Management in Virtualized Computing
  – Extension of my internship work in Summer 2015
  – Motivation: Load balancing is sometimes overemphasized
  – Objective: Improve data locality for virtualized computation
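The tension between load balance and data locality can be illustrated with a toy placement routine. This is a minimal sketch, not the method from the internship work: `place_tasks`, `local_fraction`, and the `max_imbalance` knob are hypothetical, and a load unit here is simply one task per host.

```python
from typing import Dict, List, Optional, Set, Tuple


def place_tasks(tasks: List[Tuple[str, str]],
                replicas: Dict[str, Set[str]],
                hosts: List[str],
                max_imbalance: Optional[int] = None) -> Dict[str, str]:
    """Assign each (task, input_block) pair to a host.

    With max_imbalance=None the least-loaded host always wins (pure load
    balancing). Otherwise a host that already stores the block is preferred
    as long as it is at most max_imbalance tasks busier than the least-loaded host.
    """
    load = {h: 0 for h in hosts}
    placement: Dict[str, str] = {}
    for task, block in tasks:
        least = min(hosts, key=lambda h: load[h])
        choice = least
        local = [h for h in hosts if h in replicas.get(block, set())]
        if max_imbalance is not None and local:
            best_local = min(local, key=lambda h: load[h])
            if load[best_local] - load[least] <= max_imbalance:
                choice = best_local
        placement[task] = choice
        load[choice] += 1
    return placement


def local_fraction(placement: Dict[str, str],
                   tasks: List[Tuple[str, str]],
                   replicas: Dict[str, Set[str]]) -> float:
    """Share of tasks that run on a host holding their input block."""
    hits = sum(1 for task, block in tasks
               if placement[task] in replicas.get(block, set()))
    return hits / len(tasks)


hosts = ["h1", "h2", "h3"]
replicas = {"b1": {"h1"}, "b2": {"h1"}, "b3": {"h2"}}
tasks = [("t1", "b1"), ("t2", "b2"), ("t3", "b3"), ("t4", "b1")]

balanced = place_tasks(tasks, replicas, hosts)
locality_aware = place_tasks(tasks, replicas, hosts, max_imbalance=2)
print(local_fraction(balanced, tasks, replicas),
      local_fraction(locality_aware, tasks, replicas))
```

In this small example the locality-aware run places more tasks next to their data at the cost of a more uneven load, which is the trade-off the slide points at.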

Future Directions
• High Performance Distributed Databases
  – Motivation: For some reason, the dominant storage solution in HPC is the file system
  – Objective: Build a high-performance distributed database system atop existing parallel/distributed file systems, with performant support for:
    • Queries expressed in SQL
    • Data loading, transformation, extraction, etc.
  – Challenges
    • Performance bottleneck: from network to what?
    • How to leverage GPUs, InfiniBand, MPI, etc. for database workloads?
    • …
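As a single-node illustration of the two capabilities listed in the objective (loading file-resident data and querying it in SQL), the sketch below uses Python's built-in `sqlite3` and `csv` modules. It is not a distributed or high-performance system; the `jobs` table, its columns, and the sample rows are made up for the example, and the CSV text stands in for a file that would live on a parallel file system.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file stored on the file system (contents are invented).
CSV_TEXT = "job_id,nodes,runtime_s\n1,4,120.5\n2,16,310.0\n3,4,98.2\n"


def load_csv(conn: sqlite3.Connection, text: str) -> int:
    """Load CSV rows into a table and return the number of rows inserted."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn.execute(
        "CREATE TABLE jobs (job_id INTEGER, nodes INTEGER, runtime_s REAL)"
    )
    # Named placeholders are filled from each row dictionary; the column
    # types convert the CSV strings to numbers on insert.
    conn.executemany(
        "INSERT INTO jobs VALUES (:job_id, :nodes, :runtime_s)", rows
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, CSV_TEXT)

# A query expressed in SQL over the loaded data.
summary = conn.execute(
    "SELECT nodes, COUNT(*), AVG(runtime_s) FROM jobs "
    "GROUP BY nodes ORDER BY nodes"
).fetchall()
print(loaded, summary)
```

The open challenges on the slide (network bottlenecks, GPUs, InfiniBand, MPI) are exactly what this toy leaves out.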

Thanks!
Dongfang Zhao
dzhao@unr.edu

