Pattern-guided Big Data Processing on Hybrid Parallel Architectures
Fahad Khalid, Frank Feinbube, and Andreas Polze
Operating Systems and Middleware Group

Motivation
• Insights from developing simulations for:
– Enumeration of Elementary Flux Modes in Metabolic Networks
– Prediction of aftershocks following earthquakes
– Prediction of volcanic events
– Adiabatic Quantum Computing
• Collaborations
– Max Planck Institute of Molecular Plant Physiology
– GFZ German Research Center for Geosciences
September 25, 2014 Frank Feinbube | BigSys 2014

Motivation
• Complications with Hybrid Architectures
– Memory hierarchy per processor type
– Designed for high FLOP/s, not Big Data
• Assuming the available hardware is hybrid:
– How can we improve both the performance and the productivity of a simulation that must process very large data sets?

Definitions
• Performance
– Significant speedup
• Productivity
– Ease of development
• Hybrid Architecture
– One or more CPUs = Host
– One or more accelerators, e.g., GPUs = Device

Efficient Hybrid-Resource Utilization (EHRU)
• Design Approach
– Hierarchical application of patterns for parallel programming
• Expected Outcome
– Improved simulation performance
– Improved productivity, by serving as a foundation for:
  • Frameworks
  • Automation tools

Parallel Pipeline Pattern
[Figure: a stream of data items passing through three stages (T1, T2, T3), shown once with serial processing of stages and once with pipelined processing of stages]
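
The gap between serial and pipelined processing of stages can be estimated with a simple timing model. The sketch below assumes an idealized pipeline with one worker per stage, fixed per-stage times, and no communication overhead. The function names and the example stage times are illustrative and do not come from the talk.

```python
def serial_time(stage_times, n_items):
    """Every item passes through all stages before the next item starts."""
    return n_items * sum(stage_times)


def pipelined_time(stage_times, n_items):
    """Ideal pipeline: fill the pipeline once, then the slowest stage
    determines how often a finished item leaves the last stage."""
    if n_items == 0:
        return 0
    return sum(stage_times) + (n_items - 1) * max(stage_times)


if __name__ == "__main__":
    stages = [2, 3, 1]  # hypothetical times for the three stages
    for n in (1, 8, 1000):
        s, p = serial_time(stages, n), pipelined_time(stages, n)
        print(f"{n:5d} items: serial={s}, pipelined={p}, speedup={s / p:.2f}")
```

Under this model the speedup for long streams approaches the sum of the stage times divided by the slowest stage time, which is why balanced stages matter.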

Parallel Pipeline Pattern
• Simulation as Pipeline
1. Read input data from file
2. Analytical solutions to 3D Partial Differential Equations in Vectors
3. Numerical solution to a System of Linear Equations
4. Write output data to file

Data Partitioning
• Motivation
– Main memory and cache sizes are limited
• Factors affecting partitioning
– Total memory required/available
– Impact of partition size on pipeline performance
[Figure: partition P1 is marked "Out of Memory", while P1,1, P1,2, and P1,3 are marked "OK". A second diagram is labeled Complete Dataset, Partition 0, Partition 1, ..., and Chunk.]
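
One way to derive partition sizes from a memory budget is sketched below. It assumes fixed-size records and treats chunks as sub-ranges of a partition. Both assumptions, along with every name and parameter, are illustrative rather than taken from the talk.

```python
def plan_partitions(n_records, bytes_per_record, memory_budget_bytes):
    """Split n_records into contiguous (start, stop) ranges that each fit
    into the given memory budget."""
    if bytes_per_record <= 0 or memory_budget_bytes < bytes_per_record:
        raise ValueError("budget must hold at least one record")
    records_per_partition = memory_budget_bytes // bytes_per_record
    return [
        (start, min(start + records_per_partition, n_records))
        for start in range(0, n_records, records_per_partition)
    ]


def split_into_chunks(partition, n_chunks):
    """Divide one (start, stop) partition into at most n_chunks
    near-equal, non-empty sub-ranges."""
    start, stop = partition
    base, extra = divmod(stop - start, n_chunks)
    chunks, lo = [], start
    for i in range(n_chunks):
        hi = lo + base + (1 if i < extra else 0)
        if hi > lo:
            chunks.append((lo, hi))
        lo = hi
    return chunks


if __name__ == "__main__":
    # Hypothetical numbers: 10 records of 8 bytes, 32 bytes of memory.
    for part in plan_partitions(10, 8, 32):
        print(part, split_into_chunks(part, 2))
```

A smaller budget yields more partitions, so the partition size trades memory pressure against the per-partition overhead in the pipeline.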

EHRU Pattern Hierarchy

Hybrid Pipeline
• Uses of Hybrid Pipelining
– Overlapping computation and communication
– Load balancing and optimal resource utilization
– Kernel placement based on architecture

Hybrid Pipeline Framework (HyPi)
• HyPi Stages
– DeviceFilter: CUDA device kernel
– CallbackFilter: device-to-host (D2H) communication
– PostProcessFilter: host processing
[Figure: three parallel pipelines, each chaining a DeviceFilter, a CallbackFilter, and a PostProcessFilter]
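
The three-stage structure can be imitated on the host alone to show how the stages overlap. The sketch below is not the HyPi API: it runs each stage in its own Python thread connected by bounded queues. The three stage functions are hypothetical stand-ins for a CUDA kernel, a device-to-host copy, and host-side post-processing, and `run_stage`, `run_pipeline`, and `_DONE` are invented for this example.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def run_stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(chunks, stages, capacity=2):
    """Run each stage in its own thread; results keep the input order."""
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]
    results = []

    def collect():
        # Drain the last queue so that no stage blocks on a full queue.
        while True:
            item = queues[-1].get()
            if item is _DONE:
                return
            results.append(item)

    threads = [
        threading.Thread(target=run_stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stages)
    ]
    threads.append(threading.Thread(target=collect))
    for t in threads:
        t.start()
    for chunk in chunks:
        queues[0].put(chunk)
    queues[0].put(_DONE)
    for t in threads:
        t.join()
    return results


def device_filter(chunk):        # stand-in for a CUDA device kernel
    return [x * x for x in chunk]


def callback_filter(chunk):      # stand-in for a device-to-host copy
    return list(chunk)


def post_process_filter(chunk):  # stand-in for host processing
    return sum(chunk)


if __name__ == "__main__":
    stages = [device_filter, callback_filter, post_process_filter]
    print(run_pipeline([[1, 2], [3, 4], [5, 6]], stages))
```

The bounded queues limit how many chunks are in flight at once, which is one simple way to keep memory use predictable while the stages run concurrently.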

HyPi & EHRU – Evaluation
[Figure: execution time in seconds (axis from 0 to 60) versus number of candidate vectors generated (500 million, 2 billion, 2.5 billion, 3.5 billion, 4.5 billion, 6.3 billion, 8.1 billion) for three implementations: CPU-only Parallel, Custom Pipeline, and HPF Pipeline]

Feasibility and Limitations of EHRU
• Suitable for
– Dense Linear Algebra
– Structured Grids
– Monte Carlo
• Not suitable for
– Sparse Linear Algebra
– Unstructured Grids
– Graph Traversal

Architecture-based Algorithm Decomposition
• Decompose the algorithm into two parts:
1. Suitable for execution on the GPU
2. Suitable for execution on the CPU
• CPUs support a diverse range of kernels
– Everything goes, except for massive parallelism
• How do we decide which part of the algorithm is suitable for GPUs?
[Figure: the algorithm shown as a sequence of patterns (Pattern 1, Pattern 2, ...), with parts assigned to the accelerator and parts to the CPU]

Characteristics of Computational Kernels
• Degree of Parallelism (DoP)
– The amount of parallelism exposed by the kernel
• Arithmetic Intensity
– Ratio of the number of arithmetic instructions to the number of memory access instructions
• Control Divergence
– Number and complexity of conditional statements
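
Arithmetic intensity, as defined above, can be worked out by hand for small kernels. The sketch below counts instructions per element for two textbook loops. The kernels, the per-element counts (which ignore caching and address arithmetic), and all function names are assumptions made for illustration.

```python
def arithmetic_intensity(n_arithmetic, n_memory_accesses):
    """Ratio of arithmetic instructions to memory access instructions."""
    if n_memory_accesses <= 0:
        raise ValueError("kernel must perform at least one memory access")
    return n_arithmetic / n_memory_accesses


def axpy_counts(n):
    """y[i] = a * x[i] + y[i]: one multiply and one add per element,
    two loads (x[i], y[i]) and one store (y[i])."""
    return 2 * n, 3 * n


def dot_counts(n):
    """acc += x[i] * y[i]: one multiply and one add per element, two
    loads; the accumulator is assumed to stay in a register."""
    return 2 * n, 2 * n


if __name__ == "__main__":
    for name, counts in (("axpy", axpy_counts), ("dot", dot_counts)):
        print(name, round(arithmetic_intensity(*counts(1_000_000)), 3))
```

Both loops perform about one arithmetic instruction per memory access or fewer, so under this measure they are limited by memory traffic rather than by computation.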

Design Patterns and Algorithm Decomposition
• Patterns suitable for GPUs
– Map
– Stencil
• Patterns NOT suitable for GPUs
– Reduce
– Scan
– Dynamic Programming
• This categorization is based on Degree of Parallelism

Program Flow with Algorithm Decomposition
[Figure: a <<Map>> GPU kernel produces an intermediate result, which is consumed by a <<Reduce>> CPU kernel]
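
The map-then-reduce flow can be sketched on the host to make the split explicit. In the example below both kernels are plain Python stand-ins: `gpu_map_kernel` only marks the part that would run on the device, and the squaring and summing operations are arbitrary choices for illustration, not the computation used in the talk.

```python
from functools import reduce
import operator


def gpu_map_kernel(chunk):
    """Stand-in for the <<Map>> GPU kernel: every element is processed
    independently, so all elements could run in parallel."""
    return [x * x for x in chunk]


def cpu_reduce_kernel(intermediate):
    """Stand-in for the <<Reduce>> CPU kernel: combine the intermediate
    result into a single value."""
    return reduce(operator.add, intermediate, 0)


def run(data):
    intermediate = gpu_map_kernel(data)     # would execute on the device
    return cpu_reduce_kernel(intermediate)  # executes on the host


if __name__ == "__main__":
    print(run([1, 2, 3, 4]))
```

The intermediate list marks the point where data would cross from device to host memory, so its size determines the transfer cost of this decomposition.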

Tool-guided Parallelization for Hybrid Architectures
• Motivation
– Automatically discerning patterns from serial code
– Efficient mapping of parallel code with EHRU
• How?
– Dependence analysis to discern patterns
– Developer feedback to improve affine transformations
• This is work in progress

Future Work
• Information-theoretic approach to improve serial-to-parallel transformations
• Partitioning for complex data structures
• Automated tool for architecture-based algorithm decomposition
Thank You!
