  1. Pattern-guided Big Data Processing on Hybrid Parallel Architectures
     Fahad Khalid, Frank Feinbube, and Andreas Polze
     Operating Systems and Middleware Group

  2. Motivation
     • Insights from developing simulations for:
       – Enumeration of Elementary Flux Modes in metabolic networks
       – Prediction of aftershocks following earthquakes
       – Prediction of volcanic events
       – Adiabatic Quantum Computing
     • Collaborations
       – Max Planck Institute of Molecular Plant Physiology
       – GFZ German Research Center for Geosciences
     September 25, 2014 | Frank Feinbube | BigSys 2014

  3. Motivation
     • Complications with hybrid architectures
       – Separate memory hierarchy per processor type
       – Designed for high FLOP/s, not for Big Data
     • Then, assuming the available hardware is hybrid:
       – How can we improve both the performance and the productivity of a simulation that requires processing very large data sets?

  4. Definitions
     • Performance – significant speedup
     • Productivity – ease of development
     • Hybrid architecture
       – One or more CPUs (the host)
       – One or more accelerators, e.g. GPUs (the device)

  5. Efficient Hybrid-Resource Utilization (EHRU)
     • Design approach
       – Hierarchical application of patterns for parallel programming
     • Expected outcome
       – Improved simulation performance
       – Improved productivity, by serving as a foundation for:
         • Frameworks
         • Automation tools

  6. Parallel Pipeline Pattern
     [Figure: the same stream of data items processed first serially, then in a pipeline where stages T1, T2, and T3 operate on different items concurrently.]

  7. Parallel Pipeline Pattern
     • The simulation as a pipeline:
       1. Read input data from file
       2. Analytical solutions to 3D Partial Differential Equations in vectors
       3. Numerical solution to a system of linear equations
       4. Write output data to file

  8. Data Partitioning
     • Motivation
       – Main memory and cache sizes are limited
     • Factors affecting partitioning
       – Total memory required vs. memory available
       – Impact of partition size on pipeline performance
     [Figure: the complete dataset is split into partitions; a partition that is too large runs out of memory, while smaller partitions, further divided into chunks, fit and are streamed through the pipeline.]

  9. EHRU Pattern Hierarchy
     [Figure: the EHRU pattern hierarchy.]

  10. Hybrid Pipeline
      • Uses of hybrid pipelining
        – Overlapping computation and communication
        – Load balancing and optimal resource utilization
        – Kernel placement based on architecture

  11. Hybrid Pipeline Framework (HyPi)
      • HyPi stages
        – DeviceFilter: CUDA device kernel
        – CallbackFilter: device-to-host (D2H) communication
        – PostProcessFilter: host-side processing
      [Figure: several DeviceFilter → CallbackFilter → PostProcessFilter chains running in parallel.]

  12. HyPi & EHRU – Evaluation
      [Chart: runtime in seconds (0–60) vs. number of candidate vectors generated (500 million to 8.1 billion), comparing a CPU-only parallel version, a custom pipeline, and the HPF pipeline.]

  13. Feasibility and Limitations of EHRU
      • Suitable for
        – Dense linear algebra
        – Structured grids
        – Monte Carlo
      • Not suitable for
        – Sparse linear algebra
        – Unstructured grids
        – Graph traversal

  14. Architecture-based Algorithm Decomposition
      • Decompose the algorithm into two parts:
        1. The part suitable for execution on the GPU (accelerator)
        2. The part suitable for execution on the CPU
      • CPUs support a diverse range of kernels
        – Everything goes, except for massive parallelism
      • How do we decide which part of the algorithm is suitable for GPUs?
      [Figure: an algorithm decomposed into a sequence of patterns, with the first patterns mapped to the accelerator and the remaining ones to the CPU.]

  15. Characteristics of Computational Kernels
      • Degree of Parallelism (DoP)
        – The amount of parallelism exposed by the kernel
      • Arithmetic intensity
        – Ratio of the number of arithmetic instructions to the number of memory-access instructions
      • Control divergence
        – The number and complexity of conditional statements

  16. Design Patterns and Algorithm Decomposition
      • Patterns suitable for GPUs
        – Map
        – Stencil
      • Patterns NOT suitable for GPUs
        – Reduce
        – Scan
        – Dynamic programming
      • This categorization is based on the Degree of Parallelism

  17. Program Flow with Algorithm Decomposition
      << Map >> GPU kernel → intermediate result → << Reduce >> CPU kernel

  18. Tool-guided Parallelization for Hybrid Architectures
      • Motivation
        – Automatically discerning patterns from serial code
        – Efficient mapping of parallel code with EHRU
      • How?
        – Dependence analysis to discern patterns
        – Developer feedback to improve affine transformations
      • This is work in progress

  19. Future Work
      • Information-theoretic approach to improve serial-to-parallel transformations
      • Partitioning for complex data structures
      • Automated tool for architecture-based algorithm decomposition
      Thank You!
