Common Subexpression Convergence (CSC), Sana Damani and Vivek Sarkar

Jan 19, 2023

Common Subexpression Convergence (CSC) • Sana Damani and Vivek Sarkar • Habanero Extreme Scale Software Research Lab, Georgia Institute of Technology • Short paper at LCPC ’19, Atlanta, GA
Agenda • Motivation • Common Subexpression Convergence Transformations • Approach • Preliminary Results and Discussion
Divergence in SIMT processors • SIMT (Single Instruction, Multiple Threads): all threads in a warp execute the same instruction in parallel • Divergence: a conditional branch that depends on thread-local values sends the threads of a warp down different paths, so the warp's execution is serialized • [Figure: serialized execution of a divergent warp, threads indexed by threadIdx.x. Image credits: https://devblogs.nvidia.com/inside-volta]
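Problem: Serialization of common code • [Figure: divergent code and the resulting warp execution] • Code that appears on both paths of a divergent branch is still issued once per path, so identical work is serialized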
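The serialization can be made concrete with a toy cost model. It is written in plain C++ rather than CUDA so that it runs on the host. This is a sketch under strong assumptions, not a description of real hardware: every instruction costs one issue slot, and a divergent warp issues the two sides of an if/else back to back. Details such as Volta's independent thread scheduling are ignored. SIMT efficiency is taken here to be the average fraction of the 32 lanes that are active per issued instruction. `WarpStats`, `runIfElse`, `addConvergent`, and `oddLanes` are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Toy model (assumption): one issue slot per instruction, and a divergent
// warp issues the two sides of an if/else one after the other.
constexpr int kWarpSize = 32;

struct WarpStats {
  int issueSlots;       // instructions issued by the warp
  int activeLaneSlots;  // sum of active lanes over all issued instructions
  double simtEfficiency() const {
    return static_cast<double>(activeLaneSlots) / (issueSlots * kWarpSize);
  }
};

// Bit i of `takenMask` is set if lane i takes the "then" side.
WarpStats runIfElse(std::uint32_t takenMask, int thenLen, int elseLen) {
  assert(thenLen >= 0 && elseLen >= 0);
  const int nThen = static_cast<int>(std::bitset<kWarpSize>(takenMask).count());
  const int nElse = kWarpSize - nThen;
  WarpStats s{0, 0};
  // A side is issued only if at least one lane takes it.
  if (nThen > 0) { s.issueSlots += thenLen; s.activeLaneSlots += thenLen * nThen; }
  if (nElse > 0) { s.issueSlots += elseLen; s.activeLaneSlots += elseLen * nElse; }
  return s;
}

// Appends `n` convergent instructions executed by all lanes of the warp.
WarpStats addConvergent(WarpStats s, int n) {
  return {s.issueSlots + n, s.activeLaneSlots + n * kWarpSize};
}

// Mask for a branch such as `if (threadIdx.x % 2)`: odd lanes take "then".
std::uint32_t oddLanes() {
  std::uint32_t mask = 0;
  for (int lane = 0; lane < kWarpSize; ++lane)
    if (lane % 2) mask |= (std::uint32_t{1} << lane);
  return mask;
}
```

Under this model, a branch on odd lanes with three instructions per side costs 6 issue slots at 50% SIMT efficiency. Suppose two of those three instructions are common to both sides and execute convergently instead. Then `addConvergent(runIfElse(oddLanes(), 1, 1), 2)` costs 4 slots at 75%.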
Hoist • Move common code to the convergent common ancestor of the divergent paths
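The following is a minimal sketch of hoisting on a toy IR. It assumes straight-line arms, instructions compared as text, and a branch that tests a single variable. `Inst`, `IfElse`, and `hoist` are hypothetical. A real compiler pass (for example, one over LLVM IR) would also have to check dependences, memory effects, and the CFG.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR: "dst = rhs", compared textually.
struct Inst {
  std::string dst, rhs;
  bool operator==(const Inst& o) const { return dst == o.dst && rhs == o.rhs; }
};

// entry; if (condVar) { thenArm } else { elseArm }
struct IfElse {
  std::vector<Inst> entry;  // convergent code before the branch
  std::string condVar;      // variable tested by the branch
  std::vector<Inst> thenArm, elseArm;
};

// Hoist: move the longest common prefix of the two arms into `entry`, the
// convergent common ancestor. Every lane runs one arm, so every lane already
// executed these instructions; hoisting only changes how often the warp
// issues them. Assumes side-effect-free instructions, and stops at any
// instruction that would overwrite the branch condition.
int hoist(IfElse& r) {
  assert(!r.condVar.empty());
  std::size_t n = 0;
  while (n < r.thenArm.size() && n < r.elseArm.size() &&
         r.thenArm[n] == r.elseArm[n] && r.thenArm[n].dst != r.condVar)
    ++n;
  const auto count = static_cast<std::ptrdiff_t>(n);
  r.entry.insert(r.entry.end(), r.thenArm.begin(), r.thenArm.begin() + count);
  r.thenArm.erase(r.thenArm.begin(), r.thenArm.begin() + count);
  r.elseArm.erase(r.elseArm.begin(), r.elseArm.begin() + count);
  return static_cast<int>(n);
}
```

For arms `a = b * c; x = a + 1` and `a = b * c; y = a - 1`, `hoist` moves `a = b * c` above the branch. The warp then issues it once with all lanes active.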
Sink • Move common code to the convergent common successor of the divergent paths
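The following is a minimal sketch of sinking on a toy IR with straight-line arms and instructions compared as text. The longest common suffix of the two arms moves into the join block. `Inst`, `Diamond`, and `sink` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Inst {
  std::string dst, rhs;  // "dst = rhs", compared textually (toy IR)
  bool operator==(const Inst& o) const { return dst == o.dst && rhs == o.rhs; }
};

// if (...) { thenArm } else { elseArm }; join
struct Diamond {
  std::vector<Inst> thenArm, elseArm;
  std::vector<Inst> join;  // convergent common successor
};

// Sink: move the longest common suffix of the arms to the front of `join`.
// Each lane still runs the same instructions in the same order, now after
// reconvergence. In SSA form, values that differ between the arms would need
// phi nodes at the join; this non-SSA toy reuses the same names on both arms.
int sink(Diamond& d) {
  int moved = 0;
  while (!d.thenArm.empty() && !d.elseArm.empty() &&
         d.thenArm.back() == d.elseArm.back()) {
    d.join.insert(d.join.begin(), d.thenArm.back());
    d.thenArm.pop_back();
    d.elseArm.pop_back();
    ++moved;
  }
  // Postcondition: the remaining arms no longer end with the same instruction.
  assert(d.thenArm.empty() || d.elseArm.empty() ||
         !(d.thenArm.back() == d.elseArm.back()));
  return moved;
}
```

Consider the arms `x = p + 1; a = b * c; s = a + x` and `x = q - 1; a = b * c; s = a + x`. Here `sink` moves the last two instructions into the join and leaves only the differing definitions of `x` under the branch.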
Split • Move common code to a new convergent join point • Duplicate the conditional branch • Alternative solution: hoist defs/sink uses
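The sketch below splits a branch around common code that sits in the middle of both arms. It works on a toy IR in which instructions are compared as text. `Inst`, `Branch`, `SplitResult`, and `split` are hypothetical names, and the caller supplies the matching positions. The "hoist defs/sink uses" alternative is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Inst {
  std::string dst, rhs;  // "dst = rhs", compared textually (toy IR)
  bool operator==(const Inst& o) const { return dst == o.dst && rhs == o.rhs; }
};

struct Branch {
  std::string condVar;  // if (condVar) { thenArm } else { elseArm }
  std::vector<Inst> thenArm, elseArm;
};

// if (cond) { A } else { C };  X;  if (cond) { B } else { D }
struct SplitResult {
  Branch first;
  Inst common;
  Branch second;
};

// Split at thenArm[i] == elseArm[j]: X moves to a new convergent join point
// and the branch is duplicated around it. Each lane still runs its own arm's
// instructions in the original order. The duplicated branch re-tests
// `condVar`, so nothing up to and including X may redefine it.
bool split(const Branch& b, std::size_t i, std::size_t j, SplitResult& out) {
  if (i >= b.thenArm.size() || j >= b.elseArm.size() || !(b.thenArm[i] == b.elseArm[j]))
    return false;
  for (std::size_t k = 0; k <= i; ++k)
    if (b.thenArm[k].dst == b.condVar) return false;
  for (std::size_t k = 0; k <= j; ++k)
    if (b.elseArm[k].dst == b.condVar) return false;
  const auto ti = b.thenArm.begin() + static_cast<std::ptrdiff_t>(i);
  const auto ej = b.elseArm.begin() + static_cast<std::ptrdiff_t>(j);
  out.first = Branch{b.condVar, std::vector<Inst>(b.thenArm.begin(), ti),
                     std::vector<Inst>(b.elseArm.begin(), ej)};
  out.common = *ti;
  out.second = Branch{b.condVar, std::vector<Inst>(ti + 1, b.thenArm.end()),
                      std::vector<Inst>(ej + 1, b.elseArm.end())};
  assert(out.first.thenArm.size() + 1 + out.second.thenArm.size() == b.thenArm.size());
  return true;
}
```

The shared instruction is now issued once instead of once per arm. The price is a second branch on the same condition, one of the costs the profitability heuristics weigh.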
Operand Renaming • Insert copy instructions so that instructions differing only in an operand become identical, then sink/split
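The following sketches operand renaming on a toy three-address IR. It assumes that only one source operand differs between the last instructions of the two arms. `Inst`, the `mov` opcode, `renameOperands`, and the caller-supplied fresh temporary are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy IR (assumption): dst = a op b; op "mov" means dst = a.
struct Inst {
  std::string dst, op, a, b;
  bool operator==(const Inst& o) const {
    return dst == o.dst && op == o.op && a == o.a && b == o.b;
  }
};

// If the last instructions of the two arms differ in exactly one source
// operand, copy that operand into the temporary `fresh` on each arm and
// rewrite both instructions to read `fresh`. They become identical, so a
// later sink (or split) can move them. `fresh` must be unused elsewhere.
bool renameOperands(std::vector<Inst>& thenArm, std::vector<Inst>& elseArm,
                    const std::string& fresh) {
  if (thenArm.empty() || elseArm.empty()) return false;
  Inst t = thenArm.back(), e = elseArm.back();  // copies: inserts below may reallocate
  if (t.dst != e.dst || t.op != e.op) return false;
  const bool diffA = t.a != e.a, diffB = t.b != e.b;
  if (diffA == diffB) return false;  // already identical, or both operands differ
  std::string& tOperand = diffA ? t.a : t.b;
  std::string& eOperand = diffA ? e.a : e.b;
  thenArm.insert(thenArm.end() - 1, Inst{fresh, "mov", tOperand, ""});
  elseArm.insert(elseArm.end() - 1, Inst{fresh, "mov", eOperand, ""});
  tOperand = fresh;
  eOperand = fresh;
  thenArm.back() = t;
  elseArm.back() = e;
  assert(thenArm.back() == elseArm.back());
  return true;
}
```

Take `x = p + 1` versus `x = q + 1` with fresh temporary `r0`. The arms gain the copies `r0 = p` and `r0 = q` and now both end with `x = r0 + 1`, which can be sunk. The two inserted copies are the renaming cost listed under the profitability heuristics.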
Branches • Flatten branch, then sink/split
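The slide gives no detail beyond "flatten, then sink/split". The sketch below assumes that "flatten" means if-conversion, that is, replacing a small branch with a select. It shows the idea as two per-thread C++ functions plus a brute-force check that they agree across a 32-thread warp. The functions, conditions, and arithmetic are invented for illustration.

```cpp
#include <cassert>

// Before: both arms of the divergent `tid % 2` branch end with the same
// nested branch, so the warp issues that branch and its arms once per side.
int before(int tid, int b, int c) {
  int a;
  if (tid % 2) {
    const int u = b + tid;
    if (tid % 3 == 0) a = u * c; else a = u - c;
  } else {
    const int u = b - tid;
    if (tid % 3 == 0) a = u * c; else a = u - c;
  }
  return a;
}

// After: (1) flatten each nested branch into a select (`?:` over values that
// are both computed, so they must be side-effect free and must not trap);
// the two arms then end with identical straight-line code. (2) Sink that code
// to the join, leaving only the definition of `u` under the divergent branch.
int after(int tid, int b, int c) {
  int u;
  if (tid % 2) u = b + tid; else u = b - tid;
  const int product = u * c, difference = u - c;
  return (tid % 3 == 0) ? product : difference;
}

// Brute-force equivalence check over one 32-thread warp and small inputs.
void checkFlattenThenSink() {
  for (int tid = 0; tid < 32; ++tid)
    for (int b = -3; b <= 3; ++b)
      for (int c = -3; c <= 3; ++c)
        assert(before(tid, b, c) == after(tid, b, c));
}
```

Flattening trades a branch for speculative work: both `u * c` and `u - c` are computed on every lane. It is therefore attractive only for short, side-effect-free branches.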
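Recursive CSC • Bottom-up traversal through the control dependence graph (CDG) • [Figure: the entry block computes c = ... and b = ...; a branch on tid%2 leads on one side to a=b*c and on the other to a nested branch on tid%3, both of whose arms compute a=b*c]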
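The following is a recursive sketch of the bottom-up traversal. It assumes structured control flow, where the control dependence graph reduces to a tree of if/else regions, and it uses only sinking. The `Region` tree and the `recursiveSink`, `leaf`, and `branch` helpers are hypothetical. Which arm of `tid%2` holds the nested `tid%3` branch is also an assumption, since the figure does not survive.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical region tree: straight-line `code`, then optionally an if/else
// on `cond` with child regions, then `join` code where the arms reconverge.
struct Region {
  std::vector<std::string> code;
  std::string cond;  // empty: no branch
  std::unique_ptr<Region> thenR, elseR;
  std::vector<std::string> join;
};

// The last straight-line code a region executes.
std::vector<std::string>& tail(Region& r) { return r.cond.empty() ? r.code : r.join; }

// Post-order (bottom-up): converge inner branches first, so code they sink to
// their join points becomes visible to the enclosing branch.
int recursiveSink(Region& r) {
  if (r.cond.empty()) return 0;
  assert(r.thenR && r.elseR);
  int moved = recursiveSink(*r.thenR) + recursiveSink(*r.elseR);
  std::vector<std::string>& t = tail(*r.thenR);
  std::vector<std::string>& e = tail(*r.elseR);
  while (!t.empty() && !e.empty() && t.back() == e.back()) {
    r.join.insert(r.join.begin(), t.back());
    t.pop_back();
    e.pop_back();
    ++moved;
  }
  return moved;
}

std::unique_ptr<Region> leaf(std::vector<std::string> code) {
  auto r = std::make_unique<Region>();
  r->code = std::move(code);
  return r;
}

std::unique_ptr<Region> branch(std::vector<std::string> code, std::string cond,
                               std::unique_ptr<Region> t, std::unique_ptr<Region> e) {
  auto r = leaf(std::move(code));
  r->cond = std::move(cond);
  r->thenR = std::move(t);
  r->elseR = std::move(e);
  return r;
}
```

On the figure's example, the inner `tid%3` branch is converged first, which exposes `a=b*c` at its join point. The outer `tid%2` branch then sees `a=b*c` at the end of both arms and sinks it once more. As a result, it executes a single time after both branches reconverge.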
Common Loops • Loop distribution • Index set splitting
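Index set splitting can be illustrated with two per-thread C++ functions. The sketch assumes that both arms run the same loop body with different trip counts `n` and `m`, and that those counts are warp-uniform. The functions are invented for illustration. Loop distribution, which splits one loop into several so that an identical loop appears in both arms, is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Before: both arms run the same loop body with different trip counts,
// so the warp executes the loop once per arm.
int before(int tid, int n, int m, const std::vector<int>& a) {
  int s = 0;
  if (tid % 2) { for (int i = 0; i < n; ++i) s += a[i] * tid; }
  else         { for (int i = 0; i < m; ++i) s += a[i] * tid; }
  return s;
}

// After index set splitting: iterations [0, min(n, m)) are common to both
// arms and run convergently; only the leftover iterations stay divergent.
int after(int tid, int n, int m, const std::vector<int>& a) {
  int s = 0;
  const int k = std::min(n, m);
  for (int i = 0; i < k; ++i) s += a[i] * tid;  // convergent loop
  if (tid % 2) { for (int i = k; i < n; ++i) s += a[i] * tid; }
  else         { for (int i = k; i < m; ++i) s += a[i] * tid; }
  return s;
}

// Brute-force equivalence check over one warp and all bounds up to a.size().
void checkIndexSetSplitting(const std::vector<int>& a) {
  const int len = static_cast<int>(a.size());
  for (int tid = 0; tid < 32; ++tid)
    for (int n = 0; n <= len; ++n)
      for (int m = 0; m <= len; ++m)
        assert(before(tid, n, m, a) == after(tid, n, m, a));
}
```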
Problem Statement • Given a GPU program, identify divergent common code and move it to a convergent region using Hoist/Sink/Split, such that dependences are preserved and the benefit of code motion is maximized.
Algorithm
Identifying common code: Dynamic Programming
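The slide names dynamic programming without giving the formulation. One natural choice, assumed here, is a longest-common-subsequence (LCS) alignment of the two arms' instruction sequences, with instructions compared as text. `commonCode` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// LCS alignment of the two arms. Returns matched index pairs (i, j) in
// program order; each pair is a candidate for hoist, sink, or split.
std::vector<std::pair<int, int>> commonCode(const std::vector<std::string>& t,
                                            const std::vector<std::string>& e) {
  const int n = static_cast<int>(t.size()), m = static_cast<int>(e.size());
  // len[i][j] = length of the LCS of t[i..] and e[j..]
  std::vector<std::vector<int>> len(n + 1, std::vector<int>(m + 1, 0));
  for (int i = n - 1; i >= 0; --i)
    for (int j = m - 1; j >= 0; --j)
      len[i][j] = (t[i] == e[j]) ? len[i + 1][j + 1] + 1
                                 : std::max(len[i + 1][j], len[i][j + 1]);
  // Walk the table forward to recover one optimal alignment.
  std::vector<std::pair<int, int>> pairs;
  int i = 0, j = 0;
  while (i < n && j < m) {
    if (t[i] == e[j]) { pairs.emplace_back(i, j); ++i; ++j; }
    else if (len[i + 1][j] >= len[i][j + 1]) ++i;
    else ++j;
  }
  assert(static_cast<int>(pairs.size()) == len[0][0]);
  return pairs;
}
```

For example, consider aligning `x = p + 1; a = b * c; u = a * 2; s = a + x` with `a = b * c; y = q - 1; s = a + x`. The alignment matches `a = b * c`, which sits mid-arm on one side and is therefore a split candidate. It also matches `s = a + x`, the common suffix and therefore a sink candidate.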
Profitability Heuristics • Benefit: instruction type (function call > memory instructions > math instructions > copy instructions) and loop nest depth • Cost: copy instructions for operand renaming; increased register live ranges and/or stalls with hoist/sink; more branches, smaller blocks, and more barriers with split
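The heuristics can be read as a scoring function. In the sketch below, only the ordering of instruction kinds comes from the slide. The numeric weights, the factor of 10 per loop level, the branch cost of 2, and the `Candidate` fields are invented for illustration.

```cpp
#include <cassert>

// Instruction kinds, in the slide's order of benefit when moved out of a
// divergent region: function call > memory > math > copy.
enum class Kind { Copy, Math, Memory, Call };

// Hypothetical weights: only their ordering comes from the slide.
constexpr int benefitWeight(Kind k) {
  switch (k) {
    case Kind::Call: return 8;
    case Kind::Memory: return 4;
    case Kind::Math: return 2;
    case Kind::Copy: return 1;
  }
  return 0;
}
static_assert(benefitWeight(Kind::Call) > benefitWeight(Kind::Memory) &&
              benefitWeight(Kind::Memory) > benefitWeight(Kind::Math) &&
              benefitWeight(Kind::Math) > benefitWeight(Kind::Copy),
              "ordering from the profitability slide");

struct Candidate {
  Kind kind;             // the common instruction to be moved
  int loopDepth;         // loop nest depth of the divergent region
  int copies;            // copies inserted by operand renaming
  int extraBranches;     // branches duplicated by split
  int liveRangePenalty;  // rough, hypothetical proxy for register pressure/stalls
};

// Net benefit of moving one common instruction out of a divergent region.
// Assumes ~10 iterations per loop level, applied to benefit and cost alike,
// so loop depth changes how candidates rank, not whether one is profitable.
int netBenefit(const Candidate& c) {
  assert(c.loopDepth >= 0);
  int scale = 1;
  for (int d = 0; d < c.loopDepth; ++d) scale *= 10;
  const int benefit = benefitWeight(c.kind);
  const int cost = c.copies * benefitWeight(Kind::Copy) + 2 * c.extraBranches +
                   c.liveRangePenalty;
  return (benefit - cost) * scale;
}

bool profitable(const Candidate& c) { return netBenefit(c) > 0; }
```

Under these made-up numbers, hoisting a memory instruction inside one loop level scores well. Splitting around a lone copy instruction does not pay for its extra branch and renaming copies.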
Experimental Setup • CUDA • NVPTX/LLVM • Nvidia Volta V100
Preliminary Results: Microbenchmarks • [Figure: SIMT efficiency before and after CSC, and speedup, for the Hoist, Sink, Split, Function, Nested, and Switch microbenchmarks] • Note: nvprof attributes the major gains to a reduction of up to 27% in global reads with CSC (common address reads/coalesced accesses)
Preliminary Results: Bitonic Sort • [Figure: run time for Bitonic Sort (min, max, and average) before and after CSC; SIMT efficiency for Bitonic Sort (min, max, and average) before and after CSC]
Discussion and Future Work • Legality • CSE (common subexpression elimination) and PRE (partial redundancy elimination) • Interprocedural analysis • Opportunity in automatically parallelized programs • Profile information for divergence, cost, and bottlenecks