Lecture 6.2: Loop Optimizations (EN 600.320/420)

Aug 27, 2022


Lecture 6.2: Loop Optimizations
EN 600.320/420
Instructor: Randal Burns
14 February 2018
Department of Computer Science, Johns Hopkins University
How to Make Loops Faster
- Make bigger to eliminate startup costs
  - Loop unrolling
  - Loop fusion
- Get more parallelism
  - Coalesce inner and outer loops
- Improve memory access patterns
  - Access by row rather than column
  - Tile loops
- Use reductions
Lecture 8: Concepts in Parallelism
Loop Optimization (Fusion)
- Merge loops to create larger tasks (amortize startup)
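A minimal sketch of fusion, assuming two element-wise passes over the same index range. The function names, the `tmp` buffer, and the arithmetic are hypothetical and chosen only for illustration; they do not come from the lecture. The unfused version starts two parallel loops, while the fused version starts one loop that does both steps per iteration.

```c
/* Unfused: two parallel loops, so the loop startup cost is paid twice. */
void scale_then_offset_unfused(const double *in, double *tmp, double *out, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        tmp[i] = 2.0 * in[i];
    }
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = tmp[i] + 1.0;
    }
}

/* Fused: one parallel loop with a larger body, so startup is paid once. */
void scale_then_offset_fused(const double *in, double *tmp, double *out, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        tmp[i] = 2.0 * in[i];
        out[i] = tmp[i] + 1.0;
    }
}
```

Fusion is legal here because iteration i of the second loop only depends on iteration i of the first. Compiled without OpenMP support, the pragmas are ignored and both functions still run serially.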
Loop Optimization (Coalesce)
- Coalesce loops to get more UEs (units of execution) and thus more parallelism
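A sketch of coalescing a two-level loop nest, assuming a row-major `rows` x `cols` matrix stored in a flat array. The function names and the value written are hypothetical. The first version flattens the nest by hand into a single loop of `rows * cols` iterations; the second asks OpenMP to do the same with the `collapse(2)` clause.

```c
/* Manual coalescing: one loop over rows * cols iterations, so the
 * runtime has more independent iterations to distribute than rows alone. */
void fill_coalesced(int rows, int cols, double *m) {
    #pragma omp parallel for
    for (int k = 0; k < rows * cols; k++) {
        int i = k / cols;   /* recover the outer index */
        int j = k % cols;   /* recover the inner index */
        m[i * cols + j] = (double)(i + j);
    }
}

/* Same iteration space, coalesced by the OpenMP collapse clause. */
void fill_collapse(int rows, int cols, double *m) {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            m[i * cols + j] = (double)(i + j);
        }
    }
}
```

Coalescing helps most when the outer loop alone has too few iterations to keep all threads busy.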
Loop Optimization (Unrolling)
- Loops that do little work have high startup costs

    /* start at i=1 so that a[i-1] and b[i-1] stay in bounds */
    for ( int i=1; i<N; i++ ) {
        a[i] = b[i]+1;
        c[i] = a[i]+a[i-1]+b[i-1];
    }

- Unroll loops (by hand) to reduce this overhead
  - Some compiler support for this

    /* assumes an even number of iterations (N-1 even) */
    for ( int i=1; i<N; i+=2 ) {
        a[i] = b[i]+1;
        c[i] = a[i]+a[i-1]+b[i-1];
        a[i+1] = b[i+1]+1;
        c[i+1] = a[i+1]+a[i]+b[i];
    }
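A sketch of the same unrolling with a remainder loop, so that it works for any trip count. The function names are hypothetical, and starting at i = 1 is an assumption made here so that `a[i-1]` and `b[i-1]` stay in bounds.

```c
/* Reference version: one element per iteration. */
void update_rolled(int *a, const int *b, int *c, int n) {
    for (int i = 1; i < n; i++) {
        a[i] = b[i] + 1;
        c[i] = a[i] + a[i - 1] + b[i - 1];
    }
}

/* Unrolled by two: half as many loop tests and increments. */
void update_unrolled(int *a, const int *b, int *c, int n) {
    int i = 1;
    for (; i + 1 < n; i += 2) {
        a[i] = b[i] + 1;
        c[i] = a[i] + a[i - 1] + b[i - 1];
        a[i + 1] = b[i + 1] + 1;
        c[i + 1] = a[i + 1] + a[i] + b[i];
    }
    /* Remainder loop: handles the last element when the count is odd. */
    for (; i < n; i++) {
        a[i] = b[i] + 1;
        c[i] = a[i] + a[i - 1] + b[i - 1];
    }
}
```

The statements inside the unrolled body keep their original order, so the dependence of `c[i+1]` on `a[i]` is preserved.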
Memory Access Patterns
- Reason about how loops iterate over memory
  - Prefer sequential over random access (7x speedup here)
- Row v. column is the classic case
[Figure label: cache line]
http://www.akira.ruc.dk/~keld/teaching/IPDC_f10/Slides/pdf4x/4_Performance.4x.pdf
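A sketch of the row versus column case, assuming a row-major matrix stored in a flat array (the C layout). The function names are hypothetical. Both functions compute the same sum; only the loop order, and therefore the memory access pattern, differs.

```c
/* Row order: the inner loop walks adjacent addresses, so each cache
 * line that is fetched gets fully used before moving on. */
double sum_by_rows(const double *m, int rows, int cols) {
    double s = 0.0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            s += m[i * cols + j];
        }
    }
    return s;
}

/* Column order: the inner loop jumps cols elements at a time, so
 * consecutive accesses usually land in different cache lines. */
double sum_by_cols(const double *m, int rows, int cols) {
    double s = 0.0;
    for (int j = 0; j < cols; j++) {
        for (int i = 0; i < rows; i++) {
            s += m[i * cols + j];
        }
    }
    return s;
}
```

The size of the gap depends on the matrix size and the machine, so it is worth timing both versions on a matrix that does not fit in cache.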
Loop Tiling
- Tiling localizes memory twice
  - In cache lines for reads (sequential)
  - Into cache regions for writes (TLB hits)
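A sketch of tiling, using a matrix transpose as the example; the choice of transpose, the function name, and the tile size `TILE` are assumptions for illustration and are not taken from the lecture. Within one tile, the reads of `src` run sequentially along a row, and the writes to `dst` stay inside a small block of rows instead of striding across the whole matrix.

```c
#define TILE 32  /* hypothetical tile size; tune for the target cache */

/* Transpose an n x n row-major matrix one TILE x TILE block at a time. */
void transpose_tiled(const double *src, double *dst, int n) {
    for (int ii = 0; ii < n; ii += TILE) {
        for (int jj = 0; jj < n; jj += TILE) {
            /* The bounds checks handle n that is not a multiple of TILE. */
            for (int i = ii; i < ii + TILE && i < n; i++) {
                for (int j = jj; j < jj + TILE && j < n; j++) {
                    dst[j * n + i] = src[i * n + j];
                }
            }
        }
    }
}
```

The two outer loops pick a tile and the two inner loops sweep it, so the working set at any moment is roughly two tiles.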
OpenMP Reductions
- Variable sharing when computing aggregates leads to poor performance

    #pragma omp parallel for shared(max_val)
    for( i=0;i<10; i++) {
        #pragma omp critical
        {
            if(arr[i] > max_val){
                max_val = arr[i];
            }
        }
    }
OpenMP Reductions
- Reductions are private variables (not shared)
  - Allocated by OpenMP
- Updated by function (max) on exit for each chunk
  - Safe to write from different threads
- Eliminates interference in parallel loop

    #pragma omp parallel for reduction(max : max_val)
    for( i=0;i<10; i++) {
        if(arr[i] > max_val){
            max_val = arr[i];
        }
    }
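The reduction fragment can be wrapped into a complete function as a sketch. The function name `max_of` and the choice to seed `max_val` from `arr[0]` are assumptions; the function requires n >= 1. The `max` reduction operator for C is available from OpenMP 3.1 onward.

```c
/* Return the largest element of arr[0..n-1]; requires n >= 1. */
int max_of(const int *arr, int n) {
    int max_val = arr[0];
    /* Each thread updates its own private max_val; OpenMP combines the
     * private copies with the original value when the loop ends. */
    #pragma omp parallel for reduction(max : max_val)
    for (int i = 1; i < n; i++) {
        if (arr[i] > max_val) {
            max_val = arr[i];
        }
    }
    return max_val;
}
```

No critical section is needed because no thread writes a shared variable inside the loop. Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.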
