Loop Vectorization: How to vectorize interleave memory access?

Sep 16, 2023

Hao Liu, James Molloy and Jiangning Liu
14th April 2015
Background: Interleave Access
• Case: visit 24-bit RGB image
  Memory: … B3 G3 R3 B2 G2 R2 B1 G1 R1 B0 G0 R0

  C source:
    for (i = 0; i < N; i += 3) {
      R = RGB[i];
      G = RGB[i+1];
      B = RGB[i+2];
      R += C;
      G -= C;
      B *= C;
      RGB[i] = R;
      RGB[i + 1] = G;
      RGB[i + 2] = B;
    }

  Corresponding LLVM IR:
    for.body:
      ...
      %R = load i8, i8* %idx0
      %G = load i8, i8* %idx1
      %B = load i8, i8* %idx2
      %add = add i8 %R, %C
      %sub = sub i8 %G, %C
      %mul = mul i8 %B, %C
      store i8 %add, i8* %idx0
      store i8 %sub, i8* %idx1
      store i8 %mul, i8* %idx2
      ...
Background: Interleave Access
  Memory: … B3 G3 R3 B2 G2 R2 B1 G1 R1 B0 G0 R0
• Interleave Load (LD3)
    %wide.R: R7 R6 R5 R4 R3 R2 R1 R0
    %wide.G: G7 G6 G5 G4 G3 G2 G1 G0
    %wide.B: B7 B6 B5 B4 B3 B2 B1 B0
• Element-wise operations against %wide.C: C C C C C C C C
    %add.R = %wide.R + %wide.C: R7 R6 R5 R4 R3 R2 R1 R0
    %sub.G = %wide.G - %wide.C: G7 G6 G5 G4 G3 G2 G1 G0
    %mul.B = %wide.B * %wide.C: B7 B6 B5 B4 B3 B2 B1 B0
• Interleave Store (ST3)
  Memory: … B3 G3 R3 B2 G2 R2 B1 G1 R1 B0 G0 R0
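The load, compute and store stages pictured above can be emulated in portable C. This is only a sketch of the data movement, not the code the vectorizer emits: the helper names (load3, store3, rgb_update_blocked, rgb_update_scalar) are hypothetical, the image size is given in pixels rather than bytes, and plain loops stand in for the LD3/ST3 instructions.

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8, FACTOR = 3 };

/* Stand-in for an LD3-style load: 24 consecutive bytes laid out as
   R0 G0 B0 R1 G1 B1 ... are split into three 8-lane arrays. */
static void load3(const uint8_t *mem, uint8_t r[LANES], uint8_t g[LANES],
                  uint8_t b[LANES]) {
    for (int lane = 0; lane < LANES; ++lane) {
        r[lane] = mem[FACTOR * lane + 0];
        g[lane] = mem[FACTOR * lane + 1];
        b[lane] = mem[FACTOR * lane + 2];
    }
}

/* Stand-in for an ST3-style store: the inverse of load3. */
static void store3(uint8_t *mem, const uint8_t r[LANES],
                   const uint8_t g[LANES], const uint8_t b[LANES]) {
    for (int lane = 0; lane < LANES; ++lane) {
        mem[FACTOR * lane + 0] = r[lane];
        mem[FACTOR * lane + 1] = g[lane];
        mem[FACTOR * lane + 2] = b[lane];
    }
}

/* Scalar reference: the loop from the slide, indexed by pixel. */
void rgb_update_scalar(uint8_t *rgb, size_t n_pixels, uint8_t c) {
    for (size_t p = 0; p < n_pixels; ++p) {
        uint8_t *px = rgb + FACTOR * p;
        px[0] = (uint8_t)(px[0] + c);
        px[1] = (uint8_t)(px[1] - c);
        px[2] = (uint8_t)(px[2] * c);
    }
}

/* Blocked version: 8 pixels per step, then a scalar tail for leftovers. */
void rgb_update_blocked(uint8_t *rgb, size_t n_pixels, uint8_t c) {
    size_t p = 0;
    for (; p + LANES <= n_pixels; p += LANES) {
        uint8_t r[LANES], g[LANES], b[LANES];
        load3(rgb + FACTOR * p, r, g, b);
        for (int lane = 0; lane < LANES; ++lane) {
            r[lane] = (uint8_t)(r[lane] + c);  /* %add.R */
            g[lane] = (uint8_t)(g[lane] - c);  /* %sub.G */
            b[lane] = (uint8_t)(b[lane] * c);  /* %mul.B */
        }
        store3(rgb + FACTOR * p, r, g, b);
    }
    rgb_update_scalar(rgb + FACTOR * p, n_pixels - p, c);
}
```

Both functions should produce identical bytes for any input, which makes the scalar version a convenient reference when checking a vectorized one.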
Loop Vectorizer Overview
• 3 phases:
  – Legality: Inductions, Reductions, Memory
  – Profitability: CostModel
  – Transform: Scalar -> Vector, Unroll
Teach Loop Vectorizer: Legality
• Identification
  – Collect: constant strided accesses
  – Sort: consecutive accesses with the same stride
  – Select: groups whose number of accesses equals the stride
  Step1: StrideList = {<%R, 3>, <%G, 3>, <%B, 3>, ...}
  Step2: ConsecutiveList = {%R, %G, %B, ...}
  Step3: InterleaveList = {%R, %G, %B}
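The three identification steps can be sketched in C as follows. This is a simplified illustration, not the LLVM implementation: the Access record and select_interleave_group are hypothetical names, all accesses are assumed to share one base pointer, and each access is described by a constant stride and an offset within the stride.

```c
#include <stddef.h>

/* Hypothetical description of one access: it touches
   base[stride * iteration + offset]. */
typedef struct {
    char name;    /* label, e.g. 'R' */
    long stride;  /* constant stride in elements; 0 if not constant */
    long offset;  /* position inside one stride-sized group */
} Access;

/* Writes the group members in offset order into `out` (room for n entries)
   and returns the group size, or 0 if no complete group exists. */
size_t select_interleave_group(const Access *acc, size_t n, long stride,
                               Access *out) {
    if (stride < 2)
        return 0;

    /* Step 1 (Collect): keep accesses with the requested constant stride. */
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (acc[i].stride == stride)
            out[count++] = acc[i];

    /* Step 2 (Sort): insertion sort by offset, then require that
       neighbouring offsets differ by exactly one element. */
    for (size_t i = 1; i < count; ++i) {
        Access key = out[i];
        size_t j = i;
        while (j > 0 && out[j - 1].offset > key.offset) {
            out[j] = out[j - 1];
            --j;
        }
        out[j] = key;
    }
    for (size_t i = 1; i < count; ++i)
        if (out[i].offset != out[i - 1].offset + 1)
            return 0;

    /* Step 3 (Select): accept only if the member count equals the stride. */
    return (count == (size_t)stride) ? count : 0;
}
```

For the RGB loop, the three loads have stride 3 and offsets 0, 1 and 2, so they form one complete group; a loop that read only R and G would be rejected by this sketch.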
Teach Loop Vectorizer: Legality
• Induction with arbitrary steps (Patch upstreamed)
    for (unsigned i = 0; i < N; i += 3 ) { ...
• Memory check
    for (i = 0; i < N; i += ?) {
      R = RGB[i];
      G = RGB[i+1];
      B = RGB[i+2];
      ...
      RGB[i] = R;
      RGB[i + 1] = G;
      RGB[i + 2] = B;
    }
  – True Dependence: i+=1, i+=2
  – No Dependence: i+=3
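The reasoning behind the memory check can be made concrete with a small C sketch. It assumes a loop body that reads and then writes elements i+0 through i+group-1 of one array, as in the RGB loop; the function name has_loop_carried_dependence is hypothetical and the check covers only the true (write then read) dependence named on the slide.

```c
#include <stdbool.h>

/* An iteration writes base[i + w] for w in [0, group). A later iteration,
   k steps on, reads base[i + k * step + r] for r in [0, group). The two
   addresses coincide when w - r == k * step for some k >= 1. */
bool has_loop_carried_dependence(long step, long group) {
    if (step <= 0)
        return true;  /* be conservative about non-advancing loops */
    for (long w = 0; w < group; ++w) {
        for (long r = 0; r < group; ++r) {
            long dist = w - r;
            if (dist > 0 && dist % step == 0)
                return true;
        }
    }
    return false;
}
```

With a group of three accesses, steps of 1 and 2 report a dependence (for example, iteration i writes RGB[i+1] and iteration i+1 reads it), while a step of 3 does not, matching the slide.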
Teach Loop Vectorizer: Transform
• IRs to intrinsics
  Scalar IR:
    %R = load i8, i8* %ptr0
    %G = load i8, i8* %ptr1
    %B = load i8, i8* %ptr2
  Loop Vectorizer output, strided-load form:
    <8 x i8> stride.load(%ptr0, 0, 3)
    <8 x i8> stride.load(%ptr0, 1, 3)
    <8 x i8> stride.load(%ptr0, 2, 3)
  Loop Vectorizer output, indexed-load form:
    <24 x i8> index.load (%ptr0, <0,3,6,…,1,...
    <8 x i8> shuffle <0,1,2,3,4,5,6,7>
    <8 x i8> shuffle <8,9,10,11,12,13,14,15>
    <8 x i8> shuffle <16,17,18,19,20,21,22,23>
  Back End:
    call {<8xi8>, <8xi8>, <8xi8>} llvm.aarch64.ld3(%ptr)
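To see why the two vectorizer forms describe the same data, both can be emulated in C. The sketch below is an illustration only: stride_load, index_load, deinterleave_mask and extract_shuffle are hypothetical helpers modelled on the slide's stride.load, index.load and shuffle operations, and the full index mask is an assumption, since the slide elides it after <0,3,6,…,1,...

```c
#include <stdint.h>

enum { LANES = 8, FACTOR = 3, WIDE = LANES * FACTOR };

/* Models stride.load(ptr, offset, stride): one element per lane. */
void stride_load(const uint8_t *ptr, int offset, int stride,
                 uint8_t out[LANES]) {
    for (int lane = 0; lane < LANES; ++lane)
        out[lane] = ptr[offset + stride * lane];
}

/* Models index.load(ptr, mask): a 24-wide gather driven by an index mask. */
void index_load(const uint8_t *ptr, const int index[WIDE],
                uint8_t out[WIDE]) {
    for (int k = 0; k < WIDE; ++k)
        out[k] = ptr[index[k]];
}

/* Assumed mask: all offset-0 elements, then offset-1, then offset-2. */
void deinterleave_mask(int index[WIDE]) {
    for (int f = 0; f < FACTOR; ++f)
        for (int lane = 0; lane < LANES; ++lane)
            index[f * LANES + lane] = f + FACTOR * lane;
}

/* Models a shuffle that extracts 8 consecutive lanes starting at `first`. */
void extract_shuffle(const uint8_t wide[WIDE], int first,
                     uint8_t out[LANES]) {
    for (int lane = 0; lane < LANES; ++lane)
        out[lane] = wide[first + lane];
}
```

Under that assumed mask, extract_shuffle(wide, 8 * f, ...) after one index_load yields the same lanes as stride_load(ptr, f, 3, ...), which is the equivalence a back end would rely on when matching either form to a single ld3.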
Expected Performance Gain
• Expected improvements in specific benchmarks
  – EEMBC.rgbcmy 6x
  – EEMBC.rgbyiq 3x
• Need more testing and tuning
• More Challenges
  – Runtime memory dependence checks
  – Type promotion: i8 is illegal but <8 x i8> is legal
Thank you!
