PolyMage: High-Performance Compilation for Heterogeneous Stencils Uday Bondhugula (with Ravi Teja Mullapudi, Vinay Vasista) Department of Computer Science and Automation Indian Institute of Science Bangalore, India Apr 15, 2015 Uday Bondhugula, Indian Institute of Science Dagstuhl seminar, Apr 12-17, 2015
Domain-Specific Languages
A DSL and compiler for optimizing image processing pipelines
– Too specialized
– Need to learn a new language!
Figure: A Dodo (highly specialized, but extinct)
But:
– DSLs can be embedded in existing languages
– DSLs can grow and become more general-purpose
– A DSL compiler can “see” across routines, allowing whole-program optimization
– A DSL compiler can generate optimized code for multiple targets
Figure: A Dodo (generalized to adapt)
Introduction: Image Processing Pipelines
Graphs of interconnected processing stages
Figure: Harris corner detection (stages: I_in; I_x, I_y; I_xx, I_xy, I_yy; S_xx, S_xy, S_yy; det, trace; harris)
Introduction: Computation Patterns
Point-wise (f computed from g)
$$f(x, y) = w_r \cdot g(x, y, R) + w_g \cdot g(x, y, G) + w_b \cdot g(x, y, B)$$
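A minimal NumPy sketch of a point-wise stage, assuming the three terms of the formula read the red, green and blue channels of g. The function name `to_gray` and the default weights (the common Rec. 601 luma weights) are illustrative choices, not taken from the slides.

```python
import numpy as np

def to_gray(g, w_r=0.299, w_g=0.587, w_b=0.114):
    """Point-wise stage: each output pixel depends only on the same input pixel."""
    # g has shape (rows, cols, 3); the last axis holds the R, G, B channels.
    return w_r * g[:, :, 0] + w_g * g[:, :, 1] + w_b * g[:, :, 2]

# Hypothetical input: a 4x5 image with all channels set to 1.
g = np.ones((4, 5, 3))
f = to_gray(g)
```

Because no neighbouring pixels are read, a stage like this can be fused into its producer or consumer without any redundant computation.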
Introduction: Computation Patterns
Stencil (f computed from g)
$$f(x, y) = \sum_{\sigma_x=-1}^{+1} \sum_{\sigma_y=-1}^{+1} g(x + \sigma_x,\, y + \sigma_y) \cdot w(\sigma_x, \sigma_y)$$
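The stencil formula can be sketched in NumPy as follows. The helper name `stencil3x3`, the choice to leave border pixels at zero, and the box-filter weights are assumptions of this sketch.

```python
import numpy as np

def stencil3x3(g, w):
    """f(x, y) = sum over sx, sy in {-1, 0, 1} of g(x+sx, y+sy) * w(sx, sy)."""
    rows, cols = g.shape
    f = np.zeros((rows, cols), dtype=np.float64)
    # Only interior points are computed; borders stay zero (sketch assumption).
    for sx in (-1, 0, 1):
        for sy in (-1, 0, 1):
            shifted = g[1 + sx:rows - 1 + sx, 1 + sy:cols - 1 + sy]
            f[1:-1, 1:-1] += w[sx + 1, sy + 1] * shifted
    return f

# Hypothetical input: a 5x5 ramp averaged with a 3x3 box filter.
g = np.arange(25, dtype=np.float64).reshape(5, 5)
box = np.full((3, 3), 1.0 / 9.0)
f = stencil3x3(g, box)
```

Each output point reads a 3x3 neighbourhood of g, so consecutive stencil stages have overlapping input regions.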
Introduction: Computation Patterns
Downsample (f computed from g)
$$f(x, y) = \sum_{\sigma_x=-1}^{+1} \sum_{\sigma_y=-1}^{+1} g(2x + \sigma_x,\, 2y + \sigma_y) \cdot w(\sigma_x, \sigma_y)$$
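A small sketch of the downsample pattern, in which the output point (x, y) reads a 3x3 window centred at (2x, 2y) in the input. The name `downsample2`, the output size and the zero-valued border are assumptions made for illustration.

```python
import numpy as np

def downsample2(g, w):
    """f(x, y) = sum over sx, sy in {-1, 0, 1} of g(2x+sx, 2y+sy) * w(sx, sy)."""
    rows, cols = g.shape
    f = np.zeros((rows // 2, cols // 2), dtype=np.float64)
    # Skip x = 0 and y = 0 so that 2x-1 and 2y-1 stay inside the input.
    for x in range(1, (rows - 2) // 2 + 1):
        for y in range(1, (cols - 2) // 2 + 1):
            window = g[2 * x - 1:2 * x + 2, 2 * y - 1:2 * y + 2]
            f[x, y] = np.sum(window * w)
    return f

# Hypothetical input: an 8x8 ramp with uniform weights.
g = np.arange(64, dtype=np.float64).reshape(8, 8)
box = np.full((3, 3), 1.0 / 9.0)
f = downsample2(g, box)
```

The factor of 2 in the access means the producer and consumer iterate over differently sized domains.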
Introduction: Computation Patterns
Upsample (f computed from g)
$$f(x, y) = \sum_{\sigma_x=-1}^{+1} \sum_{\sigma_y=-1}^{+1} g\big((x + \sigma_x)/2,\, (y + \sigma_y)/2\big) \cdot w(\sigma_x, \sigma_y, x, y)$$
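The upsample pattern can be sketched as below, reading the divisions as integer (floor) divisions. The names `upsample2` and `nearest`, the clamping at the image border, and the particular weight function are assumptions of this sketch; the slide only states that the weights may depend on x and y.

```python
import numpy as np

def upsample2(g, w):
    """f(x, y) = sum of g((x+sx)//2, (y+sy)//2) * w(sx, sy, x, y)."""
    rows, cols = g.shape
    f = np.zeros((2 * rows, 2 * cols), dtype=np.float64)
    for x in range(2 * rows):
        for y in range(2 * cols):
            acc = 0.0
            for sx in (-1, 0, 1):
                for sy in (-1, 0, 1):
                    # Floor division maps fine coordinates to coarse ones;
                    # clamping at the border is a sketch assumption.
                    gx = min(max((x + sx) // 2, 0), rows - 1)
                    gy = min(max((y + sy) // 2, 0), cols - 1)
                    acc += g[gx, gy] * w(sx, sy, x, y)
            f[x, y] = acc
    return f

def nearest(sx, sy, x, y):
    """Hypothetical weight function: keep only the centre tap."""
    return 1.0 if (sx, sy) == (0, 0) else 0.0

g = np.array([[1.0, 2.0], [3.0, 4.0]])
f = upsample2(g, nearest)
```

With the centre-only weights this reduces to nearest-neighbour upsampling; an interpolating filter would instead choose weights based on the parity of x and y.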
Introduction: Example: Pyramid Blending pipeline
Figure: pyramid blending pipeline, a graph of downsampling (↓x, ↓y) and upsampling (↑x, ↑y) stages together with stages labeled L, X, + and M. Image courtesy: Kyros Kutulakos
Introduction: Where are Image Processing Pipelines used?
– On images uploaded to social networks like Facebook, Google+
– On all camera-enabled devices
– Everyday workloads at scales from data center to mobile device
– Computational photography, computer vision, medical imaging, ...
Figure: Google+ Auto Enhance
Introduction: Naive vs Optimized Implementation
Harris corner detection (16 cores), execution time (ms):
Seq    Naive implementation in C                                        354.56
Par    Naive parallelization (OpenMP, vector pragmas, icc)               53.91   (7×)
Tuned  Manual optimization (locality, parallelism, vector intrinsics)    12.3    (29×)
Manually optimizing pipelines is hard.
Goal: performance levels of manual tuning, without the pain.
Approach
Our Approach: PolyMage
High-level language (DSL embedded in Python)
– Allows expressing common patterns intuitively
– Enables compiler analysis and optimization
Automatic optimizing code generator
– Uses domain-specific cost models to apply complex combinations of scaling, alignment, tiling and fusion to optimize for parallelism and locality
Approach: Harris Corner Detection
(Pipeline graph shown alongside the code: I_in; I_x, I_y; I_xx, I_xy, I_yy; S_xx, S_xy, S_yy; det, trace; harris)

    R, C = Parameter(Int), Parameter(Int)
    I = Image(Float, [R+2, C+2])

    x, y = Variable(), Variable()
    row, col = Interval(0, R+1, 1), Interval(0, C+1, 1)

    # c selects the interior points; cb selects the interior shrunk by one more pixel.
    c = (Condition(x, '>=', 1) & Condition(x, '<=', R) &
         Condition(y, '>=', 1) & Condition(y, '<=', C))
    cb = (Condition(x, '>=', 2) & Condition(x, '<=', R-1) &
          Condition(y, '>=', 2) & Condition(y, '<=', C-1))

    # Image gradients computed with 3x3 stencils.
    Iy = Function(varDom = ([x, y], [row, col]), Float)
    Iy.defn = [ Case(c, Stencil(I(x, y), 1.0/12,
                                [[-1, -2, -1],
                                 [ 0,  0,  0],
                                 [ 1,  2,  1]])) ]
    Ix = Function(varDom = ([x, y], [row, col]), Float)
    Ix.defn = [ Case(c, Stencil(I(x, y), 1.0/12,
                                [[-1, 0, 1],
                                 [-2, 0, 2],
                                 [-1, 0, 1]])) ]

    # Point-wise products of the gradients.
    Ixx = Function(varDom = ([x, y], [row, col]), Float)
    Ixx.defn = [ Case(c, Ix(x, y) * Ix(x, y)) ]
    Iyy = Function(varDom = ([x, y], [row, col]), Float)
    Iyy.defn = [ Case(c, Iy(x, y) * Iy(x, y)) ]
    Ixy = Function(varDom = ([x, y], [row, col]), Float)
    Ixy.defn = [ Case(c, Ix(x, y) * Iy(x, y)) ]

    # 3x3 box sums of the products.
    Sxx = Function(varDom = ([x, y], [row, col]), Float)
    Syy = Function(varDom = ([x, y], [row, col]), Float)
    Sxy = Function(varDom = ([x, y], [row, col]), Float)
    for pair in [(Sxx, Ixx), (Syy, Iyy), (Sxy, Ixy)]:
        pair[0].defn = [ Case(cb, Stencil(pair[1], 1,
                                          [[1, 1, 1],
                                           [1, 1, 1],
                                           [1, 1, 1]])) ]

    # Corner response from the determinant and trace.
    det = Function(varDom = ([x, y], [row, col]), Float)
    d = Sxx(x, y) * Syy(x, y) - Sxy(x, y) * Sxy(x, y)
    det.defn = [ Case(cb, d) ]
    trace = Function(varDom = ([x, y], [row, col]), Float)
    trace.defn = [ Case(cb, Sxx(x, y) + Syy(x, y)) ]
    harris = Function(varDom = ([x, y], [row, col]), Float)
    coarsity = det(x, y) - .04 * trace(x, y) * trace(x, y)
    harris.defn = [ Case(cb, coarsity) ]
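For comparison, here is a plain NumPy sketch of what the specification above computes, stage by stage. It is not PolyMage output: the helper names `stencil` and `harris_reference` are hypothetical, and the sketch assumes that points outside the `c` and `cb` regions are simply left at zero.

```python
import numpy as np

def stencil(img, scale, w):
    """Scaled 3x3 weighted neighbourhood sum on interior points; borders stay zero."""
    w = np.asarray(w, dtype=np.float64)
    rows, cols = img.shape
    out = np.zeros((rows, cols), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out[1:-1, 1:-1] += w[i, j] * img[i:rows - 2 + i, j:cols - 2 + j]
    return scale * out

def harris_reference(I):
    """Unfused reference: every intermediate stage is a full-size array."""
    Iy = stencil(I, 1.0 / 12, [[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
    Ix = stencil(I, 1.0 / 12, [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    ones = [[1, 1, 1]] * 3
    Sxx, Syy, Sxy = (stencil(a, 1, ones) for a in (Ixx, Iyy, Ixy))
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    out = np.zeros_like(det)
    # Keep only the region selected by cb (two pixels in from each edge).
    out[2:-2, 2:-2] = (det - 0.04 * trace * trace)[2:-2, 2:-2]
    return out

# Hypothetical input: an 8x8 image that increases by 1 per row.
I = np.tile(np.arange(8, dtype=np.float64)[:, None], (1, 8))
h = harris_reference(I)
```

A stage-by-stage version like this allocates a full-size array for every intermediate stage.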
Compiler: Polyhedral Representation
Figure: f_out, f_2 and f_1 plotted against x.

    x = Variable()
    f_in = Image(Float, [18])
    f_1 = Function(varDom = ([x], [Interval(0, 17, 1)]), Float)
    f_1.defn = [ f_in(x) + 1 ]
    f_2 = Function(varDom = ([x], [Interval(1, 16, 1)]), Float)
    f_2.defn = [ f_1(x-1) + f_1(x+1) ]
    f_out = Function(varDom = ([x], [Interval(2, 15, 1)]), Float)
    f_out.defn = [ f_2(x-1) + f_2(x+1) ]
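The three-stage example can be executed directly in NumPy to see how the domains shrink from one stage to the next. This sketch assumes that `Interval(lo, hi, 1)` has inclusive bounds and uses a hypothetical input ramp; points outside a function's domain are marked NaN.

```python
import numpy as np

f_in = np.arange(18, dtype=np.float64)  # hypothetical input values 0..17
f_1 = np.full(18, np.nan)
f_2 = np.full(18, np.nan)
f_out = np.full(18, np.nan)

for x in range(0, 18):       # domain of f_1: 0..17
    f_1[x] = f_in[x] + 1
for x in range(1, 17):       # domain of f_2: 1..16
    f_2[x] = f_1[x - 1] + f_1[x + 1]
for x in range(2, 16):       # domain of f_out: 2..15
    f_out[x] = f_2[x - 1] + f_2[x + 1]
```

Each stage loses one point at either end because it reads x-1 and x+1 from its producer.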
Compiler: Polyhedral Representation (domains)
Figure: domains of f_1 (x from 0 to 17), f_2 (x from 1 to 16) and f_out (x from 2 to 15), plotted against x.
Compiler: Polyhedral Representation (dependence vectors)
Figure: f_out, f_2 and f_1 plotted against x.
Function                                      Dependence vectors
$f_{out}(x) = f_2(x-1) \cdot f_2(x+1)$        (1, 1), (1, -1)
$f_2(x) = f_1(x-1) + f_1(x+1)$                (1, 1), (1, -1)
$f_1(x) = f_{in}(x)$
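One way to read the dependence vectors is as distances in a (stage, x) space from the producer point to the consumer point that reads it. The sketch below derives them from the access offsets; the stage numbering in `levels` and the helper name `dependence_vectors` are assumptions made for illustration.

```python
def dependence_vectors(consumer_level, producer_level, offsets):
    """Distance from each producer point to the consumer point that reads it."""
    # A consumer at x reads the producer at x + off, so the x distance is -off.
    return [(consumer_level - producer_level, -off) for off in offsets]

# Hypothetical stage numbering: one level per function in the chain.
levels = {"f_1": 1, "f_2": 2, "f_out": 3}
# consumer -> (producer, access offsets), as in f_2(x) = f_1(x-1) + f_1(x+1).
accesses = {"f_2": ("f_1", [-1, +1]), "f_out": ("f_2", [-1, +1])}

vectors = {
    consumer: dependence_vectors(levels[consumer], levels[producer], offsets)
    for consumer, (producer, offsets) in accesses.items()
}
```

Both consumers yield (1, 1) and (1, -1), matching the table: the vectors depend only on the access offsets, not on how the two reads are combined.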
Compiler: Polyhedral Representation (live-outs)
Figure: live-outs marked on the diagram of f_out, f_2 and f_1 plotted against x.
Compiler: Scheduling Criteria
– Locality
– Storage
– Parallelism
Figure: f_out, f_2 and f_1 plotted against x.
Compiler: Scheduling Criteria (default schedule)
Figure: the default schedule on the diagram of f_out, f_2 and f_1 plotted against x, assessed for locality, storage and parallelism.
Compiler: Scheduling Criteria (parallelogram tiling)
Figure: parallelogram tiling on the diagram of f_out, f_2 and f_1 plotted against x, assessed for locality, storage and parallelism.
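Why the tiles are parallelograms rather than rectangles can be sketched with the standard polyhedral legality test: rectangular tiling is valid when every dependence vector has non-negative components, and a skew of the x axis achieves this for the vectors (1, 1) and (1, -1). The skew factor, tile size and helper names below are assumptions of this sketch, not details taken from the slides.

```python
def skew(vec, factor=1):
    """Map a (stage, x) vector into the skewed space (stage, x + factor * stage)."""
    t, x = vec
    return (t, x + factor * t)

def tileable(vectors):
    """Rectangular tiling is legal if no dependence component is negative."""
    return all(c >= 0 for v in vectors for c in v)

deps = [(1, 1), (1, -1)]
skewed = [skew(d) for d in deps]

legal_before = tileable(deps)    # (1, -1) points backwards in x
legal_after = tileable(skewed)   # all components are now non-negative

def tile_of(stage, x, tile_size=4):
    """Hypothetical tile index along the skewed x axis."""
    return (x + stage) // tile_size
```

Rectangular tiles in the skewed space correspond to parallelograms in the original (stage, x) space.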