Extended Lattice-Based Memory Allocation

Alain Darte, Tomofumi Yuki, Alexandre Isoard
Laboratoire de l’Informatique du Parallélisme, Lyon, France

25th International Conference on Compiler Construction
March 17–18, 2016, Barcelona, Spain

A. Darte, A. Isoard, T. Yuki (LIP, Lyon) Extended Lattice-Based Memory Allocation Compiler Construction 2016 1 / 20
Motivation

We want to automatically find compact memory allocations, because:

Domain-specific languages (DSLs) abstract away memory allocation:
  ◮ Alpha
  ◮ Single Assignment C
  ◮ ...
Array expansion increases parallelism but also the memory footprint:
  ◮ Parallelizing/optimizing compilers
Useful for programming the memory hierarchy:
  ◮ Kernel offloading to GPU
  ◮ High-level synthesis for FPGA
  ◮ Flat mode of Xeon Phi
Static intra-array optimization

Principle:
  Reuse memory locations (values without overlapping lifetimes)
  Reuse within a given array (element-wise)
  Reduce memory of temporary arrays
  Static optimization (at compile time)

    for (int t = 0; t < n-1; ++t)
      for (int i = 0; i < n; ++i)
        A[t+1][i] = f(A[t][i-1], A[t][i], A[t][i+1]);

Uses n^2 storage.

    for (int t = 0; t < n-1; ++t)
      for (int i = 0; i < n; ++i)
        A[(t+1)%2][i] = f(A[t%2][i-1], A[t%2][i], A[t%2][i+1]);

Uses 2n storage.
The same loop with a skewed mapping into a one-dimensional array:

    for (int t = 0; t < n-1; ++t)
      for (int i = 0; i < n; ++i) {
        int x = i-t;
        int b = n+1;
        A[x%b] = f(A[x%b], A[(x+1)%b], A[(x+2)%b]);
      }

Uses n + 1 storage! (Here % stands for the non-negative modulo, since x = i-t can be negative.)
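The three layouts above can be checked against each other with a small sketch. Everything beyond the loop nests is an assumption made for illustration: the stencil body f, the bound MAXN, the initial row, the convention that a missing neighbour reads as 0, and the helper mod that makes the remainder non-negative. The skewed layout is read as (t, i) ↦ (i - t + 1) mod (n+1), which is what the indices x, x+1, x+2 amount to.

```c
#include <assert.h>

#define MAXN 32   /* hypothetical upper bound on n for this sketch */

/* Hypothetical stencil body; any function of the three neighbours would do. */
static unsigned f(unsigned a, unsigned b, unsigned c) { return a + 2u * b + 3u * c; }

/* Non-negative remainder: C's % may return a negative value when x < 0. */
static int mod(int x, int b) { return ((x % b) + b) % b; }

/* Returns 1 if the n^2, 2n and n+1 layouts produce the same last row. */
int schemes_agree(int n)
{
    static unsigned full[MAXN][MAXN], two[2][MAXN], skew[MAXN + 1];
    int b = n + 1;
    assert(n >= 2 && n <= MAXN);

    for (int i = 0; i < n; ++i) {
        full[0][i] = two[0][i] = (unsigned)(i + 1);
        skew[mod(i + 1, b)] = (unsigned)(i + 1);  /* cell of (0, i) is i - 0 + 1 */
    }
    for (int t = 0; t < n - 1; ++t) {
        for (int i = 0; i < n; ++i) {
            int l = i > 0, r = i < n - 1;  /* assumption: missing neighbours read as 0 */
            int x = i - t;
            full[t + 1][i] = f(l ? full[t][i - 1] : 0, full[t][i],
                               r ? full[t][i + 1] : 0);
            two[(t + 1) % 2][i] = f(l ? two[t % 2][i - 1] : 0, two[t % 2][i],
                                    r ? two[t % 2][i + 1] : 0);
            /* The right-hand side is evaluated before the store, so the cell
               holding A[t][i-1] can be reused for A[t+1][i]. */
            skew[mod(x, b)] = f(l ? skew[mod(x, b)] : 0, skew[mod(x + 1, b)],
                                r ? skew[mod(x + 2, b)] : 0);
        }
    }
    for (int i = 0; i < n; ++i) {
        unsigned ref = full[n - 1][i];
        if (two[(n - 1) % 2][i] != ref) return 0;
        if (skew[mod(i - (n - 1) + 1, b)] != ref) return 0;
    }
    return 1;
}
```

Calling schemes_agree(8) runs all three versions side by side and compares the last row; the skewed version only ever touches n + 1 cells.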
History: shift in approach

Schedule-based approaches:
  Wilde & Rajopadhye (1996)
  De Greef, Catthoor & al. (1997)
  Lefebvre-Feautrier (1998)
  Quilleré & Rajopadhye (2000)
  Thies, Amarasinghe, & al. (2007)

Separation of concerns (live-range conflict vs. modular mapping search):
  Universal Occupancy Vectors, Strout & al. (1998)
  Lattice-based, Darte & al. (2005)
  SMO, Bhaskaracharya & al. (2015)
  Extended Lattice-based, Darte & al. (this paper)
Our contribution

Basis selection heuristic driven by reuse vectors

Provides:
  Support for a wide range of languages (cf. our paper at IMPACT'16)
  Method to optimize the mapping for size
  Natural treatment of non-convex unions of polyhedra
  Parametric analysis and parametric modular mapping
  Simple to use (only requires the conflict difference set)

Scope:
  Intra-array optimization ☛ Inter-array?
  Size focused ☛ Locality?
  Affine mapping ☛ Redundant storage, live-range splitting?
Successive modulo

    for (int t = 0; t < n; ++t)
      for (int i = 0; i < n; ++i)
        A[t][i] = A[t-1][i-1] + A[t-1][i] + A[t-1][i+1];

[Figure: iteration space with axes i and t]

Canonical basis (Lefebvre-Feautrier): A[t][i] ↦ A[t % 2][i % n]
Skewed basis: A[t][i] ↦ A[(i-t) % (n+1)]
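Whether a mapping is legal for this loop can be checked by brute force: two values conflict when one is written strictly between the write and the last read of the other, and conflicting values must not share a cell. The sketch below is an illustration under assumptions, not the method of the talk: it assumes the sequential (t, i) execution order, treats the last row as live-out, and introduces the hypothetical names mapping_is_valid, canonical, skewed and too_small.

```c
#include <assert.h>

typedef int (*mapping_fn)(int t, int i, int n);

/* Non-negative remainder (C's % can be negative for negative x). */
static int mod(int x, int b) { return ((x % b) + b) % b; }

/* The two mappings of the slide, linearized to a single cell index. */
int canonical(int t, int i, int n) { return (t % 2) * n + (i % n); }
int skewed(int t, int i, int n)    { return mod(i - t, n + 1); }
/* A deliberately undersized variant, used only as a negative example. */
int too_small(int t, int i, int n) { return mod(i - t, n); }

/* Time of the write of A[t][i] in the sequential (t, i) loop order. */
static int write_time(int t, int i, int n) { return t * n + i; }

/* Time of the last read of A[t][i]: iteration (t+1, i+1), clamped to the row.
   Assumption: the last row is live-out, so it never dies. */
static int last_read(int t, int i, int n)
{
    if (t == n - 1) return n * n;
    return write_time(t + 1, i + 1 < n ? i + 1 : n - 1, n);
}

/* Returns 1 if no two conflicting values are mapped to the same cell. */
int mapping_is_valid(int n, mapping_fn map)
{
    assert(n >= 2);
    for (int ta = 0; ta < n; ++ta)
        for (int ia = 0; ia < n; ++ia)
            for (int tb = 0; tb < n; ++tb)
                for (int ib = 0; ib < n; ++ib) {
                    int wa = write_time(ta, ia, n);
                    int wb = write_time(tb, ib, n);
                    /* b is born while a is still live: they conflict. */
                    if (wa < wb && wb < last_read(ta, ia, n) &&
                        map(ta, ia, n) == map(tb, ib, n))
                        return 0;
                }
    return 1;
}
```

For n = 6, both mappings of the slide pass this check, while reducing the skewed modulus from n+1 to n fails it: A[t][n-1] and A[t+1][0] then land in the same cell while both are live.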
Tiling case

We usually get this kind of conflict set (live-out set) after tiling:

[Figure: conflict set with axes i and t]

Canonical basis: A[t][i] ↦ A[t % n][i % n]
Skewed basis: A[t][i] ↦ A[(i-t) % (2n-1)]

Bhaskaracharya & al.: looks for hyperplanes intersecting at most one point
Tiling in general (with longer dependences)

We also get this kind of conflict set (live-out set) after tiling:

[Figure: conflict set with axes i and t]

Bhaskaracharya & al. basis: A[t][i] ↦ A[(i-2t) % (5n-4)]
Best basis: A[t][i] ↦ A[t%2][(i-t)%(2n-1)]

Bhaskaracharya & al.: looks for hyperplanes intersecting at most one point
Ours: finds the basis that minimizes reuse distance
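To compare the two mappings, one can read the storage size off each of them as the product of its moduli (an assumption about how the sizes are meant to be counted):

```latex
\[
  \underbrace{5n-4}_{(i-2t) \bmod (5n-4)}
  \quad\text{vs.}\quad
  \underbrace{2\,(2n-1) = 4n-2}_{(t \bmod 2,\ (i-t) \bmod (2n-1))},
  \qquad
  (5n-4) - (4n-2) = n-2 .
\]
```

Under that reading, the two-dimensional mapping saves n - 2 cells, so it is strictly smaller as soon as n > 2.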
Running example

    for (int t = 0; t < n; ++t)
      for (int i = 0; i < n; ++i)
        A[t][i] = A[t-1][i-1] + A[t-1][i] + A[t-1][i+1] + A[t-2][i];

[Figure: iteration space with axes i and t]

A[t][i] ↦ A[t%2][i]
Running example (skewed)

    for (int t = 0; t < n; ++t)
      for (int i = t; i < n+t; ++i) {
        int j = i-t;
        A[t][j] = A[t-1][j-1] + A[t-1][j] + A[t-1][j+1] + A[t-2][j];
      }

[Figure: skewed iteration space with axes i and t]
Running example (tiled)

[Figure: tiled iteration space with axes i and t]
Conflict set to conflict differences (visualization)

[Figure: conflict set (axes i, t) and the corresponding conflict differences (axes δi, δt)]
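The step from conflicts to conflict differences can be sketched by brute force. For a modular mapping, two array elements share a cell exactly when their difference is mapped to zero, so only the differences δ = b - a of conflicting pairs matter. The sketch below uses the untiled running example, not the tiled one of the figure. It assumes the sequential (t, i) order and that only the last row is live-out, keeps only the differences pointing forward in time, and the names in_ds and modular_mapping_ok are hypothetical.

```c
#include <assert.h>

/* Write time of A[t][i] in the sequential (t, i) loop order. */
static int write_time(int t, int i, int n) { return t * n + i; }

/* Time of the last read of A[t][i] in the running example.
   Assumption: only the last row (t == n-1) is live-out. */
static int last_read(int t, int i, int n)
{
    if (t == n - 1) return n * n;                    /* live-out: never dies */
    if (t + 2 < n)  return write_time(t + 2, i, n);  /* read as A[t-2][i] */
    return write_time(t + 1, i + 1 < n ? i + 1 : n - 1, n);
}

/* 1 if some pair a = (t, i), b = (t + dt, i + di) conflicts, i.e. b is
   written strictly after a's write and strictly before a's last read. */
int in_ds(int n, int dt, int di)
{
    assert(n >= 1);
    for (int t = 0; t < n; ++t)
        for (int i = 0; i < n; ++i) {
            int tb = t + dt, ib = i + di;
            if (tb < 0 || tb >= n || ib < 0 || ib >= n) continue;
            int wa = write_time(t, i, n), wb = write_time(tb, ib, n);
            if (wa < wb && wb < last_read(t, i, n)) return 1;
        }
    return 0;
}

/* A[t][i] -> A[t % mt][i % mi] is valid iff no nonzero conflict
   difference is mapped to (0, 0). */
int modular_mapping_ok(int n, int mt, int mi)
{
    for (int dt = 0; dt < n; ++dt)
        for (int di = -(n - 1); di < n; ++di) {
            if (dt == 0 && di == 0) continue;
            if (dt % mt == 0 && di % mi == 0 && in_ds(n, dt, di)) return 0;
        }
    return 1;
}
```

With n = 6, the difference (δt, δi) = (2, 0) is not a conflict, because A[t+2][i] is written in the very iteration that last reads A[t][i]. This is why modular_mapping_ok(6, 2, 6), the mapping A[t%2][i] of the running example, passes, while a single row (mt = 1) does not.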
Running example (successive modulo)

Canonical basis:

    x \mapsto \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} x \bmod \begin{pmatrix} n \\ n-1 \end{pmatrix}

[Figure: conflict differences with axes δi and δt, with the labels m_0 and "slice"]
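The moduli for a given basis can be computed one dimension at a time. The sketch below follows the usual description of successive modulo for the canonical basis, which is an assumption here since the slide only shows the result: the first modulus is one more than the largest first coordinate among the conflict differences, and the second is computed the same way on the slice where the first coordinate is zero. The type conflict_diff, the function names and the sample difference set are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* One conflict difference (dt, di). */
typedef struct { int dt, di; } conflict_diff;

/* Successive modulo on the canonical basis (first t, then i):
   m[0] = 1 + max |dt| over all differences,
   m[1] = 1 + max |di| over the slice dt == 0.
   Absolute values are used so the input need not be symmetric. */
void successive_modulo(const conflict_diff *ds, int count, int m[2])
{
    int max_t = 0, max_i = 0;
    assert(count >= 0);
    for (int k = 0; k < count; ++k) {
        if (abs(ds[k].dt) > max_t) max_t = abs(ds[k].dt);
        if (ds[k].dt == 0 && abs(ds[k].di) > max_i) max_i = abs(ds[k].di);
    }
    m[0] = max_t + 1;
    m[1] = max_i + 1;
}

/* Number of cells used by the resulting mapping A[t % m[0]][i % m[1]]. */
int canonical_size(const conflict_diff *ds, int count)
{
    int m[2];
    successive_modulo(ds, count, m);
    return m[0] * m[1];
}
```

For the made-up set {(1,-2), (1,3), (0,2), (0,-1)}, this gives the moduli (2, 3): no listed difference has dt divisible by 2 and di divisible by 3 at once, so no two conflicting elements share a cell. The result depends on the order in which the basis vectors are processed, which is where the choice of basis comes in.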