Optimizing Stream Programs Using Linear State Space Analysis
Sitij Agrawal (1,2), William Thies (1), and Saman Amarasinghe (1)
(1) Massachusetts Institute of Technology
(2) Sandbridge Technologies
CASES 2005
http://cag.lcs.mit.edu/streamit
Streaming Application Domain
• Based on a stream of data
  – Graphics, multimedia, software radio
  – Radar tracking, microphone arrays, HDTV editing, cell phone base stations
• Properties of stream programs
  – Regular and repeating computation
  – Parallel, independent actors with explicit communication
  – Data items have short lifetimes
[Figure: example stream graph. AtoD → Decode → duplicate → (LPF 1 → HPF 1, LPF 2 → HPF 2, LPF 3 → HPF 3) → roundrobin → Encode → Transmit]
Conventional DSP Design Flow
• Signal Processing Expert, in Matlab
  – Spec. (data-flow diagram)
  – Design the datapaths (no control flow)
  – DSP optimizations
  – Coefficient tables
• Software Engineer, in C and Assembly
  – Rewrite the program
  – Architecture-specific optimizations (performance, power, code size)
  – C/Assembly code
Ideal DSP Design Flow
• Application Programmer
  – Application-level design
  – High-level program (dataflow + control)
• Compiler
  – DSP optimizations
  – Architecture-specific optimizations
  – C/Assembly code
• Challenge: maintaining performance
The StreamIt Language
• Goals:
  – Provide a high-level stream programming model
  – Invent new compiler technology for streams
• Contributions:
  – Language design [CC ’02, PPoPP ’05]
  – Compiling to tiled architectures [ASPLOS ’02, ISCA ’04, Graphics Hardware ’05]
  – Cache-aware scheduling [LCTES ’03, LCTES ’05]
  – Domain-specific optimizations [PLDI ’03, CASES ’05]
Programming in StreamIt
    void->void pipeline FMRadio(int N, float lo, float hi) {
        add AtoD();
        add FMDemod();
        add splitjoin {
            split duplicate;
            for (int i=0; i<N; i++) {
                add pipeline {
                    add LowPassFilter(lo + i*(hi - lo)/N);
                    add HighPassFilter(lo + i*(hi - lo)/N);
                }
            }
            join roundrobin();
        }
        add Adder();
        add Speaker();
    }
[Figure: the corresponding stream graph. AtoD → FMDemod → Duplicate → (LPF 1 → HPF 1, LPF 2 → HPF 2, LPF 3 → HPF 3) → RoundRobin → Adder → Speaker]
Example StreamIt Filter
    float->float filter LowPassButterWorth (float sampleRate, float cutoff) {
        float coeff;
        float x;
        init {
            coeff = calcCoeff(sampleRate, cutoff);
        }
        work peek 2 push 1 pop 1 {
            x = peek(0) + peek(1) + coeff * x;
            push(x);
            pop();
        }
    }
Focus: Linear State Space Filters
• Properties:
  1. Outputs are a linear function of inputs and states
  2. New states are a linear function of inputs and states
• Most common target of DSP optimizations
  – FIR / IIR filters
  – Linear difference equations
  – Upsamplers / downsamplers
  – DCTs
Representing State Space Filters
• A state space filter is a tuple 〈A, B, C, D〉 that maps inputs u to outputs y through states x:
    $x' = Ax + Bu$
    $y = Cx + Du$
• Example filter:
    float->float filter IIR {
        float x1, x2;
        work push 1 pop 1 {
            float u = pop();
            push(2*(x1+x2+u));
            x1 = 0.9*x1 + 0.3*u;
            x2 = 0.9*x2 + 0.2*u;
        }
    }
• Linear dataflow analysis extracts the matrices from the work function:
    $A = \begin{bmatrix} 0.9 & 0 \\ 0 & 0.9 \end{bmatrix}, \quad B = \begin{bmatrix} 0.3 \\ 0.2 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 2 \end{bmatrix}, \quad D = \begin{bmatrix} 2 \end{bmatrix}$
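To make the correspondence between the IIR work function and its 〈A, B, C, D〉 tuple concrete, here is a minimal Python sketch. It is not StreamIt compiler code: the helper names (`matvec`, `run_state_space`, `run_iir_directly`) are made up for illustration, and it assumes both states start at zero.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def run_state_space(A, B, C, D, inputs, x0):
    """Run y = Cx + Du, x' = Ax + Bu for a single-input, single-output filter."""
    x = list(x0)
    outputs = []
    for u in inputs:
        y = vadd(matvec(C, x), matvec(D, [u]))  # output uses the current state
        x = vadd(matvec(A, x), matvec(B, [u]))  # then the state advances
        outputs.append(y[0])
    return outputs


def run_iir_directly(inputs):
    """Line-by-line transcription of the IIR work function (states assumed 0 at start)."""
    x1 = x2 = 0.0
    outputs = []
    for u in inputs:
        outputs.append(2 * (x1 + x2 + u))
        x1 = 0.9 * x1 + 0.3 * u
        x2 = 0.9 * x2 + 0.2 * u
    return outputs


# Matrices read off the work function.
A = [[0.9, 0.0], [0.0, 0.9]]
B = [[0.3], [0.2]]
C = [[2.0, 2.0]]
D = [[2.0]]

inputs = [1.0, 0.5, -2.0, 0.0, 3.0]  # arbitrary test signal
from_matrices = run_state_space(A, B, C, D, inputs, [0.0, 0.0])
from_code = run_iir_directly(inputs)
max_difference = max(abs(p - q) for p, q in zip(from_matrices, from_code))
```

The two output sequences should agree up to floating-point rounding, since the matrices only regroup the arithmetic of the work function.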
State Space Optimizations
1. State removal
2. Reducing the number of parameters
3. Combining adjacent filters
Change-of-Basis Transformation
• Original filter:
    $x' = Ax + Bu$
    $y = Cx + Du$
• Let T be an invertible matrix and z = Tx. Multiply the state update by T:
    $Tx' = TAx + TBu$
    $y = Cx + Du$
• Insert $T^{-1}T$ in front of x:
    $Tx' = TA(T^{-1}T)x + TBu$
    $y = C(T^{-1}T)x + Du$
• Regroup:
    $Tx' = TAT^{-1}(Tx) + TBu$
    $y = CT^{-1}(Tx) + Du$
• Substitute z = Tx:
    $z' = TAT^{-1}z + TBu$
    $y = CT^{-1}z + Du$
• Transformed filter:
    $z' = A'z + B'u$
    $y = C'z + D'u$
    $A' = TAT^{-1}, \quad B' = TB, \quad C' = CT^{-1}, \quad D' = D$
• Can map original states x to transformed states z = Tx without changing I/O behavior
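The claim that a change of basis leaves I/O behavior unchanged can be checked numerically. The sketch below is only an illustration: the matrices A, B, C, D, the transform T, and the initial state are hypothetical values chosen for the test, and the helper names are not from the talk. The transformed filter starts from z0 = T·x0.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def madd(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inverse_2x2(T):
    """Inverse of a 2x2 matrix; raises if T is singular."""
    (a, b), (c, d) = T
    det = a * d - b * c
    if det == 0:
        raise ValueError("T must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def change_basis(A, B, C, D, T):
    """Return (A', B', C', D') = (T A T^-1, T B, C T^-1, D)."""
    T_inv = inverse_2x2(T)
    return matmul(matmul(T, A), T_inv), matmul(T, B), matmul(C, T_inv), D


def simulate(A, B, C, D, inputs, x0):
    """Run y = Cx + Du, x' = Ax + Bu; x0 is a column vector."""
    x = x0
    ys = []
    for u in inputs:
        U = [[u]]
        ys.append(madd(matmul(C, x), matmul(D, U))[0][0])
        x = madd(matmul(A, x), matmul(B, U))
    return ys


# Hypothetical two-state filter and transform (not taken from the slides).
A = [[0.5, 0.1], [0.0, 0.8]]
B = [[1.0], [0.5]]
C = [[1.0, -1.0]]
D = [[0.25]]
T = [[1.0, 2.0], [3.0, 5.0]]  # determinant is -1, so T is invertible
x0 = [[1.0], [-1.0]]

A2, B2, C2, D2 = change_basis(A, B, C, D, T)
inputs = [1.0, 0.0, -0.5, 2.0, 0.25, 1.5]
y_original = simulate(A, B, C, D, inputs, x0)
y_transformed = simulate(A2, B2, C2, D2, inputs, matmul(T, x0))
max_difference = max(abs(p - q) for p, q in zip(y_original, y_transformed))
```

Any other invertible T should give the same agreement, up to rounding; only the internal state values differ.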
1) State Removal
• Can remove states which are:
  a. Unreachable – do not depend on input
  b. Unobservable – do not affect output
• To expose unreachable states, reduce $[A \mid B]$ to a kind of row-echelon form
  – For unobservable states, reduce $[A^T \mid C^T]$
• Automatically finds minimal number of states
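As a cross-check on reachability and observability, the sketch below uses the textbook rank test on the reachability and observability matrices. This is a swapped-in technique, not the row-echelon reduction described above, and the helper names are hypothetical. It is applied to the two-state IIR filter (A = 0.9·I, B = [0.3, 0.2]ᵀ, C = [2 2]), with exact `Fraction` arithmetic so that the rank is not blurred by rounding.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def rank(M):
    """Rank by Gaussian elimination; exact because entries are Fractions."""
    rows = [row[:] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def observability_matrix(A, C):
    """Stack the rows of C, CA, ..., CA^(n-1)."""
    block, rows = C, []
    for _ in range(len(A)):
        rows.extend(block)
        block = matmul(block, A)
    return rows


def reachability_matrix(A, B):
    """Transposes of B, AB, ..., A^(n-1)B stacked as rows (same rank as the usual form)."""
    return observability_matrix(transpose(A), transpose(B))


A = [[F("0.9"), F(0)], [F(0), F("0.9")]]
B = [[F("0.3")], [F("0.2")]]
C = [[F(2), F(2)]]

n = len(A)
unobservable_states = n - rank(observability_matrix(A, C))
unreachable_states = n - rank(reachability_matrix(A, B))
```

For this filter the observability matrix has rank 1, so one direction in the two-dimensional state space never affects the output.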
State Removal Example
• Original filter:
    $x' = \begin{bmatrix} 0.9 & 0 \\ 0 & 0.9 \end{bmatrix} x + \begin{bmatrix} 0.3 \\ 0.2 \end{bmatrix} u$
    $y = \begin{bmatrix} 2 & 2 \end{bmatrix} x + 2u$

    float->float filter IIR {
        float x1, x2;
        work push 1 pop 1 {
            float u = pop();
            push(2*(x1+x2+u));
            x1 = 0.9*x1 + 0.3*u;
            x2 = 0.9*x2 + 0.2*u;
        }
    }
• After the change of basis $T = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$:
    $x' = \begin{bmatrix} 0.9 & 0 \\ 0 & 0.9 \end{bmatrix} x + \begin{bmatrix} 0.3 \\ 0.5 \end{bmatrix} u$
    $y = \begin{bmatrix} 0 & 2 \end{bmatrix} x + 2u$
  – x1 is unobservable
• After removing x1:
    $x' = 0.9x + 0.5u$
    $y = 2x + 2u$

    float->float filter IIR {
        float x;
        work push 1 pop 1 {
            float u = pop();
            push(2*(x+u));
            x = 0.9*x + 0.5*u;
        }
    }
State Removal Example
Cost per output of the two filters:
• Original IIR (states x1, x2): 9 FLOPs, 12 load/store
• After state removal (single state x): 5 FLOPs, 8 load/store
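A quick way to convince yourself that state removal preserved the filter's behavior is to run both versions on the same input. The Python sketch below transcribes the two IIR work functions; the function names are made up here, and it assumes all states start at zero.

```python
def original_iir(inputs):
    """Two-state version: push(2*(x1+x2+u)), then update x1 and x2."""
    x1 = x2 = 0.0
    outputs = []
    for u in inputs:
        outputs.append(2 * (x1 + x2 + u))
        x1 = 0.9 * x1 + 0.3 * u
        x2 = 0.9 * x2 + 0.2 * u
    return outputs


def reduced_iir(inputs):
    """One-state version after state removal: x plays the role of x1 + x2."""
    x = 0.0
    outputs = []
    for u in inputs:
        outputs.append(2 * (x + u))
        x = 0.9 * x + 0.5 * u
    return outputs


inputs = [1.0, -1.0, 0.5, 0.0, 2.0, 0.25, -3.0, 1.5]  # arbitrary test signal
y_original = original_iir(inputs)
y_reduced = reduced_iir(inputs)
max_difference = max(abs(p - q) for p, q in zip(y_original, y_reduced))
```

The outputs should match up to floating-point rounding, because the surviving state equals x1 + x2 at every step.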
2) Parameter Reduction
• Goal: Convert matrix entries (parameters) to 0 or 1
• Allows static evaluation:
  – 1*x → x: eliminate 1 multiply
  – 0*x + y → y: eliminate 1 multiply, 1 add
• Algorithm (Ackermann & Bucy, 1971)
  – Also reduces matrices $[A \mid B]$ and $[A^T \mid C^T]$
  – Attains a canonical form with few parameters
Parameter Reduction Example
• Original filter (6 FLOPs per output):
    $x' = 0.9x + 0.5u$
    $y = 2x + 2u$
• After the change of basis $T = 2$ (4 FLOPs per output):
    $x' = 0.9x + 1u$
    $y = 1x + 2u$
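The effect of the scalar change of basis T = 2 can be reproduced in a few lines. This Python sketch is only an illustration with hypothetical helper names: it rescales the one-state filter, counts the parameters that still need a multiply (those that are neither 0 nor 1), and checks that the outputs are unchanged when the state starts at zero.

```python
def rescale(a, b, c, d, t):
    """Scalar change of basis z = t*x: A' = t*a/t = a, B' = t*b, C' = c/t, D' = d."""
    if t == 0:
        raise ValueError("t must be nonzero")
    return a, t * b, c / t, d


def multiplies_needed(params):
    """Parameters equal to 0 or 1 can be evaluated statically; the rest cost a multiply."""
    return sum(1 for p in params if p not in (0, 1))


def simulate(a, b, c, d, inputs, x0=0.0):
    """Run y = c*x + d*u, x' = a*x + b*u for a one-state filter."""
    x = x0
    ys = []
    for u in inputs:
        ys.append(c * x + d * u)
        x = a * x + b * u
    return ys


original = (0.9, 0.5, 2.0, 2.0)      # x' = 0.9x + 0.5u, y = 2x + 2u
reduced = rescale(*original, t=2.0)  # x' = 0.9x + 1u,   y = 1x + 2u

inputs = [1.0, 0.5, -2.0, 0.0, 3.0]  # arbitrary test signal
y_original = simulate(*original, inputs)
y_reduced = simulate(*reduced, inputs)
max_difference = max(abs(p - q) for p, q in zip(y_original, y_reduced))
```

Here `multiplies_needed` drops from 4 to 2. Both versions keep their two adds, which is consistent with the 6 and 4 FLOPs per output quoted for this example.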
3) Combining Adjacent Filters
• Filter 1 maps u to y: $y = D_1 u$
• Filter 2 maps y to z: $z = D_2 y$
• Combined filter maps u to z directly: $z = D_2 D_1 u = Eu$, where $E = D_2 D_1$
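3) Combining Adjacent Filters
• Combined filter for Filter 1 (u to y) followed by Filter 2 (y to z):
    $x' = \begin{bmatrix} A_1 & 0 \\ B_2 C_1 & A_2 \end{bmatrix} x + \begin{bmatrix} B_1 \\ B_2 D_1 \end{bmatrix} u$
    $z = \begin{bmatrix} D_2 C_1 & C_2 \end{bmatrix} x + D_2 D_1 u$
• Also in paper:
  – combination of parallel streams
  – combination of feedback loops
  – expansion of mis-matching filters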
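The block-matrix formula for a pipeline of two filters can be exercised directly. The sketch below is an illustration under stated assumptions: both filters push 1 and pop 1 (so no rate expansion is needed), all states start at zero, Filter 1 is the two-state IIR example, and Filter 2 is a hypothetical one-state filter invented for the test. The helper names are not from the paper.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def madd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def hstack(X, Y):
    """Place two matrices with the same number of rows side by side."""
    return [rx + ry for rx, ry in zip(X, Y)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def combine_pipeline(f1, f2):
    """Build the combined <A, B, C, D> for f1 followed by f2."""
    A1, B1, C1, D1 = f1
    A2, B2, C2, D2 = f2
    A = hstack(A1, zeros(len(A1), len(A2))) + hstack(matmul(B2, C1), A2)
    B = B1 + matmul(B2, D1)
    C = hstack(matmul(D2, C1), C2)
    D = matmul(D2, D1)
    return A, B, C, D


def simulate(f, inputs):
    """Run y = Cx + Du, x' = Ax + Bu from a zero initial state."""
    A, B, C, D = f
    x = zeros(len(A), 1)
    ys = []
    for u in inputs:
        U = [[u]]
        ys.append(madd(matmul(C, x), matmul(D, U))[0][0])
        x = madd(matmul(A, x), matmul(B, U))
    return ys


filter1 = ([[0.9, 0.0], [0.0, 0.9]], [[0.3], [0.2]], [[2.0, 2.0]], [[2.0]])
filter2 = ([[0.5]], [[1.0]], [[1.0]], [[0.25]])  # hypothetical second stage

combined = combine_pipeline(filter1, filter2)
inputs = [1.0, 0.0, -0.5, 2.0, 0.25, 1.5]
z_cascade = simulate(filter2, simulate(filter1, inputs))
z_combined = simulate(combined, inputs)
max_difference = max(abs(p - q) for p, q in zip(z_cascade, z_combined))
```

The combined filter has three states (two from Filter 1, one from Filter 2) and should produce the same output stream as running the two stages back to back, up to rounding.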