Efficient Analysis of Multidimensional Linear Systems for Wordlength Optimization


  1. Efficient Analysis of Multidimensional Linear Systems for Wordlength Optimization
     Gaël Deest, Tomofumi Yuki, Olivier Sentieys, Steven Derrien
     This work was funded by the European FP7 project Alma.

  2. Embedded System Design
     Many constraints:
     • Power efficiency
     • Production cost
     • Performance / speed
     • Time-to-market
     • …

  3. Design-Space Exploration (DSE)
     [Figure: cost (power or area) versus execution time, with the region of feasible solutions.]

  4. Design-Space Exploration (DSE)
     [Figure: cost (power or area) versus execution time, with a time constraint.]

  5. Design-Space Exploration (DSE)
     [Figure: cost (power or area) versus execution time, with a time constraint and the optimum.]

  6. Design-Space Exploration (DSE)
     [Figure: cost (power or area) versus accuracy degradation (signal-to-noise ratio), with an accuracy constraint and the optimum.]

  7. Design-Space Exploration (DSE)
     [Figure: cost (power or area) versus accuracy degradation (signal-to-noise ratio), with an accuracy constraint and the optimum.]
     Custom fixed-point formats are used to reduce cost.

  8. Wordlength Optimization Process
     Soft accuracy constraints (e.g., noise power).
     Fast accuracy evaluation is critical for thorough design-space exploration.

  9. This Work
     State of the art:

     Techniques                      Applicability   Depth of DSE
     Simulation-based                Excellent       Limited
     Current analytical techniques   Limited         Good
     Our approach                    Good            Good

     Diagnosis: applicability issues for analytical techniques.
     Contribution: extended applicability and scalability.

  10. Overview
      • Background
      • Analytical techniques
      • Proposed approach

  11. Fixed-Point Arithmetic
      • Scaled integers: $2^{-k} \times$ integer value
      • Product of two $n$-bit numbers: $2n$ bits!
      • Some bits must be dropped (quantization)
      Example (truncation):
      [Figure: a word with bit weights $2^{3}, 2^{2}, 2^{1}, 2^{0}, 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}, 2^{-5}, 2^{-6}$, with the dropped bits marked.]
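
The truncation step on this slide can be sketched in C. This is a minimal illustration, not the authors' tool: the function name `fx_mul_trunc` and the choice of one shared `frac_bits` for both operands and the result are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical helper (not from the slides): multiply two fixed-point
 * values that each carry `frac_bits` fractional bits, i.e.
 * value = raw * 2^-frac_bits, with 0 <= frac_bits <= 31.
 * The exact product carries 2*frac_bits fractional bits; the low
 * `frac_bits` bits are dropped by truncation (rounding toward minus
 * infinity), so the error lies in (-2^-frac_bits, 0]. */
int32_t fx_mul_trunc(int32_t a, int32_t b, int frac_bits)
{
    int64_t wide = (int64_t)a * (int64_t)b;   /* exact, double-width product */
    int64_t scale = (int64_t)1 << frac_bits;
    int64_t q = wide / scale;                 /* C division rounds toward zero */
    if (wide < 0 && (wide % scale) != 0)
        q -= 1;                               /* adjust to floor for negatives */
    return (int32_t)q;
}
```

With `frac_bits = 4`, the raw operands 24 (1.5) and 36 (2.25) give 54 (3.375) with no error, while 5 (0.3125) times 3 (0.1875) gives 0, a truncation error of -15/256.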

  12. Quantization Errors
      Modeled as noise / random variable.
      Example: truncation to $2^{-n}$ precision:
      $\mathit{error} \sim U(-2^{-n}; 0)$
      • Assumptions: Widrow hypothesis
      • Statistical moments: $\sigma^2 = \frac{2^{-2n}}{12}$, $\mu = -2^{-n-1}$
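
The two moments above can be checked numerically. The sketch below assumes that a value with `n + d` fractional bits is truncated to `n` bits and that every pattern of the `d` dropped bits is equally likely; the type and function names are hypothetical. As `d` grows, the exact discrete moments approach the continuous model $\mu = -2^{-n-1}$, $\sigma^2 = 2^{-2n}/12$.

```c
/* Mean and variance of an error distribution. */
typedef struct { double mean; double var; } moments_t;

/* Exact moments of the truncation error when the `d` lowest bits of a
 * value with n+d fractional bits are dropped, assuming all 2^d patterns
 * of dropped bits are equally likely (illustrative assumption). */
moments_t trunc_error_moments(int n, int d)
{
    double lsb = 1.0;                      /* weight of the finest bit: 2^-(n+d) */
    for (int i = 0; i < n + d; i++)
        lsb *= 0.5;

    long count = 1L << d;                  /* number of dropped-bit patterns */
    double sum = 0.0, sum_sq = 0.0;
    for (long r = 0; r < count; r++) {
        double err = -(double)r * lsb;     /* truncated value minus exact value */
        sum += err;
        sum_sq += err * err;
    }

    moments_t m;
    m.mean = sum / (double)count;
    m.var = sum_sq / (double)count - m.mean * m.mean;
    return m;
}

/* Continuous model from the slide: mu = -2^(-n-1), sigma^2 = 2^(-2n)/12. */
moments_t trunc_error_model(int n)
{
    double q = 1.0;                        /* quantization step 2^-n */
    for (int i = 0; i < n; i++)
        q *= 0.5;

    moments_t m;
    m.mean = -q / 2.0;
    m.var = q * q / 12.0;
    return m;
}
```

For `n = 8` and `d = 12`, the two sets of moments agree to well under 0.1 percent.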

  13. Analytical Techniques
      Goal: compute an output noise formula.
      Idea: model propagation of errors to the output.
      Representation: Signal Flow Graphs (SFG).

  14. Accuracy Model Construction

  15. Accuracy Model Construction
      Quantization errors = new inputs

  17. Accuracy Model Construction
      Compute transfer function for each error

  18. The Challenge
      How to go from this…

          float xb[N];
          float fir(float in) {
            float y = 0;
            xb[0] = in;
            for (int i=0; i<N; i++)
              y += b[i]*xb[i];
            for (int i=N-1; i>0; i--)
              xb[i] = xb[i-1];
            return y;
          }

      …to this? [Figure]

  19. The Challenge
      Current methods:
      • Flatten control (completely unroll loops, etc.)
      • Heavy use of annotations
      Example:

          #pragma DELAY
          float xb[N];

      Limitations:
      • Scalability issues (large graphs)
      • Implicit 1D stream assumption
      • Not easily applicable to image processing

  20. Motivating Example: Deriche Filter
      Recursive filter.
      Horizontal: a left-to-right pass and a right-to-left pass (coefficients $a_1, a_2, a_3, a_4$ and $b_1, b_2$).
      Vertical: similar along columns.

  21. Motivating Example: Deriche Filter
      Issues with SFG representation:
      • Requires image size to be statically known
      • Each pixel is a different input
      • Number of transfer functions: $O(N^4)$
      • For a 32x32 image: 1,048,576!
      Cannot be handled with current methods.

  22. Intuition of the Technique
      • Current tools cannot capture the regularity of multidimensional filters.
      Idea:
      • Generalize SFGs to multidimensional systems of equations.
      • Infer this representation using polyhedral dependence analysis.

  23. Steps of our Method
      1. Build an equational representation of the program.

          float xb[N];
          float fir(float in) {
            float y = 0;                 // S0
            xb[0] = in;                  // S1
            for (int i=0; i<N; i++)
              y += b[i]*xb[i];           // S2
            for (int i=N-1; i>0; i--)
              xb[i] = xb[i-1];           // S3
            return y;
          }

      $S_0(n) = 0$
      $S_1(n) = x(n)$
      $S_2(n,i) = \begin{cases} S_0(n) + b(i) \times S_1(n) & i = 0 \\ S_2(n,i-1) + b(i) \times S_3(n-1,i) & i > 0 \end{cases}$
      $S_3(n,i) = \begin{cases} S_1(n) & i = 1 \\ S_3(n-1,i-1) & i > 1 \end{cases}$
      $y(n) = S_2(n, N-1)$

  24. Equational Representation
      Example:

          float tmp = 0;               // S0
          for (int i=0; i<N; i++)
            tmp = arr[i] + tmp;        // S1

      $S_0() = 0$
      $S_1(i) = \mathit{arr}(i) + \begin{cases} S_0() & i = 0 \\ S_1(i-1) & i > 0 \end{cases}$

      • Statement ≡ equation
      • Keeps track of data dependencies
      • Easy to transform / reason about
      • Relies on Array Dataflow Analysis (Feautrier, 1991)

  25. Example: Simplified Deriche Filter

          // Horizontal pass (row scan)
          for (int i=0; i<N; i++) {
            prev = 0;
            for (int j=0; j<N; j++) {
              tmp[i][j] = a1*x[i][j] + b1*prev;
              prev = tmp[i][j];
            }
          }
          // Vertical pass (column scan)
          for (int j=0; j<N; j++) {
            prev = 0;
            for (int i=0; i<N; i++) {
              y[i][j] = a2*tmp[i][j] + b2*prev;
              prev = y[i][j];
            }
          }

  26. Equation System
      After pre-processing:
      $S_1(i,j) = a_1 x(i,j) + b_1 S_1(i,j-1)$
      $S_2(j,i) = a_2 S_1(i,j) + b_2 S_2(j,i-1)$

  27. Equation System
      After pre-processing:
      $S_1(i,j) = a_1 x(i,j) + b_1 S_1(i,j-1)$
      $S_2(j,i) = a_2 S_1(i,j) + b_2 S_2(j,i-1)$
      Swapped dimensions (non-uniform dependences).

  28. Steps of our Method
      2. Uniformization
      $S_1(i,j) = a_1 x(i,j) + b_1 S_1(i,j-1)$
      $S_2(i,j) = a_2 S_1(i,j) + b_2 S_2(i-1,j)$
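
To see that uniformization only re-indexes the second statement, the sketch below evaluates both forms of the vertical-pass equation on the same data and compares them point by point. It is an illustration under assumptions, not the authors' implementation: the function name is hypothetical, the domain is a square `n` x `n` array stored row-major, and values outside the domain are taken as 0 (matching `prev = 0` in the loop nest).

```c
#include <stdlib.h>

/* Returns 1 if the original and uniformized forms of S2 agree on every
 * point of an n x n domain, 0 otherwise (also 0 if allocation fails).
 * s1 holds S1(i,j) at s1[i*n + j]. Hypothetical helper for illustration. */
int uniformization_agrees(const double *s1, int n, double a2, double b2)
{
    double *orig = malloc((size_t)n * n * sizeof *orig);
    double *uni = malloc((size_t)n * n * sizeof *uni);
    int ok = (orig != NULL && uni != NULL);

    if (ok) {
        /* Original form, indexed by the loop iterators (j outer, i inner):
         * S2(j,i) = a2*S1(i,j) + b2*S2(j,i-1) */
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                orig[j*n + i] = a2 * s1[i*n + j]
                              + b2 * (i > 0 ? orig[j*n + i - 1] : 0.0);

        /* Uniformized form, both statements indexed by (i,j):
         * S2(i,j) = a2*S1(i,j) + b2*S2(i-1,j) */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                uni[i*n + j] = a2 * s1[i*n + j]
                             + b2 * (i > 0 ? uni[(i-1)*n + j] : 0.0);

        /* The two forms are related by swapping the index order. */
        for (int i = 0; i < n && ok; i++)
            for (int j = 0; j < n; j++) {
                double d = orig[j*n + i] - uni[i*n + j];
                if (d > 1e-12 || d < -1e-12)
                    ok = 0;
            }
    }
    free(orig);
    free(uni);
    return ok;
}
```

After the re-indexing, every dependence is a constant offset in (i,j), which is the property the later convolution detection step relies on.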

  29. Steps of our Method
      3. Convolution Detection
      Computation pattern:
      $y(\mathbf{k}) = \sum_{\mathbf{v}} c(\mathbf{v}) \times x(\mathbf{k} - \mathbf{v})$
      • Pattern matching.
      • Simplifies noise propagation.
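
The computation pattern being matched, $y(\mathbf{k}) = \sum_{\mathbf{v}} c(\mathbf{v}) \times x(\mathbf{k} - \mathbf{v})$, can be written out for the two-dimensional case as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the kernel support starts at the origin, arrays are row-major, and the input is taken as zero outside the image.

```c
/* y[i][j] = sum over (p,q) of c[p][q] * x[i-p][j-q], with x treated as
 * zero outside the rows x cols image. All arrays are flat, row-major.
 * Hypothetical helper for illustration. */
void conv2d_causal(const double *x, int rows, int cols,
                   const double *c, int krows, int kcols,
                   double *y)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++) {
            double acc = 0.0;
            /* p <= i and q <= j keep the index (i-p, j-q) inside the image */
            for (int p = 0; p < krows && p <= i; p++)
                for (int q = 0; q < kcols && q <= j; q++)
                    acc += c[p * kcols + q] * x[(i - p) * cols + (j - q)];
            y[i * cols + j] = acc;
        }
}
```

A statement whose right-hand side is a sum of constant coefficients times uniformly shifted reads of one variable has exactly this shape.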

  30. Convolution Detection
      After pre-processing:
      $S_1(i,j) = a_1 x(i,j) + b_1 S_1(i,j-1)$
      $S_2(i,j) = a_2 S_1(i,j) + b_2 S_2(i-1,j)$
      [Figure: $x$ convolved with $a_1$ gives $S_1$, with a feedback convolution by $b_1$.]

  31. Convolution Detection
      After pre-processing:
      $S_1(i,j) = a_1 x(i,j) + b_1 S_1(i,j-1)$
      $S_2(i,j) = a_2 S_1(i,j) + b_2 S_2(i-1,j)$
      [Figure: $x$ convolved with $a_1$ gives $S_1$ (feedback through $b_1$); $S_1$ convolved with $a_2$ gives $S_2$ (feedback through $b_2$).]

  32. Accuracy Model Construction
      4. Compute noise propagation for each source
      Extract the subfilter to the output.
      Example: from statement $S_1$ to $S_2$.
      [Figure: the filter graph with coefficients $a_1, b_1, a_2, b_2$ and nodes $x$, $S_1$, $S_2$.]

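  33. Impulse Response Computation
      Determines noise propagation:
      $\mathit{err}_{out}(\mathbf{v}) = \mathit{err}_{in} * \mathbf{h}(\mathbf{v})$
      [Figure: subfilter from $S_1$ to $S_2$ (coefficients $a_2$, $b_1$, $b_2$) with impulse response $\mathbf{h}$, input $\mathit{err}_{in}$ and output $\mathit{err}_{out}$.]
      • Easy to compute for non-recursive filters
      • Infinite for recursive filters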
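
Once the impulse response $\mathbf{h}$ from a noise source to the output is known, the moments of the output error follow from a standard result for noise passed through a linear, shift-invariant filter. The slide does not state this formula; it is added here as a reminder and assumes the error samples of the source are mutually uncorrelated, which is part of the usual quantization noise model.

```latex
\mu_{out} = \mu_{in} \sum_{\mathbf{v}} \mathbf{h}(\mathbf{v}),
\qquad
\sigma^2_{out} = \sigma^2_{in} \sum_{\mathbf{v}} \mathbf{h}(\mathbf{v})^2
```

Here $\mu_{in}$ and $\sigma^2_{in}$ are the mean and variance of one quantization error source, for example $-2^{-n-1}$ and $2^{-2n}/12$ for truncation to $2^{-n}$ precision.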

  34. Non-Recursive Filters
      $z = x * h_1 + x * h_2$
      $y = z * h_3$
      [Figure: $x$ feeds two parallel branches $h_1$ and $h_2$, summed into $z$, followed by $h_3$ to give $y$.]

  36. Non-Recursive Filters
      $z = x * (h_1 + h_2)$
      $y = z * h_3$
      [Figure: $x$ filtered by $h_1 + h_2$ gives $z$, followed by $h_3$ to give $y$.]

  38. Non-Recursive Filters
      $y = x * h_3 * (h_1 + h_2)$
      $\mathbf{h} = h_3 * (h_1 + h_2)$
      [Figure: $x$ filtered by the single kernel $h_3 * (h_1 + h_2)$ gives $y$.]
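
The merge of the three kernels into the single response $h_3 * (h_1 + h_2)$ can be sketched for one-dimensional finite kernels. This is an illustration only: the function name is hypothetical, and both parallel branches are assumed to have the same number of taps.

```c
/* Computes out = h3 * (h1 + h2), where * is full 1D convolution.
 * h1 and h2 both have n12 taps, h3 has n3 taps, and out must have room
 * for n12 + n3 - 1 entries. Hypothetical helper for illustration. */
void merged_response(const double *h1, const double *h2, int n12,
                     const double *h3, int n3, double *out)
{
    for (int k = 0; k < n12 + n3 - 1; k++)
        out[k] = 0.0;

    for (int i = 0; i < n12; i++)
        for (int j = 0; j < n3; j++)
            out[i + j] += (h1[i] + h2[i]) * h3[j];   /* parallel branches add */
}
```

For example, with `h1 = {1, 2}`, `h2 = {0, 1}` and `h3 = {1, -1}`, the merged response is `{1, 2, -3}`.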

  39. Recursive Filters
      $y = x * h_1 + y * h_2$
      [Figure: $x$ filtered by $h_1$ gives $y$, with a feedback branch through $h_2$.]
      • Finding $\mathbf{h}$ ≡ solving a multidimensional recurrence
      • Hard problem

  40. Impulse Response Approximation
      • Hypothesis: the filter is stable: $\sum_{\mathbf{v}} |\mathbf{h}(\mathbf{v})| < \infty$
      • Consequence: $\lim_{r \to \infty} \sum_{\|v\| > r} |h(v)| = 0$
      • Idea: evaluate the impulse response on a sufficiently large window.

  41. Back to the Definition
      • Impulse response = output of the filter when the input is a unit impulse:
        $\delta(\mathbf{v}) = \begin{cases} 1 & \mathbf{v} = \mathbf{0} \\ 0 & \text{otherwise} \end{cases}$
      • Feed the filter with an impulse and use the output as the impulse response
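
A minimal sketch of this idea, applied to the simplified Deriche filter from the earlier slide: run the filter on an `n` x `n` image holding a unit impulse at the origin, then sum the squared outputs over that window. The function names, the flat row-major layout, and the choice of the sum of squares as the reported quantity are assumptions made for the example. Placing the impulse at the origin is enough here because both passes of this simplified filter only look backwards.

```c
#include <stdlib.h>

/* Simplified Deriche filter (horizontal then vertical recursive pass)
 * on an n x n row-major image. tmp is scratch space of n*n doubles. */
void deriche_simplified(const double *x, double *tmp, double *y, int n,
                        double a1, double b1, double a2, double b2)
{
    for (int i = 0; i < n; i++) {              /* horizontal pass (row scan) */
        double prev = 0.0;
        for (int j = 0; j < n; j++) {
            tmp[i*n + j] = a1 * x[i*n + j] + b1 * prev;
            prev = tmp[i*n + j];
        }
    }
    for (int j = 0; j < n; j++) {              /* vertical pass (column scan) */
        double prev = 0.0;
        for (int i = 0; i < n; i++) {
            y[i*n + j] = a2 * tmp[i*n + j] + b2 * prev;
            prev = y[i*n + j];
        }
    }
}

/* Sum of squared impulse response coefficients on an n x n window.
 * Returns -1.0 if memory allocation fails. Hypothetical helper. */
double impulse_energy(int n, double a1, double b1, double a2, double b2)
{
    size_t count = (size_t)n * (size_t)n;
    double *x = calloc(count, sizeof *x);
    double *tmp = calloc(count, sizeof *tmp);
    double *y = calloc(count, sizeof *y);
    double energy = -1.0;

    if (x && tmp && y) {
        x[0] = 1.0;                            /* unit impulse at the origin */
        deriche_simplified(x, tmp, y, n, a1, b1, a2, b2);
        energy = 0.0;
        for (size_t k = 0; k < count; k++)
            energy += y[k] * y[k];
    }
    free(x);
    free(tmp);
    free(y);
    return energy;
}
```

For this particular filter the response is separable, $h(i,j) = a_1 a_2 b_1^{\,j} b_2^{\,i}$, so the infinite sum of squares has the closed form $a_1^2 a_2^2 / ((1 - b_1^2)(1 - b_2^2))$. With all four coefficients equal to 0.5, a 32 x 32 window already matches that value to within floating-point noise, while a 4 x 4 window gives a visibly smaller sum.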

  42. Experimental Results: Model Construction Time

      Algorithm                   ID.Fix (s)   Our Tool (s)
      IIR8                        23.1         20.5
      Sobel (32 × 32)             169.1        9.2
      Sobel (64 × 64)             2173.1       9.7
      Sobel (128 × 128)           -            9.4
      Gaussian blur (32 × 32)     160.1        10.2
      Gaussian blur (64 × 64)     2010.9       9.5
      Gaussian blur (128 × 128)   -            9.4
      Deriche (16 × 16)           -            6.5

  43. Experimental Results: Model Validity

      Algorithm   Simulation (dB)   Our Tool (dB)   Error (%)
      IIR8        -17.80            -17.84          -0.2
      Sobel       11.62             12.04           3.6
      Gauss       3.78              3.78            0.1
      Deriche     -18.01            -18.06          -2.78

  44. Conclusion
      1. Extraction of a compact program representation (generalization of SFGs).
      2. Reformulation of analytical techniques on this representation.
      3. Wider applicability for analytical accuracy analysis.

  45. Open Issues
      • Extension to non-linear, non-time-invariant filters
        • Extensions exist for 1D SFGs
        • Expected to be easily applicable to our model
      • Regular, but non-affine programs
        • Example: FFT
      • Highly correlated inputs
