
An Empirical Characterization of Stream Programs and its Implications for Language and Compiler Design - PowerPoint PPT Presentation

An Empirical Characterization of Stream Programs and its Implications for Language and Compiler Design. Bill Thies (1) and Saman Amarasinghe (2). (1) Microsoft Research India. (2) Massachusetts Institute of Technology.


  1. An Empirical Characterization of Stream Programs and its Implications for Language and Compiler Design. Bill Thies (1) and Saman Amarasinghe (2). (1) Microsoft Research India. (2) Massachusetts Institute of Technology. PACT 2010.

  2. What Does it Take to Evaluate a New Language? [Bar chart of lines of code used to evaluate prior languages, each roughly 0-2,000 LOC: Facile (PLDI'01), Teapot (PLDI'96), UR (PLDI'10), Anne (PLDI'10), NDL (LCTES'04), RASCAL (SCAM'09), AG (LDTA'06), Contessa (FPT'07), StreamIt (PACT'10). Axis: Lines of Code, 0-2,000.]

  3. What Does it Take to Evaluate a New Language? Small studies make it hard to assess: experiences of new users over time, and common patterns across large programs. [Same bar chart as the previous slide.]

  4. What Does it Take to Evaluate a New Language? [Bar chart rescaled to 0-34,000 lines of code: the StreamIt (PACT'10) bar dwarfs the prior studies. Axis: Lines of Code, 0-34K.]

  5. What Does it Take to Evaluate a New Language? [Animation step: same chart as the previous slide.]

  6. What Does it Take to Evaluate a New Language? Our characterization: 65 programs, 34,000 lines of code, written by 22 students over a period of 8 years. This allows: non-trivial benchmarks, a broad picture of the application space, and an understanding of long-term user experience. [Same bar chart, 0-34K lines of code.]

  7. Streaming Application Domain
• For programs based on streams of data
  – Audio, video, DSP, networking, and cryptographic processing kernels
  – Examples: HDTV editing, radar tracking, microphone arrays, cell phone base stations, graphics
• Properties of stream programs
  – Regular and repeating computation
  – Independent filters with explicit communication
[Stream graph: AtoD → FMDemod → duplicate splitter → three bands of LPF and HPF filters → RoundRobin joiner → Adder → Speaker; a StreamIt sketch of this graph follows below.]
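The stream graph on this slide maps directly onto StreamIt's structured constructs. The sketch below is a rough reconstruction of that graph; the filter definitions (AtoD, FMDemod, LowPassFilter, HighPassFilter, Adder, Speaker) are assumed here for illustration, and only the graph shape comes from the slide.

    // Sketch of the slide's FM radio / equalizer graph, assuming hypothetical
    // filters with matching float I/O types; not taken from the benchmark suite.
    void->void pipeline FMRadioEqualizer {
        add AtoD();                      // analog-to-digital source
        add FMDemod();                   // FM demodulation
        add splitjoin {                  // three parallel frequency bands
            split duplicate;             // every band sees the full signal
            for (int band = 0; band < 3; band++) {
                add pipeline {
                    add LowPassFilter(band);
                    add HighPassFilter(band);
                }
            }
            join roundrobin;             // interleave the band outputs
        }
        add Adder(3);                    // sum the three bands
        add Speaker();                   // audio sink
    }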

  8. StreamIt: A Language and Compiler for Stream Programs
• Key idea: design a language that enables static analysis
• Goals:
  1. Improve programmer productivity in the streaming domain
  2. Expose and exploit the parallelism in stream programs
• Project contributions:
  – Language design for streaming [CC'02, CAN'02, PPoPP'05, IJPP'05]
  – Automatic parallelization [ASPLOS'02, Graphics Hardware'05, ASPLOS'06, MIT'10]
  – Domain-specific optimizations [PLDI'03, CASES'05, MM'08]
  – Cache-aware scheduling [LCTES'03, LCTES'05]
  – Extracting streams from legacy code [MICRO'07]
  – User and application studies [PLDI'05, P-PHEC'05, IPDPS'06]

  9. StreamIt Language Basics
• High-level, architecture-independent language
  – Backend support for uniprocessors, multicores (Raw, SMP), and clusters of workstations
• Model of computation: synchronous dataflow [Lee & Messerschmitt, 1987]
  – A program is a graph of independent filters
  – Filters have an atomic execution step with known input / output rates
  – The compiler is responsible for scheduling and buffer management
• Extensions to synchronous dataflow
  – Dynamic I/O rates
  – Support for sliding-window operations
  – Teleport messaging [PPoPP'05]
[Example graph: Input (push 1, executes x10) → Decimate (pop 10, push 1, executes x1) → Output (pop 1, executes x1); a sketch of the Decimate stage follows below.]
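The declared I/O rates are what allow the compiler to compute a static schedule. Below is a minimal sketch of the Decimate stage from the example graph; the pop-10/push-1 rates follow the slide, while the filter body is an assumption made for illustration.

    // Hypothetical Decimate filter: consumes 10 samples, emits 1.
    // The rate declaration (peek 10, pop 10, push 1) is what the static
    // scheduler relies on; this body keeps the first sample of each window.
    float->float filter Decimate {
        work peek 10 pop 10 push 1 {
            push(peek(0));               // keep the first of every 10 samples
            for (int i = 0; i < 10; i++) {
                pop();                   // discard the rest of the window
            }
        }
    }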

  10. Example Filter: Low Pass Filter (stateless)

    float->float filter LowPassFilter(int N, float[N] weights) {
        work peek N push 1 pop 1 {
            float result = 0;
            for (int i = 0; i < weights.length; i++) {
                result += weights[i] * peek(i);
            }
            push(result);
            pop();
        }
    }

Because the weights arrive as a parameter and the work function writes no persistent fields, this is a stateless filter.

  11. Example Filter: Low Pass Filter (stateful)

    float->float filter LowPassFilter(int N) {
        float[N] weights;
        work peek N push 1 pop 1 {
            float result = 0;
            weights = adaptChannel();
            for (int i = 0; i < weights.length; i++) {
                result += weights[i] * peek(i);
            }
            push(result);
            pop();
        }
    }

Because the weights field persists across executions and is updated inside work, this version is a stateful filter.

  12. Structured Streams
• Each structure is single-input, single-output
• Hierarchical and composable: may be any StreamIt language construct
  – filter
  – pipeline
  – splitjoin (splitter ... joiner)
  – feedback loop (joiner ... splitter)
A sketch of the composite constructs follows below.
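To make the constructs concrete, here is a minimal sketch of a pipeline, a splitjoin, and a feedback loop. The component filters A, B, and C are hypothetical float->float, one-in/one-out filters assumed for illustration; only the construct shapes come from the slide.

    // Pipeline: sequential composition of single-input, single-output streams.
    float->float pipeline Chain {
        add A();
        add B();
    }

    // Splitjoin: a splitter fans data out to parallel branches, a joiner merges them.
    float->float splitjoin Fanout {
        split duplicate;                 // copy each item to both branches
        add A();
        add B();
        join roundrobin;                 // interleave the branch outputs
    }

    // Feedback loop: a joiner merges input with a fed-back stream, a splitter
    // routes part of the body's output back around the loop.
    float->float feedbackloop Echo {
        join roundrobin(1, 1);           // alternate external input and feedback
        body C();                        // forward path
        loop A();                        // feedback path
        split roundrobin(1, 1);          // half to output, half back around
        enqueue 0.0;                     // initial item so the loop can start
    }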

  13. StreamIt Benchmark Suite (1/2)
• Realistic applications (30):
  – Serpent encryption
  – MPEG2 encoder / decoder
  – Ground Moving Target Indicator
  – Vocoder
  – RayTracer
  – Mosaic
  – MP3 subset
  – 3GPP physical layer
  – Radar Array Front End
  – Medium Pulse Compression Radar
  – JPEG decoder / transcoder
  – Freq-hopping radio
  – Orthogonal Frequency Division Multiplexer
  – Feature Aided Tracking
  – HDTV
  – Channel Vocoder
  – H264 subset
  – Filterbank
  – Synthetic Aperture Radar
  – Target Detector
  – GSM Decoder
  – FM Radio
  – 802.11a transmitter
  – DToA Converter
  – DES encryption

  14. StreamIt Benchmark Suite (2/2)
• Libraries / kernels (23):
  – Autocorrelation
  – Matrix Multiplication
  – Cholesky
  – Oversampler
  – CRC
  – Rate Convert
  – DCT (1D / 2D, float / int)
  – Time Delay Equalization
  – FFT (4 granularities)
  – Trellis
  – Lattice
  – VectAdd
• Graphics pipelines (4):
  – Reference pipeline
  – Shadow volumes
  – Phong shading
  – Particle system
• Sorting routines (8):
  – Insertion sort
  – Bitonic sort (3 versions)
  – Bubble sort
  – Merge sort
  – Comparison counting
  – Radix sort

  15. 3GPP

  16. 802.11a

  17. Bitonic Sort

  18. Note to online viewers: for high-resolution stream graphs of all benchmarks, please see pp. 173-240 of this thesis: http://groups.csail.mit.edu/commit/papers/09/thies-phd-thesis.pdf

  19. Characterization Overview
• Focus on architecture-independent features
  – Avoid performance artifacts of the StreamIt compiler
  – Estimate execution time statically (not perfect)
• Three categories of inquiry:
  1. Throughput bottlenecks
  2. Scheduling characteristics
  3. Utilization of StreamIt language features

  20. Lessons Learned from the StreamIt Language
• What we did right
• What we did wrong
• Opportunities for doing better

  21. Lesson 1: Expose Task, Data, & Pipeline Parallelism
• Data parallelism: analogous to DOALL loops
• Task parallelism
• Pipeline parallelism
[Diagram: stream graph with splitters and joiners; data parallelism across replicated filters, task parallelism across independent branches, pipeline parallelism along producer-consumer chains.]

  22. Lesson 1: Expose Task, Data, & Pipeline Parallelism (continued)
[Same diagram, annotated: data parallelism applies to stateless filters replicated across a splitjoin; task parallelism spans independent splitjoin branches; pipeline parallelism spans producer-consumer stages.]

  23. Lesson 1: Expose Task, Data, & Pipeline Parallelism (findings)
• 74% of benchmarks contain entirely data-parallel filters
• In the other benchmarks, 5% to 96% (median 71%) of the work is data-parallel
• 82% of benchmarks contain at least one splitjoin
• Median of 8 splitjoins per benchmark
A sketch of data-parallel replication via a splitjoin follows below.
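Because a stateless filter's executions are independent, it can be replicated behind a splitjoin to exploit data parallelism, much as a DOALL loop distributes iterations. The sketch below illustrates the idea with a hypothetical stateless filter Work (float->float, pop 1, push 1); it is not a benchmark from the suite.

    // Data-parallel replication of a stateless filter via a splitjoin.
    // Round-robin split/join preserves the original sequential ordering.
    float->float splitjoin DataParallelWork(int width) {
        split roundrobin(1);             // deal items across the replicas
        for (int i = 0; i < width; i++) {
            add Work();                  // identical stateless workers
        }
        join roundrobin(1);              // reassemble results in order
    }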
