
Satisfying Dataflow Programs Constraints on Multicore Architectures - PowerPoint PPT Presentation

  1. Satisfying Dataflow Programs Constraints on Multicore Architectures
     Citi lab PhD day 2014, 27th March 2014
     Manuel Selva. Supervisor: Lionel Morel. Director: Stéphane Frénot. Bull: Frédéric Soinne.

  2. More and more parallelism
     [Figures: Nehalem multicore die (source: www.intel.com); Apple/ARM dual core (source: www.cultofmac.com); multiprocessor motherboard (source: www.bit-tech.net)]
     How to program?
     • Threads (Java, C + Pthreads)
     • Annotations to sequential code (OpenMP)
     • Dataflow

  8. Why and what is dataflow programming?
     [Figure: (a) H264 decoding graph with actors Parser, Text.Y, Text.U, Text.V, Mot.Y, Mot.U, Mot.V, Merger and Display, with token rates (256, 768) annotated on some channels; (b) LTE-Adv decoding graph]
     • Different kinds of parallelism: task, pipeline, data
     • Actors exchanging data only through FIFO channels
     • Many models, from SDF (amenable to static analyses) to DDF (more expressive)
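
In SDF, each actor produces and consumes a fixed number of tokens per firing, which is what makes static analyses possible. The Python sketch below shows one such analysis: solving the balance equations q[src] · prod = q[dst] · cons for the smallest integer repetition vector q, i.e. how many times each actor fires per graph iteration. The rates used (Parser producing 768 tokens per firing on a channel from which Text.Y consumes 256) are hypothetical, only loosely inspired by the numbers on the figure, and the function name is illustrative.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Smallest positive integer solution of the SDF balance equations.

    edges: list of (src, dst, prod, cons): src produces `prod` tokens per
    firing on a FIFO from which dst consumes `cons` tokens per firing.
    Assumes a connected graph; raises ValueError if the rates are inconsistent.
    """
    q = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate firing ratios along channels until a fixpoint
        changed = False
        for src, dst, prod, cons in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * prod / cons
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * cons / prod
                changed = True
    if len(q) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, prod, cons in edges:  # every channel must balance
        if q[src] * prod != q[dst] * cons:
            raise ValueError(f"inconsistent rates on {src} -> {dst}")
    scale = lcm(*(f.denominator for f in q.values()))
    return {a: int(q[a] * scale) for a in actors}


# Hypothetical rates: one Parser firing feeds three Text.Y firings.
print(repetition_vector(
    ["Parser", "Text.Y", "Mot.Y"],
    [("Parser", "Text.Y", 768, 256), ("Text.Y", "Mot.Y", 256, 256)],
))  # {'Parser': 1, 'Text.Y': 3, 'Mot.Y': 3}
```

An inconsistent graph (one with no bounded-memory periodic schedule) is reported as an error instead of returning a vector.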

  10. How to execute dataflow programs?
      • Compilation to synchronized tasks respecting data dependencies
      • Mapping of tasks and channels to hardware
      [Figure: the DF compiler turns the H264 graph (actors Parser, Text.Y/U/V, Mot.Y/U/V, Merger, Display; channels C1 to C10) into Tasks 1 to 9 (Parser; Text.Y; Text.U; Text.V; Mot.Y; Mot.U; Mot.V; Merger; Display); the DF mapper then places tasks T1 to T9 on Core1 to Core8 of a dual-socket processor and channels C1 to C10 in RAM 1 and RAM 2]
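
The task sequence produced by the DF compiler is a topological order of the graph: every actor appears after the actors it reads from. The Python sketch below reproduces that order with Kahn's algorithm. The channel list is an assumption reconstructed from the figure (Parser feeds each Text actor, each Text actor feeds the matching Mot actor, the three Mot actors feed the Merger, which feeds Display); the names ACTORS, CHANNELS and topological_order are illustrative.

```python
from collections import deque

# Assumed H264 channel topology (C1 to C10), reconstructed from the figure.
ACTORS = ["Parser", "Text.Y", "Text.U", "Text.V",
          "Mot.Y", "Mot.U", "Mot.V", "Merger", "Display"]
CHANNELS = [
    ("Parser", "Text.Y"), ("Parser", "Text.U"), ("Parser", "Text.V"),
    ("Text.Y", "Mot.Y"), ("Text.U", "Mot.U"), ("Text.V", "Mot.V"),
    ("Mot.Y", "Merger"), ("Mot.U", "Merger"), ("Mot.V", "Merger"),
    ("Merger", "Display"),
]


def topological_order(actors, channels):
    """Kahn's algorithm: repeatedly emit an actor whose producers are all done."""
    succ = {a: [] for a in actors}
    indeg = {a: 0 for a in actors}
    for src, dst in channels:
        succ[src].append(dst)
        indeg[dst] += 1
    ready = deque(a for a in actors if indeg[a] == 0)
    order = []
    while ready:
        a = ready.popleft()
        order.append(a)
        for b in succ[a]:
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    if len(order) != len(actors):
        raise ValueError("cyclic graph: this sketch only handles acyclic graphs")
    return order


print("; ".join(topological_order(ACTORS, CHANNELS)))
# Parser; Text.Y; Text.U; Text.V; Mot.Y; Mot.U; Mot.V; Merger; Display
```

This fixed order is one valid single-core schedule; once tasks are spread over several cores, the same dependencies are instead enforced by the tasks blocking on their FIFO channels.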

  11. Proposition Motivations • DF applications with throughput constraints • Mapping satisfying constraints requires: • Actors internal execution time • Concurrent applications • DF actors consumption/production rates Goals Extend DF language Compile time Runtime Monitor app/resources Adapt static choices 5 / 14

  13. Contrib I: Language/compiler extensions
      • Language extensions taken into account in the compilation flow [9, 10]
      [Figure: H264 graph with a 25 frames/s throughput constraint; each actor is annotated with a throughput th1 to th9]
      • Propagate this constraint through the graph in SDF languages
      • Determine the actors' acceptable execution times
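
One way to turn the 25 frames/s constraint into per-actor budgets is sketched below in Python. It assumes that one graph iteration produces one frame, that each actor runs on a core of its own so the pipeline is limited by its slowest stage, and that communication costs are negligible. Under these assumptions an actor firing q times per iteration with execution time t must satisfy q · t ≤ 1/25 s, i.e. t ≤ 1 / (25 · q). The repetition counts are hypothetical and the function name is illustrative.

```python
def acceptable_exec_times(repetitions, frames_per_second):
    """Per-firing execution-time budget (seconds) for each actor.

    Assumes one graph iteration yields one frame and that each actor has a
    core of its own, so throughput is bounded by max(q[a] * t[a]) per iteration.
    """
    period = 1.0 / frames_per_second          # time allowed per iteration
    return {actor: period / q for actor, q in repetitions.items()}


budgets = acceptable_exec_times({"Parser": 1, "Text.Y": 3, "Mot.Y": 3}, 25)
for actor, t in budgets.items():
    print(f"{actor}: {t * 1000:.2f} ms per firing")
# Parser: 40.00 ms per firing
# Text.Y: 13.33 ms per firing
# Mot.Y: 13.33 ms per firing
```

If several actors share a core, their budgets would have to be split further, which is where the mapping step comes back in.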

  15. Contrib II - Fine-grain monitoring
      [Figure: H264 graph with the 25 frames/s constraint, compiled by the DF compiler into Tasks 1 to 9; Task 8 (Merger) is highlighted]
      • Merger instrumented to measure throughput
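
A sketch of what instrumenting the Merger could look like in Python: a hypothetical hook, on_frame(), is called each time the Merger emits a frame, and the monitor estimates the frame rate over a sliding window of recent timestamps so it can be compared with the 25 frames/s constraint. The class name, window size and injectable clock are assumptions made for illustration and testing.

```python
import time
from collections import deque


class ThroughputMonitor:
    """Frame-rate estimate over the last `window` frames emitted by the Merger."""

    def __init__(self, window=50, clock=time.monotonic):
        self.stamps = deque(maxlen=window)   # oldest timestamps drop out automatically
        self.clock = clock

    def on_frame(self, now=None):
        """Hypothetical hook: call once per frame leaving the Merger."""
        self.stamps.append(self.clock() if now is None else now)

    def fps(self):
        if len(self.stamps) < 2:
            return None                      # not enough data yet
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else None

    def meets(self, target_fps):
        rate = self.fps()
        return rate is not None and rate >= target_fps


m = ThroughputMonitor(window=5)
for i in range(10):
    m.on_frame(now=i * 0.04)                 # simulated 40 ms between frames
print(round(m.fps(), 1), m.meets(24))        # 25.0 True
```

A sliding window reacts to slowdowns within a few frames while smoothing out the jitter of individual frames; a larger window trades reactivity for stability.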

  16. Contrib II - Fine-grain monitoring
      [Figure: H264 graph with the 25 frames/s constraint, compiled by the DF compiler into Tasks 1 to 9; every task is highlighted]
      • Tasks instrumented to measure the actors' execution times
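
Per-actor execution times can be collected by wrapping each actor's firing function with a timer. The Python sketch below uses a hypothetical decorator, timed_actor, and a placeholder Text.Y body; time.perf_counter() is used because it is a monotonic, high-resolution clock. In the setting of the slide, such probes would presumably be inserted by the DF compiler around each task rather than written by hand.

```python
import time
from collections import defaultdict
from functools import wraps

exec_times = defaultdict(list)   # actor name -> durations of its firings (s)


def timed_actor(name):
    """Record the wall-clock duration of every firing of the decorated actor."""
    def decorate(fire):
        @wraps(fire)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fire(*args, **kwargs)
            finally:  # record the duration even if the firing raises
                exec_times[name].append(time.perf_counter() - start)
        return wrapper
    return decorate


@timed_actor("Text.Y")
def text_y(block):
    return [2 * x for x in block]    # placeholder for the real texture decoding


for _ in range(3):
    text_y(list(range(1000)))
samples = exec_times["Text.Y"]
print(len(samples), f"mean {sum(samples) / len(samples) * 1e6:.1f} us")
```

Comparing these measurements with the per-actor budgets derived from the throughput constraint tells which actors are too slow.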

  18. Contrib II - Fine-grain monitoring
      [Figure: tasks T1 to T9 mapped on Core1 to Core8 of the dual-socket processor, channels C1 to C10 in RAM 1 and RAM 2]
      • Memory monitoring using the PMU: RAM controllers' load, QPI traffic
      Conclusions:
      • Are we facing core load imbalance?
      • Are the actors too slow because of memory latencies?
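
The two questions can be turned into a simple decision rule over the monitored data. In the Python sketch below, per-core busy fractions would come from the task timings, and local versus remote memory traffic from PMU counters (RAM controller and QPI events); how those counters are read is out of scope, the share of remote traffic is used only as a rough proxy for memory latency, and both thresholds are illustrative assumptions rather than values from this work.

```python
def diagnose(core_busy, local_bytes, remote_bytes,
             imbalance_ratio=1.5, remote_share=0.3):
    """Return which of the slide's two problems the measurements point to.

    core_busy:    busy fraction of each core over the last interval (0..1)
    local_bytes:  memory traffic served by the local RAM controller
    remote_bytes: memory traffic crossing the inter-socket link (QPI)
    Thresholds are illustrative assumptions.
    """
    findings = []
    mean = sum(core_busy) / len(core_busy)
    if mean > 0 and max(core_busy) / mean > imbalance_ratio:
        findings.append("core load imbalance")
    total = local_bytes + remote_bytes
    if total > 0 and remote_bytes / total > remote_share:
        findings.append("memory latency (remote accesses)")
    return findings


print(diagnose([0.9, 0.2, 0.2, 0.2], local_bytes=90, remote_bytes=10))
print(diagnose([0.5, 0.5, 0.5, 0.5], local_bytes=60, remote_bytes=40))
```

Each finding then selects an adaptation: CPU load balancing for the first, memory load balancing for the second.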

  20. Contrib III - Dataflow adaptations
      • CPU load balancing
      [Figure: the task/channel mapping of slide 10, except that task T9 now shares Core1 with T1 instead of Core8 with T8]
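
Given measured per-task loads, CPU load balancing amounts to re-mapping tasks so that the busiest core carries as little work as possible. The Python sketch below uses the classic longest-processing-time-first (LPT) greedy heuristic as a stand-in; the adaptation used in this work may differ, and the task loads are hypothetical.

```python
import heapq


def lpt_mapping(task_load, n_cores):
    """Assign tasks, heaviest first, to the currently least-loaded core."""
    heap = [(0.0, core) for core in range(n_cores)]   # (load so far, core id)
    heapq.heapify(heap)
    mapping = {}
    for task, load in sorted(task_load.items(), key=lambda kv: (-kv[1], kv[0])):
        core_load, core = heapq.heappop(heap)
        mapping[task] = core
        heapq.heappush(heap, (core_load + load, core))
    return mapping


def core_loads(mapping, task_load, n_cores):
    """Total load placed on each core by a mapping."""
    loads = [0.0] * n_cores
    for task, core in mapping.items():
        loads[core] += task_load[task]
    return loads


# Hypothetical busy fractions for 4 tasks on 2 cores.
loads = {"T1": 0.3, "T2": 0.3, "T8": 0.2, "T9": 0.2}
mapping = lpt_mapping(loads, 2)
print(mapping, core_loads(mapping, loads, 2))
```

LPT is cheap enough to rerun whenever monitoring reports an imbalance; a runtime would additionally weigh the cost of migrating a task against the expected gain.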

  21. Contrib III - Dataflow adaptations
      • CPU load balancing
      • Memory load balancing
      [Figure: the rebalanced task mapping, with the memory placement of channel C9 changed between RAM 1 and RAM 2]
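
Memory load balancing can be pictured as choosing, for every FIFO channel, which memory node (RAM 1 or RAM 2) hosts its buffer. The Python sketch below is a hypothetical greedy policy, not the one from this work: each channel stays on the node local to its consumer task unless that node would exceed `slack` times its fair share of the total traffic, in which case it spills to the least-loaded node. Channel traffic and node assignments are hypothetical.

```python
def balance_channels(traffic, consumer_node, n_nodes=2, slack=1.5):
    """Greedy channel placement trading NUMA locality against controller load.

    traffic:       channel -> traffic through its buffer (hypothetical units)
    consumer_node: channel -> memory node local to the consuming task
    """
    cap = slack * sum(traffic.values()) / n_nodes   # per-node traffic ceiling
    load = [0] * n_nodes
    placement = {}
    for ch, t in sorted(traffic.items(), key=lambda kv: (-kv[1], kv[0])):
        node = consumer_node[ch]                    # prefer the local node
        if load[node] + t > cap:                    # too busy: spill over (best effort)
            node = min(range(n_nodes), key=lambda n: load[n])
        placement[ch] = node
        load[node] += t
    return placement, load


placement, load = balance_channels(
    {"C7": 4, "C8": 3, "C9": 2, "C10": 1},
    {"C7": 0, "C8": 0, "C9": 0, "C10": 0},
)
print(placement, load)
```

Placing the heaviest channels first keeps the most traffic local; the spilled channels pay a remote-access cost in exchange for relieving the overloaded RAM controller.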

  22. Conclusion
      Dynamic framework for DF programs:
      • Applicative monitoring
      • Hardware monitoring
      • Runtime adaptations taking advantage of DF information
      State of the art:
      • DF compilation [7, 13, 4]
      • DF theoretical throughput analysis [3, 11]
      • DF adaptation [12, 8, 5, 1]
      • Non-DF NUMA adaptations [6, 2]
      Current work:
      • Finishing the implementation in StreamIt

  23. [1] Choi, Y., Li, C.-H., Silva, D. D., Bivens, A., and Schenfeld, E. Adaptive task duplication using on-line bottleneck detection for streaming applications. In Proceedings of the 9th Conference on Computing Frontiers (New York, NY, USA, 2012), CF '12, ACM, pp. 163–172.
      [2] Dashti, M., Fedorova, A., Funston, J., Gaud, F., Lachaize, R., Lepers, B., Quema, V., and Roth, M. Traffic management: A holistic approach to memory placement on NUMA systems. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA, 2013), ASPLOS '13, ACM, pp. 381–394.

  24. [3] Ghamarian, A.-H., Geilen, M. C. W., Stuijk, S., Basten, T., Moonen, A. J. M., Bekooij, M., Theelen, B., and Mousavi, M. Throughput analysis of synchronous data flow graphs. In Application of Concurrency to System Design, 2006. ACSD 2006. Sixth International Conference on (2006), pp. 25–36.
      [4] Gordon, M. I. Compiler Techniques for Scalable Performance of Stream Programs. PhD thesis, MIT, 2010.
      [5] Hormati, A. H., Choi, Y., Kudlur, M., Rabbah, R., Mudge, T., and Mahlke, S. Flextream: Adaptive compilation of streaming applications for heterogeneous architectures.
