A Power-Aware Online Scheduling Algorithm for Streaming Applications in Embedded MPSoC


  1. PATMOS 2010: 7-10 September 2010, Grenoble, France
     A Power-Aware Online Scheduling Algorithm for Streaming Applications in Embedded MPSoC
     T. Sassolas, N. Ventroux, G. Blanc
     CEA LIST, Embedded Computing Laboratory
     Contact: tanguy.sassolas@cea.fr

  2. Table of contents • Context • Previous works • Proposed solution • Implementation • Results • Conclusion

  4. Context of the Study
     • Embedded systems must support:
       - various application domains
       - more computation-intensive applications
       - applications that are becoming more and more dynamic
     • Move to multiprocessor architectures
     [Figure: computing requirements, from 0.1 GOPS to 1 TOPS, of telecom and multimedia applications: GSM, GPRS, EDGE, UMTS, WiMAX, 3GPP-LTE, SDR, DVB-S2, HD audio, MPEG2, H264, digital TV, OpenGL 1.1, OpenGL 2.0, 3D graphics, mobile multimedia]

  5. Multiprocessor issues
     • Need to maximize resource usage
       - Increase task parallelism
       - Streaming applications: a set of tasks with data dependencies (in → T1 → T2 → T3 → out)
       - Scheduling of dependent tasks
       - Execution speed is determined by the slowest task
     • Need to reduce power consumption
       - Real-case execution differs from worst-case execution
       - Dynamism implies a loss of energy
       - Some energy savings can be made
       - Dynamic control is needed
     [Figure: timeline of T1, T2 and T3 pipelined on P0, P1 and P2 over successive data D1 to D5; T3 takes two time slots per data item, leaving slack time on the other processors]
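The remark that execution speed is set by the slowest task can be made concrete with a minimal sketch. It assumes one PE per pipeline stage and fixed per-item execution times; the stage times below are hypothetical, chosen so that T3 takes twice as long as T1. The steady-state period equals the largest stage time, and every other stage is left with the difference as slack.

```python
def pipeline_slack(stage_times):
    """Return (period, slack per stage) for a pipeline with one PE per stage.

    stage_times: per-data-item execution time of each stage (hypothetical units).
    In steady state one item leaves the pipeline per period, and the period
    is set by the slowest stage.
    """
    period = max(stage_times)
    slack = [period - t for t in stage_times]
    return period, slack


# Hypothetical times for T1, T2, T3: T3 is twice as slow as T1.
period, slack = pipeline_slack([1, 1.5, 2])
print(period, slack)  # 2 [1, 0.5, 0]
```

This slack is exactly the margin that DVFS or DPM can exploit on the faster stages.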

  6. DVFS vs. DPM
     • Dynamic Voltage and Frequency Scaling (DVFS)
       - Low mode-switching penalty
       - Mainly reduces dynamic power consumption
     • Dynamic Power Management (DPM)
       - High energy and time switching penalties
       - Reduces both static and dynamic power consumption
     • Optimal operating points are highly dependent on the technology process
     [Figure: power over time for tasks T1 and T2 under DVFS and under DPM]
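The high switching penalty of DPM means a shutdown only pays off for long enough idle intervals. The sketch below applies the standard break-even reasoning; the function names and every parameter value are hypothetical, and the switching energy is assumed to be a single round-trip cost.

```python
def dpm_break_even_us(p_idle_mw, p_sleep_mw, e_switch_nj, t_switch_us):
    """Shortest idle interval (us) for which shutting down saves energy.

    Staying idle for T costs p_idle * T. Sleeping costs the round-trip
    switching energy e_switch (spent over t_switch) plus p_sleep for the
    rest of the interval. Equating both gives the break-even time; the
    interval must also be long enough for the round trip itself.
    Units: mW * us = nJ.
    """
    if p_idle_mw <= p_sleep_mw:
        raise ValueError("sleep mode must draw less power than idle mode")
    t_be = (e_switch_nj - p_sleep_mw * t_switch_us) / (p_idle_mw - p_sleep_mw)
    return max(t_be, t_switch_us)


def worth_sleeping(idle_us, **params):
    """True if an idle gap of idle_us is long enough to justify DPM."""
    return idle_us > dpm_break_even_us(**params)


# Hypothetical PE parameters.
params = dict(p_idle_mw=200, p_sleep_mw=10, e_switch_nj=5000, t_switch_us=20)
print(round(dpm_break_even_us(**params), 1))                       # 25.3
print(worth_sleeping(10, **params), worth_sleeping(100, **params))  # False True
```

With these numbers a 10 µs gap is better spent idle, whereas a 100 µs gap justifies the shutdown; DVFS, with its low switching penalty, does not face such a threshold to the same extent.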

  8. Streaming scheduling: offline solutions
     • Scheduling on a multiprocessor is an NP-complete problem [1]
     • Adding power optimization adds complexity
     • Monoprocessor solutions [2] [3] …
       - Find the minimum power consumption given the data production rate and communication buffer sizes
       - With DPM or DVFS functionalities
       - Variable production rate following a probability law
     • Multiprocessor solutions
       - Minimize energy consumption by finding the optimum number of resources and their speed to meet QoS requirements [4]
       - Various models: communication costs, consumption model, optimization techniques… [5]
     • But a regular workload was assumed: application dynamism calls for online solutions

  9. Streaming scheduling: online solutions
     • Monoprocessor
       - Slack time reclamation: GSR [6]
       - Offline and online partitioning
     • Multiprocessor
       - Many solutions exist for independent tasks, but they do not apply
       - Partitioning: apply a monoprocessor solution to every processor [7] [8]
       - Resulting execution: no slack time!
     [Figure: schedules of T1, T2 and T3 on a single processor P0; a task graph from in to out through T0 to T4, partitioned onto P0 and P1 with an added buffer]
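GSR [6] is only named on the slide. As a hedged illustration of the general idea of slack reclamation on one processor, the sketch below hands the unused budget of each task on to the next one, assuming continuous, normalized speeds. It is not claimed to be the exact algorithm of [6], and the function and parameter names (including the `min_speed` floor) are hypothetical.

```python
def greedy_slack_reclamation(wcet, actual, min_speed=0.25):
    """Run tasks in order, passing each task's unused budget to the next one.

    wcet:      worst-case execution time of each task at full speed (1.0)
    actual:    actual execution time of each task at full speed
    min_speed: hypothetical lowest normalized speed the processor supports
    Each task gets its own WCET budget plus the slack left by its
    predecessors and runs at the slowest speed that still fits its worst
    case into that budget. Returns (speeds, total elapsed time).
    """
    slack = 0.0
    elapsed = 0.0
    speeds = []
    for w, a in zip(wcet, actual):
        budget = w + slack
        # Normalized speed, clamped to [min_speed, 1.0].
        speed = min(1.0, max(w / budget, min_speed))
        run_time = a / speed
        speeds.append(speed)
        elapsed += run_time
        slack = budget - run_time  # unused time flows to the next task
    return speeds, elapsed


# T1 finishes early; T2 reclaims its slack by running at 2/3 speed.
speeds, elapsed = greedy_slack_reclamation([4, 4, 4], [2, 4, 4])
print([round(s, 2) for s in speeds], round(elapsed, 2))  # [1.0, 0.67, 1.0] 12.0
```

Because each task only receives what its predecessors actually saved, the total elapsed time stays within the sum of the worst-case execution times as long as no task exceeds its WCET.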

  11. Power-aware streaming application scheduling
     • Properties
       - Throughput is constrained by the slowest task
         » Other tasks can be slowed down to reach the same throughput → DVFS
       - Tasks deeper in the pipeline can be blocked waiting for available data
         » Preemption mechanisms are required for a higher resource usage rate
         » Unused resources can be shut down → DPM
     • Objective: keep the throughput while making substantial energy savings
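A minimal sketch of the DVFS half of this property follows. It assumes each task's work per data item is known in cycles and that the PE offers a small set of discrete frequencies; both the cycle counts and the frequency levels are hypothetical. Every task picks the lowest frequency that still fits its work into the period set by the slowest task.

```python
def slowest_sufficient_frequency(cycles, period_us, freqs_mhz):
    """Lowest frequency that still finishes `cycles` within the period.

    cycles:    work per data item for this task (CPU cycles)
    period_us: pipeline period imposed by the slowest task (microseconds)
    freqs_mhz: available DVFS operating points (MHz); 1 MHz = 1 cycle/us
    """
    for f in sorted(freqs_mhz):
        if cycles / f <= period_us:
            return f
    return max(freqs_mhz)  # cannot keep up: run as fast as possible


# Hypothetical pipeline: the bottleneck task needs 2080 cycles at 520 MHz.
freqs = [104, 208, 312, 416, 520]
period = 2080 / 520  # 4.0 us per data item
print(slowest_sufficient_frequency(800, period, freqs))   # 208
print(slowest_sufficient_frequency(2080, period, freqs))  # 520
```

Falling back to the highest frequency when no level is fast enough is a choice of this sketch: a task that cannot meet the period should at least not slow the pipeline further.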

  12. Static priorities
     • If the number of PEs is lower than the number of tasks, static priorities must be specified
       - They describe the position in the pipeline
       - They allow the oldest data to be processed first
       - They prevent buffering data instead of executing critical tasks
     [Figure: task graph from in to out through T0 to T4, annotated with Prio = 0 to Prio = 3 along the pipeline]
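One way to compute a pipeline position is a topological traversal of the task graph, sketched here under the assumption that position means the longest path from the application input. The five-task graph below is hypothetical (the edges of the slide's figure are not recoverable), though it yields the same four levels, Prio = 0 to Prio = 3, for tasks T0 to T4.

```python
from graphlib import TopologicalSorter


def pipeline_positions(predecessors):
    """Static priority = position in the pipeline (longest path from the input).

    predecessors: dict mapping each task to the set of tasks it reads data from.
    Tasks with no predecessors (fed directly by the input) get position 0.
    """
    position = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(predecessors).static_order():
        preds = predecessors.get(task, ())
        position[task] = 1 + max((position[p] for p in preds), default=-1)
    return position


# Hypothetical graph: T0 feeds T1 and T2, which both feed T3, which feeds T4.
graph = {"T1": {"T0"}, "T2": {"T0"}, "T3": {"T1", "T2"}, "T4": {"T3"}}
print(pipeline_positions(graph))
```

Since the positions are computed once from the graph, they cost nothing at run time, which suits a light online scheduler.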

  13. Buffer monitors
     • Buffer emptying
       - Emptying threshold: increase the DVFS couple of the writer (QoS change)
       - Buffer-empty threshold: preempt the reader(s)
     • Buffer filling
       - Filling threshold: reduce the DVFS couple of the writer (QoS change)
       - Buffer-full threshold: preempt the writer
     • Priority impact
       - Task is blocked
       - Task executes at fastest speed
       - Application priority
       - Task priority
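The threshold logic of the buffer monitors can be sketched as a small decision function. The threshold values, the action names and the choice of treating "empty" as a fill level of zero are assumptions for illustration, not values from the talk.

```python
def buffer_monitor(fill, capacity, low=0.25, high=0.75):
    """Map a buffer's fill level to a scheduler demand.

    fill, capacity: current and maximum number of items in the buffer
    low, high:      hypothetical emptying/filling thresholds (fractions of capacity)
    Returns an action name, or None when no change is needed.
    """
    if fill == 0:
        return "preempt_readers"    # empty: readers cannot make progress
    if fill >= capacity:
        return "preempt_writer"     # full: the writer cannot make progress
    if fill <= low * capacity:
        return "raise_writer_dvfs"  # emptying: speed the writer up
    if fill >= high * capacity:
        return "lower_writer_dvfs"  # filling: slow the writer down
    return None                     # between thresholds: keep current modes


for level in (0, 2, 4, 6, 8):
    print(level, buffer_monitor(level, 8))
```

Checking the hard limits (empty, full) before the soft thresholds ensures that a blocked task is preempted rather than merely sped up or slowed down.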

  15. Consumption model
     • SESAM [9] simulation environment
       - SystemC
       - AT-TLM
       - IPs: NoC, caches, memories…
       - Processors: ArchC ISS [10]
       - Statistics
     • Modified ArchC models
       - MIPS32 ISS annotated with the PXA270 [11] power state machine (PSM)
       - Mode power consumption
       - Execution speed variation
       - Mode-switching penalties
         » Energy
         » Time
     [Figure: PXA270 power state machine with three modes: Turbo (A=1, B=1, 923 mW), Half-Turbo (A=1, B=2, 390 mW) and Deep Idle (A=0, B=1, 64 mW); transitions labeled with switching times from 1 µs to 3 µs]
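A toy energy calculation with the three mode powers quoted on this slide shows why slowing a task down can beat racing to idle. It assumes, hypothetically, that Half-Turbo runs at exactly half the Turbo speed, and it ignores the mode-switching penalties that the actual model accounts for; the function name and workload are illustrative.

```python
# Power of the PXA270 modes quoted on the slide (mW).
POWER_MW = {"turbo": 923, "half_turbo": 390, "deep_idle": 64}
# Assumption: Half-Turbo executes at half the Turbo speed.
SLOWDOWN = {"turbo": 1, "half_turbo": 2}


def period_energy_nj(work_us, period_us, run_mode):
    """Energy (nJ) to process `work_us` of Turbo-speed work within one period.

    The PE runs in `run_mode`, then spends the rest of the period in Deep Idle.
    Mode-switching energy and time are ignored. Units: mW * us = nJ.
    """
    busy = work_us * SLOWDOWN[run_mode]
    if busy > period_us:
        raise ValueError("the work does not fit in the period at this speed")
    idle = period_us - busy
    return POWER_MW[run_mode] * busy + POWER_MW["deep_idle"] * idle


print(period_energy_nj(10, 20, "turbo"))       # 9870
print(period_energy_nj(10, 20, "half_turbo"))  # 7800
```

In this toy setting, running 10 µs of Turbo work in Half-Turbo over a 20 µs period uses about 21% less energy than running in Turbo and then dropping to Deep Idle.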

  16. Implemented platform
     [Figure: a streaming application (in → T1 → T2) mapped onto processing elements that exchange data D1 through shared memory banks; when a buffer threshold is reached, the controller CPU, with its central memory, runs the scheduling algorithm]

  17. The scheduling loop
     • Inputs: buffer statuses and task statuses
       - Update dynamic task priorities
       - Order tasks by priority
       - Keep already allocated tasks on the same PE
       - Allocate remaining tasks on remaining PEs
       - Update PE consumption modes according to buffer status
     • Outputs: execution/preemption demands and mode-switching demands
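A single pass of the loop can be sketched as follows. The `Task` fields, the convention that a higher priority value runs first, and the demand tuples are hypothetical; DVFS updates driven by the buffer monitors are left out, and idle PEs simply receive a sleep (DPM) demand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int             # dynamic priority; higher runs first (assumption)
    ready: bool               # input data available and output space free
    pe: Optional[int] = None  # PE currently hosting the task, if any


def schedule_step(tasks, n_pes):
    """One pass: order tasks, keep placements, allocate the rest, idle spare PEs."""
    ready = sorted((t for t in tasks if t.ready), key=lambda t: t.priority, reverse=True)
    elected = ready[:n_pes]
    elected_names = {t.name for t in elected}
    demands = []
    # Preempt tasks that hold a PE but were not elected in this pass.
    for t in tasks:
        if t.pe is not None and t.name not in elected_names:
            demands.append(("preempt", t.name, t.pe))
            t.pe = None
    # Elected tasks that already run keep their PE (no migration).
    used = {t.pe for t in elected if t.pe is not None}
    free = [p for p in range(n_pes) if p not in used]
    # Allocate the remaining elected tasks on the remaining PEs.
    for t in elected:
        if t.pe is None:
            t.pe = free.pop(0)
            demands.append(("execute", t.name, t.pe))
    # PEs left without work can be switched to a low-power mode (DPM).
    for p in free:
        demands.append(("sleep", None, p))
    return demands


tasks = [Task("A", 3, True, 0), Task("B", 2, True), Task("C", 1, True, 1), Task("D", 5, False)]
print(schedule_step(tasks, 2))  # [('preempt', 'C', 1), ('execute', 'B', 1)]
```

Keeping elected tasks on their current PE avoids needless migrations, and preemption is issued only for tasks that hold a PE but were not elected in this pass.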

  19. The WCDMA test case
     • Wideband code division multiple access (WCDMA) application [12]
       - 13 tasks
       - Variable workload: a pilot frame once every 10 frames
       - Irregular pipeline task lengths

  20. Results – Energy saving
     • 3 scheduling solutions: standard, DPM only, and DVFS + DPM
     • Substantial energy savings
     [Figure: PE effective occupancy (%) under standard, DPM-only and DPM + DVFS scheduling, and energy saving (%) of DPM only and DPM + DVFS, versus the number of PEs (1, 2, 4, 8, 13, 16)]

  21. Results – Execution time
     • No deviation in execution time

  22. Results – Pipeline balancing
     • Blocked states are reduced by the use of DVFS
     • More could be achieved with other DVFS couples

  24. Conclusion
     • Power reduction for variable pipelines
       - Substantial power savings when PE load drops: 45% on 13 processors
       - No performance loss
       - Lightweight execution to reduce control overhead
     • This work was partly funded by the SCALOPES project (ARTEMIS)
     • Upcoming work
       - Implementation on a hardware multiprocessor platform
       - Evaluation with other applications from various domains
       - Evaluation of optimal buffer sizes

  25. References
     [1] M. L. Dertouzos and A. K. Mok. Multiprocessor Online Scheduling of Hard-Real-Time Tasks. IEEE Transactions on Software Engineering, 15(12):1497-1506, 1989.
     [2] Y.-H. Lu, L. Benini, and G. De Micheli. Dynamic Frequency Scaling with Buffer Insertion for Mixed Workloads. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(5):1284-1305, 2002.
     [3] N. Pettis, L. Cai, and Y.-H. Lu. Statistically Optimal Dynamic Power Management for Streaming Data. IEEE Transactions on Computers, 55(7):800-814, 2006.
     [4] R. Xu, R. Melhem, and D. Mosse. Energy-Aware Scheduling for Streaming Applications on Chip Multiprocessors. In Proceedings of the 28th IEEE International Real-Time Systems Symposium (RTSS), pages 25-38, 2007.
     [5] L. Benini, D. Bertozzi, A. Guerri, and M. Milano. Allocation, Scheduling and Voltage Scaling on Energy Aware MPSoCs. In Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CPAIOR), pages 44-58, 2006.
     [6] D. Mosse, H. Aydin, B. Childers, and R. Melhem. Compiler-Assisted Dynamic Power-Aware Scheduling for Real-Time Applications. In Workshop on Compilers and Operating Systems for Low Power, 2000.

  26. References (continued)
     [7] P. Choudhury, P. P. Chakrabarti, and R. Kumar. Online Dynamic Voltage Scaling using Task Graph Mapping Analysis for Multiprocessors. In International Conference on VLSI Design (VLSID), pages 89-94, 2007.
     [8] S. Hua, G. Qu, and S. S. Bhattacharyya. Energy-Efficient Embedded Software Implementation on Multiprocessor System-on-Chip with Multiple Voltages. ACM Transactions on Embedded Computing Systems (TECS), 5(2):321-341, 2006.
     [9] N. Ventroux, A. Guerre, T. Sassolas, L. Moutaoukil, C. Bechara, and R. David. SESAM: an MPSoC Simulation Environment for Dynamic Application Processing. In IEEE International Conference on Embedded Software and Systems (ICESS), 2010.
     [10] R. Azevedo, S. Rigo, M. Bartholomeu, G. Araujo, C. Araujo, and E. Barros. The ArchC Architecture Description Language and Tools. International Journal of Parallel Programming, 33(5):453-484, 2005.
     [11] Intel PXA27x Processor Family: Electrical, Mechanical, and Thermal Specification, 2005.
     [12] A. Richardson. WCDMA Design Handbook, 2006.

  27. Thank you for your attention. We value your opinions and questions.
