Execution Time Prediction for Energy-Efficient Hardware Accelerators - PowerPoint PPT Presentation


  1. Execution Time Prediction for Energy-Efficient Hardware Accelerators
     Tao Chen, Alex Rucker, and G. Edward Suh
     Computer Systems Laboratory, Cornell University

  2. Accelerators in Interactive Computing Systems
     • Interactive systems have response time requirements and often use hardware accelerators
     • Observation: Finishing earlier than the requirement is usually not needed
     • Goal: Perform DVFS for hardware accelerators to save energy while meeting response time requirements

  3. DVFS for Interactive Computing Systems
     • Save energy by running slower (lower frequency/voltage)
     [Figure: timelines of Job 0 and Job 1, each with its own deadline; predicting each job's execution time and setting the DVFS level stretches each job to finish just before its deadline]
     • Requirement: correctly predict each job's execution time (a sketch of the level-selection step follows below)
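
A minimal sketch of the level-selection step described above: given a predicted cycle count and the job's deadline, choose the slowest operating point that still meets the deadline. The level table, the safety margin, and the function name are illustrative assumptions, not values from the paper.

    # Illustrative sketch (not the paper's controller): pick the slowest DVFS
    # level whose runtime still fits under the deadline for the predicted job.
    DVFS_LEVELS = [            # (frequency in MHz, voltage in V), slowest first
        (200, 0.70),
        (400, 0.85),
        (600, 1.00),
        (800, 1.10),
    ]

    def pick_dvfs_level(predicted_cycles, deadline_s, margin=0.05):
        """Return the slowest (freq, volt) pair whose runtime fits the deadline."""
        budget = deadline_s * (1.0 - margin)     # keep a small safety margin
        for freq_mhz, volt in DVFS_LEVELS:
            runtime = predicted_cycles / (freq_mhz * 1e6)
            if runtime <= budget:
                return freq_mhz, volt
        return DVFS_LEVELS[-1]                   # fall back to the fastest level

    # Example: a job predicted to take 8e6 cycles with a 16.7 ms deadline
    print(pick_dvfs_level(8e6, 16.7e-3))         # -> (600, 1.0)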

  4. Opportunity and Challenge
     [Figure: distribution of per-job execution times for a video decoding accelerator, with the deadline marked]
     • Opportunity: Most jobs finish earlier than the deadline
     • Challenge: Irregular variations in job execution time

  5. Conventional DVFS Controllers
     • History-based execution time prediction
     • Example: PID controller (an illustrative sketch follows below)
     • Problem with history-based prediction: it is reactive, so decisions lag behind changes in job behavior
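
To make the baseline concrete, here is a minimal sketch of a history-based PID controller in the spirit of the one the slide mentions. The gains and the control formulation are assumptions for illustration; the paper's exact controller may differ. Because it only reacts to slack observed on completed jobs, it lags when job behavior changes abruptly.

    # Hypothetical history-based (reactive) DVFS controller: adjusts frequency
    # based on how much slack the previous job left before its deadline.
    class PidDvfsController:
        def __init__(self, kp=0.6, ki=0.1, kd=0.1):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = 0.0

        def next_frequency(self, last_runtime_s, deadline_s, current_freq_hz):
            # Error > 0 means the last job finished with slack (ran too fast).
            error = (deadline_s - last_runtime_s) / deadline_s
            self.integral += error
            derivative = error - self.prev_error
            self.prev_error = error
            correction = self.kp * error + self.ki * self.integral + self.kd * derivative
            # Positive correction -> slow down; negative -> speed back up.
            return current_freq_hz * (1.0 - correction)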

  6. Predictive DVFS Framework for Accelerators
     • Approach: Build predictor hardware for each accelerator that uses job input data to predict execution time
     • Design time: Build the predictor and train the prediction model
       • Identify features related to execution time
       • Generate a hardware slice that can calculate the features quickly
       • Train a prediction model that maps features to execution time
     • Run time: Run the predictor to inform DVFS decisions (a glue-code sketch follows below)
     [Flow: Job Input → Hardware Slice → Job Features → Execution Time Model → Job Execution Time → DVFS Model → DVFS Level]
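
A hypothetical sketch of the run-time flow above. The hardware slice and the trained linear model are design-time artifacts; here they are stand-ins (a Python callable and a coefficient list) so the per-job control flow is visible. All names are invented for illustration.

    def run_job(job_input, slice_fn, coeffs, deadline_s, pick_level):
        features = slice_fn(job_input)                  # fast pass by the hardware slice
        predicted_cycles = sum(c * f for c, f in zip(coeffs, features))  # linear model
        freq_mhz, volt = pick_level(predicted_cycles, deadline_s)
        # Platform-specific: program the accelerator's voltage/frequency here,
        # then launch the accelerator on job_input at the chosen operating point.
        return freq_mhz, volt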

  7. Features to Capture Execution Time Variation
     • Source of variation: input-dependent control decisions in the accelerator's FSM
     [Figure: FSM with initial state S1; state traces over time:
       Job 1: S1 S2 S4 S1 S2 S4 S1 S2 S3
       Job 2: S1 S2 S4 S1 S3 S4 S1 S4]
     • Feature: State Transition Count (a counting sketch follows below)
       STC = [st_{1,2}, st_{1,3}, st_{2,4}, st_{3,4}, st_{4,1}]
       Job 1: [2, 0, 2, 0, 2]    Job 2: [1, 1, 1, 1, 2]
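
A short sketch of the state-transition-count feature: count how many times each FSM transition is taken during a job, then read the counts out in a fixed order as a feature vector. The state names and the Job 2 trace are taken from the slide's example; the transition order is the one listed above.

    from collections import Counter

    def transition_counts(state_trace):
        """Map each (src, dst) state pair to how often that transition occurs."""
        return Counter(zip(state_trace, state_trace[1:]))

    job2 = ["S1", "S2", "S4", "S1", "S3", "S4", "S1", "S4"]
    counts = transition_counts(job2)

    # Feature vector in the fixed transition order used on the slide
    order = [("S1", "S2"), ("S1", "S3"), ("S2", "S4"), ("S3", "S4"), ("S4", "S1")]
    stc = [counts[t] for t in order]
    print(stc)   # -> [1, 1, 1, 1, 2]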

  8. Features to Capture Execution Time Variation (continued)
     • Source of variation: variable state latency (a state's duration is set by a counter)
     [Figure: FSM (initial state S1) coupled to a counter via init/done signals;
       Job 3 state trace: S1 S3 S4 S1 S3 S4 S1
       Counter values:    4 3 2 1 0   2 1 0]
     • Feature: Counter Average Initial Value (a short sketch follows below)
       AIV = [aiv_c], where aiv_c is the average of the values counter c is initialized with during a job
       Job 3: aiv_c = (4 + 2) / 2 = 3
     • Other counter features in the paper
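
A short sketch of the counter-average-initial-value feature: whenever the counter is (re)initialized during a job, record the loaded value and average the records when the job ends. The init values below mirror the slide's Job 3 example; the function name is illustrative.

    def average_initial_value(init_values):
        """Average of the values a hardware counter was loaded with during one job."""
        return sum(init_values) / len(init_values) if init_values else 0.0

    job3_counter_inits = [4, 2]                         # counter loaded with 4, then with 2
    print(average_initial_value(job3_counter_inits))    # -> 3.0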

  9. Identifying and Extracting Features
     • Automated flow based on RTL analysis
       • Identify FSM and counter features in the RTL
       • Instrument the RTL to extract the features
     • More details in the paper
     [Flow: Job Input → Hardware Slice → Job Features → Execution Time Model → Job Execution Time → DVFS Model → DVFS Level]

  10. Hardware Slicing
     • Need to obtain features before running the accelerator
     • Create a minimal version of the accelerator via program slicing on the accelerator RTL code
     [Figure: the hardware slice keeps the control unit of the accelerator logic and drops the datapath]
     • Optimize the hardware slice to run fast: it only reproduces FSM and counter behavior, not the datapath computation (a software analogy follows below)
     [Figure: FSM/counter timeline for the slice, matching the Job 3 trace on slide 8]
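
A conceptual software analogy of hardware slicing (this is not RTL slicing itself): the slice keeps only the control decisions that determine the features and drops the datapath work, so it can finish far sooner than the accelerator. The job format and the transform below are invented for illustration.

    def expensive_transform(block):
        return [x * x for x in block]          # stands in for the datapath computation

    def full_accelerator(job):
        outputs = []
        for block in job:                      # control + datapath
            outputs.append(expensive_transform(block))
        return outputs

    def hardware_slice(job):
        features = {"num_blocks": 0, "total_elems": 0}
        for block in job:                      # control only: no datapath work
            features["num_blocks"] += 1
            features["total_elems"] += len(block)
        return features

    job = [[1, 2, 3], [4, 5], [6]]
    print(hardware_slice(job))                 # -> {'num_blocks': 3, 'total_elems': 6}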

  11. Execution Time Prediction Model
     • Linear model: ŷ = X·b, where X holds the job features, b the model coefficients, and ŷ the predicted execution time
     • Train the model using convex optimization (a training sketch follows below)
       • Reduce the number of features
       • Prioritize meeting deadlines over saving energy
     [Flow: Job Input → Hardware Slice → Job Features → Execution Time Model → Job Execution Time → DVFS Model → DVFS Level]
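
The slide does not spell out the objective, so here is a hedged sketch of the kind of convex training problem it describes: an asymmetric loss that penalizes under-prediction (which risks deadline misses) more heavily than over-prediction, plus an L1 term that drives unneeded feature coefficients toward zero. The weights, synthetic data, and use of the cvxpy library are assumptions; the paper's exact formulation may differ.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    X = rng.random((200, 8))                    # job features (200 training jobs, 8 features)
    true_b = np.array([5.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 3.0])
    y = X @ true_b + 0.05 * rng.standard_normal(200)   # measured execution times

    ALPHA = 10.0     # extra weight on under-prediction (deadline risk)
    LAMBDA = 0.01    # L1 weight encouraging feature reduction

    b = cp.Variable(8)
    under = cp.pos(y - X @ b)                   # predicted too low -> may miss deadline
    over = cp.pos(X @ b - y)                    # predicted too high -> only costs energy
    loss = cp.sum(ALPHA * under + over) + LAMBDA * cp.norm1(b)
    cp.Problem(cp.Minimize(loss)).solve()

    print(np.round(b.value, 2))                 # near-zero coefficients mark droppable features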

  12. Evaluation Methodology
     • Vertically integrated evaluation methodology
       • Circuit-level simulation: obtain the voltage-frequency relationship
       • Gate-level modeling: obtain area, power, and energy numbers
       • Register-transfer-level simulation: obtain execution time
     • Benchmark accelerators:
       Name     Description
       h264     Video decoding
       cjpeg    Image encoding
       djpeg    Image decoding
       aes      Cryptography
       sha      Cryptography
       md       Molecular dynamics
       stencil  Image processing
     • Deadline: 16.7 ms

  13. Results: Energy and Deadline Misses
     • 36.7% energy savings on average
     • 0.4% deadline misses

  14. Results: Overheads of Slice-Based Predictor
     • 5.1% area overhead
     • 1.5% energy overhead
     • 3.5% execution time overhead

  15. More Evaluation Results in the Paper
     • More detailed experimental results
       • Prediction Accuracy Analysis
       • Results with Predictor Overheads Removed
       • Sensitivity Study on Varying Deadlines
     • Platform extensions
       • DVFS with Voltage Boosting
       • Results for FPGA-based Accelerators
       • Results for Accelerators Generated by HLS

  16. Summary
     • Observation: Finishing faster than the deadline is not needed
     • Goal: DVFS for accelerators with response time requirements
     • Solution: Prediction-based DVFS
       • Execution time depends on input-dependent control decisions
       • Hardware features can be used to capture control decisions
       • Proposed a framework to generate predictors automatically
     • Results: Highly accurate DVFS for accelerators

  17. Questions?
     Execution Time Prediction for Energy-Efficient Hardware Accelerators
     Tao Chen, Alex Rucker, and G. Edward Suh
     Computer Systems Laboratory, Cornell University
