Execution Time Prediction for Energy-Efficient Hardware Accelerators
Tao Chen, Alex Rucker, and G. Edward Suh
Computer Systems Laboratory, Cornell University
Accelerators in Interactive Computing Systems
• Interactive systems have response time requirements and often use hardware accelerators
• Observation: Finishing earlier than the requirement is usually not needed
• Goal: Perform DVFS for hardware accelerators to save energy while meeting response time requirements
DVFS for Interactive Computing Systems
• Save energy by running slower (lower frequency/voltage)
[Figure: timelines for Job 0 and Job 1, first run at full speed and finishing well before their deadlines, then stretched to just meet the deadlines after predicting each job's execution time and setting the DVFS level]
• Requirement: correctly predict each job's execution time
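A minimal sketch of how a DVFS level could be chosen from a predicted execution time and the deadline (the level table and function name below are illustrative assumptions, not from the paper):

    # Hypothetical sketch: pick the lowest DVFS level whose frequency still
    # finishes the predicted work before the deadline.
    DVFS_LEVELS_MHZ = [200, 400, 600, 800, 1000]  # assumed available frequencies

    def pick_dvfs_level(predicted_cycles: int, deadline_s: float) -> int:
        """Return the lowest frequency (MHz) that meets the deadline."""
        for f_mhz in DVFS_LEVELS_MHZ:
            runtime_s = predicted_cycles / (f_mhz * 1e6)
            if runtime_s <= deadline_s:
                return f_mhz
        return DVFS_LEVELS_MHZ[-1]  # fall back to the highest level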
Opportunity and Challenge
[Figure: execution time distribution of a video decoding accelerator, with the deadline marked]
• Opportunity: Most jobs finish earlier than the deadline
• Challenge: Irregular variations in job execution time
Conventional DVFS Controllers
• History-based execution time prediction
  • Example: PID controller
• Problem with history-based prediction
  • Reactive: decisions lag behind changes in job behavior
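For contrast, an illustrative sketch of a history-based predictor (not from the paper), which averages recent jobs and therefore reacts only after behavior has already changed:

    # Simple history-based predictor: estimates the next job's time from past
    # jobs, so it lags behind abrupt changes in the workload.
    from collections import deque

    class HistoryPredictor:
        def __init__(self, window: int = 8):
            self.history = deque(maxlen=window)

        def predict(self) -> float:
            # Predict the next execution time as the recent average.
            return sum(self.history) / len(self.history) if self.history else 0.0

        def observe(self, actual_time: float) -> None:
            self.history.append(actual_time)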
Predictive DVFS Framework for Accelerators
• Approach: Build predictor hardware for each accelerator that uses the job's input data to predict its execution time
• Design time: Build the predictor and train the prediction model
  • Identify features related to execution time
  • Generate a hardware slice that can compute the features quickly
  • Train a prediction model that maps features to execution time
• Run time: Run the predictor to inform DVFS decisions (sketched below)
[Flow diagram: Job Input → Hardware Slice → Job Features → Execution Time Model → DVFS Model → DVFS Level → Job Execution Time]
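A minimal sketch of the run-time flow above, with hypothetical stand-ins for the generated hardware slice and trained models:

    # Each stage (feature extraction, execution time model, DVFS model) is a
    # placeholder callable; in the real system these are hardware and trained models.
    def run_job(job_input, extract_features, time_model, dvfs_model, accelerator):
        features = extract_features(job_input)      # hardware slice
        predicted_cycles = time_model(features)     # execution time model
        level = dvfs_model(predicted_cycles)        # DVFS model -> DVFS level
        return accelerator.run(job_input, dvfs_level=level)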
Features to Capture Execution Time Variation
• Source of variation: input-dependent control decisions
[Figure: FSM with initial state S1; Job 1 follows the state sequence S1 S2 S4 S1 S2 S4 S1 S2 S3, Job 2 follows S1 S2 S4 S1 S3 S4 S1 S4]
• Feature: state transition count
  TC = [tc_{1,2}, tc_{1,3}, tc_{2,4}, tc_{3,4}, tc_{4,1}], where tc_{i,j} is the number of transitions from state Si to state Sj
  Job 1: [2, 0, 2, 0, 2]   Job 2: [1, 1, 1, 1, 2]
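An illustrative sketch of computing the state-transition-count feature from a recorded state trace (the actual design computes this in hardware via the generated slice; plain Python here for clarity):

    from collections import Counter

    def transition_counts(trace, tracked_pairs):
        """trace: list of state names; tracked_pairs: transitions used as features."""
        counts = Counter(zip(trace, trace[1:]))
        return [counts[pair] for pair in tracked_pairs]

    # Example (Job 2 from the slide):
    job2 = ["S1", "S2", "S4", "S1", "S3", "S4", "S1", "S4"]
    pairs = [("S1", "S2"), ("S1", "S3"), ("S2", "S4"), ("S3", "S4"), ("S4", "S1")]
    print(transition_counts(job2, pairs))  # -> [1, 1, 1, 1, 2]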
Features to Capture Execution Time Variation
• Variable state latency
[Figure: FSM with states S1-S4 where one state uses a down-counter; for Job 3 the FSM loops through S1 S3 S4, and on each init the counter counts down 4 3 2 1 0, then 2 1 0, until done]
• Feature: counter average initial value
  AIV = [aiv_c], where aiv_c is the average value loaded into counter c at each init
  Job 3: aiv = (4 + 2) / 2 = 3
• Other counter features in the paper
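A correspondingly small sketch of the counter average-initial-value feature, assuming the slice records each value loaded into the counter:

    def average_initial_value(init_values):
        # Average of the values loaded into a counter at each init event.
        return sum(init_values) / len(init_values) if init_values else 0.0

    print(average_initial_value([4, 2]))  # Job 3 from the slide -> 3.0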
Identifying and Extracting Features
• Automated flow based on RTL analysis
  • Identify FSM and counter features in the RTL
  • Instrument the RTL to extract the features
• More details in the paper
[Flow diagram: Job Input → Hardware Slice → Job Features → Execution Time Model → DVFS Model → DVFS Level → Job Execution Time]
Hardware Slicing
• Need to obtain the features before running the accelerator
• Create a minimal version of the accelerator
  • Program slicing on the accelerator RTL code
[Figure: accelerator logic (control unit + datapath) reduced to a hardware slice]
• Optimize the hardware slice to run fast
[Figure: the slice's FSM and counter replay the state sequence S1 S3 S4 S1 S3 S4 S1 and counter values 4 3 2 1 0, 2 1 0 ahead of the accelerator]
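A conceptual sketch of backward program slicing over a signal dependence graph (the paper's flow operates on RTL; the dependence-graph representation and signal names here are illustrative assumptions):

    # Starting from the signals that drive the features (FSM state, counter
    # loads), keep only the logic they transitively depend on.
    def backward_slice(deps, criteria):
        """deps: dict mapping each signal to the signals it depends on."""
        keep, stack = set(), list(criteria)
        while stack:
            sig = stack.pop()
            if sig in keep:
                continue
            keep.add(sig)
            stack.extend(deps.get(sig, []))
        return keep

    # Example: slice for the FSM state register in a toy dependence graph.
    deps = {"state": ["state", "start", "pixel_valid"], "pixel_valid": ["input"]}
    print(backward_slice(deps, ["state"]))  # signals kept in the hardware slice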
Execution Time Prediction Model
• Linear model: ŷ = X b, where X holds the job features, b the model coefficients, and ŷ the predicted execution time
• Train the model using convex optimization
  • Reduce the number of features
  • Prioritize meeting deadlines over saving energy
[Flow diagram: Job Input → Hardware Slice → Job Features → Execution Time Model → DVFS Model → DVFS Level → Job Execution Time]
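A hedged sketch of how such a model could be trained with convex optimization, assuming the cvxpy library: an asymmetric loss penalizes under-prediction (which risks a deadline miss) more than over-prediction, and an L1 term shrinks unneeded feature coefficients toward zero. The penalty weights and function name are illustrative, not the paper's.

    import cvxpy as cp
    import numpy as np

    def train_time_model(X: np.ndarray, y: np.ndarray, under_penalty=10.0, l1=0.01):
        b = cp.Variable(X.shape[1])
        err = y - X @ b                      # positive err => under-prediction
        loss = cp.sum(under_penalty * cp.pos(err) + cp.pos(-err))
        cp.Problem(cp.Minimize(loss + l1 * cp.norm1(b))).solve()
        return b.value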
Evaluation Methodology
• Vertically integrated evaluation methodology
  • Circuit-level simulation: obtain the voltage-frequency relationship
  • Gate-level modeling: obtain area, power, and energy numbers
  • Register-transfer-level simulation: obtain execution time
• Benchmark accelerators

  Name     Description
  h264     Video decoding
  cjpeg    Image encoding
  djpeg    Image decoding
  aes      Cryptography
  sha      Cryptography
  md       Molecular dynamics
  stencil  Image processing

• Deadline: 16.7 ms
Results: Energy and Deadline Misses
• 36.7% energy savings on average
• 0.4% deadline misses
Results: Overheads of the Slice-Based Predictor
• 5.1% area overhead
• 1.5% energy overhead
• 3.5% execution time overhead
More Evaluation Results in the Paper
• More detailed experimental results
  • Prediction accuracy analysis
  • Results with predictor overheads removed
  • Sensitivity study on varying deadlines
• Platform extensions
  • DVFS with voltage boosting
  • Results for FPGA-based accelerators
  • Results for accelerators generated by HLS
Summary
• Observation: Finishing faster than the deadline is not needed
• Goal: DVFS for accelerators with response time requirements
• Solution: Prediction-based DVFS
  • Execution time depends on input-dependent control decisions
  • Hardware features can capture these control decisions
  • Proposed a framework that generates predictors automatically
• Results: Highly accurate DVFS for accelerators (36.7% average energy savings with 0.4% deadline misses)
Questions?
Execution Time Prediction for Energy-Efficient Hardware Accelerators
Tao Chen, Alex Rucker, and G. Edward Suh
Computer Systems Laboratory, Cornell University