ALERT: Accurate Learning for Energy and Timeliness
Chengcheng Wan, Muhammad Husni Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire, and Shan Lu
DNNs Are Deployed Everywhere
Examples: trading, auto driving, smart city, QA robot, weather forecast, text generator
DNN Deployment Is Challenging
[Figure: a DNN system that must meet latency (L), accuracy (A), and energy (E) requirements]
Challenges
• Configuration space is huge
• Environment may change dynamically
• Must be low overhead
Previous Work
[Table: previous works (resource management, DNN design, …) compared against three challenges: huge space of configuration, dynamic environment, low overhead]
[1] H. Hoffmann et al. Jouleguard: energy guarantees for approximate applications. SOSP, 2015.
[2] C. Imes et al. Poet: a portable approach to minimizing energy under soft real-time constraints. RTAS, 2015.
[3] N. Mishra et al. CALOREE: learning control for predictable latency and low energy. ASPLOS, 2018.
[4] A. Rahmani et al. SPECTR: formal supervisory control and coordination for many-core systems resource management. ASPLOS, 2018.
Our ALERT System
[Figure: ALERT runs alongside the DNN system. It takes measurements of latency (L), accuracy (A), and energy (E), applies feedback-based estimation (slow-down factor ξ), and performs DNN and power cap selection.]
Challenges
• Configuration space is huge
• Environment may change dynamically
• Must be low overhead
Evaluation Highlights
✔ ALERT satisfies LAE constraints: 99.9% of cases for vision; 98.5% of cases for NLP.
✔ Probabilistic design overcomes dynamic variability efficiently: ALERT achieves 93-99% of Oracle’s performance.
✔ Coordinating app-level and sys-level improves performance: reduces energy by 13% and error by 27% over the prior approach.
Outline
• Understanding DNN Deployment Challenges
• ALERT Run-time Inference Management
• Experiments and Results
Experiment Settings
Platforms (4): ODroid, CPUs, GPU
DNNs (4): ResNet50, VGG16, RNN, Bert
Tasks (3): Image classification (ImageNet), Sentence prediction (PTB), Question Answering (SQuAD)
Tradeoffs from DNNs
[Figure: 42 DNNs on ImageNet classification; x-axis: Inference Time of One Image (s); y-axis: Top-5 Error Rate (%); labeled points: MobileNet-v1 (α=1), MobileNet-v2 (α=1.3), ResNet50, NasNet-large, PnasNet-large]
High accuracy comes with long latency.
Tradeoffs from System Settings
[Figure: one point per power limit setting (W); x-axis: Inference Time of One Image (s); y-axis: Average Energy (J); marked points: Fastest, Least Energy]
No setting is optimal for both energy and latency.
Run-time Variability
[Figure: latency variation in two panels: without co-located job, with co-located job]
Latency variation is increased by co-located jobs.
Potential Solutions
[Figure: Average Energy (J) under different constraint settings (deadline × accuracy_goal), deadlines from 0.1s to 0.7s; series: Sys-level, App-level, Combined]
Combining both levels achieves the best performance.
Three Dimensions & Two Tasks
Dimensions: inference deadline (L), accuracy goal (A), energy consumption goal (E)
Tasks
• Maximize accuracy, with an energy consumption goal and an inference deadline
• Minimize energy, with an accuracy goal and an inference deadline
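The two tasks can be written as constrained optimization problems. The notation below is an assumption introduced for illustration and does not appear on the slides: d is a DNN, p is a power cap, L, A, and E are the latency, accuracy, and energy of configuration (d, p), and T_goal, E_goal, A_goal are the user-specified deadline and goals.

```latex
% Assumed notation (not from the slides): d = DNN, p = power cap;
% L, A, E = latency, accuracy, energy of configuration (d, p);
% T_goal, E_goal, A_goal = user-specified deadline and goals.
\begin{align*}
\text{Maximize accuracy:} \quad & \max_{d,\,p}\; A(d,p)
  \quad \text{s.t.} \quad L(d,p) < T_{\mathrm{goal}},\;\; E(d,p) < E_{\mathrm{goal}} \\
\text{Minimize energy:} \quad & \min_{d,\,p}\; E(d,p)
  \quad \text{s.t.} \quad L(d,p) < T_{\mathrm{goal}},\;\; A(d,p) \ge A_{\mathrm{goal}}
\end{align*}
```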
Maximize Accuracy Task
• Configurations: a grid of DNNs × power caps, with cells (1,1) to (4,3)
• Constraints: L < X and E < Y
• Optimization: max(A)
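A minimal sketch of this selection step, assuming per-configuration estimates of accuracy, latency, and energy are already available. The function name, the dictionary layout, and all numbers are hypothetical and chosen only for illustration; the slides do not specify an implementation.

```python
from typing import Dict, Optional, Tuple

# (DNN label, power-cap label); both labels are hypothetical.
Config = Tuple[str, str]


def select_max_accuracy(
    accuracy: Dict[Config, float],
    latency: Dict[Config, float],
    energy: Dict[Config, float],
    deadline: float,
    energy_goal: float,
) -> Optional[Config]:
    """Return the feasible configuration with the highest accuracy, or None."""
    feasible = [
        cfg
        for cfg in accuracy
        if latency[cfg] < deadline and energy[cfg] < energy_goal
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda cfg: accuracy[cfg])


# Made-up estimates for a 2 x 2 grid of DNNs and power caps.
accuracy = {("d1", "p1"): 0.70, ("d1", "p2"): 0.70,
            ("d2", "p1"): 0.90, ("d2", "p2"): 0.90}
latency = {("d1", "p1"): 0.04, ("d1", "p2"): 0.02,
           ("d2", "p1"): 0.12, ("d2", "p2"): 0.06}
energy = {("d1", "p1"): 0.8, ("d1", "p2"): 1.2,
          ("d2", "p1"): 2.0, ("d2", "p2"): 3.0}

best = select_max_accuracy(accuracy, latency, energy,
                           deadline=0.10, energy_goal=3.5)
```

An exhaustive scan is used here because the grid is small; it also makes the infeasible case explicit by returning None.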
Latency (L): How to estimate the inference latency?
● Two key challenges
○ Runtime variation: the inference time may differ even for the same configuration
[Example: profiled latency 50; runtime latencies 52, 46, 58, 53, 70, 99, 75, 51, …, 94]
Latency (L): How to estimate the inference latency?
● Two key challenges
○ Runtime variation
○ Too many combinations of DNNs and resources
[Table: DNNs d1 … dl × power caps p1 … pk; every cell is an unknown latency X]
Latency (L): Potential Solution
● Kalman filter
○ Estimate latency for each configuration
○ Use recent execution history
[Example, history → prediction: (DNN2, P1) 51 52 43 58 → 49; (DNN1, P2) 29 31 → 30]
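A per-configuration Kalman filter of the kind described above could look like the following sketch. It assumes a scalar random-walk model for latency; the class name and the noise parameters `process_var` and `measurement_var` are illustrative assumptions, not values from the talk.

```python
class ScalarKalmanFilter:
    """One-dimensional Kalman filter tracking a slowly drifting latency."""

    def __init__(self, initial_estimate, process_var=1.0, measurement_var=25.0):
        # process_var and measurement_var are assumed tuning parameters.
        self.estimate = initial_estimate
        self.error_var = measurement_var
        self.process_var = process_var
        self.measurement_var = measurement_var

    def update(self, measurement):
        # Predict step: a random-walk model only inflates the uncertainty.
        prior_var = self.error_var + self.process_var
        # Correct step: blend the prior estimate with the new measurement.
        gain = prior_var / (prior_var + self.measurement_var)
        self.estimate += gain * (measurement - self.estimate)
        self.error_var = (1.0 - gain) * prior_var
        return self.estimate


# Latency history of one configuration, using values shown on the slide.
history = [51.0, 52.0, 43.0, 58.0]
kf = ScalarKalmanFilter(initial_estimate=history[0])
for latency in history[1:]:
    prediction = kf.update(latency)
```

Each configuration would need its own filter instance and its own history, which is what makes this approach costly when the configuration grid is large.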
Latency (L): Potential Solution: drawback
● Cannot solve the problem
○ Not enough history for each configuration
[Example: (DNN2, P1) and (DNN1, P2) have history; (DNN1, P1) and (DNN2, P2) have none, so their predictions are unknown (?)]
Latency (L): How to estimate the inference latency?
● Global slow-down factor ξ
○ Use recent execution history under any DNN or resources
[Example with ξ = 150%, profiled latency → runtime latency: (DNN1, P1) 40 → 60 (predicted); (DNN2, P1) 34 → 51; (DNN1, P2) 20 → 30; (DNN2, P2) 30 → 45 (predicted)]
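The idea can be sketched with the numbers from the slide: ξ is estimated from whichever configurations ran recently and then applied to the profiled latency of every configuration, including those with no runtime history. Averaging the observed-to-profiled ratios is an assumed estimator chosen for simplicity, and the function name is hypothetical.

```python
from statistics import mean

# Profiled (offline) latency of each configuration, from the slide.
profiled = {
    ("DNN1", "P1"): 40.0,
    ("DNN2", "P1"): 34.0,
    ("DNN1", "P2"): 20.0,
    ("DNN2", "P2"): 30.0,
}
# Runtime latency observed for the configurations that actually ran.
observed = {("DNN2", "P1"): 51.0, ("DNN1", "P2"): 30.0}


def estimate_slowdown(profiled, observed):
    """Average ratio of observed to profiled latency over recent runs."""
    return mean([observed[cfg] / profiled[cfg] for cfg in observed])


xi = estimate_slowdown(profiled, observed)
# One global factor predicts the latency of every configuration.
predicted = {cfg: xi * t for cfg, t in profiled.items()}
```

Because ξ is shared, a single observation under any configuration updates the prediction for all of them.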
Latency (L): How to estimate the inference latency?
● Mean estimation is not sufficient
○ The variation might be too big to provide a good prediction.
● Different implications on DNN selection
[Example, history → predicted mean and variation:
Sequence 1: 52 43 58 49 → mean 50, variation 5
Sequence 2: 51 50 49 49 → mean 50, variation 1
Sequence 3: 15 99 10 70 → mean 50, variation 40]
Latency (L): How to estimate the inference latency?
● Global slow-down factor ξ
○ Use recent execution history under any DNN or resources
○ Estimate its distribution: mean and variance
[Example: history 52 43 58 49 → mean 50, variation 5]
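A minimal sketch of estimating the distribution of ξ from a short history. It assumes a profiled latency of 50 for every run and uses the population standard deviation as the spread measure; both are assumptions made for illustration, and the talk may use a different estimator.

```python
from statistics import mean, pstdev


def slowdown_distribution(observed, profiled):
    """Mean and standard deviation of the observed/profiled latency ratios."""
    ratios = [o / p for o, p in zip(observed, profiled)]
    return mean(ratios), pstdev(ratios)


# History from the slide; the profiled latency of 50 is an assumed value.
observed = [52.0, 43.0, 58.0, 49.0]
profiled = [50.0] * len(observed)

xi_mean, xi_std = slowdown_distribution(observed, profiled)
# Scale the distribution of xi back to a latency prediction.
latency_mean = xi_mean * 50.0
latency_std = xi_std * 50.0
```

Keeping the spread alongside the mean lets a scheduler distinguish a stable history from a volatile one even when both have the same average.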
Accuracy (A): How to estimate accuracy under a deadline?
● Can inference be finished before deadline?
○ If yes, training accuracy of the selected DNN
○ If not, random guess accuracy
■ Unless it’s an Anytime DNN.
[Figure: inference accuracy vs. time]
Accuracy (A): What is an Anytime DNN?
[Figure: timeline with a deadline, comparing a traditional DNN with an Anytime DNN; output labels shown: Road, Chocolate, Ground]
[1] C. Wan et al. Orthogonalized SGD and Nested Architectures for Anytime Neural Networks. ICML, 2020.
Accuracy (A): How to estimate accuracy under a deadline?
● Can inference be finished before deadline?
○ If yes, training accuracy of the selected DNN
○ If not,
■ Traditional DNN: random guess accuracy.
■ Anytime DNN: accuracy of the last output
[Figure: inference accuracy vs. time, one panel for an Anytime DNN and one for a traditional DNN]
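The rule above can be sketched as a single function that covers both kinds of DNN. Representing a network as a list of (finish time, accuracy) pairs is an assumption made for illustration: a traditional DNN has one pair and an Anytime DNN has several. All numbers are made up.

```python
from typing import Sequence, Tuple


def accuracy_at_deadline(
    outputs: Sequence[Tuple[float, float]],
    deadline: float,
    random_guess: float,
) -> float:
    """Accuracy of the last output ready by the deadline.

    outputs holds (finish_time, accuracy) pairs sorted by finish_time.
    If no output is ready in time, the result is random-guess accuracy.
    """
    achieved = random_guess
    for finish_time, accuracy in outputs:
        if finish_time > deadline:
            break
        achieved = accuracy
    return achieved


# Hypothetical networks: one final output vs. three successive outputs.
traditional = [(0.10, 0.93)]
anytime = [(0.03, 0.70), (0.06, 0.85), (0.10, 0.93)]

missed = accuracy_at_deadline(traditional, deadline=0.08, random_guess=0.001)
partial = accuracy_at_deadline(anytime, deadline=0.08, random_guess=0.001)
```

With the same missed deadline, the traditional network falls back to random guessing while the Anytime network keeps the accuracy of its second output.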
Accuracy (A): How to estimate accuracy under a deadline?
[Figure: the latency distribution combined with the accuracy-latency relation gives the expectation of accuracy]
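One way to combine a latency distribution with an accuracy-latency relation is sketched below. It assumes the slow-down factor ξ is normally distributed, which is a modeling assumption made here for illustration, and it reuses the hypothetical (profiled finish time, accuracy) representation of a network's outputs.

```python
from statistics import NormalDist


def expected_accuracy(outputs, deadline, random_guess, xi_mean, xi_std):
    """Expected accuracy at the deadline under a normally distributed xi.

    outputs holds (profiled_finish_time, accuracy) pairs sorted by time.
    xi_std must be positive.
    """
    xi = NormalDist(xi_mean, xi_std)
    # Output k is ready in time when xi * t_k <= deadline.
    p_done = [xi.cdf(deadline / t) for t, _ in outputs]
    # No output is ready: fall back to random-guess accuracy.
    expected = (1.0 - p_done[0]) * random_guess
    for k, (_, accuracy) in enumerate(outputs):
        # Probability that output k is the last one ready by the deadline.
        p_next = p_done[k + 1] if k + 1 < len(outputs) else 0.0
        expected += (p_done[k] - p_next) * accuracy
    return expected


# A traditional DNN whose mean latency sits exactly on the deadline.
half = expected_accuracy([(0.10, 0.90)], deadline=0.10,
                         random_guess=0.0, xi_mean=1.0, xi_std=0.1)
# An Anytime DNN with made-up intermediate outputs.
anytime = expected_accuracy([(0.03, 0.70), (0.06, 0.85), (0.10, 0.93)],
                            deadline=0.08, random_guess=0.001,
                            xi_mean=1.0, xi_std=0.1)
```

In the first example the network meets the deadline half of the time, so the expected accuracy is half of its full accuracy.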
Energy (E): How to manage energy?
● Power cap as a knob to configure system resources
● Idle power: other processes may still consume energy when DNN inference has finished
[Figure: power vs. time; labels: DNN active 1, DNN active 2, DNN idle, latency target, new input]
Energy (E): How to estimate the energy consumption?
● Estimate energy from power
○ DNN active power is the power setting
○ DNN idle power is estimated by a Kalman filter
[Figure: power vs. time; DNN active energy = power setting × time; DNN idle energy = ? × time, where ? is the estimated idle power; labels: latency target, new input]
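The energy estimate described above can be sketched as active energy plus idle energy over one input period. The function and parameter names are hypothetical, the idle power is passed in as an already-estimated value rather than computed by a Kalman filter, and the numbers are made up.

```python
def estimate_energy(power_cap_w, latency_s, idle_power_w, period_s):
    """Energy (J) for one input: active phase plus idle phase.

    The active phase draws the power cap for the inference latency;
    the idle phase draws the estimated idle power until the next input.
    """
    active_energy = power_cap_w * latency_s
    idle_time = max(period_s - latency_s, 0.0)
    idle_energy = idle_power_w * idle_time
    return active_energy + idle_energy


# 15 W cap for 0.1 s of inference, then 5 W idle until the 0.2 s mark.
energy_j = estimate_energy(power_cap_w=15.0, latency_s=0.1,
                           idle_power_w=5.0, period_s=0.2)
```

Including the idle term matters because a fast, high-power configuration can finish early and then spend the rest of the period at idle power.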
Experiment Settings
Platforms (3): CPUs, GPU
Tasks (2): 1. Minimize energy 2. Maximize accuracy
DNNs (2): Sparse ResNet50, RNN
Scenarios (5): Default, Compute intensive (2), Memory intensive (2)
Schemes
Oracles
• Oracle: Change configuration for every input. Assume perfect knowledge of future. Emulated from profiling result.
• Oracle-static: Same configuration for all inputs.
Baselines
• Sys-only: Only adjust power-cap
• App-only: Use an Anytime DNN
• No-coord: Anytime DNN without coordination with power-cap
Evaluation: Scheduler Performance
[Figure: average performance normalized to Oracle_Static (smaller is better), with violations (%); schemes: App-only, Sys-only, No-coord, Sys+App (ALERT), Oracle; panels: Minimize Energy, Minimize Error]
How ALERT Works with Traditional DNN
• Meet requirements in most cases
• Quickly detect contention changes
• Use anytime DNN under unstable environment
How ALERT Works with Anytime + Traditional DNN
• Meet requirements in most cases
• Quickly detect contention changes
• Use anytime DNN under unstable environment
Conclusion
• Understand DNN inference challenges
• ALERT run-time inference system
• High performance and energy efficiency