NeuOS: A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems
Soroush Bateni, The University of Texas at Dallas
Cong Liu, The University of Texas at Dallas
Background: The tale of two worlds
Deep Neural Networks (DNNs) vs. Autonomous Embedded Systems
[DNN diagram: summation/convolution units, FC layers, SoftMax, feeding an autonomous decision]
• DNNs' main objective: maximum accuracy
• Autonomous embedded systems' main objectives: timing predictability, energy efficiency, safety

Background: Marriage between the two worlds
[The two worlds combined: the DNN drives the autonomous decision on the embedded system]
Background: The big picture
The hardware/software stack for executing DNNs in autonomous embedded systems: DNN, Framework/OS, Hardware.
The focus of related research in AES is currently mostly on the DNN and the hardware:
• Efficient DNNs: quantization, low-rank approximation
• Special processors: AI accelerators, DNN-focused SoCs
Goals: Where system software/frameworks can help
Challenges for the Framework/OS layer:
• Meet timing requirements
• Be energy efficient
• Minimize accuracy loss
All of the above goals must be achieved at the same time.
Motivation: Jack of all trades, master of none
• Timing predictable & energy efficient: can be achieved at the system level via Dynamic Voltage and Frequency Scaling (DVFS). Needs per-layer adjustments.
• Timing predictable & accurate: can be achieved at the application level via DNN configuration changes. Needs per-layer adjustments.
• Master of none: combining the two (even at different rates) will yield unpredictable results. Needs coordination.
Motivation: No one is alone
Multiple ResNet-50 instances executed together; the underlying system-level solution here is PredJoule¹.
Takeaways:
• The first DNN instance is the winner; the other DNN instances are not as lucky, because the method used here is greedy.
• The DVFS configurations chosen only work well for the first DNN instance.
• Need cross-DNN coordination.
¹ Bateni, Soroush, Husheng Zhou, Yuankun Zhu, and Cong Liu. "PredJoule: A timing-predictable energy optimization framework for deep neural networks." In 2018 IEEE Real-Time Systems Symposium (RTSS).
Design: Design Goals
Core targets:
• Timing predictable: the system must meet deadlines set by the system designer for the DNN.
• Energy efficient: the system must use DVFS to achieve near-optimal energy usage for DNNs.
• Accurate: the system can change accuracy dynamically, but must do so cautiously.
• Multi-DNN compatibility: the system should be able to coordinate and find an efficient solution for all DNN instances.
Optimization targets: the system must also be flexible enough to adapt to different system constraints. We offer three optimization targets (switchable by an external policy controller):
• Min Energy (M_p) is used when our design is deployed in extremely low-power scenarios such as remote sensing.
• Max Accuracy (M_A) is used when our design is deployed in extremely mission-critical scenarios.
• Balanced Energy and Accuracy lets our design choose what is best given the timing requirement.
Design: Timing predictability
Proportional deadlines:
• Build an ideal schedule by setting per-layer sub-deadlines in proportion to the layers' execution times, summing to the end-to-end deadline for the DNN instance.
LAG analysis:
• Keep track of per-layer progress: compare the tracked per-layer execution time against the per-layer sub-deadline, and accumulate the difference as the LAG.
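The two mechanisms on this slide can be sketched as follows. This is an illustrative sketch, not the NeuOS implementation; the function names, the millisecond units, and the sign convention (positive LAG means ahead of schedule) are assumptions.

```python
def proportional_subdeadlines(layer_times, deadline):
    """Split an end-to-end deadline among layers in proportion
    to their profiled execution times."""
    total = sum(layer_times)
    return [deadline * t / total for t in layer_times]

def lag(subdeadlines, elapsed, completed):
    """LAG after `completed` layers: ideal progress (sum of the
    sub-deadlines of the finished layers) minus the actual elapsed
    time. Positive means ahead of schedule, negative means behind."""
    ideal = sum(subdeadlines[:completed])
    return ideal - elapsed

# Profiled per-layer times (ms) and a 120 ms end-to-end deadline:
times = [10.0, 30.0, 20.0]
subs = proportional_subdeadlines(times, 120.0)
print(subs)                                   # [20.0, 60.0, 40.0]
# After layer 1, 25 ms have elapsed against a 20 ms sub-deadline:
print(lag(subs, elapsed=25.0, completed=1))   # -5.0 (behind schedule)
```

The accumulated LAG is what the coordination layer (next slide) reads to decide how much each instance must speed up or may slow down.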
Design: Coordination
Building a cohort: we keep a pair of local variables for each DNN instance.
∆ Calculator:
1. Based on the last reported values of LAG in the cohort, calculate a speedup (or slowdown) for each DNN instance.
2. Look up¹ the best possible DVFS configuration for that speedup (or slowdown).
3. The output is a list (∆) of optimal DVFS configurations, one per DNN instance.
X_i Calculator:
1. For each element of ∆, calculate the required (further) speedup (or slowdown) for the other DNN instances.
2. This time, look up¹ the best possible approximation configuration that matches that slowdown.
¹ Please see the paper and the source code for more information.
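A minimal sketch of the ∆ Calculator step, under stated assumptions: the DVFS table below (name, speedup factor, relative power) is hypothetical, as are the formula mapping LAG to a required speedup and the "cheapest feasible config" selection rule; the real lookup tables are described in the paper and source code.

```python
# Hypothetical (speedup factor, relative power) per DVFS config.
DVFS_TABLE = [
    ("low",  0.6, 0.4),
    ("mid",  1.0, 1.0),
    ("high", 1.5, 2.1),
]

def required_speedup(lag_ms, remaining_budget_ms):
    """A negative LAG (behind schedule) demands a proportionally
    higher execution rate over the remaining time budget."""
    return max(1.0, (remaining_budget_ms - lag_ms) / remaining_budget_ms)

def delta_calculator(lags, budgets):
    """For each instance in the cohort, pick the lowest-power DVFS
    configuration whose speedup covers that instance's need."""
    delta = []
    for lag_ms, budget in zip(lags, budgets):
        need = required_speedup(lag_ms, budget)
        feasible = [c for c in DVFS_TABLE if c[1] >= need]
        # Cheapest feasible config; fall back to the fastest if none fits.
        choice = min(feasible, key=lambda c: c[2]) if feasible else DVFS_TABLE[-1]
        delta.append(choice[0])
    return delta

# Instance 1 is 10 ms behind, instance 2 is 5 ms ahead:
print(delta_calculator([-10.0, 5.0], [50.0, 50.0]))  # ['high', 'mid']
```

The X_i Calculator would then take each candidate in ∆ and, per instance, look up the approximation configuration that absorbs the remaining gap.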
Design: Optimization
The decision tree: [diagram: the cohort of DNN instances 1…n feeds the ∆ Calculator, which outputs ∆ = (δ₁, δ₂, …, δₙ₋₁, δₙ); each δ feeds an X_i Calculator whose leaves are the accuracy losses S_{A1}, S_{A2}, …, S_{An}]
Choosing a δ (DVFS configuration) will have consequences in terms of accuracy for all DNNs in the cohort. Therefore, the question is: which δ is the best?
• Min Energy (M_p) chooses the δ that has the least PowerUp value in the PowerUp/SpeedUp table, without looking at accuracy loss.
• Max Accuracy (M_A) chooses the δ that minimizes Σ_{∀δᵢ} S_{Aᵢ}.
• Balanced Energy and Accuracy uses Bivariate Regression Analysis (BRA) to achieve a balanced approach backed by statistical analysis of the tree¹.
¹ Please see the paper for more information.
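The three selection rules above can be sketched over a toy candidate list. The candidate values below are invented, and the balanced rule here is a simple weighted score standing in for NeuOS's Bivariate Regression Analysis, which is statistical and considerably more involved.

```python
# Each candidate δ carries a hypothetical PowerUp value and the
# summed accuracy loss (Σ S_A_i) it would impose on the cohort.
candidates = [
    # (delta_id, power_up, sum_accuracy_loss)
    ("d1", 1.8, 0.00),
    ("d2", 1.2, 0.03),
    ("d3", 0.9, 0.10),
]

def choose(mode, deltas):
    if mode == "min_energy":      # M_p: least PowerUp, accuracy ignored
        return min(deltas, key=lambda d: d[1])
    if mode == "max_accuracy":    # M_A: minimize the summed S_A_i
        return min(deltas, key=lambda d: d[2])
    # Balanced: a naive weighted score in place of the BRA of the paper.
    return min(deltas, key=lambda d: 0.5 * d[1] + 0.5 * d[2] * 10)

print(choose("min_energy", candidates)[0])    # d3 (lowest PowerUp)
print(choose("max_accuracy", candidates)[0])  # d1 (zero accuracy loss)
print(choose("balanced", candidates)[0])      # d2 (middle ground)
```

Making the mode a runtime parameter is what lets an external policy controller switch targets, as the design-goals slide describes.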
Implementation and Evaluation: Overview
Based on Caffe:
• Available as an open-source project on GitHub
• No need to use APIs
• No need to redesign DNN models
• Need to generate: hash tables, and a low-rank approximated version of your DNN model
Tested extensively:
• On NVIDIA Jetson TX2 and Jetson AGX Xavier
• Using image-recognition DNNs: AlexNet, GoogLeNet, ResNet-50, VGGNet
• Using three cohort sizes: small (1 DNN instance), medium (2-4 DNN instances), large (6-8 DNN instances)
• We include a mixed scenario that uses a combination of all the DNN models
Evaluation: Energy
[energy-consumption charts]
• 68% avg. improvement on TX2
• 70% avg. improvement on TX2
• 46% avg. improvement on AGX Xavier
Evaluation: Latency
[latency charts]
• 68% and 53% avg. improvement on TX2
• 40% and 32% avg. improvement on AGX Xavier
Evaluation: Tail latency
• Small cohort: 3.25% deadline miss rate.
• Medium cohort: deadline miss rate same as the small cohort.
• Large cohort: deadline miss rate same as the small cohort.
Evaluation: Relative accuracy
[relative-accuracy charts]
Evaluation: Flexibility
[flexibility charts]
• 11,759 DVFS configurations on Jetson TX2.
• 51,967 DVFS configurations on Jetson AGX Xavier.