Robust Power Estimation and Simultaneous Switching Noise Prediction


  1. Robust Power Estimation and Simultaneous Switching Noise Prediction Methods Using Machine Learning
     March 20th, 2019

  2. Robust Simultaneous Switching Noise Prediction for Test using Deep Neural Network
     Seyed Nima Mozaffari, Bonita Bhaskaran, Kaushik Narayanun, Ayub Abdollahian, Vinod Pagalone, Shantanu Sarangi
     RTL-Level Power Estimation Using Machine Learning
     Mark Ren, Yan Zhang, Ben Keller, Brucek Khailany, Yuan Zhou, Zhiru Zhang

  3. Robust Simultaneous Switching Noise Prediction for Test using Deep Neural Network
     Seyed Nima Mozaffari, Bonita Bhaskaran, Kaushik Narayanun, Ayub Abdollahian, Vinod Pagalone, Shantanu Sarangi

  4. DFT – A BIRD’S EYE VIEW
     • At-Speed Tests – verify performance
     • Stuck-at Tests – detect logical faults
     • Parametric Tests – verify AC/DC parameters
     • Leakage Tests – catch defects that cause high leakage
     NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
     Images – National Applied Research Laboratories

  5. SCAN TEST - SHIFT
     [Figure: scan flip-flops (D, SI, Q, clk) between the Primary Inputs and Primary Outputs of the Combinational Logic, chained from Scan In (SI) to Scan Out (SO); Scan Enable (SE) = 1; clock selected between Slow capture clk (0) and Test Clk (1)]

  6. SCAN TEST - CAPTURE
     [Figure: scan flip-flops (D, SI, Q, clk) between the Primary Inputs and Primary Outputs of the Combinational Logic, chained from Scan In (SI) to Scan Out (SO); Scan Enable (SE) = 0; clock selected between Slow capture clk (0) and Test Clk (1)]
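
The two scan modes on slides 5 and 6 can be illustrated with a toy model. This is a minimal sketch, not the presenters' tooling: the three-flop chain, the `toy_logic` function and every name below are hypothetical, chosen only to show that SE = 1 moves bits along the chain while SE = 0 loads the combinational logic outputs into the flops.

```python
from typing import Callable, List, Tuple


def shift(chain: List[int], scan_in: List[int]) -> Tuple[List[int], List[int]]:
    """Shift mode (SE = 1): one bit enters at SI and one leaves at SO per clock."""
    state = list(chain)
    scan_out = []
    for bit in scan_in:
        scan_out.append(state[-1])    # last flop drives Scan Out (SO)
        state = [bit] + state[:-1]    # each flop takes the value of its SI neighbour
    return state, scan_out


def capture(chain: List[int], logic: Callable[[List[int]], List[int]]) -> List[int]:
    """Capture mode (SE = 0): every flop loads its D input from the logic."""
    return logic(list(chain))


def toy_logic(state: List[int]) -> List[int]:
    """Hypothetical combinational logic: XOR of each flop with its left neighbour."""
    return [state[i] ^ state[i - 1] for i in range(len(state))]


loaded, _ = shift([0, 0, 0], [1, 0, 1])      # load a test pattern
captured = capture(loaded, toy_logic)         # one capture clock
_, unloaded = shift(captured, [0, 0, 0])      # unload the response
```

Every flop that changes value during `shift` or `capture` draws switching current, which is why the number of simultaneously toggling flops matters for power supply noise.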

  7. TEST WASTE FROM POWER NOISE
     • Power balls overheated; Scan Freq target was lowered
       • Slower frequency → Test Cost
     • Higher Vmin issue
       • Vmin thresholds had to be raised; impacts DPPM.
     • During MBIST, overheating was observed
       • Serialized tests; increase in Test Time & Test Cost
     • Vmin issues observed and being debugged
     [Chart: Normalized Vdd % and Normalized Dominant fclk % (0 to 100) for Nominal vs. Test; series Voltage and Freq with linear trend lines]

  8. CAPTURE NOISE
     Low Power Capture Controller
     [Figure: LPC controller, programmed via JTAG, driving TD_0 to TD_15 into the enable (E) inputs of clock gates CG-0 to CG-15; each clock gate output (Q) feeds the clock pins (CP) of a group of scan flip-flops (FF) connected to SCAN IN]

  9. TEST NOISE ESTIMATION
     The traditional way: Pre-Silicon Estimation and Post-Silicon Validation
     [Flow diagram boxes: ATE Input files; IR Drop Analysis; Hardware & Test Program Dev; Measurement; Post-Processing; Noise per pattern]
     Issues:
     • Can simulate only a handful of vectors
     • It is not always easy to pick the top IR-drop-inducing test patterns
     • Machine time to simulate 3000 patterns is 6-7 years!
     • Measurement is feasible for 3-5K patterns
     Power noise during test <= functional budget: directly impacts test quality!

  10. IMPORTANCE
      Strategy – we pick conservative LPC settings!
      • Higher Test Time → Higher Test Cost
      • For example, Test Time savings of 40% could have been achieved.
      [Chart: Test Coverage vs Test Time; TEST COVERAGE (%) from 0 to 100 against TEST TIME (ms) with marks t1 and t2; curve labels LPC7%, LPC17%, LPC40% and LPC42, LPC73, LPC105]

  11. Why is Deep Learning a good fit?
      • Labeled data is available
      • Precision is not the focus
      • Need a prediction scheme that encompasses the entire production set

  12. PROPOSED APPROACH
      • Design Flow
      • Feature Engineering
      • Deep Learning Models
      • Classification and Regression

  13. PROPOSED APPROACH
      • Design Flow
      • Feature Engineering
      • Deep Learning Models
      • Classification and Regression

  14. DESIGN FLOW
      Goal:
      • Supervised learning model to reduce the time and effort spent
      • Most effective set of input features
      Dataset:
      • Input features → parameters that impact the Vdroop
      • Labels → Vdroop values from silicon measurements
      • Train phase → train: 80% & dev: 10%
      • Inference phase → test: 10%
      Addresses the following:
      • Takes into account all the corner cases for PVT and frequency (f) variations
      • Helps predict achievable Vmin
      • Cuts down post-silicon measurements – typically 6-8 weeks of engineering effort
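
The 80% / 10% / 10% partition described on slide 14 could be produced along these lines. This is a sketch under assumptions: the slides do not say how samples were assigned to each partition, so the single seeded shuffle, the function name `split_dataset` and the 3000-item stand-in list are all illustrative.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(samples: Sequence, train: float = 0.8, dev: float = 0.1,
                  seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle once, then cut into train / dev / test partitions."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_train = round(train * len(shuffled))
    n_dev = round(dev * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])      # remainder is the held-out test set


patterns = list(range(3000))                 # stand-in for 3000 labeled patterns
train_set, dev_set, test_set = split_dataset(patterns)
```

Keeping the test partition untouched until the inference phase is what makes the reported inference accuracy a fair estimate for unseen patterns.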

  15. HARDWARE SET-UP AND SCOPESHOT
      [Scope shot legend: Yellow – PSN; Green – Scan Enable; Purple – CLK; Pink – Trigger]

  16. MATLAB POST PROCESSING
      • A Matlab script is used to accurately tabulate the VDD_Sense droop against the respective clock-domain frequency. Inputs to this script are the “.bin” files stored from the scope.
      • Outputs from the Matlab script are:

  17. SNAPSHOT OF DATASET
      Pattern  Global Switch Factor %  Process  Voltage  Temp  Freq (MHz)  IP Name  Product  LPC  Droop (mV)  Granular Features
      1        2.00%                   3        1        10    1000        1        2        3    30
      2        3.00%                   3        1        10    1000        1        2        3    35
      3        3.00%                   3        1        10    1000        1        2        3    35
      4        4.00%                   3        1        10    1000        1        2        3    35
      5        3.00%                   3        1        10    1000        1        2        3    33
      6        2.00%                   3        1        10    1000        1        2        3    33
      7        60.00%                  3        1        10    1000        1        2        3    100
      8        45.00%                  3        1        10    1000        1        2        3    85
      9        65.00%                  3        1        10    1000        1        2        3    105
      10       36.10%                  3        1        10    1000        1        2        3    60
      11       36.00%                  3        1        10    1000        1        2        3    61
      12       33.00%                  3        1        10    1000        1        2        3    60
      13       50.00%                  3        1        10    1000        1        2        3    90
      ...
      2998     29.87%                  3        1        10    1000        1        2        3    55
      2999     47.84%                  3        1        10    1000        1        2        3    85
      3000     58.92%                  3        1        10    1000        1        2        3    91

  18. DEPLOYMENT
      Goal:
      • Optimize low power DFT architecture
      • Generate reliable test patterns
      PSN analysis is repeated
      • at various milestones of the chip design cycle and finalized close to tape-out.
      • until there are no violations for any of the test patterns.

  19. PROPOSED APPROACH
      • Design Flow
      • Feature Engineering
      • Deep Learning Models
      • Classification and Regression

  20. FEATURE ENGINEERING
      IP-level (Global):
      • GSF (Global Switch Factor)
      • PVT
      • PLL frequency f
      • LP_Value
      • Type
      SoC sub-block-level (Local):
      • LSF
      • Instance_Count
      • Sense_Distance
      • Area
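
One way to hold the global and local features listed on slide 20 is a pair of small record types that flatten to numeric vectors. This is an illustrative sketch only: the field types, the integer encoding of categorical fields and the example values are assumptions, and reading LSF as the local counterpart of GSF is a guess rather than something the slides state.

```python
from dataclasses import astuple, dataclass
from typing import List, Tuple


@dataclass
class GlobalFeatures:
    gsf: float            # global switch factor, as a fraction
    process: int          # P of PVT (assumed integer-encoded)
    voltage: int          # V of PVT (assumed integer-encoded)
    temp: int             # T of PVT (assumed integer-encoded)
    pll_freq_mhz: float   # PLL frequency f
    lp_value: int         # LP_Value
    ip_type: int          # Type (assumed integer-encoded)


@dataclass
class LocalFeatures:
    lsf: float            # assumed: local switch factor of the sub-block
    instance_count: int
    sense_distance: float
    area: float


def to_vectors(g: GlobalFeatures,
               blocks: List[LocalFeatures]) -> Tuple[List[float], List[List[float]]]:
    """Return one global vector and one local vector per sub-block."""
    global_vec = [float(v) for v in astuple(g)]
    local_vecs = [[float(v) for v in astuple(b)] for b in blocks]
    return global_vec, local_vecs


# Hypothetical values, for illustration only.
g = GlobalFeatures(0.36, 3, 1, 10, 1000.0, 3, 1)
blocks = [LocalFeatures(0.40, 1200, 2.5, 0.8), LocalFeatures(0.0, 300, 6.0, 0.2)]
global_vec, local_vecs = to_vectors(g, blocks)
```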

  21. EXAMPLE: FEATURE EXTRACTION
      ➢ on-chip measurement point location
      ➢ sense point neighborhood-level graph
      ➢ global and local feature vectors
      [Figure: Sub-Block-Level layout of an SoC]

  22. PROPOSED APPROACH
      • Design Flow
      • Feature Engineering
      • Deep Learning Models
      • Classification and Regression

  23. DEEP LEARNING MODELS
      Fully Connected (FC) model
      • The basic type of neural network, used in most of the models.
      • Flattened FC model
      • Hybrid FC model
      Natural Language Processing-based (NLP) model
      • NLP is traditionally used to analyze human language data.
      • We apply the concept of the averaging layer to our IR drop prediction problem.
      • The model is independent of the number of sub-blocks in a chip.

  24. FLATTENED FC MODEL
      All the input features are applied simultaneously to the first layer.
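
A flattened input of this kind can be sketched with NumPy as follows. The zero-padding of local features follows the note in the results slides that it is used for the FC models; everything else is an assumption made for illustration: the feature counts, the layer sizes, the random untrained weights and the helper names `flatten_features` and `dense`.

```python
import numpy as np


def flatten_features(global_feats: np.ndarray, local_feats: np.ndarray,
                     max_blocks: int) -> np.ndarray:
    """Concatenate global features with local features zero-padded to max_blocks."""
    n_blocks, n_local = local_feats.shape
    if n_blocks > max_blocks:
        raise ValueError("more sub-blocks than max_blocks")
    padded = np.zeros((max_blocks, n_local))
    padded[:n_blocks] = local_feats          # unused sub-block slots stay zero
    return np.concatenate([global_feats, padded.ravel()])


def dense(x: np.ndarray, w: np.ndarray, b: np.ndarray, relu: bool = True) -> np.ndarray:
    """One fully connected layer, optionally followed by ReLU."""
    z = x @ w + b
    return np.maximum(z, 0.0) if relu else z


rng = np.random.default_rng(0)
global_feats = rng.random(7)                 # hypothetical: 7 global features
local_feats = rng.random((3, 4))             # hypothetical: 3 sub-blocks x 4 local features
x = flatten_features(global_feats, local_feats, max_blocks=5)   # 7 + 5 * 4 = 27 inputs

w1, b1 = rng.normal(size=(27, 16)), np.zeros(16)   # untrained weights
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
droop_estimate = dense(dense(x, w1, b1), w2, b2, relu=False)
```

Because every feature enters at the first layer, the input width is tied to `max_blocks`, so a chip with more sub-blocks would need a wider first layer.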

  25. HYBRID FC MODEL
      Input features are divided into different groups, each applied to a different layer.
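
The grouping idea can be sketched as a forward pass in which one feature group enters at the first layer and another is concatenated in at the second. The slide does not say which group feeds which layer, so the choice below (local features first, global features injected later), the layer widths and the random untrained weights are all assumptions.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def hybrid_forward(local_x: np.ndarray, global_x: np.ndarray, p: dict) -> np.ndarray:
    """Local group enters layer 1; global group is concatenated in at layer 2."""
    h1 = relu(local_x @ p["w1"] + p["b1"])
    h2_in = np.concatenate([h1, global_x])   # second feature group joins here
    h2 = relu(h2_in @ p["w2"] + p["b2"])
    return h2 @ p["w3"] + p["b3"]            # linear output, e.g. a droop estimate


rng = np.random.default_rng(0)
params = {
    "w1": rng.normal(size=(20, 16)), "b1": np.zeros(16),      # 20 local inputs
    "w2": rng.normal(size=(16 + 7, 8)), "b2": np.zeros(8),    # 16 hidden + 7 global
    "w3": rng.normal(size=(8, 1)), "b3": np.zeros(1),
}
local_x = rng.random(20)     # hypothetical: 5 sub-blocks x 4 local features
global_x = rng.random(7)     # hypothetical: 7 global features
output = hybrid_forward(local_x, global_x, params)
```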

  26. NLP MODEL
      ➢ Local features of each sub-block form an individual bag of numbers.
      ➢ Filtered Average (FA): 1) filters out non-toggled sub-blocks, 2) calculates the average.
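
The two Filtered Average steps can be sketched in a few lines. This is a minimal illustration, assuming that a sub-block counts as "toggled" when its first local feature (taken here to be its switch factor) is greater than zero; the slide does not define the criterion, and the feature values are made up.

```python
from typing import List


def filtered_average(blocks: List[List[float]], toggle_index: int = 0) -> List[float]:
    """Drop non-toggled sub-blocks, then average each feature over the rest."""
    width = len(blocks[0]) if blocks else 0
    active = [b for b in blocks if b[toggle_index] > 0]   # step 1: filter
    if not active:
        return [0.0] * width
    return [sum(col) / len(active) for col in zip(*active)]   # step 2: average


# Hypothetical bags of local features: [switch factor, instances, distance, area]
blocks = [
    [0.50, 100.0, 2.0, 10.0],
    [0.00, 400.0, 1.0, 20.0],    # not toggled, so it is filtered out
    [0.25, 300.0, 4.0, 30.0],
]
fa = filtered_average(blocks)
fa_with_idle = filtered_average(blocks + [[0.0, 999.0, 9.0, 99.0]])
```

The output has a fixed length no matter how many sub-blocks are passed in, which is what makes a model built on it independent of the number of sub-blocks in a chip.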

  27. PROPOSED APPROACH
      • Design Flow
      • Feature Engineering
      • Deep Learning Models
      • Classification and Regression

  28. CLASSIFICATION AND REGRESSION
      ➢ Classification models predict a discrete value (or a bin).
      ➢ Regression models predict the absolute value.
      ➢ Optimization: Input Normalization, Adam optimizer, learning rate decay, L2 regularization
      ➢ Cost Function: $J = \frac{1}{m} \sum_{i=1}^{m} L(y_i, \hat{y}_i) + \phi(w)$
      ➢ Loss Function $L(y_i, \hat{y}_i)$:
        regression: $\sqrt{\frac{1}{k} \sum_{i=1}^{k} (y_i - \hat{y}_i)^2}$
        classification: $-\left(y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i)\right)$
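
The cost on slide 28 is the mean per-sample loss plus a regularization term, with a root-mean-square error for regression and a cross-entropy for classification. The sketch below follows that reading; the function names, the clipping constant `eps` and the `lam`-scaled sum of squared weights used for the L2 term are assumptions, since the slide does not spell out the regularizer.

```python
import math
from typing import Sequence


def cross_entropy(y: float, y_hat: float, eps: float = 1e-12) -> float:
    """Classification loss for a 0/1 label and a predicted probability."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)      # avoid log(0)
    return -(y * math.log(y_hat) + (1.0 - y) * math.log(1.0 - y_hat))


def rms_error(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Regression loss: root of the mean squared difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def cost(losses: Sequence[float], weights: Sequence[float], lam: float = 0.0) -> float:
    """Mean loss plus an assumed L2 penalty lam * sum(w^2)."""
    return sum(losses) / len(losses) + lam * sum(w * w for w in weights)


ce = cross_entropy(1.0, 0.5)
rms = rms_error([0.0, 0.0], [3.0, 3.0])
total = cost([1.0, 3.0], weights=[1.0, 2.0], lam=0.1)
```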

  29. RESULTS
      Benchmark Information - 16nm GPU chips: Volta-IP1 and Xavier-IP2
      ➢ Local features are wrapped with zero-padding (only for FC)
      ➢ Approximately 90% of the samples for training and validation
      ➢ Approximately 10% of the samples for inference.
      Models were developed in Python using TensorFlow and NumPy libraries.
      Models were run on a cloud-based system with 2 CPUs, 2 GPUs and 32GB memory.
      GPU         No. of Features  No. of Train Samples  No. of Inference Samples
      Volta-IP1   323              16500                 1500
      Xavier-IP2  239              2500                  500

  30. RESULTS
      Dataset (all rows): Volta-IP1 + Xavier-IP2
      Model-Architecture           Train Accuracy (%)  Inference Accuracy (%)  Train Time (minutes)  MAE (mV)
      Classification-Flattened FC  94.5                94.5                    10                    7.30
      Classification-Hybrid FC     96.0                96.0                    3                     6.90
      Classification-NLP           92.6                92.6                    80                    7.46
      Regression-Flattened FC      98.0                93.0                    9                     7.79
      Regression-Hybrid FC         98.0                96.0                    3                     7.25
      Regression-NLP               95.0                95.0                    90                    7.28
      Average run-time or prediction time
      ➢ For a 500-pattern set:
      Method                   Run-Time
      Pre-Silicon Simulation   416 days
      Post-Silicon Validation  84 mins
      Proposed                 0.33 secs
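
The run-time figures for the 500-pattern set can be put on a common scale with some simple arithmetic. The sketch below uses only the numbers quoted in the slides; the variable names and the 365-day year are conventions chosen here.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PATTERNS = 500

run_time_s = {
    "Pre-Silicon Simulation": 416 * SECONDS_PER_DAY,   # 416 days
    "Post-Silicon Validation": 84 * 60,                # 84 minutes
    "Proposed": 0.33,                                  # 0.33 seconds
}

# How many times longer each method takes than the proposed prediction.
speedup = {method: t / run_time_s["Proposed"] for method, t in run_time_s.items()}

# Scale the simulation cost linearly to the 3000 patterns mentioned on slide 9.
days_per_pattern = 416 / PATTERNS
years_for_3000 = 3000 * days_per_pattern / 365
```

On these figures the prediction is roughly 15,000 times faster than silicon measurement and about 10^8 times faster than simulation. Scaling the simulation time linearly to 3000 patterns gives about 6.8 years, in line with the 6-7 years quoted for pre-silicon estimation.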

  31. RESULTS
      Correlation between the predicted and the silicon-measured Vdroop
      [Figures: Classification and Regression panels]
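
Agreement between predicted and measured droop is commonly summarized with a correlation coefficient and a mean absolute error. The slides do not state which correlation measure was plotted, so the Pearson coefficient below is an assumption, and the three sample values are made up purely to exercise the functions.

```python
import math
from typing import Sequence


def mean_absolute_error(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Average magnitude of the prediction error, in the units of the inputs."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


measured_mv = [30.0, 35.0, 100.0]     # hypothetical silicon measurements (mV)
predicted_mv = [33.0, 31.0, 100.0]    # hypothetical model outputs (mV)
mae = mean_absolute_error(measured_mv, predicted_mv)
r = pearson_r(measured_mv, predicted_mv)
```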

  32. FUTURE WORK
      • Train and apply DL for noise estimation of in-field test vectors
      • Shift Noise prediction
      • Additional physical parameters
      • Other architectures

  33. RTL-Level Power Estimation Using Machine Learning
      Yuan Zhou, Zhiru Zhang, Mark Ren, Yan Zhang, Ben Keller, Brucek Khailany
