USING MACHINE LEARNING FOR VLSI TESTABILITY AND RELIABILITY
Mark Ren, Miloni Mehta
TAKE-HOME MESSAGES
• Machine learning can improve approximate solutions for hard problems.
• Machine learning can accurately predict and replace brute-force methods for computationally expensive problems.
VLSI TESTABILITY AND RELIABILITY
[Diagram: design → manufacturing → wafer → chip. Testing at the chip level determines pass/fail (testability); operation in the field over years determines reliability.]
PART 1
Testability Prediction and Test Point Insertion with Graph Convolutional Networks (GCN)
Mark Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan, Yuzhe Ma, Bei Yu
"High Performance Graph Convolutional Networks with Applications in Testability Analysis", to appear in Proceedings of the Design Automation Conference, 2019
PART 2
Full Chip FinFET Self-heat Prediction using Machine Learning
Miloni Mehta, Chi Keung Lee, Chintan Shah, Kirk Twardowski
PART 1 OUTLINE
• Introduction
• Learning model for testability analysis and enhancement
• Practical issues
  - Scalability
  - Data imbalance
HOW DO WE TEST A CHIP?
[Diagram: input patterns are applied to the chip and the resulting output patterns are compared against golden patterns; a stuck-at-0 fault (a node shorted to GND) produces a mismatch.]
TESTABILITY PROBLEM
• B's faults are unobservable → B is difficult-to-test (DT).
• B's faults become observable with an inserted register: a test point (TP).
[Diagram: example circuit with inputs i0-i10, internal gates A and B whose inputs are "almost always 0", a stuck-at-0 fault to GND, and output O.]
MOTIVATION
Test Point Insertion Problem: pick the smallest number of test points that achieves the largest testability enhancement.
• Number of test points → chip area cost
• Number of test patterns → test time
• Hard problem; only approximate solutions exist
• Commercial solution: Synopsys TetraMAX
Can we improve it with machine learning?
• Predict testability
• Select test points
ML BASED TESTABILITY PREDICTION
Given a circuit, predict which gate outputs are difficult-to-test (DT).
• Gate features: [logic level, SCOAP_C0, SCOAP_C1, SCOAP_OB]
• Gate label: DT (0 or 1), generated by TetraMAX
Example (input features → output classification):
  N1: 0, 0, 1, 1 → 0
  N2: 1, 0, 1, 0 → 1
  N3: 2, 0, 1, 1 → 0
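A minimal sketch of how the per-gate dataset could be assembled (hypothetical container names, not a tool API): each gate contributes a 4-dimensional feature vector and a 0/1 DT label from the ATPG tool.

```python
# Sketch only: gate_attrs and dt_labels are hypothetical inputs, not tool APIs.
import numpy as np

def build_dataset(gate_attrs, dt_labels):
    """gate_attrs: {gate_name: (logic_level, scoap_c0, scoap_c1, scoap_ob)}
       dt_labels:  {gate_name: 0 or 1}  # 1 = difficult-to-test (from ATPG)"""
    names = sorted(gate_attrs)
    X = np.array([gate_attrs[n] for n in names], dtype=np.float32)  # (N, 4) features
    y = np.array([dt_labels[n] for n in names], dtype=np.int64)     # (N,) labels
    return names, X, y
```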
BASIC MACHINE LEARNING MODELING
• Feature vector for node a: concatenate a's features with those of nodes in its fan-in and fan-out cones, F(a) = [F_a, F_1, F_2, ..., F_10]
• Models: logistic regression (LR), random forest (RF), SVM, MLP
• Limitation: does not fully leverage the inductive bias of the circuit structure
[Diagram: node a with numbered fan-in and fan-out neighbors; different models disagree on whether a is DT.]
GRAPH CONVOLUTIONAL NETWORK (GCN)
Each GCN layer performs:
• Aggregation of neighbor features (mean or sum)
• Encoding with a learned transformation and ReLU (R^4 → R^32)
[Diagram: a node aggregating features from its neighbors in the circuit graph.]
GCN BASED TESTABILITY PREDICTION
• Layer 1: weighted sum & ReLU (R^4 → R^32)
• Layer 2: weighted sum & ReLU (R^32 → R^64)
• Layer 3: weighted sum & ReLU (R^64 → R^128)
• Fully connected layers (64, 64, 128, 2) produce the DT / non-DT classification
[Diagram: three stacked GCN layers followed by fully connected classification layers.]
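A PyTorch sketch of this architecture as read from the slides: each GCN layer mean-aggregates neighbor features and encodes them with a ReLU layer (4 → 32 → 64 → 128), followed by a fully connected head. The self/neighbor concatenation and the exact head widths are assumptions; the slide only lists "(64, 64, 128, 2)".

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # encode concatenated [self features, aggregated neighbor features]
        self.encode = nn.Linear(2 * in_dim, out_dim)

    def forward(self, feats, adj):
        # adj: (N, N) sparse 0/1 adjacency; mean-aggregate neighbor features
        deg = torch.sparse.sum(adj, dim=1).to_dense().clamp(min=1).unsqueeze(1)
        neigh = torch.sparse.mm(adj, feats) / deg
        return torch.relu(self.encode(torch.cat([feats, neigh], dim=1)))

class TestabilityGCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([GCNLayer(4, 32), GCNLayer(32, 64), GCNLayer(64, 128)])
        # FC head widths are one plausible reading of "(64, 64, 128, 2)"
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                                  nn.Linear(64, 64), nn.ReLU(),
                                  nn.Linear(64, 2))   # DT / non-DT logits

    def forward(self, feats, adj):
        for layer in self.layers:
            feats = layer(feats, adj)
        return self.head(feats)
```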
ACCURACY IMPACT OF GCN LAYERS (K)
[Plots: training accuracy (%) and testing accuracy (%) versus training epochs (1-271) for K = 1, 2, and 3 GCN layers.]
EMBEDDING VISUALIZATION
• Embeddings look more discriminative as the number of GCN layers (K) increases.
[Embedding visualizations for K = 1, K = 2, and K = 3.]
MODEL COMPARISON ON BALANCED DATASET
• Baselines: basic ML models (LR, RF, MLP, SVM) using N = 500 nodes in the fan-in cone and 500 nodes in the fan-out cone, 1,000 nodes in total
• Fewer than 1,000 nodes influence each node, so the baselines are comparable to the 3-layer GCN
• GCN has the best accuracy (93%)
[Bar chart: precision, recall, F1 score, and accuracy for each model.]
TEST POINT INSERTION WITH GCN MODEL
An iterative TP-selection process enabled by the GCN model:
1. Run the GCN model on the circuit graph to predict DT nodes and generate TP candidates.
2. Estimate each candidate's impact: the number of DT nodes removed in the fan-in cone of the TP.
3. Select the candidate with the largest predicted impact and insert the new TP (graph modification).
4. Repeat until done; output the final TPs.
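A hypothetical sketch of the iterative selection loop; the callables stand in for the GCN inference, candidate generation, impact estimation, and graph-modification steps and are not a real API.

```python
def insert_test_points(graph, predict_dt, propose_candidates,
                       estimate_impact, insert_tp, max_tps):
    """Greedy, model-guided test-point selection (illustrative only)."""
    selected = []
    for _ in range(max_tps):
        dt_nodes = predict_dt(graph)                 # GCN inference: predicted DT nodes
        if not dt_nodes:
            break                                    # nothing left that is hard to test
        candidates = propose_candidates(graph, dt_nodes)
        # impact = number of DT nodes removed in the fan-in cone of the candidate TP
        best = max(candidates, key=lambda tp: estimate_impact(graph, tp, dt_nodes))
        graph = insert_tp(graph, best)               # graph modification with the new TP
        selected.append(best)
    return selected
```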
TEST POINT INSERTION RESULTS COMPARISON
Machine learning can improve approximate solutions for hard problems:
• 11% fewer test points with 6% fewer test patterns at the same coverage vs. TetraMAX
[Bar chart: test point reduction and test pattern reduction on four designs.]
MODEL SCALABILITY
Choices of model implementation:
• Batch processing: recursion over node neighborhoods
• Full graph: sparse matrix multiplication, F^(l) = ReLU((A · F^(l-1)) · W^(l)), where A is the adjacency matrix and W^(l) the layer weights
• Tradeoff: memory vs. speed
• Throughput: ~1M nodes/second on a Volta GPU
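A minimal sketch of the full-graph propagation step above, assuming the adjacency matrix is held as a torch sparse tensor; normalization and bias terms are omitted.

```python
import torch

def propagate(adj_sparse, feats, weight):
    # F_l = ReLU((A @ F_{l-1}) @ W_l) with a sparse adjacency matrix A
    return torch.relu(torch.sparse.mm(adj_sparse, feats) @ weight)
```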
MULTI GPU TRAINING
• The training dataset contains multi-million-gate designs that cannot fit on one GPU.
• Data parallelism: each GPU computes one design/graph.
• The model is replicated (shared weights) across multiple GPUs.
• Leverages the PyTorch DataParallel module.
• Trained with 4 Tesla V100 GPUs on a DGX-1.
[Diagram: Graph1-Graph4 dispatched to GPU1-GPU4, each holding a replica of the shared model.]
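A rough sketch of the data-parallel setup, using torch.nn.DataParallel to replicate a stand-in node classifier across four GPUs so each replica processes one (padded) graph from the batch; the dense, padded batching is an assumption made so inputs can be split along dimension 0.

```python
import torch
import torch.nn as nn

class NodeClassifier(nn.Module):              # stand-in for the GCN model
    def __init__(self, in_dim=4, hidden=64, classes=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, classes))

    def forward(self, feats, adj):
        # one batched propagation step, then per-node classification
        return self.net(torch.bmm(adj, feats))

model = nn.DataParallel(NodeClassifier().cuda(), device_ids=[0, 1, 2, 3])
feats = torch.randn(4, 1000, 4).cuda()        # 4 padded graphs, 1000 nodes each
adj = torch.rand(4, 1000, 1000).cuda()        # dense placeholder adjacency
logits = model(feats, adj)                    # each GPU handles one graph
```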
IMBALANCE ISSUE
It is very common to have far more non-DTs (negative class) than DTs (positive class); the imbalance ratio can exceed 100x.

Classifier 1: acceptable precision, low recall (Recall: 10.5%, Precision: 59.8%)
            Predict: 0   Predict: 1
  Fact: 0     133576          290
  Fact: 1       3681          432

Classifier 2: high recall, low precision (Recall: 97.3%, Precision: 11.0%)
            Predict: 0   Predict: 1
  Fact: 0     100919        32927
  Fact: 1        114         4069
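For reference, the quoted recall and precision values follow directly from the confusion-matrix counts; a small Python check using the numbers above:

```python
# (tn, fp, fn, tp) taken from the two confusion matrices above
def recall_precision(tn, fp, fn, tp):
    return tp / (tp + fn), tp / (tp + fp)

print(recall_precision(133576, 290, 3681, 432))    # classifier 1: ~ (0.105, 0.598)
print(recall_precision(100919, 32927, 114, 4069))  # classifier 2: ~ (0.973, 0.110)
```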
MULTI-STAGE CLASSIFICATION
• The networks at the initial stages only filter out negative data points with high confidence (high recall, low precision).
• Positive predictions are sent to the network at the next stage.
[Diagram: Network 1 → Network 2 → Network 3, with negatives dropped at each stage and positives passed along.]
MULTI-STAGE CLASSIFICATION RESULT
Balanced recall and precision:

Stage 1 (Recall: 97.3%, Precision: 11.0%)
            Pred: 0   Pred: 1
  Fact: 0    100919     32927
  Fact: 1       114      4069

Stage 2 (Recall: 94.6%, Precision: 39.1%)
            Pred: 0   Pred: 1
  Fact: 0     26935      5992
  Fact: 1       221      3848

Stage 3 (Recall: 92.0%, Precision: 81.8%)
            Pred: 0   Pred: 1
  Fact: 0      5207       785
  Fact: 1       309      3539

Overall (Recall: 86.0%, Precision: 81.8%)
            Pred: 0   Pred: 1
  Fact: 0    133061       785
  Fact: 1       574      3539
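A sketch of how the cascade could be applied at inference time, assuming each stage is a trained classifier exposing a scikit-learn-style predict(); only samples predicted positive at one stage reach the next.

```python
import numpy as np

def cascade_predict(stages, X):
    """Run a multi-stage cascade: a sample is positive only if every stage says so."""
    pred = np.zeros(len(X), dtype=int)
    active = np.arange(len(X))                 # samples still considered positive
    for clf in stages:
        keep = clf.predict(X[active]) == 1     # positives survive to the next stage
        active = active[keep]
        if active.size == 0:
            break
    pred[active] = 1                           # positives that survived every stage
    return pred
```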
PART 1 - SUMMARY
• Machine learning can improve VLSI design testability beyond the existing solution
  - Predictive power of the ML model
• Graph-based models are well suited to VLSI problems
• Practical issues such as scalability and data imbalance need to be dealt with
PART 2
Full Chip FinFET Self-heat Prediction using Machine Learning
Miloni Mehta, Chi Keung Lee, Chintan Shah, Kirk Twardowski
VLSI TESTABILITY AND RELIABILITY
[Recap of the design → manufacturing → wafer → chip flow: testing determines pass/fail (testability); operation in the field over years determines reliability.]
SEMICONDUCTOR RELIABILITY
Source: https://semiengineering.com/improving-automotive-reliability/
RELIABILITY: DEVICE SELF-HEAT (SH)
• Active power in transistors is dissipated as heat to the surroundings.
• FinFETs are more sensitive to SH than planar devices.
Why do we care?
• Exacerbates electromigration (EM) on interconnects
• Shifts the transistor threshold voltage (Vt)
• Time-dependent dielectric breakdown (TDDB)
[Plot: 16FF EM limit reduction vs. temperature — EM rating factor (Imax) over a 90-170 °C temperature range.]
SH METHODOLOGIES SO FAR
• No sign-off tool can handle full-chip SH analysis.
• Limitations of Spice simulations: impractical to run on billions of transistors; teams review high-power-density cells instead.
• 2D look-up table (LUT) approach:
  - Based on frequency and capacitive loading, for different clock drivers
  - Reduces run time by more than 90% over full Spice simulations
  - Pessimistic with respect to Spice
[Plot: 2D LUT vs. Spice comparison of temperature (°C), showing the LUT's pessimism.]
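As an illustration of the 2D LUT idea (not the production tables), self-heat can be read from a pre-characterized frequency × capacitive-load grid with interpolation; all axis ranges and table values below are placeholders.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

freq_ghz = np.array([0.5, 1.0, 1.5, 2.0])           # table axes (assumed ranges)
cap_ff = np.array([1.0, 5.0, 10.0, 20.0])
sh_table = np.linspace(0.1, 2.0, 16).reshape(4, 4)  # placeholder delta-T values (C)

lut = RegularGridInterpolator((freq_ghz, cap_ff), sh_table)
delta_t = lut([[1.2, 8.0]])                         # SH estimate for one clock driver
```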
SELF-HEAT TRENDS
• Frequency ∝ SH
• Capacitive loading ∝ SH
• Cell size ∝ 1/SH
• Resistance ∝ 1/SH (non-linear)
[Plots: normalised SH and predicted SH vs. R/C (×1e12) and vs. frequency.]
MOTIVATION TO USE ML
• Identify problematic cells in the design without exhaustive Spice simulations
• Complex relationship between design parameters and SH
• Design database available for several projects; reusability across projects
Focus:
• Clock inverters and buffers
• Quick, easy, light-weight
• Rank cells above a certain SH threshold for thorough analysis
MACHINE LEARNING MODEL
Training flow: select training data → get attributes (X_train) from PrimeTime → simulate in HSPICE to obtain labels (Y_train) → generate the ML model.
Validation flow: select test data → get attributes (X_test) from PrimeTime → simulate in HSPICE (Y_test) → predict on the test set (Y_pred) → check whether predicted ≈ Spice; if yes, the model is ready for deployment, otherwise iterate.
DATASET SELECTION
• Cover a wide range of frequencies
• Cover different types and sizes of standard cells
• Prevent duplication in training data due to replicated partitions/chiplets
• Include outliers in the design
• Labels obtained through Spice simulations (supported by foundry Spice models)
• TSMC 16 nm FinFET process; training used 4,300 samples with 9 features
DNN REGRESSOR MODEL
Features (9 per cell):
• Output capacitance
• Frequency
• Cell size
• Net resistance
• Input slew
• Output slew
• Number of output loads
• Input capacitance of loads
• Average transition on load
Architecture: input layer (9 features) → 3 hidden layers → output layer producing the predicted self-heat Ŷ.
Cost function: Cost = Σ (Y_pred − Y)², summed over the N training samples.
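A minimal PyTorch sketch of such a regressor; the hidden-layer widths and optimizer settings are assumptions, since the slides only specify nine input features, three hidden layers, and a squared-error cost.

```python
import torch
import torch.nn as nn

class SelfHeatRegressor(nn.Module):
    def __init__(self, n_features=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),   # hidden layer 1 (width assumed)
            nn.Linear(64, 64), nn.ReLU(),           # hidden layer 2
            nn.Linear(64, 32), nn.ReLU(),           # hidden layer 3
            nn.Linear(32, 1))                       # predicted (normalized) self-heat

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = SelfHeatRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                              # squared-error cost from the slide

def train_step(x_batch, y_batch):
    optimizer.zero_grad()
    loss = loss_fn(model(x_batch), y_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```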