Machine Learning For Feature-Based Analytics
Li-C. Wang, University of California, Santa Barbara
ISPD 2018, Monterey, CA
Machine Learning
[Diagram: Data → Learning Machine → Model]
Machine Learning is supposed to construct an "optimal" model to fit the data (whatever "optimal" means).
ML Tools: e.g. http://scikit-learn.org/
Dataset Format
[Diagram: a table of samples (rows) by features (columns), each row forming a vector, with an optional label per sample]
A learning tool usually takes a dataset in the form above:
– Samples: the examples to be reasoned about
– Features: the aspects used to describe a sample
– Vectors: the resulting vector representation of each sample
– Labels: the behavior of interest to be learned (optional)
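As a minimal illustration of this format (entirely made-up numbers), the vectors form a samples-by-features matrix X and the optional labels form a vector y, which is what a tool such as scikit-learn consumes directly:

```python
# A minimal sketch of the dataset format above, with hypothetical data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# 6 samples, each described by 3 features -> a 6x3 matrix of vectors
X = np.array([
    [0.1, 1.0, 3],
    [0.4, 0.9, 2],
    [0.5, 0.2, 7],
    [0.9, 0.1, 8],
    [0.3, 0.8, 1],
    [0.8, 0.3, 9],
])
# Optional labels: the behavior to be learned (a binary class per sample)
y = np.array([0, 0, 1, 1, 0, 1])

model = DecisionTreeClassifier().fit(X, y)   # supervised learning on (X, y)
print(model.predict([[0.7, 0.2, 8]]))        # reason about a new sample
```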
Notable ML Applications in Recent Years
Self-Driving Cars, Mobile Google Translation, AlphaGo (Google), Smart Robots
*These images are in the public domain
Take Image Recognition As An Example
ImageNet: Large Scale Visual Recognition Challenge (http://www.image-net.org/challenges/LSVRC/)
– 1000 object classes, 1.4M images
[Chart: Top-5 error rate by year – 28.2% (2010), 25.8% (2011), 16.4% (2012, 8-layer AlexNet), 11.7% (2013, 8-layer ZFNet), 7.3% (2014, 19-layer VGG), 6.7% (2014, 22-layer GoogLeNet), 3.57% (2015, 152-layer ResNet); human ≈ 5.1%; 2016 CUImage used 269 layers]
Also see: O. Russakovsky et al., arXiv:1409.0575v3 [cs.CV], 2014
Deep Learning for Image Recognition
ImageNet: Large Scale Visual Recognition Challenge (http://www.image-net.org/challenges/LSVRC/)
– 1000 object classes, 1.4M images
[Same Top-5 error rate chart as the previous slide]
1st Enabler: the availability of a large dataset to enable the study of deeper neural networks
Also see: O. Russakovsky et al., arXiv:1409.0575v3 [cs.CV], 2014
Deep Learning for Image Recognition
ImageNet: Large Scale Visual Recognition Challenge (http://www.image-net.org/challenges/LSVRC/)
– 1000 object classes, 1.4M images
[Same Top-5 error rate chart as the previous slide]
2nd Enabler: the availability of efficient hardware to enable training such large neural networks
Also see: O. Russakovsky et al., arXiv:1409.0575v3 [cs.CV], 2014
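The metric in the chart is the top-5 error rate: a prediction counts as correct if the true class appears among the model's five highest-scoring classes. A minimal sketch of that computation, with made-up scores (the function name and data are illustrative only):

```python
import numpy as np

def top5_error(scores, true_labels):
    """scores: (num_samples, num_classes) class scores;
    true_labels: (num_samples,) true class indices."""
    top5 = np.argsort(scores, axis=1)[:, -5:]           # 5 highest-scoring classes per sample
    hit = np.any(top5 == true_labels[:, None], axis=1)  # is the true class among them?
    return 1.0 - hit.mean()

# Made-up example: 4 samples, 10 classes
rng = np.random.default_rng(0)
scores = rng.random((4, 10))
true_labels = np.array([3, 7, 1, 9])
print(top5_error(scores, true_labels))
```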
Question Often Asked By A Practitioner
Which tool is better?
In many EDA/Test applications, it is not just about the tool!
Applications – Experience
[Diagram mapping ML technique types to example applications across design/test stages]
– Techniques: Supervised learning (Classification, Regression), Unsupervised learning (Transformation, Clustering), Rule Learning, Outlier analysis
– Applications: Delay test, Fmax prediction, Layout hotspot, Test cost reduction, Functional verification, Post-Si validation, Yield, Customer return, Design-silicon timing correlation
– Stages: Pre-silicon, Post-silicon, Post-shipping
See: Li-C. Wang, "Experience of Data Analytics in EDA and Test – Principles, Promises, and Challenges," TCAD, Vol. 36, Issue 6, June 2017
Challenges in Machine Learning for EDA/Test
Data
– Data can be rather limited
– Data can be extremely unbalanced (very few positive samples of interest, many negative samples)
– Cross-validation is not an option
Model Evaluation
– The meaningfulness of a model is specific to the context
– Model evaluation can be rather expensive
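As one illustration of the imbalance point (not a recipe from the talk), scikit-learn classifiers can at least re-weight the rare class so the few positives are not simply ignored; a sketch with synthetic data where only 5 of 1000 samples are positive:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X_neg = rng.normal(0, 1, size=(995, 10))   # many negative samples
X_pos = rng.normal(2, 1, size=(5, 10))     # very few positive samples of interest
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 995 + [1] * 5)

# class_weight='balanced' re-weights samples inversely to class frequency
clf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X, y)
print(clf.predict(X_pos))                  # check that the positives are recovered
```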
e.g. Functional Verification
[Diagram: Simulation Environment – Design + Functional Tests → Simulation Traces → Coverage Point CP]
Goal: achieve more coverage on CP
Approach: analyze simulation traces to find out
– What combination of signals can activate CP?
Features: the testbench-controllable signals
Data: few or no samples that cover CP
– Positive samples: 0 to a few
– Negative samples: 1K to a few K
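A sketch of how this could be set up with a scikit-learn decision tree; the signal names, the synthetic traces, and the planted activation condition are all hypothetical (in the real setting the positives would be far rarer):

```python
# Learn which combination of testbench-controllable signals co-occurs with hitting CP.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

signals = ["mode", "enable", "stall"]        # hypothetical controllable signals (features)
rng = np.random.default_rng(2)
traces = rng.integers(0, 2, size=(2000, 3))  # one row per simulation trace
# In this synthetic data, CP is covered only when mode=1 and stall=0
covered = (traces[:, 0] == 1) & (traces[:, 2] == 0)

tree = DecisionTreeClassifier(max_depth=3).fit(traces, covered)
print(export_text(tree, feature_names=signals))  # a readable rule for activating CP
```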
e.g. Physical Verification
[Layout snippet: Start → End]
Goal: model the causes of an issue
Approach: analyze snippets of layout images to find out
– What combination of features can cause the issue?
Features: f1, f2, ..., fn are developed based on domain knowledge to characterize geometry or material properties
Data: few samples for a particular type of issue
– Positive samples: 1 to a few
– Negative samples: many
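When there is literally one positive sample, even rule learning may be out of reach; a hedged sketch (not the talk's method) of a similarity search that ranks the many non-failing layout clips by feature-space distance to the single failing clip, so a reviewer can inspect the most similar-looking ones first. The feature vectors and counts are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)
X_neg = rng.normal(0, 1, size=(5000, 12))   # feature vectors of non-failing clips
x_pos = rng.normal(0, 1, size=12)           # the single failing clip

dist = np.linalg.norm(X_neg - x_pos, axis=1)
closest = np.argsort(dist)[:10]             # 10 most similar clips for manual review
print(closest, dist[closest].round(2))
```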
e.g. Timing Verification
[Flow: Step 1 → Step 2 → Step 3 → Step 4; STA slack plot: e.g. Slack > x (350 paths total) vs. Slack <= x (130 paths total)]
Goal: model the causes of a mis-predicted silicon critical path
Approach: analyze unexpected silicon critical paths
– What combination of design features can cause an unexpected critical path?
Features: f1, f2, ..., fn are developed based on design knowledge to characterize a timing path
Data: few samples for a particular type of critical path
– Positive samples: 1 to a few
– Negative samples: many (STA-critical but not silicon-critical – about 25K paths)
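A sketch of one simple way to contrast the handful of unexpected silicon-critical paths against the many STA-critical-only paths: rank features by how far the positives sit outside the negative distribution. The path features, the data, and the z-score ranking are illustrative assumptions, not the talk's method:

```python
import numpy as np

feature_names = ["wire_len", "num_vias", "fanout", "cell_vt_mix"]  # hypothetical path features
rng = np.random.default_rng(3)
X_neg = rng.normal(0, 1, size=(25000, 4))   # STA-critical but not silicon-critical
X_pos = rng.normal(0, 1, size=(3, 4))       # unexpected silicon-critical paths
X_pos[:, 1] += 4                            # synthetic effect: unusually many vias

# Rank features by how far the positive mean lies outside the negative distribution
z = np.abs(X_pos.mean(axis=0) - X_neg.mean(axis=0)) / X_neg.std(axis=0)
for name, score in sorted(zip(feature_names, z), key=lambda t: -t[1]):
    print(f"{name:12s} z-score {score:.2f}")
```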
e.g. Yield
[Figure: test value distribution (μ ± σ) vs. test limits across wafers – 77 GHz @ hot, 76 GHz @ cold]
Goal: find a recipe to improve yield
Approach: analyze wafer yield data together with process parameters
– Tuning what combination of process parameters can improve yield?
Features: f1, f2, ..., fn are tunable process parameters
Data: samples can be parts or wafers
– Positive samples: failing parts or low-yield wafers
– Negative samples: others
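A sketch of one possible starting point for this use case (hypothetical parameter names and synthetic data): regress wafer yield on the tunable process parameters and inspect which parameters the model relies on. Feature importances only suggest which knobs matter; any actual recipe would still need engineering judgment and silicon confirmation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

params = ["implant_dose", "etch_time", "anneal_temp"]   # hypothetical tunable parameters
rng = np.random.default_rng(4)
X = rng.normal(0, 1, size=(300, 3))                     # one row per wafer
yield_pct = 90 + 3 * X[:, 2] - 2 * X[:, 0] ** 2 + rng.normal(0, 1, 300)

model = GradientBoostingRegressor(random_state=0).fit(X, yield_pct)
for name, imp in zip(params, model.feature_importances_):
    print(f"{name:14s} importance {imp:.2f}")
```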
Feature-Based Analytics
Problem:
– Search for a combination of features or feature values among a large set of features
Data:
– Interested in the positive samples
– Extremely unbalanced: many more negative samples and very few positive samples
Not a traditional feature selection problem:
– Insufficient data
– Cannot apply cross-validation to check a model
In Practice, This Is What Happens
[Diagram: from n features to consider, select a feature subset → Run ML → Check Result, repeated over and over]
Learning from data becomes an iterative search process (usually run by a person).
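A compact sketch of this loop, with placeholder data and a placeholder "check result" criterion (training-set F1 here, purely for illustration): enumerate candidate feature subsets, run ML on each, and keep whichever result checks out best. In practice the "check" is the analyst's judgment, not a single score.

```python
from itertools import combinations
import numpy as np
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = rng.normal(0, 1, size=(200, 8))
y = ((X[:, 2] > 1.0) & (X[:, 5] < 0.0)).astype(int)   # synthetic target combination

best_subset, best_score = None, -1.0
for subset in combinations(range(8), 2):              # candidate feature subsets
    cols = list(subset)
    model = DecisionTreeClassifier(max_depth=2).fit(X[:, cols], y)
    score = f1_score(y, model.predict(X[:, cols]))    # the "check result" step
    if score > best_score:
        best_subset, best_score = subset, score
print(best_subset, round(best_score, 2))
```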
An Iterative Search Process
[Diagram: Data → Sample Selection / Feature Selection → Dataset → Model Construction → Models → Model Evaluation → Meaningful Models; Sample Selection, Feature Selection, and Model Evaluation form the Analyst Layer, while Model Construction is the Machine Learning Toolbox]
Learning is an iterative search process. The analyst
– (1) prepares the datasets to be analyzed
– (2) determines whether the results are meaningful
The effectiveness depends on how the analyst conducts these two steps – not just on the tool in use!
Implications
[Same Analyst Layer / Machine Learning Toolbox diagram as the previous slide]
The effectiveness of the search largely depends on how the Analyst Layer is conducted.
Implications
[Same Analyst Layer / Machine Learning Toolbox diagram]
The Analyst Layer demands a Machine Learning Toolbox in which a model can be assessed WITHOUT cross-validation.
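One concrete example of a cross-validation-free check (a simple stand-in, not the specific toolbox the talk builds toward): score a candidate rule directly by its support and precision on the available data, and leave the final meaningfulness judgment to the analyst.

```python
import numpy as np

def rule_support_precision(rule_mask, labels):
    """rule_mask: boolean array, True where the rule fires; labels: 0/1 array."""
    covered = labels[rule_mask]
    support = covered.sum()                            # positives covered by the rule
    precision = covered.mean() if len(covered) else 0.0
    return support, precision

# Hypothetical data: 5 positives out of 100 samples; the rule fires on 10 samples
labels = np.array([0] * 95 + [1] * 5)
rule_mask = np.array([False] * 90 + [True] * 10)
print(rule_support_precision(rule_mask, labels))
```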
Implications
[Same Analyst Layer / Machine Learning Toolbox diagram]
Automation requires automating both the Analyst Layer and the Machine Learning Toolbox.
Machine Learning Toolbox
Questions
Recall the main issue: we can't apply cross-validation.
– Why do we need cross-validation?
– How can a machine learning algorithm guarantee the accuracy of its output model?
– What is a machine learning algorithm trying to optimize anyway?
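For contrast, this is what the standard assessment looks like; a minimal scikit-learn sketch with synthetic, extremely unbalanced data (all numbers are made up). With only a handful of positives, each fold sees at most one of them, so the cross-validated score is too noisy to mean anything:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
X = rng.normal(0, 1, size=(100, 5))
y = np.array([0] * 95 + [1] * 5)            # extremely unbalanced labels

# 5-fold cross-validation: each fold's test set contains only one positive sample
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5, scoring="f1")
print(scores)                                # unstable, often 0, with so few positives
```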
Five Assumptions of Machine Learning
[Diagram: a Sample Generator G draws samples (x, y) according to a distribution D, with y = f(x) for a target function f in a hypothesis space H; a Learning Algorithm L takes the m samples and outputs a hypothesis h]
1) An assumption on D (i.e., the sample distribution is not time-varying)
2) A restriction on H (otherwise, the No-Free-Lunch theorem applies)
3) Assuming the sample size m is of order O(poly(n)), where n = # of features
4) Making sure a practical algorithm L exists
5) Assuming a way to measure error, e.g. Err(f(x), h(x))
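For reference (an addition, not stated on the slide): the classic PAC-learning bound for a finite hypothesis space H ties assumptions (2)-(5) together. With probability at least 1 - δ, any hypothesis h consistent with the training data has error at most ε, provided the number of samples satisfies

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)
```

which is polynomial in 1/ε, 1/δ, and ln|H| – the sense in which assumption (3) makes learning tractable.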
In Practice
[Same learning-setup diagram as the previous slide]
Because we don't know how complex H should be, we assume the most complex H we can afford in training.