Imbalance Aware Lithography Hotspot Detection: A Deep Learning Approach
Haoyu Yang¹, Luyang Luo¹, Jing Su², Chenxi Lin², Bei Yu¹
¹ The Chinese University of Hong Kong  ² ASML Brion Inc.
Mar. 1, 2017
1 / 34
Outline
Introduction
Network Architecture
Imbalance Aware Learning
Experimental Results
2 / 34
Outline
Introduction
Network Architecture
Imbalance Aware Learning
Experimental Results
3 / 34
Moore’s Law to Extreme Scaling 3 / 34
Lithography Hotspot Detection
[Plot: ratio of lithography simulation time (normalized by the 40nm node) vs. technology node, showing the required computational time reduction]
◮ What you see ≠ what you get
◮ Even with RETs: OPC, SRAF, MPL
◮ Hotspots remain: low-fidelity patterns
◮ Simulations: extremely CPU intensive
5 / 34
Layout Verification Hierarchy
[Pyramid diagram: Sampling → Hotspot Detection → Lithography Simulation, with verification accuracy and (relative) CPU runtime increasing at each level]
◮ Sampling: scan and rule-check each region
◮ Hotspot Detection: verify the sampled regions and report potential hotspots
◮ Lithography Simulation: final verification on the reported hotspots
6 / 34
Pattern Matching based Hotspot Detection
[Diagram: layout clips are matched against a library of known hotspot patterns]
7 / 34
Pattern Matching based Hotspot Detection
[Diagram: hotspots present in the library are detected, but hotspots not in the library go undetected]
◮ Fast and accurate
◮ [Yu+,ICCAD'14] [Nosato+,JM3'14] [Su+,TCAD'15]
◮ Fuzzy pattern matching [Wen+,TCAD'14]
◮ Hard to detect unseen patterns
7 / 34
Machine Learning based Hotspot Detection
[Diagram: extract layout features → classification model → hotspot detection]
8 / 34
Machine Learning based Hotspot Detection
[Diagram: extract layout features → classification model → hotspot / non-hotspot; hard to trade off accuracy and false alarms]
◮ Predicts new patterns
◮ Decision tree, ANN, SVM, Boosting, ...
◮ [Drmanac+,DAC'09] [Ding+,TCAD'12] [Yu+,JM3'15] [Matsunawa+,SPIE'15] [Yu+,TCAD'15] [Zhang+,ICCAD'16]
◮ Crafted features are not satisfactory
◮ Hard to handle ultra-large datasets
8 / 34
Why Deep Learning?
◮ Feature Crafting vs. Feature Learning
Although prior knowledge is considered during manual feature design, information loss is inevitable. Features learned from a large dataset are more reliable.
◮ Scalability
As circuit feature sizes shrink, mask layouts become more complicated. Deep learning has the potential to handle ultra-large-scale instances, while traditional machine learning may suffer from performance degradation.
◮ Mature Libraries
Caffe [Jia+,ACMMM'14] and TensorFlow [Martin+,TR'15]
9 / 34
Hotspot-Oriented Deep Learning
Deep learning has been widely applied in object recognition tasks, but the nature of mask layouts prevents existing frameworks from being applied directly.
◮ Imbalanced Dataset
Lithographic hotspots are always the minority.
◮ Larger Image Size
The effective clip region (> 1000 × 1000 pixels) is much larger than the image sizes in traditional computer vision problems.
◮ Sensitive to Scaling
Scaling a mask layout pattern modifies its attributes.
10 / 34
Deep Learning based Hotspot Detection Flow
[Flow diagram: the training data set goes through upsampling and random mirroring to produce the trained model; the validation data set is used for model validation (accuracy, false alarm); the testing data set is used for model testing with the trained model]
11 / 34
Outline
Introduction
Network Architecture
Imbalance Aware Learning
Experimental Results
12 / 34
CNN Architecture Overview
◮ Convolution Layer
◮ Rectified Linear Unit (ReLU)
◮ Pooling Layer
◮ Fully Connected Layer
[Diagram: repeated CONV → ReLU (max(0,x)) → POOL stages followed by fully connected (FC) layers that output hotspot / non-hotspot]
12 / 34
Convolution Layer
Convolution operation:
I ⊗ K(x, y) = Σ_{i=1}^{c} Σ_{j=1}^{m} Σ_{k=1}^{m} I(i, x − j, y − k) K(j, k)
[Diagram: CONV → ReLU → POOL stages followed by FC layers, with the convolution layer highlighted]
13 / 34
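As a rough sketch (not the authors' code), the operation above can be written directly in NumPy. Here the clip I has shape (c, H, W) and a single m × m kernel K is shared over the c input channels, matching the formula; in a full CNN each output channel would have its own kernel stack.

```python
import numpy as np

def conv2d(I, K):
    """Multi-channel 2-D convolution following the formula on this slide.

    I : input clip of shape (c, H, W)  -- c channels
    K : single m x m kernel, shared across the c input channels
    Returns the "valid" part of the convolution (stride 1, no padding),
    of shape (H - m + 1, W - m + 1).
    """
    c, H, W = I.shape
    m = K.shape[0]
    K_flipped = K[::-1, ::-1]                       # flipping turns correlation into convolution
    out = np.zeros((H - m + 1, W - m + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            window = I[:, x:x + m, y:y + m]          # (c, m, m) region under the kernel
            out[x, y] = np.sum(window * K_flipped)   # sum over i, j, k
    return out

# tiny usage example
I = np.random.rand(3, 8, 8)   # 3-channel 8x8 "clip"
K = np.random.rand(3, 3)      # 3x3 kernel
print(conv2d(I, K).shape)     # (6, 6)
```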
Convolution Layer (cont.)
Effect of different convolution kernel sizes: (a) 7 × 7  (b) 5 × 5  (c) 3 × 3

Kernel Size   Padding   Test Accuracy*
7 × 7         3         87.50%
5 × 5         2         93.75%
3 × 3         1         96.25%

* Stopped after 5000 iterations.
14 / 34
Rectified Linear Unit
[Diagram: CONV → ReLU → POOL stages followed by FC layers, with the ReLU highlighted]
◮ Alleviates overfitting with sparse feature maps
◮ Avoids the gradient vanishing problem

Activation Function   Expression                        Validation Loss
ReLU                  max{x, 0}                         0.16
Sigmoid               1 / (1 + exp(−x))                 87.0
TanH                  (exp(2x) − 1) / (exp(2x) + 1)     0.32
BNLL                  log(1 + exp(x))                   87.0
WOAF                  NULL                              87.0
15 / 34
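For reference, the four expressions in the table can be written down directly; a minimal NumPy sketch (function names are mine):

```python
import numpy as np

def relu(x):    return np.maximum(x, 0.0)        # max{x, 0}
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))  # 1 / (1 + exp(-x))
def tanh(x):    return np.tanh(x)                # (exp(2x) - 1) / (exp(2x) + 1)
def bnll(x):    return np.log1p(np.exp(x))       # log(1 + exp(x))
```

ReLU zeroes negative inputs, which keeps feature maps sparse, and its gradient is exactly 1 for positive inputs, avoiding the saturation that makes sigmoid and TanH gradients vanish.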
Pooling Layer
[Diagram: CONV → ReLU → POOL stages followed by FC layers, with the pooling layer highlighted]
◮ Extracts the local-region statistical attributes of the feature map

Example: 2 × 2 pooling of a 4 × 4 feature map
  1  2  3  4
  5  6  7  8
  9 10 11 12
 13 14 15 16
(a) max pooling:        (b) avg pooling:
  6  8                    3.5  5.5
 14 16                   11.5 13.5
16 / 34
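A minimal sketch (assuming non-overlapping windows, i.e. stride equal to the kernel size, as in the 2 × 2 example above) that reproduces both pooling outputs; the function name is mine:

```python
import numpy as np

def pool2d(x, k=2, mode="max"):
    """Non-overlapping k x k pooling (stride = k) over a 2-D feature map."""
    H, W = x.shape
    blocks = x[:H // k * k, :W // k * k].reshape(H // k, k, W // k, k)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(1, 17, dtype=float).reshape(4, 4)
print(pool2d(x, mode="max"))   # [[ 6.  8.] [14. 16.]]
print(pool2d(x, mode="avg"))   # [[ 3.5  5.5] [11.5 13.5]]
```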
Pooling Layer (cont.)
◮ Translation invariant (✘)
◮ Dimension reduction

Effect of pooling methods:
Pooling Method   Kernel   Test Accuracy
Max              2 × 2    96.25%
Ave              2 × 2    96.25%
Stochastic       2 × 2    90.00%
17 / 34
Fully Connected Layer
◮ The fully connected layer transforms high-dimensional feature maps into a flattened vector.
[Diagram: CONV → ReLU → POOL stages followed by FC layers, with the fully connected layer highlighted]
18 / 34
Fully Connected Layer (cont.)
◮ A percentage of nodes are dropped out (i.e. set to zero)
◮ Avoids overfitting
Effect of dropout ratio: [Plot: accuracy (%) vs. dropout ratio from 0 to 1, alongside a sketch of the last convolutional layers (C5-3, P5, 16 × 16 × 32) feeding hidden layers of 2048 and 512 nodes]
19 / 34
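As an illustrative sketch, not the authors' implementation, the dropout mechanism described above can be written as inverted dropout (rescaling the surviving activations at training time is my assumption; some frameworks instead scale at test time):

```python
import numpy as np

def dropout(x, ratio=0.5, training=True):
    """Inverted dropout: at training time, zero a fraction `ratio` of the
    activations and rescale the survivors by 1/(1 - ratio), so the expected
    activation is unchanged and nothing special is needed at test time.
    Assumes 0 <= ratio < 1."""
    if not training or ratio == 0.0:
        return x
    mask = (np.random.rand(*x.shape) >= ratio).astype(x.dtype)
    return x * mask / (1.0 - ratio)
```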
Architecture Summary
◮ 21 layers in total: 13 convolution layers, 5 pooling layers, and 3 fully connected layers.
◮ A ReLU is applied after each convolution layer.
[Diagram: C1, C2-1..C2-3, P1..P5, C3-1..C3-3, C4-1..C4-3, C5-1..C5-3 followed by fully connected layers; feature maps shrink from 512 × 512 × 4 down to 16 × 16 × 32, then 2048 → 512 → hotspot / non-hotspot]
20 / 34
Architecture Summary

Layer     Kernel Size   Stride   Padding   Output Vertexes
Conv1-1   2 × 2 × 4     2        0         512 × 512 × 4
Pool1     2 × 2         2        0         256 × 256 × 4
Conv2-1   3 × 3 × 8     1        1         256 × 256 × 8
Conv2-2   3 × 3 × 8     1        1         256 × 256 × 8
Conv2-3   3 × 3 × 8     1        1         256 × 256 × 8
Pool2     2 × 2         2        0         128 × 128 × 8
Conv3-1   3 × 3 × 16    1        1         128 × 128 × 16
Conv3-2   3 × 3 × 16    1        1         128 × 128 × 16
Conv3-3   3 × 3 × 16    1        1         128 × 128 × 16
Pool3     2 × 2         2        0         64 × 64 × 16
Conv4-1   3 × 3 × 32    1        1         64 × 64 × 32
Conv4-2   3 × 3 × 32    1        1         64 × 64 × 32
Conv4-3   3 × 3 × 32    1        1         64 × 64 × 32
Pool4     2 × 2         2        0         32 × 32 × 32
Conv5-1   3 × 3 × 32    1        1         32 × 32 × 32
Conv5-2   3 × 3 × 32    1        1         32 × 32 × 32
Conv5-3   3 × 3 × 32    1        1         32 × 32 × 32
Pool5     2 × 2         2        0         16 × 16 × 32
FC1       –             –        –         2048
FC2       –             –        –         512
FC3       –             –        –         2
21 / 34
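For concreteness, the table can be transcribed into a modern framework. The following PyTorch sketch only illustrates the listed layer shapes (the original work used Caffe); the single-channel 1024 × 1024 input, the dropout placement, and the ReLUs after the fully connected layers are assumptions on my part:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, n):
    """n 3x3 convolutions (stride 1, padding 1), each followed by ReLU,
    then a 2x2 max pool with stride 2."""
    layers = []
    for i in range(n):
        layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2, 2))
    return layers

model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=2, stride=2),  # Conv1-1: 1024x1024x1 -> 512x512x4
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2, 2),                        # Pool1: -> 256x256x4
    *conv_block(4, 8, 3),                      # Conv2-1..3 + Pool2: -> 128x128x8
    *conv_block(8, 16, 3),                     # Conv3-1..3 + Pool3: -> 64x64x16
    *conv_block(16, 32, 3),                    # Conv4-1..3 + Pool4: -> 32x32x32
    *conv_block(32, 32, 3),                    # Conv5-1..3 + Pool5: -> 16x16x32
    nn.Flatten(),                              # 16 * 16 * 32 = 8192
    nn.Linear(16 * 16 * 32, 2048),             # FC1
    nn.ReLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(2048, 512),                      # FC2
    nn.ReLU(inplace=True),
    nn.Linear(512, 2),                         # FC3: hotspot / non-hotspot
)

x = torch.zeros(1, 1, 1024, 1024)              # one dummy clip
print(model(x).shape)                          # torch.Size([1, 2])
```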
Outline
Introduction
Network Architecture
Imbalance Aware Learning
Experimental Results
22 / 34
Minority Upsampling
Layout datasets are highly imbalanced: after resolution enhancement techniques (RETs), the lithographic hotspots are always the minority.
[Bar chart: percentage of non-hotspot vs. hotspot clips in benchmarks ICCAD-1 through ICCAD-5; non-hotspots dominate in every case]
22 / 34
Minority Upsampling
Layout datasets are highly imbalanced: after resolution enhancement techniques (RETs), the lithographic hotspots are always the minority.
◮ Multi-label learning [Zhang+,IJCAI'15]
◮ Majority downsampling [Ng+,TCYB'15]
◮ Pseudo instance generation [He+,IJCNN'08]
Artificially generated instances might not be available because of the nature of mask layouts.
[Bar chart: percentage of non-hotspot vs. hotspot clips in benchmarks ICCAD-1 through ICCAD-5]
22 / 34
Minority Upsampling
Layout datasets are highly imbalanced: after resolution enhancement techniques (RETs), the lithographic hotspots are always the minority.
◮ Multi-label learning [Zhang+,IJCAI'15]
◮ Majority downsampling [Ng+,TCYB'15]
◮ Pseudo instance generation [He+,IJCNN'08]
Artificially generated instances might not be available because of the nature of mask layouts.
◮ Naïve upsampling (✓); a minimal sketch follows below
1. Gradient descent
2. Insufficient training samples
[Bar chart: percentage of non-hotspot vs. hotspot clips in benchmarks ICCAD-1 through ICCAD-5]
22 / 34
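A minimal sketch of the naïve upsampling step referenced above: hotspot clips are simply replicated at random until the two classes are balanced. The function name and the array-based dataset representation are mine:

```python
import numpy as np

def upsample_minority(clips, labels, minority_label=1, seed=0):
    """Randomly replicate minority-class (hotspot) clips until both classes
    have the same count. Assumes the minority class is the smaller one."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(labels == minority_label)
    majority = np.flatnonzero(labels != minority_label)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    keep = np.concatenate([majority, minority, extra])
    rng.shuffle(keep)
    return clips[keep], labels[keep]
```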
Random Mirror Flipping
◮ Before being fed into the neural network, each instance takes one of 4 orientations (a sketch follows below)
◮ Resolves insufficient data
[Diagram: a layout clip and its mirrored orientations]
23 / 34
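A minimal sketch of the mirroring step: one of the four mirror orientations is chosen uniformly at random for each clip (the function name and NumPy representation are mine):

```python
import numpy as np

def random_mirror(clip, rng=np.random.default_rng()):
    """Pick one of the 4 mirror orientations of a 2-D layout clip at random:
    original, left-right flip, up-down flip, or both."""
    if rng.random() < 0.5:
        clip = clip[:, ::-1]   # mirror horizontally
    if rng.random() < 0.5:
        clip = clip[::-1, :]   # mirror vertically
    return clip
```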