Announcements
• Class size is 170.
• Matlab Grader homework 1 and 2 (of fewer than 9 homeworks) due tonight, 22 April; binary graded. For HW1, please keep the word count < 100. So far 167, 165, and 164 students have done the homework (if you have not done it, talk to me/TA!).
• Homework 3 (released ~tomorrow) due ~5 May.
• MNIST Jupyter "GPU" homework released Wednesday, due 10 May.
• Projects: 27 groups formed. Look at Piazza for help; guidelines are on Piazza. May 5 proposal due. TAs and Peter can approve.
Today:
• Stanford CNN 9, kernel methods (Bishop 6), linear models for classification, backpropagation.
Monday:
• Stanford CNN 10, kernel methods (Bishop 6), SVM.
• Play with the TensorFlow playground before class: http://playground.tensorflow.org
Projects
• 3-4 person groups preferred.
• Deliverables: poster & report & main code (plus proposal, midterm slide).
• Topics: your own, or choose from the suggested topics. Some are physics inspired.
• April 26: groups due to TA (if you don't have a group, ask on Piazza and we can help). TAs will construct groups after that.
• May 5: proposal due. TAs and Peter can approve.
• Proposal, one page: title, a large paragraph, data, weblinks, references.
• Something physical.
DataSet
• 80% preparation, 20% ML.
• Kaggle: https://inclass.kaggle.com/datasets https://www.kaggle.com
• UCI datasets: http://archive.ics.uci.edu/ml/index.php
• Past projects…
• Ocean acoustics data
In 2017, many chose source localization; there were two CNN projects.
2018: best reports: 6, 10, 12, 15; interesting: 19, 47; poor: 17; working alone is hard: 20.
Bayes and Softmax (Bishop p. 198)
• Bayes:
$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}, \qquad p(x) = \sum_{y \in Y} p(x, y)$$
• Parametric approach, linear classifier: $f(x, W) = Wx + b$, where the 32×32×3 image is stretched into a 3072×1 vector $x$, $W$ is 10×3072, $b$ is 10×1, giving 10 numbers as class scores. [Fei-Fei Li, Justin Johnson & Serena Yeung, Lecture 2, April 6, 2017]
• Classification of N classes:
$$p(C_n \mid x) = \frac{p(x \mid C_n)\, p(C_n)}{\sum_{k=1}^{N} p(x \mid C_k)\, p(C_k)} = \frac{\exp(a_n)}{\sum_{k=1}^{N} \exp(a_k)}, \quad \text{with } a_n = \ln\big(p(x \mid C_n)\, p(C_n)\big)$$
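A minimal NumPy sketch of the linear classifier plus softmax posterior above; `W`, `b`, and the image vector are random made-up values, with the shapes from the 32×32×3 example on the slide:

```python
import numpy as np

def softmax(a):
    """Softmax with the max subtracted for numerical stability."""
    a = a - np.max(a)
    e = np.exp(a)
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.random(3072)                     # flattened 32x32x3 image
W = rng.standard_normal((10, 3072)) * 0.01
b = np.zeros(10)

scores = W @ x + b                       # f(x, W) = Wx + b: 10 class scores
posterior = softmax(scores)              # p(C_n | x) = exp(a_n) / sum_k exp(a_k)
print(posterior.sum())                   # 1.0
```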
Softmax to Logistic Regression (Bishop p. 198)
$$p(C_1 \mid x) = \frac{p(x \mid C_1)\, p(C_1)}{\sum_{k=1}^{2} p(x \mid C_k)\, p(C_k)} = \frac{\exp(a_1)}{\sum_{k=1}^{2} \exp(a_k)} = \frac{1}{1 + \exp(-a)} = \sigma(a)$$
with
$$a = \ln \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_2)\, p(C_2)}$$
• $a_k = \ln\big(p(x \mid C_k)\, p(C_k)\big)$
• $a = a_1 - a_2$
• $p(C_1 \mid x) = \dfrac{1}{1 + \exp(-(a_1 - a_2))}$
For binary classification we use the logistic sigmoid.
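A quick numerical check (with arbitrary made-up activations) that the two-class softmax reduces to the logistic sigmoid of $a = a_1 - a_2$:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a1, a2 = 2.0, 0.5                                     # made-up activations
softmax_p1 = np.exp(a1) / (np.exp(a1) + np.exp(a2))   # two-class softmax
sigmoid_p1 = sigmoid(a1 - a2)                         # logistic sigmoid
print(np.isclose(softmax_p1, sigmoid_p1))             # True
```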
The Kullback-Leibler Divergence: $p$ is the true distribution, $q$ is the approximating distribution:
$$\mathrm{KL}(p \,\|\, q) = \sum_i p_i \ln \frac{p_i}{q_i}$$
Cross entropy
• KL divergence ($p$ true, $q$ approximating):
$$\mathrm{KL}(p \,\|\, q) = \sum_i p_i \ln(p_i) - \sum_i p_i \ln(q_i) = -H(p) + H(p, q)$$
• Cross entropy:
$$H(p, q) = -\sum_i p_i \ln(q_i) = H(p) + \mathrm{KL}(p \,\|\, q)$$
• Implementations: tf.keras.losses.CategoricalCrossentropy(), tf.losses.sparse_softmax_cross_entropy, torch.nn.CrossEntropyLoss(), and the cross-entropy losses in Caffe.
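A quick NumPy check of the identity $H(p, q) = H(p) + \mathrm{KL}(p\|q)$, on two made-up distributions:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])        # true distribution
q = np.array([0.5, 0.3, 0.2])        # approximating distribution

H_p  = -np.sum(p * np.log(p))        # entropy of p
H_pq = -np.sum(p * np.log(q))        # cross entropy H(p, q)
KL   =  np.sum(p * np.log(p / q))    # KL divergence KL(p || q)

print(np.isclose(H_pq, H_p + KL))    # True
```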
Cross-entropy or "softmax" function for multi-class classification
The output units use a non-local non-linearity:
$$y_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad \frac{\partial y_i}{\partial z_i} = y_i (1 - y_i)$$
The natural cost function is the negative log probability of the right answer:
$$E = -\sum_j t_j \ln y_j, \qquad \frac{\partial E}{\partial z_i} = \sum_j \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial z_i} = y_i - t_i$$
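A sketch verifying $\partial E / \partial z_i = y_i - t_i$ by finite differences, with made-up logits and a one-hot target:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, -0.5, 2.0])              # made-up logits
t = np.array([0.0, 0.0, 1.0])               # one-hot target

def E(z):
    return -np.sum(t * np.log(softmax(z)))  # cross-entropy cost

analytic = softmax(z) - t                   # claimed gradient: y - t

h = 1e-6
numeric = np.zeros(3)
for i in range(3):                          # central finite differences
    dz = np.zeros(3)
    dz[i] = h
    numeric[i] = (E(z + dz) - E(z - dz)) / (2 * h)

print(np.allclose(analytic, numeric))       # True
```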
Reminder: 1x1 convolutions
A 1x1 CONV with 32 filters on a 56×56×64 input preserves the spatial dimensions (56×56) and reduces the depth to 32: each filter has size 1×1×64 and performs a 64-dimensional dot product, projecting the depth to a lower dimension (a combination of feature maps); see the sketch after this slide. [Fei-Fei Li, Justin Johnson & Serena Yeung, Lecture 9, May 2, 2017]

Summary: CNN Architectures
Case studies: AlexNet, VGG, GoogLeNet, ResNet. Also: NiN (Network in Network), DenseNet, Wide ResNet, FractalNet, ResNeXt, SqueezeNet, Stochastic Depth.
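A minimal PyTorch sketch of the 1×1 convolution above (56×56×64 input, 32 filters; the batch size of 1 is arbitrary):

```python
import torch
import torch.nn as nn

# 64 input channels -> 32 output channels, 1x1 kernel
conv1x1 = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=1)

x = torch.randn(1, 64, 56, 56)   # NCHW: a 56x56 feature map of depth 64
y = conv1x1(x)                   # each filter is 1x1x64: a 64-dim dot product per pixel
print(y.shape)                   # torch.Size([1, 32, 56, 56]): spatial size preserved, depth reduced
```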
Case Study: ResNet [He et al., 2015]
Very deep networks using residual connections:
• 152-layer model for ImageNet.
• ILSVRC'15 classification winner (3.57% top-5 error).
• Swept all classification and detection competitions in ILSVRC'15 and COCO'15!
Residual block: the input x passes through two 3×3 conv layers (with a relu in between) to produce F(x); the identity shortcut adds x, and the block output is relu(F(x) + x).
[Figure: the full architecture from input (7×7 conv, 64, /2; pool) through stacked 3×3 conv blocks of width 64, 128, … to pool, FC 1000, and softmax.]
Case Study: ResNet [He et al., 2015]
What happens when we continue stacking deeper layers on a "plain" convolutional neural network?
[Figure: training error and test error vs. iterations for 20-layer and 56-layer plain networks.]
The 56-layer model performs worse on both training and test error → the deeper model performs worse, but it's not caused by overfitting!
Hypothesis: the problem is an optimization problem; deeper models are harder to optimize.
Case Study: ResNet [He et al., 2015]
Solution: use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping.
"Plain" layers fit H(x) directly; a residual block uses its layers to fit the residual F(x) = H(x) − x, with the identity shortcut giving the output H(x) = F(x) + x.
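A minimal PyTorch sketch of a residual block under the slide's setup (two 3×3 convs at 64 channels, identity shortcut); the real ResNet block also includes batch normalization, omitted here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """H(x) = F(x) + x, where F is two 3x3 convs with a relu in between."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)        # F(x): the residual the layers must fit
        return F.relu(out + x)       # add the identity shortcut, then relu

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock()(x).shape)      # torch.Size([1, 64, 56, 56])
```

Because the shortcut is the identity, the layers only need to learn the correction F(x) = H(x) − x; pushing F toward zero recovers the identity mapping, which is what makes very deep stacks easier to optimize.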
Kernels
We might want to consider something more complicated than a linear model.
Example 1:
$$[x^{(1)}, x^{(2)}] \rightarrow \Phi\big([x^{(1)}, x^{(2)}]\big) = \big[x^{(1)2},\; x^{(2)2},\; x^{(1)} x^{(2)}\big]$$
The information is unchanged, but now we have a linear classifier on the transformed points (input space → feature space). With the kernel trick, we just need the kernel
$$k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T \boldsymbol{\phi}(\mathbf{x}') \tag{6.1}$$
and we see that the kernel is a symmetric function of its arguments.
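A NumPy check of the kernel trick for this feature map. The √2 scaling on the cross term is my addition so that φ(x)ᵀφ(x′) equals the polynomial kernel (xᵀx′)² exactly; the slide's map omits it:

```python
import numpy as np

def phi(x):
    """Feature map from the slide, with a sqrt(2) on the cross term
    (an added scaling) so that phi(x).phi(x') == (x.x')**2."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def k(x, xp):
    return (x @ xp) ** 2    # kernel trick: never form phi explicitly

x  = np.array([1.0, 2.0])
xp = np.array([0.5, -1.0])
print(np.isclose(phi(x) @ phi(xp), k(x, xp)))   # True
```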
Basis expansion
[Handwritten whiteboard example of a basis expansion; not recoverable from the transcript.]
Gaussian Process (Bishop 6.4, Murphy 15)
$$t_n = y_n + \epsilon_n, \qquad f(\mathbf{x}) \sim \mathcal{GP}\big(m(\mathbf{x}), \kappa(\mathbf{x}, \mathbf{x}')\big)$$
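A short sketch of drawing functions from the GP prior; the zero mean, squared-exponential κ, and unit length scale are assumptions for illustration:

```python
import numpy as np

def kappa(x, xp, sigma=1.0):
    """Squared-exponential covariance kappa(x, x') on 1-D inputs."""
    return np.exp(-(x[:, None] - xp[None, :]) ** 2 / (2 * sigma**2))

x = np.linspace(0, 5, 100)
K = kappa(x, x) + 1e-8 * np.eye(len(x))   # jitter for numerical stability

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)  # three draws of f(x)
```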
Dual representation, Sec 6.2
Primal problem: $\min_{\mathbf{w}} J(\mathbf{w})$,
$$J(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big(\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) - t_n\big)^2 + \frac{\lambda}{2}\, \mathbf{w}^T \mathbf{w} = \frac{1}{2} \|\boldsymbol{\Phi}\mathbf{w} - \mathbf{t}\|^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2$$
Solution:
$$\mathbf{w} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \lambda \mathbf{I}_M)^{-1} \boldsymbol{\Phi}^T \mathbf{t} = \boldsymbol{\Phi}^T (\boldsymbol{\Phi}\boldsymbol{\Phi}^T + \lambda \mathbf{I}_N)^{-1} \mathbf{t} = \boldsymbol{\Phi}^T (\mathbf{K} + \lambda \mathbf{I}_N)^{-1} \mathbf{t} = \boldsymbol{\Phi}^T \mathbf{a}$$
The kernel (Gram) matrix is $\mathbf{K} = \boldsymbol{\Phi}\boldsymbol{\Phi}^T \in \mathbb{R}^{N \times N}$, with $\boldsymbol{\Phi} \in \mathbb{R}^{N \times M}$.
Dual representation: $\min_{\mathbf{a}} J(\mathbf{a})$,
$$J(\mathbf{a}) = \frac{1}{2} \|\mathbf{K}\mathbf{a} - \mathbf{t}\|^2 + \frac{\lambda}{2}\, \mathbf{a}^T \mathbf{K} \mathbf{a}$$
• $\mathbf{a}$ is found by inverting an N×N matrix; $\mathbf{w}$ is found by inverting an M×M matrix.
• Only kernels, no feature vectors.
Dual representation, Sec 6.2
Dual representation: $\min_{\mathbf{a}} J(\mathbf{a})$,
$$J(\mathbf{a}) = \frac{1}{2} \|\mathbf{K}\mathbf{a} - \mathbf{t}\|^2 + \frac{\lambda}{2}\, \mathbf{a}^T \mathbf{K} \mathbf{a}$$
Prediction:
$$y(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}) = \mathbf{a}^T \boldsymbol{\Phi}\, \boldsymbol{\phi}(\mathbf{x}) = \sum_{n=1}^{N} a_n\, \boldsymbol{\phi}(\mathbf{x}_n)^T \boldsymbol{\phi}(\mathbf{x}) = \sum_{n=1}^{N} a_n\, k(\mathbf{x}_n, \mathbf{x})$$
• Often $\mathbf{a}$ is sparse (… support vector machines).
• We don't need to know $\boldsymbol{\phi}(\mathbf{x})$, just the kernel; see the sketch below.
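A NumPy sketch of the dual (kernel ridge regression) solution $\mathbf{a} = (\mathbf{K} + \lambda \mathbf{I}_N)^{-1}\mathbf{t}$ and the prediction $y(x) = \sum_n a_n k(x_n, x)$, on a made-up 1-D sine dataset with a Gaussian kernel:

```python
import numpy as np

def gauss_kernel(X, Xp, sigma=0.5):
    """Gaussian kernel matrix between two sets of 1-D inputs."""
    d2 = (X[:, None] - Xp[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 30))        # training inputs x_n
t = np.sin(X) + 0.1 * rng.standard_normal(30)     # noisy targets t_n
lam = 0.1                                         # regularizer lambda

K = gauss_kernel(X, X)                            # N x N Gram matrix
a = np.linalg.solve(K + lam * np.eye(len(X)), t)  # a = (K + lam I_N)^{-1} t

X_test = np.linspace(0, 2 * np.pi, 200)
y = gauss_kernel(X_test, X) @ a                   # y(x) = sum_n a_n k(x_n, x)
```

Note that only kernel evaluations appear: the feature vectors φ(x) are never formed, which is the point of the dual representation.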
Gaussian Kernels
[Figure: illustration of Gaussian kernels; content not recoverable from the transcript.]
Commonly used kernels
• Polynomial: $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p$
• Gaussian radial basis function: $K(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x} - \mathbf{y}\|^2 / 2\sigma^2}$
• Neural net: $K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{y} - \delta)$
The parameters ($p$, $\sigma$, $\kappa$, $\delta$) must be chosen by the user.
For the neural network kernel, there is one "hidden unit" per support vector, so the process of fitting the maximum margin hyperplane decides how many hidden units to use. Also, it may violate Mercer's condition.
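The three kernels as short Python functions (default parameter values are arbitrary; per the slide, they are the user's choice):

```python
import numpy as np

def poly_kernel(x, y, p=3):
    return (x @ y + 1.0) ** p

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))

def neural_net_kernel(x, y, kappa=1.0, delta=1.0):
    # may violate Mercer's condition for some (kappa, delta)
    return np.tanh(kappa * (x @ y) - delta)
```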