

  1. Announcements
  • Class size is 170.
  • Matlab Grader homeworks 1 and 2 (of fewer than 9 homeworks total) are due tonight, 22 April; binary graded. For HW1, please keep the word count under 100. So far 167, 165, and 164 students have completed them. (If you have not done it, talk to me or a TA!)
  • Homework 3 (released ~tomorrow) is due ~5 May.
  • The Jupyter “GPU” homework will be released Wednesday; due 10 May.
  • Projects: 27 groups formed. Look at Piazza for help; the guidelines are on Piazza. Proposals are due 5 May; the TAs and Peter can approve them.
  • Today: Stanford CNN lecture 9, kernel methods (Bishop Ch. 6), linear models for classification, backpropagation.
  • Monday: Stanford CNN lecture 10, kernel methods (Bishop Ch. 6), SVM. Play with the TensorFlow playground before class: http://playground.tensorflow.org

  2. Projects
  • Groups of 3–4 people preferred.
  • Deliverables: poster, report, and main code (plus proposal and midterm slide).
  • Topics: your own, or choose from the suggested topics. Some are physics inspired.
  • April 26: groups due to the TAs (if you don’t have a group, ask on Piazza and we can help). TAs will construct groups after that.
  • May 5: proposal due. The TAs and Peter can approve.
  • Proposal: one page with title, a large paragraph, data, web links, and references.
  • Something physical.

  3. Datasets
  • 80% data preparation, 20% ML.
  • Kaggle: https://inclass.kaggle.com/datasets and https://www.kaggle.com
  • UCI datasets: http://archive.ics.uci.edu/ml/index.php
  • Past projects…
  • Ocean acoustics data

  4. In 2017, many groups chose the source-localization topic; two were CNN projects.

  5. 2018: Best reports: 6, 10, 12, 15. Interesting: 19, 47. Poor: 17. Working alone is hard: 20.

  6. Bayes and Softmax (Bishop p. 198)
  • Bayes’ rule: p(y|x) = p(x|y) p(y) / p(x), with p(x) = Σ_{y∈Y} p(x, y).
  • Parametric approach, linear classifier (Stanford CS231n, Lecture 2): f(x, W) = Wx + b, where x is the image flattened into a 3072×1 array (32×32×3 numbers), W is 10×3072, b is 10×1, and f(x, W) is the 10×1 vector of class scores.
  • Classification into N classes: the parameters (weights) are p(x|C_n) p(C_n), and
    p(C_n|x) = p(x|C_n) p(C_n) / Σ_{k=1}^N p(x|C_k) p(C_k) = exp(a_n) / Σ_{k=1}^N exp(a_k),  with a_n = ln( p(x|C_n) p(C_n) ).
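
A shape-level sketch of the linear classifier and softmax on this slide (the random numbers stand in for a real image and learned weights; this is my own illustration, not code from the lecture):

    import numpy as np

    x = np.random.rand(3072)                 # a 32x32x3 image flattened to 3072 numbers
    W = np.random.randn(10, 3072) * 0.01     # weights: 10 classes x 3072 inputs
    b = np.zeros(10)

    scores = W @ x + b                       # f(x, W) = Wx + b, 10 class scores
    a = scores                               # a_n plays the role of ln(p(x|C_n) p(C_n))
    p = np.exp(a - a.max()) / np.sum(np.exp(a - a.max()))   # softmax: p(C_n|x)
    print(p.sum())                           # probabilities sum to 1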

  7. Softmax to Logistic Regression (Bishop p. 198)
  • For two classes the softmax reduces to the logistic sigmoid:
    p(C_1|x) = p(x|C_1) p(C_1) / Σ_{k=1}^2 p(x|C_k) p(C_k) = exp(a_1) / Σ_{k=1}^2 exp(a_k) = 1 / (1 + exp(−a)),
    with a_k = ln( p(x|C_k) p(C_k) ) and a = a_1 − a_2 = ln[ p(x|C_1) p(C_1) / ( p(x|C_2) p(C_2) ) ].
  • So for binary classification we should use logistic regression: p(C_1|x) = 1 / (1 + exp(−a)).
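
A two-line numerical check (my own, not from the slide) that the two-class softmax equals the logistic sigmoid of a = a_1 − a_2:

    import numpy as np

    a1, a2 = 1.3, -0.4                            # arbitrary log-scores
    softmax_c1 = np.exp(a1) / (np.exp(a1) + np.exp(a2))
    sigmoid = 1.0 / (1.0 + np.exp(-(a1 - a2)))
    print(softmax_c1, sigmoid)                    # identical values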

  8. The Kullback–Leibler Divergence: p is the true distribution, q is the approximating distribution.

  9. Cross entropy
  • KL divergence (p true, q approximating):
    D_KL(p||q) = Σ_i p_i ln(p_i) − Σ_i p_i ln(q_i) = −H(p) + H(p, q)
  • Cross entropy:
    H(p, q) = −Σ_i p_i ln(q_i) = H(p) + D_KL(p||q)
  • Implementations: tf.keras.losses.CategoricalCrossentropy(), tf.losses.sparse_softmax_cross_entropy, torch.nn.CrossEntropyLoss()
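
A minimal numerical check of the relation above (a sketch; the distribution values are made up for illustration):

    import numpy as np

    p = np.array([0.7, 0.2, 0.1])            # true distribution
    q = np.array([0.5, 0.3, 0.2])            # approximating distribution

    entropy_p = -np.sum(p * np.log(p))       # H(p)
    cross_entropy = -np.sum(p * np.log(q))   # H(p, q)
    kl = np.sum(p * np.log(p / q))           # D_KL(p||q)

    # cross entropy = entropy of p + KL divergence
    assert np.isclose(cross_entropy, entropy_p + kl)
    print(cross_entropy, entropy_p, kl)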

  10. Cross-entropy or “softmax” function for multi-class classification
  • The output units use a non-local non-linearity (softmax over the logits z):
    y_i = exp(z_i) / Σ_j exp(z_j),   with   ∂y_i/∂z_i = y_i (1 − y_i)
  • The natural cost function is the negative log probability of the right answer:
    E = −Σ_j t_j ln(y_j)
  • Combining the two gives a simple gradient:
    ∂E/∂z_i = Σ_j (∂E/∂y_j)(∂y_j/∂z_i) = y_i − t_i
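
A small sketch (my own illustration, not from the slide) verifying the gradient ∂E/∂z = y − t with a finite-difference check:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())              # subtract max for numerical stability
        return e / e.sum()

    def cross_entropy(z, t):
        return -np.sum(t * np.log(softmax(z)))

    z = np.array([1.0, -0.5, 2.0])           # logits (made-up values)
    t = np.array([0.0, 0.0, 1.0])            # one-hot target

    analytic = softmax(z) - t                # the gradient y - t from the slide

    eps = 1e-6                               # central finite differences
    numeric = np.array([(cross_entropy(z + eps * np.eye(3)[i], t)
                         - cross_entropy(z - eps * np.eye(3)[i], t)) / (2 * eps)
                        for i in range(3)])
    assert np.allclose(analytic, numeric, atol=1e-5)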

  11. Reminder: 1x1 convolutions (Stanford CS231n, Lecture 9)
  • A 1x1 CONV with 32 filters applied to a 56x56x64 input preserves the spatial dimensions (56x56) but reduces the depth to 32: it projects the depth to a lower dimension (a combination of the feature maps).
  • Each filter has size 1x1x64 and performs a 64-dimensional dot product at every spatial location.
  • Summary: CNN architectures. Case studies: AlexNet, VGG, GoogLeNet, ResNet. Also: NiN (Network in Network), DenseNet, Wide ResNet, FractalNet, ResNeXT, SqueezeNet, Stochastic Depth.
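
A quick shape check of the 1x1 convolution described above, sketched in PyTorch (assumed available; not part of the slides):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 64, 56, 56)           # one 56x56 input with 64 channels
    conv1x1 = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=1)

    y = conv1x1(x)
    print(y.shape)                           # torch.Size([1, 32, 56, 56]): spatial dims kept, depth 64 -> 32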

  12. Case Study: ResNet [He et al., 2015] (Stanford CS231n, Lecture 9)
  • Very deep networks using residual connections: a residual block computes F(x) through conv → relu → conv, then adds the identity input x, so the block outputs F(x) + x (followed by relu).
  • 152-layer model for ImageNet.
  • ILSVRC’15 classification winner (3.57% top-5 error).
  • Swept all classification and detection competitions in ILSVRC’15 and COCO’15!
  • Architecture (input to output): 7x7 conv 64 /2, pool, stacks of 3x3 conv layers (64, 128, … filters, with stride-2 downsampling between stages), pool, FC 1000, softmax.

  13. Case Study: ResNet [He et al., 2015]
  • What happens when we continue stacking deeper layers on a “plain” convolutional neural network? Comparing a 20-layer and a 56-layer plain network, the 56-layer model has higher training error and higher test error (over the training iterations).
  • The deeper model performs worse, but it is not caused by overfitting: it is worse even on the training set.
  • Hypothesis: the problem is an optimization problem; deeper plain models are harder to optimize.

  14. Case Study: ResNet [He et al., 2015]
  • Solution: use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping H(x).
  • A “plain” block tries to learn H(x) directly; a residual block learns F(x) = H(x) − x and outputs H(x) = F(x) + x (conv → relu → conv gives F(x), then add the identity x and apply relu).
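
A minimal residual block sketched in PyTorch, following the diagram above (it assumes equal input/output channels so the identity shortcut needs no projection, and omits the batch norm used in the real ResNet; my own illustration, not code from the lecture):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """Basic residual block: out = relu(residual(x) + x), residual = conv -> relu -> conv."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            out = F.relu(self.conv1(x))
            out = self.conv2(out)            # the residual mapping F(x)
            return F.relu(out + x)           # F(x) + x, then relu

    block = ResidualBlock(64)
    print(block(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])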

  15. Kernels
  • Kernel function: k(x, x′) = φ(x)^T φ(x′)   (Bishop Eq. 6.1). The kernel is a symmetric function of its arguments.
  • Kernel trick: substitute the inner product of feature vectors with the kernel function.

  16. Kernels
  • We might want to consider something more complicated than a linear model.
  • Example 1: map [x(1), x(2)] → Φ([x(1), x(2)]) = [x(1)², x(2)², x(1)x(2)]. The information is unchanged, but now we have a linear classifier on the transformed points (input space → feature space; image by MIT OpenCourseWare).
  • With the kernel trick, we just need the kernel: k(a, b) = Φ(a)^T Φ(b), i.e. k(x, x′) = φ(x)^T φ(x′)   (Bishop Eq. 6.1), a symmetric function of its arguments.
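
A small numerical sketch of the kernel trick (my own example, not from the slides): with the feature map Φ(x) = [x(1)², x(2)², √2·x(1)x(2)] (the √2 factor, absent from the slide’s version, makes the identity exact), the inner product in feature space equals the simple kernel k(a, b) = (a·b)².

    import numpy as np

    def phi(x):
        # explicit feature map: [x1^2, x2^2, sqrt(2)*x1*x2]
        return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

    def k(a, b):
        # kernel computed directly in input space
        return np.dot(a, b) ** 2

    a = np.array([1.0, 2.0])
    b = np.array([3.0, -1.0])

    print(np.dot(phi(a), phi(b)))   # inner product in feature space
    print(k(a, b))                  # same value, without ever forming phi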

  17. Basis expansion

  18. Gaussian Process (Bishop 6.4, Murphy Ch. 15)
  • Observation model: t_n = y_n + ε_n
  • Prior over functions: f(x) ∼ GP( m(x), κ(x, x′) )
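
A minimal GP-regression sketch under these definitions (the RBF kernel, noise level, and toy data are my own choices for illustration, not the lecture’s code):

    import numpy as np

    def rbf(X1, X2, length_scale=1.0):
        # squared-exponential kernel kappa(x, x') = exp(-(x - x')^2 / (2 l^2))
        d2 = (X1[:, None] - X2[None, :]) ** 2
        return np.exp(-0.5 * d2 / length_scale**2)

    # training data t_n = y_n + eps_n (toy 1-D example)
    X = np.linspace(0, 5, 8)
    t = np.sin(X) + 0.1 * np.random.randn(8)
    Xs = np.linspace(0, 5, 100)              # test inputs

    noise = 0.1**2
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)

    # GP predictive mean: m(x*) = k(x*, X) [K + sigma^2 I]^{-1} t
    mean = Ks.T @ np.linalg.solve(K, t)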

  19. Dual representation (Bishop Sec. 6.2)
  • Primal problem: min_w E(w), with
    E(w) = (1/2) Σ_n ( w^T φ(x_n) − t_n )² + (λ/2) ||w||² = (1/2) ||Φw − t||² + (λ/2) ||w||²
  • Solution:
    w = (Φ^T Φ + λ I_M)^{−1} Φ^T t = Φ^T (ΦΦ^T + λ I_N)^{−1} t = Φ^T (K + λ I_N)^{−1} t = Φ^T a
    The kernel (Gram) matrix is K = ΦΦ^T.
  • Dual representation: min_a E(a), with
    E(a) = (1/2) ||Ka − t||² + (λ/2) a^T K a
  • a is found by inverting an N×N matrix; w is found by inverting an M×M matrix.
  • Only kernels, no feature vectors.

  20. Dual representation (Bishop Sec. 6.2)
  • Dual representation: min_a E(a), with
    E(a) = (1/2) ||Ka − t||² + (λ/2) a^T K a
  • Prediction:
    y(x) = w^T φ(x) = a^T Φ φ(x) = Σ_n a_n φ(x_n)^T φ(x) = Σ_n a_n k(x_n, x)
  • Often a is sparse (… support vector machines).
  • We don’t need to know x or φ(x). Just the kernel.
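
A kernel-ridge sketch of the dual solution and prediction above (the Gaussian kernel, toy data, and λ value are made up for illustration; this is my own sketch rather than the lecture’s code):

    import numpy as np

    def kernel(X1, X2):
        # Gaussian kernel k(x, x') = exp(-(x - x')^2 / (2 sigma^2)), sigma = 1
        d2 = (X1[:, None] - X2[None, :]) ** 2
        return np.exp(-0.5 * d2)

    X = np.linspace(-3, 3, 20)                        # training inputs
    t = np.tanh(X) + 0.05 * np.random.randn(20)       # noisy targets
    lam = 0.1                                         # regularization lambda

    K = kernel(X, X)                                  # N x N Gram matrix
    a = np.linalg.solve(K + lam * np.eye(len(X)), t)  # dual variables a = (K + lam I)^{-1} t

    Xnew = np.array([0.5, 2.0])
    y = kernel(Xnew, X) @ a                           # y(x) = sum_n a_n k(x_n, x)
    print(y)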

  21. Gaussian Kernels

  22. Gaussian Kernels

  23. Commonly used kernels
  • Polynomial: K(x, y) = (x·y + 1)^p
  • Gaussian radial basis function: K(x, y) = exp( −||x − y||² / (2σ²) )
  • Neural net: K(x, y) = tanh( κ x·y − δ )
  • In each case there are parameters that the user must choose.
  • For the neural network kernel, there is one “hidden unit” per support vector, so the process of fitting the maximum-margin hyperplane decides how many hidden units to use. Also, it may violate Mercer’s condition.
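
The three kernels above written out as functions (a sketch; the parameter values in the example are arbitrary user choices):

    import numpy as np

    def polynomial_kernel(x, y, p=2):
        return (np.dot(x, y) + 1.0) ** p

    def gaussian_kernel(x, y, sigma=1.0):
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

    def neural_net_kernel(x, y, kappa=1.0, delta=0.5):
        # tanh kernel; may violate Mercer's condition
        return np.tanh(kappa * np.dot(x, y) - delta)

    x = np.array([1.0, 0.5])
    y = np.array([0.2, -1.0])
    print(polynomial_kernel(x, y), gaussian_kernel(x, y), neural_net_kernel(x, y))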
