Announcements
• MATLAB Grader homework, emailed Thursday: 1 (of 9) homeworks, due 21 April, binary graded. 2 this week.
• Jupyter homework?: translate MATLAB to Jupyter. Contact TA Harshul (h6gupta@eng.ucsd.edu) or me; I would like this to happen.
• "GPU" homework: NOAA climate data in Jupyter on datahub.ucsd.edu, 15 April.
• Projects: any computer language; a podcast might work eventually.
Today: Stanford CNN • Gaussian, Bishop 2.3 • Gaussian Process 6.4 • Linear regression 3.0–3.2
Wednesday 10 April: Stanford CNN, Linear models for regression (Ch. 3), applications of Gaussian processes.
Bayes and Softmax (Bishop p. 198)
Bayes:
 p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / Σ_{y∈Y} p(x, y)
Parametric approach (Stanford CNN, Fei-Fei Li, Justin Johnson & Serena Yeung, Lecture 2, April 6, 2017): a linear classifier
 f(x, W) = W x + b
maps an image x (an array of 32×32×3 = 3072 numbers) through the weights W (10×3072) and bias b (10×1) to 10 numbers giving the class scores.
Classification of N classes:
 p(C_n|x) = p(x|C_n) p(C_n) / Σ_{k=1}^N p(x|C_k) p(C_k) = exp(a_n) / Σ_{k=1}^N exp(a_k)
with a_n = ln( p(x|C_n) p(C_n) ).
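A minimal NumPy sketch of the softmax posterior above; the log-likelihood and prior values are illustrative placeholders, not from the slides:

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over the log-scores a_n."""
    a = a - np.max(a)          # shifting does not change the result
    e = np.exp(a)
    return e / e.sum()

# a_n = ln( p(x|C_n) p(C_n) ); made-up log-likelihoods and priors for 3 classes
log_lik   = np.array([-2.0, -1.0, -3.5])          # ln p(x|C_n), illustrative
log_prior = np.log(np.array([0.5, 0.3, 0.2]))     # ln p(C_n),   illustrative
posterior = softmax(log_lik + log_prior)          # p(C_n|x), sums to 1
print(posterior)
```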
Softmax to Logistic Regression (Bishop p. 198)
For two classes the softmax reduces to the logistic sigmoid:
 p(C_1|x) = p(x|C_1) p(C_1) / Σ_{k=1}^2 p(x|C_k) p(C_k) = exp(a_1) / Σ_{k=1}^2 exp(a_k) = 1 / (1 + exp(−a))
with
 a = ln[ p(x|C_1) p(C_1) / ( p(x|C_2) p(C_2) ) ].
Thus for binary classification we should use logistic regression.
Softmax with Gaussian (Bishop p. 198)
 p(C_n|x) = p(x|C_n) p(C_n) / Σ_{k=1}^N p(x|C_k) p(C_k) = exp(a_n) / Σ_{k=1}^N exp(a_k)
with a_n = ln( p(x|C_n) p(C_n) ).
Assuming x is Gaussian, p(x|C_n) = N(x | µ_n, Σ) with a shared covariance Σ, it can be shown that a_n is linear in x:
 a_n = w_n^T x + w_{n0},  w_n = Σ^{-1} µ_n,  w_{n0} = −(1/2) µ_n^T Σ^{-1} µ_n + ln p(C_n).
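A small NumPy sketch of this result; the class means, shared covariance and priors below are assumptions chosen only for the example:

```python
import numpy as np

# Illustrative 2-class, 2-D example with a shared covariance Sigma
mu = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]   # class means mu_n (assumed)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])          # shared covariance (assumed)
prior = np.array([0.6, 0.4])                        # p(C_n) (assumed)
Sigma_inv = np.linalg.inv(Sigma)

x = np.array([1.0, 0.5])                            # test point
a = []
for n in range(2):
    w_n  = Sigma_inv @ mu[n]                                   # w_n = Sigma^{-1} mu_n
    w_n0 = -0.5 * mu[n] @ Sigma_inv @ mu[n] + np.log(prior[n]) # bias term
    a.append(w_n @ x + w_n0)                                   # a_n = w_n^T x + w_{n0}

a = np.array(a)
post = np.exp(a - a.max()); post /= post.sum()                 # softmax -> p(C_n|x)
print(post)
```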
Entropy (Bishop 1.6)
Important quantity in:
• coding theory
• statistical physics
• machine learning
The Kullback-Leibler Divergence
p is the true distribution, q is the approximating distribution:
 KL(p‖q) = Σ_x p(x) ln( p(x) / q(x) )  (or the corresponding integral for densities).
KL homework
• Support of p and q: sum only over entries where both are > 0, so you don't need isnan or isinf.
• After you pass, take your time to clean up. Get close to 50.
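As a sketch of the idea (the homework itself is in MATLAB Grader; this is a NumPy analogue, with made-up distributions):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p||q) for discrete distributions, summing only where both p > 0 and q > 0
    (the "only > 0" convention above), so no NaN/Inf handling is needed."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = (p > 0) & (q > 0)                    # restrict to the common support
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.5, 0.0])                   # illustrative distributions
q = np.array([0.9, 0.1, 0.0])
print(kl_divergence(p, q))
```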
Lecture 3
• Homework
• Podcast lecture on-line
• Next lectures:
 – I posted a rough plan.
 – It is flexible though, so please come with suggestions.
Bayes for linear model
 y = A w + ε,  ε ~ N(0, Σ_ε)  ⇒  y ~ N(A w, Σ_ε)
prior: w ~ N(0, Σ_w)
posterior: p(w|y) ∝ p(y|w) p(w) = N(ŵ, Σ_post)
 mean: ŵ = Σ_post A^T Σ_ε^{-1} y
 covariance: Σ_post^{-1} = A^T Σ_ε^{-1} A + Σ_w^{-1}
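A minimal NumPy sketch of this posterior computation; the forward matrix, noise covariance and prior covariance below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative forward model y = A w + eps with known noise and prior covariances
N, M = 20, 3
A = rng.standard_normal((N, M))                 # forward/design matrix (assumed)
w_true = np.array([1.0, -0.5, 2.0])             # ground truth for the demo
Sigma_eps = 0.1 * np.eye(N)                     # noise covariance (assumed)
Sigma_w = 1.0 * np.eye(M)                       # prior covariance (assumed)
y = A @ w_true + rng.multivariate_normal(np.zeros(N), Sigma_eps)

# Posterior: Sigma_post^{-1} = A^T Sigma_eps^{-1} A + Sigma_w^{-1}
#            w_hat = Sigma_post A^T Sigma_eps^{-1} y
Se_inv = np.linalg.inv(Sigma_eps)
Sigma_post = np.linalg.inv(A.T @ Se_inv @ A + np.linalg.inv(Sigma_w))
w_hat = Sigma_post @ A.T @ Se_inv @ y
print(w_hat)
```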
Bayes' Theorem for Gaussian Variables
Given
 p(x) = N(x | µ, Λ^{-1}),  p(y|x) = N(y | A x + b, L^{-1}),
we have
 p(y) = N(y | A µ + b, L^{-1} + A Λ^{-1} A^T),  p(x|y) = N(x | Σ { A^T L (y − b) + Λ µ }, Σ),
where
 Σ = (Λ + A^T L A)^{-1}.
Sequential Estimation of the mean (Bishop 2.3.5)
Contribution of the Nth data point, x_N:
 µ_ML^(N) = µ_ML^(N−1) + (1/N) ( x_N − µ_ML^(N−1) )
 (old estimate) + (correction weight) × (correction given x_N)
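A short NumPy sketch of this running-mean update on a synthetic data stream (the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.0, size=1000)   # illustrative data stream

mu = 0.0                                        # running ML estimate of the mean
for N, x_N in enumerate(x, start=1):
    mu = mu + (x_N - mu) / N                    # old estimate + weight * correction
print(mu, x.mean())                             # the two agree
```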
Bayesian Inference for the Gaussian (Bishop 2.3.6)
Assume σ² is known. Given i.i.d. data x = {x_1, …, x_N}, the likelihood function for µ is
 p(x|µ) = ∏_{n=1}^N p(x_n|µ) = (2πσ²)^{−N/2} exp{ −(1/(2σ²)) Σ_{n=1}^N (x_n − µ)² }.
This has a Gaussian shape as a function of µ (but it is not a distribution over µ).
Bayesian Inference for the Gaussian (Bishop 2.3.6)
Combined with a Gaussian prior over µ,
 p(µ) = N(µ | µ_0, σ_0²),
this gives the posterior
 p(µ|x) ∝ p(x|µ) p(µ) = N(µ | µ_N, σ_N²),
with
 µ_N = ( σ² µ_0 + N σ_0² µ_ML ) / ( N σ_0² + σ² ),  1/σ_N² = 1/σ_0² + N/σ²,  µ_ML = (1/N) Σ_n x_n.
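A compact NumPy sketch of this posterior update; the prior parameters and the data are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 1.0                       # known noise variance sigma^2 (assumed)
mu0, sigma0_2 = 0.0, 0.5           # prior N(mu | mu0, sigma0^2) (assumed)
x = rng.normal(2.0, np.sqrt(sigma2), size=10)   # illustrative data

N = len(x)
mu_ml = x.mean()
sigmaN_2 = 1.0 / (1.0 / sigma0_2 + N / sigma2)          # 1/sigma_N^2 = 1/sigma_0^2 + N/sigma^2
muN = sigmaN_2 * (mu0 / sigma0_2 + N * mu_ml / sigma2)  # equivalent form of mu_N
print(muN, sigmaN_2)
```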
Bayesian Inference for the Gaussian (3)
Example: the posterior p(µ|x) for N = 0 (the prior), 1, 2 and 10 data points [figure].
Bayesian Inference for the Gaussian (4)
Sequential estimation: the posterior obtained after observing N−1 data points becomes the prior when we observe the Nth data point.
Conjugate prior: posterior and prior are in the same family. The prior is called a conjugate prior for the likelihood function.
Gaussian Process (Bishop 6.4, Murphy Ch. 15)
Measurement model: t_n = y_n + ε_n
Gaussian Process (Murphy Ch. 15) — Training [figure]
Gaussian Process (Murphy Ch. 15)
The conditional (predictive) distribution is Gaussian.
A common kernel is the squared exponential (RBF, Gaussian) kernel, k(x, x′) = σ_f² exp( −‖x − x′‖² / (2ℓ²) ).
Gaussian Process (Bishop 6.4)
Simple linear model: y(x) = w^T φ(x)
with prior p(w) = N(w | 0, α^{-1} I).
For multiple measurements, y = Φ w, where Φ is the design matrix with elements Φ_{nk} = φ_k(x_n). Then
 E[y] = Φ E[w] = 0
 cov[y] = E[y y^T] = Φ E[w w^T] Φ^T = (1/α) Φ Φ^T = K,
where K is the Gram matrix with elements K_{nm} = k(x_n, x_m) = (1/α) φ(x_n)^T φ(x_m), and k(x, x′) is the kernel function. This model provides us with a particular example of a Gaussian process.
Gaussian Process (Bishop 6.4)
Measurement model: t_n = y_n + ε_n, with p(t_n|y_n) = N(t_n | y_n, β^{-1}).
Multiple measurements: p(t|y) = N(t | y, β^{-1} I_N), where I_N is the N×N unit matrix.
Integrating out y:
 p(t) = ∫ p(t|y) p(y) dy = N(t | 0, C), with C(x_n, x_m) = k(x_n, x_m) + β^{-1} δ_nm.
A common choice of kernel is
 k(x_n, x_m) = θ_0 exp( −(θ_1/2) ‖x_n − x_m‖² ) + θ_2 + θ_3 x_n^T x_m;
note that the term involving θ_3 corresponds to a parametric (linear) component.
Predicting observation t_{N+1}: the joint distribution over t_1, …, t_{N+1} is
 p(t_{N+1}) = N(t_{N+1} | 0, C_{N+1}),  C_{N+1} = [ C_N  k ; k^T  c ],
where k has elements k(x_n, x_{N+1}) and c = k(x_{N+1}, x_{N+1}) + β^{-1}.
The conditional p(t_{N+1} | t) is Gaussian with
 m(x_{N+1}) = k^T C_N^{-1} t
 σ²(x_{N+1}) = c − k^T C_N^{-1} k.
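A minimal NumPy sketch of GP prediction with these formulas; the kernel hyperparameters, noise precision and toy data are assumptions made for the example:

```python
import numpy as np

def rbf_kernel(X1, X2, theta0=1.0, theta1=1.0):
    """Squared-exponential kernel k(x, x') = theta0 * exp(-theta1/2 * (x - x')^2), 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2        # pairwise squared distances
    return theta0 * np.exp(-0.5 * theta1 * d2)

rng = np.random.default_rng(3)
beta = 25.0                                      # noise precision (assumed)
X = np.linspace(0, 1, 10)                        # training inputs (toy)
t = np.sin(2 * np.pi * X) + rng.normal(0, 1 / np.sqrt(beta), X.shape)

C_N = rbf_kernel(X, X) + np.eye(len(X)) / beta   # C_N = K + beta^{-1} I
x_new = np.array([0.35])                         # test input
k = rbf_kernel(X, x_new)[:, 0]                   # k_n = k(x_n, x_new)
c = rbf_kernel(x_new, x_new)[0, 0] + 1 / beta    # c = k(x_new, x_new) + beta^{-1}

mean = k @ np.linalg.solve(C_N, t)               # m(x_new)     = k^T C_N^{-1} t
var = c - k @ np.linalg.solve(C_N, k)            # sigma^2(x_new) = c - k^T C_N^{-1} k
print(mean, var)
```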
Nonparametric Methods (1) (Bishop 2.5)
• Parametric distribution models (e.g. a Gaussian) are restricted to specific forms, which may not always be suitable; for example, consider modelling a multimodal distribution with a single, unimodal model.
• Nonparametric approaches make few assumptions about the overall shape of the distribution being modelled.
• Roughly: 1000 parameters (nonparametric) versus 10 parameters (parametric).
• Nonparametric models (other than histograms) require storing and computing with the entire data set.
• Parametric models, once fitted, are much more efficient in terms of storage and computation.
Linear regression: Linear Basis Function Models (1)
Generally
 y(x, w) = Σ_{j=0}^{M−1} w_j φ_j(x) = w^T φ(x),
where the φ_j(x) are known as basis functions.
• Typically φ_0(x) = 1, so that w_0 acts as a bias.
• Simplest case is linear basis functions: φ_d(x) = x_d.
• http://playground.tensorflow.org/
Some types of basis function in 1-D: sigmoids, Gaussians, polynomials.
Sigmoid and Gaussian basis functions can also be used in multilayer neural networks, but neural networks learn the parameters of the basis functions. This is more powerful but also harder and messier.
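A small NumPy sketch of these three families as design-matrix builders; the widths and centres are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def polynomial_basis(x, degree=3):
    """Columns 1, x, x^2, ..., x^degree (the column of ones gives the bias w_0)."""
    return np.vander(x, degree + 1, increasing=True)

def gaussian_basis(x, centres, s=0.1):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / s) ** 2)

def sigmoid_basis(x, centres, s=0.1):
    """phi_j(x) = 1 / (1 + exp(-(x - mu_j) / s))."""
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centres[None, :]) / s))

x = np.linspace(0, 1, 5)
centres = np.linspace(0, 1, 4)                  # basis centres (assumed)
print(polynomial_basis(x).shape, gaussian_basis(x, centres).shape, sigmoid_basis(x, centres).shape)
```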
Two types of linear model that are equivalent with respect to learning
 y(x, w) = w_0 + w_1 x_1 + w_2 x_2 + … = w^T x
 y(x, w) = w_0 + w_1 φ_1(x) + w_2 φ_2(x) + … = w^T φ(x)
• The first and second models have the same number of adaptive coefficients: the number of basis functions + 1 (for the bias).
• Once we have replaced the data by the basis-function outputs, fitting the second model is exactly the same as fitting the first model.
 – So there is no need to clutter the math with basis functions.
Maximum Likelihood and Least Squares (1)
• Assume observations from a deterministic function with added Gaussian noise:
 t = y(x, w) + ε, where p(ε|β) = N(ε | 0, β^{-1}),
• or, equivalently,
 p(t | x, w, β) = N(t | y(x, w), β^{-1}).
• Given observed inputs X = {x_1, …, x_N} and targets t = [t_1, …, t_N]^T, we obtain the likelihood function
 p(t | X, w, β) = ∏_{n=1}^N N(t_n | w^T φ(x_n), β^{-1}).
Maximum Likelihood and Least Squares (2)
Taking the logarithm, we get
 ln p(t | w, β) = Σ_{n=1}^N ln N(t_n | w^T φ(x_n), β^{-1}) = (N/2) ln β − (N/2) ln(2π) − β E_D(w),
where the sum-of-squares error is
 E_D(w) = (1/2) Σ_{n=1}^N { t_n − w^T φ(x_n) }².
Maximum Likelihood and Least Squares (3)
Computing the gradient and setting it to zero yields
 ∇_w ln p(t | w, β) = β Σ_{n=1}^N { t_n − w^T φ(x_n) } φ(x_n)^T = 0.
Solving for w,
 w_ML = (Φ^T Φ)^{-1} Φ^T t = Φ† t,
where Φ† = (Φ^T Φ)^{-1} Φ^T is the Moore-Penrose pseudo-inverse of the design matrix Φ.
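A short NumPy sketch of this solution on synthetic data; the polynomial design matrix and noise level are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.shape)   # toy targets

Phi = np.vander(x, 4, increasing=True)          # cubic polynomial design matrix (assumed)
w_ml = np.linalg.pinv(Phi) @ t                  # w_ML = (Phi^T Phi)^{-1} Phi^T t via pseudo-inverse
print(w_ml)
```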
Maximum Likelihood and Least Squares (4)
Maximizing with respect to the bias w_0 alone gives
 w_0 = t̄ − Σ_{j=1}^{M−1} w_j φ̄_j,  with t̄ = (1/N) Σ_n t_n and φ̄_j = (1/N) Σ_n φ_j(x_n),
i.e. the bias compensates for the difference between the mean target and the weighted mean of the basis-function values.
We can also maximize with respect to the noise precision β, giving
 1/β_ML = (1/N) Σ_{n=1}^N { t_n − w_ML^T φ(x_n) }².
Geometry of Least Squares
Consider the N-dimensional space whose axes are given by the target values t_n. Each basis function, evaluated at the N data points, is also a vector in this space; the M such vectors span an M-dimensional subspace S. w_ML minimizes the distance between t and its orthogonal projection onto S, i.e. y.
Least mean squares: an alternative approach for big datasets
 w^(τ+1) = w^(τ) − η ∇_w E_n( w^(τ) ),
i.e. the weights after seeing training case τ+1 equal the weights at time τ minus the learning rate η times the derivatives of the squared error on the training case at time τ with respect to the weights.
This is "on-line" learning. It is efficient if the dataset is redundant, and it is simple to implement.
• It is called stochastic gradient descent if the training cases are picked randomly.
• Care must be taken with the learning rate to prevent divergent oscillations. The rate must decrease with τ to get a good fit.
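A minimal stochastic-gradient sketch in NumPy matching this update rule; the basis, learning-rate schedule and data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 200)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.shape)   # toy targets
Phi = np.vander(x, 4, increasing=True)                    # cubic polynomial basis (assumed)

w = np.zeros(Phi.shape[1])
eta0 = 0.1                                                # initial learning rate (assumed)
for tau in range(5000):
    n = rng.integers(len(t))                              # pick a training case at random
    eta = eta0 / (1 + tau / 1000)                         # rate decreasing with tau
    grad = -(t[n] - w @ Phi[n]) * Phi[n]                  # gradient of 0.5 * (t_n - w^T phi_n)^2
    w = w - eta * grad                                    # w^(tau+1) = w^(tau) - eta * grad
print(w)
```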
Regularized least squares
 Ẽ(w) = (1/2) Σ_{n=1}^N { y(x_n, w) − t_n }² + (λ/2) ‖w‖²
The squared-weights penalty is mathematically compatible with the squared error function, giving a closed form for the optimal weights:
 w* = (λ I + X^T X)^{-1} X^T t,
where I is the identity matrix.
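A short NumPy sketch of the closed-form regularized solution; λ, the design matrix and the toy data are chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.shape)   # toy targets
X = np.vander(x, 10, increasing=True)                     # deliberately flexible polynomial model (assumed)

lam = 1e-3                                                # regularization strength lambda (assumed)
M = X.shape[1]
w_star = np.linalg.solve(lam * np.eye(M) + X.T @ X, X.T @ t)   # w* = (lambda I + X^T X)^{-1} X^T t
print(w_star)
```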
A picture of the effect of the regularizer • The overall cost function is the sum of two parabolic bowls. • The sum is also a parabolic bowl. • The combined minimum lies on the line between the minimum of the squared error and the origin. • The L2 regularizer just shrinks the weights.