Assignment 3
Zahra Sheikhbahaee, Zeou Hu & Colin Vandenhof
February 2020

1 [2 points] Mixture of Bernoullis

A mixture of Bernoullis model is like the Gaussian mixture model which we've discussed in this course. Each of the mixture components consists of a collection of independent Bernoulli random variables. In general, a mixture model assumes the data are generated by the following process: first we sample z, and then we sample the observables x from a distribution which depends on z, i.e.

\[
p(\mathbf{x}, \mathbf{z}) = p(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z}) \tag{1}
\]

In mixture models, p(z) is always a multinomial distribution with parameter π = {π_1, ..., π_K}, the mixture weights, which satisfy

\[
\sum_{k=1}^{K} \pi_k = 1, \qquad \pi_k \ge 0 \tag{2}
\]

Consider a set of N binary random vectors x_i in a D-dimensional space, i = 1, ..., N, each component of which is governed by a Bernoulli distribution with parameter θ_jk:

\[
p(\mathbf{x}_i \mid z_i = k, \boldsymbol{\theta}) = \prod_{j=1}^{D} \theta_{jk}^{x_{ij}} (1 - \theta_{jk})^{1 - x_{ij}} \tag{3}
\]

We can write the generative model of a mixture model as

\[
\begin{aligned}
p(\mathbf{z} \mid \boldsymbol{\pi}) &\sim \text{Multinomial}(\boldsymbol{\pi}) = \prod_{k=1}^{K} \pi_k^{z_k} \\
p(\mathbf{x} \mid \mathbf{z}, \boldsymbol{\theta}) &\sim \text{Bernoulli}(\boldsymbol{\theta}) = \prod_{k=1}^{K} \big[ \boldsymbol{\theta}_k^{\mathbf{x}} (1 - \boldsymbol{\theta}_k)^{1 - \mathbf{x}} \big]^{z_k}
\end{aligned} \tag{4}
\]

The multinomial distribution is the mixture proportion, and π_k is the weight of the k-th component. So the Bernoulli mixture model is given as

\[
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \prod_{i=1}^{N} \prod_{j=1}^{D} \theta_{jk}^{x_{ij}} (1 - \theta_{jk})^{1 - x_{ij}} \tag{5}
\]
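As a concrete illustration of the generative process in (4)-(5), here is a minimal NumPy sketch that samples a dataset from a Bernoulli mixture. The particular values of K, D, π and θ are arbitrary choices for illustration and are not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bernoulli_mixture(pi, theta, n):
    """Sample n points from the mixture: z_i ~ Multinomial(pi), then
    x_ij ~ Bernoulli(theta[z_i, j]) independently for each dimension j."""
    K, D = theta.shape
    z = rng.choice(K, size=n, p=pi)      # mixture assignments
    x = rng.random((n, D)) < theta[z]    # one Bernoulli draw per dimension
    return x.astype(int), z

# illustrative parameters (not given in the assignment); theta[k, j] stores theta_jk
pi = np.array([0.5, 0.3, 0.2])
theta = rng.uniform(0.1, 0.9, size=(3, 4))
X, Z = sample_bernoulli_mixture(pi, theta, n=1000)
```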

• Show the associated directed graphical model and write down the incomplete-data log likelihood. The complete-data log likelihood for this model can be written as

\[
\ln p(\mathbf{x}, \mathbf{z} \mid \boldsymbol{\pi}, \boldsymbol{\theta}) = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \ln\!\left( \pi_k \prod_{j=1}^{D} p(x_{ij} \mid \theta_{jk}) \right) \tag{6}
\]

In order to derive the EM algorithm, we take the expectation of the complete-data log-likelihood with respect to the posterior distribution of the latent variable z. Write down Q(ξ; ξ^(old)), where ξ = {θ, π}, and the posterior distribution of the latent variable z.

• Derive the updates for π and θ in the M-step for ML estimation in terms of E[z_ik], and write down E[z_ik].

• Consider a mixture distribution p(x) and show that

\[
\mathbb{E}[\mathbf{x}] = \sum_{k=1}^{K} \pi_k \boldsymbol{\theta}_k,
\qquad
\operatorname{cov}[\mathbf{x}] = \sum_{k=1}^{K} \pi_k \left\{ \Sigma_k + \boldsymbol{\theta}_k \boldsymbol{\theta}_k^{T} \right\} - \mathbb{E}[\mathbf{x}]\,\mathbb{E}[\mathbf{x}]^{T} \tag{7}
\]

where Σ_k = diag[θ_ki(1 − θ_ki)].
Hint: solve the second equation in the general case by adding and subtracting a term which is a function of E[x | k] = θ_k.

• We now consider a Bayesian model in which we impose priors on the parameters. We impose the natural conjugate priors, i.e., a Beta prior for each θ_jk and a Dirichlet prior for π:

\[
p(\boldsymbol{\pi} \mid \boldsymbol{\alpha}) \sim \operatorname{Dir}(\boldsymbol{\alpha}), \qquad p(\theta_{jk} \mid a, b) \sim \operatorname{Beta}(a, b) \tag{8}
\]

Show that the M-step for MAP estimation of a mixture of Bernoullis is given by

\[
\theta_{kj} = \frac{\sum_i \mathbb{E}[z_{ik}]\, x_{ij} + a - 1}{\sum_i \mathbb{E}[z_{ik}] + a + b - 2},
\qquad
\pi_k = \frac{\sum_i \mathbb{E}[z_{ik}] + \alpha_k - 1}{N + \sum_k \alpha_k - K} \tag{9}
\]

Hint: for the maximization w.r.t. π_k in the M-step, you need to use a Lagrange multiplier to enforce the constraint on π.
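The derivations above are left to you; as a numerical companion, the following is a minimal NumPy sketch of one EM iteration, assuming the data are stored in an (N, D) binary array X, the responsibilities resp[i, k] play the role of E[z_ik], and θ is stored as a (K, D) array. The M-step implements the MAP updates of equation (9); the ML updates are recovered by setting a = b = 1 and α_k = 1.

```python
import numpy as np
from scipy.special import logsumexp

def e_step(X, pi, theta):
    """Responsibilities E[z_ik] = p(z_i = k | x_i, pi, theta), computed in log space."""
    log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T   # log of eq. (3), shape (N, K)
    log_resp = np.log(pi) + log_lik
    return np.exp(log_resp - logsumexp(log_resp, axis=1, keepdims=True))

def m_step_map(X, resp, a, b, alpha):
    """MAP M-step of eq. (9); alpha is the (K,) Dirichlet prior parameter."""
    N, K = X.shape[0], resp.shape[1]
    Nk = resp.sum(axis=0)                                      # sum_i E[z_ik], shape (K,)
    theta = (resp.T @ X + a - 1) / (Nk[:, None] + a + b - 2)   # shape (K, D)
    pi = (Nk + alpha - 1) / (N + alpha.sum() - K)
    return pi, theta
```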

2 [2 points] Variational Lower Bound for the mixture of Bernoullis

In the mixture of Bernoullis, the multinomial distribution chooses the mixtures. One can assume the conditional probability of each observed component follows a Bernoulli distribution as given in Equation 3. We have priors over π and θ:

\[
p(\boldsymbol{\pi} \mid \boldsymbol{\alpha}) = \frac{\Gamma\!\left(\sum_k \alpha_k\right)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^{K} \pi_k^{\alpha_k - 1},
\qquad
p(\theta_{jk} \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, \theta_{jk}^{a-1} (1 - \theta_{jk})^{b-1} \tag{10}
\]

• If you consider a variational distribution which factorizes between the latent variables and the parameters, then show that the lower bound has the following form:

\[
\mathcal{L} = \mathbb{E}[\ln p(\mathbf{x} \mid \mathbf{z}, \boldsymbol{\theta})] + \mathbb{E}[\ln p(\mathbf{z} \mid \boldsymbol{\pi})] + \mathbb{E}[\ln p(\boldsymbol{\pi} \mid \boldsymbol{\alpha})] + \mathbb{E}[\ln p(\boldsymbol{\theta} \mid a, b)] - \mathbb{E}[\ln q(\mathbf{z})] - \mathbb{E}[\ln q(\boldsymbol{\pi})] - \mathbb{E}[\ln q(\boldsymbol{\theta})] \tag{11}
\]

• Let's assume that the approximate distributions of the parameters of the model have the following form:

\[
q(\boldsymbol{\theta} \mid \eta, \nu) \sim \operatorname{Beta}(\eta, \nu),
\qquad
q(\boldsymbol{\pi} \mid \boldsymbol{\rho}) \sim \operatorname{Dir}(\boldsymbol{\rho}),
\qquad
q(z_k \mid \tau_k) \sim \operatorname{Cat}(\tau_k) \tag{12}
\]

Here ρ, τ, η and ν are variational parameters. Derive the variational update equations for the three variational distributions using the mean-field approximation, which should yield

\[
\rho_k = \alpha_k + \sum_{i=1}^{N} \tau_{ik},
\qquad
\eta_{jk} = a + \sum_{i=1}^{N} \tau_{ik}\, x_{ij},
\qquad
\nu_{jk} = b + \sum_{i=1}^{N} \tau_{ik} (1 - x_{ij}) \tag{13}
\]

\[
\tau_{ik} \propto \exp\!\left\{ \psi(\rho_k) - \psi\!\left(\sum_{k'=1}^{K} \rho_{k'}\right) + \sum_{j=1}^{D} x_{ij}\, \big[\psi(\eta_{jk}) - \psi(\eta_{jk} + \nu_{jk})\big] + \sum_{j=1}^{D} (1 - x_{ij})\, \big[\psi(\nu_{jk}) - \psi(\eta_{jk} + \nu_{jk})\big] \right\}
\]

Hint: use the following properties

\[
\mathbb{E}_{q(\theta)}[\ln \theta_{jk}] = \psi(\eta_{jk}) - \psi(\eta_{jk} + \nu_{jk}),
\qquad
\mathbb{E}_{q(\theta)}[\ln(1 - \theta_{jk})] = \psi(\nu_{jk}) - \psi(\eta_{jk} + \nu_{jk}),
\]
\[
\mathbb{E}_{q(\pi)}[\ln \pi_k] = \psi(\rho_k) - \psi\!\left(\sum_{k'=1}^{K} \rho_{k'}\right),
\qquad
\mathbb{E}_{q(z_k)}[z_k] = \tau_k \tag{14}
\]

where ψ(·) is the digamma function.
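Once the updates in (13) are derived, they can be iterated as a coordinate-ascent loop. The sketch below is only an illustration of those update equations, assuming η and ν are stored as (K, D) arrays, ρ and α as (K,) arrays, and X as an (N, D) binary array; it omits ELBO monitoring and any convergence check.

```python
import numpy as np
from scipy.special import digamma, logsumexp

def cavi_step(X, rho, eta, nu, alpha, a, b):
    """One round of the mean-field updates in eq. (13)."""
    # expected log-parameters under q, via the digamma identities in eq. (14)
    E_ln_pi = digamma(rho) - digamma(rho.sum())          # (K,)
    E_ln_theta = digamma(eta) - digamma(eta + nu)        # (K, D)
    E_ln_1m_theta = digamma(nu) - digamma(eta + nu)      # (K, D)
    # update q(z): unnormalised log tau_ik, then normalise over k
    log_tau = E_ln_pi + X @ E_ln_theta.T + (1 - X) @ E_ln_1m_theta.T   # (N, K)
    tau = np.exp(log_tau - logsumexp(log_tau, axis=1, keepdims=True))
    # update q(pi) and q(theta)
    rho_new = alpha + tau.sum(axis=0)
    eta_new = a + tau.T @ X
    nu_new = b + tau.T @ (1 - X)
    return tau, rho_new, eta_new, nu_new
```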

3 [2 points] Kernel methods

1. The k-nearest neighbors classifier assigns a point x to the majority class of its k nearest neighbors in the training set. Assume that we use squared Euclidean distance to measure the distance to some point x_n in the training set, ‖x − x_n‖². Reformulate this classifier for a nonlinear kernel k using the kernel trick. (A code sketch of a kernelized k-NN appears after this list.)

2. The file circles.csv contains a toy dataset. Each example has two features that represent its coordinates (x_1, x_2) in 2D space. Points belong to one of 5 classes, which correspond to different circles centered at the origin. We would like to perform classification with an additional feature for the squared Euclidean distance to the origin. Write out the appropriate feature map φ((x_1, x_2)) and kernel function k(x, x′).

3. Perform k-nearest neighbors classification with k = 15 using the kernel from (2) and the standard linear kernel. Compare accuracies over 10-fold cross validation. Which version gives better results?
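For part 1, a kernelized k-NN only needs pairwise distances in feature space, which the kernel trick gives as ‖φ(x) − φ(x_n)‖² = k(x, x) − 2 k(x, x_n) + k(x_n, x_n). The sketch below assumes labels are encoded as non-negative integers and that the kernel matrices for a chosen kernel have been computed elsewhere; the feature map and kernel for circles.csv (part 2) and the 10-fold cross-validation comparison (part 3) are left as asked.

```python
import numpy as np

def kernel_knn_predict(K_train, K_test_train, k_test_diag, y_train, k=15):
    """Majority-vote k-NN with feature-space distances from the kernel trick.

    K_train:      (n_train, n_train) Gram matrix k(x_m, x_n) on the training set
    K_test_train: (n_test, n_train)  kernel values k(x, x_n) between test and training points
    k_test_diag:  (n_test,)          values k(x, x) for the test points
    y_train:      (n_train,)         class labels as non-negative integers
    """
    # ||phi(x) - phi(x_n)||^2 = k(x, x) - 2 k(x, x_n) + k(x_n, x_n)
    d2 = k_test_diag[:, None] - 2 * K_test_train + np.diag(K_train)[None, :]
    nearest = np.argsort(d2, axis=1)[:, :k]        # indices of the k nearest training points
    votes = y_train[nearest]                       # (n_test, k) neighbour labels
    return np.array([np.bincount(row).argmax() for row in votes])
```

With the linear kernel k(x, x′) = xᵀx′ this reproduces standard Euclidean k-NN exactly, since the distance expansion then equals ‖x − x_n‖².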

4 [2 points] Support Vector Machine

1. Recall the formulation of the soft-margin (linear) SVM:

\[
\operatorname*{argmin}_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i
\qquad
\text{s.t.} \quad y^{(i)} \left( w^{T} x^{(i)} + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n \tag{15}
\]

During lecture, the support vector machine was introduced geometrically as finding the Max-Margin Classifier. While this geometric interpretation provides useful intuition about how the SVM works, it is hard to relate to other machine learning algorithms such as logistic regression. In this exercise, we show that the soft-margin SVM is equivalent to minimizing a loss function (to be specific, the hinge loss) with L2-regularization, and thus connect it to logistic regression and the goal of binary classification. The hinge loss is defined as V(y, f(x)) = (1 − yf(x))_+, where (s)_+ = max(s, 0). Show that

\[
\operatorname*{argmin}_{w, b} \; \frac{1}{2n} \sum_{i=1}^{n} V(y_i, f(x_i)) + \lambda \|w\|^2 \tag{16}
\]

is equivalent to formulation (15) for some C, where f(x) = w^T x + b. What is the corresponding C (in terms of n and λ)?

2. In the previous question, we chose V(y, f(x)) = (1 − yf(x))_+ (the hinge loss) as our loss function; however, there are other reasonable loss functions that we can choose. For example, we can choose

\[
V(y, f(x)) = \frac{1}{\log(2)} \log\!\left( 1 + e^{-y f(x)} \right),
\]

which is usually called the logistic loss, and

\[
V(y, f(x)) = \begin{cases} 0, & y f(x) \ge 0 \\ 1, & y f(x) < 0, \end{cases}
\]

which is called the 0-1 loss. Please plot the above three loss functions in one figure, with yf(x) as the horizontal axis and V(y, f(x)) as the vertical axis. Explain your observation. (A plotting sketch appears after this list.)

3. [Bonus] Answer the following questions as precisely as you can. What is (16) if we choose the logistic loss V(y, f(x)) = (1/log(2)) log(1 + e^{−yf(x)})? What is (16) if we choose the 0-1 loss? (Long answers receive no score.)
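For part 2, the three losses can be plotted directly as functions of the margin yf(x). The sketch below uses matplotlib and an arbitrary margin range of [−3, 3]; explaining what the figure shows is still the part the question asks of you.

```python
import numpy as np
import matplotlib.pyplot as plt

margin = np.linspace(-3, 3, 601)                   # horizontal axis: y f(x)
hinge = np.maximum(1 - margin, 0.0)                # (1 - y f(x))_+
logistic = np.log(1 + np.exp(-margin)) / np.log(2) # logistic loss, scaled by 1/log(2)
zero_one = (margin < 0).astype(float)              # 0-1 loss

plt.plot(margin, hinge, label="hinge loss")
plt.plot(margin, logistic, label="logistic loss")
plt.plot(margin, zero_one, label="0-1 loss")
plt.xlabel("y f(x)")
plt.ylabel("V(y, f(x))")
plt.legend()
plt.show()
```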
