

  1. CS480/680 Lecture 7: May 29, 2019
     Classification with Mixture of Gaussians
     [B] Section 4.2, [M] Section 4.2
     University of Waterloo, CS480/680 Spring 2019, Pascal Poupart

  2. Linear Models
     β€’ Probabilistic Generative Models
       – Regression
       – Classification

  3. Probabilistic Generative Model
     β€’ $\Pr(C)$: prior probability of class $C$
     β€’ $\Pr(\mathbf{x}|C)$: class conditional distribution of $\mathbf{x}$
     β€’ Classification: compute the posterior $\Pr(C|\mathbf{x})$ according to Bayes' theorem:
       $$\Pr(C|\mathbf{x}) = \frac{\Pr(\mathbf{x}|C)\Pr(C)}{\sum_{C'}\Pr(\mathbf{x}|C')\Pr(C')} = k \Pr(\mathbf{x}|C)\Pr(C)$$
       where $k$ is a normalization constant.
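
To make the Bayes computation concrete, here is a minimal numeric sketch (not from the slides); the prior and likelihood values are invented for illustration:

```python
import numpy as np

# Hypothetical numbers: priors over two classes and the class-conditional
# densities evaluated at a single observed input x.
prior = np.array([0.3, 0.7])       # Pr(c1), Pr(c2)
likelihood = np.array([0.9, 0.2])  # Pr(x|c1), Pr(x|c2) at the observed x

joint = likelihood * prior         # Pr(x|c) Pr(c) for each class
posterior = joint / joint.sum()    # Bayes' theorem: normalize over classes
print(posterior)                   # [0.6585... 0.3414...]
```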

  4. Assumptions
     β€’ In classification, the number of classes is finite, so a natural prior $\Pr(C)$ is the multinomial: $\Pr(C = c_k) = \pi_k$
     β€’ When $\mathbf{x} \in \mathbb{R}^d$, it is often OK to assume that $\Pr(\mathbf{x}|C)$ is Gaussian.
     β€’ Furthermore, assume that the same covariance matrix $\Sigma$ is used for each class:
       $$\Pr(\mathbf{x}|c_k) \propto e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}_k)}$$
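
A small sketch of evaluating this Gaussian class-conditional density with numpy; the 2-D parameter values below are made up, with one $\Sigma$ shared by both classes:

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^T Sigma^{-1} (x-mu)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# Made-up 2-D parameters; the same Sigma is shared by both classes.
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
x = np.array([1.0, 0.5])
print(gaussian_density(x, mu1, Sigma), gaussian_density(x, mu2, Sigma))
```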

  5. Posterior Distribution
     $$\Pr(c_1|\mathbf{x}) = \frac{\pi_1\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)}}{\sum_k \pi_k\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)}} = \frac{\pi_1\, e^{-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x} + \boldsymbol{\mu}_1^T\Sigma^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1}}{\sum_k \pi_k\, e^{-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x} + \boldsymbol{\mu}_k^T\Sigma^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_k^T\Sigma^{-1}\boldsymbol{\mu}_k}}$$
     Consider two classes $c_1$ and $c_2$:
     $$\Pr(c_1|\mathbf{x}) = \frac{1}{1 + \frac{\pi_2}{\pi_1}\, e^{\boldsymbol{\mu}_2^T\Sigma^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_2^T\Sigma^{-1}\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1^T\Sigma^{-1}\mathbf{x} + \frac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1}}$$

  6. Posterior Distribution
     $$\Pr(c_1|\mathbf{x}) = \frac{1}{1 + e^{-(\mathbf{w}^T\mathbf{x} + w_0)}}$$
     where $\mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$
     and $w_0 = -\frac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1 + \frac{1}{2}\boldsymbol{\mu}_2^T\Sigma^{-1}\boldsymbol{\mu}_2 + \ln\frac{\pi_1}{\pi_2}$
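
A short sketch of computing $\mathbf{w}$ and $w_0$ from these formulas; the parameter values are hypothetical:

```python
import numpy as np

# Hypothetical two-class parameters with a shared covariance.
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
pi1, pi2 = 0.5, 0.5

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)             # w = Sigma^{-1} (mu1 - mu2)
w0 = (-0.5 * mu1 @ Sigma_inv @ mu1      # -1/2 mu1^T Sigma^{-1} mu1
      + 0.5 * mu2 @ Sigma_inv @ mu2     # +1/2 mu2^T Sigma^{-1} mu2
      + np.log(pi1 / pi2))              # + ln(pi1 / pi2)
print(w, w0)
```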

  7. Logistic Sigmoid
     β€’ Let $\sigma(a) = \frac{1}{1+e^{-a}}$ (logistic sigmoid)
     β€’ Then $\Pr(c_1|\mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + w_0)$
     β€’ Picture: [plot of the sigmoid curve]
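
A possible implementation of the sigmoid; the two-branch form is one common way to avoid overflow for large negative $a$, a numerical detail the slides do not cover:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + e^{-a}), evaluated stably for large |a|."""
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    pos = a >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-a[pos]))
    ea = np.exp(a[~pos])        # for a < 0, use the equivalent e^a / (1 + e^a)
    out[~pos] = ea / (1.0 + ea)
    return out

print(sigmoid(np.array([-800.0, 0.0, 800.0])))  # [0.  0.5 1. ] with no overflow
```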

  8. Logistic Sigmoid
     [Figure: Gaussian class conditionals and the resulting sigmoidal posterior $\Pr(c_1|\mathbf{x})$]

  9. Prediction
     $$\text{best class} = \operatorname{argmax}_k \Pr(c_k|\mathbf{x}) = \begin{cases} c_1 & \sigma(\mathbf{w}^T\mathbf{x} + w_0) \ge 0.5 \\ c_2 & \text{otherwise} \end{cases}$$
     Class boundary: $\sigma(\mathbf{w}^T\mathbf{x} + w_0) = 0.5$
     $$\implies \frac{1}{1+e^{-(\mathbf{w}^T\mathbf{x}+w_0)}} = 0.5 \implies \mathbf{w}^T\mathbf{x} + w_0 = 0 \quad \therefore \text{linear separator}$$
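
A sketch of this decision rule; since $\sigma(a) \ge 0.5$ exactly when $a \ge 0$, thresholding the linear score suffices. The weights below are hypothetical stand-ins for the formulas on slide 6:

```python
import numpy as np

# Hypothetical weights; in practice w and w0 come from the slide 6 formulas.
w, w0 = np.array([-2.4, -1.6]), 4.0

def predict(X):
    """Return 1 for class c1 and 2 for class c2; the boundary is w^T x + w0 = 0."""
    a = X @ w + w0
    return np.where(a >= 0, 1, 2)  # sigma(a) >= 0.5 exactly when a >= 0

X = np.array([[0.0, 0.0], [2.0, 2.0]])
print(predict(X))                  # [1 2]
```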

  10. Multi-class Problems
     β€’ Consider Gaussian class-conditional distributions with identical $\Sigma$:
       $$\Pr(c_k|\mathbf{x}) = \frac{\Pr(c_k)\Pr(\mathbf{x}|c_k)}{\sum_j \Pr(c_j)\Pr(\mathbf{x}|c_j)} = \frac{\pi_k\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)}}{\sum_j \pi_j\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_j)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_j)}} = \frac{e^{\mathbf{w}_k^T\mathbf{x} + w_{k0}}}{\sum_j e^{\mathbf{w}_j^T\mathbf{x} + w_{j0}}} \implies \text{softmax}$$
       where $\mathbf{w}_k = \Sigma^{-1}\boldsymbol{\mu}_k$ and $w_{k0} = -\frac{1}{2}\boldsymbol{\mu}_k^T\Sigma^{-1}\boldsymbol{\mu}_k + \ln\pi_k$
       (the quadratic term $-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x}$ is the same for every class and cancels)
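
A sketch of the multi-class posterior built from these formulas, with invented parameters for three classes; subtracting the maximum score before exponentiating is a standard stability trick, not something the slide discusses:

```python
import numpy as np

# Made-up parameters for three classes sharing one covariance matrix.
Sigma = np.eye(2)
mus = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])  # row k holds mu_k
pis = np.array([0.5, 0.25, 0.25])                     # priors pi_k

Sigma_inv = np.linalg.inv(Sigma)
W = mus @ Sigma_inv                                   # row k: w_k = Sigma^{-1} mu_k
w0 = -0.5 * np.einsum('ki,ij,kj->k', mus, Sigma_inv, mus) + np.log(pis)

def posterior(x):
    a = W @ x + w0
    a = a - a.max()        # subtract the max score for numerical stability
    e = np.exp(a)
    return e / e.sum()     # softmax over the classes

print(posterior(np.array([1.0, 1.0])))  # sums to 1
```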

  11. Softmax
     β€’ When there are several classes, the posterior is a softmax (a generalization of the sigmoid)
     β€’ Softmax distribution: $\Pr(c_k|\mathbf{x}) = \frac{e^{f_k(\mathbf{x})}}{\sum_j e^{f_j(\mathbf{x})}}$
     β€’ Argmax distribution: $\Pr(c_k|\mathbf{x}) = \begin{cases} 1 & \text{if } k = \operatorname{argmax}_j f_j(\mathbf{x}) \\ 0 & \text{otherwise} \end{cases}$
       $= \lim_{t\to\infty} \frac{e^{t f_k(\mathbf{x})}}{\sum_j e^{t f_j(\mathbf{x})}} \approx \frac{e^{f_k(\mathbf{x})}}{\sum_j e^{f_j(\mathbf{x})}}$ (softmax approximation); see the sketch below.
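
A sketch of the temperature-scaled softmax: as the scale $t$ grows, the distribution approaches the argmax distribution, which is the limit written above:

```python
import numpy as np

def softmax(f, t=1.0):
    """Softmax of scores f scaled by t; as t -> infinity it approaches argmax."""
    a = t * np.asarray(f, dtype=float)
    a = a - a.max()          # stability shift; does not change the result
    e = np.exp(a)
    return e / e.sum()

f = [1.0, 2.0, 1.5]
for t in (1, 10, 100):
    print(t, softmax(f, t))  # probability mass concentrates on index 1 as t grows
```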

  12. Softmax
     [Figure: Gaussian class conditionals and the resulting softmax posterior]

  13. Parameter Estimation
     β€’ Where do $\Pr(c_k)$ and $\Pr(\mathbf{x}|c_k)$ come from?
     β€’ Parameters: $\pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \Sigma$
       $$\Pr(c_1) = \pi \qquad \Pr(\mathbf{x}|c_1) = k(\Sigma)\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)}$$
       $$\Pr(c_2) = 1-\pi \qquad \Pr(\mathbf{x}|c_2) = k(\Sigma)\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_2)}$$
       where $k(\Sigma)$ is the normalization constant, which depends on $\Sigma$
     β€’ Estimate parameters by
       – Maximum likelihood
       – Maximum a posteriori
       – Bayesian learning

  14. Maximum Likelihood Solution
     β€’ Likelihood ($y_n \in \{0,1\}$):
       $$L(\mathbf{X}, \mathbf{y}) = \Pr(\mathbf{X}, \mathbf{y} \mid \pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \Sigma) = \prod_n \left[\pi\, N(\mathbf{x}_n|\boldsymbol{\mu}_1, \Sigma)\right]^{y_n} \left[(1-\pi)\, N(\mathbf{x}_n|\boldsymbol{\mu}_2, \Sigma)\right]^{1-y_n}$$
     β€’ ML hypothesis:
       $$(\pi^*, \boldsymbol{\mu}_1^*, \boldsymbol{\mu}_2^*, \Sigma^*) = \operatorname{argmax}_{\pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \Sigma} \sum_n y_n \left[\ln\pi + \ln k(\Sigma) - \tfrac{1}{2}(\mathbf{x}_n-\boldsymbol{\mu}_1)^T\Sigma^{-1}(\mathbf{x}_n-\boldsymbol{\mu}_1)\right]$$
       $$\qquad\qquad + (1-y_n)\left[\ln(1-\pi) + \ln k(\Sigma) - \tfrac{1}{2}(\mathbf{x}_n-\boldsymbol{\mu}_2)^T\Sigma^{-1}(\mathbf{x}_n-\boldsymbol{\mu}_2)\right]$$
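
A sketch of the log-likelihood written directly from the expression above, assuming numpy; `y` holds the 0/1 class indicators:

```python
import numpy as np

def log_likelihood(X, y, pi, mu1, mu2, Sigma):
    """ln L(X, y) for the two-class shared-covariance Gaussian model.
    y[n] = 1 marks class 1 and y[n] = 0 marks class 2."""
    d = X.shape[1]
    Sigma_inv = np.linalg.inv(Sigma)
    # ln k(Sigma): log of the Gaussian normalization constant
    ln_k = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma)))
    d1, d2 = X - mu1, X - mu2
    q1 = np.einsum('ni,ij,nj->n', d1, Sigma_inv, d1)  # quadratic terms, class 1
    q2 = np.einsum('ni,ij,nj->n', d2, Sigma_inv, d2)  # quadratic terms, class 2
    return np.sum(y * (np.log(pi) + ln_k - 0.5 * q1)
                  + (1 - y) * (np.log(1 - pi) + ln_k - 0.5 * q2))

X = np.array([[1.0, 2.0], [3.0, 0.0]])
y = np.array([1, 0])
print(log_likelihood(X, y, 0.5, np.zeros(2), np.ones(2), np.eye(2)))
```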

  15. Maximum Likelihood Solution
     β€’ Set the derivative to 0:
       $$0 = \frac{\partial \ln L(\mathbf{X},\mathbf{y})}{\partial \pi}
       \implies 0 = \sum_n \frac{y_n}{\pi} - \frac{1-y_n}{1-\pi}
       \implies 0 = \sum_n y_n(1-\pi) + (1-y_n)(-\pi)$$
       $$\implies \sum_n y_n = \pi \sum_n y_n + \pi \sum_n (1-y_n)
       \implies \sum_n y_n = \pi N \quad \text{(where } N \text{ is the number of training points)}$$
       $$\therefore \pi = \frac{\sum_n y_n}{N}$$
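
In code this estimate is just the fraction of class-1 points; a tiny sketch with made-up labels:

```python
import numpy as np

# Made-up 0/1 labels; the ML estimate of pi is the fraction of class-1 points.
y = np.array([1, 0, 1, 1, 0])
pi_hat = y.sum() / len(y)  # sum_n y_n / N
print(pi_hat)              # 0.6
```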

  16. Maximum Likelihood Solution
       $$0 = \frac{\partial \ln L(\mathbf{X},\mathbf{y})}{\partial \boldsymbol{\mu}_1}
       \implies 0 = \sum_n y_n \left[-\Sigma^{-1}(\mathbf{x}_n - \boldsymbol{\mu}_1)\right]
       \implies \sum_n y_n \mathbf{x}_n = \sum_n y_n \boldsymbol{\mu}_1 = N_1 \boldsymbol{\mu}_1$$
       $$\therefore \boldsymbol{\mu}_1 = \frac{\sum_n y_n \mathbf{x}_n}{N_1} \qquad \text{Similarly:} \quad \boldsymbol{\mu}_2 = \frac{\sum_n (1-y_n)\mathbf{x}_n}{N_2}$$
       where $N_1$ is the number of data points in class 1 and $N_2$ is the number of data points in class 2.
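
A sketch of the two mean estimates on a tiny made-up dataset:

```python
import numpy as np

# Tiny made-up dataset: rows of X are points, y marks class membership.
X = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 2.0], [4.0, 1.0]])
y = np.array([1, 0, 1, 0])

mu1_hat = (y[:, None] * X).sum(axis=0) / y.sum()              # sum_n y_n x_n / N1
mu2_hat = ((1 - y)[:, None] * X).sum(axis=0) / (1 - y).sum()  # class-2 analogue
print(mu1_hat, mu2_hat)    # [1.5 2. ] [3.5 0.5]
```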

  17. Maximum Likelihood
       $$\frac{\partial \ln L(\mathbf{X},\mathbf{y})}{\partial \Sigma} = 0 \implies \cdots \implies \Sigma = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2$$
       where $S_1 = \frac{1}{N_1}\sum_{n \in C_1} (\mathbf{x}_n - \boldsymbol{\mu}_1)(\mathbf{x}_n - \boldsymbol{\mu}_1)^T$
       and $S_2 = \frac{1}{N_2}\sum_{n \in C_2} (\mathbf{x}_n - \boldsymbol{\mu}_2)(\mathbf{x}_n - \boldsymbol{\mu}_2)^T$
       ($S_k$ is the empirical covariance matrix of class $k$)
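
A sketch assembling the shared covariance estimate from the per-class empirical covariances; the data and means are made-up values consistent with the previous sketch:

```python
import numpy as np

def ml_shared_covariance(X, y, mu1, mu2):
    """Sigma = (N1/N) S1 + (N2/N) S2: weighted per-class empirical covariances."""
    N = len(y)
    d1 = (X - mu1)[y == 1]    # centered class-1 points
    d2 = (X - mu2)[y == 0]    # centered class-2 points
    S1 = d1.T @ d1 / len(d1)  # empirical covariance of class 1
    S2 = d2.T @ d2 / len(d2)  # empirical covariance of class 2
    return (len(d1) * S1 + len(d2) * S2) / N

X = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 2.0], [4.0, 1.0]])
y = np.array([1, 0, 1, 0])
mu1, mu2 = np.array([1.5, 2.0]), np.array([3.5, 0.5])
print(ml_shared_covariance(X, y, mu1, mu2))
```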
