CS480/680 Lecture 7: May 29, 2019
Classification with Mixture of Gaussians
[B] Section 4.2, [M] Section 4.2
University of Waterloo, CS480/680 Spring 2019, Pascal Poupart
Linear Models
• Regression
• Classification → probabilistic generative models (this lecture)
Probabilistic Generative Model
• $\Pr(C)$: prior probability of class $C$
• $\Pr(x|C)$: class-conditional distribution of $x$
• Classification: compute the posterior $\Pr(C|x)$ according to Bayes' theorem:
$$\Pr(C|x) = \frac{\Pr(x|C)\,\Pr(C)}{\sum_{C'} \Pr(x|C')\,\Pr(C')} \propto \Pr(x|C)\,\Pr(C)$$
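The classification rule is just the normalized product of likelihood and prior. Below is a minimal sketch of that computation, assuming the class-conditional densities are already available as callables; the names `priors` and `class_densities` are illustrative, not from the slides.

```python
import numpy as np

def class_posterior(x, priors, class_densities):
    """Bayes' theorem: Pr(C_k | x) is proportional to Pr(x | C_k) * Pr(C_k)."""
    joint = np.array([p(x) * prior for p, prior in zip(class_densities, priors)])
    return joint / joint.sum()   # normalize over classes
```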
Assumptions
• In classification, the number of classes is finite, so a natural prior $\Pr(C)$ is the multinomial: $\Pr(C = c_k) = \pi_k$.
• When $x \in \mathbb{R}^d$, it is often OK to assume that $\Pr(x|C)$ is Gaussian.
• Furthermore, assume that the same covariance matrix $\Sigma$ is used for each class:
$$\Pr(x|c_k) \propto e^{-\frac{1}{2}(x-\mu_k)^T \Sigma^{-1} (x-\mu_k)}$$
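As a concrete sketch of the shared-covariance assumption, the class-conditional densities could be built with SciPy; the 2-D numbers below are made up purely for illustration and can be plugged into the `class_posterior` sketch above.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical two-class, 2-D example: different means, one shared covariance.
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

p_x_given_c1 = multivariate_normal(mean=mu1, cov=Sigma).pdf
p_x_given_c2 = multivariate_normal(mean=mu2, cov=Sigma).pdf
```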
Posterior Distribution
$$\Pr(c_k|x) = \frac{e^{-\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)}\,\Pr(c_k)}{\sum_{k'} e^{-\frac{1}{2}(x-\mu_{k'})^T\Sigma^{-1}(x-\mu_{k'})}\,\Pr(c_{k'})}
= \frac{e^{-\frac{1}{2}x^T\Sigma^{-1}x + \mu_k^T\Sigma^{-1}x - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k}\,\Pr(c_k)}{\sum_{k'} e^{-\frac{1}{2}x^T\Sigma^{-1}x + \mu_{k'}^T\Sigma^{-1}x - \frac{1}{2}\mu_{k'}^T\Sigma^{-1}\mu_{k'}}\,\Pr(c_{k'})}$$
Consider two classes $c_1$ and $c_2$ (the common factor $e^{-\frac{1}{2}x^T\Sigma^{-1}x}$ cancels):
$$\Pr(c_1|x) = \frac{1}{1 + \frac{\Pr(c_2)}{\Pr(c_1)}\, e^{\,\mu_2^T\Sigma^{-1}x - \frac{1}{2}\mu_2^T\Sigma^{-1}\mu_2 - \mu_1^T\Sigma^{-1}x + \frac{1}{2}\mu_1^T\Sigma^{-1}\mu_1}}$$
Posterior Distribution
$$\Pr(c_1|x) = \frac{1}{1 + e^{-\left(\mu_1^T\Sigma^{-1}x - \mu_2^T\Sigma^{-1}x - \frac{1}{2}\mu_1^T\Sigma^{-1}\mu_1 + \frac{1}{2}\mu_2^T\Sigma^{-1}\mu_2 + \ln\frac{\Pr(c_1)}{\Pr(c_2)}\right)}}
= \frac{1}{1 + e^{-(w^T x + w_0)}}$$
where $w = \Sigma^{-1}(\mu_1 - \mu_2)$ and $w_0 = -\frac{1}{2}\mu_1^T\Sigma^{-1}\mu_1 + \frac{1}{2}\mu_2^T\Sigma^{-1}\mu_2 + \ln\frac{\Pr(c_1)}{\Pr(c_2)}$.
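The closed form for $w$ and $w_0$ translates directly into code. A minimal sketch, assuming a single prior parameter with $\Pr(c_1) = \pi$ and $\Pr(c_2) = 1 - \pi$ (the function name is made up):

```python
import numpy as np

def binary_posterior_params(mu1, mu2, Sigma, pi):
    """w = Sigma^{-1}(mu1 - mu2) and the bias w0 from the two-class derivation."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu2)
    w0 = (-0.5 * mu1 @ Sigma_inv @ mu1
          + 0.5 * mu2 @ Sigma_inv @ mu2
          + np.log(pi / (1.0 - pi)))     # ln Pr(c1)/Pr(c2)
    return w, w0
```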
Logistic Sigmoid
• Let $\sigma(a) = \frac{1}{1 + e^{-a}}$ (the logistic sigmoid).
• Then $\Pr(c_1|x) = \sigma(w^T x + w_0)$.
• Picture: [plot of the logistic sigmoid]
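A short sketch of the sigmoid and the resulting posterior, reusing the $w$, $w_0$ from the previous sketch:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def pr_c1_given_x(x, w, w0):
    """Pr(c1 | x) = sigma(w^T x + w0)."""
    return sigmoid(w @ x + w0)
```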
Logistic Sigmoid
[Figure: class-conditional densities and the resulting posterior $\Pr(c_1|x)$]
Prediction
$$\text{best class} = \operatorname*{argmax}_k \Pr(c_k|x) = \begin{cases} c_1 & \text{if } \sigma(w^T x + w_0) \ge 0.5 \\ c_2 & \text{otherwise} \end{cases}$$
Class boundary: $\sigma(w^T x + w_0) = 0.5$
$\Rightarrow \frac{1}{1 + e^{-(w^T x + w_0)}} = 0.5$
$\Rightarrow w^T x + w_0 = 0$
$\therefore$ linear separator
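Since the 0.5 threshold on the sigmoid corresponds to $w^T x + w_0 = 0$, prediction only needs the sign of the linear score. A minimal sketch:

```python
def predict(x, w, w0):
    """Return class 1 when Pr(c1|x) >= 0.5, i.e. when w^T x + w0 >= 0; otherwise class 2."""
    return 1 if (w @ x + w0) >= 0.0 else 2
```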
Multi-class Problems
• Consider Gaussian class-conditional distributions with identical $\Sigma$:
$$\Pr(c_k|x) = \frac{\Pr(x|c_k)\,\pi_k}{\sum_{k'} \Pr(x|c_{k'})\,\pi_{k'}}
= \frac{e^{-\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)}\,\pi_k}{\sum_{k'} e^{-\frac{1}{2}(x-\mu_{k'})^T\Sigma^{-1}(x-\mu_{k'})}\,\pi_{k'}}
= \frac{e^{\mu_k^T\Sigma^{-1}x - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \ln\pi_k}}{\sum_{k'} e^{\mu_{k'}^T\Sigma^{-1}x - \frac{1}{2}\mu_{k'}^T\Sigma^{-1}\mu_{k'} + \ln\pi_{k'}}}
= \frac{e^{a_k}}{\sum_{k'} e^{a_{k'}}} \;\Rightarrow\; \text{softmax}$$
where $a_k = w_k^T x + w_{k0}$, with $w_k = \Sigma^{-1}\mu_k$ and $w_{k0} = -\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \ln\pi_k$ (the common factor $e^{-\frac{1}{2}x^T\Sigma^{-1}x}$ cancels).
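A sketch of the per-class linear scores $a_k(x)$ under the shared-covariance assumption; the function and argument names are illustrative:

```python
import numpy as np

def multiclass_scores(x, mus, Sigma, priors):
    """a_k(x) = mu_k^T Sigma^{-1} x - 0.5 mu_k^T Sigma^{-1} mu_k + ln(pi_k) for each class k."""
    Sigma_inv = np.linalg.inv(Sigma)
    return np.array([mu @ Sigma_inv @ x - 0.5 * mu @ Sigma_inv @ mu + np.log(p)
                     for mu, p in zip(mus, priors)])
```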
Softmax
• When there are several classes, the posterior is a softmax (a generalization of the sigmoid).
• Softmax distribution: $\Pr(c_k|x) = \frac{e^{a_k(x)}}{\sum_{k'} e^{a_{k'}(x)}}$
• Argmax distribution: $\Pr(c_k|x) = \begin{cases} 1 & \text{if } k = \operatorname{argmax}_{k'} a_{k'}(x) \\ 0 & \text{otherwise} \end{cases}
= \lim_{\beta \to \infty} \frac{e^{\beta a_k(x)}}{\sum_{k'} e^{\beta a_{k'}(x)}} \approx \frac{e^{a_k(x)}}{\sum_{k'} e^{a_{k'}(x)}}$ (softmax approximation)
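A sketch of the softmax over those scores; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result. Scaling the scores by a large factor before the softmax pushes it toward the argmax distribution, matching the limit above.

```python
import numpy as np

def softmax(a):
    """Pr(c_k | x) = exp(a_k) / sum_k' exp(a_k')."""
    a = np.asarray(a, dtype=float)
    e = np.exp(a - a.max())   # shift for numerical stability; the ratios are unchanged
    return e / e.sum()
```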
Softmax
[Figure: class-conditional densities and the resulting posterior class probabilities]
Parameter Estimation
• Where do $\Pr(c_k)$ and $\Pr(x|c_k)$ come from?
• Parameters: $\pi, \mu_1, \mu_2, \Sigma$
$$\Pr(x|c_1) = k(\Sigma)\, e^{-\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)}, \quad \Pr(c_1) = \pi$$
$$\Pr(x|c_2) = k(\Sigma)\, e^{-\frac{1}{2}(x-\mu_2)^T\Sigma^{-1}(x-\mu_2)}, \quad \Pr(c_2) = 1 - \pi$$
where $k(\Sigma)$ is the normalization constant that depends on $\Sigma$.
• Estimate parameters by
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian learning
Maximum Likelihood Solution
• Likelihood ($y_n \in \{0,1\}$, with $y_n = 1$ for class $c_1$):
$$L(\pi, \mu_1, \mu_2, \Sigma) = \Pr(\boldsymbol{y}, \boldsymbol{X} \mid \pi, \mu_1, \mu_2, \Sigma)
= \prod_n \left[\pi\, N(x_n \mid \mu_1, \Sigma)\right]^{y_n} \left[(1-\pi)\, N(x_n \mid \mu_2, \Sigma)\right]^{1-y_n}$$
• ML hypothesis:
$$\pi^*, \mu_1^*, \mu_2^*, \Sigma^* = \operatorname*{argmax}_{\pi, \mu_1, \mu_2, \Sigma}
\sum_n y_n\left[\ln\pi + \ln k(\Sigma) - \tfrac{1}{2}(x_n - \mu_1)^T\Sigma^{-1}(x_n - \mu_1)\right]
+ (1 - y_n)\left[\ln(1-\pi) + \ln k(\Sigma) - \tfrac{1}{2}(x_n - \mu_2)^T\Sigma^{-1}(x_n - \mu_2)\right]$$
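A sketch that evaluates the log of this likelihood for given parameters, useful for sanity-checking the ML estimates derived below; the function name is illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, y, pi, mu1, mu2, Sigma):
    """sum_n y_n [ln pi + ln N(x_n|mu1,Sigma)] + (1 - y_n) [ln(1-pi) + ln N(x_n|mu2,Sigma)]."""
    logp1 = multivariate_normal(mean=mu1, cov=Sigma).logpdf(X)   # one value per row of X
    logp2 = multivariate_normal(mean=mu2, cov=Sigma).logpdf(X)
    y = np.asarray(y)
    return np.sum(y * (np.log(pi) + logp1) + (1 - y) * (np.log(1 - pi) + logp2))
```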
Maximum Likelihood Solution
• Set the derivative to 0:
$$0 = \frac{\partial \ln L}{\partial \pi}
\;\Rightarrow\; 0 = \sum_n \frac{y_n}{\pi} - \frac{1 - y_n}{1 - \pi}
\;\Rightarrow\; 0 = \sum_n y_n(1 - \pi) + (1 - y_n)(-\pi)$$
$$\Rightarrow\; \sum_n y_n = \pi\left[\sum_n y_n + \sum_n (1 - y_n)\right]
\;\Rightarrow\; \sum_n y_n = \pi N \quad \text{(where } N \text{ is the number of training points)}$$
$$\therefore\; \pi = \frac{\sum_n y_n}{N}$$
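In code the ML estimate of $\pi$ is just the fraction of class-1 labels; a minimal sketch:

```python
import numpy as np

def fit_prior(y):
    """ML estimate: pi = (sum_n y_n) / N for binary labels y_n in {0, 1}."""
    y = np.asarray(y)
    return y.sum() / len(y)
```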
Maximum Likelihood Solution
$$0 = \frac{\partial \ln L}{\partial \mu_1}
\;\Rightarrow\; 0 = \sum_n y_n \left[-\Sigma^{-1}(x_n - \mu_1)\right]
\;\Rightarrow\; \sum_n y_n x_n = \sum_n y_n \mu_1
\;\Rightarrow\; \sum_n y_n x_n = N_1 \mu_1$$
$$\therefore\; \mu_1 = \frac{\sum_n y_n x_n}{N_1}
\qquad \text{Similarly: } \mu_2 = \frac{\sum_n (1 - y_n) x_n}{N_2}$$
where $N_1$ is the number of data points in class 1 and $N_2$ is the number of data points in class 2.
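The corresponding estimates of the means are just per-class sample averages; a sketch assuming $y_n = 1$ marks class 1 and $y_n = 0$ marks class 2:

```python
import numpy as np

def fit_means(X, y):
    """ML estimates: mu_k is the average of the points labelled with class k."""
    X, y = np.asarray(X), np.asarray(y)
    mu1 = X[y == 1].mean(axis=0)   # (sum_n y_n x_n) / N1
    mu2 = X[y == 0].mean(axis=0)   # (sum_n (1 - y_n) x_n) / N2
    return mu1, mu2
```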
Maximum Likelihood
$$\frac{\partial \ln L}{\partial \Sigma} = 0
\;\Rightarrow\; \cdots
\;\Rightarrow\; \Sigma = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2$$
where
$$S_1 = \frac{1}{N_1} \sum_{n \in c_1} (x_n - \mu_1)(x_n - \mu_1)^T, \qquad
S_2 = \frac{1}{N_2} \sum_{n \in c_2} (x_n - \mu_2)(x_n - \mu_2)^T$$
($S_k$ is the empirical covariance matrix of class $k$.)
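A sketch of the shared-covariance estimate, the class-size-weighted average of the two empirical covariance matrices:

```python
import numpy as np

def fit_shared_cov(X, y, mu1, mu2):
    """Sigma = (N1/N) S1 + (N2/N) S2, with S_k the empirical covariance of class k."""
    X, y = np.asarray(X), np.asarray(y)
    X1, X2 = X[y == 1], X[y == 0]
    S1 = (X1 - mu1).T @ (X1 - mu1) / len(X1)
    S2 = (X2 - mu2).T @ (X2 - mu2) / len(X2)
    N = len(X)
    return (len(X1) / N) * S1 + (len(X2) / N) * S2
```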