

  1. CS480/680 Lecture 7: May 29, 2019
     Classification with Mixture of Gaussians
     [B] Section 4.2, [M] Section 4.2
     University of Waterloo, CS480/680 Spring 2019, Pascal Poupart

  2. Linear Models
     β€’ Probabilistic Generative Models
       – Regression
       – Classification

  3. Probabilistic Generative Model
     β€’ $\Pr(C)$: prior probability of class $C$
     β€’ $\Pr(\mathbf{x}|C)$: class conditional distribution of $\mathbf{x}$
     β€’ Classification: compute the posterior $\Pr(C|\mathbf{x})$ according to Bayes' theorem:
       $$\Pr(C|\mathbf{x}) = \frac{\Pr(\mathbf{x}|C)\Pr(C)}{\sum_{C'}\Pr(\mathbf{x}|C')\Pr(C')} = k \Pr(\mathbf{x}|C)\Pr(C)$$
       where $k$ is a normalization constant.
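
To make the Bayes computation concrete, here is a minimal numeric sketch (not from the slides); the prior and likelihood values are invented for illustration:

```python
import numpy as np

# Hypothetical numbers: priors over two classes and the class-conditional
# densities evaluated at a single observed input x.
prior = np.array([0.3, 0.7])       # Pr(c1), Pr(c2)
likelihood = np.array([0.9, 0.2])  # Pr(x|c1), Pr(x|c2) at the observed x

joint = likelihood * prior         # Pr(x|c) Pr(c) for each class
posterior = joint / joint.sum()    # Bayes' theorem: normalize over classes
print(posterior)                   # [0.6585... 0.3414...]
```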

  4. Assumptions
     β€’ In classification, the number of classes is finite, so a natural prior $\Pr(C)$ is the multinomial: $\Pr(C = c_k) = \pi_k$
     β€’ When $\mathbf{x} \in \mathbb{R}^d$, it is often OK to assume that $\Pr(\mathbf{x}|C)$ is Gaussian.
     β€’ Furthermore, assume that the same covariance matrix $\Sigma$ is used for each class:
       $$\Pr(\mathbf{x}|c_k) \propto e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}_k)}$$
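
A small sketch of evaluating this Gaussian class-conditional density with numpy; the 2-D parameter values below are made up, with one $\Sigma$ shared by both classes:

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^T Sigma^{-1} (x-mu)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# Made-up 2-D parameters; the same Sigma is shared by both classes.
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
x = np.array([1.0, 0.5])
print(gaussian_density(x, mu1, Sigma), gaussian_density(x, mu2, Sigma))
```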

  5. Posterior Distribution
     $$\Pr(c_1|\mathbf{x}) = \frac{\pi_1\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)}}{\sum_k \pi_k\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)}} = \frac{\pi_1\, e^{-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x} + \boldsymbol{\mu}_1^T\Sigma^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1}}{\sum_k \pi_k\, e^{-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x} + \boldsymbol{\mu}_k^T\Sigma^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_k^T\Sigma^{-1}\boldsymbol{\mu}_k}}$$
     Consider two classes $c_1$ and $c_2$:
     $$\Pr(c_1|\mathbf{x}) = \frac{1}{1 + \frac{\pi_2}{\pi_1}\, e^{\boldsymbol{\mu}_2^T\Sigma^{-1}\mathbf{x} - \frac{1}{2}\boldsymbol{\mu}_2^T\Sigma^{-1}\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1^T\Sigma^{-1}\mathbf{x} + \frac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1}}$$

  6. Posterior Distribution
     $$\Pr(c_1|\mathbf{x}) = \frac{1}{1 + e^{-(\mathbf{w}^T\mathbf{x} + w_0)}}$$
     where $\mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$
     and $w_0 = -\frac{1}{2}\boldsymbol{\mu}_1^T\Sigma^{-1}\boldsymbol{\mu}_1 + \frac{1}{2}\boldsymbol{\mu}_2^T\Sigma^{-1}\boldsymbol{\mu}_2 + \ln\frac{\pi_1}{\pi_2}$
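
A short sketch of computing $\mathbf{w}$ and $w_0$ from these formulas; the parameter values are hypothetical:

```python
import numpy as np

# Hypothetical two-class parameters with a shared covariance.
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
pi1, pi2 = 0.5, 0.5

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)             # w = Sigma^{-1} (mu1 - mu2)
w0 = (-0.5 * mu1 @ Sigma_inv @ mu1      # -1/2 mu1^T Sigma^{-1} mu1
      + 0.5 * mu2 @ Sigma_inv @ mu2     # +1/2 mu2^T Sigma^{-1} mu2
      + np.log(pi1 / pi2))              # + ln(pi1 / pi2)
print(w, w0)
```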

  7. Logistic Sigmoid
     β€’ Let $\sigma(a) = \frac{1}{1+e^{-a}}$ (logistic sigmoid)
     β€’ Then $\Pr(c_1|\mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + w_0)$
     β€’ Picture: [plot of the sigmoid curve]
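
A possible implementation of the sigmoid; the two-branch form is one common way to avoid overflow for large negative $a$, a numerical detail the slides do not cover:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + e^{-a}), evaluated stably for large |a|."""
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    pos = a >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-a[pos]))
    ea = np.exp(a[~pos])        # for a < 0, use the equivalent e^a / (1 + e^a)
    out[~pos] = ea / (1.0 + ea)
    return out

print(sigmoid(np.array([-800.0, 0.0, 800.0])))  # [0.  0.5 1. ] with no overflow
```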

  8. Logistic Sigmoid
     [Figure: Gaussian class conditionals and the resulting sigmoidal posterior $\Pr(c_1|\mathbf{x})$]

  9. Prediction
     $$\text{best class} = \operatorname{argmax}_k \Pr(c_k|\mathbf{x}) = \begin{cases} c_1 & \sigma(\mathbf{w}^T\mathbf{x} + w_0) \ge 0.5 \\ c_2 & \text{otherwise} \end{cases}$$
     Class boundary: $\sigma(\mathbf{w}^T\mathbf{x} + w_0) = 0.5$
     $$\implies \frac{1}{1+e^{-(\mathbf{w}^T\mathbf{x}+w_0)}} = 0.5 \implies \mathbf{w}^T\mathbf{x} + w_0 = 0 \quad \therefore \text{linear separator}$$
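
A sketch of this decision rule; since $\sigma(a) \ge 0.5$ exactly when $a \ge 0$, thresholding the linear score suffices. The weights below are hypothetical stand-ins for the formulas on slide 6:

```python
import numpy as np

# Hypothetical weights; in practice w and w0 come from the slide 6 formulas.
w, w0 = np.array([-2.4, -1.6]), 4.0

def predict(X):
    """Return 1 for class c1 and 2 for class c2; the boundary is w^T x + w0 = 0."""
    a = X @ w + w0
    return np.where(a >= 0, 1, 2)  # sigma(a) >= 0.5 exactly when a >= 0

X = np.array([[0.0, 0.0], [2.0, 2.0]])
print(predict(X))                  # [1 2]
```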

  10. Multi-class Problems
     β€’ Consider Gaussian class-conditional distributions with identical $\Sigma$:
       $$\Pr(c_k|\mathbf{x}) = \frac{\Pr(c_k)\Pr(\mathbf{x}|c_k)}{\sum_j \Pr(c_j)\Pr(\mathbf{x}|c_j)} = \frac{\pi_k\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)}}{\sum_j \pi_j\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_j)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_j)}} = \frac{e^{\mathbf{w}_k^T\mathbf{x} + w_{k0}}}{\sum_j e^{\mathbf{w}_j^T\mathbf{x} + w_{j0}}} \implies \text{softmax}$$
       where $\mathbf{w}_k = \Sigma^{-1}\boldsymbol{\mu}_k$ and $w_{k0} = -\frac{1}{2}\boldsymbol{\mu}_k^T\Sigma^{-1}\boldsymbol{\mu}_k + \ln\pi_k$
       (the quadratic term $-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x}$ is the same for every class and cancels)
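
A sketch of the multi-class posterior built from these formulas, with invented parameters for three classes; subtracting the maximum score before exponentiating is a standard stability trick, not something the slide discusses:

```python
import numpy as np

# Made-up parameters for three classes sharing one covariance matrix.
Sigma = np.eye(2)
mus = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])  # row k holds mu_k
pis = np.array([0.5, 0.25, 0.25])                     # priors pi_k

Sigma_inv = np.linalg.inv(Sigma)
W = mus @ Sigma_inv                                   # row k: w_k = Sigma^{-1} mu_k
w0 = -0.5 * np.einsum('ki,ij,kj->k', mus, Sigma_inv, mus) + np.log(pis)

def posterior(x):
    a = W @ x + w0
    a = a - a.max()        # subtract the max score for numerical stability
    e = np.exp(a)
    return e / e.sum()     # softmax over the classes

print(posterior(np.array([1.0, 1.0])))  # sums to 1
```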

  11. Softmax
     β€’ When there are several classes, the posterior is a softmax (a generalization of the sigmoid)
     β€’ Softmax distribution: $\Pr(c_k|\mathbf{x}) = \frac{e^{f_k(\mathbf{x})}}{\sum_j e^{f_j(\mathbf{x})}}$
     β€’ Argmax distribution: $\Pr(c_k|\mathbf{x}) = \begin{cases} 1 & \text{if } k = \operatorname{argmax}_j f_j(\mathbf{x}) \\ 0 & \text{otherwise} \end{cases}$
       $= \lim_{t\to\infty} \frac{e^{t f_k(\mathbf{x})}}{\sum_j e^{t f_j(\mathbf{x})}} \approx \frac{e^{f_k(\mathbf{x})}}{\sum_j e^{f_j(\mathbf{x})}}$ (softmax approximation); see the sketch below.
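
A sketch of the temperature-scaled softmax: as the scale $t$ grows, the distribution approaches the argmax distribution, which is the limit written above:

```python
import numpy as np

def softmax(f, t=1.0):
    """Softmax of scores f scaled by t; as t -> infinity it approaches argmax."""
    a = t * np.asarray(f, dtype=float)
    a = a - a.max()          # stability shift; does not change the result
    e = np.exp(a)
    return e / e.sum()

f = [1.0, 2.0, 1.5]
for t in (1, 10, 100):
    print(t, softmax(f, t))  # probability mass concentrates on index 1 as t grows
```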

  12. Softmax
     [Figure: Gaussian class conditionals and the resulting softmax posterior]

  13. Parameter Estimation
     β€’ Where do $\Pr(c_k)$ and $\Pr(\mathbf{x}|c_k)$ come from?
     β€’ Parameters: $\pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \Sigma$
       $$\Pr(c_1) = \pi \qquad \Pr(\mathbf{x}|c_1) = k(\Sigma)\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)}$$
       $$\Pr(c_2) = 1-\pi \qquad \Pr(\mathbf{x}|c_2) = k(\Sigma)\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_2)}$$
       where $k(\Sigma)$ is the normalization constant, which depends on $\Sigma$
     β€’ Estimate parameters by
       – Maximum likelihood
       – Maximum a posteriori
       – Bayesian learning

  14. Maximum Likelihood Solution
     β€’ Likelihood ($y_n \in \{0,1\}$):
       $$L(\mathbf{X}, \mathbf{y}) = \Pr(\mathbf{X}, \mathbf{y} \mid \pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \Sigma) = \prod_n \left[\pi\, N(\mathbf{x}_n|\boldsymbol{\mu}_1, \Sigma)\right]^{y_n} \left[(1-\pi)\, N(\mathbf{x}_n|\boldsymbol{\mu}_2, \Sigma)\right]^{1-y_n}$$
     β€’ ML hypothesis:
       $$(\pi^*, \boldsymbol{\mu}_1^*, \boldsymbol{\mu}_2^*, \Sigma^*) = \operatorname{argmax}_{\pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \Sigma} \sum_n y_n \left[\ln\pi + \ln k(\Sigma) - \tfrac{1}{2}(\mathbf{x}_n-\boldsymbol{\mu}_1)^T\Sigma^{-1}(\mathbf{x}_n-\boldsymbol{\mu}_1)\right]$$
       $$\qquad\qquad + (1-y_n)\left[\ln(1-\pi) + \ln k(\Sigma) - \tfrac{1}{2}(\mathbf{x}_n-\boldsymbol{\mu}_2)^T\Sigma^{-1}(\mathbf{x}_n-\boldsymbol{\mu}_2)\right]$$
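
A sketch of the log-likelihood written directly from the expression above, assuming numpy; `y` holds the 0/1 class indicators:

```python
import numpy as np

def log_likelihood(X, y, pi, mu1, mu2, Sigma):
    """ln L(X, y) for the two-class shared-covariance Gaussian model.
    y[n] = 1 marks class 1 and y[n] = 0 marks class 2."""
    d = X.shape[1]
    Sigma_inv = np.linalg.inv(Sigma)
    # ln k(Sigma): log of the Gaussian normalization constant
    ln_k = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma)))
    d1, d2 = X - mu1, X - mu2
    q1 = np.einsum('ni,ij,nj->n', d1, Sigma_inv, d1)  # quadratic terms, class 1
    q2 = np.einsum('ni,ij,nj->n', d2, Sigma_inv, d2)  # quadratic terms, class 2
    return np.sum(y * (np.log(pi) + ln_k - 0.5 * q1)
                  + (1 - y) * (np.log(1 - pi) + ln_k - 0.5 * q2))

X = np.array([[1.0, 2.0], [3.0, 0.0]])
y = np.array([1, 0])
print(log_likelihood(X, y, 0.5, np.zeros(2), np.ones(2), np.eye(2)))
```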

  15. Maximum Likelihood Solution
     β€’ Set the derivative to 0:
       $$0 = \frac{\partial \ln L(\mathbf{X},\mathbf{y})}{\partial \pi}
       \implies 0 = \sum_n \frac{y_n}{\pi} - \frac{1-y_n}{1-\pi}
       \implies 0 = \sum_n y_n(1-\pi) + (1-y_n)(-\pi)$$
       $$\implies \sum_n y_n = \pi \sum_n y_n + \pi \sum_n (1-y_n)
       \implies \sum_n y_n = \pi N \quad \text{(where } N \text{ is the number of training points)}$$
       $$\therefore \pi = \frac{\sum_n y_n}{N}$$
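
In code this estimate is just the fraction of class-1 points; a tiny sketch with made-up labels:

```python
import numpy as np

# Made-up 0/1 labels; the ML estimate of pi is the fraction of class-1 points.
y = np.array([1, 0, 1, 1, 0])
pi_hat = y.sum() / len(y)  # sum_n y_n / N
print(pi_hat)              # 0.6
```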

  16. Maximum Likelihood Solution
       $$0 = \frac{\partial \ln L(\mathbf{X},\mathbf{y})}{\partial \boldsymbol{\mu}_1}
       \implies 0 = \sum_n y_n \left[-\Sigma^{-1}(\mathbf{x}_n - \boldsymbol{\mu}_1)\right]
       \implies \sum_n y_n \mathbf{x}_n = \sum_n y_n \boldsymbol{\mu}_1 = N_1 \boldsymbol{\mu}_1$$
       $$\therefore \boldsymbol{\mu}_1 = \frac{\sum_n y_n \mathbf{x}_n}{N_1} \qquad \text{Similarly:} \quad \boldsymbol{\mu}_2 = \frac{\sum_n (1-y_n)\mathbf{x}_n}{N_2}$$
       where $N_1$ is the number of data points in class 1 and $N_2$ is the number of data points in class 2.
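
A sketch of the two mean estimates on a tiny made-up dataset:

```python
import numpy as np

# Tiny made-up dataset: rows of X are points, y marks class membership.
X = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 2.0], [4.0, 1.0]])
y = np.array([1, 0, 1, 0])

mu1_hat = (y[:, None] * X).sum(axis=0) / y.sum()              # sum_n y_n x_n / N1
mu2_hat = ((1 - y)[:, None] * X).sum(axis=0) / (1 - y).sum()  # class-2 analogue
print(mu1_hat, mu2_hat)    # [1.5 2. ] [3.5 0.5]
```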

  17. Maximum Likelihood
       $$\frac{\partial \ln L(\mathbf{X},\mathbf{y})}{\partial \Sigma} = 0 \implies \cdots \implies \Sigma = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2$$
       where $S_1 = \frac{1}{N_1}\sum_{n \in C_1} (\mathbf{x}_n - \boldsymbol{\mu}_1)(\mathbf{x}_n - \boldsymbol{\mu}_1)^T$
       and $S_2 = \frac{1}{N_2}\sum_{n \in C_2} (\mathbf{x}_n - \boldsymbol{\mu}_2)(\mathbf{x}_n - \boldsymbol{\mu}_2)^T$
       ($S_k$ is the empirical covariance matrix of class $k$)
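
A sketch assembling the shared covariance estimate from the per-class empirical covariances; the data and means are made-up values consistent with the previous sketch:

```python
import numpy as np

def ml_shared_covariance(X, y, mu1, mu2):
    """Sigma = (N1/N) S1 + (N2/N) S2: weighted per-class empirical covariances."""
    N = len(y)
    d1 = (X - mu1)[y == 1]    # centered class-1 points
    d2 = (X - mu2)[y == 0]    # centered class-2 points
    S1 = d1.T @ d1 / len(d1)  # empirical covariance of class 1
    S2 = d2.T @ d2 / len(d2)  # empirical covariance of class 2
    return (len(d1) * S1 + len(d2) * S2) / N

X = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 2.0], [4.0, 1.0]])
y = np.array([1, 0, 1, 0])
mu1, mu2 = np.array([1.5, 2.0]), np.array([3.5, 0.5])
print(ml_shared_covariance(X, y, mu1, mu2))
```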
