Introduction to Machine Learning
Classification: Discriminant Analysis
compstat-lmu.github.io/lecture_i2ml
LINEAR DISCRIMINANT ANALYSIS (LDA)

LDA follows a generative approach:

$$\pi_k(x) = P(y = k \mid x) = \frac{P(x \mid y = k)\, P(y = k)}{P(x)} = \frac{p(x \mid y = k)\, \pi_k}{\sum_{j=1}^{g} p(x \mid y = j)\, \pi_j}$$

where we now have to pick a distributional form for $p(x \mid y = k)$.
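To make the generative mechanics concrete, here is a minimal one-dimensional sketch of Bayes' rule at work; the priors, class means, and query point are made-up toy values, not from the lecture:

```python
import numpy as np
from scipy.stats import norm

# Toy 1D example with two classes: priors pi_k and Gaussian class
# densities p(x | y = k). All numbers are hypothetical.
priors = np.array([0.6, 0.4])                          # pi_1, pi_2
densities = np.array([norm.pdf(1.5, loc=0, scale=1),   # p(x | y = 1)
                      norm.pdf(1.5, loc=2, scale=1)])  # p(x | y = 2)

unnormalized = priors * densities                  # pi_k * p(x | y = k)
posteriors = unnormalized / unnormalized.sum()     # / sum_j pi_j * p(x | y = j)
print(posteriors)                                  # pi_k(x) = P(y = k | x)
```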
LINEAR DISCRIMINANT ANALYSIS (LDA)

LDA assumes that each class density is modeled as a multivariate Gaussian:

$$p(x \mid y = k) = \frac{1}{(2\pi)^{p/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)\right)$$

with equal covariance, i.e. $\Sigma_k = \Sigma \;\forall k$.

[Figure: scatter plot of Gaussian classes in the (X1, X2) plane sharing one covariance structure]
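As a quick sanity check of the density formula, the sketch below evaluates it once by hand and once with scipy's multivariate_normal; the parameters Sigma and mu_k are hypothetical:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters for one class k (made up for illustration).
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mu_k = np.array([3.0, 4.0])
x = np.array([2.5, 4.5])
p = len(x)

# Density evaluated via the formula above ...
diff = x - mu_k
manual = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / (
    (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

# ... and via scipy; the two should agree.
assert np.isclose(manual, multivariate_normal.pdf(x, mean=mu_k, cov=Sigma))
```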
LINEAR DISCRIMINANT ANALYSIS (LDA)

Parameters $\theta$ are estimated in a straightforward manner:

$$\hat{\pi}_k = \frac{n_k}{n}, \quad \text{where } n_k \text{ is the number of class } k \text{ observations}$$

$$\hat{\mu}_k = \frac{1}{n_k} \sum_{i:\, y^{(i)} = k} x^{(i)}$$

$$\hat{\Sigma} = \frac{1}{n - g} \sum_{k=1}^{g} \sum_{i:\, y^{(i)} = k} \left(x^{(i)} - \hat{\mu}_k\right)\left(x^{(i)} - \hat{\mu}_k\right)^T$$

[Figure: scatter plot in the (X1, X2) plane illustrating the estimated class means and pooled covariance]
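These estimators translate almost line by line into code. The following is a minimal sketch; the function name fit_lda and the array layout are our own choices, not part of the lecture:

```python
import numpy as np

def fit_lda(X, y):
    """Estimate LDA parameters: priors, class means, pooled covariance.

    X is an (n, p) feature matrix, y a length-n array of class labels.
    A sketch of the estimators above, not a production implementation.
    """
    classes = np.unique(y)
    n, p = X.shape
    g = len(classes)
    priors = np.array([np.mean(y == k) for k in classes])        # pi_k = n_k / n
    means = np.array([X[y == k].mean(axis=0) for k in classes])  # mu_k
    # Pooled covariance: within-class scatter summed over classes, / (n - g)
    Sigma = np.zeros((p, p))
    for k, mu in zip(classes, means):
        diff = X[y == k] - mu
        Sigma += diff.T @ diff
    Sigma /= (n - g)
    return priors, means, Sigma
```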
LDA AS LINEAR CLASSIFIER

Because all class-specific Gaussians share the same covariance structure, the decision boundaries of LDA are linear.

[Figure: iris data, Petal.Width vs. Petal.Length, with linear decision boundaries separating the species setosa, versicolor and virginica]
LDA AS LINEAR CLASSIFIER

We can formally show that LDA is a linear classifier by showing that the posterior probabilities can be written as linear scoring functions, up to any isotonic / rank-preserving transformation.

$$\pi_k(x) = \frac{\pi_k \cdot p(x \mid y = k)}{p(x)} = \frac{\pi_k \cdot p(x \mid y = k)}{\sum_{j=1}^{g} \pi_j \cdot p(x \mid y = j)}$$

As the denominator is the same for all classes, we only need to consider $\pi_k \cdot p(x \mid y = k)$ and show that it can be written as a linear function of $x$.
LDA AS LINEAR CLASSIFIER

$$\begin{aligned}
\pi_k \cdot p(x \mid y = k)
&\propto \pi_k \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k\right) \\
&= \exp\left(\log \pi_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + x^T \Sigma^{-1} \mu_k\right) \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right) \\
&= \exp\left(\theta_{0k} + x^T \theta_k\right) \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right) \\
&\propto \exp\left(\theta_{0k} + x^T \theta_k\right)
\end{aligned}$$

by defining $\theta_{0k} := \log \pi_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k$ and $\theta_k := \Sigma^{-1} \mu_k$. We have again left out all factors that are the same for all classes $k$: the normalizing constant of our Gaussians and $\exp\left(-\frac{1}{2} x^T \Sigma^{-1} x\right)$.

By finally taking the log, we can write our transformed scores as linear:

$$f_k(x) = \theta_{0k} + x^T \theta_k$$
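A minimal sketch of these linear scores, reusing the parameter estimates from the fit_lda sketch above; lda_scores is a hypothetical helper name:

```python
import numpy as np

def lda_scores(X, priors, means, Sigma):
    """Linear scores f_k(x) = theta_0k + x^T theta_k, one column per class."""
    Sigma_inv = np.linalg.inv(Sigma)
    thetas = means @ Sigma_inv                 # row k is theta_k^T = (Sigma^{-1} mu_k)^T
    theta0 = np.log(priors) - 0.5 * np.sum(thetas * means, axis=1)  # theta_0k
    return theta0 + X @ thetas.T               # (n, g) matrix of f_k(x)
```

Predictions then follow by taking the class with the largest score, e.g. `lda_scores(X, priors, means, Sigma).argmax(axis=1)`.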
QUADRATIC DISCRIMINANT ANALYSIS (QDA)

QDA is a direct generalization of LDA, where the class densities are now Gaussians with unequal covariances $\Sigma_k$:

$$p(x \mid y = k) = \frac{1}{(2\pi)^{p/2}\, |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right)$$

Parameters are estimated in a straightforward manner:

$$\hat{\pi}_k = \frac{n_k}{n}, \quad \text{where } n_k \text{ is the number of class } k \text{ observations}$$

$$\hat{\mu}_k = \frac{1}{n_k} \sum_{i:\, y^{(i)} = k} x^{(i)}$$

$$\hat{\Sigma}_k = \frac{1}{n_k - 1} \sum_{i:\, y^{(i)} = k} \left(x^{(i)} - \hat{\mu}_k\right)\left(x^{(i)} - \hat{\mu}_k\right)^T$$
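A sketch of the corresponding estimation step; compared to the fit_lda sketch above, only the covariance estimate changes (one per class, with denominator $n_k - 1$):

```python
import numpy as np

def fit_qda(X, y):
    """Estimate QDA parameters: priors, class means, per-class covariances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])        # pi_k = n_k / n
    means = np.array([X[y == k].mean(axis=0) for k in classes])  # mu_k
    # np.cov with rowvar=False uses the n_k - 1 denominator by default
    covs = [np.cov(X[y == k], rowvar=False) for k in classes]    # Sigma_k
    return priors, means, covs
```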
QUADRATIC DISCRIMINANT ANALYSIS (QDA)

Covariance matrices can differ across classes. This yields a better fit to the data, but also requires estimating more parameters.

[Figure: scatter plot in the (X1, X2) plane with class-specific covariance ellipses of different shape and orientation]
QUADRATIC DISCRIMINANT ANALYSIS (QDA)

$$\pi_k(x) \propto \pi_k \cdot p(x \mid y = k) \propto \pi_k\, |\Sigma_k|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} x^T \Sigma_k^{-1} x - \frac{1}{2} \mu_k^T \Sigma_k^{-1} \mu_k + x^T \Sigma_k^{-1} \mu_k\right)$$

Taking the log of the above, we can define a discriminant function that is quadratic in $x$:

$$\log \pi_k - \frac{1}{2} \log |\Sigma_k| - \frac{1}{2} x^T \Sigma_k^{-1} x - \frac{1}{2} \mu_k^T \Sigma_k^{-1} \mu_k + x^T \Sigma_k^{-1} \mu_k$$
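A sketch of this quadratic discriminant function for a single class; qda_score is a hypothetical helper, meant to be evaluated for every class and maximized over $k$:

```python
import numpy as np

def qda_score(x, prior, mu, Sigma_k):
    """Quadratic discriminant score for one class, following the log form above."""
    Sigma_inv = np.linalg.inv(Sigma_k)
    _, logdet = np.linalg.slogdet(Sigma_k)      # log|Sigma_k|, numerically stable
    return (np.log(prior) - 0.5 * logdet
            - 0.5 * x @ Sigma_inv @ x           # quadratic term in x
            - 0.5 * mu @ Sigma_inv @ mu
            + x @ Sigma_inv @ mu)               # linear term in x
```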
QUADRATIC DISCRIMINANT ANALYSIS (QDA)

[Figure: iris data, Petal.Width vs. Petal.Length, with quadratic decision boundaries separating the species setosa, versicolor and virginica]