  1. Linear Discriminant Analysis. Debapriyo Majumdar, Data Mining – Fall 2014, Indian Statistical Institute Kolkata. August 28, 2014.

  2. The owning house data. Can we separate the points with a line? Equivalently, project the points onto another line so that the projections of the points in the two classes are separated. [Figure: scatter plot of Income (thousand rupees, 0–200) vs. Age (30–80 years) for the two classes.]

  3. Linear Discriminant Analysis (LDA) – not the same as Latent Dirichlet Allocation (also LDA). Reduce dimensionality while preserving as much class-discriminatory information as possible. [Figures: a projection with non-ideal separation vs. a projection with ideal separation; figures from Ricardo Gutierrez-Osuna's slides.]

  4. Projection onto a line – basics. A $1 \times 2$ unit vector (norm = 1) representing the x axis, multiplied by a $2 \times 2$ matrix whose columns are the two data points $(0.5, 0.7)$ and $(1.1, 0.8)$, gives the distances of the points from the origin along that axis:
$$\begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.5 & 1.1 \end{pmatrix} \quad \text{(projection onto the x axis)}$$
$$\begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.7 & 0.8 \end{pmatrix} \quad \text{(projection onto the y axis)}$$
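A minimal numpy check of these two products (the slides contain no code, so numpy and this layout are assumptions; the data matrix stores the two points as columns, exactly as above):

```python
import numpy as np

# The two data points (0.5, 0.7) and (1.1, 0.8) stored as columns.
X = np.array([[0.5, 1.1],
              [0.7, 0.8]])

e_x = np.array([1.0, 0.0])  # unit vector representing the x axis
e_y = np.array([0.0, 1.0])  # unit vector representing the y axis

print(e_x @ X)  # [0.5 1.1] -- distances from the origin along the x axis
print(e_y @ X)  # [0.7 0.8] -- distances from the origin along the y axis
```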

  5. Projection onto a line – basics (continued). With the $1 \times 2$ unit vector (norm = 1) along the $x = y$ line:
$$\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 0.5 & 1.1 \\ 0.7 & 0.8 \end{pmatrix} = \begin{pmatrix} 0.85 & 1.34 \end{pmatrix} \quad \text{(projection onto the } x = y \text{ line; distances from the origin)}$$
In general, for any point $x$ and any unit vector $w$, the scalar $w^T x$ is the distance from the origin of the projection of $x$ onto the line along $w$.
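The same product with $w = (1/\sqrt{2},\, 1/\sqrt{2})$ reproduces the numbers on the slide; a one-line sketch (numpy assumed, as before):

```python
import numpy as np

X = np.array([[0.5, 1.1],
              [0.7, 0.8]])             # points as columns, as on slide 4

w = np.array([1.0, 1.0]) / np.sqrt(2)  # unit vector along the x = y line
print(w @ X)  # [0.84852814 1.34350288] -- the 0.85 and 1.34 on the slide
```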

  6. Projection vector for LDA. Define a measure of separation (discrimination). Mean vectors $\mu_1$ and $\mu_2$ for the two classes $c_1$ and $c_2$, with $N_1$ and $N_2$ points:
$$\mu_i = \frac{1}{N_i} \sum_{x \in c_i} x$$
The mean vector projected onto a unit vector $w$:
$$\tilde{\mu}_i = \frac{1}{N_i} \sum_{x \in c_i} w^T x = w^T \mu_i$$
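A sketch of these two formulas on hypothetical two-class toy data (the class arrays, their parameters, and numpy are all assumptions; the same toy data is reused in the sketches after the following slides). It verifies that the mean of the projections equals the projection of the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
c1 = rng.normal([40, 60], 5, size=(30, 2))   # hypothetical class 1 (rows are points)
c2 = rng.normal([60, 120], 5, size=(40, 2))  # hypothetical class 2

mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)  # mean vectors

w = np.array([1.0, 1.0]) / np.sqrt(2)        # any unit vector
# Mean of the projections == projection of the mean (linearity):
print((c1 @ w).mean(), w @ mu1)              # the two numbers agree
```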

  7. Towards maximizing separation. One approach: find a line such that the distance between the projected means is maximized. Objective function:
$$J(w) = \left| \tilde{\mu}_1 - \tilde{\mu}_2 \right| = \left| w^T (\mu_1 - \mu_2) \right|$$
Example: $w$ could be the unit vector along the x or the y axis. [Figure: one of the two axes gives better separation of the projected means.]
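Evaluating this objective on the hypothetical toy data from the previous sketch (still an assumption, not from the slides) shows one axis separating the means better than the other:

```python
import numpy as np

rng = np.random.default_rng(0)
c1 = rng.normal([40, 60], 5, size=(30, 2))   # hypothetical class 1
c2 = rng.normal([60, 120], 5, size=(40, 2))  # hypothetical class 2
mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)

def J_means(w):
    """Distance between the projected means, |w^T (mu1 - mu2)|."""
    return abs(w @ (mu1 - mu2))

e_x, e_y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(J_means(e_x), J_means(e_y))  # here the y axis separates the means better
```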

  8. How much are the points scattered? Scatter: within each class, the variance of the projected points:
$$\tilde{s}_i^2 = \sum_{x \in c_i} \left( w^T x - \tilde{\mu}_i \right)^2$$
Within-class scatter of the projected samples: $\tilde{s}_1^2 + \tilde{s}_2^2$.
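The scatter of one class is a sum of squared deviations of the projections; a sketch on the same hypothetical toy data (assumed, as before):

```python
import numpy as np

rng = np.random.default_rng(0)
c1 = rng.normal([40, 60], 5, size=(30, 2))   # hypothetical class 1
c2 = rng.normal([60, 120], 5, size=(40, 2))  # hypothetical class 2

def scatter(c, w):
    """Scatter of the projected points of one class: sum of squared deviations."""
    p = c @ w                        # projections w^T x for all x in the class
    return np.sum((p - p.mean())**2)

w = np.array([1.0, 1.0]) / np.sqrt(2)
print(scatter(c1, w) + scatter(c2, w))  # within-class scatter s~1^2 + s~2^2
```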

  9. Fisher's discriminant. Maximize the difference between the projected means, normalized by the within-class scatter:
$$J(w) = \frac{\left( \tilde{\mu}_1 - \tilde{\mu}_2 \right)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$$
This rewards separation of the means and of the points as well.
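Fisher's criterion combines the two previous quantities into one ratio; a direct implementation on the same assumed toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
c1 = rng.normal([40, 60], 5, size=(30, 2))   # hypothetical class 1
c2 = rng.normal([60, 120], 5, size=(40, 2))  # hypothetical class 2

def fisher_J(w, c1, c2):
    """Fisher's criterion: (mu~1 - mu~2)^2 / (s~1^2 + s~2^2)."""
    p1, p2 = c1 @ w, c2 @ w
    num = (p1.mean() - p2.mean())**2
    den = np.sum((p1 - p1.mean())**2) + np.sum((p2 - p2.mean())**2)
    return num / den

e_x, e_y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(fisher_J(e_x, c1, c2), fisher_J(e_y, c1, c2))
```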

  10. Formulation of the objective function. Measure of scatter in the feature space ($x$):
$$S_i = \sum_{x \in c_i} (x - \mu_i)(x - \mu_i)^T$$
The within-class scatter matrix is $S_W = S_1 + S_2$. The scatter of the projections, in terms of $S_W$:
$$\tilde{s}_i^2 = \sum_{x \in c_i} \left( w^T x - \tilde{\mu}_i \right)^2 = \sum_{x \in c_i} \left( w^T x - w^T \mu_i \right)^2 = \sum_{x \in c_i} w^T (x - \mu_i)(x - \mu_i)^T w = w^T S_i w$$
Hence: $\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_W w$.
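The identity $\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_W w$ can be checked numerically; a sketch on the same assumed toy data, where each $S_i$ is built from centered points:

```python
import numpy as np

rng = np.random.default_rng(0)
c1 = rng.normal([40, 60], 5, size=(30, 2))   # hypothetical class 1
c2 = rng.normal([60, 120], 5, size=(40, 2))  # hypothetical class 2

def scatter_matrix(c):
    """S_i = sum over x in c_i of (x - mu_i)(x - mu_i)^T."""
    d = c - c.mean(axis=0)
    return d.T @ d

S_W = scatter_matrix(c1) + scatter_matrix(c2)   # within-class scatter matrix

# Check s~1^2 + s~2^2 == w^T S_W w on an arbitrary unit vector:
w = np.array([1.0, 1.0]) / np.sqrt(2)
p1, p2 = c1 @ w, c2 @ w
lhs = np.sum((p1 - p1.mean())**2) + np.sum((p2 - p2.mean())**2)
print(lhs, w @ S_W @ w)   # the two numbers agree
```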

  11. Formulation of the objective function (continued). Similarly, the difference $\left( \tilde{\mu}_1 - \tilde{\mu}_2 \right)^2$ in terms of the $\mu_i$'s in the feature space:
$$\left( \tilde{\mu}_1 - \tilde{\mu}_2 \right)^2 = \left( w^T \mu_1 - w^T \mu_2 \right)^2 = w^T \underbrace{(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T}_{S_B} w = w^T S_B w$$
where $S_B$ is the between-class scatter matrix. Fisher's objective function in terms of $S_B$ and $S_W$:
$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$
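In this ratio-of-quadratic-forms shape, $J(w)$ is invariant to the scale of $w$; a sketch building $S_B$ as a rank-one outer product on the same assumed toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
c1 = rng.normal([40, 60], 5, size=(30, 2))   # hypothetical class 1
c2 = rng.normal([60, 120], 5, size=(40, 2))  # hypothetical class 2
mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)

d = (mu1 - mu2).reshape(-1, 1)
S_B = d @ d.T                     # between-class scatter matrix (rank one)

def scatter_matrix(c):
    dd = c - c.mean(axis=0)
    return dd.T @ dd

S_W = scatter_matrix(c1) + scatter_matrix(c2)

def J(w):
    """Fisher's objective as a ratio of quadratic forms."""
    return (w @ S_B @ w) / (w @ S_W @ w)

print(J(np.array([1.0, 0.0])), J(np.array([0.0, 1.0])))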

  12. Maximizing the objective function. Take the derivative and solve for it being zero:
$$\frac{d}{dw} J(w) = \frac{d}{dw} \left[ \frac{w^T S_B w}{w^T S_W w} \right] = 0$$
$$\Rightarrow \left( w^T S_W w \right) \frac{d \left( w^T S_B w \right)}{dw} - \left( w^T S_B w \right) \frac{d \left( w^T S_W w \right)}{dw} = 0$$
$$\Rightarrow \left( w^T S_W w \right) 2 S_B w - \left( w^T S_B w \right) 2 S_W w = 0$$
Dividing by the same denominator $w^T S_W w$:
$$\Rightarrow S_B w - \frac{w^T S_B w}{w^T S_W w}\, S_W w = 0 \;\Rightarrow\; S_B w - J(w)\, S_W w = 0 \;\Rightarrow\; S_W^{-1} S_B w = J(w)\, w$$
The generalized eigenvalue problem.
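So the optimal direction is the top eigenvector of $S_W^{-1} S_B$. A sketch solving it on the same assumed toy data; since $S_B w$ always points along $\mu_1 - \mu_2$, the eigenvector can be cross-checked against the closed form $w^* \propto S_W^{-1}(\mu_1 - \mu_2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
c1 = rng.normal([40, 60], 5, size=(30, 2))   # hypothetical class 1
c2 = rng.normal([60, 120], 5, size=(40, 2))  # hypothetical class 2
mu1, mu2 = c1.mean(axis=0), c2.mean(axis=0)

def scatter_matrix(c):
    d = c - c.mean(axis=0)
    return d.T @ d

S_W = scatter_matrix(c1) + scatter_matrix(c2)
d = (mu1 - mu2).reshape(-1, 1)
S_B = d @ d.T

# Solve S_W^{-1} S_B w = J(w) w: the optimal w is the eigenvector
# of S_W^{-1} S_B with the largest eigenvalue.
vals, vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w_star = vecs[:, np.argmax(vals.real)].real

# Cross-check with the closed form w* proportional to S_W^{-1}(mu1 - mu2):
w_closed = np.linalg.solve(S_W, mu1 - mu2)
print(w_star / np.linalg.norm(w_star))
print(w_closed / np.linalg.norm(w_closed))  # same direction (up to sign)
```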

  13. Limitations of LDA. LDA is a parametric method: it assumes a Gaussian (normal) distribution of the data. What if the data is very much non-Gaussian? [Figure: highly non-Gaussian class distributions, including a case with $\mu_1 = \mu_2$, where the LDA projection fails to separate the classes.]

  14. Limitations of LDA (continued). LDA depends on the means for the discriminatory information. What if the discriminatory information is mainly in the variance? [Figure: two classes with $\mu_1 = \mu_2$ but different variances; LDA cannot separate them.]
