
Multidimensional Scaling, Applied Multivariate Statistics, Spring 2013



  1. Multidimensional Scaling Applied Multivariate Statistics – Spring 2013

  2. Outline
     - Fundamental Idea
     - Classical Multidimensional Scaling
     - Non-metric Multidimensional Scaling

  3. Basic Idea: How to represent in two dimensions?

  4. Idea 1: Projection

  5. Idea 2: Squeeze on table
     - Close points stay close

  6. Which idea is better?

  7. Idea of MDS
     - Represent a high-dimensional point cloud in few (usually 2) dimensions while keeping the distances between points similar
     - Classical/Metric MDS: Use a clever projection (R: cmdscale)
     - Non-metric MDS: Squeeze data on table, only conserve ranks (R: isoMDS)

  8. Classical MDS
     - Problem: Given Euclidean distances among points, recover the positions of the points!
     - Example: Road distances between 21 European cities (almost Euclidean, but not quite) …

  9. Classical MDS: First try

  10. Classical MDS
      - Can identify points only up to shift, rotation, and reflection
      - Flip axes
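
The invariance on this slide can be checked numerically. The following is a minimal sketch with made-up 2D points; the helper names `pairwise` and `transform` are illustrative and not part of the lecture. It reflects, rotates, and shifts a configuration and confirms that all pairwise distances stay the same, which is why distances alone cannot pin down these three operations.

```python
import math

def pairwise(points):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def transform(points, angle, shift, reflect=False):
    """Optionally flip the first axis, then rotate by `angle`, then shift."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y in points:
        if reflect:
            x = -x
        out.append((c * x - s * y + shift[0], s * x + c * y + shift[1]))
    return out

# Hypothetical points, chosen only for illustration.
points = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (-1.0, 1.5)]
moved = transform(points, angle=0.7, shift=(5.0, -2.0), reflect=True)

# Largest change in any pairwise distance; zero up to rounding error.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(pairwise(points), pairwise(moved))
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```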

  11. Classical MDS
      - Another example: Air pollution in US cities
      - Range of manu and popul is much bigger than range of wind
      - Need to standardize to give every variable equal weight
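
A small sketch of the standardization step, assuming three columns named after the slide's variables (manu, popul, wind). The numbers are made up for illustration and are not the lecture's data. Each column is scaled to mean 0 and standard deviation 1 before the distance matrix is computed, so that no variable dominates because of its units.

```python
import math
import statistics

# Made-up rows (manu, popul, wind); NOT the values from the lecture's data set.
rows = [
    [50.0, 100.0, 6.0],
    [200.0, 600.0, 9.0],
    [3000.0, 3400.0, 10.5],
    [100.0, 150.0, 8.0],
]

def standardize(data):
    """Scale every column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*data))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]

def distance_matrix(data):
    """Euclidean distances between all pairs of rows."""
    return [[math.dist(a, b) for b in data] for a in data]

scaled = standardize(rows)
raw_dist = distance_matrix(rows)        # dominated by manu and popul
scaled_dist = distance_matrix(scaled)   # every variable has equal weight
```

The matrix `scaled_dist` is what would then be passed on to an MDS routine.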

  12. Classical MDS

  13. Classical MDS: Theory
      - Input: Euclidean distances between n objects in q dimensions
      - Output: Positions of the points up to rotation, reflection, shift
      - Two steps:
        - Compute the inner products matrix B from the distances
        - Compute the positions from B

  14. Classical MDS: Theory – Step 1
      - $X$: $n \times q$ data matrix
      - Inner products matrix: $B = XX^T$, i.e. $b_{ij} = \sum_{k=1}^{q} x_{ik} x_{jk}$
      - Connect to distance: $d_{ij}^2 = \sum_{k=1}^{q} (x_{ik} - x_{jk})^2 = \dots = b_{ii} + b_{jj} - 2 b_{ij}$
      - Center points to avoid shift invariance: $\bar{x} = 0 \rightarrow \sum_{i=1}^{n} x_{ik} = 0 \rightarrow \sum_{i} b_{ij} = \sum_{j} b_{ij} = 0$
      - Invert relationship: $b_{ij} = -\frac{1}{2} \left( d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2 \right)$ (“doubly centered”), where $d_{i\cdot}^2$, $d_{\cdot j}^2$ and $d_{\cdot\cdot}^2$ are the row, column and overall means of the squared distances
        (Hint for middle of page 108: Plug in (4.3) and equations on top of page 108 to show that the expression involving d’s is equal to $b_{ij}$)
      - Thus, we obtained B from the distance matrix
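
The double-centering formula can be verified on a small example. This is a sketch with hypothetical points, not code from the lecture: it builds the squared distances, applies the formula for $b_{ij}$ index by index, and compares the result with the inner products of the centered points.

```python
# Hypothetical points in the plane (q = 2), used only to check the formula.
X = [[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.0, 1.0]]
n = len(X)

# Squared Euclidean distances d_ij^2.
D2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
      for i in range(n)]

def double_center(D2):
    """b_ij = -1/2 * (d_ij^2 - row mean - column mean + overall mean)."""
    n = len(D2)
    row_mean = [sum(row) / n for row in D2]
    col_mean = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (D2[i][j] - row_mean[i] - col_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

B = double_center(D2)

# Reference: inner products of the centered points, B_ref = Xc Xc^T.
means = [sum(col) / n for col in zip(*X)]
Xc = [[x - m for x, m in zip(row, means)] for row in X]
B_ref = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)]
         for i in range(n)]

max_error = max(abs(B[i][j] - B_ref[i][j]) for i in range(n) for j in range(n))
print(max_error)
```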

  15. Classical MDS: Theory – Step 2
      - Since $B = XX^T$, we need the “square root” of B
      - B is a symmetric and positive semidefinite $n \times n$ matrix
      - Thus, B can be diagonalized: $B = V \Lambda V^T$
        - $\Lambda$ is a diagonal matrix with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ on the diagonal (“eigenvalues”)
        - V contains the normalized eigenvectors as columns
      - Some eigenvalues will be zero; drop them: $B = V_1 \Lambda_1 V_1^T$
      - Take the “square root”: $X = V_1 \Lambda_1^{1/2}$
      - Thus, we obtained the positions of the points from the distances between all points
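
Both steps together fit in a few lines. The sketch below uses NumPy rather than the R function named in the lecture, so it illustrates the theory and does not reproduce `cmdscale`. The function name `classical_mds` and the random test points are assumptions made for this example.

```python
import numpy as np

def classical_mds(D, m=2):
    """Return an n x m configuration and all eigenvalues of B (descending)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # step 1: doubly centered inner products
    eigvals, eigvecs = np.linalg.eigh(B)      # step 2: eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumption of this sketch: negative eigenvalues among the first m
    # (rounding error or non-Euclidean input) are clipped to zero.
    coords = eigvecs[:, :m] * np.sqrt(np.clip(eigvals[:m], 0.0, None))
    return coords, eigvals

# Hypothetical 2D points; their Euclidean distances are the only input to MDS.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords, eigvals = classical_mds(D, m=2)
D_rec = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(D, D_rec))
```

The recovered `coords` generally differ from `points` by a shift, rotation, or reflection, but the distance matrices agree.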

  16. Classical MDS: Low-dim representation
      - Keep only the few (e.g. 2) largest eigenvalues and the corresponding eigenvectors
      - The resulting X will be the low-dimensional representation we were looking for
      - Goodness of fit (GOF) if we reduce to m dimensions: $GOF = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$ (should be at least 0.8)
      - Finds the “optimal” low-dim representation: Minimizes $S = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( d_{ij}^2 - (d_{ij}^{(m)})^2 \right)$
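
As a worked example of the GOF ratio, the sketch below uses hypothetical non-negative eigenvalues and looks for the smallest m that reaches the 0.8 threshold from the slide. The function names are illustrative.

```python
def gof(eigenvalues, m):
    """Share of the eigenvalue sum captured by the m largest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(lam[:m]) / sum(lam)

def smallest_dimension(eigenvalues, threshold=0.8):
    """Smallest m whose GOF reaches the threshold."""
    for m in range(1, len(eigenvalues) + 1):
        if gof(eigenvalues, m) >= threshold:
            return m
    return len(eigenvalues)

# Hypothetical eigenvalues of B (all non-negative, i.e. Euclidean input).
eigenvalues = [6.0, 3.0, 1.0, 0.0]

print(gof(eigenvalues, 1))              # 6 / 10
print(gof(eigenvalues, 2))              # 9 / 10
print(smallest_dimension(eigenvalues))  # first m with GOF >= 0.8
```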

  17. Classical MDS: Pros and Cons
      + Optimal for Euclidean input data
      + Still optimal if B has non-negative eigenvalues (pos. semidefinite)
      + Very fast
      - No guarantees if B has negative eigenvalues
      However, in practice, it is still used then. New measures for goodness of fit:
      $GOF = \frac{\sum_{i=1}^{m} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$, $GOF = \frac{\sum_{i=1}^{m} |\lambda_i|}{\sum_{i=1}^{n} |\lambda_i|}$, $GOF = \frac{\sum_{i=1}^{m} \max(0, \lambda_i)}{\sum_{i=1}^{n} \max(0, \lambda_i)}$
      Used in R function “cmdscale”
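
The three alternative measures behave differently once a negative eigenvalue appears. The sketch below evaluates the formulas exactly as written on the slide for hypothetical eigenvalues; it has not been checked against the output of `cmdscale`.

```python
def gof_variants(eigenvalues, m):
    """GOF based on squared, absolute, and positive-part eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    squared = sum(l * l for l in lam[:m]) / sum(l * l for l in lam)
    absolute = sum(abs(l) for l in lam[:m]) / sum(abs(l) for l in lam)
    positive = (sum(max(0.0, l) for l in lam[:m])
                / sum(max(0.0, l) for l in lam))
    return squared, absolute, positive

# Hypothetical eigenvalues with one negative value (non-Euclidean input).
eigenvalues = [5.0, 3.0, 1.0, -1.0]
squared, absolute, positive = gof_variants(eigenvalues, m=2)

print(squared)   # 34 / 36
print(absolute)  # 8 / 10
print(positive)  # 8 / 9
```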

  18. Non-metric MDS: Idea
      - Sometimes, there is no strict metric on the original points
      - Example: How beautiful are these persons? (1: Not at all, 10: Very much)

  19. Non-metric MDS: Idea
      - Absolute values are not that meaningful
      - Ranking is important
      - Non-metric MDS finds a low-dimensional representation that respects the ranking of distances

  20. Non-metric MDS: Theory
      - $\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance in the representation
      - Minimize STRESS ($\theta$ is an increasing function): $S = \frac{\sum_{i<j} (\theta(\delta_{ij}) - d_{ij})^2}{\sum_{i<j} d_{ij}^2}$
      - Optimize over both the positions of the points and $\theta$
      - $\hat{d}_{ij} = \theta(\delta_{ij})$ is called “disparity”
      - Solved numerically (isotonic regression); classical MDS as starting value; very time consuming
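
For a fixed configuration, the best increasing $\theta$ can be found by isotonic regression. The sketch below evaluates STRESS that way, using the pool-adjacent-violators algorithm for the disparities. It only evaluates the objective and does not optimize the positions of the points. Ties among the dissimilarities are not treated specially, and the function names and input values are assumptions made for this example.

```python
def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge blocks while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(delta, d):
    """STRESS for dissimilarities delta and distances d (same pair order)."""
    order = sorted(range(len(delta)), key=lambda k: delta[k])
    d_sorted = [d[k] for k in order]
    disparities = pava(d_sorted)  # theta(delta_ij), increasing in delta_ij
    numerator = sum((dh - dd) ** 2 for dh, dd in zip(disparities, d_sorted))
    denominator = sum(dd ** 2 for dd in d)
    return numerator / denominator

# Hypothetical values for four pairs of points.
delta = [1.0, 2.0, 3.0, 4.0]   # true dissimilarities
d = [1.0, 3.0, 2.0, 4.0]       # distances in the representation (ranks 2, 3 swapped)
example_stress = stress(delta, d)
print(example_stress)          # 0.5 / 30
```

If the distances already have the same rank order as the dissimilarities, the disparities equal the distances and STRESS is zero.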

  21. Non-metric MDS: Example for intuition (only)
      - True points A, B, C in high-dimensional space with $\delta_{AB} < \delta_{BC} < \delta_{AC}$
      - Compute best representation: distances AB = 2, BC = 3, AC = 5; STRESS = 19.7

  22. Non-metric MDS: Example for intuition (only)
      - True points A, B, C in high-dimensional space with $\delta_{AB} < \delta_{BC} < \delta_{AC}$
      - Compute best representation: distances AB = 2, BC = 2.7, AC = 4.8; STRESS = 20.1

  23. Non-metric MDS: Example for intuition (only)
      - True points A, B, C in high-dimensional space with $\delta_{AB} < \delta_{BC} < \delta_{AC}$
      - Compute best representation: distances AB = 2, BC = 2.9, AC = 5.2; STRESS = 18.9
      - Stop if minimal STRESS is found.
      - We will finally represent the “transformed true distances” (called disparities) $\hat{d}_{AB} = 2$, $\hat{d}_{BC} = 2.9$, $\hat{d}_{AC} = 5.2$ instead of the true distances $\delta_{AB} = 2$, $\delta_{BC} = 3$, $\delta_{AC} = 5$

  24. Non-metric MDS: Pros and Cons
      + Fulfills a clear objective without many assumptions (minimize STRESS)
      + Results don’t change with rescaling or monotonic variable transformation
      + Works even if you only have rank information
      - Slow in large problems
      - Usually only local (not global) optimum found
      - Only gets ranks of distances right

  25. Non-metric MDS: Example
      - Do people in the same party vote alike?
      - Number of votes where 15 congressmen disagreed in 19 votes …

  26. Non-metric MDS: Example

  27. Concepts to know
      - Classical MDS:
        - Finds low-dim projection that respects distances
        - Optimal for Euclidean distances
        - No clear guarantees for other distances
        - Fast
      - Non-metric MDS:
        - Squeezes data points on table
        - Respects only rankings of distances
        - (Locally) solves clear objective
        - Slow

  28. R commands to know
      - cmdscale: included in standard R distribution
      - isoMDS: from package “MASS”
