
Multidimensional Scaling Applied Multivariate Statistics Spring 2012



  1. Multidimensional Scaling Applied Multivariate Statistics – Spring 2012

  2. Outline
     - Fundamental Idea
     - Classical Multidimensional Scaling
     - Non-metric Multidimensional Scaling

  3. Basic Idea: How to represent in two dimensions?

  4. Idea 1: Projection

  5. Idea 2: Squeeze on table
     - Close points stay close

  6. Which idea is better?

  7. Idea of MDS
     - Represent a high-dimensional point cloud in few (usually 2) dimensions, keeping distances between points similar
     - Classical/Metric MDS: Use a clever projection (R: cmdscale)
     - Non-metric MDS: Squeeze data on table (R: isoMDS)

  8. Classical MDS
     - Problem: Given Euclidean distances among points, recover the positions of the points!
     - Example: Road distances between 21 European cities (almost Euclidean, but not quite) …

  9. Classical MDS
     - First try:

  10. Classical MDS
      - Can identify points only up to shift, rotation, and reflection
      - Flip axes:
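The invariance stated on this slide can be checked numerically. The following is a minimal sketch in Python (standard library only); the coordinates, the angle, the shift, and the helper names `pairwise_distances` and `move` are made up for illustration and do not come from the slides.

```python
import math

def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of 2-D points."""
    return [[math.dist(p, q) for q in points] for p in points]

def move(points, angle, shift, reflect=False):
    """Optionally reflect across the y-axis, then rotate by `angle` and shift."""
    c, s = math.cos(angle), math.sin(angle)
    moved = []
    for x, y in points:
        if reflect:
            x = -x
        moved.append((c * x - s * y + shift[0], s * x + c * y + shift[1]))
    return moved

# Made-up coordinates, for illustration only.
original = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
moved = move(original, angle=math.radians(40), shift=(10.0, -2.0), reflect=True)

d_original = pairwise_distances(original)
d_moved = pairwise_distances(moved)

# Largest entrywise difference between the two distance matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(d_original, d_moved)
    for a, b in zip(row_a, row_b)
)
print(max_diff)  # only floating-point rounding remains
```

Because both configurations produce the same distance matrix, no method that sees only the distances can tell them apart, which is why the axes of an MDS plot may be flipped freely.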

  11. Classical MDS
      - Another example: Air pollution in US cities
      - Range of manu and popul is much bigger than range of wind
      - Need to standardize to give every variable equal weight
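A minimal sketch of the standardization step, in Python with the standard library only. The numbers are made up to mimic three variables on very different scales (labelled manu, popul, wind after the slide); they are not the real air pollution data, and the helper names are hypothetical. The sample standard deviation (divisor n - 1) is an assumption about which scaling is wanted.

```python
import math

def standardize(rows):
    """Scale every column to mean 0 and sample standard deviation 1."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def pairwise_distances(rows):
    """Matrix of Euclidean distances between all pairs of rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]

# Made-up values for (manu, popul, wind); NOT the real data set.
data = [
    [100.0, 200.0, 6.0],
    [400.0, 500.0, 9.0],
    [1000.0, 1500.0, 10.0],
    [50.0, 150.0, 8.0],
]

standardized = standardize(data)
raw = pairwise_distances(data)             # dominated by manu and popul
scaled = pairwise_distances(standardized)  # every variable has equal weight
```

The `scaled` matrix is what would then be passed on to the MDS step; on the raw data, the wind column contributes almost nothing to the distances.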

  12. Classical MDS

  13. Classical MDS: Theory
      - Input: Euclidean distances between n objects in p dimensions
      - Output: Positions of points up to rotation, reflection, shift
      - Two steps:
        - Compute inner products matrix B from distances
        - Compute positions from B

  14. Classical MDS: Theory – Step 1
      - Inner products matrix: $B = XX^T$
      - Connect to distance: $d_{ij}^2 = b_{ii} + b_{jj} - 2 b_{ij}$
      - Center points to avoid shift invariance
      - Invert relationship: $b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2\right)$ (“doubly centered”), where a dot denotes the mean of the squared distances over that index
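Step 1 can be sketched as follows, in Python with the standard library only. The three points are made up, and `double_center` and `centered_gram` are hypothetical helper names. The sketch builds B from the distances alone and compares it with the inner products of the centered points, which should agree.

```python
import math

def double_center(d):
    """Inner-product matrix B from a distance matrix d:
    b_ij = -1/2 * (d_ij^2 - rowmean_i - colmean_j + overall mean)."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]

def centered_gram(points):
    """Inner products of the points after subtracting their mean (for comparison)."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    centered = [[p[k] - mean[k] for k in range(dim)] for p in points]
    return [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]

# Made-up points, for illustration only.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
d = [[math.dist(p, q) for q in points] for p in points]

b_from_distances = double_center(d)
b_from_points = centered_gram(points)
```

Both matrices agree up to rounding, even though `double_center` never sees the coordinates.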

  15. Classical MDS: Theory – Step 2
      - Since $B = XX^T$, we need the “square root” of B
      - B is a symmetric and positive semidefinite $n \times n$ matrix
      - Thus, B can be diagonalized: $B = V \Lambda V^T$
        - $\Lambda$ is a diagonal matrix with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ on the diagonal (“eigenvalues”)
        - V contains the normalized eigenvectors as columns
      - Some eigenvalues will be zero; drop them: $B = V_1 \Lambda_1 V_1^T$
      - Take the “square root”: $X = V_1 \Lambda_1^{1/2}$
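The two steps together can be sketched in plain Python. To stay within the standard library, this sketch finds the largest eigenpairs by power iteration with deflation; that is a substitute chosen for illustration, not the method named on the slides, and a library eigen-solver would normally be used instead. All function names and the rectangle example are assumptions.

```python
import math

def double_center(d):
    """Inner-product matrix B from a distance matrix d (Step 1)."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    col = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - col[j] + total) for j in range(n)]
            for i in range(n)]

def mat_vec(a, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(r, v)) for r in a]

def top_eigenpairs(b, k, iterations=500):
    """Largest k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(b)
    a = [r[:] for r in b]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary starting vector
        for _ in range(iterations):
            w = mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))
        pairs.append((lam, v))
        for i in range(n):  # remove the component just found
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return pairs

def classical_mds(d, k=2):
    """Coordinates X = V_1 * Lambda_1^(1/2), keeping the k largest eigenvalues."""
    pairs = top_eigenpairs(double_center(d), k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(d))]

# Made-up example: corners of a 4-by-2 rectangle.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
d = [[math.dist(p, q) for q in corners] for p in corners]
coords = classical_mds(d, k=2)
d_recovered = [[math.dist(p, q) for q in coords] for p in coords]
```

The recovered coordinates generally differ from `corners` by a shift, rotation, or reflection, but `d_recovered` reproduces `d` up to rounding.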

  16. Classical MDS: Low-dim representation
      - Keep only the few (e.g. 2) largest eigenvalues and the corresponding eigenvectors
      - The resulting X will be the low-dimensional representation we were looking for
      - Goodness of fit (GOF) if we reduce to m dimensions (should be at least 0.8): $\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$
      - Finds the “optimal” low-dim representation: minimizes $S = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( d_{ij}^2 - \left(d_{ij}^{(m)}\right)^2 \right)$
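As a small illustration of the goodness-of-fit ratio, here is a sketch in Python. The eigenvalues are made up, and `goodness_of_fit` is a hypothetical helper name.

```python
def goodness_of_fit(eigenvalues, m):
    """Share of the eigenvalue sum captured by the m largest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(lam[:m]) / sum(lam)

# Made-up eigenvalues of B, for illustration only.
eigenvalues = [16.0, 4.0, 0.0, 0.0]

gof_1 = goodness_of_fit(eigenvalues, 1)  # 16 / 20
gof_2 = goodness_of_fit(eigenvalues, 2)  # 20 / 20
```

With these values, one dimension just reaches the 0.8 rule of thumb from the slide, and two dimensions reproduce the configuration exactly.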

  17. Classical MDS: Pros and Cons
      + Optimal for Euclidean input data
      + Still optimal if B has non-negative eigenvalues (pos. semidefinite)
      + Very fast
      - No guarantees if B has negative eigenvalues
      However, in practice, it is still used then. New measures for goodness of fit:
        $\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$
        $\mathrm{GOF} = \frac{\sum_{i=1}^{m} |\lambda_i|}{\sum_{i=1}^{n} |\lambda_i|}$
        $\mathrm{GOF} = \frac{\sum_{i=1}^{m} \max(0, \lambda_i)}{\sum_{i=1}^{n} \max(0, \lambda_i)}$
      Used in R function “cmdscale”
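The three alternative measures can be compared on a small example. This is a sketch in Python with made-up eigenvalues (one of them negative, as happens for non-Euclidean input); `gof_variants` is a hypothetical helper name.

```python
def gof_variants(eigenvalues, m):
    """Three goodness-of-fit measures that tolerate negative eigenvalues:
    based on squares, absolute values, and positive parts."""
    lam = sorted(eigenvalues, reverse=True)
    squared = sum(x * x for x in lam[:m]) / sum(x * x for x in lam)
    absolute = sum(abs(x) for x in lam[:m]) / sum(abs(x) for x in lam)
    positive = sum(max(0.0, x) for x in lam[:m]) / sum(max(0.0, x) for x in lam)
    return squared, absolute, positive

# Made-up eigenvalues with one negative entry, for illustration only.
eigenvalues = [6.0, 3.0, 0.0, -1.0]

squared, absolute, positive = gof_variants(eigenvalues, m=2)
# squared  = 45 / 46
# absolute = 9 / 10
# positive = 9 / 9
```

The positive-part version ignores the negative eigenvalue entirely, while the other two count it against the fit.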

  18. Non-metric MDS: Idea
      - Sometimes, there is no strict metric on original points
      - Example: How much do you like the portraits? (1: Not at all, 10: Very much)
      - [Figure: portraits with example ratings]

  19. Non-metric MDS: Idea
      - Absolute values are not that meaningful
      - Ranking is important
      - Non-metric MDS finds a low-dimensional representation which respects the ranking of distances

  20. Non-metric MDS: Theory
      - $\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance in the representation
      - Minimize STRESS ($\theta$ is an increasing function): $S = \frac{\sum_{i<j} \left(\theta(\delta_{ij}) - d_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}$
      - Optimize over both the positions of the points and $\theta$
      - $\hat{d}_{ij} = \theta(\delta_{ij})$ is called “disparity”
      - Solved numerically (isotonic regression); Classical MDS as starting value; very time consuming
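For a fixed configuration, the disparities and the STRESS value from this slide can be sketched as follows, in Python with the standard library only. The isotonic regression is done with the pool-adjacent-violators algorithm, which is one common way to compute it and is an assumption here; ties among the dissimilarities are not treated specially. The dissimilarities and distances are made up, and `pava` and `stress` are hypothetical helper names. A full non-metric MDS would additionally move the points to reduce STRESS, which this sketch does not do.

```python
def pava(values):
    """Least-squares non-decreasing fit to a sequence (pool adjacent violators)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while their means violate the increasing order.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(delta, dist):
    """STRESS as on the slide, plus the disparities theta(delta_ij).

    delta: dissimilarities for all pairs i < j
    dist:  distances of the same pairs in the current representation
    """
    order = sorted(range(len(delta)), key=lambda k: delta[k])
    fitted = pava([dist[k] for k in order])
    disparity = [0.0] * len(delta)
    for position, k in enumerate(order):
        disparity[k] = fitted[position]
    numerator = sum((dh - d) ** 2 for dh, d in zip(disparity, dist))
    denominator = sum(d * d for d in dist)
    return numerator / denominator, disparity

# Made-up example with three pairs; the representation swaps the last two ranks.
delta = [1.0, 2.0, 3.0]
dist = [2.0, 5.0, 3.0]

s, disparity = stress(delta, dist)
# disparity is [2.0, 4.0, 4.0]; s = 2 / 38
```

If the distances already follow the ranking of the dissimilarities, the disparities equal the distances and the STRESS is zero.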

  21. Non-metric MDS: Example for intuition (only)
      - True points A, B, C in high-dimensional space
      - $d_{AB} < d_{BC} < d_{AC}$
      - Compute best representation: $d_{AB} = 2$, $d_{BC} = 3$, $d_{AC} = 5$; STRESS = 19.7

  22. Non-metric MDS: Example for intuition (only)
      - True points A, B, C in high-dimensional space
      - $d_{AB} < d_{BC} < d_{AC}$
      - Compute best representation: $d_{AB} = 2$, $d_{BC} = 2.7$, $d_{AC} = 4.8$; STRESS = 20.1

  23. Non-metric MDS: Example for intuition (only)
      - True points A, B, C in high-dimensional space
      - $d_{AB} < d_{BC} < d_{AC}$
      - Compute best representation: $d_{AB} = 2$, $d_{BC} = 2.9$, $d_{AC} = 5.2$; STRESS = 18.9
      - Stop if minimal STRESS is found. We will finally represent the distances $d_{AB} = 2$, $d_{BC} = 2.9$, $d_{AC} = 5.2$

  24. Non-metric MDS: Pros and Cons
      + Fulfills a clear objective without many assumptions (minimize STRESS)
      + Results don’t change with rescaling or monotonic variable transformation
      + Works even if you only have rank information
      - Slow in large problems
      - Usually only local (not global) optimum found
      - Only gets ranks of distances right

  25. Non-metric MDS: Example
      - Do people in the same party vote alike?
      - Agreement of 15 congressmen in 19 votes …

  26. Non-metric MDS: Example

  27. Concepts to know
      - Classical MDS:
        - Finds low-dim projection that respects distances
        - Optimal for Euclidean distances
        - No clear guarantees for other distances
        - Fast
      - Non-metric MDS:
        - Squeezes data points on table
        - Respects only rankings of distances
        - (Locally) solves clear objective
        - Slow

  28. R commands to know
      - cmdscale: included in standard R distribution
      - isoMDS: from package “MASS”
