Multidimensional Scaling
Applied Multivariate Statistics – Spring 2012

Outline
- Fundamental Idea
- Classical Multidimensional Scaling
- Non-metric Multidimensional Scaling

Basic Idea
How to represent in two dimensions?

Idea 1: Projection

Idea 2: Squeeze on table
Close points stay close

Which idea is better?

Idea of MDS
Represent a high-dimensional point cloud in few (usually 2) dimensions, keeping distances between points similar
- Classical/Metric MDS: use a clever projection (R: cmdscale)
- Non-metric MDS: squeeze data on table (R: isoMDS)

Classical MDS
Problem: Given Euclidean distances among points, recover the positions of the points!
Example: Road distances between 21 European cities (almost Euclidean, but not quite) …

Classical MDS
First try: [figure not reproduced]

Classical MDS
Can identify points only up to
- shift
- rotation
- reflection
Flipping the axes therefore gives an equally valid solution.
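
A minimal sketch of why shift, rotation and reflection cannot be identified: all three leave every pairwise distance unchanged. The point coordinates, the helper names `pairwise_distances` and `transform`, and the chosen angle and shift are made up for illustration.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix for a list of (x, y) points."""
    return [[math.dist(p, q) for q in points] for p in points]

def transform(points, angle=0.0, shift=(0.0, 0.0), reflect=False):
    """Optionally flip the x-axis, then rotate by `angle`, then shift."""
    out = []
    for x, y in points:
        if reflect:
            x = -x
        xr = math.cos(angle) * x - math.sin(angle) * y
        yr = math.sin(angle) * x + math.cos(angle) * y
        out.append((xr + shift[0], yr + shift[1]))
    return out

points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]  # made-up configuration
moved = transform(points, angle=0.7, shift=(5.0, -2.0), reflect=True)

d_before = pairwise_distances(points)
d_after = pairwise_distances(moved)

# Largest change in any pairwise distance; zero up to rounding error.
max_change = max(
    abs(a - b) for row_a, row_b in zip(d_before, d_after) for a, b in zip(row_a, row_b)
)
print(max_change)
```

Since the distance matrices agree, no method that sees only the distances can tell `points` and `moved` apart.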

Classical MDS
Another example: Air pollution in US cities
The range of manu and popul is much bigger than the range of wind
Need to standardize to give every variable equal weight
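
A small sketch of the standardization step, assuming the usual choice of mean 0 and standard deviation 1 per variable. The three rows are made-up numbers in the order (manu, popul, wind), not the real data, and `standardize` is a hypothetical helper name.

```python
import statistics

def standardize(rows):
    """Scale every column to mean 0 and (sample) standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Made-up values for three cities: (manu, popul, wind)
raw = [
    [100.0, 200.0, 8.0],
    [1000.0, 1500.0, 9.0],
    [3000.0, 3500.0, 10.0],
]
scaled = standardize(raw)

for row in scaled:
    print([round(v, 3) for v in row])
```

Without this step, Euclidean distances between the rows of `raw` would be driven almost entirely by the first two columns.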

Classical MDS: Theory
Input: Euclidean distances between n objects in p dimensions
Output: Positions of the points up to rotation, reflection, shift
Two steps:
- Compute the inner products matrix B from the distances
- Compute the positions from B

Classical MDS: Theory – Step 1
Inner products matrix: $B = X X^T$
Connect to distance: $d_{ij}^2 = b_{ii} + b_{jj} - 2 b_{ij}$
Center the points to avoid shift invariance
Invert the relationship ("doubly centered"): $b_{ij} = -\frac{1}{2} ( d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2 )$
Here $d_{i\cdot}^2$, $d_{\cdot j}^2$ and $d_{\cdot\cdot}^2$ are the row, column and overall means of the squared distances.
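
Step 1 can be checked numerically. The sketch below applies the double-centering formula to squared distances and compares the result with the inner products of the centered points. The four points and the names `squared_distances` and `double_center` are made up for illustration.

```python
import math

def squared_distances(points):
    """Matrix of squared Euclidean distances d_ij^2."""
    return [[math.dist(p, q) ** 2 for q in points] for p in points]

def double_center(d2):
    """b_ij = -1/2 (d_ij^2 - row mean_i - column mean_j + overall mean)."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    col_mean = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + total_mean) for j in range(n)]
        for i in range(n)
    ]

points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (2.0, 2.0)]  # made-up points
B = double_center(squared_distances(points))

# Direct computation: inner products of the centered points.
n = len(points)
center = [sum(p[k] for p in points) / n for k in range(2)]
centered = [[p[k] - center[k] for k in range(2)] for p in points]
B_direct = [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]

max_diff = max(abs(B[i][j] - B_direct[i][j]) for i in range(n) for j in range(n))
print(max_diff)  # zero up to rounding error
```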

Classical MDS: Theory – Step 2
Since $B = X X^T$, we need the "square root" of B
B is a symmetric and positive semidefinite n×n matrix
Thus, B can be diagonalized: $B = V \Lambda V^T$
$\Lambda$ is a diagonal matrix with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ on the diagonal ("eigenvalues")
V contains the normalized eigenvectors as columns
Some eigenvalues will be zero; drop them: $B = V_1 \Lambda_1 V_1^T$
Take the "square root": $X = V_1 \Lambda_1^{1/2}$

Classical MDS: Low-dim representation
Keep only the few (e.g. 2) largest eigenvalues and the corresponding eigenvectors
The resulting X is the low-dimensional representation we were looking for
Goodness of fit (GOF) if we reduce to m dimensions (should be at least 0.8): $\mathrm{GOF} = \sum_{i=1}^{m} \lambda_i / \sum_{i=1}^{n} \lambda_i$
Finds the "optimal" low-dim representation: minimizes $S = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( d_{ij}^2 - (d_{ij}^{(m)})^2 \right)$, where $d_{ij}^{(m)}$ is the distance in the m-dimensional representation
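
Both steps can be put together in a short sketch. To stay within the Python standard library it finds the largest eigenpairs by power iteration with deflation, a simple substitute chosen for this illustration (not necessarily what cmdscale does internally) that assumes the leading eigenvalues are distinct. The function names and the five points are made up. Because the sum of all eigenvalues equals the trace of B, the GOF denominator is taken from the trace.

```python
import math
import random

def double_center(d2):
    """Inner products matrix B from squared distances (Step 1)."""
    n = len(d2)
    row = [sum(r) / n for r in d2]
    col = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[-0.5 * (d2[i][j] - row[i] - col[j] + tot) for j in range(n)] for i in range(n)]

def top_eigenpairs(B, m, iterations=500, seed=0):
    """Largest m eigenpairs of a symmetric PSD matrix: power iteration + deflation."""
    n = len(B)
    A = [r[:] for r in B]
    rng = random.Random(seed)
    pairs = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                A[i][j] -= lam * v[i] * v[j]
    return pairs

def classical_mds(dist, m=2):
    """Return m-dimensional coordinates and GOF = sum of top m eigenvalues / trace(B)."""
    n = len(dist)
    B = double_center([[dist[i][j] ** 2 for j in range(n)] for i in range(n)])
    pairs = top_eigenpairs(B, m)
    coords = [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(n)]
    gof = sum(lam for lam, _ in pairs) / sum(B[i][i] for i in range(n))
    return coords, gof

points = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0), (1.0, 3.0)]  # made-up
dist = [[math.dist(p, q) for q in points] for p in points]

coords2, gof2 = classical_mds(dist, m=2)  # full recovery of planar points
coords1, gof1 = classical_mds(dist, m=1)  # one dimension loses information

recovered = [[math.dist(p, q) for q in coords2] for p in coords2]
max_err = max(abs(dist[i][j] - recovered[i][j]) for i in range(5) for j in range(5))
print(max_err, gof2, gof1)
```

The recovered coordinates differ from `points` by a shift, rotation or reflection, but reproduce the distances.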

Classical MDS: Pros and Cons
+ Optimal for Euclidean input data
+ Still optimal if B has non-negative eigenvalues (pos. semidefinite)
+ Very fast
- No guarantees if B has negative eigenvalues
However, in practice it is still used then. New measures for goodness of fit:
$\mathrm{GOF} = \sum_{i=1}^{m} \lambda_i^2 / \sum_{i=1}^{n} \lambda_i^2$
$\mathrm{GOF} = \sum_{i=1}^{m} |\lambda_i| / \sum_{i=1}^{n} |\lambda_i|$
$\mathrm{GOF} = \sum_{i=1}^{m} \max(0, \lambda_i) / \sum_{i=1}^{n} \max(0, \lambda_i)$
Used in the R function "cmdscale"
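
To see how the three measures behave when B has a negative eigenvalue, here is a small sketch. The spectrum is made up and `gof_variants` is a hypothetical helper name.

```python
def gof_variants(eigenvalues, m):
    """The three goodness-of-fit measures for the m largest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    top = lam[:m]
    return {
        "squared": sum(l * l for l in top) / sum(l * l for l in lam),
        "absolute": sum(abs(l) for l in top) / sum(abs(l) for l in lam),
        "positive_part": sum(max(0.0, l) for l in top) / sum(max(0.0, l) for l in lam),
    }

eigenvalues = [6.0, 3.0, 1.0, 0.0, -2.0]  # made-up spectrum with one negative value
gof = gof_variants(eigenvalues, m=2)
print(gof)
```

For this spectrum the absolute-value version is the most pessimistic, because the negative eigenvalue adds to its denominator.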

Non-metric MDS: Idea
Sometimes, there is no strict metric on the original points
Example: How much do you like the portraits? (1: Not at all, 10: Very much)
[Figure: portraits with ratings 9, 2, 6, 10, 5, 1]

Non-metric MDS: Idea
Absolute values are not that meaningful
Ranking is important
Non-metric MDS finds a low-dimensional representation which respects the ranking of distances

Non-metric MDS: Theory
$\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance in the representation
Minimize STRESS ($\theta$ is an increasing function): $S = \sum_{i<j} ( \theta(\delta_{ij}) - d_{ij} )^2 / \sum_{i<j} d_{ij}^2$
Optimize over both the positions of the points and $\theta$
$\hat{d}_{ij} = \theta(\delta_{ij})$ is called "disparity"
Solved numerically (isotonic regression); classical MDS as starting value; very time consuming
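
For a fixed configuration, the best disparities can be found by isotonic regression of the distances on the ordering of the dissimilarities. The sketch below uses the pool-adjacent-violators algorithm for that step and then evaluates S as defined above. It covers only this inner step, not the optimization over point positions, it fits a non-decreasing (rather than strictly increasing) function, and it ignores ties in the dissimilarities. All names and numbers are made up for illustration.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(dissimilarities, distances):
    """S = sum (disparity - d)^2 / sum d^2, with disparities from isotonic regression."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    fitted = isotonic_fit([distances[k] for k in order])
    disparities = [0.0] * len(distances)
    for rank, k in enumerate(order):
        disparities[k] = fitted[rank]
    numerator = sum((dh - d) ** 2 for dh, d in zip(disparities, distances))
    denominator = sum(d ** 2 for d in distances)
    return numerator / denominator, disparities

# One entry per pair of points (made-up values).
delta = [1.0, 2.0, 3.0, 4.0]       # dissimilarities; only their order matters
d_good = [1.0, 2.0, 3.0, 4.0]      # distances in the same order as delta
d_bad = [1.0, 3.0, 2.0, 4.0]       # second and third pair are swapped

s_good, _ = stress(delta, d_good)
s_bad, disparities_bad = stress(delta, d_bad)
print(s_good, s_bad, disparities_bad)
```

A configuration whose distances follow the ranking of the dissimilarities gets S = 0, whatever the actual distance values are.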

Non-metric MDS: Example for intuition (only)
True points A, B, C in high-dimensional space with $d_{AB} < d_{BC} < d_{AC}$; compute the best representation.
Candidate 1: distances 2, 3, 5; STRESS = 19.7
Candidate 2: distances 2, 2.7, 4.8; STRESS = 20.1
Candidate 3: distances 2, 2.9, 5.2; STRESS = 18.9
Stop when the minimal STRESS is found. We will finally represent the distances as $d_{AB} = 2$, $d_{BC} = 2.9$, $d_{AC} = 5.2$.

Non-metric MDS: Pros and Cons
+ Fulfills a clear objective without many assumptions (minimize STRESS)
+ Results don't change with rescaling or monotonic variable transformation
+ Works even if you only have rank information
- Slow in large problems
- Usually only a local (not global) optimum is found
- Only gets the ranks of distances right

Non-metric MDS: Example
Do people in the same party vote alike?
Agreement of 15 congressmen in 19 votes …

Concepts to know
Classical MDS:
- Finds a low-dim projection that respects distances
- Optimal for Euclidean distances
- No clear guarantees for other distances
- Fast
Non-metric MDS:
- Squeezes data points on table
- Respects only rankings of distances
- (Locally) solves a clear objective
- Slow

R commands to know
- cmdscale: included in the standard R distribution
- isoMDS: from package "MASS"