Visualization 1 Applied Multivariate Statistics – Spring 2012
Goals
- Covariance, Correlation (true / sample version)
- Test for zero correlation: Fisher's z-Transformation
- Scatterplot / Scatterplot matrix
- Covariance matrix / Correlation matrix
- Multivariate Normal Distribution
- Mahalanobis distance
Visualization in 1d
Normal distribution in 1d: Most common model choice
$\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \cdot \frac{(x-\mu)^2}{\sigma^2}\right)$
Normal distribution in 1d: Most common model choice
$\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \cdot \frac{(x-\mu)^2}{\sigma^2}\right)$
The term $\frac{(x-\mu)^2}{\sigma^2}$ in the exponent is the squared Mahalanobis distance: the squared distance from the mean, measured in standard deviations.
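A minimal Python sketch of the 1d formula, to make the role of the squared Mahalanobis distance in the exponent concrete. The function names `normal_pdf` and `sq_mahalanobis_1d` and the numbers are illustrative assumptions, not part of the slides.

```python
import math

def sq_mahalanobis_1d(x, mu, sigma2):
    """Squared distance of x from the mean mu, in units of standard deviations."""
    return (x - mu) ** 2 / sigma2

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, written in terms of the squared distance."""
    return math.exp(-0.5 * sq_mahalanobis_1d(x, mu, sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Hypothetical values: mean 10, variance 9 (sigma = 3), point 16 lies 2 sigma away.
print(sq_mahalanobis_1d(16.0, 10.0, 9.0))  # 4.0, i.e. (2 standard deviations)^2
print(normal_pdf(0.0, 0.0, 1.0))           # about 0.3989 = 1 / sqrt(2 * pi)
```

Points with the same squared distance get the same density, which is the property that carries over to the multivariate case.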
Two variables: Covariance and Correlation
- Covariance: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] \in (-\infty, \infty)$
- Correlation: $\mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \in [-1, 1]$
- Sample covariance: $\widehat{\mathrm{Cov}}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
- Sample correlation: $r_{xy} = \widehat{\mathrm{Cor}}(x, y) = \frac{\widehat{\mathrm{Cov}}(x, y)}{\hat{\sigma}_x \hat{\sigma}_y}$
Correlation is invariant to changes in units, covariance is not (e.g. kilogram/gram, meter/kilometer, etc.)
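The sample formulas and the unit-invariance claim can be sketched in a few lines of Python. The data are invented for illustration, and `sample_cov` / `sample_cor` are hypothetical helper names; they are meant to mirror what R's `cov` and `cor` compute for two vectors.

```python
import math

def sample_cov(x, y):
    """Sample covariance with the 1/(n-1) normalisation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_cor(x, y):
    """Sample correlation: covariance divided by both sample standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# Invented data: heights in metres, weights in kilograms and in grams.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 65.0, 72.0, 80.0, 88.0]
weight_g = [1000.0 * w for w in weight_kg]

# Covariance scales with the unit change, correlation does not.
print(sample_cov(height_m, weight_kg), sample_cov(height_m, weight_g))
print(sample_cor(height_m, weight_kg), sample_cor(height_m, weight_g))
```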
Scatterplot: Correlation is scale invariant
Intuition and pitfalls for correlation
Correlation measures only the LINEAR relation between two variables.
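One way to see the pitfall is a perfectly deterministic but non-linear relation whose correlation is zero. The sketch below uses made-up data with y = x^2 on a grid symmetric around zero; `sample_cor` is a hypothetical helper implementing the usual sample correlation.

```python
import math

def sample_cor(x, y):
    """Sample correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y_quadratic = [v * v for v in x]    # y is fully determined by x, but not linearly
y_linear = [2 * v + 1 for v in x]   # exact linear relation for comparison

print(sample_cor(x, y_quadratic))   # zero up to rounding
print(sample_cor(x, y_linear))      # one up to rounding
```

A correlation near zero therefore does not rule out a strong dependence, which is why the scatterplot should be checked as well.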
Test for zero correlation: Fisher's z-Test
X, Y (bivariate) normally distributed with true correlation $\rho$
- Take n samples
- Compute the sample correlation r
- Compute $z = \frac{1}{2} \log\left(\frac{1+r}{1-r}\right)$
- Compute $\xi = \frac{1}{2} \log\left(\frac{1+\rho}{1-\rho}\right)$
- For large n: $\sqrt{n-1}\,(z - \xi) \sim N(0, 1)$ approximately (a common, slightly more accurate scaling is $\sqrt{n-3}$; the two agree for large n)
Use cor.test() in R to test and get confidence intervals
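The steps of the test can be sketched as follows. This is an illustration only, not a replacement for `cor.test()`: the function name `fisher_z_test`, its parameters, and the example values r = 0.5, n = 30 are assumptions, and the scaling $\sqrt{n-1}$ is taken from the slide.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n, rho0=0.0):
    """Approximate two-sided test of H0: rho = rho0 via Fisher's z-transformation.

    Uses the sqrt(n - 1) scaling stated on the slide (large-n approximation).
    """
    z = math.atanh(r)        # identical to 0.5 * log((1 + r) / (1 - r))
    xi = math.atanh(rho0)    # transformed hypothesised correlation
    stat = math.sqrt(n - 1) * (z - xi)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value

# Hypothetical example: sample correlation 0.5 from n = 30 observations.
stat, p_value = fisher_z_test(r=0.5, n=30)
print(stat, p_value)   # statistic close to 2.96, p-value well below 0.01
```

The transformation is used because z is approximately normal with a variance that does not depend on the true correlation, unlike r itself.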
Many dimensions: Scatterplot matrix
Covariance matrix / correlation matrix: Table of pairwise values
- True covariance matrix: $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$
- True correlation matrix: $C_{ij} = \mathrm{Cor}(X_i, X_j)$
- Sample covariance matrix: $S_{ij} = \widehat{\mathrm{Cov}}(x_i, x_j)$ (diagonal: variances)
- Sample correlation matrix: $R_{ij} = \widehat{\mathrm{Cor}}(x_i, x_j)$ (diagonal: 1)
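As a rough illustration of the two sample matrices, the sketch below builds S and R entry by entry from a small invented data set (rows are observations, columns are variables). The helper names `cov_matrix` and `cor_matrix` are hypothetical; in R the corresponding calls would be `cov` and `cor` on a data frame or matrix.

```python
import math

def cov_matrix(rows):
    """Sample covariance matrix S; rows are observations, columns are variables."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]

def cor_matrix(rows):
    """Sample correlation matrix R, obtained by rescaling S with the standard deviations."""
    S = cov_matrix(rows)
    sd = [math.sqrt(S[j][j]) for j in range(len(S))]
    return [[S[i][j] / (sd[i] * sd[j]) for j in range(len(S))] for i in range(len(S))]

# Invented data: 4 observations of 3 variables.
data = [[1.0, 2.0, 0.5], [2.0, 1.5, 1.0], [3.0, 3.5, 0.0], [4.0, 3.0, 2.5]]
S = cov_matrix(data)
R = cor_matrix(data)
print(S[0][0])                       # variance of the first variable, 5/3
print([R[j][j] for j in range(3)])   # diagonal of R: all ones up to rounding
```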
Multivariate Normal Distribution: Most common model choice
$f(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p |\Sigma|}} \exp\left(-\frac{1}{2} \cdot (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$, where p is the number of variables
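To make the density concrete, here is a small sketch restricted to p = 2 so that the inverse and determinant of the covariance matrix can be written out by hand. The function name `mvn_pdf_2d` and the test points are assumptions for illustration.

```python
import math

def mvn_pdf_2d(x, mu, sigma):
    """Density of a bivariate normal N(mu, sigma) at the point x (p = 2 only)."""
    (a, b), (c, d) = sigma
    det = a * d - b * c                      # |Sigma|
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^T Sigma^{-1} (x - mu), using Sigma^{-1} = (1/det) [[d, -b], [-c, a]]
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    p = 2
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** p * det)

identity = ((1.0, 0.0), (0.0, 1.0))
print(mvn_pdf_2d((0.0, 0.0), (0.0, 0.0), identity))   # about 0.1592 = 1 / (2 * pi)
print(mvn_pdf_2d((1.0, 0.0), (0.0, 0.0), identity))   # smaller: one unit from the mean
```

For general p one would invert the covariance matrix numerically (in R: `solve`) instead of using the closed 2x2 form.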
Multivariate Normal Distribution: Funny facts
If $X_1, \dots, X_p$ are multivariate normal, then
- every linear combination $Y = a_1 X_1 + \dots + a_p X_p$ is normally distributed
- every projection onto a subspace is multivariate normally distributed
If the margins are normally distributed, it is NOT GUARANTEED that the joint distribution is multivariate normal (i.e., "multivariate normal" is stronger than just normal margins)
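For the first fact, the parameters of the resulting normal distribution can be written down explicitly. This is the standard result, stated here as a supplement; the vector notation $a = (a_1, \dots, a_p)^T$ is an assumption not used on the slide.

```latex
X \sim N_p(\mu, \Sigma), \quad a \in \mathbb{R}^p
\;\Longrightarrow\;
Y = a^T X = \sum_{i=1}^{p} a_i X_i \sim N\!\left(a^T \mu,\; a^T \Sigma\, a\right)
% Illustrative numbers: a = (1, 1)^T, \Sigma = \begin{pmatrix} 10 & 3 \\ 3 & 2 \end{pmatrix}
% gives \mathrm{Var}(Y) = 10 + 2 \cdot 3 + 2 = 18.
```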
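Multivariate Normal Distribution: Two examples (1000 random samples)
- Example 1: $\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
- Example 2: $\mu = \begin{pmatrix} 5 \\ 10 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 10 & 3 \\ 3 & 2 \end{pmatrix}$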
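Samples like these can be generated by transforming independent standard normal draws with a Cholesky factor of the covariance matrix. The sketch below does this for the 2x2 case with mean (5, 10) and covariance ((10, 3), (3, 2)); it is one possible approach, not necessarily how R's `mvrnorm` works internally, and the helper names and the seed are arbitrary assumptions.

```python
import math
import random

def cholesky_2x2(sigma):
    """Lower-triangular L with L L^T = sigma, for a symmetric positive definite 2x2 matrix."""
    (a, b), (_, d) = sigma
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return l11, l21, l22

def sample_mvn_2d(mu, sigma, n, rng):
    """Draw n points from N(mu, sigma) as mu + L z with z standard normal."""
    l11, l21, l22 = cholesky_2x2(sigma)
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        points.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return points

rng = random.Random(2012)   # fixed seed so the run is reproducible
samples = sample_mvn_2d((5.0, 10.0), ((10.0, 3.0), (3.0, 2.0)), 1000, rng)
mean_x = sum(s[0] for s in samples) / len(samples)
mean_y = sum(s[1] for s in samples) / len(samples)
print(mean_x, mean_y)       # should be close to 5 and 10
```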
Multivariate Normal Distribution: Most common model choice
$f(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p |\Sigma|}} \exp\left(-\frac{1}{2} \cdot (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$, where p is the number of variables
Sq. Mahalanobis distance: $MD^2(x) = (x-\mu)^T \Sigma^{-1} (x-\mu)$ = squared distance from the mean in standard deviations IN DIRECTION OF x
Mahalanobis distance: Example
$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 1 \end{pmatrix}$
- Point (20, 0): MD = 4
- Point (0, 10): MD = 10
- Point (10, 7): MD = 7.3
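The three distances in this example can be checked with a short sketch. It uses mean (0, 0) and covariance ((25, 0), (0, 1)) from the example; the helper `mahalanobis_2d` is a hypothetical name and only handles the 2x2 case, where the inverse can be written out directly.

```python
import math

def mahalanobis_2d(x, mu, sigma):
    """Mahalanobis distance of x from mu for a 2x2 covariance matrix sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^T Sigma^{-1} (x - mu), using Sigma^{-1} = (1/det) [[d, -b], [-c, a]]
    sq_md = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(sq_md)

mu = (0.0, 0.0)
sigma = ((25.0, 0.0), (0.0, 1.0))
for point in [(20.0, 0.0), (0.0, 10.0), (10.0, 7.0)]:
    print(point, round(mahalanobis_2d(point, mu, sigma), 1))
# (20.0, 0.0) 4.0
# (0.0, 10.0) 10.0
# (10.0, 7.0) 7.3
```

The point (20, 0) is farther from the mean than (0, 10) in Euclidean terms, yet its Mahalanobis distance is smaller because the first variable has standard deviation 5 and the second only 1.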
Concepts to know
- Covariance, Correlation (true / sample version)
- Test for zero correlation: Fisher's z-Transformation
- Scatterplot / Scatterplot matrix
- Covariance matrix / Correlation matrix
- Multivariate Normal Distribution
- Mahalanobis distance
R commands to know
- read.csv, head, str, dim
- colMeans, cov, cor
- mvrnorm, t, solve, %*%