Visualization 1: Applied Multivariate Statistics, Spring 2012

  1. Visualization 1 Applied Multivariate Statistics – Spring 2012

  2. Goals
     - Covariance, correlation (true / sample version)
     - Test for zero correlation: Fisher’s z-transformation
     - Scatterplot / scatterplot matrix
     - Covariance matrix / correlation matrix
     - Multivariate normal distribution
     - Mahalanobis distance

  3. Visualization in 1d

  4. Normal distribution in 1d: Most common model choice
     $$\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \cdot \frac{(x-\mu)^2}{\sigma^2}\right)$$

  5. Normal distribution in 1d: Most common model choice
     $$\varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \cdot \frac{(x-\mu)^2}{\sigma^2}\right)$$
     The term $\frac{(x-\mu)^2}{\sigma^2}$ is the squared Mahalanobis distance = squared distance from the mean in standard deviations.
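The density on this slide can be checked numerically. The following is a minimal R sketch, not part of the original slides: the function name `normal_density` and the values of `mu`, `sigma2` and `x` are made up for illustration. It writes the density in terms of the squared Mahalanobis distance and compares it with R's built-in `dnorm`.

```r
# Density of N(mu, sigma2), written out from the formula on the slide.
# normal_density is a hypothetical helper name, not an R built-in.
normal_density <- function(x, mu, sigma2) {
  md2 <- (x - mu)^2 / sigma2            # squared Mahalanobis distance in 1d
  exp(-0.5 * md2) / sqrt(2 * pi * sigma2)
}

x      <- c(-1, 0, 2.5)   # illustrative evaluation points
mu     <- 1               # illustrative mean
sigma2 <- 4               # illustrative variance

by_hand  <- normal_density(x, mu, sigma2)
built_in <- dnorm(x, mean = mu, sd = sqrt(sigma2))   # dnorm takes the sd, not the variance
all.equal(by_hand, built_in)
```

Note that `dnorm` is parameterised by the standard deviation, so the variance has to be passed as `sqrt(sigma2)`.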

  6. Two variables: Covariance and Correlation
     - Covariance: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] \in (-\infty, \infty)$
     - Correlation: $\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \in [-1, 1]$
     - Sample covariance: $\widehat{\mathrm{Cov}}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
     - Sample correlation: $r_{xy} = \widehat{\mathrm{Cor}}(x, y) = \frac{\widehat{\mathrm{Cov}}(x, y)}{\hat{\sigma}_x \hat{\sigma}_y}$
     - Correlation is invariant to changes in units, covariance is not (e.g. kilogram/gram, meter/kilometer, etc.)
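A small R sketch of the sample formulas and of the unit invariance claim. The data are simulated and the variable names (`height_m`, `weight_kg`) are purely illustrative; nothing here comes from the course data.

```r
set.seed(1)
# Simulated, made-up data: heights in metres and weights in kilograms
height_m  <- rnorm(50, mean = 1.75, sd = 0.1)
weight_kg <- 70 + 40 * (height_m - 1.75) + rnorm(50, sd = 5)

n <- length(height_m)

# Sample covariance and correlation written out from the formulas
cov_by_hand <- sum((height_m - mean(height_m)) *
                   (weight_kg - mean(weight_kg))) / (n - 1)
cor_by_hand <- cov_by_hand / (sd(height_m) * sd(weight_kg))

all.equal(cov_by_hand, cov(height_m, weight_kg))
all.equal(cor_by_hand, cor(height_m, weight_kg))

# Change of units: metres -> centimetres
height_cm <- 100 * height_m
cov(height_cm, weight_kg)   # 100 times the covariance in metres
cor(height_cm, weight_kg)   # same correlation as before
```

The covariance scales with the unit conversion factor, while the correlation does not, because the factor cancels against the standard deviation in the denominator.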

  7. Scatterplot: Correlation is scale invariant

  8. Intuition and pitfalls for correlation: Correlation = LINEAR relation
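One way to see the pitfall, as a hedged R sketch with made-up numbers: a variable that is completely determined by another can still have zero correlation with it if the relation is not linear.

```r
x <- seq(-3, 3, by = 0.5)   # illustrative grid, symmetric around 0
y <- x^2                    # y is a deterministic, but non-linear, function of x

cor(x, y)                   # 0 up to rounding, despite perfect dependence
cor(x, 2 * x + 1)           # an exactly linear relation gives correlation 1
```

Here the positive and negative halves of the parabola cancel in the covariance sum, so correlation alone would suggest "no relation" even though a scatterplot shows an obvious one.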

  9. Test for zero correlation: Fisher’s z-test
     - X, Y (bivariate) normally distributed with true correlation $\rho$
     - Take n samples
     - Compute the sample correlation $r$
     - Compute $z = \frac{1}{2} \log\left(\frac{1+r}{1-r}\right)$
     - Compute $\xi = \frac{1}{2} \log\left(\frac{1+\rho}{1-\rho}\right)$
     - For large n: $\sqrt{n-1}\,(z - \xi) \sim N(0, 1)$
     - Use cor.test() in R to test and get confidence intervals
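The steps above can be sketched in R as follows. The data are simulated and the effect size 0.3 is an arbitrary choice for illustration. The by-hand statistic uses the $\sqrt{n-1}$ scaling from the slide; to my knowledge `cor.test()` itself uses a t statistic for the p-value and a Fisher z interval with $1/\sqrt{n-3}$ as standard error, so its numbers will be close but not identical.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)     # simulated data with a positive true correlation

r  <- cor(x, y)
z  <- 0.5 * log((1 + r) / (1 - r))   # Fisher's z-transformation, equals atanh(r)
xi <- 0                              # under H0: rho = 0, so xi = 0

stat    <- sqrt(n - 1) * (z - xi)    # approximately N(0, 1) under H0 for large n
p_value <- 2 * pnorm(-abs(stat))     # two-sided p-value

ct <- cor.test(x, y)                 # Pearson correlation test
ct$estimate                          # sample correlation r
ct$p.value                           # close to p_value above
ct$conf.int                          # 95% confidence interval for rho
```

For moderate and large n the by-hand p-value and the one from `cor.test()` agree closely; in practice `cor.test()` is the more convenient route because it also returns the confidence interval.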

  10. Many dimensions: Scatterplot matrix

  11. Covariance matrix / correlation matrix: Table of pairwise values
      - True covariance matrix: $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$
      - True correlation matrix: $C_{ij} = \mathrm{Cor}(X_i, X_j)$
      - Sample covariance matrix: $S_{ij} = \widehat{\mathrm{Cov}}(x_i, x_j)$ (diagonal: variances)
      - Sample correlation matrix: $R_{ij} = \widehat{\mathrm{Cor}}(x_i, x_j)$ (diagonal: 1)
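A short R sketch of the sample versions, using R's built-in `iris` data purely as a convenient example (the slides do not say which data set was used). It also shows how the correlation matrix is the covariance matrix rescaled by the standard deviations.

```r
X <- iris[, 1:4]            # built-in example data: four numeric columns

S <- cov(X)                 # sample covariance matrix
R <- cor(X)                 # sample correlation matrix

diag(S)                     # diagonal of S: the variances of the columns
diag(R)                     # diagonal of R: all 1

# R is S with rows and columns divided by the standard deviations
D_inv    <- diag(1 / sqrt(diag(S)))
R_from_S <- D_inv %*% S %*% D_inv
all.equal(R_from_S, R, check.attributes = FALSE)

# pairs(X) would draw the corresponding scatterplot matrix
```

Base R also offers `cov2cor(S)` for the same rescaling.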

  12. Multivariate Normal Distribution: Most common model choice
      $$f(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p \, |\Sigma|}} \exp\left(-\frac{1}{2} \cdot (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
      where $p$ is the number of variables.
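As a sanity check, the density can be written out in R with `solve`, `t` and `%*%`. This is a minimal sketch: `mvn_density` is a hypothetical helper name (base R has no built-in multivariate normal density), the normalising constant is taken as $\sqrt{(2\pi)^p |\Sigma|}$ with $p$ the dimension, and the numbers are illustrative. For a diagonal covariance matrix the result should equal the product of the 1d densities.

```r
# Multivariate normal density at a single point x (hypothetical helper)
mvn_density <- function(x, mu, Sigma) {
  p   <- length(mu)
  d   <- x - mu
  md2 <- drop(t(d) %*% solve(Sigma) %*% d)   # (x - mu)^T Sigma^{-1} (x - mu)
  exp(-0.5 * md2) / sqrt((2 * pi)^p * det(Sigma))
}

mu    <- c(0, 0)
Sigma <- diag(c(25, 1))     # diagonal: the two coordinates are independent
x     <- c(3, -1)           # illustrative point

joint   <- mvn_density(x, mu, Sigma)
product <- dnorm(3, mean = 0, sd = 5) * dnorm(-1, mean = 0, sd = 1)
all.equal(joint, product)
```

For larger problems one would avoid forming `solve(Sigma)` explicitly, but for a 2 x 2 example the direct translation of the formula is easier to read.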

  13. Multivariate Normal Distribution: Funny facts
      If $X_1, \dots, X_p$ are multivariate normal, then
      - every linear combination $Y = a_1 X_1 + \dots + a_p X_p$ is normally distributed
      - every projection onto a subspace is multivariate normally distributed

      If the margins are normally distributed, it is NOT GUARANTEED that the underlying joint distribution is multivariate normal (i.e., “multivariate normal” is stronger than just normal margins).
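The last point can be illustrated with a standard textbook counterexample, sketched here in R (it is not taken from the slides): take a standard normal `x` and flip its sign at random to get `y`. Both margins are standard normal, but the linear combination `x + y` is exactly zero about half the time, so it cannot be normal, and hence the pair is not multivariate normal.

```r
set.seed(7)
n <- 10000
x <- rnorm(n)
s <- sample(c(-1, 1), n, replace = TRUE)   # random sign, independent of x
y <- s * x                                 # again N(0, 1), by symmetry

# Both margins look standard normal ...
c(mean(x), sd(x), mean(y), sd(y))

# ... but x + y is exactly 0 whenever the sign is -1,
# which cannot happen with positive probability for a normal variable.
prop_zero <- mean(x + y == 0)
prop_zero                                  # roughly 0.5
```

A scatterplot of `x` against `y` shows the points lying on the two diagonals, quite unlike the elliptical cloud of a bivariate normal.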

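  14. Multivariate Normal Distribution: Two examples (1000 random samples each)
      - $\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
      - $\mu = \begin{pmatrix} 5 \\ 10 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 10 & 3 \\ 3 & 2 \end{pmatrix}$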
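Samples like these can be drawn with `mvrnorm` from the MASS package. The sketch below assumes the parameters read off the slide are mean (0, 0) with identity covariance for the first example, and mean (5, 10) with covariance rows (10, 3) and (3, 2) for the second; the seed is arbitrary.

```r
library(MASS)   # provides mvrnorm

set.seed(123)
mu1 <- c(0, 0);  Sigma1 <- matrix(c(1, 0, 0, 1), nrow = 2)
mu2 <- c(5, 10); Sigma2 <- matrix(c(10, 3, 3, 2), nrow = 2)

sample1 <- mvrnorm(n = 1000, mu = mu1, Sigma = Sigma1)   # 1000 x 2 matrix
sample2 <- mvrnorm(n = 1000, mu = mu2, Sigma = Sigma2)

colMeans(sample2)   # should be close to mu2
cov(sample2)        # should be close to Sigma2

# plot(sample1) and plot(sample2) would show a round and a tilted, stretched point cloud
```

The sample mean and sample covariance only approximate the true parameters; with 1000 points the agreement is typically within a few percent.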

  15. Multivariate Normal Distribution: Most common model choice
      $$f(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p \, |\Sigma|}} \exp\left(-\frac{1}{2} \cdot (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
      The term in the exponent, $MD^2(x) = (x-\mu)^T \Sigma^{-1} (x-\mu)$, is the squared Mahalanobis distance = squared distance from the mean in standard deviations IN DIRECTION OF $x$.

  16. Mahalanobis distance: Example with $\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 1 \end{pmatrix}$

  17. Mahalanobis distance: Example (same $\mu$, $\Sigma$): point (20, 0) has MD = 4

  18. Mahalanobis distance: Example (same $\mu$, $\Sigma$): point (0, 10) has MD = 10

  19. Mahalanobis distance: Example (same $\mu$, $\Sigma$): point (10, 7) has MD = 7.3
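The three distances in this example can be reproduced in R. This is a sketch using the mean (0, 0) and the diagonal covariance with entries 25 and 1 from the slides. Note that R's built-in `mahalanobis()` returns the squared distance, so a square root is needed.

```r
mu    <- c(0, 0)
Sigma <- matrix(c(25, 0, 0, 1), nrow = 2)
pts   <- rbind(c(20, 0), c(0, 10), c(10, 7))   # the three points from the slides

# Built-in function: squared Mahalanobis distance for each row of pts
md <- sqrt(mahalanobis(pts, center = mu, cov = Sigma))
round(md, 1)   # 4.0, 10.0, 7.3

# The same for the last point, written out with solve, t and %*%
d          <- pts[3, ] - mu
md_by_hand <- sqrt(drop(t(d) %*% solve(Sigma) %*% d))
md_by_hand     # sqrt(10^2 / 25 + 7^2 / 1) = sqrt(53)
```

The point (20, 0) is farthest from the mean in Euclidean terms but closest in Mahalanobis terms, because the first coordinate has standard deviation 5 while the second has standard deviation 1.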

  20. Concepts to know
      - Covariance, correlation (true / sample version)
      - Test for zero correlation: Fisher’s z-transformation
      - Scatterplot / scatterplot matrix
      - Covariance matrix / correlation matrix
      - Multivariate normal distribution
      - Mahalanobis distance

  21. R commands to know
      - read.csv, head, str, dim
      - colMeans, cov, cor
      - mvrnorm, t, solve, %*%
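A possible first session with the data-handling commands from this list, as a self-contained R sketch. The tiny data set and its column names are invented here and written to a temporary file only so that `read.csv` has something to read.

```r
# Write a small made-up data set to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(height = c(170, 182, 165, 177),
                     weight = c(65, 80, 58, 74)),
          tmp, row.names = FALSE)

dat <- read.csv(tmp)   # read the file into a data frame
head(dat)              # first rows
str(dat)               # column names and types
dim(dat)               # number of rows and columns

colMeans(dat)          # mean of each column
cov(dat)               # sample covariance matrix
cor(dat)               # sample correlation matrix
```

With real data, `tmp` would simply be replaced by the path to the CSV file.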
