
A Plot for Visualizing Multivariate Data Rida E. A. Moustafa - PowerPoint PPT Presentation



  1. A Plot for Visualizing Multivariate Data Rida E. A. Moustafa George Mason University ADM Group,AAL rmoustaf@galaxy.gmu.edu rmustafa@aalcpas.com

  2. Talk Outline
     • The Theory of MV-Plot.
     • Detecting Linear Structures with MV-plot.
     • Detecting Non-Linear Structures with MV-plot.
     • Comparisons with other methods and application on real data.

  3. MV-Plot Theory Given an observation $x = (x_1, x_2, \ldots, x_d)$, we define $m$ and $v$ as follows:
     $$m = f(x) = \frac{1}{d}\sum_{j=1}^{d} |x_j|$$
     $$v^2 = g(x, f(x)) = \frac{1}{d}\sum_{j=1}^{d} |x_j - f(x)|^2$$
     Computing $m$ and $v$ for every observation produces a vector of $m$ values and a vector of $v$ values. What is the relationship between $m$ and $v$?
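The mapping from an observation to its (m, v) pair can be sketched in plain Python. This is a minimal illustration, not the author's implementation: the function names `mv_point` and `mv_plot_coordinates` and the sample rows are hypothetical, and the sketch assumes that v is the root-mean-square deviation of the coordinates about m, which is the reading that agrees with the 2-d and sphere slides.

```python
import math

def mv_point(x):
    """Return (m, v) for one observation x, a sequence of d numbers."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d                    # mean of absolute values
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)   # RMS deviation about m (assumed reading)
    return m, v

def mv_plot_coordinates(data):
    """Map every d-dimensional row to its 2-d (m, v) point."""
    return [mv_point(row) for row in data]

# Hypothetical data: three observations in four dimensions, already in (0, 1).
data = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.5, 0.5, 0.5],
    [0.9, 0.1, 0.9, 0.1],
]
points = mv_plot_coordinates(data)
```

Each row collapses to one point, so a scatter plot of `points` is the MV-plot of `data`. A constant row such as the second one lands on the m-axis because its deviation is zero.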

  4. MV-Relationship in 2-d
     $$m_i = \frac{1}{2}\sum_{j=1}^{2} |x_{ij}| = \frac{1}{2}\left(|x_{i1}| + |x_{i2}|\right)$$
     $$v_i^2 = \frac{1}{2}\sum_{j=1}^{2} |x_{ij} - m_i|^2 \;\Rightarrow\; v_i = \frac{1}{2}\,|x_{i1} - x_{i2}|$$
     • Normalizing the data to the range (0,1) avoids the absolute value in computing m.
     • Close to the principal components (PC) in 2-d.
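A quick numerical check of the 2-d case, as a sketch under stated assumptions: the observation `(0.8, 0.3)` is hypothetical, `mv_point` is an illustrative helper rather than the author's code, and v is taken as the root-mean-square deviation about m. The last lines compare (m, v) with a 45-degree rotation of the original axes, which coincides with the principal axes when both variables have equal variance.

```python
import math

def mv_point(x):
    """Return (m, v) for one observation; v is the RMS deviation about m (assumed)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

x1, x2 = 0.8, 0.3            # hypothetical observation, already in (0, 1)
m, v = mv_point([x1, x2])

# Closed forms for d = 2 with non-negative data.
m_closed = 0.5 * (x1 + x2)
v_closed = 0.5 * abs(x1 - x2)

# Coordinates after rotating the axes by 45 degrees.
u1 = (x1 + x2) / math.sqrt(2.0)
u2 = (x1 - x2) / math.sqrt(2.0)
m_from_rotation = u1 / math.sqrt(2.0)
v_from_rotation = abs(u2) / math.sqrt(2.0)
```

Up to a scale factor and the absolute value on the second axis, the MV-plot in 2-d is therefore a rotated view of the data.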

  5. MV- detects linear structure(s) If the data are linear in the original space, then
     $$x_{i2} = w_1 x_{i1} + w_0 \;\Rightarrow\; m_i = \tfrac{1}{2}\big((w_1 + 1)\,x_{i1} + w_0\big), \qquad v_i = \tfrac{1}{2}\,\big|(w_1 - 1)\,x_{i1} + w_0\big|.$$
     If $(w_1 + 1) \approx (w_1 - 1) = a_1$ and $w_0 = a_0$, then, up to the constant factor $\tfrac{1}{2}$,
     $$m_i = a_1 x_{i1} + a_0, \qquad v_i = a_1 x_{i1} + a_0.$$
     • It will be linear in the MV-space!!
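The claim can be checked on synthetic data. The sketch below is illustrative only: the slope `w1`, intercept `w0`, and sample grid are hypothetical values, `mv_point` and `collinear` are helper names introduced here, and v is assumed to be the root-mean-square deviation about m. The grid is chosen so that `x1 - x2` keeps one sign, which keeps the absolute value in v from folding the line.

```python
import math

def mv_point(x):
    """Return (m, v) for one observation; v is the RMS deviation about m (assumed)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

def collinear(points, tol=1e-9):
    """True if all 2-d points lie on the line through the first two points."""
    (m0, v0), (m1, v1) = points[0], points[1]
    return all(
        abs((m1 - m0) * (v - v0) - (v1 - v0) * (m - m0)) < tol
        for m, v in points[2:]
    )

w1, w0 = 0.5, 0.1                          # hypothetical line x2 = w1 * x1 + w0
xs = [0.3 + 0.1 * k for k in range(7)]     # x1 from 0.3 to 0.9, so x1 > x2 throughout
points = [mv_point([x, w1 * x + w0]) for x in xs]

is_line = collinear(points)
slope = (points[1][1] - points[0][1]) / (points[1][0] - points[0][0])
```

For these values m = 0.75 x + 0.05 and v = 0.25 x - 0.05, so the MV-points fall on a line of slope 1/3. If the sample range crossed the point where `x1 == x2`, the image would be a V shape touching v = 0 rather than a single line.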

  6. MV- detects linear structure(s) In $d$ dimensions, with $x_{id} = \sum_{j=1}^{d-1} w_j x_{ij} + w_0$:
     $$m_i = \frac{1}{d}\left[\sum_{j=1}^{d-1} (1 + w_j)\,x_{ij} + w_0\right]$$
     $$v_i = \frac{1}{d^2}\left[\sum_{j=1}^{d-1} \big((d-1)\,w_j - 1\big)\,x_{ij} + (d-1)\,w_0\right]$$
     $$m_i = \sum_{j=1}^{d-1} a_j x_{ij} + a_0$$
     $$v_i \approx \sum_{j=1}^{d-1} a_j x_{ij} + a_0$$
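The d-dimensional expressions can be sanity-checked numerically. This sketch assumes the last coordinate is a linear function of the other d - 1 coordinates with hypothetical weights, and that all values are non-negative so the absolute values in m drop out. It verifies the closed form for m and the exact identity for the deviation of the last coordinate from m, which is the linear form in the x_ij that the expression for v is built from. All variable names are illustrative.

```python
import math
import random

rng = random.Random(0)
d = 5
w = [rng.uniform(0.0, 0.2) for _ in range(d - 1)]        # hypothetical weights w_1..w_{d-1}
w0 = 0.05                                                # hypothetical intercept
x_free = [rng.uniform(0.0, 1.0) for _ in range(d - 1)]   # first d-1 coordinates
x_last = sum(wj * xj for wj, xj in zip(w, x_free)) + w0  # linear structure
row = x_free + [x_last]

# m computed from its definition and from the closed form.
m_direct = sum(abs(t) for t in row) / d
m_formula = (sum((1.0 + wj) * xj for wj, xj in zip(w, x_free)) + w0) / d

# Deviation of the last coordinate from m, directly and as a linear form.
dev_direct = x_last - m_direct
dev_formula = (
    sum(((d - 1) * wj - 1.0) * xj for wj, xj in zip(w, x_free)) + (d - 1) * w0
) / d
```

Both quantities are linear in the free coordinates, which is why a linear structure in the original space stays (approximately) linear in the MV-space.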

  7. Detecting Linear structure(s) Example I

  8. Detecting Linear structure(s) Example II

  9. Detecting Linear structure(s) Example III

  10. Detecting nonlinear data with MV-plot
      • The MV-plot can detect nonlinear structure in the data set without any change to the equations.

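  11. Detecting nonlinear structure
      $$x \to (x, \cos x): \quad m = x + \cos x, \quad v = |x - \cos x|$$
      $$x \to (x, \sin x): \quad m = x + \sin x, \quad v = |x - \sin x|$$
      Both expressions omit the constant factor $\tfrac{1}{2}$ of the 2-d formulas.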
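The cosine case can be traced numerically. The following is a small sketch rather than the author's code: the grid of x values is hypothetical and restricted to [0, 1.5], where cos(x) is positive, `mv_point` is an illustrative helper, and v is assumed to be the root-mean-square deviation about m. It confirms that 2m and 2v reproduce the expressions on the slide.

```python
import math

def mv_point(x):
    """Return (m, v) for one observation; v is the RMS deviation about m (assumed)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

xs = [0.1 * k for k in range(16)]      # hypothetical grid on [0, 1.5]
curve = [(x, mv_point([x, math.cos(x)])) for x in xs]

# Largest disagreement with m = x + cos(x) and v = |x - cos(x)| (factor 1/2 restored).
max_m_error = max(abs(2.0 * m - (x + math.cos(x))) for x, (m, v) in curve)
max_v_error = max(abs(2.0 * v - abs(x - math.cos(x))) for x, (m, v) in curve)

# Grid point where the curve comes closest to the m-axis.
x_at_min_v = min(curve, key=lambda item: item[1][1])[0]
```

The MV-curve dips toward v = 0 near the point where x equals cos(x), so the nonlinear shape of the data remains visible in the plot.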

  12. Detecting Sphere(s) Case I:
      • The sphere radius is $R$.
      • The sphere center is the origin.
      $$v_i^2 = \frac{1}{d}\sum_{j=1}^{d}\left(x_{ij} - m_i\right)^2 = \frac{1}{d}\left(\sum_{j=1}^{d} x_{ij}^2 - d\,m_i^2\right) \;\Rightarrow\; v_i^2 + m_i^2 = \frac{R^2}{d}.$$
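The sphere identity can be verified on random points. The sketch below makes several assumptions that are not in the slides: the dimension, radius, and seed are hypothetical, the helper names are introduced here, and the points are sampled with non-negative coordinates because the identity relies on m being the plain mean, so the absolute values in m must have no effect.

```python
import math
import random

def mv_point(x):
    """Return (m, v) for one observation; v is the RMS deviation about m (assumed)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

def random_sphere_point(d, radius, rng):
    """Random point with non-negative coordinates on the sphere of the given radius."""
    g = [abs(rng.gauss(0.0, 1.0)) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    return [radius * t / norm for t in g]

rng = random.Random(0)
d, R = 6, 2.0                          # hypothetical dimension and radius

residuals = []
for _ in range(100):
    m, v = mv_point(random_sphere_point(d, R, rng))
    residuals.append(abs(v * v + m * m - R * R / d))
max_residual = max(residuals)
```

Every sampled point lands on the circle of radius R / sqrt(d) around the origin of the MV-plane, which is how a sphere shows up in the plot.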

  13. Detecting Sphere(s) Case II:
      • The sphere radius is $R$.
      • The sphere center $x^c$ is not the origin.
      $$v_i^2 = \frac{1}{d}\sum_{j=1}^{d}\left(x_{ij} - x_j^c + x_j^c - m_i\right)^2 = \frac{1}{d}\left(\sum_{j=1}^{d}\left(x_{ij} - x_j^c\right)^2 - d\left(x_j^c - m_i\right)^2\right) \;\Rightarrow\; v_i^2 + m_i^2 = \frac{R^2}{d}.$$
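One way to read Case II is that the data are first translated so the sphere's center sits at the origin, after which Case I applies. That reading is an assumption of this sketch, not something the slide states. The center, radius, and helper names are hypothetical, and the points are sampled so that the centered coordinates are non-negative, which keeps the absolute values in m from having any effect.

```python
import math
import random

def mv_point(x):
    """Return (m, v) for one observation; v is the RMS deviation about m (assumed)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v

def positive_direction(d, rng):
    """Random unit vector with non-negative coordinates."""
    g = [abs(rng.gauss(0.0, 1.0)) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    return [t / norm for t in g]

rng = random.Random(1)
center = [1.0, 2.0, 3.0, 4.0]          # hypothetical sphere center
R = 0.5                                # hypothetical radius
d = len(center)

raw_residuals, centered_residuals = [], []
for _ in range(100):
    u = positive_direction(d, rng)
    x = [c + R * t for c, t in zip(center, u)]

    # Identity evaluated on the raw coordinates.
    m_raw, v_raw = mv_point(x)
    raw_residuals.append(abs(v_raw ** 2 + m_raw ** 2 - R * R / d))

    # Identity evaluated after subtracting the center.
    y = [xj - cj for xj, cj in zip(x, center)]
    m_c, v_c = mv_point(y)
    centered_residuals.append(abs(v_c ** 2 + m_c ** 2 - R * R / d))
```

In this experiment the identity holds to rounding error only for the centered coordinates, which suggests estimating and removing the center before looking for the circle in the MV-plot.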

  14. Detecting Sphere(s)

  15. Application on Real data
      • Fisher’s IRIS data (150x4): 3 classes (50 points each)
      • Process control data (600x60): 6 classes (100 points each)
      • Pollen data (3,848x5) (Wegman’s data): 2 classes (linear and nonlinear)

  16. Related Dimension Reduction Methods
      • Multidimensional Scaling
      • Fisher Discriminant Analysis
      • Principal Components

  17. IRIS (R. A. Fisher) Dataset 150 cases in 4-dim

  18. Time Series Dataset 600-cases in 60-dim

  19. Pollen dataset 3,848 points in 5-dim. Other methods require more storage and computation. Even if they work, we expect poor results on this particular data (Wegman 2002).

  20. Pollen dataset Linear and Nonlinear mixed structures.

  21. The linear structure in the Pollen data set 17+16+18+17+14+16=98 Linear, 3750 nonlinear

  22. Summary
      • The MV-algorithm can discover linear and nonlinear patterns at the same time.
      • The MV-algorithm can discover symmetric data.
      • The MV-algorithm deals with large multivariate data.
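The point about large data can be illustrated with a streaming sketch. Each observation is reduced to its (m, v) pair independently of the others, so one pass with constant extra memory is enough. Everything here is illustrative: `mv_stream` and `synthetic_rows` are hypothetical names, the generated rows are synthetic stand-ins sized like the Pollen data (3,848 x 5), not the real data set, and v is assumed to be the root-mean-square deviation about m.

```python
import math

def mv_stream(rows):
    """Yield (m, v) for each row as it arrives, without storing the data set."""
    for row in rows:
        d = len(row)
        m = sum(abs(t) for t in row) / d
        v = math.sqrt(sum((t - m) ** 2 for t in row) / d)   # RMS deviation (assumed reading)
        yield m, v

def synthetic_rows(n, d):
    """Generate n deterministic rows of d values in [0, 1); a stand-in for real data."""
    for i in range(n):
        yield [((i * 31 + j * 17) % 100) / 100.0 for j in range(d)]

count = 0
m_max = 0.0
v_max = 0.0
for m, v in mv_stream(synthetic_rows(3848, 5)):
    count += 1
    m_max = max(m_max, m)
    v_max = max(v_max, v)
```

The cost is proportional to the number of values read, and no pairwise distances or covariance matrices are formed along the way.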
