
Linear Manifold Embeddings of Pattern Clusters Robert Haralick - PowerPoint PPT Presentation

Linear Manifold Embeddings of Pattern Clusters. Robert Haralick, Rave Harpaz, Pattern Recognition Laboratory, The Graduate Center, City University of New York. DIMACS 2005. Sections: Linear Manifolds, The Algorithm, Empirical Evaluation.


  1. Linear Manifold Embeddings of Pattern Clusters. Robert Haralick, Rave Harpaz. Pattern Recognition Laboratory, The Graduate Center, City University of New York. DIMACS 2005.

  2. Linear Manifolds. Informally, a linear manifold is a subspace that may have been shifted away from the origin. A subspace is a linear manifold that contains the origin. [Figure: clusters labeled C1, C2, C3, C4.]

  3. Linear Manifolds. Each point $x_i$ in a set of $d$-dimensional points that all lie on an $m$-dimensional linear manifold can be modeled as
     \[ x_i = \mu + (b_1 \cdots b_m)\,\lambda_i . \]
     Each point $x_i$ in a set of points that all manifest a shift pattern in the full space can be modeled as $x_i = p + \mathbf{1} L_i$, e.g.
     \[ x_1 = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} 2 = \begin{pmatrix} 4 \\ 8 \\ 6 \end{pmatrix}. \]
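
The shift model can be checked numerically. The sketch below reuses the slide's pattern vector p = (2, 6, 4)'; the extra shift values in `L` are hypothetical and chosen only for illustration. It shows that shifted copies of one pattern lie on a 1-dimensional linear manifold whose direction is the normalized all-ones vector.

```python
import numpy as np

p = np.array([2.0, 6.0, 4.0])        # pattern vector from the slide's example
ones = np.ones(3)

# The slide's worked example: L_1 = 2 gives x_1 = (4, 8, 6).
x1 = p + ones * 2.0

# A few more points with hypothetical shift values L_i.
L = np.array([2.0, -1.0, 0.5, 7.0])
X = p + np.outer(L, ones)            # row i is p + 1 * L_i

# Centering removes the offset; what is left has rank 1.
Xc = X - X.mean(axis=0)
rank = np.linalg.matrix_rank(Xc)

# The single direction of variation is (1, 1, 1) / sqrt(3).
_, _, vt = np.linalg.svd(Xc)
direction = vt[0] * np.sign(vt[0, 0])   # resolve the sign ambiguity of the SVD
print(x1, rank, direction)
```

The offset of the manifold is not at the origin in general, which is why the data are centered before the rank is measured.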

  4. Linear Manifolds. Each point $x_i$ in a set of $d$-dimensional points that all lie on an $m$-dimensional linear manifold can be modeled as
     \[ x_i = \mu + (b_1 \cdots b_m)\,\lambda_i . \]
     Each point $x_i$ in a set of points that all manifest a scale pattern in the full space can be modeled as $x_i = p L_i$, e.g.
     \[ x_1 = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix} 2 = \begin{pmatrix} 4 \\ 12 \\ 8 \end{pmatrix}. \]
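
A minimal sketch of the scale model, again with the slide's p = (2, 6, 4)' and hypothetical scale factors in `L`. It illustrates that scaled copies of one pattern lie on a line through the origin, so the manifold here is a 1-dimensional subspace, while the differences between attributes are no longer constant across points.

```python
import numpy as np

p = np.array([2.0, 6.0, 4.0])        # pattern vector from the slide's example

# The slide's worked example: L_1 = 2 gives x_1 = (4, 12, 8).
x1 = p * 2.0

# Hypothetical scale factors L_i; row i of X is p * L_i.
L = np.array([2.0, 0.5, 3.0, 5.0])
X = np.outer(L, p)

# Rank 1 without any centering: the points lie on a line through the origin.
rank = np.linalg.matrix_rank(X)

# Unlike a shift pattern, the gap between two attributes varies with L_i.
gap = X[:, 1] - X[:, 0]
print(x1, rank, gap)
```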

  5. Shift and Scale Patterns as Linear Manifolds. [Figure: a shift cluster and a scale cluster, shown as a 3-D scatter plot and as profiles over the attributes x, y, z.] Values reported on the slide: $PC1_{shift} = (0.5774, 0.5774, 0.5774)'$, $PC1_{scale} = (0.3810, 0.2540, 0.8890)'$, correlation matrix $R = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$, Pearson $R = 1$, $MSR_{shift} = 0$, $MSR_{scale} = 3236.3$.

  6. Linear Manifolds - Patterns in Subspaces. Shift pattern that exists only in a subspace:
     \[ x_i = B_r(\mu_r + 1_r \phi_i) + B_c(\mu_c + \lambda_i) = B_r \mu_r + B_r 1_r \phi_i + B_c \mu_c + B_c \lambda_i , \qquad (B_r \mid B_c) = I . \]
     [Figure: pattern profiles over attributes X1 to X8.] The linear manifold embedding:
     \[ x_i = (B_r \mid B_c) \begin{pmatrix} \mu_r \\ \mu_c \end{pmatrix} + \left( \frac{B_r 1_r}{\sqrt{r}} \,\Big|\, B_c \right) \begin{pmatrix} \sqrt{r}\, \phi_i \\ \lambda_i \end{pmatrix} . \]
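
The two forms of the subspace shift pattern can be compared numerically. In this sketch the sizes `d = 8` and `r = 3`, the random values, and the choice of the first `r` identity columns as B_r are all assumptions made for illustration. It checks that the direct form and the embedded form give the same point and that the embedding basis is orthonormal with d - r + 1 columns.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 3                                  # hypothetical sizes
I = np.eye(d)
B_r, B_c = I[:, :r], I[:, r:]                # (B_r | B_c) = I
ones_r = np.ones(r)

mu_r = rng.normal(size=r)
mu_c = rng.normal(size=d - r)
phi = rng.normal()                           # scalar shift of this point
lam = rng.normal(size=d - r)                 # free coordinates outside the pattern

# Direct form: x = B_r (mu_r + 1_r phi) + B_c (mu_c + lam)
direct = B_r @ (mu_r + ones_r * phi) + B_c @ (mu_c + lam)

# Embedded form: offset + basis @ coords
offset = np.hstack([B_r, B_c]) @ np.concatenate([mu_r, mu_c])
basis = np.column_stack([B_r @ ones_r / np.sqrt(r), B_c])   # d x (d - r + 1)
coords = np.concatenate([[np.sqrt(r) * phi], lam])
embedded = offset + basis @ coords

print(np.allclose(direct, embedded), basis.shape)
```

Dividing by sqrt(r) makes the pattern direction a unit vector, and the factor sqrt(r) on the coordinate compensates for it.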

  7. Linear Manifolds - Patterns in Subspaces. [Figure: plot with axes x1, x2, x3 and attribute labels X1, X2, X3.]

  8. Linear Manifolds - Adding an Error Term. Definition (The Linear Manifold Cluster Model). Let $D$ be a set of $d$-dimensional points, $C \subseteq D$ a subset of points that constitute a cluster, $x_i$ some point in $C$, $b_1, \ldots, b_d$ an orthonormal set of vectors that span $\mathbb{R}^d$, $(b_i, \ldots, b_j)$ a matrix whose columns are the vectors $b_i, \ldots, b_j$, and $\mu$ some point in $\mathbb{R}^d$. Then each $x_i \in C$ is modeled by
     \[ x_i = \mu + (b_1 \cdots b_m)\,\lambda_i + (b_{m+1} \cdots b_d)\,\psi_i . \]
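
A minimal sketch of sampling from the linear manifold cluster model. The sizes, the uniform distribution for the manifold coordinates, and the small Gaussian error are assumptions for illustration only; the slide does not specify them here. The sketch shows that the distance of a point to the manifold equals the norm of its error coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 5, 2, 200                              # hypothetical sizes

# Orthonormal vectors b_1..b_d from a QR factorization of a random matrix.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
B, B_perp = Q[:, :m], Q[:, m:]                   # manifold basis and its complement

mu = rng.normal(size=d)                          # translation of the manifold
lam = rng.uniform(-10, 10, size=(n, m))          # assumed spread along the manifold
psi = rng.normal(scale=0.1, size=(n, d - m))     # assumed small error term

# Row i is x_i = mu + (b_1..b_m) lam_i + (b_{m+1}..b_d) psi_i
X = mu + lam @ B.T + psi @ B_perp.T

# Distance to the manifold: norm of the projection of x - mu onto the complement.
dist = np.linalg.norm((X - mu) @ B_perp, axis=1)
print(dist.max())
```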

  9. Shift Pattern - Bicluster (Cheng 00), Floc (Yang 02), pCluster (Wang 02). Definition (Shift Pattern Cluster Model). Let $D$ be a set of $d$-dimensional points, $C \subseteq D$ the subset of points manifesting a shift pattern in some $r$-dimensional subspace of the data, and $x_i$ some point in $C$. Then each $x_i \in C$ can be modeled by
     \[ x_i = B_r \mu_r + B_r 1_r \phi_i + B_r \psi_i + B_c \mu_c + B_c \lambda_i . \]
     Proposition. Every point $x_i$ in a $d$-dimensional space that fits the shift pattern cluster model also fits the linear manifold cluster model, where the dimension of the linear manifold is $d - r + 1$, and the model is given by
     \[ x_i = (B_r \mid B_c) \begin{pmatrix} \mu_r \\ \mu_c \end{pmatrix} + \left( \frac{B_r 1_r}{\sqrt{r}} \,\Big|\, B_c \right) \begin{pmatrix} \sqrt{r}\, \phi_i + \frac{1_r'}{\sqrt{r}} \psi_i \\ \lambda_i \end{pmatrix} + B_r \left( I_r - \frac{1_r 1_r'}{r} \right) \psi_i . \]
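
The proposition splits the error ψ_i into a part along the pattern direction, which is absorbed into the manifold coordinate, and a part orthogonal to the manifold. The sketch below checks this numerically; the sizes, random values, and the use of identity columns for B_r and B_c are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 6, 4                                      # hypothetical sizes
I = np.eye(d)
B_r, B_c = I[:, :r], I[:, r:]
ones_r = np.ones(r)

mu_r, mu_c = rng.normal(size=r), rng.normal(size=d - r)
phi = rng.normal()
psi = rng.normal(scale=0.1, size=r)              # error inside the pattern subspace
lam = rng.normal(size=d - r)

# Shift pattern cluster model
x_model = B_r @ mu_r + (B_r @ ones_r) * phi + B_r @ psi + B_c @ mu_c + B_c @ lam

# Linear manifold form: offset + basis @ coords + orthogonal error
offset = np.hstack([B_r, B_c]) @ np.concatenate([mu_r, mu_c])
basis = np.column_stack([B_r @ ones_r / np.sqrt(r), B_c])
coords = np.concatenate([[np.sqrt(r) * phi + ones_r @ psi / np.sqrt(r)], lam])
P = np.eye(r) - np.outer(ones_r, ones_r) / r     # removes the mean of psi
error = B_r @ P @ psi
x_lm = offset + basis @ coords + error

print(np.allclose(x_model, x_lm), basis.shape[1])
```

The error term is orthogonal to every column of the basis, which is what makes the second form an instance of the linear manifold cluster model.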

  10. Scale Pattern. Definition (Scale Pattern Cluster Model). Let $D$ be a set of $d$-dimensional points, $C \subseteq D$ the subset of points manifesting a scale pattern in some $r$-dimensional subspace of the data, and $x_i$ some point in $C$. Then each $x_i \in C$ can be modeled by
     \[ x_i = \phi_i B_r \mu_r + B_r \psi_i + B_c \mu_c + B_c \lambda_i . \]
     Proposition. Every point $x_i$ in a $d$-dimensional space that fits the scale pattern cluster model also fits the linear manifold cluster model, where the dimension of the linear manifold is $d - r + 1$, and the model is given by
     \[ x_i = B_c \mu_c + \left( \frac{B_r \mu_r}{\lVert \mu_r \rVert} \,\Big|\, B_c \right) \begin{pmatrix} \lVert \mu_r \rVert \phi_i + \frac{\mu_r'}{\lVert \mu_r \rVert} \psi_i \\ \lambda_i \end{pmatrix} + B_r \left( I_r - \frac{\mu_r \mu_r'}{\lVert \mu_r \rVert^2} \right) \psi_i . \]
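
For the scale pattern the manifold direction is the normalized pattern μ_r rather than the all-ones vector. The following sketch, under the same illustrative assumptions (hypothetical sizes, random values, identity columns for B_r and B_c), checks that the scale pattern cluster model and its linear manifold form agree.

```python
import numpy as np

rng = np.random.default_rng(3)
d, r = 6, 4                                      # hypothetical sizes
I = np.eye(d)
B_r, B_c = I[:, :r], I[:, r:]

mu_r, mu_c = rng.normal(size=r), rng.normal(size=d - r)
phi = rng.uniform(0.5, 3.0)                      # scale factor of this point
psi = rng.normal(scale=0.1, size=r)
lam = rng.normal(size=d - r)

# Scale pattern cluster model
x_model = phi * (B_r @ mu_r) + B_r @ psi + B_c @ mu_c + B_c @ lam

# Linear manifold form with unit direction u = mu_r / ||mu_r||
norm_mu = np.linalg.norm(mu_r)
u = mu_r / norm_mu
basis = np.column_stack([B_r @ u, B_c])
coords = np.concatenate([[norm_mu * phi + u @ psi], lam])
error = B_r @ (np.eye(r) - np.outer(u, u)) @ psi  # part of psi orthogonal to mu_r
x_lm = B_c @ mu_c + basis @ coords + error

print(np.allclose(x_model, x_lm))
```

Note that the offset contains only B_c μ_c: within the pattern subspace the manifold passes through the origin.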

  11. The Bicluster Model (Cheng et al. 00). Mean squared residue score:
     \[ MSRS = H(I, J) = \frac{1}{|I||J|} \sum_{i \in I,\, j \in J} \left( Y_{ij} - \bar{Y}_{iJ} - \bar{Y}_{Ij} + \bar{Y}_{IJ} \right)^2 . \]
     The underlying model is a two-way ANOVA: $Y_{ij} = \mu + \phi_i + \psi_j + \epsilon_{ij}$. Each point in a bicluster can be modeled by $x_i = \mathbf{1}\mu + \mathbf{1}\phi_i + \psi + \epsilon_i$, where $\phi_i$ is a scalar denoting the residual effect of the $i$-th gene, $\psi = (\psi_1, \ldots, \psi_d)'$ is a vector containing the residual effects of the conditions, and $\epsilon_i \sim N(0, \sigma^2 I)$.
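
A short sketch of the mean squared residue, computed as the mean of the squared residuals (entry minus row mean minus column mean plus overall mean). The function name `mean_squared_residue` and the example data are hypothetical. It illustrates why the score fits shift patterns but not scale patterns: a pure shift pattern scores zero, a pure scale pattern does not.

```python
import numpy as np

def mean_squared_residue(Y):
    """Mean squared residue of a full submatrix Y (rows: points, columns: attributes)."""
    Y = np.asarray(Y, dtype=float)
    row_mean = Y.mean(axis=1, keepdims=True)
    col_mean = Y.mean(axis=0, keepdims=True)
    resid = Y - row_mean - col_mean + Y.mean()
    return float(np.mean(resid ** 2))

p = np.array([2.0, 6.0, 4.0])            # base pattern
L = np.array([1.0, 2.0, 5.0, 9.0])       # hypothetical per-point factors

shift = p + L[:, None]                   # row i is p + 1 * L_i
scale = np.outer(L, p)                   # row i is p * L_i

msr_shift = mean_squared_residue(shift)
msr_scale = mean_squared_residue(scale)
print(msr_shift, msr_scale)
```

For a pure scale pattern the residual of entry (i, j) factors into (L_i minus its mean) times (p_j minus its mean), so the score equals the product of the two population variances.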

  12. The Bicluster Model (Cheng et al. 00). Proposition. Every point $x_i$ in a $d$-dimensional space that fits a bicluster model embedded in an $r$-dimensional subspace also fits the linear manifold cluster model, where the dimension of the linear manifold is $d - r + 1$, and the model is given by
     \[ x_i = (B_r \mid B_c) \begin{pmatrix} 1_r \mu_r + \psi \\ \mu_c \end{pmatrix} + \left( \frac{B_r 1_r}{\sqrt{r}} \,\Big|\, B_c \right) \begin{pmatrix} \sqrt{r}\, \phi_i + \frac{1_r'}{\sqrt{r}} \epsilon_i \\ \lambda_i \end{pmatrix} + B_r \left( I_r - \frac{1_r 1_r'}{r} \right) \epsilon_i . \]

  13. Subspace Clusters. Subspace clusters consist of a subset of points and a corresponding subset of attributes, such that these points form a dense region in the subspace defined by the corresponding attributes. [Figure: clusters shown in the x, y, z space and in the x, y subspace.] Examples: CLIQUE (Agrawal 98), MAFIA (Nagesh 99), PROCLUS (Aggarwal 99), ORCLUS (Aggarwal 00).

  14. Other Instances of Linear Manifolds - Negative Correlations. [Figure: a cluster shown as profiles over the attributes x, y, z and as a 3-D scatter plot.] Values reported on the slide: correlation matrix $R = \begin{pmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix}$, Pearson $R = 0.3181$, $MSR = 18280$. Yip et al. (2004), HARP: to detect co-regulated genes, create a reflective copy of the data set, cluster, and remove the copy.
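
A minimal sketch of a negatively correlated cluster as a linear manifold. The offset `mu`, the range of `phi`, and the sample size are hypothetical. Points generated along the direction (1, -1, 1) reproduce the sign pattern of the correlation matrix above and still lie on a 1-dimensional linear manifold, with no reflected copy of the data needed.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([400.0, 400.0, 400.0])             # hypothetical offset
b = np.array([1.0, -1.0, 1.0]) / np.sqrt(3)      # y moves against x and z
phi = rng.uniform(-300, 300, size=50)            # hypothetical manifold coordinates

X = mu + np.outer(phi, b)                        # row i is mu + b * phi_i

# Attribute-by-attribute correlation matrix (columns are the variables).
R = np.corrcoef(X, rowvar=False)

# One dominant singular value after centering: the cluster is 1-dimensional.
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(np.round(R, 3), s[1] / s[0])
```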

  15. Other Instances of Linear Manifolds - Linear Combinations of Variables. [Figure: a cluster shown as a 3-D scatter plot over x, y, z.] Values reported on the slide: Pearson $R = 0.4509$, $MSR = 8975$. The cluster follows $z = b_0 + b_1 x + b_2 y$, with coefficient of multiple determination
     \[ R^2 = \frac{\sum (\hat{z} - \bar{z})^2}{\sum (z - \bar{z})^2} = 1 . \]
     Related method: 4C, Böhm et al. (2004).
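
The coefficient of multiple determination can be reproduced with an ordinary least-squares fit. In this sketch the coefficients `b0`, `b1`, `b2` and the sampling ranges are hypothetical. Points that satisfy z = b0 + b1 x + b2 y exactly lie on a 2-dimensional linear manifold in the (x, y, z) space, and the fit recovers R² = 1.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x = rng.uniform(0, 600, size=n)
y = rng.uniform(0, 600, size=n)
b0, b1, b2 = 20.0, 0.3, -0.2                     # hypothetical coefficients
z = b0 + b1 * x + b2 * y                         # exact plane, no noise

# Least-squares fit of z on (1, x, y).
A = np.column_stack([np.ones(n), x, y])
coef, *_ = np.linalg.lstsq(A, z, rcond=None)
z_hat = A @ coef

# Coefficient of multiple determination.
r2 = np.sum((z_hat - z.mean()) ** 2) / np.sum((z - z.mean()) ** 2)
print(coef, r2)
```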

  16. Other Instances of Linear Manifolds - Latent Variables. [Figure: two profile plots over attributes 1 to 8, linked by $R^{-1}$.] Steps: (1) $x_i = R(\mu + 1_d \phi_i)$; (2) $y_i = x_i - R\mu = R\, 1_d \phi_i$; (3) $\phi_i = \sqrt{x_i' x_i / d}$; (4) $C = \frac{1}{n} \sum_{i=1}^{n} y_i (1_d \phi_i)'$; (5) $[u, s, v] = \mathrm{svd}(C)$; (6) $R = u v'$.
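
The last three steps (cross-covariance, SVD, R = uv') have the form of an orthogonal Procrustes fit. The sketch below applies them to synthetic data; it assumes, for illustration only, that Rμ and the shifts φ_i are known, and all sizes and values are hypothetical. Because C has rank 1 here, the recovered matrix is determined only in how it acts on the all-ones direction, which is enough to map the rotated points back to a shift pattern.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 8, 100                                    # hypothetical sizes
ones = np.ones(d)

# Synthetic ground truth: an orthogonal R, an offset mu, and shifts phi_i.
R_true, _ = np.linalg.qr(rng.normal(size=(d, d)))
mu = rng.normal(size=d)
phi = rng.normal(size=n)

T = np.outer(phi, ones)                          # row i is (1_d phi_i)'
X = (mu + T) @ R_true.T                          # row i is (R (mu + 1_d phi_i))'
Y = X - R_true @ mu                              # assumes R mu is known

# C = (1/n) sum_i y_i (1_d phi_i)', then R = u v'
C = Y.T @ T / n
u, s, vt = np.linalg.svd(C)                      # NumPy returns v' directly
R_est = u @ vt

# Undo the rotation: R_est' y_i should be the shift pattern 1_d phi_i.
recovered = Y @ R_est
print(np.allclose(recovered, T))
```

`R_est` is orthogonal but its determinant may be -1; constraining it to a proper rotation would need an extra sign correction that is not shown here.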

  17. Data Transformations. [Figure: a grid of profile plots over attributes x1 to x5. Rows: shift pattern, scale pattern, 0-dim manifold. Columns: original pattern, mean/var transformation, row mean subtraction, log transformation.]
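
Two of the transformations named in the panel titles have a simple algebraic effect that can be checked directly. This is a sketch with hypothetical data, not a reproduction of the figure: row mean subtraction collapses a pure shift pattern to a single point (a 0-dimensional manifold), and a log transformation turns a pure scale pattern with positive entries into a shift pattern.

```python
import numpy as np

p = np.array([2.0, 6.0, 4.0, 8.0, 3.0])          # hypothetical positive pattern
L = np.array([1.0, 2.0, 5.0])                    # hypothetical per-point factors

shift = p + L[:, None]                           # row i is p + 1 * L_i
scale = np.outer(L, p)                           # row i is p * L_i

# Row mean subtraction on a shift pattern: every row becomes p - mean(p).
shift_centered = shift - shift.mean(axis=1, keepdims=True)

# Log of a scale pattern: log(p * L_i) = log(p) + 1 * log(L_i), a shift pattern.
log_scale = np.log(scale)
log_scale_centered = log_scale - log_scale.mean(axis=1, keepdims=True)

print(shift_centered)
print(log_scale_centered)
```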

  18. Data Transformations. [Figure: profile plots over attributes x1 to x10. Panels: shift pattern before normalization, shift pattern after normalization, scale pattern before normalization, scale pattern after normalization.]
