Linear Manifold Clustering Robert Haralick and Rave Harpaz
Outline
Background
The linear manifold cluster model
The linear manifold clustering algorithm
Linear manifold modeling
Linear manifold subspace correlation clustering
Conclusion
Background
Clustering is the process of classifying a collection of patterns into classes called clusters, so that the patterns within a cluster are “similar” to one another, yet “dissimilar” to patterns in other clusters.
Each clustering technique makes implicit assumptions about:
- The shape of the clusters
- The similarity criteria
- The grouping technique
Cluster Models
[Figure: cluster shapes: hyper-spherical, hyper-ellipsoidal, arbitrarily shaped, linear, nonlinear]
K-Means
Hyper-Spherical Clusters
1. Choose K points at random to be cluster centers
2. Assign each point to its closest cluster center
3. Make the new cluster centers be the cluster means
4. Iterate
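The K-means steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `kmeans`, the iteration cap `n_iter`, the `seed` parameter, and the convergence test are assumptions added for the example.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means on an (n, d) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Choose K distinct data points at random as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its closest cluster center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Make the new centers the cluster means
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Iterate until the centers stop moving.
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, labels

# Two well-separated square blobs in the plane.
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
pts = np.vstack([square, square + 100.0])
centers, labels = kmeans(pts, 2)
```

Because every point is compared only with a cluster center, each cluster is summarized by a single point, which is why the method favors hyper-spherical clusters.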
K-Means Clusters
Subspace Clustering
Definition: Subspace clustering produces clusters which are compact on a subset of dimensions aligned with the coordinate axes and not compact on the orthogonal complement of those dimensions.
[Figure: a cluster shown in the full space (x, y, z) and in a subspace (x-z projection)]
Subspace clustering handles:
- High dimensional data
- Irrelevant features
Pattern and Correlation Clustering
[Figure: parallel coordinate view over features 1 to 8]
Object similarity is no longer measured by physical distance, but by the behavior patterns objects manifest or the magnitude of correlations they induce.
Problem Statement: Identify groups of points that exhibit coherent behavior patterns across a subset of the measurement features.
Pattern and Correlation Clustering - Applications
[Figure: parallel coordinate view over features 1 to 8]
- Gene expression micro-array analysis: identify groups of genes that exhibit similar expression patterns under some subset of conditions, from which gene function or regulatory mechanisms may be inferred.
- Collaborative filtering/recommendation systems: sets of customers/users with similar interest patterns need to be identified so that future interests can be predicted and proper recommendations can be made.
- Dimensionality reduction by correlation
- Finance: identify groups of stocks that show similar price fluctuations over a certain time period.
Linear Manifold Clusters
Definition: L is a linear manifold of vector space V if and only if for some subspace S of V and translation t ∈ V, L = { x ∈ V | for some s ∈ S, x = t + s }. The dimension of L is the dimension of S, and if the dimension of L is one less than the dimension of V then L is called a hyperplane.
A linear manifold is, in other words, a subspace that may have been shifted away from the origin. A subspace is a linear manifold that contains the origin.
Dense Linear Manifold Clusters
[Figure: three dense linear manifold clusters, labeled C1, C2 and C3]
The Linear Manifold Cluster Model
The linear manifold cluster model has the following properties:
- The points in each cluster are embedded in a lower dimensional linear manifold.
- The intrinsic dimensionality of the cluster is the dimensionality of the linear manifold.
- The manifold is arbitrarily oriented.
- The points in the cluster induce a correlation among two or more attributes (or linear combinations of attributes) of the data.
- In the orthogonal complement space to the manifold the points form a compact densely populated region, which can be used to cluster the data.
The Linear Manifold Cluster Model
Comment: Classical clustering algorithms such as K-means assume that each cluster is associated with a zero dimensional manifold (the center) and therefore omit the possibility that a cluster may have a non-zero dimensional linear manifold associated with it.
The Range Space of a Matrix
Suppose B is a matrix with columns $b_1, b_2, \ldots, b_N$,
$$B = \begin{pmatrix} b_1 & b_2 & \cdots & b_N \end{pmatrix}$$
and x is a vector
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix}$$
Let $y = Bx$.
The Range Space of a Matrix
$$y = Bx = \begin{pmatrix} b_1 & b_2 & \cdots & b_N \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix} = \sum_{n=1}^{N} x_n b_n$$
y is a linear combination of the columns of B.
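The Linear Manifold Cluster Model
Each point x in a k-D linear manifold cluster is modeled by:
$$x = \mu + B\phi + \bar{B}\epsilon$$
$x$ : $d \times 1$ random vector
$\mu$ : $d \times 1$ translation vector in $\mathbb{R}^d$
$B$ : $d \times k$ matrix
$\bar{B}$ : $d \times (d-k)$ matrix, $\bar{B}'B = 0$
$\phi$ : $k \times 1$ random vector $\sim U(-R, R)$
$\epsilon$ : $(d-k) \times 1$ random vector $\sim N(0, \Sigma)$, $|\Sigma|$ is small
[Figure: translation vector $\mu$ and basis vectors $b_1$, $b_2$, $b_3$]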
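To make the generative model concrete, the sketch below draws points from a 1-D linear manifold cluster in 3-D. Several choices are assumptions made only for the example: `B_bar` stands for the d × (d − k) matrix that multiplies ε, both bases are taken as orthonormal columns of a QR factor, Σ is simplified to σ²I, and the names `sample_lm_cluster`, `R`, and `sigma` are hypothetical.

```python
import numpy as np

def sample_lm_cluster(mu, B, B_bar, R, sigma, n, seed=0):
    """Draw n points x = mu + B phi + B_bar eps (rows of the result)."""
    rng = np.random.default_rng(seed)
    d, k = B.shape
    phi = rng.uniform(-R, R, size=(n, k))          # phi ~ U(-R, R), spread along the manifold
    eps = rng.normal(0.0, sigma, size=(n, d - k))  # eps ~ N(0, sigma^2 I), assumed isotropic
    return mu + phi @ B.T + eps @ B_bar.T

# Build an orthonormal basis of R^3 and split it into B (k = 1) and B_bar (d - k = 2).
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))
B, B_bar = Q[:, :1], Q[:, 1:]
mu = np.array([50.0, 50.0, 50.0])
X = sample_lm_cluster(mu, B, B_bar, R=10.0, sigma=0.1, n=500)
```

With a small `sigma`, the sampled points lie close to the line through `mu` in the direction of the column of `B`, spread uniformly along it.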
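Linear Manifold Cluster Model
$$x = \mu + B\phi + \bar{B}\epsilon$$
$$\begin{aligned}
E[x] &= E[\mu + B\phi + \bar{B}\epsilon] \\
&= E[\mu] + E[B\phi] + E[\bar{B}\epsilon] \\
&= \mu + B\,E[\phi] + \bar{B}\,E[\epsilon] \\
&= \mu
\end{aligned}$$
The last step uses $E[\phi] = 0$ and $E[\epsilon] = 0$, since $\phi \sim U(-R, R)$ is symmetric about zero and $\epsilon \sim N(0, \Sigma)$ has zero mean.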
Orthogonal Projection
Definition: Let V be a vector space and W be any subspace of V. Represent vector $v \in V$ as $v = w + w^{\perp}$ where $w \in W$ and $w^{\perp} \in W^{\perp}$. Then $w$ is called the orthogonal projection of $v$ onto $W$ and $w^{\perp}$ is the orthogonal projection of $v$ onto $W^{\perp}$.
Theorem: Let V be a vector space and W be any subspace of V. Let B be a matrix whose columns constitute an orthonormal basis of W. Let $v \in V$ satisfy $v = w + w^{\perp}$ where $w \in W$ and $w^{\perp} \in W^{\perp}$. Then
$$w = BB'v$$
Singular Value Decomposition
Definition: The Singular Value Decomposition of a real matrix $X^{N \times K}$ is the factoring of X as
$$X^{N \times K} = U^{N \times N}\,\Lambda^{N \times K}\,V'^{\,K \times K}$$
where
$$U'U = I, \qquad V'V = I, \qquad \Lambda = \text{rectangular diagonal}$$
Thin Singular Value Decomposition
Definition: The Thin Singular Value Decomposition of a real matrix $X^{N \times K}$, $K < N$, is the factoring of X as
$$X^{N \times K} = U_K^{N \times K}\,\Lambda_K^{K \times K}\,V'^{\,K \times K}$$
where
$$U_K' U_K = I^{K \times K}, \qquad V'V = I^{K \times K}, \qquad \Lambda_K = \text{diagonal}$$
Orthonormal Basis of Subspace
Theorem: Let $X^{N \times K}$ have columns which span a K-dimensional subspace W. Let the thin singular value decomposition of X be
$$X^{N \times K} = U_K^{N \times K}\,\Lambda_K^{K \times K}\,V'^{\,K \times K}$$
Then
$$U_K U_K' X = X$$
Proof.
$$\begin{aligned}
U_K U_K' X &= U_K U_K' \left( U_K \Lambda_K V' \right) \\
&= U_K \left( U_K' U_K \right) \Lambda_K V' \\
&= U_K \Lambda_K V' \\
&= X
\end{aligned}$$
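The projection and basis theorems can be checked numerically. The sketch below is an illustration under assumed data: a random 6 × 2 matrix `X` (full rank for this seed), with the thin SVD obtained from `numpy.linalg.svd` using `full_matrices=False`. The variable names are chosen for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))   # N = 6, K = 2: columns span a 2-D subspace W

# Thin SVD: U_K is 6 x 2 with orthonormal columns, s holds the diagonal of Lambda_K.
U_K, s, Vt = np.linalg.svd(X, full_matrices=False)

# Orthogonal projection operator onto W, built from the orthonormal basis U_K.
P = U_K @ U_K.T

# Decompose an arbitrary vector v into w in W and w_perp in the orthogonal complement.
v = rng.normal(size=6)
w = P @ v
w_perp = v - w

print(np.allclose(U_K.T @ U_K, np.eye(2)))  # columns of U_K are orthonormal
print(np.allclose(P @ X, X))                # U_K U_K' X = X
print(np.allclose(X.T @ w_perp, 0.0))       # w_perp is orthogonal to every column of X
```

The columns of `U_K` therefore serve as the orthonormal basis B required by the projection theorem, so a basis for a manifold's subspace can be read off directly from the thin SVD of any matrix whose columns span it.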
Distance To Linear Manifold
Theorem: Let a linear manifold L be represented by $L = \{ z \mid z = \mu + B\phi \}$ where $\mu$ is a vector that translates the origin to the manifold and the columns of B are orthonormal. Then the Euclidean distance of x to L is given by
$$\rho(x, L) = \| (I - BB')(x - \mu) \|$$
Proof.
- $BB'$ is the orthogonal projection operator to the subspace spanned by the columns of B.
- $I - BB'$ is the orthogonal projection operator to the orthogonal complement of the subspace spanned by the columns of B.
- $(I - BB')(x - \mu)$ is the projection of x to the orthogonal complement of the linear manifold L.
- $\| (I - BB')(x - \mu) \|$ is the distance of x to the manifold L.
Distance To Linear Manifold
Proposition: Let B be a matrix whose columns are orthonormal. Then
$$\| (I - BB')y \| = \sqrt{\| y \|^2 - \| B'y \|^2}$$
Proof.
$$\begin{aligned}
\| (I - BB')y \|^2 &= \| y - BB'y \|^2 \\
&= (y - BB'y)'(y - BB'y) \\
&= y'y - 2y'BB'y + y'(BB')(BB')y \\
&= y'y - 2y'BB'y + y'(B(B'B)B')y \\
&= y'y - 2y'BB'y + y'(BB')y \\
&= y'y - y'BB'y \\
&= \| y \|^2 - \| B'y \|^2
\end{aligned}$$
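Both forms of the distance can be compared on a small worked case. The sketch below assumes a line in 3-D through µ = (0, 0, 1) along the first coordinate axis; the function names are hypothetical, and the clamp to zero in the second function is an added guard against rounding, not part of the proposition.

```python
import numpy as np

def distance_to_manifold(x, mu, B):
    """Direct form: norm of (I - BB')(x - mu). B must have orthonormal columns."""
    y = x - mu
    return np.linalg.norm(y - B @ (B.T @ y))

def distance_to_manifold_fast(x, mu, B):
    """Proposition form: sqrt(||y||^2 - ||B'y||^2), avoiding the d x d projector."""
    y = x - mu
    sq = y @ y - np.sum((B.T @ y) ** 2)
    return np.sqrt(max(sq, 0.0))  # clamp tiny negative values caused by rounding

mu = np.array([0.0, 0.0, 1.0])
B = np.array([[1.0], [0.0], [0.0]])   # one orthonormal column: the first axis
x = np.array([5.0, 3.0, 5.0])

# y = x - mu = (5, 3, 4); removing the component along B leaves (0, 3, 4), of length 5.
d_direct = distance_to_manifold(x, mu, B)
d_fast = distance_to_manifold_fast(x, mu, B)
```

The second form only needs the k-vector B′y, which is cheaper than applying I − BB′ when the distance must be evaluated for many points.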
The Linear Manifold Clustering Algorithm
[Figure: three dense linear manifold clusters, labeled C1, C2 and C3]
Outline - stochastic model fitting technique
1. Sample trial linear manifolds of various dimensions.