Cluster graphs
Rick Jardine
School of Mathematical and Statistical Sciences, Western University
October 26, 2017
Foundational Mathematics for Machine Learning
Tutte Institute, Canadian Security Establishment (CSE), Ottawa
May 23 – June 9, 2016
Topological Data Analysis
A data cloud is a finite set of points $X \subset \mathbb{R}^N$ (a metric space).
Basic idea: analyze regions of the data cloud $X$ by density.
Rips complex: for $s > 0$, $V_s(X)$ has simplices $\{x_0, \dots, x_n\}$ such that $d(x_i, x_j) < s$ for all $i, j$.
If $s < t$, then $V_s(X) \subset V_t(X)$.
$V_s(X)$ is discrete for $s$ small and contractible for $s$ big.
There are only finitely many isomorphism types $V_i(X) = V_{s_i}(X)$.
We have a sequence of complexes ("filtration", "dynamical system")
$$V_1(X) \subset V_2(X) \subset \cdots \subset V_k(X) \subset \cdots$$
What we care about is the points and 1-simplices of $V_s(X)$: pairs of points $(x, y)$ such that $d(x, y) < s$.
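Since only points and 1-simplices matter here, the relevant part of $V_s(X)$ is a graph. The following is a minimal Python sketch of that graph, assuming the Euclidean metric on a small hypothetical data cloud `X`; the function name `rips_edges` is illustrative and does not come from the slides.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def rips_edges(points, s):
    """1-simplices of V_s(X): index pairs (i, j), i < j, with d(x_i, x_j) < s."""
    return [(i, j)
            for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) < s]

# Hypothetical data cloud in R^2: two tight pairs that are far apart.
X = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]

print(rips_edges(X, 0.5))        # small s: no edges, V_s(X) is discrete
print(rips_edges(X, 1.5))        # intermediate s: one edge inside each pair
print(len(rips_edges(X, 10.0)))  # large s: every pair of points is an edge
```

The brute-force loop over all pairs is quadratic in the number of points, which is fine for a sketch but would need a spatial index for large clouds.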
Path components
Say that points $x, y$ are in the same path component of $V_s(X)$ (write $x \sim_s y$) if there is a string of segments (1-simplices) joining consecutive points
$$x_0,\ x_1,\ x_2,\ x_3,\ x_4,\ \dots,\ x_n$$
in $X$ with $x = x_0$, $y = x_n$, and $d(x_i, x_{i+1}) < s$ for all $i$.
Each pair $(x_i, x_{i+1})$ defines a 1-simplex of $V_s(X)$. The picture defines a polygonal path of 1-simplices of $V_s(X)$ between $x$ and $y$.
$x$ is related to $y$ in $V_s(X)$ if there is a series of short hops (of length $< s$) through points of $X$.
$\pi_0 V_s(X)$, the set of equivalence classes under $\sim_s$, is the set of path components of $V_s(X)$.
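The "series of short hops" relation can be computed with a union-find structure. This is a sketch under the same assumptions as before (Euclidean metric, hypothetical points); `path_components` is an illustrative name, and union-find is one standard choice among several (breadth-first search would work equally well).

```python
from itertools import combinations
from math import dist

def path_components(points, s):
    """pi_0 V_s(X) as a set of frozensets of point indices."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the class representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) < s:  # (x_i, x_j) is a 1-simplex of V_s(X)
            parent[find(i)] = find(j)       # merge the two classes

    classes = {}
    for i in range(len(points)):
        classes.setdefault(find(i), set()).add(i)
    return {frozenset(c) for c in classes.values()}

# Hypothetical points on a line: 0 and 2 are at distance 2, but are joined via 1.
X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (6.0, 0.0)]
print(path_components(X, 1.5))
```

Points 0 and 2 end up in the same component even though $d(x_0, x_2) \geq s$, because each hop along the way is shorter than $s$.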
Varying the parameter s
If $s < t$ and $x \sim_s y$, then $x \sim_t y$: hops of length $< s$ are also of length $< t$.
There is a function of equivalence classes (path components)
$$\pi_0 V_s(X) \to \pi_0 V_t(X),$$
which is induced by the inclusion $V_s(X) \subset V_t(X)$.
[Picture: dot diagram]
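The induced function on path components can be sketched directly: each class at parameter $s$ is sent to the class at parameter $t$ that contains it. The two partitions below are hypothetical and written out by hand, and `induced_map` is an illustrative name.

```python
def induced_map(comps_s, comps_t):
    """Send each class of pi_0 V_s(X) to the class of pi_0 V_t(X) containing it.

    Assumes s < t, so every class at s lies inside exactly one class at t.
    """
    image = {}
    for c in comps_s:
        rep = next(iter(c))  # any representative works: x ~_s y implies x ~_t y
        image[c] = next(d for d in comps_t if rep in d)
    return image

# Hypothetical path components of the same four points at parameters s < t.
comps_s = {frozenset({0, 1}), frozenset({2}), frozenset({3})}
comps_t = {frozenset({0, 1, 2}), frozenset({3})}

phi = induced_map(comps_s, comps_t)
for c, d in phi.items():
    print(sorted(c), "->", sorted(d))
```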
Cluster
We get a family of maps between path component sets
$$\pi_0 V_1(X) \to \pi_0 V_2(X) \to \cdots \to \pi_0 V_k(X) \to \cdots$$
A "cluster" is a path component in some $V_i(X)$ that does not vary with $i$ "for a while". How to express that?
Suppose given functions
$$F(1) \xrightarrow{\alpha} F(2) \xrightarrow{\alpha} \cdots \xrightarrow{\alpha} F(k) \xrightarrow{\alpha} \cdots$$
For $p < q$, $x \in F(p)$, $y \in F(q)$, say that $x \sim y$ if $\alpha^{q-p}(x) = y$ and $(\alpha^{q-k})^{-1}(y) = \{\alpha^{k-p}(x)\}$ for all $p \le k \le q$:
$$F(p) \to F(k) \to F(q)$$
Clusters are equivalence classes in $\bigcup_p F(p)$.
The graph Γ(F)
Sets and functions:
$$F:\ F(1) \xrightarrow{\alpha} F(2) \xrightarrow{\alpha} \cdots \xrightarrow{\alpha} F(k) \xrightarrow{\alpha} \cdots$$
Graph $\Gamma(F)$: vertices $(x, i)$ with $x \in F(i)$, edges $(x, i) \to (\alpha(x), i+1)$.
[Diagram: an example graph $\Gamma(F)$ on the vertices $(w,1)$, $(x,1)$, $(y,1)$, $(z,1)$, $(v,2)$, $(y,2)$, $(z,2)$, $(v,3)$, $(u,3)$, ..., with every edge labelled $\alpha$]
A branch point is a vertex $(x, i)$ with more than one incoming edge $(y, i-1) \to (x, i)$.
The cluster graph
Remove all edges of $\Gamma(F)$ terminating in branch points to construct the subgraph $\Gamma_0(F) \subset \Gamma(F)$.
$\Gamma_0(F)$ is the cluster graph for $F$.
Graphs have path components, and the clusters are the path components of $\Gamma_0(F)$, i.e. the elements of $\pi_0 \Gamma_0(F)$.
Alternatively: a cluster of $F$ is a path
$$(x_0, i) \to (x_1, i+1) \to \cdots \to (x_p, i+p)$$
of maximal length in $\Gamma(F)$ such that no $(x_j, i+j)$ is a branch point for $j > 0$.
NB: $(x_0, i)$ is a branch point, or $x_0$ has no preimage in $F(i-1)$.
Example: A cluster of $\{\pi_0 V_i(X)\}$ starts with a path component $[x] \in \pi_0(V_i(X))$ which was strictly smaller in $V_{i-1}(X)$ (branch point) and has a fixed size through the maps
$$V_i(X) \to V_{i+1}(X) \to \cdots \to V_{i+p}(X)$$
for some maximal $p$.
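The construction of $\Gamma_0(F)$ can be sketched for a finite sequence of finite sets. The encoding below (a list `levels` of sets and a list `alphas` of dictionaries) and the sample data are assumptions made for illustration. Since every vertex of $\Gamma_0(F)$ has at most one incoming and one outgoing edge, each cluster is returned as a path, i.e. a tuple of vertices.

```python
from collections import Counter

def clusters(levels, alphas):
    """Path components of the cluster graph Gamma_0(F).

    levels[i-1] is the finite set F(i); alphas[i-1] is a dict F(i) -> F(i+1).
    Each cluster is returned as a tuple of vertices (x, i) along its path.
    """
    edges = []
    for i, alpha in enumerate(alphas, start=1):
        for x in levels[i - 1]:
            edges.append(((x, i), (alpha[x], i + 1)))

    indegree = Counter(target for _, target in edges)
    branch_points = {v for v, n in indegree.items() if n > 1}

    # Gamma_0(F): drop every edge that terminates in a branch point.
    succ = {u: v for u, v in edges if v not in branch_points}
    has_pred = set(succ.values())

    result = set()
    for i, level in enumerate(levels, start=1):
        for x in level:
            if (x, i) in has_pred:
                continue  # not the first vertex of its cluster
            path = [(x, i)]
            while path[-1] in succ:
                path.append(succ[path[-1]])
            result.add(tuple(path))
    return result

# Hypothetical sequence: a and b merge at level 2, and everything merges at level 4.
levels = [{"a", "b", "c"}, {"ab", "c"}, {"ab", "c"}, {"abc"}]
alphas = [{"a": "ab", "b": "ab", "c": "c"},
          {"ab": "ab", "c": "c"},
          {"ab": "abc", "c": "abc"}]

for path in sorted(clusters(levels, alphas), key=lambda p: (p[0][1], p[0][0])):
    print(path)
```

In this sample, `("ab", 2)` and `("abc", 4)` are the branch points, so the clusters are the three-step path of `c`, the two-step path of `ab`, and three single-vertex clusters.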
Noise
The isolated groups of bright spots define "small" clusters. They join other clusters at some parameter value, which could be large.
[Picture: dot diagram]
The small clusters are "noise", up to some interpretation. Two ways to address this:
1) Every element $x_s \in \pi_0 V_s(X)$ has a cardinality $|x_s|$. Score each cluster
$$P:\ (x_s, s) \to (x_{s+1}, s+1) \to \cdots \to (x_{s+p}, s+p)$$
by setting $\sigma(P) = |x_s| \cdot p$. Compare scores of clusters.
2) Throw away the path components of small size during the computation process.
Comments
1) The score
$$\sigma(P) = |x_s| \cdot p = \sum_{(x_i, j) \in P} |x_i|$$
is the sum of the cardinalities $|x_i|$ of all path components appearing in the cluster $P$.
2) Clusters with big voids around them have higher scores than clusters of the same size surrounded by smaller voids.
3) Scoring is relatively expensive. It can only be done after all other calculations.
4) Throwing away small path components (e.g. isolated stars, small groups) is brutal but computationally effective: it can be done before constructing the cluster graph.
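Both ways of handling noise can be sketched in a few lines. The sketch assumes a cluster is given as a list of vertices (path component, parameter index) and scores it as the sum of the cardinalities of the path components that appear in it; the sample clusters and the names `score` and `drop_small` are hypothetical.

```python
def score(cluster):
    """Sum of the cardinalities of the path components appearing in a cluster."""
    return sum(len(component) for component, _ in cluster)

def drop_small(components, min_size):
    """Discard path components with fewer than min_size points."""
    return {c for c in components if len(c) >= min_size}

# Two hypothetical clusters of the same size, given as (path component, index) vertices.
big_void = [(frozenset({0, 1, 2}), i) for i in range(1, 6)]    # unchanged over 5 levels
small_void = [(frozenset({3, 4, 5}), i) for i in range(1, 3)]  # unchanged over 2 levels

print(score(big_void), score(small_void))
print(drop_small({frozenset({0, 1, 2}), frozenset({9})}, min_size=2))
```

The cluster that stays unchanged over more parameter values gets the higher score, which is the "big void" effect of comment 2. Filtering with `drop_small` needs only the component sizes, so it can run before any cluster graph is built.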
Higher dimensional persistence
The Rips complex has subcomplexes ("Lesnick complexes")
$$\cdots \subset L_{s,k+1}(X) \subset L_{s,k}(X) \subset \cdots, \qquad L_{s,0}(X) = V_s(X),$$
defined by valence of vertices, and natural in $s$.
A vertex $x$ is in $L_{s,k}(X)$ if it is a member of at least $k$ edges: another type of density measure.
We have a rectangular array of inclusions of complexes
$$\begin{array}{ccc}
L_{s,k}(X) & \longrightarrow & L_{s+1,k}(X) \\
\uparrow & & \uparrow \\
L_{s,k+1}(X) & \longrightarrow & L_{s+1,k+1}(X)
\end{array}$$
all with potentially different vertices.
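The valence condition can be sketched at the level of vertices and edges. The sketch assumes that $L_{s,k}(X)$ is the full subcomplex of $V_s(X)$ on the vertices of valence at least $k$, which the slide does not state explicitly; the metric is Euclidean, the data cloud is hypothetical, and `lesnick_skeleton` is an illustrative name.

```python
from itertools import combinations
from math import dist

def lesnick_skeleton(points, s, k):
    """Vertices and edges of L_{s,k}(X), assuming it is the full subcomplex
    of V_s(X) on the vertices that belong to at least k edges of V_s(X)."""
    edges = [(i, j)
             for i, j in combinations(range(len(points)), 2)
             if dist(points[i], points[j]) < s]
    valence = [0] * len(points)
    for i, j in edges:
        valence[i] += 1
        valence[j] += 1
    vertices = {i for i, d in enumerate(valence) if d >= k}
    kept = [(i, j) for i, j in edges if i in vertices and j in vertices]
    return vertices, kept

# Hypothetical data cloud: a tight triangle and one isolated point.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

for k in range(4):
    print(k, lesnick_skeleton(X, 1.5, k))
```

For $k = 0$ the sketch returns the vertices and edges of $V_s(X)$, and the isolated point drops out as soon as $k \geq 1$.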
Abstraction
Computing path components gives a rectangular array of functions, with $F(s,k) = \pi_0 L_{s,k}(X)$:
$$\begin{array}{ccc}
F(s,k) & \xrightarrow{\ \alpha\ } & F(s+1,k) \\
\downarrow{\scriptstyle\beta} & & \downarrow{\scriptstyle\beta} \\
F(s,k+1) & \xrightarrow{\ \alpha\ } & F(s+1,k+1)
\end{array}$$
There is a (directed) graph $\Gamma(F)$ with vertices $(x,(i,j))$ and edges
$$(x,(i,j)) \to (\alpha(x),(i+1,j)) \quad\text{and}\quad (x,(i,j)) \to (\beta(x),(i,j+1)).$$
$(x,(i,j))$ is a horizontal branch point if there are distinct $(u,(i-1,j))$, $(v,(i-1,j))$ with $\alpha(u) = \alpha(v) = x$. Vertical branch points are defined similarly.
Removing the edges ending at branch points (horizontal edges at horizontal branch points, vertical edges at vertical branch points) gives the cluster graph $\Gamma_0(F) \subset \Gamma(F)$.
The clusters are the path components $\pi_0 \Gamma_0(F)$.
Example
$F$ is the diagram of functions
$$\begin{array}{ccc}
\{x,y\} & \longrightarrow & * \\
\uparrow{\scriptstyle x} & & \uparrow \\
* & \longrightarrow & *
\end{array}$$
$*$ is the one point set, and $x: * \to \{x,y\}$ picks out the element $x$.
Here's $\Gamma(F)$, listed by its edges:
$(y,(0,1)) \to (*,(1,1))$
$(x,(0,1)) \to (*,(1,1))$
$(*,(0,0)) \to (x,(0,1))$
$(*,(0,0)) \to (*,(1,0))$
$(*,(1,0)) \to (*,(1,1))$
$(*,(1,1))$ is the one (horizontal) branch point.
Example, cont.
$\Gamma_0(F)$ is constructed by removing the edges
$$(x,(0,1)) \to (*,(1,1)) \quad\text{and}\quad (y,(0,1)) \to (*,(1,1)).$$
$\Gamma_0(F)$ is the graph with the remaining edges
$(*,(0,0)) \to (x,(0,1))$
$(*,(0,0)) \to (*,(1,0))$
$(*,(1,0)) \to (*,(1,1))$
and the isolated vertex $(y,(0,1))$.
$\{(y,(0,1))\}$ is a noise object.
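The example can be checked mechanically. The sketch below encodes a two-parameter array as dictionaries (an assumed encoding, not taken from the slides), removes horizontal edges at horizontal branch points and vertical edges at vertical branch points, and computes the path components of what is left with union-find.

```python
from collections import defaultdict

def cluster_components(F, alpha, beta):
    """Path components of Gamma_0(F) for a two-parameter array.

    F[(i, j)] is the finite set F(i, j);
    alpha[(i, j)] is a dict F(i, j) -> F(i+1, j) (horizontal maps);
    beta[(i, j)] is a dict F(i, j) -> F(i, j+1) (vertical maps).
    """
    edges = []  # triples (source vertex, target vertex, direction)
    for (i, j), f in alpha.items():
        edges += [((x, (i, j)), (f[x], (i + 1, j)), "h") for x in F[(i, j)]]
    for (i, j), g in beta.items():
        edges += [((x, (i, j)), (g[x], (i, j + 1)), "v") for x in F[(i, j)]]

    # Count incoming edges per target vertex and direction.
    incoming = defaultdict(int)
    for _, target, direction in edges:
        incoming[(target, direction)] += 1

    # Keep an edge unless its target is a branch point in that edge's direction.
    kept = [(u, v) for u, v, d in edges if incoming[(v, d)] < 2]

    parent = {(x, idx): (x, idx) for idx, level in F.items() for x in level}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in kept:
        parent[find(u)] = find(v)

    comps = defaultdict(set)
    for v in parent:
        comps[find(v)].add(v)
    return {frozenset(c) for c in comps.values()}

# The example: F(0,1) = {x, y}, and the other three sets are one-point sets.
F = {(0, 0): {"*"}, (1, 0): {"*"}, (0, 1): {"x", "y"}, (1, 1): {"*"}}
alpha = {(0, 0): {"*": "*"}, (0, 1): {"x": "*", "y": "*"}}
beta = {(0, 0): {"*": "x"}, (1, 0): {"*": "*"}}

for component in cluster_components(F, alpha, beta):
    print(sorted(component))
```

The output has two components: the noise object $\{(y,(0,1))\}$ and a single cluster containing the other four vertices.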
Scoring
$L_{s,k}(X)$ of a data cloud $X$: just an example.
A cluster $P$ is a connected graph consisting of a set of vertices $(x,(s,k))$ with suitable edges.
For each vertex $(x,(s,k))$, the element $x$ is a path component (a set of vertices) in $L_{s,k}(X)$. The path component $x$ has finite cardinality $|x|$.
The score $\sigma(P)$ of the cluster $P$ is defined by
$$\sigma(P) = \sum_{(x,(s,k)) \in P} |x|.$$
As before, we deal with noise by throwing away clusters with low scores, or by throwing away points $(x,(s,k))$ with $|x|$ small, or both.
Explanation
Suppose given an ascending sequence of finite sets
$$P:\ P_0 \subset P_1 \subset P_2 \subset \cdots \subset P_n.$$
The score $\sigma(P)$ of the sequence $P$ is given by
$$\sigma(P) = \sum_{i=0}^{n} |P_i|.$$
$$P_1 = P_0 \sqcup (P_1 - P_0)$$
$$P_2 = P_0 \sqcup (P_1 - P_0) \sqcup (P_2 - P_1)$$
Multiplicities: The points of $P_0$ are counted $n+1$ times, the points of $P_1 - P_0$ are counted $n$ times, ..., the points of $P_n - P_{n-1}$ are counted only once.
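The multiplicity count can be verified on a small case. The sketch computes the score both as the plain sum of cardinalities and by weighting each new layer $P_j - P_{j-1}$ by the number of sets that contain it, namely $n + 1 - j$; the chain of sets is hypothetical.

```python
def score(chain):
    """sigma(P): the sum of |P_i| over the ascending chain P_0, ..., P_n."""
    return sum(len(P) for P in chain)

def score_by_multiplicity(chain):
    """Same score, counting the points of P_j - P_{j-1} exactly n + 1 - j times."""
    n = len(chain) - 1
    total = (n + 1) * len(chain[0])  # points of P_0 lie in all n + 1 sets
    for j in range(1, n + 1):
        new_points = chain[j] - chain[j - 1]
        total += (n + 1 - j) * len(new_points)
    return total

# Hypothetical ascending chain with n = 2.
chain = [{1}, {1, 2}, {1, 2, 3, 4}]
print(score(chain), score_by_multiplicity(chain))
```

Here the point 1 is counted three times, the point 2 twice, and the points 3 and 4 once each.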
Comments
At least for clusters, we may have the first viable approach to multidimensional persistence.
These ideas apply to arrays of sets of all dimensions, e.g. we could vary the data cloud $X$ in persistence applications.
Persistent homology
The Rips complexes $V_s(X)$ have homology groups $H_n(V_s(X))$, $n \ge 0$ (coefficients in a fixed field $k$), all finite dimensional vector spaces because all complexes are finite.
The inclusions $V_i(X) \subset V_{i+1}(X)$ induce vector space morphisms
$$H_n(V_1(X)) \xrightarrow{t} H_n(V_2(X)) \xrightarrow{t} \cdots$$
interpreted as a $k[t]$-module, a persistence module.
The standard theorem about finitely generated modules over a principal ideal domain says that a persistence module is a direct sum of finite torsion modules $k[t]/(t^p)$ (shifted).