Cluster graphs
Rick Jardine
School of Mathematical and Statistical Sciences, Western University
October 26, 2017

Foundational Mathematics for Machine Learning
Tutte Institute, Canadian Security Establishment (CSE), Ottawa
May 23 – June 9, 2016

Topological Data Analysis
A data cloud is a finite set of points $X \subset \mathbb{R}^N$ (a metric space).
Basic idea: analyze regions of the data cloud $X$ by density.
Rips complex: for $s > 0$, $V_s(X)$ has simplices $\{x_0, \dots, x_n\}$ such that $d(x_i, x_j) < s$ for all $i, j$.
If $s < t$, then $V_s(X) \subset V_t(X)$. $V_s(X)$ is discrete for $s$ small and contractible for $s$ big.
There are only finitely many isomorphism types $V_i(X) = V_{s_i}(X)$, so we have a sequence of complexes ("filtration", "dynamical system")
$$V_1(X) \subset V_2(X) \subset \cdots \subset V_k(X) \subset \cdots$$
What we care about are the points and 1-simplices of $V_s(X)$: pairs of points $(x, y)$ such that $d(x, y) < s$.
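The Rips 1-skeleton, and the finitely many parameter values at which it changes, can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's code: the metric is taken to be Euclidean (`math.dist`), points are referred to by their index in a list, and the function names `rips_edges` and `critical_values` are hypothetical.

```python
from itertools import combinations
from math import dist


def rips_edges(points, s):
    """Return the 1-simplices of V_s(X) as index pairs (i, j), i < j, with d < s."""
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if dist(points[i], points[j]) < s
    ]


def critical_values(points):
    """Sorted distinct pairwise distances of the data cloud.

    With the strict condition d(x_i, x_j) < s, the edge set (and hence the Rips
    complex, which is determined by its edges) only changes as s crosses one of
    these values, so only finitely many distinct complexes V_s(X) occur.
    """
    return sorted({dist(p, q) for p, q in combinations(points, 2)})
```

For example, for the three points `(0, 0)`, `(1, 0)`, `(3, 0)` the critical values are `1.0`, `2.0` and `3.0`, and $V_s(X)$ has the single edge `(0, 1)` for $1 < s \le 2$.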

Path components
Say that points $x, y$ are in the same path component of $V_s(X)$ (write $x \sim_s y$) if there is a string of segments (1-simplices) joining points $x = x_0, x_1, \dots, x_n = y$ of $X$, with $d(x_i, x_{i+1}) < s$ for all $i$.
Each pair $(x_i, x_{i+1})$ defines a 1-simplex of $V_s(X)$, so the string defines a polygonal path of 1-simplices of $V_s(X)$ between $x$ and $y$.
Thus $x$ is related to $y$ in $V_s(X)$ if there is a series of short hops (of length $< s$) through points of $X$.
$\pi_0 V_s(X)$, the set of equivalence classes under $\sim_s$, is the set of path components of $V_s(X)$.
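Computing $\pi_0 V_s(X)$ only needs the 1-simplices, and a standard way to do it is union-find (disjoint sets). The sketch below is one possible implementation, not necessarily the one the author used; `path_components` is a hypothetical name, the metric is assumed Euclidean, and each path component is returned as a `frozenset` of point indices.

```python
from itertools import combinations
from math import dist


def path_components(points, s):
    """Compute pi_0 V_s(X) as a set of frozensets of point indices.

    Two points are joined whenever d(x, y) < s; union-find then merges chains
    of such short hops x = x_0, x_1, ..., x_n = y into one class.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) < s:
            parent[find(i)] = find(j)

    classes = {}
    for i in range(len(points)):
        classes.setdefault(find(i), set()).add(i)
    return {frozenset(c) for c in classes.values()}
```

Points `0` and `2` below end up in the same class at `s = 2.5` even though they are 3 apart, because they are linked by two short hops through point `1`.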

Varying the parameter s
If $s < t$ and $x \sim_s y$, then $x \sim_t y$, since hops of length $< s$ are of length $< t$.
So there is a function of equivalence classes (path components) $\pi_0 V_s(X) \to \pi_0 V_t(X)$, which is induced by the inclusion $V_s(X) \subset V_t(X)$.
[Picture omitted]
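Because each class at parameter $s$ lies inside a single class at $t > s$, the induced function $\pi_0 V_s(X) \to \pi_0 V_t(X)$ can be read off by looking up where any one point of the class went. A minimal sketch, assuming path components are given as sets of `frozenset`s of point indices (for instance as produced by a union-find pass); `induced_map` is a hypothetical name.

```python
def induced_map(components_s, components_t):
    """The function pi_0 V_s(X) -> pi_0 V_t(X) induced by V_s(X) ⊂ V_t(X), s < t.

    Each class at s is sent to the class at t containing one of its points.
    """
    owner = {x: c for c in components_t for x in c}
    result = {}
    for c in components_s:
        image = owner[next(iter(c))]
        # Sanity check: the whole class at s must lie inside one class at t.
        if not c <= image:
            raise ValueError("components at s must refine components at t")
        result[c] = image
    return result
```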

Cluster
We get a family of maps between path component sets
$$\pi_0 V_1(X) \to \pi_0 V_2(X) \to \cdots \to \pi_0 V_k(X) \to \cdots$$
A "cluster" is a path component in some $V_i(X)$ that does not vary with $i$, "for a while". How to express that?
Suppose given functions
$$F(1) \xrightarrow{\alpha} F(2) \xrightarrow{\alpha} \cdots \xrightarrow{\alpha} F(k) \xrightarrow{\alpha} \cdots$$
For $p < q$, $x \in F(p)$, $y \in F(q)$, say that $x \sim y$ if $\alpha^{q-p}(x) = y$ and $(\alpha^{q-k})^{-1}(y) = \{\alpha^{k-p}(x)\}$ for all $p \le k \le q$, where
$$F(p) \xrightarrow{\alpha^{k-p}} F(k) \xrightarrow{\alpha^{q-k}} F(q).$$
Clusters are equivalence classes in $\bigcup_p F(p)$.
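The relation $x \sim y$ can be tested directly from its definition when the sets and functions are finite. In this sketch, which is only an illustration, the levels are indexed from 0 rather than 1, `F[i]` is a Python set and `alpha[i]` is a dict representing $\alpha: F(i) \to F(i+1)$; the helper names `iterate` and `related` are hypothetical. The case $p = q$ is also accepted, and then the relation holds exactly when $x = y$.

```python
def iterate(alpha, p, k, x):
    """Apply alpha^(k-p): F(p) -> F(k) to x; alpha[i] is a dict F(i) -> F(i+1)."""
    for i in range(p, k):
        x = alpha[i][x]
    return x


def related(F, alpha, p, x, q, y):
    """Test x ~ y for x in F(p), y in F(q), p <= q.

    Requires alpha^(q-p)(x) = y and, for every p <= k <= q, that the preimage of y
    under alpha^(q-k): F(k) -> F(q) is exactly {alpha^(k-p)(x)}.
    """
    if iterate(alpha, p, q, x) != y:
        return False
    for k in range(p, q + 1):
        preimage = {z for z in F[k] if iterate(alpha, k, q, z) == y}
        if preimage != {iterate(alpha, p, k, x)}:
            return False
    return True
```

With `F = [{"a", "b"}, {"c"}, {"d"}]` and both `"a"` and `"b"` mapping to `"c"`, the element `"c"` is related to `"d"`, but `"a"` is not related to `"c"`, because `"c"` has two preimages.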

The graph $\Gamma(F)$
Sets and functions:
$$F:\ F(1) \xrightarrow{\alpha} F(2) \xrightarrow{\alpha} \cdots \xrightarrow{\alpha} F(k) \xrightarrow{\alpha} \cdots$$
Graph $\Gamma(F)$: vertices $(x, i)$ with $x \in F(i)$, and edges $(x, i) \to (\alpha(x), i+1)$.
[Figure: an example of $\Gamma(F)$ with vertices $(w,1), (x,1), (y,1), (z,1)$, $(v,2), (y,2), (z,2)$, $(v,3), (u,3)$, ...]
A branch point is a vertex $(x, i)$ with more than one incoming edge $(y, i-1) \to (x, i)$.
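For finite data, $\Gamma(F)$ and its branch points can be computed by counting incoming edges. A sketch under hypothetical conventions (0-based levels, `F[i]` a set, `alpha[i]` a dict for $\alpha: F(i) \to F(i+1)$); `gamma` and `branch_points` are illustrative names, not an established API.

```python
from collections import Counter


def gamma(F, alpha):
    """Vertices and edges of Γ(F) for F(0) -> F(1) -> ... (0-based levels).

    F[i] is a set and alpha[i] a dict F[i] -> F[i+1]; the last level has no map.
    """
    vertices = {(x, i) for i, level in enumerate(F) for x in level}
    edges = {((x, i), (alpha[i][x], i + 1)) for i in range(len(alpha)) for x in F[i]}
    return vertices, edges


def branch_points(edges):
    """Vertices (x, i) with more than one incoming edge."""
    indegree = Counter(target for _, target in edges)
    return {v for v, n in indegree.items() if n > 1}
```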

The cluster graph
Remove all edges of $\Gamma(F)$ terminating in branch points to construct a subgraph $\Gamma_0(F) \subset \Gamma(F)$; $\Gamma_0(F)$ is the cluster graph for $F$.
Graphs have path components, and the clusters are the path components of $\Gamma_0(F)$, i.e. the elements of $\pi_0 \Gamma_0(F)$.
Alternatively: a cluster of $F$ is a path
$$(x_0, i) \to (x_1, i+1) \to \cdots \to (x_p, i+p)$$
of maximal length in $\Gamma(F)$ such that no $(x_j, i+j)$ is a branch point for $j > 0$.
NB: $(x_0, i)$ is a branch point, or $x_0$ has no preimage in $F(i-1)$.
Example: a cluster of $\{\pi_0 V_i(X)\}$ starts with a path component $[x] \in \pi_0 V_i(X)$ which was strictly smaller in $V_{i-1}(X)$ (a branch point) and has a fixed size through the maps $V_i(X) \to V_{i+1}(X) \to \cdots \to V_{i+p}(X)$ for some maximal $p$.
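The cluster graph construction can be sketched in Python: drop the edges that end at a branch point, then take connected components with union-find. This is an illustration under assumptions, not the author's implementation: levels are 0-based, `F[i]` is a set, `alpha[i]` is a dict for $\alpha: F(i) \to F(i+1)$, and `clusters` is a hypothetical name. Since every vertex of $\Gamma_0(F)$ has at most one incoming and at most one outgoing edge, each component is a path, so sorting its vertices by level recovers the path $(x_0, i) \to \cdots \to (x_p, i+p)$.

```python
from collections import Counter


def clusters(F, alpha):
    """Path components of the cluster graph Γ_0(F), each as a list ordered by level."""
    edges = [((x, i), (alpha[i][x], i + 1)) for i in range(len(alpha)) for x in F[i]]
    # Keep only edges whose target has exactly one incoming edge (not a branch point).
    indegree = Counter(t for _, t in edges)
    kept = [(u, v) for u, v in edges if indegree[v] == 1]

    # Union-find over all vertices of Γ(F), ignoring edge direction.
    parent = {(x, i): (x, i) for i, level in enumerate(F) for x in level}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in kept:
        parent[find(u)] = find(v)

    comps = {}
    for v in parent:
        comps.setdefault(find(v), set()).add(v)
    return [sorted(c, key=lambda v: v[1]) for c in comps.values()]
```

For the levels `{"a", "b", "c"} -> {"d", "e"} -> {"f"}` with `a, b -> d`, `c -> e` and `d, e -> f`, the only surviving edge is `("c", 0) -> ("e", 1)`, so the clusters are that path and four single vertices.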

Noise
The isolated groups of bright spots define "small" clusters. They join other clusters at some parameter value, which could be large.
[Picture omitted]
The small clusters are "noise", up to some interpretation. Two ways to address this:
1) Every element $x_s \in \pi_0 V_s(X)$ has a cardinality $|x_s|$. Score each cluster
$$P:\ (x_s, s) \to (x_{s+1}, s+1) \to \cdots \to (x_{s+p}, s+p)$$
by setting $\sigma(P) = (p+1)\,|x_s|$, and compare the scores of clusters.
2) Throw away the path components of small size during the computation process.

Comments
1) The score
$$\sigma(P) = (p+1)\,|x_s| = \sum_{(x_j,\, j) \in P} |x_j|$$
is the sum of the cardinalities $|x_j|$ of all path components appearing in the cluster $P$; these cardinalities are all equal, since a path component has fixed size along a cluster.
2) Clusters with big voids around them have higher scores than clusters of the same size surrounded by smaller voids, since they persist for more parameter values before joining another cluster.
3) Scoring is relatively expensive: it can only be done after all other calculations.
4) Throwing away small path components (e.g. isolated stars, small groups) is brutal but computationally effective, and can be done before constructing the cluster graph.
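Both noise-handling options are short to express once clusters and path components are available. A minimal sketch, assuming a cluster is given as a list of vertices `(x, i)` with `x` a `frozenset` of points; `score`, `drop_small` and the threshold `min_size` are hypothetical names. The score is computed as the sum of $|x|$ over the vertices of the cluster.

```python
def score(cluster):
    """σ(P): sum of |x| over the vertices (x, i) of the cluster P."""
    return sum(len(x) for x, _ in cluster)


def drop_small(components, min_size):
    """Noise option 2: discard path components with fewer than min_size points."""
    return {c for c in components if len(c) >= min_size}
```

A cluster in which a 3-point component persists through three consecutive parameter values scores $3 + 3 + 3 = 9$.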

Higher dimensional persistence
The Rips complex has subcomplexes ("Lesnick complexes")
$$\cdots \subset L_{s,k+1}(X) \subset L_{s,k}(X) \subset \cdots \subset L_{s,0}(X) = V_s(X),$$
defined by valence of vertices and natural in $s$: $x \in L_{s,k}(X)$ if it is a member of at least $k$ edges of $V_s(X)$. This is another type of density measure.
There is a rectangular array of inclusions of complexes
$$\begin{array}{ccc} L_{s,k+1}(X) & \subset & L_{s+1,k+1}(X) \\ \cap & & \cap \\ L_{s,k}(X) & \subset & L_{s+1,k}(X) \end{array}$$
all with potentially different vertices.
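The vertices and path components of $L_{s,k}(X)$ can be computed from vertex degrees in the 1-skeleton of $V_s(X)$. The sketch below assumes, as an interpretation rather than something stated above, that $L_{s,k}(X)$ is the full subcomplex of $V_s(X)$ on the points lying on at least $k$ edges, so two retained points are joined exactly when their distance is less than $s$. The metric is taken to be Euclidean and `lesnick_components` is a hypothetical name.

```python
from itertools import combinations
from math import dist


def lesnick_components(points, s, k):
    """Path components of L_{s,k}(X), as frozensets of point indices.

    Assumption: L_{s,k}(X) is the full subcomplex of V_s(X) on the points that
    have at least k other points at distance < s.
    """
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) < s]
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    keep = {i for i in range(n) if degree[i] >= k}

    # Union-find restricted to the retained vertices.
    parent = {i: i for i in keep}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        if i in keep and j in keep:
            parent[find(i)] = find(j)

    classes = {}
    for i in keep:
        classes.setdefault(find(i), set()).add(i)
    return {frozenset(c) for c in classes.values()}
```

Raising $k$ shrinks the vertex set, matching the inclusions $L_{s,k+1}(X) \subset L_{s,k}(X)$: on three evenly spaced points plus one far-away point, $k = 0$ gives two components, $k = 1$ drops the isolated point, and $k = 2$ keeps only the middle point.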

Abstraction
Computing path components gives a rectangular array of functions, with $F(s,k) = \pi_0 L_{s,k}(X)$:
$$\begin{array}{ccc} F(s,k+1) & \xrightarrow{\alpha} & F(s+1,k+1) \\ \downarrow{\scriptstyle\beta} & & \downarrow{\scriptstyle\beta} \\ F(s,k) & \xrightarrow{\alpha} & F(s+1,k) \end{array}$$
More generally, given a rectangular array of functions $\alpha: F(i,j) \to F(i+1,j)$ and $\beta: F(i,j) \to F(i,j+1)$, there is a (directed) graph $\Gamma(F)$ with vertices $(x,(i,j))$, $x \in F(i,j)$, and edges
$$(x,(i,j)) \to (\alpha(x),(i+1,j)) \quad\text{and}\quad (x,(i,j)) \to (\beta(x),(i,j+1)).$$
$(x,(i,j))$ is a horizontal branch point if there are distinct $(u,(i-1,j))$, $(v,(i-1,j))$ with $\alpha(u) = \alpha(v) = x$. Vertical branch points are defined similarly, using $\beta$.
Removing the horizontal edges ending at horizontal branch points and the vertical edges ending at vertical branch points gives the cluster graph $\Gamma_0(F) \subset \Gamma(F)$. The clusters are the path components $\pi_0 \Gamma_0(F)$.

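Example
$F$ is the diagram of functions
$$\begin{array}{ccc} \{x,y\} & \longrightarrow & \ast \\ {\scriptstyle x}\uparrow & & \uparrow \\ \ast & \longrightarrow & \ast \end{array}$$
where $\ast$ is the one point set, and $x: \ast \to \{x,y\}$ picks out the element $x$. Here $F(0,0) = F(1,0) = F(1,1) = \ast$ and $F(0,1) = \{x,y\}$.
Here's $\Gamma(F)$, with edges
$(\ast,(0,0)) \to (\ast,(1,0))$, $(\ast,(0,0)) \to (x,(0,1))$, $(\ast,(1,0)) \to (\ast,(1,1))$, $(x,(0,1)) \to (\ast,(1,1))$, $(y,(0,1)) \to (\ast,(1,1))$.
$(\ast,(1,1))$ is the one (horizontal) branch point.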

Example, cont.
$\Gamma_0(F)$ is constructed by removing the edges $(x,(0,1)) \to (\ast,(1,1))$ and $(y,(0,1)) \to (\ast,(1,1))$.
$\Gamma_0(F)$ is the graph with edges $(\ast,(0,0)) \to (\ast,(1,0))$, $(\ast,(0,0)) \to (x,(0,1))$ and $(\ast,(1,0)) \to (\ast,(1,1))$, together with the isolated vertex $(y,(0,1))$.
$\{(y,(0,1))\}$ is a noise object.
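The two-parameter cluster graph, including the example above, can be checked with a short Python sketch. It follows the indexing convention of the abstract definition ($\alpha$ raises the first index, $\beta$ the second) and removes horizontal edges into horizontal branch points and vertical edges into vertical branch points. The dict-based representation and the name `cluster_graph_2d` are assumptions made for illustration; the string `"*"` stands for the element of the one point set.

```python
from collections import Counter


def cluster_graph_2d(F, alpha, beta):
    """Clusters (path components of Γ_0(F)) for a rectangular array of finite sets.

    F[(i, j)] is a set; alpha[(i, j)] is a dict F(i, j) -> F(i+1, j) and
    beta[(i, j)] is a dict F(i, j) -> F(i, j+1).
    """
    horizontal = [((x, (i, j)), (fx, (i + 1, j)))
                  for (i, j), f in alpha.items() for x, fx in f.items()]
    vertical = [((x, (i, j)), (fx, (i, j + 1)))
                for (i, j), f in beta.items() for x, fx in f.items()]

    # Branch points are detected separately in each direction, and only edges
    # of that direction ending at such a point are removed.
    kept = []
    for edges in (horizontal, vertical):
        indegree = Counter(target for _, target in edges)
        kept += [(u, v) for u, v in edges if indegree[v] == 1]

    # Union-find over all vertices (x, (i, j)) of Γ(F).
    parent = {(x, ij): (x, ij) for ij, level in F.items() for x in level}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in kept:
        parent[find(u)] = find(v)

    components = {}
    for v in parent:
        components.setdefault(find(v), set()).add(v)
    return {frozenset(c) for c in components.values()}


# The example: F(0,0) = F(1,0) = F(1,1) = {*}, F(0,1) = {x, y}.
F = {(0, 0): {"*"}, (1, 0): {"*"}, (0, 1): {"x", "y"}, (1, 1): {"*"}}
alpha = {(0, 0): {"*": "*"}, (0, 1): {"x": "*", "y": "*"}}
beta = {(0, 0): {"*": "x"}, (1, 0): {"*": "*"}}
example_clusters = cluster_graph_2d(F, alpha, beta)
```

On the example it returns two components: the noise object $\{(y,(0,1))\}$ and the four remaining vertices.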

Scoring
The array $L_{s,k}(X)$ of a data cloud $X$ serves as an example.
A cluster $P$ is a connected graph consisting of a set of vertices $(x,(s,k))$ with suitable edges. For each vertex $(x,(s,k))$, the element $x$ is a path component (a set of vertices) in $L_{s,k}(X)$, so it has finite cardinality $|x|$. The score $\sigma(P)$ of the cluster $P$ is defined by
$$\sigma(P) = \sum_{(x,(s,k)) \in P} |x|.$$
As before, we deal with noise by throwing away clusters with low scores, or by throwing away vertices $(x,(s,k))$ with $|x|$ small, or both.
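Scoring in two parameters is the same sum over vertices. A minimal sketch, assuming each cluster is a `frozenset` of vertices `(x, (s, k))` with `x` a `frozenset` of points; `score`, `filter_clusters`, `drop_small_vertices` and the thresholds `min_score`, `min_size` are hypothetical names. Pruning vertices is meant to happen before the cluster graph is built, while filtering by score happens after.

```python
def score(cluster):
    """σ(P): sum of |x| over the vertices (x, (s, k)) of a cluster P."""
    return sum(len(x) for x, _ in cluster)


def filter_clusters(clusters, min_score):
    """Keep only clusters whose score is at least min_score."""
    return [P for P in clusters if score(P) >= min_score]


def drop_small_vertices(vertices, min_size):
    """Discard vertices (x, (s, k)) with |x| < min_size, before building Γ(F)."""
    return {v for v in vertices if len(v[0]) >= min_size}
```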

Explanation
Suppose given an ascending sequence of finite sets
$$P:\ P_0 \subset P_1 \subset P_2 \subset \cdots \subset P_n.$$
The score $\sigma(P)$ of the sequence $P$ is given by
$$\sigma(P) = \sum_{i=0}^{n} |P_i|.$$
Decompose
$$P_1 = P_0 \sqcup (P_1 - P_0), \qquad P_2 = P_0 \sqcup (P_1 - P_0) \sqcup (P_2 - P_1), \qquad \dots$$
Multiplicities: the points of $P_0$ are counted $n+1$ times, the points of $P_1 - P_0$ are counted $n$ times, ..., and the points of $P_n - P_{n-1}$ are counted only once.
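Grouping the points by the stage at which they first appear gives the closed form
$$\sigma(P) = (n+1)\,|P_0| + \sum_{i=1}^{n} (n+1-i)\,|P_i - P_{i-1}|.$$
The Python sketch below (hypothetical function names, with the $P_i$ as Python `set`s) computes the score both ways so the two expressions can be compared on small examples.

```python
def score_direct(P):
    """σ(P) = |P_0| + |P_1| + ... + |P_n| for an ascending chain P_0 ⊂ ... ⊂ P_n."""
    return sum(len(Pi) for Pi in P)


def score_by_multiplicity(P):
    """Same score, counting each point of P_i - P_{i-1} exactly (n + 1 - i) times."""
    n = len(P) - 1
    total = (n + 1) * len(P[0])
    for i in range(1, n + 1):
        total += (n + 1 - i) * len(P[i] - P[i - 1])
    return total
```

For $P_0 = \{1\}$, $P_1 = \{1,2,3\}$, $P_2 = \{1,2,3,4\}$ both give $1 + 3 + 4 = 8 = 3 \cdot 1 + 2 \cdot 2 + 1 \cdot 1$.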

Comments
At least for clusters, we may have the first viable approach to multidimensional persistence.
These ideas apply to arrays of sets of all dimensions; for example, we could vary the data cloud $X$ in persistence applications.

Persistent homology
The Rips complexes $V_s(X)$ have homology groups $H_n(V_s(X))$, $n \ge 0$ (coefficients in a fixed field $k$), all finite dimensional vector spaces because all complexes are finite.
The inclusions $V_i(X) \subset V_{i+1}(X)$ induce vector space morphisms
$$H_n(V_1(X)) \xrightarrow{t} H_n(V_2(X)) \xrightarrow{t} \cdots$$
which together are interpreted as a $k[t]$-module, a persistence module. It is finitely generated, since there are only finitely many isomorphism types $V_i(X)$.
The standard theorem about finitely generated modules over a principal ideal domain says that such a persistence module is a direct sum of (shifted) finite torsion modules $k[t]/(t^p)$ and (shifted) free modules $k[t]$.
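For $n = 0$ the decomposition can be computed without linear algebra, since $H_0$ only records path components. The sketch below swaps in a standard technique, Kruskal's minimum spanning tree algorithm with union-find, rather than anything described above: all points are born at $s = 0$, each edge that joins two components ends one bar at its length $d$ (the two components are joined in $V_s(X)$ exactly for $s > d$), and the one class that never dies gives the free summand $k[t]$. Finite bars correspond to torsion summands. The metric is assumed Euclidean and `h0_barcode` is a hypothetical name.

```python
from itertools import combinations
from math import dist, inf


def h0_barcode(points):
    """Bars (birth, death) of the H_0 persistence module of the Rips filtration."""
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process candidate edges from shortest to longest (Kruskal's order).
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
    deaths = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            # This edge merges two path components: one H_0 class dies here.
            parent[ri] = rj
            deaths.append(dist(points[i], points[j]))
    # One component survives for all s: an infinite bar (free summand).
    return [(0.0, d) for d in deaths] + [(0.0, inf)]
```

For the points `(0, 0)`, `(1, 0)`, `(3, 0)` this gives the bars `(0.0, 1.0)`, `(0.0, 2.0)` and `(0.0, inf)`.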
