Statistical Analysis of Persistent Homology
Genki Kusano (Tohoku University, D1)
Topology and Computer 2016, Oct 28 @ Akita
Collaborators: Kenji Fukumizu (The Institute of Statistical Mathematics), Yasuaki Hiraoka (Tohoku University, AIMR)
Based on: Persistence weighted Gaussian kernel for topological data analysis. Proceedings of the 33rd ICML, pp. 2004–2013, 2016.
Self introduction
• Interests: applied topology, topological data analysis
  B3: homology groups, homological algebra
  B4: persistent homology, computational homology
  M1: applied topology for sensor networks
      "Relative interleavings and applications to sensor networks", JJIAM, 33(1):99–120, 2016.
  M2: statistics, machine learning, kernel methods
      "Persistence weighted Gaussian kernel for topological data analysis", ICML, pp. 2004–2013, 2016.
  D1 (now): time series analysis, dynamics, information geometry, …
• Announcement: Joint Mathematics Meetings, January 4, 2017, Atlanta
  ★ Statistical Methods in Computational Topology and Applications
  ★ Sheaves in Topological Data Analysis
Motivation of this work
Topological Data Analysis (TDA): mathematical methods for characterizing the "shape of data".
[Figure: atomic configurations of the liquid, glass, and solid states of silica (SiO2, composed of silicon and oxygen)]
At the configuration level, it is difficult to distinguish the liquid and glass states.
Motivation of this work
Persistent homology / persistence diagram: a topological descriptor of data.
[Figure: atomic configurations of liquid, glass, and solid silica and their persistence diagrams (birth/death axes in Å², color = multiplicity)]
Y. Hiraoka et al., Hierarchical structures of amorphous solids characterized by persistent homology, PNAS, 113(26):7035–7040, 2016.
Motivation of this work
Classification problem: liquid vs. glass.
[Figure: persistence diagrams of the liquid and glass states (axes in Å², color = multiplicity)]
Q. Can we distinguish them mathematically?
A. Build a statistical framework for persistence diagrams.
Section 1: What is a persistence diagram?
Persistence diagram
For a finite point set X ⊂ R², consider the union of balls X_r = ∪_{x_i ∈ X} B(x_i; r), where B(x; r) = { z | d(x, z) ≤ r }.
As r grows, a ring (1-cycle) β appears at some radius b_β (its birth) and is filled in at some radius d_β (its death).
Persistence diagram
Recording each 1-cycle α by its birth-death pair gives the 1st persistence diagram
D_1(X) = { x_α, x_β, x_γ, … }, where x_α = (b_α, d_α).
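As a computational aside (not part of the original slides' toolchain), here is a minimal sketch of computing D_1(X) for a noisy circle sample. It assumes the third-party ripser package and uses a Vietoris-Rips filtration, a standard computational stand-in for the union-of-balls filtration above.

```python
# Minimal sketch: D_1(X) of a noisy circle, assuming the `ripser` package
# (pip install ripser). Rips filtration, not the Cech/union-of-balls one.
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(100, 2))

dgms = ripser(X, maxdim=1)["dgms"]  # dgms[q] = array of (b, d) pairs in degree q
print(dgms[1])  # D_1(X): one point far from the diagonal (the circle), plus noise
```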
Definition of persistence diagram
Definition. For a filtration X: X_1 ⊂ X_2 ⊂ ⋯ ⊂ X_n, we compute the homology groups with a field coefficient K and obtain a sequence
H_q(X): H_q(X_1) → H_q(X_2) → ⋯ → H_q(X_n).
This sequence is called a persistent homology.
In this talk, the filtration is given by the unions of balls X_r = ∪_{x_i ∈ X} B(x_i; r).
A persistent homology can be seen as a representation of the A_n-quiver.
Definition of persistence diagram
By the Gabriel and Krull–Remak–Schmidt theorems, there is a decomposition
H_q(X) ≅ ⊕_{i ∈ I} I[b_i, d_i]   (I is a finite set),
where I[b, d] is the interval representation
0 → ⋯ → 0 → K → K → ⋯ → K → 0 → ⋯ → 0,
with the copies of K in positions b through d (identity maps between them).
From this decomposition, the persistence diagram is defined by
D_q(X) = { (b_i, d_i) | i ∈ I }.
Remark: ⊕_r H_q(X_r) can be seen as a module over K[t], and the decomposition also follows from the structure theorem for modules over a PID.
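In degree 0 the decomposition can be made completely explicit: every connected component is born at r = 0 and dies when its growing ball merges with an older component, which happens at half the corresponding minimum-spanning-tree edge length. A from-scratch sketch (pure numpy; an illustration, not the authors' software):

```python
# Minimal sketch of the degree-0 interval decomposition for the
# union-of-balls filtration: D_0(X) = {(0, w/2) : w an MST edge weight}
# plus one essential class (0, inf). Kruskal + union-find.
import numpy as np

def diagram_0(X):
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    deaths = []
    for w, i, j in edges:        # process edges by increasing length
        ri, rj = find(i), find(j)
        if ri != rj:             # two components merge: one interval dies
            parent[ri] = rj
            deaths.append(w / 2)  # balls of radius r touch when 2r = w
    return [(0.0, d) for d in deaths] + [(0.0, np.inf)]

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
print(diagram_0(X))  # [(0.0, 0.5), (0.0, 1.5), (0.0, inf)]
```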
Persistence
Definition. For a point x_α = (b_α, d_α) ∈ D_1(X), the lifetime
pers(x_α) = d_α − b_α
is called its persistence. Equivalently, the ℓ^∞-distance from x_α to the diagonal Δ = { (a, a) | a ∈ R } equals pers(x_α)/2.
A cycle with small persistence can be seen as a small, and often noisy, cycle.
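Since low-persistence points are often noise, a common first step is to threshold the diagram. A minimal sketch (the threshold value is an assumption, chosen per application):

```python
# Minimal sketch: persistence as a noise filter. Diagram points with
# lifetime below an (assumed, user-chosen) threshold are discarded.
import numpy as np

def denoise(diagram, threshold):
    diagram = np.asarray(diagram)          # rows are (b, d) pairs
    pers = diagram[:, 1] - diagram[:, 0]   # pers(x) = d - b
    return diagram[pers >= threshold]

D1 = np.array([[0.1, 0.15], [0.2, 1.3], [0.05, 0.1]])
print(denoise(D1, 0.5))  # keeps only the prominent cycle (0.2, 1.3)
```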
Metric structure of persistence diagrams
The set of persistence diagrams is defined by
D = { D | D is a multiset in R²_ul and |D| < ∞ },
where R²_ul = { (b, d) | b ≤ d, b, d ∈ R }.
The bottleneck (∞-Wasserstein) metric
d_B(D, E) = inf_γ sup_{x ∈ D ∪ Δ} ‖x − γ(x)‖_∞   (γ: D ∪ Δ → E ∪ Δ a bijection),
where Δ = { (a, a) | a ∈ R } is the diagonal set, is a distance on the set of persistence diagrams.
Remark: (D, d_B) is a metric space.
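For intuition, the bottleneck distance between two very small diagrams can be brute-forced directly from the definition; real implementations use binary search plus bipartite matching instead. A minimal sketch:

```python
# Minimal sketch of d_B for *small* diagrams, brute force over all bijections
# (exponential; illustration only). Each diagram is augmented with the diagonal
# projections of the other's points; diagonal-to-diagonal pairs cost 0.
import numpy as np
from itertools import permutations

def proj(p):  # nearest diagonal point in the inf-norm
    m = (p[0] + p[1]) / 2
    return (m, m)

def bottleneck(D, E):
    Dp = [(p, False) for p in D] + [(proj(q), True) for q in E]
    Ep = [(q, False) for q in E] + [(proj(p), True) for p in D]
    best = np.inf
    for perm in permutations(range(len(Ep))):
        cost = 0.0
        for i, j in enumerate(perm):
            (x, xdiag), (y, ydiag) = Dp[i], Ep[j]
            if not (xdiag and ydiag):  # diagonal-to-diagonal pairs are free
                cost = max(cost, max(abs(x[0] - y[0]), abs(x[1] - y[1])))
        best = min(best, cost)
    return best

D = [(0.0, 2.0)]
E = [(0.0, 2.2), (1.0, 1.1)]
print(bottleneck(D, E))  # 0.2: match (0,2)-(0,2.2); (1,1.1) goes to the diagonal
```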
Stability theorem
Theorem [Cohen-Steiner et al., 2007]. For finite subsets X, Y ⊂ R^d,
d_B(D_q(X), D_q(Y)) ≤ d_H(X, Y),
where d_H(X, Y) = max{ max_{p ∈ X} min_{q ∈ Y} d(p, q), max_{q ∈ Y} min_{p ∈ X} d(p, q) } is the Hausdorff distance.
Significant property: the map X ↦ D_q(X) is Lipschitz continuous.
(The Betti number X ↦ β_q(X) = dim H_q(X) is not continuous.)
[Figure: two nearby point sets X, Y and their overlaid persistence diagrams (birth/death axes)]
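The Hausdorff distance on the right-hand side of the stability theorem is straightforward to compute for finite point sets; a minimal numpy sketch:

```python
# Minimal sketch of the Hausdorff distance between finite point sets,
# matching the formula in the stability theorem above.
import numpy as np

def hausdorff(X, Y):
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # pairwise d(p, q)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 0.1], [1.0, 0.0], [5.0, 0.0]])
print(hausdorff(X, Y))  # 4.0: the point (5, 0) is far from X
```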
Statistical Topological Data Analysis
Pipeline: data X → persistence diagram D_q(X) → vector → prediction, classification, testing, estimation.
Facts:
(1) A persistence diagram is not a vector.
(2) Standard statistical methods are designed for vectors (multivariate analysis).
This work: make a vector representation of persistence diagrams by the kernel method.
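As a rough illustration of what "vector representation by the kernel method" can mean, here is a minimal sketch in the spirit of the persistence weighted Gaussian kernel from the ICML 2016 paper cited on the title slide. The weight w(x) = arctan(C · pers(x)^p) follows that paper; the closed-form inner product and all parameter values below are illustrative assumptions, not the talk's exact construction.

```python
# Minimal sketch: embed a diagram D as mu_D = sum_{x in D} w(x) k_G(., x)
# with a Gaussian kernel k_G, so <mu_D, mu_E> has a closed form.
# Weight w(x) = arctan(C * pers(x)^p); sigma, C, p are assumed values.
import numpy as np

def pwgk_inner(D, E, sigma=1.0, C=1.0, p=1):
    D, E = np.asarray(D), np.asarray(E)
    wD = np.arctan(C * (D[:, 1] - D[:, 0]) ** p)  # down-weights noisy points
    wE = np.arctan(C * (E[:, 1] - E[:, 0]) ** p)
    sq = ((D[:, None, :] - E[None, :, :]) ** 2).sum(axis=2)
    return float(wD @ np.exp(-sq / (2 * sigma**2)) @ wE)

D = [(0.2, 1.3), (0.1, 0.15)]
E = [(0.25, 1.2)]
print(pwgk_inner(D, E))  # inner product of the two embedded diagrams
```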
Section 2: Kernel method (statistical methods for non-vector data)
Statistics for non-vector data
Let Ω be a data space and x_1, …, x_n ∈ Ω be the observed data.
To study statistical properties of the data, we often need to compute summaries such as the mean/average:
x_1, …, x_n ↦ (1/n) Σ_{i=1}^n x_i.
To compute such summaries, Ω needs addition, multiplication by scalars, and an inner product; that is, Ω should be an inner product space.
The space of persistence diagrams is not an inner product space.
Statistics for non-vector data
While Ω does not always carry an inner product, by defining a map φ: Ω → H, where H is an inner product space, we can compute statistical summaries in H:
x_1, …, x_n ↦ φ(x_1), …, φ(x_n) ↦ (1/n) Σ_{i=1}^n φ(x_i) ∈ H   (well-defined).
Fact: many statistical summaries and machine learning techniques can be computed from the values of inner products ⟨φ(x_i), φ(x_j)⟩_H alone, as the sketch below illustrates.
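To make the "inner products suffice" fact concrete: the squared distance between two sample means in H expands into averages of kernel values only, so φ is never built explicitly. A minimal sketch with a Gaussian kernel (the kernel choice and parameters are assumptions):

```python
# Minimal sketch: ||m_X - m_Y||^2 in H equals
# mean k(x,x') - 2 mean k(x,y) + mean k(y,y'), using only k-values.
import numpy as np

def gram(X, Y, sigma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2 * sigma**2))

def mean_distance_sq(X, Y, sigma=1.0):
    return (gram(X, X, sigma).mean()
            - 2 * gram(X, Y, sigma).mean()
            + gram(Y, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(50, 2))
Y = rng.normal(2.0, 1.0, size=(50, 2))
print(mean_distance_sq(X, Y))  # large value: the two samples differ
```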
Kernel method
In the kernel method, a positive definite kernel k: Ω × Ω → R is used as a "non-linear" inner product on the data space:
k(x, y) = ⟨φ(x), φ(y)⟩_H.
For an element x ∈ Ω, k(·, x): Ω → R is a function, i.e., a vector in the function space C(Ω), and the feature map is φ(x_i) = k(·, x_i).
In many cases, what we need is just the Gram matrix (k(x_i, x_j))_{i,j=1,…,n}: statistics and machine learning algorithms run on (k(x_i, x_j)) without ever constructing φ explicitly (the kernel trick); see the sketch below.
[Diagram: data x_i ∈ Ω ↦ φ(x_i) = k(·, x_i) ∈ H; the Gram matrix (k(x_i, x_j)) feeds statistics and machine learning]
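As referenced above, a sketch of the kernel trick in practice: the learning algorithm consumes only the Gram matrix. This assumes scikit-learn's support-vector classifier with its precomputed-kernel option; the data and parameters are illustrative.

```python
# Minimal sketch: a classifier that sees only the Gram matrix (k(x_i, x_j)).
# Assumes scikit-learn's SVC with kernel="precomputed".
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.r_[rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))]
y = np.r_[np.zeros(30), np.ones(30)]

def gram(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2 * sigma**2))

clf = SVC(kernel="precomputed").fit(gram(X, X), y)  # trained on the Gram matrix
X_new = np.array([[0.1, -0.2], [3.2, 2.8]])
print(clf.predict(gram(X_new, X)))  # expected: [0., 1.]
```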