Topological Autoencoders 13. November, 2019 Machine Learning and - - PowerPoint PPT Presentation



SLIDE 1

Topological Autoencoders

Michael Moor†, Max Horn†, Bastian Rieck‡ and Karsten Borgwardt‡

  • 13 November 2019

Machine Learning and Computational Biology Group, ETH Zurich

SLIDE 2

Motivation

Representation of our data, but in 100-dimensional space

SLIDE 3

Motivation - Dimensionality reduction

PCA · t-SNE · UMAP · Autoencoder

  • Issues: most methods preserve connectivity at local scales
  • Goal: we want to preserve connectivity at multiple scales

SLIDE 4

Topology - The study of connectivity

Betti numbers characterize topological spaces

  • β0 connected components
  • β1 cycles
  • β2 voids

Issues

  • Great for manifolds (which are usually unknown)
  • In practice, the manifold is only approximated via samples
  • Topology computed on samples is noisy

SLIDE 5

Persistent Homology - Topology at multiple scales

Vietoris–Rips complex: compute the neighbourhood graph (a simplicial complex in higher dimensions) for every distance threshold ε and keep track of the appearance and disappearance of topological features.

  • Edge set: E := {(u, v) | dist(p_u, p_v) ≤ ε}
  • Filtration: ∅ = K_0 ⊆ K_1 ⊆ · · · ⊆ K_{n−1} ⊆ K_n = K

[Figure: point cloud at scales ε_1 and ε_2; the persistence diagram records a feature born at ε_1 and dying at ε_2 as the point (ε_1, ε_2).]
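To make the filtration concrete, here is a small self-contained sketch (illustrative code, not the authors' implementation) that computes the 0-dimensional persistence pairs of a Vietoris–Rips filtration: edges are processed in order of increasing length, and a union–find pass records the threshold ε at which each connected component dies.

```python
import itertools
import math

def rips_persistence_0d(points):
    """0-dim persistence pairs (birth, death) of a Vietoris-Rips filtration.

    Every component is born at eps = 0; a component dies when an edge of
    the filtration merges it into another one (single-linkage / Kruskal).
    """
    n = len(points)
    # All pairwise edges with their Euclidean length (the filtration weights).
    edges = sorted(
        (math.dist(points[u], points[v]), u, v)
        for u, v in itertools.combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    pairs = []
    for eps, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                  # this edge merges two components
            parent[ru] = rv
            pairs.append((0.0, eps))  # one component dies at threshold eps
    return pairs                      # n-1 finite pairs; one class never dies
```

For four points on a line at 0, 1, 2 and 10, the deaths occur at ε = 1, 1 and 8, matching the intuition that the outlier at 10 persists as its own component the longest.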


SLIDE 8

Method - Overview

[Diagram: the input data X is encoded into the latent code Z and decoded into the reconstruction X̃. The reconstruction loss compares X with X̃; the topological loss compares the persistent homology (computed across scales ε) of X with that of Z.]

SLIDE 9

Distance matrix and relation to persistence diagrams

While the persistence computation is an inherently discrete process, we can nevertheless compute gradients due to one key observation.

Distance matrix (index: π):

         ⎛ 0   1   2  10 ⎞
    A =  ⎜ 1   0   8   2 ⎟
         ⎜ 2   8   0   3 ⎟
         ⎝10   2   3   0 ⎠

[Figure: persistence diagram with thresholds ε_1, ε_2.]

Insight: We can map all persistent homology computations of flag complexes to individual edges of the distance matrix!
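To illustrate the insight, here is a sketch restricted to 0-dimensional features (not the authors' code): for a flag complex, the persistence pairing π consists of edges, so the paired persistence values are literally entries A[i, j] of the distance matrix, and gradients can flow through exactly those entries.

```python
import numpy as np

def persistence_pairing_0d(A):
    """Edges of the distance matrix that destroy 0-dim classes (MST edges).

    For a flag (Vietoris-Rips) complex, every finite 0-dim persistence
    feature is destroyed by a single edge, i.e. one entry A[i, j].
    """
    n = A.shape[0]
    order = sorted(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda e: A[e[0], e[1]],
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairing = []
    for i, j in order:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairing.append((i, j))   # this edge is where a class dies
    return pairing

# The distance matrix from the slide:
A = np.array([[0, 1, 2, 10],
              [1, 0, 8, 2],
              [2, 8, 0, 3],
              [10, 2, 3, 0]], float)
pi = persistence_pairing_0d(A)
paired_distances = A[tuple(zip(*pi))]   # the entries gradients flow through
```

Only these selected entries of A enter the loss below; all other distances receive zero gradient.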

SLIDE 10

Topological loss term

L_t = L_{X→Z} + L_{Z→X}

  • L_{X→Z} := (1/2) ‖A^X[π^X] − A^Z[π^X]‖²
  • L_{Z→X} := (1/2) ‖A^Z[π^Z] − A^X[π^Z]‖²
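A minimal NumPy sketch of this loss (illustrative; it assumes the pairings π^X and π^Z are already given as lists of edge index pairs, whereas the actual method obtains them from persistent homology):

```python
import numpy as np

def topo_loss(AX, AZ, pi_X, pi_Z):
    """L_t = L_{X->Z} + L_{Z->X} for distance matrices AX (input space),
    AZ (latent space) and persistence pairings pi_X, pi_Z given as
    lists of (i, j) edge indices."""
    ix = tuple(zip(*pi_X))            # entries selected by pi^X
    iz = tuple(zip(*pi_Z))            # entries selected by pi^Z
    l_xz = 0.5 * np.sum((AX[ix] - AZ[ix]) ** 2)
    l_zx = 0.5 * np.sum((AZ[iz] - AX[iz]) ** 2)
    return l_xz + l_zx
```

Both terms are plain squared differences of selected distance-matrix entries, which is what makes the loss differentiable once the pairings are fixed.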

SLIDE 11

Experiments

SLIDE 12

Datasets

[Figure: samples from the Spheres, MNIST and Fashion-MNIST datasets.]

SLIDE 13

Spheres

[Figure: 2-D embeddings of Spheres produced by PCA, t-SNE, Autoencoder, UMAP and Topo-AE.]

SLIDE 14

MNIST

[Figure: 2-D embeddings of MNIST produced by PCA, t-SNE, Autoencoder, UMAP and Topo-AE.]

SLIDE 15

Fashion-MNIST

[Figure: 2-D embeddings of Fashion-MNIST produced by PCA, t-SNE, Autoencoder, UMAP and Topo-AE.]

SLIDE 16

Insights and Summary

  • Novel method for preserving topological information of the input space in dimensionality reduction
  • Under weak theoretical assumptions, our loss term is differentiable, allowing the training of MLPs via backpropagation
  • Our method was uniquely able to capture spatial relationships of nested high-dimensional spheres

SLIDE 17

For further information and more theory, check out our paper on arXiv!

https://arxiv.org/abs/1906.00722

SLIDE 18

Appendix

SLIDE 19

Bound of bottleneck distance between persistence diagrams on subsampled data

Theorem
Let X be a point cloud of cardinality n and X^(m) ⊆ X a subsample of cardinality m, sampled without replacement. The probability that the bottleneck distance between the persistence diagrams of X and X^(m) exceeds a threshold is bounded as

    P( d_b(D^X, D^{X^(m)}) > ε ) ≤ P( d_H(X, X^(m)) > 2ε ),

where d_H refers to the Hausdorff distance between the point cloud and its subsample.

SLIDE 20

Expected value of the Hausdorff distance

Theorem
Let A ∈ ℝ^{n×m} be the distance matrix between the samples of X and X^(m), where the rows are sorted such that the first m rows correspond to the columns of the m subsampled points, with diagonal elements a_ii = 0. Assume that the entries a_ij with i > m are random samples following a distance distribution F_D with supp(f_D) ⊆ ℝ_{≥0}. The minimal distances δ_i of the rows with i > m then follow a distribution F_Δ. Letting Z := max_{1≤i≤n} δ_i with corresponding distribution F_Z, the expected Hausdorff distance between X and X^(m) for m < n is bounded by

    E[ d_H(X, X^(m)) ] = E_{Z∼F_Z}[Z] ≤ ∫_0^{+∞} (1 − F_D(z)^{n−1}) dz ≤ ∫_0^{+∞} (1 − F_D(z)^{m(n−m)}) dz.
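For intuition, a small illustrative sketch (not from the paper): since X^(m) ⊆ X, the Hausdorff distance reduces to the largest distance from any point of X to its nearest subsampled point, which can be computed directly.

```python
import numpy as np

def hausdorff_to_subsample(X, idx):
    """d_H(X, X^(m)) for a subsample X^(m) = X[idx] of the point cloud X.

    Because X^(m) is a subset of X, only the direction
    max_{x in X} min_{y in X^(m)} dist(x, y) can be non-zero.
    """
    Xm = X[idx]
    # Pairwise distances between all n points and the m subsampled ones.
    D = np.linalg.norm(X[:, None, :] - Xm[None, :, :], axis=-1)
    return D.min(axis=1).max()
```

The theorem above bounds the expectation of exactly this quantity under random subsampling, which in turn controls how much the persistence diagram can change.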

SLIDE 21

Explicit Gradient Derivation

Letting θ refer to the parameters of the encoder, we have

    ∂/∂θ L_{X→Z} = ∂/∂θ (1/2) ‖A^X[π^X] − A^Z[π^X]‖²
                 = −(A^X[π^X] − A^Z[π^X])^⊤ · (∂A^Z[π^X] / ∂θ)
                 = −(A^X[π^X] − A^Z[π^X])^⊤ · [ ∂A^Z[π^X]_i / ∂θ ]_{i=1}^{|π^X|},

where |π^X| denotes the cardinality of a persistence pairing and A^Z[π^X]_i refers to the ith entry of the vector of paired distances.
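For intuition, a sketch (not the authors' code) of the same chain rule taken with respect to the latent coordinates Z rather than θ: the pairing is held fixed, so only the paired distance entries propagate gradient. A finite-difference check confirms the analytic expression.

```python
import numpy as np

def loss_and_grad_Z(AX, Z, pairing):
    """L_{X->Z} and its gradient w.r.t. the latent coordinates Z.

    The pairing is treated as fixed (it is locally constant), so only
    the selected entries of the latent distance matrix carry gradient.
    """
    diffs = Z[:, None, :] - Z[None, :, :]
    AZ = np.linalg.norm(diffs, axis=-1)
    idx = tuple(zip(*pairing))
    resid = AX[idx] - AZ[idx]              # A^X[pi] - A^Z[pi]
    loss = 0.5 * np.sum(resid ** 2)

    grad = np.zeros_like(Z)
    for (i, j), r in zip(pairing, resid):
        # d||z_i - z_j|| / dz_i = (z_i - z_j) / ||z_i - z_j||
        unit = diffs[i, j] / AZ[i, j]
        grad[i] += -r * unit               # chain rule through A^Z[i, j]
        grad[j] += r * unit
    return loss, grad
```

In the paper the same derivative is further chained through the encoder to reach θ; an automatic-differentiation framework handles that step once the paired entries are exposed as above.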

SLIDE 22

Density distribution error

Definition (Density distribution error)
Let σ ∈ ℝ_{>0}. For a finite metric space S with an associated distance dist(·, ·), we evaluate the density at each point x ∈ S as

    f^σ_S(x) := Σ_{y∈S} exp(−σ^{−1} dist(x, y)²),

where we assume without loss of generality that max dist(x, y) = 1. We then calculate f^σ_X(·) and f^σ_Z(·), normalise them such that they sum to 1, and evaluate

    KL_σ := KL( f^σ_X ‖ f^σ_Z ),    (1)

i.e. the Kullback–Leibler divergence between the two density estimates.
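The definition above translates into a few lines of NumPy (an illustrative sketch; the function name and the convention of passing precomputed distance matrices are ours):

```python
import numpy as np

def density_kl(DX, DZ, sigma):
    """KL_sigma between kernel density estimates of the input and latent
    spaces, given their pairwise distance matrices DX and DZ."""
    def density(D):
        D = D / D.max()                    # enforce max dist(x, y) = 1
        f = np.exp(-(D ** 2) / sigma).sum(axis=1)
        return f / f.sum()                 # normalise to sum to 1
    p, q = density(DX), density(DZ)
    return float(np.sum(p * np.log(p / q)))
```

A perfect embedding (identical normalised density profiles) yields KL_σ = 0; any mismatch in the density estimates makes it strictly positive.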

SLIDE 23

Quantification of performance

Data set   Method   KL_0.01   KL_0.1   KL_1      ℓ-MRRE   ℓ-Cont   ℓ-Trust   ℓ-RMSE   Data MSE
Spheres    Isomap   0.181     0.420    0.00881   0.246    0.790    0.676     10.4     –
           PCA      0.332     0.651    0.01530   0.294    0.747    0.626     11.8     0.9610
           TSNE     0.152     0.527    0.01271   0.217    0.773    0.679      8.1     –
           UMAP     0.157     0.613    0.01658   0.250    0.752    0.635      9.3     –
           AE       0.566     0.746    0.01664   0.349    0.607    0.588     13.3     0.8155
           TopoAE   0.085     0.326    0.00694   0.272    0.822    0.658     13.5     0.8681
F-MNIST    PCA      0.356     0.052    0.00069   0.057    0.968    0.917      9.1     0.1844
           TSNE     0.405     0.071    0.00198   0.020    0.967    0.974     41.3     –
           UMAP     0.424     0.065    0.00163   0.029    0.981    0.959     13.7     –
           AE       0.478     0.068    0.00125   0.026    0.968    0.974     20.7     0.1020
           TopoAE   0.392     0.054    0.00100   0.032    0.980    0.956     20.5     0.1207
MNIST      PCA      0.389     0.163    0.00160   0.166    0.901    0.745     13.2     0.2227
           TSNE     0.277     0.133    0.00214   0.040    0.921    0.946     22.9     –
           UMAP     0.321     0.146    0.00234   0.051    0.940    0.938     14.6     –
           AE       0.620     0.155    0.00156   0.058    0.913    0.937     18.2     0.1373
           TopoAE   0.341     0.110    0.00114   0.056    0.932    0.928     19.6     0.1388

SLIDE 24

Quantification of performance - 2

Data set   Method   KL_0.01   KL_0.1   KL_1      ℓ-MRRE   ℓ-Cont   ℓ-Trust   ℓ-RMSE   Data MSE
CIFAR      PCA      0.591     0.020    0.00023   0.119    0.931    0.821     17.7     0.1482
           TSNE     0.627     0.030    0.00073   0.103    0.903    0.863     25.6     –
           UMAP     0.617     0.026    0.00050   0.127    0.920    0.817     33.6     –
           AE       0.668     0.035    0.00062   0.132    0.851    0.864     36.3     0.1403
           TopoAE   0.556     0.019    0.00031   0.108    0.927    0.845     37.9     0.1398