Connectivity-Optimized Representation Learning via Persistent Homology (ICML 2019)


  1. ICML 2019, Long Beach
     Connectivity-Optimized Representation Learning via Persistent Homology
     Christoph D. Hofer, Roland Kwitt, Mandar Dixit, Marc Niethammer
     University of Salzburg · UNC Chapel Hill · Microsoft

  2. Unsupervised representation learning
     Q: What makes a good representation?
     ◮ Ability to reconstruct (→ prevalence of autoencoders)
     ◮ Robustness to perturbations of the input
     ◮ Usefulness for downstream tasks (e.g., clustering or classification)
     ◮ etc.

  3. Unsupervised representation learning
     Q: What makes a good representation?
     ◮ Ability to reconstruct (→ prevalence of autoencoders)
     ◮ Robustness to perturbations of the input
     ◮ Usefulness for downstream tasks (e.g., clustering or classification)
     ◮ etc.
     Common idea: control (or enforce) properties of/on the latent representations in Z.
     [Diagram: autoencoder with encoder f_θ : X → Z, decoder g_φ : Z → X, and reconstruction loss Rec[x, x̂] over the latent space Z]

  4. Unsupervised representation learning
     Q: What makes a good representation?
     ◮ Ability to reconstruct (→ prevalence of autoencoders)
       (Contractive AEs [Rifai et al., ICML '11])
     ◮ Robustness to perturbations of the input
     ◮ Usefulness for downstream tasks (e.g., clustering or classification)
     ◮ etc.
     Common idea: control (or enforce) properties of/on the latent representations in Z.
     [Diagram: autoencoder as before, trained with Rec[x, x̂] + Reg]

  5. Unsupervised representation learning
     Q: What makes a good representation?
     ◮ Ability to reconstruct (→ prevalence of autoencoders)
     ◮ Robustness to perturbations of the input
       (Denoising AEs [Vincent et al., JMLR '10])
     ◮ Usefulness for downstream tasks (e.g., clustering or classification)
     ◮ etc.
     Common idea: control (or enforce) properties of/on the latent representations in Z.
     [Diagram: autoencoder as before; the input is perturbed or partially zeroed out before encoding]

  6. Unsupervised representation learning
     Q: What makes a good representation?
     ◮ Ability to reconstruct (→ prevalence of autoencoders)
     ◮ Robustness to perturbations of the input
       (Sparse AEs [Makhzani & Frey, ICLR '14])
     ◮ Usefulness for downstream tasks (e.g., clustering or classification)
     ◮ etc.
     Common idea: control (or enforce) properties of/on the latent representations in Z.
     [Diagram: autoencoder as before, trained with Rec[x, x̂] + Reg]

  7. Unsupervised representation learning
     Q: What makes a good representation?
     ◮ Ability to reconstruct (→ prevalence of autoencoders)
     ◮ Robustness to perturbations of the input
     ◮ Usefulness for downstream tasks (e.g., clustering or classification)
       (Adversarial AEs [Makhzani et al., ICLR '16])
     ◮ etc. (far from exhaustive)
     Common idea: control (or enforce) properties of/on the latent representations in Z.
     [Diagram: autoencoder as before; distributional properties of the latent codes are enforced through adversarial training]
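The following is a minimal, generic PyTorch sketch of the autoencoder setup drawn in these slides; it is not the authors' architecture, and the module names (AE, f_theta, g_phi) and layer sizes are purely illustrative.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """Generic autoencoder: encoder f_theta : X -> Z, decoder g_phi : Z -> X."""
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        # Illustrative layer sizes; the paper's actual architecture may differ.
        self.f_theta = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.g_phi = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                   nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.f_theta(x)      # latent representation in Z
        x_hat = self.g_phi(z)    # reconstruction
        return x_hat, z

# Training minimizes Rec[x, x_hat]; the AE variants above either add a
# regularizer on z (contractive / sparse penalties), perturb x before encoding
# (denoising AEs), or shape the distribution of z adversarially (adversarial AEs).
```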

  8. Motivating (toy) example
     We aim to control properties of the latent space, but from a topological point of view!

  9. Motivating (toy) example
     We aim to control properties of the latent space, but from a topological point of view!
     Assume we want to do Kernel Density Estimation (KDE) in the latent space Z.
     [Figure: data (z_i) and the corresponding Gaussian KDE; bandwidth selected via Scott's rule [Scott, 1992]]

  10. Motivating (toy) example
     We aim to control properties of the latent space, but from a topological point of view!
     Assume we want to do Kernel Density Estimation (KDE) in the latent space Z.
     [Figure: two data sets (z_i) with their Gaussian KDEs; bandwidths selected via Scott's rule [Scott, 1992]]
     Bandwidth selection can be challenging, as the scaling greatly differs!
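To make the KDE part of the toy example concrete, here is a small illustration (not from the slides) using SciPy's gaussian_kde, whose default bandwidth rule is Scott's rule; the effective bandwidth scales with the spread of the data, which is what makes bandwidth selection awkward when latent point clouds live at very different scales.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Two toy latent point clouds z_i at very different scales.
z_tight = rng.normal(scale=0.1, size=(200, 2))
z_spread = rng.normal(scale=10.0, size=(200, 2))

for name, z in [("tight", z_tight), ("spread", z_spread)]:
    kde = gaussian_kde(z.T)                   # Scott's rule is the default bandwidth rule
    bw = np.sqrt(np.diag(kde.covariance))     # effective per-dimension bandwidth
    print(name, "Scott factor:", round(kde.factor, 3), "effective bandwidth:", bw)
```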

  11. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     [Figure: point cloud in the latent space Z]

  12. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     Vietoris-Rips persistent homology (PH)
     [Figure: balls of radius r = r_1 grown around the latent points in Z]

  13. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     Vietoris-Rips persistent homology (PH)
     [Figure: balls of radius r = r_2 around the latent points in Z]

  14. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     Vietoris-Rips persistent homology (PH)
     [Figure: balls of radius r = r_3 around the latent points in Z]
     ◮ PH tracks topological changes as the ball radius r increases
     ◮ Connectivity information is captured by 0-dimensional persistent homology

  15. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     Vietoris-Rips persistent homology (PH)
     ◮ PH tracks topological changes as the ball radius r increases
     ◮ Connectivity information is captured by 0-dimensional persistent homology
     What if x ↦ f_θ(x) yielded a homogeneous arrangement of the latent points (illustrated with balls of radius η/2)? This would be beneficial for KDE.
     [Figure: homogeneously arranged latent points in Z with balls of radius η/2]
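For intuition (this is not the authors' implementation): for a finite point cloud, 0-dimensional Vietoris-Rips persistence records the scales at which connected components merge, and these merge scales are exactly the edge lengths of a minimum spanning tree of the pairwise-distance graph (up to the radius-vs-diameter convention, i.e., whether balls of radius r connect at distance r or 2r). A minimal sketch with SciPy:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def zero_dim_vr_deaths(points: np.ndarray) -> np.ndarray:
    """Death times of 0-dim Vietoris-Rips persistence of a point cloud.

    Every point is born at 0; two components merge when an edge of the
    corresponding length enters the filtration, so the finite death times are
    the edge lengths of a minimum spanning tree (the single surviving
    component has an infinite bar and is omitted).
    """
    dist = squareform(pdist(points))     # dense pairwise distance matrix
    mst = minimum_spanning_tree(dist)    # sparse matrix holding the B-1 MST edges
    return np.sort(mst.data)

# Example: merge scales of a random batch of 16 latent codes in R^2
print(zero_dim_vr_deaths(np.random.rand(16, 2)))
```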

  16. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     [Diagram: autoencoder with encoder f_θ : X → R^n, decoder g_φ : R^n → X, and reconstruction loss Rec[·, ·]]

  17. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     Consider batches (x_1, ..., x_B).
     [Diagram: the batch is encoded, PH is computed on the latent batch, and a connectivity loss is added to Rec[·, ·]]

  18. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     Consider batches (x_1, ..., x_B).
     [Diagram: as before, with the connectivity loss term L_η added to Rec[·, ·]]
     L_η penalizes deviation from a homogeneous arrangement (with scale η).

  19. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     Consider batches (x_1, ..., x_B).
     [Diagram: as before; the gradient signal of L_η flows back through the PH computation to the encoder]
     L_η penalizes deviation from a homogeneous arrangement (with scale η).

  20. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     Consider batches (x_1, ..., x_B).
     Problem: until now, we could not backpropagate through PH.
     [Diagram: as before; the gradient signal of L_η flows back through the PH computation to the encoder]
     L_η penalizes deviation from a homogeneous arrangement (with scale η).
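The slides only describe L_η as penalizing deviation from a homogeneous arrangement at scale η, so the following PyTorch sketch is one plausible instantiation, not necessarily the paper's exact loss: take the 0-dim death times of the encoded batch (the MST edge lengths of the latent pairwise distances, as above) and penalize their absolute deviation from η. It also illustrates why backpropagation is possible here: each death time equals the distance between one concrete pair of latent points, so the pair selection can be done without gradients and the gradient then flows through the differentiable distances.

```python
import torch
from scipy.sparse.csgraph import minimum_spanning_tree

def connectivity_loss(z: torch.Tensor, eta: float) -> torch.Tensor:
    """Plausible sketch of a connectivity loss L_eta on a latent batch z of shape (B, n)."""
    dist = torch.cdist(z, z)                                  # differentiable pairwise distances
    # Select the MST edges (= 0-dim death times) without tracking gradients.
    mst = minimum_spanning_tree(dist.detach().cpu().numpy())
    rows, cols = mst.nonzero()                                # index pairs of the B-1 MST edges
    idx_r = torch.as_tensor(rows, device=z.device)
    idx_c = torch.as_tensor(cols, device=z.device)
    deaths = dist[idx_r, idx_c]                               # differentiable death times
    return (deaths - eta).abs().mean()                        # deviation from scale eta

# Usage sketch: total loss = Rec[x, x_hat] + lam * connectivity_loss(f_theta(x), eta)
```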

  21. Connectivity loss
     From a theoretical perspective, we show ...
     [Diagram: Enc → latent codes → PH → connectivity loss, added to the reconstruction objective of Enc/Dec]
     (1) ... that, under mild conditions, the connectivity loss is differentiable

  22. Connectivity loss
     From a theoretical perspective, we show ...
     [Diagram: a batch x_1, ..., x_B is encoded; PH of the latent batch feeds the connectivity loss]
     (1) ... that, under mild conditions, the connectivity loss is differentiable
     (2) ... metric-entropy-based guidelines for choosing the training batch size B

  23. Connectivity loss
     From a theoretical perspective, we show ...
     [Diagram: a sample x_1, ..., x_N with N ≫ B is encoded; PH of the latent codes feeds the connectivity loss]
     (1) ... that, under mild conditions, the connectivity loss is differentiable
     (2) ... metric-entropy-based guidelines for choosing the training batch size B
     (3) ... that "densification" effects occur for sample sizes N larger than the training batch size B

  24. Connectivity loss
     From a theoretical perspective, we show ...
     [Diagram: a sample x_1, ..., x_N with N ≫ B is encoded; PH of the latent codes feeds the connectivity loss]
     (1) ... that, under mild conditions, the connectivity loss is differentiable
     (2) ... metric-entropy-based guidelines for choosing the training batch size B
     (3) ... that "densification" effects occur for sample sizes N larger than the training batch size B
     Intuitively, during training ...
     ... the reconstruction loss controls what is worth capturing
     ... the connectivity loss controls how to topologically organize the latent space
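Putting the pieces together, a hypothetical training step might look as follows, reusing the connectivity_loss sketch above; encoder, decoder, lam, and the MSE reconstruction term are illustrative choices, not taken from the paper. The reconstruction term decides what is worth capturing, the connectivity term decides how the latent codes are arranged.

```python
import torch.nn.functional as F

def train_step(encoder, decoder, optimizer, x, eta=1.0, lam=1.0):
    """One hypothetical training step on a batch x of size B."""
    z = encoder(x)                        # latent codes, shape (B, n)
    x_hat = decoder(z)
    rec = F.mse_loss(x_hat, x)            # what is worth capturing
    conn = connectivity_loss(z, eta)      # how to topologically organize the latent space
    loss = rec + lam * conn
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rec.item(), conn.item()
```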

  25. Experiments – Task: One-class learning
     [Diagram: an auxiliary autoencoder (f_θ, g_φ) is trained on unlabeled data with Rec[·, ·] + connectivity loss (with fixed scale η) via PH]
     Trained only once (e.g., on CIFAR-10 without labels)

  26. Experiments – Task: One-class learning
     [Diagram: auxiliary autoencoder (f_θ, g_φ) trained on unlabeled data with Rec[·, ·] + connectivity loss (with fixed scale η), trained only once (e.g., on CIFAR-10 without labels)]
     KDE-inspired one-class "learning"
     [Figure: one-class samples are encoded via f_θ; balls of radius r = η/2 are placed around their latent codes]

  27. Experiments – Task: One-class learning
     [Diagram: auxiliary autoencoder (f_θ, g_φ) trained on unlabeled data with Rec[·, ·] + connectivity loss (with fixed scale η), trained only once (e.g., on CIFAR-10 without labels)]
     KDE-inspired one-class "learning": computation of a one-class score.
     Count the number of samples falling into balls of radius η anchored at the one-class instances.
     [Figure: in-class and out-of-class test samples are encoded via f_θ and compared against the balls around the one-class latent codes]
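A minimal sketch of the scoring rule described on this slide (names are illustrative; z_one_class would be the latent codes f_θ(x) of the one-class instances and z_test those of the test samples, with the encoder kept fixed): a test sample's score is the number of balls of radius η, anchored at the one-class instances, that it falls into.

```python
import numpy as np
from scipy.spatial.distance import cdist

def one_class_scores(z_one_class: np.ndarray, z_test: np.ndarray, eta: float) -> np.ndarray:
    """Score each test code by how many eta-balls around one-class codes contain it."""
    d = cdist(z_test, z_one_class)     # (n_test, n_one_class) pairwise distances
    return (d <= eta).sum(axis=1)      # higher score = more 'in-class'

# scores = one_class_scores(f_theta(x_one_class), f_theta(x_test), eta)
# AUROC can then be computed from these scores for in- vs. out-of-class samples.
```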

  28. Results – Task: One-class learning
     CIFAR-10 (AE trained on CIFAR-100); training batch size B = 100
     [Bar chart: ∅ (mean) AUROC, y-axis from 0.5 to 0.8, for DAGMM, DSEBM, OC-SVM (CAE), Deep-SVDD, ADT, and Ours-120]
     References: ADT [Golan & El-Yaniv, NIPS '18], DAGMM [Zong et al., ICLR '18], DSEBM [Zhai et al., ICML '16], Deep-SVDD [Ruff et al., ICML '18]

  29. Results – Task: One-class learning
     CIFAR-10 (AE trained on CIFAR-100); training batch size B = 100
     [Bar chart: ∅ (mean) AUROC as before, plus a low-sample-size comparison of ADT-1,000, ADT-500, ADT-120, and Ours-120; Ours-120 is annotated with "+7 points"]
     References: ADT [Golan & El-Yaniv, NIPS '18], DAGMM [Zong et al., ICLR '18], DSEBM [Zhai et al., ICML '16], Deep-SVDD [Ruff et al., ICML '18]
