Probabilistic Dimensionality Reduction


  1. Probabilistic Dimensionality Reduction. Neil D. Lawrence, Amazon Research Cambridge and University of Sheffield, U.K. Probabilistic Scientific Computing Workshop, ICERM at Brown, 6th June 2017.

  2. Outline: Dimensionality Reduction; Conclusions.

  3. Motivation for Non-Linear Dimensionality Reduction USPS Data Set Handwritten Digit ◮ 3648 Dimensions ◮ 64 rows by 57 columns

  4. Motivation for Non-Linear Dimensionality Reduction USPS Data Set Handwritten Digit ◮ 3648 Dimensions ◮ 64 rows by 57 columns ◮ Space contains more than just this digit.

  5. Motivation for Non-Linear Dimensionality Reduction USPS Data Set Handwritten Digit ◮ 3648 Dimensions ◮ 64 rows by 57 columns ◮ Space contains more than just this digit. ◮ Even if we sampled every nanosecond from now until the end of the universe, we would never see the original six!

  7.–15. Simple Model of Digit: Rotate a 'Prototype' (the same slide repeated over nine frames of a rotation animation).
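As a rough illustration of the rotation model above (a sketch, not the code behind the original slides), the following generates the one-parameter family of images obtained by rotating a single prototype digit. The 64-by-57 image prototype is a placeholder, and imrotate is assumed to be available from the Image Processing Toolbox (or Octave's image package).

% Rotate a prototype digit through a full turn and stack the results as
% rows of a data matrix, one 3648-dimensional point per rotation angle.
angles = 0:4:356;                      % rotation angles in degrees
Y = zeros(numel(angles), 64 * 57);     % one vectorised image per row
for k = 1:numel(angles)
    rotated = imrotate(prototype, angles(k), 'bilinear', 'crop');
    Y(k, :) = rotated(:)';             % flatten to a row of the data matrix
end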

  16. MATLAB Demo demDigitsManifold([1 2], 'all')

  17. MATLAB Demo demDigitsManifold([1 2], 'all') [scatter plot of the digits projected onto the first two principal components; axes: PC no 1, PC no 2]

  18. MATLAB Demo demDigitsManifold([1 2], 'sixnine') [scatter plot of the 'sixnine' subset projected onto the first two principal components; axes: PC no 1, PC no 2]
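The demo above is from Lawrence's own MATLAB toolbox; the sketch below only illustrates the kind of computation it performs (it is not the demDigitsManifold implementation): project the vectorised digit images onto the first two principal components and scatter-plot the coordinates. The n-by-3648 matrix digits is a placeholder.

% Principal component projection of digit images onto two dimensions.
Yc = digits - mean(digits);          % centre the data
[~, ~, V] = svd(Yc, 'econ');         % right singular vectors give principal directions
Z = Yc * V(:, 1:2);                  % coordinates on the first two principal components
plot(Z(:, 1), Z(:, 2), 'rx');
xlabel('PC no 1'); ylabel('PC no 2');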

  19. Low Dimensional Manifolds Pure Rotation is too Simple ◮ In practice the data may undergo several distortions. ◮ e.g. digits undergo 'thinning', translation and rotation. ◮ For data with 'structure': ◮ we expect fewer distortions than dimensions; ◮ we therefore expect the data to live on a lower dimensional manifold. ◮ Conclusion: deal with high dimensional data by looking for a lower-dimensional non-linear embedding.

  20. Existing Methods Spectral Approaches ◮ Classical Multidimensional Scaling (MDS) (Mardia et al., 1979). ◮ Uses eigenvectors of similarity matrix. ◮ Isomap (Tenenbaum et al., 2000) is MDS with a particular proximity measure. ◮ Kernel PCA (Schölkopf et al., 1998) ◮ Provides a representation and a mapping — dimensional expansion. ◮ Mapping is implied through the use of a kernel function as a similarity matrix. ◮ Locally Linear Embedding (Roweis and Saul, 2000). ◮ Looks to preserve locally linear relationships in a low dimensional space.

  21. Existing Methods II Iterative Methods ◮ Multidimensional Scaling (MDS) ◮ Iterative optimisation of a stress function (Kruskal, 1964). ◮ Sammon Mappings (Sammon, 1969). ◮ Strictly speaking not a mapping — similar to iterative MDS. ◮ NeuroScale (Lowe and Tipping, 1997) ◮ Augmentation of iterative MDS methods with a mapping.

  22. Existing Methods III Probabilistic Approaches ◮ Probabilistic PCA (Tipping and Bishop, 1999; Roweis, 1998) ◮ A linear method.

  23. Existing Methods III Probabilistic Approaches ◮ Probabilistic PCA (Tipping and Bishop, 1999; Roweis, 1998) ◮ A linear method. ◮ Density Networks (MacKay, 1995) ◮ Use importance sampling and a multi-layer perceptron.

  24. Existing Methods III Probabilistic Approaches ◮ Probabilistic PCA (Tipping and Bishop, 1999; Roweis, 1998) ◮ A linear method. ◮ Density Networks (MacKay, 1995) ◮ Use importance sampling and a multi-layer perceptron. ◮ Generative Topographic Mapping (GTM) (Bishop et al., 1998) ◮ Uses a grid based sample and an RBF network.

  25. Existing Methods III Probabilistic Approaches ◮ Probabilistic PCA (Tipping and Bishop, 1999; Roweis, 1998) ◮ A linear method. ◮ Density Networks (MacKay, 1995) ◮ Use importance sampling and a multi-layer perceptron. ◮ Generative Topographic Mapping (GTM) (Bishop et al., 1998) ◮ Uses a grid based sample and an RBF network. Difficulty for Probabilistic Approaches ◮ Propagate a probability distribution through a non-linear mapping.
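A small numeric illustration of that difficulty (an added sketch, not from the slides): pushing a Gaussian through even a simple non-linear mapping gives a distribution with no convenient closed form, so here it can only be summarised by sampling.

% Propagate a Gaussian through a non-linear mapping by sampling.
x = randn(10000, 1);            % x ~ N(0, 1)
y = sin(3 * x) + 0.1 * x.^2;    % an arbitrary non-linear mapping
hist(y, 50);                    % clearly non-Gaussian; no closed-form density in general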

  26. The New Model A Probabilistic Non-linear PCA ◮ PCA has a probabilistic interpretation (Tipping and Bishop, 1999; Roweis, 1998). ◮ It is difficult to 'non-linearise'. Dual Probabilistic PCA ◮ We present a new probabilistic interpretation of PCA (Lawrence, 2005). ◮ This interpretation can be made non-linear. ◮ The result is non-linear probabilistic PCA.

  27. Notation. q — dimension of latent/embedded space; p — dimension of data space; n — number of data points. Centred data, Y = [y_{1,:}, …, y_{n,:}]^⊤ = [y_{:,1}, …, y_{:,p}] ∈ ℜ^{n×p}; latent variables, X = [x_{1,:}, …, x_{n,:}]^⊤ = [x_{:,1}, …, x_{:,q}] ∈ ℜ^{n×q}; mapping matrix, W ∈ ℜ^{p×q}. a_{i,:} is a vector from the i-th row of a given matrix A; a_{:,j} is a vector from the j-th column of a given matrix A.

  28. Reading Notation X and Y are design matrices ◮ Covariance given by n^{-1} Y^⊤ Y. ◮ Inner product matrix given by Y Y^⊤.
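A two-line sketch of the distinction, assuming a centred n-by-p data matrix Y is already in the workspace: the covariance is p-by-p while the inner product matrix is n-by-n, a contrast that matters for the dual view later.

n = size(Y, 1);
S = Y' * Y / n;     % covariance, p-by-p  (n^{-1} Y'Y)
K = Y * Y';         % inner product matrix, n-by-n  (YY')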

  29. Linear Dimensionality Reduction Linear Latent Variable Model ◮ Represent data, Y, with a lower dimensional set of latent variables X. ◮ Assume a linear relationship of the form y_{i,:} = W x_{i,:} + ε_{i,:}, where ε_{i,:} ∼ N(0, σ² I).
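A minimal sketch of sampling from this linear latent variable model; the dimensions and noise level below are illustrative choices rather than values from the talk.

% Draw latent variables and noise, then map them into data space.
n = 500; p = 10; q = 2; sigma2 = 0.01;
W = randn(p, q);                       % mapping matrix
X = randn(n, q);                       % latent variables, x_i ~ N(0, I)
E = sqrt(sigma2) * randn(n, p);        % noise, eps_i ~ N(0, sigma^2 I)
Y = X * W' + E;                        % each row satisfies y_i = W x_i + eps_i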

  30. Linear Latent Variable Model Probabilistic PCA ◮ Define linear-Gaussian relationship between latent variables and data. (Graphical model: nodes X, W, σ², Y.) p(Y | X, W) = ∏_{i=1}^{n} N(y_{i,:} | W x_{i,:}, σ² I)

  31. Linear Latent Variable Model Probabilistic PCA ◮ Define linear-Gaussian relationship between latent variables and data. ◮ Standard latent variable approach: p(Y | X, W) = ∏_{i=1}^{n} N(y_{i,:} | W x_{i,:}, σ² I)

  32. Linear Latent Variable Model Probabilistic PCA ◮ Define linear-Gaussian relationship between latent variables and data. ◮ Standard latent variable approach: p(Y | X, W) = ∏_{i=1}^{n} N(y_{i,:} | W x_{i,:}, σ² I) ◮ Define Gaussian prior over latent space, X: p(X) = ∏_{i=1}^{n} N(x_{i,:} | 0, I)

  33. Linear Latent Variable Model Probabilistic PCA ◮ Define linear-Gaussian relationship between latent variables and data. ◮ Standard latent variable approach: p(Y | X, W) = ∏_{i=1}^{n} N(y_{i,:} | W x_{i,:}, σ² I) ◮ Define Gaussian prior over latent space, X: p(X) = ∏_{i=1}^{n} N(x_{i,:} | 0, I) ◮ Integrate out latent variables: p(Y | W) = ∏_{i=1}^{n} N(y_{i,:} | 0, W W^⊤ + σ² I)

  34. Computation of the Marginal Likelihood. x_{i,:} ∼ N(0, I), ε_{i,:} ∼ N(0, σ² I), y_{i,:} = W x_{i,:} + ε_{i,:}

  35. Computation of the Marginal Likelihood. x_{i,:} ∼ N(0, I), ε_{i,:} ∼ N(0, σ² I), y_{i,:} = W x_{i,:} + ε_{i,:}, so W x_{i,:} ∼ N(0, W W^⊤)

  36. Computation of the Marginal Likelihood. x_{i,:} ∼ N(0, I), ε_{i,:} ∼ N(0, σ² I), y_{i,:} = W x_{i,:} + ε_{i,:}, so W x_{i,:} ∼ N(0, W W^⊤) and W x_{i,:} + ε_{i,:} ∼ N(0, W W^⊤ + σ² I)
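Reusing W, q, p and sigma2 from the sampling sketch above, a quick Monte Carlo check (an added illustration, not part of the talk) that W x + ε does have covariance W W^⊤ + σ² I:

m = 100000;                                         % number of samples
samples = randn(m, q) * W' + sqrt(sigma2) * randn(m, p);
empiricalC   = samples' * samples / m;              % empirical covariance
theoreticalC = W * W' + sigma2 * eye(p);            % WW' + sigma^2 I
max(abs(empiricalC(:) - theoreticalC(:)))           % shrinks towards 0 as m grows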

  37. Linear Latent Variable Model II Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999). (Graphical model: nodes W, σ², Y.) p(Y | W) = ∏_{i=1}^{n} N(y_{i,:} | 0, W W^⊤ + σ² I)

  38. Linear Latent Variable Model II Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999). p(Y | W) = ∏_{i=1}^{n} N(y_{i,:} | 0, C), C = W W^⊤ + σ² I

  39. Linear Latent Variable Model II Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999). p(Y | W) = ∏_{i=1}^{n} N(y_{i,:} | 0, C), C = W W^⊤ + σ² I, log p(Y | W) = −(n/2) log|C| − (1/2) tr(C^{−1} Y^⊤ Y) + const.

  40. Linear Latent Variable Model II Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999). p(Y | W) = ∏_{i=1}^{n} N(y_{i,:} | 0, C), C = W W^⊤ + σ² I, log p(Y | W) = −(n/2) log|C| − (1/2) tr(C^{−1} Y^⊤ Y) + const. If U_q are the first q principal eigenvectors of n^{-1} Y^⊤ Y and the corresponding eigenvalues are Λ_q,

  41. Linear Latent Variable Model II Probabilistic PCA Max. Likelihood Soln (Tipping and Bishop, 1999). p(Y | W) = ∏_{i=1}^{n} N(y_{i,:} | 0, C), C = W W^⊤ + σ² I, log p(Y | W) = −(n/2) log|C| − (1/2) tr(C^{−1} Y^⊤ Y) + const. If U_q are the first q principal eigenvectors of n^{-1} Y^⊤ Y and the corresponding eigenvalues are Λ_q, then W = U_q L R^⊤, L = (Λ_q − σ² I)^{1/2}, where R is an arbitrary rotation matrix.
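A compact sketch of this maximum likelihood solution, reusing the synthetic Y and q from the sampling sketch above. The choice R = I and the standard ML estimate of σ² as the mean of the discarded eigenvalues (Tipping and Bishop, 1999) are noted in the comments; none of this code is from the original talk.

n = size(Y, 1); p = size(Y, 2);
S = Y' * Y / n;                          % sample covariance n^{-1} Y'Y
[V, D] = eig(S);
[lambda, idx] = sort(diag(D), 'descend');
Uq = V(:, idx(1:q));                     % first q principal eigenvectors
sigma2hat = mean(lambda(q+1:end));       % ML noise variance: mean of discarded eigenvalues
L = diag(sqrt(lambda(1:q) - sigma2hat)); % L = (Lambda_q - sigma^2 I)^(1/2)
W_ml = Uq * L;                           % W = U_q L R', taking R = I
C = W_ml * W_ml' + sigma2hat * eye(p);   % implied covariance WW' + sigma^2 I
logLik = -n/2 * log(det(C)) - 0.5 * trace(C \ (Y' * Y));   % log p(Y|W) up to a constant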

  42. Linear Latent Variable Model III Dual Probabilistic PCA ◮ Define linear-Gaussian relationship between latent variables and data. (Graphical model: nodes W, X, σ², Y.) p(Y | X, W) = ∏_{i=1}^{n} N(y_{i,:} | W x_{i,:}, σ² I)

  43. Linear Latent Variable Model III Dual Probabilistic PCA ◮ Define linear-Gaussian relationship between latent variables and data. ◮ Novel latent variable approach: p(Y | X, W) = ∏_{i=1}^{n} N(y_{i,:} | W x_{i,:}, σ² I)
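For orientation: where standard probabilistic PCA integrates out X, the dual construction (Lawrence, 2005) places the prior on W and integrates that out instead, so each centred data column is marginally Gaussian with an n-by-n covariance X X^⊤ + σ² I. The sketch below only illustrates that dual covariance; X, sigma2 and y are illustrative placeholders, not quantities from the talk.

% Dual marginal covariance over data points rather than data dimensions.
n = 100; q = 2; sigma2 = 0.01;
X = randn(n, q);                      % latent variables, now treated as parameters
Kdual = X * X' + sigma2 * eye(n);     % n-by-n covariance X X' + sigma^2 I
% log-likelihood of one centred data column y (an n-by-1 placeholder):
% logLik = -0.5 * (n*log(2*pi) + log(det(Kdual)) + y' * (Kdual \ y));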
