Resampling PCA & GP Inference
Manfred Opper (ISIS, University of Southampton)
Motivation
• Construct a "simple" intractable GP model.
• Study approximate (EC/EP) inference.
• "MC" conceptually simple.
• Get a quantitative idea why EC inference works.
Resampling (Bootstrap)
Estimate average-case properties (test errors) of statistical estimators based on a single dataset D_0 = {y_1, y_2, y_3}.
Bootstrap: resample with replacement → generate pseudo data,
D_1 = {y_1, y_2, y_2}, D_2 = {y_1, y_1, y_1}, D_3 = {y_2, y_3, y_3}, ... etc.
Problem: each sample requires retraining of some learning algorithm.
Mapping to probabilistic model & approximate inference: only a single training (inference) for a single (effective) model is required (Malzahn & Opper 2003).
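For reference, a minimal sketch (not from the talk) of the direct bootstrap loop that this mapping avoids: every pseudo dataset is drawn with replacement from D_0 and the learning algorithm is retrained on each draw; fit_estimator and test_error are placeholder callables.

import numpy as np

def bootstrap_test_error(D0, fit_estimator, test_error, B=100, rng=None):
    # D0: array of shape (N, d), the single original data set.
    rng = np.random.default_rng(rng)
    N = len(D0)
    errors = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)          # resample with replacement -> pseudo data
        oob = np.setdiff1d(np.arange(N), idx)     # points left out of this resample
        model = fit_estimator(D0[idx])            # retraining needed for every pseudo data set
        if oob.size > 0:
            errors.append(test_error(model, D0[oob]))
    return float(np.mean(errors))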
PCA
• Goal: project (d-dimensional) data vectors y → P_q[y] onto a q < d dimensional subspace with minimal reconstruction error E ||y − P_q[y]||^2.
• Method: approximate the expectation by N training data D_0, given by the (d × N) matrix Y = (y_1, y_2, ..., y_N), y_i ∈ R^d; d = ∞ is allowed (feature vectors).
• Optimal subspace spanned by the eigenvectors u_l of the data covariance matrix C = \frac{1}{N} Y Y^T corresponding to the q largest eigenvalues λ_l ≥ λ.
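A minimal numerical sketch (assumed, finite d only) of this construction: diagonalize C = Y Y^T / N, keep the eigenvectors whose eigenvalues exceed the threshold λ, and measure the mean squared reconstruction error of the projection.

import numpy as np

def pca_subspace(Y, lam):
    # Y: (d, N) data matrix, one column per data vector y_i.
    d, N = Y.shape
    C = Y @ Y.T / N                        # covariance matrix C = (1/N) Y Y^T
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    U = eigvecs[:, eigvals >= lam]         # keep directions with lambda_l >= lambda
    return eigvals, U

def reconstruction_error(Y, U):
    # Empirical mean of || y - P_q[y] ||^2 for the projection onto span(U).
    residual = Y - U @ (U.T @ Y)
    return float(np.mean(np.sum(residual**2, axis=0)))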
Reconstruction Error
Expected reconstruction error (on novel data):
\varepsilon(\lambda) = \sum_{l:\, \lambda_l < \lambda} E\, ( y \cdot u_l )^2
Resample-averaged reconstruction error:
E_r = \frac{1}{N}\, E_{D}\Big[ \mathrm{Tr} \sum_{y_i \notin D;\ \lambda_l < \lambda} y_i y_i^T\, u_l u_l^T \Big]
(The u_l, λ_l are those of the resampled covariance, i.e. trained on D; the error is evaluated on the points y_i left out of D.)
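A brute-force Monte Carlo estimate of E_r for comparison (an assumed sketch at the standard resampling rate μ = 1, not the method of the talk): for each bootstrap draw, diagonalize the resampled covariance and accumulate the squared projections of the left-out points onto the discarded eigendirections.

import numpy as np

def resampled_reconstruction_error(Y, lam, B=200, rng=None):
    # Y: (d, N) original data; lam: eigenvalue cut-off lambda.
    rng = np.random.default_rng(rng)
    d, N = Y.shape
    total = 0.0
    for _ in range(B):
        s = rng.multinomial(N, np.ones(N) / N)   # s_i = number of times y_i is drawn
        C_res = (Y * s) @ Y.T / N                # covariance of the resampled data
        eigvals, U = np.linalg.eigh(C_res)
        U_low = U[:, eigvals < lam]              # discarded directions with lambda_l < lambda
        proj = U_low.T @ Y[:, s == 0]            # projections (y_i . u_l) of left-out points
        total += np.sum(proj**2) / N
    return total / B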
Bootstrap of density of Eigenvalues
[Figure: density of eigenvalues for N = 50 random data, Dim = 25; 1× and 3× oversampled bootstrap, with the eigenvalue λ on the horizontal axis.]
The model
• Let s_i = number of times y_i ∈ D.
• Diagonal random matrix D_{ii} = D_i = \frac{1}{\mu \Gamma}\,( s_i + \epsilon\, \delta_{s_i,0} ), and C(\epsilon) = \frac{\Gamma}{N}\, Y D Y^T; C(0) ∝ covariance matrix of the resampled data.
• Kernel matrix K = \frac{1}{N}\, Y^T Y.
• Partition function
Z = \int d^N x \, \exp\Big( -\tfrac{1}{2}\, x^T ( K^{-1} + D )\, x \Big)
  = |K|^{1/2}\, \Gamma^{d/2}\, (2\pi)^{(N-d)/2} \int d^d z \, \exp\Big( -\tfrac{1}{2}\, z^T ( C(\epsilon) + \Gamma I )\, z \Big).
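The equality of the two Gaussian representations can be checked numerically via log-determinants. The sketch below assumes the reconstruction above, d > N so that K is invertible, and the standard resampling rate μ = 1; all parameter values are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
d, N, mu, Gamma, eps = 30, 20, 1.0, 0.5, 0.3
Y = rng.normal(size=(d, N))                      # data matrix (d x N), d > N so K is invertible
s = rng.multinomial(N, np.ones(N) / N)           # s_i = times y_i appears in the resample
D = (s + eps * (s == 0)) / (mu * Gamma)          # D_i = (s_i + eps*delta_{s_i,0}) / (mu*Gamma)
K = Y.T @ Y / N                                  # kernel matrix K = Y^T Y / N
C_eps = (Gamma / N) * (Y * D) @ Y.T              # C(eps) = (Gamma/N) Y D Y^T

# ln Z from the N-dimensional x-integral
_, logdet_x = np.linalg.slogdet(np.linalg.inv(K) + np.diag(D))
lnZ_x = 0.5 * N * np.log(2 * np.pi) - 0.5 * logdet_x

# ln Z from the d-dimensional z-integral representation
_, logdet_K = np.linalg.slogdet(K)
_, logdet_z = np.linalg.slogdet(C_eps + Gamma * np.eye(d))
lnZ_z = (0.5 * logdet_K + 0.5 * d * np.log(Gamma)
         + 0.5 * N * np.log(2 * np.pi) - 0.5 * logdet_z)

print(lnZ_x, lnZ_z)   # the two values should coincide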
Z as generating function
-2\, \frac{\partial \ln Z}{\partial \epsilon}\Big|_{\epsilon=0} = \frac{1}{\mu N} \sum_{j=1}^{N} \delta_{s_j,0}\, \mathrm{Tr}\big[ y_j y_j^T\, G(\Gamma) \big]
-2\, \frac{\partial \ln Z}{\partial \Gamma} = \mathrm{Tr}\, G(\Gamma) - \frac{d}{\Gamma}
with
G(\Gamma) = ( C(0) + \Gamma I )^{-1} = \sum_k \frac{u_k u_k^T}{\lambda_k + \Gamma}.
Compare with the (resample-averaged) reconstruction error
E_r = \frac{1}{N}\, E_{D}\Big[ \mathrm{Tr} \sum_{y_i \notin D;\ \lambda_l < \lambda} y_i y_i^T\, u_l u_l^T \Big].
Analytical Continuation
Reconstruction error
E_r = \frac{1}{N}\, E_{D}\Big[ \mathrm{Tr} \sum_{y_i \notin D;\ \lambda_l < \lambda} y_i y_i^T\, u_l u_l^T \Big]
Use the representation of the Dirac δ,
\delta(x) = \lim_{\eta \to 0^+} \frac{1}{\pi}\, \Im\, \frac{1}{x - i\eta},
and get
E_r = E_r^0 + \int_{0^+}^{\lambda} d\lambda'\, \varepsilon_r(\lambda')
where
\varepsilon_r(\lambda) = \frac{1}{\pi N} \lim_{\eta \to 0^+} \Im\, E_{D}\Big[ \sum_j \delta_{s_j,0}\, \mathrm{Tr}\big( y_j y_j^T\, G(-\lambda - i\eta) \big) \Big]
defines the error density from all eigenvalues > 0, and E_r^0 is the contribution from the eigenspace with λ_k = 0.
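A numerical illustration of the analytic continuation (assumed, for a single fixed dataset and without the resampling average): the imaginary part of the resolvent at Γ = −λ − iη acts as a smoothed indicator of eigenvalues near λ, so integrating it over λ reproduces the hard spectral cut-off as η → 0.

import numpy as np

rng = np.random.default_rng(1)
d, N, eta = 25, 50, 1e-3
Y = rng.normal(size=(d, N))
C = Y @ Y.T / N
eigvals, U = np.linalg.eigh(C)
A = np.outer(Y[:, 0], Y[:, 0])                   # example matrix y_j y_j^T for one data point

def smoothed_density(lam):
    # (1/pi) Im Tr[ A G(-lam - i*eta) ]  with  G(Gamma) = (C + Gamma I)^(-1)
    G = np.linalg.inv(C + (-lam - 1j * eta) * np.eye(d))
    return np.imag(np.trace(A @ G)) / np.pi

lam_cut = 1.0
edges = np.linspace(0.0, lam_cut, 4001)
mids = 0.5 * (edges[1:] + edges[:-1])
integral = sum(smoothed_density(l) for l in mids) * (edges[1] - edges[0])
exact = sum(float(Y[:, 0] @ U[:, k]) ** 2
            for k in range(d) if 0.0 < eigvals[k] < lam_cut)
print(integral, exact)   # the two values approach each other as eta -> 0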
Replica Trick
Data-averaged free energy
-E_{D}[\ln Z] = -\lim_{n \to 0} \frac{1}{n} \ln E_{D}[Z^n],
for integer n:
Z^{(n)} \doteq E_{D}[Z^n] = \int dx\, \psi_1(x)\, \psi_2(x)
where we set x \doteq (x_1, \ldots, x_n) and
\psi_1(x) = E_{D}\Big[ \exp\Big( -\tfrac{1}{2} \sum_{a=1}^{n} x_a^T D\, x_a \Big) \Big], \qquad \psi_2(x) = \exp\Big( -\tfrac{1}{2} \sum_{a=1}^{n} x_a^T K^{-1} x_a \Big)
intractable!
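For completeness, the standard identity behind the n → 0 limit (not spelled out on the slide) follows from expanding Z^n = e^{n \ln Z} for small n:

\[
E_{D}[Z^n] = E_{D}\big[ e^{\, n \ln Z} \big] = 1 + n\, E_{D}[\ln Z] + O(n^2)
\quad \Longrightarrow \quad
E_{D}[\ln Z] = \lim_{n \to 0} \frac{1}{n} \ln E_{D}[Z^n].
\]

The expectation on the left is only tractable for integer n, which is why the replicated integral is evaluated for integer n and then continued to n → 0.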
Approximate Inference (EC: Opper & Winther)
p_1(x) = \frac{1}{Z_1}\, \psi_1(x)\, e^{-\frac{1}{2} \Lambda_1 x^T x}, \qquad p_0(x) = \frac{1}{Z_0}\, e^{-\frac{1}{2} \Lambda_0 x^T x},
with Λ_1 and Λ_0 "variational" parameters.
Z^{(n)} = Z_1 \int dx\, p_1(x)\, \psi_2(x)\, e^{\frac{1}{2} \Lambda_1 x^T x} \approx Z_1 \int dx\, p_0(x)\, \psi_2(x)\, e^{\frac{1}{2} \Lambda_1 x^T x} \equiv Z^{(n)}_{EC}(\Lambda_1, \Lambda_0)
Match moments \langle x^T x \rangle_1 = \langle x^T x \rangle_0 & stationarity w.r.t. Λ_1.
Final result
-\ln Z_{EC} = -E_{D}\Big[ \ln \int dx\, e^{-\frac{1}{2} x^T ( D + (\Lambda_0 - \Lambda) I )\, x} \Big] - \ln \int dx\, e^{-\frac{1}{2} x^T ( K^{-1} + \Lambda I )\, x} + \ln \int dx\, e^{-\frac{1}{2} \Lambda_0\, x^T x},
where we have set Λ = Λ_0 − Λ_1. Tractable!
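Since every term is a Gaussian integral, −ln Z_EC reduces to log-determinants. The sketch below (assumed, not from the talk) evaluates it for given variational parameters, replacing the average E_D by a Monte Carlo average over bootstrap counts; solving the moment-matching and stationarity conditions for Λ_0, Λ_1 is not implemented, and drawing μN points to realize the oversampling rate μ is an assumption.

import numpy as np

def ec_free_energy(K, Lambda0, Lambda, mu, Gamma, eps=0.0, B=500, rng=None):
    # -ln Z_EC for given (Lambda0, Lambda); assumes K invertible and Lambda0 > Lambda.
    rng = np.random.default_rng(rng)
    N = K.shape[0]
    const = 0.5 * N * np.log(2 * np.pi)      # ln of the Gaussian normalisation (2 pi)^(N/2)

    # term 1: -E_D[ ln \int dx exp(-x^T (D + (Lambda0 - Lambda) I) x / 2) ]
    term1 = 0.0
    for _ in range(B):
        s = rng.multinomial(int(round(mu * N)), np.ones(N) / N)   # resampled counts s_i
        D = (s + eps * (s == 0)) / (mu * Gamma)
        term1 -= const - 0.5 * np.sum(np.log(D + Lambda0 - Lambda))
    term1 /= B

    # term 2: -ln \int dx exp(-x^T (K^{-1} + Lambda I) x / 2)
    _, logdet = np.linalg.slogdet(np.linalg.inv(K) + Lambda * np.eye(N))
    term2 = -(const - 0.5 * logdet)

    # term 3: +ln \int dx exp(-Lambda0 x^T x / 2)
    term3 = const - 0.5 * N * np.log(Lambda0)

    return term1 + term2 + term3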
Result: Artificial Data
[Figure: N = 50 data, Dim = 25, 3× oversampled; EC vs. resampling, plotted against the eigenvalue λ.]
The PCA Reconstruction Error
[Figure: N = 32 artificial random data, Dim = 25; approximate bootstrap, 3× oversampled. Test error versus sum of eigenvalues (training error), with the eigenvalue λ on the horizontal axis.]
Approximate Bootstrap: handwritten Digits
[Figure: N = 100 data, Dim = 784; density of eigenvalues and reconstruction error, with the eigenvalue λ on the horizontal axis.]
The result without replicas
-\ln Z = -\ln \int dx\, e^{-\frac{1}{2} x^T ( D + (\Lambda_0 - \Lambda) I )\, x} - \ln \int dx\, e^{-\frac{1}{2} x^T ( K^{-1} + \Lambda I )\, x} + \ln \int dx\, e^{-\frac{1}{2} \Lambda_0\, x^T x} + \frac{1}{2} \ln \det( I + r )
with
r_{ij} = \Big( 1 - \frac{\Lambda_0}{\Lambda_0 - \Lambda + D_i} \Big)\, \Big[ \Lambda_0 \big( K^{-1} + \Lambda I \big)^{-1} - I \Big]_{ij}.
Expand
\ln \det( I + r ) = \mathrm{Tr}\, \ln( I + r ) = \mathrm{Tr} \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}\, r^k.
We have E_{D}[ r_{ij} ] = 0, so the first-order term vanishes after averaging; the second-order term yields on average
\Delta F = -\frac{1}{4} \sum_i E_{D}\Big[ \Big( 1 - \frac{\Lambda_0}{\Lambda_0 - \Lambda + D_i} \Big)^2 \Big]\, \Big( \Big[ \Lambda_0 \big( K^{-1} + \Lambda I \big)^{-1} - I \Big]_{ii} \Big)^2.
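A direct numerical transcription of ΔF (an assumed sketch; the bootstrap expectation over D_i is again a Monte Carlo average, with Λ_0 and Λ taken as given):

import numpy as np

def delta_F(K, Lambda0, Lambda, mu, Gamma, eps=0.0, B=2000, rng=None):
    # Second-order correction Delta F from the expansion of (1/2) ln det(I + r).
    # Assumes Lambda0 > Lambda so the denominators below stay positive.
    rng = np.random.default_rng(rng)
    N = K.shape[0]
    M = Lambda0 * np.linalg.inv(np.linalg.inv(K) + Lambda * np.eye(N)) - np.eye(N)
    a2 = np.zeros(N)                 # E_D[(1 - Lambda0/(Lambda0 - Lambda + D_i))^2]
    for _ in range(B):
        s = rng.multinomial(int(round(mu * N)), np.ones(N) / N)
        D = (s + eps * (s == 0)) / (mu * Gamma)
        a2 += (1.0 - Lambda0 / (Lambda0 - Lambda + D)) ** 2
    a2 /= B
    return -0.25 * np.sum(a2 * np.diag(M) ** 2)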
Correction
[Figure: Correction to the resampling error, and resampled reconstruction error (λ = 0), each plotted against the resampling rate µ.]
Correction to EC
Z^{(n)} = Z_1 \int dx\, p_1(x)\, \psi_2(x)\, e^{\frac{1}{2} \Lambda_1 x^T x} = Z_1 \int dx\, \psi_2(x)\, e^{\frac{1}{2} \Lambda_1 x^T x} \int \frac{dk}{(2\pi)^{Nn}}\, e^{-i k^T x}\, \chi(k),
where \chi(k) \doteq \int dx\, p_1(x)\, e^{-i k^T x} is the characteristic function of the density p_1. The cumulant expansion starts with a quadratic term (EC),
\ln \chi(k) = -\frac{M_2}{2}\, k^T k + R(k),
where M_2 = \langle x_a^T x_a \rangle_1. Expanding the 4th-order term in R(k) as e^{R(k)} = 1 + R(k) + \ldots leads to ΔF. Possibility of perturbative improvement?
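For orientation, a standard identity (not on the slide; σ² is a generic per-component variance, a placeholder symbol): if only the quadratic cumulant is kept, the k-integral simply reproduces a Gaussian density, which is exactly the Gaussian replacement underlying EC.

\[
\int \frac{d^{M} k}{(2\pi)^{M}}\; e^{-i k^T x}\; e^{-\frac{\sigma^2}{2} k^T k}
= \big( 2\pi \sigma^2 \big)^{-M/2} \exp\!\Big( -\frac{x^T x}{2 \sigma^2} \Big).
\]

Keeping e^{R(k)} ≈ 1 + R(k) on top of this Gaussian term then produces the leading (fourth-cumulant) correction, of the same type as the ΔF on the previous slide.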
Conclusion
• Non-Bayesian inference problems can be related to "hidden" probabilistic models via analytic continuation.
• EC approximate inference appears to be robust and survives analytic continuation and limits.