  1. Stein Point Markov Chain Monte Carlo. Wilson Chen, Institute of Statistical Mathematics, Japan. June 15, 2019 @ ICML Stein’s Method Workshop, Long Beach.

  2. Collaborators: Alessandro Barp, François-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris Oates.

  3. Empirical Approximation Problem

A major problem in machine learning and modern statistics is to approximate some difficult-to-compute density $p$ defined on a domain $\mathcal{X} \subseteq \mathbb{R}^d$ whose normalisation constant is unknown, i.e., $p(x) = \tilde{p}(x)/Z$ with $Z > 0$ unknown.

We consider an empirical approximation of $p$ with points $\{x_i\}_{i=1}^n$:
$$\hat{p}_n(x) = \frac{1}{n} \sum_{i=1}^n \delta(x - x_i),$$
so that for a test function $f : \mathcal{X} \to \mathbb{R}$:
$$\int_{\mathcal{X}} f(x)\, p(x)\, \mathrm{d}x \approx \frac{1}{n} \sum_{i=1}^n f(x_i).$$

A popular approach is Markov chain Monte Carlo.
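Below is a minimal sketch (added for illustration; it is not part of the slides) of this empirical approximation: points $\{x_i\}$ stand in for draws from $p$, and the expectation of a test function is replaced by the equal-weight average over the points. The Gaussian target and the test function $f(x) = x^2$ are illustrative assumptions.

```python
# Illustrative sketch: approximate E_p[f(x)] by (1/n) * sum_i f(x_i).
# The points are drawn i.i.d. from p = N(0, 1) purely for illustration;
# in the setting above they would come from MCMC or a Stein Point method.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)          # points {x_i}, i = 1, ..., n

f = lambda t: t ** 2               # test function f: X -> R
print(f(x).mean())                 # approx E_p[x^2] = 1
```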

  4. Discrepancy

Idea: construct a measure of discrepancy $D(\hat{p}_n, p)$ with desirable features:
• Detects (non)convergence, i.e., $D(\hat{p}_n, p) \to 0$ only if $\hat{p}_n \overset{*}{\to} p$.
• Efficiently computable with limited access to $p$.

Unfortunately, this is not the case for many popular discrepancy measures:
• Kullback-Leibler divergence,
• Wasserstein distance,
• Maximum mean discrepancy (MMD).

  5. Kernel Embedding and MMD

Kernel embedding of a distribution $p$:
$$\mu_p(\cdot) = \int k(x, \cdot)\, p(x)\, \mathrm{d}x \quad \text{(a function in the RKHS } \mathcal{K}).$$

Consider the maximum mean discrepancy (MMD) as an option for $D$:
$$D(\hat{p}_n, p) := \|\mu_{\hat{p}_n} - \mu_p\|_{\mathcal{K}} =: D_{k,p}(\{x_i\}_{i=1}^n).$$

$$\therefore\ D_{k,p}(\{x_i\}_{i=1}^n)^2 = \|\mu_{\hat{p}_n} - \mu_p\|_{\mathcal{K}}^2 = \langle \mu_{\hat{p}_n} - \mu_p,\ \mu_{\hat{p}_n} - \mu_p \rangle = \langle \mu_{\hat{p}_n}, \mu_{\hat{p}_n} \rangle - 2\langle \mu_{\hat{p}_n}, \mu_p \rangle + \langle \mu_p, \mu_p \rangle.$$

We are faced with intractable integrals w.r.t. $p$!

For a Stein kernel $k_0$:
$$\mu_p(\cdot) = \int k_0(x, \cdot)\, p(x)\, \mathrm{d}x = 0.$$
$$\therefore\ \|\mu_{\hat{p}_n} - \mu_p\|_{\mathcal{K}_0}^2 = \|\mu_{\hat{p}_n}\|_{\mathcal{K}_0}^2 =: D_{k_0,p}(\{x_i\}_{i=1}^n)^2 =: \mathrm{KSD}^2!$$
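As a worked illustration of the three-term expansion above (not from the slides), the sketch below estimates $\mathrm{MMD}^2$ with a Gaussian RBF kernel; the two terms involving $\mu_p$ are approximated with a large i.i.d. reference sample from $p = N(0,1)$, which is exactly what is unavailable for an unnormalised target and what the Stein kernel trick removes. The bandwidth and sample sizes are arbitrary choices for the example.

```python
# Illustrative MMD^2 = <mu_phat, mu_phat> - 2<mu_phat, mu_p> + <mu_p, mu_p>
# with a Gaussian RBF kernel.  The terms involving mu_p are estimated with a
# large reference sample z ~ p; for an unnormalised target this is intractable.
import numpy as np

def k(a, b, ell=1.0):
    """Gaussian RBF kernel on all pairs of 1-d points in a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=50)       # empirical point set {x_i}
z = rng.normal(size=5000)     # reference sample from p (illustration only)

mmd_sq = k(x, x).mean() - 2 * k(x, z).mean() + k(z, z).mean()
print(mmd_sq)                 # small, and shrinks as {x_i} better covers p
```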

  6. Kernel Stein Discrepancy (KSD)

The kernel Stein discrepancy (KSD) is given by
$$D_{k_0,p}(\{x_i\}_{i=1}^n) = \frac{1}{n} \sqrt{\sum_{i=1}^n \sum_{j=1}^n k_0(x_i, x_j)},$$
where $k_0$ is the Stein kernel
$$k_0(x, x') := \mathcal{T}_p \mathcal{T}'_p k(x, x') = \nabla_x \cdot \nabla_{x'} k(x, x') + \langle \nabla_x \log p(x),\ \nabla_{x'} k(x, x') \rangle + \langle \nabla_{x'} \log p(x'),\ \nabla_x k(x, x') \rangle + \langle \nabla_x \log p(x),\ \nabla_{x'} \log p(x') \rangle\, k(x, x'),$$
with $\mathcal{T}_p f = \nabla \cdot (p f)/p$. ($\mathcal{T}_p$ is a Stein operator.)

• This is computable without the normalisation constant.
• Requires gradient information $\nabla \log p(x_i)$.
• Detects (non)convergence for an appropriately chosen $k$ (e.g., the IMQ kernel).
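The sketch below (my own illustration, not the authors' implementation) evaluates this KSD for the IMQ base kernel $k(x, x') = (c^2 + \|x - x'\|^2)^{\beta}$ with $\beta = -1/2$; only the score $\nabla \log p$ is needed. The standard Gaussian target, $c = 1$ and the function names are assumptions made for the example.

```python
# Illustrative KSD: D_{k0,p}({x_i}) = (1/n) * sqrt(sum_ij k0(x_i, x_j)),
# with k0 the Langevin Stein kernel built from the IMQ base kernel.
import numpy as np

def imq_stein_kernel(x, y, sx, sy, c=1.0, beta=-0.5):
    """Stein kernel k0(x, y); sx = grad log p(x), sy = grad log p(y)."""
    d = x.shape[-1]
    diff = x - y
    r = c ** 2 + np.sum(diff ** 2)
    kxy = r ** beta
    gx = 2 * beta * r ** (beta - 1) * diff                   # grad_x k
    gy = -gx                                                 # grad_x' k
    div = -2 * beta * (2 * (beta - 1) * r ** (beta - 2) * np.sum(diff ** 2)
                       + d * r ** (beta - 1))                # div_x grad_x' k
    return div + sx @ gy + sy @ gx + (sx @ sy) * kxy

def ksd(points, score):
    """KSD of the empirical measure on `points` for a target with score `score`."""
    n = len(points)
    s = np.array([score(p) for p in points])
    total = sum(imq_stein_kernel(points[i], points[j], s[i], s[j])
                for i in range(n) for j in range(n))
    return np.sqrt(total) / n

score = lambda x: -x        # grad log p for p = N(0, I); normalisation never needed
rng = np.random.default_rng(0)
print(ksd(rng.normal(size=(100, 2)), score))
```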

  7. Stein Points (SP)

The main idea of Stein Points is the greedy minimisation of KSD:
$$x_j \mid x_1, \ldots, x_{j-1} \ \leftarrow\ \underset{x \in \mathcal{X}}{\arg\min}\ D_{k_0,p}(\{x_i\}_{i=1}^{j-1} \cup \{x\}) = \underset{x \in \mathcal{X}}{\arg\min}\ k_0(x, x) + 2 \sum_{i=1}^{j-1} k_0(x, x_i).$$

A global optimisation step is needed for each iteration.
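A minimal sketch of this greedy update (again illustrative rather than the authors' code): the global optimisation over $\mathcal{X}$ is replaced by a search over a fixed one-dimensional candidate grid, with a standard Gaussian target and the IMQ Stein kernel. The grid, its resolution and the target are assumptions for the example.

```python
# Illustrative greedy Stein Points: each new point minimises
# k0(x, x) + 2 * sum_i k0(x, x_i) over a fixed 1-d candidate grid.
import numpy as np

score = lambda x: -x                        # grad log p for p = N(0, 1)

def k0(x, y):
    """1-d Langevin Stein kernel for the IMQ base kernel (beta = -1/2, c = 1)."""
    r = 1.0 + (x - y) ** 2
    gx, gy = -(x - y) * r ** -1.5, (x - y) * r ** -1.5
    div = r ** -1.5 - 3.0 * (x - y) ** 2 * r ** -2.5
    return div + score(x) * gy + score(y) * gx + score(x) * score(y) * r ** -0.5

grid = np.linspace(-4, 4, 801)              # candidate set standing in for X
points = []
for _ in range(20):
    obj = [k0(x, x) + 2 * sum(k0(x, xi) for xi in points) for x in grid]
    points.append(grid[int(np.argmin(obj))])
print(np.sort(points))                      # points spread out to summarise N(0, 1)
```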

  8. Stein Point Markov Chain Monte Carlo (SP-MCMC)

We propose to replace the global minimisation at each iteration $j$ of the SP method with a local search based on a $p$-invariant Markov chain of length $m_j$. The proposed SP-MCMC method proceeds as follows:
1. Fix an initial point $x_1 \in \mathcal{X}$.
2. For $j = 2, \ldots, n$:
   a. Select $i^* \in \{1, \ldots, j-1\}$ according to a criterion $\mathrm{crit}(\{x_i\}_{i=1}^{j-1})$.
   b. Generate $(y_{j,i})_{i=1}^{m_j}$ from a $p$-invariant Markov chain with $y_{j,1} = x_{i^*}$.
   c. Set $x_j \leftarrow \underset{x \in \{y_{j,i}\}_{i=1}^{m_j}}{\arg\min}\ D_{k_0,p}(\{x_i\}_{i=1}^{j-1} \cup \{x\})$.

For $\mathrm{crit}$, three different approaches are considered:
• LAST selects the point last added: $i^* := j - 1$.
• RAND selects $i^*$ uniformly at random from $\{1, \ldots, j-1\}$.
• INFL selects $i^*$ to be the index of the most influential point in $\{x_i\}_{i=1}^{j-1}$. We call $x_{i^*}$ the most influential point if removing it from the point set creates the greatest increase in KSD.

A minimal code sketch of this procedure is given below.
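The following is a hedged sketch of the SP-MCMC loop with the LAST criterion and a random-walk Metropolis inner chain, for a one-dimensional standard Gaussian target. The step size, chain length $m_j$, number of points and the simple 1-d Stein kernel are illustrative assumptions, not the paper's experimental settings.

```python
# Illustrative SP-MCMC (LAST criterion): at iteration j, run a short p-invariant
# RWM chain started at the last selected point, then add the candidate that
# minimises k0(y, y) + 2 * sum_i k0(y, x_i), i.e. the KSD of the enlarged set.
import numpy as np

rng = np.random.default_rng(0)
score = lambda x: -x                                  # grad log p for p = N(0, 1)
log_p = lambda x: -0.5 * x ** 2                       # unnormalised log density

def k0(x, y):
    """1-d Langevin Stein kernel for the IMQ base kernel (beta = -1/2, c = 1)."""
    r = 1.0 + (x - y) ** 2
    gx, gy = -(x - y) * r ** -1.5, (x - y) * r ** -1.5
    div = r ** -1.5 - 3.0 * (x - y) ** 2 * r ** -2.5
    return div + score(x) * gy + score(y) * gx + score(x) * score(y) * r ** -0.5

def rwm_chain(x0, m, step=1.0):
    """m states of a random-walk Metropolis chain (p-invariant) started at x0."""
    states, x = [x0], x0
    for _ in range(m - 1):
        prop = x + step * rng.normal()
        if np.log(rng.uniform()) < log_p(prop) - log_p(x):
            x = prop
        states.append(x)
    return states

points, m_j = [0.0], 10
for j in range(2, 101):
    candidates = rwm_chain(points[-1], m_j)           # LAST: restart at x_{j-1}
    obj = [k0(y, y) + 2 * sum(k0(y, xi) for xi in points) for y in candidates]
    points.append(candidates[int(np.argmin(obj))])
print(np.mean(points), np.var(points))                # summary of the point set
```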

  9. Gaussian Mixture Model Experiment

[Figure: columns for MCMC and for SP-MCMC with the LAST, RAND and INFL criteria; rows show log KSD against the number of points j, the trace of the selected points, and the density of jump sizes for SP-MCMC versus MCMC.]

  10. IGARCH Experiment (d = 2)

[Figure: log E_P against log n_eval for MALA, RWM, SVGD, MED, SP, SP-MALA LAST, SP-MALA INFL, SP-RWM LAST and SP-RWM INFL.]

SP-MCMC methods are compared against the original SP (Chen et al., 2018), MED (Roshan Joseph et al., 2015) and SVGD (Liu & Wang, 2016), as well as the Metropolis-adjusted Langevin algorithm (MALA) and random-walk Metropolis (RWM).

  11. ODE Experiment (d = 4)

[Figure: log KSD against log n_eval for the same methods.]

The baselines are as in the IGARCH experiment: SP (Chen et al., 2018), MED (Roshan Joseph et al., 2015), SVGD (Liu & Wang, 2016), MALA and RWM.

  12. ODE Experiment (d = 10)

[Figure: log KSD against log n_eval for the same methods.]

The baselines are as in the previous experiments.
