SMC² S(ML)MC² MLMC BIP OBIP Coupling PMCMC PMC(ML)MC Numerics MIMC Summary

Bayesian parameter estimation using multilevel and multi-index Monte Carlo

Kody Law, joint with A. Jasra (NUS), K. Kamatani (Osaka), Y. Xu (NUS*), and Y. Zhou (Cubist)

Monash Workshop on Numerical Differential Equations and Applications
Monash University, AU, February 12, 2020
Outline

1. Multilevel Monte Carlo sampling
2. Bayesian inference problem
3. Our Bayesian inference problem
4. Approximate coupling
5. Particle Markov chain Monte Carlo
6. Particle Markov chain multilevel Monte Carlo
7. Sequential Monte Carlo² (SMC²)
8. Sequential multilevel Monte Carlo² (S(ML)MC²)
9. Numerical simulations
10. Multi-index Monte Carlo sampling
11. Summary
Orientation

Aim: approximate posterior expectations of the state path and static parameters associated to an S(P)DE which must be finitely approximated.

Solution: apply an approximate coupling strategy so that multilevel and multi-index Monte Carlo (MIMC) methods can be used within particle MCMC [B02, AR08, ADH10] and SMC² [CJP13].

MLMC (d = 1) [H00, G08] and MIMC (d > 1) [HNT15] methods reduce the cost to achieve mean-squared error = O(ε²). Recently this methodology has been applied to inference, mostly in cases where the target can be evaluated up to a normalizing constant [HSS13, DKST15, HTL16, BJLTZ17]. Here we can only simulate a non-negative unbiased estimator (utanc); using PMCMC we are able to sample consistently from an approximate coupling of successive targets [JKLZ18.i, JKLZ18.ii], and this is extended to the sequential context via SMC² [JLX19].
Example: expectation for an SDE [G08]

Estimation of the expectation of the solution of an intractable stochastic differential equation (SDE):

dX = f(X) dt + σ(X) dW,   X_0 = x_0.

Aim: estimate E(g(X_T)). We need to

(1) approximate, e.g. by the Euler-Maruyama method with resolution h:
    X_{n+1} = X_n + h f(X_n) + √h σ(X_n) ξ_n,   ξ_n ∼ N(0, 1);
(2) sample {X_{N_T}^{(i)}}_{i=1}^N, with N_T = T/h.
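The Euler-Maruyama recursion above can be sketched in a few lines. The drift f, diffusion σ, functional g, and all numerical values below are illustrative choices (an Ornstein-Uhlenbeck example), not taken from the slides.

```python
import numpy as np

def euler_maruyama(f, sigma, x0, T, h, rng):
    """One Euler-Maruyama path: X_{n+1} = X_n + h f(X_n) + sqrt(h) sigma(X_n) xi_n."""
    n_steps = int(round(T / h))  # N_T = T / h
    x = x0
    for _ in range(n_steps):
        xi = rng.standard_normal()  # xi_n ~ N(0, 1)
        x = x + h * f(x) + np.sqrt(h) * sigma(x) * xi
    return x

# Monte Carlo estimate of E[g(X_T)] over N independent paths.
rng = np.random.default_rng(0)
f = lambda x: -x        # illustrative drift
sigma = lambda x: 0.5   # illustrative constant diffusion
g = lambda x: x**2
N = 2000
samples = [g(euler_maruyama(f, sigma, 1.0, T=1.0, h=0.01, rng=rng)) for _ in range(N)]
estimate = np.mean(samples)
```

For this OU example the exact value E[X_1²] ≈ 0.243, so the estimate should land nearby, up to Monte Carlo and discretization error.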
Multilevel Monte Carlo (MLMC)

Aim: approximate η_∞(g) := E_{η_∞}(g) for g : E → R.

Single-level estimator: (1/N) Σ_{i=1}^N g(U_L^{(i)}), with U_L^{(i)} ∼ η_L i.i.d.
Cost to achieve MSE = O(ε²) is C = Cost(U_L^{(i)}) × ε^{-2}.

Multilevel estimator*: Σ_{l=0}^L (1/N_l) Σ_{i=1}^{N_l} { g(U_l^{(i)}) − g(U_{l−1}^{(i)}) },
with (U_l, U_{l−1})^{(i)} ∼ η̄_l i.i.d. such that ∫ η̄_l(du_{l−1}, ·) = η_l, for l = 0, …, L. (* g(U_{−1}^{(i)}) := 0.)

Cost is C_ML = Σ_{l=0}^L C_l N_l, where C_l is the cost to obtain a sample from η̄_l.

Fix the bias by choosing L. Minimize the cost C_ML({N_l}_{l=0}^L) for fixed variance = Σ_{l=0}^L V_l / N_l ⇒ N_l ∝ √(V_l / C_l).

Example: Milstein solution of an SDE, for MSE = O(ε²):
C = O(ε^{-3}) vs. C_ML = O(ε^{-2}).
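The allocation N_l ∝ √(V_l / C_l) and the telescoping estimator can be sketched as follows. The per-level variances, costs, and toy correction means below are invented purely for illustration; sample_diff stands in for a draw of the coupled correction g(U_l) − g(U_{l−1}).

```python
import numpy as np

def mlmc_allocation(V, C, budget_const):
    """Optimal sample sizes N_l proportional to sqrt(V_l / C_l)."""
    V, C = np.asarray(V, float), np.asarray(C, float)
    return np.maximum(1, np.ceil(budget_const * np.sqrt(V / C))).astype(int)

def mlmc_estimate(sample_diff, N):
    """Telescoping MLMC sum: sum over l of the empirical mean of
    g(U_l^{(i)}) - g(U_{l-1}^{(i)}) under the coupling (g(U_{-1}) := 0)."""
    return sum(np.mean([sample_diff(l) for _ in range(N[l])])
               for l in range(len(N)))

# Toy check: level corrections with known means 1.0, 0.1, 0.01,
# so the multilevel estimate targets 1.11.
rng = np.random.default_rng(1)
means = [1.0, 0.1, 0.01]
sample_diff = lambda l: means[l] + 0.01 * rng.standard_normal()
N = mlmc_allocation(V=[1.0, 0.1, 0.01], C=[1.0, 2.0, 4.0], budget_const=1000)
est = mlmc_estimate(sample_diff, N)
```

Note how the allocation puts most samples on the cheap, high-variance coarse level and only a few on the expensive fine levels.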
Illustration of pairwise coupling

Pairwise coupling of trajectories of an SDE:

X^1_{n+1} = X^1_n + h f(X^1_n) + √h σ(X^1_n) ξ_n,   ξ_n ∼ N(0, 1),   n = 0, …, N^1,
X^0_{n+1} = X^0_n + (2h) f(X^0_n) + √h σ(X^0_n)(ξ_{2n} + ξ_{2n+1}),   n = 0, …, N^1/2.

[Figure: (a) coupled Wiener paths W^1_n = √h Σ_{i=0}^n ξ_i and W^0_n = W^1_{2n}; (b) the coupled fine/coarse stochastic processes driven by the same Wiener process.]
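A minimal sketch of this pairwise coupling: the fine path uses step h and the coarse path uses step 2h with the summed increments ξ_{2n} + ξ_{2n+1}, so both paths are driven by the same noise. The OU-type coefficients are illustrative.

```python
import numpy as np

def coupled_euler_paths(f, sigma, x0, T, h, rng):
    """Simulate a coupled (fine step h, coarse step 2h) Euler pair
    driven by the same Gaussian increments xi_n."""
    n_fine = int(round(T / h))          # N^1 fine steps, N^1/2 coarse steps
    xi = rng.standard_normal(n_fine)    # shared noise across both levels
    x_fine, x_coarse = x0, x0
    for n in range(n_fine):
        x_fine = x_fine + h * f(x_fine) + np.sqrt(h) * sigma(x_fine) * xi[n]
    for n in range(n_fine // 2):
        # the coarse level consumes two fine increments: xi_{2n} + xi_{2n+1}
        x_coarse = (x_coarse + 2 * h * f(x_coarse)
                    + np.sqrt(h) * sigma(x_coarse) * (xi[2 * n] + xi[2 * n + 1]))
    return x_fine, x_coarse

rng = np.random.default_rng(2)
xf, xc = coupled_euler_paths(lambda x: -x, lambda x: 0.5, 1.0, T=1.0, h=0.01, rng=rng)
```

Because the two levels share the noise, the endpoints stay close, which is exactly what makes the variance V_l of the correction g(U_l) − g(U_{l−1}) small.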
Bayesian inference is about approximating integrals

Suppose we know how to evaluate γ(x) for x ∈ X. Let

η(dx) = γ(x) dx / ∫_X γ(x) dx,

and ϕ : X → R, and suppose we want to estimate

η(ϕ) := ∫_X ϕ(x) η(dx).

X may be of quite high dimension, e.g. R^d with d = 100 easily, or even 1000, 10000, etc.
Monte Carlo

If we could obtain i.i.d. samples x_i ∼ η, then we could use

η(ϕ) ≈ (1/N) Σ_{i=1}^N ϕ(x_i).

The convergence rate (of the MSE) is O(1/N), independently of d. Unfortunately we cannot get i.i.d. samples.
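The dimension-free O(1/N) rate can be checked empirically in a case where sampling is possible. The target and test function below are illustrative: x ∼ N(0, 1) and ϕ(x) = x², so η(ϕ) = 1 exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
phi = lambda x: x**2  # E[phi(x)] = 1 for x ~ N(0, 1)

def mse(N, reps=2000):
    """Empirical MSE of the N-sample Monte Carlo estimator over many replicates."""
    xs = rng.standard_normal((reps, N))
    return np.mean((np.mean(phi(xs), axis=1) - 1.0) ** 2)

mse_small, mse_large = mse(100), mse(1600)  # 16x more samples per estimate
```

With 16 times more samples the MSE should drop by roughly a factor of 16, consistent with the O(1/N) rate.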
Importance sampling and ratio estimators

Suppose we can get i.i.d. samples x_i ∼ ν, where 0 < G(x) := γ(x)/ν(x) < C. Then we can use the self-normalized importance sampling estimator

η(ϕ) ≈ Σ_{i=1}^N G(x_i) ϕ(x_i) / Σ_{i=1}^N G(x_i).

The rate will still be O(1/N), but typically with a constant O(e^d), depending on E(G(x) − E G(x))². We may as well use quadrature.
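A sketch of the self-normalized estimator. The target and proposal are illustrative choices: η = N(1, 1) given through an unnormalized γ, and ν = N(0, 2²); note the normalizing constants of γ and ν cancel in the ratio, which is the point of self-normalization.

```python
import numpy as np

def snis(phi, G, xs):
    """Self-normalized importance sampling estimate of eta(phi)
    from samples xs ~ nu, with weight function G = gamma / nu."""
    w = G(xs)
    return np.sum(w * phi(xs)) / np.sum(w)

rng = np.random.default_rng(4)
xs = 2.0 * rng.standard_normal(50_000)  # x_i ~ nu = N(0, 4)
# G up to a constant: gamma(x) = exp(-(x-1)^2/2), nu(x) proportional to exp(-x^2/8)
G = lambda x: np.exp(-0.5 * (x - 1.0) ** 2) / (np.exp(-x**2 / 8) / 2.0)
est = snis(lambda x: x, G, xs)  # targets E_eta[x] = 1
```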
Markov chain Monte Carlo

Suppose we can construct a Markov kernel K, i.e. an operator with the property K : B(X) → B(X) and K* : P(X) → P(X), where B(X) are the bounded measurable functions and P(X) are the probability measures, such that η is invariant:

(ηK)(dx) = ∫_X η(dx′) K(x′, dx) = η(dx),

and, for all A ⊂ X and x, x′ ∈ X,

∫_A K(x, dz) ≤ ∫_A K(x′, dz).
Markov chain Monte Carlo

Then we can run the Markov chain to collect samples, x_0 ∈ X and x_i ∼ K(x_{i−1}, ·), so that x_i ∼ K^i(x_0, ·), and use these for Monte Carlo (discarding a burn-in of length N_b):

η(ϕ) ≈ (1/N) Σ_{i=N_b+1}^{N_b+N} ϕ(x_i).

Again Monte Carlo provides rate O(1/N), but now under quite general conditions one may achieve a polynomial constant O(d).
Example: Metropolis-Hastings

Let Q denote a Markov kernel on X.

1. Let x_0 ∈ X.
2. Sample x* ∼ Q(x_i, ·).
3. Set x_{i+1} = x* with probability
   min{ 1, [γ(x*) Q(x*, x_i)] / [γ(x_i) Q(x_i, x*)] },
   otherwise x_{i+1} = x_i.
4. Set i = i + 1 and return to the start of step 2.
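A sketch of the special case of a random-walk proposal, where Q is symmetric and the proposal ratio Q(x*, x_i)/Q(x_i, x*) cancels in the acceptance probability. The target (a standard normal through an unnormalized γ) and the step size are illustrative.

```python
import numpy as np

def metropolis_hastings(log_gamma, x0, n_iter, step, rng):
    """Random-walk Metropolis targeting eta(dx) proportional to gamma(x) dx,
    specified through log_gamma = log(gamma) (normalizing constant not needed)."""
    x, chain = x0, np.empty(n_iter)
    for i in range(n_iter):
        x_star = x + step * rng.standard_normal()  # x* ~ Q(x_i, .), symmetric
        # accept with probability min(1, gamma(x*) / gamma(x_i))
        if np.log(rng.uniform()) < log_gamma(x_star) - log_gamma(x):
            x = x_star
        chain[i] = x
    return chain

rng = np.random.default_rng(5)
chain = metropolis_hastings(lambda x: -0.5 * x**2, 0.0, 20_000, 1.0, rng)
burn_in = 1_000
post_mean = chain[burn_in:].mean()  # should approach E[x] = 0
```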
Parameter inference

Estimate the posterior expectation of a function ϕ of the joint path X_{1:T} and parameters θ of an intractable S(P)DE

dX = f_θ(X) dt + σ_θ(X) dW,   X_0 ∼ µ_θ,

given noisy partial observations Y_n ∼ g_θ(X_n, ·), n = 1, …, T.

Aim: estimate E[ϕ(θ, X_{0:T}) | y_{1:T}], where y_{1:T} := {y_1, …, y_T}.

The hidden process {X_n} is a Markov chain. Discretize with resolution h and denote the transition kernel F_{θ,h}(x_{p−1}, dx_p); this kernel can be simulated from, but its density cannot be evaluated.
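Simulating from this state-space model can be sketched as follows: the hidden SDE is advanced at resolution h between observation times (this is a draw from F_{θ,h}), and observations are emitted at integer times. The Gaussian observation density g_θ and the coefficients are illustrative assumptions, not fixed by the slides.

```python
import numpy as np

def simulate_state_space(f, sigma, x0, T, h, obs_std, rng):
    """Simulate the hidden Euler path at resolution h and observations
    Y_n = X_n + eps_n, eps_n ~ N(0, obs_std^2), at times n = 1..T."""
    steps_per_obs = int(round(1.0 / h))
    x, ys = x0, []
    for n in range(T):
        for _ in range(steps_per_obs):  # simulate F_{theta,h}(x_{p-1}, dx_p)
            x = x + h * f(x) + np.sqrt(h) * sigma(x) * rng.standard_normal()
        ys.append(x + obs_std * rng.standard_normal())
    return np.array(ys)

rng = np.random.default_rng(6)
y = simulate_state_space(lambda x: -x, lambda x: 0.5, 1.0,
                         T=10, h=0.02, obs_std=0.1, rng=rng)
```

Note that the code only ever simulates forward through F_{θ,h}; at no point does it evaluate a transition density, which is the constraint that motivates the particle methods below.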
Return to ML (SDE, for simplicity)

The joint measure (suppressing the fixed y_p in the notation) is

Π_h(dθ, dx_{0:n}) ∝ Π(dθ) µ_θ(dx_0) Π_{p=1}^n g_θ(x_p, y_p) F_{θ,h}(x_{p−1}, dx_p).

For +∞ > h_0 > · · · > h_L > 0, we would like to compute

E_{Π_{h_L}}[ϕ(θ, X_{0:n})] = Σ_{l=0}^L { E_{Π_{h_l}}[ϕ(θ, X_{0:n})] − E_{Π_{h_{l−1}}}[ϕ(θ, X_{0:n})] },

where E_{Π_{h_{−1}}}[·] := 0.