Fast Bayesian optimal experimental design and its applications

Quan Long

Joint work with Chaouki Issaid, Mohammad Motamed (UNM), Marco Scavino, Raul Tempone and Suojin Wang (TAMU)

SRI Center for Uncertainty Quantification in Computational Science and Engineering, King Abdullah University of Science and Technology, KSA

January 9, 2015, SRI UQ Annual Meeting
Introduction

Experimental design is important when resources are limited; for example, the total cost of a single onshore oil well can be 1-1.5 million USD.
Introduction

We first consider a linear regression model:
\[ Y = X\theta + \epsilon. \]
The simple least-squares estimate and its covariance are
\[ \hat{\theta} = (X^T X)^{-1} X^T Y, \qquad \mathrm{Cov}(\hat{\theta}) = \Sigma = (X^T X)^{-1}. \]
We want $(X^T X)^{-1}$ to be as "small" as possible.
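As a concrete sketch (not from the slides; the design matrix, design points and parameters below are made up), the least-squares estimate and its covariance in NumPy, assuming unit noise variance as above:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical 1-D design: regressors 1, xi, xi^2 evaluated at chosen
    # design points xi (the experimental setup).
    xi = np.linspace(0.0, 1.0, 20)
    X = np.column_stack([np.ones_like(xi), xi, xi**2])

    theta0 = np.array([1.0, -2.0, 0.5])          # "true" parameters
    y = X @ theta0 + rng.standard_normal(len(xi))  # unit-variance Gaussian noise

    # Least-squares estimate and covariance (noise variance = 1 assumed)
    XtX = X.T @ X
    theta_hat = np.linalg.solve(XtX, X.T @ y)
    Sigma = np.linalg.inv(XtX)                   # Cov(theta_hat) = (X^T X)^{-1}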
Introduction

Alphabetic optimality (a numerical sketch follows the list):
- A-optimality: minimize the trace of the covariance matrix, $\mathrm{tr}(\Sigma)$.
- c-optimality: minimize the variance $\beta^T \Sigma\, \beta$ of a predefined linear combination $\beta^T \theta$ of the parameters.
- D-optimality: minimize the determinant of the covariance matrix, $|\Sigma|$.
- E-optimality: minimize the maximum eigenvalue of the covariance matrix, $\lambda_{\max}(\Sigma)$.
- Entropy-based expected information gain in a Bayesian setting.
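Continuing the sketch above, the alphabetic criteria are simple matrix functionals of Σ (the vector beta is a hypothetical choice):

    # Alphabetic optimality criteria for Sigma = (X^T X)^{-1}
    A_crit = np.trace(Sigma)                   # A-optimality: trace
    D_crit = np.linalg.det(Sigma)              # D-optimality: determinant
    E_crit = np.linalg.eigvalsh(Sigma).max()   # E-optimality: largest eigenvalue

    beta = np.array([0.0, 1.0, 0.0])           # predefined linear combination
    c_crit = beta @ Sigma @ beta               # c-optimality: Var(beta^T theta_hat)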
Major Notations

- $p(\cdot)$: probability density function
- $\theta$: unknown parameter vector
- $\theta_0$: the $d$-dimensional vector of the "true" parameters used to generate the synthetic data
- $\xi$: the vector of control parameters, also known as the experimental setup
- $g$: the deterministic model
- $y_i$: the $i$-th observation vector
- $\bar{y} = \{y_i\}_{i=1}^{M}$: a set of observation vectors
- $\epsilon_i$: the additive independent and identically distributed (i.i.d.) measurement noise
Bayesian framework for experimental design and expected information gain

Prior of parameters: $p(\theta)$.

Posterior (post-experimental) of parameters, by Bayes' theorem:
\[ p(\theta \mid \bar{y}, \xi) = \frac{p(\bar{y} \mid \theta, \xi)\, p(\theta)}{p(\bar{y})}. \]

K-L divergence (information gain) between prior and posterior, measuring the usefulness of an experiment:
\[ D_{KL} := \int_{\Theta} \log\left( \frac{p(\theta \mid \bar{y}, \xi)}{p(\theta)} \right) p(\theta \mid \bar{y}, \xi)\, d\theta \]
(if $p(\theta \mid \bar{y}) = p(\theta)$, then $D_{KL} = 0$).

Expected information gain:
\[ I(\xi) = \int D_{KL}\, p(\bar{y} \mid \xi)\, d\bar{y}. \]
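Aside (not on the slides): when prior and posterior are both Gaussian, $D_{KL}$ has a closed form; this is the quantity that the Laplace approximations later in the talk reduce to. A minimal sketch:

    import numpy as np

    def kl_gaussian(mu0, S0, mu1, S1):
        """KL divergence D_KL( N(mu0, S0) || N(mu1, S1) ) in d dimensions."""
        d = len(mu0)
        S1_inv = np.linalg.inv(S1)
        dmu = mu1 - mu0
        return 0.5 * (np.trace(S1_inv @ S0) + dmu @ S1_inv @ dmu - d
                      + np.log(np.linalg.det(S1) / np.linalg.det(S0)))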
Double-loop Monte Carlo

The expected information gain can be rearranged as follows:
\[ I = \int_{\Theta} \int_{\mathcal{Y}} \log\left( \frac{p(\bar{y} \mid \theta)}{p(\bar{y})} \right) p(\bar{y} \mid \theta)\, d\bar{y}\, p(\theta)\, d\theta. \]
This integral can be evaluated using Monte Carlo sampling [Ryan, 2003], [Huan and Marzouk, 2011]:
\[ I_{DLMC} = \frac{1}{N_o} \sum_{i=1}^{N_o} \log\left( \frac{p(\bar{y}^i \mid \theta^i)}{p(\bar{y}^i)} \right), \]
where $\theta^i$ is drawn from $p(\theta)$ and $\bar{y}^i$ is drawn from $p(\bar{y} \mid \theta^i)$. The name "double loop" comes from the nested Monte Carlo sum used to evaluate the marginal density:
\[ p(\bar{y}^i) = \int_{\Theta} p(\bar{y}^i \mid \theta)\, p(\theta)\, d\theta \approx \frac{1}{N_i} \sum_{j=1}^{N_i} p(\bar{y}^i \mid \theta^j). \]
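A minimal sketch of the double-loop estimator (illustrative; the function and argument names are our own, and i.i.d. Gaussian noise with variance sigma2 per output component is assumed):

    import numpy as np

    def dlmc_eig(g, sample_prior, xi, M, sigma2, N_outer, N_inner):
        """Double-loop Monte Carlo estimate of the expected information gain I(xi).

        g(theta, xi) is the forward model, sample_prior() draws one theta from
        p(theta), and the M repeated observations have i.i.d. N(0, sigma2) noise.
        """
        def log_like(ybar, theta):
            # Unnormalized log p(ybar | theta); the Gaussian normalizing
            # constant cancels in the ratio p(ybar | theta) / p(ybar) below.
            r = ybar - g(theta, xi)
            return -0.5 * np.sum(r * r) / sigma2

        eig = 0.0
        for _ in range(N_outer):
            theta = sample_prior()                    # outer sample from the prior
            out = np.atleast_1d(g(theta, xi))
            ybar = out + np.sqrt(sigma2) * np.random.randn(M, out.size)
            ll = log_like(ybar, theta)
            # Inner loop: Monte Carlo estimate of the evidence p(ybar).
            inner = np.array([log_like(ybar, sample_prior())
                              for _ in range(N_inner)])
            m = inner.max()                           # stabilized log-mean-exp
            log_evidence = m + np.log(np.mean(np.exp(inner - m)))
            eig += ll - log_evidence
        return eig / N_outer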
Double-loop Monte Carlo

\[ \mathrm{Bias}(I_{DLMC}) = E(I_{DLMC} - I) = O\left(\frac{1}{N_i}\right), \qquad \mathrm{Var}(I_{DLMC}) = O\left(\frac{1}{N_o}\right). \]
Requiring the mean squared error to meet a tolerance,
\[ \mathrm{Var}(I_{DLMC}) + \mathrm{Bias}(I_{DLMC})^2 = tol^2, \]
forces $N_i = O(tol^{-1})$ and $N_o = O(tol^{-2})$, so the total work is
\[ N_o \times N_i = O\left(tol^{-3}\right). \]
Laplace approximation of I(ξ) for determined models

Laplace approximation:
\[ \int \exp[Mf(x)]\, dx = \sqrt{\frac{2\pi}{M\, |f''(x_0)|}}\, \exp[Mf(x_0)] \left( 1 + O\left(\frac{1}{M}\right) \right). \]
Hint: expand $f$ around its maximizer $x_0$ (where $f'(x_0) = 0$):
\[ f(x) = f(x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + O\left(|x - x_0|^3\right). \]
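A quick numerical check of the formula (not from the slides), with the hypothetical choice f(x) = -cosh(x), whose maximum sits at x_0 = 0 with f(0) = -1 and f''(0) = -1:

    import numpy as np

    M = 50.0
    x = np.linspace(-4.0, 4.0, 200001)
    # Brute-force Riemann sum of the integral on the left-hand side
    quad = np.sum(np.exp(M * -np.cosh(x))) * (x[1] - x[0])
    # Leading-order Laplace term: sqrt(2*pi / (M * |f''(0)|)) * exp(M * f(0))
    laplace = np.sqrt(2.0 * np.pi / M) * np.exp(-M)
    print(quad, laplace, abs(quad - laplace) / quad)  # relative error ~ O(1/M)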
Laplace approximation of I(ξ) for determined models

Synthetic data model:
\[ y_i = g(\theta_0, \xi) + \epsilon_i, \quad i = 1, \dots, M. \]

[Figure 1: Posterior pdfs as M increases (M = 1, 5, 10): the posterior over θ becomes increasingly peaked.]
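The concentration effect in Figure 1 is easy to reproduce; a sketch with a made-up scalar model and a flat prior on a grid:

    import numpy as np

    rng = np.random.default_rng(0)
    theta_grid = np.linspace(0.5, 1.5, 501)
    theta0, xi, sigma = 1.0, 1.0, 0.2       # made-up truth, design, noise level

    def model(theta):                        # hypothetical scalar g(theta, xi)
        return theta**2 * xi

    for M in (1, 5, 10):
        data = model(theta0) + sigma * rng.standard_normal(M)
        loglike = np.array([-0.5 * np.sum((data - model(t))**2) / sigma**2
                            for t in theta_grid])
        post = np.exp(loglike - loglike.max())   # flat prior on the grid assumed
        post /= post.sum() * (theta_grid[1] - theta_grid[0])
        print(M, theta_grid[np.argmax(post)], post.max())  # peak narrows with M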
Laplace approximation of I(ξ) for determined models

A truncated Taylor expansion of $\log(p(\theta \mid \{y_i\}))$ leads to a normal distribution $N(\hat{\theta}, \Sigma)$.

Major result 1
\[ I = \int_{\Theta} \int_{\mathcal{Y}} \underbrace{\left[ -\frac{1}{2}\log\left((2\pi)^d\, |\Sigma|\right) - \frac{d}{2} - h(\hat{\theta}) - \frac{\mathrm{tr}(\Sigma\, H_p(\hat{\theta}))}{2} \right]}_{D_{KL}} p(\bar{y} \mid \theta_0)\, d\bar{y}\, p(\theta_0)\, d\theta_0 + O\left(\frac{1}{M}\right). \]

Q. Long, M. Scavino, R. Tempone, S. Wang: Fast estimation of expected information gains for Bayesian experimental designs based on Laplace approximations, Computer Methods in Applied Mechanics and Engineering 259 (2013) 24-39.
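Major Result 1 replaces the inner Monte Carlo loop with closed-form quantities. A sketch of the resulting single-loop estimator, under assumptions we state in the docstring (in particular, we read h as the log-prior at the mode and drop the O(1/M) trace term; the function and argument names are ours):

    import numpy as np

    def laplace_eig(jac_g, log_prior, sample_prior, M, Sigma_eps_inv, N):
        """Single-loop estimate of I based on Major Result 1 (a sketch).

        Assumptions: theta_hat is replaced by theta_0 (the shift is higher
        order); Sigma is taken in its Gauss-Newton form
            Sigma ~ (M * J^T Sigma_eps^{-1} J)^{-1},  J = jac_g(theta_0);
        h(theta_hat) is read as log p(theta_hat); the trace term is dropped.
        """
        acc = 0.0
        for _ in range(N):
            theta0 = sample_prior()
            d = theta0.size
            J = jac_g(theta0)
            Sigma = np.linalg.inv(M * (J.T @ Sigma_eps_inv @ J))
            _, logdet = np.linalg.slogdet(Sigma)
            acc += (-0.5 * (d * np.log(2 * np.pi) + logdet)
                    - 0.5 * d - log_prior(theta0))
        return acc / N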
Under-determined models

So far, the results are useful when the Laplace approximation can be applied, i.e., when a single dominant mode exists.

Question: what about cases where a non-informative manifold exists?

Example 1:
\[ g = (\theta_1^2 + \theta_2^2)^3\, \xi^2 + (\theta_1^2 + \theta_2^2) \exp\left[ -|0.2 - \xi| \right]. \]

Example 2: [Figure 2: A cantilever beam.]
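In Example 1 the model depends on θ only through θ_1² + θ_2², so every circle of constant radius is non-informative: the data cannot distinguish points on it. A quick check (illustrative):

    import numpy as np

    def g(theta1, theta2, xi):
        # Example 1 from the slide
        r2 = theta1**2 + theta2**2
        return r2**3 * xi**2 + r2 * np.exp(-np.abs(0.2 - xi))

    phi = np.linspace(0.0, 2.0 * np.pi, 9)
    vals = g(0.7 * np.cos(phi), 0.7 * np.sin(phi), xi=0.5)
    print(np.ptp(vals))   # ~0: g is constant on the circle theta_1^2 + theta_2^2 = 0.49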
The non-informative manifold

[Figure: the non-informative manifold T(θ_0) and the surrounding region Ω_M(θ_0) in the (θ_1, θ_2) plane, with the informative direction s crossing the manifold.]
The definition of non-informative manifold

The definition of the manifold and of a small region containing it²:
\[ T(\theta_0) := \left\{ \theta \in \Theta \subset \mathbb{R}^d : g(\theta) - g(\theta_0) = 0 \right\}, \]
\[ \Omega_M(\theta_0) := \left\{ \theta \in \mathbb{R}^d : \mathrm{dist}(\theta, T(\theta_0)) \le \ell_0 M^{-\alpha} \right\}. \]

² The volume of Ω_M(θ_0) contracts to zero at a rate slower than the inverse square root of the number of replicate experiments M, i.e., α ∈ (0, 0.5).
Local reparameterization

- The diffeomorphism mapping: $f : \Omega_{M,s,t} \to \Omega_M$.
- Cost function: $F(\theta) := \frac{1}{2}\, (g(\theta) - g(\theta_0))^T\, \Sigma_{\epsilon}^{-1}\, (g(\theta) - g(\theta_0))$.
- Hessian of $F$: $H(f(0,t)) = [U\; V]\, \Lambda\, [U\; V]^T$.
- Local coordinate $s$: $s = U^T (\theta - f(0,t))$.
- Prior weight function: $p(s,t) := p_{\Theta}(f(s,t))\, |J|$.
- Posterior weight function: $p(s,t \mid \bar{y}) := p_{\Theta}(f(s,t) \mid \bar{y})\, |J|$.

Due to Bayes' theorem, we have
\[ p(s,t \mid \bar{y}) = \frac{p(\bar{y} \mid s,t)\, p(s,t)}{p(\bar{y})} \quad \text{for } (s,t) \in \Omega_{M,s,t}. \]
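A sketch of how the split into informative and non-informative directions can be computed in practice (names are ours; the Gauss-Newton form of the Hessian follows from the cost function F above):

    import numpy as np

    def split_coordinates(J, Sigma_eps_inv, rel_tol=1e-10):
        """Split the local parameter directions at a point on the manifold.

        H = J^T Sigma_eps^{-1} J is the Gauss-Newton Hessian of F there.
        Eigenvectors with (numerically) nonzero eigenvalues form U, the
        informative directions carrying the coordinate s; the near-null
        eigenvectors form V, tangent to the non-informative manifold.
        """
        H = J.T @ Sigma_eps_inv @ J
        lam, W = np.linalg.eigh(H)            # eigenvalues in ascending order
        keep = lam > rel_tol * lam.max()
        U, V = W[:, keep], W[:, ~keep]
        return U, V, lam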
Change of coordinates for the K-L divergence (D_KL)

Approximated K-L divergence using the local coordinates $t$ and $s$:
\[ D_{KL}(\bar{y}) = \int_{T_t} \int_{[-\ell_0 M^{-\alpha},\, \ell_0 M^{-\alpha}]} \log\left( \frac{p(s,t \mid \bar{y})}{p(s,t)} \right) p(s \mid t, \bar{y})\, p(t \mid \bar{y})\, ds\, dt + O_P\left( e^{-M \ell_0 \delta} \right). \]
Laplace approximation for the conditional information gain

Gaussian approximations:
\[ \tilde{p}(s \mid t, \bar{y}) = \frac{1}{(\sqrt{2\pi})^r\, |\Sigma_{s|t}|^{1/2}} \exp\left[ -\frac{(s - \hat{s})^T\, \Sigma_{s|t}^{-1}\, (s - \hat{s})}{2} \right], \]
\[ \tilde{p}(s, t \mid \bar{y}) = p(\hat{s}, t \mid \bar{y}) \exp\left[ -\frac{(s - \hat{s})^T\, \Sigma_{s|t}^{-1}\, (s - \hat{s})}{2} \right], \]
\[ \tilde{p}(s, t) = p(\hat{s}, t) \exp\left[ \nabla \log p(\hat{s}, t)(s - \hat{s}) + \frac{(s - \hat{s})^T\, H_p(\hat{s}, t)\, (s - \hat{s})}{2} \right]. \]

The information gain $D_{KL}$ can be approximated by
\[ D_{KL} = \int_{T_t} \underbrace{\left[ \int_{[-\ell_0 M^{-\alpha},\, \ell_0 M^{-\alpha}]} \log\left( \frac{\tilde{p}(s,t \mid \bar{y})}{\tilde{p}(s,t)} \right) \tilde{p}(s \mid t, \bar{y})\, ds \right]}_{D_{s|t}} p(t \mid \bar{y})\, dt + O_P\left(\frac{1}{M}\right), \]
with
\[ \int_{T_t} D_{s|t}\, dt = \int_{T_t} \left[ -\log\left( p(\hat{s}, t)\, |\Sigma_{s|t}|^{1/2} \right) - \frac{r}{2}\log(2\pi) - \frac{r}{2} \right] dt + O_P\left(\frac{1}{M}\right). \]
Laplace approximation of the expected information gain for under-determined models

Major result 2
The expected information gain can be expressed as
\[ I = \int_{\Theta} \int_{\mathcal{Y}} \left[ \int_{T_t} \left( -\log\left( p(\hat{s}, t)\, |\Sigma_{s|t}|^{1/2} \right) - \frac{r}{2}\log(2\pi) - \frac{r}{2} \right) dt \right] p(\bar{y} \mid \theta_0)\, p(\theta_0)\, d\bar{y}\, d\theta_0 + O\left(\frac{1}{M}\right), \]
where the error $O\left(\frac{1}{M}\right)$ is dominated by the standard Laplace approximation in the $s$ direction.

Q. Long, M. Scavino, R. Tempone, S. Wang: A Laplace Method for Under-Determined Bayesian Optimal Experimental Designs. Computer Methods in Applied Mechanics and Engineering 285 (2015) 849-876.
Simplification of the integration over the manifold T_t

Approximation of the conditional covariance matrix (by the Woodbury formula):
\[ \Sigma_{s|t} = \tilde{\Sigma}_{s|t} + O_P\left( \frac{1}{M\sqrt{M}} \right), \]
\[ \tilde{\Sigma}_{s|t} = \frac{1}{M} \left[ U^T \left( J_g(f(\hat{s}, t)) \right)^T \Sigma_{\epsilon}^{-1}\, J_g(f(\hat{s}, t))\, U \right]^{-1}. \]
Note that $|\tilde{\Sigma}_{s|t}|$ is independent of $t$ for a given value of $s$.
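Combining the coordinate-split sketch from earlier with this simplification, the leading-order conditional covariance is a small r x r matrix (illustrative):

    import numpy as np

    def sigma_s_given_t(J, U, Sigma_eps_inv, M):
        """Leading-order conditional covariance in the informative coordinates:

            Sigma_tilde_{s|t} = (1/M) * [U^T J^T Sigma_eps^{-1} J U]^{-1},

        an r x r matrix whose determinant enters Major Result 2;
        here J stands for J_g(f(s_hat, t)).
        """
        A = U.T @ (J.T @ Sigma_eps_inv @ J) @ U
        return np.linalg.inv(A) / M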