Conditional Expectation as the Basis for Bayesian Updating


  1. Conditional Expectation as the Basis for Bayesian Updating. Hermann G. Matthies, Bojana V. Rosić, Elmar Zander, Alexander Litvinenko, Oliver Pajonk. Institute of Scientific Computing, TU Braunschweig, Brunswick, Germany. wire@tu-bs.de, http://www.wire.tu-bs.de (Cond-Exp.tex,v 2.9 2017/07/18 00:35:08 hgm Exp)

  2. 2 Overview 1. BIG DATA 2. Parameter identification 3. Stochastic identification — Bayes’s theorem 4. Conditional probability and conditional expectation 5. Updating — filtering.

  3. 3 Representation of knowledge Data from measurements, sensors, observations ⇒ one form of knowledge about a system. ‘Big Data’ considers only data — looking for patterns, interpolating, etc. Mathematical / computational models of a system represent another form of knowledge — ‘structural’ knowledge — about a system. These models are often generated from general physical laws (e.g. conservation laws), a very compressed form of knowledge. These two views of a system are not in competition; they are complementary. The challenge is to combine these forms of knowledge in the form of a synthesis. Knowledge may be uncertain.

  4. 4 Big Data, 16th century onwards: Tycho Brahe (1546–1601), data; Johannes Kepler (1571–1630), description; Isaac Newton (1643–1727), understanding; Pierre-Simon Laplace (1749–1827), perfection. I. Newton: “The latest authors, like the most ancient, strove to subordinate the phenomena of nature to the laws of mathematics.” Illustration: Kepler’s 2nd law (adapted from M. Ortiz).

  5. 5 BIG DATA Mathematically speaking, big-data algorithms (feature / pattern recognition) are regression (generalised interpolation) methods. They are often based on deep artificial neural networks (deep ANNs), combining many inputs (= high-dimensional data). Deep networks are connected to sparse tensor decompositions (buzzword: deep learning). Although often spectacularly successful, as a knowledge representation they make it difficult to extract insight. But there is a connection between such regression and Bayesian updating.

  6. 6 Inference Our uncertain knowledge about some situation is described by probabilities. Now we obtain new information. How does it change our knowledge — the probabilistic description? This was answered by T. Bayes and P.-S. Laplace more than 250 years ago: Thomas Bayes (1701–1761) and Pierre-Simon Laplace (1749–1827).

  7. 7 Synopsis of Bayesian inference We have some knowledge about an event A, but it cannot be observed directly. After some new information B (an observation, a measurement), our knowledge has to be made consistent with the new information, i.e. we are looking for the conditional probabilities P(A|B). The idea is to change our present model by just so much — as little as possible — that it becomes consistent. For this we have to predict — with our present knowledge / model — the probability of all possible observations and compare with the actual observation.
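
As a minimal numerical illustration of this update (not from the slides; the event A, the observation B and all probabilities below are invented), one predicts the probability of the observation under the present model and then conditions on what was actually observed:

    # Minimal sketch of P(A|B) = P(B|A) P(A) / P(B); all numbers are invented.
    p_A = 0.3                    # prior probability of A
    p_B_given_A = 0.8            # probability of observing B if A holds
    p_B_given_notA = 0.1         # probability of observing B if A does not hold

    # predict the probability of the observation under the present model
    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

    # condition on the actual observation B
    p_A_given_B = p_B_given_A * p_A / p_B
    print(f"P(A|B) = {p_A_given_B:.3f}")    # the prior 0.3 becomes roughly 0.774

If B were equally likely under A and not-A, the prior would stay unchanged; the model is altered only as much as the observation demands.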

  8. 8 Model inverse problem: 2-D model of an aquifer with sources, in- and out-flow, and Dirichlet boundary conditions (geometry figure omitted). Governing model equation:
     $\varrho\,\partial u/\partial t - \nabla\cdot(\kappa\,\nabla u) = f$ in $G \subset \mathbb{R}^d$.
     Parameter $q = \log \kappa$. The conductivity field $\kappa$, the initial condition $u_0$, and the state $u(t)$ may be unknown. They have to be determined from observations $Y(q; u)$.
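
To make the forward map from the parameter to the observations concrete, here is a small 1-D, stationary sketch of such a model; the grid size, conductivity field, source, and patch locations are invented, and the model in the talk is 2-D and time dependent.

    import numpy as np

    # 1-D stationary analogue of the aquifer model:
    #   -d/dx( kappa(x) du/dx ) = f(x)  on (0, 1),  u(0) = u(1) = 0.
    n = 50                                     # interior grid points
    x = np.linspace(0.0, 1.0, n + 2)           # grid including the boundary
    h = x[1] - x[0]

    q = np.sin(2 * np.pi * x)                  # parameter q = log(kappa), invented
    kappa = np.exp(q)                          # conductivity, always positive
    f = np.ones(n)                             # source term

    # standard finite-volume / finite-difference assembly
    kappa_half = 0.5 * (kappa[:-1] + kappa[1:])        # kappa at cell interfaces
    main = (kappa_half[:-1] + kappa_half[1:]) / h**2   # diagonal entries
    off = -kappa_half[1:-1] / h**2                     # off-diagonal entries
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

    u = np.linalg.solve(A, f)                  # interior solution u(x)

    # measurement operator Y(q; u): observe u only on a few patches
    obs_idx = np.array([5, 20, 35])            # hypothetical patch locations
    eta = u[obs_idx]
    print(eta)

Observing $u$ only on a few patches is one source of the ill-posedness discussed on slide 15.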

  9. 9 A possible realisation of κ(x, ω) (figure omitted).

  10. 10 Measurement patches: four configurations with 447, 239, 120, and 10 measurement patches on the domain [−1, 1] × [−1, 1] (figures omitted).

  11. 11 Convergence plot of updates: relative error ε_a (logarithmic axis, from 10⁰ down to 10⁻²) versus the number of sequential updates (0 to 4), for 447, 239, 120, 60, and 10 measurement patches (figure omitted).

  12. 12 Forecast and assimilated pdfs: forecast (κ_f) and assimilated (κ_a) probability density functions (pdfs) for κ at a point where κ_t = 2 (figure omitted).

  13. 13 Setting for identification General idea: we observe / measure a system whose structure we know in principle. The system behaviour depends on some quantities (parameters) which we do not know ⇒ uncertainty. We model (the uncertainty in) our knowledge in a Bayesian setting: as a probability distribution on the parameters. We start with what we know a priori, then perform a measurement. This gives new information, with which we update our knowledge (identification). The update in the probabilistic setting works with conditional probabilities ⇒ Bayes’s theorem. Repeated measurements lead to better identification.

  14. 14 Mathematical formulation of model Consider an operator equation for a physical system modelled by $A$, with $u \in U$:
     $\mathrm{d}u + A(u; q)\,\mathrm{d}t = g\,\mathrm{d}t + B(u; q)\,\mathrm{d}W$,
     where $U$ is the space of states, $g$ a forcing, $W$ noise, and $q \in Q$ the unknown parameters. Well-posed problem: for $q$, $g$ and initial condition $u(t_0) = u_0$ there is a unique solution $u(t)$, given by the flow or solution operator
     $S : (u_0, t_0, q, g, W, t) \mapsto u(t; q) = S(u_0, t_0, q, g, W, t)$.
     Set the extended state $\xi = (u, q) \in X = U \times Q$ and advance from $\xi_{n-1} = (u_{n-1}, q_{n-1})$ at time $t_{n-1}$ to $\xi_n = (u_n, q_n)$ at $t_n$:
     $\xi_n = (u_n, q_n) = (S(u_{n-1}, t_{n-1}, q_n, g, W, t_n),\, q_n) =: f(\xi_{n-1}, w_{n-1})$.
     This is the model for the system observed at times $t_n$. It applies also to the stationary case $A(u; q) = g$.
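
As a sketch of the transition map $f$, the following toy example advances a scalar state by one Euler–Maruyama step and carries the parameter along unchanged; the scalar dynamics and all numbers are invented stand-ins for the PDE model of slide 8.

    import numpy as np

    # Extended-state transition xi_n = f(xi_{n-1}, w_{n-1}) for the toy model
    #   du = -exp(q) * u dt + sigma dW   (invented scalar dynamics).
    rng = np.random.default_rng(0)
    dt, sigma = 0.1, 0.05

    def f(xi, w):
        """One Euler-Maruyama step of the extended state xi = (u, q)."""
        u, q = xi
        u_next = u - np.exp(q) * u * dt + sigma * w   # role of the solution operator S
        return np.array([u_next, q])                  # the parameter is carried along

    xi = np.array([1.0, np.log(0.5)])                 # initial (u_0, q)
    for n in range(10):
        w = rng.standard_normal() * np.sqrt(dt)       # Brownian increment w_{n-1}
        xi = f(xi, w)
    print(xi)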

  15. 15 Mathematical formulation of observation Measurement operator $Y$ with values in the observation space $Y$:
     $\eta_n = Y(u_n; q) = Y(S(u_{n-1}, t_{n-1}, q, g, W, t_n); q)$.
     But observed at time $t_n$, it is a noisy $y_n$ with noise $\epsilon_n$:
     $y_n = H(\eta_n, \epsilon_n) = H(Y(u_n; q), \epsilon_n) =: h(\xi_n, \epsilon_n) = h(f(\xi_{n-1}, w_{n-1}), \epsilon_n)$.
     For given $g$, $w$, the measurement $\eta = Y(u(q); q)$ is just a function of $q$. This function is usually not invertible ⇒ ill-posed problem; the measurement $\eta$ does not contain enough information. The parameters $q$ and the initial state $u_0$ are uncertain and modelled as RVs: $q \in \mathcal{Q} = Q \otimes \mathcal{S}$ ⇒ $u \in \mathcal{U} = U \otimes \mathcal{S}$, with e.g. $\mathcal{S} = L_2(\Omega, \mathbb{P})$ a space of RVs. The Bayesian setting allows updating of the information about $\xi = (u, q)$. The problem of updating becomes well-posed.
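
A sketch of such a noisy, non-invertible observation; the measurement operator (here the square of the state, so different values of the extended state give the same observation) and the additive noise model standing in for $H$ are invented for illustration.

    import numpy as np

    # Noisy observation y_n = h(xi_n, eps_n) = H(Y(u_n; q), eps_n).
    rng = np.random.default_rng(1)
    noise_std = 0.01

    def Y(u, q):
        """Measurement operator: observes only a scalar functional of the state."""
        return u ** 2                      # not invertible: many (u, q) give the same value

    def h(xi, eps):
        """Noisy observation of the extended state xi = (u, q)."""
        u, q = xi
        return Y(u, q) + eps               # additive noise model H(eta, eps) = eta + eps

    xi_n = np.array([0.8, np.log(0.5)])    # some extended state (u_n, q)
    y_hat = h(xi_n, rng.standard_normal() * noise_std)
    print(y_hat)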

  16. 16 Mathematical formulation of filtering We want to track the extended state $\xi_n$ by a tracking equation for a RV $x_n$ through observations $\hat{y}_n$.
     • Prediction / forecast state is a RV $x_{n,f} = f(x_{n-1}, w_{n-1})$;
     • forecast observation is a RV $y_n = h(x_{n,f}, \epsilon_n)$, the actual observation is $\hat{y}_n$;
     • updated / assimilated $x_n = x_{n,f} + \Xi(x_{n,f}, y_n, \hat{y}_n)$;
     • hopefully $x_n \approx \xi_n$; the update map $\Xi$ has to be determined.
     $x_{n,i} := \Xi(x_{n,f}, y_n, \hat{y}_n)$ is called the innovation. We concentrate on one step from forecast to assimilated variables:
     • forecast state $x_f := x_{n,f}$, forecast observation $y_f := y_n$;
     • actual observation $\hat{y}$, and assimilated variable $x_a := x_f + \Xi(x_f, y_f, \hat{y}) = x_n = x_{n,f} + \Xi(x_{n,f}, y_{n,f}, \hat{y}_n)$.
     This is the filtering or update equation; a numerical sketch of one such step is given below.
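
The sketch below performs one forecast-to-assimilation step with a linear update map $\Xi(x_f, y_f, \hat{y}) = K(\hat{y} - y_f)$, with a Kalman-gain-type $K$ estimated from an ensemble; this linear map is only one possible choice of $\Xi$, used here for illustration, and the forecast ensemble, observation operator, and noise level are invented.

    import numpy as np

    # One step x_a = x_f + Xi(x_f, y_f, y_hat) with the linear map Xi = K (y_hat - y_f).
    rng = np.random.default_rng(2)
    N = 1000

    x_f = rng.normal(loc=2.0, scale=0.5, size=N)    # forecast ensemble of x
    eps = rng.normal(scale=0.1, size=N)             # observation-noise samples
    y_f = x_f ** 2 + eps                            # forecast observation h(x_f, eps)
    y_hat = 4.4                                     # the actual measurement

    # gain from ensemble (co)variances: K = cov(x_f, y_f) / var(y_f)
    K = np.cov(x_f, y_f)[0, 1] / np.var(y_f)

    x_a = x_f + K * (y_hat - y_f)                   # assimilated ensemble
    print(x_f.mean(), x_a.mean())                   # the mean moves towards sqrt(4.4)

The whole ensemble is shifted by a multiple of the innovation $\hat{y} - y_f$, which is exactly the structure of the update equation above.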

  17. 17 Setting for updating Knowledge prior to the new observation is also called the forecast: the state $u_f \in \mathcal{U} = U \otimes \mathcal{S}$ and the parameters $q_f \in \mathcal{Q} = Q \otimes \mathcal{S}$ are modelled as random variables (RVs), as are the extended state $x_f = (u_f, q_f) \in \mathcal{X} = X \otimes \mathcal{S}$ and the measurement $y(x_f, \varepsilon) \in \mathcal{Y} = Y \otimes \mathcal{S}$. Then an observation $\hat{y}$ is performed and compared to the predicted measurement $y(x_f, \varepsilon)$. Bayes’s theorem gives only the probability distribution of the posterior or assimilated extended state $x_a$. Here we want more: a filter $x_a := x_f + \Xi(x_f, y_f, \hat{y})$.

  18. 18 Using Bayes’s theorem Classically, Bayes’s theorem gives the conditional probability
     $P(I_x \mid M_y) = \frac{P(M_y \mid I_x)\, P(I_x)}{P(M_y)}$ for $P(M_y) > 0$.
     Well-known special form with densities of the RVs $x$, $y$ (w.r.t. some background measure $\mu$):
     $\pi_{x|y}(x \mid y) = \frac{\pi_{xy}(x, y)}{\pi_y(y)} = \frac{\pi_{y|x}(y \mid x)\, \pi_x(x)}{Z_y}$,
     with the marginal density $Z_y := \pi_y(y) = \int_X \pi_{xy}(x, y)\, \mu(\mathrm{d}x)$ ($Z$ from German Zustandssumme) — only valid when $\pi_{xy}(x, y)$ exists. Problems / paradoxa appear when $P(M_y) = 0$ (and $P(M_y \mid I_x) = 0$), e.g. the Borel–Kolmogorov paradox. The problem is the limit $P(M_y) \to 0$, or when no joint density $\pi_{xy}(x, y)$ exists.
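
Where a joint density does exist, the density form can be evaluated numerically; the sketch below does this on a grid with an invented standard-normal prior, a Gaussian likelihood, and an invented observed value.

    import numpy as np

    # Grid evaluation of pi(x|y) = pi(y|x) pi_x(x) / Z_y; prior, likelihood and
    # the observed value are invented.
    x = np.linspace(-5.0, 5.0, 2001)
    dx = x[1] - x[0]

    prior = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)        # pi_x: standard normal
    y_obs = 1.5                                             # the observed value
    sigma = 0.5                                             # observation noise std
    lik = np.exp(-0.5 * ((y_obs - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    Z_y = np.sum(lik * prior) * dx                          # marginal density pi_y(y)
    posterior = lik * prior / Z_y                           # pi(x|y) on the grid

    print("posterior mean:", np.sum(x * posterior) * dx)

For this conjugate example the grid result can be checked against the closed-form posterior mean $y/(1 + \sigma^2) = 1.2$.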
