
Transport & Multilevel Approaches for Large-Scale PDE-Constrained Bayesian Inference
Robert Scheichl, Institute of Applied Mathematics & Interdisciplinary Center for Scientific Computing, Heidelberg University. Collaborators: K


1. Variational Bayes (as opposed to Metropolis-Hastings MCMC)
Aim: characterise the posterior distribution (density π_pos) analytically, at least approximately, for more efficient inference. This is a challenging task since:
- x ∈ R^d is typically high-dimensional (e.g., a discretised function),
- π_pos is in general non-Gaussian (even if π_pr and the observational noise are Gaussian),
- evaluations of the likelihood may be expensive (e.g., requiring the solution of a PDE).
Key tools – a playground for a numerical analyst: transport maps, optimisation, principal component analysis, model order reduction, hierarchies, sparsity, low-rank approximation.

2. Deterministic Couplings of Probability Measures
Core idea [Moselhy, Marzouk, 2012]: choose a reference distribution η (e.g., a standard Gaussian) and seek an invertible transport map T : R^d → R^d such that T♯η = π (push-forward). In principle, this enables exact (independent, unweighted) sampling! Approximately satisfying these conditions is still useful: preconditioning!

3. Variational Inference
Goal: sampling from a target density π(x). Given a reference density η, find an invertible map T̂ such that

    T̂ := argmin_T D_KL(T♯η ‖ π) = argmin_T D_KL(η ‖ T^{-1}♯π),

where

    D_KL(p ‖ q) := ∫ log( p(x)/q(x) ) p(x) dx          ... Kullback-Leibler divergence,
    T♯p(x) := p(T^{-1}(x)) |det ∇_x T^{-1}(x)|          ... push-forward of p.

Advantage of using D_KL: the normalising constant of π is not needed. Minimise over some suitable class 𝒯 of maps T (where ideally the Jacobian determinant det ∇_x T^{-1}(x) is easy to evaluate). To improve: enrich the class 𝒯, or use samples from the (approximate) push-forward as proposals for MCMC or in importance sampling (see below).

4. Many Choices ("Architectures") for T possible
Examples (list not comprehensive!):
   1. Optimal transport or Knothe-Rosenblatt rearrangement [Moselhy, Marzouk, 2012], [Marzouk, Moselhy, Parno, Spantini, 2016]
   2. Normalizing or autoregressive flows [Rezende, Mohamed, 2015] (and related methods in the ML literature)
   3. Kernel-based variational inference: Stein variational methods [Liu, Wang, 2016], [Detommaso, Cui, Spantini, Marzouk, RS, 2018], [Chen, Wu, Chen, O'Leary-Roseberry, Ghattas, 2019] – not today!
   4. Layers of low-rank maps [Bigoni, Zahm, Spantini, Marzouk, arXiv 2019]
   5. Layers of hierarchical invertible neural networks (HINT) [Detommaso, Kruse, Ardizzone, Rother, Köthe, RS, arXiv:1905.10687] – not today!
   6. Low-rank tensor approximation of the Knothe-Rosenblatt rearrangement [Dolgov, Anaya-Izquierdo, Fox, RS, 2019]

5. Approximation and Sampling of Multivariate Probability Distributions in the Tensor Train Decomposition [Dolgov, Anaya-Izquierdo, Fox, RS, 2019]

6. Variational Inference with Triangular Maps
In general, variational inference aims to find argmin_T D_KL(T♯η ‖ π). Note:

    D_KL(T♯η ‖ π) = − E_{u∼η}[ log π(T(u)) + log |det ∇T(u)| ] + const.

A particularly useful family 𝒯 are the Knothe-Rosenblatt triangular rearrangements (see [Marzouk, Moselhy, Parno, Spantini, 2016]):

    T(x) = [ T_1(x_1), T_2(x_1, x_2), …, T_d(x_1, x_2, …, x_d) ]^T          (= autoregressive flow in ML)

Then: log |det ∇T(u)| = Σ_k log ∂_{x_k} T_k.
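
A minimal numerical check of this triangular structure (illustrative map and values, not from the talk): for a lower-triangular map T, the Jacobian is triangular, so log |det ∇T(u)| is simply the sum of the logs of the diagonal partial derivatives.

    import numpy as np

    # Lower-triangular ("Knothe-Rosenblatt / autoregressive") map in d = 3 and a
    # numerical check that log|det grad T| = sum_k log dT_k/dx_k.
    def T(x):
        x1, x2, x3 = x
        return np.array([
            np.tanh(x1) + x1,            # T_1 depends on x_1 only
            x2 * np.exp(0.1 * x1),       # T_2 depends on x_1, x_2
            x3 + 0.5 * x1 * x2,          # T_3 depends on x_1, x_2, x_3
        ])

    def jacobian_fd(f, x, h=1e-6):
        """Forward-difference Jacobian, just for the check."""
        d = len(x)
        J = np.zeros((d, d))
        fx = f(x)
        for k in range(d):
            e = np.zeros(d); e[k] = h
            J[:, k] = (f(x + e) - fx) / h
        return J

    x = np.array([0.3, -1.2, 0.7])
    J = jacobian_fd(T, x)
    logdet_full = np.log(abs(np.linalg.det(J)))
    logdet_diag = np.sum(np.log(np.abs(np.diag(J))))
    print(logdet_full, logdet_diag)      # agree up to finite-difference error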

7. Knothe-Rosenblatt via Conditional Distribution Sampling
In fact, there exists a unique triangular map satisfying T♯η = π (for absolutely continuous η, π on R^d): conditional distribution sampling [Rosenblatt '52] (explicitly available!). Any density factorises into a product of conditional densities:

    π(x_1, …, x_d) = π_1(x_1) π_2(x_2 | x_1) ··· π_d(x_d | x_1, …, x_{d−1}),

so one can sample (up to normalisation, with a known scaling factor)

    x_k ∼ π_k(x_k | x_1, …, x_{k−1}) ∝ ∫ π(x_1, …, x_d) dx_{k+1} ··· dx_d.

1st step: produce a sample x_1^i via 1D CDF inversion from

    π_1(x_1) ∝ ∫ π(x_1, x_2, …, x_d) dx_2 ··· dx_d.

k-th step: given x_1^i, …, x_{k−1}^i, sample x_k^i via 1D CDF inversion from

    π_k(x_k | x_1^i, …, x_{k−1}^i) ∝ ∫ π(x_1^i, …, x_{k−1}^i, x_k, x_{k+1}, …, x_d) dx_{k+1} ··· dx_d.

Problem: a (d−k)-dimensional integration at the k-th step! Remedy: find an approximation π̃ ≈ π for which integration is cheap!
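
A minimal sketch of conditional distribution sampling for d = 2 on a tensor grid (the density below is illustrative, not the one from the talk). Already here, the marginal of x_1 requires integrating out the remaining variable; in general this becomes the (d−k)-dimensional integral mentioned above.

    import numpy as np

    # Conditional-distribution (Rosenblatt) sampling for d = 2 on a grid.
    n = 200
    x = np.linspace(-3.0, 3.0, n)
    dx = x[1] - x[0]
    X1, X2 = np.meshgrid(x, x, indexing="ij")
    P = np.exp(-0.5 * (X1**2 + (X2 - 0.5 * X1**2)**2))   # unnormalised pi(x1, x2)

    def sample_cdf_1d(pdf_vals, grid, u):
        """Invert the (discretised) CDF of an unnormalised 1D density at u in (0,1)."""
        cdf = np.cumsum(pdf_vals)
        cdf /= cdf[-1]
        return np.interp(u, cdf, grid)

    rng = np.random.default_rng(0)
    u1, u2 = rng.random(2)                      # reference sample (uniform here)
    pi1 = P.sum(axis=1) * dx                    # marginal of x1 (integral over x2)
    x1 = sample_cdf_1d(pi1, x, u1)
    i1 = np.argmin(np.abs(x - x1))              # nearest grid slice for the conditional
    x2 = sample_cdf_1d(P[i1, :], x, u2)         # conditional pi(x2 | x1)
    print(x1, x2)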

8. Low-rank Tensor Approximation of Distributions
Low-rank tensor decomposition ⇔ separation of variables. On a tensor grid with n points per direction (or n polynomial basis functions), the full tensor has O(n^d) entries, while a rank-one product needs only O(dn). Approximate:

    π(x_1, …, x_d) ≈ Σ_{|α|≤r} π^1_α(x_1) π^2_α(x_2) ··· π^d_α(x_d)

(each summand a tensor product; the sum a tensor decomposition). Many low-rank tensor formats exist [Kolda, Bader '09], [Hackbusch '12].

9. Conditional Distribution Sampler (with factorised distribution)
For the low-rank tensor approximation

    π(x) ≈ π̃(x) = Σ_{|α|≤r} π^1_α(x_1) · π^2_α(x_2) ··· π^d_α(x_d),

the k-th step of the CD sampler, given x_1^i, …, x_{k−1}^i, simplifies to

    π̃_k(x_k | x_1^i, …, x_{k−1}^i) ∝ Σ_{|α|≤r} π^1_α(x_1^i) ··· π^{k−1}_α(x_{k−1}^i) · π^k_α(x_k) · ∫ π^{k+1}_α(x_{k+1}) dx_{k+1} ··· ∫ π^d_α(x_d) dx_d

– repeated 1D integrals, linear in d! To sample (in each step): simple 1D CDF inversions – linear in d.
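
A minimal sketch of the k-th conditional for a rank-r separable density (random illustrative factors, not from the talk): only products of scalars and precomputable 1D integrals are needed, so the cost is linear in d.

    import numpy as np

    d, r, n = 6, 3, 100
    x = np.linspace(-1.0, 1.0, n)
    dx = x[1] - x[0]
    rng = np.random.default_rng(1)
    # factors[k][a] is the 1D function pi_a^{k+1} evaluated on the grid (kept positive)
    factors = np.exp(rng.standard_normal((d, r, n)) * 0.3)

    def conditional(k, x_prev_idx):
        """Unnormalised pi~_k(x_k | x_1..x_{k-1}), given grid indices of x_1..x_{k-1}."""
        w = np.ones(r)
        for j, idx in enumerate(x_prev_idx):          # scalars pi_a^j(x_j^i)
            w *= factors[j, :, idx]
        for j in range(k + 1, d):                     # precomputable 1D integrals
            w *= factors[j].sum(axis=1) * dx
        return factors[k].T @ w                       # sum over alpha of w_a * pi_a^k(x_k)

    print(conditional(2, [10, 50]).shape)             # (n,) -> ready for 1D CDF inversion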

10. Low-rank Decomposition (Two Variables)
Collect the discretised values of π(θ_1, θ_2) on an n × n grid into a matrix P:

    P(i, j) = Σ_{α=1}^r V_α(i) W_α(j) + O(ε),

with rank r ≪ n (exploiting structure, smoothness, …), so that mem(V) + mem(W) = 2nr ≪ n² = mem(P). The SVD provides the optimal ε for fixed r (it attains min_{V,W} ‖P − VW*‖_F²), but it requires all n² entries of P – and n^d in d dimensions!
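
A minimal sketch of the two-variable case (illustrative density): a truncated SVD of the discretised density gives the best rank-r factorisation in the Frobenius norm, at the price of forming all n² entries.

    import numpy as np

    n, r = 256, 5
    t = np.linspace(-3.0, 3.0, n)
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    P = np.exp(-0.5 * (T1**2 + 2.0 * (T2 - 0.3 * T1)**2))   # values of pi(theta1, theta2)

    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    P_r = (U[:, :r] * s[:r]) @ Vt[:r, :]                    # best rank-r approximation
    print(np.linalg.norm(P - P_r) / np.linalg.norm(P))      # small if the density is "low rank"
    print(2 * n * r, "vs", n * n)                           # memory: 2nr vs n^2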

11. Cross Algorithm (construct a low-rank factorisation from few entries)
Interpolation arguments show: for suitable index sets I, J ⊂ {1, …, n} with |I| = |J| = r, the cross decomposition

    P̃ := P(:, J) P(I, J)^{-1} P(I, :) ≈ P.

The maxvol principle gives the 'best' indices I, J [Goreinov, Tyrtyshnikov '01]:

    |det P(I, J)| = max_{Î, Ĵ} |det P(Î, Ĵ)|   ⇒   ‖P − P̃‖_C ≤ (r + 1) min_{rank P̂ = r} ‖P − P̂‖_2

(NP-hard). But how can we find good sets I, J in practice? And how can we generalise this to d > 2?
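
A minimal sketch of a greedy cross/ACA-style approximation (simplified pivoting, not the alternating maxvol iteration of the next slide): a rank-r factorisation is built from r rows and r columns, so only O(nr) entries of P are ever evaluated.

    import numpy as np

    def cross_approx(entry, n, r):
        """entry(i, j) returns P[i, j]; only O(n*r) entries are evaluated."""
        I, J = [], []
        U = np.zeros((n, 0)); V = np.zeros((0, n))
        j = 0                                          # start from an arbitrary column
        for _ in range(r):
            col = np.array([entry(i, j) for i in range(n)]) - U @ V[:, j]
            i = int(np.argmax(np.abs(col)))            # row pivot: largest residual entry
            row = np.array([entry(i, jj) for jj in range(n)]) - U[i, :] @ V
            piv = col[i]
            if abs(piv) < 1e-14:                       # residual numerically zero: done
                break
            U = np.hstack([U, (col / piv)[:, None]])
            V = np.vstack([V, row[None, :]])
            I.append(i); J.append(j)
            row_masked = np.abs(row)
            row_masked[J] = -1.0                       # don't reuse previous column pivots
            j = int(np.argmax(row_masked))             # column pivot for the next step
        return U, V, I, J

    n = 200
    t = np.linspace(0.0, 1.0, n)
    P = np.exp(-np.abs(t[:, None] - t[None, :]))       # smooth kernel: numerically low rank
    U, V, I, J = cross_approx(lambda i, j: P[i, j], n, r=8)
    print(len(I), "crosses, relative error:", np.linalg.norm(P - U @ V) / np.linalg.norm(P))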

12. Alternating Iteration (for cross approximation)
[Figure: alternating selection of column pivots j_1, j_2, j_3 and row pivots i_1, i_2, i_3.]
A practically realisable strategy (with O(2nr) samples and O(nr²) flops). For numerical stability, use a rank-revealing QR in practice. To adapt the rank, expand V → [V Z] (with a residual Z). Several similar algorithms exist, e.g. ACA [Bebendorf '00] or EIM [Barrault et al '04].

13. Tensor Train (TT) Decomposition (Many Variables)
Many variables: matrix product states / tensor train,

    π(i_1, …, i_d) = Σ_{α_1=1}^{r_1} ··· Σ_{α_{d−1}=1}^{r_{d−1}} π^1_{α_1}(i_1) · π^2_{α_1,α_2}(i_2) · π^3_{α_2,α_3}(i_3) ··· π^d_{α_{d−1}}(i_d).

[Wilson '75] (comput. physics), [White '93], [Verstraete '04]; [Oseledets '09]. The TT blocks π^k are three-dimensional r_{k−1} × n × r_k tensors, with TT ranks r_1, …, r_{d−1} ≤ r. Storage: O(dnr²) – linear in d.
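
A minimal sketch of the TT format itself (random illustrative cores): an entry of the tensor is a chain of small matrix products of the cores, and the storage is O(dnr²) instead of n^d.

    import numpy as np

    d, n, r = 8, 32, 5
    ranks = [1] + [r] * (d - 1) + [1]
    rng = np.random.default_rng(2)
    cores = [rng.standard_normal((ranks[k], n, ranks[k + 1])) for k in range(d)]

    def tt_entry(cores, idx):
        """pi(i_1,...,i_d) = G_1[:, i_1, :] @ G_2[:, i_2, :] @ ... @ G_d[:, i_d, :]."""
        v = cores[0][:, idx[0], :]                     # 1 x r_1
        for k in range(1, len(cores)):
            v = v @ cores[k][:, idx[k], :]             # (1 x r_{k-1}) @ (r_{k-1} x r_k)
        return float(v[0, 0])

    print(tt_entry(cores, [3] * d))
    print("TT storage:", sum(c.size for c in cores), "vs full tensor:", n**d)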

14. TT Cross – An Efficient Computation of a TT Decomposition
Given random initial index sets J_0, …, J_d, iterate [Oseledets, Tyrtyshnikov '10]:
   1. Update the k-th TT block: π^k(i_k) = π(I_{k−1}, i_k, J_k).
   2. Update the k-th index set: I_k = row pivots of π^k (using the maxvol principle on a different matricisation of the tensor in each step).
   3. Set k → k + 1 and move to the next block.
   4. When k = d, switch direction and update the index sets J_{k−1}.
Cost: O(dnr²) samples and O(dnr³) flops per iteration – linear in d.

15. Tensor Train (TT) Transport Maps (Summary & Comments) [Dolgov, Anaya-Izquierdo, Fox, RS, 2019]
- Generic – not problem specific ('black box').
- Cross approximation: 'sequential' design along 1D lines.
- Separable product form: π̃(x_1, …, x_d) = Σ_{|α|≤r} π^1_α(x_1) ··· π^d_α(x_d).
- Cheap construction/storage and a low number of model evaluations – linear in d.
- Cheap integration w.r.t. x – linear in d.
- Cheap samples via the conditional distribution method – linear in d (see the sketch below).
- Tuneable approximation error ε (by adapting the ranks r) ⇒ cost and storage (poly)logarithmic in ε (next slide).
- Many known ways to use these samples for fast inference (as proposals for MCMC, as control variates, importance weighting, …).
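
A minimal sketch of the TT conditional-distribution (TT-CD) sampler under simplifying assumptions (random non-negative cores on a uniform grid; a real implementation must also handle possibly negative values of the TT approximation): each coordinate needs only precomputed 1D integrals of the cores and one 1D CDF inversion, so one sample costs O(d) work.

    import numpy as np

    d, n, r = 6, 64, 4
    x = np.linspace(-1.0, 1.0, n); dx = x[1] - x[0]
    ranks = [1] + [r] * (d - 1) + [1]
    rng = np.random.default_rng(3)
    cores = [np.abs(rng.standard_normal((ranks[k], n, ranks[k + 1]))) for k in range(d)]

    # Right accumulated 1D integrals: R[k] = (int G_{k+1} dx) ... (int G_d dx)
    R = [None] * (d + 1)
    R[d] = np.ones((1, 1))
    for k in range(d - 1, -1, -1):
        S_k = cores[k].sum(axis=1) * dx                # r_{k-1} x r_k, 1D integral of core k
        R[k] = S_k @ R[k + 1]

    def tt_cd_sample(u):
        """Map one point u in (0,1)^d to a sample of the TT density (Rosenblatt map)."""
        left = np.ones((1, 1))
        sample = np.zeros(d)
        for k in range(d):
            # unnormalised 1D conditional on the grid: left @ G_k[:, i, :] @ R[k+1]
            pdf = np.einsum('a,aib,b->i', left.ravel(), cores[k], R[k + 1].ravel())
            cdf = np.cumsum(np.maximum(pdf, 0.0)); cdf /= cdf[-1]
            i_k = int(np.searchsorted(cdf, u[k]))
            sample[k] = x[i_k]
            left = left @ cores[k][:, i_k, :]
        return sample

    print(tt_cd_sample(np.random.default_rng(4).random(d)))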

16. Theoretical Result [Rohrbach, Dolgov, Grasedyck, RS, 2020]
For Gaussian distributions π(x) we have the following result. Let

    π : R^d → R,   x ↦ exp(−½ x^T Σ x),

and define, for each k, the block partition

    Σ = [ Σ_11^(k)   Γ_k^T  ]
        [ Γ_k        Σ_22^(k) ]     with Γ_k ∈ R^{(d−k)×k}.

Theorem. Let Σ be SPD with λ_min > 0. Suppose ρ := max_k rank(Γ_k) and σ := max_{k,i} σ_i^(k), where σ_i^(k) are the singular values of Γ_k. Then, for all ε > 0, there exists a TT approximation π̃_ε such that ‖π − π̃_ε‖_{L²(R^d)} ≤ ε ‖π‖_{L²(R^d)} and the TT ranks of π̃_ε are bounded by

    r ≤ ( 1 + (7σ/λ_min) log(7ρd/ε) )^ρ          (polylogarithmic growth).

17. How to use the TT-CD sampler to estimate E_π Q?
Problem: we are sampling from the approximation π̃ = π + O(ε).
Option 0: classical variational inference. Explicit integration (linear in d) gives the biased estimator E_π̃ Q ≈ E_π Q. For non-smooth Q: use Monte Carlo sampling or, better, QMC 'seeds'.
[Figure: 2D projection of the 11D map (problem specification below!).]

18. Sampling from the exact π: Unbiased Estimates of E_π Q
Use the TT approximation as a preconditioner, importance weight or control variate.
Option 1: use i.i.d. samples x^i ∼ π̃ as proposals in Metropolis-Hastings, accepting the proposal x^i with probability

    α = min{ 1, [π(x^i) π̃(x^{i−1})] / [π(x^{i−1}) π̃(x^i)] }.

One can prove that the rejection rate ∼ ε and the IACT τ ∼ 1 + ε.
Option 2: use π̃ for importance weighting with QMC quadrature:

    E_π Q ≈ (1/Z̃) (1/N) Σ_{i=1}^N Q(x^i) π(x^i)/π̃(x^i),   with   Z̃ = (1/N) Σ_{i=1}^N π(x^i)/π̃(x^i),   x^i ∼ π̃.

We can use an unbiased (randomised) QMC rule for both integrals.
Option 3: use the estimate w.r.t. π̃ as a control variate (multilevel MCMC).
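
A minimal sketch of Option 1, an independence Metropolis-Hastings sampler whose proposals come from an approximate density (both densities below are simple stand-ins; in the talk the proposals are TT-CD samples from π̃).

    import numpy as np

    rng = np.random.default_rng(5)

    def log_pi(x):            # unnormalised target (illustrative)
        return -0.5 * np.sum(x**2) - 0.1 * np.sum(x**4)

    def log_pi_tilde(x):      # unnormalised approximation / proposal density
        return -0.5 * np.sum(x**2)

    def sample_pi_tilde(d):   # i.i.d. proposal draw (stand-in for a TT-CD sample)
        return rng.standard_normal(d)

    d, N = 4, 5000
    x = sample_pi_tilde(d)
    chain, accepts = [], 0
    for _ in range(N):
        y = sample_pi_tilde(d)
        # acceptance ratio pi(y) * pi_tilde(x) / (pi(x) * pi_tilde(y))
        log_alpha = (log_pi(y) + log_pi_tilde(x)) - (log_pi(x) + log_pi_tilde(y))
        if np.log(rng.random()) < log_alpha:
            x = y; accepts += 1
        chain.append(x)
    print("acceptance rate:", accepts / N)   # close to 1 when pi_tilde is accurate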

19. Numerical Example (Inverse Stationary Diffusion Problem)
Model problem (representative for subsurface flow or structural mechanics), for ξ ∈ (0, 1)²:

    −∇·( κ(ξ, x) ∇u(ξ, x) ) = 0,
    u|_{ξ_1=0} = 1,   u|_{ξ_1=1} = 0,   ∂u/∂n|_{ξ_2=0} = 0,   ∂u/∂n|_{ξ_2=1} = 0.

Karhunen-Loève expansion of the coefficient: log κ(ξ, x) = Σ_{k=1}^d φ_k(ξ) x_k, with prior d = 11, x_k ∼ U[−1, 1], ‖φ_k‖_∞ = O(k^{−3/2}) [Eigel, Pfeffer, Schneider '16]. Discretisation with bilinear FEs on a uniform mesh with h = 1/64. Data: the average pressure at 9 locations (synthetic, i.e. generated for a fixed reference coefficient). QoI Q = h(u(·, x)): the probability that the flux exceeds 1.5 (not smooth!).
[Figure: the computational domain (0, 1)².]
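
A minimal sketch of the parametrised coefficient (the basis functions φ_k below are simple illustrative choices with the stated decay, not the actual KL modes).

    import numpy as np

    d, m = 11, 65                                    # 11 parameters, 65x65 grid (h = 1/64)
    xi1, xi2 = np.meshgrid(np.linspace(0, 1, m), np.linspace(0, 1, m), indexing="ij")

    def phi(k):                                      # assumed smooth basis with k^{-3/2} decay
        return k ** (-1.5) * np.cos(np.pi * k * xi1) * np.cos(np.pi * k * xi2)

    rng = np.random.default_rng(6)
    x = rng.uniform(-1.0, 1.0, d)                    # prior sample x_k ~ U[-1, 1]
    log_kappa = sum(x[k] * phi(k + 1) for k in range(d))
    kappa = np.exp(log_kappa)                        # coefficient field on the grid
    print(kappa.shape, kappa.min(), kappa.max())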

20. Comparison against DRAM (for the inverse diffusion problem)
[Figure: relative error in P(F > 1.5) versus CPU time for noise levels σ_e² = 0.01 and σ_e² = 0.001, with the discretisation error indicated.]
- TT-MH: TT conditional distribution samples (i.i.d.) as proposals for MCMC.
- TT-qIW: TT surrogate for importance sampling with QMC.
- DRAM: Delayed Rejection Adaptive Metropolis [Haario et al, 2006].

21. Samples – Comparison TT-CD vs. DRAM
[Figure: 2D projections of posterior samples, TT-MH (i.i.d. seeds) vs. DRAM.]

22. Multilevel Markov Chain Monte Carlo
[Dodwell, Ketelsen, RS, Teckentrup, 2015 & 2019], [Cui, Detommaso, RS, 2019]

23. Exploiting Model Hierarchy (same inverse diffusion problem)
[Figure: hierarchy of discretisation spaces V_ℓ and parameter spaces X_ℓ, ℓ = 0, …, L.]
Here h_ℓ = h_0 · 2^{−ℓ} and M_ℓ ≈ M_0 · 2^{2ℓ}.

24. Monte Carlo (assuming first that π can be sampled – forward problem)
Standard Monte Carlo estimator for E[Q] (where Q = h(u(·, x)) ∈ R):

    Q̂_L^MC := (1/N) Σ_{i=1}^N Q_L^{(i)},   with i.i.d. samples Q_L^{(i)} computed with Model(L).

Convergence of plain vanilla MC (mean square error):

    E[ (Q̂_L^MC − E[Q])² ] = V[Q_L]/N + (E[Q_L − Q])²
          (=: MSE)          (sampling error)  (model error, "bias")

Assuming |E[Q_ℓ − Q]| = O(2^{−αℓ}) and E[Cost_ℓ] = O(2^{γℓ}), to get MSE = O(ε²) we need L ∼ α^{−1} log_2(ε^{−1}) and N ∼ ε^{−2}.
Monte Carlo Complexity Theorem: Cost(Q̂_L^MC) = O(N M_L) = O(ε^{−2−γ/α}) to obtain MSE = O(ε²). (For the 2D model problem with AMG: α = 1, γ = 2, i.e. O(ε^{−4}).)
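
A minimal sketch of the MSE decomposition on a toy scalar surrogate with bias O(2^{−αℓ}) (purely illustrative, no PDE solve).

    import numpy as np

    rng = np.random.default_rng(7)
    alpha = 1.0

    def Q_exact(w):            # toy quantity of interest
        return np.sin(w)

    def Q_level(w, l):         # level-l approximation with bias ~ 2^{-alpha*l}
        return np.sin(w) + 2.0 ** (-alpha * l) * (1.0 + 0.1 * np.cos(w))

    L, N = 4, 10_000
    w = rng.standard_normal(N)
    QL = Q_level(w, L)
    estimator = QL.mean()                              # plain MC estimate on level L

    sampling_err2 = QL.var() / N                       # V[Q_L] / N
    bias2 = (Q_level(w, L) - Q_exact(w)).mean() ** 2   # (E[Q_L - Q])^2 (estimated)
    print("MC estimate:", estimator, " MSE ~", sampling_err2 + bias2)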

25. Multilevel Monte Carlo [Heinrich '98], [Giles '07]
Basic idea: note that trivially (due to the linearity of E)

    E[Q_L] = E[Q_0] + Σ_{ℓ=1}^L E[Q_ℓ − Q_{ℓ−1}]          Control variates!

Define the following multilevel MC estimator for E[Q] (sketched below):

    Q̂_L^MLMC := Q̂_0^MC + Σ_{ℓ=1}^L Ŷ_ℓ^MC,   where Y_ℓ := Q_ℓ − Q_{ℓ−1}.

Key observation (variance reduction – the corrections are cheaper!):
- Level L: V[Q_L − Q_{L−1}] → 0 as L → ∞ ⇒ N_L = O(1) (best case).
- Level ℓ: N_ℓ optimised to "balance" with the cost on levels 0 and L.
- Level 0: N_0 ∼ N, but Cost_0 = O(M_0) = O(1).
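
A minimal sketch of the multilevel estimator on the same toy surrogate as above (illustrative sample numbers N_ℓ; note that Q_ℓ and Q_{ℓ−1} are evaluated on the same sample within each correction term).

    import numpy as np

    rng = np.random.default_rng(8)
    alpha = 1.0

    def Q_level(w, l):
        return np.sin(w) + 2.0 ** (-alpha * l) * (1.0 + 0.1 * np.cos(w))

    L = 4
    N = [8000, 4000, 2000, 1000, 500]          # more samples on coarse, cheap levels

    # level-0 term
    w0 = rng.standard_normal(N[0])
    estimate = Q_level(w0, 0).mean()

    # correction terms Y_l = Q_l - Q_{l-1}, each with its own independent samples
    for l in range(1, L + 1):
        w = rng.standard_normal(N[l])
        Y = Q_level(w, l) - Q_level(w, l - 1)  # small variance => few samples needed
        estimate += Y.mean()

    print("MLMC estimate of E[Q_L]:", estimate)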

26. Complexity Theorem [Giles '07], [Cliffe, Giles, RS, Teckentrup '11]
Assume an approximation error O(2^{−αℓ}), a cost per sample of O(2^{γℓ}) and

    V[Q_ℓ − Q_{ℓ−1}] = O(2^{−βℓ})          (strong error / variance reduction).

Then there exist L and {N_ℓ}_{ℓ=0}^L to obtain MSE = O(ε²) with

    Cost(Q̂_L^MLMC) = O( ε^{−2−max(0, (γ−β)/α)} )   (+ a possible log factor),

using dependent or independent estimators Q̂_0^MC and {Ŷ_ℓ^MC}_{ℓ=1}^L.
Running example (for smooth functionals & AMG): α ≈ 1, β ≈ 2, γ ≈ 2, hence

    Cost(Q̂_L^MLMC) = O(ε^{−max(2, γ/α)}) = O(max(N_0, M_L)) ≈ O(ε^{−2}).

Optimality: the asymptotic cost of one deterministic solve (to tolerance ε)!
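
A minimal sketch of the standard MLMC sample-allocation rule, N_ℓ ∝ ε^{−2} √(V_ℓ/C_ℓ) Σ_k √(V_k C_k) (constants omitted; the rates V_ℓ, C_ℓ below follow the running example β = 2, γ = 2 with illustrative constants).

    import numpy as np

    eps = 1e-3
    alpha, beta, gamma = 1.0, 2.0, 2.0
    L = int(np.ceil(np.log2(1.0 / eps) / alpha))          # bias ~ 2^{-alpha*L} ~ eps

    levels = np.arange(L + 1)
    V = 2.0 ** (-beta * levels)                           # V[Y_l] = O(2^{-beta*l})
    C = 2.0 ** (gamma * levels)                           # cost per sample = O(2^{gamma*l})

    # balance sampling error and cost: N_l ~ eps^-2 * sqrt(V_l/C_l) * sum_k sqrt(V_k*C_k)
    N = np.ceil(eps ** -2 * np.sqrt(V / C) * np.sum(np.sqrt(V * C))).astype(int)
    total_cost = np.sum(N * C)
    print("levels:", L, "\nN_l:", N, "\ntotal cost ~", total_cost, "vs eps^-2 =", eps ** -2)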

27. Numerical Example (Multilevel MC)
Running example with Q = ‖u‖_{L²(D)}; lognormal diffusion coefficient with exponential covariance (σ² = 1, λ = 0.3); h_0 = 1/8.

28. Inverse Problem: Multilevel Markov Chain Monte Carlo
Posterior distribution for the PDE model problem (Bayes):

    π_ℓ(x_ℓ | y_obs) ∝ exp( −‖y_obs − F_ℓ(x_ℓ)‖²_{Σ_obs} ) π_prior(x_ℓ).

What were the key ingredients of "standard" multilevel Monte Carlo?
- Telescoping sum: E[Q_L] = E[Q_0] + Σ_{ℓ=1}^L E[Q_ℓ − Q_{ℓ−1}].
- Models on coarser levels are much cheaper to solve (M_0 ≪ M_L).
- V[Q_ℓ − Q_{ℓ−1}] → 0 as ℓ → ∞ ⇒ much fewer samples needed on finer levels.
