Bayesian Multi-Fidelity Optimization under Uncertainty

Phaedon S. Koutsourelakis (p.s.koutsourelakis@tum.de)
Maximilian Koschade (maximilian.koschade@tum.de)

Continuum Mechanics Group, Department of Mechanical Engineering, Technical University of Munich, Germany

SIAM Computational Science and Engineering, Atlanta, 2017
Optimization under Uncertainty

Introducing uncertainty to optimization problems: in many engineering applications, deterministic optimization is a simplification that neglects aleatory and epistemic uncertainty.

• z : design variables (topology or shape)
• θ ∼ p_θ(θ) : stochastic influences, e.g.
  • material: discretized random field λ(x)
  • temperature / load: stochastic process
  • manufacturing tolerances: distributed around a nominal value

Figure 1: Example of a material property as a random field λ(x); cross-section view of a stiffening rib.
Optimization under Uncertainty – Objective Function

Maximize the expected utility:

\[
z^{*} \;=\; \arg\max_{z} V(z) \;=\; \arg\max_{z} \int U(z,\theta)\, p_{\theta}(\theta)\,\mathrm{d}\theta
\]

• z : design variables
• θ ∼ p_θ(θ) : stochastic influences on the system

Example (probability of failure): U(z, θ) = 1_A(z, θ), where A is the event of non-failure; maximizing the expected utility then minimizes the probability of failure.

Example (design goal u_target):

\[
U(z,\theta) \;=\; \exp\!\Big(-\tfrac{\tau_Q}{2}\,\big(u(z,\theta) - u_{\text{target}}\big)^{2}\Big)
\]

with τ_Q a penalty parameter enforcing the design goal.
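As a concrete (hypothetical) illustration of the expected-utility objective, the sketch below estimates V(z) by plain Monte Carlo for the design-goal utility. The forward model `forward_model(z, theta)`, the penalty τ_Q, the θ-dimension, and the standard-normal stand-in for p_θ are assumptions for illustration, not part of the talk.

```python
import numpy as np

def utility(u, u_target, tau_q):
    """Design-goal utility U(z, theta) = exp(-tau_q/2 * (u - u_target)^2)."""
    return np.exp(-0.5 * tau_q * (u - u_target) ** 2)

def expected_utility(z, forward_model, u_target, tau_q=10.0, dim_theta=800,
                     n_samples=1000, seed=0):
    """Plain Monte Carlo estimate of V(z) = E_theta[ U(z, theta) ]."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_samples):
        theta = rng.standard_normal(dim_theta)   # stand-in for a draw from p_theta
        u = forward_model(z, theta)              # one (expensive) forward solve
        vals.append(utility(u, u_target, tau_q))
    return float(np.mean(vals))
```

Each evaluation of V(z) already requires many forward solves; nesting this inside an outer optimization over z is what the probabilistic reformulation on the next slide avoids.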
Probabilistic Formulation of Optimization under Uncertainty

Reformulation as probabilistic inference¹: the solution is given by an auxiliary posterior distribution² π(z, θ),

\[
V(z) \;\propto\; \underbrace{\int \pi(z,\theta)\,\mathrm{d}\theta}_{\text{posterior marginal}}
\;\propto\; \int \underbrace{U(z,\theta)}_{\text{likelihood}}\,\underbrace{p_{\theta}(\theta)}_{\text{prior}}\,\mathrm{d}\theta \;\;\underbrace{p_{z}(z)}_{\text{flat prior}}
\]

since the marginal π(z) ∝ V(z), given a flat prior p_z(z).

Conducive to consistent incorporation of epistemic uncertainty due to approximate, lower-fidelity solvers!

¹ Mueller (2005)
² This approach should NOT be confused with Bayesian optimization.
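The reformulation turns design optimization into inference on a joint density over (z, θ). Below is a sketch, under the same hypothetical forward model and stand-in priors as above, of the unnormalized log-density log π(z, θ) that an MCMC or variational scheme would target; its marginal over θ is proportional to V(z).

```python
import numpy as np

def log_prior_theta(theta):
    """Stand-in log p_theta(theta): standard normal (up to an additive constant)."""
    return -0.5 * float(np.sum(np.asarray(theta) ** 2))

def log_utility(z, theta, forward_model, u_target, tau_q=10.0):
    """log U(z, theta) for the design-goal utility."""
    u = forward_model(z, theta)
    return -0.5 * tau_q * (u - u_target) ** 2

def log_post(z, theta, forward_model, u_target, z_lo, z_hi):
    """Unnormalized log pi(z, theta) = log U + log p_theta + log p_z (flat on a box)."""
    z = np.asarray(z)
    if np.any(z < z_lo) or np.any(z > z_hi):
        return -np.inf
    return log_utility(z, theta, forward_model, u_target) + log_prior_theta(theta)
```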
Example: Stochastic Poisson Equation

\[
\nabla \cdot \big(-\lambda(\mathbf{x})\,\nabla u(\mathbf{x})\big) = 0
\]

• z (dim(z) = 21): controls the heat influx z(x₂) on the boundary
• θ (dim(θ) = 800): log-normal conductivity field λ(x)

Figure: solution and target temperature profiles u(x₂) for two realizations θ⁽¹⁾, θ⁽²⁾ of the conductivity field.
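The stochastic input θ here is a discretized log-normal conductivity field. Below is a minimal, generic sketch of one common way such a field can be sampled (exponentiating a Gaussian field with an exponential covariance on a regular grid); the grid size, variance, and correlation length are made-up values, not those used in the talk.

```python
import numpy as np

def lognormal_conductivity_field(n=16, corr_len=0.2, sigma=1.0, seed=None):
    """Sample lambda(x) = exp(g(x)) on an n x n grid, g a zero-mean Gaussian field."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    xx, yy = np.meshgrid(x, x)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    cov = sigma ** 2 * np.exp(-d / corr_len)                 # exponential covariance
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n * n))      # jitter for stability
    g = L @ rng.standard_normal(n * n)                       # Gaussian field realization
    return np.exp(g).reshape(n, n)                           # log-normal conductivity
```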
Solution via a rank-1-perturbed Gaussian q*

Figure 2: Black-box stochastic variational inference in dimension 821 (dim(θ) = 800, dim(z) = 21) (Hoffman et al., 2013; Ranganath et al., 2013). The plot shows the optimal heat-flux profile g(x₂) (design z*) and the sensitivity along x₂.

Cost: O(10³) forward evaluations.
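The variational approximation used above is a Gaussian whose covariance is a rank-1 perturbation of a diagonal matrix. Below is a sketch of such a family: sampling via the reparameterization y = m + d⊙ε₁ + u·ε₂ and the corresponding log-density via the Sherman–Morrison and matrix-determinant identities. In practice these would be plugged into a black-box/stochastic variational inference loop (Ranganath et al., 2013; Hoffman et al., 2013) with a stochastic-gradient optimizer, which is omitted here.

```python
import numpy as np

class Rank1Gaussian:
    """q(y) = N(m, diag(d**2) + u u^T): a rank-1-perturbed Gaussian."""

    def __init__(self, m, d, u):
        self.m = np.asarray(m, dtype=float)
        self.d = np.asarray(d, dtype=float)
        self.u = np.asarray(u, dtype=float)

    def sample(self, rng):
        eps1 = rng.standard_normal(self.m.size)
        eps2 = rng.standard_normal()
        return self.m + self.d * eps1 + self.u * eps2

    def logpdf(self, y):
        # Sherman-Morrison for the inverse, matrix determinant lemma for log|Sigma|.
        r = y - self.m
        dinv2 = 1.0 / self.d ** 2
        denom = 1.0 + np.sum(self.u ** 2 * dinv2)
        quad = np.sum(r ** 2 * dinv2) - (np.sum(r * self.u * dinv2)) ** 2 / denom
        logdet = np.sum(np.log(self.d ** 2)) + np.log(denom)
        return -0.5 * (quad + logdet + self.m.size * np.log(2.0 * np.pi))
```

A single-sample ELBO estimate for the posterior above is then `log_post(y) - q.logpdf(y)` with `y = q.sample(rng)`; a stochastic VI scheme repeatedly ascends this estimate with respect to (m, d, u).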
• high dimension
• expensive numerical model

⇒ probabilistic inference can quickly become prohibitive. How can we address this issue?
Introduction of Approximate Solvers

Writing a = log U and y = [z, θ]ᵀ, we can rewrite π(y) on an extended space:

\[
\pi_a(\mathbf{y}) \;\propto\; U(\mathbf{y})\,p_{\mathbf{y}}(\mathbf{y}) \;=\; e^{a(\mathbf{y})}\,p_{\mathbf{y}}(\mathbf{y})
\;=\; \int e^{a}\,\delta\big(a - \log U(\mathbf{y})\big)\,p_{\mathbf{y}}(\mathbf{y})\,\mathrm{d}a
\;=\; \int e^{a}\,p(a \mid \mathbf{y})\,p_{\mathbf{y}}(\mathbf{y})\,\mathrm{d}a
\]

Approximate solvers = epistemic uncertainty
• As long as p(a | y) is a Dirac, we recover the posterior perfectly.
• Introducing cheap, approximate solvers disperses p(a | y) and causes an irrevocable loss of information about y.
• We can consistently incorporate this epistemic uncertainty in the Bayesian framework.

Regression model
• We may learn p(a | y) from e.g. a Bayesian regression model a = φ(y)ᵀ w + ε, or a Gaussian process.
• This approach is impractical for a high-dimensional probability space y = [z, θ]ᵀ!

Suppose instead we introduce a low-fidelity log-likelihood A:

\[
p(a \mid \mathbf{y}) \;=\; \int p(a, A \mid \mathbf{y})\,\mathrm{d}A \;=\; \int p(a \mid A, \mathbf{y})\,p(A \mid \mathbf{y})\,\mathrm{d}A
\;\approx\; \int p(a \mid A)\,\delta\big(A - \log U_{\text{LowFi}}(\mathbf{y})\big)\,\mathrm{d}A \;=:\; p_A(a \mid \mathbf{y})
\]
\[
\Rightarrow\quad \pi_A(\mathbf{y}) \;\propto\; \int e^{a}\,p_A(a \mid \mathbf{y})\,p_{\mathbf{y}}(\mathbf{y})\,\mathrm{d}a
\]
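Put differently, once a predictive density p(a | A) has been learned (next slides), evaluating the multi-fidelity posterior only ever requires the cheap solver. The schematic sketch below uses hypothetical function names; `log_marginal_utility(A)` stands for log of the integral of exp(a) p(a | A, D) da, which becomes available in closed form for a Gaussian predictive density (see further below).

```python
def log_pi_A(y, low_fi_log_utility, log_marginal_utility, log_prior):
    """Unnormalized multi-fidelity log-posterior log pi_A(y) (schematic)."""
    A = low_fi_log_utility(y)      # one cheap, low-fidelity forward solve
    return log_marginal_utility(A) + log_prior(y)
```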
Learning p(a | y): Probabilistic Multi-Fidelity Approach

Figure: scatter of the high-fidelity log-utility a against the low-fidelity log-utility A, with the learned predictive density p_A(a | A, D) overlaid.

Introduce a low-fidelity log-likelihood A:

\[
p_A(a \mid \mathbf{y}, \mathcal{D}) \;\approx\; \int p(a \mid A, \mathcal{D})\,\delta\big(A - \log U_{\text{LowFi}}(\mathbf{y})\big)\,\mathrm{d}A
\]

Predictive density p_A(a | A, D)
• belief about the high-fidelity a given the low-fidelity A
• learned from a limited set of forward-solver evaluations D = { a(y_n), A(y_n) }, n = 1, …, N
• using e.g. a variational relevance vector machine (VRVM) or a variational heteroscedastic Gaussian process (VHGP)
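A deliberately simple, homoscedastic stand-in for the VRVM/VHGP models mentioned above: Bayesian linear regression of the high-fidelity log-utility a on polynomial features of the low-fidelity value A, fitted to the N pairs in D and returning a Gaussian predictive mean and variance. The hyperparameters are placeholder values.

```python
import numpy as np

def fit_predictive(A_train, a_train, degree=2, prior_prec=1e-4, noise_var=None):
    """Fit p(a | A) ~ N(mu(A), s2(A)) by Bayesian linear regression on features of A."""
    Phi = np.vander(A_train, degree + 1, increasing=True)        # [1, A, A^2, ...]
    if noise_var is None:
        w_ls, *_ = np.linalg.lstsq(Phi, a_train, rcond=None)
        noise_var = np.mean((a_train - Phi @ w_ls) ** 2) + 1e-12  # crude noise estimate
    S_inv = prior_prec * np.eye(Phi.shape[1]) + Phi.T @ Phi / noise_var
    S = np.linalg.inv(S_inv)
    w_mean = S @ (Phi.T @ a_train) / noise_var

    def predict(A_new):
        phi = np.vander(np.atleast_1d(A_new), degree + 1, increasing=True)
        mu = phi @ w_mean
        s2 = noise_var + np.sum(phi @ S * phi, axis=1)           # predictive variance
        return mu, s2

    return predict
```

Usage: `predict = fit_predictive(A_train, a_train); mu, s2 = predict(A_lowfi)`; the predictive mean and variance feed directly into the multi-fidelity posterior of the next slide.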
Extended Probability Space – Illustration

Figure (three panels over the extended space (a, y)): panel 1 shows the exact-solver case, where the joint π_A(a, y) concentrates on δ(a − log U(y)) and recovers π_a(y); panels 2 and 3 show the approximate-solver case, where p_A(a | y) disperses and the marginal π_A(y) reflects the increase of epistemic uncertainty.
Multi-Fidelity Posterior π_A(y)

Approximating π_A(y): if the predictive density p(a | y) is Gaussian, p(a | y) = N(a | μ(A(y)), σ²(A(y))), then we obtain

\[
\log \pi_A(\mathbf{y}) \;=\; \underbrace{\mu\big(A(\mathbf{y})\big)}_{\text{(A)}} \;+\; \underbrace{\tfrac{1}{2}\,\sigma^{2}\big(A(\mathbf{y})\big)}_{\text{(B)}} \;+\; \log p_{\mathbf{y}}(\mathbf{y})
\]

Probability mass is placed on y associated with
(A): high predictive mean μ(A(y))
(B): large epistemic uncertainty σ²(A(y))
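The μ + ½σ² term follows from the standard Gaussian (log-normal) moment identity when a is marginalized out of the extended posterior:

```latex
\int e^{a}\,\mathcal{N}\!\big(a \mid \mu, \sigma^{2}\big)\,\mathrm{d}a
= \exp\!\big(\mu + \tfrac{1}{2}\sigma^{2}\big)
\quad\Longrightarrow\quad
\log \pi_A(\mathbf{y}) = \mu\big(A(\mathbf{y})\big) + \tfrac{1}{2}\sigma^{2}\big(A(\mathbf{y})\big) + \log p_{\mathbf{y}}(\mathbf{y}) + \text{const.}
```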
Example: Stochastic Poisson Equation

\[
\nabla \cdot \big(-\lambda(\mathbf{x})\,\nabla u(\mathbf{x})\big) = 0
\]

• z (dim(z) = 1): controls the heat influx z(x₂) on the boundary
• θ (dim(θ) = 256): log-normal conductivity field λ(x)

Figure: solution and target temperature profiles u(x₂) for two realizations θ⁽¹⁾, θ⁽²⁾ of the conductivity field.
Effect of Lower-Fidelity Solvers⁴

Figure 3: posterior density estimates over the design variable z: the reference solution z* (32×32 discretization) compared with the multi-fidelity posteriors π_A(z | D) obtained with 8×8 and 4×4 low-fidelity solvers. dim(θ) = 256, speedup S_{4×4} ≈ 2,000, N = 200 training data samples; density estimates obtained using MALA.

⁴ Here the low-fidelity solvers are simply coarser discretizations of the stochastic Poisson equation.
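The density estimates in the figure were obtained with MALA. Below is a generic sketch of a single Metropolis-adjusted Langevin transition targeting an unnormalized log-density; `log_target` and `grad_log_target` are assumed to be supplied (e.g. the multi-fidelity log π_A(z | D) and its gradient), and the step size is a made-up value.

```python
import numpy as np

def mala_step(z, log_target, grad_log_target, step=1e-2, rng=None):
    """One Metropolis-adjusted Langevin (MALA) transition for an unnormalized log-density."""
    if rng is None:
        rng = np.random.default_rng()
    z = np.atleast_1d(np.asarray(z, dtype=float))
    # Langevin proposal: drift along the gradient plus Gaussian noise.
    mean_fwd = z + step * grad_log_target(z)
    prop = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal(z.shape)
    mean_bwd = prop + step * grad_log_target(prop)
    # Proposal log-densities (up to a common constant).
    log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (4.0 * step)
    log_q_bwd = -np.sum((z - mean_bwd) ** 2) / (4.0 * step)
    log_alpha = log_target(prop) - log_target(z) + log_q_bwd - log_q_fwd
    if np.log(rng.uniform()) < log_alpha:
        return prop, True    # accepted
    return z, False          # rejected
```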