Hidden Physics Models
Maziar Raissi
Division of Applied Mathematics, Brown University, Providence, RI, USA
maziar_raissi@brown.edu
September 14, 2017
Problem Setup
Let us consider parametrized and nonlinear partial differential equations of the general form

$$h_t + \mathcal{N}_x^{\lambda} h = 0, \quad x \in \Omega, \quad t \in [0, T],$$

where $h(t, x)$ denotes the latent (hidden) solution, $\mathcal{N}_x^{\lambda}$ is a nonlinear operator parametrized by $\lambda$, and $\Omega$ is a subset of $\mathbb{R}^D$.

Example: the one dimensional Burgers' equation corresponds to the case where $\mathcal{N}_x^{\lambda} h = \lambda_1 h h_x - \lambda_2 h_{xx}$ and $\lambda = (\lambda_1, \lambda_2)$.

Raissi, Maziar, and George Em Karniadakis. "Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations." arXiv preprint arXiv:1708.00588 (2017).
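For concreteness, substituting this choice of operator into the general form gives

$$h_t + \lambda_1 h h_x - \lambda_2 h_{xx} = 0,$$

which, with $\lambda_1 = 1$ and $\lambda_2$ equal to the viscosity $\nu$, is the standard viscous Burgers' equation: $\lambda_1$ scales the nonlinear convection term and $\lambda_2$ the diffusion term.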
Two Distinct Problems

Given noisy measurements of the system, one is typically interested in the solution of two distinct problems.

Identification Problem: the first problem is that of learning, system identification, or data-driven discovery of partial differential equations, stating: what are the parameters $\lambda$ that best describe the observed data?

Inference Problem: the second problem is that of inference, filtering and smoothing, or data-driven solutions of partial differential equations, which states: given fixed model parameters $\lambda$, what can be said about the unknown hidden state $h(t, x)$ of the system?

Raissi, Maziar, and George Em Karniadakis. "Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations." arXiv preprint arXiv:1708.00588 (2017).
Identification Problem – Hidden Physics Models

Raissi, Maziar, and George Em Karniadakis. "Hidden Physics Models: Machine Learning of Nonlinear Partial Differential Equations." arXiv preprint arXiv:1708.00588 (2017).
Inference Problem – Numerical Gaussian Processes

Raissi, Maziar, Paris Perdikaris, and George Em Karniadakis. "Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations." arXiv preprint arXiv:1703.10230 (2017).
Gaussian Processes
A Gaussian process

$$f(x) \sim \mathcal{GP}(0, k(x, x'; \theta))$$

is just a shorthand notation for

$$\begin{bmatrix} f(x) \\ f(x') \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k(x, x; \theta) & k(x, x'; \theta) \\ k(x', x; \theta) & k(x', x'; \theta) \end{bmatrix} \right).$$

Rasmussen, Carl Edward, and Christopher K. I. Williams. "Gaussian Processes for Machine Learning." The MIT Press, Cambridge, MA, USA (2006).
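More generally, the same shorthand means that for any finite collection of inputs $x_1, \dots, x_n$ the corresponding function values are jointly Gaussian,

$$\begin{bmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{bmatrix} \sim \mathcal{N}(0, K), \qquad K_{ij} = k(x_i, x_j; \theta),$$

which is the property used for training and prediction below.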
Covariance Function

A typical example for the kernel $k(x, x'; \theta)$ is the squared exponential

$$k(x, x'; \theta) = \gamma^2 \exp\left( -\frac{1}{2 w^2} (x - x')^2 \right),$$

where $\theta = (\gamma, w)$ are the hyper-parameters of the kernel.

Rasmussen, Carl Edward, and Christopher K. I. Williams. "Gaussian Processes for Machine Learning." The MIT Press, Cambridge, MA, USA (2006).
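As a rough illustration (not the authors' code), the squared exponential kernel above can be written in a few lines of NumPy; the function name and the vectorized form are choices made here for clarity:

```python
import numpy as np

def sqexp_kernel(x, x_prime, gamma, w):
    """Squared exponential kernel k(x, x'; theta) with theta = (gamma, w).

    x, x_prime: 1-D arrays of scalar inputs.
    Returns the len(x) x len(x_prime) covariance matrix.
    """
    diff = x[:, None] - x_prime[None, :]  # pairwise differences
    return gamma**2 * np.exp(-0.5 * diff**2 / w**2)
```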
Training

Given a dataset $\{x, y\}$ of size $N$, the hyper-parameters $\theta$ and the noise variance parameter $\sigma$ can be trained by minimizing the negative log marginal likelihood

$$\mathrm{NLML}(\theta, \sigma) = \frac{1}{2} y^T K^{-1} y + \frac{1}{2} \log |K| + \frac{N}{2} \log(2\pi),$$

resulting from $y \sim \mathcal{N}(0, K)$, where $K = k(x, x; \theta) + \sigma^2 I$.

Rasmussen, Carl Edward, and Christopher K. I. Williams. "Gaussian Processes for Machine Learning." The MIT Press, Cambridge, MA, USA (2006).
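Minimizing this objective is typically done with a gradient-based optimizer. The sketch below, assuming the squared exponential kernel from the previous slide, evaluates the NLML via a Cholesky factorization; the log-transformed parameters and the use of SciPy's L-BFGS-B are choices made here, not something prescribed by the slides:

```python
import numpy as np
from scipy.optimize import minimize

def nlml(log_params, x, y):
    """Negative log marginal likelihood of a zero-mean GP with an SE kernel.

    log_params = log([gamma, w, sigma]); the log transform keeps the
    parameters positive during unconstrained optimization.
    """
    gamma, w, sigma = np.exp(log_params)
    N = y.size
    diff = x[:, None] - x[None, :]
    K = gamma**2 * np.exp(-0.5 * diff**2 / w**2) + sigma**2 * np.eye(N)
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return (0.5 * y @ alpha
            + np.sum(np.log(np.diag(L)))                 # = 0.5 * log|K|
            + 0.5 * N * np.log(2 * np.pi))

# Hypothetical usage on synthetic data:
# x = np.linspace(0, 1, 20)
# y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(20)
# result = minimize(nlml, np.log([1.0, 0.2, 0.1]), args=(x, y), method="L-BFGS-B")
```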
Prediction

Having trained the model, one can use the posterior distribution

$$f(x_*) \mid y \sim \mathcal{N}\left( k(x_*, x) K^{-1} y,\; k(x_*, x_*) - k(x_*, x) K^{-1} k(x, x_*) \right)$$

to make predictions at a new test point $x_*$. This is obtained by writing the joint distribution

$$\begin{bmatrix} f(x_*) \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k(x_*, x_*) & k(x_*, x) \\ k(x, x_*) & K \end{bmatrix} \right)$$

and conditioning on the observed data $y$.

Rasmussen, Carl Edward, and Christopher K. I. Williams. "Gaussian Processes for Machine Learning." The MIT Press, Cambridge, MA, USA (2006).
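Again as an illustrative sketch (assuming the same squared exponential kernel, with helper names chosen here rather than taken from the slides), the posterior mean and variance at a scalar test point can be computed as:

```python
import numpy as np

def gp_predict(x_star, x, y, gamma, w, sigma):
    """Posterior mean and variance of f(x_star) given training data {x, y}.

    Assumes a zero prior mean and the squared exponential kernel, as in the
    slides; the function itself is a sketch, not the authors' code.
    """
    def k(a, b):
        return gamma**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / w**2)

    K = k(x, x) + sigma**2 * np.eye(x.size)
    k_star = k(np.atleast_1d(x_star), x)          # k(x_*, x), shape (1, N)
    mean = k_star @ np.linalg.solve(K, y)
    var = (gamma**2                                # k(x_*, x_*) for the SE kernel
           - k_star @ np.linalg.solve(K, k_star.T))
    return mean.item(), var.item()
```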
Example (Code)

Rasmussen, Carl Edward, and Christopher K. I. Williams. "Gaussian Processes for Machine Learning." The MIT Press, Cambridge, MA, USA (2006).
Multi-fidelity Gaussian Processes
Let us consider the following autoregressive model

$$f_H(x) = \rho f_L(x) + \delta(x),$$

where $f_L(x)$ and $\delta(x)$ are two independent Gaussian processes with

$$f_L(x) \sim \mathcal{GP}(0, k_1(x, x'; \theta_1)), \qquad \delta(x) \sim \mathcal{GP}(0, k_2(x, x'; \theta_2)).$$

Kennedy, Marc C., and Anthony O'Hagan. "Predicting the output from a complex computer code when fast approximations are available." Biometrika 87.1 (2000): 1-13.
Therefore,

$$\begin{bmatrix} f_L(x) \\ f_H(x) \end{bmatrix} \sim \mathcal{GP}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k_{LL}(x, x'; \theta_1) & k_{LH}(x, x'; \theta_1, \rho) \\ k_{HL}(x, x'; \theta_1, \rho) & k_{HH}(x, x'; \theta_1, \theta_2, \rho) \end{bmatrix} \right),$$

with

$$k_{LL}(x, x'; \theta_1) = k_1(x, x'; \theta_1),$$
$$k_{LH}(x, x'; \theta_1, \rho) = k_{HL}(x', x; \theta_1, \rho) = \rho\, k_1(x, x'; \theta_1),$$
$$k_{HH}(x, x'; \theta_1, \theta_2, \rho) = \rho^2 k_1(x, x'; \theta_1) + k_2(x, x'; \theta_2).$$

Kennedy, Marc C., and Anthony O'Hagan. "Predicting the output from a complex computer code when fast approximations are available." Biometrika 87.1 (2000): 1-13.
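These blocks follow directly from the autoregressive model and the independence of $f_L$ and $\delta$; for instance,

$$\mathrm{cov}\big(f_L(x), f_H(x')\big) = \mathrm{cov}\big(f_L(x), \rho f_L(x') + \delta(x')\big) = \rho\, k_1(x, x'; \theta_1),$$

and similarly $\mathrm{cov}\big(f_H(x), f_H(x')\big) = \rho^2 k_1(x, x'; \theta_1) + k_2(x, x'; \theta_2)$.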
Training Data

In the following, we assume that we have access to data with two levels of fidelity,

$$\{x_L, y_L\} \quad \text{and} \quad \{x_H, y_H\},$$

where $y_H$ has a higher level of fidelity. We use $N_L$ to denote the number of observations in $x_L$ and $N_H$ to denote the sample size of $x_H$.

Main Assumption: $N_H \ll N_L$.

Kennedy, Marc C., and Anthony O'Hagan. "Predicting the output from a complex computer code when fast approximations are available." Biometrika 87.1 (2000): 1-13.
Training

The hyper-parameters $\{\theta_1, \theta_2\}$, the parameter $\rho$, and the noise variance parameters $\{\sigma_L, \sigma_H\}$ can be trained by minimizing the negative log marginal likelihood

$$\mathrm{NLML}(\theta_1, \theta_2, \rho, \sigma_L, \sigma_H) = \frac{1}{2} y^T K^{-1} y + \frac{1}{2} \log |K| + \frac{N_L + N_H}{2} \log(2\pi),$$

resulting from $y \sim \mathcal{N}(0, K)$, where

$$y := \begin{bmatrix} y_L \\ y_H \end{bmatrix}, \qquad K = \begin{bmatrix} k_{LL}(x_L, x_L; \theta_1) + \sigma_L^2 I & k_{LH}(x_L, x_H; \theta_1, \rho) \\ k_{HL}(x_H, x_L; \theta_1, \rho) & k_{HH}(x_H, x_H; \theta_1, \theta_2, \rho) + \sigma_H^2 I \end{bmatrix}.$$

Kennedy, Marc C., and Anthony O'Hagan. "Predicting the output from a complex computer code when fast approximations are available." Biometrika 87.1 (2000): 1-13.
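A minimal sketch of this training objective, assuming squared exponential kernels for both $k_1$ and $k_2$ (the slides do not fix a particular choice here), with function names and parameter handling that are illustrative rather than the original implementation:

```python
import numpy as np

def se_kernel(a, b, gamma, w):
    """Squared exponential kernel used for both k1 and k2 in this sketch."""
    diff = a[:, None] - b[None, :]
    return gamma**2 * np.exp(-0.5 * diff**2 / w**2)

def multifidelity_nlml(params, x_L, y_L, x_H, y_H):
    """NLML of the Kennedy-O'Hagan model with the block covariance of the slides.

    params = (gamma1, w1, gamma2, w2, rho, sigma_L, sigma_H); keeping them as a
    flat tuple (and enforcing positivity elsewhere) is a simplification.
    """
    gamma1, w1, gamma2, w2, rho, sigma_L, sigma_H = params
    N_L, N_H = x_L.size, x_H.size

    k1 = lambda a, b: se_kernel(a, b, gamma1, w1)
    k2 = lambda a, b: se_kernel(a, b, gamma2, w2)

    # Block covariance matrix K from the slide.
    K_LL = k1(x_L, x_L) + sigma_L**2 * np.eye(N_L)
    K_LH = rho * k1(x_L, x_H)
    K_HH = rho**2 * k1(x_H, x_H) + k2(x_H, x_H) + sigma_H**2 * np.eye(N_H)
    K = np.block([[K_LL, K_LH], [K_LH.T, K_HH]])

    y = np.concatenate([y_L, y_H])
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (0.5 * y @ alpha
            + np.sum(np.log(np.diag(L)))                  # = 0.5 * log|K|
            + 0.5 * (N_L + N_H) * np.log(2 * np.pi))
```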
Prediction

Having trained the model, one can use the posterior distribution

$$f_H(x_*) \mid y \sim \mathcal{N}\left( q^T K^{-1} y,\; k_{HH}(x_*, x_*) - q^T K^{-1} q \right)$$

to make predictions at a new test point $x_*$. Here,

$$q^T = \begin{bmatrix} k_{HL}(x_*, x_L) & k_{HH}(x_*, x_H) \end{bmatrix}.$$

Kennedy, Marc C., and Anthony O'Hagan. "Predicting the output from a complex computer code when fast approximations are available." Biometrika 87.1 (2000): 1-13.
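As in the single-fidelity case, this posterior follows from conditioning the joint Gaussian distribution of the test output and the observations,

$$\begin{bmatrix} f_H(x_*) \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} k_{HH}(x_*, x_*) & q^T \\ q & K \end{bmatrix} \right),$$

on the observed data $y$.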
Example (Code)

Kennedy, Marc C., and Anthony O'Hagan. "Predicting the output from a complex computer code when fast approximations are available." Biometrika 87.1 (2000): 1-13.