Solving High-dimensional PDEs Using Deep Learning
Jiequn Han
The Program in Applied & Computational Mathematics, Princeton University
Joint work with Weinan E and Arnulf Jentzen
Inverse Problems and Machine Learning, Caltech, February 9, 2018
Outline
1. Introduction
2. Mathematical Formulation
3. Neural Network Approximation
4. Numerical Examples
5. Summary
Well-known Examples of PDEs
• The Schrödinger equation in the quantum many-body problem,
$$i\hbar\,\frac{\partial}{\partial t}\Psi(t,x) = \Big(-\frac{1}{2}\Delta + V\Big)\Psi(t,x).$$
• The Black-Scholes equation for pricing financial derivatives,
$$v_t + \frac{1}{2}\mathrm{Tr}\big(\sigma\sigma^T(\mathrm{Hess}_x v)\big) + r\,\nabla v \cdot x - rv = 0.$$
• The Hamilton-Jacobi-Bellman equation in stochastic control (dynamic programming),
$$v_t + \max_u \Big\{\frac{1}{2}\mathrm{Tr}\big(\sigma\sigma^T(\mathrm{Hess}_x v)\big) + \nabla v \cdot b + f\Big\} = 0.$$
Curse of Dimensionality
• The dimension of PDEs can easily be large in practice.

  Equation                 | Dimension (roughly)
  -------------------------|----------------------------------
  Schrödinger equation     | # of electrons × 3
  Black-Scholes equation   | # of underlying financial assets
  HJB equation             | the same as the state space

• A key computational challenge is the curse of dimensionality: the complexity of finite difference/element methods is exponential in the dimension d, so they are usually infeasible for d ≥ 4.
• There is a huge gap between PDE models and computational algorithms.
Remarkable Success of Deep Learning
• Machine learning/data analysis faces the same curse of dimensionality.
• In recent years, deep learning has achieved remarkable success.
• An old but essential idea: represent functions in a compositional form rather than an additive one.
Related Work in the High-dimensional Case
• Linear parabolic PDEs: Monte Carlo methods based on the Feynman-Kac formula
• Semilinear parabolic PDEs:
  1. branching diffusion approach (Henry-Labordère 2012, Henry-Labordère et al. 2014)
  2. multilevel Picard approximation (E et al. 2016)
• Hamilton-Jacobi PDEs: using the Hopf formula and fast convex/nonconvex optimization methods (Darbon & Osher 2016, Chow et al. 2017)
Semilinear Parabolic PDE
We consider a general semilinear parabolic PDE in $[0,T] \times \mathbb{R}^d$:
$$\frac{\partial u}{\partial t}(t,x) + \frac{1}{2}\mathrm{Tr}\Big(\sigma\sigma^T(t,x)\,(\mathrm{Hess}_x u)(t,x)\Big) + \nabla u(t,x)\cdot\mu(t,x) + f\big(t,x,u(t,x),\sigma^T(t,x)\nabla u(t,x)\big) = 0.$$
• The terminal condition is given: $u(T,x) = g(x)$.
• To fix ideas, we are interested in the solution at $t = 0$, $x = \xi$ for some vector $\xi \in \mathbb{R}^d$.
Connection between PDE and BSDE
• The link between parabolic PDEs and backward stochastic differential equations (BSDEs) has been extensively investigated (Pardoux & Peng 1992, El Karoui et al. 1997, etc.).
• In particular, Markovian BSDEs give a nonlinear Feynman-Kac representation of some nonlinear parabolic PDEs.
• Consider the following BSDE:
$$X_t = \xi + \int_0^t \mu(s,X_s)\,ds + \int_0^t \sigma(s,X_s)\,dW_s,$$
$$Y_t = g(X_T) + \int_t^T f(s,X_s,Y_s,Z_s)\,ds - \int_t^T (Z_s)^T\,dW_s.$$
The solution is an adapted process $\{(X_t,Y_t,Z_t)\}_{t\in[0,T]}$ with values in $\mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d$.
Connection between PDE and BSDE
• Under suitable regularity assumptions, the BSDE is well-posed and related to the PDE in the sense that for all $t \in [0,T]$ it holds a.s. that
$$Y_t = u(t,X_t) \quad \text{and} \quad Z_t = \sigma^T(t,X_t)\,\nabla u(t,X_t).$$
• In other words, given the stochastic process satisfying
$$X_t = \xi + \int_0^t \mu(s,X_s)\,ds + \int_0^t \sigma(s,X_s)\,dW_s,$$
the solution of the PDE satisfies the following SDE:
$$u(t,X_t) - u(0,X_0) = -\int_0^t f\big(s,X_s,u(s,X_s),\sigma^T(s,X_s)\nabla u(s,X_s)\big)\,ds + \int_0^t [\nabla u(s,X_s)]^T \sigma(s,X_s)\,dW_s.$$
BSDE and Control – An LQG Example
Consider a classical linear-quadratic-Gaussian (LQG) control problem in $\mathbb{R}^d$:
$$dX_t = 2\sqrt{\lambda}\, m_t\, dt + \sqrt{2}\, dW_t,$$
with cost functional $J(\{m_t\}_{0 \le t \le T}) = \mathbb{E}\big[\int_0^T \|m_t\|_2^2\, dt + g(X_T)\big]$.
The HJB equation for this problem is
$$\frac{\partial u}{\partial t}(t,x) + \Delta u(t,x) - \lambda \|\nabla u(t,x)\|_2^2 = 0.$$
The optimal control is given by
$$m_t^* = -\sqrt{\lambda}\, \nabla u(t,x) \qquad \big(\text{recall } Z_t = \sigma^T(t,X_t)\,\nabla u(t,X_t)\big).$$
In the context of BSDE for control, $Y_t$ denotes the optimal value and $Z_t$ denotes the optimal control (up to a constant scaling).
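As a sanity check on the signs, minimizing the Hamiltonian pointwise (a standard dynamic-programming step, spelled out here for completeness) recovers both the HJB equation and the optimal control:
$$\frac{\partial u}{\partial t} + \min_{m}\Big\{ 2\sqrt{\lambda}\, m \cdot \nabla u + \|m\|_2^2 \Big\} + \Delta u = 0, \qquad m^* = -\sqrt{\lambda}\, \nabla u,$$
and substituting $m^*$ turns the braced term into $-\lambda\|\nabla u\|_2^2$, which is exactly the HJB equation above.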
Neural Network Approximation
• Key step: approximate the function $x \mapsto \sigma^T(t,x)\nabla u(t,x)$ at each discretized time step $t = t_n$ by a feedforward neural network
$$\sigma^T(t_n, X_{t_n})\,\nabla u(t_n, X_{t_n}) = (\sigma^T\nabla u)(t_n, X_{t_n}) \approx (\sigma^T\nabla u)(t_n, X_{t_n} \mid \theta_n),$$
where $\theta_n$ denotes the neural network parameters.
• Observation: we can stack all the subnetworks together to form a deep neural network (DNN) as a whole, based on the time discretization (see the next two slides).
Time Discretization
We consider the simple Euler scheme of the BSDE, with a partition of the time interval $[0,T]$, $0 = t_0 < t_1 < \dots < t_N = T$:
$$X_{t_{n+1}} - X_{t_n} \approx \mu(t_n, X_{t_n})\,\Delta t_n + \sigma(t_n, X_{t_n})\,\Delta W_n,$$
and
$$u(t_{n+1}, X_{t_{n+1}}) - u(t_n, X_{t_n}) \approx -f\big(t_n, X_{t_n}, u(t_n, X_{t_n}), \sigma^T(t_n, X_{t_n})\nabla u(t_n, X_{t_n})\big)\,\Delta t_n + [\nabla u(t_n, X_{t_n})]^T \sigma(t_n, X_{t_n})\,\Delta W_n,$$
where $\Delta t_n = t_{n+1} - t_n$, $\Delta W_n = W_{t_{n+1}} - W_{t_n}$.
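In code, one Euler step of this pair of recursions is only a few lines. The sketch below is illustrative, not part of the talk's implementation; every callable name (mu, sigma, f, grad_u) is a hypothetical stand-in, and grad_u plays the role of $\nabla u(t_n, \cdot)$, which the method will approximate with a subnetwork:

```python
import numpy as np

def euler_step(x, y, t_n, dt, dW, mu, sigma, f, grad_u):
    """One Euler step of the discretized FBSDE above (a sketch).

    For simplicity sigma is assumed state-independent: sigma(t) -> (d, d) matrix.
    x: (batch, d) states X_{t_n};  y: (batch,) values u(t_n, X_{t_n});
    dW: (batch, d) increments W_{t_{n+1}} - W_{t_n};  grad_u(t, x): (batch, d).
    """
    s = sigma(t_n)
    z = grad_u(t_n, x) @ s                          # row-wise sigma^T grad u, i.e. Z_{t_n}
    y_next = y - f(t_n, x, y, z) * dt + np.sum(z * dW, axis=1)   # backward equation
    x_next = x + mu(t_n, x) * dt + dW @ s.T                      # forward equation
    return x_next, y_next
```

Iterating this from $(\xi,\, u(0,\xi))$ over $n = 0, \dots, N-1$ produces the forward pass of the whole network once grad_u is replaced by the subnetworks of the previous slide.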
Network Architecture
Figure: Network architecture for solving parabolic PDEs. Each column corresponds to a subnetwork at time $t = t_n$. The whole network has $(H+2)(N-1)$ layers in total.
Optimization
• This network takes the paths $\{X_{t_n}\}_{0\le n\le N}$ and $\{W_{t_n}\}_{0\le n\le N}$ as input data and gives the final output, denoted by $\hat u\big(\{X_{t_n}\}_{0\le n\le N}, \{W_{t_n}\}_{0\le n\le N}\big)$, as an approximation of $u(t_N, X_{t_N})$.
• The mismatch with the given terminal condition defines the expected loss function
$$\ell(\theta) = \mathbb{E}\Big[\big|g(X_{t_N}) - \hat u\big(\{X_{t_n}\}_{0\le n\le N}, \{W_{t_n}\}_{0\le n\le N}\big)\big|^2\Big].$$
• The paths can be simulated easily. Therefore the commonly used SGD algorithm fits this problem well.
• We call the introduced methodology the deep BSDE method, since we use the BSDE and DNN as essential tools.
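A self-contained training sketch for the LQG/HJB example might look as follows. This is our reading of the method, not the authors' released code; the hyperparameters (batch size, learning rate, number of time steps) are illustrative assumptions, and the terminal cost g is taken from the paper's LQG example rather than from these slides:

```python
import numpy as np
import tensorflow as tf

# Deep BSDE sketch for the d-dimensional LQG/HJB example: mu = 0, sigma = sqrt(2) I,
# and f(t, x, y, z) = -(lambda/2) ||z||^2, which recovers u_t + Delta u - lambda ||grad u||^2 = 0.
d, N, T, lam, batch = 100, 20, 1.0, 1.0, 64
dt = T / N

def make_subnet():
    # Subnetwork for x -> (sigma^T grad u)(t_n, x | theta_n):
    # two hidden layers of width d + 10, ReLU activations, linear output.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(d + 10, activation="relu"),
        tf.keras.layers.Dense(d + 10, activation="relu"),
        tf.keras.layers.Dense(d),
    ])

subnets = [make_subnet() for _ in range(N - 1)]   # one subnetwork per interior time step
y0 = tf.Variable(1.0)                             # trainable scalar: u(0, xi)
z0 = tf.Variable(tf.zeros([d]))                   # trainable vector: (sigma^T grad u)(0, xi)
g = lambda x: tf.math.log(0.5 * (1.0 + tf.reduce_sum(x ** 2, axis=1)))  # assumed terminal cost
optimizer = tf.keras.optimizers.Adam(1e-2)

def train_step():
    dW = tf.random.normal([N, batch, d], stddev=np.sqrt(dt))   # fresh Brownian increments
    with tf.GradientTape() as tape:
        x = tf.zeros([batch, d])                  # xi = (0, ..., 0)
        y = tf.ones([batch]) * y0
        z = tf.broadcast_to(z0, [batch, d])
        for n in range(N):
            # Backward equation stepped forward: y_{n+1} = y_n - f dt + z . dW_n.
            y = y + 0.5 * lam * tf.reduce_sum(z ** 2, axis=1) * dt \
                + tf.reduce_sum(z * dW[n], axis=1)
            x = x + np.sqrt(2.0) * dW[n]          # forward equation: X_{n+1} = X_n + sqrt(2) dW_n
            if n < N - 1:
                z = subnets[n](x)                 # (sigma^T grad u)(t_{n+1}, X_{t_{n+1}} | theta_{n+1})
        loss = tf.reduce_mean((g(x) - y) ** 2)    # terminal-condition mismatch
    variables = [y0, z0] + [v for net in subnets for v in net.trainable_variables]
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```

Calling train_step in a loop and reading off float(y0) gives the approximation of $u(0,\xi)$; since every iteration draws fresh simulated paths, the optimizer never runs out of data, which is why plain SGD/Adam fits the problem so well.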
Time Discretization as Skip Connection
Why can such deep networks be trained? Intuition: there are skip connections between different subnetworks:
$$u(t_{n+1}, X_{t_{n+1}}) - u(t_n, X_{t_n}) \approx -f\big(t_n, X_{t_n}, u(t_n, X_{t_n}), (\sigma^T\nabla u)(t_n, X_{t_n} \mid \theta_n)\big)\,\Delta t_n + (\sigma^T\nabla u)(t_n, X_{t_n} \mid \theta_n)\,\Delta W_n.$$
Analogy to Deep Reinforcement Learning
• Deep reinforcement learning (DRL) has achieved great success in game domains and sophisticated control tasks. A common strategy is to represent the policy function (control) through neural networks.
• Recall that in the LQG control example, $Z_t$ denotes the optimal control, which is approximated by neural networks.

Table: Informal analogy
  Deep BSDE method          |    | DRL
  BSDE                      | ↔ | Markov decision model
  gradient of the solution  | ↔ | optimal policy function
Implementation
• Each subnetwork has 4 layers: 1 input layer (d-dimensional), 2 hidden layers (both (d+10)-dimensional), and 1 output layer (d-dimensional).
• We choose the rectifier function (ReLU) as the activation function and optimize with the Adam method.
• The method is implemented in TensorFlow, and all reported examples were run on a MacBook Pro.
• GitHub: https://github.com/frankhan91/DeepBSDE
LQG Example Revisited
We solve the introduced HJB equation in $[0,1] \times \mathbb{R}^{100}$. It admits an explicit formula, which allows an accuracy test:
$$u(t,x) = -\frac{1}{\lambda}\,\ln\Big(\mathbb{E}\Big[\exp\big(-\lambda\, g\big(x + \sqrt{2}\, W_{T-t}\big)\big)\Big]\Big).$$
Figure: Left: relative error of the deep BSDE method for $u(t{=}0, x{=}(0,\dots,0))$ when $\lambda = 1$, which achieves 0.17% in a runtime of 330 seconds. Right: optimal cost $u(t{=}0, x{=}(0,\dots,0))$ against different $\lambda \in [0, 50]$, comparing the Deep BSDE Solver with Monte Carlo.
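The explicit formula is straightforward to evaluate by Monte Carlo, which is how a reference curve like the one above can be produced. A minimal sketch, assuming the terminal cost $g(x) = \ln\big(\tfrac{1}{2}(1 + \|x\|^2)\big)$ used in the paper's example (that choice of g is not stated on this slide):

```python
import numpy as np

def lqg_reference(x, t, T, lam, g, n_mc=100_000, rng=None):
    """Monte Carlo evaluation of the explicit LQG solution (a sketch):
    u(t, x) = -(1/lam) * ln E[exp(-lam * g(x + sqrt(2) * W_{T-t}))]."""
    rng = rng or np.random.default_rng(0)
    w = rng.normal(0.0, np.sqrt(T - t), size=(n_mc, x.shape[0]))  # samples of W_{T-t}
    return -np.log(np.mean(np.exp(-lam * g(x + np.sqrt(2.0) * w)))) / lam

g = lambda x: np.log(0.5 * (1.0 + np.sum(x ** 2, axis=-1)))  # assumed terminal cost
u0 = lqg_reference(np.zeros(100), t=0.0, T=1.0, lam=1.0, g=g)
```

Sweeping lam over a grid reproduces the right panel; the left panel compares this reference value at $\lambda = 1$ with the deep BSDE output.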
Black-Scholes Equation with Default Risk
• The classical Black-Scholes model can and should be augmented by some important factors in real markets, including defaultable securities, transaction costs, uncertainties in the model parameters, etc.
• Ideally, pricing models should take into account the whole basket of underlyings of a financial derivative, resulting in high-dimensional nonlinear PDEs.
• To test the deep BSDE method, we study a special case of the recursive valuation model with default risk (Duffie et al. 1996, Bender et al. 2015).