Multi-resolution Inference of Stochastic Models from Partially Observed Data
Samuel Kou, Department of Statistics, Harvard University
Joint work with Ben Olding
Stochastic Differential Equations
- Models based on stochastic differential equations (SDEs) are widely used in science and engineering.
- General form: $dY_t = \mu(Y_t; \theta)\,dt + \sigma(Y_t; \theta)\,dB_t$
  $Y_t$: the process; $\theta$: the underlying parameters
- Example 1 (chemistry and biology): a reversible enzymatic reaction $A \leftrightarrow B$ with potential $U(y)$ is typically modeled as
  $dY_t = -U'(Y_t)\,dt + \sigma\,dB_t$
  [Figure: double-well energy landscape over the reaction coordinate $y$, with barrier heights $E_A$ and $E_B$]
  $\theta$: the energy barrier heights $E_A$, $E_B$, etc.; $y$: the reaction coordinate
- Example 2 (finance and economics): the Feller process (a.k.a. the CIR process) has been used to model interest rates:
  $dY_t = \gamma(\mu - Y_t)\,dt + \sigma\sqrt{Y_t}\,dB_t$
Statistical Inference
- Given a stochastic model, infer the parameter values from data.
- Major complication: the continuous-time model is only observed at discrete time points. Examples:
  (i) Biology and chemistry experiments can track the movement of molecules only at discrete camera frames.
  (ii) In finance and economics, interest rates, price indices, etc. are only observed daily, weekly, or monthly.
Likelihood Inference
- Data $(Y_1, t_1), (Y_2, t_2), \ldots, (Y_n, t_n)$ from the SDE model.
- Likelihood: $L(\theta) = \prod_{i=1}^{n-1} f(Y_{i+1} \mid Y_i,\, t_{i+1} - t_i,\, \theta)$, where $f(y \mid x, t, \theta)$ is the transition density.
- In most cases $f$ has no analytical form, and solving the associated PDE numerically is not feasible either.
The Euler Approximation
- Idea: approximate the SDE by a difference equation:
  $Y_{t+\Delta t} = Y_t + \mu(Y_t; \theta)\,\Delta t + \sigma(Y_t; \theta)\sqrt{\Delta t}\,Z, \quad Z \sim N(0, 1)$
- Obtain an approximate likelihood from the difference equation.
- Works well only if $\Delta t$ is small.
[Figure: Euler-approximate vs. exact transition density; data generated from an Ornstein-Uhlenbeck process]
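To make the Euler scheme concrete, here is a minimal sketch (not from the slides; the Ornstein-Uhlenbeck parameter values are purely illustrative) that simulates a path of $dY_t = \gamma(\mu - Y_t)\,dt + \sigma\,dB_t$ with the difference equation above:

```python
import numpy as np

def euler_ou_path(y0, gamma, mu, sigma, dt, n_steps, rng):
    """Simulate an Ornstein-Uhlenbeck path with the Euler scheme:
    Y_{t+dt} = Y_t + gamma*(mu - Y_t)*dt + sigma*sqrt(dt)*Z,  Z ~ N(0, 1)."""
    y = np.empty(n_steps + 1)
    y[0] = y0
    for i in range(n_steps):
        z = rng.standard_normal()
        y[i + 1] = y[i] + gamma * (mu - y[i]) * dt + sigma * np.sqrt(dt) * z
    return y

rng = np.random.default_rng(0)
path = euler_ou_path(y0=1.0, gamma=0.5, mu=0.0, sigma=0.3,
                     dt=0.01, n_steps=1000, rng=rng)
print(path[-1])
```

The OU process is a natural test case because its exact transition density is known, so the Euler approximation can be checked against the truth, as in the figure above.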
Bayesian Data Augmentation
- If $\Delta t$ is not "sufficiently small":
  - Choose a finer grid with spacing small enough that the Euler approximation is appropriate:
    $y_{obs}\;\; y_{mis}\;\; y_{mis}\;\; y_{mis}\;\; y_{obs}\;\; y_{mis}\;\; y_{mis}\;\; y_{mis}\;\; y_{obs}$
  - Treat the unobserved values of $Y_t$ as missing data:
    $P(\theta \mid y_{obs}) \propto \int P(y_{obs}, y_{mis} \mid \theta)\,\pi(\theta)\,dy_{mis}$
- Data augmentation:
  $P(\theta, y_{mis} \mid y_{obs}) \propto P(y_{obs}, y_{mis} \mid \theta)\,\pi(\theta)$
- Use Monte Carlo to perform the augmentation.
Bayesian Data Augmentation (ctd)
[Figure: augmented posterior at $k = 31$ vs. the exact posterior]
- The idea appeared simultaneously in the statistics and econometrics literature in the late 1990s: Elerian, Chib & Shephard (2001); Eraker (2001); Jones (1998).
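Both conditional updates of the augmentation sampler, for $\theta$ and for $y_{mis}$, evaluate the Euler complete-data likelihood of the fully imputed path. A minimal sketch of that likelihood (my own illustration, using OU drift and volatility; the function names and parameter values are not from the slides):

```python
import numpy as np

def euler_loglik(path, dt, drift, vol, theta):
    """Euler (Gaussian increment) log-likelihood of a fully augmented path:
    Y_{i+1} | Y_i ~ N(Y_i + drift(Y_i, theta)*dt, vol(Y_i, theta)**2 * dt)."""
    y, y_next = path[:-1], path[1:]
    mean = y + drift(y, theta) * dt
    var = vol(y, theta) ** 2 * dt
    return np.sum(-0.5 * np.log(2 * np.pi * var)
                  - (y_next - mean) ** 2 / (2 * var))

# Illustrative OU drift and volatility; theta = (gamma, mu, sigma).
drift = lambda y, th: th[0] * (th[1] - y)
vol = lambda y, th: th[2] * np.ones_like(y)

path = np.array([1.0, 0.9, 0.85, 0.8])  # observed + imputed values on the fine grid
print(euler_loglik(path, dt=0.05, drift=drift, vol=vol, theta=(0.5, 0.0, 0.3)))
```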
Monte Carlo: Not That Easy
- The smaller the $\Delta t$, the more accurate the approximation.
- However, the smaller the $\Delta t$, the more missing data we need to augment: the dimensionality goes way up!
- The missing data are dependent as well: very slow convergence of the Gibbs sampler at small $\Delta t$.
- The dilemma:
  - Low resolution (big $\Delta t$): runs quickly, but the result is inaccurate.
  - High resolution (small $\Delta t$): a good approximation, but painfully slow.
Multi-resolution Idea
- Utilize the strengths of different resolutions while avoiding their weaknesses.
- Simultaneously work on multiple resolutions:
  - "Rough" approximations quickly locate the important regions.
  - "Fine" approximations get a jump start, then accurately explore the space.
[Figure: ladder of successively finer resolutions]
Multi-resolution Sampler
- Consider multiple resolutions (i.e., approximation levels) together; associate each level with a Monte Carlo chain.
- Start at the lowest level with a Monte Carlo sampler (such as the Gibbs sampler); record the results.
- Move on to the 2nd level (the cross-level move is sketched in code below):
  - In each MC update, with probability $p$ do a Gibbs step.
  - With probability $1 - p$: draw $y$ from the previous lower-level chain, augment $y$ to $(y, y')$ by "upsampling", and accept $(y, y')$ with probability
    $r = \min\left\{1,\; \frac{L^{(k+1)}(y, y')\,L^{(k)}(y_{old})\,T(y_{old} \to y'_{old})}{L^{(k+1)}(y_{old}, y'_{old})\,L^{(k)}(y)\,T(y \to y')}\right\}$
    where $L^{(k)}$ denotes the likelihood at level $k$.
- Move on to the 3rd level, and so on.
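A sketch of one cross-level move, under stated assumptions: the callables `upsample`, `T_logpdf`, `loglik_fine`, and `loglik_coarse` are hypothetical names I introduce for the upsampling kernel $T$, its log-density, and the level-$(k+1)$ and level-$k$ log-likelihoods; this is an illustration of the acceptance ratio above, not the authors' implementation.

```python
import numpy as np

def cross_level_move(y_old, yp_old, lower_chain, upsample, T_logpdf,
                     loglik_fine, loglik_coarse, rng):
    """One cross-level proposal of the multi-resolution sampler (a sketch).

    y  : draw from the stored lower-resolution chain (distributed ~ L^(k))
    y' : extra fine-grid points imputed by the upsampling kernel T
    Accepted with the Metropolis-Hastings ratio r from the slide above.
    """
    y = lower_chain[rng.integers(len(lower_chain))]  # proposal from level-k chain
    yp = upsample(y, rng)                            # impute fine-level points y'
    log_r = (loglik_fine(y, yp) + loglik_coarse(y_old) + T_logpdf(y_old, yp_old)
             - loglik_fine(y_old, yp_old) - loglik_coarse(y) - T_logpdf(y, yp))
    if np.log(rng.uniform()) < log_r:
        return y, yp          # accept the cross-level proposal
    return y_old, yp_old      # reject: keep the current state
```

Because the proposal comes from the already-converged lower-level chain, this is an independence-type move: the coarse chain supplies good global locations, and the fine-level likelihood corrects them.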
A Pictorial Guide
[Figure: schematic of the multi-resolution sampler across states $y_1, y_2, \ldots, y_m$: with probability $p$ a Gibbs step, with probability $1 - p$ a cross-level proposal accepted with the ratio $r$ above]
The Comparison
[Figure: mixing of the multi-resolution sampler vs. the vanilla Gibbs sampler]
Multi-resolution Inference
- Observation: a by-product of the multi-resolution sampler is that we obtain multiple approximations to the same distribution.
- Question: can we combine them for inference, instead of using only the finest resolution?
- Idea: look for the trend across successive approximations and leap forward.
Illustration
Leap Forward: the Multi-resolution Extrapolation
Richardson Extrapolation
- Richardson (1927).
- $A = \lim_{h \to 0} A(h)$ is what we want.
- With resolution $h$:
  $A(h) = A + a_0 h^{k_0} + a_1 h^{k_1} + a_2 h^{k_2} + \cdots = A + a_0 h^{k_0} + O(h^{k_1})$
- But for resolution $h/2$:
  $A(h/2) = A + a_0 (h/2)^{k_0} + O(h^{k_1})$
- Then
  $\tilde{A}(h) \equiv \dfrac{2^{k_0} A(h/2) - A(h)}{2^{k_0} - 1} = A + O(h^{k_1})$
  is an order of magnitude better!
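A minimal worked example of the extrapolation formula (my illustration; the toy error function is contrived so the leading term cancels exactly):

```python
def richardson(A_h, A_h2, k0=1):
    """Richardson extrapolation: cancel the leading O(h^{k0}) error term
    using estimates at resolutions h and h/2."""
    return (2 ** k0 * A_h2 - A_h) / (2 ** k0 - 1)

# Toy check with A(h) = A + a0*h, where A = 1 and a0 = 0.5:
A = lambda h: 1.0 + 0.5 * h
print(richardson(A(0.2), A(0.1)))  # prints 1.0: the O(h) error cancels exactly
</antml_block_placeholder>
```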
Multi-resolution Extrapolation
- We have multiple posterior distributions from the multi-resolution sampler.
- Extrapolate the entire distribution by quantiles.
[Figure: posterior quantiles at resolutions $k = 3$ and $k = 7$]
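A sketch of quantile-by-quantile extrapolation, assuming (as in the Richardson setup above) a first-order leading error term, i.e. $k_0 = 1$; the Gaussian samples are stand-ins for posterior draws at two resolutions, not real output:

```python
import numpy as np

def extrapolate_quantiles(sample_h, sample_h2, probs, k0=1):
    """Extrapolate an entire posterior, quantile by quantile:
    Q~(p) = (2^{k0} * Q_{h/2}(p) - Q_h(p)) / (2^{k0} - 1)."""
    q_h = np.quantile(sample_h, probs)    # quantiles at the coarser resolution
    q_h2 = np.quantile(sample_h2, probs)  # quantiles at the finer resolution
    return (2 ** k0 * q_h2 - q_h) / (2 ** k0 - 1)

rng = np.random.default_rng(1)
probs = np.linspace(0.01, 0.99, 99)
coarse = rng.normal(1.10, 1.0, 50_000)  # stand-in draws, coarse resolution
fine = rng.normal(1.05, 1.0, 50_000)    # stand-in draws, finer resolution
print(extrapolate_quantiles(coarse, fine, probs)[49])  # extrapolated median ~ 1.0
```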
Inference for the GCIR Process
- $dY_t = \gamma(\mu - Y_t)\,dt + \sigma Y_t^{\psi}\,dB_t$, with $\theta = (\gamma, \mu, \sigma, \psi)$.
- A model for interest rates, bond rates, and exchange rates.
- No analytical form for the transition density.
- The data: [Figure]
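A minimal Euler simulation of the GCIR model, tying back to the earlier scheme (my sketch; the parameter values are illustrative, and the small positivity floor is a pragmatic device for the discretized path, not part of the model):

```python
import numpy as np

def euler_gcir_path(y0, gamma, mu, sigma, psi, dt, n_steps, rng):
    """Euler simulation of the GCIR model dY = gamma*(mu - Y)dt + sigma*Y^psi dB.
    The path is truncated at a small positive floor so Y^psi stays well defined."""
    y = np.empty(n_steps + 1)
    y[0] = y0
    for i in range(n_steps):
        z = rng.standard_normal()
        y[i + 1] = max(y[i] + gamma * (mu - y[i]) * dt
                       + sigma * y[i] ** psi * np.sqrt(dt) * z, 1e-8)
    return y

rng = np.random.default_rng(2)
print(euler_gcir_path(5.0, 0.5, 5.0, 0.8, 1.0, dt=1/250, n_steps=250, rng=rng)[-1])
```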
Results
- Posterior distributions:
[Figure: posterior densities from the 3+7 extrapolation vs. the $K = \infty$ benchmark; multi-resolution vs. vanilla Gibbs]
- Autocorrelation plots: faster and more accurate!
Results (continued)
- In Bayesian analysis, use MC samples to approximate posterior quantities of interest (e.g., mean, median, etc.):
  $\bar{\theta} \to E(\theta \mid Y_{obs})$
- Use quantiles of the MC sample to construct interval estimates:
  $\hat{Q}_{\theta}(\alpha) \to Q_{\theta}(\alpha \mid Y_{obs})$
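These summaries are plain sample statistics of the MC draws, as in this small sketch (the draws here are a synthetic stand-in, not the paper's output):

```python
import numpy as np

theta_draws = np.random.default_rng(3).normal(0.5, 0.1, 10_000)  # stand-in MC sample

post_mean = theta_draws.mean()                  # approximates E(theta | Y_obs)
ci_90 = np.quantile(theta_draws, [0.05, 0.95])  # 90% posterior interval estimate
print(post_mean, ci_90)
```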
- Compare the ratio of mean squared errors given the same time budget:
[Table: MSE comparison]
Application 1: Eurodollar Rate
- Use the GCIR model.
- 3-month Eurodollar deposit rate.
[Figure: 3-month Eurodollar deposit rate, 1/8/1971 to 1/8/2007]
Eurodollar Rate
- Posterior mean, median, and interval estimates for
  $dY_t = \gamma(\mu - Y_t)\,dt + \sigma Y_t^{\psi}\,dB_t$, $\theta = (\gamma, \mu, \sigma, \psi)$
Application 2: Inference from Optically Trapped Particle Data
- McCann et al. (1999): data on a particle in a bistable optical trap.
- Again, compare the ratio of mean squared errors given the same time budget:
[Table: MSE comparison]
Discussion
- We introduced the multi-resolution framework:
  - Efficient Monte Carlo with the multi-resolution sampler.
  - Accurate inference with the multi-resolution extrapolation.
- Extensible to higher dimensions.
- Extensible to state-space (HMM) models.
References
- Sørensen (2004) – survey paper
- Elerian, Chib & Shephard (2001) – original Gibbs paper
- Roberts & Stramer (2001) – SDE transformations
- Kloeden & Platen (1992) – book on the numerical solution of SDEs
Acknowledgements
- Ben Olding
- Jun Liu
- Xiao-Li Meng
- Wing Wong
- NSF, NIH
Thank you!
Extrapolation Theorem
- Assuming:
  - the drift and volatility functions $\mu(\cdot)$ and $\sigma^2(\cdot)$ have linear growth;
  - $\mu(\cdot)$ and $\sigma^2(\cdot)$ are twice continuously differentiable with bounded derivatives;
  - $\sigma^2(\cdot)$ is bounded from below;
- then for any integrable function $g(\theta)$ an expansion of the form below holds.
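The expansion itself did not survive extraction. A plausible form, consistent with the Richardson setup earlier in the deck (my reconstruction under the assumption of a first-order leading error, not the slides' exact statement):

```latex
E_{\Delta t}\!\left[g(\theta)\mid Y_{\mathrm{obs}}\right]
  = E\!\left[g(\theta)\mid Y_{\mathrm{obs}}\right]
    + c_g\,\Delta t + O\!\left(\Delta t^{2}\right),
```

where $E_{\Delta t}$ denotes expectation under the Euler-approximated posterior at step size $\Delta t$ and $c_g$ is a constant depending on $g$.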
Extrapolation Corollary
- Taking $g(\theta)$ to be an indicator function: if a posterior cdf $F$ has a non-zero derivative at all points, its quantiles can be expanded as shown below.
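Again, the displayed expansion is missing; the form implied by applying the theorem's expansion to $F$ and inverting (my reconstruction, hedged as above) would be:

```latex
F_{\Delta t}^{-1}(\alpha)
  = F^{-1}(\alpha) + c_\alpha\,\Delta t + O\!\left(\Delta t^{2}\right),
```

with $c_\alpha$ a constant depending on the quantile level $\alpha$. This is precisely what licenses the quantile-by-quantile Richardson extrapolation used by the multi-resolution extrapolation.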