Multivariate and Partially Observed Models
Erik Lindström
Briefly on multivariate models

Consider the Vector-AR(1) (VAR) process

   X_{n+1} = A X_n + ε_{n+1},   ε ∼ MVN(0, Σ)     (1)

Can we estimate the parameters? Yes, with a bit of matrix algebra tricks. Writing down the log-likelihood... [checking the Matrix Cookbook] leads to

   Â = ( ∑_{n=1}^{N−1} X_{n+1} X_n^T ) ( ∑_{n=1}^{N−1} X_n X_n^T )^{−1}     (2)

   Σ̂ = (1/(N−1)) ∑_{n=1}^{N−1} ε̂_n ε̂_n^T     (3)
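A minimal numerical sketch of these estimators, not part of the slides: it simulates a two-dimensional VAR(1) with numpy (the parameter values and variable names are illustrative) and applies the closed-form estimates (2)-(3).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 2-dimensional VAR(1): X_{n+1} = A X_n + eps_{n+1}, eps ~ MVN(0, Sigma)
A_true = np.array([[0.6, 0.1],
                   [0.0, 0.8]])
Sigma_true = np.array([[1.0, 0.3],
                       [0.3, 0.5]])
N = 5000
X = np.zeros((N, 2))
L = np.linalg.cholesky(Sigma_true)
for n in range(N - 1):
    X[n + 1] = A_true @ X[n] + L @ rng.standard_normal(2)

# A_hat = (sum_n X_{n+1} X_n^T) (sum_n X_n X_n^T)^{-1}
X0, X1 = X[:-1], X[1:]
A_hat = (X1.T @ X0) @ np.linalg.inv(X0.T @ X0)

# Sigma_hat = (1/(N-1)) sum_n eps_hat_n eps_hat_n^T, with eps_hat_n = X_{n+1} - A_hat X_n
resid = X1 - X0 @ A_hat.T
Sigma_hat = resid.T @ resid / (N - 1)

print(A_hat)
print(Sigma_hat)
```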
Motivation, partially observed models

◮ Used when regressors are unobservable
◮ E.g. when the regressor dimension is larger than the observable state dimension (think stoch. vol),
◮ or interest rate models,
◮ or credit models (hidden jump intensity process).
◮ Missing observations can be treated in this framework.
Examples

◮ Stochastic volatility

   y_t = σ_t η_t     (4)
   log σ_t^2 = a_0 + a_1 log σ_{t−1}^2 + e_t     (5)

◮ Short rate models

   dr_t = α(β − r_t) dt + √(γ + δ r_t) dW_t     (6)
   P(t, T) = A(t, T) e^{−B(t, T) r_t}     (7)
Stoch vol.

Let us start with the stoch. vol. model

   y_t = σ_t η_t     (8)
   log σ_t^2 = a_0 + a_1 log σ_{t−1}^2 + e_t     (9)

◮ σ_t^2 is not directly observable,
◮ but can be estimated.
◮ Likelihood: p(y_1, ..., y_T) = ∫ p(σ_1, y_1, ..., σ_T, y_T) dσ_{1:T}?
◮ Dependence structure?
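A minimal simulation sketch of the model (8)-(9), not from the slides; the parameter values are illustrative and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)

# log sigma_t^2 = a0 + a1 * log sigma_{t-1}^2 + e_t,   y_t = sigma_t * eta_t
a0, a1, sigma_e = -0.5, 0.95, 0.3
T = 1000
log_s2 = np.zeros(T)
log_s2[0] = a0 / (1 - a1)            # start the log-variance at its stationary mean
y = np.zeros(T)
for t in range(T):
    if t > 0:
        log_s2[t] = a0 + a1 * log_s2[t - 1] + sigma_e * rng.standard_normal()
    y[t] = np.exp(0.5 * log_s2[t]) * rng.standard_normal()

# Only y is observed; log_s2 is the hidden state we would like to filter.
```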
General state space models

All models we use can be written in general state space form

   y_t = h(x_t, η_t)     (10)
   x_t = f(x_{t−1}, e_t)     (11)

◮ x is a hidden (unobservable) Markov process (cf. HMM).
◮ y is observed.
◮ y_t | x_t is independent of y_s, s = 1..t−1, t+1..T.
◮ These rather simple structures can generate complex models!
Structure

All models we use can be written in state space form

   y_t = h(x_t, η_t)     (12)
   x_t = f(x_{t−1}, e_t)     (13)

◮ These equations imply transition probabilities, i.e. we can derive p(x_t | x_{t−1}) and p(y_t | x_t) from the model setup.
◮ We also need p(x_0), i.e. initial conditions.
Likelihood

The likelihood can be written as

   p(y_1, ..., y_T) = p(y_1) ∏_{t=2}^{T} p(y_t | y_{1:t−1}),

where y_{1:t−1} is shorthand notation for {y_1, ..., y_{t−1}}. We can write

   p(y_t | y_{1:t−1}) = ∫ p(y_t | x_t) p(x_t | y_{1:t−1}) dx_t

and

   p(x_t | y_{1:t−1}) = ∫ p(x_t | x_{t−1}) p(x_{t−1} | y_{1:t−1}) dx_{t−1}.
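A sanity check of the prediction decomposition, not from the slides: in the fully observed Gaussian AR(1) special case, p(y_t | y_{1:t−1}) = N(a y_{t−1}, s²), so the decomposed log-likelihood can be compared against the joint multivariate normal density. Parameter values are illustrative; numpy and scipy are assumed.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(1)

# Fully observed Gaussian AR(1): y_t = a*y_{t-1} + e_t, e_t ~ N(0, s2),
# with stationary start y_1 ~ N(0, s2/(1 - a^2)).
a, s2, T = 0.7, 0.5, 200
y = np.zeros(T)
y[0] = rng.normal(0, np.sqrt(s2 / (1 - a**2)))
for t in range(1, T):
    y[t] = a * y[t - 1] + rng.normal(0, np.sqrt(s2))

# Prediction decomposition: log p(y_1) + sum_t log p(y_t | y_{1:t-1})
ll_decomp = norm.logpdf(y[0], 0, np.sqrt(s2 / (1 - a**2)))
ll_decomp += norm.logpdf(y[1:], a * y[:-1], np.sqrt(s2)).sum()

# Joint density of the stationary Gaussian AR(1), Cov(y_s, y_t) = s2/(1-a^2) * a^|s-t|
idx = np.arange(T)
cov = s2 / (1 - a**2) * a ** np.abs(np.subtract.outer(idx, idx))
ll_joint = multivariate_normal.logpdf(y, mean=np.zeros(T), cov=cov)

print(ll_decomp, ll_joint)   # the two numbers agree
```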
Filter density

◮ The density for the hidden state x_t, using the information y_{1:t}, is called the filter density, p(x_t | y_{1:t}).
◮ We can derive the filter density from

   p(x_t | y_{1:t}) = p(y_t | x_t) p(x_t | y_{1:t−1}) / p(y_t | y_{1:t−1}),

or equivalently

   p(x_t | y_{1:t}) = p(y_t | x_t) p(x_t | y_{1:t−1}) / ∫ p(y_t | x_t) p(x_t | y_{1:t−1}) dx_t.
Predictive density

◮ The density for the hidden state x_t, using the information y_{1:t−1}, is called the predictive density, p(x_t | y_{1:t−1}).
◮ We can derive the predictive density from

   p(x_t | y_{1:t−1}) = ∫ p(x_t | x_{t−1}) p(x_{t−1} | y_{1:t−1}) dx_{t−1}.
Recursion

1. We have the filter density p(x_0) at time 0.
2. At time t, generate the predictive density p(x_{t+1} | y_{1:t}).
3. At time t+1, calculate p(y_{t+1} | y_{1:t}) and update the filter density p(x_{t+1} | y_{1:t+1}). Repeat from step 2.
Working recursions

Essentially there are only two recursions known in closed form:

◮ HMM (finite state space)
◮ Kalman filter (linear, Gaussian models)
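For the finite state space case the integrals reduce to sums, so the filter and predictive recursions become matrix-vector products. A minimal sketch, not from the slides, for a hypothetical two-state chain with Gaussian observations (numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import norm

# Two-state hidden Markov chain with Gaussian observations y_t | x_t = i ~ N(mu[i], 1)
P = np.array([[0.95, 0.05],      # transition matrix, P[i, j] = p(x_t = j | x_{t-1} = i)
              [0.10, 0.90]])
mu = np.array([-1.0, 2.0])
p0 = np.array([0.5, 0.5])        # initial distribution p(x_0)

def hmm_filter(y, P, mu, p0):
    """Forward filter: returns p(x_t | y_{1:t}) for all t and log p(y_{1:T})."""
    loglik = 0.0
    filt = []
    pred = p0 @ P                        # p(x_1 | y_{1:0}) = p(x_1)
    for yt in y:
        obs = norm.pdf(yt, mu, 1.0)      # p(y_t | x_t = i) for each state i
        joint = obs * pred
        c = joint.sum()                  # p(y_t | y_{1:t-1}), the normalizer
        loglik += np.log(c)
        f = joint / c                    # filter density p(x_t | y_{1:t})
        filt.append(f)
        pred = f @ P                     # predictive density p(x_{t+1} | y_{1:t})
    return np.array(filt), loglik
```

Calling hmm_filter on an observed series returns the filter densities and the accumulated log-likelihood, following steps 1-3 of the recursion above.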
Kalman filter [1]

Why does it give closed form recursions? Short answer: the Gaussian density is the exponential of a second order polynomial.

Model:

   Y_t = C X_t + η_t,   η_t ∈ N(0, Γ)
   X_t = A X_{t−1} + e_t,   e_t ∈ N(0, Σ)

Assume initial distribution p(x_0 | F_0) = φ(x_0; m_0, P_0).
Kalman filter [2]

Calculate the predictive density

   p(x_1 | F_0) = ∫ p(x_1 | x_0) p(x_0 | F_0) dx_0.

Here p(x_1 | x_0) = φ(x_1; A x_0, Σ), thus giving

   p(x_1 | F_0) ∝ ∫ e^{−(1/2)(x_1 − A x_0)^T Σ^{−1} (x_1 − A x_0)} e^{−(1/2)(x_0 − m_0)^T P_0^{−1} (x_0 − m_0)} dx_0.

Some calculations give

   p(x_1 | F_0) = φ(x_1; A m_0, A P_0 A^T + Σ).
Kalman filter [3]

The filter density is more complicated. We have

   p(x_t | y_{1:t}) = p(y_t | x_t) p(x_t | y_{1:t−1}) / ∫ p(y_t | x_t) p(x_t | y_{1:t−1}) dx_t.

Thus

   p(x_1 | y_1) = p(y_1 | x_1) p(x_1 | F_0) / p(y_1 | F_0) ∝ p(y_1 | x_1) p(x_1 | F_0).

Note that the likelihood is a normalization constant, independent of x_1.
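A minimal Kalman filter sketch for the model on slide [1], not taken from the slides and assuming numpy: it alternates the prediction and update steps and accumulates the log-likelihood from the normalizing constants p(y_t | y_{1:t−1}).

```python
import numpy as np

def kalman_filter(y, A, C, Sigma, Gamma, m0, P0):
    """Kalman filter for X_t = A X_{t-1} + e_t, Y_t = C X_t + eta_t.
    Returns filter means, filter covariances and the log-likelihood."""
    m, P = m0, P0
    means, covs, loglik = [], [], 0.0
    for yt in y:
        # Prediction: p(x_t | y_{1:t-1}) = N(m_pred, P_pred)
        m_pred = A @ m
        P_pred = A @ P @ A.T + Sigma
        # Innovation: y_t | y_{1:t-1} ~ N(C m_pred, S)
        v = yt - C @ m_pred
        S = C @ P_pred @ C.T + Gamma
        loglik += -0.5 * (np.log(np.linalg.det(2 * np.pi * S))
                          + v @ np.linalg.solve(S, v))
        # Update: filter density p(x_t | y_{1:t}) = N(m, P)
        K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
        m = m_pred + K @ v
        P = P_pred - K @ C @ P_pred
        means.append(m)
        covs.append(P)
    return np.array(means), np.array(covs), loglik
```

The first pass through the prediction stage reproduces the density φ(x_1; A m_0, A P_0 A^T + Σ) derived on slide [2], and the update stage is the normalization on this slide carried out in closed form.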