A General Class of Score-Driven Smoothers

Giuseppe Buccheri (Scuola Normale Superiore)
Joint work with Giacomo Bormetti (University of Bologna), Fulvio Corsi (University of Pisa and City University of London) and Fabrizio Lillo (University of Bologna)

IAAE 2018, Montréal, June 27, 2018
Key facts

Following Cox (1981), we divide time-varying parameter models into two classes:
1. Parameter-driven models: parameters evolve in time based on idiosyncratic innovations (e.g. local level, stochastic volatility, stochastic intensity)
2. Observation-driven models: parameters evolve in time based on nonlinear functions of past observations (e.g. GARCH, MEM, DCC, score-driven models)

We shall see that there is a trade-off between:
1. Estimation complexity and computational speed
◮ Here observation-driven models are superior
2. Flexibility
◮ Here parameter-driven models are superior

Why a difference in flexibility?
◮ Observation-driven: $\mathrm{Var}[f_{t+1} \mid \mathcal{F}_t] = 0$ but $\mathrm{Var}[f_{t+1}] > 0$
◮ Parameter-driven: $\mathrm{Var}[f_{t+1} \mid \mathcal{F}_t] > 0$ and $\mathrm{Var}[f_{t+1}] > 0$
A different interpretation

Consider a standard GARCH(1,1) model:
$$r_t = \sigma_t \epsilon_t, \qquad \epsilon_t \sim N(0, 1)$$
$$\sigma^2_{t+1} = c + a r_t^2 + b \sigma_t^2$$

There are two possible interpretations of the dynamic equation for $\sigma^2_{t+1}$:
1. It is the true DGP of volatility
2. Since $\sigma^2_{t+1}$ is $\mathcal{F}_t$-measurable, it can be seen as a filter, i.e. $\sigma^2_{t+1} = \mathrm{E}[\zeta^2_{t+1} \mid \mathcal{F}_t]$, where $\zeta_{t+1}$ is the volatility of the true, parameter-driven DGP (e.g. the SV model)

Interpretation 1 is more common in the financial econometrics literature, while 2 is closer to the filtering literature.
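As a concrete illustration of the filtering view, here is a minimal Python sketch of the GARCH(1,1) recursion run as a one-step-ahead variance filter on a given return series; the parameter values, initialization and simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def garch_filter(y, c, a, b, sigma2_0):
    """Run the GARCH(1,1) recursion sigma2[t+1] = c + a*y[t]**2 + b*sigma2[t],
    interpreted as a filter for the unobserved conditional variance."""
    n = len(y)
    sigma2 = np.empty(n + 1)
    sigma2[0] = sigma2_0
    for t in range(n):
        sigma2[t + 1] = c + a * y[t] ** 2 + b * sigma2[t]
    return sigma2  # sigma2[t] is F_{t-1}-measurable: a one-step-ahead "estimate"

# Illustrative use on simulated Gaussian returns (parameters are assumptions)
rng = np.random.default_rng(0)
y = 0.01 * rng.standard_normal(1000)
sigma2_filtered = garch_filter(y, c=1e-6, a=0.05, b=0.90, sigma2_0=1e-4)
```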
An example: ARCH filtering and smoothing

◮ "Filtering and forecasting with misspecified ARCH models I. Getting the right variance with the wrong model", Nelson (1992), JoE
◮ "Asymptotic filtering theory for univariate ARCH models", Nelson & Foster (1994), Ecta
◮ "Filtering and forecasting with misspecified ARCH models II. Making the right forecast with the wrong model", Nelson & Foster (1995), JoE
◮ "Asymptotically optimal smoothing with ARCH models", Nelson (1996), Ecta

Quoting Nelson (1992): "Note that our use of the term 'estimate' corresponds to its use in the filtering literature rather than the statistics literature; that is, an ARCH model with (given) fixed parameters produces 'estimates' of the true underlying conditional covariance matrix at each point in time in the same sense that a Kalman filter produces 'estimates' of unobserved state variables in a linear system"
Motivations and Objectives

A key observation
◮ Observation-driven models as DGPs → all relevant information is contained in past observations → no room for smoothing
◮ Observation-driven models as filters → can benefit from using all observations → smoothing is useful

Related literature
◮ Little attention has been paid to the problem of smoothing with misspecified observation-driven models. Harvey (2013) proposed a smoothing algorithm for a dynamic Student-t location model.

Objective of this paper
◮ Fill this gap by proposing a methodology to smooth the filtered estimates of a general class of observation-driven models, namely the score-driven models of Creal et al. (2013) and Harvey (2013)
Filtering and smoothing in linear Gaussian models

Consider the general linear Gaussian model:
$$y_t = Z \alpha_t + \epsilon_t, \qquad \epsilon_t \sim N(0, H)$$
$$\alpha_{t+1} = c + T \alpha_t + \eta_t, \qquad \eta_t \sim N(0, Q)$$

Kalman forward filter → $a_{t+1} = \mathrm{E}[\alpha_{t+1} \mid \mathcal{F}_t]$, $P_{t+1} = \mathrm{Var}[\alpha_{t+1} \mid \mathcal{F}_t]$:
$$v_t = y_t - Z a_t, \qquad F_t = Z P_t Z' + H, \qquad K_t = T P_t Z' F_t^{-1}$$
$$a_{t+1} = c + T a_t + K_t v_t, \qquad P_{t+1} = T P_t (T - K_t Z)' + Q$$
for $t = 1, \ldots, n$.

Kalman backward smoother → $\hat{\alpha}_t = \mathrm{E}[\alpha_t \mid \mathcal{F}_n]$, $\hat{P}_t = \mathrm{Var}[\alpha_t \mid \mathcal{F}_n]$, $t \leq n$:
$$r_{t-1} = Z' F_t^{-1} v_t + L_t' r_t, \qquad N_{t-1} = Z' F_t^{-1} Z + L_t' N_t L_t$$
$$\hat{\alpha}_t = a_t + P_t r_{t-1}, \qquad \hat{P}_t = P_t - P_t N_{t-1} P_t$$
with $L_t = T - K_t Z$, $r_n = 0$, $N_n = 0$ and $t = n, \ldots, 1$.

◮ The conditional density is written as $\log p(y_t \mid \mathcal{F}_{t-1}) = -\frac{1}{2} \log |F_t| - \frac{1}{2} v_t' F_t^{-1} v_t$ (up to an additive constant)
◮ As $Z$, $H$, $T$, $Q$ are constant, the variance recursion has a fixed-point solution $\bar{P}$ that is referred to as the steady state of the Kalman filter
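For reference, a minimal NumPy sketch of the forward-filter and backward-smoother recursions on this slide; the function name, array shapes and the initial conditions $a_1$, $P_1$ are illustrative assumptions.

```python
import numpy as np

def kalman_filter_smoother(y, Z, H, c, T, Q, a1, P1):
    """Kalman forward filter and backward smoother for the linear Gaussian model
    y_t = Z alpha_t + eps_t,  alpha_{t+1} = c + T alpha_t + eta_t.
    y: (n, p) array of observations. Returns the predicted states a_t, the
    smoothed states alpha_hat_t and the smoothed variances P_hat_t."""
    n, p = y.shape
    m = T.shape[0]
    a = np.zeros((n + 1, m)); P = np.zeros((n + 1, m, m))
    v = np.zeros((n, p)); Finv = np.zeros((n, p, p)); L = np.zeros((n, m, m))
    a[0], P[0] = a1, P1
    for t in range(n):                               # forward filter
        v[t] = y[t] - Z @ a[t]
        Finv[t] = np.linalg.inv(Z @ P[t] @ Z.T + H)  # F_t^{-1}
        K = T @ P[t] @ Z.T @ Finv[t]                 # Kalman gain K_t
        L[t] = T - K @ Z
        a[t + 1] = c + T @ a[t] + K @ v[t]
        P[t + 1] = T @ P[t] @ L[t].T + Q
    alpha_hat = np.zeros((n, m)); P_hat = np.zeros((n, m, m))
    r = np.zeros(m); N = np.zeros((m, m))            # r_n = 0, N_n = 0
    for t in range(n - 1, -1, -1):                   # backward smoother
        r = Z.T @ Finv[t] @ v[t] + L[t].T @ r        # r_{t-1}
        N = Z.T @ Finv[t] @ Z + L[t].T @ N @ L[t]    # N_{t-1}
        alpha_hat[t] = a[t] + P[t] @ r
        P_hat[t] = P[t] - P[t] @ N @ P[t]
    return a[:n], alpha_hat, P_hat
```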
A more general representation

Introduce the score and information matrix of the conditional density:
$$\nabla_t = \frac{\partial \log p(y_t \mid \mathcal{F}_{t-1})}{\partial a_t}, \qquad \mathcal{I}_{t|t-1} = \mathrm{E}_{t-1}[\nabla_t \nabla_t']$$
After some simple algebra, we can re-write the Kalman filter and smoothing recursions for the mean in the steady state as:
$$a_{t+1} = c + T a_t + R \nabla_t, \qquad (1)$$
where $R = T \bar{P}$, and:
$$r_{t-1} = \nabla_t + L_t' r_t \qquad (2)$$
$$\hat{\alpha}_t = a_t + T^{-1} R r_{t-1} \qquad (3)$$
where $L_t = T - R \mathcal{I}_{t|t-1}$.

◮ Kalman recursions for the mean re-parametrized in terms of $\nabla_t$ and $\mathcal{I}_{t|t-1}$
◮ The new representation is more general, as it only relies on the conditional density $p(y_t \mid \mathcal{F}_{t-1})$, which is defined for any observation-driven model
◮ The Kalman filter is a score-driven process
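A brief sketch of the "simple algebra" behind eqs. (1)–(3), assuming the Gaussian conditional density $y_t \mid \mathcal{F}_{t-1} \sim N(Z a_t, F_t)$; the intermediate steps are my reconstruction and are not spelled out on the slide:
$$\nabla_t = Z' F_t^{-1}(y_t - Z a_t) = Z' F_t^{-1} v_t, \qquad \mathcal{I}_{t|t-1} = \mathrm{E}_{t-1}[\nabla_t \nabla_t'] = Z' F_t^{-1} Z .$$
In the steady state ($P_t = \bar{P}$, $F_t = Z \bar{P} Z' + H$) the gain term of the filter becomes
$$K v_t = T \bar{P} Z' F^{-1} v_t = R \nabla_t, \qquad R = T \bar{P},$$
so $L = T - K Z = T - R \, \mathcal{I}_{t|t-1}$, and substituting $\bar{P} = T^{-1} R$ into $\hat{\alpha}_t = a_t + \bar{P} r_{t-1}$ gives eq. (3).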
The Score-Driven Smoother (SDS)

◮ Based on eq. (1), score-driven models can be viewed as approximate filters for nonlinear non-Gaussian state-space models
◮ By analogy, we can regard eqs. (2), (3) as an approximate smoother for nonlinear non-Gaussian models

Assume $y_t \mid \mathcal{F}_{t-1} \sim p(y_t \mid f_t, \Theta)$, where $f_t$ is a vector of time-varying parameters and $\Theta$ collects all static parameters. In score-driven models:
$$f_{t+1} = \omega + A s_t + B f_t \qquad (4)$$
where $s_t = S_t \nabla_t$, $\nabla_t = \frac{\partial \log p(y_t \mid f_t, \Theta)}{\partial f_t}$ and $S_t = \mathcal{I}_{t|t-1}^{-\alpha}$, $\alpha \in [0, 1]$. We generalize eqs. (2), (3) as:
$$r_{t-1} = s_t + (B - A S_t \mathcal{I}_{t|t-1})' r_t \qquad (5)$$
$$\hat{f}_t = f_t + B^{-1} A r_{t-1} \qquad (6)$$
with $t = n, \ldots, 1$ and $r_n = 0$.

We name the smoother (5), (6) the "Score-Driven Smoother" (SDS). It has the same structure as the Kalman backward smoothing recursions, but it uses the score of the non-Gaussian density and is nonlinear in the observations.
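A minimal Python sketch of the backward pass (5)–(6), assuming the scaled scores $s_t$ and the products $A S_t \mathcal{I}_{t|t-1}$ have been stored during the forward filter; the function and argument names are illustrative.

```python
import numpy as np

def sds_smoother(f, s, A, B, ASI):
    """Score-Driven Smoother backward pass, eqs. (5)-(6).
    f   : (n, k) filtered parameters f_t from the forward score-driven recursion
    s   : (n, k) scaled scores s_t = S_t @ grad_t stored during the forward pass
    A, B: (k, k) coefficient matrices of the score-driven recursion
    ASI : (n, k, k) products A @ S_t @ I_{t|t-1}, also stored while filtering
    Returns the smoothed parameters f_hat_t."""
    n, k = f.shape
    BinvA = np.linalg.solve(B, A)            # B^{-1} A, used in eq. (6)
    f_hat = np.empty_like(f)
    r = np.zeros(k)                          # r_n = 0
    for t in range(n - 1, -1, -1):
        r = s[t] + (B - ASI[t]).T @ r        # eq. (5): r_{t-1}
        f_hat[t] = f[t] + BinvA @ r          # eq. (6)
    return f_hat
```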
SDS methodology

$$y_t \mid \mathcal{F}_{t-1} \sim p(y_t \mid f_t, \Theta), \qquad f_{t+1} = \omega + A s_t + B f_t$$

1. Estimation of the static parameters:
$$\tilde{\Theta} = \arg\max_{\Theta} \sum_{t=1}^{n} \log p(y_t \mid f_t, \Theta)$$
2. Forward filter:
$$f_{t+1} = \tilde{\omega} + \tilde{A} s_t + \tilde{B} f_t$$
3. Backward smoother:
$$r_{t-1} = s_t + (\tilde{B} - \tilde{A} S_t \mathcal{I}_{t|t-1})' r_t, \qquad \hat{f}_t = f_t + \tilde{B}^{-1} \tilde{A} r_{t-1}$$

◮ The SDS is computationally simple (maximization of a closed-form likelihood + forward filtering/backward smoothing recursions)
◮ The SDS is general, in that it can handle any observation density $p(y_t \mid f_t, \Theta)$, with a potentially large number of time-varying parameters
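A compact sketch of the three steps, using the GARCH(1,1) case of the next slide as the observation density; the optimizer, starting values, admissibility checks and simulated data are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize

def garch_forward(theta, y):
    """Forward score-driven filter for the GARCH(1,1) example:
    f_{t+1} = omega + a*(y_t^2 - f_t) + b*f_t, with Gaussian log-likelihood."""
    omega, a, b = theta
    if omega <= 0 or a < 0 or b < a or b >= 1:   # crude admissibility check
        return None, -np.inf
    n = len(y)
    f = np.empty(n + 1)
    f[0] = np.var(y)                             # illustrative initialization
    loglik = 0.0
    for t in range(n):
        loglik += -0.5 * (np.log(2 * np.pi * f[t]) + y[t] ** 2 / f[t])
        f[t + 1] = omega + a * (y[t] ** 2 - f[t]) + b * f[t]
    return f[:n], loglik

# Step 1: estimate the static parameters by maximizing the closed-form likelihood
rng = np.random.default_rng(1)
y = 0.01 * rng.standard_normal(2000)             # illustrative data
res = minimize(lambda th: -garch_forward(th, y)[1],
               x0=np.array([1e-6, 0.05, 0.95]), method="Nelder-Mead")

# Step 2: run the forward filter at the estimated parameters
f_filtered, _ = garch_forward(res.x, y)
# Step 3: feed f_filtered into the backward smoother (see the GARCH-SDS slide)
```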
Example: GARCH-SDS

Consider the model:
$$y_t = \sigma_t \epsilon_t, \qquad \epsilon_t \sim \mathrm{NID}(0, 1)$$
The predictive density is thus:
$$p(y_t \mid \sigma_t^2) = \frac{1}{\sqrt{2\pi}\,\sigma_t} \, e^{-\frac{y_t^2}{2\sigma_t^2}}$$
Setting $f_t = \sigma_t^2$ and $S_t = \mathcal{I}_{t|t-1}^{-1}$, eq. (4) reduces to:
$$f_{t+1} = \omega + a (y_t^2 - f_t) + b f_t$$
i.e. the standard GARCH(1,1) model. The smoothing recursions (5), (6) reduce to:
$$r_{t-1} = y_t^2 - f_t + (b - a) r_t, \qquad \hat{f}_t = f_t + b^{-1} a r_{t-1}$$
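A minimal Python sketch of these reduced recursions, taking the filtered variances $f_t$ from a forward pass (e.g. the garch_forward sketch above) as input; variable names are illustrative.

```python
import numpy as np

def garch_sds(y, f, a, b):
    """GARCH-SDS backward pass: r_{t-1} = (y_t^2 - f_t) + (b - a) r_t,
    f_hat_t = f_t + (a/b) r_{t-1}, with r_n = 0 (eqs. (5)-(6) reduced)."""
    n = len(y)
    f_hat = np.empty(n)
    r = 0.0                                   # r_n = 0
    for t in range(n - 1, -1, -1):
        r = (y[t] ** 2 - f[t]) + (b - a) * r  # r_{t-1}
        f_hat[t] = f[t] + (a / b) * r         # smoothed variance
    return f_hat
```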
Example: GARCH-SDS

[Figure: filtered (blue dotted) and smoothed (red) estimates of the GARCH(1,1) model]
Other examples

◮ MEM (Engle, 2002):
$$y_t = \mu_t \epsilon_t, \qquad \epsilon_t \sim \mathrm{Gamma}(\alpha)$$
where the Gamma($\alpha$) density is $\Gamma(\alpha)^{-1} \alpha^{\alpha} \epsilon_t^{\alpha-1} e^{-\alpha \epsilon_t}$
◮ AR(1) with a time-varying coefficient:
$$y_t = c + \alpha_t y_{t-1} + \epsilon_t, \qquad \epsilon_t \sim N(0, q^2)$$
◮ Wishart-GARCH (Gorgi et al., 2018):
$$r_t \mid \mathcal{F}_{t-1} \sim N_k(0, V_t), \qquad X_t \mid \mathcal{F}_{t-1} \sim W_k(V_t/\nu, \nu)$$
where $N_k(0, V_t)$ is a multivariate zero-mean normal distribution with covariance matrix $V_t$ and $W_k(V_t/\nu, \nu)$ is a Wishart distribution with mean $V_t$ and $\nu \geq k$ degrees of freedom
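To show how another density slots into the same recursions, here is a sketch for the MEM under unit-information scaling ($S_t = \mathcal{I}_{t|t-1}^{-1}$, as in the GARCH example): for the unit-mean Gamma density above the scaled score works out to $y_t - \mu_t$, so the backward pass mirrors the GARCH-SDS one. This reduction is my own derivation from the density above, not a formula taken from the slides.

```python
import numpy as np

def mem_sds(y, mu, a, b):
    """Sketch of a MEM-SDS backward pass under unit-information scaling:
    the scaled score is y_t - mu_t, so the recursions mirror the GARCH case.
    mu holds the forward-filtered conditional means mu_t."""
    n = len(y)
    mu_hat = np.empty(n)
    r = 0.0                                  # r_n = 0
    for t in range(n - 1, -1, -1):
        r = (y[t] - mu[t]) + (b - a) * r     # r_{t-1}
        mu_hat[t] = mu[t] + (a / b) * r      # smoothed conditional mean
    return mu_hat
```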
Other example: MEM-SDS

[Figure: filtered (blue dotted) and smoothed (red) estimates of the MEM(1,1) model]
Other example: t.v. AR(1)-SDS

[Figure: filtered (blue dotted) and smoothed (red) estimates of the autoregressive coefficient of the AR(1) model]
Other example: Wishart-GARCH-SDS

[Figure: comparison among simulated observations of X_t (grey lines), simulated true covariances V_t (black lines), filtered (blue dotted lines) and smoothed (red lines) (co)variances of the realized Wishart-GARCH model in the case k = 5. The left panel shows the variance of the first asset, V(1,1); the right panel shows the covariance between the first and second assets, V(1,2).]