Aim Change-points Bayesian Julious Segmented Grid search Simulation Return prediction model Applications Conclusions

A comparison of estimators for regression models with change points

Cathy WS Chen¹, Jennifer SK Chan², Richard Gerlach², and William Hsieh¹
¹ Feng Chia University, Taiwan
² University of Sydney, Australia

Forthcoming, Statistics and Computing

Cathy Chen, COMPSTAT10 Computational Econometrics
Outline

Two types of change-point regression model are considered: one involves jump discontinuities in the regression function, and the other involves regression lines connected at unknown points.
Four estimation methods are compared: Bayesian, Julious, grid search, and the segmented method.
The methods are evaluated via a simulation study and compared using standard measures of estimation bias and precision.
Detection of structural breaks in a time-varying heteroskedastic regression model.
Regression models with change points

Change-point regression models have applications in many fields: demography, epidemiology, toxicology, ecology, economics, and finance.
Many terms are used for them: "segmented" (Lerman 1980), "broken-line" (Ulm 1991), "structural change", "structural break", or "smooth transition".
Multiple change-point regression models

$$
y_i =
\begin{cases}
\beta_0^{(1)} + \beta_1^{(1)} x_i + \sum_{l=2}^{p} \beta_l^{(1)} z_{i,l-1} + \varepsilon_{i1}, & \text{if } x_i \le r_1,\\
\beta_0^{(2)} + \beta_1^{(2)} x_i + \sum_{l=2}^{p} \beta_l^{(2)} z_{i,l-1} + \varepsilon_{i2}, & \text{if } r_1 < x_i \le r_2,\\
\qquad \vdots & \qquad \vdots\\
\beta_0^{(k)} + \beta_1^{(k)} x_i + \sum_{l=2}^{p} \beta_l^{(k)} z_{i,l-1} + \varepsilon_{ik}, & \text{if } r_{k-1} < x_i \le r_k,\\
\qquad \vdots & \qquad \vdots\\
\beta_0^{(K+1)} + \beta_1^{(K+1)} x_i + \sum_{l=2}^{p} \beta_l^{(K+1)} z_{i,l-1} + \varepsilon_{i,K+1}, & \text{if } r_K < x_i.
\end{cases}
\tag{1}
$$

Here $r_k$, $k = 1, \ldots, K$, are change-point parameters for the regressor $x$, which satisfy $r_1 < r_2 < \cdots < r_K$.
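The regime structure of model (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names `regime_index`, `piecewise_mean` and `simulate` are hypothetical, the coefficients are stored as one list $[\beta_0^{(k)}, \beta_1^{(k)}, \ldots, \beta_p^{(k)}]$ per regime, and the errors $\varepsilon_{ik}$ are assumed to be independent Gaussian with a regime-specific standard deviation.

```python
import random
from bisect import bisect_left


def regime_index(x, thresholds):
    """0-based regime of x for sorted change points r_1 < ... < r_K.

    Regime 0 is x <= r_1, regime k is r_k < x <= r_{k+1}, and regime K is x > r_K,
    matching the 'r_{k-1} < x_i <= r_k' convention of model (1).
    """
    # bisect_left counts thresholds strictly below x, so x == r_k stays in the lower regime.
    return bisect_left(thresholds, x)


def piecewise_mean(x, z, thresholds, betas):
    """Regression mean of y_i under model (1), excluding the error term.

    betas[k] = [beta_0, beta_1, beta_2, ..., beta_p] for regime k (hypothetical layout);
    z = [z_{i,1}, ..., z_{i,p-1}] holds the extra regressors.
    """
    b = betas[regime_index(x, thresholds)]
    return b[0] + b[1] * x + sum(bl * zl for bl, zl in zip(b[2:], z))


def simulate(xs, zs, thresholds, betas, sigmas, seed=0):
    """Draw y_i = mean + eps_ik with eps_ik ~ N(0, sigmas[k]^2) (assumed Gaussian errors)."""
    rng = random.Random(seed)
    return [
        piecewise_mean(x, z, thresholds, betas)
        + rng.gauss(0.0, sigmas[regime_index(x, thresholds)])
        for x, z in zip(xs, zs)
    ]
```

Keeping the change points in a sorted list lets a binary search place each observation in its regime in $O(\log K)$ time, and a separate standard deviation per regime reproduces the heteroskedastic errors $\varepsilon_{ik}$.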
Connected regression lines

To enforce continuity, i.e. connected regression lines, the regression parameters in (1) must be constrained so that

$$\beta_0^{(k)} + \beta_1^{(k)} r_k = \beta_0^{(k+1)} + \beta_1^{(k+1)} r_k, \qquad k = 1, \ldots, K.$$

Then equation (1) can be simplified and written as

$$y_i = \beta_0 + \beta_1^{\star} x_i + \sum_{k=2}^{K+1} \beta_k^{\star} (x_i - r_{k-1})\, I(x_i > r_{k-1}) + \varepsilon_i, \tag{2}$$

where $\beta_0 = \beta_0^{(1)}$, $\beta_1^{\star} = \beta_1^{(1)}$, $\beta_k^{\star} = \beta_1^{(k)} - \beta_1^{(k-1)}$ for $k > 1$,

$$\varepsilon_i = \sum_{k=1}^{K+1} I_{ik}\, \varepsilon_{ik}, \qquad I_{i1} = I(x_{i1} \le r_1), \quad I_{ik} = I(r_{k-1} < x_{i1} \le r_k),\ 1 < k \le K, \quad I_{i,K+1} = I(x_{i1} > r_K),$$

and $I(E)$ is an indicator function for the event $E$. Each hinge term is active for every $x_i > r_{k-1}$, so the slope changes $\beta_k^{\star}$ accumulate across regimes.
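The reparameterisation behind (2) can be checked numerically. The sketch below assumes the hinge form in which each term $\beta_k^{\star}(x_i - r_{k-1})$ is active for every $x_i > r_{k-1}$, so that slope changes accumulate; the function names are hypothetical. It converts regime slopes into $\beta_k^{\star}$, recovers the intercepts $\beta_0^{(k)}$ implied by the continuity constraints, and lets you confirm that both descriptions give the same connected line.

```python
def slopes_to_star(slopes):
    """Map regime slopes beta_1^(k) to beta*_1 = beta_1^(1), beta*_k = beta_1^(k) - beta_1^(k-1)."""
    return [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, len(slopes))]


def implied_intercepts(beta0, slopes, thresholds):
    """Intercepts forced by beta_0^(k) + beta_1^(k) r_k = beta_0^(k+1) + beta_1^(k+1) r_k."""
    b0 = [beta0]
    for k, r in enumerate(thresholds):
        b0.append(b0[-1] + (slopes[k] - slopes[k + 1]) * r)
    return b0


def hinge_mean(x, beta0, beta_star, thresholds):
    """Mean of (2): beta0 + beta*_1 x + sum_k beta*_k (x - r_{k-1}) I(x > r_{k-1})."""
    m = beta0 + beta_star[0] * x
    for b, r in zip(beta_star[1:], thresholds):
        if x > r:  # hinge switches on past its change point and stays on
            m += b * (x - r)
    return m
```

For example, regime slopes $(1, -1, 2)$ with change points $(0, 2)$ and $\beta_0 = 0$ give $\beta^{\star} = (1, -2, 3)$ and implied intercepts $(0, 0, -6)$; evaluating `hinge_mean` on either side of $r_2 = 2$ returns the same value, which is exactly the continuity the constraint enforces.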
Related papers

The change-point regression problem was first described by Quandt (1958, 1960) and Chow (1960).
Bayesian: Bacon and Watts (1971), Ferreira (1975), Smith and Cook (1980), Carlin, Gelfand, and Smith (1992), Stephens (1994), among others.
Julious: Julious (2001) proposed a bootstrap method for inference on the existence of a single change point and on the parameter estimates.
Segmented: Muggeo (2003, 2008).
Grid search: Lerman (1980).
Bayesian method

Continuity is not enforced, so discontinuous regression lines are allowed. The prior set-up follows the same spirit as Chen and Lee (1995):
1. $\beta_k$ are independent multivariate normals $N(\beta_{0k}, V_k^{-1})$, $k = 1, \ldots, K+1$;
2. conjugate priors are used for $\sigma_k^2$: $\sigma_k^2 \sim IG\left(\frac{\nu_k}{2}, \frac{\nu_k \lambda_k}{2}\right)$, $k = 1, \ldots, K+1$;
3. in the three-line case ($K = 2$), $r_1 \sim U(a_1, b_1)$ and $r_2 \mid r_1 \sim U(a_2, b_2)$.
The conditional posterior distributions are:
1. $\beta_k \sim N(\beta_k^{*}, V_k^{*-1})$, where
$$\beta_k^{*} = \left(\frac{X_k^{T} X_k}{\sigma_k^2} + V_k\right)^{-1}\left(\frac{X_k^{T} Y_k}{\sigma_k^2} + V_k \beta_{0k}\right), \qquad V_k^{*} = \frac{X_k^{T} X_k}{\sigma_k^2} + V_k, \qquad k = 1, \ldots, K+1;$$
2. $\sigma_k^2 \sim IG\left(\frac{\nu_k + n_k}{2}, \frac{\nu_k \lambda_k + n_k s_k^2}{2}\right)$, where $s_k^2 = n_k^{-1} (Y_k - \hat{Y}_k)^{T} (Y_k - \hat{Y}_k)$ and $\hat{Y}_k = X_k \beta_k$;
3. a nonstandard distribution for $r$, with density
$$f(r \mid y, \beta, \sigma^2) \propto \exp\left\{ -\sum_{k=1}^{K+1} \frac{1}{2\sigma_k^2} (Y_k - X_k \beta_k)^{T} (Y_k - X_k \beta_k) \right\} \times \prod_{k=1}^{K+1} \sigma_k^{-n_k} \times I(B).$$

Here $Y_k$, $X_k$ and $n_k$ are the responses, design matrix and number of observations in regime $k$; all three depend on $r$ through the regime assignment.
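Two of these updates can be sketched for a single change point ($K = 1$) with a simple regression line in each regime. The sketch covers only the $\sigma_k^2$ and $r$ steps; the helper names are hypothetical, and the $r$ step uses a griddy-Gibbs draw over a finite set of candidate values. That sampler is a swapped-in choice for illustration: the slides do not say how the nonstandard density of $r$ is sampled, and a Metropolis-Hastings step is another common option.

```python
import math
import random


def draw_sigma2(resid, nu, lam, rng):
    """Draw sigma_k^2 ~ IG((nu + n_k)/2, (nu*lam + n_k*s_k^2)/2) given residuals Y_k - X_k beta_k."""
    n = len(resid)
    sse = sum(e * e for e in resid)  # equals n_k * s_k^2
    shape = (nu + n) / 2.0
    rate = (nu * lam + sse) / 2.0
    # If G ~ Gamma(shape, scale = 1/rate), then 1/G ~ IG(shape, rate).
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def log_r_density(r, xs, ys, betas, sigma2s):
    """Unnormalised log f(r | y, beta, sigma^2) for one change point (K = 1).

    betas[k] = (intercept, slope) and sigma2s[k] = sigma_k^2 for regimes k = 0, 1.
    I(B) is handled by only passing candidate values inside the prior support.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        k = 0 if x <= r else 1
        b0, b1 = betas[k]
        e = y - (b0 + b1 * x)
        # Gaussian kernel term plus one factor sigma_k^{-1} per observation in regime k.
        total += -e * e / (2.0 * sigma2s[k]) - 0.5 * math.log(sigma2s[k])
    return total


def draw_r(candidates, xs, ys, betas, sigma2s, rng):
    """Griddy-Gibbs draw of r from equally weighted candidate values (swapped-in sampler)."""
    logs = [log_r_density(r, xs, ys, betas, sigma2s) for r in candidates]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract the max for numerical stability
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = list(range(11))
    ys = [0.0 if x <= 5 else 10.0 for x in xs]
    print(draw_r([2, 5, 8], xs, ys, [(0.0, 0.0), (10.0, 0.0)], [1.0, 1.0], rng))
```

Working on the log scale and subtracting the maximum before exponentiating keeps the weights finite even when the residual sums of squares are large.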
Julious' method (JRSSD)

Julious (2001) proposes a search algorithm for a single unknown change point, under the restriction that the regression function is continuous at the unknown change point.
Step 1. Set $a$ and $b$ as percentiles of $x$, with the data ordered from lowest to highest $x$, so that at least $100h\%$ of the sample lies in each of the two regimes. Set the first pair of groups to be $(x_1, y_1), \ldots, (x_k, y_k)$ and $(x_{k+1}, y_{k+1}), \ldots, (x_n, y_n)$.
Step 2. Fit the OLS regression line within each group separately. Save the restricted RSS value obtained and the parameter estimates, where the change-point estimate is $x_k$.
Julious' method

Step 3. Form the next pair of groups by removing the lowest-$x$ pair $(x, y)$ from group 2 and putting it into group 1. Repeat Steps 2 and 3 for each split point between $a$ and $b$.
Step 4. Choose the optimal two-line parameter estimates and change-point estimate $\hat{r}$ as those that minimise the total restricted RSS across regimes, calculated in Step 2. The final parameter estimates are denoted $(\hat\beta_0^{(1)}, \hat\beta_1^{(1)}, \hat\beta_0^{(2)}, \hat\beta_1^{(2)})$. Use these estimates to obtain $\hat\sigma_1^2$ and $\hat\sigma_2^2$ as the MSE in each regime, conditional on $\hat r$.
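The search in Steps 1 to 4 can be sketched as follows. This is a simplified, assumption-laden version, not Julious' exact procedure: the continuity restriction is imposed by fixing the join at the candidate $x_k$ and fitting $y$ on $(1, (x - x_k)_-, (x - x_k)_+)$ by least squares, which may differ from the restricted fit in Julious (2001). The helpers `solve`, `ols` and `julious_search` are hypothetical, and the regime MSE is taken here as the mean squared residual (a degrees-of-freedom correction is another option).

```python
import math


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, ys):
    """Least-squares coefficients and residual sum of squares via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y - sum(b * v for b, v in zip(beta, row))) ** 2 for row, y in zip(rows, ys))
    return beta, rss


def julious_search(xs, ys, h=0.1):
    """Continuous two-line fit with the join fixed at each candidate x_k; keep the minimum RSS."""
    pairs = sorted(zip(xs, ys))
    x = [p[0] for p in pairs]
    y = [p[1] for p in pairs]
    n = len(x)
    lo = max(2, math.ceil(h * n))  # at least 100h% (and at least 2 points) in each group
    best = None
    for k in range(lo, n - lo + 1):  # group 1 = the k lowest x values (Step 3 moves one point)
        c = x[k - 1]
        # y = a + b1 (x - c) for x <= c and a + b2 (x - c) for x > c: continuous at c.
        rows = [[1.0, min(xi - c, 0.0), max(xi - c, 0.0)] for xi in x]
        beta, rss = ols(rows, y)
        if best is None or rss < best[0]:
            best = (rss, c, beta)
    _, r_hat, (a, b1, b2) = best
    params = (a - b1 * r_hat, b1, a - b2 * r_hat, b2)  # (b0^(1), b1^(1), b0^(2), b1^(2))
    res1 = [yi - (params[0] + params[1] * xi) for xi, yi in zip(x, y) if xi <= r_hat]
    res2 = [yi - (params[2] + params[3] * xi) for xi, yi in zip(x, y) if xi > r_hat]
    # Regime variances as mean squared residuals, conditional on r_hat.
    sigma2 = (sum(e * e for e in res1) / len(res1), sum(e * e for e in res2) / len(res2))
    return r_hat, params, sigma2
```

On noise-free data such as $y = 1 + 2x$ for $x \le 5$ and $y = 16 - x$ for $x > 5$ with $x = 0, 1, \ldots, 10$, the search should return $\hat r = 5$ with regime lines $(1, 2)$ and $(16, -1)$.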
Segmented procedure

In the segmented procedure the regression function is continuous. Model parameters can be estimated iteratively via the following linear function of predictors:

$$\beta_0 + \beta_1^{\star} x_{i1} + \beta_2^{\star} (x_{i1} - r_0)\, I(x_{i1} > r_0) - \gamma\, I(x_{i1} > r_0), \tag{3}$$

where $r_0$ is an initial estimate of the change point and $\gamma$ re-parameterizes the distance between $r_0$ and the change point $r$, with $\gamma = \beta_2^{\star}(r - r_0)$.
(i) Choose an initial change-point estimate $r_0$.
(ii) Given the current change-point estimate $r_0$, estimate model (3) by Gaussian ML and update the change point via $\hat r = r_0 + \hat\gamma / \hat\beta_2^{\star}$.
(iii) If $\hat\gamma$ is sufficiently close to zero, stop; otherwise set $r_0 = \hat r$ and return to step (ii).
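Steps (i) to (iii) can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the implementation in Muggeo's R package segmented: Gaussian ML for the mean parameters of (3) is computed as ordinary least squares through the hypothetical helpers `solve` and `ols`, and there is no safeguard if $\hat r$ leaves the range of the data.

```python
def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, ys):
    """Least-squares coefficients and residual sum of squares via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y - sum(b * v for b, v in zip(beta, row))) ** 2 for row, y in zip(rows, ys))
    return beta, rss


def segmented_fit(xs, ys, r0, tol=1e-8, max_iter=50):
    """Iterate steps (ii)-(iii): fit (3) by least squares, then move r0 by gamma_hat / beta2_hat."""
    for it in range(1, max_iter + 1):
        # Columns of (3): 1, x, (x - r0) I(x > r0), -I(x > r0).
        rows = [[1.0, x, max(x - r0, 0.0), -1.0 if x > r0 else 0.0] for x in xs]
        beta, _ = ols(rows, ys)  # beta = (beta_0, beta*_1, beta*_2, gamma)
        gamma, b2 = beta[3], beta[2]
        r_hat = r0 + gamma / b2  # step (ii) update
        if abs(gamma) < tol:  # step (iii) stopping rule
            return r_hat, beta, it
        r0 = r_hat
    raise RuntimeError("no convergence within max_iter iterations")
```

Because the four columns of (3) let the fit choose a separate line on each side of $r_0$, $\hat\gamma$ equals the gap between the two fitted lines at $r_0$, and the update $r_0 + \hat\gamma/\hat\beta_2^{\star}$ is the point where those two lines intersect. The iteration stops once that gap closes.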
Grid search

Continuous regression lines are neither assumed nor enforced in this method. A common approach to estimating regression change points is to search over a grid, say from $x_{l1}$ to $x_{u1}$, which correspond to the $p_l$ and $p_u$ ($p_l < p_u$) percentiles of $x_{i1}$. The grid of $M$ possible values for the change points is

$$\psi_m = x_{l1} + (m - 1)\Delta, \qquad \Delta = \frac{x_{u1} - x_{l1}}{M - 1}, \qquad m = 1, \ldots, M. \tag{4}$$
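Equation (4) translates directly into code. The percentile rule used here (linear interpolation between order statistics) is an assumption, since the slides do not say which percentile definition is used; `percentile` and `change_point_grid` are hypothetical names.

```python
import math


def percentile(values, p):
    """p-th percentile (0 <= p <= 100), interpolating linearly between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def change_point_grid(x, p_l, p_u, m_points):
    """Equally spaced candidates psi_m = x_l1 + (m - 1) * Delta, m = 1, ..., M, as in (4)."""
    x_l, x_u = percentile(x, p_l), percentile(x, p_u)
    delta = (x_u - x_l) / (m_points - 1)
    return [x_l + (m - 1) * delta for m in range(1, m_points + 1)]
```

For $x = 0, 1, \ldots, 100$ with $p_l = 10$, $p_u = 90$ and $M = 5$, the grid is $10, 30, 50, 70, 90$. With $M$ points, a two-change-point search examines up to $M(M-1)/2$ ordered pairs $\psi_{m_1} < \psi_{m_2}$, so $M$ drives the cost of the method.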
Grid search

Conditional on $(r_1, r_2) = (\psi_{m_1}, \psi_{m_2})$, the density function for $y_i$ in the change-point regression model is

$$f(\cdot \mid \theta_{m_1, m_2}) = f_1(\cdot \mid \theta_{m_1, m_2})^{I_{i1}}\, f_2(\cdot \mid \theta_{m_1, m_2})^{I_{i2}}\, f_3(\cdot \mid \theta_{m_1, m_2})^{I_{i3}},$$

where $f_1$, $f_2$ and $f_3$ are all Gaussian and the indicators are $I_{i1} = I(x_{i1} \le r_1)$, $I_{i2} = I(r_1 < x_{i1} \le r_2)$ and $I_{i3} = I(x_{i1} > r_2)$. For each grid pair, the parameter estimates $\theta_{m_1, m_2}$ maximize the log-likelihood function. The final estimates for the grid search method are the pair $(\hat r_1, \hat r_2) = (\psi_{m_1^*}, \psi_{m_2^*})$ and $\theta_{m_1^*, m_2^*}$, which jointly maximise the likelihood function across all considered values of $(r_1, r_2)$.
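A sketch of the two-change-point grid search, assuming a simple linear regression in $x_{i1}$ within each regime (no extra $z$ regressors) and Gaussian regime densities $f_1, f_2, f_3$ with their own variances. For each admissible pair $\psi_{m_1} < \psi_{m_2}$ the regime parameters are replaced by their closed-form MLEs, so the search maximises the profile log-likelihood $\sum_k -\tfrac{n_k}{2}\{\log(2\pi\hat\sigma_k^2) + 1\}$ with $\hat\sigma_k^2 = \mathrm{RSS}_k / n_k$. The function names, the minimum regime size `min_n` and the variance floor are hypothetical safeguards, not part of the described method.

```python
import math
import random


def fit_line(xs, ys):
    """OLS intercept, slope and residual sum of squares for one regime."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b0, b1, rss


def regime_loglik(xs, ys):
    """Gaussian log-likelihood of one regime at its MLE (sigma^2 = RSS / n)."""
    n = len(xs)
    b0, b1, rss = fit_line(xs, ys)
    s2 = max(rss / n, 1e-12)  # hypothetical floor: avoids log(0) for a perfect fit
    return -0.5 * n * (math.log(2.0 * math.pi * s2) + 1.0), (b0, b1, s2)


def grid_search_two(xs, ys, grid, min_n=3):
    """Maximise the profile log-likelihood over grid pairs psi_m1 < psi_m2."""
    best = None
    for i, r1 in enumerate(grid):
        for r2 in grid[i + 1:]:
            groups = ([], [], [])
            for x, y in zip(xs, ys):
                # Regimes follow I_i1 = I(x <= r1), I_i2 = I(r1 < x <= r2), I_i3 = I(x > r2).
                k = 0 if x <= r1 else (1 if x <= r2 else 2)
                groups[k].append((x, y))
            if min(len(g) for g in groups) < min_n:
                continue  # skip pairs that leave a regime too small to fit
            total, thetas = 0.0, []
            for g in groups:
                ll, theta = regime_loglik([p[0] for p in g], [p[1] for p in g])
                total += ll
                thetas.append(theta)
            if best is None or total > best[0]:
                best = (total, (r1, r2), thetas)
    return best  # (max log-likelihood, (r1_hat, r2_hat), [(b0, b1, sigma2) per regime])


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [i / 10 for i in range(100)]
    ys = [(1.0 + x if x <= 3.0 else (20.0 - x if x <= 6.0 else 2.0 * x - 5.0))
          + rng.gauss(0.0, 0.1) for x in xs]
    print(grid_search_two(xs, ys, [float(v) for v in range(1, 9)])[1])
```

Profiling out the regime parameters means each grid pair costs only three closed-form line fits, which keeps the double loop over the grid cheap.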