Lecture 3: Estimation and Model Validation

  1. Lecture 3, Estimation and model validation. Erik Lindström

  2. Maximum likelihood, recap.
◮ Let $x^{(N)} = (x_1, \ldots, x_N)$ be a sample from a parametric class of models with known density
$$f_{X^{(N)}}(x_1, \ldots, x_N; \theta) = L(x^{(N)}; \theta), \quad (1)$$
where $\theta \in \Theta$ is some unknown parameter vector.
◮ The Maximum Likelihood estimator (MLE) is defined as
$$\hat{\theta}_{MLE} = \arg\max_{\theta \in \Theta} L(x^{(N)}; \theta). \quad (2)$$
◮ Taking the logarithm does not change the maximizing argument, so this is equivalently written as
$$\hat{\theta}_{MLE} = \arg\max_{\theta \in \Theta} \ell(x^{(N)}; \theta),$$
with $\ell(\theta) = \log L(x^{(N)}; \theta)$.
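
As a minimal numerical illustration (not from the slides), one can compute an MLE by minimizing the negative log-likelihood with scipy; the Gaussian model, data, and starting values below are made up for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=2.0, size=500)  # synthetic sample

def neg_loglik(params, data):
    """Negative log-likelihood -l(theta) = -sum_i log f(x_i; mu, sigma)."""
    mu, log_sigma = params            # optimize log(sigma) to keep sigma > 0
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

res = minimize(neg_loglik, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)              # should be close to 1.5 and 2.0
```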

  3. Theorem (Cramér-Rao). Let $T(X_1, \ldots, X_N)$ be an unbiased estimator of $\theta$. It then holds that
$$V(T(X^{(N)})) \geq I_F^N(\theta)^{-1}, \quad (3)$$
where
$$I_F^N(\theta)^{-1} = -\left( \mathrm{E}\left[ \nabla_\theta \nabla_\theta \log L(x^{(N)}; \theta) \right] \right)^{-1} \quad (4)$$
$$= \left( \mathrm{E}\left[ \left( \nabla_\theta \log L(x^{(N)}; \theta) \right)^2 \right] \right)^{-1}, \quad (5)$$
and the MLE attains this lower bound asymptotically.
◮ The asymptotic distribution of the MLE is given by
$$\sqrt{N} \left( \hat{\theta} - \theta \right) \xrightarrow{d} N\left( 0, I_F(\theta)^{-1} \right).$$
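
A quick Monte Carlo sanity check (a sketch, not from the slides): for i.i.d. Gaussian data the MLE of the mean is the sample mean and the Fisher information is known in closed form, so the empirical variance of the MLE can be compared directly with the Cramér-Rao bound.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, N, reps = 1.0, 2.0, 400, 2000

# For i.i.d. N(mu, sigma^2) the per-observation Fisher information for mu
# is 1/sigma^2, so the CRLB based on N observations is sigma^2 / N.
mle = np.array([rng.normal(mu, sigma, N).mean() for _ in range(reps)])

print("empirical variance of MLE:", mle.var())
print("Cramer-Rao lower bound   :", sigma**2 / N)  # the two should be close
```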

  4. Misspecified models. What happens if the model is wrong? We look at two simple cases:
◮ The model is too simple.
◮ The model is too complex.

  5. Too simple.
◮ Assume that the data is given by
$$Y = [X \; Z] \begin{pmatrix} \theta \\ \beta \end{pmatrix} + \epsilon. \quad (6)$$
◮ While the model is given by
$$Y = X\theta + \epsilon. \quad (7)$$
◮ What happens? Bias!

  6. Proof, model is too simple. The estimate is given (in matrix notation) by
$$\hat{\theta}_{OLS} = (X^T X)^{-1} X^T Y. \quad (8)$$
Plug the expression for $Y$ into that equation:
$$\hat{\theta}_{OLS} = (X^T X)^{-1} X^T \left( [X \; Z] \begin{pmatrix} \theta \\ \beta \end{pmatrix} + \epsilon \right) \quad (9)$$
$$= \theta + (X^T X)^{-1} X^T Z \beta + (X^T X)^{-1} X^T \epsilon \quad (10)$$
$$= \theta + \text{bias} + \text{noise}. \quad (11)$$
Interpretation of the bias?
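
A short simulation sketch of the omitted-variable bias (the coefficient values and the correlation between X and Z are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N, theta, beta = 10_000, 1.0, 0.5

# Correlated regressors: the omitted Z is correlated with X, so the bias
# term (X^T X)^{-1} X^T Z beta in (10) does not vanish.
X = rng.normal(size=(N, 1))
Z = 0.8 * X + 0.2 * rng.normal(size=(N, 1))
eps = rng.normal(size=(N, 1))
Y = X * theta + Z * beta + eps                     # true process, eq. (6)

theta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]   # fit the too-simple model (7)
print(theta_hat)   # roughly theta + 0.8 * beta = 1.4, not 1.0
```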

  7. Model is too complex.
◮ Assume that the data is given by
$$Y = X\theta + \epsilon. \quad (12)$$
◮ While the model is given by
$$Y = [X \; Z] \begin{pmatrix} \theta \\ \beta \end{pmatrix} + \epsilon. \quad (13)$$
◮ What happens? No bias, but potentially poor efficiency.

  8. Proof.
◮ Estimates are given by
$$\begin{pmatrix} \hat{\theta} \\ \hat{\beta} \end{pmatrix} = \begin{bmatrix} X^T X & X^T Z \\ Z^T X & Z^T Z \end{bmatrix}^{-1} \begin{bmatrix} X^T X \theta + X^T \epsilon \\ Z^T X \theta + Z^T \epsilon \end{bmatrix}. \quad (14)$$
◮ Use the Woodbury identity
$$\begin{bmatrix} A & U \\ V & C \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1} U \Omega^{-1} V A^{-1} & -A^{-1} U \Omega^{-1} \\ -\Omega^{-1} V A^{-1} & \Omega^{-1} \end{bmatrix}, \quad (15)$$
with $\Omega = (C - V A^{-1} U)$.
◮ Long and tedious (on the blackboard). It then follows that $\hat{\theta}$ is unbiased and $\mathrm{E}[\hat{\beta}] = 0$.
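
The unbiasedness and the efficiency loss can also be checked numerically; a sketch with invented parameters, comparing the estimator of $\theta$ from the correct model (12) and the too-complex model (13):

```python
import numpy as np

rng = np.random.default_rng(3)
N, theta, reps = 200, 1.0, 5000

est_simple, est_complex = [], []
for _ in range(reps):
    X = rng.normal(size=(N, 1))
    Z = 0.9 * X + 0.1 * rng.normal(size=(N, 1))   # nearly collinear with X
    eps = rng.normal(size=(N, 1))
    Y = X * theta + eps                            # true model (12)
    est_simple.append(np.linalg.lstsq(X, Y, rcond=None)[0][0, 0])
    XZ = np.hstack([X, Z])                         # too-complex model (13)
    est_complex.append(np.linalg.lstsq(XZ, Y, rcond=None)[0][0, 0])

# Both means are close to theta (no bias), but the complex model pays
# for the redundant regressor with a much larger variance.
print(np.mean(est_simple), np.var(est_simple))
print(np.mean(est_complex), np.var(est_complex))
```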

  9. Examination of the data. Before starting to do any estimation we should carefully look at the dataset.
◮ Is the data correct? Most orders never result in a trade...
◮ Does the data contain outliers?
◮ Missing values?
◮ Do we have measurements of all relevant explanatory variables?
◮ Timing errors?

  10. Model validation. There are two types of validation.
Absolute: Are the model assumptions fulfilled?
Relative: Is the estimated model good enough, compared to some other model?
Both can still be wrong...


  11. Absolute tests. We have some external knowledge of the data, e.g. the underlying physics (gray box models).
◮ Look at whether the estimated parameters make sense.
◮ Are effects going in the right directions?
◮ Do the parameters have reasonable values?

  12. Residuals. The residuals $\{e\}$ should be i.i.d. Why? This implies:
◮ No auto-dependence:
$$\mathrm{Cov}(f(e_n), g(e_{n+k})) = 0, \quad \forall k \neq 0, \; \forall f, g,$$
such that $\mathrm{E}[f(e)^2] < \infty$, $\mathrm{E}[g(e)^2] < \infty$.
◮ No cross-dependence:
$$\mathrm{Cov}(f(e_n), g(u_{n+k})) = 0, \quad \forall k \in \mathbb{Z}, \; \forall f, g,$$
such that $\mathrm{E}[f(e)^2] < \infty$, $\mathrm{E}[g(u)^2] < \infty$, where $u$ is some external signal used as explanatory variable.
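
A small sketch (not from the slides) of how such lagged covariances can be estimated in practice; the transformations $f$ and $g$ are illustrative choices, with the square detecting dependence in the second moments (ARCH-type effects):

```python
import numpy as np

def lagged_cov(e, u, k, f=lambda v: v, g=lambda v: v):
    """Sample estimate of Cov(f(e_n), g(u_{n+k})) for lag k >= 1.
    Pass u = e to check auto-dependence."""
    return np.cov(f(e[:-k]), g(u[k:]))[0, 1]

rng = np.random.default_rng(4)
e = rng.normal(size=1000)   # i.i.d. residuals: all lagged covariances ~ 0
for k in (1, 2, 5):
    # linear dependence, and dependence in the squares
    print(k, lagged_cov(e, e, k), lagged_cov(e, e, k, f=np.square, g=np.square))
```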

  13. Normalized prediction errors. Residuals are usually normalized prediction errors,
$$e_n = \frac{y_n - \mathrm{E}[Y_n | \mathcal{F}_{n-1}]}{\sqrt{V(Y_n | \mathcal{F}_{n-1})}}.$$
This can in many cases also be generalized to SDE models.
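
A sketch for a concrete case (assumed example, not from the slides): for a Gaussian AR(1), $\mathrm{E}[Y_n | \mathcal{F}_{n-1}] = a\, y_{n-1}$ and $V(Y_n | \mathcal{F}_{n-1}) = \sigma^2$, so the normalized prediction errors are immediate.

```python
import numpy as np

rng = np.random.default_rng(5)
a, sigma, N = 0.7, 1.0, 1000

# Simulate an AR(1): y_n = a * y_{n-1} + sigma * w_n
y = np.zeros(N)
for n in range(1, N):
    y[n] = a * y[n - 1] + sigma * rng.normal()

# With the correct model, E[Y_n | F_{n-1}] = a * y_{n-1} and
# V(Y_n | F_{n-1}) = sigma^2, so:
e = (y[1:] - a * y[:-1]) / sigma
print(e.mean(), e.std())   # approximately 0 and 1 if the model is right
```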

  14. Formal tests.
◮ Test for dependence in residuals (Box-Ljung; see the sketch below):
$$T = N(N+2) \sum_{k=1}^{p} \frac{\gamma(k)^2}{N-k}.$$
Reject if $T > \chi^2_{1-\alpha, p}$.
◮ Sign test on residuals: # of positive $\in \mathrm{Bin}(N, 1/2)$.
◮ Number of changes of sign (Wald-Wolfowitz runs test).
◮ Resimulate the model from residuals. Can it reproduce the data?
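
A hand-rolled sketch of the Box-Ljung statistic from the slide (written out rather than taken from a library, so the formula is visible), with $\gamma(k)$ the lag-$k$ sample autocorrelation:

```python
import numpy as np
from scipy.stats import chi2

def box_ljung(e, p, alpha=0.05):
    """Box-Ljung test: T = N(N+2) * sum_{k=1}^p gamma(k)^2 / (N - k),
    where gamma(k) is the lag-k sample autocorrelation of the residuals."""
    e = np.asarray(e) - np.mean(e)
    N = len(e)
    gamma = np.array([np.sum(e[:-k] * e[k:]) / np.sum(e * e)
                      for k in range(1, p + 1)])
    T = N * (N + 2) * np.sum(gamma**2 / (N - np.arange(1, p + 1)))
    return T, T > chi2.ppf(1 - alpha, df=p)    # (statistic, reject H0?)

rng = np.random.default_rng(6)
print(box_ljung(rng.normal(size=500), p=20))   # i.i.d. noise: should not reject
```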

  15. Scatterplots of residuals.
◮ $e_n$ vs $e_{n-1}$ (autocorrelation).
◮ $e_n$ vs $\hat{y}_{n|n-1} = \mathrm{E}[y_n | \mathcal{F}_{n-1}]$ (remaining prediction-error dependence).
◮ $e_n$ vs $u_n$ (external dependence).
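
A plotting sketch (assumed AR(1) example, no external signal $u$, so only the first two panels are shown):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
a, N = 0.7, 1000
y = np.zeros(N)
for n in range(1, N):
    y[n] = a * y[n - 1] + rng.normal()

y_pred = a * y[:-1]          # one-step predictions E[y_n | F_{n-1}]
e = y[1:] - y_pred           # residuals

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].scatter(e[:-1], e[1:], s=5)       # e_n vs e_{n-1}: autocorrelation
axes[0].set(xlabel="$e_{n-1}$", ylabel="$e_n$")
axes[1].scatter(y_pred, e, s=5)           # e_n vs y_{n|n-1}
axes[1].set(xlabel=r"$\hat{y}_{n|n-1}$", ylabel="$e_n$")
plt.tight_layout()
plt.show()
```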

  16. A good example (a well estimated AR(1) process). [Figure: four diagnostic panels: $e_{n-1}$ vs $e_n$, $e_n$ vs $\hat{y}_{n|n-1}$, SACF vs lag, and a normal probability plot of the residuals.]

  17. An example of wrong order (an AR(2) process estimated with an AR(1) model). [Figure: the same four diagnostic panels: $e_{n-1}$ vs $e_n$, $e_n$ vs $\hat{y}_{n|n-1}$, SACF vs lag, normal probability plot.]

  18. An example of wrong model structure (a non-linear model estimated with an AR(1) model). [Figure: the same four diagnostic panels.]

  19. Overfitting. Overfitting gives residuals that look good in sample. Therefore it is important to also test predictions out of sample.
◮ Split data into an estimation set and a validation set (as sketched below).
◮ Cross-validation.
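
A minimal sketch of an estimation/validation split for a time series (assumed AR(1) example; comparable in-sample and out-of-sample residual variances suggest no overfitting):

```python
import numpy as np

rng = np.random.default_rng(8)
y = np.zeros(1200)
for n in range(1, len(y)):
    y[n] = 0.7 * y[n - 1] + rng.normal()

split = 1000
train, valid = y[:split], y[split - 1:]   # keep one lag of overlap for prediction

# Estimate an AR(1) coefficient by least squares on the estimation set only.
a_hat = np.sum(train[1:] * train[:-1]) / np.sum(train[:-1] ** 2)

e_in = train[1:] - a_hat * train[:-1]     # in-sample residuals
e_out = valid[1:] - a_hat * valid[:-1]    # out-of-sample prediction errors
print(np.var(e_in), np.var(e_out))        # comparable if not overfitted
```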

  20. Example of overfitting (an ARMA(1,1) process fitted with an ARMA(3,3) model). [Figure: four panels: SACF in sample, SACF out of sample, $e_{n-1}$ vs $e_n$ in sample, $e_{n-1}$ vs $e_n$ out of sample.]

  21. Relative model validation. Test if a larger model is necessary. Hypothesis test: Wald, LM or LR.
Wald: $H_0: \theta' = \theta'_0$ vs $H_1: \theta'$ free. The confidence interval is
$$I_{\theta} = \hat{\theta} \pm \lambda_{\alpha/2} \, d(\hat{\theta}).$$
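
A sketch of the Wald interval/test for a single extra parameter (the estimate and standard error in the usage line are invented), with $\lambda_{\alpha/2}$ the standard normal quantile:

```python
import numpy as np
from scipy.stats import norm

def wald_test(theta_hat, std_err, theta_0=0.0, alpha=0.05):
    """Reject H0: theta' = theta_0 if theta_0 falls outside
    theta_hat +/- lambda_{alpha/2} * d(theta_hat)."""
    lam = norm.ppf(1 - alpha / 2)             # lambda_{alpha/2}
    lo, hi = theta_hat - lam * std_err, theta_hat + lam * std_err
    return (lo, hi), not (lo <= theta_0 <= hi)

# E.g. an extra AR coefficient estimated as 0.08 with standard error 0.05:
print(wald_test(0.08, 0.05))   # interval contains 0: keep the smaller model
```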

  22. LR for Gaussian models. Let $Q(n)$ be the sum of squared residuals for an estimated model with $n$ parameters from $N$ observations. Test $n_1$ vs $n_2$ parameters; then for true order $n_0 \leq n_1 < n_2$:
i) $\frac{Q(n_2)}{\sigma^2} \in \chi^2(N - n_2)$.
ii) $\frac{Q(n_1) - Q(n_2)}{\sigma^2} \in \chi^2(n_2 - n_1)$.
iii) $Q(n_2)$ and $Q(n_1) - Q(n_2)$ are independent.
iv) $\eta = \frac{N - n_2}{n_2 - n_1} \cdot \frac{Q(n_1) - Q(n_2)}{Q(n_2)} \in F(n_2 - n_1, N - n_2)$.
If $\eta$ is large, pick model 2; else pick model 1. This is an exact test for AR models.
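
A sketch of the F-statistic in iv) as code; the residual sums of squares in the usage line are hypothetical numbers, not real estimation output:

```python
import numpy as np
from scipy.stats import f

def lr_f_test(Q1, Q2, n1, n2, N, alpha=0.05):
    """eta = (N - n2)/(n2 - n1) * (Q(n1) - Q(n2))/Q(n2) ~ F(n2 - n1, N - n2)
    under H0 that the smaller model (n1 parameters) is sufficient."""
    eta = (N - n2) / (n2 - n1) * (Q1 - Q2) / Q2
    return eta, eta > f.ppf(1 - alpha, dfn=n2 - n1, dfd=N - n2)

# E.g. hypothetical residual sums of squares for AR(1) vs AR(3) on N = 500:
print(lr_f_test(Q1=512.0, Q2=508.5, n1=1, n2=3, N=500))  # (eta, pick larger?)
```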
