
Weight Selection for a Model Average Estimator - PowerPoint PPT Presentation

Alan Wan, City University of Hong Kong (joint work with H. Liang and G. Zou, University of Rochester)


  1. Weight Selection for a Model Average Estimator
Alan Wan, City University of Hong Kong (joint work with H. Liang and G. Zou, University of Rochester)

  2. • Model selection methods assume the final model is chosen in advance.
• Under-reporting of variability and confidence intervals.
• Papers on under-reporting due to model selection: Danilov and Magnus (2004, J. of Econometrics); Leeb and Pötscher (2006, Annals of Stat); Leeb and Pötscher (2008, Econometric Theory).

  3. • Current paper: frequentist model averaging.
• Bayesian model averaging: very common; based on prior probabilities for potential models and priors for parameters. See Hoeting et al. (1999, Stat Science).
• Frequentist model averaging: Hjort and Claeskens (2003, JASA); Yuan and Yang (2005, JASA); Leung and Barron (2006, IEEE Info Th.); Hansen (2007, Econometrica).

  4. • Current paper motivated by Hansen (2007, Econometrica).
• Hansen's approach: weights chosen by minimizing the Mallows criterion, which is equivalent to squared error in large samples.
• Model framework:
$$y = H\theta + \varepsilon, \qquad \varepsilon \sim \text{i.i.d.}(0, \sigma^2),$$
where $y$ is $n \times 1$, $H$ is $n \times P$, $\theta$ is $P \times 1$, and $\varepsilon$ is $n \times 1$.

  5. Hansen's approach:
• Order the regressors at the outset: $X_1, X_2, X_3, \ldots, X_P$.
• Estimate a set of nested models:
$$y = X_1\theta_1 + \varepsilon_1;$$
$$y = X_1\theta_1 + X_2\theta_2 + \varepsilon_2;$$
$$\vdots$$
$$y = X_1\theta_1 + X_2\theta_2 + \cdots + X_P\theta_P + \varepsilon_P.$$

  6. • Let $H_p$ be the $n \times p$ ($p \le P$) matrix comprising the first $p$ columns of $H$, and let $\omega_p$ be the weight on the $p$-th model.
• Hansen's (MMA) estimator:
$$\hat{\Theta}_m = \sum_{p=1}^{P} \omega_p \begin{pmatrix} (H_p'H_p)^{-1}H_p'y \\ 0 \end{pmatrix}.$$
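A minimal sketch of this averaging step in Python, assuming the weights are already given (how they are chosen is the subject of the next slide); the function name mma_estimate and the synthetic data are illustrative only:

    import numpy as np

    def mma_estimate(H, y, weights):
        """Average the OLS fits of the nested models built from the first p
        columns of H, p = 1, ..., P, padding each fit with zeros."""
        n, P = H.shape
        theta_avg = np.zeros(P)
        for p, w in enumerate(weights, start=1):
            Hp = H[:, :p]                                 # first p columns of H
            theta_p, *_ = np.linalg.lstsq(Hp, y, rcond=None)
            theta_avg[:p] += w * theta_p                  # zero-padded average
        return theta_avg

    rng = np.random.default_rng(0)
    n, P = 100, 5
    H = rng.normal(size=(n, P))
    y = H @ np.array([1.0, 0.5, 0.25, 0.0, 0.0]) + rng.normal(size=n)
    weights = np.full(P, 1.0 / P)                         # placeholder weights
    print(mma_estimate(H, y, weights))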

  7. • Mallows criterion:
$$C(\omega) = (y - H\hat{\theta})'(y - H\hat{\theta}) + 2\sigma^2 k(\omega),$$
where $\omega = (\omega_1, \omega_2, \ldots, \omega_P)'$ and $k(\omega)$ is the effective number of parameters.
• $\hat{\omega} = \arg\min_\omega C(\omega)$.
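Hansen minimizes $C(\omega)$ over the weight simplex as a quadratic program; the crude Dirichlet search below is only an illustration of evaluating and comparing the criterion, with $\sigma^2$ taken as known and all data made up:

    import numpy as np

    def mallows_criterion(H, y, w, sigma2):
        """C(w) = ||y - fitted(w)||^2 + 2 sigma^2 k(w), where k(w) = sum_p w_p p
        for the nested models built from the first p columns of H."""
        n, P = H.shape
        fitted = np.zeros(n)
        k_eff = 0.0
        for p, wp in enumerate(w, start=1):
            Hp = H[:, :p]
            theta_p, *_ = np.linalg.lstsq(Hp, y, rcond=None)
            fitted += wp * (Hp @ theta_p)
            k_eff += wp * p                    # model p has p parameters
        resid = y - fitted
        return resid @ resid + 2.0 * sigma2 * k_eff

    rng = np.random.default_rng(1)
    n, P = 100, 5
    H = rng.normal(size=(n, P))
    y = H @ np.array([1.0, 0.5, 0.25, 0.0, 0.0]) + rng.normal(size=n)
    draws = rng.dirichlet(np.ones(P), size=2000)   # random points on the simplex
    w_hat = min(draws, key=lambda w: mallows_criterion(H, y, w, sigma2=1.0))
    print(w_hat)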

  8. Difficulties with Hansen's (2007) approach:
1) explicit ordering of regressors;
2) estimation of nested models:
$$y = X_1\theta_1 + \varepsilon_1; \quad y = X_1\theta_1 + X_2\theta_2 + \varepsilon_2; \quad \ldots; \quad y = X_1\theta_1 + X_2\theta_2 + \cdots + X_P\theta_P + \varepsilon_P$$
(cannot handle, for example, combining $(X_1, X_4, X_8)$ and $(X_1, X_5, X_7)$);
3) criterion based on an asymptotic justification.

  10. Alternative approach:
$$y = X\beta + Z\gamma + \varepsilon,$$
where $y$ is $n \times 1$, $X$ is $n \times k$, $\beta$ is $k \times 1$, $Z$ is $n \times m$, $\gamma$ is $m \times 1$, and $\varepsilon$ is $n \times 1$.
• $X$: focus (required) regressors; $Z$: auxiliary regressors.
• Framework follows Magnus and Durbin (1999, Econometrica).

  11. Choice of weights
• When $m = 1$, Magnus (2002, Econometrics Journal) and Danilov (2005, Econometrics Journal) considered weights based on a Laplace prior.
• Our approach: select weights based on the MSE of the weighted average estimator.

  12. • With $m$ auxiliary regressors in $Z$, there are $2^m$ models.
• Unrestricted estimators (sketched below):
$$b_u = b_r - Q\hat{\theta}; \qquad \hat{\gamma}_u = D^{-2}Z'My.$$
• Fully restricted estimators:
$$\hat{\gamma}_r = 0; \qquad b_r = (X'X)^{-1}X'y,$$
where $Q = (X'X)^{-1}X'ZD^{-1}$, $D = (Z'MZ)^{1/2}$, $M = I_n - X(X'X)^{-1}X'$, and $\hat{\theta} = D^{-1}Z'My \sim N(\theta, \sigma^2 I_m)$, with $\theta = D\gamma$.
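A minimal numpy sketch of these building blocks on synthetic data; the variable names mirror the slide's notation, and the data-generating values are made up:

    import numpy as np

    rng = np.random.default_rng(2)
    n, k, m = 100, 3, 2
    X = rng.normal(size=(n, k))                 # focus regressors
    Z = rng.normal(size=(n, m))                 # auxiliary regressors
    y = X @ np.ones(k) + Z @ np.array([0.5, 0.0]) + rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    M = np.eye(n) - X @ XtX_inv @ X.T           # M = I_n - X(X'X)^{-1}X'
    vals, vecs = np.linalg.eigh(Z.T @ M @ Z)
    D = vecs @ np.diag(np.sqrt(vals)) @ vecs.T  # D = (Z'MZ)^{1/2}
    D_inv = np.linalg.inv(D)
    Q = XtX_inv @ X.T @ Z @ D_inv               # Q = (X'X)^{-1}X'Z D^{-1}
    theta_hat = D_inv @ Z.T @ M @ y             # ~ N(theta, sigma^2 I_m)

    b_r = XtX_inv @ X.T @ y                     # fully restricted (gamma = 0)
    b_u = b_r - Q @ theta_hat                   # unrestricted estimator of beta
    gamma_u = D_inv @ theta_hat                 # equals D^{-2} Z'My
    print(b_u, gamma_u)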

  13. The $i$-th ($1 \le i \le 2^m$) partially restricted estimators:
$$b_{(i)} = b_r - QW_i\hat{\theta}; \qquad \hat{\gamma}_{(i)} = D^{-1}W_i\hat{\theta},$$
where $W_i = I_m - P_i$, $P_i = DS_i(S_i'D^2S_i)^{-1}S_i'D$, and $S_i$ is a selection matrix of rank $r_i$.
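A sketch implementing the slide's $W_i$ and $b_{(i)}$ formulas directly; the made-up inputs, and the assumption that $S_i$ picks out the coordinates of $\gamma$ being restricted, are illustrative only:

    import numpy as np

    def partially_restricted(b_r, Q, D, theta_hat, S):
        """b_(i) = b_r - Q W_i theta_hat with W_i = I_m - P_i and
        P_i = D S_i (S_i' D^2 S_i)^{-1} S_i' D, per the slide's definitions."""
        m = D.shape[0]
        P_i = D @ S @ np.linalg.inv(S.T @ D @ D @ S) @ S.T @ D
        W_i = np.eye(m) - P_i
        b_i = b_r - Q @ W_i @ theta_hat
        gamma_i = np.linalg.inv(D) @ W_i @ theta_hat
        return b_i, gamma_i, W_i

    # Tiny made-up inputs (m = 2); in practice b_r, Q, D, theta_hat come from
    # the construction on the previous slide.
    b_r = np.array([1.0, -0.5, 0.3])
    Q = np.array([[0.2, 0.0], [0.0, 0.1], [0.1, 0.1]])
    D = np.array([[1.5, 0.2], [0.2, 1.1]])      # symmetric positive definite
    theta_hat = np.array([0.8, -0.3])
    S = np.array([[1.0], [0.0]])                # selection matrix, rank r_i = 1
    print(partially_restricted(b_r, Q, D, theta_hat, S))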

  14. • Traditional model selection chooses the "best" among the $2^m$ models.
• Frequentist model average estimators (see the sketch below):
$$b_f = \sum_{i=1}^{2^m} \lambda_i b_{(i)}; \qquad \hat{\gamma}_f = \sum_{i=1}^{2^m} \lambda_i \hat{\gamma}_{(i)},$$
where $\lambda_i \ge 0$ and $\sum_{i=1}^{2^m} \lambda_i = 1$.
• Consider weights $\lambda_i = \lambda_i(\hat{\theta}, \hat{\sigma}^2)$.
• Write $W = \sum_{i=1}^{2^m} \lambda_i W_i$.
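A sketch of the averaging step over all $2^m$ submodels, with equal placeholder weights standing in for the data-dependent $\lambda_i(\hat{\theta}, \hat{\sigma}^2)$ developed on the next slides; reading each 0/1 pattern as a set of restricted coordinates is an assumption:

    import itertools
    import numpy as np

    def fma_estimate(b_r, Q, D, theta_hat, weights):
        """b_f = sum_i lambda_i b_(i), enumerating all 2^m restriction
        patterns via the slide's W_i = I_m - P_i construction."""
        m = D.shape[0]
        b_f = np.zeros_like(b_r)
        patterns = list(itertools.product([0, 1], repeat=m))
        for lam_i, pattern in zip(weights, patterns):
            cols = [j for j in range(m) if pattern[j]]
            if cols:                              # restrict selected gammas
                S = np.eye(m)[:, cols]
                P = D @ S @ np.linalg.inv(S.T @ D @ D @ S) @ S.T @ D
            else:                                 # no restrictions: b_u
                P = np.zeros((m, m))
            b_f += lam_i * (b_r - Q @ (np.eye(m) - P) @ theta_hat)
        return b_f

    b_r = np.array([1.0, -0.5, 0.3])
    Q = np.array([[0.2, 0.0], [0.0, 0.1], [0.1, 0.1]])
    D = np.array([[1.5, 0.2], [0.2, 1.1]])
    theta_hat = np.array([0.8, -0.3])
    weights = np.full(4, 0.25)                    # placeholder equal weights
    print(fma_estimate(b_r, Q, D, theta_hat, weights))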

  15. Theorem 3.1
$$MSE(b_f) = \hat{\sigma}^2(X'X)^{-1} - \hat{\sigma}^2 QQ' + Q\{(I_m - W)\hat{\theta}\}^{\otimes 2}Q' + \Psi(\hat{\theta}, \hat{\sigma}^2) + \{\Psi(\hat{\theta}, \hat{\sigma}^2)\}',$$
with $u^{\otimes 2} = uu'$, where
$$\Psi(\hat{\theta}, \hat{\sigma}^2) = \frac{n-k-m}{2}\,\hat{\sigma}^{-(n-k-m)}\int_0^{\hat{\sigma}^2} t^{(n-k-m)/2-1}\,\Psi_1(\hat{\theta}, t)\,dt$$
and
$$\Psi_1(\hat{\theta}, t) = Q\left\{W + \sum_{i=1}^{2^m} W_i\hat{\theta}\,\frac{\partial \lambda_i(\hat{\theta}, t)}{\partial \hat{\theta}'}\right\}Q'.$$

  16.
$$R(b_f) = \hat{\sigma}^2\,\mathrm{tr}\{(X'X)^{-1}\} - \hat{\sigma}^2\,\mathrm{tr}(Q'Q) + \hat{\theta}'(I_m - W)'Q'Q(I_m - W)\hat{\theta} + 2\,\mathrm{tr}\{\Psi(\hat{\theta}, \hat{\sigma}^2)\}.$$
One problem with minimizing $R(b_f)$ is that
$$\Psi(\hat{\theta}, \hat{\sigma}^2) = \frac{n-k-m}{2}\,\hat{\sigma}^{-(n-k-m)}\int_0^{\hat{\sigma}^2} t^{(n-k-m)/2-1}\,\Psi_1(\hat{\theta}, t)\,dt$$
is complex.

  17. Solution: replace $\Psi(\hat{\theta}, \hat{\sigma}^2)$ by $\hat{\sigma}^2\Psi_1(\hat{\theta}, \hat{\sigma}^2)$, where
$$\Psi_1(\hat{\theta}, t) = Q\left\{W + \sum_{i=1}^{2^m} W_i\hat{\theta}\,\frac{\partial \lambda_i(\hat{\theta}, t)}{\partial \hat{\theta}'}\right\}Q',$$
since $E\{\Psi(\hat{\theta}, \hat{\sigma}^2)\} = E\{\hat{\sigma}^2\Psi_1(\hat{\theta}, \hat{\sigma}^2)\}$.

  18. So we have
$$R_a(b_f) = \hat{\sigma}^2\,\mathrm{tr}\{(X'X)^{-1}\} - \hat{\sigma}^2\,\mathrm{tr}(Q'Q) + \hat{\theta}'(I_m - W)'Q'Q(I_m - W)\hat{\theta} + 2\hat{\sigma}^2\,\mathrm{tr}\{\Psi_1(\hat{\theta}, \hat{\sigma}^2)\}.$$
Write
$$\lambda_i(\hat{\theta}, \hat{\sigma}^2) = a_i(\hat{\sigma}_i^2)^c \Big/ \sum_{j=1}^{2^m} a_j(\hat{\sigma}_j^2)^c,$$
where the $a_i$'s are positive constants and $c$ is a non-positive constant.
• S-AIC (Buckland et al. (1997, Biometrics)): $a_i = \exp\{-(q_i + 1)\}$; $c = -n/2$.
• S-BIC: $a_i = n^{-(q_i+1)/2}$; $c = -n/2$.
• S-AICC (Hurvich and Tsai (1989, Biometrika)): $a_i = \exp\{-n(q_i + 1)/(n - q_i - 2)\}$; $c = -n/2$.
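A minimal sketch of these smoothed information-criterion weights, $\lambda_i \propto a_i(\hat{\sigma}_i^2)^c$ with $c = -n/2$, computed on the log scale for stability; the residual variances and parameter counts below are made up:

    import numpy as np

    def ic_weights(sigma2, q, n, kind="S-AIC"):
        """lambda_i = a_i sigma2_i^c / sum_j a_j sigma2_j^c with c = -n/2."""
        sigma2, q = np.asarray(sigma2, float), np.asarray(q, float)
        c = -n / 2.0
        if kind == "S-AIC":          # a_i = exp{-(q_i + 1)}
            log_a = -(q + 1.0)
        elif kind == "S-BIC":        # a_i = n^{-(q_i + 1)/2}
            log_a = -(q + 1.0) / 2.0 * np.log(n)
        elif kind == "S-AICC":       # a_i = exp{-n(q_i + 1)/(n - q_i - 2)}
            log_a = -n * (q + 1.0) / (n - q - 2.0)
        else:
            raise ValueError(kind)
        log_w = log_a + c * np.log(sigma2)
        log_w -= log_w.max()         # guard against underflow
        w = np.exp(log_w)
        return w / w.sum()

    # Four candidate models: residual variances and parameter counts (made up).
    print(ic_weights([1.30, 1.05, 1.02, 1.01], [2, 3, 4, 5], n=50, kind="S-AIC"))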

  19. Recall that
$$\Psi_1(\hat{\theta}, \hat{\sigma}^2) = Q\left\{W + \sum_{i=1}^{2^m} W_i\hat{\theta}\,\frac{\partial \lambda_i(\hat{\theta}, \hat{\sigma}^2)}{\partial \hat{\theta}'}\right\}Q'.$$
Now,
$$\frac{\partial \lambda_i(\hat{\theta}, \hat{\sigma}^2)}{\partial \hat{\theta}} = 2\lambda_i(\hat{\theta}, \hat{\sigma}^2)\,n^{-1}c\left\{\hat{\sigma}_i^{-2}(I_m - W_i) - \sum_{j=1}^{2^m} \lambda_j(\hat{\theta}, \hat{\sigma}^2)\,\hat{\sigma}_j^{-2}(I_m - W_j)\right\}\hat{\theta}. \qquad (*)$$
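A direct transcription of (*) into code, a sketch under the reconstruction above (the $2\lambda_i c/n$ scaling and $\hat{\sigma}_i^{-2}$ terms follow the slide); all inputs are made up:

    import numpy as np

    def dlambda_dtheta(i, lam, sigma2, Ws, theta_hat, n, c):
        """Formula (*): d lambda_i / d theta_hat for weights of the form
        lambda_i = a_i (sigma2_i)^c / sum_j a_j (sigma2_j)^c."""
        m = theta_hat.size
        I_m = np.eye(m)
        inner = (I_m - Ws[i]) / sigma2[i]
        inner = inner - sum(l * (I_m - Wj) / s2
                            for l, Wj, s2 in zip(lam, Ws, sigma2))
        return 2.0 * lam[i] * c / n * (inner @ theta_hat)

    # Made-up inputs: m = 2 and two candidate models (restricted, unrestricted).
    m, n, c = 2, 50, -25.0
    theta_hat = np.array([0.8, -0.3])
    Ws = [np.zeros((m, m)), np.eye(m)]
    lam = np.array([0.4, 0.6])
    sigma2 = np.array([1.3, 1.0])
    print(dlambda_dtheta(1, lam, sigma2, Ws, theta_hat, n, c))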

  20. Putting (*) in $\Psi_1(\hat{\theta}, \hat{\sigma}^2)$, we have
$$R_a(b_f) = \hat{\sigma}^2\,\mathrm{tr}\{(X'X)^{-1}\} - \hat{\sigma}^2\,\mathrm{tr}(Q'Q) + \lambda'L\lambda - 4n^{-1}c\,\hat{\sigma}^2\lambda'G\lambda + 2\hat{\sigma}^2\lambda'\phi + 4n^{-1}c\,\hat{\sigma}^2\lambda'g,$$
where $L = (l_{ij})$, $G = (g_{ij})$,
$$l_{ij} = \hat{\theta}'(I_m - W_i)'Q'Q(I_m - W_j)\hat{\theta},$$
$$g_{ij} = \hat{\sigma}_j^{-2}\,\hat{\theta}'W_i'Q'Q(I_m - W_j)\hat{\theta}, \qquad i, j = 1, 2, \ldots, 2^m,$$
and $g$ and $\phi$ are each $2^m \times 1$ vectors, with $g$ consisting of the diagonal elements of $G$ and the $i$-th element of $\phi$ being $\mathrm{tr}(QW_iQ')$, $i = 1, 2, \ldots, 2^m$.
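The criterion is a quadratic in $\lambda$, so once $L$, $G$, $\phi$ and $g$ are assembled it can be minimized over the weight simplex. A sketch of the evaluation with placeholder pieces (the constant trace terms are dropped since they do not involve $\lambda$):

    import numpy as np

    def R_a_lambda(lam, L, G, phi, g, n, c, sigma2):
        """The lambda-dependent part of R_a(b_f) from the slide."""
        lam = np.asarray(lam, dtype=float)
        return (lam @ L @ lam
                - 4.0 * n**-1 * c * sigma2 * (lam @ G @ lam)
                + 2.0 * sigma2 * (lam @ phi)
                + 4.0 * n**-1 * c * sigma2 * (lam @ g))

    # Placeholder pieces for 2^m = 4 models; in practice L, G, phi, g are
    # built from theta_hat, Q and the W_i as defined on this slide.
    rng = np.random.default_rng(3)
    K = 4
    A = rng.normal(size=(K, K))
    L = A @ A.T                                 # symmetric, like (l_ij)
    G = rng.normal(size=(K, K))
    phi = rng.normal(size=K)
    g = np.diag(G).copy()                       # g holds the diagonal of G
    lam0 = np.full(K, 1.0 / K)
    print(R_a_lambda(lam0, L, G, phi, g, n=50, c=-25.0, sigma2=1.0))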

  21. Interesting special case: setting $c = 0$ and considering only the mixing of $b_u$ and $b_r$, the minimization criterion leads to
$$b_{js} = \left\{1 - \frac{\hat{\sigma}^2\,\mathrm{tr}(Q'Q)}{\|b_u - b_r\|^2}\right\}b_u + \frac{\hat{\sigma}^2\,\mathrm{tr}(Q'Q)}{\|b_u - b_r\|^2}\,b_r,$$
i.e., the James-Stein estimator!!
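A sketch of this two-model special case: $b_u$ is shrunk toward $b_r$ by a factor proportional to $\hat{\sigma}^2\,\mathrm{tr}(Q'Q)$. The inputs are made up; in practice $\hat{\sigma}^2$ and $\mathrm{tr}(Q'Q)$ would come from the earlier sketches:

    import numpy as np

    def js_combine(b_u, b_r, sigma2_hat, tr_QQ):
        """b_js = (1 - s) b_u + s b_r, s = sigma2_hat tr(Q'Q) / ||b_u - b_r||^2."""
        s = sigma2_hat * tr_QQ / np.sum((b_u - b_r) ** 2)
        return (1.0 - s) * b_u + s * b_r

    b_u = np.array([1.2, -0.7, 0.4])
    b_r = np.array([1.0, -0.5, 0.3])
    print(js_combine(b_u, b_r, sigma2_hat=0.8, tr_QQ=0.05))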

  22. Optimal predictor
• Let $\hat{\mu}_f = H\hat{\theta}_f$, where $H = (X, Z)$, $\hat{\theta}_f = (b_f', \hat{\gamma}_f')'$, and $\hat{\gamma}_f = \sum_{i=1}^{2^m} \lambda_i(\hat{\theta}, \hat{\sigma}^2)\hat{\gamma}_{(i)}$ is the estimator of $\gamma$ corresponding to $b_f$.

  23.
$$MSE(\hat{\mu}_f) = \hat{\sigma}^2 X(X'X)^{-1}X' + \varphi(\hat{\theta}, \hat{\sigma}^2, XQ, XQ) - \varphi(\hat{\theta}, \hat{\sigma}^2, XQ, ZD^{-1}) - \varphi(\hat{\theta}, \hat{\sigma}^2, ZD^{-1}, XQ) + \varphi(\hat{\theta}, \hat{\sigma}^2, ZD^{-1}, ZD^{-1}),$$
where
$$\varphi(\hat{\theta}, \hat{\sigma}^2, C_1, C_2) = -\hat{\sigma}^2 C_1C_2' + C_1\{(I_m - W)\hat{\theta}\}^{\otimes 2}C_2' + C_1\Xi(\hat{\theta}, \hat{\sigma}^2)C_2' + \{C_1\Xi(\hat{\theta}, \hat{\sigma}^2)C_2'\}',$$
$$\Xi(\hat{\theta}, \hat{\sigma}^2) = \frac{n-k-m}{2}\,\hat{\sigma}^{-(n-k-m)}\int_0^{\hat{\sigma}^2} t^{(n-k-m)/2-1}\,\Xi_1(\hat{\theta}, t)\,dt,$$
and
$$\Xi_1(\hat{\theta}, t) = W + \sum_{i=1}^{2^m} W_i\hat{\theta}\,\frac{\partial \lambda_i(\hat{\theta}, t)}{\partial \hat{\theta}'}.$$
