
Strong Consistency of the AIC, BIC, Cp and KOO Methods in High-Dimensional-Response Regression. Jiang Hu (joint work with Zhidong Bai and Yasunori Fujikoshi). Northeast Normal University, P. R. China; Hiroshima University, Japan.


1. Strong Consistency of the AIC, BIC, Cp and KOO Methods in High-Dimensional-Response Regression. Jiang Hu∗ (joint work with Zhidong Bai∗ and Yasunori Fujikoshi†). ∗Northeast Normal University, P. R. China; †Hiroshima University, Japan. December, 2019. Jiang Hu (NENU), AIC, BIC, Cp and KOO Methods, December 2019, 38 slides.

2. Outline
1. Model selection
   - Linear regression model
   - Classical selection criteria
2. Asymptotic properties
   - Low-dimensional
   - Large-dimension and small-model
3. Main results
   - Assumptions and notations
   - Strong consistency of AIC, BIC and Cp
   - KOO methods based on the AIC, BIC, and Cp
   - General KOO methods
4. Proof strategy
5. Simulation


4. Linear regression model
Consider the multi-response linear regression model:
\[ \underset{1\times p}{y} \;=\; \underset{1\times k}{x}\,\underset{k\times p}{\Theta} \;+\; \underset{1\times p}{e}\,\underset{p\times p}{\Sigma^{1/2}} \qquad (1) \]
Aim: find the TRUE model if it exists.
References:
[1] Alan Miller. Subset Selection in Regression, Second Edition. Chapman and Hall/CRC, 2002.
[2] Gerda Claeskens and Nils Lid Hjort. Model Selection and Model Averaging. Vol. 330. Cambridge University Press, 2008.
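Model (1) is straightforward to simulate. A minimal sketch: all sizes, the true index set, the coefficient values, and the choice \(\Sigma = I_p\) are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Simulate n observations from Y = X Theta + E Sigma^{1/2} (model (1)).
# n, p, k, j_star and Sigma = I_p are hypothetical choices for illustration.
rng = np.random.default_rng(0)
n, p, k = 100, 5, 4                   # samples, responses, candidate regressors
j_star = [0, 1]                       # hypothetical true regressors

X = rng.standard_normal((n, k))
Theta = np.zeros((k, p))
Theta[j_star, :] = 2.0                # signal only in the true rows
Sigma_half = np.eye(p)                # Sigma = I_p for simplicity
Y = X @ Theta + rng.standard_normal((n, p)) @ Sigma_half
```

Stacking the \(n\) rows of (1) gives exactly the matrix form \(Y = X\Theta + E\,\Sigma^{1/2}\) used on the following slides.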

5. Overview of classical model selection criteria
From the point of view of a method's statistical performance and its intended context of use, there are only two distinct classes of methods, labeled efficient and consistent. Generally there are two main approaches:
(I) Optimization of a selection criterion:
  (1) criteria based on some form of mean squared error (e.g., Mallows's Cp, Mallows 1973) or mean squared prediction error (e.g., PRESS, Allen 1970);
  (2) criteria that estimate Kullback-Leibler (K-L) information or distance (e.g., AIC, AICc, and QAICc);
  (3) criteria that are consistent estimators of the "true model" (e.g., BIC).
(II) Tests of hypotheses.

6. Notation
Observations: \(Y: n\times p\) and \(X_\omega=(x_1,\dots,x_k): n\times k\).
Notation: \(\omega=\{1,\dots,k\}\); \(j_*\subseteq\omega\) and \(j\subseteq\omega\) are index sets; \(k_j\) = the cardinality of \(j\).
Full model \(\omega\): \(Y=X_\omega\Theta_\omega+E\,\Sigma^{1/2}\).
True model \(j_*\): \(Y=X_{j_*}\Theta_{j_*}+E\,\Sigma^{1/2}\).
Candidate model \(j\): \(Y=X_j\Theta_j+E\,\Sigma^{1/2}\).
\(\Theta_j=(\theta_{ti},\ t\in j,\ i=1,\dots,p)\), \(X_j=(x_t,\ t\in j)\),
\(P_j=X_j(X_j'X_j)^{-1}X_j'\), \(\hat\Sigma_j=n^{-1}Y'(I_n-P_j)Y\).
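The quantities \(P_j\) and \(\hat\Sigma_j\) translate directly into code. A minimal sketch (the function name and data layout are illustrative):

```python
import numpy as np

def sigma_hat(Y, X, j):
    """Residual covariance hat-Sigma_j = n^{-1} Y'(I_n - P_j)Y for index set j."""
    n = Y.shape[0]
    Xj = X[:, sorted(j)]
    # P_j = X_j (X_j' X_j)^{-1} X_j'
    Pj = Xj @ np.linalg.solve(Xj.T @ Xj, Xj.T)
    return Y.T @ (np.eye(n) - Pj) @ Y / n
```

Since \(I_n-P_j\) shrinks (in the Loewner order) as \(j\) grows, \(|\hat\Sigma_j|\) is non-increasing in \(j\); the criteria on the next slides trade this decrease off against a penalty.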

7. Classical selection criteria
Akaike's information criterion (AIC, Akaike (1973, 1974)):
\[ \mathrm{AIC}_j = n\log|\hat\Sigma_j| + 2k_jp, \qquad \hat j_A=\arg\min_j \mathrm{AIC}_j. \]
Key: Kullback-Leibler information/distance.
Kullback-Leibler information: for density functions f and g,
\[ I(f,g)=\int f(x)\log\frac{f(x)}{g(x)}\,dx. \]
I(f,g) denotes the information lost when g is used to approximate f; as a heuristic interpretation, I(f,g) is the distance from g to f.
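Minimizing \(\mathrm{AIC}_j\) over candidate index sets can be sketched as follows. The data setup is hypothetical, and the exhaustive search over all subsets is only feasible for small \(k\):

```python
import numpy as np
from itertools import combinations

def aic(Y, X, j):
    """AIC_j = n log|hat-Sigma_j| + 2 k_j p."""
    n, p = Y.shape
    Xj = X[:, list(j)]
    Pj = Xj @ np.linalg.solve(Xj.T @ Xj, Xj.T)
    _, logdet = np.linalg.slogdet(Y.T @ (np.eye(n) - Pj) @ Y / n)
    return n * logdet + 2 * len(j) * p

def select_aic(Y, X):
    """hat-j_A: exhaustive minimisation over all non-empty subsets of omega."""
    k = X.shape[1]
    cands = [j for r in range(1, k + 1) for j in combinations(range(k), r)]
    return min(cands, key=lambda j: aic(Y, X, j))
```

With a strong signal the selected set contains the true regressors, though AIC may also keep spurious ones, which is exactly the overfitting issue quantified later in the talk.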


9. Classical selection criteria
Bayesian information criterion (BIC, Schwarz (1978), Akaike (1977, 1978)):
\[ \mathrm{BIC}_j = n\log|\hat\Sigma_j| + \log(n)\,k_jp, \qquad \hat j_B=\arg\min_j \mathrm{BIC}_j. \]
Key: Consistency. As \(n\to\infty\), under some conditions, \(\hat j_B\to j_*\) almost surely.
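BIC differs from AIC only through the penalty, \(\log(n)\,k_jp\) instead of \(2k_jp\); for \(n>e^2\approx 7.39\) the BIC penalty is strictly heavier, which is what suppresses overfitting as \(n\to\infty\). A quick numerical check (the sizes are illustrative):

```python
import math

n, kj, p = 100, 3, 5               # illustrative sample size and dimensions
aic_penalty = 2 * kj * p
bic_penalty = math.log(n) * kj * p
ratio = bic_penalty / aic_penalty  # equals log(n)/2, grows without bound in n
```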


11. Classical selection criteria
Mallows's Cp (Mallows (1973)):
\[ C_{p,j} = (n-k)\,\mathrm{tr}(\hat\Sigma_\omega^{-1}\hat\Sigma_j) + 2pk_j, \qquad \hat j_C=\arg\min_j C_{p,j}. \]
Key: mean squared error.
Remark 1: Atilgan (1996) provides a relationship between AIC and Mallows's Cp, shows that under some conditions AIC selection behaves like minimum mean squared error selection, and notes that AIC and Cp are somewhat equivalent criteria.
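The Cp criterion is likewise a few lines of numpy. A sketch under the same hypothetical setup as before:

```python
import numpy as np

def cp(Y, X, j):
    """C_{p,j} = (n - k) tr(hat-Sigma_omega^{-1} hat-Sigma_j) + 2 p k_j."""
    n, p = Y.shape
    k = X.shape[1]

    def sig(cols):
        Xc = X[:, list(cols)]
        Pc = Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T)
        return Y.T @ (np.eye(n) - Pc) @ Y / n

    stat = np.trace(np.linalg.solve(sig(range(k)), sig(j)))
    return (n - k) * stat + 2 * p * len(j)
```

For the full model \(j=\omega\) the trace term is exactly \(p\), so \(C_{p,\omega}=(n-k)p+2pk\), a useful sanity check.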


13. Low-dimensional
Assume k and p are fixed (Fujikoshi, 1985; Fujikoshi and Veitch, 1979).
If j is an over-specified model, i.e., \(j_*\subset j\):
\[ P(\mathrm{AIC}_j-\mathrm{AIC}_{j_*}<0)\sim P\big(\chi^2_{p(k_j-k_{j_*})}>2p(k_j-k_{j_*})\big)>0, \]
\[ P(\mathrm{BIC}_j-\mathrm{BIC}_{j_*}<0)\sim P\big(\chi^2_{p(k_j-k_{j_*})}>\log(n)\,p(k_j-k_{j_*})\big)\to0, \]
\[ P(C_{p,j}-C_{p,j_*}<0)\sim P\big(\chi^2_{p(k_j-k_{j_*})}>2p(k_j-k_{j_*})\big)>0. \]
If j is an under-specified model, i.e., \(j_*\not\subset j\):
\[ \mathrm{AIC}_j-\mathrm{AIC}_{j_*}=O(n)\to+\infty,\qquad \mathrm{BIC}_j-\mathrm{BIC}_{j_*}=O(n)\to+\infty,\qquad C_{p,j}-C_{p,j_*}=O(n)\to+\infty. \]
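The two limiting overfitting probabilities above can be checked by Monte Carlo: the AIC tail probability stays bounded away from zero, while the BIC one vanishes as \(n\) grows. The degrees of freedom \(d\) and the sample size plugged into \(\log(n)\) are illustrative choices.

```python
import numpy as np

# Monte Carlo check of the limiting overfit probabilities for one
# over-specified model with chi-square degrees of freedom d (illustrative).
rng = np.random.default_rng(4)
d = 4
chi2 = rng.chisquare(d, size=200_000)

p_aic = np.mean(chi2 > 2 * d)             # AIC: bounded away from zero
p_bic = np.mean(chi2 > np.log(1e6) * d)   # BIC with n = 10^6: nearly zero
```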

14. Large-dimension and small-model
Assume \(j_*\subseteq\omega\) is the true model, k is fixed and \(p/n\to c\in(0,1)\).
Theorem 4.1 in Fujikoshi et al. (2014): Suppose \(c\in(0,c_a)\), where \(c_a\approx 0.797\) solves \(\log(1-c_a)+2c_a=0\), and that for any j with \(j_*\not\subset j\) and \(k_j\le k_{j_*}\),
\[ \lim \log|I_p+\Phi_j| > (k_{j_*}-k_j)\,[\,2c+\log(1-c)\,], \]
where \(\Phi_j=\tfrac1n\,\Sigma^{-1/2}\Theta_{j_*}'X_{j_*}'(P_\omega-P_j)X_{j_*}\Theta_{j_*}\Sigma^{-1/2}\). Then
\[ \lim_{p/n\to c} P(\hat j_A=j_*)=1. \]
Otherwise, \(\lim_{p/n\to c} P(\hat j_A=j_*)\ne1\).
What about BIC?


16. Large-dimension and small-model
Assume \(j_*\subseteq\omega\) is the true model, k is fixed and \(p/n\to c\in(0,1)\).
Theorem 4.1 in Fujikoshi et al. (2014): Suppose \(c\in(0,1/2)\) and that for any j with \(j_*\not\subset j\) and \(k_j\le k_{j_*}\),
\[ \mathrm{tr}(\Phi_j) > (k_{j_*}-k_j)\,c\,(1-2c), \]
where \(\Phi_j=\tfrac1n\,\Sigma^{-1/2}\Theta_{j_*}'X_{j_*}'(P_\omega-P_j)X_{j_*}\Theta_{j_*}\Sigma^{-1/2}\). Then
\[ \lim_{p/n\to c} P(\hat j_C=j_*)=1. \]
Otherwise, \(\lim_{p/n\to c} P(\hat j_C=j_*)\ne1\).
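The noncentrality matrix \(\Phi_j\) appearing in these theorems can be computed directly for simulated data. A sketch: the design, coefficients, candidate set, and \(\Sigma=I_p\) are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, k = 200, 20, 4
j_star = [0, 1]                        # hypothetical true set
X = rng.standard_normal((n, k))
Theta_star = rng.standard_normal((len(j_star), p))

def proj(M):
    """Orthogonal projection onto the column space of M."""
    return M @ np.linalg.solve(M.T @ M, M.T)

P_omega = proj(X)
P_j = proj(X[:, [0]])                  # candidate j = {0}, missing regressor 1
Xs = X[:, j_star]
# Phi_j = n^{-1} Sigma^{-1/2} Theta' X_{j*}' (P_omega - P_j) X_{j*} Theta Sigma^{-1/2},
# with Sigma = I_p here.
Phi_j = Theta_star.T @ Xs.T @ (P_omega - P_j) @ Xs @ Theta_star / n
```

Because \(P_\omega-P_j\) is positive semi-definite, \(\mathrm{tr}(\Phi_j)\ge 0\) always, and it is strictly positive here because \(j=\{0\}\) misses a true regressor; the theorems ask that this signal term beat the dimension-driven threshold.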


18. Assumptions and notations
A1: The true model \(j_*\) is a subset of \(\omega\) and \(k_*:=k_{j_*}\) is fixed.
A2: The entries of \(E=\{e_{ij}\}\) are i.i.d. with zero mean, unit variance and finite fourth moment.
A3: \(X'X\) is (non-random) positive definite uniformly.
A4: As \(\{k,p,n\}\to\infty\), \(p/n\to c\in(0,1)\) and \(k/n\to\alpha\in[0,1-c)\).
A5: \(\|\Phi\|:=\|\tfrac1n\,\Sigma^{-1/2}\Theta_{j_*}'X_{j_*}'X_{j_*}\Theta_{j_*}\Sigma^{-1/2}\|\) is bounded uniformly.
A5′: As \(\{k,p,n\}\to\infty\), \(\|\Phi_j\|:=\|\tfrac1n\,\Sigma^{-1/2}\Theta_{j_*}'X_{j_*}'(P_\omega-P_j)X_{j_*}\Theta_{j_*}\Sigma^{-1/2}\|\to\infty\).
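Assumptions A2 and A3 are easy to sanity-check for a simulated design; standard Gaussian errors and design are one convenient choice satisfying them (the sizes below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 500, 10
X = rng.standard_normal((n, k))

# A3: X'X positive definite (smallest eigenvalue strictly positive).
lam_min = np.linalg.eigvalsh(X.T @ X).min()

# A2: standard normal errors have mean 0, variance 1 and fourth moment 3,
# so the finite-fourth-moment condition holds; the sample moment is close to 3.
E = rng.standard_normal((n, 50))
m4 = np.mean(E**4)
```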
