Vienna WS

Second-Order Bias-Corrected AIC for Selecting Structural Equation Models

Kentaro Hayashi
Department of Psychology, University of Hawaii at Manoa (E-mail: hayashik@hawaii.edu)

Hirokazu Yanagihara
Department of Mathematics, Hiroshima University (E-mail: yanagi@math.sci.hiroshima-u.ac.jp)
Introduction

We derive a second-order bias correction of the Akaike Information Criterion (AIC) in structural equation models (SEM) under the normality assumption when the model is overspecified.

Note: "Overspecified" means the candidate models (the $f$'s) include the true model ($\phi$).

Contents:
1. Introduction
   (a) Structural Equation Models (SEM)
   (b) AIC and the CV (Cross-Validation) Criterion
2. General Theory
   (a) True and Candidate Models
   (b) Likelihood and MLE
   (c) Risk, Bias, and Information Criterion
   (d) Estimated Bias
3. Recent and Current Studies
   (a) Notations (Derivatives, Expectation of Moment Matrices, Estimates of Expected Moment Matrices, and Coefficients in Bias-Correction Terms)
   (b) Evaluating Bias of Information Criteria
   (c) Asymptotic Expansion of Expectation of Estimated "Beta" Term
   (d) Bias of AIC (Main Result of Current Study)
   (e) Useful Formulas in Obtaining the Coefficient Terms
Structural Equation Models (SEM)

References: Bollen (1989), Bartholomew and Knott (1999), Skrondal and Rabe-Hesketh (2004), Yuan and Bentler (2007)

• SEM is one of the most frequently used multivariate techniques in the social sciences.

• SEM aims to express the covariance structure with a relatively small number of parameters.

Notation: the structured covariance matrix is written $\Sigma(\theta)$.
• The single most famous SEM is the confirmatory factor analysis (CFA) model:

  $y = \mu + \Lambda f + \varepsilon$,

  where
  $y$ ($p \times 1$): Observed variables,
  $\mu$ ($p \times 1$): Population means,
  $\Lambda$ ($p \times m$): Factor loadings (Path coefficients),
  $f$ ($m \times 1$): Factors,
  $\varepsilon$ ($p \times 1$): Errors.

Note: CFA is a linear model, but the factors ($f$) are latent (NOT observed) variables.
• Typical assumptions: Errors are mutually uncorrelated, and factors and errors are uncorrelated. That is, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ ($i \neq j$) and $\mathrm{Cov}(f_i, \varepsilon_j) = 0$.

• The covariance structure of the observed variables ($y$) is expressed as a function of:
  $\Lambda$ ($p \times m$): Factor loadings (Path coefficients),
  $\Phi$ ($m \times m$): Factor correlations, and
  $\Psi$ ($p \times p$): Error variances.

• That is, the covariance structure under CFA is expressed as:

  $\Sigma(\theta) = \Lambda \Phi \Lambda' + \Psi$,

  where $\theta = (\lambda', \phi', \psi')' = (\theta_1, \ldots, \theta_q)'$.
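As a concrete illustration (not from the slides), the CFA covariance structure $\Sigma(\theta) = \Lambda \Phi \Lambda' + \Psi$ can be computed numerically; the loading, factor-correlation, and error-variance values below are hypothetical:

```python
import numpy as np

# Hypothetical CFA with p = 4 observed variables and m = 2 factors.
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 0.6],
                   [0.0, 0.9]])          # factor loadings (p x m)
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])             # factor correlations (m x m)
Psi = np.diag([0.36, 0.51, 0.64, 0.19])  # diagonal error variances (p x p)

# Covariance structure: Sigma(theta) = Lambda Phi Lambda' + Psi
Sigma = Lambda @ Phi @ Lambda.T + Psi
print(Sigma)
```

With these values the implied variances are all 1, so $\Sigma(\theta)$ is also the model-implied correlation matrix; the free parameters in $\theta$ are the 4 loadings, 1 factor correlation, and 4 error variances, so $q = 9$ here.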
AIC (Akaike Information Criterion)

• When the candidate model is overspecified, AIC is the first-order bias-corrected estimator of the risk function based on the expected predictive Kullback-Leibler (KL) discrepancy between the true model and the candidate model. That is,

  $\mathrm{E[AIC]} = R_{\mathrm{KL}} + O(n^{-1})$.

• AIC tends to choose the model with many parameters as the best model (when the full model has too many parameters).

• Reason: AIC tends to underestimate the bias when the candidate model has many parameters (because the bias term of AIC is derived from the asymptotic theory of $\hat{\theta}$).
• This undesirable property of AIC, namely that the candidate model with too many parameters is chosen as the best model, tends to appear among overspecified models.
Cross-Validation (CV)

• Even when the candidate model is misspecified, by correcting the bias of the cross-validation (CV) criterion (Stone, 1974), second-order bias-corrected estimators of the risk function have been proposed under general conditions (e.g., Yanagihara, Tonda and Matsumoto, 2006). That is,

  $\mathrm{E[CCV]} = R_{\mathrm{KL}} + O(n^{-2})$.

• However, heavy computation is needed to obtain these bias-corrected criteria, and they have large variance.
True and Candidate Models (General)

Let $y_1, \ldots, y_n$ be $p$-dimensional random observation vectors, where $n$ is the sample size.

• The true model:

  $M_\phi : y_1, \ldots, y_n \sim \text{i.i.d. } \phi(y)$,

  where $\phi(y)$ is an unknown probability density function.

• The candidate model:

  $M_f : y_1, \ldots, y_n \sim \text{i.i.d. } f(y \mid \theta)$,

  where $f \in F = \{\, f(y \mid \theta);\ \theta \in \Theta \,\}$, $\theta = (\theta_1, \ldots, \theta_q)'$.
Candidate Model in SEM

If the candidate model is correctly specified, then

  $NS \sim W_p(N, \Sigma(\theta_0))$
  $\Rightarrow\ NS = W_1 + \cdots + W_N, \quad W_1, \ldots, W_N \sim \text{i.i.d. } W_p(1, \Sigma(\theta_0))$,

where $N = n - 1$. Thus $W_1, \ldots, W_N$ can be regarded as independent observations. Therefore, the candidate model is:

  $M_f : W_1, \ldots, W_N \sim \text{i.i.d. } W_p(1, \Sigma(\theta))$.
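A small simulation (with hypothetical values, not part of the slides) illustrates this decomposition: each one-degree-of-freedom Wishart draw $W_i$ is an outer product $z_i z_i'$ with $z_i \sim N_p(0, \Sigma)$, and their average converges to $\Sigma$ as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
p, N = 3, 2000
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
L = np.linalg.cholesky(Sigma)

# Each W_i = z_i z_i' with z_i ~ N_p(0, Sigma) is one W_p(1, Sigma) draw;
# S = (1/N) * sum_i W_i is then a W_p(N, Sigma)/N variate.
Z = rng.standard_normal((N, p)) @ L.T   # rows are z_i ~ N_p(0, Sigma)
S = sum(np.outer(z, z) for z in Z) / N
print(np.max(np.abs(S - Sigma)))        # small for large N
```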
Likelihood and MLE (General)

• Log-likelihood function:

  $L(\theta \mid Y) = \sum_{i=1}^n \log f(y_i \mid \theta)$,

  where $Y = (y_1, \ldots, y_n)'$.

• Maximum likelihood estimator (MLE) of $\theta$:

  $\hat{\theta} = \arg\max_\theta L(\theta \mid Y)$.

• Convergence of the MLE in the misspecified model (White, 1982):

  $\lim_{n \to \infty} \hat{\theta} = \theta_0, \quad \mathrm{E}_y\!\left[ \frac{\partial}{\partial \theta} \log f(y \mid \theta) \right]_{\theta = \theta_0} = 0$,

  where $\mathrm{E}_y$ denotes the expectation with respect to $y$ under the true model $\phi(y)$.
Likelihood and MLE in SEM

In SEM, the discrepancy function is:

  $F_{\mathrm{KL}}(\theta \mid S) = -\log|S\Sigma(\theta)^{-1}| + \mathrm{tr}\{S\Sigma(\theta)^{-1}\} - p$

  $\Rightarrow\ \dfrac{1}{N} \sum_{i=1}^N F_{\mathrm{KL}}(\theta \mid W_i)$.

Therefore, the log-likelihood satisfies $-2L(\theta \mid S) = N F_{\mathrm{KL}}(\theta \mid S)$, and the MLE is:

  $\hat{\theta} = \arg\max_\theta L(\theta \mid S)$.
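A minimal numerical sketch of the discrepancy function (the function name and the 2 × 2 example matrix are made up for illustration): $F_{\mathrm{KL}}$ is zero at the saturated fit $\Sigma(\theta) = S$ and positive otherwise, which is why minimizing it is equivalent to maximizing the likelihood.

```python
import numpy as np

def f_kl(Sigma, S):
    """ML discrepancy F_KL(theta | S) = -log|S Sigma^{-1}| + tr(S Sigma^{-1}) - p."""
    p = S.shape[0]
    A = S @ np.linalg.inv(Sigma)
    sign, logdet = np.linalg.slogdet(A)
    return -logdet + np.trace(A) - p

# Hypothetical 2x2 sample covariance matrix:
S = np.array([[1.0, 0.4],
              [0.4, 1.0]])
print(f_kl(S, S))          # 0 when Sigma reproduces S exactly
print(f_kl(np.eye(2), S))  # positive when Sigma misfits S
```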
Risk Function, Bias, and Information Criterion

• Risk function based on the expected predictive KL discrepancy:

  $R_{\mathrm{KL}} = \mathrm{E}_y \mathrm{E}_u\!\left[ -2L(\hat{\theta} \mid U) \right]$,

  where $U = (u_1, \ldots, u_n)'$ is an $n \times p$ future observation matrix (independent of $Y$), and each $u_i$ follows the same distribution as $y_i$ ($i = 1, \ldots, n$).

• Bias:

  $B = R_{\mathrm{KL}} - \mathrm{E}_y\!\left[ -2L(\hat{\theta} \mid Y) \right]$.
• Information criterion (IC):

  $\mathrm{IC} = -2L(\hat{\theta} \mid Y) + \hat{B}$,

  where $\hat{B}$ is a consistent estimator of $B$.

Note: The various ICs are distinguished by their choice of $\hat{B}$.
Estimated Bias in Information Criteria

• AIC: $\hat{B} = 2q$.

• TIC (Takeuchi information criterion; Takeuchi, 1976):

  $\hat{B} = 2\,\mathrm{tr}\{ \hat{I}(\hat{\theta}) \hat{J}(\hat{\theta})^{-1} \}$.

• CV:

  $\hat{B} = -2 \sum_{i=1}^n \log f(y_i \mid \hat{\theta}_{[-i]}) + 2L(\hat{\theta} \mid Y)$,

  where $\hat{\theta}_{[-i]}$ is the jackknife estimator of $\theta$ defined by

  $\hat{\theta}_{[-i]} = \arg\max_\theta \left\{ \sum_{j \neq i} \log f(y_j \mid \theta) \right\}$.
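A minimal sketch of the CV bias estimator for a toy model (a univariate normal mean with known variance, so $q = 1$; the model, data, and function names are illustrative assumptions, not the SEM setting of the slides). Each observation is scored under the jackknife fit that excludes it:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=50)
n = len(y)
sigma2 = 1.0  # known variance, so theta = mu and q = 1

def loglik(mu, data):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - 0.5 * (data - mu) ** 2 / sigma2)

mu_hat = y.mean()  # MLE from the full sample

# Jackknife: refit leaving out each observation, score it on the held-out point.
B_cv = 0.0
for i in range(n):
    mu_minus_i = np.delete(y, i).mean()        # MLE without observation i
    B_cv += -2 * loglik(mu_minus_i, y[i:i + 1])
B_cv -= -2 * loglik(mu_hat, y)                 # B_cv = CV + 2 L(theta_hat | Y)

print(B_cv)  # on average near 2q = 2 in this well-specified setting
```

This illustrates the computational cost mentioned earlier: the model is refit $n$ times, which is what makes CV-type corrections expensive for SEM.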
Notations

A. Derivatives:

1. First-order (Gradient): $g(y \mid \theta) = -\dfrac{\partial}{\partial \theta} \log f(y \mid \theta)$,

2. Second-order (Hessian): $H(y \mid \theta) = -\dfrac{\partial^2}{\partial \theta \partial \theta'} \log f(y \mid \theta)$,

3. Third-order: $C(y \mid \theta) = -\left( \dfrac{\partial}{\partial \theta'} \otimes \dfrac{\partial^2}{\partial \theta \partial \theta'} \right) \log f(y \mid \theta)$,

4. Fourth-order: $Q(y \mid \theta) = -\left( \dfrac{\partial^2}{\partial \theta \partial \theta'} \otimes \dfrac{\partial^2}{\partial \theta \partial \theta'} \right) \log f(y \mid \theta)$,

where $\otimes$ denotes the Kronecker product.
B. Expectations of Moment Matrices:

1. Information: $I(\theta) = \mathrm{E}_y[\, g(y \mid \theta) g(y \mid \theta)' \,]$,

2. Jacobian: $J(\theta) = \mathrm{E}_y[\, H(y \mid \theta) \,]$,

3. Expected third-order moment matrix: $K(\theta) = \mathrm{E}_y[\, C(y \mid \theta) \,]$,

4. Expected fourth-order moment matrix: $L(\theta) = \mathrm{E}_y[\, Q(y \mid \theta) \,]$.
C. Estimates of Expected Moment Matrices:

1. Estimated information: $\hat{I}(\theta) = \dfrac{1}{n} \sum_{i=1}^n g(y_i \mid \theta) g(y_i \mid \theta)'$,

2. Estimated Jacobian: $\hat{J}(\theta) = \dfrac{1}{n} \sum_{i=1}^n H(y_i \mid \theta)$,

3. Estimated third-order moment: $\hat{K}(\theta) = \dfrac{1}{n} \sum_{i=1}^n C(y_i \mid \theta)$,

4. Estimated fourth-order moment: $\hat{L}(\theta) = \dfrac{1}{n} \sum_{i=1}^n Q(y_i \mid \theta)$.
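A sketch (not from the slides; the $N(\mu, \sigma^2)$ example and all values are illustrative assumptions) of computing $\hat{I}(\hat{\theta})$ and $\hat{J}(\hat{\theta})$ for a univariate normal model. When the model is correctly specified, $I(\theta_0) = J(\theta_0)$, so $\mathrm{tr}\{\hat{I}\hat{J}^{-1}\} \approx q$ and TIC's $\hat{B}$ recovers AIC's $2q$:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=1.0, scale=2.0, size=5000)
n = len(y)
mu, s2 = y.mean(), y.var()  # MLEs of mu and sigma^2

# Per-observation gradient g(y_i | theta) of -log f at the MLE,
# one column per parameter (mu, sigma^2).
g = np.column_stack([-(y - mu) / s2,
                     -0.5 * ((y - mu) ** 2 / s2 ** 2 - 1 / s2)])
I_hat = g.T @ g / n  # estimated information

# Estimated Jacobian: Hessian of -log f averaged over observations
# (analytic form for the normal model).
J_hat = np.array([
    [1 / s2, np.mean(y - mu) / s2 ** 2],
    [np.mean(y - mu) / s2 ** 2,
     np.mean((y - mu) ** 2) / s2 ** 3 - 0.5 / s2 ** 2],
])

tic_trace = np.trace(I_hat @ np.linalg.inv(J_hat))
print(tic_trace)  # approximately q = 2 for normal data
```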