Consequences of measurement error
Psychology 588: Covariance structure and factor models
Scaling indeterminacy of latent variables

• The scale of a latent variable is arbitrary and "determined" by a convention adopted for convenience
• Typically the variance is set to one (the factor analysis convention), or the latent variable is given the scale of an arbitrarily chosen indicator
• By centering the indicator variables, we set the latent variables' means to zero
• Consider the following transformation:

  x_j = ν_j + λ_j ξ + δ_j,  j = 1, ..., J
  ξ* = a + b ξ,  b ≠ 0
  ν_j* = ν_j − λ_j a / b,  λ_j* = λ_j / b
• If all J indicators are considered simultaneously, vector notation is more convenient:

  x = ν + λ ξ + δ
  ξ* = a + b ξ

meaning that the linear transformation of ξ can be exactly compensated by the correspondingly transformed ν* = ν − λ a / b and λ* = λ / b, leaving the errors δ unchanged (i.e., the same fit)
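The compensation above can be checked numerically. A minimal sketch with made-up intercepts, loadings, and a single latent score (all values illustrative, not from the text):

```python
import numpy as np

nu = np.array([1.0, 0.5, -0.2])      # intercepts (hypothetical)
lam = np.array([1.0, 0.8, 1.2])      # loadings (hypothetical)
xi = 2.5                             # one value of the latent variable
delta = np.array([0.1, -0.3, 0.2])   # measurement errors

x = nu + lam * xi + delta            # x = nu + lambda * xi + delta

# Rescale the latent variable: xi* = a + b*xi, b != 0
a, b = 3.0, 2.0
xi_star = a + b * xi
nu_star = nu - lam * a / b           # compensating intercepts
lam_star = lam / b                   # compensating loadings

x_star = nu_star + lam_star * xi_star + delta
print(np.allclose(x, x_star))        # True: identical x, hence identical fit
```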
What's so great about measurement errors in equations?

• Regression weights and correlations are interpreted as if the "operationally defined" variables involved no measurement error --- an assumption hardly ever satisfied for theoretical constructs (e.g., self-esteem, IQ, etc.)
• Ignoring measurement error leads to inconsistent estimates
• We will see the consequences of ignoring measurement errors
Univariate consequences

• Consider a mean-included equation for X (hours worked per week) as an indicator of ξ (achievement motivation):

  X = ν + λ ξ + δ,  E(δ) = 0,  cov(ξ, δ) = 0
  E(X) = ν + λ E(ξ),  var(X) = λ² var(ξ) + var(δ)

• Given only one indicator per latent variable, the intercept and loading (i.e., weight) are simply scaling constants for ξ
• However, if the ξ scale is set comparable to the X scale (i.e., λ = 1), we see that var(X) is an over-estimate of ϕ = var(ξ) if δ is not included in the equation
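The over-estimation is easy to see in a simulation sketch with hypothetical values of ϕ and var(δ) (not from the text): with λ = 1, the observed variance converges to ϕ + var(δ), not ϕ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
phi, theta = 4.0, 1.0                        # var(xi) and var(delta), made up

xi = rng.normal(0.0, np.sqrt(phi), n)
delta = rng.normal(0.0, np.sqrt(theta), n)
X = 10.0 + xi + delta                        # nu = 10, lambda = 1

print(X.var())                               # close to phi + theta = 5, not phi = 4
```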
Bivariate relation and simple regression

• True data structure (path diagram): γ runs from ξ to η, with disturbance ζ on η; single indicators x = ξ + δ and y = η + ε (λ1 = λ2 = 1); η: job satisfaction, y: satisfaction scale
• cov(x, y) is an unbiased estimate of cov(ξ, η) with λ1 = λ2 = 1, since no other variables (δ and ε) can explain cov(x, y):

  cov(x, y) = cov(ξ + δ, η + ε) = cov(ξ, η)
• From the previous equations, and by analogy with y = γ* x + ζ* if measurement errors are ignored,

  γ* = cov(x, y) / var(x) = γ [var(ξ) / (var(ξ) + var(δ))]

The parenthesized ratio (the reliability ρ_xx) becomes 1 only with no measurement error; otherwise γ* is an attenuated version of γ, and γ̂* = s_xy / s_x² is an inconsistent estimator of γ
• If λ1 ≠ 1, the bias of the regression weight has an additional factor of 1/λ1 --- but such scaling is unusual when there is only one indicator per latent variable
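The attenuation can be verified by simulation. A sketch with hypothetical population values (γ, ϕ, and the error variances are made up for illustration): the naive OLS slope of y on the fallible x converges to γ ρ_xx, not γ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
gamma, phi, theta_delta = 0.8, 4.0, 1.0      # made-up true slope and variances
rho_xx = phi / (phi + theta_delta)           # reliability of x = 0.8

xi = rng.normal(0.0, np.sqrt(phi), n)
eta = gamma * xi + rng.normal(0.0, 1.0, n)   # disturbance zeta with variance 1
x = xi + rng.normal(0.0, np.sqrt(theta_delta), n)
y = eta                                      # lambda_2 = 1; error in y would not bias the slope

slope = np.cov(x, y)[0, 1] / x.var()
print(slope, gamma * rho_xx)                 # slope is near 0.64, not 0.8
```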
• Correlations:

  ρ*_xy = cov(x, y) / √(var(x) var(y))
        = cov(ξ, η) / √([var(ξ) + var(δ)] [var(η) + var(ε)])
        = ρ_ξη √(ρ_xx ρ_yy)

which shows an attenuation of the "true" correlation due to measurement error, with the familiar correction formula:

  ρ_ξη = ρ*_xy / √(ρ_xx ρ_yy)
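A tiny numerical sketch of the correction formula, with made-up reliabilities and true correlation (values are illustrative only):

```python
import numpy as np

rho_xx, rho_yy = 0.8, 0.9                      # reliabilities of x and y (hypothetical)
rho_true = 0.5                                 # true correlation rho(xi, eta)

rho_xy = rho_true * np.sqrt(rho_xx * rho_yy)   # attenuated observed correlation
corrected = rho_xy / np.sqrt(rho_xx * rho_yy)  # correction for attenuation

print(rho_xy)                                  # about 0.424: attenuated
print(corrected)                               # 0.5: true correlation recovered
```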
Consequences in multiple regression

• True data structure (path diagram): ξ1, ξ2, ξ3 → η with weights γ1, γ2, γ3 and disturbance ζ; indicators x_i = ξ_i + δ_i (i = 1, 2, 3) and y = η + ε, with Λ_x = I and λ_y = 1
• Ignoring measurement errors: y = γ*′ x + ζ*
• σ_ξy = cov(ξ, γ′ξ + ζ) = Φγ
• σ_xy = cov(ξ + δ, γ′ξ + ζ) = Φγ
• γ = Φ⁻¹ σ_ξy, and by analogy with y = γ*′ x + ζ*,

  γ* = Σ_xx⁻¹ σ_xy = (Φ + Θ_δ)⁻¹ Φ γ

Without measurement error (Θ_δ = 0), γ* = γ; otherwise γ* ≠ γ
• Alternatively written: γ* = Σ_xx⁻¹ Σ_xξ γ, since Σ_xξ = Φ --- where Σ_xx⁻¹ Σ_xξ is the OLS estimate of B in ξ_i = B x_i + e_i, the regression weights for prediction of ξ by x. Again, without measurement error, Σ_xx⁻¹ Σ_xξ = I
• Note: in Bollen (pp. 159-168), γ, σ_ξy, and σ_xy are written Γ, Σ_ξy, and Σ_xy, respectively, for the multiple regression model
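The matrix form of the bias is easy to evaluate directly. A sketch with a hypothetical Φ, Θ_δ, and γ (all values made up): γ* = (Φ + Θ_δ)⁻¹ Φ γ differs from γ unless Θ_δ = 0.

```python
import numpy as np

Phi = np.array([[2.0, 0.5, 0.3],
                [0.5, 1.5, 0.4],
                [0.3, 0.4, 1.0]])            # cov(xi), made up
Theta_delta = np.diag([0.5, 0.3, 0.2])       # error variances, made up
gamma = np.array([0.7, -0.4, 0.2])           # true weights, made up

Sigma_xx = Phi + Theta_delta
gamma_star = np.linalg.solve(Sigma_xx, Phi @ gamma)   # (Phi + Theta)^(-1) Phi gamma
print(gamma_star)                            # biased: not equal to gamma

# With Theta_delta = 0, Sigma_xx^(-1) Sigma_xxi = I and the bias vanishes:
print(np.allclose(np.linalg.solve(Phi, Phi @ gamma), gamma))   # True
```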
• As a very simplified case, suppose x1 is the only fallible indicator:

  x_1 = ξ_1 + δ_1
  x_i = ξ_i,  i = 2, ..., q

with the true and estimated regression equations:

  y = γ_1 ξ_1 + γ_2 ξ_2 + ... + γ_q ξ_q + ζ
  y = γ_1* x_1 + γ_2* x_2 + ... + γ_q* x_q + ζ*

• In this special case, the regression weight matrix has a simple multiplicative form of bias (hint: use Σ_xξ = Φ and Σ_xx = Φ + Θ_δ):

  Σ_xx⁻¹ Σ_xξ = (Φ + Θ_δ)⁻¹ Φ = I − (Φ + Θ_δ)⁻¹ Θ_δ

which equals the identity matrix except in its first column, c = (c_1, c_2, ..., c_q)′
• Consequently, the resulting bias factors are:

  γ_1* = γ_1 b_1
  γ_i* = γ_i + γ_1 b_i,  i = 2, ..., q

• The bias factor for x_1 is less than 1 in absolute value (equal to 1 without measurement error), so γ_1* is biased toward 0 --- the bias factor b_1 is the regression weight of x_1 in

  ξ_1 = b_0 + b_1 x_1 + b_2 ξ_2 + ... + b_q ξ_q

• The consequences for x_i, i = 2, ..., q, are additive, depending on the relationship between ξ_1 and ξ_i holding all other IVs constant, and on γ_1
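These bias factors can be checked against the matrix expression using hypothetical population matrices (q = 3, only x_1 fallible; all numbers made up for illustration):

```python
import numpy as np

Phi = np.array([[2.0, 0.5, 0.3],
                [0.5, 1.5, 0.4],
                [0.3, 0.4, 1.0]])            # cov(xi), made up
theta11 = 0.6                                # var(delta_1); only x1 is fallible
Theta_delta = np.diag([theta11, 0.0, 0.0])
gamma = np.array([0.7, -0.4, 0.2])           # true weights, made up

Sigma_xx = Phi + Theta_delta
gamma_star = np.linalg.solve(Sigma_xx, Phi @ gamma)

# b: weights from regressing xi_1 on (x1, xi_2, xi_3); the predictors'
# covariance matrix is Sigma_xx, and their covariance with xi_1 is Phi[:, 0]
b = np.linalg.solve(Sigma_xx, Phi[:, 0])

print(np.isclose(gamma_star[0], gamma[0] * b[0]))                 # True: multiplicative bias
print(np.allclose(gamma_star[1:], gamma[1:] + gamma[0] * b[1:]))  # True: additive bias
```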
• So far, all the reasoning has been based on rather unrealistic assumptions:
  - only a single indicator per latent variable, so that its loading is simply a scaling constant
  - only one fallible IV
• Without such assumptions (e.g., all IVs fallible), the consequences of measurement error become too complicated to simplify algebraically --- Σ_xx⁻¹ Σ_xξ takes no particularly simple form
• One clear conclusion: all estimates are inconsistent --- systematically different from what they are meant to be
• Consequence for standardization:

  γ_i,standardized* = γ_i* √(var(x_i) / var(y))

• Consequence for the SMC is similar to the bivariate case:

  plim R̂*² ≤ plim R̂²

• What should we do with essentially omnipresent measurement error? Use SEM, which allows for measurement errors in the model --- though certain models are limited with regard to model identification (e.g., Table 5.1, p. 164)
Correlated errors of measurement

• The consequence for regression weights is further complicated:

  γ* = Σ_xx⁻¹ σ_xy ≠ Σ_xx⁻¹ Σ_xξ γ

For simple regression:

  γ* = [cov(ξ, η) + σ_δε] / var(x)

Now |γ*| is not necessarily < |γ|
• If correlated measurement errors occur only within the IVs (i.e., σ_δε = 0, Σ_xx = Φ + Θ_δ where Θ_δ is not diagonal), γ* = Σ_xx⁻¹ Σ_xξ γ still holds (but the bias factor takes a more complicated form, also involving the off-diagonal entries of Θ_δ)
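A small numerical sketch of the simple-regression case, with hypothetical values of γ, ϕ, var(δ), and σ_δε (all made up): with a positive error covariance, the naive slope can exceed γ, so attenuation is no longer guaranteed.

```python
# Hypothetical population values for the simple regression y = gamma*xi + zeta
gamma, phi, theta_delta = 0.5, 1.0, 0.5   # true slope, var(xi), var(delta)
sigma_de = 0.4                            # cov(delta, epsilon) > 0, made up

cov_xy = gamma * phi + sigma_de           # cov(x, y) = cov(xi, eta) + sigma_de
gamma_star = cov_xy / (phi + theta_delta) # naive slope ignoring measurement error
print(gamma_star)                         # 0.6 > gamma = 0.5: inflated, not attenuated
```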
With multiple equations

• In path models with sequential causal paths, the consequences of measurement errors are very hard to generalize simply --- see the union sentiment (Fig. 5.2, p. 169) and SES (Fig. 5.4, p. 173) examples
• If reliabilities are known, the corresponding error variances can be constrained; if unknown, the error variances may be modeled as free parameters, provided they are identifiable
• To keep in mind: we need more than one indicator per latent variable for identifiability and statistical testing --- leading to measurement models with multiple indicators, or CFA