
A more powerful subvector Anderson and Rubin test in linear instrumental variables regression



  1. A more powerful subvector Anderson and Rubin test in linear instrumental variables regression. Patrik Guggenberger, Pennsylvania State University. Joint work with Frank Kleibergen (University of Amsterdam) and Sophocles Mavroeidis (University of Oxford). Indiana University, September 2018

  2. Overview
• Robust inference on a slope coefficient (or coefficients) in a linear IV regression
• "Robust" means uniform control of the null rejection probability over all "empirically relevant" parameter constellations
• "Weak instruments" are pervasive in applied research (Angrist and Krueger, 1991) and have adverse effects on estimation and inference (Dufour, 1997; Staiger and Stock, 1997)

  3. • Large literature on "robust inference" for the full parameter vector
• Here: consider subvector inference in the linear IV model, allowing for weak instruments
• First assume homoskedasticity, then relax to a general Kronecker-product structure, then allow for arbitrary forms of heteroskedasticity
• Presentation based on two papers, one being "A more powerful subvector Anderson Rubin test in linear instrumental variables regression"

  4. • Focus on the Anderson and Rubin (AR, 1949) subvector test statistic
• "History of critical values":
  - Projection of the AR test (Dufour and Taamouti, 2005)
  - Guggenberger, Kleibergen, Mavroeidis, and Chen (2012, GKMC) provide a power improvement: using $\chi^2_{k-m_W,1-\alpha}$ rather than $\chi^2_{k,1-\alpha}$ as the critical value still controls asymptotic size; the "worst case" occurs under strong identification
• HERE: consider a data-dependent critical value that adapts to the strength of identification

  5. • Show: the new test controls finite-sample/asymptotic size and has uniformly higher power than the method in GKMC
• One additional main contribution: computational ease
• Implication: the test in GKMC is "inadmissible"

  6. Presentation
• Introduction
• Finite-sample case:
  a) $m_W = 1$: motivation, correct size, power analysis (near-optimality result)
  b) $m_W > 1$: correct size, uniform power improvement over GKMC
  c) refinement

  7. • Asymptotic case:
  a) homoskedasticity
  b) general Kronecker-product structure
  c) general case (arbitrary forms of heteroskedasticity)

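  8. Model and Objective (finite-sample case)
$$y = Y\beta + W\gamma + \varepsilon, \qquad Y = Z\Pi_Y + V_Y, \qquad W = Z\Pi_W + V_W,$$
where $y \in \mathbb{R}^n$, $Y \in \mathbb{R}^{n \times m_Y}$ (endogenous or exogenous), $W \in \mathbb{R}^{n \times m_W}$ (endogenous), and $Z \in \mathbb{R}^{n \times k}$ (IVs).
• Reduced form:
$$(y \,\vdots\, Y \,\vdots\, W) = Z(\Pi_Y \,\vdots\, \Pi_W)\begin{pmatrix} \beta & I_{m_Y} & 0 \\ \gamma & 0 & I_{m_W} \end{pmatrix} + \underbrace{(v_y \,\vdots\, V_Y \,\vdots\, V_W)}_{V},$$
where $v_y := \varepsilon + V_Y\beta + V_W\gamma$.
• Objective: test $H_0: \beta = \beta_0$ versus $H_1: \beta \neq \beta_0$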
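As a quick numerical sanity check of the reduced form, the sketch below builds data from the structural equations and verifies that $(y \vdots Y \vdots W)$ equals $Z(\Pi_Y \vdots \Pi_W)$ times the block matrix plus $(v_y \vdots V_Y \vdots V_W)$. The dimensions, parameter values, and random design are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical dimensions and parameter values, used only for this check
rng = np.random.default_rng(0)
n, k, m_Y, m_W = 200, 4, 1, 1
Z = rng.standard_normal((n, k))
Pi_Y = rng.standard_normal((k, m_Y))
Pi_W = rng.standard_normal((k, m_W))
beta = np.array([0.5])
gamma = np.array([-1.0])
eps = rng.standard_normal(n)
V_Y = rng.standard_normal((n, m_Y))
V_W = rng.standard_normal((n, m_W))

# Structural form: first-stage equations, then the outcome equation
Y = Z @ Pi_Y + V_Y
W = Z @ Pi_W + V_W
y = Y @ beta + W @ gamma + eps

# Reduced form: (y : Y : W) = Z (Pi_Y : Pi_W) B + (v_y : V_Y : V_W)
B = np.block([
    [beta.reshape(m_Y, 1), np.eye(m_Y), np.zeros((m_Y, m_W))],
    [gamma.reshape(m_W, 1), np.zeros((m_W, m_Y)), np.eye(m_W)],
])
v_y = eps + V_Y @ beta + V_W @ gamma
lhs = np.column_stack([y, Y, W])
rhs = Z @ np.hstack([Pi_Y, Pi_W]) @ B + np.column_stack([v_y, V_Y, V_W])
print(np.allclose(lhs, rhs))  # True
```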

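  9. Objective (continued): the test's size should be bounded by the nominal size, with "good" power.
Parameter space:
1. The reduced-form errors satisfy $V_i \sim \text{i.i.d. } N(0, \Omega)$, $i = 1, \ldots, n$, for some $\Omega \in \mathbb{R}^{(m+1)\times(m+1)}$ (with $m = m_Y + m_W$) such that the variance matrix of $(Y_{0i}, V_{Wi}')'$ for $Y_{0i} = y_i - Y_i'\beta_0 = W_i'\gamma + \varepsilon_i$ (under $H_0$), namely
$$\Omega(\beta_0) = \begin{pmatrix} 1 & 0 \\ -\beta_0 & 0 \\ 0 & I_{m_W} \end{pmatrix}' \Omega \begin{pmatrix} 1 & 0 \\ -\beta_0 & 0 \\ 0 & I_{m_W} \end{pmatrix},$$
is known and positive definite.
2. $Z \in \mathbb{R}^{n \times k}$ is fixed, and $Z'Z$ is a positive definite $k \times k$ matrix.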
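The matrix $\Omega(\beta_0)$ is a linear transformation of $\Omega$. A minimal sketch, assuming a known $\Omega$ with the reduced-form errors ordered as $(v_y, V_Y', V_W')'$; the function name `omega_beta0` and the numeric $\Omega$ are hypothetical:

```python
import numpy as np

def omega_beta0(Omega, beta0, m_W):
    """Omega(beta0) = A' Omega A with A = [[1, 0], [-beta0, 0], [0, I_mW]]."""
    beta0 = np.atleast_1d(beta0)
    m_Y = beta0.size
    A = np.zeros((1 + m_Y + m_W, 1 + m_W))
    A[0, 0] = 1.0                       # picks v_y
    A[1:1 + m_Y, 0] = -beta0            # subtracts V_Y beta0, giving the error of Y0
    A[1 + m_Y:, 1:] = np.eye(m_W)       # keeps V_W unchanged
    return A.T @ Omega @ A

# Hypothetical Omega for (v_y, V_Y, V_W) with m_Y = m_W = 1
Omega = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.0, 0.5],
                  [0.2, 0.5, 1.0]])
Om0 = omega_beta0(Omega, 0.5, m_W=1)
print(Om0)  # [[0.95, -0.05], [-0.05, 1.0]]
```

Here $\text{Var}(v_y - 0.5 V_Y) = 1 - 2(0.5)(0.3) + 0.25 = 0.95$ and $\text{Cov}(v_y - 0.5V_Y, V_W) = 0.2 - 0.5(0.5) = -0.05$.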

  10. • Note: no restrictions on the reduced-form parameters $\Pi_Y$ and $\Pi_W$, which allows for weak IVs

  11. • Several robust tests are available for full-vector inference, $H_0: \beta = \beta_0, \gamma = \gamma_0$ vs. $H_1$: not $H_0$, including the AR (Anderson and Rubin, 1949), LM, and CLR tests; see Kleibergen (2002) and Moreira (2003, 2009).
• Optimality properties: Andrews, Moreira, and Stock (2006), Andrews, Marmer, and Yu (2018), and Chernozhukov, Hansen, and Jansson (2009)

  12. Subvector procedures
• Projection: take the "inf" of the test statistic over the parameter not under test and use the same critical value; "computationally hard" and "uninformative"
• Bonferroni and related techniques: Staiger and Stock (1997), Chaudhuri and Zivot (2011), McCloskey (2012), Zhu (2015), Andrews (2017), Wang and Tchatoka (2018), ...; often computationally hard, and the power ranking relative to projection is unclear
• Plug-in approach: Kleibergen (2004), Guggenberger and Smith (2005), ...; requires strong identification of the parameters not under test

  13. • GMM models: Andrews, I. and Mikusheva (2016) • Models defined by moment inequalities: Gafarov (2016), Kaido, Molinari, and Stoye (2016), Bugni, Canay, and Shi (2017), ...

  14. The Anderson and Rubin (1949) test
• AR test statistic for the full-vector hypothesis $H_0: \beta = \beta_0, \gamma = \gamma_0$ vs. $H_1$: not $H_0$
• The AR statistic exploits $E[Z_i\varepsilon_i] = 0$
• AR test statistic, with $P_Z = Z(Z'Z)^{-1}Z'$:
$$AR_n(\beta_0, \gamma_0) = \frac{(y - Y\beta_0 - W\gamma_0)'\,P_Z\,(y - Y\beta_0 - W\gamma_0)}{(1, -\beta_0', -\gamma_0')\,\Omega\,(1, -\beta_0', -\gamma_0')'}$$
• Under the null hypothesis, the AR statistic is distributed as $\chi^2_k$; critical value $\chi^2_{k,1-\alpha}$
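A minimal sketch of the full-vector AR statistic, together with a small Monte Carlo check that its null rejection rate with the $\chi^2_{k,1-\alpha}$ critical value is close to $\alpha$. Assumptions of this sketch: Gaussian reduced-form errors with $\Omega = I_3$, $m_Y = m_W = 1$, and fixed $Z$; the design and the function name `ar_full` are hypothetical. $P_Z u$ is computed by solving the normal equations instead of forming the $n \times n$ matrix $P_Z$.

```python
import numpy as np
from scipy.stats import chi2

def ar_full(y, Y, W, Z, Omega, beta0, gamma0):
    """Full-vector AR statistic for H0: beta = beta0, gamma = gamma0 (Omega known)."""
    u = y - Y @ beta0 - W @ gamma0
    # P_Z u via least squares rather than the n x n projection matrix
    Pz_u = Z @ np.linalg.solve(Z.T @ Z, Z.T @ u)
    a = np.concatenate([[1.0], -beta0, -gamma0])
    return float(u @ Pz_u / (a @ Omega @ a))

# Monte Carlo check of the null distribution (illustrative design)
rng = np.random.default_rng(1)
n, k, alpha = 100, 4, 0.05
beta0, gamma0 = np.array([0.5]), np.array([-1.0])
Z = rng.standard_normal((n, k))                        # instruments held fixed
Pi_Y, Pi_W = rng.standard_normal((k, 1)), rng.standard_normal((k, 1))
Omega = np.eye(3)                                      # known reduced-form error variance
cv = chi2.ppf(1 - alpha, k)
reps, rejections = 2000, 0
for _ in range(reps):
    V = rng.standard_normal((n, 3))                    # rows (v_y, V_Y, V_W) ~ N(0, I_3)
    Y = Z @ Pi_Y + V[:, [1]]
    W = Z @ Pi_W + V[:, [2]]
    y = Z @ (Pi_Y @ beta0 + Pi_W @ gamma0) + V[:, 0]   # reduced form for y under H0
    rejections += ar_full(y, Y, W, Z, Omega, beta0, gamma0) > cv
rate = rejections / reps
print(f"null rejection rate: {rate:.3f} (nominal {alpha})")
```

With $Z$ fixed and Gaussian errors, $AR_n(\beta_0, \gamma_0)$ is exactly $\chi^2_k$ under $H_0$, so the printed rate should be near 0.05 up to simulation noise.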

  15. • The subvector AR statistic for testing $H_0$ is given by
$$AR_n(\beta_0) = \min_{\gamma \in \mathbb{R}^{m_W}} \frac{(Y_0 - W\gamma)'\,P_Z\,(Y_0 - W\gamma)}{(1, -\gamma')\,\Omega(\beta_0)\,(1, -\gamma')'},$$
where again $Y_0 = y - Y\beta_0$.
• Alternative representation (using $\kappa_{\min}(A) = \min_{x, \|x\| = 1} x'Ax$): $AR_n(\beta_0) = \hat\kappa_p$, where $\hat\kappa_i$, $i = 1, \ldots, p$, with $p = 1 + m_W$, are the roots of the characteristic polynomial in $\kappa$
$$\left|\kappa I_p - \left(\Omega(\beta_0)^{-1/2}\right)'(Y_0 \,\vdots\, W)'\,P_Z\,(Y_0 \,\vdots\, W)\,\Omega(\beta_0)^{-1/2}\right| = 0,$$
ordered non-increasingly.
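The equivalence between the minimum over $\gamma$ and the smallest root $\hat\kappa_p$ can be checked numerically. This sketch (hypothetical helper names and simulated data) computes the roots as generalized eigenvalues of $(Y_0 \vdots W)'P_Z(Y_0 \vdots W)$ with respect to $\Omega(\beta_0)$ using `scipy.linalg.eigh(a, b)`, which yields the same roots as the characteristic polynomial, and compares $\hat\kappa_p$ with a direct numerical minimization over $\gamma$ via `scipy.optimize.minimize`.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import minimize

def subvector_ar_roots(Y0, W, Z, Omega0):
    """Roots kappa_1 >= ... >= kappa_p; the last one is AR_n(beta0). Omega0 = Omega(beta0)."""
    X = np.column_stack([Y0, W])                           # (Y0 : W), n x p
    XPX = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)      # X' P_Z X
    # Generalized eigenvalues solve |kappa Omega0 - X'P_Z X| = 0, which has the
    # same roots as |kappa I - Omega0^{-1/2}' X'P_Z X Omega0^{-1/2}| = 0
    return eigh(XPX, Omega0, eigvals_only=True)[::-1]

def subvector_ar_by_min(Y0, W, Z, Omega0):
    """Same statistic, computed as the minimum over gamma of the AR ratio."""
    def ratio(g):
        u = Y0 - W @ g
        a = np.concatenate([[1.0], -g])
        return (u @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ u)) / (a @ Omega0 @ a)
    return minimize(ratio, x0=np.zeros(W.shape[1])).fun

# Simulated data with m_W = 1 (illustrative only)
rng = np.random.default_rng(2)
n, k = 100, 4
Z = rng.standard_normal((n, k))
W = Z @ rng.standard_normal((k, 1)) + rng.standard_normal((n, 1))
Y0 = 0.7 * W[:, 0] + rng.standard_normal(n)
Omega0 = np.array([[1.0, 0.3], [0.3, 1.0]])                # hypothetical Omega(beta0)
kappas = subvector_ar_roots(Y0, W, Z, Omega0)
direct = subvector_ar_by_min(Y0, W, Z, Omega0)
print(kappas, direct)   # kappas[-1] and direct agree up to optimizer tolerance
```

The eigenvalue route avoids numerical optimization entirely, which is one source of the computational ease mentioned earlier.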

  16. • When using $\chi^2_{k,1-\alpha}$ critical values, as for projection, the test trivially has correct size; GKMC show that this is also true for $\chi^2_{k-m_W,1-\alpha}$ critical values

  17. • Next, show that the AR statistic is the minimum eigenvalue of a noncentral Wishart matrix
• For the parameter space above, the roots $\hat\kappa_i$ solve
$$\left|\hat\kappa_i I_{1+m_W} - \Xi'\Xi\right| = 0, \qquad i = 1, \ldots, p = 1 + m_W,$$
where $\Xi \sim N(M, I_k \otimes I_p)$ and $M$ is a $k \times p$ matrix.
• Under $H_0$, the noncentrality matrix becomes $M = (0_k, \Theta_W)$, where
$$\Theta_W = (Z'Z)^{1/2}\,\Pi_W\,\Sigma_{V_WV_W.\varepsilon}^{-1/2}, \qquad \Sigma_{V_WV_W.\varepsilon} = \Sigma_{V_WV_W} - \Sigma_{\varepsilon V_W}'\,\sigma_{\varepsilon\varepsilon}^{-1}\,\Sigma_{\varepsilon V_W}$$

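  18. and
$$\begin{pmatrix} \sigma_{\varepsilon\varepsilon} & \Sigma_{\varepsilon V_W} \\ \Sigma_{\varepsilon V_W}' & \Sigma_{V_WV_W} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -\beta_0 & 0 \\ -\gamma & I_{m_W} \end{pmatrix}' \Omega \begin{pmatrix} 1 & 0 \\ -\beta_0 & 0 \\ -\gamma & I_{m_W} \end{pmatrix}$$
• Summarizing: under $H_0$, the $p \times p$ matrix $\Xi'\Xi \sim \mathcal{W}(k, I_p, M'M)$ has a noncentral Wishart distribution with noncentrality matrix
$$M'M = \begin{pmatrix} 0 & 0 \\ 0 & \Theta_W'\Theta_W \end{pmatrix},$$
and $AR_n(\beta_0) = \kappa_{\min}(\Xi'\Xi)$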
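A sketch of this canonical representation: draw $\Xi = M + E$ with $E$ having i.i.d. standard normal entries (so $\Xi \sim N(M, I_k \otimes I_p)$), form $\Xi'\Xi$, and read off $AR_n(\beta_0)$ as its smallest eigenvalue. As a check, the Monte Carlo mean of $\Xi'\Xi$ should be close to $kI_p + M'M$, the mean of a $\mathcal{W}(k, I_p, M'M)$ matrix. The value of $\Theta_W$ and the helper name `draw_xi_xi` are hypothetical.

```python
import numpy as np

def draw_xi_xi(Theta_W, reps, rng):
    """reps draws of Xi'Xi, where Xi = M + E, M = (0_k : Theta_W), E has iid N(0, 1) entries."""
    k, m_W = Theta_W.shape
    M = np.hstack([np.zeros((k, 1)), Theta_W])
    Xi = M + rng.standard_normal((reps, k, 1 + m_W))   # vec(Xi) ~ N(vec(M), I_k (x) I_p)
    return np.swapaxes(Xi, 1, 2) @ Xi, M

rng = np.random.default_rng(3)
Theta_W = np.array([[2.0], [1.0], [0.0], [0.0], [0.0]])   # hypothetical, k = 5, m_W = 1
S, M = draw_xi_xi(Theta_W, reps=100_000, rng=rng)
ar = np.linalg.eigvalsh(S)[:, 0]    # AR_n(beta0) = kappa_min(Xi'Xi); eigvalsh sorts ascending
k = Theta_W.shape[0]
print(S.mean(axis=0))               # Monte Carlo mean of Xi'Xi
print(k * np.eye(2) + M.T @ M)      # theoretical mean k I_p + M'M
```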

  19. • The distribution of the eigenvalues of a noncentral Wishart matrix depends only on the eigenvalues of the noncentrality matrix $M'M$.
• Hence, the distribution of $\hat\kappa_i$ depends only on the eigenvalues of $\Theta_W'\Theta_W$, say $\kappa_i$, $i = 1, \ldots, m_W$, with $\kappa = (\kappa_1, \ldots, \kappa_{m_W})'$.
• When $m_W = 1$, $\kappa = \kappa_1 = \Theta_W'\Theta_W$ is a scalar.
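The invariance claim can be illustrated by simulation: two choices of $\Theta_W$ with the same $\kappa_1 = \Theta_W'\Theta_W = 10$ but different directions in $\mathbb{R}^k$ should give the same distribution of $AR_n(\beta_0)$ up to Monte Carlo error, while a larger $\kappa_1$ shifts it. The helper name, sample size, and parameter values are illustrative assumptions.

```python
import numpy as np

def ar_draws(Theta_W, reps, rng):
    """Draws of AR_n(beta0) = kappa_min(Xi'Xi), Xi ~ N((0_k : Theta_W), I_k (x) I_p)."""
    k, m_W = Theta_W.shape
    M = np.hstack([np.zeros((k, 1)), Theta_W])
    Xi = M + rng.standard_normal((reps, k, 1 + m_W))
    return np.linalg.eigvalsh(np.swapaxes(Xi, 1, 2) @ Xi)[:, 0]   # smallest eigenvalue

rng = np.random.default_rng(4)
reps = 200_000
theta_a = np.array([[np.sqrt(10.0)], [0.0], [0.0]])           # kappa_1 = 10
theta_b = np.array([[np.sqrt(5.0)], [np.sqrt(5.0)], [0.0]])   # kappa_1 = 10, other direction
theta_c = np.array([[np.sqrt(50.0)], [0.0], [0.0]])           # kappa_1 = 50
a, b, c = (ar_draws(t, reps, rng) for t in (theta_a, theta_b, theta_c))
print(a.mean(), b.mean(), c.mean())                 # a and b agree; c is larger
print(np.quantile(a, 0.95), np.quantile(b, 0.95))
```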

  20. Figure 1: The cdf of the subset AR statistic with $k = 3$ instruments, for different values of $\kappa_1 = 5, 10, 15, 100$
Theorem: Suppose $m_W = 1$. Then, under the null hypothesis $H_0: \beta = \beta_0$, the distribution function of the subvector AR statistic, $AR_n(\beta_0)$, is monotonically decreasing in the parameter $\kappa_1$.
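The theorem implies that the null rejection probability of the GKMC test, which uses $\chi^2_{k-1,1-\alpha}$ when $m_W = 1$, increases with $\kappa_1$ and approaches $\alpha$ only as $\kappa_1$ grows. A Monte Carlo sketch using the Wishart representation with $k = 3$ and the $\kappa_1$ values of Figure 1; the use of common random numbers across $\kappa_1$ and the replication count are arbitrary choices of this sketch.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
k, reps, alpha = 3, 100_000, 0.05
noise = rng.standard_normal((reps, k, 2))    # common random numbers across kappa_1 values
cv_gkmc = chi2.ppf(1 - alpha, k - 1)          # chi^2_{k - m_W, 1 - alpha} with m_W = 1
kappa_grid = [5.0, 10.0, 15.0, 100.0]
rates = []
for kappa1 in kappa_grid:
    M = np.zeros((k, 2))
    M[0, 1] = np.sqrt(kappa1)                 # M = (0_k : Theta_W) with Theta_W'Theta_W = kappa1
    Xi = M + noise
    ar = np.linalg.eigvalsh(np.swapaxes(Xi, 1, 2) @ Xi)[:, 0]
    rates.append(float(np.mean(ar > cv_gkmc)))   # equals 1 - cdf at the GKMC critical value
for kappa1, r in zip(kappa_grid, rates):
    print(f"kappa1 = {kappa1:6.1f}: null rejection rate {r:.4f}")
```

Rates below $\alpha$ for small $\kappa_1$ are the slack that a data-dependent critical value can exploit.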

  21. New critical value for the subvector Anderson and Rubin test: $m_W = 1$
• Relevance: if we knew $\kappa_1$, we could implement the subvector AR test with a smaller critical value than $\chi^2_{k-m_W,1-\alpha}$, which is the critical value for the case when $\kappa_1$ is "large".
• Muirhead (1978): under the null, when $\kappa_1$ is "large", the larger root $\hat\kappa_1$ (which measures the strength of identification) is a sufficient statistic for $\kappa_1$.
• More precisely, the conditional density of $AR_n(\beta_0) = \hat\kappa_2$ given $\hat\kappa_1$ can be approximated by
$$f_{\hat\kappa_2 \mid \hat\kappa_1}(x) \sim f_{\chi^2_{k-1}}(x)\,(\hat\kappa_1 - x)^{1/2}\,g(\hat\kappa_1),$$

  22. where $f_{\chi^2_{k-1}}$ is the density of a $\chi^2_{k-1}$ and $g$ is a function that does not depend on $\kappa_1$.
• Analytical formula for $g$
• The new critical value for the subvector AR test at significance level $\alpha$ is the $1-\alpha$ quantile of (the approximation of) the conditional distribution of $AR_n(\beta_0)$ given $\hat\kappa_1$
• Denote the critical value by $c_{1-\alpha}(\hat\kappa_1, k - m_W)$; it depends only on $\alpha$, $k - m_W$, and $\hat\kappa_1$
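A minimal sketch of computing $c_{1-\alpha}(\hat\kappa_1, k-1)$ by numerical integration. Assumption of this sketch: since $g(\hat\kappa_1)$ does not depend on $x$, it is treated as the normalizing constant that makes the approximate density integrate to one on $[0, \hat\kappa_1]$ (the support implied by $\hat\kappa_2 \le \hat\kappa_1$); the authors' analytical formula for $g$ is not reproduced. The name `cond_cv` and the truncation of the far $\chi^2$ tail are illustrative choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import chi2

def cond_cv(kappa1_hat, df, alpha=0.05):
    """1 - alpha quantile of the approximate conditional law of kappa2_hat given kappa1_hat.

    Approximate density: proportional to f_{chi2_df}(x) * sqrt(kappa1_hat - x) on
    [0, kappa1_hat]; g(kappa1_hat) is treated as the normalizing constant (assumption).
    """
    # The chi^2 density beyond this point is numerically negligible
    upper = min(kappa1_hat, chi2.isf(1e-14, df))
    dens = lambda x: chi2.pdf(x, df) * np.sqrt(kappa1_hat - x)
    total = quad(dens, 0.0, upper)[0]

    def excess(c):
        # Approximate conditional cdf at c minus the target level 1 - alpha
        if c <= 0.0:
            return -(1.0 - alpha)
        return quad(dens, 0.0, c)[0] / total - (1.0 - alpha)

    return brentq(excess, 0.0, upper)

k = 3
for kappa1_hat in (5.0, 10.0, 20.0, 100.0):
    print(kappa1_hat, round(cond_cv(kappa1_hat, k - 1), 3))
print("chi2 quantile:", round(chi2.ppf(0.95, k - 1), 3))
```

Because the weight $(\hat\kappa_1 - x)^{1/2}$ is decreasing in $x$, the resulting quantile lies below the $\chi^2_{k-1}$ quantile and rises toward it as $\hat\kappa_1$ grows.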

  23. • Conditional quantiles can be computed by numerical integration
• Conditional critical values can be tabulated, so implementing the new test is trivial and fast
• They are increasing in $\hat\kappa_1$ and converge to the quantiles of $\chi^2_{k-1}$ as $\hat\kappa_1 \to \infty$
• We find, by simulations over a fine grid of values of $\kappa_1$, that the new test $1\left(AR_n(\beta_0) > c_{1-\alpha}(\hat\kappa_1, k - m_W)\right)$ controls size
• It improves on the GKMC procedure in terms of power
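A sketch of the tabulate-then-interpolate implementation, followed by a simulation of null rejection rates under the Wishart representation with $k = 3$. The grid, the linear interpolation with `np.interp`, and the replication count are hypothetical choices, and $g$ is again treated as a normalizing constant. Because the tabulated critical values lie below $\chi^2_{k-1,1-\alpha}$, this version of the new test rejects whenever the GKMC test does.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import chi2

def cond_cv(kappa1_hat, df, alpha=0.05):
    """Approximate c_{1-alpha}(kappa1_hat, df); g is treated as a normalizing constant."""
    upper = min(kappa1_hat, chi2.isf(1e-14, df))     # chi^2 tail beyond this is negligible
    dens = lambda x: chi2.pdf(x, df) * np.sqrt(kappa1_hat - x)
    total = quad(dens, 0.0, upper)[0]
    def excess(c):
        if c <= 0.0:
            return -(1.0 - alpha)
        return quad(dens, 0.0, c)[0] / total - (1.0 - alpha)
    return brentq(excess, 0.0, upper)

k, alpha = 3, 0.05
# Tabulate once on a geometric grid of kappa1_hat values (hypothetical grid)
grid = np.geomspace(1e-2, 1e3, 40)
table = np.array([cond_cv(kh, k - 1, alpha) for kh in grid])
cv_gkmc = chi2.ppf(1 - alpha, k - 1)

def new_cv(kappa1_hat):
    # Linear interpolation; np.interp holds the end values constant outside the grid
    return np.interp(kappa1_hat, grid, table)

rng = np.random.default_rng(6)
reps = 50_000
results = {}
for kappa1 in (1.0, 10.0, 100.0):
    M = np.zeros((k, 2))
    M[0, 1] = np.sqrt(kappa1)                        # M = (0_k : Theta_W) under H0
    Xi = M + rng.standard_normal((reps, k, 2))
    roots = np.linalg.eigvalsh(np.swapaxes(Xi, 1, 2) @ Xi)   # ascending order
    kappa2_hat, kappa1_hat = roots[:, 0], roots[:, 1]
    rej_new = float(np.mean(kappa2_hat > new_cv(kappa1_hat)))
    rej_gkmc = float(np.mean(kappa2_hat > cv_gkmc))
    results[kappa1] = (rej_new, rej_gkmc)
    print(f"kappa1 = {kappa1:6.1f}: new test {rej_new:.4f}, GKMC {rej_gkmc:.4f}")
```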

  24. • Theorem: Suppose $m_W = 1$. The new conditional subvector Anderson Rubin test has correct size under the assumptions above.
• The proof is partly based on simulations; verified for, e.g., $\alpha \in \{1\%, 5\%, 10\%\}$ and $k - m_W \in \{1, \ldots, 20\}$.
• Summary for $m_W = 1$: the conditional test rejects when $\hat\kappa_2 > c_{1-\alpha}(\hat\kappa_1, k - 1)$, where $(\hat\kappa_1, \hat\kappa_2)$ are the eigenvalues of the $2 \times 2$ matrix $\Xi'\Xi \sim \mathcal{W}(k, I_p, M'M)$. Under the null, $M'M$ has rank at most 1, and the test has size $\alpha$.
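Putting the pieces together at the data level, a sketch of the conditional test for $m_W = 1$: compute $(\hat\kappa_1, \hat\kappa_2)$ from $(Y_0, W, Z, \Omega(\beta_0))$ and compare $\hat\kappa_2$ with $c_{1-\alpha}(\hat\kappa_1, k-1)$. The function names, the simulated null design, and the normalizing-constant treatment of $g$ are assumptions of this sketch, not the authors' code.

```python
import numpy as np
from scipy.integrate import quad
from scipy.linalg import eigh
from scipy.optimize import brentq
from scipy.stats import chi2

def cond_cv(kappa1_hat, df, alpha=0.05):
    """Approximate c_{1-alpha}(kappa1_hat, df); g treated as a normalizing constant."""
    upper = min(kappa1_hat, chi2.isf(1e-14, df))
    dens = lambda x: chi2.pdf(x, df) * np.sqrt(kappa1_hat - x)
    total = quad(dens, 0.0, upper)[0]
    def excess(c):
        if c <= 0.0:
            return -(1.0 - alpha)
        return quad(dens, 0.0, c)[0] / total - (1.0 - alpha)
    return brentq(excess, 0.0, upper)

def conditional_subvector_ar(Y0, W, Z, Omega0, alpha=0.05):
    """Hypothetical helper: Y0 = y - Y beta0 (n,), W (n, 1), Z (n, k), Omega0 = Omega(beta0)."""
    k = Z.shape[1]
    X = np.column_stack([Y0, W])                                  # (Y0 : W)
    XPX = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)             # X' P_Z X
    kappa2_hat, kappa1_hat = eigh(XPX, Omega0, eigvals_only=True)  # roots, ascending
    cv = cond_cv(kappa1_hat, k - 1, alpha)
    return kappa1_hat, kappa2_hat, cv, bool(kappa2_hat > cv)

# Simulated data under H0 (illustrative design): Y0 = W gamma + eps
rng = np.random.default_rng(7)
n, k, gamma = 200, 4, 0.5
Omega0 = np.array([[1.0, 0.4], [0.4, 1.0]])    # variance of reduced-form errors of (Y0, W)
Z = rng.standard_normal((n, k))
pi_W = 0.2 * rng.standard_normal(k)            # fairly weak first stage
V = rng.standard_normal((n, 2)) @ np.linalg.cholesky(Omega0).T
W = (Z @ pi_W + V[:, 1]).reshape(n, 1)
Y0 = gamma * (Z @ pi_W) + V[:, 0]
kappa1_hat, kappa2_hat, cv, reject = conditional_subvector_ar(Y0, W, Z, Omega0)
print(f"kappa1_hat = {kappa1_hat:.3f}, AR = {kappa2_hat:.3f}, cv = {cv:.3f}, reject = {reject}")
```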
