
  1. Deep Generalized Method of Moments for Instrumental Variable Analysis
     Andrew Bennett, Nathan Kallus, Tobias Schnabel

  2. Endogeneity
     ◮ g_0(x) = max(x, x/5)
     ◮ Y = g_0(X) − 2ε + η
     ◮ X = Z + 2ε, with Z, ε, η ~ N(0, 1)
     [Figure: observed data with the true g_0 and a neural net fit directly to the observed (X, Y) pairs (legend: true g_0, estimated by neural net, observed); a simulation sketch follows below]
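
A minimal numpy simulation of this data-generating process (an illustrative sketch, not the talk's code) shows why regressing Y on X directly is misleading: the least-squares slope of Y on X comes out near −0.2, while the structural function g_0 has an average slope of roughly 0.6.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Data-generating process from this slide: Z, eps, eta ~ N(0, 1) independent.
Z = rng.standard_normal(n)
eps = rng.standard_normal(n)
eta = rng.standard_normal(n)

def g0(x):
    return np.maximum(x, x / 5.0)        # true structural function

X = Z + 2 * eps                          # X is endogenous: it shares eps with Y
Y = g0(X) - 2 * eps + eta

# A naive least-squares fit of Y on X ignores the confounder eps and is
# therefore biased (slope around -0.2 here, vs. an average structural slope ~0.6).
naive_slope = np.cov(X, Y)[0, 1] / np.var(X)
print(f"naive OLS slope: {naive_slope:.2f}")
```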

  3. IV Model
     ◮ Y = g_0(X) + ε
     ◮ E[ε] = 0, E[ε²] < ∞
     ◮ E[ε | X] ≠ 0
     ◮ Hence, g_0(X) ≠ E[Y | X]
     ◮ Instrument Z satisfies
        ◮ E[ε | Z] = 0
        ◮ P(X | Z) ≠ P(X)
     ◮ If we had additional exogenous context L, we would include it in both X and Z
     ◮ g_0 ∈ G = {g(·; θ) : θ ∈ Θ}
     ◮ θ_0 ∈ Θ is such that g_0(x) = g(x; θ_0)

  4. IV is the Workhorse of Empirical Research
     Examples of instruments from natural experiments (from Angrist & Krueger, 2001):

     Outcome variable         | Endogenous variable                     | Source of instrumental variable(s)                            | Reference
     Labor supply             | Disability insurance replacement rates  | Region and time variation in benefit rules                    | Gruber (2000)
     Labor supply             | Fertility                               | Sibling-sex composition                                       | Angrist and Evans (1998)
     Education, labor supply  | Out-of-wedlock fertility                | Occurrence of twin births                                     | Bronars and Grogger (1994)
     Wages                    | Unemployment insurance tax rate         | State laws                                                    | Anderson and Meyer (2000)
     Earnings                 | Years of schooling                      | Region and time variation in school construction              | Duflo (2001)
     Earnings                 | Years of schooling                      | Proximity to college                                          | Card (1995)
     Earnings                 | Years of schooling                      | Quarter of birth                                              | Angrist and Krueger (1991)
     Earnings                 | Veteran status                          | Cohort dummies                                                | Imbens and van der Klaauw (1995)
     Earnings                 | Veteran status                          | Draft lottery number                                          | Angrist (1990)
     Achievement test scores  | Class size                              | Discontinuities in class size due to maximum class-size rule  | Angrist and Lavy (1999)
     College enrollment       | Financial aid                           | Discontinuities in financial aid formula                      | van der Klaauw (1996)
     Health                   | Heart attack surgery                    | Proximity to cardiac care centers                             | McClellan, McNeil and Newhouse (1994)
     Crime                    | Police                                  | Electoral cycles                                              | Levitt (1997)
     Employment and earnings  | Length of prison sentence               | Randomly assigned federal judges                              | Kling (1999)
     Birth weight             | Maternal smoking                        | State cigarette taxes                                         | Evans and Ringel (1999)

  5. Going further
     ◮ Standard methods like 2SLS and GMM, and more recent variants, are significantly impeded when:
        ◮ X is structured and high-dimensional (e.g., an image),
        ◮ and/or Z is structured and high-dimensional (e.g., an image),
        ◮ and/or g_0 is complex (e.g., a neural network)
     ◮ (As we'll discuss)

  6. DeepGMM
     ◮ We develop a method termed DeepGMM
     ◮ Aims to address IV estimation with such high-dimensional variables / complex relationships
     ◮ Based on a new variational interpretation of optimally weighted GMM (inverse-covariance weighting), which we use to efficiently control very many moment conditions
     ◮ DeepGMM is given by the solution to a smooth zero-sum game, which we solve with iterative smooth-game-playing algorithms (à la GANs)
     ◮ Numerical results will show that DeepGMM matches the performance of the best-tuned methods in standard settings and continues to work in high-dimensional settings where even recent methods break

  7. This talk
     1. Introduction
     2. Background
     3. Methodology
     4. Experiments

  8. Two-stage methods
     ◮ E[ε | Z] = 0 implies E[Y | Z] = E[g_0(X) | Z] = ∫ g_0(x) dP(X = x | Z)
     ◮ If g(x; θ) = θᵀφ(x), this becomes E[Y | Z] = θᵀ E[φ(X) | Z]
     ◮ Leads to 2SLS: regress φ(X) on Z (possibly transformed) by least squares, then regress Y on Ê[φ(X) | Z] (sketched below)
     ◮ Various methods find basis expansions nonparametrically (e.g., Newey and Powell)
     ◮ In lieu of a basis, DeepIV instead suggests learning P(X = x | Z) as an NN-parameterized Gaussian mixture
        ◮ Doesn't work if X is rich
        ◮ Can suffer from the "forbidden regression": unlike least squares, MLE doesn't guarantee orthogonality irrespective of specification
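
For concreteness, here is a minimal 2SLS sketch for the linear-in-features case g(x; θ) = θᵀφ(x); the feature maps (`phi`, `basis_z`, `poly_features`) are illustrative choices, not the talk's.

```python
import numpy as np

def two_stage_least_squares(X, Z, Y, phi, basis_z):
    """2SLS for g(x; theta) = theta^T phi(x).

    Stage 1: regress each column of phi(X) on the instrument basis basis_z(Z).
    Stage 2: regress Y on the stage-1 fitted values, i.e. on E_hat[phi(X) | Z].
    """
    Phi = phi(X)                                          # n x p endogenous features
    B = basis_z(Z)                                        # n x q instrument features
    coef, *_ = np.linalg.lstsq(B, Phi, rcond=None)        # stage 1 projection
    Phi_hat = B @ coef                                    # E_hat[phi(X) | Z]
    theta, *_ = np.linalg.lstsq(Phi_hat, Y, rcond=None)   # stage 2
    return theta

def poly_features(v, degree=3):
    """Simple polynomial basis (an illustrative choice)."""
    v = np.asarray(v).reshape(-1)
    return np.column_stack([v ** k for k in range(degree + 1)])

# Usage, with X, Z, Y as 1-d arrays:
# theta_hat = two_stage_least_squares(X, Z, Y, poly_features, poly_features)
```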

  9. Moment methods
     ◮ E[ε | Z] = 0 implies E[f(Z)(Y − g_0(X))] = 0 for any f
     ◮ For any f_1, …, f_m this yields the moment conditions ψ(f_j; θ_0) = 0, where ψ(f; θ) = E[f(Z)(Y − g(X; θ))]
     ◮ GMM takes ψ_n(f; θ) = Ê_n[f(Z)(Y − g(X; θ))] and sets
        θ̂_GMM ∈ argmin_{θ ∈ Θ} ‖(ψ_n(f_1; θ), …, ψ_n(f_m; θ))‖²
     ◮ Usually ‖·‖ is the Euclidean norm ‖·‖_2; more recently, AGMM uses ‖·‖_∞
     ◮ Significant inefficiencies with many moments: modeling power is wasted making redundant moments small
     ◮ Hansen et al.: (with finitely many moments) the norm
        ‖v‖² = vᵀ C_θ̃^{−1} v,   [C_θ]_{jk} = (1/n) Σ_{i=1}^n f_j(Z_i) f_k(Z_i) (Y_i − g(X_i; θ))²
        gives the minimal asymptotic variance (efficiency) for any θ̃ →_p θ_0
     ◮ E.g., two-step / iterated / continuously-updating GMM; generically, optimally weighted GMM (OWGMM) (objective sketched below)
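
A numpy sketch of this objective for fixed moment functions f_1, …, f_m (the function signatures are illustrative; `theta_tilde=None` gives the plain one-step objective, otherwise the OWGMM norm is used).

```python
import numpy as np

def gmm_objective(theta, g, fs, X, Z, Y, theta_tilde=None):
    """||(psi_n(f_1; theta), ..., psi_n(f_m; theta))||^2 in the (OW)GMM norm.

    psi_n(f; theta) = (1/n) sum_i f(Z_i) (Y_i - g(X_i; theta)).
    With theta_tilde given, the norm is v^T C^{-1} v where
    [C]_{jk} = (1/n) sum_i f_j(Z_i) f_k(Z_i) (Y_i - g(X_i; theta_tilde))^2;
    otherwise it is the squared Euclidean norm (one-step GMM).
    """
    n = len(Y)
    F = np.column_stack([f(Z) for f in fs])      # n x m moment functions of Z
    resid = Y - g(X, theta)                      # n residuals
    psi = F.T @ resid / n                        # m empirical moment values
    if theta_tilde is None:
        return psi @ psi                         # vanilla (identity-weighted) GMM
    r2 = (Y - g(X, theta_tilde)) ** 2            # squared residuals at theta_tilde
    C = (F * r2[:, None]).T @ F / n              # m x m optimal weighting matrix
    return psi @ np.linalg.solve(C, psi)         # OWGMM norm
```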

  10. Failure with Many Moment Conditions
     ◮ When g(x; θ) is a flexible model, many (possibly infinitely many) moment conditions may be needed to identify θ_0
     ◮ But both GMM and OWGMM will fail if we use too many moments

  11. This talk
     1. Introduction
     2. Background
     3. Methodology
     4. Experiments

  12. Variational Reformulation of OWGMM
     ◮ Let V be the vector space of real-valued functions of Z
     ◮ ψ_n(f; θ) is a linear operator on V
     ◮ C_θ(f, h) = (1/n) Σ_{i=1}^n f(Z_i) h(Z_i) (Y_i − g(X_i; θ))² is a bilinear form on V
     ◮ Given any subset F ⊆ V, define
        Ψ_n(θ; F, θ̃) = sup_{f ∈ F} [ ψ_n(f; θ) − (1/4) C_θ̃(f, f) ]

     Theorem. Let F = span(f_1, …, f_m) be a subspace. For the OWGMM norm,
        ‖(ψ_n(f_1; θ), …, ψ_n(f_m; θ))‖² = Ψ_n(θ; F, θ̃).
     Hence θ̂_OWGMM ∈ argmin_{θ ∈ Θ} Ψ_n(θ; F, θ̃). (A derivation sketch follows below.)
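
The theorem follows from maximizing a quadratic over the subspace: writing f = Σ_j a_j f_j (a sketch of the argument, assuming the matrix C_θ̃ with entries C_θ̃(f_j, f_k) is invertible),

```latex
\sup_{f \in \mathcal{F}} \Big[ \psi_n(f;\theta) - \tfrac14\, C_{\tilde\theta}(f,f) \Big]
= \max_{a \in \mathbb{R}^m} \Big[ a^\top \psi_n - \tfrac14\, a^\top C_{\tilde\theta}\, a \Big]
= \psi_n^\top C_{\tilde\theta}^{-1} \psi_n ,
\qquad \text{attained at } a^\star = 2\, C_{\tilde\theta}^{-1} \psi_n ,
```

where ψ_n denotes the vector (ψ_n(f_1; θ), …, ψ_n(f_m; θ)) and C_θ̃ the matrix [C_θ̃(f_j, f_k)]_{jk}; this is exactly the OWGMM norm ‖ψ_n‖² from the previous slide.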

  13. DeepGMM
     ◮ Idea: use this reformulation and replace F with a rich set
     ◮ But not with a high-dimensional subspace (that would just be GMM)
     ◮ Let F = {f(z; τ) : τ ∈ T}, G = {g(x; θ) : θ ∈ Θ} be all networks of a given architecture with varying weights τ, θ
     ◮ (Think of F as the union of the spans of the penultimate-layer functions)
     ◮ DeepGMM is then given by the solution to the smooth zero-sum game (for any data-driven θ̃)
        θ̂_DeepGMM ∈ argmin_{θ ∈ Θ} sup_{τ ∈ T} U_θ̃(θ, τ),
        where U_θ̃(θ, τ) = (1/n) Σ_{i=1}^n f(Z_i; τ)(Y_i − g(X_i; θ)) − (1/(4n)) Σ_{i=1}^n f(Z_i; τ)² (Y_i − g(X_i; θ̃))²
     ◮ (A PyTorch sketch of this game follows below)
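
A minimal PyTorch sketch of this game on the toy endogeneity example from earlier in the deck. The network architectures, learning rates, and the use of plain alternating Adam (instead of the OAdam the talk recommends) are placeholder choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 2000
# Toy data: the endogeneity example from slide 2
Z = torch.randn(n, 1)
eps = torch.randn(n, 1)
X = Z + 2 * eps
Y = (torch.maximum(X, X / 5) - 2 * eps + torch.randn(n, 1)).squeeze(-1)

g = nn.Sequential(nn.Linear(1, 20), nn.ReLU(), nn.Linear(20, 1))   # g(x; theta)
f = nn.Sequential(nn.Linear(1, 20), nn.ReLU(), nn.Linear(20, 1))   # f(z; tau)

def payoff(resid_tilde):
    """U(theta, tau) = mean f(Z)(Y - g(X)) - (1/4) mean f(Z)^2 (Y - g(X; theta_tilde))^2."""
    fz = f(Z).squeeze(-1)
    resid = Y - g(X).squeeze(-1)
    return (fz * resid).mean() - 0.25 * (fz ** 2 * resid_tilde ** 2).mean()

opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(5000):
    # theta_tilde is continuously set to the last theta iterate (see "Choosing theta_tilde")
    resid_tilde = (Y - g(X).squeeze(-1)).detach()
    u = payoff(resid_tilde)
    opt_f.zero_grad()
    (-u).backward()
    opt_f.step()            # tau: ascend U
    u = payoff(resid_tilde)
    opt_g.zero_grad()
    u.backward()
    opt_g.step()            # theta: descend U
```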

  14. Consistency of DeepGMM
     ◮ Assumptions:
        ◮ Identification: θ_0 uniquely solves ψ(f; θ) = 0 for all f ∈ F
        ◮ Complexity: F, G have vanishing Rademacher complexities (alternatively, one can use a combinatorial measure like VC dimension)
        ◮ Absolutely star-shaped: f ∈ F, |λ| ≤ 1 ⇒ λf ∈ F
        ◮ Continuity: g(x; θ), f(z; τ) are continuous in θ, τ for all x, z
        ◮ Boundedness: Y, sup_{θ ∈ Θ} |g(X; θ)|, sup_{τ ∈ T} |f(Z; τ)| are bounded

     Theorem. Let θ̃_n be any data-dependent sequence with a limit in probability. Let θ̂_n, τ̂_n be any approximate equilibrium of our game, i.e.,
        sup_{τ ∈ T} U_θ̃_n(θ̂_n, τ) − o_p(1) ≤ U_θ̃_n(θ̂_n, τ̂_n) ≤ inf_{θ ∈ Θ} U_θ̃_n(θ, τ̂_n) + o_p(1).
     Then θ̂_n →_p θ_0.

  15. Consistency of DeepGMM
     ◮ Specification is much more defensible when we use such a rich F
     ◮ Nonetheless, if we drop specification we instead get
        inf_{θ : ψ(f; θ) = 0 ∀ f ∈ F} ‖θ − θ̂_n‖ →_p 0

  16. Optimization
     ◮ Thanks to the surge of interest in GANs, there are many good algorithms for playing smooth games
     ◮ We use OAdam (optimistic Adam) by Daskalakis et al.
     ◮ Main idea: use updates with negative momentum (see the update rule below)
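
For reference, the optimistic gradient update that OAdam adapts, written here for plain gradient steps on min_θ max_τ U (a sketch; the term subtracting the previous gradient is the "negative momentum"):

```latex
\theta_{t+1} = \theta_t - 2\eta\, \nabla_{\theta} U(\theta_t, \tau_t) + \eta\, \nabla_{\theta} U(\theta_{t-1}, \tau_{t-1}), \qquad
\tau_{t+1}   = \tau_t   + 2\eta\, \nabla_{\tau}  U(\theta_t, \tau_t) - \eta\, \nabla_{\tau}  U(\theta_{t-1}, \tau_{t-1}).
```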

  17. Choosing θ̃
     ◮ Ideally θ̃ ≈ θ_0
     ◮ Can let θ̃ be θ̂_DeepGMM computed using another θ̃
     ◮ Can repeat this
     ◮ To simulate this, at every step of the learning algorithm we update θ̃ to be the last θ iterate

  18. This talk
     1. Introduction
     2. Background
     3. Methodology
     4. Experiments

  19. Overview
     ◮ Low-dimensional scenarios: 2-dim Z, 1-dim X
     ◮ High-dimensional scenarios: Z, X, or both are images
     ◮ Benchmarks:
        ◮ DirectNN: regress Y on X with a NN
        ◮ Vanilla2SLS: all linear
        ◮ Poly2SLS: polynomial 2SLS; degree and ridge penalty selected by cross-validation
        ◮ GMM+NN*: OWGMM with a NN g(x; θ), solved using Adam
           ◮ When Z is low-dimensional, expand it with 10 RBFs around EM clustering centroids; when Z is high-dimensional, use the raw instrument
        ◮ AGMM: github.com/vsyrgkanis/adversarial_gmm
           ◮ One-step GMM with ‖·‖_∞ plus a jitter update to the moments
           ◮ Same moment conditions as above
        ◮ DeepIV: github.com/microsoft/EconML

  20. Low-dimensional scenarios
        Y = g_0(X) + e + δ
        X = 0.5 Z_1 + 0.5 e + γ
        Z ~ Uniform([−3, 3]²),  e ~ N(0, 1),  γ, δ ~ N(0, 0.1)
     ◮ abs: g_0(x) = |x|
     ◮ linear: g_0(x) = x
     ◮ sin: g_0(x) = sin(x)
     ◮ step: g_0(x) = I{x ≥ 0}
     (A data-generation sketch follows below.)
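
A numpy sketch of this design (variable names, the sample size, and the reading of N(0, 0.1) as variance 0.1 are assumptions):

```python
import numpy as np

def make_scenario(g0, n, rng=None):
    """Generate (X, Z, Y) for the low-dimensional design on this slide."""
    if rng is None:
        rng = np.random.default_rng(0)
    Z = rng.uniform(-3, 3, size=(n, 2))            # 2-dim instrument
    e = rng.standard_normal(n)                     # shared confounder
    gamma = rng.normal(0, np.sqrt(0.1), size=n)    # N(0, 0.1), read here as variance 0.1
    delta = rng.normal(0, np.sqrt(0.1), size=n)
    X = 0.5 * Z[:, 0] + 0.5 * e + gamma            # only the first instrument is relevant
    Y = g0(X) + e + delta
    return X, Z, Y

scenarios = {
    "abs":    np.abs,
    "linear": lambda x: x,
    "sin":    np.sin,
    "step":   lambda x: (x >= 0).astype(float),
}
X, Z, Y = make_scenario(scenarios["abs"], n=2000)
```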

  21. [Figure: results on the low-dimensional scenarios (panels: sin, step, abs, linear)]

  22. [Figure: results on the low-dimensional scenarios (panels: sin, step, abs, linear)]

