kernel matching with automatic bandwidth selection


  1. Kernel matching with automatic bandwidth selection Ben Jann University of Bern, ben.jann@soz.unibe.ch 2017 London Stata Users Group meeting London, September 7–8, 2017 Ben Jann (University of Bern) Kernel matching London, 07.09.2017 1

  2. Contents
     1 Background
       What is Matching?
       Multivariate Distance Matching (MDM)
       Propensity Score Matching (PSM)
       Matching Algorithms
       “Why PSM Should Not Be Used for Matching”
     2 The kmatch command
       Features
       Examples
       Some Simulation Results
     3 Conclusions

  3. What is Matching?
     Matching is an approach to “condition on X” between a treatment group and a control group.
     Basic idea:
     1. For each observation in the treatment group, find “statistical twins” in the control group with the same (or at least very similar) X values.
     2. The Y values of these matching observations are then used to compute the counterfactual outcome without treatment for the observation at hand.
     3. An estimate for the average treatment effect can be obtained as the mean of the differences between the observed values and the “imputed” counterfactual values over all observations.

  4. What is Matching?
     Formally:
     $$\widehat{ATT} = \frac{1}{N_{T=1}} \sum_{i \mid T=1} \left( Y_i - \hat{Y}_i^0 \right) \quad \text{with} \quad \hat{Y}_i^0 = \sum_{j \mid T=0} w_{ij} Y_j$$
     $$\widehat{ATC} = \frac{1}{N_{T=0}} \sum_{i \mid T=0} \left( \hat{Y}_i^1 - Y_i \right) \quad \text{with} \quad \hat{Y}_i^1 = \sum_{j \mid T=1} w_{ij} Y_j$$
     $$\widehat{ATE} = \frac{N_{T=1}}{N} \cdot \widehat{ATT} + \frac{N_{T=0}}{N} \cdot \widehat{ATC}$$
     Different matching algorithms use different definitions of $w_{ij}$.
     ATE: average treatment effect; ATT: a.t.e. on the treated; ATC: a.t.e. on the untreated
     $T$: treatment indicator (0/1)
     $Y$: observed outcome; $Y^1$: potential outcome with treatment; $Y^0$: p.o. without treatment
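The three estimators on this slide can be sketched in a few lines of plain Python. This is only an illustration, not code from the talk: the function names, the toy outcomes, and the weight matrices are made up, and each row of weights is assumed to sum to 1.

```python
def att(y_treated, y_control, w):
    """Mean over treated i of Y_i minus the weighted control outcome.

    w[i][j] is the weight of control j for treated i (rows assumed to sum to 1).
    """
    diffs = []
    for i, y_i in enumerate(y_treated):
        y0_hat = sum(w_ij * y_j for w_ij, y_j in zip(w[i], y_control))
        diffs.append(y_i - y0_hat)
    return sum(diffs) / len(diffs)


def atc(y_control, y_treated, w):
    """Mean over controls i of the weighted treated outcome minus Y_i.

    w[i][j] is the weight of treated j for control i (rows assumed to sum to 1).
    """
    diffs = []
    for i, y_i in enumerate(y_control):
        y1_hat = sum(w_ij * y_j for w_ij, y_j in zip(w[i], y_treated))
        diffs.append(y1_hat - y_i)
    return sum(diffs) / len(diffs)


def ate(att_hat, atc_hat, n_treated, n_control):
    """Group-size-weighted average of ATT and ATC."""
    n = n_treated + n_control
    return n_treated / n * att_hat + n_control / n * atc_hat


# Toy data (hypothetical): 2 treated and 3 control observations
y_treated = [10.0, 12.0]
y_control = [7.0, 9.0, 8.0]
w_att = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]    # controls matched to each treated
w_atc = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # treated matched to each control

att_hat = att(y_treated, y_control, w_att)
atc_hat = atc(y_control, y_treated, w_atc)
ate_hat = ate(att_hat, atc_hat, len(y_treated), len(y_control))
```

Everything that distinguishes one matching algorithm from another enters through the weight matrices; the estimator code itself stays the same.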

  5. Exact Matching
     Exact matching:
     $$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$
     with $k_i$ as the number of observations for which $X_i = X_j$ applies.
     The result is equivalent to “perfect stratification” or “subclassification” (see, e.g., Cochran 1968).
     Problem: If $X$ contains several variables, there is a large probability that no exact matches can be found for many observations (the “curse of dimensionality”).
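A minimal sketch of the exact-matching weights, assuming each observation's X is stored as a tuple. The function name and the toy data are hypothetical. The third treated unit finds no exact match, which is the problem the slide describes.

```python
def exact_weights(x_treated, x_control):
    """Return w[i][j] = 1/k_i if control j has the same X as treated i, else 0."""
    w = []
    for x_i in x_treated:
        matches = [1 if x_j == x_i else 0 for x_j in x_control]
        k_i = sum(matches)  # number of exact matches for treated unit i
        # A treated unit without any exact match gets an all-zero row
        w.append([m / k_i if k_i else 0.0 for m in matches])
    return w


# Toy covariates (hypothetical): (sex, age)
x_treated = [("f", 30), ("m", 40), ("f", 55)]
x_control = [("f", 30), ("f", 30), ("m", 40), ("m", 41)]

w = exact_weights(x_treated, x_control)
unmatched = [i for i, row in enumerate(w) if sum(row) == 0]
```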

  6. Multivariate Distance Matching (MDM)
     An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of $X$. The idea then is to use observations that are “close”, but not necessarily equal, as matches.
     A common approach is to use
     $$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)}$$
     as distance metric, where $\Sigma$ is an appropriate scaling matrix.
     ◮ Mahalanobis matching: $\Sigma$ is the covariance matrix of $X$.
     ◮ Euclidean matching: $\Sigma$ is the identity matrix.
     ◮ Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized $X$.
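The distance can be illustrated for two covariates in plain Python. This sketch is restricted to a 2x2 scaling matrix so that the inverse can be written out by hand; the function names and the data points are made up for illustration. It also shows one practical consequence of Mahalanobis scaling: the distance does not change when a covariate is re-expressed in other units.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix of a list of (x0, x1) points."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return [[s00, s01], [s01, s11]]


def distance(x_i, x_j, sigma):
    """sqrt((x_i - x_j)' sigma^{-1} (x_i - x_j)) for a 2x2 scaling matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    d0, d1 = x_i[0] - x_j[0], x_i[1] - x_j[1]
    q = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return math.sqrt(q)


# Hypothetical (age, years of education) data
points = [(30, 12), (33, 16), (45, 12), (52, 18), (60, 14)]
identity = [[1.0, 0.0], [0.0, 1.0]]

euclid = distance(points[0], points[1], identity)                  # Sigma = identity
mahal = distance(points[0], points[1], covariance_2d(points))      # Sigma = covariance

# Same data with education measured in months instead of years
points_m = [(age, edu * 12) for age, edu in points]
mahal_m = distance(points_m[0], points_m[1], covariance_2d(points_m))
```

Here `mahal` and `mahal_m` agree up to floating-point error, whereas the Euclidean distance would change with the unit of measurement.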

  7. Propensity Score Matching (PSM)
     $(Y^0, Y^1) \perp\!\!\!\perp T \mid X$ implies $(Y^0, Y^1) \perp\!\!\!\perp T \mid \pi(X)$, where $\pi(X)$ is the treatment probability conditional on $X$ (the “propensity score”) (Rosenbaum and Rubin 1983).
     This simplifies the matching task as we can match on one-dimensional $\pi(X)$ instead of multi-dimensional $X$.
     Procedure
     ◮ Step 1: Estimate the propensity score, e.g. using a Logit model.
     ◮ Step 2: Apply a matching algorithm using differences in the propensity score, $|\hat{\pi}(X_i) - \hat{\pi}(X_j)|$, instead of multivariate distances.
     PSM is very popular
     ◮ https://scholar.google.ch/scholar?q="propensity+score"+AND+(matching+OR+matched+OR+match)
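The two steps can be sketched without any libraries. This is an illustration only: the logit model is fitted by simple gradient ascent on one covariate, which is a stand-in for whatever estimator a real analysis would use, and the data, step size, and iteration count are arbitrary assumptions.

```python
import math


def fit_logit(x, t, steps=5000, lr=0.1):
    """Fit P(T=1 | x) = 1 / (1 + exp(-(a + b*x))) by gradient ascent."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the mean log-likelihood with respect to a and b
        grad_a = sum(ti - pi for ti, pi in zip(t, p)) / n
        grad_b = sum((ti - pi) * xi for ti, pi, xi in zip(t, p, x)) / n
        a += lr * grad_a
        b += lr * grad_b
    return a, b


# Hypothetical data: one covariate, treatment more likely for larger x
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
t = [0, 0, 1, 0, 0, 1, 0, 1, 1]

# Step 1: estimate the propensity score
a, b = fit_logit(x, t)
pscore = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]

# Step 2: distances |pi_hat(X_i) - pi_hat(X_j)| between treated i and controls j
treated = [i for i, ti in enumerate(t) if ti == 1]
controls = [j for j, tj in enumerate(t) if tj == 0]
ps_dist = [[abs(pscore[i] - pscore[j]) for j in controls] for i in treated]
```

The matrix `ps_dist` then takes the place of the multivariate distances in any of the matching algorithms.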

  8. Matching Algorithms
     Various matching algorithms can be used to find potential matches based on $MD$ or $\hat{\pi}(X)$ and determine the matching weights $w_{ij}$.
     Pair matching (one-to-one matching without replacement)
     ◮ For each observation in the treatment group find the closest observation in the control group. Each control is only used once.
     Nearest-neighbor matching (with replacement)
     ◮ For each observation in the treatment group find the $k$ closest observations in the control group. A single control can be used multiple times. In case of ties, use all ties as matches. $k$ is set by the researcher.
     Caliper matching
     ◮ Like nearest-neighbor matching, but only use controls with a distance smaller than some threshold $c$.
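Nearest-neighbor and caliper matching differ only in which controls receive a non-zero weight. The following sketch computes the weights for a single treated observation from its row of distances. The function name, the equal weighting of the selected controls, and the toy distances are assumptions made for illustration.

```python
def nn_weights(dist_row, k=1, caliper=None):
    """Weights over controls for one treated unit.

    dist_row[j] is the distance to control j. The k closest controls are
    used, all ties at the k-th distance are kept, and with a caliper only
    controls closer than the caliper are eligible.
    """
    eligible = [d for d in dist_row if caliper is None or d < caliper]
    if not eligible:
        return [0.0] * len(dist_row)  # no match within the caliper
    cutoff = sorted(eligible)[min(k, len(eligible)) - 1]
    chosen = [
        1 if d <= cutoff and (caliper is None or d < caliper) else 0
        for d in dist_row
    ]
    n_chosen = sum(chosen)
    return [c / n_chosen for c in chosen]


dist_row = [0.3, 0.1, 0.1, 0.8]  # hypothetical distances to four controls

w_nn1 = nn_weights(dist_row, k=1)                   # two controls tie for closest
w_nn3 = nn_weights(dist_row, k=3)                   # three closest controls
w_cal = nn_weights(dist_row, k=3, caliper=0.2)      # only controls closer than 0.2
w_none = nn_weights(dist_row, k=1, caliper=0.05)    # nothing within the caliper
```

Because matching is with replacement, each treated observation is handled independently of the others. Pair matching would additionally need to track which controls have already been used.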

  9. Matching Algorithms
     Radius matching
     ◮ Use all controls with a distance smaller than some threshold $c$.
     Kernel matching
     ◮ Like radius matching, but give larger weight to controls with smaller distances (using some kernel function such as, e.g., the Epanechnikov kernel).
     Optional: remove remaining imbalance after matching using regression adjustment (a.k.a. “bias correction” in the context of nearest-neighbor matching).
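Radius and kernel matching can be written with one weight function that differs only in the kernel. The sketch below uses the standard Epanechnikov kernel, $K(u) = 0.75\,(1 - u^2)$ for $|u| < 1$, and normalizes the weights to sum to 1. The bandwidth name `h`, the function names, and the distances are illustrative assumptions, not the interface of any particular command.

```python
def epanechnikov(u):
    """Epanechnikov kernel: weight declines with distance, zero beyond |u| = 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1 else 0.0


def uniform(u):
    """Uniform kernel: equal weight within the radius (radius matching)."""
    return 0.5 if abs(u) < 1 else 0.0


def kernel_weights(dist_row, h, kernel=epanechnikov):
    """Normalized weights over controls for one treated unit with bandwidth h."""
    raw = [kernel(d / h) for d in dist_row]
    total = sum(raw)
    if total == 0:
        return [0.0] * len(raw)  # no control within the bandwidth
    return [r / total for r in raw]


dist_row = [0.0, 0.5, 0.9, 1.5]  # hypothetical distances to four controls

w_kernel = kernel_weights(dist_row, h=1.0)                  # closer controls weigh more
w_radius = kernel_weights(dist_row, h=1.0, kernel=uniform)  # equal weights within radius
```

With the uniform kernel the bandwidth plays the role of the threshold $c$, so radius matching appears as a special case of kernel matching.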

  10. “Why PSM Should Not Be Used for Matching”
      The message of a recent paper by Gary King and Richard Nielsen is: Do not use PSM, it is really, really bad.
      ◮ The paper: http://j.mp/1sexgVw
      ◮ Slides: https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
      ◮ Watch it: https://www.youtube.com/watch?v=rBv39pK1iEs
      Their argument goes about as follows:
      ◮ In experimental language, PSM approximates complete randomization.
      ◮ Other methods such as MDM approximate fully blocked randomization.
      ◮ A fully blocked design is more efficient. It leads to less data imbalance and less “model dependence” (dependence of results on modeling decisions by the researcher).
      ◮ Hence, procedures such as MDM dominate PSM.
      ◮ King and Nielsen provide evidence suggesting that PSM performs shockingly badly.

  11. Types of Experiments

      | Balance of covariates | Complete Randomization | Fully Blocked |
      |-----------------------|------------------------|---------------|
      | Observed              | On average             | Exact         |
      | Unobserved            | On average             | On average    |

      Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!
      Goal of Each Matching Method (in Observational Data)
      • PSM: complete randomization
      • Other methods: fully blocked
      • Other matching methods dominate PSM (wait, it gets worse)
      (slides by King and Nielsen)

  12. Best Case: Mahalanobis Distance Matching
      [Figure: scatter plot of treated (T) and control (C) observations. Vertical axis: Age (20 to 80). Horizontal axis: Education (years), 12 to 28. Slides by King and Nielsen, 9/23.]

  13. Best Case: Mahalanobis Distance Matching
      [Figure: the same plot, now showing only T observations and the C observations adjacent to them. Vertical axis: Age (20 to 80). Horizontal axis: Education (years), 12 to 28. Slides by King and Nielsen, 9/23.]
