

  1. acreg: Arbitrary Correlation Regression
     Fabrizio Colella (UNIL), Rafael Lalive (UNIL), Seyhun O. Sakalli (King’s College), Mathias Thoenig (UNIL)
     www.acregstata.weebly.com
     (Virtual) Swiss Stata Meeting 2020, Bern, November 2020

  2. Introduction

  3. Motivation I
     Modeling the convoluted correlation structures between units improves inference.
     • Spatial data:
       - Geographical positions of observations
       - Neighborhood structures
     • Network data:
       - Social networks
       - Mobile data
       - Co-working relations
     Colella, Lalive, Sakalli, and Thoenig acreg

  4. Motivation II
     But only a few studies offer a flexible theoretical framework (Bester et al., 2011).
     Commonly used practices:
     • Spatial data
       - Cluster (Cameron et al., 2011)
       - Conley’s spatial clustering (Conley, 1999a)
     • Network data
       - Cluster

  5. Motivation III
     And the Stata literature on the topic is limited.
     • Robust (White, 1980) and two-way clustering corrections (Cameron and Miller, 2015) are included in most programs computing OLS and 2SLS regressions.
     • In the spatial literature there are some programs to account for correlation using coordinates:
       - Conley, 1999b
       - Hsiang, 2010
     • There are no Stata packages available to account for correlation between neighbors or between observations in a network.

  6. Motivation IV
     In a related paper (Colella et al., 2019):
     • Building on White (1980), we develop an Arbitrary Clustering approach to deal with inference under any type of topological and temporal dependence between observational units.
     • We perform extensive Monte Carlo simulations for both spatial and network data structures, comparing different methods.
     • We show that commonly used techniques reject the null hypothesis about 110% more often than they should, while our approach gets close to the true rejection rate.
     • We provide guidelines for conducting inference in complex settings.

  7. This Paper
     We introduce a new Stata package (and a companion paper) implementing the standard error correction approach proposed in Colella et al. (2019):
     ACREG: Arbitrary Correlation Regression
     • Computes adjusted standard errors for:
       - Spatial data (coordinates or contiguity matrix)
       - Network data (adjacency matrix)
       - Multi-way clustering environments (any number of clustering variables)
     • Suits OLS and 2SLS settings
     • Includes temporal correlation for panel data

  8. Correlation with Spatial Data

  9. Correlation in Space
     [Figure: income in 1990 for southern U.S. counties, Messner et al. (1999)]

  10-11. Correlation in Space - Clustering by State
     [Figure: income in 1990 for southern U.S. counties, Messner et al. (1999)]

  12-16. Correlation in Space - Conley 1999
     [Figure: income in 1990 for southern U.S. counties, Messner et al. (1999)]

  17. Correlation with Network Data

  18. Correlation in Network

  19-20. Correlation in Network - One-way clustering

  21-23. Correlation in Network - Network Clusters

  24. Adjacency matrix

           j1  j2  j3  j4  j5  j6  j7  j8  j9 j10 j11
     j1     1   0   1   0   0   1   1   0   0   0   1
     j2     0   1   1   0   1   0   0   1   0   0   1
     j3     1   1   1   0   0   0   0   0   0   1   0
     j4     0   0   0   1   0   0   1   1   0   1   0
     j5     0   1   0   0   1   0   0   0   0   0   1
     j6     1   0   0   0   0   1   1   0   0   0   0
     j7     0   0   0   1   0   1   1   0   0   1   0
     j8     0   1   0   1   0   0   0   1   1   0   0
     j9     1   0   0   0   0   0   0   1   1   0   0
     j10    0   0   1   1   0   0   1   0   0   1   0
     j11    1   1   0   0   1   0   0   0   0   0   1

  25. Conceptual Framework

  26. Theoretical VCV of the OLS estimator
     Linear model: $y = X\beta + \epsilon$
     Standard OLS estimator: $\hat{\beta}_{OLS} = (X'X)^{-1}(X'y)$
     With variance: $VCV(\hat{\beta}_{OLS}) = (X'X)^{-1} X' \Omega X (X'X)^{-1}$
     Where:
     $y$ is the dependent variable
     $X$ is the matrix of regressors (exogenous and endogenous)
     $\Omega$ is the VCV of the errors
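
As a short check on the sandwich formula above, the steps behind it can be written out. This is the standard textbook derivation rather than anything specific to the slides; it writes the OLS estimator as $\hat{\beta}_{OLS}$ and assumes $\Omega = E[\epsilon\epsilon' \mid X]$.

```latex
\begin{aligned}
\hat{\beta}_{OLS} - \beta &= (X'X)^{-1}X'\epsilon
  && \text{(substitute } y = X\beta + \epsilon\text{)} \\
VCV(\hat{\beta}_{OLS}) &= (X'X)^{-1}X'\,E[\epsilon\epsilon' \mid X]\,X(X'X)^{-1}
  = (X'X)^{-1}X'\Omega X(X'X)^{-1} \\
\Omega = \sigma^2 I \;&\Rightarrow\; VCV(\hat{\beta}_{OLS}) = \sigma^2 (X'X)^{-1}
\end{aligned}
```

The last line is the homoskedastic, uncorrelated special case; every correction discussed in these slides amounts to a different assumption about the off-diagonal elements of $\Omega$.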

  27. Estimating the VCV of the OLS estimator
     The proposed estimator for $X'\Omega X$ is:
     $$X'(S \times (uu'))X = \sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{j=1}^{n}\sum_{s=1}^{T} x_{it}\, u_{it}\, u_{js}\, x_{js}\, s_{itjs}$$
     Where: $u \equiv y - X\hat{\beta}_{OLS}$ are the estimated residuals
     • Each $itjs$-th component of $S$, $s_{itjs}$, is a correlation weight in [0,1]
     • The correlation weight should reflect the dependence of the error of observation $it$ on the error of observation $js$
     • The matrix $S$ can be computed from the adjacency matrix
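
The weighted double sum above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the package's implementation: observations are stacked into a single index, the product of `S` with `uu'` is read as element-wise, each term is taken as the outer product of the two regressor vectors, and the function name `sandwich_meat` and the toy data are hypothetical.

```python
def sandwich_meat(X, u, S):
    """Sum of S[a][b] * u[a] * u[b] * (x_a x_b') over all pairs of stacked observations."""
    n, k = len(X), len(X[0])
    meat = [[0.0] * k for _ in range(k)]
    for a in range(n):
        for b in range(n):
            w = S[a][b] * u[a] * u[b]
            if w == 0:
                continue  # pairs with zero weight contribute nothing
            for p in range(k):
                for q in range(k):
                    meat[p][q] += w * X[a][p] * X[b][q]
    return meat


# Toy data (hypothetical): a constant and one regressor, three observations.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
u = [1.0, -2.0, 1.0]  # orthogonal to both columns of X, like OLS residuals

# Identity weights: only each observation's own error matters (White, 1980).
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Observations 0 and 1 are additionally allowed to be correlated.
pair_01 = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]

white = sandwich_meat(X, u, identity)
linked = sandwich_meat(X, u, pair_01)
```

With identity weights the sum collapses to the heteroskedasticity-robust case; switching on a single off-diagonal pair changes the result, which is the sense in which the choice of `S` drives the standard errors.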

  28. Syntax

  29. Syntax - Baseline
     acreg depvar [varlist1] [(varlist2 = varlist_iv)] [if] [in] [fweight pweight]
     • depvar is the dependent variable
     • varlist1 is the list of exogenous variables
     • varlist2 is the list of endogenous variables
     • varlist_iv is the list of exogenous variables used with varlist1 as instruments for varlist2

  30. Syntax - Time Dimension
     acreg depvar varlist1 (varlist2 = varlist_iv), id(idvar) time(timevar) lag(#)
     • idvar is the cross-sectional unit identifier
     • timevar is the time unit variable
     • lag(#) specifies the time lag cutoff for observations with the same idvar
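
One way to read the lag cutoff is as a rule for filling in the weights between observations of the same unit. The sketch below assumes a binary weight (1 within the cutoff, 0 beyond it) and covers only same-unit correlation over time; how the package combines this with spatial or network dependence is not stated in the slides, and the function name `panel_time_weights` is hypothetical.

```python
def panel_time_weights(ids, times, lag):
    """Weight 1 when two observations share a unit id and are at most `lag` periods apart."""
    n = len(ids)
    return [
        [1 if ids[a] == ids[b] and abs(times[a] - times[b]) <= lag else 0
         for b in range(n)]
        for a in range(n)
    ]


# Toy panel (hypothetical): unit 1 observed in periods 1 to 3, unit 2 in period 1.
ids = [1, 1, 1, 2]
times = [1, 2, 3, 1]
S = panel_time_weights(ids, times, lag=1)
```

With a cutoff of one period, unit 1's first and third observations are treated as uncorrelated, while adjacent periods are allowed to be correlated.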

  31. Syntax - Spatial I
     acreg depvar varlist1 (varlist2 = varlist_iv), spatial latitude(latitudevar) longitude(longitudevar) dist(#)
     • spatial specifies the spatial environment
     • latitudevar is the variable containing the latitude of each observation in decimal degrees: range [-90.0, 90.0]
     • longitudevar is the variable containing the longitude of each observation in decimal degrees: range [-180.0, 180.0]
     • dist(#) specifies the distance cutoff, in km, beyond which the correlation between the error terms of two observations is assumed to be zero
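
To make the distance cutoff concrete, the sketch below builds binary weights from coordinates in decimal degrees. It rests on assumptions not stated in the slides: great-circle distance is computed with the haversine formula on a sphere of radius 6371 km, and the weight is 1 within the cutoff and 0 beyond it. The function names and the example coordinates (approximately Bern, Lausanne, and London) are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_weights(coords, cutoff_km):
    """Binary weights: 1 if two observations lie within cutoff_km of each other."""
    n = len(coords)
    return [
        [1 if haversine_km(*coords[i], *coords[j]) <= cutoff_km else 0
         for j in range(n)]
        for i in range(n)
    ]


# (latitude, longitude) pairs, approximate and for illustration only.
coords = [(46.948, 7.447), (46.520, 6.633), (51.507, -0.128)]
S = spatial_weights(coords, cutoff_km=100.0)
```

With a 100 km cutoff the first two locations are allowed to have correlated errors, while the third is treated as independent of both.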

  32. Syntax - Spatial II
     acreg depvar varlist1 (varlist2 = varlist_iv), spatial dist_mat(varlist_distances) dist(#)
     • spatial specifies the spatial environment
     • varlist_distances is the list of N variables containing bilateral spatial distances between observations in any meaningful metric, e.g., physical or travel distance between two locations
     • dist(#) specifies the distance cutoff beyond which the correlation between the error terms of two observations is assumed to be zero, in the same metric as varlist_distances

  33. Syntax - Network I
     acreg depvar varlist1 (varlist2 = varlist_iv), network links_mat(varlist_links) dist(#)
     • network specifies the network environment
     • varlist_links is the list of N binary variables specifying the links between observations, e.g., the adjacency matrix. The links between two units can change over time.
     • dist(#) specifies the distance cutoff (geodesic paths) beyond which the correlation between the error terms of two observations is assumed to be zero. If it is greater than 1, acreg computes the bilateral distance between two nodes.
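
The geodesic distance mentioned above is the number of links on the shortest path between two nodes. A breadth-first search is one standard way to compute it from an adjacency matrix; the sketch below uses that approach and then applies a binary cutoff. It is not taken from the package, and the function names and the four-node example are hypothetical.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Shortest-path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for other, linked in enumerate(adj[node]):
            if linked and other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return dist


def network_weights(adj, cutoff):
    """Binary weights: 1 if two nodes are at most `cutoff` links apart."""
    n = len(adj)
    S = [[0] * n for _ in range(n)]
    for i in range(n):
        for j, d in geodesic_distances(adj, i).items():
            if d <= cutoff:
                S[i][j] = 1
    return S


# A four-node chain 0 - 1 - 2 - 3 (hypothetical example).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
S1 = network_weights(adj, cutoff=1)  # direct links only
S2 = network_weights(adj, cutoff=2)  # also friends of friends
```

Raising the cutoff from 1 to 2 extends the allowed correlation from direct neighbors to nodes two links away, which is why distances have to be computed once the cutoff exceeds 1.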

  34. Syntax - Network II
     acreg depvar varlist1 (varlist2 = varlist_iv), network dist_mat(varlist_distances) dist(#)
     • network specifies the network environment
     • varlist_distances is the list of N variables containing bilateral distances between observations in the network, i.e., the number of links along the shortest path between two nodes
     • dist(#) specifies the distance cutoff (geodesic paths) beyond which the correlation between the error terms of two observations is assumed to be zero. If it is greater than 1, acreg computes the bilateral distance between two nodes.

  35. Syntax - Multiway Clustering
     acreg depvar varlist1 (varlist2 = varlist_iv), cluster(varlist_cluster)
     • varlist_cluster is the list of variables identifying the different clusters. Each variable identifies a specific cluster dimension and its clusters.
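
Multiway clustering can also be expressed as a weight matrix. The sketch below assumes that two observations receive weight 1 whenever they share a cluster in at least one dimension; for two dimensions this is equivalent to the usual two-way formula that adds the two one-way terms and subtracts their intersection. Whether the package builds its weights exactly this way is an assumption, and the function name `cluster_weights` and the toy data are hypothetical.

```python
def cluster_weights(*cluster_vars):
    """Weight 1 if two observations share a cluster in at least one dimension."""
    n = len(cluster_vars[0])
    return [
        [1 if any(var[i] == var[j] for var in cluster_vars) else 0
         for j in range(n)]
        for i in range(n)
    ]


# Toy data (hypothetical): four observations clustered by state and by year.
state = ["A", "A", "B", "B"]
year = [1, 2, 1, 2]
S = cluster_weights(state, year)
```

Here observations 0 and 3 share neither state nor year, so their weight is zero; every other pair shares at least one dimension.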
