Economics for Data Science Chiara Binelli Academic year 2019-2020 Email: chiara.binelli@unimib.it
To Explain or to Predict? (Shmueli 2010) Causal and predictive modelling differ along 4 main dimensions: 1. Causation (x causes y) vs. association (x is associated with y). 2. Theory (model) vs. data (data-driven approach to establish how x is related to y). 3. Retrospective (test an existing set of hypotheses) vs. prospective (predict new observations). 4. Bias (focus on minimizing bias to get the correct impact of x on y) vs. bias-variance trade-off (balance bias and variance to get the best predictions).
Economics and Machine Learning • Economics’ approach: 1. Theory that identifies a specific relationship of interest (ex. impact of going to college on wages). 2. Goal: estimate the causal impact of the variable of interest (ex. going to college) on the outcome (ex. wages). Effort to estimate unbiased effects with carefully constructed standard errors. • Supervised machine learning’s approach: 1. Predict how a given outcome varies with a large number of potential predictors. 2. May or may not use prior theory to establish which predictors are relevant. Data-driven model selection to identify meaningful predictive variables. Less attention to statistical significance and more attention to prediction accuracy.
Concrete Example: Einav and Levin (2014) • Goal: assess if taking online classes improves earnings. • Economics Approach: – Either design an experiment that induces some workers to take online classes for reasons unrelated to their earning potential. • e.g. a change in the price of online classes. – Or, absent the experiment, choose an econometric technique to estimate the unbiased impact of online classes on earnings. – Focus on: • Obtaining a point estimate of the impact of online classes on earnings that is precisely estimated. • Discussing whether there are omitted variables that might confound a causal interpretation (e.g. workers’ ambition driving a decision to take classes and work harder at the same time).
Concrete Example: Einav and Levin (2014) • Data Science Approach: – Identify what variables predict earnings, given a vast set of predictors in the data, and the potential for building a model that predicts earnings well, both in sample and out of sample. – Focus on: • A model that predicts earnings both for individuals that have and for individuals that have not taken online classes. – NOTE: causality and statistical significance are more difficult to assess, since the exact source of variation identifying the impact of a given x on y is hard to pin down.
From Economics to Data Science: 1. Provide a Theory • Example: online advertising auctions. • Important question for Google or Facebook: – Which ads to show online and how much to charge for the ads? 1. Machine learning methods to build a predictive model to assess the likelihood that a user will click on an ad. By exploiting the enormous amount of data available online, this predictive model tells us which ads to show. 2. Economic theory to build auction models to set prices. • Several e-commerce companies have built teams of economists (often academic economists with PhDs), statisticians and computer scientists.
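The two ingredients above can be combined in a stylized sketch: a model-predicted click-through rate for each ad feeds a generalized second-price auction that ranks ads by bid × CTR and charges each winner the minimum price needed to keep its slot. All bids, CTRs and ad names below are illustrative, not the actual system of any company.

```python
# Illustrative sketch: predicted click-through rates (step 1) feed a
# generalized second-price auction (step 2). All values are made up.

def gsp_auction(bids, ctrs):
    """Rank ads by bid * predicted CTR; each winner pays, per click,
    the minimum bid needed to keep its position (generalized second price)."""
    ads = sorted(bids, key=lambda a: bids[a] * ctrs[a], reverse=True)
    prices = {}
    for i, ad in enumerate(ads[:-1]):
        nxt = ads[i + 1]
        # price per click: next ad's score divided by this ad's CTR
        prices[ad] = bids[nxt] * ctrs[nxt] / ctrs[ad]
    prices[ads[-1]] = 0.0  # lowest-ranked ad pays the reserve (here: 0)
    return ads, prices

# Hypothetical bids (per click) and model-predicted CTRs
bids = {"ad_a": 2.0, "ad_b": 1.5, "ad_c": 1.0}
ctrs = {"ad_a": 0.05, "ad_b": 0.10, "ad_c": 0.02}

ranking, prices = gsp_auction(bids, ctrs)
```

Note that ad_b outranks ad_a despite a lower bid, because the predictive model gives it a much higher CTR: this is where the machine-learning and economic-theory pieces meet.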
From Economics to Data Science: 2. Focus on Causality (Pearl 2018) • Human-level AI cannot emerge solely from model-blind learning machines; it requires the symbiotic collaboration of data and models. • Data science is only as much of a science as it facilitates the interpretation of data - a two-body problem, connecting data to reality. • Data alone are hardly a science, regardless of how big they get and how skillfully they are manipulated. – We need a theory to interpret the data.
From Data Science to Economics • Test robustness to misspecifications. • Extract economically meaningful information from new sources of data (e.g. images and satellite data), innovative research designs, nowcasting. – When reliable data are missing, such as in measuring poverty: Blumenstock (2016), Jean et al. (2016), Blumenstock, Cadamuro and On (2015). – Bernheim et al. (2013) use a machine learning algorithm trained on a subset of respondents to a survey to predict actual choices from survey responses, thus providing a tool to infer actual from reported behavior. • Provide better predictions. – Improve standard estimation techniques by providing good predictive models (e.g. first stage of IV estimation, propensity score for matching estimator). – Answer pure prediction policy problems (Kleinberg et al. 2015). • New tools for causal inference. – Construct counterfactuals for causal inference (Varian 2016).
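A minimal sketch of one item on this list: a predictive model as the propensity-score first stage of a treatment-effect estimator. The data are synthetic and the logistic regression is hand-rolled by gradient descent purely for self-containedness; in practice any well-calibrated classifier could supply the scores.

```python
import numpy as np

# Sketch: a predictive model (logistic regression fit by gradient descent)
# as the propensity-score first stage of a treatment-effect estimator.
# Data are synthetic; in practice x would be observed covariates.

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
# Treatment d is more likely for high x[:, 0], which also raises y: confounding
p_true = 1 / (1 + np.exp(-0.8 * x[:, 0]))
d = rng.binomial(1, p_true)
y = 1.0 * d + x[:, 0] + rng.normal(size=n)   # true treatment effect = 1.0

# First stage: fit P(d = 1 | x) by gradient ascent on the log-likelihood
X = np.column_stack([np.ones(n), x])
beta = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (d - p) / n

p_hat = np.clip(1 / (1 + np.exp(-X @ beta)), 0.01, 0.99)

# Second stage: inverse-propensity-weighted average treatment effect
ate = np.mean(d * y / p_hat) - np.mean((1 - d) * y / (1 - p_hat))
# The naive difference in means is biased upward by the confounder
naive = y[d == 1].mean() - y[d == 0].mean()
```

The naive comparison overstates the effect because treated units have systematically higher x[:, 0]; weighting by the predicted propensity score removes most of that bias.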
From Data Science to Economics: 1. Test Robustness to Misspecifications (Athey and Imbens 2015) • Researchers interested in the effect of a given variable (x) on an outcome (y) typically report the estimated impact of x on y and a measure of uncertainty of the estimate such as the standard error (se). • Problem: the measure of uncertainty, and hence the statistical significance of the estimate, depends on the model’s specification: – Different specifications of the model (variables included, functional forms, etc.) produce different estimates of the effect of x on y and associated se. • Athey and Imbens (2015) propose a simple machine learning approach to assess the sensitivity of the point estimates to model specification.
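The underlying idea can be sketched crudely (this is an illustration of specification sensitivity, not Athey and Imbens's exact procedure): re-estimate the coefficient on x under every subset of candidate controls and report the spread of point estimates. All data below are synthetic.

```python
import itertools
import numpy as np

# Sketch: how much does the estimated coefficient on x move across
# specifications? Here w[:, 0] is a true confounder, so specifications
# that omit it are biased and the estimates spread out.

rng = np.random.default_rng(1)
n = 500
w = rng.normal(size=(n, 3))              # candidate control variables
x = 0.5 * w[:, 0] + rng.normal(size=n)   # x correlated with w[:, 0]
y = 2.0 * x + 1.0 * w[:, 0] + rng.normal(size=n)

estimates = []
for k in range(4):
    for subset in itertools.combinations(range(3), k):
        X = np.column_stack([np.ones(n), x] + [w[:, j] for j in subset])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta[1])        # coefficient on x

spread = max(estimates) - min(estimates)
```

With 3 candidate controls there are 8 specifications; the spread of the 8 estimates is a simple summary of how fragile the headline coefficient is.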
Prediction Policy Problems (Kleinberg et al. 2015) • Many questions where causal inference is not necessary. – Example 1: is the chance of rain high enough to require an umbrella? The benefits of an umbrella depend on rain. – Example 2: are the benefits of a hip surgery high enough to justify the surgery? The benefits of a hip surgery depend on whether the patient lives long enough after the surgery (Kleinberg et al. 2015). – Example 3: the decision to detain or release someone arrested before trial? The decision depends on a prediction of the arrestee’s probability of committing a crime. • Therefore, PURE PREDICTION PROBLEMS.
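What makes these pure prediction problems is that the optimal decision needs only a predicted probability and a cost-benefit threshold, no causal estimate. A tiny sketch for the umbrella example, with illustrative payoffs:

```python
# Pure prediction problem: the decision rule needs only a predicted
# probability of rain and a cost-benefit threshold. Payoffs are made up.

def take_umbrella(p_rain, cost_carry=1.0, loss_wet=10.0):
    """Carry the umbrella iff the expected loss from rain exceeds
    the cost of carrying it."""
    return p_rain * loss_wet > cost_carry

# With these payoffs the decision threshold is p_rain > 0.1;
# any improvement in the rain forecast directly improves the decision.
```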
Prediction Policy Problems (Kleinberg et al. 2015) • OLS focuses on unbiasedness (it is the best linear unbiased estimator) and provides poor predictions. • Problem: given a dataset D of n points (y, x), pick a function $\hat{f}$ that predicts the y value of a new data point. The goal is to minimize a loss function that we take to be $(y - \hat{f}(x))^2$. • OLS finds the $\hat{f}_{OLS}$ that minimizes in-sample error: $\min_{\hat{f}} \sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2$. • PROBLEM: ensuring zero bias in sample creates problems out of sample.
Prediction Policy Problems (Kleinberg et al. 2015) • MSE at the new point x: $MSE(x) = E[(\hat{f}(x) - y)^2] = E[(\hat{f}(x) - E[\hat{f}(x)])^2] + (E[\hat{f}(x)] - y)^2$, where the first term is the Variance and the second is the squared Bias. • By ensuring 0 bias, OLS allows no trade-off.
Prediction Policy Problems (Kleinberg et al. 2015) • Machine learning maximizes predictive performance by exploiting this bias-variance trade-off. • Instead of minimizing only in-sample error, ML minimizes: $\min_{\hat{f}} \sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2 + \lambda R(\hat{f})$. • $R(\hat{f})$ is a regularizer that penalizes functions that create variance; $\lambda$ is the price at which we trade off variance against bias. • OLS: $\lambda = 0$; with $R(\hat{f}) = \sum_j |\beta_j|^d$, LASSO sets $d = 1$ and RIDGE sets $d = 2$.
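The trade-off can be seen in a minimal numpy sketch. Ridge (d = 2) is used here because it has a closed form, $\hat{\beta} = (X'X + \lambda I)^{-1} X'y$, with $\lambda = 0$ recovering OLS; the data, dimensions, and $\lambda$ are all illustrative. With as many predictors as observations, OLS fits the training sample exactly but predicts poorly out of sample, while ridge accepts in-sample bias to cut variance.

```python
import numpy as np

# Closed-form ridge estimator: beta = (X'X + lam*I)^{-1} X'y.
# lam = 0 recovers OLS. With p = n predictors OLS interpolates the
# training data (zero in-sample error, high variance); ridge trades
# a little in-sample bias for much lower out-of-sample error.

def fit(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def mse(b, X, y):
    return np.mean((y - X @ b) ** 2)

rng = np.random.default_rng(0)
n, p = 40, 40                         # as many predictors as observations
beta_true = np.zeros(p)
beta_true[:5] = 1.0                   # only 5 predictors actually matter
X_tr, X_te = rng.normal(size=(n, p)), rng.normal(size=(200, p))
y_tr = X_tr @ beta_true + rng.normal(size=n)
y_te = X_te @ beta_true + rng.normal(size=200)

b_ols, b_ridge = fit(X_tr, y_tr, 0.0), fit(X_tr, y_tr, 5.0)

ols_in, ridge_in = mse(b_ols, X_tr, y_tr), mse(b_ridge, X_tr, y_tr)
ols_out, ridge_out = mse(b_ols, X_te, y_te), mse(b_ridge, X_te, y_te)
```

OLS wins in sample by construction; the point of the slide is that the ranking reverses out of sample.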
Prediction and Causal Questions (Athey 2017) • Pure prediction problems do not answer the more complex question of estimating heterogeneous effects. • Hip surgery example: the benefits of a hip surgery depend on whether the patient lives long enough after the surgery (Kleinberg et al. 2015). • We know the effect of the treatment is negative for the patients that will die, so for them it is easy to decide against surgery. • However, an important open question remains: which patients should be given priority to receive surgery among the ones that are likely to survive more than one year? This is a causal question that requires estimating counterfactual scenarios of the effects of alternative policies of assigning patients to hip surgeries.
From Data Science to Economics: 2. New Tools for Causal Inference • When estimating the causal effect of a given treatment, we want to compare the observed outcome with the hypothetical outcome in the absence of the treatment (counterfactual). • Machine learning methods can be used to build the best predictive model for the counterfactual without the (sometimes excessive) monetary costs of running a randomized controlled experiment. Ex.: compare actual visits to a website following an advertisement campaign (observed outcome) to the predicted visits absent the advertisement (counterfactual outcome) using time series data on past visits, seasonal effects, data on Google queries (pages 22-24 Varian 2014).
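The website-visits example can be sketched on simulated data: fit a trend-plus-seasonality model on the pre-campaign period only, project it into the campaign period as the counterfactual, and read the campaign effect off the gap to actual visits. All numbers below (weekly data, a 50-visit lift) are invented for illustration.

```python
import numpy as np

# Sketch of counterfactual prediction with time series: the model is
# trained only on pre-campaign weeks, then its forecast for the campaign
# weeks serves as the "no campaign" counterfactual. Data are simulated.

rng = np.random.default_rng(2)
weeks = np.arange(104)                        # two years of weekly data
season = 10 * np.sin(2 * np.pi * weeks / 52)  # yearly seasonality
visits = 500 + 2 * weeks + season + rng.normal(0, 3, 104)
visits[78:] += 50                             # true campaign lift, last 26 weeks

pre, post = weeks < 78, weeks >= 78
X = np.column_stack([np.ones(104), weeks,
                     np.sin(2 * np.pi * weeks / 52),
                     np.cos(2 * np.pi * weeks / 52)])
beta, *_ = np.linalg.lstsq(X[pre], visits[pre], rcond=None)

counterfactual = X[post] @ beta               # predicted visits, no campaign
lift = np.mean(visits[post] - counterfactual)
```

The estimated lift is the average gap between observed and counterfactual visits; here it recovers the simulated 50-visit effect without any experiment, which is exactly the appeal (and the model-dependence risk) of this approach.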