Performance Evaluation of Policies and Programmes
Adam B. Jaffe, Director, Motu Economic and Public Policy Research
Treasury/MBIE Seminar, 27 August 2013
Background
• Evidence-based policy, etc.
• Skepticism?
The Problem
• A policy or a programme is like a new drug. We would like to know if it is effective, and how its effectiveness compares to alternatives.
• With a drug, it is not enough that the patient gets better. With a policy, it is not enough that the policy goal is met.
• Want to measure the treatment effect, i.e. how the state of the policy objectives compares to what it would have been without the policy (formalized in the sketch below).
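To make the "but for" comparison concrete, here is the standard potential-outcomes formalism for the treatment effect. This notation is an addition for clarity, not from the slides; Y_i(1) and Y_i(0) denote unit i's outcome with and without the policy.

```latex
% Average treatment effect (ATE) in potential-outcomes notation.
% Y_i(1): outcome for unit i under the policy; Y_i(0): outcome without it.
\mathrm{ATE} = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \,\right]
% Only one of the two potential outcomes is ever observed for any unit,
% which is why evaluation needs a credible counterfactual ("but for") group.
```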
We’d like to know…
• Magnitude of impacts (“outputs” and “outcomes”)
• Magnitude of impacts relative to resources required (cost-effectiveness)
• Relative effectiveness of different instruments or approaches
• Relative effectiveness in different contexts (conditional cost-effectiveness)
Examples
• Health service delivery modes
• Scholarships
• Tax subsidies
• Regulations
• Grant programs
• …
Analytical Issues
• Outputs and outcomes that are hard to measure
• Long and/or uncertain lags between action and outcomes
• Characterizing the unobserved “but for” world
• Selection bias in programme participation
• Others I will not say much about:
  • Incremental versus average impact
  • General equilibrium effects
Thoughts on metrics
• Quantify where possible, but…
• Non-quantifiable doesn’t mean unimportant
• Multiple metrics
• Tradeoff between comparability and precision
• Almost always a proxy or indicator rather than the “true” variable
• Measurement (random) error
• Behavioral changes in response to evaluation
• Long/uncertain lags → ongoing evaluation
Isolating the Treatment Effect
• Typically, start by comparing performance of the treated group before and after the treatment
• Issues:
  • Placebo effect
  • Regression to the mean
  • Sectoral trends
• Compare change in the treated group to change in a “control group”
“Difference in difference” approach
• “Gold standard” is DID with random assignment (“RA”) to treatment group and control group (a minimal regression sketch follows)
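As a concrete illustration of the DID idea, here is a minimal sketch in Python using statsmodels. The file name and column names (firm panel with funded, period, sales_growth) are hypothetical, not from the slides.

```python
# Minimal difference-in-differences sketch (hypothetical data and names).
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per firm per period, with a 0/1 'funded'
# indicator for the treated group and a 'period' column ("before"/"after").
df = pd.read_csv("firm_panel.csv")
df["post"] = (df["period"] == "after").astype(int)

# The coefficient on funded:post is the DID estimate: the change for the
# treated group minus the change for the control group, netting out both
# the common time trend and the fixed gap between the two groups.
model = smf.ols("sales_growth ~ funded + post + funded:post", data=df).fit()
print(model.params["funded:post"])
```

With random assignment to the funded group this estimate is unbiased for the average treatment effect; without it, the selection issues on the following slides apply.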
[Figure: Hypothetical comparison of mean sales growth for funded and unfunded firms, ignoring selection bias. Unfunded mean = 12.5; funded mean = 20.8; naive “treatment effect” = 8.3.]
Selection Bias
• Frequently, a government program provides assistance to some individuals or firms but not to others
• Makes those not provided assistance a natural control group, but…
• Programme targets are chosen on the basis of need (unemployed; under-achieving students) or expectation of success (scholarships; research grants)
• Creates selection bias in difference-in-difference analysis
Regression Discontinuity (“RD”) Approach to Selection Bias
• Retain information on the ranking used to select individuals or firms for participation in the program
• Use this measure of qualification or need as a regressor in explaining subsequent success of treated and untreated groups
• A dummy variable for program participation then captures the treatment effect after controlling for the selection effect (see the sketch below)
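A minimal sketch of the RD regression just described, again in Python with hypothetical names (ranking is the selection score retained at application; funded is the participation dummy).

```python
# Minimal regression-discontinuity sketch (hypothetical data and names).
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical applicant data: every applicant's selection ranking and
# subsequent sales growth, whether funded or not.
df = pd.read_csv("applicants.csv")

# Controlling (here, linearly) for the ranking that drove selection, the
# coefficient on the funding dummy captures the treatment effect at the
# funding cutoff rather than the raw funded/unfunded difference.
model = smf.ols("sales_growth ~ ranking + funded", data=df).fit()
print(model.params["funded"])
```

In the hypothetical chart that follows, this is how the naive difference of 8.3 shrinks to an RD treatment effect of 3 once the ranking is controlled for.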
[Figure: Hypothetical comparison of mean sales growth for funded and unfunded firms, controlling for selection bias via project ranking at application. Sales growth plotted against project ranking (30 down to 1); treatment effect = regression discontinuity = 3.]
Regression Discontinuity (“RD”) Approach to Selectivity Bias
• Statistically controls for the source of non-random difference between the treated and untreated groups
• Works for positive or negative selection effect
• Requires retention of information about criteria for selection
• Requires ability to measure success of both treated and untreated individuals/firms
• Note: if the selection criteria are not, in fact, correlated with success, then the slope will be zero but the RD measure of the treatment effect is still unbiased
RD versus Random Assignment
• Both approaches measure the average treatment effect for treated entities
• If the treatment effect were uniform for all entities, then RD reproduces the result of random assignment
• More likely, the magnitude of the treatment effect may be correlated with the selection measure (testable as sketched below)
  • Most appropriate targets may get the biggest boost; or
  • Decreasing returns may limit the effect for the most qualified
• Has implications for potential expansion of the program to a previously untreated group
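One way to probe whether the effect varies with the selection measure is to add an interaction term to the RD regression; a sketch under the same hypothetical names as above.

```python
# Sketch: testing whether the treatment effect varies with the selection
# measure (hypothetical names, same applicant data as the RD sketch).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("applicants.csv")

# funded:ranking lets the funding effect differ by rank. A nonzero
# interaction means the RD estimate describes firms near the cutoff,
# not the programme as a whole, which matters for any expansion to
# previously untreated (lower-ranked) applicants.
model = smf.ols("sales_growth ~ ranking + funded + funded:ranking", data=df).fit()
print(model.params[["funded", "funded:ranking"]])
```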
RD versus RA
• RA always produces an unbiased estimate of the average effect, but tells you nothing about the underlying variation in efficacy
• Note that in social settings, neither typically deals with the placebo effect
• Both methods require tracking of the untreated group; not clear which approach makes this easier
Example of RD Approach
• “Reading First” was a billion-dollar program to introduce new pedagogy, new student evaluation measures, and specific teacher training methods to improve reading performance of 1st-3rd graders
• Schools were chosen for the program using a ranking index based on poverty rates and the fraction of students reading below grade level
• Evaluation was carried out over three years in 248 schools, 125 of which were Reading First schools
RD Analysis of Impact of Reading First
[Figure: results chart. Source: Abt Associates, Reading First Final Report, 2008]
Public Research Programmes
• Need to track performance of unsuccessful applicants
  • Condition for eligibility to begin with?
  • System of identifiers combined with external data: the StarMetrics approach
• Outputs and outcomes are hard to measure and subject to measurement response
• Routine/ongoing rather than episodic
Concluding Thoughts
• Combination of faith and hard-to-measure outcomes
• Accept that some questions are not answerable:
  • Relative effectiveness across policies with incommensurable outcomes
  • Incremental versus marginal
  • GE effects
• Perfect should not be the enemy of good
• But a little knowledge is a dangerous thing
• Long lags as an advantage?
Advert
Science and Innovation Policy for New Zealand
Motu Public Policy Seminar
Wednesday 04 September
Spectrum Theater, BP House
Veronica Jacobsen, Discussant