Econ 2148, Fall 2019: Statistical Decision Theory

  1. Statistical Decision Theory Econ 2148, fall 2019 Statistical decision theory Maximilian Kasy Department of Economics, Harvard University 1 / 53

  2. Statistical Decision Theory Takeaways for this part of class 1. A general framework to think about what makes a “good” estimator, test, etc. 2. How the foundations of statistics relate to those of microeconomic theory. 3. In what sense the set of Bayesian estimators contains most “reasonable” estimators. 2 / 53

  3. Statistical Decision Theory Examples of decision problems ◮ Decide whether or not the hypothesis of no racial discrimination in job interviews is true ◮ Provide a forecast of the unemployment rate next month ◮ Provide an estimate of the returns to schooling ◮ Pick a portfolio of assets to invest in ◮ Decide whether to reduce class sizes for poor students ◮ Recommend a level for the top income tax rate 3 / 53

  4. Statistical Decision Theory Agenda ◮ Basic definitions ◮ Optimality criteria ◮ Relationships between optimality criteria ◮ Analogies to microeconomics ◮ Two justifications of the Bayesian approach 4 / 53

  5. Statistical Decision Theory Basic definitions Components of a general statistical decision problem ◮ Observed data X ◮ A statistical decision a ◮ A state of the world θ ◮ A loss function L ( a , θ ) (the negative of utility) ◮ A statistical model f ( X | θ ) ◮ A decision function a = δ ( X ) 5 / 53

  6. Statistical Decision Theory Basic definitions How they relate ◮ underlying state of the world θ ⇒ distribution of the observation X . ◮ decision maker: observes X ⇒ picks a decision a ◮ her goal: pick a decision that minimizes loss L ( a , θ ) ( θ unknown state of the world) ◮ X is useful ⇔ reveals some information about θ ⇔ f ( X | θ ) does depend on θ . ◮ problem of statistical decision theory: find decision functions δ which “make loss small.” 6 / 53

  7. Statistical Decision Theory Basic definitions Graphical illustration [Figure: flow chart. The state of the world θ generates the observed data X ∼ f(x, θ); the decision function a = δ(X) maps the observed data X to a decision a; the decision a and the state of the world θ together determine the loss L(a, θ).] 7 / 53

  8. Statistical Decision Theory Basic definitions Examples ◮ investing in a portfolio of assets: ◮ X : past asset prices ◮ a : amount of each asset to hold ◮ θ : joint distribution of past and future asset prices ◮ L : minus expected utility of future income ◮ decide whether or not to reduce class size: ◮ X : data from project STAR experiment ◮ a : class size ◮ θ : distribution of student outcomes for different class sizes ◮ L : average of suitably scaled student outcomes, net of cost 8 / 53

  9. Statistical Decision Theory Basic definitions Practice problem For each of the examples on slide 2, what are ◮ the data X , ◮ the possible actions a , ◮ the relevant states of the world θ , and ◮ reasonable choices of loss function L ? 9 / 53

  10. Statistical Decision Theory Basic definitions Loss functions in estimation ◮ goal: find an a which is close to some function µ of θ ◮ for instance: µ(θ) = E[X] ◮ loss is larger if the difference between our estimate and the true value is larger Some possible loss functions: 1. squared error loss, L(a, θ) = (a − µ(θ))² 2. absolute error loss, L(a, θ) = |a − µ(θ)| 10 / 53

  11. Statistical Decision Theory Basic definitions Loss functions in testing ◮ goal: decide whether H₀ : θ ∈ Θ₀ is true ◮ decision a ∈ {0, 1} (accept / reject) Possible loss function: L(a, θ) = 1 if a = 1 and θ ∈ Θ₀; c if a = 0 and θ ∉ Θ₀; 0 else. As a table of losses (decision a by truth): a = 0: 0 if θ ∈ Θ₀, c if θ ∉ Θ₀; a = 1: 1 if θ ∈ Θ₀, 0 if θ ∉ Θ₀. 11 / 53

  12. Statistical Decision Theory Basic definitions Risk function R(δ, θ) = E_θ[L(δ(X), θ)]. ◮ expected loss of a decision function δ ◮ R is a function of the true state of the world θ ◮ crucial intermediate object in evaluating a decision function ◮ small R ⇔ good δ ◮ δ might be good for some θ, bad for other θ ◮ Decision theory deals with this trade-off. 12 / 53
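Since the risk function is an expectation, it can be approximated by simulation. The sketch below does this for the running example X ∼ N(µ, 1) with squared error loss; the seed and draw count are my own illustrative choices.

```python
import numpy as np

# Monte Carlo approximation of the risk R(delta, theta) = E_theta[L(delta(X), theta)]
# for X ~ N(mu, 1) and squared error loss.
rng = np.random.default_rng(0)

def risk(delta, mu, n_sim=100_000):
    """Average the loss over simulated draws of X at the true parameter mu."""
    X = rng.normal(loc=mu, scale=1.0, size=n_sim)
    return np.mean((delta(X) - mu) ** 2)

# For delta(X) = X, the risk equals Var(X) = 1 at every mu.
print(risk(lambda x: x, mu=2.0))
```

Note that `risk` must be evaluated separately at each candidate θ; the simulation gives one point of the risk function at a time, which is exactly why aggregating across θ (the topic of the optimality-criteria slides) is needed.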

  13. Statistical Decision Theory Basic definitions Example: estimation of mean ◮ observe X ∼ N(µ, 1) ◮ want to estimate µ ◮ L(a, θ) = (a − µ(θ))² ◮ δ(X) = α + β · X Practice problem (Estimation of means) Find the risk function for this decision problem. 13 / 53

  14. Statistical Decision Theory Basic definitions Variance / bias trade-off Solution: R(δ, µ) = E[(δ(X) − µ)²] = Var(δ(X)) + Bias(δ(X))² = β² Var(X) + (α + β E[X] − E[X])² = β² + (α + (β − 1)µ)². ◮ equalities 1 and 2: always true for squared error loss ◮ choosing β (and α) involves a trade-off of bias and variance, ◮ this trade-off depends on µ. 14 / 53
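The closed form β² + (α + (β − 1)µ)² can be checked by simulation. The sketch below compares simulated and analytic risk at a couple of parameter values; the specific (α, β, µ) combinations and the seed are mine.

```python
import numpy as np

# Simulation check of R(delta, mu) = beta^2 + (alpha + (beta - 1) mu)^2
# for delta(X) = alpha + beta * X, X ~ N(mu, 1), squared error loss.
rng = np.random.default_rng(1)

def simulated_risk(alpha, beta, mu, n_sim=200_000):
    X = rng.normal(mu, 1.0, size=n_sim)
    return np.mean((alpha + beta * X - mu) ** 2)

def closed_form_risk(alpha, beta, mu):
    return beta**2 + (alpha + (beta - 1.0) * mu) ** 2

for alpha, beta, mu in [(0.0, 1.0, 0.5), (0.1, 0.8, 2.0)]:
    print(simulated_risk(alpha, beta, mu), closed_form_risk(alpha, beta, mu))
```

The second combination (β < 1) illustrates the trade-off: shrinking toward zero lowers the variance term β² but introduces a bias term that grows with |µ|.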

  15. Statistical Decision Theory Optimality criteria Optimality criteria ◮ Ranking provided by the risk function is multidimensional: ◮ a ranking of performance between decision functions for every θ ◮ To get a global comparison of their performance, have to aggregate this ranking into a global ranking. ◮ preference relationship on space of risk functions ⇒ preference relationship on space of decision functions 15 / 53

  16. Statistical Decision Theory Optimality criteria Illustrations for intuition ◮ Suppose θ can only take two values, ◮ ⇒ risk functions are points in a 2D graph, ◮ each axis corresponds to R(δ, θ) for θ = θ₀, θ₁. [Figure: plane with R(·, θ₀) on the horizontal axis and R(·, θ₁) on the vertical axis.] 16 / 53

  17. Statistical Decision Theory Optimality criteria Three approaches to get a global ranking 1. partial ordering : a decision function is better relative to another if it is better for every θ 2. complete ordering, weighted average : a decision function is better relative to another if a weighted average of risk across θ is lower weights ∼ prior distribution 3. complete ordering, worst case : a decision function is better relative to another if it is better under its worst-case scenario. 17 / 53

  18. Statistical Decision Theory Optimality criteria Approach 1: Admissibility Dominance: δ is said to dominate another decision function δ′ if R(δ, θ) ≤ R(δ′, θ) for all θ, and R(δ, θ) < R(δ′, θ) for at least one θ. Admissibility: decision functions which are not dominated are called admissible; all other decision functions are inadmissible. 18 / 53

  19. Statistical Decision Theory Optimality criteria [Figure: the feasible set of risk functions in the (R(·, θ₀), R(·, θ₁)) plane, with the admissible decision functions on its boundary.] 19 / 53

  20. Statistical Decision Theory Optimality criteria ◮ admissibility ∼ “Pareto frontier” ◮ Dominance only generates a partial ordering of decision functions. ◮ in general: many different admissible decision functions. 20 / 53

  21. Statistical Decision Theory Optimality criteria Practice problem ◮ you observe X_i ∼ iid N(µ, 1), i = 1, ..., n for n > 1 ◮ your goal is to estimate µ, with squared error loss ◮ consider the estimators 1. δ(X) = X₁ 2. δ(X) = (1/n) ∑_i X_i ◮ can you show that one of them is inadmissible? 21 / 53
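A quick simulation builds intuition for this practice problem (it is a numeric sketch, not the formal admissibility argument; the values of n, µ, and the seed are mine). Both estimators are unbiased, so their risk under squared error loss is just their variance, which does not depend on µ.

```python
import numpy as np

# Compare the risk of delta(X) = X_1 with delta(X) = (1/n) sum_i X_i
# for X_i ~ iid N(mu, 1) and squared error loss.
rng = np.random.default_rng(2)
n, mu, n_sim = 5, 1.5, 100_000
X = rng.normal(mu, 1.0, size=(n_sim, n))

risk_first_obs = np.mean((X[:, 0] - mu) ** 2)           # Var(X_1) = 1
risk_sample_mean = np.mean((X.mean(axis=1) - mu) ** 2)  # Var(mean) = 1/n

print(risk_first_obs, risk_sample_mean)
```

Since the same ranking holds at every µ, the simulation suggests the sample mean dominates the first observation, making δ(X) = X₁ inadmissible.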

  22. Statistical Decision Theory Optimality criteria Approach 2: Bayes optimality ◮ natural approach for economists: ◮ trade off risk across different θ ◮ by assigning weights π(θ) to each θ Integrated risk: R(δ, π) = ∫ R(δ, θ) π(θ) dθ. 22 / 53

  23. Statistical Decision Theory Optimality criteria Bayes decision function: minimizes integrated risk, δ* = argmin_δ R(δ, π). ◮ Integrated risk ∼ linear indifference planes in the space of risk functions ◮ prior ∼ normal vector for the indifference planes 23 / 53
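With a discrete prior, the integrated risk is just a weighted sum, which makes the comparison of decision functions concrete. The sketch below reuses the closed-form risk from the mean-estimation example; the two-point prior and the candidate rules are illustrative choices of mine.

```python
# Integrated risk R(delta, pi) = sum_theta R(delta, theta) pi(theta) for a
# two-point prior, using R = beta^2 + (alpha + (beta - 1) mu)^2 from the
# linear-estimator example delta(X) = alpha + beta * X, X ~ N(mu, 1).
def risk(alpha, beta, mu):
    return beta**2 + (alpha + (beta - 1.0) * mu) ** 2

prior = {0.0: 0.5, 2.0: 0.5}  # pi(theta) on two support points

candidates = {"delta(X) = X": (0.0, 1.0), "delta(X) = 0.8 X": (0.0, 0.8)}
for name, (alpha, beta) in candidates.items():
    integrated = sum(w * risk(alpha, beta, mu) for mu, w in prior.items())
    print(name, integrated)
```

Here the shrinkage rule achieves lower integrated risk (0.72 versus 1.0), even though neither rule dominates the other pointwise: the prior weights settle the trade-off across states.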

  24. Statistical Decision Theory Optimality criteria [Figure: linear indifference lines in the (R(·, θ₀), R(·, θ₁)) plane, with the prior π(θ) as their normal vector; the Bayes decision function δ* attains the lowest indifference line, R(δ*, ·), within the feasible set.] 24 / 53

  25. Statistical Decision Theory Optimality criteria Decision weights as prior probabilities ◮ suppose 0 < ∫ π(θ) dθ < ∞ ◮ then wlog ∫ π(θ) dθ = 1 (normalize) ◮ if additionally π ≥ 0 ◮ then π is called a prior distribution 25 / 53

  26. Statistical Decision Theory Optimality criteria Posterior ◮ suppose π is a prior distribution ◮ posterior distribution: π(θ|X) = f(X|θ) π(θ) / m(X) ◮ normalizing constant = prior likelihood of X: m(X) = ∫ f(X|θ) π(θ) dθ 26 / 53
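The posterior formula can be implemented directly by discretizing θ on a grid. The N(θ, 1) likelihood matches the slides; the N(0, 1) prior and the grid are illustrative choices of mine (a standard normal prior, i.e. τ = 1 in the notation of the next slide's practice problem).

```python
import numpy as np

# Grid-based evaluation of pi(theta|X) = f(X|theta) pi(theta) / m(X).
theta = np.linspace(-6.0, 6.0, 4001)
d = theta[1] - theta[0]

prior = np.exp(-theta**2 / 2.0)
prior /= prior.sum() * d                   # normalize the prior on the grid

X = 1.0
likelihood = np.exp(-(X - theta) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)  # f(X|theta)

m_X = (likelihood * prior).sum() * d       # m(X), the marginal likelihood of X
posterior = likelihood * prior / m_X       # Bayes' rule

print(posterior.sum() * d)                 # integrates to ~ 1 by construction
print((theta * posterior).sum() * d)       # posterior mean; here X/2 = 0.5
```

This numeric approach needs no conjugacy, but for this normal-normal pair it reproduces the known result that the posterior mean shrinks X toward the prior mean.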

  27. Statistical Decision Theory Optimality criteria Practice problem ◮ you observe X ∼ N(θ, 1) ◮ consider the prior θ ∼ N(0, τ²) ◮ calculate 1. m(X) 2. π(θ|X) 27 / 53

  28. Statistical Decision Theory Optimality criteria Posterior expected loss R(δ, π|X) := ∫ L(δ(X), θ) π(θ|X) dθ Proposition Any Bayes decision function δ* can be obtained by minimizing R(δ, π|X) through choice of δ(X) for every X. Practice problem Show that this is true. Hint: show first that R(δ, π) = ∫ R(δ(X), π|X) m(X) dX. 28 / 53

  29. Statistical Decision Theory Optimality criteria Bayes estimator with quadratic loss ◮ assume quadratic loss, L(a, θ) = (a − µ(θ))² ◮ posterior expected loss: R(δ, π|X) = E_{θ|X}[L(δ(X), θ) | X] = E_{θ|X}[(δ(X) − µ(θ))² | X] = Var(µ(θ) | X) + (δ(X) − E[µ(θ) | X])² ◮ Bayes estimator minimizes posterior expected loss ⇒ δ*(X) = E[µ(θ) | X]. 29 / 53
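The conclusion that the posterior mean minimizes posterior expected squared loss can be verified numerically. In the sketch below the discrete posterior weights are made up for illustration; the grid search stands in for the analytic minimization.

```python
import numpy as np

# Check that argmin_a E[(a - theta)^2 | X] equals the posterior mean,
# for a discrete posterior over theta.
theta = np.array([-1.0, 0.0, 1.0, 2.0])
post = np.array([0.1, 0.2, 0.4, 0.3])          # posterior weights, sum to one

def posterior_expected_loss(a):
    return float(np.sum(post * (a - theta) ** 2))

a_grid = np.linspace(-2.0, 3.0, 5001)
losses = [posterior_expected_loss(a) for a in a_grid]
a_star = a_grid[int(np.argmin(losses))]

posterior_mean = float(np.sum(post * theta))    # = 0.9
print(a_star, posterior_mean)
```

The grid minimizer coincides with the posterior mean, matching the first-order condition behind δ*(X) = E[µ(θ) | X]; under absolute error loss the same exercise would instead recover the posterior median.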
