Gov 2000: 5. Estimation and Statistical Inference
Matthew Blackwell
Fall 2016
1. Point Estimation
2. Properties of Estimators
3. Interval Estimation
4. Where Do Estimators Come From?*
5. Wrap up
Housekeeping

• This Thursday, 10/6: HW 3 due, HW 4 goes out.
• Next Thursday, 10/13: HW 4 due, HW 5 goes out.
• Thursday, 10/20: HW 5 due, Midterm available.
• Midterm:
▶ Check-out exam: you have 8 hours to complete it once you check it out.
▶ Answers must be typeset, as usual.
▶ You should have more than enough time.
▶ We'll post practice midterms in advance.
• Evaluations: we'll be fielding an anonymous survey about the course this week.
Where are we? Where are we going?

• Last few weeks: probability, learning how to think about r.v.s.
• Now: how to estimate features of underlying distributions with real data.
• Build on last week: if the sample mean will be "close" to μ, we can use it as a best guess for μ.
1/ Point Estimation
Motivating example

• Gerber, Green, and Larimer (APSR, 2008): a field experiment on whether social pressure mailers increase voter turnout.
Motivating example

load("../data/gerber_green_larimer.RData")
## turn turnout variable into a numeric
social$voted <- 1 * (social$voted == "Yes")
neigh.mean <- mean(social$voted[social$treatment == "Neighbors"])
neigh.mean

## [1] 0.378

contr.mean <- mean(social$voted[social$treatment == "Civic Duty"])
contr.mean

## [1] 0.315

neigh.mean - contr.mean

## [1] 0.0634

• Is this difference "real"? Is it big?
Why study estimators?

• Goal 1: Inference
▶ What is our best guess about some quantity of interest?
▶ What are a set of plausible values of the quantity of interest?
• Goal 2: Compare estimators
▶ In an experiment, use the simple difference in sample means (X̄ − Ȳ)?
▶ Or the post-stratification estimator, where we estimate the difference within two subsets of the data (male and female, for instance) and then take the weighted average of the two, with a the share of women: (X̄_w − Ȳ_w)a + (X̄_m − Ȳ_m)(1 − a). A simulation sketch follows below.
▶ Which (if either) is better? How would we know?
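• A minimal simulation sketch of these two estimators; the data-generating process, the true effect of 0.2, and all variable names here are hypothetical assumptions for illustration:

set.seed(1234)
n <- 1000
female <- rbinom(n, size = 1, prob = 0.5)   ## group indicator
treat <- rbinom(n, size = 1, prob = 0.5)    ## randomized treatment
y <- 0.2 * treat + 0.1 * female + rnorm(n)  ## outcome; true effect is 0.2

## simple difference in sample means
dim.est <- mean(y[treat == 1]) - mean(y[treat == 0])

## post-stratification: within-group diffs, weighted by the share of women
a <- mean(female)
diff.w <- mean(y[treat == 1 & female == 1]) - mean(y[treat == 0 & female == 1])
diff.m <- mean(y[treat == 1 & female == 0]) - mean(y[treat == 0 & female == 0])
ps.est <- diff.w * a + diff.m * (1 - a)
c(dim = dim.est, ps = ps.est)  ## both should be close to 0.2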
Samples from the population

• Our focus: X_1, …, X_n are i.i.d. draws from f(x)
▶ e.g.: X_i = 1 if citizen i votes, X_i = 0 otherwise.
▶ i.i.d. can be justified through random sampling from a population.
▶ f(x) is often called the population distribution.
• Statistical inference or learning is using data to infer f(x).
Point estimation

• Point estimation: providing a single "best guess" as to the value of some fixed, unknown quantity of interest, θ.
▶ θ is a feature of the population distribution, f(x).
▶ Also called: estimands, parameters.
• Examples of quantities of interest:
▶ μ = 𝔼[X_i]: the mean (turnout rate in the population).
▶ σ² = 𝕍[X_i]: the variance.
▶ μ_x − μ_y = 𝔼[X] − 𝔼[Y]: the difference in mean turnout between two groups.
▶ r(x) = 𝔼[Y | X = x]: the conditional expectation function (regression).
• These are the things we want to learn about.
Estimators

Estimator
An estimator, θ̂_n, of some parameter θ, is a function of the sample: θ̂_n = h(X_1, …, X_n).

• θ̂_n is a r.v. because it is a function of r.v.s.
▶ ⇝ θ̂_n has a distribution.
▶ {θ̂_1, θ̂_2, …} is a sequence of r.v.s, so we can think about convergence in probability/distribution.
• An estimate is one particular realization of the estimator/r.v.
Examples of Estimators

• For the population expectation, μ, we have many different possible estimators (a simulation comparing them follows below):
▶ θ̂_n = X̄_n: the sample mean
▶ θ̂_n = X_1: just use the first observation
▶ θ̂_n = max(X_1, …, X_n)
▶ θ̂_n = 3: always guess 3
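• To make the comparison concrete, here is a small sketch that simulates the sampling distribution of each candidate estimator; the Bernoulli(0.4) population and n = 100 are assumptions carried over from the simulations below:

set.seed(1234)
nsims <- 10000
ests <- matrix(NA, nrow = nsims, ncol = 4)
colnames(ests) <- c("sample.mean", "first.obs", "max", "always.3")
for (i in 1:nsims) {
  x <- rbinom(n = 100, size = 1, prob = 0.4)
  ests[i, ] <- c(mean(x), x[1], max(x), 3)
}
colMeans(ests)       ## only the first two are centered near mu = 0.4
apply(ests, 2, var)  ## ...but the sample mean has far lower variance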
Understanding check

• Question: Why is the following statement wrong: "My estimate was the sample mean and my estimator was 0.38"?
The three distributions

• Population distribution: the data-generating process
▶ Bernoulli in the case of the social pressure/voter turnout example
• Empirical distribution: X_1, …, X_n
▶ the series of 1s and 0s in the sample
• Sampling distribution: the distribution of the estimator over repeated samples from the population distribution
▶ the 0.38 sample mean in the "Neighbors" group is one draw from this distribution
Sampling distribution, in pictures

[Diagram: the population distribution f(x) generates repeated samples {X_1^(1), …, X_n^(1)}, {X_1^(2), …, X_n^(2)}, and so on; applying the estimator θ̂_n to each sample yields draws θ̂^(1), θ̂^(2), …, which together form the sampling distribution.]
Sampling distribution

my.samp <- rbinom(n = 10, size = 1, prob = 0.4)
## now we take the mean of one sample, which is one
## draw from the **sampling distribution**
mean(my.samp)

## [1] 0.2

## let's take another draw from the population dist
my.samp.2 <- rbinom(n = 10, size = 1, prob = 0.4)
## let's feed this sample to the sample mean
## estimator to get another estimate, which is
## another draw from the sampling distribution
mean(my.samp.2)

## [1] 0.4
Sampling distribution by simulation

• Let's generate 10,000 draws from the sampling distribution of the sample mean here when n = 100.

nsims <- 10000
mean.holder <- rep(NA, times = nsims)
first.holder <- rep(NA, times = nsims)
for (i in 1:nsims) {
  my.samp <- rbinom(n = 100, size = 1, prob = 0.4)
  mean.holder[i] <- mean(my.samp) ## sample mean
  first.holder[i] <- my.samp[1]   ## first obs
}
Sampling distribution versus population distribution

[Figure: overlaid histograms on [0, 1] (frequency scale up to about 5000) comparing the Bernoulli population distribution, with mass at 0 and 1, to the sampling distribution of the sample mean, concentrated near 0.4.]
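• A sketch of how a figure like this could be drawn from the simulation on the previous slide; the colors and bin widths here are arbitrary choices, and the first.holder draws stand in for the population distribution since each is a single Bernoulli(0.4) draw:

hist(first.holder, breaks = seq(-0.05, 1.05, by = 0.1), col = "grey80",
     main = "", xlab = "", ylab = "Frequency")
hist(mean.holder, breaks = seq(-0.05, 1.05, by = 0.025), col = "indianred",
     add = TRUE)
legend("topright", fill = c("grey80", "indianred"),
       legend = c("Population Distribution", "Sampling Distribution"))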
Question

• True or false: the sampling distribution refers to the distribution of θ.
2/ Properties of Estimators
Properties of estimators

• We only get one draw from the sampling distribution of θ̂_n.
• We want to use estimators whose distribution is "close" to the true value.
• There are two ways we evaluate estimators:
▶ Finite sample: the properties of its sampling distribution for a fixed sample size n.
▶ Large sample: the properties of the sampling distribution as we let n → ∞.
Running example

• Two independent random samples (treatment/control):
▶ X_1, …, X_{n_x} are i.i.d. with mean μ_x and variance σ²_x
▶ Y_1, …, Y_{n_y} are i.i.d. with mean μ_y and variance σ²_y
▶ Overall sample size n = n_x + n_y
• Parameter is the population difference in means, which is the treatment effect of the social pressure mailer: μ_x − μ_y
• Estimator is the difference in sample means: D̂_n = X̄_{n_x} − Ȳ_{n_y}
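• A minimal simulated draw of this estimator; the turnout probabilities and sample sizes here are arbitrary assumptions for illustration:

set.seed(1234)
x <- rbinom(n = 500, size = 1, prob = 0.45)  ## treated, mu_x = 0.45
y <- rbinom(n = 500, size = 1, prob = 0.40)  ## control, mu_y = 0.40
mean(x) - mean(y)  ## one draw of D-hat from its sampling distribution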
Finite-sample properties

Let θ̂_n be an estimator of θ. Then we have the following definitions:
• bias[θ̂_n] = 𝔼[θ̂_n] − θ. θ̂_n is unbiased if bias[θ̂_n] = 0.
▶ Last week: X̄_n is unbiased for μ since 𝔼[X̄_n] = μ.
• Sampling variance is 𝕍[θ̂_n].
▶ Example: 𝕍[X̄_n] = σ²/n.
• Standard error is se[θ̂_n] = √𝕍[θ̂_n].
▶ Example: se[X̄_n] = σ/√n.
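• These properties can be checked by simulation; a minimal sketch, assuming the same Bernoulli(0.4) population as above, where σ²/n = 0.4 × 0.6/100 = 0.0024:

set.seed(1234)
xbars <- replicate(10000, mean(rbinom(n = 100, size = 1, prob = 0.4)))
mean(xbars)  ## close to mu = 0.4, so the bias is about 0
var(xbars)   ## close to sigma^2/n = 0.0024, the sampling variance
sd(xbars)    ## close to sigma/sqrt(n) = 0.049, the standard error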
Diff-in-means finite-sample properties

• Unbiasedness, from the unbiasedness of the sample means:
𝔼[X̄_{n_x} − Ȳ_{n_y}] = 𝔼[X̄_{n_x}] − 𝔼[Ȳ_{n_y}] = μ_x − μ_y
• Sampling variance, by independence of the samples:
𝕍[X̄_{n_x} − Ȳ_{n_y}] = 𝕍[X̄_{n_x}] + 𝕍[Ȳ_{n_y}] = σ²_x/n_x + σ²_y/n_y
• Standard error:
se[D̂_n] = √(σ²_x/n_x + σ²_y/n_y)
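• A plug-in estimate of this standard error for the Gerber, Green, and Larimer comparison, replacing σ²_x and σ²_y with the sample variances; this assumes the social data frame from earlier has been loaded and recoded:

x <- social$voted[social$treatment == "Neighbors"]
y <- social$voted[social$treatment == "Civic Duty"]
dhat <- mean(x) - mean(y)  ## 0.0634 from earlier
se.hat <- sqrt(var(x) / length(x) + var(y) / length(y))
c(dhat, se.hat)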
Mean squared error

• Mean squared error or MSE is MSE = 𝔼[(θ̂_n − θ)²]
• The MSE assesses the quality of an estimator.
▶ How big are (squared) deviations from the true parameter?
▶ Ideally, this would be as low as possible!
• Useful decomposition result:
MSE = bias[θ̂_n]² + 𝕍[θ̂_n]
• ⇝ for unbiased estimators, the MSE is the sampling variance.
• Might accept some bias for large reductions in variance, for lower overall MSE.
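• A hedged illustration of this tradeoff: compare the sample mean with a deliberately biased estimator that pulls the sample mean halfway toward 0.5; the shrinkage target and weights are arbitrary choices for this sketch:

set.seed(1234)
nsims <- 10000
mu <- 0.4
unbiased <- shrunk <- rep(NA, nsims)
for (i in 1:nsims) {
  x <- rbinom(n = 10, size = 1, prob = mu)
  unbiased[i] <- mean(x)
  shrunk[i] <- 0.5 * mean(x) + 0.5 * 0.5  ## biased toward 0.5
}
mean((unbiased - mu)^2)  ## ~0.024: MSE = sampling variance (no bias)
mean((shrunk - mu)^2)    ## ~0.0085: bias^2 + much smaller variance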
Consistency

• An estimator is consistent if θ̂_n converges in probability to θ.
▶ The distribution of θ̂_n collapses on θ as n → ∞.
▶ WLLN: X̄_n is consistent for μ.
▶ Inconsistent estimators are bad bad bad: more data gives worse answers!
• Theorem: If bias[θ̂_n] → 0 and se[θ̂_n] → 0 as n → ∞, then θ̂_n is consistent.
• Example: difference in means.
▶ D̂_n is unbiased, with 𝕍[D̂_n] = σ²_x/n_x + σ²_y/n_y
▶ ⇝ D̂_n is consistent since 𝕍[D̂_n] → 0 as n_x, n_y → ∞.
• NB: Unbiasedness does not imply consistency, nor vice versa.
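• A quick sketch of the sampling distribution of X̄_n collapsing on μ = 0.4 as n grows, again assuming the Bernoulli(0.4) population from the earlier simulations:

set.seed(1234)
for (n in c(10, 100, 1000, 10000)) {
  xbars <- replicate(1000, mean(rbinom(n = n, size = 1, prob = 0.4)))
  cat("n =", n, "  sd of sampling distribution:", round(sd(xbars), 4), "\n")
}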