Gov 51: Expectation, Variance, and Sample Means
Matthew Blackwell
Harvard University
Remember our goal
[Figure: Population and Sample, linked by probability and inference]
• We want to learn about the chance process that generated our data.
• Last time: entire probability distributions. Is there something simpler?
How can we summarize distributions?
• Two numerical summaries of the distribution are useful.
1. Mean/expectation: where the center of the distribution is.
2. Variance/standard deviation: how spread out the distribution is around the center.
• These are population parameters, so we don't get to observe them directly, but we'll use our sample to learn about them.
Two ways to calculate averages
• Calculate the average of $\{1, 1, 1, 3, 4, 4, 5, 5\}$:
  $\frac{1 + 1 + 1 + 3 + 4 + 4 + 5 + 5}{8} = 3$
• Alternative way to calculate the average, based on frequency weights:
  $1 \times \frac{3}{8} + 3 \times \frac{1}{8} + 4 \times \frac{2}{8} + 5 \times \frac{2}{8} = 3$
• Each value times how often that value occurs in the data.
• We'll use this intuition to create an average/mean for r.v.s.
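The two calculations can be checked with a short Python sketch. The function names are illustrative and not part of the lecture; exact fractions are used so the two results can be compared without rounding.

```python
from collections import Counter
from fractions import Fraction

# The eight values from the example.
data = [1, 1, 1, 3, 4, 4, 5, 5]

def plain_average(values):
    """Add up the values and divide by how many there are."""
    return Fraction(sum(values), len(values))

def frequency_weighted_average(values):
    """Add up each distinct value times its relative frequency."""
    n = len(values)
    counts = Counter(values)
    return sum(value * Fraction(count, n) for value, count in counts.items())

print(plain_average(data))               # 3
print(frequency_weighted_average(data))  # 3
```

Both routes give the same answer because the frequency-weighted form only regroups the terms of the plain sum.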
Expectation
• We write $\mathbb{E}[X]$ for the mean of an r.v. $X$.
• For discrete $X \in \{x_1, x_2, \ldots, x_k\}$ with $k$ levels:
  $\mathbb{E}[X] = \sum_{j=1}^{k} x_j \, \mathbb{P}(X = x_j)$
• Weighted average of the values of the r.v., weighted by the probability of each value occurring.
• If $X$ is the age of a randomly selected registered voter, then $\mathbb{E}[X]$ is the average age in the population of registered voters.
• Notation notes:
  • Lots of other ways to refer to this: expectation or expected value.
  • Often called the population mean to distinguish it from the sample mean.
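As a minimal sketch of the formula, the function below computes the expectation of a discrete r.v. from its probability mass function. The fair six-sided die is a hypothetical example chosen here, not one from the lecture.

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum of x * P(X = x), for a pmf given as {value: probability}."""
    assert sum(pmf.values()) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in pmf.items())

# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expectation(die))  # 7/2
```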
Properties of the expected value
• We use properties of $\mathbb{E}[X]$ to avoid using the formula every time.
• Let $X$ and $Y$ be r.v.s and $a$ and $b$ be constants.
1. $\mathbb{E}[a] = a$
  • Constants don't vary.
2. $\mathbb{E}[aX] = a\,\mathbb{E}[X]$
  • Suppose $X$ is income in dollars; income in $10k is just $X / 10000$.
  • Mean of this new variable is the mean of income in dollars divided by 10,000.
3. $\mathbb{E}[aX + bY] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y]$
  • Expectations can be distributed across sums.
  • $X$ is partner 1's income, $Y$ is partner 2's income.
  • Mean household income is the sum of each partner's mean income.
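The properties can be checked numerically on a small example. The joint distribution below is hypothetical and deliberately makes the two variables dependent, since linearity of expectation does not require independence; the helper name `expect` is also illustrative.

```python
from fractions import Fraction

# Hypothetical joint pmf for (X, Y), stored as {(x, y): probability}.
# X and Y are dependent here: Y is 0 whenever X is 0.
joint = {
    (0, 0): Fraction(1, 2),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

def expect(g, joint_pmf):
    """E[g(X, Y)] = sum of g(x, y) * P(X = x, Y = y)."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

a, b = 2, 3

# Property 1: the expectation of a constant is the constant.
print(expect(lambda x, y: a, joint))

# Property 3: E[aX + bY] equals a E[X] + b E[Y].
lhs = expect(lambda x, y: a * x + b * y, joint)
rhs = a * expect(lambda x, y: x, joint) + b * expect(lambda x, y: y, joint)
print(lhs, rhs)
```

Property 2 is the special case of property 3 with $b = 0$.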
Variance
• The variance measures the spread of the distribution:
  $\mathbb{V}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2]$
• Weighted average of the squared distances from the mean.
• Larger deviations (+ or −) ⇝ higher variance.
• If $X$ is the age of a randomly selected registered voter, $\mathbb{V}[X]$ is the usual variance of age, computed over the whole population of registered voters.
• Sometimes called population variance to contrast with sample variance.
• Standard deviation: square root of the variance: $SD(X) = \sqrt{\mathbb{V}[X]}$.
  • Useful because it's on the scale of the original variable.
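A minimal sketch of the variance and standard deviation of a discrete r.v., computed directly from the definition. The fair die is again a hypothetical example, and the function names are illustrative.

```python
import math
from fractions import Fraction

def expectation(pmf):
    """E[X] for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V[X] = E[(X - E[X])^2]: probability-weighted squared distance from the mean."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

var = variance(die)
sd = math.sqrt(var)  # back on the scale of the original variable

print(var)  # 35/12
print(sd)
```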
Properties of variances
• Some properties of variance useful for calculation.
1. If $b$ is a constant, then $\mathbb{V}[b] = 0$.
2. If $a$ and $b$ are constants, $\mathbb{V}[aX + b] = a^2\,\mathbb{V}[X]$.
3. In general, $\mathbb{V}[X + Y] \neq \mathbb{V}[X] + \mathbb{V}[Y]$.
  • If $X$ and $Y$ are independent, then $\mathbb{V}[X + Y] = \mathbb{V}[X] + \mathbb{V}[Y]$.
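Property 3 is easiest to see side by side. The sketch below compares two hypothetical joint distributions with the same marginals (each variable is a fair coin coded 0/1): one where the variables are independent and one where the second always equals the first. The names `expect` and `var` are illustrative helpers.

```python
from fractions import Fraction

def expect(g, joint_pmf):
    """E[g(X, Y)] for a joint pmf stored as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

def var(g, joint_pmf):
    """V[g(X, Y)] = E[(g(X, Y) - E[g(X, Y)])^2]."""
    mu = expect(g, joint_pmf)
    return expect(lambda x, y: (g(x, y) - mu) ** 2, joint_pmf)

quarter, half = Fraction(1, 4), Fraction(1, 2)

# Hypothetical joint pmfs with identical marginals.
independent = {(0, 0): quarter, (0, 1): quarter, (1, 0): quarter, (1, 1): quarter}
dependent = {(0, 0): half, (1, 1): half}  # Y always equals X

for name, joint in [("independent", independent), ("dependent", dependent)]:
    v_sum = var(lambda x, y: x + y, joint)
    v_x = var(lambda x, y: x, joint)
    v_y = var(lambda x, y: y, joint)
    print(name, "V[X+Y] =", v_sum, " V[X] + V[Y] =", v_x + v_y)
```

In the dependent case the sum is $2X$, so property 2 gives $\mathbb{V}[2X] = 4\,\mathbb{V}[X]$, which is twice the naive sum of the variances.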
Sums and means are random variables
• If $X_1$ and $X_2$ are r.v.s, then $X_1 + X_2$ is a r.v.
  • Has a mean $\mathbb{E}[X_1 + X_2]$ and a variance $\mathbb{V}[X_1 + X_2]$.
• The sample mean is a function of sums and so it is a r.v. too:
  $\bar{X} = \frac{X_1 + X_2}{2}$
• Example: the average age of two randomly selected respondents.
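Distribution of sums/means
         $X_1$   $X_2$   $X_1 + X_2$   $\bar{X}$
draw 1   44      32      76            38
draw 2   27      50      77            38.5
draw 3   34      48      82            41
draw 4   68      28      96            48
⋮        ⋮       ⋮       ⋮             ⋮
• Across repeated draws, the $X_1 + X_2$ column traces out the distribution of the sum and the $\bar{X}$ column traces out the distribution of the mean.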
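The repeated-draws idea can be sketched as a small simulation. The population of ages (18 to 90, each equally likely) and the seed are assumptions made here for illustration; the lecture does not specify them.

```python
import random

random.seed(51)  # arbitrary seed so the run is repeatable

# Hypothetical population of ages; not taken from the lecture.
population = list(range(18, 91))

def draw_pair(pop):
    """Draw two ages independently (with replacement); return X1, X2, sum, mean."""
    x1 = random.choice(pop)
    x2 = random.choice(pop)
    return x1, x2, x1 + x2, (x1 + x2) / 2

draws = [draw_pair(population) for _ in range(10_000)]
sums = [d[2] for d in draws]   # approximates the distribution of the sum
means = [d[3] for d in draws]  # approximates the distribution of the mean

# Show the first few draws in the same layout: X1, X2, X1 + X2, mean.
for i, d in enumerate(draws[:4], start=1):
    print(f"draw {i}:", d)
```

Plotting a histogram of `sums` or `means` would show the shape of each distribution; the plotting step is left out to keep the sketch dependency-free.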
Independent and identical r.v.s
• Independent and identically distributed r.v.s, $X_1, \ldots, X_n$
  • Random sample of $n$ respondents on a survey question.
• Independent: the value that $X_i$ takes doesn't affect the distribution of $X_j$.
• Identically distributed: the distribution of $X_i$ is the same for all $i$.
  • $\mathbb{E}[X_1] = \mathbb{E}[X_2] = \cdots = \mathbb{E}[X_n] = \mu$
  • $\mathbb{V}[X_1] = \mathbb{V}[X_2] = \cdots = \mathbb{V}[X_n] = \sigma^2$
• Written "i.i.d."
Distribution of the sample mean
• Sample mean of i.i.d. random variables:
  $\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$
• $\bar{X}_n$ is a random variable; what is its distribution?
  • What is the expectation of this distribution, $\mathbb{E}[\bar{X}_n]$?
  • What is the variance of this distribution, $\mathbb{V}[\bar{X}_n]$?
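One way to explore these questions before deriving anything is to simulate many samples and look at how the sample means behave. This is a sketch under stated assumptions: the population of ages, the sample size `n = 25`, the number of repetitions, and the seed are all hypothetical choices.

```python
import random
import statistics

random.seed(51)  # arbitrary seed so the run is repeatable

population = list(range(18, 91))  # hypothetical ages, each equally likely
n = 25          # sample size (assumed for illustration)
reps = 20_000   # number of simulated samples

def sample_mean(pop, size):
    """Mean of `size` i.i.d. draws, i.e. sampling with replacement."""
    return statistics.fmean(random.choices(pop, k=size))

means = [sample_mean(population, n) for _ in range(reps)]

mu = statistics.fmean(population)          # population mean
sigma2 = statistics.pvariance(population)  # population variance

print("population mean:", mu)
print("average of the sample means:", statistics.fmean(means))
print("population variance:", sigma2)
print("variance of the sample means:", statistics.pvariance(means))
```

Rerunning with a larger `n` shows the sample means clustering more tightly around the population mean.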
Properties of the sample mean
Mean and variance of the sample mean
Suppose that $X_1, \ldots, X_n$ are i.i.d. r.v.s with $\mathbb{E}[X_i] = \mu$ and $\mathbb{V}[X_i] = \sigma^2$. Then:
  $\mathbb{E}[\bar{X}_n] = \mu \qquad \mathbb{V}[\bar{X}_n] = \frac{\sigma^2}{n}$
• Key insights:
  • Sample mean is on average equal to the population mean.
  • Variance of $\bar{X}_n$ depends on the population variance of $X_i$ and the sample size.
  • Standard deviation of the sample mean is called its standard error:
    $SE = \sqrt{\mathbb{V}[\bar{X}_n]} = \frac{\sigma}{\sqrt{n}}$
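Both results follow from the properties of expectation and variance covered earlier. The derivation below is a sketch of that reasoning: the mean uses linearity of expectation, and the variance uses the scaling rule for constants together with additivity under independence.

```latex
\begin{align*}
\mathbb{E}[\bar{X}_n]
  &= \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i]
   = \frac{1}{n}\cdot n\mu
   = \mu \\[6pt]
\mathbb{V}[\bar{X}_n]
  &= \mathbb{V}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n^2}\,\mathbb{V}\!\left[\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n^2}\sum_{i=1}^{n}\mathbb{V}[X_i]
   = \frac{n\sigma^2}{n^2}
   = \frac{\sigma^2}{n}
\end{align*}
```

The third equality in the variance line is the only step that needs independence; the identical-distribution assumption is what lets every term be replaced by $\mu$ or $\sigma^2$.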