Central Limit Theorem: The distribution of the means of samples of size N drawn from any distribution with finite variance (even a non-normal distribution) approaches normality as N approaches infinity.
Implication of the Central Limit Theorem
- The mean of a non-normal population can be estimated easily by sampling
- Draw N samples, compute the mean of each sample
- Compute the mean of these means
- As N → ∞, this mean of means approaches the population mean
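
The steps above can be tried out with a small simulation. This is only a sketch: the exponential population, the sample size of 50, the 2000 samples and the name `draw_sample_means` are all assumptions chosen for illustration, not part of the original slides.

```python
import random
import statistics

def draw_sample_means(draw_value, sample_size, num_samples):
    """Return the mean of each of num_samples samples of sample_size values."""
    return [
        statistics.fmean(draw_value() for _ in range(sample_size))
        for _ in range(num_samples)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw_value():
    # Assumed non-normal population: exponential with rate 1, so true mean = 1.0
    return rng.expovariate(1.0)

sample_means = draw_sample_means(draw_value, sample_size=50, num_samples=2000)
mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)

print(f"mean of sample means:    {mean_of_means:.3f}")
print(f"std dev of sample means: {spread_of_means:.3f}")
```

Under these assumptions the mean of means should land close to the population mean of 1.0, and the spread of the sample means should be close to 1/sqrt(50), roughly 0.141. A histogram of `sample_means` would look approximately bell-shaped even though the population itself is strongly skewed.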
Establishing Confidence Intervals
Conventional Approach
- Sample multiple times with or without replacement
- Sample once; make strong assumptions about population
Bootstrap Approach
- Sample once; resample that sample with replacement
Sampling multiple times:
- The Central Limit Theorem only applies to a group of means, so computing multiple samples is key
- Not a very realistic approach in the real world
Sampling once with strong assumptions:
- Instead, modelers choose to work only with data whose distributions are known
- For normally distributed data we can often work with just one sample to estimate the mean
Confidence Intervals from Normal Data
Population → Simple random sample → Calculate mean of sample → Compute confidence intervals analytically
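
One way the analytical step might look, as a minimal sketch. It assumes a 95% interval built from the standard normal quantile (for small samples a Student's t quantile would be the more usual choice), and the function name `normal_mean_ci` and the simulated data are hypothetical.

```python
import math
import random
import statistics

def normal_mean_ci(sample, confidence=0.95):
    """Analytical confidence interval for the mean of roughly normal data.

    Assumption: uses the standard normal quantile, which is a reasonable
    approximation only when the sample is not too small.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

# Hypothetical data: one simple random sample from a normal population
rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]

low, high = normal_mean_ci(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

Only one sample is drawn here; the interval comes entirely from the formula, which is what the normality assumption buys.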
Demo: The central limit theorem
Demo: Observing the central limit theorem on a real dataset
Drawbacks of Conventional Methods
When sampling once and assuming a distribution:
- Make strong assumptions about the distribution of the data
- Use analytical formulae to estimate statistics based on data distributions
- The analytical formula may not exist for certain combinations of statistic and distribution
When sampling multiple times:
- Need to draw a large number of samples from the population
- Estimate statistics based on the sampling distribution
- May not be practical or realistic
Estimating Population Statistic
Conventional Approach
- Sample population once; calculate sample statistic
Bootstrap Approach
- Sample once; resample that sample with replacement
Establishing Confidence Intervals
Parametric Method
- Sample once; make strong assumptions about population
Non-parametric Methods
- Sample multiple times with or without replacement
- Sample once; resample that sample with replacement
The basic Bootstrap method is non-parametric; however, parametric variants exist too
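
To make the distinction concrete, here is a rough sketch of one parametric variant. It assumes the analyst is willing to fit a normal distribution to the single sample and then draws replications from that fitted distribution instead of resampling the observed values. The normal model, the function name and the data are all assumptions for illustration.

```python
import random
import statistics

def parametric_bootstrap_means(sample, num_replications, rng):
    """Draw replications from a normal distribution fitted to the sample.

    Assumption: a normal model is adequate for the data. The non-parametric
    bootstrap would resample the observed values themselves instead.
    """
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
    n = len(sample)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_replications)
    ]

rng = random.Random(5)
sample = [rng.gauss(50.0, 5.0) for _ in range(80)]  # hypothetical sample
realizations = parametric_bootstrap_means(sample, num_replications=1000, rng=rng)
print(f"mean of parametric bootstrap means: {statistics.fmean(realizations):.3f}")
```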
The Bootstrap Method
Conventional Methods
[Diagram: Population → Sample 1, Sample 2, Sample 3, Sample 4, ..., Sample ∞]
Confidence Intervals from Non-normal Data
Population → Sample values → Calculate mean of each sample (repeat multiple times) → Sample Means → 2.5% percentile and 97.5% percentile
Bootstrap Method
Draw just one sample from the population
[Diagram: Population → Sample 1]
The Bootstrap Sample
Treat that one sample as if it were the population
[Diagram: Population, Bootstrap Sample]
Bootstrap Method
- Draw multiple samples from the one sample with replacement
- Each of these samples is sometimes called a Bootstrap Replication
[Diagram: Sample 1, Sample 2, Sample 3, Sample 4, ..., Sample ∞]
Estimate Statistics using the Bootstrap Method
- With each bootstrap replication, calculate the statistic, e.g. the mean
- Each estimate from a bootstrap replication is called a bootstrap realization of the statistic
Confidence Intervals using the Bootstrap Method
Calculate confidence intervals using the bootstrap distribution of the statistic
Sampling with replacement is essential; otherwise each Bootstrap Replication of the same size would merely reproduce the Bootstrap Sample
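
A quick way to see this, sketched with made-up data: drawing a same-size replication without replacement is just a reshuffle of the bootstrap sample, so its mean never changes, while drawing with replacement produces a different mean each time. The exponential data and variable names are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(2)
bootstrap_sample = [rng.expovariate(1.0) for _ in range(30)]  # hypothetical data
n = len(bootstrap_sample)

# Without replacement: each "replication" is only a permutation of the sample
means_without = [
    statistics.fmean(rng.sample(bootstrap_sample, n)) for _ in range(5)
]

# With replacement: some values repeat and others are left out
means_with = [
    statistics.fmean(rng.choices(bootstrap_sample, k=n)) for _ in range(5)
]

print("distinct means without replacement:", len(set(means_without)))
print("distinct means with replacement:   ", len(set(means_with)))
```

`random.sample` draws without replacement and `random.choices` draws with replacement, so the only thing that differs between the two lists is the sampling scheme.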
Sampling with Replacement
- Reusing the same data multiple times
- “Bootstrapping” comes from the phrase “pulling yourself up by your own bootstraps”
- Has empirically been shown to produce meaningful results
Sampling with Replacement
- Bootstrapping does not create new data
- Creates the samples that could have been drawn from the original population
- Assumes that the bootstrap sample accurately represents the population
The Bootstrap Method seems like cheating, but it is both theoretically sound and very robust
The Bootstrap Method and Confidence Intervals
Confidence Intervals with the Bootstrap Method
Bootstrap Sample (treated as Population) → Sample values with replacement → Calculate mean of each sample (repeat multiple times) → Sample Means → 2.5% percentile and 97.5% percentile
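
The flow above could be sketched roughly as follows. The function name `bootstrap_percentile_ci`, the 5000 replications and the exponential data are assumptions for illustration; the 2.5% and 97.5% percentiles are taken from the slide.

```python
import random
import statistics

def bootstrap_percentile_ci(bootstrap_sample, num_replications, rng):
    """95% percentile interval for the mean from one bootstrap sample."""
    n = len(bootstrap_sample)
    # Each replication resamples n values with replacement; its mean is one
    # bootstrap realization of the statistic.
    sample_means = [
        statistics.fmean(rng.choices(bootstrap_sample, k=n))
        for _ in range(num_replications)
    ]
    # n=40 gives cut points every 2.5%, so the first and last cut points
    # are the 2.5% and 97.5% percentiles.
    cuts = statistics.quantiles(sample_means, n=40)
    return cuts[0], cuts[-1]

rng = random.Random(3)
bootstrap_sample = [rng.expovariate(1.0) for _ in range(100)]  # hypothetical data

low, high = bootstrap_percentile_ci(bootstrap_sample, num_replications=5000, rng=rng)
print(f"95% bootstrap CI for the mean: ({low:.3f}, {high:.3f})")
```

Note that the population is sampled only once; everything after that works on `bootstrap_sample` alone.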
The Bootstrap Method
Conventional Approach
- Sample population just once if no confidence intervals needed
- No need to re-sample for confidence intervals for common use-cases
- Re-sample population if confidence intervals needed for complex cases
Bootstrap Method
- Sample population just once under all circumstances
- Re-sample bootstrap sample with replacement under all circumstances
- No change in procedure, works equally well for common and complex cases
The Bootstrap Method
Great for
- Arbitrary population (unknown distribution)
- Arbitrary statistics (not commonly studied for arbitrary population)
- Confidence interval around arbitrary statistics
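
As an illustration of the "arbitrary statistic" point, the sketch below passes the statistic in as a function, so the same procedure gives an interval for a median or a standard deviation without any change. The name `bootstrap_ci`, the log-normal data and the replication count are assumptions, not part of the original slides.

```python
import random
import statistics

def bootstrap_ci(sample, statistic, num_replications=2000, seed=0):
    """95% percentile bootstrap interval for any statistic of one sample."""
    rng = random.Random(seed)
    n = len(sample)
    realizations = [
        statistic(rng.choices(sample, k=n)) for _ in range(num_replications)
    ]
    cuts = statistics.quantiles(realizations, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]

# Hypothetical skewed data standing in for an unknown population
data_rng = random.Random(4)
sample = [data_rng.lognormvariate(0.0, 1.0) for _ in range(150)]

median_low, median_high = bootstrap_ci(sample, statistics.median)
stdev_low, stdev_high = bootstrap_ci(sample, statistics.stdev)

print(f"median: ({median_low:.3f}, {median_high:.3f})")
print(f"stdev:  ({stdev_low:.3f}, {stdev_high:.3f})")
```

Swapping in another statistic only requires passing a different function; no new formula has to be derived.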