Implementing Bootstrap Methods in R: Getting Started with Bootstrapping in R

Janani Ravi, Co-founder, Loonycorn (www.loonycorn.com). Overview: Estimating statistics and calculating confidence intervals; The Central Limit Theorem; Conventional


  1. Central Limit Theorem: The distribution of the means of samples drawn from any distribution with finite variance (even a non-normal distribution) approaches a normal distribution as the size of each sample approaches infinity.

  4. Implication of the Central Limit Theorem: The mean of a non-normal population can be estimated easily by sampling. Draw N samples and compute the mean of each sample, then compute the mean of these means. As N → ∞, this mean of means approaches the population mean.
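
The steps on this slide can be tried out in a few lines of R. This is only a minimal sketch: the exponential population, the seed, and the values of `n_samples` and `sample_size` are assumptions chosen for illustration, not values from the presentation.

```r
# Sketch: mean of sample means for a skewed (exponential) population.
set.seed(42)                 # arbitrary seed, for reproducibility only

pop_mean    <- 2             # an exponential with rate 1/2 has mean 2
n_samples   <- 5000          # number of samples drawn (N on the slide)
sample_size <- 50            # observations per sample (assumed value)

# Draw N samples and compute the mean of each one
sample_means <- replicate(n_samples,
                          mean(rexp(sample_size, rate = 1 / pop_mean)))

# Compute the mean of these means
mean_of_means <- mean(sample_means)
print(mean_of_means)         # should land close to pop_mean
```

Calling `hist(sample_means)` afterwards is one way to see that the means look roughly bell-shaped even though the exponential distribution itself is strongly right-skewed.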

  5. Establishing Confidence Intervals. Conventional approach: either sample multiple times, with or without replacement, or sample once and make strong assumptions about the population. Bootstrap approach: sample once, then resample that sample with replacement. The Central Limit Theorem only applies to a group of means, so computing multiple samples is key.

  6. Establishing Confidence Intervals: Sampling the population multiple times is not a very realistic approach in the real world.

  8. Establishing Confidence Intervals: Instead, modelers choose to work only with data whose distributions are known.

  9. Establishing Confidence Intervals: For normally distributed data, we can often work with just one sample to estimate the mean.

  11. Confidence Intervals from Normal Data: Draw a simple random sample from the population, calculate the mean of the sample, then compute confidence intervals analytically.
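
As an illustration of the analytical route, the sketch below computes a 95% interval for the mean from a single sample using the t distribution. The sample `x` is simulated and purely hypothetical; the presentation's own demo data is not shown in this transcript.

```r
# Sketch: analytic 95% confidence interval for the mean of normal data.
set.seed(1)
x <- rnorm(30, mean = 10, sd = 2)   # hypothetical normally distributed sample

n      <- length(x)
x_bar  <- mean(x)                   # mean of the one sample
se     <- sd(x) / sqrt(n)           # standard error of the mean
t_crit <- qt(0.975, df = n - 1)     # two-sided 95% critical value

ci <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
print(ci)

# Base R's t.test() reports the same interval
print(t.test(x, conf.level = 0.95)$conf.int)
```

The interval relies on the normality assumption, which is exactly the kind of strong assumption the bootstrap avoids.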

  12. Demo: The central limit theorem

  13. Demo: Observing the central limit theorem on a real dataset

  14. Drawbacks of Conventional Methods

  15. Drawbacks of Conventional Methods: They make strong assumptions about the distribution of the data. They use analytical formulae to estimate statistics based on data distributions. An analytical formula may not exist for certain combinations of statistic and distribution.

  16. Drawbacks of Conventional Methods: They need to draw a large number of samples from the population and estimate statistics based on the sampling distribution. This may not be practical or realistic.

  17. Estimating Population Statistic. Conventional approach: sample the population once; calculate the sample statistic. Bootstrap approach: sample once; resample that sample with replacement.

  19. Establishing Confidence Intervals: Sampling once and making strong assumptions about the population is a parametric method.

  20. Establishing Confidence Intervals: Sampling the population multiple times, and sampling once then resampling that sample with replacement, are non-parametric methods.

  21. The basic Bootstrap method is non-parametric; however, parametric variants exist too.

  22. The Bootstrap Method

  23. Conventional Methods [figure: many samples (Sample 1, Sample 2, Sample 3, Sample 4, ..., Sample ∞) drawn from the population]

  24. Confidence Intervals from Non-normal Data: Sample values from the population and calculate the mean of each sample; repeat multiple times. Then take the 2.5% and 97.5% percentiles of the sample means.
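
A rough sketch of this conventional procedure follows. It assumes we can sample the population as often as we like, which is why `population` is simulated here; the gamma distribution and the counts used are illustrative assumptions only.

```r
# Sketch: percentile interval from repeated sampling of the population itself.
set.seed(7)
population <- rgamma(100000, shape = 2, rate = 1)  # stand-in skewed population (assumed)

# Repeatedly sample the population and record each sample's mean
pop_sample_means <- replicate(2000, mean(sample(population, size = 40)))

# The 2.5% and 97.5% percentiles bracket the middle 95% of the sample means
conv_interval <- quantile(pop_sample_means, probs = c(0.025, 0.975))
print(conv_interval)
```

In practice the full population is rarely available for repeated sampling, which is the gap the bootstrap fills.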

  25. Bootstrap Method: Draw just one sample from the population.

  27. The Bootstrap Sample: Treat that one sample as if it were the population.

  28. Bootstrap Method: Draw multiple samples from the one sample, with replacement.

  29. Bootstrap Method: Each of these samples is sometimes called a bootstrap replication.

  30. Estimate Statistics using the Bootstrap Method: With each bootstrap replication, calculate the statistic, e.g. the mean.

  31. Estimate Statistics using the Bootstrap Method: Each estimate from a bootstrap replication is called a bootstrap realization of the statistic.
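
The vocabulary on these slides maps onto base R fairly directly. In the sketch below, `boot_sample` is a simulated stand-in for the one observed sample, and `n_reps` is an assumed number of replications; neither comes from the presentation.

```r
# Sketch: bootstrap replications and realizations of the mean.
set.seed(123)
boot_sample <- rexp(40, rate = 0.5)   # hypothetical stand-in for the one observed sample
n_reps      <- 2000                   # number of bootstrap replications (assumed)

# Each replication resamples the bootstrap sample with replacement, at the same size.
# Each realization is the statistic (here the mean) computed on one replication.
realizations <- replicate(n_reps, {
  replication <- sample(boot_sample, size = length(boot_sample), replace = TRUE)
  mean(replication)
})

print(mean(realizations))   # centred near mean(boot_sample)
print(sd(realizations))     # bootstrap estimate of the standard error of the mean
```

The vector `realizations` is the bootstrap distribution of the statistic; any other statistic can be swapped in for `mean()` without changing the rest of the code.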

  32. Confidence Intervals using the Bootstrap Method: Calculate confidence intervals using the bootstrap distribution of the statistic.

  33. Sampling with replacement is essential; otherwise each bootstrap replication will merely reproduce the bootstrap sample.
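
A small experiment makes this point concrete. The six values in `boot_sample` are made up for illustration.

```r
# Sketch: why resampling must be done with replacement.
set.seed(99)
boot_sample <- c(3, 7, 8, 12, 15, 21)   # hypothetical observed sample

# Without replacement at full size, every draw is a permutation of the same values,
# so the statistic never changes.
no_repl <- replicate(1000, mean(sample(boot_sample, replace = FALSE)))

# With replacement, values can repeat or drop out, so the statistic varies.
with_repl <- replicate(1000, mean(sample(boot_sample, replace = TRUE)))

print(length(unique(no_repl)))   # a single distinct value
print(sd(with_repl))             # non-zero spread
```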

  34. Sampling with Replacement: Reuses the same data multiple times. "Bootstrapping" comes from the phrase "pulling yourself up by your own bootstraps". Has empirically been shown to produce meaningful results.

  35. Sampling with Replacement: Bootstrapping does not create new data. It creates the samples that could have been drawn from the original population. It assumes that the bootstrap sample accurately represents the population.

  36. The Bootstrap Method seems like cheating, but it is both theoretically sound and very robust

  37. The Bootstrap Method and Confidence Intervals

  42. Confidence Intervals with the Bootstrap Method: Treat the bootstrap sample as the population. Sample values from it with replacement and calculate the mean of each sample; repeat multiple times. Then take the 2.5% and 97.5% percentiles of the sample means.
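
The procedure above can be wrapped in a small function. This is a sketch in base R, not the presentation's own demo code: `bootstrap_mean_ci` is a hypothetical helper name, and the simulated `observed` data and the default of 5000 replications are assumptions.

```r
# Hypothetical helper (not from the slides): percentile bootstrap interval for the mean.
bootstrap_mean_ci <- function(x, n_reps = 5000, level = 0.95) {
  # Resample x with replacement and record the mean of each resample
  means <- replicate(n_reps, mean(sample(x, size = length(x), replace = TRUE)))
  alpha <- 1 - level
  # For level = 0.95 these are the 2.5% and 97.5% percentiles
  quantile(means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(2024)
observed <- rlnorm(60, meanlog = 0, sdlog = 0.75)  # hypothetical skewed sample
pct_ci   <- bootstrap_mean_ci(observed)

print(mean(observed))
print(pct_ci)
```

No distributional formula is used anywhere in the function; the interval comes entirely from the resampled means.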

  43. The Bootstrap Method compared with the Conventional Approach. Conventional approach: sample the population just once if no confidence intervals are needed; no need to re-sample for confidence intervals in common use cases; re-sample the population if confidence intervals are needed for complex cases. Bootstrap method: sample the population just once under all circumstances; re-sample the bootstrap sample with replacement under all circumstances; no change in procedure, so it works equally well for common and complex cases.

  44. The Bootstrap Method is great for: arbitrary populations (unknown distribution); arbitrary statistics (not commonly studied for arbitrary populations); confidence intervals around arbitrary statistics.
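
To illustrate the "arbitrary statistic" case, the sketch below bootstraps a median using the `boot` package, which is distributed with R as a recommended package. Whether the presentation's demos use this package is not stated in this transcript, so treat the choice as an assumption; `incomes` and `median_stat` are hypothetical names, and the data is simulated.

```r
# Sketch: percentile interval for a median using boot() and boot.ci().
library(boot)

set.seed(11)
incomes <- rlnorm(80, meanlog = 10, sdlog = 1)   # hypothetical skewed data

# boot() calls the statistic with the data and a vector of resampled indices
median_stat <- function(data, indices) median(data[indices])

boot_out  <- boot(data = incomes, statistic = median_stat, R = 2000)
median_ci <- boot.ci(boot_out, conf = 0.95, type = "perc")

print(boot_out$t0)              # median of the original sample
print(median_ci$percent[4:5])   # lower and upper percentile limits
```

Changing the statistic only requires changing `median_stat`; the resampling and interval steps stay the same.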
