
M2S2 - Distributions, Professor Jarad Niemi, STAT 226, Iowa State University



  1. M2S2 - Distributions. Professor Jarad Niemi (STAT226@ISU), STAT 226, Iowa State University, August 29, 2018.

  2. Outline
     Population: location; spread; modality (unimodal, bimodal); skewness (symmetric, right-skewed, left-skewed).
     Sample: boxplot; histogram; summary statistics; outliers.

  3. Population
     Definition: The population is the entire group of individuals that we want to say something about.
     Definition: Individuals are the subjects/objects of interest.
     Definition: A variable is any characteristic of an individual that we are interested in.

  4. Population: Distribution
     Definition: The distribution of a variable is the collection of possible values the variable can take and how often each value occurs in the population.
     Enumerating the values may be possible for categorical variables, but typically will not work for numerical variables. Instead we depict the distribution graphically.
     [Figure: "Example distribution", a curve over x from −3 to 3 with height between 0.0 and 0.4.]
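
For a categorical variable, the definition can be applied literally: list each value and how often it occurs. The sketch below is one way to do that in Python using only the standard library; the ten-individual population and the `distribution` helper are made up for illustration.

```python
from collections import Counter

def distribution(values):
    """Return {value: proportion} for a finite collection of values."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}

# Hypothetical population of 10 individuals; the variable is eye color.
population = ["brown"] * 5 + ["blue"] * 3 + ["green"] * 2
dist = distribution(population)
print(dist)  # {'brown': 0.5, 'blue': 0.3, 'green': 0.2}
```

For a numerical variable with many distinct values, such a table becomes unwieldy, which is why the distribution is drawn instead.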

  5. Population: Distribution location and spread
     [Figure: "Location and spread", the example distribution (x from −3 to 3) annotated with "Location" and "Spread".]

  6. Population: Modality
     Definition: A unimodal distribution has one peak. A bimodal distribution has two peaks.
     [Figure: two panels, "Unimodal" (x from −3 to 3) and "Bimodal" (x from −3 to 4).]

  7. Population: Skewness
     Definition: A distribution is symmetric if there is some vertical line about which the graph is a mirror reflection. A distribution is right skewed if the tail of the distribution is longer to the right. A distribution is left skewed if the tail of the distribution is longer to the left.
     [Figure: three panels, "Left-skewed", "Symmetric", and "Right-skewed", each over x from 0 to 8, with the tail labeled in the two skewed panels.]

  8. Sample
     We never see the population! Thus we often try to infer details about the population from our sample. We use our sample to infer the distribution’s location, spread, modality, and skewness.

  9. Sample: Boxplot (vertical boxplots)
     A boxplot can be used to help infer location, spread, and skewness.
     [Figure: four vertical boxplots, "Symmetric", "Right skewed", "Left skewed", and "Bimodal".]

  10. Sample: Boxplot (horizontal boxplots)
      A boxplot can be used to help infer location, spread, and skewness.
      [Figure: four horizontal boxplots, "Symmetric", "Right skewed", "Left skewed", and "Bimodal".]

  11. Sample: Histogram
      Definition: A histogram is a graphical display of numerical data that counts the number of observations in each bin, where the bins are determined by the user.
      [Figure: two histograms of x (from −2 to 3), "Count" with Frequency on the vertical axis and "Proportion" with Density on the vertical axis.]
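
The counting behind a histogram can be sketched in a few lines. The example below assumes equal-width bins that include their left edge and exclude their right edge, and it assumes the density scale is the proportion divided by the bin width (so the bar areas sum to 1). The sample values and the helper names `bin_counts` and `bin_density` are hypothetical.

```python
import math

def bin_counts(data, start, width, n_bins):
    """Count observations in n_bins equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        i = math.floor((x - start) / width)
        if 0 <= i < n_bins:  # observations outside the bins are ignored
            counts[i] += 1
    return counts

def bin_density(counts, width):
    """Rescale counts so that the total area of the bars is 1."""
    total = sum(counts)
    return [c / (total * width) for c in counts]

# Hypothetical sample of 10 observations.
sample = [-1.2, -0.4, -0.1, 0.3, 0.5, 0.9, 1.1, 1.6, 2.4, 2.7]
counts = bin_counts(sample, start=-2, width=2, n_bins=3)
dens = bin_density(counts, width=2)
print(counts)  # [3, 5, 2]
print(dens)
```

The two scales give bars of the same shape; only the labeling of the vertical axis changes.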

  12. Sample: Histograms
      A histogram can be used to help infer location, spread, skewness, and modality.
      [Figure: four histograms, "Unimodal, Symmetric", "Unimodal, Right skewed", "Unimodal, Left skewed", and "Bimodal".]

  13. Sample: Histograms
      Histograms are affected by the choice of bins.
      [Figure: four histograms, "Unimodal, Symmetric", "Unimodal, Right skewed", "Unimodal, Left skewed", and "Bimodal".]

  14. Sample: Histograms
      Histograms are affected by the choice of bins.
      [Figure: four histograms, "Symmetric", "Right skewed", "Left skewed", and "Bimodal".]
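
To see how the choice of bins can change the impression a histogram gives, the sketch below bins one made-up sample twice. The data (two clusters, near 0 and near 3), the bin widths, and the helper `bin_counts` are all assumptions chosen for illustration.

```python
import math

def bin_counts(data, start, width, n_bins):
    """Count observations in n_bins equal-width bins starting at 'start'."""
    counts = [0] * n_bins
    for x in data:
        i = math.floor((x - start) / width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical sample with one cluster near 0 and another near 3.
sample = [-0.6, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7,
          2.4, 2.7, 2.9, 3.0, 3.2, 3.5]

narrow = bin_counts(sample, start=-1, width=1, n_bins=5)
wide = bin_counts(sample, start=-1, width=2.5, n_bins=2)
print(narrow)  # [3, 4, 0, 3, 3]
print(wide)    # [7, 6]
```

With bins of width 1 the empty middle bin separates two groups, while with bins of width 2.5 the same data look nearly flat, so the bimodal shape is no longer visible.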

  15. Sample: Summary statistics, measures of location

      Distribution      min      Q1   median    mean      Q3     max
      bimodal         -3.02   -0.90     0.16    0.57    1.98    5.42
      left skew      -13.96    4.36     7.14    5.24    8.55    9.76
      right skew       0.18    1.39     2.84    4.89    6.43   34.23
      symmetric       -1.45    0.14     0.86    0.97    1.81    3.09

      Q3 is computed here as Q1 plus the interquartile range reported in the measures-of-spread table (slide 16), so it is accurate to about ±0.01.

      Right-skew: mean > median
      Left-skew: mean < median
      Symmetric: mean ≈ median
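
The location measures and the mean-versus-median rule can be sketched with Python's standard `statistics` module. The `method="inclusive"` quantile rule is used because it should correspond to R's default; the sample, the helper names, and the tolerance `frac` (how close counts as "≈", taken as a fraction of the IQR) are assumptions rather than anything specified on the slides.

```python
import statistics

def location_summary(data):
    """Five-number summary plus the mean."""
    # "inclusive" interpolates between order statistics, treating the
    # smallest and largest observations as the 0% and 100% points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "Q1": q1, "median": median,
            "mean": statistics.fmean(data), "Q3": q3, "max": max(data)}

def skew_hint(mean, median, iqr, frac=0.1):
    """Compare mean and median; frac * iqr is an arbitrary tolerance."""
    diff = mean - median
    if abs(diff) <= frac * iqr:
        return "roughly symmetric"
    return "right-skewed" if diff > 0 else "left-skewed"

# Hypothetical sample with one large value in the right tail.
sample = [1, 2, 2, 3, 3, 4, 5, 8, 20]
summary = location_summary(sample)
print(summary)

# Mean, median, and IQR as reported on the slides.
print(skew_hint(4.89, 2.84, 5.04))  # right-skewed
print(skew_hint(5.24, 7.14, 4.19))  # left-skewed
print(skew_hint(0.97, 0.86, 1.67))  # roughly symmetric
```

The comparison is only a hint: it says nothing about modality, so the bimodal row would also be labeled right-skewed by this rule.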

  16. Sample: Summary statistics, measures of spread

      Distribution   variance   standard deviation   range   interquartile range
      bimodal            4.20                 2.05    8.43                  2.88
      left skew         26.25                 5.12   23.72                  4.19
      right skew        31.57                 5.62   34.05                  5.04
      symmetric          1.35                 1.16    4.54                  1.67
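
The four spread measures can be computed as in the sketch below, which assumes the sample variance (dividing by n − 1) and the same "inclusive" quartile rule as R's default. The sample and the helper name `spread_summary` are hypothetical.

```python
import math
import statistics

def spread_summary(data):
    """Sample variance, standard deviation, range, and interquartile range."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    variance = statistics.variance(data)  # divides by n - 1
    return {"variance": variance,
            "sd": math.sqrt(variance),    # standard deviation = sqrt(variance)
            "range": max(data) - min(data),
            "IQR": q3 - q1}

# Hypothetical sample.
sample = [2, 4, 4, 5, 7, 9, 11]
spread = spread_summary(sample)
print(spread)
```

The columns of the table are linked in the same way: for the bimodal row, √4.20 ≈ 2.05, and max − min = 5.42 − (−3.02) = 8.44, which agrees with the reported range of 8.43 up to rounding.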

  17. Sample: Example, Toyota Sienna miles per gallon
      [Figure: "Boxplot of mpg" and "Histogram of mpg" (Frequency against mpg, mpg from 5 to 40).]

      summary(dd$mpg)
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        8.509  17.359  19.298  19.313  21.334  39.086

  18. Sample: Outliers
      Definition: An outlier is an observation that is distant from other observations.
      Sometimes, any observation below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR is called an outlier.
      [Figure: horizontal "Boxplot of mpg", mpg from 10 to 40.]
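
As a worked example of the 1.5 × IQR rule, the sketch below plugs in the quartiles from the mpg summary on the previous slide (Q1 = 17.359, Q3 = 21.334). The function names are hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3, k=1.5):
    """True if x lies below the lower cutoff or above the upper cutoff."""
    lower, upper = outlier_fences(q1, q3, k)
    return x < lower or x > upper

# Quartiles, minimum, and maximum from summary(dd$mpg).
q1, q3 = 17.359, 21.334
lower, upper = outlier_fences(q1, q3)
print(round(lower, 4), round(upper, 4))  # 11.3965 27.2965

print(is_outlier(8.509, q1, q3))   # minimum mpg -> True
print(is_outlier(39.086, q1, q3))  # maximum mpg -> True
print(is_outlier(19.298, q1, q3))  # median mpg  -> False
```

Under this rule both the smallest and the largest mpg values are flagged, since IQR = 3.975 puts the cutoffs at about 11.40 and 27.30.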

  19. Sample: Summary statistic choice
      The choice of an appropriate measure of location/spread depends on the shape of the distribution and the presence of outliers. Generally,
      symmetric with no outliers ⇒ mean and standard deviation
      skewed and/or outliers ⇒ median, IQR, 5-number summary
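
One way to see why outliers push the choice toward the median is to add a single extreme value to a small sample and compare how far each measure moves. The numbers below are made up (loosely on the mpg scale) and are not taken from the Sienna data.

```python
import statistics

# Hypothetical sample without an outlier, and the same sample with one added.
clean = [18, 19, 19, 20, 21]
with_outlier = clean + [39]

mean_shift = statistics.fmean(with_outlier) - statistics.fmean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)

print(round(mean_shift, 2))  # 3.27
print(median_shift)          # 0.5
```

The mean moves by more than three units while the median moves by half a unit, which is consistent with preferring the median (and IQR) when outliers are present.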
