MAT 110 ‐ Chapter 6

Math for Liberal Arts
MAT 110: Chapter 6 Notes
Putting Statistics to Work
David J. Gisch

Characterizing Data

Measures of Central Tendency

Average = Mean
• The mean is what we most commonly call the average value. It is defined as follows:
  $\bar{x} = \dfrac{\sum x}{n}$
  $\text{mean} = \dfrac{\text{sum of all the data values}}{\text{the number of data values}}$
• The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even).
• The mode is the most common value (or group of values) in a distribution.
  ▫ There can be multiple modes or no mode.
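The three definitions can be checked with a short Python sketch. It assumes Python's standard `statistics` module (the notes themselves do not use software) and applies it to the two data sets from the odd and even examples in these notes. The helper name `summarize` is only illustrative.

```python
from statistics import mean, median, multimode

def summarize(data):
    """Return (mean, median, modes) for a list of numbers."""
    # multimode returns every value tied for most frequent, so several modes
    # are all reported. If every value occurs once it returns all of them,
    # which the notes would describe as "no mode".
    return mean(data), median(data), multimode(data)

odd_set = [13, 16, 13, 18, 13, 13, 14, 21, 14]
even_set = [2, 10, 4, 5, 2, 4, 6, 5, 3, 6, 7, 16, 3, 3, 8, 10]

for name, data in (("odd set", odd_set), ("even set", even_set)):
    m, med, modes = summarize(data)
    print(f"{name}: mean = {m}, median = {med}, mode(s) = {modes}")
```

For the odd set this gives a mean of 15, a median of 14, and a mode of 13. For the even set it gives a mean of 5.875, a median of 5 (halfway between the two middle values, 5 and 5), and a mode of 3.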
An Odd Set of Data
Example: Use the given data to answer the following.
13, 16, 13, 18, 13, 13, 14, 21, 14
(a) What is the mean?
(b) What is the mode?
(c) What is the median?
Sorted: 13 13 13 13 14 14 16 18 21

An Even Set of Data
Example: Use the given data to answer the following.
2 10 4 5 2 4 6 5 3 6 7 16 3 3 8 10
(a) What is the mean?
(b) What is the mode?
(c) What is the median?
Sorted: 2 2 3 3 3 4 4 5 5 6 6 7 8 10 10 16

Example (On your own)
Example: Use the given data to answer the following.
5 11 4 3 6 4 6 3 3 6 6 16 3 3 8 11
(a) What is the mean?
(b) What is the mode?
(c) What is the median?

Outliers
• An outlier is a data value that is much higher or much lower than almost all other values.
Example: To see the effect of an outlier consider the salaries of people in this DMACC classroom.
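The effect of an outlier on the mean and the median can be illustrated with a minimal Python sketch. The salary figures below are invented purely for illustration (they are not from the notes), and the standard `statistics` module is assumed.

```python
from statistics import mean, median

# Hypothetical annual salaries (in dollars) for five people; made-up values.
salaries = [28_000, 31_000, 35_000, 40_000, 46_000]

# The same group after one person with an extreme salary (an outlier) joins.
with_outlier = salaries + [2_000_000]

print("without outlier: mean =", mean(salaries), "median =", median(salaries))
print("with outlier:    mean =", round(mean(with_outlier)),
      "median =", median(with_outlier))
```

With these made-up numbers the mean jumps from 36,000 to about 363,333, while the median only moves from 35,000 to 37,500. The single extreme value drags the mean far away from what a typical person in the group earns.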
Outliers
• To account for outliers we tend to use the median. The following are examples of data that are typically reported using the median.
  ▫ Income
  ▫ House Prices
  ▫ Age

Shapes of Distributions
[Figure: two single-peaked (unimodal) distributions and a double-peaked (bimodal) distribution]
Example: Give an example of a data set for each type of shape.

Shapes of Distributions (Symmetry)
A distribution is symmetric if its left half is a mirror image of its right half.
Example: Give an example of a data set that is symmetric.

Shapes of Distributions (Skewness)
A distribution is left-skewed if its values are more spread out on the left side.
Example: Give an example of a data set that is left-skewed.
Shapes of Distributions (Skewness)
A distribution is right-skewed if its values are more spread out on the right side.
Example: Give an example of a data set that is right-skewed.

Example
Example: For each scenario state if it is (unimodal or bimodal), (skew-left, skew-right, or symmetric), and whether you would use the (mean, median, or mode) to measure the center.
(a) You want to buy a house in a neighborhood and are interested in the prices of other houses in the area.
(b) Your office mates are collecting money for Milton’s (“have you seen my stapler”) birthday. You are trying to determine what to give.
(c) You are collecting data on the color of cars in the DMACC parking lot.

Variation (Spread)
Variation describes how widely data values are spread out about the center of a distribution.
[Figure: three distributions] From left to right, these three distributions have increasing variation.

Measures of Variation
Why Variation Matters
Example: Consider the following waiting times for 11 customers at 2 banks. Calculate the mean, median and mode for both.
Big Bank (three lines): 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Best Bank (one line): 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8
Which bank is likely to have more unhappy customers?

Measures of Variation (Spread)
• The range of a data set is the total spread of the data set:
  $\text{range} = \text{highest value (max)} - \text{lowest value (min)}$
• The lower quartile, Q1, (or first quartile) divides the lowest fourth of a data set from the upper three-fourths. It is the median of the data values in the lower half of a data set.
• The middle quartile, Q2, (or second quartile) is the overall median.
• The upper quartile, Q3, (or third quartile) divides the lower three-fourths of a data set from the upper fourth. It is the median of the data values in the upper half of a data set.

Quartiles (An Odd Set of Data)
Example: Use the given data to answer the following.
76, 65, 100, 85, 68, 70, 74, 87, 90, 80, 92
(a) What is the range?
(b) State the quartiles.
Sorted: 65 68 70 74 76 80 85 87 90 92 100

Quartiles (An Even Set of Data)
Example: Use the given data to answer the following.
2 10 4 5 2 4 6 5 3 6 7 16 3 3 8 10
(a) What is the range?
(b) State the quartiles.
Sorted: 2 2 3 3 3 4 4 5 5 6 6 7 8 10 10 16
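The range and quartile definitions above can be sketched in Python as follows. This is one possible reading of the median-of-halves method: it assumes that when the number of values is odd, the middle value is left out of both halves. That assumption matches the lower and upper quartiles the notes report for the bank data (5.6 and 8.5 for Big Bank). Library routines such as `statistics.quantiles` use other conventions and may give slightly different values. The function names are illustrative only.

```python
from statistics import median

def data_range(data):
    """Range = highest value (max) - lowest value (min)."""
    return max(data) - min(data)

def quartiles(data):
    """Return (Q1, Q2, Q3) as medians of the lower half, all data, upper half."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]        # lowest half (middle value excluded if n is odd)
    upper = values[n - half:]    # highest half (middle value excluded if n is odd)
    return median(lower), median(values), median(upper)

odd_set = [76, 65, 100, 85, 68, 70, 74, 87, 90, 80, 92]
even_set = [2, 10, 4, 5, 2, 4, 6, 5, 3, 6, 7, 16, 3, 3, 8, 10]
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

for name, data in (("odd set", odd_set), ("even set", even_set), ("Big Bank", big_bank)):
    print(name, "range =", data_range(data), "quartiles =", quartiles(data))
```

Under this convention the odd set has range 35 and quartiles 70, 80, 90, and the even set has range 14 and quartiles 3, 5, 7.5.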
The Five-Number Summary
• The five-number summary for a data set consists of the following five numbers:
  low value, lower quartile (Q1), median (Q2), upper quartile (Q3), high value
• A boxplot (or box-and-whiskers plot) shows the five-number summary visually, with a rectangular box enclosing the lower and upper quartiles, a line marking the median, and whiskers extending to the low and high values.
This boxplot is for Example 6.B.2 data.
65, 68, 70, 74, 76, 80, 85, 87, 90, 92, 100
[Figure: boxplot of this data with Low, Q1, Q2, Q3, and High marked, and the lower half and upper half labeled]

The Five-Number Summary
Five-number summary of the waiting times at each bank:
                      Big Bank    Best Bank
  low value (min)     4.1         6.6
  lower quartile      5.6         6.7
  median              7.2         7.2
  upper quartile      8.5         7.7
  high value (max)    11.0        7.8
The corresponding boxplot:
[Figure: boxplots of the Big Bank and Best Bank waiting times]

Quick Review
Use the given data to answer the following.
2 2 2 4 5 5 6 7 10 15
(a) What is the mean?
(b) What is the mode?
(c) What is the median?
(d) State the quartiles.
Sorted: 2 2 2 4 5 5 6 7 10 15

Standard Deviation
• The standard deviation is the single number most commonly used to describe variation.
  ▫ Think of the standard deviation as a way to take different types of data sets and convert them to a common unit of measure so we can compare their variation (spread).
  $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Here, $x$ is each data point, $\bar{x}$ is the mean, and $n$ is the number of data points. The symbol $\sum$, the capital Greek letter sigma, means to “add up” or “to sum up.”
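The formula above can be followed step by step in a short Python sketch. It is a minimal illustration, not the notes' own procedure: the function name `sample_std` and the optional table printout are assumptions made here. It is applied to the data set 2, 8, 9, 12, 19 used in the standard deviation example in these notes, and to the two sets of bank waiting times.

```python
from math import sqrt

def sample_std(data, show_table=False):
    """Standard deviation: sqrt( sum of (x - mean)^2, divided by (n - 1) )."""
    n = len(data)
    mean = sum(data) / n
    total = 0.0
    if show_table:
        print(f"{'x':>6} {'x - mean':>10} {'(x - mean)^2':>14}")
    for x in data:
        deviation = x - mean              # how far this value is from the mean
        total += deviation ** 2           # running sum of squared deviations
        if show_table:
            print(f"{x:>6} {deviation:>10.2f} {deviation ** 2:>14.2f}")
    if show_table:
        print(f"{'Total':>6} {'':>10} {total:>14.2f}")
    return sqrt(total / (n - 1))

example = [2, 8, 9, 12, 19]
print("s =", round(sample_std(example, show_table=True), 2))

big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]
print("Big Bank s =", round(sample_std(big_bank), 2))
print("Best Bank s =", round(sample_std(best_bank), 2))
```

For 2, 8, 9, 12, 19 the mean is 10, the squared deviations total 154, and the standard deviation is the square root of 154 / 4, about 6.20. The two banks have nearly the same mean waiting time, but the standard deviation is about 1.96 for Big Bank and about 0.44 for Best Bank, so Big Bank's waiting times are far more spread out.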
Standard Deviation
• To guide you, follow these instructions.
[Figure: step-by-step instructions for calculating the standard deviation]
• You will get a blank table and this exact image to use on the tests; see the next example.

Standard Deviation
Example: Calculate the standard deviation of the following data set.
2, 8, 9, 12, 19
  x (data value)    x – mean (deviation)    (deviation)²
  2
  8
  9
  12
  19
                                            Total

Standard Deviation
Example: Two car companies have the same mean (15 years) lifespan for their best selling sedan. However, Company A has a standard deviation of 1.2 years and Company B has a standard deviation of 3.1 years. Which is better?

The Normal Distribution
The Normal Distribution
• The normal distribution is a symmetric, bell-shaped distribution with a single peak. Its peak corresponds to the mean, median, and mode of the distribution.
[Figure: two normal distributions] Both sets of data (distributions) are normally distributed with a mean of 75, but the graph on the left has a larger variation (spread).

Conditions for a Normal Distribution
A data set satisfying the following criteria is likely to have a nearly normal distribution.
1. Most data values are clustered near the mean, giving the distribution a well-defined single peak.
2. Data values are spread evenly around the mean, making the distribution symmetric.
3. Larger deviations from the mean are increasingly rare, producing the tapering tails of the distribution.
4. Individual data values result from a combination of many different factors.

Normal Distributions?
Example: Describe each of the following distributions by their shapes.
(a) Scores on a very easy test.
(b) Shoe sizes of adult women.
(c) The weight of Doritos bags of the same size.
(d) The length of hair donated to “locks of love.”

The 68-95-99.7 Rule for a Normal Distribution
[Figure: the 68-95-99.7 rule for a normal distribution]
$\sigma$ is the lower-case Greek letter sigma. It is used for the standard deviation.
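The rule says that in a normal distribution about 68% of the data lies within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. A minimal Python sketch can confirm those percentages. It assumes the standard library's `statistics.NormalDist` (Python 3.8 or later), and `fraction_within` is an illustrative helper name.

```python
from statistics import NormalDist

# A normal distribution with mean 0 and standard deviation 1. The percentages
# are the same for any mean and standard deviation.
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of the data within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The computed values are close to 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7.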
Standard Deviation
• Notice that this means almost all data lies between 3 standard deviations above and below the mean.

Standard Deviation
Example: Two car companies have the same mean (15 years) lifespan for their best selling sedan. However, Company A has a standard deviation of 1.2 years and Company B has a standard deviation of 3.1 years. Which is better?
Company A: from $15 - 3(1.2) = 11.4$ to $15 + 3(1.2) = 18.6$, centered at 15
Company B: from $15 - 3(3.1) = 5.7$ to $15 + 3(3.1) = 24.3$, centered at 15

68-95-99.7 Rule
Example: A data set is normally distributed with a mean of 84 and a standard deviation of 6. Use the 68-95-99.7 rule to answer each of the following.
(a) 68% of the data lies between?
(b) 95% of the data lies between?
(c) 99.7% of the data lies between?

68-95-99.7 Rule
Example: A data set is normally distributed with a mean of 15 and a standard deviation of 3. Using the 68-95-99.7 rule, what percent of the data lies below 15?
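The intervals asked for in the mean 84, standard deviation 6 example can be worked out with a small Python sketch. The function name `rule_intervals` is hypothetical. It simply steps 1, 2, and 3 standard deviations to either side of the mean.

```python
def rule_intervals(mean, sd):
    """Return {percent: (low, high)} for the 68-95-99.7 rule."""
    # 68% -> within 1 sd, 95% -> within 2 sd, 99.7% -> within 3 sd of the mean
    steps = ((1, 68), (2, 95), (3, 99.7))
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in steps}

for pct, (low, high) in rule_intervals(84, 6).items():
    print(f"{pct}% of the data lies between {low} and {high}")
```

This gives 78 to 90 for 68%, 72 to 96 for 95%, and 66 to 102 for 99.7%. For the second example no calculation is needed: a normal distribution is symmetric about its mean, so 50% of the data lies below the mean of 15.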