Chapter 4 Numerical Methods for Describing Data
Population characteristic
• Fixed value about a population
• Typically unknown
Suppose we want to know the MEAN length of all the fish in Lake Lewisville . . . Is this a value that is known? Can we find it out? At any given point in time, how many values are there for the mean length of fish in the lake?
Statistic
• Value calculated from a sample
Suppose we want to know the MEAN length of all the fish in Lake Lewisville. What can we do to estimate this unknown population characteristic?
Measures of Central Tendency
• Mode – the observation that occurs the most often
  – Can be more than one mode
  – If all values occur only once, there is no mode
  – Not used as often as mean & median
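The mode rules above can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the function name `modes` and the fish lengths are made up for the example, and the "no mode" case is returned as an empty list.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # all values occur only once: there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical fish lengths (inches), made up for illustration only.
print(modes([3, 4, 4, 5, 6, 6]))  # [4, 6]  (more than one mode)
print(modes([3, 4, 5, 6]))        # []      (no mode)
```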
Measures of Central Tendency
• Median – the middle value of the data; it divides the observations in half
To find: list the observations in numerical order, then
  sample median = the single middle value if n is odd
  sample median = the average of the two middle values if n is even
where n = sample size
Suppose we catch a sample of 5 fish from the lake. The lengths of the fish (in inches) are listed below. Find the median length of fish.
3 4 5 8 10
The numbers are in order & n is odd, so find the middle observation. The median length of fish is 5 inches.
Suppose we caught a sample of 6 fish from the lake. The median length is …
3 4 5 6 8 10
The numbers are in order & n is even, so find the middle two observations (5 and 6). Now, average these two values: (5 + 6)/2 = 5.5. The median length is 5.5 inches.
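The odd/even rule for the median can be written out directly. The sketch below assumes a hypothetical helper name, `sample_median`, and checks it against the two fish samples from the slides.

```python
def sample_median(values):
    """Middle value if n is odd; average of the two middle values if n is even."""
    ordered = sorted(values)  # list the observations in numerical order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

five_fish = [3, 4, 5, 8, 10]
six_fish = [3, 4, 5, 6, 8, 10]
print(sample_median(five_fish))  # 5
print(sample_median(six_fish))   # 5.5
```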
Measures of Central Tendency
• Mean is the arithmetic average.
  – Use $\mu$ to represent a population mean (a population characteristic). $\mu$ is the lower case Greek letter mu.
  – Use $\bar{x}$ to represent a sample mean (a statistic).
Formula: $\bar{x} = \dfrac{\sum x}{n}$
$\Sigma$ is the capital Greek letter sigma; it means to sum the values that follow.
Suppose we caught a sample of 6 fish from the lake. Find the mean length of the fish.
3 4 5 6 8 10
To find the mean length of fish, add the observations and divide by n.
$\bar{x} = \dfrac{3 + 4 + 5 + 6 + 8 + 10}{6} = \dfrac{36}{6} = 6$
Now find how each observation deviates from the mean. This is the deviation from the mean, $x - \bar{x}$.

  x     $x - \bar{x}$
  3     3 - 6 = -3
  4     -2
  5     -1
  6      0
  8      2
  10     4
  Sum    0

Find the rest of the deviations from the mean. What is the sum of the deviations from the mean? 0
Will this sum always equal zero? YES
The mean is considered the balance point of the distribution because it “balances” the positive and negative deviations.
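The mean and its deviations for the six fish can be checked with a short script. The variable names are illustrative. With other data, floating-point rounding can leave a sum that is only approximately zero.

```python
lengths = [3, 4, 5, 6, 8, 10]

n = len(lengths)
x_bar = sum(lengths) / n                    # sample mean: sum of x divided by n
deviations = [x - x_bar for x in lengths]   # each observation minus the mean

print(x_bar)            # 6.0
print(deviations)       # [-3.0, -2.0, -1.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0
```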
Imagine a ruler with pennies placed at 3”, 4”, 5”, 6”, 8” and 10”. To balance the ruler on your finger, you would need to place your finger at the mean of 6. The mean is the balance point of a distribution.
What happens to the median & mean if the length of 10 inches was 15 inches?
3 4 5 6 8 15
The median is . . . 5.5
The mean is . . . $\dfrac{3 + 4 + 5 + 6 + 8 + 15}{6} = 6.833$
What happened?
What happens to the median & mean if the 15 inches was 20?
3 4 5 6 8 20
The median is . . . 5.5
The mean is . . . $\dfrac{3 + 4 + 5 + 6 + 8 + 20}{6} = 7.667$
What happened?
Some statistics are not affected by extreme values . . . these are called resistant.
Is the median affected by extreme values? NO
Is the mean affected by extreme values? YES
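To see resistance numerically, the sketch below recomputes both measures as the largest fish grows from 10 to 15 to 20 inches. The helper name `center_summary` is an assumption for this example; `statistics.median` is from the Python standard library.

```python
from statistics import median

def center_summary(sample):
    """Return (median, mean rounded to 3 decimals) for a sample."""
    return median(sample), round(sum(sample) / len(sample), 3)

base = [3, 4, 5, 6, 8]  # the five fish that never change
for largest in (10, 15, 20):
    med, avg = center_summary(base + [largest])
    print(f"largest = {largest}: median = {med}, mean = {avg}")
# largest = 10: median = 5.5, mean = 6.0
# largest = 15: median = 5.5, mean = 6.833
# largest = 20: median = 5.5, mean = 7.667
```

The median stays at 5.5 in every case, while the mean follows the extreme value upward.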
Suppose we caught a sample of 20 fish with the following lengths. Create a histogram for the lengths of fish. (Use a class width of 1.)
3 5 6 10 6 7 7 8 4 5 6 4 7 5 9 9 8 7 6 8
Calculate the mean and median.
Mean = 6.5
Median = 6.5
Look at the placement of the mean and median in this symmetrical distribution.
Suppose we caught a sample of 20 fish with the following lengths. Create a histogram for the lengths of fish. (Use a class width of 1.)
3 5 6 10 15 7 3 3 4 5 6 4 12 5 3 4 8 13 11 9
Calculate the mean and median.
Mean = 6.8
Median = 5.5
Look at the placement of the mean and median in this skewed distribution.
Suppose we caught a sample of 20 fish with the following lengths. Create a histogram for the lengths of fish. (Use a class width of 1.)
3 5 6 10 10 7 10 8 9 5 6 4 9 10 9 9 10 7 10 8
Calculate the mean and median.
Mean = 7.75
Median = 8.5
Look at the placement of the mean and median in this skewed distribution.
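The three 20-fish samples can be compared side by side. In this sketch the dictionary keys and the helper `mean_and_median` are hypothetical names. It only reports where the mean sits relative to the median and does not draw the histograms.

```python
from statistics import median

samples = {
    "sample_1": [3, 5, 6, 10, 6, 7, 7, 8, 4, 5, 6, 4, 7, 5, 9, 9, 8, 7, 6, 8],
    "sample_2": [3, 5, 6, 10, 15, 7, 3, 3, 4, 5, 6, 4, 12, 5, 3, 4, 8, 13, 11, 9],
    "sample_3": [3, 5, 6, 10, 10, 7, 10, 8, 9, 5, 6, 4, 9, 10, 9, 9, 10, 7, 10, 8],
}

def mean_and_median(values):
    """Return (mean, median) of a sample."""
    return sum(values) / len(values), median(values)

for name, values in samples.items():
    avg, med = mean_and_median(values)
    if avg > med:
        note = "mean is above the median"
    elif avg < med:
        note = "mean is below the median"
    else:
        note = "mean equals the median"
    print(f"{name}: mean = {avg:.2f}, median = {med}, {note}")
```

In the second sample the long tail of large fish pulls the mean above the median. In the third the tail of small fish pulls it below.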
Recap:
• In a symmetrical distribution, the mean and median are equal.
• In a skewed distribution, the mean is pulled in the direction of the skewness.
• In a symmetrical distribution, you should report the mean!
• In a skewed distribution, the median should be reported as the measure of center!
Trimmed mean: Purpose is to remove outliers from a data set
To calculate a trimmed mean:
• Multiply the percent to trim by n
• Truncate that many observations from BOTH ends of the distribution (when listed in order)
• Calculate the mean with the shortened data set
Find the mean of the following set of data.
12 14 19 20 22 24 25 26 26 50
Mean = 23.8
Find a 10% trimmed mean.
10%(10) = 1, so remove one observation from each side!
$\bar{x}_T = \dfrac{14 + 19 + 20 + 22 + 24 + 25 + 26 + 26}{8} = 22$
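The trimming steps can be sketched as follows. The function name `trimmed_mean` and the choice to pass the trim amount as a whole-number percent (so the count is computed with integer arithmetic) are assumptions made for this example.

```python
def trimmed_mean(values, trim_percent):
    """Drop trim_percent% of the observations from EACH end, then average the rest."""
    ordered = sorted(values)
    n = len(ordered)
    k = (trim_percent * n) // 100  # number to remove from each end, truncated
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

data = [12, 14, 19, 20, 22, 24, 25, 26, 26, 50]
print(sum(data) / len(data))   # 23.8  (ordinary mean)
print(trimmed_mean(data, 10))  # 22.0  (12 and 50 removed)
```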
What values are used to describe categorical data?
Suppose that each person in a sample of 15 cell phone users is asked if he or she is satisfied with the cell phone service. What would be the possible responses?
Here are the responses:
Y N Y Y Y N N Y Y N Y Y Y N N
Find the sample proportion of the people who answered “yes”:
$\hat{p} = \dfrac{\text{number of successes}}{n} = \dfrac{9}{15} = 0.6$ ($\hat{p}$ is pronounced p-hat)
60% of the sample was satisfied with their cell phone service.
The population proportion is denoted by the letter p.
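The sample proportion is a count divided by the sample size, which a few lines of Python can confirm. The variable names are illustrative.

```python
responses = "Y N Y Y Y N N Y Y N Y Y Y N N".split()

n = len(responses)
successes = responses.count("Y")  # number of "yes" answers
p_hat = successes / n             # sample proportion

print(successes, n)      # 9 15
print(round(p_hat, 3))   # 0.6
```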
Why is the study of variability important?
• There is variability in virtually everything
• Allows us to distinguish between usual & unusual values
• Reporting only a measure of center doesn’t provide a complete picture of the distribution.
Does this can of soda contain exactly 12 ounces?
[Figure: three dotplots, each on an axis running from 20 to 70]
Notice that these three data sets all have the same mean and median (at 45), but they have very different amounts of variability.
Measures of Variability
The simplest numeric measure of variability is range.
Range = largest observation – smallest observation
[Figure: the same three dotplots, each on an axis running from 20 to 70]
The first two data sets have a range of 50 (70 - 20) but the third data set has a much smaller range of 10.
Measures of Variability
Another measure of the variability in a data set uses the deviations from the mean ($x - \bar{x}$).
Remember the sample of 6 fish that we caught from the lake . . . They were the following lengths: 3”, 4”, 5”, 6”, 8”, 10”
The mean length was 6 inches. Recall that we calculated the deviations from the mean. What was the sum of these deviations?
Can we find an average deviation? What can we do to the deviations so that we could find an average?
The estimated average of the deviations squared is called the variance.
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
The denominator, $n - 1$, is the degrees of freedom. Population variance is denoted by $\sigma^2$.
Suppose that everyone in the class caught a sample of 6 fish from the lake. Would each of our samples contain the same fish? Would our mean lengths be the same? The samples would also have different ranges!
When calculating sample variance, we use degrees of freedom (n – 1) in the denominator instead of n because this tends to produce better estimates. Degrees of freedom will be revisited again in Chapter 8.
Remember the sample of 6 fish that we caught from the lake . . . Find the variance of the length of fish.
The average of the deviations would always equal 0! First square the deviations.

  x     $x - \bar{x}$    $(x - \bar{x})^2$
  3     -3               9
  4     -2               4
  5     -1               1
  6      0               0
  8      2               4
  10     4               16
  Sum    0               34

What is the sum of the deviations squared? 34
Divide this by 5.
$s^2 = \dfrac{34}{5} = 6.8$
Measures of Variability
The square root of variance is called standard deviation. A typical deviation from the mean is the standard deviation.
$s^2 = 6.8$ inches$^2$, so $s = \sqrt{6.8} \approx 2.608$ inches
The fish in our sample deviate from the mean of 6 by an average of 2.608 inches.
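The variance and standard deviation of the six fish can be computed step by step, mirroring the hand calculation. The variable names are illustrative, and only the standard library is used.

```python
import math

lengths = [3, 4, 5, 6, 8, 10]

n = len(lengths)
x_bar = sum(lengths) / n
squared_deviations = [(x - x_bar) ** 2 for x in lengths]

sample_variance = sum(squared_deviations) / (n - 1)  # divide by degrees of freedom
sample_sd = math.sqrt(sample_variance)

print(sum(squared_deviations))    # 34.0
print(round(sample_variance, 3))  # 6.8
print(round(sample_sd, 3))        # 2.608
```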
The most commonly used measures of center and variability are the mean and standard deviation, respectively.
Calculation of standard deviation of a sample:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Population standard deviation is denoted by $\sigma$ (where n is used in the denominator).
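Python's standard `statistics` module provides both versions: `stdev` divides by n - 1 and `pstdev` divides by n. The comparison below treats the six fish as an entire population purely for illustration. In the slides they are a sample.

```python
from statistics import pstdev, stdev

lengths = [3, 4, 5, 6, 8, 10]

s = stdev(lengths)       # sample standard deviation, n - 1 in the denominator
sigma = pstdev(lengths)  # population standard deviation, n in the denominator

print(round(s, 3))      # 2.608
print(round(sigma, 3))  # 2.38
```

Dividing by n gives the smaller value.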
Measures of Variability
Interquartile range (iqr) is the range of the middle half of the data.
Lower quartile (Q1) is the median of the lower half of the data
Upper quartile (Q3) is the median of the upper half of the data
iqr = Q3 – Q1
What advantage does the interquartile range have over the standard deviation? The iqr is resistant to extreme values.
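The quartile definitions can be sketched as medians of the two halves. The function names are hypothetical. The rule used when n is odd (leave the middle observation out of both halves) is an assumption, since the slide does not say how to split an odd-sized sample and other conventions exist.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # Assumption: when n is odd, the middle observation is in neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)

def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1

print(quartiles([3, 4, 5, 6, 8, 10]))  # (4, 8)
print(iqr([3, 4, 5, 6, 8, 10]))        # 4
print(iqr([3, 4, 5, 6, 8, 20]))        # 4
```

Changing the largest fish from 10 to 20 inches leaves the iqr at 4.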