THE REVISION OF SOME CONCEPTS…
Summary Statistics
Summary statistics describe a numeric (quantitative) set of data by its:
• Center
• Variability
• Shape
It is important to consider whether the data are:
• Non-normal: median, range
• Normal: mean, variance, standard deviation
Data Summarization
To summarize quantitative data, we need one or two parameters that describe the data:
1. Measures of central tendency, which describe the center of the data
2. Measures of dispersion, which show how the data are scattered around the center
Measures of central tendency
A variable usually has a point (center) around which the observed values lie. The averages that locate this point are called measures of central tendency. The three most commonly used averages are:
1. The arithmetic mean
2. The median
3. The mode
1- The arithmetic mean
The sum of the observations divided by the number of observations:
• $\bar{x} = \frac{\sum x_i}{n}$
Where:
$\bar{x}$ = the mean
$\sum$ denotes "the sum of"
$x_i$ = the value of each observation
$n$ = the number of observations
2- Median
It is the middle observation in a series of observations after arranging them in ascending or descending order.
• The rank of the median is (n + 1)/2 if the number of observations is odd.
• If the number of observations is even, the median is the average of the observations ranked n/2 and n/2 + 1.
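The rank rule for the median can be sketched in a few lines of Python. The function name `median_by_rank` is hypothetical, and for an even number of observations the sketch assumes the usual convention of averaging the two middle observations.

```python
def median_by_rank(values):
    """Return the median by locating the middle rank(s) of the sorted data."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        rank = (n + 1) // 2           # 1-based rank of the middle observation
        return ordered[rank - 1]      # convert to a 0-based list index
    lower_rank = n // 2               # 1-based rank of the lower middle value
    # Assumed convention: average the observations ranked n/2 and n/2 + 1.
    return (ordered[lower_rank - 1] + ordered[lower_rank]) / 2


print(median_by_rank([5, 6, 7, 5, 10]))  # odd n: rank (5 + 1)/2 = 3 -> 6
print(median_by_rank([3, 1, 4, 2]))      # even n: (2 + 3)/2 -> 2.5
```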
3- Mode
• The mode is the most frequently occurring value in the data.
Example: 5, 6, 7, 5, 10. The mode of these data is 5, since 5 occurs twice.
Sometimes there is more than one mode, and sometimes there is no mode, especially in small sets of observations.
Unimodal - Bimodal - No mode
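As a quick check of the three averages, here is a minimal sketch using Python's standard `statistics` module on the example values 5, 6, 7, 5, 10. The second data set is made up only to illustrate a bimodal case.

```python
from statistics import mean, median, multimode

data = [5, 6, 7, 5, 10]        # example values from the mode slide

print(mean(data))              # sum of observations / n -> 6.6
print(median(data))            # middle value after sorting -> 6
print(multimode(data))         # all most frequent values -> [5]

bimodal = [1, 1, 2, 2, 3]      # hypothetical data with two modes
print(multimode(bimodal))      # -> [1, 2]
```

`multimode` returns a list, so unimodal and bimodal data are handled the same way.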
Advantages and disadvantages of Central Tendency Measures (CTM):
• Mean: the preferred CTM since it takes into account each individual observation, but its main disadvantage is that it is affected by extreme values.
• Median: a useful descriptive measure if there are one or two extremely high or low values. The median is less sensitive to outliers (extreme scores) than the mean and is thus a better measure than the mean for highly skewed distributions.
• Mode: rarely used.
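The sensitivity of the mean to extreme values can be illustrated with a small sketch. The data values here are invented for illustration only.

```python
from statistics import mean, median

values = [4, 5, 6, 7, 8]          # hypothetical data without outliers
with_outlier = values + [100]     # the same data plus one extreme value

print(mean(values), median(values))              # both are 6
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 21.67, median is 6.5
```

One extreme observation more than triples the mean, while the median barely moves.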
Measures of Dispersion
• A measure of dispersion describes the degree of variation (scatter) of the data around its central value:
1. Range (R)
2. Variance (V)
3. Standard Deviation (SD)
dispersion = variation = spread = scatter
1- Range
• It is the difference between the largest and smallest values.
• It is the simplest measure of variation.
• Disadvantages: it is based on only two of the observations and gives no idea of how the other observations are arranged between them. It also tends to grow as the sample size increases.
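A minimal sketch of the range and of its main disadvantage follows. The helper name `data_range` and both data sets are hypothetical.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


clustered = [1, 5, 5, 5, 9]   # most values sit at the center
spread = [1, 1, 5, 9, 9]      # values pile up at the extremes

# Both data sets have the same range even though they are arranged
# very differently between the minimum and the maximum.
print(data_range(clustered))  # -> 8
print(data_range(spread))     # -> 8
```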
2- Variance
If we want the average difference between the mean and each observation in the data, we subtract each value from the mean, sum these differences, and divide by the number of observations:
$V = \frac{\sum (\bar{x} - x_i)}{n}$, where $\bar{x}$ is the mean.
2- Variance
• First attempt: $V = \frac{\sum (\bar{x} - x_i)}{n}$, where $\bar{x}$ is the mean.
• The value of this expression is always zero, because the differences between each value and the mean have negative and positive signs that cancel out when summed.
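The cancellation of positive and negative differences is easy to verify numerically. This is a small sketch with made-up values chosen so that the mean is a whole number.

```python
data = [2, 4, 6, 8, 10]               # hypothetical observations
data_mean = sum(data) / len(data)     # 30 / 5 = 6.0

deviations = [data_mean - x for x in data]
print(deviations)                     # [4.0, 2.0, 0.0, -2.0, -4.0]
print(sum(deviations))                # positives and negatives cancel -> 0.0
```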
2- Variance
• To overcome this zero sum, we square the difference between the mean and each value, so the sign is always positive.
• For a sample, the sum of squares is divided by n - 1 rather than n. Thus we get: $V = \frac{\sum (\bar{x} - x_i)^2}{n - 1}$, where $\bar{x}$ is the mean.
3- Standard Deviation (SD)
The main disadvantage of the variance is that it is expressed in the square of the units used. So it is more convenient to express the variation in the original units by taking the square root of the variance. This is called the standard deviation (SD).
Therefore $SD = \sqrt{V}$, i.e. $SD = \sqrt{\frac{\sum (\bar{x} - x_i)^2}{n - 1}}$, where $\bar{x}$ is the mean.
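The variance and SD formulas with the n - 1 denominator can be worked through by hand and compared with Python's `statistics.variance` and `statistics.stdev`, which use the same sample definition. The data values are illustrative only.

```python
from math import sqrt
from statistics import stdev, variance

data = [2, 4, 6, 8, 10]                       # hypothetical observations
n = len(data)
data_mean = sum(data) / n                     # 6.0

# Sum of squared differences from the mean: 16 + 4 + 0 + 4 + 16 = 40
sum_sq = sum((data_mean - x) ** 2 for x in data)

v = sum_sq / (n - 1)                          # sample variance: 40 / 4 = 10.0
sd = sqrt(v)                                  # back in the original units

print(v, sd)                                  # 10.0 and about 3.162
print(variance(data), stdev(data))            # library results agree
```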
Summary Statistics and Normal data
Summary statistics are useful for identifying whether data are normal or not.
Normal data: approximately 95% of observations lie within the mean plus or minus 2 standard deviations.
Normal Distribution Curve (NDC)
The NDC is a graphical presentation (frequency polygon) of a quantitative variable measured in a large number of observations. It plays a major role in the techniques of statistical analysis.
Areas under the NDC
• Mean ± 1 SD covers about 68% of the area under the curve.
• Mean ± 2 SD covers about 95% of the area under the curve.
• Mean ± 3 SD covers about 99.7% of the area under the curve.
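The areas within 1, 2 and 3 SD of the mean can be checked with `statistics.NormalDist` from the Python standard library. The helper name `area_within` is hypothetical; because the areas depend only on the number of SDs, a standard normal curve (mean 0, SD 1) is used.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal curve


def area_within(k):
    """Fraction of the total area lying within k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))   # about 0.6827, 0.9545, 0.9973
```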
Characteristics of NDC
1- It is a bell-shaped, continuous curve.
2- It is symmetrical (i.e., it can be divided vertically into two equal halves).
3- The tails never touch the base line but extend to infinity in either direction.
4- The mean, median and mode values coincide.
5- It is described by two parameters: the arithmetic mean determines the location of the center of the curve, and the standard deviation represents the scatter around the mean.
NDC and Skewed data
• If we represent collected data by a frequency polygon and the resulting curve does not resemble the normal distribution curve (with all its characteristics), then these data are not normally distributed.
Skewness and Kurtosis
Skewness: measures the asymmetry of the data
• Positive or right skewed: longer right tail
• Negative or left skewed: longer left tail
Kurtosis: measures the peakedness of the distribution of the data. The excess kurtosis of the normal distribution is 0 (its raw kurtosis is 3).
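Skewness and kurtosis can be computed from central moments. The sketch below uses the simple population (moment-based) definitions, which is only one of several conventions in use; statistical packages often apply small-sample corrections, so their values may differ slightly. All function names and data are hypothetical.

```python
def central_moment(values, order):
    """Average of (x - mean) raised to the given power."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** order for x in values) / n


def skewness(values):
    """Third central moment scaled by the 1.5 power of the second."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over squared second moment, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]            # long right tail
left_skewed = [-x for x in right_skewed]      # mirror image: long left tail

print(skewness(symmetric))      # 0.0 for perfectly symmetric data
print(skewness(right_skewed))   # positive
print(skewness(left_skewed))    # negative
print(excess_kurtosis(symmetric))
```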
NDC and normal measurement
The NDC can be used to distinguish normal from abnormal measurements.
Example: Suppose we have the NDC of hemoglobin levels for a population of normal adult males with mean ± SD = 11 ± 1.5. We obtain a hemoglobin reading of 8.1 for an individual and want to know whether he/she is normal or anemic.
If this reading falls within the central 95% of the area under the curve (i.e. mean ± 2 SD), he/she is considered normal. If the reading is below the lower limit, he/she is anemic.
NDC and normal measurement
The normal range for hemoglobin in this example is:
• the upper hemoglobin level: 11 + 2(1.5) = 14
• the lower hemoglobin level: 11 - 2(1.5) = 8
The normal range of hemoglobin for adult males is therefore from 8 to 14.
The reading of 8.1 lies within the 95% range of this population, so this individual is considered normal.
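The hemoglobin example can be reproduced with a short sketch. The function names `normal_range` and `is_within_normal_range` are hypothetical; the mean, SD and reading are the values from the example.

```python
def normal_range(mean, sd, k=2):
    """Return (lower, upper) limits at mean +/- k SD (k = 2 covers about 95%)."""
    return mean - k * sd, mean + k * sd


def is_within_normal_range(reading, mean, sd, k=2):
    """True if the reading lies inside mean +/- k SD."""
    lower, upper = normal_range(mean, sd, k)
    return lower <= reading <= upper


lower, upper = normal_range(mean=11, sd=1.5)
print(lower, upper)                                   # 8.0 14.0
print(is_within_normal_range(8.1, mean=11, sd=1.5))   # True: inside the range
print(is_within_normal_range(7.9, mean=11, sd=1.5))   # False: below the lower limit
```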
How to test for Normality
• Mean = Median (approximately)
• (mean - 2 SD, mean + 2 SD) gives a reasonable range
• -1 < skewness < 1
• -1 < kurtosis < 1
• Histogram shows a symmetric bell shape
If data are not normal:
• A natural log transformation can transform very skewed data to 'Normal' data; use the transformed data in the analysis.
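The effect of a natural log transformation on skewed data can be sketched as follows. The data are invented (powers of two, which become evenly spaced after taking logs), and the moment-based `skewness` helper is a hypothetical implementation using the simple population definition.

```python
from math import log


def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


raw = [1, 2, 4, 8, 16, 32, 64]        # hypothetical, strongly right-skewed data
logged = [log(x) for x in raw]        # natural log of each observation

skew_raw = skewness(raw)
skew_log = skewness(logged)

print(skew_raw)                       # greater than 1: fails the -1 < skewness < 1 check
print(skew_log)                       # essentially 0: symmetric after the transformation
print(-1 < skew_log < 1)              # True
```

Real data will rarely become perfectly symmetric, so the same check should be repeated on the transformed values before relying on them.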
Use the tool at http://onlinestatbook.com/stat_sim/sampling_dist/index.html to check the characteristics of the sampling distribution of the mean.