Univariate Continuous Data
MATH 185 – Introduction to Computational Statistics
University of California, San Diego
Instructor: Ery Arias-Castro
http://math.ucsd.edu/~eariasca/math185.html
Lung dysfunction in workers in the detergent industry
We consider the following dataset (quoted in Larsen & Marx, exercise 5.3.1).
> str(bacillus)
'data.frame':	19 obs. of  1 variable:
 $ ratio: num  0.61 0.7 0.63 0.76 0.67 0.72 0.64 0.82 0.88 0.82 ...
The FEV1/VC ratio is a measure of lung capacity.
• FEV1: forced expiratory volume
• VC: vital capacity
• Normal FEV1/VC ratio is 0.80
This ratio was measured for certain workers in the detergent industry exposed to a Bacillus subtilis enzyme.
Boxplot
A basic plot that helps visualize how the data is spread out.
[Figure: boxplot of ratio; vertical axis from 0.60 to 0.85]
Boxplot
• The middle box represents the inter-quartile range (IQR) and contains the middle 50% of the data.
• The upper edge (hinge) of the box indicates the 75th percentile of the data set.
• The lower hinge indicates the 25th percentile.
• The line within the box indicates the median of the data.
• The ends of the vertical lines, or "whiskers", indicate the minimum and maximum data values, unless outliers are present, in which case each whisker ends at the most extreme data value lying within 1.5 times the inter-quartile range of the box.
• The points outside the ends of the whiskers are outliers or suspected outliers.
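The box and whisker rules above can be checked numerically. The following is a minimal sketch, assuming only the first ten of the 19 values (those visible in the str() output) stored under the hypothetical name ratio_head, so its numbers will not match the plot of the full dataset. Note that boxplot() works with hinges, which can differ slightly from the quartiles reported by quantile().

```r
# First ten of the 19 observations, as shown in the str() output;
# the other nine are elided in the source, so this is illustrative only.
ratio_head <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)

# boxplot.stats() returns the five numbers that boxplot() draws:
# lower whisker end, lower hinge, median, upper hinge, upper whisker end.
bs <- boxplot.stats(ratio_head, coef = 1.5)
hinges <- bs$stats[c(2, 4)]
box_height <- diff(hinges)   # inter-quartile range based on the hinges

# Each whisker ends at the most extreme observation inside these fences.
fences <- c(hinges[1] - 1.5 * box_height, hinges[2] + 1.5 * box_height)
inside <- ratio_head[ratio_head >= fences[1] & ratio_head <= fences[2]]

print(bs$stats)        # five-number summary used for the plot
print(range(inside))   # whisker ends recomputed by hand
print(bs$out)          # observations flagged as (suspected) outliers
```

Calling boxplot(ratio_head) afterwards draws the corresponding plot.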
Boxplot
The following table provides similar information.
> summary(ratio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.6100  0.7100  0.7800  0.7663  0.8350  0.8800
Histogram
The histogram is more detailed: it approximates the actual distribution of the data.
[Figure: "Histogram of ratio"; horizontal axis: ratio, from 0.60 to 0.90; vertical axis: Frequency, from 0 to 7]
Histogram
• A bin's width is the range it covers.
• A bin's height is proportional to the number of points that fall within that range.
• Once rescaled to have total area 1, the histogram is a (piecewise constant) approximation of the population's probability density function.
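To make the link between bin heights and a density concrete, here is a small sketch. It assumes the first ten values from the str() output (hypothetical name ratio_head) and bins of width 0.05, a choice made for illustration rather than taken from the slides.

```r
# First ten of the 19 observations (the rest are elided in the source).
ratio_head <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)

# Compute the histogram without drawing it; the breaks are an assumption.
h <- hist(ratio_head, breaks = seq(0.60, 0.90, by = 0.05), plot = FALSE)
widths <- diff(h$breaks)

print(h$counts)    # number of points per bin (bar heights on the frequency scale)
print(h$density)   # counts / (n * width): bar heights on the density scale

# On the density scale the bar areas add up to 1, as for a probability density.
total_area <- sum(h$density * widths)
print(total_area)
```

Passing freq = FALSE to hist() draws the bars on the density scale instead of the frequency scale.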
Boxplot v. Histogram
> library(UsingR)
> simple.hist.and.boxplot(ratio)
[Figure: "Histogram of x" displayed together with a boxplot of the same data; axes range from about 0.60 to 0.90]
Main question
• Are workers exposed to a Bacillus subtilis enzyme more likely to suffer from lung dysfunction?
• Since we know the normal level of the FEV1/VC ratio (0.80), we want to compare the observations to this baseline.
Testing the Mean – Student's t-Test
• Let $\mu$ be the population mean.
• Consider $H_0: \mu = 0.80$ versus $H_1: \mu < 0.80$.
• The Student t-test rejects for large negative values of $T$, where
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2,$$
and $\mu_0 = 0.80$ is the value of $\mu$ under $H_0$.
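The statistic can be computed directly from its definition and compared with t.test(). This is a sketch on the first ten values from the str() output (hypothetical name ratio_head), so the numbers differ from those obtained on the full sample of 19.

```r
# First ten of the 19 observations (the rest are elided in the source).
ratio_head <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
mu0 <- 0.80                      # mean under the null hypothesis

n <- length(ratio_head)
xbar <- mean(ratio_head)         # sample mean
s <- sd(ratio_head)              # sample standard deviation (n - 1 denominator)
t_stat <- (xbar - mu0) / (s / sqrt(n))

# One-sided p-value: probability that a t variable with n - 1 degrees of
# freedom falls at or below the observed statistic.
p_less <- pt(t_stat, df = n - 1)

# The same test through the built-in function.
fit <- t.test(ratio_head, mu = mu0, alternative = "less")
print(c(by_hand = t_stat, built_in = unname(fit$statistic)))
print(c(by_hand = p_less, built_in = fit$p.value))
```

The argument alternative = "less" selects the one-sided alternative stated above; the default is the two-sided test.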
Testing the Mean – Student's t-Test
> t.test(ratio, mu = 0.8)

	One Sample t-test

data:  ratio
t = -1.7091, df = 18, p-value = 0.1046
alternative hypothesis: true mean is not equal to 0.8
95 percent confidence interval:
 0.7249096 0.8077219
sample estimates:
mean of x
0.7663158

The p-value is based on the assumption that the observations are an i.i.d. sample from a normal distribution. Note that this call performs the two-sided test; for the one-sided alternative $H_1: \mu < 0.80$, pass alternative = "less", which halves the p-value here because t is negative.
Quantile-Quantile Plot
Helps visualize whether a sample comes from a given translation/scale family of distributions. Here we compare with the normal family.
[Figure: "Normal Q−Q Plot"; horizontal axis: Theoretical Quantiles, from −2 to 2; vertical axis: Sample Quantiles, from 0.60 to 0.85]
If the points lie close to the line, then the normality assumption is reasonable.
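A plot of this kind can be produced with qqnorm() and qqline(). The sketch below uses the first ten values from the str() output (hypothetical name ratio_head) and draws only in an interactive session; the correlation it prints is an informal numerical companion to the plot, not something used in the slides.

```r
# First ten of the 19 observations (the rest are elided in the source).
ratio_head <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)

# Coordinates of the plotted points: theoretical normal quantiles (x)
# against the observed values (y).
qq <- qqnorm(ratio_head, plot.it = FALSE)

# Points close to a straight line give a correlation close to 1.
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)

if (interactive()) {
  qqnorm(ratio_head)   # draw the points
  qqline(ratio_head)   # reference line through the first and third quartiles
}
```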
Wilcoxon Sign Test
• Let $m$ be the population median.
• Consider $H_0: m = m_0$ versus $H_1: m < m_0$.
  – The Wilcoxon sign test rejects for small values of $N_+$, where $N_+ = \#\{i : X_i > m_0\}$.
  – In fact, for large $n$, the following statistic $Z$ is approximately standard normal:
    $$Z = \frac{N_+ + N_0/2 - n/2}{\sqrt{n/4}}, \qquad N_0 = \#\{i : X_i = m_0\}.$$
• Here we get a p-value of 0.4092729.
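The counts and the normal approximation are straightforward to compute. This sketch assumes $m_0 = 0.80$ and uses only the first ten values from the str() output (hypothetical name ratio_head), so it does not reproduce the p-value quoted above. The exact binomial computation at the end is an addition for comparison, not part of the slides.

```r
# First ten of the 19 observations (the rest are elided in the source).
ratio_head <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
m0 <- 0.80                         # median under the null (assumed here)

n <- length(ratio_head)
n_plus <- sum(ratio_head > m0)     # observations above m0
n_zero <- sum(ratio_head == m0)    # observations equal to m0

# Normal approximation; small values of Z support m < m0.
z <- (n_plus + n_zero / 2 - n / 2) / sqrt(n / 4)
p_normal <- pnorm(z)

# Exact alternative: under the null, n_plus is Binomial(n - n_zero, 1/2).
p_exact <- binom.test(n_plus, n - n_zero, p = 0.5,
                      alternative = "less")$p.value

print(c(n_plus = n_plus, n_zero = n_zero, z = z))
print(c(normal = p_normal, exact = p_exact))
```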
Wilcoxon Signed Rank Test
• Let $F$ be the population's cumulative distribution function, which we assume to be symmetric about its median $m$.
• Say we want to test $H_0: m = m_0$ versus $H_1: m < m_0$.
  – The Wilcoxon signed-rank test rejects for small values of $W$, where
    $$W = \sum_{i=1}^{n} Y_i R_i, \qquad R_i = \operatorname{rank}(|X_i - m_0|), \qquad Y_i = \operatorname{sign}(X_i - m_0).$$
    (Actually, R returns $V = \sum_{i : X_i > m_0} R_i$, the sum of the ranks of the positive differences, which equals $W/2 + n(n+1)/4$ when no $X_i$ equals $m_0$.)
  – In fact, for large $n$, the following statistic $Z$ is approximately standard normal:
    $$Z = \frac{V - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}$$
• Here we get p-value = 0.07084.
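The relation between the signed-rank sum and the statistic that wilcox.test() reports (the sum of the ranks of the positive differences, labelled V in its output) can be verified by hand. This is a sketch assuming $m_0 = 0.80$ and the first ten values from the str() output (hypothetical name ratio_head). When ties are present, wilcox.test() adjusts the variance of its normal approximation, so its p-value need not equal the one from the plain formula.

```r
# First ten of the 19 observations (the rest are elided in the source).
ratio_head <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
m0 <- 0.80                        # median under the null (assumed here)

d <- ratio_head - m0
d <- d[d != 0]                    # differences equal to zero are dropped
n <- length(d)
r <- rank(abs(d))                 # ranks of the absolute differences

w_signed <- sum(sign(d) * r)      # signed-rank sum
v_plus <- sum(r[d > 0])           # sum of ranks of positive differences

# Normal approximation without tie correction.
z <- (v_plus - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_normal <- pnorm(z)

fit <- wilcox.test(ratio_head, mu = m0, alternative = "less",
                   exact = FALSE, correct = FALSE)
print(c(v_by_hand = v_plus, v_built_in = unname(fit$statistic)))
print(c(w_signed = w_signed, check = 2 * v_plus - n * (n + 1) / 2))
print(c(p_formula = p_normal, p_built_in = fit$p.value))
```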