Unit 1: Introduction to data
Lecture 1: Data collection, observational studies, and experiments
Statistics 101
Thomas Leininger
May 16, 2013
Thought for the day
“We are drowning in information but starved for knowledge... Uncontrolled and unorganized information is no longer a resource in an information society, instead it becomes the enemy.”
–John Naisbitt, Megatrends (1982)
Statistics 101 (Thomas Leininger) U1 - L1: Data coll., obs. studies, experiments May 16, 2013 2 / 33
Introduction to Data: Some terminology
Dr. Arbuthnot’s baptismal records

      year   boys   girls   B4500
 1    1629   5218   4683    TRUE
 2    1630   4858   4457    TRUE
 3    1631   4422   4102    FALSE
 4    1632   4994   4590    TRUE
 5    1633   5158   4839    TRUE
 6    1634   5035   4820    TRUE
 7    1635   5106   4928    TRUE
 8    1636   4917   4605    TRUE
 9    1637   4703   4457    TRUE
10    1638   5359   4952    TRUE

Terms to know:
- case
- variable
- numerical variable
- discrete variable
- continuous variable
- categorical variable (levels)
- ordinal variable
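The terminology can be made concrete with a minimal Python sketch of the ten rows shown on the slide. Each row is a case and each column is a variable: boys and girls are counts (discrete numerical variables), while B4500 is a categorical variable with two levels. The slide does not define B4500; the sketch assumes it flags years with more than 4,500 boys baptized, which matches all ten rows. The names `records`, `rows`, and `THRESHOLD` are illustrative only.

```python
# Each tuple is one case (one year of records): (year, boys, girls).
records = [
    (1629, 5218, 4683),
    (1630, 4858, 4457),
    (1631, 4422, 4102),
    (1632, 4994, 4590),
    (1633, 5158, 4839),
    (1634, 5035, 4820),
    (1635, 5106, 4928),
    (1636, 4917, 4605),
    (1637, 4703, 4457),
    (1638, 5359, 4952),
]

# Assumption: B4500 flags years with more than 4,500 boys baptized.
THRESHOLD = 4500

# Build one dictionary per case; the keys are the variables.
rows = [
    {"year": year, "boys": boys, "girls": girls, "B4500": boys > THRESHOLD}
    for year, boys, girls in records
]

for row in rows:
    print(row["year"], row["boys"], row["girls"], row["B4500"])
```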
Introduction to Data: Some terminology
Control vs. treatment groups

A pharmaceutical company has created a wonder drug to cure bone loss. In order to sell this drug to consumers, the FDA requires this company to perform several highly regulated experiments to prove the efficacy (and safety) of this new drug.

In this experiment, some patients will be randomly assigned to the control group, where they will receive a standard bone loss treatment. The other patients are all assigned to the treatment group, where they receive the new wonder drug.

If the treatment group experiences significantly better outcomes, the FDA will allow this company to sell their new drug.
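Random assignment to the two groups can be sketched in a few lines of Python. This is only an illustration: the slide does not say how many patients are enrolled or how they are split, so the 20 patient IDs, the even split, and the function name `assign_groups` are all assumptions.

```python
import random


def assign_groups(patient_ids, seed=None):
    """Randomly split patients into a control and a treatment group."""
    rng = random.Random(seed)      # seeded generator so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)          # random order removes any systematic pattern
    half = len(shuffled) // 2      # assumption: groups of (nearly) equal size
    return {
        "control": sorted(shuffled[:half]),    # standard bone loss treatment
        "treatment": sorted(shuffled[half:]),  # new drug
    }


# Hypothetical study with 20 patients numbered 1 to 20.
groups = assign_groups(range(1, 21), seed=101)
print("control:  ", groups["control"])
print("treatment:", groups["treatment"])
```

Because chance alone decides who lands in which group, neither the patients nor the researchers can steer healthier patients toward the new drug.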
Introduction to Data: Some terminology
Association and Independence
http://biojournalism.com/2012/08/correlation-vs-causation/
Overview of data collection principles: Anecdotal evidence
Anecdotal evidence and early smoking research

Anti-smoking research started in the 1930s and 1940s, when cigarette smoking became increasingly popular. While some smokers seemed to be sensitive to cigarette smoke, others were completely unaffected.

Anti-smoking research was faced with resistance based on anecdotal evidence such as “My uncle smokes three packs a day and he’s in perfectly good health”, evidence based on a limited sample size that might not be representative of the population.

It was concluded that “smoking is a complex human behavior, by its nature difficult to study, confounded by human variability.”

In time, researchers were able to examine larger samples of cases (smokers), and trends showing that smoking has negative health impacts became much clearer.

Brandt, The Cigarette Century (2009), Basic Books.
Overview of data collection principles: Populations and samples

Research question: Can people become better, more efficient runners on their own, merely by running?
http://well.blogs.nytimes.com/2012/08/29/finding-your-ideal-running-form

Population of interest:
Sample: Group of adult women who recently joined a running group
Population to which results can be generalized:
Overview of data collection principles: Sampling methods
Census

Wouldn’t it be better to just include everyone and “sample” the entire population? This is called a census.

There are problems with taking a census:
- It can be difficult to complete a census: there always seem to be some individuals who are hard to locate or hard to measure. And there may be certain characteristics about those individuals who are hard to locate.
- Populations rarely stand still. Even if you could take a census, the population changes constantly, so it’s never possible to get a perfect measure.
- Taking a census may be more complex than sampling.
Overview of data collection principles: Sampling methods
http://www.npr.org/templates/story/story.php?storyId=125380052
Overview of data collection principles: Sampling methods
Exploratory analysis to inference

Sampling is natural... Think about sampling something you are cooking: you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole.

When you taste a spoonful of soup and decide the spoonful you tasted isn’t salty enough, that’s exploratory analysis. If you generalize and conclude that your entire soup needs salt, that’s an inference.

For your inference to be valid, the spoonful you tasted (the sample) needs to be representative of the entire pot (the population).
- If your spoonful comes only from the surface and the salt is collected at the bottom of the pot, what you tasted is probably not representative of the whole pot.
- If you first stir the soup thoroughly before you taste, your spoonful will more likely be representative of the whole pot.
Overview of data collection principles: Sampling methods
Simple random sample

Randomly select cases from the population so that each case is equally likely to be selected.

[Figure: the population shown as a field of dots]
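A simple random sample can be sketched with Python's standard `random` module. The population of 144 numbered cases and the sample size of 12 are hypothetical values chosen only for illustration.

```python
import random

# Hypothetical population: 144 cases identified by the numbers 1 to 144.
population = list(range(1, 145))

# Seeded generator so the example is reproducible.
rng = random.Random(2013)

# Draw 12 cases without replacement; every case is equally likely to be chosen.
sample = rng.sample(population, k=12)

print(sorted(sample))
```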
Overview of data collection principles: Sampling methods
Stratified sample

The population is divided into strata, each made up of similar (homogeneous) cases, and a simple random sample is taken from each stratum.

[Figure: the population shown as dots divided into six groups, labeled Stratum 1 through Stratum 6]
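Stratified sampling can be sketched as one simple random sample per stratum. The six strata, their sizes, the case labels, and the helper name `stratified_sample` are hypothetical; the sketch also assumes the same number of cases is drawn from every stratum, which is one common choice rather than a requirement.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum cases from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }


# Hypothetical population: six strata of different sizes.
sizes = {1: 20, 2: 15, 3: 25, 4: 10, 5: 18, 6: 12}
strata = {
    f"Stratum {i}": [f"s{i}-case{j}" for j in range(1, size + 1)]
    for i, size in sizes.items()
}

picked = stratified_sample(strata, n_per_stratum=3, seed=2013)
for name, cases in picked.items():
    print(name, cases)
```

Every stratum is guaranteed to appear in the sample, which a simple random sample of the whole population cannot promise.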
Overview of data collection principles: Sampling methods
Cluster sample

Clusters are not necessarily homogeneous. First take a random sample of clusters, then take a simple random sample from each selected cluster. Cluster sampling is usually preferred for economic reasons.

[Figure: the population shown as dots divided into nine groups, labeled Cluster 1 through Cluster 9]
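The two stages of cluster sampling can be sketched as follows. The nine clusters, their sizes, the case labels, and the helper name `cluster_sample` are hypothetical, as are the choices of three clusters and four cases per cluster.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: sample cases within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        # Never ask for more cases than the cluster contains.
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical population: nine clusters of different sizes.
sizes = [8, 10, 12, 9, 11, 7, 10, 13, 8]
clusters = {
    f"Cluster {i}": [f"c{i}-case{j}" for j in range(1, size + 1)]
    for i, size in enumerate(sizes, start=1)
}

picked = cluster_sample(clusters, n_clusters=3, n_per_cluster=4, seed=2013)
for name, cases in picked.items():
    print(name, cases)
```

Only the selected clusters need to be visited, which is where the cost savings come from; unlike stratified sampling, most clusters contribute no cases at all.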
Overview of data collection principles: Sampling methods
Question

A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments, and others a diverse mixture of housing structures. Which approach would likely be the least effective?
(a) Simple random sampling
(b) Cluster sampling
(c) Stratified sampling
(d) Blocked sampling
(e) Anecdotal sampling