Unit 1: Introduction to data
Lecture 2: Exploratory data analysis
Statistics 101
Nicole Dalzell
May 14, 2015

Statistics 101 (Nicole Dalzell) U1 - L2: EDA May 14, 2015

Warm-Up and Data Basics

Review

Example Study: A researcher is interested in whether or not cats will choose to sleep less if they have toys to entertain themselves. She divides 250 cats (adults and kittens) into two rooms, with adult cats in one room and baby kittens in the other room. Within each room she erects a fence, randomly placing half the cats (or kittens) on each side of the fence. On one side of the fence she scatters a variety of cat toys. For 1 day, the researcher records the number of hours each cat spends sleeping.

What is the research question?
What are the explanatory and response variables?
Is this an Experimental or Observational study?
What are the controls and treatments?
Is blocking employed in this study?

Types of Variables

Still our cat example:

Cat   Age       Toys   # of Naps   Weight (lbs)
1     adult     1      3           8
2     juvenile  1      5           9
3     adult     0      2           10.5
4     adult     1      8           12.25
...   ...       ...    ...         ...
250   adult     0      5           11.67

What types of variables are these:
Age?
Toys?
# of Naps?
Weight?
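The assignment step in the cat study (split by age into rooms, then randomly place half of each room on the toys side of the fence) can be sketched in Python. This is only an illustration under stated assumptions: the slides do not say how many of the 250 cats are adults, so the 150/100 split, the cat IDs, the seed, and the helper name `assign_within_room` are all hypothetical.

```python
import random


def assign_within_room(cat_ids, rng):
    """Randomly split one room's cats into a toys side and a no-toys side."""
    shuffled = list(cat_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"toys": sorted(shuffled[:half]), "no_toys": sorted(shuffled[half:])}


rng = random.Random(101)  # fixed seed so the sketch is reproducible

# Hypothetical split of the 250 cats; the slides do not give these counts.
rooms = {
    "adult": list(range(1, 151)),     # assumed 150 adult cats
    "kitten": list(range(151, 251)),  # assumed 100 kittens
}

# Randomize separately inside each room, never across rooms.
assignment = {age: assign_within_room(ids, rng) for age, ids in rooms.items()}

for age, sides in assignment.items():
    print(age, {side: len(ids) for side, ids in sides.items()})
```

Because the shuffle happens inside each room, both sides of a fence always contain cats of the same age group, whatever the seed.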
Sampling Methods

Population to sample

It is usually not feasible to collect information on the entire population due to the high cost of data collection, so statisticians instead work with samples that are (hopefully) representative of the populations they come from.

[Figure: a sample drawn from a population]

We try to understand certain features of the population as a whole using summary statistics and graphs based on these samples.

Obtaining good samples

Almost all statistical methods are based on the notion of implied randomness.
If observational data are not collected in a random framework from a population, these statistical methods (the estimates and the errors associated with the estimates) are not reliable.
The most commonly used random sampling techniques are simple, stratified, and cluster sampling.

Simple random sample

Randomly select cases from the population; each case is equally likely to be selected.

Stratified sample

Strata are homogeneous; take a simple random sample from each stratum.

[Figure: a population divided into six groups labeled Stratum 1 through Stratum 6]
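The difference between a simple random sample and a stratified sample can be sketched in Python with the standard library. The population below is hypothetical (six strata of 30 labeled cases each, loosely echoing the six strata in the figure), and the function names, sample sizes, and seed are assumptions made for illustration only.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n cases without replacement; every case is equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum cases from every stratum."""
    return {name: rng.sample(cases, n_per_stratum) for name, cases in strata.items()}


rng = random.Random(2015)  # fixed seed so the sketch is reproducible

# Hypothetical population: 6 strata of 30 cases, each case labeled (stratum, index).
strata = {f"Stratum {s}": [(s, i) for i in range(30)] for s in range(1, 7)}
population = [case for cases in strata.values() for case in cases]

srs = simple_random_sample(population, 18, rng)
strat = stratified_sample(strata, 3, rng)

# Count how many sampled cases each stratum contributes under each method.
srs_counts = {s: sum(1 for stratum, _ in srs if stratum == s) for s in range(1, 7)}
strat_counts = {name: len(cases) for name, cases in strat.items()}
print("simple random:", srs_counts)
print("stratified:   ", strat_counts)
```

Both samples contain 18 cases. The stratified sample has exactly 3 cases from every stratum by construction, while the simple random sample's per-stratum counts vary by chance.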
Cluster sample

Clusters are not necessarily homogeneous; take a simple random sample from a random sample of clusters. Usually preferred for economic reasons.

[Figure: a population divided into nine groups labeled Cluster 1 through Cluster 9]

Participation question

A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments. Which approach would likely be the least effective?

(a) Simple random sampling
(b) Cluster sampling
(c) Stratified sampling

Exploratory Data Analysis

Explore the Data

When you taste a spoonful of chili and decide it doesn't taste spicy enough, that's exploratory analysis.
For data analysis, we perform exploratory data analysis, or EDA, to determine trends in features that may be present in the data.
The distribution of a variable is a list of possible values the variable can take and how often it takes each of those values.
Distributions are critical to assessing the probability of events.
Plots are almost always useful for visualizing relationships and distributions in the data.

Visualizing numerical variables

Intensity map: Useful for displaying the spatial distribution.
Dot plot: Useful when individual values are of interest.
Histogram: Provides a view of the data density, and is especially convenient for describing the shape of the data distribution.
Box plot: Especially useful for displaying the median, quartiles, unusual observations, as well as the IQR.
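As a small illustration of a distribution (values and how often each occurs) and of the quantities a box plot displays, the Python sketch below uses only the five "# of Naps" values visible in the cat table (cats 1, 2, 3, 4, and 250); the other 245 rows are not shown in the slides, so this is not a summary of the full data. Quartile conventions differ between textbooks and software; the sketch assumes the default method of Python's `statistics.quantiles`.

```python
from collections import Counter
from statistics import median, quantiles

# "# of Naps" for the five cats shown in the table (cats 1, 2, 3, 4 and 250).
naps = [3, 5, 2, 8, 5]

# Distribution: each observed value and how often it occurs.
distribution = Counter(naps)
for value in sorted(distribution):
    print(f"{value} naps: {distribution[value]} cat(s)")

# Quantities a box plot is built from. Other quartile conventions
# can give slightly different Q1 and Q3 on a sample this small.
q1, q2, q3 = quantiles(naps, n=4)
iqr = q3 - q1
summary = {
    "min": min(naps),
    "Q1": q1,
    "median": median(naps),
    "Q3": q3,
    "max": max(naps),
    "IQR": iqr,
}
print(summary)
```

The IQR is the distance between the first and third quartiles, so it describes the spread of the middle half of the data.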
Why visualize?

Describe the spatial distribution of race/ethnicity in the US.
And let's take a closer look at Durham.

http://demographics.coopercenter.org/DotMap/index.html

Scatterplot

Scatterplots are useful for visualizing the relationship between two numerical variables.

Do life expectancy and total fertility appear to be associated or independent?
Was the relationship the same throughout the years, or did it change?

http://www.gapminder.org/world

Cars: ... vs. weight

From the cars data:

[Figure: two scatterplots, miles per gallon (city rating) vs. weight (pounds) and price ($1000s) vs. weight (pounds)]

What do these scatterplots reveal about the data? How might they be useful?