Statistical Foundations: Sampling 17 February 2020 Modern Research Methods
The Single Experiment
[Diagram: Population, Question, Hypothesis, Exp. Design, Experimenter, Data, Analyst, Code, Estimate, Claim]
Overview of course
1) Philosophy of Cumulative Science
2) The Single Experiment
 – Experimental data, tools in R for working with data and plotting data, reproducibility
3) Repeating an Experiment
 – Intro to statistical concepts, replication of experiments
4) Aggregating Many Experiments
 – Meta-analysis
Original, Reproduction, Replication
REPRODUCE = Get same result from same dataset.
REPLICATE = Get same result with a new dataset.
* Sometimes people are sloppy with these terms and use them interchangeably.
(Patil, Peng, & Leek, 2019)
[Diagram: the steps Population, Question, Hypothesis, Exp. Design, Experimenter, Data, Analyst, Code, Estimate, Claim, each marked Original or Different for a reproduction and for a replication]
High nameability Low nameability
Replication vs. Reproduction
[Diagram: columns Original, Reproduction, Replication over the steps Population, Question, Hypothesis, Exp. Design, Experimenter, Data, Analyst, Code, Estimate, Claim; [You] appears at the Experimenter and Analyst steps]
Replicating Zettersten and Lupyan (2020)
Original, Replication [You]
High Nameability Condition = 75%
Low Nameability Condition = 69%
Should you expect to replicate the original finding? Did you replicate it? What would convince you?
Discuss with a partner.
High Nameability Condition = 75% ?= Low Nameability Condition = 69%
In order to evaluate this replication, we need to think about sampling. In the next few classes, we’re going to discuss sampling in order to reason about the replicability of psychological effects.
Reading for today
Distributions
Distributions = counts of a variable
Plot with histograms
Two measures:
• Mean measures center (“central tendency”)
• Variance measures dispersion.
(There are other measures of the center and dispersion of a distribution, but these are the measures we’re going to focus on here)
[Figure: histogram of Distribution A; x-axis from -5 to 5; y-axis: count]
What is the mean of these distributions? Which ones have low vs. high variance?
[Figure: histograms of Distribution A through Distribution F; x-axis: x, from -5 to 5; y-axis: count]
Means, as labeled: 0, 5, 0, 3, 0, 2
Variances, as labeled: Low, Low, V. High, Low, High, High
Calculating mean (Thanks to Danielle Navarro, LSR https://learningstatisticswithr.com/)
Calculating variance Variance is the average squared deviation from the mean of a dataset. Standard deviation is the square root of variance.
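The two definitions above can be sketched in a few lines of Python. This is only an illustration: the `scores` list is made-up data, and the function names are chosen here for readability. The variance follows the slide's definition (divide by N); many statistics tools, including R's `var()`, divide by N - 1 instead when working with a sample.

```python
import math

def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def variance(values):
    """Average squared deviation from the mean (divides by N)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

# Made-up example data, not taken from the lecture.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(mean(scores))                # 5.0
print(variance(scores))            # 4.0
print(standard_deviation(scores))  # 2.0
```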
Our goal as scientists
• As scientists, we want to estimate parameters about the world.
• One of the most common parameters is the mean.
• For example: What is the mean accuracy in the high nameability condition? What is the mean accuracy in the low nameability condition? (Zettersten & Lupyan, 2020)
• Are the two means different from each other?
• As psychologists we’re interested in the population of ALL PEOPLE if they had done our experiment.
• But, to save time and effort, we only measure a sample.
Population vs. sample
• A sample is a random subset of the population.
• That means there are really two distributions.
• Population: The distribution of all people (7.53 billion), or maybe all people who speak English (1.5 billion), or maybe all people at UW-Madison (44k)
• Sample: Zettersten and Lupyan only tested 50 participants.
• Unlike in the Zorbia example, we don’t know what the population looks like (we usually don’t).
Challenge: Make (good) inferences about the population from the sample.
Population: N = a lot
Sample: N = 50
Use mean of sample to estimate mean of population.
[Figure: histograms of the population and of one sample; x-axis: Prop. Right; y-axis: count]
Population, Sample
[Figure: histograms of Samples 1 to 5 drawn from the population; x-axis: Prop. Right; y-axis: count]
Sampling distribution of the mean
[Figure: histograms of Samples 1 to 5, and below them a histogram of the sample means; x-axis: Prop. Right, from 0.70 to 0.76; y-axis: count]
Two things to know about the sampling distribution of the mean
1. The mean of the sampling distribution is the same as the mean of the population.
2. The variance of the sampling distribution of means gets smaller as the sample size increases. (i.e. we get better at estimating the population with more data)
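Both properties can be checked with a small simulation. The sketch below is a rough illustration in Python, not the course's own code: the population is a made-up set of 100,000 "proportion right" scores drawn uniformly between 0 and 1, and the sample sizes (5 and 50) and the number of repeated samples (2,000) are arbitrary choices.

```python
import random
import statistics

rng = random.Random(2020)  # fixed seed so the run is repeatable

# Hypothetical population of "proportion right" scores (an assumption).
population = [rng.random() for _ in range(100_000)]
population_mean = statistics.fmean(population)

def sample_means(sample_size, n_samples=2000):
    """Draw many random samples and record the mean of each one."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

means_small = sample_means(5)   # sampling distribution for N = 5
means_large = sample_means(50)  # sampling distribution for N = 50

print("population mean:        ", round(population_mean, 3))
print("mean of means (N = 5):  ", round(statistics.fmean(means_small), 3))
print("mean of means (N = 50): ", round(statistics.fmean(means_large), 3))
print("variance of means (N = 5): ", round(statistics.pvariance(means_small), 5))
print("variance of means (N = 50):", round(statistics.pvariance(means_large), 5))
```

In a run like this, both sets of sample means should center near the population mean, while the means from the larger samples should vary much less.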
What’s the mean IQ of Zorbia?
Zorbia Population IQ
N = 97
Mean = 29
[Figure: dot plot of the individual IQ scores, ranging from 17 to 40; x-axis: Zorbia IQ; y-axis: count]
In-class simulation
What can we learn from sampling the population?
In groups of ~5:
1. Cut the people of Zorbia out.
2. Put them in the envelope.
3. Each person in the group should take a sample of three.
4. Calculate the average.
5. Write it on a sticky note, and add it to the class plot.
6. Do steps 3-5 once more.
Key points from Zorbia Simulation
• More samples give a better estimate of the population mean
• Two samples from the same population will tend to have somewhat different means
• Conversely, two different sample means do NOT imply that the samples come from different populations
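The classroom exercise can be mimicked in Python. This is a sketch under stated assumptions: the `zorbia` list is a randomly generated stand-in for the 97 paper cut-outs (centered near 29), not the actual handout values, and the class size of 20 students is hypothetical.

```python
import random
import statistics

rng = random.Random(97)  # fixed seed so the run is repeatable

# Stand-in population: 97 made-up IQ scores centered near 29 (an assumption).
zorbia = [round(rng.gauss(29, 5)) for _ in range(97)]

def run_class(n_students=20, sample_size=3, rounds=2):
    """Each student draws a sample of three, averages it, and repeats once."""
    sticky_notes = []
    for _ in range(n_students):
        for _ in range(rounds):
            sample = rng.sample(zorbia, sample_size)      # step 3
            sticky_notes.append(statistics.fmean(sample)) # steps 4 and 5
    return sticky_notes

notes = run_class()

print("population mean:      ", round(statistics.fmean(zorbia), 2))
print("lowest sample mean:   ", round(min(notes), 2))
print("highest sample mean:  ", round(max(notes), 2))
print("mean of all the notes:", round(statistics.fmean(notes), 2))
```

Every sticky note comes from the same population, yet the individual sample means differ from one another; the average of all the notes is usually much closer to the population mean than a typical single note is.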
Next Time: Distributions and probability Explore this Shiny app: https://gallery.shinyapps.io/CLT_mean/
Acknowledgements • Slides 12-13 have content adapted from Danielle Navarro, Learning Statistics with R (https://learningstatisticswithr.com/)