Two-Sample Experimental Designs 707.031: Evaluation Methodology Winter 2015/2016 Eduardo Veas
Modelling 2
General statistical model: outcome = (model) + error 3
Calculating the probability that a score will occur • z-scores: convert scores to a normal distribution with a mean of 0 and sd = 1 • z = (X - X̄) / sd, where X̄ is the mean • a z-score of 1.96 cuts off the top 2.5% of the distribution • 2.58 cuts off the top 0.5% • 3.29 cuts off the top 0.05% 4
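The z-score conversion and the quoted cut-offs can be checked with a short sketch. The function names and the example score (130 on a scale with mean 100 and sd 15) are illustrative assumptions, not part of the slides; the tail probability uses the standard normal distribution via `math.erfc`.

```python
import math

def z_score(x, mean, sd):
    """Convert a raw score to a z-score: z = (X - mean) / sd."""
    return (x - mean) / sd

def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical example: a score of 130 on a scale with mean 100 and sd 15.
z = z_score(130, 100, 15)
print(f"z = {z:.2f}, proportion above = {upper_tail_probability(z):.4f}")

# Proportion of the distribution above each cut-off quoted on the slide.
for cutoff in (1.96, 2.58, 3.29):
    print(cutoff, f"{upper_tail_probability(cutoff):.4%}")
```

Because each cut-off removes the same proportion from the bottom tail as well, 1.96, 2.58 and 3.29 correspond to two-tailed probabilities of roughly .05, .01 and .001.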
Beyond the data Advanced descriptive stats 5
Beyond the data 6
Advanced descriptive statistics • Goal: • establish relationships between circumstances and behaviors • Fit these relationships into an orderly body of knowledge • we want to say something about the world 7
Advanced descriptive statistics • sampling variation: different samples will have means that differ from the population mean and from each other • sampling distribution: the frequency distribution of the sample means from the same population • standard error = standard deviation of the sample means; it indicates how well a sample represents the population 9
Advanced descriptive statistics • standard error of the mean (SE): the standard deviation of sample means • samples > 30: the sampling distribution is approximately normal, SE = s / √N • samples < 30: the sampling distribution follows a t-distribution 10
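A minimal sketch of the standard error, assuming the usual estimate SE = s / √N from a single sample. The small simulation draws repeated samples from a hypothetical normal population (mean 100, sd 15, chosen only for illustration) to show that the standard deviation of the sample means lands close to sd / √N.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean from one sample: s / sqrt(N)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def sd_of_sample_means(mu, sigma, n, n_samples, seed=0):
    """Draw n_samples samples of size n and return the sd of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

# Hypothetical population: mean 100, sd 15; samples of size 30.
simulated = sd_of_sample_means(100, 15, n=30, n_samples=2000)
theoretical = 15 / math.sqrt(30)
print(f"sd of sample means: {simulated:.2f}, sigma / sqrt(N): {theoretical:.2f}")
```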
Advanced descriptive statistics • effect size: how important is the effect? • Cohen’s d • Pearson correlation coefficient r • r = .1 (small effect) explains 1% of the total variance • r = .3 (medium effect) explains 9% of the total variance • r = .5 (large effect) explains 25% of the total variance 11
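The percentages above are simply r squared. The sketch below shows that arithmetic and, as an assumption not spelled out on the slide, one common definition of Cohen's d (difference between means divided by the pooled standard deviation). Function names and data are illustrative.

```python
import math
import statistics

def variance_explained(r):
    """Proportion of total variance explained by a correlation of size r."""
    return r ** 2

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation (one common definition)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    mean_diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    return mean_diff / math.sqrt(pooled_var)

for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: {variance_explained(r):.0%} of the variance")

# Hypothetical scores for two small groups.
print(f"d = {cohens_d([2, 4, 6], [1, 3, 5]):.2f}")
```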
Comparing two means 12
Comparing two means 13
null hypothesis H0: the difference between condition A and condition B can be attributed to chance 14
t-test • tests whether a correlation coefficient is different from 0 • tests whether a regression coefficient is different from 0 • tests whether two group means are different 15
t-test experimental design • Q: Are people actually faster with multitouch than with mouse and keyboard? • H1: People will be faster at selecting and resizing a target with multitouch than with mouse+kbd • INDEPENDENT variable: input, with two levels (multitouch, mouse+kbd) • DEPENDENT variable: speed of interaction 16
t-test experimental design • one independent variable with two levels and a measurable dependent variable • often we compare one condition of the independent variable against a baseline condition • example: Is the movie Scream 2 scarier than the original Scream? • example: Do people who take 707.031 perform better experiments than those who don’t? 17
t-test rationale • two samples are collected • their means can differ by a little or a lot • assumption: if the samples come from the same population, their means will be roughly similar • procedure: compare the observed difference between the sample means with the difference expected under the null hypothesis, using the standard error as the yardstick • the larger the observed difference relative to the standard error, the more confident we can be that the samples come from different populations 18
t-test rationale • t = [(observed difference between sample means) - (expected difference between population means)] / (estimate of the standard error of the difference between two sample means) 19
t-test experimental design • independent-means t-test: for independent measures designs • dependent-means t-test: for repeated measures designs 20
independent t-test: assumptions • the sampling distribution is normally distributed • data are measured at least at interval level • scores in different conditions are independent • homogeneity of variance * 21
independent t-test equation • t = [(observed difference between sample means) - (expected difference between population means)] / (estimate of the standard error of the difference between two sample means) 22
independent t-test equation 23
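A minimal sketch of the independent t-test on raw data. It assumes Welch's version of the test, which does not require equal variances and yields fractional degrees of freedom; the slides do not say which variant the equation shows, so treat this choice, the function name and the sample data as assumptions. Under the null hypothesis the expected difference between population means is 0, so the numerator reduces to the observed difference.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (t, df) for an independent-samples t-test without assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean: s^2 / n
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    # Standard error of the difference between the two sample means
    se_diff = math.sqrt(se2_a + se2_b)
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se_diff
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical scores from two independent groups.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t({df:.2f}) = {t:.2f}")
```

The sign of t depends only on which group is passed first.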
independent t-test reporting • An independent-samples t-test indicated that the difference in anxiety experienced from real spiders (M=47.0, SE=3.18) and from pictures of spiders (M=40.0, SE=2.68) was not significant, t(21.39)=-1.68, p>.05, albeit representing a medium-sized effect, r=.34. 24
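The effect size r in such a report can be recovered from t and its degrees of freedom. The conversion r = sqrt(t² / (t² + df)) is a standard formula that the slides do not state, so it is shown here as an assumption; the inputs are the t and df values from the independent and paired reports in this deck.

```python
import math

def r_from_t(t, df):
    """Convert a t statistic and its degrees of freedom to an effect size r."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

# Values taken from the two t-test reports in these slides.
print(f"independent: r = {r_from_t(-1.68, 21.39):.2f}")
print(f"dependent:   r = {r_from_t(2.47, 11):.2f}")
```

This conversion always returns a non-negative r; the direction of the effect has to be read from the group means.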
dependent t-test assumptions • the sampling distribution of the differences between scores is normally distributed • data are measured at least at interval level 25
dependent t-test equation • t = [(observed difference between sample means) - (expected difference between population means)] / (standard error of the differences) 26
dependent t-test equation 27
dependent t-test result • df = N - 1, where N is the number of paired scores • p: the probability of obtaining a value of t at least this large if the null hypothesis were true • t: a positive value means that the first condition had a larger mean than the second 28
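A minimal sketch of the dependent t-test, assuming the standard error of the differences is estimated as s_D / √N and the expected difference under the null hypothesis is 0. The function name and the paired scores are hypothetical.

```python
import math
import statistics

def dependent_t(condition_1, condition_2):
    """Return (t, df) for a paired-samples t-test on two equally long score lists."""
    diffs = [a - b for a, b in zip(condition_1, condition_2)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    # Standard error of the differences: sd of the differences / sqrt(N)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se_diff, n - 1

# Hypothetical scores of four participants measured in both conditions.
t, df = dependent_t([5, 7, 9, 6], [3, 6, 6, 5])
print(f"t({df}) = {t:.2f}")
```

Here t comes out positive because the first condition has the larger mean.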
dependent t-test reporting • A paired-samples t-test revealed that, on average, participants experienced significantly greater anxiety from real spiders (M=47.0, SE=3.18) than from pictures of spiders (M=40.0, SE=2.68), t(11)=2.47, r=.60. 29
Non-parametric tests When assumptions failed 30
Mann-Whitney U / Wilcoxon rank-sum test for independent measures 31
Mann-Whitney U / Wilcoxon rank-sum test • #1: if there is no difference between the groups • rank the data, ignoring the groups • expect a similar number of high and low ranks in each group • #2: if there is a difference between the groups • rank the data, ignoring the groups • expect higher ranks in one of the groups 32
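Mann-Whitney U / Wilcoxon rank-sum test • sort all scores in ascending order, ignoring the groups • rank them (1 to N, where N is the total number of scores) • tied ranks (where scores are the same) are averaged • add up the ranks for each group • subtract the smallest possible rank sum for that group, n(n+1) / 2, where n is the group size (sometimes labelled the "mean rank") • W = sum of ranks - n(n+1) / 2 • calculate p-values (Monte Carlo or normal approximation) 33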
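The ranking procedure above can be sketched as follows. The correction term is n(n+1) / 2 with n taken as the size of the group whose ranks are summed. The z approximation uses the standard large-sample mean and standard error of W without a tie correction; that formula is not on the slides and is included as an assumption. Function names and data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N in ascending order, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the score at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def rank_sum_w(group_a, group_b):
    """W for group_a: its rank sum minus the smallest possible rank sum n(n+1)/2."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2

def w_to_z(w, n_a, n_b):
    """Normal approximation for W (no tie correction)."""
    mean_w = n_a * n_b / 2
    se_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (w - mean_w) / se_w

# Hypothetical scores for two independent groups.
low, high = [1, 2, 3], [4, 5, 6]
w = rank_sum_w(low, high)
print(f"W = {w}, z = {w_to_z(w, len(low), len(high)):.2f}")
```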
Mann-Whitney U / Wilcoxon rank-sum test 34
Mann-Whitney U / Wilcoxon rank-sum test: reporting • Depression levels in ecstasy users (Mdn=17.50) did not differ significantly from alcohol users (Mdn=16.00) the day after the drugs were taken, W=35.5, p=.286, r=-.25. However, by Wednesday, ecstasy users (Mdn=33.5) were significantly more depressed than alcohol users (Mdn=7.5), W=4, p<.001, r=-.78. 35
Wilcoxon signed-rank test for repeated measures 36
Wilcoxon signed-rank test • based upon the differences between scores in the two conditions • if the difference is 0, the data are excluded • rank the remaining differences by their absolute size, then attach the sign of each difference to its rank • sum the positive ranks and the negative ranks separately 37
Wilcoxon signed-rank test • use the test statistic T, its mean and its standard error (SE) to calculate significance • the mean and SE are both functions of the sample size • convert the test statistic to a z-score • if the absolute value of z is bigger than 1.96, the test is significant at p<0.05 38
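A minimal sketch of the signed-rank procedure with a normal approximation. It assumes T is the sum of the positive ranks and uses the standard large-sample formulas mean = n(n+1) / 4 and SE = sqrt(n(n+1)(2n+1) / 24), where n counts only the non-zero differences; these formulas are not given on the slides and no tie correction is applied. Names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N in ascending order, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def signed_rank(condition_1, condition_2):
    """Return (T_positive, T_negative, z) for paired scores."""
    # Differences of 0 are excluded before ranking
    diffs = [a - b for a, b in zip(condition_1, condition_2) if a != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    t_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_t = n * (n + 1) / 4
    se_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t_pos - mean_t) / se_t
    return t_pos, t_neg, z

# Hypothetical scores of five participants measured twice.
t_pos, t_neg, z = signed_rank([10, 12, 9, 14, 11], [8, 12, 10, 9, 7])
print(f"T+ = {t_pos}, T- = {t_neg}, z = {z:.2f}, significant: {abs(z) > 1.96}")
```

With samples this small the normal approximation is rough; an exact or Monte Carlo p-value would be the safer choice.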
Wilcoxon signed-rank test: reporting • A Wilcoxon signed-rank test revealed that for ecstasy users, depression levels were significantly higher on Wednesday (Mdn=33.50) than on Sunday (Mdn=17.50), p=.047, r=-.56. However, for alcohol users the opposite was true: depression levels were significantly lower on Wednesday (Mdn=7.50) than on Sunday (Mdn=16.0), p=.012, r=-.45. 39
Readings • Discovering statistics using R (Andy Field, Jeremy Miles, Zoe Field) 40