CS 102 Human – Computer Interaction Lecture 17: Statistics for HCI Part III
Course updates
• Idea log marks sent
• Attendance and overall performance grades to follow
• Guest lectures: Ashish Goel (Dec 3, Thursday)
Recap: R
• Data Types
  Vectors: x <- c(10.1, 6.2, 3.1, 6.0, 21.9)
  Matrices: y <- matrix(1:20, nrow = 5, ncol = 4)
  Data frames:
    d <- c(1, 2, 3, 4)
    e <- c("red", "white", "red", NA)
    mydata <- data.frame(d, e)
  Factors
Recap: R
• Importing Data
  CSV file:
    mydata <- read.csv("chisq.csv", header = TRUE, row.names = 1)
  Excel file:
    library(xlsx)
    mydata <- read.xlsx("c:/myexcel.xlsx", 1)
• Descriptive Statistics
  summary(df)
  mean(data)
  median(data)
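The import and summary calls above can be tried end to end without any course files. This is a minimal sketch: the temporary CSV, its column names (`id`, `time`), and its values (borrowed from the vector example) are assumptions made for illustration only.

```r
# Write a small CSV to a temporary file (hypothetical data, not a course file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,time", "a,10.1", "b,6.2", "c,3.1", "d,6.0", "e,21.9"), csv_path)

# Import it: first row is the header, first column becomes the row names
mydata <- read.csv(csv_path, header = TRUE, row.names = 1)

# Descriptive statistics
print(summary(mydata))
time_mean   <- mean(mydata$time)
time_median <- median(mydata$time)
print(c(mean = time_mean, median = time_median))
```

Note that the mean is pulled upward by the single large value (21.9), while the median is not.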
Recap: Hypothesis Testing
• H0: Null Hypothesis
  The difference observed is due to sampling error
• H1: Alternative Hypothesis
  The difference observed is a "significant" difference, due to the independent variable
Recap: Hypothesis Testing
• p-value: How likely the obtained sample (or a more extreme one) is, if the null hypothesis holds true.
• A threshold of significance = 0.05 (typically)
• Example: Does the time taken to complete a transaction decrease when a design element is modified?
Recap: Hypothesis Testing
If the null hypothesis is true, then the mean of your sampling distribution (the curve) before modification should be equal to the mean after modification.
So, regardless of design modification, the mean should be the same if the null hypothesis holds.
[Figure: sampling distribution curves, before and after]
Recap: Hypothesis Testing
Run a one-tailed t-test using the file "design.csv": t(14) = 4.8, p = 0.0001
The p-value is the probability of getting a sample like yours, that is, 4.8 standard deviations away from the mean, IF the null hypothesis is true.
Since the chance is very low (0.01%), we reject the null hypothesis.
Typical threshold = 0.05 (less than 5% chance)
[Figure: sampling distribution curve with the mean and 1σ to 4σ marked]
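The link between the reported t statistic and its p-value can be checked directly from the t distribution. This sketch uses only the reported values, t = 4.8 with 14 degrees of freedom; it does not read design.csv, whose column layout is not given here.

```r
# Reported test statistic and degrees of freedom from the slide
t_value <- 4.8
df <- 14

# One-tailed p-value: probability of a t statistic at least this large under H0
p_one_tailed <- pt(t_value, df = df, lower.tail = FALSE)

# Compare against the usual significance threshold
alpha <- 0.05
reject_null <- p_one_tailed < alpha

print(p_one_tailed)
print(reject_null)
```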
Recap: Hypothesis Testing
Accept the alternative hypothesis: the time decreases after design modification.
Errors possible:
  Type I error: You wrongly rejected the null hypothesis
  Type II error: You wrongly accepted the null hypothesis!
[Figure: null and alternative hypothesis distributions with Mean (Before), Mean (After), the threshold, and the Type I and Type II error regions marked]
Recap: Which Test When
  Group Type                       Quantitative Data      Ordinal Data or Quantitative   Nominal Data
                                   (Normality assumed)    (Normality not assumed)
  Two unpaired groups              Unpaired t test        Mann-Whitney test              Fisher's test
  Two paired groups                Paired t test          Wilcoxon test                  McNemar's test
  More than two unmatched groups   ANOVA                  Kruskal-Wallis test            Chi-square test
Rank Sum Tests • Mann Whitney’s U Test When: Dependent variable is ordinal AND/OR normality cannot be assumed Compares: Medians of two independent groups Example 1 : Do men and women rate a product’s functionality differently?
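In R, the Mann-Whitney U test is run with `wilcox.test` on two unpaired vectors. The ratings below are hypothetical values invented for illustration; they are not course data.

```r
# Hypothetical 7-point functionality ratings (illustrative only)
men   <- c(5, 6, 4, 7, 5, 6, 3, 6)
women <- c(4, 3, 5, 2, 4, 3, 5, 4)

# Unpaired (independent groups) rank sum test.
# exact = FALSE asks for the normal approximation, since the tied
# ratings rule out an exact p-value.
res <- wilcox.test(men, women, paired = FALSE, exact = FALSE)

print(res$statistic)  # reported by R as "W"
print(res$p.value)
```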
Rank Sum Tests • Wilcoxon Signed Rank Test When: Dependent variable is ordinal AND/OR normality cannot be assumed Compares: Medians of two dependent groups (paired data) Example 1: Is there a difference in ratings for an interface before and after design modification?
Wilcoxon Signed Rank Test
Example 1: Is there a difference in ratings for an interface before and after design modification? (7-point Likert scale)
  Before  4  3  2  4  3  5  4  1  6  2
  After   7  5  6  6  4  4  7  6  5  5
What does your intuition say?
Step 1: Calculate differences
Step 2: Rank the differences
Step 3: Sum up positive and negative ranks, choose lower W value
Wilcoxon Signed Rank Test
Step 1: Difference Calculation
  Before  4  3  2  4  3  5  4  1  6  2
  After   7  5  6  6  4  4  7  6  5  5
  Sign    -  -  -  -  -  +  -  -  +  -
  Diff    3  2  4  2  1  1  3  5  1  3
Wilcoxon Signed Rank Test
Step 2: Ranking the differences:
  Before        4     3     2     4     3     5     4     1     6     2
  After         7     5     6     6     4     4     7     6     5     5
  Sign          -     -     -     -     -     +     -     -     +     -
  Diff          3     2     4     2     1     1     3     5     1     3
  Rank          7   4.5     9   4.5     2     2     7    10     2     7
  Signed Rank  -7  -4.5    -9  -4.5    -2    +2    -7   -10    +2    -7
Wilcoxon Signed Rank Test
Step 3: Summing positive and negative ranks:
  Signed Rank  -7  -4.5  -9  -4.5  -2  +2  -7  -10  +2  -7
Sum of positive ranks = 4
Sum of negative ranks = 51
Check that W+ + W- = n(n+1)/2: here 4 + 51 = 55 = 10 × 11 / 2
Choose the lower W value. Why?
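The three steps can be reproduced by hand in R as a check on the arithmetic. This is a sketch of the manual procedure using the Example 1 ratings, not a replacement for a proper test function; the variable names are chosen for illustration.

```r
before <- c(4, 3, 2, 4, 3, 5, 4, 1, 6, 2)
after  <- c(7, 5, 6, 6, 4, 4, 7, 6, 5, 5)

# Step 1: differences (Before - After); zero differences would be dropped
d <- before - after
d <- d[d != 0]

# Step 2: rank the absolute differences; ties receive the average rank
r <- rank(abs(d))

# Step 3: sum the ranks separately by sign and keep the smaller sum
w_plus  <- sum(r[d > 0])
w_minus <- sum(r[d < 0])
n <- length(d)
stopifnot(w_plus + w_minus == n * (n + 1) / 2)  # sanity check from the slide
W <- min(w_plus, w_minus)

print(c(w_plus = w_plus, w_minus = w_minus, W = W))
```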
Wilcoxon Signed Rank Test in R
  X <- c(4, 3, 2, 4, 3, 5, 4, 1, 6, 2)
  Y <- c(7, 5, 6, 6, 4, 4, 7, 6, 5, 5)
  wilcox.test(X, Y, paired = TRUE)
Wilcoxon Signed Rank Test in R
  library(coin)
  wilcoxsign_test(X ~ Y, distribution = "exact")
Wilcoxon Signed Rank Test in R
• Effect size
  How powerful is your result?
  p-value is an indicator of significance
  Effect size indicates strength
  Measured by Cohen's d or Pearson's r
• Effect size for rank tests
  r = |Z| / sqrt(N), where N is the total number of observations (here 2 × 10 = 20)
  2.4095 / sqrt(20) = 0.538
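The effect size arithmetic is a one-liner in R. The sketch below plugs in the Z value reported on the slide rather than recomputing it, and takes N = 20 as the total number of observations across both conditions.

```r
z <- -2.4095   # Z value reported for Example 1
n_obs <- 20    # total observations: 10 Before + 10 After

# Effect size for rank tests: r = |Z| / sqrt(N)
r_effect <- abs(z) / sqrt(n_obs)
print(r_effect)
```

The result is roughly 0.54.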
Wilcoxon Signed Rank Test in R
• How to report your results:
  The medians of Before ratings and After ratings were 3.5 and 5.5, respectively. A Wilcoxon signed-rank test showed that there is a significant effect of design modification (W = 4, Z = -2.4095, p = 0.0136, r = .538)
• p-value < 0.05: H0 rejected
Wilcoxon Signed Rank Test
Example 2: Do people visit different pages on your site at different times of the day? (5-point Likert scale)
  Time     P1  P2  P3  P4  P5  P6  P7  P8  P9  P10
  Morning   2   4   1   1   2   3   2   1   3    4
  Evening   3   3   2   5   1   4   4   3   5    3
What does your intuition say?
Calculate the W value!
Wilcoxon Signed Rank Test
Step 1: Difference Calculation
  Morning    2  4  1  1  2  3  2  1  3  4
  Evening    3  3  2  5  1  4  4  3  5  3
  Sign       -  +  -  -  +  -  -  -  -  +
  Abs(Diff)  1  1  1  4  1  1  2  2  2  1
Wilcoxon Signed Rank Test
Step 2: Ranking the differences:
  Morning         2     4     1     1     2     3     2     1     3     4
  Evening         3     3     2     5     1     4     4     3     5     3
  Sign            -     +     -     -     +     -     -     -     -     +
  Abs(Diff)       1     1     1     4     1     1     2     2     2     1
  Rank          3.5   3.5   3.5    10   3.5   3.5     8     8     8   3.5
  Signed Rank  -3.5  +3.5  -3.5   -10  +3.5  -3.5    -8    -8    -8  +3.5
Wilcoxon Signed Rank Test
Step 3: Summing positive and negative ranks:
  Signed Rank  -3.5  +3.5  -3.5  -10  +3.5  -3.5  -8  -8  -8  +3.5
Sum of positive ranks = 10.5
Sum of negative ranks = 44.5
Check that W+ + W- = n(n+1)/2
W = 10.5
Wilcoxon Signed Rank Test in R
  GroupA <- c(2, 4, 1, 1, 2, 3, 2, 1, 3, 4)
  GroupB <- c(3, 3, 2, 5, 1, 4, 4, 3, 5, 3)
  wilcox.test(GroupA, GroupB, paired = TRUE)
Result: not significant
Question! If you have the grades of students from 2 sections of a class, can you tell if one class is better than the other? What about extreme cases? What about interleaved cases? Download student.csv and studentinter.csv from course webpage
Question!
If you have the grades of students from 2 sections of a class, can you tell if one class is better than the other?
student.csv:
  library(coin)
  mydf <- read.csv("student.csv")        # assumes the file is in the working directory
  wilcoxsign_test(Section1 ~ Section2, data = mydf, distribution = "exact")
  W = 0, Z = -2.814, p-value = 0.001953
studentinter.csv:
  inter <- read.csv("studentinter.csv")  # assumes the file is in the working directory
  wilcoxsign_test(Section1 ~ Section2, data = inter, distribution = "exact")
  W = 46, Z = -0.30779, p-value = 0.7871
Rank Sum Tests • Kruskal Wallis When: Dependent variable is ordinal AND/OR normality cannot be assumed Compares: More than two independent groups (unpaired data) Example 1: Is there a difference in ratings for 3 versions of interface?
Kruskal Wallis Test
Is there a difference in ratings for 3 versions of interface?
  Rating <- c(1,2,5,3,2,1,1,3,2,1,4,3,6,5,2,6,1,6,5,4,9,6,7,7,5,1,8,9,6,5)
  Interface <- factor(c(rep(1, 10), rep(2, 10), rep(3, 10)))
  data <- data.frame(Interface, Rating)
  kruskal.test(Rating ~ Interface, data = data)
  p-value = 0.001073
Report: A significant effect of Interface was found (χ²(2) = 13.7, p < 0.01).
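The numbers in the report line can be read straight from the object that `kruskal.test` returns. The sketch below reruns the test on the ratings above and formats the statistic, degrees of freedom, and p-value; the data frame name `ratings` and the other variable names are chosen for illustration.

```r
Rating <- c(1,2,5,3,2,1,1,3,2,1,4,3,6,5,2,6,1,6,5,4,9,6,7,7,5,1,8,9,6,5)
Interface <- factor(rep(1:3, each = 10))  # 10 ratings per interface version
ratings <- data.frame(Interface, Rating)

res <- kruskal.test(Rating ~ Interface, data = ratings)

# Pull out the pieces needed for reporting
chi_sq  <- unname(res$statistic)  # Kruskal-Wallis chi-squared
df_kw   <- unname(res$parameter)  # degrees of freedom (groups - 1)
p_value <- res$p.value

cat(sprintf("chi-squared(%d) = %.1f, p = %.4f\n", df_kw, chi_sq, p_value))
```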
Recap: Fisher's Test
What: Like Chi-square: nominal/categorical data
When: Small sample size (cell counts < 10)
Example: A/B testing for 2 website versions (click-rate)
Compares: Proportions across two or more independent groups
Assumptions: Independent samples
Fisher's Test in R
Do men and women differ in their preference for online surveys and personal interviews (PI)?
          PI   Online Surveys   Total
  Men      6         2             8
  Women    8         4            12
  Total   14         6            20
  f <- read.csv("f.csv")
  fisher.test(f)
p-value = 1: no significant difference
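The same test can be run without f.csv by typing the 2 × 2 table in as a matrix. This is a minimal sketch; the dimension names are labels added here for readability, and only the four inner cell counts are passed, not the totals.

```r
# Counts from the table above (rows: Men, Women; columns: PI, Online Surveys)
prefs <- matrix(c(6, 2,
                  8, 4),
                nrow = 2, byrow = TRUE,
                dimnames = list(Gender = c("Men", "Women"),
                                Method = c("PI", "Online")))

res <- fisher.test(prefs)
print(res$p.value)
```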
McNemar's Test
What: Paired chi-square
When: Data is nominal/categorical AND paired
Compares: Proportions in two dependent (paired) groups
Example: Ease of use before and after interface change
              Before Change   After Change
  Easy             16              29
  Difficult        14               1
McNemar's Test in R
              Before Change   After Change
  Easy             16              29
  Difficult        14               1
  data <- matrix(c(16, 29, 14, 1), ncol = 2, byrow = TRUE)
  mcnemar.test(data)
p-value = 0.03276
The change in interface had a significant effect (p < 0.05)
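To see where the p-value comes from, the statistic can be recomputed by hand. `mcnemar.test` reads its input as a 2 × 2 cross-tabulation and uses only the two off-diagonal cells (here 29 and 14), with a continuity correction by default. The variable names below are chosen for illustration.

```r
tab <- matrix(c(16, 29, 14, 1), ncol = 2, byrow = TRUE)

# Off-diagonal cells are the only ones that enter the statistic
b   <- tab[1, 2]
c21 <- tab[2, 1]

# Continuity-corrected McNemar chi-squared with 1 degree of freedom
chi_sq   <- (abs(b - c21) - 1)^2 / (b + c21)
p_manual <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# Compare with the built-in test
res <- mcnemar.test(tab)
print(c(manual = p_manual, builtin = res$p.value))
```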