user research statistics quick guide



  1. User Research Statistics Quick Guide. Reference: Jeff Sauro and James R. Lewis, Quantifying the User Experience, 2nd ed., Chapter 3 and parts of Chapter 9. CS464, Spring 2017

  2. Why?
     To completely answer usability questions we would need to test every member of the population. This isn’t possible, so we:
     • Test a sample of the population, then estimate what the values would be for the entire population.
       – Estimates become less accurate as the sample size gets smaller.
     • The value we really want is called a population parameter.

  3. Confidence Intervals
     • A range of values that we believe has a specific chance of containing the unknown population parameter.
     • The width of a confidence interval is twice the margin of error of a measurement.
     • Strict interpretation: we are 95% confident in the method of creating the confidence interval, not 95% confident in any particular interval.
       – So, if a 95% confidence interval is calculated as 0.7 ± 0.28, we can say that we are 95% confident that the actual population mean is between 42% and 98%. If we run 100 tests with the same sample size from the population and compute the 95% confidence interval each time, on average 95 of those 100 intervals will contain the population mean. That also means that 5 of them won’t contain it, and we don’t know which ones.
       – You can say that any value inside the interval is plausible and any value outside the interval is not (Smithson, 2003).
       – DO NOT say there is a 95% probability that the population mean is between 42% and 98%.
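The "95 out of 100 intervals" reading can be checked with a small simulation. The sketch below is illustrative only: the normal population, the sample size of 30, and the function name `coverage_rate` are assumptions made for the example, and 2.045 is the two-sided 95% t critical value for 29 degrees of freedom taken from a standard table.

```python
import random
import statistics

def coverage_rate(trials=2000, n=30, mu=0.0, sigma=1.0, t_critical=2.045, seed=1):
    """Fraction of simulated 95% t-intervals that contain the true mean mu.

    Assumes a normal population; t_critical must match n - 1 degrees of freedom.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.mean(sample)
        margin = t_critical * statistics.stdev(sample) / n ** 0.5
        if mean - margin <= mu <= mean + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"{rate:.1%} of the simulated intervals contain the population mean")
```

The printed rate should land near 95%. Any single interval either contains the mean or it doesn't, which is why the confidence statement is about the method.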

  4. Confidence Intervals
     Affected by 3 things:
     • Confidence level: e.g., 95% confident
     • Variability of the population: estimated using the standard deviation
     • Sample size: usually the only thing a researcher can control
       – Confidence interval width has an inverse square root relationship with sample size. To halve the interval width, you must quadruple your sample size: ±20% error with a sample size of 20 means you need a sample size of 80 to achieve ±10% error.
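The inverse square root relationship can be turned into a quick planning calculation. This is a minimal sketch that assumes the variability and confidence level stay the same; `required_sample_size` is a hypothetical helper name.

```python
import math

def required_sample_size(current_n, current_margin, target_margin):
    """Scale the sample size by the squared ratio of the margins of error.

    Margin of error is proportional to 1 / sqrt(n), so n scales with 1 / margin**2.
    Assumes variability and confidence level are unchanged.
    """
    ratio = current_margin / target_margin
    # Small tolerance guards against floating-point noise before rounding up.
    return math.ceil(current_n * ratio ** 2 - 1e-9)

print(required_sample_size(20, 0.20, 0.10))  # the slide's example: 20 users at +/-20%
```

For the slide's example this gives 80: halving the margin requires four times the sample.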

  5. Confidence intervals for binary response questions
     Did the user complete the task? Did the user encounter problem X?
     • Yes or No, coded as 1 or 0
     • A sample completion rate (proportion) is the number of successes divided by the sample size.
     • What is the likely range for the completion rate of the full population?
       – Compute a binomial confidence interval around the sample proportion.
     • Problem: many computations are very inaccurate for small sample sizes. E.g., the Laplace/Wald interval found in most statistics texts:
       – Very inaccurate with sample sizes less than around 100
       – Inaccurate when the proportion is close to 0 or to 1
       – Instead of containing the proportion 95% of the time, its actual coverage can be as low as 50-60%.
       – More typically it contains the actual proportion about 70% of the time, so your calculated 95% interval is really a 70% confidence interval.
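To see why the Wald interval struggles with small samples and extreme proportions, here is a minimal sketch of the textbook formula (proportion plus or minus z times the standard error). The function name and the sample counts are made up for illustration.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Textbook Wald interval: p +/- z * sqrt(p * (1 - p) / n)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# 5 of 5 users succeed: the interval collapses to a single point at 100%.
print(wald_interval(5, 5))
# 9 of 10 users succeed: the upper bound exceeds 1, an impossible completion rate.
print(wald_interval(9, 10))
```

Both results are implausible for such small samples. These are the situations the slide warns about.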

  6. Exact Confidence Intervals
     • Unlike Wald intervals, these work even for small sample sizes.
     • Computationally intensive.
     • Conservative:
       – If you calculate a 95% exact confidence interval, it is guaranteed to contain the proportion at least 95 times out of 100. In practice the interval contains the proportion closer to 99% of the time.
       – This makes the interval wider than needed.
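The slide does not name a specific exact method. The sketch below assumes the commonly used Clopper-Pearson construction and finds each bound by bisection on the binomial distribution, which also shows why the approach is computationally heavier than Wald. All function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(g, target, iterations=60):
    """Bisection on [0, 1] for p with g(p) == target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(successes, n, confidence=0.95):
    """Clopper-Pearson interval (assumed here as the 'exact' method)."""
    alpha = 1 - confidence
    x = successes
    # Lower bound: the p at which seeing x or more successes has probability alpha/2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which seeing x or fewer successes has probability alpha/2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

print(exact_interval(7, 10))
```

For 7 successes out of 10 this gives roughly 35% to 93%, a wide interval that reflects the conservatism described above.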

  7. Adjusted Wald Intervals
     • Add 2 successes and 2 failures for 95% confidence intervals and then use the Wald formula.
       – Works well for small sample sizes
       – Works well when the proportion is close to 1 or to 0
     • The number of successes and failures to add depends on the confidence level desired. It is derived from the critical value z of the normal distribution for that confidence level: add z²/2 successes and z²/2 failures (1.96²/2 ≈ 2 each for 95%).
       – The critical value for 90% is 1.645
       – The critical value for 95% is 1.96
       – The critical value for 99% is 2.576
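A minimal sketch of the adjusted Wald calculation follows. It uses the standard form of the adjustment (z²/2 added to the successes and z² added to the sample size) and clamps the result to the range 0 to 1. The clamping and the function name are choices made for this example.

```python
import math
from statistics import NormalDist

def adjusted_wald_interval(successes, n, confidence=0.95):
    """Adjusted Wald interval: add z^2/2 successes and z^2/2 failures, then apply Wald."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] because a completion rate cannot fall outside that range.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

print(adjusted_wald_interval(7, 10))  # 7 of 10 users completed the task
print(adjusted_wald_interval(5, 5))   # all 5 users completed the task
```

For 7 of 10 the interval is roughly 39% to 90%. For 5 of 5 the lower bound is about 51% rather than the single point at 100% that the unadjusted formula would produce.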

  8. Adjusted Wald Interval / Wald Interval
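The formulas from this slide did not survive in the transcript. For reference, the standard forms of the two intervals are given below. The notation is assumed here rather than taken from the slide: x is the number of successes, n the sample size, and z the normal critical value for the chosen confidence level.

```latex
% Wald interval
\hat{p} = \frac{x}{n}, \qquad
\hat{p} \pm z \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}

% Adjusted Wald interval
n_{adj} = n + z^{2}, \qquad
\hat{p}_{adj} = \frac{x + z^{2}/2}{n_{adj}}, \qquad
\hat{p}_{adj} \pm z \sqrt{\frac{\hat{p}_{adj}\,(1 - \hat{p}_{adj})}{n_{adj}}}
```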

  9. Confidence intervals for rating scale questions
     How difficult was this task (Likert scale)?
     • Code the scale data: e.g., from very difficult = 1 to very easy = 7 for a 7-point Likert scale.
     • Compute the mean and standard deviation.
     • Determine the critical value from the t-distribution (table lookup).
       – The t-distribution takes sample size into account.
     • Compute the t-confidence interval.

  10. t-confidence Interval
      • The interval extends one margin of error on each side of the mean, so it is 2 margins of error wide: (mean - (margin of error)) to (mean + (margin of error))
      • Margin of error: (critical value from t-distribution) x (standard error)
      • Standard error is how much the sample mean can fluctuate given a sample size (standard deviation divided by the square root of the sample size)
        – Standard error describes the sample mean
        – Standard deviation describes the raw data
      • The confidence interval is calculated from the sample mean, standard error, sample size, and critical value from the t-distribution (table lookup based on sample size and desired confidence level)
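The steps above can be sketched in a few lines. The ratings are made-up 7-point responses, and the critical value 2.262 is the two-sided 95% table value for 9 degrees of freedom (10 responses). It is passed in by hand because Python's standard library has no t-distribution lookup.

```python
import math
import statistics

def t_confidence_interval(data, t_critical):
    """Mean +/- t_critical * (sample standard deviation / sqrt(n)).

    t_critical comes from a t-table for len(data) - 1 degrees of freedom.
    """
    n = len(data)
    mean = statistics.mean(data)
    std_error = statistics.stdev(data) / math.sqrt(n)
    margin = t_critical * std_error
    return mean - margin, mean + margin

ratings = [4, 5, 6, 5, 7, 6, 5, 4, 6, 5]  # hypothetical responses, 1 = very difficult, 7 = very easy
low, high = t_confidence_interval(ratings, t_critical=2.262)
print(f"mean {statistics.mean(ratings):.2f}, 95% CI {low:.2f} to {high:.2f}")
```

With these ratings the mean is 5.3, the standard error is 0.3, and the interval runs from about 4.62 to 5.98.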

  11. t-confidence intervals. Excel 2013: T.INV.2T() returns the two-tailed critical value, e.g., T.INV.2T(0.05, n-1) for a 95% interval.

  12. Statistical Analyses on Ordinal Data
      • Problem: scale data is ordinal data; many people believe it is wrong to use it for statistical analysis.
      • Many experts believe it is OK to perform statistical analysis with it (including t-tests, analysis of variance, factor analysis); you just have to make sure you don’t draw any conclusions that assume ratio or interval data.
        – Ex: The average response on design A is a 4 (e.g., “I like the design”), and on design B it is a 2 (“I don’t really like the design”). Assume a t-test indicates the difference is statistically significant.
          • You can ONLY claim there is a consistent difference between the responses.
          • You CANNOT claim that design A is twice as good as design B; this is a ratio data claim.
          • You CANNOT claim that the difference between the 4 and 2 is equal to what a difference between 4 and 6 would be; this is an interval claim.

  13. Confidence intervals for continuous questions
      How long does it take to do task X?
      • Task time data tends to be positively skewed, not symmetrically distributed.
      • We need a better measure of the center of the distribution than the mean.
      • The median may be a better center.
      • Problems:
        – Variability based on the number of samples: with an odd number it is the middle value; with an even number it is the average of the 2 middle values. With small sample sizes it can jump around a lot when just a few more samples are added.
        – Bias: with small samples the median of completion times tends to consistently overestimate the population median, whereas any mean is just as likely to overestimate as underestimate the population mean.
      • Better choice for small samples: geometric mean
        – Sauro/Lewis found that for sample sizes < 25, the geometric mean has less bias than the mean or median.
        – To compute the geometric mean:
          1. Convert the raw data to natural logs
          2. Find the mean of the transformed values
          3. Convert back by exponentiation
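The three geometric mean steps map directly onto code. This is a minimal sketch with made-up task times in seconds. Python 3.8+ also provides `statistics.geometric_mean`, but the steps are written out here to mirror the list above.

```python
import math
import statistics

def geometric_mean(times):
    """Geometric mean via the log transform (all times must be positive)."""
    logs = [math.log(t) for t in times]  # 1. convert raw data to natural logs
    mean_log = statistics.mean(logs)     # 2. mean of the transformed values
    return math.exp(mean_log)            # 3. convert back by exponentiation

task_times = [20, 25, 30, 35, 200]  # hypothetical, positively skewed by one slow user
print("arithmetic mean:", statistics.mean(task_times))
print("median:", statistics.median(task_times))
print("geometric mean:", round(geometric_mean(task_times), 1))
```

The one slow user pulls the arithmetic mean up to 62 seconds, while the geometric mean stays much closer to the bulk of the data.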

  14. Log transforming confidence intervals
      • Generate the confidence intervals using the natural logs
        – Compute the mean and standard deviation of the natural logs of the raw data; the mean of the logs is the natural log of the geometric mean.
        – Use these numbers as in the t-confidence intervals to compute the confidence interval on the log scale.
        – Exponentiate the endpoints to get the confidence interval in the original units.
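A sketch of the log-transformed interval, under the same assumptions as an ordinary t-interval. The task times are made up, and 2.776 is the two-sided 95% t-table value for 4 degrees of freedom (5 observations), supplied by hand.

```python
import math
import statistics

def log_t_interval(times, t_critical):
    """t-interval computed on natural logs, then exponentiated back.

    Returns (lower bound, geometric mean, upper bound) in the original units.
    """
    logs = [math.log(t) for t in times]
    log_mean = statistics.mean(logs)  # natural log of the geometric mean
    margin = t_critical * statistics.stdev(logs) / math.sqrt(len(logs))
    return (math.exp(log_mean - margin),
            math.exp(log_mean),
            math.exp(log_mean + margin))

task_times = [20, 25, 30, 35, 200]  # hypothetical times in seconds
low, center, high = log_t_interval(task_times, t_critical=2.776)
print(f"geometric mean {center:.1f}s, 95% CI {low:.1f}s to {high:.1f}s")
```

The interval is symmetric on the log scale but not in seconds: the upper bound sits farther from the geometric mean than the lower bound, which suits positively skewed time data.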

  15. ln-based transform confidence intervals

  16. Median confidence intervals
      • If the sample size is > 25, compute the confidence interval around the median using the z-distribution (also called the normal distribution).
      • Similar computation to t-confidence intervals: (sample size) x (0.5) ± ((critical value from z-distribution) x (standard error)). The two results are the positions (ranks) in the sorted data of the values that bound the interval.
        – 0.5 is for the median; the 75th percentile (0.75, higher than 75% of all the values) or any other percentile could be used instead
        – Standard error is the square root of: ((sample size) x (0.5) x (1 - 0.5))
          • Again, 0.5 is for the median and any other percentile can be used
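The calculation can be sketched as follows. The formula yields positions in the sorted data, which are then mapped to observed values. Rounding the positions outward (down for the lower bound, up for the upper) is an assumption made here to keep the interval conservative; other rounding conventions exist. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def percentile_interval(values, percentile=0.5, confidence=0.95):
    """Normal-approximation interval around a percentile (the median by default)."""
    data = sorted(values)
    n = len(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    center = n * percentile                                  # (sample size) x (0.5)
    std_error = math.sqrt(n * percentile * (1 - percentile))
    # Ranks are 1-based positions in the sorted data; round outward and keep in range.
    lower_rank = max(1, math.floor(center - z * std_error))
    upper_rank = min(n, math.ceil(center + z * std_error))
    return data[lower_rank - 1], data[upper_rank - 1]

task_times = list(range(1, 31))  # 30 hypothetical times; each value equals its own rank
print(percentile_interval(task_times))
```

With 30 observations the formula gives 15 ± 5.37, so the interval runs from the 9th to the 21st sorted value under this rounding choice.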

  17. Using a median with binomial distribution to estimate confidence intervals
