Evaluation Robert W. Lindeman Worcester Polytechnic Institute - PowerPoint PPT Presentation

CS-525V: Building Effective Virtual Worlds
Evaluation
Robert W. Lindeman
Worcester Polytechnic Institute
Department of Computer Science
gogo@wpi.edu

Measuring Effectiveness
- How do we know if our world/technique/application/etc. is effective?
- Is this a binary thing?
- Why measure this?
- How can we measure?
R.W. Lindeman - WPI Dept. of Computer Science

Qualitative vs. Quantitative
- Qualitative
  - Look at the data, and draw conclusions
- Quantitative
  - Form a hypothesis, and test it
- Both are effective; quantitative is less time-consuming to do

Objective vs. Subjective Measures
- Objective
  - Measure using performance metrics
  - Speed, accuracy, etc.
- Subjective
  - Measure using questionnaires, interviews, etc.
- Either kind can be gathered using quantitative or qualitative means

Descriptive Methods
- Frequency distributions
  - How many people were similar, in the sense that they ended up in the same bin of the dependent variable
  - Table
  - Histogram (vs. bar graph)
  - Frequency polygon
  - Pie chart
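As a rough illustration of a frequency distribution, the sketch below groups scores into bins and counts how many fall into each. The scores and the bin width of 10 are made-up values chosen only for the example.

```python
from collections import Counter

# Hypothetical scores on some dependent variable (illustrative values only).
scores = [3, 7, 12, 15, 18, 21, 22, 29]
bin_width = 10  # assumed bin width

# Map each score to the lower edge of its bin, then count scores per bin.
frequencies = Counter((score // bin_width) * bin_width for score in scores)

# Print a simple text histogram, one row per bin.
for lower_edge in sorted(frequencies):
    count = frequencies[lower_edge]
    print(f"{lower_edge:>3}-{lower_edge + bin_width - 1:<3} {'*' * count}")
```

The same counts could feed a table, a histogram, or a frequency polygon; only the presentation differs.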

Descriptive Methods (cont.)
- Distributional shape
  - Normal distribution (bell curve)
  - Skewed distribution
    - Positively skewed (tail pointing high)
    - Negatively skewed (tail pointing low)
  - Multimodal (bimodal)
  - Rectangular
  - Kurtosis
    - High peak/heavy tails (leptokurtic)
    - Low peak/light tails (platykurtic)

Descriptive Methods (cont.)
- Central tendency
  - Mode
    - Most frequent score
  - Median
    - Divides the scores into two equally sized parts
  - Mean
    - Sum of the scores divided by the number of scores
- Normal distribution: mode ≈ median ≈ mean
- Positive skew: mode < median < mean
- Negative skew: mean < median < mode
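A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The scores are hypothetical and deliberately include one large value, so the data set is positively skewed.

```python
import statistics

# Hypothetical task-completion times in seconds (made up for illustration).
# The single large value (40) gives the data a positive skew.
scores = [12, 14, 14, 15, 16, 18, 40]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle score of the sorted list
mean = statistics.mean(scores)      # sum of the scores / number of scores

print(f"mode={mode}, median={median}, mean={mean:.2f}")
```

For this sample the ordering is mode < median < mean, matching the positive-skew pattern listed above.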

Descriptive Methods (cont.)
- Measures of variability
  - Dispersion (level of sameness)
  - Range
    - max - min of all the scores
  - Interquartile range
    - max - min of the middle 50% of scores
  - Box-and-whisker plot
  - Standard deviation (SD, s, σ, or sigma)
    - Good estimate of range: 4 * SD
  - Variance (s² or σ²)
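The measures of variability can be sketched with the standard `statistics` module as follows. The scores are hypothetical, and the quartiles use the module's default interpolation method, which is only one of several conventions for computing an interquartile range.

```python
import statistics

# Hypothetical scores (illustrative values only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)       # max - min of all the scores

# Quartile cut points; the IQR spans the middle 50% of the scores.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Treating the scores as a whole population (sigma, sigma squared).
sd = statistics.pstdev(scores)
variance = statistics.pvariance(scores)

# Rule of thumb from the slide: the range is roughly 4 * SD.
estimated_range = 4 * sd

print(score_range, iqr, sd, variance, estimated_range)
```

Here the rule of thumb gives 8 against an actual range of 7, which shows it is an estimate rather than an identity. Use `statistics.stdev` and `statistics.variance` instead when the scores are a sample rather than the full population.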

Descriptive Methods (cont.)
- Standard scores
  - How many SDs a score is from the mean
  - z-score: mean = 0, each SD = +/-1
    - z-score of +2.0 means the score is 2 SDs above the mean
  - T-score: mean = 50, each SD = +/-10
    - T-score of 70 means the score is 2 SDs above the mean
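A small sketch of the two standard scores. The function names and the example distribution (mean 100, SD 15) are assumptions made for illustration.

```python
def z_score(raw_score: float, mean: float, sd: float) -> float:
    """Number of SDs the raw score lies above (+) or below (-) the mean."""
    return (raw_score - mean) / sd


def t_score(z: float) -> float:
    """Rescale a z-score so that the mean is 50 and each SD is 10 points."""
    return 50 + 10 * z


# Hypothetical distribution with mean 100 and SD 15 (illustrative values).
z = z_score(130, mean=100, sd=15)
t = t_score(z)
print(z, t)
```

A raw score of 130 is 2 SDs above the mean in this distribution, so it maps to a z-score of +2.0 and a T-score of 70.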

Bivariate Correlation
- Discover whether a relationship exists
- Determine the strength of the relationship
- Types of relationship
  - High-high, low-low
  - High-low, low-high
  - Little systematic tendency

Bivariate Correlation (cont.)
- Scatter plot
- Correlation coefficient: r, ranging from -1.00 through 0.00 to +1.00
  - r near -1.00: negatively correlated; inverse relationship; high-low, low-high; strong
  - r near 0.00: weak
  - r near +1.00: positively correlated; direct relationship; high-high, low-low; strong
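One common correlation coefficient is Pearson's r, sketched below from its definition. The slides do not say which coefficient is meant, so treating r as Pearson's r is an assumption, and the paired data are invented for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations for each variable.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


# Hypothetical paired measurements (illustrative values only).
hours_practiced = [1, 2, 3, 4, 5]
task_score = [2, 4, 6, 8, 10]      # high-high, low-low
error_count = [10, 8, 6, 4, 2]     # high-low, low-high

print(pearson_r(hours_practiced, task_score))   # close to +1: direct relationship
print(pearson_r(hours_practiced, error_count))  # close to -1: inverse relationship
```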

Bivariate Correlation (cont.)
- Quantitative variables
  - Measurable aspects that vary in terms of intensity
  - Rank; ordinal scale: Each subject can be put into a single bin among a set of ordered bins
  - Raw score: Actual value for a given subject. Could be a composite score from several measured variables
- Qualitative variables
  - Which categorical group does one belong to?
    - E.g., I prefer the Grand Canyon over Mount Rushmore
  - Nominal: Unordered bins
  - Dichotomy: Two groups (e.g., infielders vs. outfielders)

Reliability and Validity
- Reliability
  - To what extent can we say that the data are consistent?
- Validity
  - A measuring instrument is valid to the extent that it measures what it purports to measure.

Inferential Statistics
- Definition: To make statements beyond description
  - Generalize
- A sample is extracted from a population
- Measurement is done on this sample
- Analysis is done
- An educated guess is made about how the results apply to the population as a whole

Motivation
- Actual testing of the whole population is too costly (time/money)
  - "Tangible population"
- Population extends into the future
  - "Abstract population"
- Four questions
  - What is/are the relevant population(s)?
  - How will the sample be extracted?
  - What characteristic of those sampled will serve as the measurement target?
  - What will be the study's statistical focus?

Statistical Focus
- What statistical tools should be used?
- Even if we want the "average," which measure of average should we use?

Estimation
- Sampling error
  - The amount by which a sample value differs from the population value
  - This does not mean there was an error in the method of sampling; it is part of the natural behavior of samples
  - Samples seldom turn out to exactly mirror the population
- Sampling distribution
  - The distribution of results of several samplings of the population
- Standard error
  - SD of the sampling distribution
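The three ideas above can be sketched with a small simulation. Everything numeric here is an assumption chosen for illustration: a synthetic population of 10,000 scores, samples of 25, and 2,000 repeated samplings. The statistic being sampled is the mean.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 scores drawn around mean 100, SD 15.
population = [random.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

sample_size = 25
sample_means = []
for _ in range(2_000):
    sample = random.sample(population, sample_size)
    sample_means.append(statistics.mean(sample))

# Sampling error of one sample: its mean minus the population mean.
sampling_error = sample_means[0] - population_mean

# Standard error: the SD of the sampling distribution of the mean.
empirical_se = statistics.stdev(sample_means)

# Textbook approximation for comparison: population SD / sqrt(sample size).
theoretical_se = statistics.pstdev(population) / math.sqrt(sample_size)

print(f"sampling error of first sample: {sampling_error:.2f}")
print(f"empirical SE: {empirical_se:.2f}, approximation: {theoretical_se:.2f}")
```

The list `sample_means` plays the role of the sampling distribution, and its SD should land close to the approximation on the last line.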

Analyses of Variance (ANOVAs)
- Determine whether the means of two (or more) samples are different
- If we've been careful, we can say that the treatment is the source of the differences
- Need to make sure we have controlled everything else!
  - Treatment order
  - Sample creation
  - Normal distribution of the sample
  - Equal variance of the groups

Types of ANOVAs
- Simple (one-way) ANOVA
  - One independent variable
  - One dependent variable
  - Between-subjects design
- Two-way ANOVA
  - Two independent variables, and/or
  - Two dependent variables
  - Between-subjects design
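For the simple (one-way, between-subjects) case, the F statistic compares variability between the group means with variability inside the groups. The sketch below computes it by hand. The function name and the three groups of scores are hypothetical, and turning F into a p-value (which needs the F distribution) is left out.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic for a one-way between-subjects ANOVA over lists of scores."""
    all_scores = [score for group in groups for score in group]
    grand_mean = statistics.mean(all_scores)

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(
        (score - statistics.mean(group)) ** 2
        for group in groups
        for score in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores for three levels of one independent variable.
condition_a = [1, 2, 3]
condition_b = [2, 3, 4]
condition_c = [5, 6, 7]

f_value = one_way_anova_f([condition_a, condition_b, condition_c])
print(f"F(2, 6) = {f_value:.2f}")
```

A large F suggests the group means differ by more than the within-group spread would explain, subject to the assumptions listed earlier (normality, equal variances, controlled confounds).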

Types of ANOVAs (cont.)
- One-way repeated-measures ANOVA
  - One independent variable
  - One dependent variable
  - Within-subjects design
- Two-way repeated-measures ANOVA
  - Two independent variables, and/or
  - Two dependent variables
  - Within-subjects design

Types of ANOVAs (cont.)
- Main effects vs. interaction effect
  - Main effects present in conjunction with other effects
- Post-hoc tests
  - Tukey's HSD test
    - Equal sample sizes
  - Scheffé test
    - Unequal sample sizes

Types of ANOVAs (cont.)
- Mixed ANOVA
  - 2 x 3
    - Time of day
    - Real Walking / Walking in-place / Joystick

References
- Schuyler W. Huck, Reading Statistics and Research, Fourth Edition, Pearson Education Inc., 2004.
