Statistical Methods
Robert W. Lindeman
WPI, Dept. of Computer Science
gogo@wpi.edu

Descriptive Methods
- Frequency distributions
  - How many people were similar, in the sense that they ended up in the same bin of the dependent variable
  - Table
  - Histogram (vs. bar graph)
  - Frequency polygon
  - Pie chart
R.W. Lindeman - WPI Dept. of Computer Science
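
A frequency table can be sketched in a few lines of Python. This is a minimal illustration, assuming equal-width bins and made-up integer scores; `frequency_table` is a hypothetical helper, and the rows of asterisks stand in for the bars of a histogram.

```python
from collections import Counter


def frequency_table(scores, bin_width):
    """Count how many scores land in each equal-width bin.

    Bins are half-open intervals [start, start + bin_width), and the first
    bin starts at the smallest score. Returns (bin_start, count) pairs in
    ascending order, including empty bins.
    """
    low = min(scores)
    # Index of the bin each score falls into.
    counts = Counter(int((s - low) // bin_width) for s in scores)
    n_bins = max(counts) + 1
    return [(low + i * bin_width, counts.get(i, 0)) for i in range(n_bins)]


# Hypothetical integer scores on some dependent variable.
scores = [3, 5, 5, 6, 7, 7, 7, 8, 9, 12]
width = 2
for start, count in frequency_table(scores, width):
    # One row per bin: the integer score range, then one '*' per person.
    print(f"{start:>3}-{start + width - 1:<3} {'*' * count}")
```

Empty bins are kept on purpose, so gaps in the data stay visible, as they would in a histogram.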

Descriptive Methods (cont.)
- Distributional shape
  - Normal distribution (bell curve)
  - Skewed distribution
    - Positively skewed (tail points toward high scores)
    - Negatively skewed (tail points toward low scores)
  - Multimodal (e.g., bimodal)
  - Rectangular
  - Kurtosis
    - Leptokurtic: higher peak and heavier (thicker) tails than the normal distribution
    - Platykurtic: flatter peak and lighter (thinner) tails than the normal distribution
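
Skew and kurtosis can be quantified rather than only eyeballed. The sketch below assumes the simple moment-based formulas (dividing by n); statistics packages often apply small-sample adjustments, so their values may differ somewhat. `shape_stats` is a hypothetical helper and the data are invented.

```python
from statistics import fmean


def shape_stats(scores):
    """Return (skewness, excess kurtosis) using simple moment formulas.

    Skewness > 0 suggests a positive skew (long tail toward high scores),
    < 0 a negative skew. Excess kurtosis > 0 suggests a leptokurtic shape,
    < 0 a platykurtic one. A normal distribution has 0 for both.
    """
    n = len(scores)
    m = fmean(scores)
    # Central moments of order 2, 3 and 4 (divided by n, not n - 1).
    m2 = sum((x - m) ** 2 for x in scores) / n
    if m2 == 0:
        raise ValueError("all scores are identical; shape is undefined")
    m3 = sum((x - m) ** 3 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical data sets.
print(shape_stats([1, 1, 1, 2, 10]))  # long tail toward high scores
print(shape_stats([1, 2, 3, 4, 5]))   # flat, rectangular-like shape
```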

Descriptive Methods (cont.)
- Central tendency
  - Mode: the most frequent score
  - Median: the point that divides the ordered scores into two equally sized halves
  - Mean: the sum of the scores divided by the number of scores
- Normal distribution: mode ≈ median ≈ mean
- Positive skew: mode < median < mean
- Negative skew: mean < median < mode
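
Python's standard `statistics` module computes all three measures directly. The scores below are hypothetical and deliberately positively skewed (a few large values), so the ordering mode < median < mean should appear.

```python
from statistics import mean, median, mode

# Hypothetical, positively skewed scores: most are low, a few are high.
scores = [2, 3, 3, 3, 4, 4, 5, 6, 9, 15]

m_o = mode(scores)    # most frequent score
m_d = median(scores)  # middle of the ordered scores
m_n = mean(scores)    # sum of scores / number of scores
print(m_o, m_d, m_n)  # prints: 3 4.0 5.4
```

The two large scores (9 and 15) pull the mean upward much more than the median, which is why the mean lands on the side of the long tail.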

Descriptive Methods (cont.)
- Measures of variability
  - Dispersion (degree of sameness among the scores)
    - Homogeneous vs. heterogeneous
  - Range: max - min of all the scores
  - Interquartile range: max - min of the middle 50% of the scores (Q3 - Q1)
  - Box-and-whisker plot
  - Standard deviation (SD, s, σ, or sigma)
    - Rule of thumb: range ≈ 4 × SD
  - Variance (s² or σ²): the square of the SD
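
The measures of variability above map onto standard-library calls. This is a sketch with invented scores; note that `quantiles` uses one of several quartile conventions, so other software may report a slightly different interquartile range. `variability` is a hypothetical helper.

```python
from statistics import quantiles, stdev, variance


def variability(scores):
    """Summarize the spread of a list of scores (needs at least 2 scores)."""
    q1, _, q3 = quantiles(scores, n=4)  # quartile cut points
    return {
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,                 # spread of the middle 50% of scores
        "sd": stdev(scores),            # sample SD (divides by n - 1)
        "variance": variance(scores),   # sample variance = SD squared
    }


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
v = variability(scores)
print(v)
print("4 x SD =", 4 * v["sd"], "vs. range =", v["range"])
```

The last line compares the actual range with the 4 × SD rule of thumb; for small samples like this one the estimate is only rough.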

Descriptive Methods (cont.)
- Standard scores
  - How many SDs a score is from the mean
  - z-score: mean = 0, each SD = ±1
    - A z-score of +2.0 means the score is 2 SDs above the mean
  - T-score: mean = 50, each SD = ±10
    - A T-score of 70 means the score is 2 SDs above the mean
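
Both standard scores follow from z = (score - mean) / SD and T = 50 + 10z. The sketch below assumes the SD of the group itself (`pstdev`, dividing by n) is the reference; the helper names and data are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Distance of each score from the group mean, in SD units."""
    m = mean(scores)
    sd = pstdev(scores)  # SD of this group of scores (divides by n)
    return [(x - m) / sd for x in scores]


def t_scores(scores):
    """Rescale z-scores to a mean of 50 and an SD of 10."""
    return [50 + 10 * z for z in z_scores(scores)]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores; mean 5, SD 2
print(z_scores(scores))
print(t_scores(scores))
```

Here the score 9 lies 2 SDs above the mean, so its z-score is 2.0 and its T-score is 70.0, matching the examples on the slide.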

Bivariate Correlation
- Discover whether a relationship exists
- Determine the strength of the relationship
- Types of relationship
  - High-high, low-low
  - High-low, low-high
  - Little systematic tendency

Bivariate Correlation (cont.)
- Scatter plot
- Correlation coefficient r, ranging from -1.00 through 0.00 to +1.00
  - Toward -1.00: negatively correlated, inverse relationship (high-low, low-high)
  - Toward +1.00: positively correlated, direct relationship (high-high, low-low)
  - Strength: strong near -1.00 and +1.00, weak near 0.00
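
The slide does not say which coefficient it means; assuming the usual Pearson r, it can be computed from the paired deviations around each mean. `pearson_r` is a hypothetical helper and the practice/error data are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Co-variation of the pairs relative to the spread of each variable.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]   # hypothetical: hours of practice
errors = [9, 7, 6, 4, 2]  # hypothetical: errors made
print(pearson_r(hours, errors))  # close to -1.00: strong inverse relationship
```

If either variable is constant, r is undefined and this sketch raises `ZeroDivisionError`.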

Bivariate Correlation (cont.)
- Quantitative variables
  - Measurable aspects that vary in terms of intensity
  - Rank (ordinal scale): each subject is put into a single bin among a set of ordered bins
  - Raw score: the actual value for a given subject; could be a composite score from several measured variables
- Qualitative variables
  - Which categorical group does one belong to?
    - E.g., "I prefer the Grand Canyon over Mount Rushmore"
  - Nominal: unordered bins
  - Dichotomy: two groups (e.g., infielders vs. outfielders)

Reliability and Validity
- Reliability
  - To what extent can we say that the data are consistent?
- Validity
  - A measuring instrument is valid to the extent that it measures what it purports to measure.

Inferential Statistics
- Definition: to make statements that go beyond description
- Generalize
  - A sample is extracted from a population
  - Measurement is done on this sample
  - Analysis is done
  - An educated guess is made about how the results apply to the population as a whole

Motivation
- Actually testing the whole population is too costly (time/money)
  - "Tangible population"
- The population extends into the future
  - "Abstract population"
- Four questions
  - What is/are the relevant population(s)?
  - How will the sample be extracted?
  - What characteristic of those sampled will serve as the measurement target?
  - What will be the study's statistical focus?

Statistical Focus
- What statistical tools should be used?
- Even if we want the "average," which measure of average should we use?

Estimation
- Sampling error
  - The amount by which a sample value differs from the population value
  - This does not mean there was an error in the method of sampling; it is part of the natural behavior of samples
    - Samples seldom turn out to mirror the population exactly
- Sampling distribution
  - The distribution of the results of many samplings of the same population
- Standard error
  - The SD of the sampling distribution
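
Sampling error, the sampling distribution, and the standard error can all be seen in a small simulation. The sketch assumes a hypothetical population (the integers 1 to 1000) and samples drawn with replacement, so the simulated standard error of the mean should land close to the textbook value σ / √n. The helper name and sizes are illustrative choices.

```python
import random
from statistics import fmean, pstdev, stdev


def sample_means(population, sample_size, n_samples, seed=0):
    """Means of many random samples (drawn with replacement) from a population."""
    rng = random.Random(seed)
    return [fmean(rng.choices(population, k=sample_size)) for _ in range(n_samples)]


population = list(range(1, 1001))  # hypothetical population of 1000 scores
mu = fmean(population)
means = sample_means(population, sample_size=25, n_samples=2000)

# Sampling error of the first sample: how far its mean misses the population mean.
print("sampling error:", means[0] - mu)
# Standard error: SD of the sampling distribution, vs. the formula sigma / sqrt(n).
se_simulated = stdev(means)
se_formula = pstdev(population) / 25 ** 0.5
print(se_simulated, se_formula)
```

The list `means` is an empirical sampling distribution; its SD is the standard error, and no single sample mean matches the population mean exactly.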

Analyses of Variance (ANOVAs)
- Determine whether the population means behind two (or more) samples differ
- If we've been careful, we can say that the treatment is the source of the differences
  - Need to make sure we have controlled everything else!
    - Treatment order
    - Sample creation
    - Normal distribution of the scores in each group's population
    - Equal variance of the groups

Types of ANOVAs
- Simple (one-way) ANOVA
  - One independent variable
  - One dependent variable
  - Between-subjects design
- Two-way ANOVA
  - Two independent variables
  - One dependent variable
  - Between-subjects design
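
For the simple (one-way, between-subjects) case, the F ratio compares the variation between group means with the variation within groups. This is a minimal sketch that stops at F and its degrees of freedom; turning F into a p-value needs the F distribution (for example via SciPy's `scipy.stats.f_oneway`, if SciPy is available). The helper name and timing data are hypothetical.

```python
from statistics import fmean


def one_way_anova(*groups):
    """F ratio for a one-way between-subjects ANOVA.

    Returns (F, df_between, df_within). Compare F with the critical value of
    the F distribution for those degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean.
    ss_within = sum(sum((x - fmean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical task-completion times (s) for three independent groups.
a = [12, 14, 11, 13]
b = [15, 17, 16, 18]
c = [20, 19, 22, 21]
print(one_way_anova(a, b, c))
```

A large F means the group means differ by much more than the scatter inside each group would lead us to expect by chance.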

Types of ANOVAs (cont.)
- One-way repeated-measures ANOVA
  - One independent variable
  - One dependent variable
  - Within-subjects design
- Two-way repeated-measures ANOVA
  - Two independent variables
  - One dependent variable
  - Within-subjects design

Types of ANOVAs (cont.)
- Main effects vs. interaction effect
  - A main effect can be present in conjunction with other effects (another main effect and/or an interaction)
- Post-hoc tests
  - Tukey's HSD test: equal sample sizes
  - Scheffé test: unequal sample sizes

Types of ANOVAs (cont.)
- Mixed ANOVA (combines between-subjects and within-subjects factors)
  - Example: 2 x 3 design
    - Time of day (2 levels)
    - Real Walking / Walking in-place / Joystick (3 levels)

References
- Schuyler W. Huck, Reading Statistics and Research, Fourth Edition, Pearson Education Inc., 2004.