
Statistics Toolbox in A Review of Analysis Techniques for - PowerPoint PPT Presentation



  1. Statistics Toolbox in A Review of Analysis Techniques for Scientific Research Professional Development Opportunity for the Flow Cytometry Core Facility October 12, 2018 LKG Consulting Email: consulting.lkg@gmail.com Website: www.consultinglkg.com

  2. The goal of this workshop is to give you the knowledge & tools to be confident in your ability to collect & analyze your data as well as correctly interpret your results… Think of me as your new resource!

  3. A little about me… Laura Gray-Steinhauer (www.ualberta.ca/~lkgray)
     • BSc in Mathematics, Statistics and Environmental Studies (UVIC, 2005)
     • MSc in Forest Biology and Management (UofA, 2008)
     • PhD in Forest Biology and Management (UofA, 2011)
     • Designated Professional Statistician with The Statistical Society of Canada (2014)
     • Research: Climate Change, Policy Evaluation, Adaptation, Mitigation, Risk management for forest resources, Conservation…

  4. Workshop Schedule
     8:15 – 8:30 Arrive to the Lab & Start up the computers
     8:30 – 8:45 Welcome to the Workshop (housekeeping & today’s goals)
     8:45 – 9:15 Statistics Toolbox: Refresh useful vocabulary, introduce a decision tree to plan your analysis path
     9:15 – 9:45 Hypothesis Testing: Refresher on p-values, Type 1 and Type 2 error, and statistical power
     9:45 – 10:00 Break
     10:00 – 11:00 Parametric versus Non-Parametric Tests: Testing for parametric assumptions, ANOVA, Permutational ANOVA
     11:00 – 11:30 Multivariate Statistics: Introduction to principal component analysis (PCA)
     11:30 – 1:00 Work period (questions are welcome)
     After 1:00 Enjoy your weekend!
     This may be A LOT of information to absorb OR we may not cover the specific topic you came to learn in class today. Feel free to reach out to me via email with more questions: consulting.lkg@gmail.com.

  5. Workbook
     • Yours to keep!
     • R code is identified by Century Gothic font (everything else is Arial)
     • Arbitrary object names are bold to indicate these could change depending on what you name your variables.
     • Referenced data is provided at www.ualberta.ca/~lkgray
     • Please contact me to obtain permission to redistribute content outside of the workshop attendees.
     Topics Included:
     • Descriptive statistics
     • Confidence intervals
     • Data distributions
     • Parametric assumptions
     • T-tests
     • ANOVA
     • ANCOVA
     • Non-parametric tests
     • Permutational ANOVA & T-tests
     • Z-test for Proportions
     • Chi-squared test
     • Outlier tests and treatments
     • Correlation
     • Linear regression
     • Multiple linear regression
     • Akaike Information Criterion
     • Non-linear regression
     • Logistic regression
     • Binomial ANOVA
     • Principal component analysis (PCA)
     • Discriminant analysis
     • Multivariate analysis of variance (MANOVA)

  6. R Project Website https://cran.r-project.org/index.html

  7. RStudio (IDE: Integrated Development Environment) Preferred among programmers; we will use it in this workshop. https://www.rstudio.com/

  8. Statistics Toolbox “Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.” Aaron Levenstein (Author)

  9. Statistical Vocabulary

     | Statistical Term | Real World | Research World |
     | --- | --- | --- |
     | Population | Class of things. E.g. cancer patients | What you want to learn about. E.g. cancer patients in Alberta |
     | Sample | Group representing a class. E.g. 1000 cancer patients in Alberta | What you actually study. E.g. 1000 cancer patients from 10 treatment centres in Alberta |
     | Experimental Unit | Individual thing. E.g. each of the 1000 cancer patients | Individual research subject. E.g. cancer patients n=1000; hospital populations n=10 (depends on research question) |
     | Dependent Variable | Property of things. E.g. white blood cell count | What you measure about subjects. E.g. white blood cell count |
     | Independent Variable | Environment of things. E.g. treatment options, climate, etc. | What you think might influence the dependent variable. E.g. amount of treatment, combination of treatments, etc. |
     | Data | Values of variables | What you record/information you collect |

  10. Other important statistical terms (also see Appendix 1 in your workbook)
     • Experiment – any controlled process of study which results in data collection, and for which the outcome is unknown
     • Descriptive statistics – numerical/graphical summary of data
     • Inferential statistics – predict or control the values of variables (make conclusions with)
     • Statistical inference – makes use of information from a sample to draw conclusions (inferences) about the population from which the sample was taken
     • Parameter – an unknown value (needs to be estimated) used to represent a population characteristic (e.g. population mean)
     • Statistic – estimation of a parameter (e.g. mean of a sample)
     • Sampling distribution (a.k.a. probability distribution or probability density function) – probability associated with each possible value of a variable
     • Error – difference between an observed (or calculated) value and its true (or expected) value
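
To make the difference between a parameter, a statistic and error concrete, here is a minimal R sketch. The "population" is simulated, and the object names (`population`, `wbc_sample`) and the chosen mean and standard deviation are assumptions for illustration only, not workshop data.

```r
# Hypothetical "population": 10,000 simulated white blood cell counts
set.seed(42)
population <- rnorm(10000, mean = 7, sd = 1.5)

# Parameter: the population mean (normally unknown in real research)
mu <- mean(population)

# Sample: the 100 values we actually study
wbc_sample <- sample(population, size = 100)

# Statistic: the sample mean, our estimate of the parameter
x_bar <- mean(wbc_sample)

# Error: difference between the estimate and the true value
sampling_error <- x_bar - mu

# Standard error of the mean and a 95% confidence interval for the parameter
n  <- length(wbc_sample)
se <- sd(wbc_sample) / sqrt(n)
ci <- x_bar + c(-1, 1) * qt(0.975, df = n - 1) * se

print(c(parameter = mu, statistic = x_bar, error = sampling_error))
print(ci)
```

The hand-computed interval should match the one reported by `t.test(wbc_sample)$conf.int`, which is the more usual way to obtain it.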

  11. Statistics Toolbox
     • What is the goal of my analysis?
     • How many variables do I want to include in my analysis?
     • What kind of data do I have to answer my research question?
     • Does my data meet the analysis assumptions?

  12. Statistics Toolbox

     | Analysis Goal | Parametric (assumptions met) | Non-Parametric (alternative if assumptions fail) | Binomial (binary data/event likelihood) |
     | --- | --- | --- | --- |
     | Describe data characteristics | Mean, standard deviation, standard error, etc. | Median, quartiles, percentiles | Proportions |
     | Compare 2 distinct/independent groups | T-test, paired t-test | Wilcoxon rank-sum test, Kolmogorov-Smirnov test, permutational t-test | Z-test for proportions |
     | Compare > 2 distinct/independent groups | ANOVA, multi-way ANOVA, ANCOVA, blocking | Kruskal-Wallis test, Friedman rank test, permutational ANOVA | Chi-squared test, binomial ANOVA |
     | Estimate the degree of association between 2 variables | Pearson’s correlation | Spearman rank correlation, Kendall’s rank correlation | Logistic regression |
     | Predict outcome based on relationship | Linear regression, multiple linear regression | Non-linear regression | Logistic regression, odds ratio |

     Probability distributions are always appropriate to describe data. Graphics are always appropriate to describe data.
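
The "describe data characteristics" goal can be sketched in base R as follows. R's built-in `mtcars` dataset is used purely as a stand-in (an assumption, not workshop data), with `mpg` as a continuous variable and `am` as a binary one.

```r
# Stand-in data: R's built-in mtcars dataset (not workshop data)
mpg <- mtcars$mpg

# Parametric summaries: mean, standard deviation, standard error
mpg_mean <- mean(mpg)
mpg_sd   <- sd(mpg)
mpg_se   <- mpg_sd / sqrt(length(mpg))

# Non-parametric summaries: median and quartiles
mpg_median    <- median(mpg)
mpg_quartiles <- quantile(mpg, probs = c(0.25, 0.50, 0.75))

# Binomial summary: proportion of cars with a manual transmission (am == 1)
prop_manual <- mean(mtcars$am == 1)

print(c(mean = mpg_mean, sd = mpg_sd, se = mpg_se, median = mpg_median))
print(mpg_quartiles)
print(prop_manual)
```

For the graphical side, `hist(mpg)` or `boxplot(mpg ~ am, data = mtcars)` would be reasonable starting points.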

  13. If you have a continuous response variable … and one predictor variable
     Test categories: Parametric, Non-Parametric, Regression, Binomial
     Predictor is categorical, two treatment levels
     • Tests: T-Test, Permutational T-Test, Kolmogorov-Smirnov (KS) Test, Wilcoxon Test
     • What you get: P-value indicating if 2 groups are significantly different
     Predictor is categorical, > two treatment levels
     • Tests: One-Way ANOVA, Kruskal-Wallis Test, Friedman Rank Test
     • What you get: P-value indicating there is a significant effect of “treatment”. Need pairwise comparisons to find where the difference between groups occurs.
     Predictor is continuous
     • Tests: Pearson’s Correlation, Spearman’s Rank Correlation, Kendall’s Rank Correlation, Linear/Non-linear Regression
     • What you get: Correlation coefficient indicating direction and magnitude of relationship. “Goodness of fit” indicating how well predictor is linked to response (R² or AIC).
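
The one-predictor branch of the decision tree can be sketched in base R. R's built-in `ToothGrowth`, `PlantGrowth` and `cars` datasets are stand-ins chosen for illustration (an assumption, not the workshop's data); the permutational tests are not shown because they are not part of base R.

```r
# --- Categorical predictor, two treatment levels (ToothGrowth: len by supp) ---
t_res  <- t.test(len ~ supp, data = ToothGrowth)                     # parametric
wx_res <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE) # non-parametric

# --- Categorical predictor, more than two levels (PlantGrowth: weight by group) ---
aov_fit <- aov(weight ~ group, data = PlantGrowth)                   # one-way ANOVA
aov_p   <- summary(aov_fit)[[1]][["Pr(>F)"]][1]
kw_res  <- kruskal.test(weight ~ group, data = PlantGrowth)          # non-parametric

# Pairwise comparisons to find where the group differences occur
pairwise <- TukeyHSD(aov_fit)

# --- Continuous predictor (cars: stopping distance vs speed) ---
r_pearson  <- cor.test(cars$speed, cars$dist, method = "pearson")
r_spearman <- cor.test(cars$speed, cars$dist, method = "spearman", exact = FALSE)
lm_fit     <- lm(dist ~ speed, data = cars)

# "Goodness of fit": R-squared and AIC
r_squared <- summary(lm_fit)$r.squared
fit_aic   <- AIC(lm_fit)

print(c(t_test = t_res$p.value, wilcoxon = wx_res$p.value))
print(c(anova = aov_p, kruskal = kw_res$p.value))
print(pairwise)
print(c(pearson_r = unname(r_pearson$estimate), r_squared = r_squared, AIC = fit_aic))
```

With a single continuous predictor, the R² of the linear regression equals the square of Pearson's correlation coefficient, which is why the two outputs tell a consistent story.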

  14. If you have a continuous response variable … and two or more predictor variables
     Test categories: Parametric, Non-Parametric, Regression, Binomial
     Predictor is categorical, two or more treatment levels for each predictor
     • Tests: Multi-Way ANOVA, Permutational ANOVA, Blocking, ANCOVA
     • What you get:
       • P-value indicating if there is a significant effect of each treatment.
       • Size of a significant effect (no interactions).
       • Need to consider the possibility of interactions.
       • Need pairwise comparisons with adjusted p-values to determine the difference among treatments with interactions.
       • Also get the effect of the blocking term and/or undesired covariate.
       • Do not need to consider the interaction between treatments and blocks and/or covariates.
     Predictor is continuous
     • Tests: Multiple Regression
     • What you get:
       • Fit of how well predictors are linked to response variable (Adjusted R², AIC)
       • P-values to indicate which predictors significantly affect the response variable.
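
A minimal base-R sketch of the two-or-more-predictor case follows. The built-in `ToothGrowth` and `mtcars` datasets are stand-ins (an assumption, not workshop data), and `tg` is a hypothetical object name.

```r
# --- Two categorical predictors: two-way ANOVA with an interaction term ---
# dose is stored as a number, so convert it to a factor (treatment levels)
tg <- transform(ToothGrowth, dose = factor(dose))
aov_fit <- aov(len ~ supp * dose, data = tg)

# Rows of the ANOVA table: supp, dose, supp:dose, Residuals
aov_tab       <- summary(aov_fit)[[1]]
interaction_p <- aov_tab[["Pr(>F)"]][3]

# Pairwise comparisons with adjusted p-values across treatment combinations
pairwise <- TukeyHSD(aov_fit, which = "supp:dose")

# --- Two continuous predictors: multiple linear regression ---
lm_fit <- lm(mpg ~ wt + hp, data = mtcars)
adj_r2 <- summary(lm_fit)$adj.r.squared
coef_p <- summary(lm_fit)$coefficients[, "Pr(>|t|)"]

print(aov_tab)
print(pairwise)
print(c(adjusted_R2 = adj_r2, AIC = AIC(lm_fit)))
print(coef_p)
```

Fitting `supp * dose` rather than `supp + dose` is what lets the model test for an interaction. When the interaction is significant, the pairwise comparisons across treatment combinations are usually more informative than the main effects alone.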

  15. If you have a categorical response variable …
     Test categories: Parametric, Non-Parametric, Regression, Binomial
     … and two or more predictor variables
     Predictor is categorical, two or more treatment levels
     • Tests: Binomial ANOVA
     • What you get:
       • P-value indicating if there is a significant effect of each treatment.
       • Size of a significant effect (no interactions).
       • Need to consider the possibility of interactions.
       • Need pairwise comparisons with adjusted p-values to determine the difference among treatments with interactions.
     Predictor is continuous
     • Tests: Logistic Regression
     • What you get:
       • Fit of how well predictors are linked to response variable (Adjusted R², AIC)
       • P-values to indicate which predictors significantly affect the response variable.
     … and one predictor variable
     Predictor is categorical, two treatment levels
     • Tests: Z-Test for Proportions
     • What you get: P-value indicating if 2 groups are significantly different
     Predictor is categorical, two or more treatment levels
     • Tests: Chi-squared Test
     • What you get: P-value indicating there is a significant effect of “treatment”. Need pairwise comparisons to find where the difference between groups occurs.
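
The categorical-response branch can be sketched in base R as below. The response counts are made up for illustration and `mtcars` is a stand-in dataset (both assumptions, not workshop data). Note that base R's `prop.test()` reports a chi-squared statistic, which for two groups is equivalent to the two-sample Z-test for proportions.

```r
# Hypothetical counts: responders out of 100 subjects per treatment
responders <- c(A = 45, B = 30)
n_subjects <- c(A = 100, B = 100)

# One categorical predictor, two levels: test of two proportions
two_prop <- prop.test(x = responders, n = n_subjects)

# One categorical predictor, more than two levels: chi-squared test
counts <- matrix(c(45, 55,
                   30, 70,
                   38, 62),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(treatment = c("A", "B", "C"),
                                 outcome   = c("response", "no response")))
chi_res <- chisq.test(counts)

# Continuous predictor: logistic regression (transmission type vs car weight)
logit_fit  <- glm(am ~ wt, data = mtcars, family = binomial)
odds_ratio <- exp(coef(logit_fit))   # odds ratio per unit increase in wt

print(two_prop)
print(chi_res)
print(summary(logit_fit)$coefficients)
print(odds_ratio)
print(AIC(logit_fit))
```

An odds ratio below 1 for `wt` means the odds of the "1" outcome fall as the predictor increases; above 1 means they rise.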

  16. The Lentil datasets (You are now a farmer)
     [Figure: field layout of Farm 1 and Farm 2, each divided into plots (e.g. Plot 1) labelled with the lentil variety in each (A, B or C), with individual lentil plants shown within a plot.]
     Example research questions:
     • Does the yield of the different lentil varieties differ between the 2 farms?
     • Do the varieties differ among themselves?
     • Does the density of the plants impact their average height?
