correlation and regression

Correlation and Regression: Lecture 5


  1. Correlation and Regression Lecture 5

  2. Objectives Outline: Situations when our explanatory variable is more continuous than categorical. By actively following the lecture and practical and carrying out the independent study the successful student will be able to: ● Explain the principles of correlation and of regression ● Apply (appropriately), interpret and evaluate the legitimacy of, both in R ● Summarise and illustrate with appropriate R figures test results scientifically

  3. Choosing tests…
     Test of differences: explanatory variables discrete; response variable continuous; test: t-tests, anova.
     Test of relationship: explanatory variables continuous; response variable continuous; test: regression/correlation.

  4. Correlation and Regression: related but DIFFERENT techniques. Correlation – association. Regression – predictive relationship. Both are linear (always do a scatterplot first!)

  5. Correlation vs regression Correlation ● Linear association ● No cause and effect ● Axes could be swapped Regression ● Linear relationship ● Cause and effect (explanatory and response) ● Axes cannot be swapped

  6. Correlation coefficients ● Measure how strong an association is between two variables ● Several types of correlation coefficient ● Most commonly used parametric CC = Pearson’s Product Moment ■ Denoted by ‘r’ ■ Ranges from -1 to +1

  7. Types of correlation. r ≈ 1: highest scores on one axis associated with highest scores on the other. r ≈ -1: lowest scores on one axis associated with highest scores on the other.

  8. Types of correlation

  9. Types of correlation. No correlation! (r ≈ 0). No LINEAR correlation! (r ≈ 0)

  10. How is r calculated? x = variable 1 y = variable 2 n = sample size
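The formula itself did not survive on this slide. As a hedged sketch, one common computational form of Pearson's r that uses exactly the quantities named (x, y and n) can be written in R and checked against the built-in `cor()`. The data values below are hypothetical and are not the lecture's data.

```r
# Hypothetical illustrative values; not the lecture's data
x <- c(2.1, 3.4, 1.9, 5.0, 4.2, 3.3)  # variable 1
y <- c(1.0, 2.2, 1.5, 3.9, 2.8, 2.9)  # variable 2
n <- length(x)                         # sample size

# Computational form of Pearson's r built from sums and n
numerator   <- n * sum(x * y) - sum(x) * sum(y)
denominator <- sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
r_manual <- numerator / denominator

# R's built-in estimate, for comparison
r_builtin <- cor(x, y, method = "pearson")
print(c(manual = r_manual, builtin = r_builtin))
```

The two values should agree to floating-point precision, which is a quick way to confirm that the hand calculation matches what `cor()` reports.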

  11. Correlation: example 20 leaf samples: 10 compounds measured Investigate correlation between c1 and c2 ALWAYS DO A SCATTER PLOT

  12. Correlation: example Investigate normality Not great but we will carry on for demonstration purposes
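The slides do not show how the scatter plot or the normality check were produced. A minimal sketch, assuming base R graphics and a Shapiro-Wilk test as one possible normality check, might look like this. The data frame `comp_sim` is a simulated stand-in for the lecture's `comp` data, so its values are hypothetical.

```r
set.seed(1)

# Simulated stand-in for the lecture's `comp` data frame (hypothetical values)
comp_sim <- data.frame(c1 = rnorm(20, mean = 10, sd = 2))
comp_sim$c2 <- 0.5 * comp_sim$c1 + rnorm(20, sd = 2)

# Scatter plot first: look for a roughly linear pattern
plot(comp_sim$c1, comp_sim$c2, xlab = "c1", ylab = "c2")

# One possible normality check for each variable (Shapiro-Wilk test)
sw_c1 <- shapiro.test(comp_sim$c1)
sw_c2 <- shapiro.test(comp_sim$c2)
print(c(c1 = sw_c1$p.value, c2 = sw_c2$p.value))
```

With only 20 observations a formal test has little power, so a histogram or Q-Q plot is often inspected alongside it.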

  13. Correlation: Example Running and interpreting the test. The default method is the first of c("pearson", "kendall", "spearman").
      cor.test(comp$c2, comp$c1, method = "pearson")
      data: comp$c2 and comp$c1
      t = 2.3132, df = 18, p-value = 0.03274 (t-test of whether r is different from zero)
      alternative hypothesis: true correlation is not equal to 0
      95 percent confidence interval: 0.04589555 0.76018365 (r could be between 0.046 and 0.76: a wide margin, but positive)
      sample estimates: cor 0.4786942 (correlation coefficient, r)

  14. Correlation: Example Reporting the result (significance, direction, statistics).
      data: comp$c2 and comp$c1
      t = 2.3132, df = 18, p-value = 0.03274
      alternative hypothesis: true correlation is not equal to 0
      95 percent confidence interval: 0.04589555 0.76018365
      sample estimates: cor 0.4786942
      There was a significant positive correlation between compounds 1 and 2 (r = 0.48; t = 2.31; d.f. = 18; p = 0.033).

  15. Correlation: Example Understanding the significance test. It is a t-test and works in the same way as all t-tests. Problem: it is sensitive to sample size: big n -> small s.e. -> big t -> small p
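The sample-size sensitivity can be made concrete with the standard relationship t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The sketch below uses the r and n reported in the lecture's `cor.test()` output, then repeats the calculation for a hypothetical sample of 200 with the same r. The helper names `t_from_r` and `p_from_r` are illustrative, not part of any package.

```r
# Values reported in the cor.test() output in the lecture
r <- 0.4786942
n <- 20

# t statistic for H0: true correlation = 0, on n - 2 degrees of freedom
t_from_r <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_from_r <- function(r, n) 2 * pt(-abs(t_from_r(r, n)), df = n - 2)

t_20 <- t_from_r(r, n)
p_20 <- p_from_r(r, n)

# Same r with a hypothetical larger sample: t grows and p shrinks
t_200 <- t_from_r(r, 200)
p_200 <- p_from_r(r, 200)
print(c(t_20 = t_20, p_20 = p_20, t_200 = t_200, p_200 = p_200))
```

The n = 20 case reproduces the t and p-value in the output above, while the n = 200 case shows how the same strength of association yields a much smaller p-value.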

  16. Correlation: Example Consider both the p value and the r value

  17. Correlation - nonparametric alternative. Not sure about normality? Spearman’s Rank Correlation. Running and interpreting the test:
      cor.test(comp$c2, comp$c1, method = "spearman")
      Spearman's rank correlation rho
      data: comp$c2 and comp$c1
      S = 664, p-value = 0.02605
      alternative hypothesis: true rho is not equal to 0
      sample estimates: rho 0.5007519
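Spearman's rho is Pearson's r calculated on the ranks of the data, which is why it does not rely on the raw values being normally distributed. A short sketch on simulated, deliberately skewed data (hypothetical values, not the lecture's `comp` data) illustrates the equivalence.

```r
set.seed(2)

# Simulated skewed data with a monotonic but non-linear relationship
x <- rexp(20)
y <- x^2 + rnorm(20, sd = 0.5)

# Spearman's rho from the built-in method
rho_spearman <- cor(x, y, method = "spearman")

# Pearson's r applied to the ranks of each variable
rho_from_ranks <- cor(rank(x), rank(y), method = "pearson")

print(c(spearman = rho_spearman, pearson_on_ranks = rho_from_ranks))
```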

  18. Regression Prediction. One variable (the explanatory) causes the other (the response). Develops a best fitting straight line: $y = b_1 x + b_0$, where $b_1$ = gradient and $b_0$ = y intercept

  19. Regression $y = b_1 x + b_0$. $y_i$: the observed y for $x_i$. $\hat{y}_i$: the predicted y for $x_i$. Residual: $y_i - \hat{y}_i$. Best fitting: minimising the sum of squared residuals
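For a single explanatory variable, the line that minimises the sum of squared residuals has a closed form: the gradient is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the two means. The sketch below applies this to simulated data (hypothetical values) and compares the result with `lm()`.

```r
set.seed(3)

# Simulated data (hypothetical values)
x <- runif(16, 0, 150)
y <- 0.4 + 0.03 * x + rnorm(16, sd = 1)

# Least-squares gradient (b1) and y intercept (b0)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Predicted values, residuals, and the quantity being minimised
y_hat  <- b0 + b1 * x
res    <- y - y_hat
ss_res <- sum(res^2)

# lm() should give the same coefficients
fit <- lm(y ~ x)
print(rbind(by_hand = c(b0, b1), lm = unname(coef(fit))))
```

Nudging either coefficient away from these values increases `ss_res`, which is what "best fitting" means here.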

  20. Regression Null hypothesis can be expressed as: ● $b_1 = 0$ ● x does not explain y ● Regression line doesn’t explain variance in y. But all mean the same

  21. Regression: example Concentration of juvenile hormone (JH) and mandible length in stag beetles ALWAYS DO A SCATTER PLOT

  22. Regression: Example Running the test. The model formula is response ~ explanatory.
      mod <- lm(data = stag, mand ~ jh)
      summary(mod)

  23. Regression: Example Interpreting the test Summary statistics about residuals

  24. Regression: Example Interpreting the test. $b_0$ (y intercept): t-test of $b_0 = 0$, often not important. $b_1$ (gradient): t-test of $b_1 = 0$, always of interest. Test of ‘model’: same as t-test of $b_1 = 0$ for single regression

  25. Regression: Example Interpreting the test. R-squared: proportion of the variation in y explained by x
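The `summary()` output described on these slides was shown as an image and is not reproduced here. As a hedged sketch, the same pieces can be pulled out of a fitted model programmatically. The data frame `stag_sim` is simulated and only borrows the lecture's column names `jh` and `mand`, so the numbers it produces are hypothetical and will not match the lecture's results.

```r
set.seed(4)

# Simulated stand-in for the lecture's `stag` data (hypothetical values)
stag_sim <- data.frame(jh = runif(16, 0, 150))
stag_sim$mand <- 0.4 + 0.03 * stag_sim$jh + rnorm(16, sd = 1)

mod_sim <- lm(mand ~ jh, data = stag_sim)
s <- summary(mod_sim)

# Coefficient table: rows are (Intercept) and jh
coefs <- s$coefficients
b0   <- coefs["(Intercept)", "Estimate"]  # y intercept
b1   <- coefs["jh", "Estimate"]           # gradient
t_b1 <- coefs["jh", "t value"]            # t-test of b1 = 0

# Proportion of variation explained, and the test of the model
r_squared <- s$r.squared
f_stat    <- unname(s$fstatistic["value"])

print(c(b0 = b0, b1 = b1, t_b1 = t_b1, r_squared = r_squared, F = f_stat))
```

For a single explanatory variable the model F statistic equals the square of the gradient's t statistic, which is why the two tests give the same p-value.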

  26. Regression: Example Reporting the result (significance, direction, statistics). The concentration of juvenile hormone explained a significant amount of the variation (0.54) in stag beetle mandible length (F = 16.6; d.f. = 1,14; p = 0.00113). The regression line is mandible length = (0.032 * Jhconc) + 0.419.

  27. Regression: Example Illustrating result. Include the data and the line (the model).
      ggplot(data = reg, aes(x = QTL, y = pheno)) +
        geom_point(size = 2) +
        xlim(0, 15) +
        ylim(0, 40) +
        xlab("Number of QTL") +
        ylab("Percentage of phenotype") +
        geom_smooth(method = "lm", se = FALSE) +
        theme_bw()

  28. Regression: Example Checking assumptions after running the regression. Use the residuals: plot(mod). Spread should be similar in each group: equal variance. Should be approx 1:1 for normality. Not that useful here - small dataset
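A minimal sketch of these residual checks, assuming the two standard `plot()` panels for a fitted `lm` object plus an optional Shapiro-Wilk test on the residuals (the test is an addition here, not something the slide mentions). The data are simulated and hypothetical.

```r
set.seed(5)

# Simulated stand-in for the lecture's `stag` data (hypothetical values)
stag_sim <- data.frame(jh = runif(16, 0, 150))
stag_sim$mand <- 0.4 + 0.03 * stag_sim$jh + rnorm(16, sd = 1)
mod_sim <- lm(mand ~ jh, data = stag_sim)

res <- residuals(mod_sim)

# Residuals vs fitted: spread should look similar along the x axis
plot(mod_sim, which = 1)

# Normal Q-Q plot: points should fall close to the 1:1 line
plot(mod_sim, which = 2)

# Optional formal check of residual normality
sw_res <- shapiro.test(res)
print(sw_res$p.value)
```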

  29. Regression: Example Predicting from the model
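The prediction code on this slide was not captured. One way to predict from a fitted model in R is `predict()` with a `newdata` data frame whose column name matches the explanatory variable. The sketch below uses simulated data and hypothetical new `jh` values chosen to lie inside the simulated range, since predicting outside the observed range is unreliable.

```r
set.seed(6)

# Simulated stand-in for the lecture's `stag` data (hypothetical values)
stag_sim <- data.frame(jh = runif(16, 0, 150))
stag_sim$mand <- 0.4 + 0.03 * stag_sim$jh + rnorm(16, sd = 1)
mod_sim <- lm(mand ~ jh, data = stag_sim)

# New explanatory values to predict for (hypothetical)
new_jh <- data.frame(jh = c(50, 100))

# Point predictions, and predictions with a 95% confidence interval
pred    <- predict(mod_sim, newdata = new_jh)
pred_ci <- predict(mod_sim, newdata = new_jh, interval = "confidence")

# The same predictions from the regression line, b0 + b1 * x
by_hand <- unname(coef(mod_sim)[1] + coef(mod_sim)[2] * new_jh$jh)

print(pred_ci)
```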

  30. Summary. Both: linear; scatterplot first. Correlation ● No cause and effect ● Axes could be swapped ● Do not include line ● Quote r and its test. Regression ● Explanatory and response ● Axes cannot be swapped ● Include line ● Quote model test and line, and possibly r²

  31. Objectives Outline: Situations when our explanatory variable is more continuous than categorical. By actively following the lecture and practical and carrying out the independent study the successful student will be able to: ● Explain the principles of correlation and of regression ● Apply (appropriately), interpret and evaluate the legitimacy of, both in R ● Summarise and illustrate with appropriate R figures test results scientifically
