business statistics

Business Statistics CONTENTS The correlation coefficient The rank - PowerPoint PPT Presentation



  1. ๐œ : ESTIMATES AND TESTS Business Statistics

  2. CONTENTS The correlation coefficient The rank correlation coefficient Testing the correlation coefficient Non-linear relationships Old exam question Further study

  3. THE CORRELATION COEFFICIENT Correlation coefficient ๐‘ก ๐‘Œ,๐‘ โ–ช ๐‘  ๐‘Œ,๐‘ = ๐‘ก ๐‘Œ ๐‘ก ๐‘ Or written in full ๐‘œ ฯƒ ๐‘—=1 ๐‘ฆ ๐‘— โˆ’ าง ๐‘ฆ ๐‘ง ๐‘— โˆ’ เดค ๐‘ง โ–ช ๐‘  ๐‘Œ,๐‘ = ๐‘ฆ 2 ฯƒ ๐‘—=1 ๐‘œ ๐‘œ ๐‘ง 2 ฯƒ ๐‘—=1 ๐‘ฆ ๐‘— โˆ’ าง ๐‘ง ๐‘— โˆ’ เดค Or using the sums-of-squares notation ๐‘‡๐‘‡ ๐‘Œ,๐‘ โ–ช ๐‘  ๐‘Œ,๐‘ = ๐‘‡๐‘‡ ๐‘Œ,๐‘Œ ๐‘‡๐‘‡ ๐‘,๐‘

  4. THE CORRELATION COEFFICIENT
     Correlation coefficient
     ▪ for two related numerical variables (paired data: $(X, Y) = (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$)
     ▪ $-1 \le r \le 1$
     ▪ indicator of linear association between two numerical variables
     Alternative names:
     ▪ Pearson correlation coefficient
     ▪ Pearson product-moment correlation coefficient
     ▪ named after Karl Pearson, 1857-1936

  5. THE CORRELATION COEFFICIENT Scatter plots showing various situations

  6. THE CORRELATION COEFFICIENT
     Some points to observe
     ▪ There is only a “sign” relation between the correlation coefficient and the slope of the regression line, because for correlation the variables are standardized
       ▪ if $r = 1$, the points fall on a straight line with slope > 0
       ▪ if $r = -1$, the points fall on a straight line with slope < 0
     ▪ Interchanging $X$ and $Y$ will not change the correlation coefficient
       ▪ so $r_{X,Y} = r_{Y,X}$
     ▪ Rescaling $X$ or $Y$ by a positive factor will not change the correlation coefficient
       ▪ in particular, $r$ is not sensitive to changes in the units of $X$ or $Y$
     ▪ Correlation coefficients are sensitive to outliers
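The points above can be checked numerically. This is a minimal sketch using invented height/weight pairs and a hypothetical helper `pearson_r`; it swaps $X$ and $Y$, changes units (metres to centimetres, kilograms to pounds), and then adds one invented outlier.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via sums of squares around the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical paired data: height in metres, weight in kg
height_m = [1.60, 1.65, 1.70, 1.75, 1.80, 1.85]
weight_kg = [55.0, 61.0, 64.0, 70.0, 72.0, 80.0]

r = pearson_r(height_m, weight_kg)
r_swapped = pearson_r(weight_kg, height_m)           # interchange X and Y
r_rescaled = pearson_r([h * 100 for h in height_m],  # metres -> centimetres
                       [w * 2.20462 for w in weight_kg])  # kg -> pounds
r_outlier = pearson_r(height_m + [1.62], weight_kg + [120.0])  # one extra outlier

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4), round(r_outlier, 4))
```

The first three values coincide (about 0.99), while the single outlying pair pulls the coefficient down sharply, in this case even below zero.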

  7. THE CORRELATION COEFFICIENT
     Note:
     ▪ correlation does not imply causality
     ▪ sources: tylervigen.com and forbes.com

  8. THE CORRELATION COEFFICIENT

  9. EXERCISE 1 Which figure has a larger correlation coefficient?

  10. TESTING THE CORRELATION COEFFICIENT
      Can we do a hypothesis test on the correlation coefficient? First acknowledge:
      ▪ $r$ is the correlation coefficient of the (paired) sample
      ▪ $\rho$ is the correlation coefficient of the bivariate population
      ▪ so the null hypothesis would be $H_0: \rho = 0$ or $H_0: \rho \ge 0.3$, etc.
      ▪ never $H_0: r = 0$ or the like!

  11. TESTING THE CORRELATION COEFFICIENT
      And the null distribution?
      ▪ It is known that $\dfrac{R}{\sqrt{(1 - R^2)/(n - 2)}} \sim t_{n-2}$
      Important limitation:
      ▪ The distribution of the test statistic $\dfrac{R}{\sqrt{(1 - R^2)/(n - 2)}}$ is only valid for $\rho = 0$ (crucial in step 3)
      ▪ so we can only test $H_0: \rho = 0$
      ▪ fortunately, that’s by far the most interesting hypothesis
      Test of $\rho = \rho_0 \ne 0$: Google “Fisher transformation”; not in this course

  12. TESTING THE CORRELATION COEFFICIENT
      ▪ Step 1:
        ▪ $H_0: \rho = 0$; $H_1: \rho \ne 0$; $\alpha = 0.05$
      ▪ Step 2:
        ▪ sample statistic: $R$; reject for “too small” and “too large” values
      ▪ Step 3:
        ▪ if $H_0$ is true, $\dfrac{R}{\sqrt{(1 - R^2)/(n - 2)}} \sim t_{n-2}$
        ▪ normally distributed populations needed
      ▪ Step 4:
        ▪ as usual (insert $r$ for the calculated value of $R$)
      ▪ Step 5:
        ▪ as usual
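A sketch of steps 3 to 5 for the two-sided test, assuming the critical value is looked up in a $t$ table rather than computed. The function names, $r = 0.70$ and $n = 10$ are hypothetical; the critical value $t_{8;\,0.025} = 2.306$ is the standard table value for 8 degrees of freedom.

```python
import math

def correlation_t_statistic(r, n):
    """r / sqrt((1 - r^2) / (n - 2)); t with n - 2 df if rho = 0 (normal populations)."""
    if n < 3:
        raise ValueError("need n >= 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def two_sided_decision(r, n, t_critical):
    """Reject H0: rho = 0 when |t| exceeds the critical value t_{n-2; alpha/2}."""
    t = correlation_t_statistic(r, n)
    return t, abs(t) > t_critical

# Hypothetical example: r = 0.70 from n = 10 pairs, alpha = 0.05,
# critical value from a t table with df = 8: 2.306
t, reject = two_sided_decision(0.70, 10, 2.306)
print(f"t = {t:.3f}, reject H0: {reject}")  # t = 2.772, reject H0: True
```

Since $|t| \approx 2.772 > 2.306$, $H_0: \rho = 0$ is rejected at $\alpha = 0.05$ in this made-up case.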

  13. TESTING THE CORRELATION COEFFICIENT
      What is the meaning of rejecting $H_0: \rho = 0$?
      Conclude: there is a significant linear correlation between $X$ and $Y$
      ▪ meaning: the correlation is not 0
      ▪ do not conclude: $X$ causes $Y$ (or $Y$ causes $X$)
      ▪ do not conclude: $X$ has a large influence on $Y$ (or the other way around)
      There is an important difference between a correlation coefficient and a regression coefficient
      ▪ we will come back to this soon

  14. NON-LINEAR RELATIONSHIPS
      What to do in case of a non-linear relation?
      ▪ Two suggestions:
        ▪ transform, e.g. $\log X$ vs. $\log Y$
        ▪ use ranked data

  15. NON-LINEAR RELATIONSHIPS
      Suggestion 1: log transformation
      ▪ life expectancy vs. GDP/capita → life expectancy vs. log(GDP/capita)
      ▪ $r = 0.646$ → $0.774$
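To illustrate why a log transformation can raise $r$, here is a sketch with invented data (not the life-expectancy data from the slide) in which the outcome depends exactly linearly on $\log x$. Correlating with the raw $x$ understates the association; correlating with $\log x$ recovers it. The helper `pearson_r` and all values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via sums of squares around the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical data: the outcome grows linearly in log(x), not in x
gdp = [500, 1000, 2000, 5000, 10000, 20000, 50000]
outcome = [50 + 5 * math.log(g) for g in gdp]

r_raw = pearson_r(gdp, outcome)                          # curved relation
r_log = pearson_r([math.log(g) for g in gdp], outcome)   # straight line
print(round(r_raw, 3), round(r_log, 3))
```

With these numbers `r_raw` is roughly 0.85, while `r_log` equals 1 up to rounding.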

  16. NON-LINEAR RELATIONSHIPS
      ▪ Note: zero linear correlation does not exclude a strong non-linear relation
        ▪ e.g., quadratic
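A tiny worked case of the quadratic example: with $x$ symmetric around zero and $y = x^2$, $y$ is completely determined by $x$, yet $SS_{X,Y} = 0$ and therefore $r = 0$. The helper name and data are chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via sums of squares around the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # exact (non-linear) dependence: y = x^2
print(pearson_r(x, y))    # 0.0
```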

  17. NON-LINEAR RELATIONSHIPS
      Suggestion 2: with ranked data
      ▪ replace the data ($X$ and $Y$) by their ranks (→ $X_r$ and $Y_r$)
      ▪ compute the (Pearson) correlation coefficient of $X_r$ and $Y_r$: $r_S(X, Y) = r(X_r, Y_r)$
      ▪ This is the rank correlation coefficient $r_S$
      ▪ Also called the Spearman correlation coefficient
        ▪ after Charles Spearman, 1863-1945
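The recipe above translates directly into code: rank each variable, then take the Pearson coefficient of the ranks. This sketch assumes tied values receive the average of the ranks they occupy (a common convention, not stated on the slide); `ranks`, `spearman_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation via sums of squares around the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def ranks(values):
    """Ranks 1..n; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values starting at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_r(xs, ys):
    """Rank correlation r_S(X, Y) = r(X_r, Y_r)."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y increases with x, but far from linearly
x = [3.1, 1.2, 5.8, 4.4, 2.0]
y = [20, 5, 400, 90, 10]
print(round(pearson_r(x, y), 3), spearman_r(x, y))
```

Because $y$ is strictly increasing in $x$ here, $r_S = 1$ while the Pearson $r$ is lower. An increasing transformation such as $\log x$ leaves the ranks unchanged, so `spearman_r([math.log(v) for v in x], y)` gives the same value, which is why the slide's $r_S$ is identical for GDP/capita and log(GDP/capita).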

  18. NON-LINEAR RELATIONSHIPS
      Of course, many properties of $r$ also hold for $r_S$:
      ▪ $-1 \le r_S \le 1$
      ▪ If $r_S > 0$, increasing (decreasing) $x$-values tend to be accompanied by increasing (decreasing) $y$-values
      ▪ If $r_S < 0$, increasing (decreasing) $x$-values tend to be accompanied by decreasing (increasing) $y$-values

  19. NON-LINEAR RELATIONSHIPS
      Example
      ▪ life expectancy vs. GDP/capita
      ▪ or life expectancy vs. log(GDP/capita)
      ▪ $r_S = 0.828$ for both (obviously, since the log transformation does not change the ranks!)

  20. NON-LINEAR RELATIONSHIPS
      Can we also test hypotheses for the rank correlation coefficient?
      ▪ i.e. $H_0: \rho_S = 0$
      We can use a similar but slightly different test than for $\rho$
      ▪ i.e. $\dfrac{R_S}{\sqrt{1/(n - 1)}} \sim N(0, 1)$ (approximately, under $H_0$)
      ▪ which requires $n \ge 20$, but not normality of $X$ and $Y$
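A sketch of this large-sample test: $R_S/\sqrt{1/(n-1)}$ simplifies to $r_S\sqrt{n-1}$, which is compared with the standard normal critical value $z_{0.025} = 1.96$ for a two-sided test at $\alpha = 0.05$. The function name and the values $r_S = 0.45$, $n = 25$ are hypothetical.

```python
import math

def spearman_z(r_s, n):
    """r_S / sqrt(1/(n - 1)) = r_S * sqrt(n - 1); approx. N(0, 1) if rho_S = 0."""
    if n < 20:
        raise ValueError("normal approximation requires n >= 20")
    return r_s * math.sqrt(n - 1)

# Hypothetical example: r_S = 0.45 from n = 25 pairs, two-sided, alpha = 0.05
z = spearman_z(0.45, 25)
reject = abs(z) > 1.96
print(f"z = {z:.3f}, reject H0: {reject}")  # z = 2.205, reject H0: True
```

The guard on `n` mirrors the slide's requirement $n \ge 20$; no normality check is needed, unlike the test for $\rho$.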

  21. OLD EXAM QUESTION 21 May 2015, Q1k-l

  22. FURTHER STUDY Doane & Seward 5/E 12.1, 16.7 Tutorial exercises week 5 hypothesis test
