$\rho$: ESTIMATES AND TESTS
Business Statistics
CONTENTS
▪ The correlation coefficient
▪ The rank correlation coefficient
▪ Testing the correlation coefficient
▪ Non-linear relationships
▪ Old exam question
▪ Further study
THE CORRELATION COEFFICIENT
Correlation coefficient
▪ $r_{X,Y} = \dfrac{s_{X,Y}}{s_X \, s_Y}$
Or written in full
▪ $r_{X,Y} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$
Or using the sums-of-squares notation
▪ $r_{X,Y} = \dfrac{SS_{X,Y}}{\sqrt{SS_{X,X} \, SS_{Y,Y}}}$
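The two ways of writing the coefficient can be checked against each other numerically. The following is a minimal sketch in plain Python; the function names and the five data pairs are hypothetical and serve only as an illustration.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient via sums of squares: SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def pearson_r_from_s(x, y):
    """Same coefficient via s_xy / (s_x * s_y), with n - 1 in every denominator."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)            # SS_xy = 6, SS_xx = 10, SS_yy = 6
r_alt = pearson_r_from_s(x, y)
print(round(r, 4), round(r_alt, 4))
```

The factors $n - 1$ cancel between numerator and denominator, which is why both functions return the same value.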
THE CORRELATION COEFFICIENT
Correlation coefficient
▪ for two related numerical variables (paired data: $(X, Y) = (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$)
▪ $-1 \le r \le 1$
▪ indicator of linear association between two numerical variables
Alternative names:
▪ Pearson correlation coefficient
▪ Pearson product-moment correlation coefficient
▪ named after Karl Pearson, 1857-1936
THE CORRELATION COEFFICIENT Scatter plots showing various situations
THE CORRELATION COEFFICIENT
Some points to observe
▪ There is only a "sign" relation between the correlation coefficient and the slope of the regression line, because for correlation the variables are standardized
  ▪ if $r = 1$, the points fall on a straight line with slope > 0
  ▪ if $r = -1$, the points fall on a straight line with slope < 0
▪ Interchanging $X$ and $Y$ will not change the correlation coefficient
  ▪ so $r_{X,Y} = r_{Y,X}$
▪ Rescaling $X$ or $Y$ by a positive factor will not change the correlation coefficient
  ▪ in particular, $r$ is not sensitive to changes in units of $X$ or $Y$
▪ Correlation coefficients are sensitive to outliers
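These points can be tried out on a small data set. The sketch below uses hypothetical numbers and helper names. The least-squares slope $SS_{X,Y}/SS_{X,X}$ is the standard textbook formula, included here only to contrast its behaviour with that of $r$.

```python
import math

def sums_of_squares(x, y):
    """Return (SS_xy, SS_xx, SS_yy) for paired data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy, ss_xx, ss_yy

def pearson_r(x, y):
    ss_xy, ss_xx, ss_yy = sums_of_squares(x, y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def regression_slope(x, y):
    """Least-squares slope of y on x: SS_xy / SS_xx."""
    ss_xy, ss_xx, _ = sums_of_squares(x, y)
    return ss_xy / ss_xx

# Hypothetical data, roughly on the line y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                       # interchange X and Y

x_cm = [100 * xi for xi in x]                # change of units, e.g. m to cm
r_rescaled = pearson_r(x_cm, y)              # r is unchanged
slope_m = regression_slope(x, y)
slope_cm = regression_slope(x_cm, y)         # slope shrinks by a factor 100

# Add one extreme observation to see the effect of an outlier
r_outlier = pearson_r(x + [7.0], y + [-20.0])

print(r_xy, r_yx, r_rescaled, slope_m, slope_cm, r_outlier)
```

In this example the slope depends on the units of $X$ while $r$ does not, and a single outlier is enough to turn a correlation close to 1 into a negative one.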
THE CORRELATION COEFFICIENT
Note:
▪ correlation does not imply causality
▪ sources: tylervigen.com and forbes.com
EXERCISE 1 Which figure has a larger correlation coefficient?
TESTING THE CORRELATION COEFFICIENT
Can we do a hypothesis test on the correlation coefficient?
First acknowledge:
▪ $r$ is the correlation coefficient of the sample of paired observations
▪ $\rho$ is the correlation coefficient of the bivariate population
▪ so the null hypothesis would be $H_0: \rho = 0$ or $H_0: \rho \ge 0.3$ etc.
▪ never $r = 0$ or so!
TESTING THE CORRELATION COEFFICIENT
And the null distribution?
▪ It is known that $\dfrac{R}{\sqrt{(1 - R^2)/(n - 2)}} \sim t_{n-2}$, where $R$ is the sample correlation coefficient viewed as a random variable
Important limitation:
▪ The distribution of the test statistic $\dfrac{R}{\sqrt{(1 - R^2)/(n - 2)}}$ is only valid for $\rho = 0$ (crucial in step 3)
▪ so we can only test $H_0: \rho = 0$
▪ fortunately, that's by far the most interesting hypothesis
Test of $\rho = \rho_0 \ne 0$: Google "Fisher transformation"; not in this course
TESTING THE CORRELATION COEFFICIENT
▪ Step 1:
  ▪ $H_0: \rho = 0$; $H_1: \rho \ne 0$; $\alpha = 0.05$
▪ Step 2:
  ▪ sample statistic: $R$; reject for "too small" and "too large" values
▪ Step 3:
  ▪ if $H_0$ is true, $\dfrac{R}{\sqrt{(1 - R^2)/(n - 2)}} \sim t_{n-2}$
  ▪ normally distributed populations needed
▪ Step 4:
  ▪ as usual (insert $r$ for calculated value of $R$)
▪ Step 5:
  ▪ as usual
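The five steps can be traced in a short script. This is a sketch under stated assumptions: the sample of size $n = 10$ is hypothetical, the helper names are made up for the illustration, and the two-sided critical value 2.306 for $\alpha = 0.05$ and 8 degrees of freedom is read from a $t$-table rather than computed.

```python
import math

def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def correlation_t_statistic(r, n):
    """Step 3: t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical paired sample (n = 10), for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 1.9, 3.8, 3.1, 5.2, 4.4, 6.9, 6.1, 7.4, 8.8]
n = len(x)

# Step 4: insert the observed r into the test statistic
r = pearson_r(x, y)
t_calc = correlation_t_statistic(r, n)

# Two-sided critical value for alpha = 0.05, df = n - 2 = 8 (from a t-table)
T_CRIT = 2.306

# Step 5: reject H0: rho = 0 when t_calc falls in either tail
reject_h0 = abs(t_calc) > T_CRIT
print(f"r = {r:.3f}, t = {t_calc:.2f}, reject H0: {reject_h0}")
```

For a different sample size the critical value has to be looked up again, because the degrees of freedom $n - 2$ change.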
TESTING THE CORRELATION COEFFICIENT
What is the meaning of rejecting $H_0: \rho = 0$?
Conclude: there is a significant linear correlation between $X$ and $Y$
▪ meaning: the correlation is not 0
▪ do not conclude: $X$ causes $Y$ (or $Y$ causes $X$)
▪ do not conclude: $X$ has a large influence on $Y$ (or the other way around)
There is an important difference between a correlation coefficient and a regression coefficient
▪ we will come back to this soon
NON-LINEAR RELATIONSHIPS
What to do in case of a non-linear relation?
▪ Two suggestions:
  ▪ transform, e.g. $\log Y$ vs. $\log X$
  ▪ use ranked data
NON-LINEAR RELATIONSHIPS
Suggestion 1: log transformation
▪ life expectancy vs. GDP/capita: $r = 0.646$
▪ life expectancy vs. log(GDP/capita): $r = 0.774$
NON-LINEAR RELATIONSHIPS
▪ Note: zero linear correlation does not exclude a strong non-linear relation
  ▪ e.g., quadratic
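A small constructed example makes the quadratic case concrete: with $x$-values placed symmetrically around 0 and $y = x^2$, $Y$ is completely determined by $X$, yet the linear correlation is zero. The data and function name below are chosen for the illustration only.

```python
import math

def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Constructed data: an exact quadratic relation, symmetric around x = 0
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)  # the positive and negative cross-products cancel
```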
NON-LINEAR RELATIONSHIPS
Suggestion 2: use ranked data
▪ replace the data ($X$ and $Y$) by their ranks (→ $R_X$ and $R_Y$)
▪ compute the (Pearson) correlation coefficient of $R_X$ and $R_Y$: $r_S(X, Y) = r_{R_X, R_Y}$
▪ This is the rank correlation coefficient $r_S$
▪ Also called the Spearman correlation coefficient
  ▪ after Charles Spearman, 1863-1945
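The two-step recipe (rank, then correlate) translates directly into code. The sketch below is one possible implementation, not taken from the slides. Giving tied values the average of their ranks is an assumption (a common convention, but the slide does not say how ties are handled). The data are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def average_ranks(values):
    """Rank from 1 (smallest) to n; tied values share the average of their ranks.

    Assumption: average ranks for ties (not specified in the slides).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_r(x, y):
    """Rank correlation: Pearson correlation of the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y = x^3 is increasing but not linear in x
x = [1, 2, 3, 4, 5]
y = [xi ** 3 for xi in x]

r = pearson_r(x, y)      # below 1, because the relation is not linear
r_s = spearman_r(x, y)   # equal to 1, because the ranks agree perfectly
print(round(r, 3), r_s)
```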
NON-LINEAR RELATIONSHIPS
Of course many properties for $r$ also hold for $r_S$:
▪ $-1 \le r_S \le 1$
▪ If $r_S > 0$, increasing (decreasing) $x$-values tend to be accompanied by increasing (decreasing) $y$-values
▪ If $r_S < 0$, increasing (decreasing) $x$-values tend to be accompanied by decreasing (increasing) $y$-values
NON-LINEAR RELATIONSHIPS
Example
▪ life expectancy vs. GDP/capita
▪ or life expectancy vs. log(GDP/capita)
▪ $r_S = 0.828$ for both (obviously: the logarithm is an increasing function, so it leaves the ranks unchanged)
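Why the rank correlation is identical before and after taking logs can be checked on a toy data set. The six (GDP/capita, life expectancy) pairs below are hypothetical and are not the data behind the values on the slide. The simple ranking helper assumes there are no ties.

```python
import math

def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

def ranks_no_ties(values):
    """Rank from 1 (smallest) to n; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

# Hypothetical data, for illustration only
gdp = [500, 1200, 3000, 8000, 20000, 45000]
life = [52, 60, 66, 71, 76, 80]
log_gdp = [math.log(g) for g in gdp]

# The Pearson coefficient changes under the log transformation ...
r_raw = pearson_r(gdp, life)
r_log = pearson_r(log_gdp, life)

# ... but the ranks, and hence the rank correlation, do not
same_ranks = ranks_no_ties(gdp) == ranks_no_ties(log_gdp)
r_s_raw = pearson_r(ranks_no_ties(gdp), ranks_no_ties(life))
r_s_log = pearson_r(ranks_no_ties(log_gdp), ranks_no_ties(life))
print(r_raw, r_log, r_s_raw, r_s_log)
```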
NON-LINEAR RELATIONSHIPS
Can we also test hypotheses for the rank correlation coefficient?
▪ i.e. $H_0: \rho_S = 0$
We can use a test that is similar to, but slightly different from, the one for $\rho$
▪ i.e. $\dfrac{R_S}{1/\sqrt{n - 1}} \sim N(0, 1)$ (approximately) if $H_0$ is true
▪ which requires $n \ge 20$, but not normality of $X$ and $Y$
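A sketch of this test in Python, using `statistics.NormalDist` from the standard library for the normal tail probability. The function name, the two-sided alternative, and the values $r_S = 0.5$ and $n = 26$ are assumptions made for the illustration.

```python
import math
from statistics import NormalDist

def rank_correlation_z_test(r_s, n, alpha=0.05):
    """Two-sided test of H0: rho_S = 0 using z = r_s / (1 / sqrt(n - 1))."""
    if n < 20:
        raise ValueError("the normal approximation requires n >= 20")
    z = r_s * math.sqrt(n - 1)                      # same as r_s / (1 / sqrt(n - 1))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # both tails of N(0, 1)
    return z, p_value, p_value < alpha

# Hypothetical outcome: r_S = 0.5 observed in a sample of n = 26 pairs
z, p_value, reject_h0 = rank_correlation_z_test(0.5, 26)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Comparing the $p$-value with $\alpha$ is equivalent to comparing $|z|$ with the critical value 1.96 for $\alpha = 0.05$.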
OLD EXAM QUESTION 21 May 2015, Q1k-l
FURTHER STUDY Doane & Seward 5/E 12.1, 16.7 Tutorial exercises week 5 hypothesis test