
Business Statistics: Two $\mu$'s or Medians: Comparisons



  1. TWO $\mu$'S OR MEDIANS: COMPARISONS
     Business Statistics

  2. CONTENTS
     ▪ Comparing two samples
     ▪ Comparing two unrelated samples
     ▪ Comparing the means of two unrelated samples
     ▪ Comparing the medians of two unrelated samples
     ▪ Old exam question
     ▪ Further study

  3. COMPARING TWO SAMPLES
     It often happens that we want to compare two situations
     ▪ do I sell more when there is music in my shop?
     ▪ is the expensive machine more precise than the cheap one?
     ▪ are advertisements on TV and on the internet equally profitable?
     ▪ do people buy more on Tuesdays than on Wednesdays?
     ▪ in couples, who drinks more: the man or the woman?
     ▪ etc.

  4. COMPARING TWO SAMPLES
     In all these questions we compare two populations
     ▪ Situation 1: two populations (or sub-populations) with a similar variable
       ▪ sales in 105 days without music
       ▪ sales in 96 days with music
     ▪ Data matrix: two options (figure: two data layouts, one of them marked "SPSS requires this data presentation")

  5. COMPARING TWO SAMPLES
     ▪ Situation 2: one sample with paired observations
       ▪ drinks of the man in 78 couples
       ▪ drinks of the woman in the same 78 couples
     ▪ Data matrix: one option only
     ▪ Will be discussed in a later lecture

  6. COMPARING TWO UNRELATED SAMPLES
     Situation 1
     ▪ independent samples/unrelated samples
     ▪ introduce symbols for the two random variables
       ▪ e.g., using $X_1$ and $X_2$
         ▪ $X_1$ with sample $X_{1,1}, X_{1,2}, \dots, X_{1,n_1}$ and $X_2$ with sample $X_{2,1}, X_{2,2}, \dots, X_{2,n_2}$
       ▪ or using $X$ and $Y$
         ▪ $X$: $X_1, X_2, \dots, X_{n_X}$ and $Y$: $Y_1, Y_2, \dots, Y_{n_Y}$
     ▪ sample sizes can be different
     Or of course using "meaningful" indices: $X_B$ and $X_G$ for Belgium and Germany. Not $B$ and $G$, because we need to stress that it is "about" a variable $X$ (like sales).

  7. COMPARING TWO UNRELATED SAMPLES
     We want to test hypotheses such as
     ▪ are the means equal?
       ▪ $H_0: \mu_X = \mu_Y$ or $H_0: \mu_1 = \mu_2$ or $H_0: \mu_{X_1} = \mu_{X_2}$ or ...
     ▪ are the variances equal?
       ▪ $H_0: \sigma_X^2 = \sigma_Y^2$ or etc.
     ▪ are the proportions equal?
       ▪ $H_0: \pi_X = \pi_Y$ or etc.
     Also:
     ▪ inequalities, like $H_0: \mu_X \ge \mu_Y$
     ▪ and non-zero differences, like $H_0: \mu_X = \mu_Y + 85$

  8. COMPARING TWO UNRELATED SAMPLES
     Context:
     ▪ sample $X_1$: sales in $n_1 = 105$ days without music
     ▪ sample $X_2$: sales in $n_2 = 96$ days with music
     General idea:
     ▪ $X_1 \sim \text{distribution}(\theta_1)$ and $X_2 \sim \text{distribution}(\theta_2)$: is $\theta_1 = \theta_2$?

  9. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     Assumption (for now!):
     ▪ $X \sim N(\mu_X; \sigma_X^2)$
     ▪ $Y \sim N(\mu_Y; \sigma_Y^2)$
     ▪ in words: both samples come from normally distributed populations with known variances
     Question
     ▪ are $\mu_X$ and $\mu_Y$ different?
     ▪ can we test this, on the basis of the (limited!) evidence concerning $\bar{x}$ and $\bar{y}$?
     ▪ so, can we reject $H_0: \mu_X = \mu_Y$?
     To decide
     ▪ use $\bar{X} - \bar{Y} \sim N(\mu_{\bar{X}-\bar{Y}}, \sigma^2_{\bar{X}-\bar{Y}})$
     ▪ so, the sampling distribution of the difference of means is normal

  10. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     For one sample, we had
       $\dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim N(0,1)$
     As it turns out, for two samples, we have
       $\dfrac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$
     ▪ $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
       ▪ for instance $H_0: \mu_X = \mu_Y$ or $H_0: \mu_X - \mu_Y = 85$
     ▪ $\bar{x}$ and $\bar{y}$ are obtained from the data
     ▪ but what is $\sigma_{\bar{X}-\bar{Y}}$?

  11. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     For one sample, we had
       $\sigma^2_{\bar{X}} = \dfrac{\sigma_X^2}{n}$
     As it turns out, for two independent samples, we have
       $\sigma^2_{\bar{X}-\bar{Y}} = \sigma^2_{\bar{X}} + \sigma^2_{\bar{Y}}$, so $\sigma_{\bar{X}-\bar{Y}} = \sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}$
     ▪ recall that variances add up when $X$ and $Y$ are independent
     ▪ e.g., $\sigma^2_{X+Y} = \sigma_X^2 + \sigma_Y^2$ but also $\sigma^2_{X-Y} = \sigma_X^2 + \sigma_Y^2$
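The standard deviation of the difference of two independent sample means can be computed directly from the formula on this slide. The following is a minimal Python sketch; the function name `se_diff_known` and the variance value 400 are illustrative assumptions, not part of the lecture. Only the sample sizes 105 and 96 come from the music example.

```python
import math

def se_diff_known(var_x, n_x, var_y, n_y):
    """Standard deviation of (X-bar minus Y-bar) for two independent samples
    with known population variances: sqrt(var_x/n_x + var_y/n_y)."""
    return math.sqrt(var_x / n_x + var_y / n_y)

# Hypothetical population variances (assumed for illustration only).
se = se_diff_known(var_x=400.0, n_x=105, var_y=400.0, n_y=96)
print(f"sigma of the difference of means: {se:.3f}")
```

Note that the two variance terms are added, not subtracted, even though the means are subtracted.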

  12. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     Example
     Context:
     ▪ do I sell more when there is music in my shop?
     Experiment
     ▪ on some days the music is turned on, on other days the music is turned off
     ▪ you keep track of the sales during each day
     Data:
     ▪ sample of sales on days with music ($x_1, x_2, \dots, x_{105}$)
     ▪ sample of sales on days without music ($y_1, y_2, \dots, y_{96}$)
     Five step procedure

  13. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     ▪ Step 1:
       ▪ $H_0: \mu_X = \mu_Y$; $H_1: \mu_X \ne \mu_Y$; $\alpha = 0.05$
     ▪ Step 2:
       ▪ sample statistic: $\bar{X} - \bar{Y}$
       ▪ reject for "too large" and "too small" values
     ▪ Step 3:
       ▪ null distribution: $\dfrac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sigma_{\bar{X}-\bar{Y}}} = \dfrac{\bar{X} - \bar{Y}}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$
       ▪ valid because ...
     ▪ Step 4:
       ▪ $z_{\text{calc}} = \dots$
       ▪ $z_{\text{crit}} = \dots$
     ▪ Step 5:
       ▪ reject or not reject because ...
     (In a minute we will supply full details and a worked example.)
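The five steps can be sketched in Python for the known-variance case. This is an illustration under stated assumptions, not the worked example from the lecture: the function name `two_sample_z_test`, the parameter `delta0` (the hypothesized difference of the population means), and all numeric inputs are hypothetical. Only the sample sizes 105 and 96 are taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean_x, mean_y, var_x, var_y, n_x, n_y,
                      delta0=0.0, alpha=0.05):
    """Two-sided z-test for H0: mu_X - mu_Y = delta0, known variances."""
    # Step 3: standard deviation of X-bar minus Y-bar under independence.
    se = sqrt(var_x / n_x + var_y / n_y)
    # Step 4: calculated and critical values.
    z_calc = (mean_x - mean_y - delta0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))
    # Step 5: reject for "too large" and "too small" values.
    reject = abs(z_calc) > z_crit
    return z_calc, z_crit, p_value, reject

# Hypothetical sample means and population variances (illustration only).
z_calc, z_crit, p_value, reject = two_sample_z_test(
    mean_x=1250.0, mean_y=1190.0, var_x=400.0 ** 2, var_y=380.0 ** 2,
    n_x=105, n_y=96)
print(f"z_calc = {z_calc:.3f}, z_crit = +/-{z_crit:.3f}, "
      f"p = {p_value:.3f}, reject H0: {reject}")
```

Setting `delta0=85` would test a non-zero difference such as $H_0: \mu_X - \mu_Y = 85$.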

  14. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     ▪ But, wait ...
       ▪ ... isn't it weird to assume that $\sigma_X^2$ and $\sigma_Y^2$ are known, while $\mu_X$ and $\mu_Y$ are not known?
     ▪ In reality the population variances will often be unknown as well!
       ▪ remember we had the same problem in the one-sample case?
       ▪ there we decided to estimate the value of $\sigma^2$ with the value of $s^2$
       ▪ and paid the price of using the wider $t$-distribution
       ▪ here we will do the same: estimate the two $\sigma^2$-values with two $s^2$-values
       ▪ and pay the same price: use the $t$-distribution instead of the $z$-distribution

  15. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     For one sample, we had
       $\dfrac{\bar{X} - \mu_{\bar{X}}}{S_{\bar{X}}} \sim t_{df}$
     As it turns out, for two samples, we have
       $\dfrac{(\bar{X} - \bar{Y}) - (\mu_{\bar{X}} - \mu_{\bar{Y}})}{S_{\bar{X}-\bar{Y}}} \sim t_{df}$
     ▪ $\mu_{\bar{X}} - \mu_{\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
     ▪ $\bar{x}$ and $\bar{y}$ are obtained from the data
     ▪ but what is $s_{\bar{X}-\bar{Y}}$?
     ▪ and how to choose $df$?

  16. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     Two options for $s_{\bar{X}-\bar{Y}}$:
     ▪ 1: estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively
     ▪ 2: assuming $\sigma_X^2 = \sigma_Y^2 = \sigma^2$ and estimating $\sigma^2$ as the weighted average of both sample variances
     The two options lead to different values of $df$

  17. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     Option 1:
     ▪ estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively (Welch's test)
       $s_{\bar{X}-\bar{Y}} = \sqrt{\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}}$
       Compare to $\sigma_{\bar{X}-\bar{Y}} = \sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}$
     ▪ testing with the $t$-distribution with
       $df = \dfrac{\left(\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}\right)^2}{\dfrac{\left(s_X^2/n_X\right)^2}{n_X - 1} + \dfrac{\left(s_Y^2/n_Y\right)^2}{n_Y - 1}}$
       quick rule, but bad approximation: $df \approx \min(n_X - 1, n_Y - 1)$
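Option 1 can be sketched as a small Python function that returns the test statistic and the (generally non-integer) degrees of freedom from the formula above. The name `welch_t`, the parameter `delta0` (hypothesized difference of means), and the numeric inputs are assumptions for illustration. Turning $t_{\text{calc}}$ into a $p$-value needs a $t$-distribution table or a statistics package, which is left out here.

```python
from math import sqrt

def welch_t(mean_x, var_x, n_x, mean_y, var_y, n_y, delta0=0.0):
    """Welch's t statistic and degrees of freedom from summary statistics.

    var_x and var_y are the sample variances s_X^2 and s_Y^2.
    """
    a = var_x / n_x  # s_X^2 / n_X
    b = var_y / n_y  # s_Y^2 / n_Y
    se = sqrt(a + b)
    t_calc = (mean_x - mean_y - delta0) / se
    df = (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))
    return t_calc, df

# Hypothetical summary statistics (illustration only).
t_calc, df = welch_t(mean_x=1250.0, var_x=400.0 ** 2, n_x=105,
                     mean_y=1190.0, var_y=380.0 ** 2, n_y=96)
quick_df = min(105 - 1, 96 - 1)  # the quick rule from the slide
print(f"t_calc = {t_calc:.3f}, df = {df:.1f}, quick-rule df = {quick_df}")
```

The computed df always lies between $\min(n_X - 1, n_Y - 1)$ and $n_X + n_Y - 2$, which is why the quick rule is a conservative choice.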

  18. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     Option 2:
     ▪ estimating the common $\sigma^2$ from both samples
     ▪ a "weighted mean" of $s_X^2$ and $s_Y^2$, the pooled variance $s_P^2$
       $s_P^2 = \dfrac{(n_X - 1)\,s_X^2 + (n_Y - 1)\,s_Y^2}{(n_X - 1) + (n_Y - 1)}$
       You can read this as $\dfrac{df_X\, s_X^2 + df_Y\, s_Y^2}{df_X + df_Y}$
     ▪ and
       $s_{\bar{X}-\bar{Y}} = \sqrt{\dfrac{s_P^2}{n_X} + \dfrac{s_P^2}{n_Y}}$
       Compare to $s_{\bar{X}-\bar{Y}} = \sqrt{\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}}$
     ▪ testing with the $t$-distribution with
       $df = (n_X - 1) + (n_Y - 1) = n_X + n_Y - 2$
       You can read this as $df_X + df_Y$
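Option 2 can be sketched in the same style. The function name `pooled_t`, the parameter `delta0`, and the numeric inputs are illustrative assumptions. The code follows the pooled-variance formula on this slide and weights each sample variance by its degrees of freedom.

```python
from math import sqrt

def pooled_t(mean_x, var_x, n_x, mean_y, var_y, n_y, delta0=0.0):
    """Pooled-variance t statistic, its df, and the pooled variance.

    Assumes equal population variances; var_x and var_y are sample variances.
    """
    df_x, df_y = n_x - 1, n_y - 1
    # Weighted mean of the two sample variances, weights = degrees of freedom.
    var_p = (df_x * var_x + df_y * var_y) / (df_x + df_y)
    se = sqrt(var_p / n_x + var_p / n_y)
    t_calc = (mean_x - mean_y - delta0) / se
    return t_calc, df_x + df_y, var_p

# Hypothetical summary statistics (illustration only).
t_calc, df, var_p = pooled_t(mean_x=1250.0, var_x=400.0 ** 2, n_x=105,
                             mean_y=1190.0, var_y=380.0 ** 2, n_y=96)
print(f"s_P^2 = {var_p:.1f}, t_calc = {t_calc:.3f}, df = {df}")
```

When the two sample sizes are equal, the pooled and the separate-variance standard errors coincide; the two options then differ only in their degrees of freedom.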

  19. COMPARING THE MEANS OF TWO UNRELATED SAMPLES

  20. EXERCISE 1
     When to use:
     a. $z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}}$
     b. $t = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}}}$
     c. $t = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_P^2}{n_X} + \dfrac{s_P^2}{n_Y}}}$

  21. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     Use of SPSS
     ▪ a data set on Computer Anxiety Rating, split by gender

  22. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     ▪ Results split by gender
     ▪ Results of the $t$-test

  23. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     Zoom in (annotations on the SPSS output)
     ▪ $t$-test with pooled estimate of $\sigma_X^2 = \sigma_Y^2$
     ▪ $t$-test with separate estimates of $\sigma_X^2$ and $\sigma_Y^2$
     ▪ value of the $t$-statistic ($t_{\text{calc}}$)
     ▪ degrees of freedom
     ▪ $p$-value (2-sided)

  24. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     And one more thing ...
     ▪ tests of the assumption of equal variance
       ▪ $H_0: \sigma_X^2 = \sigma_Y^2$ versus $H_1: \sigma_X^2 \ne \sigma_Y^2$
       ▪ $p$-value for this test

  25. COMPARING THE MEANS OF TWO UNRELATED SAMPLES
     For these two tests, we need both $\bar{X}$ and $\bar{Y}$ to be normally distributed
     ▪ This means that at least one of the following three requirements holds:
       ▪ $X$ is a normally distributed population
       ▪ $X$ has a symmetric distribution and $n_X \ge 15$
       ▪ $n_X \ge 30$
     ▪ Also for $Y$ one of these requirements must hold
       ▪ but not necessarily the same one as for $X$
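The requirement check on this slide can be written as a simple rule. The sketch below is only an encoding of the three conditions as stated; the function name and the boolean flags are hypothetical, and whether a population is normal or symmetric remains a judgment the analyst has to make (for example from a histogram).

```python
def mean_is_approximately_normal(n, population_normal=False,
                                 population_symmetric=False):
    """True if at least one of the three requirements from the slide holds."""
    return (population_normal
            or (population_symmetric and n >= 15)
            or n >= 30)

# Each sample is checked separately; the samples may qualify for different reasons.
ok_x = mean_is_approximately_normal(n=105)
ok_y = mean_is_approximately_normal(n=20, population_symmetric=True)
print("both t-tests applicable:", ok_x and ok_y)
```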
