TWO $\mu$'S OR MEDIANS: COMPARISONS
Business Statistics

CONTENTS
▪ Comparing two samples
▪ Comparing two unrelated samples
▪ Comparing the means of two unrelated samples
▪ Comparing the medians of two unrelated samples
▪ Old exam question
▪ Further study

COMPARING TWO SAMPLES
It often happens that we want to compare two situations
▪ do I sell more when there is music in my shop?
▪ is the expensive machine more precise than the cheap one?
▪ are advertisements on TV and on the internet equally profitable?
▪ do people buy more on Tuesdays than on Wednesdays?
▪ in couples, who drinks more: the man or the woman?
▪ etc.

COMPARING TWO SAMPLES
In all these questions we compare two populations
▪ Situation 1: two populations (or sub-populations) with a similar variable
  ▪ sales in 105 days without music
  ▪ sales in 96 days with music
▪ Data matrix: two options (callout on one of them: SPSS requires this data presentation)

COMPARING TWO SAMPLES
▪ Situation 2: one sample with paired observations
  ▪ drinks of the man in 78 couples
  ▪ drinks of the woman in the same 78 couples
▪ Data matrix: one option only
▪ Will be discussed in a later lecture

COMPARING TWO UNRELATED SAMPLES
Situation 1
▪ independent samples/unrelated samples
▪ introduce symbols for the two random variables
▪ e.g., using $X_1$ and $X_2$
  ▪ $X_1$ with sample $x_{1,1}, x_{1,2}, \ldots, x_{1,n_1}$ and $X_2$ with sample $x_{2,1}, x_{2,2}, \ldots, x_{2,n_2}$
▪ or using $X$ and $Y$
  ▪ $X$: $x_1, x_2, \ldots, x_{n_X}$ and $Y$: $y_1, y_2, \ldots, y_{n_Y}$
▪ sample sizes can be different
Or of course using "meaningful" indices: $X_B$ and $X_G$ for Belgium and Germany. Not $B$ and $G$, because we need to stress that it is "about" a variable $X$ (like sales)

COMPARING TWO UNRELATED SAMPLES
We want to test hypotheses such as
▪ are the means equal?
  ▪ $H_0: \mu_X = \mu_Y$ or $H_0: \mu_1 = \mu_2$ or $H_0: \mu_{X_1} = \mu_{X_2}$ or ...
▪ are the variances equal?
  ▪ $H_0: \sigma_X^2 = \sigma_Y^2$ or etc.
▪ are the proportions equal?
  ▪ $H_0: \pi_X = \pi_Y$ or etc.
Also:
▪ inequalities, like $H_0: \mu_X \geq \mu_Y$
▪ and non-zero differences, like $H_0: \mu_X = \mu_Y + 85$

COMPARING TWO UNRELATED SAMPLES
Context:
▪ sample $X_1$: sales in $n_1 = 105$ days without music
▪ sample $X_2$: sales in $n_2 = 96$ days with music
General idea:
▪ $X_1 \sim \text{distribution}(\theta_1)$ and $X_2 \sim \text{distribution}(\theta_2)$: is $\theta_1 = \theta_2$?

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Assumption (for now!):
▪ $X \sim N(\mu_X; \sigma_X^2)$
▪ $Y \sim N(\mu_Y; \sigma_Y^2)$
▪ in words: both samples come from normally distributed populations with known variances
Question
▪ are $\mu_X$ and $\mu_Y$ different?
▪ can we test this, on the basis of the (limited!) evidence concerning $\bar{x}$ and $\bar{y}$?
▪ so, can we reject $H_0: \mu_X = \mu_Y$?
To decide
▪ use $\bar{X} - \bar{Y} \sim N\left(\mu_{\bar{X}-\bar{Y}}, \sigma^2_{\bar{X}-\bar{Y}}\right)$
So, the sampling distribution of the difference of means is normal

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$$\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \sim N(0,1)$$
As it turns out, for two samples, we have
$$\frac{(\bar{X} - \bar{Y}) - \mu_{\bar{X}-\bar{Y}}}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$$
▪ $\mu_{\bar{X}-\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
  ▪ for instance $H_0: \mu_X = \mu_Y$ or $H_0: \mu_X - \mu_Y = 85$
▪ $\bar{x}$ and $\bar{y}$ are obtained from the data
▪ but what is $\sigma_{\bar{X}-\bar{Y}}$?

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$$\sigma^2_{\bar{X}} = \frac{\sigma_X^2}{n}$$
As it turns out, for two independent samples, we have $\sigma^2_{\bar{X}-\bar{Y}} = \sigma^2_{\bar{X}} + \sigma^2_{\bar{Y}}$, so
$$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$$
▪ recall that variances add up when $X$ and $Y$ are independent
  ▪ e.g., $\sigma^2_{X+Y} = \sigma_X^2 + \sigma_Y^2$ but also $\sigma^2_{X-Y} = \sigma_X^2 + \sigma_Y^2$
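
The claim that variances add for both the sum and the difference of independent variables can be checked with a small simulation. The sketch below is only an illustration: the standard deviations 3 and 4, the seed, and the number of draws are arbitrary assumptions, not values from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

N = 50_000  # number of simulated draws (arbitrary choice)
x = [random.gauss(0.0, 3.0) for _ in range(N)]  # Var(X) = 9
y = [random.gauss(0.0, 4.0) for _ in range(N)]  # Var(Y) = 16, drawn independently of x

# Variance of the sum and of the difference of the independent draws.
var_sum = statistics.pvariance([a + b for a, b in zip(x, y)])
var_diff = statistics.pvariance([a - b for a, b in zip(x, y)])

# Both values should be close to 9 + 16 = 25, up to simulation noise.
print(round(var_sum, 2), round(var_diff, 2))
```

The minus sign in $X - Y$ does not reduce the spread: subtracting an independent noisy quantity adds just as much uncertainty as adding it.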

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Example
Context:
▪ do I sell more when there is music in my shop?
Experiment
▪ on some days the music is turned on, on other days the music is turned off
▪ you keep track of the sales during each day
Data:
▪ sample of sales on days without music ($x_1, x_2, \ldots, x_{105}$)
▪ sample of sales on days with music ($y_1, y_2, \ldots, y_{96}$)
Five-step procedure

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
▪ Step 1:
  ▪ $H_0: \mu_X = \mu_Y$; $H_1: \mu_X \neq \mu_Y$; $\alpha = 0.05$
▪ Step 2:
  ▪ sample statistic: $\bar{X} - \bar{Y}$
  ▪ reject for "too large" and "too small" values
▪ Step 3:
  ▪ null distribution $Z = \dfrac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sigma_{\bar{X}-\bar{Y}}} \sim N(0,1)$
  ▪ valid because ...
▪ Step 4:
  ▪ $z_{calc} =$
  ▪ $z_{crit} =$
▪ Step 5:
  ▪ reject or not reject because ...
(in a minute we will supply full details and a worked example ...)
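
To make steps 3 to 5 concrete, here is a minimal sketch of the two-sample z-test with known variances. All numbers (sample means and population variances) are hypothetical placeholders, not data from the slides; only the sample sizes 105 and 96 come from the example.

```python
import math
from statistics import NormalDist

# Hypothetical summary numbers (NOT from the slides), chosen only for illustration.
n_x, mean_x, var_x = 105, 1000.0, 400.0  # days without music, sigma_X^2 assumed known
n_y, mean_y, var_y = 96, 1010.0, 324.0   # days with music, sigma_Y^2 assumed known
alpha = 0.05

# Step 3: under H0 the hypothesised difference mu_X - mu_Y equals 0.
se = math.sqrt(var_x / n_x + var_y / n_y)

# Step 4: calculated and critical values (two-sided test).
z_calc = ((mean_x - mean_y) - 0.0) / se
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 5: reject H0 when z_calc lies in either tail.
reject = abs(z_calc) > z_crit
print(f"z_calc = {z_calc:.3f}, z_crit = {z_crit:.3f}, reject H0: {reject}")
```

For a null hypothesis with a non-zero difference, such as $\mu_X - \mu_Y = 85$, the `0.0` in step 4 would be replaced by that hypothesised difference.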

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
▪ But, wait ...
  ▪ ... isn't it weird to assume that $\sigma_X^2$ and $\sigma_Y^2$ are known, while $\mu_X$ and $\mu_Y$ are not known?
▪ In reality the population variances will often be unknown as well!
  ▪ remember we had the same problem in the one-sample case?
  ▪ there we decided to estimate the value of $\sigma^2$ with the value of $s^2$
  ▪ and paid the price of using the wider $t$-distribution
  ▪ here we will do the same: estimate the two $\sigma^2$-values with two $s^2$-values
  ▪ and pay the same price: use the $t$-distribution instead of the $z$-distribution

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For one sample, we had
$$\frac{\bar{X} - \mu_{\bar{X}}}{s_{\bar{X}}} \sim t_{df}$$
As it turns out, for two samples, we have
$$\frac{(\bar{X} - \bar{Y}) - \mu_{\bar{X}-\bar{Y}}}{s_{\bar{X}-\bar{Y}}} \sim t_{df}$$
▪ $\mu_{\bar{X}-\bar{Y}} = \mu_X - \mu_Y$ follows from the null hypothesis
▪ $\bar{x}$ and $\bar{y}$ are obtained from the data
▪ but what is $s_{\bar{X}-\bar{Y}}$?
▪ and how to choose $df$?

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Two options for $s_{\bar{X}-\bar{Y}}$:
▪ 1: estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively
▪ 2: assuming $\sigma_X^2 = \sigma_Y^2 = \sigma^2$ and estimating $\sigma^2$ as the weighted average of both sample variances
The two options lead to different values of $df$

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Option 1:
▪ estimating $\sigma_X^2$ and $\sigma_Y^2$ from $s_X^2$ and $s_Y^2$ respectively (Welch's test)
$$s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$$
  Compare to $\sigma_{\bar{X}-\bar{Y}} = \sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}$
▪ testing with the $t$-distribution with
$$df = \frac{\left(\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}\right)^2}{\dfrac{\left(s_X^2/n_X\right)^2}{n_X - 1} + \dfrac{\left(s_Y^2/n_Y\right)^2}{n_Y - 1}}$$
  quick rule, but bad approximation: $df \approx \min(n_X - 1, n_Y - 1)$
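
The Welch standard error and degrees of freedom are easy to get wrong by hand, so a short sketch may help. The function name `welch_se_and_df` and the sample variances 400 and 324 are hypothetical; only the sample sizes are taken from the music example.

```python
import math

def welch_se_and_df(s2_x, n_x, s2_y, n_y):
    """Standard error of the difference of means and Welch's df (option 1)."""
    a = s2_x / n_x  # estimated variance of the mean of X
    b = s2_y / n_y  # estimated variance of the mean of Y
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))
    return se, df

# Hypothetical sample variances, for illustration only.
se, df = welch_se_and_df(s2_x=400.0, n_x=105, s2_y=324.0, n_y=96)
quick_df = min(105 - 1, 96 - 1)  # the quick (conservative) rule
print(f"se = {se:.3f}, Welch df = {df:.1f}, quick-rule df = {quick_df}")
```

With these inputs the quick rule gives a much smaller $df$ than the full formula, which illustrates why the slide calls it a bad approximation: it is safe but wasteful.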

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Option 2:
▪ estimating the common $\sigma^2$ from both samples
▪ a "weighted mean" of $s_X^2$ and $s_Y^2$, the pooled variance $s_P^2$
$$s_P^2 = \frac{(n_X - 1)\, s_X^2 + (n_Y - 1)\, s_Y^2}{(n_X - 1) + (n_Y - 1)}$$
  You can read this as $\dfrac{df_X\, s_X^2 + df_Y\, s_Y^2}{df_X + df_Y}$
▪ and
$$s_{\bar{X}-\bar{Y}} = \sqrt{\frac{s_P^2}{n_X} + \frac{s_P^2}{n_Y}}$$
  Compare to $s_{\bar{X}-\bar{Y}} = \sqrt{\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}}$
▪ testing with the $t$-distribution with
$$df = (n_X - 1) + (n_Y - 1) = n_X + n_Y - 2$$
  You can read this as $df_X + df_Y$
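
A matching sketch for option 2 computes the pooled variance, the resulting standard error, and the degrees of freedom. As before, the function name `pooled_se_and_df` and the sample variances are hypothetical illustration values.

```python
import math

def pooled_se_and_df(s2_x, n_x, s2_y, n_y):
    """Pooled variance, standard error of the difference, and df (option 2)."""
    df_x, df_y = n_x - 1, n_y - 1
    # Weighted mean of the two sample variances, with weights df_x and df_y.
    s2_p = (df_x * s2_x + df_y * s2_y) / (df_x + df_y)
    se = math.sqrt(s2_p / n_x + s2_p / n_y)
    return s2_p, se, df_x + df_y

# Hypothetical sample variances, for illustration only.
s2_p, se, df = pooled_se_and_df(s2_x=400.0, n_x=105, s2_y=324.0, n_y=96)
print(f"pooled variance = {s2_p:.2f}, se = {se:.3f}, df = {df}")
```

The pooled variance always lies between the two sample variances, and it sits closer to the variance of the larger sample because that sample carries more degrees of freedom.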
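
EXERCISE 1
When to use:
a. $z = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}}$
b. $t = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}}}$
c. $t = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_P^2}{n_X} + \dfrac{s_P^2}{n_Y}}}$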

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Use of SPSS
▪ a data set on Computer Anxiety Rating, split by gender

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
[SPSS output: results split by gender; results of the $t$-test]

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
Zoom in on the SPSS output:
▪ $t$-test with pooled estimate of $\sigma_X^2 = \sigma_Y^2$
▪ $t$-test with separate estimates of $\sigma_X^2$ and $\sigma_Y^2$
▪ value of the $t$-statistic ($t_{calc}$)
▪ degrees of freedom
▪ $p$-value (2-sided)

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
And one more thing ... tests of the assumption of equal variance
▪ $H_0: \sigma_X^2 = \sigma_Y^2$ versus $H_1: \sigma_X^2 \neq \sigma_Y^2$
▪ the SPSS output also reports the $p$-value for this test

COMPARING THE MEANS OF TWO UNRELATED SAMPLES
For these two tests, we need both $\bar{X}$ and $\bar{Y}$ to be normally distributed
▪ This means that at least one of the following three requirements holds:
  ▪ $X$ is a normally distributed population
  ▪ $X$ has a symmetric distribution and $n_X \geq 15$
  ▪ $n_X \geq 30$
▪ Also for $Y$ one of these requirements must hold
  ▪ but not necessarily the same one as for $X$
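
The requirements above amount to a simple decision rule, which can be written down as a small check. This is only a sketch: the function names and the boolean flags are hypothetical, and whether a population is normal or symmetric remains a judgement the analyst has to supply.

```python
def mean_is_approx_normal(n, population_normal=False, population_symmetric=False):
    """True if at least one of the three requirements holds for one sample."""
    return (
        population_normal
        or (population_symmetric and n >= 15)
        or n >= 30
    )

def two_sample_test_ok(x_info, y_info):
    """Both samples must qualify, but not necessarily via the same requirement."""
    return mean_is_approx_normal(**x_info) and mean_is_approx_normal(**y_info)

# X qualifies through symmetry with n >= 15, Y through n >= 30.
ok = two_sample_test_ok({"n": 20, "population_symmetric": True}, {"n": 96})
print(ok)
```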