CHOOSING THE RIGHT TEST Business Statistics
CONTENTS
▪ Key questions
▪ Roadmaps for statistical tests
▪ A decision tree
▪ Old exam question
▪ Further study
KEY QUESTIONS
▪ number of variables
  ▪ 1, 2, more than 2
▪ number of subpopulations
  ▪ 1, 2, more than 2
▪ types of data
  ▪ numerical, categorical
▪ parameter to test
  ▪ centrality, dispersion, proportion, ...
▪ characteristics of the population
  ▪ normal, symmetric, ...
▪ paired/independent variables
▪ association vs. comparison
KEY QUESTIONS
Some tests can be conceived in different ways
▪ Example:
  ▪ ANOVA: comparing $\mu$ of >2 numerical variables $X_1, X_2, X_3, \ldots$
  ▪ or: association between categorical $X$ and numerical $Y$
▪ Here we focus on the most usual approach
ROADMAPS FOR STATISTICAL TESTS
One sample
▪ centrality $H_0: \mu = \mu_0$ or $H_0: M = M_0$
  ▪ CLT-conditions and $\sigma$ known: $z$-test
  ▪ CLT-conditions and $\sigma$ unknown: $t$-test
  ▪ symmetric distribution: Wilcoxon signed ranks test
  ▪ sign test
▪ dispersion $H_0: \sigma^2 = \sigma_0^2$
  ▪ normal population: $\chi^2$-test
▪ proportion $H_0: \pi = \pi_0$
  ▪ binomial test
  ▪ $n\pi \ge 5$ and $n(1 - \pi) \ge 5$: normal approximation
CLT-conditions:
  ▪ $n < 15$: normal population
  ▪ $15 \le n < 30$: symmetric population
  ▪ $n \ge 30$: no restrictions
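The one-sample centrality branch of this roadmap can be sketched as a small decision function. This is only an illustration: the function names and the population labels "normal", "symmetric" and "other" are hypothetical, and a normal population is assumed to also satisfy the "symmetric" condition.

```python
def clt_conditions_met(n: int, population: str) -> bool:
    """CLT rule of thumb from the slide.

    population: "normal", "symmetric" or "other" (hypothetical labels).
    """
    if n >= 30:
        return True  # no restrictions
    if n >= 15:
        # assumption: a normal population also counts as symmetric
        return population in ("normal", "symmetric")
    return population == "normal"


def one_sample_centrality_test(n: int, population: str, sigma_known: bool) -> str:
    """Walk the centrality bullets of the one-sample roadmap from top to bottom."""
    if clt_conditions_met(n, population):
        return "z-test" if sigma_known else "t-test"
    if population == "symmetric":
        return "Wilcoxon signed ranks test"
    return "sign test"  # fallback without distributional conditions


print(one_sample_centrality_test(n=20, population="symmetric", sigma_known=False))
```

The bullets are checked in the order they appear on the slide, so the first test whose conditions hold is returned.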
ROADMAPS FOR STATISTICAL TESTS
Two related (dependent) samples
▪ comparison of “similar” variables:
  ▪ convert into one-sample situation
  ▪ e.g., $D = X_{\text{after}} - X_{\text{before}}$ and $H_0: \mu_D = 0$
▪ association between two “dissimilar” variables:
  ▪ two numerical variables:
    ▪ normal populations: correlation ($t$-test)
    ▪ rank correlation ($z$-test)
    ▪ normal error term: simple regression ($F$-test and $t$-test)
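Converting paired data into a one-sample situation can be illustrated as follows. This is a minimal sketch using only the Python standard library; the before/after scores are made-up illustration data and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(after, before):
    """Return (t, df) for H0: mu_D = 0, where D = after - before."""
    d = [a - b for a, b in zip(after, before)]  # one difference per pair
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))  # one-sample t statistic on D
    return t, n - 1


# Hypothetical scores for five subjects
before = [72, 68, 75, 80, 66]
after = [75, 70, 74, 85, 70]

t_stat, df = paired_t_statistic(after, before)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The $t$-test on the differences is appropriate only if the CLT-conditions hold for $D$; otherwise the other one-sample tests for centrality apply to $D$ in the same way.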
ROADMAPS FOR STATISTICAL TESTS
Two independent subpopulations
▪ centrality $H_0: \mu_1 = \mu_2$ or $H_0: M_1 = M_2$
  ▪ CLT-conditions and $\sigma_1$ and $\sigma_2$ known: $z$-test
  ▪ CLT-conditions and $\sigma_1 = \sigma_2$ unknown: $t$-test
  ▪ CLT-conditions and $\sigma_1$ and $\sigma_2$ unknown but not necessarily equal: $t$-test
  ▪ distribution equally-shaped: Wilcoxon-Mann-Whitney test
▪ dispersion $H_0: \sigma_1^2 = \sigma_2^2$
  ▪ normal populations: $F$-test
  ▪ Levene’s test
▪ proportion $H_0: \pi_1 = \pi_2$
  ▪ $z$-test
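As an illustration of the last branch, a two-sample $z$-test for proportions might look like the sketch below. The slide names the test but gives no formula; the pooled standard error used here is the common textbook form and is an assumption, as are the function name and the counts.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: pi_1 = pi_2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 45 of 100 successes versus 30 of 100
z, p = two_proportion_z_test(45, 100, 30, 100)
print(f"z = {z:.3f}, two-sided p-value = {p:.4f}")
```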
ROADMAPS FOR STATISTICAL TESTS
More than two independent subpopulations
▪ centrality $H_0: \mu_1 = \mu_2 = \mu_3$ or $H_0: M_1 = M_2 = M_3$
  ▪ normal populations and equal variances: ANOVA
  ▪ equally-shaped distributions and group sizes >5: Kruskal-Wallis test
▪ independence of two categorical variables
  ▪ expected count $\ge 5$: $\chi^2$-test on contingency table
▪ dispersion $H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$
  ▪ Levene’s test
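The expected-count condition for the $\chi^2$-test on a contingency table can be checked with a few lines of code. The sketch below assumes the usual expected count (row total × column total / grand total); the function names and the 2×2 table are hypothetical.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2x2 table of observed counts
observed = [[20, 30], [30, 20]]
expected = expected_counts(observed)
condition_ok = min(min(row) for row in expected) >= 5  # roadmap condition

print(expected, condition_ok, chi_square_statistic(observed))
```

The statistic is compared with a $\chi^2$ distribution with (rows − 1)(columns − 1) degrees of freedom.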
ROADMAPS FOR STATISTICAL TESTS
More than two related samples
▪ dependence of one numerical variable on several other numerical variables:
  ▪ normal error term and linear relation: multiple regression ($F$-test and $t$-test)
▪ dependence of one numerical variable on several other categorical variables:
  ▪ normal error term: multiple regression with dummy variables ($F$-test and $t$-test)
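Dummy variables turn a categorical variable into 0/1 columns that a regression can use. The sketch below assumes the usual convention of one dummy per level except a chosen reference level; the slide does not spell this out, and the function name and data are hypothetical.

```python
def dummy_code(values, reference):
    """Return (levels, rows) with one 0/1 column per non-reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical categorical observations, with "white" as the reference level
colors = ["red", "blue", "white", "blue"]
levels, rows = dummy_code(colors, reference="white")

print(levels)  # column order of the dummies
print(rows)    # the reference level is coded as all zeros
```

Leaving out the reference level avoids perfect collinearity with the intercept; each dummy coefficient then measures the difference from the reference level.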
EXERCISE 1
Which test to use when:
a. comparing the mean of the income of men and women?
b. comparing the variance of the income of men and women?
c. the relation between the color of a car and its probability of being involved in an accident?
d. the relation between the color of a car and the gender of its owner?
e. the relation between the mean of the income and ethnicity (black, white, Asian)?
f. the relation between income and IQ?
A DECISION TREE
OLD EXAM QUESTION 21 May 2015, Q2d
FURTHER STUDY
▪ Doane & Seward 5/E: missing
▪ Tutorial exercises week 6: choosing 1, choosing 2