MISCELLANEOUS TOPICS
Business Statistics
CONTENTS
▪ Back to the promise
▪ Back to the learning goals
▪ Standardizing data
▪ Assessing normality and symmetry
▪ Dealing with gaps in the tables
▪ The asterisk notation
▪ Review of distributions
▪ Old exam question
▪ Further study
BACK TO THE PROMISE
Regression analysis: "−.12 (−2.11)*", where * means p < .05
BACK TO THE PROMISE
Wilcoxon test for paired samples: significant
BACK TO THE LEARNING OBJECTIVES
Academic skills
▪ abstraction
Research skills
▪ translating
Quantitative skills
▪ a variety of methods
Knowledge
▪ reading and writing statistics
STANDARDIZING DATA
We have often standardized a test statistic
▪ example: Ȳ → z = (Ȳ − μ_Ȳ)/σ_Ȳ
But one also frequently encounters standardized data
▪ standardization of data is done by subtracting the mean and dividing by the standard deviation
▪ so: y_j → z_j = (y_j − ȳ)/s_y
Not only for data from a normal population; it is often done for all sorts of data
STANDARDIZING DATA
Some properties:
▪ z̄ = 0 (the mean of standardized data is 0)
▪ s_z² = s_z = 1 (the variance and the standard deviation of standardized data are 1)
▪ standardized data are dimensionless (have no unit)
Interpretation
▪ each value z_j = (y_j − ȳ)/s_y measures how many standard deviations that value is removed from the mean
▪ examples:
▪ −2.5 is far in the left tail
▪ 0.2 is pretty central, a bit to the right of the mean
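These properties can be checked numerically. The sketch below (the function name `z_scores` and the sample data are our own, for illustration only) standardizes a small data set and verifies that the resulting mean is 0 and the sample standard deviation is 1:

```python
import statistics

def z_scores(data):
    """Standardize data: subtract the mean, divide by the sample sd."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    return [(y - mean) / sd for y in data]

y = [4.0, 6.0, 5.0, 9.0, 6.0]          # hypothetical data
z = z_scores(y)
print(round(statistics.mean(z), 10))   # 0.0  (mean of z-scores is 0)
print(round(statistics.stdev(z), 10))  # 1.0  (sd of z-scores is 1)
```

Note that standardizing changes neither the shape of the distribution nor the relative position of the data points; it only rescales them.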
EXERCISE 1
Which statements are true?
a. z-scores can be made for numerical and categorical variables
b. z-scores have a skewness and kurtosis of 0
c. z-scores are normally distributed when n ≥ 30
d. a z-score of 3.2 is pretty high compared to most other data points
ASSESSING NORMALITY AND SYMMETRY
In many cases we need to make assumptions about "normal populations" or "symmetric populations"
Is there a way to assess this?
Qualitatively:
▪ histograms
▪ box plots
(not really helpful to judge normality for small sample sizes)
Quantitatively:
▪ skewness
▪ kurtosis
(no formal test, but rules of thumb; see next slide)
ASSESSING NORMALITY AND SYMMETRY
Practical rules of thumb that work more or less
Normality:
▪ −1 ≤ skewness ≤ 1
▪ −1 ≤ kurtosis ≤ 1
Symmetry:
▪ −1 ≤ skewness ≤ 1
Example SPSS output (sales):
▪ N: 18
▪ Mean: 3.956
▪ Std. Deviation: 2.0893
▪ Skewness: .716 (Std. Error: .536)
▪ Kurtosis: −.297 (Std. Error: 1.038)
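The rule of thumb is easy to apply mechanically. A minimal sketch (the function names are our own; `skewness` uses the adjusted Fisher–Pearson formula, which is what SPSS reports):

```python
import math

def skewness(data):
    """Adjusted Fisher-Pearson sample skewness: n/((n-1)(n-2)) * sum(z_j^3)."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

def roughly_normal(skew, kurt):
    """Rule of thumb: both skewness and kurtosis within [-1, 1]."""
    return -1 <= skew <= 1 and -1 <= kurt <= 1

# the sales data from the slide pass the rule of thumb
print(roughly_normal(0.716, -0.297))  # True
```

A perfectly symmetric sample (e.g. [1, 2, 3]) has skewness 0, as expected.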
DEALING WITH GAPS IN THE TABLES
Try to look up P(Z ≤ 0.402)
▪ the table gives P(Z ≤ 0.410) = 0.6591
▪ and P(Z ≤ 0.400) = 0.6554
▪ but not P(Z ≤ 0.402)
DEALING WITH GAPS IN THE TABLES
Three solutions
▪ take the nearest value
▪ P(Z ≤ 0.402) ≈ P(Z ≤ 0.400) = 0.6554
▪ make a linear interpolation
▪ P(Z ≤ 0.402) ≈ 0.8 × P(Z ≤ 0.400) + 0.2 × P(Z ≤ 0.410) = P(Z ≤ 0.400) + (2/10) × (0.6591 − 0.6554) = 0.65614
▪ use a conservative value
▪ either P(Z ≤ 0.402) ≈ P(Z ≤ 0.400) = 0.6554
▪ or P(Z ≤ 0.402) ≈ P(Z ≤ 0.410) = 0.6591
▪ which one depends on the use (confidence interval, type of critical value)
Unless specified, we leave the choice up to you
▪ the differences are tiny anyhow
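The interpolation step can be written as a small helper (a sketch; the function name `interpolate` is our own):

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between two neighbouring table entries."""
    w = (x - x0) / (x1 - x0)   # relative position of x between x0 and x1
    return (1 - w) * y0 + w * y1

# P(Z <= 0.402) from the table values P(Z <= 0.400) = 0.6554 and P(Z <= 0.410) = 0.6591
p = interpolate(0.402, 0.400, 0.6554, 0.410, 0.6591)
print(round(p, 5))  # 0.65614
```

Here w = 0.2, reproducing the 0.8/0.2 weighting on the slide.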
DEALING WITH GAPS IN THE TABLES
Another type of gap: degrees of freedom
Example: t_crit with df = 68, which is not in the table
▪ "conservative" sometimes means rounding upwards, sometimes rounding downwards
▪ recommendation: be conservative, so here use df = 65
THE ASTERISK NOTATION
Recall the introductory slides
▪ "−.12 (−2.11)*"
▪ what does that mean?
THE ASTERISK NOTATION
In many journal articles in business and economics, regression models are used
▪ every regression coefficient has
▪ an estimated value (b₁, etc.)
▪ a standard error of the estimate (s_b₁, etc.)
▪ a t-value based on H₀: β₁ = 0 (t_calc = (b₁ − 0)/s_b₁, etc.)
▪ a p-value for this: p-value = P(|T| ≥ |t_calc|)
THE ASTERISK NOTATION
In a journal we would need to report most of these
▪ this gives long sentences: "The estimated coefficient for uniqueness is b = −.12, with a t-value of −2.11, giving a p-value between 0.01 and 0.05."
▪ therefore, this is often abbreviated: "−.12 (−2.11)*"
▪ usual conventions for the asterisks:
▪ * means 0.01 ≤ p-value < 0.05
▪ ** means 0.001 ≤ p-value < 0.01
▪ *** means p-value < 0.001
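The asterisk convention amounts to a simple threshold rule. A minimal sketch (the function name `stars` and the example p-value 0.038 are our own, for illustration):

```python
def stars(p):
    """Map a p-value to the usual journal asterisk convention."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 5% level

# the coefficient from the slides: b = -.12, t = -2.11, with 0.01 <= p < 0.05
print(f"-.12 (-2.11){stars(0.038)}")  # -.12 (-2.11)*
```

Note the checks run from the smallest threshold upwards, so each p-value lands in exactly one band.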
REVIEW OF DISTRIBUTIONS
z statistic: z = (T − μ_T)/σ_T ~ N(0,1)
▪ used for testing the following parameters/hypotheses:
▪ μ = μ₀, when σ² is known (n < 15; 15 ≤ n < 30; n ≥ 30)
▪ M = M₀, through Wilcoxon's W (when n ≥ 20)
▪ μ_X − μ_Y = μ₀, when σ_X² and σ_Y² are known (n < 15; 15 ≤ n < 30; n ≥ 30)
▪ π = π₀ (when nπ ≥ 5 and n(1 − π) ≥ 5)
▪ π_X − π_Y = π₀ (when nπ ≥ 5 and n(1 − π) ≥ 5)
▪ ρ_S = 0, in the Spearman correlation test (when n ≥ 20)
▪ M₁ = M₂, when n₁ ≥ 10 and n₂ ≥ 10 in the Mann-Whitney test
REVIEW OF DISTRIBUTIONS
t statistic: t = (T − μ_T)/s_T ~ t_df
▪ used for testing the following parameters/hypotheses:
▪ μ = μ₀, when σ² is unknown (n < 15; 15 ≤ n < 30; n ≥ 30)
▪ μ_X − μ_Y = μ₀, when σ_X² and σ_Y² are unknown (n < 15; 15 ≤ n < 30; n ≥ 30)
▪ β = β₀, in regression analysis
▪ ρ = 0, in the Pearson correlation test
REVIEW OF DISTRIBUTIONS
χ² statistic: χ² = df × S²/σ² ~ χ²_df
▪ used for testing the following parameters/hypotheses:
▪ σ² = σ₀², of a normal population
▪ M_X = M_Y = ⋯ = M_Z, in the Kruskal-Wallis test
▪ independence in contingency tables, when n_exp ≥ 5
REVIEW OF DISTRIBUTIONS
F statistic: F = S₁²/S₂² ~ F_df₁,df₂
▪ used for testing the following parameters/hypotheses:
▪ σ_X² = σ_Y², of two normal populations
▪ σ_X² = σ_Y² = ⋯ = σ_Z², with Levene's test
▪ overall fit in regression analysis
▪ μ_X = μ_Y = ⋯ = μ_Z, in ANOVA
REVIEW OF DISTRIBUTIONS
binomial statistic: X ~ bin(n, π)
▪ used for testing the following parameters/hypotheses:
▪ π = π₀, in a repeated Bernoulli experiment
▪ M = M₀, in the sign test
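The sign test is a direct application of the binomial distribution. A sketch under assumed data (9 positive differences out of 12 non-zero ones; the function name `binom_cdf_upper` is our own):

```python
from math import comb

def binom_cdf_upper(k, n, p):
    """P(X >= k) for X ~ bin(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# sign test sketch: under H0: M = M0 each non-zero difference is positive
# with probability 0.5, so with 9 positives out of n = 12 the one-sided
# p-value is P(X >= 9) for X ~ bin(12, 0.5)
p_value = binom_cdf_upper(9, 12, 0.5)
print(round(p_value, 4))  # 0.073
```

At the 5% level this would not reject H₀ (one-sided); a two-sided test would double the p-value.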
OLD EXAM QUESTION 26 March 2015, Q1i
FURTHER STUDY
▪ Doane & Seward 5/E: missing
▪ Tutorial exercises week 6