Statistics for Business
Descriptive Statistics

Panagiotis Th. Konstantinou
MSc in International Shipping, Finance and Management, Athens University of Economics and Business

First Draft: July 15, 2015. This Draft: September 3, 2020.

P. Konstantinou (AUEB) Statistics for Business – I September 3, 2020 1 / 20
Descriptive Statistics: Key Concepts

◮ A population is the collection of all items of interest or under investigation (N represents the population size).
◮ A sample is an observed subset of the population (n represents the sample size).
◮ A parameter is a specific characteristic of a population.
◮ A statistic is a specific characteristic of a sample.

[Figure: a population of items a–z, from which a sample (b, c, g, i, n, o, r, u, y) is drawn.]

Values calculated using population data are called parameters; values computed from sample data are called statistics.
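A minimal Python sketch of the distinction, assuming a hypothetical population made of the values 1 to 100 (not data from these notes): the population mean $\mu$ is a parameter, while the mean $\bar{x}$ of a random sample of $n = 10$ items is a statistic whose value depends on which items happen to be drawn.

```python
import random

# Hypothetical population: the values 1, 2, ..., 100 (N = 100).
population = list(range(1, 101))
N = len(population)
mu = sum(population) / N           # parameter: computed from population data

# Draw a simple random sample of size n without replacement.
rng = random.Random(42)            # fixed seed so the sketch is reproducible
sample = rng.sample(population, 10)
n = len(sample)
x_bar = sum(sample) / n            # statistic: computed from sample data

print(f"N = {N}, parameter mu = {mu}")
print(f"n = {n}, statistic x_bar = {x_bar}")
```

Changing the seed changes the sample and therefore $\bar{x}$, while $\mu$ stays fixed.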
Descriptive Statistics: Data Types

Data are either categorical or numerical:
◮ Categorical (defined categories or groups). Examples: marital status; "Are you registered to vote?"; eye color.
◮ Numerical, which may be
  ◮ Discrete (counted items). Examples: number of children; defects per hour.
  ◮ Continuous (measured characteristics). Examples: weight; voltage.
Descriptive Statistics: Relationships Between Variables

Cost per Day vs. Production Volume

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Figure: scatter plot of Cost per Day (vertical axis) against Volume per Day (horizontal axis) for the data above.]

Investment amounts by category and investor:

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                46.5         55           27.5         129
Bonds                 32.0         44           19.0         95
CD                    15.5         20           13.5         49
Savings               16.0         28           7.0          51
Total                 110.0        147          67.0         324
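The totals in the cross-classification table can be checked programmatically. A minimal Python sketch, assuming the table is stored as a nested dictionary (a representation chosen here only for illustration), recomputes the row totals, the investor (column) totals and the grand total:

```python
# Investment amounts by category (rows) and investor (columns), from the table above.
investments = {
    "Stocks":  {"A": 46.5, "B": 55.0, "C": 27.5},
    "Bonds":   {"A": 32.0, "B": 44.0, "C": 19.0},
    "CD":      {"A": 15.5, "B": 20.0, "C": 13.5},
    "Savings": {"A": 16.0, "B": 28.0, "C": 7.0},
}
investors = ["A", "B", "C"]

# Row totals: amount invested in each category across all investors.
row_totals = {cat: sum(amounts.values()) for cat, amounts in investments.items()}

# Column totals: amount invested by each investor across all categories.
col_totals = {inv: sum(amounts[inv] for amounts in investments.values()) for inv in investors}

grand_total = sum(row_totals.values())

for cat, total in row_totals.items():
    print(f"{cat:<8} {total:6.1f}")
print("Investor totals:", col_totals)
print("Grand total:", grand_total)
```

Both the row totals and the column totals add up to the same grand total of 324.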
Descriptive Statistics: Describing Data Numerically

◮ Central tendency: arithmetic mean, median, mode.
◮ Variation: range, interquartile range, variance, standard deviation, coefficient of variation.
Measures of Central Tendency: Overview

◮ Mean (arithmetic average): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
◮ Median: the midpoint of the ranked values.
◮ Mode: the most frequently observed value.

Median position $= \frac{n+1}{2}$ in the ordered data:
◮ If the number of values is odd, the median is the middle number.
◮ If the number of values is even, the median is the average of the two middle numbers.
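The three definitions can be written out directly. A minimal Python sketch (the function names are hypothetical, chosen for illustration) that locates the median through the $(n+1)/2$ position rule and returns every mode, since a data set may have more than one:

```python
from collections import Counter

def mean(xs):
    """Arithmetic average: sum of the values divided by n."""
    return sum(xs) / len(xs)

def median(xs):
    """Value at position (n + 1) / 2 of the ranked data."""
    x = sorted(xs)
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]       # odd n: the middle value (1-based position (n+1)/2)
    mid = n // 2                          # even n: positions n/2 and n/2 + 1
    return (x[mid - 1] + x[mid]) / 2      # average of the two middle values

def modes(xs):
    """Most frequently observed value(s); a data set may have several modes."""
    counts = Counter(xs)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(mean([1, 2, 3, 6]))      # 3.0
print(median([3, 1, 2]))       # 2
print(median([4, 1, 3, 2]))    # 2.5
print(modes([1, 2, 2, 3]))     # [2]
```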
Measures of Central Tendency: Example

House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (sum: $3,000,000)
◮ Mean: $3,000,000/5 = $600,000
◮ Median: middle value of the ranked data = $300,000
◮ Mode: most frequent value = $100,000
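The house-price example can be reproduced with Python's standard `statistics` module; this is a minimal sketch, with prices entered as plain integers in dollars:

```python
import statistics

# House prices from the example, in dollars.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # 3,000,000 / 5
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value

print(mean_price, median_price, mode_price)  # 600000 300000 100000
```

The single $2,000,000 house pulls the mean to twice the median, which is why the median is often preferred for skewed data such as prices.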
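Measures of Central Tendency: Shape of a Distribution

[Figure: three distributions. Left-skewed: Mean < Median. Symmetric: Mean = Median. Right-skewed: Median < Mean.]

Shape describes how data are distributed. Measures of shape:
◮ Symmetric or skewed.
◮ Left-skewed = negatively skewed (the mass of the distribution is concentrated on the right of the figure, with a long left tail); right-skewed = positively skewed (the mass of the distribution is concentrated on the left of the figure, with a long right tail).

The skewness coefficient is
$$SK = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{3/2}} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{s^3},$$
where, in this formula, $s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}$ (divisor $n$). SK < 0 for left-skewed data, SK = 0 for symmetric data and SK > 0 for right-skewed data.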
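A minimal Python sketch of the SK formula, using divisor $n$ in both the numerator and the denominator as written above. The three small data sets are hypothetical and chosen only to show the sign of SK; the function fails (division by zero) when all values are equal.

```python
def skewness(xs):
    """SK = m3 / m2**1.5, where m2 and m3 are the averages (divisor n) of the
    squared and cubed deviations from the mean. Undefined if all values are equal."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]     # mean = median = 3
right_skewed = [1, 1, 1, 10]    # long right tail: mean 3.25 > median 1
left_skewed = [1, 10, 10, 10]   # long left tail: mean 7.75 < median 10

for name, xs in [("symmetric", symmetric), ("right", right_skewed), ("left", left_skewed)]:
    print(name, round(skewness(xs), 4))
```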
Measures of Variability

Measures of variation: range, interquartile range, variance, standard deviation, coefficient of variation.

Measures of variation give information on the spread or variability of the data values.

[Figure: two distributions with the same center but different variation.]
Measures of Variability: Quartiles and IQR – I

Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% of the values in each segment).

A quartile is found by locating the value at the appropriate position in the ranked data (with n the number of observed values):
◮ First quartile: $Q_1$ is at position $0.25(n+1)$
◮ Second quartile: $Q_2$ is at position $0.50(n+1)$ (the median position)
◮ Third quartile: $Q_3$ is at position $0.75(n+1)$

Example: find the first quartile.
◮ Sample ranked data: 11 12 13 16 16 17 18 21 22 (n = 9)
◮ $Q_1$ is at position $0.25(9+1) = 2.5$ of the ranked data, so use the value halfway between the 2nd and 3rd values (12 and 13).
◮ So $Q_1 = 12.5$.
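The position rule can be sketched in Python as follows. The function name `quartile` is hypothetical, and the sketch assumes linear interpolation between neighbouring values when the position is fractional (which gives the "halfway" value for a position ending in .5):

```python
import math

def quartile(data, p):
    """Value at position p*(n+1) of the ranked data (p = 0.25, 0.50 or 0.75),
    interpolating linearly when the position is not a whole number."""
    x = sorted(data)
    n = len(x)
    pos = p * (n + 1)           # 1-based position in the ranked data
    k = math.floor(pos)         # whole part of the position
    frac = pos - k              # fractional part of the position
    if k < 1:                   # position falls before the first value
        return x[0]
    if k >= n:                  # position falls at or beyond the last value
        return x[-1]
    return x[k - 1] + frac * (x[k] - x[k - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = quartile(data, 0.25)   # position 2.5
q2 = quartile(data, 0.50)   # position 5 (the median)
q3 = quartile(data, 0.75)   # position 7.5
print(q1, q2, q3)           # 12.5 16.0 19.5
```

For comparison, Python's `statistics.quantiles(data, n=4)` with its default `method='exclusive'` uses the same $(n+1)$-based positions; other software may follow different quartile conventions.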
Measures of Variability: Quartiles and IQR – II

The interquartile range is much less affected by outliers than the range:
◮ It ignores the high- and low-valued observations and measures the range of the middle 50% of the data.
◮ Interquartile range = 3rd quartile − 1st quartile: $IQR = Q_3 - Q_1$

Example: sample ranked data: 12 30 45 57 70 (n = 5)
◮ Using the positions $0.25(n+1) = 1.5$, $0.50(n+1) = 3$ and $0.75(n+1) = 4.5$: $Q_1 = 21$ (halfway between 12 and 30), $Q_2 = 45$, $Q_3 = 63.5$ (halfway between 57 and 70).
◮ $IQR = Q_3 - Q_1 = 63.5 - 21 = 42.5$.
◮ Other quartile conventions exist: taking the medians of the lower and upper halves of the data (each half including the median) gives $Q_1 = 30$, $Q_3 = 57$ and $IQR = 57 - 30 = 27$.
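A minimal Python sketch comparing two quartile conventions on the data 12 30 45 57 70. Using the $0.25(n+1)$ and $0.75(n+1)$ positions gives $Q_1 = 21$ and $Q_3 = 63.5$; taking the median of each half of the data (each half including the overall median) gives $Q_1 = 30$ and $Q_3 = 57$. The first convention relies on `statistics.quantiles` with its default `method='exclusive'`; the half-splitting code is an illustrative assumption, not a library function.

```python
import statistics

data = sorted([12, 30, 45, 57, 70])
n = len(data)   # number of observations (not the n=4 keyword below)

# Convention 1: quartiles at positions 0.25(n+1), 0.50(n+1), 0.75(n+1).
# statistics.quantiles with its default method='exclusive' uses these positions.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr_positions = q3 - q1

# Convention 2: medians of the lower and upper halves, each half including
# the overall median when n is odd.
half = (n + 1) // 2
lower, upper = data[:half], data[n - half:]
iqr_halves = statistics.median(upper) - statistics.median(lower)

print(q1, q2, q3, iqr_positions)   # 21.0 45.0 63.5 42.5
print(iqr_halves)                  # 27
```

With small samples the two conventions can differ noticeably, so it is worth stating which one is used when reporting an IQR.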
Measures of Variability: Variance

Population variance: the average of squared deviations of values from the mean:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N}$$
where
◮ $\mu$ = population mean
◮ $N$ = population size
◮ $X_i$ = $i$-th value of the variable $X$

Sample variance: (approximately, because the divisor is $n-1$ rather than $n$) the average of squared deviations of values from the sample mean:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$$
where
◮ $\bar{x}$ = sample mean (average)
◮ $n$ = sample size
◮ $x_i$ = $i$-th value of the variable $X$
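A minimal Python sketch of the two formulas. The function names are hypothetical; the same eight-value data set is passed to both only to show how the divisor ($N$ versus $n-1$) changes the result.

```python
def population_variance(xs):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

def sample_variance(xs):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

data = [10, 12, 14, 15, 17, 18, 18, 24]
print(population_variance(data))  # 130 / 8 = 16.25
print(sample_variance(data))      # 130 / 7, about 18.571
```

Python's `statistics.pvariance` and `statistics.variance` compute the same two quantities and can serve as a cross-check.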
Measures of Variability: Standard Deviation

Population standard deviation: the most commonly used measure of variation
◮ Shows variation about the mean
◮ Has the same units as the original data
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N}}$$

Sample standard deviation: the most commonly used measure of variation
◮ Shows variation about the sample mean
◮ Has the same units as the original data
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$$
Measures of Variability: Standard Deviation Example

Sample standard deviation computation.
Sample data ($x_i$): 10 12 14 15 17 18 18 24
$n = 8$ and the sample mean is $\bar{x} = 16$. The squared deviations are 36, 16, 4, 1, 1, 4, 4 and 64, which sum to 130, so the standard deviation is
$$s = \sqrt{\frac{(10-\bar{x})^2 + (12-\bar{x})^2 + (14-\bar{x})^2 + \cdots + (24-\bar{x})^2}{n-1}} = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} = 4.3095$$
This is a measure of the "average" scatter around the (sample) mean.
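The worked example can be reproduced step by step in Python; this minimal sketch keeps the intermediate quantities (mean, squared deviations, their sum) visible rather than calling a library routine:

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = sum(data) / n                               # 128 / 8 = 16.0
squared_devs = [(x - x_bar) ** 2 for x in data]     # 36, 16, 4, 1, 1, 4, 4, 64
ss = sum(squared_devs)                              # sum of squared deviations = 130.0
s = math.sqrt(ss / (n - 1))                         # sqrt(130 / 7)

print(f"x_bar = {x_bar}, SS = {ss}, s = {s:.4f}")   # x_bar = 16.0, SS = 130.0, s = 4.3095
```

`statistics.stdev(data)` returns the same value of $s$.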
Measures of Variability: Comparing Standard Deviations

[Figure: dot plots of three data sets on a scale from 11 to 21, all with mean 15.5.]
◮ Data A: mean = 15.5, s = 3.338
◮ Data B: mean = 15.5, s = 0.926 (small standard deviation)
◮ Data C: mean = 15.5, s = 4.570 (large standard deviation)

Same mean, different standard deviations: the smaller the standard deviation, the more concentrated the values are around the mean.
Measures of Variability: Coefficient of Variation

◮ Measures relative variation and is always expressed as a percentage (%)
◮ Shows variation relative to the mean
◮ Can be used to compare two or more sets of data measured in different units
$$CV = \left(\frac{s}{\bar{x}}\right)\cdot 100\%$$

Stock A: average price last year = $50; standard deviation = $5
$$CV_A = \left(\frac{\$5}{\$50}\right)\cdot 100\% = 10\%$$
Stock B: average price last year = $100; standard deviation = $5
$$CV_B = \left(\frac{\$5}{\$100}\right)\cdot 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
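A minimal Python sketch of the CV formula applied to the two stocks; the function name is hypothetical, and the sketch assumes a positive mean (the ratio is not meaningful otherwise):

```python
def coefficient_of_variation(s, mean):
    """CV in percent: the standard deviation relative to the mean, times 100."""
    return 100 * s / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A: s = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = $5, mean = $100
print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")  # CV_A = 10.0%, CV_B = 5.0%
```

Because the dollar units cancel in $s/\bar{x}$, the same function can compare data sets measured in different units.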
Descriptive Statistics: The Empirical Rule

If the data distribution is bell-shaped, then the interval
◮ $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample;
◮ $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample;
◮ $\mu \pm 3\sigma$ contains almost all (about 99.7%) of the values in the population or the sample.

[Figure: bell-shaped curve marking the intervals μ ± 1σ (68%), μ ± 2σ (95%) and μ ± 3σ (99.7%).]
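The 68/95/99.7 percentages are the shares of a normal distribution within 1, 2 and 3 standard deviations of its mean, which equal $\operatorname{erf}(k/\sqrt{2})$. A minimal Python sketch that computes these exactly and compares them with simulated bell-shaped data; the values $\mu = 100$, $\sigma = 15$ and the sample size are hypothetical choices made only for illustration:

```python
import math
import random

def normal_coverage(k):
    """Exact share of a normal distribution within mu ± k*sigma: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

def empirical_coverage(xs, mu, sigma, k):
    """Share of the observations xs lying within mu ± k*sigma."""
    return sum(1 for x in xs if abs(x - mu) <= k * sigma) / len(xs)

# Hypothetical bell-shaped data: 100,000 draws from a normal distribution.
mu, sigma = 100.0, 15.0
rng = random.Random(0)
sample = [rng.gauss(mu, sigma) for _ in range(100_000)]

for k in (1, 2, 3):
    print(f"mu ± {k} sigma: normal {normal_coverage(k):.4f}, "
          f"simulated {empirical_coverage(sample, mu, sigma, k):.4f}")
```

For data that are clearly not bell-shaped, the rule can be far off, so it is worth checking the shape of the distribution first.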