Create Distributions Empirically using Excel V0E 10/11/2014 0E 2014 Schield Creating Distributions Empirically 1 0E 2014 Schield Creating Distributions Empirically 2 Creating Distributions Generating Distributions Empirically In introductory statistics, students are shown how the Milo Schield means of random samples form a sampling distribution Augsburg College that -- in the limit -- forms a Normal distribution. Editor of www.StatLit.org This is extremely useful in sampling. But the Normal US Rep: International Statistical Literacy Project distribution applies in situations that do not involve sampling – as do the Log-Normal and Exponential. Fall 2014 NNN Conference Certain kinds of random activity can be shown to generate these well-known analytic distributions. www.StatLit.org/pdf/ 2014-Schield-Create-Distributions-Empirically-Excel-6up.pdf Understanding how this happens can be very useful to managers making data-based decisions. 0E 2014 Schield Creating Distributions Empirically 3 0E 2014 Schield Creating Distributions Empirically 4 Generating Distributions: Generating Distributions Empirically Normal, Log-Normal & Exponential Mathematicians have proven: Excel is used to generate these distributions empirically. 1. The sum of an infinite number of independent random The Normal and Log-Normal are the empirical results of variables generates a Normal distribution 10,000 independent lines of activity. Each line receives: 2. The product of an infinite # of random, independent, • An add by a random amount (discrete or continuous) positive variables generates a Log-Normal distribution generates a Normal distribution. 3. A random process of assigning counts to cells in a • a multiply by a positive random change (discrete or table can generate a chi-square distribution. continuous) generates a Log-Normal distribution. 4. A process having a constant chance of ‘death’ The Chi-square is the empirical result of randomly assign generates an exponential distribution. values to cells in a table (keeping the expected values Amazing! At the micro level, pure randomness. At the equal) and then compute chi-square for each series. macro level, an analytic distribution emerges without any The Exponential is the empirical result of 2,000 lines of agency or intent. Micro randomness generates macro- activity; each line has a fixed chance of death per period. order: spontaneous and unplanned. 0E 2014 Schield Creating Distributions Empirically 5 0E 2014 Schield Creating Distributions Empirically 6 1a Generating a Normal Dist. Test for Normality #1 Sum of Random Normals QQ-Plot: Sorted PDF vs Percentile Period1 = Period0 + K*NORM.S.INV(RAND()) . Sum of Random Normals (50 periods) Sum of Random Normals Sorted PDF vs Percentile 2000 130 1500 Frequency CV = ‐ 0.00278 120 1000 110 500 Values 100 Correlation: 0 91 93 95 97 99 101 103 105 107 109 90 0.97785 Score 10,000 lines Period0=100. K=1 80 Period 1 4 9 16 25 36 49 70 Median 100.02 100.03 100.01 100.04 100.01 100.03 100.04 StdDev 1.00 2.00 3.02 3.16 3.31 3.45 3.60 0 10 20 30 40 50 60 70 80 90 100 PredictSD 1 2 3 4 5 6 7 Percentile SD(T) = K*SD(1)*Sqrt(T) SD(1) = 1. 10,000 Series Start: 100 K = 1 2014-Schield-Create-Distributions-Empirically-Excel-6up.pdf 1
Create Distributions Empirically using Excel V0E 10/11/2014 0E 2014 Schield Creating Distributions Empirically 7 0E 2014 Schield Creating Distributions Empirically 8 Test for Normality #2 1b Generating a Normal Dist. QQ-Plot: Sorted PDF vs. Z-Score Sum of Random Discretes Period1 = Period0 + K*RandBetween(-1,+1) . Sum of Random Normals (50 periods) Sum of RandBetween [ ‐ 1,+1] Sorted PDF vs Z ‐ score StdDev=Sqrt((3^2 ‐ 1)/12) 2500 125 Correlation: 2000 Frequency 0.99997 CV = ‐ 0.00278 1500 115 1000 105 Values 500 0 95 Z ‐ score = 91 93 95 97 99 101 103 105 107 109 Score NORM.S.INV(Percentile/100) 10,000 lines Period0=100. K=1 85 Period 1 4 9 16 25 36 49 Median 100.00 100.00 100.00 100.00 100.00 100.00 100.00 75 StdDev 0.815 1.64 2.45 3.27 4.10 4.90 5.72 PredictSD 0.815 1.63 2.45 3.26 4.08 4.89 5.71 ‐ 3.5 ‐ 2.5 ‐ 1.5 ‐ 0.5 0.5 1.5 2.5 3.5 SD(T) = K*SD(1)*Sqrt(T) SD(1) = Sqrt(8/12) Z ‐ Score 10,000 Series Start: 100 K = 1 0E 2014 Schield Creating Distributions Empirically 9 0E 2014 Schield Creating Distributions Empirically 10 1c Generating a Normal Dist. 2a Generating a Log-Normal Dist. Sum of Random Uniforms Product: Random Normal Growth Period1 = Period0 + K*[2*Rand()-1] P1 = P0*{1+K*[1+NORMSINV(RAND()]} Sum of Rand() Products of Random Uniform Growth StdDev(Period1) = 2*K / Sqrt(12)) Distribution of Results after 5 and 50 Periods: 3500 3000 1400 Frequency 2500 5% growth per period 2000 1200 1500 1000 1000 500 Mean/Median = 1.06 Mean/Median = 1.00 0 800 5 periods 91 93 95 97 99 101 103 105 107 109 50 periods Score 10,000 lines Period0=100. K=1 600 Period 1 4 9 16 25 36 49 400 Median 100.00 99.99 100.00 100.00 100.05 100.03 100.08 StdDev 0.576 1.17 1.75 2.32 2.87 3.43 3.98 200 Results Scaled: 20 bins PredictSD 0.577 1.15 1.73 2.31 2.89 3.46 4.04 0 = Min; 50 = Mean. 0 SD(T) = K*SD(1)*Sqrt(T). SD(1) = 2/Sqrt(12) 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0E 2014 Schield Creating Distributions Empirically 11 0E 2014 Schield Creating Distributions Empirically 12 2a Generating a Log-Normal Dist. 2b Generating a Log-Normal Dist. Product: Random Normal Growth Product: Random Uniform Growth P1 = P0*{1+K*[1+NORMSINV(RAND()]} Period1 = Period0 * (1 + K*2*Rand() ) Products of Random Uniform Growth Products of Random Normal Growth Distribution of Results after 5 and 50 Periods: Distribution of Results after 5 and 50 Periods: 1200 1400 Mean/Median = 1.54 30% growth per period 15% growth per period 1200 K = 0.15 1000 Mean/Median: 1000 1.15 800 50 periods Mean/Median = 1.01 50 periods Mean/Median = 1.04 800 5 periods 5 periods 600 600 400 400 200 200 Results Scaled: 20 bins Results Scaled: 20 bins 0 = Min; 50 = Mean. 0 = Min; 50 = Mean. 0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 2014-Schield-Create-Distributions-Empirically-Excel-6up.pdf 2
Create Distributions Empirically using Excel V0E 10/11/2014 0E 2014 Schield Creating Distributions Empirically 13 0E 2014 Schield Creating Distributions Empirically 14 3 Generating Chi-Square Distrib.: 4 Generating an Exponential Dist. Random Assignment to Table Cells Random “Death” . Chance of Death: =IF(RAND() < K, 1, "" ) Random Decay K = 0.1: Chance of 'Death' per Period 2,000 1,500 # Remaining 1,000 500 0 0 10 20 30 40 50 Period 2,000 Series 100 Periods 0E 2014 Schield Creating Distributions Empirically 15 0E 2014 Schield Creating Distributions Empirically 16 4 Generating an Exponential Dist. Conclusion Random “Death” Exponential Test: Is LN(#Remain) vs. time linear? Knowing the process that generates a distribution is helpful to anyone who wants to understand why a Random Decay given distribution is a good fit in a particular situation. K = 0.1: Chance of 'Death' per Period It also helps us understand what determines the 8 Correlation: ‐ 0.998 7 parameters of a distribution (or changes therein). LN(# Remaining) between Period and 6 LN(#Remain). Why have these increased over time? 5 • the standard deviation of heights? Expected Mean: 10 = 1/K 4 Actual Mean: 9.78 3 • the mean/median ratio (the skewness) for incomes? 2 • the economic return from higher education? 0 10 20 30 40 50 • the average lifetime of humans? Period 2,000 Series 100 Periods 0E 2014 Schield Creating Distributions Empirically 17 0E 2014 Schield Creating Distributions Empirically 18 Using Excel to Bibliography Create Distributions Empirically All files are in www.StatLit.org/Excel/ Aitchison and Brown (1957, 1963). The Log Normal Distribution with Special References to its uses in Economics . Cambridge U. Generating Normal Distributions with Sums of Random Variables: * 2014-Schield-Create-Normal-Empirically-Excel-Sum-Discrete.xlsx Crow, E. and Shimizu, K. (1988). Lognormal Distributions: Theory * 2014-Schield-Create-Normal-Empirically-Excel-Sum-Uniform.xlsx and Applications . New York: Marcel Dekker, Inc., 1988. * 2014-Schield-Create-Normal-Empirically-Excel-Sum-Normal.xlsx International Futures (2014) Using Lognormal Income Distributions Creating Log-Normals with Product of Uniform Random Variables: www.du.edu/ifs/help/understand/economy/poverty/lognormal.html * 2014-Schield-Create-LogNormal-Empirically-Excel-Product- Schield, M. and T. Burnham (2008). Von Mises’ Frequentist Uniforms.xlsx Approach to Probability, 2008 ASA Proceedings of the Section Creating Chi-Square with Random Assignment to Table Cells: on Statistical Education. [CD-ROM] P. 2187-2194. * 2014-Schield-Create-ChiSquare-Empirically-Excel-DF3.xlsx www.StatLit.org/pdf/2008SchieldBurnhamASA.pdf. Generating Exponentials with Fixed Chance of ‘Death” per period Wikipedia/Log-normal distribution, Uniform distribution * 2014-Schield-Create-Exponential-Empirically-Excel.xlsx (continuous) and Uniform distribution (discrete) 2014-Schield-Create-Distributions-Empirically-Excel-6up.pdf 3
0E 2014 Schield Creating Distributions Empirically 1 Creating Distributions Empirically Milo Schield Augsburg College Editor of www.StatLit.org US Rep: International Statistical Literacy Project Fall 2014 NNN Conference www.StatLit.org/pdf/ 2014-Schield-NNN3-Slides.pdf
Recommend
More recommend