Aims of these Lectures
• Discuss some basic statistical concepts and techniques.
• Relate these to Economics.
• Help you to help yourself.
• These lectures are not, and are not intended to be, a substitute for the Applied Statistics for Economics and Business course.
• A useful website: http://www.maths.murdoch.edu.au/units/statsnotes/
• A useful statistics package: MINITAB (Start > Networked Applications > General Software > Statistics and Graphing > Minitab 14)
Two Big Issues
• Why are Some Countries Richer than Others?
• An old issue in Economics: Adam Smith, ‘Wealth of Nations’, 1776.
• Why do Some Countries Grow Faster than Others?
• Countries are richer now because they grew faster in the past, e.g. compare 2000 with 1500.
Neoclassical Growth Theory
• A higher saving rate (s) implies larger output per capita and a higher real wage rate in the long run.
• A higher population growth rate (n) implies lower output per capita and a lower real wage rate in the long run.
• Countries which are far below their long-run equilibrium (steady state) will grow faster than countries which are close to it.
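The three claims above can be checked numerically in a minimal sketch. It assumes a Cobb-Douglas production function with capital share alpha and a depreciation rate delta; neither assumption, nor the parameter values, appears in the slides, so treat them as purely illustrative.

```python
def steady_state_output(s, n, alpha=1/3, delta=0.05):
    """Steady-state output per capita under the assumed Cobb-Douglas technology.

    alpha and delta are illustrative assumptions, not values from the lecture.
    """
    k_star = (s / (n + delta)) ** (1 / (1 - alpha))  # steady-state capital per capita
    return k_star ** alpha


def capital_growth_rate(k, s, n, alpha=1/3, delta=0.05):
    """Growth rate of capital per capita at a given level k."""
    return s * k ** (alpha - 1) - (n + delta)


# Higher s raises long-run output per capita; higher n lowers it.
y_low_s = steady_state_output(s=0.2, n=0.02)
y_high_s = steady_state_output(s=0.3, n=0.02)
y_high_n = steady_state_output(s=0.2, n=0.04)

# A country far below its steady state grows faster than one close to it.
k_star = (0.2 / (0.02 + 0.05)) ** 1.5
g_far = capital_growth_rate(0.1 * k_star, s=0.2, n=0.02)
g_near = capital_growth_rate(0.9 * k_star, s=0.2, n=0.02)
```

Under these assumptions `y_high_s` exceeds `y_low_s`, `y_high_n` falls below it, and `g_far` exceeds `g_near`, which is the convergence prediction examined later in the lecture.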
The Penn World Tables
• Panel data set of macroeconomic variables
• 208 countries
• 25 macro variables
• 1950 to 2000
• Many missing values
Accessing the PWT
• Available at http://www.pwt.econ.upenn.edu/
• Select countries/years/variables.
• Choose the CSV option.
• Copy and paste the data into Notepad.
• Save as, e.g., “mydata.csv” (keep the quotation marks).
• Open in Excel.
• (Alternatively: use Word and save as a text file.)
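As an alternative to Excel, a saved CSV file can be read with a few lines of Python. This is a sketch only: the column names (`country`, `year`, `rgdp`) and the treatment of blank or `NA` cells as missing are assumptions, so check them against the header of your own download.

```python
import csv
import io


def read_column(csv_file, column):
    """Return the numeric values of one column, skipping missing cells."""
    values = []
    for row in csv.DictReader(csv_file):
        cell = (row.get(column) or "").strip()
        if cell in ("", "na", "NA"):  # assumed markers for missing values
            continue
        values.append(float(cell))
    return values


# Hypothetical three-row extract standing in for a real download.
demo = io.StringIO("country,year,rgdp\nA,1960,1076\nB,1960,\nC,1960,2305\n")
rgdp_1960 = read_column(demo, "rgdp")
```

With a real file, replace the `io.StringIO` object with `open("mydata.csv", newline="")`.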
Describing Data
• Tabulate, List
• Numerical Summary
• Graphical Summary
Frequency Table
• Select a suitable set of class intervals bounded by class limits.
• The class frequency is the number of data points in each interval.
• The class mark is the midpoint of the class interval.
• Class boundaries may differ from class limits (as a result of rounding).
• The class size is the difference between the upper and lower class boundaries.
Frequency Distributions
Suppose we have a sample of n observations.
The (absolute) frequency of any value is the number of times that value appears in the sample.
The relative frequency of a value is the proportion of the sample which has that value.
The empirical frequency distribution of a random variable is the sample analogue of its probability distribution. It can be graphed by constructing a histogram.
World Distribution of Real GDP per capita, 1960 and 2000
Class          rgdp1960  rgdp2000
0-999                23        15
1000-1999            27        13
2000-2999            21         7
3000-3999            13        10
4000-4999             5         8
5000-5999             3         4
6000-6999             1         4
7000-7999             6         2
8000-8999             2         2
9000-9999             2         3
10000-10999           4         2
11000-14999           3         3
15000-19999           0         6
20000-24999           0        10
25000+                0         9
Note: Class boundaries are, e.g., 2999.5-3999.5.
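A table of this kind can be built by counting how many observations fall in each class. The sketch below is one way to do it; the function name, the sample values and the class limits are all made up for illustration and are not the Penn World Tables data.

```python
from bisect import bisect_right


def frequency_table(values, lower_limits):
    """Return (lower limit, frequency, relative frequency) for each class.

    Class k runs from lower_limits[k] up to, but not including, the next
    lower limit; the last class is open-ended.
    """
    counts = [0] * len(lower_limits)
    for v in values:
        k = bisect_right(lower_limits, v) - 1  # index of the class containing v
        if k < 0:
            raise ValueError(f"{v} lies below the first class limit")
        counts[k] += 1
    n = len(values)
    return [(lo, c, c / n) for lo, c in zip(lower_limits, counts)]


# Illustrative sample, not real data.
sample = [383, 950, 1200, 1750, 2305, 2900, 3970, 5200, 14877]
table = frequency_table(sample, [0, 1000, 2000, 3000, 4000, 5000])
```

The relative frequencies in the third position of each tuple sum to one, which is what makes the empirical distribution comparable to a probability distribution.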
The Median
• The value that is smaller than 50% of the sample and larger than the other 50%.
• Order the sample from smallest to largest; the median lies halfway up the order.
• Let n be the sample size:
  • if n is odd, the median is at observation (n+1)/2;
  • if n is even, average the two values at observations n/2 and (n/2)+1.
• A useful property: the median is insensitive to (changes in) extreme sample values.
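The odd/even rule above translates directly into code. This is a minimal sketch; the function name and the sample values are illustrative.

```python
def median(sample):
    """Median by the ordering rule: middle value, or mean of the two middle values."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # observation (n+1)/2 when counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2  # observations n/2 and (n/2)+1


odd_case = median([5, 1, 3])
even_case = median([4, 1, 3, 2])
extreme_case = median([1, 2, 3, 1000])  # replacing 4 by 1000 leaves the median unchanged
```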
Quantiles
• The First Quartile, Q1, is larger than 25% of the sample values and smaller than 75%.
• The Third Quartile, Q3, is larger than 75% of the sample values and smaller than 25%.
• The Second Quartile, Q2, is the Median.
• The Interquartile Range, Q3 - Q1, is a robust measure of the variability of the sample data.
• Other frequently used quantiles are deciles and percentiles.
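Quartiles can be computed with Python's standard `statistics.quantiles` function. Several interpolation conventions are in use, and statistics packages do not all agree on small samples; the `"inclusive"` method below is one choice made for this sketch, not necessarily the one Minitab uses.

```python
from statistics import quantiles

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1
```

Passing `n=10` or `n=100` instead returns deciles or percentiles.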
The Mean
• Defined as
  $$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
• The ‘centre of gravity’ of the distribution.
• Sensitive to extreme values.
• Gives each sample value the same ‘weight’, 1/n.
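A small made-up example shows how differently the mean and the median react to a single extreme value. The numbers are illustrative only.

```python
from statistics import median


def mean(sample):
    """Arithmetic mean: every value gets the same weight 1/n."""
    return sum(sample) / len(sample)


base = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 500]  # only the largest value has changed

mean_base, mean_extreme = mean(base), mean(with_extreme)
median_base, median_extreme = median(base), median(with_extreme)
```

The mean moves from 3 to 102 while the median stays at 3.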
Comparisons
           RGDP1960  RGDP2000
Mean           3332      9088
Q1             1076      1669
Median         2305      4361
Q3             3970     15900
IQR            2893     14231
Minimum         383       482
Maximum       14877     44009
Graphical Methods
• Stem and Leaf Plots
• Box Plots
• Bar Charts
• Histograms
Stem and Leaf Plots
Given a set of numbers:
• The leaf is the last digit considered.
• The leaf unit specifies which digit.
• The stem is the rest of the number.
• The first column is the cumulative count of observations, accumulated from the nearer end of the ordered data.
• For the stem where the median occurs, the first column instead shows that stem's own count, enclosed in parentheses.
Stem and Leaf for World GDP, 1960
N = 111, Leaf Unit = 100
  23    0  34445556677778889999999
  50    1  000001111123334455666778899
 (21)   2  011223333344566667799
  40    3  0000122234489
  27    4  12669
  22    5  238
  19    6  8
  18    7  334778
  12    8  12
  10    9  26
   8   10  1469
   4   11  55
   2   12  4
   1   13
   1   14  8
i.e. the lowest rgdp is in the range 300-400 and the highest is in the range 14800-14900.
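The stem and leaf columns of such a display can be produced with a short routine. The sketch below omits the cumulative count column, assumes non-negative values, and truncates (rather than rounds) digits below the leaf unit; the function name and sample values are illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit):
    """Return one text line per stem, in the form 'stem | leaves'."""
    rows = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // leaf_unit)     # drop digits below the leaf unit
        stem, leaf = divmod(scaled, 10)  # last remaining digit is the leaf
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep empty stems too
        leaves = "".join(str(d) for d in rows.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


# Illustrative values only.
display = stem_and_leaf([383, 482, 1076, 1150, 2305, 14877], leaf_unit=100)
```

With a leaf unit of 100, the value 383 appears as stem 0, leaf 3, and 14877 as stem 14, leaf 8.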
Departures from Symmetry
Skewness: a measure of the asymmetry of a distribution.
Skewness is zero for a symmetric distribution.
Positive skewness: long tail to the right, mean greater than median.
Negative skewness: long tail to the left, mean less than median.
Kurtosis: a measure of the thickness of the tails.
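The slides do not give a formula for skewness. One common definition, assumed here, is the third central moment divided by the second central moment raised to the power 3/2. The sketch applies it to two made-up samples.

```python
from statistics import mean, median


def skewness(sample):
    """Moment-based skewness (an assumed definition; others exist)."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in sample) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 50]  # one large value creates a long right tail

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
mean_exceeds_median = mean(right_tailed) > median(right_tailed)
```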
Box Plots
• Indicate symmetry and variability of the sample values.
• Measuring along the horizontal or vertical axis, draw a box with edges at Q1 and Q3, so its length is the IQR.
• The width is up to you.
• Draw a line across the box at the median value.
• Draw lines ('whiskers') from the box to the sample maximum and sample minimum values (excluding outliers).
• Observations lying more than 1.5*IQR from the edges of the box are ‘outliers’ and are represented by asterisks.
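The numbers needed to draw a box plot by these rules can be computed as follows. This is a sketch: the quartile convention (`method="inclusive"`) is an assumption and may differ from the one your statistics package uses, and the sample is made up.

```python
from statistics import quantiles


def box_plot_summary(sample):
    """Box edges, median, whisker ends and outliers under the 1.5*IQR rule."""
    q1, med, q3 = quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in sample if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),   # smallest value that is not an outlier
        "whisker_high": max(inside),  # largest value that is not an outlier
        "outliers": sorted(x for x in sample if x < low_fence or x > high_fence),
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
```

Here the value 40 lies more than 1.5*IQR above Q3, so it is flagged as an outlier and the upper whisker stops at 9.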
Boxplot of World Income Distribution, 1960
[Figure: box plot of rgdp1960; axis from 0 to 16000]
The Evolution of World Income Distribution, 1960-2000
[Figure: rgdp1960, rgdp1970, rgdp1980, rgdp1990 and rgdp2000 compared; axis from 0 to 50000]
Bar Chart of World GDP per capita, 1960
[Figure: bar chart of class frequencies by GDP per capita class; frequency axis from 0 to 30]
Bar Chart of World GDP per capita, 2000
[Figure: bar chart of class frequencies by GDP per capita class; frequency axis from 0 to 16]
Histograms
Given a sample of size n,
1. Select a number of classes - ‘bins’ - of equal width. Each sample value falls into one of the classes.
2. Calculate the number of values in each class - the class frequency.
3. Construct a bar graph where (a) the base of each bar is the class width and (b) the height is the frequency for that class or the relative frequency for the class.
4. A rule for bin width:
   $$h = \frac{2 \, \mathrm{IQR}}{n^{1/3}}$$
Note: It is sometimes useful to have unequal bin widths.
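As a worked instance of the bin-width rule, the sketch below plugs in the 1960 summary figures quoted elsewhere in these slides (IQR = 2893, N = 111, minimum 383, maximum 14877). Rounding the number of bins up so that the whole range is covered is a choice made here, not part of the rule.

```python
import math


def bin_width(iqr, n):
    """Bin width h = 2 * IQR / n^(1/3)."""
    return 2 * iqr / n ** (1 / 3)


h_1960 = bin_width(iqr=2893, n=111)

# Number of equal-width bins needed to cover the range from minimum to maximum.
bins_1960 = math.ceil((14877 - 383) / h_1960)
```

The rule suggests a bin width of roughly 1200 for the 1960 data.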
Histogram of World GDP per capita, 1960
[Figure: histogram of rgdp1960; horizontal axis from 0 to 14000, Frequency axis from 0 to 40]
Histogram of World GDP per capita, 2000
[Figure: histogram of rgdp2000; horizontal axis from 0 to 40000, Frequency axis from 0 to 35]
Note: Badly chosen class intervals.
Convergence: Do Poorer Countries Grow Faster?
Scatterplot of Growth, 1960-2000, vs Initial RGDP
[Figure: scatterplot of avgrowth (axis from -0.02 to 0.06) against rgdp1960 (axis from 0 to 16000)]
A Linear Relationship between Two Variables
$$Y = a + bX$$
• Choose Y as the dependent variable and X as the independent variable.
• Which a and b best represent the data?
Fitting a Line to Data
• We could join any two points, but the resulting line may be a long way from the other points.
• Any line drawn through the data generates a set of residuals, some positive, some negative.
• The distance of a point from the line can be measured by the squared residual.
• The Least Squares criterion: ‘minimise the sum of the squared residuals’.
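The criterion can be illustrated by computing the sum of squared residuals for two candidate lines through a tiny made-up data set: a line joining the first two points, and the line Y = 1 + 0.5X, which is the least squares line for these three points. All names and numbers are illustrative.

```python
def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3]
ys = [1, 3, 2]

# Line through the first two points: slope 2, intercept -1.
ssr_two_points = sum_squared_residuals(-1, 2, xs, ys)

# Least squares line for these three points: intercept 1, slope 0.5.
ssr_least_squares = sum_squared_residuals(1, 0.5, xs, ys)
```

The line through two points fits them exactly but misses the third badly, giving a sum of 9; the least squares line gives 1.5.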
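The ‘Least Squares’ Coefficients
$$b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$
$$a = \bar{Y} - b \bar{X}$$
where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.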
A Regression Worksheet
Here x and y are the deviations of X and Y from their means.
Obs        X      Y       x      y     xy      xx
1      15.75   0.94   10.25   0.66   6.77  104.97
2       3.49   0.08   -2.01  -0.21   0.42    4.05
3       4.74   0.13   -0.76  -0.15   0.11    0.58
4       5.49   0.36   -0.02   0.07   0.00    0.00
5       5.41   0.47   -0.09   0.18  -0.02    0.01
6       2.07   0.10   -3.44  -0.19   0.65   11.80
7       7.69   0.41    2.18   0.13   0.27    4.76
8       2.48   0.11   -3.02  -0.18   0.53    9.12
9       5.44   0.13   -0.07  -0.15   0.01    0.00
10      2.48   0.11   -3.02  -0.17   0.52    9.13
Sums   55.05   2.84    0.00   0.00   9.26  144.42
Means   5.50   0.28
a = -0.07, b = 0.06
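The worksheet calculation can be reproduced in a few lines. The sketch below uses the ten (X, Y) pairs as printed, which are rounded to two decimals, so the intermediate sums differ slightly from the worksheet's; the coefficients should still land close to the reported a = -0.07 and b = 0.06.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line Y = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviations from the means, as in the x and y columns of the worksheet.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_xx
    a = y_bar - b * x_bar
    return a, b


X = [15.75, 3.49, 4.74, 5.49, 5.41, 2.07, 7.69, 2.48, 5.44, 2.48]
Y = [0.94, 0.08, 0.13, 0.36, 0.47, 0.10, 0.41, 0.11, 0.13, 0.11]

a, b = least_squares(X, Y)
```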
Average Growth vs RGDP1960: 30 Richest Countries
[Figure: scatterplot of avgrowth (axis from 0.00 to 0.04) against rgdp1960 (axis from 5000 to 15000)]
Average Growth vs RGDP1960: 30 Poorest Countries
[Figure: scatterplot of avgrowth (axis from -0.01 to 0.05) against rgdp1960 (axis from 300 to 1200)]