Arthur CHARPENTIER - Welfare, Inequality and Poverty

Welfare, Inequality & Poverty, # 2
Arthur Charpentier
charpentier.arthur@gmail.com
http://freakonometrics.hypotheses.org/
Université de Rennes 1, January 2015

Modeling Income Distribution

Let $\{x_1, \cdots, x_n\}$ denote some sample. Then the sample mean is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i = \sum_{i=1}^n \frac{1}{n} x_i$$

This can be used when we have census data.

    # load the 5-observation income sample and sort it in increasing order
    load(url("http://freakonometrics.free.fr/income_5.RData"))
    income <- sort(income)
    plot(1:5, income)

[Figure: the five sorted incomes plotted against their rank; axis: income]

It is also possible to use survey data. If $\pi_i$ denotes the probability that individual $i$ is drawn, use weights

$$\omega_i \propto \frac{1}{n \pi_i}$$

The weighted average is then

$$\bar{x}_\omega = \sum_{i=1}^n \frac{\omega_i}{\omega} x_i \quad \text{where } \omega = \sum_{i=1}^n \omega_i.$$

This is an unbiased estimator of the population mean.

Sometimes, data are obtained from stratified samples: before sampling, members of the population are grouped into homogeneous subgroups (called strata). Given $S$ strata, such that the population in stratum $s$ is $N_s$ (with $N = \sum_s N_s$), then

$$\bar{x}_S = \sum_{s=1}^S \frac{N_s}{N} \bar{x}_s \quad \text{where } \bar{x}_s = \frac{1}{N_s} \sum_{i \in \mathcal{S}_s} x_i$$
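As a minimal sketch of the two estimators above, the following R snippet computes a weighted mean and a stratified mean. All numbers (incomes, inclusion probabilities, strata sizes and strata means) are made up for illustration; none of them come from the course data.

```r
# Hypothetical survey: incomes and inclusion probabilities (made-up values)
x    <- c(12000, 18000, 25000, 40000, 90000)
pi_i <- c(0.10, 0.10, 0.05, 0.05, 0.01)

# Weights proportional to 1 / (n * pi_i); the constant cancels in the ratio
n <- length(x)
w <- 1 / (n * pi_i)
x_bar_w <- sum(w / sum(w) * x)

# Base R gives the same value
x_bar_check <- weighted.mean(x, w)

# Stratified mean: hypothetical strata sizes N_s and within-stratum means
N_s     <- c(urban = 6000, rural = 4000)
x_bar_s <- c(urban = 52000, rural = 31000)
x_bar_S <- sum(N_s / sum(N_s) * x_bar_s)
x_bar_S
```

Since the weights only enter through the ratio $\omega_i / \omega$, any proportionality constant in $\omega_i$ leaves the weighted mean unchanged.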

Statistical Tools Used to Describe the Distribution

Consider a sample $\{x_1, \cdots, x_n\}$. Usually, the order is not important. So let us order those values,

$$\underbrace{x_{1:n}}_{\min\{x_i\}} \leq x_{2:n} \leq \cdots \leq x_{n-1:n} \leq \underbrace{x_{n:n}}_{\max\{x_i\}}$$

As usual, assume that the $x_i$'s were randomly drawn from an (unknown) distribution $F$. If $F$ denotes the cumulative distribution function, $F(x) = P(X \leq x)$, one can prove that

$$F(x_{i:n}) = P(X \leq x_{i:n}) \sim \frac{i}{n}$$

The quantile function is defined as the inverse of the cumulative distribution function $F$,

$$Q(u) = F^{-1}(u) \quad \text{or} \quad F(Q(u)) = P(X \leq Q(u)) = u$$
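The approximation $F(x_{i:n}) \sim i/n$ can be checked by simulation. The sketch below assumes a standard exponential distribution for $F$ (an arbitrary choice, only so that $F$ is known) and also shows an empirical quantile obtained by inverting the step empirical distribution function.

```r
set.seed(42)
n <- 1000
x <- sort(rexp(n))               # order statistics; assumed F: standard exponential
u <- pexp(x)                     # F(x_{i:n}) under the assumed F
max_dev <- max(abs(u - (1:n) / n))
max_dev                          # small when n is large

# Empirical median as the inverse of the step ECDF (type = 1)
q_med <- quantile(x, 0.5, type = 1, names = FALSE)
```

With `type = 1`, `quantile()` returns an observed order statistic rather than an interpolated value.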

Lorenz curve

The empirical version of the Lorenz curve is

$$L\left(\frac{i}{n}\right) = \frac{1}{n \bar{x}} \sum_{j \leq i} x_{j:n}$$

    # income is already sorted in increasing order
    plot((0:5)/5, c(0, cumsum(income)/sum(income)))

[Figure: Lorenz curve of the five incomes; axes: p, L(p)]

Gini Coefficient

The Gini coefficient is defined as a ratio of areas, $A / (A + B)$, where $A$ is the area between the diagonal and the Lorenz curve and $B$ is the area below the Lorenz curve. It can be defined using order statistics as

$$G = \frac{2}{n(n-1)\bar{x}} \sum_{i=1}^n i \cdot x_{i:n} - \frac{n+1}{n-1}$$

    n <- length(income)
    mu <- mean(income)
    2*sum((1:n)*sort(income))/(mu*n*(n-1)) - (n+1)/(n-1)
    # [1] 0.5800019

[Figure: Lorenz curve with areas A (between the diagonal and the curve) and B (below the curve); axes: p, L(p)]
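The order-statistics formula can be wrapped in a small function. The name `gini` below is a hypothetical helper, not something defined on the slides; it is only a sketch of the formula above, checked on two extreme cases.

```r
# Hypothetical helper: Gini coefficient of a numeric vector,
# using the order-statistics formula given above
gini <- function(x) {
  x <- sort(x)                 # order statistics x_{1:n} <= ... <= x_{n:n}
  n <- length(x)
  2 * sum((1:n) * x) / (mean(x) * n * (n - 1)) - (n + 1) / (n - 1)
}

g_equal <- gini(rep(10, 5))          # everyone has the same income
g_conc  <- gini(c(0, 0, 0, 0, 100))  # one person holds everything
c(g_equal, g_conc)                   # 0 and 1
```

With the $n(n-1)$ normalisation used here, perfect equality gives 0 and full concentration on one individual gives exactly 1, even in small samples.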

Distribution Fitting

Assume that we now have more observations,

    load(url("http://freakonometrics.free.fr/income_500.RData"))

We can use a histogram to visualize the distribution of the income

    summary(income)
    #    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    #    2191   23830   42750   77010   87430 2003000
    sort(income)[495:500]
    # [1]  465354  489734  512231  539103  627292 2003241
    hist(income, breaks=seq(0, 2005000, by=5000))

[Figure: Histogram of income; axes: income, Frequency]

Distribution Fitting

Because of the dispersion, look at the histogram of the logarithm of the data

    hist(log(income, 10), breaks=seq(3, 6.5, length=51))
    boxplot(income, horizontal=TRUE, log="x")

[Figure: Histogram of log(income, 10), with a horizontal boxplot of income on a log scale; axes: log(income, 10), Frequency]

Distribution Fitting

The cumulative distribution function (with income on a log scale)

    u <- sort(income)
    v <- (1:500)/500
    plot(u, v, type="s", log="x")

[Figure: empirical cumulative distribution function; axes: Income (log scale), Cumulated Probabilities]

Distribution Fitting

If we invert that graph, we have the quantile function

    plot(v, u, type="s", col="red", log="y")

[Figure: empirical quantile function; axes: Probabilities, Income (log scale)]

Distribution Fitting

On that dataset, the Lorenz curve is

    # incomes must be in increasing order before cumulating them
    plot((0:500)/500, c(0, cumsum(sort(income))/sum(income)))

[Figure: Lorenz curve of the 500 incomes; axes: p, L(p)]

Distribution and Confidence Intervals

There are two techniques to get the distribution of an estimator $\hat{\theta}$,
– a parametric one, based on some assumptions on the underlying distribution,
– a nonparametric one, based on sampling techniques

If the $X_i$'s have a $\mathcal{N}(\mu, \sigma^2)$ distribution, then

$$\bar{X} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$$

But sometimes, the distribution can only be obtained as an approximation, because of asymptotic properties. From the central limit theorem,

$$\sqrt{n}\,(\bar{X} - \mu) \to \mathcal{N}(0, \sigma^2) \text{ in distribution as } n \to \infty, \quad \text{i.e. } \bar{X} \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right) \text{ for large } n$$

In the nonparametric case, the idea is to generate pseudo-samples of size $n$, by resampling from the original sample.

Bootstrapping

Consider a sample $x = \{x_1, \cdots, x_n\}$. At step $b = 1, 2, \cdots, B$, generate a pseudo-sample $x_b$ by sampling (with replacement) within sample $x$. Then compute any statistic $\hat{\theta}(x_b)$

    # x: data vector, f: statistic to bootstrap, b: number of pseudo-samples
    boot <- function(x, f, b=500){
      stat <- rep(NA, b)
      n <- length(x)
      for(i in 1:b){
        # draw n indices with replacement, then evaluate f on the pseudo-sample
        idx <- sample(1:n, size=n, replace=TRUE)
        stat[i] <- f(x[idx])
      }
      return(stat)
    }

Bootstrapping

Let us generate 10,000 bootstrapped samples, and compute the Gini index on those. The function gini is not defined on these slides; it is assumed to return the Gini coefficient of a vector, for example by wrapping the order-statistics formula from the Gini Coefficient slide.

    boot_gini <- boot(income, gini, 1e4)

To visualize the distribution of the index

    hist(boot_gini, probability=TRUE)
    u <- seq(.4, .7, length=251)
    v <- dnorm(u, mean(boot_gini), sd(boot_gini))
    lines(u, v, col="red", lty=2)

[Figure: histogram of boot_gini on a density scale, with the fitted normal density as a dashed red line; axes: boot_gini, Density]
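The bootstrapped values can also be turned into a confidence interval. The sketch below is self-contained and rests on several assumptions: the data are simulated from a log-normal distribution as a stand-in for the income file, `gini` is a hypothetical helper implementing the order-statistics formula, and the resampling loop mirrors the bootstrap procedure described above. Two common intervals are shown, the percentile interval and the normal approximation.

```r
set.seed(1)
# Simulated stand-in for the income data (assumption, not the course file)
income_sim <- rlnorm(500, meanlog = 10.5, sdlog = 1)

# Hypothetical helper: Gini coefficient from order statistics
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum((1:n) * x) / (mean(x) * n * (n - 1)) - (n + 1) / (n - 1)
}

# Bootstrap: resample with replacement, recompute the statistic each time
boot <- function(x, f, b = 500) {
  stat <- rep(NA, b)
  n <- length(x)
  for (i in 1:b) {
    idx <- sample(1:n, size = n, replace = TRUE)
    stat[i] <- f(x[idx])
  }
  stat
}

boot_gini <- boot(income_sim, gini, b = 1000)

# 95% percentile interval
ci_pct <- quantile(boot_gini, c(0.025, 0.975), names = FALSE)
# 95% interval from the normal approximation
ci_norm <- mean(boot_gini) + c(-1, 1) * qnorm(0.975) * sd(boot_gini)
rbind(ci_pct, ci_norm)
```

When the histogram of the bootstrapped index is close to the normal density, the two intervals should be similar; a marked difference suggests skewness in the distribution of the estimator.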

Continuous Versions

The empirical cumulative distribution function is

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}(x_i \leq x)$$

Observe that

$$\hat{F}_n(x_{j:n}) = \frac{j}{n}$$

If $F$ is absolutely continuous,

$$F(x) = \int_0^x f(t)\,dt \quad \text{i.e.} \quad f(x) = \frac{dF(x)}{dx}.$$

Then

$$P(X \in [a, b]) = \int_a^b f(t)\,dt = F(b) - F(a).$$

Continuous Versions

One can define quantiles as

$$x = Q(p) = F^{-1}(p)$$

The expected value is

$$\mu = \int_0^\infty x f(x)\,dx = \int_0^\infty [1 - F(x)]\,dx = \int_0^1 Q(p)\,dp.$$

We can compute the average standard of living of the group below $z$. This is equivalent to the expectation of a truncated distribution,

$$\mu_z^- = \frac{1}{F(z)} \int_0^z x f(x)\,dx = \int_0^z \left(1 - \frac{F(x)}{F(z)}\right) dx$$
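The three expressions for the mean, and the two expressions for the mean below $z$, can be checked numerically. The sketch below assumes an exponential distribution with mean 2 and a threshold $z = 1$; both are arbitrary choices, made only because the density, distribution and quantile functions are available in base R.

```r
# Assumed example distribution: exponential with mean 2 (rate 1/2)
rate <- 0.5

# Three ways to write the expected value
m1 <- integrate(function(x) x * dexp(x, rate), 0, Inf)$value  # int x f(x) dx
m2 <- integrate(function(x) 1 - pexp(x, rate), 0, Inf)$value  # int [1 - F(x)] dx
m3 <- integrate(function(p) qexp(p, rate), 0, 1)$value        # int Q(p) dp
c(m1, m2, m3)   # all three should be close to 2

# Mean of the group below z (truncated distribution), integrals over [0, z]
z   <- 1
mz1 <- integrate(function(x) x * dexp(x, rate), 0, z)$value / pexp(z, rate)
mz2 <- integrate(function(x) 1 - pexp(x, rate) / pexp(z, rate), 0, z)$value
c(mz1, mz2)     # the two values should agree
```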

Continuous Versions

The Lorenz curve is $p \mapsto L(p)$ with

$$L(p) = \frac{1}{\mu} \int_0^{Q(p)} x f(x)\,dx$$

Gastwirth (1971) proved that

$$L(p) = \frac{1}{\mu} \int_0^p Q(u)\,du = \frac{\int_0^p Q(u)\,du}{\int_0^1 Q(u)\,du}$$

The numerator sums the incomes of the bottom $p$ proportion of the population. The denominator sums the incomes of the whole population.

$L$ is a $[0, 1] \to [0, 1]$ function, continuous if $F$ is continuous. Observe that $L$ is increasing, since

$$\frac{dL(p)}{dp} = \frac{Q(p)}{\mu}$$

Further, $L$ is convex, since $Q$ is non-decreasing.

The sample case

$$L\left(\frac{i}{n}\right) = \frac{\sum_{j=1}^i x_{j:n}}{\sum_{j=1}^n x_{j:n}}$$

The points $\{i/n, L(i/n)\}$ are then linearly interpolated to complete the corresponding Lorenz curve.

The continuous distribution case

$$L(p) = \frac{\int_0^{F^{-1}(p)} y\,dF(y)}{\int_0^\infty y\,dF(y)} = \frac{1}{E(X)} \int_0^p F^{-1}(u)\,du$$

with $p \in (0, 1)$.

Let $L$ be a continuous function on $[0, 1]$, then $L$ is a Lorenz curve if and only if

$$L(0) = 0, \quad L(1) = 1, \quad L'(0^+) \geq 0 \quad \text{and} \quad L''(p) \geq 0 \text{ on } [0, 1].$$
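The sample Lorenz curve with linear interpolation can be sketched in a few lines of R. The function name `lorenz` and the four incomes are illustrative assumptions; `approxfun` from base R performs the piecewise-linear interpolation between the points $\{i/n, L(i/n)\}$.

```r
# Hypothetical helper: returns the interpolated sample Lorenz curve as a function
lorenz <- function(x) {
  x <- sort(x)                      # order statistics
  n <- length(x)
  p <- (0:n) / n                    # population shares i/n
  L <- c(0, cumsum(x) / sum(x))     # income shares L(i/n)
  approxfun(p, L)                   # linear interpolation between the points
}

L_hat <- lorenz(c(10, 20, 30, 40))  # made-up incomes
L_hat(c(0, 0.5, 1))                 # 0.0 0.3 1.0
L_hat(0.625)                        # 0.45, between L(0.5) = 0.3 and L(0.75) = 0.6
```

Because the incomes are sorted before cumulating, the slopes of the successive segments are non-decreasing, so the interpolated curve is convex, in line with the characterization above.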

From Lorenz to Bonferroni

The Bonferroni curve is

$$B(p) = \frac{L(p)}{p}$$

and the Bonferroni index is

$$BI = 1 - \int_0^1 B(p)\,dp.$$

Define

$$P_i = \frac{i}{n} \quad \text{and} \quad Q_i = \frac{1}{n \bar{x}} \sum_{j=1}^i x_{j:n}$$

then

$$B = \frac{1}{n-1} \sum_{i=1}^{n-1} \left(\frac{P_i - Q_i}{P_i}\right)$$
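A minimal sketch of the sample Bonferroni index follows, using the $P_i$ and $Q_i$ defined above. The function name `bonferroni` is a hypothetical helper, and the two test vectors are made up to show the extreme cases.

```r
# Hypothetical helper: sample Bonferroni index from P_i = i/n and
# Q_i = cumulative income share of the i poorest individuals
bonferroni <- function(x) {
  x <- sort(x)                 # incomes in increasing order
  n <- length(x)
  i <- 1:(n - 1)
  P <- i / n
  Q <- cumsum(x)[i] / sum(x)
  sum((P - Q) / P) / (n - 1)
}

b_equal <- bonferroni(rep(10, 5))          # P_i = Q_i for every i
b_conc  <- bonferroni(c(0, 0, 0, 0, 100))  # Q_i = 0 for every i < n
c(b_equal, b_conc)                         # 0 and 1
```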