Arthur CHARPENTIER - Welfare, Inequality and Poverty
Arthur Charpentier
charpentier.arthur@gmail.com
http://freakonometrics.hypotheses.org/
Université de Rennes 1, February 2015
Welfare, Inequality & Poverty, # 4
Regression?
Galton (1870, galton.org, 1886, galton.org) and Pearson & Lee (1896, jstor.org, 1903, jstor.org) studied the genetic transmission of characteristics, e.g. height. On average, the child of tall parents is taller than other children, but shorter than his parents.
“I have called this peculiarity by the name of regression”, Francis Galton, 1886.
Regression?
> library(HistData)
> attach(Galton)
> Galton$count <- 1
> df <- aggregate(Galton, by=list(parent, child), FUN=sum)[, c(1,2,5)]
> plot(df[,1:2], cex=sqrt(df[,3]/3))
> abline(a=0, b=1, lty=2)
> abline(lm(child~parent, data=Galton))
[Figure: scatter plot of the height of the child against the height of the mid-parent, point size proportional to the number of observations, with the dashed line y = x and the fitted regression line]
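Least Squares?
Recall that
$$E(Y) = \underset{m \in \mathbb{R}}{\operatorname{argmin}} \left\{ \|Y - m\|_{\ell_2}^2 = E\left([Y - m]^2\right) \right\}$$
$$\operatorname{Var}(Y) = \min_{m \in \mathbb{R}} \left\{ E\left([Y - m]^2\right) \right\} = E\left([Y - E(Y)]^2\right)$$
The empirical version is
$$\overline{y} = \underset{m \in \mathbb{R}}{\operatorname{argmin}} \left\{ \sum_{i=1}^n \frac{1}{n} [y_i - m]^2 \right\}$$
$$s^2 = \min_{m \in \mathbb{R}} \left\{ \sum_{i=1}^n \frac{1}{n} [y_i - m]^2 \right\} = \sum_{i=1}^n \frac{1}{n} [y_i - \overline{y}]^2$$
The conditional version is
$$E(Y|X) = \underset{\varphi: \mathbb{R}^k \to \mathbb{R}}{\operatorname{argmin}} \left\{ \|Y - \varphi(X)\|_{\ell_2}^2 = E\left([Y - \varphi(X)]^2\right) \right\}$$
$$\operatorname{Var}(Y|X) = \min_{\varphi: \mathbb{R}^k \to \mathbb{R}} \left\{ E\left([Y - \varphi(X)]^2\right) \right\} = E\left([Y - E(Y|X)]^2\right)$$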
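The empirical statement above can be checked numerically. The following is a minimal sketch in base R, assuming a simulated sample (the data, the seed and the names `quad_loss`, `m_hat`, `s2_hat` are illustrative and not from the slides): minimising the empirical quadratic loss should return the sample mean, and the minimum should be the variance with a 1/n factor.

```r
# Simulated sample (an assumption for illustration; not data from the slides)
set.seed(1)
y <- rexp(200, rate = 1/3)

# Empirical quadratic loss: (1/n) * sum of squared deviations from m
quad_loss <- function(m) mean((y - m)^2)

# Numerical minimisation over m on an interval containing the sample
opt <- optimize(quad_loss, interval = range(y))

m_hat  <- opt$minimum              # minimiser, to compare with mean(y)
s2_hat <- opt$objective            # minimum value, to compare with s2_1n
s2_1n  <- mean((y - mean(y))^2)    # variance with the 1/n normalisation
```

Note that `var(y)` in R uses the 1/(n-1) normalisation, so it differs slightly from `s2_1n`.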
Changing the Distance in Least-Squares?
One might consider
$$\widehat{\beta} \in \operatorname{argmin} \left\{ \sum_{i=1}^n \left| Y_i - X_i^T \beta \right| \right\},$$
based on the $\ell_1$-norm, and not the $\ell_2$-norm. This is the least-absolute deviation estimator, related to the median regression, since
$$\operatorname{median}(X) = \underset{x \in \mathbb{R}}{\operatorname{argmin}} \left\{ E|X - x| \right\}.$$
More generally, assume that, for some function $R(\cdot)$,
$$\widehat{\beta} \in \operatorname{argmin} \left\{ \sum_{i=1}^n R\left(Y_i - X_i^T \beta\right) \right\}$$
If $R$ is differentiable, the first order condition would be
$$\sum_{i=1}^n R'\left(Y_i - X_i^T \beta\right) \cdot X_i^T = 0.$$
Changing the Distance in Least-Squares?
i.e.
$$\sum_{i=1}^n \underbrace{\omega\left(Y_i - X_i^T \beta\right)}_{\omega_i} \cdot \left(Y_i - X_i^T \beta\right) X_i^T = 0 \quad \text{with} \quad \omega(x) = \frac{R'(x)}{x},$$
It is the first order condition of a weighted $\ell_2$ regression.
To obtain the $\ell_1$-regression, observe that $\omega = |\varepsilon|^{-1}$.
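The weight for the $\ell_1$ case follows directly from the definition $\omega(x) = R'(x)/x$. A short derivation, assuming $x \neq 0$ (the absolute value is not differentiable at zero), with the quadratic case added for comparison:

```latex
\begin{aligned}
R(x) = |x| \;&\Longrightarrow\; R'(x) = \operatorname{sign}(x)
  \;\Longrightarrow\; \omega(x) = \frac{\operatorname{sign}(x)}{x} = \frac{1}{|x|}, \qquad x \neq 0,\\
R(x) = x^2 \;&\Longrightarrow\; R'(x) = 2x
  \;\Longrightarrow\; \omega(x) = 2 .
\end{aligned}
```

With $R(x) = x^2$ the weights are constant, which gives back the ordinary least-squares first order condition.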
Changing the Distance in Least-Squares?
⇒ use iterative (weighted) least-squares regressions.
Start with some standard ℓ2 regression
> reg_0 <- lm(Y~X, data=db)
For the ℓ1 regression consider the weight function
> omega <- function(e) 1/abs(e)
Then consider the following iterative algorithm
> e <- residuals(reg_0)
> for(i in 1:100){
+ W <- omega(e)
+ reg <- lm(Y~X, data=db, weights=W)
+ e <- residuals(reg) }
[Figure: scatter plot of dist against speed]
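A self-contained sketch of the iterative scheme above, assuming the built-in `cars` data (whose variables `speed` and `dist` match the axes of the figure) stands in for `db`. The `eps` guard in `omega` and the names `l1_ols` and `l1_irls` are additions for illustration and are not in the slides; the guard avoids infinite weights when a residual is numerically zero.

```r
# Built-in 'cars' data, assumed here to play the role of db, Y and X
db <- data.frame(Y = cars$dist, X = cars$speed)

# l1 weight function; eps is an assumed guard against division by zero
omega <- function(e, eps = 1e-6) 1 / pmax(abs(e), eps)

reg_0 <- lm(Y ~ X, data = db)   # start from the standard l2 regression
e <- residuals(reg_0)
for (i in 1:100) {
  db$W <- omega(e)                           # weights from current residuals
  reg <- lm(Y ~ X, data = db, weights = W)   # weighted least-squares step
  e <- residuals(reg)                        # raw residuals of the new fit
}

# l1 objective (sum of absolute residuals) before and after the iterations
l1_ols  <- sum(abs(residuals(reg_0)))
l1_irls <- sum(abs(e))
```

Comparing `l1_irls` with `l1_ols` gives a quick check that the iterations reduced the sum of absolute residuals relative to the least-squares start.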
Quantile Regression
Observe that, for all $\tau \in (0, 1)$
$$Q_X(\tau) = F_X^{-1}(\tau) = \underset{m \in \mathbb{R}}{\operatorname{argmin}} \left\{ E\left[R_\tau(X - m)\right] \right\}$$
where $R_\tau(x) = [\tau - \mathbf{1}(x < 0)] \cdot x$.
From a statistical point of view
$$\widehat{Q}_x(\tau) = \underset{m \in \mathbb{R}}{\operatorname{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^n R_\tau(x_i - m) \right\}.$$
The quantile-$\tau$ regression
$$\widehat{\beta} = \operatorname{argmin} \left\{ \sum_{i=1}^n R_\tau\left(Y_i - X_i^T \beta\right) \right\}.$$
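The empirical characterisation above can be illustrated in base R. This is a sketch under stated assumptions: the sample is simulated, and `check_loss`, `risk`, `q_hat` and `q_ref` are illustrative names. Since the empirical risk is piecewise linear in m, it is enough to search over the sample points; the sample size is chosen so that nτ is not an integer and the minimiser is unique.

```r
# Check loss R_tau(x) = [tau - 1(x < 0)] * x
check_loss <- function(x, tau) (tau - (x < 0)) * x

set.seed(42)
x   <- rnorm(101)   # simulated sample (an assumption for illustration)
tau <- 0.25

# Average check loss evaluated at each sample point taken as candidate m
risk  <- sapply(x, function(m) mean(check_loss(x - m, tau)))
q_hat <- x[which.min(risk)]

# Inverse of the empirical cdf (quantile type 1), for comparison
q_ref <- unname(quantile(x, tau, type = 1))
```

Setting `tau <- 0.5` in the same sketch gives the sample median, which connects with the least-absolute deviation estimator discussed earlier.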
[Figure: three panels plotting dist against speed]
There are about n(1 − p) points in the upper region, and np in the lower one, where p denotes the probability level (τ above).
> library(quantreg)
> fit1 <- rq(y ~ x1 + x2, tau = .1, data = df)
see cran.r-project.org.
Quantile Regression: Empirical Analysis
Consider here some salaries, as a function of experience (in years), see data.princeton.edu
> salary = read.table("http://data.princeton.edu/wws509/datasets/salary.dat", header=TRUE)
> library(quantreg)
> plot(salary$yd, salary$sl)
> abline(rq(sl~yd, tau=.1, data=salary), col="red")
[Figure: scatter plot of Salary against Experience (years)]
Quantile Regression: Empirical Analysis
> u <- seq(.05, .95, by=.01)
> coefstd <- function(u) summary(rq(sl~yd, data=salary, tau=u))$coefficients[,2]
> coefest <- function(u) summary(rq(sl~yd, data=salary, tau=u))$coefficients[,1]
> CS <- Vectorize(coefstd)(u)
> CE <- Vectorize(coefest)(u)
> CEinf <- CE-2*CS
> CEsup <- CE+2*CS
> plot(u, CE[2,], ylim=c(-500,2000), col="red")
> polygon(c(u, rev(u)), c(CEinf[2,], rev(CEsup[2,])), col="yellow", border=NA)
[Figure: CE[2, ] plotted against probability]
Datasets for Empirical Analysis
Income in the U.K., in 1988, 1992 and 1996,
> uk88 <- read.csv("http://www.vcharite.univ-mrs.fr/pp/lubrano/cours/fes88.csv", sep=";", header=FALSE)$V1
> uk92 <- read.csv("http://www.vcharite.univ-mrs.fr/pp/lubrano/cours/fes92.csv", sep=";", header=FALSE)$V1
> uk96 <- read.csv("http://www.vcharite.univ-mrs.fr/pp/lubrano/cours/fes96.csv", sep=";", header=FALSE)$V1
> cpi <- c(421.7, 546.4, 602.4)
> y88 <- uk88/cpi[1]
> y92 <- uk92/cpi[2]
> y96 <- uk96/cpi[3]
> plot(density(y88), type="l", col="red")
> lines(density(y92), type="l", col="blue")
> lines(density(y96), type="l", col="purple")
Inequalities: Empirical Analysis
We can visualize the empirical Lorenz curves, and their theoretical (lognormal) version
> library(ineq)
> plot(Lc(y88)); s=sd(log(y88)); lines(Lc.lognorm, parameter=s)
> plot(Lc(y92)); s=sd(log(y92)); lines(Lc.lognorm, parameter=s)
> plot(Lc(y96)); s=sd(log(y96)); lines(Lc.lognorm, parameter=s)
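The comparison above can also be reproduced by hand in base R, without the ineq package. This is a sketch under stated assumptions: the incomes are simulated from a lognormal distribution (not the U.K. data), and the names `L_emp`, `L_logn` and `gap` are illustrative. It uses the closed form of the lognormal Lorenz curve, $L(p) = \Phi(\Phi^{-1}(p) - \sigma)$, with σ estimated by the standard deviation of the log-incomes as in the slides.

```r
set.seed(123)
sigma <- 0.5
y <- rlnorm(5000, meanlog = 0, sdlog = sigma)  # simulated incomes (assumption)

# Empirical Lorenz curve: cumulative income share of the poorest fraction p
y_sorted <- sort(y)
p     <- seq_along(y_sorted) / length(y_sorted)
L_emp <- cumsum(y_sorted) / sum(y_sorted)

# Lognormal Lorenz curve, sigma estimated from the log-incomes
s      <- sd(log(y))
L_logn <- pnorm(qnorm(p) - s)

# Largest vertical gap between the empirical and the theoretical curve
gap <- max(abs(L_emp - L_logn))
```

The two curves can then be drawn with `plot(p, L_emp, type = "l")` followed by `lines(p, L_logn, lty = 2)`; a small `gap` suggests that the lognormal model describes the simulated sample well, which is expected here by construction.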