Arthur CHARPENTIER - Welfare, Inequality and Poverty
Arthur Charpentier
charpentier.arthur@gmail.com
http://freakonometrics.hypotheses.org/
Université de Rennes 1, February 2016
Welfare, Inequality & Poverty, # 4

Regression?

Galton (1870, galton.org, 1886, galton.org) and Pearson & Lee (1896, jstor.org, 1903, jstor.org) studied the genetic transmission of characteristics, e.g. height.
On average, the child of tall parents is taller than other children, but shorter than his parents.
"I have called this peculiarity by the name of regression", Francis Galton, 1886.

Regression?

> library(HistData)
> attach(Galton)
> Galton$count <- 1
> # number of observations for each (parent, child) pair of heights
> df <- aggregate(Galton, by=list(parent, child), FUN=sum)[, c(1,2,5)]
> plot(df[,1:2], cex=sqrt(df[,3]/3))
> abline(a=0, b=1, lty=2)                 # first diagonal (dashed)
> abline(lm(child~parent, data=Galton))   # least-squares line

[Figure: height of the child against height of the mid-parent, with the dashed first diagonal and the least-squares line.]

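Least Squares?

Recall that
$$\mathbb{E}(Y) = \underset{m \in \mathbb{R}}{\operatorname{argmin}} \left\{ \|Y - m\|_{\ell_2}^2 = \mathbb{E}\left([Y - m]^2\right) \right\}$$
$$\operatorname{Var}(Y) = \min_{m \in \mathbb{R}} \left\{ \mathbb{E}\left([Y - m]^2\right) \right\} = \mathbb{E}\left([Y - \mathbb{E}(Y)]^2\right)$$
The empirical version is
$$\bar{y} = \underset{m \in \mathbb{R}}{\operatorname{argmin}} \left\{ \sum_{i=1}^n \frac{1}{n} [y_i - m]^2 \right\}$$
$$s^2 = \min_{m \in \mathbb{R}} \left\{ \sum_{i=1}^n \frac{1}{n} [y_i - m]^2 \right\} = \sum_{i=1}^n \frac{1}{n} [y_i - \bar{y}]^2$$
The conditional version is
$$\mathbb{E}(Y|X) = \underset{\varphi: \mathbb{R}^k \to \mathbb{R}}{\operatorname{argmin}} \left\{ \|Y - \varphi(X)\|_{\ell_2}^2 = \mathbb{E}\left([Y - \varphi(X)]^2\right) \right\}$$
$$\operatorname{Var}(Y|X) = \min_{\varphi: \mathbb{R}^k \to \mathbb{R}} \left\{ \mathbb{E}\left([Y - \varphi(X)]^2\right) \right\} = \mathbb{E}\left([Y - \mathbb{E}(Y|X)]^2\right)$$
(the minimum on the right-hand side is a number: strictly speaking, it equals $\mathbb{E}[\operatorname{Var}(Y|X)]$).
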
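The empirical statement can be checked numerically. The following is a minimal sketch on simulated data (the sample `y` and the helper `mse` are illustrative assumptions, not part of the slides): minimising the mean squared deviation over a constant `m` should return the sample mean, and the minimum should be the 1/n empirical variance.

```r
# Illustrative data (assumption: any numeric sample would do)
set.seed(1)
y <- rnorm(200, mean = 3, sd = 2)

# mean squared deviation from a candidate constant m
mse <- function(m) mean((y - m)^2)

# one-dimensional minimisation over the range of the data
opt <- optimize(mse, interval = range(y))

opt$minimum     # close to mean(y)
opt$objective   # close to mean((y - mean(y))^2), the 1/n variance
```
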
Changing the Distance in Least-Squares?

One might consider
$$\widehat{\beta} \in \operatorname{argmin} \left\{ \sum_{i=1}^n |Y_i - X_i^T \beta| \right\},$$
based on the $\ell_1$-norm, and not the $\ell_2$-norm.
This is the least-absolute deviation estimator, related to the median regression, since $\operatorname{median}(X) = \operatorname{argmin}_x \{\mathbb{E}|X - x|\}$.
More generally, assume that, for some function $R(\cdot)$,
$$\widehat{\beta} \in \operatorname{argmin} \left\{ \sum_{i=1}^n R(Y_i - X_i^T \beta) \right\}$$
If $R$ is differentiable, the first order condition would be
$$\sum_{i=1}^n R'\left(Y_i - X_i^T \beta\right) \cdot X_i^T = 0.$$

i.e.
$$\sum_{i=1}^n \underbrace{\omega\left(Y_i - X_i^T \beta\right)}_{\omega_i} \cdot \left(Y_i - X_i^T \beta\right) X_i^T = 0 \quad \text{with} \quad \omega(x) = \frac{R'(x)}{x}.$$
It is the first order condition of a weighted $\ell_2$ regression.
To obtain the $\ell_1$-regression, observe that $\omega = |\varepsilon|^{-1}$.

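The weight for the $\ell_1$ case follows directly from the definition $\omega(x) = R'(x)/x$. As a short worked step (valid away from $x = 0$, where $|x|$ is not differentiable):

```latex
R(x) = |x|
\;\Longrightarrow\;
R'(x) = \operatorname{sign}(x) \quad (x \neq 0)
\;\Longrightarrow\;
\omega(x) = \frac{\operatorname{sign}(x)}{x} = \frac{1}{|x|}.
% For comparison, the l2 case R(x) = x^2 gives R'(x) = 2x, hence a
% constant weight omega(x) = 2, i.e. ordinary least squares.
```
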
$\Longrightarrow$ use iterative (weighted) least-squares regressions.
Start with some standard $\ell_2$ regression

> reg_0 <- lm(Y~X, data=db)

For the $\ell_1$ regression consider the weight function

> omega <- function(e) 1/abs(e)

Then consider the following iterative algorithm

> e <- residuals(reg_0)   # residuals of the starting l2 fit
> for(i in 1:100){
+   W <- omega(e)                          # weights from current residuals
+   reg <- lm(Y~X, data=db, weights=W)     # weighted l2 regression
+   e <- residuals(reg) }                  # updated residuals

[Figure: scatter plot of dist against speed.]

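The iteration above can be tried on a concrete data set. The sketch below assumes the built-in `cars` data (`dist ~ speed`), which matches the axis labels of the figure but is not named on the slide. It also adds a small floor `eps` in the weight function, an assumption made here to avoid dividing by a residual that is exactly zero.

```r
# Iteratively reweighted least squares for the l1 (median) regression.
# Assumption: the built-in `cars` data stands in for `db`.
omega <- function(e, eps = 1e-6) 1 / pmax(abs(e), eps)  # eps avoids 1/0

reg_0 <- lm(dist ~ speed, data = cars)   # l2 starting point
reg <- reg_0
for (i in 1:100) {
  W   <- omega(residuals(reg))                        # weights from residuals
  reg <- lm(dist ~ speed, data = cars, weights = W)   # weighted l2 fit
}
beta_irls <- coef(reg)

# sum of absolute residuals for a coefficient vector b = (intercept, slope)
l1 <- function(b) sum(abs(cars$dist - b[1] - b[2] * cars$speed))

l1(coef(reg_0))   # l1 criterion at the least-squares fit
l1(beta_irls)     # l1 criterion after reweighting (should be smaller)
```

If the quantreg package is available, the result can be compared with `rq(dist ~ speed, tau = .5, data = cars)`, which fits the median regression directly.
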
Quantile Regression

Observe that, for all $\tau \in (0,1)$,
$$Q_X(\tau) = F_X^{-1}(\tau) = \underset{m \in \mathbb{R}}{\operatorname{argmin}} \left\{ \mathbb{E}\left[R_\tau(X - m)\right] \right\}$$
where $R_\tau(x) = [\tau - \mathbf{1}(x < 0)] \cdot x$.
From a statistical point of view
$$\widehat{Q}_x(\tau) = \underset{m \in \mathbb{R}}{\operatorname{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^n R_\tau(x_i - m) \right\}.$$
The quantile-$\tau$ regression
$$\widehat{\beta} = \operatorname{argmin} \left\{ \sum_{i=1}^n R_\tau(Y_i - X_i^T \beta) \right\}.$$

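The empirical characterisation can be illustrated in a few lines. This is a sketch on simulated data (the sample `x`, the level `tau = 0.3` and the helper name `R_tau` are assumptions for illustration). Since the objective is piecewise linear in `m`, it is enough to evaluate it at the data points. The sample size is chosen so that `n * tau` is not an integer, which makes the minimiser unique.

```r
# check loss R_tau(x) = [tau - 1(x < 0)] * x
R_tau <- function(x, tau) (tau - (x < 0)) * x

set.seed(1)
x   <- rnorm(101)   # illustrative sample; n * tau = 30.3 is not an integer
tau <- 0.3

# evaluate the empirical objective at each data point
obj   <- sapply(x, function(m) mean(R_tau(x - m, tau)))
q_hat <- x[which.min(obj)]

q_hat
quantile(x, tau, type = 1)   # inverse of the empirical cdf, same value
```
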
[Figure: three panels, each showing dist against speed.]

For the quantile regression line of level p, there are about n(1 − p) points in the upper region (above the line), and np in the lower one.

> library(quantreg)
> fit1 <- rq(y ~ x1 + x2, tau = .1, data = df)

see cran.r-project.org.

Quantile Regression: Empirical Analysis

Consider here some salaries, as a function of the experience (in years), see data.princeton.edu

> salary = read.table("http://data.princeton.edu/wws509/datasets/salary.dat", header=TRUE)
> library(quantreg)
> plot(salary$yd, salary$sl)
> abline(rq(sl~yd, tau=.1, data=salary), col="red")   # 10% quantile regression line

[Figure: Salary against Experience (years).]

> u <- seq(.05, .95, by=.01)
> # column 2 of the coefficient table is used as the standard error;
> # what it holds depends on the se= method of summary.rq
> coefstd <- function(u) summary(rq(sl~yd, data=salary, tau=u))$coefficients[,2]
> coefest <- function(u) summary(rq(sl~yd, data=salary, tau=u))$coefficients[,1]
> CS <- Vectorize(coefstd)(u)
> CE <- Vectorize(coefest)(u)
> CEinf <- CE-2*CS
> CEsup <- CE+2*CS
> plot(u, CE[2,], ylim=c(-500,2000), col="red")
> polygon(c(u, rev(u)), c(CEinf[2,], rev(CEsup[2,])), col="yellow", border=NA)

[Figure: CE[2, ] (the slope estimate) against probability.]

Consider the evolution of the 90%-10% quantile ratio. Here Q90 and Q10 are not defined on the slide; they are presumably the rq(sl~yd, data=salary) fits at tau=.9 and tau=.1.

> ratio9010 = function(age){
+   predict(Q90, newdata=data.frame(yd=age))/
+   predict(Q10, newdata=data.frame(yd=age))
+ }
> ratio9010(5)
1.401749
> A=0:30
> plot(A, Vectorize(ratio9010)(A), type="l", ylab="90-10 quantile ratio")

Local Regression: Empirical Analysis

The ratio obtained from the quantile regressions is smoother than the local estimator, computed from the k observations whose experience is closest to the given age:

> ratio9010_k = function(age, k=10){
+   idx = which(rank(abs(salary$yd-age))<=k)   # k nearest observations
+   quantile(salary$sl[idx], .9)/quantile(salary$sl[idx], .1) }
> A=0:30
> plot(A, Vectorize(ratio9010_k)(A), type="l", ylab="90-10 quantile ratio")

> # Gini() is not defined on the slide; it is assumed to come from a package such as ineq
> Gini(salary$sl)
[1] 0.1391865

We can also consider some local Gini index

> Gini_k = function(age, k=10){
+   idx = which(rank(abs(salary$yd-age))<=k)   # k nearest observations
+   Gini(salary$sl[idx]) }
> A=0:30
> plot(A, Vectorize(Gini_k)(A), type="l", ylab="Local Gini index")

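The slides do not show where `Gini()` comes from. As a hedged, self-contained stand-in, the sketch below defines a hypothetical base-R helper `gini()` using the usual rank-based formula for the Gini index of a sample; it is not claimed to reproduce the value printed above.

```r
# Hypothetical helper (not from the slides): Gini index of a numeric sample,
# G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with x sorted increasingly
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum(x * seq_len(n)) / (n * sum(x)) - (n + 1) / n
}

gini(c(1, 1, 1, 1))   # perfect equality: 0
gini(c(0, 0, 0, 1))   # one individual holds everything: (n - 1) / n = 0.75
```

With such a helper, the local version on the slide only needs `Gini` replaced by `gini` inside `Gini_k`.
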