r packages for rank based estimates
play

R Packages for Rank-based Estimates John Kloke 1 Joseph McKean 2 1 - PowerPoint PPT Presentation

R Packages for Rank-based Estimates John Kloke 1 Joseph McKean 2 1 University of Wisconsin - Madison 2 Western Michigan University 10 July 2013 Kloke, McKean R Packages for Rank-based Estimates Outline Rank-based (R) estimation for linear


  1. R Packages for Rank-based Estimates John Kloke 1 Joseph McKean 2 1 University of Wisconsin - Madison 2 Western Michigan University 10 July 2013 Kloke, McKean R Packages for Rank-based Estimates

  2. Outline ◮ Rank-based (R) estimation for linear models ◮ Rfit ◮ Joint ranking (JR) estimation for cluster correlated data ◮ jrfit Kloke, McKean R Packages for Rank-based Estimates

  3. Notation Usual iid Linear Model y i = α + x T i β + e i for i = 1 , . . . , n iid Assume e i ∼ F , f are continuous errors with I ( f ) < ∞ . Matrix Formulation y = α 1 n + X β + e WLOG assume X is centered ( X T 1 n = 0 ) and has full rank and design conditions hold. Kloke, McKean R Packages for Rank-based Estimates

  4. Rank Estimation Jureˇ ckov´ a (1971) & Jaeckel (1972) The rank-based estimator of β is defined as � = Argmin � y − X β � ϕ β ϕ n � a [ R ( y i − x T i β )]( y i − x T = Argmin i β ) i =1 ◮ � v � ϕ = � n v ∈ R n , is Jaeckel’s (1972) i =1 a [ R ( v i )] v i dispersion function . ◮ R denotes Rank. � � t ◮ Scores are generated as a [ t ] = ϕ . ( n +1) ◮ Score function : ϕ ( u ) is a nondecreasing, square-integrable � function defined on (0 , 1) such that ϕ ( u ) du = 0 and � ϕ 2 ( u ) du = 1. Kloke, McKean R Packages for Rank-based Estimates

  5. Asymptotic Theory and Inference � ϕ ( X T X ) − 1 � ˆ β , τ 2 β ϕ ˙ ∼ N p where τ ϕ is a scale parameter which depends on the error distribution ( f ) and the score function ( ϕ ). A consistent estimate of τ ϕ was proposed by Koul, Sievers, McKean (1987). Inference ◮ Simple confidence intervals and tests of hypothesis for the slope parameters should be based on a t n − p − 1 where p = ncol( X ) (see Hettmansperger, McKean 2009). ◮ A reduction in dispersion which is akin to the least squares reduced model test is based on the statistic F RD = ( � y − ˆ y F � ϕ − � y − ˆ y R � ) / q ˆ τ ϕ / 2 where ˆ y F is the full model fit, ˆ y R is the reduced model fit, and q is the number of constraints. Kloke, McKean R Packages for Rank-based Estimates

  6. Rfit Kloke, McKean 2012 > args(rfit.default) function (formula, data = list(), yhat0 = NULL, scores = ws symmetric = FALSE, intercept = NULL, TAU = "F0", ...) ◮ yhat0 : initial fit. Defaults to an L 1 estimate ( quantreg , Koenker 2013). ◮ scores : object of class ‘scores’. Several available. Defaults to Wilcoxon. ◮ symmetric : use median (F) or Hodges-Lehmann (T) to estimate intercept ◮ intercept : should an intercept be added to the model? Default is yes unless already in column space of the design matrix. ◮ TAU : F0 = fortran; R = R; N = NA Kloke, McKean R Packages for Rank-based Estimates

  7. Example 1: Uric Acid & Cardiovascular Risk Factors Heritier, et. al. 2009 ◮ Investigating the association between uric acid levels and various cardiovascular risk factors in developing countries. ◮ Outcome variable: Uric Acid ( µ mol/L) ◮ Risk Factors: body mass index (bmi), systolic blood pressure (sys), low-density lipoprotein cholesterol (ldl), total cholesterol (choles), male (sex) ◮ 998 Participants (474 men & 524 women) aged 25 to 64. Kloke, McKean R Packages for Rank-based Estimates

  8. > summary(rfit(Uric~bmi+sys+choles+ldl+sex,data=CardioRiskFactor Call: rfit.default(formula = Uric ~ bmi + sys + choles + ldl + sex, data = CardioRiskFactors) Coefficients: Estimate Std. Error t.value p.value -122.54244 19.10680 -6.4136 2.197e-10 *** bmi 5.16547 0.49141 10.5115 < 2.2e-16 *** sys 0.67100 0.10480 6.4027 2.352e-10 *** choles 60.37128 5.10748 11.8202 < 2.2e-16 *** ldl -53.30736 5.33107 -9.9994 < 2.2e-16 *** sex 128.74061 5.00197 25.7380 < 2.2e-16 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Multiple R-squared (Robust): 0.4688407 Reduction in Dispersion Test: 175.1226 p-value: 0 Kloke, McKean R Packages for Rank-based Estimates

  9. > fitF<-rfit(Uric~bmi+sys+choles+ldl+sex +smok+alco+apoa+trig+age,data=CardioRiskFactors) > fitR<-rfit(Uric~bmi+sys+choles+ldl+sex, data=CardioRiskFactors) > drop.test(fitF,fitR) Drop in Dispersion Test F-Statistic p-value 82.193 0.000 Kloke, McKean R Packages for Rank-based Estimates

  10. Wilcoxon Studentized Residuals > qqnorm(rstudent(fit)) Normal Q−Q Plot 20 15 Sample Quantiles 10 5 0 −5 −3 −2 −1 0 1 2 3 Theoretical Quantiles Kloke, McKean R Packages for Rank-based Estimates

  11. Cluster Correlated Data Consider an experiment done over m blocks or clusters. For block k ( k = 1 , . . . , m ) we have the linear model y k = α 1 n k + X k β + e k ◮ Assume e 1 , . . . , e m are independent random vectors. ◮ N = � m k =1 n k . Matrix Formulation y = α 1 N + X β + e m ] T is a N × p design matrix where X = [ X T 1 · · · X T Kloke, McKean R Packages for Rank-based Estimates

  12. JR Estimator Kloke, McKean, Rashid 2009 Assume equal marginals: e kj ∼ f , F for k = 1 , . . . , m , j = 1 , . . . n k The joint rankings (JR) estimator of β is ˆ β JR = Argmin � y − X β � ϕ , where � N � v � ϕ = a ( R ( v t )) v t t =1 is Jaeckel’s (1972) dispersion function. Kloke, McKean R Packages for Rank-based Estimates

  13. Asymptotic Distribution Using a result from Brunner and Denker (1994) � � − 1 � � � − 1 � ˆ β , τ 2 X T X X T X β JR ˙ ∼ N p V ϕ ϕ ◮ V ϕ = � m k =1 X T k Σ ϕ, k X k ◮ Σ ϕ, k = var( ϕ ( F ( e k ))) ◮ ϕ ( F ( e k )) = [ ϕ ( F ( e k 1 )) , . . . , ϕ ( F ( e kn k ))] T Kloke, McKean R Packages for Rank-based Estimates

  14. Estimates of V ϕ = � m k =1 X T k Σ ϕ X k 1. Assume Σ ϕ is compound symmetric . Then � Σ ϕ = (1 − ˆ ρ ϕ ) I + ˆ ρ ϕ J where � m � e kj )), M = � m � n k � 1 ρ ϕ = ˆ i > j a ( R (ˆ e ki )) a ( R (ˆ . k =1 k =1 M − p 2 2. Sandwich estimator. � m k =1 X T e k )) T X k k a ( R (ˆ e k )) a ( R (ˆ 1. Valid under exchangeable within block errors. 2. Requires a large number of clusters (large m ). Kloke, McKean R Packages for Rank-based Estimates

  15. jrfit > args(jrfit) function (x, y, block, yhat0 = NULL, scores = wscores, fitint = NULL, var.type = "sandwich", fitblock = NULL, ...) ◮ x : N × p design matrix ◮ y : N × 1 vector of responses ◮ block : N × 1 vector denoting block membership ◮ var.type : ‘sandwich’, ‘cs’, or ‘user’. ◮ fitblock : should block be fit as fixed effect? Default T for ‘cs’ and F for ‘sandwich’ Kloke, McKean R Packages for Rank-based Estimates

  16. Example 2: Simulated Cluster-Correlated ◮ Model: y ij = α + w i ∆ + x ij β + b j + e ij ◮ m = 160 blocks ◮ n = 4 observations (repeated measures) per block ◮ ∆ = 0 . 5 is the treatment effect ◮ β = 0 is the effect of a (baseline) covariate ◮ e ij ∼ t 5 and b j ∼ t 3 Kloke, McKean R Packages for Rank-based Estimates

  17. > fit<-jrfit(cbind(x,w),y,block) > summary(fit) Coefficients: Estimate Std. Error t-ratio p.value x 0.0904889 0.1390482 0.6507737 0.5161260 w 0.7094427 0.2578163 2.7517367 0.0066124 > > fit<-jrfit(cbind(x,w),y,block,var.type=’cs’,fitblock=F) > summary(fit) Coefficients: Estimate Std. Error t-ratio p.value x 0.0904889 0.1182933 0.7649543 0.4454256 w 0.7094427 0.2642782 2.6844538 0.0080298 Kloke, McKean R Packages for Rank-based Estimates

  18. Summary ◮ Rfit provides robust rank-based inference for iid linear model. Available at CRAN. ◮ jrfit provides robust rank-based inference for linear model w/ cluster correlated errors. Available at http://www.biostat.wisc.edu/~kloke/ . Kloke, McKean R Packages for Rank-based Estimates

  19. Future Work ◮ Regression through the origin ◮ High-breakdown R estimates (+ diagnostics) ◮ Generalized JR estimate based on working covariance of var( y i ). ◮ “Nonparametric Statistical Methods Using R” (2014) - Chapman Hall Kloke, McKean R Packages for Rank-based Estimates

  20. References ◮ Brunner, E. and Denker, M. (1994), Rank statistics under dependent observations and applications to factorial designs, Journal of Statistical Inference and Planning , 42, 353-378. ◮ Hettmansperger, T. P. and McKean, J. W. (2011), Robust Nonparametric Statistical Methods, 2nd Edition , Boca Raton, FL: CRC Press. ◮ Heritier, S. Cantoni, E., Copt, S. and Victoria-Feser, M. (2009), Robust Methods in Biostatistics , New York: Wiley. ◮ Jaeckel, L. A. (1972), Estimating regression coefficients by minimizing the dispersion of the residuals, Annals of Mathematical Statistics , 43, 1449-1458. ◮ Jureˇ akov´ a, J. (1971), Nonparametric estimate of regression coefficients, Annals of Mathematical Statistics , 42, 1328:1338. ◮ Kloke, JD, McKean, JW (2012). Rfit: Rank-based Estimation for Linear Models. R Journal , 4/2, 57-64. ◮ Kloke, J.D., McKean, J.W., and Rashid, M. (2009), Rank-Based Estimation and Associated Inferences for Linear Models with Cluster Correlated Errors, Journal of the American Statistical Association , 104, 485:384-390. ◮ Koenker, R. (2013). quantreg: Quantile Regression. R package version 4.98. http://CRAN.R-project.org/package=quantreg. ◮ Koul, H.L., Sievers, G.L. and McKean, J.W. (1987). An estimator of the Scale parameter for the Rank Analysis of Linear Models under General Score Functions. Scand J. Statist. 14, 131 - 141. Kloke, McKean R Packages for Rank-based Estimates

Recommend


More recommend