nhanes data management with r
play

NHANES Data: Management with R David Winsemius, MD. MPH Heritage - PowerPoint PPT Presentation

NHANES Data: Management with R David Winsemius, MD. MPH Heritage Laboratories Olathe, KS West Hartford, CT Acknowledgements and Credits: External R Development Core Team (2009). R: A language and environment for statistical computing. R


  1. NHANES Data: Management with R David Winsemius, MD. MPH Heritage Laboratories Olathe, KS West Hartford, CT

  2. Acknowledgements and Credits: External • R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, • http://www.R-project.org. • Harrell FE: “Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis.” New York: Springer, 2001. – Alzola CF, Harrell FE: An Introduction to S and the Hmisc and Design Libraries . Freely available electronic book. • The survival Package: Title “Survival analysis, including penalised likelihood.” – Maintainer Thomas Lumley <tlumley@u.washington.edu> – Author: Terry Therneau, original R port by Thomas Lumley 2

  3. http://www.cdc.gov/nchs/nhanes.htm

  4. http://www.cdc.gov/nchs/nhanes/nhanes2007-2008/nhanes07_08.ht m

  5. > library(foreign) #Demographics with subject ID, age > merge0304 <- read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/demo_c.xpt") # Followed by various laboratory values from 2003-2004 cycle > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l02_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l10_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l13_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l16_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l40_c.xpt") ) # and finally append to the cases assembled from prior years > merge0904 <- rbind(merge0902[ , names(merge0902)[ names(merge0902) %in% names(merge0304)] ], + merge0304[ , names(merge0902)[ names(merge0902) %in% names(merge0304)] ])

  6. Option of using Hmisc sasxport.get > require(Hmisc) # I usually type require(rms) > w <- sasxport.get('/Users/davidwinsemius/Documents/Heritage/NHANES-continuous/nchs/ nhanes/nhanes2003-2004/demo_c.xpt') Processing SAS dataset DEMO_C .. > str(w) 'data.frame': 10122 obs. of 43 variables: $ seqn :Class 'labelled' atomic [1:10122] 21005 21006 21007 21008 21009 21010 21011 21012 21013 21014 ... .. ..- attr(*, "label")= chr "Respondent sequence number" $ sddsrvyr:Class 'labelled' atomic [1:10122] 3 3 3 3 3 3 3 3 3 3 ... .. ..- attr(*, "label")= chr "Data Release Number" $ ridstatr:Class 'labelled' atomic [1:10122] 2 2 2 2 2 2 2 2 2 2 ... .. ..- attr(*, "label")= chr "Interview/Examination Status" $ riagendr:Class 'labelled' atomic [1:10122] 1 2 2 1 1 2 1 1 2 1 ... > contents(w) .. ..- attr(*, "label")= chr "Gender - Adjudicated" Data frame:w 10122 observations and 43 variables Maximum # NAs:8678 Labels Storage NAs seqn Respondent sequence number integer 0 sddsrvyr Data Release Number integer 0 ridstatr Interview/Examination Status integer 0 riagendr Gender - Adjudicated integer 0 ridageyr Age at Screening Adjudicated - Recode integer 0 ridagemn Age in Months - Recode integer 223 ridageex Exam Age in Months - Recode integer 692 ridreth1 Race/Ethnicity - Recode integer 0 ridreth2 Linked NH3 Race/Ethnicity - Recode integer 0 dmqmilit Veteran/Military Status integer 4196

  7. > load("/Users/davidwinsemius/Documents/Heritage/merge9906.Rdta") > require(foreign) > #demographics 2007-2008 > merge0304 <- read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/DEMO_E.xpt") > # hep c > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/HEPC_E.xpt") ) > # Glycohemoglobin > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/GHB_E.xpt") ) > # Cholesterol and HDL-Cholesterol > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/TCHOL_E.xpt") ) > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/HDL_E.xpt") ) > #Urinary Albumin and Urinary Creatinine > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/ALB_CR_E.xpt") ) > #Biochemistry Profile > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/BIOPRO_E.xpt") ) > str(merge0304) 'data.frame': 6917 obs. of 85 variables: $ SEQN : num 41475 41477 41479 41481 41482 ... $ SDDSRVYR: num 5 5 5 5 5 5 5 5 5 5 ... $ RIDSTATR: num 2 2 2 2 2 2 2 2 2 2 ... $ RIDEXMON: num 2 2 1 1 2 1 1 1 1 1 ...

  8. > describe(merge0304$LBDHCV) merge0304$LBDHCV n missing unique Mean 6416 501 3 1.990 1 (114, 2%), 2 (6286, 98%), 5 (16, 0%) > merge0708 <- merge0304 > rm(merge0304) >merge9908 <- rbind(merge9906[, names(merge9906)[names(merge9906) %in% names(merge0708)] ], merge0708[, names(merge9906)[names(merge9906) %in% names(merge0708)] ])

  9. > str(merge9908) 'data.frame': 27456 obs. of 57 variables: $ SEQN : num 1000 1003 1005 1009 1010 ... $ SDDSRVYR: num 1 1 1 1 1 1 1 1 1 1 ... $ RIDSTATR: num 2 2 2 2 2 2 2 2 2 2 ... <snip> $ DMDHRAGE: num 53 78 40 39 85 26 49 41 85 64 ... $ DMDHREDU: num 4 5 3 5 3 3 1 2 1 2 ... $ DMDHRMAR: num 1 NA 1 1 1 1 NA 99 1 1 ... $ DMDHSEDU: num 4 NA 3 4 3 3 NA NA 2 3 ... $ WTINT2YR: num 9488 29627 2867 105753 21555 ... $ WTMEC2YR: num 9946 31456 2964 110068 26756 ... $ SDMVPSU : num 2 1 2 1 1 1 1 2 1 2 ... $ SDMVSTRA: num 13 8 3 8 1 7 13 2 11 11 ... $ LBDHCV : num 2 2 2 2 2 2 2 2 2 2 ... $ LBXSAL : num 4.3 4.5 5 4.4 4.1 4.6 5.2 4.9 4.6 4.6 ... $ LBDSALSI: num 43 45 50 44 41 46 52 49 46 46 ... $ LBXSATSI: num 37 19 13 23 12 27 18 12 11 69 ... $ LBXSASSI: num 18 21 19 18 19 25 17 15 17 43 ... $ LBXSBU : num 28 31 14 11 31 10 11 8 17 10 ... $ LBDSBUSI: num 10 11.1 5 3.9 11.1 3.6 3.9 2.9 6.1 3.6 ... $ LBXSCH : num 265 133 131 223 218 233 176 212 215 170 ...

  10. Variabl e tracking :

  11. Or from the NHANES mailing list (citation OK: stetzer@uwm.edu): https://pantherfile.uwm.edu/stetzer/www/NHANES_filelocator_20100414.xlsx

  12. Or from the NHANES mailing list (citation OK: stetzer@uwm.edu): https://pantherfile.uwm.edu/stetzer/www/NHANES_filelocator_20100414.xlsx

  13. > with(merge9908[merge9908$LBDHCV!=1,], plot(hcv.scr3, LBXSATSI, ylab="ALT", cex=0.1, col=LBDHCV, xlim=c(-5,5), ylim=c(0,300))) > with(merge9908[merge9908$LBDHCV==1,], points(hcv.scr3, LBXSATSI, col=LBDHCV)) > abline(v=-0.8926, lty=4) > abline(h=49, lty=4) > title(main="NHANES 1999-2008: Compare ALT and New HCV.scr for ability\n to classify HCV positives (open circles): 90th%ile lines") > text(-3.5,200, labels="46 HCV cases w/ \nHCV.sc1 low, ALT high") > text(3.5,10, labels="72 HCV cases w/ \nHCV.sc1 high, ALT low") > with(merge9908[merge9908$LBDHCV==1, ], table(HCVscr=hcv.scr1>5.2, ALT= LBXSATSI > 45)) ALT HCVscr FALSE TRUE FALSE 147 42 TRUE 68 119 > save(merge9908, HCVHIV, build.detect, build.detect2, ddH, file="HCVApr2010.Rdta")

  14. Now look at ... what motivated this effort. Not from NHANES , but rather from an insurance derived dataset

  15. Assessment: Is the HCV Score also an HCV Mortality Score? • Predicts mortality in both HCV positive and HCV negative applicants • Much better at predicting mortality in HCV positive

  16. Not from NHANES , but rather from an insurance derived dataset # require(rms) # ... prior creation of Cox model with interacting rcs() spline terms > bplot( Predict(HCVALT.mort.test2, hcv.scr2=seq(-5,2,by=0.1), ALT=seq(0, 120, by=1), hepC3="Negative", fun=exp), xlab="HCV score (revised)", ylab="ALT (mIU/mL)", main= "Mortality: HCV-score x ALT in HepC='Negative': \n Age-Sex-Smoking-ALT Adjusted", lwd=3, scales=list(alternating=FALSE)) > # 3way.HCVneg.rcs(hcv-scr.ALT).pdf

  17. Assessment: Is ALT also an HCV Mortality Score when used in combination? • Well, yes, er, um, ... it is, but ... • It has extremely paradoxical risk relationship when used in combination with the HCV Score • BUT … as everyone should know, plotting marginal effects when these effects are included in interactions is just statistically wrong!

Recommend


More recommend