NHANES Data: Management with R David Winsemius, MD. MPH Heritage - PowerPoint PPT Presentation

NHANES Data: Management with R David Winsemius, MD. MPH Heritage Laboratories Olathe, KS West Hartford, CT

Acknowledgements and Credits: External • R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, • http://www.R-project.org. • Harrell FE: “Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis.” New York: Springer, 2001. – Alzola CF, Harrell FE: An Introduction to S and the Hmisc and Design Libraries . Freely available electronic book. • The survival Package: Title “Survival analysis, including penalised likelihood.” – Maintainer Thomas Lumley <tlumley@u.washington.edu> – Author: Terry Therneau, original R port by Thomas Lumley 2

http://www.cdc.gov/nchs/nhanes.htm

http://www.cdc.gov/nchs/nhanes/nhanes2007-2008/nhanes07_08.ht m

> library(foreign) #Demographics with subject ID, age > merge0304 <- read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/demo_c.xpt") # Followed by various laboratory values from 2003-2004 cycle > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l02_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l10_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l13_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l16_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l40_c.xpt") ) # and finally append to the cases assembled from prior years > merge0904 <- rbind(merge0902[ , names(merge0902)[ names(merge0902) %in% names(merge0304)] ], + merge0304[ , names(merge0902)[ names(merge0902) %in% names(merge0304)] ])

Option of using Hmisc sasxport.get > require(Hmisc) # I usually type require(rms) > w <- sasxport.get('/Users/davidwinsemius/Documents/Heritage/NHANES-continuous/nchs/ nhanes/nhanes2003-2004/demo_c.xpt') Processing SAS dataset DEMO_C .. > str(w) 'data.frame': 10122 obs. of 43 variables: $ seqn :Class 'labelled' atomic [1:10122] 21005 21006 21007 21008 21009 21010 21011 21012 21013 21014 ... .. ..- attr(*, "label")= chr "Respondent sequence number" $ sddsrvyr:Class 'labelled' atomic [1:10122] 3 3 3 3 3 3 3 3 3 3 ... .. ..- attr(*, "label")= chr "Data Release Number" $ ridstatr:Class 'labelled' atomic [1:10122] 2 2 2 2 2 2 2 2 2 2 ... .. ..- attr(*, "label")= chr "Interview/Examination Status" $ riagendr:Class 'labelled' atomic [1:10122] 1 2 2 1 1 2 1 1 2 1 ... > contents(w) .. ..- attr(*, "label")= chr "Gender - Adjudicated" Data frame:w 10122 observations and 43 variables Maximum # NAs:8678 Labels Storage NAs seqn Respondent sequence number integer 0 sddsrvyr Data Release Number integer 0 ridstatr Interview/Examination Status integer 0 riagendr Gender - Adjudicated integer 0 ridageyr Age at Screening Adjudicated - Recode integer 0 ridagemn Age in Months - Recode integer 223 ridageex Exam Age in Months - Recode integer 692 ridreth1 Race/Ethnicity - Recode integer 0 ridreth2 Linked NH3 Race/Ethnicity - Recode integer 0 dmqmilit Veteran/Military Status integer 4196

> load("/Users/davidwinsemius/Documents/Heritage/merge9906.Rdta") > require(foreign) > #demographics 2007-2008 > merge0304 <- read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/DEMO_E.xpt") > # hep c > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/HEPC_E.xpt") ) > # Glycohemoglobin > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/GHB_E.xpt") ) > # Cholesterol and HDL-Cholesterol > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/TCHOL_E.xpt") ) > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/HDL_E.xpt") ) > #Urinary Albumin and Urinary Creatinine > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/ALB_CR_E.xpt") ) > #Biochemistry Profile > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/BIOPRO_E.xpt") ) > str(merge0304) 'data.frame': 6917 obs. of 85 variables: $ SEQN : num 41475 41477 41479 41481 41482 ... $ SDDSRVYR: num 5 5 5 5 5 5 5 5 5 5 ... $ RIDSTATR: num 2 2 2 2 2 2 2 2 2 2 ... $ RIDEXMON: num 2 2 1 1 2 1 1 1 1 1 ...

> describe(merge0304$LBDHCV) merge0304$LBDHCV n missing unique Mean 6416 501 3 1.990 1 (114, 2%), 2 (6286, 98%), 5 (16, 0%) > merge0708 <- merge0304 > rm(merge0304) >merge9908 <- rbind(merge9906[, names(merge9906)[names(merge9906) %in% names(merge0708)] ], merge0708[, names(merge9906)[names(merge9906) %in% names(merge0708)] ])

> str(merge9908) 'data.frame': 27456 obs. of 57 variables: $ SEQN : num 1000 1003 1005 1009 1010 ... $ SDDSRVYR: num 1 1 1 1 1 1 1 1 1 1 ... $ RIDSTATR: num 2 2 2 2 2 2 2 2 2 2 ... <snip> $ DMDHRAGE: num 53 78 40 39 85 26 49 41 85 64 ... $ DMDHREDU: num 4 5 3 5 3 3 1 2 1 2 ... $ DMDHRMAR: num 1 NA 1 1 1 1 NA 99 1 1 ... $ DMDHSEDU: num 4 NA 3 4 3 3 NA NA 2 3 ... $ WTINT2YR: num 9488 29627 2867 105753 21555 ... $ WTMEC2YR: num 9946 31456 2964 110068 26756 ... $ SDMVPSU : num 2 1 2 1 1 1 1 2 1 2 ... $ SDMVSTRA: num 13 8 3 8 1 7 13 2 11 11 ... $ LBDHCV : num 2 2 2 2 2 2 2 2 2 2 ... $ LBXSAL : num 4.3 4.5 5 4.4 4.1 4.6 5.2 4.9 4.6 4.6 ... $ LBDSALSI: num 43 45 50 44 41 46 52 49 46 46 ... $ LBXSATSI: num 37 19 13 23 12 27 18 12 11 69 ... $ LBXSASSI: num 18 21 19 18 19 25 17 15 17 43 ... $ LBXSBU : num 28 31 14 11 31 10 11 8 17 10 ... $ LBDSBUSI: num 10 11.1 5 3.9 11.1 3.6 3.9 2.9 6.1 3.6 ... $ LBXSCH : num 265 133 131 223 218 233 176 212 215 170 ...

Variabl e tracking :

Or from the NHANES mailing list (citation OK: stetzer@uwm.edu): https://pantherfile.uwm.edu/stetzer/www/NHANES_filelocator_20100414.xlsx

> with(merge9908[merge9908$LBDHCV!=1,], plot(hcv.scr3, LBXSATSI, ylab="ALT", cex=0.1, col=LBDHCV, xlim=c(-5,5), ylim=c(0,300))) > with(merge9908[merge9908$LBDHCV==1,], points(hcv.scr3, LBXSATSI, col=LBDHCV)) > abline(v=-0.8926, lty=4) > abline(h=49, lty=4) > title(main="NHANES 1999-2008: Compare ALT and New HCV.scr for ability\n to classify HCV positives (open circles): 90th%ile lines") > text(-3.5,200, labels="46 HCV cases w/ \nHCV.sc1 low, ALT high") > text(3.5,10, labels="72 HCV cases w/ \nHCV.sc1 high, ALT low") > with(merge9908[merge9908$LBDHCV==1, ], table(HCVscr=hcv.scr1>5.2, ALT= LBXSATSI > 45)) ALT HCVscr FALSE TRUE FALSE 147 42 TRUE 68 119 > save(merge9908, HCVHIV, build.detect, build.detect2, ddH, file="HCVApr2010.Rdta")

Now look at ... what motivated this effort. Not from NHANES , but rather from an insurance derived dataset

Assessment: Is the HCV Score also an HCV Mortality Score? • Predicts mortality in both HCV positive and HCV negative applicants • Much better at predicting mortality in HCV positive

Not from NHANES , but rather from an insurance derived dataset # require(rms) # ... prior creation of Cox model with interacting rcs() spline terms > bplot( Predict(HCVALT.mort.test2, hcv.scr2=seq(-5,2,by=0.1), ALT=seq(0, 120, by=1), hepC3="Negative", fun=exp), xlab="HCV score (revised)", ylab="ALT (mIU/mL)", main= "Mortality: HCV-score x ALT in HepC='Negative': \n Age-Sex-Smoking-ALT Adjusted", lwd=3, scales=list(alternating=FALSE)) > # 3way.HCVneg.rcs(hcv-scr.ALT).pdf

Assessment: Is ALT also an HCV Mortality Score when used in combination? • Well, yes, er, um, ... it is, but ... • It has extremely paradoxical risk relationship when used in combination with the HCV Score • BUT … as everyone should know, plotting marginal effects when these effects are included in interactions is just statistically wrong!

NHANES Data: Management with R David Winsemius, MD. MPH Heritage - PowerPoint PPT Presentation

NHANES Data: Management with R David Winsemius, MD. MPH Heritage Laboratories Olathe, KS West Hartford, CT Acknowledgements and Credits: External R Development Core Team (2009). R: A language and environment for statistical computing. R

Intro to NHANES & Sampling Kaelen Medeiros Product Data Scientist at DataCamp DataCamp

Laurie Haws, PhD, DABT ToxStrategies, Inc Comments to the USEPA Science Advisory Board Dioxin

Fasting Fasting T Time ime in CDC NHANES Richard Stahlhut, MD MPH University of Rochester y

Discrepancies in Self-Report Diabetes Survey Questions y Q using NHANES, NHIS, and CHIS data

Resistant (NHANES) 80 Hypertension 70 60 Percent 50 Awareness 40 Treatment BP control

DataCamp Data Types for Data Science DataCamp Data Types for Data Science Data types Data type

Deepa M. Gopal, MD, MS Assistant Professor of Medicine Cardiovascular Division, Heart Failure

PUBLIC PUBLIC HEAL HEALTH TH GRAN GRAND D ROU OUNDS NDS Aug ugust ust 19 19, , 20 2010

To use this slideshow with an LCD projector, click on Full Screen Mode in the View menu.

Heart failure Complex clinical syndrome caused by any structural or functional impairment

PRESENTS PRESENTS TRUE HOTEL MANAGEMENT SYSTEM TRUE HOTEL MANAGEMENT SYSTEM MANAGEMENT FEATURE

Data Collection and Data Management saverio . giallorenzo @gmail.com 1 Web Science Data

COMP9313: Big Data Management Introduction to Big Data Management What is big data? Tweeted by

Environmental Health Science Data Streams Data Streams Health Data Health Data Brian S.

Diagnose data for cleaning Cleaning Data in Python Cleaning data Prepare data for analysis

CS378 Introduction to Data Mining Data Exploration and Data Preprocessing Li Xiong Data

Disclosure Diagnosis and Novel Management Strategies Grants/Research Support: 2016

Project #12 Unravelling the Primary Photoprotection Pathways in Model DNA systems Dr. Tom Oliver

Overview Information-driven modeling of biomolecular complexes ! Introduction ! Information

Applica'on Accelera'on on Current and Future Cray Pla4orms Alice Koniges, NERSC, Berkeley Lab

Synthesis of unnatural - N -linked glycopeptides with potential antifreeze activity

Methodological problems for the detection of resistant S. aureus: MRSA, VISA Y. Glupczynski

Microbiology Risk factors predictive of ulcers and amputation 1-4 Infections: & Previous

An Improved Approach for Glycan Structure Identification from HCD MS/MS Spectra Weiping Sun, Yi

Explore More Topics

Sambuz

Useful Links

Newsletter

Mail Us

NHANES Data: Management with R David Winsemius, MD. MPH Heritage - PowerPoint PPT Presentation

NHANES Data: Management with R David Winsemius, MD. MPH Heritage Laboratories Olathe, KS West Hartford, CT Acknowledgements and Credits: External R Development Core Team (2009). R: A language and environment for statistical computing. R

Intro to NHANES &amp; Sampling Kaelen Medeiros Product Data Scientist at DataCamp DataCamp

Laurie Haws, PhD, DABT ToxStrategies, Inc Comments to the USEPA Science Advisory Board Dioxin

Fasting Fasting T Time ime in CDC NHANES Richard Stahlhut, MD MPH University of Rochester y

Discrepancies in Self-Report Diabetes Survey Questions y Q using NHANES, NHIS, and CHIS data

Resistant (NHANES) 80 Hypertension 70 60 Percent 50 Awareness 40 Treatment BP control

DataCamp Data Types for Data Science DataCamp Data Types for Data Science Data types Data type

Deepa M. Gopal, MD, MS Assistant Professor of Medicine Cardiovascular Division, Heart Failure

PUBLIC PUBLIC HEAL HEALTH TH GRAN GRAND D ROU OUNDS NDS Aug ugust ust 19 19, , 20 2010

To use this slideshow with an LCD projector, click on Full Screen Mode in the View menu.

Heart failure Complex clinical syndrome caused by any structural or functional impairment

PRESENTS PRESENTS TRUE HOTEL MANAGEMENT SYSTEM TRUE HOTEL MANAGEMENT SYSTEM MANAGEMENT FEATURE

Data Collection and Data Management saverio . giallorenzo @gmail.com 1 Web Science Data

COMP9313: Big Data Management Introduction to Big Data Management What is big data? Tweeted by

Environmental Health Science Data Streams Data Streams Health Data Health Data Brian S.

Diagnose data for cleaning Cleaning Data in Python Cleaning data Prepare data for analysis

CS378 Introduction to Data Mining Data Exploration and Data Preprocessing Li Xiong Data

Disclosure Diagnosis and Novel Management Strategies Grants/Research Support: 2016

Project #12 Unravelling the Primary Photoprotection Pathways in Model DNA systems Dr. Tom Oliver

Overview Information-driven modeling of biomolecular complexes ! Introduction ! Information

Applica'on Accelera'on on Current and Future Cray Pla4orms Alice Koniges, NERSC, Berkeley Lab

Synthesis of unnatural - N -linked glycopeptides with potential antifreeze activity

Methodological problems for the detection of resistant S. aureus: MRSA, VISA Y. Glupczynski

Microbiology Risk factors predictive of ulcers and amputation 1-4 Infections: &amp; Previous

An Improved Approach for Glycan Structure Identification from HCD MS/MS Spectra Weiping Sun, Yi

Explore More Topics

Sambuz

Useful Links

Newsletter

Mail Us

Intro to NHANES & Sampling Kaelen Medeiros Product Data Scientist at DataCamp DataCamp

Microbiology Risk factors predictive of ulcers and amputation 1-4 Infections: & Previous