
A19 Research Internship Results



  1. A19 Research Internship Results
     Charlie Cloutier-Langevin & Julien Corriveau-Trudel, Université de Sherbrooke
     Tuesday, December 10th 2019
     Supervisors: Félix Camirand Lemyre, Alan A. Cohen, Nancy Presse
     Collaborators: Véronique Legault, Valérie Turcot, Alistair Senior

  2. Introduction
     Context: a new approach to studying aging through physiological dysregulation (Phys. Dys.), measured with the Mahalanobis distance [4][5][9], and the advent of the NuAge dataset.
     Task: study the potential relationship between an individual's nutrient intake and the deviance of his or her biological profile from a reference population.

  3. NuAge dataset
     The NuAge dataset in numbers:
     - 1 754 elderly women and men, aged 68 to 81;
     - 6 586 visits, between 1 and 4 visits per person;
     - 23 186 24-hour recalls, 1 to 3 recalls per timepoint, over 5 timepoints;
     - 188 medical variables and 43 nutritional variables;
     - 364 421 missing values out of 1 238 168 entries (29.4%).
     Each year, a set of biological, nutritional, functional, medical, and social traits is measured for each participant.

  4. Physiological systems considered
     Five physiological systems are considered: (1) oxygen transport, (2) liver/kidney functions, (3) hematopoiesis, (4) micronutrients, and (5) lipids.
     The assignment of biomarkers to systems comes from a previous study on how to group these biomarkers and on the effects of using different subsets of biomarkers [4]. A global Phys. Dys. score was also computed as the sum of the Phys. Dys. values of all systems.
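
Slides 2 and 4 describe Phys. Dys. as a Mahalanobis distance from a reference population. It is computed per system, and the system values are summed into a global score. Below is a minimal NumPy sketch of that computation. It assumes the mean vector and covariance matrix are estimated on the reference population only, and that each system is a list of biomarker column indices. The names `mahalanobis`, `global_phys_dys` and `systems` are hypothetical, not taken from the study. The deck does not say whether the distance D or D² is used, or whether it is transformed further; this sketch returns D.

```python
import numpy as np

def mahalanobis(X, reference):
    """Mahalanobis distance D of each row of X from the centroid of `reference`.

    Both arrays are (n_individuals, n_biomarkers). The mean vector and the
    covariance matrix are estimated on the reference population only.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    reference = np.atleast_2d(np.asarray(reference, dtype=float))
    mu = reference.mean(axis=0)
    # atleast_2d keeps single-biomarker systems working (np.cov returns a 0-d array there)
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    cov_inv = np.linalg.inv(cov)
    diff = X - mu
    # Row-wise quadratic form: diff_i^T cov^{-1} diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative rounding errors

def global_phys_dys(X, reference, systems):
    """Hypothetical helper: sum of the per-system Mahalanobis distances.

    `systems` maps a system name to the column indices of its biomarkers.
    """
    X = np.asarray(X, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return sum(mahalanobis(X[:, cols], reference[:, cols]) for cols in systems.values())

# Usage on simulated data (not NuAge data)
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 5))                    # simulated reference population
X = rng.normal(size=(10, 5))                             # simulated individuals to score
systems = {"system_A": [0, 1], "system_B": [2, 3, 4]}    # hypothetical grouping
scores = global_phys_dys(X, reference, systems)
print(scores.round(2))
```

Using the inverse covariance means that correlated biomarkers within a system are not double-counted. The same mechanism also makes the distance sensitive to the non-normality discussed in the next section.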

  5. Table of contents
     1) Transformation to normality
     2) Longitudinal imputation: Intrapolation, Extrapolation, Results
     3) Clustering
     4) Measurement error and regression: Additive Error Model, CoCoLasso, Deconvolution, Nonparametric Regression
     5) Conclusion

  6. Transformation to normality

  7. Transformation to normality
     Many statistical methods assume normality and perform better when the data are close to normal.
     Classic transformations: for example the "best transform" provided by Véronique Legault, using functions such as sqrt(), log(), exp().
     Parametric transformation methods provide an accurate and simplified process for choosing a transformation: Box-Cox [1], Yeo-Johnson [12], Manly [8].

  8. Box-Cox transformation
     Best transformation? … ⇒ Box-Cox transformation!
     Box-Cox transformation (Box & Cox, 1964):
     - parametric power transformation;
     - requires strictly positive observations;
     - λ ∈ [-5, 5].
     Defined as:
     $$ y^{(\lambda)} := \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0, \\ \ln(y), & \text{if } \lambda = 0. \end{cases} $$
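
Slide 8 gives the transformation but not how λ is chosen. A standard criterion is to maximize the profile log-likelihood under a normal model for the transformed data:

l(λ) = -(n/2) ln σ̂²(λ) + (λ - 1) Σ ln y_i

Below is a minimal pure-Python sketch that does this with a grid search over the slide's range λ ∈ [-5, 5]. The grid step of 0.02 and the function names are assumptions for illustration. This is not necessarily the procedure or software used in the internship.

```python
import math
import random

def boxcox(y, lam):
    """Box-Cox transform of strictly positive values y with parameter lam."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:              # the lambda = 0 branch: natural log
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

def boxcox_loglik(y, lam, sum_log_y):
    """Profile log-likelihood of lam, assuming the transformed data are normal."""
    n = len(y)
    z = boxcox(y, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n        # maximum-likelihood variance
    # (lam - 1) * sum(log y) is the log-Jacobian of the transformation
    return -0.5 * n * math.log(var) + (lam - 1) * sum_log_y

def best_lambda(y, lo=-5.0, hi=5.0, step=0.02):
    """Grid search for the lam in [lo, hi] that maximizes the profile log-likelihood."""
    sum_log_y = sum(math.log(v) for v in y)
    steps = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 10) for i in range(steps + 1)]  # includes exactly 0.0
    return max(grid, key=lambda lam: boxcox_loglik(y, lam, sum_log_y))

# Simulated right-skewed (log-normal) data: the estimate should land near 0, i.e. log
rng = random.Random(42)
y = [math.exp(rng.gauss(0, 0.5)) for _ in range(1000)]
lam_hat = best_lambda(y)
print(f"estimated lambda = {lam_hat:.2f}")
```

In practice a library routine such as SciPy's `scipy.stats.boxcox` performs the same maximization with a continuous optimizer instead of a grid.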

  9. Shifted Box-Cox transformation
     Since the Box-Cox transformation requires strictly positive data, Box & Cox proposed a shifted modification:
     - parametric power transformation with λ1 ∈ [-5, 5];
     - new shifting parameter λ2, chosen so that y_i + λ2 > 0 for every observation.
     Defined as:
     $$ y_i^{(\lambda)} := \begin{cases} \dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \text{if } \lambda_1 \neq 0, \\ \ln(y_i + \lambda_2), & \text{if } \lambda_1 = 0. \end{cases} $$
     Remark: shifting the data by λ2 = 1 does not affect the result, because a constant shift does not change the shape of the variable's distribution.
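
The two-parameter form on slide 9 can be sketched in a few lines. The function name `shifted_boxcox` and the example values are hypothetical. The code checks the positivity condition y_i + λ2 > 0 before transforming.

```python
import math

def shifted_boxcox(y, lam1, lam2):
    """Two-parameter Box-Cox: power-transform y + lam2 (which must be > 0) with lam1."""
    shifted = [v + lam2 for v in y]
    if any(v <= 0 for v in shifted):
        raise ValueError("every y_i + lam2 must be strictly positive")
    if abs(lam1) < 1e-12:            # lam1 = 0: natural log of the shifted data
        return [math.log(v) for v in shifted]
    return [(v ** lam1 - 1) / lam1 for v in shifted]

# Values containing zeros cannot go through the plain Box-Cox transform,
# but a shift of lam2 = 1 makes every value strictly positive.
counts = [0.0, 1.0, 3.0, 8.0]       # hypothetical values
print(shifted_boxcox(counts, 0.5, 1.0))   # (sqrt(y + 1) - 1) / 0.5
```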

  10. Transformation examples and comparison
     Let's compare the best transform and Box-Cox on two examples.
     EXAMPLE 1
     Biomarker name: Creatinine
     Name in dataset: CREAT
     Best transform applied: log(x)
     Box-Cox λ applied: λ = -0.5 (equivalent to 1/√x up to an affine rescaling)
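
Why λ = -0.5 corresponds to a reciprocal square root follows directly from the Box-Cox definition on slide 8 with λ = -1/2:

$$ y^{(-1/2)} = \frac{y^{-1/2} - 1}{-1/2} = 2\left(1 - \frac{1}{\sqrt{y}}\right) $$

The transformed value is therefore an increasing affine function of -1/√y. It gives the same distribution shape as 1/√y, up to a sign flip and a rescaling, so normality diagnostics are unaffected by the difference.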

  11. Transformation examples and comparison: Creatinine (CREAT)
     No transformation, histogram: right-skewed.

  12. Transformation examples and comparison: Creatinine (CREAT)
     Best transform, histogram: approaching normality.

  13. Transformation examples and comparison: Creatinine (CREAT)
     Box-Cox transformation, histogram: almost normal.
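
The histogram verdicts above ("right-skewed", "almost normal") can be backed by a number. The moment coefficient of skewness g1 = m3 / m2^(3/2) is about 0 for symmetric data and positive for a long right tail. Below is a small sketch on simulated log-normal values standing in for a right-skewed biomarker; these are not NuAge data.

```python
import math
import random

def sample_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5; > 0 means a longer right tail."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

rng = random.Random(7)
raw = [math.exp(rng.gauss(0, 0.5)) for _ in range(5000)]   # simulated right-skewed values
logged = [math.log(v) for v in raw]
print(f"raw: {sample_skewness(raw):.2f}  log: {sample_skewness(logged):.2f}")
```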

  14. Transformation examples and comparison: Creatinine (CREAT)
     No transformation, Q-Q plot: the points do not follow a straight line.

  15. Transformation examples and comparison: Creatinine (CREAT)
     Best transform, Q-Q plot: the tails do not fit the line.

  16. Transformation examples and comparison: Creatinine (CREAT)
     Box-Cox transformation, Q-Q plot: approaching a straight line.

  17. Transformation examples and comparison: Creatinine (CREAT)
     Shapiro-Wilk [11] normality test comparison (H0: the data come from a normally distributed population):
     - No transformation: p-value < 2.2e-16 ⇒ reject H0
     - Best transform: p-value = 3.844e-15 ⇒ reject H0
     - Box-Cox: p-value = 0.004968 ⇒ reject H0
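
Below is a sketch of the Shapiro-Wilk comparison using SciPy's `scipy.stats.shapiro`, which returns the W statistic and the p-value. It uses simulated log-normal values rather than creatinine measurements and assumes a 5% significance level. The deck's "p-value < 2.2e-16" format suggests R's `shapiro.test` was used; this Python version is a stand-in. With thousands of observations the test detects even small departures from normality, which is consistent with H0 still being rejected after the best transformation.

```python
import math
import random
from scipy import stats

rng = random.Random(2019)
raw = [math.exp(rng.gauss(0, 0.5)) for _ in range(500)]   # simulated right-skewed values
samples = {"no transformation": raw, "log": [math.log(v) for v in raw]}

results = {}
for label, sample in samples.items():
    w, p = stats.shapiro(sample)       # H0: the sample comes from a normal population
    results[label] = (w, p)
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"{label}: W = {w:.4f}, p-value = {p:.3g} -> {decision}")
```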

  18. Transformation examples and comparison
     EXAMPLE 2
     Biomarker name: Weight
     Name in dataset: weight
     Best transform applied: x (no transformation)
     Box-Cox λ applied: λ = 0.1

  19. Transformation examples and comparison: Weight
     No transformation (and best transform), histogram: right-skewed.

  20. Transformation examples and comparison: Weight
     Box-Cox transformation, histogram: almost normal.

  21. Transformation examples and comparison: Weight
     No transformation (and best transform), Q-Q plot: the points do not follow a straight line.

  22. Transformation examples and comparison: Weight
     Box-Cox transformation, Q-Q plot: almost a straight line.

  23. Transformation examples and comparison: Weight
     Shapiro-Wilk [11] normality test comparison (H0: the data come from a normally distributed population):
     - No transformation (and best transform): p-value < 2.2e-16 ⇒ reject H0
     - Box-Cox: p-value = 0.005793 ⇒ reject H0

  24. Transformation examples and comparison: general results
     - 119 continuous biomarker variables were transformed.
     - Average difference in λ1 between the best transform and Box-Cox: 0.7831933.
     - The only biomarker left untransformed by Box-Cox is lipids_tot, compared with 75 untransformed biomarkers under the best transform.
     Limitations
     - In most cases, normality is still rejected.
     - Box-Cox searches for the best power transformation on the NuAge dataset; the selected λ could differ on other datasets.
     - It does not necessarily give the best result for every variable, but it gives the best results overall.

  25. Imputation

  26. What is imputation and why impute data?
     What is imputation? Imputation is the act of substituting missing values, i.e. replacing NAs with plausible values.
     Why impute? More data means a stronger statistical model. After imputation the usable sample is larger than the initial one (data_imputed > data_initial), and a consistent statistical model becomes "stronger" (its estimates converge to the true values) as n → ∞.

  27. Beware naive imputation
     Risk: introducing bias into subsequent statistical estimates.
     Mean imputation: imputing variable X with the mean of its non-missing values attenuates correlations in the data.
     Linear regression imputation: conversely, imputing X with a linear regression on other variables, fitted on the non-missing values of X, strengthens correlations.

  28. Imputation of the NuAge biomarker dataset
     The NuAge biomarker data contain 1 238 168 entries, 364 421 of which are missing.
     In light of our objectives, we take a conservative approach ⇒ only impute what is necessary, without negatively impacting the computation of the Mahalanobis distance (MHBD).
