Statistical online learning of large-scale imaging-genetics data


  1. Statistical online learning of large-scale imaging-genetics data. Data Science Meetup Nice - Sophia-Antipolis. Marco Lorenzi, Université Côte d’Azur, Inria Sophia Antipolis, Asclepios Research Project.

  2. William Utermohlen (1933-2007). Self-portraits: 1967, 1996, 1997, 1998, 1999, 2000. 1995: Alzheimer’s disease diagnosis.

  3. Alzheimer’s disease: the most common form of dementia. Memory loss, language problems, apraxia, functionality loss, cognitive impairment, mood alterations. Enormous human and societal cost: the disease with the largest economic impact (Europe and US). Health care and impact on families: 160 billion $ every year worldwide; ~20,000 $ every year in 1998. [Wimo et al., Dement Geriatr Cogn Disord 1998] [Moore et al., J Gerontol B Psychol Sci Soc Sci 1998]

  4. People affected in the world: 26.6 million in 2006. [World map with regional counts in millions; distinct values shown: 3.1, 7.21, 12.63, 1.33, 2.3, 0.23] [Brookmeyer et al., Alzheimers and Dementia 2007]

  5. People affected in the world: 106 million in 2050. “Looming epidemic”. 2017: no cures nor preventive measures. [World map with regional counts in millions; distinct values shown: 16.51, 8.85, 62.85, 6.33, 10.85, 0.84] [Brookmeyer et al., Alzheimers and Dementia 2007]

  6. Urgent need: understanding the disease. Dr. Aloysius “Alois” Alzheimer (1864-1915) and Auguste Deter (1850-1906). Amyloid plaques & neurofibrillary tangles; brain atrophy (normal vs. “Alzheimer’s”). [Kahn et al, PNAS 2007]; source: http://www.alz.org

  7. A story with several actors: sociodemographic, genetics, vascularity, microbiome, … ? [Jack et al, Lancet Neurol 2010; Frisoni et al, Nature Rev Neurol 2010]

  8. Multifactorial processes (Introduction). Disentangling the pathological mechanisms; patient stratification (diagnostic); effective clinical trials; drug discovery. Approaches: Forward (Models → Data): • Targeted • Testing “mechanistic” hypothesis • Difficult to account for several factors. Backward (Data → Models): • Exploring unknown interactions • Based on inferential methods • Generalization and validation. [Lorenzi Marco, IPMC 2017]

  9. A research challenge: data science (statistical learning) + biomedical research (neuroimaging). Combine heterogeneous data and observations for: • Improving the understanding of the disease • Better treatment • Better diagnosis

  10. Joint modeling of brain and genetic data in Alzheimer’s disease. Ingredients: • Data (disease markers) • Algorithms • Databases

  11. Joint modeling of brain and genetic data in Alzheimer’s disease. Ingredients: • Data (disease markers) • Algorithms • Databases

  12. Brain imaging: quantify the brain structure (grey matter, connectivity, brain cortical thickness).

  13. Genetics: identifying meaningful genetic variants (single nucleotide polymorphisms, SNPs) in a population; association with a disease; heritability. Discovering the encoded information. [Novembre et al, Nature, 2008]

  14. Joint modeling of brain and genetic data in Alzheimer’s disease. Ingredients: • Data (disease markers) • Algorithms • Databases

  15. Association between SNPs and brain features, by statistical complexity. Candidate SNP: low. Many SNPs (~10^6, GWAS) with several scalars: high. Many SNPs with many voxel/mesh measures (~10^5): very high. GWAS = genome-wide association studies.
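
To put the “very high” regime in numbers, here is a rough worked example. It assumes one univariate test per SNP-feature pair and a Bonferroni correction at α = 0.05; both are illustrative assumptions, not stated on the slide.

```latex
\[
\underbrace{10^{6}}_{\text{SNPs}} \times \underbrace{10^{5}}_{\text{brain features}} = 10^{11}\ \text{tests},
\qquad
\alpha_{\text{Bonferroni}} = \frac{0.05}{10^{11}} = 5 \times 10^{-13}.
\]
```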

  16. Multivariate association studies: maximizing the joint relationship between genetic variants and brain features. Partial least squares (PLS): $\max_{p,q} \operatorname{Cov}(Xp, Yq)$, with X: N individuals, ~10^6 SNPs; Y: N individuals, ~10^5 brain features. [Liu et al, Front in Neuroinformatics, 2014; Silver et al, NeuroImage 2012; Szymczak et al, Genetic Epidemiology 2009; …]

  17. Multivariate association studies: maximizing the joint relationship between genetic variants and brain features. Partial least squares (PLS): $\max_{p,q} \operatorname{Cov}(Xp, Yq)$, with X: N individuals, ~10^6 SNPs; Y: N individuals, ~10^5 brain features. PLS weights = relative importance, displayed along the chromosomes. [Liu et al, Front in Neuroinformatics, 2014; Silver et al, NeuroImage 2012; Szymczak et al, Genetic Epidemiology 2009; …]

  18. Multivariate association studies: maximizing the joint relationship between genetic variants and brain features. Partial least squares (PLS): $\max_{p,q} \operatorname{Cov}(Xp, Yq)$, with X: N individuals, ~10^6 SNPs; Y: N individuals, ~10^5 brain features. PLS weights = relative importance, displayed along the chromosomes. Pros: overcomes issues of mass univariate analysis • Avoiding independent multiple testing • Exploring SNP-SNP interaction (epistatic effects). Cons: • Overfitting and reproducibility • Computational complexity. [Liu et al, Front in Neuroinformatics, 2014; Silver et al, NeuroImage 2012; Szymczak et al, Genetic Epidemiology 2009; …]
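
The PLS criterion on these slides can be tried on toy data. This is a minimal sketch, assuming X and Y store individuals as rows and that the unit-norm weights p and q are taken as the leading singular vectors of the cross-covariance matrix. That is a standard way to obtain the first PLS component; the slides do not say which solver was used. The function name `pls_first_component`, the toy sizes, and the planted association are hypothetical.

```python
import numpy as np

def pls_first_component(X, Y):
    """Unit-norm weights (p, q) maximizing Cov(X p, Y q), via SVD of the cross-covariance."""
    Xc = X - X.mean(axis=0)                 # center each SNP column
    Yc = Y - Y.mean(axis=0)                 # center each brain-feature column
    C = Xc.T @ Yc / (X.shape[0] - 1)        # cross-covariance, shape (n_snps, n_features)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, 0], Vt[0], s[0]             # SNP weights, feature weights, attained covariance

# Toy data (hypothetical sizes, far smaller than ~10^6 SNPs and ~10^5 features).
rng = np.random.default_rng(0)
N, n_snps, n_feat = 500, 50, 30
X = rng.integers(0, 3, size=(N, n_snps)).astype(float)   # genotypes coded 0/1/2
Y = rng.normal(size=(N, n_feat))
Y[:, 0] += 2.0 * X[:, 3]                    # planted association: SNP 3 drives feature 0

p, q, cov = pls_first_component(X, Y)
top_snp = int(np.argmax(np.abs(p)))         # SNP with the largest PLS weight
```

The absolute values of `p` play the role of the “relative importance” of each SNP. At the scale quoted on the slides, the full cross-covariance matrix would not fit in memory, so this dense SVD is only meant to illustrate the objective.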

  19. Stability assessment: random partitioning of the population (imaging and genetic data) into non-overlapping groups (split-half).

  20. Stability assessment: random partitioning of the population into non-overlapping groups (split-half); extraction of PLS components in each group; PLS weights associated with individual SNPs; partitioning of chromosomes (bin size: 10k).

  21. Stability assessment: random partitioning of the population (imaging and genetic data) into non-overlapping groups (split-half); extraction of PLS components in each group; PLS weights associated with individual SNPs; top 5%; partitioning of chromosomes (bin size: 10k).

  22. Stability assessment: identification of relevant loci (binarization of each PLS result into a 0/1 vector).

  23. Stability assessment: stable estimator of relevant loci (AND of the binarized PLS results).

  24. Stability assessment: 10^6 iterations; stable estimator of relevant loci (AND of the binarized PLS results).

  25. Stability assessment: 10^6 iterations; stable estimator of relevant loci (AND of the binarized PLS results). Same procedure for the assessment of the brain thickness component at each mesh point.
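
The split-half procedure on the stability-assessment slides can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's implementation: each bin is summarized by its largest absolute PLS weight, bins in the top 5% are set to 1, the two halves are combined with AND, and the result is reported as a selection frequency over iterations. The slides do not specify the per-bin summary or how iterations are aggregated, and all function names, sizes, and defaults here are hypothetical.

```python
import numpy as np

def pls_snp_weights(X, Y):
    """Leading PLS weights on the SNP side (first left singular vector of Xc^T Yc)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, 0]

def binarize_top(weights, bin_size, top_fraction):
    """Summarize each bin by its max |weight|; mark bins in the top fraction as True."""
    n_bins = len(weights) // bin_size
    binned = np.abs(weights[: n_bins * bin_size]).reshape(n_bins, bin_size).max(axis=1)
    return binned >= np.quantile(binned, 1.0 - top_fraction)

def selection_frequency(X, Y, n_iter=20, bin_size=10, top_fraction=0.05, seed=0):
    """Fraction of random split-half iterations in which a bin is selected in BOTH halves."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1] // bin_size)
    for _ in range(n_iter):
        perm = rng.permutation(n)
        half_a, half_b = perm[: n // 2], perm[n // 2:]   # non-overlapping groups
        hit_a = binarize_top(pls_snp_weights(X[half_a], Y[half_a]), bin_size, top_fraction)
        hit_b = binarize_top(pls_snp_weights(X[half_b], Y[half_b]), bin_size, top_fraction)
        counts += hit_a & hit_b                          # AND across the two halves
    return counts / n_iter

# Toy data: 400 individuals, 400 SNPs, 10 features; SNP 123 (bin 12) carries the signal.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(400, 400)).astype(float)
Y = rng.normal(size=(400, 10))
Y[:, 0] += 4.0 * X[:, 123]
freq = selection_frequency(X, Y)
```

Bins that are selected only by chance in one half rarely survive the AND, so their frequency stays low, while the planted bin is selected in every iteration of this toy example.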

  26. A multivariate answer (A. Altmann). [Lorenzi et al., AAIC 2016]

  27. Investigating biological mechanisms through meta-analysis: from the PLS statistical result, take each relevant locus and its proximal areas (+/- 5 kbp); analysis of genomic areas.

  28. Investigating biological mechanisms through meta-analysis: from the PLS statistical result, take each relevant locus and its proximal areas (+/- 5 kbp); analysis of genomic areas; querying gene annotation databases. [McLaren et al. The Ensembl Variant Effect Predictor. Genome Biology, 20]
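
The “+/- 5 kbp” step can be illustrated with a small helper that turns relevant locus positions into genomic windows. This is a hypothetical sketch: the function name, the example positions, and the choice to merge overlapping windows are assumptions, and the actual annotation query (for example through the Ensembl Variant Effect Predictor) is not shown.

```python
def proximal_windows(positions_bp, flank_bp=5_000):
    """Build [pos - flank, pos + flank] windows on one chromosome and merge overlaps."""
    windows = []
    for pos in sorted(positions_bp):
        start, end = max(0, pos - flank_bp), pos + flank_bp
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows

# Hypothetical base-pair positions of relevant loci on a single chromosome.
windows = proximal_windows([120_000, 123_000, 400_000])
# windows == [(115000, 128000), (395000, 405000)]
```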

  29. Investigating biological mechanisms through meta-analysis (S. Wray). 148 SNP-gene combinations; 6 tested tissues: hippocampus, whole blood, adipose subcutaneous, artery tibia, nerve tibial, treated fibroblast. 14 significantly expressed genes, among them TM2D1 (amyloid-beta binding protein), IL10RA (increase in hippocampus in mouse model), TRIB3 (neuronal cell death, modulates PSEN1 stability, interacts with APP). Significance (p-value):

      Gene           Training   Testing
      TM2D1          0.005      0.053
      IL10RA         0.107      0.620
      TRIB3          0.003      0.003
      ZBTB7A         0.036      0.913
      LYSMD4         0.000      0.206
      CRYL1          0.621      0.118
      FAM135B        0.000      0.559
      IP6K3          0.000      0.465
      ITGA1          0.099      0.731
      KIN            0.001      0.206
      LAMC1          0.002      0.062
      LINC00941      0.000      0.690
      RBPMS2         0.000      0.215
      RP11-181K3.4   0.002      0.053

  30. Joint modeling of brain and genetic data in Alzheimer’s disease. Ingredients: • Data (disease markers) • Algorithms • Databases

  31. Large multicentric clinical studies: data for ~100,000 individuals. Challenge: meta-study.

  32. Meta-analysis in genetic studies across centers C_1, C_2, …, C_M. State of the art: analysis of univariate outcomes (p-value, effect size, standard error, …). Cons: • Multiple testing → low statistical power • No SNP-SNP interaction • Limited interpretability. Problem: how to develop multivariate imaging-genetics modeling approaches within a meta-analysis context?
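
For context, this is how per-center univariate outcomes are commonly pooled. It is a minimal sketch of inverse-variance (fixed-effect) meta-analysis for a single SNP; the slide does not name a specific pooling scheme, so the method, the function name, and the example numbers are assumptions.

```python
import math

def fixed_effect_meta(effects, std_errors):
    """Inverse-variance pooling of per-center effect sizes for one SNP."""
    weights = [1.0 / se ** 2 for se in std_errors]        # precision of each center
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided normal p-value
    return pooled, pooled_se, p_value

# Hypothetical effect sizes and standard errors reported by three centers.
pooled, pooled_se, p_value = fixed_effect_meta([0.20, 0.15, 0.30], [0.10, 0.08, 0.15])
```

Each SNP is pooled on its own here, which is why this kind of analysis cannot capture SNP-SNP interaction and inherits the multiple-testing burden listed on the slide.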

  33. Extending meta-analysis for multivariate models: $X Y' = U \Lambda V'$, with N individuals, p SNPs (~10^6, chromosome 1 … chromosome 22) and q brain features (~10^5). With the individuals distributed over centers C_1, C_2, …, C_M: $X Y' = U_1 \Lambda_1 V_1' + U_2 \Lambda_2 V_2' + \dots + U_M \Lambda_M V_M'$.

  34. Extending meta-analysis for multivariate models: Meta PLS; Sequential PLS.

  35. Testing: absolute feature-wise error; mean and sd of dot product. [Lorenzi et al., MASAMB 2016]
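
The idea of combining per-center decompositions, and the two test metrics named on the last slide, can be sketched on toy data. This is a minimal sketch under stated assumptions, not the author's Meta PLS or Sequential PLS: each center shares the SVD factors of its local SNP-by-feature cross-product (written `X_m.T @ Y_m` because rows are individuals here), the factors are summed, and the leading SNP weights are compared with those of the pooled analysis. The data are assumed to be centered globally beforehand, and all names, sizes, and the rank-5 truncation are hypothetical.

```python
import numpy as np

def local_summary(X_m, Y_m, rank=None):
    """What one center shares: (optionally truncated) SVD factors of X_m^T Y_m."""
    U, s, Vt = np.linalg.svd(X_m.T @ Y_m, full_matrices=False)
    if rank is not None:
        U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    return U, s, Vt

def meta_weights(summaries):
    """Sum the per-center factorizations, then take the leading SNP weight vector."""
    total = sum((U * s) @ Vt for U, s, Vt in summaries)
    return np.linalg.svd(total, full_matrices=False)[0][:, 0]

def compare(w_meta, w_pooled):
    """Absolute dot product and absolute feature-wise error, after fixing the sign."""
    dot = float(w_meta @ w_pooled)
    sign = 1.0 if dot >= 0 else -1.0
    return abs(dot), np.abs(sign * w_meta - w_pooled)

rng = np.random.default_rng(2)
X = rng.integers(0, 3, size=(300, 40)).astype(float)
Y = rng.normal(size=(300, 15))
Y[:, 0] += 3.0 * X[:, 7]                      # planted association
X -= X.mean(axis=0)                           # global centering (assumption)
Y -= Y.mean(axis=0)
w_pooled = np.linalg.svd(X.T @ Y, full_matrices=False)[0][:, 0]

# Exact summaries from 3 centers: the summed cross-products equal the pooled one.
centers = list(zip(np.array_split(X, 3), np.array_split(Y, 3)))
exact_dot, exact_err = compare(meta_weights([local_summary(a, b) for a, b in centers]), w_pooled)

# Rank-5 summaries over 10 random partitions: mean and sd of the dot product.
dots = []
for seed in range(10):
    perm = np.random.default_rng(seed).permutation(len(X))
    parts = zip(np.array_split(X[perm], 3), np.array_split(Y[perm], 3))
    w = meta_weights([local_summary(a, b, rank=5) for a, b in parts])
    dots.append(compare(w, w_pooled)[0])
mean_dot, sd_dot = float(np.mean(dots)), float(np.std(dots))
```

With untruncated summaries the meta result matches the pooled one up to floating-point error, so any gap observed with `rank=5` comes from the low-rank compression that each center applies before sharing.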
