
Large-scale machine learning for genotype / phenotype association - PowerPoint PPT Presentation



  1. Large-scale machine learning for genotype/phenotype association. Aidan O’Brien (@aydun1), Health Data Analytics 2018, Health and Biosecurity.

  2. By 2025 it is estimated that 50% of the world population will have been sequenced (Frost & Sullivan). [Figure: data acquisition of big-data disciplines in 2025, comparing Genomics, YouTube, Astronomy and Twitter; labels: 20 EB, storage per year. Source: Stephens et al., Big Data: Astronomical or Genomical? (2015)] 2 | Large-scale Machine Learning for Gen-Phen Association | Aidan O’Brien | @aydun1

  3. Understanding disease and finding biomarkers. https://www.projectmine.com/about/

  4. Finding the disease gene(s). [Figure: Gene1 and Gene2, cases versus controls]

  5. Complex diseases are driven by multiple genes. Need an approach to capture feature interactions. [Figure: cases versus controls]

  6. Machine learning on 1.7 trillion datapoints: a genomic profile of 80 million features for each of 22,500 samples (individuals), plus disease status. [Figure: individuals by genomic-profile matrix with a disease-status column; disease genes A, B, C]
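
A quick check of the scale quoted on this slide, as a minimal sketch. The feature and sample counts are the rounded figures from the slide; multiplying them gives about 1.8 trillion, close to the stated 1.7 trillion, so the slide's inputs are presumably rounded. The one-byte-per-genotype storage figure is purely an illustrative assumption, not something the talk states.

```python
# Rounded figures taken from the slide.
n_features = 80_000_000   # variants in each genomic profile
n_samples = 22_500        # individuals in the cohort

# Size of the individuals-by-variants matrix.
n_datapoints = n_features * n_samples
print(f"{n_datapoints:,} genotype calls (~{n_datapoints / 1e12:.1f} trillion)")

# Assumption for illustration only: one byte per genotype call, uncompressed.
bytes_per_call = 1
dense_size_tb = n_datapoints * bytes_per_call / 1e12
print(f"~{dense_size_tb:.1f} TB as a dense matrix")
```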

  7. Machine learning can capture complex features. [Figure: two panels, "Trad. GWAS (logistic regression)" and "Required solution", each showing individuals against genomic profile with the predictive variants marked]
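
To make the contrast on this slide concrete, here is a toy sketch of why testing one variant at a time can miss an interaction. Everything in it is hypothetical: the two-variant cohort, the XOR disease model (an extreme case with no marginal effect at all), and the use of a simple case-rate difference as a stand-in for a single-locus regression test.

```python
from itertools import product

# Hypothetical toy cohort: every combination of two binary variants, 25 individuals each.
genotypes = [combo for combo in product([0, 1], repeat=2) for _ in range(25)]
# Assumed disease model: a pure interaction (XOR) between the two variants.
status = [g1 ^ g2 for g1, g2 in genotypes]

def case_rate(selected):
    """Fraction of cases among the individuals whose genotype satisfies `selected`."""
    picked = [s for g, s in zip(genotypes, status) if selected(g)]
    return sum(picked) / len(picked)

# Single-variant view: case rate in carriers minus case rate in non-carriers.
marginal = [
    case_rate(lambda g, i=i: g[i] == 1) - case_rate(lambda g, i=i: g[i] == 0)
    for i in range(2)
]
print(marginal)  # [0.0, 0.0]

# Joint view: case rate for each combination of the two variants.
joint = {
    combo: case_rate(lambda g, c=combo: g == c)
    for combo in product([0, 1], repeat=2)
}
print(joint)  # {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
```

Each variant on its own shows no difference between carriers and non-carriers, while the pair of variants determines disease status completely. Real data would sit between these extremes.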

  8. Random forest – a collection of decision trees
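
The idea of a random forest as a collection of decision trees can be sketched in a few dozen lines of plain Python. This is a teaching sketch under stated assumptions, not VariantSpark's implementation: features are binary, splits are chosen by Gini impurity, each tree is grown on a bootstrap sample, each split looks at a random subset of features, and the forest predicts by majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, n_try, rng, max_depth, depth=0):
    """Grow one tree on binary features. Returns a label (leaf) or (feature, left, right)."""
    if depth == max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    best = None
    # Each split considers only a random subset of n_try features.
    for f in rng.sample(range(len(rows[0])), n_try):
        left = [i for i, r in enumerate(rows) if r[f] == 0]
        right = [i for i, r in enumerate(rows) if r[f] == 1]
        if not left or not right:
            continue  # this feature does not separate the rows at this node
        score = sum(len(side) * gini([labels[i] for i in side]) for side in (left, right))
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          n_try, rng, max_depth, depth + 1)

    return (f, grow(left), grow(right))

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if row[f] == 1 else left
    return tree

def build_forest(rows, labels, n_trees=25, n_try=2, max_depth=4, seed=0):
    """Grow n_trees trees, each on a bootstrap sample of the individuals."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 n_try, rng, max_depth))
    return forest

def predict_forest(forest, row):
    """Majority vote across all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical toy data: 4 binary variants; the label is an interaction of the first two.
toy_rows = [[a, b, c, d] for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)] * 5
toy_labels = [r[0] & r[1] for r in toy_rows]
toy_forest = build_forest(toy_rows, toy_labels)
accuracy = sum(predict_forest(toy_forest, r) == y for r, y in zip(toy_rows, toy_labels)) / len(toy_rows)
print(f"training accuracy: {accuracy:.2f}")
```

Because each tree splits on one feature after another along a path, a tree can represent a combination of variants that a one-variant-at-a-time test would not see.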

  9. Population-scale genomic data analysis requires BigData solutions
                                   High-performance compute cluster   Hadoop/Spark compute cluster
      Focus                        Compute-intensive                  Data-intensive
      Fault tolerant               No                                 Yes
      Node-bound                   Yes                                No
      Parallelization              100+ CPU                           1000+ CPU
      Parallelization procedure    bespoke                            standardized
      CSIRO solution

  10. Solution: VariantSpark – “wide” machine learning for population-scale cohorts. “Analyzes 3000 individuals with 80M features in 30 minutes” BMC Genomics 2015, 16:1052, PMID: 26651996 (citation=16). [Figure: axes Speed (low to high) and Accuracy (low to high); items shown: VariantSpark, SparkML, MLlib, Spark Core]

  11. VariantSpark – amplifies the association signal
      • Bone Mineral Density (BMD) as the phenotype: 1,936 individuals with 7.2 million variants (imputed from array)
      • Replicates known BMD genes identified by traditional GWAS (single-locus regression)
      • Amplifies signal over traditional methods, so smaller cohorts give robust insights
      More accurate biomarker discovery

  12. Hipster Index: synthetic dataset
      Genome          Hipster?
      1 1 0 1 1 0 1   Y
      1 1 0 0 1 0 1   Y
      0 0 0 0 1 0 1   N
      1 1 1 0 1 0 1   Y
      0 0 0 1 0 1 1   N
      0 0 1 0 1 0 0   N
      0 1 1 0 0 0 0   N
      1 1 1 0 1 1 1   Y
      0 0 0 0 0 0 0   N
      HipsterScore = (2 * B6) + (0.2 * B2) + (1.5 * R1) + (0.1 * C2)   [independent]
                   + (3 * B6 * B2) + (2.5 * R1 * C1)                   [interacting]
                   + noise
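
The HipsterScore formula on this slide can be turned into a small generator for a synthetic cohort. The coefficients and variant names (B6, B2, R1, C1, C2) come from the slide; the rest is assumed for illustration: genotypes are coded 0/1 as in the slide's table, noise is Gaussian with a small standard deviation, and the cut-off that turns a score into Y or N is a hypothetical value, since the slide does not say how the label is derived.

```python
import random

def hipster_score(g, noise=0.0):
    """Score from the slide's formula; g maps variant name -> genotype value."""
    independent = 2 * g["B6"] + 0.2 * g["B2"] + 1.5 * g["R1"] + 0.1 * g["C2"]
    interacting = 3 * g["B6"] * g["B2"] + 2.5 * g["R1"] * g["C1"]
    return independent + interacting + noise

def simulate(n, seed=0, noise_sd=0.1, threshold=4.0):
    """Generate n synthetic individuals as (genotype, score, label) tuples.

    noise_sd and threshold are illustrative assumptions, not values from the talk.
    """
    rng = random.Random(seed)
    names = ["B6", "B2", "R1", "C1", "C2"]
    cohort = []
    for _ in range(n):
        g = {name: rng.randint(0, 1) for name in names}  # assumed 0/1 coding
        score = hipster_score(g, rng.gauss(0, noise_sd))
        cohort.append((g, score, "Y" if score > threshold else "N"))
    return cohort

for genotype, score, label in simulate(5):
    print(genotype, round(score, 2), label)
```

Because the true contributing variants and their interactions are known by construction, a dataset like this lets a method be checked on whether it recovers both the independent and the interacting terms.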

  13. Share research notebooks
      • Databricks
      • AWS EKS
      • Try it on your data
      https://docs.databricks.com/applications/genomics/variant-spark.html

  14. Understanding relationships can lead to clinical applications: Finding Disease Genes, Treating Individuals, Correcting Genomes. CSIRO’s cloud-based solutions. 14 | Innovation In Digital Health - Open Floor Forum | Denis C. Bauer | @allPowerde

  15. Three things to remember
      • Complex diseases need software to detect gene-interactions
      • VariantSpark detects gene-interactions
      • Bringing findings into clinical practice requires new cloud technologies

  16. Let’s build a healthier world together
      Team: Denis Bauer, Arash Bayat, Oscar Luo, Laurence Wilson, Aidan O’Brien, Brendan Hosking, Rob Dunne, Piotr Szul, Natalie Twine, …
      We are hiring… You? …email Denis
      Other slide labels: Collaborators; Software; Lynn Langit; News; Top 10 Australian IT stories of 2017; Keynote
      Aidan O’Brien, CSIRO
