PRESENTATION OUTLINE: Adaptation of a parallel Random Jungle (RJ) algorithm, empowered by Coarse-Grained Parallel computing, for the Cloud environment for Genome-Wide Association Studies (GWAS)

Maria Pospelova
School of Computer Science
Carleton University
Ottawa, Canada K1S 5B6
maria.pospelova@carleton.ca
March 17, 2014

1 GWAS
• What is GWAS
• Why GWAS is important: applications
• Personalized medicine
• How GWAS is conducted

2 SNP
• What are SNPs
• How are they collected
• What is the meaning of SNP correlations
• fourth thing

3 Challenges of GWAS
• Data set size
• Dependence of variables (epistasis): amplification and masking effects
• "Genetic hitchhiking"
• Rare variations
• Sample size
• Missing variables: incomplete data sets

4 Current GWAS approaches
• Deterministic
• Nondeterministic

5 Deterministic GWAS
• Example
• Pros/cons

6 Nondeterministic GWAS
• Example
• Pros/cons

7 Random Forest as an example of a nondeterministic approach
• Overview of decision tree methods
• Ensemble learning approach
• Random Forest: history

8 Random Forest
• Main principles
• Algorithm
• Pros: strong points, popularity, and practical applications
• Cons: challenges, especially when facing GWAS problems

9 Random Jungle
• RJ as an extension of RF tailored for GWAS
• Its main features and strengths
• Note: in the MPI implementation, many single messages are passed around

10 Coarse-Grained Model
• Refresher on the concept

11 Project Concept
• Suggest a modification of RJ toward collective communications
• Illustrate the complexity of the source code

12 Cluster run
• About the cluster used
• Results
• Analysis

13 Cloud
• What is the Cloud
• Currently available clouds
• Programming concepts: MapReduce (Hadoop) and MPI

14 StarCluster
• History/origin
• Quick review of the tool
• How to create an MPI cluster with the tool

15 Hadoop
• Alternative RF implementations
• Example: RF on Hadoop

16 Cloud run
• About the cloud used
• Results
• Analysis

17 Conclusion
• Discuss results
• Suggest further steps in research and improvement