In silico blood genotyping from exome sequencing data

Silvio Tosatto, BioComputing UP, Department of Biology, University of Padova, Italy. URL: http://protein.bio.unipd.it/


  1. In silico blood genotyping from exome sequencing data • Silvio Tosatto • BioComputing UP, Department of Biology, University of Padova, Italy • URL: http://protein.bio.unipd.it/

  2. Today • Personalized genetics has been upon us for some time • How good are we at actually identifying phenotype from a whole genome?

  3. The CAGI Personal Genome Project (PGP) Challenge • Few goals are more pure to genome interpretation than predicting traits from raw sequence (or genotype) data • In this CAGI challenge, phenotypes/traits are predicted for real people with genetic data • Genetic information for 10 individuals from the Personal Genome Project is provided (PGP-10) • Dataset provided by George Church

  4. Personal Genome Project (PGP): predict individuals’ phenotype • Numerical traits: 33. Birth weight (in g); 34. HDL level (in mg/dL) *; 35. LDL level (in mg/dL) *; 36. Triglyceride level (in mg/dL) *; 37. Fasting blood glucose level (in mg/dL); 38. Warfarin dose (in mg); 39. Age at menarche; 40. Annual income (in $)

  6. Blood Groups • Clear genetic cause of phenotypes • Model system for phenotype prediction • Good description in literature • High relevance, especially for blood transfusions (Blood. 2009;114: 248-256)

  7. Example: ABO glycosyltransferase • Amino acid residues differing between blood group A- and B-active transferases, respectively (Arg176Gly; Gly235Ser; Leu266Met; Gly268Ala), are shown with the single-letter code and their positions indicated. • Blood group: ABO; gene: ABO; antigens: A, B, O
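
The four residue differences listed on this slide can be turned into a small lookup. The following is only a minimal sketch under stated assumptions: the helper name `classify_transferase` and the input format (a dict of position to one-letter residue) are hypothetical, and the sketch ignores O alleles and diploid genotypes, so it is not how BOOGIE itself assigns ABO types.

```python
# Residues that differ between the A- and B-active transferases, as listed
# on the slide: Arg176Gly, Gly235Ser, Leu266Met, Gly268Ala.
A_RESIDUES = {176: "R", 235: "G", 266: "L", 268: "G"}
B_RESIDUES = {176: "G", 235: "S", 266: "M", 268: "A"}


def classify_transferase(observed):
    """Classify one protein sequence from its residues at the four positions.

    `observed` maps position -> one-letter amino acid (hypothetical format).
    Returns "A" or "B" only when all four positions agree, else "ambiguous".
    """
    if all(observed.get(pos) == aa for pos, aa in A_RESIDUES.items()):
        return "A"
    if all(observed.get(pos) == aa for pos, aa in B_RESIDUES.items()):
        return "B"
    return "ambiguous"


if __name__ == "__main__":
    print(classify_transferase({176: "G", 235: "S", 266: "M", 268: "A"}))
```

Requiring all four positions to agree is a deliberately strict choice for the sketch; mixed patterns are reported as ambiguous rather than guessed.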

  8. Relevant Blood Types • 10 out of ca. 30 blood groups are relevant for transfusions

     | Blood group | Genes | Antigens |
     |---|---|---|
     | ABO | ABO | A, B, O |
     | RH | RHCE, RHD | D, E, C plus 50 minor |
     | Duffy | DARC | FY(a), FY(b) |
     | Kell | KEL | K1, K2 plus 23 minor |
     | Diego | SLC4A1 | Di(a), Di(b), Wr(a), Wr(b) |
     | Kidd | SLC14A1 | Jk(a), Jk(b) |
     | Lewis | FUT3 | a, b |
     | Lutheran | BCAM | Lu(a), Lu(b) plus 15 minor |
     | MNS | GYPA, GYPB, GYPE | M, N, S plus 40 minor |
     | Bombay | FUT1, FUT2 | H, secretor |

  9. BOOGIE: BlOOd Group IdEntifier • A knowledge-based system to predict blood groups from sequencing data • All 10 groups relevant for blood transfusions are predicted • A specialized genotype-phenotype knowledge base is required

  10. BOOGIE: Knowledge representation • Stored in a tree-like structure • Rules expressed in “if <mutation(s)> then <phenotype(s)>” form
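
To make the "if <mutation(s)> then <phenotype(s)>" idea concrete, here is a minimal sketch of a rule store and matcher. Everything in it is an assumption for illustration: the `Rule` class, the blood group -> gene -> rules nesting, and the placeholder names (`GROUP_X`, `GENE_1:mutA`, and so on) are hypothetical and are not taken from the BOOGIE knowledge base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    mutations: frozenset  # every mutation listed here must be observed
    phenotypes: tuple     # phenotype labels asserted when the rule fires


# Tree-like layout: blood group -> gene -> list of rules.
# All entries are hypothetical placeholders, not curated rules.
KNOWLEDGE = {
    "GROUP_X": {
        "GENE_1": [
            Rule(frozenset({"GENE_1:mutA"}), ("antigen X1+",)),
            Rule(frozenset({"GENE_1:mutA", "GENE_1:mutB"}), ("antigen X2-",)),
        ],
    },
    "GROUP_Y": {
        "GENE_2": [Rule(frozenset({"GENE_2:mutC"}), ("antigen Y1+",))],
    },
}


def predict(knowledge, observed_mutations):
    """Return {blood group: [phenotypes]} for every rule whose mutations are all observed."""
    observed = frozenset(observed_mutations)
    predictions = {}
    for group, genes in knowledge.items():
        fired = []
        for rules in genes.values():
            for rule in rules:
                if rule.mutations <= observed:  # subset test: "if <mutation(s)>"
                    fired.extend(rule.phenotypes)  # "then <phenotype(s)>"
        predictions[group] = fired
    return predictions


if __name__ == "__main__":
    print(predict(KNOWLEDGE, {"GENE_1:mutA", "GENE_1:mutB"}))
```

A real system would also need a policy for conflicting or more specific rules; this sketch simply reports every rule that fires.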

  11. BOOGIE: Knowledge collection • Covers the blood groups, genes and antigens tabulated on slide 8 • Manually curated • 580 rules derived

  12. ANNOVAR (Wang et al., Nucleic Acids Research 2010) • ANNOVAR is used to reduce the SNVs to a manageable number • Input: millions of SNVs • Steps shown: gene-based annotation of variants; select conserved positions; remove unrelated genes; relevant variants • Output: few relevant SNVs
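
The "remove unrelated genes" step can be sketched as a simple filter over already annotated variants. This is not ANNOVAR's interface: the record format (dicts with `gene` and `change` keys), the function name `keep_relevant`, and the example variants are assumptions made for illustration, and the gene set is a subset of the genes named in the slide 8 table.

```python
# Subset of the blood group genes named in the slide 8 table.
BLOOD_GROUP_GENES = {
    "ABO", "RHCE", "RHD", "DARC", "KEL", "SLC4A1", "SLC14A1",
    "FUT3", "BCAM", "GYPA", "GYPB", "FUT1", "FUT2",
}


def keep_relevant(variants, genes=BLOOD_GROUP_GENES):
    """Keep only variants annotated to one of the genes of interest.

    `variants` is an iterable of dicts with a "gene" key (hypothetical format,
    standing in for the output of a gene-based annotation step).
    """
    return [v for v in variants if v.get("gene") in genes]


if __name__ == "__main__":
    # Made-up example records; the "change" values are placeholders.
    annotated = [
        {"gene": "ABO", "change": "placeholder_1"},
        {"gene": "TP53", "change": "placeholder_2"},
        {"gene": "KEL", "change": "placeholder_3"},
    ]
    print(keep_relevant(annotated))
```

Filtering by gene first keeps the later rule matching cheap, since only a small fraction of the SNVs fall in these genes.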

  13. BOOGIE Pipeline (the slide repeats the blood group, gene and antigen table from slide 8)

  14. Benchmarking • BOOGIE covers all known blood group variants • Difficulty in finding genome sequences with known blood phenotypes • Personal Genome Project (PGP) as annotated benchmark set

  15. Personal Genome Project (PGP) • The mission of the PGP is to encourage the development of personal genomics • Genetic information for 10 individuals from the Personal Genome Project is provided (PGP-10) • A larger dataset (PGP-1K) aims to cover at least 1,000 genomes • Unfortunately, only ABO and Rh blood group information is available

  16. PGP-10 Data • Back row (left to right): James Sherley, Misha Angrist, John Halamka, Keith Batchelder, Rosalynn Gill. Front row (left to right): Esther Dyson, George Church, Kirk Maxey. Not shown: Stan Lapidus and Steven Pinker.

  17. PGP-10 Data

  18. PGP-10 Results • BOOGIE correctly predicts all ABO types and all Rh groups except one (PGP-4)

      | | PGP1 | PGP4 | PGP8 |
      |---|---|---|---|
      | Known | O + | A - | B + |
      | ABO | O | A | B |
      | Rh | c; e; weak D | c; e; weak D | c; e; weak D |
      | Duffy | FY(a+); FY(b-) | FY(a-); FY(b+) | FY(a-); FY(b+) |
      | Kell | K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7- | K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7- | K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7- |
      | Diego | Dib; Memph neg | Dib; Memph neg | Dib; Memph neg |
      | Kidd | Jk(a-); Jk(b+) | Jk(a-); Jk(b+) | Jk(a+); Jk(b-) |
      | Lewis | negative | negative | negative |
      | Lutheran | Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4; Lu8+; Aua+; Aub- | Lu(a-); Lu(b+); Lu6-; Lu9+; Lu4-; Lu8+; Aua-; Aub+ | Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4-; Lu8+; Aua+; Aub- |
      | MNS | M; S | M; s | M; s |
      | Bombay | H+; secretor | H+; secretor | H+; secretor |
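
The comparison of known and predicted types for the three individuals shown on this slide can be sketched as follows. The values are copied from the slide; the helper names `rh_sign` and `compare` are hypothetical, and treating a "weak D" call as Rh positive is an assumption made here so that the comparison matches the slide's statement that only PGP-4 is mispredicted for Rh.

```python
# Known ABO type and Rh sign, and BOOGIE's ABO / Rh calls, as shown on the slide.
KNOWN = {"PGP1": ("O", "+"), "PGP4": ("A", "-"), "PGP8": ("B", "+")}
PREDICTED = {
    "PGP1": ("O", "c; e; weak D"),
    "PGP4": ("A", "c; e; weak D"),
    "PGP8": ("B", "c; e; weak D"),
}


def rh_sign(rh_call):
    """Map an Rh antigen call string to '+' or '-'.

    Assumption: any D antigen call, including "weak D", counts as Rh positive.
    """
    antigens = [a.strip() for a in rh_call.split(";")]
    return "+" if any(a in ("D", "weak D") for a in antigens) else "-"


def compare(known, predicted):
    """Return (correct ABO count, correct Rh count, individuals with wrong Rh)."""
    abo_ok, rh_ok, rh_wrong = 0, 0, []
    for person, (abo, rh) in known.items():
        pred_abo, pred_rh = predicted[person]
        if pred_abo == abo:
            abo_ok += 1
        if rh_sign(pred_rh) == rh:
            rh_ok += 1
        else:
            rh_wrong.append(person)
    return abo_ok, rh_ok, rh_wrong


if __name__ == "__main__":
    print(compare(KNOWN, PREDICTED))
```

Only three of the ten PGP-10 participants appear in the table, so these counts cover that subset only.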

  19. PGP-1K Results • A second dataset was built from all PGP-1K participants with available blood group information, for a total of 22 individuals • This dataset contains microarray data (23andMe SNPs) • Legend: P = predicted; R = real; * = blood group relevant SNPs missing from the dataset

  20. Conclusions • We developed a method, called BOOGIE, to predict the ten blood groups relevant for transfusions from sequencing data – Specialized knowledge base with 580 genotype-to-phenotype rules – Novel variants can be easily considered • Benchmarking was (so far) only possible on PGP data for the ABO and Rh blood groups – The ABO and Rh systems are correctly predicted in 85-100% of cases – The Rh-negative type presents some additional difficulties

  21. Acknowledgements • Manuel Giollo, Giovanni Minervini, Marta Scalzotto (not shown), Emanuela Leonardi, Carlo Ferrari • Funding: FIRB Futuro in Ricerca, Università di Padova, CARIPLO, AIRC • URL: http://protein.bio.unipd.it/
