
4. Applications in Computational Biology (Karsten Borgwardt)


  1. 4. Applications in Computational Biology. Karsten Borgwardt, Department Biosysteme. Data Mining 2 Course, Basel, Spring Semester 2016. 231 / 253

  2. 4.1 Co-training for Phenotype Prediction. Based on: Damian Roqueiro, Menno Witteveen, Verneri Anttila, Gisela Terwindt, Arn van den Maagdenberg, Karsten Borgwardt. In silico phenotyping via co-training for improved phenotype prediction from genotype. ISMB 2015, Bioinformatics (2015) 31 (12): i303-i310.

  6. Goal
     - Construction of a genotype classifier h (predicting phenotype from genotype)
     - Important implications for disease diagnosis and therapy
     - Yet h relies on a training dataset with labeled examples
     - Genotype data is increasingly available
     - Disease phenotypes are not available for enough of the genotyped samples
     - Can we boost the performance of a classifier when few labeled examples are available? → Use co-training

  19. Co-training (Blum & Mitchell, 1998)
      - Dataset D with two kinds of data: labeled ($L$) and unlabeled ($U$)
      - $X$: the set of features
      - $X$ is the union of two feature subsets (views) $X_1$ and $X_2$
      - A labeled object: $x = ((x_1, x_2), y)$
      - Learn classifiers $h_1$ and $h_2$, one on each view of $L$
      - Iteratively:
        - use $h_1$ to label instances in $U$ and add them to $L$
        - use $h_2$ to label instances in $U$ and add them to $L$
        - repeat until $U = \emptyset$ (or another stopping condition holds)
      - Two requirements:
        - $x_1$ and $x_2$ should be conditionally independent of each other given $y$
        - $X_1$ and $X_2$ are each sufficient on their own to train a classifier ($h_1$ or $h_2$, respectively) that classifies the data points in D

  20. Proposed approach: apply co-training to a migraine dataset
      - Dutch cohorts, 1,938 patients
      - Two disease phenotypes:
        - migraine with aura (820)
        - migraine without aura (1,118)
      - Data available for each patient:
        - disease phenotype (aura vs. no aura)
        - clinical covariates (e.g. pulsating quality?)
        - genotype data: single nucleotide polymorphisms (SNPs)

  22. Assumption: implicit price tag of data
      - Disease phenotype (diagnosis)
      - Clinical covariates (results of tests)
      - Genotype data (DNA sequencing)
      - Source: http://www.flaticon.com/authors/freepik

  23. Decaying cost of sequencing/genotyping
      - Source: National Human Genome Research Institute, http://www.genome.gov/
      - Cost of genotyping (array): ~$110 per sample (HumanOmniExpress-24 BeadChips, 713,014 markers)
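
The co-training loop from the Co-training slide (learn one classifier per view of L, then let the two classifiers take turns labeling instances of U and adding them to L) could be sketched in Python roughly as follows. This is only an illustrative sketch under stated assumptions, not the method of Roqueiro et al.: the nearest-centroid base learner, the distance-margin confidence score, the `per_round` and `max_rounds` parameters, and the deterministic toy data are all hypothetical choices made here so the loop can be run without external libraries.

```python
from statistics import mean


def fit_centroids(rows, labels):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        model[c] = [mean(col) for col in zip(*members)]
    return model


def predict(model, row):
    """Return (label, margin): nearest centroid and the gap to the runner-up."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, cen)), c)
        for c, cen in model.items()
    )
    (d0, c0), (d1, _) = dists[0], dists[1]
    return c0, d1 - d0


def co_train(L, U, per_round=2, max_rounds=50):
    """L: list of ((x1, x2), y); U: list of (x1, x2). Returns (h1, h2, L)."""
    L, U = list(L), list(U)
    for _ in range(max_rounds):
        if not U:  # stop once every instance has been labeled
            break
        for view in (0, 1):  # h1 works on view 0, h2 on view 1
            if not U:
                break
            h = fit_centroids([x[view] for x, _ in L], [y for _, y in L])
            # Rank unlabeled instances by how confident this view's classifier is.
            scored = sorted(
                ((predict(h, x[view]), i) for i, x in enumerate(U)),
                key=lambda t: -t[0][1],
            )
            picked = scored[:per_round]
            # Move the most confident instances from U to L with their predicted labels.
            L.extend((U[i], label) for (label, _), i in picked)
            chosen = {i for _, i in picked}
            U = [x for i, x in enumerate(U) if i not in chosen]
    h1 = fit_centroids([x[0] for x, _ in L], [y for _, y in L])
    h2 = fit_centroids([x[1] for x, _ in L], [y for _, y in L])
    return h1, h2, L


def make_point(k, cls):
    """Toy instance with two views; the two classes are far apart in both views."""
    view1 = (5.0 * cls + 0.1 * k, 5.0 * cls)
    view2 = (5.0 * cls + 0.1 * ((7 * k) % 10),)
    return (view1, view2)


truth = {make_point(k, c): c for c in (0, 1) for k in range(10)}
labeled = [(make_point(0, 0), 0), (make_point(0, 1), 1)]  # one labeled example per class
unlabeled = [make_point(k, c) for c in (0, 1) for k in range(1, 10)]

h1, h2, labeled_final = co_train(labeled, unlabeled)
print(len(labeled_final), "instances labeled after co-training")
```

The toy data is built only to exercise the loop: its two views are well separated and are not meant to satisfy the conditional independence requirement. In the setting of the slides, the two views would presumably correspond to clinical covariates and genotype data, with any classifier that exposes a confidence score taking the place of `fit_centroids` and `predict`.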
