Improved Gene Ontology Annotation Predictions through Bayesian Network Post-processing


  1. Improved Gene Ontology Annotation Predictions through Bayesian Network Post-processing
     Marco Tagliasacchi, Marco Masseroli
     Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy

  2. Improved GO Annotation Predictions through Bayesian Network Post-processing: Summary
       • Motivation
       • Related work
       • Problem statement and goal
       • SVD method
       • Bayesian network method
       • Evaluation results
       • Conclusions
     BITS 2009, Genova, 18-20 March 2009

  3. Motivation
     Several controlled vocabularies and ontologies are available and used to functionally annotate genes and proteins
       • Gene Ontology is the most widely used
         – Biological processes
         – Molecular functions
         – Cellular components
     Controlled annotations are paramount to:
       • Support biological interpretation of experimental results
       • Derive new biomedical knowledge

  4. Motivation
     Annotation issues:
       • Not exhaustive
         – Only a subset of the genes and proteins of sequenced organisms is known and annotated
       • Incomplete annotations
         – Biological knowledge is yet to be discovered
       • Incorrect annotations
         – Possibly those inferred from electronic annotations
       • Only a few reliable annotations
         – Obtained by time-consuming human curation
     Computational methods are extremely useful to:
       • Reliably predict annotations
       • Provide prioritized lists of predicted annotations to be checked by curators

  5. Related work
     Prediction of annotation profiles has been addressed in the literature:
       • Methods based on existing annotations:
         – Decision trees/Bayesian networks [King et al., 2003]
         – Singular value decomposition (SVD) [Khatri et al., 2005]
         – k-NN classifiers [Tao et al., 2007]
         – ...
       • Methods based on other information sources:
         – Microarray data [Barutcuoglu et al., 2006]
         – Mined textual information [Raychaudhuri et al., 2002], [Perez et al., 2004]
         – ...
       • For a survey: Pandey et al., "Computational approaches for protein function prediction: A survey" (2006)

  6. Problem statement and goal
     Propose a post-processing method to be applied to the output of the SVD method [Khatri et al., 2005]
     Fix the issue of anomalous predictions of ontological annotations:
       • A gene might be predicted as annotated to an ontology term, but not to one of its ancestors
     [Figure: output score of the SVD method for each term of a GO branch, with an anomalous prediction highlighted. Terms shown:]
       GO:0003674 Molecular function
       GO:0005215 Transporter activity
       GO:0022857 Transmembrane transporter activity
       GO:0022804 Active transmembrane transporter activity
       GO:0015291 Secondary active transmembrane transporter activity
       GO:0022891 Substrate-specific transmembrane transporter activity
       GO:0015075 Ion transmembrane transporter activity
       GO:0008509 Anion transmembrane transporter activity

  7. Proposed solution
     Leverage the semantic relationships between ontological terms, as expressed by the ontology structure
     Construct a Bayesian network based on the ontology topology and use the output of SVD as prior evidence
     Produce corrected, anomaly-free annotation profiles
     [Figure: output score of the proposed method for the GO terms from Molecular function down to Anion transmembrane transporter activity]

  8. SVD method
     1. Input: available direct annotations, arranged as a binary matrix A (rows: genes; columns: ontological terms, e.g. GO terms)
     \[
     A = \begin{pmatrix}
     0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 \\
     0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & \cdots & 0 \\
     0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\
     1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots & 0 \\
     \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
     1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0
     \end{pmatrix}
     \]
     [Figure: GO term hierarchy from Molecular function down to Anion transmembrane transporter activity]

  9. SVD method
     2. Annotation unfolding: the direct annotation matrix A is expanded into the unfolded matrix \tilde{A}, in which each gene is also annotated to the ancestors of its directly annotated terms
     [Figure: matrices A and \tilde{A} (rows: genes; columns: ontological terms, e.g. GO terms), shown next to the GO term hierarchy from Molecular function down to Anion transmembrane transporter activity]
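
The unfolding step can be sketched in a few lines of Python. This is only an illustration: the short term names, the parent links in `PARENTS` (one plausible tree over the eight terms named on the slides) and the two genes are assumptions, not data from the presentation.

```python
# Hypothetical parent links over the eight terms named on the slides.
# The exact tree shape is an assumption made for illustration.
PARENTS = {
    "molecular_function": [],
    "transporter": ["molecular_function"],
    "transmembrane": ["transporter"],
    "active": ["transmembrane"],
    "secondary_active": ["active"],
    "substrate_specific": ["transmembrane"],
    "ion": ["substrate_specific"],
    "anion": ["ion"],
}
TERMS = list(PARENTS)  # column order of the annotation matrices


def ancestors(term, parents):
    """Return every ancestor of `term` (parents, grandparents, ...)."""
    seen = set()
    stack = list(parents[term])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def unfold(direct, parents):
    """Add to each gene's direct annotations all ancestors of those terms."""
    unfolded = {}
    for gene, terms in direct.items():
        full = set(terms)
        for t in terms:
            full |= ancestors(t, parents)
        unfolded[gene] = full
    return unfolded


def to_matrix(annotations, genes, terms):
    """Binary matrix: one row per gene, one column per term."""
    return [[1 if t in annotations[g] else 0 for t in terms] for g in genes]


# Made-up direct annotations for two hypothetical genes.
GENES = ["gene_a", "gene_b"]
direct = {"gene_a": {"anion"}, "gene_b": {"secondary_active"}}

A = to_matrix(direct, GENES, TERMS)
A_unfolded = to_matrix(unfold(direct, PARENTS), GENES, TERMS)
```

In this toy example the single direct annotation of `gene_a` to `anion` becomes six annotations after unfolding, one for the term itself and one for each of its ancestors up to the root.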

  10. SVD method
     3. Compute SVD:
     \[ \tilde{A} = U \Sigma V^{T} \]
     4. Compute reduced rank approximation:
     \[ \tilde{A}_k = U_k \Sigma_k V_k^{T} \]
     5. Apply threshold (\tau):
       • If \tilde{A}_k(i,j) > \tau and \tilde{A}(i,j) = 0: predicted new annotation (FP)
       • If \tilde{A}_k(i,j) > \tau and \tilde{A}(i,j) = 1: confirmed annotation (TP)
       • If \tilde{A}_k(i,j) \le \tau and \tilde{A}(i,j) = 0: confirmed no annotation (TN)
       • If \tilde{A}_k(i,j) \le \tau and \tilde{A}(i,j) = 1: annotation to be checked (FN)
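
The four thresholding cases of step 5 can be written as a small function. The sketch below takes the reduced-rank scores as given (it does not compute the SVD), and the matrices and the value of the threshold are made-up numbers chosen only to exercise each case.

```python
def classify(a_tilde, a_k, tau):
    """Label every (gene, term) entry by comparing its score with tau.

    a_tilde: unfolded binary annotation matrix (list of rows)
    a_k:     real-valued reduced-rank scores, same shape as a_tilde
    """
    labels = []
    for known_row, score_row in zip(a_tilde, a_k):
        row = []
        for known, score in zip(known_row, score_row):
            if score > tau:
                # above threshold: confirmed (TP) or newly predicted (FP)
                row.append("TP" if known == 1 else "FP")
            else:
                # at or below threshold: to be checked (FN) or confirmed absent (TN)
                row.append("FN" if known == 1 else "TN")
        labels.append(row)
    return labels


# Made-up example: 2 genes x 3 terms.
a_tilde = [[1, 1, 0],
           [1, 0, 0]]
a_k = [[0.9, 0.3, 0.6],
       [0.8, 0.1, 0.2]]
labels = classify(a_tilde, a_k, tau=0.5)
```

The entries labelled `FP` are the ones of interest for prediction: they are not in the known annotations but score above the threshold, so they are the candidates to hand to curators.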

  11. Anomalous predictions
     The output of the SVD method might contain anomalous predictions
     The real-valued output of the SVD method might be such that:
     \[ \tilde{A}_k(i,j) > \tilde{A}_k(i,r) \]
     where r is an ancestor of j
     After thresholding, term j might result annotated to gene i, while term r is not
     [Figure: output score of the SVD method, with an anomalous prediction highlighted]
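
A check for this condition on one gene's score profile might look as follows. The three-term chain, its names and the scores are hypothetical and serve only to trigger one anomaly.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` in the hierarchy."""
    seen = set()
    stack = list(parents[term])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def find_anomalies(scores, parents):
    """Return (term j, ancestor r) pairs where score(j) > score(r)."""
    pairs = []
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            if score > scores[anc]:
                pairs.append((term, anc))
    return sorted(pairs)


# Hypothetical chain root -> transporter -> ion, with made-up scores for one gene.
parents = {"root": [], "transporter": ["root"], "ion": ["transporter"]}
scores = {"root": 0.9, "transporter": 0.4, "ion": 0.7}

anomalies = find_anomalies(scores, parents)

# With a threshold of 0.5, "ion" would be annotated while its ancestor is not.
tau = 0.5
annotated = {t for t, s in scores.items() if s > tau}
```

Here `ion` outscores its ancestor `transporter`, so after thresholding at 0.5 the gene would be annotated to `ion` and `root` but not to the term in between.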

  12. Bayesian network method
     Design a Bayesian network to remove anomalous predictions
       • Input: real-valued scores computed by the SVD method
       • Output: anomaly-free real-valued scores
     Bayesian network structure based on ontology topology
       • Term nodes
       • Evidence nodes
     Need to define conditional probabilities
     [Figure: network fragment with term node t_j and its evidence node e_j, and term nodes t_{c_1}, t_{c_2}, ..., t_{c_L} with their evidence nodes e_{c_1}, e_{c_2}, ..., e_{c_L}]

  13. Bayesian network method
     For each gene i:
     Term nodes (t-nodes) conditional probabilities
     \[ p(t_j \mid t_{c_1}, t_{c_2}, \ldots, t_{c_L}) \]
     Estimated from available annotations
     [Figure: network fragment with term node t_j, evidence node e_j, and nodes t_{c_1}, ..., t_{c_L} with evidence nodes e_{c_1}, ..., e_{c_L}]
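
One simple way to estimate such a table from available annotations is by relative frequencies over genes. The slide does not say which estimator is used, so the counting scheme below (with no smoothing) is an assumption, and the rows are made-up unfolded annotations for a term t_j with two child terms.

```python
from collections import defaultdict


def estimate_cpt(rows, j, children):
    """Estimate p(t_j = 1 | t_c1, ..., t_cL) by relative frequencies.

    rows:     binary annotation rows, one per gene
    j:        column index of term t_j
    children: column indices of the terms t_c1, ..., t_cL
    Returns a dict mapping each observed configuration of the
    conditioning terms to the fraction of genes with t_j = 1.
    """
    counts = defaultdict(lambda: [0, 0])  # config -> [genes seen, genes with t_j = 1]
    for row in rows:
        config = tuple(row[c] for c in children)
        counts[config][0] += 1
        counts[config][1] += row[j]
    return {config: ones / n for config, (n, ones) in counts.items()}


# Made-up unfolded annotations; columns are [t_j, t_c1, t_c2].
rows = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
cpt = estimate_cpt(rows, j=0, children=[1, 2])
```

Because the rows are unfolded, every configuration with at least one of the conditioning terms set to 1 gives probability 1 for t_j, while the all-zero configuration gives 1/3 in this toy data. Configurations never observed are simply absent from the table, which a practical estimator would need to handle.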

  14. Bayesian network method
     Evidence nodes (e-nodes) conditional probabilities:
       • Gaussian Mixture Model (estimated from available <t_j, e_j> pairs)
     [Figure: network fragment with term node t_j, evidence node e_j, and nodes t_{c_1}, ..., t_{c_L} with evidence nodes e_{c_1}, ..., e_{c_L}]
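
To make the role of the evidence nodes concrete, the sketch below assumes that p(e_j | t_j) is modeled with one one-dimensional Gaussian mixture per value of t_j, and combines the two with a prior via Bayes' rule for a single isolated term. The mixture parameters and the prior are invented for illustration; in the method they would be estimated from the <t_j, e_j> pairs, and inference would run over the whole network rather than one node.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gmm_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components = [(weight, mean, std), ...]."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


# Hypothetical mixtures (weights sum to 1) for the SVD score e_j
# given t_j = 0 and given t_j = 1. Not estimated from real data.
GMM_T0 = [(0.7, 0.05, 0.05), (0.3, 0.30, 0.10)]
GMM_T1 = [(0.6, 0.80, 0.10), (0.4, 0.50, 0.15)]


def posterior_t1(e, prior_t1=0.5):
    """p(t_j = 1 | e_j = e) by Bayes' rule, for one term taken in isolation."""
    num = prior_t1 * gmm_pdf(e, GMM_T1)
    den = num + (1.0 - prior_t1) * gmm_pdf(e, GMM_T0)
    return num / den


high = posterior_t1(0.9)   # score typical of annotated pairs
low = posterior_t1(0.05)   # score typical of non-annotated pairs
```

With these invented parameters a high SVD score pushes the posterior towards annotation and a low score pushes it away, which is the behaviour the evidence nodes are meant to contribute to the network.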
