
Using Graph Theory to Analyze Gene Network Coherence



  1. Using Graph Theory to Analyze Gene Network Coherence
     Francisco A. Gómez-Vela (fgomez@upo.es), Norberto Díaz-Díaz (ndiaz@upo.es), Jesús S. Aguilar, José A. Lagares, José A. Sánchez

  2. Outline
     - Introduction
     - Proposed Methodology
     - Experiments
     - Conclusions

  3. Outline
     - Introduction
     - Proposed Methodology
     - Experiments
     - Conclusions

  4. Introduction: Gene Network
     - There is a need to extract expression patterns and behavioral influences between genes from microarray data.
     - Gene networks (GNs) arise as a visual and intuitive solution for representing gene-gene interactions.
     - They are represented as a graph:
       - Nodes: genes.
       - Edges: relationships among these genes.

  5. Introduction: Gene Network [figure]

  6. Introduction: Gene Network
     - Many GN inference algorithms have been developed as techniques for extracting biological knowledge:
       - Ponzoni et al., 2007.
       - Gallo et al., 2011.
     - They can be broadly classified as (Hecker M, 2009):
       - Boolean networks
       - Information theory models
       - Bayesian networks

  7. Introduction: Gene Network Validation in Bioinformatics
     - Once the network has been generated, it is important to assess its reliability, which indicates the quality of the generated model.
       - Synthetic-data-based validation: this approach is normally used to validate new methodologies or algorithms.
       - Well-known-data-based validation: prior knowledge from the literature is used to validate gene networks.

  8. Introduction: Well-Known Biological Data Based Validation
     - The quality of a GN can be measured by direct comparison between the obtained GN and prior biological knowledge (Wei and Li, 2007; Zhou and Wong, 2011).
     - However, these approaches are not entirely accurate, as they only take direct gene-gene interactions into account for the validation task, leaving aside the weak (indirect) relationships (Poyatos, 2011).

  9. Outline
     - Introduction
     - Proposed Methodology
     - Experiments
     - Conclusions

  10. Proposed Methodology
      - The main features of our method:
        - Evaluate the similarities and differences between gene networks and biological databases.
        - Take the indirect gene-gene relationships into account in the validation process.
        - Use graph theory to evaluate gene networks and obtain different measures.

  11. Proposed Methodology [figure: the biological database and the input network are each converted by the Floyd-Warshall algorithm into distance matrices, $DM_{DB}$ and $DM_{IN}$]

  12. Proposed Methodology [figure: the distance matrices of the biological database and the input network are combined into the coherence matrix CM]
      $CM = |DM_{IN} - DM_{DB}|$

  13. Proposed Methodology: Floyd-Warshall Algorithm
      - This approach is a graph analysis method that finds the shortest paths between all pairs of nodes.

      Distance matrix of the example network (nodes A, B, C, E, F):

             A  B  C  E  F
          A  0  2  1  1  2
          B  2  0  1  1  2
          C  1  1  0  2  1
          E  1  1  2  0  1
          F  2  2  1  1  0
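
The distance matrix on this slide can be reproduced with a short Python sketch of the Floyd-Warshall algorithm. The edge list is an assumption: the slide's network figure is not preserved, so the edges are taken to be the gene pairs at distance 1 in the matrix. The function name `floyd_warshall` and the dict-of-dicts layout are illustrative choices, not the authors' implementation.

```python
from math import inf


def floyd_warshall(nodes, edges):
    """All-pairs shortest path lengths for an undirected, unweighted graph."""
    # Start with 0 on the diagonal, 1 for each edge, infinity elsewhere.
    dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    # Allow each node k in turn to act as an intermediate stop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


nodes = ["A", "B", "C", "E", "F"]
# Assumed edges: the pairs shown at distance 1 in the slide's matrix.
edges = [("A", "C"), ("A", "E"), ("B", "C"), ("B", "E"), ("C", "F"), ("E", "F")]

dm = floyd_warshall(nodes, edges)
for u in nodes:
    print(u, [dm[u][v] for v in nodes])
```

Under that assumed edge list, the printed rows match the matrix above row for row.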

  14. Proposed Methodology: Distance Threshold
      - Distance threshold (δ):
        - It is used to exclude relationships that lack biological meaning.
        - This threshold denotes the maximum distance to be considered relevant in the distance matrix generation process.
        - If the minimum distance between two genes is greater than δ, then no path between the genes is assumed.

  15. Proposed Methodology: Distance Threshold, δ = 1

      Distance matrix of the network before the threshold is applied:

             A  B  C  E  F
          A  0  2  1  1  2
          B  2  0  1  1  2
          C  1  1  0  2  1
          E  1  1  2  0  1
          F  2  2  1  1  0

  16. Proposed Methodology: Distance Threshold, δ = 1

      Distance matrix after the threshold is applied (every distance greater than δ becomes ∞):

             A  B  C  E  F
          A  0  ∞  1  1  ∞
          B  ∞  0  1  1  ∞
          C  1  1  0  ∞  1
          E  1  1  ∞  0  1
          F  ∞  ∞  1  1  0
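
As a hedged illustration of the distance threshold, the sketch below replaces every distance greater than δ with infinity, using the matrix from the slides as input. The helper name `apply_distance_threshold` and the dict-of-dicts layout are assumptions made for this example.

```python
from math import inf


def apply_distance_threshold(dm, delta):
    """Return a copy of dm in which every distance greater than delta is inf."""
    return {
        u: {v: (d if d <= delta else inf) for v, d in row.items()}
        for u, row in dm.items()
    }


genes = ["A", "B", "C", "E", "F"]
rows = [
    [0, 2, 1, 1, 2],
    [2, 0, 1, 1, 2],
    [1, 1, 0, 2, 1],
    [1, 1, 2, 0, 1],
    [2, 2, 1, 1, 0],
]
dm = {u: dict(zip(genes, row)) for u, row in zip(genes, rows)}

thresholded = apply_distance_threshold(dm, delta=1)
for u in genes:
    print(u, [thresholded[u][v] for v in genes])
```

With δ = 1, only direct neighbours keep a finite distance; all entries equal to 2 become infinite.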

  17. Proposed Methodology: Coherence Matrix

      Two distance matrices, $DM_{DB}$ and $DM_{IN}$, are compared: one over genes A, B, C, E, F and one over genes A, B, C, D.

             A  B  C  E  F
          A  0  2  1  2  2
          B  2  0  1  1  2
          C  1  1  0  1  1
          E  2  1  1  0  1
          F  2  2  1  1  0

      In the second matrix, the A-C distance of 3 becomes ∞ once the distance threshold is applied:

             A  B  C  D
          A  0  1  ∞  2
          B  1  0  2  1
          C  ∞  2  0  1
          D  2  1  1  0

      Coherence matrix (CM) over the common genes A, B, C, with $CM = |DM_{IN} - DM_{DB}|$:

             A  B  C
          A  0  1  ∞
          B  1  0  1
          C  ∞  1  0
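
A minimal Python sketch of the coherence matrix computation for this slide follows. It assumes the matrix is restricted to the genes present in both networks, which is what the slide's A, B, C result suggests. The function names are hypothetical, and returning `None` when both distances are infinite is a choice made here, since Python evaluates `inf - inf` as NaN.

```python
from math import inf, isinf


def as_matrix(genes, rows):
    """Build a dict-of-dicts distance matrix from a list of rows."""
    return {u: dict(zip(genes, row)) for u, row in zip(genes, rows)}


def coherence_matrix(dm_a, dm_b):
    """Absolute distance difference over the genes shared by both matrices."""
    common = sorted(set(dm_a) & set(dm_b))
    cm = {}
    for u in common:
        cm[u] = {}
        for v in common:
            x, y = dm_a[u][v], dm_b[u][v]
            if isinf(x) and isinf(y):
                cm[u][v] = None  # no path in either network (assumed marker)
            else:
                cm[u][v] = abs(x - y)  # inf if exactly one path is missing
    return common, cm


dm_1 = as_matrix(
    ["A", "B", "C", "E", "F"],
    [[0, 2, 1, 2, 2], [2, 0, 1, 1, 2], [1, 1, 0, 1, 1], [2, 1, 1, 0, 1], [2, 2, 1, 1, 0]],
)
dm_2 = as_matrix(
    ["A", "B", "C", "D"],
    [[0, 1, inf, 2], [1, 0, 2, 1], [inf, 2, 0, 1], [2, 1, 1, 0]],
)

common, cm = coherence_matrix(dm_1, dm_2)
for u in common:
    print(u, [cm[u][v] for v in common])
```

Because the absolute difference is symmetric, the result does not depend on which matrix comes from the database and which from the input network.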

  18. Proposed Methodology: Obtaining Measures
      - Coherence level threshold (θ):
        - This threshold denotes the maximum coherence level to be considered relevant in the coherence matrix.
        - It is used to obtain well-known indices from the elements $CM_{i,j}$ of the coherence matrix:
          - $CM_{i,j} = |v - y|$ with $0 < v, y < \infty$ and $|v - y| \le \theta$: TP
          - $CM_{i,j} = |v - y|$ with $0 < v, y < \infty$ and $|v - y| > \theta$: FP
          - $CM_{i,j} = |\infty - y|$ ($\alpha$): FN
          - $CM_{i,j} = |\infty - \infty|$ ($\beta$): TN

  19. Proposed Methodology: Coherence Matrix, θ = 3

             A  B  C  D  E
          A  -  1  α  4  7
          B  1  -  β  2  5
          C  α  β  -  1  8
          D  4  2  1  -  1
          E  7  5  8  1  -

  20. Proposed Methodology: Coherence Matrix, θ = 3

      Each element of the coherence matrix is replaced by its index:

             A   B   C   D   E
          A  -   TP  FN  FP  FP
          B  TP  -   TN  TP  FP
          C  FN  TN  -   TP  FP
          D  FP  TP  TP  -   TP
          E  FP  FP  FP  TP  -
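
The labelling above can be sketched in Python as follows. The `ALPHA` and `BETA` markers, the function name `classify`, and the decision to count each unordered gene pair once are assumptions for illustration; the slides do not say how the symmetric matrix is counted.

```python
from collections import Counter

ALPHA = "alpha"  # one distance is infinite: |inf - y|
BETA = "beta"    # both distances are infinite: |inf - inf|


def classify(entry, theta):
    """Map one coherence matrix element to TP, FP, FN or TN."""
    if entry == BETA:
        return "TN"
    if entry == ALPHA:
        return "FN"
    return "TP" if entry <= theta else "FP"


# Upper triangle of the theta = 3 example matrix (each pair listed once).
upper = {
    ("A", "B"): 1, ("A", "C"): ALPHA, ("A", "D"): 4, ("A", "E"): 7,
    ("B", "C"): BETA, ("B", "D"): 2, ("B", "E"): 5,
    ("C", "D"): 1, ("C", "E"): 8,
    ("D", "E"): 1,
}

labels = {pair: classify(value, theta=3) for pair, value in upper.items()}
counts = Counter(labels.values())
print(labels)
print(dict(counts))
```

Under the one-count-per-pair assumption, this example yields 4 TP, 4 FP, 1 FN and 1 TN.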

  21. Outline
      - Introduction
      - Proposed Methodology
      - Experiments
      - Conclusions

  22. Results: Real Data Experiment
      - Input networks were obtained by applying four network inference techniques to the well-known yeast cell cycle expression data set (Spellman et al., 1998):
        - Soinov et al., 2003.
        - Bulashevska et al., 2005.
        - Ponzoni et al. (GRNCOP), 2007.
      - Comparison with well-known data:
        - BioGrid
        - KEGG
        - SGD
        - YeastNet

  23. Results: Real Data Experiment
      - Several studies were carried out using different combinations of threshold values:
        - The distance threshold (δ) and the coherence level threshold (θ) were each varied from one to five, generating 25 different combinations.
      - The results show that the higher the δ and θ values, the greater the noise introduced.
        - The most representative result was obtained for δ = 4 and θ = 1.

  24. Results: Accuracy and F-measure for Soinov, Bulashevska and Ponzoni against BioGrid, KEGG, SGD and YeastNet. Values in extraction order (row and column alignment not recoverable): 0.65 0.79 0.82 0.90 0.27 0.42 0.34 0.50 0.28 0.43 0.58 0.48 0.53 0.69 0.31 0.47 1 1 0.29 0.45 0.50 0.66 1 1
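
The slides report Accuracy and F-measure without giving formulas. The sketch below assumes the standard definitions in terms of TP, FP, FN and TN; that the authors used exactly these definitions is an assumption. The example counts are the ones obtained from the θ = 3 example on slide 20 when each gene pair is counted once.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of gene pairs labelled TP or TN (standard definition)."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Counts from the theta = 3 example, one count per unordered gene pair.
tp, fp, fn, tn = 4, 4, 1, 1
print(round(accuracy(tp, fp, fn, tn), 2))
print(round(f_measure(tp, fp, fn), 2))
```

Both functions return 0.0 when their denominator is zero, a convention chosen here to keep the sketch total.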

  25. Results
      - These results are consistent with the experiment carried out in Ponzoni et al., 2007.
      - In that work, Ponzoni's method compared favorably with the Soinov and Bulashevska approaches.

  26. Outline
      - Introduction
      - Proposed Methodology
      - Experiments
      - Conclusions

  27. Conclusions
      - A new gene network validation framework is presented:
        - The methodology takes into account not only the direct relationships, but also the indirect ones.
        - Graph theory has been used to perform the validation task.

  28. Conclusions
      - Experiments with real data:
        - These results are consistent with the experiment carried out in Ponzoni et al., 2007.
        - In that work, Ponzoni's method compared favorably with the Soinov and Bulashevska approaches.
        - The same behaviour is found in the results obtained here: Ponzoni presents better coherence values than Soinov and Bulashevska in BioGrid, SGD and YeastNet.

  29. Future Work
      - The methodology will be improved:
        - The elements of the coherence matrix will be weighted based on the distance of the gene-gene relationships.
        - A new measure, based on different databases, will be generated.
      - Moreover, a Cytoscape plugin will be implemented.

  30. Some References
      - Pavlopoulos GA, et al. (2011) Using graph theory to analyze biological networks. BioData Mining 4:10.
      - Asghar A, et al. (2012) Speeding up the Floyd-Warshall algorithm for the cycled shortest path problem. Applied Mathematics Letters 25(1):1.
      - Bulashevska S and Eils R (2005) Inferring genetic regulatory logic from expression data. Bioinformatics 21(11):2706.
      - Ponzoni I, et al. (2007) Inferring adaptive regulation thresholds and association rules from gene expression data through combinatorial optimization learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 4(4):624.
      - Poyatos JF (2011) The balance of weak and strong interactions in genetic networks. PLoS One 6(2):e14598.

  31. Using Graph Theory to Analyze Gene Network Coherence. Thanks for your attention.
