Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining
Arzucan Özgür, Bogazici University
Junguk Hur, Zuoshuang Xiang, and Yongqun Oliver He, University of Michigan
The VDOSME workshop, ICBO 2012, July 21, 2012
Motivation
Fever
• Fever is a symptom of abnormal elevation of body temperature, usually as a result of a pathologic process.
• Fever-associated genes include PGE2, PLA2, COX-2, PTGES, and many cytokines.
• Many vaccines cause fever, but which fever-related genes vaccination perturbs, and how, is unclear.
• Goal: identify gene-gene and gene-vaccine interaction networks associated with fever processes using ontology-based literature mining.
Workflow
Fever-related literature-derived network
Literature Corpus
• Fever-related articles obtained from PubMed with the query “Fever OR Hyperthermia OR Pyrexia OR Febrile OR Pyrexial” → 179,156 articles
• Vaccine and fever-related articles:
  – including the terms “vaccine”, “vaccination”, and their variants (e.g., “vaccines”) → 6,224 articles
  – additionally including 186 specific vaccine names from VO → 6,537 articles
• Sentences of titles and abstracts obtained from the BioNLP database in the National Center for Integrative Biomedical Informatics (NCIBI; http://ncibi.org/)
Vaccine Ontology Support - Motivating Example
• “These results suggest that the BCG-CWS induces TNF-alpha secretion from DC via TLR2 and TLR4 and that the secreted TNF-alpha induces the maturation of DC per se.” [PMID: 11083809]
  – The term “vaccine” or its variants does not occur in the abstract.
  – Bacillus Calmette-Guérin (BCG) is a licensed tuberculosis vaccine that protects against infection by Mycobacterium tuberculosis.
• We use the Vaccine Ontology for two purposes:
  1) Obtain vaccine-related literature.
  2) Identify specific vaccine-gene interactions.
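The motivating example can be sketched in a few lines of Python. This is only an illustration of why an ontology term list retrieves sentences that a keyword search misses, not the VO-SciMiner implementation: `VO_TERMS` is a tiny hypothetical stand-in for the 186 VO terms, and the whole-word, case-sensitive matching rule is an assumption.

```python
import re

# Hypothetical stand-in for the 186 VO vaccine terms mentioned in the talk.
VO_TERMS = ["BCG", "RB51", "WRSS1"]

# Keyword search: "vaccine", "vaccination", and variants such as "vaccines".
KEYWORD_PATTERN = re.compile(r"\bvaccin\w*", re.IGNORECASE)
# Ontology search: whole-word match against the term list (assumed rule).
VO_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, VO_TERMS)) + r")\b")

def vaccine_evidence(sentence):
    """Return (keyword hits, VO term hits) found in a sentence."""
    return KEYWORD_PATTERN.findall(sentence), VO_PATTERN.findall(sentence)

sentence = ("These results suggest that the BCG-CWS induces TNF-alpha secretion "
            "from DC via TLR2 and TLR4 and that the secreted TNF-alpha induces "
            "the maturation of DC per se.")
keywords, vo_hits = vaccine_evidence(sentence)
print(keywords, vo_hits)
```

The keyword search finds nothing in this sentence, while the term list picks up “BCG”, which is the behavior the slide describes.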
Vaccine Ontology (VO)
• Ontology of the vaccine domain for vaccine data standardization, integration, and analysis. http://www.violinet.org/vaccineontology/
• Classifies a large number of existing vaccines (> 1,000 vaccines) in licensed use, on trial, or in research.
• Follows the OBO Foundry principles.
• Led by Yongqun “Oliver” He (co-author of this paper).
VO Terms Obtained from http://www.ontobee.org/
Gene and Vaccine Name Identification
• Gene names tagged using SciMiner (Hur et al., Bioinformatics, 2009)
  – Dictionary and rule-based system
  – F-score: 76%
  – Genes are reported as official human gene symbols from the HUGO Gene Nomenclature Committee database (http://www.genenames.org/).
• VO-SciMiner used to identify vaccine names based on a set of 186 VO terms (Hur et al., BMC Immunology, 2011)
  – F-score: 95%
Interaction Extraction
• Example sentence: “IL-2 and IL-15 induced the production of IL-17 and IFN-gamma in a dose dependent manner by PBMCs.”
• Path between proteins: good description of the interaction.
• No semantic relation between them: no interaction.
• The Stanford Parser is used to generate the dependency parse trees (de Marneffe et al., 2006).
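A minimal sketch of extracting the dependency path between two proteins. The edge list below is written by hand for the example sentence and is an assumption for illustration; it is not actual Stanford Parser output, and the function names are hypothetical.

```python
from collections import deque

# Hand-written (head, relation, dependent) edges for the example sentence.
# Assumed for illustration only; a real pipeline would take these from a parser.
EDGES = [
    ("induced", "nsubj", "IL-2"),
    ("IL-2", "conj_and", "IL-15"),
    ("induced", "dobj", "production"),
    ("production", "prep_of", "IL-17"),
    ("IL-17", "conj_and", "IFN-gamma"),
]

def build_graph(edges):
    """Undirected adjacency list: node -> list of (neighbor, relation)."""
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))
    return graph

def dependency_path(graph, source, target):
    """Shortest path as alternating words and relation labels, found by BFS."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor, rel in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = (node, rel)
                queue.append(neighbor)
    if target not in parent:
        return None  # the two words are not connected
    path = [target]
    node = target
    while parent[node] is not None:
        prev, rel = parent[node]
        path.extend([rel, prev])
        node = prev
    return path[::-1]

graph = build_graph(EDGES)
path_a = dependency_path(graph, "IL-2", "IL-17")
path_b = dependency_path(graph, "IL-17", "IFN-gamma")
print(" – ".join(path_a))
print(" – ".join(path_b))
```

The first path runs through the verb “induced” and describes an interaction; the second is a bare conjunction and carries no interaction semantics.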
Path Edit Kernel
• Edit distance: the minimum number of operations (insertion, deletion, or substitution of a single word) needed to transform the first string into the second.
• Path 1: IL2 – nsubj – induced – dobj – production – prep_of – IL-17
• Path 2: IL2 – nsubj – induced – dobj – production – prep_of – IL-17 – conj_and – IFN-gamma
• Path 3: IL-17 – conj_and – IFN-gamma
• Edit distance (Path1 -> Path2) = 2 (2 insertions)
• Edit distance (Path1 -> Path3) = 8 (6 deletions + 2 insertions)
• Convert to a similarity function: $\mathrm{EditSim}(p_i, p_j) = e^{-\gamma\,\mathrm{EditDist}(p_i, p_j)}$
• Integrated as a kernel function into the SVM light package (Joachims, 1999).
• 56% F-score on AIMED, 85% F-score on CB (Erkan et al., EMNLP, 2007; Ozgur et al., Journal of Biomedical Semantics, 2011).
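The distance-to-similarity conversion can be sketched as below, using a standard word-level Levenshtein distance with unit costs. The value `gamma=0.5` is an arbitrary assumption (the slide does not give the value used), and the function names are hypothetical; this is not the kernel code plugged into SVM light.

```python
import math

def edit_distance(a, b):
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,           # delete tok_a
                            curr[j - 1] + 1,       # insert tok_b
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]

def edit_sim(a, b, gamma=0.5):
    """EditSim = exp(-gamma * EditDist); gamma=0.5 is an assumed value."""
    return math.exp(-gamma * edit_distance(a, b))

path1 = "IL2 nsubj induced dobj production prep_of IL-17".split()
path2 = path1 + ["conj_and", "IFN-gamma"]

print(edit_distance(path1, path2))
print(edit_sim(path1, path2))
```

Identical paths get similarity 1, and the similarity decays exponentially as more words have to be edited.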
Gene-gene interaction networks
• Gene-gene interactions in all fever-related articles
• Articles containing the term “vaccine” and its variants
• Articles containing the term “vaccine” and its variants + terms in the Vaccine Ontology (VO)
Generic fever-related network
Vaccine/VO-associated fever-related network
Centrality Analysis
Degree Centrality
• The number of nodes a given node is connected to: $k_i = \sum_{j=1}^{n} A_{ij}$, where $A$ is the adjacency matrix
• Measures the extent of influence a node has on the network
• The more neighbors a node has, the more important it is
• In the example graph (nodes x, y, z): degree centrality of x = 5; of y = 2
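A small sketch of the row-sum definition. The graph is a hypothetical toy example, not the figure from the slide; it is chosen only so that x has five neighbors and y has two.

```python
# Hypothetical toy graph (undirected edges); not the graph shown on the slide.
edges = [("x", "a"), ("x", "b"), ("x", "c"), ("x", "d"), ("x", "z"),
         ("z", "y"), ("y", "w")]

nodes = sorted({n for edge in edges for n in edge})
index = {node: i for i, node in enumerate(nodes)}
n = len(nodes)

# Symmetric 0/1 adjacency matrix A.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[index[u]][index[v]] = 1
    A[index[v]][index[u]] = 1

# k_i = sum over j of A_ij (row sum).
degree = {node: sum(A[index[node]]) for node in nodes}
print(degree["x"], degree["y"])
```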
Eigenvector Centrality
• Proportional to the sum of the centralities of the neighbors of a given node: $x_i = \lambda^{-1} \sum_{j=1}^{n} A_{ij} x_j$
• In matrix representation: $\lambda \mathbf{x} = A \mathbf{x}$
• For a non-negative centrality vector: $\lambda$ is the largest eigenvalue of $A$ and $\mathbf{x}$ is the corresponding eigenvector
• Not all neighbors contribute equally to the centrality of a node
• Defined as “prestige” in social networks: the prestige of a person depends not only on how many friends he has, but also on how prestigious those friends are
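One common way to obtain the leading eigenvector is power iteration, sketched below on a hypothetical four-node graph. The graph, the iteration count, and the function name are assumptions; the slides do not say how the centralities were computed. The toy graph contains a triangle so that plain power iteration converges.

```python
import math

# Hypothetical toy graph: triangle a-b-c with a pendant node d attached to c.
ADJ = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on the adjacency matrix; returns (lambda, x)."""
    x = {v: 1.0 for v in adj}
    lam = 0.0
    for _ in range(iterations):
        # One multiplication by A: each node sums its neighbors' scores.
        new = {v: sum(x[u] for u in adj[v]) for v in adj}
        # With x at unit length, the length of A x approaches lambda.
        lam = math.sqrt(sum(val * val for val in new.values()))
        x = {v: val / lam for v, val in new.items()}
    return lam, x

lam, x = eigenvector_centrality(ADJ)
print(round(lam, 3), {v: round(s, 3) for v, s in x.items()})
```

Node c scores highest because it is connected to the other well-connected nodes, while d, whose only neighbor is c, scores lowest.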
Closeness Centrality
• Inverse sum of the geodesic distances from a given node to the other nodes in the network: $\mathrm{closeness}_i = \left[ \sum_{j=1}^{n} d_{ij} \right]^{-1}$
• Geodesic distance $d_{ij}$: length of the shortest path between node i and node j
• The closer a node is to the other nodes, the more important it is
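A minimal sketch of the closeness formula using breadth-first search for the geodesic distances. The path graph is a hypothetical example, and the code assumes an unweighted, connected graph.

```python
from collections import deque

# Hypothetical toy graph: a simple path a - b - c - d - e.
ADJ = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

def geodesic_distances(adj, source):
    """Shortest-path lengths from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, node):
    """Inverse of the summed distances; assumes the graph is connected."""
    dist = geodesic_distances(adj, node)
    return 1.0 / sum(d for other, d in dist.items() if other != node)

scores = {v: closeness(ADJ, v) for v in ADJ}
print(scores)
```

The middle node c has the smallest total distance to the others (1 + 1 + 2 + 2 = 6) and therefore the highest closeness.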
Betweenness Centrality
• For a node i: the sum, over all pairs of nodes, of the proportion of shortest paths passing through i: $\mathrm{Betweenness}_i = \sum_{j<k} g_{jk}(i) / g_{jk}$
• $g_{jk}(i)$: number of geodesics between j and k passing through i
• $g_{jk}$: total number of geodesics between j and k
• Measures the control of a node over the information flow of the network
• A node is important if it is on many geodesics
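The definition can be checked directly on a small graph by counting geodesics with breadth-first search. This is a straightforward sketch for unweighted, undirected graphs, not an optimized algorithm; the toy graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy graph: square a-b-c-d plus a pendant node e attached to a.
ADJ = {"a": ["b", "d", "e"], "b": ["a", "c"], "c": ["b", "d"],
       "d": ["a", "c"], "e": ["a"]}

def bfs_counts(adj, source):
    """Distances from source and number of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Sum over pairs j<k of g_jk(i) / g_jk for every node i."""
    nodes = sorted(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    score = {v: 0.0 for v in nodes}
    for j, k in combinations(nodes, 2):
        dist_j, sigma_j = info[j]
        dist_k, sigma_k = info[k]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in nodes:
            if i in (j, k) or i not in dist_j or i not in dist_k:
                continue
            # i lies on a j-k geodesic iff d(j,i) + d(i,k) = d(j,k).
            if dist_j[i] + dist_k[i] == dist_j[k]:
                score[i] += sigma_j[i] * sigma_k[i] / sigma_j[k]
    return score

scores = betweenness(ADJ)
print(scores)
```

Node a controls every path to e and so scores highest; opposite corners of the square split the two equally short routes between them and receive half a path each.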
Genes that rank high in both networks - 7 genes - well studied in both contexts
Genes that rank high in generic fever network - 7 genes - not well studied in vaccine context
Genes that rank high in vaccine/VO fever network - 7 genes - well studied in vaccine context
Gene Set Enrichment Analysis
Gene Set Enrichment Analysis
• The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used.
• 997 significantly over-represented functional terms (GO or KEGG) in the fever network.
• 239 significantly over-represented functional terms (GO or KEGG) in the VO-associated fever network.
• New scientific hypotheses can be generated (e.g., the role of the phosphorylation process in the vaccine-induced fever response).
Top 10 most significantly enriched biological functions
Values are $-\log_{10}$ of the Benjamini-Hochberg corrected P-values
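For readers unfamiliar with the plotted quantity, the sketch below applies the Benjamini-Hochberg step-up adjustment to a list of raw P-values and converts the result to a $-\log_{10}$ score. The P-values are made up for illustration and are not results from the talk; the talk's values come from DAVID, not from this code.

```python
import math

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up raw P-values for four hypothetical functional terms.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
scores = [-math.log10(p) for p in adjusted]
print(adjusted)
print(scores)
```

Larger scores correspond to smaller corrected P-values, so the most significantly enriched terms have the longest bars in a plot of this kind.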
Gene-Vaccine Interaction Network
Gene-Vaccine Interactions
• SVM pipeline applied to extract gene-vaccine interactions.
• 1,716 articles containing 2,835 interactions identified.
  – 32 articles also related to fever
  – 52 sentences with 44 unique interactions identified.
  – Specific vaccines: Brucella vaccine RB51, Shigella flexneri vaccine S602, Shigella sonnei strain WRSS1, and Shigella dysenteriae 1 strain WRSd1.
• New scientific hypotheses can be generated.
Fever-related gene-vaccine interaction network
• Node colors: green = vaccines; red = genes; blue = genes associated with vaccines
• New hypothesis: e.g., TLR-vaccine interactions in inducing the fever response.
Conclusion
• Identified fever and fever–vaccine associated gene-gene and gene-vaccine networks
• VO improved literature mining performance
• Phosphorylation-focused regulation is enriched in the fever vaccine subnetwork, suggesting a crucial role
• Identified TLRs as potential key factors in vaccine-induced fever responses
Future work
• Expand the networks by:
  – including more specific vaccines (via an improved VO)
  – using sentence-level co-citation rather than the SVM-based approach
  – integrating the Ontology of Adverse Events (OAE; http://www.oae-ontology.org)
• Apply the ontology-based literature mining approach to different domains.
Acknowledgments
• University of Michigan: Dragomir Radev, Junguk Hur, Eva Feldman, Yongqun He, Alex Ade, Zuoshuang Xiang, Brian Athey, Rebecca Racz
• Funding: NIH grant R01AI081062 and a Marie Curie Career Integration Grant within the 7th European Community Framework Programme.
Thank you!