Working with biological databases

Nicos Angelopoulos and Georgios Giamas
nicos.angelopoulos@imperial.ac.uk
Department of Surgery and Cancer, Imperial College, London
WCB, 8/9/2014 – p.1
introduction
bio_db is an SWI-Prolog library/pack for serving biological data:
- high-quality data from primary sources
- convenience to the end-user
- encourage the use of Prolog in bioinformatics and computational biology
key features
- biological data as Prolog relations
- served from fact files or SQLite databases
- on-demand downloading from a server
- maps between biological products
- interaction databases
availability
?- pack_install(bio_db).
?- use_module(library(bio_db)).
?- debug(bio_db).
?- bio_db_interface(Iface).
Iface = prolog.
?- map_hgnc_prev_symb(Prev, Symb).
% Loading prolog db: .../map_hgnc_prev_symb.pl
Prev = 'A1BG-AS', Symb = 'A1BG-AS1' ;
Prev = 'A1BGAS', Symb = 'A1BG-AS1' ...
database resources
Database      Abbv.   Description
HGNC          hgnc    HUGO Gene Nomenclature Committee (genenames.org)
NCBI/Entrez   entz    National Center for Biotechnology Information
UniProt       unip    Universal Protein Resource
GO            gont    Gene Ontology

Interactions database
String        string  protein-protein interactions
database populations
[Figure: bar charts of Population per Field (ense, ensg, ensp, entz, gont, hgnc, prev, symb, syno, unip) and per Edge (gene, protein), coloured by Database (gont, hgnc, ncbi, string, unip).]
map relations
- translate between products: gene <-> protein, gene name <-> gene identifier
- map products to groups: gene <-> GO term
naming convention: map_<DB>_<From>_<To>
map_hgnc_hgnc_symb(19295, 'LMTK3').
map_gont_symb_gont('LMTK3', 'GO:0003674').
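Because every map relation follows the map_<DB>_<From>_<To> pattern, a relation can be selected from its three name parts and maps can be chained. A minimal sketch, assuming the two example facts above as stand-ins for the real bio_db tables; map/5 and hgnc_gont/2 are hypothetical helpers, not part of bio_db:

```prolog
% Toy stand-ins for two bio_db map relations (the example facts above).
% With the real pack, these come from library(bio_db) instead.
map_hgnc_hgnc_symb(19295, 'LMTK3').
map_gont_symb_gont('LMTK3', 'GO:0003674').

% map(+DB, +From, +To, ?X, ?Y): hypothetical helper that builds the
% predicate name map_<DB>_<From>_<To> and calls it on X and Y.
map(DB, From, To, X, Y) :-
    atomic_list_concat([map, DB, From, To], '_', Name),
    call(Name, X, Y).

% hgnc_gont(?Hgnc, ?Gont): hypothetical composition of two maps,
% HGNC identifier -> gene symbol -> GO term.
hgnc_gont(Hgnc, Gont) :-
    map(hgnc, hgnc, symb, Hgnc, Symb),
    map(gont, symb, gont, Symb, Gont).
```

With these facts loaded, ?- hgnc_gont(19295, Gont). answers Gont = 'GO:0003674'.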
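key map relations
[Figure: diagram of the key map relations. Databases: GO, HGNC, Ensembl, NCBI/Entrez, UNIPROT. Fields, with capitals marking the abbreviation used in predicate names: GONTerm, GONaMe, HGNC, SYMBol, SYNOnym, PREVious symbol, ENTreZ, ENSGene, ENSProtein, UNIProtein.]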
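gene ontology terms for LMTK3
% Print Gont-Gonm-Len for each GO term of LMTK3, where Len is the
% number of distinct genes annotated to that term.
lmtk3_go :-
    map_gont_symb_gont('LMTK3', Gont),
    findall(Symb, map_gont_gont_symb(Gont, Symb), Symbs),
    map_gont_gont_gonm(Gont, Gonm),
    sort(Symbs, USymbs),        % drop duplicate symbols
    length(USymbs, Len),
    write(Gont-Gonm-Len), nl,
    fail.                       % backtrack to the next GO term
lmtk3_go.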
gene ontology terms for LMTK3
GO term      GO name                                      population
GO:0003674   molecular_function                           764
GO:0004674   protein serine/threonine kinase activity     340
GO:0004713   protein tyrosine kinase activity             89
GO:0005524   ATP binding                                  1488
GO:0005575   cellular_component                           497
GO:0006468   protein phosphorylation                      557
GO:0010923   negative regulation of phosphatase activity  53
GO:0016021   integral component of membrane               200
GO:0018108   peptidyl-tyrosine phosphorylation            131
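The fail-driven loop in lmtk3_go prints one line per GO term. A minimal sketch of the same computation collected into a list instead, which is easier to reuse; the facts are toy stand-ins (GENE_A and GENE_B are hypothetical symbols, not real annotations, and the populations are not the real ones) and go_term_population/2 is a hypothetical name:

```prolog
% Toy stand-ins for bio_db's GO map relations (not real annotation data).
map_gont_symb_gont('LMTK3', 'GO:0003674').
map_gont_symb_gont('LMTK3', 'GO:0005524').
map_gont_gont_symb('GO:0003674', 'LMTK3').
map_gont_gont_symb('GO:0003674', 'GENE_A').
map_gont_gont_symb('GO:0003674', 'GENE_A').   % duplicate, removed by setof/3
map_gont_gont_symb('GO:0005524', 'LMTK3').
map_gont_gont_symb('GO:0005524', 'GENE_B').
map_gont_gont_gonm('GO:0003674', molecular_function).
map_gont_gont_gonm('GO:0005524', 'ATP binding').

% go_term_population(+Symb, -Rows): one Gont-Gonm-Len row per GO term
% of Symb, where Len is the number of distinct genes in that term.
go_term_population(Symb, Rows) :-
    findall(Gont-Gonm-Len,
            ( map_gont_symb_gont(Symb, Gont),
              map_gont_gont_gonm(Gont, Gonm),
              setof(S, map_gont_gont_symb(Gont, S), Symbs),
              length(Symbs, Len) ),
            Rows).
```

Here setof/3 both collects and de-duplicates the symbols, replacing the findall/3 plus sort/2 pair of the original.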
weighted graphs
String database of protein-protein interactions. The weight is the strength of belief in a physical interaction between two genes (0 ≤ W < 1000):
edge_string_hs_symb('AATK', 'LMTK3', 203).
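A minimal sketch of thresholding these weights to find the strongest partners of one gene. It assumes an edge may be stored in either argument order, so both are checked; the 'GENE_C' edge and strong_partners/3 are hypothetical, added only for illustration:

```prolog
% The first fact is the example above; the second is a hypothetical edge.
edge_string_hs_symb('AATK', 'LMTK3', 203).
edge_string_hs_symb('LMTK3', 'GENE_C', 850).

% strong_partners(+Symb, +Min, -Partners): W-Partner pairs for every edge
% of Symb with Min < W, sorted by decreasing weight W.
strong_partners(Symb, Min, Partners) :-
    findall(W-Other,
            ( (   edge_string_hs_symb(Symb, Other, W)
              ;   edge_string_hs_symb(Other, Symb, W)
              ),
              Min < W ),
            Pairs),
    sort(1, @>=, Pairs, Partners).   % key 1 is W; @>= keeps duplicates
```

For example, ?- strong_partners('LMTK3', 500, P). keeps only the 850 edge.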
key map relations
% Graph: all String edges with weight above Min between genes of GoTerm.
go_term_graph(GoTerm, Min, Graph) :-
    findall(Symb, map_gont_gont_symb(GoTerm, Symb), Symbs0),
    sort(Symbs0, Symbs),
    findall(Symb1-Symb2:W,
            ( member(Symb1, Symbs),
              member(Symb2, Symbs),
              edge_string_hs_symb(Symb1, Symb2, W),
              Min < W ),
            Graph).
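The weighted edge list built by go_term_graph/3 can also be turned into an unweighted graph for SWI-Prolog's library(ugraphs), which provides predicates such as neighbours/3 and reachable/3. A minimal sketch; weighted_ugraph/2 and edge_pair/2 are hypothetical helpers, and dropping the weights is a choice made here, not something bio_db does:

```prolog
:- use_module(library(ugraphs)).
:- use_module(library(apply)).

% weighted_ugraph(+WEdges, -UGraph): drop the weights from Symb1-Symb2:W
% edges and build an unweighted ugraph (vertices are taken from the edges).
weighted_ugraph(WEdges, UGraph) :-
    maplist(edge_pair, WEdges, Pairs),
    vertices_edges_to_ugraph([], Pairs, UGraph).

% A-B:W parses as A-(B:W), the same shape go_term_graph/3 builds.
edge_pair(A-B:_W, A-B).
```

For example, weighted_ugraph([a-b:300, b-c:900], G) gives G = [a-[b], b-[c], c-[]].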
String net for GO:10332
[Figure: String interaction network for GO:10332. Nodes: SCG2, MEN1, CYP11A1, GPX1, DCUN1D3, LIG4, MYC, PTPRC, ERCC6, CCL7, PRKDC, TRIM13, CDS1, FANCD2, XRCC4, GATA3, CCL2, CXCL10, TIGAR, TP53, XRCC2, BAX, TP73, BAK1, BCL2, BRCA2, TP63, CHEK2, APOBEC1, PRKAA1, SOD2, PML.]
piece-meal Prolog bioinformatics
Real        147   Swi/Yap <-> R interface
proSQLite   180   Swi/Yap <-> SQLite interface
db_facts     61   DB tables as Prolog facts
bio_db        5   biological databases
pubmed       16   access PubMed citation records
wgraph        5   graph visualisation via R functions
silac             functional analysis of quantitative proteomics
versus the more holistic blip: http://www.blipkit.org/
bottom-line
key points
- re-usable techniques
- high-quality, precise biological data
- infrastructure for logical bioinformatics
future work
- gene ontology term relations: is_a, part_of
- pathway databases (Reactome, KEGG, BioPAX)