Introduction to Microarray Data Analysis and Gene Networks
Lecture 8
Alvis Brazma, European Bioinformatics Institute
Lecture 8
• Gene networks – part 2
– Network topology (part 2)
– Network logics
– Network dynamics
Gene Networks - four levels of hierarchical description
• Parts list – genes, transcription factors, promoters, binding sites, …
• Topology – a graph describing the connections between the parts
• Control logics – how combinations of regulatory signals interact (e.g., promoter logics)
• Dynamics – how does it all work in real time
The arcs can have different meanings
- G1 → G2: the product of gene G1 is a transcription factor that binds to the promoter of gene G2 (as detected in a ChIP-chip experiment) – physical interaction network (direct network)
- G1 → G2: the disruption of gene G1 changes the expression level of gene G2 – data interpretation network (indirect network)
How do the two networks compare?
• How much do the two networks have in common?
• We can look at the intersection of the networks and ask whether the common parts are supported by our existing knowledge
• Are the target sets of the transcription factors present in both networks similar?
• Are the network topology (e.g., connectivity) properties similar?
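The first question can be made concrete by treating each network as a set of directed edges. The following is only a minimal sketch: the edge lists are hypothetical toy data, and the Jaccard index is an assumed summary measure chosen for illustration, not one named in the lecture.

```python
# Hypothetical toy edge lists: (source gene, target gene) pairs.
chip_edges = {("SWI4", "CLN1"), ("MBP1", "RNR1"), ("GCN4", "ARG4")}
disruption_edges = {("SWI4", "CLN1"), ("GCN4", "ARG4"), ("GCN4", "HIS4")}


def compare_networks(a, b):
    """Return the shared edges and the Jaccard index of two edge sets."""
    common = a & b
    union = a | b
    jaccard = len(common) / len(union) if union else 0.0
    return {"common": common, "jaccard": jaccard}


result = compare_networks(chip_edges, disruption_edges)
print(sorted(result["common"]), result["jaccard"])
```

The shared edges are the ones that can then be checked against existing knowledge.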
A couple of simple notions
• Any gene (node in the graph) with outgoing edges is called a source gene (source node)
• Any gene with incoming edges is called a target gene (target node)
• The target set of a source gene is the set of genes that its outgoing edges point to
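These notions map directly onto an edge list. A minimal sketch, assuming the network is given as (source, target) pairs; the gene names G1 to G3 and the function name `target_sets` are hypothetical.

```python
from collections import defaultdict


def target_sets(edges):
    """Map every source gene to the set of genes its outgoing edges point to."""
    targets = defaultdict(set)
    for source, target in edges:
        targets[source].add(target)
    return dict(targets)


# Hypothetical toy network.
edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3")]

sets_by_source = target_sets(edges)
source_genes = set(sets_by_source)           # genes with outgoing edges
target_genes = {t for _, t in edges}         # genes with incoming edges
print(source_genes, target_genes, sets_by_source)
```

Note that a gene such as G2 can be a source gene and a target gene at the same time.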
A problem:
• Both networks depend on the chosen significance threshold, i.e., what level of microarray signal is required to draw an edge in the network
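The dependence on the threshold can be illustrated with a small sketch. It assumes that γ is a cutoff on the absolute value of a normalized expression change; the scores and the function name `build_network` are hypothetical.

```python
def build_network(scores, gamma):
    """Keep an edge (disrupted gene, measured gene) if |score| >= gamma.

    `scores` maps gene pairs to a signed, normalized expression change
    (an assumption about how the signal is represented).
    """
    return {pair for pair, score in scores.items() if abs(score) >= gamma}


# Hypothetical toy scores.
scores = {
    ("G1", "G2"): 3.4,
    ("G1", "G3"): -2.2,
    ("G2", "G3"): 2.7,
    ("G2", "G4"): 1.1,
}

sizes = {gamma: len(build_network(scores, gamma)) for gamma in (2.0, 2.5, 3.0)}
print(sizes)
```

A stricter threshold can only remove edges, so the network shrinks monotonically as γ grows.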
The size of the networks for different significance thresholds

                               ChIP      ChIP       mutant    mutant    mutant
                               network   network    network   network   network
                               (p<0.01)  (p<0.001)  (γ=2.0)   (γ=2.5)   (γ=3.0)
source genes                   202       169        250       236       226
target genes                   4939      2845       5396      4778      3920
genes                          4980      2930       5654      4798      3959
edges                          18842     6170       32017     17436     10356
edges with same role (*)       3694      857        4096      2425      1507
                               (19.6%)   (13.9%)    (12.8%)   (13.9%)   (14.6%)
edges per source gene          93.3      36.5       135.7     73.8      45.6

(*) edges where the source gene and the target gene have the same cellular role annotation in YPD (http://www.proteome.com)
Intersection of the networks – many connections are consistent with our a priori knowledge

[Figure 6: graph of the connections shared by both networks. Node labels: YNL313C, YOX1, PDS1, GPA1, YJR030C, UFE1, ARG10, YLR104W, YDR115W, KAR4, ARO1, ARG5, SPT21, FUS1, CDC21, STE12, MUT5, RAD27, CPA2, GCN4, LEU4, PDS5, RFA2, STE2, SST2, IRR1, MET22, ECM40, DIN7, HOM3, GSH1, ERP3, GIC2, YJL073W, YAP1, YBR070C, YHR149C, GIN4, MNN5, SMC3, SWI6, SGA1, YLR460C, DUN1, PCL1, YLR103C, PCL2, MBP1, YER079W, RNR1, PRY2, PLB3, SVS1, YHR150W, ABF1, SIC1, YKL185W, YDR528W, YPL158C, YGR086C, YLR297W, SWI5, YLR194C, YER128W, HCM1, SWI4, CHS1, MCD1, YPL267W, PST1, CCW6, SWE1, YLR049C, YPR157W, MNN1, CIS3, SCW10, CLB2, YER078C]
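How do the ChIP-chip and disruption networks relate?
[Diagram: within the set of all genes, one region marks the transcription factors and another the disrupted genes. A transcription factor t has a regulation set (regulation set of t); a disrupted gene h has an effectual set (effectual set of h).]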
How do the ChIP-chip and disruption networks relate?
[Diagram: a gene g that is both a transcription factor and a disrupted gene has a regulation set (regulation set of g) and an effectual set (effectual set of g) within the set of all genes.]
How to estimate whether the overlap is larger than expected by chance?
Let G be the set of all genes, R the regulation set and E the effectual set. We assume that the elements of the set E are marked, and pick a set of size |R| at random from G. Then the size x = |R ∩ E| of the intersection is distributed according to the hypergeometric distribution. The probability of observing an intersection of size k or larger can be computed according to the formula:

$$P(x \ge k) = 1 - \sum_{i=0}^{k-1} \frac{\binom{|E|}{i}\,\binom{|G|-|E|}{|R|-i}}{\binom{|G|}{|R|}}$$
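The hypergeometric tail probability can be evaluated directly with exact binomial coefficients. This is a minimal sketch using only the Python standard library; the function name `overlap_p_value` and the example sizes are hypothetical, and the arguments are set sizes (|G|, |E|, |R|) rather than the sets themselves.

```python
from math import comb


def overlap_p_value(n_genes, n_effectual, n_regulation, k):
    """P(x >= k) for x = |R ∩ E| under the hypergeometric model.

    n_genes      = |G|, number of all genes
    n_effectual  = |E|, number of marked genes
    n_regulation = |R|, size of the randomly picked set
    """
    total = comb(n_genes, n_regulation)
    # Sum the probabilities of intersections smaller than k (i = 0 .. k-1);
    # i cannot exceed |R|, so the range is capped there.
    below = sum(
        comb(n_effectual, i) * comb(n_genes - n_effectual, n_regulation - i)
        for i in range(min(k, n_regulation + 1))
    )
    return 1 - below / total


# Hypothetical example: 10 genes, 4 marked, 3 picked, at least 1 in common.
p = overlap_p_value(10, 4, 3, 1)
print(p)
```

For very small probabilities the subtraction from 1 loses precision in floating point; summing the upper tail (i = k and above) is a possible alternative in that case.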
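How do the ChIP-chip and disruption networks relate?
[Diagram: the set of all genes with the transcription factors and the disrupted genes, and the regulation set and effectual set of a gene g. Counts shown: 146 transcription factors only, 23 genes that are both transcription factors and disrupted genes (9 of them marked in parentheses), 213 disrupted genes only.]
Of the 23 transcription factors studied in both networks, only 9 have target sets that overlap more than expected by chance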
Of the 23 transcription factors studied in both networks, only 9 have target sets that overlap more than expected by chance
• Is it as bad as it may look?
– We expect many indirect connections in the disruption network that are not present in the ChIP network – is this the case?
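Direct vs. indirect interactions
[Diagram: three genes X, Y and Z; two arcs are labeled direct and one indirect, e.g. X → Y and Y → Z are direct, so the effect of X on Z is indirect.]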
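One way to check whether an edge of the disruption network could be indirect is to ask whether it is explained by a path of direct edges. This is only a sketch under that assumption; the function names and the toy edges are hypothetical, and the lecture does not prescribe this procedure.

```python
def reachable(direct_edges, start):
    """Return all genes reachable from `start` along one or more direct edges."""
    adjacency = {}
    for source, target in direct_edges:
        adjacency.setdefault(source, set()).add(target)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def classify_edge(direct_edges, edge):
    """Label an edge as 'direct', 'indirect' (explained by a path) or 'unexplained'."""
    source, target = edge
    if edge in direct_edges:
        return "direct"
    if target in reachable(direct_edges, source):
        return "indirect"
    return "unexplained"


# Hypothetical direct (physical) network: X -> Y -> Z.
direct = {("X", "Y"), ("Y", "Z")}
print(classify_edge(direct, ("X", "Z")))
```

An edge labeled "unexplained" has no supporting path in the direct network at the chosen threshold.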
[Figure: network diagram. Node labels: GLN3, RTG1, YAP1, GCN4, BAS1, YAP6, ROX1, HIS4, ADE3, ADE13, ADE17, ADE4, YOL158C, FET4, LYS2, YHM1, ARO3, ARO1, ARG4, YJL200C, CPA2, MBP1, SWI6, SWI4, RNR1, NDD1, YBR070C, GIC2, SVS1, SOK2, YNL058C, GDH3, ECM33, SWI5, SLY1, YDR451C, YER189W, YER190W, PMA1, YGL114W, YHL029C, YIL158W, YJL051W, CIS3, SUR7, CDC5, CLN1, SRL1, YOR248W, YOR315W, CLB2, NCE102, YBL029W, UTR2]
Of the 23 transcription factors studied in both networks, only 9 have target sets that overlap more than expected by chance
• Is it as bad as it may look?
– We expect many indirect connections in the disruption network that are not present in the ChIP network – is this the case? There is anecdotal evidence that it is
– What about the connections present in the ChIP network but not in the disruption network? They can be explained by nonfunctional relationships in the ChIP network and by combinatorial regulatory effects
Conclusions
• We would like to think that the networks have enough in common for both to be meaningful, but at the same time there is apparently a lot of noise in at least one of them
How do the two networks compare?
• How much do the networks have in common?
• We can look at the intersection of the networks and ask whether the common parts are supported by our existing knowledge
• Are the target sets of the transcription factors present in both networks similar?
• Are the network topology (e.g., connectivity) properties similar – and what are they?
Degree of a node in a graph
The central node has degree = 7 (indegree = 3, outdegree = 4); the degree is the sum of the indegree and the outdegree
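The degrees can be counted from an edge list. A minimal sketch; the node names are hypothetical and chosen so that node C reproduces the example of a central node with indegree 3 and outdegree 4.

```python
from collections import Counter


def degrees(edges):
    """Return {node: (indegree, outdegree, degree)} for a directed edge list."""
    outdeg = Counter(source for source, _ in edges)
    indeg = Counter(target for _, target in edges)
    nodes = set(outdeg) | set(indeg)
    # Counter returns 0 for nodes that never appear in a given role.
    return {n: (indeg[n], outdeg[n], indeg[n] + outdeg[n]) for n in nodes}


# Hypothetical toy graph: three edges into C, four edges out of C.
edges = [
    ("A", "C"), ("B", "C"), ("D", "C"),
    ("C", "E"), ("C", "F"), ("C", "G"), ("C", "H"),
]

node_degrees = degrees(edges)
print(node_degrees["C"])
```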
Important genes and genes with complex regulation
Most genes have only a few incoming / outgoing edges, but some have high numbers (>500)
[Plot panels: Indegree, Outdegree]
Genes with highest in- and out-degree

γ    outdegree group               m    n   | indegree group             m   n
2.0  Carbohydrate metabolism       363  4   | Amino-acid metabolism      9   194
     RNA turnover                  353  4   | Nucleotide metabolism      6   82
     Meiosis                       244  3   | Energy generation          5   242
     Cell stress                   207  9   | Small molecule transport   5   343
     Protein translocation         197  3   | Other metabolism           5   148
2.8  RNA turnover                  110  4   | Amino-acid metabolism      4   167
     Cell stress                   62   8   | Nucleotide metabolism      3   67
     Meiosis                       54   3   | Energy generation          2   184
     Protein synthesis             53   7   | Differentiation            2   43
     Cell wall maintenance         47   6   | Small molecule transport   2   286
3.6  RNA turnover                  48   4   | Small molecule transport   2   230
     RNA processing/modification   41   4   | Other metabolism           2   96
     Cell stress                   27   8   | Nucleotide metabolism      2   58
     Small molecule transport      19   8   | Mating response            2   57
     Cell wall maintenance         19   6   | Amino-acid metabolism      2   133

Cellular role table showing the top 5 groups with the highest median degrees for the networks with γ = 2.0, 2.8 and 3.6, with a minimum group size of 3 for the outdegree and 40 for the indegree (m: median degree, n: number of genes per group)
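A ranking of this kind can be sketched as follows. The function name `top_groups`, the toy degrees and the role assignments are hypothetical; treating genes without a recorded degree as degree 0 is also an assumption, and the lecture does not say how such genes were handled.

```python
from statistics import median


def top_groups(degree_by_gene, role_by_gene, min_size, top=5):
    """Rank cellular role groups by median degree.

    Returns (role, median degree m, group size n) for the `top` groups
    that contain at least `min_size` genes.
    """
    groups = {}
    for gene, role in role_by_gene.items():
        # Assumption: a gene with no recorded degree counts as degree 0.
        groups.setdefault(role, []).append(degree_by_gene.get(gene, 0))
    rows = [
        (role, median(values), len(values))
        for role, values in groups.items()
        if len(values) >= min_size
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Hypothetical toy data.
degree_by_gene = {"a": 10, "b": 20, "c": 30, "d": 1, "e": 2, "f": 100}
role_by_gene = {
    "a": "RNA turnover", "b": "RNA turnover", "c": "RNA turnover",
    "d": "Meiosis", "e": "Meiosis",
    "f": "Other metabolism",
}

ranking = top_groups(degree_by_gene, role_by_gene, min_size=2)
print(ranking)
```

The minimum group size keeps a single high-degree gene, such as gene f here, from dominating the ranking.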