A NEW ANNOTATION TOOL FOR MALARIA BASED ON INFERENCE OF PROBABILISTIC GENETIC NETWORKS
J. Barrera 1, R. M. Cesar Jr. 1, D. C. Martins Jr. 1, E. F. Merino 2, R. Z. N. Vêncio 1, F. G. Leonardi 1, M. M. Yamamoto 2, C. A. B. Pereira 1, H. A. del Portillo 2
UNIVERSITY OF SAO PAULO, BRAZIL
1 - Institute of Mathematics and Statistics
2 - Institute of Biomedical Sciences
Layout
• Introduction
• Probabilistic genetic network (PGN)
• PGN design
• Data analysis pipeline
• Biological interpretation
Introduction
Functional Classification (figure; panels: non-sinusoidal signal, sinusoidal signal)
Interaction Graph (figure; labeled groups: plastid genome, glycolysis)
Probabilistic Genetic Network (PGN)
Expression of gene $i$ at time $t$: $x_i[t] \in \{-1, 0, +1\}$
State of the regulatory network at time $t$:
$$x[t] = \begin{pmatrix} x_1[t] \\ x_2[t] \\ \vdots \\ x_n[t] \end{pmatrix}$$
Network dynamics: $x[t+1] = \phi(x[t])$
$$x[t+1] = \phi(x[t]), \qquad \phi = \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_n \end{pmatrix}$$
Diagram: predictors $x_j[t]$, $x_k[t]$ → $\phi_i$ → target $x_i[t+1]$
Probabilistic Genetic Network (PGN)
The predictors $x_j[t]$ and $x_k[t]$ feed $\phi_i$, which sets the target stochastically:
$$x_i[t+1] = \begin{cases} +1 & \text{with probability } p(1 \mid x_j[t], x_k[t]) \\ 0 & \text{with probability } p(0 \mid x_j[t], x_k[t]) \\ -1 & \text{with probability } p(-1 \mid x_j[t], x_k[t]) \end{cases}$$
$$\exists\, y, z, w \in \{-1, 0, 1\},\ y \neq z \neq w: \quad p(y \mid x_j[t], x_k[t]) \gg p(z \mid x_j[t], x_k[t]) + p(w \mid x_j[t], x_k[t])$$
This system
- depends only on the previous time
- is time translation invariant
- is a conditionally independent Markov chain:
$$P(x[t+1] \mid x[t]) = \prod_{i=1}^{n} p(x_i[t+1] \mid x[t])$$
- is characterized by the conditional probabilities $p(x_i[t+1] \mid x[t])$
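The factorization over genes can be illustrated with a minimal sketch. The two-gene network, its predictor assignment, and the 0.9 "dominant" probability below are hypothetical values chosen for illustration; they are not taken from the slides.

```python
import random

STATES = (-1, 0, 1)


def near_deterministic(value, strong=0.9):
    """Distribution over STATES that puts most of its mass on `value`."""
    weak = (1.0 - strong) / 2
    return tuple(strong if s == value else weak for s in STATES)


def step(state, predictors, cond_prob, rng):
    """Sample x[t+1] given x[t]; each gene is drawn independently given x[t]."""
    nxt = []
    for i in range(len(state)):
        j, k = predictors[i]
        probs = cond_prob[i][(state[j], state[k])]  # (p(-1|.), p(0|.), p(+1|.))
        nxt.append(rng.choices(STATES, weights=probs)[0])
    return tuple(nxt)


def transition_probability(state, nxt, predictors, cond_prob):
    """P(x[t+1] | x[t]) as the product of the per-gene conditionals."""
    p = 1.0
    for i, value in enumerate(nxt):
        j, k = predictors[i]
        p *= cond_prob[i][(state[j], state[k])][STATES.index(value)]
    return p


# Hypothetical toy network: gene 0 follows gene 1, gene 1 inverts gene 0.
predictors = [(1, 0), (0, 1)]
cond_prob = [
    {(a, b): near_deterministic(a) for a in STATES for b in STATES},
    {(a, b): near_deterministic(-a) for a in STATES for b in STATES},
]

state = (1, -1)
sampled = step(state, predictors, cond_prob, random.Random(0))
p_most_likely = transition_probability(state, (-1, -1), predictors, cond_prob)
```

Because the conditional table depends only on the current state, the same `cond_prob` is reused at every time step, which is what time translation invariance amounts to in this sketch.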
PGN Design
PGN Design. $x[1], x[2], \ldots, x[48]$; target genes
Distribution of Y
$$P : \{-1, 0, 1\} \to [0, 1], \qquad \sum_{y \in \{-1, 0, 1\}} P(y) = 1$$
Entropy
$$H(Y) = -\sum_{y \in \{-1, 0, 1\}} P(y) \log P(y)$$
(Figure: bar plots of P(Y), P(Y') and P(Y'') over the values -1, 0, 1.)
$$H(Y) > H(Y'), \qquad H(Y') = H(Y'')$$
Mutual information
$$I(X, Y) = H(Y) - H(Y \mid X) \geq 0$$
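A short sketch of these two quantities on three-valued variables may help. The three example distributions are hypothetical stand-ins for P(Y), P(Y') and P(Y''), and the base-2 logarithm is an assumption, since the slide does not fix the base.

```python
import math

STATES = (-1, 0, 1)


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X, Y) = H(Y) - H(Y|X) for a joint table {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    h_y = entropy(py.values())
    h_y_given_x = sum(
        px[x] * entropy([joint.get((x, y), 0.0) / px[x] for y in py])
        for x in px
        if px[x] > 0
    )
    return h_y - h_y_given_x


# Hypothetical distributions over (-1, 0, 1).
flat = (1 / 3, 1 / 3, 1 / 3)   # plays the role of P(Y)
peaked = (0.1, 0.8, 0.1)       # plays the role of P(Y')
shifted = (0.8, 0.1, 0.1)      # plays the role of P(Y''): same shape, other peak

independent = {(x, y): 1 / 9 for x in STATES for y in STATES}
copy = {(s, s): 1 / 3 for s in STATES}  # Y is fully determined by X
```

Entropy depends only on the shape of the distribution, not on where the peak sits, which is why `peaked` and `shifted` score the same. Mutual information is zero for `independent` and equals H(Y) for `copy`.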
Mean conditional entropy
$$E[H(Y \mid X)] = -\sum_{X} P(X) \sum_{Y} P(Y \mid X) \log(P(Y \mid X))$$
Mean mutual information
$$E[I(X, Y)] = H(Y) - E[H(Y \mid X)]$$
Mean mutual information estimation
$$\hat{E}[H(Y \mid X)] = -\sum_{X} \hat{P}(X) \sum_{Y} \hat{P}(Y \mid X) \log(\hat{P}(Y \mid X))$$
$$\hat{E}[I(X, Y)] = \hat{H}(Y) - \hat{E}[H(Y \mid X)]$$
Estimation of P(Y|X)
Y: the target gene at t+1, that is, $Y = x_i[t+1]$
X: the predictors at t, that is, $X = (x_j[t], x_k[t])$
For a fixed parameter n (a minimum count):
If #(X=(a,b)) ≥ n, then
$$\hat{P}(Y = c \mid X = (a, b)) = \frac{\#((Y = c) \wedge X = (a, b))}{\#(X = (a, b))}$$
If #(X=(a,b)) < n, then $\hat{P}(Y \mid X = (a, b))$ is uniform
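The counting rule can be sketched as follows. The function name, the argument `n_min` (standing for the parameter n) and the toy series are hypothetical; the guard `max(n_min, 1)` is an added assumption so that an unobserved configuration never triggers a division by zero.

```python
from collections import Counter

STATES = (-1, 0, 1)


def estimate_conditional(target, pred_j, pred_k, n_min):
    """Return {(a, b): (P(-1|a,b), P(0|a,b), P(+1|a,b))} for all nine (a, b)."""
    pair_counts = Counter()
    joint_counts = Counter()
    for t in range(len(target) - 1):
        x = (pred_j[t], pred_k[t])
        pair_counts[x] += 1
        joint_counts[(x, target[t + 1])] += 1

    table = {}
    for a in STATES:
        for b in STATES:
            total = pair_counts[(a, b)]
            if total >= max(n_min, 1):
                # Enough observations: use relative frequencies.
                table[(a, b)] = tuple(joint_counts[((a, b), c)] / total for c in STATES)
            else:
                # Too few observations: fall back to the uniform distribution.
                table[(a, b)] = (1 / 3, 1 / 3, 1 / 3)
    return table


# Hypothetical quantized series over five time points.
gene_i = [0, 1, 1, -1, 0]
gene_j = [1, 1, 1, 0, -1]
gene_k = [0, 0, 0, 0, 0]
table = estimate_conditional(gene_i, gene_j, gene_k, n_min=2)
```

In this toy run the configuration (1, 0) is seen three times, so it gets frequency estimates, while (0, 0) is seen only once and falls back to the uniform distribution.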
Estimation of P(X) for a fixed parameter n
$X = (x_j[t], x_k[t])$
$$N^+ = \sum_{(a,b):\ \#(X=(a,b)) \geq n} \#(X = (a, b)), \qquad N^- = \sum_{(a,b):\ \#(X=(a,b)) < n} \#(X = (a, b))$$
If #(X=(a,b)) ≥ n, then
$$\hat{P}(X = (a, b)) = \frac{\#(X = (a, b))}{N^+} \times \frac{N^+}{N^+ + N^-}$$
If #(X=(a,b)) < n, then
$$\hat{P}(X = (a, b)) = \frac{1}{3^2 - |\{(a, b) : \#(X = (a, b)) \geq n\}|} \times \frac{N^-}{N^+ + N^-}$$
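A minimal sketch of this estimator, assuming the counts of each predictor configuration are already available. The function name, `n_min` (standing for n) and the example counts are hypothetical, and the sketch assumes at least one observation and n ≥ 1.

```python
STATES = (-1, 0, 1)
PAIRS = [(a, b) for a in STATES for b in STATES]  # the 3^2 configurations


def estimate_predictor_distribution(pair_counts, n_min):
    """Estimate P(X=(a,b)); rare configurations share their mass uniformly."""
    frequent = {x for x in PAIRS if pair_counts.get(x, 0) >= n_min}
    n_plus = sum(pair_counts.get(x, 0) for x in frequent)
    n_minus = sum(pair_counts.get(x, 0) for x in PAIRS if x not in frequent)
    n_total = n_plus + n_minus
    n_rare = len(PAIRS) - len(frequent)

    dist = {}
    for x in PAIRS:
        if x in frequent:
            dist[x] = (pair_counts[x] / n_plus) * (n_plus / n_total)
        else:
            dist[x] = (1 / n_rare) * (n_minus / n_total)
    return dist


# Hypothetical counts: two frequent configurations and one rare observation.
counts = {(1, 0): 6, (0, 0): 3, (-1, 1): 1}
dist = estimate_predictor_distribution(counts, n_min=2)
```

With these counts, N+ = 9 and N- = 1, so the two frequent configurations keep their relative frequencies and the remaining seven configurations split the leftover mass 1/10 equally. The estimate sums to one by construction.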
Building Interaction Graphs
- For each target gene, rank all predictors by their mean estimated mutual information;
- Choose the best predictors;
- Design the interaction graph.
(Figure: a target gene linked to its predictor genes.)
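The ranking step can be sketched end to end under several assumptions: predictors are taken as unordered pairs of genes, H(Y) is estimated from empirical frequencies, logarithms are base 2, and the helper names and toy data are hypothetical. The slides do not specify how many predictors are kept, so `top` is a free parameter here.

```python
import math
import random
from collections import Counter
from itertools import combinations

STATES = (-1, 0, 1)
PAIRS = [(a, b) for a in STATES for b in STATES]


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_mutual_information(data, target, j, k, n_min):
    """Estimate E[I(X, Y)] for Y = x_target[t+1] and X = (x_j[t], x_k[t])."""
    steps = range(len(data) - 1)
    n_total = len(data) - 1
    pair_counts = Counter((data[t][j], data[t][k]) for t in steps)
    joint_counts = Counter(((data[t][j], data[t][k]), data[t + 1][target]) for t in steps)
    y_counts = Counter(data[t + 1][target] for t in steps)

    frequent = [x for x in PAIRS if pair_counts[x] >= max(n_min, 1)]
    n_minus = n_total - sum(pair_counts[x] for x in frequent)

    h_y = entropy(y_counts[c] / n_total for c in STATES)
    h_cond = 0.0
    for x in frequent:
        p_x = pair_counts[x] / n_total  # (count / N+) * (N+ / (N+ + N-))
        h_cond += p_x * entropy(joint_counts[(x, c)] / pair_counts[x] for c in STATES)
    # Rare configurations have uniform P(Y|X); together they carry mass N-/(N+ + N-).
    h_cond += (n_minus / n_total) * math.log2(3)
    return h_y - h_cond


def rank_predictors(data, target, n_min, top=3):
    """Return the `top` predictor pairs for `target`, best first."""
    n_genes = len(data[0])
    scored = [
        (mean_mutual_information(data, target, j, k, n_min), (j, k))
        for j, k in combinations(range(n_genes), 2)
    ]
    return sorted(scored, reverse=True)[:top]


# Hypothetical toy data: gene 0 copies gene 1 with a one-step delay; gene 2 is noise.
rng = random.Random(0)
data = [(0, rng.choice(STATES), rng.choice(STATES))]
for _ in range(300):
    data.append((data[-1][1], rng.choice(STATES), rng.choice(STATES)))

ranking = rank_predictors(data, target=0, n_min=3)
```

Each entry of `ranking` is a `(score, (j, k))` tuple. The edges of the interaction graph for a target would then go from the genes in its top-ranked pairs to the target itself.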
Data analysis pipeline
Data analysis pipeline (flow diagram). Recoverable box labels: GPR; PlasmoDB; DeRisi's transcriptome; Overview dataset; functional groups; metabolic pathways; USP dataset; scaling and quantization; USP quantized dataset; determination of the target set of genes; PGN design; table of predictors; GraphViz; output graph; biological interpretation.
USP-dataset
• built directly from the original .gpr "raw" data;
• intensity = foreground mean - background median;
• mean taken over replicated time points;
• different definition of "weak" spots and different elimination rules;
• no interpolation used;
• ALL accepted oligos are considered unique entities (including almost sinusoidal ones).
USP-dataset: 6532 oligos
Overview dataset: 3719 oligos
Weak spots definition
$X = (0, 0, \ldots, 100, 100, \ldots, 100, 0, 0, \ldots, 0, 0)$
$\langle X \rangle = 9 \times 100 / 46 = 19.57$
$R$ = normalized Cy5/Cy3 $= X / \langle X \rangle = (0, 0, \ldots, 5.11, 5.11, \ldots, 5.11, 0, 0, \ldots, 0, 0)$
$\log_2(R) = (-\infty, -\infty, \ldots, 2.35, 2.35, \ldots, 2.35, -\infty, -\infty, \ldots, -\infty)$
Not amenable to Fourier analysis due to the infinities.
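The arithmetic of this example can be reproduced with a few lines. The position of the nine expressed time points within the 46-point series is a hypothetical choice; only their number matters for the result.

```python
import math

# 46 time points, nine of them expressed at intensity 100 (positions are hypothetical).
signal = [0.0] * 46
for t in range(18, 27):
    signal[t] = 100.0

mean = sum(signal) / len(signal)      # <X> = 900 / 46
ratio = [v / mean for v in signal]    # R = X / <X>
log_ratio = [math.log2(r) if r > 0 else float("-inf") for r in ratio]

n_infinite = sum(1 for v in log_ratio if math.isinf(v))
```

Every time point with zero intensity maps to minus infinity after the log transform, so most of the series is undefined for any method that needs finite values, such as a Fourier fit.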
Scaling
For each $i$, estimate the mean $\hat{E}[x_i[t]]$ and the standard deviation $\hat{\sigma}[x_i[t]]$.
Normal transform:
$$n_i[t] = \frac{x_i[t] - \hat{E}[x_i[t]]}{\hat{\sigma}[x_i[t]]}$$
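A minimal sketch of the normal transform for one gene. The use of the population standard deviation is an assumption (the slide does not say whether the sample or population estimator is used), and the function name and example values are hypothetical.

```python
import statistics


def normal_transform(signal):
    """Center a gene's time series on its mean and scale by its standard deviation."""
    mean = statistics.mean(signal)
    sd = statistics.pstdev(signal)  # population estimator: an assumption
    if sd == 0:
        raise ValueError("constant signal cannot be scaled")
    return [(v - mean) / sd for v in signal]


# Hypothetical expression values for one gene.
normalized = normal_transform([1.0, 2.0, 3.0, 4.0, 5.0])
```

After the transform each gene has zero mean and unit standard deviation, so genes with very different raw intensity ranges become comparable before quantization.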
Quantization
Let $n_i^+[t]$ and $n_i^-[t]$ denote, respectively, the normalized signals greater and lower than zero at $t$.
If $n_i^+[t] > \hat{E}[n_i^+[t]]$, then $x_i[t] = +1$
If $n_i^-[t] > \hat{E}[n_i^-[t]]$ and $n_i^+[t] < \hat{E}[n_i^+[t]]$, then $x_i[t] = 0$
If $n_i^-[t] < \hat{E}[n_i^-[t]]$, then $x_i[t] = -1$
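One possible reading of these rules is sketched below: the mean of the positive normalized values acts as the upper threshold and the mean of the negative ones as the lower threshold. Assigning 0 to values exactly on a threshold, and to all values when one side is empty, is an assumption; the function name and example values are hypothetical.

```python
def quantize(normalized):
    """Map a normalized series to {-1, 0, +1} using the means of its two halves."""
    positives = [v for v in normalized if v > 0]
    negatives = [v for v in normalized if v < 0]
    # With no positive (negative) values, the +1 (-1) level is never reached.
    upper = sum(positives) / len(positives) if positives else float("inf")
    lower = sum(negatives) / len(negatives) if negatives else float("-inf")
    return [1 if v > upper else -1 if v < lower else 0 for v in normalized]


# Hypothetical normalized signal for one gene.
levels = quantize([-2.0, -1.0, -0.5, 0.5, 1.0, 3.0])
```

Here the thresholds are 1.5 and about -1.17, so only the two extreme values leave the 0 level.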
Output example (figure). Node annotations recoverable from the figure: unknown group; plastid genome; organellar translation machinery. Nodes are marked "In Overview" or "Not in Overview".
Biological Interpretation
Glycolytic PGN network (single genes)
Genes: hexokinase, aldolase, isomerase, mutase, 6 PFK, TP isomerase, G3PDH, pyruvate kinase (two entries), PG kinase, enolase
Legend: glycolysis, proteasome, plastid genome, transcription machinery, kinases, cytoplasmic translation, ribonucleotide synthesis, actin myosin motors, deoxynucleotide synthesis, mitochondrial DNA replication
No TCA genes
550 apicoplast proteins 124 apicoplast proteins
Apicoplast PGN network (singlets)
Legend: glycolysis, proteasome, plastid genome, transcription machinery, kinases, cytoplasmic translation, ribonucleotide synthesis, actin myosin motors, deoxynucleotide synthesis, mitochondrial DNA replication
Apicoplast PGN network (doublets)
Legend: glycolysis, proteasome, plastid genome, transcription machinery, kinases, cytoplasmic translation, ribonucleotide synthesis, actin myosin motors, deoxynucleotide synthesis, mitochondrial DNA replication
124/phase 466/ bipartite PGN 676 Biological validation
J. Barrera, R.M. Cesar Jr., C. P. Pereira, D. Martins, R. Z. Vencio, E. F. Merino, M. M. Yamamoto