A NEW ANNOTATION TOOL FOR MALARIA BASED ON INFERENCE OF PROBABILISTIC GENETIC NETWORKS


  1. A NEW ANNOTATION TOOL FOR MALARIA BASED ON INFERENCE OF PROBABILISTIC GENETIC NETWORKS
     J. Barrera (1), R. M. Cesar Jr. (1), D. C. Martins Jr. (1), E. F. Merino (2), R. Z. N. Vêncio (1), F. G. Leonardi (1), M. M. Yamamoto (2), C. A. B. Pereira (1), H. A. del Portillo (2)
     UNIVERSITY OF SAO PAULO, BRAZIL
     (1) Institute of Mathematics and Statistics; (2) Institute of Biomedical Sciences

  2. Layout • Introduction • Probabilistic genetic network (PGN) • PGN design • Data analysis pipeline • Biological interpretation

  3. Introduction

  4. Functional Classification: sinusoidal signal; non-sinusoidal signal

  5. Interaction Graph [figure; labeled groups: plastid genome, glycolysis]

  6. Probabilistic Genetic Network (PGN)

  7. Expression of gene $i$ at time $t$: $x_i[t] \in \{-1, 0, +1\}$.
     State of the regulatory network at time $t$: $x[t] = (x_1[t], x_2[t], \ldots, x_n[t])^{T}$.
     Network dynamics: $x[t+1] = \phi(x[t])$.

  8. $x[t+1] = \phi(x[t])$, with $\phi = (\phi_1, \phi_2, \ldots, \phi_n)^{T}$.
     Each component $\phi_i$ maps the predictors $x_j[t]$ and $x_k[t]$ to the target $x_i[t+1]$.

  9. Probabilistic Genetic Network (PGN)
     $$x_i[t+1] = \phi_i(x_j[t], x_k[t]) = \begin{cases} +1 & \text{with probability } p(1 \mid x_j[t], x_k[t]) \\ 0 & \text{with probability } p(0 \mid x_j[t], x_k[t]) \\ -1 & \text{with probability } p(-1 \mid x_j[t], x_k[t]) \end{cases}$$
     $\exists\, y, z, w \in \{-1, 0, 1\},\ y \neq z \neq w$: $p(y \mid x_j[t], x_k[t]) \gg p(z \mid x_j[t], x_k[t]) + p(w \mid x_j[t], x_k[t])$

  10. This system
      - depends just on the previous time
      - is time translation invariant
      - is a conditionally independent Markov chain: $P(x[t+1] \mid x[t]) = \prod_{i=1}^{n} p(x_i[t+1] \mid x[t])$
      - is characterized by the conditional probabilities $p(x_i[t+1] \mid x[t])$
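
The properties above can be illustrated with a minimal Python sketch of one PGN transition. It assumes each target gene has exactly two predictors and that the conditional probabilities are given as lookup tables; the three-gene network, the `almost_deterministic` helper, and the value of `eps` are hypothetical and only serve the illustration.

```python
import random

STATES = (-1, 0, 1)

def pgn_step(state, predictors, cond_prob, rng):
    """Sample x[t+1] given x[t]; each gene is drawn independently of the others."""
    nxt = []
    for i in range(len(state)):
        j, k = predictors[i]
        # Conditional distribution p(. | x_j[t], x_k[t]) for target gene i.
        weights = cond_prob[i][(state[j], state[k])]
        nxt.append(rng.choices(STATES, weights=weights, k=1)[0])
    return tuple(nxt)

def almost_deterministic(preferred, eps=0.05):
    """Hypothetical helper: mass 1 - eps on `preferred`, eps / 2 on each other value."""
    return [1 - eps if s == preferred else eps / 2 for s in STATES]

# Hypothetical 3-gene network: each target mostly copies its first predictor.
predictors = {0: (1, 2), 1: (0, 2), 2: (0, 1)}
cond_prob = {
    i: {(a, b): almost_deterministic(a) for a in STATES for b in STATES}
    for i in predictors
}

rng = random.Random(0)
trajectory = [(1, 0, -1)]
for _ in range(5):
    trajectory.append(pgn_step(trajectory[-1], predictors, cond_prob, rng))
```

The same tables are used at every step (time translation invariance), and the next state depends only on the current one (Markov property).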

  11. PGN Design

  12. PGN Design. $x[1], x[2], \ldots, x[48]$; target genes

  13. Distribution of $Y$: $P : \{-1, 0, 1\} \to [0, 1]$, with $\sum_{y \in \{-1, 0, 1\}} P(y) = 1$.
      Entropy: $H(Y) = -\sum_{y \in \{-1, 0, 1\}} P(y) \log P(y)$.
      [Figure: histograms of P(Y), P(Y'), P(Y'') over {-1, 0, 1}] $H(Y) > H(Y')$ and $H(Y') = H(Y'')$.
      Mutual information: $I(X, Y) = H(Y) - H(Y \mid X) \geq 0$.
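
As a small illustration of the entropy comparison, the following Python sketch computes the entropy of three distributions over {-1, 0, 1}. The probability values are hypothetical (the slide's histograms are not recoverable), and base-2 logarithms are an assumption, since the slide does not state the base.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a distribution given as a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Hypothetical distributions over (-1, 0, 1).
p_y = [1 / 3, 1 / 3, 1 / 3]   # flat: maximal uncertainty
p_y1 = [0.1, 0.8, 0.1]        # peaked at 0
p_y2 = [0.8, 0.1, 0.1]        # same masses as p_y1, peak moved to -1

h_y, h_y1, h_y2 = entropy(p_y), entropy(p_y1), entropy(p_y2)
```

Entropy depends only on the set of probability masses, not on which value carries which mass, so the two peaked distributions have equal entropy while the flat one has more.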

  14. Mean conditional entropy: $E[H(Y \mid X)] = -\sum_{X} P(X) \sum_{Y} P(Y \mid X) \log P(Y \mid X)$
      Mean mutual information: $E[I(X, Y)] = H(Y) - E[H(Y \mid X)]$
      Mean mutual information estimation:
      $\hat{E}[H(Y \mid X)] = -\sum_{X} \hat{P}(X) \sum_{Y} \hat{P}(Y \mid X) \log \hat{P}(Y \mid X)$
      $\hat{E}[I(X, Y)] = \hat{H}(Y) - \hat{E}[H(Y \mid X)]$
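
The two formulas can be checked numerically with a short Python sketch that takes the distributions as given tables. The dictionaries below are hypothetical, base-2 logarithms are an assumption, and H(Y) is computed here from the marginal P(y) = Σ_x P(x) P(y|x), which is one reasonable reading of the slide rather than a confirmed detail.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def mean_conditional_entropy(p_x, p_y_given_x):
    """E[H(Y|X)] = sum over x of P(x) * H(Y | X = x)."""
    return sum(p_x[x] * entropy(p_y_given_x[x]) for x in p_x)

def mean_mutual_information(p_x, p_y_given_x):
    """E[I(X,Y)] = H(Y) - E[H(Y|X)], with P(y) obtained by marginalizing over x."""
    p_y = {}
    for x, px in p_x.items():
        for y, pyx in p_y_given_x[x].items():
            p_y[y] = p_y.get(y, 0.0) + px * pyx
    return entropy(p_y) - mean_conditional_entropy(p_x, p_y_given_x)

# Hypothetical predictor states, each with probability 1/2.
p_x = {(1, 1): 0.5, (-1, -1): 0.5}
determined = {(1, 1): {1: 1.0}, (-1, -1): {-1: 1.0}}                    # Y fixed by X
independent = {(1, 1): {-1: 0.5, 1: 0.5}, (-1, -1): {-1: 0.5, 1: 0.5}}  # Y ignores X

mi_determined = mean_mutual_information(p_x, determined)
mi_independent = mean_mutual_information(p_x, independent)
```

When X fully determines Y the mean mutual information equals H(Y); when Y is independent of X it is zero.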

  15. Estimation of P(Y|X)
      Y: the target gene at t+1, that is, $Y = x_i[t+1]$.
      X: the predictors at t, that is, $X = (x_j[t], x_k[t])$.
      For a fixed parameter $n$:
      - If $\#(X=(a,b)) \geq n$, then $\hat{P}(Y=c \mid X=(a,b)) = \dfrac{\#((Y=c) \wedge (X=(a,b)))}{\#(X=(a,b))}$.
      - If $\#(X=(a,b)) < n$, then $\hat{P}(Y \mid X=(a,b))$ is uniform.
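
A minimal Python sketch of this estimator follows, assuming the three quantized time series are plain lists of equal length. The function name and the example series are hypothetical.

```python
from collections import Counter

STATES = (-1, 0, 1)

def estimate_p_y_given_x(series_i, series_j, series_k, n):
    """Return {(a, b): {c: P_hat(Y=c | X=(a, b))}} for all 9 predictor states."""
    # Pair the predictors at time t with the target at time t + 1.
    xs = list(zip(series_j[:-1], series_k[:-1]))
    ys = series_i[1:]
    count_x = Counter(xs)
    count_xy = Counter(zip(xs, ys))
    table = {}
    for a in STATES:
        for b in STATES:
            total = count_x[(a, b)]
            if total >= n:
                # Enough observations: relative frequency of each target value.
                table[(a, b)] = {c: count_xy[((a, b), c)] / total for c in STATES}
            else:
                # Too few observations: fall back to the uniform distribution.
                table[(a, b)] = {c: 1 / len(STATES) for c in STATES}
    return table

# Hypothetical quantized series for target i and predictors j, k.
series_i = [0, 1, 1, 1, 0, -1]
series_j = [1, 1, 1, 0, -1, -1]
series_k = [0, 0, 0, 0, 0, 0]
table = estimate_p_y_given_x(series_i, series_j, series_k, n=2)
```

In this example the state (1, 0) is seen three times and always followed by +1, so it gets a frequency estimate, while states seen fewer than `n` times get the uniform distribution.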

  16. Estimation of P(X) for a fixed parameter $n$, with $X = (x_j[t], x_k[t])$
      $N^{+} = \sum_{(a,b)\,:\,\#(X=(a,b)) \geq n} \#(X=(a,b))$, $\quad N^{-} = \sum_{(a,b)\,:\,\#(X=(a,b)) < n} \#(X=(a,b))$
      - If $\#(X=(a,b)) \geq n$, then $\hat{P}(X=(a,b)) = \dfrac{\#(X=(a,b))}{N^{+}} \times \dfrac{N^{+}}{N^{-} + N^{+}}$.
      - If $\#(X=(a,b)) < n$, then $\hat{P}(X=(a,b)) = \dfrac{1}{3^{2} - |\{(a,b) : \#(X=(a,b)) \geq n\}|} \times \dfrac{N^{-}}{N^{-} + N^{+}}$.
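
The estimator can be sketched in Python as follows, assuming non-empty predictor series restricted to the time points used as X. The function name and the example data are hypothetical.

```python
from collections import Counter

STATES = (-1, 0, 1)

def estimate_p_x(series_j, series_k, n):
    """Return {(a, b): P_hat(X=(a, b))} for all 9 predictor states."""
    xs = list(zip(series_j, series_k))
    counts = Counter(xs)
    pairs = [(a, b) for a in STATES for b in STATES]
    frequent = [p for p in pairs if counts[p] >= n]
    n_plus = sum(counts[p] for p in frequent)   # N+: observations in frequent states
    n_minus = len(xs) - n_plus                  # N-: observations in rare states
    n_rare = len(pairs) - len(frequent)         # 3^2 - |{(a,b): #(X=(a,b)) >= n}|
    p_hat = {}
    for p in pairs:
        if counts[p] >= n:
            p_hat[p] = (counts[p] / n_plus) * (n_plus / (n_plus + n_minus))
        else:
            # The mass of the rare observations is shared equally by all rare states.
            p_hat[p] = (1 / n_rare) * (n_minus / (n_plus + n_minus))
    return p_hat

# Hypothetical quantized predictor series.
p_hat = estimate_p_x([1, 1, 1, 0, -1], [0, 0, 0, 0, 0], n=2)
```

Here only (1, 0) reaches the threshold, so it receives 3/5 of the mass and the remaining 2/5 is spread evenly over the other eight states; the estimates sum to one.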

  17. Building Interaction Graphs
      - For each target gene, rank all predictors by their mean estimated mutual information;
      - Choose the best predictors;
      - Design the interaction graph.
      [Figure: interaction graph linking a target gene to its predictor genes]
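
The ranking step could look like the Python sketch below. It is a simplification under stated assumptions: probabilities are plain relative frequencies (which corresponds to the threshold parameter n = 1), predictors are taken in pairs, the target gene is excluded from its own predictors, and the gene names and series are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

STATES = (-1, 0, 1)

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mean_mi(target, pred_j, pred_k):
    """Frequency-based estimate of H(Y) - E[H(Y|X)], Y = target[t+1], X = (pred_j[t], pred_k[t])."""
    xs = list(zip(pred_j[:-1], pred_k[:-1]))
    ys = target[1:]
    total = len(xs)
    count_x = Counter(xs)
    count_xy = Counter(zip(xs, ys))
    h_y = entropy(c / total for c in Counter(ys).values())
    h_y_given_x = sum(
        (cx / total) * entropy(count_xy[(x, y)] / cx for y in STATES)
        for x, cx in count_x.items()
    )
    return h_y - h_y_given_x

def rank_predictors(data, target, top=3):
    """Return the `top` predictor pairs for `target`, best first, as (score, pair)."""
    candidates = [g for g in data if g != target]  # assumption: no self-prediction
    scored = [
        (mean_mi(data[target], data[j], data[k]), (j, k))
        for j, k in combinations(candidates, 2)
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top]

# Hypothetical quantized data: g1 at time t + 1 copies g2 at time t.
data = {
    "g1": [0, 1, 0, -1, 1, 0, -1, 1],
    "g2": [1, 0, -1, 1, 0, -1, 1, 0],
    "g3": [0, 0, 0, 0, 0, 0, 0, 0],
    "g4": [1, 1, 1, 1, -1, -1, -1, -1],
}
ranking = rank_predictors(data, "g1")
```

In this toy data every pair containing `g2` explains `g1` completely, so those pairs rank above the pair without it. The selected pairs would then become the edges drawn into each target node of the interaction graph.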

  18. Data analysis pipeline

  19. Data analysis pipeline [Figure: flow diagram; box labels include GPR, PlasmoDB, DeRisi's transcriptome, Overview set, USP dataset determination, USP dataset, scaling and quantization, USP quantized dataset, target genes, functional groups, metabolic pathways, design of PGN, table of predictors, GraphViz, output graph, biological interpretation]

  20. USP-dataset
      • directly from original .gpr “raw” data;
      • intensity = foreground mean - background median;
      • mean for replicated time points;
      • different definition of “weak” spots and elimination rules;
      • no interpolation used;
      • consider ALL accepted oligos as unique entities (including _almost sinusoidal).
      USP-dataset: 6532 oligos. Overview dataset: 3719 oligos.

  21. Weak spots definition
      $X = (0, 0, \ldots, 100, 100, \ldots, 100, 0, 0, \ldots, 0, 0)$
      $\langle X \rangle = 9 \times 100 / 46 = 19.57$
      $R$ = normalized cy5/cy3 = $X / \langle X \rangle = (0, 0, \ldots, 5.11, 5.11, \ldots, 5.11, 0, 0, \ldots, 0, 0)$
      $\log_2(R) = (-\infty, -\infty, \ldots, 2.35, 2.35, \ldots, 2.35, -\infty, -\infty, \ldots, -\infty)$
      Not amenable to Fourier analysis due to infinities.
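
The problem can be reproduced with a few lines of Python. The profile below is a hypothetical layout of the slide's example (46 time points, 9 of them at intensity 100); the position of the non-zero block is an assumption.

```python
import math

def log2_ratio(profile):
    """Normalize a profile by its mean and take log2; zero ratios map to -infinity."""
    mean = sum(profile) / len(profile)
    ratios = [x / mean for x in profile]
    return [math.log2(r) if r > 0 else float("-inf") for r in ratios]

# 46 time points, 9 non-zero (block position is hypothetical).
profile = [0.0] * 20 + [100.0] * 9 + [0.0] * 17
logs = log2_ratio(profile)

# Any sum over the series (such as a Fourier coefficient) is no longer finite.
series_sum_is_finite = math.isfinite(sum(logs))
```

Because most entries are negative infinity, any quantity built from sums over the series stops being finite, which is why such profiles cannot be fed to a Fourier-based analysis.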

  22. Scaling
      For each $i$, estimate the mean $\hat{E}[x_i[t]]$ and standard deviation $\hat{\sigma}[x_i[t]]$.
      Normal transform: $n_i[t] = \dfrac{x_i[t] - \hat{E}[x_i[t]]}{\hat{\sigma}[x_i[t]]}$
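
A minimal Python sketch of the normal transform for one gene follows. Using the sample standard deviation (`statistics.stdev`) is an assumption, since the slide does not say which estimator was used, and the example signal is hypothetical.

```python
import statistics

def normal_transform(signal):
    """Center a gene's time series on its mean and scale by its standard deviation."""
    mean = statistics.mean(signal)
    std = statistics.stdev(signal)  # assumption: sample (n - 1) standard deviation
    return [(x - mean) / std for x in signal]

# Hypothetical expression signal of one gene over five time points.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
normalized = normal_transform(signal)
```

After the transform each gene has zero mean and unit standard deviation, so genes with very different raw intensities become comparable. A constant signal has zero standard deviation and would need separate handling.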

  23. Quantization
      Let $n_i^{+}[t]$ and $n_i^{-}[t]$ denote, respectively, the normalized signals greater and lower than zero at $t$.
      - If $n_i^{+}[t] > \hat{E}[n_i^{+}[t]]$, then $x_i[t] = +1$.
      - If $n_i^{-}[t] > \hat{E}[n_i^{-}[t]]$ and $n_i^{+}[t] < \hat{E}[n_i^{+}[t]]$, then $x_i[t] = 0$.
      - If $n_i^{-}[t] < \hat{E}[n_i^{-}[t]]$, then $x_i[t] = -1$.
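
The three rules can be sketched in Python as below. The sketch reads the two thresholds as the mean of a gene's positive normalized values and the mean of its negative normalized values over time, which is an interpretation of the slide; values exactly equal to a threshold are mapped to 0, a case the slide leaves open. The example values are hypothetical.

```python
def quantize(normalized):
    """Map a normalized time series to levels -1, 0, +1 using two per-gene thresholds."""
    positives = [v for v in normalized if v > 0]
    negatives = [v for v in normalized if v < 0]
    # Thresholds: mean of the positive part and mean of the negative part.
    hi = sum(positives) / len(positives) if positives else float("inf")
    lo = sum(negatives) / len(negatives) if negatives else float("-inf")
    levels = []
    for v in normalized:
        if v > hi:
            levels.append(1)       # clearly over-expressed
        elif v < lo:
            levels.append(-1)      # clearly under-expressed
        else:
            levels.append(0)       # between the two thresholds
    return levels

# Hypothetical normalized signal: thresholds are +1.0 and -1.0.
levels = quantize([-1.5, -0.5, 0.0, 0.4, 1.6])
```

Only values beyond the positive or negative mean are treated as expressed or repressed; everything in between is assigned the neutral level.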

  24. Output example [Figure: output graph; nodes labeled by functional group (Unknown group, Plastid genome, Organellar translation machinery) and tagged as "In Overview" or "Not in Overview"]

  28. Biological Interpretation

  29. Glycolytic PGN network (single genes)
      Genes: hexokinase, aldolase, isomerase, mutase, 6 PFK, TP isomerase, G3PDH, pyruvate kinase, pyruvate kinase, PG kinase, enolase
      Legend: glycolysis, proteasome, plastid genome, transcription machinery, kinases, cytoplasmic translation, ribonucleotide synthesis, actin myosin motors, deoxynucleotide synthesis, mitochondrial, DNA replication

  30. No TCA genes

  31. 550 apicoplast proteins; 124 apicoplast proteins

  32. Apicoplast PGN network (singlets)
      Legend: glycolysis, proteasome, plastid genome, transcription machinery, kinases, cytoplasmic translation, ribonucleotide synthesis, actin myosin motors, deoxynucleotide synthesis, mitochondrial, DNA replication

  33. Apicoplast PGN network (doublets)
      Legend: glycolysis, proteasome, plastid genome, transcription machinery, kinases, cytoplasmic translation, ribonucleotide synthesis, actin myosin motors, deoxynucleotide synthesis, mitochondrial, DNA replication

  34. 124/phase 466/ bipartite PGN 676 Biological validation

  35. J. Barrera, R.M. Cesar Jr., C. P. Pereira, D. Martins, R. Z. Vencio, E. F. Merino, M. M. Yamamoto
