Local Substitutability for Sequence Generalization
François Coste, Gaëlle Garet, Jacques Nicolas
Dyliss Bioinformatic Team
Inria Rennes-Bretagne Atlantique, France
ICGI, September 6, 2012
Table of Contents
1 Biological Problem to Grammatical Inference
2 Generalization using Substitutability
3 Generalization using Local Substitutability
4 First Experiments
1 Biological Problem to Grammatical Inference
Prediction of Protein Function
Protein:
  Amino acid sequence: length ≈ 500, alphabet of size 20
    KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTM
  Structure: determined by the sequence
  Function: largely dependent on the structure
Many sequences are available from sequencing projects
⇒ Goal: find a protein's function from its sequence
Characterization of a Protein Functional Family
KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNL...
[Figure: alpha helix and beta sheet]
Usual representations: sub-regular expressions, profiles, ...
Proteins:
  short-term interactions: automata [Ker08]
  long-term interactions
Abstraction: context-free grammars enable modeling important protein contacts.
Issue: how to infer such CFGs from a set of protein sequences?
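To make the "Abstraction" point concrete, here is a toy illustration that is not taken from the talk: a context-free rule S → p S q S ties together two residues that may lie far apart in the sequence, which a finite automaton cannot do for arbitrarily deep nesting. The contact pairs in the hypothetical PAIRS set are chosen only for illustration (loosely inspired by oppositely charged residues); the recognizer is a plain memoized top-down parse over spans, sketched under these assumptions.

```python
from functools import lru_cache

# Hypothetical contact pairs, for illustration only (not from the talk).
PAIRS = {("K", "E"), ("E", "K"), ("D", "R"), ("R", "D")}
PAIRING = {p for p, _ in PAIRS}


def derives(seq: str) -> bool:
    """Recognize seq with the toy CFG
         S -> epsilon | x S | p S q S   (x not in PAIRING, (p, q) in PAIRS).
    Every pairing residue must be matched with a partner and contacts must nest."""

    @lru_cache(maxsize=None)
    def s(i: int, j: int) -> bool:
        # Can the span seq[i:j] be derived from S?
        if i == j:  # S -> epsilon
            return True
        if seq[i] not in PAIRING:  # S -> x S (unpaired residue)
            return s(i + 1, j)
        # S -> p S q S: try every possible partner position k for seq[i]
        return any(
            (seq[i], seq[k]) in PAIRS and s(i + 1, k) and s(k + 1, j)
            for k in range(i + 1, j)
        )

    return s(0, len(seq))


print(derives("AKGGEA"))  # K...E contact around GG
print(derives("KDRE"))    # nested contacts K...E enclosing D...R
print(derives("KDER"))    # crossing contacts are rejected
```

Because the grammar only nests its pairs, crossing contacts such as K...E interleaved with D...R are rejected; this is the usual trade-off of using context-free rules for long-range dependencies.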
Protomata-inspired Approach
Detection of conservation blocks by partial local multiple alignment [Ker08]:
Seq1 IDLQTVLPEWVRVGFSASTG QNV SVSLD ERNSILAWSFSS
Seq2 TVSYD VDLKTELPEWVRVGFSGSTG GYV QNHNILSWTFNS
Seq3 HVSAT VPLEKEVEDWVSVGFSATSG SKKETT ETHNVLSWSFSS
Seq4 AYQWSY NVSTT VELEKEVYDWVSVGFSATSG ETHDVLSWSFSS
Seq5 SVSAT VHLEKEVDEWVSVGFSATSG LTEDTT ETHDVLSWSFSS
Sequences recoded with the conservation blocks:
Seq1 Block1 Block2 Block3 Block4
Seq2 Block1 Block2 Block3 Block4
Seq3 Block5 Block2 Block6 Block4
Seq4 Block5 Block2 Block6 Block4
Seq5 Block5 Block2 Block6 Block4
Grammar induced by the recoding:
S → Block1 Block2 Block3 Block4 | Block5 Block2 Block6 Block4
Block1 → P1 P2 P3 P4 P5
P1 → S | T
...
How to generalize more?
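The recoding step can be sketched as follows: each distinct block sequence becomes an alternative of the start symbol S, and each block expands into one position symbol per column, whose alternatives are the residues observed in that column. This is a minimal sketch, not the Protomata implementation; it assumes gap-free, equal-length block occurrences, and the two Block1 occurrences used below ("SVSLD", "TVSYD") are a hypothetical reading of the alignment. The helper names start_rule and block_rules are illustrative.

```python
def start_rule(recoded):
    """Distinct block sequences become the alternatives of S (first-seen order)."""
    alternatives = []
    for blocks in recoded:
        if tuple(blocks) not in alternatives:
            alternatives.append(tuple(blocks))
    return alternatives


def block_rules(name, occurrences):
    """Block -> P1 ... Pn, and each Pi -> the residues seen at column i.
    (In a full grammar the Pi names would be made unique per block.)"""
    width = len(occurrences[0])
    if any(len(o) != width for o in occurrences):
        raise ValueError("occurrences of a block must have the same length")
    rules = {name: [f"P{i + 1}" for i in range(width)]}
    for i in range(width):
        rules[f"P{i + 1}"] = sorted({o[i] for o in occurrences})
    return rules


recoded = {
    "Seq1": ["Block1", "Block2", "Block3", "Block4"],
    "Seq2": ["Block1", "Block2", "Block3", "Block4"],
    "Seq3": ["Block5", "Block2", "Block6", "Block4"],
    "Seq4": ["Block5", "Block2", "Block6", "Block4"],
    "Seq5": ["Block5", "Block2", "Block6", "Block4"],
}
alternatives = start_rule(recoded.values())
print("S ->", " | ".join(" ".join(a) for a in alternatives))
# Hypothetical Block1 occurrences (assumed reading of Seq1 and Seq2)
for lhs, rhs in block_rules("Block1", ["SVSLD", "TVSYD"]).items():
    sep = " " if lhs.startswith("Block") else " | "
    print(lhs, "->", sep.join(rhs))
```

With these inputs the sketch prints the two alternatives of S and a rule P1 -> S | T, in line with the grammar on the slide; such a grammar still only accepts the block patterns seen in training, which is why more generalization is needed.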
2 Generalization using Substitutability
Substitutability-Based Inference [Har54]
[CE07]: substitutable languages
$$\forall y_1, y_2 \in \Sigma^+ : \left[\exists \langle x_1, z_1 \rangle : x_1 y_1 z_1 \in L \wedge x_1 y_2 z_1 \in L\right] \Rightarrow \left[\forall \langle x_2, z_2 \rangle : x_2 y_1 z_2 \in L \Leftrightarrow x_2 y_2 z_2 \in L\right]$$
Two strings that occur in one common (left, right) context are substitutable in every context.
[Yos08]: (k,l)-substitutable languages
$$\forall y_1, y_2 \in \Sigma^+, \forall \langle u, v \rangle \in \langle \Sigma^k, \Sigma^l \rangle : \left[\exists \langle x_1, z_1 \rangle : x_1 u y_1 v z_1 \in L \wedge x_1 u y_2 v z_1 \in L\right] \Rightarrow \left[\forall \langle x_2, z_2 \rangle : x_2 u y_1 v z_2 \in L \Leftrightarrow x_2 u y_2 v z_2 \in L\right]$$
Two strings that occur in one common context are substitutable, but only where they appear between the left and right sub-contexts u and v, of lengths k and l.
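Both definitions can be turned into a criterion on a finite sample: collect, for every context x u _ v z, the substrings observed there, and declare any two of them substitutable between u and v. The sketch below (the function name substitutable_pairs is illustrative, and this is only the pair-extraction step, not the full learning algorithm of [CE07] or [Yos08]) uses k = l = 0, i.e. empty u and v, for plain substitutability, and k, l > 0 for the (k,l) variant.

```python
from collections import defaultdict
from itertools import combinations


def substitutable_pairs(sample, k=0, l=0):
    """Map (u, v) -> set of frozenset({y1, y2}) such that x u y1 v z and
    x u y2 v z are both in the sample, with |u| = k, |v| = l, y non-empty.
    k = l = 0 gives the [CE07] criterion; otherwise substitution is
    licensed only between u and v, as in [Yos08]."""
    by_context = defaultdict(set)  # (x, u, v, z) -> substrings y seen there
    for w in sample:
        n = len(w)
        for i in range(k, n + 1):              # y starts at i, u = w[i-k:i]
            for j in range(i + 1, n - l + 1):  # y = w[i:j], v = w[j:j+l]
                key = (w[:i - k], w[i - k:i], w[j:j + l], w[j + l:])
                by_context[key].add(w[i:j])
    result = defaultdict(set)
    for (_, u, v, _), ys in by_context.items():
        for y1, y2 in combinations(sorted(ys), 2):
            result[(u, v)].add(frozenset((y1, y2)))
    return result


sample = ["KETA", "KQTA"]
local = substitutable_pairs(sample, k=1, l=1)
for (u, v), pairs in sorted(local.items()):
    for pair in pairs:
        print(f"{u}_{v}:", " ~ ".join(sorted(pair)))
# K_A: ET ~ QT
# K_T: E ~ Q
```

With k = l = 0 the same sample also yields E ~ Q in the empty sub-context, meaning E and Q would be substitutable everywhere; the (1,1) version restricts that licence to positions between K and T.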
Preliminary Experiments on Protein Sequences
Unsatisfactory results: no generalization
                          Precision  Recall  F-measure
Before substitutability   1          0.2     0.33
After substitutability    1          0.2     0.33
Analysis of failure causes:
  Training sequences are long.
  The (global) contexts of two strings are never identical, so no substitution is ever licensed.
How to generalize more?
Our solution: introduction of local substitutability
  new classes of languages
  new generalization criterion
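As a quick check of the table, the F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R); with P = 1 and R = 0.2 this gives 0.4 / 1.2 ≈ 0.33, the value reported in both rows. A minimal sketch (the function name is illustrative):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(1.0, 0.2), 2))  # 0.33
```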