Learning k-reversible languages
Grammatical Inference 2006
http://eurise.univ-st-etienne.fr/~cdlh/slides

Acknowledgements
• Laurent Miclet, Jose Oncina and Tim Oates for previous versions of these slides.
• Rafael Carrasco, Paco Casacuberta, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Thierry Murgue, Franck Thollard, Enrique Vidal, Frédéric Tantini,...
• The list is necessarily incomplete. Apologies to those who have been forgotten.

Outline
1. Introduction
2. Definitions
3. The algorithm
4. Properties

1. The problem
• Gold: identification from positive data only is impossible for super-finite classes.
• The problem is thus over-generalisation.

Avoid
• Accumulation points;
• Languages that do not have tell-tale sets.

2. The k-reversible languages
• The class was proposed by Angluin (1982).
• The class is identifiable in the limit from text.
• The class is composed of the regular languages that can be accepted by a DFA whose reverse is deterministic with a look-ahead of k.

[Figure: automaton A over {a, b} with states 0, 1, 2, 3, 4]

Let A = (Σ, Q, I, F, δ) be an NFA. We denote by A^T = (Σ, Q, F, I, δ^T) the reversal automaton, with:
δ^T(q, a) = {q' ∈ Q : q ∈ δ(q', a)}

[Figure: reversal automaton A^T, same states 0, 1, 2, 3, 4 with every transition flipped]

Some definitions
• u is a k-successor of q if |u| = k and δ(q, u) ≠ ∅.
• u is a k-predecessor of q if |u| = k and δ^T(q, u^T) ≠ ∅.
• λ is a 0-successor and a 0-predecessor of any state.

[Figure: automaton A, repeated]
• aa is a 2-successor of 0 and 1 but not of 3.
• a is a 1-successor of 3.
• aa is a 2-predecessor of 3 but not of 1.

An NFA is deterministic with look-ahead k iff
∀ q, q' ∈ Q, q ≠ q':
    (q, q' ∈ I) ∨ (q, q' ∈ δ(q'', a))
    ⇒ [(u is a k-successor of q) ∧ (v is a k-successor of q') ⇒ u ≠ v]

Prohibited:
[Figure: two a-transitions lead to distinct states 1 and 2, and the same string u with |u| = k can be read from both]
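
The definitions above (reversal, k-successors, k-predecessors, determinism with look-ahead k) can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `NFA`, its method names and the small `example` automaton are assumptions introduced here (the automaton in the figures is not reproduced), and symbols are assumed to be single characters so that u^T is plain string reversal.

```python
from itertools import combinations


class NFA:
    """NFA without λ-transitions; delta maps (state, symbol) to a set of states."""

    def __init__(self, initial, final, delta):
        self.initial = set(initial)
        self.final = set(final)
        self.delta = {key: set(targets) for key, targets in delta.items()}

    def reverse(self):
        """Reversal automaton A^T: swap I and F and flip every transition."""
        flipped = {}
        for (q, a), targets in self.delta.items():
            for t in targets:
                flipped.setdefault((t, a), set()).add(q)
        return NFA(self.final, self.initial, flipped)

    def k_successors(self, q, k):
        """Strings u with |u| = k and δ(q, u) ≠ ∅."""
        frontier = {(q, "")}
        for _ in range(k):
            frontier = {(t, u + a)
                        for (p, u) in frontier
                        for (s, a), targets in self.delta.items() if s == p
                        for t in targets}
        return {u for _, u in frontier}

    def k_predecessors(self, q, k):
        """Strings u with |u| = k and δ^T(q, u^T) ≠ ∅."""
        return {u[::-1] for u in self.reverse().k_successors(q, k)}

    def is_deterministic_with_lookahead(self, k):
        """Competing states (both initial, or both reached from one state
        by the same symbol) must have disjoint k-successor sets."""
        groups = [self.initial] + list(self.delta.values())
        for group in groups:
            for q, r in combinations(group, 2):
                if self.k_successors(q, k) & self.k_successors(r, k):
                    return False
        return True


# Hypothetical example: states 1 and 2 compete after reading a from state 0.
example = NFA(
    initial={0},
    final={3, 4},
    delta={(0, "a"): {1, 2}, (1, "a"): {3}, (2, "a"): {4},
           (3, "b"): {3}, (4, "a"): {4}},
)
```

In this hypothetical automaton, a is a 1-successor of both 1 and 2, so look-ahead 1 is not enough, whereas the 2-successors (ab for state 1, aa for state 2) are disjoint.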

Example
[Figure: automaton over {a, b} with states 0, 1, 2, 3, 4]
This automaton is not deterministic with look-ahead 1 but is deterministic with look-ahead 2.

Note
• You must have intersection-free successor sets only for those states that have a non-determinism issue!
The fact that states 1 and 2 have common successors is not a problem.

k-reversible automata
• A is k-reversible if A is deterministic and A^T is deterministic with look-ahead k.
• Example
[Figure: two automata over {a, b}, one captioned "deterministic" and the other "deterministic with look-ahead 1"]

Violation of k-reversibility
• Two states q, q' violate the k-reversibility condition if
    – they violate the deterministic condition: q, q' ∈ δ(q'', a); or
    – they violate the look-ahead condition:
        • q, q' ∈ F, ∃ u ∈ Σ^k : u is a k-predecessor of both;
        • ∃ u ∈ Σ^k, δ(q, a) = δ(q', a) and u is a k-predecessor of both q and q'.

3. Learning k-reversible automata
• Key idea: the order in which the merges are performed does not matter!
• Just merge states that do not comply with the conditions for k-reversibility.

k-RL algorithm (a_{k-RL})
Data: k ∈ ℕ, X text sample
A = PTA(X)
While ∃ q, q' k-reversibility violators do
    A = merge(A, q, q')
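
The merge loop above can be turned into a small runnable sketch. The function names (`pta`, `k_predecessors`, `find_violators`, `merge`, `k_rl`, `accepts`) and the representation (states named by prefixes, transitions as a dictionary from (state, symbol) to a set of states) are assumptions made for this illustration, not the authors' implementation. The code favours readability over the complexity bounds quoted in the slides and assumes single-character symbols.

```python
from itertools import combinations


def pta(sample):
    """Prefix tree acceptor: one state per prefix, the initial state is ""."""
    states = {""} | {w[:i] for w in sample for i in range(len(w) + 1)}
    delta = {}
    for p in states:
        if p:
            delta.setdefault((p[:-1], p[-1]), set()).add(p)
    return states, delta, set(sample)


def k_predecessors(q, k, delta):
    """Strings u with |u| = k that label some path ending in q."""
    frontier = {(q, "")}
    for _ in range(k):
        frontier = {(s, a + u)
                    for (p, u) in frontier
                    for (s, a), targets in delta.items() if p in targets}
    return {u for _, u in frontier}


def find_violators(states, delta, finals, k):
    """Return a pair of states violating k-reversibility, or None."""
    # Deterministic condition: q, q' in δ(q'', a).
    for targets in delta.values():
        if len(targets) > 1:
            return tuple(sorted(targets)[:2])
    # Look-ahead condition: both final, or sharing an a-successor,
    # and having a common k-predecessor.
    preds = {q: k_predecessors(q, k, delta) for q in states}
    incoming = {}
    for (s, a), targets in delta.items():
        for t in targets:
            incoming.setdefault((t, a), set()).add(s)
    groups = [finals]
    groups += incoming.values()
    for group in groups:
        for q, r in combinations(sorted(group), 2):
            if preds[q] & preds[r]:
                return q, r
    return None


def merge(states, delta, finals, q, r):
    """Fold state r into state q."""
    def rename(s):
        return q if s == r else s
    new_delta = {}
    for (s, a), targets in delta.items():
        new_delta.setdefault((rename(s), a), set()).update(map(rename, targets))
    return states - {r}, new_delta, {rename(f) for f in finals}


def k_rl(sample, k):
    """Merge violators until none is left; returns (states, delta, finals)."""
    states, delta, finals = pta(sample)
    # Pairs come back sorted, so "" is never the state folded away
    # and remains the initial state throughout.
    while (pair := find_violators(states, delta, finals, k)) is not None:
        states, delta, finals = merge(states, delta, finals, *pair)
    return states, delta, finals


def accepts(automaton, word):
    """Membership test; the result of k_rl is deterministic."""
    _, delta, finals = automaton
    state = ""
    for a in word:
        targets = delta.get((state, a))
        if not targets:
            return False
        (state,) = targets
    return state in finals
```

For instance, `k_rl({"a", "aa", "abba", "abbbba"}, 2)` can be compared against a run done by hand on the same sample. Since the order of merges does not matter, picking the first violating pair found is enough for this sketch.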

k-RL algorithm (a_{k-RL}) (with partitions)
Data: k ∈ ℕ, X text sample
A_0 = PTA(X)
π = {{q} : q ∈ Q}
While ∃ B, B' ∈ π k-reversibility violators do
    π = (π \ {B, B'}) ∪ {B ∪ B'}
A = A_0/π

k = 2
Let X = {a, aa, abba, abbbba}
[Figure: PTA(X) with states λ, a, aa, ab, abb, abba, abbb, abbbb, abbbba]
Violators, for u = ba

k = 2
Let X = {a, aa, abba, abbbba}
[Figure: automaton with states λ, a, aa, ab, abb, abba, abbb, abbbb]
Violators, for u = bb

k = 2
Let X = {a, aa, abba, abbbba}
[Figure: automaton with states λ, a, aa, ab, abb, abba, abbb]
This automaton is 2-reversible. Note that with k = 1 more merges are possible.

Properties (1)
• ∀ k ≥ 0, ∀ X, a_{k-RL}(X) is a k-reversible language.
• L(a_{k-RL}(X)) is the smallest k-reversible language that contains X.
• The class k-RL is identifiable in the limit from text.

Properties (2)
• A regular language L is k-reversible iff
    ∀ u1, u2 ∈ Σ*, ∀ v ∈ Σ^k:
    (∃ w ∈ Σ* : u1vw ∈ L ∧ u2vw ∈ L) ⇒ (u1v)^{-1}L = (u2v)^{-1}L
(if two strings that share a suffix of length k can be completed into words of L by the same string w, then they are Nerode-equivalent)
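
As a worked instance of the characterisation in Properties (2), take the k = 2 example with X = {a, aa, abba, abbbba}. The check below assumes that the final automaton of that example accepts L = {a, aa} ∪ {ab^{2n}a : n ≥ 1}, which is how the 7-state figure is read here. It shows why abb and abbbb have to end up in the same state:

```latex
\begin{align*}
L &= \{a,\ aa\} \cup \{a\,b^{2n}a : n \ge 1\}, \qquad k = 2,\\
u_1 &= a,\quad u_2 = abb,\quad v = bb,\quad w = a:\\
u_1 v w &= abba \in L, \qquad u_2 v w = abbbba \in L,\\
(u_1 v)^{-1}L &= (abb)^{-1}L = \{b^{2n}a : n \ge 0\} = (abbbb)^{-1}L = (u_2 v)^{-1}L.
\end{align*}
```

The sample itself fails the same test: (abb)^{-1}X = {a, bba} while (abbbb)^{-1}X = {a}, so X is not 2-reversible and some generalisation is forced.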

Properties (3)
The time complexity is in O(k‖X‖^3).
The space complexity is in O(‖X‖).
The algorithm can be made incremental.

Properties (4)
• L(a_{k-RL}(X)) ⊂ L(a_{(k-1)-RL}(X))
• Remember we also had: L(a_{k-TSS}(X)) ⊂ L(a_{(k-1)-TSS}(X))

Properties (4)
Polynomial aspects
• Polynomial update time;
• Polynomial characteristic samples?
• Polynomial number of mind changes?
• Polynomial number of implicit prediction errors?

Extensions
• Sakakibara built an extension for context-free grammars whose tree language is k-reversible.
• Marion & Besombes propose an extension to tree languages.
• Different authors propose to learn these automata and then estimate the probabilities as an alternative to learning stochastic automata.

Exercises
• Construct a language L that is not k-reversible, ∀ k ≥ 0.
• Prove that the class of k-reversible languages is not identifiable from text.
• Run a_{k-RL} on X = {aa, aba, abb, abaaba, baaba} for k = 0, 1, 2, 3.

Solution (idea)
• L_k = {a^i : i ≤ k}
• Then for each k: L_k is k-reversible but not (k-1)-reversible.
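
One possible way to check the solution idea uses the characterisation of k-reversible languages by residuals, ((∃ w : u1vw ∈ L ∧ u2vw ∈ L) ⇒ (u1v)^{-1}L = (u2v)^{-1}L for all v of length k). The witness strings below are chosen here for illustration, and k ≥ 1 is assumed:

```latex
\begin{align*}
L_k &= \{a^i : i \le k\}, \qquad k \ge 1,\\
u_1 &= \lambda,\quad u_2 = a,\quad v = a^{k-1},\quad w = \lambda:\\
u_1 v w &= a^{k-1} \in L_k, \qquad u_2 v w = a^{k} \in L_k,\\
(u_1 v)^{-1}L_k &= \{\lambda, a\} \;\neq\; \{\lambda\} = (u_2 v)^{-1}L_k .
\end{align*}
```

So L_k is not (k-1)-reversible. It is k-reversible because a word uvw ∈ L_k with |v| = k forces u = w = λ, which leaves nothing to compare. Read this way, L_1 ⊂ L_2 ⊂ ... is an infinite chain whose union a* is 0-reversible, which gives an accumulation point inside the class of all k-reversible languages.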