Polynomial learning from positive and negative examples

Acknowledgements
• Laurent Miclet, Jose Oncina and Tim Oates for previous versions of these slides.
• Rafael Carrasco, Paco Casacuberta, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Thierry Murgue, Franck Thollard, Enrique Vidal, Frédéric Tantini, …
• The list is necessarily incomplete. Apologies to those who have been forgotten.
http://eurise.univ-st-etienne.fr/~cdlh/slides

Outline
1. The problem
2. Notations
3. Models
4. Proof techniques
5. Conclusion

1 The problem
• In a general way: to learn a language (belonging to some class C) from examples, and perhaps also from:
– counter-examples
– queries to an oracle
– specific knowledge
• Once the program is written we would like:
– to say that it is correct
– or to prove that no correct program can be written

What does 'correct' mean?
• We need a goal:
– L is a target (unknown). The harder L is, the harder it is going to be to learn.
• Learn what?
– find a representation of L
– or find some reasonable approximation of L (but what is a reasonable approximation?)

Representation of L
• We are going to have to fix some representation of L;
• we denote by r(L) this ideal representation of L;
• and ∥r(L)∥ is the size of this representation,
• or at least some polynomial measure of the number of bits needed to encode r(L).
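To make ∥r(L)∥ concrete, here is a minimal sketch in Python, assuming the class of regular languages represented by DFAs; the class name, the chosen encoding and the size measure are our illustration, not part of the slides.

```python
# A toy DFA representation r(L) for a regular language L,
# plus a rough measure of its encoding size in bits.
# Illustrative assumptions: alphabet, state set and encoding are ours.
from math import ceil, log2

class DFA:
    def __init__(self, states, alphabet, delta, start, finals):
        self.states = states          # e.g. {0, 1}
        self.alphabet = alphabet      # e.g. {'a', 'b'}
        self.delta = delta            # dict: (state, symbol) -> state
        self.start = start
        self.finals = finals          # set of accepting states

    def size_in_bits(self):
        # One possible polynomial measure: each transition target needs
        # about log2(|Q|) bits, and there are |Q| * |Sigma| transitions.
        q = max(len(self.states), 2)
        return len(self.states) * len(self.alphabet) * ceil(log2(q))

# L = strings over {a, b} with an even number of a's
even_a = DFA(states={0, 1}, alphabet={'a', 'b'},
             delta={(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1},
             start=0, finals={0})
print(even_a.size_in_bits())  # a stand-in for ||r(L)||
```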
How long can it take?
• Ideally, we would like to bound:
– the global time to learn/find the target
– the update time
– the number of errors before converging
– the number of queries
– the number of good examples needed

What about efficiency?
• We can try to bound the number of examples by p(∥r(L)∥): with that many examples we are sure of finding the target;
• more interesting: with p(∥r(L)∥) examples drawn according to some distribution D, we will be nearly sure of finding a grammar/classifier that will be nearly always right… according to D (the PAC model: Valiant 84).

2 General Notations
[Figure: the target L ⊆ Σ*, its representation r(L) in the class C, and the hypothesis h in the class H; exact identification is written r(L) ≡ h, approximation r(L) ≈ h.]

The examples
[Figure: an example x ∈ Σ* is labelled by L(x) ∈ {0, 1}; the learner answers with h(x).]

The classes C and H
• sets of examples (subsets of Σ*)
• representations of these sets
• the computation of r(L)(x) (and of h(x)) must take place in time polynomial in |x|

How do we consider a finite set?
[Figure: to cope with the infinite set Σ*, restrict to Σ^n or Σ^{≤n}, or draw strings according to a distribution D and only require a probability of error Pr_D < ε.]
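The requirement that r(L)(x) and h(x) be computable in time polynomial in |x| is easy to meet for automata: membership takes one transition per symbol. A minimal sketch, reusing the toy DFA class above (the function name `accepts` is ours):

```python
# Membership r(L)(x): one transition per symbol, so time O(|x|).
# Assumes the toy DFA class sketched earlier.
def accepts(dfa, x):
    state = dfa.start
    for symbol in x:
        state = dfa.delta[(state, symbol)]  # constant-time table lookup
    return state in dfa.finals

print(accepts(even_a, "abab"))  # True: even number of a's
print(accepts(even_a, "ab"))    # False: odd number of a's
```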
3 Some models/paradigms
• Identification in the limit
– identification in the limit with probability 1
– PAC identification
– simple PAC
– …
• and also:
– PAC learnability
– PAC predictability
– learning with a teacher (with different teaching models)

3.1 Identification in the limit (Gold 67, 78)
Protocol: we have a presentation x1, x2, …, xn, … of some language L; after each new element the learner outputs a hypothesis: h1, h2, …, hn, …
Two kinds of presentations:
• ∀x ∈ L, x appears in the presentation (learning from text: positive presentation);
• ∀x ∈ Σ*, the pair <x, L(x)> appears in the presentation (learning from informant).

C is identifiable in the limit iff
∀L ∈ C and ∀ presentation of L, there is a point n from which the hypotheses no longer change (hi ≡ hn for all i ≥ n) and hn is equivalent to r(L).

Main results (Gold 67)
• No super-finite class (one containing all finite languages and at least one infinite one, e.g. the regular languages) is identifiable from text;
• any recursively enumerable class is identifiable from an informant.

Main results in GI
• All well-known classes of languages can be identified from complete presentations;
• no usual class of languages can be identified from positive presentations.
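Gold's positive result for informants can be made concrete with the classic identification-by-enumeration learner: after each labelled example, output the first language in the enumeration consistent with all data seen so far. A minimal sketch, assuming the class is given as an explicit (here finite) list of membership functions; all names and the toy class are illustrative:

```python
# Identification in the limit by enumeration (informant setting).
# Once the target's index is reached, the hypothesis never changes again.
def learner(cls, presentation):
    """cls: list of membership functions; presentation: iterable of (x, label)."""
    seen = []
    for x, label in presentation:
        seen.append((x, label))
        # first enumerated language consistent with all data so far
        for i, lang in enumerate(cls):
            if all(lang(s) == b for s, b in seen):
                yield i  # current hypothesis h_n
                break

# Toy class: L0 = even number of a's, L1 = strings ending in 'b'
cls = [lambda s: s.count('a') % 2 == 0, lambda s: s.endswith('b')]
informant = [("b", True), ("a", False), ("ab", True)]  # target: L1
print(list(learner(cls, informant)))  # [0, 0, 1] -- converges to index 1
```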
3.2 PAC learning (Valiant 84, Pitt 89)
• C a set of languages
• H a set of hypotheses
• ε > 0 and δ > 0
• L ∈ C
• h ∈ H

h is AC (approximately correct)* iff
Pr_D[h(x) ≠ L(x)] < ε
(*for some specific ε)

h is PAC* (probably approximately correct) iff
Pr_D[h(x) ≠ L(x)] < ε with probability at least 1 − δ
(*for some specific ε and δ)

Errors
[Figure: the target L and the hypothesis h as overlapping subsets of Σ*.] We want Pr_D(1(L) ⊕ 1(h)) < ε, i.e. the weight under D of the symmetric difference between L and h must be less than ε.

The oracle EX
• The examples may cost, but…
• (X, D): the set of examples X together with a distribution D.
• Denote by n the size of an example.
• EX(L, D) returns, in time at most O(n), a pair <x, L(x)>.
• Simplifying, we just write EX.

The class C is PAC-learnable by H iff there exists a (maybe probabilistic) algorithm a that, using EX, obtains, ∀ε > 0, ∀δ > 0, ∀L ∈ C and for any distribution D over Σ*, a PAC hypothesis h ∈ H.
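The oracle EX and the error criterion Pr_D[h(x) ≠ L(x)] < ε are easy to simulate. A minimal sketch, assuming D is uniform over strings of length at most n on {a, b}; the distribution, the targets and all names are our illustration:

```python
import itertools
import random

# Simulate EX(L, D): draw x according to D, return the pair (x, L(x)).
# Assumption: D is uniform over all strings of length <= max_len on {a, b}.
def make_ex(target, max_len=4, alphabet="ab"):
    universe = [''.join(w) for k in range(max_len + 1)
                for w in itertools.product(alphabet, repeat=k)]
    def ex():
        x = random.choice(universe)
        return x, target(x)
    return ex

# Empirical stand-in for the AC criterion: estimate
# Pr_D[h(x) != L(x)] from m calls to EX.
def empirical_error(h, ex, m=10_000):
    return sum(h(x) != label for x, label in (ex() for _ in range(m))) / m

target = lambda s: s.count('a') % 2 == 0  # L: even number of a's
h = lambda s: True                        # a (bad) constant hypothesis
ex = make_ex(target)
print(empirical_error(h, ex))             # roughly Pr_D[L(x) == False]
```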
The class C is polynomially PAC-learnable by H if C is PAC-learnable by H and if, for any L ∈ C, a returns a PAC solution in time polynomial in 1/ε, 1/δ, ∥r(L)∥ and n.

3.3 PAC prediction
The class C is polynomially PAC-predictable if there is a class H such that C is polynomially PAC-learnable by H.

PAC and GI
• PAC learning DFA is still an open problem, but it is believed to be impossible because of:
– the intractability of the minimum consistency problem (Gold 78)
– the hardness of prediction due to cryptographic limitations (Kearns & Valiant 89)
– the hardness of learning with equivalence queries (Pitt 89, Angluin 87)

Some observations
• Different variants:
– PAC-identifiable: ε = 0
– EX-pos, EX-neg
• The case where C is PAC-learnable by C itself: this is the usual case for positive results, but it is not that useful in the negative case.

3.4 Active learning
• Idea: the learner can interrogate a master (an oracle):
– the oracle must answer correctly;
– the oracle may choose the worst of the correct answers.

Active learning and GI
• 2 lectures on the subject.
• With poor queries, one cannot learn anything;
• with strong queries, one can learn DFA.
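The "strong queries" with which DFA become learnable are, in Angluin's setting, membership and equivalence queries. A minimal sketch of such a teacher, including its freedom to return the worst correct answer; the interface and the finite test universe are our illustration, only the two query types come from the slides:

```python
# A teacher answering the two strong query types of Angluin (87).
# Assumption: the target is a membership function plus a finite
# "test universe" standing in for Sigma* when hunting counter-examples.
class Teacher:
    def __init__(self, target, universe):
        self.target = target      # membership function of L
        self.universe = universe  # finite stand-in for Sigma*

    def membership_query(self, x):
        return self.target(x)     # must answer correctly

    def equivalence_query(self, h):
        # The adversarial teacher may return ANY counter-example,
        # including the least helpful one -- here, a longest.
        bad = [x for x in self.universe if h(x) != self.target(x)]
        return None if not bad else max(bad, key=len)

teacher = Teacher(lambda s: s.count('a') % 2 == 0,
                  ["", "a", "b", "ab", "ba", "aa", "aab"])
print(teacher.membership_query("aa"))             # True
print(teacher.equivalence_query(lambda s: True))  # "ab": a counter-example
```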
3.5 Learning from a teacher
• Idea: the teacher can choose some good examples.
– All examples are given at the beginning.
– To avoid cheating (collusion), these examples are mixed with other, less useful ones.

Intermediate model
• Identification from a characteristic sample:
– the algorithm must be polynomial, and…
– every concept must admit a polynomial characteristic sample.
• Related to learning from a teacher: a set of models for the harder classes (Goldman, Mathias, …).

Identification in the limit from polynomial time and data
An algorithm a identifies C in the limit from polynomial time and data iff:
1) given a sample <X+, X−> of size m, a returns an h ∈ H consistent with <X+, X−> in time O(p(m));
2) for any r(L) of size n, there exists a characteristic sample <CS+, CS−> of size at most q(n) such that, on any <X+, X−> with CS+ ⊆ X+ and CS− ⊆ X−, a returns an h of size at most q(∥r(L)∥) equivalent to L.
[Figure: ϕ maps <X+, X−> to h in time p(∥X+∥ + ∥X−∥); if <CS+, CS−> ⊆ <X+, X−>, then h ≡ L.]

A theorem by Gold (1978)
• DFA are identifiable in the limit from polynomial time and data.
• Alternative results and algorithms:
– Trakhtenbrot & Barzdin 73
– Oncina & García 92
– Lang 92

By morphisms, the result may extend to:
• even linear grammars (Takada 88 & 94; Sempere & García 94; Mäkinen 96)
• total subsequential functions (Oncina, García & Vidal 93)
• context-free grammars from skeletons (Sakakibara 90)
• tree automata (Knuutila 94)
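The two conditions of the definition translate into a testable contract. A minimal sketch, where `learn` is a deliberately naive consistent learner (rote learning), not Gold's DFA algorithm; it satisfies condition (1) trivially, while real algorithms such as Gold's also satisfy (2) with a polynomial-size characteristic sample:

```python
# The contract behind "identification from polynomial time and data":
# (1) the learner returns, in polynomial time, an h consistent with <X+, X->;
# (2) once a characteristic sample is included, the output is equivalent to L.

def learn(x_pos, x_neg):
    positives = set(x_pos)           # rote learning: h accepts exactly X+
    return lambda s: s in positives

def is_consistent(h, x_pos, x_neg):
    return all(h(x) for x in x_pos) and not any(h(x) for x in x_neg)

x_pos, x_neg = ["", "aa", "abab"], ["a", "ab"]
h = learn(x_pos, x_neg)
assert is_consistent(h, x_pos, x_neg)  # condition (1) holds
print(h("aa"), h("ab"))                # True False
```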