Polynomial learning from positive and negative examples

Acknowledgements
• Laurent Miclet, Jose Oncina and Tim Oates for previous versions of these slides.
• Rafael Carrasco, Paco Casacuberta, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Thierry Murgue, Franck Thollard, Enrique Vidal, Frédéric Tantini, …
• The list is necessarily incomplete. Apologies to those who have been forgotten.
http://eurise.univ-st-etienne.fr/~cdlh/slides

Outline
1. The problem
2. Notations
3. Models
4. Proof techniques
5. Conclusion

1 The problem
• In a general way: to learn a language (belonging to some class C) from examples, and perhaps also from:
– counter-examples
– queries to an oracle
– specific knowledge
• Once the program is written we would like:
– to say that it is correct
– or to prove that no correct program can be written

What does 'correct' mean?
• We need a goal:
– L is a target (unknown). The harder L is, the harder it is going to be to learn.
• Learn what?
– find a representation of L
– or find some reasonable approximation of L (but what is a reasonable approximation?)

Representation of L
• We are going to have to fix some representation of L;
• we denote by r(L) this ideal representation of L;
• and ∥r(L)∥ is the size of this representation,
• or at least some polynomial measure of the number of bits needed to encode r(L).
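To make ∥r(L)∥ concrete, here is a minimal sketch in Python, assuming the class of regular languages represented by DFAs; the class name, the chosen encoding and the size measure are our illustration, not part of the slides.

```python
# A toy DFA representation r(L) for a regular language L,
# plus a rough measure of its encoding size in bits.
# Illustrative assumptions: alphabet, state set and encoding are ours.
from math import ceil, log2

class DFA:
    def __init__(self, states, alphabet, delta, start, finals):
        self.states = states          # e.g. {0, 1}
        self.alphabet = alphabet      # e.g. {'a', 'b'}
        self.delta = delta            # dict: (state, symbol) -> state
        self.start = start
        self.finals = finals          # set of accepting states

    def size_in_bits(self):
        # One possible polynomial measure: each transition target needs
        # about log2(|Q|) bits, and there are |Q| * |Sigma| transitions.
        q = max(len(self.states), 2)
        return len(self.states) * len(self.alphabet) * ceil(log2(q))

# L = strings over {a, b} with an even number of a's
even_a = DFA(states={0, 1}, alphabet={'a', 'b'},
             delta={(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1},
             start=0, finals={0})
print(even_a.size_in_bits())  # a stand-in for ||r(L)||
```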
How long can it take?
• Ideally, we would like to bound:
– the global time to learn/find the target
– the update time
– the number of errors before converging
– the number of queries
– the number of good examples needed

What about efficiency?
• We can try to bound the number of examples by p(∥r(L)∥): with that many examples we are sure of finding the target;
• more interesting: with p(∥r(L)∥) examples drawn according to some distribution D, we will be nearly sure of finding a grammar/classifier that will be nearly always right… according to D (the PAC model: Valiant 84).

2 General Notations
[Figure: the target L ⊆ Σ*, its representation r(L) in the class C, and the hypothesis h in the class H; exact identification is written r(L) ≡ h, approximation r(L) ≈ h.]

The examples
[Figure: an example x ∈ Σ* is labelled by L(x) ∈ {0, 1}; the learner answers with h(x).]

The classes C and H
• sets of examples (subsets of Σ*)
• representations of these sets
• the computation of r(L)(x) (and of h(x)) must take place in time polynomial in |x|

How do we consider a finite set?
[Figure: to cope with the infinite set Σ*, restrict to Σ^n or Σ^{≤n}, or draw strings according to a distribution D and only require a probability of error Pr_D < ε.]
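The requirement that r(L)(x) and h(x) be computable in time polynomial in |x| is easy to meet for automata: membership takes one transition per symbol. A minimal sketch, reusing the toy DFA class above (the function name `accepts` is ours):

```python
# Membership r(L)(x): one transition per symbol, so time O(|x|).
# Assumes the toy DFA class sketched earlier.
def accepts(dfa, x):
    state = dfa.start
    for symbol in x:
        state = dfa.delta[(state, symbol)]  # constant-time table lookup
    return state in dfa.finals

print(accepts(even_a, "abab"))  # True: even number of a's
print(accepts(even_a, "ab"))    # False: odd number of a's
```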
3 Some models/paradigms
• Identification in the limit
– identification in the limit with probability 1
– PAC identification
– simple PAC
– …
• and also:
– PAC learnability
– PAC predictability
– learning with a teacher (with different teaching models)

3.1 Identification in the limit (Gold 67, 78)
Protocol: we have a presentation x1, x2, …, xn, … of some language L; after each new element the learner outputs a hypothesis: h1, h2, …, hn, …
Two kinds of presentations:
• ∀x ∈ L, x appears in the presentation (learning from text: positive presentation);
• ∀x ∈ Σ*, the pair <x, L(x)> appears in the presentation (learning from informant).

C is identifiable in the limit iff
∀L ∈ C and ∀ presentation of L, there is a point n from which the hypotheses no longer change (hi ≡ hn for all i ≥ n) and hn is equivalent to r(L).

Main results (Gold 67)
• No super-finite class (one containing all finite languages and at least one infinite one, e.g. the regular languages) is identifiable from text;
• any recursively enumerable class is identifiable from an informant.

Main results in GI
• All well-known classes of languages can be identified from complete presentations;
• no usual class of languages can be identified from positive presentations.
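Gold's positive result for informants can be made concrete with the classic identification-by-enumeration learner: after each labelled example, output the first language in the enumeration consistent with all data seen so far. A minimal sketch, assuming the class is given as an explicit (here finite) list of membership functions; all names and the toy class are illustrative:

```python
# Identification in the limit by enumeration (informant setting).
# Once the target's index is reached, the hypothesis never changes again.
def learner(cls, presentation):
    """cls: list of membership functions; presentation: iterable of (x, label)."""
    seen = []
    for x, label in presentation:
        seen.append((x, label))
        # first enumerated language consistent with all data so far
        for i, lang in enumerate(cls):
            if all(lang(s) == b for s, b in seen):
                yield i  # current hypothesis h_n
                break

# Toy class: L0 = even number of a's, L1 = strings ending in 'b'
cls = [lambda s: s.count('a') % 2 == 0, lambda s: s.endswith('b')]
informant = [("b", True), ("a", False), ("ab", True)]  # target: L1
print(list(learner(cls, informant)))  # [0, 0, 1] -- converges to index 1
```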
3.2 PAC learning (Valiant 84, Pitt 89)
• C a set of languages
• H a set of hypotheses
• ε > 0 and δ > 0
• L ∈ C
• h ∈ H

h is AC (approximately correct)* iff
Pr_D[h(x) ≠ L(x)] < ε
(*for some specific ε)

h is PAC* (probably approximately correct) iff
Pr_D[h(x) ≠ L(x)] < ε with probability at least 1 − δ
(*for some specific ε and δ)

Errors
[Figure: the target L and the hypothesis h as overlapping subsets of Σ*.] We want Pr_D(1(L) ⊕ 1(h)) < ε, i.e. the weight under D of the symmetric difference between L and h must be less than ε.

The oracle EX
• The examples may cost, but…
• (X, D): the set of examples X together with a distribution D.
• Denote by n the size of an example.
• EX(L, D) returns, in time at most O(n), a pair <x, L(x)>.
• Simplifying, we just write EX.

The class C is PAC-learnable by H iff there exists a (maybe probabilistic) algorithm a that, using EX, obtains, ∀ε > 0, ∀δ > 0, ∀L ∈ C and for any distribution D over Σ*, a PAC hypothesis h ∈ H.
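The oracle EX and the error criterion Pr_D[h(x) ≠ L(x)] < ε are easy to simulate. A minimal sketch, assuming D is uniform over strings of length at most n on {a, b}; the distribution, the targets and all names are our illustration:

```python
import itertools
import random

# Simulate EX(L, D): draw x according to D, return the pair (x, L(x)).
# Assumption: D is uniform over all strings of length <= max_len on {a, b}.
def make_ex(target, max_len=4, alphabet="ab"):
    universe = [''.join(w) for k in range(max_len + 1)
                for w in itertools.product(alphabet, repeat=k)]
    def ex():
        x = random.choice(universe)
        return x, target(x)
    return ex

# Empirical stand-in for the AC criterion: estimate
# Pr_D[h(x) != L(x)] from m calls to EX.
def empirical_error(h, ex, m=10_000):
    return sum(h(x) != label for x, label in (ex() for _ in range(m))) / m

target = lambda s: s.count('a') % 2 == 0  # L: even number of a's
h = lambda s: True                        # a (bad) constant hypothesis
ex = make_ex(target)
print(empirical_error(h, ex))             # roughly Pr_D[L(x) == False]
```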
The class C is polynomially PAC-learnable by H if C is PAC-learnable by H and if, for any L ∈ C, a returns a PAC solution in time polynomial in 1/ε, 1/δ, ∥r(L)∥ and n.

3.3 PAC prediction
The class C is polynomially PAC-predictable if there is a class H such that C is polynomially PAC-learnable by H.

PAC and GI
• PAC learning DFA is still an open problem, but it is believed to be impossible because of:
– the intractability of the minimum consistency problem (Gold 78)
– the hardness of prediction due to cryptographic limitations (Kearns & Valiant 89)
– the hardness of learning with equivalence queries (Pitt 89, Angluin 87)

Some observations
• Different variants:
– PAC-identifiable: ε = 0
– EX-pos, EX-neg
• The case where C is PAC-learnable by C itself: this is the usual case for positive results, but it is not that useful in the negative case.

3.4 Active learning
• Idea: the learner can interrogate a master (an oracle):
– the oracle must answer correctly;
– the oracle may choose the worst of the correct answers.

Active learning and GI
• 2 lectures on the subject.
• With poor queries, one cannot learn anything;
• with strong queries, one can learn DFA.
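The "strong queries" with which DFA become learnable are, in Angluin's setting, membership and equivalence queries. A minimal sketch of such a teacher, including its freedom to return the worst correct answer; the interface and the finite test universe are our illustration, only the two query types come from the slides:

```python
# A teacher answering the two strong query types of Angluin (87).
# Assumption: the target is a membership function plus a finite
# "test universe" standing in for Sigma* when hunting counter-examples.
class Teacher:
    def __init__(self, target, universe):
        self.target = target      # membership function of L
        self.universe = universe  # finite stand-in for Sigma*

    def membership_query(self, x):
        return self.target(x)     # must answer correctly

    def equivalence_query(self, h):
        # The adversarial teacher may return ANY counter-example,
        # including the least helpful one -- here, a longest.
        bad = [x for x in self.universe if h(x) != self.target(x)]
        return None if not bad else max(bad, key=len)

teacher = Teacher(lambda s: s.count('a') % 2 == 0,
                  ["", "a", "b", "ab", "ba", "aa", "aab"])
print(teacher.membership_query("aa"))             # True
print(teacher.equivalence_query(lambda s: True))  # "ab": a counter-example
```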
3.5 Learning from a teacher
• Idea: the teacher can choose some good examples.
– All examples are given at the beginning.
– To avoid cheating (collusion), these examples are mixed with other, less useful ones.

Intermediate model
• Identification from a characteristic sample:
– the algorithm must be polynomial, and…
– every concept must admit a polynomial characteristic sample.
• Related to learning from a teacher: a set of models for the harder classes (Goldman, Mathias, …).

Identification in the limit from polynomial time and data
An algorithm a identifies C in the limit from polynomial time and data iff:
1) given a sample <X+, X−> of size m, a returns an h ∈ H consistent with <X+, X−> in time O(p(m));
2) for any r(L) of size n, there exists a characteristic sample <CS+, CS−> of size at most q(n) such that, on any <X+, X−> with CS+ ⊆ X+ and CS− ⊆ X−, a returns an h of size at most q(∥r(L)∥) equivalent to L.
[Figure: ϕ maps <X+, X−> to h in time p(∥X+∥ + ∥X−∥); if <CS+, CS−> ⊆ <X+, X−>, then h ≡ L.]

A theorem by Gold (1978)
• DFA are identifiable in the limit from polynomial time and data.
• Alternative results and algorithms:
– Trakhtenbrot & Barzdin 73
– Oncina & García 92
– Lang 92

By morphisms, the result may extend to:
• even linear grammars (Takada 88 & 94; Sempere & García 94; Mäkinen 96)
• total subsequential functions (Oncina, García & Vidal 93)
• context-free grammars from skeletons (Sakakibara 90)
• tree automata (Knuutila 94)
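The two conditions of the definition translate into a testable contract. A minimal sketch, where `learn` is a deliberately naive consistent learner (rote learning), not Gold's DFA algorithm; it satisfies condition (1) trivially, while real algorithms such as Gold's also satisfy (2) with a polynomial-size characteristic sample:

```python
# The contract behind "identification from polynomial time and data":
# (1) the learner returns, in polynomial time, an h consistent with <X+, X->;
# (2) once a characteristic sample is included, the output is equivalent to L.

def learn(x_pos, x_neg):
    positives = set(x_pos)           # rote learning: h accepts exactly X+
    return lambda s: s in positives

def is_consistent(h, x_pos, x_neg):
    return all(h(x) for x in x_pos) and not any(h(x) for x in x_neg)

x_pos, x_neg = ["", "aa", "abab"], ["a", "ab"]
h = learn(x_pos, x_neg)
assert is_consistent(h, x_pos, x_neg)  # condition (1) holds
print(h("aa"), h("ab"))                # True False
```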