Grammatical inference: an introduction
Colin de la Higuera, University of Nantes
Nantes (photo: Wikipedia)
Colin de la Higuera, Nantes 2013
Acknowledgements
• Pieter Adriaans, Hasan Ibne Akram, Anne-Muriel Arigon, Leo Becerra-Bonache, Cristina Bibire, Alex Clark, Rafael Carrasco, Paco Casacuberta, Pierre Dupont, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Jeffrey Heinz, Jean-Christophe Janodet, Satoshi Kobayashi, Laurent Miclet, Thierry Murgue, Tim Oates, Jose Oncina, Frédéric Tantini, Franck Thollard, Sicco Verwer, Enrique Vidal, Menno van Zaanen,...
http://pagesperso.lina.univ-nantes.fr/~cdlh/
http://videolectures.net/colin_de_la_higuera/
Practical information
• Grammatical Inference is module X9IT050
• 18 hours
• http://pagesperso.lina.univ-nantes.fr/~cdlh/X9IT050.html
• Exam: to be decided
Some useful links
• Grammatical Inference Software: the Repository https://logiciels.lina.univ-nantes.fr/redmine/projects/gisr/wiki
• Talks on http://videolectures.net
• A book
• Articles
• Start here: http://pagesperso.lina.univ-nantes.fr/~cdlh/X9IT050.html
What I plan to talk about
1. 11/9/2013 An introduction to grammatical inference. About what learning a language means, how we can measure success
2. 18/9/2013 An introduction to grammatical inference. A motivating example
3. 25/9/2013 Learning: identifying or approximating?
4. 2/10/2013 Learning from text
5. 9/10/2013 Learning from text: the window languages
6. 16/10/2013 Learning from an informant: the RPNI algorithm and variants
7. 23/10/2013 Learning distributions: why? How should we measure success? About distances between distributions
8. 6/11/2013 Learning distributions: learning the weights given a structure. EM, Gibbs sampling and the spectral methods
9. 13/11/2013 Learning distributions: state merging techniques
10. 20/11/2013 Active learning 1: about active learning
11. 27/11/2013 Active learning 2: the MAT algorithm
12. 4/12/2013 Learning transducers
13. 11/12/2013 Learning probabilistic transducers
14. 18/12/2013 Exam
Outline (of this first talk)
1. What is grammatical inference about?
2. Why is it a difficult task?
3. Why is it a useful task?
4. Validation issues
5. Some criteria
1 Grammatical inference is about learning a grammar given information about a language
• Information is strings, trees or graphs
• Information can be (typically)
  - Text: only positive information
  - Informant: labelled data
  - Actively sought (query learning, teaching)
These lists are not exhaustive
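The two presentation modes above can be sketched in a few lines of Python. This is a minimal illustration, not from the slides: the target language (strings over {a, b} containing the substring "ab") and all names are assumptions chosen for the example.

```python
# Illustrative target language: strings containing "ab".
# (An invented example; the learner, of course, does not know this.)
def in_target(w):
    return "ab" in w

# Text presentation: positive examples only.
text_sample = ["ab", "aab", "bab"]

# Informant presentation: labelled examples (string, belongs-to-language?).
informant_sample = [("ab", True), ("ba", False), ("", False), ("abb", True)]

def consistent(hypothesis, sample):
    """Does a hypothesis agree with every labelled example of an informant?"""
    return all(hypothesis(w) == label for w, label in sample)
```

With an informant, a candidate grammar can at least be checked for consistency with the data; with text alone, even that check only rules out hypotheses that reject a positive example.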
The functions/goals
• Languages and grammars from the Chomsky hierarchy
• Probabilistic automata and context-free grammars
• Hidden Markov Models
• Patterns
• Transducers
The Chomsky hierarchy
Regular languages ⊂ Context-free languages ⊂ Context-sensitive languages ⊂ Recursively enumerable languages
The Chomsky hierarchy revisited
• Regular languages
  - Recognized by DFAs, NFAs
  - Generated by regular grammars
  - Described by regular expressions
• Context-free languages
  - Generated by context-free grammars
  - Recognized by pushdown (stack) automata
• Context-sensitive languages
  - Generated by context-sensitive grammars (parsing is not in P)
• Recursively enumerable languages
  - Recognized by Turing machines (parsing is undecidable)
Other formalisms
• Topological formalisms
• Semilinear languages
• Hyperplanes
• Balls of strings
Distributions of strings
• A probabilistic automaton defines a distribution over strings
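A minimal sketch of how such a distribution assigns a probability to each string, assuming a one-state deterministic probabilistic automaton with per-state stopping probabilities (the automaton and its numbers are invented for the example):

```python
# Hypothetical deterministic probabilistic automaton over {a, b}.
# Each state has transition probabilities plus a stopping probability;
# here 0.3 + 0.2 + 0.5 = 1 in the single state, so the probabilities
# of all strings sum to 1 and the automaton is a string generator.
pfa = {
    "start": 0,
    "stop": {0: 0.5},                                   # P(stop | state 0)
    "delta": {(0, "a"): (0.3, 0), (0, "b"): (0.2, 0)},  # (prob, next state)
}

def string_probability(pfa, w):
    """Probability that the automaton generates exactly the string w."""
    state = pfa["start"]
    p = 1.0
    for symbol in w:
        prob, state_next = pfa["delta"][(state, symbol)]
        p *= prob
        state = state_next
    return p * pfa["stop"][state]
```

For instance the empty string gets probability 0.5 and "ab" gets 0.3 × 0.2 × 0.5 = 0.03; learning such a device means learning both the structure and these weights.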
Fuzzy automata
• An automaton will say that string w belongs to the language with probability p
• The difference with probabilistic automata is that
  - The total sum of probabilities may be different from 1 (it may even be infinite)
  - A fuzzy automaton cannot be used as a generator of strings
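One way to make this concrete is the classic max-min semantics: a string's membership degree is the best path's weakest transition. Fuzzy automata admit several semantics, so this choice, like the example automaton itself, is an assumption for illustration only:

```python
# Illustrative fuzzy automaton: each string gets a membership degree in
# [0, 1], but the degrees over all strings need not sum to 1 (here every
# string of a's scores at least 0.6), so unlike a probabilistic automaton
# it cannot serve as a string generator.
fa = {
    "start": 0,
    "final": {1: 1.0},                                   # final-state degrees
    "delta": {(0, "a"): [(0.8, 1)], (1, "a"): [(0.6, 1)]},
}

def membership(fa, w):
    """Max over paths of the min transition weight (max-min semantics)."""
    current = {fa["start"]: 1.0}          # state -> best degree reaching it
    for symbol in w:
        nxt = {}
        for state, deg in current.items():
            for weight, target in fa["delta"].get((state, symbol), []):
                d = min(deg, weight)
                if d > nxt.get(target, 0.0):
                    nxt[target] = d
        current = nxt
    return max(
        (min(deg, fa["final"].get(state, 0.0)) for state, deg in current.items()),
        default=0.0,
    )
```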
The data: examples of strings
A string in Gaelic and its translation to English:
• Tha thu cho duaichnidh ri èarr àirde de a' coisich deas damh
• You are as ugly as the north end of a southward traveling ox
http://www.flickr.com/photos/popfossa/3992549630/
Time series pose the problem of the alphabet:
• An infinite alphabet?
• Discretizing?
• An ordered alphabet?
GIORGIO BERNARDI, REGINA GOURSOT, EDDA RAYKO, RENÉ GOURSOT, BAYA CHERIF-ZAHAR, AND ROBERTA MELIS
http://www.scopenvironment.org/downloadpubs/scope44/chapter05.html
>A BAC=41M14 LIBRARY=CITB_978_SKB
AAGCTTATTCAATAGTTTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTA
GTGGATAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTTTCAAT
GGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTATTGCCATCTTTATTTCA
GAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCCACTTTCCCCAAGCTCACACAGCAAAGA
CACGGGGACACCAGGACTCCATCTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCT
GGCAGGTGACAGAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGG
GCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAGTCGAGTTCCA
GGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAGTCCAAAAACACTAAACTTCAA
CAATATGTTCTTTTGGCTTGCATTTGTGTATAACCGTAATTAAAAAGCAAGGGGACAACA
CACAGTAGATTCAGGATAGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGA
TGGGGAGGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTTTGAA
TGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTTTTTAAAAATATAATA
AAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGTTATGAAAAAAATTCCAACTTTGTGCA
TCCAAGGACCAGATTTTTTTTAAAATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAG
TATTCACTATCAAATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTAC
TAAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGCTACTTGCGA
GGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAGAGGTTGCAGTAGGCCAAGAT
GGCGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACCCTATGTCCAAAAAAAAAAAAA
AAAAAAAGGAAAAGAAAAAGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCC
TGTGTACCCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCTGAC
TAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCACACATATGCATCCA
TGAACAACAGATAGTGGTTTTTGCATGACCTGAAACATTAATGAAATTGTATGATTCTAT
http://bandelestudio.com/tutoriel-mao-sur-la-creation-musicale/
http://fr.wikipedia.org/wiki/Philippe_VI_de_France
<book>
  <part>
    <chapter>
      <sect1/>
      <sect1>
        <orderedlist numeration="arabic">
          <listitem/>
          <f:fragbody/>
        </orderedlist>
      </sect1>
    </chapter>
  </part>
</book>
<?xml version="1.0"?>
<?xml-stylesheet href="carmen.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE pagina [
<!ELEMENT pagina (titulus?, poema)>
<!ELEMENT titulus (#PCDATA)>
<!ELEMENT auctor (praenomen, cognomen, nomen)>
<!ELEMENT praenomen (#PCDATA)>
<!ELEMENT nomen (#PCDATA)>
<!ELEMENT cognomen (#PCDATA)>
<!ELEMENT poema (versus+)>
<!ELEMENT versus (#PCDATA)>
]>
<pagina>
<titulus>Catullus II</titulus>
<auctor>
<praenomen>Gaius</praenomen>
<nomen>Valerius</nomen>
<cognomen>Catullus</cognomen>
</auctor>
And also
• Business processes
• Bird songs
• Images (contours and shapes)
• Robot moves
• Web services
• Malware
• …
2 What does learning mean?
• Suppose we write a program that can learn grammars… are we done?
• A first question is: "why bother?"
• If my program works, why do anything more about it?
• Why should we do something when other researchers in Machine Learning are not?
Motivating reflection #1
• Is 17 a random number?
• Is 0110110110110101011000111101 a random sequence?
• (Is grammar G the correct grammar for a given sample S?)
Motivating reflection #2
• In the case of languages, learning is an ongoing process
• Is there a moment where we can say we have learnt a language?
Motivating reflection #3
• The statement "I have learnt" does not make sense
• The statement "I am learning" makes sense
• At least when learning over infinite spaces
What is usually called "having learnt"
• That the grammar/automaton is the smallest, or the best with respect to some score
  - A combinatorial characterisation
• That some optimisation problem has been solved
• That the "learning" algorithm has converged (EM)
What is not said
• That, having solved some complex combinatorial question, we have an Occam / compression / MDL / Kolmogorov-complexity style argument which gives us some guarantee with respect to the future
• Computational learning theory has such results