

  1. Towards Wide-Coverage Semantics for French Richard Moot LaBRI (CNRS), SIGNES (INRIA) & U. Bordeaux CAuLD, 13 December 2010, Nancy. Research partially funded by grants from the Conseil Régional d’Aquitaine: “Itipy” and “Grammaire du Français”

  2. Introduction • A bridge between statistical NLP and syntax/semantics, the way I (and many people here) like it! Don’t worry, this will not be a talk of the style “I improved on task X from Y% to Y+2%”. There will be some percentages, but just to show we are up to the level of some of the statistical NLP guys. • Many wide-coverage parsers for French exist (witness the participation in the EASy and Passage evaluation campaigns) • My goal is not directly to compete with them, but to move towards a wide-coverage parser which produces structures which are more interesting (at least to me!) than shared forests

  3. Introduction • I will talk about my current research on a wide-coverage categorial grammar for French. • As we all know, a categorial parse corresponds to a lambda-term in the simply typed lambda calculus.

  4. Introduction • So sentences analysed with this grammar correspond to lambda terms. • Since the work of Montague, we know that the simply typed lambda calculus forms a solid base for the semantic analysis of fragments of natural language.
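The correspondence between categorial combination and lambda-term application can be sketched in a few lines of Python. This is a toy illustration, not Grail's implementation; the category encoding and the lexical entries (including the 'iota' determiner semantics) are invented for the example:

```python
# Toy categorial combination: a category is an atom (a string) or a tuple
# ('/', result, argument) resp. ('\\', argument, result); a lexical entry
# pairs a category with a semantic value, and syntactic application on
# categories is function application on the semantic terms.

def combine(left, right):
    (c1, m1), (c2, m2) = left, right
    if isinstance(c1, tuple) and c1[0] == '/' and c1[2] == c2:
        return (c1[1], m1(m2))            # A/B applied to B gives A
    if isinstance(c2, tuple) and c2[0] == '\\' and c2[1] == c1:
        return (c2[2], m2(m1))            # A followed by A\B gives B
    raise ValueError('no rule applies')

# Hypothetical entries for "la monnaie": la := np/n, monnaie := n.
la = (('/', 'np', 'n'), lambda n: ('iota', n))
monnaie = ('n', 'monnaie')

cat, term = combine(la, monnaie)
# cat == 'np', term == ('iota', 'monnaie')
```

The point is simply that once each word carries a typed term, the parse itself assembles the sentence meaning; no separate semantic construction step is needed.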

  5. Introduction • However, we are by no means limited to Montague semantics: Muskens (1994) and de Groote (2006) show that the semantics of categorial grammars are compatible with modern theories of dynamic semantics (DRT in the case of Muskens, and a continuation-based approach in the case of de Groote)

  6. Introduction • In this talk I will present the Grail parser and the development of a wide- coverage grammar of French as well as the development of two prototype semantic lexicons: • one producing DRSs • one producing de Groote-style continuation semantics
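For the DRS-producing lexicon, a discourse representation structure can be sketched as a pair of a universe (discourse referents) and a set of conditions, with merge as the basic combinator. This datatype is a minimal illustration invented here, not the prototype lexicon's actual encoding:

```python
from dataclasses import dataclass

@dataclass
class DRS:
    referents: list   # the universe of the box
    conditions: list  # atomic conditions over the referents

    def merge(self, other):
        """DRS merge: union the universes and the condition sets."""
        return DRS(self.referents + other.referents,
                   self.conditions + other.conditions)

# "une monnaie ... elle est responsable": after merging, the pronoun's
# condition can pick up the referent introduced in the first box.
box = DRS(['x'], ['monnaie(x)']).merge(DRS([], ['responsable(x)']))
# box == DRS(referents=['x'], conditions=['monnaie(x)', 'responsable(x)'])
```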

  7. Introduction • Wide-coverage semantics in this sense is a relatively new field, which was pioneered for English by Bos et al. (2004)

  8. Overview • Grammar Extraction - converting a corpus into categorial grammar - how to use this grammar for parsing • Semantics

  9. Grammar Extraction From the Paris VII corpus to a categorial lexicon, while developing several taggers

  10. Grammar Extraction • Grammar extraction is the conversion of a linguistically annotated corpus (in our case, the Paris VII treebank) into a grammar formalism the people doing the conversion really like (in our case, categorial grammar)

  11. The Paris VII Corpus • A small sentence fragment of the Paris VII corpus, which suffices to illustrate the extraction procedure: “la monnaie dont elle est responsable”, annotated as (NP (DET la) (NC monnaie) (Srel (PP-de-obj (PROREL dont)) (VN (CLS-SUJ elle) (V est)) (AP-ats (ADJ responsable))))

  12. The extraction algorithm 1. Binarize the annotation: the ternary NP (DET, NC, Srel) is split into binary branches, giving (NP (DET la) (NC (NC monnaie) Srel)), with the Srel subtree (PP VN AP) still unchanged.

  13. The extraction algorithm 1. Binarize the annotation (continued): the relative clause is binarized in the same way, giving (Srel (PP-de-obj (PROREL dont)) (Srel (VN (CLS-SUJ elle) (V est)) (AP-ats (ADJ responsable))))

  14. The extraction algorithm 1. Binarize the annotation, inserting traces for wh-words: the PROREL “dont” becomes the left daughter of Srel, and a PP-DE trace (ε) is inserted at the extraction site, next to “responsable”.
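Step 1 above can be sketched as a right-branching binarizer over (label, children) trees. This is a simplification of the procedure on the slides: the intermediate node here just reuses the parent label, and trace insertion for wh-words is not shown; the tree encoding is invented for the sketch:

```python
def binarize(tree):
    """Right-branching binarization: a node with more than two daughters
    is split so its first daughter combines with a new node covering the
    remaining daughters (labelled with the parent label here, a
    simplification of the labelling used in the talk)."""
    if not isinstance(tree, tuple):
        return tree  # a leaf, i.e. a word
    label, children = tree
    children = [binarize(c) for c in children]
    if len(children) > 2:
        return (label, [children[0], binarize((label, children[1:]))])
    return (label, children)

# Binarizing the flat NP of slide 11 (Srel abbreviated):
binarized = binarize(('NP', [('DET', ['la']),
                             ('NC', ['monnaie']),
                             ('Srel', [('PP', ['dont'])])]))
```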

  15. The extraction algorithm 2. Assign formulas, starting from the binarized tree with the PP-DE trace.

  16. The extraction algorithm 2. Assign formulas: the root NP receives np.

  17. The extraction algorithm 2. Assign formulas: la := np/n, and the remaining NC node receives n.

  18. The extraction algorithm 2. Assign formulas: monnaie := n, and the Srel node receives n\n.

  19. The extraction algorithm 2. Assign formulas: dont := (n\n)/(s/pp_de), so its sister node receives s/pp_de, a sentence missing a pp_de argument (in the extracted grammar this slash carries mode index 32, which licenses the extraction).

  20. The extraction algorithm 2. Assign formulas: with the trace supplying the pp_de, the body of the relative clause receives s.

  21. The extraction algorithm 2. Assign formulas: elle := np, and the VN node receives np\s.

  22. The extraction algorithm 2. Assign formulas: est := (np\s)/(n\n), and the AP node receives n\n.

  23. The extraction algorithm 2. Assign formulas: responsable := (n\n)/pp_de, and the PP-DE trace receives pp_de.
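The assignment steps on slides 16-23 follow one top-down scheme: give a node its goal formula, give the argument daughter an atomic formula, and give the functor daughter the matching slash formula. A minimal sketch, assuming the argument daughter is always a leaf whose category maps to an atom (the node encoding and all names are invented):

```python
def assign(tree, goal, atoms, lex):
    """Propagate formulas top-down through a binarized tree.
    `tree` is ('leaf', cat, word) or ('node', head, left, right) with
    head in {'l', 'r'} marking the functor daughter; `atoms` maps the
    category of an argument leaf to an atomic formula; the resulting
    word-to-formula lexicon accumulates in `lex`."""
    if tree[0] == 'leaf':
        _, cat, word = tree
        lex[word] = goal
        return lex
    _, head, left, right = tree
    if head == 'l':                        # functor on the left: goal/arg
        arg = atoms[right[1]]              # argument assumed to be a leaf
        assign(left, ('/', goal, arg), atoms, lex)
        assign(right, arg, atoms, lex)
    else:                                  # functor on the right: arg\goal
        arg = atoms[left[1]]
        assign(left, arg, atoms, lex)
        assign(right, ('\\', arg, goal), atoms, lex)
    return lex

# "la monnaie" with goal np: la becomes np/n and monnaie becomes n.
lex = assign(('node', 'l', ('leaf', 'DET', 'la'), ('leaf', 'NC', 'monnaie')),
             'np', {'NC': 'n'}, {})
```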

  24. Grammar Extraction • A lot of useful information (such as the position of “traces” of extracted elements) is not annotated in the corpus but is very useful for the grammar, and needs to be added by hand. • In addition, the extracted grammar has undergone a very significant amount of manual cleanup

  25. The extracted grammar • On the basis of the 382,145 words and 12,822 sentences of the treebank, the extraction algorithm extracts 883 different formulas, of which 664 occur more than once. • Many frequent words are assigned many different formulas • This is a significant bottleneck for parsing
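The figures above are plain counts over the extracted lexicon; the bookkeeping can be sketched as follows (the entries are invented toy data, not the treebank's actual counts):

```python
from collections import Counter, defaultdict

# Toy extracted lexicon: (word, formula) pairs as they would come out of
# the extraction algorithm.
entries = [('la', 'np/n'), ('la', 'np/n'), ('de', '(n\\n)/np'),
           ('de', 'pp_de/np'), ('de', '(n\\n)/np')]

# How often each formula occurs, and how many distinct formulas each
# word receives (the lexical-ambiguity figure of the next slides).
formulas = Counter(f for _, f in entries)
per_word = defaultdict(set)
for w, f in entries:
    per_word[w].add(f)
ambiguity = {w: len(fs) for w, fs in per_word.items()}
```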

  26. The extracted grammar • An illustration of some of the most ambiguous words and part-of-speech tags (number of distinct formulas assigned to each):

  Word   POS    #       POS    #
  et     conj   71      adv    206
  ,      ponct  62      verb   175
  à      prp    55      prp    149
  plus   adv    44      conj   92
  ou     conj   42      ponct  89
  est    verb   39
  être   inf    36
  en     prp    34
  a      verb   31

  27. The extracted grammar • Formula assignments to the present tense form “fait” • 124 occurrences in the corpus, with 19 different formulas assigned to it; the most frequent include (np\s)/np, ((np\s)/pp_de)/np, (np\s)/(np/s_inf), ((np\s)/pp_a)/np and ((np\s)/np)/(np\s_inf).

  28. The extracted grammar • Formula assignments to the comma “,” • 21,398 occurrences, 62 different formulas. The chart groups them as: no formula, (np\np)/np, (n\n)/n, (np\np)/n, (s\s)/s, ((np\s)\(np\s))/(np\s), ((n\n)\(n\n))\(n\n) and other, with the largest slice at 75.3% and smaller slices of 8.6%, 5.2%, 3.1%, 2.8%, 1.8%, 1.7% and 1.4%.

  29. The extracted grammar • To sum up, we have produced a categorial grammar for French, which is essentially a very big lexicon. • The size of this lexicon, coupled with high lexical ambiguity, makes direct exploitation for parsing difficult. • A fairly standard solution is to use a supertagger to estimate the most likely sequence of formulas for the given words.

  30. Supertagging • Supertagging is essentially part-of-speech tagging, but with richer structure, hence “super” tags. • Like part-of-speech tagging, we use superficial contextual information and statistical estimation to decide the most likely tag.

  31. Supertagging • So what is the context for a supertagger? • Typically, it consists of the current word, the surrounding words, the current and surrounding POS tags and the previous supertags. Example context for “de” in “la voiture de Prince Charles”: POS tags DET NC P NPP NPP, with supertags np/n and n already assigned to the two preceding words.
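The context described on this slide translates into a feature function along these lines (the feature names and the padding convention are invented for the sketch):

```python
def features(i, words, pos, prev_supertags):
    """Context features for position i: the current word, the surrounding
    words, the current and surrounding POS tags, and the supertag
    assigned to the previous word."""
    pad = lambda seq, j: seq[j] if 0 <= j < len(seq) else '<none>'
    return {
        'word': words[i],
        'word-1': pad(words, i - 1), 'word+1': pad(words, i + 1),
        'pos': pos[i],
        'pos-1': pad(pos, i - 1), 'pos+1': pad(pos, i + 1),
        'supertag-1': pad(prev_supertags, i - 1),
    }

# The slide's example: predicting a supertag for "de" at position 2.
f = features(2, ['la', 'voiture', 'de', 'Prince', 'Charles'],
             ['DET', 'NC', 'P', 'NPP', 'NPP'],
             ['np/n', 'n'])
```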

  32. Supertagging • The basic procedure for finding the sequence of formulas then becomes: • Find the correct POS tag sequence • Find the correct supertag sequence

  33. Supertagging • Estimation is done using maximum entropy models • Very standard and easy to modify (i.e. we can add any information we think is useful and let the estimation algorithm decide which ones really are; any information which we can easily obtain, of course: if we think a word having an even number of letters is useful, we can add it). • Good performance and efficient training (Clark & Curran 2004).
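A maximum entropy model over such features is a log-linear distribution over candidate supertags. A minimal sketch with plain stochastic gradient ascent on the conditional log-likelihood; this is nothing like the actual Clark & Curran training setup, and the feature and tag names are invented:

```python
import math
from collections import defaultdict

def p_tag(weights, feats, tags):
    """Maximum entropy model with binary features:
    p(t | x) is proportional to exp(sum of w[f, t] over active f)."""
    scores = {t: math.exp(sum(weights[(f, t)] for f in feats)) for t in tags}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

def update(weights, feats, gold, tags, lr=0.5):
    """One stochastic gradient step on the log-likelihood: the gradient
    for (f, t) is the observed minus the expected feature count."""
    p = p_tag(weights, feats, tags)
    for t in tags:
        grad = (1.0 if t == gold else 0.0) - p[t]
        for f in feats:
            weights[(f, t)] += lr * grad

# Toy run: two candidate supertags for "de", one observed as gold; a few
# updates push the model towards the observed tag.
weights = defaultdict(float)
tags = ['np/n', 'pp_de/np']
feats = {'word=de', 'pos=P'}
for _ in range(10):
    update(weights, feats, 'pp_de/np', tags)
```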
