Injecting Linguistics into NLP by Annotation Eduard Hovy Information Sciences Institute University of Southern California
Lesson 1: Banko and Brill, HLT-01
• Confusion set disambiguation task: {you're | your}, {to | too | two}, {its | it's}
• 5 algorithms: ngram table, winnow, perceptron, transformation-based learning, decision trees
• Training: 10^6 to 10^9 words
• Lessons:
  – All methods improved to almost the same point
  – A simple method can end up above a complex one
  – Don't waste your time on algorithms and optimization
• Takeaway: you don't have to be smart, you just need enough training data
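The "simple method" in this study can be sketched as a plain ngram table: count which confusion-set member occurred with each surrounding-word context during training, then at test time pick the member most often seen in that context. A minimal sketch, assuming a tokenized corpus and a one-word context window on each side (the corpus and window are illustrative, not the paper's exact setup):

```python
from collections import Counter, defaultdict

CONFUSION_SET = {"its", "it's"}

def train(tokens):
    """Count (left word, right word) contexts for each confusion-set member."""
    table = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok in CONFUSION_SET and 0 < i < len(tokens) - 1:
            context = (tokens[i - 1], tokens[i + 1])
            table[context][tok] += 1
    return table

def disambiguate(table, left, right, default="its"):
    """Pick the member most often seen in this context; back off to a default."""
    counts = table.get((left, right))
    return counts.most_common(1)[0][0] if counts else default

corpus = "the dog wagged its tail because it's happy and its bowl was full".split()
table = train(corpus)
print(disambiguate(table, "wagged", "tail"))  # "its"
```

With 10^9 training words the table grows huge, which is exactly the point of the lesson: the data, not the learner, does the work.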
Lesson 2: Och, ACL-02
• Best MT system in the world (Arabic-English, by BLEU and NIST, 2002–2005): Och's work
• Method: learn ngram correspondence patterns (alignment templates) using MaxEnt (log-linear translation model), trained to maximize BLEU score
  [figure: alignment diagram linking source and target word sequences]
• Approximately: EBMT + Viterbi search
• Lesson: the more you store, the better your MT
• Takeaway: you don't have to be smart, you just need enough storage
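The log-linear model mentioned here scores a candidate translation e of source f as a weighted sum of feature functions, sum_m lambda_m * h_m(e, f), and decoding picks the highest-scoring candidate. A toy sketch of that scoring step; the two features and their weights are invented for illustration, not Och's actual feature set:

```python
def score(candidate, source, features, weights):
    """Log-linear model: weighted sum of feature function values."""
    return sum(w * h(candidate, source) for h, w in zip(features, weights))

# Invented toy features: a length-ratio penalty and a word-overlap reward.
def length_penalty(e, f):
    return -abs(len(e.split()) - len(f.split()))

def overlap(e, f):
    return len(set(e.split()) & set(f.split()))

features, weights = [length_penalty, overlap], [0.5, 1.0]
candidates = ["the house is small", "the house small", "small house"]
source = "das Haus ist klein"
best = max(candidates, key=lambda e: score(e, source, features, weights))
print(best)  # "the house is small"
```

In the real system the feature weights are not hand-set but tuned to maximize BLEU on held-out data.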
Lesson 3: Chiang et al., HLT-2009
• 11,001 New Features for Statistical MT. David Chiang, Kevin Knight, Wei Wang. 2009. Proc. NAACL HLT. Best paper award
• Learn MT rules: NP-C(x0:NPB PP(IN(of) x1:NPB)) <–> x1 de x0
• Several hundred count features of various kinds: reward rules seen more often; punish rules that partly overlap; punish rules that insert is, the, etc. into English …
• 10,000 word context features: for each triple (f, e, f+1), a feature that counts the number of times that f is aligned to e and f+1 occurs to the right of f; similarly for triples (f, e, f−1) with f−1 occurring to the left of f. Words restricted to the 100 most frequent in the training data
• Takeaway: you don't have to know anything, you just need enough features
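The context features can be sketched as counts over aligned word pairs: for each alignment link (f, e), fire a feature keyed on e together with the source word to the right (or left) of f, restricted to a frequent-word vocabulary. A minimal sketch under assumed data structures (token lists plus a list of alignment index pairs, which the paper's implementation does not literally use):

```python
from collections import Counter

def context_features(src, tgt, alignment, vocab):
    """Count (e, side, neighbor-of-f) features, restricted to a
    frequent-word vocabulary as the paper describes."""
    feats = Counter()
    for i, j in alignment:                # f = src[i] aligned to e = tgt[j]
        for offset, side in ((1, "R"), (-1, "L")):
            k = i + offset
            if 0 <= k < len(src) and src[k] in vocab:
                feats[(tgt[j], side, src[k])] += 1
    return feats

src = ["das", "Haus", "ist", "klein"]
tgt = ["the", "house", "is", "small"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
vocab = {"das", "ist", "klein"}           # stand-in for the 100 most frequent words
feats = context_features(src, tgt, alignment, vocab)
print(feats[("house", "R", "ist")])       # 1
```

Restricting neighbors to the 100 most frequent source words is what keeps the feature count near 10,000 rather than vocabulary-squared.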
Lesson 4: Fleischman and Hovy, ACL-03
• Text mining: classify locations and people from free text into fine-grained classes
  – Simple appositive IE patterns
  – 2+ million examples, collapsed into 1 million instances (avg: 2 mentions/instance, 40+ for George W. Bush)
• Test: QA on "who is X?":
  – 100 questions from AskJeeves
  – System 1: table of instances
  – System 2: ISI's TextMap QA system
  [figure: Performance on a Question Answering Task — % correct / partial / incorrect, state-of-the-art system vs. extraction system]
  – The table system scored 25% better
  – Over half of the questions that TextMap got wrong could have benefited from information in the concept-instance pairs
  – This method took 10 seconds; TextMap took ~9 hours
• Takeaway: you don't have to reason, you just need to collect the knowledge beforehand
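The approach can be sketched in two steps: harvest concept-instance pairs with simple appositive patterns, then answer "who is X?" by table lookup. A minimal regex sketch; the single pattern and the sentences are illustrative (the paper uses a set of such patterns over parsed text, and this naive pattern only matches names followed by a lowercase appositive):

```python
import re
from collections import defaultdict

# Illustrative appositive pattern: "<Name>, (the|a|an) <description>,"
APPOSITIVE = re.compile(
    r"([A-Z][a-z]+(?: [A-Z][a-z]+)+), (?:the |an |a )?([a-zA-Z ]+?),"
)

def harvest(sentences):
    """Build a concept-instance table from appositive matches."""
    table = defaultdict(set)
    for s in sentences:
        for name, desc in APPOSITIVE.findall(s):
            table[name].add(desc.strip())
    return table

def who_is(table, name):
    """'QA as table lookup': no reasoning, just retrieval."""
    return table.get(name, set())

sentences = [
    "George Bush, the president of the USA, spoke in Texas yesterday.",
    "George Bush, a former governor of Texas, visited the school.",
]
table = harvest(sentences)
print(sorted(who_is(table, "George Bush")))
```

Lookup is constant-time per question, which is why the table system answered in seconds while the full QA pipeline took hours.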
Four lessons
• You don't have to be smart, you just need enough training data — the web has all you need
• You don't have to be smart, you just need enough memory — memory gets cheaper
• You don't have to be smart, you just need enough features — computers get faster
• You don't have to be smart, you just need to collect the knowledge beforehand
…we are moving to a new world:
• Conclusion: NLP as table lookup
So you may be happy with this, but I am not … I want to understand what's going on in language and thought
• We have no theory of language, or even of language processing, in NLP
• Our general approach is:
  – Goal: transform notation 1 into notation 2 (maybe adding tags…)
  – Learn how to do this automatically
  – Design an algorithm to beat the other guy
• How can one inject understanding?
• Generally, to reduce the size of a transformation table / statistical model, you introduce a generalization step:
  – POS tags, syntactic trees, modality labels…
• If you're smart, the theory behind the generalization actually 'explains' or 'captures' the phenomenon
  – Classes of the phenomenon + rules linking them
• 'Good' NLP can test the adequacy of a theory by determining the table reduction factor
• How can you introduce the generalization info?
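The "table reduction factor" idea can be made concrete: replace word-level table entries with entries over generalization classes (e.g. POS tags) and compare table sizes. A toy sketch with an assumed word-to-POS mapping; the pairs stand in for entries of a transformation table:

```python
def reduction_factor(pairs, generalize):
    """Ratio of distinct word-level entries to distinct generalized entries."""
    word_table = set(pairs)
    general_table = {tuple(generalize(w) for w in p) for p in pairs}
    return len(word_table) / len(general_table)

# Assumed toy lexicon mapping words to POS classes.
POS = {"the": "DET", "a": "DET", "cat": "N", "dog": "N", "runs": "V", "sleeps": "V"}

# Word-level transformation entries (e.g. bigram patterns seen in training).
pairs = [("the", "cat"), ("the", "dog"), ("a", "cat"), ("a", "dog")]
print(reduction_factor(pairs, POS.get))  # 4 word entries -> 1 (DET, N) entry: 4.0
```

A good theory yields classes that collapse many entries without merging ones that behave differently, so the factor is one crude measure of a theory's adequacy.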
Annotation!
1. Preparation (which corpus? interface design issues)
   – Choose the corpus
   – Build the interfaces
2. Instantiating the theory (how to remain true to the theory?)
   – Create the annotation choices
   – Test-run them for stability
3. Annotation (how many annotators? which procedure?)
   – Annotate
   – Reconcile among annotators
4. Validation (which measures?)
   – Measure inter-annotator agreement
   – Possibly adjust the theory instantiation
5. Delivery
   – Wrap the result
→ 'annotation science'
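The validation step's "measure inter-annotator agreement" is typically done with a chance-corrected statistic such as Cohen's kappa for two annotators. A minimal sketch over two assumed label sequences:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

a = ["pos", "pos", "neg", "neg", "pos", "neg"]
b = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(a, b), 3))  # 0.333
```

Raw agreement here is 4/6, but kappa discounts the 0.5 agreement expected by chance with these balanced label distributions, which is why it is the standard check before trusting an annotated corpus.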
The new NLP world
• Fundamental methodological assumptions of NLP:
  – Old-style NLP: the process is deterministic; manually written rules will exactly generate the desired product
  – Statistical NLP: the process is (somewhat) nondeterministic; probabilities predict the likelihood of products
  – Underlying assumption: as long as annotator consistency can be achieved, there is systematicity, and systems will learn to find it
• Theory creation (and testing!) through corpus annotation
  – But we (still) have to manually identify generalizations (= equivalence classes of individual instances of phenomena) to obtain expressive generality/power
  – This is the 'theory'
  – (and we need to understand how to do annotation properly)
Who are the people with the 'theory'? Not us!
• Our 'theory' of sentiment
• Our 'theory' of entailment
• Our 'theory' of MT
• Our 'theory' of IR
• Our 'theory' of QA
• …
A fruitful cycle
[cycle diagram: linguists, psycholinguists, cognitive linguists… (analysis, theorizing, annotation) → annotated corpus → machine learning of transformations (current NLP researchers) → storage in large tables, automated optimization (NLP companies) → problems: low performance, evaluation → back to analysis and theorizing; method creation throughout]
• Each one influences the others
• Different people like different work
Toward a theory of NLP?
• Basic tenets:
  1. NLP is notation transformation
  2. There exists a natural and optimal set of transformation steps, each involving a dedicated and distinct representation
     • Problem: the syntax-semantics and semantics-pragmatics interfaces
  3. Each representation is based on a suitable (family of) theories in linguistics, philosophy, rhetoric, social interaction studies, etc.
     • Problem: which theory/ies? Why?
  4. Except for a few circumscribed phenomena (morphology, number expressions, etc.), the phenomena being represented are too complex and interrelated for human-built rules to handle them well
     • Puzzle: but they can (usually) be annotated in corpora: why?
  5. A set of machine learning algorithms and a set of features can be used to learn the transformations from suitably annotated corpora
     • Problem: which algorithms and features? Why?
• Observation: we (almost) completely lack the theoretical framework to describe and measure the informational content and complexity of the representation levels we use — a challenge for the future
The face of NLP tomorrow
Three (and a Half) Trends — The Near Future of NLP:
1. Machine learning transformations
2. Analysis and corpus construction
3. Table construction and use
4. Evaluation frameworks
Who are you?
Thank you!