
A Maximum Entropy Model for Part-of-Speech Tagging - PowerPoint PPT Presentation

A Maximum Entropy Model for Part-of-Speech Tagging (Adwait Ratnaparkhi, 1996). Presented by Mawulolo Ameko and Sonia Baee.


  1. A Maximum Entropy Model for Part-of-Speech Tagging
Adwait Ratnaparkhi, 1996
CS 6501-004 - Text Mining Paper Presentation
Mawulolo Ameko and Sonia Baee
April 12th, 2018

  2. Table of Contents
1 Introduction
2 The probability model
3 Features for POS tagging
4 Testing the Model
5 Error Analysis
6 Comparison with previous work
7 Conclusion
8 Question


  4. Background
Many natural language tasks require the accurate assignment of part-of-speech (POS) tags to previously unseen text.
Previous use cases for Maximum Entropy (MaxEnt) models include:
• Language modeling (Lau et al., 1993)
• Machine translation (Berger et al., 1996)
• Prepositional phrase attachment (Ratnaparkhi et al., 1995)
• Word morphology (Della Pietra et al., 1995)


  6. Model Formulation
Given the set of histories H and the set of tags T, the probability model is defined over the Cartesian product space H × T as:

Probability model:

    p(h, t) = \pi \mu \prod_{j=1}^{k} \alpha_j^{f_j(h, t)}

where \pi is a normalization constant, \{\mu, \alpha_1, \ldots, \alpha_k\} are positive model parameters, and \{f_1, \ldots, f_k\} are features with f_j(h, t) \in \{0, 1\}.

Likelihood function:

    L(p) = \prod_{i=1}^{n} p(h_i, t_i) = \prod_{i=1}^{n} \pi \mu \prod_{j=1}^{k} \alpha_j^{f_j(h_i, t_i)}
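
As a minimal sketch of how this product form can be evaluated, the Python function below multiplies in one \alpha_j factor per active binary feature. The argument names and the feature callables are illustrative assumptions, not the paper's implementation.

    # Sketch: p(h, t) = pi * mu * prod_j alpha_j^f_j(h, t), with binary features.
    def joint_probability(history, tag, pi, mu, alphas, features):
        p = pi * mu
        for alpha, f in zip(alphas, features):
            if f(history, tag):  # f_j(h, t) in {0, 1}: multiply alpha_j in only when active
                p *= alpha
        return p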

  7. Equivalent Formulation
Maximum Entropy formalism: maximize

    H(p) = -\sum_{h \in \mathcal{H},\, t \in \mathcal{T}} p(h, t) \log p(h, t)

subject to

    E f_j = \tilde{E} f_j, \quad 1 \le j \le k

where E f_j and \tilde{E} f_j represent the model's feature expectation and the observed expectation from the training data, respectively.

Generalized Iterative Scaling (Darroch and Ratcliff, 1972) is used to determine the unique combination of parameters that maximizes the log-likelihood.
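
A hedged sketch of the GIS update follows: each \alpha_j is scaled by the ratio of observed to model expectation, raised to 1/C, where C is the (assumed constant) number of active features per (h, t) pair. The model_expectation helper is an assumption standing in for the expectation computation inside the training procedure.

    # One GIS pass per iteration; assumes every (h, t) activates exactly C features.
    def gis(alphas, observed, model_expectation, C, iterations=100):
        for _ in range(iterations):
            expected = model_expectation(alphas)  # current E f_j under the model
            alphas = [a * (obs / exp) ** (1.0 / C)
                      for a, obs, exp in zip(alphas, observed, expected)]
        return alphas

Each update nudges the model expectations toward the observed ones; Darroch and Ratcliff (1972) show this converges to the unique maximum-likelihood (equivalently, maximum-entropy) solution.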


  9. Basic Definition
Feature definition: for a given word and tag context available in a history

    h_i = \{ w_i, w_{i+1}, w_{i+2}, w_{i-1}, w_{i-2}, t_{i-1}, t_{i-2} \}

an example feature is

    f_j(h_i, t_i) = \begin{cases} 1 & \text{if } \mathrm{suffix}(w_i) = \text{"ing"} \text{ and } t_i = \text{VBG} \\ 0 & \text{otherwise} \end{cases}

The joint distribution of a history h and tag t is determined by the parameters \alpha_j whose corresponding features are activated on (h, t).
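
For concreteness, here is the slide's example feature written as a small Python predicate; representing the history as a dict keyed by position is an illustrative assumption.

    # Fires only when the current word ends in "ing" and the candidate tag is VBG.
    def f_ing_vbg(history, tag):
        return 1 if history["w_i"].endswith("ing") and tag == "VBG" else 0

    # e.g. f_ing_vbg({"w_i": "running"}, "VBG") -> 1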

  10. Feature Templates
Specifically, the feature templates are:

Condition        Features
w_i is not rare  w_i = X & t_i = T
w_i is rare      X is prefix of w_i, |X| ≤ 4 & t_i = T
                 X is suffix of w_i, |X| ≤ 4 & t_i = T
                 w_i contains number & t_i = T
                 w_i contains uppercase character & t_i = T
                 w_i contains hyphen & t_i = T
∀ w_i            t_{i-1} = X & t_i = T
                 t_{i-2} t_{i-1} = X & t_i = T
                 w_{i-1} = X & t_i = T
                 w_{i-2} = X & t_i = T
                 w_{i+1} = X & t_i = T
                 w_{i+2} = X & t_i = T
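
The sketch below expands these templates into concrete feature keys for a single (h_i, t_i) pair. The dict layout of the history, the tuple encoding of features, and the rare-word flag being supplied by the caller are assumptions for illustration; the paper specifies only the templates.

    # Generate the instantiated features for one (h, t) pair per the table above.
    def extract_features(h, t, rare):
        w = h["w_i"]
        feats = []
        if not rare:
            feats.append(("word", w, t))
        else:
            for k in range(1, min(4, len(w)) + 1):  # prefixes/suffixes up to length 4
                feats.append(("prefix", w[:k], t))
                feats.append(("suffix", w[-k:], t))
            if any(c.isdigit() for c in w):
                feats.append(("has_number", t))
            if any(c.isupper() for c in w):
                feats.append(("has_uppercase", t))
            if "-" in w:
                feats.append(("has_hyphen", t))
        feats.append(("prev_tag", h["t_i-1"], t))
        feats.append(("prev_two_tags", h["t_i-2"], h["t_i-1"], t))
        for key in ("w_i-1", "w_i-2", "w_i+1", "w_i+2"):  # surrounding words
            feats.append((key, h[key], t))
        return feats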


  12. Testing the Model
• Uses beam search as the search algorithm, with a beam size of N = 5
• Uses a tag dictionary for words seen in the training set
• Assigns equal probability to all tags for unseen words
• Tags the test corpus one sentence at a time
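
A rough sketch of this tagging loop is below, assuming a cond_prob helper that scores a candidate tag given the sentence position and the tags chosen so far; this helper and the data structures are illustrative, not the paper's code.

    # Beam-search tagging with beam size N = 5. Words seen in training are
    # restricted to their tag-dictionary entries; unseen words fall back to the
    # full tagset (matching the uniform treatment described on the slide).
    def beam_tag(sentence, cond_prob, tag_dictionary, all_tags, N=5):
        beam = [([], 1.0)]  # (partial tag sequence, score)
        for i, word in enumerate(sentence):
            candidates = tag_dictionary.get(word, all_tags)
            scored = [(tags + [t], score * cond_prob(sentence, i, tags, t))
                      for tags, score in beam
                      for t in candidates]
            scored.sort(key=lambda item: item[1], reverse=True)
            beam = scored[:N]  # keep only the N best partial taggings
        return beam[0][0]

Keeping only the N best partial sequences bounds the work per word while still letting later context overturn locally greedy tag choices.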

  13. Experiments
To conduct the tagging experiments, the Wall St. Journal data was split into three contiguous sections.

Table: WSJ data sizes

Dataset       Sentences   Words     Unknown words
Training      40000       962687    -
Development   8000        192826    6107
Test          5485        133805    —

  14. Experiments - Results

Table: Baseline performance on the Development set

                    Total Word Accuracy   Unknown Word Accuracy   Sentence Accuracy
Tag Dictionary      96.43%                86.32%                  47.55%
No Tag Dictionary   96.31%                86.28%                  47.38%

Error analysis reveals some "difficult words".

Table: Top tagging mistakes on the Training set for the baseline model

Word    Correct Tag   Model Tag   Frequency
about   RB            IN          393
that    DT            IN          389
more    RBR           IN          389
up      IN            RB          187


  16. Specialized Features and Consistency
[figure-only slide]
