
A Maximum Entropy Model for Part-of-Speech Tagging - PowerPoint PPT Presentation

A Maximum Entropy Model for Part-of-Speech Tagging (Adwait Ratnaparkhi, 1996). Presented by Mawulolo Ameko and Sonia Baee.


  1. A Maximum Entropy Model for Part-of-Speech Tagging
Adwait Ratnaparkhi, 1996
CS 6501-004 - Text Mining Paper Presentation
Mawulolo Ameko and Sonia Baee
April 12th, 2018

  2. Table of Contents
1 Introduction
2 The probability model
3 Features for POS tagging
4 Testing the Model
5 Error Analysis
6 Comparison with previous work
7 Conclusion
8 Question


  4. Background
Many natural language tasks require the accurate assignment of part-of-speech (POS) tags to previously unseen text.
Previous use cases for Maximum Entropy (MaxEnt) models include:
• Language modeling (Lau et al., 1993)
• Machine translation (Berger et al., 1996)
• Prepositional phrase attachment (Ratnaparkhi et al., 1995)
• Word morphology (Della Pietra et al., 1995)


  6. Model Formulation
Given the set of histories H and the set of tags T, the probability model is defined over the Cartesian product space H × T as:

Probability model:

    p(h, t) = \pi \mu \prod_{j=1}^{k} \alpha_j^{f_j(h, t)}

where \pi is a normalization constant, \{\mu, \alpha_1, \ldots, \alpha_k\} are positive model parameters, and \{f_1, \ldots, f_k\} are features with f_j(h, t) \in \{0, 1\}.

Likelihood function:

    L(p) = \prod_{i=1}^{n} p(h_i, t_i) = \prod_{i=1}^{n} \pi \mu \prod_{j=1}^{k} \alpha_j^{f_j(h_i, t_i)}
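
As a minimal sketch of how this product form can be evaluated, the Python function below multiplies in one \alpha_j factor per active binary feature. The argument names and the feature callables are illustrative assumptions, not the paper's implementation.

    # Sketch: p(h, t) = pi * mu * prod_j alpha_j^f_j(h, t), with binary features.
    def joint_probability(history, tag, pi, mu, alphas, features):
        p = pi * mu
        for alpha, f in zip(alphas, features):
            if f(history, tag):  # f_j(h, t) in {0, 1}: multiply alpha_j in only when active
                p *= alpha
        return p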

  7. Equivalent Formulation
Maximum Entropy formalism: maximize

    H(p) = -\sum_{h \in \mathcal{H},\, t \in \mathcal{T}} p(h, t) \log p(h, t)

subject to

    E f_j = \tilde{E} f_j, \quad 1 \le j \le k

where E f_j and \tilde{E} f_j represent the model's feature expectation and the observed expectation from the training data, respectively.

Generalized Iterative Scaling (Darroch and Ratcliff, 1972) is used to determine the unique combination of parameters that maximizes the log-likelihood.
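
A hedged sketch of the GIS update follows: each \alpha_j is scaled by the ratio of observed to model expectation, raised to 1/C, where C is the (assumed constant) number of active features per (h, t) pair. The model_expectation helper is an assumption standing in for the expectation computation inside the training procedure.

    # One GIS pass per iteration; assumes every (h, t) activates exactly C features.
    def gis(alphas, observed, model_expectation, C, iterations=100):
        for _ in range(iterations):
            expected = model_expectation(alphas)  # current E f_j under the model
            alphas = [a * (obs / exp) ** (1.0 / C)
                      for a, obs, exp in zip(alphas, observed, expected)]
        return alphas

Each update nudges the model expectations toward the observed ones; Darroch and Ratcliff (1972) show this converges to the unique maximum-likelihood (equivalently, maximum-entropy) solution.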


  9. Basic Definition
Feature definition: for a given word and tag context available in a history

    h_i = \{ w_i, w_{i+1}, w_{i+2}, w_{i-1}, w_{i-2}, t_{i-1}, t_{i-2} \}

an example feature is

    f_j(h_i, t_i) = \begin{cases} 1 & \text{if } \mathrm{suffix}(w_i) = \text{"ing"} \text{ and } t_i = \text{VBG} \\ 0 & \text{otherwise} \end{cases}

The joint distribution of a history h and tag t is determined by the parameters \alpha_j whose corresponding features are activated on (h, t).
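
For concreteness, here is the slide's example feature written as a small Python predicate; representing the history as a dict keyed by position is an illustrative assumption.

    # Fires only when the current word ends in "ing" and the candidate tag is VBG.
    def f_ing_vbg(history, tag):
        return 1 if history["w_i"].endswith("ing") and tag == "VBG" else 0

    # e.g. f_ing_vbg({"w_i": "running"}, "VBG") -> 1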

  10. Feature Templates
Specifically, the feature templates are:

Condition        Features
w_i is not rare  w_i = X & t_i = T
w_i is rare      X is prefix of w_i, |X| ≤ 4 & t_i = T
                 X is suffix of w_i, |X| ≤ 4 & t_i = T
                 w_i contains number & t_i = T
                 w_i contains uppercase character & t_i = T
                 w_i contains hyphen & t_i = T
∀ w_i            t_{i-1} = X & t_i = T
                 t_{i-2} t_{i-1} = X & t_i = T
                 w_{i-1} = X & t_i = T
                 w_{i-2} = X & t_i = T
                 w_{i+1} = X & t_i = T
                 w_{i+2} = X & t_i = T
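
The sketch below expands these templates into concrete feature keys for a single (h_i, t_i) pair. The dict layout of the history, the tuple encoding of features, and the rare-word flag being supplied by the caller are assumptions for illustration; the paper specifies only the templates.

    # Generate the instantiated features for one (h, t) pair per the table above.
    def extract_features(h, t, rare):
        w = h["w_i"]
        feats = []
        if not rare:
            feats.append(("word", w, t))
        else:
            for k in range(1, min(4, len(w)) + 1):  # prefixes/suffixes up to length 4
                feats.append(("prefix", w[:k], t))
                feats.append(("suffix", w[-k:], t))
            if any(c.isdigit() for c in w):
                feats.append(("has_number", t))
            if any(c.isupper() for c in w):
                feats.append(("has_uppercase", t))
            if "-" in w:
                feats.append(("has_hyphen", t))
        feats.append(("prev_tag", h["t_i-1"], t))
        feats.append(("prev_two_tags", h["t_i-2"], h["t_i-1"], t))
        for key in ("w_i-1", "w_i-2", "w_i+1", "w_i+2"):  # surrounding words
            feats.append((key, h[key], t))
        return feats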


  12. Testing the Model
• Uses beam search as the search algorithm, with a beam size of N = 5
• Uses a tag dictionary for words seen in the training set
• Assigns equal probability to all tags for unseen words
• Tags the test corpus one sentence at a time
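
A rough sketch of this tagging loop is below, assuming a cond_prob helper that scores a candidate tag given the sentence position and the tags chosen so far; this helper and the data structures are illustrative, not the paper's code.

    # Beam-search tagging with beam size N = 5. Words seen in training are
    # restricted to their tag-dictionary entries; unseen words fall back to the
    # full tagset (matching the uniform treatment described on the slide).
    def beam_tag(sentence, cond_prob, tag_dictionary, all_tags, N=5):
        beam = [([], 1.0)]  # (partial tag sequence, score)
        for i, word in enumerate(sentence):
            candidates = tag_dictionary.get(word, all_tags)
            scored = [(tags + [t], score * cond_prob(sentence, i, tags, t))
                      for tags, score in beam
                      for t in candidates]
            scored.sort(key=lambda item: item[1], reverse=True)
            beam = scored[:N]  # keep only the N best partial taggings
        return beam[0][0]

Keeping only the N best partial sequences bounds the work per word while still letting later context overturn locally greedy tag choices.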

  13. Experiments
To conduct the tagging experiments, the Wall St. Journal data was split into three contiguous sections.

Table: WSJ data sizes

Dataset       Sentences   Words     Unknown words
Training      40000       962687    -
Development   8000        192826    6107
Test          5485        133805    —

  14. Experiments - Results

Table: Baseline performance on the Development set

                    Total Word Accuracy   Unknown Word Accuracy   Sentence Accuracy
Tag Dictionary      96.43%                86.32%                  47.55%
No Tag Dictionary   96.31%                86.28%                  47.38%

Error analysis reveals some "difficult words".

Table: Top tagging mistakes on the Training set for the baseline model

Word    Correct Tag   Model Tag   Frequency
about   RB            IN          393
that    DT            IN          389
more    RBR           IN          389
up      IN            RB          187


  16. Specialized Features and Consistency
[figure-only slide]
