

  1. Natural Language Processing: Parsing III – Dan Klein, UC Berkeley

  2. Unsupervised Tagging

  3. Unsupervised Tagging?
     - AKA part-of-speech induction
     - Task: raw sentences in, tagged sentences out
     - Obvious thing to do:
       - Start with a (mostly) uniform HMM
       - Run EM
       - Inspect results

  4. EM for HMMs: Process
     - Alternate between recomputing distributions over hidden variables (the tags) and re-estimating parameters
     - Crucial step: tally up how many (fractional) counts of each kind of transition and emission we have under the current parameters
     - Same quantities we needed to train a CRF!
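
A minimal sketch of this E-step for a bigram HMM, assuming the model is stored as nested dictionaries trans[t1][t2], emit[t][w], and init[t]; the function name, smoothing constant, and data layout are illustrative, not anything from the slides:

```python
import numpy as np

def expected_counts(sentence, tags, trans, emit, init):
    """E-step for one sentence of a bigram HMM: forward-backward gives the
    (fractional) transition and emission counts under the current parameters."""
    n, T = len(sentence), len(tags)
    alpha = np.zeros((n, T))      # forward probabilities
    beta = np.zeros((n, T))       # backward probabilities
    for s, t in enumerate(tags):
        alpha[0, s] = init[t] * emit[t].get(sentence[0], 1e-10)   # tiny floor for unseen words
    for i in range(1, n):
        for s, t in enumerate(tags):
            alpha[i, s] = emit[t].get(sentence[i], 1e-10) * sum(
                alpha[i - 1, r] * trans[tags[r]][t] for r in range(T))
    beta[n - 1, :] = 1.0
    for i in range(n - 2, -1, -1):
        for s, t in enumerate(tags):
            beta[i, s] = sum(trans[t][tags[r]] * emit[tags[r]].get(sentence[i + 1], 1e-10)
                             * beta[i + 1, r] for r in range(T))
    Z = alpha[n - 1, :].sum()     # sentence likelihood
    emit_counts, trans_counts = {}, {}
    for i in range(n):
        for s, t in enumerate(tags):
            gamma = alpha[i, s] * beta[i, s] / Z                  # P(tag_i = t | sentence)
            emit_counts[(t, sentence[i])] = emit_counts.get((t, sentence[i]), 0.0) + gamma
            if i + 1 < n:
                for r, t2 in enumerate(tags):
                    xi = (alpha[i, s] * trans[t][t2] *
                          emit[t2].get(sentence[i + 1], 1e-10) * beta[i + 1, r] / Z)
                    trans_counts[(t, t2)] = trans_counts.get((t, t2), 0.0) + xi
    return emit_counts, trans_counts
```

The M-step then just normalizes these accumulated counts into new transition and emission distributions.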

  5. Merialdo: Setup
     - Some (discouraging) experiments [Merialdo 94]
     - Setup:
       - You know the set of allowable tags for each word
       - Fix k training examples to their true labels
       - Learn P(w|t) on these examples
       - Learn P(t|t-1, t-2) on these examples
       - On n examples, re-estimate with EM
     - Note: we know allowed tags but not frequencies
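
One way the tag-dictionary constraint and the supervised initialization might be wired together, reusing expected_counts from the previous sketch; this uses bigram transitions for simplicity (the slide's setup uses trigrams), and estimate_from_counts / renormalize are hypothetical helpers, not a real API:

```python
def constrain_emissions(emit, tag_dict):
    """Zero out P(w|t) whenever the dictionary says tag t is not allowed for word w,
    then renormalize each tag's emission distribution."""
    constrained = {}
    for t, dist in emit.items():
        kept = {w: p for w, p in dist.items() if t in tag_dict.get(w, {t})}
        total = sum(kept.values()) or 1.0
        constrained[t] = {w: p / total for w, p in kept.items()}
    return constrained

def merialdo_style_em(labeled, raw, tag_dict, n_iters=10):
    """Initialize the HMM from the k labeled sentences, then re-estimate on raw
    text with EM, always respecting the tag dictionary.  estimate_from_counts and
    renormalize are hypothetical stand-ins for the supervised init and the M-step."""
    trans, emit, init = estimate_from_counts(labeled)
    tags = list(trans)
    for _ in range(n_iters):
        emit = constrain_emissions(emit, tag_dict)
        e_tot, t_tot = {}, {}
        for sent in raw:
            ec, tc = expected_counts(sent, tags, trans, emit, init)   # E-step from the sketch above
            for k, v in ec.items():
                e_tot[k] = e_tot.get(k, 0.0) + v
            for k, v in tc.items():
                t_tot[k] = t_tot.get(k, 0.0) + v
        trans, emit = renormalize(t_tot, e_tot)                       # M-step
    return trans, emit
```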

  6. Merialdo: Results

  7. Latent Variable PCFGs

  8. The Game of Designing a Grammar
     - Annotation refines base treebank symbols to improve statistical fit of the grammar
     - Parent annotation [Johnson ’98]

  9. The Game of Designing a Grammar
     - Annotation refines base treebank symbols to improve statistical fit of the grammar
     - Parent annotation [Johnson ’98]
     - Head lexicalization [Collins ’99, Charniak ’00]

  10. The Game of Designing a Grammar
     - Annotation refines base treebank symbols to improve statistical fit of the grammar
     - Parent annotation [Johnson ’98]
     - Head lexicalization [Collins ’99, Charniak ’00]
     - Automatic clustering?
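
As a concrete illustration of the first refinement, here is a small sketch of parent annotation in the sense of [Johnson ’98]; the tuple-based tree encoding is an assumption made for the example:

```python
def parent_annotate(tree, parent_label='ROOT'):
    """Relabel every nonterminal X whose parent is P as X^P, so that e.g. an NP
    directly under S becomes NP^S.  Trees are assumed to be (label, children...)
    tuples with plain strings as the leaf words."""
    label, *children = tree
    new_label = f"{label}^{parent_label}"
    new_children = [c if isinstance(c, str) else parent_annotate(c, label) for c in children]
    return (new_label, *new_children)

# Example:
#   ('S', ('NP', 'he'), ('VP', ('VBD', 'was'), ('ADJP', 'right')))
# becomes
#   ('S^ROOT', ('NP^S', 'he'), ('VP^S', ('VBD^VP', 'was'), ('ADJP^VP', 'right')))
```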

  11. Latent Variable Grammars
     (figure: a sentence, its parse tree, the latent-annotation derivations, and the grammar parameters)

  12. Learning Latent Annotations
     - EM algorithm:
       - Brackets are known
       - Base categories are known
       - Only induce subcategories
     - Just like Forward-Backward for HMMs: an inside ("forward") and an outside ("backward") pass over the fixed tree
     (figure: latent-annotated parse tree over the sentence "He was right.")
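
A sketch of that restricted E-step: because the tree is observed, the inside ("forward") and outside ("backward") passes run over the tree's nodes only. The Node class, parameter layout, and einsum formulation are illustrative, not the actual Berkeley parser code:

```python
import numpy as np

class Node:
    """A node of the observed (binarized) treebank tree."""
    def __init__(self, label, children=(), word=None):
        self.label, self.children, self.word = label, list(children), word

def subcategory_posteriors(root, rule_probs, lex_probs, root_prior):
    """Posterior over latent subcategories at each node of a fixed parse tree.
    rule_probs[(A, B, C)]: array of shape (nA, nB, nC) with P(B_y C_z | A_x);
    lex_probs[(A, w)]: length-nA array of P(w | A_x);
    root_prior: distribution over the root symbol's subcategories."""
    inside, outside = {}, {}

    def down(node):                                  # inside ("forward") pass
        if node.word is not None:                    # preterminal over a word
            inside[id(node)] = lex_probs[(node.label, node.word)]
        else:                                        # binary rule A -> B C
            b, c = node.children
            down(b); down(c)
            R = rule_probs[(node.label, b.label, c.label)]
            inside[id(node)] = np.einsum('xyz,y,z->x', R, inside[id(b)], inside[id(c)])

    def up(node, out):                               # outside ("backward") pass
        outside[id(node)] = out
        if node.word is None:
            b, c = node.children
            R = rule_probs[(node.label, b.label, c.label)]
            up(b, np.einsum('xyz,x,z->y', R, out, inside[id(c)]))
            up(c, np.einsum('xyz,x,y->z', R, out, inside[id(b)]))

    down(root)
    up(root, root_prior)
    Z = float(inside[id(root)] @ root_prior)         # likelihood of the observed tree
    return {id(n): inside[id(n)] * outside[id(n)] / Z for n in _all_nodes(root)}, Z

def _all_nodes(node):
    yield node
    for c in node.children:
        yield from _all_nodes(c)
```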

  13. Refinement of the DT tag
     (figure: the DT tag split into subcategories DT-1, DT-2, DT-3, DT-4)

  14. Hierarchical refinement

  15. Hierarchical Estimation Results
     (figure: parsing accuracy (F1) vs. total number of grammar symbols)

     Model                  F1
     Flat Training          87.3
     Hierarchical Training  88.4

  16. Refinement of the "," tag
     - Splitting all categories equally is wasteful

  17. Adaptive Splitting
     - Want to split complex categories more
     - Idea: split everything, roll back the splits which were least useful
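
A sketch of one split-merge round as described on the slide; every helper called here (split_all_subcategories, run_em, likelihood_loss_if_merged, merge_subcategories) is a hypothetical stand-in for the corresponding step, not a real API:

```python
def split_merge_round(grammar, treebank, merge_fraction=0.5):
    """One round of the split-merge idea: split every subcategory in two, run EM,
    then roll back the half of the splits that bought the least likelihood."""
    grammar = split_all_subcategories(grammar)          # each X-k becomes X-2k, X-2k+1
    grammar = run_em(grammar, treebank)                 # re-fit the latent annotations
    # Estimate, for every new split, how much the treebank likelihood would drop
    # if its two subcategories were merged back into one.
    losses = {s: likelihood_loss_if_merged(grammar, treebank, s)
              for s in grammar.recent_splits}           # recent_splits: assumed bookkeeping
    to_merge = sorted(losses, key=losses.get)[:int(merge_fraction * len(losses))]
    grammar = merge_subcategories(grammar, to_merge)    # roll back the least useful splits
    return run_em(grammar, treebank)                    # clean-up EM after merging
```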

  18. Adaptive Splitting Results

     Model             F1
     Previous          88.4
     With 50% Merging  89.5

  19. Number of Phrasal Subcategories
     (figure: learned subcategory counts per phrasal category; NP, VP, and PP receive the most splits, while rare categories such as INTJ, SBARQ, RRC, WHADJP, X, ROOT, and LST receive the fewest)

  20. Number of Lexical Subcategories
     (figure: learned subcategory counts per part-of-speech tag; NNP, JJ, NNS, and NN receive the most splits, while closed-class tags such as TO, $, UH, ",", "``", SYM, RP, LS, and # receive the fewest)

  21. Learned Splits
     - Proper Nouns (NNP):
         NNP-14  Oct.  Nov.       Sept.
         NNP-12  John  Robert     James
         NNP-2   J.    E.         L.
         NNP-1   Bush  Noriega    Peters
         NNP-15  New   San        Wall
         NNP-3   York  Francisco  Street
     - Personal pronouns (PRP):
         PRP-0   It    He         I
         PRP-1   it    he         they
         PRP-2   it    them       him

  22. Learned Splits
     - Relative adverbs (RBR):
         RBR-0   further  lower    higher
         RBR-1   more     less     More
         RBR-2   earlier  Earlier  later
     - Cardinal Numbers (CD):
         CD-7    one      two      Three
         CD-4    1989     1990     1988
         CD-11   million  billion  trillion
         CD-0    1        50       100
         CD-3    1        30       31
         CD-9    78       58       34

  23. Final Results (Accuracy)

                                                ≤ 40 words   all
          Model                                     F1        F1
     ENG  Charniak & Johnson ’05 (generative)      90.1      89.6
          Split / Merge                            90.6      90.1
     GER  Dubey ’05                                76.3       -
          Split / Merge                            80.8      80.1
     CHN  Chiang et al. ’02                        80.0      76.6
          Split / Merge                            86.3      83.4

     - Still higher numbers from reranking / self-training methods

  24. Efficient Parsing for Hierarchical Grammars

  25. Coarse-to-Fine Inference
     - Example: PP attachment (figure)

  26. Hierarchical Pruning
     coarse:          ... QP  NP  VP ...
     split in two:    ... QP1 QP2 NP1 NP2 VP1 VP2 ...
     split in four:   ... QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 ...
     split in eight:  ...
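
A sketch of how the hierarchy might drive pruning: parse with each coarser grammar, keep only chart items whose posterior clears a threshold, and allow at the next level only the refinements of the survivors. The posteriors_fn callback and the threshold value are assumptions, not a specific parser's API:

```python
from collections import defaultdict

# Hypothetical threshold; in practice it is set low enough that pruning
# introduces essentially no search errors.
PRUNE_THRESHOLD = 1e-4

def coarse_to_fine_allowed(sentence, grammars, projections, posteriors_fn):
    """Hierarchical pruning sketch.  grammars runs from coarsest to finest;
    projections[k] maps each symbol of grammars[k+1] to its parent symbol in
    grammars[k]; posteriors_fn(grammar, sentence, allowed) is an assumed routine
    returning {(i, j, symbol): posterior} for the chart items it may build."""
    allowed = None                                   # no restriction at the coarsest level
    for level in range(len(grammars) - 1):
        post = posteriors_fn(grammars[level], sentence, allowed)
        survivors = {item for item, p in post.items() if p > PRUNE_THRESHOLD}
        allowed = defaultdict(set)
        for refined_sym, coarse_sym in projections[level].items():
            for (i, j, sym) in survivors:
                if sym == coarse_sym:
                    allowed[(i, j)].add(refined_sym)  # refinement of a surviving coarse item
    return allowed                                   # chart items the finest pass may build
```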

  27. Bracket Posteriors

  28. (figure: parsing times of 1621 min, 111 min, 35 min, and 15 min, with no search error)

  29. Other Syntactic Models

  30. Parse Reranking
     - Assume the number of parses is very small
     - We can represent each parse T as an arbitrary feature vector φ(T)
       - Typically, all local rules are features
       - Also non-local features, like how right-branching the overall tree is
       - [Charniak and Johnson 05] gives a rich set of features
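
A minimal reranking sketch over a k-best list; the feature extractor and weight vector are assumed to come from elsewhere (e.g. a trained log-linear model):

```python
def rerank(kbest, weights, feature_fn):
    """Score each candidate tree T by the dot product of the weights with phi(T)
    and return the best one.  feature_fn is an assumed extractor producing a dict
    from feature names to counts (local rule features, right-branching measures,
    etc.); weights maps feature names to learned weights."""
    def score(tree):
        return sum(weights.get(f, 0.0) * v for f, v in feature_fn(tree).items())
    return max(kbest, key=score)
```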

  31. K-Best Parsing [Huang and Chiang 05; Pauls, Klein, Quirk 10]

  32. Dependency Parsing
     - Lexicalized parsers can be seen as producing dependency trees
       (figure: dependency tree with "questioned" as the root, dominating "lawyer" and "witness", each dominating "the")
     - Each local binary tree corresponds to an attachment in the dependency graph

  33. Dependency Parsing
     - Pure dependency parsing is only cubic [Eisner 99]
       (figure: a lexicalized rule X[h] -> Y[h] Z[h'] vs. the head-outward spans i..k and k..j that the dependency algorithm combines)
     - Some work on non-projective dependencies
       - Common in, e.g., Czech parsing
       - Can do with MST algorithms [McDonald and Pereira 05]
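
A compact sketch of the cubic projective algorithm the slide cites [Eisner 99], here returning only the best tree score; score(h, m) is an assumed arc-scoring function, and backpointers (omitted for brevity) would recover the actual arcs:

```python
NEG_INF = float('-inf')

def eisner_best_score(n, score):
    """First-order projective dependency parsing in O(n^3).
    Positions 0..n-1, with 0 acting as the artificial ROOT;
    score(h, m) is the (assumed) score of attaching modifier m to head h."""
    # complete[i][j][d] / incomplete[i][j][d]: best score of a span from i to j
    # whose head is on the left (d = 1) or on the right (d = 0).
    complete = [[[0.0 if i == j else NEG_INF for _ in range(2)]
                 for j in range(n)] for i in range(n)]
    incomplete = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            # Build an arc between i and j over two adjacent complete spans.
            best = max(complete[i][k][1] + complete[k + 1][j][0] for k in range(i, j))
            incomplete[i][j][0] = best + score(j, i)      # arc j -> i (head on the right)
            incomplete[i][j][1] = best + score(i, j)      # arc i -> j (head on the left)
            # Extend an incomplete span into a complete one.
            complete[i][j][0] = max(complete[i][k][0] + incomplete[k][j][0]
                                    for k in range(i, j))
            complete[i][j][1] = max(incomplete[i][k][1] + complete[k][j][1]
                                    for k in range(i + 1, j + 1))
    return complete[0][n - 1][1]                          # best tree headed by ROOT
```

Call it as eisner_best_score(len(words) + 1, score) with position 0 playing the role of the artificial ROOT.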

  34. Shift-Reduce Parsers
     - Another way to derive a tree (figure: shift-reduce derivation)
     - Parsing:
       - No useful dynamic programming search
       - Can still use beam search [Ratnaparkhi 97]
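
A sketch of beam search over shift-reduce derivations, in the spirit of the slide: with no dynamic program available, we simply keep the beam_size best-scoring parser states after each action. The transition-system callbacks (legal_actions, apply_action, score_action) are assumed, not a specific parser's API:

```python
import heapq

def beam_parse(words, score_action, legal_actions, apply_action, beam_size=8):
    """Beam-search shift-reduce sketch: a state is (stack, remaining buffer);
    at each step every state on the beam is extended by its legal actions and
    only the beam_size highest-scoring (score, state) pairs survive."""
    initial = ((), tuple(words))                 # empty stack, full buffer
    beam = [(0.0, initial)]
    while any(not is_final(state) for _, state in beam):
        candidates = []
        for total, state in beam:
            if is_final(state):
                candidates.append((total, state))   # finished parses compete too
                continue
            for action in legal_actions(state):
                candidates.append((total + score_action(state, action),
                                   apply_action(state, action)))
        beam = heapq.nlargest(beam_size, candidates, key=lambda x: x[0])
    return max(beam, key=lambda x: x[0])         # best-scoring finished parse

def is_final(state):
    stack, buffer = state
    return not buffer and len(stack) == 1        # one finished tree, empty buffer
```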

  35. Data-oriented parsing
     - Rewrite large (possibly lexicalized) subtrees in a single step
     - Formally, a tree-insertion grammar
     - Derivational ambiguity: whether subtrees were generated atomically or compositionally
     - Most probable parse is NP-complete

  36. TIG: Insertion

  37. Tree-adjoining grammars
     - Start with local trees
     - Can insert structure with adjunction operators
     - Mildly context-sensitive
     - Models long-distance dependencies naturally
     - ... as well as other weird stuff that CFGs don't capture well (e.g., cross-serial dependencies)

  38. TAG: Long Distance

  39. CCG Parsing
     - Combinatory Categorial Grammar:
       - Fully (mono-)lexicalized grammar
       - Categories encode argument sequences
       - Very closely related to the lambda calculus (more later)
       - Can have spurious ambiguities (why?)
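
A toy illustration of how CCG categories encode argument sequences via forward and backward application; real CCG parsers also use composition and type-raising, and this tiny Cat class is purely illustrative:

```python
class Cat:
    """A CCG category: either atomic (e.g. NP, S) or a functor result/arg with a slash."""
    def __init__(self, result=None, slash=None, arg=None, atom=None):
        self.result, self.slash, self.arg, self.atom = result, slash, arg, atom
    def __eq__(self, other):
        return (self.atom, self.slash) == (other.atom, other.slash) and \
               self.result == other.result and self.arg == other.arg
    def __repr__(self):
        return self.atom if self.atom else f"({self.result}{self.slash}{self.arg})"

def combine(left, right):
    """Forward application (X/Y Y => X) and backward application (Y X\\Y => X)."""
    if left.slash == '/' and left.arg == right:
        return left.result
    if right.slash == '\\' and right.arg == left:
        return right.result
    return None

# Example: "saw" := (S\NP)/NP encodes its argument sequence: first an NP object
# to the right, then an NP subject to the left.
NP, S = Cat(atom='NP'), Cat(atom='S')
saw = Cat(result=Cat(result=S, slash='\\', arg=NP), slash='/', arg=NP)
vp = combine(saw, NP)          # (S\NP): "saw the witness"
sentence = combine(NP, vp)     # S: "the lawyer saw the witness"
```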
