
Probabilistic Parsing: Issues & Improvement



  1. Probabilistic Parsing: Issues & Improvement
 LING 571: Deep Processing Techniques for NLP
 October 14, 2019
 Shane Steinert-Threlkeld

  2. Announcements
 ● HW2 grades posted (mean 87)
 ● Reference code available in /dropbox/19-20/571/hw2/reference_code
 ● NB: not needed for HW3; you can assume that all grammars are already in CNF

  3. Homework Tips
 ● Use nltk.load for reading grammars; it will save you and the TA time and headaches!
 ● Run your code on patas to produce the output you submit in your TAR file
   ● Some discrepancies were found that seem to be due to different environments
 ● readme.{txt|pdf}: this should NOT be inside your TAR file; upload it separately on Canvas

  4. Notes on HW #3
 ● Python’s range covers many use cases through its start, end, and step arguments
   ● range(n) is equivalent to range(0, n, 1)
 ● Reminder: the rhs= argument of NLTK’s grammar.productions() method matches only the first symbol of the right-hand side, not the entire string
   ● You’ll want to implement an efficient look-up based on the full RHS
 ● HW3: compare your output to running the HW1 parser on the same grammar/sentences (the order of output for ambiguous sentences could differ)
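The RHS look-up mentioned above could be sketched as a dictionary keyed on the complete right-hand side. This is only an illustration under stated assumptions: the toy grammar, `build_rhs_index`, and `lhs_for` are hypothetical names, and rules are plain (lhs, rhs) pairs rather than NLTK objects. With NLTK, the same pairs could be read from each production's `lhs()` and `rhs()`.

```python
from collections import defaultdict

# Hypothetical toy CNF grammar as (lhs, rhs) pairs. With NLTK, these pairs
# could instead be collected from prod.lhs() and prod.rhs().
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
    ("NP", ("dogs",)),
    ("N", ("dogs",)),
    ("V", ("chase",)),
    ("Det", ("the",)),
]


def build_rhs_index(rules):
    """Map each complete right-hand side to the set of LHS symbols deriving it."""
    index = defaultdict(set)
    for lhs, rhs in rules:
        index[tuple(rhs)].add(lhs)
    return index


RHS_INDEX = build_rhs_index(RULES)


def lhs_for(rhs):
    """Return the LHS symbols whose rule matches the whole RHS (empty set if none)."""
    return RHS_INDEX.get(tuple(rhs), set())


# The range equivalence from the slide, checked directly.
assert list(range(4)) == list(range(0, 4, 1))
```

Building the index once makes each look-up a single dictionary access, instead of a scan over all productions for every pair of cells in the parse table.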

  5. Indigenous Peoples’ Day
 ● Seattle/Sealth
 ● For those of you taking 550:
   ● The Lushootseed spelling [IPA] of Chief Seattle/Sealth: siʔaɫ [ˈsiʔaːɬ]
   ● Duwamish: Dxʷdəwʔabš [dxʷdɐwʔabʃ]
 ● IPA resources:
   ● https://en.wikipedia.org/wiki/International_Phonetic_Alphabet
   ● http://web.mit.edu/6.mitx/www/24.900%20IPA/IPAapp.html

  6. Indigenous Peoples’ Day
 ● Studying non-English languages gives more holistic insight for NLP tasks
 ● Many interesting phenomena in non-Indo-European languages
 ● Lushootseed exhibits a debatable distinction between verbs and nouns [link to Glottolog page for more references]
   ● ʔux̌ʷ ti sbiaw
     goes that-which is-a-coyote
     “The/a coyote goes” (via Beck, 2013)
   ● sbiaw ti ʔux̌ʷ
     is-a-coyote that-which goes
     “The one who goes is a coyote”
   ● (Translation distinction provided for clarity; semantically equivalent)
 ● Lillooet Salish quantification has repercussions for e.g. English (Matthewson 2001)

  7. Indigenous Peoples’ Day
 ● UW American Indian Studies Courses
   ● (Sometimes including language courses, e.g. Southern Lushootseed)
 ● At the new Burke Museum on campus:
   ● https://www.burkemuseum.org/calendar/indigenous-peoples-day

  8. PCFG Induction

  9. Learning Probabilities
 ● Simplest way:
   ● Use a treebank of parsed sentences
   ● To compute the probability of a rule, count:
     ● The number of times a nonterminal is expanded: $\sum_{\gamma} \mathrm{Count}(\alpha \to \gamma)$
     ● The number of times a nonterminal is expanded by a given rule: $\mathrm{Count}(\alpha \to \beta)$
   $$P(\alpha \to \beta \mid \alpha) = \frac{\mathrm{Count}(\alpha \to \beta)}{\sum_{\gamma} \mathrm{Count}(\alpha \to \gamma)} = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)}$$
 ● Alternative: learn probabilities by re-estimation (later)
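The counting recipe above can be sketched in a few lines of Python. This is a minimal illustration, not course reference code: trees are assumed to be nested tuples of the form (label, child, ...), with words as plain strings. The two-tree treebank, `productions`, and `estimate_pcfg` are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def productions(tree):
    """Yield (lhs, rhs) for every internal node of a nested-tuple tree.

    A tree is (label, child, ...); a leaf is a plain string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: Count(lhs -> rhs) / Count(lhs)."""
    rule_counts = Counter(p for tree in treebank for p in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: Fraction(n, lhs_counts[rule[0]]) for rule, n in rule_counts.items()}


# Hypothetical two-sentence treebank, for illustration only.
TOY_TREEBANK = [
    ("S", ("NP", ("PRP", "she")), ("VP", ("VBD", "slept"))),
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "slept"))),
]

PROBS = estimate_pcfg(TOY_TREEBANK)
```

Here NP is expanded twice, once by each of two rules, so each NP rule receives probability 1/2, while S → NP VP receives probability 1. Exact fractions make it easy to check that the probabilities for each left-hand side sum to 1.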

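  10. Inducing a PCFG
 [Figure: parse tree for “Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.”]
 (S
   (NP (NNP Mr.) (NNP Vinken))
   (VP (VBZ is)
     (NP
       (NP (NN chairman))
       (PP (IN of)
         (NP
           (NP (NNP Elsevier) (NNP N.V.))
           (, ,)
           (NP (DT the) (NNP Dutch) (VBG publishing) (NN group))))))
   (. .))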

  11. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the rule counts collected so far]
   LHS   Total   Rule                   Count
   S     1       S → NP VP .            1

  12. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the rule counts collected so far]
   LHS   Total   Rule                   Count
   S     1       S → NP VP .            1
   NP    1       NP → NNP NNP           1

  13. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the rule counts collected so far]
   LHS   Total   Rule                   Count
   S     1       S → NP VP .            1
   NP    1       NP → NNP NNP           1
   VP    1       VP → VBZ NP            1

  14. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the rule counts collected so far]
   LHS   Total   Rule                   Count
   S     1       S → NP VP .            1
   NP    2       NP → NNP NNP           1
                 NP → NP PP             1
   VP    1       VP → VBZ NP            1

  15. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the rule counts collected so far]
   LHS   Total   Rule                   Count
   S     1       S → NP VP .            1
   NP    2       NP → NNP NNP           1
                 NP → NP PP             1
   VP    1       VP → VBZ NP            1
   PP    1       PP → IN NP             1

  16. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the rule counts collected so far]
   LHS   Total   Rule                   Count
   S     1       S → NP VP .            1
   NP    3       NP → NNP NNP           1
                 NP → NP PP             1
                 NP → NP , NP           1
   VP    1       VP → VBZ NP            1
   PP    1       PP → IN NP             1

  17. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the rule counts collected so far]
   LHS   Total   Rule                   Count
   S     1       S → NP VP .            1
   NP    4       NP → NNP NNP           2
                 NP → NP PP             1
                 NP → NP , NP           1
   VP    1       VP → VBZ NP            1
   PP    1       PP → IN NP             1

  18. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the rule counts collected so far]
   LHS   Total   Rule                   Count
   S     1       S → NP VP .            1
   NP    5       NP → NNP NNP           2
                 NP → NP PP             1
                 NP → NP , NP           1
                 NP → DT NNP VBG NN     1
   VP    1       VP → VBZ NP            1
   PP    1       PP → IN NP             1

  19. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the final rule counts]
   LHS   Total   Rule                   Count
   S     1       S → NP VP .            1
   NP    5       NP → NNP NNP           2
                 NP → NP PP             1
                 NP → NP , NP           1
                 NP → DT NNP VBG NN     1
   VP    1       VP → VBZ NP            1
   PP    1       PP → IN NP             1

  20. Inducing a PCFG
 [Figure: parse tree as on slide 10, with rule counts converted to relative frequencies]
   LHS   Total   Rule                   Probability
   S     1       S → NP VP .            1
   NP    5       NP → NNP NNP           2/5
                 NP → NP PP             1/5
                 NP → NP , NP           1/5
                 NP → DT NNP VBG NN     1/5
   VP    1       VP → VBZ NP            1
   PP    1       PP → IN NP             1

  21. Inducing a PCFG
 [Figure: parse tree as on slide 10, with the resulting rule probabilities]
   LHS   Total   Rule                   Probability
   S     1       S → NP VP .            1
   NP    5       NP → NNP NNP           0.4
                 NP → NP PP             0.2
                 NP → NP , NP           0.2
                 NP → DT NNP VBG NN     0.2
   VP    1       VP → VBZ NP            1
   PP    1       PP → IN NP             1
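As a check on the arithmetic in slides 20 and 21, the final step from counts to probabilities could be reproduced as follows. The counts are copied from the slides' table, which lists only these seven rules; `normalize` is a hypothetical helper name.

```python
from collections import defaultdict
from fractions import Fraction

# Rule counts as tabulated on the slides (only the rules listed there).
RULE_COUNTS = {
    ("S", ("NP", "VP", ".")): 1,
    ("NP", ("NNP", "NNP")): 2,
    ("VP", ("VBZ", "NP")): 1,
    ("NP", ("NP", "PP")): 1,
    ("PP", ("IN", "NP")): 1,
    ("NP", ("NP", ",", "NP")): 1,
    ("NP", ("DT", "NNP", "VBG", "NN")): 1,
}


def normalize(rule_counts):
    """Divide each rule count by the total count of its left-hand side."""
    totals = defaultdict(int)
    for (lhs, _), n in rule_counts.items():
        totals[lhs] += n
    return {rule: Fraction(n, totals[rule[0]]) for rule, n in rule_counts.items()}


RULE_PROBS = normalize(RULE_COUNTS)

for (lhs, rhs), p in RULE_PROBS.items():
    print(f"{lhs} → {' '.join(rhs)}  [{float(p):.1f}]")
```

The NP rules share a denominator of 5, giving 2/5 for NP → NNP NNP and 1/5 for each of the other three, which matches the decimal values on slide 21.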

  22. Problems with PCFGs

  23. Problems with PCFGs
 ● Independence assumption
   ● Assume that rule probabilities are independent
 ● Lack of lexical conditioning
   ● Lexical items should influence the choice of analysis

  24. Issues with PCFGs: Independence Assumption
 ● Context-free ⇒ independence assumption
   ● Rule expansion is context-independent
   ● Allows us to multiply probabilities
 ● If we have two rules:
   ● NP → DT NN [0.28]
   ● NP → PRP [0.25]
 ● Semantic role of NPs in the Switchboard corpus:
             Pronominal   Non-pronominal
   Subject   91%          9%
   Object    34%          66%
 ● What does this new data tell us?
   ● NP → DT NN [0.09 if NP_Θ = subject else 0.66]
   ● NP → PRP [0.91 if NP_Θ = subject else 0.34]
 ● …Can try parent annotation
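Parent annotation, mentioned at the end of this slide, could be sketched as a tree transform that appends the parent's label to each phrasal node, so that an NP under S (roughly, a subject) and an NP under VP (roughly, an object) become distinct symbols with their own rule probabilities. This is one possible sketch under assumptions: the nested-tuple tree format, the `^` separator, the example tree, and the choice to leave preterminals unannotated are all illustrative.

```python
def parent_annotate(tree, parent=None):
    """Return a copy of a nested-tuple tree with phrasal labels marked by parent.

    A tree is (label, child, ...); a leaf is a plain string. Preterminals
    (nodes whose only child is a word) are left unannotated in this sketch.
    """
    label, *children = tree
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    if parent is None or is_preterminal:
        new_label = label
    else:
        new_label = f"{label}^{parent}"
    new_children = [
        c if isinstance(c, str) else parent_annotate(c, parent=label)
        for c in children
    ]
    return (new_label, *new_children)


# Hypothetical example tree.
TREE = (
    "S",
    ("NP", ("PRP", "she")),
    ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "dog"))),
)

ANNOTATED = parent_annotate(TREE)
```

After the transform, the subject is labeled NP^S and the object NP^VP, so a grammar induced from annotated trees can assign NP^S → PRP and NP^VP → PRP different probabilities.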

  25. Issues with PCFGs: Lexical Conditioning
 [Figure: two parses of “workers dumped sacks into a bin”]
 ● OK: PP attached to the VP (“into a bin” = location of sacks after dumping)
   (S (NP (NNS workers)) (VP (VBD dumped) (NP (NNS sacks)) (PP (P into) (NP (DT a) (NN bin)))))
 ● Not OK: PP attached to the NP (“into a bin” = *the sacks which were located in PP)
   *(S (NP (NNS workers)) (VP (VBD dumped) (NP (NNS sacks) (PP (P into) (NP (DT a) (NN bin))))))

  26. Issues with PCFGs: Lexical Conditioning
 [Figure: NP attachment of the PP with “into” versus “in”]
 ● Not OK: “into a bin” = *the sacks which were located in PP
   *(S (NP (NNS workers)) (VP (VBD dumped) (NP (NNS sacks) (PP (P into) (NP (DT a) (NN bin))))))
 ● OK: “in a bin” = location of sacks before dumping
   (S (NP (NNS workers)) (VP (VBD dumped) (NP (NNS sacks) (PP (P in) (NP (DT a) (NN bin))))))

  27. Issues with PCFGs: Lexical Conditioning
 ● workers dumped sacks into a bin
   ● into should prefer modifying dumped
   ● into should disprefer modifying sacks
 ● fishermen caught tons of herring
   ● of should prefer modifying tons
   ● of should disprefer modifying caught
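To see why a plain PCFG cannot express these preferences, note that the two attachments differ only in non-lexical rules: VP → VBD NP PP with NP → NNS for VP attachment, versus VP → VBD NP with NP → NNS PP for NP attachment, following the trees on slide 25. The sketch below compares the two using made-up probabilities. The numbers and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical rule probabilities, chosen only for illustration.
RULE_PROBS = {
    ("VP", ("VBD", "NP", "PP")): 0.3,
    ("VP", ("VBD", "NP")): 0.7,
    ("NP", ("NNS",)): 0.6,
    ("NP", ("NNS", "PP")): 0.1,
}


def attachment_scores(rule_probs):
    """Multiply only the rules on which the two parses differ.

    Every other rule (S → NP VP, PP → P NP, the lexical rules, ...) occurs
    in both parses of a sentence and cancels out of the comparison.
    """
    vp_attach = rule_probs[("VP", ("VBD", "NP", "PP"))] * rule_probs[("NP", ("NNS",))]
    np_attach = rule_probs[("VP", ("VBD", "NP"))] * rule_probs[("NP", ("NNS", "PP"))]
    return vp_attach, np_attach


def preferred_attachment(preposition, rule_probs=RULE_PROBS):
    """Return "VP" or "NP"; the preposition cannot influence the outcome."""
    # `preposition` is deliberately unused: P → "into" or P → "of" appears
    # in both parses of its sentence, so it never changes which parse wins.
    vp_attach, np_attach = attachment_scores(rule_probs)
    return "VP" if vp_attach > np_attach else "NP"
```

Whatever the rule probabilities are, the grammar makes the same choice for “dumped sacks into a bin” and “caught tons of herring”, so it gets at least one of them wrong. With these particular numbers it always picks VP attachment.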

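  28. Issues with PCFGs: Coordination Ambiguity
 [Figure: two parses of “dogs in houses and cats”]
 ● [dogs in houses] and [cats]
   (NP (NP (NP (Noun dogs)) (PP (Prep in) (NP (Noun houses)))) (Conj and) (NP (Noun cats)))
 ● dogs in [houses and cats]
   (NP (NP (Noun dogs)) (PP (Prep in) (NP (NP (Noun houses)) (Conj and) (NP (Noun cats)))))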

  29. Issues with PCFGs: Coordination Ambiguity
 [Figure: the same two parses of “dogs in houses and cats”, each listed with the rules it uses]
 ● [dogs in houses] and [cats]: NP → NP Conj NP, NP → NP PP, Noun → “dogs”, PP → Prep NP, Prep → “in”, NP → Noun, Noun → “houses”, Conj → “and”, NP → Noun, Noun → “cats”
 ● dogs in [houses and cats]: NP → NP PP, Noun → “dogs”, PP → Prep NP, Prep → “in”, NP → NP Conj NP, NP → Noun, Noun → “houses”, Conj → “and”, NP → Noun, Noun → “cats”
 ● Same rules!
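The “same rules” observation can be checked mechanically: collecting the multiset of rules used by each parse shows they are identical, so any PCFG assigns both parses the same probability. The sketch below assumes trees written as nested tuples; `rules` and the constant names are hypothetical.

```python
from collections import Counter


def rules(tree):
    """Return a Counter of the (lhs, rhs) rules used in a nested-tuple tree.

    A tree is (label, child, ...); a leaf is a plain string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts = Counter({(label, rhs): 1})
    for child in children:
        if not isinstance(child, str):
            counts += rules(child)
    return counts


DOGS = ("NP", ("Noun", "dogs"))
HOUSES = ("NP", ("Noun", "houses"))
CATS = ("NP", ("Noun", "cats"))
IN = ("Prep", "in")
AND = ("Conj", "and")

# [dogs in houses] and [cats]
LEFT = ("NP", ("NP", DOGS, ("PP", IN, HOUSES)), AND, CATS)
# dogs in [houses and cats]
RIGHT = ("NP", DOGS, ("PP", IN, ("NP", HOUSES, AND, CATS)))

# Both parses use exactly the same rules the same number of times.
assert rules(LEFT) == rules(RIGHT)
```

Because a parse's probability under a PCFG is the product of its rule probabilities, equal rule multisets mean equal probabilities, and the grammar has no basis for preferring either bracketing.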
