Probabilistic Parsing: Issues & Improvement LING 571 — Deep Processing Techniques for NLP October 14, 2019 Shane Steinert-Threlkeld 1
Announcements ● HW2 grades posted (mean 87) ● Reference code available in ● /dropbox/19-20/571/hw2/reference_code ● NB: not needed for HW3; you can assume that all grammars are already in CNF 2
Homework Tips ● Use nltk.load for reading grammars; will save you and TA time and headaches! ● Run your code on patas to produce the output you submit in TAR file ● Some discrepancies found that seem due to different environment ● readme.{txt|pdf} : this should NOT be inside your TAR file, but a separate upload on Canvas 3
Notes on HW #3 ● Python’s range covers many use cases through its start, stop, and step arguments ● range(n) is equivalent to range(0, n, 1) ● Reminder: the rhs= argument of NLTK’s grammar.productions() method matches only the first right-hand-side symbol, not the entire right-hand side ● You’ll want to implement an efficient look-up based on the full RHS ● HW3: compare your output to running the HW1 parser on the same grammar and sentences [the order of output for ambiguous sentences could differ]
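One way to get a look-up on the full RHS is to index productions by their entire right-hand side. The sketch below is only an illustration: it uses plain (lhs, rhs) tuples rather than NLTK objects, and the names build_rhs_index and toy_rules are made up for this example. With NLTK you could build the same dictionary from each production's lhs() and rhs(). The last lines show range with explicit start and stop values enumerating CKY-style spans.

```python
from collections import defaultdict

def build_rhs_index(rules):
    """Map each complete right-hand side to the set of left-hand sides producing it."""
    index = defaultdict(set)
    for lhs, rhs in rules:
        index[tuple(rhs)].add(lhs)
    return index

# Hypothetical CNF rules, written as (lhs, rhs) pairs.
toy_rules = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
    ("NP", ("dogs",)),
    ("N", ("dogs",)),
    ("V", ("chase",)),
]

rhs_index = build_rhs_index(toy_rules)

# Dictionary look-up on the whole RHS, not just its first symbol.
print(sorted(rhs_index[("dogs",)]))    # ['N', 'NP']
print(sorted(rhs_index[("V", "NP")]))  # ['VP']

# Spans (i, j) of increasing width over a sentence of length n.
n = 3
spans = [(i, i + width)
         for width in range(2, n + 1)
         for i in range(0, n - width + 1)]
print(spans)  # [(0, 2), (1, 3), (0, 3)]
```

Use rhs_index.get(key) for right-hand sides that may be absent, since indexing a defaultdict with a missing key silently creates an empty entry.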
Indigenous Peoples’ Day ● Seattle/Sealth ● For those of you taking 550: ● The Lushootseed spelling [IPA] of Chief Seattle/Sealth: ● siʔaɫ [ˈsiʔaːɬ] ● Duwamish: Dxʷdəwʔabš [dxʷdɐwʔabʃ] ● IPA resources: ● https://en.wikipedia.org/wiki/International_Phonetic_Alphabet ● http://web.mit.edu/6.mitx/www/24.900%20IPA/IPAapp.html
Indigenous Peoples’ Day
● Studying non-English languages gives more holistic insight for NLP tasks
● Many interesting phenomena in non-Indo-European languages
● Lushootseed exhibits a debatable distinction between verbs and nouns [link to Glottolog page for more references]

    ʔux̌ʷ    ti            sbiaw
    goes    that-which    is-a-coyote
    “The/a coyote goes” (via Beck, 2013)

    sbiaw          ti            ʔux̌ʷ
    is-a-coyote    that-which    goes
    “The one who goes is a coyote”

  ● (Translation distinction provided for clarity; the two are semantically equivalent)
● Lillooet Salish quantification has repercussions for e.g. English (Matthewson 2001)
Indigenous Peoples’ Day ● UW American Indian Studies Courses ● (Sometimes including language courses, e.g. Southern Lushootseed) ● At the new Burke Museum on campus: ● https://www.burkemuseum.org/calendar/indigenous-peoples-day 7
PCFG Induction 8
Learning Probabilities
● Simplest way:
  ● Use a treebank of parsed sentences
  ● To compute the probability of a rule, count:
    ● Number of times a nonterminal α is expanded: $\sum_{\gamma} \mathrm{Count}(\alpha \to \gamma)$
    ● Number of times that nonterminal is expanded by a given rule: $\mathrm{Count}(\alpha \to \beta)$

$$P(\alpha \to \beta \mid \alpha) = \frac{\mathrm{Count}(\alpha \to \beta)}{\sum_{\gamma} \mathrm{Count}(\alpha \to \gamma)} = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)}$$

● Alternative: Learn probabilities by re-estimating
  ● (Later)
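The relative-frequency estimate can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name mle_rule_probs and the counts in toy_counts are invented for the example and do not come from any treebank.

```python
from collections import Counter

def mle_rule_probs(rule_counts):
    """P(lhs -> rhs | lhs) = Count(lhs -> rhs) / Count(lhs)."""
    lhs_totals = Counter()
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]]
            for rule, count in rule_counts.items()}

# Hypothetical rule counts (not from a real corpus).
toy_counts = Counter({
    ("VP", ("VBD", "NP")): 3,
    ("VP", ("VBD", "NP", "PP")): 1,
    ("NP", ("DT", "NN")): 2,
    ("NP", ("PRP",)): 2,
})

probs = mle_rule_probs(toy_counts)
print(probs[("VP", ("VBD", "NP"))])  # 0.75
print(probs[("NP", ("PRP",))])       # 0.5
```

By construction, the probabilities of all rules sharing a left-hand side sum to 1.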
Inducing a PCFG
Treebank tree:

(S (NP (NNP Mr.) (NNP Vinken))
   (VP (VBZ is)
       (NP (NP (NN chairman))
           (PP (IN of)
               (NP (NP (NNP Elsevier) (NNP N.V.))
                   (, ,)
                   (NP (DT the) (NNP Dutch) (VBG publishing) (NN group))))))
   (. .))

Rule counts, tallied one rule at a time while walking the tree, and the resulting probabilities:

α → * (total)    Rule                  Count    P(rule | α)
S → *     1      S → NP VP .           1        1
NP → *    5      NP → NNP NNP          2        2/5 = 0.4
                 NP → NP PP            1        1/5 = 0.2
                 NP → NP , NP          1        1/5 = 0.2
                 NP → DT NNP VBG NN    1        1/5 = 0.2
VP → *    1      VP → VBZ NP           1        1
PP → *    1      PP → IN NP            1        1
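The tallying procedure can be sketched by reading the rules off a bracketed version of the tree. This is a minimal sketch under stated assumptions: the bracketed string is a transcription of the tree on the slide, and parse_bracketed and phrasal_rules are hypothetical helper names. Because the code counts every phrasal node, it also picks up the unary expansion NP → NN above "chairman", which the slide tally does not list. NP is therefore expanded 6 times here instead of 5, and NP → NNP NNP comes out as 2/6 rather than 2/5.

```python
from collections import Counter

TREE = """
(S (NP (NNP Mr.) (NNP Vinken))
   (VP (VBZ is)
       (NP (NP (NN chairman))
           (PP (IN of)
               (NP (NP (NNP Elsevier) (NNP N.V.))
                   (, ,)
                   (NP (DT the) (NNP Dutch) (VBG publishing) (NN group))))))
   (. .))
"""

def parse_bracketed(text):
    """Parse a bracketed tree into nested (label, children); leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()

def phrasal_rules(tree):
    """Yield (lhs, rhs) for nodes whose children are subtrees (skips tag -> word)."""
    label, children = tree
    if all(isinstance(child, tuple) for child in children):
        yield (label, tuple(child[0] for child in children))
        for child in children:
            yield from phrasal_rules(child)

counts = Counter(phrasal_rules(parse_bracketed(TREE)))
lhs_totals = Counter()
for (lhs, _rhs), c in counts.items():
    lhs_totals[lhs] += c
probs = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

print(lhs_totals["NP"])                # 6
print(counts[("NP", ("NNP", "NNP"))])  # 2
```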
Problems with PCFGs 22
Problems with PCFGs ● Independence Assumption ● Assume that rule probabilities are independent ● Lack of Lexical Conditioning ● Lexical items should influence the choice of analysis 23
Issues with PCFGs: Independence Assumption
● Context Free ⇒ Independence Assumption
  ● Rule expansion is context-independent
  ● Allows us to multiply probabilities
● If we have two rules:
  ● NP → DT NN [0.28]
  ● NP → PRP [0.25]

Semantic Role of NPs in Switchboard Corpus
           Pronominal    Non-Pronominal
Subject    91%           9%
Object     34%           66%

● What does this new data tell us?
  ● NP → DT NN [0.09 if NP_Θ = subject else 0.66]
  ● NP → PRP [0.91 if NP_Θ = subject else 0.34]
● …Can try parent annotation
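Parent annotation can be sketched as a tree transformation that appends each phrasal node's parent label to its own label. A subject NP then becomes NP^S and an object NP becomes NP^VP, so rules such as NP → PRP can receive separate probabilities in the two positions. The nested-tuple tree format, the function name parent_annotate, and the example sentence are assumptions made for this illustration.

```python
def parent_annotate(tree, parent=None):
    """Return a copy of the tree with phrasal labels rewritten as LABEL^PARENT."""
    label, children = tree
    if all(isinstance(child, str) for child in children):
        return (label, list(children))   # leave POS tags (preterminals) unchanged
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, [parent_annotate(child, label) for child in children])

# Hypothetical example sentence: "she saw the dog"
tree = ("S", [
    ("NP", [("PRP", ["she"])]),
    ("VP", [("VBD", ["saw"]),
            ("NP", [("DT", ["the"]), ("NN", ["dog"])])]),
])

annotated = parent_annotate(tree)
print(annotated[1][0][0])        # NP^S  (subject position)
print(annotated[1][1][1][1][0])  # NP^VP (object position)
```

Inducing a PCFG from trees annotated this way multiplies the number of nonterminals, which is the price paid for weakening the independence assumption.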
Issues with PCFGs: Lexical Conditioning
Sentence: workers dumped sacks into a bin

OK! PP attached to the VP (“into a bin” = location of sacks after dumping):
(S (NP (NNS workers))
   (VP (VBD dumped)
       (NP (NNS sacks))
       (PP (P into) (NP (DT a) (NN bin)))))

* not OK: PP attached to the NP (“into a bin” = * the sacks which were located in PP):
(S (NP (NNS workers))
   (VP (VBD dumped)
       (NP (NNS sacks)
           (PP (P into) (NP (DT a) (NN bin))))))
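Issues with PCFGs: Lexical Conditioning
Same NP-attachment structure, different preposition:

* not OK: workers dumped sacks into a bin (“into a bin” = * the sacks which were located in PP):
(S (NP (NNS workers))
   (VP (VBD dumped)
       (NP (NNS sacks)
           (PP (P into) (NP (DT a) (NN bin))))))

OK! workers dumped sacks in a bin (“in a bin” = location of sacks before dumping):
(S (NP (NNS workers))
   (VP (VBD dumped)
       (NP (NNS sacks)
           (PP (P in) (NP (DT a) (NN bin))))))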
Issues with PCFGs: Lexical Conditioning ● workers dumped sacks into a bin ● into should prefer modifying dumped ● into should disprefer modifying sacks ● fishermen caught tons of herring ● of should prefer modifying tons ● of should disprefer modifying caught 27
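A rough sketch of what lexical conditioning buys: if attachment decisions can see the head words, simple counts already separate the two examples. The counts below are invented purely for illustration (they are not corpus data), and prefers_verb_attachment is a hypothetical helper, not a standard algorithm from the lecture. It compares add-alpha smoothed counts for attaching the preposition to the verb versus the noun.

```python
# Invented counts of (attachment site, head word, preposition); not corpus data.
toy_counts = {
    ("verb", "dumped", "into"): 8,
    ("noun", "sacks", "into"): 1,
    ("verb", "caught", "of"): 1,
    ("noun", "tons", "of"): 9,
}

def prefers_verb_attachment(verb, noun, prep, counts, alpha=1.0):
    """Return (decision, p_verb) from add-alpha smoothed attachment counts."""
    verb_count = counts.get(("verb", verb, prep), 0) + alpha
    noun_count = counts.get(("noun", noun, prep), 0) + alpha
    p_verb = verb_count / (verb_count + noun_count)
    return p_verb > 0.5, p_verb

# "into" should attach to "dumped"; "of" should attach to "tons".
print(prefers_verb_attachment("dumped", "sacks", "into", toy_counts)[0])  # True
print(prefers_verb_attachment("caught", "tons", "of", toy_counts)[0])     # False
```

A plain PCFG cannot express this preference, because the rules VP → VBD NP PP and NP → NNS PP carry the same probabilities whichever words appear below them.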
Issues with PCFGs: Coordination Ambiguity
Two parses of “dogs in houses and cats”:

Parse 1:
(NP (NP (NP (Noun dogs))
        (PP (Prep in) (NP (Noun houses))))
    (Conj and)
    (NP (Noun cats)))

Parse 2:
(NP (NP (Noun dogs))
    (PP (Prep in)
        (NP (NP (Noun houses))
            (Conj and)
            (NP (Noun cats)))))

Same Rules!

Parse 1               Parse 2
NP → NP Conj NP       NP → NP PP
NP → NP PP            Noun → “dogs”
Noun → “dogs”         PP → Prep NP
PP → Prep NP          Prep → “in”
Prep → “in”           NP → NP Conj NP
NP → Noun             NP → Noun
Noun → “houses”       Noun → “houses”
Conj → “and”          Conj → “and”
NP → Noun             NP → Noun
Noun → “cats”         Noun → “cats”
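Because both parses use the same multiset of rules, a PCFG must assign them the same probability, whatever the rule probabilities are. The sketch below checks this on the two trees. The probability values in RULE_PROBS are hypothetical placeholders, and the nested-tuple tree encoding and helper names are assumptions of the example.

```python
import math

# Hypothetical rule probabilities; any values lead to the same conclusion.
RULE_PROBS = {
    ("NP", ("NP", "Conj", "NP")): 0.2,
    ("NP", ("NP", "PP")): 0.3,
    ("NP", ("Noun",)): 0.5,
    ("PP", ("Prep", "NP")): 1.0,
    ("Noun", ("dogs",)): 0.4,
    ("Noun", ("houses",)): 0.3,
    ("Noun", ("cats",)): 0.3,
    ("Prep", ("in",)): 1.0,
    ("Conj", ("and",)): 1.0,
}

def rules(tree):
    """Yield every (lhs, rhs) rule used in a (label, children) tree."""
    label, children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def tree_prob(tree):
    """Probability of a tree = product of the probabilities of its rules."""
    return math.prod(RULE_PROBS[r] for r in rules(tree))

def noun_np(word):
    return ("NP", [("Noun", [word])])

# Parse 1: [dogs in houses] and [cats]
parse1 = ("NP", [
    ("NP", [noun_np("dogs"), ("PP", [("Prep", ["in"]), noun_np("houses")])]),
    ("Conj", ["and"]),
    noun_np("cats"),
])

# Parse 2: dogs in [houses and cats]
parse2 = ("NP", [
    noun_np("dogs"),
    ("PP", [("Prep", ["in"]),
            ("NP", [noun_np("houses"), ("Conj", ["and"]), noun_np("cats")])]),
])

print(sorted(rules(parse1)) == sorted(rules(parse2)))        # True
print(math.isclose(tree_prob(parse1), tree_prob(parse2)))    # True
```

The comparison uses math.isclose because the two products multiply the same factors in a different order, which can differ in the last floating-point digit.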