ANLP Lecture 15: Dependency Syntax and Parsing
Shay Cohen (based on slides by Sharon Goldwater and Nathan Schneider)
18 October, 2019

Last class
◮ Probabilistic context-free grammars
◮ Probabilistic CYK
◮ Best-first parsing
◮ Problems with PCFGs (the model makes too strong independence assumptions)

A warm-up question
We described the generative story for PCFGs: pick a rule at random, and terminate when choosing a terminal symbol. Does this process have to terminate?
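To make the question concrete, here is a minimal sketch of that generative story, using a hypothetical toy grammar (the rules and probabilities are illustrative, not from the lecture). Because of recursive rules such as VP → VP PP, a single derivation can in principle grow without bound; whether generation terminates with probability 1 depends on the rule probabilities.

import random

# A hypothetical toy PCFG: each nonterminal maps to a list of
# (right-hand side, probability) pairs; symbols not in the table are terminals.
toy_pcfg = {
    "S":  [(["NP", "VP"], 1.0)],
    "NP": [(["kids"], 0.5), (["birds"], 0.5)],
    "VP": [(["V", "NP"], 0.7), (["VP", "PP"], 0.3)],
    "PP": [(["with", "NP"], 1.0)],
    "V":  [(["saw"], 1.0)],
}

def generate(symbol):
    """Expand symbol by picking a rule at random; reaching a terminal stops that branch."""
    if symbol not in toy_pcfg:
        return [symbol]
    rules, probs = zip(*toy_pcfg[symbol])
    rhs = random.choices(rules, weights=probs)[0]
    return [word for child in rhs for word in generate(child)]

print(" ".join(generate("S")))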

Evaluating parse accuracy
Compare the gold standard tree (left) to the parser output (right):
[Tree diagrams for "he saw her duck": the gold tree analyses "her duck" as an NP object of "saw"; the parser output analyses "her" as an NP and "duck" as a verb]
◮ An output constituent is counted correct if there is a gold constituent that spans the same sentence positions.
◮ Harsher measure: also require the constituent labels to match.
◮ Pre-terminals don't count as constituents.
◮ Precision: (# correct constituents) / (# constituents in parser output) = 3/5
◮ Recall: (# correct constituents) / (# constituents in gold standard) = 3/4
◮ F-score: balances precision and recall: 2pr/(p+r)
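As a check of the 3/5 and 3/4 above, here is a minimal scoring sketch with the two trees hand-encoded as (label, start, end) spans (the span encoding is my own illustration, not part of the lecture).

def prf(gold_spans, output_spans):
    """Precision, recall and F-score over constituent spans.

    Each span is a (label, start, end) tuple; drop the label to get the
    unlabelled (span-only) measure.
    """
    correct = len(set(gold_spans) & set(output_spans))
    p = correct / len(output_spans)
    r = correct / len(gold_spans)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f

# Hypothetical span encodings of the two trees above (pre-terminals excluded).
gold   = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)]
output = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 3), ("VP", 3, 4)]
p, r, f = prf(gold, output)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")   # P=0.60 R=0.75 F=0.67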

Parsing: where are we now?
◮ We discussed the basics of probabilistic parsing, and you should now have a good idea of the issues involved.
◮ State-of-the-art parsers address these issues in other ways. For comparison, parsing F-scores on the WSJ corpus are:
  ◮ vanilla PCFG: below 80% (Charniak (1996) reports 81%, but using gold POS tags as input)
  ◮ lexicalization + category splitting: 89.5% (Charniak, 2000)
  ◮ best current parsers: about 94%
◮ We'll say a little about recent methods later, but most details are in semester 2.
Parsing is not just WSJ. Lots of situations are much harder!
◮ Other languages, especially those with free word order (up next) or with little annotated data.
◮ Other domains, especially those with jargon (e.g., biomedical text) or non-standard language (e.g., social media text).
In fact, due to the increasing focus on multilingual NLP, constituency syntax/parsing (which is English-centric) is losing ground to dependency parsing...

Lexicalization, again
We saw that adding the lexical head of each phrase can help choose the right parse:
[Lexicalized tree for "kids saw birds with fish", with each phrase annotated with its head word: S-saw, NP-kids, VP-saw, V-saw, NP-birds, PP-fish, P-with, NP-fish]
Dependency syntax focuses on these head-dependent relationships.

Dependency syntax
An alternative approach to sentence structure.
◮ A fully lexicalized formalism: no phrasal categories.
◮ Assumes binary, asymmetric grammatical relations between words: head-dependent relations, shown as directed edges:
[Dependency diagrams with directed edges over "kids saw birds with fish" and "kids saw birds with binoculars"]
◮ Here, edges point from heads to their dependents.

Dependency trees
A valid dependency tree for a sentence requires:
◮ A single distinguished root word.
◮ All other words have exactly one incoming edge.
◮ A unique path from the root to each other word.

It really is a tree!
◮ The usual way to show dependency trees is with edges over the ordered sentence.
◮ But the edge structure (without word order) can also be shown as a more obvious tree:
[Unordered tree for "kids saw birds with fish": saw → kids, birds; birds → fish; fish → with]

Labelled dependencies
It is often useful to distinguish different kinds of head → modifier relations by labelling the edges:
[Labelled dependency diagram for "kids saw birds with fish": ROOT → saw, NSUBJ(saw → kids), DOBJ(saw → birds), NMOD(birds → fish), CASE(fish → with)]
◮ Historically, different treebanks/languages used different labels.
◮ Now, the Universal Dependencies project aims to standardize labels and annotation conventions, bringing together annotated corpora from over 50 languages.
◮ The labels in this example (and in the textbook) are from UD.
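A minimal sketch tying the last two slides together: the labelled parse above, encoded as one (head index, label) pair per word (a hypothetical encoding, not from the lecture), plus a check of the conditions from the "Dependency trees" slide. With this encoding every non-root word has exactly one incoming edge by construction, so the check reduces to requiring a single root and a cycle-free path from every word up to it.

words = ["kids", "saw", "birds", "with", "fish"]
# (head index, label) for each word; a head of None marks the root.
arcs = [(1, "NSUBJ"), (None, "ROOT"), (1, "DOBJ"), (4, "CASE"), (2, "NMOD")]

def is_valid_tree(arcs):
    """Single distinguished root, and every word reaches the root without a cycle."""
    roots = [i for i, (head, _) in enumerate(arcs) if head is None]
    if len(roots) != 1:
        return False
    for i in range(len(arcs)):
        seen, node = set(), i
        while arcs[node][0] is not None:   # climb head links towards the root
            if node in seen:               # revisiting a word means a cycle
                return False
            seen.add(node)
            node = arcs[node][0]
    return True

for (head, label), word in zip(arcs, words):
    print(word, label, words[head] if head is not None else "ROOT")
print(is_valid_tree(arcs))   # True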

Why dependencies??
Consider these sentences. Two ways to say the same thing:
[Phrase structure trees: "Sasha gave the girl a book" (S → NP VP, VP → V NP NP) and "Sasha gave a book to the girl" (S → NP VP, VP → V NP PP)]
◮ We only need a few phrase structure rules:
  S → NP VP
  VP → V NP NP
  VP → V NP PP
  plus rules for NP and PP.

Equivalent sentences in Russian
◮ Russian uses morphology to mark relations between words:
  ◮ knigu means book (kniga) as a direct object.
  ◮ devochke means girl (devochka) as an indirect object (to the girl).
◮ So we can have the same word orders as English:
  ◮ Sasha dal devochke knigu
  ◮ Sasha dal knigu devochke
◮ But also many others!
  ◮ Sasha devochke dal knigu
  ◮ Devochke dal Sasha knigu
  ◮ Knigu dal Sasha devochke

Phrase structure vs dependencies
◮ In languages with free word order, phrase structure (constituency) grammars don't make as much sense.
  ◮ E.g., we would need both S → NP VP and S → VP NP, etc. Not very informative about what's really going on.
◮ In contrast, the dependency relations stay constant (a small sketch at the end of this section makes this concrete):
[Labelled dependency diagrams for "Sasha dal devochke knigu" and "Sasha dal knigu devochke": ROOT → dal, NSUBJ(dal → Sasha), IOBJ(dal → devochke), DOBJ(dal → knigu) in both]
◮ Even more obvious if we just look at the trees without word order:
[Unordered trees for both word orders: dal → Sasha, devochke, knigu]

Pros and cons
◮ A sensible framework for free word order languages.
◮ Identifies syntactic relations directly. (Using a CFG, how would you identify the subject of a sentence?)
◮ Dependency pairs/chains can make good features in classifiers, for information extraction, etc.
◮ Parsers can be very fast (coming up...)
But:
◮ The assumption of asymmetric binary relations isn't always right... e.g., how should we parse "dogs and cats"?
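Here is the promised sketch: the two Russian word orders encoded as (dependent, label, head) triples (my own encoding of the diagrams above); the sets of relations are identical even though the linear order differs.

def relations(words, arcs):
    """arcs: (dependent index, label, head index) triples; a head of None marks the root."""
    return {(words[d], label, words[h] if h is not None else "ROOT")
            for d, label, h in arcs}

order1 = "Sasha dal devochke knigu".split()
arcs1 = [(0, "NSUBJ", 1), (1, "ROOT", None), (2, "IOBJ", 1), (3, "DOBJ", 1)]

order2 = "Sasha dal knigu devochke".split()
arcs2 = [(0, "NSUBJ", 1), (1, "ROOT", None), (2, "DOBJ", 1), (3, "IOBJ", 1)]

print(relations(order1, arcs1) == relations(order2, arcs2))   # True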

How do we annotate dependencies?
Two options:
1. Annotate dependencies directly.
2. Convert phrase structure annotations to dependencies. (Convenient if we already have a phrase structure treebank.)
The next slides show how to convert, assuming we have head-finding rules for our phrase structure trees.

Lexicalized Constituency Parse
[Lexicalized constituency tree for "kids saw birds with fish": S-saw → NP-kids VP-saw; VP-saw → V-saw NP-birds; NP-birds → NP-birds PP-fish; PP-fish → P-with NP-fish]

. . . remove the phrasal categories . . .
[The same tree with each phrasal category replaced by its lexical head word]

. . . remove the (duplicated) terminals . . .
[The same tree with the duplicated terminal leaves removed]

. . . and collapse chains of duplicates . . .
[A sequence of diagrams collapsing, step by step, each chain of nodes that share the same head word into a single node]

. . . and collapse chains of duplicates . . . done!
[Resulting dependency tree for "kids saw birds with fish": saw → kids, birds; birds → fish; fish → with]

Constituency Tree → Dependency Tree
We saw how the lexical head of each phrase can be used to collapse a constituency tree down to a dependency tree:
[Side by side: the plain constituency tree and its lexicalized version (S-saw, NP-kids, VP-saw, PP-binoculars, ...) for "kids saw birds with binoculars"]
◮ But how can we find each phrase's head in the first place?

Head Rules
The standard solution is to use head rules: for every non-unary (P)CFG production, designate one RHS nonterminal as containing the head, e.g.
  S → NP VP (the VP contains the head)
  VP → VP PP (the VP contains the head)
  PP → P NP (content head: the NP)
  etc.
◮ Heuristics to scale this to large grammars: e.g., within an NP, the last immediate N child is the head.

Head Rules (continued)
Then, propagate heads up the tree:
[A sequence of diagrams for "kids saw birds with binoculars", propagating heads bottom-up: V-saw and NP-birds give VP-saw; P-with and NP-binoculars give PP-binoculars; VP-saw and PP-binoculars give the higher VP-saw; finally NP-kids and VP-saw give S-saw, the fully lexicalized tree]
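Putting the last few slides together, here is a minimal end-to-end sketch of head propagation followed by the collapse to dependencies, with a hand-written tree and a head-rule table that cover only this example (both are my own illustration, not the lecture's implementation).

# A non-terminal node is (category, children); for a pre-terminal, children is
# simply its word. Head rules map (parent category, child categories) to the
# index of the child that contains the head.
HEAD_RULES = {
    ("S", ("NP", "VP")): 1,
    ("VP", ("VP", "PP")): 0,
    ("VP", ("V", "NP")): 0,
    ("PP", ("P", "NP")): 1,   # content head, as on the slide
}

tree = ("S", [("NP", "kids"),
              ("VP", [("VP", [("V", "saw"), ("NP", "birds")]),
                      ("PP", [("P", "with"), ("NP", "binoculars")])])])

def lexical_head(node, deps):
    """Return the head word of node, recording (head, dependent) edges along the way."""
    category, children = node
    if isinstance(children, str):              # pre-terminal: the word is its own head
        return children
    head_idx = HEAD_RULES[(category, tuple(child[0] for child in children))]
    head_words = [lexical_head(child, deps) for child in children]
    for i, word in enumerate(head_words):
        if i != head_idx:                      # every non-head child becomes a dependent
            deps.append((head_words[head_idx], word))
    return head_words[head_idx]

deps = []
root = lexical_head(tree, deps)
print(root, deps)
# saw [('saw', 'birds'), ('binoculars', 'with'), ('saw', 'binoculars'), ('saw', 'kids')]

The printed edges give saw as the root with kids, birds and binoculars as its dependents, and with as a dependent of binoculars (reflecting the high PP attachment in this particular tree).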
