
Algorithms for NLP: Parsing I
Yulia Tsvetkov (CMU)
Slides: Ivan Titov (University of Edinburgh), Taylor Berg-Kirkpatrick (CMU/UCSD), Dan Klein (UC Berkeley)
Running ambiguity example: "I saw a girl with a telescope"


  1. Probabilistic Context-Free Grammars
▪ A context-free grammar is a tuple <N, T, S, R>
  ▪ N: the set of non-terminals
    ▪ Phrasal categories: S, NP, VP, ADJP, etc.
    ▪ Parts of speech (pre-terminals): NN, JJ, DT, VB
  ▪ T: the set of terminals (the words)
  ▪ S: the start symbol
    ▪ Often written as ROOT or TOP
    ▪ Not usually the sentence non-terminal S
  ▪ R: the set of rules
    ▪ Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
    ▪ Examples: S → NP VP, VP → VP CC VP
    ▪ Also called rewrites, productions, or local trees
▪ A PCFG additionally specifies a top-down production probability per rule, P(Y1 Y2 … Yk | X)
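A compact way to hold such a grammar in code is a map from each left-hand side to its weighted expansions. The sketch below is my own illustration in Python; the symbols, rules, and probabilities are made up for the example, not taken from the slides.

```python
# A minimal way to represent the tuple <N, T, S, R> plus rule probabilities
# (illustrative sketch; the grammar itself is hypothetical).
pcfg = {
    "start": "S",
    "rules": {
        # lhs: list of (rhs, probability); the probabilities for each lhs sum to 1
        "S":  [(("NP", "VP"), 1.0)],
        "NP": [(("D", "N"), 0.6), (("NP", "PP"), 0.4)],
        "VP": [(("V", "NP"), 0.7), (("VP", "PP"), 0.3)],
        "PP": [(("P", "NP"), 1.0)],
        # preterminal rules rewrite PoS tags to words (terminals)
        "D":  [(("a",), 1.0)],
        "N":  [(("girl",), 0.5), (("sandwich",), 0.5)],
        "V":  [(("ate",), 1.0)],
        "P":  [(("with",), 1.0)],
    },
}

# N (non-terminals) and T (terminals) can be read off the rules:
nonterminals = set(pcfg["rules"])
terminals = {sym for rhss in pcfg["rules"].values()
             for rhs, _ in rhss for sym in rhs if sym not in pcfg["rules"]}
```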

  2. PCFGs
▪ Associate probabilities with the rules
▪ Now we can score a tree as the product of the probabilities of the rules used to build it
▪ [Worked example on the slide: parse trees built from constituents such as (NP a girl), (VP ate a sandwich), (VP saw a girl) (PP with …), (D a) (N sandwich), with each rule application annotated with its probability (1.0, 0.7, 0.4, 0.2, …)]

  3.–10. PCFGs
▪ [Figure sequence on the slides: the example trees are built up step by step, each rule application annotated with its probability; the score of a tree is the product of these probabilities]
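As a concrete version of scoring a tree as a product of rule probabilities, here is a small Python sketch. The tree encoding, the rule_probs table, and the specific numbers are my own illustrative choices (loosely echoing the probability annotations on the slides), not the lecture's code.

```python
def tree_prob(tree, rule_probs):
    """Probability of a parse tree = product of the probabilities of the rules it uses.
    Trees are nested tuples (label, child1, ...); leaves are words (strings)."""
    if isinstance(tree, str):          # a word carries no rule of its own
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs[(label, rhs)]       # probability of the rule used at this node
    for child in children:
        p *= tree_prob(child, rule_probs)
    return p

# Hypothetical numbers, not the ones from the slides
rule_probs = {("S", ("NP", "VP")): 1.0, ("NP", ("D", "N")): 0.2,
              ("VP", ("V", "NP")): 0.4, ("D", ("a",)): 0.3,
              ("N", ("girl",)): 0.1, ("N", ("sandwich",)): 0.1, ("V", ("ate",)): 0.5}
tree = ("S", ("NP", ("D", "a"), ("N", "girl")),
             ("VP", ("V", "ate"), ("NP", ("D", "a"), ("N", "sandwich"))))
print(tree_prob(tree, rule_probs))   # ≈ 7.2e-06, the product of the nine rule probabilities used
```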

  11. PCFG Estimation

  12. ML estimation
▪ A treebank: a collection of sentences annotated with constituent trees
▪ The maximum likelihood estimate of a rule probability is a ratio of counts:
  P(X → Y1 … Yk) = count(X → Y1 … Yk) / count(X)
  i.e., the number of times the rule is used in the corpus divided by the number of times the non-terminal X appears in the treebank
▪ Smoothing is helpful
  ▪ Especially important for preterminal rules
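A minimal sketch of this counting procedure, assuming trees are encoded as nested tuples (label, child1, …) with words as leaves; the encoding and the one-tree toy treebank are my own illustration, and no smoothing is applied.

```python
from collections import Counter

def estimate_pcfg(treebank):
    """Maximum likelihood estimates: P(rule) = count(rule) / count(lhs)."""
    rule_counts, lhs_counts = Counter(), Counter()

    def collect(node):
        if isinstance(node, str):      # a word: no rule to count
            return
        label, *children = node
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for c in children:
            collect(c)

    for tree in treebank:
        collect(tree)
    return {rule: count / lhs_counts[rule[0]] for rule, count in rule_counts.items()}

# Tiny hypothetical treebank with a single tree for "a girl ate a sandwich"
tree = ("S", ("NP", ("D", "a"), ("N", "girl")),
             ("VP", ("V", "ate"), ("NP", ("D", "a"), ("N", "sandwich"))))
print(estimate_pcfg([tree])[("NP", ("D", "N"))])   # 1.0 (both NP nodes use NP -> D N)
```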

  13. Distribution over trees
▪ We defined a distribution over production rules for each non-terminal
▪ Our goal was to define a distribution over parse trees
▪ Unfortunately, not all PCFGs give rise to a proper distribution over trees, i.e. the sum of the probabilities of all trees the grammar can generate may be less than 1: Σ_T P(T) < 1
▪ Good news: any PCFG estimated with the maximum likelihood procedure is always proper (Chi and Geman, 1998)

  14. Penn Treebank: peculiarities
▪ Wall Street Journal: around 40,000 annotated sentences, about 1,000,000 words
▪ Fine-grained part-of-speech tags (45), e.g., for verbs:
  ▪ VBD: Verb, past tense
  ▪ VBG: Verb, gerund or present participle
  ▪ VBP: Verb, present tense, non-3rd person singular
  ▪ VBZ: Verb, present tense, 3rd person singular
  ▪ MD: Modal
▪ Flat NPs (no attempt to disambiguate NP attachment)

  15. CKY Parsing

  16. Parsing
▪ Parsing is search through the space of all possible parses
  ▪ e.g., we may want any parse, all parses, or (for a PCFG) the highest-scoring parse: argmax_{T ∈ G(x)} P(T)
▪ Bottom-up:
  ▪ Start from the words and attempt to construct the full tree
▪ Top-down:
  ▪ Start from the start symbol and attempt to expand it to derive the sentence

  17. CKY algorithm (aka CYK)
▪ Cocke-Kasami-Younger algorithm
  ▪ Independently discovered in the late 60s / early 70s
▪ An efficient bottom-up parsing algorithm for (P)CFGs
  ▪ Can be used for both the recognition and the parsing problems
▪ Very important in NLP (and beyond)
▪ We will start with the non-probabilistic version

  18. Constraints on the grammar
▪ The basic CKY algorithm supports only rules in Chomsky Normal Form (CNF):
  ▪ Unary preterminal rules X → w (generation of words given PoS tags)
  ▪ Binary inner rules X → Y Z
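As a small illustration of this constraint, here is a check that every rule has one of the two allowed shapes. The rule encoding ((lhs, rhs-tuple) pairs) is my own assumption, not from the slides.

```python
def is_cnf(rules, nonterminals):
    """True iff every rule is a unary preterminal rule X -> w or a binary inner rule X -> Y Z."""
    for lhs, rhs in rules:
        preterminal = len(rhs) == 1 and rhs[0] not in nonterminals           # X -> w
        binary_inner = len(rhs) == 2 and all(s in nonterminals for s in rhs)  # X -> Y Z
        if not (preterminal or binary_inner):
            return False
    return True
```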

  19. Constraints on the grammar
▪ The basic CKY algorithm supports only rules in Chomsky Normal Form (CNF)
▪ Any CFG can be converted to an equivalent CNF grammar
  ▪ Equivalent means that they define the same language
  ▪ However, the (syntactic) trees will look different
  ▪ This can be addressed by choosing transformations that allow an easy reverse transformation

  20. Transformation to CNF form
▪ What one needs to do to convert a grammar to CNF:
  ▪ Get rid of unary rules: not a problem, as our CKY algorithm will support unary rules
  ▪ Get rid of n-ary rules: crucial to handle these, as binarization is required for efficient parsing

  21.–23. Transformation to CNF form: binarization
▪ Consider an n-ary rule (example rule shown on the slides)
▪ How do we get a set of binary rules which are equivalent?
▪ Introduce new intermediate non-terminals, with a systematic way of naming them; see the sketch below
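A minimal sketch of such a binarization, assuming rules are (lhs, rhs-tuple) pairs; the naming scheme for the new non-terminals (lhs|<remaining symbols>) and the example rule NP → DT JJ NN NN are my own illustrative choices, not necessarily the lecture's notation.

```python
def binarize_rule(lhs, rhs):
    """Replace an n-ary rule X -> Y1 Y2 ... Yk (k > 2) with an equivalent chain of
    binary rules, introducing one new intermediate non-terminal per remaining suffix."""
    rules, prev = [], lhs
    for i in range(len(rhs) - 2):
        new_sym = f"{lhs}|<{'-'.join(rhs[i + 1:])}>"   # hypothetical naming scheme
        rules.append((prev, (rhs[i], new_sym)))
        prev = new_sym
    rules.append((prev, (rhs[-2], rhs[-1])))
    return rules

for r in binarize_rule("NP", ("DT", "JJ", "NN", "NN")):
    print(r)
# ('NP', ('DT', 'NP|<JJ-NN-NN>'))
# ('NP|<JJ-NN-NN>', ('JJ', 'NP|<NN-NN>'))
# ('NP|<NN-NN>', ('NN', 'NN'))
```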

  24. Transformation to CNF form: binarization
▪ Instead of binarizing the rules, we can binarize the trees during preprocessing
  ▪ Also known as lossless Markovization in the context of PCFGs
  ▪ Can be easily reversed in postprocessing
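A sketch of binarizing trees in preprocessing, under the same assumptions as above (nested-tuple trees, my own label format for the intermediate nodes). Because the new labels record the remaining children, the transform can be undone in postprocessing.

```python
def binarize_tree(tree):
    """Introduce intermediate nodes so that every node has at most two children."""
    if isinstance(tree, str) or len(tree) <= 3:   # a word, or a unary/binary node
        return tree if isinstance(tree, str) else (tree[0], *map(binarize_tree, tree[1:]))
    label, first, *rest = tree
    rest_labels = "-".join(c[0] for c in rest)    # hypothetical label format
    return (label, binarize_tree(first),
            binarize_tree((f"{label}|<{rest_labels}>", *rest)))

print(binarize_tree(("NP", ("DT", "a"), ("JJ", "big"), ("NN", "sandwich"))))
# ('NP', ('DT', 'a'), ('NP|<JJ-NN>', ('JJ', 'big'), ('NN', 'sandwich')))
```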

  25.–26. CKY: Parsing task
▪ We are given
  ▪ a grammar <N, T, S, R>
  ▪ a sequence of words w
▪ Our goal is to produce a parse tree for w
▪ We need an easy way to refer to substrings of w:
  ▪ indices refer to fenceposts (positions between words)
  ▪ span (i, j) refers to the words between fenceposts i and j
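A tiny illustration of the fencepost convention in Python; the sentence is a toy example of my own.

```python
# fenceposts:  0     1      2     3         4
# words:         the   dog   ate   quickly
words = ["the", "dog", "ate", "quickly"]

def span(words, i, j):
    """Return the words between fenceposts i and j (0 <= i < j <= len(words))."""
    return words[i:j]

assert span(words, 0, 2) == ["the", "dog"]        # span (0, 2) covers the first two words
assert span(words, 2, 4) == ["ate", "quickly"]
```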

  27.–29. Parsing one word
▪ [Figures on the slides: the chart cells for single-word spans are filled using the preterminal rules]

  30.–32. Parsing longer spans
▪ Check through all C1, C2, mid
▪ [Figures on the slides: the analyses of two adjacent sub-spans, split at mid, are combined using a binary rule]
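Putting the base case (one-word spans) and the longer-span loop together, here is a minimal CKY recognizer sketch in Python, assuming unary preterminal rules and binary inner rules as above. The grammar and sentence are toy examples of my own, not the ones on the slides.

```python
from collections import defaultdict

def cky_recognize(words, preterminal_rules, binary_rules, start="S"):
    """Return True iff the start symbol can derive the whole word sequence.

    preterminal_rules: dict word -> set of PoS tags        (X -> w)
    binary_rules:      dict (Y, Z) -> set of parents X     (X -> Y Z)
    chart[(i, j)] holds the non-terminals deriving span (i, j); i, j are fenceposts.
    """
    n = len(words)
    chart = defaultdict(set)

    # Base case: spans of length 1, filled with preterminal rules
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= preterminal_rules.get(w, set())

    # Longer spans: try every split point mid and every pair (C1, C2)
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for mid in range(i + 1, j):
                for c1 in chart[(i, mid)]:
                    for c2 in chart[(mid, j)]:
                        chart[(i, j)] |= binary_rules.get((c1, c2), set())

    return start in chart[(0, n)]

# Toy grammar (hypothetical, for illustration only)
preterm = {"girl": {"N"}, "sandwich": {"N"}, "a": {"D"}, "ate": {"V"}}
binary = {("D", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cky_recognize(["a", "girl", "ate", "a", "sandwich"], preterm, binary))  # True
```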

  33.–55. CKY in action
▪ [Worked example on the slides: the chart (aka parsing triangle) is filled cell by cell]
▪ Single-word spans are filled using the preterminal rules; longer spans are filled using the (binary) inner rules
▪ After filling a cell, check whether any unary rules apply (in this example there are no unary rules)
▪ For longer spans, iterate over the split point mid (mid = 1, mid = 2, …)
