Probabilistic Context-Free Grammars
▪ A context-free grammar is a tuple ⟨N, T, S, R⟩
▪ N: the set of non-terminals
  ▪ Phrasal categories: S, NP, VP, ADJP, etc.
  ▪ Parts of speech (pre-terminals): NN, JJ, DT, VB
▪ T: the set of terminals (the words)
▪ S: the start symbol
  ▪ Often written as ROOT or TOP
  ▪ Not usually the sentence non-terminal S
▪ R: the set of rules
  ▪ Of the form X → Y₁ Y₂ … Yₖ, with X ∈ N and each Yᵢ ∈ N ∪ T
  ▪ Examples: S → NP VP, VP → VP CC VP
  ▪ Also called rewrites, productions, or local trees
▪ A PCFG adds:
  ▪ A top-down production probability per rule, P(Y₁ Y₂ … Yₖ | X)
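For concreteness, here is a minimal sketch (not from the slides) of how such a grammar might be represented in Python; the symbols and probabilities are made up for illustration:

```python
# A toy PCFG as plain Python data: each non-terminal maps to a list of
# (right-hand side, probability) pairs. Upper-case strings are
# non-terminals; lower-case words are terminals.
PCFG = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 0.8), (("NP", "PP"), 0.2)],
    "VP": [(("VB", "NP"), 0.7), (("VP", "PP"), 0.3)],
    "PP": [(("P", "NP"), 1.0)],
    # Preterminal rules rewrite a part-of-speech tag to a word.
    "DT": [(("a",), 1.0)],
    "NN": [(("girl",), 0.5), (("sandwich",), 0.5)],
    "VB": [(("ate",), 1.0)],
    "P":  [(("with",), 1.0)],
}

# The probabilities of all rules with the same left-hand side X sum to 1,
# so each entry defines P(Y1 ... Yk | X).
assert all(abs(sum(p for _, p in rules) - 1.0) < 1e-9
           for rules in PCFG.values())
```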
PCFGs
▪ Associate probabilities with the rules: now we can score a tree as the product of the probabilities of the rules used in its derivation.
[Figure: example parse trees (e.g., "A girl ate a sandwich", "saw a girl with a sandwich") with a probability attached to each rule; the tree score is the product of these rule probabilities.]
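Scoring a tree is a one-pass recursion over its nodes. A sketch, reusing the toy PCFG dictionary above; encoding trees as nested tuples is an assumption for illustration, not the slides' notation:

```python
# Score a parse tree as the product of its rule probabilities.
# A tree is (label, child, ...), where each child is a sub-tree or a word.
def tree_prob(tree, grammar):
    label, *children = tree
    # Recover this node's right-hand side: child labels, or the word itself.
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    p = dict(grammar[label])[rhs]          # P(rhs | label)
    for c in children:
        if isinstance(c, tuple):           # recurse into sub-trees
            p *= tree_prob(c, grammar)
    return p

tree = ("S", ("NP", ("DT", "a"), ("NN", "girl")),
             ("VP", ("VB", "ate"),
                    ("NP", ("DT", "a"), ("NN", "sandwich"))))
print(tree_prob(tree, PCFG))  # 1.0 * 0.8*0.5 * 0.7 * 0.8*0.5 = 0.112
```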
PCFG Estimation
ML estimation
▪ A treebank: a collection of sentences annotated with constituent trees
▪ The maximum likelihood estimate of a rule probability:
  P(X → Y₁ … Yₖ) = Count(X → Y₁ … Yₖ) / Count(X)
  i.e., the number of times the rule is used in the corpus, divided by the number of times the non-terminal X appears in the treebank
▪ Smoothing is helpful
  ▪ Especially important for preterminal rules
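A sketch of the counting this formula implies, under the same nested-tuple tree encoding assumed earlier:

```python
# Maximum-likelihood estimation: count each rule, then divide by the
# total count of its left-hand side non-terminal.
from collections import Counter

def ml_estimate(trees):
    rule_counts, lhs_counts = Counter(), Counter()
    def collect(tree):
        label, *children = tree
        rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for c in children:
            if isinstance(c, tuple):
                collect(c)
    for t in trees:
        collect(t)
    # P(X -> rhs) = Count(X -> rhs) / Count(X)
    return {(x, rhs): n / lhs_counts[x]
            for (x, rhs), n in rule_counts.items()}
```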
Distribution over trees
▪ We defined a distribution over production rules for each non-terminal
▪ Our goal was to define a distribution over parse trees
▪ Unfortunately, not all PCFGs give rise to a proper distribution over trees, i.e., the probabilities of all trees the grammar can generate may sum to less than 1
▪ Good news: any PCFG estimated with the maximum likelihood procedure is always proper (Chi and Geman, 1998)
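For intuition, a standard example of an improper PCFG (not from the slides): take the rules S → S S with probability p and S → a with probability 1 − p. The total probability x of all finite trees satisfies x = (1 − p) + p·x², whose smallest non-negative solution is x = (1 − p)/p whenever p > 1/2, so the trees sum to less than 1; the missing mass escapes into infinite derivations.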
Penn Treebank: peculiarities
▪ Wall Street Journal: around 40,000 annotated sentences, 1,000,000 words
▪ Fine-grained part-of-speech tags (45), e.g., for verbs:
  VBD: verb, past tense
  VBG: verb, gerund or present participle
  VBP: verb, present, non-3rd person singular
  VBZ: verb, present, 3rd person singular
  MD: modal
▪ Flat NPs (no attempt to disambiguate NP attachment)
CKY Parsing
Parsing
▪ Parsing is search through the space of all possible parses
▪ e.g., we may want any parse, all parses, or (with a PCFG) the highest-scoring parse:
  arg max_{T ∈ G(x)} P(T)
▪ Bottom-up:
  ▪ Start from the words and attempt to construct the full tree
▪ Top-down:
  ▪ Start from the start symbol and attempt to expand it to derive the sentence
CKY algorithm (aka CYK)
▪ Cocke-Kasami-Younger algorithm
▪ Independently discovered in the late 1960s / early 1970s
▪ An efficient bottom-up parsing algorithm for (P)CFGs
  ▪ Can be used both for the recognition and the parsing problem
▪ Very important in NLP (and beyond)
▪ We will start with the non-probabilistic version
Constraints on the grammar
▪ The basic CKY algorithm supports only rules in Chomsky Normal Form (CNF):
  ▪ Unary preterminal rules X → w (generation of words given PoS tags)
  ▪ Binary inner rules X → Y Z, with X, Y, Z ∈ N
Constraints on the grammar
▪ The basic CKY algorithm supports only rules in Chomsky Normal Form (CNF)
▪ Any CFG can be converted to an equivalent CNF grammar
  ▪ Equivalent means that they define the same language
  ▪ However, the (syntactic) trees will look different
  ▪ This can be addressed by choosing transformations that allow an easy reverse transformation
Transformation to CNF form
▪ What one needs to do to convert a grammar to CNF:
  ▪ Get rid of unary rules: not a problem, as our CKY algorithm will support unary rules
  ▪ Get rid of n-ary rules: crucial to process these, as binarization is required for efficient parsing
Transformation to CNF form: binarization
▪ Consider an n-ary rule X → Y₁ Y₂ … Yₙ with n > 2
▪ How do we get an equivalent set of binary rules?
▪ Introduce new intermediate non-terminals that consume one symbol at a time:
  X → Y₁ @X[Y₁],  @X[Y₁] → Y₂ @X[Y₁Y₂],  …,  @X[Y₁…Yₙ₋₂] → Yₙ₋₁ Yₙ
▪ Naming each new non-terminal after the rule and the prefix already generated is a more systematic way to refer to the new non-terminals
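A sketch of this transformation in code; the @-based naming of intermediate symbols follows one common convention and is not prescribed by the slides:

```python
# Binarize an n-ary rule X -> Y1 Y2 ... Yn by introducing intermediate
# symbols that record the prefix of the rule generated so far.
def binarize_rule(lhs, rhs):
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules, current = [], lhs
    for i, sym in enumerate(rhs[:-2]):
        # e.g. "@NP->DT_JJ" systematically names the new non-terminal
        new = "@{}->{}".format(lhs, "_".join(rhs[: i + 1]))
        rules.append((current, (sym, new)))
        current = new
    rules.append((current, (rhs[-2], rhs[-1])))
    return rules

print(binarize_rule("NP", ["DT", "JJ", "JJ", "NN"]))
# [('NP', ('DT', '@NP->DT')), ('@NP->DT', ('JJ', '@NP->DT_JJ')),
#  ('@NP->DT_JJ', ('JJ', 'NN'))]
```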
Transformation to CNF form: binarization
▪ Instead of binarizing rules, we can binarize the trees themselves during preprocessing
  ▪ Also known as lossless Markovization in the context of PCFGs
  ▪ Can be easily reversed in postprocessing
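A sketch of tree binarization and its reversal, again assuming nested-tuple trees; the "@" prefix marks intermediate nodes so that postprocessing can splice them back out:

```python
# Lossless binarization: fold n-ary children into a right-branching
# chain of "@"-labelled nodes; reversible by construction.
def binarize_tree(tree):
    if not isinstance(tree, tuple):
        return tree
    label, *children = tree
    children = [binarize_tree(c) for c in children]
    while len(children) > 2:
        # Fold the two rightmost children under a new "@"-node.
        right = ("@" + label, children[-2], children[-1])
        children = children[:-2] + [right]
    return (label, *children)

def unbinarize_tree(tree):
    if not isinstance(tree, tuple):
        return tree
    label, *children = tree
    out = []
    for c in map(unbinarize_tree, children):
        # Splice "@"-nodes back into their parent in postprocessing.
        if isinstance(c, tuple) and c[0].startswith("@"):
            out.extend(c[1:])
        else:
            out.append(c)
    return (label, *out)
```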
CKY: Parsing task
▪ We are given
  ▪ a grammar ⟨N, T, S, R⟩
  ▪ a sequence of words w = w₁ w₂ … wₙ
▪ Our goal is to produce a parse tree for w
▪ We need an easy way to refer to substrings of w:
  ▪ indices refer to fenceposts between the words
  ▪ span (i, j) refers to the words between fenceposts i and j
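In code, fencepost spans map directly onto slicing; a tiny illustration:

```python
# With n words there are n+1 fenceposts 0..n, and span (i, j) is
# exactly the Python slice words[i:j].
words = ["a", "girl", "ate", "a", "sandwich"]
print(words[1:3])  # span (1, 3) -> ['girl', 'ate']
```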
Parsing one word
▪ For each position i and each preterminal rule X → wᵢ, add the non-terminal X to the chart cell for span (i − 1, i)
Parsing longer spans
▪ For a span (i, j), check through all choices of C₁, C₂, and mid:
  for every split point mid (i < mid < j) and every binary rule X → C₁ C₂, if C₁ is in cell (i, mid) and C₂ is in cell (mid, j), add X to cell (i, j)
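Putting the two steps together gives the recognizer. A sketch assuming a CNF grammar in the dictionary format used earlier (probabilities are carried along but ignored, since this is the non-probabilistic version):

```python
# CKY recognition for a CNF grammar. chart[(i, j)] holds every
# non-terminal that can derive the span (i, j); the sentence is
# grammatical iff the start symbol ends up in chart[(0, n)].
def cky_recognize(words, grammar, start="S"):
    n = len(words)
    chart = {}
    # Preterminal rules fill the length-1 spans (the chart diagonal).
    for i, w in enumerate(words):
        chart[(i, i + 1)] = {x for x, rules in grammar.items()
                             for rhs, _ in rules if rhs == (w,)}
    # Inner rules fill longer spans, shortest spans first.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for mid in range(i + 1, j):            # all split points
                for x, rules in grammar.items():
                    for rhs, _ in rules:
                        if len(rhs) == 2 and rhs[0] in chart[(i, mid)] \
                                         and rhs[1] in chart[(mid, j)]:
                            cell.add(x)
            chart[(i, j)] = cell
    return start in chart[(0, n)]
```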
CKY in action
[Figure sequence: the chart (aka parsing triangle) is filled cell by cell. Preterminal rules fill the length-1 cells on the diagonal; inner rules fill longer spans by trying every split point (mid = 1, mid = 2, …). After filling each cell, check whether any unary rules apply (no unary rules in this example).]
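A quick check of the recognizer sketched above on a toy sentence (the grammar and sentence are assumptions for illustration, not read off the chart figures):

```python
sentence = "a girl ate a sandwich".split()
print(cky_recognize(sentence, PCFG))  # True: S covers span (0, 5)
```

Filling each of the O(n²) cells inspects O(n) split points and all rules, so CKY runs in O(n³ · |R|) time.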