Natural Language Processing: Syntax Parsing I
Dan Klein – UC Berkeley

Parse Trees
- Example parse tree for: "The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market"

Phrase Structure Parsing
- Phrase structure parsing organizes syntax into constituents or brackets
- In general, this involves nested trees (with labels like S, NP, VP, PP, N')
- Linguists can, and do, argue about details
- Lots of ambiguity
- Not the only kind of syntax…
- Example: "new art critics write reviews with computers"

Constituency Tests
- How do we know what nodes go in the tree?
- Classic constituency tests:
  - Substitution by proform
  - Question answers
  - Semantic grounds: coherence, reference, idioms
  - Dislocation
  - Conjunction
- Cross-linguistic arguments, too

Conflicting Tests
- Constituency isn't always clear
  - Units of transfer: think about ~ penser à, talk about ~ hablar de
  - Phonological reduction: I will go → I'll go; I want to go → I wanna go; a le centre → au centre
  - Coordination: He went to and came from the store.
- Example tree: "La vélocité des ondes sismiques"
Classical NLP: Parsing
- Write symbolic or logical rules:

    Grammar (CFG)          Lexicon
    ROOT → S               NN → interest
    S → NP VP              NNS → raises
    NP → DT NN             VBP → interest
    NP → NN NNS            VBZ → raises
    NP → NP PP             …
    VP → VBP NP
    VP → VBP NP PP
    PP → IN NP

- Use deduction systems to prove parses from words
  - Minimal grammar on the "Fed raises" sentence: 36 parses
  - Simple 10-rule grammar: 592 parses
  - Real-size grammar: many millions of parses
- This scaled very badly and didn't yield broad-coverage tools
  (A toy parse-counting sketch appears below, after the ambiguity examples.)

Ambiguities

Ambiguities: PP Attachment

Attachments
- I cleaned the dishes from dinner
- I cleaned the dishes with detergent
- I cleaned the dishes in my pajamas
- I cleaned the dishes in the sink

Syntactic Ambiguities I
- Prepositional phrases: They cooked the beans in the pot on the stove with handles.
- Particle vs. preposition: The puppy tore up the staircase.
- Complement structures: The tourists objected to the guide that they couldn't hear. / She knows you like the back of her hand.
- Gerund vs. participial adjective: Visiting relatives can be boring. / Changing schedules frequently confused passengers.

Syntactic Ambiguities II
- Modifier scope within NPs: impractical design requirements / plastic cup holder
- Multiple gap constructions: The chicken is ready to eat. / The contractors are rich enough to sue.
- Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.
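To make the grammar-and-lexicon picture concrete, here is a minimal Python sketch (not from the slides) that stores a toy binarized CFG and lexicon as dictionaries and counts the analyses of a short sentence with a CKY-style chart. The particular rules, tags, and the count_parses helper are illustrative assumptions; the 36/592 parse counts quoted above come from different, larger grammars.

    # A toy CFG in binary form, loosely following the slide's rules.
    GRAMMAR = {          # parent -> list of (left_child, right_child)
        "S":  [("NP", "VP")],
        "NP": [("DT", "NN"), ("NN", "NNS"), ("NP", "PP")],
        "VP": [("VBP", "NP"), ("VP", "PP")],
        "PP": [("IN", "NP")],
    }
    LEXICON = {          # word -> possible pre-terminal tags
        "the": {"DT"}, "fed": {"NN"},
        "raises": {"NNS", "VBZ", "VBP"},
        "interest": {"NN", "VBP"}, "rates": {"NNS", "VBZ"},
    }

    def count_parses(words, root="S"):
        """Count how many distinct trees the grammar admits over the sentence."""
        n = len(words)
        # chart[i][j][X] = number of ways to build X over words[i:j]
        chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            for tag in LEXICON.get(w, ()):
                chart[i][i + 1][tag] = 1
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for parent, expansions in GRAMMAR.items():
                        for left, right in expansions:
                            ways = chart[i][k].get(left, 0) * chart[k][j].get(right, 0)
                            if ways:
                                chart[i][j][parent] = chart[i][j].get(parent, 0) + ways
        return chart[0][n].get(root, 0)

    print(count_parses("the fed raises interest rates".split()))  # 1 for this tiny grammar

With a richer lexicon and more rules (especially PPs), the same counter quickly reports the kind of parse explosions described above.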
Dark Ambiguities
- Dark ambiguities: most analyses are shockingly bad (meaning, they don't have an interpretation you can get your mind around)
- Example: an analysis whose structure corresponds to the correct parse of "This will panic buyers!"
- Unknown words and new usages
- Solution: we need mechanisms to focus attention on the best analyses; probabilistic techniques do this

PCFGs

Treebank Sentences
- [Figure: example sentences with their Penn Treebank bracketings]

Probabilistic Context-Free Grammars
- A context-free grammar is a tuple <N, T, S, R>
  - N: the set of non-terminals
    - Phrasal categories: S, NP, VP, ADJP, etc.
    - Parts-of-speech (pre-terminals): NN, JJ, DT, VB
  - T: the set of terminals (the words)
  - S: the start symbol
    - Often written as ROOT or TOP
    - Not usually the sentence non-terminal S
  - R: the set of rules
    - Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
    - Examples: S → NP VP, VP → VP CC VP
    - Also called rewrites, productions, or local trees
- A PCFG adds:
  - A top-down production probability per rule, P(Y1 Y2 … Yk | X)

Treebank Grammars
- Need a PCFG for broad-coverage parsing.
- Can take a grammar right off the trees (doesn't work well):

    ROOT → S          1
    S → NP VP .       1
    NP → PRP          1
    VP → VBD ADJP     1
    …

- Better results by enriching the grammar (e.g., lexicalization).
- Can also get reasonable parsers without lexicalization.
  (A small sketch of reading rule probabilities off trees appears below.)

Treebank Grammar Scale
- Treebank grammars can be enormous
  - As FSAs, the raw grammar has ~10K states, excluding the lexicon
  - Better parsers usually make the grammars larger, not smaller
- [Figure: fragment of the grammar as an FSA, with states/arcs labeled DET, ADJ, NOUN, PLURAL NOUN, NP, PP, CONJ]
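As a rough illustration of "taking a grammar right off the trees", here is a small sketch (assumed, not the course's code) that estimates PCFG rule probabilities by relative frequency from a toy list of trees. The nested-tuple tree encoding and the helper names are invented for the example.

    from collections import defaultdict

    # Trees are nested tuples: (label, child, child, ...); a leaf is a plain string.
    # This is just an illustrative encoding, not the Penn Treebank format.
    toy_treebank = [
        ("S", ("NP", ("PRP", "they")),
              ("VP", ("VBP", "write"), ("NP", ("NNS", "reviews")))),
        ("S", ("NP", ("NNS", "critics")),
              ("VP", ("VBP", "write"), ("NP", ("NNS", "reviews")))),
    ]

    def count_rules(tree, rule_counts, lhs_counts):
        """Count one rule X -> Y1 ... Yk per internal node, recursing into children."""
        label, children = tree[0], tree[1:]
        if len(children) == 1 and isinstance(children[0], str):
            rhs = (children[0],)                      # pre-terminal -> word
        else:
            rhs = tuple(child[0] for child in children)
            for child in children:
                count_rules(child, rule_counts, lhs_counts)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1

    rule_counts, lhs_counts = defaultdict(int), defaultdict(int)
    for tree in toy_treebank:
        count_rules(tree, rule_counts, lhs_counts)

    # Maximum-likelihood PCFG: P(X -> rhs) = count(X -> rhs) / count(X)
    pcfg = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
    for (lhs, rhs), p in sorted(pcfg.items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")

On real treebank trees this relative-frequency estimate is exactly the "raw" grammar whose weaknesses and size are discussed above.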
Chomsky Normal Form
- Chomsky normal form: all rules of the form X → Y Z or X → w
- In principle, this is no limitation on the space of (P)CFGs
  - N-ary rules introduce new non-terminals:
      VP → VBD NP PP PP
    becomes
      VP → [VP → VBD NP PP] PP
      [VP → VBD NP PP] → [VP → VBD NP] PP
      [VP → VBD NP] → VBD NP
  - Unaries / empties are "promoted"
- In practice it's kind of a pain:
  - Reconstructing n-aries is easy
  - Reconstructing unaries is trickier
  - The straightforward transformations don't preserve tree scores
- Makes parsing algorithms simpler!

CKY Parsing

A Recursive Parser

    bestScore(X,i,j,s)
      if (j = i+1)
        return tagScore(X,s[i])
      else
        return max score(X->YZ) *
                   bestScore(Y,i,k) *
                   bestScore(Z,k,j)

- Will this parser work? Why or why not?
- Memory requirements?

A Memoized Parser
- One small change:

    bestScore(X,i,j,s)
      if (scores[X][i][j] == null)
        if (j = i+1)
          score = tagScore(X,s[i])
        else
          score = max score(X->YZ) *
                      bestScore(Y,i,k) *
                      bestScore(Z,k,j)
        scores[X][i][j] = score
      return scores[X][i][j]

A Bottom-Up Parser (CKY)
- Can also organize things bottom-up
- [Diagram: X over span [i,j] built from Y over [i,k] and Z over [k,j]]

    bestScore(s)
      for (i : [0,n-1])
        for (X : tags[s[i]])
          score[X][i][i+1] = tagScore(X,s[i])
      for (diff : [2,n])
        for (i : [0,n-diff])
          j = i + diff
          for (X->YZ : rules)
            for (k : [i+1, j-1])
              score[X][i][j] = max(score[X][i][j],
                                   score(X->YZ) * score[Y][i][k] * score[Z][k][j])

  (A runnable Python version of this loop appears at the end of this page.)

Unary Rules
- Unary rules?

    bestScore(X,i,j,s)
      if (j = i+1)
        return tagScore(X,s[i])
      else
        return max( max score(X->YZ) *
                        bestScore(Y,i,k) *
                        bestScore(Z,k,j),
                    max score(X->Y) *
                        bestScore(Y,i,j) )
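Here is a minimal runnable Python version of the bottom-up CKY recurrence above (Viterbi scores with back pointers). The grammar and lexicon dictionary formats are assumptions of this sketch, not the course's reference implementation, and it handles binary rules only, so unaries still need the treatment discussed next.

    from collections import defaultdict

    def cky_best_scores(words, lexicon, binary_rules):
        """Viterbi CKY over a binarized PCFG.

        lexicon:      dict  word -> {tag: P(word | tag)}
        binary_rules: dict  X -> list of (Y, Z, P(X -> Y Z))
        Returns score[(i, j)][X] = probability of the best X over words[i:j],
        and back pointers for reconstructing the best tree.
        """
        n = len(words)
        score = defaultdict(dict)   # (i, j) -> {symbol: best probability}
        back = {}                   # (i, j, X) -> (k, Y, Z)

        # Length-1 spans: tag scores straight from the lexicon.
        for i, w in enumerate(words):
            for tag, p in lexicon.get(w, {}).items():
                score[(i, i + 1)][tag] = p

        # Longer spans, smallest first; try every split point and binary rule.
        for length in range(2, n + 1):
            for i in range(0, n - length + 1):
                j = i + length
                for k in range(i + 1, j):
                    for X, expansions in binary_rules.items():
                        for Y, Z, p in expansions:
                            if Y in score[(i, k)] and Z in score[(k, j)]:
                                s = p * score[(i, k)][Y] * score[(k, j)][Z]
                                if s > score[(i, j)].get(X, 0.0):
                                    score[(i, j)][X] = s
                                    back[(i, j, X)] = (k, Y, Z)
        return score, back

After it runs, score[(0, len(words))].get("S", 0.0) (or whatever the start symbol is) is the Viterbi probability of the whole sentence, and following the back pointers downward recovers the best binarized tree.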
CNF + Unary Closure
- We need unaries to be non-cyclic
- Can address by pre-calculating the unary closure
- Rather than having zero or more unaries, always have exactly one
- [Figure: a tree with unary chains (e.g., SBAR over S over VP) before and after collapsing them into single closed unary steps]
  (A sketch of the closure pre-computation appears at the end of this page.)

Alternating Layers

    bestScoreB(X,i,j,s)
      return max max score(X->YZ) *
                     bestScoreU(Y,i,k) *
                     bestScoreU(Z,k,j)

    bestScoreU(X,i,j,s)
      if (j = i+1)
        return tagScore(X,s[i])
      else
        return max max score(X->Y) *
                       bestScoreB(Y,i,j)

- Alternate unary and binary layers
- Reconstruct unary chains afterwards

Analysis

Memory
- How much memory does this require?
  - Have to store the score cache
  - Cache size: |symbols| * n^2 doubles
  - For the plain treebank grammar: X ~ 20K symbols, n = 40, double ~ 8 bytes → ~256MB
  - Big, but workable.

Pruning: Beams
- score[X][i][j] can get too large (when?)
- Can keep beams (truncated maps score[i][j]) which only store the best few scores for the span [i,j]

Pruning: Coarse-to-Fine
- Use a smaller grammar to rule out most X[i,j]
- Much more on this later…

Time: Theory
- How much time will it take to parse?
  - For each diff (≤ n)
    - For each i (≤ n)
      - For each rule X → Y Z
        - For each split point k: do constant work
- Total time: |rules| * n^3

Time: Practice
- Parsing with the vanilla treebank grammar:
  - ~20K rules (not an optimized parser!)
  - Observed exponent: 3.6
  - Something like 5 sec for an unoptimized parse of a 20-word sentence
- Why's it worse in practice?
  - Longer sentences "unlock" more of the grammar
  - All kinds of systems issues don't scale
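Here is one way the unary-closure pre-computation discussed at the top of this page could look in Python: a max-product relaxation over pairs of symbols, with a table of intermediate symbols so that the collapsed unary chains can be unpacked afterwards. The rule format and function name are assumptions of this sketch, not the reference implementation.

    def unary_closure(unary_rules, symbols):
        """Best-score closure over unary rules.

        unary_rules: dict (X, Y) -> P(X -> Y)
        Returns closure[(X, Y)] = best probability of rewriting X into Y through
        any chain of unary rules (the empty chain X =*=> X has probability 1),
        plus an intermediate-symbol table for reconstructing chains afterwards.
        """
        closure = {(x, x): 1.0 for x in symbols}       # empty chains
        for pair, p in unary_rules.items():            # single-step chains
            if p > closure.get(pair, 0.0):
                closure[pair] = p
        intermediate = {}

        # Relax until fixed point: X =*=> Z via Y whenever X =*=> Y and Y =*=> Z.
        changed = True
        while changed:
            changed = False
            for (x, y), p_xy in list(closure.items()):
                for (y2, z), p_yz in list(closure.items()):
                    if y != y2:
                        continue
                    p = p_xy * p_yz
                    # Small tolerance keeps floating-point cycles from looping forever.
                    if p > closure.get((x, z), 0.0) + 1e-12:
                        closure[(x, z)] = p
                        intermediate[(x, z)] = y
                        changed = True
        return closure, intermediate

With the closure in hand, the unary layer of the parser looks up closed unary scores instead of raw unary rules, which is exactly the "always exactly one unary" scheme above; the intermediate table lets you expand each closed step back into its original chain when reading off the tree.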
Same-Span Reachability
- [Figure: groups of categories that can be reached over the same span, starting from TOP; the categories shown include NP, S, SBAR, VP, PP, ADJP, ADVP, FRAG, INTJ, CONJP, PRN, QP, NAC, UCP, WHNP, SINV, SBARQ, WHADJP, WHPP, WHADVP, PRT, NX, X, RRC, LST, SQ]

Rule State Reachability
- Example: NP CC •  over fenceposts 0, n-1, n: 1 alignment
- Example: NP CC • NP  over fenceposts 0, n-k-1, n-k, n: n alignments
- Many states are more likely to match larger spans!

Efficient CKY
- Lots of tricks to make CKY efficient
- Some of them are little engineering details:
  - E.g., first choose k, then enumerate through the Y:[i,k] which are non-zero, then loop through rules by left child.
  - Optimal layout of the dynamic program depends on grammar, input, even system details.
- Another kind is more important (and interesting):
  - Many X:[i,j] can be suppressed on the basis of the input string
  - We'll see this next class as figures-of-merit, A* heuristics, coarse-to-fine, etc.

Agenda-Based Parsing

Agenda-Based Parsing
- Agenda-based parsing is like graph search (but over a hypergraph)
- Concepts:
  - Numbering: we number fenceposts between words, e.g. 0 critics 1 write 2 reviews 3 with 4 computers 5
  - "Edges" or items: spans with labels, e.g. PP[3,5], represent the sets of trees over those words rooted at that label (cf. search states)
  - A chart: records edges we've expanded (cf. closed set)
  - An agenda: a queue which holds edges (cf. a fringe or open set)

Word Items
- Building an item for the first time is called discovery. Items go into the agenda on discovery.
- To initialize, we discover all word items (with score 1.0) for "critics write reviews with computers":
  - AGENDA: critics[0,1], write[1,2], reviews[2,3], with[3,4], computers[4,5]
  - CHART: [empty]
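A tiny sketch of that initialization step: word items are "discovered" with score 1.0 and placed on the agenda, while the chart starts out empty. The Item tuple and the initialize helper are invented names for illustration, not the course code.

    from collections import deque, namedtuple

    # An item ("edge") is a labeled span over the fenceposts 0..n.
    Item = namedtuple("Item", ["label", "start", "end", "score"])

    def initialize(words):
        """Discover all word items with score 1.0 and put them on the agenda."""
        agenda = deque()
        chart = set()                      # items we have already expanded
        for i, w in enumerate(words):
            item = Item(w, i, i + 1, 1.0)  # e.g. critics[0,1]
            agenda.append(item)            # discovery: the item enters the agenda
        return agenda, chart

    agenda, chart = initialize("critics write reviews with computers".split())
    print(list(agenda))   # critics[0,1], write[1,2], ..., computers[4,5]
    print(chart)          # empty until items are popped and expanded

From here, the parse proceeds by popping items off the agenda, recording them in the chart, and combining them with chart entries to discover new labeled spans.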