
Parsing (Syntactic Structure)



  1. 6.864: Lecture 2, Fall 2007. Parsing and Syntax I

     Overview
     • An introduction to the parsing problem
     • Context free grammars
     • A brief(!) sketch of the syntax of English
     • Examples of ambiguous structures
     • PCFGs, their formal properties, and useful algorithms
     • Weaknesses of PCFGs

     Parsing (Syntactic Structure)
     INPUT: Boeing is located in Seattle.
     OUTPUT:
       [S [NP [N Boeing]]
          [VP [V is]
              [VP [V located]
                  [PP [P in]
                      [NP [N Seattle]]]]]]

     Syntactic Formalisms
     • Work in formal syntax goes back to Chomsky’s PhD thesis in the 1950s
     • Examples of current formalisms: minimalism, lexical functional grammar (LFG), head-driven phrase-structure grammar (HPSG), tree adjoining grammars (TAG), categorial grammars

  2. Data for Parsing Experiments
     • Penn WSJ Treebank = 50,000 sentences with associated trees
     • Usual set-up: 40,000 training sentences, 2400 test sentences

     An example tree:
     [Tree diagram, rooted at TOP, for the sentence below; its node labels include S, NP, VP, PP, ADVP, SBAR, WHADVP and QP.]
     Canadian Utilities had 1988 revenue of C$ 1.16 billion , mainly from its natural gas and electric utility businesses in Alberta , where the company serves about 800,000 customers .

     The Information Conveyed by Parse Trees

     1) Part of speech for each word (N = noun, V = verb, D = determiner)
       [S [NP [D the] [N burglar]]
          [VP [V robbed]
              [NP [D the] [N apartment]]]]

     2) Phrases
       [S [NP [DT the] [N burglar]]
          [VP [V robbed]
              [NP [DT the] [N apartment]]]]
     Noun Phrases (NP): “the burglar”, “the apartment”
     Verb Phrases (VP): “robbed the apartment”
     Sentences (S): “the burglar robbed the apartment”

     3) Useful Relationships
       [S [NP (subject)] [VP [V (verb)]]]
       [S [NP [DT the] [N burglar]]
          [VP [V robbed]
              [NP [DT the] [N apartment]]]]
     ⇒ “the burglar” is the subject of “robbed”

  3. An Example Application: Machine Translation
     • English word order is subject – verb – object
     • Japanese word order is subject – object – verb

     English:  IBM bought Lotus
     Japanese: IBM Lotus bought

     English:  Sources said that IBM bought Lotus yesterday
     Japanese: Sources yesterday IBM Lotus bought that said

     Syntax and Compositional Semantics
       [S: bought(IBM, Lotus)
          [NP: IBM  IBM]
          [VP: λy bought(y, Lotus)
              [V: λx,y bought(y, x)  bought]
              [NP: Lotus  Lotus]]]
     • Each syntactic non-terminal now has an associated semantic expression

     Context-Free Grammars
     [Hopcroft and Ullman 1979]
     A context free grammar G = (N, Σ, R, S) where:
     • N is a set of non-terminal symbols
     • Σ is a set of terminal symbols
     • R is a set of rules of the form X → Y₁ Y₂ … Yₙ for n ≥ 0, X ∈ N, Yᵢ ∈ (N ∪ Σ)
     • S ∈ N is a distinguished start symbol
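The four-tuple definition of a context free grammar maps directly onto a small data structure. This is a sketch only: the class name `CFG`, its field names, and the toy grammar for "IBM bought Lotus" are hypothetical, and the check that N and Σ are disjoint is a customary extra assumption that the definition above does not state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # N
    terminals: frozenset     # Σ
    rules: tuple             # R: each rule is (X, (Y1, ..., Yn)) with n >= 0
    start: str               # S

    def is_well_formed(self):
        """Check the side conditions: S ∈ N, X ∈ N, and every Yi ∈ (N ∪ Σ)."""
        if self.start not in self.nonterminals:
            return False
        # Extra assumption (not in the definition above): N and Σ do not overlap.
        if self.nonterminals & self.terminals:
            return False
        symbols = self.nonterminals | self.terminals
        return all(lhs in self.nonterminals and all(y in symbols for y in rhs)
                   for lhs, rhs in self.rules)

# Hypothetical toy grammar covering "IBM bought Lotus".
toy = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "V"}),
    terminals=frozenset({"IBM", "Lotus", "bought"}),
    rules=(
        ("S", ("NP", "VP")),
        ("VP", ("V", "NP")),
        ("NP", ("IBM",)),
        ("NP", ("Lotus",)),
        ("V", ("bought",)),
    ),
    start="S",
)

# The same grammar plus a rule that mentions PP, which is in neither N nor Σ.
broken = CFG(toy.nonterminals, toy.terminals,
             toy.rules + (("VP", ("VP", "PP")),), toy.start)

print(toy.is_well_formed())     # True
print(broken.is_well_formed())  # False
```

Rules are stored as (left-hand side, right-hand side tuple) pairs so that an empty right-hand side, the n = 0 case, is simply `()`.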

  4. A Context-Free Grammar for English
     N = {S, NP, VP, PP, DT, Vi, Vt, NN, IN}
     S = S
     Σ = {sleeps, saw, man, woman, telescope, the, with, in}
     R =
       S  ⇒ NP VP       Vi ⇒ sleeps
       VP ⇒ Vi          Vt ⇒ saw
       VP ⇒ Vt NP       NN ⇒ man
       VP ⇒ VP PP       NN ⇒ woman
       NP ⇒ DT NN       NN ⇒ telescope
       NP ⇒ NP PP       DT ⇒ the
       PP ⇒ IN NP       IN ⇒ with
                        IN ⇒ in
     Note: S=sentence, VP=verb phrase, NP=noun phrase, PP=prepositional phrase, DT=determiner, Vi=intransitive verb, Vt=transitive verb, NN=noun, IN=preposition

     Left-Most Derivations
     A left-most derivation is a sequence of strings s₁ … sₙ, where
     • s₁ = S, the start symbol
     • sₙ ∈ Σ*, i.e. sₙ is made up of terminal symbols only
     • Each sᵢ for i = 2 … n is derived from sᵢ₋₁ by picking the left-most non-terminal X in sᵢ₋₁ and replacing it by some β where X → β is a rule in R
     For example: [S], [NP VP], [D N VP], [the N VP], [the man VP], [the man Vi], [the man sleeps]
     Representation of a derivation as a tree:
       [S [NP [D the] [N man]]
          [VP [Vi sleeps]]]

     DERIVATION         RULES USED
     S                  S → NP VP
     NP VP              NP → DT N
     DT N VP            DT → the
     the N VP           N → dog
     the dog VP         VP → VB
     the dog VB         VB → laughs
     the dog laughs

       [S [NP [DT the] [N dog]]
          [VP [VB laughs]]]

     Properties of CFGs
     • A CFG defines a set of possible derivations
     • A string s ∈ Σ* is in the language defined by the CFG if there is at least one derivation which yields s
     • Each string in the language generated by the CFG may have more than one derivation (“ambiguity”)
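The definition of a left-most derivation can be traced step by step. The sketch below assumes the grammar for English given above, written with its own symbols DT and NN; the function name `leftmost_derivation` and the tuple encoding of rules are illustrative choices, not the lecture's.

```python
NONTERMINALS = {"S", "NP", "VP", "PP", "DT", "Vi", "Vt", "NN", "IN"}

# R from the grammar for English, as (X, (Y1, ..., Yn)) pairs.
RULES = {
    ("S", ("NP", "VP")), ("VP", ("Vi",)), ("VP", ("Vt", "NP")),
    ("VP", ("VP", "PP")), ("NP", ("DT", "NN")), ("NP", ("NP", "PP")),
    ("PP", ("IN", "NP")), ("Vi", ("sleeps",)), ("Vt", ("saw",)),
    ("NN", ("man",)), ("NN", ("woman",)), ("NN", ("telescope",)),
    ("DT", ("the",)), ("IN", ("with",)), ("IN", ("in",)),
}

def leftmost_derivation(start, rule_sequence):
    """Apply each rule to the left-most non-terminal; return the strings s1..sn."""
    current = [start]
    steps = [list(current)]
    for lhs, rhs in rule_sequence:
        if (lhs, tuple(rhs)) not in RULES:
            raise ValueError(f"{lhs} -> {' '.join(rhs)} is not a rule in R")
        # Locate the left-most non-terminal in the current string.
        idx = next((i for i, sym in enumerate(current) if sym in NONTERMINALS), None)
        if idx is None:
            raise ValueError("only terminal symbols remain")
        if current[idx] != lhs:
            raise ValueError(f"left-most non-terminal is {current[idx]}, not {lhs}")
        current = current[:idx] + list(rhs) + current[idx + 1:]
        steps.append(list(current))
    return steps

rules_used = [
    ("S", ("NP", "VP")),
    ("NP", ("DT", "NN")),
    ("DT", ("the",)),
    ("NN", ("man",)),
    ("VP", ("Vi",)),
    ("Vi", ("sleeps",)),
]

for s in leftmost_derivation("S", rules_used):
    print("[" + " ".join(s) + "]")
# Last line printed: [the man sleeps]
```

Supplying the rules in any order that does not rewrite the left-most non-terminal raises an error, which is exactly the constraint that separates a left-most derivation from an arbitrary one.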

  5. DERIVATION                             RULES USED
     S                                      S → NP VP
     NP VP                                  NP → he
     he VP                                  VP → VP PP
     he VP PP                               VP → VB PP
     he VB PP PP                            VB → drove
     he drove PP PP                         PP → down the street
     he drove down the street PP            PP → in the car
     he drove down the street in the car

       [S [NP he]
          [VP [VP [VB drove]
                  [PP down the street]]
              [PP in the car]]]

     DERIVATION                             RULES USED
     S                                      S → NP VP
     NP VP                                  NP → he
     he VP                                  VP → VB PP
     he VB PP                               VB → drove
     he drove PP                            PP → down NP
     he drove down NP                       NP → NP PP
     he drove down NP PP                    NP → the street
     he drove down the street PP            PP → in the car
     he drove down the street in the car

       [S [NP he]
          [VP [VB drove]
              [PP down
                  [NP [NP the street]
                      [PP in the car]]]]]

     The Problem with Parsing: Ambiguity
     INPUT:
     She announced a program to promote safety in trucks and vans
     ⇓
     POSSIBLE OUTPUTS:
     [Six alternative parse trees for the input sentence.]
     And there are more...
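The two derivations of "he drove down the street in the car" can be counted automatically. The sketch below is an assumption-laden illustration: it merges the two rule sets above into one hypothetical grammar (PP → down NP, PP → in NP, NP → the street, NP → the car) so that each attachment is counted once, and it uses a simple memoised span counter rather than any particular algorithm from the lecture. It relies on the grammar having no empty rules and no unit rules between non-terminals.

```python
from functools import lru_cache

# Hypothetical merged grammar; any symbol without an entry is treated as a word.
RULES = {
    "S":  [("NP", "VP")],
    "NP": [("he",), ("NP", "PP"), ("the", "street"), ("the", "car")],
    "VP": [("VP", "PP"), ("VB", "PP")],
    "VB": [("drove",)],
    "PP": [("down", "NP"), ("in", "NP")],
}

def count_parses(sentence, start="S"):
    """Count the distinct parse trees the grammar assigns to the sentence."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        """Number of trees rooted at `symbol` that yield words[i:j]."""
        if symbol not in RULES:  # a word: it must match exactly one position
            return 1 if j - i == 1 and words[i] == symbol else 0
        return sum(match(rhs, i, j) for rhs in RULES[symbol])

    @lru_cache(maxsize=None)
    def match(rhs, i, j):
        """Number of ways the symbol sequence `rhs` yields words[i:j]."""
        if len(rhs) == 1:
            return derive(rhs[0], i, j)
        total = 0
        # Each symbol covers at least one word, so leave room for the rest of rhs.
        for k in range(i + 1, j - len(rhs) + 2):
            left = derive(rhs[0], i, k)
            if left:
                total += left * match(rhs[1:], k, j)
        return total

    return derive(start, 0, len(words))

print(count_parses("he drove down the street"))             # 1
print(count_parses("he drove down the street in the car"))  # 2
```

The count of 2 corresponds to the two trees above: "in the car" attached to the verb phrase, or attached to the noun phrase "the street".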
