Statistical Parsing

Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
November 8, 2016

Ingredients of a (natural language) parser

• A grammar
• An algorithm for parsing
• A method for ambiguity resolution

Grammars

• A formal grammar is a finite specification of a (possibly infinite) language
• In this course, we are interested in two broad classes of grammars:
  – Constituency (or phrase structure) grammars
  – Dependency grammars
• Various theories of 'grammar' (e.g., HPSG, LFG, CCG) often use ideas/notions from both constituencies and dependencies
• We will study these grammars in their relation to parsing; we do not study or focus on any specific theory

Dependency vs. constituency

• Constituency grammars are based on units formed by a group of lexical items (constituents or phrases)
• Dependency grammars model binary head–dependent relations between words
• Most of the theory of parsing is developed with constituency grammars
• Dependency grammars have recently become popular in CL

Example, 'John saw Mary':
  Constituency: [S [NP John] [VP [V saw] [NP Mary]]]
  Dependency: root → saw, saw → John (subject), saw → Mary (object)

Phrase structure grammars

A phrase structure grammar is specified by
  Σ, a set of terminal symbols
  N, a set of non-terminal symbols
  S ∈ N, a distinguished start symbol
  R, a set of rules of the form αAβ → γ, for A ∈ N and α, β, γ ∈ (Σ ∪ N)*
• The grammar accepts a sentence if it can be derived from S with the rewrite rules R

Chomsky hierarchy and natural languages

[Figure: nested language classes Regular ⊂ Context free ⊂ Context sensitive ⊂ Recursively enumerable; a dashed ellipse marks the mildly context-sensitive languages and a shaded region the natural languages]

• The Chomsky hierarchy of languages forms a set-inclusion hierarchy
• It is often claimed that mildly context-sensitive grammars (dashed ellipse) are adequate for representing natural languages
• Note, however, that the possible natural languages probably cross-cut this hierarchy (shaded region)

Context free grammars

• Context-free grammars are sufficient for expressing most phenomena in natural language syntax
• Most of the parsing theory (and practice) is built on parsing CF languages
• The context-free rules have the form
  A → α
  where A is a single non-terminal symbol and α is a (possibly empty) sequence of terminal or non-terminal symbols
• We will mainly focus on parsing with context-free grammars for the rest of this lecture

An example grammar

S → NP VP
S → Aux NP VP
NP → Det N
NP → Prn
NP → NP PP
VP → V NP
VP → V
VP → VP PP
PP → Prp NP
N → duck | park | parks
V → duck | ducks | saw
Prn → she | her
Prp → in | with
Det → a | the

Derivation of the sentence 'she saw a duck'

S ⇒ NP VP
NP ⇒ Prn
Prn ⇒ she
VP ⇒ V NP
V ⇒ saw
NP ⇒ Det N
Det ⇒ a
N ⇒ duck

Resulting tree: [S [NP [Prn she]] [VP [V saw] [NP [Det a] [N duck]]]]
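The statement that a grammar accepts a sentence if it can be derived from S can be made concrete by replaying the derivation of 'she saw a duck' mechanically. The sketch below is an illustration only, not part of the lecture: the names GRAMMAR, leftmost_derivation and steps are hypothetical, and it assumes every step rewrites the leftmost non-terminal, which is the order the derivation above happens to follow.

```python
# The example grammar: non-terminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP")],
    "NP": [("Det", "N"), ("Prn",), ("NP", "PP")],
    "VP": [("V", "NP"), ("V",), ("VP", "PP")],
    "PP": [("Prp", "NP")],
    "Aux": [],  # the example grammar lists no words for Aux
    "N": [("duck",), ("park",), ("parks",)],
    "V": [("duck",), ("ducks",), ("saw",)],
    "Prn": [("she",), ("her",)],
    "Prp": [("in",), ("with",)],
    "Det": [("a",), ("the",)],
}


def leftmost_derivation(steps, start="S"):
    """Apply (lhs, rhs) steps, each to the leftmost non-terminal.

    Returns every sentential form, starting with (start,). Raises
    ValueError if a step uses a rule that is not in GRAMMAR, or if its
    left-hand side is not the leftmost non-terminal of the current form.
    """
    form = [start]
    forms = [tuple(form)]
    for lhs, rhs in steps:
        if tuple(rhs) not in GRAMMAR.get(lhs, []):
            raise ValueError(f"no rule {lhs} -> {' '.join(rhs)}")
        # Index of the leftmost non-terminal (None if only words are left).
        pos = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if pos is None or form[pos] != lhs:
            raise ValueError(f"{lhs} is not the leftmost non-terminal")
        form = form[:pos] + list(rhs) + form[pos + 1:]
        forms.append(tuple(form))
    return forms


# The derivation of 'she saw a duck', one rule application per step.
steps = [
    ("S", ("NP", "VP")), ("NP", ("Prn",)), ("Prn", ("she",)),
    ("VP", ("V", "NP")), ("V", ("saw",)), ("NP", ("Det", "N")),
    ("Det", ("a",)), ("N", ("duck",)),
]
for form in leftmost_derivation(steps):
    print(" ".join(form))
```

Each printed line is one sentential form, from S down to the sentence 'she saw a duck'. A step that uses a rule missing from the grammar, such as S → VP, raises a ValueError instead.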
Parsing as search

• Parsing can be seen as search constrained by the grammar and the input
• Top down: start from S, find the derivations that lead to the sentence
• Bottom up: start from the sentence, find a series of derivations (in reverse) that leads to S
• One can search depth first or breadth first in both directions

Parsing as search: top down

[Figure: top-down search for 'she saw a duck' with the example grammar, expanding S step by step and backtracking when a predicted word does not match the input]

Parsing as search: bottom up

[Figure: bottom-up search for 'she saw a duck', building constituents from the words up to S]

Problems with search procedures

• Top-down search considers productions incompatible with the input, and cannot handle left recursion
• Bottom-up search considers non-terminals that would never lead to S
• Repeated work because of backtracking
→ The result is exponential time complexity in the length of the sentence

Some of these problems can be solved using dynamic programming techniques.

CKY algorithm

• The CKY (Cocke–Younger–Kasami), or CYK, parsing algorithm is a dynamic programming algorithm (Kasami 1965; Younger 1967; Cocke and Schwartz 1970)
• It processes the input bottom up, and saves the intermediate results on a chart
• Time complexity for recognition is O(n³) in the sentence length n (with a space complexity of O(n²))
• It requires the CFG to be in Chomsky normal form (CNF)

Chomsky normal form (CNF)

• A CFG is in CNF if the rewrite rules are in one of the following forms
  – A → B C
  – A → a
  where A, B, C are non-terminals and a is a terminal
• Any CFG can be converted to CNF (apart from the empty string, which these rule forms cannot derive)
• The resulting grammar is weakly equivalent to the original grammar: it generates/accepts the same language, but the derivations are different

Converting to CNF: example

• For rules with > 2 RHS symbols:
  S → Aux NP VP ⇒ S → Aux X
                  X → NP VP
• For rules with < 2 RHS symbols:
  NP → Prn ⇒ NP → she | her
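The problems listed under 'Problems with search procedures' can be observed with a naive recognizer. The sketch below is a depth-first, top-down search with backtracking over the example grammar; the names GRAMMAR and top_down_recognize are hypothetical. It assumes the left-recursive rules NP → NP PP and VP → VP PP are dropped, because with them this search would keep expanding NP or VP without consuming any input. It also counts rule expansions so that the wasted work is visible.

```python
# Example grammar WITHOUT the left-recursive rules NP -> NP PP and
# VP -> VP PP (an assumption of this sketch): a naive depth-first
# top-down search would expand them forever.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP")],
    "NP": [("Det", "N"), ("Prn",)],
    "VP": [("V", "NP"), ("V",)],
    "PP": [("Prp", "NP")],  # now unreachable from S
    "Aux": [],  # the example grammar lists no words for Aux
    "N": [("duck",), ("park",), ("parks",)],
    "V": [("duck",), ("ducks",), ("saw",)],
    "Prn": [("she",), ("her",)],
    "Prp": [("in",), ("with",)],
    "Det": [("a",), ("the",)],
}


def top_down_recognize(words, start="S"):
    """Depth-first top-down recognition with backtracking.

    Returns (accepted, expansions), where expansions counts every rule
    expansion tried, including the ones that were later abandoned.
    """
    expansions = 0

    def search(goals, i):
        # goals: symbols still to match, left to right; i: input position.
        nonlocal expansions
        if not goals:
            return i == len(words)
        first, rest = goals[0], goals[1:]
        if first not in GRAMMAR:  # terminal: must match the next word
            return i < len(words) and words[i] == first and search(rest, i + 1)
        for rhs in GRAMMAR[first]:  # try alternatives in order; backtrack on failure
            expansions += 1
            if search(list(rhs) + rest, i):
                return True
        return False

    return search([start], 0), expansions


print(top_down_recognize("she saw a duck".split()))  # (True, 13)
print(top_down_recognize("she saw".split()))  # (True, 20)
```

For 'she saw', the search first predicts VP → V NP, matches 'saw', then fails to find an NP and backtracks to VP → V, where it tries the three V alternatives again. This is the repeated work that dynamic programming avoids.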
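The CKY description above (bottom up, intermediate results on a chart, grammar in CNF) can be sketched as a recognizer. This is a minimal illustration under stated assumptions. The CNF grammar is a hand conversion of the example grammar made for this sketch: S → Aux NP VP is binarized with a new symbol X, and the unit rules NP → Prn and VP → V are replaced by the words they derive. The names BINARY_RULES, LEXICAL_RULES and cky_recognize are hypothetical. The three nested loops over span length, start and split point give the O(n³) recognition time, and the (n+1)×(n+1) chart gives the O(n²) space.

```python
# Hand-converted CNF version of the example grammar (assumption of this
# sketch). Binary rules are (A, B, C) for A -> B C.
BINARY_RULES = [
    ("S", "NP", "VP"), ("S", "Aux", "X"), ("X", "NP", "VP"),
    ("NP", "Det", "N"), ("NP", "NP", "PP"),
    ("VP", "V", "NP"), ("VP", "VP", "PP"),
    ("PP", "Prp", "NP"),
]
# Lexical rules A -> a, indexed by word. NP -> she | her and
# VP -> duck | ducks | saw come from removing the unit rules.
LEXICAL_RULES = {
    "duck": {"N", "V", "VP"}, "park": {"N"}, "parks": {"N"},
    "ducks": {"V", "VP"}, "saw": {"V", "VP"},
    "she": {"Prn", "NP"}, "her": {"Prn", "NP"},
    "in": {"Prp"}, "with": {"Prp"}, "a": {"Det"}, "the": {"Det"},
}


def cky_recognize(words, start="S"):
    """Return (accepted, chart); chart[i][j] holds the non-terminals
    that can derive words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Spans of length 1: fill from the lexical rules.
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICAL_RULES.get(w, ()))
    # Longer spans, bottom up, reusing the results of shorter spans.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for a, b, c in BINARY_RULES:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(a)
    return start in chart[0][n], chart


accepted, chart = cky_recognize("she saw a duck".split())
print(accepted, sorted(chart[0][4]))  # True ['S', 'X']
```

The top cell contains S, so the sentence is accepted. It also contains X, the helper symbol introduced by binarization, which a parser would simply ignore. Storing sets in the chart makes this a recognizer only; recovering the trees would additionally require back-pointers in each cell.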