Taaltheorie en Taalverwerking BSc Artificial Intelligence Raquel Fernández Institute for Logic, Language, and Computation Winter 2012, lecture 3a Raquel Fernández TtTv 2012 - lecture 3a 1 / 19
Plan for Today
Theoretical session:
• Parsing (part of ch. 13 of J&M)
• Guest lecture on Machine Translation
Practical session:
• Questions/problems regarding HW#2 (I'll be there 17-18h).
• Time to finish HW#3.
• Review project groups.
Parsing
Syntactic parsing is the task of computing a parse tree for a sentence given a grammar.
• When we use grammars as recognizers, the recognizer also parses the sentence (goes through a derivation), but we are not interested in the resulting structure.
• When we use grammars as parsers, we are interested in the tree structure assigned to a particular sentence.
Parsing can be viewed as a search problem:
• the parser searches through the space of possible parse trees allowed by a grammar to find the right tree for a given sentence.
  ◦ note: recognition/parsing of regular languages can also be viewed as a search problem, but since any non-deterministic FSA is equivalent to a deterministic FSA, the search ‘problem’ is not a problem in theory.
Parsing as a Search Problem
A grammar defines a search space of possible trees – each state in this space corresponds to a tree. The space includes:
• all the complete trees a grammar can generate (trees whose leaves correspond to words and cannot be further expanded), and
• all the partial trees (where some node can still be expanded by a rule), which can be seen as intermediate steps towards the generation of complete trees.
⇒ the search space of natural language grammars can be huge!
Parsing as a Search Problem
How does a parser assign a parse tree to a sentence? Given a sentence and a grammar, the parser navigates the search space following two constraints:
• the complete parse tree of a given sentence must have leaves that correspond to the words in it.
• the root of the complete parse tree must be the start symbol S of the grammar.
These two constraints give rise to the two search strategies underlying most parsers: bottom-up and top-down.
Bottom-Up Parsing
• The starting point of a bottom-up parser is the words of the input sentence.
• The parser proceeds by building up structure from the bottom to the top of the tree.
• It does so by looking at the grammar rules right-to-left (from the right-hand side to the left-hand side).
• At each stage, it considers as many (partial) trees as can be built by matching the right-hand side of a rule with the current input.
• The parser succeeds if it is able to build a tree that covers all the input and whose root is the start symbol of the grammar.
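The bottom-up strategy can be sketched as a small recognizer. This is a minimal illustration, not the algorithm from J&M: it assumes the toy grammar from the search-space slides, represents a state only by the sequence of root labels built so far (so it recognizes rather than returns trees), and the names GRAMMAR and bottom_up_recognize are made up for this example.

```python
# Toy grammar (assumed for illustration), as (left-hand side, right-hand side) pairs.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("D", "N")),
    ("VP", ("V",)),
    ("VP", ("V", "A")),
    ("D", ("the",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("runs",)),
    ("A", ("fast",)),
]


def bottom_up_recognize(words, grammar=GRAMMAR, start="S"):
    """Return True if the words can be reduced to the start symbol."""
    agenda = [tuple(words)]   # states still to explore, starting from the input words
    seen = set(agenda)        # states already generated, to avoid repeated work
    while agenda:
        form = agenda.pop()
        if form == (start,):
            return True       # everything has been reduced to a single S
        for lhs, rhs in grammar:
            n = len(rhs)
            # Match the rule's right-hand side anywhere in the current sequence
            # and replace it by the left-hand side (one reduction step).
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:
                    reduced = form[:i] + (lhs,) + form[i + n:]
                    if reduced not in seen:
                        seen.add(reduced)
                        agenda.append(reduced)
    return False
```

For example, bottom_up_recognize("the cat runs fast".split()) succeeds, while bottom_up_recognize("the cat".split()) fails because the words reduce to NP but never to S. Along the way the recognizer also explores dead ends such as reducing V to VP before A has been attached.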
Top-Down Parsing
• The starting point of a top-down parser is the start symbol of the grammar.
• The parser starts by assuming that the input is indeed a well-formed sentence and it tries to prove this by building up structure from the top of the tree down to the leaves.
• It does so by looking at the grammar rules left-to-right (from the left-hand side to the right-hand side).
• At each stage, it considers as many (partial) trees as can be built by matching the left-hand side of a rule with the currently available non-terminal nodes.
• The parser succeeds if the leaves of at least one of the trees it has constructed match the words of the input sentence.
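The top-down strategy can be sketched in the same way. This is a minimal illustration under stated assumptions: it uses the toy grammar from the search-space slides, always expands the leftmost symbol, and compares terminals with the input as soon as they appear instead of waiting for a complete tree. The names GRAMMAR and top_down_recognize are made up for this example.

```python
# Toy grammar (assumed for illustration), as (left-hand side, right-hand side) pairs.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("D", "N")),
    ("VP", ("V",)),
    ("VP", ("V", "A")),
    ("D", ("the",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("runs",)),
    ("A", ("fast",)),
]


def top_down_recognize(words, grammar=GRAMMAR, start="S"):
    """Return True if some top-down derivation from `start` yields exactly `words`."""
    nonterminals = {lhs for lhs, _ in grammar}

    def expand(symbols, pos):
        # `symbols` is what remains to be derived; `pos` is the next input word.
        if not symbols:
            return pos == len(words)          # success only if all input is consumed
        first, rest = symbols[0], symbols[1:]
        if first in nonterminals:
            # Try every rule whose left-hand side matches the leftmost symbol.
            return any(expand(rhs + rest, pos)
                       for lhs, rhs in grammar if lhs == first)
        # `first` is a word: it must match the input at the current position.
        return pos < len(words) and words[pos] == first and expand(rest, pos + 1)

    return expand((start,), 0)
```

For example, top_down_recognize("the dog runs".split()) succeeds. Note that this sketch would recurse forever on a left-recursive rule such as NP → NP PP; the toy grammar has none.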
Bottom-Up vs. Top-Down
These two basic strategies have advantages and disadvantages:
• Top-Down parsers never explore illegal parse trees that cannot form an S, but waste time on trees that can never match the input words.
• Bottom-Up parsers never explore trees that are inconsistent with the input sentence, but waste time exploring illegal parse trees that will never lead to an S root.
Actual parsing algorithms may combine these two strategies.
A Grammar’s Search Space
How can we define the search space of a given grammar? For simplicity, let us focus on the top-down approach (the same considerations apply to the bottom-up approach).
Let’s assume that the states in the search space are created by:
• applying the grammar rules in the order in which they appear in the grammar, and
• expanding the nodes at a given level in a tree from left to right.
We can define the search space of a given grammar following one of two strategies: depth-first or breadth-first.
• depth-first: we work vertically – priority is given to nodes that are lower or deeper in the tree
• breadth-first: we work horizontally – priority is given to nodes that are higher up in the tree
Search Space: Depth-first
Grammar:
1. S → NP VP
2. NP → D N
3. VP → V
4. VP → V A
5. D → the
6. N → dog
7. N → cat
8. V → runs
9. A → fast
For simplicity, sequences of states where there is no branching are collapsed into one single state.
Search space (each level lists the states reached from the level above):
• level 1: [S [NP [D the] N] VP]
• level 2: [S [NP [D the] [N dog]] VP] / [S [NP [D the] [N cat]] VP]
• level 3 (complete trees):
    [S [NP [D the] [N cat]] [VP [V runs]]]
    [S [NP [D the] [N dog]] [VP [V runs]]]
    [S [NP [D the] [N cat]] [VP [V runs] [A fast]]]
    [S [NP [D the] [N dog]] [VP [V runs] [A fast]]]
Search Space: Breadth-first
Grammar:
1. S → NP VP
2. NP → D N
3. VP → V
4. VP → V A
5. D → the
6. N → dog
7. N → cat
8. V → runs
9. A → fast
For simplicity, sequences of states where there is no branching are collapsed into one single state.
Search space (each level lists the states reached from the level above):
• level 1: [S [NP D N] VP]
• level 2: [S [NP D N] [VP V]] / [S [NP D N] [VP V A]]
• level 3 (complete trees):
    [S [NP [D the] [N cat]] [VP [V runs]]]
    [S [NP [D the] [N dog]] [VP [V runs]]]
    [S [NP [D the] [N cat]] [VP [V runs] [A fast]]]
    [S [NP [D the] [N dog]] [VP [V runs] [A fast]]]
Realistic Search
• Since the search space of a realistic grammar can be huge, parsing algorithms do not actually build the full space of parse trees that a grammar allows and then search for the tree that corresponds to a given sentence.
• Instead, they expand the search space incrementally by systematically exploring one state at a time.
• When parsing a given sentence, parsers explore paths in a theoretical search space.
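Exploring Paths
[Figure: two copies of a binary tree with seven states, numbered in the order in which they are explored.]
• breadth-first: root 1; second level 2, 3; third level 4, 5, 6, 7
• depth-first: root 1; second level 2, 5; third level 3, 4, 6, 7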
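The two numberings can be reproduced with a short sketch. It assumes a complete binary tree of seven states with made-up labels a to g (a is the root, b and c its children, and so on); the only difference between the two functions is whether pending states are kept in a queue or on a stack.

```python
from collections import deque

# Hypothetical seven-state search space: a is the root; b, c its children; d..g the leaves.
CHILDREN = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g"]}


def breadth_first(root):
    """Visit states level by level, using a first-in-first-out queue."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(CHILDREN.get(node, []))
    return order


def depth_first(root):
    """Follow one path to the bottom before backtracking, using a stack."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so that the leftmost child is explored first.
        stack.extend(reversed(CHILDREN.get(node, [])))
    return order


def numbering_by_level(order):
    """Exploration number of each state, listed level by level (a, b, c, ..., g)."""
    rank = {node: i + 1 for i, node in enumerate(order)}
    return [rank[node] for node in "abcdefg"]
```

Here numbering_by_level(breadth_first("a")) gives [1, 2, 3, 4, 5, 6, 7] and numbering_by_level(depth_first("a")) gives [1, 2, 5, 3, 4, 6, 7], matching the figure when it is read level by level.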
Top-Down depth-first with bottom-up filtering
We can combine top-down and bottom-up parsing by adding the following constraint: the parser should not consider any grammar rule that leads to words which are not part of the input sentence.
Input: The cat runs fast
• level 1: [S [NP [D the] N] VP]
• level 2: [S [NP [D the] [N cat]] VP] / [S [NP [D the] [N dog]] VP]
Under the constraint, the state built with N → dog is not considered, because ‘dog’ is not part of the input.
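The effect of the filter can be sketched by counting rule applications with and without it. This is an illustration under stated assumptions, not a standard algorithm: it uses the toy grammar from the search-space slides, expands the leftmost non-terminal depth-first, only compares the leaves with the input once a tree is complete (as in the plain top-down description), and implements the constraint as a simple check that every word on a rule's right-hand side occurs in the input. The names GRAMMAR and parse_count are made up for this example.

```python
# Toy grammar (assumed for illustration), as (left-hand side, right-hand side) pairs.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("D", "N")),
    ("VP", ("V",)),
    ("VP", ("V", "A")),
    ("D", ("the",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("runs",)),
    ("A", ("fast",)),
]


def parse_count(words, use_filter):
    """Return (recognized, number of rule applications tried)."""
    nonterminals = {lhs for lhs, _ in GRAMMAR}
    vocab = set(words)
    tried = 0

    def expand(symbols):
        # `symbols` is the frontier of the current partial tree, left to right.
        nonlocal tried
        for i, sym in enumerate(symbols):
            if sym in nonterminals:               # leftmost node that can be expanded
                for lhs, rhs in GRAMMAR:
                    if lhs != sym:
                        continue
                    # Bottom-up filter: skip rules that introduce a word
                    # that does not occur in the input sentence.
                    if use_filter and any(s not in nonterminals and s not in vocab
                                          for s in rhs):
                        continue
                    tried += 1
                    if expand(symbols[:i] + rhs + symbols[i + 1:]):
                        return True
                return False
        # No non-terminal left: the tree is complete, compare leaves with the input.
        return symbols == tuple(words)

    return expand(("S",)), tried
```

On the lowercased input "the cat runs fast", both settings recognize the sentence, but the filtered run tries fewer rule applications because it never expands N → dog.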
Structural Ambiguity
There are several types of structural or syntactic ambiguity:
• attachment ambiguity: one constituent can appear in more than one location in the parse tree (we have already seen this kind of ambiguity).
    The tourist saw the astronomer with the telescope
    I shot an elephant in my pajamas
    We saw the Eiffel Tower flying to Paris
• coordination ambiguity: different sets of phrases can be conjoined together (a variant of attachment ambiguity)
    old men and women → old [men & women] / [old men] & women
    nationwide television and radio → nationwide [t & r] / [nationwide t] & r
    the light blue chair → the [light [blue chair]] / the [[light blue] chair]
• local ambiguity: a part of a sentence is ambiguous (has more than one parse tree) even though the whole sentence may not be.
    book that flight → POS ambiguity of ‘book’ (V or N)
    the robber knew Vincent shot Marsellus → the grammar may be able to assign a sentential structure to the sub-string ‘the robber knew Vincent’.
Structural Ambiguity
• We have been referring to ambiguous sentences.
• We say that a grammar is ambiguous if it can generate more than one parse tree for a given sentence.
∗ note that local ambiguity is possible with grammars that are not ambiguous. For instance, this grammar is not ambiguous even though it gives rise to local ambiguity:
    S → NP VP
    S → VP
    NP → Det N
    VP → V NP
    Det → the | that
    N → book | flight
    V → book
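The difference between local ambiguity and an ambiguous sentence can be checked by counting parse trees. The following is a minimal sketch for the grammar on this slide, assuming rules with at most two symbols on the right-hand side and no cycles of unit rules (a cycle would make the recursion loop); the names GRAMMAR and count_trees are made up for this example.

```python
from functools import lru_cache

# The grammar from the slide: each symbol maps to its possible right-hand sides.
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
    "Det": [("the",), ("that",)],
    "N": [("book",), ("flight",)],
    "V": [("book",)],
}


def count_trees(words, symbol="S"):
    """Count the parse trees rooted in `symbol` that cover all of `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        # Number of trees rooted in `sym` spanning words[i:j].
        if sym not in GRAMMAR:                       # `sym` is a word
            return 1 if j == i + 1 and words[i] == sym else 0
        total = 0
        for rhs in GRAMMAR[sym]:
            if len(rhs) == 1:                        # unary rule
                total += count(rhs[0], i, j)
            else:                                    # binary rule: try every split point
                left, right = rhs
                for k in range(i + 1, j):
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(symbol, 0, len(words))
```

With this sketch, count_trees(["book"], "N") and count_trees(["book"], "V") are both 1, which is the local ambiguity, while count_trees("book that flight".split()) is 1: only the reading with ‘book’ as a verb extends to a full parse.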
Syntactic Disambiguation
Ambiguity is perhaps the worst enemy of parsers.
• Syntactic disambiguation is the task of choosing one parse tree among the possible parses of an ambiguous sentence.
• This task is critical because structure guides how we assign meaning to a given sentence.
• Parsing by itself does not offer tools for syntactic disambiguation – a parser can at most return all possible parse trees.
• On Friday we’ll look into basic probabilistic techniques for syntactic disambiguation (PCFGs).