Grammars and Parsing
• Grammars and Sentence Structure
• What Makes a Good Grammar
• A Top-Down Parser
• A Bottom-Up Parser
• Transition Network Grammars
Grammars and Sentence Structure
Ex. John ate the cat
Tree Representation:
(S (NP (NAME John))
   (VP (V ate)
       (NP (ART the) (N cat))))
Tree Terminology
• A tree is a special form of graph consisting of nodes connected by links
• The node at the top is called the root
• The nodes at the bottom are called leaves
• An ancestor of a node N is N’s parent or an ancestor of N’s parent
A Simple Grammar
1. S -> NP VP
2. VP -> V NP
3. NP -> NAME
4. NP -> ART N
5. NAME -> John
6. V -> ate
7. ART -> the
8. N -> cat
Context-Free Grammar (CFG)
• A CFG consists of:
  – Terminal symbols
  – Non-terminal symbols
  – Production rules
  – A start symbol
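As a concrete illustration (not part of the slides), the four components can be written down directly as plain Python data using the rules from the "A Simple Grammar" slide; the variable names GRAMMAR, START, NONTERMINALS, and TERMINALS are my own:

```python
# A minimal sketch of the CFG from the "A Simple Grammar" slide.
GRAMMAR = [                      # production rules as (LHS, RHS) pairs
    ("S",    ["NP", "VP"]),
    ("VP",   ["V", "NP"]),
    ("NP",   ["NAME"]),
    ("NP",   ["ART", "N"]),
    ("NAME", ["John"]),
    ("V",    ["ate"]),
    ("ART",  ["the"]),
    ("N",    ["cat"]),
]
START = "S"                                               # start symbol
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}                # symbols that can be rewritten
TERMINALS = {s for _, rhs in GRAMMAR for s in rhs} - NONTERMINALS
print(TERMINALS)   # {'John', 'ate', 'the', 'cat'}
```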
Derivation
• A grammar is said to derive a sentence if there is a sequence of rule applications that rewrites the start symbol into the sentence
• Two important processes are based on derivations: sentence generation and parsing
• There are two basic methods of parsing: top-down and bottom-up
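For example (worked out here for illustration), the grammar on the "A Simple Grammar" slide derives "John ate the cat" by the following leftmost derivation:
S => NP VP (rule 1)
  => NAME VP (rule 3)
  => John VP (rule 5)
  => John V NP (rule 2)
  => John ate NP (rule 6)
  => John ate ART N (rule 4)
  => John ate the N (rule 7)
  => John ate the cat (rule 8)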
What Makes a Good Grammar
• Generality: the range of sentences the grammar analyzes correctly
• Selectivity: the range of non-sentences it identifies as problematic
• Understandability: the simplicity of the grammar itself
Writing a Grammar
• Try to group words that form a constituent
• Test the group by constructing a new sentence that conjoins it with another group of words already classified as the same type of constituent (e.g. NP and NP):
  "I ate a hamburger and a hot dog"
  If you also defined a rule NP -> on NP, the conjunction test would predict "I ate a hamburger and on the stove" to be acceptable; since it is not, that definition is wrong
Writing a Grammar (cont.)
• Another test involves inserting the proposed constituent into other sentences that take the same category of constituent
  e.g. "John’s hitting of Mary" is an NP; it can be inserted in the following two sentences:
  "John’s hitting of Mary alarmed Sue" (S -> NP VP)
  "I cannot explain John’s hitting of Mary" (VP -> V NP)
Grammar Generative Capacity
• Regular grammars have rules such as S -> aS1, S1 -> bS2, S2 -> d
  No regular grammar can generate the language ab, aabb, aaabbb, … (a^n b^n)
• Context-free grammars: the CFG S -> ab, S -> aSb generates a^n b^n, but no CFG can generate the language abc, aabbcc, aaabbbccc, … (a^n b^n c^n)
• Context-sensitive grammars have rules of the form xAy -> xzy, where A is a non-terminal symbol, x and y are (possibly empty) sequences of symbols, and z is a non-empty sequence of symbols
• Type 0 grammars are more general and allow arbitrary rewrite rules
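As a small illustration (my own, not from the slides) of the context-free case, the grammar S -> ab, S -> aSb generates exactly the strings a^n b^n:

```python
# A sketch of string generation with the CFG S -> ab | aSb.
# The helper name a_n_b_n is illustrative only.

def a_n_b_n(n: int) -> str:
    """Apply S -> aSb (n - 1 times) and then S -> ab."""
    if n == 1:
        return "ab"                        # S -> ab
    return "a" + a_n_b_n(n - 1) + "b"      # S -> aSb

print([a_n_b_n(n) for n in range(1, 4)])   # ['ab', 'aabb', 'aaabbb']
```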
Top-Down Parsers
• A top-down parser starts with the start symbol and attempts to rewrite it into a sequence of terminal symbols that matches the input
• The state of the parse at any given time can be represented as a list of symbols
  e.g. starting in the state (S) and applying the rule S -> NP VP gives the symbol list (NP VP)
• The parser could continue until the state consists entirely of terminal symbols and then check it against the input sentence; a better idea is to check the input as soon as possible
• Rather than having a separate grammar rule for each word, a structure called the lexicon is used to store the possible categories of each word
  e.g. cried: V, dogs: N, the: ART
Top-Down Parsers (cont.)
• With the lexicon specified, the grammar need not contain any lexical rules
• The current position in the sentence is included in the representation of the parse state, to record how much of the input has already been matched
  e.g. 1 The 2 dogs 3 cried 4; a typical parse state would be ((N VP) 2)
• A parsing algorithm that is guaranteed to find a parse if one exists must systematically explore every possible state (backtracking)
A Simple Top-Down Parsing Algorithm (Example)
Grammar
1. S -> NP VP
2. NP -> ART N
3. NP -> ART ADJ N
4. VP -> V
5. VP -> V NP
Lexicon
cried: V; dogs: N, V; the: ART; old: ADJ, N; man: N, V
Sentences
1. The dogs cried
2. The old man cried
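As an illustration (the trace itself is not on the slide), a top-down parse of sentence 1, "1 The 2 dogs 3 cried 4", can visit the following states, keeping the alternatives generated at each choice point as backup states for backtracking:
((S) 1)
((NP VP) 1)          rule 1
((ART N VP) 1)       rule 2; backup state ((ART ADJ N VP) 1) from rule 3
((N VP) 2)           "the" matches ART
((VP) 3)             "dogs" matches N
((V) 3)              rule 4; backup state ((V NP) 3) from rule 5
(() 4)               "cried" matches V; the symbol list is empty at the end of the sentence, so the parse succeeds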
Parsing as a Search Procedure
• Parsing can be viewed as a search problem in AI
• Two strategies for searching: depth-first search (DFS) and breadth-first search (BFS)
• DFS uses a stack for the possibilities list
• BFS uses a queue for the possibilities list
• Left recursion in the grammar causes DFS to enter an infinite loop
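A minimal sketch of the parse-as-search idea, assuming the grammar and lexicon of the example slide above; the function and variable names are my own, not from the text. The possibilities list is used as a stack, so this is the depth-first version: the most recently generated state is tried first, and failures backtrack automatically by popping the next state.

```python
GRAMMAR = [                     # rules from the example slide, as (LHS, RHS) pairs
    ("S",  ["NP", "VP"]),
    ("NP", ["ART", "N"]),
    ("NP", ["ART", "ADJ", "N"]),
    ("VP", ["V"]),
    ("VP", ["V", "NP"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

LEXICON = {"cried": ["V"], "dogs": ["N", "V"], "the": ["ART"],
           "old": ["ADJ", "N"], "man": ["N", "V"]}

def top_down_parse(words):
    # A state is (symbol list, position); positions are 0-based here,
    # so position 0 corresponds to position 1 on the slides.
    possibilities = [(("S",), 0)]            # stack -> depth-first search
    while possibilities:
        symbols, pos = possibilities.pop()
        if not symbols:
            if pos == len(words):            # matched the entire sentence
                return True
            continue                         # matched too little: dead end
        first, rest = symbols[0], symbols[1:]
        if first in NONTERMINALS:
            # Rewrite the leftmost symbol with every applicable rule.
            for lhs, rhs in GRAMMAR:
                if lhs == first:
                    possibilities.append((tuple(rhs) + rest, pos))
        elif pos < len(words) and first in LEXICON.get(words[pos], []):
            # A lexical category: match it against the next input word.
            possibilities.append((rest, pos + 1))
        # Any other state fails; backtracking is simply popping the next state.
    return False

print(top_down_parse("the dogs cried".split()))      # True
print(top_down_parse("the old man cried".split()))   # True
print(top_down_parse("the cried".split()))           # False
```

Switching `possibilities.pop()` to `possibilities.pop(0)` turns the stack into a queue and gives the breadth-first version.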
A Bottom-Up Chart Parser
• A bottom-up parser could be built simply by matching sequences of symbols against the right-hand sides (RHS) of the production rules
• This matching process can be formulated as a search process
• The state would simply consist of a symbol list, starting with the words in the sentence
• Successor states could be generated by exploring all possible ways to
  – rewrite a word by its possible lexical categories
  – replace a sequence of symbols that matches the RHS of a grammar rule by its LHS symbol
A Bottom-Up Chart Parser (cont.)
• Such a simple implementation would be expensive, because the same matches would be tried over and over
• To avoid this problem, a data structure called the chart is introduced that allows the parser to store partial results
• Matches are always considered from the point of view of one constituent, called the key
Example
1. S -> NP VP
2. NP -> ART ADJ N
3. NP -> ART N
4. NP -> ADJ N
5. VP -> V
6. VP -> V NP
• Assume you are parsing a sentence that starts with an ART. With this ART as the key, rules 2 and 3 are matched because they start with ART
• To record this for analyzing the next key, you need to record that rules 2 and 3 could be continued at the point after the ART. Thus you record
  2'. NP -> ART @ ADJ N
  3'. NP -> ART @ N
Example (cont.)
• If the next input key is an ADJ, then rule 4 may be started, and the modified rule 2' may be extended to give
  2''. NP -> ART ADJ @ N
• The chart maintains the constituents derived from the parse in addition to the rules that have been matched partially. These partially matched rules are called active arcs
• The basic operation of a chart-based parser involves combining an active arc with a completed constituent. The result is either a new completed constituent or a new active arc that is an extension of the original active arc. New completed constituents are maintained on a list called the agenda
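For instance (an illustration not on these slides): if the chart contains the active arc NP -> ART @ N from position 1 to 2 and a constituent N is completed from position 2 to 3, combining the two yields a completed NP constituent from position 1 to 3, which is placed on the agenda. If instead the arc were NP -> ART @ ADJ N from 1 to 2 and an ADJ were completed from 2 to 3, the result would be the new active arc NP -> ART ADJ @ N from 1 to 3.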
A Bottom-Up Charting Algorithm
Do until there is no input left:
1. If the agenda is empty, look up the interpretations for the next word in the input and add them to the agenda
2. Select a constituent from the agenda (call it constituent C, from position p1 to p2)
3. For each rule in the grammar of the form X -> C X1 … Xn, add an active arc of the form X -> @ C X1 … Xn from position p1 to p1
4. Add C to the chart using the arc extension algorithm
Arc Extension Algorithm
To add a constituent C from position p1 to p2:
1. Insert C into the chart from position p1 to p2
2. For any active arc of the form X -> X1 … @ C … Xn from position p0 to p1, add a new active arc X -> X1 … C @ … Xn from position p0 to p2
3. For any active arc of the form X -> X1 … Xn @ C from position p0 to p1, add a new constituent of type X from position p0 to p2 to the agenda
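A minimal sketch of the bottom-up charting algorithm and the arc extension algorithm above, assuming the grammar and lexicon of the running example. The names (Arc, Constituent, chart, arcs, agenda) are my own, and the code only recognizes sentences rather than building parse trees.

```python
from collections import namedtuple

GRAMMAR = [
    ("S",  ["NP", "VP"]),
    ("NP", ["ART", "ADJ", "N"]),
    ("NP", ["ART", "N"]),
    ("NP", ["ADJ", "N"]),
    ("VP", ["V"]),
    ("VP", ["V", "NP"]),
]
LEXICON = {"cried": ["V"], "dogs": ["N", "V"], "the": ["ART"],
           "old": ["ADJ", "N"], "man": ["N", "V"]}

Arc = namedtuple("Arc", "lhs seen needed start end")      # active arc X -> seen @ needed
Constituent = namedtuple("Constituent", "cat start end")  # completed constituent

def chart_parse(words):
    chart, arcs, agenda, pos = [], [], [], 0
    while pos < len(words) or agenda:
        # 1. If the agenda is empty, look up the next word's categories.
        if not agenda:
            agenda = [Constituent(cat, pos, pos + 1) for cat in LEXICON[words[pos]]]
            pos += 1
        # 2. Select a constituent C from the agenda.
        c = agenda.pop()
        # 3. For each rule X -> C X1 ... Xn, add the arc X -> @ C X1 ... Xn
        #    from c.start to c.start.
        for lhs, rhs in GRAMMAR:
            if rhs[0] == c.cat:
                arcs.append(Arc(lhs, [], rhs, c.start, c.start))
        # 4. Add C to the chart using the arc extension algorithm.
        chart.append(c)
        for arc in list(arcs):
            if arc.needed and arc.needed[0] == c.cat and arc.end == c.start:
                seen, needed = arc.seen + [c.cat], arc.needed[1:]
                if needed:     # the @ moves past C; the arc stays active
                    arcs.append(Arc(arc.lhs, seen, needed, arc.start, c.end))
                else:          # the arc is complete: a new constituent for the agenda
                    agenda.append(Constituent(arc.lhs, arc.start, c.end))
    # Accept if some S constituent spans the whole input.
    return any(c.cat == "S" and c.start == 0 and c.end == len(words) for c in chart)

print(chart_parse("the dogs cried".split()))      # True
print(chart_parse("the old man cried".split()))   # True
```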