Natural Language Parsing Technology
Foundations of Language Science and Technology (WS 2014/2015)
Bernd Kiefer
Language Technology Lab, DFKI GmbH
Department of Computational Linguistics, Saarland University
November 2014
Outline
• Overview
• Basic Parsing Algorithms
   • Parsing Strategies
   • CYK Algorithm
   • Earley’s Algorithm
• Parsing with Probabilistic Context-Free Grammar
   • PCFG
   • Inside-Outside Algorithm
• Recent Advances in Parsing Technology
Language & Grammar
• Language
   • Structural
   • Productive
   • Ambiguous, yet efficient in human-human communication
• Grammar
   • Generalization of regularities in language structures
   • Morphology & syntax, often complemented by phonetics, phonology, semantics, and pragmatics
Ambiguity
• Human languages are ambiguous on almost every layer
• Grammar frameworks are designed to represent necessary ambiguities and eliminate unnecessary ones
• Parsing models are responsible for retrieving valid analyses according to the grammar
Syntactic Parser as NLP Component
[Figure: NLP pipeline components: PoS Tagging, Chunking, Morph. Analysis, NER, Syntactic Parsing, Semantic Analysis, ...]
Trees (or not)
[Figure: three representations of "Sue gave Paul an old penny": a phrase-structure tree (S over NP VP, with VP over V NP NP), an attribute-value matrix for "gave" (PHON|ORTH, SYNSEM|LOC with CAT (HEAD verb, VAL with SUBJ and COMPS) and CONT|RELS give_rel with ARG1 to ARG3), and a dependency graph with the labels SBJ, IOBJ, DOBJ, DET, ADJ]
Chomsky Hierarchy
• Type 0 (unrestricted rewriting systems): $\alpha \to \beta$, with $\alpha, \beta \in (V_N \cup V_T)^*$
• Type 1 (context-sensitive grammars): $\gamma A \omega \to \gamma \beta \omega$, with $A \in V_N$ and $\gamma, \beta, \omega \in (V_N \cup V_T)^*$
• Type 2 (context-free grammars): $A \to \beta$, with $A \in V_N$ and $\beta \in (V_N \cup V_T)^*$
• Type 3 (regular grammars): $A \to xB \lor A \to x$, with $A, B \in V_N$ and $x \in V_T$
Context-Free Grammar
A CFG is a quadruple $\langle V_T, V_N, P, S \rangle$:
• $V_T$: terminal symbols
• $V_N$: non-terminal symbols
• $P$: context-free productions $A \to \beta$, with $A \in V_N$ and $\beta \in (V_N \cup V_T)^*$
• $S$: start symbol
Context-Free Phrase Structure Grammar
• S → NP VP
• NP → Det N
• N → Adj N
• VP → V
• VP → V NP
• VP → Adv VP
• N → dog | cat
• Det → the | a
• V → chases | sleeps
• Adj → gray | lazy
• Adv → fiercely
CFG Derivation
• If $\beta = \gamma A \delta$, $\omega = \gamma \alpha \delta$ and $A \to \alpha \in P$, then $\omega$ follows from $\beta$: $\beta \Rightarrow \omega$
• If $\beta_1, \beta_2, \ldots, \beta_m$ is a sequence of strings where for all $i$ ($1 \le i \le m-1$), $\beta_i \Rightarrow \beta_{i+1}$, then $\beta_1, \beta_2, \ldots, \beta_m$ is a derivation from $\beta_1$ to $\beta_m$
• "Derivable" relation (transitive, reflexive): $\beta_1 \Rightarrow^{*} \beta_m$
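The one-step "follows" relation and a derivation sequence can be illustrated with a minimal Python sketch. It assumes the toy phrase structure grammar from the earlier slide; the dictionary encoding and the names `GRAMMAR`, `derive_step`, `derivation` and `STEPS` are hypothetical choices made for this illustration, not part of the lecture.

```python
# Toy grammar from the phrase-structure slide (dict encoding is an assumption).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "N": [["Adj", "N"], ["dog"], ["cat"]],
    "VP": [["V"], ["V", "NP"], ["Adv", "VP"]],
    "Det": [["the"], ["a"]],
    "V": [["chases"], ["sleeps"]],
    "Adj": [["gray"], ["lazy"]],
    "Adv": [["fiercely"]],
}


def derive_step(form, lhs, rhs):
    """Apply production lhs -> rhs to the leftmost occurrence of lhs in form."""
    if rhs not in GRAMMAR.get(lhs, []):
        raise ValueError(f"{lhs} -> {' '.join(rhs)} is not a production")
    i = form.index(lhs)  # raises ValueError if lhs does not occur in form
    return form[:i] + rhs + form[i + 1:]


def derivation(start, steps):
    """Return the list of sentential forms produced by applying steps in order."""
    forms = [[start]]
    for lhs, rhs in steps:
        forms.append(derive_step(forms[-1], lhs, rhs))
    return forms


# One possible derivation of "the dog sleeps" from S.
STEPS = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("Det", ["the"]),
    ("N", ["dog"]),
    ("VP", ["V"]),
    ("V", ["sleeps"]),
]

forms = derivation("S", STEPS)
for form in forms:
    print(" ".join(form))
```

Each printed line is one sentential form, so consecutive lines stand in the one-step relation and the first and last lines stand in the "derivable" relation.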
Parsing Strategies
• Top-down: start from the start symbol and expand the tree with grammar rules (e.g. replace the LHS symbol with the RHS sequence of a CFG production)
• Bottom-up: start from the input sequence and apply grammar rules to build trees upwards (e.g. reduce an RHS sequence to its LHS symbol)
Top-Down Parsing
• Goal-directed search
• Wastes time on trees that do not match the input sentence
• A pure top-down (left-first) approach cannot parse left-recursive grammars
Example grammar:
   1. S → NP VP
   2. NP → NP PP
   3. ...
[Figure: tree S over NP VP, in which the leftmost NP is expanded to NP PP again and again without end]
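A goal-directed search of this kind can be sketched as a small backtracking recognizer. This is a minimal sketch under stated assumptions: the grammar encoding, the function names `ends`, `ends_seq` and `recognize`, and the PP, VP and lexical rules added to `LEFT_RECURSIVE` are hypothetical and only serve the illustration.

```python
# Toy grammar from the phrase-structure slide (dict encoding is an assumption).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "N": [["Adj", "N"], ["dog"], ["cat"]],
    "VP": [["V"], ["V", "NP"], ["Adv", "VP"]],
    "Det": [["the"], ["a"]],
    "V": [["chases"], ["sleeps"]],
    "Adj": [["gray"], ["lazy"]],
    "Adv": [["fiercely"]],
}

# Left-recursive grammar in the spirit of the slide; all rules except
# S -> NP VP and NP -> NP PP are made up for this example.
LEFT_RECURSIVE = {
    "S": [["NP", "VP"]],
    "NP": [["NP", "PP"], ["john"]],
    "PP": [["in", "NP"]],
    "VP": [["sleeps"]],
}


def ends(symbol, words, pos, grammar):
    """Yield every position where a derivation of symbol starting at pos can end."""
    if symbol not in grammar:  # terminal: must match the next input word
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for rhs in grammar[symbol]:  # try each production for the current goal
        yield from ends_seq(rhs, words, pos, grammar)


def ends_seq(symbols, words, pos, grammar):
    """Yield every end position for deriving the symbol sequence from pos."""
    if not symbols:
        yield pos
        return
    for mid in ends(symbols[0], words, pos, grammar):
        yield from ends_seq(symbols[1:], words, mid, grammar)


def recognize(sentence, grammar, start="S"):
    """Return True if the start symbol derives the whole sentence."""
    words = sentence.split()
    return any(end == len(words) for end in ends(start, words, 0, grammar))


print(recognize("the lazy dog chases a gray cat", GRAMMAR))
print(recognize("the dog the cat", GRAMMAR))
try:
    recognize("john sleeps", LEFT_RECURSIVE)
except RecursionError:
    print("left recursion: NP keeps expanding to NP PP")
```

With the left-recursive grammar the recognizer expands NP to NP PP before consuming any input, so it recurses until Python's recursion limit stops it.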
Bottom-Up Parsing
• Uses the input to guide the search (data-driven)
• Wastes time on trees that do not result in S
• Recursive unary rules still create an infinite parse forest for a finite-length sentence
Example grammar:
   1. A → B | a
   2. B → A
   3. ...
[Figure: unbounded unary chain ... over B over A over B over A over the word a]
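A data-driven search can be sketched as repeated reduction of RHS sequences to their LHS symbols. The sketch below is a naive breadth-first recognizer, not an efficient parser; the rule encoding and the names `bottom_up_recognize`, `RULES` and `UNARY_CYCLE` are assumptions made for this example, and it assumes a grammar without empty productions. A set of already visited sentential forms keeps the unary cycle A → B, B → A from looping forever.

```python
from collections import deque

# Toy grammar from the phrase-structure slide as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("N", ("Adj", "N")),
    ("VP", ("V",)),
    ("VP", ("V", "NP")),
    ("VP", ("Adv", "VP")),
    ("N", ("dog",)), ("N", ("cat",)),
    ("Det", ("the",)), ("Det", ("a",)),
    ("V", ("chases",)), ("V", ("sleeps",)),
    ("Adj", ("gray",)), ("Adj", ("lazy",)),
    ("Adv", ("fiercely",)),
]

# Unary cycle from the bottom-up slide: A -> B | a, B -> A.
UNARY_CYCLE = [("A", ("B",)), ("A", ("a",)), ("B", ("A",))]


def bottom_up_recognize(words, rules, start="S"):
    """Breadth-first search over sentential forms, reducing RHS to LHS."""
    first = tuple(words)
    seen = {first}  # visited forms; prevents endless unary reductions
    queue = deque([first])
    while queue:
        form = queue.popleft()
        if form == (start,):
            return True
        for lhs, rhs in rules:
            n = len(rhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:
                    reduced = form[:i] + (lhs,) + form[i + n:]
                    if reduced not in seen:
                        seen.add(reduced)
                        queue.append(reduced)
    return False


print(bottom_up_recognize("the dog sleeps".split(), RULES))
print(bottom_up_recognize("the dog".split(), RULES))
print(bottom_up_recognize(["a"], UNARY_CYCLE, start="B"))
```

Many of the explored forms never reduce to the start symbol, which is the wasted work mentioned above.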
Problems
• Left-recursion: NP → NP PP
• Ambiguity
• Repeated parsing of subtrees
Dynamic Programming (DP)
• Divisibility: the optimal solution of a subproblem is part of the optimal solution of the whole problem
• Memoization: solve small problems only once and remember the answers
Examples
• Calculating Fibonacci numbers: $F_n = F_{n-1} + F_{n-2}$ ($F_0 = 0$, $F_1 = 1$)
• Pascal's triangle (binomial coefficients): $\binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}$
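Both examples can be written down in a few lines of Python. This is a minimal sketch: `fib` uses memoization via the standard library's `functools.lru_cache`, and `binomial` fills Pascal's triangle row by row; the function names are illustrative choices.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """F_n = F_{n-1} + F_{n-2}; each value is computed once and then cached."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def binomial(n, k):
    """Build row n of Pascal's triangle from row n-1 and return entry k."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [row[j] + row[j + 1] for j in range(len(row) - 1)] + [1]
    return row[k] if 0 <= k <= n else 0


print(fib(30))
print(binomial(5, 2))
```

Without the cache, the recursive `fib` would recompute the same subproblems exponentially often, which is the same problem as the repeated parsing of subtrees.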
CYK Algorithm
• Cocke-Younger-Kasami, also known as the CKY algorithm
• Essentially a bottom-up chart parsing algorithm using dynamic programming
• The CFG must be in Chomsky Normal Form (CNF):
   • $A \to B\,C$
   • $A \to a$
   • $S \to \epsilon$
   • with $A, B, C \in V_N$, $a \in V_T$, and $B, C \ne S$
• Fill in a two-dimensional array: $C[i][j]$ contains all the possible syntactic interpretations of the substring $w_{i+1} \ldots w_j$
• Complexity: $O(n^3)$
CYK Algorithm (pseudocode for the input $w_1 \ldots w_n$)
   for all $i, j$ with $0 \le i < j \le n$ do
      $C[i][j] \leftarrow \emptyset$
   end for
   for all $i$ with $1 \le i \le n$ and all $A \to w_i \in P$ do
      $C[i-1][i] \leftarrow \{A\} \cup C[i-1][i]$
   end for
   for $s = \langle 2 \ldots n \rangle$ do
      for all $A \to B\,C \in P$ and all $i, k$ with $0 \le i < k < i+s \le n$ do
         if $B \in C[i][k] \land C \in C[k][i+s]$ then
            $C[i][i+s] \leftarrow \{A\} \cup C[i][i+s]$
         end if
      end for
   end for
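The chart-filling loops can be sketched in Python as follows. This is a minimal recognizer sketch, assuming the grammar and the sentence of the chart example slide; the encoding of the rules as `LEXICAL` and `BINARY` and the function name `cyk` are choices made for this illustration.

```python
from collections import defaultdict

# Grammar of the chart example; (lhs, B, C) stands for lhs -> B C.
BINARY = [
    ("S", "NP", "VP"), ("S", "N", "VP"), ("S", "N", "V"), ("S", "NP", "V"),
    ("VP", "V", "NP"), ("VP", "V", "N"), ("VP", "VP", "PP"),
    ("NP", "D", "N"), ("NP", "NP", "PP"), ("NP", "N", "PP"),
    ("PP", "P", "NP"), ("PP", "P", "N"),
]
LEXICAL = {
    "john": {"N"}, "girl": {"N"}, "car": {"N"},
    "saw": {"V"}, "walks": {"V"},
    "in": {"P"},
    "the": {"D"}, "a": {"D"},
}


def cyk(words, lexical, binary):
    """Return chart[(i, j)]: the set of non-terminals that span words[i:j]."""
    n = len(words)
    chart = defaultdict(set)  # missing cells behave like empty sets
    for i, word in enumerate(words, start=1):
        chart[(i - 1, i)] |= lexical.get(word, set())
    for s in range(2, n + 1):            # span length
        for i in range(n - s + 1):       # left edge of the span
            for k in range(i + 1, i + s):  # split point
                for lhs, b, c in binary:
                    if b in chart[(i, k)] and c in chart[(k, i + s)]:
                        chart[(i, i + s)].add(lhs)
    return chart


words = "john saw the girl in a car".split()
chart = cyk(words, LEXICAL, BINARY)
print(sorted(chart[(0, len(words))]))
```

The three nested loops over span length, left edge and split point give the cubic dependence on sentence length; the innermost loop over the rules adds a factor that depends only on the grammar.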
CYK Chart Example
Grammar:
   S → NP VP | N VP | N V | NP V
   VP → V NP | V N | VP PP
   NP → D N | NP PP | N PP
   PP → P NP | P N
   N → john, girl, car
   V → saw, walks
   P → in
   D → the, a
Input, with string positions:
   0 john 1 saw 2 the 3 girl 4 in 5 a 6 car 7
CYK Chart Example (filling the chart by span length; C[i][j] covers positions i to j)
   Length 1: C[0][1] = {N}, C[1][2] = {V}, C[2][3] = {D}, C[3][4] = {N}, C[4][5] = {P}, C[5][6] = {D}, C[6][7] = {N}
   Length 2: C[0][2] = {S}, C[2][4] = {NP}, C[5][7] = {NP}
   Length 3: C[1][4] = {VP}, C[4][7] = {PP}
   Length 4: C[0][4] = {S}, C[3][7] = {NP}
   Length 5: C[2][7] = {NP}
   Length 6: C[1][7] = {VP}
   Length 7: C[0][7] = {S}
   All other cells stay empty.
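A chart cell only records that a category spans a substring, not how many analyses it has. As a hedged illustration of how ambiguity shows up, the following sketch counts derivations per cell instead of storing sets; the function name `count_parses` and the rule encoding are assumptions made for this example. For the example sentence the VP over "saw the girl in a car" can be built in two ways (V NP or VP PP), so the count for S is 2.

```python
from collections import defaultdict

# Grammar of the chart example; (lhs, B, C) stands for lhs -> B C.
BINARY = [
    ("S", "NP", "VP"), ("S", "N", "VP"), ("S", "N", "V"), ("S", "NP", "V"),
    ("VP", "V", "NP"), ("VP", "V", "N"), ("VP", "VP", "PP"),
    ("NP", "D", "N"), ("NP", "NP", "PP"), ("NP", "N", "PP"),
    ("PP", "P", "NP"), ("PP", "P", "N"),
]
LEXICAL = {
    "john": {"N"}, "girl": {"N"}, "car": {"N"},
    "saw": {"V"}, "walks": {"V"},
    "in": {"P"},
    "the": {"D"}, "a": {"D"},
}


def count_parses(words, lexical, binary, start="S"):
    """Count the parse trees rooted in start that cover the whole input."""
    n = len(words)
    count = defaultdict(int)  # count[(i, j, A)]: trees for A over words[i:j]
    for i, word in enumerate(words):
        for a in lexical.get(word, ()):
            count[(i, i + 1, a)] += 1
    for s in range(2, n + 1):
        for i in range(n - s + 1):
            j = i + s
            for k in range(i + 1, j):
                for a, b, c in binary:
                    # Every pairing of a left subtree with a right subtree
                    # yields one distinct tree for a over words[i:j].
                    count[(i, j, a)] += count[(i, k, b)] * count[(k, j, c)]
    return count[(0, n, start)]


print(count_parses("john saw the girl in a car".split(), LEXICAL, BINARY))
print(count_parses("john walks".split(), LEXICAL, BINARY))
```

Because counts for shared subtrees are computed once and reused, the number of analyses is obtained without enumerating the trees.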