

  1. CSE 447/547 Natural Language Processing Winter 2018 Parsing (Trees) Yejin Choi - University of Washington [Slides from Dan Klein, Michael Collins, Luke Zettlemoyer and Ray Mooney]

  2. Ambiguities

  3. I shot [an elephant] [in my pajamas] Examples from J&M

  4. Syntactic Ambiguities I
     § Prepositional phrases: They cooked the beans in the pot on the stove with handles.
     § Particle vs. preposition: The puppy tore up the staircase.
     § Complement structures: The tourists objected to the guide that they couldn't hear. She knows you like the back of her hand.
     § Gerund vs. participial adjective: Visiting relatives can be boring. Changing schedules frequently confused passengers.

  5. Syntactic Ambiguities II
     § Modifier scope within NPs: impractical design requirements; plastic cup holder
     § Multiple gap constructions: The chicken is ready to eat. The contractors are rich enough to sue.
     § Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.

  6. Dark Ambiguities
     § Dark ambiguities: most analyses are shockingly bad (meaning, they don't have an interpretation you can get your mind around). This analysis corresponds to the correct parse of "This will panic buyers!"
     § Unknown words and new usages
     § Solution: we need mechanisms to focus attention on the best ones; probabilistic techniques do this

  7. Probabilistic Context-Free Grammars

  8. Probabilistic Context-Free Grammars
     § A context-free grammar is a tuple <N, Σ, S, R>
        § N: the set of non-terminals
           § Phrasal categories: S, NP, VP, ADJP, etc.
           § Parts-of-speech (pre-terminals): NN, JJ, DT, VB, etc.
        § Σ: the set of terminals (the words)
        § S: the start symbol
           § Often written as ROOT or TOP
           § Not usually the sentence non-terminal S
        § R: the set of rules
           § Of the form X → Y_1 Y_2 … Y_n, with X ∈ N, n ≥ 0, Y_i ∈ (N ∪ Σ)
           § Examples: S → NP VP, VP → VP CC VP
     § A PCFG adds a distribution q:
        § A probability q(r) for each r ∈ R, such that for all X ∈ N:
          ∑_{α → β ∈ R : α = X} q(α → β) = 1
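
A minimal sketch of this definition in Python (illustrative code, not from the slides; the rules are those of the example grammar on the next slide), with a check of the constraint that the rule probabilities for each left-hand side sum to 1:

```python
from collections import defaultdict

# q(lhs -> rhs): rhs is a tuple of non-terminals and/or terminal words
rules = {
    ("S",  ("NP", "VP")):    1.0,
    ("VP", ("Vi",)):         0.4,
    ("VP", ("Vt", "NP")):    0.4,
    ("VP", ("VP", "PP")):    0.2,
    ("NP", ("DT", "NN")):    0.3,
    ("NP", ("NP", "PP")):    0.7,
    ("PP", ("P", "NP")):     1.0,
    ("Vi", ("sleeps",)):     1.0,
    ("Vt", ("saw",)):        1.0,
    ("NN", ("man",)):        0.7,
    ("NN", ("woman",)):      0.2,
    ("NN", ("telescope",)):  0.1,
    ("DT", ("the",)):        1.0,
    ("IN", ("with",)):       0.5,
    ("IN", ("in",)):         0.5,
}

# PCFG constraint: for every non-terminal X, the q(X -> beta) over its rules sum to 1
totals = defaultdict(float)
for (lhs, _rhs), q in rules.items():
    totals[lhs] += q
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}, not 1"
```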

  9. PCFG Example
     Grammar rules with probabilities q(α → β):
       S  → NP VP    1.0        Vi → sleeps     1.0
       VP → Vi       0.4        Vt → saw        1.0
       VP → Vt NP    0.4        NN → man        0.7
       VP → VP PP    0.2        NN → woman      0.2
       NP → DT NN    0.3        NN → telescope  0.1
       NP → NP PP    0.7        DT → the        1.0
       PP → P NP     1.0        IN → with       0.5
                                IN → in         0.5
     • The probability of a tree t with rules α_1 → β_1, α_2 → β_2, …, α_n → β_n is
       p(t) = ∏_{i=1}^{n} q(α_i → β_i)
       where q(α → β) is the probability for rule α → β.
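
To make the product concrete, here is a small illustrative sketch (not from the lecture; the nested-tuple tree encoding is my own choice) that multiplies rule probabilities bottom-up over the tree t1 from the next slide:

```python
import math

q = {  # q(alpha -> beta) for the rules used in t1
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.3,
    ("DT", ("the",)):     1.0,
    ("NN", ("man",)):     0.7,
    ("VP", ("Vi",)):      0.4,
    ("Vi", ("sleeps",)):  1.0,
}

# A tree node is (label, children...); a leaf is just the word string.
t1 = ("S",
      ("NP", ("DT", "the"), ("NN", "man")),
      ("VP", ("Vi", "sleeps")))

def tree_prob(node):
    """Multiply q(rule) over every rule used in the tree."""
    if isinstance(node, str):      # terminal words contribute no rule
        return 1.0
    label, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return q[(label, rhs)] * math.prod(tree_prob(c) for c in children)

print(tree_prob(t1))   # 1.0 * 0.3 * 1.0 * 0.7 * 0.4 * 1.0 = 0.084
```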

  10. PCFG Example
      Tree t1 for "The man sleeps":
        rules: S → NP VP (1.0), NP → DT NN (0.3), DT → the (1.0), NN → man (0.7), VP → Vi (0.4), Vi → sleeps (1.0)
        p(t1) = 1.0 × 0.3 × 1.0 × 0.7 × 0.4 × 1.0
      Tree t2 for "The man saw the woman with the telescope" (PP attached to the VP):
        rules: S → NP VP (1.0), NP → DT NN (0.3), DT → the (1.0), NN → man (0.7), VP → VP PP (0.2), VP → Vt NP (0.4), Vt → saw (1.0), NP → DT NN (0.3), DT → the (1.0), NN → woman (0.2), PP → P NP (1.0), IN → with (0.5), NP → DT NN (0.3), DT → the (1.0), NN → telescope (0.1)
        p(t2) = 1.0 × 0.3 × 1.0 × 0.7 × 0.2 × 0.4 × 1.0 × 0.3 × 1.0 × 0.2 × 1.0 × 0.5 × 0.3 × 1.0 × 0.1

  11. PCFGs: Learning and Inference
      § Model: the probability of a tree t with n rules α_i → β_i, i = 1..n, is
        p(t) = ∏_{i=1}^{n} q(α_i → β_i)
      § Learning: read the rules off of labeled sentences and use ML estimates for the probabilities
        q_ML(α → β) = Count(α → β) / Count(α)
        and use all of our standard smoothing tricks!
      § Inference: for an input sentence s, define T(s) to be the set of trees whose yield is s
        (whose leaves, read left to right, match the words in s)
        t*(s) = arg max_{t ∈ T(s)} p(t)
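
A minimal sketch of the learning step (illustrative code; the nested-tuple tree encoding and helper names are mine, and smoothing is omitted): count each rule and each left-hand side over the treebank, then divide.

```python
from collections import Counter

def count_rules(tree, rule_counts, lhs_counts):
    """Accumulate Count(alpha -> beta) and Count(alpha) from one tree."""
    if isinstance(tree, str):          # a terminal word: nothing to count
        return
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1
    for c in children:
        count_rules(c, rule_counts, lhs_counts)

def ml_estimates(treebank):
    """q_ML(alpha -> beta) = Count(alpha -> beta) / Count(alpha), unsmoothed."""
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in treebank:
        count_rules(tree, rule_counts, lhs_counts)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
```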

  12. Dynamic Programming
      § We will store: π(i, j, X) = score of the max parse of x_i to x_j with root non-terminal X
      § So we can compute the most likely parse: π(1, n, S) = max_{t ∈ T_G(s)} p(t)
      § Via the recursion:
        π(i, j, X) = max_{X → Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )
      § With base case:
        π(i, i, X) = q(X → x_i) if X → x_i ∈ R, and 0 otherwise

  13. The CKY Algorithm
      § Input: a sentence s = x_1 … x_n and a PCFG = <N, Σ, S, R, q>
      § Initialization: for i = 1 … n and all X in N,
        π(i, i, X) = q(X → x_i) if X → x_i ∈ R, and 0 otherwise
      § For l = 1 … (n−1)  [iterate all phrase lengths]
        § For i = 1 … (n−l) and j = i+l  [iterate all phrases of length l]
          § For all X in N  [iterate all non-terminals]
            π(i, j, X) = max_{X → Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )
          § Also, store back pointers:
            bp(i, j, X) = arg max_{X → Y Z ∈ R, s ∈ {i…(j−1)}} ( q(X → Y Z) × π(i, s, Y) × π(s+1, j, Z) )
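
The pseudocode above translates fairly directly into Python. The sketch below is illustrative (0-indexed spans; the dictionary layout is my own, and the grammar is assumed to be in Chomsky normal form, split into binary and lexical rules). The best tree can be recovered afterwards by following the back pointers down from (0, n−1, S).

```python
def cky(words, binary, lexical, nonterminals, start="S"):
    """binary[(X, Y, Z)] = q(X -> Y Z); lexical[(X, w)] = q(X -> w).
    `nonterminals` must contain every left-hand-side symbol, preterminals included."""
    n = len(words)
    pi = {}   # pi[(i, j, X)] = score of the best parse of words[i..j] with root X
    bp = {}   # back pointers: bp[(i, j, X)] = (split point s, Y, Z)

    # Initialization: spans of length 1
    for i, w in enumerate(words):
        for X in nonterminals:
            pi[(i, i, X)] = lexical.get((X, w), 0.0)

    # Iterate phrase lengths, then start positions, then non-terminals
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            for X in nonterminals:
                best, best_bp = 0.0, None
                for (lhs, Y, Z), q in binary.items():
                    if lhs != X:
                        continue
                    for s in range(i, j):      # split point
                        score = q * pi[(i, s, Y)] * pi[(s + 1, j, Z)]
                        if score > best:
                            best, best_bp = score, (s, Y, Z)
                pi[(i, j, X)] = best
                bp[(i, j, X)] = best_bp

    return pi[(0, n - 1, start)], bp
```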

  14. Probabilistic CKY Parser
      Example grammar (Mooney), parsing "Book the flight through Houston":
        S  → NP VP                        0.8
        S  → X1 VP                        0.1
        X1 → Aux NP                       1.0
        S  → book | include | prefer      0.01 | 0.004 | 0.006
        S  → Verb NP                      0.05
        S  → VP PP                        0.03
        NP → I | he | she | me            0.1 | 0.02 | 0.02 | 0.06
        NP → Houston | NWA                0.16 | 0.04
        NP → Det Nominal                  0.6
        Det → the | a | an                0.6 | 0.1 | 0.05
        Nominal → book | flight | meal | money   0.03 | 0.15 | 0.06 | 0.06
        Nominal → Nominal Nominal         0.2
        Nominal → Nominal PP              0.5
        Verb → book | include | prefer    0.5 | 0.04 | 0.06
        VP → Verb NP                      0.5
        VP → VP PP                        0.3
        Prep → through | to | from        0.2 | 0.3 | 0.3
        PP → Prep NP                      1.0
      CKY chart (best score for each constituent over each span):
        [Book]:                          Verb .5,  Nominal .03,  S .01
        [the]:                           Det .6
        [flight]:                        Nominal .15
        [through]:                       Prep .2
        [Houston]:                       NP .16
        [the flight]:                    NP  .6 × .6 × .15 = .054
        [through Houston]:               PP  1.0 × .2 × .16 = .032
        [Book the flight]:               VP  .5 × .5 × .054 = .0135,   S  .05 × .5 × .054 = .00135
        [flight through Houston]:        Nominal  .5 × .15 × .032 = .0024
        [the flight through Houston]:    NP  .6 × .6 × .0024 = .000864
        [Book the flight through Houston]:  S  .05 × .5 × .000864 = .0000216 (via S → Verb NP),
                                            S  .03 × .0135 × .032 = .00001296 (via S → VP PP)

  15. Probabilistic CKY Parser
      [Same chart as above, highlighting parse tree #1: S → Verb NP with the PP attached inside the NP, probability .0000216]
      Pick the most probable parse, i.e. take the max to combine probabilities of multiple derivations of each constituent in each cell.

  16. Probabilistic CKY Parser
      [Same chart, highlighting parse tree #2: S → VP PP with the PP attached at the VP level, probability .00001296]
      Pick the most probable parse, i.e. take the max to combine probabilities of multiple derivations of each constituent in each cell.

  17. Memory
      § How much memory does this require?
        § Have to store the score cache
        § Cache size: |symbols| × n²
      § Pruning: Beam Search
        § score[X][i][j] can get too large (when?)
        § Can keep beams (truncated maps score[i][j]) which only store the best K scores for the span [i, j]
      § Pruning: Coarse-to-Fine
        § Use a smaller grammar to rule out most X[i,j]
        § Much more on this later…
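
A minimal sketch of the span-level beam (illustrative; the helper name is mine): after a chart cell is filled, keep only its K highest-scoring non-terminals so later spans combine fewer entries.

```python
import heapq

def prune_span(cell_scores, K=20):
    """cell_scores: dict mapping non-terminal X -> score for one span [i, j].
    Returns a truncated dict containing only the K highest-scoring entries."""
    best = heapq.nlargest(K, cell_scores.items(), key=lambda kv: kv[1])
    return dict(best)
```

In the CKY sketch above, this would be applied to the map {X: π(i, j, X)} for a span right after that span's cell is filled.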

  18. Time: Theory
      § How much time will it take to parse?
        § For each diff (:= j − i)  (≤ n)
        § For each i  (≤ n)
        § For each rule X → Y Z
        § For each split point k: do constant work
        [Diagram: X spans [i, j], split at k into Y over [i, k] and Z over [k+1, j]]
      § Total time: |rules| × n³
      § Something like 5 sec for an unoptimized parse of a 20-word sentence

  19. Time: Practice
      § Parsing with the vanilla treebank grammar: ~20K rules (not an optimized parser!)  Observed exponent: 3.6
      § Why's it worse in practice?
        § Longer sentences "unlock" more of the grammar
        § All kinds of systems issues don't scale

  20. Other Dynamic Programs
      Can also compute other quantities:
      § Best Inside: score of the max parse of w_i to w_j with root non-terminal X
      § Best Outside: score of the max parse of w_0 to w_n with a gap from w_i to w_j rooted with non-terminal X
        § See the notes for the derivation; it is a bit more complicated
      § Sum Inside/Outside: do sums instead of maxes
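
For the "sum" variant of the inside score, replacing max with sum in the earlier CKY sketch gives inside probabilities, and the value at the full span under S is the total probability of the sentence, p(s) = Σ_{t ∈ T(s)} p(t). A minimal sketch under the same assumptions as before (illustrative, CNF grammar, 0-indexed spans):

```python
def inside_sums(words, binary, lexical, nonterminals, start="S"):
    """Same chart as CKY, but summing over rules and split points instead of maxing."""
    n = len(words)
    inside = {}
    for i, w in enumerate(words):
        for X in nonterminals:
            inside[(i, i, X)] = lexical.get((X, w), 0.0)
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            for X in nonterminals:
                total = 0.0
                for (lhs, Y, Z), q in binary.items():
                    if lhs != X:
                        continue
                    for s in range(i, j):
                        total += q * inside[(i, s, Y)] * inside[(s + 1, j, Z)]
                inside[(i, j, X)] = total
    return inside[(0, n - 1, start)]   # total probability of the sentence
```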

  21. Why Chomsky Normal Form?
      [Same "Book the flight through Houston" chart as above]
      Inference:
      § Can we keep N-ary (N > 2) rules and still do dynamic programming?
      § Can we keep unary rules and still do dynamic programming?
      Learning:
      § Can we reconstruct the original trees?
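
One standard way to get a grammar into the binary form CKY needs is to binarize each n-ary rule with intermediate symbols that record where they came from, so the original tree can be reconstructed by splicing those symbols back out. A simplified sketch (the "@" naming scheme is my own and assumes no two n-ary rules with the same left-hand side share their first child):

```python
def binarize(lhs, rhs, prob):
    """Turn one rule lhs -> rhs (len(rhs) > 2) into a chain of binary rules.
    The original probability stays on the first rule; the rest get probability 1."""
    rules = []
    current, remaining = lhs, list(rhs)
    while len(remaining) > 2:
        first = remaining.pop(0)
        new_sym = f"@{lhs}->{first}"     # hypothetical intermediate symbol
        rules.append((current, (first, new_sym), prob if current == lhs else 1.0))
        current = new_sym
    rules.append((current, tuple(remaining), prob if current == lhs else 1.0))
    return rules

# binarize("VP", ("Verb", "NP", "PP"), 0.5) ->
#   [("VP", ("Verb", "@VP->Verb"), 0.5), ("@VP->Verb", ("NP", "PP"), 1.0)]
```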

  22. Treebanks
