

  1. Natural Language Processing Lecture 13—2/26/2015 Martha Palmer

  2. Today • Start on Parsing • Top-down vs. Bottom-up (Speech and Language Processing - Jurafsky and Martin, 2/26/15)

  3. Summary • Context-free grammars can be used to model various facts about the syntax of a language. • When paired with parsers, such grammars constitute a critical component in many applications. • Constituency is a key phenomenon easily captured with CFG rules. • But agreement and subcategorization do pose significant problems. • Treebanks pair sentences in a corpus with their corresponding trees.

  4. Parsing • Parsing with CFGs refers to the task of assigning proper trees to input strings. • Proper here means a tree that covers all and only the elements of the input and has an S at the top. • It doesn't actually mean that the system can select the correct tree from among all the possible trees.
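The slide's notion of a "proper" tree can be made concrete with a short sketch. The nested-tuple tree representation and helper names below are hypothetical, not from the lecture: a tree is proper for an input if it is rooted at S and its leaves are exactly the input words.

```python
# Trees as nested tuples: (label, child, child, ...); a leaf is a plain string.
# (Hypothetical representation chosen for illustration.)

def leaves(tree):
    """Return the left-to-right leaf words of a tree."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def is_proper(tree, words):
    """True if the tree has S at the top and covers all and only the input."""
    return tree[0] == "S" and leaves(tree) == words

tree = ("S",
        ("NP", ("Det", "the"), ("N", "cat")),
        ("VP", ("V", "sat")))
print(is_proper(tree, ["the", "cat", "sat"]))  # True
```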

  5. Automatic Syntactic Parse

  6. For Now • Assume… • You have all the words already in some buffer • The input is not POS tagged prior to parsing • We won't worry about morphological analysis • All the words are known • These are all problematic in various ways, and would have to be addressed in real applications.

  7. Top-Down Search • Since we're trying to find trees rooted with an S (Sentences), why not start with the rules that give us an S. • Then we can work our way down from there to the words.
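The top-down strategy can be sketched as a breadth-first expansion of the search space. The toy grammar and words below are illustrative assumptions, not from the lecture: start from S and repeatedly rewrite the leftmost nonterminal, working down toward the words.

```python
from collections import deque

# A toy grammar (assumed for illustration).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V"], ["V", "NP"]],
    "Det": [["the"]],
    "N":   [["cat"], ["mat"]],
    "V":   [["sat"]],
}

def expand_topdown(max_steps=20):
    """Breadth-first expansion of the top-down search space from S."""
    frontier = deque([["S"]])
    seen = []
    while frontier and len(seen) < max_steps:
        sentential = frontier.popleft()
        seen.append(sentential)
        # Expand the leftmost nonterminal with every applicable rule.
        for i, sym in enumerate(sentential):
            if sym in GRAMMAR:
                for rhs in GRAMMAR[sym]:
                    frontier.append(sentential[:i] + rhs + sentential[i + 1:])
                break
    return seen

for s in expand_topdown(6):
    print(" ".join(s))
```

The first few lines printed are `S`, `NP VP`, `Det N VP`: exactly the top of the top-down space the next slide pictures.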

  8. Top Down Space

  9. Bottom-Up Parsing • Of course, we also want trees that cover the input words. So we might also start with trees that link up with the words in the right way. • Then work your way up from there to larger and larger trees.
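One concrete bottom-up strategy is shift-reduce, sketched below with an assumed toy grammar and lexicon (not from the lecture): shift each word's category onto a stack, and reduce whenever the top of the stack matches a rule's right-hand side.

```python
RULES = [
    ("S",  ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("VP", ["V", "NP"]),
    ("VP", ["V"]),
]
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "sat": "V"}

def shift_reduce(words):
    """Greedy shift-reduce recognizer: succeed if the stack ends as [S]."""
    stack = []
    for word in words:
        stack.append(LEXICON[word])          # shift (POS lookup)
        reduced = True
        while reduced:                       # reduce as long as a rule fires
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    stack[-len(rhs):] = [lhs]
                    reduced = True
                    break
    return stack == ["S"]

print(shift_reduce("the cat sat".split()))  # True
```

Note that the greedy reduction can commit too early on some inputs (e.g. reducing VP → V when the verb actually takes an object), which is exactly the control problem taken up on the Control slide.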

  10. Bottom-Up Search

  11. Bottom-Up Search

  12. Bottom-Up Search

  13. Bottom-Up Search

  14. Bottom-Up Search

  15. Control • Of course, in both cases we left out how to keep track of the search space and how to make choices • Which node to try to expand next • Which grammar rule to use to expand a node • One approach is called backtracking. • Make a choice; if it works out, fine • If not, then back up and make a different choice • Same as with ND-Recognize
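Backtracking control can be sketched with Python generators, again assuming a toy grammar and lexicon of my own: each rule is a choice point, and exhausting one alternative and moving to the next is exactly "back up and make a different choice".

```python
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "sat": "V"}

def parse(cat, words, i):
    """Yield (tree, next_index) for every way cat can start at words[i]."""
    if i < len(words) and LEXICON.get(words[i]) == cat:
        yield (cat, words[i]), i + 1
    for rhs in GRAMMAR.get(cat, []):        # each rule is a choice point
        yield from parse_seq(rhs, words, i, (cat,))
    # Falling through here means this choice failed: the caller backtracks.

def parse_seq(cats, words, i, tree):
    """Match a sequence of categories left to right, backtracking on failure."""
    if not cats:
        yield tree, i
        return
    for subtree, j in parse(cats[0], words, i):
        yield from parse_seq(cats[1:], words, j, tree + (subtree,))

words = "the cat sat".split()
trees = [t for t, j in parse("S", words, 0) if j == len(words)]
print(trees[0])
```

Here VP → V NP is tried first and fails (no NP after "sat"), so the parser backs up and succeeds with VP → V.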

  16. Problems • Even with the best filtering, backtracking methods are doomed because of two inter-related problems • Ambiguity and search control (choice) • Shared subproblems

  17. Ambiguity

  18. Structural Ambiguities • It's very important to separate PPs that are part of the verb's subcategorization frame from PPs that modify the entire event. • The man saw the woman on the hill with the telescope. (Woman has telescope) • The man saw the woman on the hill with the telescope. (Man has telescope)

  19. Shared Sub-Problems • No matter what kind of search (top-down or bottom-up or mixed) we choose... • We can't afford to redo work we've already done. • Without some help, naïve backtracking will lead to such duplicated work.
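The standard remedy is to store solved subproblems in a table (a chart). A minimal sketch under an assumed toy grammar with binary rules: memoize the question "can category C derive words i..j?" so that no span is ever re-analyzed, no matter how many backtracking paths ask about it.

```python
from functools import lru_cache

# Toy grammar with binary rules only (assumed for illustration).
GRAMMAR = {
    "S":  (("NP", "VP"),),
    "NP": (("Det", "N"), ("NP", "PP")),
    "PP": (("P", "NP"),),
    "VP": (("V", "NP"), ("V", "PP"), ("VP", "PP")),
}
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "sat": "V", "on": "P"}
WORDS = tuple("the cat sat on the mat".split())

@lru_cache(maxsize=None)
def spans(cat, i, j):
    """True if cat can derive WORDS[i:j]; cached results are the 'chart'."""
    if j == i + 1 and LEXICON.get(WORDS[i]) == cat:
        return True
    for rhs in GRAMMAR.get(cat, ()):
        left, right = rhs
        # Try every split point; sub-results come from the cache when shared.
        if any(spans(left, i, k) and spans(right, k, j)
               for k in range(i + 1, j)):
            return True
    return False

print(spans("S", 0, len(WORDS)))  # True
```

This memoized recognizer is the core idea behind chart parsers such as CKY and Earley, which the course builds toward.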

  20. Sample L1 Grammar

  21. State space representations: Recursive transition nets [RTN diagrams: an S net S1 →NP→ S2 →VP→ S3, and an NP net over states S4, S5, S6 with det, adj, noun, pronoun, NP, and PP arcs] • s :- np, vp. • np :- pronoun; noun; det, adj, noun; np, pp. (CSE391 – NLP, 2005)
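The NP net can be sketched as a transition table. The state names below are hypothetical placeholders for the diagram's states, and the recursive NP and PP arcs (which would call other nets) are omitted for brevity: arcs are labelled with word categories, and reaching a final state means an NP has been consumed.

```python
# RTN sketch for the NP net (hypothetical state names).
NP_NET = {
    "q0": [("pronoun", "qf"), ("noun", "qf"), ("det", "q1")],
    "q1": [("adj", "q2")],
    "q2": [("noun", "qf")],
}
FINAL = "qf"
LEXICON = {"the": "det", "old": "adj", "cat": "noun", "she": "pronoun"}

def np_spans(words, i, state="q0"):
    """Yield every index j such that the NP net accepts words[i:j]."""
    if state == FINAL:
        yield i
        return
    if i < len(words):
        for label, nxt in NP_NET.get(state, []):
            if LEXICON.get(words[i]) == label:
                yield from np_spans(words, i + 1, nxt)

print(list(np_spans("the old cat sat".split(), 0)))  # [3]
```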

  22. State space representations: Recursive transition nets, cont. [RTN diagram: a VP net over states S7–S14 with V, NP, PP, and aux arcs] • VP :- VP, PP. • VP :- V; V, NP; V, NP, NP; V, NP, PP.

  23. Parses: The cat sat on the mat. S1: S → NP, VP [partial tree: S dominating NP and VP]

  24. Parses: The cat sat on the mat. S1: S → NP, VP; S2: NP → Det, N [partial tree: NP expanded to Det N over "the cat"]

  25. Parses: The cat sat on the mat. S1: S → NP, VP; S2: NP → Det, N; S3: VP → V [partial tree: VP expanded to V over "sat"]

  26. Parses: The cat sat on the mat. S1: S → NP, VP; S2: NP → Det, N; S3: VP → V; S4: VP → VP, PP [completed tree: PP "on the mat" attached, with Prep "on" and NP → Det N over "the mat"]

  27. Multiple parses for a single sentence: Time flies like an arrow. [parse 1: S → NP(N time) VP(V flies, PP(Prep like, NP(Det an, N arrow)))]

  28. Multiple parses for a single sentence: Time flies like an arrow. [parse 2: S → NP(N time, N flies) VP(V like, NP(Det an, N arrow))]

  29. Lexicon noun(cat). noun(flies). noun(mat). noun(time). noun(arrow). det(the). det(a). det(an). verb(sat). verb(flies). verb(time). prep(on). prep(like).

  30. Lexicon with Roots noun(cat,cat). noun(flies,fly). noun(mat,mat). noun(time,time). noun(arrow,arrow). det(the,the). det(a,a). det(an,an). verb(sat,sit). verb(flies,fly). verb(time,time). prep(on,on). prep(like,like).
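The root-annotated lexicon translates naturally into a Python dict. This sketch (my own representation, not from the slides) maps each surface form to its list of (POS, root) analyses, which keeps the ambiguity of words like "flies" explicit.

```python
# Surface form -> list of (part-of-speech, root) analyses.
LEXICON = {
    "cat":   [("noun", "cat")],
    "flies": [("noun", "fly"), ("verb", "fly")],
    "mat":   [("noun", "mat")],
    "time":  [("noun", "time"), ("verb", "time")],
    "arrow": [("noun", "arrow")],
    "the":   [("det", "the")],
    "a":     [("det", "a")],
    "an":    [("det", "an")],
    "sat":   [("verb", "sit")],
    "on":    [("prep", "on")],
    "like":  [("prep", "like")],
}

def analyses(word):
    """All (POS, root) pairs for a surface form; empty list if unknown."""
    return LEXICON.get(word, [])

print(analyses("flies"))  # [('noun', 'fly'), ('verb', 'fly')]
```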

  31. Parses: The old can can hold the water. [tree: S → NP(det the, adj old, N can) VP(aux can, V hold, NP(det the, N water))]

  32. Structural ambiguities • That factory can can tuna. • That factory cans cans of tuna and salmon.

  33. Lexicon: The old can can hold the water. Noun(can,can) Noun(cans,can) Noun(water,water) Noun(hold,hold) Noun(holds,hold) Noun(old,old) Verb(hold,hold) Verb(holds,hold) Verb(can,can) Aux(can,can) Adj(old,old) Det(the,the)

  34. Simple Context Free Grammar in BNF S → NP VP NP → Pronoun | Noun | Det Adj Noun | NP PP PP → Prep NP V → Verb | Aux Verb VP → V | V NP | V NP NP | V NP PP | VP PP
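The BNF grammar above can be written directly as a Python dict, with each `|` alternative becoming one right-hand-side list. The alternatives are kept in the order a top-down parser would try them, which matters for the parse traces on the following slides.

```python
# The slide's BNF grammar as a Python data structure.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Pronoun"], ["Noun"], ["Det", "Adj", "Noun"], ["NP", "PP"]],
    "PP": [["Prep", "NP"]],
    "V":  [["Verb"], ["Aux", "Verb"]],
    "VP": [["V"], ["V", "NP"], ["V", "NP", "NP"], ["V", "NP", "PP"],
           ["VP", "PP"]],
}

# e.g. the four NP alternatives a top-down parser tries, in order:
for rhs in GRAMMAR["NP"]:
    print("NP ->", " ".join(rhs))
```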

  35. Top-down parse in progress [The, old, can, can, hold, the, water] S → NP VP. NP → Pronoun? fail. NP → Noun? fail. NP → Det Adj Noun? Det? the. Adj? old. Noun? can. Succeed. Next: VP?

  36. Top-down parse in progress [can, hold, the, water] VP → V? V → Verb? fail. V → Aux Verb? Aux? can. Verb? hold. Succeed locally, but VP → V leaves [the, water] unconsumed: fail.

  37. Top-down parse in progress [can, hold, the, water] VP → V NP? V → Verb? fail. V → Aux Verb? Aux? can. Verb? hold. NP → Pronoun? fail. NP → Noun? fail. NP → Det Adj Noun? Det? the. Noun? water. SUCCEED.

  38. Top-down approach • Start with the goal of a sentence: S → NP VP, S → Wh-word Aux NP VP • Will try to find an NP 4 different ways before trying a parse where the verb comes first. • What would be better?

  39. Bottom-up approach • Start with the words in the sentence. • What structures do they correspond to? • Once a structure is built, it is kept on a CHART.

  40. Bottom-up parse in progress: The old can can hold the water. [chart tag rows: det adj noun aux verb det noun / det noun aux/verb noun/verb noun det noun]

  41. Bottom-up parse in progress: The old can can hold the water. [chart now also holds larger constituents (NPs, V, VPs, S) built over the tag rows det adj noun aux verb det noun / det noun aux/verb noun/verb noun det noun]

  42. Bottom-up parse in progress: What is wrong with the bottom parse? The old can can hold the water. [tag rows: det adj noun aux verb det noun / det noun aux/verb noun/verb noun det noun/verb]

  43. Bottom-up parse, corrected: The old can can hold the water. [corrected tag row: det noun verb noun noun det noun/verb, with constituents NP, NP, V, NP, VP, S]

  44. Headlines • Police Begin Campaign To Run Down Jaywalkers • Iraqi Head Seeks Arms • Teacher Strikes Idle Kids • Miners Refuse To Work After Death • Juvenile Court To Try Shooting Defendant

  45. Headlines • Drunk Gets Nine Months in Violin Case • Enraged Cow Injures Farmer with Ax • Hospitals are Sued by 7 Foot Doctors • Milk Drinkers Turn to Powder • Lung Cancer in Women Mushrooms

  46. Top-down vs. Bottom-up • Top-down: helps with POS ambiguities (only considers relevant POS); rebuilds the same structure repeatedly; spends a lot of time on impossible parses (trees that are not consistent with any of the words). • Bottom-up: has to consider every POS; builds each structure once; spends a lot of time on useless structures (trees that make no sense globally). • What would be better?
