Statistical NLP, Spring 2011, Lecture 15: Parsing I


Dan Klein, UC Berkeley

Parse Trees
[Figure: parse tree for "The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market."]

Phrase Structure Parsing
- Phrase structure parsing organizes syntax into constituents or brackets.
- In general, this involves nested trees.
- Linguists can, and do, argue about details.
- Lots of ambiguity.
- Not the only kind of syntax…
[Figure: tree for "new art critics write reviews with computers", with nodes S, NP, N', VP, NP, PP]

Constituency Tests
- How do we know what nodes go in the tree?
- Classic constituency tests:
  - Substitution by proform
  - Question answers
  - Semantic grounds: coherence, reference, idioms
  - Dislocation
  - Conjunction
- Cross-linguistic arguments, too

Conflicting Tests
- Constituency isn't always clear.
- Units of transfer:
  - think about ~ penser à
  - talk about ~ hablar de
- Phonological reduction:
  - I will go → I'll go
  - I want to go → I wanna go
  - à le centre → au centre (French, "to the centre")
- Coordination:
  - He went to and came from the store.
[Figure: example phrase "La vélocité des ondes sismiques" ("the velocity of seismic waves")]

Classical NLP: Parsing
- Write symbolic or logical rules:
  Grammar (CFG):
    ROOT → S
    S → NP VP
    NP → DT NN
    NP → NN NNS
    NP → NP PP
    VP → VBP NP
    VP → VBP NP PP
    PP → IN NP
  Lexicon:
    NN → interest
    NNS → raises
    VBP → interest
    VBZ → raises
    …
- Use deduction systems to prove parses from words.
  - Minimal grammar on the "Fed raises" sentence: 36 parses
  - Simple 10-rule grammar: 592 parses
  - Real-size grammar: many millions of parses
- This scaled very badly and didn't yield broad-coverage tools.
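To see how quickly the space of candidate structures grows, here is a small count that is not from the slides: ignoring labels and grammar constraints entirely, the number of binary bracketings of an n-word sentence obeys a split-point recurrence (it equals the Catalan number C(n-1)). The function name `binary_bracketings` is made up for this sketch; it only illustrates growth, not the parse counts quoted above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def binary_bracketings(n: int) -> int:
    """Count unlabeled binary trees whose leaves are n words (n >= 1).

    The top split sends the first k words left and the remaining n - k
    words right, so count(n) = sum over k of count(k) * count(n - k).
    """
    if n == 1:
        return 1  # a single word has exactly one (trivial) structure
    return sum(binary_bracketings(k) * binary_bracketings(n - k)
               for k in range(1, n))


for n in (5, 10, 20):
    print(f"{n:2d} words: {binary_bracketings(n):,} binary bracketings")
```

For 5, 10, and 20 words this gives 14, 4,862, and 1,767,263,190 bracketings, which is why enumerating parses one by one is hopeless and the dynamic programs later in the lecture share work across spans.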

Ambiguities: PP Attachment
[Figure: alternative attachment sites for a prepositional phrase]

Attachments
- I cleaned the dishes from dinner.
- I cleaned the dishes with detergent.
- I cleaned the dishes in my pajamas.
- I cleaned the dishes in the sink.

Syntactic Ambiguities I
- Prepositional phrases: They cooked the beans in the pot on the stove with handles.
- Particle vs. preposition: The puppy tore up the staircase.
- Complement structures: The tourists objected to the guide that they couldn't hear. She knows you like the back of her hand.
- Gerund vs. participial adjective: Visiting relatives can be boring. Changing schedules frequently confused passengers.

Syntactic Ambiguities II
- Modifier scope within NPs: impractical design requirements; plastic cup holder
- Multiple gap constructions: The chicken is ready to eat. The contractors are rich enough to sue.
- Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.

Ambiguities as Trees
[Figure: ambiguous sentences drawn as alternative trees]

Probabilistic Context-Free Grammars
- A context-free grammar is a tuple $\langle N, T, S, R \rangle$:
  - N: the set of non-terminals
    - Phrasal categories: S, NP, VP, ADJP, etc.
    - Parts of speech (pre-terminals): NN, JJ, DT, VB
  - T: the set of terminals (the words)
  - S: the start symbol
    - Often written as ROOT or TOP
    - Not usually the sentence non-terminal S
  - R: the set of rules
    - Of the form $X \to Y_1 Y_2 \cdots Y_k$, with $X, Y_i \in N$
    - Examples: S → NP VP, VP → VP CC VP
    - Also called rewrites, productions, or local trees
- A PCFG adds a top-down production probability per rule, $P(Y_1 Y_2 \cdots Y_k \mid X)$.
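To make the PCFG definition concrete, here is a minimal sketch, assuming trees are nested tuples `(label, *children)` with words as string leaves, and the grammar is a dict mapping `(X, (Y1, ..., Yk))` to P(Y1 … Yk | X). Under a PCFG the probability of a tree is the product of the probabilities of the rules (local trees) it uses. The grammar, its numbers, and the helper names are all invented for illustration; lexical rules such as NNS → rates are kept in the same table for simplicity.

```python
from collections import defaultdict
from math import isclose

# Hypothetical toy PCFG: (X, (Y1, ..., Yk)) -> P(Y1 ... Yk | X).
# Lexical rules (tag -> word) live in the same table for simplicity.
PCFG = {
    ("ROOT", ("S",)): 1.0,
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("NNP",)): 0.3,
    ("NP", ("NN", "NNS")): 0.3,
    ("NP", ("NNS",)): 0.4,
    ("VP", ("VBZ", "NP")): 1.0,
    ("NNP", ("Fed",)): 1.0,
    ("VBZ", ("raises",)): 1.0,
    ("NN", ("interest",)): 1.0,
    ("NNS", ("rates",)): 0.5,
    ("NNS", ("raises",)): 0.5,
}


def local_rule(tree):
    """Return the rule (X, (Y1, ..., Yk)) used at the root of a tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return label, rhs


def tree_prob(tree, pcfg):
    """P(tree) = product of the probabilities of all rules in the tree."""
    p = pcfg.get(local_rule(tree), 0.0)  # unknown rules get probability 0
    for child in tree[1:]:
        if not isinstance(child, str):  # words are leaves, not subtrees
            p *= tree_prob(child, pcfg)
    return p


def is_normalized(pcfg):
    """Check that the rule probabilities for each left-hand side sum to 1."""
    totals = defaultdict(float)
    for (lhs, _), p in pcfg.items():
        totals[lhs] += p
    return all(isclose(total, 1.0) for total in totals.values())


tree = ("ROOT",
        ("S",
         ("NP", ("NNP", "Fed")),
         ("VP", ("VBZ", "raises"),
                ("NP", ("NN", "interest"), ("NNS", "rates")))))
# P(tree) = 0.3 * 0.3 * 0.5, i.e. about 0.045; every other rule used has probability 1
print(is_normalized(PCFG), tree_prob(tree, PCFG))
```

Storing P(Y1 … Yk | X) keyed by the whole right-hand side mirrors the "local tree" view of rules; the normalization check is what makes each left-hand side's rules a proper conditional distribution.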

Treebank Sentences
[Figure: example treebank sentences with their trees]

Treebank Grammars
- Need a PCFG for broad-coverage parsing.
- Can take a grammar right off the trees (doesn't work well):
    ROOT → S          1
    S → NP VP .       1
    NP → PRP          1
    VP → VBD ADJP     1
    …
- Better results by enriching the grammar (e.g., lexicalization).
- Can also get reasonable parsers without lexicalization.

Treebank Grammar Scale
- Treebank grammars can be enormous.
  - As FSAs, the raw grammar has ~10K states, excluding the lexicon.
  - Better parsers usually make the grammars larger, not smaller.
[Figure: NP rules drawn as a finite-state automaton over DET, ADJ, NOUN, PLURAL NOUN, NP, PP, CONJ]

Chomsky Normal Form
- Chomsky normal form: all rules of the form X → Y Z or X → w.
  - In principle, this is no limitation on the space of (P)CFGs.
    - N-ary rules introduce new non-terminals.
    - Unaries / empties are "promoted".
  - In practice it's kind of a pain:
    - Reconstructing n-aries is easy.
    - Reconstructing unaries is trickier.
    - The straightforward transformations don't preserve tree scores.
- Makes parsing algorithms simpler!
[Figure: VP → VBD NP PP PP binarized with intermediate symbols [VP → VBD NP PP •] and [VP → VBD NP •]]

A Recursive Parser

    bestScore(X,i,j,s)
      if (j == i+1)
        return tagScore(X,s[i])
      else
        return max over rules X->YZ and split points k of
               score(X->YZ) * bestScore(Y,i,k) * bestScore(Z,k,j)

- Will this parser work?
- Why or why not?
- Memory requirements?

A Memoized Parser
- One small change:

    bestScore(X,i,j,s)
      if (scores[X][i][j] == null)
        if (j == i+1)
          score = tagScore(X,s[i])
        else
          score = max over X->YZ, k of
                  score(X->YZ) * bestScore(Y,i,k) * bestScore(Z,k,j)
        scores[X][i][j] = score
      return scores[X][i][j]
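A runnable version of the memoized parser may help. This is a minimal Python sketch, assuming a hypothetical toy grammar already in Chomsky normal form with no unary rules (`BINARY`, `LEXICON`), and using `functools.lru_cache` in place of the `scores[X][i][j]` table. It returns only the best score, not the tree, and writes out the max over rules X → Y Z and split points k that the pseudocode leaves implicit.

```python
from functools import lru_cache

# Hypothetical toy grammar in Chomsky normal form (binary rules plus X -> w).
BINARY = [  # (X, Y, Z, P(Y Z | X))
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),
    ("NP", "NP", "PP", 0.3),
    ("PP", "P", "NP", 1.0),
]
LEXICON = {  # (X, word) -> P(word | X); plays the role of tagScore
    ("NP", "she"): 0.2,
    ("NP", "fish"): 0.3,
    ("NP", "chopsticks"): 0.2,
    ("V", "eats"): 1.0,
    ("P", "with"): 1.0,
}


def best_parse_score(words, root="S"):
    """Viterbi score of the best parse, via memoized top-down recursion."""

    @lru_cache(maxsize=None)  # caches each (X, i, j) so it is computed once
    def best_score(X, i, j):
        if j == i + 1:  # single word: use the tag score
            return LEXICON.get((X, words[i]), 0.0)
        best = 0.0
        for lhs, Y, Z, p in BINARY:  # max over rules X -> Y Z ...
            if lhs != X:
                continue
            for k in range(i + 1, j):  # ... and over split points k
                best = max(best, p * best_score(Y, i, k) * best_score(Z, k, j))
        return best

    return best_score(root, 0, len(words))


print(best_parse_score("she eats fish with chopsticks".split()))
```

With these made-up probabilities, attaching "with chopsticks" to the VP (0.2 · 0.4 · 0.6 · 0.3 · 0.2 = 0.00288) beats attaching it to "fish" (0.2 · 0.6 · 0.3 · 0.3 · 0.2 = 0.00216), so the call returns roughly 0.00288. Every recursive call is on a strictly shorter span, which is what guarantees termination here; same-span unary rules would remove that guarantee.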

A Bottom-Up Parser (CKY)
- Can also organize things bottom-up:

    bestScore(s)
      for (i : [0,n-1])
        for (X : tags[s[i]])
          score[X][i][i+1] = tagScore(X,s[i])
      for (diff : [2,n])
        for (i : [0,n-diff])
          j = i + diff
          for (X->YZ : rule)
            for (k : [i+1, j-1])
              score[X][i][j] = max(score[X][i][j],
                                   score(X->YZ) * score[Y][i][k] * score[Z][k][j])

[Figure: X spanning [i,j], built from Y over [i,k] and Z over [k,j]]

Unary Rules
- Unary rules?

    bestScore(X,i,j,s)
      if (j == i+1)
        return tagScore(X,s[i])
      else
        return max(max over X->YZ, k of score(X->YZ) * bestScore(Y,i,k) * bestScore(Z,k,j),
                   max over X->Y of score(X->Y) * bestScore(Y,i,j))

CNF + Unary Closure
- We need unaries to be non-cyclic.
- Can address by pre-calculating the unary closure.
- Rather than having zero or more unaries, always have exactly one.
- Alternate unary and binary layers.
- Reconstruct unary chains afterwards.
[Figure: a VP over VBD and NP (DT NN) with an SBAR/S/VP unary chain, before and after collapsing the chain into a single closed unary]

Alternating Layers

    bestScoreB(X,i,j,s)
      return max over X->YZ, k of score(X->YZ) * bestScoreU(Y,i,k) * bestScoreU(Z,k,j)

    bestScoreU(X,i,j,s)
      if (j == i+1)
        return tagScore(X,s[i])
      else
        return max over X->Y of score(X->Y) * bestScoreB(Y,i,j)

Memory
- How much memory does this require?
  - Have to store the score cache.
  - Cache size: $|\text{symbols}| \cdot n^2$ doubles.
  - For the plain treebank grammar: |symbols| ≈ 20K, n = 40, double ≈ 8 bytes, so $20{,}000 \times 40^2 \times 8 = 256{,}000{,}000$ bytes, about 256 MB.
  - Big, but workable.
- Pruning: Beams
  - score[X][i][j] can get too large (when?).
  - Can keep beams (truncated maps score[i][j]) which only store the best few scores for the span [i,j].
- Pruning: Coarse-to-Fine
  - Use a smaller grammar to rule out most X[i,j].
  - Much more on this later…

Time: Theory
- How much time will it take to parse?
  - For each diff (≤ n)
    - For each i (≤ n)
      - For each rule X → Y Z
        - For each split point k
          - Do constant work
- Total time: $|\text{rules}| \cdot n^3$
- Something like 5 sec for an unoptimized parse of a 20-word sentence.
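The bottom-up loops translate almost line for line into Python. The sketch below is illustrative only: it assumes a hypothetical CNF toy grammar with no unary rules (so there is no unary-closure layer) and stores each chart cell as a dict from symbol to score instead of a dense array. The helper `chart_bytes` is a name made up here; it simply reproduces the memory arithmetic for a dense cache of $|\text{symbols}| \cdot n^2$ doubles.

```python
# Hypothetical toy grammar in Chomsky normal form (no unary rules).
BINARY = [  # (X, Y, Z, P(Y Z | X))
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),
    ("NP", "NP", "PP", 0.3),
    ("PP", "P", "NP", 1.0),
]
LEXICON = {  # (X, word) -> P(word | X), i.e. tagScore
    ("NP", "she"): 0.2,
    ("NP", "fish"): 0.3,
    ("NP", "chopsticks"): 0.2,
    ("V", "eats"): 1.0,
    ("P", "with"): 1.0,
}


def cky(words):
    """Bottom-up CKY: chart[i][j] maps symbol X -> best score over words[i:j]."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):  # length-1 spans come from tag scores
        for (X, word), p in LEXICON.items():
            if word == w:
                chart[i][i + 1][X] = p
    for diff in range(2, n + 1):  # span length, shortest spans first
        for i in range(0, n - diff + 1):  # i ranges over [0, n-diff] inclusive
            j = i + diff
            cell = chart[i][j]
            for X, Y, Z, p in BINARY:
                for k in range(i + 1, j):  # k ranges over [i+1, j-1]
                    cand = p * chart[i][k].get(Y, 0.0) * chart[k][j].get(Z, 0.0)
                    if cand > cell.get(X, 0.0):  # keep the max; zeros are never stored
                        cell[X] = cand
    return chart


def chart_bytes(num_symbols, n, bytes_per_score=8):
    """Dense score-cache size: |symbols| * n^2 scores of bytes_per_score each."""
    return num_symbols * n * n * bytes_per_score


chart = cky("she eats fish with chopsticks".split())
print(chart[0][5].get("S"))
print(chart_bytes(20_000, 40) / 1e6, "MB")  # 256.0 MB
```

Because every span of length diff depends only on shorter spans, filling the chart in order of increasing length means each lookup finds a finished cell, with no recursion or memo table needed. The four nested loops (diff, i, rule, k) are exactly where the $|\text{rules}| \cdot n^3$ running time comes from.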

Time: Practice
- Parsing with the vanilla treebank grammar: ~20K rules (not an optimized parser!)
- Observed exponent: 3.6
- Why's it worse in practice?
  - Longer sentences "unlock" more of the grammar.
  - All kinds of systems issues don't scale.
[Figure: parse time vs. sentence length]

Efficient CKY
- Lots of tricks to make CKY efficient.
  - Most of them are little engineering details:
    - E.g., first choose k, then enumerate through the Y:[i,k] which are non-zero, then loop through rules by left child.
    - Optimal layout of the dynamic program depends on grammar, input, even system details.
  - Another kind is more critical:
    - Many X:[i,j] can be suppressed on the basis of the input string.
    - We'll see this next class as figures-of-merit or A* heuristics.

Same-Span Reachability
[Figure: reachability among symbols over the same span; labels not recoverable]

Rule State Reachability
- Many states are more likely to match larger spans!
[Figure: rule states matched against spans; labels not recoverable]

Unaries in Grammars
[Figure: unary and empty (ε) nodes in treebank trees; labels not recoverable]
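The first engineering trick above (choose k first, enumerate only the non-zero Y:[i,k], then loop over rules by left child) can be sketched as follows. This is a minimal illustration under assumed data structures, not the lecture's implementation: cells are sparse dicts that hold only non-zero scores, binary rules are indexed by their left child, and the grammar is the same kind of hypothetical CNF toy grammar. It does no pruning.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (no unary rules).
BINARY = [  # (X, Y, Z, P(Y Z | X))
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),
    ("NP", "NP", "PP", 0.3),
    ("PP", "P", "NP", 1.0),
]
LEXICON = {  # (X, word) -> P(word | X)
    ("NP", "she"): 0.2,
    ("NP", "fish"): 0.3,
    ("NP", "chopsticks"): 0.2,
    ("V", "eats"): 1.0,
    ("P", "with"): 1.0,
}


def cky_by_left_child(words):
    """CKY over sparse cells: loop over split k, then non-zero Y, then rules."""
    n = len(words)
    rules_by_left = defaultdict(list)  # Y -> [(X, Z, p), ...]
    for X, Y, Z, p in BINARY:
        rules_by_left[Y].append((X, Z, p))
    tags_for_word = defaultdict(list)  # word -> [(X, p), ...]
    for (X, w), p in LEXICON.items():
        tags_for_word[w].append((X, p))

    chart = {}  # (i, j) -> {X: best score}; only non-zero entries are stored
    for i, w in enumerate(words):
        chart[(i, i + 1)] = dict(tags_for_word.get(w, []))
    for diff in range(2, n + 1):
        for i in range(n - diff + 1):
            j = i + diff
            cell = chart[(i, j)] = {}
            for k in range(i + 1, j):  # 1. choose the split point
                right = chart[(k, j)]
                for Y, y_score in chart[(i, k)].items():  # 2. only non-zero Y:[i,k]
                    for X, Z, p in rules_by_left.get(Y, ()):  # 3. rules with left child Y
                        z_score = right.get(Z)
                        if z_score is None:  # Z absent from [k,j]: rule cannot apply
                            continue
                        cand = p * y_score * z_score
                        if cand > cell.get(X, 0.0):
                            cell[X] = cand
    return chart


chart = cky_by_left_child("she eats fish with chopsticks".split())
print(chart[(0, 5)].get("S"))
```

The result matches a plain CKY loop; the difference is that the inner loops touch only symbols actually present in [i,k] and only rules whose left child is one of them, which skips most of a large grammar on any particular span.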
