StatisticalNLP Spring2010 Lecture14:PCFGs DanKlein UCBerkeley - PDF document

Statistical�NLP Spring�2010 Lecture�14:�PCFGs Dan�Klein�– UC�Berkeley Treebank�PCFGs [Charniak�96] � Use�PCFGs�for�broad�coverage�parsing � Can�take�a�grammar�right�off�the�trees�(doesn’t�work�well): �� → � � �� → �� → �� → �� Baseline 72.0 1

Conditional�Independence? � Not�every�NP�expansion�can�fill�every�NP�slot � A�grammar�with�symbols�like�“NP”�won’t�be�context9free � Statistically,�conditional�independence�too�strong Non9Independence � Independence�assumptions�are�often�too�strong. �� Example:�the�expansion�of�an�NP�is�highly�dependent� on�the�parent�of�the�NP�(i.e.,�subjects�vs.�objects). � Also:�the�subject�and�object�expansions�are�correlated! 2

Grammar�Refinement � Example:�PP�attachment Grammar�Refinement � Structure�Annotation�[Johnson�’98,�Klein&Manning ’03] � Lexicalization�[Collins�’99,�Charniak ’00] � Latent�Variables�[Matsuzaki et�al.�05,�Petrov et�al.�’06] 3

The�Game�of�Designing�a�Grammar � Annotation�refines�base�treebank�symbols�to� improve�statistical�fit�of�the�grammar � Structural�annotation Typical�Experimental�Setup � Corpus:�Penn�Treebank,�WSJ Training: sections 02921 Development: section 22�(here,�first�20�files) Test: section 23 � Accuracy�– F1:�harmonic�mean�of�per9node�labeled� precision�and�recall. � Here:�also�size�– number�of�symbols�in�grammar. � Passive�/�complete�symbols:�NP,�NP^S � Active�/�incomplete�symbols:�NP� → NP�CC� • 4

Vertical�Markovization �� Vertical�Markov� order:�rewrites� depend�on�past� � ancestor�nodes. (cf.�parent� annotation) 79% 25000 78% 20000 77% �� 15000 76% 75% 10000 74% 5000 73% 72% 0 1 2v 2 3v 3 1 2v 2 3v 3 Vertical�Markov�Order Vertical�Markov�Order Horizontal�Markovization �� ∞ ∞ ∞ ∞ �� 12000 74% 9000 73% �� 72% 6000 71% 3000 70% 0 0 1 2v 2 inf 0 1 2v 2 inf Horizontal�Markov�Order Horizontal�Markov�Order 5

Unary�Splits � Problem:�unary� rewrites�used�to� transmute� categories�so�a� high9probability� rule�can�be� used. � �� Annotation F1 Size �� Base 77.8 7.5K UNARY 78.3 8.0K Tag�Splits � Problem:�Treebank� tags�are�too�coarse. � Example:�Sentential,� PP,�and�other� prepositions�are�all� marked�IN. Annotation F1 Size � Partial�Solution: Previous 78.3 8.0K � Subdivide�the�IN�tag. SPLIT9IN 80.3 8.1K 6

Other�Tag�Splits F1 Size � UNARY9DT:�mark�demonstratives�as�DT^U 80.4 8.1K (“the�X”�vs.�“those”) � UNARY9RB:�mark�phrasal�adverbs�as�RB^U 80.5 8.1K (“quickly”�vs.�“very”) � TAG9PA:�mark�tags�with�non9canonical� 81.2 8.5K parents�(“not”�is�an�RB^VP) � SPLIT9AUX:�mark�auxiliary�verbs�with�–AUX� 81.6 9.0K [cf.�Charniak�97] � SPLIT9CC:�separate�“but”�and�“&”�from�other� 81.7 9.1K conjunctions 81.8 9.3K � SPLIT9%:�“%”�gets�its�own�tag. A�Fully�Annotated�(Unlex)�Tree 7

Some�Test�Set�Results Parser LP LR �� CB 0�CB Magerman�95 84.9 84.6 �� 1.26 56.6 Collins�96 86.3 85.8 �� 1.14 59.9 Unlexicalized 86.9 85.7 �� 1.10 60.3 Charniak�97 87.4 87.5 �� 1.00 62.1 Collins�99 88.7 88.6 �� 0.90 67.1 � Beats�“first�generation”�lexicalized�parsers. � Lots�of�room�to�improve�– more�complex�models�next. The�Game�of�Designing�a�Grammar � Annotation�refines�base�treebank�symbols�to� improve�statistical�fit�of�the�grammar � Structural�annotation�[Johnson�’98,�Klein�and� Manning�03] � Head�lexicalization [Collins�’99,�Charniak�’00] 8

Problems�with�PCFGs � What’s�different�between�basic�PCFG�scores�here? � What�(lexical)�correlations�need�to�be�scored? Lexicalized�Trees � Add�“headwords”�to� each�phrasal�node � Syntactic�vs.�semantic� heads � Headship�not�in�(most)� treebanks � Usually� �� ,� e.g.: � NP: � Take�leftmost�NP � Take�rightmost�N* � Take�rightmost�JJ � Take�right�child � VP: � Take�leftmost�VB* � Take�leftmost�VP � Take�left�child 9

Lexicalized�PCFGs? � Problem:�we�now�have�to�estimate�probabilities�like � Never�going�to�get�these�atomically�off�of�a�treebank � Solution:�break�up�derivation�into�smaller�steps Lexical�Derivation�Steps � A�derivation�of�a�local�tree�[Collins�99] Choose�a�head�tag�and�word Choose�a�complement�bag Generate�children�(incl.�adjuncts) Recursively�derive�children 10

Lexicalized�CKY �%&��%'(+++*&� • • ��)� • • X[h] �%&��%'(� • • • ��)� • *&�� Y[h] Z[h’] �� i��h��k��h’��j �� !��"��# �� $��# ��!�$��"� �� "��!��# �� $��"��# ��!�$�� Pruning�with�Beams � The�Collins�parser�prunes�with� per9cell�beams�[Collins�99] � Essentially,�run�the�O(n 5 ) CKY � Remember�only�a�few�hypotheses�for� X[h] each�span�<i,j>. � If�we�keep�K�hypotheses�at�each� span,�then�we�do�at�most�O(nK 2 )� Y[h] Z[h’] work�per�span�(why?) � Keeps�things�more�or�less�cubic i��h��k��h’��j � Also:�certain�spans�are�forbidden� entirely�on�the�basis�of� punctuation�(crucial�for�speed) 11

Pruning�with�a�PCFG � The�Charniak�parser�prunes�using�a�two9pass� approach�[Charniak�97+] � First,�parse�with�the�base�grammar � For�each�X:[i,j]�calculate�P(X|i,j,s) � This�isn’t�trivial,�and�there�are�clever�speed�ups � Second,�do�the�full�O(n 5 ) CKY � Skip�any�X�:[i,j]�which�had�low�(say,�<�0.0001)�posterior � Avoids�almost�all�work�in�the�second�phase! � Charniak�et�al�06:�can�use�more�passes � Petrov�et�al�07:�can�use�many�more�passes Pruning�with�A* � You�can�also�speed�up� the�search�without� sacrificing�optimality � For�agenda9based� parsers: � � Can�select�which�items�to� process�first � � � � � Can�do�with�any�“figure�of� merit”�[Charniak�98] � If�your�figure9of9merit�is�a� valid�A*�heuristic,�no�loss� of�optimiality�[Klein�and� Manning�03] 12

Projection9Based�A* π π �� A*�Speedup 60 50 Combined�Phase Time�(sec) 40 Dependency�Phase 30 PCFG�Phase 20 10 0 0 5 10 15 20 25 30 35 40 Length � Total�time�dominated�by�calculation�of�A*�tables�in�each� projection…�O(n 3 ) 13

StatisticalNLP Spring2010 Lecture14:PCFGs DanKlein UCBerkeley - PDF document

StatisticalNLP Spring2010 Lecture14:PCFGs DanKlein UCBerkeley TreebankPCFGs [Charniak96] UsePCFGsforbroadcoverageparsing

SI485i : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

SI425 : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

NLP: Two pictures Wordnet and Word Sense Problem NLP Disambiguation Semantics NLP Trinity

Recurrent Neural Networks Graham Neubig Site https://phontron.com/class/nn4nlp2017/ NLP and

Statistical Significance Tests in NLP Natural Language Processing VU (706.230) - Andi Rexha

Ontologies for NLP NLP for Ontologies FOIS 2014 - LogOnto Workshop on Logics and Ontologies for

Statistical Statistical Statistical Model Statistical Model Model Checking Model Checking

Maximum Entropy Model (I) LING 572 Advanced Statistical Methods for NLP January 28, 2020 1

Facing NLP German Rigau i Claramunt http://adimen.si.ehu.es/~rigau IXA group Departamento de

IXA pipes: Efficient and Ready to Use Multilingual NLP tools Rodrigo Agerri IXA NLP Group,

Prominent Research Directions in NLP Alexander Panchenko Assistant Professor for NLP About

Natural Language Processing (NLP) In 11-711 Algorithms for NLP we take an

Deep Learning for NLP Kiran Vodrahalli Feb 11, 2015 Overview What is NLP? Natural

Hybrid NLP Hybrid NLP O UTLINE O UTLINE Problems of Deep and Shallow Processing

NLP Programming Tutorial 4 - Word Segmentation Graham Neubig Nara Institute of Science and

SI485i : NLP Set 12 Features and Prediction What is NLP, really? Many of our tasks boil down

Harmonic Means of Wishart Matrices Hi! Im Asad Lodhia, Im a postdoc at the University of

Contents averages averages Contents Contents Harmonic mean (average) Harmonic mean (average)

Image Features Sanja Fidler CSC420: Intro to Image Understanding 1 / 64 Image Features Image

Genome 562 January 2015 Week 1 Genome 562 p.1/6 Early workers in theoretical population

Electron Lifetime Measurement Matt Thiesse 11 January 2017 35-ton Sim/Reco/Ana Meeting 1

Evaluation Metrics Presented by Dawn Lawrie 1 Some Possibilities Precision Recall F-measure

Quadrature Domains in Complex Variables Alan Legg Department of Mathematical Sciences, IPFW

EE E6882 SVIA: Homework 1 Due on October 1, 2007 Shih-Fu Chang, Lexing Xie Monday 4:10-6:30

Sambuz

Useful Links

Newsletter

Mail Us