Neural CRF Parsing. Greg Durrett and Dan Klein, UC Berkeley

Basic CRF Model [Hall, Durrett, Klein (2014)]
Example sentence, fencepost indices 0 to 9: He gave a long speech on foreign policy .
$P(T \mid x) \propto \prod_{r \in T} \exp(\mathrm{score}(r))$
$\mathrm{score}(\mathrm{NP} \to \mathrm{NP}\ \mathrm{PP}\ \text{over}\ (2, 5, 8)) = w^\top f = W \odot F = W \odot (s \ell^\top)$
‣ Each feature conjoins a surface feature with a label feature: FirstWord = a ∧ NP → NP PP; PrevWord = gave ∧ NP → NP PP
‣ s: surface feature vector (First = a, Prev = gave, …)
‣ ℓ: label feature vector (NP → NP PP, …)
‣ $F_{i,j} = s_i \ell_j$
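The factorization on this slide can be checked numerically. The sketch below is a toy illustration only: the feature indices and weights are hypothetical, and it reads W ⊙ F as the sum of elementwise products, which is an assumption about the slide's notation. It shows that scoring with one weight per surface-label conjunction gives the same number as the bilinear form s^T W ℓ.

```python
# Hypothetical toy example: feature indices and weights are illustrative, not from the talk.

def outer_product_features(s, l):
    """F[i][j] = s[i] * l[j]: one conjunction feature per (surface, label) pair."""
    return [[s_i * l_j for l_j in l] for s_i in s]

def score_flat(W, s, l):
    """w^T f with one weight per conjunction: the sum of W[i][j] * F[i][j]."""
    F = outer_product_features(s, l)
    return sum(W[i][j] * F[i][j] for i in range(len(s)) for j in range(len(l)))

def score_bilinear(W, s, l):
    """The same score computed as s^T (W l), without materializing F."""
    Wl = [sum(w_ij * l_j for w_ij, l_j in zip(row, l)) for row in W]
    return sum(s_i * wl_i for s_i, wl_i in zip(s, Wl))

# Surface indicators: [FirstWord = a, PrevWord = gave, some other feature (inactive)].
s = [1.0, 1.0, 0.0]
# Label indicators: [rule is NP -> NP PP, some other label feature (inactive)].
l = [1.0, 0.0]
# One weight per (surface, label) conjunction.
W = [[0.5, -1.0], [1.5, 2.0], [3.0, 0.25]]

flat = score_flat(W, s, l)
bilinear = score_bilinear(W, s, l)
```

Only the two active conjunctions contribute, so both computations return the sum of their weights.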

Neural CRF Model
$\mathrm{score}(\mathrm{NP} \to \mathrm{NP}\ \mathrm{PP}\ \text{over}\ (2, 5, 8)) = W \odot (s \ell^\top)$
‣ v: 100-dim word vectors (Bansal et al., 2014), read off the sentence He gave a long speech on foreign policy .
‣ s: 200-dim vector computed from v by a one-layer NN
‣ Three choices for s: Neural, Sparse, Neural+Sparse
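A minimal sketch of the neural scoring path follows. The dimensions are shrunk for readability (the slide uses 100-dim word vectors and a 200-dim s), all matrices are hypothetical, the ReLU nonlinearity is an assumption the slide does not state, and combining Neural+Sparse by adding the two scores is likewise an assumption.

```python
# Toy sketch: every number here is hypothetical and chosen for easy hand-checking.

def relu(x):
    return x if x > 0.0 else 0.0

def hidden_layer(H, v):
    """One-layer network: s = relu(H v). The nonlinearity is an assumption."""
    return [relu(sum(h_k * v_k for h_k, v_k in zip(row, v))) for row in H]

def bilinear(s, W, l):
    """s^T W l: the anchored-rule score for surface vector s and label vector l."""
    return sum(s[i] * W[i][j] * l[j] for i in range(len(s)) for j in range(len(l)))

def neural_score(H, W_neural, v, l):
    return bilinear(hidden_layer(H, v), W_neural, l)

def combined_score(H, W_neural, v, s_sparse, W_sparse, l):
    """Neural+Sparse, assuming the two scores are simply added."""
    return neural_score(H, W_neural, v, l) + bilinear(s_sparse, W_sparse, l)

v = [1.0, 2.0]                                  # concatenated word vectors (toy size)
H = [[1.0, -1.0], [0.5, 0.5], [-2.0, 0.0]]      # hidden-layer weights
W_neural = [[1.0, 1.0], [2.0, -1.0], [0.0, 0.0]]
l = [1.0, 0.0]                                  # label indicators, rule NP -> NP PP active
s_sparse = [1.0, 1.0]                           # FirstWord = a, PrevWord = gave
W_sparse = [[0.5, -1.0], [1.5, 2.0]]

s_neural = hidden_layer(H, v)
neural = neural_score(H, W_neural, v, l)
combined = combined_score(H, W_neural, v, s_sparse, W_sparse, l)
```

The only change from the sparse model is where s comes from: a dense hidden layer over word vectors instead of hand-built indicator features.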

Inference
Just CKY!
… with coarse pruning and caching of neural net operations (Goodman, 1997; Chen and Manning, 2014)
Roughly 2x slower than with sparse features alone
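The following is a small Viterbi CKY sketch with cached span computations. It is a simplified illustration and not the authors' implementation: the grammar, the label X, the weights, and the stand-in `span_features` function are all hypothetical, and coarse pruning is omitted. It assumes the expensive surface computation depends only on the anchoring (i, k, j), so it can be computed once and shared by every rule at that anchoring.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def span_features(i, k, j):
    """Stand-in for the expensive per-anchoring computation (e.g. a hidden layer)."""
    return (float(j - i), float(k - i))

# Hypothetical label weights, one vector per binary rule (parent, left, right).
RULE_WEIGHTS = {
    ("PP", "P", "NP"): (1.0, 0.0),
    ("X", "P", "NP"): (0.0, 1.0),
    ("NP", "NP", "PP"): (0.5, 0.5),
}

def rule_score(rule, i, k, j):
    """Dot product of the cached span vector with the rule's weight vector."""
    feats = span_features(i, k, j)
    return sum(f * w for f, w in zip(feats, RULE_WEIGHTS[rule]))

def cky(tags, rules):
    """best[(i, j, A)] = best score of a subtree labeled A over span (i, j)."""
    n = len(tags)
    best = {}
    for i, tag in enumerate(tags):
        best[(i, i + 1, tag)] = 0.0
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for rule in rules:
                    parent, left, right = rule
                    if (i, k, left) in best and (k, j, right) in best:
                        cand = (best[(i, k, left)] + best[(k, j, right)]
                                + rule_score(rule, i, k, j))
                        if cand > best.get((i, j, parent), float("-inf")):
                            best[(i, j, parent)] = cand
    return best

# Toy input "speech on policy" with one preterminal per word.
chart = cky(["NP", "P", "NP"], list(RULE_WEIGHTS))
cache = span_features.cache_info()
```

Here two rules fire at anchoring (1, 2, 3), so the span vector is computed once and reused; `cache` reports how many lookups were served from the cache.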

Learning
Just Maximum Likelihood!
… with backpropagation through each local neural network
Optimization: Adadelta (Zeiler, 2012) worked slightly better than Adagrad (Duchi et al., 2011)
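For readers unfamiliar with the two optimizers named here, this sketch writes out their per-parameter update rules on a single scalar. The quadratic objective, the starting point, and the hyperparameter values are illustrative assumptions; nothing here reproduces the parser's training setup or the reported comparison.

```python
import math

def adadelta_step(x, g, state, rho=0.95, eps=1e-6):
    """Adadelta: scale the gradient by RMS(previous updates) / RMS(gradients)."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * g * g
    delta = -math.sqrt(state["Edx2"] + eps) / math.sqrt(state["Eg2"] + eps) * g
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * delta * delta
    return x + delta

def adagrad_step(x, g, state, lr=0.5, eps=1e-6):
    """Adagrad: divide a fixed learning rate by the root of summed squared gradients."""
    state["G"] += g * g
    return x - lr * g / (math.sqrt(state["G"]) + eps)

def minimize(step, state, x0=0.0, iters=100):
    """Run `iters` updates on the toy objective f(x) = (x - 3)^2."""
    x = x0
    for _ in range(iters):
        g = 2.0 * (x - 3.0)   # gradient of (x - 3)^2
        x = step(x, g, state)
    return x

x_adadelta = minimize(adadelta_step, {"Eg2": 0.0, "Edx2": 0.0})
x_adagrad = minimize(adagrad_step, {"G": 0.0})
```

Adadelta needs no global learning rate, while Adagrad's effective step size only shrinks as squared gradients accumulate; both move the toy parameter from 0 toward the minimizer at 3.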

Results: English Treebank (Dev)
Dev set F1, all lengths (bar chart, axis from 87 to 92):
Sparse: 90.1
Neural: 90.4
Sparse+Neural: 91.3
Sparse+Brown: 90.2

Word Vectors
Dev set F1, all lengths (bar chart, axis from 87 to 92):
Bansal et al. (dependency context): 90.4
Collobert and Weston (11-word surface context): 89.6
word2vec on PTB: 89.0
Sparse+ bars: 91.3 and 90.9
Training-corpus sizes annotated on the chart: 30M tokens, 1M tokens
‣ Syntactic vectors are best for parsing (Bansal et al., 2014; Levy and Goldberg, 2014)
‣ Don't need huge unlabeled corpora for these methods to be effective

Results: English Treebank (Test)
Test set F1, all lengths (bar chart, axis from 87 to 92):
Neural+Sparse: 91.1
Sparse: 89.2
Berkeley (Petrov+ 06): 90.1
CCK (Carreras+ 08): 91.1
ZPar (Zhu+ 13): 91.3
CVG (Socher+ 13; reranking ensemble): 90.4

Related Work
‣ Transition-based neural parsers: Henderson (2003), Chen and Manning (2014)
‣ Local decisions only: Belinkov et al. (2014)
