Marrying Dynamic Programming with Recurrent Neural Networks
Liang Huang (Oregon State University), with James Cross
Structured Prediction Workshop, EMNLP 2017, Copenhagen, Denmark
Structured Prediction is Hard!
Not Easy for Humans Either... (structural ambiguity :-P)
Not Even Easy for Nature!
• prion: “misfolded protein”
• structural ambiguity for the same amino-acid sequence
• similar to different interpretations of the same sentence under different contexts
• causes mad-cow disease, etc.
Case Study: Parsing and Folding
• both problems have exponentially large search spaces
• both can be modeled by grammars (context-free and above)
• question 1: how to search for the highest-scoring structure?
• question 2: how to make the gold structure score the highest?
(running example: “I eat sushi with tuna from Japan”)
Solutions to Search and Learning
• question 1: how to search for the highest-scoring structure?
  • answer: dynamic programming to factor the search space
• question 2: how to make the gold structure score the highest?
  • answer: neural nets to automate feature engineering
• but do DP and neural nets like each other?
In this talk...
• Background
  • Dynamic Programming for Incremental Parsing
  • Features: from sparse to neural to recurrent neural nets
• Bidirectional RNNs: minimal features; no tree structures!
  • dependency parsing (Kiperwasser+Goldberg 2016; Cross+Huang 2016a)
  • span-based constituency parsing (Cross+Huang 2016b)
• Marrying DP & RNNs (mostly not my work!)
  • transition-based dependency parsing (Shi et al., EMNLP 2017)
  • minimal span-based constituency parsing (Stern et al., ACL 2017)
Spectrum: Neural Incremental Parsing
[figure: a spectrum of neural incremental parsers, ranging from “all tree info (summarize output y)” to “minimal or no tree info (summarize input x)”, annotated from “DP impossible” through “enables slow DP” and “enables fast DP” to “fastest DP: O(n³)”. Methods placed on the spectrum: edge-factored dependency (McDonald+ 05a); DP incremental parsing, bottom-up (Huang+Sagae 10; Kuhlmann+ 11); feedforward NNs (Chen+Manning 14); Stack LSTM dependency (Dyer+ 15); biRNN dependency (Kiperwasser+Goldberg 16; Cross+Huang 16a); biRNN graph-based dependency (Kiperwasser+Goldberg 16; Wang+Chang 16); RNNG (Dyer+ 16); biRNN span-based constituency (Cross+Huang 16b); minimal span-based constituency (Stern+ ACL 17); minimal dependency (Shi+ EMNLP 17)]
Incremental Parsing with Dynamic Programming
(Huang & Sagae, ACL 2010*; Kuhlmann et al., ACL 2011; Mi & Huang, ACL 2015)
*best paper nominee
Incremental Parsing (Shift-Reduce)

Example sentence: “I eat sushi with tuna from Japan in a restaurant”

step  action    stack              queue
0     -                            I eat sushi ...
1     shift     I                  eat sushi with ...
2     shift     I eat              sushi with tuna ...
3     l-reduce  eat(I)             sushi with tuna ...
4     shift     eat(I) sushi       with tuna from ...
5a    r-reduce  eat(I, sushi)      with tuna from ...
5b    shift     eat(I) sushi with  tuna from Japan ...

(notation: eat(I) means “eat” with “I” attached as its left dependent; steps 5a and 5b are alternative continuations of step 4, illustrating a shift-reduce conflict)
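As a concrete illustration (my sketch, not code from the talk), here are the three transitions as Python functions, assuming words are represented by their sentence positions and a parser state is a (stack, queue, arcs) triple:

```python
def shift(stack, queue, arcs):
    """Push the next word from the queue onto the stack."""
    return stack + [queue[0]], queue[1:], arcs

def l_reduce(stack, queue, arcs):
    """Attach the second-from-top word as a left dependent of the top word."""
    head, dep = stack[-1], stack[-2]
    return stack[:-2] + [head], queue, arcs | {(head, dep)}

def r_reduce(stack, queue, arcs):
    """Attach the top word as a right dependent of the second-from-top word."""
    head, dep = stack[-2], stack[-1]
    return stack[:-2] + [head], queue, arcs | {(head, dep)}

# steps 0-3 of the table above, with words as positions 0..9
sent = "I eat sushi with tuna from Japan in a restaurant".split()
state = ([], list(range(len(sent))), frozenset())
state = shift(*state)     # stack: [I]
state = shift(*state)     # stack: [I, eat]
state = l_reduce(*state)  # stack: [eat], arcs: {(eat, I)}
```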
Greedy Search
• each state => three new states (shift, l-reduce, r-reduce)
• greedy search: always pick the best next state
• “best” is defined by a score learned from data
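A minimal sketch (again my illustration) of greedy decoding over the transitions above, assuming a learned score(state, action) function:

```python
def legal_actions(stack, queue):
    """shift needs a nonempty queue; reduces need two items on the stack."""
    acts = []
    if queue:
        acts.append(shift)
    if len(stack) >= 2:
        acts += [l_reduce, r_reduce]
    return acts

def greedy_parse(state, score):
    """Repeatedly take the single highest-scoring legal action."""
    stack, queue, arcs = state
    while queue or len(stack) > 1:
        best = max(legal_actions(stack, queue),
                   key=lambda act: score((stack, queue, arcs), act))
        stack, queue, arcs = best(stack, queue, arcs)
    return arcs
```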
Beam Search
• each state => three new states (shift, l-reduce, r-reduce)
• beam search: always keep the top-b states
• still explores just a tiny fraction of the whole search space
• psycholinguistic evidence for parallelism (Fodor et al., 1974; Gibson, 1991)
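Extending the greedy sketch, a hedged sketch of beam search that keeps the top-b states by cumulative score (my illustration, not the talk's implementation):

```python
import heapq

def is_final(state):
    """A parse is complete when the queue is empty and one word remains."""
    stack, queue, arcs = state
    return not queue and len(stack) == 1

def beam_parse(init_state, score, b=8):
    """Keep the b highest-scoring states at every step."""
    beam = [(0.0, init_state)]
    while not all(is_final(s) for _, s in beam):
        candidates = []
        for total, state in beam:
            if is_final(state):
                candidates.append((total, state))  # carry finished states over
                continue
            stack, queue, arcs = state
            for act in legal_actions(stack, queue):
                candidates.append((total + score(state, act),
                                   act(stack, queue, arcs)))
        # heapq.nlargest compares only the keys, so unorderable states are fine
        beam = heapq.nlargest(b, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]
```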
Dynamic Programming (Huang and Sagae, 2010)
• each state => three new states (shift, l-reduce, r-reduce)
• key idea of DP: share common subproblems
• merge equivalent states => polynomial space
• each DP state corresponds to exponentially many non-DP states: the graph-structured stack (Tomita, 1986)
[figure: number of states covered vs. sentence length, log scale from 10⁰ to 10¹⁰: the states covered by DP grow exponentially with sentence length, while non-DP beam search explores a roughly constant number]
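To make the state-merging idea concrete, here is a simplified sketch (my illustration; Huang and Sagae merge states by their full feature signature and keep backpointers in a graph-structured stack to preserve all merged derivations, whereas this sketch keeps only the best-scoring one):

```python
def signature(state):
    """Simplified equivalence: merge states that agree on the top two stack
    items and the queue position (a stand-in for the paper's feature
    signature)."""
    stack, queue, arcs = state
    return (tuple(stack[-2:]), len(queue))

def dp_step(beam, score, b=16):
    """One step of DP beam search: expand every state, then merge states
    with equal signatures, keeping the best cumulative score per signature."""
    merged = {}
    for total, state in beam:
        stack, queue, arcs = state
        for act in legal_actions(stack, queue):
            new_state = act(stack, queue, arcs)
            cand = (total + score(state, act), new_state)
            sig = signature(new_state)
            if sig not in merged or cand[0] > merged[sig][0]:
                merged[sig] = cand
    return heapq.nlargest(b, merged.values(), key=lambda c: c[0])
```

Because exponentially many distinct derivations can share one signature, each merged entry stands in for exponentially many non-DP states, which is exactly the effect shown in the figure above.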