Top-down Tree Long Short-Term Memory Networks


  1. Top-down Tree Long Short-Term Memory Networks Xingxing Zhang , Liang Lu, Mirella Lapata School of Informatics, University of Edinburgh 12th June, 2016 Zhang et al., 2016 Tree LSTM 12th June, 2016 1 / 18

  2. Sequential Language Models

     P(S = w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_{1:i-1})   (1)

     State of the art: language models based on the Long Short-Term Memory network (Hochreiter and Schmidhuber, 1997; Sundermeyer et al., 2012). Billion-word benchmark results reported in Jozefowicz et al. (2016):

     Models            PPL
     KN5               67.6
     LSTM              30.6
     LSTM+CNN INPUTS   30.0
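The chain-rule factorization in Eq. (1) can be sketched directly in code. A minimal sketch: the hypothetical `uniform_cond_prob` stands in for an LSTM language model's next-word softmax; only the factorization itself comes from the slide.

```python
import math

def sentence_log_prob(sentence, cond_prob):
    """Chain rule: log P(S) = sum_i log P(w_i | w_1 .. w_{i-1})."""
    logp = 0.0
    for i, w in enumerate(sentence):
        logp += math.log(cond_prob(w, sentence[:i]))
    return logp

# Hypothetical stand-in for a trained LM's conditional distribution:
# uniform over a tiny vocabulary, ignoring the history.
VOCAB = ["the", "dog", "barks", "</s>"]
def uniform_cond_prob(word, history):
    return 1.0 / len(VOCAB)

s = ["the", "dog", "barks", "</s>"]
print(sentence_log_prob(s, uniform_cond_prob))  # 4 * log(1/4)
```

A real LSTM LM replaces `cond_prob` with a softmax over the vocabulary computed from the hidden state after reading the history.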

  3. Will tree structures help LMs?

  4. Will tree structures help LMs? Probably yes: LMs based on constituency parsing (Chelba and Jelinek, 2000; Roark, 2001; Charniak, 2001); LMs based on dependency parsing (Shen et al., 2008; Zhang, 2009; Sennrich, 2015).

  5. LSTMs + Dependency Trees = TreeLSTMs. Why? Sentence length N vs. tree height log(N).

  6. LSTMs + Dependency Trees = TreeLSTMs. Why? Sentence length N vs. tree height log(N). How? Top-down generation via breadth-first search, reminiscent of Eisner (1996).
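The "why" above is about path lengths: a sequential LSTM must carry information about the first word across N steps, whereas in a reasonably balanced dependency tree the generation path to any word is closer to log(N) steps. A small sketch, using a perfectly balanced binary tree as an idealized stand-in for a real parse (an assumption, not a property of actual treebanks):

```python
import math

def chain_path_length(n):
    # Sequential LM: the path from w_1 to w_n has length n - 1.
    return n - 1

def balanced_tree_height(n):
    # Idealized balanced binary tree over n words: height ~ log2(n).
    return math.ceil(math.log2(n + 1))

for n in [10, 100, 1000]:
    print(n, chain_path_length(n), balanced_tree_height(n))
```

Even for a 1000-word sentence the idealized tree height stays around 10, which is the intuition behind preferring tree-structured conditioning.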

  7. Generation Process (Unlabeled Trees). Example sentence: "The luxury auto manufacturer last year sold 1,214 cars in the U.S." [Step-by-step tree-building animation over slides 7-13; figures not preserved in this export.]

  14. Tree LSTM

      P(S) = \prod_{i=1}^{n} P(w_i \mid w_{1:i-1})                          (2)
          ⇓
      P(S \mid T) = \prod_{w \in \mathrm{BFS}(T) \setminus \mathrm{root}} P(w \mid D(w))   (3)

      D(w) is the dependency path of w; D(w) is a generated sub-tree. Works on projective and unlabeled dependency trees.
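Eq. (3) replaces the linear history with per-word conditioning on D(w), with words visited in breadth-first order. A minimal sketch: the tree is a toy child-list structure, a hypothetical constant `cond_prob` stands in for the TreeLSTM's softmax, and D(w) is simplified here to the ancestor chain (the paper's D(w) is the full generated sub-tree):

```python
import math
from collections import deque

# Toy unlabeled dependency tree as child lists; "sold" is the root word.
TREE = {
    "<root>": ["sold"],
    "sold": ["manufacturer", "cars"],
    "manufacturer": ["the"],
    "cars": [],
    "the": [],
}

def dependency_path(word, parent):
    """Simplified D(w): chain of ancestors from <root> down to w's head."""
    path, w = [], parent[word]
    while w is not None:
        path.append(w)
        w = parent[w]
    return list(reversed(path))

def tree_log_prob(tree, cond_prob):
    # Build parent pointers from the child lists.
    parent = {"<root>": None}
    for head, children in tree.items():
        for c in children:
            parent[c] = head
    # log P(S | T) = sum over BFS(T) \ root of log P(w | D(w)).
    logp = 0.0
    queue = deque(["<root>"])
    while queue:
        node = queue.popleft()
        if node != "<root>":
            logp += math.log(cond_prob(node, dependency_path(node, parent)))
        queue.extend(tree[node])
    return logp

result = tree_log_prob(TREE, lambda w, path: 0.2)
print(result)  # 4 generated words, each with probability 0.2: 4 * log(0.2)
```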

  15. Tree LSTM [model architecture figures, built up over slides 15-21; not preserved in this export]

  22. One Limitation of Tree LSTM

  23. Left Dependent Tree LSTM [model architecture figures, built up over slides 23-26; not preserved in this export]

  27. Experiments

  28. MSR Sentence Completion Challenge. Training set: 49 million words (around 2 million sentences). Development set: 4,000 sentences. Test set: 1,040 completion questions.
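On this benchmark, each question is a sentence with a blank and several candidate words; the task reduces to scoring each filled-in sentence with the language model and picking the highest-scoring one. A sketch, with a hypothetical `toy_lm_score` in place of the trained model:

```python
def complete(sentence_template, candidates, lm_score):
    """Pick the candidate whose filled-in sentence the LM scores highest."""
    filled = [sentence_template.replace("___", c) for c in candidates]
    scores = [lm_score(s) for s in filled]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

# Hypothetical toy scorer, not the paper's model: prefers "sold".
def toy_lm_score(sentence):
    return 1.0 if "sold" in sentence else 0.0

q = "The manufacturer ___ 1,214 cars ."
answer = complete(q, ["sold", "slept", "green"], toy_lm_score)
print(answer)  # sold
```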


  33. Dependency Parsing Reranking. Rerank the 2nd-order MSTParser (McDonald and Pereira, 2006). We train TreeLSTM and LdTreeLSTM as language models, using only words as input features; POS tags, dependency labels, and composition features are not used.
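The reranking setup: the base parser proposes a k-best list of trees, the tree language model rescores each candidate, and a mixture of the two scores selects the final parse. A minimal sketch under assumptions: `parser_score`, `lm_score`, and the mixing weight `alpha` are hypothetical stand-ins, not the paper's exact scheme:

```python
def rerank(kbest, parser_score, lm_score, alpha=0.5):
    """Return the candidate tree maximizing a parser/LM score mixture."""
    def mixed(tree):
        return alpha * parser_score(tree) + (1 - alpha) * lm_score(tree)
    return max(kbest, key=mixed)

# Toy 2-best list with hypothetical precomputed scores.
PARSER = {"tree_a": 0.9, "tree_b": 0.8}
LM = {"tree_a": 0.1, "tree_b": 0.7}
best = rerank(["tree_a", "tree_b"], PARSER.get, LM.get, alpha=0.5)
print(best)  # tree_b: 0.5*0.8 + 0.5*0.7 = 0.75 beats 0.5*0.9 + 0.5*0.1 = 0.5
```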

  34. Dependency Parsing Reranking (NN: Chen and Manning, 2014; S-LSTM: Dyer et al., 2015). [Results figures, built up over slides 34-37; not preserved in this export.]

  38. Tree Generation. Four binary classifiers decide the structure at each step: Add Left? Add Right? Add Next Left? Add Next Right? Features: hidden states and word embeddings. [Worked example built up over slides 38-45.]

      Classifier     Accuracy (%)
      Add-Left       94.3
      Add-Right      92.6
      Add-Nx-Left    93.4
      Add-Nx-Right   96.0
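The four binary decisions above drive the generation loop: Add-Left / Add-Right decide whether a node gets any left or right dependents, and Add-Nx-Left / Add-Nx-Right decide whether to keep extending each dependent list. A sketch of the control flow, with hypothetical toy classifiers and a `gen_word` stub standing in for the trained components:

```python
def generate_children(node, classifiers, gen_word, max_children=3):
    """Grow left and right dependents of `node` using four binary decisions."""
    left, right = [], []
    if classifiers["add_left"](node):            # start left dependents?
        left.append(gen_word(node))
        while len(left) < max_children and classifiers["add_nx_left"](node):
            left.append(gen_word(node))          # extend left list
    if classifiers["add_right"](node):           # start right dependents?
        right.append(gen_word(node))
        while len(right) < max_children and classifiers["add_nx_right"](node):
            right.append(gen_word(node))         # extend right list
    return left, right

# Toy deterministic classifiers: one left dependent, two right dependents.
calls = {"nx_right": 0}
def once_then_stop(node):
    calls["nx_right"] += 1
    return calls["nx_right"] == 1

clf = {
    "add_left": lambda n: True,
    "add_nx_left": lambda n: False,
    "add_right": lambda n: True,
    "add_nx_right": once_then_stop,
}
result = generate_children("sold", clf, lambda n: "w")
print(result)  # (['w'], ['w', 'w'])
```

In the model these decisions come from logistic classifiers over the hidden states and word embeddings, as the slide describes; here they are hard-coded for illustration.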

  46. Tree Generation

  47. Conclusions. Syntax can help language modeling. Predicting tree structures with neural networks is possible. Next steps: sequence-to-tree models; tree-to-tree models. Code available: https://github.com/XingxingZhang/td-treelstm Thanks & Questions?
