Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
by Kai Sheng Tai, Richard Socher, Christopher D. Manning

Presented by Daniel Perez (tuvistavie)
CTO @ Claude Tech
M2 @ The University of Tokyo
October 2, 2017
Distributed representation of words

Idea
Encode each word as a vector in R^d, such that words with similar meanings are close in the vector space.
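As a toy illustration of this idea (the words and embedding values below are invented for the example, not taken from the paper), a NumPy sketch of cosine similarity between word vectors:

```python
import numpy as np

# Hypothetical 3-dimensional embeddings; real models use d in the hundreds.
embeddings = {
    "pilot":    np.array([0.9, 0.1, 0.3]),
    "aircraft": np.array([0.8, 0.2, 0.4]),
    "banana":   np.array([0.1, 0.9, 0.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Words with related meanings end up with a higher cosine similarity.
print(cosine(embeddings["pilot"], embeddings["aircraft"]))  # close to 1
print(cosine(embeddings["pilot"], embeddings["banana"]))    # much smaller
```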
Representing sentences

Limitation
A good representation of words is not enough to represent sentences:
  The man driving the aircraft is speaking.
vs
  The pilot is making an announcement.
Recurrent Neural Networks

Idea
Add state to the neural network by feeding the previous output back in as an input to the model at the next step.
Basic RNN cell

In a plain RNN, h_t is computed as follows:
  h_t = tanh(W x_t + U h_{t-1} + b)
given g(x_t, h_{t-1}) = W x_t + U h_{t-1} + b.

Issue
Because of vanishing gradients, gradients do not propagate well through the network, making it impossible to learn long-term dependencies.
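A minimal NumPy sketch of this update; the function and parameter names are illustrative, not from the paper:

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, b):
    """One plain-RNN step: h_t = tanh(W x_t + U h_{t-1} + b)."""
    return np.tanh(W @ x_t + U @ h_prev + b)

d_in, d_hid = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(d_hid, d_in))
U = rng.normal(size=(d_hid, d_hid))
b = np.zeros(d_hid)

h = np.zeros(d_hid)
for x_t in rng.normal(size=(5, d_in)):  # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W, U, b)
```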
Long short-term memory (LSTM)

Goal
Improve the RNN architecture so that it can learn long-term dependencies.

Main ideas
• Add a memory cell that does not suffer from vanishing gradients
• Use gating to control how information propagates
LSTM cell

Each gate and the candidate update apply an affine map of this form with its own parameters:
  g^{(n)}(x_t, h_{t-1}) = W^{(n)} x_t + U^{(n)} h_{t-1} + b^{(n)}
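For reference, a sketch of the standard LSTM cell in NumPy, assuming the usual input/forget/output gates and candidate update, each built from its own g^{(n)} map (parameter layout and names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'i', 'f', 'o', 'u' to (W, U, b) triples."""
    def g(n):  # g^{(n)}(x_t, h_{t-1}) = W^{(n)} x_t + U^{(n)} h_{t-1} + b^{(n)}
        W, U, b = params[n]
        return W @ x_t + U @ h_prev + b

    i = sigmoid(g("i"))      # input gate
    f = sigmoid(g("f"))      # forget gate
    o = sigmoid(g("o"))      # output gate
    u = np.tanh(g("u"))      # candidate update
    c = f * c_prev + i * u   # memory cell
    h = o * np.tanh(c)       # hidden state / output
    return h, c

d_in, d_hid = 4, 3
rng = np.random.default_rng(0)
params = {n: (rng.normal(size=(d_hid, d_in)),
              rng.normal(size=(d_hid, d_hid)),
              np.zeros(d_hid)) for n in ("i", "f", "o", "u")}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_hid), np.zeros(d_hid), params)
```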
Structure of sentences

Sentences are not a simple linear sequence.
  The man driving the aircraft is speaking.
[Figure: constituency tree of the example sentence]
[Figure: dependency tree of the example sentence]
Tree-structured LSTMs

Goal
Improve the encoding of sentences by using their structure.

Models
• Child-Sum Tree-LSTM: sums over all the children of a node; works with any number of children.
• N-ary Tree-LSTM: uses separate parameters for each child position; finer granularity, but the maximum number of children per node must be fixed.
Child-Sum Tree-LSTM

The children's hidden states are summed before computing the gates, and their gated memory cells are summed into the new cell.
[Figure: Child-Sum Tree-LSTM at node j with children k1 and k2]
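A sketch of the Child-Sum Tree-LSTM node update as described in the paper, in NumPy (the params layout and function names are my own): the children's hidden states are summed before the gates, and each child's memory cell passes through its own forget gate before being summed into the new cell.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def child_sum_node(x_j, child_h, child_c, params):
    """child_h, child_c: lists of the children's hidden states / memory cells."""
    h_tilde = np.sum(child_h, axis=0) if child_h else np.zeros_like(params["i"][2])

    def g(n, h):  # g^{(n)}(x_j, h) = W^{(n)} x_j + U^{(n)} h + b^{(n)}
        W, U, b = params[n]
        return W @ x_j + U @ h + b

    i = sigmoid(g("i", h_tilde))                    # input gate
    o = sigmoid(g("o", h_tilde))                    # output gate
    u = np.tanh(g("u", h_tilde))                    # candidate update
    f = [sigmoid(g("f", h_k)) for h_k in child_h]   # one forget gate per child
    c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, child_c))
    h = o * np.tanh(c)
    return h, c

d_in, d_hid = 4, 3
rng = np.random.default_rng(0)
params = {n: (rng.normal(size=(d_hid, d_in)),
              rng.normal(size=(d_hid, d_hid)),
              np.zeros(d_hid)) for n in ("i", "f", "o", "u")}
leaf_h, leaf_c = child_sum_node(rng.normal(size=d_in), [], [], params)
h_j, c_j = child_sum_node(rng.normal(size=d_in), [leaf_h, leaf_h], [leaf_c, leaf_c], params)
```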
Child-Sum Tree-LSTM

Properties
• Does not take the order of the children into account
• Works with a variable number of children
• Shares gate weights (including the forget gate) across children

Application
Dependency Tree-LSTM: the number of dependents per head is variable.
N-ary Tree-LSTM

Given
  g^{(n)}(x_j, h_{j1}, ..., h_{jN}) = W^{(n)} x_j + Σ_{l=1}^{N} U_l^{(n)} h_{jl} + b^{(n)}
[Figure: binary Tree-LSTM at node j with children k1 and k2]
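A sketch of a binary (N = 2) Tree-LSTM node under this parameterization, in NumPy (parameter layout and names are my own): each child position l has its own U_l per gate, and the forget gate for child k can depend on every sibling through a matrix U_{kl}.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_tree_node(x_j, h_children, c_children, params, N=2):
    """h_children, c_children: length-N lists (left child, right child)."""
    def g(n, U_list):  # W^{(n)} x_j + sum_l U_l^{(n)} h_{jl} + b^{(n)}
        W, b = params[n]["W"], params[n]["b"]
        return W @ x_j + sum(U_l @ h_l for U_l, h_l in zip(U_list, h_children)) + b

    i = sigmoid(g("i", params["i"]["U"]))   # input gate
    o = sigmoid(g("o", params["o"]["U"]))   # output gate
    u = np.tanh(g("u", params["u"]["U"]))   # candidate update
    # Forget gate for child k uses its own row of U matrices, so siblings interact.
    f = [sigmoid(g("f", params["f"]["U"][k])) for k in range(N)]
    c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, c_children))
    h = o * np.tanh(c)
    return h, c

d_in, d_hid, N = 4, 3, 2
rng = np.random.default_rng(0)
def square(): return rng.normal(size=(d_hid, d_hid))
params = {n: {"W": rng.normal(size=(d_hid, d_in)), "b": np.zeros(d_hid),
              "U": [square() for _ in range(N)]} for n in ("i", "o", "u")}
params["f"] = {"W": rng.normal(size=(d_hid, d_in)), "b": np.zeros(d_hid),
               "U": [[square() for _ in range(N)] for _ in range(N)]}
h_left = c_left = np.zeros(d_hid)  # e.g. two leaf children
h_j, c_j = binary_tree_node(rng.normal(size=d_in), [h_left, h_left],
                            [c_left, c_left], params)
```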
N-ary Tree-LSTM

Properties
• Each node can have at most N children
• Fine-grained control over how information propagates
• The forget gate can be parameterized so that siblings affect each other

Application
Constituency Tree-LSTM: uses a binary Tree-LSTM.
Sentiment classification

Task
Predict the sentiment ŷ_j of node j.

Sub-tasks
• Binary classification
• Fine-grained classification over 5 classes

Method
• Annotations at the node level
• Negative log-likelihood loss

  p̂_θ(y | {x}_j) = softmax(W^{(s)} h_j + b^{(s)})
  ŷ_j = argmax_y p̂_θ(y | {x}_j)
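A sketch of this classification head in NumPy, assuming the node representation h_j comes from a Tree-LSTM; W_s and b_s stand in for W^{(s)} and b^{(s)}:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict_sentiment(h_j, W_s, b_s):
    """p̂(y | {x}_j) = softmax(W^{(s)} h_j + b^{(s)}); prediction is the arg max."""
    p = softmax(W_s @ h_j + b_s)
    return int(np.argmax(p)), p

# Training would minimize the negative log-likelihood -log p[y_true]
# at each annotated node, summed over the tree.
d_hid, n_classes = 3, 5
rng = np.random.default_rng(0)
y_hat, p = predict_sentiment(rng.normal(size=d_hid),
                             rng.normal(size=(n_classes, d_hid)),
                             np.zeros(n_classes))
```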
Sentiment classification results

The Constituency Tree-LSTM performs best on the fine-grained sub-task (accuracy, %).

Method                            Fine-grained  Binary
CNN-multichannel                  47.4          88.1
LSTM                              46.4          84.9
Bidirectional LSTM                49.1          87.5
2-layer Bidirectional LSTM        48.5          87.2
Dependency Tree-LSTM              48.4          85.7
Constituency Tree-LSTM
  randomly initialized vectors    43.9          82.0
  Glove vectors, fixed            49.7          87.5
  Glove vectors, tuned            51.0          88.0
Semantic relatedness

Task
Predict a similarity score in [1, K] between two sentences.

Method
Sentence pairs (L, R) are annotated with a similarity score in [1, 5].
• Produce representations h_L and h_R
• Compute the element-wise distance h_+ = |h_L − h_R| and product (angle) h_× = h_L ⊙ h_R
• Compute the score with a small fully connected NN:

  h_s = σ(W^{(×)} h_× + W^{(+)} h_+ + b^{(h)})
  p̂_θ = softmax(W^{(p)} h_s + b^{(p)})
  ŷ = r^T p̂_θ,  with r = [1, 2, 3, 4, 5]

• The loss is the KL divergence between p̂_θ and the target distribution.
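A sketch of this similarity head in NumPy, using the comparison features h_× = h_L ⊙ h_R and h_+ = |h_L − h_R| from the paper (parameter names are stand-ins for those above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def relatedness_score(h_L, h_R, W_times, W_plus, W_p, b_h, b_p, K=5):
    h_times = h_L * h_R          # element-wise product ("angle")
    h_plus = np.abs(h_L - h_R)   # element-wise absolute difference ("distance")
    h_s = sigmoid(W_times @ h_times + W_plus @ h_plus + b_h)
    p_hat = softmax(W_p @ h_s + b_p)   # distribution over scores 1..K
    r = np.arange(1, K + 1)
    return float(r @ p_hat)            # expected score, ŷ = r^T p̂_θ

d_hid, d_s, K = 3, 4, 5
rng = np.random.default_rng(0)
score = relatedness_score(rng.normal(size=d_hid), rng.normal(size=d_hid),
                          rng.normal(size=(d_s, d_hid)), rng.normal(size=(d_s, d_hid)),
                          rng.normal(size=(K, d_s)), np.zeros(d_s), np.zeros(K))
```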
Semantic relatedness results

The Dependency Tree-LSTM performs best on all measures.

Method                        Pearson's r  MSE
LSTM                          0.8528       0.2831
Bidirectional LSTM            0.8567       0.2736
2-layer Bidirectional LSTM    0.8558       0.2762
Constituency Tree-LSTM        0.8582       0.2734
Dependency Tree-LSTM          0.8676       0.2532
Summary

• Tree-LSTMs make it possible to encode tree topologies
• They can be used to encode sentence parse trees
• They can capture longer-range and more fine-grained word dependencies
References

Christopher Olah. Understanding LSTM Networks. 2015.
Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. 2015.