Pointer Networks: Handling Variable Size Output Dictionary
• Outputs are discrete and correspond to positions in the input, so the output "dictionary" varies per example.
• Q: Can we think of cases where we need such a dynamically sized dictionary?
Pointer Networks: Handling Variable Size Output Dictionary
[Figure: (a) Sequence-to-Sequence vs. (b) Ptr-Net architectures]
Pointer Networks: Handling Variable Size Output Dictionary
• Fixed-Size Dictionary: the decoder hidden state d_i and the attention-weighted context d'_i are concatenated and fed into a softmax over the fixed-size dictionary.
• Dynamic Dictionary: the decoder hidden state is used to select a location in the input via its interaction with the encoder hidden states e_j.
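To make the two output layers concrete, here is a minimal numpy sketch, assuming toy dimensions and randomly initialized parameters (the names enc_states, dec_state, W1, W2, v, W_out are illustrative, not taken from any released implementation). The fixed-dictionary head blends the encoder states into a context d'_i, concatenates it with d_i, and applies a softmax over a fixed vocabulary; the pointer head uses the normalized attention scores over the encoder states e_j directly as the output distribution over input positions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy dimensions and random parameters (illustrative only).
rng = np.random.default_rng(0)
H = 8                     # hidden size
n_inputs = 5              # length of the input sequence
vocab = 20                # size of the fixed output dictionary

enc_states = rng.normal(size=(n_inputs, H))   # encoder hidden states e_j
dec_state = rng.normal(size=H)                # decoder hidden state d_i

W1 = rng.normal(size=(H, H))
W2 = rng.normal(size=(H, H))
v = rng.normal(size=H)

# Additive attention scores between the decoder state and every encoder state.
scores = np.array([v @ np.tanh(W1 @ e + W2 @ dec_state) for e in enc_states])
attn = softmax(scores)

# (a) Fixed-size dictionary: build the context vector d'_i, concatenate it
#     with d_i, and run a softmax over the fixed vocabulary.
context = attn @ enc_states                                  # d'_i
W_out = rng.normal(size=(vocab, 2 * H))
p_vocab = softmax(W_out @ np.concatenate([dec_state, context]))  # shape (vocab,)

# (b) Pointer network: the attention weights themselves are the output
#     distribution over positions in the input.
p_pointer = attn                                             # shape (n_inputs,)
print(p_vocab.shape, p_pointer.shape)
```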
Key-variable memory
We use a similar indexing mechanism to point into a key-variable memory during decoding: when the decoder needs to emit an argument (as opposed to a function name), it selects a location in this memory, which stores all available arguments.
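A similarly hedged sketch of that selection step (shapes, names, and memory contents are assumptions for illustration): at an argument position, the same additive scoring is applied to the memory keys, and the softmax is normalized over memory entries only rather than over the full token dictionary.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
H = 8
dec_state = rng.normal(size=H)                     # current decoder hidden state
W1, W2 = rng.normal(size=(H, H)), rng.normal(size=(H, H))
v = rng.normal(size=H)

# Key-variable memory: one key vector per stored argument (illustrative).
memory_keys = rng.normal(size=(3, H))

# Score each memory entry against the decoder state and normalize over the
# memory entries only, so the decoder can only pick a stored argument here.
scores = np.array([v @ np.tanh(W1 @ k + W2 @ dec_state) for k in memory_keys])
p_argument = softmax(scores)                       # distribution over arguments
chosen = int(np.argmax(p_argument))
```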
Carnegie Mellon School of Computer Science
Language Grounding to Vision and Control
Recursive/Tree-Structured Networks
Katerina Fragkiadaki
From Words to Phrases
• We have already discussed word vector representations that "capture the meaning" of a word by embedding it into a low-dimensional space where semantic similarity is preserved.
• But what about longer phrases? For this lecture, understanding the meaning of a sentence means representing it as a vector in a structured semantic space, where similar sentences are nearby and unrelated sentences are far apart.
Building on Word Vector Space Models
[Figure: 2D word vector space with points for Germany, France, Monday, Tuesday, and the phrases "the country of my birth" and "the place where I was born".]
The country of my birth vs. The place where I was born.
How can we represent the meaning of longer phrases? By mapping them into the same vector space as words!
Slide adapted from Manning-Socher
From Words to Phrases
• Sentence modeling is at the core of many language comprehension tasks: sentiment analysis, paraphrase detection, entailment recognition, summarization, discourse analysis, machine translation, grounded language learning, and image retrieval.
From Words to Phrases
• How can we know when larger units of a sentence are similar in meaning?
• The snowboarder is leaping over a mogul.
• A person on a snowboard jumps into the air.
• People interpret the meaning of larger text units - entities, descriptive terms, facts, arguments, stories - by semantic composition of smaller elements: "A small crowd quietly enters the historical church."
Slide adapted from Manning-Socher
From Words to Phrases: 4 models
• Bag of words: ignores word order; simple averaging of the word vectors in a sub-phrase. It cannot capture differences in meaning that result from differences in word order, e.g., "cats climb trees" and "trees climb cats" get the same representation (see the sketch after this list).
• Sequence (recurrent) models, e.g., LSTMs: the hidden vector of the last word is the representation of the phrase.
• Tree-structured (recursive) models: compose each phrase from its constituent sub-phrases, according to a given syntactic structure over the sentence.
• Convolutional neural networks.
Q: Does semantic understanding improve with grammatical understanding, so that recursive models are justified?
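To make the bag-of-words limitation concrete, a toy sketch (the word vectors below are made up purely for illustration): averaging the word vectors of "cats climb trees" and "trees climb cats" yields exactly the same phrase vector, because averaging discards order.

```python
import numpy as np

# Toy word vectors (made up for illustration).
vec = {
    "cats":  np.array([1.0, 0.2, 0.0]),
    "climb": np.array([0.1, 1.0, 0.3]),
    "trees": np.array([0.0, 0.4, 1.0]),
}

def bag_of_words(sentence):
    """Average the word vectors, ignoring word order."""
    return np.mean([vec[w] for w in sentence.split()], axis=0)

a = bag_of_words("cats climb trees")
b = bag_of_words("trees climb cats")
print(np.allclose(a, b))   # True: word order is lost entirely
```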
Recursive Neural Networks
Given a tree and vectors for the leaves, compute bottom-up vectors for the intermediate nodes, all the way to the root, via a compositional function g.
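A minimal sketch of this bottom-up composition, assuming the simplest form g(c1, c2) = tanh(W [c1; c2] + b) with a single shared weight matrix (dimensions, parameters, and the example tree are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                                   # word/phrase vector dimension
W = rng.normal(size=(D, 2 * D)) * 0.1   # shared composition weights
b = np.zeros(D)

def compose(left, right):
    """g: combine two child vectors into the parent phrase vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(node, word_vec):
    """Bottom-up: a node is either a word (leaf) or a (left, right) pair."""
    if isinstance(node, str):
        return word_vec[node]
    left, right = node
    return compose(encode(left, word_vec), encode(right, word_vec))

# Toy binary parse of "the cat sat": ((the, cat), sat)
word_vec = {w: rng.normal(size=D) for w in ["the", "cat", "sat"]}
tree = (("the", "cat"), "sat")
root = encode(tree, word_vec)           # vector for the whole phrase
print(root.shape)                       # (4,)
```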
How should we map phrases into a vector space?
Use the principle of compositionality: the meaning (vector) of a sentence is determined by (1) the meanings of its words and (2) the rules that combine them.
[Figure: phrases such as "the country of my birth" and "the place where I was born" mapped into the same vector space as the words Germany, France, Monday, Tuesday.]
The models in this section jointly learn parse trees and compositional vector representations (Parsing with Compositional Vector Grammars, Socher et al.).
Slide adapted from Manning-Socher
Constituency Sentence Parsing
[Figure: constituency parse tree of "The cat sat on the mat." with nodes labeled S, VP, PP, NP and a vector attached to each word.]
Slide adapted from Manning-Socher
Learn Structure and Representation
[Figure: the same parse tree (S, VP, PP, NP) with a learned vector at every internal node.]
These internal-node vectors are the intermediate concepts between the words and the full sentence.
Recursive vs. Recurrent Neural Networks
Q: What is the difference in the intermediate concepts they build?
[Figure: "the country of my birth" processed by a recursive (tree-structured) network and by a recurrent (chain-structured) network.]
Slide adapted from Manning-Socher
Recursive vs. Recurrent Neural Networks
• Recursive neural nets require a parser to obtain the tree structure.
• Recurrent neural nets cannot capture phrases without their prefix context and often capture too much of the last words in the final vector. However, they do not need a parser, and they are much preferred in the current literature, at least.
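One way to see the difference in the intermediate concepts the two models build: in a recurrent chain every intermediate hidden state summarizes a prefix of the sentence, whereas in a recursive tree every intermediate vector summarizes a syntactic constituent. A small sketch that just lists the spans each intermediate vector would cover (the binary parse chosen here is one plausible bracketing, assumed for illustration):

```python
# Intermediate units built by each model for "the country of my birth".
words = ["the", "country", "of", "my", "birth"]

# Recurrent chain: hidden state i summarizes the prefix words[:i+1].
recurrent_spans = [" ".join(words[: i + 1]) for i in range(len(words))]

# Recursive tree (one plausible binary parse): each internal node summarizes
# a constituent. Leaves are words; internal nodes are (left, right) pairs.
tree = (("the", "country"), ("of", ("my", "birth")))

def constituents(node):
    """Return (span text, list of spans covered by every internal node)."""
    if isinstance(node, str):
        return node, []
    l_span, l_sub = constituents(node[0])
    r_span, r_sub = constituents(node[1])
    span = l_span + " " + r_span
    return span, l_sub + r_sub + [span]

recursive_spans = constituents(tree)[1]
print(recurrent_spans)  # ['the', 'the country', ..., 'the country of my birth']
print(recursive_spans)  # ['the country', 'my birth', 'of my birth', 'the country of my birth']
```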