Code Completion with Neural Attention and Pointer Networks
Jian Li, Yue Wang, Irwin King, and Michael R. Lyu – The Chinese University of Hong Kong
Presented by Ondrej Skopek
Goal: Predict out-of-vocabulary words using local context (illustrative image)
Credits: van Kooten, P. neural_complete. https://github.com/kootenpv/neural_complete (2017).
Pointer mixture networks (architecture diagram: RNN, Attention, Pointer network, Mixture, Joint prediction)
Credits: Li, J., Wang, Y., King, I. & Lyu, M. R. Code Completion with Neural Attention and Pointer Networks. (2017).
Outline
● Recurrent neural networks
● Attention
● Pointer networks
● Data representation
● Pointer mixture network
● Experimental evaluation
● Summary
Recurrent neural networks (diagram)
Credits: Olah, C. Understanding LSTM Networks. colah’s blog (2015).
Recurrent neural networks – unrolling (diagram)
Credits: Olah, C. Understanding LSTM Networks. colah’s blog (2015).
Long Short-term Memory (diagram: cell state, hidden state, forget gate, new memory generation, output gate)
Credits: Hochreiter, S. & Schmidhuber, J. Long Short-term Memory. Neural Computation 9, 1735–1780 (1997); Olah, C. Understanding LSTM Networks. colah’s blog (2015).
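To make the gate structure concrete, here is a minimal NumPy sketch of a single LSTM step; the function name, weight layout, and shapes are illustrative choices, not taken from the slides or the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: forget gate, input gate, new-memory generation, output gate.

    W has shape (4 * k, d + k) and b has shape (4 * k,), where d is the input
    size and k the hidden-state size (illustrative layout).
    """
    k = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0 * k:1 * k])      # forget gate: what to erase from the cell state
    i = sigmoid(z[1 * k:2 * k])      # input gate: how much new memory to write
    g = np.tanh(z[2 * k:3 * k])      # candidate new memory
    o = sigmoid(z[3 * k:4 * k])      # output gate: what to expose as the hidden state
    c_t = f * c_prev + i * g         # updated cell state
    h_t = o * np.tanh(c_t)           # updated hidden state
    return h_t, c_t

# Tiny usage example with illustrative sizes: input d = 3, hidden k = 4.
rng = np.random.default_rng(0)
d, k = 3, 4
W, b = rng.normal(size=(4 * k, d + k)) * 0.1, np.zeros(4 * k)
h, c = np.zeros(k), np.zeros(k)
h, c = lstm_step(rng.normal(size=d), h, c, W, b)
```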
Recurrent neural networks – long-term dependencies (diagram)
Credits: Olah, C. Understanding LSTM Networks. colah’s blog (2015).
Attention
● Choose which context to look at when predicting
● Overcome the hidden state bottleneck
Credits: Bahdanau, D., Cho, K. & Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. (2014).
Attention (cont.)
Credits: Qi, X. Seq2seq. https://xiandong79.github.io/seq2seq-基础知识 (2017); Bahdanau, D., Cho, K. & Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. (2014).
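A minimal sketch of the idea, assuming additive (Bahdanau-style) scoring; the parameter names and shapes below are illustrative:

```python
import numpy as np

def attention(h_t, memory, W_q, W_m, v):
    """Additive attention: score the last L hidden states against the current one.

    memory: (L, k) past hidden states; h_t: (k,) current hidden state.
    W_q, W_m: (a, k) projections; v: (a,) scoring vector (illustrative shapes).
    Returns the attention weights over the L positions and the context vector.
    """
    scores = np.tanh(memory @ W_m.T + h_t @ W_q.T) @ v   # (L,) unnormalised scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                  # softmax over positions
    context = alpha @ memory                              # weighted sum of past states
    return alpha, context

# Illustrative usage: L = 5 past states, hidden size k = 8, attention size a = 6.
rng = np.random.default_rng(0)
L, k, a = 5, 8, 6
alpha, context = attention(rng.normal(size=k), rng.normal(size=(L, k)),
                           rng.normal(size=(a, k)), rng.normal(size=(a, k)),
                           rng.normal(size=a))
```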
Pointer networks (diagram)
Credits: Vinyals, O., Fortunato, M. & Jaitly, N. Pointer Networks. (2015).
Pointer networks (cont.)
● Based on Attention
● Softmax over a dictionary of inputs
● Output models a conditional distribution of the next output token
Credits: Vinyals, O., Fortunato, M. & Jaitly, N. Pointer Networks. (2015); Bahdanau, D., Cho, K. & Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. (2014).
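A minimal, self-contained sketch of the pointer idea (all names and sizes are illustrative): the softmax over input positions is itself the output distribution, so the model "points at" one of its inputs instead of choosing from a fixed vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
L, k = 5, 8                               # illustrative sizes: 5 inputs, hidden size 8
memory = rng.normal(size=(L, k))          # encoder hidden states, one per input token
h_t = rng.normal(size=k)                  # current decoder state
v, W = rng.normal(size=k), rng.normal(size=(k, 2 * k))

# Additive attention scores over the inputs (illustrative parametrisation).
scores = np.array([v @ np.tanh(W @ np.concatenate([m, h_t])) for m in memory])
pointer_dist = np.exp(scores - scores.max())
pointer_dist /= pointer_dist.sum()        # softmax over the L input positions

# The pointer network's "output" is this distribution itself: it points at
# (copies) one of the input tokens rather than picking from a fixed vocabulary.
print(pointer_dist, int(pointer_dist.argmax()))
```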
Outline
● Recurrent neural networks
● Attention
● Pointer networks
● Data representation
● Pointer mixture network
● Experimental evaluation
● Summary
Data representation
● Corpus of Abstract Syntax Trees (ASTs)
○ Parsed using a context-free grammar
● Each node has a type and a value (type:value)
○ Non-leaf value: EMPTY, unknown value: UNK, end of program: EOF
● Task: Code completion
○ Predict the “next” node
○ Two separate tasks (type and value)
● Serialized to use sequential models (see the sketch below)
○ In-order depth-first search + 2 bits of information on children/siblings
● Task after serialization: Given a sequence of words, predict the next one
Credits: Li, J., Wang, Y., King, I. & Lyu, M. R. Code Completion with Neural Attention and Pointer Networks. (2017).
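A minimal sketch of the serialization step; the node encoding is illustrative (a simple pre-order traversal is shown for brevity, whereas the slide mentions in-order DFS), and all names are ours:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    type: str
    value: str = "EMPTY"                    # non-leaf nodes carry the placeholder EMPTY
    children: List["Node"] = field(default_factory=list)

def serialize(node: Node, has_sibling: bool = False) -> List[Tuple[str, bool, bool]]:
    """Depth-first flattening: each token is (type:value, has_children, has_sibling)."""
    out = [(f"{node.type}:{node.value}", bool(node.children), has_sibling)]
    for i, child in enumerate(node.children):
        out.extend(serialize(child, has_sibling=i < len(node.children) - 1))
    return out

# Tiny example AST for the statement `x = 1`.
tree = Node("Assign", children=[Node("Name", "x"), Node("Num", "1")])
print(serialize(tree))
# [('Assign:EMPTY', True, False), ('Name:x', False, True), ('Num:1', False, False)]
```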
Pointer mixture networks (architecture diagram: RNN, Attention, Pointer network, Mixture, Joint prediction)
Credits: Li, J., Wang, Y., King, I. & Lyu, M. R. Code Completion with Neural Attention and Pointer Networks. (2017).
RNN with adapted Attention
● Intermediate goal
○ Produce two distributions at time t
● RNN with Attention (fixed unrolling)
○ L – input window size (L = 50)
○ V – vocabulary size (differs)
○ k – size of hidden state (k = 1500)
Credits: Vinyals, O., Fortunato, M. & Jaitly, N. Pointer Networks. (2015); Bahdanau, D., Cho, K. & Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. (2014).
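A rough sketch of the first of the two distributions; the output projection below is an assumption, since the exact parametrisation is not shown on the slide:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative sizes: window L = 50 and hidden size k = 1500 as on the slide;
# a small vocabulary V is used here just to keep the example cheap.
L, V, k = 50, 100, 1500
rng = np.random.default_rng(0)
h_t = rng.normal(size=k)                    # RNN hidden state at time t
context = rng.normal(size=k)                # attention context over the last L states
W_v = rng.normal(size=(V, 2 * k)) * 0.01    # assumed output projection

vocab_dist = softmax(W_v @ np.concatenate([h_t, context]))   # distribution over V words
# The second distribution, over the L positions in the attention window, is the
# attention weight vector itself (see the pointer-network sketch above).
```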
Attention & Pointer components
● Attention for the “decoder”
○ Condition on both the hidden state and context vector
● Pointer network
○ Reuses Attention outputs
Credits: Vinyals, O., Fortunato, M. & Jaitly, N. Pointer Networks. (2015); Bahdanau, D., Cho, K. & Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. (2014).
Mixture component
● Combine the two distributions into one
● Weighted by a learned switch between them (equation shown on slide; see the sketch below)
Credits: Li, J., Wang, Y., King, I. & Lyu, M. R. Code Completion with Neural Attention and Pointer Networks. (2017).
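A minimal sketch of how the two distributions could be blended; the sigmoid gate over the hidden state and context vector is an assumption based on the cited paper's description, and all names are ours:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mix(vocab_dist, pointer_dist, h_t, context, W_s, b_s):
    """Blend the vocabulary distribution (size V) with the pointer distribution (size L).

    s_t is a scalar switch in [0, 1] computed from the hidden state and the
    context vector; the joint prediction is taken over the rescaled concatenation.
    """
    s_t = sigmoid(W_s @ np.concatenate([h_t, context]) + b_s)          # scalar gate
    return np.concatenate([s_t * vocab_dist, (1.0 - s_t) * pointer_dist])

# Illustrative usage: V = 4 vocabulary words, L = 3 window positions, hidden size k = 5.
rng = np.random.default_rng(0)
V, L, k = 4, 3, 5
vocab_dist = np.full(V, 1.0 / V)
pointer_dist = np.full(L, 1.0 / L)
joint = mix(vocab_dist, pointer_dist, rng.normal(size=k), rng.normal(size=k),
            rng.normal(size=2 * k), 0.0)
assert abs(joint.sum() - 1.0) < 1e-9        # still a valid probability distribution
```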
Outline
● Recurrent neural networks
● Attention
● Pointer networks
● Data representation
● Pointer mixture network
● Experimental evaluation
● Summary
Experimental evaluation
Data
● JavaScript and Python datasets
○ http://plml.ethz.ch
● Each program divided into segments of 50 consecutive tokens
○ Last segment padded with EOF
● AST data as described beforehand
○ Type embedding (300 dimensions)
○ Value embedding (1200 dimensions)
● No unknown word problem for types!
Model & training parameters
● Single-layer LSTM, unrolling length 50
● Hidden unit size 1500
● Forget gate biases initialized to 1
● Cross-entropy loss function
● Adam optimizer (learning rate 0.001 + decay)
● Gradient clipping (L2 norm [0, 5])
● Batch size 128
● 8 epochs
● Trainable initial states
○ Initialized to 0
○ All other parameters ~ Unif([-0.05, 0.05])
Credits: Li, J., Wang, Y., King, I. & Lyu, M. R. Code Completion with Neural Attention and Pointer Networks. (2017).
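For reference, the listed hyperparameters gathered into a single (hypothetical) configuration dictionary; the key names are ours, the values come from the slide:

```python
training_config = {
    "model": "single-layer LSTM",
    "unrolling_length": 50,              # segment / attention window size
    "hidden_size": 1500,
    "type_embedding_dim": 300,
    "value_embedding_dim": 1200,
    "forget_gate_bias_init": 1.0,
    "loss": "cross-entropy",
    "optimizer": "Adam",
    "learning_rate": 1e-3,               # with decay
    "gradient_clip_l2_norm": 5.0,
    "batch_size": 128,
    "epochs": 8,
    "initial_states": "trainable, initialised to 0",
    "other_params_init": "Uniform(-0.05, 0.05)",
}
```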
Experimental evaluation (cont.)
Training conditions
● Hidden state reset to trainable initial state only if segment from a different program, otherwise last hidden state reused
● If label UNK, set loss to 0 during training
● During training and test, UNK prediction considered incorrect
Labels
● Vocabulary: K most frequent words
● If in vocabulary: word ID
● If in attention window: label it as the last attention position
○ If not, labeled as UNK
Credits: Li, J., Wang, Y., King, I. & Lyu, M. R. Code Completion with Neural Attention and Pointer Networks. (2017).
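A minimal sketch of the labelling rule described above; the function name and the assumption that UNK has an ID in the vocabulary are ours:

```python
UNK = "UNK"

def make_label(target, vocab_to_id, window):
    """Pick the training label for a target token.

    vocab_to_id: word -> ID for the K most frequent words (assumed to include UNK).
    window:      the last L input tokens (the attention window).
    Returns ("vocab", word_id), ("pointer", position), or ("vocab", UNK id).
    """
    if target in vocab_to_id:
        return ("vocab", vocab_to_id[target])
    # Out of vocabulary: point at the most recent occurrence in the window, if any.
    for pos in range(len(window) - 1, -1, -1):
        if window[pos] == target:
            return ("pointer", pos)
    return ("vocab", vocab_to_id[UNK])   # not copyable either -> UNK (loss masked to 0)

# Illustrative usage:
vocab = {"UNK": 0, "def": 1, "return": 2}
window = ["foo", "x", "foo", "y"]
print(make_label("return", vocab, window))   # ('vocab', 2)
print(make_label("foo", vocab, window))      # ('pointer', 2) – most recent occurrence
print(make_label("bar", vocab, window))      # ('vocab', 0) – UNK, loss masked to 0
```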
Comparison to other results (results table shown on slide)
Credits: Li, J., Wang, Y., King, I. & Lyu, M. R. Code Completion with Neural Attention and Pointer Networks. (2017).
Example result (shown on slide)
Credits: Li, J., Wang, Y., King, I. & Lyu, M. R. Code Completion with Neural Attention and Pointer Networks. (2017).
Summary
● Applied neural language models to code completion
● Demonstrated the effectiveness of the Attention mechanism
● Proposed a Pointer Mixture Network to deal with out-of-vocabulary values
Future work
● Encode more static type information
● Combine the two distributions in a different way
● Use both backward and forward context to predict the given node
● Attempt to learn longer dependencies for out-of-vocabulary values (L > 50)
Credits: Li, J., Wang, Y., King, I. & Lyu, M. R. Code Completion with Neural Attention and Pointer Networks. (2017).
Thank you for your attention!