Neural Program Synthesis with Priority Queue Training Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le https://arxiv.org/abs/1801.03526
Why Program Synthesis?
● One of the hard AI reasoning domains
● A tool for planning in robotics
● Increased interpretability (humans can read code more easily than NN weights)
Deep Reinforcement Learning
● Value-based RL, e.g. Q-learning
● Policy-based RL, e.g. policy gradient
(Agent-environment diagram: https://becominghuman.ai/the-very-basics-of-reinforcement-learning-154f28a79071)
Deep RL for Combinatorial Optimization ● Neural Architecture Search with Reinforcement Learning
Deep RL for Combinatorial Optimization ● Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Deep RL for Combinatorial Optimization ● Neural Combinatorial Optimization with Reinforcement Learning
"Fundamental" Program Synthesis ● Focus on algorithmic coding problems. ● No ground-truth program solutions. ● Simple Turing-complete language.
HelloWorld.bf (annotated Hello World in BF, from https://en.wikipedia.org/wiki/Brainfuck)
++++++++                Set Cell #0 to 8
[
    >++++               Add 4 to Cell #1; this will always set Cell #1 to 4
    [                   as the cell will be cleared by the loop
        >++             Add 2 to Cell #2
        >+++            Add 3 to Cell #3
        >+++            Add 3 to Cell #4
        >+              Add 1 to Cell #5
        <<<<-           Decrement the loop counter in Cell #1
    ]                   Loop till Cell #1 is zero; number of iterations is 4
    >+                  Add 1 to Cell #2
    >+                  Add 1 to Cell #3
    >-                  Subtract 1 from Cell #4
    >>+                 Add 1 to Cell #6
    [<]                 Move back to the first zero cell you find; this will be Cell #1, which was cleared by the previous loop
    <-                  Decrement the loop counter in Cell #0
]                       Loop till Cell #0 is zero; number of iterations is 8
The result of this is:
Cell No : 0   1   2    3    4   5   6
Contents: 0   0   72   104  88  32  8
Pointer : ^
>>.                     Cell #2 has value 72 which is 'H'
>---.                   Subtract 3 from Cell #3 to get 101 which is 'e'
+++++++..+++.           Likewise for 'llo' from Cell #3
>>.                     Cell #5 is 32 for the space
<-.                     Subtract 1 from Cell #4 for 87 to give a 'W'
<.                      Cell #3 was set to 'o' from the end of 'Hello'
+++.------.--------.    Cell #3 for 'rl' and 'd'
>>+.                    Add 1 to Cell #5 gives us an exclamation point
>++.                    And finally a newline from Cell #6
Anatomy of BF Turing complete! https://esolangs.org/wiki/Brainfuck#Computational_class
BF Execution Demo: Reverse a list
Why BF
● Turing complete, and in theory suitable for any algorithmic task.
● Many algorithms have surprisingly elegant BF implementations.
● No syntax errors (with a minor adjustment to the interpreter; see the interpreter sketch below).
● No names (variables, functions).
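A minimal BF interpreter sketch in Python (not the authors' implementation) showing why "no syntax errors" holds: non-command characters are ignored and unmatched brackets are treated as no-ops or jumps past the end, so every sampled character string executes. The bracket-handling convention and the behavior on exhausted input are assumptions here, not necessarily the paper's.

```python
def run_bf(code, inputs, max_steps=5000, tape_len=256, base=256):
    """Tolerant BF interpreter sketch: any string of characters runs."""
    # Precompute bracket jumps; unmatched brackets do not raise errors.
    jump, stack = {}, []
    for i, c in enumerate(code):
        if c == '[':
            stack.append(i)
        elif c == ']':
            if stack:
                j = stack.pop()
                jump[i], jump[j] = j, i
            else:
                jump[i] = i              # unmatched ']' acts as a no-op
    for j in stack:
        jump[j] = len(code)              # unmatched '[' jumps past the end

    tape = [0] * tape_len
    ptr = pc = in_ptr = 0
    out, steps = [], 0
    while pc < len(code) and steps < max_steps:
        c = code[pc]
        if c == '>':   ptr = (ptr + 1) % tape_len            # move pointer right
        elif c == '<': ptr = (ptr - 1) % tape_len            # move pointer left
        elif c == '+': tape[ptr] = (tape[ptr] + 1) % base    # increment cell
        elif c == '-': tape[ptr] = (tape[ptr] - 1) % base    # decrement cell
        elif c == '.': out.append(tape[ptr])                 # write output
        elif c == ',':                                       # read next input (0 when exhausted)
            tape[ptr] = inputs[in_ptr] if in_ptr < len(inputs) else 0
            in_ptr += 1
        elif c == '[' and tape[ptr] == 0:
            pc = jump[pc]                # skip loop body when cell is zero
        elif c == ']' and tape[ptr] != 0:
            pc = jump[pc]                # jump back to the matching '['
        pc += 1
        steps += 1
    return out

# e.g. run_bf(">,[>,]<[.<].", [1, 2, 3, 4, 0]) -> [4, 3, 2, 1, 0] (list reversal)
```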
Training Setup
The RNN generates (by inference) a candidate program; the BF interpreter runs that code on the test-case inputs; a scoring function compares the program outputs with the expected outputs to produce a reward; the reward drives a gradient update of the RNN.
Training Setup: Reward Function
Score per test case: S(Y, Y*) = d(∅, Y*) - d(Y, Y*), where d is a variable-length Hamming distance with base B = 256 and ∅ is the empty output.
Example: input X = [1, 2, 3, 4, 0], expected output Y* = [4, 3, 2, 1, 0], program P = ",>,.<.", program output Y = P(X) = [2, 1].
d(Y, Y*) = 2 + 2 + B + B + B and d(∅, Y*) = B + B + B + B + B, so S = 2B - 4.
Total reward for a program: Reward = ∑ S(P(X), Y*) over the test cases.
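A sketch of this scoring in Python, reusing the run_bf sketch above. The exact distance convention is an assumption: per-position absolute difference, plus a penalty of B = 256 for every missing or extra output position.

```python
BASE = 256

def hamming_dist(y, y_star, base=BASE):
    """Variable-length Hamming distance (assumed convention): sum of
    per-position absolute differences, plus `base` per length mismatch."""
    d = sum(abs(a - b) for a, b in zip(y, y_star))
    return d + base * abs(len(y) - len(y_star))

def score(y, y_star):
    """S(Y, Y*) = d(empty, Y*) - d(Y, Y*)."""
    return hamming_dist([], y_star) - hamming_dist(y, y_star)

def reward(program, test_cases):
    """Total reward: sum of per-test-case scores."""
    return sum(score(run_bf(program, x), y_star) for x, y_star in test_cases)

# Slide example: reward(",>,.<.", [([1, 2, 3, 4, 0], [4, 3, 2, 1, 0])])
# P(X) = [2, 1], so d = 2 + 2 + 3*BASE vs d(empty) = 5*BASE, giving S = 2*BASE - 4 = 508
```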
Problems with policy gradient (REINFORCE)
● Catastrophic forgetting and unstable learning
● Sample inefficient
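For context, the policy gradient referred to here is the standard REINFORCE estimator (textbook form with a constant baseline b for variance reduction; not specific to this paper), where p_θ is the RNN's distribution over programs P and R is the reward:

```latex
\nabla_\theta \, \mathbb{E}_{P \sim p_\theta}[R(P)]
  = \mathbb{E}_{P \sim p_\theta}\!\left[(R(P) - b)\,\nabla_\theta \log p_\theta(P)\right]
  \approx \frac{1}{N}\sum_{i=1}^{N} (R(P_i) - b)\,\nabla_\theta \log p_\theta(P_i),
  \qquad P_i \sim p_\theta
```

High-variance, on-policy sample estimates of this gradient are what make REINFORCE sample-inefficient and prone to the instability noted above.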
Solution: Priority Queue Training (PQT)
Code is sampled from the RNN and scored by the reward function; the highest-reward unique programs seen so far are kept in a max-unique priority queue, and the queue contents are reused as training targets for the RNN (see the sketch below).
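A minimal sketch of the PQT loop, assuming a hypothetical policy object with `sample(batch_size)` (returns program strings) and `train_on(programs)` (one maximum-likelihood gradient step on those strings); these method names are illustrative, not the authors' API.

```python
import heapq

def pqt_train(policy, reward_fn, num_iters=1000, batch_size=64, k=10):
    """Priority Queue Training sketch: keep the top-k unique programs by
    reward and repeatedly train the policy to imitate them."""
    queue = []    # min-heap of (reward, program); lowest reward at the root
    seen = set()  # programs currently admitted to the queue (uniqueness)
    for _ in range(num_iters):
        for prog in policy.sample(batch_size):
            if prog in seen:
                continue
            r = reward_fn(prog)
            if len(queue) < k:
                heapq.heappush(queue, (r, prog))
                seen.add(prog)
            elif r > queue[0][0]:
                # New program beats the current worst queue entry: swap them.
                _, evicted = heapq.heappushpop(queue, (r, prog))
                seen.discard(evicted)
                seen.add(prog)
        # Supervised (log-likelihood) update toward the best programs so far.
        policy.train_on([prog for _, prog in queue])
```

In the paper this queue-based maximum-likelihood term can also be combined with the policy-gradient objective and an entropy bonus; the sketch shows the pure-PQT variant only.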
Results
Fixed Length Programs (each synthesized program has length 100)
remove:  <>[,[<+,.],<<]],[[-[+.>>][]>[>[>[[<+>>.+<>]>]<<>]],]>+-++--,>[+[[<----].->+]->]]]-[,.]+>>,-,,-]><,,]
reverse: ,[[>,<>]]-]+[<[.,++,<]<>[->.,+,[<+]<-]<,,<<>>[[[<+<[],.>->]>,<-]<]<>,-<,,[+>,<,><.[.<-+,+-<]+<[,+-<>
add:     ,>,[-<+>][,],>]<]-<.+,,+,<.,>]>,[><<-,][+-[.[[+<[.>]],>.[]-<,],+,[,->]>>->+,[+[>]-,-]--,.,>+-<<]]<,+
Synthesized vs "Ground Truth"
Task              | Synthesized                              | Experimenter's best solution
reverse           | ,[>,]+[,<.]                              | >,[>,]<[.<].
remove            | ,-[+.,-]+[,.]                            | ,[-[+.[-]],].
count-char        | ,[-[>]>+<<<<,]>.                         | >,[-[<->[-]]<+>,]<.
add               | ,[+>,<<->],<.,.                          | ,>,<[->+<]>.
bool-logic        | ,+>,<[,>],<+<.                           | ???
print             | ++++++++.---.+++++++..+++.               | ++++++++.---.+++++++..+++.
zero-cascade      | ,.,[.>.-<,[[[.+,>+[-.>]..<]>+<<]>+<<]]   | ,[.>[->+>.<<]>+[-<+>]<<,]
cascade           | ,[.,.[.,.[..,[....,[.....,[.>]<]].]]     | ,>>+<<[>>[-<+>]<[->+<<.>]>+<<,].
shift-left        | ,>,[.,]<.>.                              | ,>,[.,]<.,.
shift-right       | ,[>,]<.,<<<<<.[>.]                       | >,[>,]<.[-]<[<]>[.>].
unriffle          | -[,>,[.,>,]<[>,]<.]                      | >,[>,[.[-]],]<[.<].
remove-last       | ,>,[<.>>,].                              | ,>,[[<.[-]>[-<+>]],].
remove-last-two   | >,<,>>,[<.,[<.[>]],].                    | ,>,>,[[<<.[-]>[-<+>]>[-<+>]],].
echo-alternating  | ,[.,>,]<<<<.[>.]                         | >,[.,>,]<<[<]>[.>].
length            | ,[>+<,]>.                                | >+>,[[<]>+[>],]<[<]>-.
echo-second-seq   | ,[,]-[,.]                                | ,[,],[.,].
echo-nth-seq      | ,-[->-[,]<]-[,.]                         | ,-[->,[,]<],[.,].
What's next?
Scale up to harder coding problems and more complex programming languages:
● Augment RL with supervised training on a large corpus of programs.
● Give the code synthesizer access to auxiliary information, such as stack traces and program execution internals.
● Data augmentation techniques, such as Hindsight Experience Replay.
● Few-shot learning techniques can help with generalization issues, e.g. MAML.
Thank you! Questions? Thank you to my coauthors: Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le.
Prior Work
● Algorithm induction
  ○ Neural Programmer, A. Neelakantan et al.
  ○ Neural Programmer-Interpreters, S. Reed et al.
● Domain-specific languages
  ○ RobustFill, J. Devlin et al.
  ○ DeepCoder, M. Balog et al.
  ○ TerpreT, A. Gaunt et al.
● Precursors to PQT
  ○ Noisy Cross-Entropy Method, I. Szita et al.
  ○ Neural Symbolic Machines, C. Liang et al.