Neural Program Synthesis with Priority Queue Training


  1. Neural Program Synthesis with Priority Queue Training Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le https://arxiv.org/abs/1801.03526

  2. Why Program Synthesis?
     ● One of the hard AI reasoning domains
     ● A tool for planning in robotics
     ● Increased interpretability (humans can read code more easily than NN weights)

  3. Deep Reinforcement Learning
     ● Value-based RL, e.g. Q-learning
     ● Policy-based RL, e.g. policy gradient
     [Agent-environment diagram from https://becominghuman.ai/the-very-basics-of-reinforcement-learning-154f28a79071]

  4. Deep RL for Combinatorial Optimization ● Neural Architecture Search with Reinforcement Learning

  5. Deep RL for Combinatorial Optimization ● Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

  6. Deep RL for Combinatorial Optimization ● Neural Combinatorial Optimization with Reinforcement Learning

  7. "Fundamental" Program Synthesis ● Focus on algorithmic coding problems. ● No ground-truth program solutions. ● Simple Turing-complete language.

  8. HelloWorld.bf

     ++++++++                Set Cell #0 to 8
     [
         >++++               Add 4 to Cell #1; this will always set Cell #1 to 4
         [                   as the cell will be cleared by the loop
             >++             Add 2 to Cell #2
             >+++            Add 3 to Cell #3
             >+++            Add 3 to Cell #4
             >+              Add 1 to Cell #5
             <<<<-           Decrement the loop counter in Cell #1
         ]                   Loop till Cell #1 is zero; number of iterations is 4
         >+                  Add 1 to Cell #2
         >+                  Add 1 to Cell #3
         >-                  Subtract 1 from Cell #4
         >>+                 Add 1 to Cell #6
         [<]                 Move back to the first zero cell you find; this will
                             be Cell #1 which was cleared by the previous loop
         <-                  Decrement the loop Counter in Cell #0
     ]                       Loop till Cell #0 is zero; number of iterations is 8

     The result of this is:
     Cell No :   0   1   2   3   4   5   6
     Contents:   0   0  72 104  88  32   8
     Pointer :   ^

     >>.                     Cell #2 has value 72 which is 'H'
     >---.                   Subtract 3 from Cell #3 to get 101 which is 'e'
     +++++++..+++.           Likewise for 'llo' from Cell #3
     >>.                     Cell #5 is 32 for the space
     <-.                     Subtract 1 from Cell #4 for 87 to give a 'W'
     <.                      Cell #3 was set to 'o' from the end of 'Hello'
     +++.------.--------.    Cell #3 for 'rl' and 'd'
     >>+.                    Add 1 to Cell #5 gives us an exclamation point
     >++.                    And finally a newline from Cell #6

     https://en.wikipedia.org/wiki/Brainfuck

  9. Anatomy of BF Turing complete! https://esolangs.org/wiki/Brainfuck#Computational_class

  10. BF Execution Demo: Reverse a list
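     For readers who want to reproduce the demo, here is a minimal Python BF interpreter: an illustrative sketch, not the paper's implementation. It assumes balanced brackets and reads `,` as 0 once input is exhausted, then runs the hand-written reverse program from slide 18 on a 0-terminated list.

         def run_bf(program, inputs, max_steps=10_000):
             # Precompute matching brackets (assumes the program is balanced;
             # the paper instead tweaks the interpreter so any string runs).
             jump, stack = {}, []
             for i, ch in enumerate(program):
                 if ch == '[':
                     stack.append(i)
                 elif ch == ']':
                     j = stack.pop()
                     jump[i], jump[j] = j, i

             tape = [0]
             ptr = pc = in_pos = steps = 0
             output = []
             while pc < len(program) and steps < max_steps:
                 ch = program[pc]
                 if ch == '>':
                     ptr += 1
                     if ptr == len(tape):
                         tape.append(0)          # grow the tape on demand
                 elif ch == '<':
                     ptr = max(ptr - 1, 0)       # clamp at the left edge
                 elif ch == '+':
                     tape[ptr] = (tape[ptr] + 1) % 256
                 elif ch == '-':
                     tape[ptr] = (tape[ptr] - 1) % 256
                 elif ch == '.':
                     output.append(tape[ptr])
                 elif ch == ',':
                     tape[ptr] = inputs[in_pos] if in_pos < len(inputs) else 0
                     in_pos += 1
                 elif ch == '[' and tape[ptr] == 0:
                     pc = jump[pc]
                 elif ch == ']' and tape[ptr] != 0:
                     pc = jump[pc]
                 pc += 1
                 steps += 1
             return output

         # Reverse a 0-terminated list with the hand-written program from slide 18.
         print(run_bf('>,[>,]<[.<].', [1, 2, 3, 4, 0]))  # [4, 3, 2, 1, 0]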

  11. Why BF
      ● Turing complete, and suitable for any algorithmic task in theory.
      ● Many algorithms have surprisingly elegant BF implementations.
      ● No syntax errors (with minor adjustment to interpreter).
      ● No names (variables, functions).

  12. Training Setup
      [Diagram] The RNN infers a candidate program (code); the BF interpreter runs that code on the test-case inputs; the scoring/reward function compares the interpreter's outputs to the expected outputs; the resulting reward drives the gradient update of the RNN.
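      A rough numpy sketch of the gradient-update step (illustrative only: the RNN is stood in for by random logits, and the reward is a placeholder rather than the interpreter's actual score):

          import numpy as np

          BF_TOKENS = list('+-<>[].,')  # plus an end-of-sequence token in the paper
          rng = np.random.default_rng(0)

          max_len = 10
          logits = rng.normal(size=(max_len, len(BF_TOKENS)))   # stand-in for RNN output
          probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

          # Sample a program, one character per step, from the softmax policy.
          token_ids = [rng.choice(len(BF_TOKENS), p=p) for p in probs]
          program = ''.join(BF_TOKENS[i] for i in token_ids)

          reward = 508.0  # placeholder: would be the summed score over test cases

          # REINFORCE-style surrogate objective: reward-weighted log-likelihood
          # of the sampled program; its gradient is the policy-gradient estimator.
          log_probs = np.log([probs[t, i] for t, i in enumerate(token_ids)])
          surrogate_loss = -reward * log_probs.sum()
          print(program, surrogate_loss)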

  13. Training Setup: Reward Function
      Score for one test case: S(Y, Y*) = d(∅, Y*) - d(Y, Y*), where d is a variable-length Hamming distance over byte values (base B = 256) and ∅ is the empty output.
      Example: input X = [1, 2, 3, 4, 0], expected output Y* = [4, 3, 2, 1, 0], program P = ",>,.<.", program output Y = P(X) = [2, 1].
      d(Y, Y*) = 2 + 2 + B + B + B, while d(∅, Y*) = B + B + B + B + B, so S(Y, Y*) = 2B - 4.
      Total reward for a program: Reward = ∑ over test cases of S(P(X), Y*).
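      The scoring function is easy to state in code. This sketch follows the slide's definitions: absolute byte difference per aligned position, a full base-B penalty per missing or extra position (function names are illustrative, not from the paper's released code):

          B = 256  # base: one BF cell holds a byte

          def hamming_distance(y, y_star, base=B):
              # Aligned positions contribute the absolute byte difference;
              # every missing or extra position costs a full `base`.
              n = min(len(y), len(y_star))
              d = sum(abs(y[i] - y_star[i]) for i in range(n))
              d += base * (max(len(y), len(y_star)) - n)
              return d

          def score(y, y_star, base=B):
              # S(Y, Y*) = d(empty, Y*) - d(Y, Y*): positive when the program's
              # output is closer to the target than an empty output would be.
              return hamming_distance([], y_star, base) - hamming_distance(y, y_star, base)

          # Worked example from the slide: Y = [2, 1], Y* = [4, 3, 2, 1, 0].
          # d(Y, Y*) = 4 + 3B, d([], Y*) = 5B, so S = 2B - 4 = 508.
          print(score([2, 1], [4, 3, 2, 1, 0]))  # 508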

  14. Problems with policy gradient (REINFORCE)
      ● Catastrophic forgetting and unstable learning
      ● Sample inefficient

  15. Solution: Priority Queue Training (PQT)
      [Diagram] Code sampled from the RNN is scored by the reward function; the highest-reward unique programs are kept in a max-unique priority queue, and the queue contents are used as training targets for the RNN.
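      A minimal sketch of the max-unique priority queue (class and method names are illustrative, not the paper's released code): it keeps the K highest-reward distinct programs seen so far, and the RNN is trained to maximize the log-likelihood of the programs it holds.

          import heapq

          K = 10  # queue capacity (a hyperparameter)

          class MaxUniqueQueue:
              def __init__(self, capacity=K):
                  self.capacity = capacity
                  self.heap = []          # min-heap of (reward, program)
                  self.programs = set()   # enforce uniqueness of programs

              def push(self, program, reward):
                  if program in self.programs:
                      return
                  if len(self.heap) < self.capacity:
                      heapq.heappush(self.heap, (reward, program))
                      self.programs.add(program)
                  elif reward > self.heap[0][0]:
                      # Evict the current lowest-reward entry to make room.
                      _, evicted = heapq.heapreplace(self.heap, (reward, program))
                      self.programs.discard(evicted)
                      self.programs.add(program)

              def training_targets(self):
                  # Programs whose log-likelihood the RNN is trained to maximize.
                  return [program for _, program in self.heap]

          # Usage with placeholder programs/rewards; in training these come from
          # sampling the RNN and scoring with the reward function.
          queue = MaxUniqueQueue()
          for program, reward in [(',[.,]', 300.0), ('>,[>,]<[.<].', 508.0), (',[.,]', 300.0)]:
              queue.push(program, reward)
          print(queue.training_targets())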

  16. Results

  17. Fixed Length Programs (program length fixed at 100)
      remove   <>[,[<+,.],<<]],[[-[+.>>][]>[>[>[[<+>>.+<>]>]<<>]],]>+-++--,>[+[[<----].->+]->]]]-[,.]+>>,-,,-]><,,]
      reverse  ,[[>,<>]]-]+[<[.,++,<]<>[->.,+,[<+]<-]<,,<<>>[[[<+<[],.>->]>,<-]<]<>,-<,,[+>,<,><.[.<-+,+-<]+<[,+-<>
      add      ,>,[-<+>][,],>]<]-<.+,,+,<.,>]>,[><<-,][+-[.[[+<[.>]],>.[]-<,],+,[,->]>>->+,[+[>]-,-]--,.,>+-<<]]<,+

  18. Synthesized vs "Ground Truth" Synthesized Experimenter's best solution reverse ,[>,]+[,<.] >,[>,]<[.<]. remove ,-[+.,-]+[,.] ,[-[+.[-]],]. count-char ,[-[>]>+<<<<,]>. >,[-[<->[-]]<+>,]<. add ,[+>,<<->],<.,. ,>,<[->+<]>. bool-logic ,+>,<[,>],<+<. ??? print ++++++++.---.+++++++..+++. ++++++++.---.+++++++..+++. zero-cascade ,.,[.>.-<,[[[.+,>+[-.>]..<]>+<<]>+<<]] ,[.>[->+>.<<]>+[-<+>]<<,] cascade ,[.,.[.,.[..,[....,[.....,[.>]<]].]] ,>>+<<[>>[-<+>]<[->+<<.>]>+<<,]. shift-left ,>,[.,]<.>. ,>,[.,]<.,. shift-right ,[>,]<.,<<<<<.[>.] >,[>,]<.[-]<[<]>[.>]. unriffle -[,>,[.,>,]<[>,]<.] >,[>,[.[-]],]<[.<]. remove-last ,>,[<.>>,]. ,>,[[<.[-]>[-<+>]],]. remove-last-two >,<,>>,[<.,[<.[>]],]. ,>,>,[[<<.[-]>[-<+>]>[-<+>]],]. echo-alternating ,[.,>,]<<<<.[>.] >,[.,>,]<<[<]>[.>]. length ,[>+<,]>. >+>,[[<]>+[>],]<[<]>-. echo-second-seq ,[,]-[,.] ,[,],[.,]. echo-nth-seq ,-[->-[,]<]-[,.] ,-[->,[,]<],[.,].

  19. What's next? Scale up to harder coding problems and more complex programming languages.
      ● Augment RL with supervised training on a large corpus of programs.
      ● Give the code synthesizer access to auxiliary information, such as stack traces and program execution internals.
      ● Data augmentation techniques, such as hindsight experience replay.
      ● Few-shot learning techniques can help with generalization issues, e.g. MAML.

  20. Thank you! Questions? Thank you to my coauthors: Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le.

  21. Prior Work
      ● Algorithm induction
        ○ Neural Programmer, A. Neelakantan, et al.
        ○ Neural Programmer-Interpreters, S. Reed, et al.
      ● Domain specific languages
        ○ RobustFill, J. Devlin, et al.
        ○ DeepCoder, M. Balog, et al.
        ○ TerpreT, A. Gaunt, et al.
      ● Precursors to PQT
        ○ Noisy Cross-Entropy Method, I. Szita, et al.
        ○ Neural Symbolic Machines, C. Liang, et al.
