
Programming with Big Code: Lessons, Techniques, Applications

Pavol Bielik, Veselin Raychev, Martin Vechev
Department of Computer Science, ETH Zurich


  1. Programming with “Big Code”: Lessons, Techniques, Applications
     Pavol Bielik, Veselin Raychev, Martin Vechev
     Department of Computer Science, ETH Zurich

  2. Work @ ETH Zurich
     Work on “Big Code” started a few years ago.
     Veselin Raychev, Pavol Bielik, Svetoslav Karaivanov, Christine Zeller, Pascal Roos, Prof. Martin Vechev, Prof. Andreas Krause
     - Code Completion with Statistical Language Models, PLDI 2014
     - Machine Translation for Programming Languages, Onward 2014
     - Predicting Program Properties from “Big Code”, POPL 2015
     - Fast and Precise Statistical Code Completion, ETH TR
     - Statistical Feedback Generation for Programs, ETH TR
     - Programming with Big Code: Lessons, Techniques and Applications, SNAPL 2015

  3. Applications
     [PLDI 14] SLANG: Code Completion
         Intent i = new Intent();
         ?
         ctx.sendBroadcast(i);
     All of these benefit from “Big Code” and lead to applications not possible with previous techniques.

  4. Applications
     [PLDI 14] SLANG: Code Completion
         Intent i = new Intent();
         ?
         ctx.sendBroadcast(i);
     [Onward 14] Programming Language Translation
         P(Java | C#) ∝ P(C# | Java) P(Java)
     All of these benefit from “Big Code” and lead to applications not possible with previous techniques.

  5. Applications
     [PLDI 14] SLANG: Code Completion
         Intent i = new Intent();
         ?
         ctx.sendBroadcast(i);
     [Onward 14] Programming Language Translation
         P(Java | C#) ∝ P(C# | Java) P(Java)
     [submitted] Statistical Feedback Generation
         likely error
         ...
         for x in range(a):
             print a[x]
     All of these benefit from “Big Code” and lead to applications not possible with previous techniques.

  6. Applications
     [PLDI 14] SLANG: Code Completion
         Intent i = new Intent();
         ?
         ctx.sendBroadcast(i);
     [Onward 14] Programming Language Translation
         P(Java | C#) ∝ P(C# | Java) P(Java)
     [POPL 15] JSNice: Deobfuscation, Type Prediction
     [submitted] Statistical Feedback Generation
         likely error
         ...
         for x in range(a):
             print a[x]
     All of these benefit from “Big Code” and lead to applications not possible with previous techniques.
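
The translation application above ranks a Java candidate by combining P(C# | Java) with P(Java), which is Bayes' rule with the constant P(C#) dropped. A minimal sketch of that selection step follows. The candidate strings and all probabilities are invented toy values, and `best_translation` is a hypothetical helper, not part of the authors' system.

```python
import math

# Hypothetical toy numbers for one fixed C# input such as Console.WriteLine(s);
# They are invented for illustration and do not come from the slides.
# translation_model[java] stands in for P(C# | Java).
translation_model = {
    "System.out.println(s);": 0.6,
    "System.out.print(s);": 0.3,
    "println(s);": 0.1,
}
# language_model[java] stands in for P(Java), e.g. from an n-gram model.
language_model = {
    "System.out.println(s);": 0.05,
    "System.out.print(s);": 0.02,
    "println(s);": 0.001,
}


def best_translation(tm, lm):
    """Return the candidate maximizing log P(C# | Java) + log P(Java)."""
    return max(tm, key=lambda java: math.log(tm[java]) + math.log(lm[java]))


best = best_translation(translation_model, language_model)
print(best)
```

Working in log space avoids underflow when the two factors are products over many tokens.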

  7. Probabilistic Programming Systems: Dimensions
     - Applications
     - Intermediate Representation
     - Analyze Program (PL)
     - Train Model (ML)
     - Query Model (ML)

  8. Probabilistic Programming Systems: Dimensions
     What is a generic metric for code?
     ✔ Cross Entropy → ✗ Code Completion
     ✔ BLEU Score → ✗ Program Translation
     Traditional metrics might not be indicative of client performance.
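
For reference, cross entropy measures the average number of bits a model needs per token of held-out code. A minimal sketch, assuming a unigram model with add-one smoothing (a deliberately simple stand-in, not a model from the talk), with hypothetical helper names:

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Unigram probabilities with add-one smoothing over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {t: (counts[t] + 1) / total for t in vocab}


def cross_entropy(model, tokens):
    """Average negative log2 probability per token (bits per token)."""
    return -sum(math.log2(model[t]) for t in tokens) / len(tokens)


vocab = {"open", "send"}
model = train_unigram(["open", "open"], vocab)
print(cross_entropy(model, ["open", "send"]))
```

A lower value means the model fits the token stream better. As the slide notes, that does not by itself guarantee better completions in a client tool.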

  9. Probabilistic Programming Systems: Dimensions
     What is the best program representation?

  10. Probabilistic Programming Systems: Dimensions
      What is the best program representation?
      Sequences:
          req → {<open, 0>, <send, 0>}
          source → {..., <open, 2>}
      Trees:
          syntax tree with root =, children a and +, where + has children x and y
      Graphical Models:
          [figure not recovered]
      Feature Vectors:
          req → (0,0,1,1,0)
          source → (1,0,0,0,0)
          ...

  11. Probabilistic Programming Systems: Dimensions
      What is the best program representation?
      Choosing the right representation is crucial.
      Feedback Generation, sequence representations:
          46.4%  Allamanis et al. [2013]
          50.8%  Hsiao et al. [2014]
          75.3%  Incorporate semantic information
          86.3%  Incorporate dataflow analysis

  12. Probabilistic Programming Systems: Dimensions
      How to extract program representation?
      - SLANG (APIs): alias and typestate analysis
      - JSNice (Variable Names): scope and alias analysis
      - Feedback Generation: alias, control-flow and typestate analysis
      Example:
          req.open("GET", source, false);
          req → {<open, 0>, <send, 0>}
          source → {..., <open, 2>}

  13. Probabilistic Programming Systems: Dimensions
      How to extract program representation?
      - SLANG (APIs): alias and typestate analysis
      - JSNice (Variable Names): scope and alias analysis
      - Feedback Generation: alias, control-flow and typestate analysis
      Design scalable yet precise enough algorithms.
      [Plot: precision (0 to 1) vs. % of data used (1%, 10%, 100%); curves: no alias analysis, with alias analysis]
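
The sequence representation shown for `req.open("GET", source, false);` can be sketched as follows: each object collects <method, position> events, where position 0 is the receiver and position i is the i-th argument. This is a simplified illustration under stated assumptions. The input is a hand-written list of calls instead of parsed code, the `req.send()` call is assumed from the slide's output, literals are marked as `None`, and no alias or typestate analysis is performed. `extract_sequences` is a hypothetical name.

```python
from collections import defaultdict


def extract_sequences(calls):
    """Map each variable to its ordered list of (method, position) events.

    Position 0 is the receiver; position i >= 1 is the i-th argument.
    Arguments given as None (literals) are skipped.
    """
    sequences = defaultdict(list)
    for receiver, method, args in calls:
        sequences[receiver].append((method, 0))
        for position, arg in enumerate(args, start=1):
            if arg is not None:
                sequences[arg].append((method, position))
    return dict(sequences)


# req.open("GET", source, false); followed by an assumed req.send();
calls = [
    ("req", "open", [None, "source", None]),
    ("req", "send", []),
]
print(extract_sequences(calls))
```

Without alias analysis, two names that refer to the same object would produce two separate, shorter sequences, which is one reason the slides pair extraction with alias analysis.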

  14. Probabilistic Programming Systems: Dimensions
      What is the suitable probabilistic model?
      - N-gram language model
      - Probabilistic context-free grammars
      - Neural networks
      - (Structured) Support vector machine
      - Conditional Random Fields
      - ...

  15. Probabilistic Programming Systems: Dimensions
      What is the suitable probabilistic model?
      - N-gram language model
      - Probabilistic context-free grammars
      - Neural networks
      - (Structured) Support vector machine
      - Conditional Random Fields
      - ...
      Structured prediction is critical:
          25.3%  Baseline
          54.1%  Independent
          63.4%  Structured
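
The first model on the list, an N-gram language model, can be sketched for API-call completion as a bigram counter over call sequences. This is a toy illustration with invented training sequences and hypothetical function names, not the SLANG implementation.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each call is followed by each other call."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def complete(counts, prev):
    """Return the most frequent successor of prev, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


# Invented training data: method-call sequences on request-like objects.
training = [
    ["open", "send"],
    ["open", "setRequestHeader", "send"],
    ["open", "send"],
]
bigram_model = train_bigram(training)
print(complete(bigram_model, "open"))
```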

  16. Programming with “Big Code”
      - Applications: Translation, Program synthesis, Code completion, Deobfuscation, Feedback generation
      - Intermediate Representation: Graphical Models, Sequences (sentences), Translation Table, Feature Vectors, Trees
      - Analyze Program (PL): control-flow analysis, alias analysis, scope analysis, typestate analysis
      - Train Model (ML): Neural Networks, SVM, Structured SVM, N-gram language model
      - Query Model: argmax_{y ∈ Ω} P(y | x)

  17. Programming with “Big Code”
      - Applications: Translation, Program synthesis, Code completion, Deobfuscation, Feedback generation
      - Intermediate Representation: Graphical Models, Sequences (sentences), Translation Table, Feature Vectors, Trees
      - Analyze Program (PL): control-flow analysis, alias analysis, scope analysis, typestate analysis
      - Train Model (ML): Neural Networks, SVM, Structured SVM, N-gram language model
      - Query Model: argmax_{y ∈ Ω} P(y | x), Greedy MAP Inference
      http://www.nice2predict.org/
      More information and tutorials at: http://www.srl.inf.ethz.ch/spas.php
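
The query step, argmax over y ∈ Ω of P(y | x) with greedy MAP inference, can be illustrated with a small coordinate-ascent sketch: each unknown node in turn takes the label that most increases the total pairwise score, until nothing changes. This is an assumption-laden toy and not the Nice2Predict algorithm. The graph, labels, weights and the names `score` and `greedy_map` are all invented for illustration.

```python
def score(assignment, edges, weights):
    """Sum of pairwise weights over all edges under a full assignment."""
    return sum(
        weights.get((assignment[a], assignment[b]), 0.0) for a, b in edges
    )


def greedy_map(nodes, candidates, edges, weights, init):
    """Greedily relabel one unknown node at a time while the score improves."""
    assignment = dict(init)
    improved = True
    while improved:
        improved = False
        for node in nodes:
            best_label = assignment[node]
            best_score = score(assignment, edges, weights)
            for label in candidates:
                trial = {**assignment, node: label}
                trial_score = score(trial, edges, weights)
                if trial_score > best_score:
                    best_label, best_score = label, trial_score
            if best_label != assignment[node]:
                assignment[node] = best_label
                improved = True
    return assignment


# Toy problem: v1 and v2 are unknown names, k is a known property name.
edges = [("v1", "v2"), ("v2", "k")]
weights = {("i", "arr"): 2.0, ("arr", "length"): 3.0, ("x", "x"): 0.5}
init = {"v1": "x", "v2": "x", "k": "length"}
prediction = greedy_map(["v1", "v2"], ["i", "arr", "x"], edges, weights, init)
print(prediction)
```

The loop terminates because every accepted change strictly increases the score and the label space is finite. Like any greedy search, it can stop at a local optimum.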

  18. General framework
      http://www.nice2predict.org/
      We have open-sourced our prediction engine and we are extending it with new capabilities.
      Upcoming PLDI’15 tutorial

