Programming Languages and Machine Learning
Martin Vechev
DeepCode.ai and ETH Zurich
PL Research: Last 10 Years (sample)
• (Semi-)Automated Program Synthesis
  - Mostly learning functions/algorithms over discrete spaces (from examples, natural language, components, partial specs, etc.)
• Automated Symbolic Reasoning
  - Abstract Interpretation = theory of sound & precise approximation
  - SMT solvers
• Approximate/Probabilistic Programming
  - Applications / Analysis / Synthesis
Two-part talk (22 + 3)

Part 1: Learning-based Programming Engines (SLANG, Deep3) -- http://plml.ethz.ch
1. Pick a structure of interest, e.g., trees.
2. Define a DSL for expressing functions (can be Turing complete):
       TCond   ::= ε | WriteOp TCond | MoveOp TCond
       MoveOp  ::= Up, Left, Right, DownFirst, DownLast, NextDFS, PrevDFS, NextLeaf, PrevLeaf, PrevNodeType, PrevNodeValue
       WriteOp ::= WriteValue, WriteType, WritePos
3. Synthesize f_best ∈ DSL from dataset D:  f_best = argmin_{f ∈ DSL} cost(D, f)
4. Use f_best on new structures:  δ ← f_best(·)

Part 2: PSI: Exact Solver for Probabilistic Programs -- http://psisolver.org

    def main() {
      p := Uniform(0,1);
      r := [1,1,0,1,0];
      for i in [0..r.len] {
        observe(Bernoulli(p) == r[i]);
      }
      return p;
    }
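The PSI program above asks for the exact posterior of p given the observed flips in r. As a worked check of what an exact solver must return, here is a minimal sketch using sympy for the symbolic integration (sympy is our stand-in here, not part of PSI):

    # Exact posterior of p for the PSI example above: Uniform(0,1) prior
    # conditioned on the Bernoulli observations r = [1,1,0,1,0].
    from sympy import symbols, integrate, simplify

    p = symbols('p', positive=True)
    r = [1, 1, 0, 1, 0]

    # Unnormalized posterior: uniform prior (density 1 on [0,1]) times the
    # likelihood of each observed flip.
    unnorm = 1
    for obs in r:
        unnorm *= p if obs == 1 else (1 - p)

    Z = integrate(unnorm, (p, 0, 1))   # normalizing constant: B(4,3) = 1/60
    posterior = simplify(unnorm / Z)   # 60*p**3*(1-p)**2, a Beta(4,3) density
    print(posterior)

The exact answer is the Beta(4,3) density 60·p³·(1−p)², the kind of closed form an exact symbolic solver produces.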
Probabilistic Learning from Code

Task → Statistical Engine (probabilistic model: PL + ML) → Solution

Trained on "Big Code":
• 15 million repositories (number of repositories grown over the last 5 years)
• Billions of lines of code
• High quality, tested, maintained programs
Probabilistic Learning from Code
Probabilistically likely solutions to problems hard to solve otherwise

Joint work with: Svetoslav Karaivanov, Pascal Roos, Benjamin Bichsel, Timon Gehr, Petar Tsankov, Veselin Raychev, Andreas Krause, Pavol Bielik, Christine Zeller, Mateo Panzacchi

Publications:
• Program Synthesis for Character-Level Language Modeling, ICLR'17 submission
• Learning a Static Analyzer from Data, https://arxiv.org/abs/1611.01752
• Statistical Deobfuscation of Android Applications, ACM CCS'16
• Probabilistic Model for Code with Decision Trees, ACM OOPSLA'16
• PHOG: Probabilistic Model for Code, ICML'16
• Learning Programs from Noisy Data, ACM POPL'16
• Predicting Program Properties from "Big Code", ACM POPL'15
• Code Completion with Statistical Language Models, ACM PLDI'14
• Machine Translation for Programming Languages, ACM Onward!'14
• more: http://plml.ethz.ch

Statistical engines: apk-deguard.com, jsnice.org, nice2predict.org, DEEP3, SLANG
JSNice.org
• Users from every country
• ~200,000 users
• Top ranked tool
A Key Question

Data → Learning → Probabilistic Model

Desired properties:
• Widely applicable
• Efficient learning
• High precision
• Explainable predictions
Training dataset D (three programs):

    f.open("f2", "r");  f.read();
    f.open("f2", "w");  f.write("c");
    f.open("f1", "r");  f.read();

Query:

    f.open("file", "r");
    f.?

3-gram model on tokens (Hindle et al., ACM ICSE'12), context δ = the last two tokens "f .":

    P(open  | f.) ~ 3/6
    P(read  | f.) ~ 2/6
    P(write | f.) ~ 1/6

→ predicts f.open, even though the file is already open.

Probabilistic model on APIs (Raychev et al., ACM PLDI'14), context δ = the previous API call on f, here "open":

    P(read  | open) ~ 2/3
    P(write | open) ~ 1/3

→ predicts f.read.

What should the context δ be?
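Both baselines reduce to counting conditional frequencies over the training data. A minimal sketch, with the toy dataset above hard-coded:

    # Token-level vs. API-level context models, estimated by counting.
    from collections import Counter, defaultdict

    # API call sequences of the three training programs above.
    train = [["open", "read"], ["open", "write"], ["open", "read"]]

    # Token-style context "f .": every call site looks the same, so the
    # model effectively counts how often each API appears overall.
    token_model = Counter(api for prog in train for api in prog)
    print(token_model["open"] / sum(token_model.values()))  # P(open | f.) = 3/6

    # API context: condition on the previous API call on the same object.
    api_model = defaultdict(Counter)
    for prog in train:
        for prev, cur in zip(prog, prog[1:]):
            api_model[prev][cur] += 1
    print(api_model["open"]["read"] / sum(api_model["open"].values()))  # P(read | open) = 2/3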
Key idea: synthesize a function f : program → context δ

"...All problems in computer science can be solved by another level of indirection..." -- David Wheeler
Creating probabilistic models: our method
[Learning Programs from Noisy Data, ACM POPL'16; PHOG: Probabilistic Model for Code, ICML'16; Probabilistic Model for Code with Decision Trees, ACM OOPSLA'16]

1. Pick a structure of interest, e.g., ASTs.
2. Define a DSL for expressing functions (can be Turing complete).
3. Synthesize f_best ∈ DSL from dataset D:  f_best = argmin_{f ∈ DSL} cost(D, f)
4. Use f_best to compute the context δ and predict.
Step 1: Pick a structure of interest

Let it be abstract syntax trees (ASTs) of programs.

JavaScript program:

    elem.notify({
      position: 'top',
      autoHide: false,
      delay: 100
    });

AST:

    CallExpression
    ├── MemberExpression
    │   ├── Identifier: elem
    │   └── Property: notify
    └── ObjectExpression
        ├── Property: position → String 'top'
        ├── Property: autoHide → Boolean false
        └── Property: delay → 100
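As a concrete illustration of step 1, here is a minimal sketch using Python's ast module as a stand-in for a JavaScript parser (the Python analogue of the call is our own choice):

    # Parse a program into the tree structure the method operates on.
    import ast

    tree = ast.parse("elem.notify(position='top', autoHide=False, delay=100)")
    print(ast.dump(tree, indent=2))   # indent= requires Python 3.9+
    # Call(func=Attribute(value=Name(id='elem'), attr='notify'),
    #      keywords=[keyword(arg='position', ...), keyword(arg='autoHide', ...), ...])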
Step 2: Define a DSL over the structure

Syntax:

    TCond   ::= ε | WriteOp TCond | MoveOp TCond
    MoveOp  ::= Up, Left, Right, DownFirst, DownLast, NextDFS, PrevDFS, NextLeaf, PrevLeaf, PrevNodeType, PrevNodeValue, PrevNodeContext
    WriteOp ::= WriteValue, WriteType, WritePos

Semantics: a TCond program walks over the AST. Each MoveOp moves the current position (e.g., Up to the parent, Left to the left sibling), and each WriteOp appends a feature of the current node to the accumulated context: δ ← δ · feature(node). For example, the program "Up Left WriteValue" moves to the parent, then to its left sibling, then records that node's value in δ.
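A minimal sketch of an interpreter for a subset of these operators, under a simplified node representation of our own; the real semantics in PHOG/Deep3 cover all the operators above:

    # Execute a TCond program over a toy AST, accumulating the context delta.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Node:
        type: str
        value: str = ""
        children: List["Node"] = field(default_factory=list)
        parent: Optional["Node"] = None

    def run_tcond(program: List[str], node: Node) -> List[str]:
        delta: List[str] = []
        for op in program:
            if op == "Up" and node.parent:
                node = node.parent
            elif op == "Left" and node.parent:
                i = node.parent.children.index(node)
                if i > 0:
                    node = node.parent.children[i - 1]
            elif op == "DownFirst" and node.children:
                node = node.children[0]
            elif op == "DownLast" and node.children:
                node = node.children[-1]
            elif op == "WriteValue":
                delta.append(node.value)   # delta <- delta . value(node)
            elif op == "WriteType":
                delta.append(node.type)    # delta <- delta . type(node)
        return delta

    # Example: move to the parent, then record its type.
    root = Node("ObjectExpression")
    child = Node("Property", "hide", parent=root)
    root.children.append(child)
    print(run_tcond(["Up", "WriteType"], child))   # ['ObjectExpression']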
Step 3: synthesize f_best

    f_best = argmin_{f ∈ DSL} cost(D, f)
Step 3: synthesize f_best

• Generate candidate functions f from the DSL (millions of candidates, ≈ 10^8):
      TCond   ::= ε | WriteOp TCond | MoveOp TCond
      MoveOp  ::= Up, Left, Right, ...
      WriteOp ::= WriteValue, WriteType, ...
• For each candidate f, use the dataset D and f to build a probabilistic model P by counting P(element | f(·)).
• Score candidates with cost(D, f) = entropy(P(E)).
• f_best = argmin_{f ∈ DSL} cost(D, f)
• To scale: iterative synthesis on a fraction of the examples.
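A minimal sketch of this enumerate-and-score loop, reusing the Node/run_tcond helpers from the sketch above (the real synthesizer enumerates vastly larger candidate sets and iterates on fractions of D):

    # Enumerate candidate TCond programs; keep the one with minimal entropy.
    import itertools
    import math
    from collections import Counter, defaultdict

    OPS = ["Up", "Left", "DownFirst", "DownLast", "WriteValue", "WriteType"]

    def entropy_cost(dataset, program):
        """cost(D, f): conditional entropy of P(element | f(node)), by counting."""
        by_ctx = defaultdict(Counter)
        for node, label in dataset:          # D = pairs (query node, true element)
            by_ctx[tuple(run_tcond(program, node))][label] += 1
        total = sum(sum(c.values()) for c in by_ctx.values())
        h = 0.0
        for counts in by_ctx.values():
            n = sum(counts.values())
            for k in counts.values():
                h -= (k / total) * math.log2(k / n)
        return h

    def synthesize(dataset, max_len=3):
        candidates = (list(prog) for length in range(1, max_len + 1)
                      for prog in itertools.product(OPS, repeat=length))
        return min(candidates, key=lambda f: entropy_cost(dataset, f))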
Step 4: use f_best to predict

Program with a hole:

    elem.notify(
      ... ,
      ... ,
      {
        position: 'top',
        hide: false,
        ?
      }
    );

Executing f_best from the hole accumulates the context δ:

    f_best step     δ so far
    Left            {}
    WriteValue      {hide}
    Up              {hide}
    WritePos        {hide, 3}
    Up              {hide, 3}
    DownFirst       {hide, 3}
    DownLast        {hide, 3}
    WriteValue      {hide, 3, notify}

δ = {previous property, parameter position, API name}
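Prediction then amounts to training a count-based model keyed by the extracted context and looking up the most likely element at a new hole. A minimal sketch reusing run_tcond from above:

    # Train P(element | f_best(node)) by counting, then predict at a new hole.
    from collections import Counter, defaultdict

    def train_model(dataset, f_best):
        model = defaultdict(Counter)
        for node, label in dataset:
            model[tuple(run_tcond(f_best, node))][label] += 1
        return model

    def predict(model, f_best, node):
        counts = model.get(tuple(run_tcond(f_best, node)))
        return counts.most_common(1)[0][0] if counts else None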
Deep3: Experimental Results [Probabilistic Model of JavaScript]

Dataset D: 150,000 files
Training time: ~100 hours
f_best: ~50,000 instructions

    Probabilistic model                           Accuracy (APIs)
    Last two tokens, Hindle et al. [ICSE'12]      22.2%
    Last two APIs, Raychev et al. [PLDI'14]       30.4%
    Deep3                                         66.6%

Details in: Probabilistic Model for Code with Decision Trees, ACM OOPSLA'16
Deep3: Experimental Results [Probabilistic Model of Python]

Dataset D: 150,000 files
Training time: ~100 hours
f_best: ~120,000 instructions

    Probabilistic model                           Accuracy (identifiers)
    Last two tokens, Hindle et al. [ICSE'12]      38%
    Deep3                                         51%

Details in: Probabilistic Model for Code with Decision Trees, ACM OOPSLA'16
Applying the Concept to Natural Language
[Program Synthesis for Character-Level Language Modeling, ICLR'17 submission]

Dataset D: Hutter Prize Wikipedia dataset
Training time: ~8 hours
f_best: ~9,000 instructions; uses a char-level DSL with state
Interpretable model, browse here: http://www.srl.inf.ethz.ch/charmodel.html

    Probabilistic model               Bits-per-character
    7-gram (best)                     1.94
    Stacked LSTM (Graves 2013)        1.67
    Char-based DSL synthesis          1.62
    MRNN (Sutskever 2011)             1.60
    MI-LSTM (Wu et al. 2016)          1.44
    HM-LSTM* (Chung et al. 2016)      1.40
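For reference, bits-per-character is the average negative log-probability the model assigns to each character: BPC = -(1/N) Σᵢ log₂ P(cᵢ | contextᵢ). A minimal sketch with a toy bigram character model as the estimator (the DSL-based model in the table is far richer):

    # Bits-per-character of a toy bigram character model, add-one smoothed.
    import math
    from collections import Counter, defaultdict

    def bits_per_char(train_text: str, test_text: str) -> float:
        counts = defaultdict(Counter)
        for a, b in zip(train_text, train_text[1:]):
            counts[a][b] += 1
        vocab = set(train_text) | set(test_text)
        total_bits = 0.0
        for a, b in zip(test_text, test_text[1:]):
            n = sum(counts[a].values())
            p = (counts[a][b] + 1) / (n + len(vocab))   # add-one smoothing
            total_bits -= math.log2(p)
        return total_bits / max(1, len(test_text) - 1)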
Learning (Abstract) Semantics
[Learning a Static Analyzer from Data, https://arxiv.org/abs/1611.01752]

    function isBig(v) {
      return v < this.length
    }
    [12, 5].filter(isBig);

Points-to facts the analyzer must derive: VarPtsTo("global", h), VarPtsTo(this, h)

The learned analyzer is built from checks such as:
    checkIfInsideMethodCall
    checkMethodCallName
    checkReceiverType
    checkNumberOfArguments
    ...

• Can be understood by experts
• Found issues in Facebook's Flow
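To convey the flavor of such a learned analyzer, here is a hand-written, self-contained sketch of a decision list over syntactic features mirroring the checks named above. The feature names and the dict encoding of a call site are our own assumptions; the real analyzer is a decision tree learned from data, not this hand-written rule:

    # Toy decision list deciding where `this` points inside a callback.
    def this_points_to(site: dict) -> str:
        # site is a toy call-site encoding, e.g.:
        # {"inside_method_call": True, "method_name": "filter", "num_args": 1}
        if not site.get("inside_method_call"):      # checkIfInsideMethodCall
            return "global"
        if site.get("method_name") == "filter":     # checkMethodCallName
            if site.get("num_args") == 1:           # checkNumberOfArguments:
                return "global"                     # filter without a thisArg
        return "receiver"

    # [12, 5].filter(isBig) passes no thisArg, so `this` inside isBig is the
    # global object (in sloppy mode) -- matching VarPtsTo(this, h) above.
    print(this_points_to({"inside_method_call": True,
                          "method_name": "filter", "num_args": 1}))  # global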