Noisy synthesis using SMT

p_best = argmin_{p ∈ P} errors(D, p) + λ·r(p)

The objective is the total solution cost: errors(D, p) counts the incorrect examples and r(p) is the number of instructions.

Encoding Ψ given to the SMT solver, for p ∈ P_r (programs with r instructions):
  err_i = if p(x_i) = y_i then 0 else 1   (one indicator per input/output example (x_i, y_i))
  errors = err_1 + err_2 + err_3

We ask a number of SMT queries in increasing order of solution cost. For example, with λ = 0.6 the costs errors + λ·r are:

            number of errors
  r       0      1      2      3
  1     0.6    1.6    2.6    3.6
  2     1.2    2.2    3.2    4.2
  3     1.8    2.8    3.8    4.8

The queries at costs 0.6, 1.2, 1.6 and 1.8 are UNSAT; the query at cost 2.2 is SAT, so the best program has two instructions and makes one error.
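As a minimal sketch of this query schedule (an assumption, not the tool's actual code: synthesize_with_bound is a hypothetical wrapper around the SMT encoding Ψ), one could drive the solver in increasing order of cost as follows:

```python
# Sketch: ask SMT queries in increasing order of cost = errors + lambda * r.
# `synthesize_with_bound(r, e)` is a hypothetical wrapper around the encoding:
# it returns a program with r instructions that gets at most e examples wrong,
# or None if that query is UNSAT.
import heapq

def noisy_synthesis(num_examples, synthesize_with_bound, max_r=3, lam=0.6):
    # All (cost, r, errors) combinations, explored smallest cost first.
    queue = [(lam * r + e, r, e)
             for r in range(1, max_r + 1)
             for e in range(num_examples + 1)]
    heapq.heapify(queue)
    while queue:
        cost, r, e = heapq.heappop(queue)
        program = synthesize_with_bound(r, e)   # one SMT query
        if program is not None:                 # first SAT query is the optimum
            return program, cost
    return None, float("inf")
```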
Noisy synthesizer: example

Take an actual synthesizer and show that we can make it handle noise.
Implementation: BitSyn

For BitStream programs, using Z3, similar to Jha et al. [ICSE’10] and Gulwani et al. [PLDI’11]. The synthesized programs are short and loop-free.

Example program:
  function check_if_power_of_2(int32 x) {
    var o = add(x, 1)
    return bitwise_and(x, o)
  }

(Slide plot: choosing λ; the value is picked empirically in the region that works best.)

Question: how well does our synthesizer discover noise in programs from prior work?
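For intuition, here is a small z3py sketch of the error-counting part of such an encoding. The program_output / solver_constraints interface is an illustrative assumption, not BitSyn's actual API; the encoding of the candidate instruction sequence itself is omitted.

```python
# Illustrative sketch (assumed encoding, not BitSyn itself): count how many
# noisy examples a candidate program violates and bound that count.
from z3 import If, IntVal, Solver, Sum, sat

def query_at_most_k_errors(examples, program_output, solver_constraints, k):
    """examples: list of (x, y) pairs; program_output(x): Z3 expression for the
    candidate's output on x; solver_constraints: encoding of the class P_r."""
    errs = [If(program_output(x) == y, IntVal(0), IntVal(1))
            for (x, y) in examples]
    s = Solver()
    s.add(solver_constraints)
    s.add(Sum(errs) <= k)     # at most k of the err_i indicators are 1
    return s.check() == sat
```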
So far… handling noise
● Problem statement and regularization
● Synthesis procedure using SMT
● Presented one synthesizer

Handling noise enables us to solve new classes of problems beyond normal synthesis.
Contributions

● Handling noise: synthesis from input/output examples, some of which may be incorrect.
● Handling large datasets: a program generator combined with a representative dataset sampler (discussed next).
● New probabilistic models: (1) synthesize a program p, (2) use a probabilistic model parametrized with p.
Fundamental problem

Large number of examples:

  p_best = argmin_{p ∈ P} cost(D, p)

D contains millions of input/output examples and computing cost(D, p) takes O(|D|) time, so synthesis over the full dataset is practically intractable.

Key idea: iterative synthesis on a fraction of the examples.
Our solution: two components

Program generator:
  p_best = argmin_{p ∈ P} cost(d, p)
  Given a dataset d, it finds the best program.

Dataset sampler:
  Picks a dataset d ⊆ D. We introduce a representative dataset sampler, which generalizes a user providing input/output examples.
In a loop

Start with a small random sample d ⊆ D, then iteratively generate programs and samples: the program generator produces p_1 on the current sample, the representative dataset sampler picks a new d based on the programs generated so far, the generator then produces p_2, and so on, until the generator returns p_best.

The algorithm generalizes synthesis-by-examples techniques.
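A rough sketch of this loop, where generate and resample are assumed interfaces for the two components rather than the paper's API:

```python
# Sketch of the iterative synthesis loop.  `generate(d)` is the program
# generator (a synthesizer run on the small dataset d) and
# `resample(D, programs)` is the representative dataset sampler.
import random

def iterative_synthesis(D, generate, resample, sample_size=50, rounds=10):
    d = random.sample(D, sample_size)      # small random sample d ⊆ D
    programs = []
    for _ in range(rounds):
        p = generate(d)                    # best program on the current sample
        if programs and p == programs[-1]: # candidate stopped changing
            break
        programs.append(p)
        d = resample(D, programs)          # pick a new representative d ⊆ D
    return programs[-1]
```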
Representative dataset sampler

Idea: pick a small dataset d on which the already generated programs p_1, ..., p_n behave as they do on the full dataset:

  d = argmin_{d ⊆ D} max_{i ∈ 1..n} | cost(d, p_i) - cost(D, p_i) |

(Slide figure: bar charts comparing the costs of p_1 and p_2 on the small dataset d with their costs on the full dataset D.)

Theorem: this sampler shrinks the candidate program search space. In the evaluation it leads to a significant speedup of synthesis.
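To make the criterion concrete, here is a brute-force sketch that scores a few random candidate subsets; it only illustrates the objective above and is not the paper's actual sampler:

```python
# Sketch of the sampler criterion
#   d = argmin_{d ⊆ D} max_i |cost(d, p_i) - cost(D, p_i)|
# approximated by scoring random candidate subsets.  `cost(data, p)` is
# assumed to compute the cost of program p on `data`.
import random

def representative_sample(D, programs, cost, size=50, tries=200):
    full_costs = [cost(D, p) for p in programs]
    best_d, best_gap = None, float("inf")
    for _ in range(tries):
        d = random.sample(D, size)
        gap = max(abs(cost(d, p) - c)
                  for p, c in zip(programs, full_costs))
        if gap < best_gap:
            best_d, best_gap = d, gap
    return best_d
```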
So far... handling large datasets
● Iterative combination of synthesis and sampling (program generator + representative dataset sampler)
● A new way to perform approximate empirical risk minimization
● Guarantees (in the paper)
Contributions

Back to the overview: the remaining contribution is new probabilistic models, i.e. (1) synthesize a program p and (2) use a probabilistic model parametrized with p. This is discussed next.
Statistical programming tools

A new breed of tools: they learn from large existing codebases (e.g. "Big Code") to make predictions about programs.

1. Train a machine learning model on the codebases.
2. Make predictions with the model.

Limitation of existing tools: the model is hard-coded and has low precision.
Existing machine learning models

They essentially remember a mapping from a context seen in the training data to a prediction (with probabilities).

Hindle et al. [ICSE’12]: learn a mapping “+ name .” → slice. The model predicts slice when it sees it after “+ name .”. This model comes from NLP.

Raychev et al. [PLDI’14]: learn a mapping charAt → slice. The model predicts slice when it sees it after “charAt”. It relies on static analysis.
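As a toy illustration of "remember a mapping from context to prediction" (purely illustrative; the actual models, e.g. n-gram language models, are far more involved):

```python
# Toy context-to-prediction model: remember how often each completion follows
# a context in the training data and predict the most frequent one.
from collections import Counter, defaultdict

class ContextModel:
    def __init__(self):
        self.table = defaultdict(Counter)

    def train(self, pairs):
        # pairs: iterable of (context, completion) observed in training data
        for context, completion in pairs:
            self.table[context][completion] += 1

    def predict(self, context):
        seen = self.table.get(context)
        return seen.most_common(1)[0][0] if seen else None

model = ContextModel()
model.train([("+ name .", "slice"), ("+ name .", "slice"), ("charAt", "slice")])
print(model.predict("+ name ."))   # -> slice
```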
Problem of existing systems

Precision: they rarely predict the next statement.
● Hindle et al. [ICSE’12]: very low precision.
● Raychev et al. [PLDI’14]: low precision for JavaScript.

Core problem: existing machine learning models are limited and not expressive enough.
Key idea: second-order learning

Learn a program that parametrizes a probabilistic model that makes predictions:
1. Synthesize a program describing a model.
2. Train the model, i.e. learn the mapping.
3. Make predictions with this model.

Prior models are described by simple hard-coded programs; our approach learns a better program.
Training and evaluation

Training example (a code snippet as input, the completion slice as output): compute the context of the input with program p, then learn a mapping from that context to the output, e.g. toUpperCase → slice.

Evaluation example: compute the context with program p and use the learned mapping to predict the completion, here slice ✔.
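Reusing the toy ContextModel sketched earlier, training and prediction with a synthesized context program p might look as follows; the concrete p at the end is a made-up illustration, not a program the paper synthesizes:

```python
# Second-order learning sketch: the synthesized program p decides which
# context is extracted from the input; the model then learns and queries the
# mapping from that context to the completion.
def train_with(p, examples, model):
    # examples: (input, completion) pairs, e.g. (token sequence, "slice")
    model.train([(p(inp), out) for inp, out in examples])

def predict_with(p, inp, model):
    return model.predict(p(inp))

# A hypothetical candidate p: use the last API call before the hole
# as the context (e.g. "toUpperCase").
p = lambda tokens: tokens[-1]
```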
Observation

Synthesis of the probabilistic model can be done with the same optimization problem as before:

  p_best = argmin_{p ∈ P} errors(D, p) + λ·r(p)

Here errors(D, p) + λ·r(p) plays the role of cost(D, p): D is the evaluation data (input/output examples), λ is the regularization constant, and the regularizer r penalizes long programs.
So far...
● Handling noise
● Synthesizing a model
● Representative dataset sampler

These techniques are generally applicable to program synthesis. Next: an application to "Big Code" called DeepSyn.