Learning Programs from Noisy Data


Learning Programs from Noisy Data
Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause (ETH Zurich)

Why learn programs from examples? Input/output examples are often easier to provide than a specification (e.g. in FlashFill).


1. Noisy synthesis using SMT

The objective is the total solution cost:

    p_best = arg min_{p ∈ P}  errors(D, p) + λ · r(p)

where r(p) is the number of instructions in p. The formula Ψ given to the SMT solver encodes, for programs p ∈ P_r (with r instructions), one error indicator per input/output example:

    err_1 = if p(in_1) = out_1 then 0 else 1
    err_2 = if p(in_2) = out_2 then 0 else 1
    err_3 = if p(in_3) = out_3 then 0 else 1
    errors = err_1 + err_2 + err_3

We ask a number of SMT queries in increasing value of solution cost. For λ = 0.6 the costs are:

    instructions r \ number of errors     0      1      2      3
                   1                    0.6    1.6    2.6    3.6
                   2                    1.2    2.2    3.2    4.2
                   3                    1.8    2.8    3.8    4.8

In this example the four cheapest queries (costs 0.6, 1.2, 1.6 and 1.8) return UNSAT.

2. Noisy synthesis using SMT (continued)

Same objective, encoding and cost table as above. The next query, at cost 2.2 (two instructions, one error), returns SAT.

3. Noisy synthesis using SMT (continued)

The best program is the one found by the SAT query at cost 2.2: it has two instructions and makes one error on the examples.
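
To make the query ordering concrete, here is a small Python sketch of the search loop described above. It is illustrative only: sat_query stands in for the actual SMT call on the encoding Ψ, asking whether some program with r instructions makes at most k errors.

    def synthesize(sat_query, lam=0.6, max_r=3, max_errors=3):
        # Enumerate (cost, instructions, errors) triples, cheapest first,
        # exactly as in the cost table above (cost = errors + lam * instructions).
        candidates = sorted((k + lam * r, r, k)
                            for r in range(1, max_r + 1)
                            for k in range(0, max_errors + 1))
        for cost, r, k in candidates:
            if sat_query(r, k):        # SAT: cheapest feasible program found
                return cost, r, k      # e.g. (2.2, 2, 1) in the example above
        return None                    # every query was UNSAT

With λ = 0.6 the loop visits costs 0.6, 1.2, 1.6, 1.8, 2.2, ..., matching the order of SMT queries on the slide.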

4. Noisy synthesizer: example

Take an actual synthesizer and show that we can make it handle noise.

5. Implementation: BitSyn

For BitStream programs, using Z3; similar to Jha et al. [ICSE'10] and Gulwani et al. [PLDI'11]. BitSyn synthesizes short, loop-free programs. Example program:

    function check_if_power_of_2( int32 x ) {
      var o = add(x, 1)
      return bitwise_and(x, o)
    }
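
For readers who want to run it, here is a literal Python transcription of the program shown on the slide (32-bit arithmetic emulated with a mask). Note that the slide's program uses x + 1, whereas the textbook power-of-two test is x & (x - 1) == 0; treat the snippet purely as an illustration of the kind of short, loop-free program BitSyn produces.

    def check_if_power_of_2(x):
        o = (x + 1) & 0xFFFFFFFF   # add(x, 1) on a 32-bit word
        return x & o               # bitwise_and(x, o)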

6. Implementation: BitSyn (continued)

Question: how well does our synthesizer discover noise? (in programs from prior work)


8. Implementation: BitSyn (continued)

(Results plot; annotations: "best area to be in", "empirically pick λ here".)

9. So far… handling noise

● Problem statement and regularization
● Synthesis procedure using SMT
● Presented one synthesizer

Handling noise enables us to solve new classes of problems beyond normal synthesis.

10. Contributions

● Handling noise: the input/output examples may contain incorrect examples.
● Handling large datasets: a program generator combined with a representative dataset sampler.
● New probabilistic models: 1. synthesize a program p; 2. use a probabilistic model parametrized with p.

11. Contributions: next up is handling large datasets.

12. Fundamental problem

Large number of examples:

    p_best = arg min_{p ∈ P}  cost(D, p)

13. Fundamental problem (continued)

D contains millions of input/output examples.

14. Fundamental problem (continued)

Computing cost(D, p) is O(|D|).

15. Fundamental problem (continued)

With millions of examples, synthesis becomes practically intractable.

16. Fundamental problem (continued)

Key idea: iterative synthesis on a fraction of the examples.
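
A minimal sketch of why this is expensive (names are illustrative, not from the paper): evaluating even a single candidate program requires a pass over all of D.

    def cost(D, p):
        # D: list of (input, output) pairs; p: candidate program as a callable.
        # One full pass over D per candidate, i.e. O(|D|) for every candidate tried.
        return sum(0 if p(x) == y else 1 for x, y in D)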

17. Our solution: two components

Program generator: a synthesizer for a small number of examples. Given a dataset d, it finds the best program:

    p_best = arg min_{p ∈ P}  cost(d, p)

18. Our solution: two components (continued)

Dataset sampler: we introduce a representative dataset sampler that picks a dataset d ⊆ D. It generalizes a user providing input/output examples.

19. In a loop

The program generator and the representative dataset sampler work in a loop.

20. In a loop

Start with a small random sample d ⊆ D. Iteratively generate programs and samples.

21. In a loop

Step 1: the program generator produces p_1.

22. In a loop

Step 2: the representative dataset sampler picks a new sample d.

23. In a loop

Step 3: the program generator now produces p_2 (candidates so far: p_1, p_2).

24. In a loop

Step 4: the representative dataset sampler picks another sample.

25. In a loop

Eventually the program generator returns p_best.

26. In a loop

Eventually the program generator returns p_best. The algorithm generalizes synthesis-by-examples techniques.
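
A rough Python sketch of this loop, assuming a synthesize(d) program generator and a representative_sample(D, programs, size) dataset sampler are available (both names are placeholders, and the termination check is simplified compared to the paper):

    import random

    def iterative_synthesis(D, synthesize, representative_sample,
                            sample_size=100, max_rounds=10):
        d = random.sample(D, min(sample_size, len(D)))   # small random start
        programs = []
        for _ in range(max_rounds):
            p = synthesize(d)                  # program generator: best program on d
            if programs and p == programs[-1]:
                break                          # d is already representative for p
            programs.append(p)
            # dataset sampler: pick d on which p_1..p_n behave as on all of D
            d = representative_sample(D, programs, sample_size)
        return programs[-1]                    # p_best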

27. Representative dataset sampler

Idea: pick a small dataset d on which a set of already generated programs p_1, ..., p_n behave like they do on the full dataset:

    d = arg min_{d ⊆ D}  max_{i ∈ 1..n}  | cost(d, p_i) - cost(D, p_i) |

28. Representative dataset sampler

(Plot: costs of p_1 and p_2 on the small dataset d.)

29. Representative dataset sampler

(Plots: costs of p_1 and p_2 on the small dataset d vs. their costs on the full dataset D.)


32. Representative dataset sampler

Theorem: this sampler shrinks the candidate program search space. In the evaluation it yields a significant speedup of synthesis.
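
As an illustration of the arg min / max objective above, here is a naive greedy sampler in Python. It is only a sketch (quadratic and slow), not the paper's algorithm, and it assumes cost is an average per-example error so that values on d and on D are comparable.

    def avg_cost(d, p):
        # average per-example error, so costs on d and on D are comparable
        return sum(0 if p(x) == y else 1 for x, y in d) / max(len(d), 1)

    def representative_sample(D, programs, size):
        if not programs:                            # nothing to be representative of yet
            return list(D)[:size]
        full = [avg_cost(D, p) for p in programs]   # costs on the full dataset
        d, remaining = [], list(D)
        while len(d) < size and remaining:
            # greedily add the example that minimizes the worst cost discrepancy
            best = min(remaining,
                       key=lambda ex: max(abs(avg_cost(d + [ex], p) - c)
                                          for p, c in zip(programs, full)))
            d.append(best)
            remaining.remove(best)
        return d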

33. So far... handling large datasets

● Iterative combination of synthesis and sampling (program generator + representative dataset sampler)
● New way to perform approximate empirical risk minimization
● Guarantees (in the paper)

34. Contributions (recap of the overview above)

35. Contributions: next up are the new probabilistic models.

36. Statistical programming tools

A new breed of tools: learn from large existing codebases (e.g. "Big Code") to make predictions about programs.

37. Statistical programming tools

Step 1: train a machine learning model on the codebase.

38. Statistical programming tools

Step 2: make predictions with the trained model.

39. Statistical programming tools

In existing tools the model is hard-coded and has low precision.

40. Existing machine learning models

They essentially remember a mapping from a context in the training data to a prediction (with probabilities).

41. Existing machine learning models

First example: Hindle et al. [ICSE'12].


44. Existing machine learning models

Hindle et al. [ICSE'12] learn a mapping such as "+ name ." → slice: the model will predict slice when it sees it after "+ name .". This model comes from NLP.

45. Existing machine learning models

Second example: Raychev et al. [PLDI'14].


49. Existing machine learning models

Raychev et al. [PLDI'14] learn a mapping such as charAt → slice: the model will predict slice when it sees it after charAt. This approach relies on static analysis.
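
Both cited approaches boil down to "remember a mapping from context to prediction". A minimal count-based sketch of that idea in Python (not the actual models from either paper):

    from collections import Counter, defaultdict

    class ContextModel:
        def __init__(self):
            self.table = defaultdict(Counter)    # context -> completion counts

        def train(self, pairs):
            # pairs: iterable of (context, completion), e.g. ("+ name .", "slice")
            for context, completion in pairs:
                self.table[context][completion] += 1

        def predict(self, context):
            seen = self.table.get(context)
            return seen.most_common(1)[0][0] if seen else None

After training on ("+ name .", "slice") the model predicts slice whenever it sees the context "+ name ." again; a fixed, hard-coded notion of context like this often fails to distinguish cases that need different completions, which is where the precision problems discussed next come from.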

50. Problem of existing systems

Precision: they rarely predict the next statement.

51. Problem of existing systems

The NLP-style model of Hindle et al. [ICSE'12] has very low precision.

52. Problem of existing systems

The model of Raychev et al. [PLDI'14] has low precision for JavaScript.

53. Problem of existing systems

Core problem: existing machine learning models are limited and not expressive enough.

54. Key idea: second-order learning

Learn a program that parametrizes a probabilistic model that makes predictions.

55. Key idea: second-order learning

Step 1: synthesize a program describing a model.

56. Key idea: second-order learning

Step 2: train the model, i.e. learn the mapping.

57. Key idea: second-order learning

Step 3: make predictions with this model.

58. Key idea: second-order learning

Prior models are described by simple hard-coded programs. Our approach: learn a better program.

59. Training and evaluation

Training example: a code snippet (the input) whose completion is slice (the output).

60. Training and evaluation

Compute the context of the training example with program p.

61. Training and evaluation

Learn a mapping from that context to the output, e.g. toUpperCase → slice.

62. Training and evaluation

Evaluation example: a new code snippet to complete.

63. Training and evaluation

Compute the context of the evaluation example with program p.

64. Training and evaluation

Using the learned mapping, predict the completion: slice.

65. Training and evaluation

The predicted completion slice is correct (✔).
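
Putting the training and evaluation steps together, a small Python sketch of the second-order idea, where ctx_program is the synthesized program that computes the context (all names are illustrative):

    from collections import Counter

    def train(examples, ctx_program):
        # examples: list of (code_prefix, completion), e.g. (..., "slice")
        mapping = {}
        for code_prefix, completion in examples:
            ctx = ctx_program(code_prefix)            # steps 1-2: compute context, learn mapping
            mapping.setdefault(ctx, Counter())[completion] += 1
        return mapping

    def predict(mapping, ctx_program, code_prefix):
        seen = mapping.get(ctx_program(code_prefix))  # step 3: predict with the model
        return seen.most_common(1)[0][0] if seen else None

If ctx_program extracts, say, the previous API call, training on the example above yields the mapping toUpperCase → slice, and the evaluation example is then completed with slice.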

66. Observation

Synthesis of the probabilistic model can be done with the same optimization problem as before! Our problem formulation:

    p_best = arg min_{p ∈ P}  errors(D, p) + λ · r(p)

where D is the evaluation data (input/output examples), λ is the regularization constant, and the regularizer r penalizes long programs.

67. Observation (continued)

Here errors(D, p) + λ · r(p) instantiates the cost(D, p) from the earlier formulation.
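
To connect the two views, here is a self-contained sketch of evaluating the objective for one candidate context program: errors(D, p) counts the wrong predictions of the model induced by p, and r(p) is passed in as a given program size. The model, the lack of a train/evaluation split, and the default λ = 0.6 (reused from the earlier example) are all simplifications for illustration.

    from collections import Counter

    def objective(D, ctx_program, program_size, lam=0.6):
        mapping = {}
        for prefix, completion in D:                  # learn the mapping induced by ctx_program
            mapping.setdefault(ctx_program(prefix), Counter())[completion] += 1
        wrong = sum(1 for prefix, completion in D     # errors(D, p): wrong predictions
                    if mapping[ctx_program(prefix)].most_common(1)[0][0] != completion)
        return wrong + lam * program_size             # errors(D, p) + lambda * r(p)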

68. So far...

● Handling noise
● Synthesizing a model
● Representative dataset sampler

These techniques are generally applicable to program synthesis. Next, an application to "Big Code" called DeepSyn.
