
Program Synthesis in the Industrial World: Inductive, Incremental, Interactive

  1. Program Synthesis in the Industrial World: Inductive, Incremental, Interactive
     Alex Polozov (polozov@cs.washington.edu), Sumit Gulwani (sumitg@microsoft.com), and the rest of the PROSE team (prose-contact@microsoft.com)
     July 18, 2016, SYNT-16, Toronto, Canada

  2. PROSE: PROgram Synthesis using Examples
     Team: Ranvijay Kumar, Sumit Gulwani, Daniel Perelman, Allen Cypher, Vu Le, Mohammad Raza, Abhishek Udupa, Danny Simmons, Adam Smith, Alex Polozov
     An R&D team: MSR → industrial Microsoft
     We are hiring! Interns or full-time

  3. This talk: lessons, solutions, challenges

  4. Outline
     • Programming by Examples (PBE) & PROSE: quick background
     • Mass-market deployment: goals, challenges, solutions
     • Discussion

  5. PBE & PROSE: a 3-slide background

  6. Motivation
     • 99% of spreadsheet users do not know programming
     • Data scientists spend 80% of their time wrangling raw data

  7. PROSE timeline
     • FlashFill (text transformations), 2010-2012 [POPL 11]
     • FlashExtract (text extraction), 2012-2014 [PLDI 14]
     • FlashRelate (table transformations), 2012-2015 [PLDI 15]
     • FlashMeta (PBE framework), 2014-2015 [OOPSLA 15]
     • …
     • PROSE SDK, 2015-present

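  8. PBE architecture
     • The example-based intent spec $\varphi$ and the DSL $\mathcal{L}$ feed the Program Synthesizer, which produces a ranked program set $\tilde{N}$
     • A ranking function selects the intended program $P \in \mathcal{L}$; the Translator emits it as the intended program in Python/C#/C++/…
     • Debugging on test inputs $\vec{\sigma}$ yields a refined intent, which is fed back to the synthesizer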
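As a rough illustration of the pipeline above (spec → synthesizer → ranked program set → translator), here is a brute-force sketch. The two-operator DSL (`Word(i)`, `Prefix(k)`), the size-based ranking, and the `translate` helper are hypothetical stand-ins chosen for brevity. A production synthesizer would not enumerate the DSL like this.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, expected output)

@dataclass(frozen=True)
class Program:
    name: str                  # DSL term, e.g. "Word(0)"
    run: Callable[[str], str]  # interpreter for the term
    source: str                # Python code the (toy) translator emits
    size: int                  # toy ranking key: smaller ranks higher

def word(i: int) -> Program:
    """Hypothetical operator: the i-th whitespace-separated word ("" if absent)."""
    def run(s: str) -> str:
        parts = s.split()
        return parts[i] if i < len(parts) else ""
    return Program(f"Word({i})", run, f"lambda s: (s.split() + [''] * {i + 1})[{i}]", 1)

def prefix(k: int) -> Program:
    """Hypothetical operator: the first k characters."""
    return Program(f"Prefix({k})", lambda s: s[:k], f"lambda s: s[:{k}]", 2)

# The toy DSL L: every program the synthesizer may return.
DSL: List[Program] = [word(i) for i in range(3)] + [prefix(k) for k in range(1, 16)]

def synthesize(spec: List[Example]) -> List[Program]:
    """Synthesizer + ranking: keep programs consistent with the spec, best first."""
    consistent = [p for p in DSL if all(p.run(x) == y for x, y in spec)]
    return sorted(consistent, key=lambda p: p.size)  # stable sort keeps DSL order on ties

def translate(p: Program) -> str:
    """Translator: emit the chosen program as Python source."""
    return p.source

ranked = synthesize([("Alan Turing", "Alan")])
print([p.name for p in ranked])   # ['Word(0)', 'Prefix(4)']
top = ranked[0]
print(top.run("Grace Hopper"))    # Grace  (output on a test input)
print(translate(top))
```

If the output on a test input looks wrong, adding that row to the spec and calling `synthesize` again plays the role of the refined-intent loop.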

  9. Mass-market deployment: goals & challenges

  10. Goals
     • Inductive (intent is easily specified)
     • Scalable (snappy UI: responds in < 1 s)
     • Interactive (facilitates the debugging cycle)
     • Agile (quick software development)
     Related topics: ambiguity resolution, incremental synthesis, predictive synthesis, user experience, engineering practices

  11. Engineering practices
     • Production-quality library code
       • Prototyping still exists, but it's not the final form
     • Unit tests & TDD
     • Integration tests: real-life scenarios
       • Close to 8K for all DSLs in total
       • Most are mined from public sources (e.g., help forums)
       • In preparation: benchmark suite release for the community

  12. Performance-minded engineering
     • Parallelization of learning matters
       • E.g., multi-user log file processing in Azure Log Analytics
     • Performance of program execution matters
       • E.g., "Big Data" on an end-user's machine
     • Smallest ≠ fastest!
       • (1) Synthesize many correct programs (robustness-based ranking), then (2) optimize for the fast ones (performance-based ranking)
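The two-stage idea in the last bullet can be sketched as follows. The robustness scores are made-up numbers standing in for a learned ranker. Wall-clock timing with `timeit` stands in for whatever performance model a production system would use. This is an illustration, not PROSE's ranking.

```python
import re
import timeit
from typing import Callable, List, NamedTuple

class Candidate(NamedTuple):
    name: str
    run: Callable[[str], str]
    robustness: float  # hypothetical score from a robustness-based ranker

def select(candidates: List[Candidate], test_inputs: List[str],
           k: int = 3, repeats: int = 200) -> Candidate:
    # Stage 1 (robustness): keep the k most robust candidates.
    top = sorted(candidates, key=lambda c: c.robustness, reverse=True)[:k]
    # Keep only those that agree with the most robust one on the test inputs,
    # so stage 2 never trades correctness for speed.
    reference = [top[0].run(x) for x in test_inputs]
    agreeing = [c for c in top if [c.run(x) for x in test_inputs] == reference]

    # Stage 2 (performance): measure wall-clock time on the test inputs.
    def cost(c: Candidate) -> float:
        return timeit.timeit(lambda: [c.run(x) for x in test_inputs], number=repeats)

    return min(agreeing, key=cost)

YEAR = re.compile(r"(\d{4})$")
candidates = [
    Candidate("regex", lambda s: YEAR.search(s).group(1), 0.9),
    Candidate("split", lambda s: s.split("-")[-1], 0.8),
    Candidate("slice", lambda s: s[-4:], 0.7),
    Candidate("first4", lambda s: s[:4], 0.1),  # low robustness: cut in stage 1
]
inputs = ["25-06-2011", "01-12-1999", "07-07-2016"]
best = select(candidates, inputs)
print(best.name, [best.run(x) for x in inputs])
```

Stage 2 relies on measured timings, so the winning name may differ between runs and machines. The outputs do not differ, because only candidates that agree with the most robust one are ever timed.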

  13. Development
     Should I process the string "25-06-11" with regexes? Treat it as a numeric computation? A date?
     • DSL design: ≈ 10 months → ≈ 2 weeks
       • This is not a bottleneck!*
     • Ranking: the bulk of the effort
       • Designing a score for an operator F takes 2-3x longer than designing F itself (incl. synthesis!)
       • E.g., rock-paper-scissors among string-processing operators
     * Once you learn the skill…

  14. Candidate programs
     From: all lines
       • ending with "Number ∘ Dot"
       • ending with "Space ∘ Number ∘ Dot"
       • starting with "Word ∘ Space ∘ CamelCase"
     Extract:
       • the first "Number" before a "Dot"
       • the last "Number" before a "Dot"
       • the last "Number" before a "Dot ∘ LineBreak"
       • the last "Number"
       • the text between the last "Space" and the last "Dot"
       • the text between the first "Comma ∘ Space" and the last "Dot ∘ LineBreak"
     …and up to $10^{20}$ more candidates

  15. Anecdotes
     • FlashFill was not accepted into Excel until it solved the most common scenarios from 1 example (e.g., from "Adam Smith" → "Adam", a poorly ranked program outputs "Alic" for "Alice Williams")
     • Some users still don't know you can give 2 examples!
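To see why a single example is rarely enough, consider two hypothetical programs. Both explain "Adam Smith" → "Adam" but disagree on the next row. The function names are illustrative, not FlashFill's actual operators.

```python
def first_word(s: str) -> str:
    """Hypothetical candidate 1: everything before the first space."""
    return s.split()[0]

def first_four_chars(s: str) -> str:
    """Hypothetical candidate 2: a fixed-length prefix."""
    return s[:4]

candidates = {"first word": first_word, "first 4 characters": first_four_chars}

# Both candidates satisfy the single example "Adam Smith" -> "Adam"...
consistent = {name: f for name, f in candidates.items() if f("Adam Smith") == "Adam"}

# ...but disagree on the next row, so ranking alone must pick the right one.
outputs = {name: f("Alice Williams") for name, f in consistent.items()}
print(outputs)    # {'first word': 'Alice', 'first 4 characters': 'Alic'}

# A second example rules out the fixed-length prefix.
survivors = [name for name, f in consistent.items() if f("Alice Williams") == "Alice"]
print(survivors)  # ['first word']
```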

  16-19. Ambiguity resolution
     Option 1: machine-learned robustness-based ranking [CAV 15]
     • Idioms/patterns from test data can influence search & ranking
     • E.g., bucketing: 100 → "76-100", 51 → "51-75", test input 86
       x ⇒ Concat(Round(x, Down, 25), Const("-"), Round(x, Up, 25))
     Option 2: interactive clarification

  20-22. PBE architecture, revisited
     • Same pipeline as slide 8, except that the Debugging step is replaced by a Hypothesizer
     • The Hypothesizer examines the ranked program set $\tilde{N}$ on the test inputs $\vec{\sigma}$ and asks the user questions; the answers form the refined intent

  23. Hypothesizer
     Given a program set $\tilde{N}$, find program constraints ("hypotheses") $\varphi$ that best disambiguate among the programs in $\tilde{N}$, and present them to the user as multiple-choice questions.
     • Reduces the cognitive load on the user
     • Reduces the number of iterations by choosing the most effective disambiguating questions
     • Increases the user's confidence in the system ("proactive = smart")

  24-26. Example
     Input → output:
       "Missing page numbers, 1993" → "1993"
       "64-67, 1995" → "64"
       … → …
     The programs in $\tilde{N}$ disagree on the second input: they output 1995, 64, 67, or ⊥ (nothing).
     Which output is correct here? (a) 64 (b) 67 (c) 1995
     Answering (c) adds the constraint $\varphi_{i+1}\colon P(\sigma_2) = \text{"1995"}$, and the second row's output becomes "1995".

  27. Example (alternative question)
     "Missing page numbers, 1993" → "1993"
     "64-67, 1995" → "64", with the part "64-67" of the input highlighted
     … → …
     Is this part of the input relevant? (a) Yes (b) No (c) Maybe
     Outputs of the programs in $\tilde{N}$ on the second input: 1995, 64, 67, ⊥

  28. Picking the right question
     "Distinguishability" = effectiveness for disambiguation
     1. An input is distinguishing if many top-ranked candidate programs disagree on the intended output for it.
        • Any response will partition the program set well
     2. A question is distinguishing if the alternative candidate programs corresponding to all potential responses have high ranks.
        • Any response will lead to a good alternative program
     Preliminary results: good questions yield convergence in just 4-6 iterations
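Criterion 1 above can be sketched as follows, with some loud simplifications:
- The four candidate programs are hypothetical, chosen to reproduce the 64 / 67 / 1995 / ⊥ split from the example slides.
- `None` stands in for ⊥.
- "Many top-ranked programs disagree" is approximated by counting distinct outputs among the top-k candidates.

Criterion 2 (scoring the alternatives behind each answer) is not modeled.

```python
import re
from typing import Callable, List, Optional, Tuple

Program = Callable[[str], Optional[str]]  # None plays the role of ⊥ (no output)
Ranked = List[Tuple[str, Program]]        # (description, program), best first

def nums(s: str) -> List[str]:
    return re.findall(r"\d+", s)

def after_numbers_comma(s: str) -> Optional[str]:
    m = re.search(r"numbers, (\d+)", s)
    return m.group(1) if m else None

# Hypothetical ranked candidates, all consistent with
# "Missing page numbers, 1993" -> "1993".
ranked: Ranked = [
    ("last number", lambda s: nums(s)[-1]),
    ("first number", lambda s: nums(s)[0]),
    ("second-to-last number, else last", lambda s: nums(s)[-2] if len(nums(s)) > 1 else nums(s)[-1]),
    ("number after 'numbers, '", after_numbers_comma),
]

def distinguishing_input(programs: Ranked, inputs: List[str], k: int = 4) -> str:
    """Pick the input on which the top-k candidates produce the most distinct outputs."""
    top = programs[:k]
    return max(inputs, key=lambda x: len({p(x) for _, p in top}))

def question(programs: Ranked, x: str) -> List[str]:
    """Multiple-choice options: distinct non-⊥ outputs on x, in rank order."""
    options: List[str] = []
    for _, p in programs:
        out = p(x)
        if out is not None and out not in options:
            options.append(out)
    return options

def refine(programs: Ranked, x: str, answer: str) -> Ranked:
    """The user's answer becomes a new constraint P(x) == answer."""
    return [(name, p) for name, p in programs if p(x) == answer]

unlabeled = ["Missing page numbers, 2004", "Pages 12, 2001", "64-67, 1995"]
x = distinguishing_input(ranked, unlabeled)
print(x)                                          # 64-67, 1995
print(question(ranked, x))                        # ['1995', '64', '67']
print([n for n, _ in refine(ranked, x, "1995")])  # ['last number']
```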

  29. Big Data

  30. Big Data + Program Synthesis

  31. Problem definition
     Given a program set $\tilde{N}_i \subset \mathcal{L}$ that satisfies the currently accumulated spec $\varphi_i$, and a new constraint $\psi_{i+1}$, learn a subset $\tilde{N}_{i+1} \subset \tilde{N}_i$ of programs that satisfy the new spec $\varphi_{i+1} = \varphi_i \wedge \psi_{i+1}$.
     • $\mathcal{L}$ is an industrial DSL (e.g., FlashFill)
     • $|\tilde{N}_i| \approx 10^{20}$
     • Time limit: ≈ 1 s
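The definition can be made concrete on a toy DSL small enough to enumerate. The sketch below uses hypothetical `Substring(i, j)` programs over a hand-picked index range. It checks the property the slide asks for: filtering $\tilde{N}_i$ by $\psi_{i+1}$ alone yields the same set as learning from scratch on $\varphi_{i+1}$. With $|\tilde{N}_i| \approx 10^{20}$, an explicit set like this is out of the question, which is why the following slides turn to version space algebras.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Program = Tuple[int, int]     # hypothetical DSL term Substring(i, j), i.e. s[i:j]
Constraint = Tuple[str, str]  # one input-output example

# The whole toy DSL: every (i, j) with 0 <= i < j <= 8 (36 programs).
DSL: List[Program] = list(combinations(range(9), 2))

def satisfies(p: Program, c: Constraint) -> bool:
    (i, j), (x, y) = p, c
    return x[i:j] == y

def learn(spec: List[Constraint]) -> FrozenSet[Program]:
    """Non-incremental baseline: search the whole DSL against the full spec."""
    return frozenset(p for p in DSL if all(satisfies(p, c) for c in spec))

def refine(prev: FrozenSet[Program], new: Constraint) -> FrozenSet[Program]:
    """Incremental step: keep the programs of N_i that also satisfy psi_{i+1}."""
    return frozenset(p for p in prev if satisfies(p, new))

c1 = ("16-07-16", "16")   # phi_1
c2 = ("18-07-16", "18")   # psi_2
N1 = learn([c1])
N2 = refine(N1, c2)
print(sorted(N1))             # [(0, 2), (6, 8)]
print(sorted(N2))             # [(0, 2)]
print(N2 == learn([c1, c2]))  # True
```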

  32-33. Background: Version Space Algebra
     int positionIn[string s] := AbsPos(s, k) | RegPos(s, std.Pair(r, r), k);
     Sharing #1: cross-product representation
