Program Synthesis in the Industrial World: Inductive, Incremental, Interactive
Alex Polozov (polozov@cs.washington.edu), Sumit Gulwani (sumitg@microsoft.com), and the rest of the PROSE team (prose-contact@microsoft.com)
July 18, 2016, SYNT-16, Toronto, Canada

PROgram Synthesis using Examples
Team: Ranvijay Kumar, Sumit Gulwani, Daniel Perelman, Allen Cypher, Vu Le, Mohammad Raza, Abhishek Udupa, Danny Simmons, Adam Smith, Alex Polozov
• We are hiring! Interns or full-time
• R&D team, MSR → industrial Microsoft

This talk: Lessons, Solutions, Challenges

Outline
• Programming by Examples (PBE) & PROSE: Quick Background
• Mass-Market Deployment
  ↪ Goals
  ↪ Challenges
  ↪ Solutions
• Discussion

PBE & PROSE: A 3-slide Background

Motivation
• 99% of spreadsheet users do not know programming
• Data scientists spend 80% of their time wrangling raw data

PROSE Timeline
• FlashFill (text transformations), 2010-2012 [POPL 11]
• FlashExtract (text extraction), 2012-2014 [PLDI 14]
• FlashRelate (table transformations), 2012-2015 [PLDI 15]
• FlashMeta (PBE framework), 2014-2015 [OOPSLA 15]
• PROSE SDK, 2015-present
• …

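PBE Architecture
[Diagram: the Program Synthesizer takes an example-based intent spec $\varphi$, the DSL $\mathcal{L}$, and a ranking function $h$, and produces a ranked program set $\tilde{N}$. The intended program $P \in \mathcal{L}$ is passed to a Translator, which emits the intended program in Python/C#/C++/… Debugging on test inputs $\vec{\sigma}$ feeds refined intent back into the spec.]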

Mass-Market Deployment: Goals & Challenges

User Experience
• Inductive (intent is easily specified)
• Scalable (snappy UI = responds in < 1 s)
• Interactive (facilitates the debugging cycle)
• Agile (quick software development)
Supporting work: ambiguity resolution, incremental synthesis, predictive synthesis, engineering practices

Engineering practices
• Production-quality library code
  • Prototyping still exists, but it’s not the final form
• Unit tests & TDD
• Integration tests: real-life scenarios
  • Close to 8K for all DSLs in total
  • Most are mined from public sources (e.g. help forums)
  • In preparation: benchmark suite release for the community

Performance-minded engineering
• Parallelization of learning matters
  • E.g.: multi-user log file processing in Azure Log Analytics
• Performance of program execution matters
  • E.g.: “Big Data” on an end-user’s machine
• Smallest ≠ fastest!
  • (1) Synthesize many correct programs, then (2) optimize for the fast ones
• Robustness-based ranking → performance-based ranking
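The two-step idea on this slide (keep many correct programs, then prefer the fast ones) can be sketched as follows. This is a minimal illustration, not PROSE code: candidate programs are assumed to be plain Python callables, "correct" is assumed to mean "consistent with the given examples", and the names `is_consistent`, `pick_fastest`, and the toy candidates are hypothetical.

```python
import timeit
from typing import Callable, List, Sequence, Tuple

Program = Callable[[str], str]
Example = Tuple[str, str]


def is_consistent(program: Program, examples: Sequence[Example]) -> bool:
    """A program counts as correct here if it reproduces every example."""
    return all(program(x) == y for x, y in examples)


def pick_fastest(candidates: Sequence[Program],
                 examples: Sequence[Example],
                 runs: int = 200) -> Program:
    # Step 1: keep every candidate that agrees with the examples.
    correct: List[Program] = [p for p in candidates if is_consistent(p, examples)]
    if not correct:
        raise ValueError("no candidate is consistent with the examples")

    # Step 2: among the correct ones, prefer the lowest measured running time.
    def cost(p: Program) -> float:
        return timeit.timeit(lambda: [p(x) for x, _ in examples], number=runs)

    return min(correct, key=cost)


# Hypothetical candidates for the task "reverse the string".
def reverse_slice(s: str) -> str:
    return s[::-1]


def reverse_loop(s: str) -> str:
    out = ""
    for ch in s:
        out = ch + out
    return out


def identity(s: str) -> str:
    return s


examples = [("abc", "cba"), ("hello", "olleh")]
best = pick_fastest([identity, reverse_loop, reverse_slice], examples)
print(best.__name__)
```

Which of the two correct candidates wins depends on the machine, so the sketch prints the result rather than assuming it. Separating the correctness filter from the cost function keeps the second step replaceable, for example by a static cost model instead of wall-clock timing.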

Development
• DSL design: ≈ 10 months → ≈ 2 weeks
  • This is not a bottleneck! *
• Ranking: bulk of the effort
  • Designing a score for an operator $F$ takes 2-3x longer than designing $F$ (incl. synthesis!)
  • E.g.: rock-paper-scissors among string processing operators. Should I process the string “25-06-11” with regexes? Treat it as a numeric computation? A date?
* Once you learn the skill…

From: all lines
• ending with “Number ∘ Dot”
• ending with “Space ∘ Number ∘ Dot”
• starting with “Word ∘ Space ∘ CamelCase”
Extract:
• the first “Number” before a “Dot”
• the last “Number” before a “Dot”
• the last “Number” before a “Dot ∘ LineBreak”
• the last “Number”
• text between the last “Space” and the last “Dot”
• text between the first “Comma ∘ Space” and the last “Dot ∘ LineBreak”
…and up to $10^{20}$ more candidates

Anecdotes
• FlashFill was not accepted to Excel until it solved the most common scenarios from 1 example
  • “Adam Smith” → “Adam”
  • “Alice Williams” → “Alic”
• Some users still don’t know you can give 2!
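The name example on this slide shows why a single example is ambiguous. The sketch below is a toy illustration: the two candidate programs and the helper `consistent_with` are hypothetical and are not FlashFill's actual operators.

```python
from typing import Callable, Dict, List, Tuple


def first_word(s: str) -> str:
    """Everything before the first space."""
    return s.split(" ")[0]


def first_four_chars(s: str) -> str:
    """A fixed-length prefix."""
    return s[:4]


candidates: Dict[str, Callable[[str], str]] = {
    "first_word": first_word,
    "first_four_chars": first_four_chars,
}


def consistent_with(examples: List[Tuple[str, str]]) -> List[str]:
    """Names of the candidates that reproduce every given example."""
    return sorted(
        name
        for name, program in candidates.items()
        if all(program(x) == y for x, y in examples)
    )


one_example = [("Adam Smith", "Adam")]
two_examples = one_example + [("Alice Williams", "Alice")]

print(consistent_with(one_example))   # both candidates survive
print(consistent_with(two_examples))  # only first_word survives
print(first_four_chars("Alice Williams"))  # Alic
```

With one example both programs fit, so only ranking can prefer `first_word`. A second example removes the prefix-based program outright.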

Ambiguity resolution
Option 1: machine-learned robustness-based ranking [CAV 15]
• Idioms/patterns from test data can influence search & ranking
• E.g.: bucketing
  • 100 → 76-100
  • 51 → 51-75
  • 86 → ?
  • x ⇒ Concat(Round(x, Down, 25), Const("-"), Round(x, Up, 25))
Option 2: interactive clarification

PBE Architecture (interactive)
[Diagram: the Program Synthesizer takes an example-based intent spec $\varphi$, the DSL $\mathcal{L}$, and a ranking function $h$, and produces a ranked program set $\tilde{N}$. The intended program $P \in \mathcal{L}$ is passed to a Translator, which emits the intended program in Python/C#/C++/… A Hypothesizer uses the program set and test inputs $\vec{\sigma}$ to pose Questions to the User, whose answers feed refined intent back into the spec.]

Hypothesizer
Given a program set $\tilde{N}$, find program constraints (“hypotheses”) $\varphi$ that best disambiguate among programs in $\tilde{N}$, and present them to the user as multiple-choice questions.
• Reduces the cognitive load on the user
• Reduces the number of iterations by choosing the most effective disambiguating questions
• Increases the user’s confidence in the system (“proactive = smart”)

Example
• “Missing page numbers, 1993” → 1993
• “64-67, 1995” → 64 (intended: 1995)
• …
Outputs of candidate programs in $\tilde{N}$ on the second input: 64, 67, ⊥
Which output is correct here?
a. 64
b. 67
c. 1995
Answering (c) adds the constraint $\varphi_{i+1}: P(\sigma_2) = \text{"1995"}$, and the second output becomes 1995.
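The multiple-choice question on this slide can be produced by grouping candidate programs by their output on the ambiguous input. The sketch below assumes a tiny, explicitly enumerated program set. The four extraction programs and the helpers `options_for` and `refine` are hypothetical stand-ins chosen to reproduce the three options shown.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Program = Callable[[str], Optional[str]]


def first_number(s: str) -> Optional[str]:
    nums = re.findall(r"\d+", s)
    return nums[0] if nums else None


def second_number_or_first(s: str) -> Optional[str]:
    nums = re.findall(r"\d+", s)
    if not nums:
        return None
    return nums[1] if len(nums) > 1 else nums[0]


def last_number(s: str) -> Optional[str]:
    nums = re.findall(r"\d+", s)
    return nums[-1] if nums else None


def number_after_comma(s: str) -> Optional[str]:
    match = re.search(r", (\d+)", s)
    return match.group(1) if match else None


# All four agree with the first example: "Missing page numbers, 1993" -> "1993".
programs: Dict[str, Program] = {
    "first_number": first_number,
    "second_number_or_first": second_number_or_first,
    "last_number": last_number,
    "number_after_comma": number_after_comma,
}


def options_for(progs: Dict[str, Program], test_input: str) -> Dict[Optional[str], List[str]]:
    """Group program names by the output they produce on test_input."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for name, program in progs.items():
        groups[program(test_input)].append(name)
    return dict(groups)


def refine(progs: Dict[str, Program], test_input: str, answer: str) -> Dict[str, Program]:
    """Keep only the programs that agree with the user's chosen output."""
    return {n: p for n, p in progs.items() if p(test_input) == answer}


question_input = "64-67, 1995"
options = options_for(programs, question_input)
print(sorted(options))                                    # ['1995', '64', '67']
print(sorted(refine(programs, question_input, "1995")))   # ['last_number', 'number_after_comma']
```

Each distinct output becomes one answer choice, and the chosen answer acts as a new input-output constraint on the program set.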

Example – alternative
• “Missing page numbers, 1993” → 1993
• “64-67, 1995” → 64 (intended: 1995), with the part “64-67” highlighted
• …
Outputs of candidate programs in $\tilde{N}$ on the second input: 64, 67, ⊥
Is this part of the input relevant?
a. Yes
b. No
c. Maybe

Picking the right question
“Distinguishability” = effectiveness for disambiguation
1. An input is distinguishing if many top-ranked candidate programs disagree on the intended output on it.
   • Any response will partition the program set well
2. A question is distinguishing if the alternative candidate programs corresponding to all potential responses have high ranks.
   • Any response will lead to a good alternative program
Preliminary results: good questions yield just 4-6 iterations until convergence
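Criterion 1 above can be approximated by scoring each input by how evenly the top-ranked programs split on it. The sketch below uses the entropy of the output partition as that score. This measure is an assumption made for illustration, not the talk's definition, and `partition_entropy` and `most_distinguishing_input` are hypothetical names.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Program = Callable[[str], str]


def partition_entropy(programs: Sequence[Program], x: str) -> float:
    """Entropy (in bits) of the split that input x induces on the programs."""
    counts = Counter(program(x) for program in programs)
    total = len(programs)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def most_distinguishing_input(ranked_programs: Sequence[Program],
                              inputs: Sequence[str],
                              top_k: int = 10) -> str:
    """Pick the input on which the top_k highest-ranked programs disagree most."""
    top = ranked_programs[:top_k]
    return max(inputs, key=lambda x: partition_entropy(top, x))


# Toy candidates, assumed to be already sorted by rank.
ranked = [str.title, str.capitalize, str.upper]
inputs = ["adam", "ADAM", "adam smith"]

print(most_distinguishing_input(ranked, inputs))  # adam smith
```

On "adam" and "ADAM" two of the three programs agree, while "adam smith" yields three different outputs, so any answer about it leaves a single program.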

Big Data + Program Synthesis

Problem definition
Given a program set $\tilde{N}_i \subset \mathcal{L}$ that satisfies the currently accumulated spec $\varphi_i$, and a new constraint $\psi_{i+1}$, learn a subset $\tilde{N}_{i+1} \subset \tilde{N}_i$ of programs that satisfy the new spec $\varphi_{i+1} = \varphi_i \wedge \psi_{i+1}$.
• $\mathcal{L}$ is an industrial DSL (e.g., FlashFill)
• $|\tilde{N}_i| \approx 10^{20}$
• Time limit: ≈ 1 sec
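The incremental step in this problem definition can be sketched on an explicitly enumerated program set. A set of the size quoted on the slide cannot be stored as a list, so this only illustrates the interface: each new constraint narrows the previous result instead of restarting synthesis. The class `IncrementalLearner` and the helper `io_example` are hypothetical names.

```python
from typing import Callable, List

Program = Callable[[int], int]
Constraint = Callable[[Program], bool]


def io_example(x: int, y: int) -> Constraint:
    """A constraint stating that the program must map x to y."""
    return lambda program: program(x) == y


class IncrementalLearner:
    def __init__(self, programs: List[Program]) -> None:
        self.programs = list(programs)      # current program set
        self.spec: List[Constraint] = []    # accumulated spec (a conjunction)

    def add_constraint(self, psi: Constraint) -> List[Program]:
        """Conjoin psi to the spec and filter the current set, not the whole DSL."""
        self.spec.append(psi)
        self.programs = [p for p in self.programs if psi(p)]
        return self.programs


def double(x: int) -> int:
    return 2 * x


def square(x: int) -> int:
    return x * x


def plus_two(x: int) -> int:
    return x + 2


learner = IncrementalLearner([double, square, plus_two])
after_first = learner.add_constraint(io_example(2, 4))
after_second = learner.add_constraint(io_example(3, 9))

print([p.__name__ for p in after_first])   # ['double', 'square', 'plus_two']
print([p.__name__ for p in after_second])  # ['square']
```

The second call only inspects the three survivors of the first call, which is the property that matters under the time limit stated on the slide.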

Background: Version Space Algebra
int positionIn[string s] := AbsPos(s, k) | RegPos(s, std.Pair(r, r), k);
Sharing #1: cross-product representation
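The cross-product sharing named on this slide can be sketched with three node types: an explicit leaf set, a union of alternatives, and a join that stands for every combination of its arguments without listing them. The classes below are an illustrative assumption rather than the PROSE SDK's data structures, and the concrete values chosen for `k` and `r` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Leaf:
    """An explicitly listed set of terminals."""
    items: Tuple[str, ...]

    def size(self) -> int:
        return len(self.items)

    def expand(self) -> Iterator[str]:
        return iter(self.items)


@dataclass(frozen=True)
class Union:
    """Alternatives of a nonterminal, e.g. AbsPos(...) | RegPos(...)."""
    children: tuple

    def size(self) -> int:
        return sum(child.size() for child in self.children)

    def expand(self) -> Iterator[str]:
        for child in self.children:
            yield from child.expand()


@dataclass(frozen=True)
class Join:
    """An operator applied to every combination of its argument spaces."""
    op: str
    args: tuple

    def size(self) -> int:
        return prod(arg.size() for arg in self.args)

    def expand(self) -> Iterator[str]:
        for combo in product(*(arg.expand() for arg in self.args)):
            yield f"{self.op}({', '.join(combo)})"


s = Leaf(("s",))
k = Leaf(("0", "1", "-1"))                      # hypothetical position constants
r = Leaf(("Number", "Dot", "Space", "Word"))    # hypothetical regex tokens

position = Union((
    Join("AbsPos", (s, k)),
    Join("RegPos", (s, Join("std.Pair", (r, r)), k)),
))

print(position.size())  # 51
```

Only eight terminals are stored (1 + 3 + 4), yet the structure represents 3 + 1·16·3 = 51 programs. Counting multiplies sizes at joins and never enumerates the programs.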
