formal methods for database application evolution
play

Formal Methods for Database Application Evolution I l Dillig - PowerPoint PPT Presentation

Formal Methods for Database Application Evolution I l Dillig University of Texas at Austin VSTTE 2020 1 Database Applications DB application: program that interacts with underlying database Database applications are ubiquitous


  1. Formal Methods for Database Application Evolution I ş ıl Dillig University of Texas at Austin VSTTE 2020 1

  2. Database Applications DB application: program that interacts with underlying database Database applications are ubiquitous • Enterprise software, web applications, CRM applications, …. 2

  3. Schema Refactoring Common theme during DB app evolution: schema refactoring Examples • Splitting/merging tables • Denormalization • Even more extreme: Switch from relational schema to non-relational DB 3

  4. Implications of Schema Changes Structural schema changes have significant implications! Data migration: Code migration: Move data from source Must re-implement to target schema parts of program Currently done Attracted lots manually! of attention 4

  5. Our Recent Research Program verification and synthesis techniques to help programmers with DB schema refactoring Just a starting point, not a finished body of work! My goal: Convince you that there are lots of open & interesting problems in this space! 5

  6. Idealized Model: Parametrized SQL Programs Program is a set of methods string getName(int id){ (transactions), where each SELECT name method can take user FROM users WHERE uid = id input, but the body is } straight-line SQL code void insertUser Each method is either (int id, string name) { INSERT INTO users an update transaction VALUES (id, name) (i.e., modifies DB), or } query transaction 6

  7. Outline Part I: Equivalence verification for parametrized SQL programs (POPL’18) Part II: Synthesizing new version of parametrized SQL program for a given target schema (PLDI’19) Part III: Open problems & challenges 7

  8. Verification Problem Database1 Database2 Refactor schema Program1 Program2 Are the two programs equivalent before and after schema change? 8

  9. Example void createSub(int id, String name, String fltr) Subscriber INSERT INTO Subscriber VALUES (id, name, fltr); sid sname filter P void updateSub(int id, String fltr) … … … UPDATE Subscriber SET filter=fltr WHERE sid=id; Are these List<Tuple> getSubFilter(int id) SELECT filter FROM Subscriber WHERE sid=id; implementations equivalent? void createSub(int id, String name, String fltr) Subscriber’ INSERT INTO Subscriber’ VALUES (id, name, UID_x); INSERT INTO Filter VALUES (UID_x, fltr); sid sname fid_fk … … … void updateSub(int id, String fltr) P’ UPDATE Filter SET params=fltr WHERE fid IN Filter (SELECT fid_fk FROM Subscriber’ WHERE sid=id); fid params List<Tuple> getSubFilter(int id) SELECT params FROM Filter JOIN Subscriber’ … … ON fid=fid_fk WHERE sid=id; 9

  10. Defining Equivalence Consider two SQL programs P and P’ that provide same interface but different implementations queries P = updates queries P’ P P’: Every query transaction yields the same result ≡ after invoking same sequence of update transactions 10

  11. Example Revisited void createSub(int id, String name, String fltr) After an arbitrary INSERT INTO Subscriber VALUES (id, name, fltr); P void updateSub(int id, String fltr) sequence of createSub UPDATE Subscriber SET filter=fltr WHERE sid=id; and updateSub List<Tuple> getSubFilter(int id) SELECT filter FROM Subscriber WHERE sid=id; transactions, getSubFilter should void createSub(int id, String name, String fltr) INSERT INTO Subscriber’ VALUES (id, name, UID_x); yield same answer INSERT INTO Filter VALUES (UID_x, fltr); void updateSub(int id, String fltr) P’ UPDATE Filter SET params=fltr WHERE fid IN (SELECT fid_fk FROM Subscriber’ WHERE sid=id); List<Tuple> getSubFilter(int id) SELECT params FROM Filter JOIN Subscriber’ ON fid=fid_fk WHERE sid=id; 11

  12. Proving Equivalence: Methodology Find bisimulation invariant that Φ relates the two DB states update ... update update S D 0 D 1 D 2 A = B C ⋈ Φ inv inv inv Φ Φ ... S' update update update ' ' ' D 0 D 1 D 2 = J Q i K = J Q 0 ( Φ ∧ ~ x = ~ y ) | i K 12

  13. Inferring Bisimulation Invariants Automatically infer inductive bisimulation invariants using a guess-and-check approach (Houdini) •Generate a universe of candidate predicates from a set of templates Π ? (?) = Π ? (? o n ?) Π ? (?) = Π ? (?) •Perform fixed-point computation to find strongest (conjunctive) bisimulation invariant over this universe 13

  14. Verification Workflow Generate predicate universe P Φ ≡ ⋀ Generate VCs Use SMT φ solver φ ∈ P Yes Check if Check if it implies inductive No query results are same Yes P \ { ϕ } No 14

  15. Outline Part I: Equivalence verification for parametrized SQL programs (POPL’18) Part II: Synthesizing new version of parametrized SQL program for a given target schema (PLDI’19) Part III: Open problems & challenges 15

  16. Synthesis Problem Enumerate-and- verify approach Program P Synthesizer New program P' Check equivalence Schema S’ using technique P’ is over new schema from Part1 S’ and equivalent to P 16

  17. Challenge Search space is very large! Sketch Sketch generation completion 17

  18. Synthesis Methodology Mapping from source Value attrb’s to target attrb’s Correspondence Source schema Generator Target schema Value Corr. Target prog Program with Sketch Sketch unknown tables Solver and columns Generator Sketch Source prog 18

  19. Motivating Example for Synthesis Inst update addTA(int id, String name, Binary pic) INSERT INTO TA VALUES (id, name, pic); InstId IName IPic update deleteTA(int id) TA DELETE FROM TA WHERE TaId=id; TaId TName TPic query getTAInfo(int id) SELECT TName, TPic FROM TA WHERE TaId=id; Picture Inst’ PicId Pic InstId IName PicId TA’ TaId TName PicId 19

  20. Motivating Example for Synthesis Inst update addTA(int id, String name, Binary pic) INSERT INTO TA VALUES (id, name, pic); InstId IName IPic update deleteTA(int id) TA DELETE FROM TA WHERE TaId=id; TaId TName TPic query getTAInfo(int id) SELECT TName, TPic FROM TA WHERE TaId=id; Picture Inst’ PicId Pic InstId IName PicId TA’ TaId TName PicId 20

  21. Motivating Example for Synthesis Inst update addTA(int id, String name, Binary pic) INSERT INTO TA VALUES (id, name, pic); InstId IName IPic update deleteTA(int id) TA DELETE FROM TA WHERE TaId=id; TaId TName TPic query getTAInfo(int id) SELECT TName, TPic FROM TA WHERE TaId=id; Picture Inst’ PicId Pic InstId IName PicId TA’ TaId TName PicId 21

  22. Motivating Example for Synthesis Inst update addTA(int id, String name, Binary pic) INSERT INTO TA VALUES (id, name, pic); InstId IName IPic update deleteTA(int id) TA DELETE FROM TA WHERE TaId=id; TaId TName TPic query getTAInfo(int id) SELECT TName, TPic FROM TA WHERE TaId=id; Picture Inst’ PicId Pic InstId IName PicId TA’ query getTAInfo(int id) SELECT ?? 1 , ?? 2 FROM ?? 3 WHERE ?? 4 =id; TaId TName PicId ?? 1 =TName ?? 2 =Pic ?? 4 =TaId ?? 3 ∈ { Picture ⋈ TA, Picture ⋈ TA ⋈ Inst } 22

  23. Motivating Example for Synthesis Inst update addTA(int id, String name, Binary pic) INSERT INTO TA VALUES (id, name, pic); InstId IName IPic update deleteTA(int id) TA DELETE FROM TA WHERE TaId=id; TaId TName TPic query getTAInfo(int id) SELECT TName, TPic FROM TA WHERE TaId=id; update addTA(int id, String name, Binary pic) INSERT INTO ?? 1 VALUES (id, name, pic); Picture Inst’ update deleteTA(int id) PicId Pic InstId IName PicId DELETE ?? 2 FROM ?? 3 WHERE TaId = id; TA’ query getTAInfo(int id) SELECT TName, Pic FROM ?? 4 WHERE TaId=id; TaId TName PicId ?? 1, ?? 4 ∈ { Picture ⋈ TA, Picture ⋈ TA ⋈ Inst } ?? 2 ∈ { [Picture], [TA], …, [Picture, TA, Inst] } ?? 3 ∈ { Picture ⋈ TA, Picture ⋈ TA ⋈ Inst } 23

  24. Solving the Sketch Basic idea: Enumerate all programs in search space For each completion, check if it is equivalent to original program 15 holes, 3 instantiations: Scalability? >14 million programs! Use conflict-driven learning * to prune search space * Program Synthesis using Conflict-Driven Learning. Feng, Martins, Bastani, Dillig. PLDI’18 24

  25. Learning from Conflicts Whenever you enumerate an incorrect program, infer a set of other provably-incorrect programs! Prior work* shows how to do this for programming-by- example, but not applicable here Leverage notion of “minimum distinguishing inputs” to learn from failed synthesis attempts! * Program Synthesis using Conflict-Driven Learning. Feng, Martins, Bastani, Dillig. PLDI’18 25

  26. Distinguishing Inputs Recall: Input to DB program is a set of function invocations along with their arguments. update addTA(int id, String name, Binary pic) Input: INSERT INTO TA VALUES (id, name, UID1); INSERT INTO Picture VALUES (UID1, pic); addTA(1, “A”, null); getTAInfo(1); update deleteTA(int id) DELETE Picture FROM Picture JOIN TA ON Picture.PicId = TA.PicId JOIN Inst ON TA.PicId = Inst.PicId WHERE TaId = id; query getTaInfo(int id) Input I is distinguishing SELECT TName, Pic FROM Picture for a pair of programs P , P’ JOIN TA ON Picture.PicId=TA.PicId JOIN Inst ON TA.PicId = Inst.PicID WHERE TaId=id; iff P(I) P’(I) ≠ 26

Recommend


More recommend