Data Management Research @ UW Seattle uwdb.io http://uwdb.io/



  1. Data Management Research @ UW Seattle uwdb.io

  2. http://uwdb.io/
     Research in database systems, theory, and programming languages
     Magdalena Balazinska, Alvin Cheung, Dan Suciu
     ~15 students + postdocs

  3. Research Areas
     Big data processing in the cloud
     • Theory: optimal query processing
     • Systems: Myria, efficient & complex processing at scale, image analytics, DBMS+NN, data summarization
     • Usability: Cloud SLAs, performance tuning, viz analytics
     New Types of DBMSs
     • Open World DBMS
     • Image & video DBMS
     • LightDB: VR/AR/MR DBMS
     Scientific data management
     • Collaborations with scientists & deep involvement with eScience Institute
     Databases and programming languages
     • DBMS & app co-optimization
     Probabilistic Databases
     Causality
     Pictured: Walter Cai, Jenny Ortiz, Leilani Battle, Brandon Haynes, Laurel Orr

  4. Towards Application-Specific Databases uwplse.org uwdb.io

  5. DB optimizer executor

  6. [Figure: specialized systems (Main Memory DB, Column Stores, SparkSQL, Storm, SciDB) arranged by workload (OLTP, OLAP, Analytics, Streams, Scientific); axes: Workloads, Specialization]
     Can we generate customized data stores from application code?

  7. Application Inefficiencies (Cong Yan)

     # stars  Application                 # issues
     22k      Discourse (forum)           85
     1k       Lobster (forum)             45
     49k      Gitlab (collaboration)      23
     13k      Redmine (collaboration)     59
     17k      Spree (E-commerce)          20
     1.7k     ROR Ecommerce               11
     697      Fulcrum (task mgmt)         2
     3.5k     Tracks (task mgmt)          30
     18k      Diaspora (social network)   57
     1.2k     Onebody (social network)    76
     8k       Openstreetmap (map)         4
     1.1k     Fallingfruit (map)          16
              Total                       428

     • Code translated to inefficient queries
     • Misplaced computation
     • Redundant data loads
     • Issuing queries with known results
     • Loading unused data
     • Missing indexes

     78% of fixes took fewer than 5 lines
     Max app speedup: 39x

  8. Image Blur Rotate Hash Join Partitioning

  9. SEARCH Target code Proof of translation

  10. PROGRAM SYNTHESIS Target code Proof of translation

  11. Verified Lifting: Casper (Maaz Ahmad)
      1. Define semantics of map and reduce
      2. Synthesizer infers spec from source
      3. Retarget spec to Hadoop (codegen)

      Source:
          // sequential implementation
          int regress(Point[] points) {
              int SumXY = 0;
              for (Point p : points) {
                  SumXY += p.x * p.y;
              }
              return SumXY;
          }

      Inferred spec:
          SumXY = reduce(map(points, f_m), f_r)
          f_m(x, y) = x * y
          f_r(v1, v2) = v1 + v2

      Generated Hadoop code:
          void map(Object key, Point[] value) {
              for (Point p : value)
                  emit("sumxy", p.x * p.y);
          }
          void reduce(Text key, int[] vs) {
              int SumXY = 0;
              for (Integer val : vs)
                  SumXY = SumXY + val;
              emit(key, SumXY);
          }

      Lifted code can be optimized by Hadoop
      max 32x speedup
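
      The relationship between the sequential loop and the lifted map/reduce spec on this slide can be checked on a small input. The following is a minimal Python sketch, not Casper's output: the `Point` type, the function names, and the sample points are assumptions made for illustration, and `functools.reduce` stands in for the Hadoop reducer.

```python
from functools import reduce
from typing import NamedTuple


class Point(NamedTuple):
    # Hypothetical stand-in for the Point class used on the slide.
    x: int
    y: int


def regress_sequential(points):
    """Sequential implementation: accumulate x * y in a loop."""
    sum_xy = 0
    for p in points:
        sum_xy += p.x * p.y
    return sum_xy


def f_m(p):
    """Map function from the spec: f_m(x, y) = x * y."""
    return p.x * p.y


def f_r(v1, v2):
    """Reduce function from the spec: f_r(v1, v2) = v1 + v2."""
    return v1 + v2


def regress_lifted(points):
    """Lifted form: SumXY = reduce(map(points, f_m), f_r)."""
    return reduce(f_r, map(f_m, points), 0)


# Illustrative data only; not taken from the slides.
points = [Point(1, 2), Point(3, 4), Point(5, 6)]
print(regress_sequential(points), regress_lifted(points))
```

      Agreement on one input is only a sanity check; the slide's point is that the translation comes with a proof that the two forms agree on all inputs.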

  12. Q1: SELECT ... FROM ... WHERE ...
      Q2: SELECT ... FROM ... WHERE ...
      ∀ D . Q1(D) = Q2(D)   or   ∃ D . Q1(D) ≠ Q2(D) ?
      Uses: Query Optimizers, Autograders, Application Caches

  13. Deciding the equivalence of two arbitrary relational queries is undecidable (Boris Trakhtenbrot).
      • Full decision procedure exists for conjunctive queries
      • Simple heuristics can already prove many common cases

  14. Cosette (Shumo Chu, Daniel Li, Nick Anderson)
      Input: Q1 =?= Q2
      • Coq Proof Assistant: check validity of proofs (Q1 == Q2)
      • Rosette Constraint Solver: finding counterexamples (Q1 ≠ Q2)
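
      The counterexample side of this slide, finding some database D with Q1(D) ≠ Q2(D), can be illustrated with a brute-force search over tiny databases. This is a minimal Python sketch and not how Cosette works (Cosette uses a constraint solver, as the slide says). The table `r(a, b)`, the three queries, and the search bounds are all assumptions made for illustration. Results are compared as bags, so duplicate rows matter.

```python
import itertools
import sqlite3
from collections import Counter


def run(query, rows):
    """Evaluate query on a fresh in-memory table r(a, b); return a bag of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO r VALUES (?, ?)", rows)
    result = Counter(conn.execute(query).fetchall())
    conn.close()
    return result


def find_counterexample(q1, q2, domain=(0, 1), max_rows=2):
    """Enumerate every bag of at most max_rows tuples over domain.

    Returns the first database on which the queries disagree, or None if
    they agree on all databases within the bound (which proves nothing
    about larger databases).
    """
    tuples = list(itertools.product(domain, repeat=2))
    for n in range(max_rows + 1):
        for rows in itertools.combinations_with_replacement(tuples, n):
            if run(q1, rows) != run(q2, rows):
                return list(rows)
    return None


# Hypothetical queries: Q1 and Q2 differ only in predicate order;
# Q3 adds DISTINCT, which changes the result under bag semantics.
Q1 = "SELECT a FROM r WHERE a = 1 AND b = 1"
Q2 = "SELECT a FROM r WHERE b = 1 AND a = 1"
Q3 = "SELECT DISTINCT a FROM r WHERE a = 1 AND b = 1"

print(find_counterexample(Q1, Q2))
print(find_counterexample(Q1, Q3))
```

      A bounded search like this can only refute equivalence; showing that Q1 and Q2 agree on every database needs a proof, which is the role the slide assigns to the proof assistant.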

  15. [Figure: workflow that trains a caption-generating model. Data (HTML, images) passes through Regex Filter and Join to generate training labels; images pass through repeated Conv layers (CNN) into an RNN; output: model.]
      Many regex and join algorithms to choose from!
      Likewise for convolution

  16. Cuttlefish: A Lightweight Primitive for Online Tuning (Tomer Kaftan)

      def loopConvolve(image, filters): ...
      def fftConvolve(image, filters): ...
      def mmConvolve(image, filters): ...

      for image, filters in convolutions:
          start = now()
          result = convolve(image, filters)
          elapsedTime = now() - start
          output result, elapsedTime

  17. Cuttlefish: A Lightweight Primitive for Online Tuning (Tomer Kaftan)

      def loopConvolve(image, filters): ...
      def fftConvolve(image, filters): ...
      def mmConvolve(image, filters): ...

      tuner = Tuner([loopConvolve, fftConvolve, mmConvolve])
      for image, filters in convolutions:
          start = now()
          result = convolve(image, filters)
          elapsedTime = now() - start
          output result, elapsedTime

  18. Cuttlefish: A Lightweight Primitive for Online Tuning (Tomer Kaftan)

      def loopConvolve(image, filters): ...
      def fftConvolve(image, filters): ...
      def mmConvolve(image, filters): ...

      tuner = Tuner([loopConvolve, fftConvolve, mmConvolve])
      for image, filters in convolutions:
          convolve, token = tuner.choose()
          start = now()
          result = convolve(image, filters)
          elapsedTime = now() - start
          output result, elapsedTime

  19. Cuttlefish: A Lightweight Primitive for Online Tuning (Tomer Kaftan)

      def loopConvolve(image, filters): ...
      def fftConvolve(image, filters): ...
      def mmConvolve(image, filters): ...

      tuner = Tuner([loopConvolve, fftConvolve, mmConvolve])
      for image, filters in convolutions:
          convolve, token = tuner.choose()
          start = now()
          result = convolve(image, filters)
          elapsedTime = now() - start
          tuner.observe(token, elapsedTime)
          output result, elapsedTime
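
      The slides show only the `choose` / `observe` interface, not what happens inside `Tuner`. The following is a minimal Python sketch of one way such a tuner could behave, using a simple epsilon-greedy rule as a stand-in. It is not the Cuttlefish implementation: the `epsilon` and `seed` parameters, the `best_index` helper, and the two simulated operators with fixed costs are all assumptions made for illustration.

```python
import random


class Tuner:
    """Toy online tuner: picks among options, learns from observed runtimes."""

    def __init__(self, options, epsilon=0.1, seed=0):
        self.options = list(options)
        self.epsilon = epsilon              # exploration probability (assumed)
        self.rng = random.Random(seed)      # seeded so runs are repeatable
        self.counts = [0] * len(self.options)
        self.totals = [0.0] * len(self.options)

    def best_index(self):
        """Index of the option with the lowest mean observed runtime so far."""
        tried = [i for i, c in enumerate(self.counts) if c > 0]
        return min(tried, key=lambda i: self.totals[i] / self.counts[i])

    def choose(self):
        """Return (option, token); the token identifies the choice for observe()."""
        untried = [i for i, c in enumerate(self.counts) if c == 0]
        if untried:
            i = untried[0]                  # try every option once first
        elif self.rng.random() < self.epsilon:
            i = self.rng.randrange(len(self.options))  # explore
        else:
            i = self.best_index()           # exploit the fastest so far
        return self.options[i], i

    def observe(self, token, elapsed_time):
        """Record the runtime measured for the choice identified by token."""
        self.counts[token] += 1
        self.totals[token] += elapsed_time


# Simulated operators returning (result, cost) so the demo is deterministic.
def slow_op(x):
    return x, 5.0


def fast_op(x):
    return x, 1.0


tuner = Tuner([slow_op, fast_op])
for step in range(50):
    op, token = tuner.choose()
    _, cost = op(step)
    tuner.observe(token, cost)

best = tuner.options[tuner.best_index()]
print(best.__name__, tuner.counts)
```

      The token returned by `choose` is what lets `observe` attribute a measurement to the right option, which matches the shape of the loop on the slide.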

  20. Note: Y-axis is Log-scale

  21. Scythe (Chenglong Wang)

      Input tables:
          id  date
          1   12/25
          2   11/21
          4   12/24
          …   …

      Output tables:
          id  date   max
          1   12/25  30
          2   11/21  10
          4   12/24  20
          …   …      …

      Figure labels:
      • Search for abstract queries
      • Stored using specialized data structures
      • Prune query skeletons
      • Instantiate abstract queries
      • Rank results based on simplicity

  22. Scythe (Chenglong Wang)
      Supported features
      • SPJ
      • Grouping
      • Aggregation
      • Subqueries
      • Outer join
      • Exists
      • Union
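
      The general idea behind the Scythe slides, finding a query that maps example input tables to an example output table, can be sketched as enumerate, check, and rank. The following minimal Python sketch is a toy and not Scythe's algorithm: the table `t(id, val)`, the example rows, the five-query candidate space, and the use of query length as a simplicity measure are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical input/output example (not the tables shown on the slide).
INPUT_ROWS = [(1, 30), (1, 10), (2, 10), (4, 20), (4, 5)]
EXPECTED = [(1, 30), (2, 10), (4, 20)]

AGGREGATES = ["MAX", "MIN", "SUM", "COUNT"]


def candidates():
    """Enumerate a tiny, fixed space of candidate queries."""
    yield "SELECT id, val FROM t"
    for agg in AGGREGATES:
        yield f"SELECT id, {agg}(val) FROM t GROUP BY id"


def synthesize(rows, expected):
    """Return the candidates that reproduce expected, shortest query first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    matches = [
        q for q in candidates()
        if sorted(conn.execute(q).fetchall()) == sorted(expected)
    ]
    conn.close()
    return sorted(matches, key=len)  # crude simplicity ranking (assumed)


print(synthesize(INPUT_ROWS, EXPECTED))
```

      A search space covering the features listed above is far too large to enumerate this way, which is presumably why the slides mention abstract queries and pruning.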

  23. Stack Overflow dataset
      • Posts tagged with #sql, #oracle, #database (430k)
      • Posts containing an accepted answer in SQL
      • Results: 41k (title, query) pairs

      Titles summarize post 80% of the time

      Filtered away titles
      • My query doesn't work!
      • Why is my query slow?
      • I hate SQL!

  24. (Srini Iyer)

      Model             Naturalness  Informativeness
      Code-NN (Ours)    2.6          1.55
      Nearest neighbor  1.9          1.55
      MOSES             1.76         1.36
      ATTEN             2.82         0.93

  25. UWDB Collaborators
      UW
      • Bill Howe (iSchool)
      • Andrew Connolly (Astronomy)
      • Aaron Lee (Ophthalmology)
      • Ariel Rokem (eScience)
      • Emilio Zagheni (Sociology)
      • Prog Lang & SW Eng group
      Industry
      • Adobe
      • Huawei
      • Intel
      • Microsoft
      • Teradata
