
Database Learning: Toward a Database that Becomes Smarter Over Time



  1. Database Learning: Toward a Database that Becomes Smarter Over Time. Yongjoo Park, Ahmad Shahab Tajik, Michael Cafarella, Barzan Mozafari. University of Michigan, Ann Arbor.

  2. Today’s databases: Users send a query to the database and receive an answer to the query. After answering queries, THE WORK is GONE. Our goal: reuse the work.

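  8. Our high-level approach. [Diagram: Users, AQP engine, Database, and a Learning component with a Query Synopsis. Users send queries (Q) to the AQP engine, which returns an answer A (10% err, 1 sec); the Learning component takes A (10% err) and produces A (2% err).]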

  16. Our high-level approach (continued). [Plot: Error (%) against Time (sec), comparing the AQP engine with Database learning. The diagram of Users, AQP engine, Query Synopsis, Learning, and Database, with A (10% err) and A (2% err), is repeated.]

  19. Technical challenges: How to leverage those queries for future queries? Queries use the data in different columns/rows.

  24. Our idea. [Diagram: queries Q1 and Q2 with their answers A1 and A2, taken as pairs (Q1, A1) and (Q2, A2), followed by more queries and answers; a "?" marks what is still unknown.]

  32. Concrete example. [Figure: three panels plotting SUM(count), roughly 20M to 40M, against Week Number, 1 to 100. Legend: True data; Ranges observed by past queries; Model (with 95% confidence interval).]

  38. Design goals: 1. Support a wide class of SQL queries, e.g. "select sum(Y2) from t where 5 < X1 < 8;" and "select X3, avg(Y1) from t where X2 between Apr and May group by X3;". 2. No Assumptions about Data. 3. Lightweight. [Chart: latency, BlinkDB vs. DBL.]

  41. Our Approach

  44. Problem statement. Problem: Given past queries (q_1, ..., q_n), a new query (q_{n+1}), and their approximate answers, find the most likely answer to the new query (q_{n+1}) and its estimated error. Our result: Under a certain model assumption, our answer’s error bound ≤ original answer’s error bound (in practice, much more accurate), if the error bounds provide the same probabilistic guarantees.
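
One way to see why a combined error bound need not exceed the original one is the textbook case of fusing two independent, unbiased estimates of the same quantity by inverse-variance weighting. The sketch below assumes Gaussian errors and independence between the two estimates. The slides only say "under a certain model assumption", so this is an illustration rather than the authors' model, and the names `fuse`, `half_width_95`, and all numbers are hypothetical.

```python
import math


def fuse(raw_mean, raw_var, prior_mean, prior_var):
    """Fuse two independent unbiased estimates by inverse-variance weighting.

    raw_*   : the AQP engine's answer to the new query and its variance
    prior_* : a prediction for the same answer inferred from past queries
    Independence and Gaussian errors are assumptions of this sketch.
    """
    w_raw = 1.0 / raw_var
    w_prior = 1.0 / prior_var
    fused_var = 1.0 / (w_raw + w_prior)
    fused_mean = fused_var * (w_raw * raw_mean + w_prior * prior_mean)
    return fused_mean, fused_var


def half_width_95(var):
    """Half-width of a 95% interval for a Gaussian with the given variance."""
    return 1.96 * math.sqrt(var)


# Hypothetical numbers: a raw approximate answer and a prediction from past queries.
raw_mean, raw_var = 100.0, 25.0
prior_mean, prior_var = 104.0, 100.0

fused_mean, fused_var = fuse(raw_mean, raw_var, prior_mean, prior_var)
print("raw   :", raw_mean, "+/-", half_width_95(raw_var))
print("fused :", fused_mean, "+/-", half_width_95(fused_var))
```

Because `1 / prior_var` is positive, the fused variance is always smaller than `raw_var` in this setting, which mirrors the "≤" in the stated result when both bounds are read at the same confidence level.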

  46. Overview of our technique. Three example queries: (1) "select count(Y2) from t where 1 < X1 < 2;" (2) "select avg(Y2) from t where 6 < X1 < 8;" (3) "select sum(Y2) from t where 5 < X1 < 8;". Each answer is treated as a random variable (our uncertainty on answers) with a probability distribution and an estimated answer. Correlation between answers: two aggregations involve common values.
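
The remark that two aggregations involve common values can be made concrete by checking how much the predicate ranges of the three example queries overlap. The sketch below is only an illustration: the helper names `overlap_length` and `overlap_fraction`, and the use of range overlap as a rough stand-in for correlation, are assumptions and not the covariance computation used by the authors.

```python
def overlap_length(a, b):
    """Length of the intersection of two intervals given as (low, high)."""
    lo = max(a[0], b[0])
    hi = min(a[1], b[1])
    return max(0.0, hi - lo)


def overlap_fraction(a, b):
    """Intersection length divided by union length (Jaccard index for intervals)."""
    inter = overlap_length(a, b)
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Ranges on X1 taken from the three example queries on the slide.
ranges = {
    "count": (1.0, 2.0),  # where 1 < X1 < 2
    "avg": (6.0, 8.0),    # where 6 < X1 < 8
    "sum": (5.0, 8.0),    # where 5 < X1 < 8
}

for first, second in [("sum", "avg"), ("sum", "count"), ("avg", "count")]:
    frac = overlap_fraction(ranges[first], ranges[second])
    print(first, "vs", second, "overlap fraction:", round(frac, 3))
```

In this toy measure the sum and avg queries share the range 6 < X1 < 8, so their answers depend on common rows, while the count query touches a disjoint range and shares nothing with the other two.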
