Database Learning: Toward a Database that Becomes Smarter Over Time
Yongjoo Park, Ahmad Shahab Tajik, Michael Cafarella, Barzan Mozafari
University of Michigan, Ann Arbor
Today's databases
Users send a query to the database and receive an answer to the query.
After answering queries, THE WORK is GONE.
Our goal: reuse the work.
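Our high-level approach
Users send a query (Q) to the AQP engine, which answers it approximately from the database: A (10% err, 1 sec).
Database learning keeps a query synopsis of past queries and their answers and uses it to refine the AQP engine's answer, from A (10% err) to an improved answer Â (2% err).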
Our high-level approach (continued)
[Figure: Error (%) versus Time (sec), comparing the AQP engine alone with database learning.]
Technical challenges
How to leverage those queries for future queries?
Queries use the data in different columns/rows.
Our idea
Each past query and its answer, (Q1, A1), (Q2, A2), and more queries and answers, is used to learn about the unknown underlying data (shown as "?" on the slide).
Concrete example
[Figure: three panels of SUM(count) versus Week Number. Legend: true data; ranges observed by past queries; model (with 95% confidence interval).]
Design goals
1. Support a wide class of SQL queries, for example:
   select X3, avg(Y1) from t where X2 between Apr and May group by X3;
   select sum(Y2) from t where 5 < X1 < 8;
2. No assumptions about data
3. Lightweight
[Figure: latency of BlinkDB compared with DBL.]
Our Approach
Problem statement
Problem: Given past queries (q_1, ..., q_n), a new query (q_{n+1}), and their approximate answers, find the most likely answer to the new query (q_{n+1}) and its estimated error.
Our result: Under a certain model assumption, our answer's error bound ≤ the original answer's error bound (in practice, much more accurate), if the error bounds provide the same probabilistic guarantees.
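One way to see why an inequality of this shape can hold is the single-query case under a Gaussian model. The slides only say "a certain model assumption", so the Gaussian choice and the symbols below ($\theta$, $\kappa^2$, $\beta^2$, $m$, $a$) are assumptions made for illustration, not the authors' notation. Suppose the true answer $\theta$ has prior mean $m$ and variance $\kappa^2$, and the AQP engine returns a raw answer $a$ whose error has variance $\beta^2$:

```latex
\begin{align*}
\theta &\sim \mathcal{N}(m, \kappa^2), \qquad a \mid \theta \sim \mathcal{N}(\theta, \beta^2) \\
\operatorname{Var}(\theta \mid a) &= \left(\frac{1}{\kappa^2} + \frac{1}{\beta^2}\right)^{-1}
  = \frac{\kappa^2 \beta^2}{\kappa^2 + \beta^2} \;\le\; \beta^2 \\
\mathbb{E}[\theta \mid a] &= \frac{\beta^2\, m + \kappa^2\, a}{\kappa^2 + \beta^2}
\end{align*}
```

Under these assumptions the refined variance never exceeds the raw variance $\beta^2$, and it approaches $\beta^2$ only when the prior is uninformative ($\kappa^2 \to \infty$).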
Overview of our technique
Three example queries:
   select count(Y2) from t where 1 < X1 < 2;
   select avg(Y2) from t where 6 < X1 < 8;
   select sum(Y2) from t where 5 < X1 < 8;
Random variables (our uncertainty on answers): each query's answer is treated as a random variable.
Correlation between answers: two aggregations involve common values.
Probability distribution: a distribution over the answers yields the estimated answer.
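The overview above (answers as random variables, correlated when their ranges share values, combined through a probability distribution) can be sketched in a few lines. This is a minimal illustration, assuming a joint Gaussian model over the answers; the slides do not state the distribution, and the function names `solve` and `improve`, the covariance matrix, and all numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def improve(prior_mean, prior_cov, raw, raw_var, target):
    """Posterior mean and variance of the true answer at index `target`,
    given noisy raw answers for all queries (assumed jointly Gaussian)."""
    n = len(raw)
    # Covariance of the raw answers: prior covariance plus independent sampling noise.
    s = [[prior_cov[i][j] + (raw_var[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k = [prior_cov[target][j] for j in range(n)]  # cov(true target, raw answers)
    w = solve(s, k)                               # weights S^{-1} k (S is symmetric)
    mean = prior_mean[target] + sum(wi * (raw[j] - prior_mean[j])
                                    for j, wi in enumerate(w))
    var = prior_cov[target][target] - sum(wi * ki for wi, ki in zip(w, k))
    return mean, var


# Hypothetical setup: query 0 covers a disjoint range (no correlation), while
# queries 1 and 2 overlap (positive correlation). Query 2 is the new query.
prior_mean = [100.0, 100.0, 100.0]
prior_cov = [[400.0, 0.0, 0.0],
             [0.0, 400.0, 300.0],
             [0.0, 300.0, 400.0]]
raw = [90.0, 120.0, 110.0]     # raw approximate answers
raw_var = [4.0, 4.0, 100.0]    # error variances of the raw answers

mean, var = improve(prior_mean, prior_cov, raw, raw_var, target=2)
print(mean, var)
```

In this sketch the disjoint query receives zero weight, the overlapping past query pulls the estimate toward its own answer, and the refined variance stays below the raw variance of the new query.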