Database Learning: Toward a Database that Becomes Smarter Every Time
Presented by: Huanyi Chen
Where does the data come from?
§ The real world
§ The entire dataset follows some underlying distribution
Income of a shop

# of Day    Income (CAD)
1           100
2           200
3           400
4           800
5           1600

[Chart: Income of a shop per day]
Income of a shop

# of Day    Income (CAD)
1           100
2           200
3           400
4           800
5           1600
6           ?

[Chart: Income of a shop per day]
Income of a shop
§ Income(n) = 50 * 2^n, (n = 1, 2, 3, …)
§ No database needed if we can find the underlying distribution

[Chart: Income of a shop per day, following Income(n) = 50 * 2^n]
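As a minimal illustration (not from the slides; the function name and the Python rendering are mine), once the underlying rule is known, any day can be answered without storing the table at all:

# Hypothetical sketch: if the rule Income(n) = 50 * 2**n is known,
# queries about any day can be answered without a stored table.
def income(day: int) -> int:
    return 50 * 2 ** day

print([income(n) for n in range(1, 6)])  # [100, 200, 400, 800, 1600] matches the stored data
print(income(6))                         # 3200: a prediction for the unseen day 6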
Which distribution do we care about?
§ The exact underlying distribution that generates the entire dataset and future unseen data? Not possible.
§ An exact distribution that generates the entire dataset but excludes future unseen data? Benefits nothing.
  § One can always build such a model by memorizing every value of a column, but that model cannot predict anything; we would still need to store future data to answer queries.
§ A plausible distribution that generates the entire dataset and future unseen data!
Mismatching data
§ A plausible distribution that generates the entire dataset and future unseen data cannot match every record in the dataset
§ Does not work when exact query results are needed
§ Works for approximate query processing (AQP)
Approximate Query Processing (AQP)
§ Trades accuracy for response time
§ Results are based on samples
§ Previous query results do not help future queries
§ Hence Database Learning: learning from past query answers!
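As a rough sketch of the AQP idea (assuming a uniform random sample and a simple AVG query; this is illustrative, not Verdict's actual code), an approximate average with an error estimate could look like:

import random
import statistics

# Hypothetical sketch of sampling-based AQP for AVG(income):
# answer from a small random sample, plus a standard-error estimate.
def approx_avg(values, sample_fraction=0.01, seed=0):
    random.seed(seed)
    k = max(2, int(len(values) * sample_fraction))
    sample = random.sample(values, k)
    est = statistics.mean(sample)
    # standard error of the sample mean (ignoring finite-population correction)
    err = statistics.stdev(sample) / (k ** 0.5)
    return est, err

values = [random.gauss(500, 100) for _ in range(100_000)]
est, err = approx_avg(values)
print(f"approx AVG = {est:.1f} +/- {err:.1f}")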
Database Learning Engine: Verdict
Target workflow
§ Improve future query answers by using previous query answers from an AQP engine
Verdict
§ A query is decomposed into possibly multiple query snippets
  § The answer of a snippet is a single scalar value
Verdict
§ A query is decomposed into possibly multiple query snippets
§ Verdict exploits potential correlations between snippet answers to infer the answer of a new snippet

# of Day    Income (CAD)
1           100
2           200
3           400
4           800
5           1600

old avg and new avg are correlated
Inference
§ Observations + Rules = Prediction

§ Shop income. Observations: 100, 200, 400, 800, 1600. Rule: Income(n) = 50 * 2^n. Prediction: 3200, 6400, …
§ Fibonacci. Observations: initial values 1, 1. Rule: F(n) = F(n-1) + F(n-2). Prediction: 2, 3, 5, 8, …
§ Verdict. Observations: past snippet answers and errors from AQP, plus the AQP answer for the new snippet. Rule: maximize the joint/conditional probability distribution function (pdf). Prediction: improved answer and error for the new snippet.
Inference: pdf
§ If we have the joint pdf f(θ_1 = a_1, …, θ_n = a_n, θ_{n+1} = a_{n+1}), then the prediction is the value of a_{n+1} that maximizes the conditional pdf f(θ_{n+1} = a_{n+1} | θ_1 = a_1, …, θ_n = a_n)
Inference: pdf
§ How to find the pdf? The maximum entropy (ME) principle
  § h(θ) = − ∫ f(θ) log f(θ) dθ, where θ = (θ_1, …, θ_{n+1})
§ The joint pdf maximizing this entropy differs depending on the kind of testable information given
  § Verdict uses the first- and second-order statistics of the random variables: means, variances, and covariances
Inference: pdf
[Slide: the maximum-entropy joint pdf under the given first- and second-order statistics; with mean and covariance constraints this is a multivariate normal distribution]
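For reference, this is the standard result (written in my own generic notation, not necessarily the slide's): given a fixed mean vector and covariance matrix, the maximum-entropy joint pdf is the multivariate normal density.

% Maximum-entropy density given mean vector \vec{\mu} and covariance matrix \Sigma
% (standard result; illustrative notation, not copied from the slide).
f(\vec{\theta}) \;=\;
  \frac{1}{(2\pi)^{(n+1)/2}\,|\Sigma|^{1/2}}
  \exp\!\Big( -\tfrac{1}{2}\,(\vec{\theta} - \vec{\mu})^{\top} \Sigma^{-1} (\vec{\theta} - \vec{\mu}) \Big),
\qquad \vec{\theta} = (\theta_1, \ldots, \theta_{n+1}).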
Inference: model-based answer and error
§ In general, computing this conditional pdf can be computationally expensive
Inference: model-based answer and error
§ However, the conditional pdf in Lemma 1 is inexpensive to compute: the result is another normal distribution, whose mean μ_{n+1} and variance σ²_{n+1} are given by the closed-form expressions on the slide
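As a sketch of the underlying linear-algebra step (standard Gaussian conditioning with numpy; the function and variable names are mine, and the numbers are made up, not Verdict's), conditioning the joint normal on the observed past answers yields the new snippet's mean and variance:

import numpy as np

# Hypothetical sketch: condition a joint multivariate normal over
# (past snippet answers theta_1..theta_n, new snippet theta_{n+1})
# on the observed past answers a_1..a_n.  Standard Gaussian conditioning.
def condition_new_snippet(mu, Sigma, observed):
    n = len(observed)                      # number of past snippets
    mu_past, mu_new = mu[:n], mu[n]
    S_pp = Sigma[:n, :n]                   # past-past covariance block
    S_np = Sigma[n, :n]                    # new-past covariance block
    S_nn = Sigma[n, n]                     # new-new variance
    w = np.linalg.solve(S_pp, observed - mu_past)
    cond_mean = mu_new + S_np @ w
    cond_var = S_nn - S_np @ np.linalg.solve(S_pp, S_np)
    return cond_mean, cond_var

# Toy usage with made-up numbers (two past snippets, one new snippet):
mu = np.array([400.0, 600.0, 500.0])
Sigma = np.array([[100.0,  60.0,  80.0],
                  [ 60.0, 120.0,  90.0],
                  [ 80.0,  90.0, 150.0]])
mean, var = condition_new_snippet(mu, Sigma, np.array([420.0, 590.0]))
print(mean, var)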
Inference: model-based answer and error
§ Model-based answer: θ̂_{n+1} = μ_{n+1}
§ Model-based error: β̂_{n+1} = σ_{n+1}
§ Improved answer and error
  § (θ̈_{n+1}, β̈_{n+1}) = (θ̂_{n+1}, β̂_{n+1}) if validation succeeds
  § (θ̈_{n+1}, β̈_{n+1}) = (θ_{n+1}, β_{n+1}) if validation fails (fall back to the AQP answer and error)
Inference: means, variances, and covariances
§ Means (of θ)
  § The arithmetic mean of the past query answers is used as the mean of each random variable θ_1, …, θ_n, θ_{n+1}
§ Variances and covariances ( Σ )
  § The covariance between two query snippet answers is computable from the covariances between the attribute values involved in computing those answers
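As an illustration of that last point (my own notation, assuming both snippets are AVG aggregates over tuple sets T_i and T_j), the covariance between two snippet answers reduces to covariances between the underlying attribute values by bilinearity:

% Covariance between two AVG-snippet answers, expressed via inter-tuple
% covariances of the aggregated attribute (illustrative notation, not the paper's).
\operatorname{Cov}(\theta_i, \theta_j)
  = \operatorname{Cov}\!\Big( \tfrac{1}{|T_i|}\sum_{s \in T_i} a_s,\;
                              \tfrac{1}{|T_j|}\sum_{t \in T_j} a_t \Big)
  = \frac{1}{|T_i|\,|T_j|} \sum_{s \in T_i} \sum_{t \in T_j} \operatorname{Cov}(a_s, a_t).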
Inference: means, variances, and covariances

# of Day    Income (CAD)
1           100
2           200
3           400
4           800
5           1600

old avg and new avg are correlated
Inference: means, variances, and covariances
Inter-tuple covariances

# of Day    Income (CAD)    Income (CAD)
1           100             100
2           200             200
3           400             400
4           800             800
5           1600            1600
Inference: means, variances, and covariances
§ Estimating the inter-tuple covariances
  § Analytical covariance functions
  § Squared exponential covariance functions: capable of approximating any continuous target function arbitrarily closely as the number of observations (here, query answers) increases
  § Allow computing the variances and covariances ( Σ ) efficiently
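As a sketch of a squared-exponential covariance function (the scale and length-scale parameters below are illustrative placeholders, not values fitted by Verdict), inter-tuple covariances over the day attribute could be modeled as:

import numpy as np

# Hypothetical sketch of a squared-exponential (SE) covariance function over
# a tuple attribute (here, the day number).  sigma2 and length_scale are
# placeholders, not Verdict's fitted parameters.
def se_cov(x1, x2, sigma2=1.0, length_scale=2.0):
    return sigma2 * np.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))

days = np.arange(1, 6)
# Covariance matrix over the five tuples: nearby days are more strongly correlated.
K = se_cov(days[:, None], days[None, :])
print(np.round(K, 3))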
Experiments
§ Up to 23× speedup for the same accuracy level
§ Small memory and computational overhead
Summary
§ An idea: Database Learning
  § Learning from past query answers
§ An implementation: Verdict
  § Given means, variances, and covariances
  § Apply the maximum entropy principle
  § Find a joint probability distribution function
  § Improve the answer and error by conditioning on past snippet answers
  § https://verdictdb.org
Q & A
§ Using testable information other than, or in addition to, means, variances, and covariances?
§ Are there other possible inference techniques?
§ Can we cut out the training phase?