
Towards a benefit-based optimizer for Interactive Data Analysis (vision paper)
Patrick Marcel, Nicolas Labroche, Panos Vassiliadis
3/27/2019

Outline
- Challenge
- Vision
- How to
- Perspectives

Ten-year challenge…
- Ten years ago
  - SQL, MDX queries
  - Tuples as answers
  - TPC-H, SSB
  - Primary metric: QphH@Size
  - Cost-based optimizer (CBO)
- Now
  - SQL, MDX queries
  - Tuples as answers
  - TPC-H, SSB, TPC-DS
  - Primary metric: QphH@Size
  - Cost-based optimizer (CBO)

Ten years from now (the vision)
- Query: an intention in a high-level declarative language
  - Analyze this, explain that …
- Answer: a data story
  - Set of dashboards with highlights & narratives
- Primary metric: the number of insights
  - Human-digestible pieces of interesting information about the data
- Optimizer: concerned with sequences of analytical steps
  - Select the plan leading to the best insights

Intentions
- Intentions are non-prescriptive
- Example
  - Verify that the distribution of sales for mfgr#5 in Argentina from 2011 to 2016 holds in general,
  - build a clustering model for it,
  - compare with sibling countries,
  - explain the highest country-wise difference
- The optimizer decides
  - the roll-up(s) for the verification,
  - the algorithm and number of clusters,
  - the way to explain the difference,
  - etc.
- Each of these degrees of freedom gives rise to a new plan, yielding an answer different from those of the other plans

Insights
- Insights are diverse
  - They vary in complexity and value, they are domain-dependent, etc.
- Insights should be tested for validity
  - E.g., to avoid Simpson's paradox [Zhao&al, SIGMOD 2017]
- Insights are among us (existing notions in the literature)
  - Subjective insights
    - Unexpected values in cubes [Sarawagi, VLDB 2000]
    - Interesting patterns in data [Geng&Hamilton, ACM Comp. Sur. 2006]
    - Surprising patterns in data [De Bie, IDA 2013]
  - Objective insights
    - Statistically significant relationships in datasets [Chirigati&al, SIGMOD 2016]
    - Hidden cause [Sarawagi, VLDB 1999]
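To make the validity check on the Insights slide concrete, here is a minimal Python sketch, assuming an insight of the form "group A has a higher rate than group B" observed at an aggregated (rolled-up) level. It flags a Simpson's-paradox reversal when every finer-grained subgroup shows the opposite direction. The function names and the (group, subgroup, successes, trials) row layout are hypothetical, and this is not the technique of [Zhao&al, SIGMOD 2017]; the example data are the classic kidney-stone treatment figures often used to illustrate the paradox.

```python
from collections import defaultdict


def rate(successes, trials):
    """Success rate, guarding against empty cells."""
    return successes / trials if trials else 0.0


def simpson_reversal(rows):
    """Detect a Simpson's-paradox reversal between exactly two groups.

    rows: iterable of (group, subgroup, successes, trials) tuples,
    one row per (group, subgroup) pair.
    Returns True when the aggregated comparison points in the opposite
    direction from every subgroup-level comparison.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [successes, trials]
    per_subgroup = defaultdict(dict)      # subgroup -> {group: rate}
    for group, subgroup, successes, trials in rows:
        totals[group][0] += successes
        totals[group][1] += trials
        per_subgroup[subgroup][group] = rate(successes, trials)

    if len(totals) != 2:
        raise ValueError("expected exactly two groups")
    a, b = sorted(totals)

    # Direction of the insight at the aggregated (rolled-up) level
    aggregate_diff = rate(*totals[a]) - rate(*totals[b])
    # Direction of the same comparison inside each subgroup where both groups appear
    subgroup_diffs = [r[a] - r[b] for r in per_subgroup.values() if a in r and b in r]

    if aggregate_diff == 0 or not subgroup_diffs:
        return False
    return all(d * aggregate_diff < 0 for d in subgroup_diffs)


# Kidney-stone figures: A wins in each stone-size subgroup, B wins overall
rows = [
    ("A", "small", 81, 87), ("A", "large", 192, 263),
    ("B", "small", 234, 270), ("B", "large", 55, 80),
]
print(simpson_reversal(rows))  # True
```

In the slide's OLAP setting, the groups could be two manufacturers and the subgroups the countries obtained by drilling down; an optimizer could run such a check before promoting a rolled-up comparison to an insight.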

Cost model
- Traditional optimizers are concerned with resource consumption
  - Still needed for "local" optimizations
- An IDA optimizer is concerned with what the user gains from the exploration
  - It is more of a "benefit" model
- Benefit objective function defined (and learned?) from
  - the number of insights,
  - the time it takes to obtain them,
  - some properties of insights or sets of insights:
    - their statistical significance,
    - their relevance for the user,
    - their understandability, diversity, etc.,
  - the appropriateness of the insight to the current intention, etc.
- Traditional optimization schemes are still needed
  - Statistics collection, plan recycling, query re-optimization, etc.

How to generate actions from intentions?
- Generating queries over data sources
  - Partly specified by the intention, generated from incomplete specifications [Simitsis&al, VLDBJ 2008], [Vassiliadis&Marcel, DOLAP 2018]
- Generating ML actions over retrieved sources
  - Meta-learning [Lemke&al, AIR 2015]
    - How to predict a set of algorithms suitable for a specific problem under study, based on the relationship between data characteristics and algorithm performance
  - Auto-learning (AutoML) [Feurer&al, NIPS 2015]
    - How to choose and parametrize an ML algorithm for a given dataset, at a given cost
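The benefit objective function on the Cost model slide is left open, and possibly learned. As a minimal Python sketch, assuming each insight carries a p-value, scores in [0, 1] for relevance, understandability and fit to the intention, and a category label, the listed ingredients could be combined linearly. The `Insight` fields, the weights, the significance level `alpha`, and the linear time penalty are all hypothetical choices, not the authors' definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Insight:
    p_value: float            # statistical significance (smaller = stronger evidence)
    relevance: float          # relevance for the user, in [0, 1]
    understandability: float  # in [0, 1]
    fits_intention: float     # appropriateness to the current intention, in [0, 1]
    kind: str                 # hypothetical insight category, used for diversity


def benefit(insights: Sequence[Insight], seconds: float, alpha: float = 0.05,
            w_quality: float = 1.0, w_diversity: float = 0.5,
            time_penalty: float = 0.01) -> float:
    """Hypothetical benefit score for one exploration: higher is better."""
    # Keep only insights that pass the significance test at level alpha
    valid = [i for i in insights if i.p_value < alpha]
    # Each valid insight adds its average quality, so the score grows with their number
    quality = sum((i.relevance + i.understandability + i.fits_intention) / 3
                  for i in valid)
    # Diversity: share of distinct insight kinds among the valid insights
    diversity = len({i.kind for i in valid}) / len(valid) if valid else 0.0
    # Time spent obtaining the insights lowers the benefit linearly
    return w_quality * quality + w_diversity * diversity - time_penalty * seconds
```

A learned variant, as the slide's "(and learned?)" suggests, would replace the fixed weights with values fitted from user feedback rather than set by hand.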

How to generate the actual plan?
- Generate plan nodes (data sources and actions) from the user intention and the current dashboards
- Project nodes in a feature space defined by
  - Data source characteristics
    - As done in meta-learning systems: statistical, information-theoretic and landmarking-based meta-features
  - Action (query, ML algorithm) characteristics
    - Complexity, parameters, etc.
- Produce bundles of data sources + actions
  - Using, e.g., fuzzy clustering with constraints [Alsayasneh&al, TKDE 2018]
- Prune irrelevant bundles
  - Using, e.g., hard constraints on time and number of insights
- Score the remaining bundles with the objective function
- Pick the best one as the plan

Perspectives
- Categorization of insights
- Objective functions
- Mechanisms for statistics collection and user feedback
- Feature space
- Pruning strategy
- …
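The last steps of plan generation (prune irrelevant bundles with hard constraints, score the rest, pick the best) can be sketched in Python as below. For simplicity, this sketch swaps the feature-space projection and fuzzy clustering for a plain enumeration of the degrees of freedom from the Intentions example. The option values, the `Plan` fields, and the `estimate` function, which stands in for statistics-based estimates of time, insights and benefit, are all hypothetical.

```python
from itertools import product
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    rollup: str        # roll-up used to verify the sales distribution
    algorithm: str     # clustering algorithm
    k: int             # number of clusters
    explanation: str   # way to explain the highest country-wise difference


# Hypothetical degrees of freedom for the mfgr#5 / Argentina intention
ROLLUPS = ["region", "continent", "all countries"]
ALGORITHMS = ["k-means", "hierarchical"]
CLUSTER_COUNTS = [2, 3, 4]
EXPLANATIONS = ["top drill-down", "contrast set"]


def candidate_plans():
    """One plan per combination of the degrees of freedom."""
    return [Plan(*c) for c in product(ROLLUPS, ALGORITHMS, CLUSTER_COUNTS, EXPLANATIONS)]


def estimate(plan):
    """Hypothetical estimator: (expected seconds, expected insights, expected benefit).
    A real optimizer would derive these from collected statistics or a learned model."""
    seconds = {"region": 2.0, "continent": 1.0, "all countries": 0.5}[plan.rollup]
    if plan.algorithm == "hierarchical":
        seconds *= 3.0
    insights = plan.k + (2 if plan.explanation == "contrast set" else 1)
    return seconds, insights, insights - 0.5 * seconds


def choose_plan(max_seconds=4.0, min_insights=3) -> Optional[Plan]:
    scored = []
    for plan in candidate_plans():
        seconds, insights, score = estimate(plan)
        # Prune: hard constraints on time and on the number of insights
        if seconds <= max_seconds and insights >= min_insights:
            scored.append((score, plan))
    # Pick the best remaining plan according to the objective function
    return max(scored, key=lambda s: s[0])[1] if scored else None


print(choose_plan())
# Plan(rollup='all countries', algorithm='k-means', k=4, explanation='contrast set')
```

Exhaustive enumeration only works for a handful of options; with realistic action spaces, the bundling and pruning steps are what would keep the search tractable.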

Thank you! Questions?
The vision:
- … query via intentions …
- … to produce a data story …
- … optimized with respect to the best insights!
http://www.cs.uoi.gr/~pvassil/publications/2018_DOLAP/

References
[Alsayasneh&al, TKDE 2018] M. Alsayasneh, S. Amer-Yahia, É. Gaussier, V. Leroy, J. Pilourdault, R. M. Borromeo, M. Toyama, and J. Renders. Personalized and diverse task composition in crowdsourcing. IEEE Trans. Knowl. Data Eng., 30(1):128–141, 2018.
[Chirigati&al, SIGMOD 2016] F. Chirigati, H. Doraiswamy, T. Damoulas, and J. Freire. Data polygamy: The many-many relationships among urban spatio-temporal data sets. In SIGMOD, pages 1011–1025. ACM, 2016.
[De Bie, IDA 2013] T. De Bie. Subjective interestingness in exploratory data mining. In IDA, pages 19–31, 2013.
[Eichmann&al, IEEE DEB 2016] P. Eichmann, E. Zgraggen, Z. Zhao, C. Binnig, and T. Kraska. Towards a benchmark for interactive data exploration. IEEE Data Eng. Bull., 39(4):50–61, 2016.
[Feurer&al, NIPS 2015] M. Feurer, A. Klein, K. Eggensperger, J. T. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In NIPS, pages 2962–2970, 2015.
[Geng&Hamilton, ACM Comp. Sur. 2006] L. Geng and H. J. Hamilton. Interestingness measures for data mining: A survey. ACM Comput. Surv., 38(3):9, 2006.
[Lemke&al, AIR 2015] C. Lemke, M. Budka, and B. Gabrys. Metalearning: a survey of trends and technologies. Artif. Intell. Rev., 44(1):117–130, 2015.
[Milo&Somech, KDD 2018] T. Milo and A. Somech. Next-step suggestions for modern interactive data analysis platforms. In KDD, pages 576–585, 2018.
[Sarawagi, VLDB 2000] S. Sarawagi. User-adaptive exploration of multidimensional data. In Proceedings of VLDB, pages 307–316, 2000.
[Sarawagi, VLDB 1999] S. Sarawagi. Explaining differences in multidimensional aggregates. In Proceedings of VLDB, pages 42–53, 1999.
[Simitsis&al, VLDBJ 2008] A. Simitsis, G. Koutrika, and Y. E. Ioannidis. Précis: from unstructured keywords as queries to structured databases as answers. VLDB J., 17(1):117–149, 2008.
[Vassiliadis&Marcel, DOLAP 2018] P. Vassiliadis and P. Marcel. The road to highlights is paved with good intentions: Envisioning a paradigm shift in OLAP modeling. In DOLAP, 2018.
[Zhao&al, SIGMOD 2017] Z. Zhao, L. De Stefani, E. Zgraggen, C. Binnig, E. Upfal, and T. Kraska. Controlling false discoveries during interactive data exploration. In SIGMOD, pages 527–540, 2017.
