DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY - PowerPoint PPT Presentation

CREATE TABLE A ( id INT PRIMARY KEY , MORE EXAMPLES val INT NOT NULL ); • Ignoring Projections SELECT * FROM A; • Merging Predicates SELECT * FROM A WHERE val BETWEEN 1 AND 100 OR val BETWEEN 50 AND 150; Source: Lukas Eder GT 8803 // Fall 2019 45

CREATE TABLE A ( id INT PRIMARY KEY , MORE EXAMPLES val INT NOT NULL ); • Ignoring Projections SELECT * FROM A; • Merging Predicates SELECT * FROM A WHERE val BETWEEN 1 AND 150; Source: Lukas Eder GT 8803 // Fall 2019 46

RELATIONAL ALGEBRA EQUIVALENCES • Joins: – Commutative, associative R ⋈ S = S ⋈ R (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T) – How many different orderings are there for an n - way join? GT 8803 // Fall 2019 47

RELATIONAL ALGEBRA EQUIVALENCES • How many different orderings are there for an n -way join? • Catalan number ≈4 n – Exhaustive enumeration will be too slow. • We’ll see in a second how an optimizer limits the search space. GT 8803 // Fall 2019 48

PLAN COST ESTIMATION 49 GT 8803 // Fall 2018

COST ESTIMATION • How long will a query take? – CPU: Small cost; tough to estimate – Disk: # of block transfers – Memory: Amount of DRAM used • How many tuples will be read/written? • What statistics do we need to keep? GT 8803 // Fall 2019 50

STATISTICS • The DBMS stores internal statistics about tables, attributes, and indexes in its internal catalog. • Different systems update them at different times. • Manual invocations: – Postgres/SQLite: ANALYZE – SQL Server: UPDATE STATISTICS GT 8803 // Fall 2019 51

STATISTICS • For each relation R , the DBMS maintains the following information: – N R : Number of tuples in R . – V(A,R) : Number of distinct values for attribute A . GT 8803 // Fall 2019 52

DERIVABLE STATISTICS GT 8803 // Fall 2019 53

DERIVABLE STATISTICS • The selection cardinality SC(A,R) is the average number of records with a value for an attribute A given N R / V(A,R) • Note that this assumes data uniformity. – 10,000 students, 10 colleges – how many students in SCS? GT 8803 // Fall 2019 54

SELECTION STATISTICS GT 8803 // Fall 2019 55

SELECTION STATISTICS SELECT * FROM people WHERE id = 123 GT 8803 // Fall 2019 56

SELECTION STATISTICS • Equality predicates on unique keys are easy to estimate. SELECT * FROM people WHERE id = 123 • What about more complex predicates? What is their selectivity? SELECT * FROM people WHERE val > 1000 SELECT * FROM people WHERE age = 30 AND status = 'Lit' GT 8803 // Fall 2019 57

COMPLEX PREDICATES • The selectivity ( sel ) of a predicate P is the fraction of tuples that qualify. • Formula depends on type of predicate: – Equality – Range – Negation – Conjunction – Disjunction GT 8803 // Fall 2019 58

COMPLEX PREDICATES • The selectivity ( sel ) of a predicate P is the fraction of tuples that qualify. • Formula depends on type of predicate: – Equality – Range – Negation – Conjunction – Disjunction GT 8803 // Fall 2019 59

SELECTIONS – COMPLEX PREDICATES SELECT * FROM people WHERE age = 2 GT 8803 // Fall 2019 60

SELECTIONS – COMPLEX PREDICATES • Assume that V(age,people) has five distinct values (0–4) and N R = 5 SELECT * FROM people WHERE age = 2 • Equality Predicate: A=constant – sel(A=constant) = SC(P) / N R – Example: sel(age=2) = GT 8803 // Fall 2019 61

SELECTIONS – COMPLEX PREDICATES • Assume that V(age,people) has five distinct values (0–4) and N R = 5 SELECT * FROM people WHERE age = 2 • Equality Predicate: A=constant – sel(A=constant) = SC(P) / N R – Example: sel(age=2) = count 0 1 2 3 4 age GT 8803 // Fall 2019 62

SELECTIONS – COMPLEX PREDICATES • Assume that V(age,people) has five distinct values (0–4) and N R = 5 SELECT * FROM people WHERE age = 2 • Equality Predicate: A=constant – sel(A=constant) = SC(P) / N R – Example: sel(age=2) = count V(age,people)=5 0 1 2 3 4 age GT 8803 // Fall 2019 63

SELECTIONS – COMPLEX PREDICATES • Assume that V(age,people) has five distinct values (0–4) and N R = 5 SELECT * FROM people WHERE age = 2 • Equality Predicate: A=constant – sel(A=constant) = SC(P) / N R – Example: sel(age=2) = SC(age=2)=1 count V(age,people)=5 0 1 2 3 4 age GT 8803 // Fall 2019 64

SELECTIONS – COMPLEX PREDICATES • Assume that V(age,people) has five distinct values (0–4) and N R = 5 SELECT * FROM people WHERE age = 2 • Equality Predicate: A=constant – sel(A=constant) = SC(P) / N R – Example: sel(age=2) = 1/5 SC(age=2)=1 count V(age,people)=5 0 1 2 3 4 age GT 8803 // Fall 2019 65

SELECTIONS – COMPLEX PREDICATES • Range Query: – sel(A>=a) = (A max –a) / (A max –A min ) SELECT * FROM people – Example: sel(age >= 2) WHERE age >= 2 count 0 1 2 3 4 age GT 8803 // Fall 2019 66

SELECTIONS – COMPLEX PREDICATES • Range Query: – sel(A>=a) = (A max –a) / (A max –A min ) SELECT * FROM people – Example: sel(age >= 2) WHERE age >= 2 count 0 1 2 3 4 age GT 8803 // Fall 2019 67

SELECTIONS – COMPLEX PREDICATES • Range Query: – sel(A>=a) = (A max –a) / (A max –A min ) SELECT * FROM people – Example: sel(age >= 2) WHERE age >= 2 age min = 0 age max = 4 count 0 1 2 3 4 age GT 8803 // Fall 2019 68

SELECTIONS – COMPLEX PREDICATES • Range Query: – sel(A>=a) = (A max –a) / (A max –A min ) SELECT * FROM people – Example: sel(age >= 2) WHERE age >= 2 = (4–2) / (4–0) = 1/2 age min = 0 age max = 4 count 0 1 2 3 4 age GT 8803 // Fall 2019 69

SELECTIONS – COMPLEX PREDICATES SELECT * FROM people WHERE age != 2 count 0 1 2 3 4 age GT 8803 // Fall 2019 70

SELECTIONS – COMPLEX PREDICATES SELECT * FROM people WHERE age != 2 SC(age=2)=1 count 0 1 2 3 4 age GT 8803 // Fall 2019 71

SELECTIONS – COMPLEX PREDICATES SELECT * FROM people WHERE age != 2 SC(age!=2)=2 SC(age!=2)=2 count 0 1 2 3 4 age GT 8803 // Fall 2019 72

SELECTIONS – COMPLEX PREDICATES SELECT * FROM people = 1 – (1/5) = 4/5 WHERE age != 2 SC(age!=2)=2 SC(age!=2)=2 count 0 1 2 3 4 age GT 8803 // Fall 2019 73

SELECTIONS – COMPLEX PREDICATES • Negation Query: – sel(not P) = 1 – sel(P) SELECT * FROM people – Example: sel(age != 2) = 1 – (1/5) = 4/5 WHERE age != 2 • Observation: Selectivity ≈ Probability SC(age!=2)=2 SC(age!=2)=2 count 0 1 2 3 4 age GT 8803 // Fall 2019 74

SELECTIONS – COMPLEX PREDICATES • Conjunction: – sel(P1 ⋀ P2) = sel(P1) · sel(P2) SELECT * FROM people – sel(age=2 ⋀ name LIKE 'A%') WHERE age = 2 AND name LIKE 'A%' • This assumes that the predicates are independent. P 1 P 2 GT 8803 // Fall 2019 75

SELECTIONS – COMPLEX PREDICATES • Disjunction: – sel(P1 ⋁ P2) SELECT * FROM people = sel(P1)+ sel(P2)–sel(P1 ⋁ P2) WHERE age = 2 = sel(P1)+ sel(P2)–sel(P1)· sel(P2) OR name LIKE 'A%' – sel(age=2 OR name LIKE 'A%') • This again assumes that the P 1 P 2 selectivities are independent. GT 8803 // Fall 2019 78

SELECTIONS – COMPLEX PREDICATES • Disjunction: – sel(P1 ⋁ P2) SELECT * FROM people = sel(P1)+ sel(P2)–sel(P1 ⋁ P2) WHERE age = 2 = sel(P1)+ sel(P2)–sel(P1)· sel(P2) OR name LIKE 'A%' – sel(age=2 OR name LIKE 'A%') • This again assumes that the P 1 P 2 selectivities are independent. GT 8803 // Fall 2019 79

RESULT SIZE ESTIMATION FOR JOINS • Given a join of R and S , what is the range of possible result sizes in # of tuples? • In other words, for a given tuple of R , how many tuples of S will it match? GT 8803 // Fall 2019 80

RESULT SIZE ESTIMATION FOR JOINS • General case: R cols ⋂ S cols ={A} where A is not a key for either table. – Match each R -tuple with S -tuples: estSize ≈ N R · N S / V(A,S) – Symmetrically, for S : estSize ≈ N R · N S / V(A,R) • Overall: – estSize ≈ N R · N S / max({V(A,S), V(A,R)}) GT 8803 // Fall 2019 81

COST ESTIMATIONS • Our formulas are nice but we assume that data values are uniformly distributed. Uniform Approximation 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 GT 8803 // Fall 2019 82

COST ESTIMATIONS • Our formulas are nice but we assume that data values are uniformly distributed. Uniform Approximation 10 # of occurrences 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Distinct values of attribute GT 8803 // Fall 2019 83

COST ESTIMATIONS • Our formulas are nice but we assume that data values are uniformly distributed. Non-Uniform Approximation 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 GT 8803 // Fall 2019 84

COST ESTIMATIONS • Our formulas are nice but we assume that data values are uniformly distributed. Non-Uniform Approximation 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Bucket #1 Bucket #2 Bucket #3 Bucket #4 Bucket #5 Count=8 Count=4 Count=15 Count=3 Count=14 GT 8803 // Fall 2019 85

COST ESTIMATIONS • Our formulas are nice but we assume that data values are uniformly distributed. Non-Uniform Approximation 15 10 5 Bucket Ranges 0 1-3 4-6 7-9 10-12 13-15 Bucket #1 Bucket #2 Bucket #3 Bucket #4 Bucket #5 Count=8 Count=4 Count=15 Count=3 Count=14 GT 8803 // Fall 2019 86

HISTOGRAMS WITH QUANTILES • A histogram type wherein the "spread" of each bucket is same. Equi-width Histogram (Quantiles) 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 GT 8803 // Fall 2019 87

HISTOGRAMS WITH QUANTILES • A histogram type wherein the "spread" of each bucket is same. Equi-width Histogram (Quantiles) 10 8 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Bucket #1 Bucket #2 Bucket #3 Bucket #4 Count=12 Count=12 Count=9 Count=12 GT 8803 // Fall 2019 88

HISTOGRAMS WITH QUANTILES • A histogram type wherein the "spread" of each bucket is same. Equi-width Histogram (Quantiles) 15 10 5 0 1-5 6-8 9-13 14-15 GT 8803 // Fall 2019 89

HISTOGRAMS WITH QUANTILES • A histogram type wherein the "spread" of each bucket is same. Equi-width Histogram (Quantiles) 15 10 5 0 1-5 6-8 9-13 14-15 GT 8803 // Fall 2019 90

SAMPLING • Modern DBMSs also collect samples from SELECT AVG (age) FROM people tables to estimate selectivities. WHERE age > 50 • Update samples when the underlying tables id name age status 1001 Obama 56 Rested changes significantly. 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit ⋮ 1 billion tuples GT 8803 // Fall 2019 91

SAMPLING • Modern DBMSs also collect samples from SELECT AVG (age) FROM people tables to estimate selectivities. WHERE age > 50 • Update samples when the underlying tables id name age status 1001 Obama 56 Rested changes significantly. 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit ⋮ 1 billion tuples GT 8803 // Fall 2019 92

SAMPLING • Modern DBMSs also collect samples from SELECT AVG (age) FROM people tables to estimate selectivities. WHERE age > 50 • Update samples when the underlying tables id name age status 1001 Obama 56 Rested changes significantly. 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk Table Sample 1005 Andy 37 Lit ⋮ 1001 Obama 56 Rested 1003 Tupac 25 Dead 1 billion tuples 1005 Andy 37 Lit GT 8803 // Fall 2019 93

SAMPLING • Modern DBMSs also collect samples from SELECT AVG (age) FROM people tables to estimate selectivities. WHERE age > 50 • Update samples when the underlying tables id name age status 1001 Obama 56 Rested changes significantly. 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk Table Sample 1005 Andy 37 Lit ⋮ 1001 Obama 56 Rested sel(age>50) = 1003 Tupac 25 Dead 1 billion tuples 1005 Andy 37 Lit GT 8803 // Fall 2019 94

SAMPLING • Modern DBMSs also collect samples from SELECT AVG (age) FROM people tables to estimate selectivities. WHERE age > 50 • Update samples when the underlying tables id name age status 1001 Obama 56 Rested changes significantly. 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk Table Sample 1005 Andy 37 Lit ⋮ 1001 Obama 56 Rested sel(age>50) = 1003 Tupac 25 Dead 1 billion tuples 1005 Andy 37 Lit GT 8803 // Fall 2019 95

SAMPLING • Modern DBMSs also collect samples from SELECT AVG (age) FROM people tables to estimate selectivities. WHERE age > 50 • Update samples when the underlying tables id name age status 1001 Obama 56 Rested changes significantly. 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk Table Sample 1005 Andy 37 Lit ⋮ 1001 Obama 56 Rested sel(age>50) = 1/3 1003 Tupac 25 Dead 1 billion tuples 1005 Andy 37 Lit GT 8803 // Fall 2019 96

OBSERVATION • Now that we can (roughly) estimate the selectivity of predicates, what can we actually do with them? GT 8803 // Fall 2019 97

PLAN ENUMERATION 98 GT 8803 // Fall 2018

QUERY OPTIMIZATION • After performing rule-based rewriting, the DBMS will enumerate different plans for the query and estimate their costs. – Single table. – Multiple tables. • It chooses the best plan it has seen for the query after exhausting all plans or some timeout. GT 8803 // Fall 2019 99

SINGLE-TABLE QUERY PLANNING • Pick the best access method. – Sequential Scan – Binary Search (clustered indexes) – Index Scan • Simple heuristics are often good enough for this. • OLTP queries are especially easy. GT 8803 // Fall 2019 100

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY - PowerPoint PPT Presentation

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ L E C T U R E # 0 9 : Q U E R Y O P T I M I Z A T I O N administrivia Reminders Assignment 1: postponed to next Monday Sign up for discussion slots on Thursday

Analytics and Data Summit 2020 Analytics and Data Summit 2020 Analytics and Data Summit 2020

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 6: Deep Learning Systems 1 Outline

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

Undergraduate Business Analytics Minor Spreadsheet Analytics BANA-2081 Business Analytics

Deep Data Analytics for Pricing: Uses, Issues, and Solutions Walter R. Paczkowski, Ph.D. Data

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2018 // VENKATA KISHORE PATCHA Lecture#16 :

Architecture 3.0 Landscape Analytics Jrgen Dllner Hasso-Plattner-Institut Jrgen

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Google Analytics Overview Whats Google Analytics? The Google Analytics

Document Name Solar Analytics - Rooftop PV energy analytics PREPARED BY: Your Name, Your Title

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Data Mining & Analytics Data Mining Reference Model Data Warehouse Legal and Ethical Issues

Presentation about Deep Learning --- Zhongwu xie Contents 1.Brief introduction of Deep learning.

Deep Learning on GPUs March 2016 What is Deep Learning? GPUs and DL AGENDA DL in practice

Ron Minnich Ryan OLeary Gan Shun Lim Prachi Laud Chris Koch Ian Goegebuer With thanks to:

CS445 / SE463 / ECE 451 / CS645 So,ware requirements

Semi-Analytic Approach to Light Simulation Diego Garcia-Gamez & Andrzej Szelc The University

The Innermost Jet of 3C 84 The Innermost Jet of 3C 84 Sascha Trippe Sascha

Optimizing Join Enumeration in Transformation-based Query

Timing Performance of Silicon and Diamond Tracking Systems Nicolo Cartiglia, INFN, Torino -

Radiative Transfer & Volume Path Tracing CS295, Spring 2017 Shuang Zhao Computer Science

Inference Rules for Recognizing Textual Entailment Georgiana Dinu and Rui Wang Computational

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY - PowerPoint PPT Presentation

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ L E C T U R E # 0 9 : Q U E R Y O P T I M I Z A T I O N administrivia Reminders Assignment 1: postponed to next Monday Sign up for discussion slots on Thursday

Analytics and Data Summit 2020 Analytics and Data Summit 2020 Analytics and Data Summit 2020

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 6: Deep Learning Systems 1 Outline

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

Undergraduate Business Analytics Minor Spreadsheet Analytics BANA-2081 Business Analytics

Deep Data Analytics for Pricing: Uses, Issues, and Solutions Walter R. Paczkowski, Ph.D. Data

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2018 // VENKATA KISHORE PATCHA Lecture#16 :

Architecture 3.0 Landscape Analytics Jrgen Dllner Hasso-Plattner-Institut Jrgen

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Google Analytics Overview Whats Google Analytics? The Google Analytics

Document Name Solar Analytics - Rooftop PV energy analytics PREPARED BY: Your Name, Your Title

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Data Mining &amp; Analytics Data Mining Reference Model Data Warehouse Legal and Ethical Issues

Presentation about Deep Learning --- Zhongwu xie Contents 1.Brief introduction of Deep learning.

Deep Learning on GPUs March 2016 What is Deep Learning? GPUs and DL AGENDA DL in practice

Ron Minnich Ryan OLeary Gan Shun Lim Prachi Laud Chris Koch Ian Goegebuer With thanks to:

CS445 / SE463 / ECE 451 / CS645 So,ware requirements

Semi-Analytic Approach to Light Simulation Diego Garcia-Gamez &amp; Andrzej Szelc The University

The Innermost Jet of 3C 84 The Innermost Jet of 3C 84 Sascha Trippe Sascha

Optimizing Join Enumeration in Transformation-based Query

Timing Performance of Silicon and Diamond Tracking Systems Nicolo Cartiglia, INFN, Torino -

Radiative Transfer &amp; Volume Path Tracing CS295, Spring 2017 Shuang Zhao Computer Science

Inference Rules for Recognizing Textual Entailment Georgiana Dinu and Rui Wang Computational

Data Mining & Analytics Data Mining Reference Model Data Warehouse Legal and Ethical Issues

Semi-Analytic Approach to Light Simulation Diego Garcia-Gamez & Andrzej Szelc The University

Radiative Transfer & Volume Path Tracing CS295, Spring 2017 Shuang Zhao Computer Science