
SIGMOD 2020 tutorial
Optimal Join Algorithms meet Top-k
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Northeastern University, Boston
Part 1: Top-k
[Figure; axes: Ranked results, Time]
Slides: https://northeastern-datalab.github.io/topk-join-tutorial/
DOI: https://doi.org/10.1145/3318464.3383132
Data Lab: https://db.khoury.northeastern.edu
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details.

Why "Optimal Join Algorithms meet Top-k"?
• Optimal Join algorithms: Return all results over joins
  ⇒ How to avoid large intermediate results?
• Top-k: Given $k$, return $k$ "best" results
  ⇒ How to avoid working on any lower ranked results?
• Ranked Enumeration (Any-k): Incrementally return the $k$ "best" results over joins (for any $k = 1, 2, \ldots$)
  ⇒ How to most effectively push sorting through joins?

Top-k vs. Optimal Join Algorithms, with Any-k between the two
• Top-k:
  - middleware cost model (# accesses)
  - ranking function
  - small result size; wish: $O(k)$
  - most important results first
  - return only $k$-best results
• Optimal Join Algorithms:
  - RAM cost model
  - return all results; wish: $O(r)$, $r > n$
  - conjunctive queries
  - query decompositions
  - all results are equally important
  - minimize intermediate results
• Any-k:
  - incremental computation

Outline tutorial
• Part 1: Top-k (Wolfgang): ~20min
  – Top-k selection problem
  – Threshold algorithm [Fagin+ '03]
  – Top-k join problem
  – J* algorithm [Natsev+ '01]
  – Discussion on cost models
• Part 2: Optimal Join Algorithms (Mirek): ~30min
• Part 3: Ranked enumeration over joins (Nikolaos): ~40min

Top-k Selection Query: overall setup
• $n$ objects $X_1, X_2, \ldots, X_n$ with $\ell$ numeric weight attributes $w_1, w_2, \ldots, w_\ell$
• weight of object = aggregate function over its weights: $\rho(w_1, w_2, \ldots, w_\ell) = \rho(X)$
• Goal: Find top-k objects according to some order (e.g. min). In most original papers the order is assumed to be max!

Example aggregate function: $\rho = \mathrm{sum}\{w_1, w_2, w_3\}$, with $n = 5$, $\ell = 3$, $k = 2$

id    w_1  w_2  w_3  sum
X_1    3    4    3    10
X_2    4    2    4    10
X_3    6    8    1    15
X_4    7    6    6    19
X_5    8    7    5    20

Top-k: a set $T$ of $k$ objects s.t. $\rho(X_i) \le \rho(X_j)$ for every $X_i \in T$ and every $X_j \notin T$
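The definition above can be tried out directly on the example table. The following is a minimal sketch, assuming the min order and sum as the aggregate; the function name `top_k` and the ids `"X1"` to `"X5"` are illustrative choices, not part of the tutorial.

```python
import heapq

# Example table from the slide: object id -> (w_1, w_2, w_3).
weights = {
    "X1": (3, 4, 3),
    "X2": (4, 2, 4),
    "X3": (6, 8, 1),
    "X4": (7, 6, 6),
    "X5": (8, 7, 5),
}

def top_k(weights, k, agg=sum):
    """Return k object ids with the smallest aggregate weight (min order)."""
    return heapq.nsmallest(k, weights, key=lambda obj: agg(weights[obj]))

result = top_k(weights, k=2)
print(result)
```

This version reads every weight of every object, so it only illustrates what the answer is, not how to find it with few accesses.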

Top-k Selection Query: information in different relations
• Weights are stored in $\ell$ distinct relations $R_i$

R_1           R_2           R_3
id    w_1     id    w_2     id    w_3
X_1    3      X_1    4      X_1    3
X_2    4      X_2    2      X_2    4
X_3    6      X_3    8      X_3    1
X_4    7      X_4    6      X_4    6
X_5    8      X_5    7      X_5    5

Top-k Selection Query: sorted access
• Weights are stored in $\ell$ distinct relations $R_i$
  - each $R_i$ is sorted by attribute $w_i$

Notice we sort in increasing order:

R_1           R_2           R_3
id    w_1     id    w_2     id    w_3
X_1    3      X_2    2      X_3    1
X_2    4      X_1    4      X_1    3
X_3    6      X_4    6      X_2    4
X_4    7      X_5    7      X_5    5
X_5    8      X_3    8      X_4    6

Top-k Selection Query: "middleware" assumption
Assumption 1: Middleware cost model: we aggregate rankings of other services.
• Weights are stored in $\ell$ distinct relations $R_i$
  - each $R_i$ is sorted by attribute $w_i$
• we only pay for accesses to attribute lists
• Goal: Find top-k with minimal access cost
• 2 types of access: sequential / random
  - get next object in $R_i$ sequentially: "sorted" sequential access, cost $c_{seq}$
  - obtain the weight for a specific object in $R_i$: random access (index lookup), cost $c_{rand}$
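The two access types and their costs can be modelled with a small wrapper around one attribute list. This is only a sketch under stated assumptions: the class `AttributeList`, the helper `access_cost`, and the default cost values 1.0 and 10.0 are hypothetical and do not come from the tutorial.

```python
class AttributeList:
    """One relation R_i: (object id, weight) pairs sorted by increasing weight."""

    def __init__(self, pairs):
        self.rows = sorted(pairs, key=lambda p: p[1])
        self.index = dict(pairs)  # id -> weight, used for random access
        self.cursor = 0
        self.n_seq = 0   # number of sorted accesses so far
        self.n_rand = 0  # number of random accesses so far

    def sorted_access(self):
        """Return the next (id, weight) in sorted order, or None when exhausted."""
        if self.cursor >= len(self.rows):
            return None
        row = self.rows[self.cursor]
        self.cursor += 1
        self.n_seq += 1
        return row

    def random_access(self, obj):
        """Look up the weight of a specific object (index lookup)."""
        self.n_rand += 1
        return self.index[obj]

def access_cost(lists, c_seq=1.0, c_rand=10.0):
    """Total middleware cost; the default unit costs are made-up values."""
    return sum(l.n_seq * c_seq + l.n_rand * c_rand for l in lists)

# R_2 from the running example.
R2 = AttributeList([("X1", 4), ("X2", 2), ("X3", 8), ("X4", 6), ("X5", 7)])
first = R2.sorted_access()
w_x3 = R2.random_access("X3")
cost = access_cost([R2])
```

In this model nothing else is charged: sorting, hashing, and aggregation inside the middleware are free, and only the counters `n_seq` and `n_rand` contribute to the cost.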

Top-k Selection Query as a Join Problem
The top-k selection query over the relations $R_1, R_2, R_3$ can be written as a join:

    select R1.id, sum(w1,w2,w3) as weight
    from R1, R2, R3
    where R1.id=R2.id
    and R2.id=R3.id
    order by weight
    limit 2

Joins on unique object id: 1–1 relationships
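The join formulation can be run on the example data with an in-memory SQLite database. This is a sketch under a few assumptions: the table and column names mirror the slide, the aggregate is written as `w1 + w2 + w3` because SQL's `sum` aggregate takes a single argument, and the extra `R1.id` sort key is added here only to make the tie between the two best objects deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One relation per weight attribute, as in the running example.
data = {
    "R1": ("w1", [("X1", 3), ("X2", 4), ("X3", 6), ("X4", 7), ("X5", 8)]),
    "R2": ("w2", [("X1", 4), ("X2", 2), ("X3", 8), ("X4", 6), ("X5", 7)]),
    "R3": ("w3", [("X1", 3), ("X2", 4), ("X3", 1), ("X4", 6), ("X5", 5)]),
}
for table, (col, rows) in data.items():
    conn.execute(f"CREATE TABLE {table} (id TEXT PRIMARY KEY, {col} INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

query = """
SELECT R1.id, w1 + w2 + w3 AS weight
FROM R1, R2, R3
WHERE R1.id = R2.id AND R2.id = R3.id
ORDER BY weight, R1.id
LIMIT 2
"""
top2 = conn.execute(query).fetchall()
print(top2)
```

Because `id` is unique in each relation, every object contributes exactly one join result, which is the 1–1 relationship noted on the slide.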

Naive algorithm: retrieve all items
Naive algorithm: retrieve all items, sort, return top-k
Cost = $n \cdot \ell \cdot c$, with $c$ the cost of a single access
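The naive algorithm can be sketched as follows. The function name `naive_top_k` is illustrative, and the sketch assumes sum as the aggregate and ties broken by object id; it counts accesses rather than assigning them a particular unit cost.

```python
def naive_top_k(lists, k, agg=sum):
    """Read every entry of every list, aggregate per object, sort, cut at k.

    lists: one list of (id, weight) pairs per relation.
    Returns (top-k ids in min order, number of list accesses).
    """
    seen = {}
    accesses = 0
    for rel in lists:
        for obj, w in rel:  # one access per stored entry
            seen.setdefault(obj, []).append(w)
            accesses += 1
    ranked = sorted(seen, key=lambda obj: (agg(seen[obj]), obj))
    return ranked[:k], accesses

# Sorted lists from the running example.
R1 = [("X1", 3), ("X2", 4), ("X3", 6), ("X4", 7), ("X5", 8)]
R2 = [("X2", 2), ("X1", 4), ("X4", 6), ("X5", 7), ("X3", 8)]
R3 = [("X3", 1), ("X1", 3), ("X2", 4), ("X5", 5), ("X4", 6)]

top, accesses = naive_top_k([R1, R2, R3], k=2)
print(top, accesses)
```

With $n = 5$ and $\ell = 3$ the access count is $n \cdot \ell = 15$ regardless of $k$, which is the cost that smarter top-k algorithms try to undercut.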

Assumption 2: monotonicity of $\rho$
Assumption 2: The aggregate function $\rho$ is monotone:
  $\rho(w_1, w_2, \ldots, w_\ell) \le \rho(w_1', w_2', \ldots, w_\ell')$ if $w_i \le w_i'$ for all $i$
$\rho$ is decomposable: $\rho(w_1, w_2, w_3) = \rho\{w_1, w_2, w_3\}$
Part 3: the tropical semiring (min, sum) is an instance of a "selective dioid" (i.e. min(a, b) = a or b).
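Monotonicity is easy to check by brute force on a small domain. The helper `is_monotone` below is a hypothetical illustration, not part of the tutorial; it tests the condition only over a finite grid of integer weights, so a `True` result is evidence rather than a proof.

```python
from itertools import product

def is_monotone(agg, arity=3, domain=range(4)):
    """Check on a finite grid: w_i <= w'_i for all i implies agg(w) <= agg(w')."""
    vectors = list(product(domain, repeat=arity))
    for w in vectors:
        for w2 in vectors:
            dominated = all(a <= b for a, b in zip(w, w2))
            if dominated and agg(w) > agg(w2):
                return False  # found a counterexample
    return True

sum_ok = is_monotone(sum)
min_ok = is_monotone(min)
max_ok = is_monotone(max)
# A difference of weights is not monotone: raising w_2 lowers the result.
diff_ok = is_monotone(lambda w: w[0] - w[1])
print(sum_ok, min_ok, max_ok, diff_ok)
```

Sum, min, and max all pass the check, while the difference fails because increasing one input can decrease the aggregate.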
