Ranked Query Processing: Relevance a) Order-based Paradigm - PDF document

Ranking– Ordering according to the degree of some fuzzy notions: � Similarity (or dissimilarity) Ranked Query Processing: � Relevance a) Order-based Paradigm � Preference Q Kevin Chen-Chuan Chang ranking 2 Query models for order-based paradigm– When multiple dimensions are available-- On the better-than graph Assume the database stores the information of a set of flights � For each flight � � Better-than graph � Its price t2 � Its route (travel-time or distance traveled) t1 A user would retrieve all the “interesting” flights � � A flight is interesting if and only if there is no other cheaper and t4 t3 shorter (route) at the same time � Best-Matches-Only (BMO) query model � Retrieve maximal elements price y 1 0 � Thus also called maximal vector b e 9 a c � These maximal elements form the “skyline”! 8 7 d 6 g f 5 l h n � On better-than graph, how to process BMO? 4 3 2 i k m 1 x 3 4 o distance 1 2 3 4 5 6 7 8 9 1 0 1

The overall preference combines the Skyline Operation dimensions � P1 LOWEST(price) � Dominance: � a � b � i � c, h � g � d, m � f � n � k, e � l � A point dominates another point if it is no worse in all � P2 LOWEST(distance) dimensions, and better in at least one dimension � k � m, i � h, n � l � f � g � d � c � a � b, e � P :=({price,distance},<P1 ⊗ P2) Distance Price a 1 9 � Skyline: b 2 10 c 4 8 � BMO: Maximal elements of P? � A set of all points in the dataset that are not d 6 7 � Is a maximal? e 9 10 dominated by any other point in the dataset f 7 5 � Is b maximal? g 5 6 h 4 3 � Is c maximal? i 3 2 k 9 1 l 10 4 m 6 2 5 6 n 8 3 Why is it called “skyline”? What is skyline: An example (Also called: Pareto curve, Maximum Vector) � What do you see in the Chicago skyline? � Query: SELECT * FROM flights SKYLINE OF price MIN, distance MIN � What dominates what? � What points constitute the skyline? price y 1 0 b e 9 a c 8 7 d 6 g f 5 l h n 4 3 2 i k m 1 x 7 8 o distance 1 2 3 4 5 6 7 8 9 1 0 2

Skyline Algorithms: We will look at a few examples Block Nested Loop [Börzsönyi et al., 2001] � Block nested loop (BNL) � Conceptually: Nested loop joins— � Divide and Conquer � Joining the table with itself � Compare every pair of points to check dominance � Bitmap � NN Price Distance Price Distance a 1 9 a 1 9 b 2 10 b 2 10 c 4 8 c 4 8 d 6 7 d 6 7 e 9 10 e 9 10 f 7 5 f 7 5 g 5 6 g 5 6 h 4 3 h 4 3 i 3 2 i 3 2 k 9 1 k 9 1 l 10 4 l 10 4 m 6 2 m 6 2 9 n 8 3 n 8 3 10 Block Nested Loop -- Implementation Block Nested Look– Improvements How if the window overflow? � Multi-pass algorithm � One-pass scan: � Scan the table; maintain a window of current skyline points � Scan the table, write any overflow to temp file � Return the window at the end � Scan the temp file; repeat till done Scan Price Distance Skyline Discarded Pass 1 Pass 2 a 1 9 a Scan b 2 10 a b Price Distance Skyline Discarded TempFile c 4 8 a,c a 1 9 a d 6 7 a,c,d b 2 10 a b e 9 10 a ,c,d e c 4 8 a,c f 7 5 a,c,d,f d 6 7 a,c,d Scan g 5 6 a,c,f, g d e 9 10 a ,c,d e h 4 3 a, h c,f,g f 7 5 a,c,d f TempFile i 3 2 a, i h g 5 6 a,c, g d k 9 1 a,i,k h 4 3 a, h c,g � Any problems? l 10 4 a, i ,k l i 3 2 a, i h k 9 1 a,i,k l 10 4 a, i ,k l 11 12 3

Block Nested Look– Improvements However, BNL-based approaches are not How if the window overflow? incremental– Want progressive processing! [Börzsönyi et al., 2001] � Divide and conquer Desired: � Divide all the points into several groups such that each group � Compute the first few Skyline points almost instantaneously fits in memory � Compute more and more results incrementally � Process the groups separately � Merge their results y 10 b e 9 a c � Smart merging possible 8 s1 d 7 s2 � If s3 not empty then disregard s2 6 g f 5 � Use s3 to purge s1, s4 l h n 4 3 s3 s4 2 i k m 1 x o 1 2 3 4 5 6 7 8 9 10 13 14 Bitmap Algorithm: Representation [Tan et. al. Is b = (3, 2, 1) in the skyline? 2001] � For each dimension: � Any point with no-worse values in all dimensions? � 0110 & 0101 & 1111 = 0100 � n distinct values � n bits � Any point with a better value in some dimension? � A value as a bitmap of all no-higher bits = 1 � 0010 | 0001 | 1001 = 1011 d1: price d2: dist d3: rating � Any point satisfying both? 4 3 2 1 3 2 1 2 1 � 0100 & 1011 = 0000 a (1,1,2) 0 0 0 1 0 0 1 1 1 b (3,2,1) 0 1 1 1 0 1 1 0 1 � So, is b = (3,2,1) in the skyline? c (4,1,1) 1 1 1 1 0 0 1 0 1 d (2,3,2) 0 0 1 1 1 1 1 1 1 d1: price d2: dist d3: rating 4 3 2 1 3 2 1 2 1 a (1,1,2) 0 0 0 1 0 0 1 1 1 b (3,2,1) 0 1 1 1 0 1 1 0 1 c (4,1,1) 1 1 1 1 0 0 1 0 1 d (2,3,2) 0 0 1 1 1 1 1 1 1 15 16 4

The Bitmap Algorithm Bitmap Algorithm: Problems � for each point x in DB: � Bitmaps are not dynamic structures � check if x is in skyline � Hard to update � Bitmaps can have prohibitive space overhead � output x if so � How if there are many distinct values? � E.g., How about continuous values? � Incremental indeed; bitmap computation efficient � No focus of directions at all in skyline search � Depend on what points you check first � However, any problem? 17 18 NN – Finding the First Skyline Point [Kossmann et. NN– Are there other skyline points? al. 2002] Start by finding the nearest neighbor of the origin � = + � I.e., the point p = ( x , y ) with the smallest 2 2 dist ( o , p ) x y � Pruning-- What cannot be in the skyline? � How to find NN: Use NN algorithm based on R-tree. � Those dominated by point I � This NN point must be in the skyline � Iteration– What may be in the skyline? � Otherwise? � Non-dominated region 2 and 3 y 10 b e y 10 9 a c b e 9 8 a c 8 7 d 7 d 6 4 3 g f 6 g 5 f l 5 h n l 4 n h 4 3 3 2 i k 2 m i m k 1 2 1 1 x x o o 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 19 20 5

NN– Iteratively Process All the “ ToDo” Order-based rank query evaluation-- Still Regions until All Done ongoing research. y 10 � How optimal are these algorithms? Further b e 9 a c i improvement? 8 d 7 6 g 4 3 f 5 l h n y 4 10 3 4 b e 3 � Scale to high dimensionality? 9 a c 2 i m k 8 1 1 2 x 7 a o d 1 2 3 4 5 6 7 8 9 10 6 1 2 g f 5 h n l � Generalize to non-BMO type of aggregations? 4 y 3 10 b e 2 i m k 9 a c 1 x 8 o 1 2 3 4 5 6 7 8 9 10 7 d 6 k g f 5 l n h 4 3 m 2 i 3 k 4 21 22 1 2 1 x o 1 2 3 4 5 6 7 8 9 10 Ranking Query Processing: Thank You! b) Score-based Paradigms Kevin Chen-Chuan Chang 23 6

Ranking– Ordering according to the degree of Relational DBMS scenarios– A brief overview some fuzzy notions: � Similarity (or dissimilarity) Relational DBMS– � Relevance � Value mapping: [Chaudhuri and Gravano, 1999] � Preference � Mapping top-k scores to Boolean selection ranges Q � May have to restart � Cardinality mapping: [Carey and Kossmann, 1997, 1998] � Pushing “limit k” down query tree � May have to restart ranking 25 26 Our Focus: Middleware scenarios Top-k algorithms rely on accesses to evaluate query scores u j To each predicate p i : select h.id , h.address from Hotel h Random access: ra i ( u j ) p 1 : rating ( h.rate ), p 2 : cheap ( h.price ), p 3 : safe ( h.zip )) � order by F=min ( p 1 stop after k=10 Return score of u j for p i � p 1 [ u j ] Sorted access : sa � i Return some next best object and � k=10 its score for p i RDBMS p 1 u 3: .70 Top -k top results p 2 hotels.com u 2: .65 Algorithm u 1: .60 p 3 F=min ( p 1 , p 2 , p 3 ) p 1 apbs.com 27 28 7

Goal: Minimize the “ access” cost An algorithm performs a sequence of accesses: A simple algorithm � Sorted access on P1 then random accesses to P2, P2 RDBMS rating s 1 =3ms, r 1 =20ms Top -k hotels.com cheap Algorithm c:.80 b:.45 a:.30 k=1 s 2 =44ms, r 2 =466ms p 1 RDBMS apbs.com safe a:.8 b:.90 c:.90 c:.80 s 3 = ∞ , r 3 =700ms F, k top result p 2 Sort hotels.com a:.9 b:.70 c:.95 Access costs dominate in “middleware” scenarios p 3 F=min ( p 1 , p 2 , p 3 ) apbs.com � Cost model: aggregate of all access costs 29 30 Assumption: Monotonic scoring functions The Naïve Algorithm � Get all p i [ u ] score for every object u � Monotonic: � f ( x 1 , …, x n ) ≤ f ( x 1 ′ , …, x n ′ ) if x i ≤ x i ′ for all i � e.g., by complete sorted accesses � Compute F [ u ] = F ( p 1 [ u ] ,…, p m [ u ] ) for every u � Why good for query evaluation? � Sort � Gives bounds for pruning data � Return top k � Gives a simple function “surface” to maximize f � Reasonable? � Obviously expensive. Can we do better? � Analogy: Negation rarely used in Boolean queries � Note k is typically small � But, new “function-inference” front -ends also found this to be violated in many cases 31 32 8

Ranked Query Processing: Relevance a) Order-based Paradigm - PDF document

Ranking Ordering according to the degree of some fuzzy notions: Similarity (or dissimilarity) Ranked Query Processing: Relevance a) Order-based Paradigm Preference Q Kevin Chen-Chuan Chang ranking 2 Query models for

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Voting in Maines Ranked Choice Election A non-partisan guide to ranked choice elections

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

1 London.ca/Elections 2 What is Ranked Choice Voting? The City of London used Ranked Choice

The Ranked Sequence ADT A ranked sequence S (with n elements) supports the following methods:

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

SOCIAL AGENCY Established on May 1st 2010 Ranked #1 social agency in the Emerce 100

HOW WILL HOW WILL RANKED CHOICE VOTING RANKED CHOICE VOTING WORK IN HI? WORK IN HI? VOTERS

Online Query Processing Exposure to online query processing algorithms and fundamentals A

V.3 Query Processing 1. Term-at-a-Time 2. Document-at-a-Time 3. WAND 4. Quit & Continue 5.

Lecture 5: Top-1 and Skyline CMSC 5705 Advanced Topics in Database Systems Yufei Tao Department

DevOps My name is Fung NCCU ALPHA Camp Back-end developer DevOps in

Optical Communications Telecommunication Engineering School of Engineering University of Rome La

Earth Movement and Earth Movement and Solar Calendar Solar Calendar Recitation 2 Recitation 2

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul

Backwards Analysis Gerth Stlting Brodal Pre-talent track activity in Algorithms, Department of

Optimizing Multidimensional skyline queries Sofian Maabout Nicolas Hanusse Carlos Ordonez

Remaining ASEN 5007 Class Schedule Today Solving FEM Equations (descriptive, no assignments)

Ranked Query Processing: Relevance a) Order-based Paradigm - PDF document

Ranking Ordering according to the degree of some fuzzy notions: Similarity (or dissimilarity) Ranked Query Processing: Relevance a) Order-based Paradigm Preference Q Kevin Chen-Chuan Chang ranking 2 Query models for

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Voting in Maines Ranked Choice Election A non-partisan guide to ranked choice elections

Information Retrieval &gt; Query Us User er Query Words Query Words Search Personalization

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

Chapter 3: Top-k Query Processing and Indexing 3.1 Top-k Algorithms 3.2 Approximate Top-k Query

1 London.ca/Elections 2 What is Ranked Choice Voting? The City of London used Ranked Choice

The Ranked Sequence ADT A ranked sequence S (with n elements) supports the following methods:

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

SOCIAL AGENCY Established on May 1st 2010 Ranked #1 social agency in the Emerce 100

HOW WILL HOW WILL RANKED CHOICE VOTING RANKED CHOICE VOTING WORK IN HI? WORK IN HI? VOTERS

Online Query Processing Exposure to online query processing algorithms and fundamentals A

V.3 Query Processing 1. Term-at-a-Time 2. Document-at-a-Time 3. WAND 4. Quit &amp; Continue 5.

Lecture 5: Top-1 and Skyline CMSC 5705 Advanced Topics in Database Systems Yufei Tao Department

DevOps My name is Fung NCCU ALPHA Camp Back-end developer DevOps in

Optical Communications Telecommunication Engineering School of Engineering University of Rome La

Earth Movement and Earth Movement and Solar Calendar Solar Calendar Recitation 2 Recitation 2

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul

Backwards Analysis Gerth Stlting Brodal Pre-talent track activity in Algorithms, Department of

Optimizing Multidimensional skyline queries Sofian Maabout Nicolas Hanusse Carlos Ordonez

Remaining ASEN 5007 Class Schedule Today Solving FEM Equations (descriptive, no assignments)

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

V.3 Query Processing 1. Term-at-a-Time 2. Document-at-a-Time 3. WAND 4. Quit & Continue 5.