
9/23/2009

CONFERENCES
Short Name   Full Name
SIGMOD       Special Interest Group on Management Of Data
VLDB         Very Large Data Base
ICDE         International Conference on Data Engineering

CONTINUOUS NEAREST NEIGHBOR SEARCH
Yufei Tao, Dimitris Papadias, Qiongmao Shen
Hong Kong University of Science and Technology
Presented: Penny Bei Pan

OVERVIEW
• Introduction
• Preliminary & Related Work
• Continuous k-Nearest Neighbor Query (CkNN)
  • Definition
  • Problem Characteristics
  • R-tree algorithm
  • Query analysis
  • Complex CNN extension
• Experiments
• Discussion and Conclusion

INTRODUCTION
• Continuous Nearest Neighbor
• [Figure: objects and a query point]
• Why called "continuous"?
  • Nearest neighbor of every point in the trajectory

PRELIMINARY - POINT NN QUERIES
• Branch and bound algorithms use mindist between the query point q and an R-tree entry E to prune the search space:
  • mindist(E, q) = the minimum distance between E and q

PRELIMINARY - POINT NN QUERIES
• Depth-first (DF) and Best-first (BF) algorithms
• E: R-tree entry
• q: query point
• DF: choose the entry with minimum mindist
• BF: choose the minimum among all those visited (heap)
• [Figure: R-tree with entries E1 to E6 and data points]
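The mindist measure and the best-first strategy from the two preliminary slides can be sketched as follows. This is a minimal illustration, not the presenters' code: the tuple layouts for rectangles and tree entries, and the names `mindist` and `best_first_nn`, are assumptions made for the example.

```python
import heapq
import math


def mindist(rect, q):
    """Minimum Euclidean distance between point q and an axis-aligned rectangle.

    Assumed layout: rect = (xmin, ymin, xmax, ymax), q = (x, y).
    """
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)


def best_first_nn(root, q):
    """Best-first NN search over a toy tree (hypothetical structure).

    An entry is either ("node", rect, children) or ("point", (x, y)).
    The heap always yields the visited entry with the smallest distance
    bound, so the first point popped is the nearest neighbor.
    """
    counter = 0  # tie-breaker so the heap never compares entries directly
    heap = [(0.0, counter, root)]
    while heap:
        d, _, entry = heapq.heappop(heap)
        if entry[0] == "point":
            return entry[1], d
        for child in entry[2]:
            counter += 1
            if child[0] == "point":
                key = math.dist(child[1], q)
            else:
                key = mindist(child[1], q)
            heapq.heappush(heap, (key, counter, child))
    return None, math.inf
```

A depth-first variant would instead recurse into the child with the smallest mindist first and prune the other children against the best distance found so far.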

PRELIMINARY - CONTINUOUS NEAREST NEIGHBOR
• Data: A set of points (P = {a, b, c, d, f, g, h})
• Query: A line segment q = [s, e]
• Result: The nearest neighbor (NN) of every point on q.
• Result representation: {<a,[s,s_1]>, <c,[s_1,s_2]>, <f,[s_2,s_3]>, <h,[s_3,e]>}

RELATED WORK - SAMPLING
• Try to convert the continuous-NN to point-NN
  • Every point on the line -> unlimited points
  • Sampling
  • Drawback:
    • Sample rate low -> incorrect
    • Sample rate high -> overhead (still cannot guarantee accuracy)
• Time Parameterized queries
  • Output (R, T, C): result, time period, changing point
  • Tao, Y., Papadias, D. Time Parameterized Queries in Spatio-Temporal Databases. ACM SIGMOD, 2002.

RELATED WORK - TIME PARAMETERIZED NN
• Step 1: Find the NN of the start point s, i.e., point a.
• Step 2: Use the TP technique to find the first point on the line segment (s_1) where there is a change in the NN (i.e., point c will become the next NN).

RELATED WORK - TP NN (CONT.)
• Step 3: Perform another TP NN to find:
  • Starting from s_1, how far we need to travel for the current NN (i.e., c) to change to f.
• Repeat this until we finish the entire segment.

RELATED WORK - TP NN (CONT.)
• Intuitively: perpendicular bisector & [s, e] segment
• Not only NN, but also supports k-NN
• Still overhead: repeated n times

CKNN - DEFINITION
• Goal: Find all split points (as well as the corresponding NN for each partition) with a single traversal.
• Split list: The set of split points (including s and e).
• Vicinity circle: The circle centered at split point s_i with radius dist(s_i, s_i.NN)
• We say a data point u covers a point s if u = s.NN. E.g., points a, c cover segments [s, s_1], [s_1, s_2].
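The "repeat until we finish the entire segment" idea and the perpendicular-bisector intuition can be illustrated with a brute-force sweep over a small in-memory point set. This is only a sketch under stated assumptions: it scans all points at every step rather than using an R-tree or the actual TP query, positions on q are expressed as a parameter t in [0, 1] (t = 0 is s, t = 1 is e), and the function names are hypothetical.

```python
import math


def bisector_crossing(s, e, a, c):
    """Where the perpendicular bisector of a and c meets the line through s and e.

    Returns (t, slope): t is the parameter of the crossing on s + t*(e - s);
    slope > 0 means c becomes the closer point once t passes the crossing.
    Returns None if the line is parallel to the bisector.
    """
    ex, ey = e[0] - s[0], e[1] - s[1]
    cx, cy = c[0] - a[0], c[1] - a[1]
    slope = 2.0 * (ex * cx + ey * cy)
    if slope == 0.0:
        return None
    num = (c[0] ** 2 + c[1] ** 2) - (a[0] ** 2 + a[1] ** 2) - 2.0 * (s[0] * cx + s[1] * cy)
    return num / slope, slope


def cnn_brute_force(points, s, e):
    """Return [(nn, t_start, t_end), ...] covering the whole segment [s, e]."""
    cur = min(points, key=lambda p: math.dist(p, s))  # step 1: NN of s
    t, result = 0.0, []
    while True:
        candidates = []
        for p in points:
            if p == cur:
                continue
            hit = bisector_crossing(s, e, cur, p)
            if hit is None:
                continue
            tp, slope = hit
            if slope > 0 and tp < 1.0:
                # On ties in tp, the steepest slope is the NN right after tp.
                candidates.append((max(tp, t), -slope, p))
        if not candidates:
            result.append((cur, t, 1.0))  # current NN holds until e
            return result
        tp, _, nxt = min(candidates)  # earliest change of NN
        if tp > t:
            result.append((cur, t, tp))
        cur, t = nxt, tp
```

For two points (0, 1) and (4, 1) and the segment from (0, 0) to (4, 0), the sweep yields one split point at t = 0.5, which is where the bisector x = 2 crosses the segment.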

CKNN - PROBLEM CHARACTERISTICS
• Lemma 1: Given a split list SL = {s_0, s_1, ..., s_(|SL|-1)} and a new data point p, then: p covers some point on query segment q if and only if p covers a split point.

CKNN - PROBLEM CHARACTERISTICS
• Lemma 2 (Covering Continuity):
  • The split points covered by a point p are continuous.
  • Namely, if p covers split point s_i but not s_(i-1) (or s_(i+1)), then p cannot cover s_(i-j) (or s_(i+j)) for any value of j > 1.

CKNN - PROBLEM CHARACTERISTICS
• How about the k-NN?
  • Lemma 1: fits; Lemma 2: does not fit
  • E.g.: k = 3

CKNN - R-TREE ALGORITHM
• General key notes:
  • Use branch-and-bound techniques to prune the search space.
• R-tree traversal principle:
  • When a leaf entry (i.e., a data point) p is encountered, SL is updated if p covers any split point (i.e., p is a qualifying entry), by Lemma 1.
  • For an intermediate entry, we visit its subtree only if it may contain any qualifying data point; use heuristics.
  • Avoid accessing non-qualifying nodes.

R-TREE ALGORITHM - HEURISTIC 1
• Given an intermediate entry E and query segment q, the subtree of E may contain qualifying points only if mindist(E, q) < SL_MAXD, where SL_MAXD is the maximum distance between a split point and its NN.
• Compute mindist(E, q)

R-TREE ALGORITHM - HEURISTIC 2 (AFTER 1)
• Given an intermediate entry E and query segment q, the subtree of E must be searched if and only if there exists a split point s_i ∈ SL such that dist(s_i, s_i.NN) > mindist(s_i, E).
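The two pruning tests can be written down directly from the conditions on the heuristic slides. The sketch below is illustrative only: the split list is assumed to be a list of (split_point, nn_point) pairs, rectangles are (xmin, ymin, xmax, ymax), all function names are hypothetical, and Heuristic 1 takes mindist(E, q) as a precomputed number because the segment-to-rectangle distance computation is not spelled out on the slides.

```python
import math


def mindist_point_rect(p, rect):
    """Minimum distance between point p and axis-aligned rectangle rect."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - p[0], 0.0, p[0] - xmax)
    dy = max(ymin - p[1], 0.0, p[1] - ymax)
    return math.hypot(dx, dy)


def sl_maxd(split_list):
    """SL_MAXD: the largest distance between a split point and its NN."""
    return max(math.dist(s, nn) for s, nn in split_list)


def passes_heuristic1(mindist_e_q, split_list):
    """Heuristic 1: E may hold qualifying points only if mindist(E, q) < SL_MAXD."""
    return mindist_e_q < sl_maxd(split_list)


def passes_heuristic2(rect, split_list):
    """Heuristic 2: search E iff E reaches inside some split point's vicinity circle."""
    return any(
        math.dist(s, nn) > mindist_point_rect(s, rect) for s, nn in split_list
    )
```

Heuristic 1 costs one distance computation per entry, while Heuristic 2 checks every split point, which is presumably why it is applied only to entries that already passed Heuristic 1.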

R-TREE ALGORITHM - HEURISTIC 3 (ORDER)
• Entries (satisfying heuristics 1 and 2) are accessed in increasing order of their minimum distances to the query segment q.

R-TREE ALGORITHM - LEAF ENTRY
• Input: New entry p, SL = {s_1, ..., s_10}
• 1) Retrieve the split points covered by p
• 2) Update SL
• Binary search: Start at s_5, then s_2, ...
• Using the bisector to judge the direction

CKNN - R-TREE ALGORITHM (EXAMPLE)
• Depth First

ANALYSIS - COST MODEL FOR UNIFORM DATA
• [Figure: actual search region vs. approximate search region]
• An optimal algorithm on R-trees must access only those nodes whose MBRs intersect the actual search region (i.e., E1 but not E2).
• To facilitate the analysis we focus on a more regular (approximated) region.

ANALYSIS - NODE ACCESS PROBABILITY
• P_ACCESS is the probability that the MBR E of a node intersects the search region.

ANALYSIS - COST MODEL (NODE ACCESS)
• Dataset cardinality N
• R-tree structure (height: h)
• The query length: q.l
• The orientation angle
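Returning to the leaf-entry step (retrieve the split points covered by p, then update SL), one possible way to realize the update is sketched below. It is an assumption-laden illustration rather than the slides' procedure: it scans every interval linearly instead of using the binary search, it stores the split list as (t_start, t_end, nn) intervals with t in [0, 1] along q = [s, e], and all names are hypothetical. It relies on Lemma 1: within one interval, p can only take over if it covers one of the interval's end points.

```python
import math


def point_at(s, e, t):
    """Point on segment [s, e] at parameter t."""
    return (s[0] + t * (e[0] - s[0]), s[1] + t * (e[1] - s[1]))


def crossing(s, e, a, p):
    """Parameter t where a and p are equidistant on the line through s and e."""
    ex, ey = e[0] - s[0], e[1] - s[1]
    px, py = p[0] - a[0], p[1] - a[1]
    denom = 2.0 * (ex * px + ey * py)
    num = (p[0] ** 2 + p[1] ** 2) - (a[0] ** 2 + a[1] ** 2) - 2.0 * (s[0] * px + s[1] * py)
    return num / denom


def update_split_list(sl, p, s, e):
    """Return the new interval list after taking data point p into account."""
    out = []
    for t0, t1, a in sl:
        # Does p cover the split points at the two ends of this interval?
        in0 = math.dist(p, point_at(s, e, t0)) < math.dist(a, point_at(s, e, t0))
        in1 = math.dist(p, point_at(s, e, t1)) < math.dist(a, point_at(s, e, t1))
        if in0 and in1:
            pieces = [(t0, t1, p)]
        elif not in0 and not in1:
            pieces = [(t0, t1, a)]
        else:
            # Exactly one end is covered: the bisector gives the new split point.
            tc = min(max(crossing(s, e, a, p), t0), t1)
            pieces = [(t0, tc, p), (tc, t1, a)] if in0 else [(t0, tc, a), (tc, t1, p)]
        for t_a, t_b, nn in pieces:
            if t_b <= t_a:
                continue  # drop empty pieces
            if out and out[-1][2] == nn:
                out[-1] = (out[-1][0], t_b, nn)  # merge with same-NN neighbor
            else:
                out.append((t_a, t_b, nn))
    return out
```

Starting from a single interval covered by the first data point seen and calling `update_split_list` once per further point produces the same kind of result list as the result representation shown earlier. The binary search on the slide would replace the linear scan over intervals.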

ANALYSIS - COST MODEL (CONT.)
• The number of distinct neighbors in the final result.
• CPU overhead comparison
  • TP: increases with n_NN
  • This paper: increases with dataset size N, query length l…

OTHER CNN QUERY
• kCNN query (k = 2)
  • Updating the vicinity circle
• Trajectory NN query (TNN)
  • q1 = [s, u]
  • q2 = [u, v]
  • q3 = [v, e]
  • Each segment has a SL
  • Treated one by one

EXPERIMENTS
• Datasets:
  • Uniform
  • Real street segments: CA (130K points), ST (2M points).
• Queries (each a segment):
  • Location and orientation randomly generated
  • Length is set as a parameter
• Performance is measured as the average of running 200 queries.
• Machine:
  • 1 GHz CPU, 256M memory
  • Page size = 4K (R-tree node capacity = 200)
• Compare CNN and TP (the only existing solution)

EXP 1: COST MODEL EVALUATION
• [Figure]

EXP 2: PERFORMANCE VS QUERY LENGTH
• [Figure]

EXP 3: PERFORMANCE VS K
• [Figure]

EXPERIMENTS - KEY NOTES
• In general, CNN outperforms TP significantly
  • Single traversal
• For cost model:
  • BF better than DF (consistent with previous work)
  • The cost model is accurate
• Performance & query length
  • As the length increases, the number of split points increases
  • CPU for TP: keeps repeatedly retrieving the same objects
• Performance & k
  • For CNN: k has little influence on NA, but k influences CPU: higher number of split points

DISCUSSION AND CONCLUSION
• A fast algorithm for the CkNN query.
• Future work:
  • Rectangle data
  • Moving data points
  • Application to road networks (i.e., travel instead of Euclidean distance)
• Thank you!
