Curse of Dimensionality in Pivot-based Indexes

Ilya Volnyansky, Vladimir Pestov
Department of Mathematics and Statistics, University of Ottawa
Ottawa, Ontario, Canada

SISAP 2009, Prague, 29/09/2009
Outline

1. Overview
   - The Setting for Similarity Search
   - Previous Work
2. Our Work
   - Framework
   - Concentration of Measure
   - Statistical Learning Theory
   - Asymptotic Bounds
Similarity Workloads

- Universe Ω: a metric space with metric ρ.
- Dataset X ⊂ Ω, always finite, with metric ρ.
- A range query: given q ∈ Ω and r > 0, find {x ∈ X | ρ(x, q) < r}.

For analysis purposes, we add:
- A measure μ on Ω.
- Treat X as an i.i.d. sample ∼ μ of size n.
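A range query can always be answered by a linear scan costing n distance computations; indexes try to beat this baseline. Below is a minimal sketch (mine, not from the slides), using a Euclidean ρ purely for concreteness:

```python
# A minimal sketch of a range query by linear scan (not from the slides);
# the metric rho is taken Euclidean here purely for concreteness.
import math

def rho(a, b):
    # Euclidean distance; any metric satisfying the triangle inequality works.
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def range_query(X, q, r):
    # Return all points of the dataset X within distance r of the query q.
    return [x for x in X if rho(x, q) < r]

X = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
print(range_query(X, q=(0.5, 0.5), r=1.0))  # [(0.0, 0.0), (1.0, 1.0)]
```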
Curse of dimensionality conjecture

All indexing schemes suffer from the curse of dimensionality:

Conjecture. If d = ω(log n) and d = n^{o(1)}, any sequence of indexes built on a sequence of datasets X_d ⊂ Σ^d allowing similarity search in time polynomial in d must use n^{ω(1)} space.

[Handbook of Discrete and Computational Geometry]

The Hamming cube Σ^d of dimension d: the set of all binary sequences of length d.
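The metric on Σ^d is the Hamming distance, the number of coordinates in which two binary strings differ (a standard fact, not stated on the slide). For reference:

```python
# The Hamming cube metric (standard background, not from the slides):
# the distance between two binary strings of length d is the number of
# coordinates in which they differ.
def hamming(x, y):
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

print(hamming("10110", "11100"))  # 2
```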
Fixed dimension

Examples of previous work:
- Let n, the size of X, vary, but keep the space (Ω, ρ, μ) fixed.
- This is the usual "asymptotic" analysis in the CS sense.
- It does not investigate the curse of dimensionality.
Fixed n

- Let the dimension, and hence (Ω, ρ, μ), vary, but keep the size n of X the same, e.g. [Weber 98], [Chávez 01].
- Too small a sample size n makes it easier to index spaces of high dimension d.
- When both d and n vary, the math is more challenging.
Points to keep in mind

- The distinction between X and Ω.
- Both d and n grow.
- Need to make assumptions about the sequence of Ω's (?)
- Need to make assumptions about the indexes.
Gameplan

1. Pick an index type to analyze.
2. Pick a cost model.
3. The sequence of Ω's exhibits concentration of measure; the "intrinsic dimension" grows.
4. Statistical Learning Theory: linking properties of the Ω's and properties of the X's.
5. Conclusion: if all conditions are met, the Curse of Dimensionality will take place.
Main Result

From a sequence of metric spaces with measure (Ω_d, ρ_d, μ_d), d = 1, 2, 3, ..., take i.i.d. samples (datasets) X_d ∼ μ_d. Assume:
- (Ω_d, ρ_d, μ_d) display the concentration of measure;
- the VC dimension of closed balls in (Ω_d, ρ_d) is O(d);
- we build a pivot index using k pivots, where k = o(n_d / d);
- the sample size n_d satisfies d = ω(log n_d) and d = n_d^{o(1)};
- we perform queries whose radius is the nearest-neighbour distance.

Then: fix arbitrarily small ε, η > 0. There exists D such that for all d ⩾ D, the probability that at least half the queries on dataset X_d take less than (1 − ε)n_d time is less than η.
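In symbols, the conclusion can be paraphrased as follows (my rendering, not verbatim from the slides; cost(q) stands for the number of ρ-computations the index spends on query q, and the outer probability is over the random sample X_d):

```latex
% A paraphrase of the theorem's conclusion (mine, not verbatim from the
% slides); cost(q) is the number of rho-computations spent on query q.
\forall \varepsilon, \eta > 0 \ \exists D \ \forall d \ge D:
\qquad
\Pr\bigl[\, \mu_d\{\, q : \mathrm{cost}(q) < (1 - \varepsilon)\, n_d \,\} \ge \tfrac{1}{2} \,\bigr] \;<\; \eta .
```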
Pivot indexing scheme

Build an index:
1. Pick {p_1, ..., p_k} from X.
2. Calculate the n × k array of distances ρ(x, p_i), 1 ⩽ i ⩽ k, x ∈ X.

Perform a query given q and r:
1. Compute ρ_k(q, x) := sup_{1⩽i⩽k} |ρ(q, p_i) − ρ(x, p_i)|.
2. Since ρ(q, x) ⩾ ρ_k(q, x), there is no need to compute ρ(q, x) if ρ_k(q, x) > r.
3. Compute ρ(q, x) otherwise.
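A minimal sketch of this scheme (my code; class and method names are mine, and the pivots are drawn at random for simplicity — the slide does not say how to pick them):

```python
# A minimal sketch of the pivot indexing scheme above; names are mine.
import math
import random

def rho(a, b):
    # Euclidean metric, purely for concreteness; any metric works.
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

class PivotIndex:
    def __init__(self, X, k):
        self.X = X
        self.pivots = random.sample(X, k)               # build step 1
        # build step 2: the n x k array of distances rho(x, p_i)
        self.table = [[rho(x, p) for p in self.pivots] for x in X]

    def query(self, q, r):
        q_dists = [rho(q, p) for p in self.pivots]      # k distances to the pivots
        result = []
        for x, row in zip(self.X, self.table):
            # rho_k(q, x) = max_i |rho(q, p_i) - rho(x, p_i)| <= rho(q, x)
            rho_k = max(abs(qd - xd) for qd, xd in zip(q_dists, row))
            if rho_k > r:
                continue                                # discarded without computing rho(q, x)
            if rho(q, x) < r:                           # one true distance computation
                result.append(x)
        return result
```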
The cost model

- Only one operation has a cost: computing ρ(q, x).
- Computing ρ_k(q, x) costs k.
- Let C_{q,r,p_1,...,p_k} denote all the discarded points in X: {x ∈ X | ρ_k(q, x) > r}.
- Let n = |X|. Total cost: k + n − |C_{q,r,p_1,...,p_k}|.
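Under this model the query in the sketch above costs exactly k + n − |C|: k distances from q to the pivots, plus one true distance per point that ρ_k fails to discard. A hypothetical instrumentation makes this explicit:

```python
# A hypothetical instrumentation (mine) of the PivotIndex sketch above,
# counting only true rho computations, as the cost model prescribes.
def query_cost(index, q, r):
    cost = len(index.pivots)                  # k: distances from q to the pivots
    q_dists = [rho(q, p) for p in index.pivots]
    discarded = 0                             # |C_{q,r,p_1,...,p_k}|
    for row in index.table:
        rho_k = max(abs(qd - xd) for qd, xd in zip(q_dists, row))
        if rho_k > r:
            discarded += 1
    cost += len(index.X) - discarded          # one rho(q, x) per surviving point
    return cost                               # k + n - |C|
```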
Concentration of Measure

A function f : Ω → ℝ is 1-Lipschitz if

  |f(ω_1) − f(ω_2)| ⩽ ρ(ω_1, ω_2)  for all ω_1, ω_2 ∈ Ω.

Examples: f(x) = x, f(x) = (1/2)x, f(x) = √(x² + 1).

Its median is a number M such that μ{ω | f(ω) ⩽ M} ⩾ 1/2 and μ{ω | f(ω) ⩾ M} ⩾ 1/2.
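A small numerical illustration (mine, not from the talk): on the Hamming cube with the uniform measure, the 1-Lipschitz function f(x) = Hamming weight of x has median d/2, and the fraction of points deviating from it by more than εd shrinks rapidly as d grows:

```python
# Numerical illustration (mine, not from the talk) of concentration on the
# Hamming cube: the 1-Lipschitz function f(x) = Hamming weight concentrates
# around its median d/2 under the uniform measure.
import random

def deviation_fraction(d, eps=0.05, samples=20000):
    # Fraction of uniform points of the d-cube whose Hamming weight
    # deviates from the median d/2 by more than eps*d.
    far = 0
    for _ in range(samples):
        weight = sum(random.getrandbits(1) for _ in range(d))
        if abs(weight - d / 2) > eps * d:
            far += 1
    return far / samples

for d in (10, 100, 1000):
    print(d, deviation_fraction(d))  # the fraction drops steeply with d
```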