BeyOND – Unleashing BOND

Thomas Bernecker, Franz Graf, Hans-Peter Kriegel, Christian Moennig and Arthur Zimek
Ludwig-Maximilians-Universität München (LMU), Institute for Informatics, Database Systems Group, Munich, Germany
http://www.dbs.ifi.lmu.de
{bernecker, graf, kriegel, zimek}@dbs.ifi.lmu.de, moennig@cip.ifi.lmu.de
DBRank 2011, Seattle, WA, August 29, 2011

Outline
1. Background
   – Motivation: k-nearest neighbor search in high-dimensional databases
   – BOND revisited
2. Introducing BeyOND
   – Filtering objects via distance approximations
   – Sub cubes, MBRs
3. Experimental Evaluation
4. Conclusions

Motivation
• Similarity search in high-dimensional space is
   ☺ important, e.g., for images and e-commerce
   ☹ slow
• The suitability of index-based solutions depends on the data distribution
• Open question: relevant vs. irrelevant attributes
• Similarity search in subspaces:
   – Fix the query attributes beforehand
   – Use multiple pivot points to derive upper and lower bounds
   – Process the data vertically to reduce the high-dimensional space

BOND Revisited (1)
• BOND [1]: k-nearest neighbor search on high-dimensional data
   – Resolves feature vectors (FVs) column-wise
   – Ranks the columns w.r.t. their relevance
   – Prunes candidate FVs using a branch-and-bound approach
   – The resolved part is known exactly
   – The unresolved part has to be approximated
   – Resolving stops when the approximation is "good enough"
   – Supports subspace queries
   – Distance metrics:
      • Histogram intersection (uncorrelated dimensions)
      • Euclidean distance

[1] de Vries, Mamoulis, Nes, Kersten. Efficient k-NN Search on Vertically Decomposed Data. SIGMOD 2002.
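A minimal sketch of the vertical (column-wise) layout that BOND works on, assuming the feature vectors arrive as rows. The relevance order shown is a hypothetical heuristic for squared Euclidean distance on [0,1] data (columns with the largest worst-case contribution $\max\{q_i, 1 - q_i\}^2$ first); it is not BOND's own ranking criterion, and `decompose` and `rank_columns` are illustrative names.

```python
from typing import List, Sequence

def decompose(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Vertical decomposition: one list (column) per dimension."""
    return [list(col) for col in zip(*rows)]

def rank_columns(q: Sequence[float]) -> List[int]:
    """Hypothetical relevance order for squared Euclidean distance on [0, 1] data:
    columns with the largest worst-case contribution max(q_i, 1 - q_i)^2 first."""
    return sorted(range(len(q)), key=lambda i: (-max(q[i], 1.0 - q[i]), i))

columns = decompose([[0.1, 0.9, 0.5], [0.2, 0.8, 0.4]])  # [[0.1, 0.2], [0.9, 0.8], [0.5, 0.4]]
order = rank_columns([0.5, 0.05, 0.7])                   # [1, 2, 0]
```

Resolving the columns with the largest remaining uncertainty first is one way to make the unresolved part shrink quickly; any other order keeps the bounds valid.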

BOND Revisited (2)
• Restrictions of BOND:
   1. The approach performs well only on Zipfian-distributed data.
   2. The feature values must be normalized to [0,1] in each dimension.
   3. The proposed bounds are loose. The validity of the stricter bounds (BondAdvanced) depends on a certain resolve order of the columns.

BOND Revisited (3)
• Notation:
   – Query vector $q$, database vector $v$
   – Splitting of $v$ into a resolved part $v^-$ and an unresolved part $v^+$ $\Rightarrow$ $v = v^- \cup v^+$
• Approximated distance:
$$S_{approx}(q, v) = S_1(q^-, v^-) + S_2(q^+, v^+)$$
   – Resolved part: $S_1(q^-, v^-) = \sum_i (q_i^- - v_i^-)^2$
   – Unresolved part: $S_2(q^+, v^+) \le \sum_i \max\{q_i^+, 1 - q_i^+\}^2$
• Distance bounds:
$$S_{upper}(q, v) = S_1(q^-, v^-) + \sum_i \max\{q_i^+, 1 - q_i^+\}^2 \ge S(q, v)$$
$$S_{lower}(q, v) = S_1(q^-, v^-) + 0 \le S(q, v)$$
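These bounds can drive a column-wise k-NN search. The following is a sketch under stated assumptions, not the BOND implementation: all values lie in [0,1], $S$ is the squared Euclidean distance, and for simplicity the surviving candidates are resolved until every column has been read. A candidate is pruned once its lower bound $S_1 + 0$ exceeds the k-th smallest upper bound $S_1 + \sum \max\{q_i^+, 1 - q_i^+\}^2$. `bond_knn` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Optional, Sequence

def bond_knn(data: Sequence[Sequence[float]], q: Sequence[float], k: int,
             order: Optional[List[int]] = None) -> List[int]:
    """Indices of the k nearest FVs under squared Euclidean distance.

    Assumes all values lie in [0, 1], so an unresolved dimension i adds
    at most max(q_i, 1 - q_i)^2 to the distance.
    """
    order = list(range(len(q))) if order is None else order
    worst = [max(qi, 1.0 - qi) ** 2 for qi in q]
    s1: Dict[int, float] = {vid: 0.0 for vid in range(len(data))}  # S1 per candidate
    for pos, col in enumerate(order):
        for vid in s1:                                   # resolve one column
            s1[vid] += (q[col] - data[vid][col]) ** 2
        remaining = sum(worst[c] for c in order[pos + 1:])  # bound on S2
        if len(s1) > k:
            # S_upper = S1 + remaining and S_lower = S1 + 0. Any FV whose lower
            # bound exceeds the k-th smallest upper bound cannot be a k-NN.
            threshold = heapq.nsmallest(k, s1.values())[-1] + remaining
            s1 = {vid: s for vid, s in s1.items() if s <= threshold}
    # Every column is resolved now, so S1 is the exact distance of each survivor.
    return sorted(s1, key=lambda vid: (s1[vid], vid))[:k]
```

Because every candidate shares the same unresolved dimensions, the k-th smallest upper bound is simply the k-th smallest $S_1$ plus the common remainder; this is also why a loose remainder (restriction 3) lets few candidates be pruned early.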

Beyond BOND
• Benefits of BeyOND:
   1. Independence of the data distribution ☺
   2. No restriction to a normalized data space ☺
   3. No specific resolve order of the dimensions is needed ☺
   ⇒ Price: the BOND distance approximations are no longer suitable ☹
• Solution: combining the idea of BOND with well-known techniques:
   – VA-file (data space partitioning)
   – MBR (minimum bounding rectangle) approximation (data organization)
   ⇒ Remaining restriction: the minimum/maximum values of each dimension need to be known ☹

Sub Cubes (1)
• First extension: VA-file [2] with one split per dimension
   ⇒ $2^d$ sub cubes
   ⇒ Addressing via Z-IDs
   ⇒ Improved bounds based on the close/far sub cube borders $c^{lower}_{v_i}$ and $c^{upper}_{v_i}$
• Memory-efficient representation (8 bytes → 1 bit per dimension)
   – Sub cubes need not be kept in main memory
   – Split positions are stored in one separate array per dimension
• Dependence on the split level:
   – FV: 8 bytes per dimension
   – $s$ splits: $s/8$ bytes ($s$ bits) per dimension

[2] Weber, Schek, Blott. A Quantitative Analysis and Performance Study for Similarity Search Methods in High-Dimensional Spaces. VLDB 1998.
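Sub cube addressing can be sketched as follows, assuming that split level $s$ yields $2^s$ intervals per dimension (so a cell index fits in $s$ bits, matching the memory figures above) and that a Z-ID is the Morton code obtained by interleaving the per-dimension cell indices. Both are assumptions, and all names, including the per-dimension `splits` arrays, are hypothetical. Computing the borders $c^{lower}_{v_i}$ and $c^{upper}_{v_i}$ needs each dimension's minimum and maximum, which is exactly BeyOND's remaining restriction.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

def cell_index(x: float, splits: Sequence[float]) -> int:
    """Interval of one dimension containing x; `splits` is sorted."""
    return bisect_right(splits, x)

def z_id(v: Sequence[float], splits: Sequence[Sequence[float]], s: int) -> int:
    """Morton code: interleave the s-bit cell indices of all dimensions."""
    cells = [cell_index(x, sp) for x, sp in zip(v, splits)]
    zid = 0
    for b in range(s - 1, -1, -1):            # most significant bit first
        for c in cells:
            zid = (zid << 1) | ((c >> b) & 1)
    return zid

def cell_borders(x: float, splits: Sequence[float],
                 lo: float, hi: float) -> Tuple[float, float]:
    """(c_lower, c_upper) of the interval containing x; needs the dimension's min/max."""
    borders: List[float] = [lo, *splits, hi]
    i = cell_index(x, splits)
    return borders[i], borders[i + 1]

def approx_bytes(d: int, s: int) -> float:
    """Approximation size per FV: s bits per dimension (the FV itself needs 8 * d bytes)."""
    return d * s / 8
```

With one split per dimension, each FV is described by $d$ bits instead of $8d$ bytes; for example, a 27-dimensional FV shrinks from 216 bytes to 3.375 bytes.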

Sub Cubes (2)
• Old distance bounds:
$$S_{upper}(q, v) = S_1(q^-, v^-) + \sum_i \max\{q_i^+, 1 - q_i^+\}^2$$
$$S_{lower}(q, v) = S_1(q^-, v^-) + 0$$
• Approximations of the unresolved dimensions:
$$S_2'(q^+, v^+) = \sum_i \max\{(q_i^+ - c^{lower}_{v_i})^2, (q_i^+ - c^{upper}_{v_i})^2\}$$
$$S_2''(q^+, v^+) = \sum_i \begin{cases} 0 & \text{if } q_i^+ \in [c^{lower}_{v_i}, c^{upper}_{v_i}] \\ \min\{(q_i^+ - c^{lower}_{v_i})^2, (q_i^+ - c^{upper}_{v_i})^2\} & \text{else} \end{cases}$$
• New distance bounds:
$$S_{upper}(q, v) = S_1(q^-, v^-) + S_2'(q^+, v^+) \ge S(q, v)$$
$$S_{lower}(q, v) = S_1(q^-, v^-) + S_2''(q^+, v^+) \le S(q, v)$$
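A direct sketch of $S_2'$ and $S_2''$, assuming each unresolved dimension of $v$ is given by its sub cube interval $(c^{lower}_{v_i}, c^{upper}_{v_i})$; the function names are hypothetical. With the interval $[0,1]$ in every dimension, the upper bound reduces to BOND's $\max\{q_i, 1 - q_i\}^2$ term and the lower bound to 0, which makes a quick sanity check.

```python
from typing import Sequence, Tuple

def unresolved_bounds(q_plus: Sequence[float],
                      cells: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (S2'', S2') for the unresolved dimensions.

    cells[i] = (c_lower, c_upper), the sub cube borders of v in dimension i.
    """
    s2_lower = 0.0   # S2'': close border, 0 if q_i lies inside the interval
    s2_upper = 0.0   # S2': far border
    for qi, (lo, hi) in zip(q_plus, cells):
        d_lo, d_hi = (qi - lo) ** 2, (qi - hi) ** 2
        s2_upper += max(d_lo, d_hi)
        if not lo <= qi <= hi:
            s2_lower += min(d_lo, d_hi)
    return s2_lower, s2_upper

def distance_bounds(s1: float, q_plus: Sequence[float],
                    cells: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """New bounds: (S1 + S2'', S1 + S2')."""
    s2_lower, s2_upper = unresolved_bounds(q_plus, cells)
    return s1 + s2_lower, s1 + s2_upper
```

For $q^+ = (0.2, 0.9)$ with cells $[0, 0.5]$ and $[0.5, 0.75]$, the first dimension contributes only to the upper bound (the query lies inside), while the second contributes $0.15^2$ to the lower bound.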

MBR Caching (1)
• Most sub cubes are (very) sparse, i.e., occupied by at most one FV
• Dense sub cubes allow a tighter approximation via MBRs
   – Restrict the number of MBRs in order to avoid a memory overhead
   – Ranking function for MBRs: $f(\mathrm{MBR}) = \mathrm{card}(\mathrm{MBR}) \cdot \frac{V_{\text{sub cube}}}{V_{\mathrm{MBR}}}$
   – 8-byte coordinates: the memory increase is limited to $\frac{16 \cdot d}{\mathrm{card}(\mathrm{MBR})}$ bytes per feature vector (+ pointer to Z-ID)
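The MBR of a sub cube, its ranking value $f(\mathrm{MBR})$ and the per-FV memory overhead can be computed as below. This is a sketch; the handling of degenerate cases is our own choice and not from the slides: sub cubes with fewer than two FVs get score 0 (never cached), and zero-volume MBRs get an infinite score (the limit of the formula).

```python
from math import inf, prod
from typing import List, Sequence, Tuple

def mbr(points: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Minimum bounding rectangle: per-dimension minimum and maximum."""
    cols = list(zip(*points))
    return [min(c) for c in cols], [max(c) for c in cols]

def volume(lo: Sequence[float], hi: Sequence[float]) -> float:
    return prod(h - l for l, h in zip(lo, hi))

def rank(points: Sequence[Sequence[float]],
         cube_lo: Sequence[float], cube_hi: Sequence[float]) -> float:
    """f(MBR) = card(MBR) * V_subcube / V_MBR (degenerate cases: assumed handling)."""
    if len(points) < 2:
        return 0.0                      # sparse sub cube: never cached
    v_mbr = volume(*mbr(points))
    if v_mbr == 0.0:
        return inf                      # flat MBR: limit of the formula
    return len(points) * volume(cube_lo, cube_hi) / v_mbr

def bytes_per_fv(d: int, card: int) -> float:
    """2*d coordinates of 8 bytes shared by card FVs (Z-ID pointer not counted)."""
    return 16 * d / card
```

The ranking favors MBRs that hold many FVs and cover only a small fraction of their sub cube, i.e. exactly the cases where the MBR borders are much tighter than the sub cube borders.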

MBR Caching (2)
• Limit the number of MBRs to 1% of the database size
• This threshold is a trade-off between pruning power and additional memory consumption
• Requirements:
   – either all MBRs can be kept in memory,
   – or the time for loading the MBRs is less than the time for resolving the respective FVs
• Adaptation of the equations for the lower and upper bounds
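A sketch of the 1% budget and of one possible bound adaptation, assuming that an FV whose sub cube has a cached MBR simply uses the MBR borders in place of the sub cube borders in $S_2'$ and $S_2''$. The slides do not spell out the adapted equations, so this substitution, the layout of the `scored` dictionary and all names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)

def select_mbrs(scored: Dict[int, Tuple[float, Box]], n_db: int) -> Dict[int, Box]:
    """Keep the best-ranked MBRs, at most 1% of the database size.

    scored maps a sub cube's Z-ID to (f(MBR), MBR); hypothetical layout.
    """
    budget = n_db // 100
    best = heapq.nlargest(budget, scored.items(), key=lambda kv: kv[1][0])
    return {zid: box for zid, (_, box) in best}

def adapted_bounds(s1: float, unresolved: Sequence[int], q: Sequence[float],
                   zid: int, cube: Box, cached: Dict[int, Box]) -> Tuple[float, float]:
    """(S_lower, S_upper) using the cached MBR of sub cube `zid` if there is one,
    otherwise the sub cube itself (assumed substitution)."""
    lo, hi = cached.get(zid, cube)
    lower = upper = s1
    for i in unresolved:
        near, far = sorted(((q[i] - lo[i]) ** 2, (q[i] - hi[i]) ** 2))
        upper += far                      # far border
        if not lo[i] <= q[i] <= hi[i]:
            lower += near                 # close border; 0 if q_i is inside
    return lower, upper
```

Since an MBR lies inside its sub cube, swapping in its borders can only raise the lower bound and lower the upper bound, which is where the extra pruning power comes from.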

Experimental Evaluation (1)
• Evaluated approaches:
   1. BondAdvanced (stricter bounds, but resolve-order dependent)
   2. Bond (original bounds)*
   3. Sequential*
   4. Beyond-1 (1 split)
   5. BeyondMBR-1 (1 split + MBRs)
   6. Beyond-2
   7. BeyondMBR-2
   8. Beyond-3*
   9. BeyondMBR-3*

Experimental Evaluation (2)
• Data set descriptions:

   Data Set     Dims   Size      Type
   ALOI         27     110,250   Color histograms, Zipfian
   CLUSTERED    20     500,000   Synthetic, 50 clusters, Gaussian
   PHOG [3]     110    10,715    CT histograms, PCA'ed
   SIFT [4]     133    335,583   Image features

[3] Graf, Kriegel, Schubert, Poelsterl, Cavallaro. 2D Image Registration in CT Images Using Radial Image Descriptors. MICCAI 2011.
[4] Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004.

Experimental Evaluation (3)
• Experimental settings:
   – 50 k-nearest neighbor queries
   – k = 10
   – Measure: cumulative number of pruned FVs after resolving each column, averaged over all queries
   – Area under the curve (AUC): data that is not resolved
   – Area over the curve (AOC): data that is resolved for refinement
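One reading of the AUC/AOC measure, assuming a unit-width step curve per column: if an FV is pruned after $c_v$ of $d$ columns, it is counted in the curve for the remaining $d - c_v$ steps, so the area under the curve equals the number of (FV, column) values that are never resolved, and the area over it (within $n \cdot d$) equals the resolved ones. Averaging over the 50 queries would average these curves pointwise. The input format and names are hypothetical.

```python
from typing import List, Sequence, Tuple

def pruning_curve(pruned_after: Sequence[int], d: int) -> List[int]:
    """p[j] = cumulative number of FVs pruned after resolving j columns, j = 0..d-1.

    pruned_after[v] is the number of columns resolved for FV v before it was
    pruned (d if it was never pruned). Hypothetical input format.
    """
    return [sum(1 for c in pruned_after if c <= j) for j in range(d)]

def auc_aoc(pruned_after: Sequence[int], d: int) -> Tuple[int, int]:
    """Unit-width step curve: AUC counts skipped (FV, column) values,
    AOC = n * d - AUC counts resolved ones."""
    auc = sum(pruning_curve(pruned_after, d))
    return auc, len(pruned_after) * d - auc
```

For three FVs pruned after 1, 2 and 4 of 4 columns, the curve is [0, 1, 2, 2]: 5 values are skipped and 1 + 2 + 4 = 7 are resolved.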

Experimental Evaluation (4)
[Figure: ALOI (27 dims, 110,250 FVs, color histograms, Zipfian); curves: BondAdvanced, Bond, Beyond-2, Beyond-1, BeyondMBR-1]

Experimental Evaluation (5)
[Figure: CLUSTERED (20 dims, 500,000 FVs, synthetic, 50 clusters, Gaussian); curves: BondAdvanced, Bond, Beyond-2, Beyond-1, BeyondMBR-1]

Experimental Evaluation (6)
[Figure: PHOG (110 dims, 10,715 FVs, CT histograms, PCA'ed); curves: BondAdvanced, Bond, Beyond-2, BeyondMBR-1, Beyond-1]
