Reconfigurable Inverted Index
Yusuke Matsui (1), Ryota Hinami (2), Shin’ichi Satoh (1)
(1) National Institute of Informatics, (2) The University of Tokyo
slides: https://bit.ly/2P0KuW1

Approximate nearest neighbor search
➢ Add: database vectors 𝒚_1, 𝒚_2, …, 𝒚_𝑂 are added to an ANN system (hash table, trees, inverted index, etc.)
➢ Search: given a query 𝒓 ∈ ℝ^𝐸, the system returns argmin_{𝑜 ∈ {1, …, 𝑂}} ‖𝒓 − 𝒚_𝑜‖_2^2
(Figure: a query vector is passed to the ANN system, which returns the result 𝒚_74.)

Related work
➢ Locality-sensitive hashing (LSH)
  - FALCONN [Andoni+, 15] [Razenshteyn+, 18]
➢ Projection/tree-based
  - FLANN [Muja+, 14]
  - Annoy [Bernhardsson, 18]
➢ Graph traversal
  - NSW/HNSW on NMSLIB [Malkov+, 16] [Boytsov+, 13]
➢ Product quantization (PQ)
  - IVFPQ on Faiss [Jégou+, 11] [Johnson+, 17], etc.
  - Our Reconfigurable Inverted Index

Subset search problem
Approximate NN search over a set of IDs 𝒯: for a query 𝒓 ∈ ℝ^𝐸, find argmin_{𝑜 ∈ 𝒯} ‖𝒓 − 𝒚_𝑜‖_2^2
➢ Existing ANN systems are fast when searching over all vectors
  - Search is over 𝒯 = {1, …, 𝑂}
➢ However, it is hard to run the search over a subset
  - Search is over 𝒯 ⊆ {1, …, 𝑂}
  - e.g., searching from 𝒚_1000, …, 𝒚_2000
  - Why? Systems are usually optimized for 𝒯 = {1, …, 𝑂}
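To make the objective concrete, here is a minimal brute-force sketch of the subset search problem. It is only an exact reference, not how any ANN system works internally. The function names, the toy vectors, and the 0-based IDs are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(query, database, target_ids=None):
    """Return the ID of the nearest database vector among target_ids.

    target_ids=None means the search runs over all IDs.
    """
    if target_ids is None:
        target_ids = range(len(database))
    return min(target_ids, key=lambda i: sq_dist(query, database[i]))


# Toy data (hypothetical values, 0-based IDs).
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
query = [1.0, 1.1]

nearest_all = exact_search(query, database)             # search over all IDs
nearest_subset = exact_search(query, database, [0, 2])  # search over a subset
print(nearest_all, nearest_subset)
```

The cost of this reference is linear in the number of target IDs, which is why it does not scale to large databases.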

There is a demand for subset search!
Proposal: Reconfigurable inverted index (Rii)
✓ Subset search
✓ Performance comparable to IVFPQ (Faiss)
✓ 10 ms for billion-scale data

Reconfigurable inverted index (Rii)
➢ Preliminary
  - PQ linear scan: fast if |𝒯| is small
  - IVFPQ: fast if |𝒯| is large
➢ Data structure
➢ Search
  - Cherry pick! Always fast
(Figures: runtime versus |𝒯| for each method.)

Preliminary: Product quantization (PQ) [Jégou+, TPAMI 11]
➢ PQ: compress a vector into a short code
➢ All database vectors 𝒚_1, 𝒚_2, …, 𝒚_𝑂 are PQ-encoded beforehand
(Figure: each database vector is mapped by PQ to a short code; the codes are numbered 1, 2, …, N.)
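As an illustration of the encoding step, the sketch below splits a vector into sub-vectors and replaces each one with the index of its nearest codeword. The codebooks are hand-written toy values (an assumption); in practice they would be trained, for example with k-means, which is not shown here. `pq_encode` is a hypothetical helper name.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec, codebooks):
    """Encode vec as one codeword index per sub-vector.

    codebooks[m] is the list of codewords for the m-th sub-space.
    """
    num_subspaces = len(codebooks)
    sub_dim = len(vec) // num_subspaces
    code = []
    for m, codebook in enumerate(codebooks):
        sub = vec[m * sub_dim:(m + 1) * sub_dim]
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(sub, codebook[k]))
        code.append(nearest)
    return code


# Toy codebooks (hypothetical): 2 sub-spaces with 2 codewords each.
codebooks = [
    [[0.0, 0.0], [5.0, 1.0]],  # codewords for the first two dimensions
    [[2.0, 1.0], [0.0, 6.0]],  # codewords for the last two dimensions
]
vec = [5.22, 0.54, 1.66, 0.74]
code = pq_encode(vec, codebooks)
print(code)
```

The four floats are reduced to two small integers, which is what makes it feasible to keep codes for a very large database in memory.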

Preliminary: Product quantization (PQ) [Jégou+, TPAMI 11]
➢ The subset search is possible with a cost linear in |𝒯|
  - e.g., 𝒯 = {2, 4, 5, 8}: the query 𝒓 ∈ ℝ^𝐸 is compared linearly with the PQ codes of items 2, 4, 5, and 8, and the item minimizing the distance 𝑒 between 𝒓 and its PQ code over 𝑜 ∈ 𝒯 is returned
➢ The search is efficient only if |𝒯| is small
(Figure: runtime versus |𝒯|, for |𝒯| up to 𝑂.)
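A minimal sketch of such a PQ linear scan over a subset follows. It assumes the common table-lookup formulation: distances from each query sub-vector to every codeword are computed once, and the distance to a code is then a sum of table entries. The codebooks, codes, and function names are hypothetical, and IDs are 0-based.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_distance_table(query, codebooks):
    """table[m][k] = squared distance from the m-th query sub-vector to codeword k."""
    sub_dim = len(query) // len(codebooks)
    return [
        [sq_dist(query[m * sub_dim:(m + 1) * sub_dim], codeword) for codeword in codebook]
        for m, codebook in enumerate(codebooks)
    ]


def pq_linear_scan(query, codes, codebooks, target_ids):
    """Return (best_id, approximate_distance) among target_ids only."""
    table = build_distance_table(query, codebooks)
    best_id, best_dist = None, float("inf")
    for i in target_ids:  # cost is linear in the size of the subset
        dist = sum(table[m][k] for m, k in enumerate(codes[i]))
        if dist < best_dist:
            best_id, best_dist = i, dist
    return best_id, best_dist


# Toy codebooks and PQ codes (hypothetical values).
codebooks = [
    [[0.0, 0.0], [5.0, 1.0]],
    [[2.0, 1.0], [0.0, 6.0]],
]
codes = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0]]
query = [4.8, 1.1, 0.2, 5.5]
target_ids = [0, 2, 3]

best_id, best_dist = pq_linear_scan(query, codes, codebooks, target_ids)
print(best_id, best_dist)
```

Only the codes listed in `target_ids` are touched, so the runtime grows with the subset size rather than with the database size.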

Reconfigurable inverted index (Rii)
➢ Preliminary
  - PQ linear scan: fast if |𝒯| is small
  - IVFPQ: fast if |𝒯| is large
➢ Data structure
➢ Search
  - Cherry pick! Always fast
➢ Evaluation

Preliminary: Inverted Index + PQ (IVFPQ) [Jégou+, TPAMI 11]
➢ Current basic data structure for large-scale search
➢ Subset search is efficient only if |𝒯| is large, e.g., 𝒯 = {13, 92, 105, …}
➢ Search for a query 𝒓 ∈ ℝ^𝐸:
  1. Find the closest space: 𝑙* = argmin_𝑙 ‖𝒓 − 𝒅_𝑙‖_2^2
  2. Focus on the 𝑙*-th space and accept only the items with 𝑜 ∈ 𝒯
  3. Re-rank the accepted items via PQ linear scan
➢ Why is it slow for small |𝒯|?
  - e.g., if 𝒯 is small and its items are far away from the query, we might need to scan all items
(Figure: space partitioning around centers 𝒅_1, 𝒅_2, 𝒅_5, …; each center keeps a list of item IDs, e.g., 13, 92, 188. Runtime versus |𝒯|, for |𝒯| up to 𝑂.)
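The three steps, and the slow case for a small subset, can be sketched as follows. Two points are assumptions rather than something the slides state: the loop moves on to the next-closest space whenever too few items were accepted, and the re-ranking distance is passed in as a function (here an exact distance stands in for the PQ linear scan). All names and toy values are hypothetical, and IDs are 0-based.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ivf_subset_search(query, centers, posting_lists, target_ids, item_dist, topk=1):
    """Return (ranked_ids, number_of_scanned_items)."""
    targets = set(target_ids)
    # Step 1: order the spaces by the distance between the query and their centers.
    order = sorted(range(len(centers)), key=lambda k: sq_dist(query, centers[k]))
    candidates, scanned = [], 0
    for k in order:
        # Step 2: scan the space and accept only items that belong to the subset.
        for i in posting_lists[k]:
            scanned += 1
            if i in targets:
                candidates.append(i)
        if len(candidates) >= topk:
            break  # enough accepted items; otherwise visit the next space
    # Step 3: re-rank the accepted items.
    ranked = sorted(candidates, key=lambda i: item_dist(query, i))[:topk]
    return ranked, scanned


# Toy data (hypothetical): 8 items partitioned into 3 spaces.
vectors = [[0, 1], [1, 0], [1, 1], [9, 10], [10, 11], [19, 20], [20, 21], [21, 20]]
centers = [[0, 0], [10, 10], [20, 20]]
posting_lists = [[0, 1, 2], [3, 4], [5, 6, 7]]
query = [0.9, 0.8]


def exact_dist(q, i):
    """Stand-in for the PQ-based distance used for re-ranking."""
    return sq_dist(q, vectors[i])


result_large = ivf_subset_search(query, centers, posting_lists, [0, 1, 2, 3, 5], exact_dist)
result_small = ivf_subset_search(query, centers, posting_lists, [7], exact_dist)
print(result_large, result_small)
```

With the larger subset, the closest space already contains accepted items and only 3 items are scanned. With the single far-away target, every space is visited and all 8 items are scanned, which mirrors the slow case described above.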

Data structure
➢ Store (1) PQ codes linearly, and (2) IDs as an inverted index
➢ Can run either PQ linear scan or IVFPQ with a single data structure
➢ Key: store codes linearly
  - cf. IVFPQ: PQ codes are also chunked per space, which is the natural design
➢ Slight, but critical change
(Figure: PQ codes stored in one linear array 1, 2, …, 13, …, N, next to an inverted index whose centers 𝒅_1, 𝒅_2, 𝒅_5, … keep only item IDs such as 126, 225 and 13, 92, 188.)
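A minimal sketch of this layout, assuming 0-based IDs and plain Python lists: the codes live in one linear array indexed by ID, and the inverted index holds IDs only. The class and method names are hypothetical and are not the Rii API.

```python
class ReconfigurableIndexSketch:
    """Toy layout: linear PQ codes plus posting lists of IDs."""

    def __init__(self, num_spaces):
        self.codes = []  # (1) PQ codes, stored linearly and indexed by ID
        self.posting_lists = [[] for _ in range(num_spaces)]  # (2) IDs only

    def add(self, code, space):
        """Append a code and register its ID in the posting list of its space."""
        new_id = len(self.codes)
        self.codes.append(code)
        self.posting_lists[space].append(new_id)
        return new_id

    def codes_for_ids(self, ids):
        """Access pattern of a PQ linear scan over a subset: direct lookup by ID."""
        return [self.codes[i] for i in ids]

    def codes_in_space(self, space):
        """Access pattern of IVFPQ: read IDs of a space, then fetch their codes."""
        return [(i, self.codes[i]) for i in self.posting_lists[space]]


index = ReconfigurableIndexSketch(num_spaces=2)
for code, space in [([1, 0], 0), ([0, 1], 1), ([1, 1], 0), ([0, 0], 1)]:
    index.add(code, space)

subset_codes = index.codes_for_ids([1, 2])
space0 = index.codes_in_space(0)
print(subset_codes, space0)
```

Because the codes are addressed by ID rather than grouped by space, the same storage serves both access patterns.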

Search
➢ Set a threshold 𝜄
➢ If |𝒯| is small, run PQ linear scan
➢ If |𝒯| is large, run IVFPQ
➢ Key: switch between the two methods based on |𝒯| ≶ 𝜄
(Figure: for a query 𝒓 ∈ ℝ^𝐸, codes are fetched from the linearly stored PQ codes. Runtime versus |𝒯|, for |𝒯| up to 𝑂: PQ linear scan is used below 𝜄 and IVFPQ above it.)
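The switching rule can be sketched as a small dispatcher. The two search routines are passed in as callables and replaced by stubs here, so this shows only the decision, not the searches themselves. Sending the case |𝒯| = 𝜄 to IVFPQ is an assumption, since the slides do not say which side the boundary falls on. All names are hypothetical.

```python
def search(query, target_ids, threshold, linear_scan, ivfpq):
    """Dispatch to one of two search routines based on the subset size."""
    if len(target_ids) < threshold:
        return linear_scan(query, target_ids)  # small subset
    return ivfpq(query, target_ids)            # large subset (ties go here by assumption)


# Stub routines that only record which method was chosen.
calls = []


def fake_linear_scan(query, target_ids):
    calls.append("pq_linear_scan")
    return target_ids[:1]


def fake_ivfpq(query, target_ids):
    calls.append("ivfpq")
    return target_ids[:1]


search([0.0], [3, 8], 100, fake_linear_scan, fake_ivfpq)
search([0.0], list(range(1000)), 100, fake_linear_scan, fake_ivfpq)
print(calls)
```

A good threshold depends on the data and the index configuration, so the value 100 above is only a placeholder.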

Evaluation
➢ SIFT1M (𝑂 = 10^6, 𝐸 = 128). Results for top-R search
➢ Existing system: Annoy
  - Forced to search a subset
➢ The existing system is slow, especially when |𝒯| is small
➢ The proposed Rii is always fast regardless of |𝒯| and 𝑆

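$ pip install rii
https://github.com/matsui528/rii

    import numpy as np
    import nanopq
    import rii

    # Toy data: an assumption for illustration, since the slide does not define Xt, X, q, or S
    Xt = np.random.random((1000, 128)).astype(np.float32)   # training vectors
    X = np.random.random((10000, 128)).astype(np.float32)   # database vectors
    q = np.random.random((128,)).astype(np.float32)         # query vector
    S = np.arange(1000, 2000, dtype=np.int64)               # target IDs, sorted

    # Prepare a PQ/OPQ codec with M=32 sub spaces
    codec = nanopq.PQ(M=32).fit(vecs=Xt)  # Trained using Xt

    # Instantiate a Rii class with the codec
    e = rii.Rii(fine_quantizer=codec)

    # Add vectors
    e.add_configure(vecs=X)

    # Search over the subset S only
    ids, dists = e.query(q=q, topk=3, target_ids=S)
    print(ids, dists)  # e.g., [7484 8173 1556] [15.062 15.385 16.169]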
