Using an Inverted Index Synopsis for Query Latency and Performance Prediction
Nicola Tonellotto, University of Pisa
nicola.tonellotto@unipi.it
The scale of the Web search challenge
How many documents? In how long?
• Reports suggest that Google considers a total of 30 trillion pages in the indexes of its search engine
• Identifies relevant results from these 30 trillion in 0.63 seconds
• Clearly this is a big data problem!
• To answer a user's query, a search engine doesn't read through all of those pages: the index data structures help it to efficiently find pages that effectively match the query and will help the user
• Effective: users want relevant search results
• Efficient: users aren't prepared to wait a long time for search results
Search as a Distributed Problem
• To achieve efficiency at Big Data scale, search engines use many servers
• [Diagram: a query broker/scheduler routes queries to query servers arranged in N shards × M replicas, each running a retrieval strategy; per-shard results are merged before being returned]
• N & M can be very big: Microsoft's Bing search engine has "hundreds of thousands of query servers"
Computing Platform
[Photo source: https://www.pexels.com/photo/datacenter-server-449401/]
Ranking in IR
If we know how long a query will take, can we reconfigure the search engine's ranking pipeline?
• [Diagram: query → First Stage (Base Ranker) → N documents → Second Stage (Top Ranker) → K documents → result page(s); sketched below]
• First stage: probabilistic models (e.g. BM25 + DAAT) over inverted indexes, few features, optimised sequential processing, returning 1,000 – 10,000 docs
• Second stage: machine-learned models (Learning to Rank), hundreds of features, different models, optimised processing, returning 10 – 100 docs
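A minimal sketch of this two-stage pipeline, with hypothetical stand-in components: a real system would run BM25 over an inverted index in the first stage and a learned model (e.g. gradient-boosted trees over hundreds of features) in the second.

```python
# Two-stage ranking sketch. `index` maps term -> {docid: score contribution};
# `feature_fn` and `model` are hypothetical placeholders for a Learning to
# Rank feature extractor and learned model.

def first_stage(query, index, n=1000):
    """Base ranker: cheaply score every matching document, keep the top n."""
    candidates = {d for t in query for d in index.get(t, {})}
    scored = [(doc, sum(index.get(t, {}).get(doc, 0.0) for t in query))
              for doc in candidates]
    return sorted(scored, key=lambda x: -x[1])[:n]

def second_stage(query, candidates, feature_fn, model, k=10):
    """Top ranker: expensive features + learned model, on few documents."""
    rescored = [(doc, model(feature_fn(query, doc))) for doc, _ in candidates]
    return sorted(rescored, key=lambda x: -x[1])[:k]
```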
Query Efficiency Prediction
• Predict how long an unseen query will take to execute, before it has executed
• This facilitates at least three ways to make a search engine more efficient:
1. Reconfigure the pipelines of the search engine, trading off a little effectiveness for efficiency
2. Apply more CPU cores to long-running queries
3. Decide how to plan the rewrites of a query, to reduce long-running queries
• In each case, increasing efficiency means increased server capacity and energy savings
Dynamic Pruning: MaxScore
[Diagram: docid space vs. score space for terms t1–t5 with score upper bounds σ1–σ5. As the threshold θ rises past cumulative upper bounds, posting lists move from OR (essential) to AND (non-essential) processing; critical docids mark the transitions]
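A minimal sketch of the essential/non-essential split at the heart of MaxScore, assuming `sigma` holds the per-term score upper bounds and `theta` the current top-k threshold; it illustrates the OR→AND transitions in the diagram rather than a full implementation.

```python
def maxscore_split(sigma, theta):
    """Partition term indices into essential and non-essential lists.

    Non-essential lists are the longest prefix of terms (in ascending order
    of upper bound) whose combined upper bounds cannot reach theta: a
    document occurring only in those lists can never enter the top k, so
    they are scored only for candidates found via the essential lists.
    """
    order = sorted(range(len(sigma)), key=lambda i: sigma[i])
    cumulative, non_essential = 0.0, []
    for i in order:
        if cumulative + sigma[i] <= theta:
            cumulative += sigma[i]
            non_essential.append(i)
        else:
            break
    essential = order[len(non_essential):]
    return essential, non_essential
```

As θ increases during processing, more lists become non-essential, mirroring the OR→AND transitions at the critical docids in the diagram.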
Dynamic Pruning: WAND
[Diagram: docid space vs. score space for terms t1–t3 with score upper bounds σ1–σ3. The summed upper bounds of term subsets (σ1+σ2+σ3, σ2+σ3, σ1+σ3, σ1+σ2, and single terms) are compared against the threshold θ to decide which term combinations can still beat it; critical docids mark the transitions between OR and AND processing]
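A minimal sketch of WAND's pivot selection, assuming each cursor exposes its current docid and its term's score upper bound; again an illustration of the diagram's logic, not a full implementation.

```python
def wand_pivot(cursors, theta):
    """Find the pivot: the first docid (in docid order) whose accumulated
    upper bounds could exceed theta. All smaller docids can be skipped,
    since no combination of lists covering them can beat the threshold."""
    accumulated = 0.0
    for cursor in sorted(cursors, key=lambda c: c["docid"]):
        accumulated += cursor["sigma"]
        if accumulated > theta:
            return cursor["docid"]
    return None  # no remaining document can enter the top k
```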
What makes a single query fast or slow?
• Number of terms (e.g. 2-term vs. 4-term queries)
• Length of posting lists
• Co-occurrence of query terms (posting list union/intersection)
• Query processing strategy (MaxScore, WAND, BMW)
Static QEP
• Static QEP (Macdonald et al., SIGIR 2012)
• a supervised learning task (see the sketch below)
• using pre-computed term-level features such as
• the length of the posting lists
• the variance of scored postings for each term
• Extended to long-running query classification on the Bing search engine infrastructure (Jeon et al., SIGIR 2014)
• Extended to rewritten queries that include complex query operators (Macdonald et al., SIGIR 2017)
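A minimal sketch of static QEP as a supervised regression task, in the spirit of Macdonald et al. (SIGIR 2012); the exact feature set and learner here are assumptions, illustrating only the shape of the approach (aggregating pre-computed term-level statistics over a query's terms).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def query_features(terms, stats):
    """Aggregate pre-computed term-level statistics into query-level features."""
    lengths = [stats[t]["posting_list_length"] for t in terms]
    return [len(terms), sum(lengths), max(lengths),
            min(lengths), float(np.var(lengths))]

# X: one feature row per training query; y: observed response times (ms).
# model = GradientBoostingRegressor().fit(X, y)
# predicted_ms = model.predict([query_features(["web", "search"], stats)])
```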
Analytical QEP
• Analytical QEP (Wu and Fang, CIKM 2014)
• analytical model of query processing efficiency
• key factor in their model was the number of documents containing pairs of query terms
• Intersection size not precomputed but estimated from (see the sketch below):
• N = number of docs in the collection
• N1 = t1 posting list length
• N2 = t2 posting list length
• ε = control parameter, set to 0.5
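The estimation formula itself did not survive slide extraction; the sketch below shows one plausible shape for an intersection estimate built from these quantities, interpolating between the term-independence estimate N1·N2/N and the hard upper bound min(N1, N2) via ε. This is an assumption for illustration, not Wu and Fang's exact model.

```python
def estimate_intersection(n, n1, n2, epsilon=0.5):
    """Hypothetical intersection-size estimate from posting list lengths.

    independence: expected overlap if the two terms occur independently.
    upper_bound:  the intersection can never exceed the shorter list.
    epsilon interpolates between the two extremes (0.5 in the slide).
    """
    independence = n1 * n2 / n
    upper_bound = min(n1, n2)
    return independence ** (1 - epsilon) * upper_bound ** epsilon
```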
Dynamic QEP
• Dynamic QEP (Kim et al., WSDM 2015)
• Predictions made after a short period of query processing has elapsed
• Able to determine how well a query is progressing
• Uses this period to better estimate the query's completion time
• Supervised learning task
• Must be periodically re-trained as new queries arrive
• The dynamic features are naturally biased towards the first portion of the index used to calculate them
• With various index orderings possible, it is plausible that the first portion of the index does not reflect the term distributions in the rest of the index well
• More accurate than predictions based on pre-computed features or an analytical model
Index Synopsis
[Diagram: δ-sampling selects a random subset of docids from each full posting list, yielding much shorter synopsis posting lists]
Can be used to estimate the expected number of documents processed in any query, processed either in OR mode (union of posting lists) or in AND mode (intersection of posting lists) — see the sketch below
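A minimal sketch of building a synopsis by sampling each document with probability δ, then scaling synopsis counts back by 1/δ to estimate union/intersection sizes on the full index; posting lists are simplified to sets of docids, and `terms` is assumed non-empty.

```python
import random

def build_synopsis(index, delta, seed=42):
    """Sample each document with probability delta; keep only sampled
    docids in every posting list (the same sample across all terms)."""
    rng = random.Random(seed)
    all_docs = {d for plist in index.values() for d in plist}
    sampled = {d for d in all_docs if rng.random() < delta}
    return {t: plist & sampled for t, plist in index.items()}

def estimate_or(synopsis, delta, terms):
    """Estimated union size on the full index (OR-mode processing)."""
    union = set().union(*(synopsis.get(t, set()) for t in terms))
    return len(union) / delta

def estimate_and(synopsis, delta, terms):
    """Estimated intersection size on the full index (AND-mode processing)."""
    lists = [synopsis.get(t, set()) for t in terms]
    return len(set.intersection(*lists)) / delta
```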
Research Questions 1. Compression of an index synopsis 2. Space overheads of an index synopsis 3. Time overheads of an index synopsis 4. Posting list estimates accuracy w.r.t. AND/OR retrieval 5. Posting list estimates accuracy w.r.t. dynamic pruning 6. Accuracy of overall response time prediction 7. Accuracy of long-running queries classification
Experimental Setup
• TREC ClueWeb09-B corpus (50 million English web pages)
• Indexing and retrieval using the Terrier IR platform
• Stopword removal and stemming
• Docids assigned according to descending PageRank score
• Compressed using Elias-Fano encoding
• Retrieving 50,000 unique queries from the TREC 2005 Efficiency Track topics
• Scoring with BM25, with a block size of 64 postings for BMW
• Retrieved 1,000 documents per query
• Learning performed with 4,000 training and 1,000 test queries
• All indices loaded in memory before processing starts
• Single core of an 8-core Intel i7-7770K with 64 GiB RAM
• Sampling probabilities δ = 0.001, 0.005, 0.01, 0.05
Compression & Space Overheads
[Charts: index and synopsis space overheads, with original docids vs. remapped docids]
Time Overheads
Union & Intersection Estimates Accuracy
[Charts: accuracy of posting list intersection and union size estimates, analytical model vs. index synopsis]
Actual vs. Synopsis Response Times
[Scatter plots: actual vs. synopsis response times for MaxScore, WAND and BMW]
Overall Response Time Accuracy
Long-running Query Classification
Query Performance Prediction
• QPP is another use case for index synopses
• Can we use a synopsis for post-retrieval QPP?
• Performance w.r.t. pre-retrieval QPP on the full index
• Performance w.r.t. post-retrieval QPP on the full index
• Main findings:
1. many of the post-retrieval predictors can be effective on very small synopsis indices
2. high correlations with the same predictors calculated on the full index
3. more effective than the best pre-retrieval predictors
4. computation requires an almost negligible amount of time
• More details in the journal article
Conclusions & Future Work
• QEP is a fundamental component that plans a query's execution appropriately
• Index synopses are random samples of complete document indices
• Able to reproduce the dynamic pruning behaviour of the MaxScore, WAND and BMW strategies on a full inverted index
• 0.5% of the original collection is enough to obtain accurate query efficiency predictions for dynamic pruning strategies
• Used to estimate the processing times of queries on the full index
• Post-retrieval query performance predictors calculated on an index synopsis can outperform pre-retrieval query performance predictors
• 0.1% of the original collection outperforms pre-retrieval predictors by 73%
• 5% of the original collection outperforms pre-retrieval predictors by 103%
• What about applying index synopses across a tiered index layout?
• What about sampling at snippet/paragraph granularity?
• How can document/snippet sampling be combined with a neural ranking model for first-pass retrieval to achieve efficient neural retrieval?
Thanks for your attention!