Efficient Document Scoring • VSM, session 5 • CS6200: Information Retrieval • Slides by: Jesse Anderton
Scoring Algorithm • This algorithm runs a query in a straightforward way: it computes a score for each candidate document and returns the best ones. • It assumes the existence of a few helper functions, and uses a max heap to find the top k items efficiently. • If IDF is used, the values of D (the number of documents) and df_t (the document frequency of term t) should be stored in the index for efficient retrieval.
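The sketch below illustrates the kind of term-at-a-time scoring loop described on this slide. The index layout (a dict from term to postings of (doc_id, tf)), the document-length table, and all function names are assumptions made for illustration, not the exact helpers from the course.

```python
# Minimal sketch of a straightforward scoring loop with a top-k heap.
import heapq
import math

def cosine_score(query_terms, index, doc_lengths, num_docs, k=10):
    """Return the top-k (score, doc_id) pairs for a free-text query.

    index: term -> list of (doc_id, tf); doc_lengths: doc_id -> vector length.
    Both structures are hypothetical stand-ins for the stored index.
    """
    scores = {}  # accumulator: doc_id -> partial dot product
    for term in query_terms:
        postings = index.get(term, [])
        if not postings:
            continue
        idf = math.log(num_docs / len(postings))  # df_t = len(postings)
        for doc_id, tf in postings:
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    # Normalize by document length, then keep only the k best with a heap.
    for doc_id in scores:
        scores[doc_id] /= doc_lengths[doc_id]
    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))
```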
Faster Scoring • We only care about relative document scores: any optimization that does not change the document ranking is safe. • If each query term appears only once, and all query terms are equally important, the query vector q has one nonzero entry per query term and all nonzero entries are equal. • The ranking is preserved if we replace q with a vector whose entries are all 1. This is equivalent to summing the document's term weights as a matching score.
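A minimal sketch of that simplification, assuming each document's weights are available as a sparse dict (a hypothetical structure used only for illustration):

```python
# With every query-term weight set to 1, the query-document score reduces
# to a sum of the document's term weights over the query terms.
def matching_score(query_terms, doc_weights):
    """doc_weights: term -> normalized term weight for one document."""
    return sum(doc_weights.get(term, 0.0) for term in set(query_terms))
```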
Faster, Approximate Scoring • If we prefer speed over finding the exact top k documents, we can filter out documents without calculating their cosine scores. ‣ Only consider documents containing high-IDF query terms. ‣ Only consider documents containing most (or all) of the query terms. ‣ For each term, pre-calculate the r highest-weight documents. Only consider documents which appear in these lists for at least one query term. ‣ If you have query-independent document quality scores (e.g., user rankings), pre-calculate the r highest-weight documents for each term as before, but rank by the sum of the term weight and the quality score. Proceed as above. • If these methods do not produce k documents, you can fall back and calculate scores for the documents you skipped. This requires keeping separate postings lists for the two passes through the index.
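A hedged sketch of the per-term list idea from the last two bullets: pre-compute the r documents with the highest weight for each term (optionally adding a quality score), and at query time only score documents drawn from those lists. All names and data layouts here are illustrative assumptions.

```python
import heapq

def build_champion_lists(index, r, quality=None):
    """index: term -> list of (doc_id, weight). Returns term -> top-r doc_ids.

    If a quality dict (doc_id -> score) is given, rank by weight + quality.
    """
    champions = {}
    for term, postings in index.items():
        keyed = [(w + (quality.get(d, 0.0) if quality else 0.0), d)
                 for d, w in postings]
        champions[term] = [d for _, d in heapq.nlargest(r, keyed)]
    return champions

def candidate_documents(query_terms, champions):
    """Union of the pre-computed lists for the query terms; only these
    candidates receive full cosine scores."""
    candidates = set()
    for term in query_terms:
        candidates.update(champions.get(term, []))
    return candidates
```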
Cluster Pruning • When building the index, select √D “leader” documents at random. • All other documents are “followers,” and each is assigned to its nearest leader (using cosine similarity). • At query time: ‣ Compare the query to each leader to choose the closest. ‣ Compare the query to all followers of that closest leader. • Variant: assign each follower to its closest b_1 leaders; compare the query to the followers of the closest b_2 leaders.
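The sketch below shows the basic (non-variant) scheme under simplifying assumptions: documents and the query are sparse dicts (term -> weight), and a plain cosine over those dicts stands in for whatever similarity the index computes. It is illustrative only, not the course implementation.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_clusters(docs):
    """docs: doc_id -> vector. Pick sqrt(|D|) random leaders; attach each
    remaining document to its nearest leader."""
    doc_ids = list(docs)
    leaders = random.sample(doc_ids, max(1, int(math.sqrt(len(doc_ids)))))
    followers = {leader: [] for leader in leaders}
    for doc_id, vec in docs.items():
        if doc_id in followers:
            continue  # leaders are not followers of anyone
        nearest = max(leaders, key=lambda l: cosine(vec, docs[l]))
        followers[nearest].append(doc_id)
    return followers

def prune_and_search(query_vec, docs, followers, k=10):
    """Compare the query to leaders only, then to that leader's followers."""
    best_leader = max(followers, key=lambda l: cosine(query_vec, docs[l]))
    candidates = [best_leader] + followers[best_leader]
    scored = [(cosine(query_vec, docs[d]), d) for d in candidates]
    return sorted(scored, reverse=True)[:k]
```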
Wrapping Up • There are many optimizations we can consider, but they focus on a few key ideas: ‣ For exact scoring, find ways to mathematically deduce the document ranking without calculating the full cosine similarity. ‣ For approximate scoring, identify query terms or documents that can safely be ignored, reducing the necessary calculations without hurting search quality too much. • Next, we’ll compare the performance of several VSM techniques.