CS6200: Information Retrieval
Evaluation, Session 4: Modeling Relevance Gain
Expected Relevance Gain

All of the measures we've seen so far can be expressed in a different way. Let

$P(i)$ := the probability that the user reads document $i$, given by a user model
$R(r)$ := the fraction of the documents the user reads from ranking $r$ which are relevant

Then the expected relevance gain of a ranking is

$$\mathrm{gain}(r) := \mathbb{E}_P[R(r)] = \sum_{i=1}^{|r|} P(i) \cdot r_i$$

where $r_i$ is the binary relevance of the document at rank $i$. The user model gives the probability of the user reading each document in the ranking. With these probabilities, we can calculate the expected amount of relevance the user would gain from the ranking.
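To make this concrete, here is a minimal Python sketch of the expected-gain computation, assuming binary relevance labels; the function name expected_gain and the choice to represent the user model as a callable are illustrative, not from the slides.

```python
def expected_gain(r, P):
    """Expected relevance gain: sum of P(i) * r_i over all ranks i.

    r -- list of binary relevance labels; r[0] is the document at rank 1
    P -- user model: P(i) is the probability of reading the doc at rank i
    """
    return sum(P(i) * rel for i, rel in enumerate(r, start=1))
```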
Precision@k

$$P_{\mathrm{prec}@k}(i) := \begin{cases} 1/k & \text{if } i \le k \\ 0 & \text{otherwise} \end{cases}$$

For precision@k, we model the user as having equal probability of reading each of the top $k$ documents and zero probability of reading anything else.

$$\mathbb{E}_{P_{\mathrm{prec}@k}}[R(r)] = \sum_{i=1}^{|r|} P_{\mathrm{prec}@k}(i) \cdot r_i = \sum_{i=1}^{k} \frac{1}{k}\, r_i = \frac{1}{k} \sum_{i=1}^{k} r_i$$

Is this a reasonable user model?
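Under these definitions, the precision@k user model plugs directly into the sketch above; precision_at_k_model is a hypothetical name.

```python
def precision_at_k_model(k):
    """Precision@k user model: read each of the top k documents with
    probability 1/k, and nothing below rank k."""
    return lambda i: 1.0 / k if i <= k else 0.0

# Example: with relevance labels [1, 0, 1, 1, 0],
# expected_gain([1, 0, 1, 1, 0], precision_at_k_model(3)) == 2/3,
# exactly precision@3.
```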
Scaled DCG

DCG and nDCG don't normalize easily for this framework, so instead we introduce a related measure: Scaled DCG, or sdcg.

$$P_{\mathrm{sdcg}@k}(i) := \begin{cases} \frac{1}{Z} \cdot \frac{1}{\lg(i+1)} & \text{if } i \le k \\ 0 & \text{otherwise} \end{cases} \qquad Z := \sum_{i=1}^{k} \frac{1}{\lg(i+1)}$$

$$\mathrm{sdcg}@k(r) := \sum_{i=1}^{\infty} r_i \, P_{\mathrm{sdcg}@k}(i) = \frac{1}{Z} \sum_{i=1}^{k} \frac{r_i}{\lg(i+1)}$$

This user model is top-weighted: the probability of observing a document is higher for top-ranked documents.
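A sketch of the corresponding user model, assuming lg is log base 2 as in DCG; sdcg_at_k_model is a hypothetical name.

```python
import math

def sdcg_at_k_model(k):
    """Scaled DCG user model: P(i) proportional to 1/lg(i+1) over the
    top k ranks, with Z normalizing the probabilities to sum to 1."""
    Z = sum(1.0 / math.log2(i + 1) for i in range(1, k + 1))
    return lambda i: 1.0 / (Z * math.log2(i + 1)) if i <= k else 0.0
```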
Probability of Continuing

So far, we have reconsidered the measures based on the probability of the user observing a document. It's sometimes useful to instead consider the probability of the user continuing past a given document: if they read doc $i$, will they read doc $i+1$?

$$C_M(i) := \frac{P_M(i+1)}{P_M(i)}$$

$$C_{\mathrm{prec}@k}(i) := \begin{cases} 1 & \text{if } i < k \\ 0 & \text{otherwise} \end{cases} \qquad C_{\mathrm{sdcg}@k}(i) := \begin{cases} \frac{\lg(i+1)}{\lg(i+2)} & \text{if } i < k \\ 0 & \text{otherwise} \end{cases}$$
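Since $C_M$ is defined as a ratio, it can be derived mechanically from any user model. This helper is an illustrative sketch, with a guard for ranks the user never reaches.

```python
def continuation_probability(P, i):
    """C_M(i) = P_M(i+1) / P_M(i): the probability that a user who read
    the document at rank i goes on to read the one at rank i + 1."""
    return P(i + 1) / P(i) if P(i) > 0 else 0.0

# Example: continuation_probability(precision_at_k_model(5), 3) == 1.0,
# while continuation_probability(precision_at_k_model(5), 5) == 0.0.
```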
Rank-biased Precision

Rank-biased precision is the measure we get if we imagine that the user has some fixed probability, $p$, of continuing.

$$P_{\mathrm{rbp}}(i) := (1 - p)\, p^{i-1} \qquad C_{\mathrm{rbp}}(i) := p$$

This hypothetical user flips a $p$-biased coin at each document to decide when to give up. On average, this user will read $1/(1-p)$ documents before giving up.
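The RBP user model as a sketch in the same style; rbp_model is a hypothetical name.

```python
def rbp_model(p):
    """Rank-biased precision user model: P(i) = (1 - p) * p^(i - 1),
    so the continuation probability C(i) is the constant p."""
    return lambda i: (1.0 - p) * p ** (i - 1)

# With p = 0.8, the user reads 1 / (1 - 0.8) = 5 documents on average.
```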
Inverse Squares

This form of Inverse Squares (by Moffat et al., 2012) is built on the intuition that the probability of continuing depends on the number of documents the user expects to need to satisfy her information need. Its parameter $T$ is the anticipated number of documents.

Let $S_m := \frac{\pi^2}{6} - \sum_{i=1}^{m} \frac{1}{i^2}$. Then:

$$P_{\mathrm{insq}}(T, i) := \frac{1}{S_{2T-1}} \cdot \frac{1}{(i + 2T - 1)^2} \qquad C_{\mathrm{insq}}(T, i) = \frac{(i + 2T - 1)^2}{(i + 2T)^2}$$

• For nav queries, $T \approx 1$
• For info queries, $T \gg 1$
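A sketch of the inverse-squares user model under the definitions above; $S_m$ is the tail of $\sum 1/i^2$ past rank $m$, which is what makes $P_{\mathrm{insq}}$ sum to 1. insq_model is a hypothetical name.

```python
import math

def insq_model(T):
    """Inverse-squares user model with anticipated need T.

    S = pi^2/6 - sum_{i=1}^{m} 1/i^2 is the tail of the inverse-squares
    series past rank m; with m = 2T - 1 it normalizes P to sum to 1.
    """
    m = 2 * T - 1
    S = math.pi ** 2 / 6 - sum(1.0 / i ** 2 for i in range(1, m + 1))
    return lambda i: 1.0 / (S * (i + m) ** 2)
```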
Average Precision

A final way to model user behavior is based on the probability that document $i$ is the last document read:

$$L_M(i) := \frac{P_M(i) - P_M(i+1)}{P_M(1)}$$

This gives an interpretation for Average Precision: the expected relevance gained from the user choosing a relevant document $i$ uniformly at random, and reading all documents from 1 to $i$.

$$L_{\mathrm{ap}}(i) := \begin{cases} r_i / R & \text{if } R > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $R$ is the total number of relevant documents in the ranking. Imagine that exactly one of the relevant documents will satisfy the user, but we don't know which one.
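A sketch of AP computed directly from this last-document reading model, assuming binary relevance; average_precision is a hypothetical name.

```python
def average_precision(r):
    """AP under the last-document model: a relevant rank i is chosen
    uniformly at random (probability 1/R each), and the user reads
    ranks 1..i, gaining precision@i. AP is the expectation, i.e. the
    mean of precision@i over the relevant ranks."""
    R = sum(r)
    if R == 0:
        return 0.0
    total, relevant_seen = 0.0, 0
    for i, rel in enumerate(r, start=1):
        relevant_seen += rel
        if rel:
            total += relevant_seen / i  # precision at rank i
    return total / R
```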
Wrapping Up

Evaluation metrics should be carefully chosen to be well-suited to the users and task you're trying to measure. Understanding the user model underlying a given metric can help shed light on what you're really measuring.

Next, we'll look at the construction and use of test collections.