Evaluation Metrics
Presented by Dawn Lawrie
Some Possibilities
- Precision
- Recall
- F-measure
- Mean Average Precision
- Mean Reciprocal Rank
Precision
- Proportion of things of interest in some set
- Example: I'm interested in apples; the set contains 5 pieces of fruit, 3 of which are apples
- Precision = 3 apples / 5 pieces of fruit = 0.6
Recall
- Proportion of things of interest in the set out of all the things of interest
- Example: I'm looking for apples; there are 6 apples in total, and the set contains 3 of them
- Recall = 3 apples / 6 total apples = 0.5
F-measure
- Harmonic mean of precision and recall
- Combined measure that values each the same
- F1 = (2 * precision * recall) / (precision + recall)
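As a quick illustration of the three definitions above, here is a minimal Python sketch (not part of the original slides) that computes set-based precision, recall, and F1 for the apple example; the function and variable names are my own.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 (harmonic mean of the two)."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Apple example: 5 pieces of fruit retrieved, 3 of them apples; 6 apples exist in total.
retrieved = {"apple1", "apple2", "apple3", "orange1", "banana1"}
relevant = {"apple1", "apple2", "apple3", "apple4", "apple5", "apple6"}
print(precision_recall_f1(retrieved, relevant))  # (0.6, 0.5, ~0.545)
```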
Where to use
- The set is well-defined
- Order of things in the set doesn't matter
But with a Ranked List
[Figure: a ranked list of 10 results, positions 1 through 10]
Mean Average Precision
- Also known as MAP
- Favored IR metric for ranked retrieval
Computing Average Precision
- The ordered list is a ranked list; let Relevant = the set of apples
- AP(Relevant) = (1 / |Relevant|) * Σ_{r ∈ Relevant} Precision(Rank(r))
- Example: the apples appear at ranks 2, 3, 6, 10, 11, and 12
- AP = (1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12) / 6 ≈ 0.50
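A minimal sketch of the average precision computation above, assuming relevance is given as the (1-based) ranks at which relevant items appear; the function name is my own.

```python
def average_precision(relevant_ranks, num_relevant=None):
    """AP = mean of precision measured at the rank of each relevant item.

    relevant_ranks: 1-based ranks at which relevant items were retrieved.
    num_relevant: total number of relevant items (defaults to the number retrieved).
    """
    ranks = sorted(relevant_ranks)
    if num_relevant is None:
        num_relevant = len(ranks)
    if num_relevant == 0:
        return 0.0
    # Precision at rank r = (number of relevant items seen so far) / r
    return sum((i + 1) / r for i, r in enumerate(ranks)) / num_relevant

# Apples retrieved at ranks 2, 3, 6, 10, 11, and 12:
print(average_precision([2, 3, 6, 10, 11, 12]))  # ~0.504
```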
Compute MAP
- Compute the average of AP over a query set, e.g. an apple query, a blueberry query, a pineapple query, and a banana query
- MAP(Query) = (1 / |Query|) * Σ_{q ∈ Query} AP(Relevant(q))
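A sketch of MAP over a query set, reusing the average_precision helper from the previous sketch; the per-query rank lists below are made up purely for illustration.

```python
def mean_average_precision(ap_by_query):
    """MAP = mean of the per-query average precision values."""
    return sum(ap_by_query.values()) / len(ap_by_query)

# Hypothetical ranks of relevant documents for four queries:
ap_by_query = {
    "apple":     average_precision([2, 3, 6, 10, 11, 12]),
    "blueberry": average_precision([1, 4]),
    "pineapple": average_precision([3]),
    "banana":    average_precision([1, 2, 5]),
}
print(mean_average_precision(ap_by_query))
```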
Limitation of MAP
- Results can be biased for query sets that include queries with few relevant documents
Mean Reciprocal Rank
- Reciprocal Rank: RR(q) = 0 if q retrieves no relevant documents; otherwise RR(q) = 1 / TopRank(q), where TopRank(q) is the rank of the highest-ranked relevant document
- MRR(Query) = (1 / |Query|) * Σ_{q ∈ Query} RR(q)
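A minimal sketch of these two formulas, assuming each query's result is summarized by the rank of its top relevant document (or None if nothing relevant was retrieved); the function names are my own.

```python
def reciprocal_rank(top_rank):
    """RR(q): 0 if no relevant document was retrieved, else 1 / rank of the first one."""
    return 0.0 if top_rank is None else 1.0 / top_rank

def mean_reciprocal_rank(top_ranks):
    """MRR: mean of the per-query reciprocal ranks."""
    return sum(reciprocal_rank(r) for r in top_ranks) / len(top_ranks)

print(mean_reciprocal_rank([1, 3, None, 2]))  # (1 + 1/3 + 0 + 1/2) / 4 ~= 0.458
```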
Understanding MRR
  Rank   RR value
  5      0.2
  15     0.067
  205    0.0049
  215    0.0047
- Average rank: 110
- MRR: 0.069
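Reproducing the numbers above with the mean_reciprocal_rank sketch, where the four ranks are the top-relevant ranks of four hypothetical queries:

```python
ranks = [5, 15, 205, 215]
print(sum(ranks) / len(ranks))        # average rank: 110.0
print(mean_reciprocal_rank(ranks))    # MRR: ~0.069
```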
MRR vs. Average Rank
MRR:
- MRR = MAP when there is exactly one relevant document
- Result is bounded between 0 and 1; 1 is perfect retrieval
- Minimizes the difference between large ranks such as 750 and 900
Average rank:
- High ranks do not reflect the importance of those documents in practice
- Greatly influenced by documents retrieved at large ranks
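A small added illustration of this contrast (not from the slides): moving one query's top relevant result from rank 750 to rank 900 barely changes MRR but shifts the average rank noticeably. It reuses mean_reciprocal_rank from the earlier sketch.

```python
def average_rank(ranks):
    """Mean of the per-query top-relevant ranks."""
    return sum(ranks) / len(ranks)

before = [5, 15, 205, 750]
after = [5, 15, 205, 900]

print(average_rank(before), average_rank(after))                  # 243.75 vs 281.25
print(mean_reciprocal_rank(before), mean_reciprocal_rank(after))  # ~0.0682 vs ~0.0682 (nearly unchanged)
```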
Take Home Message
- Precision/Recall and F-measure are good for well-defined sets
- MAP is good for ranked results when you're looking for 5 or more things
- MRR is good for ranked results when you're looking for fewer than 5 things, and best when you're looking for just 1 thing