Evaluation Metrics
Presented by Dawn Lawrie
Some Possibilities
- Precision
- Recall
- F-measure
- Mean Average Precision
- Mean Reciprocal Rank
Precision
- Proportion of things of interest in some set
- Example: I'm interested in apples; the set contains 5 pieces of fruit, 3 of which are apples
- Precision = 3 apples / 5 pieces of fruit = 0.6
Recall
- Proportion of things of interest in the set out of all the things of interest
- Example: I'm looking for apples; there are 6 apples in total, and the set contains 3 of them
- Recall = 3 apples / 6 total apples = 0.5
F-measure
- Harmonic mean of precision and recall
- Combined measure that values each the same
- F1 = (2 * precision * recall) / (precision + recall)
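As a quick illustration of the three definitions above, here is a minimal Python sketch (not part of the original slides) that computes set-based precision, recall, and F1 for the apple example; the function and variable names are my own.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 (harmonic mean of the two)."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Apple example: 5 pieces of fruit retrieved, 3 of them apples; 6 apples exist in total.
retrieved = {"apple1", "apple2", "apple3", "orange1", "banana1"}
relevant = {"apple1", "apple2", "apple3", "apple4", "apple5", "apple6"}
print(precision_recall_f1(retrieved, relevant))  # (0.6, 0.5, ~0.545)
```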
Where to use
- The set is well-defined
- Order of things in the set doesn't matter
But with a Ranked List
[Figure: a ranked list of 10 results, positions 1 through 10]
Mean Average Precision
- Also known as MAP
- Favored IR metric for ranked retrieval
Computing Average Precision
- The ordered list is a ranked list; let Relevant = the set of apples
- AP(Relevant) = (1 / |Relevant|) * Σ_{r ∈ Relevant} Precision(Rank(r))
- Example: the apples appear at ranks 2, 3, 6, 10, 11, and 12
- AP = (1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12) / 6 ≈ 0.50
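A minimal sketch of the average precision computation above, assuming relevance is given as the (1-based) ranks at which relevant items appear; the function name is my own.

```python
def average_precision(relevant_ranks, num_relevant=None):
    """AP = mean of precision measured at the rank of each relevant item.

    relevant_ranks: 1-based ranks at which relevant items were retrieved.
    num_relevant: total number of relevant items (defaults to the number retrieved).
    """
    ranks = sorted(relevant_ranks)
    if num_relevant is None:
        num_relevant = len(ranks)
    if num_relevant == 0:
        return 0.0
    # Precision at rank r = (number of relevant items seen so far) / r
    return sum((i + 1) / r for i, r in enumerate(ranks)) / num_relevant

# Apples retrieved at ranks 2, 3, 6, 10, 11, and 12:
print(average_precision([2, 3, 6, 10, 11, 12]))  # ~0.504
```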
Compute MAP
- Compute the average of AP over a query set, e.g. an apple query, a blueberry query, a pineapple query, and a banana query
- MAP(Query) = (1 / |Query|) * Σ_{q ∈ Query} AP(Relevant(q))
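A sketch of MAP over a query set, reusing the average_precision helper from the previous sketch; the per-query rank lists below are made up purely for illustration.

```python
def mean_average_precision(ap_by_query):
    """MAP = mean of the per-query average precision values."""
    return sum(ap_by_query.values()) / len(ap_by_query)

# Hypothetical ranks of relevant documents for four queries:
ap_by_query = {
    "apple":     average_precision([2, 3, 6, 10, 11, 12]),
    "blueberry": average_precision([1, 4]),
    "pineapple": average_precision([3]),
    "banana":    average_precision([1, 2, 5]),
}
print(mean_average_precision(ap_by_query))
```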
Limitation of MAP
- Results can be biased for query sets that include queries with few relevant documents
Mean Reciprocal Rank
- Reciprocal Rank: RR(q) = 0 if q retrieves no relevant documents; otherwise RR(q) = 1 / TopRank(q), where TopRank(q) is the rank of the highest-ranked relevant document
- MRR(Query) = (1 / |Query|) * Σ_{q ∈ Query} RR(q)
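A minimal sketch of these two formulas, assuming each query's result is summarized by the rank of its top relevant document (or None if nothing relevant was retrieved); the function names are my own.

```python
def reciprocal_rank(top_rank):
    """RR(q): 0 if no relevant document was retrieved, else 1 / rank of the first one."""
    return 0.0 if top_rank is None else 1.0 / top_rank

def mean_reciprocal_rank(top_ranks):
    """MRR: mean of the per-query reciprocal ranks."""
    return sum(reciprocal_rank(r) for r in top_ranks) / len(top_ranks)

print(mean_reciprocal_rank([1, 3, None, 2]))  # (1 + 1/3 + 0 + 1/2) / 4 ~= 0.458
```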
Understanding MRR
  Rank   RR value
  5      0.2
  15     0.067
  205    0.0049
  215    0.0047
- Average rank: 110
- MRR: 0.069
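Reproducing the numbers above with the mean_reciprocal_rank sketch, where the four ranks are the top-relevant ranks of four hypothetical queries:

```python
ranks = [5, 15, 205, 215]
print(sum(ranks) / len(ranks))        # average rank: 110.0
print(mean_reciprocal_rank(ranks))    # MRR: ~0.069
```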
MRR vs. Average Rank
MRR:
- MRR = MAP when there is exactly one relevant document
- Result is bounded between 0 and 1; 1 is perfect retrieval
- Minimizes the difference between large ranks such as 750 and 900
Average rank:
- High ranks do not reflect the importance of those documents in practice
- Greatly influenced by documents retrieved at large ranks
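A small added illustration of this contrast (not from the slides): moving one query's top relevant result from rank 750 to rank 900 barely changes MRR but shifts the average rank noticeably. It reuses mean_reciprocal_rank from the earlier sketch.

```python
def average_rank(ranks):
    """Mean of the per-query top-relevant ranks."""
    return sum(ranks) / len(ranks)

before = [5, 15, 205, 750]
after = [5, 15, 205, 900]

print(average_rank(before), average_rank(after))                  # 243.75 vs 281.25
print(mean_reciprocal_rank(before), mean_reciprocal_rank(after))  # ~0.0682 vs ~0.0682 (nearly unchanged)
```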
Take Home Message
- Precision/Recall and F-measure are good for well-defined sets
- MAP is good for ranked results when you're looking for 5 or more things
- MRR is good for ranked results when you're looking for fewer than 5 things, and best when you're looking for just 1 thing