Search Quality Evaluation Tools and Techniques


  1. Search Quality Evaluation Tools and Techniques. Alessandro Benedetti, Software Engineer. Andrea Gazzarini, Software Engineer. 2nd October 2018

  2. Who we are: Alessandro Benedetti ▪ Search Consultant ▪ R&D Software Engineer ▪ Master in Computer Science ▪ Apache Lucene/Solr Enthusiast ▪ Passionate about semantic search, NLP and machine learning technologies ▪ Beach Volleyball Player & Snowboarder

  3. Who we are: Andrea Gazzarini, “Gazza” ▪ Software Engineer (1999-) ▪ “Hermit” Software Engineer (2010-) ▪ Passionate about Java & Information Retrieval ▪ Apache Qpid (past) Committer ▪ Husband & Father ▪ Bass Player

  4. Sease: Search Services ● Open Source Enthusiasts ● Apache Lucene/Solr Experts ● Community Contributors ● Active Researchers ● Hot Trends: Learning To Rank, Document Similarity, Measuring Search Quality, Relevancy Tuning

  5. Agenda
 ➢ Search Quality Evaluation
  ‣ Context overview
  ‣ Correctness
  ‣ Evaluation Measures
 ➢ Rated Ranking Evaluator (RRE)
 ➢ Future Works
 ➢ Q&A

  6. Search Quality Evaluation: Context Overview
Search engineering is the production of quality search systems. Search quality (and, in general, software quality) is a huge topic which can be described using internal and external factors.
 ‣ Internal factors (what engineers primarily focus on): understandability, reusability, modularity, maintainability, efficiency, testability, extendibility, readability.
 ‣ External factors (what users and customers perceive): correctness, robustness, efficiency, timeliness.
In the end, only external factors matter, those that can be perceived by users and customers, but the key to getting optimal levels of those external factors is the internal ones. One of the main differences between search quality and software quality (especially from a correctness perspective) is in the ok/ko judgment, which is, in general, more “deterministic” in the case of software development.

  7. Search Quality Evaluation: Correctness
Correctness is the ability of a system to perform its exact task, as defined by its specification. The search domain is critical from this perspective because correctness depends on arbitrary user judgments.
The slide contrasts two scenarios: a new system (“Here are the requirements”) evolving through internal iterations (v0.1, ..., v0.9) up to the v1.0 release, followed by external iterations a month later (“We found a bug”, “We have a change request”: v1.1, v1.2, v1.3, ..., v2.0), and an existing system (“We need to improve our search system, users are complaining about junk in search results”). For each internal (gray) and external (red) iteration we need to find a way to measure correctness.
Evaluation measures for an information retrieval system are used to assert how well the search results satisfy the user's query intent. How can we know where our system is going between versions, in terms of correctness?

  8. Search Quality Evaluation: Measures
Evaluation measures for an information retrieval system try to formalise how well a search system satisfies its users' information needs. Measures are generally split into two categories:
 ‣ Online measures: click-through rate, session abandonment rate, zero result rate, session success rate, ...
 ‣ Offline measures: precision, recall, F-measure, NDCG, mean reciprocal rank, average precision, ...
In this context we are mainly focused on offline measures. We will talk about something that can help a search engineer during their ordinary day (i.e. in the phases previously called “internal iterations”), and we will also see how the same tool can be used more broadly, for example contributing to the continuous integration pipeline or even delivering value to functional stakeholders.
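To make the offline measures concrete, the following is a minimal Java sketch (illustrative only, not RRE code) that computes two of them, Precision@k and Average Precision, for a single query, given the ranked result ids and the set of ids judged relevant; class, method and document names are hypothetical.

import java.util.List;
import java.util.Set;

public class OfflineMeasures {

    // Precision@k: fraction of the top k results that are judged relevant.
    static double precisionAt(int k, List<String> ranking, Set<String> relevant) {
        long hits = ranking.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / k;
    }

    // Average Precision: mean of Precision@i over the rank positions i holding a relevant document.
    static double averagePrecision(List<String> ranking, Set<String> relevant) {
        double sum = 0.0;
        int hits = 0;
        for (int i = 0; i < ranking.size(); i++) {
            if (relevant.contains(ranking.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return relevant.isEmpty() ? 0.0 : sum / relevant.size();
    }

    public static void main(String[] args) {
        List<String> ranking = List.of("d3", "d7", "d1", "d9", "d4"); // hypothetical ranked result ids
        Set<String> relevant = Set.of("d1", "d3", "d4");              // hypothetical relevance judgments
        System.out.println(precisionAt(3, ranking, relevant));        // 2 relevant in the top 3 -> 0.666...
        System.out.println(averagePrecision(ranking, relevant));      // (1/1 + 2/3 + 3/5) / 3 -> 0.755...
    }
}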

  9. Agenda
 ✓ Search Quality Evaluation
 ➢ Rated Ranking Evaluator (RRE)
  ‣ What is it?
  ‣ How does it work?
  ‣ Evaluation Process
  ‣ Input & Output
  ‣ Challenges
 ➢ Future Works
 ➢ Q&A

  10. RRE: What is it? https://github.com/SeaseLtd/rated-ranking-evaluator
 • A set of search quality evaluation tools
 • A search quality evaluation framework
 • Multi (search) platform
 • Written in Java
 • It can also be used in non-Java projects
 • Licensed under Apache 2.0
 • Open to contributions
 • Extremely dynamic!

  11. RRE: At a glance. Project statistics at two points in time (the earlier Apache Lucene/Solr London talk vs. now): 2 people, 10 modules, 48,950 lines of code after 2 months; then 2 people, 10 modules, 67,317 lines of code after 5 months.

  12. RRE: Ecosystem
The picture illustrates the main modules composing the RRE ecosystem: Core, Search Platform API, RequestHandler, RRE Server, RRE CLI, the Maven archetypes, the reporting plugin and the search platform plugins. All modules with a dashed border are planned for a future release. RRE CLI has a double border because, although the rre-cli module hasn't been developed yet, you can run RRE from a command line using the RRE Maven archetype, which is part of the current release.
The current implementation includes two target search platforms: Apache Solr and Elasticsearch. The Search Platform API module provides a search platform abstraction for plugging in additional search systems.
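As an illustration of what such a plug-in abstraction could look like, here is a hypothetical Java sketch (interface and method names are assumptions, not the real RRE Search Platform API): supporting a new engine would mean implementing a small lifecycle-plus-search contract.

import java.io.File;
import java.util.List;
import java.util.Map;

// Hypothetical search platform abstraction; names do not reproduce the actual RRE API.
interface SearchPlatform {

    // Spin up the target engine (e.g. an embedded Solr core or an Elasticsearch node) with the given configuration.
    void start(File configuration);

    // Load the test corpus into the given index before running the rated queries.
    void index(String indexName, File corpus);

    // Execute a query and return the top maxRows document ids, in rank order,
    // so that the framework can compute its metrics against the ratings.
    List<String> search(String indexName, Map<String, String> queryParameters, int maxRows);

    // Tear down whatever start(...) created.
    void shutdown();
}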

  13. RRE: Available metrics
These are the RRE built-in metrics which can be used out of the box: Precision, Recall, Precision at 1 (P@1), Precision at 2 (P@2), Precision at 3 (P@3), Precision at 10 (P@10), Average Precision (AP), Reciprocal Rank, Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), Normalised Discounted Cumulative Gain (NDCG) and F-Measure.
Most of them are computed at query level and then aggregated at the upper levels. Compound metrics (e.g. MAP or GMAP), however, are not explicitly declared or defined, because their computation doesn't happen at query level: the aggregation executed at the upper levels automatically produces them. For example, the Average Precision computed for Q1, Q2, Q3, ..., Qn becomes the Mean Average Precision at the Query Group or Topic level.
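A minimal sketch of the aggregation idea described above (illustrative only, not RRE's implementation): a metric computed per query, such as Average Precision, becomes its compound counterpart, MAP, simply by averaging over the queries of a group or topic. The per-query values below are made up for the example.

import java.util.Map;

public class MetricAggregation {
    public static void main(String[] args) {
        // Hypothetical per-query Average Precision values for one query group.
        Map<String, Double> averagePrecisionByQuery = Map.of(
                "q1", 0.76,
                "q2", 0.58,
                "q3", 0.91);

        // Aggregating (averaging) the per-query values yields Mean Average Precision for the group.
        double map = averagePrecisionByQuery.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);

        System.out.printf("MAP for the group: %.4f%n", map); // (0.76 + 0.58 + 0.91) / 3
    }
}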

  14. RRE: Domain Model (1/2)
The RRE domain model is organised into a composite, tree-like structure where the relationships between entities are always one-to-many:
 ‣ Evaluation: the top-level entity, a placeholder representing an evaluation execution.
 ‣ Corpus (1..*): a test dataset / collection.
 ‣ Topic (1..*): an information need.
 ‣ Query Group (1..*): a group of query variants.
 ‣ Query (1..*): the individual queries.
Versioned metrics (e.g. F-Measure, P@10, AP, NDCG for v1.0, v1.1, v1.2, ..., v1.n) are computed at query level and then reported, using an aggregation function, at the upper levels. The benefit of having a composite structure is clear: we can see a metric value at different levels (e.g. a single query, all queries belonging to a query group, all queries belonging to a topic, or the whole corpus).
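A compact, hypothetical sketch of such a composite structure (class names are illustrative, not the real RRE domain classes): each node aggregates the metric values of its children, so the same metric can be read at query, query group, topic or corpus level.

import java.util.ArrayList;
import java.util.List;

// Illustrative composite for the Evaluation > Corpus > Topic > Query Group > Query tree:
// a metric value lives on the leaves (queries) and is averaged at every upper level.
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    private Double queryLevelValue; // set only on leaf (query) nodes

    Node(String name) { this.name = name; }

    Node add(Node child) { children.add(child); return this; }

    Node value(double v) { this.queryLevelValue = v; return this; }

    // Leaf nodes return their own value; intermediate nodes aggregate (here: mean) their children.
    double metric() {
        if (children.isEmpty()) return queryLevelValue == null ? 0.0 : queryLevelValue;
        return children.stream().mapToDouble(Node::metric).average().orElse(0.0);
    }
}

class DomainModelSketch {
    public static void main(String[] args) {
        Node evaluation = new Node("evaluation")
                .add(new Node("corpus")
                        .add(new Node("topic: brand search")
                                .add(new Node("query group: variants of one information need")
                                        .add(new Node("q1").value(0.8))
                                        .add(new Node("q2").value(0.6)))));

        System.out.println(evaluation.metric()); // 0.7, the same value readable at every level above the queries
    }
}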
