NPFL103: Information Retrieval (6)
Result Summaries, Relevance Feedback, Query Expansion

Pavel Pecina
pecina@ufal.mff.cuni.cz

Institute of Formal and Applied Linguistics
Faculty of Mathematics and Physics
Charles University

Original slides are courtesy of Hinrich Schütze, University of Stuttgart.
Contents

Result summaries
  Static summaries
  Dynamic summaries
Relevance feedback
  Rocchio algorithm
  Pseudo-relevance feedback
Query expansion
  Thesauri
Result summaries
How do we present results to the user?

▶ Most often: as a list of hits – aka “10 blue links” – with description
▶ The hit description is crucial:
  ▶ The user can often identify good hits based on the description.
  ▶ No need to “click” on all documents sequentially.
▶ The description usually contains:
  ▶ document title, URL, some metadata
  ▶ summary
▶ How do we “compute” the summary?
Summaries

Two basic kinds: (i) static and (ii) dynamic:

(i) A static summary of a document is always the same, regardless of the query that was issued by the user.
(ii) Dynamic summaries are query-dependent. They attempt to explain why the document was retrieved for the query at hand.
Static summaries

▶ In typical systems, the static summary is a subset of the document.
▶ Simplest heuristic: the first 50 or so words of the document (see the sketch below)
▶ More sophisticated: an extract consisting of a set of “key” sentences
  ▶ Simple NLP heuristics to score each sentence
  ▶ Summary is made up of top-scoring sentences.
  ▶ Machine learning approach
▶ Most sophisticated: complex NLP to synthesize/generate a summary
  ▶ For most IR applications: not quite ready for prime time yet
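A minimal sketch of the simplest heuristic above (the function name and the 50-word cutoff are illustrative assumptions, not from the slides):

```python
def static_summary(text: str, max_words: int = 50) -> str:
    """Query-independent summary: the first ~50 words of the document."""
    words = text.split()
    head = " ".join(words[:max_words])
    return head + (" ..." if len(words) > max_words else "")
```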
Dynamic summaries

▶ Present one or more “windows” or snippets within the document that contain several of the query terms.
▶ Prefer snippets where query terms occurred as a phrase or jointly in a small window (e.g., paragraph).
▶ The summary that is computed this way gives the entire content of the window – all terms, not just the query terms.
A dynamic summary

Query: [new guinea economic development]

Snippets (in bold) that were extracted from a document:

… In recent years, Papua New Guinea has faced severe economic difficulties and economic growth has slowed, partly as a result of weak governance and civil war, and partly as a result of external factors such as the Bougainville civil war which led to the closure in 1989 of the Panguna mine (at that time the most important foreign exchange earner and contributor to Government finances), the Asian financial crisis, a decline in the prices of gold and copper, and a fall in the production of oil. PNG’s economic development record over the past few years is evidence that governance issues underly many of the country’s problems. Good governance, which may be defined as the transparent and accountable management of human, natural, economic and financial resources for the purposes of equitable and sustainable development, flows from proper public sector management, efficient fiscal and accounting mechanisms, and a willingness to make service delivery a priority in practice. …
Generating dynamic summaries

▶ Where do we get these other terms in the snippet from?
▶ We cannot construct a dynamic summary from the positional inverted index – at least not efficiently.
▶ We need to cache documents.
▶ The positional index tells us: query term occurs at position 4378 in the document.
  ▶ Byte offset or word offset?
▶ Note that the cached copy can be outdated.
▶ Don’t cache very long documents – just cache a short prefix. (A snippet-selection sketch follows below.)
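Putting the last two slides together, snippet selection over a cached document might look like the following sketch. The window size and the scoring (count of distinct query terms in the window) are illustrative assumptions, not the slides’ specification; query terms are expected lowercased.

```python
def best_snippet(cached_doc: str, query_terms: set, window: int = 20) -> str:
    """Slide a fixed-size word window over the cached document and return
    the window containing the most distinct query terms."""
    words = cached_doc.split()
    best_start, best_score = 0, -1
    for start in range(max(1, len(words) - window + 1)):
        in_window = {w.lower().strip(".,;") for w in words[start:start + window]}
        score = len(in_window & query_terms)
        if score > best_score:
            best_start, best_score = start, score
    return " ".join(words[best_start:best_start + window])

# e.g., best_snippet(doc_text, {"guinea", "economic", "development"})
```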
Dynamic summaries

▶ Space on the search result page is limited.
▶ The snippets must be short but also long enough to be meaningful.
▶ Snippets should communicate whether and how the document answers the query.
▶ Ideally:
  ▶ linguistically well-formed snippets
  ▶ should answer the query, so we don’t have to look at the document.
▶ Dynamic summaries are a big part of user happiness because …
  ▶ … we can quickly scan them to find the relevant document to click on.
  ▶ … in many cases, we don’t have to click at all and save time.
Relevance feedback
How can we improve recall in search?

▶ Two ways of improving recall: relevance feedback, query expansion
▶ Example:
  ▶ query q: [aircraft]
  ▶ document d: containing “plane”, but not containing “aircraft”
▶ A simple IR system will not return d for q even if d is the most relevant document for q!
▶ We want to return relevant documents even if there is no term match with the (original) query.
Improving recall

▶ Goal: increasing the number of relevant documents returned to user
▶ This may actually decrease recall on some measures, e.g., when expanding “jaguar” with “panthera”
  ▶ which eliminates some relevant documents, but increases relevant documents returned on top pages.
▶ Options for improving recall:
  1. Local: on-demand analysis for a user query – relevance feedback
  2. Global: one-time analysis to produce thesaurus – query expansion (see the sketch below)
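For the global option, thesaurus-based expansion can be sketched as follows. The tiny synonym dictionary is a hypothetical stand-in for a real thesaurus resource:

```python
# Hypothetical thesaurus fragment -- a real system would use a full resource.
THESAURUS = {"jaguar": ["panthera"], "aircraft": ["plane"]}

def expand_query(terms):
    expanded = []
    for t in terms:
        expanded.append(t)
        expanded.extend(THESAURUS.get(t.lower(), []))  # add synonyms, if any
    return expanded

print(expand_query(["jaguar"]))  # -> ['jaguar', 'panthera']
```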
Relevance feedback: Basic idea

1. The user issues a (short, simple) query.
2. The search engine returns a set of documents.
3. User marks some docs as relevant, some as nonrelevant.
4. Search engine computes a new representation of the information need (see the Rocchio sketch below). Hope: better than the initial query.
5. Search engine runs new query and returns new results.
6. New results have (hopefully) better recall.
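Step 4 is commonly implemented with the Rocchio algorithm (covered later in this lecture). A minimal vector-space sketch, assuming numpy term-weight vectors and the widely used parameter values α = 1.0, β = 0.75, γ = 0.15 (these defaults are an assumption, not from the slides):

```python
import numpy as np

def rocchio(q0, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """One feedback round: move the query vector toward the centroid of the
    relevant documents and away from the centroid of the nonrelevant ones.
    All inputs are term-weight vectors of the same dimension."""
    q_mod = alpha * np.asarray(q0, dtype=float)
    if len(rel_docs) > 0:
        q_mod += beta * np.mean(rel_docs, axis=0)
    if len(nonrel_docs) > 0:
        q_mod -= gamma * np.mean(nonrel_docs, axis=0)
    return np.maximum(q_mod, 0.0)  # negative term weights are usually clipped to 0
```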
Relevance feedback

▶ We can iterate this: several rounds of relevance feedback.
▶ We will use the term ad-hoc retrieval to refer to regular retrieval without relevance feedback.
▶ We will now look at three different examples of relevance feedback that highlight different aspects of the process.
Relevance Feedback: Example 1
Results for initial query
User feedback: Select what is relevant
Results after relevance feedback
Vector space example: query “canine” (1)

source: Fernando Díaz
Similarity of docs to query “canine”

source: Fernando Díaz
User feedback: Select relevant documents

source: Fernando Díaz
Results after relevance feedback

source: Fernando Díaz
Example 3: A real (non-image) example

Initial query: [new space satellite applications]

Results for initial query (r = rank, s = score). User then marks relevant documents with “+”.

    r  s      title
 +  1  0.539  NASA Hasn’t Scrapped Imaging Spectrometer
 +  2  0.533  NASA Scratches Environment Gear From Satellite Plan
    3  0.528  Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
    4  0.526  A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
    5  0.525  Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
    6  0.524  Report Provides Support for the Critics Of Using Big Satellites to Study Climate
    7  0.516  Arianespace Receives Satellite Launch Pact From Telesat Canada
 +  8  0.509  Telecommunications Tale of Two Companies
Expanded query after relevance feedback

  2.074  new            15.106  space
 30.816  satellite       5.660  application
  5.991  nasa            5.196  eos
  4.196  launch          3.972  aster
  3.516  instrument      3.446  arianespace
  3.004  bundespost      2.806  ss
  2.790  rocket          2.053  scientist
  2.003  broadcast       1.172  earth
  0.836  oil             0.646  measure

Compare to original query: [new space satellite applications]
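A weighted query like this can be matched against documents with an ordinary dot product. A minimal sketch: the query weights below are taken from the slide (top terms only), but the document vector is invented purely for illustration.

```python
# Term weights from the slide above (top terms only).
expanded_query = {"satellite": 30.816, "space": 15.106, "nasa": 5.991,
                  "application": 5.660, "eos": 5.196, "launch": 4.196,
                  "aster": 3.972, "instrument": 3.516, "new": 2.074}

def score(doc_weights, query_weights):
    # Dot product over the terms shared by document and query.
    return sum(qw * doc_weights.get(term, 0.0)
               for term, qw in query_weights.items())

# Hypothetical document vector, for illustration only.
doc = {"nasa": 1.2, "satellite": 0.8, "launch": 0.5}
print(round(score(doc, expanded_query), 3))  # 33.94
```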