Ad-hoc IR: Introduction. CS490W: Web Information Search & Management

CS490W: Web Information Search & Management
Basic Concepts of Information Retrieval
Luo Si, Department of Computer Science, Purdue University

Basic Concepts of IR: Outline
Basic Concepts of Information Retrieval:
- Task definition of Ad-hoc IR
- Terminologies and concepts
- Overview of retrieval models
- Text representation
  - Indexing
  - Text preprocessing
- Evaluation
  - Evaluation methodology
  - Evaluation metrics

Ad-hoc IR: Introduction
Ad-hoc Information Retrieval:
- Search a collection of documents to find relevant documents that satisfy different information needs (i.e., queries)
  - The collection is relatively stable; the information needs change
- Example: Web search
- Queries are created and used dynamically; they change fast
- "Ad-hoc": "formed or used for specific or immediate problems or needs" (Merriam-Webster's Collegiate Dictionary)
Ad-hoc IR vs. Filtering
- Filtering: queries are stable (e.g., Asian High-Tech) while the collection changes (e.g., news)
- More on filtering in later lectures

Ad-hoc IR: Terminologies
Terminologies:
- Query
  - Representative data of the user's information need: text (default) and other media
- Document
  - Data candidate to satisfy the user's information need: text (default) and other media
- Database | Collection | Corpus
  - A set of documents
- Corpora
  - A set of databases
  - Valuable corpora from TREC (Text REtrieval Conference)

Content Based Filtering
Filtering: information needs are stable
- The system should make a delivery decision on the fly when a document "arrives"
[Figure labels: User Profile: Asian High-Tech; Filtering System]
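The contrast between ad-hoc retrieval and filtering described above can be sketched in a few lines. This is only an illustration under simple assumptions: the function names, the whitespace tokenizer, the "any shared term" matching rule, and the toy documents are all hypothetical and do not come from the lecture.

```python
from typing import Iterable


def tokens(text: str) -> set:
    """Lowercase whitespace tokenization; a deliberately crude stand-in."""
    return set(text.lower().split())


def adhoc_search(collection: dict, query: str) -> list:
    """Ad-hoc retrieval: the collection is fixed, the query is new each time."""
    q = tokens(query)
    return sorted(doc_id for doc_id, text in collection.items() if q & tokens(text))


def filter_stream(profile: str, stream: Iterable) -> list:
    """Filtering: the profile is fixed, documents arrive one at a time."""
    p = tokens(profile)
    delivered = []
    for doc_id, text in stream:
        # The delivery decision is made on the fly, as each document "arrives".
        if p & tokens(text):
            delivered.append(doc_id)
    return delivered


# Toy data (made up for this sketch).
collection = {
    "d1": "web search engines rank pages",
    "d2": "birth control program in iran",
}
news_stream = [
    ("n1", "asian high-tech exports rise"),
    ("n2", "local weather report"),
]

adhoc_hits = adhoc_search(collection, "web search")
filtered = filter_stream("asian high-tech", news_stream)
print(adhoc_hits)
print(filtered)
```

In the first call the query changes from use to use while the collection stays put; in the second the profile stays put while documents keep arriving.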

Ad-hoc IR: Basic Process
[Figure: process diagram with the labels Information Need, Representation (twice), Query, Indexed Objects, Retrieval Model, Retrieved Objects, Evaluation/Feedback]

Types of Retrieval Models
- Exact Match (Document Selection)
  - Example: Boolean retrieval method
  - Query defines the exact retrieval criterion
  - Relevance is a binary variable; a document is either relevant (i.e., matches the query) or irrelevant (i.e., does not match)
  - Result is a set of documents
    - Documents are unordered
    - Often presented in reverse-chronological order (e.g., PubMed)
[Figure: Exact Match either returns or ignores each document]

Ad-hoc IR: Overview of Retrieval Model
Retrieval Models (with an example system for each):
- Boolean
- Vector space
  - Basic vector space (SMART)
  - Extended Boolean
- Probabilistic models
  - Statistical language models (Lemur)
  - Two-Poisson model (Okapi)
  - Bayesian inference networks (Inquery)
- Citation/Link analysis models
  - PageRank (Google)
  - Hubs & authorities (Clever)

Types of Retrieval Models
- Best Match (Document Ranking)
  - Example: most probabilistic models
  - Query describes the desired retrieval criterion
  - Degree of relevance is a continuous/integral variable; each document matches the query to some degree
  - Result is a ranked list (top ones match better)
  - Often returns a partial list (e.g., rank threshold)
[Figure: Best Match ranks documents and returns the list]
  Doc   Score  +/-
  Doc1  0.99   +
  Doc2  0.90   +
  Doc3  0.85   +
  Doc4  0.82   -
  Doc5  0.81   +
  Doc6  0.79   -
  ...

Ad-hoc IR: Overview of Retrieval Model
Retrieval Model: determine whether a document is relevant to a query
- Relevance is difficult to define
  - Varies by judges
  - Varies by context (i.e., jointly by a set of documents and queries)
- Different retrieval methods estimate relevance differently
  - Word occurrence in document and query
  - In a probabilistic framework, P(query|document) or P(Relevance|query,document)
  - Estimated semantic consistency between query and document

Types of Retrieval Models
Exact Match (Selection) vs. Best Match (Ranking)
- Best Match is usually more accurate/effective
  - Does not need a precise query; a representative query generates good results
  - Users have control to explore the ranked list: view more if they need every piece; view less if they need only the one or two most relevant
- Exact Match
  - Hard to define the precise query; it is either too strict (terms are too specific) or too coarse (terms are too general)
  - Users have no control over the returned results
  - Still prevalent in some markets (e.g., legal retrieval)
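A minimal sketch of the two model types compared above: exact match as Boolean AND selection (an unordered set) and best match as ranking by an estimate of P(query|document). The slides only name the quantity P(query|document); the unigram model with add-one smoothing used here, the function names, and the three toy documents are assumptions made for illustration.

```python
import math
from collections import Counter

# Toy collection (made up for this sketch).
docs = {
    "Doc1": "iran birth control program population economy",
    "Doc2": "ipod battery life review audiobooks",
    "Doc3": "population growth strains economy",
}


def tokenize(text):
    return text.lower().split()


def exact_match(query, docs):
    """Boolean AND: return the set of documents containing every query term."""
    q = set(tokenize(query))
    return {d for d, text in docs.items() if q <= set(tokenize(text))}


def best_match(query, docs):
    """Rank documents by log P(query|document).

    Assumption: unigram model with add-one smoothing over the joint vocabulary.
    """
    vocab = {t for text in docs.values() for t in tokenize(text)}
    vocab |= set(tokenize(query))
    scored = []
    for d, text in docs.items():
        counts = Counter(tokenize(text))
        total = sum(counts.values())
        score = sum(
            math.log((counts[t] + 1) / (total + len(vocab)))
            for t in tokenize(query)
        )
        scored.append((score, d))
    # Highest score first: every document gets a rank, none is simply dropped.
    return [d for score, d in sorted(scored, key=lambda pair: -pair[0])]


query = "population economy control"
selected = exact_match(query, docs)
ranking = best_match(query, docs)
print(selected)
print(ranking)
```

With this query the Boolean model selects only the document that contains all three terms, while the ranking still places the partially matching document ahead of the unrelated one, which is the control over the result list that the comparison above attributes to best match.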

Text Representation: TREC Format
    <DOC>
    <DOCNO> AP900101-0001 </DOCNO>
    <FILEID>AP-NR-01-01-90 2345EDT</FILEID>
    <FIRST>r i PM-Iran-Population Bjt 01-01 0777</FIRST>
    <SECOND>PM-Iran-Population, Bjt,0800</SECOND>
    <HEAD>Iran Moves To Curb A Baby Boom That Threatens Its Economic Future</HEAD>
    <HEAD>An AP Extra</HEAD>
    <BYLINE>By ED BLANCHE</BYLINE>
    <BYLINE>Associated Press Writer</BYLINE>
    <DATELINE>NICOSIA, Cyprus (AP) </DATELINE>
    <TEXT>
    Iran's government is intensifying a birth control program _ despite opposition from radicals _ because the country's fast-growing population is imposing strains on a struggling economy.
    …………
    </TEXT>
    </DOC>

Text Representation: What you see
    It never leaves my side, April 6, 2002
    Reviewer: "dage456" (Carmichael, CA USA) - See all my reviews
    It fits in the palm of your hand and is the size of a deflated wallet (wonder where the money went). I have had my ipod now for 4 months and cannot imagine how I used to get by with my old rio 600 with is 64 megs of ram and.. usb connection. Because of its size this little machine goes with my everywhere and its ten hour battery life means I can listen to stuff all day long.
    Pros: size, both physical and capacity.
    design: It looks beautiful
    controls: simple and very easy to use
    connection: FIREWIRE!!
    Cons: needs the ability to bookmark. I use my ipod mostly for audiobooks. the ipod needs to include a bookmark feature for those like me.
From Amazon Customer Review of iPod

Text Representation: Indexing
Indexing: associate a document/query with a set of keys
- Manual or human indexing
  - Indexers assign keywords or key concepts (e.g., libraries, Medline, Yahoo!); often a small vocabulary
  - Significant human effort; may not be thorough
- Automatic indexing
  - An index program assigns words, phrases or other features; often a large vocabulary
  - No human effort

Text Representation: What the computer sees
    <table><tr><td valign="top"> Reviewer:</td> <td><a href="http://www.amazon.com/exec/obidos/tg/cm/member-glance/-/AJF9GJKJ8UGNX/1/ref=cm_cr_auth/002-1193904-0468830?%5Fencoding=UTF8">"dage456"</a> (Carmichael, CA USA) - <a href="http://www.amazon.com/gp/cdp/member-reviews/AJF9GJKJ8UGNX/ref=cm_cr_auth/002-1193904-0468830?ie=UTF8"> See all my reviews</a></td></tr></table>It fits in the palm of your hand and is the size of a deflated wallet (wonder where the money went). I have had my ipod now for 4 months and cannot imagine how I used to get by with my old rio 600 with is 64 megs of ram and.. usb connection. Because of its size this little machine goes with my everywhere and its ten hour battery life means I can listen to stuff all day long.Pros: size, both physical and capacity. design: It looks beautiful controls: simple and very easy to useconnection: FIREWIRE!!Cons: needs the ability to bookmark. I use my ipod mostly for audiobooks. the ipod needs to include a bookmark feature for those like me.
From Amazon Customer Review of iPod

Text Representation: Indexing
Controlled Vocabulary vs. Full Text
- Controlled vocabulary indexing
  - Often done manually, but can be done by learning algorithms
- Full-text indexing
  - Often indexes with an uncontrolled vocabulary taken from the full text
  - Done automatically, while a good algorithm can generate more representative keywords/key concepts
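Automatic full-text indexing, and the gap between what the reader sees and what the computer sees, can be sketched as follows: strip the markup, keep every remaining word as a key, and record which documents contain it. This is a rough sketch, not how any system named in the lecture works; the class and function names, the tokenization rule, and the shortened document snippets are assumptions. It uses Python's standard html.parser module, which is lenient enough for the TREC-style tags as well.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text between tags, discarding the markup itself."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(markup):
    """Turn "what the computer sees" into roughly "what you see"."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(part.strip() for part in parser.parts if part.strip())


def index_terms(text):
    """Uncontrolled vocabulary: every lowercase alphanumeric token is a key."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, markup in docs.items():
        for term in index_terms(visible_text(markup)):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Shortened snippets of the two examples above; the id "review" is hypothetical.
docs = {
    "review": '<table><tr><td valign="top">Reviewer:</td></tr></table>'
              "It fits in the palm of your hand.",
    "AP900101-0001": "<HEAD>Iran Moves To Curb A Baby Boom</HEAD>"
                     "<TEXT>Iran's government is intensifying a birth control program.</TEXT>",
}

index = build_inverted_index(docs)
print(index["iran"])
print(index["palm"])
```

No human effort goes into choosing the keys, and the vocabulary grows with the text, which matches the trade-off listed for automatic indexing; tag and attribute names such as valign never become keys because only text between tags is indexed.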
