Information Needs
IR, session 2
CS6200: Information Retrieval
Slides by: Jesse Anderton
Information Retrieval
• Information Retrieval is the field of Computer Science concerned with finding the information from a collection that is relevant to a user’s information need, as expressed by a query.
• This general task has many possible concrete formalizations, which define collection, information need, relevance, and query more precisely.
• Let’s look at several examples to get a sense of the field’s scope.
Ad-Hoc Search
• Currently, the most important search task is ad-hoc search on the Internet.
• The collection is the set of web pages indexed by the search engine.
• The information need is the web content the user is looking for.
• The query is an ordered list of keywords.
• A document is relevant if it contains text on the same topic as the query.
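To make the four terms concrete, here is a minimal sketch of ad-hoc search in Python. The toy `COLLECTION`, the `score` function, and the `search` function are all hypothetical names introduced for illustration. Counting keyword occurrences is only a crude stand-in for topical relevance, not how a real web search engine ranks pages.

```python
from collections import Counter

# Hypothetical toy collection; a real engine indexes web pages.
COLLECTION = {
    "d1": "cheap flights to iceland and norway",
    "d2": "iceland travel guide with hiking routes",
    "d3": "recipes for sourdough bread",
}

def score(query_terms, text):
    """Count occurrences of the query keywords in the text.

    Assumption: term overlap approximates "same topic as the query".
    """
    counts = Counter(text.lower().split())
    return sum(counts[term.lower()] for term in query_terms)

def search(query_terms, collection):
    """Return ids of documents with a nonzero score, best first."""
    scored = [(score(query_terms, text), doc_id)
              for doc_id, text in collection.items()]
    # Sort by descending score, breaking ties by document id.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ranked if s > 0]

# The query is a keyword list expressing the information need.
results = search(["iceland", "flights"], COLLECTION)
```

Here `results` ranks `d1` ahead of `d2` because `d1` contains both keywords, while `d3` is dropped as non-relevant.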
Vertical Search
• Vertical Search focuses on information from a particular domain: flights, music, news, sports, etc.
• The collection might be the set of all airline fares, research papers, or blog posts.
• An information need can be very specific: “the cost of a flight to Iceland tomorrow.”
• The query may be structured using a web form, providing specific property values to search for.
• Relevance is sometimes less ambiguous: a document is relevant if it matches the search fields.
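A structured query can be sketched as a set of field/value pairs, with relevance reduced to matching every field. The `Fare` record, the sample `FARES` data, and the function names below are assumptions made for illustration, not part of any real fare-search API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fare:
    """Hypothetical record in a flight-fare collection."""
    origin: str
    destination: str
    date: str
    price_usd: float

# Made-up sample data for the sketch.
FARES = [
    Fare("BOS", "KEF", "2024-05-01", 420.0),
    Fare("BOS", "KEF", "2024-05-02", 380.0),
    Fare("JFK", "KEF", "2024-05-01", 450.0),
]

def matches(fare, query):
    """A fare is relevant only if every queried field matches exactly."""
    return all(getattr(fare, field) == value for field, value in query.items())

def vertical_search(query, fares):
    """Return the matching fares, cheapest first."""
    return sorted((f for f in fares if matches(f, query)),
                  key=lambda f: f.price_usd)

# The web form supplies property values rather than free-text keywords.
results = vertical_search({"destination": "KEF", "date": "2024-05-01"}, FARES)
```

Unlike the keyword case, there is no scoring here: a record either satisfies the form fields or it does not, which is why relevance is less ambiguous.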
Enterprise Search
• Enterprise Search is vertical search run against a company’s internal content.
• The collection is the set of documents, e-mails, forum threads, wiki pages, etc. in the company’s internal network.
• Information needs, queries, and relevance are typically defined as for ad-hoc search.
Desktop Search
• Desktop Search focuses on searching the contents of your computer.
• The collection is the set of files (and contacts, messages, events, etc.) stored on your computer.
• An information need is generally either a file or information stored in a file.
• A query can be a list of keywords, as in ad-hoc search, or a list of property values in a custom query language.
• Relevance is defined as in ad-hoc search.
Peer-to-Peer Search
• Peer-to-Peer Search focuses on finding content shared on peer-to-peer networks.
• The collection is the set of all files currently shared by any peer on the network.
• An information need is a particular file, e.g. a music video.
• A query is often a keyword list, but may use an extended query language.
• A document is relevant only if it’s the file the user wanted.
Question Answering
• Question Answering tries to answer questions posed in natural language, as in normal dialog.
• Information needs are usually restricted to those with concisely stated answers.
• Queries are posed as a single sentence of natural language text.
• A response is relevant if it answers the question correctly and is expressed clearly (e.g. fluently).
Wrapping Up
• Information Retrieval is the field of Computer Science concerned with finding the information from a collection that is relevant to a user’s information need, as expressed by a query.
• A query is an expression of an information need, and not the need itself.
• Next, we’ll take a look at some of the distinct types of information needs users have.