Searching with Context
Reiner Kraft, Farzin Maghoul, Chi-Chao Chang, Ravi Kumar
Yahoo! Inc., Sunnyvale, CA 94089, USA
Agenda
• Motivation
• Contextual Search
  – Introduction
  – Case Study: Y!Q
  – Algorithms
    • Query Rewriting
    • Rank-Biasing
    • Iterative, Filtering Meta-Search (IFM)
• Evaluation and Results
• Conclusion
Yahoo! Confidential 2
Motivation
• Is traditional keyword-based web search as good as it gets?
  – Few qualitative differences between the search results of the major search engines
  – The introduction of anchor text and link analysis (1998) was the last major feature to significantly improve search relevance
• Search can be vastly improved along the dimension of precision
• The more we know about a user's information need, the more precise our results can be
• There is a lot of evidence (context) beyond the terms in the query box from which we can infer the information need
• Studies of web query logs show that users already employ a manual form of contextual search: they add terms to refine and reissue queries when the results for the initial query turn out to be unsatisfactory
• => How can we automatically use context to augment, refine, and improve a user's search query and obtain more relevant results?
Contextual Search – General Problems
• Gathering evidence (context)
• Representing and inferring the user's information need from that evidence
• Using that representation to get more precise results
Contextual Search – Terminology
• Context
  – In general: any additional information associated with a query
  – More narrowly: a piece of text (e.g., a few words, a sentence, a paragraph, an article) that has been authored by someone
• Context Term Vector
  – Dense representation of a context in the vector space model
  – Obtained using keyword extraction algorithms (e.g., Wen-tau Yih et al., KEA, Y! Content Analysis)
• Search Query Types
  – Simple: a few keywords, no special or expensive operators
  – Complex: keywords/phrases plus special ranking operators; more expensive to evaluate
  – Contextual: query + context term vector
• Search Engine Types
  – Standard: web search engines (e.g., Yahoo, Google, MSN, …) that support simple queries
  – Modified: a web search engine that has been modified to support complex search queries
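As a rough illustration of what a context term vector looks like, here is a minimal term-frequency sketch in Python. This is a toy stand-in, not the KEA or Y! Content Analysis extractors named above; the stopword list and normalization are illustrative assumptions.

```python
from collections import Counter
import re

# Illustrative stopword list; a real extractor uses much richer filtering.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "on", "in", "by", "was", "as"}

def context_term_vector(text, top_n=5):
    """Toy keyword extractor: rank context terms by normalized frequency,
    returning (term, weight) pairs — the 'dense representation' above."""
    terms = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    counts = Counter(terms)
    total = sum(counts.values()) or 1
    return [(t, c / total) for t, c in counts.most_common(top_n)]
```

A real keyword extraction algorithm would also consider term position, phrases, and corpus statistics; the point here is only the output shape consumed by the algorithms later in the deck.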
Case Study: Y!Q Contextual Search
• Acquiring context:
  – Y!Q provides a simple API that allows publishers to attach visual information widgets (actuators) to parts of page content (http://yq.search.yahoo.com/publisher/embed.html)
  – Y!Q lets users manually specify or select context (e.g., within Y! Toolbar, Y! Messenger, an included JavaScript library)
• Contextual search application:
  – Generates a digest (context term vector) of the associated content piece as additional terms of interest for augmenting queries (content analysis)
  – Knows how to perform contextual searches against different search back-end providers (query rewriting framework)
  – Knows how to rank results based on query + context (contextual ranking)
  – Seamless integration: displays results in an overlay or embedded within the page without interrupting the user's workflow
Example slides (screenshots omitted):
• Y!Q actuator example
• Y!Q overlay showing contextual search results
• Y!Q: searching in context
• CSRP with terms extracted from context
Y!Q System Architecture (diagram omitted)
Implementing Contextual Search
• Assumption:
  – We have a query plus a context term vector (a contextual search query)
• Design dimensions:
  – Number of queries to send to a search engine per contextual search query
  – Types of queries to send
    • Simple
    • Complex
• Algorithms:
  – Query Rewriting (QR)
  – Rank-Biasing (RB)
  – Iterative, Filtering Meta-Search (IFM)
Algorithm 1: Query Rewriting
• Combine the query and the context term vector using AND/OR semantics
• Input parameters:
  – Query, context term vector
  – Number of terms to consider from the context term vector
• Experimental setup:
  – QR1 (takes top term only)
  – QR2 (takes top two terms only)
  – … up to QR5
• Example:
  – QR3: Given query q and c = (a, b, c, d) => q AND a AND b AND c
• Pros:
  – Simplicity; supported by all major search engines
• Cons:
  – Possibly low recall for longer queries
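The QRn rewrite above is simple enough to sketch directly. A minimal Python version, assuming the context term vector is a list of (term, weight) pairs sorted by descending weight:

```python
def query_rewrite(query, context_vector, n):
    """QRn: conjoin the original query with the top-n context terms
    using AND semantics, e.g. QR3 yields 'q AND a AND b AND c'."""
    top_terms = [term for term, _weight in context_vector[:n]]
    return " AND ".join([query] + top_terms)
```

Because the result is a plain conjunctive query, it can be sent to any standard search engine, which is exactly the "Pros" point above; the "Cons" follows because every added term further restricts the result set.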
Algorithm 2: Rank-Biasing
• Requires a modified search engine that supports a RANK operator for rank-biasing
• A complex query comprises:
  – A selection part
  – Optional ranking terms that only impact the score of the selected documents
• Input parameters:
  – Query, context term vector
  – Number of selection terms to consider (conjunctive semantics)
  – Number of RANK operators
  – Weight multiplier for each RANK operator (used for scaling)
• Experimental setup:
  – RB2 (uses 1 selection term, 2 RANK operators, weight multiplier = 0.1)
  – RB6 (uses 2 selection terms, 6 RANK operators, weight multiplier = 0.01)
• Example:
  – RB2: Given q and c = ((a, 50), (b, 25), (c, 12)) => q AND a RANK(b, 2.5) RANK(c, 1.2)
• Pros:
  – Ranking terms do not limit recall
• Cons:
  – Requires a modified search engine back-end; more expensive to evaluate
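The query construction for RBn can be sketched as follows. This only builds the complex query string in the RANK syntax shown in the example above; evaluating it still requires the modified back-end, and the function name is illustrative.

```python
def rank_bias_query(query, context_vector, n_select, n_rank, multiplier):
    """Build a complex rank-biasing query: the top n_select context terms
    join the conjunctive selection part; the next n_rank terms become
    RANK operators that boost scores without filtering results."""
    selection = [term for term, _w in context_vector[:n_select]]
    ranking = context_vector[n_select:n_select + n_rank]
    q = " AND ".join([query] + selection)
    for term, weight in ranking:
        # Scale the raw context weight by the multiplier, as in RB2/RB6.
        q += f" RANK({term}, {weight * multiplier:g})"
    return q
```

Reproducing the slide's RB2 example: selection term a, then RANK operators for b and c with weights 25 × 0.1 = 2.5 and 12 × 0.1 = 1.2.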
Algorithm 3: IFM
• IFM is based on the concept of meta-search (e.g., as used in the Buying Guide Finder [Kraft, Stata, 2003])
  – Sends multiple (simple) queries to possibly multiple search engines
  – Combines results using rank aggregation methodologies
IFM Query Generation
• Uses a "query templates" approach:
  – Query templates specify how sub-queries are constructed from the pool of candidate terms
  – They allow exploring the problem domain in a systematic way
  – Implemented primarily as a sliding-window technique using query templates
  – Example: Given query q and c = (a, b, c, d), a sliding-window query template of size 2 may construct the following queries:
    • q a b
    • q b c
    • q c d
• Parameters:
  – Size of the sliding window
• Experimental setup:
  – IFM-SW1, IFM-SW2, IFM-SW3, IFM-SW4
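The sliding-window template above is straightforward to sketch. A minimal version (function name illustrative), taking the context terms in order:

```python
def sliding_window_queries(query, context_terms, window):
    """IFM-SWn: build one simple sub-query per window of `window`
    consecutive context terms appended to the original query."""
    return [
        f"{query} " + " ".join(context_terms[i:i + window])
        for i in range(len(context_terms) - window + 1)
    ]
```

With q and (a, b, c, d) and window size 2 this reproduces the three sub-queries on the slide; each is a simple query, so any standard engine can serve as the back-end.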
IFM uses Rank Aggregation for combining different result sets
• Rank aggregation is a robust and principled approach for combining several ranked lists into a single ranked list
• Given a universe U and k ranked lists π1, …, πk on the elements of the universe:
  – Combine the k lists into π* such that Σ_{i=1..k} d(π*, πi) is minimized
  – For d(·, ·) we used various distance functions (e.g., Spearman footrule, Kendall tau)
• Parameters:
  – Style of rank aggregation:
    • Rank averaging (an adaptation of the Borda voting method)
    • MC4 (based on Markov chains; more computationally expensive)
• Experimental setup:
  – IFM-RA, IFM-MC4
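Of the two aggregation styles, rank averaging is simple enough to sketch. This is a generic Borda-style sketch, not the Y!Q implementation; how missing items are penalized is an assumption (here they are placed at the bottom of each list they are absent from).

```python
from collections import defaultdict

def rank_average(ranked_lists):
    """Borda-style rank averaging: score each item by the sum of its
    positions across all lists (lower is better), then sort by score.
    Items missing from a list are treated as ranked at its bottom."""
    scores = defaultdict(float)
    universe = {item for lst in ranked_lists for item in lst}
    for lst in ranked_lists:
        pos = {item: i for i, item in enumerate(lst)}
        bottom = len(lst)  # assumed penalty position for unseen items
        for item in universe:
            scores[item] += pos.get(item, bottom)
    return sorted(universe, key=lambda item: scores[item])
```

MC4 instead builds a Markov chain whose states are the items and ranks them by stationary probability, which is why the slide notes it is more computationally expensive.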
Experimental Setup and Methodology
• Benchmark:
  – 200 contexts sampled from Y!Q query logs
• Tested 41 configurations:
  – 15 QR (Yahoo, MSN, Google)
  – 18 RB (1 or 2 selection terms; 2, 4, or 6 RANK operators; 0.01, 0.1, or 0.5 weight multipliers)
  – 8 IFM (rank averaging and MC4 on Yahoo, SW1 to SW4)
• Per-item test:
  – Perceived relevancy of each result to the context
  – Relevancy judgments:
    • Yes
    • Somewhat
    • No
    • Can't Tell
  – 28 expert judges looked at the top 3 results, for a total of 24,556 judgments
Example
• Context:
  – "Cowboys Cut Carter; Testaverde to Start. OXNARD, Calif. Quincy Carter was cut by the Dallas Cowboys on Wednesday, leaving 40-year-old Vinny Testaverde as the starting quarterback. The team wouldn't say why it released Carter."
• Judgment examples:
  – A result directly relating to the Dallas Cowboys (football team) or Quincy Carter => Yes
  – A result repeating the same or similar information => Somewhat
  – A result about Jimmy Carter, the former U.S. president => No
  – A result that doesn't provide sufficient information => Can't Tell
Metrics
• Strong Precision at 1 (SP@1) and 3 (SP@3)
  – The fraction of the top 1 (or top 3) retrieved results that are relevant
  – A result is considered relevant if and only if it received a 'Yes' judgment
• Precision at 1 (P@1) and 3 (P@3)
  – The fraction of the top 1 (or top 3) retrieved results that are relevant
  – A result is considered relevant if and only if it received a 'Yes' or 'Somewhat' judgment
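Both metric families reduce to one computation over the per-result judgment labels; a minimal sketch (label encoding assumed):

```python
def precision_at_k(judgments, k, strong=False):
    """P@k / SP@k over per-result judgments encoded as
    'Y' (Yes), 'S' (Somewhat), 'N' (No), or '?' (Can't Tell).
    Strong precision counts only 'Y' as relevant; regular
    precision also counts 'S'."""
    top = judgments[:k]
    relevant = {"Y"} if strong else {"Y", "S"}
    return sum(1 for j in top if j in relevant) / len(top)
```

For example, a result list judged Yes, Somewhat, No gives P@3 = 2/3 but SP@3 = 1/3, which is why SP@k is the stricter of the two measures.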