Time-dependent Similarity Measure of Queries

Time-dependent Similarity Measure of Queries Using Historical Click-through Data - PowerPoint PPT Presentation



  1. Time-dependent Similarity Measure of Queries Using Historical Click-through Data. Qiankun Zhao*, Steven C. H. Hoi*, Tie-Yan Liu, et al. Presented by: Tie-Yan Liu. * This work was done when Zhao and Hoi were interns at Microsoft Research Asia

  2. Outline • Background • Observations and Motivation • Our Approach • Empirical Study • Future Work

  3. Background • A dilemma for Web search engines: queries are very short (~2.5 terms) and term usage is inconsistent • The Web is not well-organized • Users express queries with their own vocabulary

  4. Background (cont’d) • Solution: query expansion • Document-term-based expansion (KDD00, SIGIR05): a query can be expanded with the top keywords in the top-k relevant documents • Query-term-based expansion (WWW02, CIKM04): a query can be expanded with similar queries (queries are similar if they lead to similar pages; pages are similar if they are visited via similar queries) • Click-through data have been used for query expansion in much previous work

  5. Background (cont’d) • Click-through data: log data about the interactions between users and Web search engines • Typical click-through data representation

  6. Observation 1 • Accuracy of query similarity: calculated from all the click-through data before a given time point vs. calculated only from the click-through data within that time interval (month)

  7. Observation 2 • Event-driven and dynamic character of query similarity: the keyword “firework” and related pages become more popular one week before the event and reach the peak on July 4th • “firework + market” and “firework + show” become popular and reach their peaks a few days before July 4th • “firework + injuries” and “firework + picture” show a slight delay in the number of times they are issued and visited

  8. Motivations • Exploit the click-through data for semantic similarity of queries by incorporating temporal information • Combine explicit content similarity and implicit semantic similarity

  9. Our Approach

  10. Time-Dependent Concepts • Calendar schema and pattern • Example: calendar schema <day, month, year>; calendar pattern <15, *, *>; the point <15, 1, 2002> is contained in the pattern <15, *, *>
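
The containment relation on the slide can be sketched in a few lines; the function name and tuple encoding below are illustrative assumptions, not from the paper.

```python
# Sketch of the calendar schema/pattern idea: a schema fixes the order of
# time units, a pattern uses '*' as a wildcard, and a concrete time point
# is "contained in" a pattern when every non-wildcard unit matches.
def contains(pattern, point):
    """Return True if the calendar point matches the calendar pattern."""
    return all(p == "*" or p == v for p, v in zip(pattern, point))

# Schema <day, month, year>: <15, 1, 2002> is contained in <15, *, *>.
print(contains((15, "*", "*"), (15, 1, 2002)))   # True
print(contains((15, "*", "*"), (16, 1, 2002)))   # False
```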

  11. Time-Dependent Concepts • Click-through subgroup • Example: based on the schema <day, week> and the patterns <1,*>, <2,*>, …, <7,*>, we can partition the data into 7 groups, corresponding to Sun, Mon, Tue, …, Sat
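
The partition step can be sketched as follows; the record layout (query, url, weekday) and the sample data are assumptions for illustration, with the seven patterns <1,*>, …, <7,*> reducing to grouping by weekday.

```python
from collections import defaultdict

# Toy click-through records: (query, clicked url, day-of-week 1..7).
records = [
    ("weather forecast", "weather.com", 1),   # Sunday
    ("fox news", "foxnews.com", 2),           # Monday
    ("kids toy", "toysrus.com", 1),           # Sunday
]

# Partition the data into one subgroup per calendar pattern <d, *>.
subgroups = defaultdict(list)
for query, url, weekday in records:
    subgroups[weekday].append((query, url))

print(len(subgroups[1]))  # 2 records fall into the Sunday subgroup
```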

  12. Similarity Measure • For efficiency and simplicity, we measure the query similarity in a certain time slot based only on the click-through data of that slot • Vector representation of queries with respect to clicked documents; the weight w_i is defined by Page Frequency (PF) and Inverted Query Frequency (IQF)
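
The slide only names PF and IQF; the exact formulas below are an assumption modeled on TF-IDF (PF as the fraction of a query's clicks landing on a page, IQF down-weighting pages clicked from many queries), and the click counts are toy data.

```python
import math

# query -> {page: click count}; illustrative data, not from the paper.
clicks = {
    "firework show":   {"p1": 8, "p2": 2},
    "firework market": {"p1": 5, "p3": 5},
}

def query_vector(q):
    """Weight each clicked page by PF * IQF (assumed TF-IDF-style forms)."""
    total = sum(clicks[q].values())
    n_queries = len(clicks)
    vec = {}
    for page, c in clicks[q].items():
        pf = c / total                                   # page frequency
        df = sum(1 for other in clicks if page in clicks[other])
        iqf = math.log((n_queries + 1) / df)             # +1 smoothing, assumed
        vec[page] = pf * iqf
    return vec

print(query_vector("firework show"))
```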

  13. Similarity Measure • Query similarity measures: cosine function; marginalized kernel • By introducing query clusters, one can model the query similarity in a more semantic way
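
The cosine measure named on the slide, over sparse query vectors of clicked-page weights, can be sketched as below (the marginalized kernel variant is not shown; the example weights are hypothetical).

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {page: weight} dicts."""
    pages = set(u) | set(v)
    dot = sum(u.get(p, 0.0) * v.get(p, 0.0) for p in pages)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

q1 = {"p1": 0.32, "p2": 0.22}   # illustrative PF*IQF weights
q2 = {"p1": 0.18, "p3": 0.55}
print(round(cosine(q1, q2), 3))
```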

  14. Time-Dependent Similarity Measure

  15. Empirical Evaluation • Dataset: click-through log of a commercial search engine • June 16, 2005 to July 17, 2005 • Total size of 22 GB • Only queries from the US • Calendar schema and pattern: <hour, day, month> with patterns <1, *, *>, <2, *, *>, … • Divide the data into 24 subgroups • Average subgroup size: 59,400,000 query-page pairs

  16. Empirical Examples • kids + toy, map + route

  17. Empirical Examples • weather + forecast, fox + news

  18. Quality Evaluation • Experimental settings: partition the 32-day dataset into two parts • First part for model construction • Second part for model evaluation • Accuracy is defined as the percentage difference between the actual similarity and the model-based prediction • 1000 representative query pairs with similarity larger than 0.3 over the entire dataset • Half of them are top queries of the month • Half are selected manually, related to real-world events such as “hurricane”
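
The accuracy definition on the slide can be sketched as follows; the precise formula (1 minus the mean relative error) and the sample similarity values are assumptions for illustration.

```python
# Accuracy as 1 - mean relative (percentage) difference between the
# actual similarities and the model-based predictions; assumed formula.
def accuracy(actual, predicted):
    errs = [abs(a - p) / a for a, p in zip(actual, predicted) if a > 0]
    return 1.0 - sum(errs) / len(errs)

actual    = [0.40, 0.50, 0.35]   # pairs with similarity > 0.3, as in the setup
predicted = [0.38, 0.45, 0.35]
print(round(accuracy(actual, predicted), 3))
```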

  19. Experimental Results

  20. Experimental Results • For example, when the distance is 1 and the training data size is 10, we summarize all the accuracy values that use days i to 10+i as training data and day 10+1+i as testing data

  21. Experimental Results

  22. Conclusion • Presented a preliminary study of the dynamic nature of query similarity using click-through data • Observed and verified with real data that query similarity is dynamic and event-driven • Proposed a time-dependent model • For future work, we will investigate an adaptive way to determine the most suitable time granularity for two given queries

  23. Thanks! tyliu@microsoft.com http://research.microsoft.com/users/tyliu
