Learning Latent Semantic Relations from Clickthrough Data for Query Suggestion
Hao Ma, Haixuan Yang, Irwin King, Michael R. Lyu
king@cse.cuhk.edu.hk
http://www.cse.cuhk.edu.hk/~king
Department of Computer Science & Engineering, The Chinese University of Hong Kong
Irwin King, CIKM2008, Napa Valley, USA, October 26-30, 2008
http://www.blifaloo.com/humor/thesaurus.php
A Better Mousetrap?
Challenges
• Queries contain ambiguous and new terms
  • apple: “apple computer” or “apple pie”?
  • NDCG: ?
• Users tend to submit short queries consisting of only one or two words
  • almost 20% one-word queries
  • almost 30% two-word queries
• Users may have little or even no knowledge about the topic they are searching for!
Problems
• Traditional query suggestion
  • local (i.e., search result sets) document analysis
  • global (i.e., thesauri) document analysis
• Hard to remove noise in web pages
• Difficult to summarize the latent meaning of documents (ill-posed inverse problem!)
What is Clickthrough Data
• Query logs recorded by search engines: ⟨u, q, l, r, t⟩
• Users’ relevance feedback to indicate desired/preferred/target results
Joint Bipartite Graph
User-query graph:
$B_{uq} = (V_{uq}, E_{uq})$
$V_{uq} = U \cup Q$
$U = \{u_1, u_2, \ldots, u_m\}$
$Q = \{q_1, q_2, \ldots, q_n\}$
$E_{uq} = \{(u_i, q_j) \mid \text{there is an edge from } u_i \text{ to } q_j\}$ is the set of all edges.
The edge $(u_i, q_j)$ exists in this bipartite graph if and only if a user $u_i$ issued a query $q_j$.
Query-URL graph:
$B_{ql} = (V_{ql}, E_{ql})$
$V_{ql} = Q \cup L$
$Q = \{q_1, q_2, \ldots, q_n\}$
$L = \{l_1, l_2, \ldots, l_p\}$
$E_{ql} = \{(q_i, l_j) \mid \text{there is an edge from } q_i \text{ to } l_j\}$ is the set of all edges.
The edge $(q_j, l_k)$ exists if and only if a user $u_i$ clicked a URL $l_k$ after issuing a query $q_j$.
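The two edge sets defined above can be collected in one pass over the click log. The following is a minimal sketch, not the authors' implementation: the function name `build_bipartite_graphs` and the toy records are hypothetical, and it assumes every log record counts as one issue of the query and one click on the URL.

```python
from collections import defaultdict

def build_bipartite_graphs(records):
    """Build user-query and query-URL edge counts from (u, q, l) click records."""
    user_query = defaultdict(int)  # (user, query) -> times the user issued the query
    query_url = defaultdict(int)   # (query, url)  -> clicks on the URL after the query
    for user, query, url in records:
        # ASSUMPTION: one record = one issue of the query and one click on the URL.
        user_query[(user, query)] += 1
        query_url[(query, url)] += 1
    return dict(user_query), dict(query_url)

# Hypothetical toy log; rank r and time t are dropped because the graphs do not use them.
records = [
    ("u1", "sony", "sony.com"),
    ("u1", "sony vaio", "sony.com"),
    ("u2", "sony", "sony.com"),
    ("u2", "sony", "en.wikipedia.org/wiki/Sony"),
]
E_uq, E_ql = build_bipartite_graphs(records)
```

An edge exists exactly when its key is present in the dictionary; the stored counts are the raw values that would later be normalized into edge weights.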
Key Points
• Two-level latent semantic analysis
• Level 1
  • Consider the use of joint user-query and query-URL bipartite graphs for query suggestion
  • Use matrix factorization for learning query features in constructing the Query Similarity Graph
• Level 2
  • Use heat diffusion for similarity propagation for query suggestions
[Figure: bipartite graphs over Users, Queries and URLs (left) and the weighted Query Similarity Graph with nodes A to J (right)]
• Queries are issued by users, and users also decide which URLs to click
• Two distinct users are similar if they issued similar queries
• Two queries are similar if they are issued by similar users
$r^*_{ij}$: normalized weight, how many times $u_i$ issued $q_j$
$s^*_{jk}$: normalized weight, how many times $q_j$ is linked to $l_k$
$U_i$: $L$-dimensional vector of user $u_i$
$Q_j$: $L$-dimensional vector of query $q_j$
$L_k$: $L$-dimensional vector of URL $l_k$

$$H(R, U, Q) = \min_{U,Q} \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{n} I^R_{ij} \left( r^*_{ij} - g(U_i^T Q_j) \right)^2 + \frac{\alpha_u}{2} \|U\|_F^2 + \frac{\alpha_q}{2} \|Q\|_F^2$$

$$H(S, Q, L) = \min_{Q,L} \frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{p} I^S_{jk} \left( s^*_{jk} - g(Q_j^T L_k) \right)^2 + \frac{\alpha_q}{2} \|Q\|_F^2 + \frac{\alpha_l}{2} \|L\|_F^2$$
$$H(S, R, U, Q, L) = \frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{p} I^S_{jk} \left( s^*_{jk} - g(Q_j^T L_k) \right)^2 + \frac{\alpha_r}{2} \sum_{i=1}^{m} \sum_{j=1}^{n} I^R_{ij} \left( r^*_{ij} - g(U_i^T Q_j) \right)^2 + \frac{\alpha_u}{2} \|U\|_F^2 + \frac{\alpha_q}{2} \|Q\|_F^2 + \frac{\alpha_l}{2} \|L\|_F^2$$

• A local minimum can be found by performing gradient descent in $U_i$, $Q_j$ and $L_k$
Gradient Descent Equations

$$\frac{\partial H}{\partial U_i} = \alpha_r \sum_{j=1}^{n} I^R_{ij}\, g'(U_i^T Q_j) \left( g(U_i^T Q_j) - r^*_{ij} \right) Q_j + \alpha_u U_i,$$

$$\frac{\partial H}{\partial Q_j} = \sum_{k=1}^{p} I^S_{jk}\, g'(Q_j^T L_k) \left( g(Q_j^T L_k) - s^*_{jk} \right) L_k + \alpha_r \sum_{i=1}^{m} I^R_{ij}\, g'(U_i^T Q_j) \left( g(U_i^T Q_j) - r^*_{ij} \right) U_i + \alpha_q Q_j,$$

$$\frac{\partial H}{\partial L_k} = \sum_{j=1}^{n} I^S_{jk}\, g'(Q_j^T L_k) \left( g(Q_j^T L_k) - s^*_{jk} \right) Q_j + \alpha_l L_k,$$

Only the Q matrix, the queries’ latent features, is being used to generate the query similarity graph!
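The gradient equations above can be exercised on a toy problem. This is a minimal sketch under stated assumptions rather than the authors' code: it takes g to be the logistic function (the slides do not define g), stores only the observed entries of R and S in dictionaries so the indicators I^R and I^S are implicit, and uses hypothetical values for the learning rate, the regularization weights and the data.

```python
import math
import random

def g(x):
    """Logistic function. ASSUMPTION: the slides do not say which g is used."""
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(R, S, U, Q, L, a_r, a_u, a_q, a_l):
    """H(S, R, U, Q, L); R and S hold only observed entries, so I^R and I^S are implicit."""
    h = 0.5 * sum((s - g(dot(Q[j], L[k]))) ** 2 for (j, k), s in S.items())
    h += 0.5 * a_r * sum((r - g(dot(U[i], Q[j]))) ** 2 for (i, j), r in R.items())
    for M, a in ((U, a_u), (Q, a_q), (L, a_l)):
        h += 0.5 * a * sum(x * x for row in M for x in row)  # squared Frobenius norm
    return h

def gradients(R, S, U, Q, L, a_r, a_u, a_q, a_l):
    """Partial derivatives of H with respect to every U_i, Q_j and L_k."""
    dU = [[a_u * x for x in row] for row in U]
    dQ = [[a_q * x for x in row] for row in Q]
    dL = [[a_l * x for x in row] for row in L]
    for (i, j), r in R.items():
        p = g(dot(U[i], Q[j]))
        c = a_r * p * (1.0 - p) * (p - r)  # logistic derivative: g'(x) = g(x)(1 - g(x))
        for d in range(len(U[i])):
            dU[i][d] += c * Q[j][d]
            dQ[j][d] += c * U[i][d]
    for (j, k), s in S.items():
        p = g(dot(Q[j], L[k]))
        c = p * (1.0 - p) * (p - s)
        for d in range(len(Q[j])):
            dQ[j][d] += c * L[k][d]
            dL[k][d] += c * Q[j][d]
    return dU, dQ, dL

def step(M, dM, lr):
    """One gradient descent update of a feature matrix."""
    return [[x - lr * gx for x, gx in zip(row, grow)] for row, grow in zip(M, dM)]

def init(n, dim):
    return [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]

# Hypothetical toy data: 2 users, 2 queries, 2 URLs, 2 latent dimensions.
random.seed(0)
R = {(0, 0): 1.0, (0, 1): 0.5, (1, 1): 1.0}  # r*_ij for observed (user, query) pairs
S = {(0, 0): 1.0, (1, 0): 0.5, (1, 1): 1.0}  # s*_jk for observed (query, URL) pairs
U, Q, L = init(2, 2), init(2, 2), init(2, 2)
params = dict(a_r=1.0, a_u=0.01, a_q=0.01, a_l=0.01)
h_start = objective(R, S, U, Q, L, **params)
for _ in range(1000):
    dU, dQ, dL = gradients(R, S, U, Q, L, **params)
    U, Q, L = step(U, dU, 0.1), step(Q, dQ, 0.1), step(L, dL, 0.1)
h_end = objective(R, S, U, Q, L, **params)
```

After the loop, `h_end` should be lower than `h_start`, and only `Q` would be carried forward to build the query similarity graph.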
Query Similarity Graph
[Figure: weighted Query Similarity Graph with nodes A to J, shown for k = 4]
• Similarities are calculated using queries’ latent features
• Only the top-k similar neighbors (terms) are kept
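The top-k pruning step can be sketched as follows. The slide does not name the similarity measure, so cosine similarity is an assumption here, and the function names and feature vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity. ASSUMPTION: the slides do not specify the similarity measure."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def top_k_graph(features, k):
    """Map each query to its k most similar queries as (neighbor, similarity) pairs."""
    graph = {}
    for query, vec in features.items():
        sims = [(other, cosine(vec, v)) for other, v in features.items() if other != query]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        graph[query] = sims[:k]  # keep only the k strongest edges
    return graph

# Hypothetical 2-dimensional latent features for four queries.
features = {
    "sony": [1.0, 0.1],
    "sony vaio": [0.9, 0.2],
    "apple pie": [0.1, 1.0],
    "pie recipe": [0.2, 0.9],
}
graph = top_k_graph(features, k=1)
```

Because each query keeps its own top-k list, the resulting graph is directed: a query can appear in a neighbor's list without the reverse being true.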
Similarity Propagation
• Based on the Heat Diffusion Model
• In the query graph, given the heat sources and the initial heat values, start the heat diffusion process and perform P steps
• Return the Top-N queries in terms of highest heat values for query suggestions
Heat Diffusion Model
• Heat diffusion is a physical phenomenon
• Heat flows from high temperature to low temperature in a medium
• Heat kernel is used to describe the amount of heat that one point receives from another point
• The way that heat diffuses varies when the underlying geometry varies

$$\rho C_P \frac{\partial T}{\partial t} = Q + \nabla \cdot (k \nabla T)$$

$\rho$: density
$C_P$: heat capacity at constant pressure
$\partial T / \partial t$: change in temperature over time
$Q$: heat added
$k$: thermal conductivity
$\nabla T$: temperature gradient
$\nabla \cdot v$: divergence
Heat Diffusion Process
Similarity Propagation Model

$$\frac{f_i(t + \Delta t) - f_i(t)}{\Delta t} = \alpha \left( -\frac{\tau_i}{d_i} f_i(t) \sum_{k:(q_i,q_k) \in E} w_{ik} + \sum_{j:(q_j,q_i) \in E} \frac{w_{ji}}{d_j} f_j(t) \right) \quad (1)$$

$$f(1) = e^{\alpha H} f(0) \quad (2)$$

$$H_{ij} = \begin{cases} w_{ji}/d_j, & (q_j, q_i) \in E, \\ -(\tau_i/d_i) \sum_{k:(i,k) \in E} w_{ik}, & i = j, \\ 0, & \text{otherwise.} \end{cases} \quad (3)$$

$$f(1) = e^{\alpha R} f(0), \quad R = \gamma H + (1 - \gamma)\, g \mathbf{1}^T \quad (4)$$

$\alpha$: thermal conductivity
$f_i(t)$: heat value of node $i$ at time $t$
$w_{ik}$: weight between node $i$ and node $k$
$f(0)$: vector of the initial heat distribution
$f(1)$: vector of the heat distribution at time 1
$\tau_i$: equal to 1 if node $i$ has outlinks, else equal to 0
$\gamma$: random jump parameter, set to 0.85
$g$: uniform stochastic distribution vector
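Equations (3) and (4) can be turned into a small matrix builder. This is a minimal sketch, not the authors' code: the slide does not define d_i, so it is assumed here to be the total out-edge weight of node i, edge weights are assumed positive with no self-loops, and the function names and the toy graph are hypothetical.

```python
def diffusion_matrix(n, edges):
    """Build H (n x n list of lists) from directed weighted edges {(i, k): w_ik}."""
    out_weight = [0.0] * n
    for (i, _k), w in edges.items():
        out_weight[i] += w
    d = out_weight  # ASSUMPTION: d_i is the total out-edge weight of node i
    H = [[0.0] * n for _ in range(n)]
    for (j, i), w in edges.items():
        H[i][j] += w / d[j]  # off-diagonal term: heat flowing from q_j into q_i
    for i in range(n):
        tau = 1.0 if out_weight[i] > 0.0 else 0.0  # tau_i = 1 only if node i has outlinks
        if tau:
            H[i][i] -= (tau / d[i]) * out_weight[i]  # diagonal term: heat leaving q_i
    return H

def with_random_jump(H, gamma=0.85):
    """R = gamma * H + (1 - gamma) * g 1^T, with g the uniform vector (1/n, ..., 1/n)."""
    n = len(H)
    return [[gamma * H[i][j] + (1.0 - gamma) / n for j in range(n)] for i in range(n)]

# Hypothetical 3-node query graph.
edges = {(0, 1): 0.8, (0, 2): 0.2, (1, 0): 0.5, (2, 1): 0.3}
H = diffusion_matrix(3, edges)
R = with_random_jump(H)
```

Under the assumed definition of d_i, every column of H that belongs to a node with outlinks sums to zero, which means heat leaving a node is exactly the heat its neighbors receive.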
Discrete Approximation
• Computing $e^{\alpha R}$ is time consuming
• We use the discrete approximation as a substitute:

$$f(1) = \left( I + \frac{\alpha}{P} R \right)^P f(0)$$

• For every heat source, only diffuse heat to its neighbors within P steps
• In our experiments, P = 3 already generates fairly good results
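The discrete approximation never needs the matrix power explicitly: applying (I + (α/P) R) to the heat vector P times gives the same result. The sketch below illustrates this with a hypothetical 3-node matrix and a hypothetical α = 1; it is not taken from the authors' experiments.

```python
def diffuse(R, f0, alpha, P):
    """Apply f <- (I + (alpha / P) R) f a total of P times."""
    f = list(f0)
    n = len(f)
    for _ in range(P):
        Rf = [sum(R[i][j] * f[j] for j in range(n)) for i in range(n)]
        f = [f[i] + (alpha / P) * Rf[i] for i in range(n)]
    return f

# Hypothetical diffusion matrix whose columns sum to zero, so total heat is conserved.
R = [
    [-1.0, 1.0, 0.0],
    [0.8, -1.0, 1.0],
    [0.2, 0.0, -1.0],
]
f0 = [1.0, 0.0, 0.0]  # all initial heat sits on node 0
f1 = diffuse(R, f0, alpha=1.0, P=3)
```

Each of the P iterations is one matrix-vector product, so heat from a source can reach only nodes within P hops of it, which matches the third bullet above.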
Query Suggestion Procedure
• For a given query $q$
1. Select a set of $n$ queries, each of which contains at least one word in common with $q$, as heat sources
2. Calculate the initial heat values by

$$f_{\hat{q}_i}(0) = \frac{|W(q) \cap W(\hat{q}_i)|}{|W(q) \cup W(\hat{q}_i)|}$$

   Example for q = “Sony”: “Sony” = 1, “Sony Electronics” = 1/2, “Sony Vaio Laptop” = 1/3
3. Use $f(1) = e^{\alpha R} f(0)$ to diffuse the heat in the graph
4. Obtain the Top-N queries from $f(1)$
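Steps 1 and 2 of the procedure above amount to a word-set overlap computation. The following is a minimal sketch that reproduces the “Sony” example; splitting on whitespace and lowercasing are assumptions about how W(q) is built, and the function name `initial_heat` is hypothetical.

```python
def initial_heat(query, candidates):
    """Initial heat per candidate: |W(q) & W(c)| / |W(q) | W(c)| over word sets."""
    # ASSUMPTION: W(.) is the set of lowercased, whitespace-separated words.
    words = set(query.lower().split())
    heat = {}
    for cand in candidates:
        cand_words = set(cand.lower().split())
        shared = words & cand_words
        if shared:  # step 1: only queries sharing a word with q become heat sources
            heat[cand] = len(shared) / len(words | cand_words)
    return heat

heat = initial_heat("Sony", ["Sony", "Sony Electronics", "Sony Vaio Laptop", "Apple Pie"])
```

Here “Apple Pie” shares no word with the query, so it is not selected as a heat source and gets no initial heat.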