Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures
Long Xia, Jun Xu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng
Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences
Outline
• Background
• Related work
• Our approach
• Experiments
• Summary
Problem of diversity
Outline
• Background
• Related work
• Our approach
• Experiments
• Summary
Related work

• Heuristic approaches
  – Maximal marginal relevance (MMR) criterion (Carbonell and Goldstein, SIGIR’98)
  – Select documents with high divergence (Zhai et al., SIGIR’03)
  – Minimize the risk of dissatisfaction of the average user (Agrawal et al., WSDM’09)
  – Diversity by proportionality: an election-based approach (Dang and Croft, SIGIR’12)
  – …
• Learning approaches
  – SVM-DIV: formulates the task as a problem of predicting diverse subsets (Yue and Joachims, ICML’08)
  – REC & RBA: online learning algorithms based on users’ clicking behavior (Radlinski et al., ICML’08)
  – R-LTR: a process of sequential document selection, optimizing the likelihood of ground-truth rankings (Zhu et al., SIGIR’14)
  – …
• Diversity evaluation measures
  – Subtopic recall (Zhai et al., SIGIR’03)
  – α-NDCG (Clarke et al., SIGIR’08)
  – ERR-IA (Chapelle et al., CIKM’09)
  – NRBP (Clarke et al., ICTIR’09)
  – …
Maximal marginal relevance (Carbonell and Goldstein, SIGIR’98)

$$\mathrm{MMR} \overset{\mathrm{def}}{=} \arg\max_{D_i \in R \setminus S} \Big[ \lambda\, \mathrm{Sim}_1(D_i, Q) - (1-\lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \Big]$$

where $\mathrm{Sim}_1(D_i, Q)$ is the query-document similarity (relevance) and $\mathrm{Sim}_2(D_i, D_j)$ measures the similarity with the already-selected documents.

• Advantage
  – Models the top-down user browsing behavior
• Disadvantage
  – Non-learning: only a limited number of ranking signals
  – High parameter tuning cost
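As a concrete illustration, the greedy selection defined by the MMR criterion can be sketched in a few lines of Python; the similarity inputs and the λ value here are assumptions for the example, not from the slides:

```python
import numpy as np

def mmr_rank(query_sims, doc_sims, lam=0.5, k=None):
    """Greedy ranking by the MMR criterion (a sketch; the inputs are
    assumed precomputed similarity scores).

    query_sims: (n,) array of Sim1(D_i, Q), query-document similarity.
    doc_sims:   (n, n) array of Sim2(D_i, D_j), document-document similarity.
    lam:        the trade-off parameter lambda, which MMR leaves to be tuned.
    """
    n = len(query_sims)
    k = n if k is None else k
    selected, remaining = [], set(range(n))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # redundancy penalty: max similarity to any already-selected doc
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lam * query_sims[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 1 this reduces to plain relevance ranking; lowering λ trades relevance for novelty, which is precisely the hand-tuning cost the slide points out.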
Relational Learning-to-Rank (Zhu et al., SIGIR’14)

• Formalization
  – Four key components: input space, output space, ranking function $f$, loss function $L$

$$\hat{f} = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{N} L\big(f;\, X^{(i)}, R^{(i)}, \mathbf{y}^{(i)}\big)$$

• Definition of the ranking function

$$f_S(\mathbf{x}_i, R_i) = \omega_r^T \mathbf{x}_i + \omega_d^T h_S(R_i), \quad \forall\, \mathbf{x}_i \in X \setminus S$$

where $\omega_r^T \mathbf{x}_i$ is the relevance score, $\omega_d^T h_S(R_i)$ is the diversity score, $R_i$ is the matrix of relationships between document $\mathbf{x}_i$ and the other documents, and $h_S$ is the relational function over the selected set $S$.
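To make the two-part score concrete, here is a small sketch of $f_S$. The column-wise minimum used for $h_S$ is only one plausible choice of relational function and is an assumption of this sketch:

```python
import numpy as np

def rltr_score(x_i, R_i, selected, w_r, w_d):
    """Score one candidate document given the already-selected set S.

    x_i:      (d_r,) relevance feature vector of the candidate.
    R_i:      (n, d_d) relation features between the candidate and every
              other document (one row per document).
    selected: indices of documents already in S.
    w_r, w_d: relevance / diversity weight vectors (learned in R-LTR).
    """
    if selected:
        # h_S: aggregate relations to S; the minimum is an assumed choice
        h_S = R_i[selected].min(axis=0)
    else:
        h_S = np.zeros_like(w_d)  # empty S: no diversity signal yet
    return w_r @ x_i + w_d @ h_S
```

A ranking is then produced MMR-style: repeatedly score all remaining documents against the current $S$ and append the arg-max.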
Relational Learning-to-Rank (Zhu et al., SIGIR’14)

• Definition of the loss function

$$L\big(f(X, R), \mathbf{y}\big) = -\log P(\mathbf{y}|X), \qquad P(\mathbf{y}|X) = P\big(\mathbf{x}_{y(1)}, \mathbf{x}_{y(2)}, \cdots, \mathbf{x}_{y(n)} \,\big|\, X\big)$$

• Plackett-Luce based probability

$$P(\mathbf{y}|X) = \prod_{j=1}^{n} \frac{\exp\big(f_{S_{j-1}}(\mathbf{x}_{y(j)}, R_{y(j)})\big)}{\sum_{k=j}^{n} \exp\big(f_{S_{j-1}}(\mathbf{x}_{y(k)}, R_{y(k)})\big)}$$
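The Plackett-Luce probability can be evaluated directly. The sketch below is a simplification: it takes a precomputed matrix of sequential scores rather than recomputing $f$ at each step, and returns the negative log-likelihood $-\log P(\mathbf{y}|X)$:

```python
import numpy as np

def plackett_luce_nll(step_scores):
    """Negative log-likelihood -log P(y|X) of a ranking y under the
    Plackett-Luce model.

    step_scores[j][k] holds f_{S_{j-1}}(x_{y(k)}, R_{y(k)}): the score of
    the k-th ranked document in the context of the first j-1 selections.
    Only entries with k >= j are used at step j.
    """
    S = np.asarray(step_scores, dtype=float)
    nll = 0.0
    for j in range(S.shape[0]):
        tail = S[j, j:]                    # scores of not-yet-ranked docs
        m = tail.max()                     # log-sum-exp for stability
        log_denom = m + np.log(np.exp(tail - m).sum())
        nll -= S[j, j] - log_denom         # log of step j's softmax term
    return nll
```

Minimizing this quantity over the training rankings is exactly the maximum-likelihood objective of R-LTR.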
Relational Learning-to-Rank (Zhu et al., SIGIR’14)

• R-LTR Pros:
  – Models sequential user behavior in the MMR way
  – A learnable framework for combining complex features
  – State-of-the-art empirical performance

Can R-LTR be further improved?
Motivation

• R-LTR Cons:
  – Only utilizes “positive” rankings, and treats all “negative” rankings equally
    ∘ Not all negative rankings are equally negative (they receive different evaluation scores)
    ∘ How about using discriminative learning, which is effective in many machine learning tasks?
  – The learning objective differs from the diversity evaluation measures
    ∘ How about directly optimizing the evaluation measures?
Major Idea

• Learn the MMR model using both positive and negative rankings
• Optimize diversity evaluation measures

How to achieve this?
Outline
• Background
• Related work
• Our approach
• Experiments
• Summary
Learning the ranking model

• Basic loss function

$$\min_{f} \sum_{n=1}^{N} L\big(\mathbf{y}^{(n)}, J^{(n)}\big)$$

where $\mathbf{y}^{(n)}$ is the ranking constructed by the maximal marginal relevance model, $J^{(n)}$ denotes the human labels on the documents, and $L(\mathbf{y}^{(n)}, J^{(n)})$ is the function judging the loss of the predicted ranking $\mathbf{y}^{(n)}$ compared with the human labels $J^{(n)}$.
Evaluation measures as loss function

• Aim: maximize the diverse ranking accuracy, in terms of a diversity evaluation measure, on the training data

$$\sum_{n=1}^{N} \Big(1 - E\big(X^{(n)}, \mathbf{y}^{(n)}, J^{(n)}\big)\Big)$$

where $E$ represents the evaluation measure, which measures the agreement between the ranking $\mathbf{y}$ over the documents in $X$ and the human judgments $J$. It is difficult to directly optimize this loss, as $E$ is a non-convex function.
Evaluation measures as loss function

• Resort to optimizing an upper bound of the loss function

The loss $\sum_{n=1}^{N} \big(1 - E(X^{(n)}, \mathbf{y}^{(n)}, J^{(n)})\big)$ is upper bounded by

$$\sum_{n=1}^{N} \max_{\mathbf{y}^+ \in Y^{+(n)},\ \mathbf{y}^- \in Y^{-(n)}} \Big( E\big(X^{(n)}, \mathbf{y}^+, J^{(n)}\big) - E\big(X^{(n)}, \mathbf{y}^-, J^{(n)}\big) \Big) \cdot \mathbb{1}\Big[ F\big(X^{(n)}, R^{(n)}, \mathbf{y}^+\big) \le F\big(X^{(n)}, R^{(n)}, \mathbf{y}^-\big) \Big]$$

where $\mathbb{1}[\cdot]$ is one if the condition is satisfied and zero otherwise, $Y^{+(n)}$ and $Y^{-(n)}$ denote the positive and negative rankings, and $F(X, R, \mathbf{y})$ is the query-level ranking model:

$$F(X, R, \mathbf{y}) = \Pr(\mathbf{y}|X, R) = \Pr\big(\mathbf{x}_{y(1)} \cdots \mathbf{x}_{y(M)} \,\big|\, X, R\big) = \prod_{s=1}^{M-1} \Pr\big(\mathbf{x}_{y(s)} \,\big|\, X, S_{s-1}, R\big) = \prod_{s=1}^{M-1} \frac{\exp\big(f_{S_{s-1}}(\mathbf{x}_{y(s)}, R_{y(s)})\big)}{\sum_{k=s}^{M} \exp\big(f_{S_{s-1}}(\mathbf{x}_{y(k)}, R_{y(k)})\big)}$$

If $E \in [0, 1]$, this is further upper bounded by

$$\sum_{n=1}^{N} \sum_{\mathbf{y}^+ \in Y^{+(n)},\ \mathbf{y}^- \in Y^{-(n)}} \mathbb{1}\Big[ F\big(X^{(n)}, R^{(n)}, \mathbf{y}^+\big) - F\big(X^{(n)}, R^{(n)}, \mathbf{y}^-\big) \le E\big(X^{(n)}, \mathbf{y}^+, J^{(n)}\big) - E\big(X^{(n)}, \mathbf{y}^-, J^{(n)}\big) \Big]$$
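A minimal sketch of evaluating the final 0/1 bound for one query, assuming $F$ and $E$ have already been computed for each sampled ranking (the dictionaries and ranking ids below are hypothetical):

```python
def pairwise_bound_loss(F, E, positives, negatives):
    """Count the (y+, y-) pairs violating the margin condition
    F(y+) - F(y-) > E(y+) - E(y-), i.e. the pairs that each contribute
    1 to the final upper bound on the 1 - E loss for this query.

    F: dict mapping a ranking id to its model probability F(X, R, y).
    E: dict mapping a ranking id to its evaluation-measure value
       (e.g. alpha-NDCG), assumed to lie in [0, 1].
    positives / negatives: ids of sampled positive / negative rankings.
    """
    return sum(
        1
        for yp in positives
        for yn in negatives
        if F[yp] - F[yn] <= E[yp] - E[yn]
    )
```

Driving the model margin $F(\mathbf{y}^+) - F(\mathbf{y}^-)$ above the measure margin $E(\mathbf{y}^+) - E(\mathbf{y}^-)$ for every pair drives this count, and hence the original loss, toward zero.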