Efficiency/Effectiveness Trade-offs in Learning to Rank
Tutorial @ ECML PKDD 2018
http://learningtorank.isti.cnr.it/

Claudio Lucchese, Ca' Foscari University of Venice, Venice, Italy
Franco Maria Nardini, HPC Lab, ISTI-CNR, Pisa, Italy
The Ranking Problem

Ranking is at the core of several IR tasks:
• Document ranking in Web search
• Ads ranking in Web advertising
• Query suggestion & completion
• Product recommendation
• Song recommendation
• …
The Ranking Problem

Definition: given a query q and a set of objects/documents D, rank D so as to maximize a measure Q of users' satisfaction.

Goal #1: Effectiveness
• Maximize Q!
• But how do we measure Q?

Goal #2: Efficiency
• Make sure the ranking process is feasible and not too expensive
• At Bing, "every 100msec improves revenue by 0.6%. Every millisecond counts." [KDF+13]

[KDF+13] Kohavi, R., Deng, A., Frasca, B., Walker, T., Xu, Y., & Pohlmann, N. (2013). Online controlled experiments at large scale. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1168-1176). ACM.
Agenda

1. Introduction to Learning to Rank (LtR)
   • Background, algorithms, sources of cost in LtR, multi-stage ranking
2. Dealing with the Efficiency/Effectiveness trade-off
   • Feature selection, enhanced learning, approximate scoring, fast scoring
3. Hands-on I
   • Software, data and publicly available tools
   • Traversing regression forests, SoA tools and analysis
4. Hands-on II
   • Training models, pruning strategies, efficient scoring

At the end of the day you'll be able to train a high-quality ranking model, and to exploit SoA tools and techniques to reduce its computational cost by up to 18x!
Document Representations and Ranking Functions

Document representations:
• A document is a multi-set of words
• A document may have fields, it can be split into zones, and it can be enriched with external text data (e.g., anchors)
• Additional information may be useful, such as in-links, out-links, PageRank, # clicks, social links, etc.
• Public LtR datasets expose hundreds of signals

Ranking functions:
• Term weighting [SJ72], Vector Space Model [SB88]
• BM25 [JWR00], BM25F [RZT04]
• Language Modeling [PC98]
• Linear combination of features [MC07]

How to combine hundreds of signals?

[SJ72] Karen Sparck Jones. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–21, 1972.
[SB88] Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
[JWR00] K. Sparck Jones, Steve Walker, and Stephen E. Robertson. A probabilistic model of information retrieval: development and comparative experiments. Information Processing & Management, 36(6):809–840, 2000.
[RZT04] Stephen Robertson, Hugo Zaragoza, and Michael Taylor. Simple BM25 extension to multiple weighted fields. In Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pages 42–49. ACM, 2004.
[PC98] Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275–281. ACM, 1998.
[MC07] Donald Metzler and W. Bruce Croft. Linear feature-based models for information retrieval. Information Retrieval, 10(3):257–274, 2007.
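As a concrete sketch of what these ranking functions compute, here is a minimal Python illustration (not from the slides) of BM25 next to a linear combination of features in the spirit of [MC07]; the parameter values, feature choices, and weights are purely illustrative.

```python
import math

def bm25_term(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """BM25 score contribution of a single query term.

    tf: term frequency in the document; df: document frequency of the term;
    doc_len / avg_doc_len: document length normalization; num_docs: collection size.
    k1 and b are the usual free parameters (common default values shown).
    """
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm

def linear_score(features, weights):
    """Linear combination of ranking features [MC07]: s(q, d) = sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, features))

# A document's score can mix text signals with query-independent ones
# (PageRank, # clicks, ...); these values are made up for the example.
features = [bm25_term(3, 120, 250, 300.0, 10_000), 0.42, 2.3]  # [BM25, PageRank, log(#clicks)]
weights = [0.7, 0.2, 0.1]
print(linear_score(features, weights))
```

Learning to rank replaces the hand-tuned weights of such a combination with a model trained from labeled data.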
Ranking as a Supervised Learning Task

[Figure: a training instance is a query q with candidate documents d_1, d_2, d_3, …, d_i and their relevance labels y_1, y_2, y_3, …, y_i; a machine learning algorithm (neural net, SVM, decision tree) minimizes a loss function to produce a ranking model.]
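Public LtR datasets (e.g., the LETOR collections) serialize exactly this kind of training instance in an SVMlight-style text format: one line per query-document pair, carrying the relevance label, the query identifier, and the feature vector. An illustrative excerpt with made-up feature values:

```
2 qid:10 1:0.71 2:0.18 3:0.04 ... 136:0.92   # label y=2 for a document of query 10
0 qid:10 1:0.12 2:0.33 3:0.51 ... 136:0.07   # label y=0 for another document of query 10
```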
Ranking as a Supervised Learning Task

[Figure, left: at training time, queries with labeled documents (q; d_1, …, d_i; y_1, …, y_i) feed a machine learning algorithm (neural net, SVM, decision tree) that minimizes a loss function and outputs a ranking model. Right: at run time, the ranking model scores the candidate documents of an unseen query (s_1, …, s_i); sorting by score yields the top-k results (e.g., d_3, d_4, d_7, d_9, d_6, d_8, d_2, …).]
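The run-time side of the figure amounts to one model invocation per candidate document followed by a (partial) sort. A minimal sketch, assuming a hypothetical `model` object whose `predict` takes a list of feature vectors and returns one score per document:

```python
import heapq

def rank_top_k(model, candidates, k=10):
    """Score each candidate document of a query and return the k best doc ids.

    candidates: list of (doc_id, feature_vector) pairs for one query.
    model: any object with predict(list_of_feature_vectors) -> list_of_scores.
    """
    scores = model.predict([features for _, features in candidates])
    scored = zip(scores, (doc_id for doc_id, _ in candidates))
    # heapq.nlargest does a partial sort: O(n log k) instead of O(n log n).
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]
```

At web scale this per-document scoring cost is exactly where the efficiency side of the trade-off shows up.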
Relevance Labels Generation

• Explicit feedback
  • Thousands of search quality raters
  • Absolute vs. relative judgments [CBCD08]
• Implicit feedback
  • Clicks/query chains [JGP+05, Joa02, RJ05]
  • Unbiased learning-to-rank [JSS17]
• Minimizing annotation cost
  • Active learning [LCZ+10]
  • Deep versus shallow labelling [YR09]

Query/Document Representation

Useful signals:
• Link analysis [H+00]
• Term proximity [RS03]
• Query classification [BSD10]
• Query intent mining [JLN16, LOP+13]
• Finding entities in documents [MW08] and in queries [BOM15]
• Document recency [DZK+10]
• Distributed representations of words and their compositionality [MSC+13]
• Convolutional neural networks [SHG+14]
• …
Evaluation Measures for Ranking

Top 10 retrieved documents:

Rank  Document  Binary label  Graded label
1     d_3       ✓             ☆☆☆☆ (y_3 = 4)
2     d_4       ✗             y_4 = 0
3     d_7       ✓             ☆ (y_7 = 1)
4     d_9       ✗             y_9 = 0
5     d_6       ✗             y_6 = 0
6     d_8       ✗             y_8 = 0
7     d_2       ✓             ☆☆☆ (y_2 = 3)
8     d_5       ✗             y_5 = 0
9     d_1       ✗             y_1 = 0
10    d_10      ✗             y_10 = 0

Precision@10 accounts for binary labels: P@10 = 3/10

Accounting for graded labels: Q@10 = 4 + 1 + 3
Accounting for labels and ranks: Q@10 = 4/1 + 1/3 + 3/7
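A short sketch of these three computations, with the label values taken from the table above:

```python
graded = [4, 0, 1, 0, 0, 0, 3, 0, 0, 0]  # y at ranks 1..10, from the table

# Precision@10: fraction of relevant (y > 0) documents in the top 10.
p_at_10 = sum(1 for y in graded if y > 0) / 10
assert p_at_10 == 3 / 10

# Labels only: sum of gains, ignoring positions.
q_labels = sum(graded)                    # 4 + 1 + 3 = 8

# Labels and ranks: discount each gain by its rank, as on the slide.
q_ranked = sum(y / r for r, y in enumerate(graded, start=1))  # 4/1 + 1/3 + 3/7
print(p_at_10, q_labels, q_ranked)
```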
Evaluation Measures for Ranking

Many measures have the form:

    Q@k = \sum_{r=1}^{k} Gain(d_r) \cdot Discount(r)

• (N)DCG [JK00]: Gain(d) = 2^y - 1, Discount(r) = 1 / \log(r + 1)
• RBP [MZ08]: Gain(d) = I(y), Discount(r) = (1 - p) p^{r-1}
• ERR [CMZG09]: Gain(d_i) = R_i \prod_{j=1}^{i-1} (1 - R_j), with R_i = (2^{y_i} - 1) / 2^{y_{max}}, Discount(r) = 1/r

Do they match user satisfaction?
• ERR correlates better with user satisfaction (clicks and editorials) [CMZG09]
• Results interleaving can be used to compare two rankings [CJRY12]
• "major revisions of the web search rankers [Bing] … The differences between these rankers involve changes of over half a percentage point, in absolute terms, of NDCG"

[JK00] Kalervo Järvelin and Jaana Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 41–48. ACM, 2000.
[MZ08] Alistair Moffat and Justin Zobel. Rank-biased precision for measurement of retrieval effectiveness. ACM Transactions on Information Systems (TOIS), 27(1):2, 2008.
[CMZG09] Olivier Chapelle, Donald Metlzer, Ya Zhang, and Pierre Grinspan. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 621–630. ACM, 2009.
[CJRY12] Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems (TOIS), 30(1):6, 2012.
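A minimal sketch of the three measures above (not the tutorial's reference code); the persistence parameter `p` for RBP and the maximum grade `y_max` for ERR would be set for your collection, and the DCG discount uses base-2 logarithm as is common in practice:

```python
import math

def dcg_at_k(labels, k):
    """DCG@k with Gain = 2^y - 1 and Discount = 1 / log2(r + 1)."""
    return sum((2 ** y - 1) / math.log2(r + 1)
               for r, y in enumerate(labels[:k], start=1))

def ndcg_at_k(labels, k):
    """NDCG@k: DCG normalized by the DCG of the ideal (label-sorted) ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

def rbp(labels, p=0.8):
    """Rank-biased precision: Gain = I(y > 0), Discount = (1 - p) p^(r - 1)."""
    return sum((1 - p) * p ** (r - 1) * (1 if y > 0 else 0)
               for r, y in enumerate(labels, start=1))

def err(labels, y_max=4):
    """Expected reciprocal rank: the user stops at rank i with probability R_i."""
    score, p_reach = 0.0, 1.0            # p_reach = prod_{j<i} (1 - R_j)
    for r, y in enumerate(labels, start=1):
        R = (2 ** y - 1) / 2 ** y_max
        score += p_reach * R / r
        p_reach *= 1 - R
    return score

labels = [4, 0, 1, 0, 0, 0, 3, 0, 0, 0]  # graded labels from the previous slide
print(ndcg_at_k(labels, 10), rbp(labels), err(labels))
```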
Is it an easy or difficult task?

[Figure: NDCG@k and the document scores of d_0, …, d_3 plotted as functions of the model parameters; the rank-based measure changes only where the score curves cross.]

Gradient descent cannot be applied directly:
• Rank-based measures (NDCG, ERR, MAP, …) depend on the sorted order of the documents
• The gradient with respect to a document score is either 0 (the sorted order did not change) or undefined (at a discontinuity)

[Figure: a smooth proxy quality function tracking the stepwise rank-based measure.]

Solution: we need a proxy loss function
• It should be differentiable
• and it should behave similarly to the original cost function
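A tiny numeric illustration (not from the slides) of why the raw gradient is useless: sweep one document's score while the others stay fixed, and NDCG@k is piecewise constant, jumping only where the sorted order changes. It reuses `ndcg_at_k` from the sketch above.

```python
# Requires ndcg_at_k from the previous sketch.
import numpy as np

fixed_scores = [2.0, 1.0, 0.5]        # scores of three competing documents
labels_by_doc = [0, 0, 0, 3]          # the swept document is the only relevant one

for s in np.linspace(0.0, 3.0, 7):    # sweep the relevant document's score
    order = np.argsort([-x for x in fixed_scores + [s]])
    ranked_labels = [labels_by_doc[i] for i in order]
    # NDCG only jumps near the crossing points s = 0.5, 1.0, 2.0
    # and is flat everywhere in between: zero gradient almost everywhere.
    print(f"score={s:.1f}  NDCG@4={ndcg_at_k(ranked_labels, 4):.3f}")
```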
Point-Wise Algorithms

Each document is considered independently of the others:
• No information about the other candidates for the same query is used at training time
• A different cost function is optimized in place of the rank-based measure
• Several approaches: regression, multi-class classification, ordinal regression, … [Liu11]

Among the regression-based approaches: Gradient Boosting Regression Trees [Fri01]
• The Sum of Squared Errors (SSE) is minimized

[Figure: training instances (d_i, y_i) feeding a GBRT training algorithm with SSE loss.]

[Liu11] Tie-Yan Liu. Learning to rank for information retrieval, 2011. Springer.
[Fri01] Jerome H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
Gradient Boosting Regression Trees

[Figure: weak learners t_1, t_2, t_3 fitted in sequence to the residual errors y - F(d); each f_i(d) reduces the remaining error on the predicted document score.]

Iterative algorithm:

    F(d) = \sum_i f_i(d)    (predicted document score)

Each f_i is regarded as a step in the best optimization direction, i.e., a steepest descent step:

    f_i(d) = -\rho_i g_i(d),    g_i(d) = \left[ \frac{\partial L(y, f(d))}{\partial f(d)} \right]_{f = \sum_{j<i} f_j}

with the step size \rho_i found by line search.

Given L = SSE/2, the pseudo-response is:

    -g_i(d) = -\frac{\partial [\frac{1}{2} \sum (y - f(d))^2]}{\partial f(d)} = y - f(d)

The gradient g_i is approximated by a regression tree t_i.
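A minimal from-scratch sketch of this loop, under the assumption that scikit-learn's DecisionTreeRegressor serves as the weak learner; a fixed shrinkage constant stands in for the line-search step \rho_i, as is common in practice:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbrt(X, y, n_trees=100, max_depth=4, shrinkage=0.1):
    """Gradient boosting with L = SSE/2: each tree fits the residual y - F(d),
    which is exactly the pseudo-response -g_i(d) derived on the slide."""
    trees, F = [], np.zeros(len(y))
    for _ in range(n_trees):
        residual = y - F                       # pseudo-response -g_i(d) = y - f(d)
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        F += shrinkage * t.predict(X)          # fixed shrinkage instead of line search
        trees.append(t)
    return trees

def predict_gbrt(trees, X, shrinkage=0.1):
    """F(d) = sum_i f_i(d), summed over all weak learners."""
    return shrinkage * sum(t.predict(X) for t in trees)
```

The resulting additive forest is the model whose traversal cost the later efficiency techniques (pruning, fast scoring) aim to reduce.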