Learning to Rank: From Pairwise Approach to Listwise Approach

  1. Learning to Rank: From Pairwise Approach to Listwise Approach. Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, Hang Li, Microsoft Research Asia, Beijing (2007). Presented by Christian Kümmerle, December 2, 2014.

  2. Content: 1) Framework: Learning to Rank; 2) The Listwise Approach; 3) Loss function based on a probability model; 4) ListNet algorithm; 5) Experiments and Conclusion.

  3. Framework: Learning to Rank. What is Learning to Rank?

  4. Framework: Learning to Rank. What is Learning to Rank? Classical IR ranking task: given a query, rank the documents into a list. Query-dependent ranking functions: vector space model, BM25, language model.

  5. Framework: Learning to Rank. What is Learning to Rank? Classical IR ranking task: given a query, rank the documents into a list. Query-dependent ranking functions: vector space model, BM25, language model. Query-independent features of documents, e.g. PageRank and URL depth (e.g. http://sifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/lectures.html has a depth of 4).

  6. Framework: Learning to Rank. What is Learning to Rank? Classical IR ranking task: given a query, rank the documents into a list. Query-dependent ranking functions: vector space model, BM25, language model. Query-independent features of documents, e.g. PageRank and URL depth (e.g. http://sifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/lectures.html has a depth of 4). → How can we combine all these "features" in order to get a better ranking function?
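
The URL-depth feature just mentioned can be made concrete with a minimal sketch (not from the slides); the function name url_depth and the segment-counting convention are assumptions for illustration.

```python
from urllib.parse import urlparse

def url_depth(url):
    """Count non-empty path segments: a simple query-independent feature."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

# The example URL from the slide has four path segments, i.e. depth 4.
print(url_depth("http://sifaka.cs.uiuc.edu/~wang296/Course/IR_Fall/lectures.html"))  # 4
```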

  7. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning:

  8. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning: input space, output space, hypothesis space, loss function.

  9. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning, in the authors' paper: Input space $X = \{x^{(1)}, x^{(2)}, \ldots\}$, where $x^{(i)}$ is the list of feature representations of the documents for query $q^{(i)}$ (listwise approach). Output space $Y = \{y^{(1)}, y^{(2)}, \ldots\}$, where $y^{(i)}$ is the list of relevance judgements of the documents for $q^{(i)}$ (listwise approach). Hypothesis space. Loss function.

  10. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning, in the authors' paper: Input space $X = \{x^{(1)}, x^{(2)}, \ldots\}$, where $x^{(i)}$ is the list of feature representations of the documents for query $q^{(i)}$ (listwise approach). Output space $Y = \{y^{(1)}, y^{(2)}, \ldots\}$, where $y^{(i)}$ is the list of relevance judgements of the documents for $q^{(i)}$ (listwise approach). Hypothesis space ← neural network. Loss function.

  11. Framework: Learning to Rank. What is Learning to Rank? Idea: learn the best way to combine the features from given training data, consisting of queries and corresponding labelled documents. Supervised learning, in the authors' paper: Input space $X = \{x^{(1)}, x^{(2)}, \ldots\}$, where $x^{(i)}$ is the list of feature representations of the documents for query $q^{(i)}$ (listwise approach). Output space $Y = \{y^{(1)}, y^{(2)}, \ldots\}$, where $y^{(i)}$ is the list of relevance judgements of the documents for $q^{(i)}$ (listwise approach). Hypothesis space ← neural network. Loss function: probability model on the space of permutations.
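
The slides give no code for the neural-network hypothesis, so the following is a minimal sketch assuming the simplest possible scorer, a single linear layer f_w(x) = w · x applied to each document's feature vector; the feature values and weights are made up for illustration.

```python
def linear_scorer(w):
    """f_w(x) = w . x, scoring one document feature vector at a time.
    A single linear layer, used here as a stand-in for the neural-network
    hypothesis."""
    def f(x):
        return sum(w_k * x_k for w_k, x_k in zip(w, x))
    return f

# Hypothetical feature vectors (BM25, LM, TFIDF, PageRank, URL depth) for one query.
x_i = [[12.3, -4.1, 0.8, 0.02, 3.0],
       [ 9.7, -5.0, 0.5, 0.10, 1.0],
       [ 3.2, -6.3, 0.1, 0.01, 4.0]]
f = linear_scorer([0.4, 0.1, 1.0, 5.0, -0.2])
z_i = [f(x_j) for x_j in x_i]   # score list z^(i), one score per document
print(z_i)
```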

  12. The Listwise Approach. Queries: $Q = \{q^{(1)}, q^{(2)}, \ldots, q^{(m)}\}$, a set of $m$ queries. List of documents: for query $q^{(i)}$ there are $n_i$ documents, $d^{(i)} = (d^{(i)}_1, d^{(i)}_2, \ldots, d^{(i)}_{n_i})$.

  13. The Listwise Approach. Queries: $Q = \{q^{(1)}, q^{(2)}, \ldots, q^{(m)}\}$, a set of $m$ queries. List of documents: for query $q^{(i)}$ there are $n_i$ documents, $d^{(i)} = (d^{(i)}_1, d^{(i)}_2, \ldots, d^{(i)}_{n_i})$. Feature representation in input space: $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_{n_i})$ with $x^{(i)}_j = \Psi(q^{(i)}, d^{(i)}_j)$, e.g. $x^{(i)}_j = (\mathrm{BM25}(q^{(i)}, d^{(i)}_j),\ \mathrm{LM}(q^{(i)}, d^{(i)}_j),\ \mathrm{TFIDF}(q^{(i)}, d^{(i)}_j),\ \mathrm{PageRank}(d^{(i)}_j),\ \mathrm{URLdepth}(d^{(i)}_j)) \in \mathbb{R}^5$.

  14. The Listwise Approach. Queries: $Q = \{q^{(1)}, q^{(2)}, \ldots, q^{(m)}\}$, a set of $m$ queries. List of documents: for query $q^{(i)}$ there are $n_i$ documents, $d^{(i)} = (d^{(i)}_1, d^{(i)}_2, \ldots, d^{(i)}_{n_i})$. Feature representation in input space: $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_{n_i})$ with $x^{(i)}_j = \Psi(q^{(i)}, d^{(i)}_j)$, e.g. $x^{(i)}_j = (\mathrm{BM25}(q^{(i)}, d^{(i)}_j),\ \mathrm{LM}(q^{(i)}, d^{(i)}_j),\ \mathrm{TFIDF}(q^{(i)}, d^{(i)}_j),\ \mathrm{PageRank}(d^{(i)}_j),\ \mathrm{URLdepth}(d^{(i)}_j)) \in \mathbb{R}^5$. List of judgement scores in output space: $y^{(i)} = (y^{(i)}_1, y^{(i)}_2, \ldots, y^{(i)}_{n_i})$, with implicitly or explicitly given judgement scores $y^{(i)}_j$ for all documents corresponding to query $q^{(i)}$. → Training data set $T = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$.
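
A minimal sketch of this training-set layout, not from the slides; all feature values and judgement scores below are made-up illustration data.

```python
# T = {(x^(i), y^(i))}_{i=1..m}: one entry per query, holding the list of
# per-document feature vectors x^(i) and the list of judgement scores y^(i).
T = [
    # Query q^(1): n_1 = 3 documents, 5 features each
    # (BM25, LM, TFIDF, PageRank, URL depth).
    ([[12.3, -4.1, 0.8, 0.02, 3.0],
      [ 9.7, -5.0, 0.5, 0.10, 1.0],
      [ 3.2, -6.3, 0.1, 0.01, 4.0]],
     [2, 3, 0]),                      # y^(1): judgement scores
    # Query q^(2): n_2 = 2 documents.
    ([[ 7.1, -4.8, 0.4, 0.05, 2.0],
      [ 5.5, -5.2, 0.3, 0.00, 2.0]],
     [1, 0]),                         # y^(2)
]
m = len(T)                            # number of training queries
```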

  15. What is a meaningful loss function? We want: find a function $f: X \to Y$ such that the $f(x^{(i)})$ are "not very different" from the $y^{(i)}$. → The loss function penalizes differences that are too big.

  16. What is a meaningful loss function? We want: find a function $f: X \to Y$ such that the $f(x^{(i)})$ are "not very different" from the $y^{(i)}$. → The loss function penalizes differences that are too big. Idea: just take NDCG! The perfectly ordered list can be derived from the given judgements $y^{(i)}$. Problem: NDCG is discontinuous with respect to the ranking scores, since NDCG is position based. Example, with judgements $y^{(i)} = (2, 1, 4, 3)$ in both cases: for the scores $f(x^{(i)}) = (1.2,\ 0.7,\ 3.110,\ 3.109)$ the induced ranking is perfect and NDCG $= 1$; for the almost identical scores $f(x^{(i)}) = (1.2,\ 0.7,\ 3.110,\ 3.111)$ the last two documents swap positions and NDCG drops to $0.86$.
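
The two NDCG values in this example can be checked with a short sketch; it assumes the common exponential-gain, logarithmic-discount NDCG formula (the slide does not state which variant it uses), which reproduces the numbers 1 and 0.86.

```python
import math

def dcg(labels_in_ranked_order):
    """DCG with gain 2^label - 1 and discount log2(position + 1)."""
    return sum((2 ** lab - 1) / math.log2(pos + 2)
               for pos, lab in enumerate(labels_in_ranked_order))

def ndcg(scores, labels):
    """NDCG of the ranking induced by sorting documents by descending score."""
    ranked = [lab for _, lab in sorted(zip(scores, labels), key=lambda t: -t[0])]
    return dcg(ranked) / dcg(sorted(labels, reverse=True))

labels = [2, 1, 4, 3]
print(ndcg([1.2, 0.7, 3.110, 3.109], labels))  # 1.0: the scores induce the perfect order
print(ndcg([1.2, 0.7, 3.110, 3.111], labels))  # ~0.86: a 0.002 change swaps two documents
```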

  17. Loss function based on a probability model on permutations. Solution: define probability distributions $P_{y^{(i)}}$ and $P_{z^{(i)}}$ (for $z^{(i)} := (f(x^{(i)}_1), \ldots, f(x^{(i)}_{n_i}))$) on the set of permutations $\pi$ of $\{1, \ldots, n_i\}$ and take the cross entropy as loss function: $L(y^{(i)}, z^{(i)}) := -\sum_{\pi} P_{y^{(i)}}(\pi) \log P_{z^{(i)}}(\pi)$, which equals $\mathrm{KL}(P_{y^{(i)}} \,\|\, P_{z^{(i)}})$ up to an additive constant (the entropy of $P_{y^{(i)}}$).

  18. Loss function based on a probability model on permutations. Solution: define probability distributions $P_{y^{(i)}}$ and $P_{z^{(i)}}$ (for $z^{(i)} := (f(x^{(i)}_1), \ldots, f(x^{(i)}_{n_i}))$) on the set of permutations $\pi$ of $\{1, \ldots, n_i\}$ and take the cross entropy as loss function: $L(y^{(i)}, z^{(i)}) := -\sum_{\pi} P_{y^{(i)}}(\pi) \log P_{z^{(i)}}(\pi)$, which equals $\mathrm{KL}(P_{y^{(i)}} \,\|\, P_{z^{(i)}})$ up to an additive constant. How to define the probability distribution? E.g. for the set of permutations of $\{1, 2, 3\}$, the scores $(y_1, y_2, y_3)$ and the permutation $\pi := (1, 3, 2)$:
$$P_y(\pi) := \frac{e^{y_1}}{e^{y_1} + e^{y_2} + e^{y_3}} \cdot \frac{e^{y_3}}{e^{y_2} + e^{y_3}} \cdot \frac{e^{y_2}}{e^{y_2}}$$
Definition: if $\pi$ is a permutation of $\{1, \ldots, n\}$, its probability, given the list of scores $y$ of length $n$, is
$$P_y(\pi) = \prod_{j=1}^{n} \frac{\exp\!\left(y_{\pi^{-1}(j)}\right)}{\sum_{l=j}^{n} \exp\!\left(y_{\pi^{-1}(l)}\right)}$$
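
The permutation probability and the cross-entropy loss above can be written out directly; the sketch below is not from the slides and enumerates all n! permutations by brute force, which is only viable for very short lists (the ListNet paper avoids this blow-up by working with top-one probabilities instead).

```python
import math
from itertools import permutations

def perm_probability(scores, pi):
    """P_y(pi) from the definition above. pi lists the (0-based) object
    indices in ranked order, i.e. pi[j] is the object placed at position j."""
    ordered = [scores[obj] for obj in pi]          # scores in ranked order
    prob = 1.0
    for j in range(len(ordered)):
        prob *= math.exp(ordered[j]) / sum(math.exp(s) for s in ordered[j:])
    return prob

def listwise_loss(y, z):
    """Cross entropy between P_y and P_z, brute-forced over all permutations."""
    return -sum(perm_probability(y, pi) * math.log(perm_probability(z, pi))
                for pi in permutations(range(len(y))))

y = [2.0, 1.0, 4.0]                     # judgement scores y^(i)
z = [0.5, 0.2, 3.0]                     # model scores z^(i) = (f(x_1), f(x_2), f(x_3))
print(perm_probability(y, (0, 2, 1)))   # probability of the permutation (1, 3, 2)
print(listwise_loss(y, z))
```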
