  1. Learning to Rank with Partially-Labeled Data. Kevin Duh, University of Washington

  2. The Ranking Problem
     • Definition: Given a set of objects, sort them by preference.
     • A ranking function (obtained via machine learning) maps an unordered set of objects (objectA, objectB, objectC) to an ordered list.

  3. Application: Web Search
     You enter “uw” into the search box. All webpages containing the term “uw” are retrieved; results are presented to the user after ranking (1st, 2nd, 3rd, ...).

  4. Application: Machine Translation
     A basic 1st-pass decoder (using translation/language models) produces an N-best list:
       1st: The vodka is good, but the meat is rotten
       2nd: The spirit is willing but the flesh is weak
       3rd: The vodka is good.
     An advanced ranker (re-ranker) then reorders the list:
       1st: The spirit is willing but the flesh is weak
       2nd: The vodka is good, but the meat is rotten
       3rd: The vodka is good.

  5. Application: Protein Structure Prediction
     Amino acid sequence: MMKLKSNQTRTYDGDGYKKRAACLCFSE
     Various protein-folding simulations generate candidate 3-D structures; a ranker then orders the candidates (1st, 2nd, 3rd, ...).

  6. Goal of this thesis
     Supervised: Labeled Data → Learning Algorithm → Ranking function f(x)
     Semi-supervised: Labeled Data + Unlabeled Data → Learning Algorithm → Ranking function f(x)
     Can we build a better ranker by adding cheap, unlabeled data?

  7. Emerging field
     Semi-supervised ranking is an emerging field, sitting at the intersection of semi-supervised classification and supervised ranking.

  8. Outline
     1. Problem Setup
        1. Background in ranking
        2. Two types of partially-labeled data
        3. Methodology
     2. Manifold Assumption
     3. Local/Transductive Meta-Algorithm
     4. Summary
     Problem Setup | Manifold | Local/Transductive | Summary

  9. Ranking as Supervised Learning Problem
     Query: UW
       x_1^(i) = [tfidf, pagerank, ...]   label: 3
       x_2^(i) = [tfidf, pagerank, ...]   label: 1
       x_3^(i) = [tfidf, pagerank, ...]   label: 2
     Query: Seattle Traffic
       x_1^(j) = [tfidf, pagerank, ...]   label: 2
       x_2^(j) = [tfidf, pagerank, ...]   label: 1

  10. Ranking as Supervised Learning Problem
      Train F(x) such that the scores respect the labels:
        F(x_1^(1)) > F(x_3^(1)) > F(x_2^(1))   (Query: UW)
        F(x_1^(2)) > F(x_2^(2))                (Query: Seattle Traffic)
      At test time (Query: MSR), labels are unknown (?) and the learned F(x) must rank the documents.
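
The training criterion above, learning F(x) = w·x so that scores respect the preference pairs within each query, can be sketched with a perceptron-style pairwise update. The features, labels, and hyperparameters below are invented for illustration, not taken from the thesis.

```python
import numpy as np

# Perceptron-style pairwise ranking: learn w so that F(x) = w.x orders
# documents within each query according to their relevance labels.
# Toy values throughout; not the thesis's ranker or data.

def train_pairwise(queries, n_epochs=50, lr=0.1):
    """queries: list of (X, y), X an (n_docs, n_feats) array,
    y graded relevance labels (higher = more relevant)."""
    w = np.zeros(queries[0][0].shape[1])
    for _ in range(n_epochs):
        for X, y in queries:
            for a in range(len(y)):
                for b in range(len(y)):
                    # Mis-ordered preference pair: nudge w to fix it.
                    if y[a] > y[b] and X[a].dot(w) <= X[b].dot(w):
                        w += lr * (X[a] - X[b])
    return w

# One toy query with two features (say, tfidf and pagerank).
X = np.array([[0.9, 0.8], [0.2, 0.1], [0.5, 0.4]])
y = np.array([3, 1, 2])
w = train_pairwise([(X, y)])
scores = X.dot(w)   # the induced ranking should now match the labels
```

After training, sorting by `scores` reproduces the label order 3 > 2 > 1 on this toy query.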

  11. Semi-supervised Data: Some labels are missing
      Query: UW
        x_1^(i) = [tfidf, pagerank, ...]   label: 3
        x_2^(i) = [tfidf, pagerank, ...]   label: 1
        x_3^(i) = [tfidf, pagerank, ...]   label: missing
      Query: Seattle Traffic
        x_1^(j) = [tfidf, pagerank, ...]   label: missing
        x_2^(j) = [tfidf, pagerank, ...]   label: missing

  12. Two kinds of Semi-supervised Data
      1. Lack of labels for some documents (depth): in every query, some documents are unlabeled.
         Query1/Query2/Query3: Doc1 label, Doc2 label, Doc3 ?
         References: Amini+, SIGIR’08; Agarwal, ICML’06; Wang+, MSRA TechRep’05; Zhou+, NIPS’04; He+, ACM Multimedia ’04
      2. Lack of labels for some queries (breadth): some queries are entirely unlabeled.
         Query1/Query2: Doc1, Doc2, Doc3 all labeled; Query3: Doc1, Doc2, Doc3 all ?
         References: this thesis; Duh & Kirchhoff, SIGIR’08; Truong+, ICMIST’06

  13. Why the “Breadth” Scenario
      • Information Retrieval: long tail of search queries. “20-25% of the queries we will see today, we have never seen before” – Udi Manber (Google VP), May 2007
      • Machine Translation and Protein Prediction: references are costly, but given a reference, computing labels is trivial (e.g. candidate 1: similarity=0.3; candidate 2: similarity=0.9).
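
The second bullet, that labels are trivial to compute once a reference exists, can be illustrated with a toy similarity: each candidate is scored by token overlap with the reference. This stands in for metrics like BLEU or GDT-TS; the function below is a hypothetical simplification, not the thesis's metric.

```python
# Toy stand-in for reference-based labeling: score each candidate by
# token overlap (Dice similarity) with the reference. Real systems use
# BLEU (translation) or GDT-TS (protein); this function is illustrative.
def overlap_similarity(candidate, reference):
    c, r = candidate.split(), reference.split()
    common = sum(min(c.count(t), r.count(t)) for t in set(c))
    return 2 * common / (len(c) + len(r))

reference = "the spirit is willing but the flesh is weak"
candidates = ["the vodka is good but the meat is rotten",
              "the spirit is willing but the flesh is weak"]
labels = [overlap_similarity(c, reference) for c in candidates]
# The exact match gets label 1.0; the paraphrase gets a lower score.
```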

  14. Methodology of this thesis
      1. Make an assumption about how unlabeled lists can be useful, borrowing ideas from semi-supervised classification.
      2. Design a method to implement it: 4 unlabeled-data assumptions & 4 methods.
      3. Test on various datasets, analyzing when a method works and doesn't work.

  15. Datasets
      Information Retrieval datasets
      - from the LETOR distribution [Liu’07]
      - TREC: web search / OHSUMED: medical search
      - Evaluation: MAP (measures how high relevant documents appear on the list)

                              OHSUMED  TREC 2003  TREC 2004  Arabic transl.  Italian transl.  Protein pred.
      # lists                      50         75        100             500              500            100
      label type             2 levels   2 levels   3 levels      continuous       continuous     continuous
      avg # objects per list     1000       1000        150             260              360            120
      # features                   44         44         25               9               10             25
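
MAP, the evaluation measure named above, averages the precision at each rank holding a relevant document and then takes the mean over queries. A minimal sketch of per-query average precision, on invented relevance judgments:

```python
# Average precision for one ranked list: the mean of the precision
# values at the ranks that hold relevant documents. MAP is this
# averaged over queries. Relevance judgments below are toy values.
def average_precision(rels):
    hits, precs = 0, []
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precs.append(hits / rank)
    return sum(precs) / len(precs) if precs else 0.0

# Ranked list: relevant, not, relevant, not.
ap = average_precision([1, 0, 1, 0])   # (1/1 + 2/3) / 2
```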

  16. Datasets
      Machine Translation datasets
      - from the IWSLT 2007 competition, UW system [Kirchhoff’07]
      - translation in the travel domain
      - Evaluation: BLEU (measures word match to the reference)
      (Dataset statistics: same table as the previous slide.)

  17. Datasets
      Protein Prediction dataset
      - from the CASP competition [Qiu/Noble’07]
      - Evaluation: GDT-TS (measures closeness to the true 3-D structure)
      (Dataset statistics: same table as the previous slide.)

  18. Outline
      1. Problem Setup
      2. Manifold Assumption
         • Definition
         • Ranker Propagation method
         • List Kernel similarity
      3. Local/Transductive Meta-Algorithm
      4. Summary

  19. Manifold Assumption in Classification
      - Unlabeled data can help discover the underlying data manifold
      - Labels vary smoothly over this manifold
      Prior work:
      1. How to assign labels to test samples?
         - Mincut [Blum01]
         - Label propagation [Zhu03]
         - Regularizer + optimization [Belkin03]
      2. How to construct the graph?
         - k-nearest neighbors, eps-ball
         - data-driven methods [Argyriou05, Alexandrescu07]
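
The graph-construction step mentioned under prior work can be sketched as a k-nearest-neighbor graph with Gaussian edge weights; k and the bandwidth sigma below are arbitrary illustrative choices, not values from the thesis.

```python
import numpy as np

# Minimal kNN-graph builder: connect each point to its k nearest
# neighbors with Gaussian edge weights, then symmetrize. Labels (or,
# later in the talk, rankers) are required to vary smoothly over this
# graph. Point coordinates are toy values.
def knn_graph(X, k=2, sigma=1.0):
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]        # skip self at distance 0
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                    # symmetrize

# Two tight clusters far apart.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
W = knn_graph(X)
# Within-cluster edges carry far more weight than cross-cluster ones.
```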

  20. Manifold Assumption in Ranking
      Ranking functions vary smoothly over the manifold. Each node is a list; edges represent “similarity” between two lists.

  21. Ranker Propagation
      Algorithm:
      1. For each training list i, fit a linear ranker w^(i):  F(x) = w^T x,  w ∈ R^d, x ∈ R^d
      2. Minimize the objective
           sum_{(i,j) ∈ edges} K_ij ||w^(i) - w^(j)||^2
         where w^(i) is the ranker for list i and K_ij is the similarity between lists i and j.
      The closed-form solution for the unlabeled lists is
           W^(u) = -inv(L^(uu)) L^(ul) W^(l)
      where L is the graph Laplacian of the similarity graph, partitioned into labeled (l) and unlabeled (u) blocks.
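
The closed form for the unlabeled rankers can be checked numerically. The sketch below builds the graph Laplacian L = D - K and solves W^(u) = -inv(L^(uu)) L^(ul) W^(l); the similarity matrix and per-list rankers are made up for illustration.

```python
import numpy as np

# Ranker Propagation closed form: given rankers W_l fit on labeled
# lists and a list-similarity matrix K, the smoothness objective
# sum_{ij} K_ij ||w^(i) - w^(j)||^2 is minimized over the unlabeled
# rankers by W_u = -inv(L_uu) @ L_ul @ W_l, with L = D - K.
def propagate_rankers(K, W_l, labeled_idx, unlabeled_idx):
    D = np.diag(K.sum(axis=1))
    L = D - K                                   # graph Laplacian
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    return -np.linalg.solve(L_uu, L_ul @ W_l)

# 3 lists: 0 and 1 labeled, 2 unlabeled. Toy similarities.
K = np.array([[0.0, 0.1, 0.8],
              [0.1, 0.0, 0.2],
              [0.8, 0.2, 0.0]])
W_l = np.array([[1.0, 0.0],    # ranker fit on list 0
                [0.0, 1.0]])   # ranker fit on list 1
W_u = propagate_rankers(K, W_l, [0, 1], [2])
# The propagated ranker is a similarity-weighted blend of the labeled
# rankers, pulled toward list 0 because K[2,0] > K[2,1].
```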

  22. Similarity between lists: desirable properties
      • Maps two lists of feature vectors to a scalar, e.g. K(list i, list j) = 0.7
      • Works on variable-length lists (different N in N-best)
      • Satisfies the symmetry and positive semi-definiteness properties of a kernel
      • Measures rotation/shape differences between lists

  23. List Kernel
      Step 1: Run PCA on list i and list j, obtaining principal axes u_1^(i), u_2^(i), ... and u_1^(j), u_2^(j), ... with eigenvalues λ^(i), λ^(j).
      Step 2: Compute similarities between pairs of axes, e.g. λ_2^(i) λ_2^(j) |<u_2^(i), u_2^(j)>|.
      Step 3: Find the maximum bipartite matching a over axes:
           K(i,j) = max_a sum_{m=1}^{M} λ_m^(i) λ_{a(m)}^(j) |<u_m^(i), u_{a(m)}^(j)>| / (||λ^(i)|| · ||λ^(j)||)
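
The three steps can be sketched as follows. The normalization by ||λ^(i)|| ||λ^(j)|| is my reading of the slide, so treat this as an illustrative reimplementation rather than the thesis's exact kernel; scipy's linear_sum_assignment provides the maximum-weight bipartite matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# List kernel sketch: PCA on each list, eigenvalue-weighted axis
# similarities, best bipartite matching of axes, normalized by the
# eigenvalue vector norms.

def pca_axes(X):
    Xc = X - X.mean(axis=0)
    lam, U = np.linalg.eigh(Xc.T @ Xc / len(X))  # ascending eigenvalues
    return lam[::-1], U[:, ::-1].T               # descending; rows = axes

def list_kernel(Xi, Xj):
    lam_i, U_i = pca_axes(Xi)
    lam_j, U_j = pca_axes(Xj)
    # S[m, n]: score of matching axis m of list i with axis n of list j.
    S = np.outer(lam_i, lam_j) * np.abs(U_i @ U_j.T)
    row, col = linear_sum_assignment(-S)         # maximum-weight matching
    return S[row, col].sum() / (np.linalg.norm(lam_i) * np.linalg.norm(lam_j))

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 3))
B = rng.normal(size=(40, 3))    # variable-length lists are fine
k_ab, k_ba = list_kernel(A, B), list_kernel(B, A)
# Symmetric by construction, and a list's self-similarity is maximal (1).
```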

  24. Evaluation in Machine Translation & Protein Prediction
      Ranker Propagation (with the List Kernel) outperforms the supervised baseline (MERT linear ranker):
        Arabic translation:  Baseline (MERT) 24.3  /  Ranker Propagation 25.6 *
        Italian translation: Baseline (MERT) 21.2  /  Ranker Propagation 22.3 *
        Protein prediction:  Baseline (MERT) 58.1  /  Ranker Propagation 59.1 *
      * indicates a statistically significant improvement (p<0.05) over the baseline
