  1. Learning to Rank with Partially-Labeled Data. Kevin Duh, University of Washington

  2. The Ranking Problem
     • Definition: Given a set of objects, sort them by preference.
     • A ranking function (obtained via machine learning) maps an unordered set of objects (objectA, objectB, objectC) to an ordered list.

  3. Application: Web Search
     You enter “uw” into the search box. All webpages containing the term “uw” are retrieved; results are presented to the user after ranking (1st, 2nd, 3rd, ...).

  4. Application: Machine Translation
     A basic 1st-pass decoder (using translation/language models) produces an N-best list:
       1st: The vodka is good, but the meat is rotten
       2nd: The spirit is willing but the flesh is weak
       3rd: The vodka is good.
     An advanced ranker (re-ranker) then reorders the list:
       1st: The spirit is willing but the flesh is weak
       2nd: The vodka is good, but the meat is rotten
       3rd: The vodka is good.

  5. Application: Protein Structure Prediction
     Amino acid sequence: MMKLKSNQTRTYDGDGYKKRAACLCFSE
     Various protein-folding simulations generate candidate 3-D structures; a ranker then orders the candidates (1st, 2nd, 3rd, ...).

  6. Goal of this thesis
     Supervised: Labeled Data → Learning Algorithm → Ranking function f(x)
     Semi-supervised: Labeled Data + Unlabeled Data → Learning Algorithm → Ranking function f(x)
     Can we build a better ranker by adding cheap, unlabeled data?

  7. Emerging field
     Semi-supervised ranking is an emerging field, sitting at the intersection of semi-supervised classification and supervised ranking.

  8. Outline
     1. Problem Setup
        1. Background in ranking
        2. Two types of partially-labeled data
        3. Methodology
     2. Manifold Assumption
     3. Local/Transductive Meta-Algorithm
     4. Summary
     Problem Setup | Manifold | Local/Transductive | Summary

  9. Ranking as Supervised Learning Problem
     Query: UW
       x_1^(i) = [tfidf, pagerank, ...]   label: 3
       x_2^(i) = [tfidf, pagerank, ...]   label: 1
       x_3^(i) = [tfidf, pagerank, ...]   label: 2
     Query: Seattle Traffic
       x_1^(j) = [tfidf, pagerank, ...]   label: 2
       x_2^(j) = [tfidf, pagerank, ...]   label: 1

  10. Ranking as Supervised Learning Problem
      Train F(x) such that the scores respect the labels:
        F(x_1^(1)) > F(x_3^(1)) > F(x_2^(1))   (Query: UW)
        F(x_1^(2)) > F(x_2^(2))                (Query: Seattle Traffic)
      At test time (Query: MSR), labels are unknown (?) and the learned F(x) must rank the documents.
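
The training criterion above, learning F(x) = w·x so that scores respect the preference pairs within each query, can be sketched with a perceptron-style pairwise update. The features, labels, and hyperparameters below are invented for illustration, not taken from the thesis.

```python
import numpy as np

# Perceptron-style pairwise ranking: learn w so that F(x) = w.x orders
# documents within each query according to their relevance labels.
# Toy values throughout; not the thesis's ranker or data.

def train_pairwise(queries, n_epochs=50, lr=0.1):
    """queries: list of (X, y), X an (n_docs, n_feats) array,
    y graded relevance labels (higher = more relevant)."""
    w = np.zeros(queries[0][0].shape[1])
    for _ in range(n_epochs):
        for X, y in queries:
            for a in range(len(y)):
                for b in range(len(y)):
                    # Mis-ordered preference pair: nudge w to fix it.
                    if y[a] > y[b] and X[a].dot(w) <= X[b].dot(w):
                        w += lr * (X[a] - X[b])
    return w

# One toy query with two features (say, tfidf and pagerank).
X = np.array([[0.9, 0.8], [0.2, 0.1], [0.5, 0.4]])
y = np.array([3, 1, 2])
w = train_pairwise([(X, y)])
scores = X.dot(w)   # the induced ranking should now match the labels
```

After training, sorting by `scores` reproduces the label order 3 > 2 > 1 on this toy query.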

  11. Semi-supervised Data: Some labels are missing
      Query: UW
        x_1^(i) = [tfidf, pagerank, ...]   label: 3
        x_2^(i) = [tfidf, pagerank, ...]   label: 1
        x_3^(i) = [tfidf, pagerank, ...]   label: missing
      Query: Seattle Traffic
        x_1^(j) = [tfidf, pagerank, ...]   label: missing
        x_2^(j) = [tfidf, pagerank, ...]   label: missing

  12. Two kinds of Semi-supervised Data
      1. Lack of labels for some documents (depth): in every query, some documents are unlabeled.
         Query1/Query2/Query3: Doc1 label, Doc2 label, Doc3 ?
         References: Amini+, SIGIR’08; Agarwal, ICML’06; Wang+, MSRA TechRep’05; Zhou+, NIPS’04; He+, ACM Multimedia ’04
      2. Lack of labels for some queries (breadth): some queries are entirely unlabeled.
         Query1/Query2: Doc1, Doc2, Doc3 all labeled; Query3: Doc1, Doc2, Doc3 all ?
         References: this thesis; Duh & Kirchhoff, SIGIR’08; Truong+, ICMIST’06

  13. Why the “Breadth” Scenario
      • Information Retrieval: long tail of search queries. “20-25% of the queries we will see today, we have never seen before” – Udi Manber (Google VP), May 2007
      • Machine Translation and Protein Prediction: references are costly, but given a reference, computing labels is trivial (e.g. candidate 1: similarity=0.3; candidate 2: similarity=0.9).
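
The second bullet, that labels are trivial to compute once a reference exists, can be illustrated with a toy similarity: each candidate is scored by token overlap with the reference. This stands in for metrics like BLEU or GDT-TS; the function below is a hypothetical simplification, not the thesis's metric.

```python
# Toy stand-in for reference-based labeling: score each candidate by
# token overlap (Dice similarity) with the reference. Real systems use
# BLEU (translation) or GDT-TS (protein); this function is illustrative.
def overlap_similarity(candidate, reference):
    c, r = candidate.split(), reference.split()
    common = sum(min(c.count(t), r.count(t)) for t in set(c))
    return 2 * common / (len(c) + len(r))

reference = "the spirit is willing but the flesh is weak"
candidates = ["the vodka is good but the meat is rotten",
              "the spirit is willing but the flesh is weak"]
labels = [overlap_similarity(c, reference) for c in candidates]
# The exact match gets label 1.0; the paraphrase gets a lower score.
```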

  14. Methodology of this thesis
      1. Make an assumption about how unlabeled lists can be useful, borrowing ideas from semi-supervised classification.
      2. Design a method to implement it: 4 unlabeled-data assumptions & 4 methods.
      3. Test on various datasets, analyzing when a method works and doesn't work.

  15. Datasets
      Information Retrieval datasets
      - from the LETOR distribution [Liu’07]
      - TREC: web search / OHSUMED: medical search
      - Evaluation: MAP (measures how high relevant documents appear on the list)

                              OHSUMED  TREC 2003  TREC 2004  Arabic transl.  Italian transl.  Protein pred.
      # lists                      50         75        100             500              500            100
      label type             2 levels   2 levels   3 levels      continuous       continuous     continuous
      avg # objects per list     1000       1000        150             260              360            120
      # features                   44         44         25               9               10             25
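
MAP, the evaluation measure named above, averages the precision at each rank holding a relevant document and then takes the mean over queries. A minimal sketch of per-query average precision, on invented relevance judgments:

```python
# Average precision for one ranked list: the mean of the precision
# values at the ranks that hold relevant documents. MAP is this
# averaged over queries. Relevance judgments below are toy values.
def average_precision(rels):
    hits, precs = 0, []
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precs.append(hits / rank)
    return sum(precs) / len(precs) if precs else 0.0

# Ranked list: relevant, not, relevant, not.
ap = average_precision([1, 0, 1, 0])   # (1/1 + 2/3) / 2
```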

  16. Datasets
      Machine Translation datasets
      - from the IWSLT 2007 competition, UW system [Kirchhoff’07]
      - translation in the travel domain
      - Evaluation: BLEU (measures word match to the reference)
      (Dataset statistics: same table as the previous slide.)

  17. Datasets
      Protein Prediction dataset
      - from the CASP competition [Qiu/Noble’07]
      - Evaluation: GDT-TS (measures closeness to the true 3-D structure)
      (Dataset statistics: same table as the previous slide.)

  18. Outline
      1. Problem Setup
      2. Manifold Assumption
         • Definition
         • Ranker Propagation method
         • List Kernel similarity
      3. Local/Transductive Meta-Algorithm
      4. Summary

  19. Manifold Assumption in Classification
      - Unlabeled data can help discover the underlying data manifold
      - Labels vary smoothly over this manifold
      Prior work:
      1. How to assign labels to test samples?
         - Mincut [Blum01]
         - Label propagation [Zhu03]
         - Regularizer + optimization [Belkin03]
      2. How to construct the graph?
         - k-nearest neighbors, eps-ball
         - data-driven methods [Argyriou05, Alexandrescu07]
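
The graph-construction step mentioned under prior work can be sketched as a k-nearest-neighbor graph with Gaussian edge weights; k and the bandwidth sigma below are arbitrary illustrative choices, not values from the thesis.

```python
import numpy as np

# Minimal kNN-graph builder: connect each point to its k nearest
# neighbors with Gaussian edge weights, then symmetrize. Labels (or,
# later in the talk, rankers) are required to vary smoothly over this
# graph. Point coordinates are toy values.
def knn_graph(X, k=2, sigma=1.0):
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]        # skip self at distance 0
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                    # symmetrize

# Two tight clusters far apart.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
W = knn_graph(X)
# Within-cluster edges carry far more weight than cross-cluster ones.
```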

  20. Manifold Assumption in Ranking
      Ranking functions vary smoothly over the manifold. Each node is a list; edges represent “similarity” between two lists.

  21. Ranker Propagation
      Algorithm:
      1. For each training list i, fit a linear ranker w^(i):  F(x) = w^T x,  w ∈ R^d, x ∈ R^d
      2. Minimize the objective
           sum_{(i,j) ∈ edges} K_ij ||w^(i) - w^(j)||^2
         where w^(i) is the ranker for list i and K_ij is the similarity between lists i and j.
      The closed-form solution for the unlabeled lists is
           W^(u) = -inv(L^(uu)) L^(ul) W^(l)
      where L is the graph Laplacian of the similarity graph, partitioned into labeled (l) and unlabeled (u) blocks.
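
The closed form for the unlabeled rankers can be checked numerically. The sketch below builds the graph Laplacian L = D - K and solves W^(u) = -inv(L^(uu)) L^(ul) W^(l); the similarity matrix and per-list rankers are made up for illustration.

```python
import numpy as np

# Ranker Propagation closed form: given rankers W_l fit on labeled
# lists and a list-similarity matrix K, the smoothness objective
# sum_{ij} K_ij ||w^(i) - w^(j)||^2 is minimized over the unlabeled
# rankers by W_u = -inv(L_uu) @ L_ul @ W_l, with L = D - K.
def propagate_rankers(K, W_l, labeled_idx, unlabeled_idx):
    D = np.diag(K.sum(axis=1))
    L = D - K                                   # graph Laplacian
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    return -np.linalg.solve(L_uu, L_ul @ W_l)

# 3 lists: 0 and 1 labeled, 2 unlabeled. Toy similarities.
K = np.array([[0.0, 0.1, 0.8],
              [0.1, 0.0, 0.2],
              [0.8, 0.2, 0.0]])
W_l = np.array([[1.0, 0.0],    # ranker fit on list 0
                [0.0, 1.0]])   # ranker fit on list 1
W_u = propagate_rankers(K, W_l, [0, 1], [2])
# The propagated ranker is a similarity-weighted blend of the labeled
# rankers, pulled toward list 0 because K[2,0] > K[2,1].
```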

  22. Similarity between lists: desirable properties
      • Maps two lists of feature vectors to a scalar, e.g. K(list i, list j) = 0.7
      • Works on variable-length lists (different N in N-best)
      • Satisfies the symmetry and positive semi-definiteness properties of a kernel
      • Measures rotation/shape differences between lists

  23. List Kernel
      Step 1: Run PCA on list i and list j, obtaining principal axes u_1^(i), u_2^(i), ... and u_1^(j), u_2^(j), ... with eigenvalues λ^(i), λ^(j).
      Step 2: Compute similarities between pairs of axes, e.g. λ_2^(i) λ_2^(j) |<u_2^(i), u_2^(j)>|.
      Step 3: Find the maximum bipartite matching a over axes:
           K(i,j) = max_a sum_{m=1}^{M} λ_m^(i) λ_{a(m)}^(j) |<u_m^(i), u_{a(m)}^(j)>| / (||λ^(i)|| · ||λ^(j)||)
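
The three steps can be sketched as follows. The normalization by ||λ^(i)|| ||λ^(j)|| is my reading of the slide, so treat this as an illustrative reimplementation rather than the thesis's exact kernel; scipy's linear_sum_assignment provides the maximum-weight bipartite matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# List kernel sketch: PCA on each list, eigenvalue-weighted axis
# similarities, best bipartite matching of axes, normalized by the
# eigenvalue vector norms.

def pca_axes(X):
    Xc = X - X.mean(axis=0)
    lam, U = np.linalg.eigh(Xc.T @ Xc / len(X))  # ascending eigenvalues
    return lam[::-1], U[:, ::-1].T               # descending; rows = axes

def list_kernel(Xi, Xj):
    lam_i, U_i = pca_axes(Xi)
    lam_j, U_j = pca_axes(Xj)
    # S[m, n]: score of matching axis m of list i with axis n of list j.
    S = np.outer(lam_i, lam_j) * np.abs(U_i @ U_j.T)
    row, col = linear_sum_assignment(-S)         # maximum-weight matching
    return S[row, col].sum() / (np.linalg.norm(lam_i) * np.linalg.norm(lam_j))

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 3))
B = rng.normal(size=(40, 3))    # variable-length lists are fine
k_ab, k_ba = list_kernel(A, B), list_kernel(B, A)
# Symmetric by construction, and a list's self-similarity is maximal (1).
```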

  24. Evaluation in Machine Translation & Protein Prediction
      Ranker Propagation (with the List Kernel) outperforms the supervised baseline (MERT linear ranker):
        Arabic translation:  Baseline (MERT) 24.3  /  Ranker Propagation 25.6 *
        Italian translation: Baseline (MERT) 21.2  /  Ranker Propagation 22.3 *
        Protein prediction:  Baseline (MERT) 58.1  /  Ranker Propagation 59.1 *
      * indicates a statistically significant improvement (p<0.05) over the baseline
