


Ranking and Learning

290N UCSB, Tao Yang, 2013. Partially based on Manning, Raghavan, and Schütze's textbook.

Table of Contents

  • Weighted scoring for ranking
  • Learning to rank: A simple example
  • Learning to rank as classification

Scoring

  • Similarity-based approach
  • Similarity of query features with document features
  • Weighted approach: scoring with weighted features
  • Return, in order, the documents most likely to be useful to the searcher
  • Consider that each document has subscores in each feature or in each subarea.

Simple Model of Ranking with Similarity


Similarity ranking: example

Weighted scoring with linear combination

  • A simple weighted scoring method: use a linear combination of subscores:
  • E.g.,

Score = 0.6*<Title score> + 0.3*<Abstract score> + 0.1*<Body score>

  • The overall score is in [0,1].

Example with binary subscores

A query term appears in the title and body only. Document score: (0.6 · 1) + (0.3 · 0) + (0.1 · 1) = 0.7.

Example

  • On the query “bill rights” suppose that we retrieve the following docs from the various zone indexes:

[Table: postings of “bill” and “rights” in the Abstract, Title, and Body zone indexes.]

  • Compute the score for each doc based on the weightings 0.6, 0.3, 0.1
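A minimal sketch of this computation, with hypothetical postings lists (not the slide's table) and the 0.6/0.3/0.1 weights; treating a zone's subscore as 1 only when every query term appears in that zone is our assumption:

```python
# Weighted zone scoring sketch. Postings lists are hypothetical,
# not the slide's table. Subscore convention (our assumption):
# a zone scores 1 for a doc iff every query term appears there.

WEIGHTS = {"title": 0.6, "abstract": 0.3, "body": 0.1}

postings = {  # zone -> term -> sorted doc IDs (illustrative only)
    "title":    {"bill": [1, 5],    "rights": [3, 5, 9]},
    "abstract": {"bill": [2, 8],    "rights": [2, 5]},
    "body":     {"bill": [1, 5, 8], "rights": [3, 9]},
}

def zone_score(doc_id, query_terms):
    """Linear combination of binary per-zone subscores."""
    return sum(
        weight
        for zone, weight in WEIGHTS.items()
        if all(doc_id in postings[zone][t] for t in query_terms)
    )

docs = {d for zone in postings.values() for plist in zone.values() for d in plist}
for d in sorted(docs):
    print(d, zone_score(d, ["bill", "rights"]))
```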

How to determine weights automatically: Motivation

  • Modern systems – especially on the Web – use a great number of features:

– Arbitrary useful features – not a single unified model

  • Log frequency of query word in anchor text?
  • Query word highlighted on page?
  • Span of query words on page?
  • # of (out) links on page?
  • PageRank of page?
  • URL length?
  • URL contains “~”?
  • Page edit recency?
  • Page length?

  • Major web search engines use “hundreds” of such features – and they keep changing


Machine learning for computing weights

  • How do we combine these signals into a good ranker?
  • “Machine-learned relevance” or “learning to rank”
  • Learning from examples
  • These examples are called training data
  • Sec. 15.4

[Diagram: training examples are used to learn a ranking formula, which maps a user query and its matched results to ranked results.]

Learning weights: Methodology

  • Given a set of training examples,
  • each of which contains (query q, document d, relevance score r(d,q))
  • r(d,q) is the relevance judgment for d on q
  • Simplest scheme: relevant (1) or nonrelevant (0)
  • More sophisticated: graded relevance judgments
  • 1 (Bad), 2 (Fair), 3 (Good), 4 (Excellent), 5 (Perfect)
  • Learn weights from these examples, so that the learned scores approximate the relevance judgments in the training examples

Simple example

  • Each doc has two zones, Title and Body
  • For a chosen w ∈ [0,1], the score for doc d on query q is

Score(d, q) = w · sT(d, q) + (1 − w) · sB(d, q)

where sT(d, q) ∈ {0,1} is a Boolean denoting whether q matches the Title and sB(d, q) ∈ {0,1} is a Boolean denoting whether q matches the Body
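A minimal sketch of this score, assuming nothing beyond the formula above (the function name is ours):

```python
def two_zone_score(w, s_title, s_body):
    """Score(d, q) = w * sT(d, q) + (1 - w) * sB(d, q), for w in [0, 1]."""
    return w * s_title + (1 - w) * s_body

print(two_zone_score(0.7, 1, 0))  # query matches Title only -> 0.7
print(two_zone_score(0.7, 0, 1))  # query matches Body only  -> 0.3
```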

Learning w from training examples

slide-4
SLIDE 4

4

How?

  • For each training example t we can compute the score based on this formula
  • We quantify Relevant as 1 and Nonrelevant as 0
  • We would like the choice of w to be such that the computed scores are as close to these 1/0 judgments as possible
  • Denote by r(dt, qt) the judgment for t
  • Then minimize the total squared error

Σt ( r(dt, qt) − Score(dt, qt) )²

Optimizing w

  • There are 4 kinds of training examples
  • Thus only four possible values for the score
  • And only 8 possible values for the error
  • Let n01r be the number of training examples for which sT(d, q) = 0, sB(d, q) = 1, judgment = Relevant.
  • Similarly define n00r, n10r, n11r, n00i, n01i, n10i, n11i

Error contributed by the examples with sT = 0, sB = 1 (their score is 1 − w):

[1 − (1 − w)]² n01r + (1 − w)² n01i

Total error – then calculus

  • Add up the contributions from the various cases to get the total error
  • Now differentiate with respect to w to get the optimal value of w as:

w = (n10r + n01i) / (n10r + n10i + n01r + n01i)
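A minimal sketch of this closed-form solution; the counts are made up for illustration:

```python
def optimal_w(n10r, n10i, n01r, n01i):
    """w minimizing the total squared error; the n00*/n11* counts
    contribute constant error terms and so do not affect w."""
    return (n10r + n01i) / (n10r + n10i + n01r + n01i)

print(optimal_w(n10r=30, n10i=10, n01r=15, n01i=25))  # 0.6875
```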

Generalizing this simple example

  • More (than 2) features
  • Non-Boolean features
  • What if the title contains some but not all query terms …
  • Categorical features (query terms occur in plain, boldface, italics, etc.)
  • Scores that are nonlinear combinations of features
  • Multilevel relevance judgments (Perfect, Good, Fair, Bad, etc.)
  • Complex error functions
  • Not always a unique, easily computable setting of the score parameters


Learning-based Web Search

  • Given a set of features e1, e2, …, eN, learn a ranking function f(e1, e2, …, eN) that minimizes the loss function L:

f* = argmin_{f ∈ F} L( f(e1, e2, …, eN), GroundTruth )

  • Some related issues
  • The functional space F

– linear/non-linear? continuous? differentiable?

  • The search strategy
  • The loss function
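As one illustrative instance of this framework, the sketch below fits a linear f under a squared loss with plain gradient descent; the data, the choice of a linear f, and the squared loss are all our assumptions:

```python
# Fit f(e) = w . e by gradient descent on squared loss against
# relevance judgments. All data here is made up for illustration.
train = [  # (feature vector e, relevance judgment)
    ([0.9, 0.2, 0.5], 1.0),
    ([0.1, 0.8, 0.3], 0.0),
    ([0.7, 0.6, 0.9], 1.0),
    ([0.2, 0.1, 0.4], 0.0),
]

w, lr = [0.0, 0.0, 0.0], 0.1
for _ in range(500):
    for e, r in train:
        pred = sum(wi * ei for wi, ei in zip(w, e))
        grad = pred - r  # derivative of 0.5*(pred - r)**2 w.r.t. pred
        w = [wi - lr * grad * ei for wi, ei in zip(w, e)]

print([round(wi, 3) for wi in w])
```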

Framework of Learning to Rank

A richer example

  • Collect a training corpus of (q, d, r) triples
  • Relevance r is still binary for now
  • Document is represented by a feature vector

– x = (α, ω), where α is the cosine similarity and ω is the minimum query window size

  • ω is the shortest text span that includes all query words (query term proximity in the document)
  • Train a machine learning model to predict the class r of a document-query pair
  • Sec. 15.4.1

Using classification for deciding relevance

  • A linear score function is

Score(d, q) = Score(α, ω) = aα + bω + c

  • And the linear classifier is

Decide relevant if Score(d, q) > θ

  • … just like when we were doing text classification
  • Sec. 15.4.1
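A minimal sketch of this decision rule; the coefficient and threshold values are made up:

```python
# Linear relevance classifier on x = (alpha, omega).
# a, b, c, theta are illustrative values, not fitted ones.
A, B, C, THETA = 0.05, -0.01, 0.0, 0.02  # b < 0: larger windows hurt

def score(alpha, omega):
    return A * alpha + B * omega + C

def is_relevant(alpha, omega):
    return score(alpha, omega) > THETA

print(is_relevant(alpha=0.9, omega=2))  # 0.025 > 0.02 -> True
print(is_relevant(alpha=0.3, omega=5))  # -0.035 > 0.02 -> False
```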

Using classification for deciding relevance

[Figure: training documents plotted by term proximity ω (x-axis) and cosine score α (y-axis); relevant (R) and nonrelevant (N) examples are separated by a linear decision surface.]

  • Sec. 15.4.1

Decision surface

More complex example of using classification for search ranking

[Nallapati SIGIR 2004]

  • We can generalize this to classifier functions over more features
  • We can use methods we have seen previously for learning the linear classifier weights

An SVM classifier for relevance

[Nallapati SIGIR 2004]

  • Let g(r|d,q) = w·f(d,q) + b
  • Derive the weights from the training examples:
  • want g(r|d,q) ≤ −1 for nonrelevant documents
  • and g(r|d,q) ≥ 1 for relevant documents
  • Testing:
  • decide relevant iff g(r|d,q) ≥ 0
  • Use an SVM classifier
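A minimal sketch of this pointwise SVM setup; scikit-learn is our choice of tool, and the feature vectors are made up:

```python
# Pointwise SVM for relevance: g(r|d,q) = w . f(d,q) + b, trained so
# relevant pairs score toward +1 and nonrelevant toward -1.
from sklearn.svm import LinearSVC

X = [  # f(d, q), e.g. (cosine score, query window size); illustrative
    [0.040, 2], [0.050, 3], [0.045, 2],   # relevant
    [0.010, 5], [0.020, 4], [0.015, 6],   # nonrelevant
]
y = [1, 1, 1, -1, -1, -1]

clf = LinearSVC(C=1.0).fit(X, y)

# Testing: decide relevant iff g(r|d,q) >= 0
print(clf.decision_function([[0.03, 3]])[0] >= 0)
```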

Ranking vs. Classification

  • Classification
  • Well studied for over 30 years
  • Bayesian, neural network, decision tree, SVM, boosting, …
  • Training data: points

– Pos: x1, x2, x3; Neg: x4, x5

  • Ranking
  • Less studied: only a few works published in recent years
  • Training data: pairs (a partial order)

– (x1, x2), (x1, x3), (x1, x4), (x1, x5)
– (x2, x3), (x2, x4) …
– …

[Diagram: the examples x1 … x5 arranged by preference order.]


Learning to rank: Classification vs. regression

  • Classification probably isn't the right way to think about score learning:
  • Classification problems: map to an unordered set of classes
  • Regression problems: map to a real value
  • Ordinal regression problems: map to an ordered set of classes
  • This formulation gives extra power:
  • Relations between relevance levels are modeled
  • Documents are good relative to other documents for a query over a given collection; there is no absolute scale of goodness
  • Sec. 15.4.2

“Learning to rank”

  • Assume a number of categories C of relevance exist
  • These are totally ordered: c1 < c2 < … < cJ
  • This is the ordinal regression setup
  • Assume training data is available consisting of document-query pairs represented as feature vectors ψi and relevance rankings ci

Modified example

  • Collect a training corpus of (q, d, r) triples
  • Relevance r here takes 4 values
  • Perfect, Relevant, Weak, Nonrelevant
  • Train a machine learning model to predict the class r of a document-query pair
  • Sec. 15.4.1

[Example result list with graded labels: Perfect, Nonrelevant, Relevant, Weak, Relevant, Perfect, Nonrelevant.]

“Learning to rank”

  • Point-wise learning
  • Given a query-document pair, predict a score (e.g., a relevance score)
  • Pair-wise learning
  • The input is a pair of results for a query, and the class is the relevance ordering relationship between them (see the sketch after this list)
  • List-wise learning
  • Directly optimize the ranking metric for each query
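A minimal sketch of building pair-wise training instances from graded judgments for one query; the grade scale, names, and data are ours:

```python
# Turn graded judgments into pairwise instances: for docs i, k of the
# same query with different grades, emit (psi_i - psi_k, +/-1).
from itertools import combinations

GRADE = {"Perfect": 4, "Relevant": 3, "Weak": 2, "Nonrelevant": 1}

docs = [  # (feature vector psi, judgment) for one query; illustrative
    ([0.9, 0.1], "Perfect"),
    ([0.6, 0.4], "Relevant"),
    ([0.2, 0.7], "Nonrelevant"),
]

pairs = []
for (pi, gi), (pk, gk) in combinations(docs, 2):
    if GRADE[gi] == GRADE[gk]:
        continue  # ties carry no ordering information
    diff = [a - b for a, b in zip(pi, pk)]
    pairs.append((diff, 1 if GRADE[gi] > GRADE[gk] else -1))

print(pairs)  # +1 means the first doc should rank ahead
```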


Point-wise learning: Example

  • Goal is to learn a threshold to separate each rank

The Ranking SVM: Pairwise Learning

[Herbrich et al. 1999, 2000; Joachims et al. KDD 2002]

  • Aim is to classify instance pairs as
  • correctly ranked
  • or incorrectly ranked
  • This turns an ordinal regression problem back into a binary classification problem
  • We want a ranking function f such that document i is ranked before document k:

ci ≺ ck iff f(ψi) > f(ψk)

  • Suppose that f is a linear function:

f(ψi) = w·ψi

  • Thus

ci ≺ ck iff w·(ψi − ψk) > 0

  • Sec. 15.4.2

Ranking SVM

  • Training set
  • For each query q, we have a ranked list of documents totally ordered by a person for relevance to the query.
  • Features
  • A vector of features for each document/query pair
  • Feature differences for two documents di and dj
  • Classification
  • If di is judged more relevant than dj, denoted di ≺ dj,
  • then assign the vector Φ(di, dj, q) the class yijq = +1;
  • otherwise −1.

Ranking SVM

  • The optimization problem is equivalent to that of a classification SVM on the pairwise difference vectors Φ(qk, di) − Φ(qk, dj)
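A minimal sketch of that equivalence: train an ordinary linear SVM on difference vectors (such as those built above), then rank by w·ψ; scikit-learn and all the values here are our choices:

```python
# Ranking SVM as classification on pairwise difference vectors.
from sklearn.svm import LinearSVC

# psi_i - psi_j with label +1 iff d_i should rank ahead of d_j;
# both orientations of each pair are included (illustrative data).
diffs = [[0.3, -0.3], [0.7, -0.6], [0.4, -0.3],
         [-0.3, 0.3], [-0.7, 0.6], [-0.4, 0.3]]
labels = [1, 1, 1, -1, -1, -1]

svm = LinearSVC(C=1.0, fit_intercept=False).fit(diffs, labels)
w = svm.coef_[0]  # f(psi) = w . psi is the learned ranking function

candidates = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.7]]
ranked = sorted(candidates,
                key=lambda psi: -sum(wi * pi for wi, pi in zip(w, psi)))
print(ranked)  # best-scoring feature vector first
```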