Chapter: Information Retrieval and Learning. Slides borrowed from the presentation by Tie-Yan Liu, Microsoft Research Asia.
Conventional Ranking Models
• Query-dependent
  – Boolean model, extended Boolean model, etc.
  – Vector space model, latent semantic indexing (LSI), etc.
  – BM25 model, statistical language model, etc.
• Query-independent
  – PageRank, TrustRank, BrowseRank, Toolbar Clicks, etc.
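As a concrete illustration of one of the query-dependent models listed above, here is a minimal BM25 scoring sketch. The function name, the way corpus statistics are passed in, and the parameter defaults k1 = 1.2 and b = 0.75 are illustrative assumptions, not part of the original slides.

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, N, k1=1.2, b=0.75):
    """Score one document for a query with the classic BM25 formula.

    query_terms : list of query terms
    doc_tf      : dict term -> term frequency in the document
    doc_len     : document length in tokens
    avg_doc_len : average document length in the collection
    df          : dict term -> document frequency in the collection
    N           : number of documents in the collection
    """
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or t not in df:
            continue
        # Smoothed idf (kept positive), multiplied by the saturated, length-normalized tf
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```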
Generative vs. Discriminative
• All of the probabilistic retrieval models presented so far (PRP, LM, inference model) fall into the category of generative models
  – A generative model assumes that documents were generated from some underlying model (in this case, usually a multinomial distribution) and uses training data to estimate the parameters of that model
  – The probability of belonging to a class (i.e. the relevant documents for a query) is then estimated using Bayes' rule and the document model
Discriminative model for IR
• Discriminative models can be trained using
  – explicit relevance judgments
  – or click data from query logs
• Click data is much cheaper, but also noisier
Relevance judgment
• Degree of relevance l_k
  – Binary: relevant vs. irrelevant
  – Multiple ordered categories: Perfect > Excellent > Good > Fair > Bad
• Pairwise preference l_{u,v}
  – Document A is more relevant than document B
• Total order π_l
  – Documents are ranked as {A, B, C, ...} according to their relevance
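These three kinds of ground truth can be written down directly as data; a minimal sketch, where all identifiers and values are illustrative:

```python
# Degree of relevance: one graded label per (query, document) pair
graded = {("q1", "dA"): 3, ("q1", "dB"): 1}   # e.g. Perfect=4 ... Bad=0

# Pairwise preference: for q1, document dA is preferred over document dB
pairwise = [("q1", "dA", "dB")]

# Total order: the full ranking of judged documents for q1
total_order = {"q1": ["dA", "dC", "dB"]}
```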
Learning to Rank (apprentissage de l'ordonnancement)
Machine learning can help
• Machine learning is an effective tool
  – to automatically tune parameters
  – to combine multiple sources of evidence
  – to avoid over-fitting (by means of regularization, etc.)
• "Learning to Rank"
  – In general, methods that use machine learning to solve the ranking problem are called "learning to rank" methods
Machine learning
• Given a training set of examples, each of which is a tuple of: a query q, a document d, and a relevance judgment for d on q
• Learn weights from this training set, so that the learned scores approximate the relevance judgments in the training set
Discriminative Training
• An automatic learning process based on the training data
• With the four pillars of discriminative learning
  – Input space (feature vectors)
  – Output space (+1/−1, real value, ranking)
  – Hypothesis space (function mapping the input to the output)
  – Loss function (risk, i.e. the error between the hypothesis and the ground truth)
Learning to rank: general approach
1. Collect training data (queries and their labeled documents)
2. Feature extraction for query-document pairs
3. Learn the ranking model by minimizing a loss function on the training data
4. Use the learned model to infer the ranking of documents for new queries
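A minimal sketch of this pipeline with a linear scoring model, leaving the feature extractor and the loss abstract; all names and the gradient-descent details are illustrative assumptions, not part of the original slides.

```python
from typing import Callable, Sequence
import numpy as np

def extract_features(query: str, doc: str) -> np.ndarray:
    """Turn a (query, document) pair into a feature vector (placeholder)."""
    raise NotImplementedError  # e.g. BM25 score, PageRank, term proximity, ...

def train(features: np.ndarray, labels: np.ndarray,
          loss_grad: Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray],
          lr: float = 0.1, epochs: int = 100) -> np.ndarray:
    """Learn a linear ranking model w by gradient descent on some loss."""
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        w -= lr * loss_grad(w, features, labels)
    return w

def rank(w: np.ndarray, query: str, docs: Sequence[str]) -> list[str]:
    """Use the learned model to infer the ranking of documents for a new query."""
    scores = [float(extract_features(query, d) @ w) for d in docs]
    return [d for _, d in sorted(zip(scores, docs), reverse=True)]
```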
Example of features [figure omitted]
Categorization: Basic Unit of Learning
• Pointwise
  – Input: single documents
  – Output: scores or class labels (relevant / non-relevant)
• Pairwise
  – Input: document pairs
  – Output: partial-order preferences
• Listwise
  – Input: document collections
  – Output: ranked document lists
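The three families differ mainly in what the loss function is computed over. Below is a minimal sketch of one representative loss per family, assuming a linear scorer s(x) = w·x; the hinge pairwise loss and the ListNet-style listwise loss are standard choices used here for illustration, not the only possibilities.

```python
import numpy as np

def pointwise_loss(w, X, y):
    """Squared error on individual documents (regression-style pointwise loss)."""
    return np.mean((X @ w - y) ** 2)

def pairwise_loss(w, X_pos, X_neg):
    """Hinge loss on document pairs: the preferred document should score higher."""
    margins = X_pos @ w - X_neg @ w
    return np.mean(np.maximum(0.0, 1.0 - margins))

def listwise_loss(w, X_list, y_list):
    """ListNet-style cross entropy between label and score top-one distributions."""
    p_label = np.exp(y_list) / np.sum(np.exp(y_list))
    p_score = np.exp(X_list @ w) / np.sum(np.exp(X_list @ w))
    return -np.sum(p_label * np.log(p_score))
```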
Categorization of the algorithms
The Pointwise Approach
• Input space: single documents x_j
• Output space: real values y_j (regression), non-ordered categories (classification), or ordinal categories (ordinal regression)
• Hypothesis space: scoring function f(x)
• Loss function L(f; x_j, y_j): regression loss, classification loss, or ordinal regression loss
The Pointwise Approach
• Reduce ranking on training examples (x_1, y_1), (x_2, y_2), ..., (x_m, y_m) to
  – Regression
    • Subset Ranking
  – Classification
    • Discriminative model for IR
    • McRank
  – Ordinal regression
    • PRanking
    • Ranking with large margin principles
Introduction to Information Retrieval, Sec. 15.4.1
Pointwise example
• Collect training examples as (q, d, y) triples
  – Relevance y is binary (it can also be graded)
  – Each (query, document) pair is represented by two features
• The vector x = (α, ω)
  – α is the similarity between q and d
  – ω is the proximity of the query terms within the document: the size of the smallest window of document text that contains all the query terms
• Two example approaches:
  – Linear regression
  – Classification
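A minimal sketch of how the two features might be computed. The definitions follow the slide (α = query-document similarity, ω = smallest window of document text containing all query terms), but the concrete implementation details are assumptions.

```python
import math
from collections import Counter

def cosine_similarity(query_terms, doc_terms):
    """alpha: cosine similarity between query and document term-frequency vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def min_window(query_terms, doc_terms):
    """omega: size of the smallest passage of the document containing all query terms."""
    needed = set(query_terms)
    best = len(doc_terms) + 1
    for start in range(len(doc_terms)):
        seen = set()
        for end in range(start, len(doc_terms)):
            if doc_terms[end] in needed:
                seen.add(doc_terms[end])
                if seen == needed:                 # window [start, end] covers the query
                    best = min(best, end - start + 1)
                    break
    return best if best <= len(doc_terms) else 0   # 0 if some query term is missing
```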
Pointwise approach: linear regression
• Relevance is treated as a score value
• Goal: learn the scoring function that combines the different features
  f(x) = \sum_{i=1}^{m} w_i x_i + w_0
  – w: the weights, tuned by learning
  – (x_1, ..., x_m): the features of the query-document pair
• Find the w_i that minimize the squared error
  L(f) = \frac{1}{2} \sum_{i=1}^{n} (y_i - f(x_i))^2, with per-example loss L(f; x_i, y_i) = (f(x_i) - y_i)^2
• where y = 1 for relevant and y = 0 for non-relevant
Regression example
• Learn a scoring function that combines the two features (x_1, x_2) = (α, ω):
  f(d, q) = w_1 · α(d, q) + w_2 · ω(d, q) + w_0
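A minimal sketch that fits w_0, w_1, w_2 by least squares on a handful of labeled (α, ω, y) triples; the toy data is made up for illustration.

```python
import numpy as np

# Toy training triples: (alpha, omega, relevance) -- made-up illustrative values
data = np.array([
    [0.90, 3, 1],
    [0.75, 4, 1],
    [0.40, 20, 0],
    [0.20, 35, 0],
])
X = np.hstack([data[:, :2], np.ones((len(data), 1))])  # append a bias column for w0
y = data[:, 2]

# Least-squares solution minimizing sum_i (y_i - f(x_i))^2
w1, w2, w0 = np.linalg.lstsq(X, y, rcond=None)[0]

def f(alpha, omega):
    """Learned scoring function f(d, q) = w1*alpha + w2*omega + w0."""
    return w1 * alpha + w2 * omega + w0

print(f(0.8, 5))   # higher score -> predicted more relevant
```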
Pointwise approach: classification (SVM)
• Reduces IR to a classification problem:
  – a query, a document, a class (relevant / non-relevant; more categories are possible)
• We look for a decision function of the form
  f(x) = sign(⟨w, x⟩ + b)
• We want ⟨w, x⟩ + b ≤ −1 for non-relevant documents and ⟨w, x⟩ + b ≥ 1 for relevant ones
Support Vector Machines
• Find a linear hyperplane (decision boundary) that separates the data
• Many hyperplanes are possible solutions (e.g. B1, B2)
• Which one is better, B1 or B2? How do you define "better"?
• Find the hyperplane that maximizes the margin: B1 is better than B2
• The training points closest to the decision boundary are the support vectors
[figures omitted: two candidate hyperplanes B1 and B2 with their margin boundaries b11/b12 and b21/b22]
Support Vector Machines: margin
• Decision boundary: ⟨w, x⟩ + b = 0, with margin hyperplanes ⟨w, x⟩ + b = +1 and ⟨w, x⟩ + b = −1
• Decision function:
  f(x) = +1 if ⟨w, x⟩ + b ≥ 1
  f(x) = −1 if ⟨w, x⟩ + b ≤ −1
• Margin width: M = ⟨(x⁺ − x⁻), w/‖w‖⟩ = 2/‖w‖
Linear SVM
• Goal:
  1) Correctly classify all the training data:
     w · x_i + b ≥ +1 if y_i = +1
     w · x_i + b ≤ −1 if y_i = −1
     i.e. y_i (w · x_i + b) ≥ 1 for all i
  2) Maximize the margin M = 2/‖w‖, which is the same as minimizing (1/2) wᵀw
• We can formulate this as a quadratic optimization problem and solve for w and b:
  Minimize (1/2) wᵀw subject to y_i (w · x_i + b) ≥ 1 for all i
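As a quick, illustrative numeric check of the margin formula (the numbers are made up): with w = (3, 4)ᵀ and b = −5, ‖w‖ = √(3² + 4²) = 5, so the margin width is M = 2/‖w‖ = 0.4, and every training point must satisfy y_i (3 x_{i1} + 4 x_{i2} − 5) ≥ 1.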
Linear SVM (non-separable case)
• Noisy data, outliers, etc.: introduce slack variables ξ_i
• f(x) = +1 if ⟨w, x⟩ + b ≥ 1 − ξ_i
  f(x) = −1 if ⟨w, x⟩ + b ≤ −1 + ξ_i
SVM: hard margin vs. soft margin
• The old (hard-margin) formulation:
  Find w and b that minimize (1/2) wᵀw subject to y_i (wᵀx_i + b) ≥ 1 for all (x_i, y_i)
• The new formulation incorporating slack variables (soft margin):
  Find w and b that minimize (1/2) wᵀw + C Σ_i ξ_i subject to y_i (wᵀx_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i
• The parameter C can be viewed as a way to control over-fitting
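A minimal sketch of training a soft-margin linear SVM as a pointwise relevance classifier on the (α, ω) features, with C trading margin width against slack. The toy data is made up, and scikit-learn is only one possible implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy (alpha, omega) feature vectors with binary relevance labels (+1 / -1)
X = np.array([[0.90, 3], [0.75, 4], [0.85, 2], [0.40, 20], [0.20, 35], [0.30, 28]])
y = np.array([1, 1, 1, -1, -1, -1])

# C controls the slack penalty: a large C penalizes misclassification more heavily
clf = LinearSVC(C=1.0)
clf.fit(X, y)

# Documents for a new query can be ranked by their signed distance to the hyperplane
scores = clf.decision_function(np.array([[0.80, 5], [0.25, 30]]))
print(scores)  # higher score -> more confidently relevant
```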
Sec. 15.4.2
Learning to rank
• Classification (regression) probably isn't the right way to think about approaching ad hoc IR:
  – Classification problems: map to an unordered set of classes
  – Regression problems: map to a real value
  – Ordinal regression problems: map to an ordered set of classes
    • A fairly obscure sub-branch of statistics, but what we want here
• This formulation gives extra power:
  – Relations between relevance levels are modeled
  – Documents are good relative to other documents for a query in a given collection, not on an absolute scale of goodness
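A minimal sketch of the ordinal-regression view: one scoring function plus a set of increasing thresholds that cut the real line into ordered relevance levels. PRanking learns the thresholds jointly with the weights; here both are fixed, illustrative values.

```python
import numpy as np

def ordinal_predict(w, thresholds, x):
    """Map a document score w.x to an ordered relevance level (0 = Bad ... 4 = Perfect).

    thresholds must be increasing; the level is the number of thresholds the score exceeds.
    """
    score = float(np.dot(w, x))
    return int(np.sum(score > np.asarray(thresholds)))

# Illustrative (not learned) weights on (alpha, omega) and threshold cuts
w = np.array([2.0, -0.05])
thresholds = [0.0, 0.5, 1.0, 1.5]   # 4 cuts -> 5 ordered levels

print(ordinal_predict(w, thresholds, np.array([0.9, 3])))  # -> level 4
```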