  1. Bayesian Classifiers
LM, session 2
CS6200: Information Retrieval
Slides by: Jesse Anderton

  2. Ranking with Probabilistic Models

Imagine we have a function that gives us the probability that a document D is relevant to a query Q: P(R=1 | D, Q). We call this function a probabilistic model, and we can rank documents by decreasing probability of relevance. There are many useful models, which differ in things like:
• Sensitivity to different document properties, like grammatical context
• Amount of training data needed to train the model parameters
• Ability to handle noise in document data or relevance labels
For simplicity here, we will hold the query constant and consider P(R=1 | D).

  3. The Flaw in our Plan

Suppose we have documents and relevance labels, and we want to empirically measure P(R=1 | D). Each document has only one relevance label, so every probability is either P(R=1 | D) = 1 or P(R=1 | D) = 0. Worse, there is no way to generalize to new documents. Instead, we estimate the probability of documents given relevance labels, P(D | R=1).

Relevance labels: D=1: R=1, D=2: R=1, D=3: R=0, D=4: R=0, D=5: R=0

             D=1   D=2   D=3   D=4   D=5
P(D | R=1)   1/2   1/2    0     0     0
P(D | R=0)    0     0    1/3   1/3   1/3
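
To make the estimate concrete, here is a minimal sketch in Python of the empirical estimate P(D | R=r) over the labeled toy collection above (the function name is an illustration, not from the slides):

```python
from collections import Counter

# The five labeled documents from the table above.
docs   = [1, 2, 3, 4, 5]    # document IDs
labels = [1, 1, 0, 0, 0]    # relevance label R for each document

def p_doc_given_rel(d, r):
    """Empirical P(D=d | R=r): among documents labeled r,
    the fraction that are document d."""
    matching = [doc for doc, rel in zip(docs, labels) if rel == r]
    return Counter(matching)[d] / len(matching)

print(p_doc_given_rel(1, 1))   # 0.5, matching P(D=1 | R=1) = 1/2
print(p_doc_given_rel(3, 0))   # 0.333..., matching P(D=3 | R=0) = 1/3
```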

  4. Bayes’ Rule

We can estimate P(D | R=1), not P(R=1 | D), so we apply Bayes’ Rule to estimate document relevance.

P(R=1 | D) = P(D | R=1) P(R=1) / P(D)
           = P(D | R=1) P(R=1) / Σ_r P(D | R=r) P(R=r)

• P(D | R=1) gives the probability that a relevant document would have the properties encoded by the random variable D.
• P(R=1) is the probability that a randomly-selected document is relevant.
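
As a quick numeric check, a sketch applying the rule to the toy estimates from slide 3 (the values are read off that table):

```python
# Estimates from the slide 3 table.
p_d_given_r1 = 1/2    # P(D=1 | R=1)
p_d_given_r0 = 0.0    # P(D=1 | R=0)
p_r1, p_r0   = 2/5, 3/5

# Denominator expanded as the sum over r: P(D) = sum_r P(D|R=r) P(R=r)
p_d = p_d_given_r1 * p_r1 + p_d_given_r0 * p_r0

print(p_d_given_r1 * p_r1 / p_d)   # 1.0: document 1 appears only with R=1
```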

  5. Bayesian Classification

Starting from Bayes’ Rule, we can easily build a classifier to tell us whether documents are relevant. We will say a document is relevant if:

P(R=1 | D) > P(R=0 | D)
⟹ P(D | R=1) P(R=1) / P(D) > P(D | R=0) P(R=0) / P(D)
⟹ P(D | R=1) / P(D | R=0) > P(R=0) / P(R=1)

We can estimate P(D | R=1) and P(D | R=0) using a language model, and P(R=0) and P(R=1) based on the query, or using a constant. Note that for large web collections, P(R=1) is very small for virtually any query.
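
A minimal sketch of the decision rule in its likelihood-ratio form (the function name and the zero-denominator handling are assumptions added for illustration):

```python
def is_relevant(p_d_rel, p_d_nonrel, p_r1, p_r0):
    """Classify as relevant iff P(D|R=1) / P(D|R=0) > P(R=0) / P(R=1)."""
    if p_d_nonrel == 0:
        # Infinite likelihood ratio: relevant as long as P(D|R=1) > 0.
        return p_d_rel > 0
    return p_d_rel / p_d_nonrel > p_r0 / p_r1
```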

  6. Unigram Language Model

In order to put this together, we need a language model to estimate P(D | R). Let’s start with a model based on the bag-of-words assumption. We’ll represent a document as a collection of independent words (“unigrams”), D = (w_1, w_2, ..., w_n).

P(D | R) = P(w_1, w_2, ..., w_n | R)
         = P(w_1 | R) P(w_2 | R, w_1) P(w_3 | R, w_1, w_2) ... P(w_n | R, w_1, ..., w_{n-1})
         = P(w_1 | R) P(w_2 | R) ... P(w_n | R)    (by the independence assumption)
         = ∏_{i=1}^{n} P(w_i | R)
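
In practice the product is computed in log space to avoid floating-point underflow on long documents; a sketch, assuming term_probs maps each word to its estimated P(w | R):

```python
import math

def log_p_doc_given_rel(words, term_probs):
    """Unigram model: log P(D|R) = sum over i of log P(w_i | R).
    words is the document as a list of tokens."""
    return sum(math.log(term_probs[w]) for w in words)
```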

  7. Example

Let’s consider querying a collection of five short documents with a simplified vocabulary: the only words are apple, baker, and crab.

Document                 Rel?  apple?  baker?  crab?
apple apple crab apple    1      1       0       1
crab baker crab           0      0       1       1
apple baker baker         1      1       1       0
crab crab apple           0      1       0       1
baker baker crab          0      0       1       1

Term    # Rel   # Non-Rel   P(w | R=1)   P(w | R=0)
apple     2         1          2/2          1/3
baker     1         2          1/2          2/3
crab      1         3          1/2          3/3

P(R=1) = 2/5      P(R=0) = 3/5
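
These statistics can be reproduced mechanically. A sketch that counts, for each term, the fraction of relevant and of non-relevant documents containing it (note the estimates are over term presence per document, not term frequency):

```python
docs = [
    ("apple apple crab apple", 1),
    ("crab baker crab",        0),
    ("apple baker baker",      1),
    ("crab crab apple",        0),
    ("baker baker crab",       0),
]
vocab = ["apple", "baker", "crab"]

n_rel    = sum(1 for _, r in docs if r == 1)
n_nonrel = len(docs) - n_rel

for term in vocab:
    in_rel    = sum(1 for text, r in docs if r == 1 and term in text.split())
    in_nonrel = sum(1 for text, r in docs if r == 0 and term in text.split())
    print(term, in_rel / n_rel, in_nonrel / n_nonrel)
# apple 1.0 0.333..., baker 0.5 0.666..., crab 0.5 1.0 -- matches the table
```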

  8. Example

Is “apple baker crab” relevant?

P(D | R=1) / P(D | R=0) > P(R=0) / P(R=1) ?
∏_i P(w_i | R=1) / ∏_i P(w_i | R=0) > P(R=0) / P(R=1) ?
[P(apple=1 | R=1) P(baker=1 | R=1) P(crab=1 | R=1)] / [P(apple=1 | R=0) P(baker=1 | R=0) P(crab=1 | R=0)] > 0.6 / 0.4 ?
(1 · 1/2 · 1/2) / (1/3 · 2/3 · 1) > 1.5 ?
1.125 < 1.5, so the classifier says no: the document is not relevant.

Term    P(w | R=1)   P(w | R=0)
apple       1           1/3
baker      1/2          2/3
crab       1/2           1

P(R=1) = 2/5      P(R=0) = 3/5
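
A short sketch verifying the arithmetic with the table values:

```python
p_r1, p_r0 = 2/5, 3/5
p_w_rel    = {"apple": 1.0, "baker": 1/2, "crab": 1/2}   # P(w | R=1)
p_w_nonrel = {"apple": 1/3, "baker": 2/3, "crab": 1.0}   # P(w | R=0)

ratio = 1.0
for w in ["apple", "baker", "crab"]:
    ratio *= p_w_rel[w] / p_w_nonrel[w]

threshold = p_r0 / p_r1
print(ratio, threshold)    # 1.125 1.5
print(ratio > threshold)   # False: classified as non-relevant
```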

  9. Wrapping Up

Bayesian classification gives us a probabilistic approach to ranking documents, and a reasonable relevance threshold. By choosing an appropriate document model, we can easily modify our ranker to take different document properties into account. For instance, we’ll see how to add contextual information to help discriminate between different senses of the same word.

Next, we’ll see how Bayesian classifiers relate to TF-IDF and its more sophisticated cousin, Okapi BM25.
