

  1. Combining Evidence Module Introduction CS6200: Information Retrieval

  2. Evidence of Relevance
So far, we have tried to determine a document's relevance to a query by comparing document terms to query terms. This works relatively well, but it's far from perfect. We can use many additional forms of evidence to improve our relevance estimates.
• Document quality scores: Is a document written and presented well? Is it authoritative, and written by a reputable source? Does it look like spam?
• Document categories: Does a document present information about news, sports, or some other common category? Is it providing a service, such as a storefront?
• Internet link structure: Which pages link to the document, and where does it link to? What does the anchor text of those links say?
• Document structure: Does the document have a title? Section headings? A table of contents?
• User behavior: click information, duration of page visits, etc.

  3. Types of Evidence
In the last module, we saw how to obtain various forms of evidence of document relevance. Here, we will assume the evidence is provided to us and focus on what to do with it. Evidence can come in several forms:
• Binary features: presence or absence of terms, whether the page is on a well-known domain (wikipedia.org, cnn.com, …), whether the user has previously visited the page, …
• Real-valued features: probabilities, term counts, page visit durations, product prices, …
• Categorical features: page categories (sports, news, shopping, reviews, …), language, domain categories (business, social site, news, informational)
• Timestamps: date crawled, date of last update, date of first appearance on the web, …
All of these are generally treated as real numbers, after some pre-processing, as in the sketch below.
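As a concrete illustration of that pre-processing step, here is a minimal Python sketch; the feature names and category list are invented for illustration:

```python
from datetime import datetime, timezone

# Hypothetical raw features for one document; the names are invented.
doc = {
    "contains_query_term": True,             # binary feature
    "page_visit_duration_sec": 42.5,         # real-valued feature
    "category": "news",                      # categorical feature
    "last_update": datetime(2014, 3, 1, tzinfo=timezone.utc),  # timestamp
}

CATEGORIES = ["sports", "news", "shopping", "reviews"]

def to_feature_vector(doc):
    """Encode mixed feature types as a flat list of real numbers."""
    vec = [float(doc["contains_query_term"])]       # binary -> 0.0 / 1.0
    vec.append(doc["page_visit_duration_sec"])      # real value passes through
    # Categorical -> one-hot encoding: one real number per category.
    vec.extend(1.0 if doc["category"] == c else 0.0 for c in CATEGORIES)
    # Timestamp -> seconds since the Unix epoch.
    vec.append(doc["last_update"].timestamp())
    return vec

print(to_feature_vector(doc))
```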

  4. Machine Learning Tasks for IR
IR is concerned with a few main ML tasks:
• Document classification: Which categories does a document fit into? (Social? Shopping? News?)
• Document ranking: Which documents are probably more relevant to a query?
• Document clustering: Which documents are most similar to each other?
[Figure: a document classification example with candidate labels (Social? Shopping? News?) and a document clustering scatter plot over Feature 1 and Feature 2.]
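A hedged sketch of what these three tasks can look like in code (scikit-learn is assumed here; the toy documents, features, and labels are invented):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.cluster import KMeans

# Invented toy data: 6 documents with 2 features each.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
              [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
y = np.array([1, 1, 1, 0, 0, 0])  # e.g., news vs. not-news

# Classification: which category does a document fit into?
clf = LinearSVC().fit(X, y)
print(clf.predict([[0.85, 0.15]]))

# Ranking: order documents by the classifier's decision score.
scores = clf.decision_function(X)
print(np.argsort(-scores))  # document indices, most "news-like" first

# Clustering: which documents are most similar to each other?
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))
```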

  5. Module Goals By the end of this module, you should be able to: • Classify and rank documents using Support Vector Machines. • Cluster documents, and use the clusters to produce a more diverse ranking.

  6. Let’s get started!

  7. Supervised Learning Combining Evidence, session 2 CS6200: Information Retrieval

  8. Document Classification
Suppose we know various features of a document, and we want to decide whether it's a news article.
Goal: pick θ to predict true labels from features: Y = f(X; θ).
We select a model ("hypothesis space") – a function which determines whether a document is news, based on its features – and want to choose the best model parameters ("hypothesis").
We find the best parameters using supervised learning – we use a collection of documents whose true labels are known, and we pick the parameters which best predict those labels.

Document Features = X                                      Label = Y
tf    tf "news"    Known news website?    Facebook Likes
1     1            0                      123              1
0     0            1                      54               1
0     0            0                      1,213            0
2     0            0                      0                0
0     1            1                      560              1

  9. Supervised Learning
Supervised Learning is essentially learning by example. A machine learning algorithm takes as input a set of training data:
• An n ⨉ p feature matrix X of n training instances, each with p features.
• An n ⨉ 1 label vector Y which provides the correct label for each training instance in X.
Each of the n rows of X represents a distinct training instance (the feature table is the same as on the previous slide). The goal of the learning algorithm is to find a function which outputs the correct Y value for each training instance.
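The slide's feature matrix and label vector translate directly into arrays. A minimal sketch, assuming the column order tf, tf "news", known-news-website flag, Facebook Likes reconstructed from the table above:

```python
import numpy as np

# Feature matrix X (n = 5 training instances, p = 4 features) and label
# vector Y, transcribed from the table on the slide.
X = np.array([
    [1, 1, 0,  123],
    [0, 0, 1,   54],
    [0, 0, 0, 1213],
    [2, 0, 0,    0],
    [0, 1, 1,  560],
], dtype=float)
Y = np.array([1, 1, 0, 0, 1])

assert X.shape == (5, 4) and Y.shape == (5,)
# A learning algorithm seeks f such that f(X[i]) == Y[i] for each row i.
```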

  10. Training and Test Data
When the machine learning algorithm has chosen a function, we evaluate it by using it to classify a second data set, the test data.
• The fraction of correctly-classified instances is called accuracy.
• The fraction of incorrectly-classified instances is called error.

Confusion Matrix:
             Y = 1    Y = -1
f(X) = 1     TP       FP
f(X) = -1    FN       TN

Acc = (TP + TN) / (TP + FP + FN + TN)
Error = (FP + FN) / (TP + FP + FN + TN) = 1 − Acc

The test data should be generated by the same process as the training data. Commonly, we will receive a large data set which we randomly split into training and test sets.
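A minimal sketch of this evaluation procedure, with an invented data set and a stand-in classifier; only the split and the accuracy/error bookkeeping mirror the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data set: 100 instances with 2 features, labels in {+1, -1}.
X = rng.normal(size=(100, 2))
Y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# Randomly split into training and test sets.
idx = rng.permutation(len(X))
train, test = idx[:80], idx[80:]

# Stand-in classifier: predicts from the first feature only.
def f(x):
    return np.where(x[:, 0] > 0, 1, -1)

pred = f(X[test])
tp = np.sum((pred == 1) & (Y[test] == 1))    # true positives
fp = np.sum((pred == 1) & (Y[test] == -1))   # false positives
fn = np.sum((pred == -1) & (Y[test] == 1))   # false negatives
tn = np.sum((pred == -1) & (Y[test] == -1))  # true negatives

acc = (tp + tn) / (tp + fp + fn + tn)
print("accuracy:", acc, "error:", 1 - acc)
```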

  11. Linear Classifiers
One of the simplest models is the set of lines: everything above a line has one label, and everything below it has the other label. The model for a k-dimensional linear classifier is:

f(x) = sign(θ · x)
     = +1 if Σᵢ θᵢ xᵢ > 0
     = −1 otherwise

We typically define x₀ = 1 so we can use θ₀ as the y-intercept.
[Figure: a 2D plot of Feature 1 vs. Feature 2 showing relevant and non-relevant documents separated by a linear decision boundary.]
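A sketch of that decision rule in Python, prepending x₀ = 1 so θ₀ serves as the intercept; the example weights are invented:

```python
import numpy as np

def linear_classify(x, theta):
    """Return +1 if theta . (1, x) > 0, else -1."""
    x = np.concatenate(([1.0], x))  # define x0 = 1 so theta[0] is the intercept
    return 1 if np.dot(theta, x) > 0 else -1

# Invented example: boundary x2 = x1 - 0.5, i.e. theta = (0.5, -1, 1).
theta = np.array([0.5, -1.0, 1.0])
print(linear_classify(np.array([0.0, 1.0]), theta))  # above the line -> +1
print(linear_classify(np.array([1.0, 0.0]), theta))  # below the line -> -1
```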

  12. Visualizing Linear Classifiers
A k-dimensional linear classifier is a generalized equation for a line. The decision boundary always has one fewer dimension than the feature space:
• For k = 2, the boundary is a line.
• For k = 3, it is a plane.
• For k > 3, it is a (k − 1)-dimensional hyperplane.
The region on one side of a linear decision boundary is known as a half space.
[Figure: a 3D plot over Feature 1, Feature 2, and Feature 3 in which a plane separates relevant from non-relevant documents.]

  13. Linearly Separable Data
If any linear decision surface exists which perfectly divides the training instances, the data is said to be linearly separable. Most data sets can't be neatly separated in this way.
• Some data points closely resemble points of the opposite class, and are harder to classify correctly.
• Some points may have incorrect feature or label values, leading the learning algorithm astray.
• Other points are near the decision boundary, and are susceptible to misclassification if a slightly different classification function were chosen.
[Figure: a 2D plot of Feature 1 vs. Feature 2 in which a linear decision boundary perfectly separates relevant from non-relevant documents.]
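One practical connection, not on the slide itself but standard: the perceptron algorithm is guaranteed to converge to a separating hyperplane exactly when the data is linearly separable. A minimal sketch on invented, separable data:

```python
import numpy as np

def perceptron(X, Y, epochs=100):
    """Find theta with sign(theta . (1, x)) == y for every row, if the data
    is linearly separable; otherwise gives up after `epochs` passes."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend x0 = 1
    theta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, y in zip(Xb, Y):
            if y * np.dot(theta, x) <= 0:      # misclassified point
                theta += y * x                 # nudge the boundary toward x
                errors += 1
        if errors == 0:                        # perfect separation reached
            return theta
    return theta

# Invented separable data: two clusters on opposite sides of the origin.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -0.5]])
Y = np.array([1, 1, -1, -1])
print(perceptron(X, Y))
```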

  14. Wrapping Up Although most data sets can’t be perfectly classified using machine learning techniques, there are many good techniques which can generally achieve high accuracy. One of the most important techniques uses linear decision surfaces to make a decision: points “above the line” are considered positive instances, and points “below the line” are negative instances. Next, we’ll look at linear classifiers in more depth.
