  1. Chapter X: Classification
     Information Retrieval & Data Mining
     Universität des Saarlandes, Saarbrücken
     Winter Semester 2011/12

  2. Chapter X: Classification*
     1. Basic idea
     2. Decision trees
     3. Naïve Bayes classifier
     4. Support vector machines
     5. Ensemble methods
     * Zaki & Meira: Ch. 24, 26, 28 & 29; Tan, Steinbach & Kumar: Ch. 4, 5.3–5.6

  3. X.1 Basic idea
     1. Definitions
        1.1. Data
        1.2. Classification function
        1.3. Predictive vs. descriptive
        1.4. Supervised vs. unsupervised

  4. Definitions
     • Data for classification comes in tuples (x, y)
       – Vector x is the attribute (feature) set
         • Attributes can be binary, categorical, or numerical
       – Value y is the class label
         • We concentrate on binary or nominal class labels
     • Compare classification with regression!
     • A classifier is a function that maps attribute sets to class labels, f(x) = y
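
To make the definition concrete, here is a minimal Python sketch of a classifier as a function from an attribute set to a class label. The decision rule inside f is a made-up placeholder, not a learned model; the attribute names anticipate the training-data example later in this chapter:

```python
# A classifier f maps an attribute (feature) set x to a class label y.
# The rule below is purely illustrative.

def f(x: dict) -> str:
    """Return a binary class label for the attribute set x."""
    return "yes" if x["student"] == "yes" else "no"

x = {"age": "<=30", "income": "high", "student": "no", "credit_rating": "fair"}
print(f(x))  # -> "no"
```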

  7. Classification function as a black box (figure): attribute set x is the input, the classification function f is the black box, and class label y is the output

  8. Descriptive vs. predictive
     • In descriptive data mining the goal is to give a description of the data
       – Those who have bought diapers have also bought beer
       – These are the clusters of documents from this corpus
     • In predictive data mining the goal is to predict the future
       – Those who will buy diapers will also buy beer
       – If new documents arrive, they will be similar to one of the cluster centroids
     • The difference between predictive data mining and machine learning is hard to define

  9. Descriptive vs. predictive classification
     • Who are the borrowers that will default?
       – Descriptive
     • If a new borrower comes, will they default?
       – Predictive
     • Predictive classification is the usual application
       – What we will concentrate on

  10. General classification framework (figure)

  11. Classification model evaluation
      • Recall the confusion matrix:

                                  Predicted class
                                  Class = 1    Class = 0
        Actual class  Class = 1     f_11         f_10
                      Class = 0     f_01         f_00

      • Much the same measures as with IR methods – focus on accuracy and error rate:

        Accuracy   = (f_11 + f_00) / (f_11 + f_00 + f_10 + f_01)
        Error rate = (f_10 + f_01) / (f_11 + f_00 + f_10 + f_01)

      • But also precision, recall, F-scores, …
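
As a quick worked check of these two formulas, here is a Python sketch computing accuracy and error rate from the four confusion-matrix counts; the counts themselves are hypothetical:

```python
# Confusion-matrix counts (hypothetical): f11 and f00 are the correct
# predictions, f10 and f01 the misclassifications.
f11, f10, f01, f00 = 40, 10, 5, 45

total = f11 + f00 + f10 + f01
accuracy = (f11 + f00) / total    # fraction classified correctly
error_rate = (f10 + f01) / total  # fraction classified wrongly

print(accuracy, error_rate)  # 0.85 0.15 (the two always sum to 1)
```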

  12. Supervised vs. unsupervised learning
      • In supervised learning
        – Training data is accompanied by class labels
        – New data is classified based on the training set
        • Classification
      • In unsupervised learning
        – The class labels are unknown
        – The aim is to establish the existence of classes in the data based on measurements, observations, etc.
        • Clustering

  13. X.2 Decision trees
      1. Basic idea
      2. Hunt’s algorithm
      3. Selecting the split
      4. Combating overfitting
      Zaki & Meira: Ch. 24; Tan, Steinbach & Kumar: Ch. 4

  14. Basic idea
      • We define the label by asking a series of questions about the attributes
        – Each question depends on the answer to the previous one
        – Ultimately, all samples with satisfying attribute values have the same label and we’re done
      • The flow chart of the questions can be drawn as a tree
      • We can classify new instances by following the proper edges of the tree until we reach a leaf
        – Decision tree leaves are always class labels

  15. Example: training data

      age      income   student   credit_rating   buys_computer
      <=30     high     no        fair            no
      <=30     high     no        excellent       no
      31…40    high     no        fair            yes
      >40      medium   no        fair            yes
      >40      low      yes       fair            yes
      >40      low      yes       excellent       no
      31…40    low      yes       excellent       yes
      <=30     medium   no        fair            no
      <=30     low      yes       fair            yes
      >40      medium   yes       fair            yes
      <=30     medium   yes       excellent       yes
      31…40    medium   no        excellent       yes
      31…40    high     yes       fair            yes
      >40      medium   no        excellent       no

  16. Example: decision tree

      age?
      ├ ≤30:    student?
      │         ├ no:  no
      │         └ yes: yes
      ├ 31..40: yes
      └ >40:    credit rating?
                ├ excellent: no
                └ fair:      yes
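
Written as code, this tree is just nested conditionals. A minimal Python sketch, assuming the attribute encoding from the training data above (note that income is genuinely unused: the tree never tests it):

```python
def buys_computer(record: dict) -> str:
    """Classify a record by walking the example tree from the root."""
    if record["age"] == "<=30":
        # left branch: decided by the student attribute
        return "yes" if record["student"] == "yes" else "no"
    elif record["age"] == "31...40":
        # middle branch: a leaf, always yes
        return "yes"
    else:  # record["age"] == ">40"
        # right branch: decided by the credit rating
        return "no" if record["credit_rating"] == "excellent" else "yes"

# Second training record: <=30, high, no, excellent -> no
print(buys_computer({"age": "<=30", "income": "high",
                     "student": "no", "credit_rating": "excellent"}))
```

Checking it against all 14 training records reproduces every buys_computer label, which is exactly the sense in which this tree fits the training data.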

  17. Hunt’s algorithm
      • The number of decision trees for a given set of attributes is exponential
      • Finding the most accurate tree is NP-hard
      • Practical algorithms use greedy heuristics
        – The decision tree is grown by making a series of locally optimal decisions on which attributes to use
      • Most algorithms are based on Hunt’s algorithm

  18. Hunt’s algorithm
      • Let X_t be the set of training records for node t
      • Let y = {y_1, …, y_c} be the class labels
      • Step 1: If all records in X_t belong to the same class y_t, then t is a leaf node labeled as y_t
      • Step 2: If X_t contains records that belong to more than one class
        – Select an attribute test condition to partition the records into smaller subsets
        – Create a child node for each outcome of the test condition
        – Apply the algorithm recursively to each child
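
A compact Python sketch of these two steps, assuming records are (attribute-dict, label) pairs; the split-selection heuristic is passed in as a function, since choosing it is exactly the topic of the next subsection:

```python
def hunt(records, select_test):
    """Grow a decision tree by Hunt's algorithm (simplified sketch).

    records:     non-empty list of (attributes, label) pairs
    select_test: greedy heuristic returning a test condition and the
                 records partitioned by its outcomes (left abstract)
    """
    labels = {label for _, label in records}
    # Step 1: all records share one class -> t is a leaf labeled y_t
    if len(labels) == 1:
        return {"leaf": labels.pop()}
    # Step 2: split on a test condition and recurse on each outcome
    test, partitions = select_test(records)  # partitions: {outcome: subset}
    return {
        "test": test,
        "children": {outcome: hunt(subset, select_test)
                     for outcome, subset in partitions.items()},
    }
```

This sketch omits the corner cases a real implementation must handle (empty partitions, records with identical attributes but different labels), which are usually resolved by falling back to the majority class.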

  19–23. Example decision tree construction (figure, built up over five slides): a node whose records have multiple labels is split on an attribute test; a child whose records carry only one label becomes a leaf, and children with multiple labels are split further

  24. Selecting the split
      • Designing a decision-tree algorithm requires answering two questions
        1. How should the training records be split?
        2. How should the splitting procedure stop?

  25. Splitting methods: binary attributes (figure)

  26. Splitting methods: nominal attributes – multiway split or binary split (figure)

  27. Splitting methods: ordinal attributes (figure)

  28. Splitting methods: continuous attributes (figure)

  29. Selecting the best split
      • Let p(i|t) be the fraction of records belonging to class i at node t
      • The best split is selected based on the degree of impurity of the child nodes
        – p(0|t) = 0 and p(1|t) = 1 has high purity
        – p(0|t) = 1/2 and p(1|t) = 1/2 has the smallest purity (highest impurity)
      • Intuition: high purity ⇒ small value of impurity measures ⇒ better split

  30–31. Example of purity (figure): one class distribution shown with high impurity, another with high purity

  32. Impurity measures (using the convention 0 × log₂(0) = 0; note log₂ p(i|t) ≤ 0)

      Entropy(t) = −∑_{i=0}^{c−1} p(i|t) log₂ p(i|t)

      Gini(t) = 1 − ∑_{i=0}^{c−1} p(i|t)²

      Classification error(t) = 1 − max_i {p(i|t)}
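
The three measures translate directly into Python; the sketch below takes a node’s class distribution p(·|t) as a list of fractions:

```python
from math import log2

def entropy(p):
    """Entropy(t): skipping zero terms implements 0 * log2(0) = 0."""
    return sum(-p_i * log2(p_i) for p_i in p if p_i > 0)

def gini(p):
    """Gini(t) = 1 - sum of squared class fractions."""
    return 1 - sum(p_i ** 2 for p_i in p)

def classification_error(p):
    """Error(t) = 1 - fraction of the majority class."""
    return 1 - max(p)

# The pure node minimizes all three measures; the uniform node
# maximizes them (entropy 1, Gini 0.5, error 0.5 for two classes).
for p in ([0.0, 1.0], [0.5, 0.5]):
    print(p, entropy(p), gini(p), classification_error(p))
```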

  33. Comparing impurity measures (figure)

  34. Comparing conditions
      • The quality of the split: the change in the impurity
        – Called the gain of the test condition

          Δ = I(p) − ∑_{j=1}^{k} (N(v_j) / N) · I(v_j)

        • I(·) is the impurity measure
        • k is the number of attribute values
        • p is the parent node, v_j is the j-th child node
        • N is the total number of records at the parent node
        • N(v_j) is the number of records associated with child v_j
      • Maximizing the gain ⇔ minimizing the weighted average impurity measure of the child nodes
      • If I(·) = Entropy(·), then Δ = Δ_info is called the information gain
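
A short sketch of the gain computation, kept self-contained by redefining entropy as the impurity measure (so that Δ is the information gain); the example split at the bottom is hypothetical:

```python
from math import log2

def entropy(p):
    """Entropy with the convention 0 * log2(0) = 0."""
    return sum(-p_i * log2(p_i) for p_i in p if p_i > 0)

def gain(parent_dist, children, impurity=entropy):
    """Delta = I(parent) - sum_j N(v_j)/N * I(v_j).

    parent_dist: class distribution at the parent node p
    children:    list of (N(v_j), class distribution at v_j) pairs
    """
    n = sum(n_j for n_j, _ in children)  # N: records at the parent
    weighted = sum(n_j / n * impurity(p_j) for n_j, p_j in children)
    return impurity(parent_dist) - weighted

# Hypothetical test: 10 records, split into two pure children
print(gain([0.5, 0.5], [(5, [1.0, 0.0]), (5, [0.0, 1.0])]))  # 1.0
```

Because the parent impurity I(p) is fixed for a given node, maximizing Δ over candidate tests is the same as minimizing the weighted child impurity, as stated above.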
