Algorithms: The basic methods Inferring rudimentary rules - PDF document

� � � � � � � � � Algorithms: The basic methods Inferring rudimentary rules Statistical modeling Data Mining Constructing decision trees Constructing rules Practical Machine Learning Tools and Techniques Association rule learning Slides for Chapter 4 of Data Mining by I. H. Witten and E. Frank Linear models Instance-based learning Clustering 1 2 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) Simplicity first Inferring rudimentary rules ✁ 1R: learns a 1-level decision tree ✁ Simple algorithms often work very well! z I.e., rules that all test one particular attribute ✁ There are many kinds of simple structure, eg: ✁ Basic version z One attribute does all the work z One branch for each value z All attributes contribute equally & independently z Each branch assigns most frequent class z A weighted linear combination might do z Error rate: proportion of instances that don’t z Instance-based: use a few prototypes belong to the majority class of their z Use simple logical rules corresponding branch ✁ Success of method depends on the domain z Choose attribute with lowest error rate ( assumes nominal attributes ) 3 4 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) Pseudo-code for 1R Evaluating the weather attributes Outlook Temp Humidity Windy Play Attribute Rules Errors Total Sunny Hot High False No For each attribute, errors Sunny Hot High True No For each value of the attribute, make a rule as follows: Outlook Sunny A No 2/ 5 4/ 14 Overcast Hot High False Yes count how often each class appears Overcast A Yes 0/ 4 find the most frequent class Rainy Mild High False Yes Rainy A Yes 2/ 5 make the rule assign that class to this attribute-value Rainy Cool Normal False Yes Temp Hot A No* 2/ 4 5/ 14 Calculate the error rate of the rules Rainy Cool Normal True No Mild A Yes 2/ 6 Choose the rules with the smallest error rate Overcast Cool Normal True Yes Cool A Yes 1/ 4 Sunny Mild High False No Humidity High A No 3/ 7 4/ 14 Sunny Cool Normal False Yes Note: “missing” is treated as a separate attribute Normal A Yes 1/ 7 Rainy Mild Normal False Yes value Windy False A Yes 2/ 8 5/ 14 Sunny Mild Normal True Yes True A No* 3/ 6 Overcast Mild High True Yes Overcast Hot Normal False Yes * indicates a tie Rainy Mild High True No 5 6 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) 1

� � ✁ The problem of overfitting Dealing with numeric attributes ✁ Discretize numeric attributes ✁ This procedure is very sensitive to noise ✁ Divide each attribute’s range into intervals z One instance with an incorrect class label will probably produce a separate interval z Sort instances according to attribute’s values ✁ Also: time stamp attribute will have zero errors z Place breakpoints where class changes (majority class) ✁ Simple solution: z This minimizes the total error ✁ Example: temperature from weather data enforce minimum number of instances in majority class per interval ✁ Example (with min = 3): 64 65 68 69 70 71 72 72 75 75 80 81 83 85 Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No 64 65 68 69 70 71 72 72 75 75 80 81 83 85 Outlook Temperature Humidity Windy Play Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No Sunny 85 85 False No Sunny 80 90 True No 64 65 68 69 70 71 72 72 75 75 80 81 83 85 Overcast 83 86 False Yes Yes No Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No Rainy 75 80 False Yes … … … … … 7 8 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) With overfitting avoidance Discussion of 1R ✁ 1R was described in a paper by Holte (1993) Resulting rule set: z Contains an experimental evaluation on 16 datasets (using cross-validation so that results were representative of performance on future data) Attribute Rules E rrors Total errors z Minimum number of instances was set to 6 after Outlook S unny A No 2/ 5 4/ 14 Overcast A Yes 0/ 4 some experimentation Rainy A Yes 2/ 5 z 1R’s simple rules performed not much worse than Temperature ) 77.5 A Yes 3/ 10 5/ 14 much more complex decision trees > 77.5 A No* 2/ 4 ✁ Simplicity first pays off! Humidity ) 82.5 A Yes 1/ 7 3/ 14 > 82.5 and ) 95.5 A No 2/ 6 > 95.5 A Yes 0/ 1 Very Simple Classification Rules Perform Well on Most Windy False A Yes 2/ 8 5/ 14 Commonly Used Datasets True A No* 3/ 6 Robert C. Holte, Computer Science Department, University of Ottawa 9 10 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) Discussion of 1R: Hyperpipes Statistical modeling Another simple technique: build one rule for each class ✁ “Opposite” of 1R: use all the attributes z Each rule is a conjunction of tests, one for each attribute ✁ Two assumptions: Attributes are z For numeric attributes: test checks whether instance's z equally important value is inside an interval z statistically independent (given the class value) � Interval given by minimum and maximum observed I.e., knowing the value of one attribute says nothing in training data about the value of another (if the class is known) ✁ Independence assumption is never correct! z For nominal attributes: test checks whether value is one ✁ But … this scheme works well in practice of a subset of attribute values � Subset given by all possible values observed in training data z Class with most matching tests is predicted 11 12 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) 07/20/06 Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4) 2

Algorithms: The basic methods Inferring rudimentary rules - PDF document

Algorithms: The basic methods Inferring rudimentary rules Statistical modeling Data Mining Constructing decision trees Constructing rules Practical Machine Learning Tools and Techniques Association

Th The Unbefriended Patient: : 32,000 foot Perspective Some rudimentary legal ground rules:

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 4 of Data Mining by

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 4 of Data Mining by

Inferring Internet Inferring Internet Denial- -of of- -Service Activity Service Activity

On Inferring and Characterizing On Inferring and Characterizing Internet Routing Policies

Mat 2170 Methods GPoint Julia Sets Algorithms & Methods Lab 8 Spring 2014 Student

Mat 2170 Methods Week 7 Scope return Examples Methods Algorithms Predicate Methods

RARE PRESENTATION OF RUPTURED RUDIMENTARY HORN PREGNANCY Shergill Harbhajan K 1 , Grover Suparna 2

Rudimentary Constructive Set Theory Set Theory, Model Theory, Generalized Quantifiers and

Nemesis: Studying Microarchitectural Timing Leaks in Rudimentary CPU Interrupt Logic Jo Van Bulck

Basic Classification Algorithms Rules, Linear Regression, Nearest Neighbour Outline Rules

Basic Classification Algorithms (2) Rules, Linear Regression, Nearest Neighbour Outline Rules

Graphs Part I: Basic algorithms Laura Toma Algorithms (csci2200), Bowdoin College Part I: Basic

Rules Engine Tool What is the Rules Engine? Alert Proactive Reaction Business Rules Actions

Inferring Temporal System Properties Samuel Huang, joint work with Rance Cleaveland University of

The Challenge of Cultural The Challenge of Cultural Modeling for Inferring Modeling for

Review of Synchrotron Radiation based Diagnostics for Transverse Profile Measurements Gero Kube

Outline Part I: fundamentals Part II: tools hardware: Colossus software Open

written in a conversational style to accompany what you are seeing as a picture / graph / table.

Diffraction Methods & Electron Microscopy Lecture 2 Sandeep Gorantla 1 FYS 4340/9340 course

Spectroscopic Instrumentation Theodor Pribulla Astronomical Institute of the Slovak Academy of

U l t r a f a s t , i n t e n s e l a s e r Consiglio Nazionale p u l s

Interpretability in Machine Learning Why Interpret ? The current state of machine learning And

Data Mining with Weka Class 3 Lesson 1 Simplicity first! Ian H. Witten Department of Computer

Sambuz

Useful Links

Newsletter

Mail Us

Algorithms: The basic methods Inferring rudimentary rules - PDF document

Algorithms: The basic methods Inferring rudimentary rules Statistical modeling Data Mining Constructing decision trees Constructing rules Practical Machine Learning Tools and Techniques Association

Th The Unbefriended Patient: : 32,000 foot Perspective Some rudimentary legal ground rules:

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 4 of Data Mining by

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 4 of Data Mining by

Inferring Internet Inferring Internet Denial- -of of- -Service Activity Service Activity

On Inferring and Characterizing On Inferring and Characterizing Internet Routing Policies

Mat 2170 Methods GPoint Julia Sets Algorithms &amp; Methods Lab 8 Spring 2014 Student

Mat 2170 Methods Week 7 Scope return Examples Methods Algorithms Predicate Methods

RARE PRESENTATION OF RUPTURED RUDIMENTARY HORN PREGNANCY Shergill Harbhajan K 1 , Grover Suparna 2

Rudimentary Constructive Set Theory Set Theory, Model Theory, Generalized Quantifiers and

Nemesis: Studying Microarchitectural Timing Leaks in Rudimentary CPU Interrupt Logic Jo Van Bulck

Basic Classification Algorithms Rules, Linear Regression, Nearest Neighbour Outline Rules

Basic Classification Algorithms (2) Rules, Linear Regression, Nearest Neighbour Outline Rules

Graphs Part I: Basic algorithms Laura Toma Algorithms (csci2200), Bowdoin College Part I: Basic

Rules Engine Tool What is the Rules Engine? Alert Proactive Reaction Business Rules Actions

Inferring Temporal System Properties Samuel Huang, joint work with Rance Cleaveland University of

The Challenge of Cultural The Challenge of Cultural Modeling for Inferring Modeling for

Review of Synchrotron Radiation based Diagnostics for Transverse Profile Measurements Gero Kube

Outline Part I: fundamentals Part II: tools hardware: Colossus software Open

written in a conversational style to accompany what you are seeing as a picture / graph / table.

Diffraction Methods &amp; Electron Microscopy Lecture 2 Sandeep Gorantla 1 FYS 4340/9340 course

Spectroscopic Instrumentation Theodor Pribulla Astronomical Institute of the Slovak Academy of

U l t r a f a s t , i n t e n s e l a s e r Consiglio Nazionale p u l s

Interpretability in Machine Learning Why Interpret ? The current state of machine learning And

Data Mining with Weka Class 3 Lesson 1 Simplicity first! Ian H. Witten Department of Computer

Sambuz

Useful Links

Newsletter

Mail Us

Mat 2170 Methods GPoint Julia Sets Algorithms & Methods Lab 8 Spring 2014 Student

Diffraction Methods & Electron Microscopy Lecture 2 Sandeep Gorantla 1 FYS 4340/9340 course