
An Exercise in Machine Learning



  1. An Exercise in Machine Learning
     http://www.cs.iastate.edu/~cs573x/bbsilab.html
     • Machine Learning Software
     • Preparing Data
     • Building Classifiers
     • Interpreting Results

  2. Machine Learning Software
     • Suites (General Purpose)
       • WEKA (Source: Java)
       • MLC++ (Source: C++)
       • SAS
       • List from KDNuggets (Various)
     • Specific
       • Classification: C5.0, SVMlight
       • Association Rule Mining
       • Bayesian Net …
     • Commercial vs. Free vs. Programming

  3. What does WEKA do?
     • Implements state-of-the-art learning algorithms
       • Main strength is classification
       • Also regression, association rule, and clustering algorithms
     • Extensible, so new learning schemes can be tried
     • Large variety of handy tools (transforming datasets, filters, visualization, etc.)

  4. WEKA resources
     • API Documentation, Tutorial, Source code
     • WEKA mailing list
     • Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
     • Weka-related Projects:
       • Weka-Parallel - parallel processing for Weka
       • RWeka - linking R and Weka
       • YALE - Yet Another Learning Environment
       • Many others …

  5. Getting Started
     • Installation (Java runtime + WEKA)
     • Setting up the environment (CLASSPATH)
     • Reference Book and online API document
     • Preparing Data sets
     • Running WEKA to build classifiers
     • Interpreting Results

  6. ARFF Data Format
     • Attribute-Relation File Format
       • Header: describes the attribute types
       • Data: instances (examples) as comma-separated lists
     • Use the right data format:
       • Filestem, CSV → ARFF format
       • Use C45Loader and CSVLoader to convert
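
The header/data layout described on this slide can be illustrated with a small converter. This is a minimal Python sketch, not WEKA's `CSVLoader`: the function name `csv_to_arff`, the type-guessing rule (a column is numeric only if every value parses as a number), and the three sample rows are all assumptions made for illustration, and missing values are not handled.

```python
import csv
import io


def _is_number(text):
    """Return True if the string parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (first row = attribute names) into ARFF text.

    Columns whose values all parse as numbers become numeric attributes;
    every other column becomes a nominal attribute listing its distinct values.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    names, data = rows[0], rows[1:]

    # Header section: relation name plus one declaration per attribute.
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(names):
        values = [row[col] for row in data]
        if all(_is_number(v) for v in values):
            lines.append("@attribute %s numeric" % name)
        else:
            distinct = sorted(set(values))
            lines.append("@attribute %s {%s}" % (name, ",".join(distinct)))

    # Data section: one comma-separated line per instance.
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Illustrative rows only; they are not taken from the slides.
SAMPLE = """outlook,humidity,play
sunny,70,yes
sunny,85,no
overcast,78,yes
"""

if __name__ == "__main__":
    print(csv_to_arff(SAMPLE, "weather"))
```

By WEKA's convention the last attribute (`play` here) is usually treated as the class, although the class index can be set explicitly.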

  7. Launching WEKA

  8. Load Dataset into WEKA

  9. Data Filters
     • Useful support for data preprocessing
     • Removing or adding attributes, resampling the dataset, removing examples, etc.
     • Creates stratified cross-validation folds of the given dataset, and class distributions are approximately retained within each fold.
     • Typically split data as 2/3 in training and 1/3 in testing
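
To make "class distributions are approximately retained within each fold" concrete, here is a minimal Python sketch of stratified fold assignment. It is not WEKA's filter implementation: the function name `stratified_folds`, the round-robin dealing strategy, and the `seed` parameter are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of instance indices with class proportions roughly preserved."""
    # Group instance indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    # Deal each class's shuffled instances round-robin across the folds.
    # The dealing position carries over between classes so fold sizes stay balanced.
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


if __name__ == "__main__":
    labels = ["yes"] * 9 + ["no"] * 5
    for fold in stratified_folds(labels, k=3):
        print(sorted(fold), [labels[i] for i in sorted(fold)])
```

With `k=3`, using two folds for training and one for testing gives roughly the 2/3 to 1/3 split mentioned on the slide while keeping the yes/no ratio similar on both sides.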

  10. Building Classifiers
      • A classifier model is a mapping from the dataset attributes to the class (target) attribute. How it is created and what form it takes differ between learners.
      • Decision Tree and Naïve Bayes Classifiers
      • Which one is better?
        • No Free Lunch!

  11. Building Classifier

  12. (1) weka.classifiers.rules.ZeroR
      • Building and using a 0-R classifier. Predicts the mean (for a numeric class) or the mode (for a nominal class).
      (2) weka.classifiers.bayes.NaiveBayes
      • Class for building a Naive Bayesian classifier
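
The 0-R rule is simple enough to write out in full. The following is a minimal Python sketch of the idea, not the `weka.classifiers.rules.ZeroR` source; the function name `zero_r` and the way numeric classes are detected are assumptions.

```python
from collections import Counter
from statistics import mean


def zero_r(class_values):
    """Return the single value ZeroR predicts for every instance.

    Numeric class: the mean of the training class values.
    Nominal class: the mode (most frequent training class value).
    """
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in class_values
    )
    if is_numeric:
        return mean(class_values)
    return Counter(class_values).most_common(1)[0][0]


if __name__ == "__main__":
    # 9 "yes" and 5 "no", as in the 14-instance example on the output slides.
    print(zero_r(["yes"] * 9 + ["no"] * 5))
    print(zero_r([1.0, 2.0, 6.0]))
```

Because it ignores every attribute, 0-R is mainly useful as a baseline: on a 9-versus-5 class split it is right 9 times out of 14 on the training data, and any real classifier should be compared against that figure.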

  13. (3) weka.classifiers.trees.J48
      • Class for generating an unpruned or a pruned C4.5 decision tree.
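
C4.5 chooses split attributes by gain ratio, that is, information gain normalized by the entropy of the split itself. The sketch below is a minimal Python illustration of that criterion only, not J48's code. The helper names are hypothetical, and the class counts per `outlook` branch are an assumption read off the leaf counts of the 14-instance tree shown later in the output slides.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gain_ratio(parent_counts, child_counts):
    """Return (information gain, gain ratio) for one candidate split.

    parent_counts: class counts before the split, e.g. [yes, no].
    child_counts:  one list of class counts per branch of the split.
    """
    total = sum(parent_counts)
    # Weighted average entropy of the branches after splitting.
    remainder = sum(sum(child) / total * entropy(child) for child in child_counts)
    gain = entropy(parent_counts) - remainder
    # Split information: entropy of the branch sizes themselves.
    split_info = entropy([sum(child) for child in child_counts])
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


if __name__ == "__main__":
    # [yes, no] counts: all 14 instances, then sunny / overcast / rainy branches.
    gain, ratio = gain_ratio([9, 5], [[2, 3], [4, 0], [3, 2]])
    print(round(entropy([9, 5]), 3), round(gain, 3), round(ratio, 3))
```

The normalization penalizes attributes that split the data into many small branches, which plain information gain tends to favor.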

  14. Test Options
      • Percentage Split (2/3 Training; 1/3 Testing)
      • Cross-validation
        • Estimating the generalization error based on resampling when limited data; averaged error estimate.
        • Stratified 10-fold
        • Leave-one-out (Loo)
        • 10-fold vs. Loo
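
The cross-validation options on this slide share one loop: hold out a fold, train on the rest, test on the held-out part, and average the error. The Python sketch below shows that loop under simplifying assumptions: folds are formed by index modulo `k` rather than by stratification, and the `train`/`predict` callables and the majority-class learner are hypothetical stand-ins for a real classifier.

```python
from collections import Counter


def cross_validate(instances, labels, k, train, predict):
    """Estimate the error rate as the average of k per-fold error rates.

    Setting k == len(instances) gives leave-one-out.
    """
    n = len(instances)
    fold_errors = []
    for fold in range(k):
        # Instance i belongs to fold (i mod k); everything else is training data.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = train([instances[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        wrong = sum(predict(model, instances[i]) != labels[i] for i in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / k


def train_majority(instances, labels):
    """Toy learner: remember the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, instance):
    """Toy predictor: always answer with the remembered label."""
    return model


if __name__ == "__main__":
    labels = ["yes"] * 9 + ["no"] * 5
    instances = list(range(len(labels)))  # attribute values are unused by the toy learner
    loo_error = cross_validate(instances, labels, len(labels),
                               train_majority, predict_majority)
    print(loo_error)
```

Leave-one-out trains on nearly all the data each time but needs as many training runs as there are instances, whereas 10-fold needs only ten; that cost difference is one side of the "10-fold vs. Loo" comparison.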

  15. Understanding Output

  16. Decision Tree Output (1)

      J48 pruned tree
      ------------------
      outlook = sunny
      |   humidity <= 75: yes (2.0)
      |   humidity > 75: no (3.0)
      outlook = overcast: yes (4.0)
      outlook = rainy
      |   windy = TRUE: no (2.0)
      |   windy = FALSE: yes (3.0)

      Number of Leaves  :  5
      Size of the tree  :  8

      === Error on training data ===
      Correctly Classified Instances      14    100 %
      Incorrectly Classified Instances     0      0 %
      Kappa statistic                      1
      Mean absolute error                  0
      Root mean squared error              0
      Relative absolute error              0 %
      Root relative squared error          0 %
      Total Number of Instances           14

      === Detailed Accuracy By Class ===
      TP   FP   Precision   Recall   F-Measure   Class
      1    0    1           1        1           yes
      1    0    1           1        1           no

      === Confusion Matrix ===
      a b   <-- classified as
      9 0 | a = yes
      0 5 | b = no
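
Read as code, the pruned tree on this slide is a set of nested tests. A minimal Python sketch follows; the function name `predict_play` and the boolean encoding of `windy` are illustrative choices, not part of WEKA's output.

```python
def predict_play(outlook, humidity, windy):
    """Classify one instance by following the J48 tree from the slide."""
    if outlook == "sunny":
        # Sunny days are decided by the humidity threshold.
        return "yes" if humidity <= 75 else "no"
    if outlook == "overcast":
        # Overcast is a leaf: always "yes".
        return "yes"
    if outlook == "rainy":
        # Rainy days are decided by wind.
        return "no" if windy else "yes"
    raise ValueError("unknown outlook value: %r" % (outlook,))


if __name__ == "__main__":
    print(predict_play("sunny", 70, False))
    print(predict_play("rainy", 80, True))
```

The function has three tests (outlook, humidity, windy) and five return points, which matches the reported 5 leaves and tree size of 8 (leaves plus internal nodes). The numbers in parentheses on each leaf, such as `(2.0)`, are the counts of training instances that reach it.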

  17. Decision Tree Output (2)

      === Stratified cross-validation ===
      Correctly Classified Instances       9     64.2857 %
      Incorrectly Classified Instances     5     35.7143 %
      Kappa statistic                      0.186
      Mean absolute error                  0.2857
      Root mean squared error              0.4818
      Relative absolute error             60 %
      Root relative squared error         97.6586 %
      Total Number of Instances           14

      === Detailed Accuracy By Class ===
      TP Rate   FP Rate   Precision   Recall   F-Measure   Class
      0.778     0.6       0.7         0.778    0.737       yes
      0.4       0.222     0.5         0.4      0.444       no

      === Confusion Matrix ===
      a b   <-- classified as
      7 2 | a = yes
      3 2 | b = no
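
Most of the summary figures on this slide can be recomputed from the confusion matrix alone, which helps when interpreting them. The Python sketch below does so for accuracy, the kappa statistic, and the per-class rates; the function name `summarize` is hypothetical, and the code assumes every class occurs at least once as both an actual and a predicted label (otherwise a division by zero occurs).

```python
def summarize(matrix, classes):
    """Compute accuracy, kappa, and per-class rates from a confusion matrix.

    matrix[i][j] = number of instances of actual class i classified as class j
    (rows are actual classes, columns are predictions, as in WEKA's layout).
    """
    size = len(classes)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]                      # actual counts
    col_totals = [sum(row[j] for row in matrix) for j in range(size)]  # predicted counts

    accuracy = sum(matrix[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    per_class = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fp = col_totals[i] - tp
        tp_rate = tp / row_totals[i]               # identical to recall
        fp_rate = fp / (total - row_totals[i])
        precision = tp / col_totals[i]
        f_measure = 2 * precision * tp_rate / (precision + tp_rate)
        per_class[name] = (tp_rate, fp_rate, precision, f_measure)
    return accuracy, kappa, per_class


if __name__ == "__main__":
    accuracy, kappa, per_class = summarize([[7, 2], [3, 2]], ["yes", "no"])
    print(round(accuracy * 100, 4), round(kappa, 3))
    for name, rates in per_class.items():
        print(name, [round(r, 3) for r in rates])
```

The gap between 100 % accuracy on the training data and about 64 % under cross-validation, together with a kappa of only 0.186, suggests the tree fits the 14 training instances far better than it generalizes.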
