CSCE 478/878 Lecture 4: Experimental Design and Analysis

Stephen Scott
sscott@cse.unl.edu
(Adapted from Ethem Alpaydin and Tom Mitchell)

Introduction

In Homework 1, you are (supposedly):
1. Choosing a data set
2. Extracting a test set of size > 30
3. Building a tree on the training set
4. Testing on the test set
5. Reporting the accuracy

Does the reported accuracy exactly match the generalization performance of the tree? If a tree has error 10% and an ANN has error 11%, is the tree absolutely better? Why or why not? How about the algorithms in general?

Outline

- Goals of performance evaluation
- Estimating error and confidence intervals
- Paired t tests and cross-validation to compare learning algorithms
- Other performance measures: confusion matrices, ROC analysis, precision-recall curves

Setting Goals

Before setting up an experiment, we need to understand exactly what the goal is:
- Estimating the generalization performance of a hypothesis
- Tuning a learning algorithm's parameters
- Comparing two learning algorithms on a specific task
- Etc.

We will never be able to answer such a question with 100% certainty, due to variance in training set selection, test set selection, etc. Instead, we will choose an estimator for the quantity in question, determine the probability distribution of the estimator, and bound the probability that the estimator is way off. The estimator needs to work regardless of the distribution of the training/testing data.

Setting Goals (cont'd)

In addition to statistical variation, note that what we determine is limited to the application we are studying. E.g., if naïve Bayes is better than ID3 on spam filtering, that means nothing about face recognition.

In planning experiments, we need to ensure that training data are not used for evaluation; i.e., don't test on the training set! Doing so will bias the performance estimator. The same holds for a validation set used to prune a decision tree, tune parameters, etc.: the validation set serves as part of the training set, but is not used for model building. A sketch of this three-way split appears below.
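To make the train/validation/test separation concrete, here is a minimal Python sketch, assuming scikit-learn is available; the data shapes, split fractions, and seeds are invented for illustration and are not from the course:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for a real data set (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # 200 instances, 5 features
y = rng.integers(0, 2, size=200)   # binary class labels

# Carve off the held-out test set first; it is touched exactly once,
# for the final performance estimate.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Split the remainder into a training set (model building) and a
# validation set (pruning the tree, tuning parameters, etc.). The
# validation set counts as part of the training data, so it must not
# be reused for the final evaluation.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)
```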
Types of Error

For now, we focus on straightforward 0/1 classification error. For a hypothesis h, recall the two types of classification error from Chapter 2:

Empirical error (or sample error) is the fraction of a sample V that h gets wrong:

$$\mathrm{error}_V(h) \equiv \frac{1}{|V|} \sum_{x \in V} \delta(C(x) \neq h(x)),$$

where $\delta(C(x) \neq h(x))$ is 1 if $C(x) \neq h(x)$, and 0 otherwise.

Generalization error (or true error) is the probability that a new, randomly selected instance is misclassified by h:

$$\mathrm{error}_D(h) \equiv \Pr_{x \in D}[C(x) \neq h(x)],$$

where D is the probability distribution instances are drawn from.
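As a small illustration (my own sketch, not course code: `h` and `C` are assumed to be label-valued callables), empirical error is just the mismatch rate over V:

```python
import numpy as np

def empirical_error(h, C, V):
    """error_V(h): fraction of the sample V on which hypothesis h
    disagrees with the true concept C."""
    return float(np.mean([h(x) != C(x) for x in V]))

# Toy usage: C is the true concept, h an imperfect hypothesis
C = lambda x: x > 0
h = lambda x: x > 1
V = [-2, -1, 0, 1, 2, 3]
print(empirical_error(h, C, V))   # 1/6 ~ 0.167 (h errs only on x = 1)
```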
Why do we care about error_V(h)?

Estimating True Error

Experiment:
1. Choose a sample V of size N according to distribution D
2. Measure error_V(h)

error_V(h) is a random variable (i.e., the result of an experiment), and it is an unbiased estimator of error_D(h). Given an observed error_V(h), what can we conclude about error_D(h)?

Estimating True Error (cont'd)

Bias: If T is the training set, error_T(h) is optimistically biased:

$$\mathrm{bias} \equiv E[\mathrm{error}_T(h)] - \mathrm{error}_D(h)$$

For an unbiased estimate (bias = 0), h and V must be chosen independently, so don't test on the training set! (Don't confuse this with inductive bias!)

Variance: Even with an unbiased V, error_V(h) may still vary from error_D(h).

Confidence Intervals

If
- V contains N examples, drawn independently of h and of each other, and
- N ≥ 30,

then with approximately 95% probability, error_D(h) lies in

$$\mathrm{error}_V(h) \pm 1.96 \sqrt{\frac{\mathrm{error}_V(h)(1 - \mathrm{error}_V(h))}{N}}$$

E.g., suppose hypothesis h misclassifies 12 of the 40 examples in test set V:

$$\mathrm{error}_V(h) = \frac{12}{40} = 0.30$$

Then with approximately 95% confidence, error_D(h) ∈ [0.158, 0.442].

Confidence Intervals (cont'd)

More generally, under the same conditions, with approximately c% probability, error_D(h) lies in

$$\mathrm{error}_V(h) \pm z_c \sqrt{\frac{\mathrm{error}_V(h)(1 - \mathrm{error}_V(h))}{N}}$$

c%:   50%   68%   80%   90%   95%   98%   99%
z_c:  0.67  1.00  1.28  1.64  1.96  2.33  2.58

Why?
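Before answering, here is a quick check: a short sketch (mine; the function name is illustrative) that reproduces the 12-of-40 example above:

```python
import math

def error_confidence_interval(mistakes, n, z=1.96):
    """Approximate two-sided interval for error_D(h), given `mistakes`
    misclassifications out of n >= 30 independent test examples.
    z = 1.96 corresponds to ~95% confidence (see the z_c table)."""
    e = mistakes / n
    half = z * math.sqrt(e * (1 - e) / n)
    return e - half, e + half

lo, hi = error_confidence_interval(12, 40)
print(f"[{lo:.3f}, {hi:.3f}]")   # -> [0.158, 0.442]
```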
error_V(h) is a Random Variable

Imagine repeatedly running the experiment, each time with a different randomly drawn V (each of size N). The probability of observing r misclassified examples is

$$P(r) = \binom{N}{r} \, \mathrm{error}_D(h)^r \, (1 - \mathrm{error}_D(h))^{N-r}$$

[Figure: binomial distribution for N = 40, p = 0.3, plotting P(r) for r = 0 to 40; the mass peaks around r = Np = 12 at P(r) ≈ 0.14.]

I.e., if we let error_D(h) be the probability of heads for a biased coin, then P(r) is the probability of getting r heads out of N flips.

Binomial Probability Distribution

$$P(r) = \binom{N}{r} p^r (1-p)^{N-r} = \frac{N!}{r!(N-r)!} \, p^r (1-p)^{N-r}$$

is the probability P(r) of r heads in N coin flips, if p = Pr(heads).

The expected, or mean, value of X (= number of heads on N flips = number of mistakes on N test examples) is

$$E[X] \equiv \sum_{i=0}^{N} i \, P(i) = Np = N \cdot \mathrm{error}_D(h)$$

The variance of X is

$$\mathrm{Var}(X) \equiv E[(X - E[X])^2] = Np(1-p)$$

and the standard deviation of X, σ_X, is

$$\sigma_X \equiv \sqrt{E[(X - E[X])^2]} = \sqrt{Np(1-p)}$$
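A tiny sketch (mine; `binomial_pmf` is an illustrative helper, though scipy.stats.binom.pmf computes the same) evaluating these quantities for the plotted case:

```python
from math import comb, sqrt

def binomial_pmf(r, n, p):
    """P(r) = C(n, r) * p^r * (1 - p)^(n - r): the probability of
    exactly r heads in n flips of a coin with Pr(heads) = p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# The slides' setting: N = 40 test examples, error_D(h) = p = 0.3
n, p = 40, 0.3
print(binomial_pmf(12, n, p))   # ~0.137, the peak of the plotted pmf
print(n * p)                    # E[X] = Np = 12.0 expected mistakes
print(sqrt(n * p * (1 - p)))    # sigma_X = sqrt(Np(1-p)) ~ 2.90
```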