  1. Evaluating Hypotheses
     IEEE Expert, October 1996

  2. Evaluating Hypotheses
     • Sample error, true error
     • Confidence intervals for observed hypothesis error
     • Estimators
     • Binomial distribution, Normal distribution, Central Limit Theorem
     • Paired t tests
     • Comparing learning methods

  3. Evaluating Hypotheses and Learners
     Consider hypotheses $H_1$ and $H_2$ learned by learners $L_1$ and $L_2$.
     • How do we learn a hypothesis $H$ and estimate its accuracy with limited data?
     • How well does the observed accuracy of $H$ over a limited sample estimate its accuracy over unseen data?
     • If $H_1$ outperforms $H_2$ on the sample, will $H_1$ outperform $H_2$ in general?
     • Does the same conclusion hold for $L_1$ and $L_2$?

  4. Two Definitions of Error
     The true error of hypothesis $h$ with respect to target function $f$ and distribution $\mathcal{D}$ is the probability that $h$ will misclassify an instance drawn at random according to $\mathcal{D}$:
       $error_{\mathcal{D}}(h) \equiv \Pr_{x \in \mathcal{D}}[f(x) \neq h(x)]$
     The sample error of $h$ with respect to target function $f$ and data sample $S$ is the proportion of examples $h$ misclassifies:
       $error_S(h) \equiv \frac{1}{n} \sum_{x \in S} \delta(f(x) \neq h(x))$
     where $\delta(f(x) \neq h(x))$ is 1 if $f(x) \neq h(x)$, and 0 otherwise.
     How well does $error_S(h)$ estimate $error_{\mathcal{D}}(h)$?
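As a concrete illustration (my own sketch, not from the slides), here is the sample-error definition in Python; the target $f$, hypothesis $h$, and sample $S$ below are hypothetical stand-ins:

```python
def sample_error(h, f, S):
    """Fraction of examples in S that h misclassifies relative to f."""
    return sum(1 for x in S if h(x) != f(x)) / len(S)

# Hypothetical example: h misclassifies 12 of 40 examples.
S = list(range(40))
f = lambda x: x % 2                                 # made-up target function
h = lambda x: (x % 2) if x >= 12 else 1 - (x % 2)   # wrong on the first 12
print(sample_error(h, f, S))                        # 0.3
```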

  5. Problems Estimating Error
     1. Bias: if $S$ is the training set, $error_S(h)$ is optimistically biased:
          $bias \equiv E[error_S(h)] - error_{\mathcal{D}}(h)$
        For an unbiased estimate, $h$ and $S$ must be chosen independently.
     2. Variance: even with an unbiased $S$, $error_S(h)$ may still vary from $error_{\mathcal{D}}(h)$.

  6. Example
     Hypothesis $h$ misclassifies 12 of the 40 examples in $S$:
       $error_S(h) = \frac{12}{40} = 0.30$
     What is $error_{\mathcal{D}}(h)$?

  7. Estimators
     Experiment:
     1. Choose sample $S$ of size $n$ according to distribution $\mathcal{D}$.
     2. Measure $error_S(h)$.
     $error_S(h)$ is a random variable (i.e., the result of an experiment), and it is an unbiased estimator for $error_{\mathcal{D}}(h)$.
     Given an observed $error_S(h)$, what can we conclude about $error_{\mathcal{D}}(h)$?

  8. Confidence Intervals
     If
     • $S$ contains $n$ examples, drawn independently of $h$ and of each other
     • $n \geq 30$
     then with approximately 95% probability, $error_{\mathcal{D}}(h)$ lies in the interval
       $error_S(h) \pm 1.96 \sqrt{\frac{error_S(h)(1 - error_S(h))}{n}}$

  9. Confidence Intervals
     If
     • $S$ contains $n$ examples, drawn independently of $h$ and of each other
     • $n \geq 30$
     then with approximately N% probability, $error_{\mathcal{D}}(h)$ lies in the interval
       $error_S(h) \pm z_N \sqrt{\frac{error_S(h)(1 - error_S(h))}{n}}$
     where
       N%:   50%   68%   80%   90%   95%   98%   99%
       z_N:  0.67  1.00  1.28  1.64  1.96  2.33  2.58
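A minimal sketch (my addition) of this interval computation, using the $z_N$ table from the slide:

```python
import math

Z = {50: 0.67, 68: 1.00, 80: 1.28, 90: 1.64, 95: 1.96, 98: 2.33, 99: 2.58}

def confidence_interval(error_s, n, N=95):
    """N% interval: error_S(h) +/- z_N * sqrt(error_S(h)(1-error_S(h))/n)."""
    half_width = Z[N] * math.sqrt(error_s * (1 - error_s) / n)
    return error_s - half_width, error_s + half_width

# Slide 6's example: 12 of 40 misclassified, 95% confidence.
print(confidence_interval(12 / 40, 40))   # roughly (0.158, 0.442)
```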

  10. $error_S(h)$ is a Random Variable
      Rerun the experiment with a different randomly drawn $S$ (of size $n$). The probability of observing $r$ misclassified examples is
        $P(r) = \frac{n!}{r!(n-r)!} \, error_{\mathcal{D}}(h)^r \, (1 - error_{\mathcal{D}}(h))^{n-r}$
      [Plot: Binomial distribution for $n = 40$, $p = 0.3$]

  11. Binomial Probability Distribution
      $P(r) = \frac{n!}{r!(n-r)!} p^r (1-p)^{n-r}$
      is the probability $P(r)$ of $r$ heads in $n$ coin flips, if $p = \Pr(heads)$.
      • Expected, or mean, value of $X$: $E[X] \equiv \sum_{i=0}^{n} i P(i) = np$
      • Variance of $X$: $Var(X) \equiv E[(X - E[X])^2] = np(1-p)$
      • Standard deviation of $X$: $\sigma_X \equiv \sqrt{E[(X - E[X])^2]} = \sqrt{np(1-p)}$
      [Plot: Binomial distribution for $n = 40$, $p = 0.3$]
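A short sketch (my addition) of these Binomial quantities for the slide's plotted case, $n = 40$, $p = 0.3$:

```python
from math import comb

n, p = 40, 0.3

def binom_pmf(r):
    """P(r) = C(n, r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

print(round(binom_pmf(12), 4))   # ~0.1366, the peak of the plot above
print(n * p)                     # mean: 12.0
print(n * p * (1 - p))           # variance: 8.4
```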

  12. Normal Distribution Approximates Binomial
      $error_S(h)$ follows a Binomial distribution, with
      • mean $\mu_{error_S(h)} = error_{\mathcal{D}}(h)$
      • standard deviation $\sigma_{error_S(h)} = \sqrt{\frac{error_{\mathcal{D}}(h)(1 - error_{\mathcal{D}}(h))}{n}}$
      Approximate this by a Normal distribution with
      • mean $\mu_{error_S(h)} = error_{\mathcal{D}}(h)$
      • standard deviation $\sigma_{error_S(h)} \approx \sqrt{\frac{error_S(h)(1 - error_S(h))}{n}}$

  13. Normal Probability Distribution
      $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
      The probability that $X$ will fall into the interval $(a, b)$ is given by $\int_a^b p(x)\,dx$.
      • Expected, or mean, value of $X$: $E[X] = \mu$
      • Variance of $X$: $Var(X) = \sigma^2$
      • Standard deviation of $X$: $\sigma_X = \sigma$
      [Plot: Normal distribution with mean 0, standard deviation 1]
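A minimal sketch (my addition) of the density defined above:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """p(x) = 1/sqrt(2*pi*sigma^2) * exp(-((x - mu)/sigma)^2 / 2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma**2)

print(round(normal_pdf(0.0), 4))   # 0.3989, the peak of the plot above
```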

  14. Normal Probability Distribution
      80% of the area (probability) lies in $\mu \pm 1.28\sigma$.
      N% of the area (probability) lies in $\mu \pm z_N \sigma$, where
        N%:   50%   68%   80%   90%   95%   98%   99%
        z_N:  0.67  1.00  1.28  1.64  1.96  2.33  2.58
      [Plot: standard Normal distribution, mean 0, standard deviation 1]

  15. Confidence Intervals, More Correctly
      If
      • $S$ contains $n$ examples, drawn independently of $h$ and of each other
      • $n \geq 30$
      then with approximately 95% probability, $error_S(h)$ lies in the interval
        $error_{\mathcal{D}}(h) \pm 1.96 \sqrt{\frac{error_{\mathcal{D}}(h)(1 - error_{\mathcal{D}}(h))}{n}}$
      Equivalently, $error_{\mathcal{D}}(h)$ lies in the interval
        $error_S(h) \pm 1.96 \sqrt{\frac{error_{\mathcal{D}}(h)(1 - error_{\mathcal{D}}(h))}{n}}$
      which is approximately
        $error_S(h) \pm 1.96 \sqrt{\frac{error_S(h)(1 - error_S(h))}{n}}$

  16. Two-Sided and One-Sided Bounds
      • If $\mu - z_N\sigma \leq y \leq \mu + z_N\sigma$ with confidence $N = 100(1 - \alpha)\%$,
      • then $-\infty \leq y \leq \mu + z_N\sigma$ with confidence $N = 100(1 - \alpha/2)\%$, and $\mu - z_N\sigma \leq y \leq +\infty$ with confidence $N = 100(1 - \alpha/2)\%$.
      • Example: $n = 40$, $r = 12$
        – Two-sided, 95% confidence ($\alpha = 0.05$): $P(0.16 \leq y \leq 0.44) = 0.95$
        – One-sided: $P(y \leq 0.44) = P(y \geq 0.16) = 1 - \alpha/2 = 0.975$
      [Plots: standard Normal density, two-sided vs. one-sided bounds]
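A quick numerical check (my addition) of the two-sided / one-sided relationship, using the standard Normal CDF $\Phi$ computed via the error function:

```python
import math

def phi(z):
    """Standard Normal CDF, Phi(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = 1.96                    # two-sided 95% bound: alpha = 0.05
print(round(phi(z) - phi(-z), 3))   # 0.95  (two-sided mass)
print(round(phi(z), 3))             # 0.975 (one-sided: 1 - alpha/2)
```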

  17. Calculating Confidence Intervals
      1. Pick the parameter $p$ to estimate: $error_{\mathcal{D}}(h)$.
      2. Choose an estimator: $error_S(h)$.
      3. Determine the probability distribution that governs the estimator: $error_S(h)$ is governed by a Binomial distribution, approximated by a Normal distribution when $n \geq 30$.
      4. Find the interval $(L, U)$ such that N% of the probability mass falls in the interval, using a table of $z_N$ values.

  18. Central Limit Theorem
      Consider a set of independent, identically distributed random variables $Y_1 \ldots Y_n$, all governed by an arbitrary probability distribution with mean $\mu$ and finite variance $\sigma^2$. Define the sample mean
        $\bar{Y} \equiv \frac{1}{n} \sum_{i=1}^{n} Y_i$
      Central Limit Theorem: as $n \to \infty$, the distribution governing $\bar{Y}$ approaches a Normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$.
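An added simulation (not from the slides) illustrating the theorem: sample means of $n$ draws from Uniform(0, 1), where $\mu = 0.5$ and $\sigma^2 = 1/12$, cluster around $\mu$ with variance $\sigma^2/n$:

```python
import random
import statistics

random.seed(0)
n, trials = 100, 10_000
means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(trials)]

print(round(statistics.fmean(means), 3))     # ~0.5   (mu)
print(round(statistics.variance(means), 6))  # ~(1/12)/100 = 0.000833
```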

  19. Difference Between Hypotheses
      Test $h_1$ on sample $S_1$, test $h_2$ on $S_2$.
      1. Pick the parameter to estimate:
           $d \equiv error_{\mathcal{D}}(h_1) - error_{\mathcal{D}}(h_2)$
      2. Choose an estimator:
           $\hat{d} \equiv error_{S_1}(h_1) - error_{S_2}(h_2)$
      3. Determine the probability distribution that governs the estimator:
           $\sigma_{\hat{d}} \approx \sqrt{\frac{error_{S_1}(h_1)(1 - error_{S_1}(h_1))}{n_1} + \frac{error_{S_2}(h_2)(1 - error_{S_2}(h_2))}{n_2}}$
      4. Find the interval $(L, U)$ such that N% of the probability mass falls in the interval:
           $\hat{d} \pm z_N \sqrt{\frac{error_{S_1}(h_1)(1 - error_{S_1}(h_1))}{n_1} + \frac{error_{S_2}(h_2)(1 - error_{S_2}(h_2))}{n_2}}$
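A sketch (my addition) of this estimator and its interval:

```python
import math

def diff_interval(e1, n1, e2, n2, z_N=1.96):
    """N% confidence interval for d = error_D(h1) - error_D(h2)."""
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d_hat - z_N * sigma, d_hat + z_N * sigma

# Using the numbers from the next slide's example:
print(diff_interval(0.30, 100, 0.20, 100))   # roughly (-0.02, 0.22)
```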

  20. Hypothesis Testing
      $P(error_{\mathcal{D}}(h_1) > error_{\mathcal{D}}(h_2)) = ?$
      • Example:
        ◦ $|S_1| = |S_2| = 100$
        ◦ $error_{S_1}(h_1) = 0.30$
        ◦ $error_{S_2}(h_2) = 0.20$
        ◦ $\hat{d} = 0.10$
        ◦ $\sigma_{\hat{d}} = 0.061$
      • $P(\hat{d} < \mu_{\hat{d}} + 0.10)$ = probability that $\hat{d}$ does not overestimate $d$ by more than 0.10
        ◦ $z_N \cdot \sigma_{\hat{d}} = 0.10$
        ◦ $z_N = 1.64$
      • $P(\hat{d} < \mu_{\hat{d}} + 1.64\,\sigma_{\hat{d}}) = 0.95$
      • I.e., we reject the null hypothesis at the 0.05 level of significance.
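A sketch (my addition) reproducing this one-sided test: how confident can we be that $d > 0$, given $\hat{d} = 0.10$?

```python
import math

e1, n1, e2, n2 = 0.30, 100, 0.20, 100
d_hat = e1 - e2
sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)   # ~0.061

z = d_hat / sigma                                    # ~1.64
confidence = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # Normal CDF at z
print(round(confidence, 2))                          # ~0.95
```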

  21. Paired t Test to Compare $h_A$, $h_B$
      1. Partition the data into $k$ disjoint test sets $T_1, T_2, \ldots, T_k$ of equal size, where this size is at least 30.
      2. For $i$ from 1 to $k$, do: $\delta_i \leftarrow error_{T_i}(h_A) - error_{T_i}(h_B)$
      3. Return the value $\bar{\delta}$, where
           $\bar{\delta} \equiv \frac{1}{k} \sum_{i=1}^{k} \delta_i$
      N% confidence interval estimate for $d$:
           $\bar{\delta} \pm t_{N,k-1} \, s_{\bar{\delta}}$, where $s_{\bar{\delta}} \equiv \sqrt{\frac{1}{k(k-1)} \sum_{i=1}^{k} (\delta_i - \bar{\delta})^2}$
      Note: the $\delta_i$ are approximately Normally distributed.
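A sketch (my addition) of the paired t procedure above. The critical value $t_{N,k-1}$ would come from a t table (scipy.stats.t.ppf could also supply it); here it is hard-coded to 2.776, the 95% two-sided value for $k - 1 = 4$ degrees of freedom, and the per-fold differences are hypothetical:

```python
import math
import statistics

def paired_t_interval(deltas, t_crit=2.776):
    """N% interval: delta_bar +/- t_crit * s, with s as defined above."""
    k = len(deltas)
    d_bar = statistics.fmean(deltas)
    s = math.sqrt(sum((d - d_bar) ** 2 for d in deltas) / (k * (k - 1)))
    return d_bar - t_crit * s, d_bar + t_crit * s

# Hypothetical per-fold differences error_Ti(hA) - error_Ti(hB), k = 5:
print(paired_t_interval([0.05, 0.02, 0.04, 0.01, 0.03]))
```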

  22. Comparing Learning Algorithms $L_A$ and $L_B$
      What we would like to estimate:
        $E_{S \subset \mathcal{D}}[error_{\mathcal{D}}(L_A(S)) - error_{\mathcal{D}}(L_B(S))]$
      where $L(S)$ is the hypothesis output by learner $L$ using training set $S$; i.e., the expected difference in true error between hypotheses output by learners $L_A$ and $L_B$, when trained using randomly selected training sets $S$ drawn according to distribution $\mathcal{D}$.
      But given limited data $D_0$, what is a good estimator?
      • We could partition $D_0$ into training set $S_0$ and test set $T_0$, and measure
          $error_{T_0}(L_A(S_0)) - error_{T_0}(L_B(S_0))$
      • Even better, repeat this many times and average the results (next slide).

  23. Comparing Learning Algorithms $L_A$ and $L_B$
      1. Partition data $D_0$ into $k$ disjoint test sets $T_1, T_2, \ldots, T_k$ of equal size, where this size is at least 30.
      2. For $i$ from 1 to $k$, use $T_i$ for the test set and the remaining data for training set $S_i$:
         • $S_i \leftarrow \{D_0 - T_i\}$
         • $h_A \leftarrow L_A(S_i)$
         • $h_B \leftarrow L_B(S_i)$
         • $\delta_i \leftarrow error_{T_i}(h_A) - error_{T_i}(h_B)$
      3. Return the value $\bar{\delta}$, where
           $\bar{\delta} \equiv \frac{1}{k} \sum_{i=1}^{k} \delta_i$
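A skeleton (my addition, with hypothetical interfaces) of this k-fold comparison procedure. `L_A` and `L_B` are assumed to be functions mapping a training set to a hypothesis, and `error(h, T)` computes the sample error of `h` on test set `T`:

```python
def compare_learners(L_A, L_B, D0, k, error):
    """Return delta_bar, the average per-fold test-error difference."""
    size = len(D0) // k
    folds = [D0[i * size:(i + 1) * size] for i in range(k)]
    deltas = []
    for i, T_i in enumerate(folds):
        # Train on everything except fold i, test on fold i.
        S_i = [x for j, fold in enumerate(folds) if j != i for x in fold]
        h_A, h_B = L_A(S_i), L_B(S_i)
        deltas.append(error(h_A, T_i) - error(h_B, T_i))
    return sum(deltas) / k
```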
