Machine Learning
Computational Learning Theory: Occam’s Razor
Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others
This lecture: Computational Learning Theory
– The theory of generalization
– Probably Approximately Correct (PAC) learning
“Numquam ponenda est pluralitas sine necessitate”
(Never posit plurality without necessity.)
(Assuming consistency) That is, a hypothesis can be consistent with every training example and yet be bad: its true error on the underlying distribution can still be large.
Union bound: For a set of events, the probability that at least one of them happens is at most the sum of the probabilities of the individual events.
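In symbols, the standard form of the union bound, applied here to the set B of "bad" hypotheses in H (those whose true error exceeds some tolerance ε). This is the usual argument for consistent learners, written out for completeness rather than copied from the slides; the key fact is that a single hypothesis with error greater than ε survives n independent examples with probability at most (1 − ε)^n:

    \Pr[A_1 \cup A_2 \cup \dots \cup A_k] \;\le\; \sum_{i=1}^{k} \Pr[A_i]

    \Pr[\text{some } h \in B \text{ is consistent with all } n \text{ examples}]
    \;\le\; \sum_{h \in B} \Pr[h \text{ consistent}]
    \;\le\; |H|\,(1-\varepsilon)^n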
This situation is a bad one. Let us try to see what we need to do to ensure that this situation is rare.

The guarantee we are after: with probability greater than 1 − δ over the draw of the n training examples, every hypothesis h ∈ H that is consistent with them has true error below ε,

    \Pr\big[\,\text{err}_D(h) < \varepsilon \text{ for every consistent } h \in H\,\big] \;>\; 1 - \delta
That is, if

    n \;>\; \frac{1}{\varepsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)

then the probability of getting a bad hypothesis is small. If this condition on n is true, then the bound of δ on the failure probability holds, and then a consistent-yet-bad hypothesis is improbable.
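A compact version of the argument (a standard derivation; the middle step uses the inequality 1 − x ≤ e^{−x}):

    |H|\,(1-\varepsilon)^n \;\le\; |H|\,e^{-\varepsilon n} \;\le\; \delta
    \quad\Longleftarrow\quad
    n \;\ge\; \frac{1}{\varepsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)

So with that many examples, the probability that any consistent hypothesis has true error above ε is at most δ.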
Reading the bound:
– A larger hypothesis space (a larger ln |H| term) increases sample complexity (i.e., more examples are needed for the guarantee).
– If we choose a more expressive hypothesis space, then we will make learning harder (i.e., higher sample complexity).
– If we want more confidence (smaller δ) or more accuracy (smaller ε) in the classifier we will produce, sample complexity will be higher.
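A quick numeric illustration of these trade-offs (my own example, not from the slides; the hypothesis-space sizes and the ε, δ settings are made up):

    import math

    def sample_complexity(H_size, eps, delta):
        """Occam's-razor bound for consistent learners:
        n > (1/eps) * (ln|H| + ln(1/delta)) examples suffice so that, with
        probability > 1 - delta, every consistent hypothesis has error < eps."""
        return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / eps)

    # Conjunctions over 10 Boolean features: roughly |H| = 3^10 hypotheses
    # (each feature appears positive, negated, or not at all).
    print(sample_complexity(3 ** 10, eps=0.1, delta=0.05))    # 140
    print(sample_complexity(3 ** 10, eps=0.01, delta=0.05))   # 1399: 10x the accuracy needs ~10x the data
    print(sample_complexity(3 ** 100, eps=0.1, delta=0.05))   # 1129: a vastly larger |H| needs only ~8x the data

Because |H| enters only through ln |H|, even huge hypothesis spaces remain learnable with a modest number of examples; the dependence on 1/ε is much more expensive.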
– If n is large enough, a consistent hypothesis must be close enough to the target g.
– Check that n does not have to be too large (i.e., polynomial in the relevant parameters): we showed that the "closeness" guarantee requires that

    n \;>\; \frac{1}{\varepsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)
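To see the guarantee in action, here is a small simulation (my own illustration under assumed settings, not taken from the slides): the elimination algorithm for Boolean conjunctions always returns a hypothesis consistent with the sample, and with n above the bound the fraction of runs that end with a bad (error > ε) hypothesis should stay below δ. The target conjunction, the feature count, and all numbers are made up for the example.

    import itertools
    import random

    d = 10                                   # number of Boolean features (illustrative choice)
    target = {1: True, 3: False, 7: True}    # hypothetical target conjunction: x1 AND (NOT x3) AND x7

    def concept(x):
        """True labeling function: the target conjunction."""
        return all(x[i] == v for i, v in target.items())

    def learn_conjunction(sample):
        """Elimination algorithm: start with all 2d literals and drop every literal
        violated by some positive example. The result is the most specific
        conjunction consistent with the sample."""
        literals = {(i, b) for i in range(d) for b in (True, False)}
        for x, y in sample:
            if y:
                literals = {(i, b) for (i, b) in literals if x[i] == b}
        return literals

    def predict(literals, x):
        return all(x[i] == b for (i, b) in literals)

    random.seed(0)
    eps, delta = 0.1, 0.05
    n = 140                                  # exceeds (1/eps) * (ln 3^d + ln 1/delta) for d = 10

    trials, bad = 200, 0
    all_inputs = list(itertools.product((False, True), repeat=d))
    for _ in range(trials):
        sample = []
        for _ in range(n):
            x = tuple(random.random() < 0.5 for _ in range(d))
            sample.append((x, concept(x)))
        h = learn_conjunction(sample)
        # exact true error under the uniform distribution over all 2^d inputs
        err = sum(predict(h, x) != concept(x) for x in all_inputs) / len(all_inputs)
        bad += (err > eps)

    print(f"runs with error > {eps}: {bad}/{trials}  (the bound promises a rate below {delta})")

With these settings the bad-run count should come out well within the δ = 0.05 allowance, matching the theory.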