Learning From Data, Lecture 5: Training Versus Testing


1. Learning From Data, Lecture 5: Training Versus Testing
   The Two Questions of Learning; Theory of Generalization (E_in ≈ E_out); An Effective Number of Hypotheses; A Combinatorial Puzzle
   M. Magdon-Ismail, CSCI 4100/6100

2. Recap: The Two Questions of Learning
   1. Can we make sure that E_out(g) is close enough to E_in(g)?
   2. Can we make E_in(g) small enough?
   The Hoeffding generalization bound:
       E_out(g) ≤ E_in(g) + √( (1/(2N)) ln(2|H|/δ) )
   where E_in(g) is the in-sample error and the square-root term is the error bar (model complexity).
   [Figure: as |H| grows, the in-sample error falls and the model-complexity term rises; the out-of-sample error is minimized at some intermediate |H|*.]
   E_in: training (e.g., the practice exam). E_out: testing (e.g., the real exam). There is a tradeoff when picking |H|.
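As a quick numerical illustration, here is a minimal sketch (in Python; my example, not part of the lecture) of the Hoeffding error bar, showing the tradeoff: the bar grows with |H| and shrinks with N.

```python
import math

def hoeffding_error_bar(N, M, delta=0.05):
    """Error bar sqrt((1/(2N)) ln(2M/delta)) for a finite
    hypothesis set of size M = |H|."""
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

# The bar grows (slowly) with |H| and shrinks with N.
for M in (1, 100, 10_000):
    print(f"|H| = {M:>6}: error bar = {hoeffding_error_bar(1000, M):.3f}")
```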

3. What Will the Theory of Generalization Achieve?
   The current bound (finite H),
       E_out(g) ≤ E_in(g) + √( (1/(2N)) ln(2|H|/δ) ),
   will be replaced by a bound in terms of the growth function m_H:
       E_out(g) ≤ E_in(g) + √( (8/N) ln(4 m_H(2N)/δ) ).
   The new bound will be applicable to infinite H.
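To see why a polynomial growth function saves the day, here is a small sketch (my illustration, assuming the bound above with argument 2N): with m_H(N) = N + 1, the positive-ray growth function computed later in the lecture, the error bar vanishes as N grows.

```python
import math

def vc_error_bar(N, m_H, delta=0.05):
    """Error bar sqrt((8/N) ln(4 m_H(2N)/delta)) with growth function m_H."""
    return math.sqrt(8.0 / N * math.log(4 * m_H(2 * N) / delta))

def ray_growth(n):
    """m_H(N) = N + 1 for the 1-D positive ray model."""
    return n + 1

# Polynomial m_H: the bar -> 0 even though the hypothesis set is infinite.
for N in (100, 10_000, 1_000_000):
    print(f"N = {N:>9}: error bar = {vc_error_bar(N, ray_growth):.4f}")
```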

4. Why Is |H| Overkill?
   How did |H| come in? Define the bad events
       B_m = { |E_out(h_m) − E_in(h_m)| > ε },   B_g = { |E_out(g) − E_in(g)| > ε }.
   We do not know which hypothesis will be g, so we use a worst-case union bound:
       P[B_g] ≤ P[any B_m] ≤ Σ_{m=1}^{|H|} P[B_m].
   [Figure: Venn diagram of overlapping bad events B_1, B_2, B_3.]
   • The B_m are events (sets of outcomes); they can overlap.
   • If the B_m overlap, the union bound is loose.
   • If many h_m are similar, the B_m overlap.
   • There are "effectively" fewer than |H| hypotheses.
   • We can replace |H| by something smaller.
   |H| fails to account for similarity between hypotheses.
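The extreme case makes the looseness vivid: if H contains many identical copies of one hypothesis, the bad events coincide exactly. A minimal Monte Carlo sketch (my illustration; the Bernoulli error model, sample sizes, and ε are arbitrary choices):

```python
import random

def bad_event(N, eps=0.1, mu=0.5):
    """One experiment: is |E_in - E_out| > eps for a hypothesis whose
    pointwise errors are i.i.d. Bernoulli(mu)? (Here E_out = mu.)"""
    E_in = sum(random.random() < mu for _ in range(N)) / N
    return abs(E_in - mu) > eps

trials = 20_000
p_single = sum(bad_event(N=100) for _ in range(trials)) / trials

# With M identical hypotheses the events B_m coincide, so
# P[any B_m] = P[B_1]; the union bound still charges M * P[B_1].
M = 1000
print(f"P[any B_m] ≈ {p_single:.3f}, union bound: {min(1.0, M * p_single):.3f}")
```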

5. Measuring the Diversity (Size) of H
   We need a way to measure the diversity of H. A simple idea: fix any set of N data points. If H is diverse, it should be able to implement all functions on these N points.

6. A Data Set Reveals the True Colors of an H
   [Figure: a hypothesis set H containing many different classifiers.]

7. A Data Set Reveals the True Colors of an H
   [Figure: the same H, seen through the eyes of the data set D.]

8. A Data Set Reveals the True Colors of an H
   From the point of view of D, the entire H is just one dichotomy.

9. An Effective Number of Hypotheses
   If H is diverse, it should be able to implement many dichotomies; |H| only captures the maximum possible diversity of H.
   Consider an h ∈ H and a data set x_1, . . . , x_N. Then h gives us an N-tuple of ±1's, (h(x_1), . . . , h(x_N)): a dichotomy of the inputs.
   • If H is diverse, we get many different dichotomies.
   • If H contains similar functions, we only get a few dichotomies.
   The growth function quantifies this.

10. The Growth Function m_H(N)
   Define the restriction of H to the inputs x_1, x_2, . . . , x_N:
       H(x_1, . . . , x_N) = { (h(x_1), . . . , h(x_N)) | h ∈ H }
   (the set of dichotomies induced by H). The growth function m_H(N) is the size of the largest set of dichotomies induced by H:
       m_H(N) = max over x_1, . . . , x_N of |H(x_1, . . . , x_N)|,   so m_H(N) ≤ 2^N.
   Can we replace |H| by m_H, an effective number of hypotheses?
   • Replacing |H| with 2^N is no help in the bound. (Why? Because ln(2 · 2^N / δ) grows linearly in N, so the error bar √((1/(2N)) ln(2 · 2^N / δ)) stays at least √(ln 2 / 2) no matter how large N is.)
   • We want m_H(N) ≤ poly(N) to get a useful error bar.
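A direct way to make the definition concrete is to enumerate the dichotomies that a (finite, toy) H induces on a fixed set of points. A minimal sketch with a hand-picked slice of the positive-ray model (my example; m_H(N) would additionally maximize over point placements):

```python
def restriction(H, xs):
    """H(x_1, ..., x_N): the set of dichotomies H induces on points xs."""
    return {tuple(h(x) for x in xs) for h in H}

def make_ray(w0):
    """A positive ray: h(x) = sign(x - w0)."""
    return lambda x: +1 if x > w0 else -1

H = [make_ray(w0) for w0 in (-2, -1, 0, 1, 2)]  # a finite slice of the model
xs = (-1.5, -0.5, 0.5, 1.5)
print(len(restriction(H, xs)), "dichotomies out of", 2 ** len(xs))  # 5 of 16
```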

11. Example: 2-D Perceptron Model
   [Figures: 3 points in general position — the model can implement all 8 dichotomies; 4 points — it cannot implement the XOR-style labelings, so it can implement at most 14.]
   m_H(3) = 8 = 2^3.   m_H(4) = 14 < 2^4.   What is m_H(5)?
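These counts can be checked numerically. A Monte Carlo sketch (my illustration): sample random perceptrons and collect the sign patterns they produce. This only lower-bounds the number of achievable dichotomies, but with enough samples it recovers 8 and 14 on the point sets below.

```python
import random

def sampled_dichotomies(points, samples=200_000):
    """Collect sign patterns of random perceptrons h(x) = sign(w.x + b).
    A lower bound on the number of dichotomies the model implements."""
    seen = set()
    for _ in range(samples):
        w1, w2, b = (random.gauss(0, 1) for _ in range(3))
        seen.add(tuple(1 if w1 * x + w2 * y + b > 0 else -1
                       for x, y in points))
    return len(seen)

print(sampled_dichotomies([(0, 0), (1, 0), (0, 1)]))          # 8 = 2^3
print(sampled_dichotomies([(0, 0), (1, 0), (0, 1), (1, 1)]))  # 14 < 2^4
```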

12. Example: 1-D Positive Ray Model
   [Figure: a threshold w_0 on the real line; points x_1, x_2, . . . , x_N to its right are labeled +.]
   • h(x) = sign(x − w_0).
   • Consider N points. There are N + 1 dichotomies, depending on where you put w_0.
   • m_H(N) = N + 1.
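This model is easy to enumerate exactly: only the position of w_0 relative to the sorted points matters, which gives N + 1 threshold slots. A minimal sketch (my illustration):

```python
def ray_dichotomies(xs):
    """All dichotomies of points xs under h(x) = sign(x - w0):
    one threshold slot below every point plus one above the last."""
    xs = sorted(xs)
    slots = ([xs[0] - 1]
             + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
             + [xs[-1] + 1])
    return {tuple(1 if x > w0 else -1 for x in xs) for w0 in slots}

print(len(ray_dichotomies([0.3, 1.1, 2.5, 4.0])))  # N + 1 = 5
```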

13. Example: Positive Rectangles in 2-D
   [Figures: N = 4 points placed so that H implements all dichotomies; N = 5 points, where some point must fall inside a rectangle defined by the others.]
   m_H(4) = 2^4.   m_H(5) < 2^5.
   We have not computed m_H(5) – not impossible, but tricky.
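Exact enumeration is feasible here too, since only where a rectangle's edges fall relative to the point coordinates matters. A sketch (my illustration; the four points form a diamond, a placement that achieves all 16 dichotomies):

```python
from itertools import combinations_with_replacement

def rectangle_dichotomies(points):
    """All dichotomies by axis-aligned rectangles (+1 inside, -1 outside).
    Snap each edge to a representative threshold between coordinates."""
    def cuts(vals):
        vals = sorted(set(vals))
        return ([vals[0] - 1]
                + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
                + [vals[-1] + 1])
    xcuts = cuts([p[0] for p in points])
    ycuts = cuts([p[1] for p in points])
    dichos = set()
    for x_lo, x_hi in combinations_with_replacement(xcuts, 2):
        for y_lo, y_hi in combinations_with_replacement(ycuts, 2):
            dichos.add(tuple(1 if x_lo < x < x_hi and y_lo < y < y_hi else -1
                             for x, y in points))
    return dichos

diamond = [(1, 0), (2, 1), (1, 2), (0, 1)]
print(len(rectangle_dichotomies(diamond)))             # 16 = 2^4
print(len(rectangle_dichotomies(diamond + [(1, 1)])))  # < 2^5
```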

14. Example Growth Functions
   N                     1   2   3    4      5    · · ·
   2-D perceptron        2   4   8   14          · · ·
   1-D pos. ray          2   3   4    5          · · ·
   2-D pos. rectangles   2   4   8   16   < 2^5  · · ·
   • m_H(N) drops below 2^N – there is hope for the generalization bound.
   • A break point is any n for which m_H(n) < 2^n.

15. A Combinatorial Puzzle
   x_1  x_2  x_3
    ◦    ◦    ◦
    ◦    ◦    •
    ◦    •    ◦
    ◦    •    •
   A set of dichotomies.

16. A Combinatorial Puzzle
   x_1  x_2  x_3
    ◦    ◦    ◦
    ◦    ◦    •
    ◦    •    ◦
    ◦    •    •
   Two points are shattered: x_2 and x_3 realize all four patterns.

17. A Combinatorial Puzzle
   x_1  x_2  x_3
    ◦    ◦    ◦
    ◦    ◦    •
    ◦    •    ◦
    •    ◦    ◦
   No pair of points is shattered.

18. A Combinatorial Puzzle
   For N = 3, with no pair of points shattered, 4 dichotomies is the max:
   x_1  x_2  x_3
    ◦    ◦    ◦
    ◦    ◦    •
    ◦    •    ◦
    •    ◦    ◦
   For N = 4:
   x_1  x_2  x_3  x_4
    ◦    ◦    ◦    ◦
    ◦    ◦    ◦    •
    . . .
   If N = 4, how many dichotomies are possible with no 2 points shattered?
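For small N the puzzle can be settled by brute force: enumerate every subset of the 2^N possible dichotomies and keep the largest one in which no pair of points is shattered. A sketch (my illustration, not from the lecture):

```python
from itertools import combinations, product

def no_pair_shattered(dichos, N):
    """True if no pair of coordinates realizes all 4 patterns."""
    return all(len({(d[i], d[j]) for d in dichos}) < 4
               for i, j in combinations(range(N), 2))

def max_unshattered(N):
    """Largest set of dichotomies on N points with no 2 points shattered,
    by exhaustive search over all subsets (feasible only for tiny N)."""
    all_d = list(product((-1, 1), repeat=N))
    best = 0
    for mask in range(1 << len(all_d)):
        subset = [d for k, d in enumerate(all_d) if mask >> k & 1]
        if len(subset) > best and no_pair_shattered(subset, N):
            best = len(subset)
    return best

print(max_unshattered(3))  # 4, matching the slide
print(max_unshattered(4))  # 5
```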
