MLCC 2018
Statistical Learning: Basic Concepts
Lorenzo Rosasco, UNIGE-MIT-IIT
Outline
◮ Learning from Examples
◮ Data Space and Distribution
◮ Loss Function and Expected Risk
◮ Stability, Overfitting and Regularization
Learning from Examples
◮ Machine Learning deals with systems that are trained from data rather than being explicitly programmed
◮ Here we describe the framework considered in statistical learning theory
Supervised Learning
The goal of supervised learning is to find an underlying input-output relation
f(x_new) ∼ y,
given data.

The data, called the training set, is a set of n input-output pairs (examples)
S = {(x_1, y_1), ..., (x_n, y_n)}.
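To make the notation concrete, a training set can be stored as a plain list of input-output pairs. The numbers below are a made-up toy example, not data from the lecture:

```python
# A hypothetical toy training set of n = 4 input-output pairs (x_i, y_i).
# Inputs are scalars here; in general each x_i can be a vector in R^D.
S = [(0.0, 0.1), (0.5, 0.9), (1.0, 2.1), (1.5, 2.9)]

n = len(S)                   # number of examples
inputs = [x for x, _ in S]   # the x_i
outputs = [y for _, y in S]  # the y_i
```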
We Need a Model to Learn
◮ We consider the approach to machine learning based on the learning-from-examples paradigm
◮ Goal: given the training set, learn a corresponding I/O relation
◮ We have to postulate the existence of a model for the data
◮ The model should take into account the possible uncertainty in the task and in the data
Data Space and Distribution
Data Space
◮ The inputs belong to an input space X; we assume X ⊆ R^D
◮ The outputs belong to an output space Y, typically a subset of R
◮ The space X × Y is called the data space
Examples of Data Space
We consider several possible situations:
◮ Regression: Y ⊆ R
◮ Binary classification: Y = {−1, 1}
◮ Multi-category (multiclass) classification: Y = {1, 2, ..., T}
◮ ...
Modeling Uncertainty in the Data Space
◮ Assumption: there exists a fixed, unknown distribution p(x, y) according to which the data are independently and identically sampled
◮ The distribution p models different sources of uncertainty
◮ Assumption: p factorizes as p(x, y) = p_X(x) p(y|x)
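The sampling assumption can be sketched in code. The particular choices of p_X (uniform on [0, 1]) and p(y|x) (a linear trend plus Gaussian noise) below are illustrative assumptions, not part of the lecture:

```python
import random

random.seed(0)  # reproducibility

def sample_pair():
    # p_X(x): inputs drawn uniformly on [0, 1] (an illustrative choice)
    x = random.uniform(0.0, 1.0)
    # p(y | x): a linear trend plus Gaussian output noise (also illustrative)
    y = 2.0 * x + random.gauss(0.0, 0.1)
    return x, y

# i.i.d. sampling: every pair is drawn independently from the same p(x, y)
S = [sample_pair() for _ in range(100)]
```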
Marginal and Conditional
p(y|x) can be seen as a form of noise in the output.

Figure: For each input x there is a distribution of possible outputs p(y|x).

The marginal distribution p_X(x) models uncertainty in the sampling of the input points.
Data Models
◮ In regression, the following model is often considered:
y = f*(x) + ε
where:
– f*: fixed unknown (regression) function
– ε: random noise, e.g. Gaussian N(0, σI), σ ∈ [0, ∞)
◮ In classification,
p(1|x) = 1 − p(−1|x), ∀x
Noiseless classification: p(1|x) ∈ {1, 0}, ∀x ∈ X
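A minimal simulation of the regression model y = f*(x) + ε. The choice of f* (a sine function) and the noise level are illustrative assumptions for the demo:

```python
import math
import random

random.seed(1)

def f_star(x):
    # the fixed (in practice unknown) regression function;
    # a sine is just an illustrative choice
    return math.sin(2.0 * math.pi * x)

sigma = 0.2  # noise level

def sample_y(x):
    # y = f*(x) + ε, with ε ~ N(0, σ²)
    return f_star(x) + random.gauss(0.0, sigma)
```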
Loss Function and Expected Risk
Loss Function
Goal of learning: estimate the "best" I/O relation (not the whole distribution p(x, y)).
◮ We need to fix a loss function ℓ : Y × Y → [0, ∞)
ℓ(y, f(x)) is a point-wise error measure: the cost of predicting f(x) in place of y.
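Two standard loss functions, the square loss for regression and the zero-one (misclassification) loss for binary classification, can be written as:

```python
def square_loss(y, fx):
    # ℓ(y, f(x)) = (y - f(x))², the usual choice for regression
    return (y - fx) ** 2

def zero_one_loss(y, fx):
    # misclassification loss for binary labels in {-1, +1}:
    # cost 1 if the prediction differs from y, 0 otherwise
    return 0.0 if y == fx else 1.0
```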
Expected Risk and Target Function
The expected loss (or expected risk)
E(f) = E[ℓ(y, f(x))] = ∫ p(x, y) ℓ(y, f(x)) dx dy
can be seen as a measure of the error on past as well as future data.

Given ℓ and a distribution, the "best" I/O relation is the target function f* : X → Y that minimizes the expected risk.
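Since the expected risk is an integral against p(x, y), it can be approximated by Monte Carlo sampling whenever the distribution is known. The distribution below (uniform inputs, linear trend plus Gaussian noise) is an illustrative assumption for the demo; for it, the target function under the square loss is f*(x) = E[y|x] = 2x:

```python
import random

random.seed(0)

def square_loss(y, fx):
    return (y - fx) ** 2

def sample_pair():
    # an illustrative, fully known p(x, y): x uniform on [0, 1],
    # y = 2x + Gaussian noise with σ = 0.1
    x = random.uniform(0.0, 1.0)
    y = 2.0 * x + random.gauss(0.0, 0.1)
    return x, y

def expected_risk(f, n_mc=100_000):
    # Monte Carlo approximation of E(f) = ∫ p(x, y) ℓ(y, f(x)) dx dy
    total = 0.0
    for _ in range(n_mc):
        x, y = sample_pair()
        total += square_loss(y, f(x))
    return total / n_mc

# The target function f*(x) = 2x attains the minimal expected risk,
# which here equals the noise variance σ² = 0.01.
risk_f_star = expected_risk(lambda x: 2.0 * x)
```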
Learning from Data
◮ The target function f* cannot be computed, since p is unknown
◮ The goal of learning is to find an estimator of the target function from data
Stability, Overfitting and Regularization
Learning Algorithms and Generalization
◮ A learning algorithm is a procedure that, given a training set S, computes an estimator f_S
◮ An estimator should mimic the target function, in which case we say that it generalizes
◮ More formally, we are interested in an estimator such that the excess expected risk
E(f_S) − E(f*)
is small

The latter requirement needs some care, since f_S depends on the training set and hence is random.
Generalization and Consistency
A natural approach is to consider the expectation of the excess expected risk
E_S[E(f_S) − E(f*)]
◮ A basic requirement is consistency:
lim_{n→∞} E_S[E(f_S) − E(f*)] = 0
◮ Learning rates provide finite-sample information: for all ε > 0, if n ≥ n(ε), then
E_S[E(f_S) − E(f*)] ≤ ε
◮ n(ε) is called the sample complexity
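Consistency can be illustrated with perhaps the simplest estimator: predicting a constant via the sample mean, under the square loss. For data y ~ N(μ, σ²) the target f* is the constant μ, and the excess expected risk of the sample mean is exactly σ²/n. This toy experiment (not from the lecture) estimates E_S[E(f_S) − E(f*)] by averaging over many independently drawn training sets:

```python
import random

random.seed(0)

mu, sigma = 3.0, 1.0  # data: y ~ N(μ, σ²); the target f* is the constant μ

def excess_risk(n, trials=2000):
    # For the sample-mean estimator f_S = (1/n) Σ y_i under the square
    # loss, E(f_S) - E(f*) = (f_S - μ)², whose expectation over S is
    # exactly σ²/n.  We estimate E_S[·] by averaging over many draws of S.
    total = 0.0
    for _ in range(trials):
        S = [random.gauss(mu, sigma) for _ in range(n)]
        f_S = sum(S) / n
        total += (f_S - mu) ** 2
    return total / trials

rates = {n: excess_risk(n) for n in (10, 100, 1000)}  # decays roughly like 1/n
```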
Generalization: Fitting and Stability
How to design a good algorithm? Two concepts are key:
◮ Fitting: an estimator should fit the data well
◮ Stability: an estimator should be stable; it should not change much if the data change slightly

We say that an algorithm overfits if it fits the data while being unstable.
We say that an algorithm oversmooths if it is stable while disregarding the data.
Regularization as a Fitting-Stability Trade-off
◮ Most learning algorithms depend on one (or more) regularization parameters that control the trade-off between data fitting and stability
◮ We broadly refer to this class of approaches as regularization algorithms, our main topic of discussion
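As a sketch of how a single regularization parameter trades off fitting against stability, consider one-dimensional ridge regression through the origin (an illustrative example with a made-up data model; regularization algorithms are treated in general later in the course):

```python
import random

random.seed(0)

# toy 1-D data: y = 1.5 x + noise (an illustrative data model)
xs = [random.uniform(-1.0, 1.0) for _ in range(30)]
ys = [1.5 * x + random.gauss(0.0, 0.3) for x in xs]

def ridge_slope(xs, ys, lam):
    # one-dimensional ridge regression through the origin:
    #   w = argmin_w (1/n) Σ (y_i - w x_i)² + λ w²
    # which has the closed form w = Σ x_i y_i / (Σ x_i² + n λ)
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)

# λ = 0: pure least-squares fit (best data fit, least stable);
# large λ: the slope is shrunk toward 0 (very stable, oversmoothed).
slopes = [ridge_slope(xs, ys, lam) for lam in (0.0, 0.1, 10.0)]
```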
Wrapping Up
In this class, we introduced the basic definitions in statistical learning theory, including the key concepts of overfitting, stability and generalization.
Next Class
We will introduce a first basic class of learning methods, namely local methods, and study more formally the fundamental trade-off between overfitting and stability.