MLCC 2019 — Local Methods and Bias-Variance Trade-Off
Lorenzo Rosasco, UNIGE-MIT-IIT
About this class
1. Introduce a basic class of learning methods, namely local methods.
2. Discuss the fundamental concept of the bias-variance trade-off to understand parameter tuning (a.k.a. model selection).
Outline
◮ Learning with Local Methods
◮ From Bias-Variance to Cross-Validation
The problem
What is the price of a house given its area? Start from data...

[Figure: scatter plot of price (€) against area (m²) for the houses dataset]

Area (m²)    Price (€)
x_1 = 62     y_1 = 99,200
x_2 = 64     y_2 = 135,700
x_3 = 65     y_3 = 93,300
x_4 = 66     y_4 = 114,000
...          ...

Let S be the houses example dataset (n = 100), S = {(x_1, y_1), ..., (x_n, y_n)}.
Given a new point x*, we want to predict y* by means of S.
Example
Let x* be a 300 m² house.

[Figure: the same scatter plot, zooming in on the points with area near 300 m²]

Area (m²)     Price (€)
...           ...
x_93 = 255    y_93 = 274,600
x_94 = 264    y_94 = 324,900
x_95 = 310    y_95 = 311,200
x_96 = 480    y_96 = 515,400
...           ...

What is its price?
Nearest Neighbors
Nearest Neighbor: y* is the value of the closest point to x* in S. Here the closest area to x* = 300 is x_95 = 310, so

  y* = 311,200
Nearest Neighbors
◮ S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R
◮ x* ∈ R^D the new point
◮ y* the predicted output, y* = f̂(x*), where

  y* = y_j,   j = argmin_{i=1,...,n} ||x* − x_i||

Computational cost O(nD): we compute the distance ||x* − x_i|| n times, and each distance costs O(D).

In general, let d : R^D × R^D → R be a distance on the input space; then

  f̂(x) = y_j,   j = argmin_{i=1,...,n} d(x, x_i)
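The rule above can be sketched in a few lines of Python (not part of the slides). The toy dataset echoes the houses example but is purely illustrative:

```python
import numpy as np

def nearest_neighbor_predict(X, y, x_star):
    """Predict y* for x_star as the output of its nearest neighbor in X.

    X: (n, D) array of training inputs, y: (n,) array of outputs.
    Cost is O(nD): one O(D) distance per training point.
    """
    distances = np.linalg.norm(X - x_star, axis=1)  # ||x* - x_i|| for all i
    j = np.argmin(distances)                        # index of the closest point
    return y[j]

# Toy version of the houses example: area (m^2) -> price (EUR)
X = np.array([[62.0], [64.0], [255.0], [264.0], [310.0], [480.0]])
y = np.array([99_200, 135_700, 274_600, 324_900, 311_200, 515_400])
print(nearest_neighbor_predict(X, y, np.array([300.0])))  # closest area is 310
```

Using a different metric d amounts to replacing the `np.linalg.norm` line with the corresponding distance computation.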
Extensions
Nearest Neighbor takes y* to be the value of the closest point to x* in S. Can we do better? (for example, by using more points)
K-Nearest Neighbors
K-Nearest Neighbor: y* is the mean of the values of the K closest points to x* in S. If K = 3 we have

  y* = (274,600 + 324,900 + 311,200) / 3 ≈ 303,567
K-Nearest Neighbors
◮ S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R
◮ x* ∈ R^D the new point
◮ let K be an integer, K ≪ n
◮ j_1, ..., j_K defined as

  j_1 = argmin_{i ∈ {1,...,n}} ||x* − x_i||
  j_t = argmin_{i ∈ {1,...,n} \ {j_1,...,j_{t−1}}} ||x* − x_i||   for t ∈ {2, ..., K}

◮ predicted output

  y* = (1/K) Σ_{i ∈ {j_1,...,j_K}} y_i
K-Nearest Neighbors (cont.)

  f̂(x) = (1/K) Σ_{i=1}^K y_{j_i}

◮ Computational cost O(nD + n log n): compute the n distances ||x − x_i|| for i ∈ {1, ..., n} (each costs O(D)), then order them in O(n log n).
◮ General metric d: f̂ is the same, but j_1, ..., j_K are defined as j_1 = argmin_{i ∈ {1,...,n}} d(x, x_i) and j_t = argmin_{i ∈ {1,...,n} \ {j_1,...,j_{t−1}}} d(x, x_i) for t ∈ {2, ..., K}.
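A minimal Python sketch of the K-NN rule (again illustrative, not from the slides). It sorts all n distances, matching the O(nD + n log n) cost stated above; a partial selection such as `np.argpartition` would avoid the full sort:

```python
import numpy as np

def knn_predict(X, y, x_star, K=3):
    """K-nearest-neighbor prediction: average the outputs of the K closest points.

    X: (n, D) training inputs, y: (n,) outputs, K << n.
    """
    distances = np.linalg.norm(X - x_star, axis=1)  # ||x* - x_i|| for all i
    order = np.argsort(distances)                   # j_1, ..., j_n by increasing distance
    return y[order[:K]].mean()                      # (1/K) * sum of the K closest outputs

# The K = 3 example from the slides
X = np.array([[255.0], [264.0], [310.0], [480.0]])
y = np.array([274_600.0, 324_900.0, 311_200.0, 515_400.0])
print(knn_predict(X, y, np.array([300.0]), K=3))  # (274600 + 324900 + 311200) / 3
```

With K = 1 this reduces to the nearest-neighbor rule of the previous slides.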
Parzen Windows
K-NN puts equal weights on the values of the selected points. Can we generalize it? Points closer to x* should influence its value more.

PARZEN WINDOWS:

  f̂(x) = ( Σ_{i=1}^n y_i k(x, x_i) ) / ( Σ_{i=1}^n k(x, x_i) )

where k is a similarity function:
◮ k(x, x') ≥ 0 for all x, x' ∈ R^D
◮ k(x, x') → 1 when ||x − x'|| → 0
◮ k(x, x') → 0 when ||x − x'|| → ∞
Parzen Windows
Examples of k (each with a σ > 0):
◮ k_1(x, x') = sign(1 − ||x − x'||/σ)_+
◮ k_2(x, x') = (1 − ||x − x'||/σ)_+
◮ k_3(x, x') = (1 − ||x − x'||²/σ²)_+
◮ k_4(x, x') = exp(−||x − x'||² / (2σ²))
◮ k_5(x, x') = exp(−||x − x'||/σ)
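A sketch of the Parzen-window estimator in Python, using the Gaussian similarity k_4 from the list above (the data and σ are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """k_4 from the slides: exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

def parzen_predict(X, y, x_star, sigma=1.0):
    """Parzen-window estimate: kernel-weighted average of the outputs.

    Unlike K-NN, every training point contributes, with weight decaying
    in its distance from x_star.
    """
    weights = np.array([gaussian_kernel(x_star, xi, sigma) for xi in X])
    return np.dot(weights, y) / np.sum(weights)

# Two equidistant points get equal weight, so the prediction is their average
X = np.array([[0.0], [2.0]])
y = np.array([0.0, 4.0])
print(parzen_predict(X, y, np.array([1.0]), sigma=1.0))
```

Shrinking σ concentrates the weights on the closest points, so the estimator interpolates between K-NN-like local behavior (small σ) and a global average (large σ).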
K-NN example
K-Nearest Neighbor depends on K.

[Figures: K-NN fit on the same 1D example for K = 1, 2, 3, 4, 5]