MLCC 2019 — Local Methods and Bias-Variance Trade-Off
Lorenzo Rosasco, UNIGE-MIT-IIT
About this class
1. Introduce a basic class of learning methods, namely local methods.
2. Discuss the fundamental concept of the bias-variance trade-off to understand parameter tuning (a.k.a. model selection).
Outline
◮ Learning with Local Methods
◮ From Bias-Variance to Cross-Validation
The problem
What is the price of a house given its area? Start from data...

[Figure: scatter plot of price (€) against area (m²) for the houses dataset]

Area (m²)    Price (€)
x_1 = 62     y_1 = 99,200
x_2 = 64     y_2 = 135,700
x_3 = 65     y_3 = 93,300
x_4 = 66     y_4 = 114,000
...          ...

Let S be the houses example dataset (n = 100), S = {(x_1, y_1), ..., (x_n, y_n)}.
Given a new point x*, we want to predict y* by means of S.
Example
Let x* be a 300 m² house.

[Figure: the same scatter plot, zooming in on the points with area near 300 m²]

Area (m²)     Price (€)
...           ...
x_93 = 255    y_93 = 274,600
x_94 = 264    y_94 = 324,900
x_95 = 310    y_95 = 311,200
x_96 = 480    y_96 = 515,400
...           ...

What is its price?
Nearest Neighbors
Nearest Neighbor: y* is the value of the closest point to x* in S. Here the closest area to x* = 300 is x_95 = 310, so

  y* = 311,200
Nearest Neighbors
◮ S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R
◮ x* ∈ R^D the new point
◮ y* the predicted output, y* = f̂(x*), where

  y* = y_j,   j = argmin_{i=1,...,n} ||x* − x_i||

Computational cost O(nD): we compute the distance ||x* − x_i|| n times, and each distance costs O(D).

In general, let d : R^D × R^D → R be a distance on the input space; then

  f̂(x) = y_j,   j = argmin_{i=1,...,n} d(x, x_i)
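The rule above can be sketched in a few lines of Python (not part of the slides). The toy dataset echoes the houses example but is purely illustrative:

```python
import numpy as np

def nearest_neighbor_predict(X, y, x_star):
    """Predict y* for x_star as the output of its nearest neighbor in X.

    X: (n, D) array of training inputs, y: (n,) array of outputs.
    Cost is O(nD): one O(D) distance per training point.
    """
    distances = np.linalg.norm(X - x_star, axis=1)  # ||x* - x_i|| for all i
    j = np.argmin(distances)                        # index of the closest point
    return y[j]

# Toy version of the houses example: area (m^2) -> price (EUR)
X = np.array([[62.0], [64.0], [255.0], [264.0], [310.0], [480.0]])
y = np.array([99_200, 135_700, 274_600, 324_900, 311_200, 515_400])
print(nearest_neighbor_predict(X, y, np.array([300.0])))  # closest area is 310
```

Using a different metric d amounts to replacing the `np.linalg.norm` line with the corresponding distance computation.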
Extensions
Nearest Neighbor takes y* to be the value of the closest point to x* in S. Can we do better? (for example, by using more points)
K-Nearest Neighbors
K-Nearest Neighbor: y* is the mean of the values of the K closest points to x* in S. If K = 3 we have

  y* = (274,600 + 324,900 + 311,200) / 3 ≈ 303,567
K-Nearest Neighbors
◮ S = {(x_i, y_i)}_{i=1}^n with x_i ∈ R^D, y_i ∈ R
◮ x* ∈ R^D the new point
◮ let K be an integer, K ≪ n
◮ j_1, ..., j_K defined as

  j_1 = argmin_{i ∈ {1,...,n}} ||x* − x_i||
  j_t = argmin_{i ∈ {1,...,n} \ {j_1,...,j_{t−1}}} ||x* − x_i||   for t ∈ {2, ..., K}

◮ predicted output

  y* = (1/K) Σ_{i ∈ {j_1,...,j_K}} y_i
K-Nearest Neighbors (cont.)

  f̂(x) = (1/K) Σ_{i=1}^K y_{j_i}

◮ Computational cost O(nD + n log n): compute the n distances ||x − x_i|| for i ∈ {1, ..., n} (each costs O(D)), then order them in O(n log n).
◮ General metric d: f̂ is the same, but j_1, ..., j_K are defined as j_1 = argmin_{i ∈ {1,...,n}} d(x, x_i) and j_t = argmin_{i ∈ {1,...,n} \ {j_1,...,j_{t−1}}} d(x, x_i) for t ∈ {2, ..., K}.
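A minimal Python sketch of the K-NN rule (again illustrative, not from the slides). It sorts all n distances, matching the O(nD + n log n) cost stated above; a partial selection such as `np.argpartition` would avoid the full sort:

```python
import numpy as np

def knn_predict(X, y, x_star, K=3):
    """K-nearest-neighbor prediction: average the outputs of the K closest points.

    X: (n, D) training inputs, y: (n,) outputs, K << n.
    """
    distances = np.linalg.norm(X - x_star, axis=1)  # ||x* - x_i|| for all i
    order = np.argsort(distances)                   # j_1, ..., j_n by increasing distance
    return y[order[:K]].mean()                      # (1/K) * sum of the K closest outputs

# The K = 3 example from the slides
X = np.array([[255.0], [264.0], [310.0], [480.0]])
y = np.array([274_600.0, 324_900.0, 311_200.0, 515_400.0])
print(knn_predict(X, y, np.array([300.0]), K=3))  # (274600 + 324900 + 311200) / 3
```

With K = 1 this reduces to the nearest-neighbor rule of the previous slides.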
Parzen Windows
K-NN puts equal weights on the values of the selected points. Can we generalize it? Points closer to x* should influence its value more.

PARZEN WINDOWS:

  f̂(x) = ( Σ_{i=1}^n y_i k(x, x_i) ) / ( Σ_{i=1}^n k(x, x_i) )

where k is a similarity function:
◮ k(x, x') ≥ 0 for all x, x' ∈ R^D
◮ k(x, x') → 1 when ||x − x'|| → 0
◮ k(x, x') → 0 when ||x − x'|| → ∞
Parzen Windows
Examples of k (each with a σ > 0):
◮ k_1(x, x') = sign(1 − ||x − x'||/σ)_+
◮ k_2(x, x') = (1 − ||x − x'||/σ)_+
◮ k_3(x, x') = (1 − ||x − x'||²/σ²)_+
◮ k_4(x, x') = exp(−||x − x'||² / (2σ²))
◮ k_5(x, x') = exp(−||x − x'||/σ)
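A sketch of the Parzen-window estimator in Python, using the Gaussian similarity k_4 from the list above (the data and σ are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """k_4 from the slides: exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

def parzen_predict(X, y, x_star, sigma=1.0):
    """Parzen-window estimate: kernel-weighted average of the outputs.

    Unlike K-NN, every training point contributes, with weight decaying
    in its distance from x_star.
    """
    weights = np.array([gaussian_kernel(x_star, xi, sigma) for xi in X])
    return np.dot(weights, y) / np.sum(weights)

# Two equidistant points get equal weight, so the prediction is their average
X = np.array([[0.0], [2.0]])
y = np.array([0.0, 4.0])
print(parzen_predict(X, y, np.array([1.0]), sigma=1.0))
```

Shrinking σ concentrates the weights on the closest points, so the estimator interpolates between K-NN-like local behavior (small σ) and a global average (large σ).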
K-NN example
K-Nearest Neighbor depends on K.

[Figures: K-NN fit on the same 1D example for K = 1, 2, 3, 4, 5]