

  1. Applied Machine Learning: Linear Regression. Siamak Ravanbakhsh, COMP 551 (Winter 2020)

  2. Learning objectives: the linear model; evaluation criteria; how to find the best fit; geometric interpretation.

  3-4. Motivation. History: the method of least squares was invented by Legendre and Gauss (early 1800s); Gauss, at age 24, used it to predict the future location of Ceres (the largest asteroid in the asteroid belt). A modern example: the effect of income inequality on health and social problems. Source: http://chrisauld.com/2012/10/07/what-do-we-know-about-the-effect-of-income-inequality-on-health/

  5. Motivation (?)

  6-10. Representing data. Each instance is a feature vector $x^{(n)} \in \mathbb{R}^D$ with a label $y^{(n)} \in \mathbb{R}$. Vectors are assumed to be column vectors, $x = [x_1, x_2, \ldots, x_D]^\top$, where each entry $x_d$ is a feature. We assume $N$ instances in the dataset $\mathcal{D} = \{(x^{(n)}, y^{(n)})\}_{n=1}^N$; each instance has $D$ features indexed by $d$, so for example $x_d^{(n)} \in \mathbb{R}$ is feature $d$ of instance $n$.

  11-15. Representing data: the design matrix concatenates all instances, so that each row is a datapoint (one instance) and each column is a feature:

  $X = \begin{bmatrix} x^{(1)\top} \\ x^{(2)\top} \\ \vdots \\ x^{(N)\top} \end{bmatrix} = \begin{bmatrix} x^{(1)}_1 & x^{(1)}_2 & \cdots & x^{(1)}_D \\ \vdots & \vdots & \ddots & \vdots \\ x^{(N)}_1 & x^{(N)}_2 & \cdots & x^{(N)}_D \end{bmatrix} \in \mathbb{R}^{N \times D}$

  16. Representing data, example: microarray data. The design matrix $X \in \mathbb{R}^{N \times D}$ contains gene expression levels, with one row per patient ($n$) and one column per gene ($d$); the labels $y$ can be {cancer / no cancer} for each patient.
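A minimal NumPy sketch of this data representation; the toy values and variable names are illustrative, not from the slides.

    import numpy as np

    # Hypothetical toy dataset: N = 4 instances, D = 3 features each.
    # Each row of X is one instance x^(n); each column is one feature d.
    X = np.array([[ 0.5,  1.2, -0.3],
                  [ 1.0,  0.7,  0.9],
                  [-0.2,  2.1,  0.4],
                  [ 0.8, -1.5,  1.1]])          # design matrix, shape (N, D)
    y = np.array([1.0, 2.0, 0.5, 1.5])          # labels y^(n), shape (N,)

    N, D = X.shape
    x_n = X[1]        # one instance (a row), shape (D,)
    x_d = X[:, 2]     # one feature across all instances (a column), shape (N,)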

  17-21. Linear model: $f_w : \mathbb{R}^D \to \mathbb{R}$, assuming a scalar output (this will generalize to a vector output later), with $f_w(x) = w_0 + w_1 x_1 + \ldots + w_D x_D$. The $w_d$ are the model parameters or weights, and $w_0$ is the bias or intercept. Simplification: concatenate a 1 to $x$, so that $x = [1, x_1, \ldots, x_D]^\top$ and $f_w(x) = w^\top x$. In code: yh_n = np.dot(w,x)
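Expanding the slide's one-liner into a short runnable sketch of the bias trick; the numeric values are made up for illustration.

    import numpy as np

    D = 3
    w = np.array([0.5, 1.0, -2.0, 0.3])      # weights [w_0, w_1, ..., w_D], length D + 1
    x = np.array([0.2, 1.5, -0.7])           # one instance, length D

    x = np.concatenate(([1.0], x))           # concatenate a 1 to x (bias/intercept trick)
    yh_n = np.dot(w, x)                      # f_w(x) = w^T x, a scalar prediction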

  22-28. Loss function. Objective: find parameters that fit the data $(x^{(n)}, y^{(n)})\ \forall n$, i.e., $f_w(x^{(n)}) \approx y^{(n)}$, by minimizing a measure of the difference between $\hat{y}^{(n)} = f_w(x^{(n)})$ and $y^{(n)}$. For a single instance (as a function of the labels), the squared error loss (a.k.a. L2 loss) is $L(y, \hat{y}) \triangleq \frac{1}{2}(y - \hat{y})^2$; the factor of $\frac{1}{2}$ is for future convenience. For the whole dataset, the sum-of-squared-errors cost function is $J(w) = \frac{1}{2} \sum_{n=1}^{N} (y^{(n)} - w^\top x^{(n)})^2$.
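A small sketch of this cost as code, assuming the design matrix X already includes the column of 1s for the bias; the function name is illustrative.

    import numpy as np

    def cost(w, X, y):
        # Sum-of-squared-errors cost J(w) = 1/2 * sum_n (y^(n) - w^T x^(n))^2
        residuals = y - X @ w          # y^(n) - w^T x^(n) for every instance at once
        return 0.5 * np.sum(residuals ** 2)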

  29-34. Example (D = 1; with the bias term, D = 2!). With scalar inputs $x = [x_1]$, the data points $(x^{(1)}, y^{(1)}), \ldots, (x^{(4)}, y^{(4)})$ are fit by the line $f_{w^*}(x) = w_0^* + w_1^* x$, where $w_0^*$ is the intercept and, e.g., $y^{(3)} - f(x^{(3)})$ is the error on the third point. Linear least squares: $\min_w \sum_n (y^{(n)} - w^\top x^{(n)})^2$.

  35. Example (D = 2; with the bias term, D = 3!). The fitted model is a plane, $f_{w^*}(x) = w_0^* + w_1^* x_1 + w_2^* x_2$, where linear least squares gives $w^* = \arg\min_w \sum_n (y^{(n)} - w^\top x^{(n)})^2$.
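The slides state the least-squares objective but not how $w^*$ is computed here; as one hedged illustration for the D = 1 (+bias) example, NumPy's built-in least-squares solver can be used on made-up data.

    import numpy as np

    # Made-up 1-D data (x^(n), y^(n)), roughly on a line.
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 1.9, 3.2, 3.9])

    X = np.column_stack([np.ones_like(x), x])        # add the bias column: x = [1, x_1]
    w_star, *_ = np.linalg.lstsq(X, y, rcond=None)   # argmin_w sum_n (y^(n) - w^T x^(n))^2
    w0_star, w1_star = w_star                        # fitted intercept and slope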

  36-37. Matrix form: instead of $\hat{y}^{(n)} = w^\top x^{(n)}$ (with $w^\top \in \mathbb{R}^{1 \times D}$ and $x^{(n)} \in \mathbb{R}^{D \times 1}$), use the design matrix to write $\hat{y} = Xw$, where $X \in \mathbb{R}^{N \times D}$, $w \in \mathbb{R}^{D \times 1}$, and $\hat{y} \in \mathbb{R}^{N \times 1}$.
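In code, the matrix form means all N predictions come from a single matrix-vector product; a minimal sketch with placeholder data:

    import numpy as np

    N, D = 4, 3
    X = np.random.randn(N, D)   # design matrix (assume the 1s column is already included)
    w = np.random.randn(D)      # weight vector

    yh = X @ w                  # yh = Xw, shape (N,): one prediction per instance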
