

  1. CSI5180. Machine Learning for Bioinformatics Applications. Regularized Linear Models, by Marcel Turcotte. Version: November 6, 2019

  2. Preamble Preamble 2/42

  3. Preamble. Regularized Linear Models. In this lecture, we introduce the concept of regularization. We consider the specific context of linear models: Ridge Regression, Lasso Regression, and Elastic Net. Finally, we discuss a simple technique called early stopping. General objective: Explain the concept of regularization in the context of linear regression and logistic regression. Preamble 3/42

  4. Learning objectives. Explain the concept of regularization in the context of linear regression and logistic regression. Reading: Simon Dirmeier, Christiane Fuchs, Nikola S Mueller, and Fabian J Theis, netReg: network-regularized linear models for biological association studies, Bioinformatics 34 (2018), no. 5, 896–898. Preamble 4/42

  5. Plan 1. Preamble 2. Introduction 3. Polynomial Regression 4. Regularization 5. Logistic Regression 6. Prologue Preamble 5/42

  6. Introduction Introduction 6/42

  7. Supervised learning. The data set is a collection of labelled examples, $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$. Each $\mathbf{x}_i$ is a feature vector with $D$ dimensions. $x_k^{(j)}$ is the value of feature $j$ of example $k$, for $j \in 1 \ldots D$ and $k \in 1 \ldots N$. The label $y_i$ is either a class, taken from a finite list of classes $\{1, 2, \ldots, C\}$, a real number, or a more complex object (vector, matrix, tree, graph, etc.). Problem: given the data set as input, create a “model” that can be used to predict the value of $y$ for an unseen $\mathbf{x}$. Classification: $y_i \in \{\text{Positive}, \text{Negative}\}$ is a binary classification problem. Regression: $y_i$ is a real number. Introduction 7/42
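As a concrete illustration of this notation, here is a minimal sketch (my own toy example, not from the slides) of how such a labelled data set is typically laid out as NumPy arrays: N rows of D features, plus one label per row.

    import numpy as np

    # Toy data set: N = 4 examples, D = 3 features per example (values are illustrative only).
    X = np.array([[5.1, 3.5, 1.4],
                  [4.9, 3.0, 1.4],
                  [6.2, 2.9, 4.3],
                  [5.9, 3.0, 5.1]])   # X[k, j] corresponds to x_k^(j)

    # Regression labels: one real number y_i per example.
    y_reg = np.array([0.2, 0.2, 1.3, 1.8])

    # Classification labels: one class per example, taken from a finite set.
    y_clf = np.array(["Positive", "Positive", "Negative", "Negative"])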

  8-11. Linear Regression. A linear model assumes that the value of the label, $\hat{y}_i$, can be expressed as a linear combination of the feature values, $x_i^{(j)}$: $\hat{y}_i = h(\mathbf{x}_i) = \theta_0 + \theta_1 x_i^{(1)} + \theta_2 x_i^{(2)} + \ldots + \theta_D x_i^{(D)}$. Here, $\theta_j$ is the $j$-th parameter of the (linear) model, with $\theta_0$ being the bias term/parameter and $\theta_1 \ldots \theta_D$ being the feature weights. Problem: find values for all the model parameters so that the model “best fits” the training data. The Root Mean Square Error is a common performance measure for regression problems: $\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ h(\mathbf{x}_i) - y_i \right]^2}$. Introduction 8/42
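To make the fitting problem and the RMSE measure concrete, here is a minimal sketch (my own illustration, not part of the slides) that fits a scikit-learn LinearRegression model and computes its RMSE; the synthetic data follows the same pattern as the example a few slides below.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic data: y = 4 + 3x + Gaussian noise (illustrative only).
    X = 2 * np.random.rand(100, 1)
    y = 4 + 3 * X + np.random.randn(100, 1)

    lin_reg = LinearRegression()
    lin_reg.fit(X, y)

    # RMSE = sqrt( (1/N) * sum_i [h(x_i) - y_i]^2 )
    predictions = lin_reg.predict(X)
    rmse = np.sqrt(np.mean((predictions - y) ** 2))
    print(rmse)   # roughly the standard deviation of the noise, about 1.0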

  12. Polynomial Regression. Polynomial Regression 9/42

  13-16. Polynomial Regression. What if the data is more complex? In our discussion on underfitting and overfitting the training data, we did look at polynomial models, but did not discuss how to learn them. Can we use our linear model to “fit” non-linear data, and specifically data that would have been generated by a polynomial “process”? How? Polynomial Regression 10/42

  17-20. sklearn.preprocessing.PolynomialFeatures. A surprisingly simple solution consists of generating new features that are powers of existing ones!

    from sklearn.preprocessing import PolynomialFeatures
    poly_features = PolynomialFeatures(degree=2, include_bias=False)
    X_poly = poly_features.fit_transform(X)

    from sklearn.linear_model import LinearRegression
    lin_reg = LinearRegression()
    lin_reg.fit(X_poly, y)
    print(lin_reg.intercept_, lin_reg.coef_)

Polynomial Regression 11/42

  21. Example fitting a linear model

    import numpy as np
    X = 2 * np.random.rand(100, 1)
    y = 4 + 3 * X + np.random.randn(100, 1)

    from sklearn.linear_model import LinearRegression
    lin_reg = LinearRegression()
    lin_reg.fit(X, y)
    lin_reg.intercept_, lin_reg.coef_
    # [4.07916603] [[2.90173949]]

The generating process was $y = 4 + 3x + \text{noise}$; the learned model is $\hat{y} = 4.07916603 + 2.90173949\,x$. Polynomial Regression 12/42

  22. Example fitting a polynomial model

    import numpy as np
    X = 6 * np.random.rand(100, 1) - 3
    y = 2 + 0.5 * X ** 2 + X + np.random.randn(100, 1)

    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    poly_features = PolynomialFeatures(degree=2, include_bias=False)
    X_poly = poly_features.fit_transform(X)

    lin_reg = LinearRegression()
    lin_reg.fit(X_poly, y)
    lin_reg.intercept_, lin_reg.coef_
    # [1.701144] [[1.02118676 0.55725864]]

The generating process was $y = 2.0 + 0.5x^2 + 1.0x + \text{noise}$; the learned model is $\hat{y} = 1.701144 + 0.55725864\,x^2 + 1.02118676\,x$. Polynomial Regression 13/42

  23-25. Remarks. For higher degrees, PolynomialFeatures adds all combinations of features. Given two features $a$ and $b$, PolynomialFeatures with degree 3 generates $a^2$, $a^3$, $b^2$, $b^3$, but also $ab$, $a^2 b$, and $a b^2$. Given $n$ features and degree $d$, PolynomialFeatures produces $\frac{(n+d)!}{d!\,n!}$ combinations! Polynomial Regression 14/42
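As a quick sanity check (my own sketch, not from the slides), the formula can be compared against the number of columns PolynomialFeatures actually produces; with include_bias=True the constant (degree-0) column is counted, so the total matches $(n+d)!/(d!\,n!)$ exactly.

    from math import comb
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    n, d = 2, 3                      # n features, polynomial degree d
    X = np.random.rand(5, n)         # 5 arbitrary examples with n = 2 features

    poly = PolynomialFeatures(degree=d, include_bias=True)
    X_poly = poly.fit_transform(X)

    print(X_poly.shape[1])           # 10 columns: 1, a, b, a^2, ab, b^2, a^3, a^2 b, a b^2, b^3
    print(comb(n + d, d))            # 10 as well, i.e. (n + d)! / (d! n!)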

  26. Regularization Regularization 15/42

  27-30. Bias/Variance trade-off. From [2] §4: “(. . . ) a model's generalization error can be expressed as the sum of three very different errors:” Bias: “is due to wrong assumptions”; “A high-bias model is most likely to underfit the training data”. Variance: “the model's excessive sensitivity to small variations in the training data”; a model with many parameters “is likely to have high variance and thus overfit the training data.” Irreducible error: “noisiness of the data itself”. Regularization 16/42
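To illustrate this trade-off (my own sketch, not part of the slides), one can fit polynomial models of increasing degree to the quadratic data from slide 22 and compare training and validation RMSE: a degree-1 model underfits (high bias, both errors high), while a very high-degree model fits the training set closely but tends to generalize worse (high variance).

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    # Same quadratic "process" as slide 22 (illustrative synthetic data).
    X = 6 * np.random.rand(100, 1) - 3
    y = 2 + 0.5 * X ** 2 + X + np.random.randn(100, 1)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

    for degree in (1, 2, 30):
        model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                              LinearRegression())
        model.fit(X_train, y_train)
        rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
        rmse_val = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
        # degree 1 underfits, degree 2 matches the generating process,
        # degree 30 typically shows low training error but higher validation error.
        print(degree, rmse_train, rmse_val)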
