Logistic Regression
CS60010: Deep Learning
Abir Das, IIT Kharagpur
Jan 22, 23 and 24, 2020
Some Logistics Related Information
§ This Friday (Jan 24), no paper will be presented. It will be a regular lecture.
§ The first surprise quiz is today!!
Surprise Quiz 1
§ The duration of the test is 10 minutes.
§ Question 1: Find the eigenvalues of the following matrix $A$. Clearly mention if you are making any assumption. [2 Marks]
$$A = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 0 & 1 \end{pmatrix}$$
§ Question 2: Consider the half-space given by the set of points $S = \{x \in \mathbb{R}^d \mid a^T x \le b\}$. Prove that the half-space is convex. [3 Marks]
Surprise Quiz 1: Answer Keys
§ Question 1: Find the eigenvalues of the following matrix $A$. Clearly mention if you are making any assumption.
$$A = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 0 & 1 \end{pmatrix}$$
Use the property of eigenvalues of a triangular matrix: the eigenvalues of a triangular matrix are its diagonal entries, so the eigenvalues are 2, 3 and 1.
§ Question 2: Consider the half-space given by the set of points $S = \{x \in \mathbb{R}^d \mid a^T x \le b\}$. Prove that the half-space is convex.
If $x, y$ belong to $S$, then $a^T x \le b$ and $a^T y \le b$. Now, for $0 \le \theta \le 1$,
$$a^T\{\theta x + (1-\theta)y\} = \theta\, a^T x + (1-\theta)\, a^T y \le \theta b + (1-\theta) b = b,$$
so the convex combination $\theta x + (1-\theta)y$ also belongs to $S$.
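As a quick numerical sanity check (not part of the original slides), the sketch below verifies the eigenvalues with NumPy, assuming the quiz matrix is read row-wise as the lower-triangular matrix shown above:

```python
import numpy as np

# Quiz matrix, read row-wise as lower triangular (assumption).
A = np.array([[ 2, 0, 0],
              [ 1, 3, 0],
              [-1, 0, 1]])

# For a triangular matrix the eigenvalues are the diagonal entries,
# so we expect 2, 3 and 1 (in some order).
print(np.linalg.eigvals(A))
```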
Agenda
§ Understand regression and classification with linear models.
§ Brush up on the concepts of maximum likelihood and its use in understanding linear regression.
§ Use the logistic function for binary classification and estimate logistic regression parameters.
Resources
§ The Elements of Statistical Learning by T. Hastie, R. Tibshirani and J. Friedman. [Link] [Chapters 3 and 4]
§ Artificial Intelligence: A Modern Approach by S. Russell and P. Norvig. [Link] [Chapter 18]
Linear Regression
§ In a regression problem we want to find the relation between some input variables $x$ and output variables $y$, where $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$.
§ Inputs are also often referred to as covariates, predictors and features, while outputs are known as variates, targets and labels.
§ Examples of such input-output pairs can be
◮ { Outside temperature, People inside classroom, Target room temperature | Energy requirement }
◮ { Size, Number of Bedrooms, Number of Floors, Age of the Home | Price }
§ We have a set of N observations of $y$ as $\{y_1, y_2, \cdots, y_N\}$ and the corresponding input variables $\{x_1, x_2, \cdots, x_N\}$.
[Figure: scatter plot of the training pairs $(x^{(i)}, y^{(i)})$.]
Linear Regression
§ The input and output variables are assumed to be related via a relation, known as the hypothesis: $\hat{y} = h_\theta(x)$, where $\theta$ is the parameter vector.
§ The goal is to predict the output variable $\hat{y}^* = f(x^*)$ for an arbitrary value of the input variable $x^*$.
§ Let us start with scalar inputs ($x$) and scalar outputs ($y$).
Univariate Linear Regression
§ Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$.
§ Cost Function: Sum of squared errors.
$$J(\theta_0, \theta_1) = \frac{1}{2N} \sum_{i=1}^{N} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
§ Optimization objective: find model parameters $(\theta_0, \theta_1)$ that will minimize the sum of squared errors.
§ Gradient of the cost function w.r.t. $\theta_0$:
$$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{N} \sum_{i=1}^{N} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$
§ Gradient of the cost function w.r.t. $\theta_1$:
$$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{N} \sum_{i=1}^{N} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
§ Apply your favorite gradient-based optimization algorithm.
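A minimal NumPy sketch of these update rules follows. The step size alpha, the iteration count, and the toy data are illustrative choices, not from the slides:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, iters=5000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent,
    using the two gradients given on the slide."""
    theta0, theta1 = 0.0, 0.0
    N = len(x)
    for _ in range(iters):
        err = theta0 + theta1 * x - y          # h_theta(x^(i)) - y^(i)
        theta0 -= alpha * err.sum() / N        # dJ/dtheta0
        theta1 -= alpha * (err * x).sum() / N  # dJ/dtheta1
    return theta0, theta1

# Toy data around y = 1 + 2x (hypothetical example).
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=100)
print(gradient_descent(x, y))   # roughly (1.0, 2.0)
```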
Univariate Linear Regression
§ These being linear equations in $\theta$, they also have a unique closed-form solution.
$$\theta_1 = \frac{N \sum\limits_{i=1}^{N} x^{(i)} y^{(i)} - \left( \sum\limits_{i=1}^{N} x^{(i)} \right) \left( \sum\limits_{i=1}^{N} y^{(i)} \right)}{N \sum\limits_{i=1}^{N} \left( x^{(i)} \right)^2 - \left( \sum\limits_{i=1}^{N} x^{(i)} \right)^2}$$
$$\theta_0 = \frac{1}{N} \left( \sum_{i=1}^{N} y^{(i)} - \theta_1 \sum_{i=1}^{N} x^{(i)} \right)$$
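The two closed-form expressions translate directly to NumPy; this sketch simply transcribes the formulas above (function name is mine):

```python
import numpy as np

def closed_form_univariate(x, y):
    """Closed-form least-squares fit from the two formulas above."""
    N = len(x)
    theta1 = (N * (x * y).sum() - x.sum() * y.sum()) \
             / (N * (x ** 2).sum() - x.sum() ** 2)
    theta0 = (y.sum() - theta1 * x.sum()) / N
    return theta0, theta1
```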
Multivariate Linear Regression
§ We can easily extend to multivariate linear regression problems, where $x \in \mathbb{R}^d$.
§ Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_d x_d$. For convenience of notation, define $x_0 = 1$.
§ Thus $h$ is simply the dot product of the parameters and the input vector: $h_\theta(x) = \theta^T x$.
§ Cost Function: Sum of squared errors.
$$J(\theta) = J(\theta_0, \theta_1, \cdots, \theta_d) = \frac{1}{2N} \sum_{i=1}^{N} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 \qquad (1)$$
§ We will use the following to write the cost function in a compact matrix-vector notation: $h_\theta(x) = \theta^T x = x^T \theta$.
Multivariate Linear Regression
$$\begin{bmatrix} \hat{y}^{(1)} \\ \hat{y}^{(2)} \\ \vdots \\ \hat{y}^{(N)} \end{bmatrix} = \begin{bmatrix} h_\theta(x^{(1)}) \\ h_\theta(x^{(2)}) \\ \vdots \\ h_\theta(x^{(N)}) \end{bmatrix} = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & x_2^{(1)} & \cdots & x_d^{(1)} \\ x_0^{(2)} & x_1^{(2)} & x_2^{(2)} & \cdots & x_d^{(2)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_0^{(N)} & x_1^{(N)} & x_2^{(N)} & \cdots & x_d^{(N)} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_d \end{bmatrix} \qquad (2)$$
$$\hat{y} = X\theta$$
Here, $X$ is an $N \times (d+1)$ matrix with each row an input vector, and $\hat{y}$ is an $N$-length vector of the outputs in the training set.
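Building $X$ and evaluating the hypothesis for all examples then reduces to a single matrix product; a small sketch (function and variable names are mine, not the slides'):

```python
import numpy as np

def design_matrix(X_raw):
    """Prepend the x_0 = 1 column so that X is N x (d+1)."""
    N = X_raw.shape[0]
    return np.hstack([np.ones((N, 1)), X_raw])

# One matrix product computes h_theta(x^(i)) for every training example:
# y_hat = design_matrix(X_raw) @ theta
```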
Multivariate Linear Regression
§ Eqn. (1) gives,
$$\begin{aligned} J(\theta) &= \frac{1}{2N} \sum_{i=1}^{N} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2N} \sum_{i=1}^{N} \left( \hat{y}^{(i)} - y^{(i)} \right)^2 \qquad (3) \\ &= \frac{1}{2N} \| \hat{y} - y \|_2^2 = \frac{1}{2N} \left( \hat{y} - y \right)^T \left( \hat{y} - y \right) \\ &= \frac{1}{2N} \left( X\theta - y \right)^T \left( X\theta - y \right) = \frac{1}{2N} \left( \theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y \right) \\ &= \frac{1}{2N} \left( \theta^T X^T X \theta - \left( X^T y \right)^T \theta - \left( X^T y \right)^T \theta + y^T y \right) \\ &= \frac{1}{2N} \left( \theta^T X^T X \theta - 2 \left( X^T y \right)^T \theta + y^T y \right) \end{aligned}$$
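A quick check (illustrative, with random data) that the final matrix form agrees with the per-example sum in Eqn. (1):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])  # x_0 = 1 column
y = rng.normal(size=N)
theta = rng.normal(size=d + 1)

# Per-example sum of squared errors, Eqn. (1).
J_sum = sum((theta @ X[i] - y[i]) ** 2 for i in range(N)) / (2 * N)

# Compact matrix form from the last line of the derivation.
J_mat = (theta @ X.T @ X @ theta - 2 * (X.T @ y) @ theta + y @ y) / (2 * N)

assert np.isclose(J_sum, J_mat)
```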
Multivariate Linear Regression
§ Equating the gradient of the cost function to 0,
$$\nabla_\theta J(\theta) = \frac{1}{2N} \left( 2 X^T X \theta - 2 X^T y \right) = 0$$
$$X^T X \theta - X^T y = 0$$
$$\theta = \left( X^T X \right)^{-1} X^T y \qquad (4)$$
§ This gives a closed-form solution, but another option is to use an iterative solution (just like the univariate case).
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{N} \sum_{i=1}^{N} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
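Eq. (4) in NumPy: solving the linear system $X^T X \theta = X^T y$ directly is numerically preferable to forming the explicit inverse. A minimal sketch, with the function name being mine:

```python
import numpy as np

def normal_equation(X, y):
    """Least-squares parameters via Eq. (4): X^T X theta = X^T y.
    np.linalg.solve avoids explicitly inverting X^T X."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq(X, y, rcond=None) is a more robust alternative
# when X^T X is ill-conditioned.
```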
Multivariate Linear Regression
§ Iterative gradient descent needs to perform many iterations, and the step-size parameter must be chosen judiciously. But it works well even when the number of features ($d$) is large.
§ For the least-squares solution, there is no need to choose a step-size parameter or to iterate. But evaluating $\left( X^T X \right)^{-1}$ can be slow if $d$ is large.