Lecture 4: Logistic Regression
Instructor: Prof. Shuai Huang
Industrial and Systems Engineering
University of Washington
Extend linear model for classification
• We need a mathematical transfer function to connect $\beta_0 + \sum_{i=1}^{p} \beta_i x_i$ with a binary outcome $y$
• How?
• Logistic regression chooses to use $\log \frac{p(\boldsymbol{x})}{1 - p(\boldsymbol{x})} = \beta_0 + \sum_{i=1}^{p} \beta_i x_i$.
• Why?
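A minimal R sketch of this transfer function (the helper names logistic and logit are illustrative assumptions, not part of the lecture): the logistic function maps the linear predictor $\beta_0 + \sum_{i=1}^{p} \beta_i x_i$ to a probability in (0, 1), and the logit (the log-odds above) is its inverse.

  logistic <- function(eta) 1 / (1 + exp(-eta))   # p(x) = e^eta / (1 + e^eta)
  logit    <- function(p)   log(p / (1 - p))      # log-odds: beta0 + sum_i beta_i * x_i
  eta <- seq(-4, 4, by = 1)                       # hypothetical linear predictor values
  p   <- logistic(eta)                            # probabilities strictly between 0 and 1
  all.equal(logit(p), eta)                        # TRUE: the logit undoes the logistic map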
Justification for the logistic regression model
• It works in many applications
• It leads to analytical tractability (in some senses) and encourages in-depth theoretical investigation
• It has a strong tie with the linear regression model. Therefore, methodologically there is much we can translate from linear regression to logistic regression. Conceptually, it inherits the aura of the linear regression model, and users can carry over a similar degree of confidence from the linear regression model to the logistic regression model
Parameter estimation
• The likelihood function is: $L(\boldsymbol{\beta}) = \prod_{n=1}^{N} p(\boldsymbol{x}_n)^{y_n} \left(1 - p(\boldsymbol{x}_n)\right)^{1 - y_n}$.
• We use the log-likelihood to turn products into sums:
$l(\boldsymbol{\beta}) = \sum_{n=1}^{N} \left[ y_n \log p(\boldsymbol{x}_n) + (1 - y_n) \log\left(1 - p(\boldsymbol{x}_n)\right) \right]$.
• This can be further transformed:
$l(\boldsymbol{\beta}) = \sum_{n=1}^{N} \left[ y_n \log p(\boldsymbol{x}_n) + (1 - y_n) \log\left(1 - p(\boldsymbol{x}_n)\right) \right]$
$= \sum_{n=1}^{N} \left[ y_n \log \frac{p(\boldsymbol{x}_n)}{1 - p(\boldsymbol{x}_n)} + \log\left(1 - p(\boldsymbol{x}_n)\right) \right]$
$= \sum_{n=1}^{N} \left[ y_n \left( \beta_0 + \sum_{i=1}^{p} \beta_i x_{ni} \right) - \log\left( 1 + e^{\beta_0 + \sum_{i=1}^{p} \beta_i x_{ni}} \right) \right]$.
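As a quick illustration, here is a minimal R sketch of the final simplified form of the log-likelihood (the helper name loglik and the convention that X carries a leading column of ones are assumptions, not from the lecture):

  loglik <- function(beta, X, y) {
    eta <- as.vector(X %*% beta)        # linear predictor; X includes a column of 1s for beta0
    sum(y * eta - log(1 + exp(eta)))    # sum_n [ y_n * eta_n - log(1 + e^{eta_n}) ]
  }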
Application of the Newton-Raphson algorithm
• The Newton-Raphson algorithm is an iterative algorithm that seeks updates of the current solution using the following formula:
$\boldsymbol{\beta}^{new} = \boldsymbol{\beta}^{old} - \left( \frac{\partial^2 l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^T} \right)^{-1} \frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$.
• We can show that
$\frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \sum_{n=1}^{N} \boldsymbol{x}_n \left( y_n - p(\boldsymbol{x}_n) \right)$,
$\frac{\partial^2 l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^T} = - \sum_{n=1}^{N} \boldsymbol{x}_n \boldsymbol{x}_n^T \, p(\boldsymbol{x}_n)\left(1 - p(\boldsymbol{x}_n)\right)$.
• A certain structure can then be revealed if we rewrite these in matrix form:
$\frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \boldsymbol{X}^T (\boldsymbol{y} - \boldsymbol{p})$,
$\frac{\partial^2 l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^T} = - \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X}$,
where $\boldsymbol{X}$ is the $N \times (p+1)$ input matrix, $\boldsymbol{y}$ is the $N \times 1$ column vector of $y_n$, $\boldsymbol{p}$ is the $N \times 1$ column vector of $p(\boldsymbol{x}_n)$, and $\boldsymbol{W}$ is an $N \times N$ diagonal matrix of weights with the $n$th diagonal element equal to $p(\boldsymbol{x}_n)\left(1 - p(\boldsymbol{x}_n)\right)$.
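A minimal R sketch of these two quantities in matrix form (the function name is an assumption; $\boldsymbol{X}$ and $\boldsymbol{y}$ follow the definitions above):

  score_and_hessian <- function(beta, X, y) {
    p <- 1 / (1 + exp(-as.vector(X %*% beta)))   # fitted probabilities p(x_n)
    W <- diag(p * (1 - p))                       # N x N diagonal weight matrix
    list(score   = t(X) %*% (y - p),             # dl/dbeta           = X^T (y - p)
         hessian = -t(X) %*% W %*% X)            # d2l/(dbeta dbeta^T) = - X^T W X
  }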
The updating rule
Plugging these into the updating formula of the Newton-Raphson algorithm,
$\boldsymbol{\beta}^{new} = \boldsymbol{\beta}^{old} - \left( \frac{\partial^2 l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^T} \right)^{-1} \frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$,
we can derive that
$\boldsymbol{\beta}^{new} = \boldsymbol{\beta}^{old} + \left( \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X} \right)^{-1} \boldsymbol{X}^T (\boldsymbol{y} - \boldsymbol{p})$
$= \left( \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X} \right)^{-1} \boldsymbol{X}^T \boldsymbol{W} \left( \boldsymbol{X} \boldsymbol{\beta}^{old} + \boldsymbol{W}^{-1} (\boldsymbol{y} - \boldsymbol{p}) \right)$
$= \left( \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X} \right)^{-1} \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{z}$,
where $\boldsymbol{z} = \boldsymbol{X} \boldsymbol{\beta}^{old} + \boldsymbol{W}^{-1} (\boldsymbol{y} - \boldsymbol{p})$.
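A minimal R sketch of one such update (the function name is an assumption; a full iterative version appears with the IRLS summary later):

  irls_step <- function(beta_old, X, y) {
    p <- 1 / (1 + exp(-as.vector(X %*% beta_old)))     # current fitted probabilities
    W <- diag(p * (1 - p))                             # weight matrix
    z <- as.vector(X %*% beta_old) + solve(W, y - p)   # adjusted response z
    solve(t(X) %*% W %*% X, t(X) %*% W %*% z)          # beta_new = (X^T W X)^{-1} X^T W z
  }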
Another look at the updating rule
• This resembles the generalized least squares (GLS) estimator of a regression model, where each data point $(\boldsymbol{x}_n, y_n)$ is associated with a weight $w_n$ to reduce the influence of potential outliers in fitting the regression model:
$\boldsymbol{\beta}^{new} \leftarrow \arg\min_{\boldsymbol{\beta}} \; (\boldsymbol{z} - \boldsymbol{X}\boldsymbol{\beta})^T \boldsymbol{W} (\boldsymbol{z} - \boldsymbol{X}\boldsymbol{\beta})$.
• For this reason, this algorithm is also called the Iteratively Reweighted Least Squares (IRLS) algorithm. $\boldsymbol{z}$ is referred to as the adjusted response.
• Why does the weighting make sense? Or, what are the implications of this?
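To make the weighted-least-squares reading concrete, here is a sketch (an assumption, not course code) that obtains the same update by regressing the adjusted response $\boldsymbol{z}$ on $\boldsymbol{X}$ with weights $w_n = p(\boldsymbol{x}_n)\left(1 - p(\boldsymbol{x}_n)\right)$, using R's lm.wfit():

  irls_step_wls <- function(beta_old, X, y) {
    p <- 1 / (1 + exp(-as.vector(X %*% beta_old)))
    w <- p * (1 - p)                                   # per-observation weights
    z <- as.vector(X %*% beta_old) + (y - p) / w       # adjusted response
    lm.wfit(x = X, y = z, w = w)$coefficients          # minimizes (z - X beta)^T W (z - X beta)
  }

This is numerically the same update as the Newton-Raphson step above; the weighted regression view simply makes the role of the weights explicit.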
A summary of the IRLS algorithm
Putting all these together, a complete flow of the IRLS algorithm is shown below (see the R sketch after this list):
• Step 1: Initialize $\boldsymbol{\beta}$.
• Step 2: Compute $\boldsymbol{p}$ by its definition: $p(\boldsymbol{x}_n) = \frac{1}{1 + e^{-\left( \beta_0 + \sum_{i=1}^{p} \beta_i x_{ni} \right)}}$ for $n = 1, 2, \ldots, N$.
• Step 3: Compute the diagonal matrix $\boldsymbol{W}$, with the $n$th diagonal element equal to $p(\boldsymbol{x}_n)\left(1 - p(\boldsymbol{x}_n)\right)$ for $n = 1, 2, \ldots, N$.
• Step 4: Set $\boldsymbol{z} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{W}^{-1}(\boldsymbol{y} - \boldsymbol{p})$.
• Step 5: Set $\boldsymbol{\beta} = \left( \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X} \right)^{-1} \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{z}$.
• Step 6: If the stopping criterion is met, stop; otherwise go back to Step 2.
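A minimal R sketch of the complete loop (the function name, starting value, tolerance, and iteration cap are assumptions, not part of the lecture):

  irls <- function(X, y, max_iter = 50, tol = 1e-8) {
    beta <- rep(0, ncol(X))                                   # step 1: initialize beta
    for (iter in seq_len(max_iter)) {
      p <- 1 / (1 + exp(-as.vector(X %*% beta)))              # step 2: fitted probabilities
      w <- p * (1 - p)                                        # step 3: diagonal of W
      z <- as.vector(X %*% beta) + (y - p) / w                # step 4: adjusted response
      beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * z))   # step 5: weighted LS solve
      if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }  # step 6: stop?
      beta <- beta_new
    }
    as.vector(beta)
  }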
R lab
• Download the markdown code from the course website
• Conduct the experiments
• Interpret the results
• Repeat the analysis on other datasets
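A usage sketch for the lab (the dataset and model choice here, mtcars with am regressed on hp and wt, are illustrative assumptions, not the course data): fit with the irls() sketch above and compare against R's built-in glm():

  X <- cbind(1, mtcars$hp, mtcars$wt)                           # intercept plus two predictors
  y <- mtcars$am                                                # binary outcome (0/1)
  beta_irls <- irls(X, y)
  beta_glm  <- coef(glm(am ~ hp + wt, data = mtcars, family = binomial))
  round(cbind(IRLS = beta_irls, glm = as.vector(beta_glm)), 4)  # the two should agree closely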