  1. Lecture 4: Logistic Regression. Instructor: Prof. Shuai Huang, Industrial and Systems Engineering, University of Washington.

  2. Extend the linear model for classification
  • Need a mathematical transfer function to connect $\beta_0 + \sum_{j=1}^{p} \beta_j x_j$ with a binary outcome $y$.
  • How?
  • Logistic regression chooses to use $\log \frac{p(\boldsymbol{x})}{1 - p(\boldsymbol{x})} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$.
  • Why?
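The transfer function and its inverse can be sketched numerically. A minimal illustration in Python with NumPy; the names `sigmoid` and `logit` are my labels, not from the lecture:

```python
import numpy as np

def sigmoid(eta):
    """Inverse of the logit: maps the linear predictor beta0 + sum_j beta_j*x_j into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

def logit(p):
    """The log-odds log(p / (1 - p)) that the slide equates with the linear predictor."""
    return np.log(p / (1.0 - p))

# Any real-valued linear predictor yields a valid probability,
# and logit() recovers the predictor exactly.
eta = 1.5
prob = sigmoid(eta)
```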

  3. Justification for the logistic regression model
  • It works in many applications.
  • It leads to analytical tractability (in some senses) and encourages in-depth theoretical investigation.
  • It has a strong tie with the linear regression model. Therefore, methodologically there is much we can translate from linear regression to logistic regression. Conceptually, it inherits the aura of the linear regression model, and users can carry a similar degree of confidence from the linear regression model over to the logistic regression model.

  4. Parameter estimation
  • The likelihood function is
    $L(\boldsymbol{\beta}) = \prod_{n=1}^{N} p(\boldsymbol{x}_n)^{y_n} \left(1 - p(\boldsymbol{x}_n)\right)^{1 - y_n}$.
  • We use the log-likelihood to turn products into sums:
    $l(\boldsymbol{\beta}) = \sum_{n=1}^{N} y_n \log p(\boldsymbol{x}_n) + (1 - y_n) \log\left(1 - p(\boldsymbol{x}_n)\right)$.
    This can be further transformed into
    $l(\boldsymbol{\beta}) = \sum_{n=1}^{N} y_n \left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{nj}\right) - \sum_{n=1}^{N} \log\left(1 + e^{\beta_0 + \sum_{j=1}^{p} \beta_j x_{nj}}\right)$,
    since
    $\sum_{n=1}^{N} y_n \log p(\boldsymbol{x}_n) + (1 - y_n) \log\left(1 - p(\boldsymbol{x}_n)\right) = \sum_{n=1}^{N} y_n \log \frac{p(\boldsymbol{x}_n)}{1 - p(\boldsymbol{x}_n)} + \sum_{n=1}^{N} \log\left(1 - p(\boldsymbol{x}_n)\right) = \sum_{n=1}^{N} y_n \left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{nj}\right) - \sum_{n=1}^{N} \log\left(1 + e^{\beta_0 + \sum_{j=1}^{p} \beta_j x_{nj}}\right)$.
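The equivalence of the two log-likelihood forms above can be checked numerically. A sketch with synthetic data (the data and function names are mine); the matrix `X` carries a leading column of ones for the intercept:

```python
import numpy as np

def log_lik_direct(beta, X, y):
    """sum_n y_n log p(x_n) + (1 - y_n) log(1 - p(x_n))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def log_lik_simplified(beta, X, y):
    """The transformed form: sum_n y_n * eta_n - log(1 + exp(eta_n))."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

# Synthetic inputs, purely to exercise the identity.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = rng.integers(0, 2, size=20).astype(float)
beta = np.array([0.3, -0.7])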

  5. Application of the Newton-Raphson algorithm
  • The Newton-Raphson algorithm is an iterative algorithm that updates the current solution using the following formula:
    $\boldsymbol{\beta}^{new} = \boldsymbol{\beta}^{old} - \left(\frac{\partial^2 l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^T}\right)^{-1} \frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$.
  • We can show that
    $\frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \sum_{n=1}^{N} \boldsymbol{x}_n \left(y_n - p(\boldsymbol{x}_n)\right)$,
    $\frac{\partial^2 l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^T} = -\sum_{n=1}^{N} \boldsymbol{x}_n \boldsymbol{x}_n^T\, p(\boldsymbol{x}_n)\left(1 - p(\boldsymbol{x}_n)\right)$.
  • A certain structure is revealed if we rewrite these in matrix form:
    $\frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \mathbf{X}^T (\boldsymbol{y} - \boldsymbol{p})$,
    $\frac{\partial^2 l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^T} = -\mathbf{X}^T \mathbf{W} \mathbf{X}$,
    where $\mathbf{X}$ is the $N \times (p+1)$ input matrix, $\boldsymbol{y}$ is the $N \times 1$ column vector of $y_n$, $\boldsymbol{p}$ is the $N \times 1$ column vector of $p(\boldsymbol{x}_n)$, and $\mathbf{W}$ is an $N \times N$ diagonal matrix of weights with the $n$th diagonal element $p(\boldsymbol{x}_n)\left(1 - p(\boldsymbol{x}_n)\right)$.
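The matrix forms of the gradient and Hessian translate directly into code. A sketch assuming synthetic data (names are mine); note the Hessian comes out symmetric and negative definite, which is why Newton-Raphson behaves well here:

```python
import numpy as np

def score_and_hessian(beta, X, y):
    """Gradient X^T (y - p) and Hessian -X^T W X of the log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)                  # diagonal of W
    grad = X.T @ (y - p)
    hess = -(X * w[:, None]).T @ X     # -X^T W X without forming the N x N matrix W
    return grad, hess

# Synthetic inputs, purely to exercise the formulas.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = rng.integers(0, 2, size=30).astype(float)
grad, hess = score_and_hessian(np.array([0.2, 0.5]), X, y)
```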

  6. The updating rule
  Plugging these into the updating formula of the Newton-Raphson algorithm,
  $\boldsymbol{\beta}^{new} = \boldsymbol{\beta}^{old} - \left(\frac{\partial^2 l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^T}\right)^{-1} \frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$,
  we can derive that
  $\boldsymbol{\beta}^{new} = \boldsymbol{\beta}^{old} + \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T (\boldsymbol{y} - \boldsymbol{p})$
  $= \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{W} \left(\mathbf{X}\boldsymbol{\beta}^{old} + \mathbf{W}^{-1}(\boldsymbol{y} - \boldsymbol{p})\right)$
  $= \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{W} \boldsymbol{z}$,
  where $\boldsymbol{z} = \mathbf{X}\boldsymbol{\beta}^{old} + \mathbf{W}^{-1}(\boldsymbol{y} - \boldsymbol{p})$.
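The derivation claims the direct Newton step and the weighted least-squares form give the same update, and that is easy to verify in code. A sketch with synthetic data (function names and data are mine):

```python
import numpy as np

def newton_update(beta, X, y):
    """Direct step: beta + (X^T W X)^{-1} X^T (y - p)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    return beta + np.linalg.solve((X * w[:, None]).T @ X, X.T @ (y - p))

def irls_update(beta, X, y):
    """Equivalent form: (X^T W X)^{-1} X^T W z with z = X beta + W^{-1}(y - p)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    z = X @ beta + (y - p) / w
    return np.linalg.solve((X * w[:, None]).T @ X, X.T @ (w * z))

# Synthetic inputs, purely to check the algebraic identity.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = rng.integers(0, 2, size=40).astype(float)
beta0 = np.zeros(2)
```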

  7. Another look at the updating rule
  • This resembles the generalized least squares (GLS) estimator of a regression model, where each data point $(\boldsymbol{x}_n, y_n)$ is associated with a weight $w_n$ to reduce the influence of potential outliers in fitting the regression model:
    $\boldsymbol{\beta}^{new} \leftarrow \arg\min_{\boldsymbol{\beta}} \left(\boldsymbol{z} - \mathbf{X}\boldsymbol{\beta}\right)^T \mathbf{W} \left(\boldsymbol{z} - \mathbf{X}\boldsymbol{\beta}\right)$.
  • For this reason, this algorithm is also called the Iteratively Reweighted Least Squares, or IRLS, algorithm. $\boldsymbol{z}$ is referred to as the adjusted response.
  • Why does the weighting make sense? Or, what are the implications of this?
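One way to see the weighted least-squares connection concretely: scaling each row of $\mathbf{X}$ and $\boldsymbol{z}$ by $\sqrt{w_n}$ turns the weighted criterion into an ordinary least-squares problem with the same minimizer. A sketch with stand-in values for $\boldsymbol{z}$ and the weights (both synthetic, not from a real fit):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
z = rng.normal(size=50)               # stand-in adjusted response
w = rng.uniform(0.05, 0.25, size=50)  # stand-in weights; p(1-p) always lies in (0, 0.25]

# Closed form (X^T W X)^{-1} X^T W z
beta_closed = np.linalg.solve((X * w[:, None]).T @ X, X.T @ (w * z))

# Same minimizer via OLS on sqrt(w)-scaled rows
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
```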

  8. A summary of the IRLS algorithm
  Putting all these together, the complete flow of IRLS is shown below:
  1. Initialize $\boldsymbol{\beta}$.
  2. Compute $\boldsymbol{p}$ by its definition: $p(\boldsymbol{x}_n) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{nj}\right)}}$ for $n = 1, 2, \ldots, N$.
  3. Compute the diagonal matrix $\mathbf{W}$, with the $n$th diagonal element $p(\boldsymbol{x}_n)\left(1 - p(\boldsymbol{x}_n)\right)$ for $n = 1, 2, \ldots, N$.
  4. Set $\boldsymbol{z} = \mathbf{X}\boldsymbol{\beta} + \mathbf{W}^{-1}(\boldsymbol{y} - \boldsymbol{p})$.
  5. Set $\boldsymbol{\beta} = \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{W} \boldsymbol{z}$.
  6. If the stopping criterion is met, stop; otherwise go back to step 2.
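The full flow can be sketched as a short function. This is a minimal NumPy implementation of the steps above under assumptions I am adding myself (zero initialization, a max-change stopping criterion, and synthetic test data); it does not guard against perfect separation, where IRLS diverges:

```python
import numpy as np

def irls(X, y, tol=1e-8, max_iter=50):
    """Fit logistic regression by IRLS. X must include a leading column of 1s."""
    beta = np.zeros(X.shape[1])                    # step 1: initialize beta
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # step 2: compute p(x_n)
        w = p * (1.0 - p)                          # step 3: diagonal of W
        z = X @ beta + (y - p) / w                 # step 4: adjusted response
        beta_new = np.linalg.solve((X * w[:, None]).T @ X, X.T @ (w * z))  # step 5
        if np.max(np.abs(beta_new - beta)) < tol:  # step 6: stopping criterion
            return beta_new
        beta = beta_new
    return beta

# Synthetic data from a known model, just to exercise the algorithm.
rng = np.random.default_rng(4)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([0.5, -1.0]))))
y = (rng.uniform(size=N) < p_true).astype(float)
beta_hat = irls(X, y)
```

At convergence the score equation $\mathbf{X}^T(\boldsymbol{y} - \boldsymbol{p}) = \mathbf{0}$ holds, which is a convenient correctness check.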

  9. R lab
  • Download the R markdown code from the course website.
  • Conduct the experiments.
  • Interpret the results.
  • Repeat the analysis on other datasets.
