Machine Learning for Computational Linguistics
Classification
Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
May 3, 2016
Practical matters | Classification | Logistic Regression | More than two classes
Ç. Çöltekin, SfS / University of Tübingen, May 3, 2016

Practical issues
▶ Homework 1: try to program it without help from specialized libraries (like NLTK)
▶ Time to think about projects. A short proposal towards the end of May.

The problem
▶ The response (outcome) is a label. In the example: positive + or negative −
▶ Given the features ($x_1$ and $x_2$), we want to predict the label of an unknown instance ?
▶ Note: regression is not a good idea here
[Figure: scatter plot of + and − instances and one unknown instance ? in the ($x_1$, $x_2$) plane]

The problem (with a single predictor)
[Figure: the same instances plotted against a single predictor $x_1$; the + instances lie at $y = 1$ and the − instances at $y = 0$]

A quick survey of some solutions
Decision trees
[Figure: the ($x_1$, $x_2$) scatter plot partitioned by the thresholds $a_1$ and $a_2$, next to a decision tree with the tests $x_2 < a_2$ and $x_1 < a_1$, yes/no branches, and leaves labeled + and −]

A quick survey of some solutions
Instance/memory based methods
▶ No training: just memorize the instances
▶ During test time, decide based on the k nearest neighbors
▶ Like decision trees, kNN is non-parametric
▶ It can also be used for regression
[Figure: scatter plot of + and − instances with an unknown instance ? in the ($x_1$, $x_2$) plane]
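
The kNN decision rule can be sketched in a few lines of Python. This is only an illustration: the function name `knn_predict`, the Euclidean distance, the majority vote, and the toy training points are all assumptions made for the sketch, not part of the slides.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of ((x1, x2), label) pairs. Euclidean distance is an
    assumption of this sketch; other distance measures work the same way.
    """
    # "Training" is just keeping the data; all the work happens at test time.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data, invented for illustration only.
train = [((1.0, 1.0), "-"), ((1.5, 2.0), "-"), ((2.0, 1.0), "-"),
         ((4.0, 4.0), "+"), ((4.5, 5.0), "+"), ((5.0, 4.0), "+")]

print(knn_predict(train, (4.2, 4.1)))
```

With two classes, an odd k avoids tied votes.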

A quick survey of some solutions
(Linear) discriminant functions
▶ Find a discriminant function ($f$) that separates the training instances best (for a definition of ‘best’)
▶ Use the discriminant to predict the label of unknown instances
$$\hat{y} = \begin{cases} + & f(x) > 0 \\ - & f(x) < 0 \end{cases}$$
[Figure: scatter plot of + and − instances with an unknown instance ? and a separating line in the ($x_1$, $x_2$) plane]
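
As a small illustration of the prediction rule, here is a sketch with a linear discriminant $f(x) = w_1 x_1 + w_2 x_2 + b$. The weights and bias below are hypothetical values chosen by hand, not the result of any training procedure; how to find the 'best' $f$ is exactly what the different methods disagree on.

```python
def discriminant(x, w=(1.0, 1.0), b=-5.0):
    """Linear discriminant f(x) = w1*x1 + w2*x2 + b (hand-picked, hypothetical weights)."""
    return w[0] * x[0] + w[1] * x[1] + b

def predict(x):
    """Predict '+' if f(x) > 0, otherwise '-'.

    The slide leaves f(x) == 0 undefined; this sketch arbitrarily assigns '-'.
    """
    return "+" if discriminant(x) > 0 else "-"

print(predict((4.0, 4.0)), predict((1.0, 1.0)))
```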

A quick survey of some solutions
Probability-based solutions
▶ Estimate distributions of $p(x \mid y = +)$ and $p(x \mid y = -)$ from the training data
▶ Assign the new items to the class $c$ with the highest $p(x \mid y = c)$
[Figure: scatter plot of + and − instances with an unknown instance ? in the ($x_1$, $x_2$) plane]
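
One possible way to make this concrete is sketched below. The slides do not say which distribution to estimate; the sketch assumes, purely for illustration, a Gaussian per class with independent features, and it follows the slide in comparing $p(x \mid y = c)$ only, without class priors. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussians(train):
    """Estimate a per-feature mean and variance for each class (assumed model)."""
    by_class = defaultdict(list)
    for x, label in train:
        by_class[label].append(x)
    params = {}
    for label, xs in by_class.items():
        n, dims = len(xs), len(xs[0])
        means = [sum(x[d] for x in xs) / n for d in range(dims)]
        variances = [sum((x[d] - means[d]) ** 2 for x in xs) / n
                     for d in range(dims)]
        params[label] = (means, variances)
    return params

def log_density(x, means, variances):
    """log p(x | y = c) under independent Gaussians (log avoids underflow)."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, variances))

def classify(x, params):
    """Assign x to the class c with the highest p(x | y = c)."""
    return max(params, key=lambda c: log_density(x, *params[c]))

# Hypothetical toy data, invented for illustration only.
train = [((1.0, 1.0), "-"), ((1.5, 2.0), "-"), ((2.0, 1.0), "-"),
         ((4.0, 4.0), "+"), ((4.5, 5.0), "+"), ((5.0, 4.0), "+")]
params = fit_gaussians(train)
print(classify((4.2, 4.1), params))
```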

A quick survey of some solutions
Artificial neural networks
[Figure: scatter plot of + and − instances with an unknown instance ? in the ($x_1$, $x_2$) plane, next to a network diagram with inputs $x_1$ and $x_2$ and output $y$]

Logistic regression
▶ Logistic regression is a classification method
▶ In logistic regression, we fit a model that predicts $P(y \mid x)$
▶ Alternatively, logistic regression is an extension of linear regression. It is a member of the family of models called generalized linear models

A simple example
We would like to guess whether a child would develop dyslexia or not based on a test applied to pre-verbal children. Here is a simplified problem:
▶ We test children when they are less than 2 years of age.
▶ We want to predict the diagnosis from the test score
▶ The data looks like

    Test score   Dyslexia
    82           0
    22           1
    62           1
    ...          ...

* The research question is from a real study by Ben Maasen and his colleagues. Data is fake as usual.

Example: fitting ordinary least squares regression
[Figure: P(dyslexia|score) plotted against test score (0 to 100) with the fitted least squares line; the vertical axis runs from −1 to 2]
Problems:
▶ The probability values are not bounded between 0 and 1
▶ Residuals will be large for correct predictions
▶ Residuals are not distributed normally

Example: transforming the output variable
Instead of predicting the probability $p$, we predict $\operatorname{logit}(p)$:
$$\hat{y} = \operatorname{logit}(p) = \log\frac{p}{1-p} = w_0 + w_1 x$$
▶ $\frac{p}{1-p}$ (odds) ranges between $0$ and $\infty$
▶ $\log\frac{p}{1-p}$ (log odds) ranges between $-\infty$ and $\infty$
▶ we can estimate $\operatorname{logit}(p)$ with regression, and convert it to a probability using the inverse of logit
$$\hat{p} = \frac{e^{w_0 + w_1 x}}{1 + e^{w_0 + w_1 x}} = \frac{1}{1 + e^{-w_0 - w_1 x}}$$
which is called the logistic function (or sometimes the sigmoid function, with some ambiguity).
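
The two transformations are easy to check numerically. The following is a minimal sketch using only the Python standard library; the function names `logit` and `logistic` are chosen here for illustration.

```python
import math

def logit(p):
    """Log odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def logistic(z):
    """Inverse of logit: maps any real number z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# The two functions undo each other (up to floating point error).
for p in (0.1, 0.5, 0.8):
    print(p, logit(p), logistic(logit(p)))
```

A probability of 0.5 corresponds to odds of 1 and therefore to log odds of 0.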

Logit function
$$\operatorname{logit}(p) = \log\frac{p}{1-p}$$
[Figure: logit(p) plotted against p for p between 0.0 and 1.0; the vertical axis runs from −4 to 4]

Logistic function
$$\operatorname{logistic}(x) = \frac{1}{1 + e^{-x}}$$
[Figure: logistic(x) plotted against x for x between −4 and 4; the vertical axis runs from 0.0 to 1.0]

Logistic regression as a generalized linear model
Logistic regression is a special case of generalized linear models (GLM). GLMs are expressed with,
$$g(y) = Xw + \epsilon$$
▶ The function $g()$ is called the link function
▶ $\epsilon$ is distributed according to a distribution from the exponential family
▶ For logistic regression, $g()$ is the logit function, $\epsilon$ is distributed binomially
▶ For linear regression, $g()$ is the identity function, $\epsilon$ is distributed normally

Interpreting the dyslexia example

    glm(formula = diag ~ score, family = binomial, data = dys)

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  6.90079    2.31737   2.978  0.00290 **
    score       -0.14491    0.04493  -3.225  0.00126 **
    ---
    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 54.548  on 39  degrees of freedom
    Residual deviance: 30.337  on 38  degrees of freedom
    AIC: 34.337

    Number of Fisher Scoring iterations: 5

Interpreting the dyslexia example
$$\operatorname{logit}(p) = 6.9 - 0.14x \qquad p = \frac{1}{1 + e^{-6.9 + 0.14x}}$$
[Figure: P(dyslexia|score) plotted against test score (0 to 100) with the fitted logistic curve]
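
To see what the fitted model predicts, the coefficients from the glm output (intercept 6.90079, slope −0.14491) can be plugged into the logistic function. The sketch below is only an illustration; the function and variable names are hypothetical, and the scores 22 and 82 are picked from the example data rows.

```python
import math

W0 = 6.90079    # intercept from the glm output
W1 = -0.14491   # coefficient of the test score from the glm output

def p_dyslexia(score):
    """Predicted P(dyslexia | score) under the fitted logistic regression model."""
    return 1 / (1 + math.exp(-(W0 + W1 * score)))

# Score at which logit(p) = 0, that is, p = 0.5.
boundary = -W0 / W1

# Multiplicative change in the odds for each additional test point.
odds_ratio = math.exp(W1)

for score in (22, 82):
    print(score, round(p_dyslexia(score), 3))
print(round(boundary, 1), round(odds_ratio, 3))
```

Since the slope is negative, higher test scores go with a lower predicted probability of dyslexia; children scoring below the boundary score are predicted to be more likely than not to receive the diagnosis.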

Reminder: How to fit a logistic regression model
$$P(y = 1 \mid x) = p = \frac{1}{1 + e^{-wx}} \qquad P(y = 0 \mid x) = 1 - p = \frac{e^{-wx}}{1 + e^{-wx}}$$
The likelihood of the training set is,
$$L(w) = \prod_i P(y_i \mid x_i) = \prod_i p^{y_i} (1 - p)^{1 - y_i}$$
where $p$ is evaluated at $x_i$ for each term. In practice, maximizing the log likelihood is more practical:
$$\hat{w} = \arg\max_w \log L(w) = \sum_i \log P(y_i \mid x_i) = \sum_i y_i \log p + (1 - y_i) \log(1 - p)$$
To maximize, we find the gradient:
$$\nabla \log L(w) = \sum_i \left( y_i - \frac{1}{1 + e^{-w x_i}} \right) x_i$$
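
The gradient can be used directly in a simple gradient ascent loop. The sketch below is one possible way to do this, not the method used by any particular library (the glm output shown earlier reports Fisher scoring instead). The learning rate, the number of steps, the function names, and the toy data are assumptions made for illustration. Each input carries a constant 1 as its first component, so that $w$ includes an intercept.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def log_likelihood(w, X, y):
    """sum_i y_i log p_i + (1 - y_i) log(1 - p_i), with p_i = sigmoid(w . x_i)."""
    return sum(yi * math.log(sigmoid(dot(w, xi)))
               + (1 - yi) * math.log(1 - sigmoid(dot(w, xi)))
               for xi, yi in zip(X, y))

def gradient(w, X, y):
    """sum_i (y_i - p_i) x_i, the gradient of the log likelihood."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        error = yi - sigmoid(dot(w, xi))
        for j, xj in enumerate(xi):
            grad[j] += error * xj
    return grad

def fit(X, y, rate=0.1, steps=2000):
    """Plain batch gradient ascent; rate and steps are hand-picked assumptions."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        w = [wj + rate * gj for wj, gj in zip(w, gradient(w, X, y))]
    return w

# Hypothetical toy data: (1, x) pairs with overlapping classes, so that the
# maximum likelihood estimate is finite.
X = [(1.0, -2.0), (1.0, -1.0), (1.0, -0.5), (1.0, 0.5), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 0, 1, 1]

w = fit(X, y)
print(w, log_likelihood(w, X, y))
```

If the classes were perfectly separable, the log likelihood would keep increasing as the weights grow, and the loop would never settle on finite weights; the overlapping toy data avoids that.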