Statistical Natural Language Processing
Classification

Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
Summer Semester 2019
When/why do we do classification?
As opposed to regression, the outcome is a ‘category’.

• Is a given email spam or not?
• What is the gender of the author of a document?
• Is a product review positive or negative?
• Who is the author of a document?
• What is the subject of an article?
• …
The task

• Given a set of training data with (categorical) labels
• Train a model to predict future data points from the same distribution

[scatter plot: ‘+’ and ‘−’ training points in the x1–x2 plane, with a new unlabeled point ‘?’]
Outline

• Perceptron
• Logistic regression
• Naive Bayes
• Multi-class strategies for binary classifiers
• Evaluation metrics for classification
• Brief notes on what we skipped
The perceptron

    y = f( ∑_{i=0}^{n} w_i x_i )    where    f(x) = { +1  if ∑_i w_i x_i > 0
                                                      −1  otherwise

Similar to the intercept in linear models, an additional input x0 which is always set to one is often used (called bias in the ANN literature).

[diagram: inputs x0 = 1, x1, …, xn connected to the output y through weights w0, w1, …, wn]
The perceptron: in plain words

• Sum all inputs x_i weighted with the corresponding weights w_i
• Classify the input using a threshold function: positive if the sum is larger than 0, negative otherwise

[diagram: inputs x0 = 1, x1, …, xn with weights w0, …, wn feeding the output y]
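The weighted sum and threshold above can be sketched in a few lines of Python (a minimal illustration; the weights and inputs below are hypothetical, not taken from the slides):

```python
def perceptron_predict(w, x):
    """Classify x with weights w; x[0] is the constant bias input (always 1)."""
    s = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return 1 if s > 0 else -1

# Hypothetical weights: bias w0 = -1.0, then w1, w2.
w = [-1.0, 2.0, 0.5]
x = [1.0, 1.0, 1.0]              # x0 = 1 (bias), x1, x2
print(perceptron_predict(w, x))  # -1.0 + 2.0 + 0.5 = 1.5 > 0, so prints 1
```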
Learning with perceptron

• We do not update the parameters if the classification is correct
• For misclassified examples, we try to minimize

    E(w) = − ∑_i y_i (w · x_i)

  where i ranges over all misclassified examples
• The perceptron algorithm updates the weights such that

    w ← w − η ∇E(w)
    w ← w + η y_i x_i

  for misclassified examples. η is the learning rate.
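A single update step, following w ← w + η y_i x_i above, can be sketched as follows (the example point and learning rate are made up for illustration):

```python
def perceptron_update(w, x, y, eta=1.0):
    """Apply w <- w + eta * y * x for one misclassified example (x, y)."""
    return [w_j + eta * y * x_j for w_j, x_j in zip(w, x)]

# A point with true label y = +1 that the current w classifies as negative:
w = [0.0, -1.0]
x = [1.0, 2.0]   # x[0] = 1 is the bias input
y = 1
assert y * sum(wj * xj for wj, xj in zip(w, x)) <= 0  # misclassified
w = perceptron_update(w, x, y)
print(w)  # [1.0, 1.0]: the point is now on the positive side
```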
The perceptron algorithm

• The perceptron algorithm can be
  – batch: updates weights for all misclassified examples at once
  – online: updates weights for a single misclassified example
• The perceptron algorithm converges to the global minimum if the classes are linearly separable
• If the classes are not linearly separable, the perceptron algorithm will not stop
• We do not know whether the classes are linearly separable or not before the algorithm converges
• In practice, one can set a stopping condition, such as
  – maximum number of iterations/updates
  – number of misclassified examples
  – number of iterations without improvement
Perceptron convergence

1. Randomly initialize w; the decision boundary is orthogonal to w
2. Pick a misclassified example x_i
3. Set w ← w + y_i x_i, go to step 2 until convergence

[animation over six frames: the weight vector w, and with it the decision boundary, shifts step by step until it separates the ‘+’ and ‘−’ points]
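The three steps above can be put together into a runnable sketch (the data set is made up and linearly separable; the max-epochs cap is one of the stopping conditions mentioned earlier):

```python
import random

def train_perceptron(data, n_features, max_epochs=100):
    """Online perceptron: update on each misclassified example until none remain."""
    random.seed(0)
    w = [random.uniform(-1, 1) for _ in range(n_features + 1)]  # +1 for bias
    for _ in range(max_epochs):
        errors = 0
        for x, y in data:
            x = [1.0] + x  # prepend the bias input x0 = 1
            if y * sum(wj * xj for wj, xj in zip(w, x)) <= 0:
                w = [wj + y * xj for wj, xj in zip(w, x)]  # w <- w + y * x
                errors += 1
        if errors == 0:   # converged: every example correctly classified
            break
    return w

# Made-up linearly separable data: the label is the sign of x1 + x2.
data = [([2, 1], 1), ([1, 2], 1), ([-1, -2], -1), ([-2, -1], -1)]
w = train_perceptron(data, n_features=2)
```

Because the data is linearly separable, the convergence theorem guarantees the loop terminates with all points correctly classified.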
Perceptron: a bit of history

• The perceptron was developed in the late 1950s and early 1960s (Rosenblatt 1958)
• It caused excitement in many fields including computer science, artificial intelligence, and cognitive science
• The excitement (and funding) died away in the early 1970s (after the criticism by Minsky and Papert 1969)
• The main issue was the fact that the perceptron algorithm cannot handle problems that are not linearly separable
Logistic regression

• Logistic regression is a classification method
• In logistic regression, we fit a model that predicts P(y | x)
• Logistic regression is an extension of linear regression
  – it is a member of the family of models called generalized linear models
• Typically formulated for binary classification, but it has a natural extension to multiple classes
• The multi-class logistic regression is often called the maximum-entropy model (or max-ent) in the NLP literature
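In the binary case, the model predicts P(y = 1 | x) = σ(w · x), where σ(z) = 1 / (1 + e^(−z)) is the logistic (sigmoid) function. A minimal sketch (the weights below are hypothetical, not fitted to any data):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """P(y = 1 | x) under a logistic regression model with weights w."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical weights: intercept w0 = 0.0, slope w1 = 1.5.
w = [0.0, 1.5]
print(predict_proba(w, [1.0, 0.0]))  # at x = 0 the model gives 0.5
```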
Data for logistic regression
an example with a single predictor

• Why not just use linear regression?
• What is P(y | x = 2)?
• Is RMS error appropriate?

[plot: binary outcomes y ∈ {0, 1} plotted against a single predictor x ∈ [−2, 2]]
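The questions above point at the core problem: a fitted line can return values outside [0, 1], so it cannot be read as P(y | x), while the logistic curve is bounded. A quick illustration with hypothetical coefficients (not fitted to the plotted data):

```python
import math

w0, w1 = 0.5, 0.6  # hypothetical coefficients shared by both models
x = 2.0            # the query point from the slide

linear = w0 + w1 * x                                # can leave [0, 1]
logistic = 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))   # always in (0, 1)

print(linear)    # 1.7 -- not interpretable as a probability
print(logistic)  # a valid probability
```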