Linear Models Continued: Perceptron & Logistic Regression CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein
Linear Models for Classification • Feature function representation • Weights
Naïve Bayes recap
The Perceptron
The perceptron • A linear model for classification • An algorithm to learn feature weights given labeled data • online algorithm • error-driven
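A minimal sketch of this online, error-driven training loop, assuming a feature function phi(x) that returns a dict of feature values and labels y in {+1, -1} (illustrative code, not the lecture's own):

```python
# Minimal binary perceptron sketch: online, error-driven weight updates.
from collections import defaultdict

def train_perceptron(data, phi, epochs=10):
    w = defaultdict(float)                        # feature weights start at zero
    for _ in range(epochs):
        for x, y in data:                         # online: one example at a time
            features = phi(x)
            score = sum(w[f] * v for f, v in features.items())
            y_hat = 1 if score >= 0 else -1
            if y_hat != y:                        # error-driven: update only on mistakes
                for f, v in features.items():
                    w[f] += y * v
    return w
```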
Multiclass perceptron
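A sketch of the standard multiclass variant, assuming one weight vector per class, phi(x) returning a feature dict, and a fixed list of class labels (names are illustrative):

```python
# Multiclass perceptron sketch: score every class, update on mistakes by
# boosting the gold class and penalizing the predicted class.
from collections import defaultdict

def train_multiclass_perceptron(data, phi, classes, epochs=10):
    w = {c: defaultdict(float) for c in classes}   # one weight vector per class
    for _ in range(epochs):
        for x, y in data:
            features = phi(x)
            y_hat = max(classes,
                        key=lambda c: sum(w[c][f] * v for f, v in features.items()))
            if y_hat != y:
                for f, v in features.items():
                    w[y][f] += v                   # boost the gold class
                    w[y_hat][f] -= v               # penalize the predicted class
    return w
```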
Understanding the perceptron • What’s the impact of the update rule on parameters? • The perceptron algorithm will converge if the training data is linearly separable • Proof: see “A Course In Machine Learning” Ch. 4 • Practical issues • How to initialize? • When to stop? • How to order training examples?
When to stop? • One technique: early stopping • Stop when the accuracy on held-out data starts to decrease • Requires splitting data into 3 sets: training/development/test
ML fundamentals aside: overfitting/underfitting/generalization
Training error is not sufficient • We care about generalization to new examples • A classifier can classify training data perfectly, yet classify new examples incorrectly • Because training examples are only a sample of data distribution • a feature might correlate with class by coincidence • Because training examples could be noisy • e.g., accident in labeling
Overfitting • Consider a model ℎ and its: • Error rate over training data: $error_{train}(h)$ • True error rate over all data: $error_{true}(h)$ • We say ℎ overfits the training data if $error_{train}(h) < error_{true}(h)$
Evaluating on test data • Problem: we don’t know $error_{true}(h)$! • Solution: • we set aside a test set • some examples that will be used for evaluation • we don’t look at them during training! • after learning a classifier ℎ, we calculate $error_{test}(h)$
Overfitting • Another way of putting it • A classifier ℎ is said to overfit the training data if there is another hypothesis ℎ′ such that • ℎ has a smaller error than ℎ′ on the training data • but ℎ has a larger error on the test data than ℎ′.
Underfitting/Overfitting • Underfitting • Learning algorithm had the opportunity to learn more from the training data, but didn’t • Overfitting • Learning algorithm paid too much attention to idiosyncrasies of the training data; the resulting classifier doesn’t generalize
Back to the Perceptron
Averaged Perceptron improves generalization
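A sketch of weight averaging, assuming the same phi(x) feature dicts and +1/-1 labels as above; the returned average of the weight vectors seen after every example tends to generalize better than the final weights:

```python
# Averaged perceptron sketch: standard updates, plus a running sum of the
# weight vector after every example; return the average at the end.
from collections import defaultdict

def train_averaged_perceptron(data, phi, epochs=10):
    w = defaultdict(float)        # current weights
    w_sum = defaultdict(float)    # running sum of weight vectors
    n = 0
    for _ in range(epochs):
        for x, y in data:
            features = phi(x)
            score = sum(w[f] * v for f, v in features.items())
            if y * score <= 0:                    # mistake: standard update
                for f, v in features.items():
                    w[f] += y * v
            n += 1
            for f, v in w.items():                # accumulate for the average
                w_sum[f] += v
    return {f: v / n for f, v in w_sum.items()}
```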
What objective/loss does the perceptron optimize? • Zero-one loss function • What are the pros and cons compared to Naïve Bayes loss?
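For reference, the zero-one loss simply counts whether the prediction is wrong (standard definition):

```latex
\ell_{0/1}(y, \hat{y}) = \mathbf{1}[\hat{y} \neq y]
```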
Logistic Regression
Perceptron & Probabilities • What if we want a probability p(y|x)? • The perceptron gives us a prediction y • Let’s illustrate this with binary classification Illustrations: Graham Neubig
The logistic function • “Softer” function than in perceptron • Can account for uncertainty • Differentiable
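A standard way to write the logistic (sigmoid) function applied to the linear score, using the feature function φ(x) from earlier in the deck:

```latex
P(y = 1 \mid x) = \sigma\big(\mathbf{w} \cdot \boldsymbol{\phi}(x)\big)
                = \frac{1}{1 + \exp\!\big(-\mathbf{w} \cdot \boldsymbol{\phi}(x)\big)}
```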
Logistic regression: how to train? • Train based on conditional likelihood • Find parameters w that maximize the conditional likelihood of all answers $y_i$ given examples $x_i$
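In symbols, the training objective described above (a standard maximum conditional likelihood formulation):

```latex
\hat{\mathbf{w}} = \arg\max_{\mathbf{w}} \sum_{i} \log P(y_i \mid x_i ; \mathbf{w})
```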
Stochastic gradient ascent (or descent) • Online training algorithm for logistic regression • and other probabilistic models • Update weights for every training example • Move in direction given by gradient • Size of update step scaled by learning rate
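A sketch of stochastic gradient ascent for binary logistic regression, assuming labels y in {0, 1}, the same phi(x) feature dicts as above, and a learning rate eta (illustrative code, not the lecture's own):

```python
# Stochastic gradient ascent sketch for binary logistic regression:
# one weight update per training example, step size scaled by eta.
import math
from collections import defaultdict

def train_logreg_sga(data, phi, eta=0.1, epochs=10):
    w = defaultdict(float)
    for _ in range(epochs):
        for x, y in data:                          # update on every example
            features = phi(x)
            score = sum(w[f] * v for f, v in features.items())
            p = 1.0 / (1.0 + math.exp(-score))     # current estimate of P(y=1|x)
            for f, v in features.items():
                w[f] += eta * (y - p) * v          # gradient of the log-likelihood
    return w
```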
What you should know • Standard supervised learning set-up for text classification • Difference between train vs. test data • How to evaluate • 3 examples of supervised linear classifiers • Naïve Bayes, Perceptron, Logistic Regression • Learning as optimization: what is the objective function optimized? • Difference between generative vs. discriminative classifiers • Smoothing, regularization • Overfitting, underfitting
Perceptron weight update • If y = 1, increase the weights for features in φ(x) • If y = -1, decrease the weights for features in φ(x)
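In symbols, with the feature function φ from earlier in the deck, the update made on a mistake can be written as:

```latex
\mathbf{w} \leftarrow \mathbf{w} + y\,\boldsymbol{\phi}(x)
\qquad \text{when } \hat{y} = \operatorname{sign}\big(\mathbf{w} \cdot \boldsymbol{\phi}(x)\big) \neq y
```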