Department of Computer Science
CSCI 5622: Machine Learning
Chenhao Tan

Lecture 12: Regularization, regression, and multi-class classification
Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

HW 2
Learning objectives
• Review the homework and multi-class classification
• Linear regression
• Examine regularization in the regression context
• Recognize the effects of regularization on bias/variance
Outline
• Multi-class classification
• Linear regression
• Regularization
Multi-class classification
• Binary examples
  • Spam classification
  • Sentiment classification
• Multi-class examples
  • Star-ratings classification
  • Part-of-speech tagging
  • Image classification
What we learned so far
• KNN
• Naïve Bayes
• Logistic regression
• Neural networks
• Support vector machines
Binary vs. Multi-class classification

Multi-class logistic regression
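The formulas from this slide are not reproduced in the text above. As a minimal sketch of the standard softmax formulation of multi-class logistic regression (assuming a hypothetical k-by-d weight matrix W, one row per class):

import numpy as np

def softmax(z):
    # subtract the max score for numerical stability before exponentiating
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def predict_proba(W, x):
    # W: (k, d) weight matrix, one row per class; x: (d,) feature vector
    # scores are linear in x; softmax turns them into a distribution over the k classes
    return softmax(W @ x)

# toy usage: 3 classes, 2 features
W = np.array([[1.0, -0.5], [0.2, 0.3], [-1.0, 0.8]])
x = np.array([0.5, 1.5])
p = predict_proba(W, x)
print(p, p.argmax())  # probabilities sum to 1; argmax gives the predicted class

The binary case is recovered for k = 2, where the softmax reduces to the logistic (sigmoid) function.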
Multi-class Support Vector Machines
• Reduction
  • One-against-all
  • All-pairs
• Modify objective function (SSBD 17.2)
Reduction
How do we use binary classifiers to output categorical labels?
One-against-all
• Break the k-class problem into k binary problems and solve them separately
• Combine predictions: evaluate all the classifiers h and take the class whose classifier reports the highest confidence (see the sketch below)
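A minimal sketch of this reduction, assuming scikit-learn's LogisticRegression as the underlying binary classifier (any binary learner exposing a confidence score would do):

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_one_vs_all(X, y, classes):
    # train one binary classifier per class: class c vs. everything else
    models = {}
    for c in classes:
        clf = LogisticRegression()
        clf.fit(X, (y == c).astype(int))
        models[c] = clf
    return models

def predict_one_vs_all(models, X):
    # evaluate every classifier and take the class with the highest confidence
    labels = list(models.keys())
    scores = np.column_stack([models[c].decision_function(X) for c in labels])
    return np.array(labels)[np.argmax(scores, axis=1)]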
All-pairs
• Break the k-class problem into k(k-1)/2 binary problems, one per pair of classes, and solve them separately
• Combine predictions: evaluate all the pairwise classifiers h and take the class with the highest summed confidence (see the sketch below)
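A minimal sketch of the all-pairs reduction under the same assumption (scikit-learn's LogisticRegression as the base binary classifier); each pairwise classifier contributes its confidence to the class it favors, and the class with the highest summed confidence wins:

import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def train_all_pairs(X, y, classes):
    # one binary classifier per unordered pair of classes,
    # trained only on the examples belonging to that pair
    models = {}
    for a, b in combinations(classes, 2):
        mask = (y == a) | (y == b)
        clf = LogisticRegression()
        clf.fit(X[mask], (y[mask] == b).astype(int))  # label 1 means "class b"
        models[(a, b)] = clf
    return models

def predict_all_pairs(models, X, classes):
    # accumulate each pair's confidence for the class it favors, then take the argmax
    votes = {c: np.zeros(len(X)) for c in classes}
    for (a, b), clf in models.items():
        s = clf.decision_function(X)   # positive favors b, negative favors a
        votes[b] += np.maximum(s, 0.0)
        votes[a] += np.maximum(-s, 0.0)
    stacked = np.column_stack([votes[c] for c in classes])
    return np.array(classes)[np.argmax(stacked, axis=1)]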
Outline
• Multi-class classification
• Linear regression
• Regularization
Linear regression
• Data are continuous inputs and outputs

Linear regression examples
• Given a person's age and gender, predict their height
• Given the square footage and number of bathrooms in a house, predict its sale price
• Given unemployment, inflation, the number of wars, and economic growth, predict the president's approval rating
• Given a user's browsing history, predict how long they will stay on a product page
• Given the advertising budget expenditures in various markets, predict the number of products sold
Linear regression example (figures)
Derived features
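The content of the derived-features slides is not reproduced above. Assuming (as the later bias-variance slide suggests) that the derived features are polynomial transformations of a raw input, a minimal sketch:

import numpy as np

def polynomial_features(x, degree):
    # map each scalar input x to the derived features [1, x, x^2, ..., x^degree]
    return np.vander(x, degree + 1, increasing=True)

# toy usage: fit a cubic to noisy data with ordinary least squares on the derived features
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(20)
Phi = polynomial_features(x, degree=3)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

The model stays linear in the weights, so everything that follows about linear regression still applies; only the feature representation changes.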
Objective function
The objective function is called the residual sum of squares:
$\mathrm{RSS}(\mathbf{w}) = \sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2$
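A minimal sketch of minimizing the RSS via the normal equations (assuming a design matrix X whose columns include a constant feature for the intercept and whose Gram matrix X^T X is invertible):

import numpy as np

def fit_least_squares(X, y):
    # the w minimizing RSS(w) = ||y - Xw||^2 solves the normal equations X^T X w = X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

def rss(X, y, w):
    residuals = y - X @ w
    return float(residuals @ residuals)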
Probabilistic interpretation
A discriminative model that assumes the response is Gaussian with mean $\mathbf{w}^\top \mathbf{x}$:
$y \mid \mathbf{x} \sim \mathcal{N}(\mathbf{w}^\top \mathbf{x}, \sigma^2)$
Probabilistic interpretation
Assuming i.i.d. samples, we can write the likelihood of the data as
$p(\mathbf{y} \mid X, \mathbf{w}) = \prod_{i=1}^{n} \mathcal{N}(y_i \mid \mathbf{w}^\top \mathbf{x}_i, \sigma^2)$
Probabilistic interpretation
Negative log likelihood:
$-\log p(\mathbf{y} \mid X, \mathbf{w}) = \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 + \frac{n}{2}\log(2\pi\sigma^2)$
Minimizing the negative log likelihood in $\mathbf{w}$ is therefore equivalent to minimizing the RSS.
Revisiting the bias-variance tradeoff
• Consider the case of fitting linear regression with derived polynomial features to a set of training data
• In general, we want a model that explains the training data and can still generalize to unseen test data
Revisiting the bias-variance tradeoff (figure)
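The figure itself is not reproducible here. As a rough sketch of the same idea on hypothetical synthetic data, training error typically keeps dropping as the polynomial degree grows while test error eventually rises:

import numpy as np

rng = np.random.default_rng(0)

def f(t):
    return np.sin(2 * np.pi * t)

x_train = rng.uniform(0.0, 1.0, 20)
x_test = rng.uniform(0.0, 1.0, 200)
y_train = f(x_train) + 0.2 * rng.standard_normal(x_train.size)
y_test = f(x_test) + 0.2 * rng.standard_normal(x_test.size)

for degree in [1, 3, 9, 15]:
    Phi_train = np.vander(x_train, degree + 1, increasing=True)
    Phi_test = np.vander(x_test, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    train_mse = np.mean((y_train - Phi_train @ w) ** 2)
    test_mse = np.mean((y_test - Phi_test @ w) ** 2)
    print(degree, round(train_mse, 3), round(test_mse, 3))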
Outline
• Multi-class classification
• Linear regression
• Regularization
High variance
• The model wiggles wildly to get close to the data
• To get big swings, the model coefficients become very large
• Weights can grow to the order of $10^6$
Regularization
• Keep all the features, but force the coefficients to be smaller
• This is called regularization
Regularization
• Add a penalty term to the RSS objective function
• Balance between a small RSS and small coefficients
• HW 2 extra credit question
Regularization (figure)
Ridge regularization
$\min_{\mathbf{w}} \; \sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 + \lambda \|\mathbf{w}\|_2^2$
The penalty $\lambda \ge 0$ controls how strongly large coefficients are discouraged.
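A minimal sketch of the ridge solution in closed form (assuming the intercept is either unpenalized and handled separately or the data are centered):

import numpy as np

def fit_ridge(X, y, lam):
    # the w minimizing ||y - Xw||^2 + lam * ||w||^2 has the closed form
    # w = (X^T X + lam * I)^{-1} X^T y; any lam > 0 also makes the system well conditioned
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

Setting lam = 0 recovers ordinary least squares.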
Bias-variance tradeoff
As the penalty $\lambda$ decreases:
A. the bias increases, the variance increases
B. the bias increases, the variance decreases
C. the bias decreases, the variance increases
D. the bias decreases, the variance decreases
Ridge regularization vs. lasso regularization
• How do the coefficients behave as $\lambda$ increases?
Ridge regularization
• Coefficients shrink smoothly and roughly uniformly toward zero, but typically do not become exactly zero
Lasso regularization
• Some coefficients shrink exactly to zero very fast (see the sketch below)
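A minimal sketch of this contrast using scikit-learn's Ridge and Lasso on hypothetical synthetic data where only a few features matter; as the penalty alpha (the $\lambda$ above) grows, lasso drives coefficients exactly to zero while ridge only shrinks them:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
true_w = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only 3 informative features
y = X @ true_w + 0.5 * rng.standard_normal(100)

for alpha in [0.01, 0.1, 1.0, 10.0]:
    ridge_w = Ridge(alpha=alpha).fit(X, y).coef_
    lasso_w = Lasso(alpha=alpha).fit(X, y).coef_
    # count (near-)zero coefficients: ridge stays dense, lasso gets sparser as alpha grows
    print(alpha, np.sum(np.abs(ridge_w) < 1e-8), np.sum(np.abs(lasso_w) < 1e-8))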
Ridge regularization vs. lasso regularization
• Why does the choice between the two types of regularization lead to very different behavior?
• Several ways to look at it:
  • Constrained minimization
  • A simplified case of the data
  • Prior probabilities on the parameters
Intuition 1: Constrained Minimization
Intuition 1: Constrained Minimization
With lasso, the constraint region is a diamond (the $\ell_1$ ball), so the minimum is more likely to occur at a corner of the diamond, causing some feature weights to be set exactly to zero; the constrained formulation below makes this concrete.
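For reference, the standard constrained form behind this picture (each penalized problem corresponds to some constraint level $t$):

Ridge: $\min_{\mathbf{w}} \sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 \quad \text{subject to} \quad \|\mathbf{w}\|_2^2 \le t$
Lasso: $\min_{\mathbf{w}} \sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 \quad \text{subject to} \quad \|\mathbf{w}\|_1 \le t$

The $\ell_2$ ball has a smooth boundary, while the $\ell_1$ ball is a diamond whose corners lie on the coordinate axes; the RSS contours therefore often first touch the lasso region at a corner, where some coordinates are exactly zero.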
Intuition 2: A Simplified Case
Intuition 3: Prior Distribution
Intuition 3: Prior Distribution
• Lasso's prior (a Laplace distribution) is sharply peaked at 0, which means we expect many parameters to be exactly zero
• Ridge's prior (a Gaussian) is flatter and fatter around 0, which means we expect many coefficients to be smallish
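A short worked version of this correspondence (a standard MAP argument, using the Gaussian likelihood from the probabilistic-interpretation slides):

$\hat{\mathbf{w}}_{\mathrm{MAP}} = \arg\min_{\mathbf{w}} \left[ \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 - \log p(\mathbf{w}) \right]$

Gaussian prior $p(w_j) \propto \exp(-w_j^2 / 2\tau^2)$: $-\log p(\mathbf{w}) = \frac{1}{2\tau^2}\|\mathbf{w}\|_2^2 + \text{const}$, which gives ridge with $\lambda = \sigma^2 / \tau^2$ after multiplying through by $2\sigma^2$.
Laplace prior $p(w_j) \propto \exp(-|w_j| / b)$: $-\log p(\mathbf{w}) = \frac{1}{b}\|\mathbf{w}\|_1 + \text{const}$, which gives lasso with $\lambda = 2\sigma^2 / b$.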
Wrap up
• Regularization and the idea behind it are crucial for machine learning
• Always use regularization in some form
• Next: ensemble methods