Linear Models for Classification
Greg Mori - CMPT 419/726
Bishop PRML Ch. 4
Classification: Hand-written Digit Recognition
• Each input vector is classified into one of K discrete classes; denote the classes by C_k
• Represent an input image as a vector x_i ∈ R^784
• The target vector t_i ∈ {0, 1}^10 uses a 1-of-K coding, e.g. t_i = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0) for an image of the digit 3
• Given a training set {(x_1, t_1), ..., (x_N, t_N)}, the learning problem is to construct a “good” function y(x) from these
• y : R^784 → R^10
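As a concrete illustration (not from the slides), a 1-of-K target vector can be built in a few lines of NumPy; the label value 3 is an assumed example:

    import numpy as np

    def one_hot(label, K=10):
        """Encode an integer class label as a 1-of-K target vector."""
        t = np.zeros(K)
        t[label] = 1.0
        return t

    t_i = one_hot(3)   # digit "3" -> (0, 0, 0, 1, 0, 0, 0, 0, 0, 0)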
Generalized Linear Models
• As in the previous chapter on linear models for regression, we will use a “linear” model for classification:
      y(x) = f(w^T x + w_0)
• This is called a generalized linear model
• f(·) is a fixed non-linear function, e.g. the threshold function
      f(u) = 1 if u ≥ 0, and f(u) = 0 otherwise
• The decision boundary between classes will be a linear function of x
• Can also apply a non-linearity to x, as with the basis functions φ_i(x) in regression
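A minimal NumPy sketch of this model for two classes, assuming a given weight vector w and bias w_0 (the weights and test points below are made-up values for illustration):

    import numpy as np

    def glm_predict(x, w, w0):
        """Generalized linear model: threshold non-linearity applied
        to a linear function of the input."""
        u = w @ x + w0            # linear activation w^T x + w_0
        return 1 if u >= 0 else 0

    # Hypothetical 2-D example: decision boundary is the line x1 + x2 = 1
    w, w0 = np.array([1.0, 1.0]), -1.0
    print(glm_predict(np.array([2.0, 0.5]), w, w0))   # 1 (above the boundary)
    print(glm_predict(np.array([0.1, 0.2]), w, w0))   # 0 (below the boundary)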
Outline
• Discriminant Functions
• Generative Models
• Discriminative Models
Discriminant Functions with Two Classes
[Figure: geometry of a two-class linear discriminant. The decision surface y(x) = 0 separates region R_1 (y > 0) from R_2 (y < 0); w is normal to the surface, y(x)/||w|| is the signed distance of x from it, and −w_0/||w|| is the distance of the surface from the origin.]
• Start with the 2-class problem, t_i ∈ {0, 1}
• Simple linear discriminant
      y(x) = w^T x + w_0
  apply a threshold function to get the classification
• The projection of x in the direction of w is w^T x / ||w||
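These geometric quantities are easy to compute directly; a small sketch with assumed values of w, w_0, and x:

    import numpy as np

    w = np.array([3.0, 4.0])   # assumed weights; ||w|| = 5
    w0 = -5.0                  # assumed bias
    x = np.array([2.0, 1.0])   # an assumed test point

    y = w @ x + w0                           # y(x) = w^T x + w_0
    proj = (w @ x) / np.linalg.norm(w)       # projection of x in the w direction
    dist = y / np.linalg.norm(w)             # signed distance of x from the boundary
    origin_offset = -w0 / np.linalg.norm(w)  # distance of boundary from the origin
    print(y, proj, dist, origin_offset)      # 5.0 2.0 1.0 1.0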
Multiple Classes
[Figure: ambiguous regions from combining binary classifiers. Left: one-versus-the-rest (“C_1 vs. not C_1”, “C_2 vs. not C_2”); right: one-versus-one (“C_1 vs. C_2”, “C_1 vs. C_3”, “C_2 vs. C_3”). In both cases some regions of input space (marked “?”) receive conflicting labels.]
• A linear discriminant between two classes separates with a hyperplane
• How to use this for multiple classes?
• One-versus-the-rest method: build K − 1 classifiers, between C_k and all others
• One-versus-one method: build K(K − 1)/2 classifiers, between all pairs
• Both constructions leave ambiguous regions, as the figure shows; the sketch below makes the classifier counts concrete
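A sketch that enumerates the binary problems each scheme would train; the class labels here are just illustrative integers:

    from itertools import combinations

    K = 4
    classes = list(range(K))

    # One-versus-the-rest: K - 1 classifiers, each C_k against everything else
    ovr = [(k, "rest") for k in classes[:-1]]

    # One-versus-one: K(K - 1)/2 classifiers, one per pair of classes
    ovo = list(combinations(classes, 2))

    print(len(ovr))  # 3 = K - 1
    print(len(ovo))  # 6 = K(K - 1)/2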
Multiple Classes
[Figure: decision regions R_i, R_j, R_k under the K-discriminant formulation; any point x̂ on the line segment between x_A and x_B in R_k also lies in R_k.]
• A solution is to build K linear functions
      y_k(x) = w_k^T x + w_{k0}
  and assign x to class arg max_k y_k(x)
• This gives connected, convex decision regions: for x̂ = λ x_A + (1 − λ) x_B with x_A, x_B ∈ R_k and 0 ≤ λ ≤ 1,
      y_k(x̂) = λ y_k(x_A) + (1 − λ) y_k(x_B) ⇒ y_k(x̂) > y_j(x̂), ∀ j ≠ k
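A sketch of the K-linear-function approach, plus a numeric check of the convexity argument; the weights, biases, and points are arbitrary assumed values:

    import numpy as np

    W = np.array([[ 1.0,  0.0],    # w_1
                  [ 0.0,  1.0],    # w_2
                  [-1.0, -1.0]])   # w_3
    b = np.array([0.0, -0.5, 0.2]) # biases w_{k0}

    def predict(x):
        """Assign x to arg max_k y_k(x), where y_k(x) = w_k^T x + w_{k0}."""
        return np.argmax(W @ x + b)

    # Convexity: if x_A and x_B are both in region R_k, so is any point between them
    xA, xB = np.array([3.0, 0.0]), np.array([4.0, 1.0])
    assert predict(xA) == predict(xB) == 0
    lam = 0.3
    x_hat = lam * xA + (1 - lam) * xB   # x̂ = λ x_A + (1 − λ) x_B
    assert predict(x_hat) == 0          # each y_k is linear, so the max is preserved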
Least Squares for Classification
• How do we learn the decision boundaries (w_k, w_{k0})?
• One approach is to use least squares, similar to regression
• Find W to minimize the squared error over all examples and all components of the label vector:
      E(W) = (1/2) Σ_{n=1}^{N} Σ_{k=1}^{K} (y_k(x_n) − t_{nk})^2
• After some algebra, we get a closed-form solution using the pseudo-inverse, as in regression
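A sketch of this fit: stack the inputs (with a bias feature) into X̃ and the 1-of-K targets into T, then solve with the pseudo-inverse. The random data here is purely illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    N, D, K = 100, 2, 3
    X = rng.normal(size=(N, D))
    labels = rng.integers(K, size=N)
    T = np.eye(K)[labels]                       # N x K one-hot target matrix

    X_tilde = np.hstack([np.ones((N, 1)), X])   # prepend a bias feature
    W = np.linalg.pinv(X_tilde) @ T             # pseudo-inverse solution, as in regression

    pred = np.argmax(X_tilde @ W, axis=1)       # classify by the largest output

Prediction picks the column with the largest output, matching the arg max rule from the previous slide.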
Problems with Least Squares
[Figure: two-class data in 2-D. Left panel: the least squares decision boundary looks fine. Right panel: extra points are added far from the boundary, and the least squares boundary shifts to misclassify part of the data.]
• Looks okay... the least squares decision boundary is similar to the logistic regression decision boundary (more later)
• But it gets worse when we add easy points?! Why?
• If the target value is 1, points far from the boundary will have a high value of y(x), say 10; this is a large squared error, so the boundary is moved to compensate
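This sensitivity to “easy” points can be reproduced in a few lines; the sketch below fits the least squares boundary before and after adding far-away points that are on the correct side. All of the data is synthetic and the cluster locations are assumptions:

    import numpy as np

    rng = np.random.default_rng(1)

    def fit_ls(X, t):
        """Least squares fit of y(x) = w^T x + w_0 to targets t in {0, 1}."""
        X_tilde = np.hstack([np.ones((len(X), 1)), X])
        return np.linalg.pinv(X_tilde) @ t

    X0 = rng.normal([-1, -1], 0.5, size=(30, 2))   # class 0 cluster
    X1 = rng.normal([1, 1], 0.5, size=(30, 2))     # class 1 cluster
    X = np.vstack([X0, X1])
    t = np.r_[np.zeros(30), np.ones(30)]

    w_before = fit_ls(X, t)

    # Add "easy" class-1 points far from the boundary, on the correct side
    X_extra = rng.normal([7, 7], 0.5, size=(10, 2))
    w_after = fit_ls(np.vstack([X, X_extra]), np.r_[t, np.ones(10)])

    print(w_before, w_after)   # the boundary parameters shift noticeably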
More Least Squares Problems
[Figure: three classes in 2-D that are easily separated by hyperplanes; the least squares solution assigns almost no points to the middle class.]
• The classes are easily separated by hyperplanes, but this separation is not found using least squares!
• We’ll address these problems later with better models
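This failure (often called “masking”) shows up even in one dimension; the sketch below is a 1-D analogue of the slide’s figure, using assumed synthetic class boundaries:

    import numpy as np

    x = np.linspace(-6, 6, 600).reshape(-1, 1)
    labels = np.digitize(x.ravel(), [-1.0, 1.0])   # classes 0 | 1 | 2 along the line
    T = np.eye(3)[labels]

    X_tilde = np.hstack([np.ones_like(x), x])
    W = np.linalg.pinv(X_tilde) @ T
    pred = np.argmax(X_tilde @ W, axis=1)
    print(np.unique(pred))   # prints [0 2]: the middle class is never predicted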