Linear Models for Classification
Oliver Schulte, CMPT 726
Bishop, PRML Ch. 4

Classification: Hand-written Digit Recognition
[Figure: an input image $\mathbf{x}_i$ of a hand-written digit with its target vector $\mathbf{t}_i = (0,0,0,1,0,0,0,0,0,0)$]
• Each input vector is classified into one of $K$ discrete classes
• Denote classes by $\mathcal{C}_k$
• Represent the input image as a vector $\mathbf{x}_i \in \mathbb{R}^{784}$
• We have a target vector $\mathbf{t}_i \in \{0,1\}^{10}$
• Given a training set $\{(\mathbf{x}_1,\mathbf{t}_1), \ldots, (\mathbf{x}_N,\mathbf{t}_N)\}$, the learning problem is to construct a "good" function $y(\mathbf{x})$ from these
• $y: \mathbb{R}^{784} \to \mathbb{R}^{10}$
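The target vectors use a 1-of-K coding: $\mathbf{t}_i$ has a single 1 in the position of the true class and 0 everywhere else. A minimal sketch in plain Python, assuming the ten classes are indexed 0 to 9 (the helpers `one_hot` and `decode` are hypothetical names chosen for illustration):

```python
def one_hot(label, num_classes=10):
    """Return the 1-of-K target vector for an integer class label."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    t = [0] * num_classes
    t[label] = 1
    return t


def decode(t):
    """Recover the class index from a 1-of-K target vector."""
    return t.index(1)


print(one_hot(3))          # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(decode(one_hot(7)))  # 7
```

Under this indexing assumption, the example target on the slide corresponds to class index 3.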

Generalized Linear Models
• Similar to the previous chapter on linear models for regression, we will use a "linear" model for classification: $y(\mathbf{x}) = f(\mathbf{w}^T\mathbf{x} + w_0)$
• This is called a generalized linear model
• $f(\cdot)$ is a fixed non-linear function, e.g. $f(u) = 1$ if $u \ge 0$ and $f(u) = 0$ otherwise
• The decision boundary between classes will be a linear function of $\mathbf{x}$
• We can also apply a non-linearity to $\mathbf{x}$ first, as with the basis functions $\phi_i(\mathbf{x})$ used for regression
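A small sketch of a generalized linear model with the step function above as $f$. The weights are hypothetical values chosen only to make the arithmetic easy to follow:

```python
def step(u):
    """Fixed non-linearity f(u): 1 if u >= 0, else 0."""
    return 1 if u >= 0 else 0


def glm_predict(w, w0, x, f=step):
    """Generalized linear model y(x) = f(w^T x + w0)."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return f(activation)


w, w0 = [1.0, -1.0], 0.5                 # hypothetical weights
print(glm_predict(w, w0, [2.0, 1.0]))    # activation 1.5  -> 1
print(glm_predict(w, w0, [0.0, 2.0]))    # activation -1.5 -> 0
```

Because $f$ only thresholds the activation, the set where the prediction changes, $\{\mathbf{x} : \mathbf{w}^T\mathbf{x} + w_0 = 0\}$, is a line (a hyperplane in higher dimensions), even though $y$ itself is non-linear.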

Overview
• Linear regression for classification
• The Fisher Linear Discriminant, or How to Draw a Line Between Classes
• The Perceptron, or The Smallest Neural Net
• Logistic Regression, or The Statistician's Classifier

Outline
• Discriminant Functions
• Generative Models
• Discriminative Models

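Discriminant Functions with Two Classes
• Start with the 2-class problem, $t_i \in \{0, 1\}$
• Simple linear discriminant $y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0$; apply a threshold function to get a classification
• The decision surface $y(\mathbf{x}) = 0$ is a line (a hyperplane in higher dimensions) orthogonal to $\mathbf{w}$, at signed distance $-w_0/\|\mathbf{w}\|$ from the origin
• The projection of $\mathbf{x}$ in the $\mathbf{w}$ direction is $\mathbf{w}^T\mathbf{x}/\|\mathbf{w}\|$
[Figure: the $(x_1, x_2)$ plane split by the line $y = 0$ into region $\mathcal{R}_1$ ($y > 0$) and region $\mathcal{R}_2$ ($y < 0$); a point $\mathbf{x}$, its orthogonal projection $\mathbf{x}_\perp$ onto the line, and its signed distance $y(\mathbf{x})/\|\mathbf{w}\|$ from the line]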
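The geometry on this slide can be checked numerically. A sketch, assuming hypothetical weights $\mathbf{w} = (3, 4)$ and $w_0 = -10$ so that $\|\mathbf{w}\| = 5$, and assuming points with $y(\mathbf{x}) = 0$ are assigned to $\mathcal{C}_1$ (the slide does not fix a tie convention):

```python
import math


def linear_discriminant(w, w0, x):
    """y(x) = w^T x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def signed_distance(w, w0, x):
    """Signed perpendicular distance of x from the surface y(x) = 0: y(x) / ||w||."""
    return linear_discriminant(w, w0, x) / math.hypot(*w)


def classify(w, w0, x):
    """Threshold y(x): class C1 (region R1) if y(x) >= 0, else class C2 (region R2)."""
    return "C1" if linear_discriminant(w, w0, x) >= 0 else "C2"


w, w0 = [3.0, 4.0], -10.0                   # hypothetical weights, ||w|| = 5
print(-w0 / math.hypot(*w))                 # surface's distance from origin: 2.0
print(signed_distance(w, w0, [3.0, 4.0]))   # y = 25 - 10 = 15, so 3.0
print(classify(w, w0, [0.0, 0.0]))          # y = -10 -> C2
```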

Multiple Classes
• A linear discriminant between two classes separates them with a hyperplane
• How can we use this for multiple classes?
• One-versus-the-rest method: build $K - 1$ classifiers, each between $\mathcal{C}_k$ and all other classes
• One-versus-one method: build $K(K-1)/2$ classifiers, one between each pair of classes
[Figure: left, one-versus-the-rest with classifiers "$\mathcal{C}_1$ / not $\mathcal{C}_1$" and "$\mathcal{C}_2$ / not $\mathcal{C}_2$" over regions $\mathcal{R}_1, \mathcal{R}_2, \mathcal{R}_3$; right, one-versus-one with pairwise classifiers among $\mathcal{C}_1, \mathcal{C}_2, \mathcal{C}_3$. In both panels a region marked "?" receives an ambiguous classification.]
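The ambiguous "?" region can be reproduced with three pairwise linear classifiers. This sketch assumes the one-versus-one outputs are combined by majority vote (a common choice, not stated on the slide); the classifier dictionary and its thresholds are hypothetical:

```python
def num_classifiers(K):
    """Number of two-class classifiers each scheme builds for K classes."""
    return {"one_vs_rest": K - 1, "one_vs_one": K * (K - 1) // 2}


def one_vs_one_vote(x, pairwise):
    """Majority vote over pairwise classifiers.

    pairwise maps a class pair (j, k) to a function returning True when
    x is judged to be class j rather than class k. Returns every class
    with the top vote count; more than one class means a tie.
    """
    votes = {}
    for (j, k), decide in pairwise.items():
        winner = j if decide(x) else k
        votes[winner] = votes.get(winner, 0) + 1
    top = max(votes.values())
    return sorted(c for c, v in votes.items() if v == top)


# Hypothetical pairwise linear classifiers in 2-D
pairwise = {
    (1, 2): lambda x: x[0] >= 0,
    (2, 3): lambda x: x[1] >= 0,
    (1, 3): lambda x: -x[0] - x[1] >= 0,
}
print(num_classifiers(10))                      # {'one_vs_rest': 9, 'one_vs_one': 45}
print(one_vs_one_vote((1.0, -1.0), pairwise))   # [1]
print(one_vs_one_vote((1.0, 1.0), pairwise))    # [1, 2, 3]: each class gets one vote
```

At $(1, 1)$ the three pairwise decisions form a cycle ($\mathcal{C}_1$ beats $\mathcal{C}_2$, $\mathcal{C}_2$ beats $\mathcal{C}_3$, $\mathcal{C}_3$ beats $\mathcal{C}_1$), so the vote cannot pick a class.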

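Multiple Classes
• A solution is to build $K$ linear functions $y_k(\mathbf{x}) = \mathbf{w}_k^T\mathbf{x} + w_{k0}$ and assign $\mathbf{x}$ to class $\arg\max_k y_k(\mathbf{x})$
• This gives connected, convex decision regions: take $\mathbf{x}_A, \mathbf{x}_B \in \mathcal{R}_k$ and $\hat{\mathbf{x}} = \lambda\mathbf{x}_A + (1-\lambda)\mathbf{x}_B$ with $0 \le \lambda \le 1$. By linearity, $y_k(\hat{\mathbf{x}}) = \lambda y_k(\mathbf{x}_A) + (1-\lambda) y_k(\mathbf{x}_B)$, and likewise for every $y_j$. Since $y_k > y_j$ at both $\mathbf{x}_A$ and $\mathbf{x}_B$, it follows that $y_k(\hat{\mathbf{x}}) > y_j(\hat{\mathbf{x}})$ for all $j \ne k$, so $\hat{\mathbf{x}} \in \mathcal{R}_k$
[Figure: decision regions $\mathcal{R}_i$, $\mathcal{R}_j$, $\mathcal{R}_k$, with points $\mathbf{x}_A$ and $\mathbf{x}_B$ in $\mathcal{R}_k$ and $\hat{\mathbf{x}}$ on the segment between them]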
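A sketch of the $K$-function classifier, with a numerical spot check of the convexity argument on sampled points along a segment (an illustration, not a proof). The 3-class weights and the helper names are hypothetical:

```python
def scores(W, b, x):
    """y_k(x) = w_k^T x + w_k0 for every class k."""
    return [sum(wi * xi for wi, xi in zip(wk, x)) + bk for wk, bk in zip(W, b)]


def classify(W, b, x):
    """Assign x to the class k with the largest y_k(x)."""
    ys = scores(W, b, x)
    return max(range(len(ys)), key=ys.__getitem__)


def segment_stays_in_region(W, b, xA, xB, steps=10):
    """Check that sampled points x_hat = lam*xA + (1-lam)*xB all get xA's class."""
    k = classify(W, b, xA)
    for i in range(steps + 1):
        lam = i / steps
        x_hat = [lam * a + (1 - lam) * c for a, c in zip(xA, xB)]
        if classify(W, b, x_hat) != k:
            return False
    return True


# Hypothetical 3-class model in 2-D: rows of W are w_k, entries of b are w_k0
W = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0, -1.0]
xA, xB = [2.0, 0.0], [3.0, 1.5]
print(classify(W, b, xA), classify(W, b, xB))   # 0 0
print(segment_stays_in_region(W, b, xA, xB))    # True
```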

Least Squares for Classification
• How do we learn the decision boundaries $(\mathbf{w}_k, w_{k0})$?
• One approach is to use least squares, similar to regression
• Find $\mathbf{W}$ to minimize the squared error over all examples and all components of the label vector:
$$E(\mathbf{W}) = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}\left(y_k(\mathbf{x}_n) - t_{nk}\right)^2$$
• After some algebra, we get a solution using the pseudo-inverse, as in regression
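With each input augmented by a leading 1 for the bias and stacked as the rows of $\tilde{\mathbf{X}}$, and the 1-of-K targets stacked as the rows of $\mathbf{T}$, the pseudo-inverse solution is $\tilde{\mathbf{W}} = (\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}^T\mathbf{T}$. The sketch below solves the equivalent normal equations by Gauss-Jordan elimination in plain Python instead of forming an inverse; the function names and toy data are hypothetical, and a linear-algebra library would normally be used instead:

```python
def solve(A, B):
    """Solve A X = B (A square and invertible) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(ra) + list(rb) for ra, rb in zip(A, B)]   # augmented matrix [A | B]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_least_squares(X, T):
    """Return W~ (rows: bias then features; columns: classes) minimizing E(W)."""
    Xt = [[1.0] + list(x) for x in X]                  # prepend 1 for w_k0
    D, K = len(Xt[0]), len(T[0])
    A = [[sum(r[i] * r[j] for r in Xt) for j in range(D)] for i in range(D)]
    B = [[sum(r[i] * t[k] for r, t in zip(Xt, T)) for k in range(K)] for i in range(D)]
    return solve(A, B)                                 # (X~^T X~) W~ = X~^T T


def predict(W, x):
    """Assign x to the class k with the largest y_k(x)."""
    xt = [1.0] + list(x)
    ys = [sum(W[i][k] * xt[i] for i in range(len(xt))) for k in range(len(W[0]))]
    return max(range(len(ys)), key=ys.__getitem__)


X = [(-2, -1), (-1, -2), (-1.5, -1.5), (2, 1), (1, 2), (1.5, 1.5)]
T = [[1, 0]] * 3 + [[0, 1]] * 3                        # 1-of-K targets
W = fit_least_squares(X, T)
print([predict(W, x) for x in X])                      # [0, 0, 0, 1, 1, 1]
```

For this toy data the fit is exact: $y_1(\mathbf{x}) = 0.5 - (x_1 + x_2)/6$ and $y_2(\mathbf{x}) = 0.5 + (x_1 + x_2)/6$.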

Problems with Least Squares
[Figure: two scatter plots of the same two-class data, each showing the least squares decision boundary; the right panel adds extra, easily classified points far from the boundary]
• Looks okay... the least squares decision boundary is similar to the logistic regression decision boundary (more later)
• Gets worse by adding easy points?!
• Why?
• If the target value is 1, points far from the boundary on the correct side will have a high value of $y(\mathbf{x})$, say 10; the squared error $(10 - 1)^2 = 81$ is large, so the fit, and with it the boundary, is moved to reduce it, even though these points were already classified correctly
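The effect can be seen in one dimension. This sketch assumes targets $t \in \{0, 1\}$, a least squares line $t \approx a + bx$, and a decision threshold at $y = 0.5$; the data and helper names are hypothetical. Adding a single class-1 point far to the right, which is already correctly classified, drags the boundary past a training point:

```python
def fit_line(xs, ts):
    """Ordinary least squares fit of t ~ a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    mt = sum(ts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
    b = sxt / sxx
    return mt - b * mx, b


def boundary(a, b):
    """Decision threshold: the x where a + b*x = 0.5 (midway between targets 0 and 1)."""
    return (0.5 - a) / b


xs, ts = [-2, -1, 1, 2], [0, 0, 1, 1]
print(boundary(*fit_line(xs, ts)))      # 0.0

xs2, ts2 = xs + [30], ts + [1]          # one extra, easily classified class-1 point
a2, b2 = fit_line(xs2, ts2)
print(boundary(a2, b2))                 # about 1.133
print(a2 + b2 * 1 >= 0.5)               # False: the class-1 point x = 1 is now misclassified
```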
