Pattern Recognition 2019
Linear Models for Classification
Ad Feelders (Universiteit Utrecht)
Classification Problems

We are concerned with the problems of
1. predicting the class of an object, on the basis of a number of variables that describe the object;
2. estimating the class probabilities of an object.
These problems are interconnected, since prediction is usually based on the estimated probabilities.
Examples of Classification Problems

- Churn: is the customer going to leave for a competitor?
- SPAM filter: is an e-mail message SPAM or not?
- Medical diagnosis: does the patient have breast cancer?
- Handwritten digit recognition.
Classification Problems

In this kind of classification problem there is a target variable $t$ that assumes values in an unordered discrete set. An important special case is when there are only two classes, in which case we usually choose $t \in \{0, 1\}$.

The goal of a classification procedure is to predict the target value (class label) given a set of input values $x = \{x_1, \ldots, x_D\}$ measured on the same object.
Classification Problems

At a particular point $x$ the value of $t$ is not uniquely determined: it can assume both of its values, with respective probabilities that depend on the location of the point $x$ in the input space. We write

$$y(x) = p(C_1 \mid x) = 1 - p(C_2 \mid x).$$

The goal of a classification procedure is to produce an estimate of $y(x)$ at every input point.
Two types of approaches to classification

- Discriminative models ("regression"; section 4.3).
- Generative models ("density estimation"; section 4.2).
Discriminative Models

Discriminative methods model only the conditional distribution of $t$ given $x$; the probability distribution of $x$ itself is not modeled. For the binary classification problem:

$$y(x) = p(C_1 \mid x) = p(t = 1 \mid x) = f(x, w)$$

where $f(x, w)$ is some deterministic function of $x$.
Discriminative Models

Examples of discriminative classification methods:
- Linear probability model
- Logistic regression
- Feed-forward neural networks
- ...
Generative Models

An alternative paradigm for estimating $y(x)$ is based on density estimation. Here Bayes' theorem

$$y(x) = p(C_1 \mid x) = \frac{p(C_1)\, p(x \mid C_1)}{p(C_1)\, p(x \mid C_1) + p(C_2)\, p(x \mid C_2)}$$

is applied, where the $p(x \mid C_k)$ are the class-conditional probability density functions and the $p(C_k)$ are the unconditional ("prior") probabilities of each class.
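To make the recipe concrete, here is a minimal sketch of a generative classifier with univariate Gaussian class-conditional densities; the priors and Gaussian parameters below are made-up illustrative values, not taken from the slides:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density, used here as the class-conditional p(x|C_k)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Made-up priors p(C_k) and class-conditional parameters (mean, std).
prior = {1: 0.3, 2: 0.7}
params = {1: (5.0, 1.0), 2: (8.0, 2.0)}

def posterior_c1(x):
    """Bayes' theorem: p(C1|x) = p(C1)p(x|C1) / [p(C1)p(x|C1) + p(C2)p(x|C2)]."""
    num = prior[1] * gauss_pdf(x, *params[1])
    den = num + prior[2] * gauss_pdf(x, *params[2])
    return num / den

print(posterior_c1(6.0))  # y(x) for a point between the two class means
```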
Generative Models

Examples of generative classification methods:
- Linear/Quadratic Discriminant Analysis
- Naive Bayes classifier
- ...
Discriminative Models: linear probability model

In the linear probability model, we assume that:

$$p(t = 1 \mid x) = E[t \mid x] = w^\top x$$

Problem: the linear function $w^\top x$ is not guaranteed to produce values between 0 and 1. Negative probabilities and probabilities bigger than 1 go against the axioms of probability.
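A two-line numerical illustration of the problem (the weights are arbitrary values of my own choosing):

```python
import numpy as np

w = np.array([-3.0, 0.2])                 # arbitrary illustrative weights
X = np.array([[1.0, 0.0], [1.0, 30.0]])   # bias column plus one predictor
print(X @ w)                              # [-3.  3.]: "probabilities" outside [0, 1]
```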
Linear response function

[Figure: plot of the linear response function, with fitted values falling below 0 and above 1.]
Logistic regression

Logistic response function:

$$E[t \mid x] = p(t = 1 \mid x) = \frac{e^{w^\top x}}{1 + e^{w^\top x}}$$

or (divide numerator and denominator by $e^{w^\top x}$):

$$p(t = 1 \mid x) = \frac{1}{1 + e^{-w^\top x}} = (1 + e^{-w^\top x})^{-1} \qquad \text{(4.59 and 4.87)}$$
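A minimal numerical check that the two forms of the response function agree (the helper name sigmoid and the example numbers are mine):

```python
import numpy as np

def sigmoid(a):
    """Logistic response function: maps any real activation into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

w = np.array([-3.0, 0.2])             # illustrative weights
x = np.array([1.0, 14.0])             # input with a leading 1 for the bias term
a = w @ x
print(np.exp(a) / (1.0 + np.exp(a)))  # e^{w'x} / (1 + e^{w'x})
print(sigmoid(a))                     # (1 + e^{-w'x})^{-1}, the same value
```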
Logistic Response Function

[Figure: the S-shaped logistic curve, rising from 0 to 1 and crossing 0.5 at $w^\top x = 0$.]
Linearization: the logit transformation

Since $p(t = 1 \mid x)$ and $p(t = 0 \mid x)$ have to add up to one, it follows that:

$$p(t = 0 \mid x) = \frac{1}{1 + e^{w^\top x}}$$

Hence,

$$\frac{p(t = 1 \mid x)}{p(t = 0 \mid x)} = e^{w^\top x}$$

Therefore

$$\ln\left(\frac{p(t = 1 \mid x)}{p(t = 0 \mid x)}\right) = w^\top x$$

The ratio $p(t = 1 \mid x)\,/\,p(t = 0 \mid x)$ is called the odds.
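A quick check (with arbitrary weights) that the log-odds are indeed linear in $x$:

```python
import numpy as np

w0, w1 = -3.0, 0.2                         # arbitrary illustrative weights
x = np.linspace(0.0, 30.0, 4)
p1 = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))  # p(t = 1 | x)
print(np.log(p1 / (1.0 - p1)))             # logit: equals w0 + w1*x ...
print(w0 + w1 * x)                         # ... up to rounding
```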
Linear Separation

Assign to class $t = 1$ if $p(t = 1 \mid x) > p(t = 0 \mid x)$, i.e. if

$$\frac{p(t = 1 \mid x)}{p(t = 0 \mid x)} > 1$$

This is true if

$$\ln\left(\frac{p(t = 1 \mid x)}{p(t = 0 \mid x)}\right) > 0$$

So:

$$\text{Assign to class } t = \begin{cases} 1 & \text{if } w^\top x > 0 \\ 0 & \text{otherwise} \end{cases}$$
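As a sketch, the decision rule in code (the function name and data are mine):

```python
import numpy as np

def classify(X, w):
    """Linear decision rule: predict class 1 where w'x > 0, class 0 otherwise."""
    return (X @ w > 0).astype(int)

w = np.array([-3.0, 0.2])                 # illustrative weights
X = np.array([[1.0, 10.0], [1.0, 20.0]])  # bias column plus one predictor
print(classify(X, w))                     # [0 1]: linear scores -1.0 and 1.0
```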
Maximum Likelihood Estimation

Let $t = 1$ if heads and $t = 0$ if tails, with $\mu = p(t = 1)$.

One coin flip:

$$p(t) = \mu^t (1 - \mu)^{1 - t}$$

Note that $p(1) = \mu$ and $p(0) = 1 - \mu$, as required.

Sequence of $N$ independent coin flips:

$$p(\mathbf{t}) = p(t_1, t_2, \ldots, t_N) = \prod_{n=1}^{N} \mu^{t_n} (1 - \mu)^{1 - t_n}$$

which defines the likelihood function when viewed as a function of $\mu$.
Maximum Likelihood Estimation

In a sequence of 10 coin flips we observe $\mathbf{t} = (1, 0, 1, 1, 0, 1, 1, 1, 1, 0)$. The corresponding likelihood function is

$$p(\mathbf{t} \mid \mu) = \mu \cdot (1 - \mu) \cdot \mu \cdot \mu \cdot (1 - \mu) \cdot \mu \cdot \mu \cdot \mu \cdot \mu \cdot (1 - \mu) = \mu^7 (1 - \mu)^3$$

The corresponding log-likelihood function is

$$\ln p(\mathbf{t} \mid \mu) = \ln\left(\mu^7 (1 - \mu)^3\right) = 7 \ln \mu + 3 \ln(1 - \mu)$$
Computing the maximum

To determine the maximum we take the derivative and equate it to zero:

$$\frac{d \ln p(\mathbf{t} \mid \mu)}{d\mu} = \frac{7}{\mu} - \frac{3}{1 - \mu} = 0$$

which yields the maximum likelihood estimate $\mu_{ML} = 0.7$. This is just the relative frequency of heads in the sample.
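The same maximum can be found numerically; a minimal sketch (the grid search is my choice, purely for illustration):

```python
import numpy as np

t = np.array([1, 0, 1, 1, 0, 1, 1, 1, 1, 0])

def log_lik(mu):
    """Bernoulli log-likelihood 7 ln(mu) + 3 ln(1 - mu) for this sample."""
    return t.sum() * np.log(mu) + (len(t) - t.sum()) * np.log(1.0 - mu)

mus = np.linspace(0.01, 0.99, 99)    # grid of candidate values for mu
print(mus[np.argmax(log_lik(mus))])  # ~0.7, the maximum likelihood estimate
print(t.mean())                      # 0.7, the relative frequency of heads
```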
Loglikelihood function for $\mathbf{t} = (1, 0, 1, 1, 0, 1, 1, 1, 1, 0)$

[Figure: plot of the log-likelihood $7 \ln \mu + 3 \ln(1 - \mu)$ over $\mu \in [0, 1]$, peaking at $\mu = 0.7$.]
ML estimation for logistic regression

Now the probability of success $p(t_n = 1)$ depends on the value of $x_n$:

$$p(t_n = 1 \mid x_n) = (1 + e^{-w^\top x_n})^{-1} = y_n$$
$$p(t_n = 0 \mid x_n) = (1 + e^{w^\top x_n})^{-1} = 1 - y_n$$

We can represent its probability distribution as follows:

$$p(t_n) = y_n^{t_n} (1 - y_n)^{1 - t_n}, \qquad t_n \in \{0, 1\}; \quad n = 1, \ldots, N$$
ML estimation for logistic regression

Example:

 n | x_n | t_n | p(t_n)
---|-----|-----|------------------------------
 1 |  8  |  0  | (1 + e^{w_0 + 8 w_1})^{-1}
 2 | 12  |  0  | (1 + e^{w_0 + 12 w_1})^{-1}
 3 | 15  |  1  | (1 + e^{-w_0 - 15 w_1})^{-1}
 4 | 10  |  1  | (1 + e^{-w_0 - 10 w_1})^{-1}
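Evaluating the four table entries for a given weight vector (the values of $w_0$ and $w_1$ below are arbitrary, chosen only to make the sketch run):

```python
import numpy as np

w0, w1 = -2.0, 0.15                       # arbitrary weights, not fitted
x = np.array([8.0, 12.0, 15.0, 10.0])
t = np.array([0, 0, 1, 1])
y = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))  # y_n = p(t_n = 1 | x_n)
print(np.where(t == 1, y, 1.0 - y))       # p(t_n): probability of the observed class
```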
LR: likelihood function

Since the $t_n$ observations are independent:

$$p(\mathbf{t} \mid w) = \prod_{n=1}^{N} p(t_n) = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n} \qquad (4.89)$$

Or, taking minus the natural log:

$$-\ln p(\mathbf{t} \mid w) = -\ln \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n} = -\sum_{n=1}^{N} \left\{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \right\} \qquad (4.90)$$

This is called the cross-entropy error function.
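The cross-entropy error (4.90), written directly in code (a sketch; the function name and toy data are mine):

```python
import numpy as np

def cross_entropy(w, X, t):
    """Cross-entropy error (4.90): -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)]."""
    y = 1.0 / (1.0 + np.exp(-(X @ w)))  # y_n = p(t_n = 1 | x_n)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1.0 - y))

# Toy data from the four-row example above, with a bias column of ones.
X = np.array([[1.0, 8.0], [1.0, 12.0], [1.0, 15.0], [1.0, 10.0]])
t = np.array([0, 0, 1, 1])
print(cross_entropy(np.array([-2.0, 0.15]), X, t))
```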
LR: error function

Since for the logistic regression model

$$y_n = (1 + e^{-w^\top x_n})^{-1}, \qquad 1 - y_n = (1 + e^{w^\top x_n})^{-1}$$

we get

$$E(w) = \sum_{n=1}^{N} \left\{ t_n \ln(1 + e^{-w^\top x_n}) + (1 - t_n) \ln(1 + e^{w^\top x_n}) \right\}$$

- Non-linear function of the parameters.
- No closed-form solution.
- Error function is globally convex.
- Estimate with e.g. gradient descent, as sketched below.
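A minimal gradient-descent sketch. It relies on the standard gradient of the cross-entropy error, $\nabla E(w) = \sum_n (y_n - t_n) x_n$ (Bishop, eq. 4.91); the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def fit_logistic(X, t, lr=0.005, n_iter=10000):
    """Fit logistic regression by batch gradient descent on the cross-entropy error."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        y = 1.0 / (1.0 + np.exp(-(X @ w)))  # y_n = p(t_n = 1 | x_n)
        w -= lr * X.T @ (y - t)             # step along -grad E(w) = -sum (y_n - t_n) x_n
    return w

# Toy data from the four-row example above, with a bias column of ones.
X = np.array([[1.0, 8.0], [1.0, 12.0], [1.0, 15.0], [1.0, 10.0]])
t = np.array([0, 0, 1, 1])
print(fit_logistic(X, t))                   # estimated (w_0, w_1)
```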
Fitted Response Function

Substitute the maximum likelihood estimates into the response function to obtain the fitted response function:

$$\hat{p}(t = 1 \mid x) = \frac{e^{w_{ML}^\top x}}{1 + e^{w_{ML}^\top x}}$$
Example: Programming Assignment

Model the probability of successfully completing a programming assignment. Explanatory variable: "programming experience" (in months). We find $w_0 = -3.0597$ and $w_1 = 0.1615$, so

$$\hat{p}(t = 1 \mid x_n) = \frac{e^{-3.0597 + 0.1615 x_n}}{1 + e^{-3.0597 + 0.1615 x_n}}$$

For 14 months of programming experience:

$$\hat{p}(t = 1 \mid x = 14) = \frac{e^{-3.0597 + 0.1615 (14)}}{1 + e^{-3.0597 + 0.1615 (14)}} \approx 0.31$$
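Reproducing the 0.31 with the fitted weights from the slide:

```python
import numpy as np

w0, w1 = -3.0597, 0.1615  # fitted weights from the example
x = 14                    # months of programming experience
p_hat = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))
print(round(p_hat, 2))    # 0.31
```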
Interpretation of weights

In the case of a single predictor variable, the odds of $t = 1$ are given by:

$$\frac{p(t = 1 \mid x)}{p(t = 0 \mid x)} = e^{w_0 + w_1 x}$$

If we increase $x$ by 1 unit, the odds become

$$e^{w_0 + w_1 (x + 1)} = e^{w_0 + w_1 x + w_1} = e^{w_0 + w_1 x} \, e^{w_1},$$

since $e^{a+b} = e^a \times e^b$. We have $e^{w_1} = e^{0.1615} \approx 1.175$. Hence, every extra month of programming experience increases the odds of success by $17.5\%$.
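Checking the odds-ratio interpretation numerically (the helper name odds is mine):

```python
import numpy as np

w0, w1 = -3.0597, 0.1615  # fitted weights from the example

def odds(x):
    """Odds of success: p(t = 1 | x) / p(t = 0 | x) = e^{w0 + w1 x}."""
    return np.exp(w0 + w1 * x)

print(odds(15) / odds(14))  # ~1.175: one extra month multiplies the odds by e^{w1}
print(np.exp(w1))           # the same factor, independent of x
```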