Pattern Recognition
Linear Models for Classification: Extra Slides

Ad Feelders (Universiteit Utrecht)
Maximum Likelihood Estimation: Coin Tossing

Y = 1 if heads, Y = 0 if tails. β = Pr(Y = 1).

In a sequence of 10 coin flips we observe y = (1, 0, 1, 1, 0, 1, 1, 1, 1, 0).

The likelihood function is

  ℓ(β) = β · (1 − β) · β · β · (1 − β) · β · β · β · β · (1 − β) = β⁷(1 − β)³

The corresponding log-likelihood function is

  ln ℓ(β) = ln(β⁷(1 − β)³) = 7 ln β + 3 ln(1 − β)
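The maximizer of this log-likelihood can also be checked numerically. A minimal Python sketch (a grid search over β, added here purely as an illustration, not part of the original slides):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 1, 1, 1, 1, 0])  # the observed coin flips

def log_lik(beta):
    # log-likelihood: n1 * ln(beta) + n0 * ln(1 - beta)
    n1 = y.sum()
    n0 = len(y) - n1
    return n1 * np.log(beta) + n0 * np.log(1 - beta)

# evaluate on a fine grid over (0, 1); the maximum sits at beta = 0.7
grid = np.linspace(0.01, 0.99, 9801)
beta_hat = grid[np.argmax(log_lik(grid))]
print(round(beta_hat, 2))  # 0.7
```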
Computing the maximum

To determine the maximum we take the derivative and equate it to zero:

  d ln ℓ(β)/dβ = 7/β − 3/(1 − β) = 0,

which yields maximum likelihood estimate β̂ = 0.7. This is the relative frequency of heads in the sample.

Show that in general β̂ = n₁/n, where n is the number of coin tosses, and n₁ is the number of times heads comes up.
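For the general case, the derivation mirrors the n = 10 example, with 7 and 3 replaced by n₁ and n − n₁; a sketch:

```latex
\ln \ell(\beta) = n_1 \ln \beta + (n - n_1) \ln(1 - \beta)

\frac{d \ln \ell(\beta)}{d\beta}
  = \frac{n_1}{\beta} - \frac{n - n_1}{1 - \beta} = 0
\;\Longrightarrow\;
n_1 (1 - \beta) = (n - n_1)\,\beta
\;\Longrightarrow\;
\hat{\beta} = \frac{n_1}{n}
```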
ML estimation for logistic regression

In the logistic regression model, the probability of “heads” is assumed to depend on x in the following way:

  Pr(Y = 1 | X = x) = e^(β₀ + β₁x) / (1 + e^(β₀ + β₁x))    (1)

  Pr(Y = 0 | X = x) = 1 / (1 + e^(β₀ + β₁x))    (2)

Given observations Yᵢ and Xᵢ (i = 1, …, n), if Yᵢ = 1 then (1) enters into the likelihood function, and if Yᵢ = 0 then (2) enters into the likelihood function.

There is no closed-form solution for the maximum likelihood estimates of β₀ and β₁ in this case. Except for some pathological cases, the likelihood function is concave, so there is a unique global maximum.
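Although there is no closed form, the concave log-likelihood is straightforward to maximize iteratively. A hand-rolled Newton-Raphson sketch in Python (the toy data are invented for illustration; the slides themselves use R):

```python
import numpy as np

# toy data: predictor x and binary outcomes y (illustrative, not from the slides)
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])

X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept column
beta = np.zeros(2)                          # start at (beta0, beta1) = (0, 0)

# Newton-Raphson: since the log-likelihood is concave, these updates
# converge to the unique global maximum
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))         # Pr(Y = 1 | X = x) under current beta
    grad = X.T @ (y - p)                    # gradient of the log-likelihood
    W = np.diag(p * (1 - p))
    hess = -X.T @ W @ X                     # Hessian (negative definite)
    beta = beta - np.linalg.solve(hess, grad)

print(beta)  # maximum likelihood estimates (beta0_hat, beta1_hat)
```

At the maximum the gradient X'(y − p) vanishes, which is a convenient convergence check.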
Multinomial Logit in R

# load training data
> optdigits.train <- read.csv("D:/Pattern Recognition/Datasets/optdigits-tra.txt", header=F)
# convert class label to factor
> optdigits.train[,65] <- as.factor(optdigits.train[,65])
# same for test data
> optdigits.test <- read.csv("D:/Pattern Recognition/Datasets/optdigits-tes.txt", header=F)
> optdigits.test[,65] <- as.factor(optdigits.test[,65])
Multinomial Logit in R

# load nnet library
> library(nnet)
# fit multinomial logistic regression model
# columns 1 and 40 are not used (always 0)
> optdigits.multinom <- multinom(V65 ~ ., data = optdigits.train[,-c(1,40)], maxit = 1000)
# weights: 640 (567 variable)
initial value 8802.782811
...
converged
# predict class label on training data
> optdigits.multinom.pred <- predict(optdigits.multinom, optdigits.train[,-c(1,40,65)], type="class")
Multinomial Logit in R

# make confusion matrix: true label vs. predicted label
> table(optdigits.train[,65], optdigits.multinom.pred)
   optdigits.multinom.pred
      0   1   2   3   4   5   6   7   8   9
  0 376   0   0   0   0   0   0   0   0   0
  1   0 389   0   0   0   0   0   0   0   0
  2   0   0 380   0   0   0   0   0   0   0
  3   0   0   0 389   0   0   0   0   0   0
  4   0   0   0   0 387   0   0   0   0   0
  5   0   0   0   0   0 376   0   0   0   0
  6   0   0   0   0   0   0 377   0   0   0
  7   0   0   0   0   0   0   0 387   0   0
  8   0   0   0   0   0   0   0   0 380   0
  9   0   0   0   0   0   0   0   0   0 382
Multinomial Logit in R

# predict class label on test data
> optdigits.multinom.test.pred <- predict(optdigits.multinom, optdigits.test[,-c(1,40,65)], type="class")
> table(optdigits.test[,65], optdigits.multinom.test.pred)
   optdigits.multinom.test.pred
      0   1   2   3   4   5   6   7   8   9
  0 170   1   0   0   1   6   0   0   0   0
  1   1 170   0   0   4   1   3   1   1   1
  2   4   7 157   1   0   0   6   1   1   0
  3   0   0  10 155   0   2   2   8   3   3
  4   0   8   0   0 153   1   9   3   1   6
  5   0   0   1   5   1 173   0   1   0   1
  6   4   2   0   0   4   3 168   0   0   0
  7   0   0   4   0   2  17   2 149   0   5
  8   2   5   0   7   3   5   5   4 142   1
  9   1   6   0   0   2   5   0   4   3 159
Multinomial Logit in R

# make confusion matrix for predictions on test data
> confmat <- table(optdigits.test[,65], optdigits.multinom.test.pred)
# use it to compute accuracy on test data
> sum(diag(confmat))/sum(confmat)
[1] 0.888147

The accuracy on the test sample is about 89%.
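As a cross-check, the same accuracy can be recomputed in Python from the test-set confusion matrix shown above (accuracy = trace of the matrix divided by its total):

```python
import numpy as np

# test-set confusion matrix from the previous slide (rows: true, cols: predicted)
confmat = np.array([
    [170,   1,   0,   0,   1,   6,   0,   0,   0,   0],
    [  1, 170,   0,   0,   4,   1,   3,   1,   1,   1],
    [  4,   7, 157,   1,   0,   0,   6,   1,   1,   0],
    [  0,   0,  10, 155,   0,   2,   2,   8,   3,   3],
    [  0,   8,   0,   0, 153,   1,   9,   3,   1,   6],
    [  0,   0,   1,   5,   1, 173,   0,   1,   0,   1],
    [  4,   2,   0,   0,   4,   3, 168,   0,   0,   0],
    [  0,   0,   4,   0,   2,  17,   2, 149,   0,   5],
    [  2,   5,   0,   7,   3,   5,   5,   4, 142,   1],
    [  1,   6,   0,   0,   2,   5,   0,   4,   3, 159],
])

# correct predictions are on the diagonal
accuracy = np.trace(confmat) / confmat.sum()
print(round(accuracy, 6))  # 0.888147
```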