CS 7616 Pattern Recognition: Linear, Linear, Linear…
Aaron Bobick, School of Interactive Computing
Administrivia
• First problem set will be out tonight (Thurs 1/23). Due in a bit more than one week, Sunday Feb 2 (touchdown…), 11:55pm.
• General description: for a trio of data sets (one common, one from the sets we provide, one from those sets or your own), use parametric density estimation for normal densities to find the best result. Use both MLE and Bayesian estimation.
• But the next one may be out before this one is due.
Today brought to you by…
• Some materials borrowed from Jie Lu, Joy, Lucian @ CMU, Geoff Hinton (U Toronto), and Reza Shadmehr (Hopkins)
Outline for “today”
• We have seen linear discriminants arise in the case of normal distributions. (When?)
• Now we’ll approach from another way:
  • Linear regression – really least squares
  • “Hat” operator
  • From regression to classification: indicator matrix
  • Logistic regression – which is not regression but classification
  • Reduced-rank linear discriminants – Fisher Linear Discriminant Analysis
Jumping ahead…
• Last time: regression and some discussion of discriminants from normal distributions.
• This time: logistic regression and Fisher LDA
First regression
• Let $X = (X_1, X_2, \ldots, X_p)^T$ be a random vector; $x_i$ is the i-th observation vector. Let $y_i$ be a real value associated with $x_i$.
• Assume we want to build a predictor of y based upon a linear model: $f(x) = \beta_0 + \sum_{j=1}^p x_j \beta_j$
• Choose $\beta$ such that the residual sum of squares is smallest:
  $RSS(\beta) = \sum_{i=1}^N \left( y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j \right)^2$
Linear regression
• Easy to do with vector notation: let $\mathbf{X}$ be an $N \times (p+1)$ matrix where each row is $(1, x_i^T)$ (why p+1? the leading 1 carries the intercept $\beta_0$). Let $\mathbf{y}$ be an N-long column vector of outputs. Then:
  $RSS(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta)$
• Want to minimize this. How? Differentiate:
  $\frac{\partial RSS}{\partial \beta} = -2\,\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta)$
Continuing…
• Setting the derivative to zero: $\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta) = 0$
• Solving: $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$
• Predicting at a new input $x_0$: $\hat{y}_0 = x_0^T \hat{\beta}$
• Could now predict the original y’s: $\hat{\mathbf{y}} = \mathbf{X}\hat{\beta} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$
• The matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is called H for “hat” – it puts the hat on $\mathbf{y}$.
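A minimal numpy sketch of these steps (not from the slides; the data, sizes, and variable names are illustrative assumptions):

```python
import numpy as np

# Synthetic data: N points, p features
rng = np.random.default_rng(0)
N, p = 100, 3
X_raw = rng.normal(size=(N, p))
true_beta = np.array([2.0, -1.0, 0.5, 3.0])      # [beta_0, beta_1, ..., beta_p]
X = np.hstack([np.ones((N, 1)), X_raw])           # prepend a column of 1s for the intercept
y = X @ true_beta + 0.1 * rng.normal(size=N)

# Normal-equation solution: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The "hat" matrix H = X (X^T X)^{-1} X^T maps y to the fitted values y_hat
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y                                      # identical to X @ beta_hat

# Predicting at a new input x_0 (leading 1 included)
x0 = np.array([1.0, 0.2, -0.3, 1.5])
y0_hat = x0 @ beta_hat

print(beta_hat)                                    # should be close to true_beta
print(np.allclose(y_hat, X @ beta_hat))            # True
```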
Two views of regression
Linear Methods for Classification
• What are they? Methods that give linear decision boundaries between classes, e.g. $\{x : \beta_0 + \beta_1^T x = 0\}$
• How to define decision boundaries? Two classes of methods:
  • Model discriminant functions $\delta_k(x)$ for each class as linear
  • Model the boundaries between classes as linear
Two Classes of Linear Methods
• Model discriminant functions $\delta_k(x)$ for each class as linear; choose the k for which $\delta_k(x)$ is largest.
  • Different models/methods:
    • Linear regression fit to the class indicator variables
    • Linear discriminant analysis (LDA)
    • Logistic regression (LOGREG)
• Model the boundaries between classes as linear (will be discussed later in class)
  • Perceptron
  • Support vector classifier (SVM)
Linear Regression Fit to the Class Indicator Variables
• Linear model for the k-th indicator response variable: $\hat{f}_k(x) = \hat{\beta}_{k0} + \hat{\beta}_k^T x$
• Decision boundary between classes k and l is the set of points
  $\{x : \hat{f}_k(x) = \hat{f}_l(x)\} = \{x : (\hat{\beta}_{k0} - \hat{\beta}_{l0}) + (\hat{\beta}_k - \hat{\beta}_l)^T x = 0\}$
• Linear discriminant function for class k: $\delta_k(x) = \hat{f}_k(x)$
Linear Regression Fit to the Class Indicator Variables
• Let Y be a vector whose k-th element $Y_k$ is 1 if the class of the corresponding input is k, and zero otherwise. This vector Y is an indicator vector.
• For a set of N training points we can stack the Y’s into an $N \times K$ matrix $\mathbf{Y}$ such that each row is the Y for a single input. In this case each column is a different indicator function to be learned – a different regression problem (see the sketch below).
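A small sketch of how that stacked indicator matrix might be built (the function name and label encoding are illustrative, not from the slides):

```python
import numpy as np

def indicator_matrix(labels, K):
    """Stack one-hot indicator vectors into an N x K matrix Y.
    Row i has a 1 in the column of example i's class, 0 elsewhere."""
    labels = np.asarray(labels)
    Y = np.zeros((labels.shape[0], K))
    Y[np.arange(labels.shape[0]), labels] = 1.0
    return Y

# e.g. four training points drawn from three classes labelled 0, 1, 2
print(indicator_matrix([0, 2, 1, 1], K=3))
```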
Linear Regression Fit to the Class Indicator Variables
• Best linear fit: for a single column $\mathbf{y}_k$ we know how to solve this: $\hat{\beta}_k = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}_k$
• So for the stacked $\mathbf{Y}$: $\hat{\mathbf{B}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$, and $\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}}$
Linear Regression Fit to the Class Indicator Variables
• So given the columns of weights $\hat{\mathbf{B}}$ (just the stacked columns $\hat{\beta}_k$)
• Compute the discriminant functions as a row vector: $\hat{f}(x)^T = (1, x^T)\,\hat{\mathbf{B}}$, a K-vector
• And choose class k for whichever $\hat{f}_k(x)$ is largest (a minimal sketch follows)
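A self-contained numpy sketch of the whole indicator-regression classifier, under the assumptions above (the toy data, function names, and class layout are mine, not the course's):

```python
import numpy as np

def fit_indicator_regression(X_raw, labels, K):
    """Least-squares fit of each class-indicator column.
    Returns B_hat of shape (p+1, K); column k holds (beta_k0, beta_k)."""
    N = X_raw.shape[0]
    X = np.hstack([np.ones((N, 1)), X_raw])          # add intercept column
    Y = np.zeros((N, K))
    Y[np.arange(N), labels] = 1.0                     # N x K indicator matrix
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)         # (X^T X)^{-1} X^T Y
    return B_hat

def classify(B_hat, x_new):
    """Evaluate f_hat(x) = (1, x^T) B_hat and pick the largest component."""
    f = np.concatenate(([1.0], x_new)) @ B_hat
    return int(np.argmax(f))

# Toy example: two well-separated 2-D classes
rng = np.random.default_rng(1)
X0 = rng.normal(loc=[-2, 0], scale=0.5, size=(50, 2))
X1 = rng.normal(loc=[+2, 0], scale=0.5, size=(50, 2))
X_raw = np.vstack([X0, X1])
labels = np.array([0] * 50 + [1] * 50)

B_hat = fit_indicator_regression(X_raw, labels, K=2)
print(classify(B_hat, np.array([-1.5, 0.2])))   # expected: 0
print(classify(B_hat, np.array([+1.8, -0.1])))  # expected: 1
```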
Linear Regression Fit to the Class Indicator Variables
• So why is this a good idea? Or is it?
• This is actually a sum-of-squares approach: define the class indicator as a target value of 1 or 0. The goal is to fit each class target function as well as possible.
• How well does it work?
  • Pretty well when K = 2 (number of classes)
  • But…
Linear Regression Fit to the Class Indicator Variables
• Problem
  – When K ≥ 3, classes can be masked by others
  – Because of the rigid nature of the regression model (see the figure and demonstration below):
Linear Regression Fit to the Class Indicator Variables
[Figure: indicator-variable fits – linear vs. quadratic polynomials]
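To see the masking effect numerically, here is a small hypothetical experiment (not from the slides): three classes strung out along a line in 2-D are fit with linear indicator regression, and the middle class is essentially never predicted because its fitted indicator is nearly flat while the outer two cross over it.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n_per_class = 3, 200

# Three classes along the x-axis; the middle class sits between the outer two.
means = np.array([[-4.0, 0.0], [0.0, 0.0], [4.0, 0.0]])
X_raw = np.vstack([rng.normal(loc=m, scale=1.0, size=(n_per_class, 2)) for m in means])
labels = np.repeat(np.arange(K), n_per_class)

# Linear regression onto the indicator matrix
N = X_raw.shape[0]
X = np.hstack([np.ones((N, 1)), X_raw])
Y = np.zeros((N, K))
Y[np.arange(N), labels] = 1.0
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Predict by taking the largest fitted indicator; count how often each class wins.
pred = np.argmax(X @ B_hat, axis=1)
print(np.bincount(pred, minlength=K))   # class 1 (the middle class) is rarely, if ever, chosen
```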
Linear Discriminant Analysis (Common Covariance Matrix Σ)
• Model the class-conditional density of X in class k as multivariate Gaussian:
  $f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu_k)^T \Sigma^{-1} (x-\mu_k)}$
• Class posterior:
  $\Pr(G = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$
• Decision boundary between classes k and l is the set of points
  $\{x : \Pr(G=k \mid X=x) = \Pr(G=l \mid X=x)\} = \left\{x : \log\frac{\Pr(G=k \mid X=x)}{\Pr(G=l \mid X=x)} = 0\right\}$
  $= \left\{x : \log\frac{\pi_k}{\pi_l} - \tfrac{1}{2}(\mu_k + \mu_l)^T \Sigma^{-1}(\mu_k - \mu_l) + x^T \Sigma^{-1}(\mu_k - \mu_l) = 0\right\}$
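Why is that boundary linear? Because the two Gaussians share Σ, the quadratic terms in x cancel in the log posterior ratio. A short worked derivation (standard, not shown on the slide):

```latex
\begin{aligned}
\log\frac{\Pr(G=k \mid X=x)}{\Pr(G=l \mid X=x)}
  &= \log\frac{f_k(x)}{f_l(x)} + \log\frac{\pi_k}{\pi_l} \\
  &= -\tfrac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)
     + \tfrac{1}{2}(x-\mu_l)^T\Sigma^{-1}(x-\mu_l)
     + \log\frac{\pi_k}{\pi_l} \\
  &= x^T\Sigma^{-1}(\mu_k-\mu_l)
     - \tfrac{1}{2}(\mu_k+\mu_l)^T\Sigma^{-1}(\mu_k-\mu_l)
     + \log\frac{\pi_k}{\pi_l}
\end{aligned}
```

The result is linear in x, so the set where it equals zero is a hyperplane.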
Linear Discriminant Analysis (Common Σ) con’t
• Linear discriminant function for class k:
  $\delta_k(x) = x^T \Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \log\pi_k$
• Classify to the class with the largest value of $\delta_k(x)$: $\hat{G}(x) = \arg\max_k \delta_k(x)$
• Parameter estimation
  • Objective function:
    $\hat{\beta} = \arg\max_\beta \sum_{i=1}^N \log\Pr(x_i, y_i) = \arg\max_\beta \sum_{i=1}^N \log\left[\Pr(x_i \mid y_i)\Pr(y_i)\right]$
  • Estimated parameters:
    $\hat{\pi}_k = N_k / N$
    $\hat{\mu}_k = \sum_{g_i = k} x_i / N_k$
    $\hat{\Sigma} = \sum_{k=1}^K \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T / (N - K)$
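A compact numpy sketch of the plug-in estimates and the discriminant rule above (toy data and function names are illustrative assumptions, not the course's code):

```python
import numpy as np

def fit_lda(X, labels, K):
    """Estimate LDA parameters: class priors pi_k, class means mu_k,
    and a pooled (common) covariance Sigma, as on the slide."""
    N, p = X.shape
    pi = np.zeros(K)
    mu = np.zeros((K, p))
    Sigma = np.zeros((p, p))
    for k in range(K):
        Xk = X[labels == k]
        pi[k] = Xk.shape[0] / N
        mu[k] = Xk.mean(axis=0)
        Sigma += (Xk - mu[k]).T @ (Xk - mu[k])
    Sigma /= (N - K)                                   # pooled covariance estimate
    return pi, mu, Sigma

def lda_predict(x, pi, mu, Sigma):
    """delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k + log pi_k."""
    Sinv_mu = np.linalg.solve(Sigma, mu.T)             # p x K; columns are Sigma^{-1} mu_k
    deltas = x @ Sinv_mu - 0.5 * np.sum(mu.T * Sinv_mu, axis=0) + np.log(pi)
    return int(np.argmax(deltas))

# Toy usage: two Gaussian classes sharing a covariance
rng = np.random.default_rng(3)
X0 = rng.normal(loc=[0, 0], size=(60, 2))
X1 = rng.normal(loc=[3, 3], size=(60, 2))
X = np.vstack([X0, X1]); labels = np.array([0] * 60 + [1] * 60)
pi, mu, Sigma = fit_lda(X, labels, K=2)
print(lda_predict(np.array([0.2, -0.1]), pi, mu, Sigma))  # expected: 0
print(lda_predict(np.array([2.8, 3.1]), pi, mu, Sigma))   # expected: 1
```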
More on being linear…
The planar decision surface in data-space for the simple linear discriminant function:
  $\mathbf{w}^T \mathbf{x} + w_0 \ge 0$
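As a tiny illustration of that surface, the sketch below checks which side of the hyperplane $\mathbf{w}^T\mathbf{x} + w_0 = 0$ a point falls on (the values of w and w_0 are made up for the example):

```python
import numpy as np

w = np.array([1.0, -2.0])     # normal to the decision plane (illustrative values)
w0 = 0.5                      # bias / offset

def side(x, w, w0):
    """Return +1 if w^T x + w_0 >= 0 (positive side of the plane), else -1.
    The signed distance to the plane is (w^T x + w_0) / ||w||."""
    return 1 if w @ x + w0 >= 0 else -1

print(side(np.array([2.0, 0.0]), w, w0))   # w^T x + w0 = 2.5  -> +1
print(side(np.array([0.0, 2.0]), w, w0))   # w^T x + w0 = -3.5 -> -1
```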
Gaussian Linear Discriminant Analysis with Common Covariance Matrix (GDA)
• Model the class-conditional density of X in class k as multivariate Gaussian:
  $f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu_k)^T \Sigma^{-1} (x-\mu_k)}$
• Class posterior:
  $\Pr(C = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$
• Decision boundary between classes k and l is the set of points
  $\{x : \Pr(C=k \mid X=x) = \Pr(C=l \mid X=x)\} = \left\{x : \log\frac{\Pr(C=k \mid X=x)}{\Pr(C=l \mid X=x)} = 0\right\}$
  $= \left\{x : \log\frac{\Pr(C_k)}{\Pr(C_l)} - \tfrac{1}{2}(\mu_k + \mu_l)^T \Sigma^{-1}(\mu_k - \mu_l) + x^T \Sigma^{-1}(\mu_k - \mu_l) = 0\right\}$
Gaussian Linear Discriminant Analysis with Common Covariance Matrix (GDA)
• Linear discriminant function for class k:
  $\delta_k(x) = x^T \Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \log\Pr(C_k)$
• Classify to the class with the largest value of $\delta_k(x)$
• Parameter estimation (where $y_i$ is the class of $x_i$)
  • Objective function:
    $\hat{\beta} = \arg\max_\beta \sum_{i=1}^N \log\Pr(x_i, y_i) = \arg\max_\beta \sum_{i=1}^N \log\left[\Pr(x_i \mid y_i)\Pr(y_i)\right]$
  • MLE estimated parameters:
    $\widehat{\Pr}(C_k) = N_k / N$
    $\hat{\mu}_k = \sum_{g_i = k} x_i / N_k$
    $\hat{\Sigma} = \sum_{k=1}^K \sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T / (N - K)$