
Linear Discrimination
Steven J. Zeil, Old Dominion Univ., Fall 2010



  1. Linear Discrimination. Steven J. Zeil, Old Dominion Univ., Fall 2010.

  2. Outline: 1. Discriminant-Based Classification (Linearly Separable Systems; Pairwise Separation); 2. Posteriors; 3. Logistic Discrimination.

  3. Discriminant-Based Classification. Likelihood-based: assume a model for $p(\vec{x} \mid C_i)$ and use Bayes' rule to calculate $P(C_i \mid \vec{x})$, giving $g_i(\vec{x}) = \log P(C_i \mid \vec{x})$. Discriminant-based: assume a model directly for $g_i(\vec{x} \mid \phi_i)$. Vapnik: estimating the class densities is a harder problem than estimating the class discriminants; it does not make sense to solve a hard problem in order to solve an easier one.

  4.–7. Linear Discrimination (four incremental builds of one slide). Linear discriminant:

$$g_i(\vec{x} \mid \vec{w}_i, w_{i0}) = \vec{w}_i^T \vec{x} + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}$$

Advantages:
- Simple: O(d) space/computation.
- Knowledge extraction: the weight magnitudes indicate how much each attribute contributes.
- Optimal when the $p(\vec{x} \mid C_i)$ are Gaussian with a shared covariance matrix.
- Useful when classes are (almost) linearly separable.
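
A minimal sketch of evaluating these discriminants with NumPy; the dimensions, weights, and input below are made up for illustration.

```python
import numpy as np

# Evaluate g_i(x) = w_i^T x + w_i0 for all classes at once.
# Hypothetical weights for K = 2 classes over d = 3 attributes.
W = np.array([[ 1.0, -0.5,  0.2],    # w_1
              [-0.3,  0.8,  0.1]])   # w_2
w0 = np.array([0.5, -0.2])           # biases w_10, w_20

x = np.array([1.0, 2.0, -1.0])

g = W @ x + w0    # one dot product per class: O(d) each
print(g)          # choose the class with the largest g_i
```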

  8. More General Linear Models. Starting from

$$g_i(\vec{x} \mid \vec{w}_i, w_{i0}) = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}$$

we can replace the $x_j$ by any linearly independent set of basis functions $\phi_j(\vec{x})$. For two classes, define

$$g(\vec{x}) = g_1(\vec{x}) - g_2(\vec{x}) = \vec{w}^T \vec{x} + w_0$$

and choose $C_1$ if $g(\vec{x}) > 0$, $C_2$ otherwise.
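
A sketch of the basis-function idea; the quadratic basis, the weights, and $w_0$ here are hypothetical, chosen only to show the shape of the computation.

```python
import numpy as np

def phi(x):
    """Map x in R^2 to a basis [x1, x2, x1^2, x2^2, x1*x2]."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x1, x2 * x2, x1 * x2])

w  = np.array([1.0, -2.0, 0.5, 0.5, -1.0])   # hypothetical weights
w0 = -0.25

def classify(x):
    g = w @ phi(x) + w0          # g(x) = g_1(x) - g_2(x), linear in phi(x)
    return "C1" if g > 0 else "C2"

print(classify(np.array([0.5, -0.5])))
```

The model stays linear in the weights, so everything that follows about training still applies; only the feature map changes.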

  9. Geometric Interpretation. Rewrite $\vec{x}$ as

$$\vec{x} = \vec{x}_p + r \, \frac{\vec{w}}{\lVert \vec{w} \rVert}$$

where $\vec{x}_p$ is the projection of $\vec{x}$ onto the hyperplane $g(\vec{x}) = 0$, $\vec{w}$ is normal to the hyperplane, and $r = g(\vec{x}) / \lVert \vec{w} \rVert$ is the (signed) distance from $\vec{x}$ to the hyperplane.
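
A short numeric check of the geometry, with a hypothetical $\vec{w}$ and $w_0$: $r$ is the signed distance, and the recovered projection $\vec{x}_p$ satisfies $g(\vec{x}_p) = 0$.

```python
import numpy as np

w, w0 = np.array([3.0, 4.0]), -5.0
x = np.array([2.0, 1.0])

g = w @ x + w0                        # g(x)
r = g / np.linalg.norm(w)             # signed distance: (6 + 4 - 5)/5 = 1.0
x_p = x - r * w / np.linalg.norm(w)   # projection onto the hyperplane

print(r, w @ x_p + w0)                # g(x_p) is 0 up to rounding
```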

  10. Linearly Separable Systems. For multiple classes with

$$g_i(\vec{x} \mid \vec{w}_i, w_{i0}) = \vec{w}_i^T \vec{x} + w_{i0}$$

and the $\vec{w}_i$ normalized, choose $C_i$ if $g_i(\vec{x}) = \max_{j=1}^{K} g_j(\vec{x})$.
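
A sketch of the max rule for $K = 3$ classes; the weights are made up, and the slide's normalization of the $\vec{w}_i$ is assumed to have been done already.

```python
import numpy as np

W  = np.array([[ 2.0,  0.0],
               [-1.0,  1.0],
               [ 0.0, -2.0]])      # one row w_i per class
w0 = np.array([0.0, 0.5, 1.0])

def choose(x):
    g = W @ x + w0                 # all K discriminants
    return int(np.argmax(g))       # index of the winning class

print(choose(np.array([1.0, 1.0])))   # g = [2.0, 0.5, -1.0] -> class 0
```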

  11. Pairwise Separation. If the classes are not linearly separable as a whole, compute discriminants between each pair of classes:

$$g_{ij}(\vec{x} \mid \vec{w}_{ij}, w_{ij0}) = \vec{w}_{ij}^T \vec{x} + w_{ij0}$$

Choose $C_i$ if $g_{ij}(\vec{x}) > 0$ for all $j \neq i$.
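
A sketch of pairwise separation for $K = 3$: one hypothetical discriminant is stored per unordered pair, with $g_{ji} = -g_{ij}$; a point that wins no class's full set of pairwise tests falls in a reject region.

```python
import numpy as np

K = 3
# Hypothetical (w, w0) per ordered pair (i, j) with i < j.
W = {(0, 1): (np.array([ 1.0, 0.0]), 0.0),
     (0, 2): (np.array([ 0.0, 1.0]), 0.0),
     (1, 2): (np.array([-1.0, 1.0]), 0.0)}

def g(i, j, x):
    if (i, j) in W:
        w, w0 = W[(i, j)]
        return w @ x + w0
    return -g(j, i, x)              # antisymmetry: g_ji = -g_ij

def choose(x):
    for i in range(K):
        if all(g(i, j, x) > 0 for j in range(K) if j != i):
            return i
    return None                     # no class wins every pairwise test

print(choose(np.array([2.0, 1.0])))   # -> 0 for this input
```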

  12. Revisiting Parametric Methods. When $p(\vec{x} \mid C_i) \sim \mathcal{N}(\vec{\mu}_i, \Sigma)$,

$$g_i(\vec{x} \mid \vec{w}_i, w_{i0}) = \vec{w}_i^T \vec{x} + w_{i0}, \quad \vec{w}_i = \Sigma^{-1} \vec{\mu}_i, \quad w_{i0} = -\frac{1}{2} \vec{\mu}_i^T \Sigma^{-1} \vec{\mu}_i + \log P(C_i)$$

Let $y \equiv P(C_1 \mid \vec{x})$, so $P(C_2 \mid \vec{x}) = 1 - y$. We choose $C_1$ if $y > 0.5$, or equivalently if $\frac{y}{1-y} > 1$, or equivalently if $\log \frac{y}{1-y} > 0$. The latter quantity is called the log odds of $y$, or the logit.
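
These closed-form weights are easy to exercise numerically; in the sketch below the covariance, means, and priors are invented for illustration.

```python
import numpy as np

Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
mu    = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
prior = [0.5, 0.5]

Sinv = np.linalg.inv(Sigma)
w  = [Sinv @ m for m in mu]                    # w_i = Sigma^-1 mu_i
w0 = [-0.5 * m @ Sinv @ m + np.log(p)          # w_i0
      for m, p in zip(mu, prior)]

x = np.array([1.0, 0.5])
g = [wi @ x + wi0 for wi, wi0 in zip(w, w0)]
print(np.argmax(g))   # Bayes-optimal class under the Gaussian assumption
```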

  13. Log odds. For two normal classes with a shared covariance matrix, the log odds is linear:

$$\mathrm{logit}(P(C_1 \mid \vec{x})) = \log \frac{P(C_1 \mid \vec{x})}{P(C_2 \mid \vec{x})} = \log \frac{p(\vec{x} \mid C_1)}{p(\vec{x} \mid C_2)} + \log \frac{P(C_1)}{P(C_2)}$$

The $p(\vec{x} \mid C)$ terms are exponential in $\vec{x}$ (Gaussian pdfs), so their log ratio is linear:

$$\mathrm{logit}(P(C_1 \mid \vec{x})) = \vec{w}^T \vec{x} + w_0$$

with $\vec{w} = \Sigma^{-1} (\vec{\mu}_1 - \vec{\mu}_2)$ and $w_0 = -\frac{1}{2} (\vec{\mu}_1 + \vec{\mu}_2)^T \Sigma^{-1} (\vec{\mu}_1 - \vec{\mu}_2)$.
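
The claim that the log odds comes out exactly linear can be verified directly. This sketch (equal priors assumed, parameters made up) compares $\vec{w}^T \vec{x} + w_0$ against the log ratio of the Gaussian densities; the shared normalizing constants cancel in the difference.

```python
import numpy as np

Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sinv = np.linalg.inv(Sigma)

w  = Sinv @ (mu1 - mu2)
w0 = -0.5 * (mu1 + mu2) @ Sinv @ (mu1 - mu2)

def log_gauss(x, mu):
    """log N(x; mu, Sigma), dropping constants shared by both classes."""
    d = x - mu
    return -0.5 * d @ Sinv @ d

x = np.array([1.0, 0.5])
print(w @ x + w0)                              # linear logit
print(log_gauss(x, mu1) - log_gauss(x, mu2))   # identical value
```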

  14. Logistic. The inverse of the logit function

$$\mathrm{logit}(P(C_1 \mid \vec{x})) = \vec{w}^T \vec{x} + w_0$$

is called the logistic, a.k.a. the sigmoid:

$$P(C_1 \mid \vec{x}) = \mathrm{sigmoid}(\vec{w}^T \vec{x} + w_0) = \frac{1}{1 + \exp\left[-(\vec{w}^T \vec{x} + w_0)\right]}$$
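
A two-line numeric sanity check that the sigmoid inverts the logit:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logit(p):
    return np.log(p / (1.0 - p))

p = 0.8
print(sigmoid(logit(p)))   # recovers 0.8
```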

  15. Using the Sigmoid. During training, estimate $\vec{m}_1, \vec{m}_2, S$ (sample means and covariance), then compute $\vec{w}$ and $w_0$. During testing, either calculate $g(\vec{x} \mid \vec{w}, w_0) = \vec{w}^T \vec{x} + w_0$ and choose $C_1$ if $g(\vec{x}) > 0$, or calculate $y = \mathrm{sigmoid}(\vec{w}^T \vec{x} + w_0)$ and choose $C_1$ if $y > 0.5$. The two rules are equivalent.
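
Putting the recipe together on synthetic data: estimate $\vec{m}_1$, $\vec{m}_2$, and a pooled $S$, form $\vec{w}$ and $w_0$ as on the previous slide, and confirm that the two test-time rules agree. The data generation and constants here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.multivariate_normal([0, 0], [[1, .3], [.3, 1]], 100)
X2 = rng.multivariate_normal([2, 1], [[1, .3], [.3, 1]], 100)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S = 0.5 * (np.cov(X1.T) + np.cov(X2.T))   # pooled (equal class sizes)

Sinv = np.linalg.inv(S)
w  = Sinv @ (m1 - m2)
w0 = -0.5 * (m1 + m2) @ Sinv @ (m1 - m2)

x = np.array([1.0, 0.5])
g = w @ x + w0
y = 1.0 / (1.0 + np.exp(-g))
print(g > 0, y > 0.5)       # the two decision rules agree
```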

  16. Logistic Discrimination. For two classes, assume only that the log likelihood ratio is linear:

$$\log \frac{p(\vec{x} \mid C_1)}{p(\vec{x} \mid C_2)} = \vec{w}^T \vec{x} + w_0$$

so that (absorbing the log prior ratio into $w_0$)

$$\mathrm{logit}(P(C_1 \mid \vec{x})) = \vec{w}^T \vec{x} + w_0, \qquad y = \hat{P}(C_1 \mid \vec{x}) = \frac{1}{1 + \exp\left[-(\vec{w}^T \vec{x} + w_0)\right]}$$

Likelihood:

$$l(\vec{w}, w_0 \mid \mathcal{X}) = \prod_t (y^t)^{r^t} (1 - y^t)^{1 - r^t}$$

Error ("cross-entropy"):

$$E(\vec{w}, w_0 \mid \mathcal{X}) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log (1 - y^t) \right]$$

Train by numerical optimization to minimize $E$.
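
The slide leaves the optimizer unspecified; a common choice is batch gradient descent, using the standard cross-entropy gradient $\partial E / \partial \vec{w} = \sum_t (y^t - r^t)\, \vec{x}^t$. The sketch below assumes that choice, with made-up data, step size, and iteration count.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),     # class C2, r^t = 0
               rng.normal(2, 1, (50, 2))])    # class C1, r^t = 1
r = np.concatenate([np.zeros(50), np.ones(50)])

w, w0, eta = np.zeros(2), 0.0, 0.5

def predict(X):
    return 1.0 / (1.0 + np.exp(-(X @ w + w0)))   # y^t = sigmoid(...)

for _ in range(500):
    y = predict(X)
    w  -= eta * X.T @ (y - r) / len(X)    # dE/dw, averaged over samples
    w0 -= eta * np.sum(y - r) / len(X)    # dE/dw0

y = predict(X)
E = -np.mean(r * np.log(y) + (1 - r) * np.log(1 - y))
print(E)    # cross-entropy should be small after training
```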

  17. Estimating $\vec{w}$ [figure not captured in this transcription]

  18. Multiple Classes. For $K$ classes, take $C_K$ as a reference class:

$$\log \frac{p(\vec{x} \mid C_i)}{p(\vec{x} \mid C_K)} = \vec{w}_i^T \vec{x} + w_{i0} \quad \Rightarrow \quad \frac{P(C_i \mid \vec{x})}{P(C_K \mid \vec{x})} = \exp\left[\vec{w}_i^T \vec{x} + w_{i0}\right]$$

$$y_i = \hat{P}(C_i \mid \vec{x}) = \frac{\exp\left[\vec{w}_i^T \vec{x} + w_{i0}\right]}{1 + \sum_{j=1}^{K-1} \exp\left[\vec{w}_j^T \vec{x} + w_{j0}\right]}$$

This is called the softmax function, because exponentiation combined with normalization tends to exaggerate the weight of the maximum term. Likelihood:

$$l(\{\vec{w}_i, w_{i0}\} \mid \mathcal{X}) = \prod_t \prod_i (y_i^t)^{r_i^t}$$
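
A sketch of the softmax with $C_K$ as the reference class, its score fixed at 0; the weights are hypothetical, and the max is subtracted before exponentiating, a standard numerical-stability trick not mentioned on the slide.

```python
import numpy as np

W  = np.array([[1.0, -1.0],
               [0.5,  0.5]])         # w_i for i = 1..K-1 (here K = 3)
w0 = np.array([0.0, 0.2])

def softmax_ref(x):
    a = np.append(W @ x + w0, 0.0)   # append the reference score 0
    e = np.exp(a - a.max())          # stable exponentiation
    return e / e.sum()

y = softmax_ref(np.array([1.0, 2.0]))
print(y, y.sum())                    # posteriors for all K classes; sum to 1
```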

  19. Multiple Classes (cont.). Error ("cross-entropy"):

$$E(\{\vec{w}_i, w_{i0}\} \mid \mathcal{X}) = -\sum_t \sum_i r_i^t \log y_i^t$$

Train by numerical optimization to minimize $E$.
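
As in the two-class case, the slide does not fix an optimizer; this sketch assumes batch gradient descent with the softmax cross-entropy gradient $\partial E / \partial W = (Y - R)^T X$, on made-up three-class data with one-hot targets $r_i^t$.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n = 3, 60
X = np.vstack([rng.normal(m, 0.5, (n, 2))
               for m in ([0, 0], [2, 0], [1, 2])])
R = np.zeros((K * n, K))
for i in range(K):
    R[i * n:(i + 1) * n, i] = 1.0          # one-hot targets r_i^t

W, w0, eta = np.zeros((K, 2)), np.zeros(K), 0.5

def posteriors(X):
    A = X @ W.T + w0                       # scores for every class
    Y = np.exp(A - A.max(axis=1, keepdims=True))
    return Y / Y.sum(axis=1, keepdims=True)

for _ in range(300):
    Y = posteriors(X)
    W  -= eta * (Y - R).T @ X / len(X)     # dE/dW, averaged
    w0 -= eta * (Y - R).sum(axis=0) / len(X)

Y = posteriors(X)
print(-np.mean(np.sum(R * np.log(Y), axis=1)))   # final cross-entropy
```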

  20. Softmax Classification [figure not captured in this transcription]

  21. Softmax Discriminants [figure not captured in this transcription]
