Linear Models for Classification II

Henrik I Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

Outline

1. Introduction
2. Probabilistic Generative Models
3. Probabilistic Discriminative Models
4. Class Projects
5. Summary

Introduction

- Recap: last time we treated linear classification as an optimization problem
- Today: Bayesian models for classification
- Discussion of possible class projects
- Summary

Probabilistic Generative Models

- Objective: p(C_k | x)
- Modelling using
  - p(C_k): the class priors
  - p(x | C_k): the class conditionals
- For two classes:

  p(C_1 | x) = \frac{p(x | C_1) p(C_1)}{p(x | C_1) p(C_1) + p(x | C_2) p(C_2)}

Sigmoid Formulation

- Reformulation:

  p(C_1 | x) = \frac{1}{1 + e^{-a}} = \sigma(a)

  where

  a = \ln \frac{p(x | C_1) p(C_1)}{p(x | C_2) p(C_2)}

- The logistic sigmoid \sigma(a) is defined by

  \sigma(a) = \frac{1}{1 + e^{-a}}

- Note that \sigma(-a) = 1 - \sigma(a)

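As a concrete illustration, here is a minimal numpy sketch of this computation. The one-dimensional Gaussian class conditionals and the priors 0.6/0.4 are made-up values for illustration, not from the lecture:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gauss_pdf(x, mu, var=1.0):
    # Univariate Normal density N(x | mu, var)
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Hypothetical class conditionals and priors, for illustration only.
p_c1, p_c2 = 0.6, 0.4
x = 0.2
a = np.log(gauss_pdf(x, -1.0) * p_c1) - np.log(gauss_pdf(x, 1.0) * p_c2)
print(sigmoid(a))  # p(C1|x); note sigmoid(-a) = 1 - sigmoid(a) gives p(C2|x)
```
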
Sigmoid Function

[Figure: plot of the logistic sigmoid \sigma(a) for a \in [-5, 5], rising from 0 through 0.5 at a = 0 toward 1]

Generalization to K > 2 Classes

- Consider

  p(C_k | x) = \frac{p(x | C_k) p(C_k)}{\sum_i p(x | C_i) p(C_i)} = \frac{e^{a_k}}{\sum_i e^{a_i}}

  where a_k = \ln(p(x | C_k) p(C_k))

- This normalized exponential is also known as the softmax function

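A short sketch of the normalized exponential. The max-shift is a standard numerical-stability trick, not part of the formula above:

```python
import numpy as np

def softmax(a):
    """p(Ck|x) = exp(a_k) / sum_i exp(a_i), computed stably."""
    a = a - np.max(a)      # shifting a_k leaves the ratio unchanged
    e = np.exp(a)
    return e / e.sum()

# a_k = ln(p(x|Ck) p(Ck)) for three hypothetical classes
a = np.array([-1.2, 0.3, -0.5])
print(softmax(a))          # posteriors, sum to 1
```
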
The Case with Normal Distributions

- Consider D-dimensional Gaussian class conditionals with means \mu_k and shared covariance \Sigma
- The result is

  p(C_1 | x) = \sigma(w^T x + w_0)

  where

  w = \Sigma^{-1} (\mu_1 - \mu_2)

  w_0 = -\frac{1}{2} \mu_1^T \Sigma^{-1} \mu_1 + \frac{1}{2} \mu_2^T \Sigma^{-1} \mu_2 + \ln \frac{p(C_1)}{p(C_2)}

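A sketch that assembles w and w_0 from Gaussian parameters and evaluates the posterior; the means, covariance, and priors below are illustrative placeholders:

```python
import numpy as np

def two_class_gaussian_weights(mu1, mu2, Sigma, p1, p2):
    """w and w0 such that p(C1|x) = sigma(w^T x + w0)."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu2)
    w0 = (-0.5 * (mu1 @ Sinv @ mu1) + 0.5 * (mu2 @ Sinv @ mu2)
          + np.log(p1 / p2))
    return w, w0

# Illustrative parameters (not from the lecture)
mu1, mu2, Sigma = np.array([1.0, 0.0]), np.array([-1.0, 0.0]), np.eye(2)
w, w0 = two_class_gaussian_weights(mu1, mu2, Sigma, 0.5, 0.5)
x = np.array([0.3, -0.2])
p_c1 = 1.0 / (1.0 + np.exp(-(w @ x + w0)))  # posterior p(C1|x)
```
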
The Multi-Class Normal Case

- For each class,

  a_k(x) = w_k^T x + w_{k0}

  with

  w_k = \Sigma^{-1} \mu_k

  w_{k0} = -\frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \ln p(C_k)

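The multi-class analogue, sketched the same way; passing the scores through the softmax from earlier gives the posteriors, and argmax gives the class:

```python
import numpy as np

def gaussian_class_scores(x, mus, Sigma, priors):
    """a_k(x) = w_k^T x + w_k0 for each class, with shared covariance Sigma."""
    Sinv = np.linalg.inv(Sigma)
    scores = []
    for mu, pk in zip(mus, priors):
        w_k = Sinv @ mu
        w_k0 = -0.5 * (mu @ Sinv @ mu) + np.log(pk)
        scores.append(w_k @ x + w_k0)
    return np.array(scores)  # softmax(scores) -> posteriors; argmax -> class
```
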
Small Multi-Class Normal Distribution Example

[Figure: two-dimensional multi-class Normal distribution example, samples plotted over x \in [-2, 2], y \in [-2.5, 2.5]]

The Maximum Likelihood Solution

- Consider the two-class case with priors (\pi, 1 - \pi)
- Then we have

  p(x_n, C_1) = p(C_1) p(x_n | C_1) = \pi N(x_n | \mu_1, \Sigma)

- The joint likelihood function is then

  p(t | \pi, \mu_1, \mu_2, \Sigma) = \prod_{n=1}^N [\pi N(x_n | \mu_1, \Sigma)]^{t_n} [(1 - \pi) N(x_n | \mu_2, \Sigma)]^{1 - t_n}

  where t_n \in \{0, 1\} is the class label of the n'th sample
- We can compute the maximum of p(\cdot)

The Maximum Likelihood Solution (2)

- The class prior is then

  \pi = \frac{N_1}{N_1 + N_2}

- The class means are

  \mu_1 = \frac{1}{N_1} \sum_{n=1}^N t_n x_n, \qquad \mu_2 = \frac{1}{N_2} \sum_{n=1}^N (1 - t_n) x_n

- The shared covariance is

  \Sigma = S = \frac{N_1}{N} S_1 + \frac{N_2}{N} S_2

- The results are not surprising: each estimate is the empirical counterpart of the model quantity (see the sketch below)
- Could we compute the optimal ML solution directly?

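A sketch of these estimators, assuming 0/1 integer labels (t_n = 1 for C_1); np.cov with bias=True matches the 1/N_k normalization of the per-class scatter matrices S_k:

```python
import numpy as np

def ml_fit_two_class(X, t):
    """ML estimates for the shared-covariance Gaussian model.
    X: (N, D) samples; t: (N,) integer labels, 1 for C1 and 0 for C2."""
    N1, N2 = t.sum(), len(t) - t.sum()
    pi = N1 / (N1 + N2)                      # prior p(C1)
    mu1 = X[t == 1].mean(axis=0)
    mu2 = X[t == 0].mean(axis=0)
    S1 = np.cov(X[t == 1].T, bias=True)      # (1/N1) sum (x - mu1)(x - mu1)^T
    S2 = np.cov(X[t == 0].T, bias=True)
    Sigma = (N1 * S1 + N2 * S2) / (N1 + N2)  # weighted shared covariance
    return pi, mu1, mu2, Sigma
```
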
Probabilistic Discriminative Models

- Could we analyze the problem directly rather than through a generative model?
- I.e., could we perform ML directly on p(C_k | x)?
- It could involve fewer parameters!

Logistic Regression

- Consider the two-class problem
- Formulation as a sigmoid:

  p(C_1 | \phi) = y(\phi) = \sigma(w^T \phi)

  then p(C_2 | \phi) = 1 - p(C_1 | \phi)
- Consider

  \frac{d\sigma}{da} = \sigma (1 - \sigma)

Logistic Regression - II

- For a dataset \{\phi_n, t_n\} we have

  p(t | w) = \prod_{n=1}^N y_n^{t_n} \{1 - y_n\}^{1 - t_n}

- The associated error function (the cross-entropy) is

  E(w) = -\ln p(t | w) = -\sum_{n=1}^N \{t_n \ln y_n + (1 - t_n) \ln(1 - y_n)\}

- The gradient is then

  \nabla E(w) = \sum_{n=1}^N (y_n - t_n) \phi_n

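A direct transcription of the error and gradient; the small eps guard against log(0) is an implementation detail I've added, not part of the formulas:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy_and_grad(w, Phi, t, eps=1e-12):
    """E(w) and its gradient for logistic regression.
    Phi: (N, M) design matrix of features phi_n; t: (N,) 0/1 targets."""
    y = sigmoid(Phi @ w)                 # y_n = sigma(w^T phi_n)
    E = -np.sum(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))
    grad = Phi.T @ (y - t)               # sum_n (y_n - t_n) phi_n
    return E, grad
```
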
Newton-Raphson Optimization

- We want to find an extremum of a function f(\cdot):

  f(x + \Delta x) \approx f(x) + f'(x) \Delta x + \frac{1}{2} f''(x) \Delta x^2

- Extremum when \Delta x solves:

  f'(x) + f''(x) \Delta x = 0

- In vector form:

  x_{n+1} = x_n - [H f(x_n)]^{-1} \nabla f(x_n)

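A scalar sketch of the Newton update \Delta x = -f'(x)/f''(x); the quadratic test function is chosen for illustration, so convergence in a single step is expected:

```python
def newton_extremum(fprime, fsecond, x0, iters=20, tol=1e-10):
    """Solve f'(x) = 0 with Newton steps: dx = -f'(x) / f''(x)."""
    x = x0
    for _ in range(iters):
        dx = -fprime(x) / fsecond(x)
        x += dx
        if abs(dx) < tol:
            break
    return x

# f(x) = (x - 3)^2: f'(x) = 2(x - 3), f''(x) = 2
print(newton_extremum(lambda x: 2 * (x - 3), lambda x: 2.0, x0=0.0))  # -> 3.0
```
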
Iteratively Reweighted Least Squares

- Formulate the optimization problem as

  w^{(\tau+1)} = w^{(\tau)} - H^{-1} \nabla E(w)

- For the sum-of-squares error, the gradient and Hessian are given by

  \nabla E(w) = \Phi^T \Phi w - \Phi^T t

  H = \Phi^T \Phi

- The solution is "obvious":

  w^{(\tau+1)} = w^{(\tau)} - (\Phi^T \Phi)^{-1} [\Phi^T \Phi w^{(\tau)} - \Phi^T t]

  which yields w = (\Phi^T \Phi)^{-1} \Phi^T t, the LSQ solution!

Optimization for the Cross-Entropy

- For the error function E(w) of logistic regression,

  \nabla E(w) = \Phi^T (y - t)

  H = \Phi^T R \Phi

  where R is a diagonal matrix with R_{nn} = y_n (1 - y_n)
- The regression/discrimination update is then

  w^{(\tau+1)} = (\Phi^T R \Phi)^{-1} \Phi^T R [\Phi w^{(\tau)} - R^{-1} (y - t)]

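A sketch of the IRLS update; the clipping of y away from 0 and 1 is a safeguard I've added so that R stays invertible, not part of the derivation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls_logistic(Phi, t, iters=10, eps=1e-8):
    """IRLS for logistic regression: w <- (Phi^T R Phi)^{-1} Phi^T R z,
    with z = Phi w - R^{-1}(y - t) and R_nn = y_n (1 - y_n)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        y = np.clip(sigmoid(Phi @ w), eps, 1 - eps)  # keep R nonsingular
        r = y * (1 - y)                              # diagonal of R
        z = Phi @ w - (y - t) / r                    # working targets
        w = np.linalg.solve(Phi.T @ (r[:, None] * Phi), Phi.T @ (r * z))
    return w
```
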
Class Projects - Examples

- Feature integration for robust detection
- Multi-recognition strategies
- Comparison of recognition methods
- Space categorization
- Learning of obstacle avoidance strategies

Class Projects - II

- Problem types:
  - Novel "research" - robotics/mobile/manipulation
  - Comparative evaluation
  - Integration of methods
- Aspects:
  - Modelling: what is a good/adequate model?
  - What is a good benchmark/evaluation?
  - Evaluation of a method, alone or in comparison
- Teaming: 2-3 students per group

Summary

- Considered a Bayesian formulation for class discrimination
- For linear systems, Newton-Raphson recovers the LSQ solution directly
- Iterative solutions (IRLS) for the cross-entropy case
- Discussion of class projects
