
Relevance Vector Machines (Jukka Lankinen, LUT, February 21, 2011)



  1. Relevance Vector Machines. Jukka Lankinen, LUT, February 21, 2011.

  2. Outline
     ◮ Introduction: Support Vector Machines
     ◮ Relevance Vector Machines: Model / Regression, Marginal Likelihood, Classification
     ◮ Examples: Regression, Classification
     ◮ Summary: Relevance Vector Machines, Exercise

  3. Introduction
     ◮ The relevance vector machine (RVM) is a Bayesian sparse kernel technique for regression and classification.
     ◮ It solves some of the problems of the support vector machine (SVM).
     ◮ Used in detection and classification tasks, e.g. detecting cancer cells, classifying DNA sequences, etc.

  4. Support Vector Machines (SVM)
     ◮ A non-probabilistic decision machine: it returns a point estimate for regression and a binary decision for classification.
     ◮ Decisions are based on the function
         y(x; w) = \sum_{i=1}^{N} w_i K(x, x_i) + w_0                                   (1)
     ◮ where K is the kernel function and w_0 is the bias.
     ◮ Training attempts to minimize the error while simultaneously maximizing the margin between the two classes.
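
A minimal sketch (in Python, rather than the MATLAB toolbox referenced in the exercise) of how the decision function (1) is evaluated. The RBF kernel, the training points, and the weight values below are illustrative assumptions, not anything stated on the slides.

```python
import numpy as np

def rbf_kernel(x, x_i, gamma=1.0):
    """Illustrative kernel choice: K(x, x_i) = exp(-gamma * ||x - x_i||^2)."""
    return np.exp(-gamma * np.sum((x - x_i) ** 2))

def decision(x, X_train, w, w0, kernel=rbf_kernel):
    """Evaluate y(x; w) = sum_i w_i K(x, x_i) + w_0 from Eq. (1)."""
    return sum(wi * kernel(x, xi) for wi, xi in zip(w, X_train)) + w0

# Toy usage with three assumed training points and assumed weights.
X_train = np.array([[0.0], [1.0], [2.0]])
w = np.array([0.5, -1.0, 0.3])
print(decision(np.array([0.5]), X_train, w, w0=0.1))
```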

  5. Support Vector Machines (SVM)
     [Figure: SVM decision boundary y = 0 with margin boundaries at y = -1 and y = +1.]

  6. SVM problems
     ◮ The number of required support vectors typically grows linearly with the size of the training set.
     ◮ Predictions are non-probabilistic.
     ◮ Requires estimation of the error/margin trade-off parameters.
     ◮ The kernel K(x, x_i) must satisfy Mercer's condition.

  7. Relevance Vector Machines
     ◮ Apply a Bayesian treatment to the SVM-style model.
     ◮ Associate a prior over the model weights, governed by a set of hyperparameters.
     ◮ The posterior distributions of the majority of the weights are peaked around zero; the training vectors associated with the remaining non-zero weights are the 'relevance vectors'.
     ◮ Typically utilizes fewer kernel functions than the SVM.

  8. The model
     ◮ For a given data set of input-target pairs {x_n, t_n}_{n=1}^{N},
         t_n = y(x_n; w) + \epsilon_n                                                    (2)
     ◮ where the \epsilon_n are samples from a noise process assumed to be zero-mean Gaussian with variance \sigma^2. Thus
         p(t_n | x_n) = N(t_n | y(x_n; w), \sigma^2)                                     (3)

  9. The model (cont.)
     ◮ Sparsity is encoded in the prior:
         p(w | \alpha) = \prod_{i=0}^{N} N(w_i | 0, \alpha_i^{-1})                        (4)
     ◮ which is Gaussian, but conditioned on \alpha.
     ◮ To complete the specification of the hierarchical prior, we must define hyperpriors over all \alpha_m:
         p(w_m) = \int p(w_m | \alpha_m) p(\alpha_m) \, d\alpha_m                         (5)
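
A small sketch of the conditionally Gaussian prior (4): each weight has its own precision \alpha_i, and driving \alpha_i to a large value pins the corresponding weight near zero, which is how sparsity arises once the \alpha_i are learned. The specific \alpha values below are assumptions for illustration.

```python
import numpy as np

def sample_weights(alpha, rng=None):
    """Draw each w_i ~ N(0, alpha_i^{-1}) independently, as in Eq. (4)."""
    if rng is None:
        rng = np.random.default_rng(0)
    alpha = np.asarray(alpha, dtype=float)
    return rng.normal(0.0, 1.0 / np.sqrt(alpha))

# Large precisions effectively remove weights; the third weight stays free.
print(sample_weights([1e6, 1e6, 1.0, 1e6]))
```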

  10. Regression
      ◮ The model has independent Gaussian noise: t_n ~ N(y(x_n; w), \sigma^2).
      ◮ The corresponding likelihood is
          p(t | w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left( -\frac{1}{2\sigma^2} \| t - \Phi w \|^2 \right)   (6)
      ◮ where t = (t_1, ..., t_N), w = (w_1, ..., w_M) and \Phi is the N x M 'design' matrix with \Phi_{nm} = \phi_m(x_n).
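
A sketch of building the design matrix \Phi. Treating the basis functions as kernels centred on the training inputs plus a constant bias column (so M = N + 1) follows the usual RVM setup; that choice is an assumption here, since the slide only defines \Phi_{nm} = \phi_m(x_n).

```python
import numpy as np

def design_matrix(X, kernel):
    """Phi[n, m] = phi_m(x_n); here phi_0 = 1 (bias) and phi_m(x) = K(x, x_m)."""
    N = len(X)
    Phi = np.ones((N, N + 1))
    for n in range(N):
        for m in range(N):
            Phi[n, m + 1] = kernel(X[n], X[m])
    return Phi
```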

  11. The model (cont.)
      ◮ The desired posterior over all unknowns:
          p(w, \alpha, \sigma^2 | t) = \frac{p(t | w, \alpha, \sigma^2) \, p(w, \alpha, \sigma^2)}{p(t)}          (7)
      ◮ Given a new test point x_*, predictions are made for the corresponding target t_* in terms of the predictive distribution:
          p(t_* | t) = \int p(t_* | w, \alpha, \sigma^2) \, p(w, \alpha, \sigma^2 | t) \, dw \, d\alpha \, d\sigma^2   (8)
      ◮ But there is a problem: these computations cannot be performed analytically, so approximations are needed.

  12. The model (cont.)
      ◮ We decompose the posterior as
          p(w, \alpha, \sigma^2 | t) = p(w | t, \alpha, \sigma^2) \, p(\alpha, \sigma^2 | t)                      (9)
      ◮ The posterior distribution over the weights is then
          p(w | t, \alpha, \sigma^2) = \frac{p(t | w, \sigma^2) \, p(w | \alpha)}{p(t | \alpha, \sigma^2)} = N(w | \mu, \Sigma)   (10)
      ◮ where, with A = diag(\alpha_i),
          \Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1}                                                             (11)
          \mu = \sigma^{-2} \Sigma \Phi^T t                                                                       (12)
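
A direct sketch of the weight posterior (10)-(12) for fixed \alpha and \sigma^2. The explicit matrix inverse is written literally to mirror Eq. (11); a Cholesky-based solve would be preferred in practice.

```python
import numpy as np

def weight_posterior(Phi, t, alpha, sigma2):
    """Posterior N(w | mu, Sigma) of Eqs. (10)-(12), with A = diag(alpha)."""
    A = np.diag(alpha)
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + A)   # Eq. (11)
    mu = Sigma @ Phi.T @ t / sigma2                   # Eq. (12)
    return mu, Sigma
```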

  13. Marginal likelihood
      ◮ The marginal likelihood can be written as
          p(t | \alpha, \sigma^2) = \int p(t | w, \sigma^2) \, p(w | \alpha) \, dw                                (13)
      ◮ Maximizing the marginal likelihood function is known as the type-II maximum likelihood method.
      ◮ We must optimize p(t | \alpha, \sigma^2); there are a few ways to do this.
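
The integral (13) is Gaussian and has a closed form. The expression used below, log p(t | \alpha, \sigma^2) = -1/2 [N log 2\pi + log|C| + t^T C^{-1} t] with C = \sigma^2 I + \Phi A^{-1} \Phi^T, is the standard result from Tipping (2001); it is not shown on the slide.

```python
import numpy as np

def log_marginal_likelihood(Phi, t, alpha, sigma2):
    """log p(t | alpha, sigma^2) for the Gaussian likelihood and prior."""
    N = Phi.shape[0]
    C = sigma2 * np.eye(N) + Phi @ np.diag(1.0 / np.asarray(alpha)) @ Phi.T
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2.0 * np.pi) + logdet + t @ np.linalg.solve(C, t))
```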

  14. Marginal likelihood optimization
      ◮ Maximize (13) by iterative re-estimation (see the sketch below).
      ◮ Differentiating \log p(t | \alpha, \sigma^2) gives the re-estimation equations
          \alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}                                                               (14)
          (\sigma^2)^{new} = \frac{\| t - \Phi\mu \|^2}{N - \sum_{i=1}^{M} \gamma_i}                              (15)
      ◮ where we define the quantities \gamma_i = 1 - \alpha_i \Sigma_{ii}; \gamma_i is a measure of how 'well-determined' the parameter w_i is.
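
A minimal sketch of the full regression loop, alternating the posterior update (11)-(12) with the re-estimation equations (14)-(15). The initializations, the convergence test, and the small epsilon guard are assumptions; the pruning of basis functions whose \alpha_i grows very large, which is used in practical implementations, is omitted for brevity.

```python
import numpy as np

def rvm_regression(Phi, t, n_iter=1000, tol=1e-6):
    """Type-II maximum likelihood by iterating Eqs. (11)-(12) and (14)-(15)."""
    N, M = Phi.shape
    alpha = np.ones(M)                # assumed initialization
    sigma2 = 0.1 * np.var(t) + 1e-6   # assumed initialization
    for _ in range(n_iter):
        A = np.diag(alpha)
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + A)             # Eq. (11)
        mu = Sigma @ Phi.T @ t / sigma2                             # Eq. (12)
        gamma = 1.0 - alpha * np.diag(Sigma)                        # gamma_i = 1 - alpha_i * Sigma_ii
        alpha_new = gamma / np.maximum(mu ** 2, 1e-12)              # Eq. (14); epsilon avoids division by zero
        sigma2 = np.sum((t - Phi @ mu) ** 2) / (N - gamma.sum())    # Eq. (15)
        converged = np.max(np.abs(np.log(alpha_new) - np.log(alpha))) < tol
        alpha = alpha_new
        if converged:
            break
    return mu, Sigma, alpha, sigma2
```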

  15. RVMs for classification
      ◮ The likelihood P(t | w) is now Bernoulli:
          P(t | w) = \prod_{n=1}^{N} g\{y(x_n; w)\}^{t_n} \, [1 - g\{y(x_n; w)\}]^{1 - t_n}                       (16)
      ◮ with g(y) = 1 / (1 + e^{-y}) the logistic sigmoid function.
      ◮ There is no noise variance; the same sparse prior is used as in regression.
      ◮ Unlike in regression, the weight posterior p(w | t, \alpha) cannot be obtained analytically; approximations are once again needed.
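
A sketch of evaluating the likelihood (16) in log form, which is how it would normally be computed for numerical stability. Targets are assumed to be coded as 0/1.

```python
import numpy as np

def log_bernoulli_likelihood(w, Phi, t):
    """log of Eq. (16): log g(y) = -logaddexp(0, -y), log(1 - g(y)) = -logaddexp(0, y)."""
    y = Phi @ w
    return float(np.sum(t * (-np.logaddexp(0.0, -y)) + (1 - t) * (-np.logaddexp(0.0, y))))
```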

  16. Gaussian posterior approximation
      ◮ Find the posterior mode w_MP for the current values of \alpha by optimization.
      ◮ Compute the Hessian at the mode.
      ◮ Negate and invert it to give the covariance of a Gaussian approximation p(w | t, \alpha) \approx N(w_{MP}, \Sigma).
      ◮ The \alpha are then updated using \mu = w_{MP} and \Sigma (see the sketch below).
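
A sketch of this Laplace step: plain full-step Newton iterations (the fixed iteration count is an assumption) find the mode of the log posterior, and the negated, inverted Hessian gives the covariance of the Gaussian approximation.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def laplace_posterior(Phi, t, alpha, n_iter=50):
    """Mode w_MP and covariance Sigma of the Gaussian approximation to p(w | t, alpha)."""
    A = np.diag(alpha)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Phi @ w)
        grad = Phi.T @ (t - p) - A @ w                    # gradient of the log posterior
        H = -(Phi.T @ np.diag(p * (1 - p)) @ Phi + A)     # Hessian of the log posterior
        w = w - np.linalg.solve(H, grad)                  # Newton step towards the mode w_MP
    Sigma = np.linalg.inv(-H)                             # negate and invert the Hessian
    return w, Sigma
```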

  17. RVM regression example
      ◮ Target: the 'sinc' function, sinc(x) = \sin(x) / x.
      ◮ Linear spline kernel:
          K(x_m, x_n) = 1 + x_m x_n + x_m x_n \min(x_m, x_n) - \frac{x_m + x_n}{2} \min(x_m, x_n)^2 + \frac{\min(x_m, x_n)^3}{3}
      ◮ with \epsilon = 0.01 and 100 uniform, noise-free samples.
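
A sketch of the target function and the linear spline kernel from this slide. The input interval [-10, 10] is an assumption (the slide does not state the range), and the commented-out last line only indicates how this would plug into the regression sketch from slide 14.

```python
import numpy as np

def sinc(x):
    """sinc(x) = sin(x) / x, with sinc(0) = 1."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.ones_like(x)
    nz = x != 0.0
    out[nz] = np.sin(x[nz]) / x[nz]
    return out

def spline_kernel(xm, xn):
    """Univariate linear spline kernel from the slide."""
    m = np.minimum(xm, xn)
    return 1.0 + xm * xn + xm * xn * m - 0.5 * (xm + xn) * m ** 2 + m ** 3 / 3.0

X = np.linspace(-10.0, 10.0, 100)               # 100 uniform, noise-free samples (range assumed)
t = sinc(X)
Phi = spline_kernel(X[:, None], X[None, :])     # N x N kernel design matrix
# mu, Sigma, alpha, sigma2 = rvm_regression(Phi, t)   # reusing the sketch from slide 14
```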

  18. RVM regression example [figure]

  19. RVM regression example [figure]

  20. RVM classification example
      ◮ Data: Ripley's synthetic data.
      ◮ Gaussian kernel:
          K(x_m, x_n) = \exp(-r^{-2} \| x_m - x_n \|^2)
      ◮ with width r = 0.5.
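
A one-function sketch of the Gaussian kernel from this slide; Ripley's synthetic data set itself is not reproduced here.

```python
import numpy as np

def gaussian_kernel(xm, xn, r=0.5):
    """K(x_m, x_n) = exp(-r^{-2} ||x_m - x_n||^2), with width r = 0.5 as on the slide."""
    d = np.asarray(xm, dtype=float) - np.asarray(xn, dtype=float)
    return np.exp(-np.sum(d ** 2) / r ** 2)
```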

  21. RVM classification example [figure]

  22. Summary
      ◮ Sparsity: predictions for new inputs depend on the kernel function evaluated at only a subset of the training data points.
      ◮ TODO
      ◮ More detailed explanation in the original publication: Tipping, M. E., "Sparse Bayesian Learning and the Relevance Vector Machine", Journal of Machine Learning Research 1, 2001, pp. 211-244.

  23. Exercise
      ◮ Fetch Tipping's MATLAB toolbox for SparseBayes from http://www.vectoranomaly.com/downloads/downloads.htm.
      ◮ Try SparseBayesDemo.m with different likelihood models (Gaussian, Bernoulli, ...) and familiarize yourself with the toolbox.
      ◮ Try to replicate the results from the regression example.
