

1. Machine Learning: Probabilistic KNN. Mark Girolami, girolami@dcs.gla.ac.uk, Department of Computing Science, University of Glasgow. June 21, 2007.

2. Probabilistic KNN
     • KNN is a remarkably simple algorithm with proven error rates
     • One drawback is that it is not built on any probabilistic framework
     • No posterior probabilities of class membership
     • No way to infer the number of neighbours or metric parameters probabilistically
     • Let us try to get around this 'problem'

3. Probabilistic KNN
     • The first thing needed is a likelihood
     • Consider a finite data sample { (t_1, x_1), ..., (t_N, x_N) } where each t_n ∈ {1, ..., C} denotes the class label and x_n ∈ R^D the D-dimensional feature vector. The feature space R^D has an associated metric with parameters θ, denoted M_θ.
     • A likelihood can be formed as
       $$p(\mathbf{t} \mid \mathbf{X}, \beta, k, \boldsymbol{\theta}, \mathcal{M}) \;\approx\; \prod_{n=1}^{N} \frac{\exp\!\left(\frac{\beta}{k} \sum_{j \sim n \mid k}^{\mathcal{M}_{\theta}} \delta_{t_n t_j}\right)}{\sum_{c=1}^{C} \exp\!\left(\frac{\beta}{k} \sum_{j \sim n \mid k}^{\mathcal{M}_{\theta}} \delta_{c t_j}\right)}$$

4. Probabilistic KNN
     • The number of nearest neighbours is k and β defines a scaling variable. The expression $\sum_{j \sim n \mid k}^{\mathcal{M}_{\theta}} \delta_{t_n t_j}$ counts how many of the k nearest neighbours of x_n, measured under the metric M_θ within the N − 1 samples X_{−n} that remain when x_n is removed, have class label t_n; likewise, each term in the summation of the denominator counts how many of the k neighbours of x_n have class label c. (A Python sketch of this likelihood follows below.)
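To make the construction concrete, here is a minimal Python sketch of the approximate LOO likelihood, assuming a fixed Euclidean metric (so M_θ carries no free parameters); the function name `pknn_log_likelihood` and the zero-based labels are my own choices, not from the slides.

```python
import numpy as np
from scipy.spatial.distance import cdist

def pknn_log_likelihood(X, t, beta, k, C):
    """Approximate LOO log-likelihood of the probabilistic KNN model.

    X : (N, D) feature array; t : (N,) integer label array in {0, ..., C-1};
    beta : scaling variable; k : number of neighbours.
    A Euclidean metric stands in for M_theta (an assumption of this sketch).
    """
    D = cdist(X, X)                    # pairwise distances under the metric
    np.fill_diagonal(D, np.inf)        # drop x_n itself, leaving X_{-n}
    log_lik = 0.0
    for n in range(X.shape[0]):
        knn = np.argsort(D[n])[:k]     # k nearest neighbours of x_n
        counts = np.bincount(t[knn], minlength=C)  # neighbours in each class
        logits = beta * counts / k     # (beta/k) * delta-sums per class
        # per-point term: softmax probability of the left-out label t_n
        log_lik += logits[t[n]] - np.logaddexp.reduce(logits)
    return log_lik
```

Working in log space keeps the product of N per-point terms from underflowing; each term is exactly the softmax probability of the left-out label.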

5. Probabilistic KNN
     • The likelihood is formed by a product of terms p(t_n | x_n, X_{−n}, t_{−n}, β, k, θ, M)
     • This is a Leave-One-Out (LOO) predictive likelihood, where t_{−n} denotes the vector t with the n'th element removed
     • The approximate joint likelihood provides an overall measure of the LOO predictive likelihood
     • It should exhibit some resilience to overfitting due to the LOO nature of the approximate likelihood

6. Probabilistic KNN
     • Posterior inference follows by obtaining the parameter posterior distribution p(β, k, θ | t, X, M)
     • Predictions of the target class label t* of a new datum x* are made by posterior averaging, such that p(t* | x*, t, X, M) equals
       $$\sum_{k} \iint p(t^{*} \mid \mathbf{x}^{*}, \mathbf{t}, \mathbf{X}, \beta, k, \boldsymbol{\theta}, \mathcal{M}) \, p(\beta, k, \boldsymbol{\theta} \mid \mathbf{t}, \mathbf{X}, \mathcal{M}) \, \mathrm{d}\beta \, \mathrm{d}\boldsymbol{\theta}$$
     • The posterior takes an intractable form, so an MCMC procedure is proposed and the following Monte-Carlo estimate is employed (see the sketch after this slide):
       $$\hat{p}(t^{*} \mid \mathbf{x}^{*}, \mathbf{t}, \mathbf{X}, \mathcal{M}) = \frac{1}{N_{s}} \sum_{s=1}^{N_{s}} p(t^{*} \mid \mathbf{x}^{*}, \mathbf{t}, \mathbf{X}, \beta^{(s)}, k^{(s)}, \boldsymbol{\theta}^{(s)}, \mathcal{M})$$
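A minimal sketch of that Monte-Carlo estimate, assuming posterior samples (β^(s), k^(s)) are already available (e.g. from the Metropolis sampler sketched below) and the same Euclidean-metric assumption as before; `predict_proba` and `posterior_predictive` are illustrative names of my own.

```python
import numpy as np

def predict_proba(x_star, X, t, beta, k, C):
    """p(t* = c | x*, t, X, beta, k) for each class c (Euclidean metric)."""
    d = np.linalg.norm(X - x_star, axis=1)   # distances to training points
    knn = np.argsort(d)[:k]                  # k nearest neighbours of x*
    logits = beta * np.bincount(t[knn], minlength=C) / k
    return np.exp(logits - np.logaddexp.reduce(logits))  # softmax

def posterior_predictive(x_star, X, t, samples, C):
    """Monte-Carlo average over posterior samples [(beta_s, k_s), ...]."""
    probs = [predict_proba(x_star, X, t, b, k, C) for b, k in samples]
    return np.mean(probs, axis=0)            # (1/N_s) * sum over samples
```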

7. Probabilistic KNN
     • The posterior sampling algorithm is a simple Metropolis algorithm
     • Assume the priors on k and β are uniform over all possible values (integer & real respectively)
     • The proposal distribution for β_new is Gaussian, i.e. N(β^(i), h)
     • The proposal distribution for k is uniform between Min & Max values:
       index ∼ U(0, k_step + 1);  k_new = k_old + k_inc(index)
     (A sketch of these proposals follows below.)
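The proposals might look like the following sketch. The slide does not spell out the k_inc table, so the symmetric integer increments below, and treating h as a variance, are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose(beta_old, k_old, h=0.25, k_step=4):
    """One symmetric Metropolis proposal for (beta, k).

    beta gets a Gaussian random walk N(beta_old, h); k moves by a uniformly
    indexed increment, mirroring index ~ U(0, k_step + 1) and
    k_new = k_old + k_inc(index). The increment table here is
    [-2, -1, 0, 1, 2] for k_step = 4 -- an assumed reading of the slide.
    """
    beta_new = rng.normal(beta_old, np.sqrt(h))         # h taken as a variance
    k_inc = np.arange(-(k_step // 2), k_step // 2 + 1)  # k_step + 1 increments
    index = rng.integers(0, k_step + 1)                 # index ~ U(0, k_step + 1)
    k_new = max(1, k_old + int(k_inc[index]))           # keep k >= 1
    return beta_new, k_new
```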

8. Probabilistic KNN
     • The new move is accepted using the Metropolis ratio
       $$\min\!\left(1, \; \frac{p(\mathbf{t} \mid \mathbf{X}, \beta_{\mathrm{new}}, k_{\mathrm{new}}, \boldsymbol{\theta}_{\mathrm{new}}, \mathcal{M})}{p(\mathbf{t} \mid \mathbf{X}, \beta, k, \boldsymbol{\theta}, \mathcal{M})}\right)$$
     • This builds up a Markov chain whose stationary distribution is p(β, k, θ | t, X, M)
     • The algorithm is very simple to implement; Matlab and C implementations are available (a Python sketch of the full loop follows below)
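Putting the pieces together, a minimal Metropolis loop might look like this; it reuses the `pknn_log_likelihood` and `propose` sketches above and works in log space, so the acceptance test compares log-likelihood differences. With the uniform priors and symmetric proposals of the slides, the acceptance ratio reduces to the likelihood ratio. This is a sketch, not the Matlab/C code the slide refers to.

```python
def metropolis_pknn(X, t, C, n_samples=50_000, beta0=1.0, k0=5):
    """Metropolis sampler targeting p(beta, k | t, X) (fixed metric)."""
    beta, k = beta0, k0
    log_lik = pknn_log_likelihood(X, t, beta, k, C)
    samples = []
    for _ in range(n_samples):
        beta_new, k_new = propose(beta, k)
        if beta_new > 0:                          # assume support beta > 0
            log_lik_new = pknn_log_likelihood(X, t, beta_new, k_new, C)
            # accept with probability min(1, L_new / L_old)
            if np.log(rng.uniform()) < log_lik_new - log_lik:
                beta, k, log_lik = beta_new, k_new, log_lik_new
        samples.append((beta, k))
    return samples
```

Feeding the retained samples (after discarding burn-in) into `posterior_predictive` above completes the pipeline, and plotting them gives traces like those on the next slide.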

9. Probabilistic KNN
     • Trace of the Metropolis sampler for β & k. [Figure: two trace plots over 5 × 10^4 iterations; β (top, range roughly 0–10) and k (bottom, range roughly 0–100).]

10. Probabilistic KNN
     [Figure 1: The top graph shows a histogram of the marginal posterior for K on the synthetic Ripley dataset (No. of samples vs. K) and the bottom shows the 10CV error against the value of K (%CV-error vs. K).]

11. Probabilistic KNN
     [Figure 2: The percentage test error (PKNN vs. KNN) obtained with training sets of varying size, from 25 to 250 data points. For each sub-sample size, 50 random subsets were sampled, and each of these was used to obtain a KNN and a PKNN classifier, which were then used to make predictions on the 1000 independent test points. The mean percentage performance and associated standard error obtained for each training-set size are shown for each classifier.]
