
MSc Course MACHINE LEARNING TECHNIQUES AND APPLICATIONS



  1. APPLIED MACHINE LEARNING. MSc Course: MACHINE LEARNING TECHNIQUES AND APPLICATIONS. Classification with GMM + Bayes.

  2. Clustering, semi-supervised clustering and classification. Clustering: no labels for the points; group points according to the geometrical distribution of points. Semi-supervised clustering: labels for a fraction of the points; use the labels to choose the hyperparameters of clustering using the F1-measure. Classification: all points are labelled; use the labels to determine the boundary between the two classes. Figure legend: labels class 1, labels class 2, unlabeled.

  3. From Clustering to Classification. Binary classification problem.

  4. From Clustering to Classification. Solution of GMM clustering with two Gaussian functions with isotropic (spherical) covariance. We need to decide to which class each point belongs. What if the probability of belonging to several classes is not zero?

  5. Gaussian Maximum Likelihood (ML) Discriminant Rule. $p(x \mid y=1) \sim \mathcal{N}(\mu_1, \Sigma_1)$, $p(x \mid y=2) \sim \mathcal{N}(\mu_2, \Sigma_2)$. Boundary: all points $x$ such that $p(x \mid y=1) = p(x \mid y=2)$.

  6. Gaussian ML Discriminant Rule. • 2-class problem, conditional densities to belong to classes y=1 and y=2: $p(x \mid y=1) \sim \mathcal{N}(\mu_1, \Sigma_1) = \frac{1}{(2\pi)^{N/2} |\Sigma_1|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_1)^T \Sigma_1^{-1} (x-\mu_1)\right)$ and $p(x \mid y=2) \sim \mathcal{N}(\mu_2, \Sigma_2) = \frac{1}{(2\pi)^{N/2} |\Sigma_2|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_2)^T \Sigma_2^{-1} (x-\mu_2)\right)$. • To determine the class label, compute the likelihood ratio (optimal Bayes classifier). A new point $x$ belongs to class 1 if: $p(y=1 \mid x) > p(y=2 \mid x)$ (1).

  7. Gaussian ML Discriminant Rule. By Bayes: $p(y=i \mid x) = \frac{p(x \mid y=i)\, p(y=i)}{p(x)}$, $i = 1, 2$. Assuming equal class distribution, $p(y=1) = p(y=2)$, and replacing in (1): $p(y=1 \mid x) > p(y=2 \mid x) \Leftrightarrow \frac{p(x \mid y=1)}{p(x \mid y=2)} > 1 \Leftrightarrow \ln \frac{p(x \mid y=1)}{p(x \mid y=2)} > 0 \Leftrightarrow (x-\mu_1)^T \Sigma_1^{-1} (x-\mu_1) + \log|\Sigma_1| < (x-\mu_2)^T \Sigma_2^{-1} (x-\mu_2) + \log|\Sigma_2|$.

  8. From Clustering to Classification with GMM. Example of binary classification using GMM + Bayes rule (isotropic Gaussian functions). Train each Gaussian separately, using the dataset of class 1 for Gaussian 1 and the dataset of class 2 for Gaussian 2.

  9. From Clustering to Classification with GMM. Example of binary classification using GMM + Bayes rule (diagonal Gaussian functions). Train each Gaussian separately, using the dataset of class 1 for Gaussian 1 and the dataset of class 2 for Gaussian 2.

  10. From Clustering to Classification with GMM. Example of binary classification using GMM + Bayes rule (full covariance Gaussian functions). Train each Gaussian separately, using the dataset of class 1 for Gaussian 1 and the dataset of class 2 for Gaussian 2.

  11. Maximum Likelihood Discriminant Rule. • A maximum likelihood classifier chooses the class label that is the most likely. • The conditional density that a data point $x$ has associated class label $y = k$ is: $p_k(x) = p(x \mid y = k)$. • The maximum likelihood (ML) discriminant rule predicts the class of an observation $x$ using: $c(x) = \arg\max_k \, p_k(x)$.

  12. Gaussian ML Discriminant Rules. • Multiclass problem with k=1…K classes; the conditional density for each class is a multivariate Gaussian: $p(x \mid y = k) \sim \mathcal{N}(\mu_k, \Sigma_k)$. • The ML discriminant rule is the minimum of minus the log-likelihood (equivalent to maximizing the likelihood): $C(x) = \arg\min_k \left\{ (x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k) + \log|\Sigma_k| \right\}$.
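
The Gaussian ML discriminant rule of slide 12 can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it assumes 2-D points so that the covariance inverse and determinant have a closed form, and the names `discriminant_score`, `classify` and the three class parameters are hypothetical.

```python
import math

def discriminant_score(x, mean, cov):
    """Return (x - mean)^T cov^{-1} (x - mean) + log|cov| for a 2-D point."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return maha + math.log(det)

def classify(x, classes):
    """classes maps label -> (mean, cov); pick the label with the smallest score."""
    return min(classes, key=lambda k: discriminant_score(x, *classes[k]))

# Hypothetical class parameters (assumed already estimated from training data).
classes = {
    1: ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    2: ((4.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    3: ((0.0, 4.0), ((2.0, 0.5), (0.5, 1.0))),
}

print(classify((0.2, -0.1), classes))
print(classify((3.8, 0.3), classes))
print(classify((0.1, 3.9), classes))
```

With K = 2 the same function reproduces the two-class comparison of slide 7.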

  13. Gaussian ML Discriminant Rules. Example of four-class classification using four Gaussian distributions.

  14. Gaussian ML Discriminant Rules. Example of four-class classification using four Gaussian distributions.

  15. Gaussian ML Discriminant Rules. Example of two-class classification using two Gaussian distributions with equal covariance matrices.
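
As a worked special case of the two-class rule, the following sketch shows why equal covariance matrices give a linear boundary. It assumes the notation $\mu_1, \mu_2$ for the class means and a shared covariance $\Sigma$ (symbols chosen here for illustration) and equal class priors.

```latex
% Two-class Gaussian ML rule with \Sigma_1 = \Sigma_2 = \Sigma:
% the \log|\Sigma| terms cancel, leaving a comparison of Mahalanobis distances.
\begin{align*}
(x-\mu_1)^T \Sigma^{-1} (x-\mu_1) &< (x-\mu_2)^T \Sigma^{-1} (x-\mu_2) \\
-2\mu_1^T \Sigma^{-1} x + \mu_1^T \Sigma^{-1} \mu_1
  &< -2\mu_2^T \Sigma^{-1} x + \mu_2^T \Sigma^{-1} \mu_2 \\
(\mu_1-\mu_2)^T \Sigma^{-1} x
  &> \tfrac{1}{2}\left(\mu_1^T \Sigma^{-1} \mu_1 - \mu_2^T \Sigma^{-1} \mu_2\right)
\end{align*}
```

The quadratic term $x^T \Sigma^{-1} x$ appears on both sides and cancels, so the boundary is a hyperplane in $x$.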

  16. Classification with GMMs. Multi-class problem with l=1…L classes, where each class is modeled with a GMM composed of $K_l$ multivariate Gaussian functions: $p(x \mid y = l) \sim \sum_{k=1}^{K_l} \alpha_k \, \mathcal{N}(\mu_k, \Sigma_k)$. The ML discriminant rule is the minimum of minus the log-likelihood (equivalent to maximizing the likelihood): $c(x) = \arg\min_l \left\{ -\log p(x \mid y = l) \right\}$.
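
A minimal sketch of the rule on slide 16, under several assumptions: the points are 1-D for brevity (the slide uses multivariate Gaussians), each class GMM is assumed to have been fitted beforehand (for example with EM, which is not shown), and the mixture parameters and the names `log_gauss`, `gmm_log_likelihood` and `classify` are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_log_likelihood(x, components):
    """components: list of (weight, mean, var) with weights summing to 1."""
    logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in components]
    top = max(logs)
    # log-sum-exp keeps the sum stable when individual densities are tiny
    return top + math.log(sum(math.exp(value - top) for value in logs))

def classify(x, class_gmms):
    """Return the label l that minimises -log p(x | y = l)."""
    return min(class_gmms, key=lambda l: -gmm_log_likelihood(x, class_gmms[l]))

# Hypothetical pre-fitted mixtures, one per class.
class_gmms = {
    1: [(0.5, -3.0, 1.0), (0.5, 3.0, 1.0)],  # bimodal class
    2: [(1.0, 0.0, 1.0)],                    # single mode in between
}

for point in (-3.1, 0.2, 2.9):
    print(point, classify(point, class_gmms))
```

A single Gaussian per class could not represent class 1 here, since its two modes lie on either side of class 2.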

  17. Classification with GMMs. Example of binary classification using two Gaussian Mixture Models. Train each Gaussian Mixture Model separately, using the dataset of class 1 for GMM 1 and the dataset of class 2 for GMM 2 (3 Gaussians for each GMM).

  18. Practical Issues. • In practice, the population mean vectors $\mu_k$ and covariance matrices $\Sigma_k$ are estimated from the training set. $X^k = \{x_i^k\}_{i=1,\dots,n_k}$, $k = 1,\dots,K$: the training set, composed of $n_k$ data points per class. Estimated mean: $\hat{\mu}_k = \bar{x}^k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_i^k$. Estimated covariance: $\hat{\Sigma}_k = S_k = \frac{1}{n_k} \sum_{i=1}^{n_k} (x_i^k - \bar{x}^k)(x_i^k - \bar{x}^k)^T$. $S_k$ is also called the scatter matrix. Estimating these quantities from a finite training set leads to numerical imprecision in the maximum likelihood discriminant rule.
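
The two estimators of slide 18 can be sketched as follows. This is an illustration in plain Python; the function names and the four sample points are hypothetical, and the covariance uses the 1/n normalisation written on the slide (the unbiased estimator would divide by n - 1 instead).

```python
def estimate_mean(points):
    """Sample mean of a list of equal-length tuples."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(dim)]

def estimate_covariance(points):
    """Scatter matrix S = (1/n) * sum_i (x_i - mean)(x_i - mean)^T."""
    n = len(points)
    mean = estimate_mean(points)
    dim = len(mean)
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    return [[sum(c[r] * c[s] for c in centered) / n for s in range(dim)]
            for r in range(dim)]

# Hypothetical training points for one class.
class_1 = [(1.0, 2.0), (3.0, 2.0), (1.0, 4.0), (3.0, 4.0)]

print(estimate_mean(class_1))        # [2.0, 3.0]
print(estimate_covariance(class_1))  # [[1.0, 0.0], [0.0, 1.0]]
```

When a class has fewer points than dimensions, the estimated covariance is singular and cannot be inverted in the discriminant rule.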

  19. Classification with two GMMs, one per class. Each GMM has 2 Gaussians. Train each GMM (composed of 2 Gaussians each) separately, using the dataset of class 1 for the first GMM and the dataset of class 2 for the second GMM.

  20. From clustering to classification. Clustering with GMM: clustering does not have the class labels and hence ends up merging the classes. Classification with GMM using Naïve Bayes.

  21. Evaluating the classification.

  22. Estimating from sampling the datapoints. If one trains the algorithm with all datapoints, one cannot test whether the algorithm predicts well. To test the ability of the model to predict the class labels correctly, one trains the model using only a subset of datapoints sampled randomly and tests the prediction of the model on the datapoints not used during training. Figure legend: Class 1, Class 2.

  23. Estimating from sampling the datapoints. 1) Sample the datapoints. 2) Train the algorithm on the sampled points. 3) Test the prediction of the learned model on the rest of the points. Figure legend: Class 1, Class 2, sampled datapoints used for training, learned boundary between the classes, misclassified datapoint.

  24. Estimating from sampling the datapoints. 1) Pick another sample of datapoints. 2) Train the algorithm on the new sampled points. 3) Test the prediction of the learned model on the rest of the points. Crossvalidation: repeat the training/testing procedure several times and compute the average performance. Figure legend: Class 1, Class 2, sampled datapoints used for training, learned boundary between the classes, misclassified datapoint.

  25. ML in Practice: Training and Evaluation. The best practice for assessing the validity of a machine learning algorithm is to measure its performance on both the training and testing sets. These sets are built by partitioning the data set at hand. Diagram labels: Crossvalidation, Training Set, Testing Set.

  26. ML in Practice: Training and Evaluation. Training and validation sets are used to determine the sensitivity of the learning to the choice of hyperparameters (i.e. parameters not learned during training). Values for the hyperparameters are set through a grid search. Once the optimal hyperparameters have been picked, the model is trained with the complete training + validation set and tested on the testing set. In practice, one often uses solely training and testing sets and performs crossvalidation directly on these. Diagram labels: Crossvalidation, Training Set, Validation Set, Testing Set.
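
The train / validation / test procedure of slide 26 can be sketched as below. This is an illustration under stated assumptions: the hyperparameter searched is the covariance type (isotropic or diagonal, both mentioned in the earlier slides), each class is modeled by a single Gaussian, and the synthetic data, split sizes and all function names are hypothetical.

```python
import math
import random

def fit_gaussian(points, cov_type):
    """Estimate mean and per-axis variances; 'isotropic' shares one variance."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    var = [sum((p[j] - mean[j]) ** 2 for p in points) / n for j in range(dim)]
    if cov_type == "isotropic":
        var = [sum(var) / dim] * dim
    return mean, var

def score(x, model):
    """Mahalanobis distance plus log-determinant for a diagonal covariance."""
    mean, var = model
    return sum((xi - m) ** 2 / v + math.log(v) for xi, m, v in zip(x, mean, var))

def train(data, cov_type):
    labels = sorted({y for _, y in data})
    return {y: fit_gaussian([x for x, lab in data if lab == y], cov_type)
            for y in labels}

def accuracy(models, data):
    hits = sum(min(models, key=lambda label: score(x, models[label])) == y
               for x, y in data)
    return hits / len(data)

# Synthetic two-class data: narrow along the first axis, wide along the second.
rng = random.Random(0)
def sample(cx, label, n):
    return [((rng.gauss(cx, 0.5), rng.gauss(0.0, 3.0)), label) for _ in range(n)]

data = sample(0.0, 1, 100) + sample(2.0, 2, 100)
rng.shuffle(data)
train_set, val_set, test_set = data[:100], data[100:150], data[150:]

# Grid search: pick the hyperparameter value that scores best on the validation set.
grid = ["isotropic", "diagonal"]
best = max(grid, key=lambda c: accuracy(train(train_set, c), val_set))

# Retrain on training + validation data, then report accuracy on the testing set.
final_model = train(train_set + val_set, best)
test_accuracy = accuracy(final_model, test_set)
print(best, test_accuracy)
```

The testing set is touched only once, after the hyperparameter has been fixed, so its accuracy is not biased by the search.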

  27. Crossvalidation. Definition: "Cross validation is the practice of confirming an experimental finding by repeating the experiment using an independent assay technique". f-fold cross validation uses random splits of the whole dataset into train data and test data: • Constant Train/Test ratio. • At each iteration (fold f = 1, 2, …, F): 1) random split of the data between Train and Test; 2) repetition of classification. • Averaging of the result across folds.
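
The repeated random-split procedure of slide 27 can be sketched as follows. This is a minimal illustration, not the course's code: the classifier is a nearest-class-mean rule on 1-D points (the Gaussian ML rule with equal isotropic covariances), and the synthetic data, the 70/30 ratio and all names are hypothetical.

```python
import random

def fit_means(train):
    """Per-class mean of 1-D points; train is a list of (x, label)."""
    sums = {}
    for x, y in train:
        s, n = sums.get(y, (0.0, 0))
        sums[y] = (s + x, n + 1)
    return {y: s / n for y, (s, n) in sums.items()}

def predict(x, means):
    """Assign x to the class whose mean is closest."""
    return min(means, key=lambda y: (x - means[y]) ** 2)

def cross_validate(data, n_folds, train_ratio, seed=0):
    """Repeat a random train/test split n_folds times and average the accuracy."""
    rng = random.Random(seed)
    n_train = int(train_ratio * len(data))  # constant train/test ratio
    scores = []
    for _ in range(n_folds):
        shuffled = data[:]
        rng.shuffle(shuffled)               # 1) random split
        train, test = shuffled[:n_train], shuffled[n_train:]
        means = fit_means(train)            # 2) repeat the classification
        hits = sum(predict(x, means) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores), scores

rng = random.Random(1)
data = ([(rng.gauss(0.0, 1.0), 1) for _ in range(50)]
        + [(rng.gauss(4.0, 1.0), 2) for _ in range(50)])

mean_acc, fold_accs = cross_validate(data, n_folds=10, train_ratio=0.7)
print(mean_acc)
```

The spread of `fold_accs` around `mean_acc` gives a rough idea of how sensitive the result is to the particular split.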
