
Pattern Recognition
Bertrand Thirion and John Ashburner



  1. Pattern Recognition
     Bertrand Thirion and John Ashburner

     Outline:
     - Introduction
     - Generalization
     - Overview of the main methods
     - Resources

  2. Outline
     - Introduction
         Definitions
         Classification and Regression
         Curse of Dimensionality
     - Generalization
     - Overview of the main methods
     - Resources

  3. Some key concepts
     - Supervised learning: the data comes with additional attributes that we want to predict ⇒ classification and regression.
     - Unsupervised learning: no target values. Goals include discovering groups of similar examples within the data (clustering), determining the distribution of data within the input space (density estimation), or projecting the data down to two or three dimensions for visualization.
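A minimal sketch of the two settings (not from the slides; it assumes scikit-learn is available, and the synthetic two-dimensional data and model choices are purely illustrative):

    # Supervised vs unsupervised learning on synthetic data.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X = rng.randn(100, 2)                    # 100 observations, p = 2 features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # categorical targets

    # Supervised: learn a mapping from inputs to known targets.
    clf = LogisticRegression().fit(X, y)

    # Unsupervised: no targets; discover structure in the inputs (clustering).
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)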

  4. General supervised learning setting
     We have a training dataset of n observations, each consisting of an input x_i and a target y_i. Each input x_i consists of a vector of p features.

     D = { (x_i, y_i) | i = 1, ..., n }

     The aim is to predict the target for a new input x*.
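In code, D is simply an (n, p) array of inputs paired with a length-n array of targets. A sketch, again assuming scikit-learn, with an arbitrary classifier and synthetic data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    n, p = 50, 3
    X = rng.randn(n, p)               # inputs x_i: one row per observation
    y = (X[:, 0] > 0).astype(int)     # targets y_i

    model = LogisticRegression().fit(X, y)  # train on D = {(x_i, y_i)}
    x_star = rng.randn(1, p)                # a new input x*
    y_star = model.predict(x_star)          # predicted target for x*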

  5. Classification
     - Targets (y) are categorical labels.
     - Train with D and use the result to make the best guess of y* given x*.
     [Figure: two labelled classes scattered in the (Feature 1, Feature 2) plane.]

  6. Probabilistic classification
     - Targets (y) are categorical labels.
     - Train with D and compute P(y* = k | x*, D).
     [Figure: two-class data in the (Feature 1, Feature 2) plane, with class membership expressed probabilistically.]
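Most scikit-learn classifiers expose class probabilities through predict_proba; a small sketch (the data and the logistic model are illustrative assumptions, not from the slides):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X = rng.randn(100, 2)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    clf = LogisticRegression().fit(X, y)
    x_star = np.array([[0.5, -1.2]])
    # One probability per class k: [P(y* = 0 | x*, D), P(y* = 1 | x*, D)]
    print(clf.predict_proba(x_star))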

  7. Regression
     - Targets (y) are continuous real variables.
     - Train with D and compute p(y* | x*, D).
     [Figure: points in the (Feature 1, Feature 2) plane labelled with continuous target values.]
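One way to obtain a predictive distribution p(y* | x*, D) rather than a point estimate is a Bayesian linear model; a sketch assuming scikit-learn's BayesianRidge, on synthetic data:

    import numpy as np
    from sklearn.linear_model import BayesianRidge

    rng = np.random.RandomState(0)
    X = rng.randn(100, 2)
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.randn(100)  # continuous targets

    reg = BayesianRidge().fit(X, y)
    x_star = rng.randn(1, 2)
    # Gaussian approximation to p(y* | x*, D): predictive mean and std.
    mean, std = reg.predict(x_star, return_std=True)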

  8. Many other settings
     - Multi-class classification, when there are more than two possible categories.
     - Ordinal regression, for classification when there is some ordering of the categories. Chu, Wei, and Zoubin Ghahramani. "Gaussian processes for ordinal regression." Journal of Machine Learning Research 6 (2005): 1019-1041.
     - Multi-task learning, when there are multiple targets to predict, which may be related.
     - etc.

  9. Multi-class classification
     - Multinomial logistic regression: theoretically optimal, but the optimization is expensive.
     - One-versus-all classification [SVMs]: among several hyperplanes, choose the one with maximal margin. ⇒ recommended.
     - One-versus-one classification: vote across each pair of classes. Expensive, and not optimal.
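The three strategies map directly onto scikit-learn estimators; a sketch (the iris data is just a convenient three-class example, not from the slides):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)   # 3 classes

    # Multinomial logistic regression: one joint optimization over all classes.
    multi = LogisticRegression(max_iter=1000).fit(X, y)

    # One-versus-all: one max-margin hyperplane per class (recommended with SVMs).
    ova = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)

    # One-versus-one: train on every pair of classes, then vote.
    ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)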

  10. Curse of dimensionality
      Large p, small n.

  11. Nearest-neighbour classification
      - Does not give nice smooth separations; the decision boundary has lots of sharp corners.
      - May be improved with K-nearest neighbours.
      [Figure: jagged nearest-neighbour decision boundary in the (Feature 1, Feature 2) plane.]
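A sketch of the K-nearest-neighbours smoothing effect, assuming scikit-learn (the data and the two K values are illustrative):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

    # K = 1 gives a jagged boundary with sharp corners; a larger K
    # smooths the separation at the cost of some extra bias.
    for k in (1, 15):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
        print(k, knn.score(X, y))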

  12. Behaviour changes in high dimensions

  13. Behaviour changes in high dimensions
      - Circle area = π r^2
      - Sphere volume = 4/3 π r^3
      [Figure: volume of a hypersphere of radius r = 1/2 plotted against the number of dimensions (1 to 20); the volume falls rapidly towards zero.]
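The plotted quantity generalises the circle and sphere formulas: a d-dimensional hypersphere of radius r has volume π^(d/2) r^d / Γ(d/2 + 1). A short check with only the standard library:

    import math

    # V_d(r) = pi^(d/2) * r^d / Gamma(d/2 + 1)
    def hypersphere_volume(d, r=0.5):
        return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

    # The r = 1/2 sphere inscribed in the unit cube shrinks towards zero
    # volume: in high dimensions almost all of the cube lies in the corners.
    for d in (2, 3, 10, 20):
        print(d, hypersphere_volume(d))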

  14. Occam's razor
      "Everything should be kept as simple as possible, but no simpler." (Einstein, allegedly)
      - Complex models (with many estimated parameters) usually explain training data better than simpler models.
      - Simpler models often generalise better to new data than more complex models.
      - We need to find the model with the optimal bias/variance tradeoff.

  15. Bayesian model selection
      Real Bayesians don't cross-validate (except when they need to).

      P(M | D) = p(D | M) P(M) / p(D)

      The Bayes factor allows the plausibility of two models (M_1 and M_2) to be compared:

      K = p(D | M_1) / p(D | M_2)
        = ∫ p(D | θ_M1, M_1) p(θ_M1 | M_1) dθ_M1 / ∫ p(D | θ_M2, M_2) p(θ_M2 | M_2) dθ_M2

      This is usually too costly in practice, so approximations are used.
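A toy worked example where the integrals have closed forms (the coin-flip models are an illustration, not from the slides): compare a fair coin (M_1) against a coin with an unknown, uniformly distributed bias (M_2), given h heads in n flips:

    import math

    n, h = 20, 15
    comb = math.comb(n, h)

    # M1: theta fixed at 1/2, so p(D | M1) needs no integral.
    p_D_M1 = comb * 0.5 ** n

    # M2: theta ~ Uniform(0, 1); the integral over theta is a Beta function,
    # C(n, h) * B(h + 1, n - h + 1), which simplifies to 1 / (n + 1).
    p_D_M2 = comb * math.gamma(h + 1) * math.gamma(n - h + 1) / math.gamma(n + 2)

    K = p_D_M1 / p_D_M2
    print(K)   # K < 1 here: the data favour the flexible model M2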

  16. Model selection
      Some approximations/alternatives to the Bayesian approach:
      - Laplace approximation: find the MAP/ML solution and use a Gaussian approximation to the parameter uncertainty.
      - Minimum Message Length (MML): an information-theoretic approach.
      - Minimum Description Length (MDL): an information-theoretic approach based on how well the model compresses the data.
      - Akaike Information Criterion (AIC): -2 log p(D | θ) + 2k, where k is the number of estimated parameters.
      - Bayesian Information Criterion (BIC): -2 log p(D | θ) + k log q, where q is the number of observations.
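A worked example of the AIC and BIC formulas for a simple Gaussian model of one-dimensional data, where the maximum log-likelihood has a closed form (the data are synthetic; k = 2 covers the mean and the variance):

    import numpy as np

    rng = np.random.RandomState(0)
    data = rng.randn(100) * 2.0 + 5.0
    q = len(data)                       # number of observations
    k = 2                               # estimated parameters: mean, variance

    mu, var = data.mean(), data.var()   # maximum-likelihood estimates
    # For a Gaussian at its ML estimates: log p(D|theta) = -q/2 (log(2 pi var) + 1)
    log_lik = -0.5 * q * (np.log(2 * np.pi * var) + 1)

    aic = -2 * log_lik + 2 * k
    bic = -2 * log_lik + k * np.log(q)
    print(aic, bic)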

  17. Model selection by nested cross-validation
      - An inner cross-validation loop is used to evaluate the model's performance on a pre-defined grid of parameters and retain the best one. Safe, but costly.
      - Supported by some libraries (e.g. scikit-learn).
      - Some estimators have a path algorithm, which allows faster evaluation across a whole range of regularization parameters (e.g. LASSO).
      - Randomized search techniques also exist and are sometimes more efficient.
      - Caveat: the inner cross-validation loop (parameter selection) must be kept distinct from the outer cross-validation loop (performance evaluation).
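The standard scikit-learn idiom for nested cross-validation wraps a grid search (the inner loop) inside cross_val_score (the outer loop); the SVM, the grid, and the fold counts below are illustrative choices:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Inner loop: choose the best parameter from a pre-defined grid.
    inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

    # Outer loop: estimate the tuned model's performance on held-out folds.
    scores = cross_val_score(inner, X, y, cv=5)
    print(scores.mean())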

  18. Accuracy measures for regression
      - Root-mean-squared error, for point predictions.
      - Correlation coefficient, for point predictions.
      - Log predictive probability, for probabilistic predictions.
      - Expected loss/risk, for point predictions used in decision making.
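A plain-numpy sketch of the two point-prediction measures (the numbers are made up):

    import numpy as np

    y_true = np.array([3.0, 1.5, 4.2, 2.8])
    y_pred = np.array([2.7, 1.9, 4.0, 3.1])

    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root-mean-squared error
    r = np.corrcoef(y_true, y_pred)[0, 1]             # correlation coefficient
    print(rmse, r)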

  19. Accuracy measures for binary classification
      [Figure: table of sensitivity, specificity and related measures derived from the 2x2 confusion matrix.]
      Wikipedia contributors, "Sensitivity and specificity," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Sensitivity_and_specificity&oldid=655245669 (accessed April 9, 2015).
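Sensitivity and specificity follow directly from the 2x2 confusion matrix; a sketch assuming scikit-learn (the labels are made up):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    print(sensitivity, specificity)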

  20. Accuracy measures from the ROC curve
      - The receiver operating characteristic (ROC) curve is a plot of the true-positive rate (sensitivity) versus the false-positive rate (1 - specificity) over the full range of possible thresholds.
      - The area under the curve (AUC) is the integral under the ROC curve.
      [Figure: example ROC curve, with AUC = 0.9769.]
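A sketch of sweeping thresholds to get the ROC curve and its AUC, assuming scikit-learn (the labels and scores are synthetic):

    import numpy as np
    from sklearn.metrics import roc_auc_score, roc_curve

    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 2, size=200)
    scores = y_true + rng.randn(200)   # informative but noisy scores

    # One (false-positive rate, true-positive rate) point per threshold.
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    print(roc_auc_score(y_true, scores))   # area under the ROC curve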
