

  1. PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 1: INTRODUCTION

  2. Pattern Recognition Pattern: any regularity in the data X. Pattern recognition: the discovery of regularities in data through computer algorithms, and the use of those regularities to take actions (such as classification). X → Pattern Recognition → Y(X)

  3. Example Handwritten Digit Recognition

  4. Some Terminology Supervised learning: inputs with their corresponding outputs are known. • Classification: predicting an output that falls into a finite number of discrete categories. • Regression: predicting an output that is a continuous variable. Unsupervised learning (e.g. density estimation): • Clustering the data into groups

  5. Some Terminology Training set: a given set of sample input data used to tune the model parameters. Target vector: represents the desired output for a given input. Training phase: determining the precise form of y(X) from the training data. Generalization: the ability to correctly predict new data. Pre-processing: reducing the dimensionality of X

  6. Polynomial Curve Fitting

  7. Sum-of-Squares Error Function
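The sum-of-squares error on this slide is E(w) = ½ Σₙ {y(xₙ, w) − tₙ}², where y(x, w) is a polynomial with coefficients w. A minimal sketch minimizing it with NumPy's `polyfit`; the toy data (noisy samples of sin(2πx), as in the book's running example) and the noise level are my own choices, not from the slides:

```python
import numpy as np

# Hypothetical toy data: noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

M = 3  # polynomial order
# np.polyfit finds the coefficients w minimizing the sum of squared residuals.
w = np.polyfit(x, t, deg=M)
y = np.polyval(w, x)

# Sum-of-squares error: E(w) = 0.5 * sum_n (y(x_n, w) - t_n)^2
E = 0.5 * np.sum((y - t) ** 2)
```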

  8. 0th Order Polynomial

  9. 1st Order Polynomial

  10. 3rd Order Polynomial

  11. 9th Order Polynomial

  12. Over-fitting Root-Mean-Square (RMS) Error:
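The RMS error on this slide is E_RMS = √(2E(w*)/N), i.e. the root of the mean squared residual, which lets errors on training and test sets of different sizes be compared. A sketch of the over-fitting effect (toy data, seed, and model orders are my own choices): training error falls monotonically with the order M, while the M = 9 fit passes through all 10 training points yet typically generalizes worse.

```python
import numpy as np

def rms_error(w, x, t):
    """E_RMS = sqrt(2 E(w*) / N): root of the mean squared residual."""
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=10)
x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=100)

train_rms, test_rms = {}, {}
for M in (0, 1, 3, 9):
    w = np.polyfit(x_train, t_train, deg=M)
    train_rms[M] = rms_error(w, x_train, t_train)
    test_rms[M] = rms_error(w, x_test, t_test)
# Training error always decreases as M grows; the M = 9 polynomial
# interpolates all 10 points, so its training error is near zero.
```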

  13. Polynomial Coefficients

  14. Data Set Size: 9th Order Polynomial

  15. Data Set Size: 9th Order Polynomial

  16. Regularization Penalize large coefficient values

  17. Regularization:

  18. Regularization:

  19. Regularization: vs.
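Penalizing large coefficients means minimizing Ẽ(w) = ½ Σₙ {y(xₙ, w) − tₙ}² + (λ/2)‖w‖², which has the closed-form ridge solution w = (ΦᵀΦ + λI)⁻¹ Φᵀt for a polynomial design matrix Φ. A sketch under assumed toy data; the helper `fit_ridge_poly`, the data, and the seed are mine, and ln λ = −18 is the value shown on the book's figures:

```python
import numpy as np

def fit_ridge_poly(x, t, M, lam):
    """Minimize 0.5*||Phi w - t||^2 + 0.5*lam*||w||^2 (penalizes large coefficients)."""
    Phi = np.vander(x, M + 1, increasing=True)   # design matrix [1, x, ..., x^M]
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)

w_unreg = fit_ridge_poly(x, t, M=9, lam=0.0)
w_reg = fit_ridge_poly(x, t, M=9, lam=np.exp(-18))  # ln(lambda) = -18
# Regularization shrinks the coefficient magnitudes:
print(np.linalg.norm(w_unreg), np.linalg.norm(w_reg))
```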

  20. Polynomial Coefficients

  21. Probability Theory Apples and Oranges

  22. Probability Theory Marginal Probability Joint Probability Conditional Probability

  23. Probability Theory Sum Rule Product Rule

  24. The Rules of Probability Sum Rule Product Rule

  25. Bayes’ Theorem posterior ∝ likelihood × prior
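The apples-and-oranges example makes the theorem concrete. Assuming the numbers used in the book (a red box with 2 apples and 6 oranges picked with probability 0.4, a blue box with 3 apples and 1 orange picked with probability 0.6), the posterior probability that an observed orange came from the red box works out to 2/3:

```python
# Priors p(B) and likelihoods p(F=orange | B) for the two boxes.
p_box = {"red": 0.4, "blue": 0.6}
p_orange_given_box = {"red": 6 / 8, "blue": 1 / 4}

# Sum rule: p(F=orange) = sum_B p(F=orange | B) p(B)
p_orange = sum(p_orange_given_box[b] * p_box[b] for b in p_box)

# Bayes' theorem: p(B=red | F=orange) = p(F=orange | red) p(red) / p(F=orange)
posterior_red = p_orange_given_box["red"] * p_box["red"] / p_orange
print(posterior_red)  # ~ 2/3
```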

  26. Probability Densities

  27. Transformed Densities

  28. Expectations Conditional Expectation (discrete) Approximate Expectation (discrete and continuous)

  29. Variances and Covariances

  30. The Gaussian Distribution
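The univariate Gaussian referenced on this slide has the standard form

```latex
\mathcal{N}(x \mid \mu, \sigma^{2})
  = \frac{1}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left\{-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right\}
```

with mean μ and variance σ².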

  31. Gaussian Mean and Variance

  32. The Multivariate Gaussian

  33. Gaussian Parameter Estimation Likelihood function

  34. Maximum (Log) Likelihood
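Maximizing the log likelihood of N i.i.d. Gaussian samples gives the sample mean and the biased (divide-by-N) variance. A sketch on invented data (the true parameters, sample size, and seed are my own); the bias of the variance estimate, E[σ²_ML] = (N−1)σ²/N, is what the next slide's "properties" refer to:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=0.5, size=1000)  # true mu = 2.0, sigma^2 = 0.25

# Maximum-likelihood estimates for a univariate Gaussian:
mu_ml = np.mean(x)                    # sample mean
var_ml = np.mean((x - mu_ml) ** 2)    # divides by N (not N-1), hence biased

print(mu_ml, var_ml)
```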

  35. Properties of μ_ML and σ²_ML

  36. Curve Fitting Re-visited

  37. Maximum Likelihood Determine w_ML by minimizing the sum-of-squares error.

  38. Predictive Distribution

  39. MAP: A Step towards Bayes Determine w_MAP by minimizing the regularized sum-of-squares error.

  40. Bayesian Curve Fitting

  41. Bayesian Predictive Distribution

  42. Model Selection Cross-Validation
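A sketch of S-fold cross-validation for choosing the polynomial order: fit on all but one fold, score on the held-out fold, and average. The helper `kfold_rms`, the toy data, and K = 5 are my own choices for illustration:

```python
import numpy as np

def kfold_rms(x, t, M, K=5):
    """Average held-out RMS error of a degree-M polynomial over K folds."""
    folds = np.array_split(np.arange(len(x)), K)
    errs = []
    for k in range(K):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        w = np.polyfit(x[train], t[train], deg=M)
        errs.append(np.sqrt(np.mean((np.polyval(w, x[val]) - t[val]) ** 2)))
    return float(np.mean(errs))

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=30)

for M in (1, 3, 9):
    print(M, kfold_rms(x, t, M))
```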

  43. Curse of Dimensionality

  44. Curse of Dimensionality Polynomial curve fitting, M = 3 Gaussian Densities in higher dimensions

  45. Decision Theory Inference step: determine either p(x, t) or p(t | x). Decision step: for a given x, determine the optimal t.

  46. Minimum Misclassification Rate

  47. Minimum Expected Loss Example: classify medical images as ‘cancer’ or ‘normal’. [Loss matrix: decision vs. truth]

  48. Minimum Expected Loss Regions are chosen to minimize
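A sketch of the minimum-expected-loss rule for the cancer/normal example: for each possible decision j, compute the expected loss Σₖ L[k, j] · p(Cₖ | x) and pick the smallest. The specific loss values (1000 for a missed cancer, 1 for a false alarm, 0 for correct decisions) are an illustrative assumption:

```python
import numpy as np

# Loss matrix: rows = truth, columns = decision.
L = np.array([[0, 1000],   # truth = cancer: missing it costs 1000
              [1, 0]])     # truth = normal: a false alarm costs 1
classes = ["cancer", "normal"]

def decide(posterior):
    """Choose the class j minimizing the expected loss sum_k L[k, j] * p(C_k | x)."""
    expected_loss = posterior @ L
    return classes[int(np.argmin(expected_loss))]

# Even a small cancer posterior triggers the 'cancer' decision:
print(decide(np.array([0.01, 0.99])))      # expected losses [0.99, 10.0] -> 'cancer'
print(decide(np.array([0.0005, 0.9995])))  # below the ~1/1001 threshold -> 'normal'
```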

  49. Reject Option

  50. Why Separate Inference and Decision? • Minimizing risk (loss matrix may change over time) • Reject option • Unbalanced class priors • Combining models

  51. Decision Theory for Regression Inference step: determine p(t | x). Decision step: for a given x, make an optimal prediction y(x) for t. Loss function:

  52. The Squared Loss Function
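For the squared loss L(t, y(x)) = {y(x) − t}², the expected loss and its minimizer are

```latex
\mathbb{E}[L] = \iint \{y(\mathbf{x}) - t\}^{2}\, p(\mathbf{x}, t)\,\mathrm{d}\mathbf{x}\,\mathrm{d}t,
\qquad
y(\mathbf{x}) = \int t\, p(t \mid \mathbf{x})\,\mathrm{d}t = \mathbb{E}_{t}[t \mid \mathbf{x}]
```

so the optimal prediction is the conditional mean of t given x.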

  53. Generative vs. Discriminative Generative approach: model the joint distribution, then use Bayes’ theorem to obtain the posterior. Discriminative approach: model the posterior directly.

  54. Entropy Important quantity in • coding theory • statistical physics • machine learning

  55. Entropy Coding theory: x is discrete with 8 possible states; how many bits are needed to transmit the state of x? All states equally likely.
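The entropy H[x] = −Σ p(x) log₂ p(x) answers the question: 3 bits for 8 equally likely states, and only 2 bits on average for the non-uniform eight-state distribution used in the text (probabilities ½, ¼, ⅛, 1/16 and four states at 1/64). A sketch:

```python
import numpy as np

def entropy_bits(p):
    """H[x] = -sum_i p_i log2 p_i; states with p_i = 0 contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# 8 equally likely states need 3 bits:
print(entropy_bits([1 / 8] * 8))                           # 3.0
# The non-uniform example needs only 2 bits on average:
print(entropy_bits([1/2, 1/4, 1/8, 1/16] + [1/64] * 4))    # 2.0
```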

  56. Entropy

  57. Entropy In how many ways can N identical objects be allocated to M bins? Entropy is maximized when the objects are spread evenly across the bins.

  58. Entropy

  59. Differential Entropy Put bins of width Δ along the real line. Differential entropy is maximized (for fixed variance) by the Gaussian.
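Among all densities with a given variance σ², the Gaussian maximizes the differential entropy, which then evaluates to

```latex
H[x] = \frac{1}{2}\left\{1 + \ln\!\left(2\pi\sigma^{2}\right)\right\}
```

Note this can be negative when σ² is small, unlike the discrete entropy.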

  60. Conditional Entropy

  61. The Kullback-Leibler Divergence
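A sketch of the (discrete) KL divergence KL(p‖q) = Σᵢ pᵢ ln(pᵢ/qᵢ) on made-up distributions, illustrating that it is non-negative, zero exactly when p = q, and not symmetric in its arguments:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i ln(p_i / q_i); terms with p_i = 0 are dropped."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [1 / 3, 1 / 3, 1 / 3]

print(kl_divergence(p, q))   # positive: p differs from q
print(kl_divergence(p, p))   # 0.0: divergence of a distribution from itself
print(kl_divergence(p, q) == kl_divergence(q, p))  # False: KL is not symmetric
```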

  62. Mutual Information
