Introduction to Machine Learning
Yifeng Tao
School of Computer Science, Carnegie Mellon University
Slides adapted from Tom Mitchell, Eric Xing, Barnabas Poczos
Logistics
o Course website: http://www.cs.cmu.edu/~yifengt/courses/machine-learning (slides uploaded after each lecture)
o Time: Mon-Fri 9:50-11:30am lecture, 11:30am-12:00pm discussion
o Contact: yifengt@cs.cmu.edu
What is machine learning?
o Application areas: computer vision, natural language processing, computational biology
o Foundations: probability, statistics, calculus, linear algebra
Computer vision
o Object detection
[Figure from https://www.cvdeveloper.com/projects and Alex Krizhevsky et al.]
Natural language processing
o NER, translation, document classification…
[Figure from Jacob Devlin et al.]
Computational biology
o DNA-protein binding
[Figure from Haoyang Zeng et al.]
What is machine learning?
o What are we talking about when we talk about AI and ML?
o Artificial intelligence ⊇ machine learning ⊇ deep learning
What comes after this introduction?
o Beyond this course: probabilistic graphical models, deep learning
o Foundations underlying machine learning: conditional probability, learning theory, optimization
What is machine learning?
o Methods that generalize from observed data so that better decisions can be made in the future
o Supervised learning: given a set of features and values, learn a model that predicts a label for a new feature set
  o Regression: predict continuous values
  o Classification: predict discrete labels
o Unsupervised learning: discover patterns in data
o And more: e.g., transfer learning, semi-supervised learning, reinforcement learning
[Slide from Eric Xing et al.]
Supervised learning
o Goal of supervised learning: construct a predictor that minimizes a risk (performance measure)
o Not simply minimizing empirical error on the training data
o Hence the separate training and test sets
[Slide from Barnabas Poczos et al.]
Topics
o Supervised learning: linear models
o Kernel machines: SVMs and duality
o Unsupervised learning: latent space analysis and clustering
o Supervised learning: decision tree, kNN and model selection
o Learning theory
o Neural network (basics)
o Deep learning in CV and NLP
o Probabilistic graphical models
o Reinforcement learning and its application in clinical text mining
o Attention mechanism and transfer learning in precision medicine
Supervised learning: linear models
Yifeng Tao
Lecture 1, May 13, 2019
Example of regression
o Predicting restaurant reviews from factors:

i | Price | Distance | Cuisine | Review
1 | 30    | 21       | 7       | 4
2 | 15    | 12       | 8       | 2
3 | 27    | 53       | 9       | 5

[Slide from Barnabas Poczos et al.]
Empirical Risk Minimization (ERM)
o More in the learning theory part…
[Slide from Barnabas Poczos et al.]
Linear Regression
[Slide from Barnabas Poczos et al.]
Least Squares Estimator
[Slide from Barnabas Poczos et al.]
Normal Equations
[Slide from Barnabas Poczos et al.]
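The normal equations can be sketched numerically as follows (a minimal example; `A`, `y`, and the coefficients are illustrative synthetic data, not from the lecture):

```python
import numpy as np

# Least-squares estimator via the normal equations: beta = (A^T A)^{-1} A^T y.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))                     # design matrix: 50 samples, 3 features
true_beta = np.array([2.0, -1.0, 0.5])
y = A @ true_beta + 0.01 * rng.normal(size=50)   # targets with small Gaussian noise

# Solve A^T A beta = A^T y (np.linalg.solve is preferred over an explicit inverse)
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(beta_hat)  # close to [2.0, -1.0, 0.5]
```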
Cases where AᵀA is not invertible
o Gene expression data: n = 20,000, p = 50-4,000
o Regularization: Lasso
[Slide from Barnabas Poczos et al.]
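The slide points to regularization (Lasso) when AᵀA is singular. As a closed-form illustration of the same idea, here is ridge regression (the L2 analogue; Lasso itself has no closed form and needs an iterative solver) on a p > n problem — synthetic data, not the gene-expression set mentioned:

```python
import numpy as np

# With more features than samples, A^T A is rank-deficient and the plain
# normal equations fail. Ridge adds lam * I, which restores invertibility:
# beta = (A^T A + lam * I)^{-1} A^T y.
rng = np.random.default_rng(0)
n, p = 20, 100                       # p >> n, so rank(A^T A) <= 20 < 100
A = rng.normal(size=(n, p))
y = rng.normal(size=n)

lam = 1.0
beta_ridge = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)
print(beta_ridge.shape)              # (100,)
```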
Geometric Interpretation
[Slide from Barnabas Poczos et al.]
Pseudo Inverse (skip)
[Slide from Barnabas Poczos et al.]
Polynomial Regression
[Slide from Barnabas Poczos et al.]
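Polynomial regression is plain linear regression on a basis expansion; a minimal sketch with made-up data:

```python
import numpy as np

# Fit y = w0 + w1*x + w2*x^2 by least squares on the Vandermonde matrix,
# whose columns are the powers 1, x, x^2 of the input.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.01 * rng.normal(size=x.size)

M = 2                                            # polynomial degree
Phi = np.vander(x, M + 1, increasing=True)       # columns: 1, x, x^2
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # close to [1, 2, -3]
```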
Maximum Likelihood Estimation (MLE)
o Goal: estimate distribution parameters θ from a dataset of n independent, identically distributed (i.i.d.), fully observed training cases
o MLE is one of the most common estimators
o With the i.i.d. and full-observability assumptions, pick the setting of parameters most likely to have generated the data we saw: θ̂ = argmax_θ Π_i p(x_i; θ)
o Maximum conditional likelihood: maximize Π_i p(y_i | x_i; θ) instead
[Slide from Eric Xing et al.]
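As a concrete MLE instance (an illustrative example, not from the slide): for i.i.d. Gaussian data, the closed-form maximizers of the log-likelihood are the sample mean and the 1/n sample variance:

```python
import numpy as np

# MLE for a Gaussian: mu_hat = sample mean, sigma2_hat = (1/n) * sum (x - mu_hat)^2.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)  # true mu = 3, sigma^2 = 4

mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()           # note the 1/n, not 1/(n-1)
print(mu_hat, sigma2_hat)  # approximately 3.0 and 4.0
```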
Least Squares and MLE
[Slide from Barnabas Poczos et al.]
Least Squares and MLE
o By the independence assumption, the conditional likelihood factorizes over the training samples
o Therefore, maximizing the conditional likelihood is equivalent to minimizing the sum of squared errors
[Slide from Eric Xing et al.]
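The argument can be written out as follows (a reconstruction of the standard derivation, assuming the linear-Gaussian model with weights β and noise variance σ²):

```latex
% Assume y_i = \beta^\top x_i + \epsilon_i with \epsilon_i \sim \mathcal{N}(0, \sigma^2), i.i.d.
l(\beta) = \sum_{i=1}^{n} \log p(y_i \mid x_i; \beta)
         = -\frac{n}{2}\log(2\pi\sigma^2)
           - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \left(y_i - \beta^\top x_i\right)^2
```

The first term does not depend on β, so maximizing l(β) is exactly minimizing Σᵢ (yᵢ − βᵀxᵢ)²: least squares is MLE under Gaussian noise.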
Regularized Least Squares
o Recap the polynomial regression example
o Intuition of overfitting: very large weights
o How to solve/alleviate it?
[Figure from Christopher M. Bishop]
Maximum a Posteriori Estimation (MAP)
o Bayes' theorem: the posterior equals the likelihood times the prior, up to a normalizing constant
o Maximum a posteriori estimator (MAP): pick the parameters that maximize the posterior
o This allows us to capture uncertainty about the model in a principled way
Regularized Least Squares and MAP
[Slide from Barnabas Poczos et al.]
Example of classification
o Predicting restaurant reviews from features:

i | Price | Distance | Cuisine | Review
1 | 30    | 21       | 7       | Good
2 | 15    | 12       | 8       | Bad
3 | 27    | 53       | 9       | Good

[Slide from Barnabas Poczos et al.]
Logistic regression
o Logistic/sigmoid function: σ(z) = 1 / (1 + e^(-z))
o p: the probability that y is 1
[Figure from Wikipedia]
Logistic regression: MLE
o The likelihood function gives the conditional log-likelihood l(β)
o The negative log-likelihood -l(β) is also referred to as the "cross-entropy loss"
o Good news: l(β) is a concave function of β
o Bad news: there is no closed-form solution that maximizes l(β)
o Solution: iterative optimization algorithms (to be discussed)
[Slide from Tom Mitchell et al.]
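The conditional log-likelihood can be sketched in a few lines (illustrative `X`, `y`, and `beta`, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# l(beta) = sum_i [ y_i * log p_i + (1 - y_i) * log(1 - p_i) ],
# where p_i = sigmoid(beta^T x_i). Its negation is the cross-entropy loss.
def log_likelihood(beta, X, y):
    p = sigmoid(X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])  # first column = intercept
y = np.array([1.0, 0.0, 1.0])
beta = np.zeros(2)                                   # p_i = 0.5 for every sample
print(log_likelihood(beta, X, y))  # 3 * log(0.5), about -2.079
```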
Logistic regression: MAP
o Gaussian prior on β → L2 (ridge-style) regularization
o Laplacian prior on β → L1 (lasso-style) regularization
Maximizing conditional likelihood: gradient ascent
o Gradient ascent algorithm: iterate until the change is smaller than ε
o This applies to linear regression as well, although a closed-form solution exists there
[Slide from Tom Mitchell et al.]
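A minimal sketch of the gradient-ascent loop for logistic regression (the learning rate, ε, iteration cap, and toy data are all illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the logistic log-likelihood is X^T (y - p); ascend until the
# update is smaller than eps (or an iteration cap is hit).
def fit_logistic(X, y, lr=0.1, eps=1e-6, max_iter=10_000):
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        step = lr * (X.T @ (y - sigmoid(X @ beta)))
        beta += step
        if np.max(np.abs(step)) < eps:
            break
    return beta

# Tiny 1-D example with an intercept column (made-up data)
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
beta = fit_logistic(X, y)
print((sigmoid(X @ beta) > 0.5).astype(int))  # [0 0 1 1]
```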
Bayesian classifier

i | Price | Distance | Cuisine | Review
1 | 30    | 21       | 7       | Good
2 | 15    | 12       | 8       | Bad
3 | 27    | 53       | 9       | Good

o Generative model vs. discriminative model
[Slide from Tom Mitchell et al.]
Bayesian classifier
Naïve Bayes
o The full Bayesian classifier requires a large number of samples to train
o Naïve Bayes assumes the X_i are conditionally independent, given Y
o Therefore the classification rule for X_new = (X_1, …, X_n) is: predict the y maximizing P(Y = y) Π_i P(X_i | Y = y)
[Slide from Tom Mitchell et al.]
Naïve Bayes Algorithm
o Very fast to train/estimate!
[Slide from Tom Mitchell et al.]
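A sketch of the algorithm for binary features (a Bernoulli Naïve Bayes with add-one smoothing; the data and function names are illustrative, not the lecture's code):

```python
import numpy as np

# Training: estimate class priors P(Y=c) and conditionals P(X_i=1 | Y=c)
# by counting, with add-one (Laplace) smoothing.
def train_nb(X, y):
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    cond = {c: (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in classes}
    return priors, cond

# Prediction: pick the class with the largest log-posterior
# log P(Y=c) + sum_i log P(X_i = x_i | Y=c).
def predict_nb(x, priors, cond):
    def log_post(c):
        p = cond[c]
        return np.log(priors[c]) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return max(priors, key=log_post)

X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]])
y = np.array([1, 1, 0, 0])
priors, cond = train_nb(X, y)
print(predict_nb(np.array([1, 0, 1]), priors, cond))  # 1
```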
Bag of words: model the documents
o Example: 8th floor of Gates building, CMU
[Figure from https://twitter.com/smithamilli/status/837153616116985856]
Document classification
o The (independent) probability that the i-th word of a given document occurs in a document from class C: p(w_i | C)
o The probability that a given document D contains all of its words, given a class C: p(D | C) = Π_i p(w_i | C)
o Question: what is the probability that a given document D belongs to a given class C?
[Slide from Wikipedia]
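The answer follows from Bayes' rule (a reconstruction of the standard bag-of-words derivation the slide refers to):

```latex
p(C \mid D) = \frac{p(C)\, p(D \mid C)}{p(D)}
            \propto p(C) \prod_{i} p(w_i \mid C)
```

Since p(D) is the same for every class, the classifier simply picks the class C maximizing p(C) Π_i p(w_i | C).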
Continuous X_i in Naïve Bayes
[Slide from Tom Mitchell et al.]
Estimating parameters of GNB
o Y discrete, X_i continuous: model each P(X_i | Y = y_k) as a Gaussian with its own mean and variance
[Slide from Tom Mitchell et al.]
Inference of Gaussian Naïve Bayes
[Slide from Tom Mitchell et al.]
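Gaussian Naïve Bayes training and inference can be sketched as follows (illustrative one-feature data; the small variance floor is an assumption for numerical safety):

```python
import numpy as np

# Training: per class, estimate the prior and a per-feature Gaussian
# (mean and variance).
def fit_gnb(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (np.mean(y == c), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

# Inference: largest log-posterior, summing Gaussian log-densities per feature.
def predict_gnb(x, params):
    def log_post(c):
        prior, mu, var = params[c]
        return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return max(params, key=log_post)

X = np.array([[1.0], [1.2], [0.9], [5.0], [5.1], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])
params = fit_gnb(X, y)
print(predict_gnb(np.array([1.1]), params))  # 0
```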
Linear models in application
o R: glmnet package
  o Comprehensive regression formats: linear / logistic / Cox regression…
  o Flexible penalty forms: ridge, lasso, elastic net
  o Optimization algorithms with many heuristics, e.g., coordinate descent, warm starts…
  o Results can be analyzed in a few lines
o Python: scikit-learn package
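A few-line scikit-learn example of the penalized linear models above, mirroring what glmnet offers in R (synthetic data; the `alpha` values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Sparse ground truth: only the first 3 of 10 coefficients are nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta = np.zeros(10)
beta[:3] = [3.0, -2.0, 1.0]
y = X @ beta + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: drives some to exactly 0
print(np.sum(lasso.coef_ == 0))      # most irrelevant features are zeroed out
```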