Introduction to Machine Learning
Yifeng Tao
School of Computer Science, Carnegie Mellon University
Slides adapted from Tom Mitchell, Eric Xing, Barnabas Poczos
Logistics
o Course website: http://www.cs.cmu.edu/~yifengt/courses/machine-learning (slides uploaded after each lecture)
o Time: Mon-Fri 9:50-11:30am lecture, 11:30am-12:00pm discussion
o Contact: yifengt@cs.cmu.edu
What is machine learning?
o Application areas: computer vision, natural language processing, computational biology
o Foundations: probability, statistics, calculus, linear algebra
Computer vision
o Object detection
[Figure from https://www.cvdeveloper.com/projects and Alex Krizhevsky et al.]
Natural language processing
o NER, translation, document classification…
[Figure from Jacob Devlin et al.]
Computational biology
o DNA-protein binding
[Figure from Haoyang Zeng et al.]
What is machine learning?
o What are we talking about when we talk about AI and ML?
o Artificial intelligence ⊇ machine learning ⊇ deep learning
What comes after this introduction?
o Beyond this course: probabilistic graphical models, deep learning
o Foundations underlying machine learning: conditional probability, learning theory, optimization
What is machine learning?
o Methods that generalize from observed data so that better decisions can be made in the future
o Supervised learning: given a set of features and values, learn a model that predicts a label for a new feature set
  o Regression: predict continuous values
  o Classification: predict discrete labels
o Unsupervised learning: discover patterns in data
o And more: e.g., transfer learning, semi-supervised learning, reinforcement learning
[Slide from Eric Xing et al.]
Supervised learning
o Goal of supervised learning: construct a predictor that minimizes a risk (performance measure)
o Not simply minimizing empirical error on the training data
o Hence the separate training and test sets
[Slide from Barnabas Poczos et al.]
Topics
o Supervised learning: linear models
o Kernel machines: SVMs and duality
o Unsupervised learning: latent space analysis and clustering
o Supervised learning: decision tree, kNN and model selection
o Learning theory
o Neural network (basics)
o Deep learning in CV and NLP
o Probabilistic graphical models
o Reinforcement learning and its application in clinical text mining
o Attention mechanism and transfer learning in precision medicine
Supervised learning: linear models
Yifeng Tao
Lecture 1, May 13, 2019
Example of regression
o Predicting restaurant reviews from factors:

i | Price | Distance | Cuisine | Review
1 | 30    | 21       | 7       | 4
2 | 15    | 12       | 8       | 2
3 | 27    | 53       | 9       | 5

[Slide from Barnabas Poczos et al.]
Empirical Risk Minimization (ERM)
o More in the learning theory part…
[Slide from Barnabas Poczos et al.]
Linear Regression
[Slide from Barnabas Poczos et al.]
Least Squares Estimator
[Slide from Barnabas Poczos et al.]
Normal Equations
[Slide from Barnabas Poczos et al.]
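The normal equations can be sketched numerically as follows (a minimal example; `A`, `y`, and the coefficients are illustrative synthetic data, not from the lecture):

```python
import numpy as np

# Least-squares estimator via the normal equations: beta = (A^T A)^{-1} A^T y.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))                     # design matrix: 50 samples, 3 features
true_beta = np.array([2.0, -1.0, 0.5])
y = A @ true_beta + 0.01 * rng.normal(size=50)   # targets with small Gaussian noise

# Solve A^T A beta = A^T y (np.linalg.solve is preferred over an explicit inverse)
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(beta_hat)  # close to [2.0, -1.0, 0.5]
```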
Cases where AᵀA is not invertible
o Gene expression data: n = 20,000, p = 50-4,000
o Regularization: Lasso
[Slide from Barnabas Poczos et al.]
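The slide points to regularization (Lasso) when AᵀA is singular. As a closed-form illustration of the same idea, here is ridge regression (the L2 analogue; Lasso itself has no closed form and needs an iterative solver) on a p > n problem — synthetic data, not the gene-expression set mentioned:

```python
import numpy as np

# With more features than samples, A^T A is rank-deficient and the plain
# normal equations fail. Ridge adds lam * I, which restores invertibility:
# beta = (A^T A + lam * I)^{-1} A^T y.
rng = np.random.default_rng(0)
n, p = 20, 100                       # p >> n, so rank(A^T A) <= 20 < 100
A = rng.normal(size=(n, p))
y = rng.normal(size=n)

lam = 1.0
beta_ridge = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)
print(beta_ridge.shape)              # (100,)
```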
Geometric Interpretation
[Slide from Barnabas Poczos et al.]
Pseudo Inverse (skip)
[Slide from Barnabas Poczos et al.]
Polynomial Regression
[Slide from Barnabas Poczos et al.]
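Polynomial regression is plain linear regression on a basis expansion; a minimal sketch with made-up data:

```python
import numpy as np

# Fit y = w0 + w1*x + w2*x^2 by least squares on the Vandermonde matrix,
# whose columns are the powers 1, x, x^2 of the input.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.01 * rng.normal(size=x.size)

M = 2                                            # polynomial degree
Phi = np.vander(x, M + 1, increasing=True)       # columns: 1, x, x^2
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # close to [1, 2, -3]
```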
Maximum Likelihood Estimation (MLE)
o Goal: estimate distribution parameters θ from a dataset of n independent, identically distributed (i.i.d.), fully observed training cases
o MLE is one of the most common estimators
o With the i.i.d. and full-observability assumptions, pick the setting of parameters most likely to have generated the data we saw: θ̂ = argmax_θ Π_i p(x_i; θ)
o Maximum conditional likelihood: maximize Π_i p(y_i | x_i; θ) instead
[Slide from Eric Xing et al.]
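As a concrete MLE instance (an illustrative example, not from the slide): for i.i.d. Gaussian data, the closed-form maximizers of the log-likelihood are the sample mean and the 1/n sample variance:

```python
import numpy as np

# MLE for a Gaussian: mu_hat = sample mean, sigma2_hat = (1/n) * sum (x - mu_hat)^2.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)  # true mu = 3, sigma^2 = 4

mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()           # note the 1/n, not 1/(n-1)
print(mu_hat, sigma2_hat)  # approximately 3.0 and 4.0
```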
Least Squares and MLE
[Slide from Barnabas Poczos et al.]
Least Squares and MLE
o By the independence assumption, the conditional likelihood factorizes over the training samples
o Therefore, maximizing the conditional likelihood is equivalent to minimizing the sum of squared errors
[Slide from Eric Xing et al.]
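The argument can be written out as follows (a reconstruction of the standard derivation, assuming the linear-Gaussian model with weights β and noise variance σ²):

```latex
% Assume y_i = \beta^\top x_i + \epsilon_i with \epsilon_i \sim \mathcal{N}(0, \sigma^2), i.i.d.
l(\beta) = \sum_{i=1}^{n} \log p(y_i \mid x_i; \beta)
         = -\frac{n}{2}\log(2\pi\sigma^2)
           - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \left(y_i - \beta^\top x_i\right)^2
```

The first term does not depend on β, so maximizing l(β) is exactly minimizing Σᵢ (yᵢ − βᵀxᵢ)²: least squares is MLE under Gaussian noise.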
Regularized Least Squares
o Recap the polynomial regression example
o Intuition of overfitting: very large weights
o How to solve/alleviate it?
[Figure from Christopher M. Bishop]
Maximum a Posteriori Estimation (MAP)
o Bayes' theorem: the posterior equals the likelihood times the prior, up to a normalizing constant
o Maximum a posteriori estimator (MAP): pick the parameters that maximize the posterior
o This allows us to capture uncertainty about the model in a principled way
Regularized Least Squares and MAP
[Slide from Barnabas Poczos et al.]
Example of classification
o Predicting restaurant reviews from features:

i | Price | Distance | Cuisine | Review
1 | 30    | 21       | 7       | Good
2 | 15    | 12       | 8       | Bad
3 | 27    | 53       | 9       | Good

[Slide from Barnabas Poczos et al.]
Logistic regression
o Logistic/sigmoid function: σ(z) = 1 / (1 + e^(-z))
o p: the probability that y is 1
[Figure from Wikipedia]
Logistic regression: MLE
o The likelihood function gives the conditional log-likelihood l(β)
o The negative log-likelihood -l(β) is also referred to as the "cross-entropy loss"
o Good news: l(β) is a concave function of β
o Bad news: there is no closed-form solution that maximizes l(β)
o Solution: iterative optimization algorithms (to be discussed)
[Slide from Tom Mitchell et al.]
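The conditional log-likelihood can be sketched in a few lines (illustrative `X`, `y`, and `beta`, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# l(beta) = sum_i [ y_i * log p_i + (1 - y_i) * log(1 - p_i) ],
# where p_i = sigmoid(beta^T x_i). Its negation is the cross-entropy loss.
def log_likelihood(beta, X, y):
    p = sigmoid(X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])  # first column = intercept
y = np.array([1.0, 0.0, 1.0])
beta = np.zeros(2)                                   # p_i = 0.5 for every sample
print(log_likelihood(beta, X, y))  # 3 * log(0.5), about -2.079
```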
Logistic regression: MAP
o Gaussian prior on β → L2 (ridge-style) regularization
o Laplacian prior on β → L1 (lasso-style) regularization
Maximizing conditional likelihood: gradient ascent
o Gradient ascent algorithm: iterate until the change is smaller than ε
o This applies to linear regression as well, although a closed-form solution exists there
[Slide from Tom Mitchell et al.]
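A minimal sketch of the gradient-ascent loop for logistic regression (the learning rate, ε, iteration cap, and toy data are all illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the logistic log-likelihood is X^T (y - p); ascend until the
# update is smaller than eps (or an iteration cap is hit).
def fit_logistic(X, y, lr=0.1, eps=1e-6, max_iter=10_000):
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        step = lr * (X.T @ (y - sigmoid(X @ beta)))
        beta += step
        if np.max(np.abs(step)) < eps:
            break
    return beta

# Tiny 1-D example with an intercept column (made-up data)
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
beta = fit_logistic(X, y)
print((sigmoid(X @ beta) > 0.5).astype(int))  # [0 0 1 1]
```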
Bayesian classifier

i | Price | Distance | Cuisine | Review
1 | 30    | 21       | 7       | Good
2 | 15    | 12       | 8       | Bad
3 | 27    | 53       | 9       | Good

o Generative model vs. discriminative model
[Slide from Tom Mitchell et al.]
Bayesian classifier
Naïve Bayes
o The full Bayesian classifier requires a large number of samples to train
o Naïve Bayes assumes the X_i are conditionally independent, given Y
o Therefore the classification rule for X_new = (X_1, …, X_n) is: predict the y maximizing P(Y = y) Π_i P(X_i | Y = y)
[Slide from Tom Mitchell et al.]
Naïve Bayes Algorithm
o Very fast to train/estimate!
[Slide from Tom Mitchell et al.]
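A sketch of the algorithm for binary features (a Bernoulli Naïve Bayes with add-one smoothing; the data and function names are illustrative, not the lecture's code):

```python
import numpy as np

# Training: estimate class priors P(Y=c) and conditionals P(X_i=1 | Y=c)
# by counting, with add-one (Laplace) smoothing.
def train_nb(X, y):
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    cond = {c: (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in classes}
    return priors, cond

# Prediction: pick the class with the largest log-posterior
# log P(Y=c) + sum_i log P(X_i = x_i | Y=c).
def predict_nb(x, priors, cond):
    def log_post(c):
        p = cond[c]
        return np.log(priors[c]) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return max(priors, key=log_post)

X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]])
y = np.array([1, 1, 0, 0])
priors, cond = train_nb(X, y)
print(predict_nb(np.array([1, 0, 1]), priors, cond))  # 1
```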
Bag of words: model the documents
o Example: 8th floor of Gates building, CMU
[Figure from https://twitter.com/smithamilli/status/837153616116985856]
Document classification
o The (independent) probability that the i-th word of a given document occurs in a document from class C: p(w_i | C)
o The probability that a given document D contains all of its words, given a class C: p(D | C) = Π_i p(w_i | C)
o Question: what is the probability that a given document D belongs to a given class C?
[Slide from Wikipedia]
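The answer follows from Bayes' rule (a reconstruction of the standard bag-of-words derivation the slide refers to):

```latex
p(C \mid D) = \frac{p(C)\, p(D \mid C)}{p(D)}
            \propto p(C) \prod_{i} p(w_i \mid C)
```

Since p(D) is the same for every class, the classifier simply picks the class C maximizing p(C) Π_i p(w_i | C).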
Continuous X_i in Naïve Bayes
[Slide from Tom Mitchell et al.]
Estimating parameters of GNB
o Y discrete, X_i continuous: model each P(X_i | Y = y_k) as a Gaussian with its own mean and variance
[Slide from Tom Mitchell et al.]
Inference of Gaussian Naïve Bayes
[Slide from Tom Mitchell et al.]
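Gaussian Naïve Bayes training and inference can be sketched as follows (illustrative one-feature data; the small variance floor is an assumption for numerical safety):

```python
import numpy as np

# Training: per class, estimate the prior and a per-feature Gaussian
# (mean and variance).
def fit_gnb(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (np.mean(y == c), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

# Inference: largest log-posterior, summing Gaussian log-densities per feature.
def predict_gnb(x, params):
    def log_post(c):
        prior, mu, var = params[c]
        return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return max(params, key=log_post)

X = np.array([[1.0], [1.2], [0.9], [5.0], [5.1], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])
params = fit_gnb(X, y)
print(predict_gnb(np.array([1.1]), params))  # 0
```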
Linear models in application
o R: glmnet package
  o Comprehensive regression formats: linear / logistic / Cox regression…
  o Flexible penalty forms: ridge, lasso, elastic net
  o Optimization algorithms with many heuristics, e.g., coordinate descent, warm starts…
  o Results can be analyzed in a few lines
o Python: scikit-learn package
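A few-line scikit-learn example of the penalized linear models above, mirroring what glmnet offers in R (synthetic data; the `alpha` values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Sparse ground truth: only the first 3 of 10 coefficients are nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta = np.zeros(10)
beta[:3] = [3.0, -2.0, 1.0]
y = X @ beta + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: drives some to exactly 0
print(np.sum(lasso.coef_ == 0))      # most irrelevant features are zeroed out
```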