Machine Learning Basics Lecture 1: Linear Regression Princeton University COS 495 Instructor: Yingyu Liang
Machine learning basics
What is machine learning? • "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Machine Learning, Tom Mitchell, 1997)
Example 1: image classification Task: determine if the image is indoor or outdoor Performance measure: probability of misclassification
Example 1: image classification
Experience/Data: images annotated with indoor/outdoor labels
[Figure: example photos labeled indoor and outdoor]
Example 1: image classification
• A few terms
• Training data: the images given for learning
• Test data: the images to be classified
• Binary classification: classifying into two classes
Example 1: image classification (multi-class)
ImageNet figure borrowed from vision.stanford.edu
Example 2: clustering images Task: partition the images into 2 groups Performance: similarities within groups Data: a set of images
Example 2: clustering images
• A few terms
• Unlabeled data vs. labeled data
• Supervised learning vs. unsupervised learning
Math formulation
• Extract features: feature vector x_i = color histogram of the image (red, green, blue channels)
• Label: y_i = 0 (indoor)
Math formulation
• Extract features: feature vector x_j = color histogram of the image (red, green, blue channels)
• Label: y_j = 1 (outdoor)
Math formulation
• Given training data (x_i, y_i), 1 ≤ i ≤ n
• Find y = f(x) using the training data
• s.t. f is correct on test data
What kind of functions?
Math formulation
• Given training data (x_i, y_i), 1 ≤ i ≤ n
• Find y = f(x) ∈ 𝓗 using the training data
• s.t. f is correct on test data
𝓗 is the hypothesis class
Math formulation
• Given training data (x_i, y_i), 1 ≤ i ≤ n
• Find y = f(x) ∈ 𝓗 using the training data
• s.t. f is correct on test data
Connection between training data and test data?
Math formulation
• Given training data (x_i, y_i), 1 ≤ i ≤ n, i.i.d. from distribution D
• Find y = f(x) ∈ 𝓗 using the training data
• s.t. f is correct on test data, also i.i.d. from distribution D
Training and test data have the same distribution
i.i.d.: independent and identically distributed
Math formulation
• Given training data (x_i, y_i), 1 ≤ i ≤ n, i.i.d. from distribution D
• Find y = f(x) ∈ 𝓗 using the training data
• s.t. f is correct on test data, also i.i.d. from distribution D
What kind of performance measure?
Math formulation
• Given training data (x_i, y_i), 1 ≤ i ≤ n, i.i.d. from distribution D
• Find y = f(x) ∈ 𝓗 using the training data
• s.t. the expected loss is small: L(f) = 𝔼_{(x,y)~D}[l(f, x, y)]
Various loss functions are possible
Math formulation
• Given training data (x_i, y_i), 1 ≤ i ≤ n, i.i.d. from distribution D
• Find y = f(x) ∈ 𝓗 using the training data
• s.t. the expected loss is small: L(f) = 𝔼_{(x,y)~D}[l(f, x, y)]
• Examples of loss functions:
• 0-1 loss: l(f, x, y) = 𝕀[f(x) ≠ y], so L(f) = Pr[f(x) ≠ y]
• ℓ2 loss: l(f, x, y) = (f(x) − y)², so L(f) = 𝔼(f(x) − y)²
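To make the two losses concrete, here is a minimal sketch (the function names and toy values are my own, not from the slides):

```python
def zero_one_loss(f_x, y):
    # 0-1 loss: 1 if the prediction disagrees with the label, else 0.
    return float(f_x != y)

def l2_loss(f_x, y):
    # l2 (squared) loss: squared difference between prediction and label.
    return (f_x - y) ** 2

# Example: predicted label 1 ("outdoor"), true label 0 ("indoor").
print(zero_one_loss(1, 0))  # 1.0
print(l2_loss(1, 0))        # 1
```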
Math formulation
• Given training data (x_i, y_i), 1 ≤ i ≤ n, i.i.d. from distribution D
• Find y = f(x) ∈ 𝓗 using the training data
• s.t. the expected loss is small: L(f) = 𝔼_{(x,y)~D}[l(f, x, y)]
How to use the training data?
Math formulation
• Given training data (x_i, y_i), 1 ≤ i ≤ n, i.i.d. from distribution D
• Find y = f(x) ∈ 𝓗 that minimizes the empirical loss L̂(f) = (1/n) Σ_{i=1}^n l(f, x_i, y_i)
• s.t. the expected loss L(f) = 𝔼_{(x,y)~D}[l(f, x, y)] is small
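As a sketch of how the empirical loss guides learning (an illustration with invented toy data): compute L̂ for two candidate hypotheses and keep the one with the smaller value.

```python
import numpy as np

def empirical_loss(f, xs, ys, loss):
    # L_hat(f) = (1/n) * sum_i loss(f(x_i), y_i)
    return np.mean([loss(f(x), y) for x, y in zip(xs, ys)])

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.1, 0.9, 2.1, 2.9])

f1 = lambda x: x          # candidate hypothesis 1
f2 = lambda x: 0.5 * x    # candidate hypothesis 2
sq = lambda p, y: (p - y) ** 2

# Empirical risk minimization keeps the candidate with the
# smaller empirical loss (f1 here).
print(empirical_loss(f1, xs, ys, sq), empirical_loss(f2, xs, ys, sq))
```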
Machine learning 1-2-3
• Collect data and extract features
• Build model: choose hypothesis class 𝓗 and loss function l
• Optimization: minimize the empirical loss
Wait…
• Why handcraft the feature vector x?
• We can use prior knowledge to design suitable features
• Can the computer learn features from the raw images?
• Learning features directly from the raw images: Representation Learning
• Deep Learning ⊆ Representation Learning ⊆ Machine Learning ⊆ Artificial Intelligence
Wait…
• Does MachineLearning-1-2-3 cover all approaches?
• It covers many, but not all
• Our current focus will be MachineLearning-1-2-3
Example: Stock Market Prediction
(Disclaimer: synthetic data / in another parallel universe)
[Figure: stock prices of Orange, MacroHard, and Ackermann, 2013-2016]
A sliding window over time serves as the input x; note that such samples are non-i.i.d.
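A minimal sketch of the sliding-window construction, assuming a univariate price series; the window length and names are illustrative choices:

```python
import numpy as np

def sliding_windows(series, window):
    # Each row of X holds `window` consecutive prices; y is the next price.
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    return X, y

prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3]
X, y = sliding_windows(prices, window=3)
print(X.shape, y.shape)  # (4, 3) (4,)
# Note: consecutive windows overlap, so the samples are not i.i.d.
```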
Linear regression
Real data: prostate cancer study by Stamey et al. (1989)
• y: level of prostate-specific antigen
• (x_1, …, x_8): clinical measures
Figure borrowed from The Elements of Statistical Learning
Linear regression
• Given training data (x_i, y_i), 1 ≤ i ≤ n, i.i.d. from distribution D
• Find f_w(x) = w^T x (the hypothesis class 𝓗) that minimizes L̂(f_w) = (1/n) Σ_{i=1}^n (w^T x_i − y_i)²
• This is the ℓ2 loss, also called the mean squared error
Linear regression: optimization
• Given training data (x_i, y_i), 1 ≤ i ≤ n, i.i.d. from distribution D
• Find f_w(x) = w^T x that minimizes L̂(f_w) = (1/n) Σ_{i=1}^n (w^T x_i − y_i)²
• Let X be the matrix whose i-th row is x_i^T, and let y = (y_1, …, y_n)^T. Then
L̂(f_w) = (1/n) Σ_{i=1}^n (w^T x_i − y_i)² = (1/n) ‖Xw − y‖₂²
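A quick numerical check (my own sketch, with random data) that the averaged per-sample loss equals the matrix form (1/n)‖Xw − y‖₂²:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))   # i-th row is x_i^T
y = rng.normal(size=n)
w = rng.normal(size=d)

# Per-sample sum vs. matrix norm: the two forms agree.
loss_sum = np.mean([(w @ X[i] - y[i]) ** 2 for i in range(n)])
loss_mat = np.linalg.norm(X @ w - y) ** 2 / n
print(np.isclose(loss_sum, loss_mat))  # True
```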
Linear regression: optimization
• Set the gradient to 0 to get the minimizer:
∇_w L̂(f_w) = ∇_w (1/n) ‖Xw − y‖₂² = 0
⟹ ∇_w [(Xw − y)^T (Xw − y)] = 0
⟹ ∇_w [w^T X^T X w − 2 w^T X^T y + y^T y] = 0
⟹ 2 X^T X w − 2 X^T y = 0
⟹ w = (X^T X)^{−1} X^T y
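To sanity-check the derivation, the closed-form gradient (2/n)(X^T X w − X^T y) can be compared against finite differences (a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 20, 4
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

# Closed-form gradient of (1/n)||Xw - y||^2: (2/n)(X^T X w - X^T y).
grad = 2.0 / n * (X.T @ X @ w - X.T @ y)

# Central finite-difference approximation, coordinate by coordinate.
eps = 1e-6
L = lambda w: np.linalg.norm(X @ w - y) ** 2 / n
num = np.array([(L(w + eps * e) - L(w - eps * e)) / (2 * eps)
                for e in np.eye(d)])
print(np.allclose(grad, num, atol=1e-5))  # True
```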
Linear regression: optimization
• Algebraic view of the minimizer
• If X were square and invertible, just solve Xw = y to get w = X^{−1} y
• But typically X is a tall matrix, so solve X^T X w = X^T y instead
• Normal equation: w = (X^T X)^{−1} X^T y
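A sketch of computing the minimizer numerically. In practice one solves the linear system X^T X w = X^T y (or uses a least-squares routine) rather than forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 8
X = rng.normal(size=(n, d))                  # tall data matrix
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)   # noisy linear targets

# Solve X^T X w = X^T y (preferred over inverting X^T X directly).
w_normal = np.linalg.solve(X.T @ X, X.T @ y)
# Equivalent least-squares solver, more robust if X is rank-deficient.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_normal, w_lstsq))  # True
```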
Linear regression with bias
• Given training data (x_i, y_i), 1 ≤ i ≤ n, i.i.d. from distribution D
• Find f_{w,b}(x) = w^T x + b (b is the bias term) to minimize the loss
• Reduce to the case without bias:
• Let w′ = [w; b] and x′ = [x; 1]
• Then f_{w,b}(x) = w^T x + b = (w′)^T x′ (see the sketch below)
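A minimal sketch of this reduction (my own illustration with invented data): append a constant-1 feature to every input, so the bias becomes the last coordinate of the augmented weight vector.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 2
X = rng.normal(size=(n, d))
y = X @ np.array([1.5, -2.0]) + 0.7          # true bias b = 0.7

X1 = np.hstack([X, np.ones((n, 1))])         # x' = [x; 1]
w1 = np.linalg.solve(X1.T @ X1, X1.T @ y)    # w' = [w; b]
print(w1)  # approximately [ 1.5, -2.0,  0.7]
```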