
Lecture 1: Linear Regression (Princeton University COS 495)



  1. Machine Learning Basics Lecture 1: Linear Regression Princeton University COS 495 Instructor: Yingyu Liang

  2. Machine learning basics

  3. What is machine learning? • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” ------- Machine Learning, Tom Mitchell, 1997

  4. Example 1: image classification • Task: determine if the image is indoor or outdoor • Performance measure: probability of misclassification

  5. Example 1: image classification • Experience/Data: images with labels (indoor or outdoor) • [Figure: example photos labeled indoor and outdoor]

  6. Example 1: image classification • A few terms • Training data: the images given for learning • Test data: the images to be classified • Binary classification: classify into two classes

  7. Example 1: image classification (multi-class) • ImageNet figure borrowed from vision.stanford.edu

  8. Example 2: clustering images • Task: partition the images into 2 groups • Performance: similarities within groups • Data: a set of images

  9. Example 2: clustering images • A few terms • Unlabeled data vs. labeled data • Supervised learning vs. unsupervised learning

  10. Math formulation • Extract features: a color histogram over the red, green, and blue channels gives the feature vector $x_i$ • Label: $y_i = 0$ (indoor)

  11. Math formulation • Extract features: a color histogram over the red, green, and blue channels gives the feature vector $x_j$ • Label: $y_j = 1$ (outdoor)
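As an aside, here is one way the feature-extraction step above could look in code: a minimal color-histogram sketch in Python/NumPy. The 8-bin choice, the function name, and the image layout are assumptions for illustration; the lecture does not specify an implementation.

```python
import numpy as np

def color_histogram(image, bins=8):
    # `image` is assumed to be an H x W x 3 uint8 array (red, green, blue).
    feats = []
    for channel in range(3):
        hist, _ = np.histogram(image[:, :, channel], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalize so image size does not matter
    return np.concatenate(feats)  # feature vector x_i of length 3 * bins
```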

  12. Math formulation • Given training data $(x_i, y_i): 1 \le i \le n$ • Find $y = f(x)$ using training data • s.t. $f$ correct on test data • What kind of functions?

  13. Math formulation • Given training data $(x_i, y_i): 1 \le i \le n$ • Find $y = f(x) \in \mathcal{H}$ using training data • s.t. $f$ correct on test data • $\mathcal{H}$: the hypothesis class

  14. Math formulation • Given training data $(x_i, y_i): 1 \le i \le n$ • Find $y = f(x) \in \mathcal{H}$ using training data • s.t. $f$ correct on test data • Connection between training data and test data?

  15. Math formulation • Given training data $(x_i, y_i): 1 \le i \le n$ i.i.d. from distribution $D$ • Find $y = f(x) \in \mathcal{H}$ using training data • s.t. $f$ correct on test data, also i.i.d. from distribution $D$ • Training and test data have the same distribution • i.i.d.: independent and identically distributed

  16. Math formulation • Given training data $(x_i, y_i): 1 \le i \le n$ i.i.d. from distribution $D$ • Find $y = f(x) \in \mathcal{H}$ using training data • s.t. $f$ correct on test data, i.i.d. from distribution $D$ • What kind of performance measure?

  17. Math formulation • Given training data $(x_i, y_i): 1 \le i \le n$ i.i.d. from distribution $D$ • Find $y = f(x) \in \mathcal{H}$ using training data • s.t. the expected loss is small: $L(f) = \mathbb{E}_{(x,y) \sim D}[l(f, x, y)]$ • Various loss functions

  18. Math formulation • Given training data $(x_i, y_i): 1 \le i \le n$ i.i.d. from distribution $D$ • Find $y = f(x) \in \mathcal{H}$ using training data • s.t. the expected loss is small: $L(f) = \mathbb{E}_{(x,y) \sim D}[l(f, x, y)]$ • Examples of loss functions: • 0-1 loss: $l(f, x, y) = \mathbb{I}[f(x) \ne y]$ and $L(f) = \Pr[f(x) \ne y]$ • $l_2$ loss: $l(f, x, y) = [f(x) - y]^2$ and $L(f) = \mathbb{E}[f(x) - y]^2$
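A minimal Python sketch of these two losses (the function names and the predictor `f` are illustrative, not from the lecture):

```python
def zero_one_loss(f, x, y):
    # 0-1 loss: l(f, x, y) = I[f(x) != y]
    return float(f(x) != y)

def l2_loss(f, x, y):
    # l2 loss: l(f, x, y) = (f(x) - y)^2
    return (f(x) - y) ** 2
```

Averaging the 0-1 loss over draws from $D$ estimates $\Pr[f(x) \ne y]$, matching the expected-loss definition above.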

  19. Math formulation • Given training data $(x_i, y_i): 1 \le i \le n$ i.i.d. from distribution $D$ • Find $y = f(x) \in \mathcal{H}$ using training data • s.t. the expected loss is small: $L(f) = \mathbb{E}_{(x,y) \sim D}[l(f, x, y)]$ • How to use?

  20. Math formulation • Given training data $(x_i, y_i): 1 \le i \le n$ i.i.d. from distribution $D$ • Find $y = f(x) \in \mathcal{H}$ that minimizes the empirical loss $\hat{L}(f) = \frac{1}{n} \sum_{i=1}^{n} l(f, x_i, y_i)$ • s.t. the expected loss $L(f) = \mathbb{E}_{(x,y) \sim D}[l(f, x, y)]$ is small
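The empirical loss is just the sample average of the pointwise loss over the training set. A sketch, continuing the illustrative names from the previous snippet:

```python
def empirical_loss(f, loss, xs, ys):
    # L_hat(f) = (1/n) * sum_i loss(f, x_i, y_i)
    return sum(loss(f, x, y) for x, y in zip(xs, ys)) / len(xs)
```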

  21. Machine learning 1-2-3 • Collect data and extract features • Build model: choose hypothesis class $\mathcal{H}$ and loss function $l$ • Optimization: minimize the empirical loss

  22. Wait… • Why handcraft the feature vector $x$? • Can use prior knowledge to design suitable features • Can the computer learn features from the raw images? • Learning features directly from the raw images: Representation Learning • Deep Learning ⊆ Representation Learning ⊆ Machine Learning ⊆ Artificial Intelligence

  23. Wait… • Does Machine Learning 1-2-3 include all approaches? • It includes many, but not all • Our current focus will be Machine Learning 1-2-3

  24. Example: Stock Market Prediction • Stock market data (disclaimer: synthetic data / in another parallel universe) • [Figure: price curves for Orange, MacroHard, and Ackermann, 2013–2016] • A sliding window over time serves as input $x$; note the resulting data are non-i.i.d. (see the sketch below)
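A sketch of the sliding-window construction the slide alludes to: each window of past prices becomes an input $x$ and the next price the target $y$. The window length and names are illustrative assumptions.

```python
import numpy as np

def sliding_windows(prices, window=5):
    # x_i = (prices[i], ..., prices[i+window-1]),  y_i = prices[i+window].
    # Consecutive pairs overlap in time, hence the data are not i.i.d.
    X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
    y = np.array(prices[window:])
    return X, y
```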

  25. Linear regression

  26. Real data: Prostate Cancer by Stamey et al. (1989) • $y$: prostate-specific antigen • $(x_1, \dots, x_8)$: clinical measures • Figure borrowed from The Elements of Statistical Learning

  27. Linear regression • Given training data $(x_i, y_i): 1 \le i \le n$ i.i.d. from distribution $D$ • Find $f_w(x) = w^T x$ that minimizes $\hat{L}(f_w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2$ • The $l_2$ loss; also called mean square error • Hypothesis class $\mathcal{H}$: linear functions
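In matrix terms (anticipating the next slide), the mean square error of a linear predictor is a one-liner. A sketch assuming `X` is an $n \times d$ NumPy array and `y` a length-$n$ vector:

```python
import numpy as np

def mse(w, X, y):
    # L_hat(f_w) = (1/n) * ||X w - y||_2^2
    residual = X @ w - y
    return residual @ residual / len(y)
```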

  28. Linear regression: optimization • Given training data $(x_i, y_i): 1 \le i \le n$ i.i.d. from distribution $D$ • Find $f_w(x) = w^T x$ that minimizes $\hat{L}(f_w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2$ • Let $X$ be the matrix whose $i$-th row is $x_i^T$, and $y$ the vector $(y_1, \dots, y_n)^T$; then $\hat{L}(f_w) = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i - y_i)^2 = \frac{1}{n} \|Xw - y\|_2^2$

  29. Linear regression: optimization • Set the gradient to 0 to get the minimizer: $\nabla_w \hat{L}(f_w) = \nabla_w \frac{1}{n} \|Xw - y\|_2^2 = 0$ • $\nabla_w [(Xw - y)^T (Xw - y)] = 0$ • $\nabla_w [w^T X^T X w - 2 w^T X^T y + y^T y] = 0$ • $2 X^T X w - 2 X^T y = 0$ • $w = (X^T X)^{-1} X^T y$
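A minimal NumPy sketch of this closed form. Solving the linear system $X^T X w = X^T y$ is numerically safer than forming the explicit inverse; the synthetic data here is made up purely to check the recovery:

```python
import numpy as np

def fit_linear_regression(X, y):
    # Solve the normal equations X^T X w = X^T y for w
    # (equivalent to w = inv(X^T X) @ X^T @ y, but better conditioned).
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)
print(fit_linear_regression(X, y))  # close to [1.0, -2.0, 0.5]
```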

  30. Linear regression: optimization • Algebraic view of the minimizer • If $X$ is invertible, just solve $Xw = y$ and get $w = X^{-1} y$ • But typically $X$ is a tall matrix (more data points than features), so $Xw = y$ has no exact solution; instead solve the normal equation $X^T X w = X^T y$, giving $w = (X^T X)^{-1} X^T y$
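In practice one would usually not invert $X^T X$ at all; a library least-squares routine solves the same problem via an SVD, which also copes with a rank-deficient $X$. A self-contained sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # tall matrix: 100 equations, 3 unknowns
y = X @ np.array([1.0, -2.0, 0.5])

# np.linalg.lstsq minimizes ||X w - y||_2 with an SVD-based solver.
w, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
print(w)  # close to [ 1.  -2.   0.5]
```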

  31. Linear regression with bias • Given training data $(x_i, y_i): 1 \le i \le n$ i.i.d. from distribution $D$ • Find $f_{w,b}(x) = w^T x + b$ to minimize the loss ($b$ is the bias term) • Reduce to the case without bias: • Let $w' = [w; b]$ and $x' = [x; 1]$ • Then $f_{w,b}(x) = w^T x + b = (w')^T (x')$
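The reduction on this slide amounts to appending a constant-1 feature to every $x$ and a matching entry $b$ to $w$. A sketch under the same NumPy assumptions as above (the function name is illustrative):

```python
import numpy as np

def fit_with_bias(X, y):
    # x' = [x; 1] and w' = [w; b], so f_{w,b}(x) = w^T x + b = (w')^T x'.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    w_b = np.linalg.solve(X1.T @ X1, X1.T @ y)
    return w_b[:-1], w_b[-1]  # (w, b)
```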
