  1. (Mainly) Linear Models, EECS 442 – David Fouhey, Fall 2019, University of Michigan. http://web.eecs.umich.edu/~fouhey/teaching/EECS442_F19/

  2. Next Few Classes • Machine Learning (ML) Crash Course • I can’t cover everything • If you can, take an ML course or learn online • ML really won’t solve all problems, and it is incredibly dangerous if misused • But ML is a powerful tool and is not going away

  3. Terminology • ML is incredibly messy terminology-wise. • Most things have lots of names. • I will try to write down several of them, so if you see one later you’ll know what it is.

  4. Pointers • Useful book (free, too!): The Elements of Statistical Learning, Hastie, Tibshirani, Friedman: https://web.stanford.edu/~hastie/ElemStatLearn/ • Useful set of data: UCI ML Repository: https://archive.ics.uci.edu/ml/datasets.html • A lot of important and hard lessons summarized: https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

  5. Machine Learning (ML) • Goal: make “sense” of data • Overly simplified version: transform a vector x into a vector y = T(x) that’s somehow better • Potentially you fit T using pairs of datapoints and desired outputs (x_i, y_i), or just using a set of datapoints (x_i) • You are always trying to find some transformation that minimizes or maximizes some objective function or goal.

  6. Machine Learning • Input x (feature vector/data point): vector representation of a datapoint; each dimension (“feature”) represents some aspect of the data. • Output y (label/target): fixed-length vector of desired output; each dimension represents some aspect of the output. • Supervised: we are given y. Unsupervised: we are not, and make our own ys.

  7. Example – Health • Input: x in R^N (blood pressure = 50, heart rate = 60, …, glucose level = 0.2) • Model: f(Wx) • Output: y = [P(Has Diabetes), P(No Diabetes)] • Intuitive objective function: want the correct category to be likely under our model.
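
The slide leaves f unspecified; one common choice (an assumption here, not something the slide states) is the softmax, which turns the raw scores Wx into probabilities that sum to one. A minimal NumPy sketch with made-up weights and patient features:

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical patient: blood pressure, heart rate, glucose level
x = np.array([50.0, 60.0, 0.2])

# Hypothetical weights: one row per output category
W = np.array([[ 0.02,  0.01,  1.5],   # scores for "has diabetes"
              [-0.02, -0.01, -1.5]])  # scores for "no diabetes"

p = softmax(W @ x)   # f(Wx): [P(Has Diabetes), P(No Diabetes)]
print(p, p.sum())    # two probabilities; they sum to 1
```

Under this choice, “want the correct category to be likely” becomes: maximize the probability the model assigns to the true label.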

  8. Example – Health • Input: x in R^N (blood pressure = 50, heart rate = 60, …, glucose level = 0.2) • Model: Wx • Output: y = Age • Intuitive objective function: want our prediction of age to be “close” to the true age.

  9. Example – Health • Input: x in R^N (blood pressure = 50, heart rate = 60, …, glucose level = 0.2) • Model: f(x) • Output: discrete y (unsupervised): a 0/1 assignment to each of user groups 1, 2, …, K • Intuitive objective function: want to find K groups that explain the data we see.

  10. Example – Health • Input: x in R^N (blood pressure = 50, heart rate = 60, …, glucose level = 0.2) • Model: Wx • Output: continuous y (discovered): user dimensions 1, 2, …, K (e.g., [0.2, 1.3, …, 0.7]) • Intuitive objective function: want to find K dimensions (often two) that are easier to understand but capture the variance of the data.
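
The slide doesn’t name the method, but the classic linear way to find a few directions that capture the data’s variance is PCA. A minimal sketch via NumPy’s SVD, on made-up patient data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))   # 100 patients x 6 raw measurements (made up)

# PCA: center the data; the top-K right singular vectors
# are the directions of greatest variance
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
K = 2
W = Vt[:K]                      # K x 6 projection matrix

Z = Xc @ W.T                    # each patient as K "user dimensions"
print(Z.shape)                  # (100, 2)
```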

  11. Example – Credit Card Fraud • Input: x in R^N (bought before = 0, amount = $12, …, near billing address = 1) • Model: f(Wx) • Output: y = [P(Fraud), P(No Fraud)] • Intuitive objective function: want the correct category to be likely under our model.

  12. Example – Computer Vision • Input: x in R^N (pixel at (0,0), pixel at (0,1), …, pixel at (H-1,W-1)) • Model: f(Wx) • Output: y = [P(Cat), P(Dog), …, P(Bird)] • Intuitive objective function: want the correct category to be likely under our model.

  13. Example – Computer Vision • Input: x in R^N (count of visual cluster 1, count of visual cluster 2, …, count of visual cluster K) • Model: f(Wx) • Output: y = [P(Cat), P(Dog), …, P(Bird)] • Intuitive objective function: want the correct category to be likely under our model.

  14. Example – Computer Vision • Input: x in R^N (f_1(Image), f_2(Image), …, f_N(Image)) • Model: f(Wx) • Output: y = [P(Cat), P(Dog), …, P(Bird)] • Intuitive objective function: want the correct category to be likely under our model.

  15. Abstractions • Throughout, assume we’ve converted the data into a fixed-length feature vector. There are well-designed ways of doing this. • But remember it could be big! • Image (e.g., 224x224x3): ~151K dimensions • Patch (e.g., 32x32x3) in an image: 3072 dimensions
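
A quick check of that dimension arithmetic (flattening is just the simplest fixed-length vectorization, used here for the count):

```python
import numpy as np

image = np.zeros((224, 224, 3))    # H x W x 3 color image
patch = np.zeros((32, 32, 3))      # small patch from an image

x_image = image.reshape(-1)        # flatten into a fixed-length feature vector
x_patch = patch.reshape(-1)
print(x_image.size, x_patch.size)  # 150528 (~151K) and 3072
```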

  16. ML Problems in Vision Image credit: Wikipedia

  17. ML Problem Examples in Vision • A 2×2 grid of problem types: columns Unsupervised (Just Data) vs. Supervised (Data+Labels); rows Discrete Output vs. Continuous Output • Filled in so far: Supervised + Discrete Output = Classification/Categorization • Slide adapted from J. Hays

  18. ML Problem Examples in Vision • Cat-egorization/Classification: binning into K mutually-exclusive categories • Example output: P(Cat) = 0.9, P(Dog) = 0.1, …, P(Bird) = 0.0 • Image credit: Wikipedia

  19. ML Problem Examples in Vision • The same 2×2 grid • Filled in so far: Supervised + Discrete Output = Classification/Categorization; Supervised + Continuous Output = Regression • Slide adapted from J. Hays

  20. ML Problem Examples in Vision • Regression: estimating continuous variable(s) • Example output: cat weight = 3.6 kg • Image credit: Wikipedia

  21. ML Problem Examples in Vision • The same 2×2 grid • Filled in so far: Supervised + Discrete Output = Classification/Categorization; Supervised + Continuous Output = Regression; Unsupervised + Discrete Output = Clustering • Slide adapted from J. Hays

  22. ML Problem Examples in Vision • Clustering: given a set of cats, automatically discover clusters or cat-egories (six discovered cat clusters shown). • Image credit: Wikipedia, cattime.com

  23. ML Problem Examples in Vision • The completed grid:
                        Unsupervised (Just Data)   Supervised (Data+Labels)
     Discrete Output    Clustering                 Classification/Categorization
     Continuous Output  Dimensionality Reduction   Regression
     Slide adapted from J. Hays

  24. ML Problem Examples in Vision • Dimensionality Reduction: find dimensions that best explain the whole image/input (e.g., cat size in image, location of cat in image) • For ordinary images, this is currently a totally hopeless task. For certain images (e.g., faces), this works reasonably well. • Image credit: Wikipedia

  25. Practical Example • ML has a tendency to be mysterious • Let’s start with: • A model you learned in middle/high school (a line) • Least squares • One thing to remember: • N eqns, fewer than N vars = overdetermined (will have errors) • N eqns, N vars = exact solution • N eqns, more than N vars = underdetermined (infinite solns)
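
A small NumPy illustration of the three cases; the specific systems here are made up for the demo:

```python
import numpy as np

# N eqns, N vars: exact solution
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
print(np.linalg.solve(A, [3.0, 5.0]))        # exactly [2. 1.]

# N eqns, fewer vars: overdetermined -> best fit with residual error
A_over = np.array([[1.0, 1.0],
                   [2.0, 1.0],
                   [3.0, 1.0]])
w, residual, *_ = np.linalg.lstsq(A_over, [3.0, 5.0, 8.0], rcond=None)
print(w, residual)                           # least-squares fit; residual > 0

# N eqns, more vars: underdetermined -> infinitely many solutions;
# lstsq returns the minimum-norm one
A_under = np.array([[1.0, 1.0]])
print(np.linalg.lstsq(A_under, [3.0], rcond=None)[0])
```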

  26. Example – Least Squares • Let’s make the world’s worst weather model • Data: (x_1, y_1), (x_2, y_2), …, (x_k, y_k) • Model: (m, b): y_i = m x_i + b, or (w): y_i = w^T x_i • Objective function: (y_i − w^T x_i)^2

  27. World’s Worst Weather Model • Given latitude (distance above the equator), predict temperature by fitting a line. (Scatter plot of temperature vs. latitude shown.)
     City             Latitude (°)   Temp (°F)
     Ann Arbor        42             33
     Washington, DC   39             38
     Austin, TX       30             62
     Mexico City      19             67
     Panama City      9              83

  28. Example – Least Squares • Objective: $\sum_{i=1}^{k} (y_i - \mathbf{w}^T \mathbf{x}_i)^2 = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2$ • Inputs (latitude, constant 1): $\mathbf{X} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_k & 1 \end{bmatrix}$ • Model/Weights (latitude coefficient, “bias”): $\mathbf{w} = \begin{bmatrix} m \\ b \end{bmatrix}$ • Output (temperature): $\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_k \end{bmatrix}$

  29. Example – Least Squares • Objective: $\sum_{i=1}^{k} (y_i - \mathbf{w}^T \mathbf{x}_i)^2 = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2$ • Inputs: $\mathbf{X} = \begin{bmatrix} 42 & 1 \\ \vdots & \vdots \\ 9 & 1 \end{bmatrix}$ • Model/Weights: $\mathbf{w} = \begin{bmatrix} m \\ b \end{bmatrix}$ • Output: $\mathbf{y} = \begin{bmatrix} 33 \\ \vdots \\ 83 \end{bmatrix}$ • Intuitively, why do we add a one to the inputs? (The constant 1, multiplied by b, supplies the intercept, so the purely linear map w^T x can still represent y = mx + b.)

  30. Example – Least Squares • Training on (x_i, y_i): $\arg\min_{\mathbf{w}} \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2$, or equivalently $\arg\min_{\mathbf{w}} \sum_{i=1}^{n} (\mathbf{w}^T \mathbf{x}_i - y_i)^2$ • Loss function/objective: evaluates correctness. Here: squared L2 norm / sum of squared errors. • Training/Learning/Fitting: try to find the model that optimizes/minimizes an objective/loss function. • The optimal weights are $\mathbf{w}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$
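
That closed form translates directly into NumPy. A minimal sketch; note that np.linalg.lstsq does the same job with better numerical behavior than forming the inverse explicitly:

```python
import numpy as np

def fit_least_squares(X, y):
    """w* = (X^T X)^{-1} X^T y, the closed form from the slide."""
    return np.linalg.inv(X.T @ X) @ (X.T @ y)

# Equivalent and numerically preferred in practice:
# w, *_ = np.linalg.lstsq(X, y, rcond=None)
```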

  31. Example – Least Squares • Training on (x_i, y_i): $\arg\min_{\mathbf{w}} \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2$ or $\arg\min_{\mathbf{w}} \sum_{i=1}^{n} (\mathbf{w}^T \mathbf{x}_i - y_i)^2$ • Inference on a new x: $\mathbf{w}^T\mathbf{x} = w_1 x_1 + \cdots + w_F x_F$ • Testing/Inference: given a new input, what’s the prediction?

  32. Least Squares: Learning • Data:
     City             Latitude   Temp
     Ann Arbor        42         33
     Washington, DC   39         38
     Austin, TX       30         62
     Mexico City      19         67
     Panama City      9          83
     • Model: Temp = -1.47*Lat + 97 • $\mathbf{w}_{2\times 1} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \begin{bmatrix} -1.47 \\ 97 \end{bmatrix}$, with $\mathbf{X}_{5\times 2} = \begin{bmatrix} 42 & 1 \\ 39 & 1 \\ 30 & 1 \\ 19 & 1 \\ 9 & 1 \end{bmatrix}$ and $\mathbf{y}_{5\times 1} = \begin{bmatrix} 33 \\ 38 \\ 62 \\ 67 \\ 83 \end{bmatrix}$
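
Reproducing this fit in NumPy (the second column of ones is the bias trick from slides 28–29):

```python
import numpy as np

X = np.array([[42, 1],    # Ann Arbor
              [39, 1],    # Washington, DC
              [30, 1],    # Austin, TX
              [19, 1],    # Mexico City
              [ 9, 1]],   # Panama City
             dtype=float)
y = np.array([33, 38, 62, 67, 83], dtype=float)

w = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(w)                   # roughly [-1.47, 97.4]: Temp = -1.47*Lat + 97

print(X @ w)               # inference on the training cities
print(np.abs(X @ w - y))   # per-city absolute errors (cf. slide 33)
```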

  33. Let’s Predict The Weather
     City             Latitude   Temp   EECS 442 Temp   Error
     Ann Arbor        42         33     35.3            2.3
     Washington, DC   39         38     39.7            1.7
     Austin, TX       30         62     52.9            9.1
     Mexico City      19         67     69.1            2.1
     Panama City      9          83     83.8            0.8

  34. Is This a Minimum Viable Product? • Our EECS 442 model vs. reality:
     Pittsburgh: Temp = -1.47*40 + 97 = 38; actual: 45
     Berkeley:   Temp = -1.47*38 + 97 = 41; actual: 53
     Sydney:     Temp = -1.47*(-33) + 97 = 146; actual: 74
     Won’t do so well in the Australian market…

  35. Where Can This Go Wrong?

  36. Where Can This Go Wrong? • Data: Ann Arbor (latitude 42, temp 33) and Washington, DC (latitude 39, temp 38) • Model: Temp = -1.66*Lat + 103 • How well can we predict Ann Arbor and DC, and why? (Perfectly: two datapoints, two parameters, so the fitted line passes exactly through both.)

  37. Always Need Separate Testing • The model might fit the data too precisely: “overfitting”. Remember: #datapoints = #params means a perfect fit. • The model may only work under some conditions (e.g., trained on the northern hemisphere). Sydney: Temp = -1.47*(-33) + 97 = 146

  38. Training and Testing • Fit model parameters on a training set; evaluate on an entirely unseen test set. (Data split into Training | Test.) • “It’s tough to make predictions, especially about the future” – Yogi Berra • Nearly any model can predict data it’s seen. If your model can’t accurately interpret “unseen” data, it’s probably useless: without a held-out test set, we have no clue whether it has just memorized the training data.
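
A minimal sketch of that discipline on made-up data: fit on the training split only, report error on the held-out test split only.

```python
import numpy as np

rng = np.random.default_rng(442)
X = rng.normal(size=(100, 5))                      # made-up features
y = X @ rng.normal(size=5) + rng.normal(size=100)  # made-up noisy targets

# Shuffle, then hold out 20% the model never sees while fitting
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train, test = idx[:split], idx[split:]

w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)  # fit on train only
test_err = np.mean((X[test] @ w - y[test]) ** 2)         # score on unseen data
print(test_err)
```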
