Machine Learning: Chenhao Tan University of Colorado Boulder - PowerPoint PPT Presentation


  1. Machine Learning: Chenhao Tan, University of Colorado Boulder. LECTURE 2. Slides adapted from Jordan Boyd-Graber, Thorsten Joachims, Kilian Weinberger. Machine Learning: Chenhao Tan | Boulder | 1 of 31

  2. Logistics
     • Piazza: https://piazza.com/colorado/fall2017/csci5622/
     • Moodle: https://moodle.cs.colorado.edu/course/view.php?id=507
     • Prerequisite quiz
     • Final project
     • iClicker

  3. Outline
     • Supervised Learning
     • Data representation
     • K-nearest neighbors
     • Overview
     • Performance Guarantee
     • Curse of Dimensionality

  5. Supervised Learning (Data X, Labels Y)
     • Supervised methods find patterns in fully observed data and then try to predict something from partially observed data.
     • For example, in sentiment analysis, after learning something from annotated reviews, we want to take new reviews and automatically identify sentiments.

  7. Supervised Learning: Formal Definitions
     • Labels Y, e.g., binary labels y ∈ {+1, −1}
     • Instance space X: all the possible instances (based on data representation)
     • Target function f : X → Y (f is unknown)
     • Example/instance (x, y)
     • Training data S_train: collection of examples observed by the algorithm

  8. Supervised Learning: Formal Definitions
     • Goal of a learning algorithm: find a function h : X → Y from training data S_train so that h approximates f

  9. Supervised Learning: Supervised learning in a nutshell
     S_train = {(x, y)} → h
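
The step from training data to a hypothesis h can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration and not part of the lecture: the target function `f` (a threshold on integers), the ten-point instance space, and the `learn` routine, which simply memorizes the training examples and falls back to the majority label for unseen instances.

```python
from collections import Counter

def f(x):
    """Hypothetical target function; in practice f is unknown to the learner."""
    return +1 if x >= 5 else -1

def learn(s_train):
    """Build a hypothesis h using only the training examples (x, y)."""
    memory = dict(s_train)  # remember every observed example
    majority = Counter(y for _, y in s_train).most_common(1)[0][0]

    def h(x):
        # Seen instances get their observed label, unseen ones the majority label.
        return memory.get(x, majority)

    return h

# Training data: a handful of labeled examples drawn from the instance space.
s_train = [(x, f(x)) for x in (0, 1, 2, 6, 9)]
h = learn(s_train)

# How well does h approximate f over the whole (toy) instance space?
instance_space = range(10)
agreement = sum(h(x) == f(x) for x in instance_space) / len(instance_space)
print(f"h agrees with f on {agreement:.0%} of the instance space")
```

The learner reproduces every training label but only partly agrees with f elsewhere, which is the gap between fitting S_train and approximating f.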

  12. Supervised Learning: No Free Lunch Theorems
     • No free lunch for supervised machine learning [Wolpert, 1996]: in a noise-free scenario where the loss function is the misclassification rate, if one is interested in off-training-set error, then there are no a priori distinctions between learning algorithms.
       Corollary I: there is no single ML algorithm that works for everything.
       Corollary II: every successful ML algorithm makes assumptions.
     • No free lunch for search/optimization [Wolpert and Macready, 1997]: all algorithms that search for an extremum of a cost function perform exactly the same when averaged over all possible cost functions.
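
A toy check of the supervised-learning statement, under assumptions that are not in the slides: a five-point instance space, a fixed train/test split, a uniform average over all 2^5 possible noise-free target functions, and two hypothetical learners (`always_positive` and `majority_vote`). This is an illustration of the claim on a tiny example, not Wolpert's proof.

```python
from itertools import product

X = range(5)        # toy instance space (assumption)
TRAIN = (0, 1, 2)   # instances the learner gets to see
TEST = (3, 4)       # off-training-set instances

def always_positive(s_train):
    """Learner that ignores the data and always predicts +1."""
    return lambda x: +1

def majority_vote(s_train):
    """Learner that predicts the majority training label everywhere."""
    total = sum(y for _, y in s_train)
    label = +1 if total >= 0 else -1
    return lambda x: label

def average_ots_error(learner):
    """Misclassification rate on TEST, averaged over every possible target f."""
    errors = []
    for labels in product((-1, +1), repeat=len(X)):
        f = dict(zip(X, labels))                      # one noise-free target
        h = learner([(x, f[x]) for x in TRAIN])       # learn from S_train only
        errors.append(sum(h(x) != f[x] for x in TEST) / len(TEST))
    return sum(errors) / len(errors)

err_constant = average_ots_error(always_positive)
err_majority = average_ots_error(majority_vote)
print(err_constant, err_majority)
```

Both learners end up with the same average off-training-set error, because across all target functions the labels of unseen points are unrelated to the labels that were observed. A learner can only do better when its assumptions match the targets that actually occur.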

  14. Data representation
     Example text: "Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended..."
     (From Chris Harrison's WikiViz)

  16. Data representation
     Let us have an interactive example to think through data representation!
     Auto insurance quotes:

     id  rent  income   urban  state  car value  car year
     1   yes   50,000   no     CO     20,000     2010
     2   yes   70,000   no     CO     30,000     2012
     3   no    250,000  yes    CO     55,000     2017
     4   yes   200,000  yes    NY     50,000     2016
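
One possible way to turn the auto insurance rows into numeric instances x is sketched below. The encoding choices are assumptions for illustration, not the representation chosen in the lecture: yes/no columns become 0/1, `state` is one-hot encoded, numeric columns are kept as-is, and `id` is dropped because it identifies a row and does not describe it. The `encode` helper and the `car_value`/`car_year` key names are hypothetical.

```python
# The four rows from the slide, with underscores in the key names.
rows = [
    {"id": 1, "rent": "yes", "income": 50_000, "urban": "no",
     "state": "CO", "car_value": 20_000, "car_year": 2010},
    {"id": 2, "rent": "yes", "income": 70_000, "urban": "no",
     "state": "CO", "car_value": 30_000, "car_year": 2012},
    {"id": 3, "rent": "no", "income": 250_000, "urban": "yes",
     "state": "CO", "car_value": 55_000, "car_year": 2017},
    {"id": 4, "rent": "yes", "income": 200_000, "urban": "yes",
     "state": "NY", "car_value": 50_000, "car_year": 2016},
]

# Fixed, sorted list of categories so every vector has the same layout.
STATES = sorted({row["state"] for row in rows})

def encode(row):
    """Map one table row to a numeric feature vector (id is left out)."""
    return [
        1.0 if row["rent"] == "yes" else 0.0,                    # binary
        float(row["income"]),                                    # numeric
        1.0 if row["urban"] == "yes" else 0.0,                   # binary
        *[1.0 if row["state"] == s else 0.0 for s in STATES],    # one-hot
        float(row["car_value"]),                                 # numeric
        float(row["car_year"]),                                  # numeric
    ]

X = [encode(row) for row in rows]
for vector in X:
    print(vector)
```

Other choices are equally defensible, such as rescaling income or replacing car year with car age, and which one is appropriate depends on the learning algorithm applied afterwards.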
