
Learning From Data, Lecture 1: The Learning Problem (Introduction)



  1. Learning From Data, Lecture 1: The Learning Problem
     • Introduction
     • Motivation
     • Credit Default: A Running Example
     • Summary of the Learning Problem
     M. Magdon-Ismail, CSCI 4100/6100

  2. Resources
     1. Web Page: www.cs.rpi.edu/~magdon/courses/learn.php
        – course info: www.cs.rpi.edu/~magdon/courses/learn/info.pdf
        – slides: www.cs.rpi.edu/~magdon/courses/learn/slides.html
        – assignments: www.cs.rpi.edu/~magdon/courses/learn/assign.html
     2. Text Book: Learning From Data, by Abu-Mostafa, Magdon-Ismail, Lin
     3. Book Forum: book.caltech.edu/bookforum
        – discussion about any material in the book, including problems and exercises.
        – additional material
     4. TA.
     5. Professor.
     6. Prerequisites? Assignment #0.
     Creator: Malik Magdon-Ismail

  3. The Storyline
     1. What is Learning?
     2. Can We do it?
     3. How to do it?
     4. How to do it well?
     5. General principles?
     6. Advanced techniques.
     7. Other Learning Paradigms.
     (Slide margin labels: concepts, theory, practice.)
     Our language will be mathematics ... our sword will be computer algorithms.

  4. [Image-only slide on applications; no text recovered.]

  5. Let’s Define a Tree?

  6. Let’s Define a Tree?
     A brown trunk moving upwards and branching with leaves ...

  7. Are These Trees?

  8. Learning “What are Trees” is ‘Easy’

  9. Defining is Hard; Recognizing is Easy
     • It is hard to give a complete mathematical definition of a tree.
     • Even a 3-year-old can tell a tree from a non-tree.
     • The 3-year-old has learned from data.
     (Other tasks, like graphics or GANs?)

  10. Learning to Rate Movies
      • Can we predict how a viewer would rate a movie?
      • Why? So that Netflix can make better movie recommendations, and get more rentals.
      • $1 million prize for a mere 10% improvement in their recommendation system.

  11. Previous Ratings Reflect Future Ratings
      [Figure: viewer factors (likes comedy? likes action? prefers blockbusters? likes Tom Cruise?) are lined up against movie factors (comedy content, action content, blockbuster?, Tom Cruise in it?). Match corresponding factors, then add their contributions, to get the predicted rating.]
      • Viewer taste & movie content imply viewer rating.
      • No magical formula to predict viewer rating.
      • Netflix has data. We can learn to identify movie “categories” as well as viewer “preferences”.
      Class Motto: A pattern exists. We don’t know it. We have data to learn it.
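
The "match corresponding factors, then add their contributions" step can be sketched as a sum of products. This is a minimal illustration only: the factor names follow the slide's figure, but every numeric value below is invented, and the slide does not say how the factors are scaled or learned.

```python
# Factor order follows the slide's figure; the numbers are hypothetical.
FACTORS = ["comedy", "action", "blockbuster", "Tom Cruise"]


def predict_rating(viewer, movie):
    """Match corresponding factors, then add their contributions."""
    if len(viewer) != len(movie):
        raise ValueError("viewer and movie must have the same number of factors")
    return sum(v * m for v, m in zip(viewer, movie))


# How much this (made-up) viewer likes each factor.
viewer = [0.9, 0.1, 0.5, 0.0]
# How much of each factor this (made-up) movie contains.
movie = [1.0, 0.0, 1.0, 0.0]

score = predict_rating(viewer, movie)
print(f"predicted rating score: {score:.2f}")
```

In a learned system the two factor vectors would not be typed in by hand; they would be estimated from the ratings data, which is the point of the slide.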

  12. Credit Approval
      Let’s use a conceptual example to crystallize the issues.
        age            32 years
        gender         male
        salary         40,000
        debt           26,000
        years in job   1 year
        years at home  3 years
        ...            ...
      Approve for credit?

  13. Credit Approval
      Let’s use a conceptual example to crystallize the issues.
        age            32 years
        gender         male
        salary         40,000
        debt           26,000
        years in job   1 year
        years at home  3 years
        ...            ...
      Approve for credit?
      • Using salary, debt, years in residence, etc., approve for credit or not.
      • No magic credit approval formula.
      • Banks have lots of data.
        – customer information: salary, debt, etc.
        – whether or not they defaulted on their credit.
      A pattern exists. We don’t know it. We have data to learn it.

  14. The Key Players
      • Salary, debt, years in residence, ...: the input $\mathbf{x} \in \mathbb{R}^d = \mathcal{X}$.
      • Approve credit or not: the output $y \in \{-1, +1\} = \mathcal{Y}$.
      • True relationship between $\mathbf{x}$ and $y$: the target function $f : \mathcal{X} \mapsto \mathcal{Y}$. (The target $f$ is unknown.)
      • Data on customers: the data set $\mathcal{D} = (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$. ($y_n = f(\mathbf{x}_n)$.)
      $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem; the target $f$ is fixed but unknown.
      We learn the function $f$ from the data $\mathcal{D}$.
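
To make the notation concrete, here is a small sketch of how the input, output, and data set might be represented in code. The function `hidden_target` is a stand-in for the unknown target f: its rule, the feature ranges, and the choice of three features are all invented for illustration. In a real problem nobody can write this function down, and only the data set D is available.

```python
import random


def hidden_target(x):
    """Stand-in for the unknown target f (hypothetical rule, NOT from the slides)."""
    salary, debt, years_in_residence = x
    return +1 if salary > 2 * debt else -1


def make_dataset(n, seed=0):
    """Build D = (x_1, y_1), ..., (x_N, y_N) with y_n = f(x_n)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # x lives in R^d with d = 3: salary, debt, years in residence.
        x = (
            rng.uniform(20_000, 100_000),
            rng.uniform(0, 50_000),
            rng.uniform(0, 20),
        )
        data.append((x, hidden_target(x)))
    return data


D = make_dataset(5)
for x, y in D:
    salary, debt, years = x
    print(f"salary={salary:9.0f} debt={debt:8.0f} years={years:4.1f} -> y={y:+d}")
```

The learner only ever sees the pairs in `D`; `hidden_target` exists here purely so the example has labels to print.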

  15. Learning
      • Start with a set of candidate hypotheses $\mathcal{H}$ which you think are likely to represent $f$.
        $\mathcal{H} = \{h_1, h_2, \ldots\}$ is called the hypothesis set or model.
      • Select a hypothesis $g$ from $\mathcal{H}$. The way we do this is called a learning algorithm.
      • Use $g$ for new customers. We hope $g \approx f$.
      $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem; the target $f$ is fixed but unknown.
      We choose $\mathcal{H}$ and the learning algorithm.
      This is a very general setup (e.g. choose $\mathcal{H}$ to be all possible hypotheses).
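
The three steps above can be sketched with a tiny finite hypothesis set. Everything specific here is an assumption made for illustration: the training data are invented, the hypotheses are simple debt-to-salary threshold rules, and the learning algorithm just picks the hypothesis with the fewest mistakes on D. The slides do not prescribe any of these choices.

```python
# Hypothetical training data D: ((salary, debt), label). Invented values.
D = [
    ((45_000, 30_000), -1),
    ((80_000, 10_000), +1),
    ((55_000, 5_000), +1),
    ((30_000, 25_000), -1),
    ((60_000, 30_000), -1),
    ((90_000, 20_000), +1),
]


def make_hypothesis(max_ratio):
    """Return h: approve (+1) when debt is at most max_ratio * salary."""
    def h(x):
        salary, debt = x
        return +1 if debt <= max_ratio * salary else -1
    return h


# Hypothesis set H: one candidate rule per threshold (an assumed, tiny model).
H = {r: make_hypothesis(r) for r in (0.1, 0.3, 0.5, 0.7, 0.9)}


def in_sample_errors(h, data):
    """Count the training examples that h gets wrong."""
    return sum(1 for x, y in data if h(x) != y)


def learn(hypotheses, data):
    """Learning algorithm (sketch): pick the hypothesis with the fewest errors on data."""
    best = min(hypotheses, key=lambda r: in_sample_errors(hypotheses[r], data))
    return best, hypotheses[best]


best_ratio, g = learn(H, D)
print("selected threshold:", best_ratio)
print("errors on D:", in_sample_errors(g, D))

# Use g for a new customer (salary and debt taken from the credit approval slide).
print("decision for new customer:", g((40_000, 26_000)))
```

Doing well on D does not by itself guarantee g ≈ f on new customers; whether and when it does is the "Can we do it?" question from the storyline.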

  16. Summary of the Learning Setup
      • UNKNOWN TARGET FUNCTION $f : \mathcal{X} \mapsto \mathcal{Y}$ (ideal credit approval formula), with $y_n = f(\mathbf{x}_n)$
      • TRAINING EXAMPLES $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)$ (historical records of credit customers)
      • HYPOTHESIS SET $\mathcal{H}$ (set of candidate formulas)
      • LEARNING ALGORITHM $\mathcal{A}$
      • FINAL HYPOTHESIS $g \approx f$ (learned credit approval formula)
