
LIACS Data Mining course: an introduction

Arno Knobbe, Joaquin Vanschoren. Course textbook: Data Mining: Practical Machine Learning Tools and Techniques, second edition, by Ian Witten and Eibe Frank (Morgan Kaufmann, ISBN 0-12-088407-0).


  1. LIACS Data Mining course: an introduction. Arno Knobbe, Joaquin Vanschoren

  2. Course Textbook: Data Mining: Practical Machine Learning Tools and Techniques, second edition, by Ian Witten and Eibe Frank. Morgan Kaufmann, ISBN 0-12-088407-0.

  3. Course Information
     - Course website: http://datamining.liacs.nl/DaMi/ (will be updated this week)
     - Old websites discontinued: http://datamining.liacs.nl/~akoopman/DaMi/ and http://www.liacs.nl/~joost/DM/CollegeDataMining.htm
     - Practical exercises
     - New style of exam
       - fewer definitions, more understanding and applying
       - old exams (≤ 2009) should not be used
       - exam preparation important

  4. Course Outline
     - 10-Sep: Knobbe (today)
     - 17-Sep: Knobbe
     - 24-Sep: no lecture!
     - 01-Oct: Vanschoren
     - 08-Oct: Knobbe
     - 15-Oct: Knobbe + practical exercise
     - 22-Oct: Vanschoren
     - 29-Oct: Vanschoren
     - 05-Nov: Vanschoren
     - 12-Nov: Knobbe
     - 19-Nov: Takes, guest lecture + practical exercise
     - 26-Nov: Vanschoren
     - 03-Dec: Vanschoren + practical exercise
     - TBD: Vanschoren, Knobbe, exam preparation!

  5. Introduction. Data Mining: an overview and some examples

  6. Data Mining definitions
     - Data Mining: the concept of extracting previously unknown and potentially useful information from large sets of data.
     - Secondary statistics: analyzing data that wasn’t originally collected for analysis.

  7. Data Mining, the big idea
     - Organizations collect large amounts of data
     - Often for administrative purposes
     - Large body of experience
     - Learning from experience
     - Goals
       - Prediction
       - Optimization
       - Forecasting
       - Diagnostics
       - …

  8-10. 2 Streams
     - Mining for insight
       - Understanding a domain
       - Finding regularities between variables
       - Goal of Data Mining is mostly undefined
       - Interpretable models
       - Examples: Medicine, production, maintenance
     - ‘Black-box’ Mining
       - Don’t care how you do it, just do it well
       - Optimization
       - Examples: Marketing, forecasting (financial, weather)

  11-17. example: Direct Mail
     Optimize the response to a mailing, by targeting only those that are likely to respond:
     - more response
     - fewer letters
     Diagram labels, in order of appearance across the builds: Customer information, test mailing, response 3%, Data Mining, customer model, final mailing, response 30%, remainder.
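
The targeting idea in the Direct Mail example can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the scores stand in for the output of some customer model, the 0/1 responses and the 0.5 threshold are made up, and the function names are hypothetical. The sketch only shows how mailing the high-scoring customers means fewer letters and a higher response rate among those mailed.

```python
def response_rate(responses):
    """Fraction of mailed customers who responded (responses are 0/1)."""
    return sum(responses) / len(responses) if responses else 0.0


def select_targets(scores, threshold):
    """Indices of customers whose model score reaches the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical data: one model score and one observed response per customer.
scores = [0.9, 0.1, 0.8, 0.2, 0.05, 0.7, 0.3, 0.1, 0.6, 0.15]
responded = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]

# Mailing everyone versus mailing only the selected customers.
baseline_rate = response_rate(responded)
targets = select_targets(scores, threshold=0.5)
targeted_rate = response_rate([responded[i] for i in targets])

print(f"letters sent: {len(scores)} -> {len(targets)}")
print(f"response rate: {baseline_rate:.2f} -> {targeted_rate:.2f}")
```

The customers below the threshold correspond to the "remainder" in the diagram: they are simply not mailed.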

  18. example: Bioinformatics
     - Find genes involved in disease (Parkinson’s, Celiac, Neuroblastoma)
     - Measurements from patients (1) and controls (0)
     - Gene expression: measurements of 20k genes
     - dataset 20,001 x 100
     - Challenges
       - many variables
       - few examples (patients), testing is expensive
       - interactions between genes

  19. Data Mining paradigms
     - Classification
       - binary class variable
       - predict class of future cases
       - most popular paradigm
     - Clustering
       - divide dataset into groups of similar cases
     - Regression
       - numeric target variable
     - Association
       - find dependencies between variables
       - basket analysis, …

  20-25. Classification
     Predict the class (often 0/1) of an object on the basis of examples of other objects (with a class given).
     Decision tree shown in the diagram:
     - Rent
       - Age < 35: Yes
       - Age ≥ 35: No
     - Buy
       - Price < 200K: Yes
       - Price ≥ 200K: No
     - Other: No
     Numeric annotations added in the later builds (positions in the diagram not recoverable): 0.64, 0.4, 0.25, 0.2, 0.51, 0.1, 0.01, 0.07.

  26-31. Building (inducing) a decision tree

     | Age | Gender | House | Price | Mortgage? |
     |-----|--------|-------|-------|-----------|
     | 21  | M      | Rent  | -     | No        |
     | 30  | F      | Rent  | -     | Yes       |
     | 40  | M      | Rent  | -     | No        |
     | 32  | F      | Buy   | 300K  | No        |
     | 30  | F      | Rent  | -     | Yes       |
     | 55  | M      | Buy   | 260K  | No        |
     | 25  | F      | Buy   | 180K  | Yes       |
     | …   |        |       |       |           |

     The tree is built up across the slides: first a split into Rent, Buy and Other; then Rent is split into Age < 35 and Age ≥ 35; then Buy is split into Price < 200K and Price ≥ 200K.
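
To make the effect of each split concrete, the following Python sketch tabulates the Mortgage? labels in each branch for the seven rows visible in the table. It is only a tally under stated assumptions: the slides do not say which criterion is used to choose splits, the table is truncated (the "…" row), and the helper name `class_counts` is hypothetical.

```python
from collections import Counter, defaultdict

# The seven visible rows: (age, gender, house, price in thousands, mortgage).
ROWS = [
    (21, "M", "Rent", None, "No"),
    (30, "F", "Rent", None, "Yes"),
    (40, "M", "Rent", None, "No"),
    (32, "F", "Buy", 300, "No"),
    (30, "F", "Rent", None, "Yes"),
    (55, "M", "Buy", 260, "No"),
    (25, "F", "Buy", 180, "Yes"),
]


def class_counts(rows, key):
    """Group rows by key(row) and count the Mortgage? labels per group."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[key(row)][row[-1]] += 1
    return {name: dict(counts) for name, counts in groups.items()}


# First split: House.
by_house = class_counts(ROWS, lambda r: r[2])

# Second split, within Rent: Age < 35 versus Age >= 35.
rent_rows = [r for r in ROWS if r[2] == "Rent"]
by_age = class_counts(rent_rows, lambda r: "Age < 35" if r[0] < 35 else "Age >= 35")

# Second split, within Buy: Price < 200K versus Price >= 200K.
buy_rows = [r for r in ROWS if r[2] == "Buy"]
by_price = class_counts(buy_rows, lambda r: "Price < 200K" if r[3] < 200 else "Price >= 200K")

for name, table in [("House", by_house), ("Rent/Age", by_age), ("Buy/Price", by_price)]:
    print(name, table)
```

On these seven rows the Rent and Buy branches are both mixed after the first split, while the Age and Price splits leave branches whose majority label matches the leaves of the tree on the slides.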

  32-34. Applying a classifier (decision tree)
     New customer: (House = Rent, Age = 32, …)
     Decision tree:
     - Rent
       - Age < 35: Yes
       - Age ≥ 35: No
     - Buy
       - Price < 200K: Yes
       - Price ≥ 200K: No
     - Other: No
     prediction = Yes
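
Applying the tree amounts to following one path from the root to a leaf. A minimal Python sketch of the tree from the slides is given below; the function name `predict_mortgage` and the parameter names are hypothetical, and the price is assumed to be given in thousands.

```python
def predict_mortgage(house, age=None, price_k=None):
    """Follow the decision tree from the slides and return "Yes" or "No".

    house:   "Rent", "Buy", or anything else (treated as Other)
    age:     needed only on the Rent branch
    price_k: price in thousands, needed only on the Buy branch
    """
    if house == "Rent":
        return "Yes" if age < 35 else "No"
    if house == "Buy":
        return "Yes" if price_k < 200 else "No"
    return "No"  # Other


# The new customer from the slide: House = Rent, Age = 32.
prediction = predict_mortgage("Rent", age=32)
print(prediction)
```

Only the attributes on the path actually taken are inspected, which is why the slide can leave the customer's remaining attributes unspecified.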

  35-37. Graphical interpretation
     - dataset with two variables + 1 class (+/-)
     - graphical interpretation of decision tree
     Figure: scatter plot of + and - points on x and y axes, with the splits x < t / x ≥ t and then y < t’ / y ≥ t’ added across the builds.
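
The splits in the figure are axis-parallel: each test compares a single variable with a threshold, so the tree cuts the plane into rectangles. The Python sketch below names the rectangle a point falls into. Which side of the first split is divided further cannot be recovered from the slide text, so the choice made here (splitting the x ≥ t side on y) is an assumption, as are the function name and the threshold values in the usage lines.

```python
def region(x, y, t, t_prime):
    """Name the axis-parallel region containing the point (x, y).

    Assumed tree shape: first split on x at t, then split the
    x >= t side on y at t_prime.
    """
    if x < t:
        return "x < t"
    if y < t_prime:
        return "x >= t, y < t'"
    return "x >= t, y >= t'"


# Hypothetical thresholds, for illustration only.
print(region(1.0, 5.0, t=2.0, t_prime=3.0))
print(region(4.0, 1.0, t=2.0, t_prime=3.0))
print(region(4.0, 6.0, t=2.0, t_prime=3.0))
```

A decision tree classifier would then assign one class (+ or -) to every point in the same region.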

  38-40. Graphical interpretation
     - dataset with two variables + 1 class (+/-)
     - other classifiers
     Figure: the same scatter plot of + and - points on x and y axes, with the labels Support Vector Machine and Neural Network added across the builds.
