Data Mining 2019 Introduction
Ad Feelders
Universiteit Utrecht
Ad Feelders ( Universiteit Utrecht ) Data Mining 1 / 53
The Course
Literature: Lecture Notes, Book Chapters, Articles, Slides (the slides appear in the schedule on the course
1 Write your own classification tree and random forest algorithm in R,
2 Text Mining: analyze hotel reviews to distinguish genuine from fake
1 Basic probability and statistics.
2 Elementary calculus and linear algebra.
3 Basic programming skills (not necessarily in R).
1 All horses are mammals
2 All mammals have lungs
3 Therefore, all horses have lungs

1 All horses observed so far have lungs
2 Therefore, all horses have lungs

1 4% of the products we tested are defective
2 Therefore, 4% of all products (tested or otherwise)
1 data editing: what to do when records contain impossible
2 incomplete data: what to do with missing values?
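[Figure: entity-relationship diagram with four tables linked by one-to-many (1:m) relationships]
Respondent Demographics: Respondent ID (PK), Household ID, Number of Children
Tv Program Viewing: ID (PK), Respondent ID, Telecast ID, Offset Seconds
Telecast: ID (PK), Program ID
Program: ID (PK), Program Name, Channel Name
The attributes Date and Duration Seconds each appear twice in the diagram.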
[Figure: subgroup diagram; each node shows an interval and a group size, labels are conditions]
No condition: [49.7%, 50.3%], 100,000
age in [19,24]: [54.6%, 56.2%], 14,249
gender = m; age in [19,24]: [52.8%, 53.7%], 53,179; [60.2%, 62.3%], 8,130
age in [19,24]; carprice in [59000,79995]: [55.9%, 59.6%], 2,831; [61.2%, 67.4%], 1,134
category = lease; gender = m; age in [19,24]: [50.7%, 52.0%], 20,315; [53.5%, 55.4%], 10,778; [59.4%, 64.1%], 1,651
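The slide does not say how the intervals are computed. A minimal sketch, assuming they are 95% normal-approximation confidence intervals for a proportion, and taking the midpoint of each interval as the observed proportion (both are assumptions, and `proportion_ci` is a hypothetical helper name):

```python
import math


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion.

    p_hat: observed proportion in the group (assumed here: interval midpoint)
    n:     group size
    z:     standard normal quantile; 1.96 gives roughly 95% coverage
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Group sizes taken from the figure; proportions are the interval midpoints.
for label, p_hat, n in [("No condition", 0.500, 100_000),
                        ("age in [19,24]", 0.554, 14_249)]:
    lo, hi = proportion_ci(p_hat, n)
    print(f"{label}: [{lo:.1%}, {hi:.1%}]  n = {n:,}")
```

Under these assumptions the first two intervals in the figure are reproduced to one decimal place. Smaller groups give wider intervals, which is why the narrower subgroups carry more uncertainty even when their proportion looks higher.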
[Figure: diagram with labels Smoke (Y/N), Mental work, Lipo ratio, Anamnesis; a, b, c, d, e, f]
1 A representation language: what models are we looking for?
2 A quality function: when do we consider a model to be good?
3 A search algorithm: how do we go about finding good models?
1 Start with some initial model, e.g. y = a, and compute its quality.
2 Neighbours: add or remove a predictor.
3 If all neighbours have lower quality, then stop and return the current
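The search steps above can be sketched as a small hill-climbing routine. This is an illustration under assumptions, not the course's implementation: a model is represented as the set of predictors it includes (the empty set standing for y = a), the quality function is supplied by the caller, and because step 3 is cut off in the source, moving to the best neighbour when one is better is an assumption. The names `hill_climb`, `neighbours` and `toy_quality` are hypothetical.

```python
from typing import Callable, FrozenSet, Iterator, Sequence, Tuple

Model = FrozenSet[str]  # a model is identified by the predictors it includes


def neighbours(model: Model, predictors: Sequence[str]) -> Iterator[Model]:
    """Step 2: every model obtained by adding or removing one predictor."""
    for p in predictors:
        yield model - {p} if p in model else model | {p}


def hill_climb(predictors: Sequence[str],
               quality: Callable[[Model], float]) -> Tuple[Model, float]:
    current: Model = frozenset()          # step 1: intercept-only model, y = a
    current_q = quality(current)
    while True:
        best = max(neighbours(current, predictors), key=quality, default=None)
        # Step 3: stop when no neighbour is strictly better.
        # Ties also stop the search, which guarantees termination.
        if best is None or quality(best) <= current_q:
            return current, current_q
        # Assumed continuation: move to the best neighbour and repeat.
        current, current_q = best, quality(best)


def toy_quality(model: Model) -> float:
    """Made-up quality: a fixed gain per predictor minus a complexity penalty."""
    gain = {"x1": 5.0, "x2": 3.0, "x3": 0.5}
    return sum(gain[p] for p in model) - 1.0 * len(model)


model, score = hill_climb(["x1", "x2", "x3"], toy_quality)
print(sorted(model), score)
```

With the toy quality function the search adds x1, then x2, and stops because adding x3 or removing either predictor lowers the score. In practice the quality function would be computed from data, and hill climbing may end in a local rather than a global optimum.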