

  1. Lecture 8: Regression Trees. Instructor: Saravanan Thirumuruganathan, CSE 5334

  2. Outline: (1) Regression (2) Linear Regression (3) Regression Trees

  3. Regression and Linear Regression

  4. Supervised Learning. Dataset: training (labeled) data $D = \{(x_i, y_i)\}$ with $x_i \in \mathbb{R}^d$, and test (unlabeled) data $x_0 \in \mathbb{R}^d$. Tasks: classification, where $y_i \in \{1, 2, \ldots, C\}$, and regression, where $y_i \in \mathbb{R}$. Objective: given $x_0$, predict $y_0$. This is supervised learning because $y_i$ is given during training.
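
As a concrete illustration of this setup (the arrays and names below are invented for the example, not taken from the slides), a labeled training set and an unlabeled test point might look like:

```python
import numpy as np

# Training (labeled) data D = {(x_i, y_i)}: each x_i is a point in R^d.
X_train = np.array([[2.0, 1.0],
                    [1.5, 3.0],
                    [4.0, 0.5]])            # d = 2 features per example
y_class = np.array([1, 2, 1])               # classification: labels in {1, ..., C}
y_reg   = np.array([3.4, 5.1, 2.2])         # regression: real-valued responses

# Test (unlabeled) data: a point x_0 in R^d whose response y_0 we must predict.
x0 = np.array([2.5, 1.2])
```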

  5. Regression examples: predict the cost of a house from its details; predict job salary from a job description; predict SAT or GRE scores; predict the future price of petrol from past prices; predict the future GDP of a country or the valuation of a company.

  6. Linear Regression: One-dimensional Case

  7. Linear Regression: One-dimensional Case

  8. Linear Regression: One-dimensional Case

  9. Linear Regression: Poverty vs HS Graduation Rate

  10. Linear Regression: Poverty vs HS Graduation Rate

  11. Residuals

  12. Residuals
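
Slides 11-12 are figure slides; the definition they illustrate (the standard one, not stated in the extracted text) is that the residual of the $i$-th observation is the gap between the observed and the fitted response:

$$ e_i = y_i - \hat{y}_i $$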

  13. A measure for the best line
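
The body of this slide is a figure; assuming the usual OpenIntro treatment, the measure in question is the sum of squared residuals, so the best line is the one that minimizes

$$ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$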

  14. Least Squares Line
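
The slide body is a figure; as a minimal sketch of the least-squares fit in the one-dimensional case (the function name fit_least_squares_line is ours, and the data are made up), the textbook closed form is:

```python
import numpy as np

def fit_least_squares_line(x, y):
    """Fit y ~ b0 + b1 * x by minimizing the sum of squared residuals."""
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the least-squares line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Example in the spirit of the poverty vs. HS graduation slides (made-up numbers).
x = np.array([85.0, 90.0, 88.0, 79.0, 92.0])   # HS graduation rate (%)
y = np.array([14.0, 10.0, 11.5, 18.0, 9.0])    # poverty rate (%)
b0, b1 = fit_least_squares_line(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")      # prediction at any new x
```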

  15. Prediction

  16. Linear Regression in Higher Dimensions

  17. Linear Regression in Higher Dimensions

  18. Linear Regression in Higher Dimensions
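
These three slides are figure slides; in the usual notation, the model they depict generalizes the one-dimensional line to

$$ \hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_d x_d $$

with one weight per feature (the intercept $w_0$ can be absorbed into $w$ by adding a constant feature).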

  19. Linear Regression: Objective Function
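
The slide body is a figure; assuming the standard least-squares objective, the weights are chosen to minimize the residual sum of squares over $w$:

$$ J(w) = \sum_{i=1}^{n} (y_i - w^\top x_i)^2 = \lVert y - Xw \rVert^2 $$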

  20. Linear Regression: Gradient Descent based Solution
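
A minimal sketch of a gradient-descent solver for this objective, using the mean-squared-error form so the step size is easier to pick (the learning rate and iteration count are arbitrary choices, not values from the slides):

```python
import numpy as np

def linreg_gradient_descent(X, y, lr=0.01, n_iters=1000):
    """Minimize J(w) = mean((y - Xw)^2) by gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        residual = X @ w - y                 # per-example prediction error
        grad = 2.0 * (X.T @ residual) / n    # gradient of the mean squared error
        w -= lr * grad                       # step against the gradient
    return w

# Usage: prepend a constant feature so w[0] acts as the intercept.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = 3.0 + 2.0 * x                            # noiseless line y = 3 + 2x
print(linreg_gradient_descent(X, y))         # approaches [3.0, 2.0]
```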

  21. Regression Trees

  22. Predicting baseball salary data. Salary is color-coded from low (blue, green) to high (yellow, red).

  23. Decision tree for Baseball Salary Prediction

  24. Decision tree for Baseball Salary Prediction

  25. Interpreting the Decision Tree

  26. Interpreting the Decision Tree. Years is the most important factor in determining Salary: players with less experience earn lower salaries than more experienced players. Given that a player is less experienced, the number of Hits he made in the previous year seems to play little role in his Salary. But among players who have been in the major leagues for five or more years, the number of Hits made in the previous year does affect Salary, and players who made more Hits last year tend to have higher salaries. This is surely an over-simplification, but compared to a regression model, the tree is easy to display, interpret, and explain.

  27. High Level Idea. Classification tree: quality of a split is measured by a general "impurity measure". Regression tree: quality of a split is measured by squared error.

  28. High Level Idea. We divide the feature space into $J$ distinct, non-overlapping regions $R_1, R_2, \ldots, R_J$. For every observation that falls into region $R_i$, we make the same prediction: the mean of the response values of the training observations in $R_i$. Objective: find boxes $R_1, R_2, \ldots, R_J$ that minimize the residual sum of squares (RSS), $$\mathrm{RSS} = \sum_{i=1}^{J} \sum_{j \in R_i} (y_j - \hat{y}_{R_i})^2,$$ where $\hat{y}_{R_i}$ is the mean response of the training observations in the $i$-th box.
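
A direct transcription of this objective (region membership is passed as an array of region indices; the function name rss is ours):

```python
import numpy as np

def rss(y, region):
    """RSS = sum over regions of squared deviations from the region mean."""
    total = 0.0
    for r in np.unique(region):
        y_r = y[region == r]                      # responses falling in region R_r
        total += np.sum((y_r - y_r.mean()) ** 2)  # sum of (y_j - yhat_{R_r})^2
    return total
```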

  29. Building Regression Trees. We first select the feature $X_i$ and the cutpoint $s$ such that splitting the feature space into the regions $\{X \mid X_i < s\}$ and $\{X \mid X_i \ge s\}$ leads to the greatest possible reduction in RSS. Next, we repeat the process, looking for the best attribute and best cutpoint to split the data further so as to minimize the RSS within each of the resulting regions. The process continues until a stopping criterion is reached; for instance, we may continue until no region contains more than five observations.
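
A minimal sketch of one step of this greedy search, scanning every (feature, cutpoint) pair for the split that yields the lowest total RSS; the recursion over the resulting regions and the stopping rule are elided, and the function name best_split is ours:

```python
import numpy as np

def best_split(X, y):
    """Return the (feature index, cutpoint) minimizing RSS after one split."""
    best_i, best_s, best_cost = None, None, np.inf
    for i in range(X.shape[1]):                  # candidate feature X_i
        for s in np.unique(X[:, i]):             # candidate cutpoint s
            left, right = y[X[:, i] < s], y[X[:, i] >= s]
            if len(left) == 0 or len(right) == 0:
                continue                         # split must create two regions
            cost = (np.sum((left - left.mean()) ** 2)
                    + np.sum((right - right.mean()) ** 2))
            if cost < best_cost:
                best_i, best_s, best_cost = i, s, cost
    return best_i, best_s
```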

  30. Summary. Major concepts: geometric interpretation of classification; decision trees.

  31. Slide Material References: slides from the ISLR book; slides by Piyush Rai; slides from the OpenIntro Statistics book (http://www.webpages.uidaho.edu/~stevel/251/slides/os2_slides_07.pdf); see also the footnotes.
