
House Prices: Advanced Regression Techniques



  1. House Prices: Advanced Regression Techniques Haiyang Shi Apr. 17, 2018

  2. Outline • Introduction • ML Techniques • Feature Engineering • Experiments • Observations The Ohio State University 2

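  3. Introduction
     • Goal: predict the final sale price of each house using advanced regression techniques.
     • Data: a Kaggle competition dataset of residential property sales in Ames, Iowa, from 2006 to 2010.
     • Evaluation: root-mean-square error (RMSE) between the logarithm of the predicted price and the logarithm of the actual price. Taking logs keeps errors on expensive houses from dominating the score.
       $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log \hat{y}_i - \log y_i\right)^2}$$
       where $\hat{y}_i$ is the predicted price and $y_i$ is the actual price of house $i$.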
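
The evaluation metric can be sketched in a few lines of NumPy. This is a minimal illustration, not the competition's scoring code; the function name `rmse_log` and the sample prices are made up for the example.

```python
import numpy as np

def rmse_log(y_true, y_pred):
    """Root-mean-square error between the logs of predicted and actual prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Hypothetical prices, for illustration only.
actual = [100_000, 200_000, 400_000]
predicted = [110_000, 220_000, 440_000]

# Every prediction is 10% too high, so the score equals log(1.1)
# regardless of how expensive each house is.
print(rmse_log(actual, predicted))
```

Because the error is measured on logs, a 10% miss on a cheap house costs the same as a 10% miss on an expensive one.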

  4. ML Techniques
     • Linear Regression: Ridge, Lasso
     • Support Vector Regression
     • Random Forest
     • Adaptive Boosting
     • Gradient Boosted Decision Tree
     • K Nearest Neighbors
     • Neural Network

  5. Feature Engineering
     • Impute missing values
     • Clean outliers
     • Categorize categorical attributes
     • Transform skewed attributes
     • Generate features*
     • Select feature subset
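
The first and third steps might look like the sketch below. The slides do not say how values were imputed or how categorical attributes were handled, so the choices here are assumptions: the median for numeric columns, a literal "None" level for categorical columns, and one-hot encoding via `pandas.get_dummies`. The helper name `impute_and_encode` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

def impute_and_encode(df):
    """Fill missing values, then one-hot encode the categorical columns."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Assumption: the median is a reasonable fill for numeric gaps.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Assumption: a missing category becomes its own "None" level.
            out[col] = out[col].fillna("None")
    return pd.get_dummies(out)

# Toy rows borrowing two column names from the dataset; the values are made up.
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0],
    "GarageQual": ["TA", None, "Gd"],
})
encoded = impute_and_encode(toy)
print(encoded.columns.tolist())
```

Treating a missing category as its own level keeps the information that the value was absent, which may matter for attributes such as garage quality.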

  6. Missing Values and Highly Correlated Attributes

     Attribute      Missing Values
     BsmtFullBath   2
     BsmtHalfBath   2
     GarageYrBlt    159
     GarageCars     1
     LotFrontage    486
     MasVnrArea     23
     BsmtFinSF1     1
     BsmtFinSF2     1
     BsmtUnfSF      1
     TotalBsmtSF    1
     GarageArea     1

     Attribute 1    Attribute 2    Correlation
     MSSubClass     BldgType       0.75
     OverallQual    ExterQual      0.72
     OverallQual    SalePrice      0.82
     YearBuilt      GarageYrBlt    0.78
     Exterior1st    Exterior2nd    0.86
     ExterQual      KitchenQual    0.72
     TotalBsmtSF    1stFlrSF       0.78
     GrLivArea      TotRmsAbvGrd   0.82
     GrLivArea      SalePrice      0.73
     Fireplaces     FireplaceQu    0.80
     GarageCars     GarageArea     0.89
     GarageQual     GarageCond     0.90
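
Tables like these can be produced with a few lines of pandas. The sketch below is an assumption about the approach, not the author's code: it handles numeric columns only (the slide also lists categorical pairs such as GarageQual and GarageCond, which would need encoding first), and the 0.7 threshold is a guess based on the smallest correlation shown. The helper names and toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def missing_counts(df):
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)

def correlated_pairs(df, threshold=0.7):
    """List numeric column pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle: each pair once
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], round(float(corr.iloc[i, j]), 2)))
    return sorted(pairs, key=lambda p: -p[2])

# Toy data: garage capacity is derived from garage area, so the two correlate.
rng = np.random.default_rng(0)
area = rng.uniform(200, 900, size=50)
toy = pd.DataFrame({
    "GarageArea": area,
    "GarageCars": np.round(area / 300),
    "LotFrontage": rng.uniform(20, 120, size=50),
})
toy.loc[:4, "LotFrontage"] = np.nan  # label-based slice: rows 0 through 4

print(missing_counts(toy))
print(correlated_pairs(toy))
```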

  7. Outliers

  8. Skewness
     • Log Transformation
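
A log transformation of skewed attributes could be applied as in the sketch below. The slide names only the transformation itself, so the details are assumptions: `log1p` (log of 1 + x) is used so that zero values stay finite, the skewness cutoff of 0.75 is arbitrary, and values are assumed non-negative. The helper name and toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def log_transform_skewed(df, threshold=0.75):
    """Apply log(1 + x) to numeric columns whose skewness exceeds threshold."""
    out = df.copy()
    skewness = out.select_dtypes(include="number").skew()
    skewed_cols = list(skewness[skewness > threshold].index)
    if skewed_cols:
        out[skewed_cols] = np.log1p(out[skewed_cols])
    return out, skewed_cols

# Toy data: one right-skewed column and one roughly symmetric column.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "LotArea": rng.lognormal(mean=9.0, sigma=0.8, size=500),
    "YearBuilt": rng.integers(1950, 2010, size=500),
})
transformed, cols = log_transform_skewed(toy)
print(cols)
print(toy["LotArea"].skew(), transformed["LotArea"].skew())
```

Only columns above the cutoff are changed, so attributes that are already close to symmetric keep their original scale.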

  9. Skewness (Cont.)
     [Figure: distribution before and after the log transformation]

  10. Bivariate Relationship Analysis

  11. Experiments
      • GridSearchCV to select hyperparameters
      • 10-fold cross validation
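
The setup on this slide can be sketched with scikit-learn as follows. Lasso is used because it is the first model reported; the synthetic data, the `alpha` grid, and the shuffled `KFold` are assumptions for illustration and are not taken from the slides.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the prepared feature matrix and target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}  # hypothetical grid
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # 10-fold cross validation

search = GridSearchCV(
    Lasso(max_iter=10_000),
    param_grid,
    cv=cv,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence the sign
)
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # mean RMSE across the 10 folds
```

With a log-transformed target, the RMSE reported here would correspond to the competition metric.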

  12. Experiments
      • Lasso Regression
        – Most important feature engineering steps
          • Transformation of skewed data
          • Categorization of categorical attributes
          • Imputation of missing values
        – Score: 0.12789
        – Most important features
          • GrLivArea: above grade (ground) living area in square feet
          • LotArea: lot size in square feet
          • OverallQual: rating of the overall material and finish of the house

  13. Experiments
      • Random Forest
        – Main hyperparameters
          • n_estimators (800): number of trees in the forest
          • max_features (0.3): fraction of features to consider when looking for the best split
          • max_depth (20): maximum depth of each tree
        – Deeper trees with a smaller max_features perform better
        – Results are resilient to data preprocessing choices with a smaller max_features
        – Score: 0.14169
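
The hyperparameters above map directly onto scikit-learn's `RandomForestRegressor`. The sketch below fits on synthetic data as a stand-in for the prepared features; `random_state` is added only for reproducibility and is not from the slides.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the prepared feature matrix and target.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

forest = RandomForestRegressor(
    n_estimators=800,   # number of trees in the forest
    max_features=0.3,   # a float is read as a fraction of the features per split
    max_depth=20,       # maximum depth of each tree
    random_state=0,
)
forest.fit(X, y)

print(len(forest.estimators_))
print(forest.predict(X[:3]))
```

A smaller `max_features` makes the individual trees less alike, which is one plausible reason the deeper trees did not hurt here.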

  14. Experiments
      • Gradient Boosted Decision Tree
        – Main hyperparameters: n_estimators (3000), learning_rate (0.05), max_features (log2) and max_depth (3)
        – Score: 0.12365
      • Support Vector Regression
        – Main hyperparameters: kernel (linear)
        – Score: 0.15413
      • Adaptive Boosting
        – Main hyperparameters: base_estimator (DecisionTree(max_features=0.3))
        – Score: 0.14149
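
For the best-scoring model, the listed settings correspond to scikit-learn's `GradientBoostingRegressor` roughly as sketched below. The data is synthetic and `random_state` is an addition for reproducibility; everything else is taken from the slide.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the prepared feature matrix and target.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

gbdt = GradientBoostingRegressor(
    n_estimators=3000,     # many small boosting stages
    learning_rate=0.05,    # each stage contributes only a small correction
    max_features="log2",   # log2(n_features) candidate features per split
    max_depth=3,           # shallow trees
    random_state=0,
)
gbdt.fit(X, y)

print(gbdt.n_estimators_)
print(gbdt.predict(X[:3]))
```

The combination of a low learning rate with a large number of shallow trees is a common trade-off: each tree is weak, and the ensemble improves gradually.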

  15. Experiments
      • K Nearest Neighbors
        – Main hyperparameters: n_neighbors (11)
        – Score: 0.24084
      • Neural Network
        – Main hyperparameters: hidden_layer_sizes ((30, 30, 30, 30))
        – Score: 0.23495

  16. Observations
      • Feature engineering is very important
        – Feature selection
        – Feature creation
          • Transforming the neighborhood attribute to a geographical location
        – Feature combination
      • Overfitting is harmful, and cross validation alone is not enough to prevent it
      • Tuning hyperparameters is very time consuming
      More details: http://www.shihaiyang.me/2018/04/16/house-prices/

  17. Thank You!
