Confidential Missing Marital Status Prediction for Hypermarkets


  1. Confidential Missing Marital Status Prediction for Hypermarkets
     Project Presentation
     BADM Team B-5: Sankalp Gaur, Sonali Gadekar, Harshita Jujjuru, Tushna Mistry, Vineet Jain

  2. Business Problem
     Missing values for ‘Marital Status’: 13%
     Stakeholder
     • Marketing team of the supermarket could be the client
     Use Case
     • Targeting family bulk shopping offers to family customers
     Objective
     • To identify married customers in the customer data set
     Benefit
     • Correct grasp of the marital status for customer segmentation

  3. Data Mining Problem
     Analytics Objective
     • To successfully predict (classify) marital status where it is missing.
     Methodology
     • Supervised predictive (classification) task, both forward-looking and retrospective, since new and old records alike fall under its purview.
     Outcome Variable
     • Marital status for rows where marital status is currently missing. In fact, even those who are unmarried seem to exhibit married behavior.
     Objective
     • To identify FAMILY customers in the customer data set.
     • To identify MARRIED customers in the customer data set.

  4. Data Description
     Customer Data, Transaction Data
     Transaction Level
     • KNN (SKU#, Age (derived field), Qty Sold, Extended Price)
     Basket Level
     • Classification Trees (Age, Qty Sold, Extended Price)
     • Association Rules (Classes within a basket)
     Customer Level
     • KNN (Frequency of Class/Subclass, Sex, Age, Dummy Sex)
     • Logistic Regression

  5. KNN (Transaction Level Data)

     Validation error log for different k

       Value of k   % Error Training   % Error Validation
            1             2.14              38.53
            2            19.17              37.98
            3            19.82              37.06
            4            23.96              36.07
            5            24.52              36.22
            6            26.21              35.07
            7            26.61              35.59
            8            27.59              34.99
            9            27.98              35.29
           10            28.66              34.81
           11            28.93              35.04
           12            29.42              34.71   <--- Best k
           13            29.63              35.18
           14            29.92              34.77
           15            29.94              35.19

     Training Data scoring - Summary Report (for k=12)
     Cut off Prob.Val. for Success (Updatable): 0.5

     Classification Confusion Matrix
                      Predicted Y   Predicted N
       Actual Y           3951          1008
       Actual N           1933          3105

     Error Report
       Class      # Cases   # Errors   % Error
       Y             4959       1008     20.33
       N             5038       1933     38.37
       Overall       9997       2941     29.42

     Validation Data scoring - Summary Report (for k=12)
     Cut off Prob.Val. for Success (Updatable): 0.5

     Classification Confusion Matrix
                      Predicted Y   Predicted N
       Actual Y          12244          4199
       Actual N           7257          9301

     Error Report
       Class      # Cases   # Errors   % Error
       Y            16443       4199     25.54
       N            16558       7257     43.83
       Overall      33001      11456     34.71
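
The "Best k" marker in the validation error log can be reproduced by picking the k with the lowest validation error. The sketch below is only an illustration of that selection rule: the numbers are copied from the slide, while the names `ERROR_LOG` and `best_k` and the tie-breaking choice (prefer the smaller k) are assumptions, not something the presentation states.

```python
# Validation error log from the slide: k -> (% error training, % error validation).
ERROR_LOG = {
    1: (2.14, 38.53),
    2: (19.17, 37.98),
    3: (19.82, 37.06),
    4: (23.96, 36.07),
    5: (24.52, 36.22),
    6: (26.21, 35.07),
    7: (26.61, 35.59),
    8: (27.59, 34.99),
    9: (27.98, 35.29),
    10: (28.66, 34.81),
    11: (28.93, 35.04),
    12: (29.42, 34.71),
    13: (29.63, 35.18),
    14: (29.92, 34.77),
    15: (29.94, 35.19),
}


def best_k(error_log):
    """Return the k with the lowest validation error.

    Assumption: ties go to the smaller k, because min() keeps the first
    minimum it sees and the keys are visited in ascending order.
    """
    return min(sorted(error_log), key=lambda k: error_log[k][1])


if __name__ == "__main__":
    k = best_k(ERROR_LOG)
    train_err, valid_err = ERROR_LOG[k]
    print(f"best k = {k}: training {train_err}%, validation {valid_err}%")
```

Selecting on validation error rather than training error matters here: training error is lowest at k=1 (2.14%), where validation error is at its worst (38.53%).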

  6. KNN – Customer Level Aggregation

  7. Classification Tree (Basket Level Data)
     Cut off Prob.Val. for Success (Updatable): 0.5

     Classification Confusion Matrix
                      Predicted Y   Predicted N
       Actual Y           6309          1408
       Actual N           4172          4887

     Error Report
       Class      # Cases   # Errors   % Error
       Y             7717       1408   18.24543216
       N             9059       4172   46.05364831
       Overall      16776       5580   33.26180258
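
The error report follows arithmetically from the confusion matrix: for each actual class, the errors are the records predicted as the other class, and the percentage is errors divided by cases. A minimal sketch of that arithmetic, using the classification tree counts from the slide, is given here. The function name `error_report` and the nested-dictionary layout are illustrative assumptions, not the output format of the tool used for the slides.

```python
def error_report(matrix):
    """Build an error report from a confusion matrix.

    `matrix[actual][predicted]` holds record counts per class label.
    Returns {label: (cases, errors, pct_error)} plus an "Overall" row.
    """
    report = {}
    total_cases = total_errors = 0
    for actual, row in matrix.items():
        cases = sum(row.values())
        errors = cases - row[actual]  # every record off the diagonal
        report[actual] = (cases, errors, 100.0 * errors / cases)
        total_cases += cases
        total_errors += errors
    report["Overall"] = (
        total_cases,
        total_errors,
        100.0 * total_errors / total_cases,
    )
    return report


# Classification tree counts from the slide (basket-level data, cutoff 0.5).
TREE_MATRIX = {
    "Y": {"Y": 6309, "N": 1408},
    "N": {"Y": 4172, "N": 4887},
}

if __name__ == "__main__":
    for label, (cases, errors, pct) in error_report(TREE_MATRIX).items():
        print(f"{label:8s}{cases:8d}{errors:8d}{pct:10.2f}")
```

The same function applied to the KNN matrices on slide 5 should give the percentages listed in those error reports.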

  8. Association Rules (Basket Level Data)

  9. KNN (Customer Level Data)
     Predictors
     • Frequency of Class (in transaction level data)
     • Age, Dummy variable for Sex
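
One way to picture these predictors is as a single row per customer, built by counting how often each product class appears in that customer's transactions and appending age and a sex dummy. The sketch below is a hypothetical illustration only. The record layout, the class names, the function name `customer_features`, the use of raw counts as "frequency", and the F -> 1 / M -> 0 coding are all assumptions; the slides do not show the actual preparation steps.

```python
from collections import Counter, defaultdict


def customer_features(transactions, customers, classes):
    """Aggregate transaction rows into one predictor vector per customer.

    transactions: iterable of (customer_id, product_class) rows.
    customers:    {customer_id: {"age": int, "sex": "M" or "F"}}.
    classes:      ordered list of product classes to count.
    Returns {customer_id: [count per class..., age, sex_dummy]}.
    """
    counts = defaultdict(Counter)
    for customer_id, product_class in transactions:
        counts[customer_id][product_class] += 1

    features = {}
    for customer_id, info in customers.items():
        freq = [counts[customer_id][c] for c in classes]
        sex_dummy = 1 if info["sex"] == "F" else 0  # assumed coding: F -> 1, M -> 0
        features[customer_id] = freq + [info["age"], sex_dummy]
    return features


if __name__ == "__main__":
    # Made-up rows purely for illustration; not taken from the hypermarket data.
    transactions = [("c1", "DAIRY"), ("c1", "DAIRY"), ("c1", "BABY"), ("c2", "SNACKS")]
    customers = {"c1": {"age": 34, "sex": "F"}, "c2": {"age": 22, "sex": "M"}}
    classes = ["BABY", "DAIRY", "SNACKS"]
    print(customer_features(transactions, customers, classes))
```

Because KNN relies on distances, predictors on different scales (class counts versus age) would normally be standardized before fitting; whether the team did so is not stated in the slides.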

  10. Logistic Regression (Customer Level Data)

  11. Ensemble
