BADM Project – Hyper Market
Classifying Biscuit Brand Switchers for Targeted Marketing for a Biscuit Manufacturer
Group B4 - Minesweepers
Aditi Vaish | Pranav Maranganty | Kevin John | Deepak Agnihotri | Archana Rajan
Business Problem
• Client Profile and Background
  • Minesweepers Biscuits (MSB), based out of Denmark
  • Renowned brand internationally but limited brand presence in India
  • Expansion to India with new products and innovative promotions
• Competition
  • Brands like Britannia and Parle own a big pool of loyalists
• Business Objective
  • Increase Trial Rate of MSB products, and hence Market Share, by partnering with Hyper Market
  • Improve Marketing Efficiency by targeting only the “Brand Switchers”
  • Offer samples and personalized promotions at checkout counters or at kiosks of the Hyper Market to the “Brand Switchers”
[Chart: Number of Enrollments (axis 0 to 10000), comparing “Selected Consumers for better ROI on marketing” with “All Consumers”]
Data Mining Problem
• Objective
  • Classify new customers as “Brand Loyalist” (1) or “Brand Switcher” (0) based on demographics and purchase patterns in “Ready Food”
• New Customer – A customer who has made 2 purchases from the “Ready Food” department
• “Brand Loyalist” – A person who has purchased biscuits at least 3 times and purchased the same brand over 50% of the time
[Diagram labels: Continuous Evaluation and Classification; Business Rules; Classification; MSB’s Target]
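The “Brand Loyalist” business rule can be sketched in a few lines of Python. This is only an illustration, not the group's actual labelling code: the function name `classify_customer`, its parameters, and the list-of-brand-names input are hypothetical. It also assumes that a customer with at least 3 biscuit purchases who does not meet the 50% rule is a “Brand Switcher”, and that customers with fewer than 3 purchases get no label.

```python
from collections import Counter


def classify_customer(biscuit_brands, min_purchases=3, loyalty_share=0.5):
    """Label one customer from the brands of their biscuit purchases.

    Returns 1 (Brand Loyalist), 0 (Brand Switcher), or None when the
    customer has fewer than `min_purchases` biscuit purchases.
    """
    if len(biscuit_brands) < min_purchases:
        return None  # rule not applicable (assumption: leave unlabelled)
    # Number of purchases of the single most frequently bought brand.
    top_count = Counter(biscuit_brands).most_common(1)[0][1]
    # "Over 50%" is read as strictly greater than half of all purchases.
    return 1 if top_count / len(biscuit_brands) > loyalty_share else 0


loyal = classify_customer(["Parle", "Parle", "Britannia"])
switcher = classify_customer(["Parle", "Britannia", "MSB", "Parle"])
too_few = classify_customer(["Parle", "Parle"])
```

Under this reading, a customer who buys one brand in exactly half of their purchases counts as a switcher, because the rule says “over 50%”.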
Data
• Key Inputs (vary depending on the model)

  Customer Demographics   Historical Purchase Pattern (Ready Food)   Last Purchase (Ready Food)   Second Last Purchase (Ready Food)
  Age                     Average Basket Price                       Quantity                     Quantity
  Sex                     Average Basket Quantity                    Price                        Price
  Marital Status          Average Basket Unique Count                Unique SKU Count             Unique SKU Count
  Enrollment Store        Number of Baskets
                          Standard Deviation of Basket Price
                          Standard Deviation of Basket Quantity

• Data
  • Total Unique Customers: 4843 (a person who has purchased biscuits at least 3 times)
  • Initial Classification:
    • Brand Loyalists: 2886 (59.6%) (a person who has purchased biscuits at least 3 times and purchased the same brand over 50% of the time)
    • Brand Switchers: 1957 (40.4%)
• Data Partition
  • Training Set: 2172
  • Validation Set: 1303
  • Test Set: 868
  • Hold Out for Model Evaluation: 500
• Output
  • Loyalists? (0 – Brand Switcher, 1 – Brand Loyalist)
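The slides do not say how the partition was produced. The sizes are, however, consistent with setting aside 500 customers as the hold-out and splitting the remaining 4343 roughly 50/30/20 into training, validation and test sets. The sketch below reproduces those sizes under that assumption; the function `partition`, the seed, and the integer customer ids are all hypothetical.

```python
import random


def partition(customer_ids, holdout=500, train_frac=0.5, valid_frac=0.3, seed=0):
    """Shuffle ids, set aside a hold-out, then split the rest three ways."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is repeatable
    hold, rest = ids[:holdout], ids[holdout:]
    n_train = round(len(rest) * train_frac)
    n_valid = round(len(rest) * valid_frac)
    return {
        "training": rest[:n_train],
        "validation": rest[n_train:n_train + n_valid],
        "test": rest[n_train + n_valid:],  # whatever remains
        "holdout": hold,
    }


parts = partition(range(4843))
sizes = {name: len(ids) for name, ids in parts.items()}
```

With 4843 ids, `sizes` comes out as 2172 / 1303 / 868 / 500, matching the Data Partition figures on the slide.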
Methods
Probability Cutoff: 0.4

Logistic Regression - Stepwise
Initial No. of Variables: 21
# Variables based on Cp: 20

             Training Set                   Validation Set                 Test Set
  Class      # Cases  # Errors  % Error     # Cases  # Errors  % Error     # Cases  # Errors  % Error
  0              905       386    42.65         524       222    42.37         359       142    39.55
  1             1267       444    35.04         779       272    34.92         509       198    38.90
  Overall       2172       830    38.21        1303       494    37.91         868       340    39.17

  Input variables               Coefficient
  Constant term                 -47.1890297
  Age                             0.00544622
  Sex_F                           1.11986947
  Sex_M                           1.03916395
  Enrollment Store_1001          -0.67768991
  Enrollment Store_1002           0.1495695
  Marital Status_N                0.09804565
  Marital Status_Y                0.07624547
  Email_Y                        -0.05565267
  Average Basket Price           -0.00135054
  Average Basket Quantity        -0.04568335
  Number of Baskets               0.00852676
  StdDev of Basket Price          0.00019505
  StdDev of Basket Quantity       0.02833186
  Last Transaction Date           0.00110265
  Last Purchase Unique Count      0.01369065
  Last Purchase Price             0.00012139
  Last Purchase Quantity         -0.00228719
  Second Last Purchase           -0.00255478
  Second Last Purchase Price      0.00014995

Naïve Bayes
# Input Variables: 6

             Training Set                   Validation Set                 Test Set
  Class      # Cases  # Errors  % Error     # Cases  # Errors  % Error     # Cases  # Errors  % Error
  0              905       241    26.63         524       208    39.69         359       146    40.67
  1             1267       456    35.99         779       334    42.88         509       235    46.17
  Overall       2172       697    32.09        1303       542    41.60         868       381    43.89

K-NN
Best K: 5
# Input Variables: 14

             Training Set                   Validation Set                 Test Set
  Class      # Cases  # Errors  % Error     # Cases  # Errors  % Error     # Cases  # Errors  % Error
  0              905       112    12.38         524       148    28.24         359        95    26.46
  1             1267       613    48.38         779       508    65.21         509       313    61.49
  Overall       2172       725    33.38        1303       656    50.35         868       408    47.00

  Value of k   % Error Training   % Error Validation
   1                 0.00               44.67
   2                23.94               49.12
   3                23.16               45.13
   4                27.53               47.51
   5                28.68               43.13
   6                30.66               47.12
   7                30.34               43.90
   8                31.72               45.89
   9                32.27               43.75
  10                32.97               44.74

CART
Pruned Tree: 3 Nodes
# Input Variables: 18
We tried different # variables, but all had some overfitting

             Training Set                   Validation Set                 Test Set
  Class      # Cases  # Errors  % Error     # Cases  # Errors  % Error     # Cases  # Errors  % Error
  0              905         0     0.00         524       189    36.07         359       125    34.82
  1             1267         0     0.00         779       321    41.21         509       215    42.24
  Overall       2172         0     0.00        1303       510    39.14         868       340    39.17
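Two mechanics on this slide are easy to reproduce. The sketch below picks the best k as the value with the lowest validation error in the K-NN table, and shows how a % Error figure follows from the case and error counts. It also shows one way a 0.4 probability cutoff could be applied; whether a probability of exactly 0.4 counts as class 1 is an assumption, since the slide does not say. All names (`knn_errors`, `pct_error`, `apply_cutoff`) are hypothetical.

```python
# (training % error, validation % error) per k, copied from the K-NN table.
knn_errors = {
    1: (0.00, 44.67), 2: (23.94, 49.12), 3: (23.16, 45.13), 4: (27.53, 47.51),
    5: (28.68, 43.13), 6: (30.66, 47.12), 7: (30.34, 43.90), 8: (31.72, 45.89),
    9: (32.27, 43.75), 10: (32.97, 44.74),
}

# Choose k on validation error, not training error (k = 1 always fits
# the training data perfectly and would win otherwise).
best_k = min(knn_errors, key=lambda k: knn_errors[k][1])


def pct_error(n_errors, n_cases):
    """Percentage of misclassified cases, rounded as on the slide."""
    return round(100 * n_errors / n_cases, 2)


def apply_cutoff(prob_loyalist, cutoff=0.4):
    """Assumed rule: predict 1 (Brand Loyalist) when P(1) >= cutoff."""
    return 1 if prob_loyalist >= cutoff else 0


lr_train_class0 = pct_error(386, 905)   # Logistic Regression, training, class 0
lr_train_overall = pct_error(830, 2172)  # Logistic Regression, training, overall
```

A cutoff below 0.5 pushes more borderline customers into class 1, which trades class-1 errors for class-0 errors.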
Model Evaluation (Test Data)
• Naïve Rule is considered as the benchmark for evaluation, with all customers tagged as “Brand Loyalist”
• Key Metrics for Evaluation
  • Sensitivity (0 → 1)
    • All models are better than the benchmark
  • % Total Error
    • Logistic Regression and CART fare better than benchmark
[Chart: Sensitivity, Specificity and % Total Error for Benchmark, Logistic Regression, Naïve Bayes, K-NN and CART; surviving data labels: 100.00%, 41.36%, 0.00%]
• Misclassification Costs
  • INR 120 for 0 → 1 (Customer Value)
  • INR 20 for 1 → 0 (Coupons and Samples)
  • All models fare better than benchmark
  • K-NN has some overfitting

  Model                 Misclassification Cost (INR)
  Benchmark                    61080
  Logistic Regression          21000
  Naïve Bayes                  22220
  K-NN                         17660
  CART                         19300

• Holdout Evaluation

             Logistic Regression             CART
  Class      # Cases  # Errors  % Error     # Cases  # Errors  % Error
  0              190        86   45.26%         190       144   75.79%
  1              310       102   32.90%         310        73   23.55%
  Overall        500       188   37.60%         500       217   43.40%
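The model costs on this slide can be reproduced from the test-set error counts on the Methods slide, assuming the “Class” column there is the actual class. Under that reading, errors in class 0 are 0 → 1 errors (a switcher tagged as a loyalist) and errors in class 1 are 1 → 0 errors. The sketch below is a check on the arithmetic, with hypothetical names; it covers the four fitted models only.

```python
COST_0_TO_1 = 120  # INR, switcher tagged as loyalist (customer value)
COST_1_TO_0 = 20   # INR, loyalist tagged as switcher (coupons and samples)

# (class-0 errors, class-1 errors) on the test set, from the Methods slide.
test_errors = {
    "Logistic Regression": (142, 198),
    "Naïve Bayes": (146, 235),
    "K-NN": (95, 313),
    "CART": (125, 215),
}


def misclassification_cost(errors_0_to_1, errors_1_to_0):
    """Total cost in INR for one model's test-set errors."""
    return errors_0_to_1 * COST_0_TO_1 + errors_1_to_0 * COST_1_TO_0


costs = {model: misclassification_cost(*errs) for model, errs in test_errors.items()}

# Benchmark % total error: tagging all 868 test customers as loyalists
# misclassifies the 359 actual switchers.
benchmark_pct_error = round(100 * 359 / 868, 2)
```

The resulting costs are 21000 (Logistic Regression), 22220 (Naïve Bayes), 17660 (K-NN) and 19300 (CART), and the benchmark error is 41.36%, in line with the figures on the slide.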
Recommendations
• Deploy Logistic Regression Model for classifying new customers
  • Low Misclassification Costs
  • Similar Accuracy across all Data
  • Better Overall Error and Sensitivity
  • Easy Deployment of Model / Stable Model
• Continuously improve the model by updating the classifications and adding more data
Next Steps
• Include External Demographics data to improve the model
• Expand model to include other products of MSB