Churn prediction fundamentals
MACHINE LEARNING FOR MARKETING IN PYTHON
Karolis Urbonas, Head of Analytics & Science, Amazon
What is churn?
- Churn happens when a customer stops buying / engaging
- The business context could be contractual or non-contractual
- Sometimes churn can be viewed as either voluntary or involuntary
Types of churn
Main churn typology is based on two business model types:
- Contractual (phone subscription, TV streaming subscription)
- Non-contractual (grocery shopping, online shopping)
Modeling different types of churn
Typically:
- Non-contractual churn is harder to define and model, as there's no explicit customer decision
- We will model contractual churn in the telecom business model
Encoding churn
- Typically 1/0, with 1 = Churn, 0 = No Churn
- Could be a string, Churn / No Churn or Yes / No; best practice is to transform it to 1 and 0

    set(telcom['Churn'])

    {0, 1}
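Where the churn column arrives as strings, the transformation to 1 and 0 could be sketched as follows. The toy DataFrame and its Yes / No labels are assumptions made for illustration; they are not the course's telcom dataset.

```python
import pandas as pd

# Hypothetical toy data; the real telcom dataset is not reproduced here
toy = pd.DataFrame({'customerID': ['a1', 'b2', 'c3', 'd4'],
                    'Churn': ['Yes', 'No', 'No', 'Yes']})

# Map the string labels to 1 (Churn) and 0 (No Churn)
toy['Churn'] = toy['Churn'].map({'Yes': 1, 'No': 0})

# Confirm that only the two expected values remain
print(set(toy['Churn']))
```

Any label missing from the mapping dictionary becomes NaN under `map`, so checking the resulting set is a cheap way to catch unexpected spellings.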
Exploring churn distribution

    telcom.groupby(['Churn']).size() / telcom.shape[0] * 100

    Churn
    0    73.421502
    1    26.578498
    dtype: float64
Split to training and testing data

    from sklearn.model_selection import train_test_split
    train, test = train_test_split(telcom, test_size = .25)
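Because the split is random, each run can produce different training and testing sets, with slightly different churn rates in each. A minimal sketch of a repeatable, stratified split is shown here; `random_state` and `stratify` are existing `train_test_split` parameters, while the toy DataFrame, its `tenure` column and the seed value 42 are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 customers, 28 of whom churned
toy = pd.DataFrame({'customerID': range(100),
                    'tenure': range(100),
                    'Churn': [1] * 28 + [0] * 72})

# random_state makes the split repeatable; stratify keeps the churn rate
# close to identical in the training and testing sets
train, test = train_test_split(toy, test_size=.25,
                               random_state=42, stratify=toy['Churn'])

print(train.shape, test.shape)
print(train['Churn'].mean(), test['Churn'].mean())
```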
Separate features and target variables
Separate column names by data types

    target = ['Churn']
    custid = ['customerID']
    cols = [col for col in telcom.columns if col not in custid + target]

Build training and testing datasets

    train_X = train[cols]
    train_Y = train[target]
    test_X = test[cols]
    test_Y = test[target]
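Selecting the target with a list, as in `train[target]`, yields a one-column DataFrame rather than a 1-D array. scikit-learn estimators generally expect a 1-D target and may emit a `DataConversionWarning` otherwise. One possible adjustment is sketched below on a hypothetical toy DataFrame; the `tenure` column is an assumption.

```python
import pandas as pd

# Hypothetical toy data standing in for the training set
toy = pd.DataFrame({'customerID': ['a1', 'b2', 'c3'],
                    'tenure': [1, 24, 60],
                    'Churn': [1, 0, 0]})

target = ['Churn']
custid = ['customerID']
cols = [col for col in toy.columns if col not in custid + target]

X = toy[cols]           # feature matrix (2-D)
Y_2d = toy[target]      # one-column DataFrame, as in the slide
Y_1d = toy[target[0]]   # 1-D Series, the shape estimators expect for y

print(Y_2d.shape, Y_1d.shape)
```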
Let's go practice!
Predict churn with logistic regression
Introduction to logistic regression
- Statistical classification model for binary responses
- Models log-odds of the probability of the target
- Assumes a linear relationship between the log-odds of the target and the predictors
- Returns coefficients and prediction probability
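The link between log-odds and probability can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions: the intercept, coefficient values and feature names below are hypothetical and do not come from a fitted model.

```python
import math

def log_odds(p):
    """Log-odds (logit) of a probability p."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: turns log-odds back into a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical intercept, coefficients and customer (illustration only)
intercept = -1.0
coef = {'tenure': -0.05, 'monthly_charges': 0.02}
customer = {'tenure': 12, 'monthly_charges': 70}

# The model is linear in the log-odds of churn
z = intercept + sum(coef[name] * customer[name] for name in coef)
p_churn = sigmoid(z)

print(round(z, 4), round(p_churn, 4))
```

A positive coefficient raises the log-odds of churn as the feature grows, and a negative one lowers it.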
Modeling steps
1. Split data to training and testing
2. Initialize the model
3. Fit the model on the training data
4. Predict values on the testing data
5. Measure model performance on testing data
Fitting the model
Import the Logistic Regression classifier

    from sklearn.linear_model import LogisticRegression

Initialize Logistic Regression instance

    logreg = LogisticRegression()

Fit the model on the training data

    logreg.fit(train_X, train_Y)
Model performance metrics
Key metrics:
- Accuracy: the % of correctly predicted labels (both Churn and non-Churn)
- Precision: the % of the model's positive class predictions (here, predicted as Churn) that were correctly classified
- Recall: the % of total positive class samples (all churned customers) that were correctly classified
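The three definitions above can be checked by hand from the counts of true and false positives and negatives. The labels below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical labels for ten customers (1 = Churn, 0 = No Churn)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-churners flagged as churn
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners the model missed
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-churners correctly passed

accuracy = (tp + tn) / len(y_true)  # share of all labels predicted correctly
precision = tp / (tp + fp)          # share of predicted churners who churned
recall = tp / (tp + fn)             # share of actual churners who were caught

print(accuracy, round(precision, 4), recall)
```

Here accuracy looks respectable even though half of the churners are missed, which is why precision and recall are reported alongside it.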
Measuring model accuracy

    from sklearn.metrics import accuracy_score
    pred_train_Y = logreg.predict(train_X)
    pred_test_Y = logreg.predict(test_X)
    train_accuracy = accuracy_score(train_Y, pred_train_Y)
    test_accuracy = accuracy_score(test_Y, pred_test_Y)
    print('Training accuracy:', round(train_accuracy, 4))
    print('Test accuracy:', round(test_accuracy, 4))

    Training accuracy: 0.8108
    Test accuracy: 0.8009
Measuring precision and recall

    from sklearn.metrics import precision_score, recall_score
    train_precision = round(precision_score(train_Y, pred_train_Y), 4)
    test_precision = round(precision_score(test_Y, pred_test_Y), 4)
    train_recall = round(recall_score(train_Y, pred_train_Y), 4)
    test_recall = round(recall_score(test_Y, pred_test_Y), 4)
    print('Training precision: {}, Training recall: {}'.format(train_precision, train_recall))
    print('Test precision: {}, Test recall: {}'.format(test_precision, test_recall))

Output as printed on the slide. The value labelled "Test precision" there is the training recall (0.5736), because the slide's print statement passed train_recall in that position; the actual test precision is not shown.

    Training precision: 0.6725, Training recall: 0.5736
    Test precision: 0.5736, Test recall: 0.4835
Regularization
- Introduces a penalty coefficient in the model building phase
- Addresses overfitting (when patterns are "memorized" by the model)
- Some regularization techniques also perform feature selection, e.g. L1
- Makes the model more generalizable to unseen samples
L1 regularization and feature selection
- LogisticRegression from sklearn performs L2 regularization by default
- L1 regularization, also called LASSO, can be requested explicitly; this approach performs feature selection by shrinking some of the model coefficients to zero

    from sklearn.linear_model import LogisticRegression
    logreg = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
    logreg.fit(train_X, train_Y)

- The C parameter needs to be tuned to find the optimal value
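To see why C needs tuning, it can help to write out the objective being minimized. The form below is one common way to express L1-penalized logistic regression and is an assumption about notation rather than something taken from the slides: labels are coded as y_i in {-1, +1}, w is the coefficient vector, b the intercept and n the number of training samples.

```latex
\min_{w,\, b} \; \lVert w \rVert_1
  + C \sum_{i=1}^{n} \log\!\left( 1 + \exp\!\left( -y_i \left( x_i^{\top} w + b \right) \right) \right),
\qquad \lVert w \rVert_1 = \sum_{j} \lvert w_j \rvert
```

In this form C multiplies the data-fit term, so a smaller C gives the penalty relatively more weight and tends to drive more coefficients to exactly zero.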
Tuning L1 regularization

    import numpy as np
    import pandas as pd

    # Candidate C values (smaller C = stronger regularization)
    C = [1, .5, .25, .1, .05, .025, .01, .005, .0025]
    # One row per C value: C, non-zero coefficient count, accuracy, precision, recall
    l1_metrics = np.zeros((len(C), 5))
    l1_metrics[:,0] = C
    for index in range(0, len(C)):
        logreg = LogisticRegression(penalty='l1', C=C[index], solver='liblinear')
        logreg.fit(train_X, train_Y)
        pred_test_Y = logreg.predict(test_X)
        l1_metrics[index,1] = np.count_nonzero(logreg.coef_)
        l1_metrics[index,2] = accuracy_score(test_Y, pred_test_Y)
        l1_metrics[index,3] = precision_score(test_Y, pred_test_Y)
        l1_metrics[index,4] = recall_score(test_Y, pred_test_Y)
    col_names = ['C','Non-Zero Coeffs','Accuracy','Precision','Recall']
    print(pd.DataFrame(l1_metrics, columns=col_names))
Choosing optimal C value
Let's run some logistic regression models!
Predict churn with decision trees
Introduction to decision trees
Modeling steps
1. Split data to training and testing
2. Initialize the model
3. Fit the model on the training data
4. Predict values on the testing data
5. Measure model performance on testing data
Fitting the model
Import the decision tree module

    from sklearn.tree import DecisionTreeClassifier

Initialize the Decision Tree model

    mytree = DecisionTreeClassifier()

Fit the model on the training data

    treemodel = mytree.fit(train_X, train_Y)
Measuring model accuracy

    from sklearn.metrics import accuracy_score
    pred_train_Y = mytree.predict(train_X)
    pred_test_Y = mytree.predict(test_X)
    train_accuracy = accuracy_score(train_Y, pred_train_Y)
    test_accuracy = accuracy_score(test_Y, pred_test_Y)
    print('Training accuracy:', round(train_accuracy, 4))
    print('Test accuracy:', round(test_accuracy, 4))

    Training accuracy: 0.9973
    Test accuracy: 0.7196
Measuring precision and recall

    from sklearn.metrics import precision_score, recall_score
    train_precision = round(precision_score(train_Y, pred_train_Y), 4)
    test_precision = round(precision_score(test_Y, pred_test_Y), 4)
    train_recall = round(recall_score(train_Y, pred_train_Y), 4)
    test_recall = round(recall_score(test_Y, pred_test_Y), 4)
    print('Training precision: {}, Training recall: {}'.format(train_precision, train_recall))
    print('Test precision: {}, Test recall: {}'.format(test_precision, test_recall))

Output as printed on the slide. The value labelled "Test precision" there is the training recall (0.9906), because the slide's print statement passed train_recall in that position; the actual test precision is not shown.

    Training precision: 0.9993, Training recall: 0.9906
    Test precision: 0.9906, Test recall: 0.4878
Tree depth parameter tuning

    depth_list = list(range(2,15))
    depth_tuning = np.zeros((len(depth_list), 4))
    depth_tuning[:,0] = depth_list
    for index in range(len(depth_list)):
        mytree = DecisionTreeClassifier(max_depth=depth_list[index])
        mytree.fit(train_X, train_Y)
        pred_test_Y = mytree.predict(test_X)
        depth_tuning[index,1] = accuracy_score(test_Y, pred_test_Y)
        depth_tuning[index,2] = precision_score(test_Y, pred_test_Y)
        depth_tuning[index,3] = recall_score(test_Y, pred_test_Y)
    col_names = ['Max_Depth','Accuracy','Precision','Recall']
    print(pd.DataFrame(depth_tuning, columns=col_names))
Choosing optimal depth
Let's build a decision tree!
Identify and interpret churn drivers