Introduction to click-through rates

Introduction to click-through rates: Predicting CTR with Machine Learning in Python (PowerPoint PPT Presentation)



  1. Introduction to click-through rates. Predicting CTR with Machine Learning in Python. Kevin Huo, Instructor.

  2. Click-through rates. Click-through rate = # of clicks on ads / # of views of ads. Companies and marketers serving ads want to maximize click-through rate. Prediction of click-through rates is therefore critical for companies and marketers.
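The ratio defined above can be sketched in one line (the counts here are hypothetical, purely for illustration):

```python
# Click-through rate = # of clicks on ads / # of views of ads
clicks = 40    # hypothetical click count
views = 1000   # hypothetical view (impression) count
ctr = clicks / views
print(ctr)  # 0.04
```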

  3. A classification lens. Classification: assigning categories to observations. Classifiers use training data and are evaluated on testing data. Target: a binary variable, 0/1 for non-click or click. Feature: any variable used to help predict the target.

  4. A brief look at sample data. Each row represents a particular outcome, click or no click, for a given user and a given ad. Filtering for columns can be done through .isin(): df.loc[:, df.columns.isin(['device'])]. Assuming y is a column of clicks, CTR can be found by: y.sum()/len(y).
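A minimal runnable sketch of the column filtering and CTR computation above, using a small hypothetical DataFrame in place of the course dataset:

```python
import pandas as pd

# Hypothetical impression-level data: one row per ad shown to a user
df = pd.DataFrame({
    'device': ['mobile', 'desktop', 'mobile', 'mobile'],
    'click':  [1, 0, 0, 1],
})

# Keep only the columns named in the list, via a boolean column mask
device_only = df.loc[:, df.columns.isin(['device'])]
print(device_only.columns.tolist())  # ['device']

# CTR = total clicks / number of impressions
y = df['click']
ctr = y.sum() / len(y)
print(ctr)  # 0.5
```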

  5. Analyzing features.
     print(df.device_type.value_counts())
     1    45902
     0     2947
     print(df.groupby('device_type')['click'].sum())
     0      633
     1     7890
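The two calls on the slide can be reproduced on a small hypothetical frame (the counts printed on the slide come from the course dataset; the data below is made up):

```python
import pandas as pd

# Hypothetical data with the same columns as the slide
df = pd.DataFrame({
    'device_type': [1, 1, 0, 1, 0, 1],
    'click':       [1, 0, 0, 1, 1, 0],
})

# Number of impressions per device type
counts = df['device_type'].value_counts()
print(counts.to_dict())            # {1: 4, 0: 2}

# Number of clicks per device type
clicks_by_device = df.groupby('device_type')['click'].sum()
print(clicks_by_device.to_dict())  # {0: 1, 1: 2}
```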

  6. Let's practice!

  7. Overview of machine learning models.

  8. Logistic regression. Logistic regression: a linear classifier relating the independent variables (features) to a binary dependent variable (the target).

  9. Training the model. Create the model via: clf = LogisticRegression(). Each classifier has a fit() method which takes in an X_train and y_train: clf.fit(X_train, y_train). X_train is the matrix of training features; y_train is the vector of training targets. The classifier should only see training data, to avoid "seeing the answers beforehand".
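A runnable sketch of the fit step, with a tiny synthetic training set standing in for the course data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic training set: one feature, binary click target
X_train = np.array([[0.1], [0.2], [0.8], [0.9]])
y_train = np.array([0, 0, 1, 1])

clf = LogisticRegression()
clf.fit(X_train, y_train)  # the classifier sees only training data
print(clf.classes_)  # [0 1]
```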

  10. Testing the model. Each classifier has a predict() method which takes in an X_test to generate predicted labels, as follows: array([0, 1, 1, ..., 1, 0, 1]). The predict_proba() method produces probability scores: array([[0.2, 0.8], [0.4, 0.6], ..., [0.1, 0.9], [0.3, 0.7]]). Each score reflects the probability of a particular ad being clicked by a particular user.
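predict() and predict_proba() in minimal runnable form (the training and test data are synthetic assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([[0.0], [0.2], [0.8], [1.0]])
y_train = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X_train, y_train)

X_test = np.array([[0.1], [0.9]])
y_pred = clf.predict(X_test)       # hard 0/1 click labels
proba = clf.predict_proba(X_test)  # one [P(no click), P(click)] row per ad
print(proba.shape)  # (2, 2)
```

Each row of the probability matrix sums to 1, one column per class.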

  11. Evaluating the model. Accuracy: the percentage of test targets correctly identified: accuracy_score(y_test, y_pred). It should not be the only metric used to evaluate a model, particularly on imbalanced datasets. CTR prediction is an example where the classes are imbalanced.
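The imbalance caveat can be made concrete: on a hypothetical test set where only 2 of 10 impressions are clicks, a model that never predicts a click still scores 80% accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical imbalanced test set: 8 non-clicks, 2 clicks
y_test = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.zeros(10, dtype=int)  # always predict "no click"

acc = accuracy_score(y_test, y_pred)
print(acc)  # 0.8
```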

  12. Let's practice!

  13. CTR prediction using decision trees.

  14. Decision trees. Sample outcomes are shown in the table below:

      age          | is_student | loan
      middle_aged  | -          | 1
      youth        | no         | 0
      youth        | yes        | 1

      The first split is based on the age of the applicant. For the youth group, the second split is based on student status. The model provides heuristics for understanding. Nodes represent the features; branches represent the decisions based on those features.

  15. Training and testing the model. Create via: clf = DecisionTreeClassifier(). Similar to logistic regression, a decision tree also involves clf.fit(X_train, y_train) for training data and clf.predict(X_test) for testing labels: array([0, 1, 1, ..., 1, 0, 1]). clf.predict_proba(X_test) gives probability scores: array([[0.2, 0.8], [0.4, 0.6], ..., [0.1, 0.9], [0.3, 0.7]]). Example of randomly splitting training and testing data, where testing data is 30% of the total sample size: train_test_split(X, y, test_size=0.3, random_state=0).
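A runnable sketch of the full split/fit/predict flow described above, on synthetic features and labels (an assumption, not the course dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic features and click labels
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = (X[:, 0] > 0.5).astype(int)

# Hold out 30% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)         # 0/1 labels
y_score = clf.predict_proba(X_test)  # probability scores per class
print(len(X_test))  # 30
```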

  16. Evaluation with the ROC curve. True positive rate (Y-axis) = #(classifier predicts positive, actually positive) / #(positives). False positive rate (X-axis) = #(classifier predicts positive, actually negative) / #(negatives). Dotted blue line: baseline AUC of 0.5. We want the orange line (the model's AUC) to be as close to 1 as possible.
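The two rate definitions can be computed directly from counts; the labels below are hypothetical:

```python
import numpy as np

# Hypothetical predictions vs. actual labels
y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])

# TPR = #(predicted positive, actually positive) / #(positives)
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
tpr = tp / int(np.sum(y_true == 1))

# FPR = #(predicted positive, actually negative) / #(negatives)
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fpr = fp / int(np.sum(y_true == 0))
print(tpr, fpr)  # 2/3 and 1/3
```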

  17. AUC of the ROC curve. Y_score = clf.predict_proba(X_test); then fpr, tpr, thresholds = roc_curve(Y_test, Y_score[:, 1]). roc_curve() inputs: the test and score arrays. Then roc_auc = auc(fpr, tpr). auc() inputs: the false-positive and true-positive rate arrays. If the model is accurate but CTR is low, you may want to reassess how the ad message is relayed and what audience it is targeted at.
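The roc_curve()/auc() calls above, wired together on a tiny hypothetical label vector and predict_proba-style score matrix:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Hypothetical test labels and probability scores
y_test = np.array([0, 0, 1, 1])
y_score = np.array([[0.90, 0.10],
                    [0.60, 0.40],
                    [0.65, 0.35],
                    [0.20, 0.80]])

# roc_curve takes the test labels and the positive-class scores
fpr, tpr, thresholds = roc_curve(y_test, y_score[:, 1])
roc_auc = auc(fpr, tpr)
print(roc_auc)  # 0.75
```

Here 3 of the 4 positive/negative score pairs are ranked correctly, which is what AUC = 0.75 measures.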

  18. Let's practice!
