Churn Prediction using Dynamic RFM-Augmented node2vec
Sandra Mitrović, Jochen de Weerdt, Bart Baesens & Wilfried Lemahieu
Department of Decision Sciences and Information Management, KU Leuven
18 September 2017, DyNo Workshop, ECML 2017, Skopje, Macedonia
Outline
• Introduction
• Motivation
• Methodology
• Experimental evaluation
• Results
• Conclusion
• Future work
Introduction
Churn prediction (CP)
• Predict which customers are going to leave the company's services
  o Still considered a topmost challenge for telcos (FCC report, 2009)
  o Due to the acquisition/retention cost imbalance
• Different types of data used for CP
  o Subscription, socio-demographic, customer complaints, etc.
  o More recently: Call Detail Records (CDRs)
• CDRs -> call graphs
Call graph featurization
Extracting informative features from (call) graphs
• An intricate process, due to:
  o Complex structure / different types of information
    • Topology-based (structural)
    • Interaction-based (as part of customer behavior)
    • Edge weights quantifying customer behavior
  o Dynamic aspect
    • Call graphs are time-evolving
    • Both nodes and edges are volatile
    • Churn = lack of activity
Motivation
Problems identified (w.r.t. current literature)
• Not many studies account for dynamic aspects of call networks
  o Especially not jointly with interaction and structural features
    • Structural features are under-exploited, due to high computational time in large graphs (e.g. betweenness centrality)
  o And without using ad-hoc handcrafted features
• No featurization methodology
  o Dataset dependent
Our goal
• Performing a holistic featurization of call graphs
• Incorporating both interaction and structural information
• Avoiding/reducing feature handcrafting
• While also capturing the dynamic aspect of the network
Methodology
How do we address these goals?
• G1: Incorporating both interaction and structural information -> devise different operationalizations of RFM features and novel RFM-augmented call graph architectures
• G2: Avoiding/reducing feature handcrafting -> opt for representation learning
• G3: Capturing the dynamic aspect of the network -> slice the original network into weekly snapshots
Integrating interaction and structural information
Interactions (current literature)
• Usually delineated with RFM (Recency, Frequency, Monetary) variables
• Benefits:
  o Simple
  o Yet still with good predictive power
  o Many different operationalizations
    • Different dimensions
    • Different granularities
Interactions (this work)
• Summary RFM (RFM_s)
• Detailed RFM (RFM_d)
  o Sliced by direction & destination: X_out_h, X_out_o, X_in, with X ∈ {R, F, M}
• Churn RFM (RFM_ch)
  o Only w.r.t. churners
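The sketch below illustrates how such RFM variables could be derived from raw CDRs. The column names (caller, timestamp, duration, direction, destination) and the use of total call duration as the monetary proxy are assumptions for illustration, not the authors' exact operationalization.

```python
import pandas as pd

def rfm_summary(cdr: pd.DataFrame, snapshot_end: pd.Timestamp) -> pd.DataFrame:
    """Summary RFM (RFM_s) per customer over one snapshot."""
    grouped = cdr.groupby("caller")
    return pd.DataFrame({
        # Recency: days since the customer's last call in the snapshot
        "R": (snapshot_end - grouped["timestamp"].max()).dt.days,
        # Frequency: number of calls in the snapshot
        "F": grouped.size(),
        # Monetary: total call duration, used here as a proxy for spend
        "M": grouped["duration"].sum(),
    })

def rfm_detailed(cdr: pd.DataFrame, snapshot_end: pd.Timestamp) -> pd.DataFrame:
    """Detailed RFM (RFM_d): R, F, M sliced by direction and destination."""
    slices = {
        "out_h": (cdr.direction == "out") & (cdr.destination == "home_operator"),
        "out_o": (cdr.direction == "out") & (cdr.destination == "other_operator"),
        "in":    (cdr.direction == "in"),
    }
    parts = [rfm_summary(cdr[mask], snapshot_end).add_suffix(f"_{name}")
             for name, mask in slices.items()]
    return pd.concat(parts, axis=1)  # X_out_h, X_out_o, X_in for X in {R, F, M}
```

The churn variant RFM_ch would follow the same pattern, restricting the CDRs to calls involving known churners.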
RFM-Augmented networks
• Original topology extended
  o By introducing artificial nodes based on RFM
  o Structural information partially preserved
• Each of R, F, M partitioned into 5 quantiles
  o One artificial node assigned to each quantile
  o Interaction info embedded through the extended topology
• Network topology + RFM features -> 4 augmented networks:
  o RFM_s -> AG_s
  o RFM_s || RFM_ch -> AG_s+ch
  o RFM_d -> AG_d
  o RFM_d || RFM_ch -> AG_d+ch
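A minimal sketch of this augmentation, assuming a networkx call graph and the RFM table from the previous sketch: each customer is linked to one artificial node per RFM variable, namely the node for the quintile the customer falls into. The slides do not specify how these artificial edges are weighted, so they are left unweighted here.

```python
import networkx as nx
import pandas as pd

def augment_with_rfm(call_graph: nx.Graph, rfm: pd.DataFrame) -> nx.Graph:
    """Extend the call graph with one artificial node per RFM quintile."""
    ag = call_graph.copy()
    for var in rfm.columns:                      # e.g. "R", "F", "M" (or the detailed slices)
        # Assign each customer to one of 5 quantile bins
        bins = pd.qcut(rfm[var], q=5, labels=False, duplicates="drop").dropna()
        for customer, q in bins.items():
            artificial = f"{var}_q{int(q)}"      # e.g. "F_q3" = 4th frequency quintile
            ag.add_edge(customer, artificial)    # link the customer to its quantile node
    return ag
```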
Representation learning: node2vec
• Idea: bring the representations of words from the same context C close (borrowed from SkipGram)
  o Learn f: V -> R^d, d << |V|, s.t. max_f Σ_{v ∈ V} log Pr(C_v | f(v))
• Definition of context in the graph setting?
  o Neighborhoods / random walks
  o Of which order? How to perform a walk?
• Flexible walks using additional parameters
  o Return parameter p
  o In-out parameter q
  o Coming from i and standing at j, the (unnormalized) probability to transition from j to k is:
    • w_jk / p, if d_ik = 0
    • w_jk,     if d_ik = 1
    • w_jk / q, if d_ik = 2
Figure source: Grover & Leskovec, 2016
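A small sketch of this second-order bias (following Grover & Leskovec, 2016; not the authors' code): the three branches below correspond exactly to the cases d_ik = 0, 1, 2.

```python
import random
import networkx as nx

def biased_step(graph: nx.Graph, prev, curr, p: float, q: float):
    """One node2vec step from `curr`, biased by the previously visited node `prev`."""
    neighbors = list(graph.neighbors(curr))
    weights = []
    for k in neighbors:
        w = graph[curr][k].get("weight", 1.0)
        if prev is None:               # first step of the walk: no bias yet
            weights.append(w)
        elif k == prev:                # d_ik = 0: step back to the previous node
            weights.append(w / p)
        elif graph.has_edge(prev, k):  # d_ik = 1: stay within the previous neighborhood
            weights.append(w)
        else:                          # d_ik = 2: move outward
            weights.append(w / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```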
Node2vec -> scalable node2vec
Node2vec
• Accounts for both the previous and the current node
• Additional parameters (p, q)
• To make walks efficient, requires precomputation of transition probabilities:
  o On node level (1st time)
  o On edge level (successive steps)
• Alias sampling used for efficient sampling
  o Reduces O(n) to O(1)
• However, does not scale well on large graphs (our case: ~40M edges)
Scalable node2vec
• Accounts only for the current node
• No additional parameters
• Requires precomputation of transition probabilities on node level only
  o Alias sampling retained
• Therefore, scales well even on large graphs!
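The sketch below illustrates why the first-order variant scales: only one alias table per node is precomputed (no edge-level tables), and each walk step is an O(1) alias draw. This is an illustrative reconstruction of the idea, not the authors' implementation.

```python
import numpy as np
import networkx as nx

def build_alias(probs):
    """Alias tables (Vose's method) for O(1) sampling from a discrete distribution."""
    n = len(probs)
    scaled = np.array(probs, dtype=float) * n
    alias = np.zeros(n, dtype=int)
    small = [i for i, x in enumerate(scaled) if x < 1.0]
    large = [i for i, x in enumerate(scaled) if x >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                      # overflow of cell l fills the remainder of cell s
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return scaled, alias                  # leftover cells are ~1.0 up to float error

def alias_draw(scaled, alias, rng):
    i = rng.integers(len(scaled))
    return i if rng.random() < scaled[i] else alias[i]

def precompute_node_tables(graph: nx.Graph):
    """One alias table per node -- the edge-level precomputation of node2vec is dropped."""
    tables = {}
    for v in graph.nodes():
        nbrs = list(graph.neighbors(v))
        if nbrs:
            w = np.array([graph[v][u].get("weight", 1.0) for u in nbrs], dtype=float)
            tables[v] = (nbrs, build_alias(w / w.sum()))
    return tables

def first_order_walk(start, length, tables, rng):
    """A weighted first-order random walk; the bias depends only on the current node."""
    walk = [start]
    while len(walk) < length and walk[-1] in tables:
        nbrs, (scaled, alias) = tables[walk[-1]]
        walk.append(nbrs[alias_draw(scaled, alias, rng)])
    return walk
```

The generated walks would then be fed to a SkipGram model (e.g. gensim's Word2Vec), exactly as in the original node2vec pipeline.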
Dynamic graphs
Different definitions (current literature)
• G = (V, E, T)
• G = (V, E, T, ΔT)
• G = (V, E, T, σ, ΔT)
Standard approach
• Consider several static snapshots of a dynamic graph
Our setting
• Monthly call graph G = (V, E) -> four temporal graphs G_i = (V_i, E_i, w_i), i = 1,...,4
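A minimal sketch of this slicing, assuming the same CDR columns as above and total call duration as the edge weight (the slides do not state the exact weighting):

```python
import networkx as nx
import pandas as pd

def weekly_snapshots(cdr: pd.DataFrame, month_start: pd.Timestamp):
    """Slice one month of CDRs into four weekly snapshot graphs G_i = (V_i, E_i, w_i)."""
    graphs = []
    for i in range(4):
        start = month_start + pd.Timedelta(weeks=i)
        end = start + pd.Timedelta(weeks=1)
        week = cdr[(cdr.timestamp >= start) & (cdr.timestamp < end)]
        g = nx.Graph()                                   # undirected, as in the experiments
        for (a, b), calls in week.groupby(["caller", "callee"]):
            g.add_edge(a, b, weight=calls["duration"].sum())
        graphs.append(g)
    return graphs
```

Each weekly snapshot can then be featurized separately (RFM, augmented graph, scalable node2vec), giving the dynamic feature variants that are compared against the static, whole-period ones.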
Methodology – Graphical overview
[figure]
Experimental Evaluation (1/2)
• Data
  o One prepaid, one postpaid dataset
  o 4 months of data (only CDRs)
  o Undirected networks
• Evaluation
  o AUC, lift (0.5%)
• Model
  o Logistic regression with L2 regularization (10-fold CV for hyperparameter tuning)
• Scalable node2vec parameters
  o # walks: 10
  o walk length: 30
  o context size: 10
  o # dimensions: 128
  o # iterations: 5
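A sketch of this evaluation step on top of the learned embeddings; `lift_at` is a hypothetical helper (the slides do not give the exact lift computation), and the feature matrices and labels are assumed to be NumPy arrays.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score

def lift_at(y_true: np.ndarray, y_score: np.ndarray, fraction: float = 0.005) -> float:
    """Lift in the top `fraction` (here 0.5%) of customers ranked by churn score."""
    n_top = max(1, int(len(y_true) * fraction))
    top = np.argsort(y_score)[::-1][:n_top]
    return y_true[top].mean() / y_true.mean()

def evaluate(X_train, y_train, X_test, y_test):
    # L2-regularized logistic regression, regularization strength tuned by 10-fold CV
    clf = LogisticRegressionCV(cv=10, penalty="l2", scoring="roc_auc", max_iter=1000)
    clf.fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores), lift_at(y_test, scores)
```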
Experimental Evaluation (2/2)
Research questions
• RQ1: Do features taking into account dynamic aspects perform better than static ones?
• RQ2: Do RFM-augmented network constructions improve predictive performance?
• RQ3: Does the granularity of interaction information (summary, summary+churn, detailed, detailed+churn) influence predictive performance?
Experiments
  o RFM_s stat. vs. RFM_s dyn. vs. AG_s stat. vs. AG_s dyn. -> summary
  o RFM_s+ch stat. vs. RFM_s+ch dyn. vs. AG_s+ch stat. vs. AG_s+ch dyn. -> summary+churn
  o RFM_d stat. vs. RFM_d dyn. vs. AG_d stat. vs. AG_d dyn. -> detailed
  o RFM_d+ch stat. vs. RFM_d+ch dyn. vs. AG_d+ch stat. vs. AG_d+ch dyn. -> detailed+churn
Experimental results (1/2): Prepaid
• RQ1 answer: dynamic better than static!
• RQ2 answer: RFM-augmented networks improve predictive performance
• RQ3 answer: best performing interaction granularity is summary+churn
  o Second best: detailed+churn
Experimental results (2/2): Postpaid
• RQ1 answer: dynamic better than static!
• RQ2 answer: RFM-augmented networks improve predictive performance
• RQ3 answer: best performing interaction granularity is summary+churn
  o Second best: summary
Conclusion
• We design RFM-augmentations of the original graphs
  o Enabling the conjoining of interaction and structural information
• We devise a scalable adaptation of the original node2vec approach
  o Relaxing random walk generation and avoiding grid-search tuning of two additional parameters
• Conducted experiments showcase the performance benefits of taking into account the dynamic aspect
  o As well as of exploiting RFM-augmented networks and learning node representations from them
• Novelty:
  o First work to use (dynamic) node representations of CDR graphs for churn prediction, and
  o First work to apply the RFM framework together with unsupervised and dynamic learning of node representations
Future research
• Attempt capturing call dynamics in a more sophisticated manner (e.g. the ordering of calls, their inter-event time distribution)
• Investigate the effect of different time granularities
• Explore whether prioritizing more recent dynamic networks improves performance
Thank you! Questions? Email: sandra.mitrovic@kuleuven.be