Real-Time in the Real World: Building a State-of-the-Art Real-Time Analytics Platform
INFORMS Analytics Conference, April 14th, 2015
Joe DeCosmo, Chief Analytics Officer, Enova International
We are a growing global online lender
• Founded and HQ’d in Chicago since 2004
• 1,000+ employees – 500 corporate employees
• Scalable and flexible technology platform
• Proprietary analytics and data
• Publicly traded on NYSE (ENVA) since Nov. 13th, 2014
Over $13 billion in credit extended to 3MM+ customers around the world
Our Analytics Team …
• 51 employees, six teams
• 24 new hires in the last 12 months, 8 interns
• Mix of college hires and experienced analysts
CAO Portfolio Marketing Research and Data Services BI Fraud Analytics Analytics Platforms
Our teams are responsible for protecting and growing the profitability of the company through data-driven decision-making
Technologies we use and key challenges we face every day
Technologies: SAS, R, Python, Matlab, Wolfram/Mathematica, Microstrategy, SQL, Pentaho, Hadoop
Key challenges:
o Model accuracy
o High volume, low latency → 30K to 75K decisions per hour, < 1 second response
o Multi-national models and data
o What about growth?
Real-time analytics is critical for a fast and easy online customer experience
1. Apply
• Easy-to-complete: identity, employment, income, payroll date, bank account information
• Multi-stage Screening: system to verify identity and prevent fraud
2. Underwrite
• Decisioning: in 3 to 6 seconds, analytics system pulls data, determines creditworthiness, and presents an offer
• Advanced Analytics: massive parallel processing of 100 algorithms, 1,000 variables, 10 years and 9 TB of customer behavior data
3. Accept & Fund
• Accept: agreements reviewed and signed online
• Funding: via ACH by next business day in U.S., within 10 minutes to debit card in U.K.
4. Service
• Multi-Channel Service: U.S.-based in-house service center 24/7 for assistance and payment
• Proprietary Systems: tailored CRM integrated with analytics engine and marketing channels
Implementing models in real-time can be challenging
• Modeler builds model offline in R, SAS, Python, etc.
• Modeler translates model for engineering
• Specifications are routed to software engineer (via a project manager)
• Software engineer attempts to implement the specifications in the production system in scalable fashion
• Software engineer and modeler compare results of test records and debug
This process could take weeks or even months for a complex model!!
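The last step, comparing results of test records, is the part most easily automated. A minimal sketch, assuming both the offline model and the production system can export scores keyed by a record ID; the function name `reconcile`, the record IDs, and the tolerance are hypothetical and not from the talk:

```python
import math

def reconcile(offline_scores, production_scores, tol=1e-9):
    """Return (record_id, offline, production) for every record that disagrees."""
    mismatches = []
    for record_id, expected in offline_scores.items():
        actual = production_scores.get(record_id)  # None when production has no score
        if actual is None or not math.isclose(expected, actual, rel_tol=tol, abs_tol=tol):
            mismatches.append((record_id, expected, actual))
    return mismatches

# Hypothetical test records scored by the modeler (offline) and by production.
offline = {"r1": 0.1234, "r2": 0.9, "r3": 0.5}
production = {"r1": 0.1234, "r2": 0.39}

mismatches = reconcile(offline, production)
for row in mismatches:
    print(row)
```

Records that differ or are missing in production are listed for debugging; everything else is treated as reconciled.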
Even simple instructions can lead to errors …
Instruction: “Take the log of the variable X1”
Modeler’s intent: natural log of X1; if X1 < 0, use a specified default value.
Software engineer’s implementation: log base 10 of X1; if X1 < 0, the language throws an error.
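The gap between the two readings is easy to see in code. A minimal sketch in Python; the function names and the default value of 0.0 are illustrative assumptions, since the slide does not say which default the modeler specified:

```python
import math

def modeler_log(x1, default=0.0):
    """Modeler's intent: natural log, with a default for negative input.

    The default of 0.0 is a placeholder assumption. Note that x1 == 0 is not
    covered by the instruction as stated and would still raise ValueError.
    """
    if x1 < 0:
        return default
    return math.log(x1)

def engineer_log(x1):
    """Engineer's reading: log base 10, no special handling."""
    return math.log10(x1)  # raises ValueError for negative x1

print(modeler_log(100.0))   # natural log of 100
print(engineer_log(100.0))  # base-10 log of 100
print(modeler_log(-5.0))    # falls back to the default
try:
    engineer_log(-5.0)
except ValueError as exc:
    print("error:", exc)
```

Both functions satisfy "take the log of X1", yet they disagree on every positive input and behave differently on negative input.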
Our old system
● Home-grown system called MEF (Mathematical Equations Framework)
● Used proprietary domain-specific language to specify models
● Built in-house, written in C

Variable Name   Group   Operation   Coefficient   Default
A               alpha   mult        2.5           1.0
B               alpha   mult        1.7           4
Intercept       alpha   mult        4.3           4.3
alpha           agg     sum         1             1

Result = A * 2.5 + B * 1.7 + 4.3
If A is null, replace with 1.0
If B is null, replace with 4
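The example result on this slide can be reproduced with a few lines of Python. This is only a sketch of the arithmetic shown (coefficients, null defaults, and the 4.3 constant); it is not MEF's actual language or implementation, and the names `TERMS`, `INTERCEPT`, and `evaluate` are hypothetical:

```python
# (variable, coefficient, default) for the A and B rows of the slide's table.
TERMS = [("A", 2.5, 1.0), ("B", 1.7, 4.0)]
INTERCEPT = 4.3  # assumption: the Intercept row contributes a constant 4.3

def evaluate(inputs):
    """Compute A * 2.5 + B * 1.7 + 4.3, replacing null inputs with defaults."""
    total = INTERCEPT
    for name, coefficient, default in TERMS:
        value = inputs.get(name)
        if value is None:  # missing or null: use the row's default
            value = default
        total += value * coefficient
    return total

print(evaluate({"A": 2.0, "B": 1.0}))  # both inputs present
print(evaluate({"A": None}))           # A null, B missing: both defaulted
```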
MEF pros and cons
Pros
• Tightly integrated into production system
• Developed and maintained in house
• It worked!!!
Cons
• Tightly integrated into production system
• Limited model types
• Slow model deployment and tedious reconciliation
• Only a few people really knew how it worked!!
So what did we do?
We built a brand new platform … code name COLOSSUS!
Beginning in Q3 2013, we set out to build a new real-time platform from the ground up …
– Support for a wider variety of models, data sources, and variables
– Ability to execute scoring code from multiple platforms
– Greatly speed up the time from modeling to deployment
– Analysts can deploy models without software engineering
– Can be called by any Enova or external service and return a result
– Can scale up to 500,000 model evaluations per hour
We moved quickly and took a phased approach
Q3 2013
• Initial Requirements
• RFP
• Vendor List
Q4 2013
• Paid Pilot
• Final Selection
Q1 2014
• Design and build
Q2 2014
• Complete build
• Implementation
• Training
Q3 2014
• GO LIVE!!!
We evaluated a variety of vendors and platforms
● Statistical Computing Software Vendors
● Scientific Computing Software Vendors
● Open Source Software with Paid Consultants
We took two finalists into a paid pilot project and made our final decision based on the results of the pilot
What did we build?
Application 1, Application 2, … Application n → JSON request → COLOSSUS → JSON response
Stages inside COLOSSUS:
• EnovaModelEvaluate: request translated into readable key-value rules
• EnovaConditionalModelSelector: model selection based on raw key-value rules
• DefaultVariableCheckAndFill: variable exception checks are run, clean key-value rules are prepared
• EnovaModelApply: model is applied to clean key-value rules
Results, including detailed variable handling information, are converted back to JSON
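The request flow can be sketched end to end in a few functions. This is an illustration of the stages named on the slide, not COLOSSUS code: the model registry, the `market` selection key, the linear model form, and all function names are hypothetical assumptions.

```python
import json

# Hypothetical model registry; real models and selection rules are not in the talk.
MODELS = {
    "us": {"intercept": 1.0,
           "coefficients": {"x1": 2.0, "x2": 0.5},
           "defaults": {"x1": 1.0, "x2": 0.0}},
    "uk": {"intercept": 0.0,
           "coefficients": {"x1": 1.0},
           "defaults": {"x1": 0.0}},
}

def select_model(raw):
    """Model selection on the raw key-value pairs (assumed key: 'market')."""
    return raw.get("market", "us")

def check_and_fill(raw, model):
    """Run variable checks; return clean values plus handling information."""
    clean, handling = {}, {}
    for name, default in model["defaults"].items():
        value = raw.get(name)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            clean[name], handling[name] = float(value), "provided"
        else:  # missing, null, or non-numeric: fall back to the default
            clean[name], handling[name] = default, "defaulted"
    return clean, handling

def apply_model(clean, model):
    """Apply a linear model to the clean key-value pairs."""
    return model["intercept"] + sum(
        coef * clean[name] for name, coef in model["coefficients"].items())

def evaluate(request_json):
    """JSON request in, JSON response out."""
    raw = json.loads(request_json)
    model_name = select_model(raw)
    model = MODELS[model_name]
    clean, handling = check_and_fill(raw, model)
    score = apply_model(clean, model)
    return json.dumps({"model": model_name, "score": score, "variables": handling})

response = evaluate('{"market": "us", "x1": 3, "x2": null}')
print(response)
```

Returning the per-variable handling alongside the score mirrors the slide's point that the response carries detailed variable handling information, which makes defaulted inputs visible to the caller.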
Where are we now?
● Currently have 27 models in production
● Cut average model deployment time in half
● Processed 2.7 million model requests in first two months of 2015
● Average model response time of 0.07 seconds
We’re rolling out to all of our lines of business now and improving the variable fetching and storage to drive response time even lower
Lessons Learned
We learned a lot of lessons (some the hard way!) …
1. Meet the partners’ implementation teams
2. Invest in a paid pilot
3. Provide live use cases, example data, and unit tests at the very beginning
4. Define and enforce common architecture, vocabulary, and documentation
5. Plan for repairs
Joe DeCosmo Chief Analytics Officer jdecosmo@enova.com 200 West Jackson Blvd. Chicago, IL 60606 Tel 312.800.4390