Transferring Deep Learning into a Recommender System
By Christophe Duong, Data Scientist
Big Data Paris 2017
• High-end products -> appearance is crucial
• Fewer recurrent buyers -> web visit patterns are essential
  -> Short visit sessions (browsing the first 4-10 sales)
• Mostly flash sales, i.e. volatile, ephemeral sales
  -> Unlike Amazon / Price Minister / Cdiscount
• Challenging context for standard recommender systems
  -> Sales are seasonal (Christmas, ski, summer)
A data science workflow: six steps to a predictive model
[Diagram: Business Understanding -> Data Acquisition -> Data Exploration & Understanding -> Data Preparation -> Model Creation -> Evaluation -> Deployment, repeated over iterations 1 to n on datasets 1 to n, each iteration producing a scored dataset]
Creating a predictive model is a highly iterative process. Data Science Studio enables its users to create and manage these projects from end to end. This process is not industry-specific and can be applied to many use cases. Adapted from the CRISP-DM methodology.
Basic Recommendation Engines
Other Factors
One Meta Model to Rule Them All
Describe -> Recommend -> Combine
• Recommenders as features
• Machine learning to optimize purchasing probability
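The slide describes stacking: each basic recommendation engine produces a score for a (customer, sale) pair, and a second-level model learns how to weight those scores to predict a purchase. The sketch below assumes logistic regression as the meta model and three hypothetical recommender scores per pair. The talk does not say which learner was used, and the training data here is invented for illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_meta_model(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression (weights + bias) with batch gradient descent.

    features: one row per (customer, sale) pair, each row holding the scores
              produced by the individual recommenders (hypothetical here).
    labels:   1 if the customer bought the sale, else 0.
    """
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def purchase_probability(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical scores from three recommenders, e.g.
# [popularity, collaborative filtering, image similarity].
X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.6, 0.9],
     [0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.3, 0.2, 0.2]]
y = [1, 1, 1, 0, 0, 0]
model = train_meta_model(X, y)
print(purchase_probability(model, [0.85, 0.8, 0.8]))
print(purchase_probability(model, [0.15, 0.15, 0.15]))
```

Because the meta model sees every engine's score at once, it learns how much to trust each engine from purchase data instead of relying on hand-tuned weights.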
Recommender System for Home Page Ordering: +7% revenue (A/B testing)
Optimization of the home page display: every customer is shown the 10 sales he is the most likely to buy.
• Input data: customer visits, purchases, sales information, sales images
• Cleaning, combining and enrichment of data: the application automatically runs and compiles heterogeneous data
• Recommendation engines: generation of recommendations based on user behaviour
• Meta model: combines the recommendations to directly optimize purchasing probability
• Batch scoring every night
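Once the meta model has scored every (customer, sale) pair, building each customer's home page reduces to keeping the k highest-probability sales (k = 10 on the slide). A minimal sketch of that nightly selection step, assuming the scores arrive as hypothetical (customer_id, sale_id, probability) triples:

```python
import heapq
from collections import defaultdict

def top_sales_per_customer(scores, k=10):
    """Return {customer_id: [sale_id, ...]} with the k best sales per customer.

    scores: iterable of (customer_id, sale_id, purchase_probability) triples,
            e.g. the output of a nightly batch scoring job.
    """
    candidates = defaultdict(list)
    for customer_id, sale_id, prob in scores:
        candidates[customer_id].append((prob, sale_id))
    # heapq.nlargest avoids fully sorting each customer's candidate list.
    return {
        customer_id: [sale_id for _, sale_id in heapq.nlargest(k, pairs)]
        for customer_id, pairs in candidates.items()
    }

nightly_scores = [
    ("alice", "s1", 0.2), ("alice", "s2", 0.9), ("alice", "s3", 0.5),
    ("bob", "s1", 0.7), ("bob", "s2", 0.1),
]
print(top_sales_per_customer(nightly_scores, k=2))
# {'alice': ['s2', 's3'], 'bob': ['s1', 's2']}
```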
Integrating Image Information (content-based)
[Diagram: sales images go through a labelling model that tags each one, e.g. "Sea + Beach + Forest + Hotel", "Pool + Palm Trees", "Hotel + Mountains", "Pool + Forest + Hotel + Sea"; the tags and the sales descriptions feed the recommender system]
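Once every sale carries image labels such as "Pool + Palm Trees", a simple content-based recommender can rank sales by label overlap. The sketch below uses Jaccard similarity on the tag sets shown in the diagram. The sale identifiers are hypothetical, and the talk does not state which similarity measure the production system used.

```python
def jaccard(tags_a, tags_b):
    """Size of the intersection divided by size of the union of two tag sets."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_sales(target, catalog, k=3):
    """Rank the other sales by tag overlap with the target sale."""
    target_tags = catalog[target]
    ranked = sorted(
        ((jaccard(target_tags, tags), sale) for sale, tags in catalog.items() if sale != target),
        key=lambda pair: (-pair[0], pair[1]),  # highest similarity first, ties by id
    )
    return [sale for _, sale in ranked[:k]]

# Hypothetical sale ids with the tag sets from the diagram.
catalog = {
    "sale_a": {"sea", "beach", "forest", "hotel"},
    "sale_b": {"pool", "palm trees"},
    "sale_c": {"hotel", "mountains"},
    "sale_d": {"pool", "forest", "hotel", "sea"},
}
print(similar_sales("sale_a", catalog, k=2))  # ['sale_d', 'sale_c']
```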
Using Deep Learning Models: Common Issues
• “I don’t have GPU servers”
• “I don’t have a deep learning expert”
• “I don’t have labelled data (or too few)”
• “I don’t have the time to wait for model training”
• “I don’t want to pay for private APIs” / “I’m afraid their labelling will change over time”
Solution 1: Pre-trained Models
“I don’t have labelled data (or too few)” -> Is there similar data?
Datasets compared: VPG; Places database (205 categories, 2.5 M images); SUN database (307 categories, 110 K images)
Solution 1: Pre-trained Models
If there is open data, there is an open pre-trained model!
• Kudos to the community
• Check the licensing
Example with the Places model (Caffe Model Zoo):
• Image 1: swimming_pool/outdoor: 0.65, inn/outdoor: 0.06
• Image 2: tower: 0.53, skyscraper: 0.26
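A scene classifier such as the Places model ends with a softmax over its categories, and the scores on the slide are those per-category probabilities. Keeping only the most probable categories turns them into tags for each sale image. A minimal sketch, with hypothetical logits and a three-category label list standing in for the real model's output:

```python
import math

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_labels(logits, categories, k=2, min_prob=0.05):
    """Return up to k (category, probability) pairs with probability >= min_prob."""
    probs = softmax(logits)
    ranked = sorted(zip(categories, probs), key=lambda pair: pair[1], reverse=True)
    return [(c, p) for c, p in ranked[:k] if p >= min_prob]

categories = ["swimming_pool/outdoor", "inn/outdoor", "tower"]
logits = [3.0, 0.5, -1.0]  # hypothetical raw network outputs for one image
for label, prob in top_labels(logits, categories):
    print(f"{label}: {prob:.2f}")
```

The `k` and `min_prob` cut-offs are illustrative choices; they trade tag coverage against noisy low-probability labels.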
Solution 2: Transfer Learning
“I want to add the information of the SUN database”
“But I have only 100 K images”
If you know how to recognize… after a little bit of training… you will be able to recognize…
Transfer learning: use a network that already knows how to see
• As a feature generator / transformer
• To be updated for the new problem
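The "feature generator" option can be sketched in a few lines: the pre-trained network is kept frozen, and only a small new classifier on top of its features is trained for the new labels. Below, `frozen_features` is a hypothetical stand-in for the pre-trained convolutional layers (a fixed ReLU layer with made-up weights), the "images" are toy vectors, and the new head is a single softmax layer trained by gradient descent. The second option on the slide, updating the pre-trained weights themselves, is not shown.

```python
import math

# Hypothetical "pre-trained" weights: training below never modifies them.
PRETRAINED_W = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0]]

def frozen_features(image):
    """Stand-in for a pre-trained network's feature layers: fixed linear map + ReLU."""
    return [max(0.0, sum(w * px for w, px in zip(row, image))) for row in PRETRAINED_W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_head(features, labels, n_classes, lr=0.5, epochs=300):
    """Train only the new softmax layer (weights W, biases b) on frozen features."""
    d, n = len(features[0]), len(features)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, y in zip(features, labels):
            probs = softmax([sum(w * xi for w, xi in zip(W[c], x)) + b[c]
                             for c in range(n_classes)])
            for c in range(n_classes):
                # Cross-entropy gradient w.r.t. class c's logit: p_c - 1[c == y].
                err = probs[c] - (1.0 if c == y else 0.0)
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * x[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(d):
                W[c][j] -= lr * gW[c][j] / n
    return W, b

def predict(head, x):
    W, b = head
    scores = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]
    return scores.index(max(scores))

images = [[1.0, 0.2, 0.0, 0.1], [0.9, 0.1, 0.1, 0.0],   # class 0
          [0.1, 0.0, 0.8, 0.9], [0.0, 0.2, 0.9, 1.0]]   # class 1
labels = [0, 0, 1, 1]
feats = [frozen_features(img) for img in images]
head = train_head(feats, labels, n_classes=2)
print([predict(head, f) for f in feats])
```

Only the small head is trained, which is why this route needs far fewer labelled images and much less compute than training the whole network.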
Solution 2: Transfer Learning
Leverage existing knowledge!
[Diagram: Voyage Privé images, Places database, SUN database]
• Pre-trained model: VGG16 trained on the Places database, from the Caffe Model Zoo (e.g. tower: 0.53, skyscraper: 0.26)
• Transferred data: last convolutional layer features
• Re-trained model: 2 fully connected layers, re-trained in TensorFlow
• Other diagram labels: Training, CPU, GPU, (optional)
Result: accuracy 72%, top-5 accuracy 90%, above the state of the art obtained on the dataset alone.
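The slide reports both accuracy and top-5 accuracy. Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes, so plain accuracy is the k = 1 case. A minimal sketch with made-up probabilities for three images and three classes:

```python
def top_k_accuracy(prob_rows, true_labels, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for probs, label in zip(prob_rows, true_labels):
        top = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
        hits += label in top
    return hits / len(true_labels)

# Hypothetical classifier outputs (one row per image) and true class indices.
probs = [[0.1, 0.6, 0.3],
         [0.5, 0.2, 0.3],
         [0.2, 0.3, 0.5]]
truth = [1, 2, 0]
print(top_k_accuracy(probs, truth, k=1))
print(top_k_accuracy(probs, truth, k=2))
```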
Resulting Labels
Issue with our approach: redundant information / complementary information
Solution: matrix factorization
200x200 pixels -> 600 tags => 30 themes
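The slide names matrix factorization as the fix for redundant tags but not the specific method. One common choice for non-negative scores is non-negative matrix factorization (NMF): the image × tag matrix V is approximated as W H, where each row of W gives an image's weight on k themes and each row of H describes a theme as a mix of tags (600 tags and k = 30 on the slide). The sketch below assumes NMF with Lee and Seung's multiplicative updates for the squared error, applied to a toy 4 × 6 matrix; the tag names and values are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x k) and h (k x m) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)       # H <- H * (W^T V) / (W^T W H)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))       # W <- W * (V H^T) / (W H H^T)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

TAGS = ["pool", "palm_trees", "sea", "beach", "tower", "skyscraper"]  # hypothetical
V = [
    [0.6, 0.4, 0.5, 0.5, 0.0, 0.0],        # resort-like image
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.3],        # city-like image
    [0.3, 0.2, 0.25, 0.25, 0.25, 0.15],    # half and half
    [0.48, 0.32, 0.4, 0.4, 0.1, 0.06],     # mostly resort
]
w0, h0 = nmf(V, k=2, iterations=0)   # the random starting point
w, h = nmf(V, k=2, iterations=500)
print(frobenius_error(V, w0, h0), frobenius_error(V, w, h))
```

The multiplicative updates keep every entry non-negative, which is what lets each theme be read as an additive bundle of tags.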
Image Content Detection
Topic scores determine the importance of each topic in an image.

Example 1
  Topic                                            Topic score (%)
  Golf course – Fairway – Putting green            31
  Hotel – Inn – Apartment building outdoor         30
  Swimming pool – Lido deck – Hot tub outdoor      22
  Beach – Coast – Harbor                           17

Example 2
  Topic                                            Topic score (%)
  Tower – Skyscraper – Office building             62
  Bridge – River – Viaduct                         38
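Both examples on the slide sum to 100%, which suggests each topic score is an image's theme weight divided by the total weight of the themes shown. Under that assumption (the slide does not give the formula), the sketch below drops negligible themes and rescales the rest to percentages. The weights are made up, and the theme names are taken from the slide.

```python
def topic_scores(theme_weights, min_share=5.0):
    """Turn raw theme weights for one image into percentage scores.

    theme_weights: {theme_name: non-negative weight}, e.g. one row of the
                   image-by-theme matrix produced by a factorization.
    min_share:     themes below this percentage of the total are dropped,
                   and the remaining themes are rescaled to sum to 100.
    """
    total = sum(theme_weights.values())
    if total == 0:
        return []
    kept = {t: w for t, w in theme_weights.items() if 100.0 * w / total >= min_share}
    kept_total = sum(kept.values())
    scores = [(t, 100.0 * w / kept_total) for t, w in kept.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

weights = {  # hypothetical weights for one city image
    "Tower - Skyscraper - Office building": 0.45,
    "Bridge - River - Viaduct": 0.30,
    "Beach - Coast - Harbor": 0.01,
}
for theme, score in topic_scores(weights):
    print(f"{theme}: {score:.0f}%")
```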
Results?
• All visits: mostly France; pools displayed
• First recommendation: fails to display pools
• Only images?: pools all around the world
• Third column = mix
What’s Next?
Kenya, Prague, Berlin, Cambodia
Conclusion
• Deep learning?
  • Is there existing data?
  • Is there a pre-trained model?
• Transfer learning
  • Cheaper, faster
  • Any data scientist can do it
• Do agile data science!
  • Start simple
  • Validate each step
  • Iterate and grow
Recommend
More recommend