Machine Learning with Spark: The Road to Production




  1. Machine Learning with Spark: The Road to Production. Andrea Baita, André Bois-Crettez. Tech Week 2019, Grenoble

  2. Outline • Business case • Machine learning • Testing • Production • Lessons learned

  3. Business case

  4. Merchants (online stores), users and publishers (ad networks)

  5. Targets
      KelkooGroup and merchants • Automation • Attract more buyers • Increase margin • Sell more with less budget
      End users • See interesting products • Find the best offers

  6. Decisions to make • Where to show the offer (which site, which publisher) • How much to pay for it

  7. Problem: how many clicks will the offer get?

  8. Solution: machine learning

  9. Machine Learning

  10. How?

  11. Data + Data Scientist = ML model (prototype)

  12. Lots of data
      [Plot: actual number of clicks vs. click price; color = type of device]
      More features are used: time, category, merchant, … secret ones ...

  13. Learn first... Past data with past results = model. Example:
      date        categoryId  merchantId  category               device   price   clicks
      08/04/2019  10163969    1           Motorbike accessories  desktop  0.08    2
      08/04/2019  10163969    1           Motorbike accessories  mobile   0.0704  21
      08/04/2019  10163969    1           Motorbike accessories  tablet   0.18    22
      08/04/2019  10543669    2           Women's lingerie       desktop  0.23    10
      08/04/2019  10543669    2           Women's lingerie       mobile   0.0989  2
      08/04/2019  12676471    3           Prescription glasses   mobile   0.1204  1

  14. … then predict! Predict results for current data with the model. Example:
      date        categoryId  merchantId  category               device   price   predicted clicks
      11/04/2019  10163969    1           Motorbike accessories  desktop  0.09    3
      11/04/2019  10163969    1           Motorbike accessories  mobile   0.08    20
      11/04/2019  10163969    1           Motorbike accessories  tablet   0.19    23
      11/04/2019  10543669    2           Women's lingerie       desktop  0.24    11
      11/04/2019  10543669    2           Women's lingerie       mobile   0.10    1
      11/04/2019  12676471    3           Prescription glasses   mobile   0.13    2
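A deliberately naive, plain-Python sketch of the learn-then-predict idea of slides 13 and 14, reusing the rows of the slide 13 table. It is not the model from the talk (which ran on Spark with more, partly secret, features): here "training" just averages past clicks per (categoryId, device) pair, and "prediction" looks that average up, falling back to the global mean for pairs never seen. The function names and the fallback rule are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Past data from slide 13: (categoryId, device, price, clicks)
PAST = [
    (10163969, "desktop", 0.08, 2),
    (10163969, "mobile", 0.0704, 21),
    (10163969, "tablet", 0.18, 22),
    (10543669, "desktop", 0.23, 10),
    (10543669, "mobile", 0.0989, 2),
    (12676471, "mobile", 0.1204, 1),
]

def train(rows):
    """Learn (hypothetical baseline): average past clicks per (categoryId, device)."""
    by_key = defaultdict(list)
    for category_id, device, _price, clicks in rows:
        by_key[(category_id, device)].append(clicks)
    model = {key: mean(values) for key, values in by_key.items()}
    global_mean = mean(row[-1] for row in rows)
    return model, global_mean

def predict(trained, rows):
    """Predict: look up the learned average; unseen pairs get the global mean."""
    model, global_mean = trained
    return [model.get((category_id, device), global_mean)
            for category_id, device, _price in rows]

# Current data (prices from slide 14), without any clicks column
CURRENT = [
    (10163969, "desktop", 0.09),
    (10543669, "mobile", 0.10),
    (99999999, "tablet", 0.15),  # hypothetical category never seen in training
]

trained = train(PAST)
print(predict(trained, CURRENT))
```

With so few rows every pair has a single observation, so the baseline simply replays past values; a real model generalizes across features instead of memorizing keys.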

  15. How do we implement it?

  16. … + Scala developer = ML model (production ready)

  17. Spark?

  18. Unified analytics engine for large-scale data processing • Interactive exploration • Batch processing • SQL • Machine learning at scale • ...

  19. How do we use it?

  20. Architecture: Raw data (ad networks, users) → Preprocessing → Training → Prediction → Decision
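The four stages of the architecture can be read as a chain of functions, each consuming the previous stage's output. The sketch below is a plain-Python stand-in for what the talk runs as Spark jobs; the (device, clicks) event layout, the per-device average used as a "model", and the minimum-clicks decision rule are all hypothetical.

```python
from collections import defaultdict

def preprocessing(raw_events):
    """Keep well-formed (device, clicks) events and normalise the device label."""
    return [
        (device.lower(), clicks)
        for device, clicks in raw_events
        if device and clicks is not None and clicks >= 0
    ]

def training(events):
    """Toy model: average clicks observed per device."""
    sums, counts = defaultdict(int), defaultdict(int)
    for device, clicks in events:
        sums[device] += clicks
        counts[device] += 1
    return {device: sums[device] / counts[device] for device in sums}

def prediction(model, devices):
    """Predicted clicks per requested device (0.0 for a device never seen)."""
    return {device: model.get(device, 0.0) for device in devices}

def decision(predicted, min_clicks=5.0):
    """Hypothetical rule: show the offer only where enough clicks are expected."""
    return sorted(device for device, clicks in predicted.items() if clicks >= min_clicks)

# Hypothetical raw events; the last two are incomplete and get filtered out.
RAW = [("Desktop", 2), ("mobile", 21), ("tablet", 22), ("desktop", 10), ("mobile", None), ("", 3)]

model = training(preprocessing(RAW))
print(decision(prediction(model, ["desktop", "mobile", "tablet", "smart-tv"])))
# ['desktop', 'mobile', 'tablet']
```

Keeping each stage a separate function (or job) makes every intermediate output available for the tests and measurements discussed in the Testing and Production sections.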

  21. Data • Predict and learn • Decide • Model

  22. The model changes over time...

  23. … how can we deploy it?

  24. Model deployment approaches
      Train first, then deploy the model • Real-time predictions • Model training is expensive • Training data is stable
      Deploy the code, train as needed • Batch predictions • Quick model training • Training data evolves fast
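A minimal sketch of the second approach, "deploy the code, train as needed": the deployed batch job carries the training step itself and retrains on a sliding window of recent data at every run, so the model follows fast-evolving data. The 7-day window, the toy average model and the function names are assumptions; the first approach would instead load a model artifact trained beforehand, which is not shown.

```python
from datetime import date, timedelta
from statistics import mean

def training_window(history, run_day, days):
    """Clicks from the `days` days before `run_day` (the run day itself excluded)."""
    start = run_day - timedelta(days=days)
    return [clicks for day, clicks in history if start <= day < run_day]

def batch_run(history, run_day, days=7):
    """One scheduled run of the deployed code: retrain on recent data, then predict.

    The "model" is a toy: average daily clicks over the window,
    used as the prediction for `run_day`.
    """
    window = training_window(history, run_day, days)
    if not window:
        raise ValueError("no training data in the window")
    return mean(window)

# Hypothetical history: daily clicks grow by one per day from 1 April 2019.
history = [(date(2019, 4, 1) + timedelta(days=i), 10 + i) for i in range(14)]

print(batch_run(history, date(2019, 4, 8)))   # trained on 1-7 April
print(batch_run(history, date(2019, 4, 15)))  # retrained on 8-14 April
```

Because training happens inside each run, the second prediction reflects the higher recent clicks without any separate model deployment.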

  25. How can we test it? (Testing)

  26. ML testing problems • Behavior depends on the data • Difficult to define the exact expected result • Code is hard to structure • Unit tests are challenging

  27. Solutions • Compare metrics, not values • Use functional testing • Live monitoring • Tracking over time
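"Compare metrics, not values" can be illustrated with a small unittest: since two training runs give different predictions, the test asserts that an error metric stays under an agreed threshold instead of checking exact numbers. The noisy `train_and_predict` stand-in and the 1.5-click threshold are hypothetical; in practice the function under test would be the real training job.

```python
import random
import unittest

ACTUAL_CLICKS = [2, 21, 22, 10, 2, 1]  # past clicks from the slide 13 table

def train_and_predict(actual, seed):
    """Hypothetical stand-in for a training run whose output varies between runs:
    each prediction is the true value plus noise in [-1, 1]."""
    rng = random.Random(seed)
    return [a + rng.uniform(-1.0, 1.0) for a in actual]

def mean_absolute_error(predicted, actual):
    """Average absolute difference, in clicks."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

class ModelQualityTest(unittest.TestCase):
    def test_predictions_differ_between_runs(self):
        # Asserting exact values would be brittle: two runs give different numbers.
        self.assertNotEqual(train_and_predict(ACTUAL_CLICKS, 1),
                            train_and_predict(ACTUAL_CLICKS, 2))

    def test_error_metric_stays_under_threshold(self):
        # Compare a metric against an agreed (hypothetical) threshold instead.
        for seed in range(10):
            predicted = train_and_predict(ACTUAL_CLICKS, seed)
            self.assertLessEqual(mean_absolute_error(predicted, ACTUAL_CLICKS), 1.5)

if __name__ == "__main__":
    unittest.main(argv=["model-quality"], exit=False)
```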

  28. How to define the metrics? (Production)

  29. Define relevant metrics. Goal: evaluate quality
      • Prototyping: statistical metrics (Mean Absolute Error, Root Mean Square Error)
      • Testing: business metrics (total margin)
      • Monitoring: real-time metrics (predicted clicks vs. real clicks)
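The statistical and monitoring metrics listed above can be computed in a few lines. The sketch uses hypothetical numbers chosen so that both prediction sets have the same MAE but different RMSE, which shows why the two are reported together: RMSE squares the errors, so one large miss weighs more than several small ones. The ratio of total predicted to total real clicks is one simple way to track "predicted vs. real" over time; the business metric (total margin) depends on internal pricing data and is not sketched.

```python
import math

def mean_absolute_error(predicted, actual):
    """MAE: average size of the errors, in clicks."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def root_mean_square_error(predicted, actual):
    """RMSE: like MAE, but squaring makes large misses weigh more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def predicted_vs_real(predicted, actual):
    """Monitoring ratio: total predicted clicks / total real clicks (1.0 = on target)."""
    return sum(predicted) / sum(actual)

actual = [10, 10, 10, 10]    # hypothetical real clicks
spread = [12, 8, 12, 8]      # four small misses
one_miss = [10, 10, 10, 18]  # one large miss

for name, predicted in [("spread", spread), ("one_miss", one_miss)]:
    print(name,
          mean_absolute_error(predicted, actual),
          root_mean_square_error(predicted, actual),
          predicted_vs_real(predicted, actual))
# spread 2.0 2.0 1.0
# one_miss 2.0 4.0 1.2
```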

  30. Tests and measures: where? "Here" markers on the architecture diagram: Raw data (ad networks, users) → Preprocessing → Training → Prediction → Decision

  31. How can we schedule the jobs?

  32. Azkaban • Workflow job scheduler • Hadoop and Spark jobs • Graph of job dependencies • Alerting on failures with Nagios
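A "graph of job dependencies" means each job runs only after the jobs it depends on have succeeded. The sketch reproduces that idea with Python's standard `graphlib.TopologicalSorter` (Python 3.9+), not with Azkaban itself, which declares dependencies in its own job/flow files. The job names mirror the architecture slide and are hypothetical, and stopping at the first failure is a simplified picture of the point where a scheduler would raise an alert.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: job -> set of jobs it depends on.
JOBS = {
    "preprocessing": set(),
    "training": {"preprocessing"},
    "prediction": {"training", "preprocessing"},
    "decision": {"prediction"},
}

def execution_order(jobs):
    """An order in which every job comes after all of its dependencies.
    Raises graphlib.CycleError if the dependencies contain a cycle."""
    return list(TopologicalSorter(jobs).static_order())

def run_flow(jobs, run_job):
    """Run jobs in dependency order and stop at the first failure.

    `run_job` returns True on success. Returns (jobs that succeeded, failed job or None).
    """
    done = []
    for job in execution_order(jobs):
        if not run_job(job):
            return done, job  # downstream jobs are skipped; this is where to alert
        done.append(job)
    return done, None

print(execution_order(JOBS))
print(run_flow(JOBS, lambda job: job != "prediction"))
```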

  33. How do we track the model's behavior?

  34. Tracking • Business metrics graphs • Predictions vs. actual results • Study long-term trends • Adapt the model when the market changes • Easy to fix: an abrupt drop in a quality metric • Harder: a slow erosion of quality
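The difference between an abrupt drop and a slow erosion can be made concrete with two checks on a daily quality metric. A day-over-day comparison catches a sudden collapse, but a loss of 0.01 per day never trips it; comparing the average of the most recent window with the oldest window does. The metric, window length and thresholds below are hypothetical.

```python
from statistics import mean

def abrupt_drop(daily_quality, threshold=0.2):
    """Easy case: the latest value fell by more than `threshold` (relative) in one day."""
    previous, latest = daily_quality[-2], daily_quality[-1]
    return latest < previous * (1 - threshold)

def slow_erosion(daily_quality, window=7, threshold=0.1):
    """Harder case: the recent average is `threshold` (relative) below the oldest window."""
    baseline = mean(daily_quality[:window])
    recent = mean(daily_quality[-window:])
    return recent < baseline * (1 - threshold)

# Hypothetical daily quality metric (higher is better).
eroding = [0.90 - 0.01 * day for day in range(28)]  # loses 0.01 per day
sudden = [0.90] * 10 + [0.50]                       # collapses on the last day

for name, series in [("eroding", eroding), ("sudden", sudden)]:
    print(name, "abrupt:", abrupt_drop(series), "slow:", slow_erosion(series))
```

Neither check alone covers both failure modes, which is why long-term trend graphs are tracked alongside day-to-day alerts.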

  35. Tracking with ELK (Elasticsearch, Logstash, Kibana)

  36. So, what did we learn? (Lessons learned)

  37. 1

  38. 2

  39. 3

  40. 4

  41. Questions?
