
Machine Learning Pipeline for Real-time Forecasting @Uber Marketplace - PowerPoint PPT Presentation



  1. Machine Learning Pipeline for Real-time Forecasting @Uber Marketplace Chong Sun, Danny Yuan

  2. Forecasting On A Global Scale

  3. Cases For Real-Time Forecasting 01.01.17

  4. Dynamic Pricing: Every Minute, Everywhere

  5. Dynamic Pricing: Every Minute, Everywhere, Every Trip

  6. We Forecast Time Series

  7. We Forecast Time Series For Given Geo Locations

  8. A Few Constraints: More recent data has more signal

  9. A Few Constraints: Smaller areas have more noise

  10. A Few Constraints: Smaller areas have more noise

  11. A Few Constraints: More recent data has more signal; smaller areas have more noise; we were rolling out the business city by city with competing models (FFT, Kalman Filter, Regressions, LSTM)

  12. First Pipeline

  13. The Training Pipeline

  14. The Training Pipeline

  15. The Training Pipeline

  16. The Training Pipeline - Airflow - PySpark - SciPy

  17. The Training Pipeline - Cassandra

  18. A Need for Fast Time Series DB - Cassandra - Elasticsearch

  19. A Need For Streaming Data - Kafka

  20. A Need For Unified Feature Engine

  21. A Digression To Feature Engine

  22. A Digression To Feature Engine - DataFlow API

  23. A Digression To Feature Engine - Flink

  24. A Digression To Feature Engine - Reusable functions - Schema driven - Discoverable by meta data

  25. Inferencing Pipeline - Elasticsearch

  26. Inferencing Pipeline

  27. Real-time Visualization

  28. Real-time Validation

  29. A New Challenge: Model Management

  30. More Signals

  31. Scalable Model Evaluation

  32. Metrics-as-a-Service

  33. Model Lifecycle Management System (MLMS)

  34. What if you're supporting 5+ teams and 10+ products, with 4000+ model instances in production?

  35. Machine Learning Model Lifecycle

  36. Machine Learning Model Lifecycle

  37. Machine Learning Model Lifecycle

  38. Machine Learning Model Lifecycle

  39. Machine Learning Model Lifecycle

  40. Machine Learning Model Lifecycle

  41. Common Questions in the process: Where am I going to save and serve my models? How do I keep track of the model metadata, e.g., the training data used? How can I easily find a previous model for testing and performance comparison? How can I automatically deploy a large number of models? When should I trigger model re-training? How can I make sure I do not overwrite any (production) models? How do we manage multiple dependent models? …

  42. Common Questions in the process: Where am I going to save and serve my models? How do I keep track of the model metadata, e.g., the training data used? How can I easily find a previous model for testing and performance comparison? How can I automatically deploy a large number of models? When should I trigger model re-training? How can I make sure I do not overwrite any (production) models? How do we manage multiple dependent models? … → Model Lifecycle Management System (MLMS)

  43. MLMS Design Principles: Immutable Models; Model Neutral; Flexible; Automated Dynamic Orchestration
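The "Immutable Models" principle, together with the questions about tracking metadata and never overwriting a production model, can be illustrated with a small sketch. This is not the MLMS implementation; `ModelRecord`, `ModelStore`, and all field names are hypothetical, and the store is an in-memory stand-in for whatever storage the real system uses. The idea shown is only that publishing always creates a new version and existing records cannot be changed.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ModelRecord:
    name: str
    city_id: int
    version: int
    training_data: str  # metadata: which data the model was trained on
    artifact: bytes     # serialized model (hypothetical representation)


class ModelStore:
    """Append-only store: publishing never replaces an existing version."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, int, int], ModelRecord] = {}

    def latest_version(self, name: str, city_id: int) -> int:
        versions = [v for (n, c, v) in self._records if n == name and c == city_id]
        return max(versions, default=0)

    def publish(self, name: str, city_id: int,
                training_data: str, artifact: bytes) -> ModelRecord:
        # A new version number is always allocated, so older models stay
        # available for testing and performance comparison.
        version = self.latest_version(name, city_id) + 1
        record = ModelRecord(name, city_id, version, training_data, artifact)
        self._records[(name, city_id, version)] = record
        return record

    def get(self, name: str, city_id: int, version: int) -> ModelRecord:
        return self._records[(name, city_id, version)]
```

Because versions are only ever appended, a retrained model for the same name and city becomes version 2 while version 1 remains retrievable unchanged.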

  44. MLMS Architecture

  45. MLMS Architecture

  46. MLMS Architecture

  47. MLMS Architecture

  48. MLMS Architecture

  49. MLMS Architecture

  50. MLMS Architecture

  51. Machine Learning Model Lifecycle MLMS

  52. Data Science and Engineering Work Flow

  53. Data Scientists And Engineers Work In Lockstep

  54. Engineers Are Blocked Before Modeling Is Done

  55. Time For Productization Is Often Squeezed

  56. Rolling Out To All Cities Is Slow And Painful

  57. Analysis of Bottlenecks: Model Exploration (DS, Python) → Model Training and Serving Implementation (DS/Eng, Python/Go/Java) → Model Serving Production (Eng, Go/Java)

  58. Analysis of Bottlenecks: Model Exploration (DS, Python) → Model Training and Serving Implementation (DS/Eng, Python/Go/Java) → Model Serving Production (Eng, Go/Java). Bottleneck: Restricted Models

  59. Analysis of Bottlenecks: Model Exploration (DS, Python) → Model Training and Serving Implementation (DS/Eng, Python/Go/Java) → Model Serving Production (Eng, Go/Java). Bottlenecks: DS → Eng Knowledge Transfer; Reimplementing Model

  60. Analysis of Bottlenecks: Model Exploration (DS, Python) → Model Training and Serving Implementation (DS/Eng, Python/Go/Java) → Model Serving Production (Eng, Go/Java). Bottleneck: DS/Eng Model Parity

  61. Analysis of Bottlenecks: Model Exploration (DS, Python) → Model Training and Serving Implementation (DS/Eng, Python/Go/Java) → Model Serving Production (Eng, Go/Java). Bottleneck: DS/Eng Performance Debug

  62. Key Insight: Can We All Enjoy One ML Ecosystem?

  63. Unified Framework → Many Benefits: Standardized project structure; out-of-box support of local and remote deployment; reusable algorithms and framework; design review between engineer and DS; code review between engineer and DS; who codes, who debugs

  64. TensorFlow: Client: Dev (Python), Train (Python), Serve (Python/Java); Runtime: TensorFlow Graph (C++). Model Exploration (DS, Python) → Model Training and Serving Implementation (DS/Eng, Python/Java) → Model Serving Production (Eng, Java). Bottlenecks: Restricted Models; DS → Eng Knowledge Transfer; Eng Reimplementing Model; Model Parity; DS/Eng Performance Debug

  65. Enable DS to Write Production-Ready Code: TensorFlow (efficient core, DS-friendly API); engineers focusing on optimization and automation (parallelization of algorithms, end-to-end automation, visualization, integration, project scaffolding)

  66. Example: Build your own FTRL; use a framework

  67. Building Tools: Model Lifecycle Management System; Hyperparameter Tuning; Horovod for Distributed TensorFlow Training

  68. Conclusion: A fully automated MLMS is key to the success of complex ML systems; a single framework for DS and engineers boosts productivity; building great tools is crucial to ML projects

  69. Q & A

  70. How do we make the forecasts?

  71. Batch forecasting (2015): Data Sources → Batch Forecast (ARIMA, FFT) → Forecasts

  72. Batch forecasting + Real-time Adjustment: Data Sources → Batch Forecast (ARIMA, FFT) → Forecasts → Realtime Adjust & Serve (Exponential Smoothing) → Consumer

  73. Issues Observed: Not many ML libraries for Node.js; the real-time component (Node.js) cannot support CPU-intensive computation; cannot handle large-scale data features in real time; cannot share code between batch and online processing

  74. Second Generation of Forecasting Engine (Inspired by DataFlow and TensorFlow). Some interesting design principles: both real-time and batch prediction. Prediction is at minute level; backtesting/evaluation requires batch processing
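The slides name exponential smoothing as the real-time adjustment but do not say how it was applied. One plausible reading, sketched below as an assumption, is to smooth the recent error between observed values and the batch forecast and add that smoothed error to upcoming forecast points. The function name `adjust_forecast` and the parameter `alpha` are hypothetical.

```python
from typing import List, Sequence


def adjust_forecast(batch_forecast: Sequence[float],
                    observed: Sequence[float],
                    alpha: float = 0.5) -> List[float]:
    """Shift each batch forecast point by an exponentially smoothed residual.

    observed[i] is the actual value for step i once it is known; it may be
    shorter than batch_forecast, in which case the last smoothed error is
    carried forward to the remaining steps.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed_error = 0.0
    adjusted = []
    for i, forecast in enumerate(batch_forecast):
        # Serve the forecast corrected by what has been learned so far.
        adjusted.append(forecast + smoothed_error)
        if i < len(observed):
            # Standard exponential smoothing update on the forecast error.
            error = observed[i] - forecast
            smoothed_error = alpha * error + (1 - alpha) * smoothed_error
    return adjusted
```

For example, with a flat batch forecast of 100 and two observations of 110, `adjust_forecast([100, 100, 100, 100], [110, 110])` moves the later points upward toward the observed level.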

  75. Machine Learning Model Lifecycle

  76. MLMS Architecture: Given model_name=linear_demand_model and city_id=1; When status == 'alerting' and time_sustained > 3 days; Then retrainModel(model_name, city_id, model_version)
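The Given/When/Then rule on the last slide could be evaluated roughly as follows. This is a sketch only: `ModelStatus`, `should_retrain`, and `apply_rule` are hypothetical names, and the `retrain` callback stands in for the slide's `retrainModel`, whose real signature and behavior are not shown in the deck.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ModelStatus:
    model_name: str
    city_id: int
    model_version: int
    status: str                 # e.g. "ok" or "alerting"
    time_sustained: timedelta   # how long the current status has held


def should_retrain(s: ModelStatus,
                   threshold: timedelta = timedelta(days=3)) -> bool:
    # When status == 'alerting' and time_sustained > 3 days
    return s.status == "alerting" and s.time_sustained > threshold


def apply_rule(statuses: Iterable[ModelStatus],
               retrain: Callable[[str, int, int], None]) -> List[ModelStatus]:
    """Call retrain(model_name, city_id, model_version) for each match."""
    triggered = []
    for s in statuses:
        if should_retrain(s):
            retrain(s.model_name, s.city_id, s.model_version)
            triggered.append(s)
    return triggered
```

A status of exactly three days does not trigger the rule, since the slide's condition is a strict "greater than".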
