
Intelligent Services Serving Machine Learning Joseph E. Gonzalez - PowerPoint PPT Presentation

  1. Intelligent Services Serving Machine Learning Joseph E. Gonzalez jegonzal@cs.berkeley.edu; Assistant Professor @ UC Berkeley joseph@dato.com; Co-Founder @ Dato Inc.

  2. Contemporary Learning Systems: Big Training Data → Big Models

  3. Contemporary Learning Systems create big models: BIDMach, MLlib, VW, Oryx 2, LIBSVM, MLC

  4. What happens after we train a model? Data → Training → Model → Conference Papers, Dashboards and Reports, Drive Actions

  5. What happens after we train a model? Data → Training → Model → Conference Papers, Dashboards and Reports, Drive Actions

  6. Suggesting Items at Checkout, Fraud Detection, Cognitive Assistance, Internet of Things: Low-Latency, Personalized, Rapidly Changing

  7. Data → Train → Model

  8. Data → Train → Model → Actions

  9. Machine Learning → Intelligent Services

  10. The Life of a Query in an Intelligent Service: a request ("items like x") arrives at the web serving tier; the intelligent service looks up the model, user info, and item/content data, evaluates the model math (features and weights), and returns the top-K items to render the new page; user feedback (the preferred item) flows back into the service as new data.

  11. Essential Attributes of Intelligent Services: Responsive (intelligent applications are interactive), Adaptive (ML models are out-of-date the moment learning is done), Manageable (many models are created by multiple people).

  12. Responsive: Now and Always. Compute predictions in < 20 ms for complex models, queries, and features (e.g., SELECT * FROM users JOIN items, click_logs, pages WHERE … → Top K), under heavy query load and with system failures.

  13. Experiment: End-to-end Latency in Spark MLlib. Pipeline: HTTP Request → Feature Transform → Evaluate Model (on a digit image such as "4") → Encode Prediction to JSON → HTTP Response.
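
The following is a minimal sketch, in Python with Flask, of the kind of pipeline this experiment measures (HTTP request → feature transform → model evaluation → JSON-encoded response). The /predict route, the pixel-scaling transform, and the random logistic-regression weights are illustrative assumptions, not the MLlib or Dato models used to produce the numbers on the next slide.

```python
# A minimal sketch of the measured pipeline:
# HTTP request -> feature transform -> model evaluation -> JSON response.
# The feature transform and model weights are placeholders (assumptions).
import math
import time

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

WEIGHTS = np.random.default_rng(0).normal(size=784)  # stand-in for a trained model
BIAS = 0.0

def featurize(pixels):
    """Placeholder feature transform: scale raw pixel values into [0, 1]."""
    return np.asarray(pixels, dtype=np.float64) / 255.0

@app.route("/predict", methods=["POST"])
def predict():
    start = time.perf_counter()
    pixels = request.get_json()["pixels"]              # 784-dimensional input
    score = float(featurize(pixels) @ WEIGHTS + BIAS)  # evaluate the model
    prob = 1.0 / (1.0 + math.exp(-score))
    latency_ms = (time.perf_counter() - start) * 1000.0
    # Encode the prediction as JSON for the HTTP response.
    return jsonify({"is_four": prob > 0.5, "score": prob, "latency_ms": latency_ms})

if __name__ == "__main__":
    app.run(port=8080)
```

A load generator replaying 1,000 requests against such an endpoint and recording the average and P99 latency would reproduce the shape of the measurements on the next slide.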

  14. End-to-end Latency for Digits Classification (784-dimension input, served using MLlib and Dato Inc.; latency measured in milliseconds, histogram counts out of 1,000 queries):
     - NOP: Avg = 5.5, P99 = 20.6
     - Single Logistic Regression: Avg = 21.8, P99 = 38.6
     - Decision Tree: Avg = 22.4, P99 = 63.8
     - 100-Tree Random Forest: Avg = 50.5, P99 = 73.4
     - One-vs-all LR (10-class): Avg = 137.7, P99 = 217.7
     - 500-Tree Random Forest: Avg = 172.6, P99 = 268.7
     - AlexNet CNN: Avg = 418.7, P99 = 549.8

  15. [Bar chart: average latency in milliseconds to predict "is 4": C++ 4.3, LR 21.8, Decision Tree 22.4, 100-Tree Random Forest 50.5, 10-Class LR 137.7, 500-Tree Random Forest 172.6, AlexNet 418.7]

  16. Adaptive to Change at All Scales: granularity of data from Population down to Session (e.g., shopping for Mom vs. shopping for me); rate of change from months down to minutes.

  17. Adaptive to Change at All Scales, population end: by the law of large numbers, population statistics change slowly, so rely on efficient offline retraining with high-throughput systems; rate of change measured in months.

  18. Adaptive to Change at All Scales, session end (shopping for Mom vs. shopping for me): small data that changes rapidly requires low latency and online learning, and is sensitive to feedback bias; rate of change measured in minutes.

  19. The Feedback Loop: "I once looked at cameras on Amazon…" and similar cameras and accessories keep being recommended. Opportunity for bandit algorithms. Bandits present new challenges: • computation overhead • complicates caching + indexing

  20. Exploration / Exploitation Tradeoff: systems that can take actions can adversely bias future data. Opportunity for bandits! Bandits present new challenges: • complicates caching + indexing • tuning + counterfactual reasoning
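
As a concrete illustration of the exploration/exploitation idea on slides 19 and 20, here is a minimal epsilon-greedy bandit sketch in Python for choosing which item to recommend. The item names, the reward signal (a click), and the epsilon value are illustrative assumptions, not part of the system described in the talk.

```python
# Minimal epsilon-greedy bandit sketch for the exploration/exploitation tradeoff.
# Item ids and the click reward are illustrative assumptions.
import random
from collections import defaultdict

class EpsilonGreedyRecommender:
    def __init__(self, items, epsilon=0.1):
        self.items = list(items)
        self.epsilon = epsilon
        self.counts = defaultdict(int)     # times each item was shown
        self.rewards = defaultdict(float)  # cumulative reward (e.g., clicks)

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if random.random() < self.epsilon:
            return random.choice(self.items)
        return max(self.items, key=lambda i: self.rewards[i] / max(self.counts[i], 1))

    def feedback(self, item, reward):
        # Record the observed reward so future data is not biased toward
        # only the items the system already likes to show.
        self.counts[item] += 1
        self.rewards[item] += reward

# Usage: choose an item to show, then log whether the user engaged.
rec = EpsilonGreedyRecommender(["camera", "lens", "tripod", "bag"])
shown = rec.choose()
rec.feedback(shown, reward=1.0)  # e.g., the user clicked
```

Even this tiny sketch hints at the challenges the slides mention: every choice must be logged with its context for counterfactual reasoning, and randomized choices are harder to cache and index.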

  21. Management: Collaborative Development. Teams of data scientists working on similar tasks produce "competing" features and models, plus complex model dependencies (e.g., a Cat Classifier and an Animal Classifier whose outputs, isCat and isAnimal, feed a Cuteness Predictor for the same cat photo: Cute!).

  22. UC Berkeley AMPLab Predictive Services: Daniel Crankshaw, Xin Wang, Joseph Gonzalez, Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan.

  23. UC Berkeley AMPLab Predictive Services (active research project): Daniel Crankshaw, Xin Wang, Joseph Gonzalez, Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan.

  24. Velox Model Serving System [CIDR'15, LearningSys'15]: focuses on the multi-task learning (MTL) domain, e.g., spam classification, content recommendation scoring, localized anomaly detection. Session 1: f1(·) → f1(·) → f1(·); Session 2: f2(·) → f2(·) → f2(·).

  25. Velox Model Serving System [CIDR'15, LearningSys'15]: Personalized Models (Multi-task Learning). Input → Output, with a "separate" model for each user/context.

  26. Velox Model Serving System [CIDR'15, LearningSys'15]: Personalized Models (Multi-task Learning). Split each personalized model into a shared Feature Model and a per-user Personalization Model.

  27. Hybrid Offline + Online Learning, with the prediction split as f(x; θ)ᵀ w_u (feature model + personalization model). Update the feature functions f(·; θ) offline using batch solvers: • leverage high-throughput systems (Apache Spark) • exploit slow change in population statistics. Update the user weights w_u online: • simple to train + more robust model • addresses rapidly changing user statistics.
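
A minimal sketch of this split in Python, assuming a random-feature map for f(·; θ), a simple SGD step for the online user-weight update, and an illustrative learning rate; the actual Velox feature models and solvers are not shown here.

```python
# Sketch of the hybrid scheme: the shared feature model (theta) is retrained
# offline in batch, while each user's weights w_u are updated online.
# The tanh random-feature map and learning rate are illustrative assumptions.
import numpy as np

class HybridModel:
    def __init__(self, input_dim, num_features, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.theta = rng.normal(size=(num_features, input_dim))  # feature model
        self.user_w = {}                                          # per-user weights
        self.lr = lr

    def features(self, x):
        # f(x; theta): shared feature function, refreshed by offline batch training.
        return np.tanh(self.theta @ np.asarray(x, dtype=np.float64))

    def predict(self, user, x):
        w = self.user_w.setdefault(user, np.zeros(self.theta.shape[0]))
        return float(self.features(x) @ w)          # f(x; theta)^T w_u

    def online_update(self, user, x, y):
        # Online step on the user weights only: cheap, and tracks rapidly
        # changing user preferences without touching theta.
        f = self.features(x)
        w = self.user_w.setdefault(user, np.zeros_like(f))
        w += self.lr * (y - f @ w) * f

    def offline_retrain(self, new_theta):
        # Periodically swap in a feature model retrained with a batch solver
        # (e.g., on Apache Spark), exploiting slow change in population statistics.
        self.theta = new_theta
```

The design choice mirrors the slide: only the small per-user weight vector is touched on the query path, while the expensive shared feature model changes on the slow offline timescale.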

  28. Hybrid Online + Offline Learning Results: [charts comparing Hybrid, Offline, and Full retraining: similar test error, substantially faster training, including after a user preference change]

  29. Evaluating the Model: Input → Feature Evaluation → (split) personalized weights; the feature evaluation step is the target for caching.

  30. Evaluating the Model: caching feature evaluation across users, approximate feature hashing, and anytime feature evaluation, all applied at the feature-evaluation side of the split.

  31. Feature Caching: for a new input x, hash the input h(x) and look up or store the computed feature f(x; θ) in a feature hash table.
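
A minimal sketch of such a feature hash table in Python, using an exact content hash as h(x); the hash choice and the compute_features callback are illustrative assumptions.

```python
# Sketch of the feature hash table: hash the input and reuse the cached
# feature vector on an exact hit. The hash function and feature computation
# are illustrative stand-ins.
import hashlib
import numpy as np

feature_cache = {}

def exact_hash(x):
    # h(x): a deterministic key derived from the raw input bytes.
    return hashlib.sha1(np.asarray(x, dtype=np.float64).tobytes()).hexdigest()

def cached_features(x, compute_features):
    key = exact_hash(x)
    if key not in feature_cache:           # miss: compute f(x; theta) and store it
        feature_cache[key] = compute_features(x)
    return feature_cache[key]              # hit: reuse the cached feature vector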

  32. LSH Cache Coarsening: for a new input z ≠ x, hashing the new input h(z) with an LSH hash function can hit the cached entry f(x; θ) in the feature hash table and use the wrong value!

  33. LSH Cache Coarsening with Locality-Sensitive Hashing: x ≈ z ⇒ h(x) = h(z), so use the cached value anyway. Locality-Sensitive Caching (requires LSH): h(x) = h(z) ⇒ f(x; θ) ≈ f(z; θ).
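
To make the coarsening concrete, here is a sketch that swaps the exact hash for a locality-sensitive one. Random-hyperplane hashing is used as one common LSH family; it, the bit count, and the cache layout are assumptions for illustration, not the specific LSH scheme from the talk.

```python
# Sketch of LSH cache coarsening: replace the exact hash with a
# locality-sensitive one so that nearby inputs z ~ x land in the same bucket
# and reuse f(x; theta) ~ f(z; theta). Random-hyperplane hashing is assumed.
import numpy as np

class LSHFeatureCache:
    def __init__(self, input_dim, num_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(num_bits, input_dim))  # fewer bits = coarser
        self.cache = {}

    def lsh_hash(self, x):
        # h(x): sign pattern of x against random hyperplanes.
        bits = (self.planes @ np.asarray(x, dtype=np.float64)) > 0
        return bits.tobytes()

    def get(self, x, compute_features):
        key = self.lsh_hash(x)
        if key not in self.cache:
            self.cache[key] = compute_features(x)  # compute f(x; theta) on a miss
        return self.cache[key]                      # nearby z reuses the cached value
```

The num_bits parameter is the coarseness knob from the next slides: fewer hyperplanes means more cache hits but a looser approximation of f(x; θ).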

  34. Anytime Predictions: compute features asynchronously, prediction = f1(x; θ) w_u1 + E[f2(x; θ)] w_u2 + f3(x; θ) w_u3; if a particular element does not arrive in time, use its estimator (expected value) instead. Always able to render a prediction by the latency deadline.
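
A small sketch of this idea in Python: each feature is computed asynchronously, and any feature that misses the latency deadline is replaced by its precomputed expected value. The thread-pool mechanism, the 20 ms deadline, and the expected_values array are illustrative assumptions.

```python
# Sketch of anytime predictions: each f_i(x; theta) is computed asynchronously;
# any feature that misses the deadline is replaced by its (precomputed)
# expected value E[f_i(x; theta)], so a prediction is always returned on time.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def anytime_predict(x, feature_fns, expected_values, user_weights, deadline_s=0.02):
    """Return sum_i f_i * w_{u,i} by the deadline, substituting E[f_i] for late features."""
    pool = ThreadPoolExecutor(max_workers=len(feature_fns))
    futures = [pool.submit(f, x) for f in feature_fns]   # compute features asynchronously
    deadline = time.perf_counter() + deadline_s
    prediction = 0.0
    for i, fut in enumerate(futures):
        remaining = deadline - time.perf_counter()
        try:
            f_i = fut.result(timeout=max(remaining, 0.0))  # feature arrived in time
        except TimeoutError:
            f_i = expected_values[i]                       # fall back to E[f_i(x; theta)]
        prediction += f_i * user_weights[i]
    pool.shutdown(wait=False)                              # do not block on stragglers
    return prediction
```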

  35. Coarsening + Anytime Predictions: a spectrum from no coarsening (exact f_i(x; θ), best) to a coarser hash (f_i(x; θ) ≈ f_i(z; θ), better, allows more features by the deadline) to overly coarsened (f_i(x; θ) ≈ E[f_i(x; θ)], just the approximate expectation). Check out our poster!

  36. Part of the Berkeley Data Analytics Stack. Training: MLbase, BlinkDB, Spark Streaming, GraphX, Spark SQL, and the ML library on Spark, over Mesos, Tachyon, and HDFS, S3, …. Management + Serving: Velox (Model Manager + Prediction Service).

  37. UC Berkeley AMPLab Predictive Services: Daniel Crankshaw, Xin Wang, Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan.

  38. Dato Predictive Services: a production-ready model serving and management system. • Elastic scaling and load balancing of docker.io containers • AWS CloudWatch metrics and reporting • Serves Dato Create models, scikit-learn, and custom Python • Distributed shared caching: scale out to address latency • REST management API. Demo?
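
For flavor, this is what querying such a REST-fronted prediction service might look like from Python. The URL, path, payload shape, and API-key header here are hypothetical placeholders, not the actual Dato Predictive Services API.

```python
# Hypothetical illustration of querying a REST model-serving endpoint like the
# one described on this slide. The URL, path, payload, and API-key header are
# made-up placeholders, not Dato Predictive Services' actual API.
import requests

response = requests.post(
    "https://predictive-service.example.com/query/digits_classifier",  # hypothetical
    headers={"Authorization": "api_key PLACEHOLDER"},                  # hypothetical
    json={"pixels": [0] * 784},                                        # 784-dim input
    timeout=0.5,
)
response.raise_for_status()
print(response.json())  # e.g., a prediction plus serving metadata
```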

  39. UC Berkeley AMPLab Predictive Services: Daniel Crankshaw, Xin Wang, Joseph Gonzalez, Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan. Responsive, Adaptive, Manageable. Key insights: caching, bandits, online/offline learning, management, and latency vs. accuracy.

  40. Future of Learning Systems: Data → Train → Model → Actions, feeding back into Data.

  41. Thank You Joseph E. Gonzalez jegonzal@cs.berkeley.edu, Assistant Professor @ UC Berkeley joseph@dato.com, Co-Founder @ Dato
