Intelligent Services: Serving Machine Learning. Joseph E. Gonzalez: jegonzal@cs.berkeley.edu, Assistant Professor @ UC Berkeley; joseph@dato.com, Co-Founder @ Dato Inc.
Contemporary Learning Systems: Big Training Data → Big Models
Contemporary learning systems create models: Create, BIDMach, MLlib, VW, Oryx 2, LIBSVM, MLC, ...
What happens after we train a model? Data → Model Training → Conference Papers, Dashboards and Reports, Drive Actions
Suggesting items at checkout, fraud detection, cognitive assistance, the Internet of Things: low-latency, personalized, and rapidly changing.
Data → Train → Model
Data → Train → Model → Actions → Data
Machine Learning → Intelligent Services
The Life of a Query in an Intelligent Service: a request (e.g., a new page load) arrives at the web serving tier; the intelligent service looks up user info and features, queries the model for the top-K items (similar items, content, images), and returns personalized content; user feedback (e.g., the preferred item) flows back into the data.
Essential Attributes of Intelligent Services: Responsive (intelligent applications are interactive), Adaptive (ML models are out-of-date the moment learning is done), Manageable (many models are created by multiple people).
Responsive: Now and Always. Compute predictions in < 20 ms for complex models, queries (SELECT * FROM users JOIN items, click_logs, pages WHERE ... returning the top K), and features, under heavy query load and with system failures.
Experiment: End-to-end Latency in Spark MLlib. HTTP Request → Decode JSON → Feature Transformation → Evaluate Model → Encode Prediction (e.g., "4") → HTTP Response.
End-to-end Latency for Digits Classification (784-dimension input, served using MLlib and Dato Inc.; latency distributions over 1000 queries, measured in milliseconds):
NOP: Avg = 5.5, P99 = 20.6
Single Logistic Regression: Avg = 21.8, P99 = 38.6
Decision Tree: Avg = 22.4, P99 = 63.8
100-Tree Random Forest: Avg = 50.5, P99 = 73.4
One-vs-all LR (10-class): Avg = 137.7, P99 = 217.7
500-Tree Random Forest: Avg = 172.6, P99 = 268.7
AlexNet CNN: Avg = 418.7, P99 = 549.8
[Bar chart of average latency in milliseconds: 4.3 (C++ "predict is 4" baseline), 21.8 (LR), 22.4 (Decision Tree), 50.5 (100-Tree Random Forest), 137.7 (10-Class LR), 172.6 (500-Tree Random Forest), 418.7 (AlexNet).]
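To make the measurement concrete, here is a minimal timing sketch (assumptions: a stand-in scikit-learn model, a made-up JSON request format, and no real HTTP hop) of the same decode → featurize → predict → encode path, reporting average and P99 latency over 1000 requests:

```python
import json
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for a served digits model (784-dimensional input).
model = RandomForestClassifier(n_estimators=100)
model.fit(np.random.rand(100, 784), np.random.randint(0, 10, 100))

def handle_request(body):
    """Simulate the serving path: decode JSON -> featurize -> predict -> encode."""
    features = np.asarray(json.loads(body)["pixels"]).reshape(1, -1)
    prediction = int(model.predict(features)[0])
    return json.dumps({"prediction": prediction})

# Measure end-to-end latency over 1000 requests and report Avg / P99.
request = json.dumps({"pixels": np.random.rand(784).tolist()})
latencies = []
for _ in range(1000):
    start = time.perf_counter()
    handle_request(request)
    latencies.append((time.perf_counter() - start) * 1000.0)

print("Avg = %.1f ms, P99 = %.1f ms"
      % (np.mean(latencies), np.percentile(latencies, 99)))
```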
Adaptive to Change at All Scales. Granularity of data: population down to session (e.g., shopping for Mom vs. shopping for me). Rate of change: months down to minutes.
Adaptive to Change at All Scales, population end: by the law of large numbers, population-level statistics change slowly (months), so rely on efficient offline retraining → high-throughput systems.
Adaptive to Change at All Scales, session end: small data → rapidly changing (minutes); low latency → online learning, which is sensitive to feedback bias.
The Feedback Loop: "I once looked at cameras on Amazon ..." and keep seeing similar cameras and accessories. Opportunity for bandit algorithms. Bandits present new challenges: • computational overhead • complicates caching + indexing
Exploration / Exploitation Tradeoff: systems that can take actions can adversely bias future data. Opportunity for bandits! Bandits present new challenges: • complicates caching + indexing • tuning + counterfactual reasoning
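As one concrete illustration (not the algorithm of any particular production system), an epsilon-greedy bandit that usually serves the best-looking item but occasionally explores, and folds the observed feedback back in:

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over a fixed set of items (arms)."""

    def __init__(self, items, epsilon=0.1):
        self.items = list(items)
        self.epsilon = epsilon
        self.counts = defaultdict(int)     # times each item was shown
        self.rewards = defaultdict(float)  # total reward (e.g., clicks) per item

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if random.random() < self.epsilon:
            return random.choice(self.items)
        return max(self.items, key=lambda i:
                   self.rewards[i] / self.counts[i] if self.counts[i] else 0.0)

    def update(self, item, reward):
        # Record the observed feedback so future selections reflect it.
        self.counts[item] += 1
        self.rewards[item] += reward

# Usage: recommend an item, observe a click (1.0) or no click (0.0), update.
bandit = EpsilonGreedyBandit(["camera_a", "camera_b", "lens_kit"])
item = bandit.select()
bandit.update(item, reward=1.0)
```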
Management: Collaborative Development. Teams of data scientists working on similar tasks → "competing" features and models. Complex model dependencies: e.g., a cat photo feeds a Cat Classifier (isCat) and an Animal Classifier (isAnimal), which feed a Cuteness Predictor (Cute!).
UC Berkeley AMPLab Predictive Services (active research project): Daniel Crankshaw, Xin Wang, Joseph Gonzalez, Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan.
Velox Model Serving System [CIDR'15, LearningSys'15]. Focuses on the multi-task learning (MTL) domain: spam classification, content recommendation scoring, localized anomaly detection. Session 1: f1(·) → prediction for each item; Session 2: f2(·) → prediction for each item.
Velox Model Serving System [CIDR'15, LearningSys'15]. Personalized Models (Multi-task Learning): input → output, with a "separate" model for each user/context.
Velox Model Serving System [CIDR'15, LearningSys'15]. Personalized Models (Multi-task Learning): each model is split into a shared feature model and a per-user personalization model.
Hybrid Offline + Online Learning. Prediction: f(x; θ)^T w_u. Update the feature functions (θ) offline using batch solvers: • leverage high-throughput systems (Apache Spark) • exploit slow change in population statistics. Update the user weights (w_u) online: • simple to train + more robust model • addresses rapidly changing user statistics.
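A minimal sketch of this split, assuming a linear personalization layer w_u over a feature function f(x; θ): θ stands in for parameters fit offline in batch, while each user's weights are updated online with stochastic gradient steps on observed feedback.

```python
import numpy as np

class HybridModel:
    """Prediction = f(x; theta)^T w_u: shared features offline, user weights online."""

    def __init__(self, theta, n_features, lr=0.1):
        self.theta = theta        # feature parameters, retrained offline in batch
        self.users = {}           # per-user weight vectors, updated online
        self.n_features = n_features
        self.lr = lr

    def features(self, x):
        # Stand-in feature function f(x; theta); could be any offline-trained model.
        return np.tanh(self.theta @ x)

    def predict(self, user, x):
        w_u = self.users.setdefault(user, np.zeros(self.n_features))
        return float(self.features(x) @ w_u)

    def online_update(self, user, x, y):
        # One SGD step on squared loss; only the small user weight vector changes.
        f = self.features(x)
        w_u = self.users.setdefault(user, np.zeros(self.n_features))
        error = f @ w_u - y
        self.users[user] = w_u - self.lr * error * f

# Usage: theta comes from an offline batch job; feedback updates only w_u.
model = HybridModel(theta=np.random.randn(8, 20), n_features=8)
model.online_update("user_42", x=np.random.randn(20), y=1.0)
print(model.predict("user_42", np.random.randn(20)))
```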
Hybrid Online + Offline Learning Results: similar test error, substantially faster training. [Plots compare Hybrid, Offline, and Full retraining as user preferences change.]
Evaluating the Model: input → feature evaluation (with a cache) → split.
Evaluating the Model, caching feature evaluation: feature caching across users, approximate feature hashing, and anytime feature evaluation.
Feature Caching: for a new input x, hash the input h(x), compute the feature f(x; θ), and store it in the feature hash table.
LSH Cache Coarsening: a new input z ≠ x hashed with an LSH hash function can collide, h(z) = h(x), and return the cached value f(x; θ) from the feature hash table, i.e., use the wrong value!
LSH Cache Coarsening. Locality-Sensitive Hashing: x ≈ z ⇒ h(x) = h(z). Locality-Sensitive Caching: h(x) = h(z) ⇒ f(x; θ) ≈ f(z; θ), so use the cached value anyway!
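A minimal sketch of locality-sensitive caching, using random-hyperplane (sign) hashes as the LSH family: nearby inputs collide, so their cached feature values are reused, and fewer hyperplanes give a coarser hash with more reuse. The feature function here is a hypothetical stand-in.

```python
import numpy as np

class LSHFeatureCache:
    """Cache f(x; theta) keyed by a random-hyperplane LSH signature of x."""

    def __init__(self, feature_fn, dim, n_planes=16, seed=0):
        self.feature_fn = feature_fn
        # Fewer hyperplanes -> coarser buckets -> more cache hits, more approximation.
        self.planes = np.random.default_rng(seed).standard_normal((n_planes, dim))
        self.cache = {}

    def _hash(self, x):
        # Sign pattern of projections onto random hyperplanes: nearby x, z collide.
        return tuple((self.planes @ x > 0).astype(int))

    def get(self, x):
        key = self._hash(x)
        if key not in self.cache:      # miss: evaluate the feature function
            self.cache[key] = self.feature_fn(x)
        return self.cache[key]         # hit: reuse f(z; theta) for a nearby z

# Usage with a stand-in feature function.
cache = LSHFeatureCache(lambda x: np.tanh(x[:4]), dim=32, n_planes=8)
x = np.random.randn(32)
print(np.allclose(cache.get(x), cache.get(x + 1e-3)))  # likely True: same bucket
```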
Anytime Predictions: compute features asynchronously, score = f1(x; θ) w_u1 + E[f2(x; θ)] w_u2 + f3(x; θ) w_u3; if a particular feature does not arrive in time, use its estimator (the expectation) instead. Always able to render a prediction by the latency deadline.
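A minimal sketch of the anytime rule, assuming feature evaluations run in a thread pool: any feature that misses the latency deadline is replaced by a precomputed expected value, so a prediction can always be returned on time.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time
import numpy as np

def anytime_predict(feature_fns, expected_values, w_u, x, deadline_s=0.02):
    """Score = sum_i w_u[i] * f_i(x), substituting E[f_i(x)] for late features."""
    pool = ThreadPoolExecutor(max_workers=len(feature_fns))
    futures = [pool.submit(f, x) for f in feature_fns]
    deadline = time.perf_counter() + deadline_s
    score = 0.0
    for i, future in enumerate(futures):
        try:
            remaining = max(0.0, deadline - time.perf_counter())
            value = future.result(timeout=remaining)  # feature arrived in time
        except TimeoutError:
            value = expected_values[i]                # fall back to the estimator
        score += w_u[i] * value
    pool.shutdown(wait=False)  # do not block on stragglers
    return score

# Usage with stand-in features: the second is too slow and gets estimated.
fns = [lambda x: float(np.tanh(x.sum())),
       lambda x: (time.sleep(0.5), 1.0)[1]]
print(anytime_predict(fns, expected_values=[0.0, 0.5], w_u=[1.0, 2.0],
                      x=np.random.randn(4)))
```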
Coarsening + Anytime Predictions: a coarser hash makes f_i(x; θ) ≈ f_i(z; θ) hold for more inputs but risks over-coarsening; with no coarsening, more features miss the deadline and are approximated by the expectation E[f_i(x; θ)]. The best operating point balances the two. Check out our poster!
Part of the Berkeley Data Analytics Stack. Training: MLbase, Spark Streaming, GraphX, Spark SQL, ML library, BlinkDB, on Spark. Management + Serving: Velox (Model Manager + Prediction Service). Resource management and storage: Mesos, Tachyon, HDFS, S3, ...
Dato Predictive Services: a production-ready model serving and management system.
• Elastic scaling and load balancing of docker.io containers
• AWS CloudWatch metrics and reporting
• Serves Dato Create models, scikit-learn, and custom Python
• Distributed shared caching: scale out to address latency
• REST management API
Demo?
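For illustration only, and not the Dato Predictive Services API: a generic sketch of exposing a scikit-learn model behind a REST prediction endpoint with Flask, the kind of path such a system scales, load-balances, and caches.

```python
# Illustrative only: a generic REST prediction endpoint, not the Dato Predictive
# Services API. Assumes Flask and scikit-learn are installed.
import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Stand-in model; a real deployment would load a trained, versioned artifact.
model = LogisticRegression().fit(np.random.rand(100, 4), np.random.randint(0, 2, 100))

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [f1, f2, f3, f4]}.
    x = np.asarray(request.get_json()["features"]).reshape(1, -1)
    return jsonify({"prediction": int(model.predict(x)[0])})

if __name__ == "__main__":
    app.run(port=8080)
```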
UC Berkeley AMPLab Predictive Services: Daniel Crankshaw, Xin Wang, Joseph Gonzalez, Peter Bailis, Haoyuan, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan. Responsive, Adaptive, Manageable. Key insights: caching, bandits, & online/offline learning; management; latency vs. accuracy.
Future of Learning Systems: Data → Train → Model → Actions → Data
Thank You. Joseph E. Gonzalez: jegonzal@cs.berkeley.edu, Assistant Professor @ UC Berkeley; joseph@dato.com, Co-Founder @ Dato.