Human-Centric Machine Learning Infrastructure — Ville Tuulos, QCon SF, November 2018
Meet Alex, a new chief data scientist at Caveman Cupcakes. You are hired!
We need a dynamic pricing model.
Optimal pricing model
Great job! The model works perfectly!
Could you predict churn too?
Optimal churn model | Optimal pricing model → Alex's model
Good job again! Promising results!
Can you include a causal attribution model for marketing?
Optimal churn model | Attribution model | Optimal pricing model → Alex's model
Are you sure these results make sense?
Take two
Meet the new data science team at Caveman Cupcakes. You are hired!
Attribution model | Churn model | Pricing model
VS

the human is the bottleneck
Build

ML Libraries
Feature Engineering
Model Deployment
Collaboration Tools
Versioning
Job Scheduler
Compute Resources
Data Warehouse

How much the data scientist cares: most at the top of the stack.
How much infrastructure is needed: most at the bottom.
Deploy

No plan survives contact with the enemy.
No model survives contact with reality.
Our ML infra supports two human activities: building and deploying data science workflows.
Content Valuation, Screenplay Analysis Using NLP, Optimize Production Schedules, Intelligent Infrastructure, Predict Quality of Network, Machine Translation, Fraud Detection, Content Tagging, Classify Support Tickets, Predict Churn, Title Portfolio Optimization, Incremental Impact of Marketing, Estimate Word-of-Mouth Effects, Cluster Tweets, Optimal CDN Caching
models — ML Libraries: R, XGBoost, TF, etc.
prototyping — Notebooks: Nteract
compute — Job Scheduler: Meson | Compute Resources: Titus | Query Engine: Spark
data — Data Lake: S3
Bad Old Days

A data scientist built an NLP model in Python. Easy and fun!
How to run at scale? Custom Titus executor.
How to access data at scale? Slow!
How to schedule the model to update daily? Learn about the job scheduler.
How to expose the model to a custom UI? Custom web backend.

Time to production: 4 months

How to monitor models in production?
How to iterate on a new version without breaking the production version?
How to let another data scientist iterate on her version of the model safely?
How to backfill historical data?
How to debug yesterday's failed production run?
How to make this faster?
ML Wrapping: Metaflow

models — ML Libraries: R, XGBoost, TF, etc.
prototyping — Notebooks: Nteract
compute — Job Scheduler: Meson | Compute Resources: Titus | Query Engine: Spark
data — Data Lake: S3
Metaflow
Build
How to get started?

input → compute → output

def compute(input):
    output = my_model(input)
    return output

# python myscript.py
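A runnable sketch of the slide above, with a trivial stand-in for `my_model` (hypothetical placeholder; in practice this is whatever model the data scientist wrote):

```python
def my_model(x):
    # Stand-in model: doubles its input (hypothetical placeholder).
    return x * 2

def compute(input):
    # The "input -> compute -> output" step from the slide.
    output = my_model(input)
    return output

if __name__ == "__main__":
    print(compute(21))
```

The point of the slide: getting started is just a plain Python script, run with `python myscript.py`.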
How to structure my code?

DAG: start → (a, b) → join → end

from metaflow import FlowSpec, step

class MyFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.a, self.b)

    @step
    def a(self):
        self.next(self.join)

    @step
    def b(self):
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass

MyFlow()

# python myscript.py run
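Metaflow executes this class as a DAG: start fans out into branches a and b, both must finish before join runs, and the flow terminates at end. A minimal sketch of that branch/join execution order in plain Python (illustrative only, not Metaflow's actual scheduler):

```python
# Sketch of executing a branch/join DAG shaped like MyFlow:
#   start -> (a, b) -> join -> end
# Each step records its name; both branches run before the join.

def run_flow():
    executed = []

    def start():
        executed.append("start")
        return ["a", "b"]          # fan out into two branches

    branch_steps = {
        "a": lambda: executed.append("a"),
        "b": lambda: executed.append("b"),
    }

    for name in start():           # both branches must finish...
        branch_steps[name]()
    executed.append("join")        # ...before the join step runs
    executed.append("end")
    return executed

print(run_flow())
```

In real Metaflow the branches can run in parallel (e.g. on Titus), and `join(self, inputs)` receives the artifacts produced by each branch.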