It Takes a Village to Raise a Machine Learning Model Lucian Lita @datariver
Algorithms @datariver
Data
Big Data Sheep @bigdatasheep · 5yr: more data is better than complex algorithms #BigData
Big Data Sheep @bigdatasheep · 4yr: more clean data is better than more data #BigData
Big Data Sheep @bigdatasheep · 3yr: more labeled data is better than more data #BigData
Big Data Sheep @bigdatasheep · 2yr: more smart data is better than purple data #BigData
**inflated historical depiction @datariver
Data @datariver
Next Frontier: well designed software architectures. Personalization, experimentation, anomaly detection, fraud detection … @datariver
Battle Plan. Personalization: deep dive, sw architecture flavor. Anomaly detection: quick peek. Music streaming, advertising, medical informatics: brief stories. @datariver
[Figure: No customization (product as is, "x all") vs. Segmentation vs. Personalization ("x 1" per variant). Labels: reasonable coverage.] @datariver
Childhood. Approaches. @datariver
Broad Deep @datariver
Push-button: App, API, delivery, storage. Scale & Automation -- model build -- model deploy -- single instrumentation
Push-scientist: App, API, delivery, storage. Optimization -- ML algorithms -- data: more, better, smarter -- features, selection @datariver
Push-scientist: Invest in ML; start with a thin system. How much effort to put into Platform & Automation? (A) best you can do in x weeks (B) one step above prototype (C) enough baling wire & duct tape to support a first use case @datariver
Push-button: Invest in scale & automation; basic ML. How much effort to put into ML? (A) best generic model setup in y weeks? (B) noticeably better than random? (C) pack enough punch to be visible, but not more @datariver
Push-button Push-scientist @datariver
Adolescence. Platform Patterns. @datariver
(A) Stored. App: sends feedback to API (capture); gets personalized content from API (retrieve). Behind the APIs: periodically batch train model; periodically run models; serve pre-computed content. @datariver
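A minimal sketch of the Stored pattern, assuming a toy popularity "model" and in-memory storage. The class and method names (`StoredPersonalization`, `batch_job`, and so on) are hypothetical and only mirror the labels on the slide; a real system would put a data store and a scheduler behind them.

```python
from collections import Counter, defaultdict


class StoredPersonalization:
    """Pattern (A): capture feedback, batch-train, pre-compute, retrieve by lookup.

    Hypothetical illustration; the popularity model is an assumption.
    """

    def __init__(self, default_content):
        self.events = []          # captured feedback as (user_id, item_id)
        self.precomputed = {}     # user_id -> list of recommended item_ids
        self.default_content = list(default_content)

    def capture(self, user_id, item_id):
        """API (capture): append feedback; no model work happens here."""
        self.events.append((user_id, item_id))

    def batch_job(self, top_k=3):
        """Periodic job: 'train' a popularity model, then run it for every user."""
        popularity = Counter(item for _, item in self.events)
        seen = defaultdict(set)
        for user, item in self.events:
            seen[user].add(item)
        ranked = [item for item, _ in popularity.most_common()]
        self.precomputed = {
            user: [i for i in ranked if i not in items][:top_k]
            for user, items in seen.items()
        }

    def retrieve(self, user_id):
        """API (retrieve): a plain lookup; unknown users get the default."""
        return self.precomputed.get(user_id, self.default_content)
```

The trade-off this makes visible: `retrieve` is cheap and predictable, but content is only as fresh as the last `batch_job` run.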
(B) On-the-Fly. App: sends feedback to API (capture); gets personalized content from API (compute). Behind the APIs: periodically batch train model; compute on-the-fly. @datariver
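A rough sketch of the On-the-Fly pattern under the same toy assumptions (popularity weights as the model, in-memory events). The names are hypothetical. The point it illustrates: training is still a periodic batch step, but scoring happens per request, so feedback captured after the last training run already affects what is served.

```python
from collections import Counter


class OnTheFlyPersonalization:
    """Pattern (B): batch-trained model, content computed per request.

    Hypothetical illustration; the popularity model is an assumption.
    """

    def __init__(self):
        self.events = []         # captured feedback as (user_id, item_id)
        self.model = Counter()   # item_id -> weight, refreshed by batch training

    def capture(self, user_id, item_id):
        """API (capture): record feedback immediately."""
        self.events.append((user_id, item_id))

    def batch_train(self):
        """Periodic job: rebuild the model from all captured events."""
        self.model = Counter(item for _, item in self.events)

    def compute(self, user_id, candidates, top_k=2):
        """API (compute): score candidates now, using the freshest feedback."""
        seen = {item for user, item in self.events if user == user_id}
        fresh = [c for c in candidates if c not in seen]
        # Highest model weight first; ties broken alphabetically for stability.
        return sorted(fresh, key=lambda c: (-self.model[c], c))[:top_k]
```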
(C) Aggressive. App: sends feedback to API (capture); gets personalized content from API (deliver). Challenge accepted: asymptotically real time! @datariver
Maturity. Patterns and Assumptions. @datariver
What do you really need? Do you need it now? Components: Model Building, Model Deployment, Data Store, Content Delivery, Analytics, Data Capture @datariver
Model Building. What do you really need? 101010 algos space data eval compute operators metrics security scalability HA @datariver
Model Deployment. What do you really need? API M i M i+1 envt ditto versioning deploy performance sharing security scalability HA @datariver
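One way to read the "M i, M i+1, versioning, deploy" labels is as a small registry that keeps every deployed model and can fall back to the previous one. The sketch below assumes models are plain callables held in memory; `ModelRegistry` and its methods are hypothetical names, not something the slides specify.

```python
class ModelRegistry:
    """Keeps deployed model versions and tracks which one serves traffic.

    Hypothetical illustration; models are assumed to be plain callables.
    """

    def __init__(self):
        self._versions = []   # deployed models, oldest first
        self._active = None   # index of the serving version

    def deploy(self, model):
        """Register M_{i+1} and make it the serving version."""
        self._versions.append(model)
        self._active = len(self._versions) - 1
        return self._active

    def rollback(self):
        """Step back from M_{i+1} to M_i."""
        if self._active is None or self._active == 0:
            raise RuntimeError("no earlier version to roll back to")
        self._active -= 1
        return self._active

    def predict(self, x):
        """Serve a prediction from the active version."""
        if self._active is None:
            raise RuntimeError("no model deployed")
        return self._versions[self._active](x)
```

Keeping old versions around is what makes "fail fast" cheap: a bad M_{i+1} costs one `rollback` call rather than a rebuild.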
Personalization Delivery. What do you really need? API instrument ditto exploit explore performance sharing security scalability HA @datariver
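The "exploit / explore" labels can be illustrated with epsilon-greedy selection, a standard bandit heuristic. The slides do not name a specific strategy, so this choice, the class name, and the parameters are all assumptions.

```python
import random


class EpsilonGreedy:
    """Explore with probability epsilon, otherwise exploit the best-known arm.

    Hypothetical illustration; the strategy is an assumption, not the talk's.
    """

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {a: 0 for a in self.arms}      # times each arm was served
        self.rewards = {a: 0.0 for a in self.arms}  # summed feedback per arm

    def mean(self, arm):
        """Average observed reward; 0.0 for arms never shown."""
        return self.rewards[arm] / self.shows[arm] if self.shows[arm] else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)   # explore
        return max(self.arms, key=self.mean)    # exploit

    def record(self, arm, reward):
        """Instrumentation hook: feed back what happened after serving."""
        self.shows[arm] += 1
        self.rewards[arm] += reward
```

Note that `record` is the delivery-side counterpart of the capture API: without instrumentation there is nothing to exploit.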
Data Store. What do you really need? API t content ditto performance HA history scalability consumers governance triggers sharing @datariver
Data Store. To HA or not to HA. Later (blasphemy) or now? Factors: revenue driver, in-app critical, user benefit, infrastructure cost, known use cases, build & operate @datariver
Data Store. APIs @datariver
Data Capture. What do you really need? API t triggers consumers content ditto history sharing performance scalability security HA @datariver
Analytics. What do you really need? API t content ditto performance history scalability flexibility consumers @datariver
Analytics. Experimentation & Personalization @datariver
Data Lake. What do you really need? say ‘big data lake’ one more time! @datariver
Evolving Architecture. Before you know it … @datariver
[Diagram: Apps exchange feedback, data, and personalized content (direct and in-app) through API (capture), API (compute), API (delivery), and API (push). An Event Log holds raw data or features. Model Building trains models (periodically or RT); Model Deployment runs models and periodically re-runs new models; Analytics is exposed through API (analytics). Numbered steps 1 to 4 mark the order in which components were added.] **terribly incomplete, mildly inaccurate
Not an Exact Blueprint
Know this: non-trivial; no one-size-fits-all.
Upfront: what do you really need? Know thy target architecture.
As you embark …: Do it! Working system in weeks. Fast iterations: ship & test. Interfaaaaaaaces!
village model **not drawn to effort scale
Software architecture is the next frontier! Fail fast still applies! Personalize your personalization platform! @datariver
better algorithms · more, better, smarter data · well designed software architectures (next frontier) @datariver
A Brief Look at Anomaly Detection @datariver
Applications
• System health – servers, network
• Cyber-intrusion detection
• Enterprise anomaly detection
• Image processing
• Textual anomaly detection
• Sensor networks
• Fraud detection
• Medical anomaly detection
• Industrial damage detection
• …
@datariver
Algorithms
• Supervised
• Unsupervised
• Generic statistical
• Information theory
• …
“What algorithms are you going to use?” @datariver
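As one concrete instance of the "generic statistical" family, here is a sketch of the modified z-score based on the median and the median absolute deviation (MAD). The talk does not prescribe an algorithm; the function name is hypothetical, and the 0.6745 scaling constant and 3.5 cutoff are the conventional defaults for this score, not values from the slides.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Hypothetical illustration of a generic statistical detector.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the points equal the median; the score is undefined.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]
```

Usage: `mad_anomalies([10, 11, 9, 10, 12, 10, 95])` flags only index 6, because the median and MAD are barely moved by the single extreme value.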
Data
Low data volume: invest in data acquisition; invest in high coverage.
High data volume: invest in defining signal; invest in labeling, tools, and crowdsourcing. @datariver
Architectures Again
Capture: Data Collectors (Clickstream, User Input …; Real time, DBs …)
Labeling: Crowdsourcing; Active learning
Compute: Processors (M&A), run models (broad: time bounded; deep: open ended)
**check assumptions @datariver
Advertising @datariver
Music Streaming @datariver
Medical Informatics @datariver
better algorithms · more, better, smarter data · well designed software architectures (next frontier) @datariver
Thank you! Lucian Lita @datariver [always hiring] data@intuit.com @datariver
Extra Content @datariver
Security. What do you really need? @datariver
App. Who does the App talk to?
(a) App gets personalized content from API (retrieve) -- apply op logic -- retrieve pre-computed content
(b) App gets dynamic data and personalized content from API (compute) -- retrieve static data -- apply op logic -- compute features -- run model -- log actions @datariver