
It Takes a Village to Raise a Machine Learning Model - Lucian Lita



  1. It Takes a Village to Raise a Machine Learning Model Lucian Lita @datariver

  2. It Takes a Village to Raise a Machine Learning Model Lucian Lita @datariver

  3. Algorithms @datariver

  4. Data. Big Data Sheep @bigdatasheep, 5yr: "more data is better than complex algorithms" #BigData. Big Data Sheep @bigdatasheep, 4yr: "more clean data is better than more data" #BigData. Big Data Sheep @bigdatasheep, 3yr: "more labeled data is better than more data" #BigData. Big Data Sheep @bigdatasheep, 2yr: "more smart data is better than purple data" #BigData. **inflated historical depiction @datariver

  5. Data @datariver

  6. Next Frontier: well designed software architectures. Personalization, experimentation, anomaly detection, fraud detection … @datariver

  7. Battle Plan. Personalization: deep dive, sw architecture flavor. Anomaly detection: quick peek. Music streaming, advertising, medical informatics: brief stories. @datariver

  8. @datariver

  9. [Diagram: one product for all users ("x all") versus one per user ("x 1").] Labels: Product as is. No customization. Segmentation. Personalization. Reasonable coverage. @datariver

  10. Childhood. Approaches. @datariver

  11. Broad Deep @datariver

  12. Push-button: App. Push-scientist: App, API, delivery, storage. Optimization -- ML algorithms -- data: more, better, smarter -- features, selection. @datariver

  13. Push-button: App, API, delivery, storage. Scale & Automation -- model build -- model deploy -- single instrumentation. Push-scientist: App, API, delivery, storage. Optimization -- ML algorithms -- data: more, better, smarter -- features, selection. @datariver

  14. Push-scientist. Invest in ML; start with a thin system. How much effort to put into Platform & Automation? (A) best you can do in x weeks (B) one step above prototype (C) enough baling wire & duct tape to support a first use case @datariver

  15. Push-button. Invest in scale & automation; basic ML. How much effort to put into ML? (A) best generic model setup in y weeks? (B) noticeably better than random? (C) pack enough punch to be visible, but not more @datariver

  16. Push-button Push-scientist @datariver

  17. Adolescence. Platform Patterns. @datariver

  18. (A) Stored. App; feedback via API (capture); personalized content via API (retrieve); pre-computed content; periodically batch train model; periodically run models. @datariver

  19. (B) On-the-Fly. App; feedback via API (capture); personalized content via API (compute); compute on-the-fly; periodically batch train model. @datariver

  20. (C) Aggressive. App; feedback via API (capture); personalized content via API (deliver). Challenge accepted: asymptotically real time! @datariver

  21. (C) Aggressive. App; feedback via API (capture); personalized content via API (deliver). Challenge accepted: asymptotically real time! @datariver
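
The difference between patterns (A) Stored and (B) On-the-Fly can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture from the talk: the class names, the `FeedbackLog` helper, and the toy popularity "model" are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict


class FeedbackLog:
    """API (capture): records (user, item) feedback events. Hypothetical helper."""

    def __init__(self):
        self.events = []

    def capture(self, user, item):
        self.events.append((user, item))


def train_model(events):
    """Batch training step. The 'model' is just global item popularity (an assumption)."""
    return Counter(item for _, item in events)


def score(model, seen, k=2):
    """Run the model for one user: the top-k popular items the user has not seen."""
    ranked = [item for item, _ in model.most_common() if item not in seen]
    return ranked[:k]


class StoredPattern:
    """(A) Stored: train and run models periodically, serve pre-computed content."""

    def __init__(self, log):
        self.log = log
        self.precomputed = {}

    def batch_job(self):
        model = train_model(self.log.events)
        seen = defaultdict(set)
        for user, item in self.log.events:
            seen[user].add(item)
        self.precomputed = {u: score(model, s) for u, s in seen.items()}

    def retrieve(self, user):
        # API (retrieve): a lookup only, no model is run at request time.
        return self.precomputed.get(user, [])


class OnTheFlyPattern:
    """(B) On-the-fly: train periodically, run the model at request time."""

    def __init__(self, log):
        self.log = log
        self.model = Counter()

    def batch_train(self):
        self.model = train_model(self.log.events)

    def compute(self, user):
        # API (compute): uses the latest captured feedback for this user.
        seen = {item for u, item in self.log.events if u == user}
        return score(self.model, seen)


log = FeedbackLog()
for user, item in [("ann", "a"), ("bob", "a"), ("bob", "b"), ("cat", "a"), ("cat", "c")]:
    log.capture(user, item)

stored, on_the_fly = StoredPattern(log), OnTheFlyPattern(log)
stored.batch_job()
on_the_fly.batch_train()

log.capture("ann", "b")           # new feedback arrives after the batch run
print(stored.retrieve("ann"))     # ['b', 'c']: stale until the next batch job
print(on_the_fly.compute("ann"))  # ['c']: reflects the new feedback immediately
```

The trade-off the sketch exposes is the usual one: the stored pattern keeps request-time work to a lookup but serves stale content between batch runs, while the on-the-fly pattern pays compute cost per request in exchange for freshness.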

  22. Maturity. Patterns and Assumptions. @datariver

  23. What do you really need? Do you need it now? Components: Model Building, Model Deployment, Data Store, Content Delivery, Analytics, Data Capture. @datariver

  24. Model Building. What do you really need? 101010, algos, space, data, eval, compute, operators, metrics, security, scalability, HA. @datariver

  25. Model Building. What do you really need? 101010, algos, space, data, eval, compute, operators, metrics, security, scalability, HA. @datariver

  26. Model Deployment. What do you really need? API, M_i, M_i+1, envt, ditto, versioning, deploy, performance, sharing, security, scalability, HA. @datariver
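
The versioning idea on the Model Deployment slide (model M_i replaced by M_i+1 behind one API) can be illustrated with a small registry. This is a hedged sketch: `ModelRegistry` and its methods are hypothetical names, and models are assumed to be plain Python callables.

```python
class ModelRegistry:
    """Keeps every model version and routes predictions to the deployed one."""

    def __init__(self):
        self._versions = []   # index i holds model M_i
        self._active = None   # version currently deployed, if any

    def register(self, model):
        """Store a new model and return its version number."""
        self._versions.append(model)
        return len(self._versions) - 1

    def deploy(self, version):
        """Make an already registered version the one that serves traffic."""
        if not 0 <= version < len(self._versions):
            raise ValueError(f"unknown model version: {version}")
        self._active = version

    def predict(self, x):
        """Return (version, prediction) so callers can log which model answered."""
        if self._active is None:
            raise RuntimeError("no model deployed")
        return self._active, self._versions[self._active](x)


registry = ModelRegistry()
v0 = registry.register(lambda x: x * 2)   # M_i
v1 = registry.register(lambda x: x * 3)   # M_i+1
registry.deploy(v0)
print(registry.predict(5))   # (0, 10)
registry.deploy(v1)
print(registry.predict(5))   # (1, 15)
```

Returning the version with each prediction is a design choice made here so that downstream analytics can attribute outcomes to a specific model.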

  27. Personalization Delivery. What do you really need? @datariver

  28. Personalization Delivery. What do you really need? API, instrument, ditto, exploit, explore, performance, sharing, security, scalability, HA. @datariver
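
The delivery slide names "exploit" and "explore" without saying how to balance them. One common option is epsilon-greedy selection, sketched below. The choice of epsilon-greedy, the class name, and the click-based reward are assumptions made for illustration, not something the slides specify.

```python
import random


class EpsilonGreedyDelivery:
    """Serve the best-known variant most of the time, a random one otherwise."""

    def __init__(self, variants, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)          # seeded for reproducibility
        self.shows = {v: 0 for v in variants}   # times each variant was served
        self.clicks = {v: 0 for v in variants}  # positive feedback per variant

    def _rate(self, variant):
        shown = self.shows[variant]
        return self.clicks[variant] / shown if shown else 0.0

    def deliver(self):
        """API (deliver): pick a variant and count the impression."""
        if self.rng.random() < self.epsilon:
            choice = self.rng.choice(list(self.shows))   # explore
        else:
            choice = max(self.shows, key=self._rate)     # exploit
        self.shows[choice] += 1
        return choice

    def capture(self, variant, clicked):
        """API (capture): record feedback for a served variant."""
        if clicked:
            self.clicks[variant] += 1


delivery = EpsilonGreedyDelivery(["banner_a", "banner_b"], epsilon=0.1)
shown = delivery.deliver()
delivery.capture(shown, clicked=True)
```

With `epsilon=0.0` the sketch always exploits; raising it trades short-term performance for more evidence about the other variants.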

  29. Data Store. What do you really need? API, t, content, ditto, performance, HA, history, scalability, consumers, governance, triggers, sharing. @datariver

  30. Data Store. To HA or not to HA. Later (blasphemy) or now? Considerations: revenue driver, in-app critical, user benefit, infrastructure cost, known use cases, build & operate. @datariver

  31. Data Store. APIs @datariver

  32. Data Capture. What do you really need? API, t, triggers, consumers, content, ditto, history, sharing, performance, scalability, security, HA. @datariver

  33. Analytics. What do you really need? API, t, content, ditto, performance, history, scalability, flexibility, consumers. @datariver

  34. Analytics. Experimentation & Personalization @datariver

  35. Data Lake. What do you really need? say ‘big data lake’ one more time! @datariver

  36. Evolving Architecture. Before you know it … @datariver

  37. [Architecture diagram.] Apps: direct content, in-app personalized content, personalized content, feedback data. APIs: API (compute), API (delivery), API (push), API (capture), API (analytics). Components: Event Log, raw data or features, RT train models, run models, Analytics, Model Building, Model Deployment; periodically re-run new models. **terribly incomplete, mildly inaccurate

  38. Not an Exact Blueprint

  39. Know this: non-trivial; no one-size-fits-all. Upfront: what do you really need? Know thy target architecture. As you embark … Do it! Working system in weeks; fast iterations – ship & test; interfaaaaaaaces!

  40. village model **not drawn to effort scale

  41. Software architecture is the next frontier! Fail fast still applies! Personalize your personalization platform! @datariver

  42. Better algorithms; more, better, smarter data; well designed software architectures (next frontier). @datariver

  43. A Brief Look at Anomaly Detection @datariver

  44. Applications: system health (servers, network); cyber-intrusion detection; enterprise anomaly detection; image processing; textual anomaly detection; sensor networks; fraud detection; medical anomaly detection; industrial damage detection; … @datariver

  45. Algorithms: supervised; unsupervised; generic statistical; information theory; … “What algorithms are you going to use?” @datariver

  46. Data. Low data volume: invest in data acquisition; invest in high coverage. High data volume: invest in defining signal; invest in labeling, tools, and crowdsourcing. @datariver

  47. Architectures Again. Data Collectors: Clickstream, User Input …; Real time, DBs … Labeling: Crowdsourcing; Active learning. Processors (M&A): broad: time bounded; deep: open ended. Capture, Labeling, Compute, run models. **check assumptions @datariver
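
As a concrete instance of the "generic statistical" family listed on the Algorithms slide, here is a minimal z-score detector in Python. The function name, the threshold value, and the sample readings are assumptions chosen for illustration; the talk does not prescribe a specific algorithm.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)   # population standard deviation
    if sigma == 0:
        return []            # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical server-latency readings with one obvious spike at index 6.
readings = [10, 11, 9, 10, 12, 10, 50]
print(zscore_anomalies(readings, threshold=2.0))   # [6]
```

Note that the outlier itself inflates the mean and standard deviation, which is why a lower threshold is used on this tiny sample; robust statistics such as the median would reduce that effect.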

  48. Advertising @datariver

  49. Music Streaming @datariver

  50. Medical Informatics @datariver

  51. Better algorithms; more, better, smarter data; well designed software architectures (next frontier). @datariver

  52. Thank you! Lucian Lita @datariver [always hiring] data@intuit.com @datariver

  53. Thank you! Lucian Lita @datariver [always hiring] data@intuit.com @datariver

  54. @datariver

  55. Extra Content @datariver

  56. Security. What do you really need? @datariver

  57. @datariver

  58. App. Who does the App talk to? (a) App; personalized content; API (retrieve) -- apply op logic -- retrieve pre-computed content. (b) App; dynamic data; personalized content; API (compute) -- retrieve static data -- apply op logic -- compute features -- run model -- log actions. @datariver
