
Machine Learning Fast & Slow (PowerPoint presentation)



  1. Machine Learning Fast & Slow • Suman Deb Roy, Lead Data Scientist @ betaworks

  2. bot • /messaging • www.rundexter.com • www.poncho.is • www.digg.com • www.digg.com/messaging

  3. The Last 10% • Art & Science • Runway

  4. 1: Poncho • A weather cat that sends you personalized weather messages. • Algorithms + Humans • Not every feature in weather data is equally important: what's actionable?
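The slide asks which weather features actually matter. The deck does not say how Poncho answers this; one common, model-agnostic technique is permutation importance: shuffle one feature column and measure how much the prediction error rises. A minimal sketch, assuming a toy linear "rain alert" model; the feature names, the model, and the helper names (`mse`, `permutation_importance`, `predict`) are hypothetical and not Poncho's.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importance(predict, rows, y, n_features, seed=0):
    """Rise in MSE when one feature column is shuffled; bigger rise = more important."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    scores = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(mse(y, [predict(r) for r in permuted]) - baseline)
    return scores


# Hypothetical features: [precip_probability, humidity, pressure], scaled to [0, 1].
def predict(r):
    # Toy score: leans on precipitation, barely on humidity, ignores pressure.
    return 0.8 * r[0] + 0.1 * r[1]


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [predict(r) for r in rows]
scores = permutation_importance(predict, rows, y, n_features=3)
```

Here the pressure feature scores exactly zero and precipitation scores far above humidity, which is the kind of ranking that separates actionable features from noise. In practice the scores would be computed on held-out data with a trained model.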

  5. 2: Digg Trending • Ranked each day: – 10 million RSS feeds, 200 million tweets, 7.5 million new articles • m.me/digg

  6. 3: Digg Deeper

  7. 4: Instapaper’s InstaRank

  8. 5: Scale Model • Communities, Not Keywords

  9. MACHINE LEARNING WAS HARD. IT'S STILL HARD.

  10. Algorithms vs. Data [Figure; labels: Value of Historical Data; Prediction Error; Varied Distribution; Similarity between training & test distributions (less varied distribution); Impact of a more complex algorithm]
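The chart's horizontal axis is the similarity between training and test distributions, but the deck does not say how that similarity is measured. One standard per-feature measure (an assumption here, not the author's stated method) is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs. A minimal stdlib sketch; `ks_statistic` is a hypothetical name, and `scipy.stats.ks_2samp` computes the same statistic together with a p-value.

```python
from bisect import bisect_right


def ks_statistic(train, test):
    """Two-sample Kolmogorov-Smirnov statistic for one feature.

    Returns the largest gap between the two empirical CDFs: 0.0 when the samples
    have identical empirical distributions, 1.0 when every value in one sample
    lies below every value in the other.
    """
    a, b = sorted(train), sorted(test)
    gap = 0.0
    # The empirical CDFs only jump at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

A small statistic on every feature suggests the test data looks like the training data (the "less varied distribution" end of the axis).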

  11. Moving fast and slow • Fast: – Experience, Similar Problems, Pre-existing pipelines • Slow: – New type of data, Bootstrap, Scaling • Main challenge: – how to jump between states, when to change gears.

  12. [Figure: moving Fast vs. Slow; labels: Planned, Conscious, Unconscious, Fast, Slow]

  13. Effects of moving Fast • Technical debt? – Refactoring code – Improving unit tests – Deleting dead code – Reducing dependencies – Tightening APIs – Improving documentation

  14. Effects of moving Slow • Growth debt? – Waiting teammates – Uncertain quality assurance – Piling up further requests – Hypothesis might not be feedback-driven – Overthinking the solution

  15. Maintenance • Code level – How researchable, reusable, deployable • System level – Eroding abstraction boundaries • Data level – Data influences ML behavior.

  16. Data vs. Code Organization • Snapshotting: detects bias • Interface at the method level; be procedural – Makes it easy to execute portions of the code. • Separate hyper-arguments from parameters – Parameters: how your model is specified – Hyper-arguments: how your algorithm should run
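A minimal sketch of the slide's three ideas, assuming a toy linear model trained by gradient descent: the model specification (`ModelSpec`) and the run settings (`RunArgs`) live in separate objects, training is a plain function that can be called on its own, and each run is snapshotted with a small summary of its training labels so runs can be compared later. All names are hypothetical, and treating the L2 penalty as part of the model specification is a design assumption.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean


@dataclass(frozen=True)
class ModelSpec:
    """Parameters: how the model is specified."""
    n_features: int
    l2_penalty: float = 0.0


@dataclass(frozen=True)
class RunArgs:
    """Hyper-arguments: how the training algorithm should run."""
    learning_rate: float = 0.1
    epochs: int = 100


def train(spec, run, xs, ys):
    """Full-batch gradient descent for an L2-penalized linear model (no bias term)."""
    n = len(xs)
    w = [0.0] * spec.n_features
    for _ in range(run.epochs):
        grads = [spec.l2_penalty * wj for wj in w]
        for x, y in zip(xs, ys):
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j in range(spec.n_features):
                grads[j] += err * x[j] / n
        w = [wj - run.learning_rate * g for wj, g in zip(w, grads)]
    return w


def snapshot(spec, run, ys, weights):
    """JSON record of one run: config, a summary of the training labels, and the result.

    Comparing label summaries across snapshots is one cheap way to notice that
    the training data has become skewed between runs.
    """
    record = {
        "spec": asdict(spec),
        "run": asdict(run),
        "data": {"n": len(ys), "label_mean": mean(ys)},
        "weights": weights,
    }
    return json.dumps(record, sort_keys=True)


spec = ModelSpec(n_features=1)
run = RunArgs(learning_rate=0.1, epochs=200)
xs, ys = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
w = train(spec, run, xs, ys)
record = json.loads(snapshot(spec, run, ys, w))
```

Because `RunArgs` never touches the model's definition, you can rerun with a different learning rate or epoch count without editing `ModelSpec`, and the frozen dataclasses make each snapshot's configuration immutable.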

  17. Unstable APIs • Who owns the data stream? • Who owns the model? • Ownership by – the entire solution, or by expertise? DB? Pipelines? Algorithms? Stats? • Debugging? – Frozen versioning instead of continual versioning
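One way to read "frozen versioning instead of continual" is that each trained model gets an immutable version tag that consumers pin, rather than always reading whatever the latest model is. A minimal sketch under that assumption: the tag is a content hash of the model's parameters and config, and the registry refuses to change a published version. `model_version` and `FrozenRegistry` are hypothetical names.

```python
import hashlib
import json


def model_version(params, config):
    """Content hash of a model's parameters and config: same inputs, same tag."""
    payload = json.dumps({"params": params, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


class FrozenRegistry:
    """Append-only model store: a published version is never overwritten."""

    def __init__(self):
        self._models = {}

    def publish(self, params, config):
        version = model_version(params, config)
        if version not in self._models:
            # Store a deep copy so later edits to the caller's dicts can't change it.
            self._models[version] = json.loads(
                json.dumps({"params": params, "config": config})
            )
        return version

    def get(self, version):
        return self._models[version]


registry = FrozenRegistry()
v1 = registry.publish({"w": [1.0, 2.0]}, {"lr": 0.1})
v2 = registry.publish({"w": [1.0, 2.5]}, {"lr": 0.1})
```

When a bug report names the version it saw, the exact model behind that version can still be retrieved and debugged, even after newer models have shipped.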

  18. Feature Erosion • User behavior under a new model can make the current model's features unimportant • How can we detect this? • How can we prevent this?
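The slide leaves detection as an open question. One common monitoring proxy (an assumption, not the author's answer) is to compare each feature's distribution before and after a new model ships, for example with the Population Stability Index (PSI). PSI flags input shift, not lost importance directly, so it would typically be paired with recomputing feature importance on fresh data. A minimal sketch with hypothetical names and equal-width bins over the reference range.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a new sample.

    Uses equal-width bins over the reference range; 0.0 means identical bin
    shares, and larger values mean a bigger shift.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip out-of-range values into edge bins
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]      # feature values before the new model
shifted = [0.5 + i / 200 for i in range(100)]  # same feature after users adapt
```

Here `psi(reference, reference)` is zero, while the shifted sample, which piles into the upper half of the range, scores high. The alert threshold is a judgment call for each feature.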

  19. Predictor Variables • Myth: if you add a few more variables, the predictor will be better. • If the predictors have realistic priors, their coefficients can be appropriately shrunk (in expectation), so overfitting shouldn't be much of a problem.
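The second bullet can be made concrete for one standard case. Assuming a linear model with Gaussian noise and an independent zero-mean Gaussian prior on each coefficient (the slide does not specify which prior it means), the maximum a posteriori estimate is ridge regression, and a tighter prior (smaller $\tau$) gives a larger $\lambda$ and therefore stronger shrinkage toward zero:

```latex
\begin{aligned}
y &= X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \qquad \beta_j \sim \mathcal{N}(0, \tau^2) \\
-\log p(\beta \mid y) &= \frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert^2 + \frac{1}{2\tau^2}\,\lVert \beta \rVert^2 + \text{const} \\
\hat{\beta}_{\text{MAP}} &= \left(X^\top X + \lambda I\right)^{-1} X^\top y, \qquad \lambda = \sigma^2 / \tau^2
\end{aligned}
```

Adding a weak predictor under such a prior costs little. Unless the data pull its coefficient away from zero, the penalty keeps it small.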

  20. Visualizations • Any ML algorithm must be seen to be believed.

  21. Visualizations

  22. Research vs. Production • Collaboration looks very different depending on the end goals • Do you need to master git, or just get by? • How quickly can you move something from iPython to production grade?

  23. Even the best tools... • Let's talk about iPython notebooks: – Version control – Fragmented code is deadly for production-grade work. – Security issue: all those open ports – Code reviews and pull requests
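One common mitigation for the version-control and code-review problems (not something the deck prescribes) is to clear notebook outputs before committing, so diffs show only source changes. Dedicated tools such as nbstripout do this as a git filter. A minimal stdlib sketch that relies only on the fact that a `.ipynb` file is JSON, with code cells holding `outputs` and `execution_count`. `strip_outputs` is a hypothetical name.

```python
import json


def strip_outputs(notebook_text):
    """Clear outputs and execution counts from code cells in a .ipynb (nbformat 4) file."""
    nb = json.loads(notebook_text)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and indentation keep diffs small and reviewable.
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"


example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {"cell_type": "code", "metadata": {}, "source": ["print(1)"],
         "execution_count": 7,
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}]},
    ],
})
cleaned = json.loads(strip_outputs(example))
```

Markdown cells and code sources pass through unchanged. Only run-time state is dropped, so this does nothing for the fragmentation or open-port issues on the slide.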

  24. Heuristic Escape • “Heuristic is an algorithm in a clown suit. It’s less predictable, it’s more fun, and it comes without a 30-day, money-back guarantee.” (Steve McConnell, Code Complete)

  25. Domain of Impact • Most engineers and computer scientists conceptualize domains as primarily a rational, evidence-based, problem-solving enterprise focused on well-defined conditions. • But the real world is... more complex! • E.g., trending news algorithms

  26. Invention vs. Innovation • What is ML good at? Both? • Not thinking outside the box; instead, connecting the boxes. • Innovation = significant improvement by adjusting an ML method • Invention = a totally new ML method.

  27. Fitting ML into the betaworks model [Diagram labels: Research, Nexus, Company A, Company B, Company C, Product]

  28. Code & Data Residence • ML module transfer – Code transfer • Core module • Model-updating component • Analysis component – Data transfer • Infrastructure rebuild? • Performance • Maintenance
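The slide splits the code side of an ML module transfer into a core module, a model-updating component, and an analysis component. A minimal sketch of that split as three small interfaces, assuming a deliberately trivial model that predicts the mean of the labels it has seen. All class names are hypothetical, and the deck does not show betaworks' actual module layout.

```python
from abc import ABC, abstractmethod
from statistics import mean


class CoreModule(ABC):
    """What the receiving product calls at serving time."""

    @abstractmethod
    def predict(self, x): ...


class ModelUpdater(ABC):
    """Produces a new model from an old one plus new data."""

    @abstractmethod
    def update(self, model, new_data): ...


class Analyzer(ABC):
    """Reports on model quality; never changes the model."""

    @abstractmethod
    def report(self, model, data): ...


class MeanModel(CoreModule):
    def __init__(self, labels):
        self.labels = list(labels)

    def predict(self, x):
        return mean(self.labels)


class RefitUpdater(ModelUpdater):
    def update(self, model, new_data):
        # Return a fresh model instead of mutating the one currently deployed.
        return MeanModel(model.labels + [y for _, y in new_data])


class MAEAnalyzer(Analyzer):
    def report(self, model, data):
        return {"mae": mean(abs(model.predict(x) - y) for x, y in data), "n": len(data)}


old = MeanModel([1.0, 3.0])
new = RefitUpdater().update(old, [(None, 5.0)])
summary = MAEAnalyzer().report(new, [(None, 3.0), (None, 5.0)])
```

Keeping the three pieces apart means a company receiving the module can swap the updater or analysis code without touching what serves predictions.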

  29. Research-ready pipelines • Powered by deepNews

  30. Second-order analysis • Powered by deepNews + Scale Model

  31. Conversational Software

  32. HBI: HUMAN-BOT INTERCONNECTION

  33. [Figure: an axis running from ZERO automated solutions to MANY automated solutions; labeled items: Affective Computing, trending Digg topics, Digg Deeper, Topic Modeling, DBpedia, Freebase, APIs, Apps for transactional tasks]

  34. [Figure: an axis running from HIGH VALUE of historical data to LOW VALUE of historical data; labeled items: LSTM?, Tone Analyzer?, trending Digg topics, Digg Deeper, LDA, LSA, Freebase, DBpedia, APIs, Apps for transactional tasks]

  35. Data Types by Company • Digg has topic modeling/news data • Scale Model has social graph data • Poncho has weather data/editorialized personality • Giphy has GIFs (emotion++) • Instapaper has reading data • Dexter has hooks to APIs

  36. Transfer Learning • Yosinski et al., “How transferable are features in deep neural networks?”, NIPS 2014
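Yosinski et al. study how well features learned in each layer of a network trained on one task carry over to another task. The mechanics of reusing them can be sketched as follows: keep a transferred layer frozen and train only a new task-specific head on top. A minimal stdlib sketch, assuming hard-coded "pretrained" weights as a stand-in for a layer learned on a source task and a toy target task. Every name and number here is hypothetical. Fine-tuning, where the transferred layer is also updated, is the common alternative to freezing.

```python
import math
import random

# Hypothetical "pretrained" layer, standing in for weights learned on a source task.
# It is frozen: training below never touches these values.
FROZEN_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
FROZEN_B = [0.0, 0.0, -1.0]


def features(x):
    """Frozen transferred layer: ReLU(W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(FROZEN_W, FROZEN_B)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(head, bias, h):
    """New task-specific head: logistic regression on the frozen features."""
    return sigmoid(sum(w * hi for w, hi in zip(head, h)) + bias)


def log_loss(head, bias, feats, ys):
    eps = 1e-12
    total = 0.0
    for h, y in zip(feats, ys):
        p = min(max(predict(head, bias, h), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(ys)


def train_head(xs, ys, lr=0.5, epochs=1000):
    """Fit only the head by full-batch gradient descent; the feature layer stays fixed."""
    feats = [features(x) for x in xs]  # computed once because the layer is frozen
    head, bias, n = [0.0] * len(FROZEN_W), 0.0, len(xs)
    for _ in range(epochs):
        g_w, g_b = [0.0] * len(head), 0.0
        for h, y in zip(feats, ys):
            err = predict(head, bias, h) - y
            g_w = [g + err * hi / n for g, hi in zip(g_w, h)]
            g_b += err / n
        head = [w - lr * g for w, g in zip(head, g_w)]
        bias -= lr * g_b
    return head, bias, feats


# Target task (hypothetical): label points in the unit square by whether x0 + x1 > 1.
rng = random.Random(0)
xs = [[rng.random(), rng.random()] for _ in range(200)]
ys = [1 if x[0] + x[1] > 1 else 0 for x in xs]
head, bias, feats = train_head(xs, ys)
```

Only four numbers (three head weights and a bias) are learned for the new task, which is why reusing transferable features helps most when target-task data is scarce.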

  37. To Sum Up • Constraints on ML solutions occur at three levels: – Algorithmic – Data – Human • These constraints lead to several oscillating cycles of fast and slow ML impact • What's good for you?

  38. ML 2016 • Understood by few, hyped by some, revered by most. • Can be the difference between a company scaling and closing shop. • Almost every company can have at least one product feature powered by ML. • Be careful about bias in data.

  39. Suman Deb Roy suman@betaworks.com | @_roysd data.betaworks.com
