Scaling Data Products Under Startup Constraints: A Case Study of ML Bias Testing

Scaling Data Products Under Startup Constraints: A Case Study of ML Bias Testing. Edwin Ong (@edwin), Co-Founder, TinyData. Founded CastTV (acquired by Tribune).

  1. Scaling Data Products Under Startup Constraints A Case Study of ML Bias Testing

  2. Scaling Data Products Under Startup Constraints A Case Study of ML Bias Testing

  3. Edwin Ong (@edwin) ● Co-Founder, TinyData ● Founded CastTV (acquired by Tribune) ● Founded FileFish (acquired by Oracle) ● Stanford Symbolic Systems

  4. TinyData ● Help other companies make data products ● Make our own data products

  5. Problem: Testing Machine Learning in Production ● Tools for machine learning testing in training ● Not as many tools for machine learning testing in production ● Different tools needed because ML testing is different from traditional software testing

  6. Traditional Software Has Deterministic Outcomes

  7. Traditional Software Has Deterministic Outcomes

  8. ML Has Probabilistic Outcomes ● Dog vs Muffin, given new user input

  9. ML Has Probabilistic Outcomes That Change Over Time ● Version 1: Muffin (59%) ● Version 2: Muffin (66%)
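One way to read the two slides above: a test for a probabilistic system usually asserts on a label plus a confidence threshold, not on exact equality. The sketch below is only an illustration. The `Prediction` type, the helper names, and the 0.60 threshold are assumptions made for this example; only the "Muffin" 59% and 66% values come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float  # probability in [0, 1]


def check_prediction(pred, expected_label, min_confidence):
    """Pass only if the label matches and the confidence clears the threshold."""
    return pred.label == expected_label and pred.confidence >= min_confidence


def confidence_shift(old, new):
    """Signed change in confidence between two model versions."""
    return round(new.confidence - old.confidence, 4)


# Values from the slide: the same input scored by two model versions.
v1 = Prediction("Muffin", 0.59)
v2 = Prediction("Muffin", 0.66)

# 0.60 is a hypothetical acceptance bar, chosen only for illustration.
print(check_prediction(v1, "Muffin", 0.60))
print(check_prediction(v2, "Muffin", 0.60))
print(confidence_shift(v1, v2))
```

With a fixed threshold, the same input can pass under one version and fail under another even though the label never changed, which is why exact-match assertions from traditional testing do not transfer directly.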

  10. ML Platforms Often End at Deploy ● New User Input ● Production Testing ● ML Chaos Engineering

  11. Requirements for Production ML Testing Tool: (1) “Entropy”: generation of new inputs against model servers; (2) recording of outputs from model servers; (3) feedback loop for additional training
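The three requirements on this slide can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the tool from the talk: `stub_model_server` is a hypothetical stand-in for a real model server, the perturbation list is invented, and using low confidence as the trigger for the feedback loop is one possible policy among many.

```python
import random


def generate_inputs(seed_inputs, n, rng):
    """Requirement 1 ("entropy"): derive new inputs by perturbing known seeds."""
    variations = ["long hair", "short hair", "makeup"]  # hypothetical perturbations
    return [f"{rng.choice(seed_inputs)} + {rng.choice(variations)}" for _ in range(n)]


def stub_model_server(item):
    """Hypothetical stand-in for a model server; returns (label, confidence)."""
    confidence = 0.5 + (len(item) % 50) / 100  # deterministic fake score
    label = "label_a" if len(item) % 2 == 0 else "label_b"
    return label, confidence


def run_production_test(seed_inputs, n, threshold, seed=0):
    rng = random.Random(seed)
    log = []      # requirement 2: recorded outputs
    retrain = []  # requirement 3: candidates for additional training
    for item in generate_inputs(seed_inputs, n, rng):
        label, confidence = stub_model_server(item)
        log.append({"input": item, "label": label, "confidence": confidence})
        if confidence < threshold:  # assumed policy: low confidence goes back to training
            retrain.append(item)
    return log, retrain


log, retrain = run_production_test(["portrait_a", "portrait_b"], n=10, threshold=0.7)
print(len(log), len(retrain))
```

Swapping `stub_model_server` for a call to a real endpoint is the only change the loop itself would need. The recording and feedback steps stay the same.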

  12. Challenges for Building as a Startup: (1) need access to non-toy model servers; (2) need access to generated data for testing model servers

  13. Access to Non-Toy Model Servers

  14. Non-Toy Model Servers: Commercial Cloud Services

  15. Commercial Image Recognition Services ● Opaque systems ● Object and scene detection, facial recognition, facial analysis, NSFW detection, text detection ● Facial analysis includes gender detection


  17. Testing Commercial Systems for Gender Bias ● Testing = Finding cases where trained systems fail ● Hypothesis: Gender labels are trained on traditional images ● What if we generate “non-traditional” images?
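If testing means finding cases where a trained system fails, one simple summary is the failure rate per input category. The sketch below assumes each test record is a `(category, expected_label, predicted_label)` tuple. The function name and the rows are placeholders written for this example and are not measurements from any service.

```python
from collections import defaultdict


def failure_rates(records):
    """Return {category: fraction of records where predicted != expected}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for category, expected, predicted in records:
        totals[category] += 1
        if predicted != expected:
            failures[category] += 1
    return {category: failures[category] / totals[category] for category in totals}


# Placeholder rows only, NOT results from a real system.
records = [
    ("man with long hair", "male", "female"),
    ("man with long hair", "male", "male"),
    ("woman with short hair", "female", "female"),
    ("woman with long hair", "female", "female"),
]
print(failure_rates(records))
```

Grouping by category is what makes the hypothesis testable: a rate that is clearly higher for the "non-traditional" categories than for the others would support it.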

  18. Training Data vs Test Data (panels: Training Data, Test Data)

  19. A Man with Long Hair

  20. A Man with Long Hair

  21. A Man with Long Hair

  22. A Woman with Short Hair

  23. A Woman with Short Hair

  24. A Woman with Short Hair

  25. A Woman with Short Hair

  26. A Woman with Short Hair

  27. Woman with Long Hair

  28. Woman with Long Hair

  29. “Facial Analysis”?

  30. Data Generation

  31. Data Generation

  32. Prototype Data

  33. Global Standard

  34. Data Generation

  35. Data Generation

  36. Data Generation

  37. Woman with Short Hair

  38. Woman with Short Hair

  39. Man with Long Hair

  40. Man with Long Hair

  41. Man with Long Hair

  42. Man with Long Hair

  43. Man with Makeup

  44. Man with Makeup

  45. Man with Makeup

  46. Man with Makeup

  47. Man with Makeup

  48. Man with Makeup

  49. Automating Data Generation + Testing

  50. Automating Data Generation + Testing

  51. Tracking Results Over Time
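Tracking results over time can be as simple as keeping every run's output and reporting inputs whose label changed between runs. The sketch below assumes a chronological list of `(run_id, input_id, label)` tuples. The function name, the IDs, and the sample history are hypothetical.

```python
def label_changes(history):
    """Find inputs whose label differs from the previous run that saw them.

    history: list of (run_id, input_id, label) in chronological order.
    Returns a list of (input_id, old_run, old_label, new_run, new_label).
    """
    last_seen = {}
    changes = []
    for run_id, input_id, label in history:
        if input_id in last_seen and last_seen[input_id][1] != label:
            old_run, old_label = last_seen[input_id]
            changes.append((input_id, old_run, old_label, run_id, label))
        last_seen[input_id] = (run_id, label)
    return changes


# Hypothetical history: two runs over the same two inputs.
history = [
    ("run1", "img_001", "Dog"),
    ("run1", "img_002", "Muffin"),
    ("run2", "img_001", "Muffin"),
    ("run2", "img_002", "Muffin"),
]
print(label_changes(history))
```

Because commercial services can deploy new versions without notice, re-running a fixed input set on a schedule and diffing the labels is one inexpensive way to notice a change.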

  52. Takeaways ● Even the best-trained commercial ML systems are far from perfect ● Systems return different results over time as new versions get deployed ● Testing is cumbersome and intractable without tools and automation

  53. Scaling Data Products as a Startup ● Bootstrap servers with commercial APIs ● Bootstrap data with open web, public & synthetic datasets ● Automation is startups’ best friend

  54. Questions / Comments ● Twitter: @edwin

