

  1. Emily Robinson @robinson_es A/B Testing in the Wild

  2. Disclaimer: This talk represents my own views, not those of Etsy

  3. Overview. INTRODUCTION: A/B Testing, Etsy. CHALLENGES & LESSONS: Business, Statistical.

  4. Etsy

  5. Etsy is a global creative commerce platform. We build markets, services and economic opportunity for creative entrepreneurs.

  6. Our Items

  7. By The Numbers: 1.8M active sellers (as of March 31, 2017) · 29.7M active buyers (as of March 31, 2017) · $2.84B annual GMS (in 2016) · 45+M items for sale (as of March 31, 2017). Photo by Kirsty-Lyn Jameson

  8. A/B Testing

  9. What is A/B Testing?

  10. Old Experience

  11. New Feature

  12. A/B Testing: It’s Everywhere

  13. Highly Researched

  14. My Perspective: Millions of Visitors Daily · Data Pipeline · Engineering Set-Up

  15. Generating numbers is easy; generating numbers you should trust is hard!

  16. Why Statistics Anyway? • “Election surveys are done with a few thousand people” [1] • Targeting small effects • A 0.5% relative change in conversion rate (e.g. 6% to 6.03%) on a high-traffic page can be worth millions of dollars annually. [1] Online Experimentation at Microsoft
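To make the arithmetic on this slide concrete, here is a back-of-the-envelope sketch in Python; the traffic and order-value figures are invented for illustration and are not Etsy numbers:

```python
# Back-of-the-envelope sketch of why a 0.5% relative lift matters.
# All inputs below are made-up illustration numbers, not Etsy figures.
annual_visits = 500_000_000      # hypothetical traffic to the page
conversion_rate = 0.06           # baseline: 6% of visits convert
avg_order_value = 25.00          # hypothetical average order value

baseline_revenue = annual_visits * conversion_rate * avg_order_value
lifted_revenue = annual_visits * (conversion_rate * 1.005) * avg_order_value

# 500M visits * (6.03% - 6.00%) * $25 = $3,750,000 per year
print(f"Incremental revenue: ${lifted_revenue - baseline_revenue:,.0f}")
```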

  17. Example Experiment

  18. Listing Card Experiment

  19. Result 👏

  20. Listing Card Experiment: Redux 👎 🎊 👏 💰

  21. Statistical Challenges

  22. Level of Analysis. Visit: activity by a browser over a defined time period (30 minutes). Browser: cookie or device ID (for apps). User: signed-in user ID.

  23. Browser vs Visit: An Example (“I really want my own lightsaber”)

  24. Next Day

  25. Pros and Cons. Visit level: tighter attribution (pro), but it violates the independence assumption and has cannibalization potential (cons). Browser level: captures relevant later behavior (pro), but it introduces noise and misses multiple events for proportion metrics (cons). Our conclusion: offer both; browser is generally better.
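A minimal sketch of what the visit-vs-browser choice means in practice, using a toy event log; the schema (browser_id, visit_id, converted) and the pandas roll-up are illustrative assumptions, not Etsy's pipeline:

```python
import pandas as pd

# Toy event log: browser b1 converts in its second visit, b3 in its third.
events = pd.DataFrame({
    "browser_id": ["b1", "b1", "b2", "b3", "b3", "b3"],
    "visit_id":   ["v1", "v2", "v3", "v4", "v5", "v6"],
    "converted":  [0, 1, 0, 0, 0, 1],
})

# Visit-level conversion rate: each visit counts once.
visit_rate = events.groupby("visit_id")["converted"].max().mean()

# Browser-level conversion rate: a browser converts if any of its visits
# did, so later-visit behavior is captured but repeat events collapse to one.
browser_rate = events.groupby("browser_id")["converted"].max().mean()

print(visit_rate, browser_rate)  # 2/6 of visits vs 2/3 of browsers
```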

  26. GMS per User • Generally this is the key metric • But it has a very badly behaved distribution: highly skewed and strictly non-negative, so a t-test is unreliable, and with many zeros, so a log transform is impossible
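A small illustration of how badly behaved a GMS-per-user-style metric can be, using a made-up zero-inflated lognormal stand-in; the mixture weight and distribution parameters are invented, not Etsy data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for GMS per user: most users spend nothing, and the
# spenders follow a heavy-tailed (lognormal) distribution.
n = 100_000
spends = np.where(rng.random(n) < 0.9,                      # ~90% zeros
                  0.0,
                  rng.lognormal(mean=3.0, sigma=1.5, size=n))

print(f"share of zeros: {(spends == 0).mean():.2f}")
print(f"mean: {spends.mean():.2f}, median: {np.median(spends):.2f}")
# The mean sits far above the median (heavy right skew), and
# np.log(spends) would yield -inf for every zero, which is why a
# simple log transform is off the table.
```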

  27. ACBV/ACVV

  28. Definitions • Power: the probability that, if there is an effect of a certain magnitude, we will detect it • Bootstrap: random resampling with replacement • Simulation: modeling random events
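A minimal sketch of the bootstrap idea from the definitions above, applied to a difference in means on skewed data; this is an illustration of the general technique, not Etsy's implementation:

```python
import numpy as np

def bootstrap_diff_in_means(control, treatment, n_boot=10_000, seed=0):
    """Bootstrap CI for the difference in means: resample each group
    with replacement and recompute the statistic many times."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        c = rng.choice(control, size=len(control), replace=True)
        t = rng.choice(treatment, size=len(treatment), replace=True)
        diffs[i] = t.mean() - c.mean()
    # Percentile-method 95% confidence interval
    return np.percentile(diffs, [2.5, 97.5])

# Usage on made-up skewed spend data:
rng = np.random.default_rng(1)
control = rng.lognormal(3.0, 1.5, 5_000)
treatment = rng.lognormal(3.02, 1.5, 5_000)
print(bootstrap_diff_in_means(control, treatment))
```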

  29. Test Selection Process: Estimate Power of Different Tests → Take Real Experiment

  30. Estimating Power

  31. Test Selection Process: Estimate Power of Different Tests → Estimate Power for Different Effect Sizes → Find Best Simulation Method → Take Real Experiment

  32. Simulation Method Comparison

  33. Estimating Power

  34. Power at 1% Increase in ACBV
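One hedged way to read the "power at a 1% increase" idea: estimate power by simulation, drawing many synthetic experiments with a known 1% relative lift and counting how often a test detects it. The lognormal stand-in, the sample sizes, and the choice of the Mann-Whitney test are all assumptions for illustration, not the talk's actual setup:

```python
import numpy as np
from scipy import stats

def simulated_power(effect=0.01, n=5_000, n_sims=500, alpha=0.05, seed=0):
    """Estimate power by simulation: draw synthetic experiments with a
    known relative lift and count how often the test flags it."""
    rng = np.random.default_rng(seed)
    detections = 0
    for _ in range(n_sims):
        control = rng.lognormal(3.0, 1.5, n)                   # made-up spend data
        treatment = rng.lognormal(3.0, 1.5, n) * (1 + effect)  # true 1% lift
        _, p = stats.mannwhitneyu(control, treatment, alternative="two-sided")
        detections += p < alpha
    return detections / n_sims

print(simulated_power())  # fraction of simulated experiments that detect the lift
```

Swapping in a different test statistic and rerunning gives the "power of different tests" comparison from the process diagram.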

  35. Business Challenges

  36. Working with Teams

  37. Proactive Communication. Demonstrate value: prioritization, feasibility, sequencing. Develop relationship: understand teammates. Early involvement: no post-mortems.

  38. Dealing with Ad Hoc Questions. Question: “What’s the conversion rate of visitors in Estonia on Saturday looking in the wedding category?” First response: “What decision are you using this for?”

  39. Helps Avoid This

  40. Checks Translation

  41. “We often joke that our job … is to tell our clients that their new baby is ugly” (Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained)

  42. Business Partners & Experiments • Financial and emotional investment • Inaccurate expectations: features are built because the team believes they’re useful, but experiment success rates across the industry are under 50% (sometimes far under)

  43. Peeking Question: “What do the results mean?” Answer: “It’s been up for 15 minutes…”

  44. Daily Experiment Updates: offers interpretation · shows you’re monitoring (*this is a made-up example)

  45. Want Fast Decision Making

  46. Cost of Peeking: a nominal 5% false positive rate (FPR) can inflate to 20%!
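The 5%-to-20% figure can be reproduced in spirit with a quick simulation: run many A/A tests (no true effect at all), peek at the p-value daily, and stop at the first "significant" result. The data model, peek schedule, and sample sizes here are made up for illustration:

```python
import numpy as np
from scipy import stats

def peeks_to_false_positive(n_sims=500, n_per_day=500, days=14,
                            alpha=0.05, seed=0):
    """Simulate A/A tests (identical groups), peek once a day, and stop
    at the first 'significant' p-value. The fraction of runs that ever
    look significant is the effective false positive rate."""
    rng = np.random.default_rng(seed)
    false_pos = 0
    for _ in range(n_sims):
        a = rng.normal(0, 1, n_per_day * days)
        b = rng.normal(0, 1, n_per_day * days)
        for day in range(1, days + 1):
            n = day * n_per_day
            _, p = stats.ttest_ind(a[:n], b[:n])
            if p < alpha:
                false_pos += 1
                break
    return false_pos / n_sims

print(peeks_to_false_positive())  # well above the nominal 5%
```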

  47. Solution 1: Adjust the P-Value Threshold. Pro: easy to interpret. Con: not rigorous.

  48. Solution 2: “Outlaw” Peeking. Pro: the statistically correct way. Con: you miss bugs.

  49. Solution 3: Continuous Monitoring. Pro: peek and stay rigorous. Con: complicated to implement and explain.
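Tying Solution 1 back to the peeking simulation above (this reuses the hypothetical peeks_to_false_positive sketch from the earlier block, and 0.005 is an arbitrary illustrative threshold, not a recommended value):

```python
# Solution 1 in the simulation: a stricter per-peek threshold pulls the
# effective false positive rate back down, at the cost of power and of
# being an ad hoc rather than rigorous correction.
print(peeks_to_false_positive(alpha=0.005))
```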

  50. And at the End of the Day … From Julia Evans, @b0rk “How to be a Wizard Programmer”

  51. Resources • Controlled Experiments on the Web: Survey and Practical Guide • Overlapping Experiment Infrastructure: More, Better, Faster Experimentation • From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks • What Works in E-commerce: A Meta-Analysis of 6,700 Online Experiments • Online Controlled Experiments at Large Scale • Online Experimentation at Microsoft • Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained

  52. Acknowledgments • Evan D’Agostini (for ACBV development & slides) • Jack Perkins & Anastasia Erbe (former & fellow search analysts) • Michael Berkowitz, Callie McRee, David Robinson, Bill Ulammandakh, & Dana Levin- Robinson (for presentation feedback) • Etsy Analytics team • Etsy Search UI & Search Ranking teams

  53. Thank You tiny.cc/abslides robinsones.github.io @robinson_es
