
Democratizing Metric Definition & Discovery at Airbnb, Lauren Chircus (PowerPoint presentation)



  1. Lauren Chircus / April 18, 2018 Democratizing Metric Definition & Discovery at Airbnb

  2. Changing the paradigm on metric management

  3. Does this metrics workflow look familiar? Schedule Airflow Job; A/B Testing Config; Add table to Superset; Create Table; Backfill Table; Create Dashboard; Tell consumers to update queries; Monitor Pipelines; Anomaly Detection Config

  4. Lauren Chircus. Company: Airbnb; Role: Product Manager; Previous Role: Data Scientist; Twitter: @lchircus; Fun Fact: This Airbnb near Salinas was my favorite

  5. You can change the paradigm! Global Metric Config; Create Dashboard

  6. Changing the metric management paradigm: 1. Airbnb’s journey 2. Why you should make dimensions first class citizens 3. Why prioritize bonus features early

  7. Airbnb’s Journey

  8. A plethora of tools for building & accessing data: Airflow, A/B testing, Anomaly Detection

  9. Strong, open source-based compute environment: Airflow, A/B testing, Anomaly Detection

  10. Consuming metrics was painful, too: ● Metrics weren’t reusable across tools -> discrepancies

  11. Consuming metrics was painful, too: ● Metrics weren’t reusable across tools -> discrepancies ● Metrics were hard to find

  12. Consuming metrics was painful, too: ● Metrics weren’t reusable across tools -> discrepancies ● Metrics were hard to find ● Required SQL knowledge or prepared dashboards

  13. Global Metrics Framework, alongside Airflow, A/B testing, and Anomaly Detection

  14. What is Global Metrics? “Global Metrics” is the concept that metrics should be defined in one place, have strong metadata, and be available wherever you need them.
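A minimal sketch of the "define once, use everywhere" idea, assuming a simple in-memory registry. `MetricDefinition`, `MetricRegistry`, and their fields are hypothetical names chosen for illustration, not Airbnb's implementation; the point is only that every consuming app resolves the same definition and metadata instead of keeping its own copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical metric record: logic location plus metadata."""
    name: str
    metric_source: str
    description: str
    owner: str
    dimensions: tuple[str, ...] = ()


class MetricRegistry:
    """Single place where metrics are defined; every consumer reads from here."""

    def __init__(self) -> None:
        self._metrics: dict[str, MetricDefinition] = {}

    def register(self, metric: MetricDefinition) -> None:
        # Refuse duplicates so two tools can never disagree on a definition.
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def get(self, name: str) -> MetricDefinition:
        return self._metrics[name]

    def search(self, text: str) -> list[str]:
        # Metadata makes metrics discoverable by name or description.
        text = text.lower()
        return sorted(
            m.name
            for m in self._metrics.values()
            if text in m.name.lower() or text in m.description.lower()
        )


registry = MetricRegistry()
registry.register(MetricDefinition(
    name="bookings",
    metric_source="bookings",
    description="Number of bookings made",
    owner="example-team",  # hypothetical owner
    dimensions=("dim_destination_china", "dim_origin_region", "dim_new_host"),
))

# An A/B testing tool and an anomaly detector both resolve the same definition.
assert registry.get("bookings") is registry.get("bookings")
print(registry.search("booking"))  # ['bookings']
```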

  15. Can we reuse existing infra? Global Metrics Framework = ML Feature Framework?

  16. The basic frameworks look similar: Logic & metadata store → Compute data → Consuming Apps

  17. ML: serve data to models. Logic & metadata store → Compute data → Consuming Apps (Search, Pricing, Fraud, ...)

  18. Metrics: serve data to apps. Logic & metadata store → Compute data → Consuming Apps (A/B testing, Anomaly Detection)

  19. Metrics are different from ML features. Metrics: ● Leverage as much information as possible ● Entirely offline ● Diverse metric types

  20. Metrics are different from ML features. Metrics: ● Leverage as much information as possible ● Entirely offline ● Diverse metric types. ML Features: ● Prevent data leakage to keep models clean ● Available online and offline ● Windowing functions
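To make the leakage contrast concrete, here is a sketch with hypothetical booking events and function names. A metric may aggregate everything known offline, while an ML feature computed for a prediction at time `as_of` must ignore events on or after `as_of`, and is often expressed over a trailing window.

```python
from datetime import date

# Hypothetical booking events: (guest_id, booking_date)
EVENTS = [
    ("g1", date(2018, 1, 5)),
    ("g1", date(2018, 2, 10)),
    ("g1", date(2018, 3, 15)),
    ("g2", date(2018, 2, 20)),
]


def metric_total_bookings(events):
    """Metric: use all available information, computed offline."""
    return len(events)


def feature_prior_bookings(events, guest_id, as_of):
    """ML feature: count only events strictly before `as_of` (no leakage)."""
    return sum(1 for g, d in events if g == guest_id and d < as_of)


def feature_bookings_in_window(events, guest_id, as_of, days):
    """Windowed ML feature: bookings in the `days` days before `as_of`."""
    return sum(
        1 for g, d in events
        if g == guest_id and 0 < (as_of - d).days <= days
    )


print(metric_total_bookings(EVENTS))
print(feature_prior_bookings(EVENTS, "g1", date(2018, 2, 10)))
print(feature_bookings_in_window(EVENTS, "g1", date(2018, 3, 15), 60))
```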

  21. Similar basics, different details: Global Metrics Framework ≠ ML Feature Framework

  22. Why dimensions are 1st class citizens

  23. Denormalization makes analytics speedy

  24. doesn’t allow joins:

      timestamp  shape   color   count
      12:00      square  yellow     23
      12:00      circle  yellow      2
      12:00      square  red        57
      12:00      circle  red       188
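The table above is already denormalized: every dimension is a column on the row. A sketch of why that makes rollups cheap, using the slide's four rows (the `rollup` helper is hypothetical): grouping by any dimension is a single pass over the rows, with no join.

```python
from collections import defaultdict

# Rows from the slide: (timestamp, shape, color, count)
COLUMNS = ("timestamp", "shape", "color", "count")
ROWS = [
    ("12:00", "square", "yellow", 23),
    ("12:00", "circle", "yellow", 2),
    ("12:00", "square", "red", 57),
    ("12:00", "circle", "red", 188),
]


def rollup(rows, dimension):
    """Sum `count` grouped by one dimension column.

    No join is needed because every dimension is already a column.
    """
    dim_idx = COLUMNS.index(dimension)
    count_idx = COLUMNS.index("count")
    totals = defaultdict(int)
    for row in rows:
        totals[row[dim_idx]] += row[count_idx]
    return dict(totals)


print(rollup(ROWS, "color"))  # {'yellow': 25, 'red': 245}
print(rollup(ROWS, "shape"))  # {'square': 80, 'circle': 190}
```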

  25. Many metrics are dimensional cuts. Company: Bookings

  26. Many metrics are dimensional cuts. Company: Bookings; Growth: First Time Bookings

  27. Many metrics are dimensional cuts. Company: Bookings; Growth: First Time Bookings; China: Bookings in China

  28. Many metrics are dimensional cuts. Company: Bookings; Growth: First Time Bookings; China: Bookings in China; Airbnb for Work: Business Trip Bookings

  29. Exploratory analysis across many dimensional cuts, e.g. Bookings in China from North America by host status
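A sketch of metrics as dimensional cuts over one denormalized bookings table. The rows and the `dim_first_time_guest` / `dim_business_trip` columns are hypothetical; `dim_destination_china`, `dim_origin_region`, and `dim_new_host` borrow names from the configs on slides 33-34. Each team's metric is a filter on the same base rows, and exploratory analysis adds a group-by.

```python
from collections import Counter

# Hypothetical denormalized booking rows: one row per booking,
# every dimension already a column. Values are illustrative only.
BOOKINGS = [
    {"dim_destination_china": True, "dim_origin_region": "north_america",
     "dim_new_host": True, "dim_first_time_guest": True, "dim_business_trip": False},
    {"dim_destination_china": True, "dim_origin_region": "north_america",
     "dim_new_host": False, "dim_first_time_guest": False, "dim_business_trip": True},
    {"dim_destination_china": True, "dim_origin_region": "europe",
     "dim_new_host": False, "dim_first_time_guest": True, "dim_business_trip": False},
    {"dim_destination_china": False, "dim_origin_region": "north_america",
     "dim_new_host": True, "dim_first_time_guest": False, "dim_business_trip": False},
    {"dim_destination_china": True, "dim_origin_region": "north_america",
     "dim_new_host": False, "dim_first_time_guest": False, "dim_business_trip": False},
]


def cut(rows, **filters):
    """A dimensional cut: the base metric's rows filtered on dimension values."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


def count_by(rows, dimension):
    """Exploratory breakdown of a cut by one more dimension."""
    return dict(Counter(r[dimension] for r in rows))


company_bookings = len(BOOKINGS)                                      # Company
first_time_bookings = len(cut(BOOKINGS, dim_first_time_guest=True))  # Growth
bookings_in_china = len(cut(BOOKINGS, dim_destination_china=True))   # China
business_trip_bookings = len(cut(BOOKINGS, dim_business_trip=True))  # Airbnb for Work

# Bookings in China from North America, by host status
by_host_status = count_by(
    cut(BOOKINGS, dim_destination_china=True, dim_origin_region="north_america"),
    "dim_new_host",
)
print(by_host_status)
```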

  30. Standard Star Schema: a fact table linked to three dimension tables, each by a Foreign Key

  31. Global Metrics Framework naming: a Metric Source linked to three Dimension Sources, each by a Subject
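A sketch of the star-schema join this naming implies, assuming each Metric Source row carries one key per Subject and each Dimension Source is a lookup from that Subject's key to dimension columns. All data and the `denormalize` helper are hypothetical; the output has the wide, join-free row shape of the table on slide 24.

```python
# Hypothetical Metric Source rows: one row per booking, one key per subject.
BOOKINGS = [
    {"booking_id": 1, "listing": "L1", "guest": "G1", "host": "H1", "nights": 3},
    {"booking_id": 2, "listing": "L2", "guest": "G2", "host": "H2", "nights": 1},
]

# Hypothetical Dimension Sources, keyed by the subject they join on:
# subject -> {subject key -> dimension columns}
DIMENSION_SOURCES = {
    "guest": {"G1": {"dim_origin_region": "north_america"},
              "G2": {"dim_origin_region": "europe"}},
    "listing": {"L1": {"dim_destination_china": True},
                "L2": {"dim_destination_china": False}},
    "host": {"H1": {"dim_new_host": True},
             "H2": {"dim_new_host": False}},
}


def denormalize(metric_rows, dimension_sources):
    """Left-join every metric row to each dimension source on its subject key.

    Unmatched keys simply contribute no dimension columns.
    """
    wide_rows = []
    for row in metric_rows:
        wide = dict(row)
        for subject, lookup in dimension_sources.items():
            wide.update(lookup.get(row[subject], {}))
        wide_rows.append(wide)
    return wide_rows


for wide in denormalize(BOOKINGS, DIMENSION_SOURCES):
    print(wide)
```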

  32. YAML configs instead of tables. Metric Source: Bookings.yaml; Dimension Sources: Origin_geo.yaml, Destination_geo.yaml, Host_status.yaml

  33. Data scientists list which dimensions to include:

      metric_source: bookings
      metrics:
        - bookings
        - nights
      subjects:
        - listing
        - guest
        - host
      dimensions:
        - dim_destination_china
        - dim_origin_region
        - dim_new_host

  34. Automatically joins to the relevant dimension sources:

      metric_source: bookings
      metrics:
        - bookings
        - nights
      subjects:
        - listing
        - guest
        - host
      dimensions:
        - dim_destination_china
        - dim_origin_region
        - dim_new_host

      dim_source: origin_geo
      dimensions:
        - dim_origin_region
      subject:
        - guest

      dim_source: destination_geo
      dimensions:
        - dim_destination_region
        - dim_destination_china
      subject:
        - listing

      dim_source: host_status
      dimensions:
        - dim_new_host
      subject:
        - host
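A sketch of how a join plan could be derived from these configs. The four configs are transcribed from the slide as Python dicts (to avoid a YAML dependency), with each single-item `subject` list flattened to a string; `plan_joins` is a hypothetical helper, not the framework's actual code. `dim_destination_region` is offered by `destination_geo` but not requested by `bookings`, so it stays out of the plan.

```python
METRIC_SOURCE = {
    "metric_source": "bookings",
    "metrics": ["bookings", "nights"],
    "subjects": ["listing", "guest", "host"],
    "dimensions": ["dim_destination_china", "dim_origin_region", "dim_new_host"],
}

DIM_SOURCES = [
    {"dim_source": "origin_geo", "dimensions": ["dim_origin_region"], "subject": "guest"},
    {"dim_source": "destination_geo",
     "dimensions": ["dim_destination_region", "dim_destination_china"], "subject": "listing"},
    {"dim_source": "host_status", "dimensions": ["dim_new_host"], "subject": "host"},
]


def plan_joins(metric_source, dim_sources):
    """Map each requested dimension to (dimension source, join subject)."""
    # Index which dimension source provides each dimension.
    provider = {}
    for ds in dim_sources:
        for dim in ds["dimensions"]:
            provider[dim] = (ds["dim_source"], ds["subject"])

    plan = {}
    for dim in metric_source["dimensions"]:
        if dim not in provider:
            raise KeyError(f"no dimension source provides {dim!r}")
        source, subject = provider[dim]
        # The metric source must carry the subject key to join on.
        if subject not in metric_source["subjects"]:
            raise ValueError(
                f"{metric_source['metric_source']} has no subject {subject!r} "
                f"to join {source} on"
            )
        plan[dim] = (source, subject)
    return plan


print(plan_joins(METRIC_SOURCE, DIM_SOURCES))
```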

  35. Bookings has hundreds of dimensions: Bookings in China, by returning users, from North America, by platform, by host status, for work

  36. Expensive dimensions: Bookings in China, by returning users, from North America, by platform, by host status, by Listing Lifetime Value

  37. Dimension sets give DS control over SLAs:

      metric_source: bookings
      metrics:
        - bookings
        - nights
      dimension_sets:
        china_dims:
          - dim_destination_china
          - dim_origin_region
        host_dims:
          - dim_new_host
          - dim_origin_region

  38. Dimension sets give DS control over SLAs:

      metric_source: bookings
      metrics:
        - bookings
        - nights
      dimension_sets:
        china_dims:
          - dim_destination_china
          - dim_origin_region
        host_dims:
          - dim_new_host
          - dim_origin_region

      Tables:

      table: bookings__china_dims
      columns:
        - bookings
        - nights
        - dim_destination_china
        - dim_origin_region

      table: bookings__host_dims
      columns:
        - bookings
        - nights
        - dim_new_host
        - dim_origin_region
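A sketch of the mapping between the config and the tables on this slide, assuming each dimension set becomes its own table named `<metric_source>__<set name>`, holding the metrics followed by that set's dimensions. `tables_for` is hypothetical. Splitting sets this way is presumably what gives control over SLAs: an expensive dimension only delays the table whose set includes it.

```python
CONFIG = {
    "metric_source": "bookings",
    "metrics": ["bookings", "nights"],
    "dimension_sets": {
        "china_dims": ["dim_destination_china", "dim_origin_region"],
        "host_dims": ["dim_new_host", "dim_origin_region"],
    },
}


def tables_for(config):
    """One output table per dimension set: <metric_source>__<set name>."""
    return {
        f"{config['metric_source']}__{set_name}": config["metrics"] + dims
        for set_name, dims in config["dimension_sets"].items()
    }


for table, columns in tables_for(CONFIG).items():
    print(table, columns)
```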

  39. Global Metrics Framework = Denormalization Machine. Super powerful for ad hoc analysis: Bookings in North America by new guests; Bookings by lifetime value from North America in China by host status

  40. Config-driven pipeline generation eliminates 3 steps. Global Metric Config; Schedule Airflow Job; A/B Testing Config; Add table to Superset; Create Table; Backfill Table; Create Dashboard; Tell consumers to update queries; Monitor Pipelines; Anomaly Detection Config

  41. Logic & metadata store → Compute data → Consuming Apps (A/B testing, Anomaly Detection)

  42. Serving data to apps eliminates 3 more steps. Global Metric Config; Schedule Airflow Job; A/B Testing Config; Add table to Superset; Create Table; Backfill Table; Create Dashboard; Tell consumers to update queries; Monitor Pipelines; Anomaly Detection Config

  43. Bonus features for data scientists drive love

  44. Free stuff: ● Automatic backfills when metrics or dimensions change

  45. Free stuff: ● Automatic backfills when metrics or dimensions change ● Self-healing when days are missed

  46. Free stuff: ● Automatic backfills when metrics or dimensions change ● Self-healing when days are missed ● Dashboard generation script
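The slides don't say how backfills are triggered or how missed days are detected. One plausible sketch, assuming daily partitions and that any config change invalidates history: fingerprint the config, recompute the whole range when the fingerprint changes, and otherwise fill only the days with no landed partition. All names here are hypothetical.

```python
import hashlib
import json
from datetime import date, timedelta


def config_fingerprint(config):
    """Stable hash of a metric config; key order does not matter."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


def missing_days(start, end, landed):
    """Days in [start, end] with no landed partition (self-healing candidates)."""
    days = []
    day = start
    while day <= end:
        if day not in landed:
            days.append(day)
        day += timedelta(days=1)
    return days


def days_to_compute(config, last_fingerprint, start, end, landed):
    """Full backfill if the config changed, otherwise only fill the gaps."""
    if config_fingerprint(config) != last_fingerprint:
        return missing_days(start, end, set())  # every day in the range
    return missing_days(start, end, landed)


cfg = {"metric_source": "bookings", "metrics": ["bookings", "nights"]}
landed = {date(2018, 4, 1), date(2018, 4, 2), date(2018, 4, 4)}
print(days_to_compute(cfg, config_fingerprint(cfg), date(2018, 4, 1), date(2018, 4, 5), landed))
```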

  47. Bonus features eliminate 2 more steps. Global Metric Config; Schedule Airflow Job; A/B Testing Config; Add table to Superset; Create Table; Backfill Table; Create Dashboard (semi-automated); Tell consumers to update queries; Monitor Pipelines; Anomaly Detection Config

  48. Old Data Science metric workflow took >2 weeks for simple changes: Schedule Airflow Job; A/B Testing Config; Add table to Superset; Create Table; Backfill Table; Create Dashboard; Tell consumers to update queries; Monitor Pipelines; Anomaly Detection Config

  49. New Data Science metric workflow takes <2 days: Global Metric Config; Create Dashboard (semi-automated)

  50. Focusing on producers drives love “It has dramatically reduced time to insight.”

  51. Focusing on producers drives love “It has dramatically reduced time to insight.” “In our current world, even simple changes are painful. With Global Metrics, most of it becomes trivial.”

  52. Focusing on producers drives love “It has dramatically reduced time to insight.” “In our current world, even simple changes are painful. With Global Metrics, most of it becomes trivial.” “You can put me in the satisfied customer quotes!”

  53. At the time of official launch (last week): Word-of-mouth adoption ● >20 teams contributing ● >350 metrics added ● Less-technical contributors (Finance)

  54. Changing the metric management paradigm: 1. Airbnb’s journey 2. Why you should make dimensions first class citizens 3. Why prioritize bonus features early

  55. Where to go from here?

  56. More features for metric consumers: ● Leverage metadata in Superset integration

  57. More features for metric consumers: ● Leverage metadata in Superset integration ● Make metrics more discoverable

  58. More features for metric consumers: ● Leverage metadata in Superset integration ● Make metrics more discoverable ● Metric certification process

  59. Open Source? Global Metrics Framework, alongside Airflow, A/B testing, and Anomaly Detection

  60. Questions? Twitter: @lchircus LinkedIn: linkedin.com/in/lchircus Email: lauren.chircus@airbnb.com
