
The Continuing Story Of Analytics at Optimizely: Batch, Streaming and Lambda Systems
Mike Borsuk, mike.borsuk@optimizely.com


  1. The Continuing Story Of Analytics at Optimizely: Batch, Streaming and Lambda Systems
     Mike Borsuk, mike.borsuk@optimizely.com

  2. About Optimizely
     Experiment Everywhere
       o Experimentation, Personalization, Recommendations
       o Web, Mobile, OTT, Full stack
     Data challenges
       o Billions of events per day received
       o Real-time results

  3. Overview
     o Background & Motivation
     o Real Time Stream Processing
     o What is Lambda Architecture and how/why we are implementing it

  4. Optimizely X Personalization

  5. Personalization data scale
     o 4.14B raw events received daily
     o Grouped into 10M distinct visitor sessions daily (stream processing w/Samza)
     o Calculating and serving back millions of time series data points

  6. Personalization data challenges
     o From a single A/B test per experiment to multiple targeted tests in a campaign
     o Longer running data collection / analysis
     o Need for session based metrics
     o Data schema designed for single A/B tests

  7. Personalization data scale
     o Mean response time (HBase) goes from milliseconds to nearly 30s

  8. Realtime Stream Processing
     Persist raw events
       o S3 buckets grouped by 24h UTC
     Fan out events into processing queues
       o Kafka topics for event types
     Session aggregation w/Samza
     Groups clickstream events into sessions
       o Per-visitor basis
       o Split on 30 minutes inactivity
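
The session rule on this slide (per visitor, split on 30 minutes of inactivity) can be sketched in a few lines. This is a minimal in-memory illustration, not the Samza job from the talk: the `(visitor_id, timestamp)` event shape, the function name `sessionize`, and the choice to split only when the gap is strictly greater than 30 minutes are all assumptions.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # 30 minutes of inactivity, as stated on the slide


def sessionize(events):
    """Group events into per-visitor sessions.

    `events` is an iterable of (visitor_id, unix_timestamp_seconds) tuples;
    this shape is an assumption made for the example.
    Returns {visitor_id: [[ts, ts, ...], [ts, ...], ...]}.
    """
    by_visitor = defaultdict(list)
    for visitor_id, ts in events:
        by_visitor[visitor_id].append(ts)

    sessions = {}
    for visitor_id, stamps in by_visitor.items():
        stamps.sort()
        visitor_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - visitor_sessions[-1][-1] > SESSION_GAP_SECONDS:
                # Gap exceeded: this event starts a new session.
                visitor_sessions.append([ts])
            else:
                visitor_sessions[-1].append(ts)
        sessions[visitor_id] = visitor_sessions
    return sessions
```

A streaming implementation would keep only the last-seen timestamp and the open session per visitor as state, rather than sorting all events up front.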

  9. Stream Processing Architecture
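
The two routing decisions from the previous slide (raw events grouped by 24h UTC, and one queue per event type) can be illustrated as pure functions. The key layout `raw-events/YYYY-MM-DD/` and the topic naming scheme `events.<type>` are hypothetical; the slides do not give the actual names.

```python
from datetime import datetime, timezone


def raw_event_prefix(timestamp_seconds):
    """Return a storage key prefix for the UTC day containing the event.

    The "raw-events/" layout is a made-up example, not the real bucket layout.
    """
    day = datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)
    return day.strftime("raw-events/%Y-%m-%d/")


def topic_for(event_type):
    """Map an event type to a topic name (naming scheme is hypothetical)."""
    return f"events.{event_type}"
```

Grouping by UTC day rather than local time keeps the partition boundary the same for every customer, which makes it simpler to reprocess a given day of raw events.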

  10. Lambda Architecture
      o Batch Layer
      o Serving Layer
      o Speed Layer

  11. Lambda Architecture

  12. Our Implementation of LA
      o Match schema to query patterns
      o Make time-series data “combinable” or at the same base granularity
      o Write data into HBase for locality at query time, “de-normalization”
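
One way to read "combinable" time series is that every producer writes counts into buckets of the same base width, so two series can be merged by adding matching buckets. The sketch below assumes a one-hour base granularity and additive metrics (counts or sums); both are assumptions, and the function names are illustrative.

```python
from collections import Counter

BASE_SECONDS = 3600  # assumed base granularity: one hour


def to_base_buckets(points):
    """Sum (timestamp_seconds, count) points into base-granularity buckets."""
    buckets = Counter()
    for ts, count in points:
        # Align each timestamp down to the start of its bucket.
        buckets[ts - ts % BASE_SECONDS] += count
    return buckets


def combine(*series):
    """Merge bucketed series by adding counts bucket by bucket."""
    total = Counter()
    for s in series:
        total.update(s)  # Counter.update adds counts instead of replacing them
    return dict(sorted(total.items()))
```

This only works for metrics that add up cleanly. A metric such as unique visitors cannot be combined by summing buckets and would need a different representation.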

  13. Our Implementation of LA
      o Immutable raw-event “source of truth”
      o Pre-computation batch jobs matching our real-time
      o Time range optimized real-time queries
      o Serving layer to merge batch + real-time
      o Done for performance, not accuracy
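
The serving-layer merge can be sketched as splitting the query time range at the point up to which batch results exist, reading each part from the matching layer, and joining the results. The names `batch_horizon`, `read_batch`, and `read_realtime` are hypothetical, and the readers are assumed to return `{bucket_start: value}` dictionaries for a half-open range `[start, end)`.

```python
def split_query_range(start, end, batch_horizon):
    """Split [start, end) into a batch-served part and a real-time part.

    `batch_horizon` is the (assumed) timestamp up to which pre-computed
    batch results are available. Either part may be None if it is empty.
    """
    batch_end = min(end, max(start, batch_horizon))
    batch_part = (start, batch_end) if batch_end > start else None
    speed_part = (batch_end, end) if end > batch_end else None
    return batch_part, speed_part


def serve(start, end, batch_horizon, read_batch, read_realtime):
    """Build one composite time series from the batch and real-time readers."""
    batch_part, speed_part = split_query_range(start, end, batch_horizon)
    result = {}
    if batch_part:
        result.update(read_batch(*batch_part))
    if speed_part:
        result.update(read_realtime(*speed_part))
    return dict(sorted(result.items()))
```

Because the two parts do not overlap, the merge is a plain union of buckets. The real-time query then only has to cover the short tail of the range, which is the sense in which the split is done for performance.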

  14. Adding Lambda Layers
      [Diagram; surviving label: Speed]

  15. Adding Lambda Layers
      [Diagram. Labels: Speed Layer, Realtime Computation, Batch Layer, Pre-computed Time Series, Serving Layer, Composite Time Series Result, query time range]

  16. Benefits we are seeing
      o Solving our query latency issues

  17. Benefits we are seeing
      o Flexibility
      o System Fault Tolerance
      o Human Fault Tolerance

  18. Drawbacks we are seeing
      o Complexity in serving layer
      o Batch job management
      o Operational Burdens

  19. References
      o Big Data, book by Nathan Marz and James Warren
      o Optimizely engineering blog: https://medium.com/engineers-optimizely
      o Samza specific: Optimizely presentation at LinkedIn streaming meetup (https://youtu.be/p7hjrKyfQkc)
