The Evolution of Spotify Home Architecture (PowerPoint presentation by Emily and Anil)

  1. The Evolution of Spotify Home Architecture

  2. Speakers: Emily, Anil. Roles: Staff Engineer, Data Engineer. Handles: @anilmuppallar, @emilymsa

  3. Our mission is to unlock the potential of human creativity — by giving a million creative artists the opportunity to live off their art and billions of fans the opportunity to enjoy and be inspired by it.

  4. Figure with labels: shelf name, shelf card

  5. Overview
     ● Started with a batch architecture
     ● Used services to hide complexity and be more reactive
     ● Leveraged GCP and added streaming pipelines to build a product based on user activity

  6. Batch 2016

  7. Batch diagram: Songs Played Logs, Word2Vec

  8. word2vec: a natural language processing model that learns vector representations of words (“embeddings”) from text. https://www.tensorflow.org/tutorials/word2vec

  9. word2vec. Input: playlists. Output: vector representations of tracks.
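
As a rough illustration of this slide, word2vec's skip-gram variant can treat each playlist as a sentence and each track as a word. It learns embeddings by predicting the tracks that appear near a given track. The sketch below only generates those (target, context) training pairs and omits the embedding training itself. The track IDs and window size are made up, and `skipgram_pairs` is a hypothetical helper, not Spotify's implementation.

```python
from typing import List, Tuple


def skipgram_pairs(playlists: List[List[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Generate skip-gram (target, context) pairs, treating each playlist as a
    sentence and each track ID as a word. Hypothetical helper for illustration."""
    pairs: List[Tuple[str, str]] = []
    for playlist in playlists:
        for i, target in enumerate(playlist):
            # Context = tracks at most `window` positions before or after the target.
            lo = max(0, i - window)
            hi = min(len(playlist), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((target, playlist[j]))
    return pairs


if __name__ == "__main__":
    # Made-up playlists of hypothetical track IDs.
    playlists = [["t1", "t2", "t3"], ["t2", "t4"]]
    print(skipgram_pairs(playlists, window=1))
    # [('t1', 't2'), ('t2', 't1'), ('t2', 't3'), ('t3', 't2'), ('t2', 't4'), ('t4', 't2')]
```

The resulting pairs are what a skip-gram model trains on. Tracks that often share playlists end up with similar vectors.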

  10. word2vec example. Input: playlists. Output: vector representations of tracks, illustrated with 2Pac, Mozart and Bach.
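
Once every track has a vector, similar tracks can be found by comparing vectors, commonly with cosine similarity. The sketch below ranks tracks by cosine similarity to a query track. The 3-dimensional vectors are hand-picked toy values; real embeddings are learned and typically have many more dimensions. The names only echo the labels on the slide. `cosine` and `most_similar` are hypothetical helpers.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(query: str, vectors: Dict[str, List[float]], k: int = 1) -> List[str]:
    """Return the k tracks whose vectors are closest (by cosine) to the query track."""
    q = vectors[query]
    others = [name for name in vectors if name != query]
    return sorted(others, key=lambda name: cosine(q, vectors[name]), reverse=True)[:k]


if __name__ == "__main__":
    # Toy, hand-picked vectors (hypothetical, not learned embeddings).
    vectors = {
        "Mozart": [0.9, 0.1, 0.0],
        "Bach": [0.8, 0.2, 0.1],
        "2Pac": [0.0, 0.1, 0.9],
    }
    print(most_similar("Mozart", vectors))  # ['Bach']
```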

  12. Batch diagram: Songs Played Logs, Hadoop Jobs, Word2Vec

  13. Batch diagram: Songs Played Logs, Hadoop Jobs, Word2Vec, Cassandra

  15. Batch diagram: Songs Played Logs, Hadoop Jobs, Word2Vec, Cassandra, CMS

  16. Batch diagram: Songs Played Logs, Hadoop Jobs, Word2Vec, Cassandra, CMS, Fetch Shelf for Home
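
One way to read the batch diagram is as two halves. A daily job precomputes shelves for every user and writes them to a key-value store, and Home only performs a lookup. The sketch below mimics that split with an in-memory dict standing in for Cassandra. `nightly_batch`, `fetch_shelf_for_home` and the `compute_shelf` callback are hypothetical placeholders, not the actual Hadoop jobs.

```python
from typing import Callable, Dict, List

# In-memory stand-in for the Cassandra table: user ID -> precomputed shelf.
Store = Dict[str, List[str]]


def nightly_batch(users: List[str], compute_shelf: Callable[[str], List[str]], store: Store) -> None:
    """Precompute a shelf for every user, active or not, and write it to the store.
    `compute_shelf` is a hypothetical placeholder for the Hadoop/word2vec jobs."""
    for user in users:
        store[user] = compute_shelf(user)


def fetch_shelf_for_home(user: str, store: Store) -> List[str]:
    """Request path: a single key lookup, which is why Home loads quickly."""
    return store.get(user, [])


if __name__ == "__main__":
    store: Store = {}
    nightly_batch(["u1", "u2"], lambda user: [f"mix-for-{user}"], store)
    print(fetch_shelf_for_home("u1", store))  # ['mix-for-u1']
```

The loop over all users is where the cost of computing recommendations for inactive users comes from.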

  17. Pros & Cons
     + Low latency to load Home
     + Fallback to old data if it fails to generate recommendations
     - Recommendations updated once every 24 hours
     - Calculate recommendations for every user, even if they aren’t active
     - Experimentation can be difficult
     - Operational overhead to maintain Cassandra and Hadoop
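
The "fallback to old data" property can come from building results into a new snapshot and only replacing the old one when the job succeeds. Below is a minimal sketch of that idea, assuming a hypothetical `compute_shelf` callback and dict snapshots. The slides do not describe how the real pipeline handled failures.

```python
from typing import Callable, Dict, List

Snapshot = Dict[str, List[str]]  # user ID -> recommendations


def run_daily_job(users: List[str], compute_shelf: Callable[[str], List[str]],
                  current: Snapshot) -> Snapshot:
    """Build a fresh snapshot for all users. If any step fails, return the
    current snapshot unchanged so Home keeps serving the previous day's data.
    `compute_shelf` is a hypothetical placeholder."""
    try:
        fresh = {user: compute_shelf(user) for user in users}
    except Exception:
        # Generation failed: fall back to the old recommendations.
        return current
    return fresh


if __name__ == "__main__":
    yesterday: Snapshot = {"u1": ["old-mix"]}

    def failing_compute(user: str) -> List[str]:
        raise RuntimeError("upstream logs missing")  # simulated job failure

    print(run_daily_job(["u1"], failing_compute, yesterday))  # {'u1': ['old-mix']}
```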

  20. Services 2017

  21. Services diagram: Songs Played Service, Word2Vec Service

  22. Services diagram: Songs Played Service, Word2Vec Service, CMS

  23. Services diagram: Songs Played Service, Word2Vec Service, CMS, Create Shelf for Home

  25. Services diagram: Songs Played Service, Word2Vec Service, CMS, Create Shelf for Home (x2)

  26. Services diagram: Songs Played Service, Word2Vec Service, CMS, Create Shelf for Home (x3)
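
In the services design, a shelf is assembled when Home is requested, by calling the backing services. The sketch below models that with injected functions. `songs_played` and `similar_tracks` are hypothetical stand-ins for the Songs Played Service and the Word2Vec Service, and the shelf logic is invented for illustration. Every call sits on the request path. An error in either call propagates to the caller, because there is no stored result to fall back to.

```python
from typing import Callable, List

# Hypothetical service interfaces: recent plays for a user, and similar tracks for a track.
SongsPlayed = Callable[[str], List[str]]
SimilarTracks = Callable[[str], List[str]]


def create_shelf_for_home(user: str, songs_played: SongsPlayed,
                          similar_tracks: SimilarTracks, size: int = 3) -> List[str]:
    """Assemble a shelf at request time from the user's recent plays.
    Service errors propagate: there is no stored shelf to fall back to."""
    recent = songs_played(user)  # call to the Songs Played Service
    shelf: List[str] = []
    for track in recent:
        for candidate in similar_tracks(track):  # one Word2Vec Service call per track
            if candidate not in recent and candidate not in shelf:
                shelf.append(candidate)
                if len(shelf) == size:
                    return shelf
    return shelf


if __name__ == "__main__":
    plays = {"u1": ["a", "b"]}
    similar = {"a": ["a", "c", "d"], "b": ["c", "e"]}
    print(create_shelf_for_home("u1", lambda u: plays[u], lambda t: similar[t]))  # ['c', 'd', 'e']
```

Only users who actually open Home trigger this work. The trade-off is that its latency is added to every Home load.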

  27. Pros & Cons
     + Updates recommendations at request time
     + Calculate recommendations for Home users only
     + Simplified stack
     + Easier to experiment
     + Google managed infrastructure
     - High latency to load Home
     - No fallback if request fails

  28. Streaming ++ Services 2018 - Present

  29. Streaming Pipelines
     ● Google Dataflow pipelines using Spotify Scio, a Scala wrapper on Apache Beam
     ● Real-time data: an unbounded stream of user events
        ○ All user events are available as Google Pub/Sub topics
     ● Perform aggregation operations using time-based windows
        ○ groupBy, countBy, join...
     ● Store the results
        ○ Pub/Sub, BigQuery, GCS, Bigtable
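
To make "aggregation operations using time-based windows" concrete, the sketch below assigns timestamped events to fixed, non-overlapping windows. It then counts events per user per window, roughly what a count over windowed data yields. It is plain Python over a finite list, not the Scio or Beam API. The event schema, the 60-second window and the name `count_per_window` are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event schema: (timestamp in seconds, user ID, event type).
Event = Tuple[int, str, str]


def count_per_window(events: Iterable[Event], window_secs: int = 60) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, user) using fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for ts, user, _event_type in events:
        window_start = ts - ts % window_secs  # align timestamp to the start of its window
        counts[(window_start, user)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [(5, "u1", "follow"), (30, "u1", "play"), (65, "u1", "play"), (70, "u2", "follow")]
    print(count_per_window(events))  # {(0, 'u1'): 2, (60, 'u1'): 1, (60, 'u2'): 1}
```

A real streaming runner works on an unbounded stream and emits each window's result once it decides the window is complete. Beam uses watermarks and triggers for this. The sketch ignores late data entirely.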

  30. Real-time Signals diagram: follow

  31. Real-time Signals diagram: follow, pubsub (x3)

  32. Real-time Signals diagram: follow, pubsub (x3), Streaming Pipeline

  33. Real-time Signals diagram: follow, pubsub (x4), Streaming Pipeline

  34. Real-time Signals diagram: follow, pubsub, Streaming Pipeline, Create Shelves

  36. Real-time Signals diagram: follow, pubsub, Streaming Pipeline, Create Shelves, Songs Played Service, Word2Vec Service

  37. Real-time Signals diagram: follow, pubsub, Streaming Pipeline, Create Shelves, Songs Played Service, Word2Vec Service, Write (x2), Shelf, BT (Bigtable, x2), Fetch Shelf

  38. Real-time Signals diagram: follow, pubsub, Streaming Pipeline, Create Shelves, Songs Played Service, Word2Vec Service, Write (x2), Shelf, BT (Bigtable, x2), Fetch Shelf, CMS
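
Read as a data flow, the final diagram suggests three steps. The streaming pipeline computes shelves when events such as a follow arrive, writes them to Bigtable, and Fetch Shelf only reads them when Home loads. The sketch below models one event with a dict in place of Bigtable. `handle_event`, `fetch_shelf`, the `user_id` event field and the `build_shelf` callback are hypothetical.

```python
from typing import Callable, Dict, List

Shelf = List[str]
Table = Dict[str, Shelf]  # stand-in for a Bigtable table keyed by user ID


def handle_event(event: Dict[str, str], build_shelf: Callable[[str], Shelf], table: Table) -> None:
    """Streaming step: on a user event (e.g. a follow), rebuild that user's shelf
    and write it to the table, outside the Home request path."""
    user = event["user_id"]
    shelf = build_shelf(user)  # may call Songs Played / Word2Vec services
    table[user] = shelf        # write happens only if building succeeded


def fetch_shelf(user: str, table: Table) -> Shelf:
    """Home request path: read whatever was last written for this user."""
    return table.get(user, [])


if __name__ == "__main__":
    table: Table = {"u1": ["previous-shelf"]}
    handle_event({"user_id": "u1", "type": "follow"}, lambda user: [f"fresh-for-{user}"], table)
    print(fetch_shelf("u1", table))  # ['fresh-for-u1']
```

The write happens only after `build_shelf` returns, so a failure leaves the previously written shelf in place. This is one way to get the "fallback to previously generated recommendations" that slide 39 lists.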

  39. Pros & Cons
     + Updates recommendations based on user events
     + Computing recommendations out of request path
     + Guardrails
     + Fresher content, driven by user sessions
     + Fallback to previously generated recommendations
     + Easy to experiment
     - More complex stack
     - More tuning in the system
     - Event spikes
     - Debugging is more complicated

  40. Lessons Learned
     ● Batch
        + Fallback to old recommendations
        + Low latency to load Home
        - Updates are slow
     ● Services
        + Updates are fast
        - High latency to load Home
        - No fallback if request fails
     ● Streaming ++ Services
        + Updates are frequent/fast
        + Low latency to load Home
        + Fallback to old recommendations
        - Balance computation frequency and downstream system load
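
The slides mention event spikes, guardrails and balancing computation frequency against downstream load, but give no details. One simple guardrail, assumed here purely for illustration, is to recompute a given user's shelf at most once per interval. A burst of events from that user then does not turn into a burst of recomputations and writes. `RecomputeThrottle` is a hypothetical name.

```python
from typing import Dict


class RecomputeThrottle:
    """Allow at most one shelf recomputation per user per `min_interval` seconds.
    Hypothetical guardrail; the talk does not say which guardrails were used."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self.last_run: Dict[str, float] = {}  # user ID -> time of last recomputation

    def should_recompute(self, user: str, now: float) -> bool:
        last = self.last_run.get(user)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: skip and keep serving the existing shelf
        self.last_run[user] = now
        return True


if __name__ == "__main__":
    throttle = RecomputeThrottle(min_interval=60)
    # A spike of events from u1 at t=0, 10, 20 triggers only one recomputation.
    print([throttle.should_recompute("u1", t) for t in (0, 10, 20)])  # [True, False, False]
```

In a real pipeline this per-user state would need to live in per-key pipeline state or an external store rather than a Python dict.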

  42. Takeaways
     ● Less overhead with managed infrastructure. Focus more on product.
     ● If you care about timeliness, then adopt streaming pipelines.
        ○ Beware of event spikes.
     ● Optimize for developer productivity and ease of experimentation.
        ○ Creating a new shelf is as simple as writing a new function.
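
The last takeaway, that creating a new shelf is as simple as writing a new function, suggests registering shelves as plain functions. The sketch below is one hypothetical way to express that in Python. The `shelf` decorator, `UserContext` and the two example shelves are invented for illustration and are not Spotify's code. Each registered name plays the role of a shelf name, and each returned item plays the role of a shelf card.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UserContext:
    """Hypothetical per-user signals available to shelf functions."""
    recently_played: List[str] = field(default_factory=list)
    followed_artists: List[str] = field(default_factory=list)


ShelfFn = Callable[[UserContext], List[str]]
SHELVES: Dict[str, ShelfFn] = {}  # shelf name -> function producing shelf cards


def shelf(name: str) -> Callable[[ShelfFn], ShelfFn]:
    """Decorator that registers a function as a shelf under the given name."""
    def register(fn: ShelfFn) -> ShelfFn:
        SHELVES[name] = fn
        return fn
    return register


@shelf("Recently played")
def recently_played(ctx: UserContext) -> List[str]:
    return ctx.recently_played[:10]


@shelf("Artists you follow")
def artists_you_follow(ctx: UserContext) -> List[str]:
    return ctx.followed_artists[:10]


def build_home(ctx: UserContext) -> Dict[str, List[str]]:
    """Run every registered shelf and drop shelves with no cards."""
    home = {name: fn(ctx) for name, fn in SHELVES.items()}
    return {name: cards for name, cards in home.items() if cards}


if __name__ == "__main__":
    ctx = UserContext(recently_played=["t1", "t2"])
    print(build_home(ctx))  # {'Recently played': ['t1', 't2']}
```

Adding a shelf means adding one decorated function. The code that assembles Home does not change.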

  43. Hi! I’m Luna. Any questions?
