Music Recommendation in Spotify Boxun Zhang About me Data - PowerPoint PPT Presentation

Music Recommendation in Spotify Boxun Zhang

About me • Data scientist at Spotify • Big hype nowadays • Build models of user behavior • Develop algorithms • Design A/B tests • Ph.D. in CS from TU Delft (NL) • Studied user behavior in P2P systems • Interned at Spotify

Outline • Spotify basics • Machine learning at Spotify • Music recommendation • Collaborative filtering • Latent factor model • Approximate nearest neighbor search • Future work

Spotify basics • A popular music streaming service • 60M+ active users • 30M+ songs • 1.5B+ user-generated playlists • Multi-platform, now also on PlayStation • Available in 58 countries

Privacy • Private session 

Machine learning at Spotify • User segmentation • Churn/conversion prediction • Ads clicking • Automatic playlist generation • Related artists • Music recommendation

Music recommendation • Help users discover good music • Search: requires a lot of effort • Browse: good curated playlists, but not personalized • Discover: personalized recommendations • Not trivial given our large catalog and user base

Collaborative filtering • Predict user ratings of items • Popular strategy for recommender systems • Exploits user interactions with items such as songs or videos • Domain-free • Suffers from the cold start problem • Memory-based approach • Model-based approach

Latent factor model • Proved to be more effective in the Netflix Prize • How it works • Build a user-item interaction matrix [users, items] • Map user/item vectors to a latent factor space • The latent factor space should have a much lower dimensionality • Approximate users’ ratings using latent vectors
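As a rough illustration of the last bullet, the sketch below approximates a [users, items] rating matrix by dot products of low-dimensional vectors. The sizes, the random vectors, and the names `user_factors` and `item_factors` are assumptions made for the example, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 5, 8, 2   # toy sizes, chosen only for illustration

# Hypothetical latent vectors: one row per user and one row per item.
user_factors = rng.normal(size=(n_users, n_factors))
item_factors = rng.normal(size=(n_items, n_factors))

# The predicted rating of item i by user u is the dot product of their vectors,
# so the whole [users, items] matrix is a product of two thin matrices.
predicted = user_factors @ item_factors.T

u, i = 3, 6
single = float(user_factors[u] @ item_factors[i])
```

Storing the two factor matrices takes (users + items) × factors numbers instead of users × items, which is the point of choosing a low-dimensional latent space.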

From video to music • Implicit user feedback in Spotify • Binary rating of songs: 1 if streamed, otherwise 0 • Repetitive consumption • An ad-hoc weight on user rating

Compute latent vectors • Minimize the loss function below • $r_{ui}$: 1 if a track is streamed, otherwise 0 • $p_u$: user vector • $q_i$: item vector • $c_{ui} = 1 + \alpha \times \text{plays}_{ui}$: ad-hoc weight to account for repetitive consumption • $\lambda$: regularization penalty

$$\sum_{u,i} c_{ui} \left( r_{ui} - q_i^T p_u \right)^2 + \lambda \left( \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \right)$$
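The loss on this slide can be evaluated directly for small matrices. The following is a minimal sketch, assuming a dense play-count matrix `plays` and illustrative values for the weight coefficient `alpha` and the penalty `lam` (the talk does not state the values Spotify uses); the function name `weighted_loss` is hypothetical.

```python
import numpy as np

def weighted_loss(plays, P, Q, alpha, lam):
    """Weighted squared error plus an L2 penalty on all latent vectors."""
    r = (plays > 0).astype(float)       # r_ui: 1 if streamed, otherwise 0
    c = 1.0 + alpha * plays             # c_ui: weight grows with the play count
    err = r - P @ Q.T                   # r_ui - q_i^T p_u for every (u, i)
    penalty = lam * ((P ** 2).sum() + (Q ** 2).sum())
    return float((c * err ** 2).sum() + penalty)

# Toy data: 2 users, 2 tracks, 1 latent factor (all values are made up).
plays = np.array([[3, 0], [0, 1]])
P = np.array([[1.0], [0.0]])            # user vectors p_u
Q = np.array([[1.0], [0.0]])            # item vectors q_i
loss = weighted_loss(plays, P, Q, alpha=2.0, lam=0.5)
```

Here only the pair (user 1, track 1) is mispredicted; its weight is 1 + 2 × 1 = 3, and the penalty adds 0.5 × (1 + 1) = 1, so the loss is 4.0.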

Compute latent vectors, cont. • Alternating least squares • Cost function becomes quadratic when either the user factors or the item factors are fixed • Minimize the cost function iteratively until convergence • Linear run-time complexity in each iteration • Supports parallelization, e.g., in Hadoop • Spotify matrix • 40 latent factors • Computation converges within ~20 iterations (a few hours) • On our Hadoop cluster of ~1,300 nodes
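The alternating scheme can be sketched on a toy matrix as follows. This is a dense, single-machine illustration and not the Hadoop implementation described in the talk: it does not use the trick that gives linear run time per iteration, and the function names, the toy play counts, and the values of `alpha` and `lam` are assumptions.

```python
import numpy as np

def solve_side(fixed, r, c, lam):
    """With one side's factors fixed, each row is an independent weighted ridge regression."""
    k = fixed.shape[1]
    out = np.empty((r.shape[0], k))
    for row in range(r.shape[0]):
        weighted = fixed * c[row][:, None]          # apply the weights c to each fixed vector
        A = fixed.T @ weighted + lam * np.eye(k)    # normal-equation matrix
        b = weighted.T @ r[row]                     # normal-equation right-hand side
        out[row] = np.linalg.solve(A, b)
    return out

def loss(r, c, P, Q, lam):
    return float((c * (r - P @ Q.T) ** 2).sum() + lam * ((P ** 2).sum() + (Q ** 2).sum()))

def als(plays, n_factors=2, alpha=1.0, lam=0.1, n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    r = (plays > 0).astype(float)                   # binary ratings
    c = 1.0 + alpha * plays                         # ad-hoc weights
    P = rng.normal(scale=0.1, size=(plays.shape[0], n_factors))
    Q = rng.normal(scale=0.1, size=(plays.shape[1], n_factors))
    history = [loss(r, c, P, Q, lam)]
    for _ in range(n_iters):
        P = solve_side(Q, r, c, lam)                # fix items, update users
        Q = solve_side(P, r.T, c.T, lam)            # fix users, update items
        history.append(loss(r, c, P, Q, lam))
    return P, Q, history

plays = np.array([[5, 3, 0, 0], [4, 0, 0, 1], [0, 0, 6, 2], [0, 1, 4, 3]])
P, Q, history = als(plays)
```

Because each half-step solves its quadratic subproblem exactly, the recorded loss never increases from one iteration to the next. The rows inside `solve_side` are independent of each other, which is what makes the method easy to parallelize.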

The real reality • It’s not only the latent factor model • We use an ensemble model to approximate user ratings • Includes some other information

Find recommendations • There are 30M+ songs out there • 20K+ songs added every day • Brute-force? Too slow, and NOT cool! • Use (Approximate) Nearest Neighbor (ANN) search
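For reference, the brute-force baseline mentioned above amounts to scoring every item against the user vector. The sketch below assumes dot-product scoring on toy vectors; the function name `top_k_brute_force` and the data are hypothetical.

```python
import numpy as np

def top_k_brute_force(user_vec, item_factors, k):
    """Score every item with a dot product and return the indices of the k best."""
    scores = item_factors @ user_vec              # one score per item
    best = np.argpartition(-scores, k - 1)[:k]    # the k highest scores, unordered
    return best[np.argsort(-scores[best])]        # order those k by descending score

item_factors = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
user_vec = np.array([1.0, 0.2])
top2 = top_k_brute_force(user_vec, item_factors, k=2)
```

The cost is proportional to items × factors for every single user, which is what becomes too slow with a catalog of 30M+ songs.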

Annoy • Locality-sensitive hashing • Vectors close to each other remain close after being projected onto a lower-dimensional space or a hyperplane • Build a tree whose intermediate nodes are random hyperplanes • Nearby vectors are likely to be on the same side • Better approximation with several trees • Very fast queries www.github.com/spotify/annoy
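The tree idea can be sketched in a few lines of NumPy. This is a simplified illustration and not Annoy's own code: here each split uses a random Gaussian normal vector with the median projection as the threshold, which is an assumption of this sketch, and all function names and parameters are hypothetical.

```python
import numpy as np

def build_tree(vectors, ids, rng, leaf_size=16):
    """Recursively split the points with random hyperplanes until leaves are small."""
    if len(ids) <= leaf_size:
        return ids                                   # leaf: array of point indices
    normal = rng.normal(size=vectors.shape[1])       # random hyperplane direction
    proj = vectors[ids] @ normal
    offset = np.median(proj)                         # split the points roughly in half
    left, right = ids[proj <= offset], ids[proj > offset]
    if len(left) == 0 or len(right) == 0:            # degenerate split: stop here
        return ids
    return (normal, offset,
            build_tree(vectors, left, rng, leaf_size),
            build_tree(vectors, right, rng, leaf_size))

def leaf_for(tree, query):
    """Walk down one tree, choosing a side of each hyperplane, and return the leaf."""
    while isinstance(tree, tuple):
        normal, offset, left, right = tree
        tree = left if query @ normal <= offset else right
    return tree

def ann_query(trees, vectors, query, k):
    """Collect candidates from one leaf per tree, then rank them by exact distance."""
    candidates = np.unique(np.concatenate([leaf_for(t, query) for t in trees]))
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(500, 40))                 # 40 factors, as in the talk
all_ids = np.arange(len(vectors))
trees = [build_tree(vectors, all_ids, rng) for _ in range(10)]
neighbours = ann_query(trees, vectors, vectors[42], k=5)
```

A query only computes exact distances for the handful of points that share a leaf with it in some tree, which is why queries are fast. Using more trees enlarges the candidate set and so improves the approximation at the cost of memory and query time.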

Future work • Include bias and temporal patterns in the latent factor model • Improve evaluation of the recommender system • Echo Nest: Signal processing • Deep learning, maybe

Since two days ago • Not only music any more • Video • Podcast • News • Context-based recommendations • Running

Thank you
