DJ-MC: A Reinforcement Learning Agent for Music Playlist Recommendation
Elad Liebman, Maytal Saar-Tsechansky, Peter Stone
University of Texas at Austin
May 11, 2015
1 / 35
Background & Motivation
◮ Many Internet radio services (Pandora, last.fm, Jango, etc.)
◮ Some knowledge of single-song preferences
◮ No knowledge of preferences over a sequence
◮ ...but music is usually heard in the context of a sequence
◮ Key idea: learn a transition model for song sequences
◮ Use reinforcement learning
2 / 35
Overview
◮ Use real song data to obtain audio information
◮ Formulate the playlist recommendation problem as a Markov Decision Process
◮ Train an agent to adaptively learn song and transition preferences
◮ Plan ahead to choose the next song (like a human DJ)
◮ Our results show that sequence matters and that sequence preferences can be learned efficiently
3 / 35
Reinforcement Learning Framework
The adaptive playlist generation problem is an episodic Markov Decision Process (MDP) (S, A, P, R, T). For a finite song set M of n songs and playlists of length k:
◮ State space S: the entire ordered sequence of songs played so far, S = {(a_1, a_2, ..., a_i) | 1 ≤ i ≤ k; ∀j ≤ i, a_j ∈ M}.
◮ Action set A: the selection of the next song to play, i.e. A = M.
◮ S and A induce a deterministic transition function P: P((a_1, a_2, ..., a_i), a*) = (a_1, a_2, ..., a_i, a*).
◮ R(s, a): the utility the current listener derives from hearing song a when in state s.
◮ T = {(a_1, a_2, ..., a_k)}: the terminal states, i.e. the playlists of length k.
(A minimal code sketch of this formulation follows below.)
4 / 35
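To make the formulation concrete, here is a minimal Python sketch of the MDP components listed above. The class and method names (PlaylistMDP, transition, is_terminal) are illustrative assumptions, not code from the paper.

```python
from typing import Callable, List, Tuple

Song = int                  # a song is identified by its index in the library M
State = Tuple[Song, ...]    # a state is the ordered sequence of songs played so far

class PlaylistMDP:
    """Minimal sketch of the episodic playlist MDP (S, A, P, R, T)."""

    def __init__(self, library: List[Song], horizon_k: int,
                 reward_fn: Callable[[State, Song], float]):
        self.library = library      # the song set M; actions are songs from M
        self.horizon_k = horizon_k  # playlist length k
        self.reward_fn = reward_fn  # R(s, a): listener utility of song a in state s

    def actions(self, state: State) -> List[Song]:
        # Any song in M may be played next.
        return list(self.library)

    def transition(self, state: State, action: Song) -> State:
        # Deterministic transition P: append the chosen song to the history.
        return state + (action,)

    def reward(self, state: State, action: Song) -> float:
        return self.reward_fn(state, action)

    def is_terminal(self, state: State) -> bool:
        # Terminal states T are the playlists of length k.
        return len(state) >= self.horizon_k
```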
Song Descriptors
◮ Used a large archive, the Million Song Dataset (Bertin-Mahieux et al., 2011)
◮ Feature analysis and metadata provided by The Echo Nest
◮ 44,745 different artists, 10^6 songs
◮ Used features describing timbre (spectrum), rhythmic characteristics, pitch, and loudness
◮ 12 meta-features in total, of which 2 are 12-dimensional, resulting in a 34-dimensional feature vector
9 / 35
Song Representation
To obtain more compact state and action spaces, we represent each song as a vector of indicators marking the percentile bin for each individual descriptor:
10 / 35
Transition Representation
To obtain more compact state and action spaces, we likewise represent each transition as a vector of pairwise indicators marking the percentile-bin-to-percentile-bin transition for each individual descriptor (a code sketch of both representations follows below):
11 / 35
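As a rough illustration of these binned representations, the sketch below maps each of the 34 raw descriptors to a 10-percentile one-hot indicator (340 entries per song) and each transition to pairwise bin indicators (3,400 entries). The helper names and the use of NumPy are assumptions for illustration, not the paper's code.

```python
import numpy as np

def percentile_bin_edges(corpus_features: np.ndarray) -> np.ndarray:
    """Per-descriptor 10-percentile bin edges over the whole corpus.

    corpus_features: shape (num_songs, 34). Returns shape (34, 9):
    the 10th through 90th percentiles of each descriptor.
    """
    return np.percentile(corpus_features, np.arange(10, 100, 10), axis=0).T

def song_indicator(song: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """One-hot percentile-bin indicators for one song: 34 x 10, flattened to 340."""
    d = edges.shape[0]
    bins = np.array([np.searchsorted(edges[j], song[j]) for j in range(d)])
    out = np.zeros((d, 10))
    out[np.arange(d), bins] = 1.0
    return out.reshape(-1)

def transition_indicator(prev_song: np.ndarray, next_song: np.ndarray,
                         edges: np.ndarray) -> np.ndarray:
    """Pairwise bin-to-bin indicators for a transition: 34 x 10 x 10, flattened to 3400."""
    d = edges.shape[0]
    prev_bins = np.array([np.searchsorted(edges[j], prev_song[j]) for j in range(d)])
    next_bins = np.array([np.searchsorted(edges[j], next_song[j]) for j in range(d)])
    out = np.zeros((d, 10, 10))
    out[np.arange(d), prev_bins, next_bins] = 1.0
    return out.reshape(-1)
```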
Modeling the Reward Function
We make several simplifying assumptions:
◮ The reward function R corresponding to a listener can be factored as R(s, a) = R_s(a) + R_t(s, a).
◮ For each feature and each 10-percentile bin, the listener assigns a song reward
◮ For each feature and each percentile-to-percentile bin transition, the listener assigns a transition reward
◮ In other words, each listener internally assigns 3,740 weights (340 song weights plus 3,400 transition weights), which characterize a unique preference.
◮ Transitions are considered with respect to the entire listening history, stochastically (conditioning on the last song alone would make the reward signal non-Markovian)
◮ totalReward_t = R_s(a_t) + R_t((a_1, ..., a_{t-1}), a_t), where
  E[R_t((a_1, ..., a_{t-1}), a_t)] = Σ_{i=1}^{t-1} (1/i^2) · r_t(a_{t-i}, a_t)
(A code sketch of this listener model follows below.)
12 / 35
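A minimal sketch of this listener reward model, reusing the song_indicator and transition_indicator helpers from the previous sketch; the 1/i^2 history weighting follows the expectation above, and the function names are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def listener_total_reward(history, candidate, song_weights, transition_weights, edges):
    """totalReward_t = R_s(a_t) + E[R_t((a_1, ..., a_{t-1}), a_t)] for one simulated listener.

    song_weights has 340 entries and transition_weights has 3400 (3,740 in total).
    history is a list of 34-dimensional feature vectors; candidate is one such vector.
    """
    # Single-song component R_s(a_t): weights dotted with the song's percentile-bin indicators.
    r_song = float(song_weights @ song_indicator(candidate, edges))

    # Expected transition component: sum_{i=1}^{t-1} (1/i^2) * r_t(a_{t-i}, a_t),
    # where r_t is the weight vector dotted with the pairwise bin indicators.
    r_trans = 0.0
    for i in range(1, len(history) + 1):
        prev_song = history[-i]  # the song played i steps before the candidate
        r_pair = float(transition_weights @ transition_indicator(prev_song, candidate, edges))
        r_trans += r_pair / (i ** 2)
    return r_song + r_trans
```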
Expressiveness of the Model
◮ Does the model capture differences between distinct types of transition profiles? Yes
◮ Take the same pool of songs
◮ Compare songs appearing in their original sequence vs. the same songs in random order
◮ The resulting song-transition profiles are clearly different (19 of the 34 features are separable)
13 / 35
Learning Initial Models 14 / 35
Planning via Tree Search 15 / 35
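This slide presents the planning step only as a diagram. As a rough, hedged sketch of what a rollout-based tree-search planner for this MDP could look like (simulate random continuations of the playlist under the learned reward model and keep the first song of the best rollout), the code below is illustrative; the sampling scheme, function names, and parameters are assumptions, not the paper's exact algorithm.

```python
import random

def plan_next_song(history, candidate_songs, reward_model, horizon, num_rollouts=100):
    """Choose the next song by Monte-Carlo rollouts over possible playlist continuations.

    reward_model(history, song) -> float is the agent's learned estimate of R(s, a).
    Illustrative sketch only; not the paper's exact planning procedure.
    """
    best_first_song, best_value = None, float("-inf")
    for _ in range(num_rollouts):
        rollout_history = list(history)
        rollout_value = 0.0
        first_song = None
        for step in range(horizon):
            song = random.choice(candidate_songs)
            if step == 0:
                first_song = song                      # remember the immediate action
            rollout_value += reward_model(rollout_history, song)
            rollout_history.append(song)
        if rollout_value > best_value:                 # keep the best-scoring rollout
            best_value, best_first_song = rollout_value, first_song
    return best_first_song
```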
Full DJ-MC Architecture 16 / 35
Experimental Evaluation in Simulation
◮ Use real user-made playlists to model listeners
◮ Generate collections of random listeners based on these models
◮ Test the algorithm in simulation
◮ Compare to two baselines: random and greedy
◮ Greedy only tries to learn song rewards
17 / 35
Experimental Evaluation in Simulation
◮ The DJ-MC agent obtains more reward than an agent that greedily chooses the “best” next song
◮ The advantage is clearest in “cold start” scenarios
18 / 35
Experimental Evaluation on Human Listeners
◮ Simulation is useful, but human listeners are (far) more indicative
◮ Implemented a lab-experiment version with two variants: DJ-MC and Greedy
◮ 24 subjects interacted with Greedy (learns song preferences only)
◮ 23 subjects interacted with DJ-MC (also learns transitions)
◮ Each session spends 25 songs exploring randomly, then 25 songs exploiting (while still learning)
◮ Queried participants on whether they liked or disliked each song and each transition
19 / 35
Experimental Evaluation on Human Listeners
◮ To analyze the results and estimate reward distributions, we used bootstrap resampling (sketched below)
◮ DJ-MC gains substantially more reward (likes) for transitions
◮ Reward for the songs themselves is comparable across the two agents
◮ Interestingly, transition reward for Greedy is somewhat better than random
20 / 35
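Since the analysis relies on bootstrap resampling to estimate reward distributions from a modest number of participants, here is a minimal sketch of that procedure, assuming per-participant reward totals are available as a list; the function and parameter names are illustrative.

```python
import random

def bootstrap_mean_ci(samples, num_resamples=10000, alpha=0.05):
    """Bootstrap estimate of the mean and a (1 - alpha) confidence interval.

    samples: e.g. the total number of transition 'likes' per participant in one condition.
    """
    means = []
    for _ in range(num_resamples):
        resample = [random.choice(samples) for _ in samples]   # draw with replacement
        means.append(sum(resample) / len(resample))
    means.sort()
    lower = means[int((alpha / 2) * num_resamples)]
    upper = means[int((1 - alpha / 2) * num_resamples)]
    return sum(means) / num_resamples, (lower, upper)
```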
Experimental Evaluation on Human Listeners 21 / 35
Experimental Evaluation on Human Listeners 22 / 35
Related Work
◮ Chen et al., Playlist prediction via metric embedding, KDD 2012
◮ Aizenberg et al., Build your own music recommender by modeling internet radio streams, WWW 2012
◮ Zheleva et al., Statistical models of music-listening sessions in social media, WWW 2010
◮ McFee and Lanckriet, The Natural Language of Playlists, ISMIR 2011
23 / 35
Summary
◮ Sequence matters.
◮ Learning meaningful sequence preferences for songs is possible.
◮ A reinforcement-learning approach that models transition preferences outperforms a method that focuses on single-song preferences only, on actual human participants.
◮ Learning can be done online, with respect to a single listener, in reasonable time and without strong priors.
24 / 35
Questions? Thank you for listening! 25 / 35
A few words on representative selection 26 / 35