Clova Music: An AI Assistant Like a Smart DJ. 김정명 (Adrian Kim), M.S., Clova AI Research (CLAIR), Naver Corp.
Clova: Cloud-based Virtual Assistant, a general-purpose AI platform
Clova: Cloud-based Virtual Assistant https://clova.ai
Clova Music • The biggest need on a smart speaker is MUSIC. Does that make the speaker a music listening platform?
Clova Music • Intelligent music recommendation service of Clova • Aims to be a human DJ-like curator • Powered with NAVER/LINE music user/content data
Contents • Part 1 Short Tutorial on Music modeling - What kind of data do we use? - What kind of models can we use? - What kind of problems can we solve? - Any industry research? • Part 2 Music Research in Clova - Recommendation Systems - Representation learning - Emotion recognition - Highlight extraction - Automatic DJ list generation
Introducing the Music Domain
Popular Domains...
Audio domain data +
Audio Domain Data: Wave • Basic data form is 16-bit integer; you can normalize to [-1, 1] • A 1D vector of samples at 16kHz, 22050Hz, ... • At 16kHz, 30 seconds = 480k datapoints! • Very information-inefficient
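A minimal sketch of the raw-wave representation described above, using synthetic 16-bit PCM samples in place of a real recording:

```python
import numpy as np

# 30 seconds of 16 kHz audio as raw 16-bit PCM samples (synthetic stand-in).
sr = 16000                      # sample rate (Hz)
duration = 30                   # seconds
pcm = np.random.randint(-2**15, 2**15, size=sr * duration, dtype=np.int16)

# Normalize 16-bit integers into the [-1, 1] float range.
wave = pcm.astype(np.float32) / 2**15

print(wave.shape)               # (480000,): 480k datapoints for just 30 seconds
```

Even at the relatively low 16 kHz rate, half a million values encode one short clip, which is the information inefficiency the slide points out.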
Audio Domain Data: Spectrograms • Expressive; carries more information!
Audio Domain Data: Mel-spectrograms • Reduce >1k frequency bins with mel filter banks down to 80, 96, or 128 mel bins
Audio Domain Data: Mel-spectrograms • Mel-spectrogram filter distributions give relative focus to lower frequency bins (image from Choi et al. '16)
Audio Domain Data: Transformation between data types
• wav (1323000,) → STFT → spectrogram (1025, 2584) = 2,648,600 values → mel filter bank → mel-spectrogram (128, 2584) = 330,752 values
• Going back is lossy:
- If complex spectrogram: inverse STFT
- If magnitude only: Griffin-Lim algorithm
- From mel-spectrogram: WaveNet vocoder (Shen et al. '17)
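A numpy-only sketch of the wav → spectrogram → mel-spectrogram pipeline with shapes close to those on the slide. In practice `librosa.stft` and `librosa.filters.mel` do this; the frame count here differs slightly from the slide's 2584 because librosa center-pads the signal:

```python
import numpy as np

# 30 s at 44.1 kHz with n_fft=2048, hop=512 roughly reproduces the slide's shapes.
sr, n_fft, hop, n_mels = 44100, 2048, 512, 128
wave = np.random.uniform(-1, 1, sr * 30).astype(np.float32)   # (1323000,)

# STFT magnitude: frame the signal, window each frame, take the real FFT.
n_frames = 1 + (len(wave) - n_fft) // hop
frames = np.stack([wave[i * hop:i * hop + n_fft] for i in range(n_frames)])
spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).T  # (1025, n_frames)

# Mel filter bank: triangular filters spaced evenly on the mel scale.
def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)

mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
fb = np.zeros((n_mels, n_fft // 2 + 1))
for i in range(n_mels):
    l, c, r = bins[i], bins[i + 1], bins[i + 2]
    fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
    fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge

mel_spec = fb @ spec    # (128, n_frames): ~8x smaller, but not invertible
print(spec.shape, mel_spec.shape)
```

Because `mel_spec` discards both phase and fine frequency resolution, reconstructing audio from it needs a learned model such as a WaveNet vocoder, whereas the magnitude `spec` can be approximately inverted with Griffin-Lim.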
Issues with Audio Data
• Too large: storage and memory problems
• Low efficiency: information per data point is very small
• Not much open data; what exists is often low quality and weakly labeled (Choi et al. 2017), with dirty labels
• Takes a lot of time to produce high-quality data
• Convoluted: multiple sources mixed together
• Must hear it to evaluate
Issues: Comparing Simple Tasks (MNIST vs. GTZAN)
• Storage: 45MB vs. 1.2GB
• Data pairs: 60,000 vs. 1,000 (30-second clips)
• Classes: 10 digits vs. 10 genres (100 each)
• Preprocessing: fast vs. slow
• Testing: easy vs. hard
Issues: Comparing Speech and Music
• Speech (e.g., news audio): short, single source
• Music (e.g., Bad Boy – Red Velvet): long, multiple sources
Example Baselines
What kind of problems can we solve?
• Genre/Artist Classification
• Automatic Tagging
• Music generation
• Style transfer
• Source separation
• Onset detection
• Sound embedding
• Beat tracking
• and more...!
(Diagram: convolution & pooling layers, channel summation, stacked LSTM layers with softmax attention, and an attention-weighted LSTM output via element-wise multiplication)
Autotagging with Convnets • Input: mel-spectrogram (MSD dataset) • Output: tags (top 50 tags) • Stacked 2D convs https://github.com/keunwoochoi/music-auto_tagging-keras Automatic tagging using deep convolutional neural networks, Choi et al., ISMIR '16
Note: Filter design in CNNs
• 2D convs: n×m filters on 1 channel; slower training; assumes local structure in frequency
• 1D convs: n×1 filters with m channels (frequency bins treated as channels); faster training; treats frequencies as discrete
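The trade-off above can be made concrete by counting weights; the filter counts and sizes below are illustrative assumptions, not values from the slides:

```python
# Input: mel-spectrogram with 128 frequency bins over T time frames.
n_mels = 128
n_filters = 32

# 2D convs: small n x m filters over a 1-channel "image". The filter slides
# in both time AND frequency, so weights are shared across frequency,
# assuming local structure along the frequency axis.
n, m = 3, 3
params_2d = n_filters * (1 * n * m)          # 32 filters * 9 weights = 288

# 1D convs: n x 1 filters over time, with all 128 frequency bins as input
# channels. No weight sharing across frequency, since mel bins are discrete
# and patterns at 100 Hz need not look like patterns at 8 kHz.
params_1d = n_filters * (n_mels * n * 1)     # 32 * 384 weights = 12288

print(params_2d, params_1d)
```

The 1D design uses far more parameters per layer but trains fast (convolution only along time) and avoids the possibly wrong translation-invariance assumption over frequency.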
Automatic Music Transcription with Deep Complex Networks • Input: complex spectrogram • Network components (batchnorm, initialization, activations, convolution) are changed to match the complex domain • Real baseline: real and imaginary values as separate channels; complex: as suggested in the paper. Deep Complex Networks, Trabelsi et al., to appear at ICLR '18
WaveNet for TTS • Input: wav format data Image from https://kakalabblog.wordpress.com/2017/07/18/wavenetnsynth-deep-audio-generative-models/ WaveNet: A Generative Model for Raw Audio, Oord et al., https://arxiv.org/pdf/1609.03499.pdf
Industries focusing on Music Research and more!
NSynth: Encoding sounds with a WaveNet Autoencoder • WaveNet-based model made by Magenta to produce a neural synthesizer • Latent embeddings (z) from various sounds can be used to produce new sounds • New dataset with instrument, pitch, etc. tags on individual sounds https://magenta.tensorflow.org/nsynth
Performance RNN • Trained on the Yamaha e-Piano Competition dataset • MIDI of 1,400+ piano performances • Magenta used LSTMs to predict among 388 events occurring along the timeline • Generated example: https://magenta.tensorflow.org/performance-rnn
Discover Weekly • Spotify’s weekly personalized recommendation service • Collaborative Filtering • NLP modeling • Audio modeling http://benanne.github.io/2014/08/05/spotify-cnns.html#contentbased http://blog.galvanize.com/spotify-discover-weekly-data-science/
Any questions? • On to Part 2...
Clova Music Recommendation System
Recommendation in Clova Music • User logs as the main data; a hybrid with content data is possible • Large and sparse online data • Topics: • User log analysis • Music semantic embedding learning • Collaborative filtering with matrix factorization
* Reported Oct. 2017. Top queries with Music
• 노래 틀어줘 (Play a song)
• 자장가 틀어줘 (Play a lullaby)
• 동요 틀어줘 (Play children's songs)
• 신나는 노래 틀어줘 (Play an upbeat song)
• 조용한 노래 틀어줘 (Play a quiet song)
• 핑크퐁 노래 틀어줘 (Play Pinkfong songs)
• 아이유 노래 틀어줘 (Play IU songs)
• 클래식 틀어줘 (Play classical music)
• 분위기 좋은 음악 틀어줘 (Play music with a good vibe)
• 잔잔한 음악 틀어줘 (Play calm music)
• 발라드 틀어줘 (Play ballads)
Query frequency: Artists > Tracks; Genre, mood, themes > Artists; JUST PLAY > Genres
* Reported Oct. 2017. Device Usage Patterns (bar chart: share of music plays by device: NAVER_APP, NAVER_PC, WAVE, CLOVA_APP)
Device Usage Patterns (chart: genre distribution of plays, NAVER_APP vs. WAVE: 가요 (K-pop), 기능성음악 (functional music), 팝 (pop), 동요 (children's songs), OST, 클래식 (classical), 재즈 (jazz), 종교음악 (religious), 일렉트로… (electro…), 락 (rock), 힙합 (hip-hop), 기타 (others))
Device Usage Patterns (chart: playing ratio per artist) • Long-tail distribution over artists • The distribution itself is not so different across devices...
Device Usage Patterns (table: top artists by playing ratio, WAVE vs. NAVER MUSIC APP). The WAVE speaker list is dominated by kids and sleep content (핑크퐁 (Pinkfong), 동요 (children's songs), 트니트니, 오르골뮤직, 힐링피아노, 자장가 (lullabies)) alongside 아이유 (IU), 뉴이스트 (NU`EST), and 윤종신; the NAVER MUSIC APP list is led by idol and pop acts (EXO, 젝스키스, 방탄소년단, Wanna One, 볼빨간사춘기, 헤이즈, 선미, WINNER, 성시경)
Implication • Paradigm shift in music consumption on AI speaker devices • New markets: kids and new parents; lean-out and lounge music; classical and jazz • Music recommendation plays an important role on AI assistant platforms
Recommendation Challenges
• Lack of well-defined metadata → Musical Semantic Embedding
• Personalized Playlists → Multimodal Semantic Embedding
Semantic Embedding: lack of well-defined metadata → Music Semantic Embedding
• Mapping tracks, artists, and words to the same embedding space (Word2Vec)
• Feature learning
• Usages: item similarities; embeddings used as features
(Diagram: embedding space with keywords such as 가을 "autumn" and 신나는 "upbeat")
Semantic Embedding: Word2Vec with tagged playlists • JAMM playlists: user-created playlists in Naver Music, about 72,000 in total • Keywords from tags, artists from tracks • Treat track IDs as "words" within a playlist
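The playlist-to-"sentence" step above can be sketched as follows; the playlist contents and ID format are hypothetical, since the actual JAMM data layout is not shown in the slides:

```python
# Each playlist mixes its tag keywords with its track IDs, so tags, tracks
# (and, analogously, artists) all land in one shared embedding space once
# a skip-gram model is trained over these "sentences".
playlists = [
    {"tags": ["가을", "잔잔한"], "tracks": ["track:1042", "track:2210"]},
    {"tags": ["신나는"],         "tracks": ["track:2210", "track:3301"]},
]

sentences = [p["tags"] + p["tracks"] for p in playlists]

# These sentences would then be fed to Word2Vec, e.g. with gensim:
#   gensim.models.Word2Vec(sentences, vector_size=128, min_count=1)
print(sentences[0])
```

After training, nearest neighbors of a tag vector (e.g. 가을) are tracks that co-occur with it in playlists, which gives the item similarities and features mentioned on the previous slide.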
Semantic Embedding: that song in the charts • 벚꽃엔딩 (Cherry Blossom Ending) / 버스커버스커 (Busker Busker)
Semantic Embedding: Personalized Playlists → Multimodal Semantic Embedding
• We want to model different playlists for different personalities
• Example: the query 밤편지 maps to multiple senses, <밤편지_1> and <밤편지_2>, for different users
Semantic Embedding: Personalized Playlists, embedding with session data • User playing sequences as documents! • We use multimodal word distributions formed from Gaussian distributions (Ben Athiwaratkun and Andrew Gordon Wilson, Multimodal Word Distributions, 2017)
Collaborative Filtering Most popular method: Matrix Factorization
Collaborative Filtering: Matrix Factorization for Personalized Recommendation
• Basic MF objective
• Select tracks and artists the user prefers when generating a playlist
• Simple, but hard to apply:
- Sparsity
- Overfitting / underfitting
- Hard to evaluate (needs real feedback, not RMSE!)
- Combining with other models
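A standard form of the basic MF objective mentioned above (the slides do not spell out the exact variant used) is the regularized squared error over observed user-item interactions:

```latex
\min_{P, Q} \sum_{(u,i) \in \mathcal{O}}
    \left( r_{ui} - \mathbf{p}_u^{\top} \mathbf{q}_i \right)^2
    + \lambda \left( \lVert \mathbf{p}_u \rVert^2 + \lVert \mathbf{q}_i \rVert^2 \right)
```

Here \(r_{ui}\) is the observed preference (e.g. play count) of user \(u\) for item \(i\), \(\mathbf{p}_u\) and \(\mathbf{q}_i\) are the learned latent factors, \(\mathcal{O}\) is the set of observed pairs, and \(\lambda\) controls the regularization that fights the overfitting noted above.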
Collaborative Filtering: What can we do?
• Learn in two phases: long term via batch learning, short term via online learning
• Negative sampling: consider the item distribution when sampling negatives
• Remove abusive users: over-clicking users, Top-100-only users
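A minimal sketch of negative sampling that respects the item distribution, as suggested above. The play counts, the word2vec-style 0.75 smoothing exponent, and the item IDs are illustrative assumptions, not details from the slides:

```python
import numpy as np

# Sample negatives proportional to play_count^0.75, so popular tracks are
# preferred as negatives (they are more informative) but the long tail is
# not entirely ignored. Skip items the user actually played.
rng = np.random.default_rng(0)
play_counts = np.array([1000, 500, 50, 5, 1], dtype=np.float64)  # per item
user_positives = {0, 1}            # items this user interacted with

probs = play_counts ** 0.75
probs /= probs.sum()

def sample_negatives(k):
    out = []
    while len(out) < k:
        item = int(rng.choice(len(play_counts), p=probs))
        if item not in user_positives:   # keep only true negatives
            out.append(item)
    return out

print(sample_negatives(3))
```

Sampling negatives uniformly would mostly pick never-played tail items, which the model already scores low; popularity-weighted negatives force it to explain why a popular track was *not* played by this user.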
Remaining Challenges • Conventional problems • Sparsity • Top 100 songs • Cold-start problems • Explanatory recommendation • Music Recommendation for AI Speakers • Interaction • Lean-in / Lean-back • Personalizing level (Familiar vs New)
Music Modeling
Music Modeling • Audio data as main data • Topics: • Representation Vector Extraction (Park et al. 17) • Music Emotion Recognition (Jeon et al. 17) • Music Highlight Extraction (Ha et al. 17) • Automatic DJ mix Generation (Kim et al. 17)