Clova Music: DJ AI (Adrian Kim), M.S. Clova AI


  1. Clova Music: An AI Assistant Like a Smart DJ. 김정명 (Adrian Kim), M.S., Clova AI Research (CLAIR), Naver Corp.

  2. Clova: Cloud-based Virtual Assistant General Purpose AI platform

  3. Clova: Cloud-based Virtual Assistant https://clova.ai

  4. Clova: Cloud-based Virtual Assistant

  5. Clova Music • The biggest need from a speaker would be MUSIC = Music Listening Platform?

  6. Clova Music • Intelligent music recommendation service of Clova • Aims to be a human DJ-like curator • Powered with NAVER/LINE music user/content data

  7. Contents • Part 1 Short Tutorial on Music modeling - What kind of data do we use? - What kind of models can we use? - What kind of problems can we solve? - Any industry research? • Part 2 Music Research in Clova - Recommendation Systems - Representation learning - Emotion recognition - Highlight extraction - Automatic DJ list generation

  8. Introducing the Music Domain

  9. Popular Domains...

  10. Audio domain data +

  11. Audio Domain Data: Wave • Basic data form is 16-bit integer • You can normalize to [-1, 1] • 1D vector of samples • Sample rates: 16kHz, 22050Hz, ... • For 16kHz, 30 seconds = 480k datapoints! • Very information-inefficient
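
The waveform arithmetic on this slide can be checked with a short sketch. It assumes NumPy and uses a synthetic sine tone in place of a decoded audio file; the helper name `int16_to_float` is hypothetical, not part of the talk.

```python
import numpy as np

def int16_to_float(samples: np.ndarray) -> np.ndarray:
    """Scale 16-bit PCM samples to float32 values in [-1, 1)."""
    return samples.astype(np.float32) / 32768.0

sample_rate = 16_000                     # Hz, as on the slide
duration_s = 30                          # seconds
n_samples = sample_rate * duration_s     # 16,000 * 30 = 480,000 datapoints

# Synthetic stand-in for a decoded clip: a 440 Hz sine stored as int16.
t = np.arange(n_samples) / sample_rate
pcm = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

wave = int16_to_float(pcm)               # 1D float vector of samples
```

Dividing by 32768 is one common convention; dividing by 32767 is also used and differs only at the extreme negative sample value.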

  12. Audio Domain Data: Spectrograms • Expressive, has more information!

  13. Audio Domain Data: Mel-spectrograms • Spectrogram frequency bins > 1k • Mel filter banks reduce these to mel bins = 80, 96, 128

  14. Audio Domain Data: Mel-spectrograms • Mel-spectrogram filter distributions give relative focus on lower frequency bins • Image from Choi et al. 16
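
A small sketch can illustrate why mel-spaced filters concentrate on low frequencies. It assumes the widely used HTK-style mel formula, mel = 2595 * log10(1 + f / 700); the slides do not say which mel variant was used, and the function names here are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style Hz to mel conversion (an assumed convention)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2)
    return mel_to_hz(mels)[1:-1]

# 128 mel bins up to the Nyquist frequency of 22050 Hz audio.
centers = mel_band_centers(n_mels=128, f_min=0.0, f_max=11025.0)

# Bands centered below 2 kHz, versus the share a linear spacing would give.
below_2k = int(np.sum(centers < 2000.0))
linear_share = 128 * 2000.0 / 11025.0
```

Because the spacing between neighboring centers grows with frequency, far more bands land below 2 kHz than a linear spacing would place there.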

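  15. Audio Domain Data: Transformation between data types • Wave (1323000,) → STFT → spectrogram (1025, 2584) = 2648600 values • Spectrogram → mel filter bank (irreversible) → mel-spectrogram (128, 2584) = 330752 values • Spectrogram → wave: if complex, inverse STFT; if only magnitude, Griffin-Lim algorithm • Mel-spectrogram → wave: WaveNet vocoder (Shen et al. 17)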
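
The shapes on this slide can be reproduced with plain arithmetic. The sketch below assumes an FFT size of 2048, a hop of 512 samples, and centered (padded) framing; these settings are not stated in the slides but are consistent with the reported shapes.

```python
n_samples = 1_323_000   # waveform length from the slide
n_fft = 2048            # assumed FFT size
hop = 512               # assumed hop length
n_mels = 128            # mel bins from the slide

n_bins = n_fft // 2 + 1            # 1025 frequency bins
n_frames = 1 + n_samples // hop    # 2584 frames with centered framing

spec_size = n_bins * n_frames      # values in the spectrogram
mel_size = n_mels * n_frames       # values in the mel-spectrogram

# Size relative to the raw waveform.
spec_ratio = spec_size / n_samples
mel_ratio = mel_size / n_samples
```

Under these assumptions the spectrogram holds about twice as many values as the waveform, while the 128-bin mel-spectrogram holds about a quarter.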

  16. Issues with Audio Data • Not much open data: takes a lot of time for high quality • Dirty labels: low quality, weakly labeled (Choi et al. 2017) • Too large: storage problem, memory problem • Low efficiency: information per data point is very small • Convoluted: multiple sources, must hear to evaluate

  17. Issues: Comparing Simple Tasks (MNIST vs. GTZAN) • Storage: 45MB vs. 1.2GB • Data pairs: 60000 vs. 1000 (30 seconds each) • Classes: 10 digits vs. 10 genres (100 each) • Preprocessing: fast vs. slow • Testing: easy vs. hard
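
The storage figures in this comparison follow from the raw data sizes. The sketch below assumes MNIST training images are 28x28 with one byte per pixel and GTZAN clips are 22050 Hz, mono, 16-bit; the formats are assumptions here, since the slide gives only the totals.

```python
MIB = 2 ** 20
GIB = 2 ** 30

# 60000 images, 28 x 28 pixels, 1 byte per pixel (assumed format).
mnist_bytes = 60_000 * 28 * 28 * 1

# 1000 clips, 30 seconds, 22050 samples per second, 2 bytes per sample (assumed format).
gtzan_bytes = 1_000 * 30 * 22_050 * 2

mnist_mib = mnist_bytes / MIB    # roughly 45
gtzan_gib = gtzan_bytes / GIB    # roughly 1.2

# Raw bytes per training example: one audio clip versus one image.
bytes_per_example_ratio = (gtzan_bytes / 1_000) / (mnist_bytes / 60_000)
```

One 30-second clip carries well over a thousand times the raw bytes of one digit image, yet contributes a single genre label.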

  18. Issues: Comparing Speech and Music • Speech audio (example: news): short, single source • Music (example: Bad Boy – Red Velvet): long, multiple sources

  19. Example Baselines

  20. What kind of problems can we solve? • Genre/Artist Classification • Automatic Tagging • Music generation • Style transfer • Source separation • Onset detection • Sound embedding • Beat tracking • and more...! [Figure: network diagram with convolution & pooling, channel summation, LSTM layers, attention (softmax), and element-wise multiplication giving an attention-weighted LSTM output]

  21. Autotagging with Convnets • Input: mel-spectrogram (MSD dataset) • Output: Tags (50 top tags) • 2D Convs • https://github.com/keunwoochoi/music-auto_tagging-keras • Automatic tagging using deep convolutional neural networks, ISMIR 16, Choi et al.

  22. Note: Filter design in CNNs • 2D convs: slow training; local structure in frequency; n×m filters, 1 channel • 1D convs: fast training; frequencies are discrete; n×1 filters, m channels
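
The two filter designs differ in how the mel-spectrogram is laid out as network input. The following sketch shows both layouts and a rough multiply-accumulate count for one layer, assuming stride 1, "same" padding, 3-wide kernels, and a hypothetical 32 output channels. None of these settings come from the slides, and operation count is only a loose proxy for training speed.

```python
import numpy as np

n_mels, n_frames = 128, 2584                 # mel-spectrogram shape from the slides
mel = np.zeros((n_mels, n_frames), dtype=np.float32)

# 2D convs: treat the mel-spectrogram as a one-channel image (channel, freq, time).
x2d = mel[np.newaxis, :, :]
# 1D convs: treat every mel bin as its own channel (channel, time).
x1d = mel

def conv2d_macs(in_ch, out_ch, kh, kw, height, width):
    """Multiply-accumulates for a stride-1, same-padded 2D convolution."""
    return out_ch * height * width * in_ch * kh * kw

def conv1d_macs(in_ch, out_ch, k, length):
    """Multiply-accumulates for a stride-1, same-padded 1D convolution."""
    return out_ch * length * in_ch * k

out_ch = 32                                  # hypothetical layer width
macs_2d = conv2d_macs(1, out_ch, 3, 3, n_mels, n_frames)
macs_1d = conv1d_macs(n_mels, out_ch, 3, n_frames)
```

The 1D layer mixes all frequency bins at once and produces an output with no frequency axis, so later layers operate on much smaller activations than in the 2D case.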

  23. Auto Music Transcription with Deep Complex Networks • Input: Spectrogram (complex output) • Change network components (batchnorm, initialization, activations, convolution) to match complex domain • Real: real and imaginary values as separate channels • Complex: as suggested • Deep Complex Networks, Trabelsi et al., to appear at ICLR18

  24. WaveNet for TTS • Input: wav format data • Image from https://kakalabblog.wordpress.com/2017/07/18/wavenetnsynth-deep-audio-generative-models/ • WaveNet: A Generative Model for Raw Audio, Oord et al., https://arxiv.org/pdf/1609.03499.pdf

  25. Industries focusing on Music Research and more!

  26. NSynth: Encoding sounds with Wavenet Autoencoder • Wavenet based model made by Magenta to produce a neural synthesizer • Latent embeddings(Z) from various sounds made by the model can be used to produce new sounds • New dataset with instrument, pitch, etc. tags on individual sounds https://magenta.tensorflow.org/nsynth

  27. Performance RNN • Trained on the Yamaha e-Piano Competition dataset • MIDI of 1400+ piano performances • Magenta used LSTMs to predict from 388 events occurring during the timeline • Generated example: https://magenta.tensorflow.org/performance-rnn

  28. Discover Weekly • Spotify’s weekly personalized recommendation service • Collaborative Filtering • NLP modeling • Audio modeling http://benanne.github.io/2014/08/05/spotify-cnns.html#contentbased http://blog.galvanize.com/spotify-discover-weekly-data-science/

  29. Any questions? • onto part 2..

  30. Clova Music Recommendation System

  31. Recommendation in Clova Music • User logs as main data, content data hybrid is possible • Large and sparse online data • Topics: • User log analysis • Music semantic embedding learning • Collaborative filtering with matrix factorization

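  32. Top Queries with Music (* reported Oct. 2017) • Top queries: "Play a song" • "Play a lullaby" • "Play children's songs" • "Play an upbeat song" • "Play a quiet song" • "Play Pinkfong songs" • "Play IU songs" • "Play classical music" • "Play music with a nice mood" • "Play calm music" • "Play ballads" • Observations: JUST PLAY > Genres • Genre, mood, themes > Artists • Artists > Tracks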

  33. Device Usage Patterns (* reported Oct. 2017) • [Bar chart comparing NAVER_APP, NAVER_PC, WAVE, and CLOVA_APP; axis range 0 to 25]

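  34. Device Usage Patterns (* reported Oct. 2017) • [Chart comparing NAVER_APP and WAVE by genre] • Genres: Korean popular music, functional music, pop, children's songs, OST, classical, jazz, religious music, electro…, rock, hip-hop, other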

  35. Device Usage Patterns (* reported Oct. 2017) • [Plot: playing ratio (play count ratio) per artist] • Long tail distribution • Distribution itself is not so different...

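  36. Device Usage Patterns (* reported Oct. 2017) • [Two charts of playing ratio per artist: WAVE vs. NAVER MUSIC APP] • Artist labels in extraction order (the two charts are interleaved): 핑크퐁, EXO, 아이유, 아이유, children's songs, 젝스키스, children's songs, 방탄소년단, 뉴이스트, EXO, 뉴이스트(NU`EST), 윤종신, Wanna One, 별하나 동요, 윤종신, 이루마, 우원재, 오르골뮤직, 볼빨간사춘기, 볼빨간사춘기, 뉴이스트 W, 젝스키스, 황치열, 트니트니, 헤이즈, 헤이즈, 선미, 성시경, WINNER, 힐링피아노, lullabies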

  37. Implication • Paradigm shift in terms of music consumption on AI speaker devices • New market • Kids, New parents • Lean-out music, lounge music • Classic, Jazz • Music Recommendation takes an important role on AI assistant platforms

  38. Recommendation Challenges • Lack of well-defined meta data → Music Semantic Embedding • Personalized Playlists → Multimodal Semantic Embedding

  39. Semantic Embedding • Challenge: lack of well-defined meta data • Approach: Music Semantic Embedding • Mapping tracks, artists, and words to the same embedding space • Word2Vec (example words: "autumn", "upbeat") • Feature learning • Usages: item similarities, used as features

  40. Semantic Embedding • Challenge: lack of well-defined meta data • Word2Vec with tagged playlists • JAMM playlists: user-created playlists in Naver Music, about 72,000 playlists • Keywords from tags • Artists from tracks • Treat trackIds as "words" within a playlist
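
Treating a playlist as a document can be sketched as follows. This shows only how playlists might be turned into token sequences and skip-gram (center, context) training pairs, the input that a Word2Vec-style model consumes. The playlist contents, the token ordering, and the helper names are hypothetical; the slides do not describe the actual preprocessing.

```python
from typing import Dict, List, Tuple

# Hypothetical tagged playlists: keywords from tags, artists and trackIds from tracks.
playlists: List[Dict] = [
    {"tags": ["autumn", "calm"],
     "tracks": [("track_101", "artist_A"), ("track_102", "artist_B")]},
    {"tags": ["upbeat"],
     "tracks": [("track_201", "artist_C"), ("track_101", "artist_A")]},
]

def playlist_to_tokens(playlist: Dict) -> List[str]:
    """Flatten one playlist into a 'sentence' of tags, artists, and trackIds."""
    tokens = list(playlist["tags"])
    for track_id, artist in playlist["tracks"]:
        tokens.append(artist)
        tokens.append(track_id)
    return tokens

def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """All (center, context) pairs within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

documents = [playlist_to_tokens(p) for p in playlists]
pairs = skipgram_pairs(documents[0], window=1)
```

Because tags, artists, and trackIds share one vocabulary, a model trained on such pairs would place all three kinds of token in the same embedding space.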

  41. Semantic Embedding: That song in the charts • 벚꽃엔딩 (Cherry Blossom Ending) / 버스커버스커 (Busker Busker)

  42. Semantic Embedding Personalized Playlists Multimodal Semantic Embedding • We would want to model different playlists for different personalities • Query: 밤편지 < 밤편지_2 > < 밤편지_1 >

  43. Semantic Embedding • Challenge: personalized playlists • Embedding with session data • User playing sequence as document! • We use multimodal word distributions formed from Gaussian distributions • Ben Athiwaratkun and Andrew Gordon Wilson, Multimodal Word Distributions, 2017

  44. Collaborative Filtering Most popular method: Matrix Factorization

  45. Collaborative Filtering: Matrix Factorization for Personalized Recommendation • Basic MF objective • Select tracks and artists that the user prefers when generating a playlist • Simple, but hard to apply: sparsity; overfitting / underfitting; hard to evaluate (need real feedback, not RMSE!); combining with other models
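
The slide's objective did not survive extraction, so the sketch below uses the standard textbook form: minimize the squared error between observed preferences r_ui and the dot product of user and item factors, plus L2 regularization, trained with stochastic gradient descent. It is a minimal illustration under that assumption, not the Clova system; the toy data, hyperparameters, and function names are hypothetical.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=8, lr=0.05, reg=0.02, epochs=300, seed=0):
    """SGD on sum over observed (u, i) of (r_ui - p_u . q_i)^2 + reg * (|p_u|^2 + |q_i|^2)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))   # item factors
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            p_u = P[u].copy()                     # keep the pre-update user vector
            err = r - p_u @ Q[i]
            P[u] += lr * (err * Q[i] - reg * p_u)
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

def mse(ratings, P, Q):
    """Mean squared error over the observed entries only."""
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.mean(np.square(errors)))

# Toy (user, item, preference) triples; 1.0 = played, 0.0 = sampled negative.
ratings = [(0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0), (1, 0, 1.0),
           (1, 3, 0.0), (2, 2, 1.0), (2, 3, 1.0), (2, 0, 0.0)]

baseline = mse(ratings, np.zeros((3, 8)), np.zeros((4, 8)))   # predict 0 everywhere
P, Q = train_mf(ratings, n_users=3, n_items=4)
trained = mse(ratings, P, Q)
scores_user0 = P[0] @ Q.T                                      # scores for ranking items
```

The training error here is measured on the same entries the model was fit to, which is exactly the kind of offline number the slide warns is not a substitute for real user feedback.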

  46. Collaborative Filtering: What can we do? • Learning in 2 phases: long term (batch learning), short term (online learning) • Negative sampling: consider the item distribution when sampling negatives • Remove abusing users: over-clicking users, top-100-only users
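
One way to take the item distribution into account when sampling negatives is to draw unplayed items in proportion to a smoothed popularity. The sketch below borrows the 0.75 exponent popularized by word2vec; that choice, the toy play counts, and the function names are assumptions, since the slides do not say how the distribution was used.

```python
import numpy as np

def negative_sampling_probs(play_counts, power=0.75):
    """Item sampling probabilities proportional to play_count ** power."""
    weights = np.asarray(play_counts, dtype=float) ** power
    return weights / weights.sum()

def sample_negatives(rng, probs, positives, n):
    """Draw n item ids the user has not played, following the given distribution."""
    p = probs.copy()
    p[list(positives)] = 0.0      # never sample items the user actually played
    p /= p.sum()
    return rng.choice(len(p), size=n, replace=True, p=p)

play_counts = [1000, 100, 10, 10, 1]          # hypothetical long-tailed item popularity
probs = negative_sampling_probs(play_counts)

rng = np.random.default_rng(0)
negatives = sample_negatives(rng, probs, positives={0, 2}, n=20)

raw_top_share = play_counts[0] / sum(play_counts)
```

An exponent below 1 flattens the long tail slightly, so the most popular item is sampled less often than its raw play share while rare items are sampled more often.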

  47. Remaining Challenges • Conventional problems • Sparsity • Top 100 songs • Cold-start problems • Explanatory recommendation • Music Recommendation for AI Speakers • Interaction • Lean-in / Lean-back • Personalizing level (Familiar vs New)

  48. Music Modeling

  49. Music Modeling • Audio data as main data • Topics: • Representation Vector Extraction (Park et al. 17) • Music Emotion Recognition (Jeon et al. 17) • Music Highlight Extraction (Ha et al. 17) • Automatic DJ mix Generation (Kim et al. 17)
