
Course Introduction (Juhan Nam)



  1. GCT634/AI613: Musical Applications of Machine Learning (Fall 2020) Course Introduction Juhan Nam

  2. Who We Are ● Instructor: Juhan Nam ○ Associate Professor, Graduate School of Culture Technology (GSCT) ○ Affiliated Professor, Graduate School of Artificial Intelligence (GSAI) ○ Music and Audio Computing Lab: https://mac.kaist.ac.kr/ ● TAs ○ Taegyun Kwon, PhD student, GSCT ○ Taejun Kim, PhD student, GSCT ○ Wonil Kim, PhD student, GSCT ○ Seunghyun Lee, MS Student, GSAI

  3. Music ● One of the most widely enjoyed forms of cultural content and activity (Figure: Music in KAIST) Source: http://times.kaist.ac.kr/news/articleView.html?idxno=3835, http://times.kaist.ac.kr/news/articleView.html?idxno=3185

  4. Music and Computer ● Computers are now essential in musical activities ○ Music listening: download music tracks as compressed audio files and decompress them into waveforms

  5. Music and Computer ● Computers are now essential in musical activities ○ Music performance: musical instrument, karaoke machine

  6. Music and Computer ● Computers are now essential in musical activities ○ Music composition and production: recording, MIDI, processing, mixing

  7. Data and Processing ● The role of the computer is to represent musical data in digital form and process it according to the target task (Figure: input data → output data) ○ Data: audio, MIDI, text (metadata) ○ Processing: audio (de)compression, sound synthesis, recording, digital audio effects, editing, and mixing

  8. Data and Processing ● The role of the computer is to represent musical data in digital form and process it according to the target task (Figure: input data → output data) ● In such music systems, each step of processing is hand-designed and programmed by humans based on domain knowledge such as digital signal processing, acoustics, and music theory

  9. Machine Learning for Music? ● Machine Learning (ML) is a method of teaching computers how to make accurate predictions using data (Figure: input data → output data) ● In ML-based systems, each step of processing is learned from data through a learning algorithm ● Why do we need machine learning for music?
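
As a rough illustration of "representing musical data in a digital form", the sketch below synthesizes a sine tone and quantizes it to 16-bit PCM samples. The function name, the 44.1 kHz sample rate, and the 0.5 amplitude are assumptions chosen for the example and do not come from the slides.

```python
import math

def sine_wave_pcm16(freq_hz, duration_s, sample_rate=44100, amplitude=0.5):
    """Return a list of signed 16-bit PCM samples for a sine tone.

    All parameter defaults are illustrative assumptions.
    """
    n_samples = int(round(duration_s * sample_rate))
    peak = 32767  # largest positive value of a signed 16-bit integer
    return [
        int(round(amplitude * peak * math.sin(2 * math.pi * freq_hz * n / sample_rate)))
        for n in range(n_samples)
    ]

# 10 ms of a 440 Hz tone (concert A) sampled at 44.1 kHz
samples = sine_wave_pcm16(440.0, 0.01)
```

Each integer in `samples` is one point of the waveform; audio files and sound cards exchange exactly this kind of sequence, usually after compression or decompression.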

  10. Music Listening ● Music at scale ○ Spotify: 60M tracks, 40K new tracks per day, and 4B playlists (2020) ○ SoundCloud: 200M tracks (2019) ○ YouTube: 500 min of video uploaded per min (2020) (music tracks, music videos and performance videos) ● Content organization, search, and recommendation become important ○ Metadata is not enough to explain the “content” of music ○ Richer descriptions are needed (Figure: Naver VIBE) Source: https://newsroom.spotify.com/company-info/

  11. Music Listening ● Pandora’s Music Genome Project (1999) ○ Annotate a track with about 450 music attributes ■ Genre, instruments, timbre, vocal quality, … ○ Playlists are generated using the similarity of music attribute vectors ○ Not biased by popularity (Figure: Pandora Internet Radio) ● Problems ○ A music expert needs 20-30 minutes to annotate a single track ○ The dictionary of music attributes has a fixed size

  12. Music Listening ● We need to teach computers how to describe music in natural language as humans do ○ Associate music not only with musical terms (e.g., genre, instrument, timbre) but also with listening contexts (e.g., mood, time of day, location and activity) (Figure: input data → output data)
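
The idea of generating playlists from "the similarity of music attribute vectors" can be sketched as a nearest-neighbour ranking. The slides do not say which similarity measure Pandora uses, so cosine similarity here is an assumption, and the track names, the four attributes, and their values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no direction to compare
    return dot / (norm_a * norm_b)

def rank_by_similarity(seed, catalog):
    """Return track names sorted from most to least similar to the seed vector."""
    return sorted(
        catalog,
        key=lambda name: cosine_similarity(seed, catalog[name]),
        reverse=True,
    )

# Hypothetical attributes: [acoustic guitar, distortion, female vocal, tempo]
catalog = {
    "track_a": [0.9, 0.1, 0.8, 0.3],
    "track_b": [0.1, 0.9, 0.0, 0.8],
    "track_c": [0.8, 0.2, 0.7, 0.4],
}
seed = [1.0, 0.0, 1.0, 0.3]
playlist = rank_by_similarity(seed, catalog)
```

Because the ranking depends only on the attribute vectors, a rarely played track can rank as high as a hit, which is one way to read the "not biased by popularity" point.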

  13. Google for Music (KAIST MAC Lab)

  14. Music Performance ● Mobile apps have become popular in music education and entertainment ○ Music score (MuseScore) ○ Karaoke (Smule) ○ Instrument learning game (Yousician) ● Emergence of “smart” features ○ Performance evaluation ○ Score following and page turning ○ Auto-accompaniment Source: https://promusicianhub.com/yousician-review/, https://musescore.org, https://www.smule.com/

  15. Music Performance ● Extract music score information from audio ○ Identify and separate sound sources in adverse acoustic conditions: microphone, reverberation and interfering sources ○ Detect multiple pitches from polyphonic musical instruments: e.g., piano, guitar Source: http://jameasy.com/ko/company.html, https://magenta.tensorflow.org/onsets-frames

  16. Music Performance ● We need to teach computers how to separate individual sources and extract musical information from complex auditory scenes ○ Separate sources from mixed audio ○ Transcribe polyphonic music into a music score or MIDI (Figure: input data → output data)
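
The last step of transcription, turning detected pitches into MIDI, rests on a standard mapping from frequency to MIDI note number: note 69 is A at 440 Hz and each semitone is a factor of 2^(1/12). The sketch below covers only that mapping and assumes the pitches have already been detected; the function names and the C4 = 60 octave convention are choices made for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(freq_hz):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))

def midi_to_name(midi_note):
    """Name a MIDI note, assuming the convention in which note 60 is C4."""
    octave = midi_note // 12 - 1
    return f"{NOTE_NAMES[midi_note % 12]}{octave}"

# Hypothetical detected pitches of a C major triad (Hz)
detected = [261.63, 329.63, 392.00]
midi_notes = [hz_to_midi(f) for f in detected]
names = [midi_to_name(m) for m in midi_notes]
```

The hard part of polyphonic transcription is estimating `detected` from real audio with overlapping harmonics, which is where the learned models discussed in the course come in.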

  17. Re-Performance by Polyphonic Piano Transcription (KAIST MAC Lab)

  18. Music Composition ● Automatic music composition has been a dream project since the birth of the computer ○ Illiac Suite (String Quartet No. 4) (1957) ■ The first music score composed by an electronic computer ■ Composed using a Markov model ■ https://www.youtube.com/watch?v=n0njBFLQSk8 ○ Experiments in Musical Intelligence (EMI) (1980s) ■ Style imitation using pre-composed patterns (recombinant) ■ https://www.youtube.com/watch?v=t6WeiyvAiYQ&t=52s ○ Numerous approaches in “algorithmic composition” ■ Audio/MIDI programming languages: Music-N, Csound, Max/PD, Common Music, SuperCollider, ChucK ■ Rule-based or statistical models (Figures: The Experimental Music Studio (Lejaren Hiller and Leonard Isaacson); EMI: recombinant music (David Cope)) Source: http://www.moz.ac.at/sem/lehre/lib/es/ems/hist/battisti.html

  19. Music Composition ● Recent advances in machine learning ○ Learn the sequential order of music data in a highly data-driven way ○ MIDI or audio generation ■ Flow Machine (Sony): https://www.flow-machines.com/ ■ Music Transformer (Google): https://magenta.tensorflow.org/music-transformer ■ Jukebox (OpenAI): https://openai.com/blog/jukebox/ (Figures: Flow Machine (“Hello World”: AI-composed album), Jukebox, Music Transformer)
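
To make "composed using a Markov model" concrete, here is a minimal sketch of a first-order Markov chain over note names. It is not a reconstruction of the Illiac Suite procedure: the training melody, the function names, and the fixed random seed are all assumptions for illustration.

```python
import random
from collections import defaultdict

def train_transitions(melody):
    """Collect, for each note, the list of notes that followed it in the melody."""
    transitions = defaultdict(list)
    for current, nxt in zip(melody, melody[1:]):
        transitions[current].append(nxt)
    return transitions

def generate(transitions, start, length, seed=0):
    """Sample a melody in which each note depends only on the previous one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    notes = [start]
    while len(notes) < length:
        candidates = transitions.get(notes[-1])
        if not candidates:
            break  # dead end: this note never had a successor in training
        # Repeated successors appear repeatedly, so choice() follows observed frequencies
        notes.append(rng.choice(candidates))
    return notes

# Hypothetical training melody
melody = ["C4", "D4", "E4", "C4", "E4", "G4", "E4", "D4", "C4"]
table = train_transitions(melody)
new_melody = generate(table, "C4", 8)
```

Every pair of adjacent notes in `new_melody` occurred somewhere in the training melody, which shows both the appeal and the limit of such models: local transitions sound plausible, but there is no long-term structure.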

  20. Music Composition ● We need to teach computers how to learn the distribution of high-dimensional, long-term sequential data and generate music from conditions given by humans ○ The conditions can be semantics, artist, lyrics, score, audio or even preference ○ Is it possible to create novel pieces (in collaboration with humans)? (Figure: input data → output data)

  21. Machine Learning for Music ● A powerful means of teaching computers how to listen to, perform and compose music (Figure: a learning model linking audio, score (MIDI) and text)

  22. Deep Learning ● The key element of recent AI technology, developed mainly in the computer vision, speech processing and natural language processing communities ○ Each of them handles a different modality: image, audio, text (i.e., symbols) ○ Because the approach is data-driven (relying less on domain knowledge), advances in deep learning have been readily applied to a wide variety of domains that use image, audio and text as input or output data ○ Music is one of the domains that has benefited greatly from them ● Deep learning is representation learning ○ Transform a type of data into a more meaningful vector space (i.e., feature space) ○ The vector spaces from different modalities of data are associated with each other by their correspondence

  23. Deep Learning for Music ● Modality-agnostic representation learning (Figure: a learning model linking audio, score (MIDI), text and image)

  24. Objectives of This Course ● Understand machine learning and deep learning ● Learn how to apply them to various tasks in the music domain ● Get hands-on experience with Python and machine learning libraries through homework ● Gain experience of the full cycle of research through the final project

  25. Course Format ● This course is offered in a 100% online format ● There are two types of online sessions ○ Pre-recorded videos ■ Cover the lecture part ■ Uploaded weekly in KLMS (YouTube link) ■ Students must watch the videos before the weekly Zoom meeting ○ Weekly Zoom meeting ■ Focus on review, interactive Q&A and hands-on practice ■ Thursday from Week #2: 2:30-3:45 PM (hopefully less than 60 minutes)

  26. Schedules ● Week 1 ○ Course introduction ● Week 2 ○ Audio data representations ● Week 3 ○ Machine learning review: supervised learning ● Week 4 ○ Machine learning review: unsupervised learning

  27. Schedules ● Week 5 ○ Chuseok (no class) ● Week 6 ○ Convolutional neural network (CNN): music classification and tagging ● Week 7 ○ Recurrent neural network (RNN): automatic music transcription ● Week 8 ○ Break (no class)

  28. Schedules ● Week 9 ○ Auto-encoder, U-net: source separation ● Week 10 ○ Variational auto-encoder (VAE), generative adversarial network (GAN): music generation and sound synthesis ● Week 11 ○ Auto-regressive models: music generation and sound synthesis ● Week 12 ○ Transformer: music transcription and generation

  29. Schedules ● Week 13 ○ Invited talk or advanced topics (TBD) ● Week 14 ○ Invited talk or advanced topics (TBD) ● Week 15 ○ TBD ● Week 16 ○ Final project presentations

  30. Prerequisites ● Linear Algebra ● Probability and Statistics ● Basic understanding of machine learning and deep learning ● Digital Signal Processing: digital filters, discrete Fourier transform, and spectral analysis ● Programming Language: Python

  31. Software ● Audio processing: Librosa ● Machine learning and deep learning: Scikit-learn, PyTorch ● And more…

  32. Grading ● 4 assignments: 50% ● Final project (paper review, presentation and report): 50%
