GCT634/AI613: Musical Applications of Machine Learning (Fall 2020)
Machine Learning for Music: Intro
Juhan Nam
Definition of Machine Learning
● Tom M. Mitchell provided a widely accepted definition:
○ “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”
Definition of Machine Learning
● Tasks T
○ Classification, regression, transcription, machine translation, structured output, anomaly detection, synthesis and sampling, imputation of missing values, denoising, and density estimation (listed from the DL book)
● Experience E
○ Data and their correspondence: supervised learning / unsupervised learning / reinforcement learning
● Performance P
○ Loss function, accuracy metrics
In Musical Context
● Tasks T
○ Analysis tasks: music genre/mood classification, music auto-tagging, automatic music transcription, source separation
○ Synthesis tasks: sound synthesis, music generation (automatic music composition or arrangement), expressive performance rendering
● Experience E
○ Music data (audio, MIDI, text, images) and their correspondence
● Performance P
○ Objective measures: loss function, accuracy metrics (e.g., F-score)
○ Subjective measures: user test (i.e., human test)
Classification Tasks in Music
● Classification is the most commonly used supervised learning approach in music analysis tasks
○ Train the model with audio data and its class labels, and then predict labels from new test audio
[Figure: audio → Classification Model → “C2” / “C#2” / “D2” …: Pitch Estimation (frame-level)]
Classification Tasks in Music
● Classification is the most commonly used supervised learning approach in many music analysis tasks
○ Train the model with audio data and its class labels, and then predict labels from new test audio
[Figure: audio → Classification Model → “Piano” / “Drum” / “Guitar” …: Instrument Recognition (note-level)]
Classification Tasks in Music
● Classification is the most commonly used supervised learning approach in music analysis tasks
○ Train the model with audio data and its class labels, and then predict labels from new test audio
[Figure: audio → Classification Model → “Jazz” / “Metal” / “Classical” …: Genre Classification (segment-level)]
Classification Model for Music
● The classification models are formed with the following steps in common
○ Audio data representation: waveforms, spectrogram, mel-spectrogram
○ Feature extraction: highly depends on the task and the abstraction level
■ Higher-level tasks require longer input sizes and more complex features
○ Classifier: measures the distance between the feature vector and class templates for the final classification
[Figure: Audio Data Representation → Feature Extraction → Classifier → “Class #1” / “Class #2” / “Class #3” … (Classification Model)]
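The three-stage pipeline above can be sketched in a few lines of numpy. This is a hypothetical toy, not a real music classifier: the "feature extraction" here is just a mean/std summary of frames, and the classifier is the nearest class template by Euclidean distance, as described in the slide.

```python
import numpy as np

def extract_features(frames):
    """Summarize a (num_frames, dim) representation into one feature vector.

    Toy stand-in for real feature extraction (e.g., MFCC statistics).
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def nearest_template(feature, templates):
    """Return the class whose template vector is closest in Euclidean distance."""
    dists = {c: np.linalg.norm(feature - t) for c, t in templates.items()}
    return min(dists, key=dists.get)

# Illustrative class templates in the 8-dim feature space (4 means + 4 stds)
templates = {
    "Class #1": np.zeros(8),
    "Class #2": np.concatenate([np.full(4, 3.0), np.zeros(4)]),
}

# Fake "audio representation": 10 frames of 4-dim data centered near 3.0
rng = np.random.default_rng(0)
frames = rng.normal(loc=3.0, scale=0.1, size=(10, 4))

feature = extract_features(frames)
print(nearest_template(feature, templates))  # -> Class #2
```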
Classification Model for Music
● It is important to extract good audio features!
[Figure: two feature spaces with “Classical”, “Jazz”, and “Metal” examples: good features separate the classes, bad features leave them mixed]
Classification Model for Music
● Traditional machine learning
● Deep learning
Traditional Machine Learning
● Use hand-designed features for the task
○ Based on domain knowledge (e.g., acoustics, signal processing)
○ Mel-frequency cepstral coefficients (MFCC), chroma, spectral statistics
● Use standard classifiers
○ Logistic regression, support vector machine, multi-layer perceptron
[Figure: Audio Data Representation → Hand-designed Features → Classifier (learning algorithm) → “Class #1” / “Class #2” / “Class #3” … (Classification Model)]
Traditional Machine Learning
● Advantages
○ A small dataset is fine
○ The classifiers are fast to train
○ The hand-designed features are interpretable
● Disadvantages
○ Requires domain knowledge
○ The feature design is an art
○ The two-stage approach is sub-optimal
● Good as a baseline algorithm
Deep Learning
● Learn feature representations using neural network modules
○ Better to call it representation learning
○ Fully-connected, convolutional, recurrent, pooling, and non-linear layers
○ Stack more layers as the output has a higher abstraction level
○ The audio data representation can also be learned (end-to-end learning)
○ Gradient-based learning: all neural network modules are differentiable. We can also add a new custom layer as long as it is differentiable
[Figure: Audio Data Representation → Neural Network Modules (learned features via feature embedding) → Linear Classifier → “Class #1” / “Class #2” / “Class #3” … (Classification Model)]
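The idea of stacking modules can be made concrete with a minimal numpy forward pass: two linear layers with a ReLU between them, ending in a softmax classifier. This is a sketch only; the weights below are random placeholders and untrained, and the layer sizes (128-64-3) are arbitrary choices for illustration.

```python
import numpy as np

def linear(x, W, b):
    """A linear (fully-connected) module."""
    return x @ W + b

def relu(x):
    """A pointwise non-linear module."""
    return np.maximum(0.0, x)

def softmax(z):
    """Turn class scores into probabilities (numerically stabilized)."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(42)
x = rng.normal(size=128)                         # e.g., one spectrogram frame

# Random placeholder weights; in practice these are learned by gradient descent
W1, b1 = rng.normal(size=(128, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(64, 3)) * 0.1, np.zeros(3)

h = relu(linear(x, W1, b1))          # learned feature embedding
probs = softmax(linear(h, W2, b2))   # linear classifier over 3 classes
print(probs.shape)                   # (3,)
```

Every module here is differentiable, which is what makes end-to-end gradient-based training of the whole stack possible.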
Deep Learning
● Advantages
○ Less domain knowledge is required; we can borrow many successful models from other domains (e.g., image or speech)
○ The trained model is reusable (transfer learning)
○ Superior performance in numerous machine learning tasks
Deep Learning
● Disadvantages (or challenges)
○ Requires a large-scale labeled dataset, and the models are slow to train
■ Semi-supervised, unsupervised, and self-supervised learning are actively developed
○ Requires regularization to avoid overfitting
■ Many regularization techniques have been studied
○ Designing neural nets and searching hyperparameters is an art
■ Model and hyperparameter optimization is another research topic: e.g., AutoML
○ Understanding learned features is hard
■ Feature visualization techniques
■ Disentangled learning models where one parameter controls one sub-dimension of learned features
Example: Mel-Frequency Cepstral Coefficients (MFCC)
● The most popularly used audio feature to capture “timbre”
○ Extracts the spectral envelope from an audio frame, removing pitch information
○ The standard audio feature in legacy speech recognition systems
● Computation steps
○ Mel-spectrum: apply a mel filterbank to the magnitude spectrum
○ Discrete cosine transform (DCT): a small set of cosine kernels with low frequencies. It captures the slowly varying trend of the mel-spectrum over frequency, which corresponds to the spectral envelope
[Figure: DFT → abs (magnitude spectrum) → Mel Filterbank (mel-spectrum) → log compression → DCT → MFCC]
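The computation steps above can be sketched with numpy alone: |DFT| → mel filterbank → log → DCT. The filterbank construction below follows the common triangular-filter recipe with the 2595·log10(1 + f/700) mel formula; the parameter choices (16 kHz, 512-point frame, 40 mel bands, 13 coefficients) are illustrative, not canonical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters mapping n_fft//2+1 DFT bins to n_filters bands."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_matrix(n_out, n_in):
    """DCT-II basis: low-frequency cosine kernels over the mel axis."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

sr, n_fft = 16000, 512
frame = np.sin(2 * np.pi * 440 * np.arange(n_fft) / sr)  # a 440 Hz test tone

mag = np.abs(np.fft.rfft(frame))                 # abs(DFT): magnitude spectrum
mel = mel_filterbank(40, n_fft, sr) @ mag        # mel-spectrum (40 bands)
log_mel = np.log(mel + 1e-8)                     # log compression
mfcc = dct_matrix(13, 40) @ log_mel              # 13-dim MFCC
print(mfcc.shape)  # (13,)
```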
Example: Mel-Frequency Cepstral Coefficients (MFCC)
[Figure: forward path: magnitude spectrum (512 bins) → mel filterbank → mel-scaled spectrum (60 bins) → DCT → MFCC (13 dim); inverse path: inverse DCT → reconstructed mel-spectrum → inverse mel filterbank → reconstructed magnitude spectrum]
Example: Mel-Frequency Cepstral Coefficients (MFCC)
[Figure: spectrogram, mel-frequency spectrogram, MFCC, and the spectrogram reconstructed from MFCC]
Representation Learning Point of View: MFCC
● We can replace the hand-designed modules with trainable modules
○ The DFT, mel filterbank, and DCT are linear transforms
○ Abs and log compression are non-linear functions
○ In MFCC the linear transforms are designed by hand, but they can be optimized further as trainable modules
[Figure: MFCC pipeline (DFT → abs → Mel Filterbank → log compression → DCT) viewed as a deep neural network: Linear Transform → Non-linear function → Linear Transform → Non-linear function → Linear Transform]
Example: Chroma
● Musical notes are denoted with a pitch class and an octave number
○ Pitch class: C, C#, D, D#, E, F, F#, G, G#, A, A#, B
○ Octave number: 0, 1, 2, 3, 4, 5, …
○ Example: C4 (middle C), E3, G5
● The octave difference is the most consonant pitch interval
○ Therefore, notes an octave apart belong to the same pitch class
● This can be represented with a “pitch helix”
○ Chroma: the inherent circularity of pitch organization
○ Height: increases naturally, rising one octave per rotation
[Figure: Pitch Helix and Chroma (Shepard, 2001)]
Example: Chroma
● Compute the energy distribution of an audio frame over the 12 pitch classes
○ Convert a frequency f to a MIDI note number (p = 12·log2(f/440) + 69) and take the pitch class from the note (e.g., 69 → A4 → A)
○ Extracts harmonic characteristics while removing timbre information
○ Useful in music synchronization, chord recognition, music structure analysis, and music genre classification
● Computation steps
○ Project the DFT or constant-Q transform onto the 12 pitch classes
[Figure: DFT or Constant-Q Transform → abs (magnitude) → Chroma Mapping → Chroma]
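The frequency-to-pitch-class conversion above is short enough to write out directly: p = 12·log2(f/440) + 69 gives the MIDI note number, and p mod 12 gives the pitch class. This is only the per-frequency mapping; a full chroma feature would additionally weight each pitch class by the spectral energy projected onto it.

```python
import numpy as np

# Pitch class index 0..11 maps to C..B; MIDI note 60 is C4 (middle C)
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def freq_to_midi(f):
    """Convert a frequency in Hz to the nearest MIDI note number."""
    return int(round(12 * np.log2(f / 440.0) + 69))

def freq_to_pitch_class(f):
    """Convert a frequency in Hz to its pitch class name."""
    return PITCH_CLASSES[freq_to_midi(f) % 12]

print(freq_to_midi(440.0))          # 69 (A4)
print(freq_to_pitch_class(440.0))   # A
print(freq_to_pitch_class(880.0))   # A  (one octave up, same pitch class)
print(freq_to_pitch_class(261.63))  # C  (middle C, C4)
```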
Example: Chroma
[Figure: spectrogram → chroma mapping → chromagram (reconstructed chroma: Shepard tone)]
Representation Learning Point of View: Chroma
● We can replace the hand-designed modules with trainable modules
○ The DFT, constant-Q transform, and chroma mapping are linear transforms
○ Abs corresponds to a non-linear function
○ In chroma the linear transforms are designed by hand, but they can be optimized further as trainable modules
[Figure: chroma pipeline (DFT or Constant-Q Transform → abs → Chroma Mapping) viewed as a deep neural network: Linear Transform → Non-linear function → Linear Transform]
Summary
● Introduced machine learning from the perspective of representation learning (or feature learning)
● In the traditional machine learning approach, we design feature representations by hand. Once the features are extracted, we use standard machine learning algorithms.
● In the deep learning approach, we design the network architecture by hand. The feature representations are learned through the neural network modules and the optimization.