  1. GCT634: Musical Applications of Machine Learning Music Classification Overview and Audio Features Graduate School of Culture Technology, KAIST Juhan Nam

  2. Outline
     • Definition of Music Classification Tasks
     • Overview of Music Classification Systems
     • Audio Features

  3. Definition
     • Categorizing input audio into labels
       - Labels can be anything, even including note, chord, or beat notations
       - However, we limit them to semantic words such as genre, mood, instrument, era, and other word-based descriptions
     [Diagram: Input → Model → Output]

  4. Types of Music Classification Tasks
     • Genre/Mood classification
       - Classify music clips into a category
       - Single-label classification
     • Instrument identification
       - Can be recast as a classification problem
       - Polyphonic cases: pre-dominant instrument detection (single-label classification) or multiple instrument detection (multi-label classification)
     • Music auto-tagging
       - Labels can be anything (e.g. genre, mood, instrument, era, vocal quality)
       - Multi-label classification

  5. Music Genre
     • Numerous genres and their sub-genres
       - http://research.google.com/bigpicture/music/
       - http://en.wikipedia.org/wiki/List_of_popular_music_genres
     • Evolutionary and influence-based
       - https://frananddavesmusicaladventure.wordpress.com/the-music-tree/
       - http://www.historyshots.com/rockmusic/
       - http://techno.org/electronic-music-guide/
     • Based on cultural context
       - Many cultural communities (or countries with a homogeneous culture) have different genre distributions
       - Unique genres (e.g. trot) and different popularity (e.g. metal)

  6. Genre Categories in MIREX
     • MIREX (Music Information Retrieval Evaluation eXchange)
       - Community-based algorithm evaluation framework and events
     • US Pop Genre Classification: Blues, Jazz, Country/Western, Baroque, Classical, Romantic, Electronica, Hip-Hop, Rock, HardRock/Metal
     • Latin Genre Classification: Axe, Bachata, Bolero, Forro, Gaucha, Merengue, Pagode, Salsa, Sertaneja, Tango
     • K-pop Classification: Ballad, Dance, Folk, Hip-hop, R&B, Rock, Trot
     http://www.music-ir.org/mirex/wiki/2017:Audio_Classification_(Train/Test)_Tasks

  7. Music Mood Models in Music Psychology
     • Russell's circumplex model of affect
       - Dimensional model: "Arousal-Valence" 2D space
     (Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39: 1161-1178.)

  8. Mood Label Clustering
     • Mood clustering
       - Using mood labels for albums and songs from allmusic.com
       - Song-by-mood matrix → mood-by-mood correlation matrix → clustering into clusters C1-C5
     (Hu, X., & Downie, J. S. (2007). Exploring Mood Metadata: Relationships with Genre, Artist and Usage Metadata. In Proc. ISMIR.)

  9. Mood Categories in MIREX
     • The five clusters are used in the MIREX mood classification task
       - Cluster_1: passionate, rousing, confident, boisterous, rowdy
       - Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
       - Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
       - Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
       - Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral
     http://www.music-ir.org/mirex/wiki/2017:Audio_Classification_(Train/Test)_Tasks

  10. Overview of Music Classification Systems
     [Diagram: Audio Representations → Feature Extraction → Classifier → "Classical" / "Jazz" / "Metal"]

  11. Overview of Music Classification Systems
     [Diagram: Audio Representations → Feature Extraction → Classifier → "Classical" / "Jazz" / "Metal"]
     • Audio representations
       - Low-level representation of audio
       - Preserve the majority of information in the input data
       - e.g. waveform, spectrogram, mel-spectrogram
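A minimal sketch of how these three representations can be computed, assuming the librosa library and a hypothetical file path; the parameter values (n_fft, hop_length, n_mels) are illustrative choices, not prescribed by the slides:

    import numpy as np
    import librosa

    # Waveform: raw audio samples (hypothetical file path).
    y, sr = librosa.load("example.wav", sr=22050)

    # Spectrogram: magnitude of the short-time Fourier transform.
    spec = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

    # Mel-spectrogram: spectrogram mapped onto a mel-frequency axis.
    melspec = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                             hop_length=512, n_mels=128)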

  12. Overview of Music Classification Systems
     [Diagram: Audio Representations → Feature Extraction → Classifier → "Classical" / "Jazz" / "Metal"]
     • Feature extraction
       - Summary of acoustic or musical patterns that explain the characteristics of the audio representations
       - e.g. MFCC, chroma, learning-based feature representations

  13. Overview of Music Classification Systems
     [Diagram: Audio Representations → Feature Extraction → Classifier → "Classical" / "Jazz" / "Metal"]
     • Classifiers
       - Determine the category based on the extracted features
       - A learning algorithm is necessary: e.g. SVM, GMM, NN
       - Two phases: training and testing (see the sketch below)
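A minimal training/testing sketch with an SVM classifier (one of the learners named on the slide), assuming scikit-learn and using randomly generated feature vectors as stand-in data:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Stand-in data: 200 clips, 40-dimensional feature vectors, 3 genre labels.
    X = np.random.randn(200, 40)
    y = np.random.randint(0, 3, size=200)

    # Training phase: fit the classifier on one split of the data.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X_train, y_train)

    # Testing phase: predict categories for unseen clips.
    print("accuracy:", clf.score(X_test, y_test))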

  14. It is important to extract good audio features!
     [Figure: two feature spaces with "Classical", "Jazz", and "Metal" clips — poorly separated classes (bad features) vs. well-separated classes (good features)]

  15. Let’s listen to examples
     • What is the genre of the music?
     • What is the mood of the music?
     • What are the features of the music that explain your answers?

  16. Human Knowledge to Explain Music
     • Acoustic level
       - Loudness
       - Pitch
       - Timbre
     • Musical level
       - Instrumentation
       - Rhythm
       - Key and scale
       - Chord and melodic pattern
       - Lyrics, structure, singing style, …

  17. Two Approaches in Music Classification
     [Diagram: Audio Representations → Feature Extraction → Classifier]
     • Feature engineering
       - Features are designed based on domain knowledge and heuristics
       - Traditional approach: e.g. MFCC+GMM model
     • Feature learning
       - Features are learned using optimization algorithms
       - Recent approach: e.g. deep neural networks
     Let’s focus on the feature engineering approach first!

  18. Feature Engineering Model
     • Feature extraction is divided into several steps (a sketch of this pipeline follows below)
     [Pipeline: Audio → Frame-Level Audio Features → Temporal Summarization → Normalization] (G. Tzanetakis)
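A sketch of the last two stages, assuming frame-level feature matrices are already available (shape: frames × dimensions); mean/standard-deviation summarization and z-score normalization are common choices, not the only ones:

    import numpy as np

    def summarize_and_normalize(frame_features_per_clip):
        """frame_features_per_clip: list of (n_frames, n_dims) arrays, one per clip."""
        # Temporal summarization: collapse the time axis with mean and std.
        clip_features = np.stack([
            np.concatenate([f.mean(axis=0), f.std(axis=0)])
            for f in frame_features_per_clip
        ])
        # Normalization: z-score each dimension across the whole dataset.
        mu = clip_features.mean(axis=0)
        sigma = clip_features.std(axis=0) + 1e-8
        return (clip_features - mu) / sigma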

  19. (Frame-Level) Audio Features
     • Loudness
       - Root-Mean-Square (RMS) of audio frames
     • Timbre features
       - Zero-crossing rate
       - MFCC (w/ delta or double-delta): spectral envelope
       - Spectral summary: centroid, roll-off, …
     • Pitch/Harmony features
       - Chroma
     • Rhythm features (these are not frame-level)
       - Beat histogram, tempogram
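Most of these features have ready-made implementations; a sketch using librosa (assumed here, with a hypothetical file path), each call returning one value or vector per frame:

    import librosa

    y, sr = librosa.load("example.wav")                        # hypothetical clip

    rms      = librosa.feature.rms(y=y)                        # loudness per frame
    zcr      = librosa.feature.zero_crossing_rate(y)           # zero-crossing rate
    mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # spectral envelope
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # spectral summary
    rolloff  = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.85)
    chroma   = librosa.feature.chroma_stft(y=y, sr=sr)         # pitch/harmony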

  20. Zero-Crossing Rate (ZCR)
     • ZCR is low for harmonic (voiced) sounds and high for noisy (unvoiced) sounds
       - Useful for classifying different drum sounds (e.g. bass, snare, hi-hat)
     • For narrow-band periodic signals, it is related to the F0
     [Figure: waveforms of unvoiced vs. voiced segments]
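A from-scratch sketch of the ZCR for a single frame (a NumPy array of samples): the fraction of adjacent sample pairs whose signs differ. The variable names and frame length are illustrative.

    import numpy as np

    def zero_crossing_rate(frame):
        # A crossing occurs wherever consecutive samples change sign.
        signs = np.sign(frame)
        crossings = np.abs(np.diff(signs)) > 0
        return crossings.mean()

    # Noisy frames give a high ZCR, harmonic frames a low one.
    noise = np.random.randn(1024)
    tone  = np.sin(2 * np.pi * 220 * np.arange(1024) / 22050)
    print(zero_crossing_rate(noise), zero_crossing_rate(tone))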

  21. Spectral Statistics
     • Spectral centroid: "center of gravity" of the spectrum
       - Associated with the brightness of sounds
       - SC(t) = \frac{\sum_k f_k X_t(k)}{\sum_k X_t(k)}
     • Spectral roll-off R_t: the frequency below which 85% (or 95%) of the spectral energy is concentrated
       - \sum_{k=1}^{R_t} X_t(k) = 0.85 \sum_{k=1}^{N} X_t(k)
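A sketch of the two formulas for one magnitude spectrum X with its bin frequencies f (both NumPy arrays); function names are mine, and the 0.85 threshold follows the slide:

    import numpy as np

    def spectral_centroid(X, f):
        # Magnitude-weighted mean frequency ("center of gravity").
        return np.sum(f * X) / np.sum(X)

    def spectral_rolloff(X, f, fraction=0.85):
        # Smallest frequency below which `fraction` of the spectral energy lies.
        cumulative = np.cumsum(X)
        k = np.searchsorted(cumulative, fraction * cumulative[-1])
        return f[k]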

  22. Examples of Spectral Centroids
     [Figure: spectrograms (frequency [Hz] vs. time [sec]) with spectral centroid overlaid — Classical: "Beethoven String Quartet" vs. Pop: "Video Killed the Radio Star"]

  23. Spectral Statistics
     • Spectral spread (SS): a measure of the bandwidth of the spectrum
       - SS(t) = \frac{\sum_k (f_k - SC(t))^2 X_t(k)}{\sum_k X_t(k)}
     • Spectral flatness (SF): a measure of the noisiness of the spectrum
       - The ratio between the geometric and arithmetic means
       - SF(t) = \frac{\sqrt[K]{\prod_k X_t(k)}}{\frac{1}{K}\sum_k X_t(k)}
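A sketch of spectral spread and flatness following the slide's formulas, again for a single magnitude spectrum; a small epsilon (my addition) guards the logarithm used for the geometric mean:

    import numpy as np

    def spectral_spread(X, f):
        centroid = np.sum(f * X) / np.sum(X)
        # Magnitude-weighted squared deviation from the centroid.
        return np.sum((f - centroid) ** 2 * X) / np.sum(X)

    def spectral_flatness(X, eps=1e-10):
        # Geometric mean over arithmetic mean; near 1 for noise, near 0 for tones.
        geometric = np.exp(np.mean(np.log(X + eps)))
        arithmetic = np.mean(X)
        return geometric / arithmetic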

  24. Mel-Frequency Cepstral Coefficients (MFCC)
     • The most popular audio feature for timbre feature extraction
       - Extracts the spectral envelope of an audio frame
       - Standard audio feature in speech recognition
       - Introduced to the music domain by Logan in 2000
     • Computation steps (sketched below): DFT (audio frame) → mapping the frequency scale to mel → log magnitude → DCT
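A sketch of the four steps for one audio frame, assuming librosa for the mel filterbank and scipy for the DCT; the frame length, filterbank size, and 13 retained coefficients are conventional choices rather than values from the slides:

    import numpy as np
    import librosa
    from scipy.fftpack import dct

    def mfcc_frame(frame, sr=22050, n_fft=2048, n_mels=40, n_mfcc=13):
        # 1. DFT of the (windowed) audio frame.
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft))
        # 2. Map the frequency scale to mel with a triangular filterbank.
        mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
        mel_spectrum = mel_fb @ spectrum
        # 3. Log magnitude.
        log_mel = np.log(mel_spectrum + 1e-10)
        # 4. DCT, keeping only the first few coefficients (the MFCCs).
        return dct(log_mel, type=2, norm="ortho")[:n_mfcc]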

  25. Mel-Frequency Spectrogram
     • Converts linear frequency to the mel scale
     • Usually reduces the dimensionality of the spectrum
     [Figure: spectrum vs. mel-scaled spectrum]

  26. Discrete Cosine Transform
     • Real-valued transform, similar to the DFT
       - De-correlates the mel-scaled log spectrum and reduces the dimensionality again
       - X_{DCT}(k) = \frac{2}{N} \sum_{n=1}^{N-1} x(n) \cos\!\left(\frac{\pi k}{N}(n - 0.5)\right)
     [Figure: mel-scaled spectrum vs. MFCC]

  27. Reconstructed Frequency Spectrum from MFCC
     [Figure: frequency spectrum (512 bins) → mel-scaled frequency spectrum (60 bins) → MFCC (13 dim) → reconstructed mel-scaled spectrum → reconstructed frequency spectrum]
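A sketch of how such a reconstruction can be approximated: zero-pad the truncated MFCCs, invert the DCT to recover an approximate mel-scaled log spectrum, then map back to the linear-frequency axis with the pseudo-inverse of the mel filterbank. This assumes the same filterbank settings as the earlier mfcc_frame() sketch and mainly illustrates the information lost by keeping only 13 coefficients.

    import numpy as np
    import librosa
    from scipy.fftpack import idct

    def reconstruct_spectrum(mfccs, sr=22050, n_fft=2048, n_mels=40):
        # Zero-pad the truncated coefficients back to the filterbank size.
        padded = np.zeros(n_mels)
        padded[:len(mfccs)] = mfccs
        # Inverse DCT gives a smoothed mel-scaled log spectrum.
        log_mel = idct(padded, type=2, norm="ortho")
        mel_spectrum = np.exp(log_mel)
        # Pseudo-inverse of the mel filterbank maps back to linear frequency.
        mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
        return np.maximum(np.linalg.pinv(mel_fb) @ mel_spectrum, 0.0)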

  28. Comparison of Spectrogram and MFCC
     [Figure: spectrogram, mel-frequency spectrogram, MFCC, and the spectrogram reconstructed from MFCC]

  29. Sound Examples of MFCC
     • Original recording
     • MFCC reconstruction (using white noise as a source)

  30. Post-processing
     • Adding temporal dynamics
       - Short-term dynamics of features are characterized with delta or double-delta features
       - \Delta x(n) = x(n) - x(n - h), \quad \Delta\Delta x(n) = \Delta x(n) - \Delta x(n - h)
       - 39 MFCCs in speech recognition: 13 MFCCs + 13 delta + 13 double-delta
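A sketch of delta and double-delta computation over an MFCC matrix (frames × coefficients), using the simple finite difference from the slide with lag h; librosa.feature.delta offers a smoother alternative. The stand-in MFCC matrix is random data for illustration only.

    import numpy as np

    def delta(features, h=1):
        # Simple finite difference: x(n) - x(n - h), padded at the start.
        shifted = np.vstack([features[:h], features[:-h]])
        return features - shifted

    # 13 MFCCs + 13 delta + 13 double-delta = 39-dimensional frames.
    mfcc = np.random.randn(100, 13)          # stand-in MFCC matrix
    d1 = delta(mfcc)
    d2 = delta(d1)
    features_39 = np.hstack([mfcc, d1, d2])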

  31. Pitch and Chroma
     • The basic assumption in tonal harmony is that notes an octave apart belong to the same pitch class
       - No dissonance among them
       - As a result, there are 12 pitch classes
     • Shepard represented octave equivalence with the "pitch helix"
       - Chroma: represents the inherent circularity of pitch organization
       - Height: increases continuously, moving up one octave per rotation
     [Figure: pitch helix and chroma (Shepard, 2001)]

  32. Pitch and Chroma
     • Chroma is independent of the height
       - Shepard tone: a single pitch class spread across octave-spaced harmonics
       - Produces the illusion of constantly rising or falling pitch (cf. the optical-illusion staircase)
     https://vimeo.com/34749558
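A sketch of a basic chroma computation: map each STFT bin's frequency onto one of the 12 pitch classes and sum the magnitudes that land in each class, folding out the "height" dimension and keeping only chroma. librosa.feature.chroma_stft provides a more refined version; the function name, parameter values, and A4 = 440 Hz reference here are assumptions.

    import numpy as np
    import librosa

    def simple_chroma(y, sr, n_fft=2048, hop_length=512):
        spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
        freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
        chroma = np.zeros((12, spec.shape[1]))
        for k in range(1, len(freqs)):          # skip the DC bin
            # Fold the bin frequency onto its pitch class (octave equivalence).
            pitch_class = int(np.round(12 * np.log2(freqs[k] / 440.0))) % 12
            chroma[pitch_class] += spec[k]
        return chroma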
