GCT535 Sound Technology for Multimedia: Tonal Analysis




  1. GCT535 - Sound Technology for Multimedia: Tonal Analysis
     Graduate School of Culture Technology, KAIST
     Juhan Nam

  2. Outline
     § Pitch Perception
       – Perceptual Pitch Scale
       – Log-Scaled Spectrum
     § Tonal Analysis
       – Chroma Feature
       – Key Estimation
       – Chord Recognition

  3. Frequency Scale in Spectrogram
     § Linear frequency scale
       – Great for seeing the harmonic structure of a single tone.
       – However, it is not the most intuitive way to visualize musical signals.
     [Figure: linear-frequency spectrograms (frequency in Hz vs. time in seconds) of the Beatles' “Hey Jude” and of a piano chromatic scale]

  4. Human Pitch Perception
     § Human ears are sensitive to frequency changes on a log scale
       – Pitch resolution: the just noticeable difference (JND) increases as the frequency goes up
       – Place theory: resonance position along the basilar membrane in the cochlea
     [Figure: response of the basilar membrane to a pair of tones. From CCRMA Music 150 slides (Thomas Rossing)]

  5. Critical Bandwidth
     § Frequency bandwidth within which one tone interferes with the perception of another tone by auditory masking
       – Roughly constant at low frequencies, but growing approximately linearly with frequency at high frequencies
     [Figure from CCRMA Music 150 slides (Thomas Rossing)]

  6. Psychoacoustical Pitch Scales
     § Mel scale
       – Based on pitch ratio of tones (mel from “melody”)
       – $m = 2595 \log_{10}(1 + f/700)$
     § Bark scale
       – Critical band measurement by masking
       – $\mathrm{Bark} = 13 \arctan(0.00075 f) + 3.5 \arctan((f/7500)^2)$
     § Equivalent Rectangular Bandwidth (ERB) rate
       – Critical band measurement using the notched-noise method
       – $\mathrm{ERBS} = 21.4 \log_{10}(1 + 0.00437 f)$
     [Figure: Comparison of Pitch Scales; normalized ERB, Mel, and Bark scales vs. frequency (Hz). Using Matlab code from https://www.speech.kth.se/~giampi/auditoryscales/]
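
The three formulas on this slide translate directly into code. The following is a minimal sketch; the function names are illustrative rather than taken from the course material, and the constants are copied from the slide as given.

```python
import math

def hz_to_mel(f):
    """Mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    """Bark scale, using the constants as printed on the slide."""
    return 13.0 * math.atan(0.00075 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def hz_to_erbs(f):
    """ERB-rate scale: ERBS = 21.4 * log10(1 + 0.00437 * f)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

if __name__ == "__main__":
    for f in (100.0, 1000.0, 10000.0):
        print(f, hz_to_mel(f), hz_to_bark(f), hz_to_erbs(f))
```

All three scales map 0 Hz to 0 and grow monotonically with frequency, which is why they can be overlaid after normalization as in the comparison figure.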

  7. Musical Pitch Scale
     § Equal temperament
       – $1 : 2^{1/12}$ ratio between two adjacent notes
       – Music note ($m$) and frequency ($f$) in Hz: $m = 12 \log_2(f/440) + 69$, $f = 440 \cdot 2^{(m-69)/12}$
     https://newt.phys.unsw.edu.au/jw/notes.html
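
As a quick check of the two conversion formulas, here is a small sketch. The function names are illustrative; the reference A4 = 440 Hz = note 69 comes from the slide.

```python
import math

def hz_to_midi(f):
    """Fractional note number: m = 12 * log2(f / 440) + 69."""
    return 12.0 * math.log2(f / 440.0) + 69.0

def midi_to_hz(m):
    """Frequency in Hz: f = 440 * 2 ** ((m - 69) / 12)."""
    return 440.0 * 2.0 ** ((m - 69.0) / 12.0)

if __name__ == "__main__":
    print(hz_to_midi(440.0))   # note number of A4
    print(midi_to_hz(60))      # frequency of middle C
```

The two functions are inverses of each other, so converting a note number to Hz and back returns the same note number up to floating-point error.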

  8. Frequency Mapping Using Spectrogram
     § Mapping a linear scale to a perceptual (log-like) scale
       – Locate center frequencies according to the frequency mapping
       – Linear interpolation around each center frequency, with a skirt set by the corresponding bandwidth
     [Figure: linear-frequency spectrogram (frequency in Hz vs. time in seconds) and log-frequency spectrogram (MIDI note number vs. time in seconds), with center frequency and bandwidth annotated]

  9. Frequency Mapping Using Spectrogram
     § The mapping can be formed as a matrix multiplication
       – Each column of the mapping matrix contains the interpolation coefficients
       – $Y = M \cdot X$ ($M$: mapping matrix, $X$: spectrogram, $Y$: scaled spectrogram)
     [Figure: mapping matrix × linear-frequency spectrogram (frequency in Hz vs. time in seconds) = log-frequency spectrogram (MIDI note number vs. time in seconds)]
     § Limitation
       – Simple, but the time and frequency resolutions are still constrained by the STFT
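
The mapping described on the last two slides can be sketched as follows. This is one possible construction, not necessarily the one used to produce the slide figures: it assumes triangular weights that are one semitone wide on each side of every MIDI note, and the names `log_freq_mapping` and `apply_mapping` are hypothetical.

```python
import math

def log_freq_mapping(n_fft, sr, midi_lo=21, midi_hi=108):
    """Build M with one row per MIDI note and one column per FFT bin.

    Assumption: each FFT bin is shared between its two nearest notes
    by linear interpolation on the (fractional) MIDI note axis.
    """
    n_bins = n_fft // 2 + 1
    M = [[0.0] * n_bins for _ in range(midi_hi - midi_lo + 1)]
    for k in range(1, n_bins):  # skip the DC bin (log of 0 is undefined)
        p = 12.0 * math.log2(k * sr / n_fft / 440.0) + 69.0
        for r, note in enumerate(range(midi_lo, midi_hi + 1)):
            M[r][k] = max(0.0, 1.0 - abs(p - note))
    return M

def apply_mapping(M, frame):
    """Compute one column of Y = M . X for a single magnitude frame."""
    return [sum(w * x for w, x in zip(row, frame)) for row in M]

if __name__ == "__main__":
    M = log_freq_mapping(4096, 22050)
    frame = [0.0] * (4096 // 2 + 1)
    frame[82] = 1.0  # FFT bin near 441 Hz
    y = apply_mapping(M, frame)
    print(21 + y.index(max(y)))  # MIDI note with the most energy
```

With these settings the FFT bins are wider than a semitone in the lowest octaves, so some low-note rows receive no energy at all. That is the resolution limitation noted on the slide.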

  10. Mel-Frequency Spectrogram
     § The mel scale is a popular choice
       – Example: MFCC
     [Figure: linear-frequency spectrogram (frequency in Hz vs. time in seconds) and mel-frequency spectrogram (mel bin vs. time in seconds)]

  11. Constant-Q transform
     § Use a set of sinusoidal kernels with:
       – Logarithmically spaced frequencies
       – Constant Q = frequency/bandwidth
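
To make the two kernel properties concrete, the sketch below lists center frequencies and kernel lengths for a constant-Q analysis. It assumes the bandwidth of each kernel equals the spacing to the next center frequency, which gives Q = 1 / (2^(1/b) - 1) for b bins per octave, and a kernel length of about Q * sr / f_k samples. The function name and parameters are illustrative.

```python
import math

def cqt_parameters(f_min, n_bins, bins_per_octave, sr):
    """Return (Q, [(center frequency in Hz, kernel length in samples), ...])."""
    # Assumption: bandwidth_k = f_{k+1} - f_k, so Q is the same for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    params = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # log-spaced centers
        n_k = math.ceil(q * sr / f_k)               # longer kernels at low f
        params.append((f_k, n_k))
    return q, params

if __name__ == "__main__":
    q, params = cqt_parameters(f_min=55.0, n_bins=48, bins_per_octave=12, sr=22050)
    print(q)
    print(params[0], params[-1])
```

Low-frequency kernels come out long and high-frequency kernels short, which is the varying time-frequency tiling that distinguishes the constant-Q transform from a fixed-window spectrogram.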

  12. Comparison of Different Time-Frequency Representations
     [Figure: four time-frequency tilings (frequency vs. time): Spectrogram (short window), Spectrogram (long window), Constant-Q transform, Mel Spectrogram]

  13. Example of Constant-Q transform
     [Figure: Log-Frequency Spectrogram (mapping) and Log-Frequency Spectrogram (Constant-Q transform), MIDI note number vs. time in seconds]

  14. Chord Recognition in MIR
     § Identifying the chord progression of tonal music
     § It is a challenging task (even for humans)
       – Chords are not explicit in music
       – Non-chord notes or passing notes
       – Key change and chromaticism: requires in-depth knowledge of music theory
       – In audio, multiple musical instruments are mixed
         • Relevant: harmonically arranged notes
         • Irrelevant: percussive sounds (but they can help detect chord changes)
     § What kind of audio features can be extracted to recognize chords in a robust way?

  15. Pitch Helix
     § The basic assumption in tonal harmony is that notes an octave apart belong to the same pitch class
       – No dissonance among them
       – As a result, there are 12 pitch classes
     § Shepard represented the octave equivalence with the “pitch helix”
       – Chroma: represents the inherent circularity of pitch organization
       – Height: increases monotonically, rising one octave per rotation
     [Figure: Pitch Helix and Chroma (Shepard, 2001)]

  16. Chroma
     § Chroma is independent of the height
       – Shepard tone: a single pitch class in the harmonics
       – Sounds as if it is constantly rising or falling
       https://vimeo.com/34749558
     [Figures: Shepard tone; optical illusion stairs]
     § Chroma captures the relative distribution of pitch classes, whereas pitch height acts as a noisy variation in chord recognition
       – Thus, chroma is considered to be well-suited for analyzing harmony.

  17. Chroma Features
     § Chroma features are audio feature vectors that contain the chroma characteristics
       – Ideally, obtained by polyphonic note transcription, but that is too expensive
       – In addition, the more the notes are harmonized, the harder it becomes to separate polyphonic notes
     § In practice, chroma features are obtained by projecting all time-frequency energy onto 12 pitch classes
     § Used not only for chord recognition but also for key estimation, segmentation, synchronization, and cover-song detection

  18. Chroma Features: FFT-based approach
     § Compute the spectrogram and a mapping matrix
       – Convert frequency to the musical pitch scale and get the pitch class
       – Set the entry of the corresponding pitch class to one and all others to zero
       – Adjust the non-zero values so that low-frequency content has more weight
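
The three steps on this slide can be sketched as a 12-row mapping matrix. The slide does not say how the low-frequency emphasis is computed, so the weight used here (a decay with the number of octaves above `f_lo`) is purely an assumption, as are the function names and the frequency range.

```python
import math

def chroma_mapping(n_fft, sr, f_lo=55.0, f_hi=5000.0):
    """Build a 12 x (n_fft // 2 + 1) matrix; row 0 is pitch class C."""
    n_bins = n_fft // 2 + 1
    C = [[0.0] * n_bins for _ in range(12)]
    for k in range(1, n_bins):
        f = k * sr / n_fft
        if f < f_lo or f > f_hi:
            continue  # ignore bins outside the assumed pitch range
        midi = round(12.0 * math.log2(f / 440.0) + 69.0)  # nearest note
        pitch_class = midi % 12                            # MIDI 60 (C4) -> 0
        # Hypothetical weighting: 1.0 at f_lo, smaller for higher octaves.
        C[pitch_class][k] = 1.0 / (1.0 + math.log2(f / f_lo))
    return C

def chroma_frame(C, frame):
    """Project one magnitude-spectrum frame onto the 12 pitch classes."""
    return [sum(w * x for w, x in zip(row, frame)) for row in C]

if __name__ == "__main__":
    C = chroma_mapping(4096, 22050)
    frame = [0.0] * (4096 // 2 + 1)
    frame[82] = 1.0  # FFT bin near 441 Hz, i.e. close to A4
    print(chroma_frame(C, frame))
```

Each column of the matrix has at most one non-zero entry, which is the hard pitch-class assignment described on the slide.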

  19. Improvements
     § Blurring
       – Intrinsic problem with the STFT
       – Solution: find amplitude peaks and use only those
     § De-tuning
       – Notes can deviate from the reference tuning
       – Compute 36-bin chroma features: add two neighboring bins to each pitch class
       – Use only the peak value among the three bins per pitch class
     § Normalization
       – Divide the chroma features of each frame by the local maximum or mean to compensate for volume changes
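
The de-tuning and normalization steps can be sketched as below. The bin layout is an assumption: the 36 bins are taken to be ordered so that bins 3p, 3p+1, and 3p+2 are the flat, in-tune, and sharp bins of pitch class p. Normalization here divides by the maximum of the single frame, which is one reading of "local maximum". Function names are illustrative.

```python
def fold_36_to_12(chroma36):
    """Keep the peak of the three 1/3-semitone bins of each pitch class."""
    if len(chroma36) != 36:
        raise ValueError("expected 36 chroma bins")
    return [max(chroma36[3 * pc:3 * pc + 3]) for pc in range(12)]

def normalize_frame(chroma):
    """Divide a frame by its maximum so that loudness changes cancel out."""
    peak = max(chroma)
    if peak <= 0.0:
        return [0.0] * len(chroma)  # silent frame: nothing to normalize
    return [c / peak for c in chroma]

if __name__ == "__main__":
    frame36 = [0.0] * 36
    frame36[27] = 0.8  # slightly flat A
    frame36[1] = 0.4   # in-tune C
    print(normalize_frame(fold_36_to_12(frame36)))
```

Taking the peak of the three bins keeps a slightly mistuned note in its own pitch class instead of letting its energy leak into the neighboring one.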

  20. Chroma Features: Filter-bank approach
     § Alternatively, a filter bank can be used to get a log-scale time-frequency representation
       – Center frequencies are arranged over the 88 piano notes
       – Bandwidths are set for constant Q and to be robust to ±25 cents of detuning
     § The outputs that belong to the same pitch class are wrapped and summed. (Müller, 2011)
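
The final wrapping step is simple enough to show directly. This sketch assumes the filter-bank output for one frame is a list of 88 band energies ordered from A0 (MIDI note 21) to C8 (MIDI note 108); the function name is hypothetical and the filter bank itself is not implemented here.

```python
def wrap_to_chroma(pitch_energy, midi_lo=21):
    """Sum band energies that share a pitch class (index 0 is C)."""
    chroma = [0.0] * 12
    for i, energy in enumerate(pitch_energy):
        chroma[(midi_lo + i) % 12] += energy
    return chroma

if __name__ == "__main__":
    bands = [0.0] * 88
    bands[0] = 1.0   # A0
    bands[48] = 2.0  # A4
    print(wrap_to_chroma(bands))  # both land in pitch class A (index 9)
```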

  21. Beat-Synchronous Chroma Features
     § Make chroma features homogeneous within a beat (Bartsch and Wakefield, 2001)
     [Figure from Ellis’ slides]
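
One common way to make chroma homogeneous within a beat is to average the frames between consecutive beat positions. The sketch below assumes that approach and assumes the beat positions are already available as sorted frame indices; beat tracking itself is outside its scope, and the function name is illustrative.

```python
def beat_sync_chroma(chroma_frames, beat_frames):
    """Average 12-dimensional chroma frames between consecutive beats.

    chroma_frames: list of frames, each a list of 12 values.
    beat_frames: sorted frame indices where beats start (assumed given).
    Frames before the first beat are ignored.
    """
    bounds = list(beat_frames) + [len(chroma_frames)]
    synced = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = chroma_frames[start:end]
        if not segment:
            continue  # skip empty segments, e.g. a beat past the last frame
        synced.append([sum(col) / len(segment) for col in zip(*segment)])
    return synced

if __name__ == "__main__":
    frames = [[1.0] * 12, [3.0] * 12, [5.0] * 12, [5.0] * 12]
    print(beat_sync_chroma(frames, [0, 2]))
```

The output has one chroma vector per beat instead of one per frame, which also shortens the sequence passed to later stages.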

  22. Key Estimation Overview
     § Estimate the musical key from music data
       – One of 24 keys: 12 pitch classes (C, C#, D, ..., B) × major/minor
     § General Framework (Gomez, 2006)
     [Block diagram with stages: Chroma Features, Similarity Measure (with Key Template), Average Strength, Key (e.g., G major)]

  23. Key Template
     § Probe tone profile (Krumhansl and Kessler, 1982)
       – Relative stability or weight of tones
       – Listeners rated which tones best completed the first seven notes of a major scale.
         • For example, in the key of C major: C, D, E, F, G, A, B, … what?
     [Figure: Probe Tone Profile - Relative Pitch Ranking]

  24. Key Estimation
     § Similarity by cross-correlation between chroma features and templates
     § Find the key that produces the maximum correlation
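
A minimal sketch of this correlation-based key finder follows. It uses the Krumhansl-Kessler probe tone ratings as templates, rotates them to each of the 12 tonics, and picks the key with the highest Pearson correlation. The input is assumed to be a single 12-dimensional chroma vector (for example, an average over a whole piece) with index 0 = C; function names are illustrative.

```python
import math

# Krumhansl-Kessler probe tone profiles, listed from the tonic upward.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var) if var > 0.0 else 0.0

def estimate_key(chroma):
    """Return (key name, correlation) for the best of the 24 keys."""
    best_name, best_r = None, -2.0
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so that its first entry sits on the tonic.
            template = [profile[(pc - tonic) % 12] for pc in range(12)]
            r = pearson(chroma, template)
            if r > best_r:
                best_name, best_r = NOTE_NAMES[tonic] + " " + mode, r
    return best_name, best_r

if __name__ == "__main__":
    g_major_like = [MAJOR[(pc - 7) % 12] for pc in range(12)]
    print(estimate_key(g_major_like))
```

Pearson correlation ignores the overall level and offset of the chroma vector, so no separate normalization is needed before matching.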

  25. Chord Recognition
     § Estimate chords from music data
       – Typically, one of 24 chords: 12 pitch classes × major/minor
       – Often, diminished chords are added (36 chords)
     § General Framework
     [Diagram: Audio → Transform → Chroma Features → Decision Making (template matching, HMM, SVM; using chord templates or models) → Chords]

  26. Template-Based Approach
     § Use chord templates (Fujishima, 1999; Harte and Sandler, 2005) and find the best matches
     § Chord Templates
     [Figure from Bello’s slides]

  27. Template-Based Approach
     § Compute the cross-correlation between chroma features and chord templates and select the chords that have the maximum values
     [Figure from Bello’s slides]
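
Template matching for the 24 major and minor triads can be sketched as follows. The templates are binary (1 for chord tones, 0 otherwise). As a stand-in for the cross-correlation named on the slide, this sketch scores each template with a normalized inner product (cosine similarity). Chord labels and function names are illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary 12-dimensional templates for 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            template = [0.0] * 12
            for interval in intervals:
                template[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = template
    return templates

def recognize_chord(chroma, templates):
    """Return the label of the template most similar to the chroma frame."""
    norm = math.sqrt(sum(c * c for c in chroma)) or 1.0  # avoid divide by 0

    def score(template):
        dot = sum(c * w for c, w in zip(chroma, template))
        return dot / (norm * math.sqrt(sum(w * w for w in template)))

    return max(templates, key=lambda name: score(templates[name]))

if __name__ == "__main__":
    templates = chord_templates()
    c_major = [1.0, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 0]  # C, E, G
    print(recognize_chord(c_major, templates))
```

Applied frame by frame, this yields one chord label per chroma frame with no temporal context.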

  28. Limitations
     § The template approach is too simplistic
       – The binary templates are hard assignments
     § Temporal dependency of chords is not considered
       – Most tonal music follows certain types of chord progressions
     § The recognized chords are not smooth
       – Some post-processing (smoothing) is necessary
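
The slide does not name a specific smoothing method. As one simple possibility, the sketch below applies a sliding-window majority vote to a sequence of per-frame chord labels; the window width and the function name are assumptions.

```python
from collections import Counter

def smooth_labels(labels, width=5):
    """Replace each label by the most frequent label in a centered window."""
    half = width // 2
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half):i + half + 1]
        # On ties, Counter keeps the label that appears first in the window.
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

if __name__ == "__main__":
    raw = ["C", "C", "G", "C", "C", "F", "F", "F"]
    print(smooth_labels(raw, width=3))
```

A vote like this removes isolated single-frame errors but still ignores which chord progressions are likely. Sequence models such as the HMM mentioned in the general framework address that.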

  29. Demo
     § Chordify: https://chordify.net
