GCT535: Sound Technology for Multimedia
Temporal Analysis
Graduate School of Culture Technology, KAIST
Juhan Nam

Outline
§ Temporal Analysis
  – Introduction
  – Human perception of tempo
§ Onset Detection
  – Definition
  – Onset detection functions
§ Tempo Estimation
  – Beat histogram
  – Auto-correlation
  – Comb-filter banks
§ Applications

Introduction
§ Rhythm
  – A strong, regular, repeated pattern of movement or sound
§ The most primitive and foundational element of music
  – Melody, harmony, musical form and other musical elements are arranged on the basis of rhythm
§ Humans and rhythm
  – Humans have an innate ability to perceive rhythm: heartbeat, walking
  – Associated with motor control: dance, work songs

Rhythm Analysis
§ Hierarchical structure of rhythm (meter)
  – Division (tatum): temporal atom, eighth or sixteenth note
  – Beat (tactus): the most prominent level, the foot-tapping rate
  – Measure (bar): the unit of rhythmic pattern (and also of harmonic change)
§ Notations
  – Tempo: rate of the beat, e.g. 90 bpm (beats per minute)
  – Time signature: 4/4, 3/4, 6/8, ...
[Figure source: Wikipedia]

Human Perception of Tempo
§ McKinney and Moelants (2006)
  – Collected tapping data from 40 human subjects
  – Initial synchronization delay and anticipation (by tempo estimation)
  – Ambiguity in tempo: do listeners tap the beat or its division?
[From D. Ellis’ e4896 course slides]

Rhythm Analysis in MIR
§ A process of detecting moments of musical stress (accents) in an acoustic signal and filtering them so that underlying periodicities are discovered
§ Onset Detection
§ Tempo Estimation
§ Beat Tracking
[Diagram: Onset Detection → Tempo Estimation → Beat Tracking; Musical Knowledge (Prior)]

Onset Detection
§ Identify the starting times of musical events
  – Notes, drum sounds
§ Types of onsets
  – Hard onsets: percussive sounds
  – Soft onsets: source-driven sounds (e.g. singing voice, woodwind, bowed strings)
[Figure source: M. Müller]

Example: Onset Detection
[Figure: waveform of “Eat (꺼내먹어요)” by Zion.T; amplitude vs. time [sec]]

Onset Detection
§ Onset Detection Function (ODF)
  – Instantaneous measure of temporal change in a signal
  – Often called a “novelty” function
§ Types of ODFs
  – Time-domain energy
  – Spectral or sub-band energy
  – Phase difference
  – Statistical methods

Time-Domain Onset Detection
§ Local energy
  – Signals usually have high energy at onsets
  – Effective for percussive sounds
  $\mathrm{ODF}(n) = E(n) = \sum_{m=-N}^{N} \big(x(n+m)\,w(m)\big)^2$, where $w(m)$ is a window
[Figure: waveform (left) and onset detection function (right), both vs. time [sec]]

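The local-energy ODF can be sketched in a few lines of NumPy. This is only an illustration: the Hann window, the half-width of 512 samples, the hop of 256 samples and the synthetic noise burst are all assumptions chosen for the example, not values from the slides.

```python
import numpy as np

def local_energy_odf(x, half_width=512, hop=256):
    """ODF(n) = sum_{m=-N}^{N} (x(n+m) * w(m))**2, evaluated every `hop` samples."""
    x = np.asarray(x, dtype=float)
    w = np.hanning(2 * half_width + 1)      # w(m) for m = -N..N (window choice is an assumption)
    padded = np.pad(x, half_width)          # zero-pad so every n has a full window
    centers = np.arange(0, len(x), hop)     # analysis positions n
    return np.array([np.sum((padded[c:c + 2 * half_width + 1] * w) ** 2)
                     for c in centers])

# Toy signal: silence, then a decaying noise burst that starts at sample 8000.
rng = np.random.default_rng(0)
x = np.zeros(16000)
x[8000:12000] = rng.standard_normal(4000) * np.exp(-np.arange(4000) / 1000.0)

odf = local_energy_odf(x)
peak_sample = int(np.argmax(odf)) * 256     # sample position of the largest ODF value
```

The largest ODF value should fall within about one window half-width of the burst onset, which is the behaviour the slide describes for percussive sounds.
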
Time-Domain Onset Detection
§ Local energy with half-wave rectification
  – We are interested in increases of energy at onsets
  – Take the positive differences of the local energy
  $H(r) = \dfrac{r + |r|}{2} = \begin{cases} r, & r \ge 0 \\ 0, & r < 0 \end{cases}$
  $\mathrm{ODF}(n) = H\big(E(n+1) - E(n)\big)$
[Figure: two ODF plots vs. time [sec]]

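A minimal sketch of the rectified energy difference, using a short made-up energy sequence (the numbers are illustrative only):

```python
import numpy as np

def half_wave_rectify(r):
    """H(r) = (r + |r|) / 2: keeps positive values, sets negative ones to zero."""
    r = np.asarray(r, dtype=float)
    return (r + np.abs(r)) / 2.0

def rectified_energy_diff(energy):
    """ODF(n) = H(E(n+1) - E(n)) for a sequence of local energies E."""
    return half_wave_rectify(np.diff(energy))

energy = np.array([0.0, 0.1, 4.0, 3.0, 1.5, 5.0, 2.0])   # hypothetical local energies
odf = rectified_energy_diff(energy)
# Differences are [0.1, 3.9, -1.0, -1.5, 3.5, -3.0]; only the increases survive.
```

Energy decays after each event produce negative differences, which the rectifier discards, so only the rising edges remain in the ODF.
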
Time-Domain Onset Detection
§ Positive differences of log-energy
  – Human perception of sound intensity is logarithmic
  – Note that we often add a small value before taking the log
  $\mathrm{ODF}(n) = H\big(\log E(n+1) - \log E(n)\big)$
[Figure: two ODF plots vs. time [sec]]

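The effect of the logarithm can be illustrated with two made-up energy sequences that contain the same relative jump at very different levels. The offset `eps` is the "small value" added before the log; its size here is an assumption.

```python
import numpy as np

def log_energy_odf(energy, eps=1e-8):
    """ODF(n) = H(log E(n+1) - log E(n)), with a small offset to avoid log(0)."""
    log_e = np.log(np.asarray(energy, dtype=float) + eps)
    return np.maximum(np.diff(log_e), 0.0)   # max(., 0) is the half-wave rectifier H

# Same tenfold energy increase, once in a quiet passage and once in a loud one.
quiet = np.array([0.001, 0.001, 0.01, 0.01])
loud = np.array([1.0, 1.0, 10.0, 10.0])

odf_quiet = log_energy_odf(quiet)
odf_loud = log_energy_odf(loud)
```

Both sequences give nearly the same ODF peak (about log 10), whereas a linear energy difference would make the quiet onset a thousand times smaller than the loud one.
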
Spectral-Based Onset Detection
§ Spectral Flux
  – Sum of the positive differences of the log-compressed spectrogram
  – The ODF changes depending on the amount of compression $\rho$
  $Y(n,k) = \log\big(1 + \rho\,|X(n,k)|\big)$, where $X(n,k)$ is the STFT
  $\mathrm{ODF}(n) = \sum_{k=0}^{N-1} H\big(Y(n+1,k) - Y(n,k)\big)$
[Figure: spectrogram (frequency [kHz] vs. time [sec]) and the resulting ODF vs. time [sec]]

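Spectral flux can be sketched with a plain NumPy STFT. The FFT size, hop size, Hann window, compression constant of 100 and the two-tone test signal are assumptions made for this example.

```python
import numpy as np

def stft_magnitude(x, n_fft=1024, hop=256):
    """|X(n, k)|: magnitude STFT with a Hann window, shape (frames, bins)."""
    x = np.asarray(x, dtype=float)
    w = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_flux(x, compression=100.0, n_fft=1024, hop=256):
    """Sum over bins of the positive frame-to-frame differences of log(1 + c|X|)."""
    Y = np.log1p(compression * stft_magnitude(x, n_fft, hop))
    return np.sum(np.maximum(np.diff(Y, axis=0), 0.0), axis=1)

# Toy signal: a 440 Hz tone throughout, with a 1320 Hz tone entering at t = 1 s.
sr = 8000
t = np.arange(2 * sr) / sr
x = np.sin(2 * np.pi * 440 * t)
x[sr:] += np.sin(2 * np.pi * 1320 * t[sr:])

odf = spectral_flux(x)
onset_frame = int(np.argmax(odf))   # frame index of the largest flux value
```

The total energy changes only moderately when the second tone enters, but the flux peaks in the frames whose windows first overlap sample 8000, because the new spectral component appears in bins that were nearly empty before.
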
Phase Deviation
§ The sinusoidal components of a note are continuous while the note is sustained
  – An abrupt change in phase means that there may be a new event
§ Phase continuation (e.g. during the sustain of a single note):
  $\phi_k(n) - \phi_k(n-1) \approx \phi_k(n-1) - \phi_k(n-2)$
  $\Delta\phi_k(n) = \phi_k(n) - 2\phi_k(n-1) + \phi_k(n-2) \approx 0$
§ Deviation from the steady state, averaged over all frequency bins:
  $\zeta_p = \dfrac{1}{N} \sum_{k=1}^{N} \big|\Delta\phi_k(n)\big|$
[From D. Ellis’ e4896 course slides]

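The phase-continuation property can be checked numerically on a sustained tone. In this sketch the STFT settings and the 440 Hz test tone are assumptions, and the second difference is wrapped to the principal value before taking magnitudes.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Complex STFT with a Hann window, shape (frames, bins)."""
    x = np.asarray(x, dtype=float)
    w = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def phase_second_difference(X):
    """phi_k(n) - 2 phi_k(n-1) + phi_k(n-2), wrapped to (-pi, pi]."""
    phi = np.angle(X)
    d2 = phi[2:] - 2.0 * phi[1:-1] + phi[:-2]
    return np.angle(np.exp(1j * d2))

def phase_deviation(X):
    """Mean absolute second phase difference over all bins, one value per frame."""
    return np.mean(np.abs(phase_second_difference(X)), axis=1)

# A sustained 440 Hz tone: the phase in its peak bin should advance linearly.
sr = 8000
t = np.arange(sr) / sr
X = stft(np.sin(2 * np.pi * 440 * t))
peak_bin = int(np.argmax(np.abs(X).mean(axis=0)))
sustain_dev = np.max(np.abs(phase_second_difference(X)[:, peak_bin]))
zeta = phase_deviation(X)
```

In the peak bin the second difference stays very close to zero during the sustain. The unweighted mean over all bins also includes low-energy bins whose phase is less reliable, which is one motivation for the magnitude-weighted variants discussed in the onset detection literature.
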
Post-Processing
§ DC removal
  – Subtract the mean of the ODF
§ Normalization
  – Scale the level of the ODF
§ Low-pass filtering
  – Remove small peaks
§ Down-sampling
  – For data reduction
[Figure: low-pass filtering (solid line) (Tzanetakis, 2010)]

Determining the Onsets
§ Peak Detection
  – Peaks above a threshold are taken as onsets
  – The threshold is often computed adaptively from the ODF
  – The mean and the median are popular choices for computing the threshold
  $\text{threshold} = \alpha + \beta \cdot \mathrm{median}(\mathrm{ODF})$, where $\alpha$ is an offset and $\beta$ a scaling factor
[Figure: ODF and threshold vs. time [sec]; median with window size 5]

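Peak picking with a running-median threshold can be sketched as follows. The window size of 5 matches the figure caption; the values of the offset and scaling, the edge padding and the toy ODF are assumptions for illustration.

```python
import numpy as np

def pick_onsets(odf, alpha=2.0, beta=1.0, win=5):
    """Return indices of local maxima that exceed alpha + beta * running median."""
    odf = np.asarray(odf, dtype=float)
    half = win // 2
    padded = np.pad(odf, half, mode="edge")          # repeat edge values at the borders
    threshold = np.array([alpha + beta * np.median(padded[i:i + win])
                          for i in range(len(odf))])
    is_peak = np.zeros(len(odf), dtype=bool)
    is_peak[1:-1] = (odf[1:-1] > odf[:-2]) & (odf[1:-1] >= odf[2:])
    return np.flatnonzero(is_peak & (odf > threshold)), threshold

# Hypothetical ODF: two strong peaks (indices 4 and 9) and one weak peak (index 1).
odf = np.array([0, 1, 0, 0, 8, 1, 0, 0, 0.5, 6, 0.5, 0, 0], dtype=float)
onsets, threshold = pick_onsets(odf)
```

The weak peak at index 1 is a local maximum but stays below the threshold, so only the two strong peaks are reported as onsets.
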
References
§ J. P. Bello et al., “A Tutorial on Onset Detection in Music Signals”, 2005
§ S. Dixon, “Onset Detection Revisited”, 2006
§ S. Böck et al., “Evaluating the Online Capabilities of Onset Detection Methods”, 2012
§ S. Böck et al., “Maximum Filter Vibrato Suppression for Onset Detection”, 2013

Tempo Estimation
§ Estimate a regular time interval between beats
  – Tempo is a global attribute of a song
  – However, tempo often changes within a song
    • Intentionally: e.g. dramatic effect: Top 10 tempo changes
    • Unintentionally: e.g. re-mastering, live performance
  – There are also local changes in the regularity: e.g. rubato

Tempo Estimation Methods
§ Auto-Correlation
  – Find the periodicity, as in pitch detection
§ Discrete Fourier Transform
  – Take the DFT of the ODF and find the periodicity
§ Comb-Filter Banks
  – Leverage the “oscillating nature” of musical beats

Auto-Correlation
§ The ACF is a generic method for detecting the periodicity of a signal
  – Thus, it can be applied to the ODF to find a dominant period that may correspond to the tempo
  – The dominant peaks of the ACF indicate the dominant tempi
[Figure: onset detection function (spectral flux) (left) and its auto-correlation (right), both vs. time [sec]]

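A minimal sketch of ACF-based tempo estimation on a synthetic ODF. The ODF frame rate of 100 Hz, the 40 to 200 bpm search range and the impulse-train input are assumptions for the example.

```python
import numpy as np

def tempo_from_acf(odf, frame_rate, bpm_range=(40.0, 200.0)):
    """Pick the ACF peak inside the lag range implied by `bpm_range`."""
    odf = np.asarray(odf, dtype=float)
    odf = odf - odf.mean()                                      # DC removal
    acf = np.correlate(odf, odf, mode="full")[len(odf) - 1:]    # lags 0 .. len-1
    min_lag = int(np.ceil(60.0 * frame_rate / bpm_range[1]))    # fastest tempo
    max_lag = int(np.floor(60.0 * frame_rate / bpm_range[0]))   # slowest tempo
    lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / lag, acf

# Synthetic ODF: one impulse every 0.5 s (120 bpm) for 10 s, at 100 frames per second.
frame_rate = 100.0
odf = np.zeros(1000)
odf[::50] = 1.0

tempo, acf = tempo_from_acf(odf, frame_rate)
```

The ACF also has peaks at lags 100 and 150 (60 and 40 bpm), which illustrates why several "dominant tempi" show up; on this input the peak at lag 50 is the largest because more impulse pairs overlap at the shorter lag.
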
Tempo Estimation using Tempo Prior
§ Tempo is estimated by multiplying the prior with the auto-correlation (the observation)
  – In a Bayesian sense, the product acts like a posterior
  – The tempo prior can be calculated from the beat annotations of a dataset
    • The distribution fits a log-normal distribution well
[Figure: histogram of beats from a dataset (Klapuri, 2003)]
[From D. Ellis’ e4896 course slides]

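The effect of the prior can be illustrated with a log-normal density over tempo multiplied by hypothetical ACF peak strengths. The median of 120 bpm, the spread of 0.4 and the three observation values are made-up numbers for this sketch, not parameters fitted to any dataset.

```python
import numpy as np

def lognormal_pdf(bpm, median_bpm=120.0, sigma=0.4):
    """Log-normal density over tempo; `median_bpm` and `sigma` are illustrative."""
    bpm = np.asarray(bpm, dtype=float)
    z = np.log(bpm / median_bpm) / sigma
    return np.exp(-0.5 * z ** 2) / (bpm * sigma * np.sqrt(2.0 * np.pi))

# Hypothetical ACF peak strengths at three octave-related tempo candidates.
candidates = np.array([60.0, 120.0, 240.0])
acf_strength = np.array([1.00, 0.95, 0.90])

posterior = lognormal_pdf(candidates) * acf_strength    # prior x observation
best_without_prior = candidates[np.argmax(acf_strength)]
best_with_prior = candidates[np.argmax(posterior)]
```

With nearly equal ACF peaks, the observation alone picks 60 bpm, while the weighted product picks 120 bpm. This is how a prior helps resolve the octave ambiguity between the beat and its multiples or divisions.
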
Beat Histogram
§ Discrete wavelet transform as a sub-band approach
§ Full-wave rectification to extract the envelope
§ Pick the three highest peaks of the auto-correlation in an appropriate range (40-200 bpm) and accumulate them over segments
(Tzanetakis, 2002)

Example of Beat Histogram (Tzanetakis, 2002)

Beat Spectrum
§ Leverage the repetitive nature of music
§ Compute the cosine distance between the magnitude spectra of two frames
  $D_C(i,j) = \dfrac{v_i \cdot v_j}{\lVert v_i \rVert\,\lVert v_j \rVert}$
§ Visualize all pairs as a 2-D matrix $S$
  – The matrix on the left shows 34 notes in the piece
§ The beat spectrum is derived by summing the matrix $S$ along its diagonals
  $B(l) = \sum_{k \in R} S(k, k+l)$
(Foote, 2001)

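The diagonal-sum beat spectrum can be sketched on a toy feature sequence. The one-hot "spectra" that repeat every four frames are an assumption made so that the result is easy to read.

```python
import numpy as np

def similarity_matrix(frames):
    """S(i, j) = v_i . v_j / (|v_i| |v_j|) for all pairs of feature frames."""
    V = np.asarray(frames, dtype=float)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    U = V / np.maximum(norms, 1e-12)         # guard against all-zero frames
    return U @ U.T

def beat_spectrum(S, max_lag):
    """B(l) = sum_k S(k, k + l): sum along the l-th superdiagonal."""
    return np.array([np.trace(S, offset=l) for l in range(max_lag)])

# Toy "spectra": four distinct one-hot frames repeated eight times (period of 4 frames).
frames = np.tile(np.eye(4), (8, 1))          # shape (32, 4)
S = similarity_matrix(frames)
B = beat_spectrum(S, max_lag=16)
period = 1 + int(np.argmax(B[1:]))           # strongest non-zero lag
```

Here the beat spectrum is non-zero only at lags that are multiples of four frames, and the strongest non-zero lag recovers the repetition period.
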
Beat Spectrum
§ A more robust version can be obtained from the auto-correlation of the matrix $S$
  $B(k,l) = \sum_{i,j} S(i,j)\,S(i+k, j+l)$
§ The final beat spectrum is derived by summing over one variable
  – The left plot shows five beats and a triplet within a beat
§ A “beat spectrogram” can also be obtained from successive beat spectra
(Foote, 2001)

Tempogram
§ Compute the ODF from the half-wave rectified spectral flux
§ Compute the “Predominant Local Periodicity (PLP)”
  – Obtain the frequency $\hat{\omega}$ and phase $\hat{\phi}$ that give the maximum magnitude for the ODF
  – Form a local sinusoidal kernel
    $k(m) = w(m-n)\cos\big(2\pi(\hat{\omega} m - \hat{\phi})\big)$
  – Accumulate the successive local sinusoidal kernels to form a PLP curve
(Grosche, 2009)

Tempogram
§ Take the DFT or the ACF of the ODF
  – Generates a Fourier tempogram or an auto-correlation tempogram
§ Cyclic Tempogram
  – Accumulate the tempogram over integer multiples of a tempo (up to four octaves)
(Grosche, 2011)

Comb-Filter Banks
§ Also called resonant filter banks
  – Comb-filter equation:
    $y(t) = \alpha\,y(t-\tau) + (1-\alpha)\,x(t)$
  – Compute this for each candidate delay $\tau$
§ Builds up rhythmic evidence (by anticipation?)
(Klapuri, 2006)

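A small resonator bank built from the comb-filter equation can be sketched as below. The feedback gain of 0.9, the ODF frame rate of 100 Hz, the 10 bpm candidate grid and the use of output energy as the resonance score are assumptions for this example rather than details taken from the cited work.

```python
import numpy as np

def comb_filter_energy(x, delay, alpha=0.9):
    """Run y(t) = alpha * y(t - delay) + (1 - alpha) * x(t); return the output energy."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        feedback = y[t - delay] if t >= delay else 0.0
        y[t] = alpha * feedback + (1.0 - alpha) * x[t]
    return float(np.sum(y ** 2))

def comb_bank_tempo(odf, frame_rate, bpm_candidates, alpha=0.9):
    """Score each tempo candidate by the energy of its resonator and pick the best."""
    bpm_candidates = np.asarray(bpm_candidates, dtype=float)
    delays = np.round(60.0 * frame_rate / bpm_candidates).astype(int)
    energies = np.array([comb_filter_energy(odf, int(d), alpha) for d in delays])
    return bpm_candidates[int(np.argmax(energies))], energies

# Synthetic ODF: one impulse every 0.5 s (120 bpm) for 10 s, at 100 frames per second.
odf = np.zeros(1000)
odf[::50] = 1.0

candidates = np.arange(60, 241, 10)          # 60, 70, ..., 240 bpm
tempo, energies = comb_bank_tempo(odf, 100.0, candidates)
```

The resonator whose delay matches the impulse spacing reinforces each new impulse with its own delayed output, so its energy builds up fastest. The resonator at half the tempo (delay of 100 frames) also resonates but builds up more slowly, which is one source of the octave ambiguity noted earlier.
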
Sub-band Filter Banks
§ A sub-band filter bank as front-end processing
§ Parallel ODFs for 6 bands
§ 150 resonators for each band, covering all possible tempo values (60 - 240 bpm)
§ Pick the delay that gives the highest peak as the tempo
  – Beat tracking is possible directly from the result
  – This is the advantage of the resonant filter bank approach
(Scheirer, 1998)