GCT535 - Sound Technology for Multimedia
Timbre Analysis
Graduate School of Culture Technology, KAIST
Juhan Nam
Outline
§ Timbre Analysis
  – Definition of Timbre
  – Timbre Features
    • Zero-crossing rate
    • Spectral summary features
    • Mel-Frequency Cepstral Coefficient (MFCC)
What is timbre?
§ Definition
  – Attribute of sensation in terms of which a listener can judge that two sounds having the same loudness and pitch are dissimilar (ANSI)
  – Tone color or quality that defines a particular sound
§ Associated with classifying or identifying sound sources
  – Class: piano, guitar, singing voice, engine sound
  – Identity: Steinway Model D, Fender Stratocaster, Michael Jackson, Harley-Davidson
§ Also used to describe polyphonic sounds holistically
  – For example, music or environmental sounds
  – Associated with genre, mood, or other high-level descriptions
What is timbre?
§ Timbre is a vague concept
  – Unlike loudness or pitch, it has no single quantitative scale.
  – Instead, it comprises multiple attributes.
§ Different aspects of this multiplicity
  – Acoustic attributes: temporal or spectral factors
  – Timbre space: perceptual similarity/dissimilarity
  – Semantic attributes: textual descriptions
Acoustic Attributes in Timbre Perception
§ Acoustic attributes (Schouten, 1968)
  – Harmonicity: the range between tonal and noise-like character
  – Time envelope (ADSR: attack, decay, sustain, release)
  – Spectral envelope
  – Changes of spectral envelope and fundamental frequency
  – The onset of a sound, which differs notably from the sustained vibration
[Figures: ADSR envelope; changes of spectral envelope]
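To make the time-envelope attribute concrete, here is a minimal Python sketch of an ADSR amplitude envelope. It assumes straight-line segments and durations given in seconds; the function name `adsr_envelope` and its parameters are illustrative, not taken from the slides, and real instrument envelopes are rarely this linear.

```python
def adsr_envelope(attack, decay, sustain_level, sustain, release, sr):
    """Piecewise-linear ADSR amplitude envelope (hypothetical helper).

    attack, decay, sustain and release are durations in seconds;
    sustain_level is the amplitude (0..1) held during the sustain phase;
    sr is the envelope sample rate in Hz.
    """
    def ramp(start, end, duration):
        # n samples moving linearly from `start` toward `end` (end excluded)
        n = int(round(duration * sr))
        return [start + (end - start) * i / n for i in range(n)]

    env = ramp(0.0, 1.0, attack)                       # attack: rise to the peak
    env += ramp(1.0, sustain_level, decay)             # decay: fall to the sustain level
    env += [sustain_level] * int(round(sustain * sr))  # sustain: hold
    env += ramp(sustain_level, 0.0, release)           # release: fade out
    return env


# 10 ms attack, 100 ms decay to 0.6, 500 ms sustain, 200 ms release at 1 kHz
env = adsr_envelope(0.01, 0.1, 0.6, 0.5, 0.2, sr=1000)
print(len(env))  # 810
```

Multiplying a waveform sample-by-sample by such an envelope changes only the temporal shape, which is one way to hear how much the envelope alone contributes to timbre.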
Acoustic Attributes in Timbre Perception
§ Sound design problem?
Timbre Space
§ Perceptual multidimensional attributes derived from similarity judgments
  – Listeners hear pairs of sounds and rate how similar they are
  – The resulting similarity matrix is processed with multidimensional scaling (MDS), a dimensionality-reduction algorithm that determines the timbre space
§ Acoustic correlates of the three (reduced) dimensions (Grey, 1977)
  – Spectral energy distribution
  – Attack and decay time
  – Amount of inharmonic sound in the attack
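The MDS step can be illustrated with classical (Torgerson) MDS, one of several MDS variants and not necessarily the one used in the timbre-space studies cited here. The sketch assumes the similarity ratings have already been converted into a symmetric dissimilarity matrix `D` (for example, dissimilarity = maximum rating minus similarity). The function name `classical_mds` is hypothetical, and eigenvectors are found by plain power iteration, so the input should be (close to) Euclidean.

```python
import math
import random


def classical_mds(D, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: place n items in `dims` dimensions so that
    Euclidean distances approximate the dissimilarities in the n x n matrix D."""
    n = len(D)
    # Double-center the squared dissimilarities: B = -1/2 * J D^2 J
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]  # equals the column means (D is symmetric)
    grand_mean = sum(row_mean) / n
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the current dominant eigenvector of B
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][d] = math.sqrt(max(eigval, 0.0)) * v[i]
        # Deflate B so the next pass finds the next eigenvector
        B = [[B[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy dissimilarities: four "sounds" at the corners of a 2 x 1 rectangle
pts = [(0, 0), (2, 0), (0, 1), (2, 1)]
D = [[math.dist(p, q) for q in pts] for p in pts]
space = classical_mds(D)
```

With real listening data, the recovered axes would then be compared against acoustic measures such as spectral energy distribution or attack time, which is how the acoustic correlates listed above are identified.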
Semantic Attributes
§ Verbally describe different characteristics of timbre using pairs of opposing words
  Dull ______|______ Sharp
  Dull ______|______ Brilliant
  Compact ______|______ Scattered
  Cold ______|______ Warm
  Full ______|______ Empty
  Pure ______|______ Rich
  Colorful ______|______ Colorless
§ Sources: Pratt and Doak (1976); von Bismarck (1974); T. Rossing's music150 slides
Timbre Feature Extraction
§ Extracting acoustic features from signals
§ Low-level acoustic features
  – Zero-crossing rate
  – Spectral summaries
  – Spectral envelope: MFCC
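Zero-Crossing Rate (ZCR)
§ The rate at which the signal changes sign within a frame
§ ZCR is low for harmonic (voiced) sounds and high for noisy (unvoiced) sounds
§ For simple periodic signals, it is related to F0 (e.g., a sinusoid at F0 crosses zero 2·F0 times per second)
[Figure: voiced vs. unvoiced waveforms]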
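A minimal Python sketch of per-frame ZCR, assuming the common definition "fraction of adjacent sample pairs whose signs differ" (some texts count crossings per second instead). The function name is illustrative. Because a pure tone crosses zero twice per period, `ZCR * sr / 2` gives a rough F0 estimate; the sketch checks this on a 200 Hz sinusoid and contrasts it with Gaussian white noise.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.
    Samples >= 0 are treated as positive, so an exact zero still gets a definite sign."""
    crossings = sum((frame[n] >= 0) != (frame[n - 1] >= 0) for n in range(1, len(frame)))
    return crossings / (len(frame) - 1)


sr = 16000
n = 1600  # a 0.1 s frame
tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(n)]  # voiced-like
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # unvoiced-like

zcr_tone = zero_crossing_rate(tone)
zcr_noise = zero_crossing_rate(noise)
f0_estimate = zcr_tone * sr / 2  # two zero crossings per period
print(f"tone ZCR={zcr_tone:.4f} (F0 estimate {f0_estimate:.0f} Hz), noise ZCR={zcr_noise:.4f}")
```

For independent zero-mean noise, neighboring samples have opposite signs about half the time, so the noise ZCR lands near 0.5, far above the tone's value.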
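Spectral Summary Features
§ Spectral centroid (SC): the "center of gravity" of the spectrum
  – Associated with the brightness of sounds
  $$SC(t) = \frac{\sum_k f_k X_t(k)}{\sum_k X_t(k)}$$
§ Spectral roll-off: the frequency below which 85% or 95% of the spectral energy is concentrated
  $$\sum_{k=1}^{R_t} X_t(k) = 0.85 \sum_{k=1}^{N} X_t(k)$$
  – f_k is the frequency of bin k, X_t(k) is the magnitude spectrum of frame t, and the roll-off frequency is f_{R_t}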
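The two formulas can be sketched in plain Python. The sketch follows the slide in weighting by the magnitude spectrum X_t(k) (some implementations use the power spectrum instead), computes the spectrum with a naive DFT to stay dependency-free (an FFT would be used in practice), and uses hypothetical function names. The test frame holds two sinusoids placed exactly on DFT bins so the expected values are easy to check by hand.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|DFT| for bins k = 0..N/2 (naive O(N^2) DFT; fine for a short example)."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]


def spectral_centroid(X, freqs):
    """Magnitude-weighted mean frequency: sum_k f_k X(k) / sum_k X(k)."""
    return sum(f * x for f, x in zip(freqs, X)) / sum(X)


def spectral_rolloff(X, freqs, fraction=0.85):
    """Lowest bin frequency f_R whose cumulative sum reaches `fraction` of the total."""
    threshold = fraction * sum(X)
    cumulative = 0.0
    for f, x in zip(freqs, X):
        cumulative += x
        if cumulative >= threshold:
            return f
    return freqs[-1]  # only reached through floating-point round-off


sr, N = 8000, 256  # bin spacing = 31.25 Hz
frame = [math.sin(2 * math.pi * 500 * n / sr) + 0.5 * math.sin(2 * math.pi * 1500 * n / sr)
         for n in range(N)]
X = magnitude_spectrum(frame)
freqs = [k * sr / N for k in range(N // 2 + 1)]
sc = spectral_centroid(X, freqs)  # (500*1 + 1500*0.5) / 1.5 ≈ 833.3 Hz
ro = spectral_rolloff(X, freqs)   # bins up to 500 Hz hold only 2/3 of the total, so 1500 Hz
```

Raising the level of the 1500 Hz component would move the centroid upward, matching the association between centroid and perceived brightness.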
Spectral Summary Features
§ Spectral spread (SS): a measure of the bandwidth of the spectrum
  $$SS(t) = \frac{\sum_k (f_k - SC(t))^2 X_t(k)}{\sum_k X_t(k)}$$
§ Spectral flatness (SF): a measure of the noisiness of the spectrum
  – The ratio between the geometric and arithmetic means over the K bins
  – Examples: white noise → 1, pure tone → 0
  $$SF(t) = \frac{\left(\prod_k X_t(k)\right)^{1/K}}{\frac{1}{K}\sum_k X_t(k)}$$
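A matching sketch for spread and flatness, operating directly on a magnitude spectrum `X` with bin frequencies `freqs`. Function names are illustrative. The flatness sketch computes the geometric mean in the log domain, which avoids underflow of the product, and adds a small `eps` (an assumption) so that empty bins do not produce log(0). Note that SS as defined on the slide is a variance in Hz²; many toolkits report its square root so the spread is in Hz.

```python
import math


def spectral_spread(X, freqs):
    """Second central moment of the spectrum around its centroid (units: Hz^2)."""
    total = sum(X)
    centroid = sum(f * x for f, x in zip(freqs, X)) / total
    return sum((f - centroid) ** 2 * x for f, x in zip(freqs, X)) / total


def spectral_flatness(X, eps=1e-12):
    """Geometric mean / arithmetic mean of the spectrum (near 0 = peaky, 1 = flat)."""
    K = len(X)
    geometric = math.exp(sum(math.log(x + eps) for x in X) / K)
    arithmetic = sum(x + eps for x in X) / K
    return geometric / arithmetic


flat = [1.0] * 64            # idealized white-noise spectrum
peaky = [0.0] * 63 + [1.0]   # idealized pure tone: a single non-zero bin
print(spectral_flatness(flat))   # ~1
print(spectral_flatness(peaky))  # ~0
# Two equal peaks at 0 Hz and 1000 Hz: centroid 500 Hz, spread sqrt(SS) = 500 Hz
print(math.sqrt(spectral_spread([1.0, 1.0], [0.0, 1000.0])))
```

The spectra here are idealized; the magnitude spectrum of an actual white-noise frame fluctuates from bin to bin, so its measured flatness is high but somewhat below 1.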
Examples of Spectral Centroids
[Figure: spectral centroid, frequency [Hz] vs. time [sec], for Classical: "Beethoven String Quartet" and Pop: "Video Killed the Radio Star"]
Mel-Frequency Cepstral Coefficient (MFCC)
§ The most widely used audio feature; it extracts the spectral envelope from an audio frame
  – Standard audio feature in speech recognition
  – Introduced in the music domain by Logan in 2000
§ Computation steps
  – Audio frame → DFT → mapping the frequency scale to mel → log magnitude → DCT
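The four computation steps can be strung together as a dependency-free Python sketch. Several choices are assumptions rather than anything the slides specify: the mel formula 2595·log10(1 + f/700) (one common variant), 40 triangular filters spanning 0 Hz to sr/2, a natural log with a small offset, and a DCT-II with sqrt(2/M) scaling (the HTK convention). The frame is assumed to be already windowed, the naive DFT stands in for an FFT, and the function names are hypothetical.

```python
import cmath
import math
import random


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(frame, sr, n_mels=40, n_ceps=13):
    N = len(frame)
    # 1) DFT magnitude for bins 0..N/2 (naive DFT; use an FFT in practice)
    mag = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
           for k in range(N // 2 + 1)]
    freqs = [k * sr / N for k in range(N // 2 + 1)]

    # 2) Map to the mel scale: triangular filters with edges evenly spaced in mel
    top = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    mel_spec = []
    for m in range(1, n_mels + 1):
        lo, center, hi = edges[m - 1], edges[m], edges[m + 1]
        weights = [max(0.0, min((f - lo) / (center - lo), (hi - f) / (hi - center)))
                   for f in freqs]
        mel_spec.append(sum(w * x for w, x in zip(weights, mag)))

    # 3) Log compression (the small offset avoids log(0))
    log_mel = [math.log(x + 1e-10) for x in mel_spec]

    # 4) DCT-II with sqrt(2/M) scaling; keep only the first n_ceps coefficients
    M = n_mels
    return [math.sqrt(2.0 / M) * sum(log_mel[n] * math.cos(math.pi * k * (n + 0.5) / M)
                                     for n in range(M))
            for k in range(n_ceps)]


rng = random.Random(0)
frame = [rng.gauss(0.0, 1.0) for _ in range(512)]
c = mfcc(frame, sr=16000)
c_loud = mfcc([2.0 * s for s in frame], sr=16000)
```

Doubling the frame's amplitude should change only `c[0]`: the log turns the gain into the same additive constant in every band, and the DCT basis functions for k ≥ 1 sum to zero. This is one reason the 0th coefficient is often treated as an energy term.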
Mel-Frequency Spectrogram
§ Convert linear frequency to the mel scale
§ Usually reduces the dimensionality of the spectrum
[Figure: spectrum vs. mel-scaled spectrum]
Discrete Cosine Transform
§ Real-valued transform similar to the DFT
  – Decorrelates the mel-scaled log spectrum and reduces the dimensionality again
  $$X_{DCT}(k) = \sqrt{\frac{2}{N}} \sum_{n=1}^{N} x(n) \cos\left(\frac{\pi k}{N}(n - 0.5)\right), \quad k = 0, \dots, N-1$$
[Figure: mel-scaled spectrum vs. MFCC]
Reconstructed Frequency Spectrum from MFCC
[Figure: frequency spectrum (512 bins) → frequency spectrum (mel-scaled, 60 bins) → MFCC (13 dim) → reconstructed frequency spectrum (mel-scaled) → reconstructed frequency spectrum]
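The reconstruction idea can be sketched in the mel domain: take the DCT of a log-mel spectrum, keep the first 13 coefficients, and invert the DCT with the discarded coefficients set to zero. The result is a smoothed envelope. The sketch assumes a DCT-II with sqrt(2/N) scaling (the HTK convention) and its matching inverse; mapping the result back to the 512-bin linear-frequency spectrum would additionally require (approximately) inverting the mel filterbank, which is not shown. Function names are hypothetical.

```python
import math


def dct(x):
    """X(k) = sqrt(2/N) * sum_n x(n) cos(pi*k*(n + 0.5)/N) for k = 0..N-1
    (0-based n, so (n + 0.5) here equals (n - 0.5) with 1-based n)."""
    N = len(x)
    return [math.sqrt(2.0 / N) * sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N)
                                     for n in range(N))
            for k in range(N)]


def idct(X, N):
    """Inverse of dct() for a length-N signal; coefficients not supplied count as 0."""
    X = list(X) + [0.0] * (N - len(X))
    return [math.sqrt(2.0 / N) * (X[0] / 2.0 + sum(X[k] * math.cos(math.pi * k * (n + 0.5) / N)
                                                   for k in range(1, N)))
            for n in range(N)]


def smooth_envelope(log_mel, n_ceps=13):
    """Keep the first n_ceps DCT coefficients (the MFCCs) and map them back."""
    return idct(dct(log_mel)[:n_ceps], len(log_mel))


# A 60-band log-mel "spectrum": a smooth envelope plus one narrow spike
log_mel = [math.cos(math.pi * 2 * (n + 0.5) / 60) for n in range(60)]
log_mel[30] += 1.0
smoothed = smooth_envelope(log_mel)
```

The single-band spike is largely smeared out while the broad cosine shape survives, which is why MFCCs are described as capturing the spectral envelope rather than fine spectral detail.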
Comparison of Spectrogram and MFCC
[Figure panels: spectrogram; mel-frequency spectrogram; MFCC; spectrogram reconstructed from MFCC]
Sound Examples of MFCC
§ Original: [audio]
§ MFCC reconstruction (using white noise as the source): [audio]
Post-processing
§ Adding temporal dynamics
  – Short-term dynamics of features are characterized by delta and double-delta features
  $$\Delta x(n) = \frac{x(n) - x(n-h)}{h}, \qquad \Delta\Delta x(n) = \frac{\Delta x(n) - \Delta x(n-h)}{h}$$
  – 39-dimensional features in speech recognition: 13 MFCCs + 13 deltas + 13 double-deltas
§ Normalization
  – Cepstral mean subtraction (CMS): subtract the mean computed over surrounding frames
  – Standardization: subtract the mean and divide by the standard deviation
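A plain-Python sketch of these post-processing steps, operating on a list of per-frame feature vectors. The delta follows the slide's backward difference (x(n) − x(n−h))/h; treating frames before the start as copies of the first frame is an assumption, and many toolkits instead use a regression over several neighboring frames. The CMS window length (`half_window=50`) is an illustrative assumption, and all function names are hypothetical.

```python
import math


def delta(frames, h=1):
    """Backward difference (x(n) - x(n-h)) / h per dimension.
    Frames before the start are taken to equal the first frame (an assumption)."""
    return [[(cur - prev) / h for cur, prev in zip(frames[n], frames[max(n - h, 0)])]
            for n in range(len(frames))]


def stack_with_deltas(frames, h=1):
    """Concatenate static, delta and double-delta features (13 -> 39 dims for MFCCs)."""
    d = delta(frames, h)
    dd = delta(d, h)
    return [s + a + b for s, a, b in zip(frames, d, dd)]


def cepstral_mean_subtraction(frames, half_window=50):
    """Subtract the mean over up to half_window frames on either side of each frame."""
    out = []
    for n in range(len(frames)):
        window = frames[max(0, n - half_window): n + half_window + 1]
        mean = [sum(col) / len(window) for col in zip(*window)]
        out.append([x - m for x, m in zip(frames[n], mean)])
    return out


def standardize(frames):
    """Per dimension: subtract the mean and divide by the standard deviation."""
    n = len(frames)
    means = [sum(col) / n for col in zip(*frames)]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(zip(*frames), means)]
    stds = [s if s > 0 else 1.0 for s in stds]  # leave constant dimensions unscaled
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]


# Toy "MFCC" sequence: 10 frames x 13 coefficients, rising by 1 per frame
mfccs = [[float(n + j) for j in range(13)] for n in range(10)]
features = stack_with_deltas(mfccs)
```

In the toy sequence every coefficient rises by exactly 1 per frame, so the deltas are 1 and the double-deltas are 0 away from the first frames, which makes the 39-dimensional layout easy to inspect.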
Applications
§ Music
  – Musical instrument classification
  – Music genre/mood classification
  – Similarity-based audio retrieval
§ Speech
  – Speech recognition
  – Speaker recognition
References
§ J. Grey, "Multidimensional perceptual scaling of musical timbres," 1977
§ D. Wessel, "Timbre space as a musical control structure," 1979
§ S. Donnadieu, "Mental representation of the timbre of complex sounds," ch. 8 in J. Beauchamp (ed.), Analysis, Synthesis and Perception of Musical Sounds, 2007
§ B. Logan, "Mel frequency cepstral coefficients for music modeling," 2000