CTP431: Music and Audio Computing
Spectral Analysis
Graduate School of Culture Technology, KAIST
Juhan Nam
Outline
§ Time-domain representation of sound
  – Waveform
§ Time-frequency domain representation of sound
  – Discrete Fourier Transform (DFT)
  – Short-time Fourier Transform (STFT)
Waveform
§ Time-domain representation of sound
  – Shows the amplitude over time
§ Amplitude envelope
  – Short-term loudness: e.g. sound level meter
  – Computed by various methods
    • max-peak picking
    • root-mean-square (RMS)
  – ADSR
    • The amplitude envelope of a musical sound is often described in terms of attack, decay, sustain and release.
  – Also used for dynamic range compression: e.g. compressor, expander
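The two envelope methods listed above can be sketched frame by frame. This is a minimal illustration, not the course's reference code: the function names, the frame and hop sizes, and the decaying 440 Hz test tone are all assumptions chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_size, hop_size):
    """Split x into overlapping frames (one per row); a tail shorter than a frame is dropped."""
    n_frames = 1 + (len(x) - frame_size) // hop_size
    idx = np.arange(frame_size)[None, :] + hop_size * np.arange(n_frames)[:, None]
    return x[idx]

def peak_envelope(x, frame_size=512, hop_size=256):
    """Max-peak picking: largest absolute sample value in each frame."""
    return np.max(np.abs(frame_signal(x, frame_size, hop_size)), axis=1)

def rms_envelope(x, frame_size=512, hop_size=256):
    """Root-mean-square of each frame."""
    return np.sqrt(np.mean(frame_signal(x, frame_size, hop_size) ** 2, axis=1))

# Hypothetical test tone (assumption): a 440 Hz sine with an exponential decay.
fs = 8000
t = np.arange(fs) / fs
x = np.exp(-3.0 * t) * np.sin(2 * np.pi * 440 * t)

peak = peak_envelope(x)
rms = rms_envelope(x)
```

The peak envelope is never below the RMS envelope, and for a steady sine the RMS value is the peak divided by √2.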
Example: Waveform and Amplitude Envelopes
[Figure: waveforms with amplitude envelopes for a flute A4 note and a piano C4 note]
Spectrogram
§ Time-frequency domain representation of sound
  – Shows the amplitude envelope of individual frequency components over time
  – A better representation for observing pitch and timbre characteristics
  – Also called a “sonogram”
§ Visualization
  – 2D color map or waterfall
Example: Spectrogram - 2D color map
[Figure: panels for a piano C4 note and a flute A4 note]
Example: Spectrogram - 3D waterfall
[Figure: panels for a piano C4 note and a flute A4 note]
Frequency-Domain Representation
§ Can we represent $x(n)$ with a finite set of sinusoids?
  – $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} A(k)\, r_k(n)$
    • $r_k(n) = \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$: discrete-time sinusoid with length $N$
  – Find $A(k)$, $\phi(k)$
Euler’s identity
§ Euler’s identity: $e^{j\theta} = \cos\theta + j\sin\theta$
  – Can be proved with Taylor series
  – If $\theta = \pi$, $e^{j\pi} + 1 = 0$ (“the most beautiful equation in math”)
§ Properties
  – $\sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}$
  – $\cos\theta = \frac{e^{j\theta} + e^{-j\theta}}{2}$
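The Taylor-series argument mentioned above can be written out as follows. This is a sketch of the standard derivation: expand the exponential, use $j^2 = -1$, and group the even and odd powers, which are the Taylor series of cosine and sine.

```latex
\begin{align*}
e^{j\theta} &= \sum_{m=0}^{\infty} \frac{(j\theta)^m}{m!} \\
            &= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right)
             + j\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) \\
            &= \cos\theta + j\sin\theta
\end{align*}
```

Replacing $\theta$ with $-\theta$ gives $e^{-j\theta} = \cos\theta - j\sin\theta$; adding and subtracting the two expressions yields the cosine and sine properties.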
Complex Sinusoids
§ Cosine and sine can be represented in a single term
  – $s_k(n) = e^{j 2\pi k n / N} = \cos\left(\frac{2\pi k n}{N}\right) + j\sin\left(\frac{2\pi k n}{N}\right)$
  – Frequencies: $\frac{2\pi k}{N}$ radians per sample, or $\frac{k}{N} F_s$ Hz ($F_s$: the sampling rate), for $k = 0, 1, 2, \ldots, N-1$
  – Example: $N = 8$
Figures are from https://ccrma.stanford.edu/~jos/dft/
Complex Sinusoids
[Figure: the complex sinusoids for N = 8, from https://ccrma.stanford.edu/~jos/dft/]
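Frequency-Domain Representation Using Complex Sinusoids
§ $x(n)$ is expressed in a simpler form:
  $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} A(k) \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$
  $= \frac{1}{N} \sum_{k=0}^{N-1} A(k) \left(e^{j(2\pi k n / N + \phi(k))} + e^{-j(2\pi k n / N + \phi(k))}\right) / 2$
  $= \frac{1}{N} \sum_{k=0}^{N-1} \left(A(k) e^{j\phi(k)} e^{j 2\pi k n / N} + A(k) e^{-j\phi(k)} e^{-j 2\pi k n / N}\right) / 2$
  $= \frac{1}{N} \sum_{k=0}^{N-1} \left(X(k) e^{j 2\pi k n / N} + \overline{X(k)}\, e^{-j 2\pi k n / N}\right) / 2$
  $= \mathrm{Real}\left\{\frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}\right\}$
  $= \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}$
  where $X(k) = A(k) e^{j\phi(k)} = A(k)\left(\cos\phi(k) + j\sin\phi(k)\right)$ and $\overline{X(k)}$ is its complex conjugate. The real-part operator can be dropped in the last step when $X(N-k) = \overline{X(k)}$, because the sum is then already real.
  – Now, how can we find $X(k)$?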
Orthogonality of Sinusoids
§ Inner product between two complex sinusoids
  – $s_p(n) \cdot s_q^{*}(n) = \sum_{n=0}^{N-1} e^{j 2\pi p n / N} \cdot e^{-j 2\pi q n / N} = N$ if $p = q$, and $0$ otherwise
§ Corresponding relations for real sinusoids (for $1 \le p, q \le N-1$, excluding $p = q = N/2$)
  – $\sum_{n=0}^{N-1} \cos(2\pi p n / N) \cos(2\pi q n / N) = N/2$ if $p = q$ or $p = N - q$, and $0$ otherwise
  – $\sum_{n=0}^{N-1} \sin(2\pi p n / N) \sin(2\pi q n / N) = N/2$ if $p = q$, $-N/2$ if $p = N - q$, and $0$ otherwise
  – $\sum_{n=0}^{N-1} \cos(2\pi p n / N) \sin(2\pi q n / N) = 0$
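The orthogonality relations above can be checked numerically. The sketch below builds all N complex sinusoids for N = 8 (matching the slide's example size) and forms every pairwise inner product at once; the variable names are assumptions made for this illustration.

```python
import numpy as np

N = 8
n = np.arange(N)
# Row k holds the complex sinusoid s_k(n) = exp(j*2*pi*k*n/N).
S = np.exp(2j * np.pi * np.outer(np.arange(N), n) / N)

# Entry (p, q) is the inner product sum_n s_p(n) * conj(s_q(n)).
gram = S @ S.conj().T

# The same sums for the real sinusoids (cosine and sine parts).
C, Sn = S.real, S.imag
cos_cos = C @ C.T
sin_sin = Sn @ Sn.T
cos_sin = C @ Sn.T
```

Here `gram` comes out as N times the identity matrix, while `cos_cos` and `sin_sin` show the extra nonzero entries at p = N − q.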
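Orthogonal Projection on Complex Sinusoids
§ Take the inner product of the signal with each complex sinusoid
  $x(n) \cdot s_q^{*}(n) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi q n / N}$
  $= \sum_{n=0}^{N-1} \left(\frac{1}{N} \sum_{p=0}^{N-1} X(p) e^{j 2\pi p n / N}\right) e^{-j 2\pi q n / N}$
  $= \frac{1}{N} \sum_{p=0}^{N-1} X(p) \left(\sum_{n=0}^{N-1} e^{j 2\pi p n / N} e^{-j 2\pi q n / N}\right)$
  $= \frac{1}{N} X(q)\, N = X(q) = A(q) e^{j\phi(q)}$
  – By orthogonality, only the $p = q$ term of the sum survives.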
To Wrap Up
§ Discrete Fourier Transform
  – $X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N} = X_R(k) + j X_I(k) = A(k) e^{j\phi(k)}$
  – Magnitude spectrum: $|X(k)| = A(k) = \sqrt{X_R^2(k) + X_I^2(k)}$
  – Phase spectrum: $\angle X(k) = \phi(k) = \tan^{-1}\left(\frac{X_I(k)}{X_R(k)}\right)$
§ Inverse Discrete Fourier Transform
  – $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}$
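The DFT and inverse DFT formulas above translate almost directly into NumPy. This is a minimal sketch for illustration, an O(N²) direct evaluation rather than a practical implementation; the function names `dft` and `idft` and the test signal parameters are assumptions.

```python
import numpy as np

def dft(x):
    """X(k) = sum_n x(n) exp(-j*2*pi*k*n/N), evaluated directly."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.exp(-2j * np.pi * k * n / N) @ x

def idft(X):
    """x(n) = (1/N) sum_k X(k) exp(j*2*pi*k*n/N), evaluated directly."""
    N = len(X)
    k = np.arange(N)
    n = k[:, None]
    return np.exp(2j * np.pi * n * k / N) @ X / N

# Hypothetical test signal (assumption): one sinusoid at bin k0 with known amplitude and phase.
N, k0, amp, phi = 32, 3, 0.8, np.pi / 4
x = amp * np.cos(2 * np.pi * k0 * np.arange(N) / N + phi)

X = dft(x)
magnitude = np.abs(X)    # sqrt(X_R^2 + X_I^2)
phase = np.angle(X)      # arctan2(X_I, X_R), which resolves the quadrant
x_rec = idft(X).real     # imaginary part is numerical noise for a real input
```

For this real cosine the energy is split between bins k0 and N − k0, so `magnitude[k0]` equals `amp * N / 2` and `phase[k0]` equals `phi`.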
Why do we choose this set of sinusoid frequencies?
§ Underlying assumption in DFT
  – The N samples are one period of a periodic signal
  – In the view of “Fourier series”, a periodic signal with period N can be represented as a sum of sinusoids with periods N, N/2, N/3, … (frequencies 1/N, 2/N, 3/N, …)
[Figure: two waveform panels, amplitude versus time in seconds]
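One consequence of the periodicity assumption can be seen with a small experiment. In the sketch below (the signal lengths and the 1e-6 threshold are assumptions for illustration), a sinusoid that completes a whole number of periods in N samples lands on exactly two bins, whereas one that does not fit an integer number of periods spreads over all bins, an effect often called spectral leakage.

```python
import numpy as np

N = 64
n = np.arange(N)
x_int = np.cos(2 * np.pi * 4.0 * n / N)    # exactly 4 periods in N samples
x_frac = np.cos(2 * np.pi * 4.5 * n / N)   # 4.5 periods: the periodic extension has a jump

mag_int = np.abs(np.fft.fft(x_int))
mag_frac = np.abs(np.fft.fft(x_frac))

# Count the bins that carry non-negligible magnitude.
n_bins_int = int(np.sum(mag_int > 1e-6))
n_bins_frac = int(np.sum(mag_frac > 1e-6))
```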
Properties of DFT
§ Periodicity
  – $X(k) = X(k + N) = X(k + 2N) = \ldots$
  – $X(k) = X(k - N) = X(k - 2N) = \ldots$
§ Symmetry (for real-valued $x(n)$)
  – Magnitude response: $|X(k)| = |X(N - k)| = |X(-k)|$
  – Phase response: $\angle X(k) = -\angle X(-k) = -\angle X(N - k)$
  – We often display only half of the magnitude and phase responses
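The symmetry properties above can be verified on any real-valued signal. This sketch uses a random signal (an assumption for illustration) and NumPy's `np.fft.rfft`, which returns only bins 0 to N/2, i.e. the half that is usually displayed.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
x = rng.standard_normal(N)    # any real-valued signal
X = np.fft.fft(x)

k = np.arange(1, N)
# For real x(n), X(N - k) is the complex conjugate of X(k).
is_conjugate_symmetric = np.allclose(X[N - k], np.conj(X[k]))

# Only the non-negative-frequency half is needed: bins 0 .. N/2.
X_half = np.fft.rfft(x)
```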
Properties of DFT
[Figure: a waveform with its magnitude and phase spectra (N = 32). Left column, plotted over k = 0 to N − 1: $|X(k)| = |X(N - k)|$ and $\angle X(k) = -\angle X(N - k)$. Right column, plotted over k = −N/2 to N/2: $|X(k)| = |X(-k)|$ and $\angle X(k) = -\angle X(-k)$.]
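Frequency Scales
§ The bin indices $k = 0, 1, \ldots, N-1$ of $X(k)$ correspond to the frequencies $k \cdot f_s / N$, evenly spaced from 0 up to (but not including) $f_s$ in Hz; by periodicity, $k = N$ maps to $f_s$ and is the same bin as $k = 0$
[Figure: bin-index axis (−N, −N/2, 0, N/2, N) aligned with frequency axis (−f_s, −f_s/2, 0, f_s/2, f_s)]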
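The mapping from bin index to frequency can be sketched as follows. The sampling rate and DFT size are hypothetical values picked to give round numbers; `np.fft.fftfreq` and `np.fft.fftshift` are the NumPy helpers for the signed and centered views of the same axis.

```python
import numpy as np

fs = 8000    # hypothetical sampling rate in Hz (assumption)
N = 16       # hypothetical DFT size (assumption)

k = np.arange(N)
freqs_hz = k * fs / N                      # 0, 500, ..., 7500 Hz
signed_hz = np.fft.fftfreq(N, d=1 / fs)    # 0 .. 3500, then -4000 .. -500 Hz
centered_hz = np.fft.fftshift(signed_hz)   # -4000 .. 3500 Hz, zero in the middle
```

Bins above N/2 are the same frequencies as the negative ones shifted by f_s, which is the periodicity property of the DFT.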
Examples of DFT
[Figure: waveform (amplitude versus time in milliseconds) and spectrum (magnitude versus frequency) for three sounds: sine, drum, and flute]
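Fast Fourier Transform (FFT)
§ Matrix multiplication view of DFT, with $W_N = e^{-j 2\pi / N}$
$$
\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \\ \vdots \\ X(N-2) \\ X(N-1) \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\
1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\
1 & W_N^{3} & W_N^{6} & \cdots & W_N^{3(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{N-2} & W_N^{2(N-2)} & \cdots & W_N^{(N-2)(N-1)} \\
1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)(N-1)}
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ \vdots \\ x(N-2) \\ x(N-1) \end{bmatrix}
$$
§ In practice, we do not compute this product directly. There is a more efficient way, called the “Fast Fourier Transform (FFT)”
  – Complexity reduction by FFT: $O(N^2) \rightarrow O(N \log_2 N)$
  – Divide and conquer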
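The matrix view and the divide-and-conquer idea can be compared side by side. The sketch below is one common textbook variant (recursive radix-2 decimation in time), shown as an illustration under the assumption that the length is a power of two; it is not necessarily the algorithm used by any particular library, and the function names are made up for this example.

```python
import numpy as np

def dft_matrix(N):
    """N x N matrix whose (k, n) entry is W_N^(k*n), with W_N = exp(-j*2*pi/N)."""
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N)

def fft_radix2(x):
    """Recursive decimation-in-time FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    even = fft_radix2(x[0::2])    # DFT of the even-indexed samples (size N/2)
    odd = fft_radix2(x[1::2])     # DFT of the odd-indexed samples (size N/2)
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    # Combine the two half-size DFTs into the full-size DFT.
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

rng = np.random.default_rng(0)
x = rng.standard_normal(64)

X_matrix = dft_matrix(64) @ x    # O(N^2) multiply-adds
X_fast = fft_radix2(x)           # O(N log2 N) multiply-adds
```

Each level of recursion halves the problem size and does O(N) work to combine the halves, which is where the N log₂ N count comes from.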
Short-Time Fourier Transform (STFT)
§ DFT assumes that the signal is stationary
  – It is not a good idea to apply DFT to a long and dynamically changing signal like music
  – Instead, we segment the signal into short frames and apply the DFT to each frame separately
§ Short-Time Fourier Transform
  – $X(k, l) = \sum_{n=0}^{N-1} w(n)\, x(n + l \cdot h)\, e^{-j 2\pi k n / N}$
    • $h$: hop size
    • $w(n)$: window
    • $N$: FFT size
§ This produces a 2-D time-frequency representation
  – Get the “spectrogram” from the magnitude
  – Parameters: window size, window type, FFT size, hop size
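The STFT formula above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `stft`, the Hann window, the parameter values, and the 1 kHz test tone are choices made for the example, frames that would run past the end of the signal are dropped, and only the non-negative-frequency bins are kept.

```python
import numpy as np

def stft(x, window, hop_size, fft_size=None):
    """X[k, l] = sum_n w(n) x(n + l*h) exp(-j*2*pi*k*n/N) for bins k = 0 .. N/2."""
    win_size = len(window)
    fft_size = fft_size or win_size
    n_frames = 1 + (len(x) - win_size) // hop_size
    # Row l holds the sample indices n + l*h of frame l.
    idx = np.arange(win_size)[None, :] + hop_size * np.arange(n_frames)[:, None]
    frames = x[idx] * window    # shape (n_frames, win_size)
    # rfft zero-pads each frame to fft_size when fft_size > win_size.
    return np.fft.rfft(frames, n=fft_size, axis=1).T    # shape (fft_size//2 + 1, n_frames)

# Hypothetical test tone (assumption): one second of a 1 kHz sine at fs = 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)

X = stft(x, np.hanning(512), hop_size=256, fft_size=512)    # 50% overlap
spectrogram_db = 20 * np.log10(np.abs(X) + 1e-10)           # small offset avoids log(0)
```

With these values the bin spacing is 8000 / 512 = 15.625 Hz, so the 1 kHz tone peaks at bin 64 in every frame.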
Windowing
§ Types of window functions
  – Trade-off between the width of the main lobe and the level of the side lobes
[Figure: window spectrum annotated with main-lobe width and side-lobe level]
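The main-lobe versus side-lobe trade-off can be measured from a zero-padded spectrum of each window. The sketch below compares a rectangular window with a Hann window; the helper name `window_response`, the window length of 64, and the choice of these two windows are assumptions for illustration.

```python
import numpy as np

def window_response(window, n_fft=8192):
    """Return (null-to-null main-lobe width in bins of a len(window)-point DFT,
    peak side-lobe level in dB relative to the main-lobe peak)."""
    mag = np.abs(np.fft.rfft(window, n=n_fft))
    mag_db = 20 * np.log10(mag / mag[0] + 1e-12)
    # Walk down the main lobe until the magnitude starts rising again (first null).
    k = 1
    while mag_db[k + 1] < mag_db[k]:
        k += 1
    width_bins = 2 * k * len(window) / n_fft
    return width_bins, mag_db[k:].max()

M = 64
rect_width, rect_sidelobe_db = window_response(np.ones(M))
hann_width, hann_sidelobe_db = window_response(np.hanning(M))
```

The rectangular window has the narrower main lobe but much higher side lobes; the Hann window roughly doubles the main-lobe width in exchange for lower side lobes.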
Short-Time Fourier Transform (STFT)
[Figure: windowed frames with 50% overlap. Source: the JOS SASP book]
Example: Pop Music
Example: Deep Note
Time-Frequency Resolutions in STFT
§ Trade-off between time resolution and frequency resolution, set by the window size
  – Short window: low frequency resolution, high time resolution
  – Long window: high frequency resolution, low time resolution
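The trade-off can be put in numbers with simple arithmetic. The sketch below assumes a 44.1 kHz sampling rate and an FFT size equal to the window size, and it uses bin spacing and window duration as rough proxies for the two resolutions; the actual frequency resolution also depends on the window type.

```python
fs = 44100    # hypothetical sampling rate in Hz (assumption)

def resolutions(window_size, fs):
    """Return (bin spacing in Hz, window duration in ms) for an N-point window and FFT."""
    return fs / window_size, 1000.0 * window_size / fs

short_win = resolutions(256, fs)    # about 172 Hz per bin, about 5.8 ms per frame
long_win = resolutions(4096, fs)    # about 10.8 Hz per bin, about 92.9 ms per frame
```

The product of the two quantities is constant, so improving one resolution by a factor necessarily worsens the other by the same factor.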