GCT535: Sound Technology for Multimedia
Fourier Representations of Audio
Graduate School of Culture Technology, KAIST
Juhan Nam
Waveforms
§ The basic audio representation that computers can take
  – $x(n) = [a_1, a_2, a_3, \ldots]$
§ Great for observing energy change over time, but less intuitive for observing pitch or timbre characteristics
§ Are there better representations than this?
Sound Generation
§ Mass-Spring Model
  – Simple harmonic motion (restoration force $-kx$, inertial force $m \frac{d^2 x}{dt^2}$):
    $F = -kx = m \frac{d^2 x}{dt^2}$
  – This generates a sinusoidal oscillation:
    $x = A \sin(\omega t) = A \sin(2\pi f t)$
    angular frequency: $\omega = \sqrt{k/m}$, frequency: $f = \omega / 2\pi$, period: $T = 1/f$
  – Practical models have dampers that make the oscillation decay over time
[Figure: mass $m$ on a spring with constant $k$ and displacement $x$; plot of the resulting sinusoid with period $T = 1/f$]
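The relations on this slide can be checked numerically. The following is a minimal sketch, assuming an undamped system; the function name `mass_spring` and the values of `k` and `m` are hypothetical and chosen only for illustration.

```python
import math

def mass_spring(k, m, amplitude=1.0):
    """Return omega, f, T and the displacement function x(t) for an undamped mass-spring system."""
    omega = math.sqrt(k / m)       # angular frequency in rad/s
    f = omega / (2 * math.pi)      # frequency in Hz
    T = 1.0 / f                    # period in seconds

    def x(t):
        return amplitude * math.sin(omega * t)

    return omega, f, T, x

# Hypothetical spring constant (N/m) and mass (kg), for illustration only.
k, m = 4.0e6, 1.0
omega, f, T, x = mass_spring(k, m)

# Check F = -k x = m x'' at one instant using a central finite difference.
t0, dt = 0.3 * T, T / 1e4
accel = (x(t0 + dt) - 2 * x(t0) + x(t0 - dt)) / dt ** 2
inertial_force = m * accel
restoration_force = -k * x(t0)
print(f, T, inertial_force, restoration_force)
```

The two printed forces agree up to the finite-difference error, which is what the differential equation on the slide states.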
Sound Generation
§ Any oscillatory object can be modeled as a complex network of masses and springs
  – This generates a complex tone
§ Generation steps (e.g. guitar)
  – Excitation: wideband energy
  – Propagation on the string
  – Reflection on the ends
  – Superposition with reflected waves
  – Standing waves: constructive superposition
  – Radiation from the object
  – Propagation through air
§ Demos (string oscillation): http://www.acs.psu.edu/drussell/demos.html
Frequency-Domain Representation
§ Can we represent $x(n)$ with a finite set of sinusoids?
  – $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} A(k)\, r_k(n)$
    • $r_k(n) = \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$: discrete-time sinusoid with length $N$
  – Find $A(k)$, $\phi(k)$
Euler's identity
§ Euler's identity
  $e^{j\theta} = \cos\theta + j\sin\theta$
  – Can be proved by Taylor series
  – If $\theta = \pi$, $e^{j\pi} + 1 = 0$ ("the most beautiful equation in math")
§ Properties
  $\sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}$, $\cos\theta = \frac{e^{j\theta} + e^{-j\theta}}{2}$
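A quick numerical check of the identity and the two properties, as a minimal sketch using Python's `cmath`; the angle `theta` is an arbitrary value chosen for illustration.

```python
import cmath
import math

theta = 0.7  # arbitrary angle in radians (illustrative)

# Euler's identity: e^{j*theta} = cos(theta) + j*sin(theta)
lhs = cmath.exp(1j * theta)
rhs = complex(math.cos(theta), math.sin(theta))

# Sine and cosine recovered from complex exponentials
sin_from_exp = (cmath.exp(1j * theta) - cmath.exp(-1j * theta)) / 2j
cos_from_exp = (cmath.exp(1j * theta) + cmath.exp(-1j * theta)) / 2

# Special case theta = pi: e^{j*pi} + 1 is zero up to rounding error
euler_pi = cmath.exp(1j * math.pi) + 1

print(abs(lhs - rhs), abs(euler_pi))
```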
Complex Sinusoids
§ Cosine and sine can be represented in a single term
  $s_k(n) = e^{j 2\pi k n / N} = \cos\left(\frac{2\pi k n}{N}\right) + j \sin\left(\frac{2\pi k n}{N}\right)$
  – Frequencies: $\frac{2\pi k}{N}$ radians per sample, or $\frac{k}{N} F_s$ Hz ($F_s$: the sampling rate), for $k = 0, 1, 2, \ldots, N-1$
  – Example: N = 8
Figures are from https://ccrma.stanford.edu/~jos/dft/
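The N = 8 example can be generated directly. This is a minimal sketch; `complex_sinusoid` is a hypothetical helper name, not something defined in the slides.

```python
import cmath
import math

def complex_sinusoid(k, N):
    """Samples s_k(n) = exp(j*2*pi*k*n/N) for n = 0..N-1."""
    return [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]

N = 8
basis = [complex_sinusoid(k, N) for k in range(N)]

# The real part is the cosine and the imaginary part is the sine of the same frequency.
k = 1
real_part = [v.real for v in basis[k]]
imag_part = [v.imag for v in basis[k]]
print([round(v, 3) for v in real_part])
print([round(v, 3) for v in imag_part])
```

For k = 1 the sinusoid completes exactly one cycle over the 8 samples; larger k completes k cycles.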
Complex Sinusoids (N = 8)
[Figure: the complex sinusoids for N = 8]
Figures are from https://ccrma.stanford.edu/~jos/dft/
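Frequency-Domain Representation Using Complex Sinusoids
§ $x(n)$ is expressed in a simpler form:
  $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} A(k) \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$
  $= \frac{1}{N} \sum_{k=0}^{N-1} A(k) \left(e^{j(2\pi k n / N + \phi(k))} + e^{-j(2\pi k n / N + \phi(k))}\right) / 2$
  $= \frac{1}{N} \sum_{k=0}^{N-1} \left(A(k) e^{j\phi(k)} e^{j 2\pi k n / N} + A(k) e^{-j\phi(k)} e^{-j 2\pi k n / N}\right) / 2$
  $= \frac{1}{N} \sum_{k=0}^{N-1} \left(X(k) e^{j 2\pi k n / N} + X^*(k) e^{-j 2\pi k n / N}\right) / 2$
  $= \mathrm{Real}\left\{\frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}\right\}$
  $= \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}$
  where $X(k) = A(k) e^{j\phi(k)} = A(k)\left(\cos\phi(k) + j \sin\phi(k)\right)$
  – The last step drops Real{·}; this is valid when the sum is already real, which holds for a real-valued $x(n)$ whose coefficients satisfy $X(N-k) = X^*(k)$
  – Now, how can we find $X(k)$?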
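Orthogonality of Sinusoids
§ Inner product between two complex sinusoids
  $\sum_{n=0}^{N-1} s_p(n) \cdot s_q^*(n) = \sum_{n=0}^{N-1} e^{j 2\pi p n / N} \cdot e^{-j 2\pi q n / N} = \begin{cases} N & \text{if } p = q \\ 0 & \text{otherwise} \end{cases}$
§ Inner products between real sinusoids (for $0 < p, q < N$ with $p, q \neq N/2$)
  $\sum_{n=0}^{N-1} \sin(2\pi p n / N) \sin(2\pi q n / N) = \begin{cases} N/2 & \text{if } p = q \\ -N/2 & \text{if } p = N - q \\ 0 & \text{otherwise} \end{cases}$
  $\sum_{n=0}^{N-1} \cos(2\pi p n / N) \sin(2\pi q n / N) = 0$
  $\sum_{n=0}^{N-1} \cos(2\pi p n / N) \cos(2\pi q n / N) = \begin{cases} N/2 & \text{if } p = q \text{ or } p = N - q \\ 0 & \text{otherwise} \end{cases}$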
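These inner products are easy to verify by direct summation. The sketch below assumes N = 8; the helper names are hypothetical.

```python
import cmath
import math

def complex_inner(p, q, N):
    """Inner product of s_p and s_q (second argument conjugated)."""
    return sum(cmath.exp(2j * math.pi * p * n / N) *
               cmath.exp(2j * math.pi * q * n / N).conjugate()
               for n in range(N))

def sin_inner(p, q, N):
    return sum(math.sin(2 * math.pi * p * n / N) * math.sin(2 * math.pi * q * n / N)
               for n in range(N))

def cos_inner(p, q, N):
    return sum(math.cos(2 * math.pi * p * n / N) * math.cos(2 * math.pi * q * n / N)
               for n in range(N))

def cos_sin_inner(p, q, N):
    return sum(math.cos(2 * math.pi * p * n / N) * math.sin(2 * math.pi * q * n / N)
               for n in range(N))

N = 8
print(abs(complex_inner(3, 3, N)), abs(complex_inner(3, 5, N)))  # same vs. different index
print(sin_inner(3, 3, N), sin_inner(3, 5, N), sin_inner(1, 3, N))
print(cos_inner(3, 3, N), cos_inner(3, 5, N), cos_sin_inner(3, 3, N))
```

Note that the complex sinusoids need only one case distinction (p = q or not), while the real sinusoids also interact at p = N − q.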
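Orthogonal Projection on Complex Sinusoids
§ Take the inner product of the signal with each sinusoid
  $\sum_{n=0}^{N-1} x(n) \cdot s_k^*(n) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N} = \sum_{n=0}^{N-1} \left(\frac{1}{N} \sum_{p=0}^{N-1} X(p) e^{j 2\pi p n / N}\right) e^{-j 2\pi k n / N}$
  $= \frac{1}{N} \sum_{p=0}^{N-1} X(p) \left(\sum_{n=0}^{N-1} e^{j 2\pi p n / N} e^{-j 2\pi k n / N}\right) = \frac{1}{N} X(k) N = X(k) = A(k) e^{j\phi(k)}$
  – By orthogonality, the inner sum is $N$ when $p = k$ and 0 otherwise, so only the $p = k$ term survives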
To Wrap Up
§ Discrete Fourier Transform
  $X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N} = X_R(k) + j X_I(k) = A(k) e^{j\phi(k)}$
  – Magnitude spectrum: $|X(k)| = A(k) = \sqrt{X_R^2(k) + X_I^2(k)}$
  – Phase spectrum: $\angle X(k) = \phi(k) = \tan^{-1}\left(\frac{X_I(k)}{X_R(k)}\right)$
§ Inverse Discrete Fourier Transform
  $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}$
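The two transforms can be written out directly from their definitions. This is a minimal O(N²) sketch for illustration, not an efficient implementation; the function names and the test signal are hypothetical.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)"""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x(n) = (1/N) * sum_k X(k) * exp(j*2*pi*k*n/N)"""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Illustrative signal: a cosine exactly on bin 2 with phase offset 0.5 rad.
N = 8
x = [math.cos(2 * math.pi * 2 * n / N + 0.5) for n in range(N)]

X = dft(x)
magnitude = [abs(v) for v in X]
# cmath.phase uses atan2, which resolves the quadrant that tan^-1(X_I / X_R) alone leaves ambiguous.
phase = [cmath.phase(v) for v in X]
x_rec = idft(X)

print([round(m, 3) for m in magnitude])
print(round(phase[2], 3))
```

The energy lands in bins 2 and N − 2 = 6, the phase at bin 2 matches the offset of the cosine, and the inverse transform returns the original samples up to rounding error.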
Why do we choose this set of frequencies for the sinusoids?
§ Underlying assumption in DFT
  – The N samples are periodic
  – In the view of "Fourier Series", a periodic signal with period N can be represented as sinusoids with periods N, N/2, N/3, … (frequencies 1/N, 2/N, 3/N, …)
[Figure: two amplitude-versus-time plots of the waveform, one spanning 0 to 80 and one spanning 0 to 300 on the time axis]
Properties of DFT
§ Periodicity
  – $X(k) = X(k + N) = X(k + 2N) = \ldots$
  – $X(k) = X(k - N) = X(k - 2N) = \ldots$
§ Symmetry (for real-valued $x(n)$)
  – Magnitude response: $|X(k)| = |X(N - k)| = |X(-k)|$
  – Phase response: $\angle X(k) = -\angle X(-k) = -\angle X(N - k)$
  – We often display only half of the magnitude and phase responses
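Both properties can be checked by evaluating the DFT sum at indices outside 0..N−1. A minimal sketch, assuming a real-valued signal; `dft_bin` and the sample values are hypothetical.

```python
import cmath
import math

def dft_bin(x, k):
    """Evaluate the DFT sum at any integer k (not restricted to 0..N-1)."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

# Arbitrary real-valued samples, for illustration only.
x = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1, 0.9, -0.7]
N = len(x)
k = 3

periodic_ok = abs(dft_bin(x, k) - dft_bin(x, k + N)) < 1e-9
negative_ok = abs(dft_bin(x, -k) - dft_bin(x, N - k)) < 1e-9
# For real x, X(N-k) is the complex conjugate of X(k):
# equal magnitude, negated phase.
conjugate_ok = abs(dft_bin(x, N - k) - dft_bin(x, k).conjugate()) < 1e-9

print(periodic_ok, negative_ok, conjugate_ok)
```

Conjugate symmetry is why only bins 0 to N/2 need to be displayed for a real signal.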
Properties of DFT
[Figure: waveform, magnitude (N = 32), and phase (N = 32) of a signal, shown twice. Left column: spectrum over bins 0 to N − 1, illustrating $|X(k)| = |X(N - k)|$ and $\angle X(k) = -\angle X(N - k)$. Right column: spectrum over bins −N/2 to N/2, illustrating $|X(k)| = |X(-k)|$ and $\angle X(k) = -\angle X(-k)$]
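Frequency Scales
§ $X(k)$, $k = 0, 1, \ldots, N$, corresponds to frequency values that are evenly distributed between 0 and $f_s$ in Hz (bin $k$ lies at $k f_s / N$ Hz)
[Figure: bin-index axis marked −N, −N/2, 0, N/2, N aligned with a frequency axis marked $-f_s$, $-f_s/2$, 0, $f_s/2$, $f_s$]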
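The mapping from bin index to Hz can be sketched as follows. The helper names are hypothetical and the sampling rate is an assumed value; the signed variant relies on the periodicity of the DFT to read bins above N/2 as negative frequencies.

```python
def bin_to_hz(k, N, fs):
    """Frequency of bin k on the 0..fs scale."""
    return k * fs / N

def bin_to_signed_hz(k, N, fs):
    """Frequency of bin k on the -fs/2..fs/2 scale (bins above N/2 wrap to negative)."""
    k = k % N
    if k > N // 2:
        k -= N
    return k * fs / N

N = 8
fs = 8000.0  # assumed sampling rate in Hz, for illustration only
print([bin_to_hz(k, N, fs) for k in range(N)])
print([bin_to_signed_hz(k, N, fs) for k in range(N)])
```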
Cracks in Sinusoids
§ What if the frequency component in $x(n)$ is not exactly on one of the sinusoids?
  – For example, if $x(n)$ is a sinusoid with an arbitrary frequency $\omega$: $x(n) = e^{j\omega n}$
  $X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N} = \sum_{n=0}^{N-1} e^{j\omega n} e^{-j 2\pi k n / N} = \sum_{n=0}^{N-1} e^{j(\omega - 2\pi k / N) n}$
  $= \frac{1 - e^{j(\omega - 2\pi k / N) N}}{1 - e^{j(\omega - 2\pi k / N)}} = e^{j(\omega - 2\pi k / N)(N-1)/2} \, \frac{\sin((\omega - 2\pi k / N) N / 2)}{\sin((\omega - 2\pi k / N) / 2)}$
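The closed form can be compared against the direct sum. This is a minimal sketch; the function names, N = 32, and the two test frequencies (bin 4 and bin 4.5) are assumptions made for illustration.

```python
import cmath
import math

def direct_bin(omega, k, N):
    """Direct DFT sum of x(n) = exp(j*omega*n) at bin k."""
    return sum(cmath.exp(1j * (omega - 2 * math.pi * k / N) * n) for n in range(N))

def closed_form_bin(omega, k, N):
    """Geometric-series closed form of the same sum."""
    d = omega - 2 * math.pi * k / N
    if abs(math.sin(d / 2)) < 1e-12:
        return complex(N)  # every term of the sum equals 1
    return cmath.exp(1j * d * (N - 1) / 2) * math.sin(d * N / 2) / math.sin(d / 2)

N = 32
omega_on = 2 * math.pi * 4 / N      # exactly on bin 4
omega_off = 2 * math.pi * 4.5 / N   # halfway between bins 4 and 5

on_spectrum = [abs(direct_bin(omega_on, k, N)) for k in range(N)]
off_spectrum = [abs(direct_bin(omega_off, k, N)) for k in range(N)]
print([round(v, 2) for v in on_spectrum[:10]])
print([round(v, 2) for v in off_spectrum[:10]])
```

On the bin, all energy falls into a single coefficient; off the bin, it spreads over every coefficient, with the largest values in the two neighboring bins.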
Cracks in Sinusoids
[Figure: waveform (amplitude) and magnitude spectrum versus $\omega$ for a sinusoid on the DFT sinusoids (left) and off the DFT sinusoids (right). The magnitude follows $\left|\frac{\sin((\omega - 2\pi k / N) N / 2)}{\sin((\omega - 2\pi k / N) / 2)}\right|$]
Zero-padding
§ Adding zeros to a windowed frame in time domain
  – Corresponds to "ideal interpolation" in frequency domain
  – In practice, the FFT size increases by the size of the zero-padding
[Figure: waveform (amplitude) and magnitude spectrum before zero-padding and after zero-padding (x4)]
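A small sketch of the interpolation effect, assuming a 4x padding factor as in the figure; `zero_pad` and the test signal are hypothetical. Every fourth bin of the padded spectrum coincides with a bin of the original spectrum, and the bins in between are the interpolated values.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def zero_pad(x, factor):
    """Append zeros so that the length becomes factor * len(x)."""
    return list(x) + [0.0] * (len(x) * (factor - 1))

# Illustrative frame: a sinusoid between bins 1 and 2 of an 8-point DFT.
N = 8
x = [math.sin(2 * math.pi * 1.5 * n / N) for n in range(N)]

X = dft(x)
X_padded = dft(zero_pad(x, 4))

print(len(X), len(X_padded))
print([round(abs(v), 3) for v in X])
print([round(abs(X_padded[4 * k]), 3) for k in range(N)])
```

Zero-padding samples the same underlying spectrum more densely; it does not add information about the signal.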
Examples of DFT
[Figure: waveform (amplitude versus time in milliseconds) and spectrum (magnitude versus frequency) for three sounds: sine, drum, and flute]
Fast Fourier Transform (FFT)
§ Matrix multiplication view of DFT
§ In fact, we don't compute this directly. There is a more efficient way, called the "Fast Fourier Transform (FFT)"
  – Complexity reduction by FFT: $O(N^2) \rightarrow O(N \log_2 N)$
  – Divide and conquer
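Both views can be sketched in a few lines. The slide does not say which FFT variant is meant; the recursive radix-2 decimation-in-time scheme below is one common divide-and-conquer formulation, used here as an assumption, and it requires a power-of-two length. All function names are hypothetical.

```python
import cmath
import math

def dft_matrix(N):
    """N x N matrix W with W[k][n] = exp(-j*2*pi*k*n/N)."""
    return [[cmath.exp(-2j * math.pi * k * n / N) for n in range(N)] for k in range(N)]

def dft_by_matrix(x):
    """DFT as a matrix-vector product: O(N^2) multiplications."""
    W = dft_matrix(len(x))
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fft(x):
    """Recursive radix-2 FFT: split into even and odd samples, then combine."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * math.pi * k / N) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
max_diff = max(abs(a - b) for a, b in zip(dft_by_matrix(x), fft(x)))
print(max_diff)
```

The two results differ only by rounding error, while the recursive version halves the problem at each level, which is where the $N \log_2 N$ cost comes from.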
Short-Time Fourier Transform (STFT)
§ DFT assumes that the signal is stationary
  – It is not a good idea to apply DFT to a long and dynamically changing signal like music
  – Instead, we segment the signal and apply DFT to each segment separately
§ Short-Time Fourier Transform
  $X(k, l) = \sum_{n=0}^{N-1} w(n)\, x(n + l \cdot h)\, e^{-j 2\pi k n / N}$
  $h$: hop size, $w(n)$: window, $N$: FFT size, $l$: frame index
§ This produces 2-D time-frequency representations
  – Get the "spectrogram" from the magnitude
  – Parameters: window size, window type, FFT size, hop size
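The STFT formula translates into a loop over frames. This is a minimal sketch under stated assumptions: the window length equals the FFT size (no zero-padding), the window is a periodic Hann, and incomplete frames at the end of the signal are dropped. Function names and parameter values are hypothetical.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def hann(N):
    """Periodic Hann window (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def stft(x, window, h):
    """Return a list of frames; frame l holds X(k, l) for k = 0..N-1."""
    N = len(window)
    frames = []
    l = 0
    while l * h + N <= len(x):  # keep only complete frames
        frame = [window[n] * x[n + l * h] for n in range(N)]
        frames.append(dft(frame))
        l += 1
    return frames

def spectrogram(frames):
    """Magnitude of each STFT coefficient."""
    return [[abs(v) for v in frame] for frame in frames]

# Illustrative input: a cosine on bin 4 of a 16-point frame, 50% overlap.
N, h = 16, 8
x = [math.cos(2 * math.pi * 4 * n / N) for n in range(64)]
S = spectrogram(stft(x, hann(N), h))
peak_bins = [max(range(N // 2 + 1), key=lambda k: frame[k]) for frame in S]
print(len(S), peak_bins)
```

For a stationary input like this one, every frame peaks at the same bin; a changing signal would show the peak moving from frame to frame.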
Windowing
§ Types of window functions
  – Trade-off between the width of the main-lobe and the level of the side-lobes
[Figure: window spectrum annotated with main-lobe width and side-lobe level]
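The trade-off can be measured for two specific windows. The slide does not name the window types; the rectangular and Hann windows below are assumed examples, and the helper names, window length, and padding factor are hypothetical. The sketch samples each window's spectrum densely via zero-padding, walks down the main lobe to its first null, and reports the highest side-lobe relative to the peak.

```python
import cmath
import math

def rectangular(N):
    return [1.0] * N

def hann(N):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def padded_magnitude(window, pad_factor):
    """Magnitude of the zero-padded DFT of a window."""
    N = len(window)
    M = N * pad_factor
    return [abs(sum(window[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(N)))
            for k in range(M)]

def main_lobe_and_side_lobe(window, pad_factor=16):
    mag = padded_magnitude(window, pad_factor)
    i = 0
    while mag[i + 1] < mag[i]:   # descend the main lobe to its first null
        i += 1
    half_width_bins = i / pad_factor  # in units of unpadded DFT bins
    peak_side = max(mag[i:len(mag) // 2])
    side_lobe_db = 20 * math.log10(peak_side / mag[0])
    return half_width_bins, side_lobe_db

N = 32
rect_width, rect_db = main_lobe_and_side_lobe(rectangular(N))
hann_width, hann_db = main_lobe_and_side_lobe(hann(N))
print(rect_width, round(rect_db, 1))
print(hann_width, round(hann_db, 1))
```

In this comparison the Hann window has the wider main lobe but much lower side-lobes, which is the trade-off stated on the slide.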
Short-Time Fourier Transform (STFT)
[Figure: overlapping windowed frames with 50% overlap. Source: the JOS SASP book]
Example: Waveform
[Figure: waveforms of a flute A4 note and a piano C4 note]
Example: Spectrogram - 2D color map
[Figure: spectrograms of a piano C4 note and a flute A4 note]
Example: Spectrogram - 3D waterfall
[Figure: waterfall plots of a piano C4 note and a flute A4 note]
Example: Pop Music
Example: Deep Note
Time-Frequency Resolutions in STFT
§ Trade-off between time- and frequency-resolution by window size
  – Short window: low frequency resolution, high time resolution
  – Long window: high frequency resolution, low time resolution
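The trade-off can be put into numbers. This is a rough sketch that takes the DFT bin spacing as the frequency resolution and the window duration as the time resolution (the effective values also depend on the window type); the function name, the sampling rate, and the two window sizes are assumptions for illustration.

```python
def stft_resolution(window_size, fs):
    """Return (bin spacing in Hz, window duration in seconds)."""
    freq_resolution_hz = fs / window_size
    time_resolution_s = window_size / fs
    return freq_resolution_hz, time_resolution_s

fs = 44100.0  # assumed sampling rate in Hz
short = stft_resolution(256, fs)
long_ = stft_resolution(4096, fs)

print("short window: %.2f Hz, %.2f ms" % (short[0], short[1] * 1000))
print("long window:  %.2f Hz, %.2f ms" % (long_[0], long_[1] * 1000))
```

The product of the two quantities is constant, so improving one resolution by changing the window size necessarily worsens the other.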