GCT535: Sound Technology for Multimedia
Time-Stretching and Pitch-Shifting
Graduate School of Culture Technology, KAIST
Juhan Nam
Outline
§ Resampling
§ OverLap-and-Add (OLA) methods
– SOLA
– WSOLA
– PSOLA
§ Phase Vocoder
Playback Rate Conversion
§ The playback rate is not necessarily equal to the recording rate
§ Playing recorded audio at a different rate changes its tone
– Example: running tape across the magnetic head at a variable speed
– Slowing down: “monster-like”
– Speeding up: “chipmunk-like”
– The rate can even be negative: reverse playback
§ Demo
– https://musiclab.chromeexperiments.com/Voice-Spinner
Resampling
§ Reconstruct the original signal and sample it at a new rate
– For a digital system with a constant playback rate
• Up-sampling makes the original sound play slower
• Down-sampling makes the original sound play faster
Resampling by Reconstruction Lowpass Filters
§ As you recall from the topic of digital audio, the original signal can be reconstructed by interpolating the samples with the sinc function
– Resampling the reconstructed signal is equivalent to interpolating the samples with the reconstruction filter
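A minimal sketch of resampling by interpolation, assuming the simplest reconstruction filter (linear interpolation, i.e. a triangular kernel) rather than a windowed sinc. The names `resample_linear` and `ratio` are illustrative and not taken from the lecture. No anti-aliasing lowpass filter is applied before down-sampling, so this is for illustration only.

```python
def resample_linear(x, ratio):
    """Resample the list x (at least 2 samples) by linear interpolation.

    ratio > 1 up-samples (more samples: slower at a fixed playback rate).
    ratio < 1 down-samples (fewer samples: faster at a fixed playback rate).
    """
    n_out = int(round((len(x) - 1) * ratio)) + 1
    y = []
    for m in range(n_out):
        # Fractional read position in the input, clamped to the last sample.
        pos = min(m / ratio, len(x) - 1)
        i = min(int(pos), len(x) - 2)   # index of the left neighbour
        frac = pos - i                  # distance from the left neighbour, in [0, 1]
        y.append((1.0 - frac) * x[i] + frac * x[i + 1])
    return y
```

For example, `resample_linear(x, 2.0)` roughly doubles the number of samples, so the sound lasts twice as long and plays one octave lower at the same playback rate.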
Reconstruction Lowpass Filters (Interpolation Filters)
[Figure: impulse responses of four interpolation filters (windowed sinc, linear, 3rd-order B-spline, 3rd-order Lagrange); x-axis: sample time from −5 to 5]
Resampling
§ Resampling changes pitch, length and timbre at the same time!
[Figure: original, speed down (up-sampling) and speed up (down-sampling); from the DAFX book]
How can we control pitch and length independently?
§ The answer is to process the signal at the frame level instead of the sample level
– Each sample block preserves the local shape of the waveform
[Figure: sample blocks taken at the analysis hop size and placed at the synthesis hop size]
Time-Stretching
§ Time-stretching (without pitch-shifting)
– Time-stretching ratio: $\beta = H_s / H_a$ ($H_s$: synthesis hop size, $H_a$: analysis hop size)
– If $\beta > 1$, the length increases
– If $\beta < 1$, the length decreases
§ Algorithms
– OLA
– SOLA
– PSOLA
– Phase Vocoder
OverLap-and-Add (OLA)
§ A time-stretching algorithm that segments the waveform into frames, then overlaps and adds them
– The overlapped region is cross-faded between two adjacent frames
– Problem: fuzziness caused by the phase difference between the overlapped frames
[Figure: frames taken at the analysis hop size are placed at the synthesis hop size, with fade-out and fade-in across each overlap]
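The OLA procedure can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the Hann window used for the cross-fade, the default frame length and hop size, and the names `hann` and `ola_stretch` are all assumptions. The output is divided by the summed window so that the overall gain stays constant.

```python
import math

def hann(n):
    """Periodic Hann window of length n (a fade-in followed by a fade-out)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def ola_stretch(x, beta, frame_len=1024, synth_hop=256):
    """Time-stretch x with plain OLA; beta = synthesis hop / analysis hop."""
    ana_hop = max(1, int(round(synth_hop / beta)))
    win = hann(frame_len)
    n_frames = max(1, (len(x) - frame_len) // ana_hop + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len          # summed window, used for gain normalisation
    for k in range(n_frames):
        a = k * ana_hop             # where the frame is read from the input
        s = k * synth_hop           # where the frame is written to the output
        for i in range(frame_len):
            if a + i < len(x):
                y[s + i] += win[i] * x[a + i]
                norm[s + i] += win[i]
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]
```

Nothing aligns the phase of adjacent frames here, so periodic input will show the cancellation artifacts noted above.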
Synchronized OverLap-and-Add (SOLA)
§ Reduce artifacts in OLA by shifting the overlapped region such that the two adjacent frames are maximally correlated
– Synchronization by cross-correlation: $X_{\mathrm{corr}}(l) = \sum_{n=0}^{L-1} x_1(n)\, x_2(n+l)$
– Find the lag $l$ where the cross-correlation is maximum
– Shift the next frame by that lag
[Figure: two adjacent frames at the analysis and synthesis hop sizes, before and after shifting the next frame by the lag]
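The synchronization step can be sketched as a brute-force search over candidate lags. The function name `best_lag` and the search range `max_lag` are illustrative assumptions; a practical implementation would usually normalise the correlation and limit the search to the overlap region.

```python
def best_lag(x1, x2, max_lag):
    """Return the lag l in [0, max_lag] that maximises sum_n x1[n] * x2[n + l].

    x1 is the overlapping part of the previous frame (length L) and x2 is the
    start of the next frame; x2 must hold at least len(x1) + max_lag samples.
    """
    L = len(x1)
    best_l, best_val = 0, float("-inf")
    for l in range(max_lag + 1):
        val = sum(x1[n] * x2[n + l] for n in range(L))  # cross-correlation at lag l
        if val > best_val:
            best_l, best_val = l, val
    return best_l
```

The next frame is then shifted by the returned lag before it is cross-faded with the previous one.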
Pitch-Synchronous OLA (PSOLA)
§ Analysis
– Perform block-based pitch detection and find the pitch marks $t_i$
– Pitch period: $P(t) = t_{i+1} - t_i$
– Extract a segment centered at every pitch mark $t_i$ using a Hanning window with length $L_i = 2P(t_i)$ to ensure fade-in and fade-out
§ Synthesis for time-stretching
– For every synthesis pitch mark $\tilde{t}_k$, search for the corresponding $t_i$ that minimizes $|\beta t_i - \tilde{t}_k|$
– Overlap and add the selected segment
• If $\beta > 1$, some segments will be repeated
• If $\beta < 1$, some segments will be discarded
– The next synthesis pitch mark $\tilde{t}_{k+1}$ is determined so as to preserve the local pitch: $\tilde{t}_{k+1} = \tilde{t}_k + \tilde{P}(\tilde{t}_k) = \tilde{t}_k + P(t_i)$
[Figure: PSOLA analysis (pitch marks, segments) and PSOLA time stretching (pitch marks, segments, synthesis pitch marks, overlap and add)]
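The mark bookkeeping for PSOLA time-stretching can be sketched as below. It only decides which analysis segment is used at each synthesis pitch mark and leaves out the windowing and overlap-add. The function name `psola_stretch_marks` is an assumption, and the analysis pitch marks are assumed to be given as increasing sample indices.

```python
def psola_stretch_marks(t, beta):
    """Return (synthesis pitch marks, index of the analysis segment used at each)."""
    # Local pitch period at each analysis mark (undefined for the last mark).
    periods = [t[i + 1] - t[i] for i in range(len(t) - 1)]
    end = beta * t[-2]              # last usable mark on the stretched time axis
    t_syn, picks = [], []
    tk = beta * t[0]                # first synthesis pitch mark
    while tk <= end:
        # Analysis mark whose stretched position beta * t[j] is closest to tk.
        i = min(range(len(periods)), key=lambda j: abs(beta * t[j] - tk))
        t_syn.append(tk)
        picks.append(i)
        tk += periods[i]            # advance by the original period: pitch is kept
    return t_syn, picks
```

With marks every 100 samples, `beta = 2.0` selects most segments twice (repetition) and `beta = 0.5` skips every other segment (discarding), which matches the two cases listed above.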
Pitch-Shifting
§ Using time-stretching and resampling
– First, perform time-stretching with ratio $\beta$
– Second, resample the output by the same ratio $\beta$, which restores the original length
§ Problem
– The timbre (i.e. the formants) changes due to the resampling
– This is quite audible for the human voice (e.g. speech or singing)
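The two steps can be checked with simple bookkeeping of length and pitch. This sketch processes no audio; the function name `pitch_shift_plan` and its arguments are hypothetical, and both steps are assumed to be ideal.

```python
def pitch_shift_plan(n_samples, f0_hz, beta):
    """Track length and pitch through time-stretching followed by resampling."""
    # Step 1: time-stretching by beta changes the length but not the pitch.
    stretched_len = int(round(beta * n_samples))
    # Step 2: resampling by beta brings the length back to the original,
    # so at a fixed playback rate every frequency is scaled by beta.
    final_len = int(round(stretched_len / beta))
    return {
        "stretched_len": stretched_len,
        "final_len": final_len,
        "final_f0_hz": beta * f0_hz,
    }
```

For instance, with `beta = 1.5` one second of a 220 Hz tone at 44100 Hz is first stretched to 66150 samples and then resampled back to 44100 samples, now at 330 Hz. Formants are scaled by the same factor, which is the timbre problem noted above.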
Pitch-Synchronous OLA (PSOLA)
§ PSOLA can be used for pitch-shifting
– For every synthesis pitch mark $\tilde{t}_k$, search for the corresponding $t_i$ that minimizes $|t_i - \tilde{t}_k|$
– Overlap and add the selected segment
• If $\gamma > 1$, some segments will be repeated
• If $\gamma < 1$, some segments will be discarded
– The next synthesis pitch mark $\tilde{t}_{k+1}$ is determined by scaling the local pitch period by the pitch-shifting ratio $\gamma$: $\tilde{t}_{k+1} = \tilde{t}_k + \tilde{P}(\tilde{t}_k) = \tilde{t}_k + P(t_i)/\gamma$
§ It is possible to combine time-stretching (using the term $|\beta t_i - \tilde{t}_k|$) with pitch-shifting
§ This preserves the formants of the input sound!
[Figure: PSOLA pitch shifting (pitch marks, segments, synthesis pitch marks, overlap and add)]
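A minimal sketch of PSOLA synthesis that combines both ratios, assuming the analysis pitch marks are already available as increasing integer sample indices. The function name `psola` and its parameters are illustrative, not from the lecture. The summed windows are not gain-normalised, so the output level grows with the density of synthesis marks.

```python
import math

def psola(x, marks, beta=1.0, gamma=1.0):
    """Overlap-add pitch-synchronous segments of x.

    marks: analysis pitch marks (sample indices, increasing).
    beta:  time-stretching ratio; gamma: pitch-shifting ratio.
    """
    periods = [marks[i + 1] - marks[i] for i in range(len(marks) - 1)]
    out_len = int(beta * len(x))
    y = [0.0] * out_len
    tk = beta * marks[0]                      # current synthesis pitch mark
    while tk < beta * marks[-1]:
        # Analysis mark whose stretched position is closest to tk.
        i = min(range(len(periods)), key=lambda j: abs(beta * marks[j] - tk))
        P = periods[i]
        centre = int(round(tk))
        for n in range(-P, P):                # segment of length 2 * P
            w = 0.5 + 0.5 * math.cos(math.pi * n / P)   # Hann, peak at the mark
            src, dst = marks[i] + n, centre + n
            if 0 <= src < len(x) and 0 <= dst < out_len:
                y[dst] += w * x[src]
        tk += P / gamma                       # spacing of marks sets the new pitch
    return y
```

With `beta = gamma = 1` the interior of the input is reproduced, since Hann windows spaced one period apart sum to one. With `gamma = 2` the segments are placed twice as densely, so the output period is halved while each segment keeps its original shape.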
Resources
§ TSM Toolbox
– Time-scale modification code using WSOLA (waveform similarity OLA) and the phase vocoder
– Additionally includes harmonic-percussive source separation (HPSS)
– https://www.audiolabs-erlangen.de/resources/MIR/TSMtoolbox/