Time-frequency multipliers for sound synthesis
Ph. Depalle (†), R. Kronland-Martinet (‡) and B. Torrésani (⋆)
†: SPCL, McGill University, Montreal, Canada
‡: LMA, Centre National de la Recherche Scientifique, Marseille, France
⋆: LATP, Université de Provence, Marseille, France
SPIE Symposium, San Diego, August 2007
Outline
Time-frequency sound synthesis:
- Generate sound signals from elementary, time-frequency localized atoms
- Implement sound transformations directly in the time-frequency domain
Goal: investigate the actual relevance of time-frequency multipliers and their generalizations for sound synthesis and transformation.
1 Introduction
2 Time-frequency operator representation
  - Time-frequency space
  - Spreading function representation
  - Gabor frames and multipliers
  - Multiple Gabor multipliers
3 Gabor multiplier and multiple Gabor multiplier estimation
  - The simple case
  - Introducing time-frequency shifts
  - MGM: masking pursuit
4 Applications to sound transformation
5 Conclusions
Introduction
Time-frequency representations are known to be well suited to representing audio signals:
- Audio signals often have good time-frequency localization properties
- The time-frequency plane allows one to easily model dependencies between coefficients
We consider here audio signal transformations and their time-frequency implementation, in view of
- Sound synthesis from simple templates
- Various applications such as sound morphing, pitch shifting, ...
Here, we discuss the interest of time-frequency operator representations for quantifying perceptual timbre differences between sounds. An operator is represented by a (family of) time-frequency image(s), plus additional parameters.
Introduction
Musical sound is described in terms of pitch and timbre. Timbre is not very well defined, but several physics-related cues are known to play a significant role:
- The harmonicity: frequencies of partials may be multiples of a fundamental frequency (for example violin strings) or not (drums, piano strings, ...).
- The harmonic content: for example, for wind instruments, boundary conditions determine the relative weight of even and odd harmonics.
- The time decay of the various harmonics, which is also related to the physics of the instrument.
- The strength of the attack, i.e., the speed at which energy propagates across frequencies.
Introduction: toy example
Sound signals composed of partials with
- partial-dependent offset
- partial-dependent initial amplitude
- partial-dependent (exponential) decay rate
[Figure: magnitude of the Gabor transform of the toy example]
Introduction: additive synthesis
$$x(t) = \sum_k a_k(t) \cos(2\pi k f_0 t + \varphi_k)$$
with (physics-driven) amplitudes
$$a_k(t) = c_k \, t^{\gamma_k} \exp\{-t/\tau_k\} .$$
Simple model: $\tau_k = \alpha \beta^{-k}$.
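The additive model above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `additive_tone`, the sampling rate and all parameter values are hypothetical choices, not values from the talk.

```python
import numpy as np

def additive_tone(f0, n_partials, duration, fs, c, gamma, alpha, beta, phases=None):
    """Sample x(t) = sum_k a_k(t) cos(2 pi k f0 t + phi_k) for k = 1..n_partials,
    with a_k(t) = c_k t^gamma_k exp(-t / tau_k) and tau_k = alpha * beta**(-k)."""
    t = np.arange(int(duration * fs)) / fs          # time axis in seconds
    if phases is None:
        phases = np.zeros(n_partials)               # assumption: zero initial phases
    x = np.zeros_like(t)
    for i, k in enumerate(range(1, n_partials + 1)):
        tau_k = alpha * beta ** (-k)                # decay time of partial k
        a_k = c[i] * t ** gamma[i] * np.exp(-t / tau_k)
        x += a_k * np.cos(2 * np.pi * k * f0 * t + phases[i])
    return t, x

# Hypothetical example: odd harmonics weighted more strongly than even ones.
K = 8
weights = np.array([1.0 / k if k % 2 else 0.1 / k for k in range(1, K + 1)])
t, x = additive_tone(f0=220.0, n_partials=K, duration=1.0, fs=16000,
                     c=weights, gamma=0.5 * np.ones(K), alpha=0.8, beta=1.3)
```

With `beta > 1` the decay time shrinks as the partial index grows, so higher partials die out faster; `gamma > 0` makes every partial start from zero amplitude.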
Introduction: real examples (soprano saxophone and clarinet)
[Figure: magnitude of the Gabor transform of the saxophone tone (left) and of the clarinet tone (right); time axis 0 to 2 s, frequency axis 0 to 6000 Hz]
Questions:
- Read the physical characteristics from time-frequency images (categorization)
- Map (morph) saxophone into clarinet
Introduction: time-frequency morphing mask
[Figure: Gabor mask from a saxophone to a clarinet tone (magnitude); time axis 0 to 2 s, frequency axis 0 to 6000 Hz]
[Sound examples: saxophone, clarinet, morphed saxophone]
Time-frequency operator representation: the time-frequency plane
Short time Fourier transform (STFT): associate to a (continuous time) signal $x \in L^2(\mathbb{R})$ the function $V_g x \in L^2(\mathbb{R}^2)$, defined by
$$V_g x(b,\nu) = \int_{-\infty}^{\infty} x(t)\, e^{-2i\pi\nu t}\, \overline{g(t-b)}\, dt = \langle x, g_{(b,\nu)} \rangle , \qquad (1)$$
where $g$ is a fixed analysis window, and the atoms $g_{(b,\nu)}$ are obtained from $g$ through the time-frequency shifts $\pi(b,\nu) = M_\nu T_b$:
$$g_{(b,\nu)}(t) = \exp\{2i\pi\nu t\}\, g(t-b) = [\pi(b,\nu) g](t) .$$
If $g \neq 0$, the STFT can be inverted in many ways: for any synthesis window $h \in L^2(\mathbb{R})$ such that $\langle g, h \rangle \neq 0$, one has
$$x(t) = \frac{1}{\langle h, g \rangle} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} V_g x(b,\nu)\, h_{(b,\nu)}(t)\, db\, d\nu , \qquad (2)$$
the equality holding in the strong $L^2(\mathbb{R})$ sense.
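Formulas (1) and (2) have a direct finite-dimensional analogue that can be checked numerically. The sketch below assumes signals in $\mathbb{C}^N$ with cyclic time shifts and a fully sampled (non-subsampled) time-frequency grid; the function names `stft_full` and `istft_full` and the windows are hypothetical choices for illustration.

```python
import numpy as np

def stft_full(x, g):
    """V[b, k] = sum_t x[t] * conj(g[(t - b) % N]) * exp(-2i*pi*k*t/N)."""
    N = len(x)
    # np.roll(g, b)[t] equals g[(t - b) % N]; the FFT supplies the modulation sum.
    return np.array([np.fft.fft(x * np.conj(np.roll(g, b))) for b in range(N)])

def istft_full(V, g, h):
    """Discrete analogue of (2): synthesis with window h, normalized by <h, g>."""
    N = V.shape[0]
    x = np.zeros(N, dtype=complex)
    for b in range(N):
        # ifft already carries the 1/N factor of the discrete inverse transform.
        x += np.fft.ifft(V[b]) * np.roll(h, b)
    return x / np.vdot(g, h)   # np.vdot(g, h) = sum_n conj(g[n]) * h[n] = <h, g>

rng = np.random.default_rng(0)
N = 64
n = np.arange(N)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
g = np.exp(-0.5 * ((n - N / 2) / 6.0) ** 2)    # Gaussian analysis window
h = np.exp(-0.5 * ((n - N / 2) / 10.0) ** 2)   # a different synthesis window
x_rec = istft_full(stft_full(x, g), g, h)
print(np.max(np.abs(x - x_rec)))               # reconstruction error, near machine precision
```

The synthesis window differs from the analysis window on purpose: any `h` with a nonzero inner product against `g` reconstructs the signal, which is the "many ways" of inverting the STFT.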
Time-frequency operator representation: spreading function representation
Theorem (Time-frequency operator representation)
1. Let $H \in \mathcal{H}$, the class of Hilbert-Schmidt operators on $L^2(\mathbb{R})$. Then there exists a function $\eta = \eta_H \in L^2(\mathbb{R}^2)$, called the spreading function, such that
$$H = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \eta(b,\nu)\, \pi(b,\nu)\, db\, d\nu , \qquad (3)$$
the integral being interpreted in the weak operator sense.
2. The relation $\eta \in L^2(\mathbb{R}^2) \leftrightarrow H \in \mathcal{H}$ extends to a Gelfand triple isomorphism
$$(S_0(\mathbb{R}^2), L^2(\mathbb{R}^2), S_0'(\mathbb{R}^2)) \leftrightarrow (\mathcal{B}, \mathcal{H}, \mathcal{B}') .$$
Here, $S_0$ denotes the space of functions whose STFT (with Gaussian window) is $L^1$, $S_0'$ its dual space, and $\mathcal{B}$ the space of operators that are bounded $S_0' \to S_0$.
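In finite dimension, representation (3) becomes an exact change of basis: every $N \times N$ matrix is a combination of the $N^2$ cyclic time-frequency shifts. The following sketch assumes cyclic shifts on $\mathbb{C}^N$ and the convention $(\pi(b,k)x)[t] = e^{2i\pi kt/N} x[t-b]$; the function names are hypothetical.

```python
import numpy as np

def spreading_function(H):
    """eta[b, k] such that H = sum_{b,k} eta[b, k] * pi(b, k) on C^N."""
    N = H.shape[0]
    t = np.arange(N)
    eta = np.empty((N, N), dtype=complex)
    for b in range(N):
        # Entries H[t, t - b] form the b-th cyclic diagonal; its DFT in t gives eta[b, :].
        eta[b] = np.fft.fft(H[t, (t - b) % N]) / N
    return eta

def operator_from_spreading(eta):
    """Rebuild the matrix H from its spreading function."""
    N = eta.shape[0]
    t = np.arange(N)
    H = np.zeros((N, N), dtype=complex)
    for b in range(N):
        # N * ifft(eta[b])[t] = sum_k eta[b, k] * exp(2i*pi*k*t/N)
        H[t, (t - b) % N] = np.fft.ifft(eta[b]) * N
    return H

rng = np.random.default_rng(0)
N = 16
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
eta = spreading_function(H)
print(np.allclose(operator_from_spreading(eta), H))
```

A pure time-frequency shift has a spreading function concentrated at a single point, which is one way to read $\eta$ as describing how much the operator moves energy in time and in frequency.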
Time-frequency operator representation: spreading function representation (2)
The spreading function representation is closely related to the twisted convolution on the time-frequency plane, defined by
$$(F \natural G)(b,\nu) = \int_{\mathbb{R}^2} F(b',\nu')\, G(b-b', \nu-\nu')\, e^{-2i\pi b'(\nu-\nu')}\, db'\, d\nu' . \qquad (4)$$
Theorem (Twisted convolution representation)
Assume that $g, h \in L^2(\mathbb{R})$ are such that $\langle g, h \rangle = 1$. Then $H$ may be realized as a left twisted convolution in the time-frequency domain: for all $x \in L^2(\mathbb{R})$,
$$V_g H x = \eta_H \natural V_g x . \qquad (5)$$
(known as Weyl's quantization in the theoretical physics literature)
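Identity (5) can be verified numerically in a small finite-dimensional setting. The sketch below is a brute-force check, assuming cyclic shifts on $\mathbb{C}^N$ and a fully sampled grid; all function names are hypothetical, and the direct evaluation of the discrete analogue of (4) costs on the order of $N^4$ operations, so $N$ is kept small.

```python
import numpy as np

def tf_shift(N, b, k):
    """Matrix of pi(b, k) = M_k T_b: (pi x)[t] = exp(2i*pi*k*t/N) * x[(t - b) % N]."""
    t = np.arange(N)
    P = np.zeros((N, N), dtype=complex)
    P[t, (t - b) % N] = np.exp(2j * np.pi * k * t / N)
    return P

def stft_full(x, g):
    """V[b, k] = sum_t x[t] * conj(g[(t - b) % N]) * exp(-2i*pi*k*t/N)."""
    return np.array([np.fft.fft(x * np.conj(np.roll(g, b))) for b in range(len(x))])

def twisted_convolution(F, G):
    """Discrete analogue of (4), evaluated directly (N^2 terms for each of N^2 outputs)."""
    N = F.shape[0]
    idx = np.arange(N)
    out = np.zeros((N, N), dtype=complex)
    for b in range(N):
        for k in range(N):
            # phase[b', k'] = exp(-2i*pi * b' * (k - k') / N)
            phase = np.exp(-2j * np.pi * np.outer(idx, k - idx) / N)
            # shifted[b', k'] = G[(b - b') % N, (k - k') % N]
            shifted = G[np.ix_((b - idx) % N, (k - idx) % N)]
            out[b, k] = np.sum(F * phase * shifted)
    return out

rng = np.random.default_rng(0)
N = 8
eta = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = sum(eta[b, k] * tf_shift(N, b, k) for b in range(N) for k in range(N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)

lhs = stft_full(H @ x, g)                      # V_g (H x)
rhs = twisted_convolution(eta, stft_full(x, g))  # eta twisted-convolved with V_g x
print(np.allclose(lhs, rhs))
```

The window `g` is drawn at random to stress that, in this discrete setting, the identity does not depend on any particular choice of analysis window.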
Time-frequency operator representation: spreading function representation, remarks
- An immediate consequence: the range of $V_g$ is invariant under left twisted convolution.
- Digital signals are finite dimensional, and a corresponding finite-dimensional version may be derived. Unfortunately, such an expression is of little practical interest:
  - Numerical evaluation of discrete twisted convolutions is extremely time consuming ($O(N^4)$ complexity).
  - Subsampling the discrete twisted convolution results in very poor approximations.
- A better setting for discretizing such expressions is provided by (discrete) Gabor transforms.