Analysis-and-manipulation approach to pitch and duration of musical instrument sounds without distorting timbral characteristics
Takehiro Abe†, Katsutoshi Itoyama†, Kazuyoshi Yoshii‡, Kazunori Komatani†, Tetsuya Ogata†, Hiroshi G. Okuno†
†Department of Intelligence Science and Technology, Kyoto University, Japan
‡National Institute of Advanced Industrial Science and Technology (AIST)
Demonstration: http://winnie.kuis.kyoto-u.ac.jp/~abe/DAFx-08/
Digital Audio Effects DAFx-2008

Motivation
• Active music listening: active and exploratory listening experience
– Selecting from users' requirement
– Changing music to suit users' feeling
• Conventional music listening: passive and limited listening
– Selecting from limited playlist
– Only listening after pressing "play"
• Instrument equalizers have been developed
– Compared equalizers: Drumix [Yoshii 07], Itoyama's EQ. [Itoyama 08], our equalizer
– What the user can change: instruments (drums only / all), volume, timbre
– Our equalizer: replacing an arbitrary part with the user's favorite timbre

Demonstration (Trial equalizer)
• Content: genre buttons, part buttons (synthesis), midi sound, synthesized piano sound, Jazz sound
• Difference from the sound excited by real instrument
• Equalizer's sounds are synthesized from real sounds except midi sounds

Requirements for our equalizer
1. Sound separation from polyphonic audio to extract a musical instrument sound that users want to replace (well studied)
2. Sound manipulation from separated sounds without timbral distortion to play arbitrary phrases (the application of separated sounds is not well studied: our research target)
Objective: Synthesizing monotones excited by the same instrument from multiple musical instrument sounds
Our definition of timbral features
• ASA's definition: the quality of a sound that distinguishes it from others of the same pitch and volume [ASA 60]
– Timbre has pitch dependency [Marozeau 03]
– Concrete definition based on [Grey 77]
• Our definition: the quality of a sound that consists of three features other than pitch and volume
1. The relative amplitudes of harmonic peaks
2. The inharmonic component
3. Temporal envelopes
• We use a pitch-dependent feature function for the pitch dependency
• We use the tonal model that can analyze these features [Itoyama 08]

Manipulation of pitch and duration
• It is not appropriate to manipulate pitch and duration without also changing the timbral features
• Pitch examples: Seed (440 Hz), Ref. (880 Hz), Phase vocoder (880 Hz, distorts high frequency), Our method (880 Hz)
• Duration examples: Seed, ref. (length 1), Sinusoidal model (length 4, distorts attack segment and vibrato feature), Our method (length 4)
• Attack, decay and vibrato features are similar within the same instrument
• We preserve attack and decay segments and the vibrato feature

Overview of our manipulation method
• Step 1: Analysis. Separate harmonic and inharmonic structures and extract timbral features
• Step 2: Manipulation. Manipulate pitch, duration, and energy
• Step 3: Synthesis. Synthesize harmonic and inharmonic signals and add them

Analysis to obtain three features
[Figure: the tonal model splits an amplitude surface over time and frequency into a harmonic structure and an inharmonic structure]
• Harmonic model
– Spectral structure is expressed as the Gaussian Mixture Model (Feature 1: power of harmonics)
– Temporal structure is expressed as the nonparametric model (Feature 3: envelope)
• Inharmonic model
– represents the spectrogram of the inharmonic structure (Feature 2: spectrogram of inharmonic component)
• Pitch and duration are obtained alongside the three features
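The Gaussian-mixture spectral structure of the harmonic model can be illustrated with a minimal sketch. It is not the tonal model of [Itoyama 08]: the function name `harmonic_spectrum`, the placement of each Gaussian at an integer multiple of the fundamental frequency, the shared width `sigma_hz`, and the example powers are all assumptions made for illustration.

```python
import math


def harmonic_spectrum(freq_hz, f0_hz, harmonic_powers, sigma_hz=10.0):
    """Evaluate a Gaussian-mixture spectral structure at one frequency.

    Assumption: the n-th Gaussian is centred on n * f0_hz, shares the
    width sigma_hz, and is weighted by the power of harmonics v_n.
    """
    norm = 1.0 / (sigma_hz * math.sqrt(2.0 * math.pi))
    total = 0.0
    for n, v_n in enumerate(harmonic_powers, start=1):
        centre_hz = n * f0_hz
        z = (freq_hz - centre_hz) / sigma_hz
        total += v_n * norm * math.exp(-0.5 * z * z)
    return total


# Hypothetical relative powers v_1..v_3 for a 440 Hz tone.
powers = [0.6, 0.3, 0.1]
peak_1 = harmonic_spectrum(440.0, 440.0, powers)
peak_2 = harmonic_spectrum(880.0, 440.0, powers)
between = harmonic_spectrum(660.0, 440.0, powers)
```

With narrow Gaussians the peak heights are proportional to the harmonic powers, so the relative amplitudes of harmonic peaks (Feature 1) can be read directly from the mixture weights.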
Pitch manipulation
• Manipulating the spectral envelope
– by multiplying the pitch trajectory ($\mu(r)$) by a desired ratio
– Obtain timbral features from pitch-dependent feature function
[Figure: power of harmonics $v_n \to v'_n$ over frequency; pitch trajectory $\mu(r) \to \mu'(r)$]
• Pitch-dependent feature function
– approximates timbral features over pitches by polynomial function
  • power of harmonics ($v_n$)
  • the ratio of harmonic energy to inharmonic energy ($w_H / w_I$)
[Figure: $v$ of the 1st and 4th harmonics and $w_H / w_I$, each plotted against fundamental frequency [Hz] (220, 440, 880)]

Duration manipulation
• Manipulating the temporal envelope ($E(r)$)
– by expanding or shrinking between onset ($r_{on}$) and offset ($r_{off}$)
– detection equation: $\frac{dE(r)}{dr} < \varepsilon$, $E(r) > Th$
[Figure: temporal envelope $E(r)$ over time with $r_{on}$ and $r_{off}$; labels: Detect, Preserve, Expand]
• Preserving the vibrato
– Pitch trajectory ($\mu(r)$) is analyzed and synthesized by sinusoidal model
[Figure: original and synthesized pitch trajectory [Hz] over time; labels: Analyze, Smoothing, Preserve, Synthesize]

Synthesis from harmonics and inharmonics
• Harmonic signal ($s_H(t)$)
– using sinusoidal model
• Inharmonic signal ($s_I(t)$)
– from inharmonic model weighted by inharmonic energy ($w'_I$)
• Output signal ($s(t)$)
– obtained by adding these two signals
※ A parameter marked with a prime (') is a manipulated parameter.
Equations for harmonic signal
Harmonic signal: $s_H(t) = \sum_n A_n(t) \exp[j \phi_n(t)]$
Instantaneous amplitude: $A_n(t) = w_H \, v'_n \, E_n(t)$
Instantaneous phase: $\phi_n(t) = \phi_n(0) + \int_0^t n \, \mu'(\tau) \, d\tau$
where $w_H$ is the harmonic energy, $v'_n$ the power of harmonics, $E_n(t)$ the temporal envelope, and $\mu'(\tau)$ the pitch trajectory.

Evaluation in pitch manipulation
• Baseline method = Sophisticated sinusoidal model
– Our method without pitch-dependent feature function
• Criteria
– Spectral distance: evaluation of harmonic component difference
– Mel-Frequency Cepstrum Coefficient (MFCC) distance:
  • quantitative auditory measurement
  • evaluation of harmonic and inharmonic components differences
$D = \sum_{f,r} \left( C_{real}(f, r) - C_{syn}(f, r) \right)^2 / T$
where $C$ is the spectrum or MFCC of the real sound ($C_{real}$) or the synthesized sound ($C_{syn}$), and $T$ is the number of frames.
• Conditions
– 32 instruments from RWC-MDB (forte, normal articulation)
  • 3 individuals for each instrument
– 10-fold cross validation (10%:90% = [evaluation data]:[learning data])
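The equations for the harmonic signal on the synthesis slide can be sketched in discrete time as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `synthesize_harmonic`, the sample rate, the single envelope shared by all harmonics, the zero initial phases, and the factor of 2π (which treats the pitch trajectory as a frequency in Hz) are all assumptions. The pitch ratio is applied by multiplying the trajectory, as described on the pitch manipulation slide.

```python
import cmath
import math


def synthesize_harmonic(pitch_hz, envelope, harmonic_powers,
                        w_h=1.0, ratio=1.0, sample_rate=8000):
    """Discrete-time sketch of s_H(t) = sum_n A_n(t) exp[j phi_n(t)].

    Assumptions: one envelope value per sample shared by all harmonics,
    phi_n(0) = 0, and pitch given in Hz (hence the factor 2*pi).
    """
    n_harmonics = len(harmonic_powers)
    phases = [0.0] * n_harmonics  # phi_n(0) = 0 (assumed)
    signal = []
    for mu, env in zip(pitch_hz, envelope):
        mu_prime = ratio * mu  # manipulated pitch trajectory mu'
        sample = 0j
        for n in range(1, n_harmonics + 1):
            amplitude = w_h * harmonic_powers[n - 1] * env  # A_n = w_H v'_n E
            sample += amplitude * cmath.exp(1j * phases[n - 1])
            # Rectangle-rule update of the integral of n * mu'(tau).
            phases[n - 1] += 2.0 * math.pi * n * mu_prime / sample_rate
        signal.append(sample)
    return signal


# Hypothetical input: a steady 1000 Hz tone with a single harmonic.
flat_pitch = [1000.0] * 8
flat_envelope = [1.0] * 8
original = synthesize_harmonic(flat_pitch, flat_envelope, [1.0])
octave_up = synthesize_harmonic(flat_pitch, flat_envelope, [1.0], ratio=2.0)
```

Doubling `ratio` makes the phase advance twice as fast per sample, which is the discrete counterpart of multiplying the pitch trajectory by a desired ratio. In the method described on the slides, the harmonic powers would also be re-derived from the pitch-dependent feature function; this sketch leaves them fixed.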