EE E6820: Speech & Audio Processing & Recognition
Lecture 6: Music analysis and synthesis
1 Music and nonspeech
2 Music synthesis techniques
3 Sinewave synthesis
4 Music analysis
5 Transcription
Dan Ellis <dpwe@ee.columbia.edu>
http://www.ee.columbia.edu/~dpwe/e6820/
E6820 SAPR - Dan Ellis L06 - Music A & S 2002-03-04

1 Music & nonspeech
• What is ‘nonspeech’?
- according to research effort: a little music
- in the world: most everything
[Figure: sounds placed by information content (low to high) against origin (natural to man-made): speech, music, animal sounds, machines & engines, contact/collision, wind & water]
(Margin note: attributes?)

Sound attributes
• Attributes suggest model parameters
• What do we notice about ‘general’ sound?
- psychophysics: pitch, loudness, ‘timbre’
- bright/dull; sharp/soft; grating/soothing
- sound is not ‘abstract’: tendency is to describe by source-events
• Ecological perspective
- what matters about sound is ‘what happened’
→ our percepts express this more-or-less directly

Aside: Sound textures
• What do we hear in:
- a city street
- a symphony orchestra
• How do we distinguish:
- waterfall
- rainfall
- applause
- static
[Figure: spectrograms of ‘Applause04’ and ‘Rain01’, freq / Hz (0-5000) vs. time / s (0-4)]
• Levels of ecological description...

Motivations for modeling
• Describe/classify
- cast sound into model because want to use the resulting parameters
• Store/transmit
- model implicitly exploits limited structure of signal
• Resynthesize/modify
- model separates out interesting parameters
[Figure: a sound mapped by a model into a parameter space]

Analysis and synthesis
• Analysis is the converse of synthesis:
[Figure: ‘Model / representation’ and ‘Sound’ linked by two arrows labeled Synthesis and Analysis]
• Can exist apart:
- analysis for classification
- synthesis of artificial sounds
• Often used together:
- encoding/decoding of compressed formats
- resynthesis based on analyses
- analysis-by-synthesis

Outline
1 Music and nonspeech
2 Music synthesis techniques
- Framework
- Historical development
3 Sinewave synthesis
4 Music analysis
5 Transcription
(Margin note: elements?)

2 Music synthesis techniques
• What is music?
- could be anything → flexible synthesis needed!
• Key elements of conventional music
- instruments → note-events (time, pitch, accent level) → melody, harmony, rhythm
- patterns of repetition & variation
• Synthesis framework:
- instruments: common framework for many notes
- score: sequence of (time, pitch, level) note events

The nature of musical instrument notes
• Characterized by instrument (register), note, loudness (emphasis), articulation...
[Figure: four spectrogram panels, Frequency (0-4000) vs. Time (0-4): Piano, Violin, Clarinet, Trumpet]
(Margin note: distinguish how?)

Development of music synthesis
• Goals of music synthesis:
- generate realistic / pleasant new notes
- control / explore timbre (quality)
• Earliest computer systems in 1960s (voice synthesis, algorithmic)
• Pure synthesis approaches:
- 1970s: Analog synths
- 1980s: FM (Stanford/Yamaha)
- 1990s: Physical modeling, hybrids
• Analysis-synthesis methods:
- sampling / wavetables
- sinusoid modeling
- harmonics + noise (+ transients)
(Margin note: others?)

Analog synthesis
• The minimum to make an ‘interesting’ sound
[Figure: block diagram. Trigger starts an Envelope; Pitch plus Vibrato set the Oscillator frequency; the Oscillator feeds a Filter whose Cutoff freq is also modulated; a Gain stage produces the output Sound]
• Elements:
- harmonics-rich oscillators
- time-varying filters
- time-varying envelope
- modulation: low frequency + envelope-based
• Result:
- time-varying spectrum, independent pitch

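The chain of elements on this slide can be sketched digitally as follows. This is only an illustration under stated assumptions: the function name `analog_note`, every parameter value, the exponential envelope, the naive (aliasing) sawtooth and the one-pole lowpass are all choices made here, not taken from the lecture.

```python
import math

def analog_note(f0=220.0, dur=0.5, sr=16000,
                vib_rate=5.0, vib_depth=0.01,
                cutoff_lo=300.0, cutoff_hi=3000.0, decay=0.15):
    """Sawtooth oscillator -> one-pole lowpass -> gain, all time-varying.

    All names and default values are hypothetical, for illustration only.
    """
    n_samples = int(dur * sr)
    phase = 0.0   # oscillator phase in cycles, kept in [0, 1)
    lp = 0.0      # lowpass filter state
    out = []
    for i in range(n_samples):
        t = i / sr
        # Envelope started by the trigger at t = 0 (simple exponential decay)
        env = math.exp(-t / decay)
        # Low-frequency modulation (vibrato) of the oscillator pitch
        f = f0 * (1.0 + vib_depth * math.sin(2 * math.pi * vib_rate * t))
        phase = (phase + f / sr) % 1.0
        saw = 2.0 * phase - 1.0   # harmonics-rich, but not band-limited
        # Envelope-based modulation of the cutoff: bright onset, duller tail
        fc = cutoff_lo + (cutoff_hi - cutoff_lo) * env
        a = 1.0 - math.exp(-2 * math.pi * fc / sr)   # one-pole coefficient
        lp += a * (saw - lp)
        out.append(env * lp)      # the same envelope also sets the gain
    return out

note = analog_note()
```

The pitch is set by `f0` alone, while the spectrum changes over time through the moving cutoff, which is the "time-varying spectrum, independent pitch" result listed above.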
FM synthesis
• Fast frequency modulation → sidebands:
$$\cos\big(\omega_c t + \beta \sin(\omega_m t)\big) = \sum_{n=-\infty}^{\infty} J_n(\beta)\, \cos\big((\omega_c + n\,\omega_m)\, t\big)$$
- a harmonic series if $\omega_c = r \cdot \omega_m$
• $J_n(\beta)$ is a Bessel function: $J_n(\beta) \approx 0$ for $\beta < n - 2$
[Figure: $J_0$ to $J_4$ plotted against modulation index $\beta$ from 0 to 9]
→ Complex harmonic spectra by varying $\beta$
[Figure: spectrogram, freq / Hz (0-4000) vs. time / s (0-0.8), with $\omega_c$ = 2000 Hz and $\omega_m$ = 200 Hz]
(Margin note: what use?)

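The sideband expansion can be checked numerically. The sketch below is an illustration, not part of the lecture: `bessel_j`, `fm_direct` and `fm_sidebands` are hypothetical helper names, and $J_n(\beta)$ is computed from the standard integral form $\frac{1}{\pi}\int_0^\pi \cos(n\tau - \beta\sin\tau)\,d\tau$ with a trapezoid rule so that only the standard library is needed. The carrier and modulator frequencies follow the slide's 2000 Hz and 200 Hz example; the value $\beta = 3$ is an assumption.

```python
import math

def bessel_j(n, beta, steps=400):
    """J_n(beta) for integer n via (1/pi) * int_0^pi cos(n*tau - beta*sin(tau)) dtau."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps + 1):
        tau = i * h
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        total += w * math.cos(n * tau - beta * math.sin(tau))
    return total * h / math.pi

def fm_direct(t, fc, fm, beta):
    """Left-hand side: carrier at fc Hz, phase-modulated at fm Hz."""
    return math.cos(2 * math.pi * fc * t + beta * math.sin(2 * math.pi * fm * t))

def fm_sidebands(t, fc, fm, beta, n_max=20):
    """Right-hand side, truncated to sidebands n = -n_max .. n_max."""
    return sum(bessel_j(n, beta) * math.cos(2 * math.pi * (fc + n * fm) * t)
               for n in range(-n_max, n_max + 1))

fc, fm, beta = 2000.0, 200.0, 3.0
for t in (0.0, 0.00013, 0.0007, 0.0031):
    print(t, fm_direct(t, fc, fm, beta), fm_sidebands(t, fc, fm, beta))
```

Because 2000 Hz is an integer multiple of 200 Hz, every sideband lands on a multiple of 200 Hz, so the result is a harmonic series whose amplitudes are reshaped by changing `beta`.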
Sampling synthesis
• Resynthesis from real notes → vary pitch, duration, level
[Figure: waveform of a recorded note, amplitude (-0.2 to 0.2) vs. time]
• Pitch: stretch (resample) waveform
[Figure: the same waveform over 0-0.008 s at 596 Hz and resampled to 894 Hz]
• Duration: loop a ‘sustain’ section
[Figure: waveforms over 0-0.3 s with loop points marked near 0.174-0.176 s and 0.204-0.206 s]
• Level: cross-fade different examples
[Figure: ‘Soft’ and ‘Loud’ waveforms over 0-0.15 s combined in a ‘mix’ stage with a control labeled ‘veloc’]
- need to ‘line up’ source samples
(Margin note: good & bad?)

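The three manipulations on this slide can be sketched as below. This is a minimal illustration under assumptions: the function names (`resample`, `loop_sustain`, `velocity_mix`) are hypothetical, linear interpolation stands in for a proper band-limited resampler, and the loop points are assumed to be chosen a whole number of periods apart.

```python
import math

def resample(x, ratio):
    """Read x at `ratio` times the original speed (linear interpolation).

    ratio > 1 raises the pitch and shortens the note.
    """
    n_out = int((len(x) - 1) / ratio) + 1
    y = []
    for i in range(n_out):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        nxt = x[min(j + 1, len(x) - 1)]
        y.append((1 - frac) * x[j] + frac * nxt)
    return y

def loop_sustain(x, start, end, n_out):
    """Play x up to `end`, then repeat x[start:end] until n_out samples exist."""
    y = list(x[:end])
    while len(y) < n_out:
        y.extend(x[start:end])
    return y[:n_out]

def velocity_mix(soft, loud, v):
    """Cross-fade two time-aligned recordings; v = 0 is all soft, v = 1 all loud."""
    return [(1 - v) * s + v * l for s, l in zip(soft, loud)]

sr = 16000
note = [math.sin(2 * math.pi * 596 * i / sr) for i in range(1600)]
higher = resample(note, 1.5)                   # 596 Hz -> 894 Hz, as in the figure
longer = loop_sustain(note, 400, 1200, 4000)   # arbitrary loop points, may click
```

The cross-fade only works if the two recordings are lined up, since misaligned waveforms partly cancel in the mix. Resampling also shifts the whole spectral envelope and shortens the note, which is one reason sampled instruments store recordings at several pitches.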
Outline
1 Music and nonspeech
2 Music synthesis techniques
3 Sinewave synthesis (detail)
- Sinewave modeling
- Sines + residual ...
4 Music analysis
5 Transcription

3 Sinewave synthesis
• If patterns of harmonics are what matter, why not generate them all explicitly:
$$s[n] = \sum_k A_k[n] \cos\big(k \cdot \omega_0[n] \cdot n\big)$$
- particularly powerful model for pitched signals
• Analysis (as with speech):
- find peaks in STFT $|S[\omega, n]|$ & track
- or track fundamental $\omega_0$ (harmonics / autoco) & sample STFT at $k \cdot \omega_0$
→ set of $A_k[n]$ to duplicate tone:
[Figure: spectrogram, freq / Hz (0-8000) vs. time / s (0-0.2), beside a surface plot of harmonic magnitude against freq / Hz and time / s]
• Synthesis via bank of oscillators

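A bank of harmonic oscillators can be sketched as below. This is an illustration under assumptions: `oscillator_bank` is a hypothetical name, the fundamental is given in Hz per sample, and the phase is accumulated sample by sample, which matches $k \cdot \omega_0 \cdot n$ exactly when $\omega_0$ is constant and avoids phase jumps when it varies.

```python
import math

def oscillator_bank(amps, f0, sr):
    """Sum of harmonics with time-varying amplitudes.

    amps[k-1][n] is A_k[n]; f0[n] is the fundamental in Hz at sample n.
    """
    n_samples = len(f0)
    out = [0.0] * n_samples
    phase = 0.0   # running phase of the fundamental, in radians
    for n in range(n_samples):
        for k, a_k in enumerate(amps, start=1):
            if k * f0[n] < sr / 2:          # leave out harmonics above Nyquist
                out[n] += a_k[n] * math.cos(k * phase)
        phase += 2 * math.pi * f0[n] / sr   # advance by omega_0[n]
    return out

sr = 8000
n = 400
amps = [[1.0] * n, [0.5] * n, [0.25] * n]   # three harmonics, constant levels
tone = oscillator_bank(amps, [200.0] * n, sr)
```

In a full analysis-synthesis system the `amps` and `f0` arrays would come from the STFT analysis described on this slide, interpolated from the frame rate up to the sample rate.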
Steps to sinewave modeling - 1
• The underlying STFT:
$$X[k, n_0] = \sum_{n=0}^{N-1} x[n + n_0] \cdot w[n] \cdot \exp\left(-j \frac{2\pi k n}{N}\right)$$
- What value for $N$ (FFT length & window size)?
- What value for $H$ (hop size: $n_0 = r \cdot H$, $r = 0, 1, 2, \ldots$)?
• STFT window length determines freq. resol’n:
$$X_w(e^{j\omega}) = X(e^{j\omega}) * W(e^{j\omega})$$
• Choose $N$ long enough to resolve harmonics
→ 2-3x longest (lowest) fundamental period
- e.g. 30-60 ms = 480-960 samples @ 16 kHz
- choose $H \le N/2$
• $N$ too long → lost time resolution
- limits sinusoid amplitude rate of change

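The STFT definition and the window-length rule can be sketched as below. This is an illustration under assumptions: the names `window_length`, `hann` and `stft` are hypothetical, a Hann window is chosen for $w[n]$ (the slide does not name one), and a direct DFT is used instead of an FFT to stay within the standard library, so it is only practical for short windows.

```python
import cmath
import math

def window_length(sr, f0_min, periods=3):
    """N covering `periods` cycles of the lowest expected fundamental."""
    return round(periods * sr / f0_min)

def hann(N):
    """Periodic Hann window (an assumed choice of w[n])."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def stft(x, N, H):
    """X[k, n0] for n0 = 0, H, 2H, ... and k = 0 .. N/2, by direct DFT."""
    w = hann(N)
    frames = []
    for n0 in range(0, len(x) - N + 1, H):
        seg = [x[n + n0] * w[n] for n in range(N)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N // 2 + 1)
        ])
    return frames

print(window_length(16000, 100.0))   # 3 periods of 100 Hz at 16 kHz

N, H = 64, 32                        # H <= N/2, as recommended above
x = [math.cos(2 * math.pi * 8 * n / N) for n in range(256)]
X = stft(x, N, H)
```

With a lowest fundamental of 100 Hz, three periods at 16 kHz give 480 samples (30 ms), the lower end of the range quoted on the slide.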
Steps to sinewave modeling - 2
• Choose candidate sinusoids at each time by picking peaks in each STFT frame:
[Figure: spectrogram, freq / Hz (0-8000) vs. time / s (0-0.18), above one frame's spectrum, level / dB (-60 to 20) vs. freq / Hz (0-7000)]
• Quadratic fit for peak, lin. interp. for phase:
[Figure: level / dB and phase / rad vs. freq / Hz (400-800) around one peak; the parabola $y = ax(x-b)$ is annotated with $b/2$ and $ab^2/4$]
+ linear interp. of unwrapped phase

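Peak picking with a quadratic fit can be sketched as below. This is an illustration under assumptions: the names `pick_peaks` and `interp_linear` are hypothetical, and the parabola is fitted through each local maximum and its two neighbours using the common three-point formula, which is a different parametrization from the $y = ax(x-b)$ form in the figure.

```python
def pick_peaks(level_db):
    """Return (fractional_bin, level_db) for each local maximum.

    A parabola through the maximum and its two neighbours refines both values.
    """
    peaks = []
    for k in range(1, len(level_db) - 1):
        a, b, c = level_db[k - 1], level_db[k], level_db[k + 1]
        if b > a and b >= c:                       # local maximum at bin k
            p = 0.5 * (a - c) / (a - 2 * b + c)    # vertex offset in bins
            peaks.append((k + p, b - 0.25 * (a - c) * p))
    return peaks

def interp_linear(values, pos):
    """Linear interpolation at fractional index pos (e.g. of unwrapped phase)."""
    j = int(pos)
    if j >= len(values) - 1:
        return values[-1]
    frac = pos - j
    return (1 - frac) * values[j] + frac * values[j + 1]

# Samples of a parabola whose true peak lies between bins, at 3.3
level = [-2.0 * (x - 3.3) ** 2 + 10.0 for x in range(7)]
print(pick_peaks(level))
```

The fractional bin converts to Hz by multiplying by `sr / N`, and the same fractional position can be passed to `interp_linear` to read off the unwrapped phase at the peak.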
Steps to sinewave modeling - 3
• Which peaks to pick? Want ‘true’ sinusoids, not noise fluctuations
- ‘prominence’ threshold above smoothed spec.
[Figure: one frame's spectrum, level / dB (-60 to 20) vs. freq / Hz (0-7000)]
• Sinusoids exhibit stability...
- of amplitude in time
- of phase derivative in time
→ compare with adjacent time frames to test?

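One way to read the ‘prominence’ test is sketched below. This is an assumption-laden illustration: the smoothed spectrum is taken to be a simple moving average of the dB spectrum, and the names `smooth` and `prominent_peaks`, the window half-width and the 10 dB threshold are all hypothetical choices rather than values from the lecture.

```python
def smooth(level_db, half_width):
    """Moving average over 2*half_width + 1 bins, shortened at the edges."""
    out = []
    for k in range(len(level_db)):
        lo = max(0, k - half_width)
        hi = min(len(level_db), k + half_width + 1)
        out.append(sum(level_db[lo:hi]) / (hi - lo))
    return out

def prominent_peaks(level_db, half_width=5, threshold_db=10.0):
    """Bins that are local maxima and exceed the smoothed spectrum by threshold_db."""
    base = smooth(level_db, half_width)
    return [k for k in range(1, len(level_db) - 1)
            if level_db[k] > level_db[k - 1]
            and level_db[k] >= level_db[k + 1]
            and level_db[k] - base[k] > threshold_db]

# Rippled noise floor around -40 dB with one strong component at bin 20
spectrum = [-40.0 if k % 2 == 0 else -39.0 for k in range(41)]
spectrum[20] = 0.0
print(prominent_peaks(spectrum))
```

The small ripples are local maxima but sit well under the threshold, so only the strong component survives. The stability tests listed above would then be applied across neighbouring frames.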
Steps to sinewave modeling - 4
• ‘Grow’ tracks by appending newly-found peaks to existing tracks:
[Figure: freq vs. time sketch showing existing tracks, new peaks, a track ‘birth’ and a track ‘death’]
- ambiguous assignments possible
• Unclaimed new peak
- ‘birth’ of new track
- backtrack to find earliest trace?
• No continuation peak for existing track
- ‘death’ of track
- or: reduce peak threshold for hysteresis

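Track growing can be sketched with a greedy nearest-frequency match, as below. This is an illustration under assumptions: the name `grow_tracks`, the dictionary layout of a track and the `max_jump_hz` limit are hypothetical, and the greedy order resolves ambiguous assignments arbitrarily, which a more careful matcher would avoid. Backtracking and threshold hysteresis are not implemented.

```python
def grow_tracks(frames, max_jump_hz=50.0):
    """Link per-frame peak frequencies (Hz) into tracks.

    frames: one list of peak frequencies per STFT frame.
    Returns tracks as dicts with 'start' (frame index) and 'freqs'.
    """
    active, finished = [], []
    for t, peaks in enumerate(frames):
        unclaimed = list(peaks)
        still_active = []
        for track in active:
            last = track["freqs"][-1]
            best = min(unclaimed, key=lambda f: abs(f - last), default=None)
            if best is not None and abs(best - last) <= max_jump_hz:
                track["freqs"].append(best)      # continue existing track
                unclaimed.remove(best)
                still_active.append(track)
            else:
                finished.append(track)           # 'death': no continuation
        for f in unclaimed:
            still_active.append({"start": t, "freqs": [f]})   # 'birth'
        active = still_active
    return finished + active

frames = [[440.0, 880.0], [442.0, 878.0], [445.0], [447.0, 1200.0]]
for track in grow_tracks(frames):
    print(track)
```

In this example the 880 Hz track dies after two frames, the 440 Hz track continues through all four, and the unclaimed 1200 Hz peak in the last frame starts a new track.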