PCM to MIDI Transposition
May 11th 2002
Luís Gustavo P. M. Martins – lmartins@inescporto.pt
Aníbal J. S. Ferreira – ajf@inescporto.pt
Apartado 4433, 4007-001 Porto Codex – www.inescporto.pt – tel (351) 22 209 4000 – fax (351) 22 208 4172
Summary
• Characterization of the Problem
• Applications
• Objectives of this Work
• Selected Approach and System Design
• Frequency Analysis Framework
• Harmonic Analysis
• Tracking of Harmonic Structures
• Post-Processing
• Results
• Conclusion
2002.05.11 112th AES Convention – Munich - May 2002
Characterization of the Problem
• Transcription of Music:
  – The act of listening to a piece of music and writing down musical notation for the notes that constitute the piece.
• Implies:
  – Extraction of specific features from a musical acoustic signal, such as:
    • Pitch
    • Timings
    • Dynamics
    • Instruments played
Characterization of the Problem
• Monophonic pitch detection
  – Can rely on many well-understood algorithms, such as:
    • Time-domain techniques (zero-crossing, autocorrelation)
    • Frequency-domain techniques (DFTs and Cepstrum)
• Polyphonic pitch detection
  – Increased complexity of the signals.
  – Monophonic pitch detection techniques are not well suited to multi-pitch estimation.
  – Most solutions had to be developed from scratch.
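As an illustration of the time-domain monophonic techniques listed above, the following is a minimal autocorrelation sketch. The function name, the search range (`fmin`, `fmax`) and the test tone are assumptions made for this example; it is not the method used in this work, which targets polyphonic signals.

```python
import math

def autocorrelation_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate one fundamental frequency (Hz) from a monophonic frame.

    Picks the lag with the largest (unnormalized) autocorrelation inside
    the lag range implied by the assumed fmin..fmax search band.
    """
    n = len(samples)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # No positive correlation found: report 0.0 as "no pitch".
    return sample_rate / best_lag if best_lag else 0.0

# Hypothetical test signal: a 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * i / sample_rate) for i in range(1024)]
estimate = autocorrelation_pitch(tone, sample_rate)
```

With two or more simultaneous notes the autocorrelation peaks of the individual notes interfere, which is one reason such techniques do not carry over directly to multi-pitch estimation.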
Applications
• The area of Music Recognition is only now starting to attract attention for its commercial potential
• Numerous applications are starting to appear, but they are still limited by the low reliability of the results delivered by current solutions:
  – Music Transcription Systems
  – Access to Musical Databases
  – Structured Audio Encoding
  – Synthetic Performance Systems
  – Algorithmic Composition
  – Visual Music Displays
  – Automatic Teaching Systems
Objectives of this Work
• Development and implementation of an automatic polyphonic music transcription system.
  [Diagram: pitch, timings, dynamics → notes → MIDI]
Characterization of Musical Signals
• Features extracted from the musical audio signal and their perceptual correlates:
  – Fundamental frequency → Pitch
  – Power → Loudness
  – Notes' onset/offset times and duration → Rhythm
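The mapping from fundamental frequency to pitch can be made concrete with the standard equal-temperament conversion to MIDI note numbers. This is a sketch only: the A4 = 440 Hz reference and the function names are assumptions, not details taken from the slides.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_midi(frequency_hz, a4_hz=440.0):
    """Nearest MIDI note number for a frequency (A4 = note 69, assumed 440 Hz)."""
    return round(69 + 12 * math.log2(frequency_hz / a4_hz))

def midi_to_name(note):
    """Note name with octave, using the convention that MIDI note 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"

a4 = frequency_to_midi(440.0)          # 69
middle_c = frequency_to_midi(261.63)   # 60
```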
Selected Approach and System Design
• Simulation and Development environment
Selected Approach and System Design
• System Overview
  [Block diagram: PCM music file → on-line processing (frequency analysis → harmonic analysis → harmonic structure tracking) → post-processing (transient detector, trajectory on-set time adjust, trajectory clustering & pruning) → MIDI out]
Frequency Analysis Framework
• Objectives:
  – Deliver a convenient spectral representation of the audio signal
  – Derive important information such as spectral power distribution and tonality behaviour
  – Provide a suitable front-end for the harmonic analysis of music signals
• Based around a 50% overlap analysis scheme
• Uses an N-point sine window and ODFT
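The analysis scheme above can be sketched as follows, assuming the usual definitions of the sine window, h(n) = sin(π(n + 1/2)/N), and of the Odd-frequency DFT, X(k) = Σ h(n) x(n) exp(−j2π(k + 1/2)n/N). The slides do not state these formulas, so they, the function names and the direct O(N²) evaluation are assumptions chosen for readability rather than speed.

```python
import cmath
import math

def sine_window(n_points):
    """N-point sine window h(n) = sin(pi * (n + 1/2) / N)."""
    return [math.sin(math.pi * (n + 0.5) / n_points) for n in range(n_points)]

def frames_with_half_overlap(samples, n_points):
    """Split a signal into N-point frames that overlap by 50% (hop = N/2)."""
    hop = n_points // 2
    return [samples[s:s + n_points]
            for s in range(0, len(samples) - n_points + 1, hop)]

def odft(frame):
    """Windowed ODFT of one frame; bin k is centred at (k + 1/2) * fs / N.

    Only the first N/2 bins are returned, which is enough for a real signal.
    """
    n_points = len(frame)
    window = sine_window(n_points)
    return [
        sum(window[n] * frame[n]
            * cmath.exp(-2j * math.pi * (k + 0.5) * n / n_points)
            for n in range(n_points))
        for k in range(n_points // 2)
    ]

def power_spectrum(frame):
    """|ODFT|^2 for one frame."""
    return [abs(value) ** 2 for value in odft(frame)]

# Hypothetical input: a tone sitting exactly on the centre of ODFT bin 5.
n_points = 64
tone = [math.cos(2 * math.pi * 5.5 * n / n_points) for n in range(256)]
frames = frames_with_half_overlap(tone, n_points)
spectrum = power_spectrum(frames[0])
strongest_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
```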
Harmonic Analysis
• Objective:
  – Accurately extract, from a spectral representation of a musical signal, parametric information that could lead to an easier and more robust way of detecting the presence of musical notes.
  [Block diagram: frequency analysis → harmonic analysis → harmonic structure tracking; inside the harmonic analysis: |ODFT|² → spectral peak detector → spectral interpolation → harmonic structure detector → harmonic structure tracking]
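A rough sketch of the first two stages of this chain, peak detection on the power spectrum followed by interpolation of the peak position, is given below. The slides do not specify the interpolation rule; the three-point parabolic fit used here is a common generic technique swapped in for illustration, and all names are hypothetical.

```python
def find_peaks(power, threshold=0.0):
    """Indices of local maxima of a power spectrum above a threshold."""
    return [k for k in range(1, len(power) - 1)
            if power[k] > threshold
            and power[k] > power[k - 1]
            and power[k] >= power[k + 1]]

def parabolic_offset(left, centre, right):
    """Fractional-bin offset of the vertex of a parabola through three points.

    Positive values mean the true peak lies towards the right-hand bin.
    """
    denominator = left - 2.0 * centre + right
    if denominator == 0:
        return 0.0
    return 0.5 * (left - right) / denominator

# Hypothetical power values for a short spectrum.
power = [0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 5.0, 3.0, 0.0]
peaks = find_peaks(power)
refined = [k + parabolic_offset(power[k - 1], power[k], power[k + 1])
           for k in peaks]
```

In practice the fit is often applied to log-power values; the linear values above keep the arithmetic easy to follow.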
Harmonic Analysis
• Strengths:
  – Flexible enough to identify several pitches, suiting the problem of polyphonic music transcription
  – The system does not assume any previous knowledge of spectral models of the instruments playing → allows a more generic detection
  – The pitch of the harmonic structure is based on the frequencies of all its partials, so it becomes more precise as more partials are detected
  – More robust tracking (in the subsequent harmonic structure tracking block)
  – There are fewer harmonic structures to track than peaks → lower computational load and system delay
• Algorithm shortcomings:
  – Ambiguous detection of harmonic structures whose fundamental frequencies are related by integer ratios
  – Difficulty in distinguishing simultaneous notes separated by octave intervals
  – Does not admit harmonic structures with a missing fundamental
  – Only admits one recovery from a missing-harmonic situation
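The point about basing the pitch on all partials can be illustrated with a small sketch. The slides do not say how the partial frequencies are combined; the plain average of f_h / h used below is an assumption, as are the function name and the example values.

```python
def estimate_fundamental(partials):
    """Estimate f0 (Hz) from (harmonic_number, frequency_hz) pairs.

    Each partial votes for f0 = frequency / harmonic_number; the votes are
    averaged, so independent measurement errors tend to cancel as more
    partials are detected.
    """
    return sum(frequency / harmonic for harmonic, frequency in partials) / len(partials)

# Hypothetical, slightly noisy partials of a note near 220 Hz.
partials = [(1, 220.5), (2, 440.2), (3, 659.4)]
f0 = estimate_fundamental(partials)
```

The same arithmetic also shows the octave ambiguity listed among the shortcomings: the even-numbered partials of a 220 Hz note are exactly the partials of a 440 Hz note.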
Harmonic Analysis
• Output of the Harmonic Analysis block
  [Figure: three plots of PSD (logarithmic scale) versus ω]
Harmonic Analysis
• Successive frame results of the Harmonic Analysis block aligned as a discrete time-frequency representation:
  [Figure: three plots of f0 versus time]
Tracking of Harmonic Structures
• Objectives:
  – Study the time evolution of the harmonic structures detected in the previous frames, and…
  – … define trajectories:
    • Entities that already share many of the properties of a note:
      – Start / stop times → duration
      – Fundamental frequency
      – Intensity
• Frequency Continuation Algorithm
  – Objective: organize the detected harmonic structures into time-oriented trajectories (i.e. detect the "lines" in the previously presented time-frequency representation) using a causal scheme
Tracking of Harmonic Structures
• Frequency Continuation Algorithm
  – Trajectory structure:
    • START FRAME
    • STOP FRAME
    • DUR = STOPFRAME - STARTFRAME + 1
    • F0 = FUNDAMENTAL FREQUENCY VECTOR: [F0(1), F0(2), ..., F0(DUR)]
    • P = POWER VECTOR: [P(1), P(2), ..., P(DUR-INTERPOLS)]
    • INTERP vector: [INTERP(1), ..., INTERP(INTERPOLS)], where INTERPOLS = NR. OF INTERPOLATED GAPS
  – Trajectory list structure: TRAJECTORY LIST (1), (2), ..., each entry holding START FRAME, STOP FRAME, [FREQUENCY], [POWER], [INTERPOLATIONS]
    • Candidate trajectory list
    • Validated trajectory list
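The trajectory record described on this slide could be represented roughly as follows. The class, field and method names are hypothetical; only the fields themselves and the length relations (DUR values of F0, DUR - INTERPOLS values of P) come from the slide.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Trajectory:
    start_frame: int
    stop_frame: int
    f0: List[float] = field(default_factory=list)            # one value per frame
    power: List[float] = field(default_factory=list)         # measured frames only
    interpolated: List[int] = field(default_factory=list)    # frames filled by interpolation

    @property
    def duration(self) -> int:
        """DUR = STOPFRAME - STARTFRAME + 1."""
        return self.stop_frame - self.start_frame + 1

    def is_consistent(self) -> bool:
        """Check the vector lengths stated on the slide."""
        return (len(self.f0) == self.duration
                and len(self.power) == self.duration - len(self.interpolated))

# The two lists kept by the tracker (names assumed).
candidate_trajectories: List[Trajectory] = []
validated_trajectories: List[Trajectory] = []

# Hypothetical trajectory spanning frames 10..14 with frame 12 interpolated.
example = Trajectory(10, 14,
                     f0=[440.0, 440.5, 440.7, 441.0, 440.8],
                     power=[0.9, 0.8, 0.7, 0.6],
                     interpolated=[12])
```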
Tracking of Harmonic Structures
• Frequency Continuation Algorithm
  – Parameters:
    • Minimum-note-duration:
      – Controls the minimum duration admitted for a trajectory (i.e. a musical note)
    • Minimum-pause-duration:
      – Defines the minimum duration of a musical pause
      – Specifies the minimum number of frames that separate two trajectories with close fundamental frequencies
    • Maximum-frequency-deviation:
      – Controls the maximum allowable deviation between the fundamental frequency of a harmonic structure and the frequency of a trajectory (default value = 1/2 semitone, considering an equal temperament scale)
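In an equal temperament scale one semitone corresponds to a frequency ratio of 2^(1/12), so the default maximum-frequency-deviation of 1/2 semitone is a ratio of 2^(1/24), about 2.9%. A minimal sketch of such a test follows; the function names are assumptions.

```python
import math

def deviation_in_semitones(frequency_hz, reference_hz):
    """Signed distance between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(frequency_hz / reference_hz)

def within_deviation(frequency_hz, reference_hz, max_semitones=0.5):
    """True if the frequency may continue a trajectory at reference_hz."""
    return abs(deviation_in_semitones(frequency_hz, reference_hz)) <= max_semitones

half_semitone_ratio = 2.0 ** (1.0 / 24.0)   # roughly 1.0293
accepted = within_deviation(452.0, 440.0)    # about 0.47 semitones away
rejected = within_deviation(454.0, 440.0)    # about 0.54 semitones away
```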
Tracking of Harmonic Structures
• Frequency Continuation Algorithm
  – Minimum-note-duration = 3 frames
  – Minimum-pause-duration = 2 frames
  [Figure: frequency versus time (frames); legend: validated trajectory, candidate trajectory, interpolated trajectory, harmonic structure (X), maximum frequency deviation]
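One way a causal frequency continuation scheme with these parameter values might work is sketched below. The matching rule (nearest detection within the deviation limit), the linear interpolation of gaps, the dictionary layout and the omission of the power vector are all assumptions made for this example, not details taken from the slides.

```python
import math

def track(frames, min_note=3, min_pause=2, max_deviation=0.5):
    """Group per-frame f0 detections (Hz) into validated trajectories.

    frames: one list of detected fundamental frequencies per analysis frame.
    A trajectory survives gaps shorter than min_pause frames (the gap is
    interpolated) and is validated only if it lasts at least min_note frames.
    """
    active, validated = [], []

    def close(trajectory):
        if trajectory["stop"] - trajectory["start"] + 1 >= min_note:
            validated.append(trajectory)

    for index, detections in enumerate(frames):
        unused = list(detections)
        still_active = []
        for trajectory in active:
            last = trajectory["f0"][-1]
            match = min(unused, key=lambda f: abs(math.log2(f / last)), default=None)
            if match is not None and abs(12 * math.log2(match / last)) <= max_deviation:
                unused.remove(match)
                gap = index - trajectory["stop"] - 1
                for step in range(1, gap + 1):  # linear interpolation over the gap
                    trajectory["f0"].append(last + (match - last) * step / (gap + 1))
                    trajectory["interpolated"].append(trajectory["stop"] + step)
                trajectory["f0"].append(match)
                trajectory["stop"] = index
                still_active.append(trajectory)
            elif index - trajectory["stop"] >= min_pause:
                close(trajectory)               # pause long enough: note has ended
            else:
                still_active.append(trajectory)  # short gap: wait one more frame
        for frequency in unused:                 # unmatched detections start candidates
            still_active.append({"start": index, "stop": index,
                                 "f0": [frequency], "interpolated": []})
        active = still_active

    for trajectory in active:
        close(trajectory)
    return validated

# Hypothetical detections: a note with a one-frame gap, then a one-frame blip.
detections = [[440.0], [441.0], [], [442.0], [], [], [880.0], []]
notes = track(detections)
```

In this example the first four frames form one validated trajectory with frame 2 interpolated, while the isolated 880 Hz detection is discarded for being shorter than the minimum note duration.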
Tracking of Harmonic Structures
• Frequency Continuation Algorithm
  [Figure: three plots of f0 versus time (frames)]
Post-Processing
• Objectives:
  – Fine-tune all the trajectories returned by the on-line processing blocks
  – Identify the best trajectories to represent the true musical notes played
• Processing blocks:
  – Time-domain transient detector
  – Trajectory on-set time adjust block
  – Trajectory clustering and pruning block
Post-Processing
• Transient detector:
  – Objectives:
    • Detect non-stationarities in the time-domain representation of musical signals
    • Determine the most probable spots for the on-set time of musical notes
  [Figure: three plots of a detection function (0 to 1) versus time (frames), showing the transient threshold and a release of 2 frames]
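A threshold-plus-release detector of the kind suggested by the figure could be sketched as follows. The slides do not define the detection function; the normalized frame-to-frame energy rise used here, the threshold value of 0.2 and all names are assumptions. Only the release of 2 frames is taken from the figure.

```python
def frame_energies(samples, frame_length):
    """Energy of consecutive non-overlapping time-domain frames."""
    return [sum(s * s for s in samples[i:i + frame_length])
            for i in range(0, len(samples) - frame_length + 1, frame_length)]

def detect_transients(energies, threshold=0.2, release=2):
    """Frames where the normalized energy rise exceeds the threshold.

    After each detection the detector is held off for `release` frames so
    that one attack is not reported several times.
    """
    rises = [0.0] + [max(b - a, 0.0) for a, b in zip(energies, energies[1:])]
    peak = max(rises) or 1.0          # avoid dividing by zero on silence
    onsets, hold = [], 0
    for frame, rise in enumerate(rises):
        if hold > 0:
            hold -= 1
        elif rise / peak > threshold:
            onsets.append(frame)
            hold = release
    return onsets

# Hypothetical frame energies with two attacks, at frames 2 and 7.
energies = [0.0, 0.0, 10.0, 9.0, 8.0, 0.5, 0.4, 12.0, 11.0]
onsets = detect_transients(energies)
```

The detected frames would then serve as candidate positions when adjusting the on-set times of the trajectories.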