Preeti Rao, 2nd CompMusic Workshop, Istanbul 2012


o Music signal characteristics
o Perceptual attributes and acoustic properties
o Signal representations for pitch detection
  o STFT
  o Sinusoidal model
o Pitch detection algorithms
o Polyphonic context and predominant pitch tracking
o Applications in MIR

Digital audio format: PCM
• Sampling rate: 44.1 kHz, 22.05 kHz
• Amplitude resolution: 16 bits/sample
*The Physics Classroom: http://www.glenbrook.k12.il.us/gbssci/phys/Class/sound/u11l2a.html
WiSSAP 2007

Interesting sounds are typically coded in the form of a temporal sequence of “atomic sound events”. E.g.
o speech -> a sequence of phones
o music -> an evolving pattern of notes
An atomic sound event, or a single gestalt, can be a complex acoustical signal described by a set of temporal and spectral properties => an evoked sensation.
Department of Electrical Engineering, IIT Bombay

A sound of given frequency components and sound pressure levels leads to perceived sensations that can be distinguished in terms of:
o loudness <-- intensity
o pitch <-- fundamental frequency
o timbre (“quality” or “colour”) <-- other spectro-temporal properties

Low pitch tone: frequency = 100 Hz => $T_0$ = 10 msec
High pitch tone: frequency = 300 Hz => $T_0$ = 3.3 msec
(1 Hertz = 1 vibration/sec)

Musical pitch scale (low pitch to high pitch): semitone = $2^{1/12}$
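
As a rough illustration of the two relations above (period = 1/frequency, and one semitone = a frequency ratio of $2^{1/12}$), here is a minimal Python sketch. The function names are hypothetical and chosen only for this example.

```python
import math

SEMITONE = 2 ** (1 / 12)  # frequency ratio of one equal-tempered semitone


def period_ms(f_hz):
    """Period T0 in milliseconds of a tone with frequency f_hz."""
    return 1000.0 / f_hz


def shift_semitones(f_hz, n):
    """Frequency reached by moving n semitones up (n > 0) or down (n < 0)."""
    return f_hz * SEMITONE ** n


def interval_in_semitones(f1_hz, f2_hz):
    """Size of the interval from f1_hz to f2_hz, in semitones."""
    return 12 * math.log2(f2_hz / f1_hz)


print(period_ms(100.0))                       # period of the 100 Hz tone, in ms
print(round(period_ms(300.0), 1))             # period of the 300 Hz tone, in ms
print(shift_semitones(220.0, 12))             # twelve semitones = one octave
print(interval_in_semitones(100.0, 300.0))    # interval between the two tones
```

Because the scale is defined by ratios, the same number of semitones always corresponds to the same frequency ratio, whatever the starting pitch.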

o The construction of a musical scale is based on two assumptions about the human hearing process:
  o The ear is sensitive to ratios of fundamental frequencies (pitches), not so much to absolute pitch.
  o The preferred “musical intervals”, i.e. those perceived to be most consonant, are the ratios of small whole numbers.
o A musical sound typically comprises several frequencies. The frequencies are evident if we observe the “spectrum” of the sound.

[Figure panels: 300 Hz; 600 Hz; 900 Hz; 300 Hz + 600 Hz; 300 Hz + 600 Hz + 900 Hz]
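
To make the idea of a sound built from several frequencies concrete, the sketch below adds 300 Hz, 600 Hz and 900 Hz sinusoids and reads the component frequencies back from the spectrum. It assumes NumPy; the 8 kHz sampling rate, the 0.1 s duration and the function names are choices made for this example only.

```python
import numpy as np

FS = 8000  # assumed sampling rate (Hz)
N = 800    # 0.1 s of signal -> 10 Hz bin spacing, so 300/600/900 Hz fall on bins


def harmonic_tone(freqs_hz, fs=FS, n=N):
    """Sum of unit-amplitude sinusoids at the given frequencies."""
    t = np.arange(n) / fs
    return sum(np.sin(2 * np.pi * f * t) for f in freqs_hz)


def dominant_frequencies(x, fs=FS, count=3):
    """Frequencies (Hz) of the `count` largest magnitude-spectrum bins."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    top = np.argsort(mag)[-count:]
    return sorted(float(freqs[i]) for i in top)


x = harmonic_tone([300, 600, 900])
print(dominant_frequencies(x))
```

The waveform of the sum looks complicated, but its spectrum contains only the three components that were added.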

Sound “atoms”: single tone signal
[Figure: waveform $x_1(t)$, t in ms (0 to 50); spectrum $X_1(f)$, f in Hz]

Non-tonal signal
[Figure: waveform $x_2(t)$, t in ms (0 to 50); spectrum $X_2(f)$, f in Hz]

Complex tone signal
[Figure: waveform $x_3(t)$, t in ms (0 to 50); spectrum $X_3(f)$, f in Hz]

Bandpass noise signal
[Figure: waveform $x_4(t)$, t in ms (0 to 50); spectrum $X_4(f)$, f in Hz]

A flute note
[Figure: waveform $x_1(t)$, t in ms (0 to 50); spectrum $X_1(f)$ in dB, f in kHz]

o We see that the distinctive signal characteristics are more evident in the frequency domain.
o The ear is a frequency analyzer. It represents a unique combination of analysis and synthesis => we do not perceive spectral components but rather the composite sounds.
o We observe that a single “note” is perceived as one entity of well-defined subjective sensations. This is due to the spatial pattern recognition process achieved by the central auditory system.

Major dimensions of music for retrieval are melody, rhythm, harmony and timbre.
o Melody, harmony -> based on pitch content
o Rhythm -> based on timing information
o Timbre -> relates to instrumentation, texture
A representation of these high-level attributes can be obtained from pitch, timing and spectro-temporal information extracted by audio signal analysis. Representations are then compared via a similarity measure to achieve retrieval.

o The temporal pattern of frame-level features can offer important cues to signal identity.
[Figure: audio signal -> analysis windows (duration: 50 – 100 ms) -> feature extraction -> frame-level features -> feature summary over texture windows (duration: 0.5 – 1.0 s) -> feature vector]
M. F. McKinney and J. Breebaart, "Features for Audio and Music Classification," in Proc. ISMIR, 2003.
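
The two window scales above can be sketched as follows: a feature is computed on short analysis windows, then summarised over longer texture windows. This is only an illustration, assuming NumPy, RMS energy as the frame-level feature, non-overlapping 50 ms frames, and mean/standard deviation as the summary; none of these specific choices come from the slides.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split x into frames of frame_len samples, hop samples apart."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])


def frame_rms(x, fs, frame_s=0.05):
    """RMS energy on non-overlapping analysis windows of frame_s seconds."""
    n = int(round(frame_s * fs))
    frames = frame_signal(x, n, n)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def texture_summary(features, frames_per_texture):
    """Mean and standard deviation of a frame-level feature per texture window."""
    n_tex = len(features) // frames_per_texture
    blocks = features[:n_tex * frames_per_texture].reshape(n_tex, frames_per_texture)
    return np.column_stack([blocks.mean(axis=1), blocks.std(axis=1)])


fs = 8000                                   # assumed sampling rate (Hz)
t = np.arange(2 * fs) / fs                  # 2 s test signal
x = np.sin(2 * np.pi * 200 * t)
feats = frame_rms(x, fs)                    # one value per 50 ms frame
print(texture_summary(feats, 20))           # 20 frames = one 1.0 s texture window
```

Each row of the result is one texture window; stacking summaries of several features would give the feature vector shown in the figure.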

Melody: pitch related feature
Melody is the temporal sequence of notes usually played by a single instrument (fixed timbre). The discrete notes (pitches) are typically selected from a musical scale.
[Figure: frequency/note vs. time]

o Typical implementation:
  o Pitch detection is carried out on the audio signal at uniformly spaced intervals
  o The pitch sequence is segmented into notes (regions of relatively steady pitch)
  o Notes are labeled
  o Note patterns are matched to determine melodic similarity
o Challenges:
  o Note segmentation can be a difficult task
  o Pitch detection in polyphonic music is tough
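
The segmentation and labelling steps listed above can be sketched in a deliberately naive way: quantise each pitch value to the nearest equal-tempered semitone and keep runs that last long enough. The MIDI note numbering, the 440 Hz reference and the `min_frames` parameter are assumptions of this example; real recordings with glides and vibrato need something more robust, which is exactly the challenge noted above.

```python
import math


def hz_to_midi(f_hz, ref_hz=440.0):
    """Map a frequency to a (fractional) MIDI note number, with A4 = 69."""
    return 69 + 12 * math.log2(f_hz / ref_hz)


def segment_notes(pitch_hz, min_frames=3):
    """Group consecutive frames that quantise to the same semitone.

    Returns a list of (note_number, start_frame, end_frame_exclusive);
    runs shorter than min_frames are discarded.
    """
    labels = [round(hz_to_midi(f)) for f in pitch_hz]
    notes = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if i - start >= min_frames:
                notes.append((labels[start], start, i))
            start = i
    return notes


# Frame-level pitch track (Hz): a steady note, a one-frame glitch, another note.
track = [220.0] * 5 + [230.0] + [246.94] * 4
print(segment_notes(track))
```

The resulting note labels are the symbols that a melodic-similarity matcher would then compare.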

Monophonic signal: cues to perceived pitch
[Figure: waveform and spectrum; “Schroeder histogram” PDA]
A. de Cheveigné, "Multiple F0 estimation," in D.-L. Wang and G. J. Brown, editors, Computational Auditory Scene Analysis: Principles, Algorithms and Applications, IEEE Press / Wiley, 2006.

o Time (lag) domain: maximise autocorrelation value
o Frequency domain: minimise error between estimated and predicted harmonic structures
o Other
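
A minimal sketch of the time (lag) domain approach: compute the autocorrelation of one frame and take the lag with the largest value inside a plausible pitch range. It assumes NumPy; the search range of 80 to 500 Hz and the function name are choices made for this example, and practical detectors add normalisation and interpolation that are omitted here.

```python
import numpy as np


def autocorr_pitch(frame, fs, fmin=80.0, fmax=500.0):
    """Estimate F0 (Hz) as fs / lag, where lag maximises the autocorrelation.

    The frame must be longer than fs / fmin samples.
    """
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags 0 .. len(frame) - 1.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / lag


fs = 8000                                   # assumed sampling rate (Hz)
t = np.arange(320) / fs                     # one 40 ms frame
frame = sum(a * np.sin(2 * np.pi * 200 * k * t)
            for k, a in [(1, 1.0), (2, 0.5), (3, 0.25)])
print(autocorr_pitch(frame, fs))
```

Restricting the lag range matters: lag 0 always has the largest autocorrelation, and multiples of the true period also produce strong peaks.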

Music and speech signals are typically time-varying in nature => a time-frequency representation is required to visualize signal characteristics. The short-time Fourier transform (STFT) affords such a representation based on an assumption of signal quasi-stationarity. The window shape dictates the time and frequency resolution trade-off.
$$X(\omega, n) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, e^{-j\omega m}$$
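
In practice the STFT is evaluated on a finite window at a discrete set of times and frequencies. The sketch below assumes NumPy, a Hann window, and frame/hop sizes picked for this example; each frame is referenced to its own start, so the phase differs from the definition above by a linear term while the magnitude is unaffected.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """Windowed DFT of successive frames; returns an array (n_frames, n_bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


fs = 8000                                        # assumed sampling rate (Hz)
x = np.sin(2 * np.pi * 1000 * np.arange(2048) / fs)
S = stft(x)
bin_hz = fs / 256                                # frequency spacing of the bins
print(S.shape, np.argmax(np.abs(S[0])) * bin_hz)
```

A longer window narrows the spectral peaks (better frequency resolution) but averages over more signal (poorer time resolution), which is the trade-off mentioned above.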

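[Figure: STFT computation. The signal x(m) is multiplied by the sliding window w(n-m), and the DFT of x(m)w(n-m) gives the spectrum at time n for ω from 0 to π]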

$$\hat{x}[t] = \sum_{i=1}^{I[t]} a_i[t] \cos\left(\Phi_i[t]\right) + e[t]$$
o $a_i[t]$ - amplitude variation of the i-th sinusoidal component (“partial”)
o $\Phi_i[t]$ - total phase (represents both frequency and phase variation)
o $I[t]$ - number of partials, can vary with time
$$\Phi_i[t] = \omega_i[t]\, t + \varphi_i[t]$$
Model parameters to be estimated: $\{a_i, \omega_i, \varphi_i\}_l$
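
The tonal part of the model is a sum of partials, which is easy to synthesise once the parameters are known. The sketch below assumes NumPy and, for simplicity, amplitudes, frequencies and phases that stay constant over the synthesised segment; the function name is hypothetical, and the residual $e[t]$ is not modelled.

```python
import numpy as np


def synth_partials(amps, freqs_hz, phases, fs, n_samples):
    """Additive synthesis: sum of a_i * cos(2*pi*f_i*t + phi_i)."""
    t = np.arange(n_samples) / fs
    x = np.zeros(n_samples)
    for a, f, ph in zip(amps, freqs_hz, phases):
        x += a * np.cos(2 * np.pi * f * t + ph)
    return x


# Two partials of a 200 Hz tone, 50 ms at an assumed 8 kHz sampling rate.
x = synth_partials([1.0, 0.5], [200.0, 400.0], [0.0, 0.0], fs=8000, n_samples=400)
print(x[:5])
```

Subtracting such a synthesised tonal component from the original signal leaves the residual, i.e. whatever the sinusoids do not explain.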

[Figure: sinusoidal analysis/synthesis. Audio signal x -> window -> DFT -> peak detection -> peak tracking -> sinusoid parameters $\{a_i, \omega_i, \varphi_i\}_l$ -> additive synthesis -> tonal component; residual = audio signal - tonal component]
For the smooth evolution of the signal, sine components are detected in each frame and linked to tracks from the previous frame based on frequency proximity.

[Figure: two plots of spectral magnitude (dB) vs. frequency (Hz), 0 to 3000 Hz. Top: fixed threshold (MaxPeak - 40 dB) and final peaks picked. Bottom: envelope-relative thresholds (Envelope - 20 dB, Envelope - 25 dB, Envelope - 30 dB)]
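
The fixed-threshold rule named in the figure (keep local maxima within 40 dB of the largest peak) can be sketched as below. It assumes NumPy and a magnitude spectrum already expressed in dB; the function name and the toy spectrum are made up for this example, and the envelope-relative variant is not shown.

```python
import numpy as np


def pick_peaks(mag_db, rel_threshold_db=40.0):
    """Indices of local maxima that lie within rel_threshold_db of the maximum."""
    mag_db = np.asarray(mag_db, dtype=float)
    floor = mag_db.max() - rel_threshold_db
    inner = mag_db[1:-1]
    is_peak = (inner > mag_db[:-2]) & (inner >= mag_db[2:]) & (inner >= floor)
    return (np.nonzero(is_peak)[0] + 1).tolist()


# Toy spectrum (dB): the local maximum at index 5 is below MaxPeak - 40 dB.
spectrum_db = [-60, 0, -60, -30, -50, -45, -70, -20, -80]
print(pick_peaks(spectrum_db))
```

A fixed threshold tends to discard weak high-frequency partials, which is presumably why the second plot considers thresholds relative to the spectral envelope.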

Match the spectrum around each peak with that of an ideal sinusoid. Apply a threshold to the error.

Peak tracking
[Figure: frequency vs. time (frames 0 to 4); sine peaks A to D are linked into tracks, with markers where a track is born and where a track dies]
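
Linking peaks into tracks by frequency proximity can be sketched with a greedy nearest-neighbour rule. The `max_jump_hz` limit and the first-come matching order are assumptions of this example; the slides do not specify how conflicts between tracks are resolved.

```python
def track_peaks(frames, max_jump_hz=30.0):
    """Link per-frame peak frequencies (Hz) into tracks.

    frames: list of lists of peak frequencies, one list per frame.
    Returns a list of tracks, each a list of (frame_index, frequency).
    """
    finished, active = [], []
    for j, peaks in enumerate(frames):
        unused = list(peaks)
        still_active = []
        for track in active:
            last = track[-1][1]
            best = min(unused, key=lambda f: abs(f - last), default=None)
            if best is not None and abs(best - last) <= max_jump_hz:
                track.append((j, best))        # track continues
                unused.remove(best)
                still_active.append(track)
            else:
                finished.append(track)         # track dies
        for f in unused:
            still_active.append([(j, f)])      # track is born
        active = still_active
    return finished + active


frames = [[200, 400], [205, 410], [210], [212, 600]]
for track in track_peaks(frames):
    print(track)
```

In this toy input the partial near 400 Hz dies after the second frame, and a new track is born at 600 Hz in the last frame.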

[Figure: frequency (Hz, 0 to 2000) vs. time (sec, 0 to 20) for a polyphonic recording, with the sources labelled: singer (main melody), tanpura (drone), harmonium (secondary melody), tabla (percussion; strokes Tun, Na, Ghe)]

o Input: magnitudes + locations of sinusoids
o For a range of trial fundamentals, generate predicted harmonics
o Minimise the TWM (two-way mismatch) error w.r.t. trial fundamentals
[Figure: nearest neighbour matching between predicted components and measured components on a frequency axis (100 to 800), in both directions: predicted -> measured and measured -> predicted]
$$Err_{total} = \frac{Err_{p \to m}}{N} + \rho\, \frac{Err_{m \to p}}{K}$$
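
The structure of the total error can be illustrated with a simplified sketch. It uses only relative frequency mismatch for both directions and ignores peak magnitudes, so it is not the published TWM error; the value of `rho`, the way the number of predicted harmonics is chosen, and the function names are assumptions of this example.

```python
def twm_error(trial_f0, measured_hz, rho=0.33):
    """Simplified two-way mismatch error for one trial fundamental.

    Frequency-only version: magnitudes are ignored (an assumption).
    """
    f_max = max(measured_hz)
    n_pred = max(1, round(f_max / trial_f0))
    predicted = [trial_f0 * n for n in range(1, n_pred + 1)]
    # Predicted -> measured: each predicted harmonic to its nearest measured peak.
    err_pm = sum(min(abs(p - m) for m in measured_hz) / p for p in predicted)
    # Measured -> predicted: each measured peak to its nearest predicted harmonic.
    err_mp = sum(min(abs(m - p) for p in predicted) / m for m in measured_hz)
    return err_pm / len(predicted) + rho * err_mp / len(measured_hz)


def best_f0(measured_hz, trials):
    """Trial fundamental with the smallest total error."""
    return min(trials, key=lambda f0: twm_error(f0, measured_hz))


measured = [200.0, 400.0, 600.0, 800.0]
for f0 in (100.0, 200.0, 400.0):
    print(f0, round(twm_error(f0, measured), 3))
print(best_f0(measured, [100.0, 200.0, 400.0]))
```

The two directions penalise opposite mistakes: a trial fundamental that is too low predicts harmonics with no measured partner, while one that is too high leaves measured peaks unexplained.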


[Figure: lattice of pitch candidates p against frame j; nodes with costs E(p,j) and E(p',j+1) are joined by the transition cost W(p,p')]
p → pitch candidates, j → frame (time instant)
E → measurement cost (local), W → smoothness cost
Minimize the global transition cost over the singing spurt.
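
Minimising a sum of local and smoothness costs over a sequence of frames is commonly done with dynamic programming; a minimal sketch follows. The smoothness cost used here, proportional to the pitch jump in octaves, and the `smooth_weight` parameter are assumptions of this example, not the cost functions of the original system.

```python
import math


def track_pitch(candidates, costs, smooth_weight=1.0):
    """Choose one pitch candidate per frame by dynamic programming.

    candidates[j][k]: k-th pitch candidate (Hz) in frame j.
    costs[j][k]: measurement cost E of that candidate.
    Assumed smoothness cost: W(p, p') = smooth_weight * |log2(p' / p)|.
    """
    total = [list(costs[0])]   # best accumulated cost ending at each candidate
    back = []                  # back-pointers to the previous frame
    for j in range(1, len(candidates)):
        row, ptr = [], []
        for k, p_new in enumerate(candidates[j]):
            best_i, best_c = 0, float("inf")
            for i, p_old in enumerate(candidates[j - 1]):
                c = total[j - 1][i] + smooth_weight * abs(math.log2(p_new / p_old))
                if c < best_c:
                    best_i, best_c = i, c
            row.append(best_c + costs[j][k])
            ptr.append(best_i)
        total.append(row)
        back.append(ptr)
    # Trace the cheapest path backwards from the last frame.
    k = min(range(len(total[-1])), key=total[-1].__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [candidates[j][k] for j, k in enumerate(path)]


candidates = [[200.0, 400.0], [202.0, 404.0], [101.0, 204.0]]
costs = [[0.1, 0.2], [0.2, 0.1], [0.1, 0.15]]
print(track_pitch(candidates, costs))
```

In this toy example, picking the locally cheapest candidate in every frame would jump between octaves; the smoothness cost makes the continuous path cheaper overall.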


[Figure: polyphonic audio signal -> signal representation -> multi-F0 analysis -> predominant-F0 trajectory extraction -> singing voice detection -> voice F0 contour]

“Pitch class profile”
o Pitch histogram
o Similarity measure involves match between histograms
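
A pitch histogram folded into one octave, and a simple match between two such histograms, can be sketched as follows. The 12-bin semitone resolution, the reference (tonic) frequency argument and the use of histogram intersection as the similarity measure are assumptions of this example.

```python
import math


def pitch_class_profile(pitch_hz, tonic_hz):
    """Normalised 12-bin histogram of pitch classes relative to tonic_hz."""
    hist = [0.0] * 12
    for f in pitch_hz:
        semitones = 12 * math.log2(f / tonic_hz)
        hist[round(semitones) % 12] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


tonic = 220.0
a = pitch_class_profile([220.0, 440.0, 330.0, 220.0 * 2 ** (7 / 12)], tonic)
b = pitch_class_profile([220.0, 220.0, 220.0, 330.0], tonic)
print(a)
print(histogram_similarity(a, b))
```

Folding into pitch classes makes the profile insensitive to the octave in which a note is sung or played.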

[Figure: examples labelled as positive phrases and negative phrases]

Detects phrases melodically similar to ‘Guru Bina’
[Figure: pitch contour with swaras S S N R, the emphatic beat (sam), and positive and negative phrases marked]

