GCT535 - Sound Technology for Multimedia: Pitch Analysis
Graduate School of Culture Technology, KAIST
Juhan Nam

Outline
§ Introduction
– Definition of Pitch
– Information in Pitch
§ Monophonic Pitch Detection Algorithms
– Time-Domain Approaches
– Frequency-Domain Approaches
– Psychoacoustic Model Approaches
§ Pitch Tracking
§ Applications

Definition of Pitch
§ Pitch
– Defined as the auditory attribute of sound according to which sounds can be ordered on a scale from low to high (ANSI, 1994)
– One way of measuring pitch is to find the frequency of a sine wave that listeners match to the target sound in a psychophysical experiment
– Pitch is therefore subjective and varies between listeners (e.g., tone-deaf listeners)
§ Fundamental Frequency
– Physical attribute of sounds, measured from periodicity
– Often called F0
§ Pitch should be distinguished from F0
– However, in practice, the two terms are used interchangeably.

Information in Pitch
§ Music
– Notes or melody
– Tonality (in polyphony)
– Size (or register) of musical instruments: bass, cello, violin
§ Speech
– Context (prosody): question, mood, attitude
– Speaker: gender, age, identity
– Meaning: tonal languages, e.g., Chinese (Mandarin)
§ Others
– Vocalization of animals (e.g., bird chirps, whale calls): size and type of animal, communication

Pitch and Musical Instruments
§ Pitch is determined by the spectral characteristics of musical instruments
– Not all musical instruments have pitch
§ Types of musical instruments by harmonicity
– Harmonic and steady: guitar, flute
– Harmonic and dynamic: violin, organ, singing voice (vowel)
– Inharmonic: piano, vibraphone
– Non-harmonic: drum, percussion, singing voice (consonant)
[Figures: vibraphone; inharmonicity in piano. From Klapuri's slides]

Pitch Detection Algorithms
§ Time-Domain Approaches
– Periodicity in time
§ Frequency-Domain Approaches
– Periodicity in frequency
§ Psychoacoustic Model Approaches
– Both time and frequency
[Figures: waveform, amplitude vs. time [ms]; spectrum, magnitude [dB] vs. frequency [Hz]]

Time-Domain Approach
§ Basic Ideas
– Periodicity: x(t) = x(t+T)
– Measure similarity (or distance) between two adjacent segments
– Find the period (T) that gives the highest similarity (smallest distance)
§ Two main approaches
– Auto-correlation function (ACF): similarity by inner product
– Average magnitude difference function (AMDF): distance by difference (e.g., L1, L2 norm)

Auto-Correlation Function (ACF)
§ Measuring self-similarity by
$$ r_t(l) = \sum_{n=0}^{N-1-l} x_t(n)\, x_t(n+l), \quad l = 0, 1, 2, \ldots, L-1 $$ (Sondhi 1967)
[Figures: waveform of a singing voice, amplitude vs. time [sample]; its auto-correlation vs. lag [sample]]

Auto-Correlation Function (ACF)
§ Biased auto-correlation
$$ r_{\mathrm{biased},t}(l) = \sum_{n=0}^{N-1-l} x_t(n)\, x_t(n+l), \quad l = 0, 1, 2, \ldots, L-1 $$
§ Unbiased auto-correlation
$$ r_{\mathrm{unbiased},t}(l) = \frac{1}{N-l} \sum_{n=0}^{N-1-l} x_t(n)\, x_t(n+l), \quad l = 0, 1, 2, \ldots, L-1 $$
[Figure: auto-correlation vs. lag [sample]]
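
The two definitions above can be sketched directly in NumPy. This is a minimal illustration, not the course's reference code: the function name `acf`, the 200 Hz test tone, the 8 kHz sampling rate, and the 50-400 Hz search range are all assumptions chosen for the example.

```python
import numpy as np

def acf(x, max_lag, unbiased=False):
    """r(l) = sum_{n=0}^{N-1-l} x(n) x(n+l) for l = 0..max_lag-1."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = np.array([np.dot(x[:N - l], x[l:]) for l in range(max_lag)])
    if unbiased:
        # Divide by the number of terms in each sum (N - l).
        r = r / (N - np.arange(max_lag))
    return r

# Assumed test signal: a 200 Hz sine sampled at 8 kHz (period = 40 samples).
fs = 8000
n = np.arange(1024)
x = np.sin(2 * np.pi * 200 * n / fs)

r = acf(x, max_lag=400)
# Look for the largest peak away from zero lag, here within 50-400 Hz.
lo, hi = fs // 400, fs // 50
period = lo + int(np.argmax(r[lo:hi + 1]))
f0 = fs / period
print(period, f0)
```

Restricting the search range is one simple way to step around the large peak at zero lag; the biased form is used here because its taper favors the shortest period.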

Pitch Detection by ACF
[Figures: spectrogram (tracking max values); ACF (tracking max values)]

Interpretation of ACF in Frequency Domain
§ By the convolution theorem, the auto-correlation can be computed in the frequency domain, and efficiently so using the FFT (with x(n) zero-padded so that the circular correlation matches the linear one)
$$ X(k) = \mathrm{FFT}(x(n)) $$
$$ \sum_{n=0}^{N-1-l} x(n)\, x(n+l) = \mathrm{FFT}^{-1}(X(k)\, X^*(k)) = \mathrm{FFT}^{-1}(|X(k)|^2) $$
§ Thus, the ACF can be computed as
$$ r(l) = \frac{1}{N-l}\, \mathrm{real}(\mathrm{FFT}^{-1}(|X(k)|^2)) $$
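
A hedged sketch of the FFT route follows. The function name `acf_fft` and the random test signal are assumptions for illustration; the zero-padding to at least 2N-1 samples is added so that the circular correlation produced by the FFT equals the linear sum above.

```python
import numpy as np

def acf_fft(x, unbiased=False):
    """ACF via the power spectrum: real(IFFT(|X(k)|^2))."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Zero-pad to a power of two >= 2N - 1 to avoid circular wrap-around.
    nfft = 1 << (2 * N - 1).bit_length()
    X = np.fft.rfft(x, nfft)
    r = np.fft.irfft(np.abs(X) ** 2, nfft)[:N]
    if unbiased:
        r = r / (N - np.arange(N))  # the 1/(N - l) normalization
    return r

# Compare against the direct time-domain sum on an arbitrary signal.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
r_fft = acf_fft(x)
r_direct = np.correlate(x, x, mode="full")[len(x) - 1:]
max_error = np.max(np.abs(r_fft - r_direct))
print(max_error)
```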

Interpretation of ACF in Frequency Domain
§ This is equivalent to
$$ r(l) = \frac{1}{N-l} \sum_{k=0}^{K-1} \cos\!\left(\frac{2\pi l k}{K}\right) |X(k)|^2 $$
[Figure: power spectrum and cosine weight vs. frequency [bin]]
§ ACF is a simple template-based approach in the frequency domain
– Positive weights for (harmonic) peaks and negative weights for valleys
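
The cosine-template view can be checked numerically. In this sketch the function name, the two-partial test signal, and the lag are assumptions; note also that with NumPy's unnormalized forward FFT the inverse transform contributes a factor 1/K, which is written out explicitly here, while the 1/(N-l) normalization is left out so the result is compared with the plain sum.

```python
import numpy as np

def acf_cosine_weighting(x, lag, K):
    """Weight the power spectrum by cos(2*pi*lag*k/K) and sum over k."""
    X = np.fft.fft(x, K)                 # zero-padded to K points
    k = np.arange(K)
    weight = np.cos(2 * np.pi * lag * k / K)
    return np.sum(weight * np.abs(X) ** 2) / K

# Assumed test signal: two harmonic partials of 200 Hz at fs = 8 kHz.
fs = 8000
n = np.arange(512)
x = np.sin(2 * np.pi * 200 * n / fs) + 0.5 * np.sin(2 * np.pi * 400 * n / fs)

K = 1024          # >= 2N - 1, so no circular wrap-around
lag = 40          # one period of 200 Hz
r_template = acf_cosine_weighting(x, lag, K)
r_direct = np.dot(x[:len(x) - lag], x[lag:])
print(r_template, r_direct)
```

When the lag matches the period, the cosine weight is positive at every harmonic bin, which is why the template sum peaks there.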

Problems in ACF
§ Biased toward the large peak around zero lag
§ Prone to octave errors, particularly toward lower octaves
– ACF is sensitive to amplitude changes
§ Equal weights for all harmonic partials
– In general, low-numbered harmonic partials are more important in determining pitch

Average Magnitude Difference Function (AMDF)
§ Measuring self-similarity by
$$ d_t(l) = \sum_{n=0}^{N-1-l} \left| x_t(n) - x_t(n+l) \right|^p, \quad l = 0, 1, 2, \ldots, L-1 $$
§ In YIN, p is set to 2 (de Cheveigné & Kawahara, 2002)
$$ d_t(l) = \sum_{n=0}^{N-1-l} \left( x_t(n) - x_t(n+l) \right)^2 = \sum_{n=0}^{N-1-l} \left[ x_t(n)^2 - 2\, x_t(n)\, x_t(n+l) + x_t(n+l)^2 \right] = r_t(0) - 2\, r_t(l) + r_{t+l}(0) $$
– Minimizing d_t(l) amounts to minimizing the negative ACF plus a lag-dependent term
§ And the AMDF is normalized as
$$ \hat{d}(l) = \begin{cases} 1 & l = 0 \\ d(l) \Big/ \left[ \dfrac{1}{l} \sum_{u=1}^{l} d(u) \right] & \text{otherwise} \end{cases} $$
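
The squared-difference function (p = 2) and its normalization can be sketched as below. The function names, the small floor that guards against division by zero, and the 200 Hz test tone are assumptions for illustration.

```python
import numpy as np

def difference_function(x, max_lag):
    """d(l) = sum_{n=0}^{N-1-l} (x(n) - x(n+l))^2 for l = 0..max_lag-1."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.sum((x[:N - l] - x[l:]) ** 2) for l in range(max_lag)])

def normalized_difference(d):
    """d_hat(0) = 1; d_hat(l) = d(l) / [(1/l) * sum_{u=1}^{l} d(u)]."""
    d = np.asarray(d, dtype=float)
    d_hat = np.ones_like(d)
    lags = np.arange(1, len(d))
    running_mean = np.cumsum(d[1:]) / lags
    # Floor added here (an assumption) so silent input does not divide by 0.
    d_hat[1:] = d[1:] / np.maximum(running_mean, 1e-12)
    return d_hat

# Assumed test signal: a 200 Hz sine at 8 kHz (period = 40 samples).
fs = 8000
n = np.arange(1024)
x = np.sin(2 * np.pi * 200 * n / fs)

d = difference_function(x, max_lag=200)
d_hat = normalized_difference(d)
print(d_hat[0], d_hat[20], d_hat[40])
```

The normalized curve starts at 1, rises above 1 around half a period, and dips toward 0 at the period, so the zero-lag minimum of d(l) no longer competes with the true period.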

Average Magnitude Difference Function (AMDF)
[Figures: AMDF; normalized AMDF]

Why YIN (AMDF) works better
§ Robust to changes in amplitude
– The difference (instead of correlation) takes care of amplitude changes.
– This reduces octave errors.
§ Zero-lag bias is avoided by the normalized AMDF
§ The normalized AMDF allows using a fixed threshold
– Can choose multiple candidates and refine peaks
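
A fixed threshold on the normalized difference function might be used as in the sketch below. This is a simplified illustration rather than the full YIN algorithm (no parabolic interpolation or best-local-estimate step); the function name `yin_f0`, the threshold of 0.1, the 50-500 Hz range, and the test signal are assumptions.

```python
import numpy as np

def yin_f0(x, fs, fmin=50.0, fmax=500.0, threshold=0.1):
    """Pick the first dip of the normalized difference below a threshold."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    min_lag = max(1, int(fs / fmax))
    max_lag = min(int(fs / fmin), N - 1)
    # Squared-difference function d(l), l = 0..max_lag.
    d = np.array([np.sum((x[:N - l] - x[l:]) ** 2) for l in range(max_lag + 1)])
    # Normalized difference: d(l) divided by the running mean of d(1..l).
    lags = np.arange(1, max_lag + 1)
    d_hat = np.ones(max_lag + 1)
    d_hat[1:] = d[1:] * lags / np.maximum(np.cumsum(d[1:]), 1e-12)
    candidates = np.flatnonzero(d_hat[min_lag:] < threshold) + min_lag
    if candidates.size == 0:
        # No dip below the threshold: fall back to the global minimum.
        lag = min_lag + int(np.argmin(d_hat[min_lag:]))
    else:
        # Take the first crossing, then walk down to the local minimum.
        lag = int(candidates[0])
        while lag < max_lag and d_hat[lag + 1] < d_hat[lag]:
            lag += 1
    return fs / lag

# Assumed test signal: 200 Hz fundamental plus its second harmonic.
fs = 8000
n = np.arange(1024)
x = np.sin(2 * np.pi * 200 * n / fs) + 0.5 * np.sin(2 * np.pi * 400 * n / fs)
f0 = yin_f0(x, fs)
print(f0)
```

Taking the first dip below the threshold, rather than the global minimum, is what discourages picking a multiple of the true period.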

Example of AMDF (YIN)
[Figure]

Frequency-Domain Approach
§ Basic Ideas
– Periodic in time domain → Harmonic in frequency domain
– Measure how harmonic the spectrum is
– Find the F0 that best explains the harmonic patterns (harmonic partials)
§ Algorithms
– Pattern Matching
– Cepstrum
– Harmonic Product Spectrum (HPS)

Pattern Matching: Comb-filtering
§ Using sharp harmonic sieves to take harmonic peak regions only
– Compute pitch saliency for F0 candidates (Puckette et al. 1998)
[Figure]
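
One simple way to picture a harmonic sieve is to sum the spectral magnitude at the bins nearest each candidate's harmonics. The sketch below is a generic harmonic-summation illustration, not the method of Puckette et al.; the function name, the fixed count of 8 harmonics, the 1 Hz candidate grid, and the 220 Hz test signal are all assumptions.

```python
import numpy as np

def sieve_saliency(mag, fs, nfft, f0_candidates, n_harmonics=8):
    """Sum the magnitude spectrum at the harmonic positions of each candidate."""
    saliency = np.zeros(len(f0_candidates))
    for i, f0 in enumerate(f0_candidates):
        harmonics = f0 * np.arange(1, n_harmonics + 1)
        harmonics = harmonics[harmonics < fs / 2]       # stay below Nyquist
        bins = np.round(harmonics * nfft / fs).astype(int)
        saliency[i] = mag[bins].sum()
    return saliency

# Assumed test signal: 8 harmonics of 220 Hz with 1/h amplitudes.
fs, N, nfft = 8000, 2048, 4096
t = np.arange(N) / fs
x = sum(np.sin(2 * np.pi * 220 * h * t) / h for h in range(1, 9))
mag = np.abs(np.fft.rfft(x * np.hanning(N), nfft))

candidates = np.arange(80.0, 500.0, 1.0)
saliency = sieve_saliency(mag, fs, nfft, candidates)
f0 = candidates[np.argmax(saliency)]
print(f0)
```

Using the same number of sieve teeth for every candidate keeps a sub-octave candidate from collecting more partials than the true F0 in this example; real signals usually need additional weighting.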

Pattern Matching: Cross-correlation
§ Cross-correlation with an ideal template on a log-scale spectrogram
[Figure: from Ellis' e4896 course slides]

Cepstrum
§ The real cepstrum is defined as
$$ c_x(l) = \mathrm{real}(\mathrm{FFT}^{-1}(\log(|\mathrm{FFT}(x)|))) $$ (Noll, 1967)
§ Basic ideas
– Harmonic partials are periodic in the frequency domain
– The (inverse) FFT finds the periodicity
[Figures: magnitude spectrum [dB] vs. frequency [Hz]; cepstrum vs. quefrency, with liftering]
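
The definition above might be turned into a pitch estimate as follows. This is a sketch under assumptions: the function names, the Hann window, the small constant inside the logarithm, the 50-500 Hz quefrency search range, and the harmonic test signal are all illustrative choices.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """c(l) = real(IFFT(log|FFT(x)|))."""
    spectrum = np.fft.fft(x, nfft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small constant avoids log(0)
    return np.real(np.fft.ifft(log_mag))

def cepstrum_f0(x, fs, fmin=50.0, fmax=500.0, nfft=2048):
    """Pick the largest cepstral peak in the quefrency range of interest."""
    c = real_cepstrum(x * np.hanning(len(x)), nfft)
    q_min, q_max = int(fs / fmax), int(fs / fmin)
    period = q_min + int(np.argmax(c[q_min:q_max + 1]))
    return fs / period

# Assumed test signal: 19 harmonics of 200 Hz with 1/h amplitudes at 8 kHz.
fs = 8000
t = np.arange(1024) / fs
x = sum(np.sin(2 * np.pi * 200 * h * t) / h for h in range(1, 20))
f0 = cepstrum_f0(x, fs)
print(f0)
```

Restricting the search to higher quefrencies plays the role of liftering here: the low-quefrency part, which describes the smooth spectral envelope, is simply ignored.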

Harmonic Product Spectrum (HPS)
§ The Harmonic Product Spectrum (HPS) is obtained by multiplying the original magnitude spectrum by copies of itself decimated by integer factors
$$ \mathrm{HPS}(k) = \prod_{m=1}^{M} |X(mk)| $$ (Noll, 1969)
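
The product over decimated spectra can be sketched with array slicing. The function name, M = 4, the 50 Hz lower search bound, and the 250 Hz test signal (chosen so that its harmonics fall exactly on FFT bins) are assumptions for the example.

```python
import numpy as np

def harmonic_product_spectrum(mag, M=4):
    """HPS(k) = prod_{m=1}^{M} |X(m*k)|, using decimated copies of |X|."""
    K = len(mag) // M                 # highest k for which M*k is still valid
    hps = np.ones(K)
    for m in range(1, M + 1):
        hps *= mag[::m][:K]           # mag[::m][k] == mag[m * k]
    return hps

# Assumed test signal: 8 harmonics of 250 Hz with 1/h amplitudes.
fs, N, nfft = 8000, 2048, 4096
t = np.arange(N) / fs
x = sum(np.sin(2 * np.pi * 250 * h * t) / h for h in range(1, 9))
mag = np.abs(np.fft.rfft(x * np.hanning(N), nfft))

hps = harmonic_product_spectrum(mag, M=4)
k_min = int(50 * nfft / fs)           # skip the region around DC
k_peak = k_min + int(np.argmax(hps[k_min:]))
f0 = k_peak * fs / nfft
print(k_peak, f0)
```

With equal-amplitude partials the bin one octave above can tie with the true F0, so the decaying amplitudes in this example matter.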

Auditory Filter Bank
§ A bank of filters that imitates the magnitude and delay of traveling waves on the basilar membrane in the cochlea
§ Correlogram
– Formed by concatenating the ACF of the individual hair-cell (HC) outputs
– 3-D representation (time-channel-lag) or “auditory images”
[Figure: block diagram of the auditory model with the labels input, oval window, cochlear filter banks (high freq. to low freq.), hair cells (HC), auto-correlation functions (ACF), stabilize & combine, correlogram, summary ACF]

Types of Auditory Filter Banks
§ Gamma-tone Filter banks
– Gamma-tone:
$$ g(t) = a\, t^{n-1} e^{-2\pi b t} \cos(2\pi f t + \phi)\, u(t) $$
– Used in Patterson’s auditory filter banks based on ERB
§ Pole-Zero Filter Cascade (Lyon)
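
The gamma-tone impulse response can be sampled directly from its formula. In this sketch the function name, the order n = 4, and the bandwidth rule b = 1.019 x ERB with ERB = 24.7(4.37 f/1000 + 1) are assumptions (commonly used values that the slide does not state).

```python
import numpy as np

def gammatone_ir(fc, fs, duration=0.05, order=4, b=None, a=1.0, phase=0.0):
    """g(t) = a * t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phase), t >= 0."""
    t = np.arange(int(duration * fs)) / fs
    if b is None:
        # Assumed bandwidth: ERB of the center frequency, scaled by 1.019.
        erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
        b = 1.019 * erb
    envelope = a * t ** (order - 1) * np.exp(-2 * np.pi * b * t)
    return envelope * np.cos(2 * np.pi * fc * t + phase)

# One channel centered at 1 kHz; its magnitude response should peak near fc.
fs = 16000
g = gammatone_ir(fc=1000.0, fs=fs)
G = np.abs(np.fft.rfft(g, 8192))
peak_hz = np.argmax(G) * fs / 8192
print(peak_hz)
```

A filter bank would repeat this for a set of center frequencies spaced on an ERB scale and convolve the input with each response.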

Hair-Cell
§ (Inner) Hair-cell
– Transforms mechanical movement into neural spikes
§ Modeled as a cascade of
– Half-wave rectification
– Compression
– Low-pass filtering
§ This performs non-linear processing
– Generates new harmonic partials
– Associated with missing fundamentals
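
The three-stage cascade might look like the sketch below, which also illustrates the missing-fundamental point. The cube-root compression, the one-pole low-pass with a 1 kHz cutoff, and the test signal (harmonics 2 to 4 of 200 Hz, with no energy at 200 Hz) are assumptions for illustration.

```python
import numpy as np

def hair_cell(x, fs, cutoff=1000.0, exponent=1.0 / 3.0):
    """Half-wave rectification -> power-law compression -> one-pole low-pass."""
    rectified = np.maximum(np.asarray(x, dtype=float), 0.0)
    compressed = rectified ** exponent
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff / fs)   # one-pole coefficient
    y = np.empty_like(compressed)
    state = 0.0
    for i, v in enumerate(compressed):
        state += alpha * (v - state)
        y[i] = state
    return y

# Input contains 400, 600 and 800 Hz only: the 200 Hz fundamental is missing.
fs = 8000
t = np.arange(4000) / fs
x = sum(np.sin(2 * np.pi * f * t) for f in (400, 600, 800))
y = hair_cell(x, fs)

bin_200 = 200 * len(x) // fs          # FFT bin of 200 Hz (2 Hz per bin)
in_200 = np.abs(np.fft.rfft(x))[bin_200]
out_200 = np.abs(np.fft.rfft(y))[bin_200]
print(in_200, out_200)
```

The input has essentially no component at 200 Hz, while the non-linear output does, which is the sense in which the hair-cell stage generates new partials.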

Pitch Analysis Using Auditory Model
§ The summary ACF is computed by summing the ACF across all channels
– The peaks in the summary ACF represent periodicity features
– This is known to be robust to band-limited noise
[Figure: summary ACF]
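
Summing per-channel ACFs can be sketched as below. The function name is hypothetical, and the two sine "channels" at 400 Hz and 600 Hz are stand-ins (an assumption) for the hair-cell outputs of an auditory filter bank, chosen so that neither channel contains the 200 Hz fundamental.

```python
import numpy as np

def summary_acf(channels, max_lag):
    """Sum the (biased) ACF of every channel; rows of `channels` are channels."""
    channels = np.atleast_2d(np.asarray(channels, dtype=float))
    N = channels.shape[1]
    nfft = 1 << (2 * N - 1).bit_length()      # zero-pad: linear correlation
    spec = np.fft.rfft(channels, nfft, axis=1)
    acf = np.fft.irfft(np.abs(spec) ** 2, nfft, axis=1)[:, :max_lag]
    return acf.sum(axis=0)

# Stand-in channel outputs: partials 2 and 3 of a 200 Hz tone at fs = 8 kHz.
fs = 8000
n = np.arange(1024)
channels = np.stack([np.sin(2 * np.pi * f * n / fs) for f in (400, 600)])

sacf = summary_acf(channels, max_lag=200)
lo = 10                                       # skip the zero-lag region
period = lo + int(np.argmax(sacf[lo:161]))
f0 = fs / period
print(period, f0)
```

Each channel alone peaks at its own period, but only the common period of 40 samples is reinforced by every channel, so the summary ACF points to 200 Hz.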

Pitch Tracking
§ Pitch is usually continuous over time
– Once a pitch with strong harmonicity is detected in a frame, the following frames form a smooth pitch contour
§ Pitch tracking methods
– Post-processing: first detect pitch in a frame-by-frame manner and then find a continuous path by smoothing.
• Median Filtering
• Dynamic Programming (Talkin, 1995)
– Probabilistic approach: detect multiple pitch candidates in every frame and find the best path
• Viterbi decoding: Probabilistic YIN (Mauch, 2014)
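
Of the post-processing options, median filtering is the simplest to sketch. The function name, the window width, and the example track (with one octave-up and one octave-down error) are assumptions made for the illustration.

```python
import numpy as np

def median_smooth(f0_track, width=5):
    """Median-filter a frame-wise F0 track; `width` should be odd."""
    f0_track = np.asarray(f0_track, dtype=float)
    half = width // 2
    padded = np.pad(f0_track, half, mode="edge")     # repeat the end values
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)

# Hypothetical frame-wise estimates containing two isolated octave errors.
raw = [200, 201, 400, 202, 203, 101, 204]
smoothed = median_smooth(raw, width=3)
print(smoothed)  # [200. 201. 202. 203. 202. 203. 204.]
```

A median filter removes isolated outliers without averaging them into their neighbors, but it cannot repair runs of wrong frames longer than half the window; that is where dynamic programming or Viterbi decoding over multiple candidates becomes useful.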

Applications
§ Sound Modification
– Time-stretching using PSOLA
– Auto-tune: pitch-correction or T-Pain effect
§ Music Performance
– Tuning musical instruments
– Pitch-based sound control
– Score-following and auto-accompaniment
§ Query-by-humming
– Relative pitch change might be more important
§ Singing evaluation (e.g., karaoke) and visualization
