GCT634: Musical Applications of Machine Learning
Tonal Analysis
Hidden Markov Model
Graduate School of Culture Technology, KAIST
Juhan Nam
Outline
• Introduction
- Tonality
- Perceptual Distance of Two Tones
- Chords and Scales
• Tonal Analysis
- Key Estimation
- Chord Recognition
• Hidden Markov Model
Introduction
[Figures: Bach's chorale harmonization; jazz "Real Book"; pop music]
Tonality
• Tonal music has a tonal center called the key
- 12 keys (C, C#, D, …, B)
• Tonal music uses a major or minor scale built on the key, and the notes of the scale have different roles
[Figure: C major scale]
• Notes in tonal music are harmonized by chords
Tonality
• A sequence of notes or a chord progression provides a certain degree of stability or instability
- E.g., cadence (V-I, IV-I), tension (sus2, sus4)
• How is tonality formed?
- In other words, how do we perceive different degrees of stability or tension from notes?
Tonality
• Consonance and Dissonance
- If two sinusoidal tones are within 3 semitones (a minor 3rd) in frequency, they become dissonant
- They are most dissonant when they are about one quarter of the critical band apart
- Below about 500 Hz the critical band becomes wider in musical-interval terms, so two low notes can sound dissonant (e.g., two piano notes in the low register)
• Consonance of two harmonic tones
- Determined by how many closely located overtones the two tones have within the same critical bands
Consonance Rating of Intervals in Music
• The perceptual distance between two notes is different from the semitone distance between them.
Chords
• The basic units of tonal harmony
- Triads, 7th, 9th, 11th, …
• Triads are formed by choosing three notes that make the most consonant (or "most harmonized") sound
- This ends up stacking major or minor 3rds
- 7th and 9th chords are obtained by stacking more 3rds
• The quality of consonance becomes more sophisticated as more notes are added
- Music theory is basically about how to create tension and resolve it using different qualities of consonance
Scales in Tonal Harmony
• Major scale
- Formed by spreading out the notes of three major chords
• Minor scale
- Formed by spreading out the notes of three minor chords (natural minor scale)
- The harmonic or melodic minor scale can be formed by using both minor and major chords
Automatic Chord Recognition
• Identifying the chord progression of tonal music
• It is a challenging task (even for humans)
- Chords are not explicit in music
- Non-chord notes or passing notes
- Key change and chromaticism: requires in-depth knowledge of music theory
- In audio, multiple musical instruments are mixed
  - Relevant: harmonically arranged notes
  - Irrelevant: percussive sounds (but they can help detect chord changes)
• What kind of audio features can be extracted to recognize chords in a robust way?
Chroma Features: FFT-based approach
• Compute the spectrogram and a mapping matrix
- Convert each frequency bin to the musical pitch scale and get its pitch class
- Set the entry for the corresponding pitch class to one and all other entries to zero
- Adjust the non-zero values so that low-frequency content has more weight
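The mapping matrix described on this slide can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function name `chroma_mapping_matrix`, the frequency range, and the `f_min / f` weighting are assumptions chosen only to show one way of giving low frequencies more weight.

```python
import numpy as np

def chroma_mapping_matrix(n_fft=4096, sr=22050, f_min=55.0, f_max=2000.0):
    """Build a 12 x (n_fft // 2 + 1) matrix that maps FFT bins to pitch classes."""
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft      # bin frequencies in Hz
    mapping = np.zeros((12, freqs.size))
    valid = (freqs >= f_min) & (freqs <= f_max)         # skip DC and very high bins
    bins = np.nonzero(valid)[0]
    # Frequency -> nearest MIDI note number (A4 = 440 Hz = MIDI 69)
    midi = np.round(69 + 12 * np.log2(freqs[valid] / 440.0)).astype(int)
    pitch_class = midi % 12                             # 0 = C, 1 = C#, ..., 11 = B
    # Non-zero only at the matching pitch class; the weight f_min / f is an
    # assumed choice that emphasizes low-frequency bins.
    mapping[pitch_class, bins] = f_min / freqs[valid]
    return mapping

# Usage: chroma = mapping @ magnitude_spectrogram, where the spectrogram
# has shape (n_fft // 2 + 1, n_frames).
```

Multiplying this matrix by a magnitude spectrogram gives a 12 x n_frames chroma matrix; in practice each frame is usually normalized afterwards.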
Chroma Features: Filter-bank approach
• A filter bank can be used to get a log-scale time-frequency representation
- Center frequencies are arranged over the 88 piano notes
- Bandwidths are set for a constant Q and to be robust to ±25 cents of detuning
• The outputs that belong to the same pitch class are wrapped and summed (Müller, 2011)
Beat-Synchronous Chroma Features
• Make chroma features homogeneous within a beat (Bartsch and Wakefield, 2001)
[Figure from Ellis' slides]
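One simple way to make chroma homogeneous within a beat is to average the frames between consecutive beat positions. The sketch below assumes beat positions are already available as frame indices; the function name `beat_sync_chroma` and the use of the mean (rather than, say, the median) are illustrative choices, not taken from the cited work.

```python
import numpy as np

def beat_sync_chroma(chroma, beat_frames):
    """Average a 12 x n_frames chroma matrix within each beat interval."""
    n_frames = chroma.shape[1]
    # Interval boundaries: track start, every beat, track end.
    # np.unique sorts the boundaries and removes duplicates, so no interval is empty.
    bounds = np.unique(np.concatenate(([0], beat_frames, [n_frames])))
    segments = [chroma[:, s:e].mean(axis=1)
                for s, e in zip(bounds[:-1], bounds[1:])]
    return np.stack(segments, axis=1)   # shape: 12 x n_intervals
```

Each output column then represents one beat-length segment, which reduces frame-to-frame fluctuation before key or chord estimation.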
Key Estimation Overview
• Estimate the musical key from music data
- One of 24 keys: 12 pitch classes (C, C#, D, …, B) × major/minor
• General framework (Gomez, 2006)
[Block diagram: Chroma Features; Similarity Measure; Average; Key Strength; Key Template; output key, e.g., G major]
Key Template
• Probe tone profile (Krumhansl and Kessler, 1982)
- Relative stability or weight of tones
- Listeners rated which tones best completed the first seven notes of a major scale
- For example, in the key of C major: C, D, E, F, G, A, B, … what comes next?
[Figure: Probe Tone Profile - Relative Pitch Ranking]
Key Estimation
• Measure similarity by cross-correlation between the chroma features and the key templates
• Choose the key that produces the maximum correlation
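A minimal sketch of this template-correlation step is shown below. It assumes the chroma is averaged over time before matching and uses the Pearson correlation coefficient as the similarity measure; both are assumptions for illustration. The profile values are the Krumhansl-Kessler ratings as they are commonly quoted and should be checked against the original paper before serious use.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Probe-tone ratings for a tonic at index 0 (commonly quoted values)
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
KK_MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                     2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def estimate_key(chroma):
    """Return (key name, correlation) for a 12 x n_frames chroma matrix."""
    avg = chroma.mean(axis=1)                     # time-averaged chroma
    best_key, best_score = None, -np.inf
    for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
        for tonic in range(12):
            template = np.roll(profile, tonic)    # move the tonic weight to index `tonic`
            score = np.corrcoef(avg, template)[0, 1]
            if score > best_score:
                best_key = f"{PITCH_CLASSES[tonic]} {mode}"
                best_score = score
    return best_key, best_score
```

Rotating one major and one minor profile through the 12 tonics yields all 24 key templates, so no separate template table is needed.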
Chord Recognition
• Estimate chords from music data
- Typically, one of 24 chords: 12 pitch classes × major/minor
- Often, diminished chords are added (36 chords)
• General framework
[Block diagram: Audio; Transform; Chroma Features; Decision Making (Template Matching, HMM, SVM); Chord Template or Models; output: Chords]
Template-Based Approach
• Use chord templates (Fujishima, 1999; Harte and Sandler, 2005) and find the best matches
• Chord templates
[Figure from Bello's slides]
Template-Based Approach
• Compute the cross-correlation between the chroma features and the chord templates, and select the chords that have the maximum values
[Figure from Bello's slides]
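The matching step can be sketched with binary triad templates as below. The function names, the chord label format (`C:maj`, `A:min`), and the unit-norm scaling of both templates and chroma frames are assumptions made for this example rather than details from the cited papers.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Return 24 chord names and a 24 x 12 matrix of binary triad templates."""
    names, templates = [], []
    for quality, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
        for root in range(12):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0   # root, third, fifth
            names.append(f"{PITCH_CLASSES[root]}:{quality}")
            templates.append(t)
    return names, np.array(templates)

def recognize_chords(chroma):
    """Pick the best-matching template for each column of a 12 x n_frames chroma."""
    names, templates = chord_templates()
    t = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    c = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-9)
    scores = t @ c                                  # 24 x n_frames similarity
    return [names[i] for i in scores.argmax(axis=0)]
```

Because each frame is decided independently, the output can flip between chords from frame to frame, which is the limitation that motivates the temporal models discussed later.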
Review
• The template approach is too simple
- The binary templates are hard assignments
• We can use a multi-class classifier
- The output is one of the target chords
- However, frame-wise local estimates tend not to be temporally smooth
• We need an algorithm that considers the temporal dependency between chords
- The majority of tonal music follows certain types of chord progression
Hidden Markov Model (HMM)
• A probabilistic model for time series data
- Speech, gesture, DNA sequence, financial data, weather data, …
• Assumes that the time series data are generated from hidden states and that the hidden states follow a Markov model
• Learning-based approach
- Needs training data annotated with labels
- The labels usually correspond to hidden states
Markov Model
• A random variable $q$ has $N$ states ($S_1, S_2, \ldots, S_N$) and, at each time step, one of the states is randomly chosen: $q_t \in \{S_1, S_2, \ldots, S_N\}$
• The probability distribution for the current state is determined by the previous state(s)
- First-order: $P(q_t \mid q_1, q_2, \ldots, q_{t-1}) = P(q_t \mid q_{t-1})$
- Second-order: $P(q_t \mid q_1, q_2, \ldots, q_{t-1}) = P(q_t \mid q_{t-1}, q_{t-2})$
• The first-order Markov model is widely used for simplicity
Markov Model
• Example: chord progression
- $q_t \in \{C, F, G\}$
- The transition probability matrix is 3 by 3:
  $P(q_t = C \mid q_{t-1} = C) = 0.7$, $P(q_t = F \mid q_{t-1} = C) = 0.1$, $P(q_t = G \mid q_{t-1} = C) = 0.2$
  $P(q_t = C \mid q_{t-1} = F) = 0.2$, $P(q_t = F \mid q_{t-1} = F) = 0.6$, $P(q_t = G \mid q_{t-1} = F) = 0.2$
  $P(q_t = C \mid q_{t-1} = G) = 0.3$, $P(q_t = F \mid q_{t-1} = G) = 0.1$, $P(q_t = G \mid q_{t-1} = G) = 0.6$
[Figure: state transition diagram with nodes St, C, F, G, End]
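The nine transition probabilities on this slide can be stored as a row-stochastic matrix. The sketch below uses the slide's values; the row/column convention (rows index the previous chord, columns the current chord) and the helper name `transition_prob` are choices made for this example.

```python
import numpy as np

STATES = ["C", "F", "G"]
# A[i, j] = P(q_t = STATES[j] | q_{t-1} = STATES[i])
A = np.array([
    [0.7, 0.1, 0.2],   # from C
    [0.2, 0.6, 0.2],   # from F
    [0.3, 0.1, 0.6],   # from G
])

def transition_prob(prev, cur):
    """Look up P(q_t = cur | q_{t-1} = prev) by chord name."""
    return A[STATES.index(prev), STATES.index(cur)]

# Each row is a probability distribution over the next chord, so it sums to 1.
assert np.allclose(A.sum(axis=1), 1.0)
```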
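Markov Model
• The joint probability of a sequence of states is simple with the (first-order) Markov model
$$
\begin{aligned}
P(q_1, q_2, \ldots, q_t) &= P(q_1, q_2, \ldots, q_{t-1}) \, P(q_t \mid q_1, q_2, \ldots, q_{t-1}) \\
&= P(q_1, q_2, \ldots, q_{t-1}) \, P(q_t \mid q_{t-1}) \\
&= P(q_1, q_2, \ldots, q_{t-2}) \, P(q_{t-1} \mid q_1, q_2, \ldots, q_{t-2}) \, P(q_t \mid q_{t-1}) \\
&= P(q_1, q_2, \ldots, q_{t-2}) \, P(q_{t-1} \mid q_{t-2}) \, P(q_t \mid q_{t-1}) \\
&= P(q_1) \, P(q_2 \mid q_1) \cdots P(q_{t-1} \mid q_{t-2}) \, P(q_t \mid q_{t-1})
\end{aligned}
$$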
What Can We Do with the Markov Model?
• Generate a chord sequence
- E.g., C - C - C - C - F - F - C - C - G - G - C - C - …
- We can also generate a melody if we define the transition probability matrix among notes
• Evaluate whether a specific chord progression is more likely than others
- For example, C-G-C is more likely than C-F-C (assuming $P(q_1 = C) = 1$)
  $P(q = C, G, C) = P(q_1 = C) \, P(q_2 = G \mid q_1 = C) \, P(q_3 = C \mid q_2 = G) = 0.2 \times 0.3 = 0.06$
  $P(q = C, F, C) = P(q_1 = C) \, P(q_2 = F \mid q_1 = C) \, P(q_3 = C \mid q_2 = F) = 0.1 \times 0.2 = 0.02$
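Both uses on this slide, scoring a progression and generating one, can be sketched with the example transition probabilities for C, F, and G. The function names, the `initial` vector, and the fixed random seed are assumptions for illustration.

```python
import numpy as np

STATES = ["C", "F", "G"]
# A[i, j] = P(q_t = STATES[j] | q_{t-1} = STATES[i]), values from the chord example
A = np.array([[0.7, 0.1, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.1, 0.6]])

def sequence_probability(chords, initial):
    """P(q_1) * product of P(q_t | q_{t-1}) over the sequence."""
    p = initial[STATES.index(chords[0])]
    for prev, cur in zip(chords[:-1], chords[1:]):
        p *= A[STATES.index(prev), STATES.index(cur)]
    return p

def generate(n, initial, seed=0):
    """Sample a chord sequence of length n from the Markov model."""
    rng = np.random.default_rng(seed)
    state = rng.choice(len(STATES), p=initial)
    chords = [STATES[state]]
    for _ in range(n - 1):
        state = rng.choice(len(STATES), p=A[state])   # row = distribution of next chord
        chords.append(STATES[state])
    return chords

initial = np.array([1.0, 0.0, 0.0])                   # assume the sequence starts on C
print(sequence_probability(["C", "G", "C"], initial)) # approximately 0.06
print(sequence_probability(["C", "F", "C"], initial)) # approximately 0.02
print(generate(12, initial))
```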
What Can We Do with a Markov Model?
• Compute the probability that the chord at time $T$ is C (or F or G)
- Naïve method: count all paths that have a C chord at time $T$: exponential!
- Clever method: use recursive induction
- $P(q_T = C) = P(q_T = C \mid q_{T-1} = C) \, P(q_{T-1} = C) + P(q_T = C \mid q_{T-1} = F) \, P(q_{T-1} = F) + P(q_T = C \mid q_{T-1} = G) \, P(q_{T-1} = G)$
- Repeat this for $P(q_i = C)$, $P(q_i = F)$, $P(q_i = G)$ for $i = T-1, T-2, \ldots, 1$
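The recursion can be run forward from the initial distribution as a repeated vector-matrix product, which costs time linear in $T$ instead of enumerating all paths. The sketch below reuses the example transition probabilities; the function name `state_marginals` and the choice of starting on C are assumptions.

```python
import numpy as np

STATES = ["C", "F", "G"]
# A[i, j] = P(q_t = STATES[j] | q_{t-1} = STATES[i]), values from the chord example
A = np.array([[0.7, 0.1, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.1, 0.6]])

def state_marginals(initial, A, T):
    """Return the vector [P(q_T = C), P(q_T = F), P(q_T = G)]."""
    p = np.asarray(initial, dtype=float)       # distribution of q_1
    for _ in range(T - 1):
        # P(q_t = j) = sum_i P(q_t = j | q_{t-1} = i) * P(q_{t-1} = i)
        p = p @ A
    return p

initial = [1.0, 0.0, 0.0]                      # assume the sequence starts on C
print(dict(zip(STATES, state_marginals(initial, A, T=3))))
```

For $T = 3$ starting on C, the result for C is $0.7 \times 0.7 + 0.1 \times 0.2 + 0.2 \times 0.3 = 0.57$, which matches summing over the three possible chords at time 2.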
Chord Recognition from Audio
• What we observe are not chords but audio features (e.g., chroma)
• We want to infer a chord sequence from audio feature sequences
- Hidden chord states: $q_1, q_2, \ldots, q_{t-1}$
- Observed features: $O_1, O_2, \ldots, O_{t-1}$
Hidden Markov Model (HMM)
• The hidden states follow the Markov model
• Given a state, the corresponding observation distribution is independent of previous states or observations
- Each state has an emission distribution
[Figure: chord states C, F, G; hidden states $q_{t-1}, q_t, q_{t+1}$ emit observations $O_{t-1}, O_t, O_{t+1}$; emission distributions $P(O \mid q_t = C)$, $P(O \mid q_t = F)$, $P(O \mid q_t = G)$]
Hidden Markov Model (HMM)
• Model parameters
- Initial state probabilities: $P(q_0) \rightarrow \pi_i$
- Transition probability matrix: $P(q_t \mid q_{t-1}) \rightarrow a_{ij}$
- Observation distribution given a state: $P(O \mid q_j) \rightarrow b_j$ (e.g., Gaussian)
• How can we learn the parameters from data?
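When every frame of the training data carries a chord label, one simple option is to estimate the three parameter sets by counting and averaging. The sketch below is only that supervised case under several assumptions: a single training sequence, diagonal-covariance Gaussian emissions, a small smoothing constant `eps`, and label frequencies standing in for the initial probabilities. The function name `fit_supervised_hmm` is hypothetical, and this is not presented as the lecture's method.

```python
import numpy as np

def fit_supervised_hmm(features, labels, n_states, eps=1e-3):
    """Estimate (pi, A, means, variances) from frame-level labels.

    features: (n_frames, dim) array, e.g. chroma vectors
    labels:   (n_frames,) integer state index per frame
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # pi_i: approximated by smoothed label frequencies (assumption, single sequence)
    pi = np.bincount(labels, minlength=n_states) + eps
    pi = pi / pi.sum()
    # a_ij: smoothed counts of transitions from state i to state j
    A = np.full((n_states, n_states), eps)
    np.add.at(A, (labels[:-1], labels[1:]), 1.0)
    A /= A.sum(axis=1, keepdims=True)
    # b_j: per-state Gaussian with diagonal covariance
    dim = features.shape[1]
    means = np.zeros((n_states, dim))
    variances = np.ones((n_states, dim))
    for j in range(n_states):
        frames = features[labels == j]
        if len(frames) > 0:
            means[j] = frames.mean(axis=0)
            variances[j] = frames.var(axis=0) + eps
    return pi, A, means, variances
```

Without frame-level labels the states are genuinely hidden and the parameters have to be estimated iteratively, which this sketch does not cover.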