
GCT535 - Sound Technology for Multimedia: Music and Audio Alignment



  1. GCT535 - Sound Technology for Multimedia: Music and Audio Alignment. Graduate School of Culture Technology, KAIST. Juhan Nam

  2. Outline § Musical Representations – Score, Audio, MIDI § Music and Audio Alignment – Synchronization Framework – Dynamic Time Warping – Dynamic Programming

  3. Music Representations § Score – Abstract symbols of musical events § Audio – Concrete (or actual) renditions of the score as sound § MIDI – A series of events • Note messages: note on/off (onset and offset), note number, note velocity • Control messages: pedal on/off, pitch wheel, modulation, … – Can be either a score-like abstract event sequence or a recording of note/control events from an actual performance

  4. Symbols and Performances § MIDI (score) § Valentina Lisitsa § Vladimir Horowitz

  5. Where do the differences come from? § Musical expressions – Temporal: ritardando, rubato – Dynamics: piano, forte, crescendo, … – Playing techniques: legato, staccato – Mood and emotion: dolce, grazioso § Different styles of performers – Temporal: tempo (global) and note onset/offset timings (local) – Dynamics § Moreover… – Variations in key, rhythm, chord, melody, instrumentation (e.g., cover songs) – Tuning

  6. Music Synchronization § Temporally align different representations of a piece of music – Audio to audio – Audio to score § Why do we synchronize them? – Score following – Auto-accompaniment – Related • Variable time-stretching • Audio classification [from M. Müller's book]

  7. Synchronization Framework § Choose feature representations to compare – Often, MIDI is converted to audio so that both are compared in the same feature space § Compute a similarity matrix between the two feature sequences – All possible combinations of local feature pairs § Find a path that gives the best alignment on the similarity matrix – Dynamic Time Warping (DTW) [Diagram: Feature Seq. #1 and Feature Seq. #2 → compute local similarity → Similarity Matrix → find the best path (Dynamic Programming)]

  8. Feature Representations § Common choices of audio feature representations – Spectrogram, chroma, MFCC, … [Figure: CENS (normalized chroma) features of the MIDI and Lisitsa versions (Müller, 2005)]
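To make the CENS idea concrete, here is a minimal NumPy sketch that turns an existing chroma matrix of shape (12, n_frames) into CENS-style features. It assumes the commonly described recipe (per-frame l1 normalization, quantization with thresholds 0.05/0.1/0.2/0.4, Hann smoothing over time, downsampling, l2 normalization); the function name `cens` and the defaults `win_len=41`, `downsample=10` are illustrative choices, not a reference implementation of Müller (2005).

```python
import numpy as np

def cens(chroma, win_len=41, downsample=10):
    """CENS-style features from a chroma matrix of shape (12, n_frames).

    Hypothetical helper; assumes n_frames >= win_len so that the
    'same'-mode convolution keeps the original number of frames.
    """
    chroma = np.asarray(chroma, dtype=float)
    # 1) l1-normalize each frame so that its 12 energies sum to 1
    chroma = chroma / np.maximum(chroma.sum(axis=0, keepdims=True), 1e-12)
    # 2) coarse quantization: values >= 0.05, 0.1, 0.2, 0.4 map to 1, 2, 3, 4
    q = np.zeros_like(chroma)
    for threshold in (0.05, 0.1, 0.2, 0.4):
        q += chroma >= threshold
    # 3) smooth each chroma band over time with a Hann window (no zero endpoints)
    win = np.hanning(win_len + 2)[1:-1]
    smoothed = np.array([np.convolve(band, win, mode="same") for band in q])
    # 4) downsample in time and l2-normalize each remaining frame
    out = smoothed[:, ::downsample]
    return out / np.maximum(np.linalg.norm(out, axis=0, keepdims=True), 1e-12)
```

The coarse quantization and smoothing make the features robust to local tempo and dynamics differences, which is why such features suit the comparison of the MIDI and Lisitsa versions.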

  9. Similarity Matrix § M by N matrix – The (i, j) element is the similarity between the i-th vector of an M-long feature sequence and the j-th vector of an N-long feature sequence
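A minimal sketch of filling this M by N matrix, assuming cosine similarity as the local measure (one common choice; the slides do not fix one) and feature vectors stored as rows. The function name `cosine_similarity_matrix` is hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(X, Y):
    """S[i, j] = cosine similarity of X[i] and Y[j].

    X: (M, d) feature sequence, Y: (N, d) feature sequence -> S: (M, N).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    eps = 1e-12  # guards against all-zero (silent) frames
    X_unit = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    Y_unit = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + eps)
    # one matrix product evaluates all M * N local feature pairs at once
    return X_unit @ Y_unit.T
```

Since DTW minimizes a cost rather than maximizing a similarity, a local cost matrix such as `1 - S` can be passed on to the path search.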

  10. Finding the Optimal Path § You can move in only three directions – Up, right, diagonal § The number of possible paths for an M by N matrix is ??? [Figure: similarity matrix; vertical axis: Schumann - Traumerei - MIDI, horizontal axis: Schumann - Traumerei - Lisitsa]
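One way to get a feel for the question is to count the paths for small matrices. The sketch below counts monotone paths from cell (1, 1) to (M, N) that use only up, right, and diagonal steps (these counts are known as Delannoy numbers); the function name is hypothetical.

```python
def count_warping_paths(M, N):
    """Number of paths from cell (1, 1) to (M, N) using up, right, and diagonal steps."""
    # paths[m][n] counts the paths that end at cell (m + 1, n + 1)
    paths = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            if m == 0 or n == 0:
                paths[m][n] = 1  # along the first row or column there is only one way
            else:
                # a cell is reached from below, from the left, or diagonally
                paths[m][n] = paths[m - 1][n] + paths[m][n - 1] + paths[m - 1][n - 1]
    return paths[M - 1][N - 1]

for size in (3, 10, 50):
    print(f"{size} x {size}: {count_warping_paths(size, size)} paths")
```

A 3 by 3 matrix already admits 13 paths, and the count grows exponentially with the matrix size, so enumerating every path is hopeless for real recordings.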

  11. 3D Surface Plot of the Similarity Matrix § Finding the optimal path is like figuring out the trail route that takes the least effort when hiking.

  12. Dynamic Time Warping § Finding an (N, M)-warping path of length L – $P = (p_1, p_2, \dots, p_L)$ where $p_l = (n_l, m_l)$ § Three conditions – Boundary condition: $p_1 = (1,1)$, $p_L = (N,M)$ – Monotonicity condition • $n_1 \le n_2 \le \dots \le n_L$ • $m_1 \le m_2 \le \dots \le m_L$ – Step size condition • Move only upward, rightward, or diagonally (upper-right), i.e., $p_{l+1} - p_l \in \{(1,0), (0,1), (1,1)\}$
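The three conditions can be checked mechanically. The sketch below takes a path as a list of 1-based (n, m) cells; the function name is hypothetical. Because every allowed step is non-negative in both coordinates, the step size check also enforces monotonicity.

```python
def is_valid_warping_path(path, N, M):
    """Check the boundary, monotonicity, and step size conditions for 1-based cells (n, m)."""
    # boundary condition: start at (1, 1) and end at (N, M)
    if not path or path[0] != (1, 1) or path[-1] != (N, M):
        return False
    # step size condition; all allowed steps are non-negative in both
    # coordinates, so this also rules out monotonicity violations
    allowed_steps = {(1, 0), (0, 1), (1, 1)}
    for (n_prev, m_prev), (n_next, m_next) in zip(path, path[1:]):
        if (n_next - n_prev, m_next - m_prev) not in allowed_steps:
            return False
    return True

print(is_valid_warping_path([(1, 1), (2, 2), (2, 3), (3, 3)], 3, 3))  # True
print(is_valid_warping_path([(1, 1), (3, 3)], 3, 3))                  # False: step too large
```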

  13. Dynamic Time Warping: Bad Examples

  14. Dynamic Programming § Finding the minimum-cost path § Naïve approach – Find all paths from A to K and calculate the cost of each – Choose the path that has the minimum cost – However, as the number of nodes increases, the number of paths increases exponentially. [Figure: example graph with nodes A to K and numeric costs]
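The naïve approach can be sketched on a hypothetical layered graph (not the A to K graph from the slide, whose exact costs did not survive). The sketch assumes every node in one stage connects to every node in the next; `node_cost` and `trans_cost` are illustrative names.

```python
from itertools import product

def brute_force_min_cost(node_cost, trans_cost):
    """Naive search: enumerate every path through a layered graph, keep the cheapest.

    node_cost[k][j]: cost of node j at stage k
    trans_cost[k-1][i][j]: cost of the edge from node i (stage k-1) to node j (stage k)
    Returns (minimum cost, node index per stage, number of paths examined).
    """
    best_total, best_path, n_paths = float("inf"), None, 0
    # one node per stage: len(stage_0) * len(stage_1) * ... candidate paths
    for path in product(*[range(len(stage)) for stage in node_cost]):
        n_paths += 1
        total = node_cost[0][path[0]]
        for k in range(1, len(node_cost)):
            total += trans_cost[k - 1][path[k - 1]][path[k]] + node_cost[k][path[k]]
        if total < best_total:
            best_total, best_path = total, list(path)
    return best_total, best_path, n_paths

# hypothetical graph: start node, two middle nodes, end node
print(brute_force_min_cost([[0], [1, 2], [0]], [[[2, 1]], [[3], [1]]]))  # (4, [0, 1, 0], 2)
```

With K stages of W nodes each, the loop visits W**K paths, which is the exponential blow-up the slide warns about.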

  15. Dynamic Programming § Observation – Say the minimum-cost path passes through a node p. – What is the minimum-cost path from A to p? – It is just a sub-path of the minimum-cost path from A to K. – Thus, we don't have to compute the cost from scratch; we can reuse the costs already computed for the previous nodes. [Figure: the same example graph with nodes A to K]

  16. Dynamic Programming § The minimum cost is computed by the following equation: $C_k(j) = O_k(j) + \min_i \{ C_{k-1}(i) + c_{ij} \}$ – $C_k(j)$: cost up to node j (at stage k) – $O_k(j)$: local cost at node j – $c_{ij}$: transition cost from i to j § The minimum-cost path can be found by tracing back the computation [Figure: the same example graph with nodes A to K]
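The recursion above translates almost line by line into code. This is a minimal sketch for a hypothetical layered graph in which stage k's nodes are indexed by j; `node_cost` holds $O_k(j)$, `trans_cost[k-1][i][j]` holds $c_{ij}$, and both names are illustrative. A back-pointer table records the minimizing i so the path can be traced back.

```python
def min_cost_path(node_cost, trans_cost):
    """Stage-wise DP: C_k(j) = O_k(j) + min_i {C_{k-1}(i) + c_ij}.

    node_cost[k][j]      : O_k(j), local cost of node j at stage k
    trans_cost[k-1][i][j]: c_ij, cost from node i (stage k-1) to node j (stage k)
    Returns the minimum total cost and the chosen node index at every stage.
    """
    C = [list(node_cost[0])]             # C_0(j) = O_0(j)
    back = [[None] * len(node_cost[0])]  # best predecessor of each node
    for k in range(1, len(node_cost)):
        C_k, back_k = [], []
        for j, O_kj in enumerate(node_cost[k]):
            # reuse the stored C_{k-1}(i) instead of re-enumerating whole paths
            best_i = min(range(len(C[k - 1])),
                         key=lambda i: C[k - 1][i] + trans_cost[k - 1][i][j])
            C_k.append(O_kj + C[k - 1][best_i] + trans_cost[k - 1][best_i][j])
            back_k.append(best_i)
        C.append(C_k)
        back.append(back_k)
    # trace back from the cheapest node of the last stage
    j = min(range(len(C[-1])), key=lambda idx: C[-1][idx])
    total, path = C[-1][j], [j]
    for k in range(len(node_cost) - 1, 0, -1):
        j = back[k][j]
        path.append(j)
    return total, path[::-1]

print(min_cost_path([[0], [1, 2], [0]], [[[2, 1]], [[3], [1]]]))  # (4, [0, 1, 0])
```

Each node is visited once per predecessor, so the cost grows with the number of edges rather than the number of complete paths.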

  17. DP for Dynamic Time Warping (DTW) § Algorithm – Initialization: $C(n,1) = \sum_{k=1}^{n} O(k,1)$ for $n = 1, \dots, N$; $C(1,m) = \sum_{k=1}^{m} O(1,k)$ for $m = 1, \dots, M$ – Recurrence relation: for each $m = 2, \dots, M$ and each $n = 2, \dots, N$: $C(n,m) = O(n,m) + \min\{C(n-1,m),\ C(n,m-1),\ C(n-1,m-1)\}$ – Termination: $C(N,M)$ is the DTW distance
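A minimal NumPy sketch of the same initialization, recurrence, and termination, plus the backtracking step that recovers the path. It uses 0-based indices (the slide is 1-based) and assumes the local cost matrix `O` is already computed; the function name `dtw` and the toy sequences are illustrative.

```python
import numpy as np

def dtw(O):
    """Accumulated cost C and optimal warping path for a local cost matrix O (N x M)."""
    N, M = O.shape
    C = np.zeros((N, M))
    # initialization: first column and first row are running sums of the local cost
    C[:, 0] = np.cumsum(O[:, 0])
    C[0, :] = np.cumsum(O[0, :])
    # recurrence: cheapest of the three allowed predecessors
    for m in range(1, M):
        for n in range(1, N):
            C[n, m] = O[n, m] + min(C[n - 1, m], C[n, m - 1], C[n - 1, m - 1])
    # backtracking from the end cell (termination: C[N-1, M-1] is the distance)
    n, m = N - 1, M - 1
    path = [(n, m)]
    while (n, m) != (0, 0):
        if n == 0:
            m -= 1
        elif m == 0:
            n -= 1
        else:
            n, m = min([(n - 1, m - 1), (n - 1, m), (n, m - 1)], key=lambda p: C[p])
        path.append((n, m))
    return C, path[::-1]

x = np.array([1, 2, 3, 3, 5])
y = np.array([1, 2, 3, 5])
C, path = dtw(np.abs(x[:, None] - y[None, :]))  # absolute difference as local cost
print(C[-1, -1])  # 0.0
print(path)       # [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)]
```

The repeated 3 in `x` is absorbed by a vertical step, so the two sequences align at zero cost.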

  18. DP for Dynamic Time Warping (DTW) § Toy Example

  19. Score and Audio Alignment by DTW [Figure panels: accumulated cost C(i,j) and local cost O(i,j)]

  20. Limitations § The optimal path is obtained only after we arrive at the destination (by backtracking) – i.e., DTW works offline – What if the sequences are very long? – Online version of DTW? § Every frame is treated as equally important – In general, humans are more sensitive to note onsets – Perceptually, not every frame is equally important

  21. Online DTW § Set a moving search window and calculate the cost only within the window – Time and space cost: quadratic → linear § The movement is determined by the position that gives the minimum cost within the current window. If the position is ... – Corner: move both up and right (alternatively) – Upper edge: move up – Right edge: move right [Figure 2: An example of the on-line time warping algorithm with search window c = 4, showing the order of evaluation for a particular sequence of row and column increments. The axes represent the variables t and j (see Figure 1) respectively. All calculated cells are framed in bold, and the optimal path is coloured grey.] [Dixon, 2005]
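The window-movement rule can be sketched as follows. This is a simplified illustration in the spirit of Dixon (2005), not his algorithm verbatim. Assumptions: Euclidean local cost; the unweighted recurrence of slide 17 (the paper may weight steps differently); raw accumulated costs when searching for the minimum; a hypothetical `max_run` limit on consecutive moves in one direction; and a dense cost array kept for readability, so memory is not actually linear here. The function and parameter names are hypothetical.

```python
import numpy as np

def online_dtw(X, Y, c=4, max_run=3):
    """Simplified on-line time warping sketch.

    X: live input frames (index t); Y: reference frames (index j).
    Only cells within c frames of the current head (t, j) are evaluated.
    Returns the head positions visited and the accumulated cost at the end.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    T, J = len(X), len(Y)
    D = np.full((T, J), np.inf)  # dense for readability; a real system keeps only the window

    def evaluate(t, j):
        # assumption: Euclidean local cost, unweighted three-way recurrence
        d = np.linalg.norm(X[t] - Y[j])
        if t == 0 and j == 0:
            D[t, j] = d
            return
        best_prev = min(D[t - 1, j] if t > 0 else np.inf,
                        D[t, j - 1] if j > 0 else np.inf,
                        D[t - 1, j - 1] if t > 0 and j > 0 else np.inf)
        D[t, j] = d + best_prev

    def get_inc(t, j, previous, run_count):
        if t == T - 1:
            return "j"     # input exhausted: only the reference can advance
        if j == J - 1:
            return "t"     # reference exhausted: only the input can advance
        if t < c:
            return "both"  # fill the initial window diagonally
        if run_count > max_run:
            return "j" if previous == "t" else "t"  # avoid running off in one direction
        # minimum accumulated cost on the current row t or column j inside the window
        cells = [(t, k) for k in range(max(0, j - c + 1), j + 1)]
        cells += [(k, j) for k in range(max(0, t - c + 1), t + 1)]
        x, y = min(cells, key=lambda p: D[p])
        if x < t:
            return "j"     # minimum on the column edge: advance the reference
        if y < j:
            return "t"     # minimum on the row edge: advance the input
        return "both"      # minimum at the corner: advance both

    t = j = 0
    evaluate(0, 0)
    previous, run_count = None, 1
    path = [(0, 0)]
    while t < T - 1 or j < J - 1:
        inc = get_inc(t, j, previous, run_count)
        if inc != "j":  # advance input: evaluate a new row inside the window
            t += 1
            for k in range(max(0, j - c + 1), j + 1):
                evaluate(t, k)
        if inc != "t":  # advance reference: evaluate a new column inside the window
            j += 1
            for k in range(max(0, t - c + 1), t + 1):
                evaluate(k, j)
        run_count = run_count + 1 if inc == previous else 1
        if inc != "both":
            previous = inc
        path.append((t, j))
    return path, D[T - 1, J - 1]
```

Each step evaluates at most 2c new cells, so the work per incoming frame stays bounded by the window size rather than the full sequence length.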

  22. Onset-sensitive Alignment § We are sensitive to the time alignment of note onsets. – The similarity matrix gives no additional weight to onsets § DLNCO features – Decaying Locally-adapted Normalized Chroma Onset – Capture only the onset strength in chroma features – Normalize onset energy and note length (by an artificially created note tail) [Ewert, 2009]
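The following is a heavily simplified, DLNCO-inspired sketch, not Ewert's pipeline (which derives onsets from a pitch-based representation with local adaptive normalization). It assumes a chroma matrix of shape (12, n_frames), keeps only energy increases, normalizes each onset frame, and attaches an exponentially decaying tail. The function name and the `tail_len` and `decay` values are hypothetical.

```python
import numpy as np

def chroma_onset_features(chroma, tail_len=10, decay=0.7):
    """Onset-emphasizing chroma features (simplified sketch; chroma: (12, n_frames))."""
    chroma = np.asarray(chroma, dtype=float)
    n_frames = chroma.shape[1]
    # 1) keep only energy increases over time (half-wave rectified difference)
    onset = np.maximum(np.diff(chroma, axis=1, prepend=chroma[:, :1]), 0.0)
    # 2) l2-normalize each frame containing an onset, so soft and loud onsets count alike
    norms = np.linalg.norm(onset, axis=0, keepdims=True)
    onset = onset / np.where(norms > 0, norms, 1.0)
    # 3) attach an artificial exponentially decaying tail to every onset (causal)
    kernel = decay ** np.arange(tail_len)
    return np.array([np.convolve(band, kernel)[:n_frames] for band in onset])
```

Because sustained energy produces no difference, a long note and a short note with the same onset yield the same feature shape, which pushes the alignment toward the onset times.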

  23. Onset-sensitive Alignment § Demo: https://www.audiolabs-erlangen.de/resources/MIR/SyncRWC60 – Score following results on the RWC dataset

  24. DTW in Matlab § Check out: http://labrosa.ee.columbia.edu/matlab/dtw/

  25. Beat Tracking using Dynamic Programming § Find the optimal "hopping" path that accords with the onset detection function and the estimated tempo: $C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)$ – $O(t)$ is the onset detection function – $\tau_p$ is the estimated beat period – $F(\Delta t, \tau)$ is the temporal consistency score: $F(\Delta t, \tau) = -\left(\log \frac{\Delta t}{\tau}\right)^2$ § Recast it as dynamic programming – Maximize the following equation: $C^*(t) = O(t) + \max_{\tau} \{ \alpha F(t - \tau, \tau_p) + C^*(\tau) \}$
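A minimal NumPy sketch of the recursion for $C^*(t)$ with backtracking. It is not a port of the Matlab code linked on the next slide. Assumptions: the onset envelope and the beat period `period` ($\tau_p$, in frames) are given; previous beats are searched in $[t - 2\tau_p,\ t - \tau_p/2]$; candidates before the signal start act as a virtual first beat with score 0; `alpha = 100` is an illustrative weight; the function name is hypothetical.

```python
import numpy as np

def dp_beat_track(onset_env, period, alpha=100.0):
    """Beat tracking by DP: C*(t) = O(t) + max_tau {alpha F(t - tau, tau_p) + C*(tau)}."""
    O = np.asarray(onset_env, dtype=float)
    assert period >= 2, "period must be at least 2 frames"
    min_gap, max_gap = int(round(period / 2)), int(round(2 * period))
    n = len(O)
    C = np.zeros(n)                       # C*(t): best score of a beat sequence ending at t
    backlink = np.full(n, -1, dtype=int)  # previous beat of that sequence (-1: none)
    for t in range(n):
        prev = np.arange(t - max_gap, t - min_gap + 1)  # candidate previous beats tau
        # candidates before the start act as a virtual first beat with score 0
        past = np.where(prev >= 0, C[np.clip(prev, 0, None)], 0.0)
        F = -(np.log((t - prev) / period) ** 2)         # temporal consistency F(t - tau, tau_p)
        scores = alpha * F + past
        best = int(np.argmax(scores))
        C[t] = O[t] + scores[best]
        backlink[t] = prev[best] if prev[best] >= 0 else -1
    # start from the best-scoring frame within the last beat period and trace back
    start = max(0, n - int(round(period)))
    t = start + int(np.argmax(C[start:]))
    beats = [t]
    while backlink[t] >= 0:
        t = int(backlink[t])
        beats.append(t)
    return beats[::-1]

onsets = np.zeros(100)
onsets[5::10] = 1.0  # synthetic impulses every 10 frames
print(dp_beat_track(onsets, period=10))  # [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
```

The penalty $F$ is zero only when the gap equals $\tau_p$ and grows with the log-ratio, so a larger `alpha` enforces a steadier tempo at the expense of following the onsets.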

  26. Beat Tracking By DP in Matlab § Check out: http://www.ee.columbia.edu/ln/rosa/matlab/beat_simple/

  27. Applications § Performance analysis – Understanding human performances • e.g., "In Search of the Horowitz Factor" (G. Widmer, 2003) – Performance evaluation for music education and entertainment § Interactive music notation systems – Score following: tracking notes or measures – Automatic page turner – Score-synchronized music listening § Auto-accompaniment – Roger Dannenberg's work – IRCAM Antescofo – Sonation Cadenza

  28. Applications [Figure panels: Interpretation Switcher, Score viewer]

  29. References • S. Dixon, "Live Tracking of Musical Performance Using On-line Time Warping," 2005 • G. Widmer, "In Search of the Horowitz Factor," 2003 • S. Ewert, "High Resolution Audio Synchronization Using Chroma Onset Features," 2009
