
Identifying Repeated Patterns in Music Using Sparse Convolutive Non-Negative Matrix Factorization

ISMIR 2010. Ron Weiss, Juan Bello ({ronw,jpbello}@nyu.edu), Music and Audio Research Lab, New York University. August 10, 2010.


  1. Identifying Repeated Patterns in Music Using Sparse Convolutive Non-Negative Matrix Factorization. ISMIR 2010. Ron Weiss, Juan Bello ({ronw,jpbello}@nyu.edu), Music and Audio Research Lab (MARL), New York University. August 10, 2010.

  2. Repetitive patterns in music. Repetition is ubiquitous in music: long-term verse-chorus structure, repeated motifs. Can we identify this structure directly from audio? What about the repeated units?

  3. Proposed approach. Treat a song as a concatenation of short, repeated template patterns. Inspired by source separation and text topic modeling. Convolutive Non-negative Matrix Factorization (NMF).

  4. Beat-synchronous chroma features [Ellis and Poliner, 2007]. Summarize energy at each pitch class during each beat. Normalize frame energy to ignore dynamics. [Figure: beat-synchronous chromagram of "Day Tripper"; x-axis: time (beats), 0 to 250; y-axis: pitch class, A to G; color scale 0.0 to 1.0.]
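
The two steps on this slide (summarize per beat, then normalize) can be sketched as follows. This is a minimal illustration rather than the authors' feature extractor: the function name, the inputs `chroma` and `beat_frames`, the use of the mean as the per-beat summary, and the sum-to-one normalization are all assumptions, since the slide does not specify them.

```python
import numpy as np

def beat_sync_chroma(chroma, beat_frames):
    """Average frame-level chroma within each beat, then normalize each beat.

    chroma: (12, n_frames) non-negative array (hypothetical input).
    beat_frames: increasing frame indices where beats start (hypothetical input).
    """
    # Segment boundaries: start of signal, every beat, end of signal.
    bounds = np.unique(np.concatenate(([0], beat_frames, [chroma.shape[1]])))
    # Summarize the energy of each pitch class over each beat (assumed: mean).
    cols = [chroma[:, s:e].mean(axis=1) for s, e in zip(bounds[:-1], bounds[1:])]
    V = np.stack(cols, axis=1)
    # Normalize each beat column so that overall loudness is ignored
    # (assumed: columns sum to one).
    norms = V.sum(axis=0, keepdims=True)
    return V / np.maximum(norms, 1e-12)
```

The result has one column per beat, which is the matrix V that the later slides decompose.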

  5. SI-PLCA [Smaragdis and Raj, 2007]. Shift-invariant Probabilistic Latent Component Analysis, i.e. probabilistic convolutive NMF: $V \approx \sum_k W_k * h_k \, z_k$. Decompose matrix $V$ into a weighted (by $Z$) sum of latent components; each component is the convolution of a basis $W$ with activations $H$. Short-term structure in $W$, long-term structure in $H$. Must specify the number and length of patterns. Iterative EM learning algorithm.
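
To make the model on this slide concrete, here is a minimal sketch of the reconstruction only (not the EM learning algorithm). The array layout is an assumption: `W` has shape (K, F, L) for K patterns of L beats over F pitch classes, `Z` has shape (K,), and `H` has shape (K, T). In SI-PLCA these are normalized distributions; the sketch does not enforce that, and it assumes L is no larger than T.

```python
import numpy as np

def reconstruct(W, Z, H):
    """V_hat[f, t] = sum_k Z[k] * sum_tau W[k, f, tau] * H[k, t - tau]."""
    K, F, L = W.shape
    T = H.shape[1]
    V_hat = np.zeros((F, T))
    for k in range(K):
        for tau in range(L):
            # Column tau of pattern k is placed tau beats after each activation;
            # patterns that would run past the end of the song are truncated.
            V_hat[:, tau:] += Z[k] * np.outer(W[k, :, tau], H[k, :T - tau])
    return V_hat
```

An impulse in `H[k]` at beat t pastes a copy of pattern `W[k]` starting at beat t, which is why W holds the short-term structure and H the long-term structure.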

  6. Learning algorithm example – Initialization

  7. Learning algorithm example – Converged. [Figure: V and its reconstruction at iteration 199, the mixing weights Z for bases 0 to 3, and for each basis its pattern W_k, activations H_k, and reconstruction.]

  8. Sparsity. Encourage sparse (mostly zero) parameters using prior distributions. Use an entropic prior over activations H [Smaragdis et al., 2008]: low entropy ⇒ less uniform. Leads to more meaningful patterns but reduces temporal information in activations: sparse H ⇒ dense W.
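
The link between low entropy and sparsity can be checked numerically. The sketch below only computes the entropy of an activation vector; it does not implement the entropic-prior update itself, and the function name is an assumption.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a non-negative vector, normalized to sum to one."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(nz * np.log(nz)).sum())

uniform = entropy([0.25, 0.25, 0.25, 0.25])  # activations spread evenly
peaked = entropy([0.97, 0.01, 0.01, 0.01])   # nearly all mass on one beat
```

Here `peaked` is smaller than `uniform`, so a prior that favors low entropy pushes the activations toward a few large values.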

  9. Automatic relevance determination [Tan and Févotte, 2009]. Avoid having to specify the number of patterns in advance. Initialize the decomposition with a large number of patterns. Sparse Dirichlet distribution over mixing weights Z. Discard unused patterns. [Figure: effective rank (K), on an axis from 0 to 16, versus iteration, 0 to 200.]
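
The "discard unused patterns" step can be sketched as a simple pruning pass. The function name, the `threshold` value, and the array shapes (`W`: (K, F, L), `Z`: (K,), `H`: (K, T)) are assumptions for illustration; the slide does not say how "unused" is decided, and the sketch assumes at least one pattern survives.

```python
import numpy as np

def prune_components(W, Z, H, threshold=1e-3):
    """Drop patterns whose mixing weight has shrunk to (almost) zero."""
    keep = Z > threshold                 # hypothetical cut-off for "unused"
    Z_kept = Z[keep]
    # Renormalize so the remaining mixing weights still sum to one.
    return W[keep], Z_kept / Z_kept.sum(), H[keep]
```

Calling this between iterations would reduce the effective rank K over time, as in the plot on this slide.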

  10. Sparse learning example – Initialization. [Figure: V and its reconstruction at iteration 0, the mixing weights Z for bases 0 to 15, and for each basis its pattern W_k, activations H_k, and reconstruction.]

  11. Sparse learning example – Converged. [Figure: V and its reconstruction at iteration 199, the mixing weights Z for bases 0 to 3, and for each basis its pattern W_k, activations H_k, and reconstruction.]

  12. Applications: Riff identification / Thumbnailing. Reconstruct the song using a single pattern. Sparse activations. Riff length known in advance (for now). Thumbnail corresponds to the largest activation in H. [Figure: two examples, each showing a learned chroma pattern (pitch classes A to G versus time in beats) next to its activations over the whole song (time in beats).]
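
The thumbnail rule stated on this slide (take the largest activation in H) can be sketched as below, assuming a single learned pattern. The function name and the inputs `V` (chroma, shape (F, T)), `h` (activations, shape (T,)), and `pattern_len` (riff length in beats) are assumptions.

```python
import numpy as np

def thumbnail(V, h, pattern_len):
    """Return the start beat and the excerpt of V under the largest activation."""
    start = int(np.argmax(h))            # beat where the pattern is most active
    return start, V[:, start:start + pattern_len]
```

If the largest activation falls near the end of the song, the returned excerpt is shorter than `pattern_len`.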

  13. Applications: Structure segmentation. Identify long-term song structure (verse, chorus, bridge, etc.). Assume a one-to-one mapping between chroma patterns and segments. Use the SI-PLCA decomposition with longer patterns and no prior on activations.
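
Under the one-to-one assumption on this slide, one plausible way to turn a decomposition into segment labels is to assign each beat to the pattern that contributes most to the reconstruction at that beat. This is a sketch and not necessarily the authors' procedure: the function name, the array shapes (`W`: (K, F, L), `Z`: (K,), `H`: (K, T)), and the plain per-beat argmax with no temporal smoothing are all assumptions.

```python
import numpy as np

def segment_labels(W, Z, H):
    """Label each beat with the index of the pattern contributing the most energy."""
    K, F, L = W.shape
    T = H.shape[1]
    contrib = np.zeros((K, T))
    for k in range(K):
        # Summing a pattern's reconstruction over pitch classes is the same as
        # convolving its activations with the pattern's per-beat energy profile.
        w_energy = W[k].sum(axis=0)
        contrib[k] = Z[k] * np.convolve(H[k], w_energy)[:T]
    return contrib.argmax(axis=0)
```

Segment boundaries are then the beats where the label changes, for example `np.flatnonzero(np.diff(labels)) + 1`.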

  14. Structure segmentation example. Estimated: intro, refrain, verse, refrain, verse, refrain, verse, refrain, refrain, outro. Ground truth: intro, refrain, verse, refrain, vs/break, refrain, verse, refrain, refrain, outro.
