
Applying Text-Based IR Techniques to Cover Song Identification



  1. Overview Methodology Implementation Experimental results Rhythm Conclusion - Future
     Applying Text-Based IR Techniques to Cover Song Identification
     Nicola Montecchio (nicola.montecchio@dei.unipd.it)
     Department of Information Engineering, University of Padova
     IRCAM, September 29th, 2010
     Joint work with Emanuele Di Buccio and Nicola Orio, University of Padova

  2. Introduction: characterization of the problem
     Content-based music identification in a Query by Example paradigm: retrieving music pieces that are relevant to a musical query, given as an audio recording, without using any metadata.
     In this case, relevant = cover song: a rendition of a previously recorded song in genres such as rock and pop. Cover songs can be either live or studio recordings, possibly by other musicians, and may have a completely different arrangement.
     Useful for: intellectual property rights management, recommendation systems, ...


  5. An example: Sweet Home Alabama
     - reference: Lynyrd Skynyrd
     - live: Lynyrd Skynyrd
     - cover: The Outlaws
     - cover: Jewel [in a different key]
     - live: Lynyrd Skynyrd [in another different key]
     - reference with added noise

  6. Related work

  7. Why another approach?
     Motivation:
     - some existing methods yield very high identification accuracy (e.g., Serrà, Zanin, Andrzejak at MIREX 2009) but are computationally intensive;
     - we propose a fast approach for selecting a small set of candidate matches, on which accuracy can then be refined using slower techniques;
     - we adapt techniques from text-based Information Retrieval to the music domain in order to achieve speed.

  8. Overview of the system

  9. Assumptions
     - A song is represented as a sequence of excerpts, and the order of the excerpts is not relevant.
     - Each excerpt is represented as a sequence of chroma features, and again the order of the chroma features is not taken into account.
     A song is thus represented in a bag-of-bag-of-words fashion. Although ordering information is not considered, temporal information is not completely discarded: it is loosely preserved by the grouping of chroma features into excerpts.

  10. Chroma features
      The perceived quality of a chord depends only partially on the octaves in which the individual notes are played; what seems to be relevant is the pitch class of the notes that form the chord.
      Extraction steps:
      - windowing (46 ms)
      - spectral processing
      - frequency axis "folding"
      1 minute → 1292 chroma features
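The "folding" step can be illustrated with a minimal sketch. It assumes (this is not stated on the slide) a 44.1 kHz sample rate and non-overlapping 2048-sample windows, which would be consistent with both the 46 ms window and the figure of 1292 features per minute. The function names and the choice of C as pitch class 0 are hypothetical, and the input is taken to be one magnitude spectrum already produced by the spectral processing step.

```python
import math

def fold_spectrum(magnitudes, sample_rate=44100, n_fft=2048, ref_hz=440.0):
    """Fold one magnitude spectrum into a 12-bin chroma vector (C = index 0).

    ASSUMPTION: each FFT bin is assigned to the nearest equal-tempered pitch;
    the slides do not specify the exact mapping.
    """
    chroma = [0.0] * 12
    for k, mag in enumerate(magnitudes):
        if k == 0:
            continue  # skip the DC bin, which has no pitch
        freq = k * sample_rate / n_fft
        midi = 69 + 12 * math.log2(freq / ref_hz)  # A4 = 440 Hz = MIDI 69
        chroma[round(midi) % 12] += mag            # discard the octave
    return chroma

def frames_per_minute(sample_rate=44100, window=2048):
    """Number of non-overlapping windows in one minute of audio (rounded)."""
    return round(60 * sample_rate / window)
```

Under these assumptions, energy near 440 Hz and near 880 Hz lands in the same chroma bin (pitch class A), which is the octave invariance the slide describes.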

  11. Quantization - excerpt similarity
      Hashing of chroma vectors by rank-based quantization:
      - Chroma vector: $c = (c_1, \ldots, c_{12})$
      - Rank vector: $r = (r_1, \ldots, r_{12})$, where $r_k$ is the index of the $k$-th largest value in $c$
      - Hash: $\sum_{k=1}^{K} r_k$
      The similarity of two excerpts $q_i$, $d_j$ is measured by counting (with repetitions) the number of hashes they have in common:
      $\mathrm{sim}(q_i, d_j) = |q_i \cap d_j|$
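A small sketch of rank-based quantization and of the excerpt similarity may help here. The slide only shows the hash as a sum over the first K ranks; the base-12 positional weighting used below is an assumption added so that different rank orders give different hashes, and `K=3` is likewise a hypothetical default. The similarity is a multiset intersection, computed with `collections.Counter`.

```python
from collections import Counter

def rank_vector(chroma):
    """Pitch-class indices sorted by decreasing energy (first = strongest)."""
    return sorted(range(len(chroma)), key=lambda i: -chroma[i])

def chroma_hash(chroma, K=3):
    """Hash a chroma vector from its K strongest pitch classes.

    ASSUMPTION: ranks are combined as base-12 digits; the slide shows
    only a sum over r_k, so this weighting is illustrative.
    """
    ranks = rank_vector(chroma)
    return sum(r * 12 ** k for k, r in enumerate(ranks[:K]))

def excerpt_similarity(q_hashes, d_hashes):
    """Number of hashes in common, counted with repetitions."""
    return sum((Counter(q_hashes) & Counter(d_hashes)).values())
```

For example, `excerpt_similarity([1, 1, 2, 3], [1, 1, 1, 3, 4])` counts hash 1 twice and hash 3 once.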

  12. Segmentation - song similarity
      A song is composed of overlapping excerpts of about 15 s.
      The similarity score $s_{q,d}$ for a query-document pair $(q, d)$, with $q = (q_1, \ldots, q_{N_q})$ and $d = (d_1, \ldots, d_{N_d})$, is computed as:
      $s_{q,d} \leftarrow \sqrt[N_q]{\prod_{i=1}^{N_q} \max_{j=1 \ldots N_d} \mathrm{sim}(q_i, d_j)}$
      where $\mathrm{sim}(q_i, d_j)$ is the local similarity of excerpts $q_i$, $d_j$.
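The segmentation into overlapping excerpts could look like the sketch below. The excerpt length of 323 frames (about 15 s at roughly 1292 frames per minute) and the 50% overlap are assumptions; the slides give only the approximate duration, not the overlap. The function name is hypothetical, and trailing frames that do not fill a whole excerpt are simply dropped in this version.

```python
def split_into_excerpts(frame_hashes, excerpt_len=323, hop=161):
    """Split a song's sequence of frame hashes into overlapping excerpts.

    ASSUMPTION: excerpt_len and hop are illustrative values, not taken
    from the slides.
    """
    if len(frame_hashes) <= excerpt_len:
        return [list(frame_hashes)]  # short song: a single excerpt
    last_start = len(frame_hashes) - excerpt_len
    return [list(frame_hashes[s:s + excerpt_len])
            for s in range(0, last_start + 1, hop)]
```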

  13. Matching songs in different keys
      - Cover versions of a song are often performed in a different key.
      - A brute-force approach consists in trying all 12 possible rotations of the chroma vectors and keeping the best match among the transposed versions.
      - Alternatively, the most likely key(s) can be estimated, and only a subset of the transposed matches is computed (in our case, 3).
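The brute-force transposition search can be sketched as follows. Transposing by s semitones amounts to a circular rotation of each 12-bin chroma vector. The names `rotate` and `best_transposition` are hypothetical, and `score_fn` stands in for whatever query-to-song scoring is used; passing a shorter `shifts` list would model the variant in which only a few estimated keys are tried.

```python
def rotate(chroma, semitones):
    """Transpose a 12-bin chroma vector upwards by the given semitones."""
    s = semitones % 12
    return [chroma[(i - s) % 12] for i in range(12)]

def best_transposition(query_frames, score_fn, shifts=range(12)):
    """Try each shift and keep the best-scoring one.

    score_fn is a caller-supplied (hypothetical) function mapping a list
    of chroma vectors to a similarity score. Returns (shift, score).
    """
    scored = [(score_fn([rotate(c, s) for c in query_frames]), s)
              for s in shifts]
    score, shift = max(scored, key=lambda t: t[0])
    return shift, score
```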

  14. Similarity computation
      Algorithmic formulation:
      $s_{q,d} \leftarrow \sqrt[N_q]{\prod_{i=1}^{N_q} \max_{j=1 \ldots N_d} \mathrm{sim}(q_i, d_j)}$

      for all songs in the collection do
          for all excerpts of the query do
              for all excerpts of the song do
                  compute similarity
              end for
              retain max score among song excerpts
          end for
          compute geometric mean among scores
      end for
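The nested loops above translate almost directly into the following sketch. It assumes each excerpt is a list of integer hashes and that the collection is a plain dictionary from song name to its list of excerpts; these data structures and the function names are illustrative choices, not FALCON's.

```python
import math
from collections import Counter

def excerpt_similarity(q, d):
    """Hashes shared by two excerpts, counted with repetitions."""
    return sum((Counter(q) & Counter(d)).values())

def song_score(query_excerpts, song_excerpts):
    """Geometric mean, over query excerpts, of the best-matching song excerpt."""
    best = [max(excerpt_similarity(q, d) for d in song_excerpts)
            for q in query_excerpts]
    return math.prod(best) ** (1.0 / len(best))

def rank_collection(query_excerpts, collection):
    """Song names ordered by decreasing score (brute force over all songs)."""
    scores = {name: song_score(query_excerpts, excerpts)
              for name, excerpts in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because of the geometric mean, a single query excerpt with no match in a song drives that song's score to zero.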

  15. Similarity computation
      Algorithmic formulation:
      $s_{q,d} \leftarrow \sqrt[N_q]{\prod_{i=1}^{N_q} \max_{j=1 \ldots N_d} \mathrm{sim}(q_i, d_j)}$

      for all songs in the collection do
          for all excerpts of the query do
              for all excerpts of the song do
                  compute similarity
              end for
              retain max among song excerpts
          end for
          compute geometric mean among scores
      end for

      Actual implementation:

      for all excerpts of the query do
          for all distinct hashes of the excerpt do
              find excerpts of any song that have such hash
              for all found excerpts do
                  accumulate partial scores
              end for
          end for
          retain max among song excerpts (group by song)
      end for
      compute geometric mean among scores (group by song)
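The index-based variant can be approximated with an in-memory inverted index, as in the sketch below. This is not the Lucene-based implementation; it uses a plain dictionary from hash to postings, and the names `build_index` and `search` are hypothetical. It is meant only to show how per-hash lookups replace the loop over all songs.

```python
import math
from collections import Counter, defaultdict

def build_index(collection):
    """Map each hash to postings of (song, excerpt number, count in excerpt)."""
    index = defaultdict(list)
    for song, excerpts in collection.items():
        for j, excerpt in enumerate(excerpts):
            for h, count in Counter(excerpt).items():
                index[h].append((song, j, count))
    return index

def search(query_excerpts, index):
    """Return (song, score) pairs sorted by decreasing score."""
    per_song_best = defaultdict(list)
    for q in query_excerpts:
        partial = defaultdict(int)  # (song, excerpt) -> accumulated similarity
        for h, q_count in Counter(q).items():
            for song, j, d_count in index.get(h, ()):
                partial[(song, j)] += min(q_count, d_count)
        best = defaultdict(int)     # retain max among excerpts, grouped by song
        for (song, _), score in partial.items():
            best[song] = max(best[song], score)
        for song, score in best.items():
            per_song_best[song].append(score)
    n = len(query_excerpts)
    scores = {}
    for song, best in per_song_best.items():
        # A song unmatched by some query excerpt has a zero factor in the product.
        scores[song] = math.prod(best) ** (1.0 / n) if len(best) == n else 0.0
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Songs that share no hash with the query are never touched, which is where the speed-up over the brute-force loop would come from.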

  16. Optimization
      - Caching helps reduce the time spent on score accumulation.
      - The computational load is mostly due to index access ("for all distinct hashes of the excerpt do").
      - Solution: consider only a subset of the hashes ("for some distinct hashes of the excerpt do").
      - The pruning algorithm is based on simple, precomputed collection-wise statistics for each hash.
      - It is trained by randomized hill climbing, with an objective function that privileges speed while maintaining sufficient accuracy.
      [Figure: results; axes: fraction of pruned hashes, MRR]

  17. FALCON
      FALCON is an open source, pure Java implementation of the proposed approach, based on the popular Apache Lucene search engine library.
      Full source code, along with a binary distribution and a test dataset, is available at: http://ims.dei.unipd.it/falcon
      (a demo will follow ...)

  18. Test collection
      Base collection:
      - 500 pop songs in the database
      - 70 corresponding queries (with a single match)
      - 20 queries are played in a different key from their counterpart
      - personal collection of the authors, a "real" usage scenario
      Extension of the collection to 10000 songs


  21. Evaluation measures
      The output of our system for a query is a rank list, i.e., a list of possible responses ordered by probability of correctness. We evaluate our system with $N$ queries.
      MRR - Mean Reciprocal Rank
      - assumption: exactly one relevant document for each query
      - $\mathrm{MRR} = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{r_n}$, where $r_n$ is the rank of the relevant document for query $n$
      Precision: fraction of the retrieved documents that are relevant
      MAP - Mean Average Precision
      - Average Precision for a query is computed as the average of the precision values at each of the relevant documents in the ranked sequence
      - $\mathrm{MAP} = \frac{1}{N} \sum_{n=1}^{N} \frac{\sum_j P(j)\, r(j)}{\sum_j r(j)}$, where $P(j)$ is the precision at rank $j$, and $r(j) = 1$ if the $j$-th document is relevant and $0$ otherwise
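Both measures are short to compute; the sketch below follows the definitions above. The input formats are assumptions made for illustration: a list of 1-based ranks for MRR, and one 0/1 relevance list per query (in ranked order) for MAP.

```python
def mean_reciprocal_rank(ranks):
    """ranks[n] is the 1-based rank of the single relevant document for query n."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def average_precision(relevance):
    """relevance[j-1] is 1 if the j-th ranked document is relevant, else 0."""
    hits, total = 0, 0.0
    for j, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / j  # precision at rank j, counted at relevant docs only
    return total / hits if hits else 0.0

def mean_average_precision(relevance_lists):
    """Mean of the per-query Average Precision values."""
    return (sum(average_precision(r) for r in relevance_lists)
            / len(relevance_lists))
```

When every query has exactly one relevant document, Average Precision reduces to the reciprocal rank, so MAP and MRR coincide.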

  22. Accuracy
      number of songs    MRR−    MRR+    MAP
      500                .615    .615    .615
      1000               .545    .552    .550
      2500               .504    .516    .493
      10000              .385    .411    .323
