Methods and Datasets for DJ-Mix Reverse Engineering
Diemo Schwarz, Dominique Fourer
Ircam Lab, CNRS, Sorbonne Université, Ministère de la Culture, Paris, France
IBISC, Université d'Évry-Val-d'Essonne/Paris-Saclay, Évry, France
http://zenodo.org/record/1422385, http://www.ircam.fr, http://abcdj.eu
ABC_DJ: Artist to Business to Business to Consumer Audio Branding System
Collaboration Context: The ABC DJ EU-Project
ABC_DJ: Artist to Business to Business to Consumer Audio Branding System, http://abcdj.eu
- MIR tools for audio branding
- automatic, DJ-like playback of playlists in stores
- HearDis! GmbH
The ABC DJ project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 688122.
Scientific Context: Understanding DJ Culture & Practices
DJing is an important part of popular music culture. Understanding it enables:
- musicological research in popular music
- studies on DJ culture
- computer support of DJing
- automation of DJ mixing
Qualitative accounts exist, but…
Problem: Lack of Annotated Databases of DJ Mixes or DJ Sets
- Very large-scale availability (millions) of DJ mixes, often with tracklist, e.g. http://www.mixcloud.com, YouTube, podcasts
- Very few annotated databases
- Existing research in studio multi-track mixing and unmixing in DAWs
- Existing work on DJ production tools, but no information retrieval from recorded mixes
Needed Components
- Identification: identify contained tracks (fingerprinting)
- Alignment: get track start and end in mix
- Time-Scaling: determine tempo changes (beat-aligned mixing)
- Unmixing: estimate fade curves for volume, bass/treble, and parameters of other effects (compression, echo, etc.)
- Content Analysis and Context Analysis: derive genre and social tags attached to the music → inform about the choices a DJ makes when creating a mix
[Diagram: data blocks DJ Mixes, Audio Tracks, Playlist, Mix Data, Cultural Data; legend: "suggested here" vs. "downstream research enabled by DJ mix annotation"]
Proposed Method for DJ Mix Reverse Engineering
Input
• recorded DJ mix
• playlist (list of tracks in the mix in correct order)
• audio files of the original tracks
Five steps
1. rough alignment
2. sample alignment
3. verification by track removal
4. estimation of gain curves
5. estimation of cue regions
Step 1: Rough Alignment by DTW
Dynamic Time Warping alignment of the concatenated MFCCs of the tracks with the MFCCs of the mix
→ relative positioning of the tracks in the mix (intersections)
→ speed factor (slopes of path)
[Figure: DTW alignment path; x-axis: track frames (concatenated tracks' MFCCs, query index); y-axis: mix frames (mix MFCCs, reference index)]
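The alignment idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a Euclidean frame distance and the classic three-step DTW pattern, and it uses a synthetic ramp in place of real MFCCs. The names `dtw_path` and `path_slope` are hypothetical, and whether the speed factor is read as the slope or its inverse depends on which sequence is taken as query.

```python
import numpy as np

def dtw_path(query, reference):
    """Classic DTW with steps (1,1), (1,0), (0,1).

    query, reference: arrays of shape (frames, features).
    Returns the warping path as a list of (query_idx, reference_idx) pairs.
    """
    n, m = len(query), len(reference)
    # Pairwise Euclidean distance between all frames (assumed local cost).
    cost = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end of both sequences to their start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(acc[i - 1, j - 1], i - 1, j - 1),
                      (acc[i - 1, j], i - 1, j),
                      (acc[i, j - 1], i, j - 1)]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1]

def path_slope(path):
    """Least-squares slope of the path: reference frames per query frame."""
    q_idx, r_idx = np.array(path).T
    return float(np.polyfit(q_idx, r_idx, 1)[0])

# Toy data: the query is the reference played twice as fast.
reference = np.linspace(0.0, 1.0, 101)[:, None]
query = reference[::2]
slope = path_slope(dtw_path(query, reference))
```

On real data the slope would be fitted per track segment of the path, since each track can have its own speed factor.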
Step 2: Sample Alignment
Refine the alignment to sample precision:
1. time-scale the source track according to the estimated speed factor
2. search for the best sample shift around the rough (frame) alignment: the shift that maximises the cross-correlation between mix and track
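The shift search in step 2 can be sketched as a brute-force maximisation of the cross-correlation. This is an illustrative sketch only: it assumes the track has already been time-scaled, and the function name `refine_shift` and the `search_radius` parameter are hypothetical.

```python
import numpy as np

def refine_shift(mix, track, rough_start, search_radius):
    """Return the start sample in `mix` that maximises the cross-correlation
    with `track`, searching rough_start +/- search_radius samples."""
    lo = max(0, rough_start - search_radius)
    hi = min(len(mix) - len(track), rough_start + search_radius)
    best_start, best_score = rough_start, -np.inf
    for start in range(lo, hi + 1):
        # Cross-correlation at this lag is the inner product of the
        # track with the equally long mix segment starting at `start`.
        score = float(np.dot(mix[start:start + len(track)], track))
        if score > best_score:
            best_start, best_score = start, score
    return best_start

# Toy data: a noise "track" hidden in a quieter noise "mix" at sample 1234.
rng = np.random.default_rng(0)
track = rng.standard_normal(500)
mix = 0.1 * rng.standard_normal(3000)
mix[1234:1734] += track
start = refine_shift(mix, track, rough_start=1200, search_radius=100)
```

For long tracks, an FFT-based correlation would be a faster way to evaluate the same search.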
Step 3: Verification by Track Removal
Success of sample alignment can be verified by subtracting the aligned and time-scaled track from the mix → drop in RMS energy
Note: the method is applicable even when ground truth is unknown or inexact!
[Figure: difference of RMS energy [dB] over time [seconds], three panels]
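A minimal sketch of this check follows. It assumes frame-wise RMS energy in dB and a track that is subtracted at unit gain; the function names and the frame length of 1024 samples are illustrative choices, not taken from the slides.

```python
import numpy as np

def rms_db(x, frame, eps=1e-12):
    """Frame-wise RMS energy in dB over non-overlapping frames."""
    n = len(x) // frame
    frames = x[:n * frame].reshape(n, frame)
    # 10*log10(mean square) equals 20*log10(RMS).
    return 10.0 * np.log10(np.mean(frames ** 2, axis=1) + eps)

def removal_change_db(mix, track, start, frame=1024):
    """Change in RMS energy [dB] after subtracting `track` from `mix`
    at sample `start`. Strongly negative values mean the track was removed."""
    segment = mix[start:start + len(track)]
    residual = segment - track
    return rms_db(residual, frame) - rms_db(segment, frame)

# Toy data: correct alignment removes the track, a 50-sample error does not.
rng = np.random.default_rng(1)
track = rng.standard_normal(8192)
mix = 0.01 * rng.standard_normal(20000)
mix[4000:4000 + 8192] += track
good = removal_change_db(mix, track, 4000)
bad = removal_change_db(mix, track, 4050)
```

With a misaligned track, the subtraction adds energy instead of removing it, which is what makes the measure usable without ground truth.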
Step 4: Volume Curve Estimation
Estimate the volume curves â_i (black lines) applied to each track to obtain the mix
Novel method based on time-frequency representations X (mix) and S_i (track):

$$\hat{a}_i(n) = \begin{cases} \operatorname{median}\left( \dfrac{|X(n,m')|}{|S_i(n,m')|} \right)_{\forall m' \in M} & \text{if } \exists\, m' \text{ s.t. } |S_i(n,m')|^2 > 0 \\ 0 & \text{otherwise} \end{cases}$$
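The median-ratio estimator can be sketched as follows. This is a hedged illustration: the STFT parameters (Hann window of 1024 samples, hop of 512) are arbitrary, the function names are hypothetical, and restricting the median to bins where the track magnitude is non-zero is an assumption made here to avoid division by zero.

```python
import numpy as np

def stft_mag(x, win=1024, hop=512):
    """Magnitude STFT with a Hann window, shape (frames, bins)."""
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[k * hop:k * hop + win] * window
                       for k in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def estimate_gain(mix_mag, track_mag):
    """Per-frame gain: median over frequency bins of |X| / |S_i|,
    or 0 for frames where the track has no energy at all."""
    gains = np.zeros(mix_mag.shape[0])
    for n in range(mix_mag.shape[0]):
        active = track_mag[n] > 0  # assumption: only bins with track energy
        if active.any():
            gains[n] = np.median(mix_mag[n, active] / track_mag[n, active])
    return gains

# Toy data: the "mix" is the track at half volume.
rng = np.random.default_rng(2)
track = rng.standard_normal(8192)
gains = estimate_gain(stft_mag(0.5 * track), stft_mag(track))
```

The median makes the estimate tolerant of bins where another track dominates the mix, since those bins yield outlier ratios.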
Step 5: Cue Point Estimation
Cue points are the start and end points of fades
Estimation (blue lines) by linear regression of the fade curve â at beginning and end (where â is between 0 and 70% of its maximum)
Ground truth fade curve in red
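A sketch of the regression for a fade-in is given below. Two points are assumptions made for illustration rather than statements from the slides: the cue points are read off where the fitted line crosses 0 and the curve maximum, and the curve is assumed to begin with a fade-in. The name `estimate_fade_in` is hypothetical.

```python
import numpy as np

def estimate_fade_in(gain, times, upper=0.7):
    """Fit a line to the rising part of a fade curve (0 < gain < upper * max)
    and return the times where the line crosses 0 and the curve maximum."""
    peak = gain.max()
    # Index of the first frame that reaches the upper threshold.
    first_above = int(np.argmax(gain >= upper * peak))
    idx = np.arange(first_above)
    idx = idx[gain[idx] > 0]  # keep only frames where the fade has started
    slope, intercept = np.polyfit(times[idx], gain[idx], 1)
    fade_start = -intercept / slope
    fade_end = (peak - intercept) / slope
    return float(fade_start), float(fade_end)

# Toy data: a linear fade-in from t = 2 s to t = 10 s.
times = np.arange(0, 20, 0.1)
gain = np.clip((times - 2.0) / 8.0, 0.0, 1.0)
fade_start, fade_end = estimate_fade_in(gain, times)
```

A fade-out could be handled the same way by applying the function to the time-reversed curve.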
The UnmixDB Open DJ-Mix Dataset
http://zenodo.org/record/1422385
- Automatically generated "ecologically valid" beat-synchronous mixes
- Based on CC-licensed, freely available music tracks from the net label http://www.mixotic.net, curated by Sonnleitner, Arzt & Widmer (2016)
- Each mix combines 3 track excerpts of ~40 s (start cutting into end on a downbeat)
- Precise ground truth about the placement of tracks in a mix, fade curves, speed
- Mixes generated in 12 variants:
  - 4 effects: no effect, bass boost, dynamics compression, distortion
  - 3 time-scaling algorithms: none, resample, time stretch
- 6 sets of tracks and mixes, 500 MB to 1 GB each, total 4 GB
- Python source code for mix generation at https://github.com/Ircam-RnD/unmixdb-creation
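The 12 variants are the Cartesian product of the 3 time-scaling settings and the 4 effects. The short sketch below enumerates them; the label strings are illustrative and do not necessarily match the dataset's actual file naming.

```python
from itertools import product

# Labels are hypothetical; the dataset may name its variants differently.
TIME_SCALING = ["none", "resample", "stretch"]
EFFECTS = ["none", "bass", "compression", "distortion"]

# One label per (time-scaling, effect) combination.
variants = [f"{scaling}-{effect}"
            for scaling, effect in product(TIME_SCALING, EFFECTS)]
```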
Evaluation Measures and Results: Alignment
frame error: absolute error between ground truth and frame start time from DTW rough alignment (step 1) [s]
sample error: absolute error between ground truth and track start time from sample alignment (step 2) [s]
[Figure: log frame error [s] (top) and log sample error [s] (bottom) for the 12 mix variants; time-scaling: none, resample, stretch; effects: none, bass, compression, distortion]
Evaluation Measures and Results: Speed Ratio
speed ratio: ratio between ground truth and speed factor estimated by DTW alignment (step 1, ideal value is 1)
[Figure: speed ratio (axis range 0.96 to 1.04) for the 12 mix variants; time-scaling: none, resample, stretch; effects: none, bass, compression, distortion]
Reinjecting Ground Truth Speed
Sample alignment and track removal are highly sensitive to the accuracy of the speed estimation from DTW
Judge its influence by reinjecting ground truth speed:
top: sample alignment error
bottom: sample alignment error with ground truth speed reinjected for time-scaling
[Figure: log sample error [s] (top) and log sample error with reinjected speed [s] (bottom) for the 12 mix variants; time-scaling: none, resample, stretch; effects: none, bass, compression, distortion]
Evaluation Measures and Results: Suppression
suppression ratio: ratio of track time with >15 dB of track removal (step 3, bigger is better)
top: with DTW-estimated speed
bottom: with ground truth speed for time-scaling
[Figure: suppression ratio (top) and suppression ratio with reinjected speed (bottom) for the 12 mix variants; time-scaling: none, resample, stretch; effects: none, bass, compression, distortion; inset: difference of RMS energy [dB] over time [seconds]]
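The suppression ratio can be computed from a frame-wise series of RMS energy changes. This sketch assumes equal-length frames, so the fraction of frames equals the fraction of track time, and that removal is expressed as a negative change in dB. The function name is hypothetical.

```python
import numpy as np

def suppression_ratio(removal_db, threshold_db=15.0):
    """Fraction of frames in which subtracting the track lowered the
    RMS energy by more than `threshold_db` dB.

    removal_db: per-frame change in RMS energy [dB] after track removal
    (negative values mean energy was removed).
    """
    removal_db = np.asarray(removal_db, dtype=float)
    return float(np.mean(removal_db < -threshold_db))

# Toy data: two of four frames show more than 15 dB of removal.
ratio = suppression_ratio([-20.0, -16.0, -3.0, 0.0])
```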
Evaluation Measures and Results: Fade Error
fade error: total difference between ground truth and estimated fade curves (steps 4 and 5)
[Figure: fade error [dB/s] (axis range 0 to 100) for the 12 mix variants; time-scaling: none, resample, stretch; effects: none, bass, compression, distortion]
Conclusion and Future Work
- Our DJ-mix reverse engineering method was validated on the artificial, open UnmixDB dataset → retrieval of rich data from existing real DJ mixes
- With some refinements, our method could become robust and precise enough to allow the inversion of EQ and other processing (e.g. compression, echo)
- Extend to mixes with non-constant tempo curves and more effects
- The close link between alignment, time-scaling, and unmixing hints at a joint and possibly iterative estimation algorithm
- Other signal representations (MFCC, spectrum, chroma, scattering transform)?
Beware: IANADJ!