
Combining Musical and Cultural Features for Intelligent Style Detection



  1. Combining Musical and Cultural Features for Intelligent Style Detection
      - Brian Whitman, Paris Smaragdis
      - MIT Media Lab, Music, Mind and Machine Group (formerly Machine Listening)

  2. What We’re Getting At
      - [Chart: Overall Results. Style ID prediction per style for the Combined, Audio and Cultural representations.]

  3. Music Understanding
      - Meyer: “Music is Information”
      - We all arm a representation of music against noise
      - [Diagram: communication model (Information Source, Transmitter, Channel, Receiver, Destination) labeled with Artists, Score, Sound & Delivery (CDs, bits) and Listeners.]

  4. Two-Way IR
      - So much going the other way!
      - Examples: “My favorite song”; “Timbaland produced the new Missy record”; “Uninspired electro-glitch rock”; “Reminds me of my ex-girlfriend”
      - Sources: P2P collections, online playlists, informal reviews, query habits
      - [Diagram labels: Artists, Score, Sound, Listeners.]

  5. Personal vs. Community
      - 2 kinds of audience-to-artist relation
      - Personal:
        - Musical memory, personal preference, local cultural noise
        - Audio sim / rec as insult!
      - Community:
        - Large-scale cultural factors, “stranger recommendation” (CF)

  6. Audio and Audience
      - Where does music preference come from?
      - Does the type of music actually matter?
      - Mapping personal and community musical memory
      - P2P network models: daily ‘Top 40’ for peer-to-peer networks (Napster/Gnutella/etc); user models, trend ID
      - Web mining, NLP: automatic music description (“cultural representation”); query-by-description; time-aware recommendation (‘buzz factor’ extraction)
      - Sound: content-based representation; feature extraction (beat, instrument types)

  7. What’s On Today!
      - Cultural representations for music
      - Bimodal acoustic/textual decision space
      - Experiment: style ID task
      - Cultural representations of the future

  8. Acoustic vs. Cultural Representations
      - Acoustic:
        - Instrumentation
        - Short-time (timbral)
        - Mid-time (structural)
        - Usually all we have
      - Cultural:
        - Long-scale time
        - Inherent user model
        - Listener’s perspective
        - Two-way IR
      - Questions: Describe this. Which genre? Do I like this? Which artist? 10 years ago? What instruments? Which style?

  9. Bimodal Model
      - Independent kernel hyperspaces
      - Acoustic: fine-grained, frame level, short-term time-aware
      - Cultural: intrinsic user model, artist level, long-term time

  10. “Community Metadata”
      - (Whitman/Lawrence, ICMC 2002)
      - Combine all types of mined data
        - P2P, web, usenet, future?
      - Long-term time aware
      - One comparable representation via Gaussian kernel
      - Machine learning friendly

  11. Data Collection Overview
      - Cultural feature extraction:
        - Web crawls for music information
        - Retrieved documents are parsed for:
          - Unigrams, bigrams and trigrams
          - Artist names
          - Noun phrases
          - Adjectives
      - P2P crawl:
        - Robots watch the OpenNap network for songs shared in collections.
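The n-gram part of the parsing step above can be sketched roughly as follows. This is a minimal illustration, not the crawler used in the talk: the function names `ngrams` and `term_counts` are hypothetical, and splitting on whitespace is an assumed stand-in for whatever tokenization the real system applied.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_counts(text, max_n=3):
    """Count unigrams, bigrams and trigrams in one retrieved document.

    Assumption: lowercasing plus whitespace splitting is good enough
    for illustration; a real crawler would need proper tokenization.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Toy document (made up for this sketch).
counts = term_counts("trip hop from bristol trip hop")
```

Counting per document like this gives the term frequencies; document frequencies would come from counting how many documents contain each term.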

  12. Smoothing Function
      - Inputs are term frequency $f_t$ and document frequency $f_d$, with mean $\mu$ and standard deviation $\sigma$: $s(f_t, f_d) = \dfrac{f_t \, e^{-(\log(f_d) - \mu)^2}}{2\sigma^2}$
      - We use mean of 6 and stdev of 0.9
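A minimal sketch of this smoothing function, assuming it has the form s(f_t, f_d) = f_t · exp(-(log f_d - μ)²) / (2σ²), which is how the slide's equation appears to read. The natural logarithm and the name `smoothed_score` are assumptions made for this example.

```python
import math


def smoothed_score(f_t, f_d, mu=6.0, sigma=0.9):
    """Weight a term frequency by a Gaussian-shaped bump over log document frequency.

    Assumptions: natural log, and the 2*sigma^2 factor dividing the whole
    expression, as the slide's equation appears to be laid out.
    """
    return f_t * math.exp(-(math.log(f_d) - mu) ** 2) / (2 * sigma ** 2)


# With mu = 6 the weight peaks where log(f_d) = 6, i.e. f_d is about 403.
peak = smoothed_score(1.0, math.exp(6.0))
rare = smoothed_score(1.0, 10.0)        # very rare term
common = smoothed_score(1.0, 100000.0)  # very common term
```

Both very rare and very common terms score far below the peak, so terms with a middling document frequency are the ones that get rewarded.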

  13. Smooth the TF-IDF
      - Reward ‘mid-ground’ terms

  14. Example
      - For Portishead:

  15. Style ID experiment
      - AMG style prediction
        - ‘Soft’ ground truth
      - Audio:
        - 10-20 songs per artist
        - Minnowmatch testbed
        - Cross album
      - 25 artists, 5 styles

  16. Cultural/Acoustic Disconnects
      - Styles can be related acoustically but not culturally
        - R&B / top 40 pop (marketing)
        - Rap (substyle glut)
      - Or culturally and not acoustically
        - “IDM”

  17. What’s a Style?
      - Style vs. genre
        - All styles have genres above them
        - Artists can have multiple styles
        - Albums can have styles, too
      - Style as a small music cluster of cultural perception
        - = Sound + Peers + Time

  18. Why Style?
      - Recommendation within styles
        - Marketing recommendation
        - New music recommendation
        - Self-recommendation
      - Creating a music hierarchy
        - Search
        - Musical synonymy / hypernymy

  19. Artist List & Styles
      - Heavy Metal: Guns N’ Roses, AC/DC, Skid Row, Led Zeppelin, Black Sabbath
      - Contemporary Country: Billy Ray Cyrus, Alan Jackson, Tim McGraw, Garth Brooks, Kenny Chesney
      - Hardcore Rap: DMX, Ice Cube, Wu-Tang Clan, Mystikal, Outkast
      - IDM: Boards of Canada, Aphex Twin, Squarepusher, Plone, Mouse on Mars
      - Female R&B: Lauryn Hill, Aaliyah, Debelah Morgan, Toni Braxton, Mya

  20. Audio Representation
      - [Diagram labels: 2 sec audio, weighting, PCA, PSD.]

  21. Acoustic Representation Classification
      - Feedforward time-delay NN
        - 3 frame delay
        - Backpropagation
        - Input layer: 20 PCA coefficients
        - Hidden layer of 40 nodes
      - 4 train / 1 test batch split
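The slide does not say how the 3-frame delay is wired into the network. One common reading of a time-delay input, sketched here purely as an assumption, is to present each frame together with the frames that precede it. The helper name `delay_stack` is hypothetical, and the toy frames use 2 coefficients instead of 20 to keep the example short.

```python
def delay_stack(frames, delay=3):
    """Concatenate each frame with its `delay` predecessors.

    frames: list of equal-length coefficient lists, one per audio frame.
    Returns one flat vector per position that has enough history.
    Assumption: this windowing is one plausible time-delay input,
    not necessarily the wiring used in the talk.
    """
    stacked = []
    for i in range(delay, len(frames)):
        window = []
        for j in range(i - delay, i + 1):
            window.extend(frames[j])
        stacked.append(window)
    return stacked


# Toy input: 5 frames of 2 coefficients each (made up for this sketch).
frames = [[float(i), i + 0.5] for i in range(5)]
stacked = delay_stack(frames)
```

Under this reading, the stacked vectors would be fed to the input layer and the first `delay` frames of each excerpt are dropped for lack of history.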

  22. Acoustic Representation Results
      - [Chart: Acoustic Representation. Precision (%) by style (1-5): Heavy Metal, Contemporary Country, Hardcore Rap, IDM, Female Vocal R&B.]

  23. Cultural Representation Classification
      - Gram matrix of CM kernel space:
        - Sum overlap of smoothing function
      - K-nearest-neighbors clustering
        - Given a new artist, find closest cluster in kernel space
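A rough sketch of the Gram matrix and nearest-neighbor step described above. Several things here are assumptions: the overlap is taken as the sum of score products over shared terms, the neighbor vote uses k = 3, and the artist names, terms and scores are invented toy values rather than data from the experiment.

```python
from collections import Counter


def overlap(a, b):
    """Sum the products of smoothed scores over terms both artists share."""
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def gram_matrix(profiles):
    """Pairwise overlap between all known artists, in sorted-name order."""
    names = sorted(profiles)
    matrix = [[overlap(profiles[x], profiles[y]) for y in names] for x in names]
    return names, matrix


def knn_style(new_profile, profiles, styles, k=3):
    """Label a new artist by majority vote among its k most similar artists."""
    ranked = sorted(profiles,
                    key=lambda name: overlap(new_profile, profiles[name]),
                    reverse=True)
    votes = Counter(styles[name] for name in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical term-score profiles (toy values only).
profiles = {
    "metal_1": {"riff": 0.9, "guitar": 0.8},
    "metal_2": {"riff": 0.7, "loud": 0.5},
    "idm_1": {"glitch": 0.9, "beats": 0.6},
    "idm_2": {"glitch": 0.8, "ambient": 0.4},
}
styles = {"metal_1": "Heavy Metal", "metal_2": "Heavy Metal",
          "idm_1": "IDM", "idm_2": "IDM"}

names, gram = gram_matrix(profiles)
predicted = knn_style({"glitch": 0.7, "beats": 0.2, "riff": 0.1}, profiles, styles)
```

Because the overlap is symmetric, the Gram matrix is symmetric too, which is what a kernel-based learner expects.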

  24. Cultural Representation Results
      - [Chart: Cultural Representation. Precision (%) by style (1-5): Heavy Metal, Contemporary Country, Hardcore Rap, IDM, Female Vocal R&B.]

  25. Combined Classification
      - Can’t compare independent distance measures
      - So we look at hypothesis probabilities
      - Average or multiply?
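The "average or multiply" question can be made concrete with a small sketch. It assumes each classifier outputs a probability per style; the function names and the toy probabilities are invented for illustration and are not results from the experiment.

```python
def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {style: value / total for style, value in scores.items()}


def combine(audio, cultural, mode="average"):
    """Merge two per-style probability tables by averaging or multiplying."""
    if mode == "average":
        merged = {s: (audio[s] + cultural[s]) / 2 for s in audio}
    elif mode == "multiply":
        merged = {s: audio[s] * cultural[s] for s in audio}
    else:
        raise ValueError("mode must be 'average' or 'multiply'")
    return normalize(merged)


# Toy hypothesis probabilities (made up for this sketch).
audio = {"style_a": 0.7, "style_b": 0.3, "style_c": 0.0}
cultural = {"style_a": 0.1, "style_b": 0.3, "style_c": 0.6}

averaged = combine(audio, cultural, "average")
multiplied = combine(audio, cultural, "multiply")
best_avg = max(averaged, key=averaged.get)
best_mul = max(multiplied, key=multiplied.get)
```

In this toy case averaging picks `style_a`, which one classifier strongly favors, while multiplying picks `style_b`, the only style both classifiers give moderate support. Multiplication lets either classifier veto a style; averaging does not.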

  26. Combined Classification Results
      - [Chart: Combined Representation. Precision (%) by style (1-5): Heavy Metal, Contemporary Country, Hardcore Rap, IDM, Female Vocal R&B.]

  27. Style ID Overall
      - [Chart: Overall Results. Style ID prediction per style for the Combined, Audio and Cultural representations.]

  28. What’s Next
      - CM proven for artist similarity
        - Against AMG editors: Whitman/Lawrence (ICMC)
        - Against human evaluation: Ellis/Whitman/Berenzweig/Lawrence (ISMIR)
      - Current IR uses of CM:
        - Recommendation / Buzz Factor Extraction
        - Query by Description
        - Grounding Sound

  29. Time-Aware Recommendation
      - CM is ‘Time-Aware’:
        - Artists change over time
        - So does audience perception
      - Gauges buzz
        - Parsable content goes up during album releases, major news
      - Avoids ‘stale’ recommendations
      - Captures that non-audio ‘aboutness’

  30. Query by Description
      - “Play me something fast with an electronic beat!”
      - “I’m tired tonight, let’s hear some romantic music.”
      - CM vectors in time-aware QBD.
      - We don’t need to label any data: the internet does that for us.

  31. Grounding Sound
      - Bimodal representation for symbol grounding of music
      - Understanding sound innately

  32. Conclusions
      - Style useful and peculiar delimiter
      - Test case for non-audio aboutness
      - CM as cultural representation
        - Freely available
      - Thanks: MMM group, Steve, Adam, Dan, Ryan Rifkin
