Applause Identification and its relevance to Archival of Carnatic Music
Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema A. Murthy
Computer Science Dept., IIT Madras, India
July 6, 2012
2nd CompMusic Workshop
Outline of the presentation
- Introduction to the Carnatic music concert
- Problem definition
- Feature extraction
  - Spectral flux
  - Spectral entropy
- Characterising the applause using the cumulative sum (CUSUM)
- Highlights detection using CUSUM
- Results
Carnatic music concert (1)
A Carnatic music concert can be 2 to 3 hours long and consists of various pieces. These pieces are compositions interlaced with improvisational aspects such as Raga Alapana, Nereval, Kalpanaswara, Thanam, Sloka and Thani Avarthanam.
Carnatic music concert (2)
In a concert, the audience applauds the artist at the end of a piece. Sometimes the audience also applauds in between improvisational aspects, for example after the Raga (vocal), the Raga (violin), the song, the Kalpanaswara, the Thanam or the Thani Avarthanam.
Most Carnatic music recordings archived today are either manually segmented into pieces or stored as a single recording of the entire concert.
Applications of Applause Identification
Existing work on applause identification:
- Manoj et al. (2011) discuss how applause is detected in continuous speech recordings of meetings and how it can be used as a key indicator of highlights in a meeting.
- Lie Lu et al. (2001) discuss techniques for audio classification that segment the audio signal into speech, music, silence and environmental sounds such as applause and laughter; these segments can be used as an index for audio retrieval.
- Z. Xiong et al. (2003) discuss how applause is detected to determine the highlights of a game.
Problem Definition
- Identify the applauses in a given Carnatic music concert using spectral-domain features, so that the concert can be automatically segmented into individual pieces for archival purposes.
- Find the duration and strength of an applause using the CUSUM technique, so that the highlights of the concert can be determined.
Characteristics of Applause and Music
[Figure: Typical sequence of applause and music segments (time domain). Each panel plots Amplitude against Time in seconds over a 1 s excerpt.]
In the time domain, the applause segment is rhythmic but not structured, whereas the music segment is more structured.
Characteristics of Applause and Music
[Figure: Typical sequence of applause and music segments (spectral domain). Each panel plots Log Magnitude (dB) against Frequency in Hz, from 0 to 8000 Hz.]
The power spectrum of applause is flat, whereas the spectrum of music is structured.
Feature Extraction
Selecting a good feature is a crucial task in classification or segmentation. The spectral properties of most audio signals change slowly with time. To discriminate between music and applause, the following features are used:
- Spectral flux
- Spectral entropy
Spectral flux (1)
Spectral flux (SF), also called spectral variation, characterises the change in the spectrum between two adjacent frames of an audio signal, i.e. it measures how quickly the spectrum changes over time:
$$SF[n] = \int_{\omega} \left( |X_n(\omega)| - |X_{n+1}(\omega)| \right)^2 \, d\omega \qquad (1)$$
where |X_n(ω)| is the magnitude spectrum of the n-th frame of the audio signal.
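A minimal sketch of Eq. (1) in discrete form, assuming the magnitude spectra have already been computed (for example with an FFT of each windowed frame) and that the integral over ω is replaced by a sum over frequency bins. The function name spectral_flux is hypothetical, not taken from the authors' code.

```python
from typing import List, Sequence


def spectral_flux(mag_frames: Sequence[Sequence[float]]) -> List[float]:
    """Discrete form of Eq. (1): SF[n] = sum over bins k of (|X_n[k]| - |X_{n+1}[k]|)^2.

    mag_frames[n][k] holds the magnitude |X_n| of frame n at frequency bin k.
    Returns one value per pair of adjacent frames, i.e. len(mag_frames) - 1 values.
    """
    flux = []
    for cur, nxt in zip(mag_frames, mag_frames[1:]):
        if len(cur) != len(nxt):
            raise ValueError("all frames must have the same number of frequency bins")
        # Squared magnitude difference, summed over frequency (stands in for the integral).
        flux.append(sum((a - b) ** 2 for a, b in zip(cur, nxt)))
    return flux


# Usage: three toy frames with three bins each.
frames = [[1.0, 2.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(spectral_flux(frames))  # [4.0, 0.0]
```

The flux is large where the spectrum changes between frames and zero where consecutive spectra are identical.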
Spectral flux (2)
Different normalisations of spectral flux are:
1. Spectral flux with no normalisation.
2. Power spectral density normalisation, in which XNorm_n(ω) is defined as:
$$\mathrm{XNorm}_n(\omega) = \frac{X_n(\omega)}{\int_{\omega} X_n(\omega) \, d\omega} \qquad (2)$$
3. Peak normalisation, in which XNorm_n(ω) is defined as:
$$\mathrm{XNorm}_n(\omega) = \frac{X_n(\omega)}{\max_{\omega} X_n(\omega)} \qquad (3)$$
In cases 2 and 3, the normalised spectrum XNorm_n(ω) takes the place of the spectrum in Eq. (1).
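The two normalisations can be sketched as below, again with sums over frequency bins standing in for the integral and the max taken over bins. The helper names (psd_normalise, peak_normalise, normalised_flux) are hypothetical, and the handling of all-zero frames is an assumption added so the division is always defined.

```python
from typing import Callable, List, Sequence


def psd_normalise(spectrum: Sequence[float]) -> List[float]:
    """Eq. (2): divide each bin by the sum over all bins."""
    total = sum(spectrum)
    if total == 0:
        return [0.0] * len(spectrum)  # all-zero frame: left as zeros (assumption)
    return [x / total for x in spectrum]


def peak_normalise(spectrum: Sequence[float]) -> List[float]:
    """Eq. (3): divide each bin by the largest bin of the frame."""
    peak = max(spectrum)
    if peak == 0:
        return [0.0] * len(spectrum)  # all-zero frame: left as zeros (assumption)
    return [x / peak for x in spectrum]


def normalised_flux(
    mag_frames: Sequence[Sequence[float]],
    normalise: Callable[[Sequence[float]], List[float]],
) -> List[float]:
    """Eq. (1) evaluated on frames that were first passed through `normalise`."""
    norm = [normalise(frame) for frame in mag_frames]
    return [
        sum((a - b) ** 2 for a, b in zip(cur, nxt))
        for cur, nxt in zip(norm, norm[1:])
    ]


# Usage: the second frame is the first one scaled by 2 (a pure loudness change).
print(normalised_flux([[1.0, 1.0], [2.0, 2.0]], psd_normalise))  # [0.0]
```

Both normalisations make the flux insensitive to an overall change in level between frames, since scaling a frame by a constant leaves its normalised spectrum unchanged; only changes in spectral shape contribute.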
Spectral flux (3)
[Figure: Different Normalisations of Spectral flux. Three panels plot spectral flux against Time in Seconds for unnormalised spectra, Power Spectral Density normalised spectra and Peak normalised spectra, each with a music segment and an applause segment marked.]
Spectral Entropy (1)
Entropy is a measure of the randomness of a system. The Shannon entropy of a discrete random variable X with probability mass function p(x_i), for i = 1 to N, is given by
$$H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i) \qquad (4)$$
Spectral entropy (SE) applies this to the spectrum of frame n: the power spectrum is normalised to unit area so that it can be treated as a probability distribution over frequency,
$$\mathrm{PSD}_n(\omega) = \frac{|X_n(\omega)|^2}{\int_{\omega} |X_n(\omega)|^2 \, d\omega}$$
$$SE[n] = -\int_{\omega} \mathrm{PSD}_n(\omega) \log \mathrm{PSD}_n(\omega) \, d\omega \qquad (5)$$
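A sketch of Eq. (5) for a single frame, assuming precomputed magnitude spectra and a sum over frequency bins in place of the integral. The slide does not state the base of the logarithm in Eq. (5); this sketch assumes base 2 to match Eq. (4). The function name spectral_entropy and the value returned for a silent frame are assumptions.

```python
import math
from typing import Sequence


def spectral_entropy(magnitudes: Sequence[float]) -> float:
    """Eq. (5) for one frame, with the integral over frequency replaced by a sum over bins.

    The power spectrum |X_n[k]|^2 is normalised to sum to 1 (PSD_n), and the
    Shannon entropy -sum p*log2(p) is taken; zero bins are skipped (0*log 0 = 0).
    """
    power = [m * m for m in magnitudes]
    total = sum(power)
    if total == 0:
        return 0.0  # silent frame: entropy set to 0 here (assumption)
    entropy = 0.0
    for p in (x / total for x in power):
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


flat = [1.0] * 8                                   # energy spread evenly over 8 bins
peaked = [5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # all energy in a single bin
print(spectral_entropy(flat), spectral_entropy(peaked))  # 3.0 0.0
```

A flat spectrum over N bins gives the maximum value log2 N, while a spectrum concentrated in a few bins gives a value near 0, so a flat, applause-like spectrum is expected to yield a higher entropy than a structured music spectrum.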
Spectral Entropy (2)
[Figure: Spectral entropy of music signal. The panel plots Spectral Entropy against Time in Seconds, with a music segment and an applause segment marked.]
Database Used
19 concerts by male and female singers are used for the experiments. All concerts are vocal concerts, i.e. the lead musician is a singer. Each concert has 15 to 20 applauses, giving a total of 343 applauses.
Experimental analysis
For the 19 concerts, spectral flux and spectral entropy features are extracted using frames of 0.25 s duration with an overlap of 0.01 s, at a sampling frequency of 44.1 kHz. The extracted features are smoothed by a rectangular moving-average filter of length 15. For all concerts, the applause locations and applause types are marked manually by a musician. Based on this ground truth, the DET curve and the Equal Error Rate (EER) are computed for each of the extracted features.
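The frame length and the smoothing step can be sketched as follows. The slide does not say whether the length-15 filter is causal or centred, or how it treats the ends of the feature track; this sketch assumes a centred window that is truncated at the edges. The name moving_average is hypothetical.

```python
from typing import List, Sequence

SAMPLE_RATE = 44100                               # Hz, as stated on the slide
FRAME_SECONDS = 0.25                              # frame duration, as stated on the slide
FRAME_SAMPLES = int(FRAME_SECONDS * SAMPLE_RATE)  # 11025 samples per frame


def moving_average(values: Sequence[float], length: int = 15) -> List[float]:
    """Rectangular (equal-weight) moving average over `length` feature values.

    Assumption: the window is centred on each value and truncated at the ends
    of the sequence, where the mean is taken over the values that exist.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    half = length // 2
    smoothed = []
    for n in range(len(values)):
        lo = max(0, n - half)
        hi = min(len(values), n - half + length)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Usage: an isolated spike in a feature track is spread over its neighbours.
print(moving_average([0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0], length=3))
# [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

Smoothing suppresses isolated frame-level spikes in the feature, so a threshold responds to sustained events such as applause rather than to single noisy frames.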
Experimental Analysis
DET curves are plotted for applause detection over a range of thresholds. The Equal Error Rates (EER) are given in the table below; spectral entropy gives the lowest EER.
[Figure: DET curve for applause detection ("Applause Detection Performance"). Miss probability (in %) is plotted against False Alarm probability (in %) for Entropy, fluxnonorm and fluxnorm, with the EER values marked.]
Table: EER for applause detection
Method                        EER
Spectral Flux (no norm)       44.55%
Spectral Flux (normalised)    23.33%
Spectral Entropy              17.33%
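The EER is the operating point on the DET curve where the miss probability equals the false-alarm probability. A minimal frame-level sketch of how it can be estimated from a feature track and ground-truth labels is shown below; whether the authors evaluate per frame or per applause event is not stated, and the function name and the "score >= threshold means applause" convention are assumptions.

```python
from typing import Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (eer, threshold) from a sweep over every observed score.

    Assumption: a frame is declared applause when score >= threshold, so higher
    scores must mean "more applause-like" (negate the feature otherwise).
    labels[i] is True for frames marked as applause in the ground truth.
    At each threshold the miss rate (applause frames rejected) and the
    false-alarm rate (non-applause frames accepted) are computed; the EER is
    taken as their mean at the threshold where the two are closest.
    """
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both applause and non-applause frames")
    best = None  # (gap between the two rates, eer estimate, threshold)
    for t in sorted(set(scores)):
        miss = sum(1 for s, lab in zip(scores, labels) if lab and s < t) / n_pos
        false_alarm = sum(1 for s, lab in zip(scores, labels) if not lab and s >= t) / n_neg
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2, t)
    return best[1], best[2]


scores = [0.1, 0.4, 0.3, 0.6]
labels = [False, False, True, True]
print(equal_error_rate(scores, labels))  # (0.5, 0.4)
```

The (false alarm, miss) pairs visited in the loop are exactly the points of the DET curve; the sketch does not interpolate between them, so with few frames the EER estimate is coarse.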
Introduction to the cumulative sum (CUSUM) method
With spectral flux and spectral entropy, applause locations are identified by thresholding the feature. This alone may not be sufficient to determine the duration and strength of an applause. CUSUM is a non-parametric approach that can be used to identify statistical inhomogeneity in a given signal.
Let X[n] be the value of the feature extracted at time n. CUSUM is computed as:
$$Y[n] = X[n] - a$$
$$\mathrm{Cusum}[n] = \begin{cases} \mathrm{Cusum}[n-1] + Y[n], & Y[n] > 0 \\ 0, & \text{otherwise} \end{cases}$$
If Cusum[n] > Θ, this suggests a significant structural shift in the series. The values of a and Θ have to be estimated empirically and may vary across data sets.
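A sketch of the CUSUM recurrence as written on the slide. Note that the reset condition is on Y[n], so the sum restarts at zero whenever the feature falls to a or below. The initial value Cusum[-1] = 0 and the helper shift_regions, which reports runs of frames where Cusum[n] > Θ, are assumptions for illustration; this section does not say how the authors turn the CUSUM curve into an applause duration and strength.

```python
from typing import List, Optional, Sequence, Tuple


def cusum(x: Sequence[float], a: float) -> List[float]:
    """Slide recurrence: Y[n] = X[n] - a; Cusum[n] = Cusum[n-1] + Y[n] if Y[n] > 0, else 0."""
    out = []
    prev = 0.0  # assumed initial value Cusum[-1] = 0
    for value in x:
        y = value - a
        prev = prev + y if y > 0 else 0.0
        out.append(prev)
    return out


def shift_regions(c: Sequence[float], theta: float) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, end) index pairs, end exclusive, of runs where Cusum > theta."""
    regions = []
    start: Optional[int] = None
    for n, v in enumerate(c):
        if v > theta and start is None:
            start = n
        elif v <= theta and start is not None:
            regions.append((start, n))
            start = None
    if start is not None:  # run continues to the end of the track
        regions.append((start, len(c)))
    return regions


# Usage: a feature track that rises above a = 1.0 for three frames.
c = cusum([0.0, 0.0, 2.0, 3.0, 4.0, 0.0, 0.0], a=1.0)
print(c)                            # [0.0, 0.0, 1.0, 3.0, 6.0, 0.0, 0.0]
print(shift_regions(c, theta=2.0))  # [(3, 5)]
```

Because the sum keeps growing while the feature stays above a, the CUSUM value reflects both how long and how far the feature exceeds a, which a single frame-wise threshold does not capture.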