EMOTION RECOGNITION IN SOUND ANASTASIYA S. POPOVA HSE NN 2017
INTRODUCTION
THE PROBLEM $y : X \to Y$, $y : \mathbb{R}^n \to Y$
THE DATASET (RAVDESS DATABASE) http://neuron.arts.ryerson.ca/ravdess/?f=3
PREPROCESSING Length equalization
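A minimal sketch of length equalization, assuming every recording is center-cropped or zero-padded to a fixed number of samples. The slide does not say which strategy was used; the function name `equalize_length` and the symmetric padding are illustrative assumptions.

```python
import numpy as np


def equalize_length(signal, target_len):
    """Center-crop or zero-pad a 1-D signal to exactly target_len samples.

    Assumption: symmetric cropping/padding; the slides do not specify this.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n = signal.shape[0]
    if n >= target_len:
        # Too long: keep the middle target_len samples.
        start = (n - target_len) // 2
        return signal[start:start + target_len]
    # Too short: split the missing samples between both ends.
    pad_total = target_len - n
    pad_left = pad_total // 2
    return np.pad(signal, (pad_left, pad_total - pad_left), mode="constant")
```

After this step every clip has the same length, so every spectrogram has the same width and can be fed to a network with a fixed input size.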
PREPROCESSING Loudness normalization
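One common way to normalize loudness is to rescale each clip to a fixed root-mean-square (RMS) level. The sketch below assumes RMS normalization; the slide does not state whether peak, RMS, or another loudness measure was used, and `target_rms=0.1` is an arbitrary illustrative value.

```python
import numpy as np


def normalize_rms(signal, target_rms=0.1, eps=1e-12):
    """Scale a signal so that its RMS level equals target_rms.

    Assumption: RMS-based normalization; silent clips are returned unchanged.
    """
    signal = np.asarray(signal, dtype=np.float64)
    rms = np.sqrt(np.mean(signal ** 2))
    if rms < eps:
        # Avoid dividing by zero on (near-)silent input.
        return signal.copy()
    return signal * (target_rms / rms)
```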
PREPROCESSING High-pass and low-pass filters, voice activity detection (VAD) algorithm
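The filtering and VAD steps can be sketched with NumPy alone. This is not the author's implementation: the band-pass below simply zeroes FFT bins outside the pass band (a stand-in for properly designed high-pass and low-pass filters), and the VAD is a plain frame-energy threshold. The cut-off frequencies, frame sizes, and the threshold are assumed values.

```python
import numpy as np


def fft_bandpass(signal, sr, low_hz=80.0, high_hz=8000.0):
    """Remove content below low_hz and above high_hz by zeroing FFT bins.

    Assumption: a brick-wall FFT mask, not the filter design used in the talk.
    """
    signal = np.asarray(signal, dtype=np.float64)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.shape[0], d=1.0 / sr)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=signal.shape[0])


def energy_vad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Return one boolean per frame: True where the frame counts as speech.

    A frame is marked active when its mean energy is within threshold_db
    of the loudest frame in the clip (assumed, illustrative criterion).
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.array([
        np.mean(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
    level_db = 10.0 * np.log10(energies / (energies.max() + 1e-12) + 1e-12)
    return level_db > threshold_db
```

Frames flagged `False` at the start and end of a clip can be trimmed before the length equalization step.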
SPECTROGRAM → MEL SPECTROGRAM
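The step from a spectrogram to a mel spectrogram can be sketched as follows: compute a short-time power spectrum, then project it onto triangular filters spaced evenly on the mel scale. The parameters (`n_fft=512`, `hop=128`, `n_mels=40`) and the unnormalized triangular filters are assumptions; the slides do not give the settings used, and libraries such as librosa apply additional normalization.

```python
import numpy as np


def hz_to_mel(f):
    """Convert Hz to mel (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def power_spectrogram(signal, n_fft=512, hop=128):
    """Hann-windowed STFT power, shape (n_frames, n_fft // 2 + 1).

    Requires len(signal) >= n_fft.
    """
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([
        signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def mel_filterbank(sr, n_fft=512, n_mels=40):
    """Triangular filters with centers equally spaced on the mel scale."""
    hz_points = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    )
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=40):
    """Mel-band power per frame, shape (n_frames, n_mels)."""
    spec = power_spectrogram(signal, n_fft, hop)
    return spec @ mel_filterbank(sr, n_fft, n_mels).T
```

The mel projection compresses the frequency axis from 257 linear bins to 40 perceptually spaced bands in this configuration, which gives a smaller image for the network.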
THE DIFFERENCE BETWEEN CLASSES (HYPOTHESIS): neutral, calm, happy, sad, surprised, fearful, angry, disgust
CONVOLUTIONAL NETWORK
VGG-11 → VGG-16
VGG-11: Input RGB image, Conv3-64, Maxpool, Conv3-128, Maxpool, Conv3-256, Conv3-256, Maxpool, Conv3-512, Conv3-512, Maxpool, Conv3-512, Conv3-512, Maxpool, FC-4096, FC-4096, FC-1000, Soft-max
VGG-16: Input RGB image, Conv3-64, Conv3-64, Maxpool, Conv3-128, Conv3-128, Maxpool, Conv3-256, Conv3-256, Conv3-256, Maxpool, Conv3-512, Conv3-512, Conv3-512, Maxpool, Conv3-512, Conv3-512, Conv3-512, Maxpool, FC-4096, FC-4096, FC-1000, Soft-max
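The two architectures can be written as layer lists, which makes the difference between them easy to check. The sketch below assumes the standard VGG-11 and VGG-16 configurations and a 224×224 input; the helper names are illustrative. The slide shows the original FC-1000 output layer, and how it was adapted to 8 emotion classes is not stated, so that step is left out.

```python
# Integers are output channel counts of 3x3 convolutions (Conv3-N);
# "M" marks a 2x2 max-pooling layer that halves the spatial size.
VGG_CONFIGS = {
    "VGG-11": [64, "M", 128, "M", 256, 256, "M",
               512, 512, "M", 512, 512, "M"],
    "VGG-16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
               512, 512, 512, "M", 512, 512, 512, "M"],
}
FC_LAYERS = [4096, 4096, 1000]  # fully connected layers, as on the slide


def count_weight_layers(name):
    """Number of layers with trainable weights (convolutions + FC)."""
    convs = sum(1 for layer in VGG_CONFIGS[name] if layer != "M")
    return convs + len(FC_LAYERS)


def feature_map_size(name, input_size=224):
    """Spatial size of the feature map entering the first FC layer."""
    size = input_size
    for layer in VGG_CONFIGS[name]:
        if layer == "M":
            size //= 2
    return size


def describe(name):
    """Layer names in the notation used on the slide."""
    layers = ["Maxpool" if layer == "M" else f"Conv3-{layer}"
              for layer in VGG_CONFIGS[name]]
    return layers + [f"FC-{n}" for n in FC_LAYERS] + ["Soft-max"]
```

Counting weight layers this way gives the 11 and 16 in the network names; both variants use five pooling stages, so they reduce the input to the same spatial size.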
CLASSIFICATION ON 8 CLASSES Accuracy: VGG-11 + spectrogram, VGG-16 + mel spectrogram
CONFUSION MATRIX
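A confusion matrix and the accuracy derived from it can be computed as sketched below. Rows are true classes and columns are predicted classes; the class order follows the hypothesis slide. The function names are illustrative, and no results from the talk are reproduced here.

```python
import numpy as np

CLASSES = ["neutral", "calm", "happy", "sad",
           "surprised", "fearful", "angry", "disgust"]


def confusion_matrix(y_true, y_pred, n_classes=len(CLASSES)):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # np.add.at accumulates correctly when index pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal (correct predictions)."""
    return np.trace(cm) / cm.sum()
```

Off-diagonal cells show which emotions are confused with each other, which a single accuracy number hides.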
MEL FREQUENCY CEPSTRAL COEFFICIENTS (MFCC)
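MFCCs are usually obtained by taking the logarithm of mel-band energies and applying a discrete cosine transform (DCT) along the band axis, keeping the first few coefficients. The sketch below assumes an unnormalized DCT-II and 13 coefficients; the slides do not give these settings, and library implementations may scale the coefficients differently.

```python
import numpy as np


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]   # coefficient index
    i = np.arange(n)[None, :]          # input (mel band) index
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))  # (n_coeffs, n)
    return x @ basis.T


def mfcc_from_mel(mel_power, n_coeffs=13, eps=1e-10):
    """MFCCs from mel-band power of shape (n_frames, n_mels).

    eps guards against log(0) in silent bands (assumed value).
    """
    log_mel = np.log(np.asarray(mel_power, dtype=np.float64) + eps)
    return dct_ii(log_mel, n_coeffs)
```

The DCT decorrelates neighbouring mel bands, so a short coefficient vector per frame summarizes the spectral envelope.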
stasysp.96@gmail.com