Artificial Neural Networks for Multimodal Information Fusion Friedhelm Schwenker Institute of Neural Information Processing University of Ulm Cairo University April 9, 2010 Schwenker UUlm ANN Informationfusion
Outline
• Artificial neural networks (ANN)
• Recognition of bio-acoustic time series
• Emotion recognition in human-computer interaction
Pattern recognition applications at NI
• Recognition of visual objects from camera images (OCR, face recognition)
• Medical diagnosis and bioinformatics
• Speaker identification
• Speech recognition/understanding
• Recognition of human emotions from speech and facial expressions
• Bio-acoustic pattern recognition
• ...
1. Artificial neural networks

            Von Neumann computer    Biological neural net
Processor   complex                 simple
            high speed              low speed
            1 or a few              large number
Computing   centralized             distributed
            sequential              parallel
            by programs             by learning from data
Memory      localized               distributed
            addressable by keys     addressable by content
            not fault-tolerant      fault-tolerant
Layered networks
Layered neural networks (single- or multilayer perceptrons, radial basis function networks) are widely used in pattern recognition and regression applications.
[Figure: input → weight matrix → nonlinear transfer function → output]
Neural models
[Figure: input $x$, weight vector $c$, transfer function $f$, output $y$]
• Linear neuron: $y = \langle x, c \rangle = \sum_{i=1}^{n} x_i c_i$
• Threshold neuron: $y = 1$ if $\langle x, c \rangle \ge \theta$, and $y = 0$ otherwise
• Sigmoidal neuron: $y = f(\langle x, c \rangle - \theta)$, with $f(s) = \frac{1}{1 + \exp(-\beta s)}$
• RBF neuron: $y = f(\| x - c \|_2)$, with $f(r) = \exp\left(-\frac{r^2}{2\sigma^2}\right)$
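The four neuron models on this slide (linear, threshold, sigmoidal, RBF) can be sketched in a few lines of plain Python. This is only an illustration: the function names and the default values for `beta` and `sigma` are assumptions, not taken from the slides.

```python
import math

def inner(x, c):
    """Inner product <x, c> of input x and weight vector c."""
    return sum(xi * ci for xi, ci in zip(x, c))

def linear_neuron(x, c):
    return inner(x, c)

def threshold_neuron(x, c, theta):
    # Fires (1) when the weighted input reaches the threshold theta.
    return 1 if inner(x, c) >= theta else 0

def sigmoidal_neuron(x, c, theta, beta=1.0):
    # Logistic transfer function with slope parameter beta (assumed default).
    s = inner(x, c) - theta
    return 1.0 / (1.0 + math.exp(-beta * s))

def rbf_neuron(x, c, sigma=1.0):
    # Gaussian of the Euclidean distance between input x and centre c.
    r2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-r2 / (2.0 * sigma ** 2))

x = [1.0, 2.0]
c = [0.5, -0.25]
print(linear_neuron(x, c), threshold_neuron(x, c, 0.0),
      sigmoidal_neuron(x, c, 0.0), rbf_neuron(x, c))
```

The linear, threshold and sigmoidal neurons all depend on the inner product of input and weights, whereas the RBF neuron depends on the distance between them.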
Learning in artificial neural nets
[Figure: input $x$, connectivity matrix $C$, teacher $T$, output $y$]
• Mapping $F_C : X \to Y$; the connectivity matrix $C$ is learnt from examples.
• Data: $x \in X$ or $(x, T) \in X \times Y$.
• Different types of target function $E(C)$.
• Optimising $E(C)$ leads to learning rules for $C$.
Supervised learning
[Figure: input $x_i$, weights $c_{ij}$, output $y_j$, teacher $T_j$]
• Output $y_j$, teaching signal $T_j$.
• $c_{ij}$ is adapted such that $y_j \approx T_j$.
• Example: delta rule, $\Delta c_{ij} \sim x_i (T_j - y_j)$
• The delta rule minimises $E(c) = \| T - y \|^2$
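A minimal sketch of the delta rule for a single layer of linear neurons follows. The learning rate `eta`, the number of epochs and the toy data set (target $T = 2x_1 - x_2$) are assumptions chosen for illustration only.

```python
def forward(x, C):
    """Linear layer: y_j = sum_i x_i * c_ij."""
    n_out = len(C[0])
    return [sum(x[i] * C[i][j] for i in range(len(x))) for j in range(n_out)]

def delta_rule_step(x, T, C, eta):
    """One update: c_ij += eta * x_i * (T_j - y_j)."""
    y = forward(x, C)
    for i in range(len(x)):
        for j in range(len(T)):
            C[i][j] += eta * x[i] * (T[j] - y[j])

def train(samples, n_in, n_out, eta=0.1, epochs=500):
    C = [[0.0] * n_out for _ in range(n_in)]  # start from zero weights
    for _ in range(epochs):
        for x, T in samples:
            delta_rule_step(x, T, C, eta)
    return C

# Toy data (assumed): the teacher signal is T = 2*x1 - x2.
samples = [([1.0, 0.0], [2.0]), ([0.0, 1.0], [-1.0]), ([1.0, 1.0], [1.0])]
C = train(samples, n_in=2, n_out=1)
print(C)  # weights close to [[2.0], [-1.0]]
```

Because the toy targets are an exact linear function of the inputs, the squared error can be driven to zero; with noisy targets the weights would settle near the least-squares solution instead.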
Unsupervised learning
[Figure: input $x_i$, weights $c_{ij}$, output $y_j$]
• $c_{ij}$ is adapted without a teaching signal.
• Example: Hebbian learning, $\Delta c_{ij} \sim x_i y_j$
• Hebbian learning maximises $E(c) = \| y \|^2$
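The Hebbian update can be illustrated for a single linear output neuron. The sketch below uses an assumed learning rate and toy input; it shows that presenting a pattern increases the magnitude of the neuron's response to that pattern, which is why plain Hebbian learning is usually combined with some form of weight normalisation in practice.

```python
def output(x, c):
    """Linear neuron: y = <x, c>."""
    return sum(xi * ci for xi, ci in zip(x, c))

def hebbian_step(x, c, eta):
    """Hebbian update: c_i += eta * x_i * y (no teaching signal)."""
    y = output(x, c)
    return [ci + eta * xi * y for xi, ci in zip(x, c)]

x = [1.0, 0.5]    # toy input pattern (assumed)
c = [0.2, -0.1]   # initial weights (assumed)

y_before = output(x, c)
c = hebbian_step(x, c, eta=0.1)
y_after = output(x, c)

# The response grows by the factor (1 + eta * ||x||^2).
print(y_before, y_after)
```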
Competitive learning
[Figure: input $x_i$, weights $c_{ij}$, winner neuron with its neighbourhood $N$, output $y_j$]
• Winner detection.
• Neurons in the neighbourhood of the winner are adapted.
• Example: SOM learning or k-means, $\Delta c_{ij} \sim (x_i - c_{ij}) \cdot N_j$
• k-means minimises $E(c) = \| c - x \|^2$, where $c$ is the weight vector of the winner.
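A minimal sketch of winner-take-all competitive learning (online k-means) is given below. Here the neighbourhood function is reduced to $N_j = 1$ for the winner and $0$ for all other neurons; a SOM would additionally adapt the winner's neighbours. The data, the initial prototypes and the learning rate are assumptions for illustration.

```python
def nearest(x, prototypes):
    """Winner detection: index of the prototype closest to x."""
    return min(range(len(prototypes)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, prototypes[j])))

def competitive_step(x, prototypes, eta):
    """Move only the winner towards x: c_j += eta * (x - c_j)."""
    j = nearest(x, prototypes)
    prototypes[j] = [c + eta * (a - c) for a, c in zip(x, prototypes[j])]
    return j

def train(data, initial, eta=0.1, epochs=50):
    protos = [list(p) for p in initial]  # copy so the caller's list is untouched
    for _ in range(epochs):
        for x in data:
            competitive_step(x, protos, eta)
    return protos

# Two well-separated toy clusters (assumed data).
data = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1],
        [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
protos = train(data, initial=[[1.0, 1.0], [4.0, 4.0]])
print(protos)  # one prototype near (0, 0), the other near (5, 5)
```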
Model complexity and training data
• Artificial neural networks can solve complex tasks, e.g. with high-dimensional input (many input variables) or high-dimensional output (multi-class problems).
• Large networks (with many parameters) are needed to achieve good approximations.
• The required size of the training set grows with the number of free parameters:
  $M_{\varepsilon,\delta} = O\left(\frac{\mathrm{VCdim}}{\varepsilon} \log\frac{1}{\varepsilon} + \frac{1}{\varepsilon} \log\frac{1}{\delta}\right)$, $\mathrm{VCdim} = O(W \log K)$,
  with $W$ ($K$) the number of weights (units), $\varepsilon$ the error and $1 - \delta$ the confidence.
• Typically the training data set is too small.
• Possible approach: decomposition of the learning task in combination with information/sensor fusion.
Multimodal information fusion
[Figure: sensors (vision, audio, ...) → feature extraction 1, 2, ..., N → fusion → decision]
Early fusion • Mid-level fusion • Late fusion (MCS)
Multiple classifier systems architecture
[Figure: classifier layer and fusion layer. Each feature $x_i$ ($i = 1, \dots, I$) is passed to its own classifier $C^i(x_i)$; the fusion mapping $F$ combines the classifier outputs into the final classification $z$.]
Fixed decision fusion mappings
• Fusion by averaging:
  $F(P) := \frac{1}{I} \sum_{i=1}^{I} C^i(x_i)$   (1)
• Probabilistic fusion:
  $\Pr(\omega = l \mid x_1 \dots x_I) = 1 - \left(1 + \alpha \, \frac{\Pr(\omega = l)}{\Pr(\omega \neq l)} \prod_{i=1}^{I} \frac{\Pr(\omega = l \mid x_i) \, \Pr(\omega \neq l)}{\Pr(\omega \neq l \mid x_i) \, \Pr(\omega = l)}\right)^{-1}$   (2)
  with $\Pr(\omega = l \mid x_i) = C^i_l(x_i) + \varepsilon_l(x_i)$
• Voting, median fusion, ...
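Three of the fixed fusion mappings named on this slide (averaging, median fusion and majority voting) can be sketched as follows. Each classifier output is assumed to be a list of class scores of equal length; the example scores are invented for illustration.

```python
from statistics import median

def argmax(v):
    """Index of the largest entry of v."""
    return max(range(len(v)), key=v.__getitem__)

def average_fusion(outputs):
    """Fusion by averaging: mean class score over the I classifiers."""
    n = len(outputs)
    return [sum(o[l] for o in outputs) / n for l in range(len(outputs[0]))]

def median_fusion(outputs):
    """Median of the class scores over the I classifiers."""
    return [median([o[l] for o in outputs]) for l in range(len(outputs[0]))]

def majority_vote(outputs):
    """Each classifier votes for its top class; returns vote counts per class."""
    votes = [0] * len(outputs[0])
    for o in outputs:
        votes[argmax(o)] += 1
    return votes

# Outputs of I = 3 classifiers for 3 classes (assumed toy values).
outputs = [[0.7, 0.2, 0.1],
           [0.4, 0.5, 0.1],
           [0.6, 0.3, 0.1]]

avg = average_fusion(outputs)
med = median_fusion(outputs)
votes = majority_vote(outputs)
print(argmax(avg), argmax(med), argmax(votes))
```

None of these mappings has trainable parameters, so they need no extra training data beyond what was used for the individual classifiers.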
Examples of trainable fusion mappings
1. Train the classifier layer
2. Train the fusion mapping by
   • Decision templates
   • Bayes rule
   • Behaviour knowledge space
   • (Linear) associative memory networks (Hebbian learning, delta learning rule, pseudo-inverse solution)
   • Artificial neural networks/kernel methods
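As one example of a trainable fusion mapping, decision templates can be sketched as follows. This is the basic scheme as it is commonly described: the template of a class is the mean decision profile (the stacked outputs of all classifiers) over the training samples of that class, and a new sample is assigned to the class with the nearest template. The squared Euclidean distance and the toy profiles are assumptions; the slides do not state which variant was used.

```python
def flatten(profile):
    """Decision profile (I classifiers x L classes) as one flat vector."""
    return [v for row in profile for v in row]

def fit_templates(profiles, labels):
    """Template of class k = mean decision profile of the samples labelled k."""
    sums, counts = {}, {}
    for profile, k in zip(profiles, labels):
        flat = flatten(profile)
        if k not in sums:
            sums[k] = [0.0] * len(flat)
            counts[k] = 0
        sums[k] = [s + v for s, v in zip(sums[k], flat)]
        counts[k] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}

def classify(profile, templates):
    """Assign the class whose template is closest (squared Euclidean distance)."""
    flat = flatten(profile)
    return min(templates,
               key=lambda k: sum((a - b) ** 2 for a, b in zip(flat, templates[k])))

# Toy training profiles: 2 classifiers, 2 classes (assumed values).
profiles = [[[0.8, 0.2], [0.7, 0.3]], [[0.9, 0.1], [0.6, 0.4]],
            [[0.3, 0.7], [0.2, 0.8]], [[0.1, 0.9], [0.4, 0.6]]]
labels = [0, 0, 1, 1]

templates = fit_templates(profiles, labels)
print(classify([[0.6, 0.4], [0.55, 0.45]], templates))
```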
2. Bio-acoustic pattern recognition
Example: Ephippiger
[Figure: Ephippiger signal shown at three time scales, amplitude vs. time: about 1 to 6 s, 0.22 to 0.36 s, and 278 to 286 ms]
Extraction of local features in time series
[Figure: a window $W$ slides over the signal $s(t)$, $t = 1, \dots, T$; local features $X(W^1), X(W^2), \dots$ are extracted from the successive windows $W^1, W^2, \dots$]
$I$ local features: $X(W^j) = (x_1(W^j), \dots, x_I(W^j))$
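The sliding-window extraction of local features can be sketched as below. The two features used here (mean energy and number of zero crossings per window) are stand-ins chosen for simplicity; they are not the features used in the talk, and the window width and step are assumptions.

```python
def sliding_windows(signal, width, step):
    """Cut the signal s(t) into windows of fixed width, moved by `step` samples."""
    return [signal[s:s + width] for s in range(0, len(signal) - width + 1, step)]

def energy(window):
    """Mean squared amplitude inside the window."""
    return sum(v * v for v in window) / len(window)

def zero_crossings(window):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(window, window[1:]) if a * b < 0)

def local_features(signal, width, step):
    """One feature tuple (x_1, x_2) per window."""
    return [(energy(w), zero_crossings(w))
            for w in sliding_windows(signal, width, step)]

# Toy signal (assumed): an oscillating burst followed by silence.
signal = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
features = local_features(signal, width=4, step=4)
print(features)
```

Each window yields one feature tuple, so a signal of length $T$ is turned into a sequence of local feature vectors that can then be classified window by window.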
FCT architecture
[Figure: features 1, ..., I of window $W^j$ → fusion → classification → temporal fusion over $j = 1, \dots, J$ → final classification]
• Fusion: $X(W^j) = (x_1(W^j), \dots, x_I(W^j)) \in \mathbb{R}^{\Phi}$, $\Phi = \sum_{i=1}^{I} d_i$
• Classification: $C^j := C(X(W^j))$
• Temporal fusion: $C^o := F(C^1, \dots, C^J)$
CDT architecture
[Figure: features 1, ..., I of window $W^j$ → one classifier per feature → decision fusion → temporal fusion over $j = 1, \dots, J$ → final classification]
• Classifiers: $C^i : \mathbb{R}^{d_i} \to \Delta$, $i = 1, \dots, I$
• Classification: $C^1(x_1(W^j)), \dots, C^I(x_I(W^j))$
• Decision fusion: $C^j := F(C^1(x_1(W^j)), \dots, C^I(x_I(W^j)))$
• Temporal fusion: $C^o := F(C^1, \dots, C^J)$
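The difference between the two architectures can be sketched in code: FCT concatenates the features of a window before a single classification, while CDT classifies each feature separately and then fuses the decisions; both finish with temporal fusion over the windows. Everything concrete below is an assumption for illustration: the soft nearest-prototype classifier, the use of averaging for both decision fusion and temporal fusion, and the toy data. The slides report RBF networks as classifiers.

```python
import math

def soft_classifier(prototypes):
    """Hypothetical classifier: one prototype per class, normalised Gaussian scores."""
    def classify(x):
        scores = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)))
                  for p in prototypes]
        total = sum(scores)
        return [s / total for s in scores]
    return classify

def average(decisions):
    """Averaging fusion of a list of class-score vectors."""
    n = len(decisions)
    return [sum(d[l] for d in decisions) / n for l in range(len(decisions[0]))]

def fct(windows, classifier):
    """Fusion of features, Classification, Temporal fusion."""
    per_window = [classifier([v for x in feats for v in x]) for feats in windows]
    return average(per_window)

def cdt(windows, classifiers):
    """Classification per feature, Decision fusion, Temporal fusion."""
    per_window = [average([C(x) for C, x in zip(classifiers, feats)])
                  for feats in windows]
    return average(per_window)

# J = 3 windows, each with I = 2 one-dimensional features (toy values).
windows = [[[0.1], [0.2]], [[0.0], [0.4]], [[0.3], [0.1]]]

joint = soft_classifier([[0.0, 0.0], [1.0, 1.0]])           # on concatenated features
per_feature = [soft_classifier([[0.0], [1.0]]) for _ in range(2)]

print(fct(windows, joint))
print(cdt(windows, per_feature))
```

In CDT each classifier only sees a low-dimensional feature, which keeps the individual learning problems small; FCT lets a single classifier exploit dependencies between the features at the cost of a higher input dimension.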
Results for cricket songs
• Cross-validation experiments (mean error rates) on 28 cricket species with 4 to 6 animals per species.
• Radial basis function networks as first-level classifiers.
• Extracted features: pulse length, pulse distance, energy contour.
• Averaged fusion led to an error ≥ 0.1.

Algorithm     ρ=0.0   ρ=0.2   ρ=0.4   ρ=0.6   ρ=0.8   ρ=1.0
DT            8.61    7.88    8.03    7.74    7.59    7.59
Multiple DT   8.32    8.03    7.15    6.86    6.86    6.72
Cluster DT    8.61    7.30    7.15    7.15    7.30    7.30
3. Multimodal pattern recognition of emotions in HCI
• Human-machine interaction (HCI)
• Emotion theory and emotional data collection
• Recognition of facial expressions
• Audio-visual laughter detection
Human machine interaction (1)
Human machine interaction (2)
• In many situations human-machine interaction (HCI) could be improved by having machines adapt naturally to their users.
• HCI should take into account information about the emotional state of the user, e.g. frustration, confusion, disliking, interest, surprise, anger, ...
Ekman’s 6 basic emotions
Based on psychophysical experiments on facial expressions, Ekman and Friesen defined 6 basic emotions: anger, surprise, disgust, sadness, happiness, fear.
More complex emotion theories
Frontal views
Recognition of emotions in facial expressions based on frontal views seems to be easy ...
Helmut data set
• Three camera views of the user: frontal, back, total
• The person is labelled as interested
Face detection from the frontal view
• A Viola-Jones classifier is used to detect the region of the user’s face.
• A Sobel edge detector is applied to extract features relevant for classifying the user’s emotional state.
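The Sobel step can be illustrated with a small pure-Python sketch that computes the gradient magnitude of a grey-value image using the standard 3×3 Sobel kernels. How the edge image was turned into features in the talk is not stated; the border handling (border pixels left at zero) and the toy image are assumptions.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (KX) and vertical (KY) gradients.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude per pixel; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(KX[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(KY[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            out[r][c] = math.hypot(gx, gy)
    return out

# Toy image (assumed): a vertical step edge between columns 1 and 2.
img = [[0, 0, 1, 1] for _ in range(4)]
edges = sobel_magnitude(img)
print(edges[1])
```

For real images a library routine would normally be used instead of nested loops; the sketch is only meant to make the operation explicit.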
Multimodal emotions
Emotions are expressed through
• Body movements (head, arms, torso, legs)
• Hand gestures
• Gaze
• Facial expressions
• Speech
• Biophysiological measures (e.g. skin conductance, heart rate, blood volume pressure)
Multimodal emotional data
• Nexus with 24 EEG sensors, 4 EMG sensors, blood pressure and respiration meter
• 1 camera
• 1 microphone
3.1 Emotion recognition from facial expressions
• Cohn-Kanade benchmark database
• Basic emotions (anger, disgust, fear, happiness, sadness, surprise) acted by semi-professional actors.
• 432 sequences (97 individuals) at 30 frames per second; resolution 640 × 480.