Artificial Neural Networks for Multimodal Information Fusion - PowerPoint PPT Presentation


  1. Artificial Neural Networks for Multimodal Information Fusion
     Friedhelm Schwenker, Institute of Neural Information Processing, University of Ulm
     Cairo University, April 9, 2010
     Schwenker UUlm ANN Informationfusion

  2. Outline
     - Artificial neural networks (ANN)
     - Recognition of bio-acoustic time series
     - Emotion recognition in human-computer interaction

  3. Pattern recognition applications at NI
     - Recognition of visual objects from camera images (OCR, face recognition)
     - Medical diagnosis and bioinformatics
     - Speaker identification
     - Speech recognition/understanding
     - Recognition of human emotions from speech and facial expressions
     - Bio-acoustic pattern recognition
     - ...

  4. 1. Artificial Neural Networks
     Von Neumann computer          | Biological neural net
     ------------------------------+------------------------------
     complex processor             | simple processing units
     high speed                    | low speed
     one or a few processors       | large number of units
     centralized computing         | distributed computing
     sequential                    | parallel
     by programs                   | by learning from data
     localized memory              | distributed memory
     addressable by keys           | addressable by content
     not fault-tolerant            | fault-tolerant

  5. Layered Networks
     Layered neural networks (single- or multilayer perceptrons, radial basis function networks) are widely used in pattern recognition and regression applications.
     [Diagram: input → weight matrix → nonlinear transfer function → output]

  6. Neural Models
     Linear neuron:     y = ⟨x, c⟩ = Σ_{i=1}^n x_i c_i
     Threshold neuron:  y = 1 if ⟨x, c⟩ ≥ θ, 0 otherwise
     Sigmoidal neuron:  y = f(⟨x, c⟩ − θ), with f(s) = 1 / (1 + exp(−β s))
     RBF neuron:        y = f(‖x − c‖_2), with f(r) = exp(−r² / (2σ²))
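The four neuron models above can be sketched directly in NumPy (a minimal illustration of the transfer functions, not an implementation from the talk):

```python
import numpy as np

def linear_neuron(x, c):
    # y = <x, c> = sum_i x_i * c_i
    return float(np.dot(x, c))

def threshold_neuron(x, c, theta):
    # y = 1 if <x, c> >= theta, else 0
    return 1 if np.dot(x, c) >= theta else 0

def sigmoidal_neuron(x, c, theta, beta=1.0):
    # y = f(<x, c> - theta) with logistic f(s) = 1 / (1 + exp(-beta * s))
    s = np.dot(x, c) - theta
    return 1.0 / (1.0 + np.exp(-beta * s))

def rbf_neuron(x, c, sigma=1.0):
    # y = exp(-||x - c||^2 / (2 * sigma^2)), maximal when x == c
    r2 = np.sum((np.asarray(x, float) - np.asarray(c, float)) ** 2)
    return float(np.exp(-r2 / (2.0 * sigma ** 2)))
```

Note that the sigmoidal neuron reduces to a smooth version of the threshold neuron as β grows, and the RBF neuron responds maximally when the input equals the centre c.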

  7. Learning in artificial neural nets
     Mapping F_C : X → Y, defined by the connectivity (weight) matrix C, learnt from examples.
     Data: x ∈ X, or (x, T) ∈ X × Y with teacher signal T.
     Different types of target function E(C); optimising E(C) leads to learning rules for C.

  8. Supervised learning
     Output y_j, teaching signal T_j; the weights c_ij are adapted such that y_j ≈ T_j.
     Example: delta rule
         Δc_ij ∼ x_i (T_j − y_j)
     The delta rule minimises E(c) = ‖T − y‖².
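As a sketch of the delta rule on a single linear layer (a toy example, not code from the talk; the learning rate eta is an assumed hyperparameter):

```python
import numpy as np

def delta_rule_step(C, x, T, eta=0.1):
    # One delta-rule update: y = C @ x, then Delta c_ij ~ x_i * (T_j - y_j).
    # Row j of C holds the incoming weights of output unit j.
    y = C @ x
    C += eta * np.outer(T - y, x)
    return C

# toy usage: learn the scalar mapping y = 2 * x from repeated presentations
C = np.zeros((1, 1))
for _ in range(200):
    C = delta_rule_step(C, np.array([1.0]), np.array([2.0]), eta=0.1)
```

Each step shrinks the residual T − y by a constant factor, so the weight converges geometrically towards the target mapping, which is exactly the gradient descent on E(c) = ‖T − y‖² stated on the slide.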

  9. Unsupervised Learning
     The weights c_ij are adapted without a teaching signal.
     Example: Hebbian learning
         Δc_ij ∼ x_i y_j
     Hebbian learning maximises E(c) = ‖y‖².
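A minimal sketch of the Hebbian update (illustrative only; note that the plain rule grows weights without bound for a repeated input, which is consistent with it maximising ‖y‖² — in practice a normalisation such as Oja's rule is added, which the slide does not cover):

```python
import numpy as np

def hebbian_step(C, x, eta=0.1):
    # Hebbian update: y = C @ x, then Delta c_ij ~ x_i * y_j
    y = C @ x
    C += eta * np.outer(y, x)
    return C

# repeated presentation of the same input strengthens the aligned weight
C = np.array([[1.0, 0.0]])
for _ in range(5):
    C = hebbian_step(C, np.array([1.0, 0.0]), eta=0.1)
```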

  10. Competitive learning
     Winner detection: the winner neuron and the neurons in its neighbourhood are adapted.
     Example: SOM learning or k-means
         Δc_ij ∼ (x_i − c_ij) · N_j
     where N_j is the neighbourhood function of winner j.
     k-means minimises E(c) = ‖c − x‖².

  11. Model complexity and training data
     Artificial neural networks can solve complex tasks, e.g. high-dimensional input (many input variables) and high-dimensional output (multi-class problems). Large networks (with many parameters) are needed to achieve good approximations.
     The required size of the training set grows with the number of free parameters:
         M_{ε,δ} = O( (VCdim/ε) · log(1/ε) + (1/ε) · log(1/δ) ),   VCdim = O(W · log K)
     where W (K) is the number of weights (units), ε the error, and 1 − δ the confidence.
     Typically the training data set is too small. Possible approach: decomposition of the learning task in combination with information/sensor fusion.

  12. Multimodal information fusion
     [Diagram: sensors (vision, audio, ...) → feature extraction 1 ... N → fusion → decision]

  13. Early fusion • Mid-level fusion • Late fusion (MCS)

  14. Multiple Classifier Systems architecture
     [Diagram: classifier layer with one classifier C_i(x_i) per feature i = 1 ... I, followed by a fusion layer F producing the final classification z]

  15. Fixed decision fusion mappings
     Fusion by averaging:
         F(P) := (1/I) · Σ_{i=1}^I C_i(x_i)                                        (1)
     Probabilistic fusion:
         Pr(ω=l | x_1 ... x_I) = [ 1 + (Pr(ω≠l)/Pr(ω=l)) · Π_{i=1}^I ( Pr(ω=l) · Pr(ω≠l | x_i) ) / ( Pr(ω≠l) · Pr(ω=l | x_i) ) ]^{−1}   (2)
         with Pr(ω=l | x_i) = C_i^l(x_i) + ε_l(x_i)
     Voting, median fusion, ...
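The averaging rule (1) is the simplest fixed fusion mapping; a one-line sketch with made-up class-probability vectors:

```python
import numpy as np

def average_fusion(decisions):
    # decisions: one class-membership vector C_i(x_i) per classifier i
    # F(P) := (1/I) * sum_{i=1}^I C_i(x_i)
    return np.mean(np.asarray(decisions, dtype=float), axis=0)

# three 2-class classifiers voting with soft outputs
p = average_fusion([[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]])
# p = [0.7, 0.3]
```

Averaging needs no training and is robust when the individual classifiers make independent errors, which is why it serves as the baseline the trainable mappings on the next slide are compared against.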

  16. Examples of trainable fusion mappings
     1. Train the classifier layer
     2. Train the fusion mapping:
        - Decision templates
        - Bayes rule
        - Behaviour knowledge space
        - (Linear) associative memory networks (Hebbian learning, delta learning rule, pseudo-inverse solution)
        - Artificial neural networks / kernel methods
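Of the trainable mappings listed, decision templates are the easiest to sketch: the template of class k is the mean classifier-output profile over its training examples, and a new profile is assigned to the nearest template. A toy illustration (the distance choice and data are assumptions, not from the talk):

```python
import numpy as np

def fit_decision_templates(profiles, labels, n_classes):
    # profiles: array (N, I, n_classes) of the I classifiers' outputs
    # on N training examples; template of class k = mean profile.
    return np.array([profiles[labels == k].mean(axis=0)
                     for k in range(n_classes)])

def dt_classify(templates, profile):
    # assign the class whose template is nearest (squared Euclidean distance)
    d = np.sum((templates - profile) ** 2, axis=(1, 2))
    return int(np.argmin(d))

# two classifiers, two classes, four training profiles
profiles = np.array([[[0.9, 0.1], [0.8, 0.2]],
                     [[0.8, 0.2], [0.9, 0.1]],
                     [[0.1, 0.9], [0.2, 0.8]],
                     [[0.2, 0.8], [0.1, 0.9]]])
labels = np.array([0, 0, 1, 1])
T = fit_decision_templates(profiles, labels, 2)
```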

  17. 2. Bio-acoustic pattern recognition

  18. Example: Ephippiger
     [Figure: song waveform (amplitude vs. time) at three zoom levels — the full song over ~6 s, a section of ~0.15 s, and a single pulse over a few ms]

  19. Extraction of local features in time series
     A window is slid over the signal s(t), t = 1 ... T (window positions W_1, W_2, ...). At each position ℓ, I local features are extracted:
         X(ℓ) = (x_1(ℓ) ... x_I(ℓ))
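The sliding-window scheme can be sketched as follows; window length, hop size, and the two example features (window energy and zero-crossing count, loosely standing in for the pulse and energy features mentioned later) are illustrative assumptions:

```python
import numpy as np

def local_features(s, win, hop):
    # Slide a window of length `win` over signal s(t), t = 1..T,
    # advancing by `hop` samples; at each position ell compute
    # I = 2 local features: energy and zero-crossing count.
    feats = []
    for start in range(0, len(s) - win + 1, hop):
        w = s[start:start + win]
        energy = float(np.sum(w ** 2))
        zero_crossings = int(np.sum(np.abs(np.diff(np.sign(w))) > 0))
        feats.append((energy, zero_crossings))
    return np.array(feats)  # row ell = X(ell) = (x_1(ell) ... x_I(ell))

# toy signal: two cycles of a sine wave, 200 samples
s = np.sin(np.linspace(0, 4 * np.pi, 200))
X = local_features(s, win=50, hop=25)
```

Each row of X is one local feature vector X(ℓ), i.e. exactly the input that the FCT and CDT architectures on the following slides consume.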

  20. FCT architecture (Fusion – Classification – Temporal fusion)
     Fusion:          X(ℓ) = (x_1(ℓ) ... x_I(ℓ)) ∈ R^d, with d = Σ_{i=1}^I d_i
     Classification:  C_ℓ := C(X(ℓ)), ℓ = 1, ..., J
     Temporal fusion: C^o := F(C_1 ... C_J)
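The three FCT stages map naturally onto array operations; a minimal sketch with a hypothetical stand-in classifier (the real system uses trained RBF networks):

```python
import numpy as np

def fct(feature_streams, classifier, temporal_fusion=np.mean):
    # feature_streams: one array of shape (J, d_i) per modality i
    # Fusion:          X(ell) = concatenation of x_1(ell) ... x_I(ell)
    # Classification:  C_ell = classifier(X(ell)), a class-membership vector
    # Temporal fusion: C^o = F(C_1 ... C_J), here the mean over positions
    X = np.hstack(feature_streams)                 # (J, sum_i d_i)
    C = np.array([classifier(row) for row in X])   # (J, n_classes)
    return temporal_fusion(C, axis=0)

# toy usage: a hypothetical 2-class rule on the fused feature vector
clf = lambda v: np.array([1.0, 0.0]) if v.sum() > 0 else np.array([0.0, 1.0])
streams = [np.ones((3, 2)), -2.0 * np.ones((3, 1))]
p = fct(streams, clf)
```

Note the defining property of FCT: features are fused *before* classification, so a single classifier sees the full concatenated vector X(ℓ).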

  21. CDT architecture (Classification – Decision fusion – Temporal fusion)
     Classification:   C_i : R^{d_i} → Δ, i = 1 ... I, giving C_1(x_1(ℓ)) ... C_I(x_I(ℓ))
     Decision fusion:  C_ℓ := F(C_1(x_1(ℓ)) ... C_I(x_I(ℓ)))
     Temporal fusion:  C^o := F(C_1 ... C_J)
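In contrast to FCT, CDT classifies each modality separately and fuses decisions. A minimal sketch with averaging as the fusion mapping F (the choice of F and the constant toy classifiers are assumptions for illustration):

```python
import numpy as np

def cdt(feature_streams, classifiers, fusion=lambda P: np.mean(P, axis=0)):
    # feature_streams: one array of shape (J, d_i) per modality i
    # classifiers:     one classifier C_i per modality, R^{d_i} -> simplex
    # Per position ell: decision fusion C_ell = F(C_1(x_1(ell)) ... C_I(x_I(ell)))
    # Over positions:   temporal fusion C^o = F(C_1 ... C_J)
    per_window = []
    for ell in range(feature_streams[0].shape[0]):
        P = np.array([clf(stream[ell])
                      for clf, stream in zip(classifiers, feature_streams)])
        per_window.append(fusion(P))
    return fusion(np.array(per_window))

# toy usage: two modalities with fixed soft outputs
clfs = [lambda v: np.array([0.8, 0.2]), lambda v: np.array([0.6, 0.4])]
streams = [np.zeros((3, 1)), np.zeros((3, 1))]
p = cdt(streams, clfs)
```

The design trade-off: CDT needs only per-modality classifiers (smaller input dimension, less training data per the VC-dimension argument of slide 11), at the cost of discarding cross-modality feature correlations that FCT can exploit.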

  22. Results for cricket songs
     Cross-validation experiments (mean error rates) on 28 cricket species with 4 to 6 animals per species. Radial basis function networks as first-level classifiers. Extracted features: pulse length, pulse distance, energy contour. Averaged fusion led to an error ≥ 0.1.

     Algorithm     | ρ=0.0 | ρ=0.2 | ρ=0.4 | ρ=0.6 | ρ=0.8 | ρ=1.0
     --------------+-------+-------+-------+-------+-------+------
     DT            |  8.61 |  7.88 |  8.03 |  7.74 |  7.59 |  7.59
     Multiple DT   |  8.32 |  8.03 |  7.15 |  6.86 |  6.86 |  6.72
     Cluster DT    |  8.61 |  7.30 |  7.15 |  7.15 |  7.30 |  7.30

  23. 3. Multimodal pattern recognition of emotions in HCI
     - Human-machine interaction (HCI)
     - Emotion theory and emotional data collection
     - Recognition of facial expressions
     - Audio-visual laughter detection

  24. Human machine interaction (1)

  25. Human machine interaction (2)
     In many situations human-machine interaction could be improved by having machines adapt naturally to their users. HCI should take into account information about the emotional state of the user, e.g. frustration, confusion, disliking, interest, surprise, anger, ...

  26. Ekman’s 6 basic emotions
     Based on psychophysical experiments on facial expressions, Ekman and Friesen defined 6 basic emotions: anger, surprise, disgust, sadness, happiness, fear.

  27. More complex emotion theories

  28. Frontal views
     Recognition of emotions in facial expressions based on frontal views seems to be easy ...

  29. Helmut data set
     Three camera views of the user: frontal, back, total. The person is labelled as being interested.

  30. Face detection from the frontal view
     A Viola-Jones classifier is used to detect the region of the user’s face. A Sobel edge detector is then applied to extract features relevant for classifying the user’s emotional state.
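The Sobel step can be sketched in plain NumPy (a minimal stand-in for illustration; the actual pipeline presumably uses an optimised implementation, e.g. from OpenCV, which also provides the Viola-Jones cascade):

```python
import numpy as np

def sobel_edges(img):
    # Sobel gradient magnitude of a 2-D grayscale array (valid region only).
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    H, W = img.shape
    gx = np.zeros((H - 2, W - 2))
    gy = np.zeros((H - 2, W - 2))
    # explicit 3x3 correlation, kept as loops for clarity
    for i in range(H - 2):
        for j in range(W - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(kx * patch)
            gy[i, j] = np.sum(ky * patch)
    return np.hypot(gx, gy)  # edge strength per pixel

# toy image: a vertical step edge between columns 2 and 3
img = np.zeros((5, 6))
img[:, 3:] = 1.0
E = sobel_edges(img)
```

The response is zero in the flat regions and peaks along the step, which is why the edge map serves as an illumination-robust feature for the subsequent emotion classifier.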

  31. Multimodal emotions
     Emotions are expressed through:
     - Body movements (head, arms, torso, legs)
     - Hand gestures
     - Gaze
     - Facial expressions
     - Speech
     - Biophysiological measures (e.g. skin conductance, heart rate, blood volume pressure)

  32. Multimodal emotional data
     Nexus with 24 EEG sensors, 4 EMG sensors, blood pressure and respiration meter; 1 camera; 1 microphone.

  33. 3.1 Emotion recognition from facial expressions
     Cohn-Kanade benchmark database. Basic emotions (anger, disgust, fear, happiness, sadness, surprise) acted by semi-professional actors. 432 sequences (97 individuals) at 30 frames per second; resolution 640 × 480.
