Multimodal Language Analysis with Recurrent Multistage Fusion
Presenter: Paul Pu Liang
Authors: Paul Pu Liang, Ziyin Liu, Amir Zadeh, Louis-Philippe Morency
Progress of Artificial Intelligence
Intelligent Robots and Virtual Agents
Multimedia Content
Personal Assistants
Multimodal Language
Modalities
Language: lexicon, syntax, pragmatics
Visual: gestures, body language, eye contact, facial expressions
Acoustic: prosody, vocal expressions
Sentiment: positive, negative
Emotion: anger, disgust, fear, happiness, sadness, surprise
Personality: confidence, persuasion, passion
Challenge 1: Intra-modal Interactions
a) Temporal sequences
[Figure: the speaker's behaviors over time (the utterance “This movie is great”, a head nod, a smile) shown against sentiment intensity, illustrating intra-modal interactions.]
Challenge 2: Cross-modal Interactions
a) Multiple co-occurring interactions
b) Different weighted combinations
[Figure: the speaker's behaviors over time (the utterance “This movie is great”, a smile, a loud voice) shown against sentiment intensity, illustrating cross-modal interactions.]
Multistage Aggregation in Humans (Parsini et al. 2015, Taylor et al. 2017)
[Figure, built up over three slides: first the cues "wide smile" and "loud voice"; then "positive reaction" and "positive words"; then "excitement" and "joyous".]
Computational Model for Multistage Fusion
[Figure: the same labels (wide smile, loud voice, positive words, positive reaction, excitement, joyous), now attached to a computational model.]
Multimodal Descriptors
[Figure, built up over four slides: an utterance ("He's", "type", …) on a time axis with language, visual, and acoustic streams, and an "average" step that yields multimodal descriptors.]
Language descriptors: neutral word, …
Visual descriptors: frown, shrug, …
Acoustic descriptors: loud voice, speech elongation, …
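The "average" label on the descriptor slides suggests that frame-level visual and acoustic features are averaged over each word's time span. The following is a minimal sketch under that assumption; the function name `average_over_words`, the frame timestamps, and the word intervals are all hypothetical toy values, not the authors' preprocessing.

```python
import numpy as np

def average_over_words(frames, frame_times, word_spans):
    """Average frame-level features inside each word's [start, end) interval."""
    frames = np.asarray(frames, dtype=float)
    frame_times = np.asarray(frame_times, dtype=float)
    out = np.zeros((len(word_spans), frames.shape[1]))
    for i, (start, end) in enumerate(word_spans):
        mask = (frame_times >= start) & (frame_times < end)
        if mask.any():  # words with no frames keep an all-zero descriptor
            out[i] = frames[mask].mean(axis=0)
    return out

# Toy data (assumed): six visual frames with two features each.
frame_times = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
visual = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0],
                   [6.0, 7.0], [8.0, 9.0], [10.0, 11.0]])
word_spans = [(0.0, 0.3), (0.3, 0.6)]  # two words, three frames each

word_level_visual = average_over_words(visual, frame_times, word_spans)
```

Each row of `word_level_visual` is then one word-aligned descriptor, so the visual stream has the same length as the word sequence.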
Multistage Fusion
Descriptors available at every stage: neutral word, frown, shrug, loud voice, speech elongation, …
Stage 1: HIGHLIGHT a subset of the descriptors, then FUSE. Fused signal: negative. Running interpretation: negative.
Stage 2: HIGHLIGHT, then FUSE. Fused signal: emphasis. Running interpretation: strongly negative.
Stage 3: HIGHLIGHT, then FUSE. Fused signal: ambivalence. Running interpretation: disappointed.
Intra-modal Recurrent Networks
[Figure: three LSTHM networks, one per modality, each unrolled across time steps.]
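A rough sketch of the per-modality recurrence may help here. It assumes each modality's network takes its own input, its own previous hidden state, and a cross-modal vector from the previous time step. The class `HybridRecurrentCell` is a hypothetical plain tanh cell standing in for the gated LSTHM, and all dimensions are made up.

```python
import numpy as np

class HybridRecurrentCell:
    """Simplified stand-in for one per-modality network (not the gated LSTHM)."""

    def __init__(self, input_dim, hidden_dim, z_dim, rng):
        self.W_x = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_z = rng.normal(0.0, 0.1, (hidden_dim, z_dim))
        self.b = np.zeros(hidden_dim)

    def step(self, x_t, h_prev, z_prev):
        # New hidden state from the modality input, its own history,
        # and the cross-modal vector of the previous time step.
        return np.tanh(self.W_x @ x_t + self.W_h @ h_prev + self.W_z @ z_prev + self.b)

rng = np.random.default_rng(0)
input_dims = {"language": 5, "visual": 4, "acoustic": 3}  # assumed sizes
hidden_dim, z_dim, num_steps = 6, 2, 4

cells = {m: HybridRecurrentCell(d, hidden_dim, z_dim, rng) for m, d in input_dims.items()}
hidden = {m: np.zeros(hidden_dim) for m in input_dims}
z_prev = np.zeros(z_dim)  # placeholder: a fusion module would update this each step

for t in range(num_steps):
    for m, d in input_dims.items():
        x_t = rng.normal(size=d)  # random toy input for modality m at time t
        hidden[m] = cells[m].step(x_t, hidden[m], z_prev)
```

Keeping one cell per modality means intra-modal dynamics are modeled separately, and only `z_prev` carries information across modalities.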
Multistage Fusion Process
[Figure, built up over nine slides: the three intra-modal representations enter a sequence of stages (stage 1, stage 2, ⋯). In each stage a HIGHLIGHT module, driven by a Highlight LSTM, feeds a FUSE module, driven by a Fuse LSTM. After the final stage a SUMMARIZE module produces a single output representation.]
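The HIGHLIGHT, FUSE, and SUMMARIZE modules named on these slides can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: plain tanh recurrences replace the Highlight LSTM and Fuse LSTM, highlighting is assumed to be a softmax attention applied elementwise to the concatenated intra-modal representations, and every parameter name and dimension is hypothetical.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def multistage_fusion(h_cat, p, num_stages=3):
    """Run HIGHLIGHT -> FUSE for several stages, then SUMMARIZE."""
    highlight_state = np.zeros(p["Wa_s"].shape[0])
    fuse_state = np.zeros(p["Wf_s"].shape[0])
    attentions = []
    for _ in range(num_stages):
        # HIGHLIGHT: recurrent state over stages -> attention over dimensions.
        highlight_state = np.tanh(p["Wa_h"] @ h_cat + p["Wa_s"] @ highlight_state)
        attention = softmax(p["Wa_out"] @ highlight_state)
        highlighted = attention * h_cat
        attentions.append(attention)
        # FUSE: merge the highlighted signals with earlier fusion results.
        fuse_state = np.tanh(p["Wf_h"] @ highlighted + p["Wf_s"] @ fuse_state)
    # SUMMARIZE: map the final fused state to one cross-modal vector.
    z = np.tanh(p["Wz"] @ fuse_state)
    return z, attentions

rng = np.random.default_rng(0)
d, a_dim, s_dim, z_dim = 9, 5, 4, 2  # assumed sizes
shapes = {"Wa_h": (a_dim, d), "Wa_s": (a_dim, a_dim), "Wa_out": (d, a_dim),
          "Wf_h": (s_dim, d), "Wf_s": (s_dim, s_dim), "Wz": (z_dim, s_dim)}
params = {name: rng.normal(0.0, 0.1, shape) for name, shape in shapes.items()}

h_cat = rng.normal(size=d)  # toy concatenation of three intra-modal states
z, attentions = multistage_fusion(h_cat, params, num_stages=3)
```

Because the highlight state persists across stages, each stage can attend to a different subset of dimensions than the previous one, which is the behavior the stage-by-stage walkthrough describes.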
Recurrent Multistage Fusion Network
[Figure: the Multistage Fusion Process (HIGHLIGHT, FUSE, and SUMMARIZE over stage 1, stage 2, ⋯) placed on top of the per-modality LSTHM networks and unrolled over time.]
Baseline Models
1. Non-temporal Models: SVM (Cortes and Vapnik, 1995), DF (Nojavanasghari et al., 2016)
2. Early Fusion: EF-LSTM (Hochreiter and Schmidhuber, 1997), EF-RHN (Zilly et al., 2016)
3. Late Fusion: LMF (Liu et al., 2018), TFN (Zadeh et al., 2017), BC-LSTM (Poria et al., 2017)
4. Multi-view Learning: MV-LSTM (Rajagopalan et al., 2016)
5. Memory-based Models: MARN, MFN (Zadeh et al., 2018)
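To make the early versus late fusion categories concrete, here is a generic toy contrast. It assumes early fusion means concatenating modality features before a single model, and late fusion means combining the outputs of separate per-modality models. The linear scorers, weights, and feature values are hypothetical and do not reproduce any of the listed baselines.

```python
import numpy as np

def early_fusion_score(lang, vis, aco, w):
    """Concatenate all modality features, then apply one linear scorer."""
    x = np.concatenate([lang, vis, aco])
    return float(w @ x)

def late_fusion_score(lang, vis, aco, w_l, w_v, w_a):
    """Score each modality separately, then average the three scores."""
    scores = [w_l @ lang, w_v @ vis, w_a @ aco]
    return float(np.mean(scores))

# Toy features (assumed values).
lang = np.array([1.0, 2.0])
vis = np.array([3.0])
aco = np.array([4.0, 5.0])

early = early_fusion_score(lang, vis, aco, w=np.ones(5))
late = late_fusion_score(lang, vis, aco, w_l=np.ones(2), w_v=np.ones(1), w_a=np.ones(2))
```

In the early variant a single model sees all features at once; in the late variant modalities interact only when their separate outputs are combined.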
State-of-the-art Results
[Figure: bar chart of CMU-MOSI sentiment binary accuracy (%) for the baseline models (SVM, DF, EF-RHN, EF-LSTM, TFN, BC-LSTM, MV-LSTM, MARN, MFN) and RMFN, with RMFN at 78.4%.]
State-of-the-art Results
[Figure: four bar charts, each comparing RMFN with the best baseline model (MFN, MARN, or MV-LSTM, depending on the panel). Panel titles: CMU-MOSI Sentiment, IEMOCAP Happy Emotion, IEMOCAP Sad Emotion, POM Personality Traits. Metrics shown: binary accuracy, multiclass accuracy, correlation.]
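For reference, two of the metrics named on the results slides, binary accuracy and correlation, can be computed as sketched below. The sketch assumes real-valued sentiment scores that are binarized at zero and Pearson correlation as the correlation measure. The numbers are made-up toy values, not results from the talk.

```python
import numpy as np

def binary_accuracy(pred_scores, true_scores, threshold=0.0):
    """Fraction of examples whose predicted polarity matches the label's."""
    pred = np.asarray(pred_scores) >= threshold
    true = np.asarray(true_scores) >= threshold
    return float((pred == true).mean())

def pearson_correlation(pred_scores, true_scores):
    """Pearson correlation between predicted and true real-valued scores."""
    return float(np.corrcoef(pred_scores, true_scores)[0, 1])

# Toy predictions and labels (assumed values).
pred = [1.2, -0.5, 0.3, -2.0]
true = [2.0, -1.0, -0.4, -1.5]

acc = binary_accuracy(pred, true)
corr = pearson_correlation(pred, true)
```

Binary accuracy only checks polarity, while correlation also rewards getting the intensity ordering right, which is why the two can disagree on the same model.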