
Multimodal Language Analysis with Recurrent Multistage Fusion



  1. Multimodal Language Analysis with Recurrent Multistage Fusion. Presenter: Paul Pu Liang. Authors: Paul Pu Liang, Ziyin Liu, Amir Zadeh, Louis-Philippe Morency.

  2. Progress of Artificial Intelligence: intelligent personal assistants; robots and virtual agents; multimedia content.

  3. Multimodal Language Modalities. Language: lexicon, syntax, pragmatics. Visual: gestures, body language, eye contact, facial expressions. Acoustic: prosody, vocal expressions.

  4. Multimodal Language Modalities (as on the previous slide), now paired with prediction targets. Sentiment: positive, negative. Emotion: anger, disgust, fear, happiness, sadness, surprise. Personality: confidence, persuasion, passion.

  5. Challenge 1: Intra-modal Interactions. a) Temporal sequences. [Figure: the speaker's behaviors over time (the words "This movie is great", a head nod, a smile) plotted against sentiment intensity.]

  6. Challenge 2: Cross-modal Interactions. a) Multiple co-occurring interactions. b) Different weighted combinations. [Figure: the speaker's behaviors over time (the words "This movie is great", a smile, a loud voice) plotted against sentiment intensity.]

  7. Multistage Aggregation in Humans (Parsini et al. 2015, Taylor et al. 2017). [Figure: observed cues: wide smile, loud voice.]

  8. Multistage Aggregation in Humans (Parsini et al. 2015, Taylor et al. 2017). [Figure: cues: wide smile, loud voice, positive words; intermediate interpretation: positive reaction.]

  9. Multistage Aggregation in Humans (Parsini et al. 2015, Taylor et al. 2017). [Figure: cues: wide smile, loud voice, positive words; interpretations: positive reaction, excitement, joyous.]

  10. Computational Model for Multistage Fusion. [Figure: the same cues (wide smile, loud voice, positive words) and interpretations (positive reaction, excitement, joyous), with the aggregation performed by a computational model.]

  11. Multimodal Descriptors. [Figure: language, visual, and acoustic streams aligned over time (example words "He's", "type", "average", …) and combined into multimodal descriptors.]

  12. Language Descriptors. [Same figure; language descriptor shown: neutral word.]

  13. Visual Descriptors. [Same figure; descriptors shown: neutral word (language); frown, shrug (visual).]

  14. Acoustic Descriptors. [Same figure; descriptors shown: neutral word (language); frown, shrug (visual); loud voice, speech elongation (acoustic).]

  15. Multistage Fusion. Input descriptors: neutral word, frown, shrug, loud voice, speech elongation, …

  16. Multistage Fusion. Stage 1: HIGHLIGHT over the input descriptors.

  17. Multistage Fusion. Stage 1: HIGHLIGHT gives "negative"; FUSE gives "negative".

  18. Multistage Fusion. Stage 2 is added and receives the same input descriptors as stage 1.

  19. Multistage Fusion. Stage 2: HIGHLIGHT gives "emphasis".

  20. Multistage Fusion. Stage 2: FUSE with the stage 1 result gives "strongly negative".

  21. Multistage Fusion. Stage 3 is added and receives the same input descriptors as stages 1 and 2.

  22. Multistage Fusion. Stage 3: HIGHLIGHT gives "ambivalence".

  23. Multistage Fusion. Stage 3: FUSE with the earlier results gives "disappointed". Summary of the fused results: stage 1 "negative", stage 2 "strongly negative", stage 3 "disappointed".

  24. Intra-modal Recurrent Networks. [Diagram: one LSTHM per modality, unrolled over time steps; three per-modality inputs.]

  25. Multistage Fusion Process. [Diagram: the three per-modality representations enter the Multistage Fusion Process.]

  26. Multistage Fusion Process. [Diagram: stage 1 applies HIGHLIGHT to the three per-modality representations.]

  27. Multistage Fusion Process. [Diagram, continued: stage 1 HIGHLIGHT.]

  28. Multistage Fusion Process. [Diagram: stage 1 applies HIGHLIGHT, then FUSE.]

  29. Multistage Fusion Process. [Diagram: stage 2 adds a second HIGHLIGHT.]

  30. Multistage Fusion Process. [Diagram: the HIGHLIGHT modules of stages 1 and 2 are labeled "Highlight LSTM".]

  31. Multistage Fusion Process. [Diagram: stage 2 adds a second FUSE; the FUSE modules are labeled "Fuse LSTM".]

  32. Multistage Fusion Process. [Diagram: stages 1, 2, ⋯ up to a final stage, each with a HIGHLIGHT module (Highlight LSTM) and a FUSE module (Fuse LSTM).]

  33. Multistage Fusion Process. [Diagram: after the final stage, SUMMARIZE produces the output of the process.]

  34. Recurrent Multistage Fusion Network. [Diagram: the Multistage Fusion Process (HIGHLIGHT, FUSE, SUMMARIZE across stages) placed on top of the per-modality LSTHMs, unrolled over time.]

  35. Recurrent Multistage Fusion Network. [Diagram, continued: the same network at the next build step.]

  36. Baseline Models
      1. Non-temporal models: SVM (Cortes and Vapnik, 1995), DF (Nojavanasghari et al., 2016)
      2. Early fusion: EF-LSTM (Hochreiter and Schmidhuber, 1997), EF-RHN (Zilly et al., 2016)
      3. Late fusion: LMF (Liu et al., 2018), TFN (Zadeh et al., 2017), BC-LSTM (Poria et al., 2017)
      4. Multi-view learning: MV-LSTM (Rajagopalan et al., 2016)
      5. Memory-based models: MARN, MFN (Zadeh et al., 2018)

  37. State-of-the-art Results. [Bar chart: CMU-MOSI sentiment, binary accuracy. Baseline models (SVM, DF, EF-RHN, EF-LSTM, TFN, BC-LSTM, MV-LSTM, MARN, MFN) compared with RMFN, which is labeled 78.4%.]

  38. State-of-the-art Results. [Four bar charts, each comparing RMFN with the best baseline model: IEMOCAP Happy Emotion, POM Personality Traits, IEMOCAP Sad Emotion, CMU-MOSI Sentiment. Metrics shown: binary accuracy, multiclass accuracy, correlation. Best baselines shown: MFN, MARN, MV-LSTM.]
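
The HIGHLIGHT, FUSE, and SUMMARIZE steps of the Multistage Fusion Process (slides 25 to 33) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The slides label the HIGHLIGHT and FUSE modules as LSTMs; to stay short, this sketch swaps in plain tanh recurrent cells, uses a sigmoid gate as the highlight weighting, and draws random untrained weights. The names `init_params`, `multistage_fusion`, `W_highlight`, `W_fuse`, and `W_summarize`, and all dimensions, are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(rng, d_in, d_fuse, d_out):
    """Random, untrained weights (hypothetical shapes, biases omitted)."""
    scale = 0.1
    return {
        "W_highlight": scale * rng.standard_normal((d_in, 2 * d_in)),
        "W_fuse": scale * rng.standard_normal((d_fuse, d_in + d_fuse)),
        "W_summarize": scale * rng.standard_normal((d_out, d_fuse)),
    }


def multistage_fusion(h, params, num_stages):
    """Run HIGHLIGHT -> FUSE for num_stages stages, then SUMMARIZE.

    h is the concatenation of the per-modality representations at one
    time step. Simplification: tanh recurrent cells stand in for the
    Highlight LSTM and Fuse LSTM named on the slides.
    """
    d_in = h.shape[0]
    d_fuse = params["W_fuse"].shape[0]
    highlight_state = np.zeros(d_in)  # carried across stages by HIGHLIGHT
    fused = np.zeros(d_fuse)          # carried across stages by FUSE
    highlights = []
    for _ in range(num_stages):
        # HIGHLIGHT: choose which input dimensions matter at this stage,
        # conditioned on what earlier stages already highlighted.
        highlight_state = np.tanh(
            params["W_highlight"] @ np.concatenate([h, highlight_state])
        )
        weights = sigmoid(highlight_state)
        highlights.append(weights)
        # FUSE: combine the highlighted signals with the previous fusion result.
        fused = np.tanh(params["W_fuse"] @ np.concatenate([weights * h, fused]))
    # SUMMARIZE: map the final fused state to the cross-modal output.
    z = np.tanh(params["W_summarize"] @ fused)
    return z, highlights


rng = np.random.default_rng(0)
# Hypothetical per-modality representations (language, visual, acoustic).
h_language = rng.standard_normal(4)
h_visual = rng.standard_normal(3)
h_acoustic = rng.standard_normal(2)
h = np.concatenate([h_language, h_visual, h_acoustic])

params = init_params(rng, d_in=h.shape[0], d_fuse=6, d_out=5)
z, highlights = multistage_fusion(h, params, num_stages=3)
print(z.shape, len(highlights))
```

Every stage sees the same input `h`, as in the slides where each stage receives the full set of descriptors; what changes from stage to stage is the highlight weighting and the running fused state. In the full network this process would run once per time step on the LSTHM outputs.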
