Efficient Low-rank Multimodal Fusion With Modality-specific Factors
Zhun Liu, Ying Shen, Varun Bharadwaj, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency
Artificial Intelligence
Sentiment and Emotion Analysis
A speaker's behaviors unfold over time: the spoken words ("This movie is sick"), a smile, a loud voice. We want to infer the sentiment intensity they convey.
Multimodal Sentiment and Emotion Analysis
The same behaviors carry unimodal, bimodal, and trimodal cues about sentiment intensity. Building a multimodal representation (multimodal fusion) therefore has to address:
① Intra-modal interactions
② Cross-modal interactions
③ Computational efficiency
Multimodal Fusion using Tensor Representation
The unimodal, bimodal (and higher-order) interactions can all be captured in one multimodal representation by taking the outer product of the modality vectors after appending a constant 1 to each. For the language and visual case:

$$\mathcal{Z} = \begin{bmatrix} z_l \\ 1 \end{bmatrix} \otimes \begin{bmatrix} z_v \\ 1 \end{bmatrix}$$

This handles intra-modal and cross-modal interactions, but leaves the question of computational efficiency.
"Tensor Fusion Network for Multimodal Sentiment Analysis", Zadeh, A., et al. (2017)
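As a concrete illustration, here is a small PyTorch sketch of this tensor-fusion step for three modalities; the embedding sizes and variable names below are illustrative assumptions, not the TFN authors' code.

```python
import torch

# Toy unimodal embeddings for one sample; sizes are made up for illustration.
z_l = torch.randn(300)   # language
z_v = torch.randn(35)    # visual
z_a = torch.randn(74)    # acoustic

one = torch.ones(1)
zl1 = torch.cat([z_l, one])   # append the constant 1 so unimodal and
zv1 = torch.cat([z_v, one])   # bimodal sub-tensors survive inside Z
za1 = torch.cat([z_a, one])

# Trimodal tensor fusion: outer product of the 1-appended vectors.
Z = torch.einsum('i,j,k->ijk', zl1, zv1, za1)   # shape (301, 36, 75)

# A linear layer on Z needs a weight tensor of shape (301, 36, 75, |h|).
print(Z.shape, Z.numel())   # 812,700 entries for a single sample
```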
Computational Complexity – Tensor Product
The input tensor grows exponentially with the number of modalities $M$:

$$|\mathcal{Z}| \in \mathcal{O}\!\left(\prod_{m=1}^{M} d_m\right)$$

For $M = 2$: $\mathcal{Z} \in \mathcal{O}(d_1 \times d_2)$; for $M = 3$: $\mathcal{Z} \in \mathcal{O}(d_1 \times d_2 \times d_3)$, and the weight tensor that maps $\mathcal{Z}$ to $h$ grows by another factor of $|h|$.
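To make the blow-up concrete, a quick back-of-the-envelope count in Python (the dimensions are assumptions, roughly in the range used for this kind of data):

```python
# Entries in the fused tensor and in the weight tensor of a full-rank fusion layer.
dims = {'language': 300, 'visual': 35, 'acoustic': 74}   # assumed input sizes
h = 128                                                  # assumed fused size

tensor_entries = 1
for d in dims.values():
    tensor_entries *= d + 1          # each modality contributes a factor (d_m + 1)

weight_params = tensor_entries * h   # weight tensor mapping Z to an h-dim vector
print(tensor_entries)                # 812700 entries in Z per sample
print(weight_params)                 # 104025600 weights, exponential in #modalities
```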
CORE CONTRIBUTIONS
Low-rank Multimodal Fusion (LMF)
From Tensor Representation to Low-rank Fusion
Going from Tensor Fusion Networks to Low-rank Multimodal Fusion (shown for the language and visual case):
① Decomposition of the weight tensor W
② Decomposition of the input tensor Z
③ Rearranging the computation of h
Canonical Polyadic (CP) Decomposition of Tensors
Rank of a tensor $\mathcal{W}$: the minimum number of vector tuples needed for exact reconstruction.
Canonical Polyadic (CP) Decomposition of 3D Tensors
A 3D tensor of rank $r$ is exactly the sum of $r$ outer products of vectors:

$$\mathcal{W} = \sum_{i=1}^{r} w_1^{(i)} \otimes w_2^{(i)} \otimes w_3^{(i)}$$
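A minimal NumPy sketch of this idea: building a rank-$r$ tensor from factor vectors, where CP decomposition is the inverse problem of recovering such factors from a given tensor. Sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2, d3, rank = 8, 7, 6, 3      # toy sizes (assumptions)

# rank-many factor vectors per mode
a = rng.normal(size=(rank, d1))
b = rng.normal(size=(rank, d2))
c = rng.normal(size=(rank, d3))

# W = sum_i a_i ⊗ b_i ⊗ c_i, a tensor of (at most) rank `rank`
W = np.einsum('ri,rj,rk->ijk', a, b, c)
print(W.shape)   # (8, 7, 6), but only rank*(d1+d2+d3) underlying numbers
```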
Modality-specific Decomposition
Retain the dimension of the multimodal representation $h$ during decomposition: each modality-specific factor keeps an output axis of size $|h|$, and only the modality dimensions are factorized.
① Decomposition of the Weight Tensor W
The fused representation is a linear function of the input tensor:

$$h = \mathcal{W} \cdot \mathcal{Z}, \qquad \mathcal{Z} = \begin{bmatrix} z_l \\ 1 \end{bmatrix} \otimes \begin{bmatrix} z_v \\ 1 \end{bmatrix}$$
① Decomposition of the Weight Tensor W (continued)
Decompose $\mathcal{W}$ into a sum of $r$ rank-1 modality-specific factors, each retaining the $|h|$ axis:

$$h = \left( w_l^{(1)} \otimes w_v^{(1)} + \cdots + w_l^{(r)} \otimes w_v^{(r)} \right) \cdot \mathcal{Z} = \left( \sum_{i=1}^{r} w_l^{(i)} \otimes w_v^{(i)} \right) \cdot \mathcal{Z}$$
② Decomposition of Z
The input tensor needs no approximation: by construction it is already rank 1, the exact outer product of the 1-appended modality vectors:

$$\mathcal{Z} = \begin{bmatrix} z_l \\ 1 \end{bmatrix} \otimes \begin{bmatrix} z_v \\ 1 \end{bmatrix}$$
③ Rearranging the Computation
Since both $\mathcal{W}$ and $\mathcal{Z}$ factor along the modalities, the large tensor contraction can be rearranged into per-modality projections that are combined afterwards, so neither $\mathcal{W}$ nor $\mathcal{Z}$ is ever built explicitly:

$$h = \mathcal{W} \cdot \mathcal{Z} = \left( \sum_{i=1}^{r} \bigotimes_{m=1}^{M} w_m^{(i)} \right) \cdot \left( \bigotimes_{m=1}^{M} z_m \right) = \sum_{i=1}^{r} \; \Lambda_{m=1}^{M} \left( w_m^{(i)} \cdot z_m \right)$$

where $z_m$ is the 1-appended input of modality $m$ and $\Lambda$ denotes the element-wise product of the $|h|$-dimensional projections.
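A small NumPy check of this rearrangement for the bimodal case: contracting the factor-built weight tensor with the full outer-product input gives the same $h$ as projecting each modality first and combining. Names and dimensions are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_l, d_v, d_h, r = 6, 5, 4, 3              # toy sizes (assumptions)

# Modality-specific factors; the output axis of size d_h is kept un-factorized.
W_l = rng.normal(size=(r, d_l + 1, d_h))   # language factors
W_v = rng.normal(size=(r, d_v + 1, d_h))   # visual factors

# Inputs with the appended 1.
z_l = np.append(rng.normal(size=d_l), 1.0)
z_v = np.append(rng.normal(size=d_v), 1.0)

# Full-tensor route: reconstruct W, build Z = z_l ⊗ z_v, contract.
W_full = np.einsum('rih,rjh->ijh', W_l, W_v)     # (d_l+1, d_v+1, d_h)
Z = np.outer(z_l, z_v)
h_full = np.einsum('ijh,ij->h', W_full, Z)

# Rearranged route: project each modality first, then combine.
proj_l = np.einsum('rih,i->rh', W_l, z_l)        # (r, d_h)
proj_v = np.einsum('rjh,j->rh', W_v, z_v)
h_low = (proj_l * proj_v).sum(axis=0)            # element-wise product, sum over rank

assert np.allclose(h_full, h_low)                # the two routes agree
```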
Low-rank Multimodal Fusion
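Below is a minimal PyTorch sketch of a fusion layer in this style, written from the equations above rather than taken from the authors' released implementation (linked on the final slide); the class name, initialization, and dimensions are my own choices.

```python
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    """Illustrative LMF-style fusion layer: the input tensor Z is never built."""

    def __init__(self, in_dims, out_dim, rank):
        super().__init__()
        # One set of factors per modality, shape (rank, d_m + 1, out_dim);
        # the +1 row corresponds to the appended constant 1.
        self.factors = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(rank, d + 1, out_dim)) for d in in_dims]
        )
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, inputs):
        # inputs: one (batch, d_m) tensor per modality
        ones = inputs[0].new_ones(inputs[0].size(0), 1)
        fused = None
        for z, factor in zip(inputs, self.factors):
            z1 = torch.cat([z, ones], dim=1)                  # append the constant 1
            proj = torch.einsum('bi,rio->rbo', z1, factor)    # (rank, batch, out_dim)
            fused = proj if fused is None else fused * proj   # element-wise across modalities
        return fused.sum(dim=0) + self.bias                   # combine the rank terms

# Usage with made-up language / visual / acoustic sizes.
fusion = LowRankFusion(in_dims=[300, 35, 74], out_dim=128, rank=4)
h = fusion([torch.randn(8, 300), torch.randn(8, 35), torch.randn(8, 74)])
print(h.shape)   # torch.Size([8, 128])
```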
Easily Scales to More Modalities
• Intra-modal interactions
• Cross-modal interactions
• Computational complexity
EXPERIMENTS AND RESULTS
Datasets
CMU-MOSI (Sentiment Analysis): 2,199 video segments from 93 single-speaker movie reviews; segment-level, real-valued sentiment annotations.
POM (Speaker Trait Recognition): 1,000 full single-speaker movie-review video clips; video-level categorical annotations for 16 types of speaker traits.
IEMOCAP (Emotion Recognition): 10,039 video segments from 302 videos of dyadic interactions; segment-level categorical annotations for 10 classes of emotions.
Comparison to Full-rank Tensor Fusion (CMU-MOSI)
LMF (Our Model) vs. TFN (Zadeh, et al., 2017):
            Correlation   Acc-2   F1     MAE    Acc-7
LMF         0.67          76.4    75.7   0.91   32.8
TFN         0.63          73.9    73.4   0.97   32.1
Comparison to Full-rank Tensor Fusion (CMU-MOSI, POM, IEMOCAP)
Bar charts of LMF vs. TFN: MAE and Correlation on CMU-MOSI and on POM, and F1-Happy and F1-Sad on IEMOCAP; LMF improves over TFN on each of these metrics.
Comparison with State-of-the-Art Approaches (CMU-MOSI)
Mean Absolute Error (MAE), lower is better:
LMF (our model)                                                 0.912
Memory Fusion Network, MFN (Zadeh, et al., 2018)                0.965
Multi-attention Recurrent Network, MARN (Zadeh, et al., 2018)   0.968
Tensor Fusion Network, TFN (Zadeh, et al., 2017)                0.970
Multi-view LSTM, MV-LSTM (Rajagopalan, et al., 2016)            1.019
Deep Fusion (Nojavanasghari, et al., 2016)                      1.143
Comparison with the Top-2 State-of-the-Art Approaches (CMU-MOSI, POM, IEMOCAP)
Bar charts of LMF vs. MFN and MARN (with TFN and MV-LSTM as reference): MAE and Correlation on CMU-MOSI and on POM, and F1-Angry and F1-Sad on IEMOCAP; LMF is on par with or better than these baselines on the reported metrics.
Efficiency Improvement (CMU-MOSI)
Efficiency metric: number of data samples processed per second, for training and for testing.
                              Training (samples/s)   Testing (samples/s)
LMF (Ours)                    1177.17                2249.9
TFN (Zadeh, et al., 2017)     340.74                 1134.82
Conclusions
• Captures intra-modal interactions
• Captures cross-modal interactions
• Reduces computational complexity
• Achieves state-of-the-art results
Thank you!
Code: https://github.com/Justin1904/Low-rank-Multimodal-Fusion
http://multicomp.cs.cmu.edu/