Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment. Yue Gu*, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic. Multimedia Image Processing Lab, Electrical and Computer Engineering Department, Rutgers, The State University of New Jersey
Why is affective analysis necessary? An AI assistant that recognizes the affect carried in human speech can give more accurate responses in question-and-answer and recommendation settings.
Progress of Affective Computing. Affective analysis covers emotion recognition (happy/excited, sadness, anger, neutral, frustration) and sentiment analysis (strong positive, positive, neutral, negative, strong negative). Speech signal processing builds on MFCCs, prosody, and vocal quality; natural language processing builds on BoW and POS features; recent work applies CNNs and LSTMs and moves toward multi-modality.
Is multi-modality needed? Vocal signal prominence: "Oh, you don't like that, you are a west-sider." From the text alone, the utterance reads as Neutral or Frustration; with the prosodic emphasis, it is actually Happy. Acoustic ambiguity: "I love this city!" and "I hate this city!" can sound alike acoustically, so the vocal signal alone cannot separate them; the text disambiguates.
Challenges: Feature Extraction. There is a gap between low-level features and actual affective states; extracted features lack high-level associations; and not all parts of an utterance contribute equally.
Challenges: Modality Fusion. Decision-level fusion lacks mutual association learning across modalities; feature-level (utterance-level) fusion fails to learn time-dependent interactions and lacks consistency between modalities.
Proposed Solutions. Feature extraction: hierarchical attention based on bidirectional GRUs. Modality fusion: word-level fusion with attention. Together, these form an end-to-end multimodal network.
Data Pre-processing. Text branch: word embeddings via word2vec. Audio branch: Mel-frequency spectral coefficients (MFSCs). Synchronization: word-level forced alignment between the transcript and the audio.
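Since the deck only names the pre-processing steps, here is a minimal sketch of what they could look like in Python, assuming librosa for the log-mel (MFSC) features; `mfsc`, `align_frames_to_words`, and the `word_intervals` timestamp list are hypothetical names, and the deck does not say which forced aligner produced the timestamps or which frame settings were used.

```python
# Minimal pre-processing sketch (not the authors' code): MFSC extraction
# plus word-level grouping of frames using forced-alignment timestamps.
# `word_intervals` is a hypothetical list of (word, start_sec, end_sec)
# tuples produced by any forced aligner.
import numpy as np
import librosa

def mfsc(wav_path, sr=16000, n_mels=64, hop=0.010, win=0.025):
    """Log mel-frequency spectral coefficients, one row per 10 ms frame."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=n_mels,
        hop_length=int(hop * sr), n_fft=int(win * sr))
    return librosa.power_to_db(mel).T            # (n_frames, n_mels)

def align_frames_to_words(frames, word_intervals, hop=0.010):
    """Slice the frame sequence into per-word chunks for word-level fusion."""
    chunks = []
    for word, start, end in word_intervals:
        i = int(start / hop)
        j = max(int(end / hop), i + 1)           # at least one frame per word
        chunks.append((word, frames[i:j]))       # frames spoken during `word`
    return chunks
```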
Proposed Architecture. Text branch: embedded words ("I" ... "mean" ... "guys") feed a word-level BiGRU that produces textual contextual states $h^t_1, \dots, h^t_N$, followed by a textual attention yielding weights $\alpha^t_1, \dots, \alpha^t_N$. Audio branch: MFSC frames feed a frame-level BiGRU (hidden states $h^f_{i1}, \dots, h^f_{iL}$ for word $i$); a frame-level acoustic attention pools each word's frames, a word-level BiGRU produces acoustic contextual states $h^w_1, \dots, h^w_N$, and a word-level acoustic attention yields $\alpha^w_1, \dots, \alpha^w_N$. Word-level fusion combines the two branches into fused vectors $V_1, \dots, V_N$, which a CNN and a softmax layer map to the output label. Frame-level acoustic attention: $e^f_{ij} = \tanh(W_f h^f_{ij} + b_f)$, $\alpha^f_{ij} = \frac{\exp\big((e^f_{ij})^{\top} v_f\big)}{\sum_{k=1}^{L} \exp\big((e^f_{ik})^{\top} v_f\big)}$.
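As a hedged illustration of the frame-level acoustic attention defined above, here is a PyTorch sketch that computes $e^f_{ij}$ and $\alpha^f_{ij}$ over the frames of one word and pools them into a word-level acoustic vector; the class name, layer sizes, and weighted-sum pooling are assumptions, not the authors' released code.

```python
# Sketch of frame-level acoustic attention over BiGRU states:
# e_ij = tanh(W_f h_ij + b_f), alpha_ij = softmax over frames of e_ij^T v_f.
import torch
import torch.nn as nn

class FrameAttentionBiGRU(nn.Module):
    def __init__(self, n_mels=64, hidden=128):
        super().__init__()
        self.bigru = nn.GRU(n_mels, hidden, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, 2 * hidden)        # W_f, b_f
        self.context = nn.Linear(2 * hidden, 1, bias=False)  # v_f

    def forward(self, frames):                 # frames: (batch, L, n_mels)
        h, _ = self.bigru(frames)              # h_ij: (batch, L, 2*hidden)
        e = torch.tanh(self.proj(h))           # e_ij
        alpha = torch.softmax(self.context(e), dim=1)  # alpha_ij over frames
        return (alpha * h).sum(dim=1)          # pooled word-level acoustic vector
```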
Word-level Fusion. Three strategies combine each word's textual and acoustic representations into a fused vector $V_i$: (a) horizontal fusion passes each modality's attention-weighted contextual state through a dense layer and concatenates the results; (b) vertical fusion concatenates the contextual states and weights them with a shared attention $\alpha^s_i$ derived from $\alpha^t_i$ and $\alpha^w_i$; (c) fine-tuning attention fusion learns an additional attention on top of the shared one: $\alpha^u_i = \frac{\exp\big((e^u_i)^{\top} v_u\big)}{\sum_{k=1}^{N} \exp\big((e^u_k)^{\top} v_u\big)} + \alpha^s_i$. Notation: $\alpha^w_i$ / $\alpha^t_i$: word-level acoustic / textual attention distributions; $h^w_i$ / $h^t_i$: word-level acoustic / textual contextual states.
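The three strategies can be sketched as follows. This is an interpretation of the slide's diagrams rather than the authors' code: the shared attention is assumed to be the average of the textual and acoustic attentions, and the `fine_tune` branch implements the $\alpha^u_i$ equation above.

```python
# Hedged sketch of the three word-level fusion strategies, applied per word.
# t_h, a_h: textual / acoustic contextual states (h^t_i, h^w_i);
# t_alpha, a_alpha: their word-level attention weights (alpha^t_i, alpha^w_i).
import torch
import torch.nn as nn

class WordLevelFusion(nn.Module):
    def __init__(self, dim, mode="fine_tune"):
        super().__init__()
        self.mode = mode
        self.dense = nn.Linear(2 * dim, 2 * dim)      # fusion dense layer
        self.e = nn.Linear(2 * dim, 2 * dim)          # e^u projection
        self.v = nn.Linear(2 * dim, 1, bias=False)    # v_u context vector

    def forward(self, t_h, a_h, t_alpha, a_alpha):
        # t_h, a_h: (batch, N, dim); t_alpha, a_alpha: (batch, N, 1)
        if self.mode == "horizontal":   # (a) weight each modality, then join
            return self.dense(torch.cat([t_alpha * t_h, a_alpha * a_h], dim=-1))
        shared = (t_alpha + a_alpha) / 2               # assumed alpha^s_i
        fused = self.dense(torch.cat([t_h, a_h], dim=-1))
        if self.mode == "vertical":     # (b) one shared weight per word
            return shared * fused
        # (c) fine-tuning attention: alpha^u = softmax((e^u)^T v_u) + alpha^s
        scores = self.v(torch.tanh(self.e(fused)))     # (batch, N, 1)
        alpha_u = torch.softmax(scores, dim=1) + shared
        return alpha_u * fused
```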
Baselines. Sentiment analysis: BL-SVM, LSTM-SVM, C-MKL, TFN, LSTM(A). Emotion recognition: SVM Trees, GSV-eVector, C-MKL, H-DMS. Fusion baselines: decision-level and feature-level (utterance-level) fusion.
Sentiment Analysis Results (MOSI): [bar chart comparing weighted accuracy and weighted F1 across methods, y-axis 60–78%].
Emotion Recognition Results (IEMOCAP): [bar chart comparing weighted accuracy and unweighted accuracy across methods, y-axis 50–75%].
Multimodal architecture is needed: [bar charts on MOSI (weighted accuracy and weighted F1, 50–80%) and IEMOCAP (55–75%) comparing text-only (T), audio-only (A), and the combined model (T+A); T+A scores highest].
Generalization: [bar charts for MOSI to YouTube (weighted accuracy and weighted F1, 60–68%) and IEMOCAP to EmotiW (56–62%) comparing Ours-HF, Ours-VF, and Ours-HAF].
Attention Visualization. The fused representations carry representative information from both text and audio, and the learned fusion successfully combines the textual and acoustic attentions. Example (label: anger): "What about the business; what the hell is this", shown with heatmaps of $\alpha^w_i$ (word-level acoustic), $\alpha^t_i$ (word-level textual), $\alpha^s_i$ (shared), and $\alpha^u_i$ (fine-tuned) attention distributions over the words.
Attention Visualization. The attentions capture emphasis and word-importance variation under vocal signal prominence. Example (label: happy): "Oh, you don't like that, you're a west-sider", shown with the same four attention heatmaps ($\alpha^w_i$, $\alpha^t_i$, $\alpha^s_i$, $\alpha^u_i$). A plotting sketch follows.
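To reproduce this kind of figure, one can stack the four word-level attention distributions into a matrix and render it as a heatmap; the sketch below uses random placeholder weights, since the actual values would come from a trained model.

```python
# Heatmap of the four word-level attention distributions over an utterance.
# The Dirichlet draw is a stand-in for real attention weights.
import numpy as np
import matplotlib.pyplot as plt

words = ["Oh", "you", "don't", "like", "that", "you're", "west-sider"]
rows = ["acoustic", "textual", "shared", "fine-tuned"]
alphas = np.random.dirichlet(np.ones(len(words)), size=len(rows))  # placeholder

fig, ax = plt.subplots(figsize=(8, 2.5))
ax.imshow(alphas, aspect="auto", cmap="Reds")   # darker = higher attention
ax.set_xticks(range(len(words)))
ax.set_xticklabels(words)
ax.set_yticks(range(len(rows)))
ax.set_yticklabels(rows)
plt.tight_layout()
plt.show()
```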
Summary: a hierarchical attention-based multimodal structure; word-level fusion strategies; word-level attention visualization.
Thank you! Email: yg202@scarletmail.rutgers.edu Homepage: www.ieyuegu.com