
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Question Answering and Implicit Dialogue Identification
Titas Nandi 1, Chris Biemann 2, Seid Muhie Yimam 2, Deepak Gupta 1, Sarah Kohail 2, Asif Ekbal 1 and Pushpak Bhattacharyya 1
1 Indian Institute of Technology Patna, India
2 Universität Hamburg, Germany
{titas.ee13,deepak.pcs16,asif,pb}@iitp.ac.in
{biemann,yimam,kohail}@informatik.uni-hamburg.de
Presented by Alexander Panchenko 2, August 3, 2017
Answer Selection and Ranking in CQA sites (IIT Patna)

Outline
1. Task Description
   - Structure of the Task
   - Related Work
2. System Description
   - Basic Features
   - Implicit Dialogue Identification
   - Statistical Model
3. Results
   - Results on Different Feature Sets
   - Comparison with Other Teams at SemEval 2017
4. Conclusions

SemEval 2017 Task 3: the Three Sub-Tasks

Related Work
Useful ideas from the best systems of the 2015 and 2016 tasks:
- Belinkov (2015): word vectors and meta-data features
- Nicosia (2015): features derived from a comment in the context of the entire thread
- Filice (2016): stacking classifiers across subtasks

Outline of the Method

String Similarity Features
String similarity between a question-comment/question pair:
- Jaro-Winkler
- Levenshtein
- Jaccard
- Sorensen-Dice
- n-gram
- LCS
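
A few of these measures can be sketched in plain Python. This is only an illustration, not the system's implementation: the slides do not say which unit the set-based measures operate on, so character bigrams are assumed here, LCS is taken to mean longest common subsequence, and all function names are hypothetical.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams (assumed unit; the slides do not specify one)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b, n=2):
    """Jaccard similarity: |X & Y| / |X | Y| over n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def sorensen_dice(a, b, n=2):
    """Sorensen-Dice similarity: 2 |X & Y| / (|X| + |Y|) over n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Each function takes the two strings of a pair and returns one number, so the scores can be concatenated directly into a feature vector for the pair.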

Domain (Task) Specific Features
- Whether a comment by the asker of the question is an acknowledgement
- Position of the comment in the thread
- Coverage (the ratio of the number of tokens) of the question by the comment and of the comment by the question
- Presence of URLs, emails or HTML tags
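
As a rough sketch of how some of these features might be computed: the slide only describes coverage as a ratio of token counts, so the code below assumes it means the fraction of one text's tokens that also occur in the other. The regular expressions and all names are illustrative assumptions, not taken from the system.

```python
import re

# Deliberately simple patterns; assumptions for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HTML_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def coverage(target, source):
    """Fraction of tokens in `target` that also occur in `source` (assumed definition)."""
    target_tokens = tokens(target)
    if not target_tokens:
        return 0.0
    source_vocab = set(tokens(source))
    return sum(t in source_vocab for t in target_tokens) / len(target_tokens)


def domain_features(question, comment, position):
    """Feature dictionary for one question-comment pair; `position` is the comment's rank in the thread."""
    return {
        "position": position,
        "coverage_q_by_c": coverage(question, comment),
        "coverage_c_by_q": coverage(comment, question),
        "has_url": bool(URL_RE.search(comment)),
        "has_email": bool(EMAIL_RE.search(comment)),
        "has_html": bool(HTML_TAG_RE.search(comment)),
    }
```

The acknowledgement feature is left out because the slide does not say how acknowledgements are detected.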

Word Embedding Features
- Trained a word embedding model using Word2Vec on unannotated data
- Sentence vectors obtained by averaging word vectors
- $w_{score} = w_{question} - w_{comment}$
- Distance scores based on the computed sentence vectors:
  - Cosine distance ($1 - \cos$)
  - Manhattan distance
  - Euclidean distance
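
The averaging and distance computations are simple enough to sketch without any library. In the sketch below, `embeddings` is a hypothetical dictionary mapping a token to its vector and stands in for a trained Word2Vec model; skipping out-of-vocabulary tokens is an assumption.

```python
import math


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has an embedding")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def score_vector(w_question, w_comment):
    """Element-wise difference w_question - w_comment."""
    return [q - c for q, c in zip(w_question, w_comment)]


def cosine_distance(u, v):
    """1 - cos(u, v); raises ZeroDivisionError if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

With toy two-dimensional vectors, a question averaged to `[0.5, 0.5]` and a comment at `[1.0, 1.0]` point in the same direction, so their cosine distance is close to zero while the Manhattan and Euclidean distances still register the difference in magnitude.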

Topic Modeling Features
- Trained an LDA topic model using the Mallet tool on training data
- Extracted the 20 most relevant topics for the data
- Topic vector of a question/comment: $w_{score} = w_{question} - w_{comment}$
- Topic vocabulary of a question/comment:
  $Vocabulary(T) = \bigcup_{i=1}^{10} topic\_words(t_i)$
  where $t_i$ is one of the top topics for comment/question $T$.
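
The topic vocabulary can be illustrated with a minimal sketch. It assumes the topic model has already produced, for a text T, one weight per topic, plus a list of top words for each topic. Both inputs and the function name are hypothetical, and no Mallet API is called.

```python
def topic_vocabulary(topic_weights, topic_words, k=10):
    """Union of the top words of the k highest-weighted topics of a text.

    topic_weights: one weight per topic for the text (hypothetical input).
    topic_words:   topic_words[i] lists the top words of topic i (hypothetical input).
    """
    top_topics = sorted(range(len(topic_weights)),
                        key=lambda i: topic_weights[i],
                        reverse=True)[:k]
    vocabulary = set()
    for i in top_topics:
        vocabulary |= set(topic_words[i])
    return vocabulary
```

Because the result is a set, a word shared by several of the top topics appears only once.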

Keyword and Named Entity Features
- Extracted keywords or focus words from the question and comment using the RAKE algorithm (Rose et al., 2010)
- Keyword match between question and comment
- Extracted named entities from the question and comment
- Entity tags consisted of LOCATION, PERSON, ORGANIZATION, DATE, MONEY, PERCENT and TIME

Implicit Dialogue Identification
- Identified implicit dialogues among users
- User Interaction Graph
- Each user is in dialogue with some other user who came before him/her
  - Asker: desirable
  - Other users: not desirable
- Vertices: users in a comment thread
- Edges: directed edges showing interaction
- Edge weight: the level of interaction
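
One way to picture such a graph is the sketch below. The slide does not state how interaction is detected or how edge weights are computed, so this example makes a loud assumption: an edge from a commenter to an earlier user is added whenever the comment text contains that user's name, and the weight counts such comments. The function names and the matching rule are hypothetical and are not the authors' method.

```python
from collections import defaultdict


def build_interaction_graph(asker, comments):
    """Build a directed, weighted user interaction graph for one thread.

    comments: list of (user, text) pairs in thread order.
    Returns {(source_user, target_user): weight}.
    Assumed rule: a comment interacts with an earlier user if it contains
    that user's name as a substring (a crude stand-in for real detection).
    """
    graph = defaultdict(int)
    seen = [asker]  # users who appeared earlier in the thread
    for user, text in comments:
        lowered = text.lower()
        for earlier in seen:
            if earlier != user and earlier.lower() in lowered:
                graph[(user, earlier)] += 1
        if user not in seen:
            seen.append(user)
    return dict(graph)


def interaction_with_asker(graph, user, asker):
    """Weight of the edge from `user` to the asker (0 if there is none)."""
    return graph.get((user, asker), 0)
```

Under this rule, a comment addressed to the asker yields an edge into the asker's vertex (the desirable case on the slide), while an exchange between two other users yields edges that bypass the asker.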

Implicit Dialogue Identification: an Example
