IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Question Answering and Implicit Dialogue Identification

Titas Nandi 1, Chris Biemann 2, Seid Muhie Yimam 2, Deepak Gupta 1, Sarah Kohail 2, Asif Ekbal 1 and Pushpak Bhattacharyya 1

1 Indian Institute of Technology Patna, India
2 Universität Hamburg, Germany

{titas.ee13,deepak.pcs16,asif,pb}@iitp.ac.in
{biemann,yimam,kohail}@informatik.uni-hamburg.de

Presented by Alexander Panchenko 2
August 3, 2017

Answer Selection and Ranking in CQA sites (IIT Patna)
Outline
1 Task Description
    Structure of the Task
    Related Work
2 System Description
    Basic Features
    Implicit Dialogue Identification
    Statistical Model
3 Results
    Results on Different Feature Sets
    Comparison with Other Teams at SemEval 2017
4 Conclusions
SemEval 2017 Task 3: the Three Sub-Tasks
Related Work
Useful ideas from the best systems of the 2015 and 2016 tasks:
- Belinkov (2015): word vectors and meta-data features
- Nicosia (2015): features derived from a comment in the context of the entire thread
- Filice (2016): stacking classifiers across subtasks
Outline of the Method
String Similarity Features
String similarity between a question-comment or question-question pair:
- Jaro-Winkler
- Levenshtein
- Jaccard
- Sorensen-Dice
- n-gram
- LCS (longest common subsequence)
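Several of the measures listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the system's actual implementation: the whitespace tokenizer, the choice to compute Jaccard and Sorensen-Dice on token sets, and all function names are assumptions made for this example.

```python
def tokens(text):
    # Assumption: lowercase whitespace tokenization; the slides do not specify one.
    return text.lower().split()


def jaccard(a, b):
    """Jaccard similarity over token sets: |A & B| / |A | B|."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def sorensen_dice(a, b):
    """Sorensen-Dice similarity over token sets: 2|A & B| / (|A| + |B|)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def levenshtein(a, b):
    """Character-level edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Each function returns one number per pair, so the scores could be appended directly to a feature vector for the question-comment or question-question pair.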
Domain (Task) Specific Features
- Whether a comment by the asker of the question is an acknowledgement
- Position of the comment in the thread
- Coverage (a ratio of token counts) of the question by the comment and of the comment by the question
- Presence of URLs, emails or HTML tags
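A few of these features can be illustrated with a short sketch. The slides do not give an exact formula for coverage, so the definition below (the fraction of one text's tokens that also appear in the other) is an assumption, as are the regular expressions, the function names and the feature keys.

```python
import re

# Assumption: deliberately simple patterns, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HTML_RE = re.compile(r"<[^>]+>")


def coverage(source, target):
    """Fraction of tokens in `source` that also occur in `target` (assumed definition)."""
    src = source.lower().split()
    tgt = set(target.lower().split())
    if not src:
        return 0.0
    return sum(1 for tok in src if tok in tgt) / len(src)


def domain_features(question, comment, position):
    """Collect a handful of thread-level features for one comment."""
    return {
        "position": position,                                # index of the comment in the thread
        "question_covered": coverage(question, comment),     # question tokens found in the comment
        "comment_covered": coverage(comment, question),      # comment tokens found in the question
        "has_url": bool(URL_RE.search(comment)),
        "has_email": bool(EMAIL_RE.search(comment)),
        "has_html": bool(HTML_RE.search(comment)),
    }
```

The acknowledgement feature is left out here because the slides do not say how acknowledgements were detected.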
Word Embedding Features
- Trained a word embedding model with Word2Vec on unannotated data
- Sentence vectors computed by averaging word vectors
- $w_{score} = w_{question} - w_{comment}$
- Distance scores based on the computed sentence vectors:
    - Cosine distance ($1 - \cos$)
    - Manhattan distance
    - Euclidean distance
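The averaging and distance computations can be sketched as follows. The toy two-dimensional embedding table stands in for a trained Word2Vec model and is purely hypothetical; the function names and the handling of out-of-vocabulary tokens (skipped) are likewise assumptions.

```python
import math


def sentence_vector(text, embeddings, dim):
    """Average the vectors of all in-vocabulary tokens; zero vector if none match."""
    vecs = [embeddings[tok] for tok in text.lower().split() if tok in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def score_vector(w_question, w_comment):
    """Element-wise difference w_question - w_comment."""
    return [q - c for q, c in zip(w_question, w_comment)]


def cosine_distance(u, v):
    """1 - cos(u, v); defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy embeddings, used only to exercise the functions.
toy = {"visa": [1.0, 0.0], "qatar": [0.0, 1.0], "renew": [1.0, 1.0]}
w_q = sentence_vector("visa qatar", toy, dim=2)   # [0.5, 0.5]
w_c = sentence_vector("renew visa", toy, dim=2)   # [1.0, 0.5]
w_score = score_vector(w_q, w_c)                  # [-0.5, 0.0]
```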
Topic Modeling Features
- Trained an LDA topic model on the training data using the Mallet tool
- Extracted the 20 most relevant topics for the data
- Topic vector of a question/comment:
  $w_{score} = w_{question} - w_{comment}$
- Topic vocabulary of a question/comment:
  $\mathrm{Vocabulary}(T) = \bigcup_{i=1}^{10} \mathrm{topic\_words}(t_i)$
  where $t_i$ is one of the top topics for comment/question $T$.
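The topic vocabulary can be read as the union of the word lists of a text's highest-ranked topics. The sketch below assumes that a topic distribution for the text and a word list per topic are already available (for example, exported from a trained LDA model); the data layout and function names are hypothetical.

```python
def top_topics(topic_dist, k=10):
    """Indices of the k topics with the highest probability for a text."""
    ranked = sorted(range(len(topic_dist)), key=lambda i: topic_dist[i], reverse=True)
    return ranked[:k]


def topic_vocabulary(topic_dist, topic_words, k=10):
    """Union of the word lists of the text's top-k topics."""
    vocab = set()
    for t in top_topics(topic_dist, k):
        vocab |= set(topic_words[t])
    return vocab


# Hypothetical toy model with three topics, using k=2 to keep the example small.
topic_words = [["car", "road"], ["visa", "permit"], ["bank", "visa"]]
question_dist = [0.1, 0.6, 0.3]
vocab = topic_vocabulary(question_dist, topic_words, k=2)
```

The vocabularies of a question and a comment could then be compared, for instance by their overlap, although the slides do not state which comparison was used.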
Keyword and Named Entity Features
- Extracted keywords (focus words) from the question and comment using the RAKE algorithm (Rose et al., 2010)
- Keyword match between question and comment
- Extracted named entities from the question and comment
- Entity tags: LOCATION, PERSON, ORGANIZATION, DATE, MONEY, PERCENT and TIME
Implicit Dialogue Identification
- Identified implicit dialogues among users
- User interaction graph
- Each user is in dialogue with some other user who came before him/her:
    - Asker: desirable
    - Other users: not desirable
- Vertices: users in a comment thread
- Edges: directed edges showing interaction
- Edge weight: the level of interaction
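A user interaction graph of this shape can be sketched with nested dictionaries. The slides do not say how interaction is detected or weighted, so the rule used here (a comment that mentions the name of a user who appeared earlier in the thread adds 1 to the edge weight) is an assumption made only for illustration, as are the function names and the sample thread.

```python
from collections import defaultdict


def build_interaction_graph(asker, comments):
    """Build a directed, weighted graph: graph[speaker][addressee] = weight.

    `comments` is a list of (user, text) pairs in thread order.
    Assumed rule: mentioning an earlier user's name counts as one interaction.
    """
    graph = defaultdict(lambda: defaultdict(int))
    seen = [asker]  # the asker opens the thread, so they precede every comment
    for user, text in comments:
        lowered = text.lower()
        for earlier in seen:
            if earlier != user and earlier.lower() in lowered:
                graph[user][earlier] += 1
        if user not in seen:
            seen.append(user)
    return {user: dict(targets) for user, targets in graph.items()}


def in_dialogue_with_asker(graph, user, asker):
    """True if `user` has at least one edge pointing to the asker."""
    return graph.get(user, {}).get(asker, 0) > 0


# Hypothetical thread used only to exercise the functions.
thread = [
    ("bob", "anna, try the embassy"),
    ("carl", "bob that is wrong"),
    ("bob", "carl no it is not"),
    ("anna", "thanks bob"),
]
graph = build_interaction_graph("anna", thread)
```

Under this rule, an edge to the asker marks the desirable kind of dialogue named above, while edges between other users mark the undesirable kind.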
Implicit Dialogue Identification: an Example