Sentiment Analysis of Peer Review Texts for Scholarly Papers
Ke Wang & Xiaojun Wan
{wangke17,wanxiaojun}@pku.edu.cn
July 9, 2018
Institute of Computer Science and Technology, Peking University, Beijing, China
Outline
1. Introduction
2. Related Work
3. Framework
4. Experiments
5. Conclusion and Future Work
Introduction
• The boom of scholarly papers
• Motivations
  • Help review submission systems detect the consistency between review texts and scores.
  • Help the chair write a comprehensive meta-review.
  • Help authors further improve their papers.
Figure 1: An example of peer review text and the analysis results.
Introduction
• Challenges
  • Long length.
  • Mixture of non-opinionated and opinionated texts.
  • Mixture of pros and cons.
• Contributions
  • We built two evaluation datasets (ICLR-2017 and ICLR-2018).
  • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM).
  • Evaluation results demonstrate the efficacy of our proposed model and show the great helpfulness of using the abstract as memory.
Related Work
• Sentiment Classification: Sentiment analysis has been widely explored in many text domains, but few studies have attempted it in the domain of peer reviews for scholarly papers.
• Multiple Instance Learning: MIL can extract instance labels (sentence-level polarities) from bags (reviews in our case), but no previous work has applied it to this challenging task.
• Memory Network: Memory networks utilize external information for greater capacity and efficiency.
• Study on Peer Reviews: These tasks are related to but different from the sentiment analysis task addressed in this study.
Framework
• Architecture: an input representation layer (sentence embedding, convolution, max pooling), a sentence classification layer built on an abstract-based memory mechanism (matched attention over the abstract memories, response content, MLP, softmax), and a review classification layer (document attention over the sentence-level distributions).
Figure 2: The architecture of MILAM.
Framework
Input Representation Layer:
I. A sentence $S$ of length $L$ (padded where necessary) is represented as:
   $S = w_1 \oplus w_2 \oplus \cdots \oplus w_L, \quad S \in \mathbb{R}^{L \times d}$  (1)
II. The convolutional layer:
   $f_k = \tanh(W_c \cdot W_{k-l+1:k} + b_c)$  (2)
   $f^{(q)} = [f_1^{(q)}, f_2^{(q)}, \cdots, f_{L-l+1}^{(q)}]$  (3)
III. A max-pooling layer:
   $u_q = \max\{f^{(q)}\}$  (4)
Finally, the representations of the review text $\{S_i^r\}_{i=1}^n$ and the abstract text $\{S_j^a\}_{j=1}^m$ are denoted as $[I_i]_{i=1}^n$ and $[M_j]_{j=1}^m$ respectively, where $I_i, M_j \in \mathbb{R}^z$.
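Read as written, this layer is a standard CNN sentence encoder. The following is a minimal sketch of Eqs. (1)-(4), assuming a PyTorch implementation; the embedding size d, filter width l, and number of filters z are placeholder values, since the slides do not give hyperparameters.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """CNN sentence encoder: word embeddings -> convolution with tanh -> max-pooling.
    Sketch of Eqs. (1)-(4); d, filter_width (l) and z are assumed values."""
    def __init__(self, vocab_size, d=300, filter_width=3, z=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d, padding_idx=0)
        # Conv1d over word positions plays the role of W_c applied to the window W_{k-l+1:k}.
        self.conv = nn.Conv1d(in_channels=d, out_channels=z, kernel_size=filter_width)

    def forward(self, sent_ids):                 # sent_ids: (batch, L) padded word ids
        emb = self.embed(sent_ids)               # (batch, L, d)            -- Eq. (1)
        feats = torch.tanh(self.conv(emb.transpose(1, 2)))  # (batch, z, L-l+1) -- Eqs. (2)-(3)
        u = feats.max(dim=2).values              # max over positions       -- Eq. (4)
        return u                                 # sentence vector I_i or M_j, shape (batch, z)
```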
Framework
Sentence Classification Layer:
I. Obtain a matched attention vector $E^{(i)} = [e_t^{(i)}]_{t=1}^m$ which indicates the weights of the memories.
II. Calculate the response content $R^{(i)} \in \mathbb{R}^z$ using this matched attention vector.
III. Use an MLP to obtain the final representation vector of each sentence in the review text:
   $V_i = f_{mlp}(I_i \,\|\, R^{(i)}; \theta_{mlp})$  (5)
IV. Use the softmax classifier to get the sentence-level distribution over sentiment labels:
   $P_i = \mathrm{softmax}(W_p \cdot V_i + b_p)$  (6)
Finally, we obtain new high-level representations of the sentences in the review text by leveraging relevant abstract information.
Framework
Review Classification Layer:
I. Use separate LSTM modules to produce forward and backward hidden vectors:
   $\overrightarrow{h_i} = \overrightarrow{\mathrm{LSTM}}(V_i), \quad \overleftarrow{h_i} = \overleftarrow{\mathrm{LSTM}}(V_i), \quad h_i = \overrightarrow{h_i} \,\|\, \overleftarrow{h_i}$  (7)
II. The importance $a_i$ of each sentence is measured as follows:
   $h_i' = \tanh(W_a \cdot h_i + b_a), \quad a_i = \frac{\exp(h_i')}{\sum_j \exp(h_j')}$  (8)
III. Finally, we obtain the document-level distribution over sentiment labels as the weighted sum of the sentence-level distributions:
   $P_{review}^{(c)} = \sum_i a_i P_i^{(c)}, \quad c \in [1, C]$  (9)
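A rough sketch of this layer (Eqs. 7-9), again assuming PyTorch; the hidden size is a placeholder, and the attention projection to a scalar score is one reading of Eq. (8).

```python
import torch
import torch.nn as nn

class ReviewClassifier(nn.Module):
    """BiLSTM over sentence vectors + attention; the review-level distribution is the
    attention-weighted sum of the sentence-level distributions (sketch of Eqs. 7-9)."""
    def __init__(self, z=100, hidden=100):
        super().__init__()
        self.bilstm = nn.LSTM(z, hidden, bidirectional=True, batch_first=True)
        self.att = nn.Linear(2 * hidden, 1)      # W_a, b_a (assumed to score each sentence)

    def forward(self, V, P_sent):                # V: (batch, n, z), P_sent: (batch, n, C)
        H, _ = self.bilstm(V)                    # h_i = forward || backward   -- Eq. (7)
        scores = torch.tanh(self.att(H))         # h'_i, shape (batch, n, 1)   -- Eq. (8)
        a = torch.softmax(scores, dim=1)         # attention weights a_i
        P_review = (a * P_sent).sum(dim=1)       # weighted sum of distributions -- Eq. (9)
        return P_review, a.squeeze(-1)
```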
Framework
• Abstract-based Memory Mechanism
1. Get the matched attention vector $E^{(i)}$ over the memories:
   $e_t' = \mathrm{LSTM}(\hat{h}_{t-1}, M_t), \quad (\hat{h}_0 = I_i,\ t = 1, \ldots, m)$  (10)
   $e_t^{(i)} = \frac{\exp(e_t')}{\sum_j \exp(e_j')}$  (11)
   $E^{(i)} = [e_t^{(i)}]_{t=1}^m$  (12)
2. Calculate the response content $R^{(i)}$:
   $R^{(i)} = \sum_{t=1}^m e_t^{(i)} M_t$  (13)
3. Use $R^{(i)}$ and $I_i$ to compute the new sentence representation vector $V_i$:
   $V_i = f_{mlp}(I_i \,\|\, R^{(i)}; \theta_{mlp})$  (14)
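A minimal sketch of this mechanism (Eqs. 10-14), assuming PyTorch. The slides do not say how the LSTM output is turned into the scalar score $e_t'$, so the linear scoring layer below is an assumption; the initialization of the recurrence with $I_i$, the softmax over steps, and the weighted sum of memories follow the equations.

```python
import torch
import torch.nn as nn

class AbstractMemory(nn.Module):
    """Abstract-based memory mechanism: an LSTM scans the abstract-sentence memories
    M_1..M_m with its hidden state initialized to the review-sentence vector I_i; the
    per-step scores are softmax-normalized into E^(i), which weights the memories into
    the response content R^(i), fused with I_i by an MLP (sketch of Eqs. 10-14)."""
    def __init__(self, z=100):
        super().__init__()
        self.cell = nn.LSTMCell(z, z)
        self.score = nn.Linear(z, 1)             # scalar e'_t from the hidden state (assumption)
        self.mlp = nn.Sequential(nn.Linear(2 * z, z), nn.Tanh())

    def forward(self, I_i, M):                   # I_i: (batch, z), M: (batch, m, z)
        h, c = I_i, torch.zeros_like(I_i)        # \hat{h}_0 = I_i            -- Eq. (10)
        scores = []
        for t in range(M.size(1)):
            h, c = self.cell(M[:, t, :], (h, c))
            scores.append(self.score(h))         # e'_t
        E = torch.softmax(torch.cat(scores, dim=1), dim=1)   # (batch, m)     -- Eqs. (11)-(12)
        R = torch.bmm(E.unsqueeze(1), M).squeeze(1)          # response content -- Eq. (13)
        V = self.mlp(torch.cat([I_i, R], dim=1))              # V_i = f_mlp(I_i || R^(i)) -- Eq. (14)
        return V, E
```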
Framework
• Objective Function
  • Our model only needs the review's sentiment label, while each sentence's sentiment label is unobserved.
  • The categorical cross-entropy loss:
   $L(\theta) = -\sum_{review \in T} \sum_{c=1}^{C} P_{review}^{(c)} \log(\bar{P}_{review}^{(c)})$  (15)
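One straightforward reading of Eq. (15) is a standard cross-entropy between the predicted review-level distribution and the gold review label (a one-hot target); under that reading, a minimal training-loss sketch looks like the following, with no sentence-level supervision anywhere.

```python
import torch

def mil_loss(P_review, gold):
    """Review-level cross-entropy, one reading of Eq. (15).
    P_review: (batch, C) predicted review distributions from Eq. (9);
    gold: (batch,) integer review labels. Sentence labels are never supervised."""
    log_p = torch.log(P_review.clamp_min(1e-8))          # guard against log(0)
    return -log_p.gather(1, gold.unsqueeze(1)).mean()
```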
Experiments
• Evaluation Datasets
• Statistics for the ICLR-2017 and ICLR-2018 datasets:
  Data Set     #Papers   #Reviews   #Sentences   #Words
  ICLR-2017    490       1517       24497        9868
  ICLR-2018    954       2875       58329        13503
• The score distributions:
Experiments
• Comparison of review sentiment classification accuracy on the 2-class task: reject (score ∈ [1, 5]) vs. accept (score ∈ [6, 10]).
Experiments
• Comparison of review sentiment classification accuracy on the 3-class task: reject (score ∈ [1, 4]), borderline (score ∈ [5, 6]), accept (score ∈ [7, 10]).
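A small helper illustrating the score-to-label mapping of these two settings, assuming the usual reading that lower ICLR scores correspond to reject:

```python
def label_2class(score: int) -> str:
    """Map an ICLR review score (1-10) to the 2-class label."""
    return "reject" if score <= 5 else "accept"

def label_3class(score: int) -> str:
    """Map an ICLR review score (1-10) to the 3-class label."""
    if score <= 4:
        return "reject"
    if score <= 6:
        return "borderline"
    return "accept"
```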
Experiments
• Sentence-Level Classification Results. We randomly selected 20 reviews, a total of 213 sentences, and manually labeled the sentiment polarity of each sentence.
Figure 3: Example opinionated sentences with predicted polarity scores extracted from a review text.
Experiments
• Influence of Abstract Text.
Figure 4: Example sentences in a review text and their most relevant sentences in the paper abstract. The sentence with the largest weight in the matched attention vector E^(i) is considered most relevant. The red text indicates similarities between the review text and the abstract text.
Experiments
• Influence of Abstract Text.
• A simple method of using abstract texts as a contrast experiment: remove the sentences that are similar to the paper abstract's sentences from the review text and use the remaining text for classification (the similarity threshold is set to 0.7); see the sketch below.
Figure 5: Comparison of using and not using the paper abstract via this simple method.
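The slide does not state which similarity measure is used, so the sketch below assumes cosine similarity between sentence embeddings; `embed_sentence` is a hypothetical helper that returns a sentence vector.

```python
import numpy as np

def filter_review(review_sents, abstract_sents, embed_sentence, threshold=0.7):
    """Drop review sentences whose similarity to any abstract sentence exceeds the
    threshold, then classify the remaining text (the contrast experiment above)."""
    abs_vecs = np.stack([embed_sentence(s) for s in abstract_sents])
    abs_vecs /= np.linalg.norm(abs_vecs, axis=1, keepdims=True)   # unit-normalize
    kept = []
    for sent in review_sents:
        v = embed_sentence(sent)
        v = v / np.linalg.norm(v)
        if np.max(abs_vecs @ v) < threshold:   # keep only sentences not similar to the abstract
            kept.append(sent)
    return kept
```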
Experiments
• Influence of Borderline Reviews.
Figure 6: Experimental results on different datasets with, without, and with only borderline reviews.
Experiments
• Cross-Year Experiments.
Figure 7: Results of cross-year experiments. Model@ICLR-* means the model is trained on the ICLR-* dataset.
Experiments
• Cross-Domain Experiments. We further collected 87 peer reviews for submissions to NLP conferences (CoNLL, ACL, EMNLP, etc.), including 57 positive reviews (accept) and 30 negative reviews (reject).
Figure 8: Results of cross-domain experiments. * means the performance improvement over the first three methods is statistically significant with p-value < 0.05 under the sign test. Model@ICLR-* means the model is trained on the ICLR-* dataset.
Experiments
• Final Decision Prediction for Scholarly Papers.
• Methods to predict the final decision of a paper based on its several review scores (a sketch of these rules follows below):
  • Voting:
    $\text{Decision} = \begin{cases} \text{Accept} & \text{if } \#accept > \#reject \\ \text{Reject} & \text{otherwise} \end{cases}$  (16)
  • Simple Average: simply average the scores of all reviews. If the average score is larger than or equal to 0.6, the paper is predicted as a final accept, and otherwise as a final reject.
  • Confidence-based Average:
    $overall\_score = \frac{1}{|S|} \sum_{i=1}^{|S|} S_i \cdot \frac{1}{6 - ReviewerConfidence_i}$  (17)
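A small sketch of the three aggregation rules. Assumptions: the 0.6 threshold is read as applying to scores normalized to [0, 1] (the slide gives the threshold without the normalization), reviewer confidence is on the 1-5 scale implied by the (6 - confidence) term, and the slide does not state how the confidence-based overall score is turned into a decision, so that function only returns the score.

```python
def vote_decision(review_labels):
    """Voting (Eq. 16): Accept iff more reviews are classified as accept than reject.
    review_labels is a list of per-review predictions, e.g. ["accept", "reject", ...]."""
    accepts = sum(lbl == "accept" for lbl in review_labels)
    rejects = sum(lbl == "reject" for lbl in review_labels)
    return "Accept" if accepts > rejects else "Reject"

def simple_average(norm_scores, threshold=0.6):
    """Simple Average: accept iff the mean normalized review score is >= 0.6."""
    return "Accept" if sum(norm_scores) / len(norm_scores) >= threshold else "Reject"

def confidence_weighted_score(norm_scores, confidences):
    """Confidence-based Average (Eq. 17): each score is weighted by 1 / (6 - confidence),
    so higher-confidence reviewers count more."""
    return sum(s / (6 - c) for s, c in zip(norm_scores, confidences)) / len(norm_scores)
```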
Experiments
• Final Decision Prediction for Scholarly Papers.
Figure 9: Results of final decision prediction for scholarly papers.
Conclusion and Future Work
• Contributions
  • We built two evaluation datasets (ICLR-2017 and ICLR-2018).
  • We propose a multiple instance learning network with a novel abstract-based memory mechanism (MILAM).
  • Evaluation results demonstrate the efficacy of our proposed model and show the great helpfulness of using the abstract as memory.
• Future Work
  • Collect more peer reviews.
  • Try more sophisticated deep learning techniques.
  • Several other sentiment analysis tasks: prediction of the fine-grained scores of reviews, automatic writing of meta-reviews, prediction of the best papers, etc.